head 1.18; access; symbols pkgsrc-2023Q4:1.18.0.2 pkgsrc-2023Q4-base:1.18 pkgsrc-2023Q3:1.15.0.2 pkgsrc-2023Q3-base:1.15 pkgsrc-2023Q2:1.14.0.2 pkgsrc-2023Q2-base:1.14 pkgsrc-2023Q1:1.13.0.4 pkgsrc-2023Q1-base:1.13 pkgsrc-2022Q4:1.13.0.2 pkgsrc-2022Q4-base:1.13 pkgsrc-2022Q3:1.12.0.2 pkgsrc-2022Q3-base:1.12 pkgsrc-2022Q2:1.10.0.4 pkgsrc-2022Q2-base:1.10 pkgsrc-2022Q1:1.10.0.2 pkgsrc-2022Q1-base:1.10 pkgsrc-2021Q4:1.7.0.2 pkgsrc-2021Q4-base:1.7 pkgsrc-2021Q3:1.2.0.2 pkgsrc-2021Q3-base:1.2; locks; strict; comment @# @; 1.18 date 2023.11.01.09.14.56; author adam; state Exp; branches; next 1.17; commitid F0wy0OFGuBBQ3TKE; 1.17 date 2023.10.23.07.56.04; author adam; state Exp; branches; next 1.16; commitid 8oWzvUXxc7YLUIJE; 1.16 date 2023.09.30.17.16.30; author adam; state Exp; branches; next 1.15; commitid uD4g5U0o1OVLKOGE; 1.15 date 2023.07.08.04.35.31; author adam; state Exp; branches; next 1.14; commitid fx7YfeO1EVvbfXvE; 1.14 date 2023.04.24.10.30.04; author adam; state Exp; branches; next 1.13; commitid DWlMBl8WtjfeClmE; 1.13 date 2022.11.18.18.50.29; author adam; state Exp; branches; next 1.12; commitid Hy5FX45uvX7Nqd2E; 1.12 date 2022.09.14.11.10.00; author adam; state Exp; branches; next 1.11; commitid vLHAS0ocqFTkYOTD; 1.11 date 2022.08.05.13.59.38; author adam; state Exp; branches; next 1.10; commitid GVazTi6xVb18cHOD; 1.10 date 2022.02.12.17.53.15; author adam; state Exp; branches; next 1.9; commitid Mr9tbGpAcKw95msD; 1.9 date 2022.01.31.11.04.38; author adam; state Exp; branches; next 1.8; commitid TYCuxxPK2NbXcMqD; 1.8 date 2022.01.07.16.37.10; author adam; state Exp; branches; next 1.7; commitid Fw83d16bVJeUOInD; 1.7 date 2021.12.11.20.47.41; author adam; state Exp; branches; next 1.6; commitid vvLpTBAjwfMv4hkD; 1.6 date 2021.11.25.08.10.29; author adam; state Exp; branches; next 1.5; commitid p77DRUBIpqozo9iD; 1.5 date 2021.10.26.10.06.49; author nia; state Exp; branches; next 1.4; commitid JMRLe4kpApoo0jeD; 1.4 date 2021.10.12.09.12.20; author adam; state Exp; branches; next 1.3; commitid QCgqXWXxWq0E9vcD; 1.3 date 2021.10.07.13.29.11; author nia; state Exp; branches; next 1.2; commitid OZwjTf8PLdPJJSbD; 1.2 date 2021.09.19.10.39.10; author adam; state Exp; branches; next 1.1; commitid f9Y0tIH3F4zWmy9D; 1.1 date 2021.07.30.04.14.49; author adam; state Exp; branches; next ; commitid CSdxlg07KL15TX2D; desc @@ 1.18 log @py-charset-normalizer: updated to 3.3.2 3.3.2 Fixed - Unintentional memory usage regression when using large payload that match several encoding - Regression on some detection case showcased in the documentation @ text @$NetBSD: distinfo,v 1.17 2023/10/23 07:56:04 adam Exp $ BLAKE2s (charset-normalizer-3.3.2.tar.gz) = dfd8383a38e7c340acd24fad8eefd65dbf83ac5aad97baf6178296964d23af23 SHA512 (charset-normalizer-3.3.2.tar.gz) = 227dd9496e080310b3262fe0ffc32b5ebed16e5b3a294877555c0b04dee0cb073a2a0a4fa8dbad3029703ffaf1857acf24d9b87ca74d75fa2f0ba8fd3413e9c4 Size (charset-normalizer-3.3.2.tar.gz) = 104809 bytes @ 1.17 log @py-charset-normalizer: updated to 3.3.1 3.3.1 Changed - Optional mypyc compilation upgraded to version 1.6.1 for Python >= 3.8 - Improved the general detection reliability based on reports from the community @ text @d1 1 a1 1 $NetBSD: distinfo,v 1.16 2023/09/30 17:16:30 adam Exp $ d3 3 a5 3 BLAKE2s (charset-normalizer-3.3.1.tar.gz) = 4dac413a9ef9c13442eb211fd40dcc70d60d246c6aa27d3f4795618b7d2d80ea SHA512 (charset-normalizer-3.3.1.tar.gz) = d5f9564efd5d0112e07429d01d3b91db14af98e494e7993151724599e9abaf862cfb40c26fd47050256b0f2b36ce58c50d6dd697faa932ec3648265fb4e934f3 Size (charset-normalizer-3.3.1.tar.gz) = 104095 bytes @ 1.16 log @py-charset-normalizer: updated to 3.3.0 3.3.0 Added - Allow to execute the CLI (e.g. normalizer) through `python -m charset_normalizer.cli` or `python -m charset_normalizer` - Support for 9 forgotten encoding that are supported by Python but unlisted in `encoding.aliases` as they have no alias Removed - (internal) Redundant utils.is_ascii function and unused function is_private_use_only - (internal) charset_normalizer.assets is moved inside charset_normalizer.constant Changed - (internal) Unicode code blocks in constants are updated using the latest v15.0.0 definition to improve detection - Optional mypyc compilation upgraded to version 1.5.1 for Python >= 3.7 Fixed - Unable to properly sort CharsetMatch when both chaos/noise and coherence were close due to an unreachable condition in \_\_lt\_\_ @ text @d1 1 a1 1 $NetBSD: distinfo,v 1.15 2023/07/08 04:35:31 adam Exp $ d3 3 a5 3 BLAKE2s (charset-normalizer-3.3.0.tar.gz) = 27875b66e853c874525873a7b0a9c1ce8f49885dd9d5478e06501f46ec26ea44 SHA512 (charset-normalizer-3.3.0.tar.gz) = c12bf31250ad03be6e4e78f056242bf4c61aaf33e73a3f9514ee6288b61aa94ca0d97bb2b237e89ab9139da54169bc6b1b51155903257272f954dfc3da65b25f Size (charset-normalizer-3.3.0.tar.gz) = 103776 bytes @ 1.15 log @py-charset-normalizer: updated to 3.2.0 3.2.0 Changed - Typehint for function `from_path` no longer enforce `PathLike` as its first argument - Minor improvement over the global detection reliability Added - Introduce function `is_binary` that relies on main capabilities, and optimized to detect binaries - Propagate `enable_fallback` argument throughout `from_bytes`, `from_path`, and `from_fp` that allow a deeper control over the detection (default True) - Explicit support for Python 3.12 Fixed - Edge case detection failure where a file would contain 'very-long' camel cased word @ text @d1 1 a1 1 $NetBSD: distinfo,v 1.14 2023/04/24 10:30:04 adam Exp $ d3 3 a5 3 BLAKE2s (charset-normalizer-3.2.0.tar.gz) = 196da0f4c21efcba88e92c65587c03c2ecb9d20c5bb059c3d9892ea2ef23939d SHA512 (charset-normalizer-3.2.0.tar.gz) = 0e3967b489561394ca848c1fe7dfaa72a330a3f645e9386c1d2d2dc8c2e35a34a8186e6f3377eda2aed503a3e7e626fe116d7b34c2f4a3fd8446a4c1a8fb74cc Size (charset-normalizer-3.2.0.tar.gz) = 97063 bytes @ 1.14 log @py-charset-normalizer: updated to 3.1.0 3.1.0 Added - Argument `should_rename_legacy` for legacy function `detect` and disregard any new arguments without errors Removed - Support for Python 3.6 Changed - Optional speedup provided by mypy/c 1.0.1 @ text @d1 1 a1 1 $NetBSD: distinfo,v 1.13 2022/11/18 18:50:29 adam Exp $ d3 3 a5 3 BLAKE2s (charset-normalizer-3.1.0.tar.gz) = ee442ba12c04e1449d4318967f0b7b53c4f6c72642a006df52def9a21b2d15c6 SHA512 (charset-normalizer-3.1.0.tar.gz) = 4a0a728fb0247e438693efb5d4b759548c82c9011850f0ab3d5844973efa0810d49b460b05a39448049c020a8619f28632278f1f540c89e49b656a8e32cdfdc1 Size (charset-normalizer-3.1.0.tar.gz) = 95987 bytes @ 1.13 log @py-charset-normalizer: updated to 3.0.1 3.0.1 (2022-11-18) Fixed Multi-bytes cutter/chunk generator did not always cut correctly Changed Speedup provided by mypy/c 0.990 on Python >= 3.7 3.0.0 (2022-10-20) Added Extend the capability of explain=True when cp_isolation contains at most two entries (min one), will log in details of the Mess-detector results Support for alternative language frequency set in charset_normalizer.assets.FREQUENCIES Add parameter language_threshold in from_bytes, from_path and from_fp to adjust the minimum expected coherence ratio normalizer --version now specify if current version provide extra speedup (meaning mypyc compilation whl) Changed Build with static metadata using 'build' frontend Make the language detection stricter Optional: Module md.py can be compiled using Mypyc to provide an extra speedup up to 4x faster than v2.1 Fixed CLI with opt --normalize fail when using full path for files TooManyAccentuatedPlugin induce false positive on the mess detection when too few alpha character have been fed to it Sphinx warnings when generating the documentation Removed Coherence detector no longer return 'Simple English' instead return 'English' Coherence detector no longer return 'Classical Chinese' instead return 'Chinese' Breaking: Method first() and best() from CharsetMatch UTF-7 will no longer appear as "detected" without a recognized SIG/mark (is unreliable/conflict with ASCII) Breaking: Class aliases CharsetDetector, CharsetDoctor, CharsetNormalizerMatch and CharsetNormalizerMatches Breaking: Top-level function normalize Breaking: Properties chaos_secondary_pass, coherence_non_latin and w_counter from CharsetMatch Support for the backport unicodedata2 @ text @d1 1 a1 1 $NetBSD: distinfo,v 1.12 2022/09/14 11:10:00 adam Exp $ d3 3 a5 3 BLAKE2s (charset-normalizer-3.0.1.tar.gz) = 385192a53e6991a0b0cf6a76223ef657c146941bfb9ec7ada48e55aeb4f6347f SHA512 (charset-normalizer-3.0.1.tar.gz) = 25bfb8d708f2c1827d4f074f1b3c4f9932f7a00b833423f9edd6d5a942af39eeb703dea7471bdf2764094e8d01af7d98017c030f7b7a2a1a24e65c1161aef52f Size (charset-normalizer-3.0.1.tar.gz) = 92842 bytes @ 1.12 log @py-charset-normalizer: updated to 2.1.1 2.1.1 Deprecated - Function `normalize` scheduled for removal in 3.0 Changed - Removed useless call to decode in fn is_unprintable Fixed - Third-party library (i18n xgettext) crashing not recognizing utf_8 (PEP 263) with underscore @ text @d1 1 a1 1 $NetBSD: distinfo,v 1.11 2022/08/05 13:59:38 adam Exp $ d3 3 a5 3 BLAKE2s (charset-normalizer-2.1.1.tar.gz) = bd1f7d198d6717323c1f4e9e497e6e970ee751c4fdabf2165452bbe4b442e277 SHA512 (charset-normalizer-2.1.1.tar.gz) = f52abab683ebda4100d67ec6ee0349713baee453a742d60a1356f405c5ce2c3b4d850b0891527f08f92fa1217d59c46d6b181dc4ff1b962ce60d9c5ef8c913d1 Size (charset-normalizer-2.1.1.tar.gz) = 82360 bytes @ 1.11 log @py-charset-normalizer: updated to 2.1.0 2.1.0 (2022-06-19) Added Output the Unicode table version when running the CLI with --version Changed Re-use decoded buffer for single byte character sets Fixing some performance bottlenecks Fixed Workaround potential bug in cpython with Zero Width No-Break Space located in Arabic Presentation Forms-B, Unicode 1.1 not acknowledged as space CLI default threshold aligned with the API threshold Removed Support for Python 3.5 Deprecated Use of backport unicodedata from unicodedata2 as Python is quickly catching up, scheduled for removal in 3.0 @ text @d1 1 a1 1 $NetBSD: distinfo,v 1.10 2022/02/12 17:53:15 adam Exp $ d3 3 a5 3 BLAKE2s (charset-normalizer-2.1.0.tar.gz) = ce6a9a8dfb5729f1dab6320849cf40376ca0d3c5ff164246cdf82cfb73e556c2 SHA512 (charset-normalizer-2.1.0.tar.gz) = bfeb3fbd82c91382adf7c28f45d690f2504427961ef8ec3f0b66cb66788265a91661988f1abaa9fff0b54dc2d4b2286544f2ca2d30d41167d390fbc059b1a44e Size (charset-normalizer-2.1.0.tar.gz) = 81769 bytes @ 1.10 log @py-charset-normalizer: updated to 2.0.12 2.0.12 Fixed - ASCII miss-detection on rare cases @ text @d1 1 a1 1 $NetBSD: distinfo,v 1.9 2022/01/31 11:04:38 adam Exp $ d3 3 a5 3 BLAKE2s (charset-normalizer-2.0.12.tar.gz) = 8d0705c4942977f068a9ab5f269d3f28099677d14024de06180855c66c390584 SHA512 (charset-normalizer-2.0.12.tar.gz) = 5177c562bf6556719353a116f8b597685ee4845198cdeb49ef57c2da8be6597fbb7e9d5fbbdb420c1b415395096303f828408f01066a909786b97d30b79b5e75 Size (charset-normalizer-2.0.12.tar.gz) = 79105 bytes @ 1.9 log @py-charset-normalizer: updated to 2.0.11 2.0.11: Added - Explicit support for Python 3.11 Changed - The logging behavior have been completely reviewed, now using only TRACE and DEBUG levels @ text @d1 1 a1 1 $NetBSD: distinfo,v 1.8 2022/01/07 16:37:10 adam Exp $ d3 3 a5 3 BLAKE2s (charset-normalizer-2.0.11.tar.gz) = 49fccaf7585b244c5e32392f660929d9ff1013956069a3ac9c9c8790cfeb26e4 SHA512 (charset-normalizer-2.0.11.tar.gz) = 57f2ff5694c960274cc54f8eec2aaa87feb247f1663c1baeb9ec613f164bb49f99c83b430acbb1113bee7eb1a2ba5cb5d95c45a5affc908df6dab505ace88062 Size (charset-normalizer-2.0.11.tar.gz) = 79082 bytes @ 1.8 log @py-charset-normalizer: updated to 2.0.10 2.0.10: Fixed - Fallback match entries might lead to UnicodeDecodeError for large bytes sequence @ text @d1 1 a1 1 $NetBSD: distinfo,v 1.7 2021/12/11 20:47:41 adam Exp $ d3 3 a5 3 BLAKE2s (charset-normalizer-2.0.10.tar.gz) = a7515effc18612183462d14cd367d007823c7f802709ba6610a7055af508e30a SHA512 (charset-normalizer-2.0.10.tar.gz) = ee4934978a7939d7d4575dc0018b4c92f1c43fff45e3976907c21937b0a3db03d2d19d3e13431f59b5dfb86b14fac9665767433577c1bd3d4eca1f3c88d74ab5 Size (charset-normalizer-2.0.10.tar.gz) = 78245 bytes @ 1.7 log @py-charset-normalizer: updated to 2.0.9 2.0.9 Changed - Moderating the logging impact (since 2.0.8) for specific environments Fixed - Wrong logging level applied when setting kwarg `explain` to True @ text @d1 1 a1 1 $NetBSD: distinfo,v 1.6 2021/11/25 08:10:29 adam Exp $ d3 3 a5 3 BLAKE2s (charset-normalizer-2.0.9.tar.gz) = 8eb6e7a7c93b8569666d76296b0483301ead7b546bd5e29b1419ca7b5aedb857 SHA512 (charset-normalizer-2.0.9.tar.gz) = 0dea4ada59c197a1f7090b0c06218774746f77c2007eaff37dae4589cab1382ed670c98327d890347d2b72869b6053eb8c3362587b034921958624b9db9aff95 Size (charset-normalizer-2.0.9.tar.gz) = 75753 bytes @ 1.6 log @py-charset-normalizer: updated to 2.0.8 2.0.8 Changed - Improvement over Vietnamese detection - MD improvement on trailing data and long foreign (non-pure latin) data - Efficiency improvements in cd/alphabet_languages from [@@adbar](https://github.com/adbar) - call sum() without an intermediary list following PEP 289 recommendations from [@@adbar](https://github.com/adbar) - Code style as refactored by Sourcery-AI - Minor adjustment on the MD around european words - Remove and replace SRTs from assets / tests - Initialize the library logger with a `NullHandler` by default from [@@nmaynes](https://github.com/nmaynes) - Setting kwarg `explain` to True will add provisionally (bounded to function lifespan) a specific stream handler @ text @d1 1 a1 1 $NetBSD: distinfo,v 1.5 2021/10/26 10:06:49 nia Exp $ d3 3 a5 3 BLAKE2s (charset-normalizer-2.0.8.tar.gz) = cbe581abb5e6e2239bff54ee0687a4b5e4a7b04f5d12d305fa1e43cfffed558a SHA512 (charset-normalizer-2.0.8.tar.gz) = d06289a07b9091336dc9da47091ff25bdaa8714b498031464c4537f7180b6f66727148012f3b15be9b8224a2768939020ac6e81b88fe0a7c45cdba75792e636e Size (charset-normalizer-2.0.8.tar.gz) = 75598 bytes @ 1.5 log @converters: Replace RMD160 checksums with BLAKE2s checksums All checksums have been double-checked against existing RMD160 and SHA512 hashes @ text @d1 1 a1 1 $NetBSD: distinfo,v 1.4 2021/10/12 09:12:20 adam Exp $ d3 3 a5 3 BLAKE2s (charset-normalizer-2.0.7.tar.gz) = e0ae5075b6af5adf2c7a9470f18c0cf11e338269bbe34e0d5dfc7cee1917ab2b SHA512 (charset-normalizer-2.0.7.tar.gz) = 7096fa23fade52c8eee2577b87aa574deffe6f66e8986f71172f3b9212bd6c6fb17901cceab90144c9d07de8bc6f5e320497daa3d3f749f436788232d4cba088 Size (charset-normalizer-2.0.7.tar.gz) = 362802 bytes @ 1.4 log @py-charset-normalizer: updated to 2.0.7 Version 2.0.7 Changes: Addition: 🍱 Add support for Kazakh (Cyrillic) language detection Improvement: ❇️ Further improve inferring the language from a given code page (single-byte) Removed: 🔥 Remove redundant logging entry about detected language(s) Miscellaneous: 🔧 Trying to leverage PEP263 when PEP3120 is not supported While I do not think that this (116) will actually fix something, it will rather raise a SyntaxError (Not about ASCII decoding error) for those trying to install this package using a non-supported Python version Improvement: ⚡ Refactoring for potential performance improvements in loops Improvement: ✨ Various detection improvement (MD+CD) Bugfix: 🐛 Fix a minor inconsistency between Python 3.5 and other versions regarding language detection @ text @d1 1 a1 1 $NetBSD: distinfo,v 1.3 2021/10/07 13:29:11 nia Exp $ d3 1 a3 1 RMD160 (charset-normalizer-2.0.7.tar.gz) = df764f7a9ee7130fd768ee99bf5879cd6eef9534 @ 1.3 log @converters: Remove SHA1 hashes for distfiles @ text @d1 1 a1 1 $NetBSD: distinfo,v 1.2 2021/09/19 10:39:10 adam Exp $ d3 3 a5 3 RMD160 (charset-normalizer-2.0.6.tar.gz) = b98a1841408753922b62e7448550df2e01b05b50 SHA512 (charset-normalizer-2.0.6.tar.gz) = 78a754e53b493382941ff3fcc286db644d7921bbd233717676269fd992a03ff5a35f121949af339be21be2e933423279e634197e6aa1018d8bd0f9cbac11f6f5 Size (charset-normalizer-2.0.6.tar.gz) = 361879 bytes @ 1.2 log @py-charset-normalizer: updated to 2.0.6 Version 2.0.6 Changes: Bugfix: 🐛 Unforeseen regression with the loss of the backward-compatibility with some older minor of Python 3.5.x Bugfix: 🐛 Fix CLI crash when using --minimal output in certain cases Improvement: ✨ Minor improvement to the detection efficiency (less than 1%) Version 2.0.5 Changes: Internal: 🎨 The project now comply with: flake8, mypy, isort and black to ensure a better overall quality Internal: 🎨 The MANIFEST.in was not exhaustive Improvement: ✨ The BC-support with v1.x was improved, the old staticmethods are restored Remove: 🔥 The project no longer raise warning on tiny content given for detection, will be simply logged as warning instead Improvement: ✨ The Unicode detection is slightly improved Bugfix: 🐛 In some rare case, the chunks extractor could cut in the middle of a multi-byte character and could mislead the mess detection Bugfix: 🐛 Some rare 'space' characters could trip up the UnprintablePlugin/Mess detection Improvement: 🎨 Add syntax sugar __bool__ for results CharsetMatches list-container This release push further the detection coverage to 97 % ! Version 2.0.4 Changes: Improvement: ❇️ Adjust the MD to lower the sensitivity, thus improving the global detection reliability Improvement: ❇️ Allow fallback on specified encoding if any Bugfix: 🐛 The CLI no longer raise an unexpected exception when no encoding has been found Bugfix: 🐛 Fix accessing the 'alphabets' property when the payload contains surrogate characters Bugfix: 🐛 ✏️ The logger could mislead (explain=True) on detected languages and the impact of one MBCS match Bugfix: 🐛 Submatch factoring could be wrong in rare edge cases Bugfix: 🐛 Multiple files given to the CLI were ignored when publishing results to STDOUT. (After the first path) Internal: 🎨 Fix line endings from CRLF to LF for certain files @ text @d1 1 a1 1 $NetBSD: distinfo,v 1.1 2021/07/30 04:14:49 adam Exp $ a2 1 SHA1 (charset-normalizer-2.0.6.tar.gz) = 480d3723b896aaa7ade31b8a601b9ef2399c1295 @ 1.1 log @py-charset-normalizer: added version 2.0.3 A library that helps you read text from an unknown charset encoding. @ text @d1 1 a1 1 $NetBSD: distinfo,v 1.10 2021/01/04 11:53:14 wiz Exp $ d3 4 a6 4 SHA1 (charset-normalizer-2.0.3.tar.gz) = cf4fa3dc318a441e05006cedbf2f0aa869c79ee9 RMD160 (charset-normalizer-2.0.3.tar.gz) = ebc2a59e965da631a9b1ede64e623c7a4990ba33 SHA512 (charset-normalizer-2.0.3.tar.gz) = daf84526f620f5565d3ad9dbbcf1c6e83af47ae1b267a1e3925ce0c79ddff6c3a0f50663dabd7897882500771ec27fbc1935ff0109c8ca97fbc4049ed9e33b6a Size (charset-normalizer-2.0.3.tar.gz) = 345714 bytes @