head 1.8; access; symbols pkgsrc-2023Q4:1.8.0.18 pkgsrc-2023Q4-base:1.8 pkgsrc-2023Q3:1.8.0.16 pkgsrc-2023Q3-base:1.8 pkgsrc-2023Q2:1.8.0.14 pkgsrc-2023Q2-base:1.8 pkgsrc-2023Q1:1.8.0.12 pkgsrc-2023Q1-base:1.8 pkgsrc-2022Q4:1.8.0.10 pkgsrc-2022Q4-base:1.8 pkgsrc-2022Q3:1.8.0.8 pkgsrc-2022Q3-base:1.8 pkgsrc-2022Q2:1.8.0.6 pkgsrc-2022Q2-base:1.8 pkgsrc-2022Q1:1.8.0.4 pkgsrc-2022Q1-base:1.8 pkgsrc-2021Q4:1.8.0.2 pkgsrc-2021Q4-base:1.8 pkgsrc-2021Q3:1.5.0.6 pkgsrc-2021Q3-base:1.5 pkgsrc-2021Q2:1.5.0.4 pkgsrc-2021Q2-base:1.5 pkgsrc-2021Q1:1.5.0.2 pkgsrc-2021Q1-base:1.5 pkgsrc-2020Q4:1.4.0.10 pkgsrc-2020Q4-base:1.4 pkgsrc-2020Q3:1.4.0.8 pkgsrc-2020Q3-base:1.4 pkgsrc-2020Q2:1.4.0.6 pkgsrc-2020Q2-base:1.4 pkgsrc-2020Q1:1.4.0.2 pkgsrc-2020Q1-base:1.4 pkgsrc-2019Q4:1.4.0.4 pkgsrc-2019Q4-base:1.4 pkgsrc-2019Q3:1.3.0.2 pkgsrc-2019Q3-base:1.3 pkgsrc-2019Q2:1.1.0.30 pkgsrc-2019Q2-base:1.1 pkgsrc-2019Q1:1.1.0.28 pkgsrc-2019Q1-base:1.1 pkgsrc-2018Q4:1.1.0.26 pkgsrc-2018Q4-base:1.1 pkgsrc-2018Q3:1.1.0.24 pkgsrc-2018Q3-base:1.1 pkgsrc-2018Q2:1.1.0.22 pkgsrc-2018Q2-base:1.1 pkgsrc-2018Q1:1.1.0.20 pkgsrc-2018Q1-base:1.1 pkgsrc-2017Q4:1.1.0.18 pkgsrc-2017Q4-base:1.1 pkgsrc-2017Q3:1.1.0.16 pkgsrc-2017Q3-base:1.1 pkgsrc-2017Q2:1.1.0.12 pkgsrc-2017Q2-base:1.1 pkgsrc-2017Q1:1.1.0.10 pkgsrc-2017Q1-base:1.1 pkgsrc-2016Q4:1.1.0.8 pkgsrc-2016Q4-base:1.1 pkgsrc-2016Q3:1.1.0.6 pkgsrc-2016Q3-base:1.1 pkgsrc-2016Q2:1.1.0.4 pkgsrc-2016Q2-base:1.1 pkgsrc-2016Q1:1.1.0.2 pkgsrc-2016Q1-base:1.1; locks; strict; comment @# @; 1.8 date 2021.11.18.19.38.01; author adam; state Exp; branches; next 1.7; commitid 38Lk7P1XEF3oqjhD; 1.7 date 2021.10.26.11.23.13; author nia; state Exp; branches; next 1.6; commitid TS3y6sgAeGKWpjeD; 1.6 date 2021.10.07.15.02.19; author nia; state Exp; branches; next 1.5; commitid 0fS32tEWoNe7fTbD; 1.5 date 2021.02.09.10.28.26; author adam; state Exp; branches; next 1.4; commitid c1L1V1D7VTMZr1HC; 1.4 date 2019.10.16.07.24.13; author adam; state Exp; branches; next 1.3; commitid 
84ajI0k2OQIgV3HB; 1.3 date 2019.09.07.07.22.14; author adam; state Exp; branches; next 1.2; commitid 3p1Xu6Dz2YDia3CB; 1.2 date 2019.08.25.12.35.50; author adam; state Exp; branches; next 1.1; commitid DARKkhlESdBZjpAB; 1.1 date 2016.01.17.13.16.49; author wiz; state Exp; branches; next ; commitid zQaydfMipN9LQhRy; desc @@ 1.8 log @py-snowballstemmer: updated to 2.2.0 Snowball 2.2.0 (2021-11-10) =========================== New Code Generators ------------------- * Add Ada generator from Stephane Carrez Javascript ---------- * Fix generated code to use integer division rather than floating point division. Noted by David Corbett. Pascal ------ * Fix code generated for division. Previously real division was used and the generated code would fail to compile with a "Incompatible types" error. Noted by David Corbett. * Fix code generated for Snowball's `minint` and `maxint` constant. Python ------ * Python 2 is no longer actively supported, as proposed on the mailing list: https://lists.tartarus.org/pipermail/snowball-discuss/2021-August/001721.html * Fix code generated for division. Previously the Python code we generated used integer division but rounded negative fractions towards negative infinity rather than zero under Python 2, and under Python 3 used floating point division. Noted by David Corbett. Code Quality Improvements ------------------------- * C#: An `among` without functions is now generated as `static` and groupings are now generated as constant. Code generation improvements ---------------------------- * General: + Constant numeric subexpressions and constant numeric tests are now evaluated at Snowball compile time. Behavioural changes to existing algorithms ------------------------------------------ * german2: Fix handling of `qu` to match algorithm description. Previously the implementation erroneously did `skip 2` after `qu`. 
We suspect this was intended to skip the `qu` but that's already been done by the substring/among matching, so it actually skips an extra two characters. The implementation has always differed in this way, but there's no good reason to skip two extra characters here so overall it seems best to change the code to match the description. This change only affects the stemming of a single word in the sample vocabulary - `quae` which seems to actually be Latin rather than German. Optimisations to existing algorithms ------------------------------------ * arabic: Handle exception cases in the among they're exceptions to. * greek: Remove unused slice setting, handle exception cases in the among they're exceptions to, and turn `substring ... among ... or substring ... among ...` into a single `substring ... among ...` in cases where it is trivial to do so. * hindi: Eliminate the need for variable `p`. * irish: Minor optimisation in setting `pV` and `p1`. * yiddish: Make use of `among` more. Compiler -------- * Fix handling of `len` and `lenof` being declared as names. For compatibility with programs written for older Snowball versions len and lenof stop being tokens if declared as names. However this code didn't work correctly if the tokeniser's name buffer needed to be enlarged to hold the token name (i.e. 3 or 5 elements respectively). * Report a clearer error if `=` is used instead of `==` in an integer test. * Replace a single entry command list with its contents in the internal syntax tree. This puts things in a more canonical form, which helps subsequent optimisations. Build system ------------ * Support building on Microsoft Windows (using mingw+msys or a similar Unix-like environment). * Split out INCLUDES from CPPFLAGS so that CPPFLAGS can now be overridden by the user if required. * Regenerate algorithms.mk only when needed rather than on every `make` run. libstemmer ---------- * The libstemmer static library now has a `.a` extension, rather than `.o`. 
Testsuite --------- * stemtest: Test that numbers and numeric codes aren't damaged by any of the algorithms. * ada: Fix ada tests to fail if output differs. There was an extra `| head -300` compared to other languages, which meant that the exit code of `diff` was ignored. It seems more helpful (and is more consistent) not to limit how many differences are shown so just drop this addition. * go: Stop thinning testdata. It looks like we only are because the test harness code was based on that for rust, which was based on that for javascript, which was only thinning because it was reading everything into memory and the larger vocabulary lists were resulting in out of memory issues. * javascript: Speed up stemwords.js. Process input line-by-line rather than reading the whole file into memory, splitting, iterating, and creating an array with all the output, joining and writing out a single huge string. This also means we can stop thinning the test data for javascript, which we were only doing because the huge arabic test data file was causing out of memory errors. Also drop the -p option, which isn't useful here and complicates the code. * rust: Turn on optimisation in the makefile rather than the CI config. This makes the tests run in about 1/5 of the time and there's really no reason to be thinning the testdata for rust. Documentation ------------- * CONTRIBUTING.rst: Improve documentation for adding a new stemming algorithm. * Improve wording of Python docs. 
@ text @$NetBSD: distinfo,v 1.7 2021/10/26 11:23:13 nia Exp $ BLAKE2s (snowballstemmer-2.2.0.tar.gz) = 7003153e7592ed98d73f2748d7b7103568a53acfc6367ace7568e5103005ac7a SHA512 (snowballstemmer-2.2.0.tar.gz) = f1dee83e06fc79ffb250892fe62c75e3393b9af07fbf7cde413e6391870aa74934302771239dea5c9bc89806684f95059b00c9ffbcf7340375c9dd8f1216cd37 Size (snowballstemmer-2.2.0.tar.gz) = 86699 bytes @ 1.7 log @textproc: Replace RMD160 checksums with BLAKE2s checksums All checksums have been double-checked against existing RMD160 and SHA512 hashes Unfetchable distfiles (fetched conditionally?): ./textproc/convertlit/distinfo clit18src.zip @ text @d1 1 a1 1 $NetBSD: distinfo,v 1.6 2021/10/07 15:02:19 nia Exp $ d3 3 a5 3 BLAKE2s (snowballstemmer-2.1.0.tar.gz) = cc580da7781577e95be41df302c6ba5650f18e0d6d09a527870b1c8c1351aee3 SHA512 (snowballstemmer-2.1.0.tar.gz) = e0550d3389074d7686d26397ff2289519cd8b26cf7090fe781d6407d1c2b95f912347d70cd25e02d6016c454ad6c5cf6d648e54ef87161328ac57bc1ceaf7826 Size (snowballstemmer-2.1.0.tar.gz) = 85674 bytes @ 1.6 log @textproc: Remove SHA1 hashes for distfiles @ text @d1 1 a1 1 $NetBSD: distinfo,v 1.5 2021/02/09 10:28:26 adam Exp $ d3 1 a3 1 RMD160 (snowballstemmer-2.1.0.tar.gz) = 9c4d7c512477590572b77f9b1fcab09941e4c33f @ 1.5 log @py-snowballstemmer: updated to 2.1.0 2.1.0: * Fix snowballstemmer.algorithms() method. * Update code to generate trove language classifiers for PyPI. All the natural languages we previously had stemmers for have now been added to PyPI's list, but Armenian and Yiddish aren't on it. @ text @d1 1 a1 1 $NetBSD: distinfo,v 1.4 2019/10/16 07:24:13 adam Exp $ a2 1 SHA1 (snowballstemmer-2.1.0.tar.gz) = b5e307fc808d032cd553d1d9aa9c99c55abd2288 @ 1.4 log @py-snowballstemmer: updated to 2.0.0 snowballstemmer 2.0.0: * Simplified generated code for ``repeat`` and ``atleast`` commands. * Implemented “go grouping” optimisation. * Removed caching layer. * Enabled building wheels. * Updated package README. 
@ text @d1 1 a1 1 $NetBSD: distinfo,v 1.3 2019/09/07 07:22:14 adam Exp $ d3 4 a6 4 SHA1 (snowballstemmer-2.0.0.tar.gz) = d303e11059ecbb8ad3a6d7b08af5d2f15b83ddec RMD160 (snowballstemmer-2.0.0.tar.gz) = 7bf4fe09842bfd07049336c19c3f56b500d48125 SHA512 (snowballstemmer-2.0.0.tar.gz) = d673205cacc7f6e81eaee23e6c50064af77c3c4464dbdf5dc1c3f5682dec2688fe6e7069b7ed2e59259312ba926d3be84bd846a132b6138e30b4ff2b9a9353e8 Size (snowballstemmer-2.0.0.tar.gz) = 79284 bytes @ 1.3 log @py-snowballstemmer: updated to 1.9.1 snowballstemmer 1.9.1: * Added Hindi stemmer. * Added Basque and Catalan stemmers. * Improved Greek stemmer. * Various Python code improvements. * Fixed AttributeError when clearing cache. * The tarball now includes a COPYING file. @ text @d1 1 a1 1 $NetBSD: distinfo,v 1.2 2019/08/25 12:35:50 adam Exp $ d3 4 a6 4 SHA1 (snowballstemmer-1.9.1.tar.gz) = 079516d1b04c685fa0490871ee03c9a3150e464f RMD160 (snowballstemmer-1.9.1.tar.gz) = 2c422c29ab852cbe33729fb1f27960a0dcd8d130 SHA512 (snowballstemmer-1.9.1.tar.gz) = 014cad553ce5be67e614f51a6023e1673c34ee6cdaad7de872aadad385c3ab5599cc06fa7449819da5f61a91ea45eab8fcf85315a2c2b89979edb70c7d29501b Size (snowballstemmer-1.9.1.tar.gz) = 82045 bytes @ 1.2 log @py-snowballstemmer: updated to 1.9.0 1.9.0: Unknown changes @ text @d1 1 a1 1 $NetBSD: distinfo,v 1.1 2016/01/17 13:16:49 wiz Exp $ d3 4 a6 4 SHA1 (snowballstemmer-1.9.0.tar.gz) = f4d9a9b072cc6cdaa80ecf2f037e8b8ab2b76425 RMD160 (snowballstemmer-1.9.0.tar.gz) = 32470e6eec372f61106a0fd9421836d0bbcbe274 SHA512 (snowballstemmer-1.9.0.tar.gz) = 7af4963ffb88477f64bd477bf4ea958cf9b7613393f01ea873f4a8e1891d1f331c288e59632b271dd5b53b604c8291780990600ccf22aed1d2151809cd8608df Size (snowballstemmer-1.9.0.tar.gz) = 76910 bytes @ 1.1 log @Import py-snowballstemmer-1.2.1 as textproc/py-snowballstemmer. This package provides 16 stemmer algorithms (15 + Porter English stemmer) generated from Snowball algorithms. 
It includes the following language algorithms: Danish Dutch English (Standard, Porter) Finnish French German Hungarian Italian Norwegian Portuguese Romanian Russian Spanish Swedish Turkish This is a pure Python stemming library. @ text @d1 1 a1 1 $NetBSD: distinfo,v 1.2 2015/11/04 02:00:01 agc Exp $ d3 4 a6 4 SHA1 (snowballstemmer-1.2.1.tar.gz) = 377be08ed935d401a53cba79319d1812cfe46b81 RMD160 (snowballstemmer-1.2.1.tar.gz) = 433de19c91b3f0914bb280f37cebc21324da4db5 SHA512 (snowballstemmer-1.2.1.tar.gz) = 09f860f383d84d12a83c87ef6654fba4ac10bca07e8d2ce88dd428c72754110d56a4b698e125a18818699a289455bf61cf67ea68e349ee8a12d6dfff0a3fbed9 Size (snowballstemmer-1.2.1.tar.gz) = 49626 bytes @