head 1.7; access; symbols pkgsrc-2026Q1:1.5.0.2 pkgsrc-2026Q1-base:1.5 pkgsrc-2025Q4:1.3.0.2 pkgsrc-2025Q4-base:1.3 pkgsrc-2025Q3:1.2.0.2 pkgsrc-2025Q3-base:1.2; locks; strict; comment @# @; 1.7 date 2026.06.05.10.16.31; author pin; state Exp; branches; next 1.6; commitid NHLc3PMhHXeNXAIG; 1.6 date 2026.04.09.18.12.26; author pin; state Exp; branches; next 1.5; commitid 38pkdPO8pttFqjBG; 1.5 date 2026.03.08.13.28.11; author pin; state Exp; branches; next 1.4; commitid zTob0LhRMQNVSaxG; 1.4 date 2026.02.17.13.55.58; author pin; state Exp; branches; next 1.3; commitid 54K8PC7HVDOjEJuG; 1.3 date 2025.11.18.13.41.20; author pin; state Exp; branches; next 1.2; commitid xY5lO98qKM1Fu2jG; 1.2 date 2025.07.31.11.49.48; author pin; state Exp; branches; next 1.1; commitid 9V8TJEtQtxTBpT4G; 1.1 date 2025.07.26.08.58.27; author pin; state Exp; branches; next ; commitid NryrI6RLNRcNCe4G; desc @@ 1.7 log @textproc/xan: update to 0.58.0 Breaking No longer serializing moonblade lists, either joined by some separator or as JSON. This was awkward, error-prone & potentially lossy. Use the join function manually to format output when required. As per the previous point, dropping xan scrape --sep. Dropping implicit unary function calls in moonblade pipelines. This feature was not well-known, confusing (an identifier could be understood as a call in a pipeline, but only if not in first position...), and mostly useless now that moonblade has a proper dot operator. xan plot -A/--aggregate does not take an expression anymore but automatically selects one of two modes: sum and mean. It should also be faster. Renaming the index function to row_index for clarity. xan agg -C/--along-columns & -M/--along-matrix & xan groupby -C/--along-columns & -M/--along-matrix will no longer map the current column index to the result of the index() function. The col_index() function can now be used instead for this very purpose. xan window -g/--groupby does not require the file to be sorted anymore. 
This means using -g/--groupby will now require the whole file to be buffered into memory by the command. The old behavior can still be used through the -S/--sorted flag, thus aligning the xan window command with the rest of the tool. row_index will now error if the expression has no concept of row index, instead of returning nothing. xan parallel -z/--compress now takes the desired compression (either gzip or zstd). Retiring the xan grep command in favor of xan search -Z/--fast-parser. xan tokenize --keep short flag becomes -k instead of -K to harmonize with other commands. Retiring the xan flatmap command in favor of xan explode -e. Retiring the xan fuzzy-join command in favor of a consolidated xan join command. Changing xan from -f txt -c default to line instead of value. Renaming xan join -L/--prefix-left & -R/--prefix-right short flags to -l & -r respectively to avoid colliding with the added -R/--reverse flag that can be used for merge joins. Dropping xan plot -B/--bars. It never worked very well and its use case will be redirected to xan spark. Changing xan heatmap --width short flag from -w to -W so that adding a -H/--height flag remains consistent and avoids clashing with -h/--help. Dropping xan heatmap --show-gradients in favor of xan help gradients. Renaming xan search -A/--all flag to --every-column for clarity and to avoid clashing with -A/--after-context. Dropping xan sort -U/--unstable. It was never used and the performance boost it supposedly provides cannot be observed. Features Adding xan parallel --dont-chunk. Adding nullary col, col_index & header variants, to work with expressions applied in series to multiple columns at once. Adding prev_col & next_col functions. Adding xan (search|filter) -B/--before-context & -A/--after-context. Adding xan window -O/--overwrite. Adding xan map -C/--along-columns. Adding xan window -C/--along-columns. Adding xan cat rows --raw, -P/--preprocess & -H/--shell-preprocess. Improving xan select DSL star selectors. 
You can now do stuff like vec_*_count, *[1], vec_*[1] etc. xan p -H/--shell-preprocess now works on Windows. Adding native zsh completions (@@apcamargo). Adding xan dedup --u32. Adding xan explode -e/--evaluate, -f/--evaluate-file, --pad & -k/--keep. xan to npy is now able to stream. Adding xan parallel top & xan top -p/--parallel, -t/--threads. Adding xan network edgelist --range. Adding xan network nodelist. Adding the xan run command. Adding xan view --name. Adding xan join -S/--sorted, -R/--reverse & -N/--numeric. Adding xan parallel --run & xan cat rows --run. Adding xan to md -l/--limit. Adding the xan spark command. Adding xan stats -R/--report, --color, --cols, --sep. Adding xan (freq|p freq) -X/--approx-algo. Adding xan plot -D/--density-gradient, --density-scale, --hide-legend, --hide-x-axis, --hide-y-axis, --hide-all & -Q/--square. xan separate will now avoid emitting columns with an empty name given to --into. Adding xan separate --txt & -F/--filter. Adding pow & sqrt scales. Fixes Fixing issues related to nested lambdas in expressions. Fixing xan rename consistency regarding CRLF newlines and first row normalization when using -n/--no-headers. Fixing xan map --overwrite --filter. Fixing lead window function when there are not enough rows ahead. Fixing xan network --format not being validated early enough. Fixing xan explode -D/--drop-empty when selecting multiple columns. Fixing xan merge -u row precedence. Fixing xan join -D/--drop-key automatic selection when using --full. Fixing granularity inference of xan plot -T. Fixing xan from -f (json|ndjson) to emit empty outputs from empty inputs. Fixing xan headers layout when input files have a very large number of columns (>= 1000). Fixing arity validation of top, argtop, most_common & most_common_counts aggregation functions. Performance moonblade expressions are now faster overall and allocate more cautiously, thus saving memory. 
Improving performance of xan transform, xan flatmap, xan agg & xan groupby. Improving performance of xan rename. Faster xan range. Faster xan parallel -H/--shell-preprocess. Faster xan tokenize words. Adding a fast path for xan explode when only a single column is selected. Faster xan sort -e. Quality of Life xan plot will now display labels in legends. xan cat rows will now error when inputs have inconsistent columns. Automatic column alignment with xan to md. xan from now considers .log files as text lines. @ text @@@comment $NetBSD$ bin/xan share/doc/xan/LOVE_LETTER.md share/doc/xan/NOTES.md share/doc/xan/PIPELINES.md share/doc/xan/README.md share/doc/xan/XANZINE.md share/doc/xan/blog/csv_base_jumping.md share/doc/xan/blog/img/csv-base-jumping.png share/doc/xan/cmd/agg.md share/doc/xan/cmd/behead.md share/doc/xan/cmd/bins.md share/doc/xan/cmd/bisect.md share/doc/xan/cmd/blank.md share/doc/xan/cmd/cat.md share/doc/xan/cmd/complete.md share/doc/xan/cmd/count.md share/doc/xan/cmd/dedup.md share/doc/xan/cmd/drop.md share/doc/xan/cmd/enum.md share/doc/xan/cmd/eval.md share/doc/xan/cmd/explode.md share/doc/xan/cmd/fill.md share/doc/xan/cmd/filter.md share/doc/xan/cmd/fixlengths.md share/doc/xan/cmd/flatten.md share/doc/xan/cmd/fmt.md share/doc/xan/cmd/frequency.md share/doc/xan/cmd/from.md share/doc/xan/cmd/groupby.md share/doc/xan/cmd/head.md share/doc/xan/cmd/headers.md share/doc/xan/cmd/heatmap.md share/doc/xan/cmd/help.md share/doc/xan/cmd/hist.md share/doc/xan/cmd/implode.md share/doc/xan/cmd/input.md share/doc/xan/cmd/join.md share/doc/xan/cmd/map.md share/doc/xan/cmd/matrix.md share/doc/xan/cmd/merge.md share/doc/xan/cmd/network.md share/doc/xan/cmd/parallel.md share/doc/xan/cmd/partition.md share/doc/xan/cmd/pivot.md share/doc/xan/cmd/plot.md share/doc/xan/cmd/progress.md share/doc/xan/cmd/range.md share/doc/xan/cmd/rename.md share/doc/xan/cmd/reverse.md share/doc/xan/cmd/run.md share/doc/xan/cmd/sample.md share/doc/xan/cmd/scrape.md share/doc/xan/cmd/search.md 
share/doc/xan/cmd/select.md share/doc/xan/cmd/separate.md share/doc/xan/cmd/shuffle.md share/doc/xan/cmd/slice.md share/doc/xan/cmd/sort.md share/doc/xan/cmd/spark.md share/doc/xan/cmd/split.md share/doc/xan/cmd/stats.md share/doc/xan/cmd/tail.md share/doc/xan/cmd/to.md share/doc/xan/cmd/tokenize.md share/doc/xan/cmd/top.md share/doc/xan/cmd/transform.md share/doc/xan/cmd/transpose.md share/doc/xan/cmd/unpivot.md share/doc/xan/cmd/view.md share/doc/xan/cmd/vocab.md share/doc/xan/cmd/window.md share/doc/xan/cookbook/dates.md share/doc/xan/cookbook/dedup.md share/doc/xan/cookbook/frequency_tables.md share/doc/xan/cookbook/urls.md share/doc/xan/design/dates_overhaul.md share/doc/xan/gazettes/1_2023_sep.md share/doc/xan/gazettes/2_2023_oct.md share/doc/xan/gazettes/3_2023_nov.md share/doc/xan/gazettes/4_2024_feb.md share/doc/xan/gazettes/img/clown-hist.png share/doc/xan/gazettes/img/flatten.png share/doc/xan/gazettes/img/hist.png share/doc/xan/gazettes/img/sleeker-flatten.png share/doc/xan/gazettes/img/sleeker-view.png share/doc/xan/gazettes/img/view.png share/doc/xan/img/flatten.png share/doc/xan/img/grid/categ-hist.png share/doc/xan/img/grid/corr-heatmap.png share/doc/xan/img/grid/correlation.png share/doc/xan/img/grid/flatten.png share/doc/xan/img/grid/heatmap.png share/doc/xan/img/grid/hist.png share/doc/xan/img/grid/parallel.png share/doc/xan/img/grid/scatter.png share/doc/xan/img/grid/series.png share/doc/xan/img/grid/small-multiples.png share/doc/xan/img/grid/view-grid.png share/doc/xan/img/grid/view.png share/doc/xan/img/hist.png share/doc/xan/img/line.png share/doc/xan/img/pipelines/separate-log1.png share/doc/xan/img/pipelines/separate-log2.png share/doc/xan/img/pipelines/twitter-heatmap.png share/doc/xan/img/progress.gif share/doc/xan/img/scatter.png share/doc/xan/img/view.png share/doc/xan/memes/formats.jpeg share/doc/xan/memes/lixanalgaib.jpg share/doc/xan/moonblade/aggs.md share/doc/xan/moonblade/cheatsheet.md share/doc/xan/moonblade/functions.md 
share/doc/xan/moonblade/scraping.md share/doc/xan/moonblade/window.md share/doc/xan/scrapers/echojs.css share/doc/xan/scrapers/hacker-news.css share/doc/xan/xanzines/5_2024_mar.md share/doc/xan/xanzines/6_2024_may.md share/doc/xan/xanzines/7_2024_sep.md share/doc/xan/xanzines/8_2025_feb.md share/doc/xan/xanzines/img/flatten-sentences.png share/doc/xan/xanzines/img/line.png share/doc/xan/xanzines/img/plural-flatten.png share/doc/xan/xanzines/img/progress.gif share/doc/xan/xanzines/img/scatter.png @ 1.6 log @textproc/xan: update to 0.57.0 The temporal update. Breaking xan select -n will not error anymore on empty inputs and, generally, empty files should not trigger selection errors when using commands with -n/--no-headers. xan heatmap -C/--cram becomes a flag accepting either auto, always or never. Dropping the -C short flag for xan sort --cells (it could be confused with --columns or --check). Completely overhauling how datetimes work in moonblade. xan separate will not trim split values with some modes by default anymore. Dropping xan network --stats in favor of -f stats. -D becomes the short flag for xan network --degrees instead of --disjoint-keys. xan separate --capture-groups is dropped in favor of -c/--captures & -C/--all-captures. Renaming xan search --breakdown short flag to -b to allow for future -B/--before-context. Features Adding xan matrix count & xan matrix adj. Adding front_coding window function. Timestamp support with xan plot -LT. Adding xan rename -n/--no-headers support for -p/--prefix & -x/--suffix. Adding xan from -f parquet (requires the parquet feature). Adding xan to latex. Adding xan top -L/--lexicographic. Adding xan heatmap flags: -w/--width, -F/--fill, -a/--align, -U/--unit, -Z/--show-normalized, -A/--ascii, -l/--label & -v/--values. Adding new gradients to xan heatmap. Adding range & repeat moonblade functions. Adding xan sort --columns. Adding xan view -T/--tee. 
Adding now, fractional_days, to_timezone, to_local_timezone, with_timezone, with_local_timezone, without_timezone, to_timestamp, to_timestamp_ms, from_timestamp, from_timestamp_ms, span, date & time moonblade functions. Better type inference with xan stats, and the type & types aggregation functions, now including more types for temporal values (zoned_datetime, datetime, date & time). Adding xan input -T/--tolerant. Adding xan separate --trim. Adding xan grep -B/--before-context & -A/--after-context. Adding xan network -f=components, -S/--simple, --union-find, --minify & --sample-size. Adding xan plot --timezone. Adding the xan hist --log shorthand flag for --scale=log. Adding log_dist sparkline column to xan stats -q output. Adding dist & log_dist aggregation functions. Adding xan search -L/--levenshtein & -D/--damerau-levenshtein. Fixes Fixing xan separate automatic column prefix extraction. Fixing xan heatmap -n. Fixing xan heatmap --repeat-headers --cram always not repeating x-axis legend. Fixing correctness of xan plot -T and increasing resolution to microseconds. Fixing moonblade column-related functions returning incorrect results wrt -n/--no-headers. xan search should now properly error when handling invalid UTF-8 in relevant modes. Fixing xan search -iR & xan search -i --replacement-column. Performance Improving performance of xan complete, xan top, xan plot -T & xan hist. Improving overall performance of xan network. Slightly optimizing xan vocab by avoiding needless heap allocation & indirection. Improving performance and memory usage of xan separate. Quality of Life Adding proper help to xan heatmap. @ text @a25 1 share/doc/xan/cmd/flatmap.md a29 2 share/doc/xan/cmd/fuzzy-join.md share/doc/xan/cmd/grep.md d51 1 d60 1 @ 1.5 log @textproc/xan: update to 0.56.0 Features Adding xan bisect. Adding xan flatten -N/--non-empty. Adding the soundex, refined_soundex & phonogram moonblade functions for phonetic encoding. Fixes Fixing xan to (md|html) --no-headers. 
Fixing xan plot -R/--regression-line. Quality of Life Adding xan to markdown as an alias for xan to md. xan flatten & xan view will stop masquerading trimmed empty cells as empty. @ text @d78 1 @ 1.4 log @textproc/xan: update to 0.55.0 Breaking Changing how xan separate generates default column names. xan from -f=(json|ndjson|jsonl) will now emit columns in input order by default. Changing xan to -B/--buffer-size to --sample-size to harmonize flag names with xan from. Features Adding the xan complete command. Adding an optional unit to the ceil, floor, round & trunc moonblade functions, e.g. floor to nearest decade: floor(year, 10). Adding basename & dirname moonblade functions. Adding the parse_py_literal moonblade function, useful to deal with files dubiously serialized using pandas. Adding xan view --repeat-headers=(auto|always|never). Adding xan view --reveal-whitespace=(auto|always|never). Adding --color support to XAN_VIEW_ARGS. Adding xan from -f json --sample-size -1 to sample the whole file. Adding xan from -f json --single-object. Adding xan from --sort-keys. Adding xan to (json|ndjson|jsonl) --sample-size -1 to sample the whole file. Adding xan to (json|ndjson|jsonl) --strings flag. Adding xan separate --prefix. Adding xan heatmap -C short flag for --cram. Adding xan heatmap --repeat-headers. Adding rank, cume_dist, percent_rank and ntile window functions. Adding xan help --color. Fixes Fixing xan select -ne incorrectly emitting headers. Quality of Life xan view -p will no longer print the bottom header by default. By default, xan view will no longer reveal problematic whitespace if the output is not colored. Better xan hist error messages and help. Testing more file name variants when searching for a .gzi index. @ text @d5 1 d9 1 d13 1 a15 1 share/doc/xan/cmd/cluster.md a76 1 share/doc/xan/cookbook/misc.md d103 3 @ 1.3 log @textproc/xan: update to 0.54.0 Breaking Bumping MSRV to 1.83.0. Dropping xan plot -Y/--add-series. 
It is now possible to select multiple columns as in xan plot instead. Dropping the -C/--force-colors flag in flatten, heatmap, hist, plot and view in favor of the more standardized and flexible --color=(auto|never|always) flag. xan join will now automatically drop joined columns from one of the files when it is obviously safe to do so. xan behead & xan rename do not normalize the output anymore, to be as fast as possible. The new SIMD CSV parser might not deal with irregular CSV cases the same way rust-csv did. In any case, xan input will still continue to use rust-csv. xan slice -B/--byte-offset & xan slice -A/--accumulate are now mutually exclusive. xan input has been overhauled. Dropping xan count --sample-size. Overhauling xan fixlengths to accept streams by shifting the default from a double-pass read to buffering the whole stream into memory. xan plot --x-scale log & --y-scale log now use the natural log. Use log10 for the base-10 log as before. Dropping xan reverse -m/--in-memory flag. Behavior is now automatically detected. Dropping xan shuffle -m/--in-memory flag. Loading the file into memory is now the default. The xan shuffle -e/--external flag has been added if you want the old default behavior. xan bins now outputs values instead of . Overhauling xan bins. The default is now to find nice boundaries for the bins. Use -e/--exact to revert to the old behavior. The default number of bins is now 10, and the Freedman-Diaconis rule is no longer used by default. A -H/--heuristic flag has been added if you want to automatically select a suitable number of bins. Features Adding xan flatten -F/--flatter. xan pivot can now target multiple columns. Adding the xan grep command for fast but coarse filtering. Adding xan search -f/--flag. Adding xan map -F/--filter. xan search -B/--breakdown now consolidates the results when multiple patterns have the same name. Adding xan flatten --row-separator. Adding xan flatten --csv. Adding xan headers --color. 
Adding the xan join arity as a convenience when joined column names are the same in both inputs. Adding xan join -D/--drop-key=(none|both|left|right). Adding xan fuzzy-join -D/--drop-key=(none|both|left|right). Adding xan plot -A/--aggregate. Adding support for plural selection clauses in both xan select -e & xan map, e.g. xan map 'full_name.split(" ") as (first_name, last_name)'. Adding xan search -P/--add-pattern. Adding xan groupby -M/--along-matrix. Adding xan groupby -T/--total. Adding support for .ndjson & .jsonl files. They are treated as headless TSV files with null byte quoting so you can easily use them with xan commands. Adding out-of-the-box support for .vcf, .sam, .bed, .gtf & .gff2 files. Adding a xan cat cols alias to xan cat columns. Adding zstd support. Adding earliest & latest moonblade functions. Adding xan dedup -f/--flag. Adding -k short flag for xan dedup --keep-duplicates, and -C short flag for xan dedup --choose. Adding xan fixlengths -H/--trust-header. Adding xan separate. Adding full log scale support to xan plot. Adding xan hist --scale. xan window is now able to run total aggregations. Adding thousands_sep, comma and significance kwargs to the numfmt moonblade function. Fixes Fixing xan dedup --check bug where the first record was ignored. Fixing xan hist -D when the same date is found multiple times. Fixing xan from -f xls datetime conversion. Fixing xan flatten & xan view when column names contain line breaks. Fixing invalid argument parsing errors being printed to stdout instead of stderr. Fixing xan progress SIGINT corrupting output. Fixing xan enum -A/--accumulate. Fixing xan from -f tar when the tarball archive is not gzipped. Fixing min & max moonblade functions when passing a list of numbers. Fixing xan flatten -H edge cases. Fixing commands requiring seekable streams accepting unindexed compressed files by mistake. Fixing xan plot --count --y-scale log. 
Performance Wildly improving performance of most xan commands by leveraging a novel SIMD CSV parser/writer. Improving performance of xan from -f txt & xan from -f npy. Improving memory footprint of hash-based commands (e.g. frequency, groupby, dedup etc.). Improving performance of xan progress, xan range, xan enum, xan behead, xan rename. Quality of Life xan parallel cat now flushes more consistently. Better highlighting of problematic strings in xan flatten, xan view & xan headers. xan parallel will now generally stop as soon as an error is detected in a subprocess and cleanly report errors. Better argv parsing error UX in general. The -p flag will now cap the number of threads at 16 to avoid issues on servers with many CPUs, where hogging the resources is an issue and where using too many threads at once could hurt performance. The -t flag remains available to tweak the number of threads. xan hist will now dim bars having a 0 count so you can easily distinguish them from non-empty bars. @ text @d7 1 d14 1 @ 1.2 log @textproc/xan: update to 0.52.0 Breaking xan search --count will not emit rows with 0 matches anymore unless --left is used. Features xan transform is now able to work on a selection of columns, rather than on a single column. Adding the xan unpivot command. Adding the xan pivot command. Adding xan join --semi & xan join --anti commands. Adding xan slice --raw. Adding a default expression argument to lead & lag window functions. Adding shlex_split, cmd and shell moonblade functions. Adding aarch64-apple-darwin and aarch64-unknown-linux-gnu to CI builds. Adding to_fixed moonblade function. Adding an optional decimal places argument to ratio & percentage aggregation functions. Adding frac & dense_rank aggregation functions to xan window. Fixes Loosening xan partition sanitizer to allow hyphens, dashes and points. Fixing xan parallel --progress display. Fixing logic error in xan search -B when used without --left. 
Fixing xan parallel cat when working on file chunks with -P or -H. Fixing moonblade list/string slicing with some combinations of negative indices. Fixing moonblade split function not using regex patterns properly. Fixing moonblade parsing wrt regex patterns and comments (using a regex pattern containing # was not possible). Fixing lead window aggregation function when working on any column that is not the first one. Fixing xan view -S/--significance being overzealous, especially wrt integers. Performance Improving performance of xan parallel when working on file chunks. Quality of Life xan headers now reports more useful information when files have diverging headers. Better error messages for read_json and parse_json moonblade functions. xan view -p will not engage the pager when the input errored or is empty. xan select -e & -f become boolean flags instead of error-inducing invocation variants. @ text @d28 1 d54 1 @ 1.1 log @textproc/xan: import package Packaged in wip by wiz. `xan` is a command line tool that can be used to process CSV files directly from the shell. It has been written in Rust to be as fast as possible and use as little memory as possible, and it can easily handle very large CSV files (gigabytes). It is also able to leverage parallelism (through multithreading) to make some tasks complete as fast as your computer can allow. It can easily preview, filter, slice, aggregate, sort and join CSV files, and exposes a large collection of composable commands that can be chained together to perform a wide variety of typical tasks. `xan` also leverages its own expression language so you can perform complex tasks that cannot be done by relying on the simplest commands. This minimalistic language has been tailored for CSV data and is faster than evaluating typical dynamically typed languages such as Python, Lua, JavaScript etc. @ text @d43 1 d64 1 @