head 1.2;
access;
symbols;
locks; strict;
comment @# @;


1.2
date 2026.05.24.17.56.43; author wiz; state dead;
branches;
next 1.1;
commitid DbzjH6TyHJBzT5HG;

1.1
date 2026.05.11.17.39.13; author wiz; state Exp;
branches;
next ;
commitid wbz2PibKu7isdqFG;


desc
@@


1.2
log
@p5-XML-LibXML: update to 2.0213.

2.0213 2026-05-21

[SECURITY / BUG FIXES]
- Revert PR #143 per the libxml2 author's request. PR #143 added a
  URL-scheme filter inside LibXML_load_external_entity and removed the
  EXTERNAL_ENTITY_LOADER_FUNC == NULL guards on the five Schema/RelaxNG
  NONET swap sites, on the premise that no_network on one parser should
  override a user-installed global externalEntityLoader. Nick Wellnhofer
  clarified that this contradicts upstream intent: XML_PARSE_NONET only
  polices libxml2's default loader; a user who installs a global loader
  is explicitly opting out of that policy, and the http/https/ftp
  allowlist was never a real security boundary. Reverted in full;
  PR #138's lifecycle/memory-safety fixes are kept.
  - GH #168

[BUG FIXES]
- Fix latent SEGV in _externalEntityLoader. The XS code returned
  &PL_sv_undef as RETVAL when no previous global loader existed.
  Because xsubpp auto-mortalizes SV* RETVAL, each call mortalized the
  PL_sv_undef singleton, eventually driving its refcount negative and
  producing "Attempt to free unreferenced scalar" followed by SEGV
  under repeated invocation. Now returns newSV(0) so RETVAL is always a
  fresh refcount-1 SV safe to mortalize. The bug shipped in 2.0212 with
  PR #138's lifecycle fixes; this is a single-line correction to that
  code path.

[MAINTENANCE]
- Add t/49global_extent_with_no_network.t, 17 subtests locking in the
  entity-loader contract restored by the GH #168 revert: a
  user-installed global loader takes precedence over no_network across
  plain XML parse, RelaxNG, and XML Schema, while no_network without
  any loader still blocks via libxml2's default loader.
- Document the entity-loader contract in CLAUDE.md ("Entity loaders,
  no_network, and XML_PARSE_NONET") plus a "Verifying audit-flagged
  security findings" checklist to keep pattern-matched "security fixes"
  like PR #143 from shipping again.

2.0212 2026-05-19

[BUG FIXES]
- Ship POD files in the CPAN tarball. The per-class .pod files
  generated from docs/libxml.dbk were gitignored, and nothing in the
  dist chain was producing them, so recent tarballs shipped without
  POD. The .pod files are now tracked in git (bison-style), so
  `make dist` includes them via MANIFEST and the documentation reaches
  CPAN consumers again. Also eliminates the bootstrap problem of
  needing XML::LibXML installed to build XML::LibXML's docs, and
  silences the "kit incomplete" warning from `perl Makefile.PL` on a
  fresh checkout.

[MAINTENANCE]
- Add a `pod-drift` CI job that runs `make pod_docs` and fails on any
  diff, catching forgotten POD regenerations after edits to
  docs/libxml.dbk.
- Move xmllibxmldocs.pl from example/ to scripts/. It is a maintenance
  tool that emits source files (POD), not a usage example of
  XML::LibXML; scripts/ already houses similar build/dev tooling.
- Skip t/release-kwalitee.t outside a dist tarball. The Test::Kwalitee
  `has_meta_yml` check was failing under `make test` in author mode
  because META.yml is only generated by `make dist`. The test now skips
  cleanly when META.yml is absent and still runs the full 18-check
  suite under `make disttest` against the unpacked tarball.

2.0211 2026-05-19

[SECURITY / BUG FIXES]
- Prevent out-of-bounds UTF-8 read in domParseChar by replacing it with
  libxml2's xmlValidateName. Truncated multi-byte sequences could cause
  heap reads past the NUL terminator across five DOM entry points
  (createElement, createAttribute, setNodeName, etc.).
  - GH #146, PR #149
- Enforce no_network even when a global externalEntityLoader is set.
  Previously XML_PARSE_NONET was silently ignored once a global
  callback was installed, enabling SSRF in multi-module applications
  that combine a third-party entity loader with no_network parsers.
  - GH #133, PR #143
- Prevent integer overflow in SAX CBuffer length tracking. Total
  character data exceeding INT_MAX (~2GB) overflowed the accumulator
  causing xmlMalloc to under-allocate and the subsequent memcpy to
  write past the buffer.
  - GH #135, PR #142
- Proper lifecycle management for externalEntityLoader: the global
  loader can now be cleared or replaced safely, the previous handler SV
  is no longer leaked, the returned value is a safe copy rather than
  the internal global SV, and per-parser ext_ent_handler state is
  separated from the global slot.
  - PR #138
- Add NULL checks after xmlMalloc returns in SAX CBuffer operations,
  converting OOM segfaults into catchable Perl exceptions.
  - GH #136, PR #140
- Add NULL check after xmlCopyNamespace in _domReconcileNs, matching
  the existing guard in _domReconcileNsAttr.
  - GH #137, PR #139
- Plug 11 memory leaks across XS/C code, including setBaseURI,
  URI/documentURI accessors, load_catalog, PSaxCharactersFlush,
  createAttributeNS, XPathContext::_find, _newForIO, _toStringC14N,
  lookupNamespacePrefix, _setNamespace, and the generic XPath extension
  function dispatcher.
  - GH #131, PR #132
- Handle Apple's local libxml2 patch where xmlSAX2ResolveEntity throws
  on a NULL URI, so t/13dtd.t no longer dies on macOS.
  - RT #2021, PR #102
- Skip t/50devel.t when mem_used() reports 0 bytes, which happens on
  Apple's libxml2 (system malloc bypasses the tracking wrappers).
  - RT #165193, PR #94

[IMPROVEMENTS]
- Resolve Windows CI test failures and compiler warnings: use the file
  size (-s) for the byteConsumed test instead of a hardcoded 488 (CRLF
  inflates the file to 507 bytes), use Perl UV/PTR2UV in
  PmmRegistryName to avoid pointer truncation under Win64 LLP64, and
  use const xmlError* for xmlCtxtGetLastError to match the libxml2
  2.12+ API.
  - PR #122
- Silence macOS build warnings cleanly by gating the libxml2 memory
  tracking API behind a HAVE_LIBXML_MEMORY_DEBUG feature macro. The
  deprecated calls are no longer compiled on systems where the API is
  gone (Apple SDK, libxml2 >= 2.14), mem_used is only exported when
  actually defined, and t/50devel.t skips with a clear reason. Also
  strip the bogus "-L/lib" entries Alien::Base::Wrapper injects into
  LDFLAGS on macOS.
  - PR #127
- Add a minimal hello-world HTML example (example/hello-world.pl) and
  add createInternalSubset("html", ...) to both HTML examples so they
  emit a proper declaration.
  - GH #66, PR #121
- Standardize XPath parameter naming to $xpath_expression throughout
  the DocBook source, matching the XML::LibXML::XPathExpression class
  name.
  - GH #64, PR #125
- Update outdated and dead references in README.md: point repository
  URLs at the canonical cpan-authors/XML-LibXML home, drop the defunct
  ActiveState mailing list, replace the long Windows nmake recipe with
  a Strawberry Perl note, refresh the macOS section, and bring the
  Package History up to date.
  - GH #129, PR #144
- Remove the stale "Known Issues" note about push-parser leaks. The
  leaks it referenced were fixed by Nick Wellnhofer in 2014.
- Point distribution metadata at the cpan-authors GitHub repo and add
  an explicit bugtracker entry so MetaCPAN's "Issues" link goes to
  GitHub Issues instead of falling back to rt.cpan.org.
- Add NamedNodeMap.pod to MANIFEST so the generated POD ships in the
  CPAN tarball; the L link in Node.pod now resolves on MetaCPAN.
  - GH #115, PR #118
- Update ppport.h and adopt its suggestions to reduce build issues.
- Fix test suite with libxml2 2.13.0 and 2.14.0.
- Remove tests that disable line numbers (always enabled since libxml2
  2.15.0).
- Use `our $VERSION` instead of `use vars`.
- Fix formatting in docs/libxml.dbk.
  - GH #85

[MAINTENANCE]
- Modernize the CI workflow with a dynamic Perl version matrix,
  centralized cpanfile, and updated action versions.
  - PR #108
- Use cpanm instead of cpm for the Linux CI matrix so jobs on Perl
  < 5.24 (down through 5.8) no longer fail to install dependencies.
  - GH #117, PR #119
- Expand CI platform coverage: FreeBSD 14.2, OpenBSD 7.6, NetBSD 10.1,
  Strawberry Perl on Windows, Fedora 43 container, AddressSanitizer,
  Devel::Cover + Codecov coverage upload, and a downstream
  XML::LibXSLT compatibility job.
  - PR #120
- Fix BSD CI: use the correct OpenBSD package name (`libxml`, not
  `libxml2`) and install Perl dependencies explicitly instead of
  relying on META.json autodiscovery.
  - PR #124
- Parallelize `make` compilation across CI jobs with
  platform-appropriate CPU detection.
  - PR #128
- Temporarily disable OpenBSD 7.6 CI due to unreliable runners.
  - PR #130
- Re-enable OpenBSD CI on version 7.8 once the runner situation
  stabilized.
  - PR #144
- Add a CLAUDE.md describing project layout, build/test commands,
  libxml2 version landscape, and coding conventions.
  - PR #116
- Add contributing guidelines covering CI, scope, MANIFEST, and
  version/release handling.
  - PR #126
- Add AI_POLICY.md documenting how AI tools are used (and not used) in
  this project.
- Add MANIFEST.SKIP so local files (.hgignore, .tidyallrc, CLAUDE.md,
  etc.) are kept out of `make manifest` output.
- Drop unused dev helper (`tester.sh`) and the stale TODO file.
- Rename README to README.md and remove the obsolete Travis CI
  references.
@
text
@$NetBSD: patch-t_48__security__oob__utf8__gh146.t,v 1.1 2026/05/11 17:39:13 wiz Exp $

fix: validate UTF-8 continuation bytes in domParseChar
https://github.com/cpan-authors/XML-LibXML/pull/149

--- t/48_security_oob_utf8_gh146.t.orig 2026-05-11 17:36:06.144804837 +0000
+++ t/48_security_oob_utf8_gh146.t
@@@@ -0,0 +1,110 @@@@
+# Security regression test for GitHub issue #146:
+# Out-of-bounds heap read in domParseChar on truncated UTF-8 sequences.
+#
+# domParseChar() read continuation bytes for multi-byte UTF-8 sequences
+# without verifying they exist or are valid. A truncated sequence (e.g.,
+# "a\xF0") caused reads past the NUL terminator into uninitialized heap
+# memory. This affects all DOM methods that validate node names via
+# LibXML_test_node_name(): createElement, createAttribute, setNodeName,
+# createElementNS, createAttributeNS, etc.
+#
+# Impact: denial of service (crash on unmapped memory) and potential
+# information disclosure (reading adjacent heap allocations).
+#
+# Before the fix, these inputs triggered undefined behavior — the
+# function read continuation bytes blindly, producing a garbage
+# codepoint and advancing the pointer past the buffer into heap memory.
+# After the fix, domParseChar rejects invalid/truncated sequences by
+# returning 0 with *len = 1, and the caller rejects the name.
+
+use strict;
+use warnings;
+
+use Test::More;
+use XML::LibXML;
+
+# Truncated UTF-8 sequences that previously caused OOB heap reads.
+# Each entry: [ bytes, description ]
+#
+# The leading "a" is a valid ASCII char so domParseChar succeeds on the
+# first character, then LibXML_test_node_name loops and hits the
+# truncated sequence on the second call — this is what triggered the
+# OOB read: len was set to 2/3/4 but the actual bytes weren't there.
+my @@truncated_sequences = (
+    [ "a\xC0", "truncated 2-byte (leader only)" ],
+    [ "a\xC2", "truncated 2-byte (valid leader, missing continuation)" ],
+    [ "a\xE0", "truncated 3-byte (leader only)" ],
+    [ "a\xE0\x80", "truncated 3-byte (leader + 1 continuation)" ],
+    [ "a\xF0", "truncated 4-byte (leader only)" ],
+    [ "a\xF0\x80", "truncated 4-byte (leader + 1 continuation)" ],
+    [ "a\xF0\x80\x80", "truncated 4-byte (leader + 2 continuations)" ],
+);
+
+# Invalid continuation bytes — the leader is valid but the continuations
+# are not 10xxxxxx. Before the fix, these were read without validation,
+# producing a garbage codepoint and advancing the pointer incorrectly.
+my @@invalid_continuations = (
+    [ "a\xC2\x41", "2-byte with ASCII continuation" ],
+    [ "a\xE0\x41\x80", "3-byte with ASCII in first continuation" ],
+    [ "a\xE0\x80\x41", "3-byte with ASCII in second continuation" ],
+    [ "a\xF0\x41\x80\x80", "4-byte with ASCII in first continuation" ],
+    [ "a\xF0\x80\x41\x80", "4-byte with ASCII in second continuation" ],
+    [ "a\xF0\x80\x80\x41", "4-byte with ASCII in third continuation" ],
+);
+
+my @@all_bad = (@@truncated_sequences, @@invalid_continuations);
+
+# Methods that croak on invalid names
+# TEST:$bad_count=13
+# TEST:$croak_methods=3
+my @@croak_methods = qw( createElement setNodeName createElementNS );
+
+# Methods that return undef on invalid names (no exception)
+# TEST:$undef_methods=2
+my @@undef_methods = qw( createAttribute createAttributeNS );
+
+plan tests => scalar(@@all_bad) * (scalar(@@croak_methods) + scalar(@@undef_methods));
+
+my $doc = XML::LibXML::Document->new();
+my $nsURI = "http://example.com/ns";
+
+for my $case (@@all_bad) {
+    my ($bytes, $desc) = @@$case;
+
+    # Methods that die on bad names
+    for my $method (@@croak_methods) {
+        my $died = 0;
+        eval {
+            if ($method eq 'createElement') {
+                $doc->createElement($bytes);
+            }
+            elsif ($method eq 'setNodeName') {
+                my $node = $doc->createElement("tmp");
+                $node->setNodeName($bytes);
+            }
+            elsif ($method eq 'createElementNS') {
+                $doc->createElementNS($nsURI, $bytes);
+            }
+        };
+        $died = 1 if $@@;
+
+        # TEST*$bad_count*$croak_methods
+        ok($died, "$method dies on $desc");
+    }
+
+    # Methods that return undef on bad names
+    for my $method (@@undef_methods) {
+        my $result;
+        eval {
+            if ($method eq 'createAttribute') {
+                $result = $doc->createAttribute($bytes, "value");
+            }
+            elsif ($method eq 'createAttributeNS') {
+                $result = $doc->createAttributeNS($nsURI, $bytes, "value");
+            }
+        };
+
+        # TEST*$bad_count*$undef_methods
+        ok(!defined $result, "$method returns undef on $desc");
+    }
+}
@


1.1
log
@p5-XML-LibXML: add another upstream pull request with a
possible security fix

Bump PKGREVISION.
@
text
@d1 1
a1 1
$NetBSD$
@