head 1.16; access; symbols pkgsrc-2023Q4:1.15.0.2 pkgsrc-2023Q4-base:1.15 pkgsrc-2023Q3:1.14.0.4 pkgsrc-2023Q3-base:1.14 pkgsrc-2023Q2:1.14.0.2 pkgsrc-2023Q2-base:1.14 pkgsrc-2023Q1:1.12.0.12 pkgsrc-2023Q1-base:1.12 pkgsrc-2022Q4:1.12.0.10 pkgsrc-2022Q4-base:1.12 pkgsrc-2022Q3:1.12.0.8 pkgsrc-2022Q3-base:1.12 pkgsrc-2022Q2:1.12.0.6 pkgsrc-2022Q2-base:1.12 pkgsrc-2022Q1:1.12.0.4 pkgsrc-2022Q1-base:1.12 pkgsrc-2021Q4:1.12.0.2 pkgsrc-2021Q4-base:1.12 pkgsrc-2021Q3:1.10.0.6 pkgsrc-2021Q3-base:1.10 pkgsrc-2021Q2:1.10.0.4 pkgsrc-2021Q2-base:1.10 pkgsrc-2021Q1:1.10.0.2 pkgsrc-2021Q1-base:1.10 pkgsrc-2020Q4:1.9.0.8 pkgsrc-2020Q4-base:1.9 pkgsrc-2020Q3:1.9.0.6 pkgsrc-2020Q3-base:1.9 pkgsrc-2020Q2:1.9.0.4 pkgsrc-2020Q2-base:1.9 pkgsrc-2020Q1:1.9.0.2 pkgsrc-2020Q1-base:1.9 pkgsrc-2019Q4:1.8.0.6 pkgsrc-2019Q4-base:1.8 pkgsrc-2019Q3:1.8.0.2 pkgsrc-2019Q3-base:1.8 pkgsrc-2019Q2:1.7.0.4 pkgsrc-2019Q2-base:1.7 pkgsrc-2019Q1:1.7.0.2 pkgsrc-2019Q1-base:1.7 pkgsrc-2018Q4:1.5.0.4 pkgsrc-2018Q4-base:1.5 pkgsrc-2018Q3:1.5.0.2 pkgsrc-2018Q3-base:1.5 pkgsrc-2018Q2:1.4.0.4 pkgsrc-2018Q2-base:1.4 pkgsrc-2018Q1:1.4.0.2 pkgsrc-2018Q1-base:1.4 pkgsrc-2017Q4:1.3.0.8 pkgsrc-2017Q4-base:1.3 pkgsrc-2017Q3:1.3.0.6 pkgsrc-2017Q3-base:1.3 pkgsrc-2017Q2:1.3.0.2 pkgsrc-2017Q2-base:1.3 pkgsrc-2017Q1:1.2.0.2 pkgsrc-2017Q1-base:1.2; locks; strict; comment @# @; 1.16 date 2024.02.16.19.02.45; author adam; state Exp; branches; next 1.15; commitid i6P7L6ng8lxhSGYE; 1.15 date 2023.10.10.17.18.23; author triaxx; state Exp; branches; next 1.14; commitid tLzh8iEqeU6qr6IE; 1.14 date 2023.05.10.12.40.44; author adam; state Exp; branches; next 1.13; commitid aWxKdAWGeJMePpoE; 1.13 date 2023.04.27.09.33.44; author adam; state Exp; branches; next 1.12; commitid tlgNed5lNpikcJmE; 1.12 date 2021.10.26.11.30.48; author nia; state Exp; branches; next 1.11; commitid Gv0TNLbuylhFsjeD; 1.11 date 2021.10.07.15.08.31; author nia; state Exp; branches; next 1.10; commitid kEwAbZZbki9jhTbD; 1.10 date 2021.03.22.08.56.56; author triaxx; state Exp; branches; next 1.9; commitid Do2yerH4BnWuChMC; 1.9 date 2020.01.29.22.06.30; author adam; state Exp; branches; next 1.8; commitid 6gSvVBpWVN1PqDUB; 1.8 date 2019.08.22.08.21.11; author adam; state Exp; branches; next 1.7; commitid 4ykVR3aKtVPZZZzB; 1.7 date 2019.01.31.09.07.46; author adam; state Exp; branches; next 1.6; commitid YqCcULhRsPC1NU9B; 1.6 date 2019.01.24.14.11.48; author adam; state Exp; branches; next 1.5; commitid D549NF1qLDkjH29B; 1.5 date 2018.08.14.06.56.39; author adam; state Exp; branches; next 1.4; commitid fUFzVRf2mUeNw3OA; 1.4 date 2018.01.04.21.31.41; author adam; state Exp; branches; next 1.3; commitid AYVt976oH416vBlA; 1.3 date 2017.05.20.06.25.36; author adam; state Exp; branches; next 1.2; commitid SHXh0bHW2C98R5Sz; 1.2 date 2017.03.19.22.59.11; author adam; state Exp; branches; next 1.1; commitid 7bTNXlThrZwwldKz; 1.1 date 2017.02.13.21.25.33; author adam; state Exp; branches; next ; commitid RFPMXVz9YZD0VPFz; desc @@ 1.16 log @py-scrapy: updated to 2.11.1 Scrapy 2.11.1 (2024-02-14) -------------------------- Highlights: - Security bug fixes. - Support for Twisted >= 23.8.0. - Documentation improvements. Security bug fixes ~~~~~~~~~~~~~~~~~~ - Addressed `ReDoS vulnerabilities`_: - ``scrapy.utils.iterators.xmliter`` is now deprecated in favor of :func:`~scrapy.utils.iterators.xmliter_lxml`, which :class:`~scrapy.spiders.XMLFeedSpider` now uses. 
To minimize the impact of this change on existing code, :func:`~scrapy.utils.iterators.xmliter_lxml` now supports indicating the node namespace with a prefix in the node name, and big files with highly nested trees when using libxml2 2.7+. - Fixed regular expressions in the implementation of the :func:`~scrapy.utils.response.open_in_browser` function. Please, see the `cc65-xxvf-f7r9 security advisory`_ for more information. .. _ReDoS vulnerabilities: https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS .. _cc65-xxvf-f7r9 security advisory: https://github.com/scrapy/scrapy/security/advisories/GHSA-cc65-xxvf-f7r9 - :setting:`DOWNLOAD_MAXSIZE` and :setting:`DOWNLOAD_WARNSIZE` now also apply to the decompressed response body. Please, see the `7j7m-v7m3-jqm7 security advisory`_ for more information. .. _7j7m-v7m3-jqm7 security advisory: https://github.com/scrapy/scrapy/security/advisories/GHSA-7j7m-v7m3-jqm7 - Also in relation with the `7j7m-v7m3-jqm7 security advisory`_, the deprecated ``scrapy.downloadermiddlewares.decompression`` module has been removed. - The ``Authorization`` header is now dropped on redirects to a different domain. Please, see the `cw9j-q3vf-hrrv security advisory`_ for more information. .. _cw9j-q3vf-hrrv security advisory: https://github.com/scrapy/scrapy/security/advisories/GHSA-cw9j-q3vf-hrrv Modified requirements ~~~~~~~~~~~~~~~~~~~~~ - The Twisted dependency is no longer restricted to < 23.8.0. (:issue:`6024`, :issue:`6064`, :issue:`6142`) Bug fixes ~~~~~~~~~ - The OS signal handling code was refactored to no longer use private Twisted functions. (:issue:`6024`, :issue:`6064`, :issue:`6112`) Documentation ~~~~~~~~~~~~~ - Improved documentation for :class:`~scrapy.crawler.Crawler` initialization changes made in the 2.11.0 release. (:issue:`6057`, :issue:`6147`) - Extended documentation for :attr:`Request.meta `. (:issue:`5565`) - Fixed the :reqmeta:`dont_merge_cookies` documentation. (:issue:`5936`, :issue:`6077`) - Added a link to Zyte's export guides to the :ref:`feed exports ` documentation. (:issue:`6183`) - Added a missing note about backward-incompatible changes in :class:`~scrapy.exporters.PythonItemExporter` to the 2.11.0 release notes. (:issue:`6060`, :issue:`6081`) - Added a missing note about removing the deprecated ``scrapy.utils.boto.is_botocore()`` function to the 2.8.0 release notes. (:issue:`6056`, :issue:`6061`) - Other documentation improvements. (:issue:`6128`, :issue:`6144`, :issue:`6163`, :issue:`6190`, :issue:`6192`) Quality assurance ~~~~~~~~~~~~~~~~~ - Added Python 3.12 to the CI configuration, re-enabled tests that were disabled when the pre-release support was added. (:issue:`5985`, :issue:`6083`, :issue:`6098`) - Fixed a test issue on PyPy 7.3.14. 
(:issue:`6204`, :issue:`6205`) @ text @$NetBSD: distinfo,v 1.15 2023/10/10 17:18:23 triaxx Exp $ BLAKE2s (Scrapy-2.11.1.tar.gz) = ec247564bb7f25be4bca8e966e593c7c6c222b9644cf05686d6d9a0a4a436b07 SHA512 (Scrapy-2.11.1.tar.gz) = c33bf8fe45c96865483398920e823bd169d7d7e5d67dcfd5e57e4546f1016cfdcb404ebcbf67a6710a4597d5970f55481226fee25c27291dfaedfc00322327d9 Size (Scrapy-2.11.1.tar.gz) = 1176726 bytes @ 1.15 log @py-scrapy: Update to 2.11.0 upstream changes: ----------------- * 2.11.0: https://docs.scrapy.org/en/latest/news.html#scrapy-2-11-0-2023-09-18 * 2.10.0: https://docs.scrapy.org/en/2.10/news.html#scrapy-2-10-0-2023-08-04 @ text @d1 1 a1 1 $NetBSD: distinfo,v 1.14 2023/05/10 12:40:44 adam Exp $ d3 3 a5 3 BLAKE2s (Scrapy-2.11.0.tar.gz) = c4bfc4779599de5e70dab45d023b9e97ff3457e9f6c21c31bf5b77401f101a2d SHA512 (Scrapy-2.11.0.tar.gz) = bbebea94329ffacfb2b867884b3800986f4013bbbe34eb2d299c09a0a653ac2793e581d92509dabaa0f8b74a0b4fbeebedbad8fb1074b18ee522e73fad039d2b Size (Scrapy-2.11.0.tar.gz) = 1171092 bytes @ 1.14 log @py-scrapy: updated to 2.9.0 Scrapy 2.9.0 (2023-05-08) ------------------------- Highlights: - Per-domain download settings. - Compatibility with new cryptography_ and new parsel_. - JMESPath selectors from the new parsel_. - Bug fixes. Deprecations ~~~~~~~~~~~~ - :class:`scrapy.extensions.feedexport._FeedSlot` is renamed to :class:`scrapy.extensions.feedexport.FeedSlot` and the old name is deprecated. (:issue:`5876`) New features ~~~~~~~~~~~~ - Settings corresponding to :setting:`DOWNLOAD_DELAY`, :setting:`CONCURRENT_REQUESTS_PER_DOMAIN` and :setting:`RANDOMIZE_DOWNLOAD_DELAY` can now be set on a per-domain basis via the new :setting:`DOWNLOAD_SLOTS` setting. (:issue:`5328`) - Added :meth:`TextResponse.jmespath`, a shortcut for JMESPath selectors available since parsel_ 1.8.1. (:issue:`5894`, :issue:`5915`) - Added :signal:`feed_slot_closed` and :signal:`feed_exporter_closed` signals. (:issue:`5876`) - Added :func:`scrapy.utils.request.request_to_curl`, a function to produce a curl command from a :class:`~scrapy.Request` object. (:issue:`5892`) - Values of :setting:`FILES_STORE` and :setting:`IMAGES_STORE` can now be :class:`pathlib.Path` instances. (:issue:`5801`) Bug fixes ~~~~~~~~~ - Fixed a warning with Parsel 1.8.1+. (:issue:`5903`, :issue:`5918`) - Fixed an error when using feed postprocessing with S3 storage. (:issue:`5500`, :issue:`5581`) - Added the missing :meth:`scrapy.settings.BaseSettings.setdefault` method. (:issue:`5811`, :issue:`5821`) - Fixed an error when using cryptography_ 40.0.0+ and :setting:`DOWNLOADER_CLIENT_TLS_VERBOSE_LOGGING` is enabled. (:issue:`5857`, :issue:`5858`) - The checksums returned by :class:`~scrapy.pipelines.files.FilesPipeline` for files on Google Cloud Storage are no longer Base64-encoded. (:issue:`5874`, :issue:`5891`) - :func:`scrapy.utils.request.request_from_curl` now supports $-prefixed string values for the curl ``--data-raw`` argument, which are produced by browsers for data that includes certain symbols. (:issue:`5899`, :issue:`5901`) - The :command:`parse` command now also works with async generator callbacks. (:issue:`5819`, :issue:`5824`) - The :command:`genspider` command now properly works with HTTPS URLs. (:issue:`3553`, :issue:`5808`) - Improved handling of asyncio loops. (:issue:`5831`, :issue:`5832`) - :class:`LinkExtractor ` now skips certain malformed URLs instead of raising an exception. (:issue:`5881`) - :func:`scrapy.utils.python.get_func_args` now supports more types of callables.
(:issue:`5872`, :issue:`5885`) - Fixed an error when processing non-UTF8 values of ``Content-Type`` headers. (:issue:`5914`, :issue:`5917`) - Fixed an error breaking user handling of send failures in :meth:`scrapy.mail.MailSender.send()`. (:issue:`1611`, :issue:`5880`) Documentation ~~~~~~~~~~~~~ - Expanded contributing docs. (:issue:`5109`, :issue:`5851`) - Added blacken-docs_ to pre-commit and reformatted the docs with it. (:issue:`5813`, :issue:`5816`) - Fixed a JS issue. (:issue:`5875`, :issue:`5877`) - Fixed ``make htmlview``. (:issue:`5878`, :issue:`5879`) - Fixed typos and other small errors. (:issue:`5827`, :issue:`5839`, :issue:`5883`, :issue:`5890`, :issue:`5895`, :issue:`5904`) Quality assurance ~~~~~~~~~~~~~~~~~ - Extended typing hints. (:issue:`5805`, :issue:`5889`, :issue:`5896`) - Tests for most of the examples in the docs are now run as part of CI; problems that were found have been fixed. (:issue:`5816`, :issue:`5826`, :issue:`5919`) - Removed usage of deprecated Python classes. (:issue:`5849`) - Silenced ``include-ignored`` warnings from coverage. (:issue:`5820`) - Fixed a random failure of the ``test_feedexport.test_batch_path_differ`` test. (:issue:`5855`, :issue:`5898`) - Updated docstrings to match output produced by parsel_ 1.8.1 so that they don't cause test failures. (:issue:`5902`, :issue:`5919`) - Other CI and pre-commit improvements. (:issue:`5802`, :issue:`5823`, :issue:`5908`) @ text @d1 1 a1 1 $NetBSD: distinfo,v 1.13 2023/04/27 09:33:44 adam Exp $ d3 3 a5 3 BLAKE2s (Scrapy-2.9.0.tar.gz) = 6e844d42e3fe57c2ada806619cf63374243167f386b2ba97e661e381eae92bf7 SHA512 (Scrapy-2.9.0.tar.gz) = 92f78d785e9770d5c38e72daeeb2a72d4f410692063c35d4997aa00afb8c9174b2ecb904eae551e20a0ea5a2585dece8e467909caf8f82411c58f04bdd9a83f0 Size (Scrapy-2.9.0.tar.gz) = 1150623 bytes @ 1.13 log @py-scrapy: updated to 2.8.0 Scrapy 2.8.0 (2023-02-02) ------------------------- This is a maintenance release, with minor features, bug fixes, and cleanups. Deprecation removals ~~~~~~~~~~~~~~~~~~~~ - The ``scrapy.utils.gz.read1`` function, deprecated in Scrapy 2.0, has now been removed. Use the :meth:`~io.BufferedIOBase.read1` method of :class:`~gzip.GzipFile` instead. - The ``scrapy.utils.python.to_native_str`` function, deprecated in Scrapy 2.0, has now been removed. Use :func:`scrapy.utils.python.to_unicode` instead. - The ``scrapy.utils.python.MutableChain.next`` method, deprecated in Scrapy 2.0, has now been removed. Use :meth:`~scrapy.utils.python.MutableChain.__next__` instead. - The ``scrapy.linkextractors.FilteringLinkExtractor`` class, deprecated in Scrapy 2.0, has now been removed. Use :class:`LinkExtractor ` instead. - Support for using environment variables prefixed with ``SCRAPY_`` to override settings, deprecated in Scrapy 2.0, has now been removed. - Support for the ``noconnect`` query string argument in proxy URLs, deprecated in Scrapy 2.0, has now been removed. We expect proxies that used to need it to work fine without it. - The ``scrapy.utils.python.retry_on_eintr`` function, deprecated in Scrapy 2.3, has now been removed. - The ``scrapy.utils.python.WeakKeyCache`` class, deprecated in Scrapy 2.4, has now been removed. Deprecations ~~~~~~~~~~~~ - :exc:`scrapy.pipelines.images.NoimagesDrop` is now deprecated. - :meth:`ImagesPipeline.convert_image ` must now accept a ``response_body`` parameter. New features ~~~~~~~~~~~~ - Applied black_ coding style to files generated with the :command:`genspider` and :command:`startproject` commands. ..
_black: https://black.readthedocs.io/en/stable/ - :setting:`FEED_EXPORT_ENCODING` is now set to ``"utf-8"`` in the ``settings.py`` file that the :command:`startproject` command generates. With this value, JSON exports won’t force the use of escape sequences for non-ASCII characters. - The :class:`~scrapy.extensions.memusage.MemoryUsage` extension now logs the peak memory usage during checks, and the binary unit MiB is now used to avoid confusion. - The ``callback`` parameter of :class:`~scrapy.http.Request` can now be set to :func:`scrapy.http.request.NO_CALLBACK`, to distinguish it from ``None``, as the latter indicates that the default spider callback (:meth:`~scrapy.Spider.parse`) is to be used (see the sketch at the end of these notes). Bug fixes ~~~~~~~~~ - Enabled unsafe legacy SSL renegotiation to fix access to some outdated websites. - Fixed STARTTLS-based email delivery not working with Twisted 21.2.0 and newer. - Fixed the :meth:`finish_exporting` method of :ref:`item exporters ` not being called for empty files. - Fixed HTTP/2 responses getting only the last value for a header when multiple headers with the same name are received. - Fixed an exception raised by the :command:`shell` command in some cases when :ref:`using asyncio `. - When using :class:`~scrapy.spiders.CrawlSpider`, callback keyword arguments (``cb_kwargs``) added to a request in the ``process_request`` callback of a :class:`~scrapy.spiders.Rule` will no longer be ignored. - The :ref:`images pipeline ` no longer re-encodes JPEG files. - Fixed the handling of transparent WebP images by the :ref:`images pipeline `. - :func:`scrapy.shell.inspect_response` no longer inhibits ``SIGINT`` (Ctrl+C). - :class:`LinkExtractor ` with ``unique=False`` no longer filters out links that have identical URL *and* text. - :class:`~scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware` now ignores URL protocols that do not support ``robots.txt`` (``data://``, ``file://``). - Silenced the ``filelock`` debug log messages introduced in Scrapy 2.6. - Fixed the output of ``scrapy -h`` showing an unintended ``**commands**`` line. - Made the active project indication in the output of :ref:`commands ` clearer. Documentation ~~~~~~~~~~~~~ - Documented how to :ref:`debug spiders from Visual Studio Code `. - Documented how :setting:`DOWNLOAD_DELAY` affects per-domain concurrency. - Improved consistency. - Fixed typos. Quality assurance ~~~~~~~~~~~~~~~~~ - Applied :ref:`black coding style `, sorted import statements, and introduced :ref:`pre-commit `. - Switched from :mod:`os.path` to :mod:`pathlib`. - Addressed many issues reported by Pylint. - Improved code readability. - Improved package metadata. - Removed direct invocations of ``setup.py``. - Removed unnecessary :class:`~collections.OrderedDict` usages. - Removed unnecessary ``__str__`` definitions. - Removed obsolete code and comments. - Fixed test and CI issues.
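The ``NO_CALLBACK`` sentinel above is easiest to see in context. Below is a minimal sketch, not part of the upstream release notes; the URLs and the spider name are placeholder assumptions::

    import scrapy
    from scrapy.http.request import NO_CALLBACK
    from scrapy.utils.defer import maybe_deferred_to_future

    class PriceCheckSpider(scrapy.Spider):
        name = "price_check"  # placeholder
        start_urls = ["https://example.com/product"]  # placeholder

        async def parse(self, response):
            # callback=None would mean "deliver to the default parse()
            # callback"; NO_CALLBACK declares that no spider callback
            # should ever handle this response, because it is consumed
            # right here instead.
            request = scrapy.Request("https://example.com/price.json",
                                     callback=NO_CALLBACK)
            extra = await maybe_deferred_to_future(
                self.crawler.engine.download(request))
            yield {"product": response.url, "price_status": extra.status}

``maybe_deferred_to_future`` keeps the sketch working with either the default or the asyncio Twisted reactor.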
@ text @d1 1 a1 1 $NetBSD: distinfo,v 1.12 2021/10/26 11:30:48 nia Exp $ d3 3 a5 3 BLAKE2s (Scrapy-2.8.0.tar.gz) = e3bd0d640992aab05d1a5c9108b5b741cea4ad9dc902afc5d385eb9ba14c6a61 SHA512 (Scrapy-2.8.0.tar.gz) = 6e631f84e27aeab5aeae672047869deb783b3c2f6de66c9bd6df768598a638d7e76a3f38945bfdf82f5ca0eb69491c262960f1645fe2f4947f72c0829a0eefaf Size (Scrapy-2.8.0.tar.gz) = 1140185 bytes @ 1.12 log @www: Replace RMD160 checksums with BLAKE2s checksums All checksums have been double-checked against existing RMD160 and SHA512 hashes Not committed (merge conflicts): www/nghttp2/distinfo Unfetchable distfiles (almost certainly fetched conditionally...): ./www/nginx-devel/distinfo array-var-nginx-module-0.05.tar.gz ./www/nginx-devel/distinfo echo-nginx-module-0.62.tar.gz ./www/nginx-devel/distinfo encrypted-session-nginx-module-0.08.tar.gz ./www/nginx-devel/distinfo form-input-nginx-module-0.12.tar.gz ./www/nginx-devel/distinfo headers-more-nginx-module-0.33.tar.gz ./www/nginx-devel/distinfo lua-nginx-module-0.10.19.tar.gz ./www/nginx-devel/distinfo naxsi-1.3.tar.gz ./www/nginx-devel/distinfo nginx-dav-ext-module-3.0.0.tar.gz ./www/nginx-devel/distinfo nginx-rtmp-module-1.2.2.tar.gz ./www/nginx-devel/distinfo nginx_http_push_module-1.2.10.tar.gz ./www/nginx-devel/distinfo ngx_cache_purge-2.5.1.tar.gz ./www/nginx-devel/distinfo ngx_devel_kit-0.3.1.tar.gz ./www/nginx-devel/distinfo ngx_http_geoip2_module-3.3.tar.gz ./www/nginx-devel/distinfo njs-0.5.0.tar.gz ./www/nginx-devel/distinfo set-misc-nginx-module-0.32.tar.gz ./www/nginx/distinfo array-var-nginx-module-0.05.tar.gz ./www/nginx/distinfo echo-nginx-module-0.62.tar.gz ./www/nginx/distinfo encrypted-session-nginx-module-0.08.tar.gz ./www/nginx/distinfo form-input-nginx-module-0.12.tar.gz ./www/nginx/distinfo headers-more-nginx-module-0.33.tar.gz ./www/nginx/distinfo lua-nginx-module-0.10.19.tar.gz ./www/nginx/distinfo naxsi-1.3.tar.gz ./www/nginx/distinfo nginx-dav-ext-module-3.0.0.tar.gz ./www/nginx/distinfo nginx-rtmp-module-1.2.2.tar.gz ./www/nginx/distinfo nginx_http_push_module-1.2.10.tar.gz ./www/nginx/distinfo ngx_cache_purge-2.5.1.tar.gz ./www/nginx/distinfo ngx_devel_kit-0.3.1.tar.gz ./www/nginx/distinfo ngx_http_geoip2_module-3.3.tar.gz ./www/nginx/distinfo njs-0.5.0.tar.gz ./www/nginx/distinfo set-misc-nginx-module-0.32.tar.gz @ text @d1 1 a1 1 $NetBSD: distinfo,v 1.11 2021/10/07 15:08:31 nia Exp $ d3 3 a5 3 BLAKE2s (Scrapy-2.4.1.tar.gz) = 38d33dd75b56a710b0624558094c326b280669ad81fc29d5df2bf6037e93df8e SHA512 (Scrapy-2.4.1.tar.gz) = 65e1f6b92a7ca1b46b3edbe3e668e11cc5140fbf983ac7fce38c31282009a848c02883bda8d56ea3019c84658839ee10e7237c9290cfe9a8d6b6abee07566b2a Size (Scrapy-2.4.1.tar.gz) = 1044246 bytes @ 1.11 log @www: Remove SHA1 hashes for distfiles @ text @d1 1 a1 1 $NetBSD: distinfo,v 1.10 2021/03/22 08:56:56 triaxx Exp $ d3 1 a3 1 RMD160 (Scrapy-2.4.1.tar.gz) = 67e32fdb3f9c7828739a89f6d73e6241ab912534 @ 1.10 log @py-scrapy: Update to 2.4.1 upstream changes: ----------------- A lot of changes listed at https://github.com/scrapy/scrapy/blob/master/docs/news.rst @ text @d1 1 a1 1 $NetBSD: distinfo,v 1.9 2020/01/29 22:06:30 adam Exp $ a2 1 SHA1 (Scrapy-2.4.1.tar.gz) = f83a20532084724fb1725c79ff427c7e279b1cb8 @ 1.9 log @py-scrapy: updated to 1.8.0 Scrapy 1.8.0: Highlights: * Dropped Python 3.4 support and updated minimum requirements; made Python 3.8 support official * New :meth:`Request.from_curl ` class method * New :setting:`ROBOTSTXT_PARSER` and :setting:`ROBOTSTXT_USER_AGENT` settings * New :setting:`DOWNLOADER_CLIENT_TLS_CIPHERS`
and :setting:`DOWNLOADER_CLIENT_TLS_VERBOSE_LOGGING` settings @ text @d1 1 a1 1 $NetBSD: distinfo,v 1.8 2019/08/22 08:21:11 adam Exp $ d3 4 a6 4 SHA1 (Scrapy-1.8.0.tar.gz) = 46dcec4b3c18f63ba14d30b050d2c5cdf3554ef5 RMD160 (Scrapy-1.8.0.tar.gz) = 8cc9d59d8428fb948ed1463274bb895d09a86e38 SHA512 (Scrapy-1.8.0.tar.gz) = fcc988cee171a8134ff8a6cd2655d715942fb9e720a070654bc634863e444f121e990910a0c1377e3b4679d496610267cc43eb907b8cf27d0623053c66ff26b9 Size (Scrapy-1.8.0.tar.gz) = 977658 bytes @ 1.8 log @py-scrapy: updated to 1.7.3 Scrapy 1.7.3: Enforce lxml 4.3.5 or lower for Python 3.4 (issue 3912, issue 3918). Scrapy 1.7.2: Fix Python 2 support (issue 3889, issue 3893, issue 3896). Scrapy 1.7.1: Re-packaging of Scrapy 1.7.0, which was missing some changes in PyPI. Scrapy 1.7.0: Highlights: Improvements for crawls targeting multiple domains A cleaner way to pass arguments to callbacks A new class for JSON requests Improvements for rule-based spiders New features for feed exports Backward-incompatible changes 429 is now part of the RETRY_HTTP_CODES setting by default. This change is backward incompatible. If you don’t want to retry 429, you must override RETRY_HTTP_CODES accordingly. Crawler, CrawlerRunner.crawl and CrawlerRunner.create_crawler no longer accept a Spider subclass instance, they only accept a Spider subclass now. Spider subclass instances were never meant to work, and they were not working as one would expect: instead of using the passed Spider subclass instance, their from_crawler method was called to generate a new instance. Non-default values for the SCHEDULER_PRIORITY_QUEUE setting may stop working. Scheduler priority queue classes now need to handle Request objects instead of arbitrary Python data structures. New features A new scheduler priority queue, scrapy.pqueues.DownloaderAwarePriorityQueue, may be enabled for a significant scheduling improvement on crawls targeting multiple web domains, at the cost of no CONCURRENT_REQUESTS_PER_IP support (issue 3520) A new Request.cb_kwargs attribute provides a cleaner way to pass keyword arguments to callback methods (issue 1138, issue 3563) A new JSONRequest class offers a more convenient way to build JSON requests (issue 3504, issue 3505) A process_request callback passed to the Rule constructor now receives the Response object that originated the request as its second argument (issue 3682) A new restrict_text parameter for the LinkExtractor constructor allows filtering links by linking text (issue 3622, issue 3635) A new FEED_STORAGE_S3_ACL setting allows defining a custom ACL for feeds exported to Amazon S3 (issue 3607) A new FEED_STORAGE_FTP_ACTIVE setting allows using FTP’s active connection mode for feeds exported to FTP servers (issue 3829) A new METAREFRESH_IGNORE_TAGS setting allows overriding which HTML tags are ignored when searching a response for HTML meta tags that trigger a redirect (issue 1422, issue 3768) A new redirect_reasons request meta key exposes the reason (status code, meta refresh) behind every followed redirect (issue 3581, issue 3687) The SCRAPY_CHECK variable is now set to the true string during runs of the check command, which allows detecting contract check runs from code (issue 3704, issue 3739) A new Item.deepcopy() method makes it easier to deep-copy items (issue 1493, issue 3671) CoreStats also logs elapsed_time_seconds now (issue 3638) Exceptions from ItemLoader input and output processors are now more verbose (issue 3836, issue 3840) Crawler, CrawlerRunner.crawl and CrawlerRunner.create_crawler now
fail gracefully if they receive a Spider subclass instance instead of the subclass itself (issue 2283, issue 3610, issue 3872) Bug fixes process_spider_exception() is now also invoked for generators (issue 220, issue 2061) System exceptions like KeyboardInterrupt are no longer caught (issue 3726) ItemLoader.load_item() no longer makes later calls to ItemLoader.get_output_value() or ItemLoader.load_item() return empty data (issue 3804, issue 3819) The images pipeline (ImagesPipeline) no longer ignores these Amazon S3 settings: AWS_ENDPOINT_URL, AWS_REGION_NAME, AWS_USE_SSL, AWS_VERIFY (issue 3625) Fixed a memory leak in MediaPipeline affecting, for example, non-200 responses and exceptions from custom middlewares (issue 3813) Requests with private callbacks are now correctly unserialized from disk (issue 3790) FormRequest.from_response() now handles invalid methods like major web browsers @ text @d1 1 a1 1 $NetBSD: distinfo,v 1.7 2019/01/31 09:07:46 adam Exp $ d3 4 a6 4 SHA1 (Scrapy-1.7.3.tar.gz) = 905d01beac4a1deeb742e72308b34348c37e4ae5 RMD160 (Scrapy-1.7.3.tar.gz) = ebe54257cfa20c6bc28995ecf6926a9b5c029ed8 SHA512 (Scrapy-1.7.3.tar.gz) = 45638732829976443714988ddcd016f7c222b2796c7bd353d6a93186e0182782211af60d1417cdf0980fa5ed6113c2e94b89e2d13ac42999ec1e45457913382d Size (Scrapy-1.7.3.tar.gz) = 951640 bytes @ 1.7 log @py-scrapy: updated to 1.6.0 Scrapy 1.6.0: Highlights: * better Windows support; * Python 3.7 compatibility; * big documentation improvements, including a switch from .extract_first() + .extract() API to .get() + .getall() API; * feed exports, FilePipeline and MediaPipeline improvements; * better extensibility: :signal:item_error and :signal:request_reached_downloader signals; from_crawler support for feed exporters, feed storages and dupefilters. * scrapy.contracts fixes and new features; * telnet console security improvements, first released as a backport in :ref:release-1.5.2; * clean-up of the deprecated code; * various bug fixes, small new features and usability improvements across the codebase. @ text @d1 1 a1 1 $NetBSD: distinfo,v 1.6 2019/01/24 14:11:48 adam Exp $ d3 4 a6 4 SHA1 (Scrapy-1.6.0.tar.gz) = 731714a49ee4974008182527b0d9fe35f69b6769 RMD160 (Scrapy-1.6.0.tar.gz) = 8fbe6fea79ba57f9c2f03d0c54a7982ab51e9f60 SHA512 (Scrapy-1.6.0.tar.gz) = 8c0581977d5d4e22afc535fbfff96d51dcc171dc60e21b3a2e35b327f83a484960b7979a5fc79502175441cff92a2f6dfa9511fd3de259eb7a0d4cfc28577e1e Size (Scrapy-1.6.0.tar.gz) = 926576 bytes @ 1.6 log @py-scrapy: updated to 1.5.2 Scrapy 1.5.2: * *Security bugfix*: Telnet console extension can be easily exploited by rogue websites POSTing content to http://localhost:6023, we haven't found a way to exploit it from Scrapy, but it is very easy to trick a browser to do so and elevates the risk for local development environment. *The fix is backwards incompatible*, it enables telnet user-password authentication by default with a randomly generated password. If you can't upgrade right away, please consider setting :setting:TELNETCONSOLE_PORT out of its default value (see the settings sketch below). See :ref:telnet console documentation for more info * Backport CI build failure under GCE environment due to boto import error.
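A minimal ``settings.py`` sketch of the mitigation described above; the port range and credentials are placeholder assumptions, while ``TELNETCONSOLE_USERNAME`` and ``TELNETCONSOLE_PASSWORD`` are the settings this release introduced::

    # Move the telnet console off its default port range [6023, 6073]
    TELNETCONSOLE_PORT = [16023, 16073]
    # Pin explicit credentials instead of the randomly generated password
    TELNETCONSOLE_USERNAME = "scrapy"
    TELNETCONSOLE_PASSWORD = "change-me"  # placeholder; use a real secret
    # Or switch the extension off entirely when it is not needed
    TELNETCONSOLE_ENABLED = False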
@ text @d1 1 a1 1 $NetBSD: distinfo,v 1.5 2018/08/14 06:56:39 adam Exp $ d3 4 a6 4 SHA1 (Scrapy-1.5.2.tar.gz) = 8b2d29cb674ed144972883d15f66c3116942df75 RMD160 (Scrapy-1.5.2.tar.gz) = 2dc4c5c8617c762240ae1342665627b6b803f846 SHA512 (Scrapy-1.5.2.tar.gz) = 4732761d9452ae2157ba7a1ceda02cdc6e417f3e09bc1e02ff5a9e5288c8dc0472c77a0da2b4c3bb8510f94b7e6e93b0fbf1a98df629a86dd5c3803c0ee0b081 Size (Scrapy-1.5.2.tar.gz) = 919358 bytes @ 1.5 log @py-scrapy: updated to 1.5.1 Scrapy 1.5.1: This is a maintenance release with important bug fixes, but no new features: * O(N^2) gzip decompression issue which affected Python 3 and PyPy is fixed * skipping of TLS validation errors is improved * Ctrl-C handling is fixed in Python 3.5+ * testing fixes * documentation improvements @ text @d1 1 a1 1 $NetBSD: distinfo,v 1.4 2018/01/04 21:31:41 adam Exp $ d3 4 a6 4 SHA1 (Scrapy-1.5.1.tar.gz) = 3fe1d9ce050bcbd315e764d564b9384029a200fc RMD160 (Scrapy-1.5.1.tar.gz) = 19234b618ae8dd5850126e56c93013d6e102524d SHA512 (Scrapy-1.5.1.tar.gz) = 6c82a53dce48d47d92dc87d7d37d1ce1e106a3a51f6a55c9bd9e6665cda4f27666a0488e8462eaba3a66f826e5a14676d8da2ffa6843f25112734c83a1777552 Size (Scrapy-1.5.1.tar.gz) = 908961 bytes @ 1.4 log @py-scrapy: updated to 1.5.0 Scrapy 1.5.0: This release brings small new features and improvements across the codebase. Some highlights: * Google Cloud Storage is supported in FilesPipeline and ImagesPipeline. * Crawling with proxy servers becomes more efficient, as connections to proxies can be reused now. * Warnings, exception and logging messages are improved to make debugging easier. * scrapy parse command now allows setting custom request meta via the --meta argument. * Compatibility with Python 3.6, PyPy and PyPy3 is improved; PyPy and PyPy3 are now supported officially, by running tests on CI. * Better default handling of HTTP 308, 522 and 524 status codes. * Documentation is improved, as usual. Backwards Incompatible Changes * Scrapy 1.5 drops support for Python 3.3. * Default Scrapy User-Agent now uses https link to scrapy.org. **This is technically backwards-incompatible**; override :setting:USER_AGENT if you relied on the old value. * Logging of settings overridden by custom_settings is fixed; **this is technically backwards-incompatible** because the logger changes from [scrapy.utils.log] to [scrapy.crawler]. If you're parsing Scrapy logs, please update your log parsers. * LinkExtractor now ignores m4v extension by default, this is a change in behavior. * 522 and 524 status codes are added to RETRY_HTTP_CODES New features - Support ``<link>`` tags in Response.follow - Support for ptpython REPL - Google Cloud Storage support for FilesPipeline and ImagesPipeline (see the settings sketch after this list) - New --meta option of the "scrapy parse" command allows passing additional request.meta - Populate spider variable when using shell.inspect_response - Handle HTTP 308 Permanent Redirect - Add 522 and 524 to RETRY_HTTP_CODES - Log versions information at startup - scrapy.mail.MailSender now works in Python 3 (it requires Twisted 17.9.0) - Connections to proxy servers are reused - Add template for a downloader middleware - Explicit message for NotImplementedError when parse callback not defined - CrawlerProcess got an option to disable installation of root log handler - LinkExtractor now ignores m4v extension by default - Better log messages for responses over :setting:DOWNLOAD_WARNSIZE and :setting:DOWNLOAD_MAXSIZE limits - Show warning when a URL is put to Spider.allowed_domains instead of a domain.
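A minimal ``settings.py`` sketch for the Google Cloud Storage support named in the list above; the bucket and project identifiers are placeholders::

    # Store images collected by ImagesPipeline in a GCS bucket
    ITEM_PIPELINES = {"scrapy.pipelines.images.ImagesPipeline": 1}
    IMAGES_STORE = "gs://example-bucket/images/"  # placeholder bucket
    GCS_PROJECT_ID = "example-project"            # required for gs:// storage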
Bug fixes - Fix logging of settings overridden by custom_settings; **this is technically backwards-incompatible** because the logger changes from [scrapy.utils.log] to [scrapy.crawler], so please update your log parsers if needed - Default Scrapy User-Agent now uses https link to scrapy.org. **This is technically backwards-incompatible**; override :setting:USER_AGENT if you relied on the old value. - Fix PyPy and PyPy3 test failures, support them officially - Fix DNS resolver when DNSCACHE_ENABLED=False - Add cryptography for Debian Jessie tox test env - Add verification to check if Request callback is callable - Port extras/qpsclient.py to Python 3 - Use getfullargspec behind the scenes for Python 3 to stop DeprecationWarning - Update deprecated test aliases - Fix SitemapSpider support for alternate links @ text @d1 1 a1 1 $NetBSD: distinfo,v 1.3 2017/05/20 06:25:36 adam Exp $ d3 4 a6 4 SHA1 (Scrapy-1.5.0.tar.gz) = 466a6e502585507f0bdd711043a5474ba0f3899d RMD160 (Scrapy-1.5.0.tar.gz) = 083f584cbe11a9382eef6314829f891f3b2a3b9d SHA512 (Scrapy-1.5.0.tar.gz) = b2fb3bc58ab2fe64b8527c9b33478e9bb5239a15c793147d7e1af2827daf2de219c506e07596cdd5ff1ed51a2f489028b29f9ffa8b729125098892dea35d8b50 Size (Scrapy-1.5.0.tar.gz) = 905439 bytes @ 1.3 log @Scrapy 1.4 does not bring that many breathtaking new features but quite a few handy improvements nonetheless. Scrapy now supports anonymous FTP sessions with customizable user and password via the new :setting:`FTP_USER` and :setting:`FTP_PASSWORD` settings. And if you're using Twisted version 17.1.0 or above, FTP is now available with Python 3. There's a new :meth:`response.follow ` method for creating requests; **it is now a recommended way to create Requests in Scrapy spiders**. This method makes it easier to write correct spiders; ``response.follow`` has several advantages over creating ``scrapy.Request`` objects directly: * it handles relative URLs; * it works properly with non-ASCII URLs on non-UTF8 pages; * in addition to absolute and relative URLs it supports Selectors; for ``<a>`` elements it can also extract their href values. @ text @d1 1 a1 1 $NetBSD: distinfo,v 1.2 2017/03/19 22:59:11 adam Exp $ d3 4 a6 4 SHA1 (Scrapy-1.4.0.tar.gz) = 24222debf2e6b9220a91a56c476c208ac5ecb8e5 RMD160 (Scrapy-1.4.0.tar.gz) = ef20b9288851962fb552c1045e297c8917a74d17 SHA512 (Scrapy-1.4.0.tar.gz) = eedcd7003c51f45a580f160b4e5f428c01713e4ecb5b64e35570bc750fc03bef7cc991e318ef4ff9c96e12a2d21cc32d0f07ce278486ea2e65f08e53c3e4a8f1 Size (Scrapy-1.4.0.tar.gz) = 898159 bytes @ 1.2 log @Changes 1.3.3: Bug fixes - Make ``SpiderLoader`` raise ``ImportError`` again by default for missing dependencies and wrong :setting:`SPIDER_MODULES`. These exceptions were silenced as warnings since 1.3.0. A new setting is introduced to toggle between warning or exception if needed; see :setting:`SPIDER_LOADER_WARN_ONLY` for details. @ text @d1 1 a1 1 $NetBSD: distinfo,v 1.1 2017/02/13 21:25:33 adam Exp $ d3 4 a6 4 SHA1 (Scrapy-1.3.3.tar.gz) = 603b258932b868ad6315e374b7b3de8c45564264 RMD160 (Scrapy-1.3.3.tar.gz) = a23e026285640af4cf1ec29590040a67ef57326d SHA512 (Scrapy-1.3.3.tar.gz) = 795ac4a421be6e903e14cb9d9b242f4e6a130e3f51d43bf2a0904de0e4cadb4f54837badb63a65331ab67c4f8d6111f6fc85e6cb79acca5544db128b55ba3867 Size (Scrapy-1.3.3.tar.gz) = 848990 bytes @ 1.1 log @Added www/py-scrapy version 1.3.2 Scrapy is a fast high-level web crawling and web scraping framework, used to crawl websites and extract structured data from their pages.
It can be used for a wide range of purposes, from data mining to monitoring and automated testing. @ text @d1 1 a1 1 $NetBSD: distinfo,v 1.1.1.1 2012/06/07 18:11:03 slitvinov Exp $ d3 4 a6 4 SHA1 (Scrapy-1.3.2.tar.gz) = cbf6ca3fdcac2b47b90f774b3d1fd390cadd0229 RMD160 (Scrapy-1.3.2.tar.gz) = 182a8d28025c0a91217ae1c0f341986b8ad97deb SHA512 (Scrapy-1.3.2.tar.gz) = 06c034a4a23dfefe449685c9c95bb518ae4d56f8512802570e0885daec7b380f08381284ec4b31e322d1bf0dc7301f6d470b8fdd06ac3c45ce2101339685045d Size (Scrapy-1.3.2.tar.gz) = 848561 bytes @
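To illustrate the framework described in the initial import message, here is a minimal self-contained spider; the target site (quotes.toscrape.com, a public scraping sandbox) and its selectors are assumptions for the example, and the snippet uses the ``.get()`` and ``response.follow`` APIs introduced in the releases above::

    import scrapy

    class QuotesSpider(scrapy.Spider):
        name = "quotes"
        start_urls = ["https://quotes.toscrape.com/"]

        def parse(self, response):
            # Extract structured data from each quote block on the page.
            for quote in response.css("div.quote"):
                yield {
                    "text": quote.css("span.text::text").get(),
                    "author": quote.css("small.author::text").get(),
                }
            # Follow pagination; response.follow resolves relative URLs.
            next_page = response.css("li.next a::attr(href)").get()
            if next_page is not None:
                yield response.follow(next_page, callback=self.parse)

It can be run without a project via ``scrapy runspider quotes_spider.py -o quotes.json`` (the file name is assumed).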