head 1.1; branch 1.1.1; access; symbols netbsd-11-0-RC5:1.1.1.2 netbsd-11-0-RC4:1.1.1.2 netbsd-11-0-RC3:1.1.1.2 gdb-17-1:1.1.1.2 netbsd-11-0-RC2:1.1.1.2 netbsd-11-0-RC1:1.1.1.2 gdb-16-3:1.1.1.2 perseant-exfatfs-base-20250801:1.1.1.2 netbsd-11:1.1.1.2.0.4 netbsd-11-base:1.1.1.2 netbsd-10-1-RELEASE:1.1.1.1 gdb-15-1:1.1.1.2 perseant-exfatfs-base-20240630:1.1.1.2 perseant-exfatfs:1.1.1.2.0.2 perseant-exfatfs-base:1.1.1.2 netbsd-8-3-RELEASE:1.1.1.1 netbsd-9-4-RELEASE:1.1.1.1 netbsd-10-0-RELEASE:1.1.1.1 netbsd-10-0-RC6:1.1.1.1 netbsd-10-0-RC5:1.1.1.1 netbsd-10-0-RC4:1.1.1.1 netbsd-10-0-RC3:1.1.1.1 netbsd-10-0-RC2:1.1.1.1 netbsd-10-0-RC1:1.1.1.1 gdb-13-2:1.1.1.2 netbsd-10:1.1.1.1.0.26 netbsd-10-base:1.1.1.1 netbsd-9-3-RELEASE:1.1.1.1 cjep_sun2x-base1:1.1.1.1 cjep_sun2x:1.1.1.1.0.24 cjep_sun2x-base:1.1.1.1 cjep_staticlib_x-base1:1.1.1.1 netbsd-9-2-RELEASE:1.1.1.1 cjep_staticlib_x:1.1.1.1.0.22 cjep_staticlib_x-base:1.1.1.1 netbsd-9-1-RELEASE:1.1.1.1 GDB-11-0-50-20200914-git:1.1.1.1 phil-wifi-20200421:1.1.1.1 phil-wifi-20200411:1.1.1.1 is-mlppp:1.1.1.1.0.20 is-mlppp-base:1.1.1.1 phil-wifi-20200406:1.1.1.1 netbsd-8-2-RELEASE:1.1.1.1 netbsd-9-0-RELEASE:1.1.1.1 netbsd-9-0-RC2:1.1.1.1 netbsd-9-0-RC1:1.1.1.1 phil-wifi-20191119:1.1.1.1 netbsd-9:1.1.1.1.0.18 netbsd-9-base:1.1.1.1 phil-wifi-20190609:1.1.1.1 netbsd-8-1-RELEASE:1.1.1.1 gdb-8-3:1.1.1.1 netbsd-8-1-RC1:1.1.1.1 pgoyette-compat-merge-20190127:1.1.1.1 pgoyette-compat-20190127:1.1.1.1 pgoyette-compat-20190118:1.1.1.1 pgoyette-compat-1226:1.1.1.1 pgoyette-compat-1126:1.1.1.1 pgoyette-compat-1020:1.1.1.1 pgoyette-compat-0930:1.1.1.1 pgoyette-compat-0906:1.1.1.1 pgoyette-compat-0728:1.1.1.1 netbsd-8-0-RELEASE:1.1.1.1 phil-wifi:1.1.1.1.0.16 phil-wifi-base:1.1.1.1 pgoyette-compat-0625:1.1.1.1 netbsd-8-0-RC2:1.1.1.1 pgoyette-compat-0521:1.1.1.1 pgoyette-compat-0502:1.1.1.1 pgoyette-compat-0422:1.1.1.1 netbsd-8-0-RC1:1.1.1.1 pgoyette-compat-0415:1.1.1.1 pgoyette-compat-0407:1.1.1.1 pgoyette-compat-0330:1.1.1.1 
pgoyette-compat-0322:1.1.1.1 pgoyette-compat-0315:1.1.1.1 pgoyette-compat:1.1.1.1.0.14 pgoyette-compat-base:1.1.1.1 gdb-8-0-1:1.1.1.1 matt-nb8-mediatek:1.1.1.1.0.12 matt-nb8-mediatek-base:1.1.1.1 perseant-stdc-iso10646:1.1.1.1.0.10 perseant-stdc-iso10646-base:1.1.1.1 netbsd-8:1.1.1.1.0.8 netbsd-8-base:1.1.1.1 prg-localcount2-base3:1.1.1.1 prg-localcount2-base2:1.1.1.1 prg-localcount2-base1:1.1.1.1 prg-localcount2:1.1.1.1.0.6 prg-localcount2-base:1.1.1.1 pgoyette-localcount-20170426:1.1.1.1 bouyer-socketcan-base1:1.1.1.1 pgoyette-localcount-20170320:1.1.1.1 bouyer-socketcan:1.1.1.1.0.4 bouyer-socketcan-base:1.1.1.1 pgoyette-localcount-20170107:1.1.1.1 pgoyette-localcount-20161104:1.1.1.1 gdb-7-12:1.1.1.1 localcount-20160914:1.1.1.1 pgoyette-localcount-20160806:1.1.1.1 pgoyette-localcount-20160726:1.1.1.1 pgoyette-localcount:1.1.1.1.0.2 pgoyette-localcount-base:1.1.1.1 gdb-7-10-1:1.1.1.1 FSF:1.1.1; locks; strict; comment @# @; 1.1 date 2016.02.03.03.00.20; author christos; state Exp; branches 1.1.1.1; next ; commitid uy8uLhh4ZQtrSpTy; 1.1.1.1 date 2016.02.03.03.00.20; author christos; state Exp; branches; next 1.1.1.2; commitid uy8uLhh4ZQtrSpTy; 1.1.1.2 date 2023.07.30.22.46.24; author christos; state Exp; branches; next ; commitid HEIv4Prd74m1wSyE; desc @@ 1.1 log @Initial revision @ text @A Fast Method for Identifying Plain Text Files
==============================================


Introduction
------------

Given a file coming from an unknown source, it is sometimes desirable
to find out whether the format of that file is plain text.  Although
this may appear to be a simple task, a fully accurate detection of the
file type requires heavy-duty semantic analysis on the file contents.
It is, however, possible to obtain satisfactory results by employing
various heuristics.

Previous versions of PKZip and other zip-compatible compression tools
used a crude detection scheme: if more than 80% (4/5) of the bytes
found in a certain buffer are within the range [7..127], the file is
labeled as plain text; otherwise, it is labeled as binary.  A prominent
limitation of this scheme is the restriction to Latin-based alphabets.
Other writing systems, like Greek, Cyrillic or the Asian scripts, make
extensive use of the bytes within the range [128..255], and texts using
them are often misidentified by this scheme; in other words, the rate
of false negatives is sometimes too high, which means that the recall
is low.  Another weakness of this scheme is reduced precision, due to
the false positives that occur when binary files containing large
amounts of textual characters are misidentified as plain text.

In this article we propose a new, simple detection scheme that features
much higher precision and a near-100% recall.  This scheme is designed
to work on ASCII, Unicode and other ASCII-derived alphabets, and it
handles single-byte encodings (ISO-8859, MacRoman, KOI8, etc.) and
variable-sized encodings (ISO-2022, UTF-8, etc.).  Wider encodings
(UCS-2/UTF-16 and UCS-4/UTF-32) are not handled, however, because they
encode every ASCII character with one or more NUL bytes.


The Algorithm
-------------

The algorithm works by dividing the set of bytecodes [0..255] into three
disjoint categories:
- The white list of textual bytecodes:
  9 (TAB), 10 (LF), 13 (CR), 32 (SPACE) to 255.
- The gray list of tolerated bytecodes:
  7 (BEL), 8 (BS), 11 (VT), 12 (FF), 26 (SUB), 27 (ESC).
- The black list of undesired, non-textual bytecodes:
  0 (NUL) to 6, 14 to 25, 28 to 31.

If a file contains at least one byte that belongs to the white list and
no byte that belongs to the black list, then the file is categorized as
plain text; otherwise, it is categorized as binary.  (The boundary case,
when the file is empty, automatically falls into the latter category.)
Gray-listed bytes neither disqualify a file nor, on their own, qualify
it as text.


Rationale
---------

The idea behind this algorithm relies on two observations.
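
Before examining them, it may help to see the classification rule from
"The Algorithm" written out.  The following is a minimal C sketch, not a
reference implementation: the names byte_class, classify_byte and
is_plain_text are hypothetical, the input is assumed to be a buffer
already held in memory, and 26 (SUB) and 27 (ESC) are treated as gray,
so the black list covers 0 to 6, 14 to 25 and 28 to 31.

```c
#include <stdbool.h>
#include <stddef.h>

/* The three byte categories described in "The Algorithm". */
enum byte_class { BYTE_WHITE, BYTE_GRAY, BYTE_BLACK };

/* Map one byte value to its category. */
static enum byte_class classify_byte(unsigned char c)
{
    if (c == 9 || c == 10 || c == 13 || c >= 32)
        return BYTE_WHITE;              /* TAB, LF, CR, 32..255 */
    if (c == 7 || c == 8 || c == 11 || c == 12 || c == 26 || c == 27)
        return BYTE_GRAY;               /* BEL, BS, VT, FF, SUB, ESC */
    return BYTE_BLACK;                  /* 0..6, 14..25, 28..31 */
}

/* Hypothetical helper: true if buf[0..len) is plain text, i.e. it holds
 * at least one white-listed byte and no black-listed byte.  An empty
 * buffer has no white-listed byte and is therefore reported as binary. */
bool is_plain_text(const unsigned char *buf, size_t len)
{
    bool seen_white = false;

    for (size_t i = 0; i < len; i++) {
        switch (classify_byte(buf[i])) {
        case BYTE_BLACK:
            return false;               /* one black byte decides */
        case BYTE_WHITE:
            seen_white = true;
            break;
        case BYTE_GRAY:
            break;                      /* tolerated, not evidence */
        }
    }
    return seen_white;
}
```

Because a single black-listed byte settles the verdict, the loop can
return as soon as it meets one, while a text file must be scanned to
the end.  Since the whole upper range [128..255] is white-listed, the
Greek word "αβγ" gets the same verdict whether it is stored in
ISO-8859-7 (bytes 0xE1 0xE2 0xE3) or in UTF-8 (bytes 0xCE 0xB1 0xCE 0xB2
0xCE 0xB3).
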

The first observation is that, although the full range of 7-bit codes
[0..127] is properly specified by the ASCII standard, most control
characters in the range [0..31] are not used in practice.  The only
widely-used, almost universally-portable control codes are 9 (TAB),
10 (LF) and 13 (CR).  There are a few more control codes that are
recognized on a reduced range of platforms and text viewers/editors:
7 (BEL), 8 (BS), 11 (VT), 12 (FF), 26 (SUB) and 27 (ESC); but these
codes are rarely (if ever) used alone, without being accompanied by
some printable text.  Even the newer, portable text formats such as
XML avoid using control characters outside the list mentioned here.

The second observation is that most binary files tend to contain
control characters, especially 0 (NUL).  Although the older text
detection schemes count codes from the range [128..255] as non-textual,
precision rarely suffers if this upper range is labeled as textual,
because genuinely binary files tend to contain both control characters
and codes from the upper range.  On the other hand, the upper range
needs to be labeled as textual, because it is used by virtually all
ASCII extensions.  In particular, this range is used for encoding
non-Latin scripts.

Since there is no counting involved, other than simply observing the
presence or the absence of some byte values, the algorithm produces
consistent results, regardless of which character encoding is used.
(If counting were involved, the same text could be classified
differently when encoded, say, in ISO-8859-16 versus UTF-8, because a
non-ASCII character occupies one byte in the former and two or more
bytes in the latter.)

There is an extra category of plain text files that are "polluted" with
one or more black-listed codes, either by mistake or by peculiar design
considerations.  In such cases, a scheme that tolerates a small fraction
of black-listed codes would provide an increased recall (i.e. more true
positives).
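
A tolerant scheme of this kind is easy to sketch, and the sketch makes
its cost visible.  In the C code below, the function name
is_plain_text_tolerant and the per-mille threshold max_black_per_mille
are hypothetical choices for illustration; the article prescribes no
threshold.  With a threshold of 0, the function reduces to the strict
rule of "The Algorithm".

```c
#include <stdbool.h>
#include <stddef.h>

/* Black list: 0..6, 14..25, 28..31 (26 SUB and 27 ESC are gray). */
static bool is_black(unsigned char c)
{
    return c <= 6 || (c >= 14 && c <= 25) || (c >= 28 && c <= 31);
}

/* White list: TAB, LF, CR and 32..255. */
static bool is_white(unsigned char c)
{
    return c == 9 || c == 10 || c == 13 || c >= 32;
}

/* Hypothetical tolerant variant: still requires one white-listed byte,
 * but accepts black-listed bytes as long as they make up at most
 * max_black_per_mille parts per thousand of the buffer. */
bool is_plain_text_tolerant(const unsigned char *buf, size_t len,
                            unsigned max_black_per_mille)
{
    unsigned long long black = 0;
    bool seen_white = false;

    for (size_t i = 0; i < len; i++) {
        if (is_black(buf[i]))
            black++;
        else if (is_white(buf[i]))
            seen_white = true;
    }
    /* black / len <= max / 1000, compared in integers to avoid rounding. */
    return seen_white &&
           black * 1000ULL <= (unsigned long long)max_black_per_mille * len;
}
```

Unlike the strict rule, this variant counts bytes, so its verdict can
depend on the encoding: the same polluted Greek text occupies more bytes
in UTF-8 than in ISO-8859-7, so its black-listed bytes make up a smaller
fraction, and it may pass in one encoding but fail in the other.
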
Such tolerance, however, reduces precision overall, since false
positives are more likely to appear in binary files that contain large
chunks of textual data.  Furthermore, "polluted" plain text should be
regarded as binary by general-purpose text detection schemes, because
general-purpose text processing algorithms might not be applicable to
it.  Under this premise, it is safe to say that our detection method
provides a near-100% recall.

Experiments have been run on many files coming from various platforms
and applications.  We tried plain text files, system logs, source code,
formatted office documents, compiled object code, etc.  The results
confirm the optimistic assumptions about the capabilities of this
algorithm.


--
Cosmin Truta
Last updated: 2006-May-28
@
1.1.1.1 log @Import gdb-7.10.1: 2015-06-30 H.J. Lu * configure.ac (ospace_frag): Enable for i?86*-*-elfiamcu target. * configure: Regenerate. 2015-05-13 John David Anglin * configure.ac: Disable configuration of GDB for HPUX targets. * configure: Regenerate. 2015-05-01 H.J. Lu PR ld/18355 * Makefile.def: Add extra_configure_flags to host zlib. * configure.ac (extra_host_zlib_configure_flags): New. Set to --enable-host-shared When bfd is to be built as shared library. AC_SUBST. * Makefile.in: Regenerated. 2015-04-15 Mike Frysinger Hans-Peter Nilsson Adjust src-release.sh for sim using the gdb create-version.sh. * src-release.sh (tar_compress): If there's a fifth parameter, use that in the getver call instead of $tool. (sim_release): Pass gdb as fifth parameter to tar_compress. (SIM_SUPPORT_DIRS): Add gdb/common/create-version.sh. 2015-04-14 Max Ostapenko * Makefile.tpl (EXTRA_HOST_EXPORTS): New variables. (EXTRA_BOOTSTRAP_FLAGS): Likewise. (check-[+module+]): Add EXTRA_HOST_EXPORTS and EXTRA_BOOTSTRAP_FLAGS. * Makefile.in: Regenerate. 2015-04-01 H.J. Lu * configure.ac: Add --with-system-zlib. * configure: Regenerated. 2015-03-31 H.J. Lu * src-release.sh: Don't configure with --with-target-subdir=. --disable-multilib.
2015-03-31 H.J. Lu * src-release.sh (DEVO_SUPPORT): Replace src-release with src-release.sh. 2015-03-30 Ed Schouten * config.sub: Update from upstream, to 2015-03-04 version. * config.guess: Likewise. 2015-03-30 H.J. Lu * Makefile.def (dependencies): Add all-zlib to all-bfd. * Makefile.in: Regenerated. 2015-03-28 H.J. Lu * src-release.sh (do_proto_toplev): Configure with --target --with-target-subdir and --disable-multilib. (BINUTILS_SUPPORT_DIRS): Add zlib. (GAS_SUPPORT_DIRS): Likewise. (GDB_SUPPORT_DIRS): Likewise. (SIM_SUPPORT_DIRS): Likewise. 2015-03-17 H.J. Lu * configure.ac (target_configdirs): Exclude target-zlib if target-libjava isn't built. * configure: Regenerated. 2015-03-17 H.J. Lu Sync with GCC 2014-06-13 Thomas Schwinge * config-ml.in: Robustify ac_configure_args parsing. 2015-03-16 H.J. Lu * Makefile.def: Updated from GCC trunk. * Makefile.tpl: Likewise. * configure.ac: Likewise. * Makefile.in: Regenerated. * configure: Likewise. 2015-01-28 James Bowman * configure.ac: Add FT32 support. * configure: Regenerate. 2015-01-12 Anthony Green * configure.ac: Don't disable gprof for moxie. * configure: Rebuild. @ text @@ 1.1.1.2 log @Import gdb-13.2 over gdb-11.0.50 May 27th, 2023: GDB 13.2 Released! The latest version of GDB, version 13.2, is available for download. 
This is a minor corrective release over GDB 13.1, fixing the following issues: PR testsuite/30158 (rustc testsuite fails with 13.1, apparently worked before with trunk 20230114 on i686-linux-gnu and powerpc64le-linux-gnu) PR gdb/30214 (GDB 13.1 does not compile on FreeBSD 13.1) PR gdb/30240 ((linux/aarch) thread.c:86: internal-error: inferior_thread: Assertion `current_thread_ != nullptr' failed) PR gdb/30249 ([13 regression] hookpost-extended-remote will not work) PR exp/30271 (Addresses of static thread_local fields are badly calculated sometimes) PR symtab/30357 (Segmentation fault for the 'start' command) PR symtab/30369 ([gdb/symtab] False match issue in skip_prologue_using_linetable) PR gdb/30423 (Build failures with clang 16) PR build/30450 (Build failure (linux-low.cc:5393:45: error: expected ':' before ')' token) with musl-1.2.4) See the NEWS file for a more complete and detailed list of what this release includes. Feb 19th, 2023: GDB 13.1 Released! The latest version of GDB, version 13.1, is available for download. This version of GDB includes the following changes and enhancements: Support for the following new targets has been added in both GDB and GDBserver: GNU/Linux/LoongArch (gdbserver) loongarch*-*-linux* GNU/Linux/CSKY (gdbserver) csky*-*linux* The Windows native target now supports target async. FreeBSD: Arm and AArch64: Support for Thread Local Storage (TLS) variables Hardware watchpoint support on AArch64 FreeBSD Floating-point support has now been added on LoongArch GNU/Linux. New commands: set print nibbles [on|off] show print nibbles This controls whether the 'print/t' command will display binary values in groups of four bits, known as "nibbles". The default is 'off'. Various styling-related commands. See the gdb/NEWS file for more details. Various maintenance commands. These are normally aimed at GDB experts or developers. See the gdb/NEWS file for more details. Python API improvements: New Python API for instruction disassembly. 
The new attribute 'locations' of gdb.Breakpoint returns a list of gdb.BreakpointLocation objects specifying the locations where the breakpoint is inserted into the debuggee. New Python type gdb.BreakpointLocation. New function gdb.format_address(ADDRESS, PROGSPACE, ARCHITECTURE) that formats ADDRESS as 'address ' New function gdb.current_language that returns the name of the current language. Unlike gdb.parameter('language'), this will never return 'auto'. New function gdb.print_options that returns a dictionary of the prevailing print options, in the form accepted by gdb.Value.format_string. New method gdb.Frame.language that returns the name of the frame's language. gdb.Value.format_string now uses the format provided by 'print', if it is called during a 'print' or other similar operation. gdb.Value.format_string now accepts the 'summary' keyword. This can be used to request a shorter representation of a value, the way that 'set print frame-arguments scalars' does. The gdb.register_window_type method now restricts the set of acceptable window names. The first character of a window's name must be in the set [a-zA-Z]; every subsequent character must be in the set [-_.a-zA-Z0-9]. GDB/MI changes: MI version 1 is deprecated, and will be removed in GDB 14. The async record stating the stopped reason 'breakpoint-hit' now contains an optional field locno. Miscellaneous improvements: gdb now supports zstd compressed debug sections (ELFCOMPRESS_ZSTD) for ELF. New convenience variable $_inferior_thread_count contains the number of live threads in the current inferior. New convenience variables $_hit_bpnum and $_hit_locno, set to the breakpoint number and the breakpoint location number of the breakpoint last hit. The "info breakpoints" command now displays enabled breakpoint locations of disabled breakpoints as in the "y-" state.
The format of 'disassemble /r' and 'record instruction-history /r' has changed to match the layout of GNU objdump when disassembling. A new format "/b" has been introduced to provide the old behavior of "/r". The TUI no longer styles the source and assembly code highlighted by the current position indicator by default. You can, however, re-enable styling using the new "set style tui-current-position" command. It is now possible to use the "document" command to document user-defined commands. Support for memory tag data for AArch64 MTE. Support Removal notices: DBX mode has been removed. Support for building against Python version 2 has been removed. It is now only possible to build GDB against Python 3. Support for the following commands has been removed: set debug aix-solib on|off show debug aix-solib set debug solib-frv on|off show debug solib-frv Use the "set/show debug solib" commands instead. See the NEWS file for a more complete and detailed list of what this release includes. Dec 18th, 2022: GDB 13 branch created The GDB 13 branch (gdb-13-branch) has been created. To check out a copy of the branch use: git clone --branch gdb-13-branch https://sourceware.org/git/binutils-gdb.git May 1st, 2022: GDB 12.1 Released! The latest version of GDB, version 12.1, is available for download.
This version of GDB includes the following changes and enhancements: New support for the following native configuration: GNU/Linux/OpenRISC or1k*-*-linux* New support for the following targets: GNU/Linux/LoongArch loongarch*-*-linux* New GDBserver support on the following configuration: GNU/Linux/OpenRISC or1k*-*-linux* Support for the following target has been removed: S+core score-*-* Multithreaded symbol loading is now enabled by default Deprecation Notices: GDB 12 is the last release of GDB that will support building against Python 2 DBX mode is deprecated, and will be removed in GDB 13 GDB/MI changes: The '-add-inferior' command with no option flags now inherits the connection of the current inferior; this restores the behaviour of GDB as it was prior to GDB 10. The '-add-inferior' command now accepts a '--no-connection' option, which causes the new inferior to start without a connection. Python API enhancements: It is now possible to add GDB/MI commands implemented in Python New function gdb.Architecture.integer_type() New gdb.events.gdb_exiting event New 'gdb.events.connection_removed' event registry New gdb.TargetConnection object New gdb.Inferior.connection property New read-only attribute gdb.InferiorThread.details New gdb.RemoteTargetConnection.send_packet method New read-only attributes gdb.Type.is_scalar and gdb.Type.is_signed The gdb.Value.format_string method now takes a 'styling' argument Various new functions in the "gdb" module Miscellaneous: The FreeBSD native target now supports async mode Improved C++ template support Support for disabling source highlighting through GNU Source Highlight, so that the Pygments library is used instead. The "print" command has been changed so that, when printing floating-point values with a base-modifying format such as "/x", it displays the underlying bytes of the value in the desired base. The "clone-inferior" command now ensures that the TTY, CMD and ARGS settings are copied from the original inferior to the new one.
All modifications to the environment variables done using the 'set environment' or 'unset environment' commands are also copied to the new inferior. Various new commands have been introduced See the NEWS file for a more complete and detailed list of what this release includes. Mar 20th, 2022: GDB 12 branch created The GDB 12 branch (gdb-12-branch) has been created. To check out a copy of the branch use: git clone --branch gdb-12-branch https://sourceware.org/git/binutils-gdb.git January 16th, 2022: GDB 11.2 Released! The latest version of GDB, version 11.2, is available for download. This is a minor corrective release over GDB 11.1, fixing the following issues: PR sim/28302 (gdb fails to build with glibc 2.34) PR build/28318 (std::thread support configure check does not use CXX_DIALECT) PR gdb/28405 (arm-none-eabi: internal-error: ptid_t remote_target::select_thread_for_ambiguous_stop_reply(const target_waitstatus*): Assertion `first_resumed_thread != nullptr' failed) PR tui/28483 ([gdb/tui] breakpoint creation not displayed) PR build/28555 (uclibc compile failure since commit 4655f8509fd44e6efabefa373650d9982ff37fd6) PR rust/28637 (Rust characters will be encoded using DW_ATE_UTF) PR gdb/28758 (GDB 11 doesn't work correctly on binaries with a SHT_RELR (.relr.dyn) section) PR gdb/28785 (Support SHT_RELR (.relr.dyn) section) See the NEWS file for a more complete and detailed list of what this release includes. September 13th, 2021: GDB 11.1 Released! The latest version of GDB, version 11.1, is available for download. This version of GDB includes the following changes and enhancements: Support for ARM Symbian (arm*-*-symbianelf*) has been removed. Building GDB now requires GMP (The GNU Multiple Precision Arithmetic Library). New command-line options "--early-init-command" (or "-eix") and "--early-init-eval-command" (or "-eiex") GDB/MI Changes: New --qualified option for the '-break-insert' and '-dprintf-insert' commands. 
New --force-condition option for the '-break-insert' and '-dprintf-insert' commands. New --force option for the '-break-condition' command. The '-file-list-exec-source-files' command now accepts an optional regular expression to filter the source files included in the result. The results from '-file-list-exec-source-files' now include a 'debug-fully-read' field to indicate if the corresponding source's debugging information has been partially read (false) or has been fully read (true). TUI Improvements: Mouse actions are now supported. The mouse wheel scrolls the appropriate window. Key combinations that do not have a specific action on the focused window are now passed to GDB. Python enhancements: Inferior objects now contain a read-only 'connection_num' attribute that gives the connection number as seen in 'info connections' and 'info inferiors'. New method gdb.Frame.level() which returns the stack level of the frame object. New method gdb.PendingFrame.level() which returns the stack level of the frame object. When hitting a catchpoint, the Python API will now emit a gdb.BreakpointEvent rather than a gdb.StopEvent. The gdb.Breakpoint attached to the event will have type BP_CATCHPOINT. Python TUI windows can now receive mouse click events. If the Window object implements the click method, it is called for each mouse click event in this window. New setting "python ignore-environment on|off"; if "on", causes GDB's builtin Python to ignore any environment variable that would otherwise affect how Python behaves (needs to be set during "early initialization"; see above). New setting "python dont-write-bytecode auto|on|off". Guile API enhancements: Improved support for rvalue reference values. New procedures for obtaining value variants: value-reference-value, value-rvalue-reference-value and value-const-value. New "qMemTags" and "QMemTags" remote protocol packets (associated with Memory Tagging).
GDB will now look for the .gdbinit file in a config directory before looking for ~/.gdbinit. The file is searched for in the following locations: $XDG_CONFIG_HOME/gdb/gdbinit, $HOME/.config/gdb/gdbinit, $HOME/.gdbinit. On Apple hosts the search order is instead: $HOME/Library/Preferences/gdb/gdbinit, $HOME/.gdbinit. The "break [...] if CONDITION" command no longer returns an error when the condition is invalid at one or more locations. Instead, if the condition is valid at one or more locations, the locations where the condition is not valid are disabled. The behavior of the "condition" command is changed to match the new behavior of the "break" command. Support for general memory tagging functionality (currently limited to AArch64 MTE) Core file debugging is now supported for x86_64 Cygwin programs. New "org.gnu.gdb.riscv.vector" feature for RISC-V targets. GDB now supports fixed point types which are described in DWARF as base types with a fixed-point encoding. Additionally, support for the DW_AT_GNU_numerator and DW_AT_GNU_denominator attributes has been added. Miscellaneous: New "startup-quietly on|off" setting; when "on", behaves the same as passing the "-silent" option on the command line. New "print type hex on|off" setting; when 'on', the 'ptype' command uses hexadecimal notation to print sizes and offsets of struct members. When 'off', decimal notation is used. The "inferior" command, when run without an argument, prints information about the current inferior. The "ptype" command now supports "/x" and "/d", affecting the base used to print sizes and offsets. The output of the "info source" command has been restructured. New "style version foreground | background | intensity" commands to control the styling of the GDB version number. Various debug and maintenance commands (mostly useful for the GDB developers) See the NEWS file for a more complete and detailed list of what this release includes.
@ text @d41 1 a41 1 - The allow list of textual bytecodes: d45 1 a45 1 - The block list of undesired, non-textual bytecodes: d48 2 a49 2 If a file contains at least one byte that belongs to the allow list and no byte that belongs to the block list, then the file is categorized as d87 1 a87 1 one or more block-listed codes, either by mistake or by peculiar design d89 1 a89 1 of block-listed codes would provide an increased recall (i.e. more true @