summaryrefslogtreecommitdiff
path: root/compat/regex
AgeCommit message (Collapse)Author
2020-04-27compat/regex: move stdlib.h up in inclusion chainĐoàn Trần Công Danh
In Linux with musl libc, we have this inclusion chain: compat/regex/regex.c:69 `-> compat/regex/regex_internal.h `-> /usr/include/stdlib.h `-> /usr/include/features.h `-> /usr/include/alloca.h In that inclusion chain, `<features.h>` claims it's _BSD_SOURCE compatible when it's NOT asked to be either {_POSIX,_GNU,_XOPEN,_BSD}_SOURCE, or __STRICT_ANSI__. And, `<stdlib.h>` will include `<alloca.h>` to be compatible with software written for GNU and BSD. Thus, redefine `alloca` macro, which was defined before at compat/regex/regex.c:66. Considering this is only compat code, we've taken from other project, it's not our business to decide which source should we adhere to. Include `<stdlib.h>` early to prevent the redefinition of alloca. This also remove a potential warning about alloca not defined on: #undef alloca Helped-by: Ramsay Jones <ramsay@ramsayjones.plus.com> Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-30Merge branch 'jk/asan-build-fix'Junio C Hamano
Work around test breakages caused by custom regex engine used in libasan, when address sanitizer is used with more recent versions of gcc and clang. * jk/asan-build-fix: Makefile: use compat regex with SANITIZE=address
2020-01-16Makefile: use compat regex with SANITIZE=addressJeff King
Recent versions of the gcc and clang Address Sanitizer produce test failures related to regexec(). This triggers with gcc-10 and clang-8 (but not gcc-9 nor clang-7). Running: make CC=gcc-10 SANITIZE=address test results in failures in t4018, t3206, and t4062. The cause seems to be that when built with ASan, we use a different version of regexec() than normal. And this version doesn't understand the REG_STARTEND flag. Here's my evidence supporting that. The failure in t4062 is an ASan warning: expecting success of 4062.2 '-G matches': git diff --name-only -G "^(0{64}){64}$" HEAD^ >out && test 4096-zeroes.txt = "$(cat out)" ================================================================= ==672994==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x7fa76f672000 at pc 0x7fa7726f75b6 bp 0x7ffe41bdda70 sp 0x7ffe41bdd220 READ of size 4097 at 0x7fa76f672000 thread T0 #0 0x7fa7726f75b5 (/lib/x86_64-linux-gnu/libasan.so.6+0x4f5b5) #1 0x562ae0c9c40e in regexec_buf /home/peff/compile/git/git-compat-util.h:1117 #2 0x562ae0c9c40e in diff_grep /home/peff/compile/git/diffcore-pickaxe.c:52 #3 0x562ae0c9cc28 in pickaxe_match /home/peff/compile/git/diffcore-pickaxe.c:166 [...] In this case we're looking in a buffer which was mmap'd via reuse_worktree_file(), and whose size is 4096 bytes. But libasan's regex tries to look at byte 4097 anyway! If we tweak Git like this: diff --git a/diff.c b/diff.c index 8e2914c031..cfae60c120 100644 --- a/diff.c +++ b/diff.c @@ -3880,7 +3880,7 @@ static int reuse_worktree_file(struct index_state *istate, */ if (ce_uptodate(ce) || (!lstat(name, &st) && !ie_match_stat(istate, ce, &st, 0))) - return 1; + return 0; return 0; } to use a regular buffer (with a trailing NUL) instead of an mmap, then the complaint goes away. The other failures are actually diff output with an incorrect funcname header. If I instrument xdiff to show the funcname matching like so: diff --git a/xdiff-interface.c b/xdiff-interface.c index 8509f9ea22..f6c3dc1986 100644 --- a/xdiff-interface.c +++ b/xdiff-interface.c @@ -197,6 +197,7 @@ struct ff_regs { struct ff_reg { regex_t re; int negate; + char *printable; } *array; }; @@ -218,7 +219,12 @@ static long ff_regexp(const char *line, long len, for (i = 0; i < regs->nr; i++) { struct ff_reg *reg = regs->array + i; - if (!regexec_buf(&reg->re, line, len, 2, pmatch, 0)) { + int ret = regexec_buf(&reg->re, line, len, 2, pmatch, 0); + warning("regexec %s:\n regex: %s\n buf: %.*s", + ret == 0 ? "matched" : "did not match", + reg->printable, + (int)len, line); + if (!ret) { if (reg->negate) return -1; break; @@ -264,6 +270,7 @@ void xdiff_set_find_func(xdemitconf_t *xecfg, const char *value, int cflags) expression = value; if (regcomp(&reg->re, expression, cflags)) die("Invalid regexp to look for hunk header: %s", expression); + reg->printable = xstrdup(expression); free(buffer); value = ep + 1; } then when compiling with ASan and gcc-10, running the diff from t4018.66 produces this: $ git diff -U1 cpp-skip-access-specifiers warning: regexec did not match: regex: ^[ ]*[A-Za-z_][A-Za-z_0-9]*:[[:space:]]*($|/[/*]) buf: private: warning: regexec matched: regex: ^((::[[:space:]]*)?[A-Za-z_].*)$ buf: private: diff --git a/cpp-skip-access-specifiers b/cpp-skip-access-specifiers index 4d4a9db..ebd6f42 100644 --- a/cpp-skip-access-specifiers +++ b/cpp-skip-access-specifiers @@ -6,3 +6,3 @@ private: void DoSomething(); int ChangeMe; }; void DoSomething(); - int ChangeMe; + int IWasChanged; }; That first regex should match (and is negated, so it should be telling us _not_ to match "private:"). But it wouldn't if regexec() is looking at the whole buffer, and not just the length-limited line we've fed to regexec_buf(). So this is consistent again with REG_STARTEND being ignored. The correct output (compiling without ASan, or gcc-9 with Asan) looks like this: warning: regexec matched: regex: ^[ ]*[A-Za-z_][A-Za-z_0-9]*:[[:space:]]*($|/[/*]) buf: private: [...more lines that we end up not using...] warning: regexec matched: regex: ^((::[[:space:]]*)?[A-Za-z_].*)$ buf: class RIGHT : public Baseclass diff --git a/cpp-skip-access-specifiers b/cpp-skip-access-specifiers index 4d4a9db..ebd6f42 100644 --- a/cpp-skip-access-specifiers +++ b/cpp-skip-access-specifiers @@ -6,3 +6,3 @@ class RIGHT : public Baseclass void DoSomething(); - int ChangeMe; + int IWasChanged; }; So it really does seem like libasan's regex engine is ignoring REG_STARTEND. We should be able to work around it by compiling with NO_REGEX, which would use our local regexec(). But to make matters even more interesting, this isn't enough by itself. Because ASan has support from the compiler, it doesn't seem to intercept our call to regexec() at the dynamic library level. It actually recognizes when we are compiling a call to regexec() and replaces it with ASan-specific code at that point. And unlike most of our other compat code, where we might have git_mmap() or similar, the actual symbol name in the compiled compat/regex code is regexec(). So just compiling with NO_REGEX isn't enough; we still end up in libasan! We can work around that by having the preprocessor replace regexec with git_regexec (both in the callers and in the actual implementation), and we truly end up with a call to our custom regex code, even when compiling with ASan. That's probably a good thing to do anyway, as it means anybody looking at the symbols later (e.g., in a debugger) would have a better indication of which function is which. So we'll do the same for the other common regex functions (even though just regexec() is enough to fix this ASan problem). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-10Fix spelling errors in no-longer-updated-from-upstream modulesElijah Newren
We have several modules originally taken from some upstream source, and which as far as I can tell we no longer update from the upstream anymore. As such, I have not submitted these spelling fixes to any external projects but just include them directly here. Reported-by: Jens Schleusener <Jens.Schleusener@fossies.org> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-03compat/regex/regcomp.c: define intptr_t and uintptr_t on NonStopRandall S. Becker
The system definition header files on HPE NonStop do not define intptr_t and uintptr_t as do other platforms. These typedefs are added specifically wrapped in a __TANDEM ifdef. Signed-off-by: Randall S. Becker <rsbecker@nexbridge.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-11-09Replace Free Software Foundation address in license noticesTodd Zullinger
The mailing address for the FSF has changed over the years. Rather than updating the address across all files, refer readers to gnu.org, as the GNU GPL documentation now suggests for license notices. The mailing address is retained in the full license files (COPYING and LGPL-2.1). The old address is still present in t/diff-lib/COPYING. This is intentional, as the file is used in tests and the contents are not expected to change. Signed-off-by: Todd Zullinger <tmz@pobox.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-10-09cleanup: fix possible overflow errors in binary searchDerrick Stolee
A common mistake when writing binary search is to allow possible integer overflow by using the simple average: mid = (min + max) / 2; Instead, use the overflow-safe version: mid = min + (max - min) / 2; This translation is safe since the operation occurs inside a loop conditioned on "min < max". The included changes were found using the following git grep: git grep '/ *2;' '*.c' Making this cleanup will prevent future review friction when a new binary search is contructed based on existing code. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-27Spelling fixesVille Skyttä
Signed-off-by: Ville Skyttä <ville.skytta@iki.fi> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-06-07regex: fix a SIZE_MAX macro redefinition warningRamsay Jones
Since commit 56a1a3ab ("Silence GCC's \"cast of pointer to integer of a different size\" warning", 26-10-2015), sparse has been issuing a macro redefinition warning for the SIZE_MAX macro. However, gcc did not issue any such warning. After commit 56a1a3ab, in terms of the order of #includes and #defines, the code looked something like: $ cat -n junk.c 1 #include <stddef.h> 2 3 #define SIZE_MAX ((size_t) -1) 4 5 #include <stdint.h> 6 7 int main(int argc, char *argv[]) 8 { 9 return 0; 10 } $ $ gcc junk.c $ However, if you compile that file with -Wsystem-headers, then it will also issue a warning. Having set -Wsystem-headers in CFLAGS, using the config.mak file, then (on cygwin): $ make compat/regex/regex.o CC compat/regex/regex.o In file included from /usr/lib/gcc/x86_64-pc-cygwin/4.9.3/include/stdint.h:9:0, from compat/regex/regcomp.c:21, from compat/regex/regex.c:77: /usr/include/stdint.h:362:0: warning: "SIZE_MAX" redefined #define SIZE_MAX (__SIZE_MAX__) ^ In file included from compat/regex/regex.c:69:0: compat/regex/regex_internal.h:108:0: note: this is the location of the previous definition # define SIZE_MAX ((size_t) -1) ^ $ The compilation of the compat/regex code is somewhat unusual in that the regex.c file directly #includes the other c files (regcomp.c, regexec.c and regex_internal.c). Commit 56a1a3ab added an #include of <stdint.h> to the regcomp.c file, which results in the redefinition, since this is included after the regex_internal.h header. This header file contains a 'fallback' definition for SIZE_MAX, in order to support systems which do not have the <stdint.h> header (the HAVE_STDINT_H macro is not defined). In order to suppress the warning, we move the #include of <stdint.h> from regcomp.c to the start of the compilation unit, close to the top of regex.c, prior to the #include of the regex_internal.h header. Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-10-26Silence GCC's "cast of pointer to integer of a different size" warningJohannes Schindelin
When calculating hashes from pointers, it actually makes sense to cut off the most significant bits. In that case, said warning does not make a whole lot of sense. So let's just work around it by casting the pointer first to intptr_t and then casting up/down to the final integral type. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-10-16C: have space around && and || operatorsJunio C Hamano
Correct all hits from git grep -e '\(&&\|||\)[^ ]' -e '[^ ]\(&&\|||\)' -- '*.c' i.e. && or || operators that are followed by anything but a SP, or that follow something other than a SP or a HT, so that these operators have a SP around it when necessary. We usually refrain from making this kind of a tree-wide change in order to avoid unnecessary conflicts with other "real work" patches, but in this case, the end result does not have a potentially cumbersome tree-wide impact, while this is a tree-wide cleanup. Fixes to compat/regex/regcomp.c and xdiff/xemit.c are to replace a HT immediately after && with a SP. This is based on Felipe's patch to bultin/symbolic-ref.c; I did all the finding out what other files in the whole tree need to be fixed and did the fix and also the log message while reviewing that single liner, so any screw-ups in this version are mine. Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-22typofix: in-code commentsOndřej Bílka
Signed-off-by: Ondřej Bílka <neleai@seznam.cz> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-28compat/regex/regexec.c: Fix some sparse warningsRamsay Jones
Sparse issues an "Using plain integer as NULL pointer" warning along with two "symbol was not declared. Should it be static?" type warnings for the 'merge_state_with_log' and 'find_recover_state' functions. In order to suppress the warnings, we replace the use of '0' as a null pointer constant with NULL and add the static modifier to the function definitions. Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-12compat/regex: fix spelling and grammar in commentsStefano Lattarini
Some of these were found using Lucas De Marchi's codespell tool. Others noticed by Eric Sunshine. Helped-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Stefano Lattarini <stefano.lattarini@gmail.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Acked-by: Matthieu Moy <Matthieu.Moy@imag.fr> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-08-26Fix compat/regex ANSIfication on MinGWJohannes Sixt
compat/regexec.c had a weird combination of function declaration in ANSI style and function definition in K&R style, for example: static unsigned re_copy_regs (struct re_registers *regs, regmatch_t *pmatch, int nregs, int regs_allocated) internal_function; static unsigned re_copy_regs (regs, pmatch, nregs, regs_allocated) struct re_registers *regs; regmatch_t *pmatch; int nregs, regs_allocated; { ... } with this #define: #ifndef _LIBC # ifdef __i386__ # define internal_function __attribute ((regparm (3), stdcall)) # else # define internal_function # endif #endif The original version as shown above was fine, but with the ANSIfied function definition and in the case where internal_function is not empty, gcc identifies the declaration and definition as different and bails out. Adding internal_function to the definition doesn't help (it results in a syntax error); hence, remove it from the subset of declarations that gcc flags as erroneous. Signed-off-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-08-19compat/regex: get rid of old-style definitionJunio C Hamano
These files mostly used ANSI style function definitions, but with small number of old-style ones. Convert them to consistently use ANSI style. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-08-19compat/regex: define out variables only used under RE_ENABLE_I18NÆvar Arnfjörð Bjarmason
Wrap variables that were only used RE_ENABLE_I18N in `#ifdef RE_ENABLE_I18N`. This eliminates compiler warnings when compiling with NO_REGEX=YesPlease. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-08-18Change regerror() declaration from K&R style to ANSI C (C89)Frank Li
The MSVC headers typedef errcode as int, and thus confused the compiler in the K&R style definition. ANSI style deconfuses it. This patch was originally applied as v1.6.5-rc2~23 but needs to be re-applied since compat/regex was overwritten by Ævar Arnfjörð Bjarmason with the gawk regex engine. Signed-off-by: Frank Li <lznuaa@gmail.com> Signed-off-by: Marius Storm-Olsen <mstormo@gmail.com> Acked-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-08-18compat/regex: get the gawk regex engine to compile within gitÆvar Arnfjörð Bjarmason
We need to define -DGAWK -DNO_MBSUPPORT so that the gawk regex engine will compile, and include stdio.h and stddef.h in regex.h. Gawk itself includes these headers before it includes the regex.h header. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Acked-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-08-18compat/regex: use the regex engine from gawk for compatÆvar Arnfjörð Bjarmason
Change the regex engine in compat to use the gawk engine from the gawk-devel module in gawk CVS. This engine supports the REG_STARTEND flag, which was optionally available in Git since v1.7.2-rc0~77^2~1. The source was grabbed from cvs.savannah.gnu.org:/sources/gawk, and these are the upstream versions of the files being included: regcomp.c 1.4 regex.h 1.3 regex.h 1.3 regex_internal.c 1.3 regex_internal.h 1.3 regexec.c 1.3 Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Acked-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-01-23msvc: Fix a compiler warning due to an incorrect pointer castRamsay Jones
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk> Acked-by: Sebastian Schuberth <sschuberth@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-09-19Change regerror() declaration from K&R style to ANSI C (C89)Frank Li
The MSVC headers typedef errcode as int, and thus confused the compiler in the K&R style definition. ANSI style deconfuses it. Signed-off-by: Frank Li <lznuaa@gmail.com> Signed-off-by: Marius Storm-Olsen <mstormo@gmail.com> Acked-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-04-23Fix typos / spelling in commentsMike Ralphson
Signed-off-by: Mike Ralphson <mike@abacus.co.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-09-10Use compatibility regex library for OSX/DarwinArjen Laarhoven
The standard libc regex library on OSX does not support alternation in POSIX Basic Regular Expression mode. This breaks the diff.funcname functionality on OSX. To fix this, we use the GNU regex library which is already present in the compat/ diretory for the MinGW port. However, simply adding compat/ to the COMPAT_CFLAGS variable causes a conflict between the system fnmatch.h and the one present in compat/. To remedy this, move the regex and fnmatch functionality to their own subdirectories in compat/ so they can be included seperately. Signed-off-by: Arjen Laarhoven <arjen@yaph.org> Tested-by: Mike Ralphson <mike@abacus.co.uk> (AIX) Tested-by: Johannes Sixt <johannes.sixt@telecom.at> (MinGW) Signed-off-by: Junio C Hamano <gitster@pobox.com>