From ad7c543e3b0f80befd26f4115f8fec4285a018bf Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Carlo=20Marcelo=20Arenas=20Bel=C3=B3n?=
Date: Wed, 28 Aug 2019 07:54:44 -0700
Subject: grep: skip UTF8 checks explicitly
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

18547aacf5 ("grep/pcre: support utf-8", 2016-06-25), which was released
with git 2.10, added the PCRE_UTF8 flag to PCRE1 matching, including a
call to has_non_ascii() to try to avoid breakage if there was non-UTF-8
encoded content in the haystack.

Usually PCRE is compiled with JIT support (even if it is not the
default), and therefore the codepath used includes calling
pcre_jit_exec, which skips UTF-8 validation by design (which might
result in crashes or hangs). When JIT support wasn't compiled in, we use
pcre_exec instead, with the possibility that grep might be aborted if
invalid UTF-8 is found in the haystack.

PCRE1 has provided a flag since Mar 5, 2007 that can be used to skip the
checks explicitly, so use that to make both codepaths equivalent (the
flag is ignored by pcre_jit_exec).

This fix is only implemented for PCRE1 because PCRE2 is likely to have a
better solution (without the risks) in the future.

Helped-by: Johannes Schindelin
Helped-by: Eric Sunshine
Helped-by: Ævar Arnfjörð Bjarmason
Suggested-by: Junio C Hamano
Signed-off-by: Carlo Marcelo Arenas Belón
Signed-off-by: Junio C Hamano

diff --git a/grep.c b/grep.c
index f7c3a58..b651506 100644
--- a/grep.c
+++ b/grep.c
@@ -421,7 +421,7 @@ static void compile_pcre1_regexp(struct grep_pat *p, const struct grep_opt *opt)
 static int pcre1match(struct grep_pat *p, const char *line, const char *eol,
 		regmatch_t *match, int eflags)
 {
-	int ovector[30], ret, flags = 0;
+	int ovector[30], ret, flags = PCRE_NO_UTF8_CHECK;
 
 	if (eflags & REG_NOTBOL)
 		flags |= PCRE_NOTBOL;
diff --git a/grep.h b/grep.h
index 1875880..9c8797a 100644
--- a/grep.h
+++ b/grep.h
@@ -3,6 +3,9 @@
 #include "color.h"
 #ifdef USE_LIBPCRE1
 #include <pcre.h>
+#ifndef PCRE_NO_UTF8_CHECK
+#define PCRE_NO_UTF8_CHECK 0
+#endif
 #ifdef PCRE_CONFIG_JIT
 #if PCRE_MAJOR >= 8 && PCRE_MINOR >= 32
 #ifndef NO_LIBPCRE1_JIT
-- 
cgit v0.10.2-6-g49f6
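
Note on the fallback define: the grep.h hunk defines PCRE_NO_UTF8_CHECK
as 0 when the installed header does not provide it, so the new
initializer in pcre1match() still compiles and simply contributes no
bits. The sketch below illustrates that pattern in isolation. It is not
code from git or PCRE: the DEMO_* names and the bit value are
hypothetical stand-ins, and the header is assumed to lack the flag.

```c
#include <stdio.h>

/* Hypothetical option bit standing in for PCRE_NOTBOL; the value is
 * illustrative only and is not taken from <pcre.h>. */
#define DEMO_NOTBOL 0x80

/* Assumption: the (pretend) library header is too old to define the
 * flag, so we fall back to 0, mirroring the grep.h hunk. */
#ifndef DEMO_NO_UTF8_CHECK
#define DEMO_NO_UTF8_CHECK 0
#endif

/* Build the match flags the way pcre1match() does after this patch:
 * start from the "skip UTF-8 check" flag, then OR in the others. */
static int demo_match_flags(int not_bol)
{
	int flags = DEMO_NO_UTF8_CHECK;

	if (not_bol)
		flags |= DEMO_NOTBOL;
	return flags;
}

/* Print the resulting flag words for both cases. */
static void demo_print_flags(void)
{
	printf("flags, start of line:     0x%x\n", demo_match_flags(0));
	printf("flags, not start of line: 0x%x\n", demo_match_flags(1));
}
```

With the fallback in effect, demo_match_flags() returns exactly what it
would have returned with the old "flags = 0" initializer, which is why
the define is safe on headers that predate the flag.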