author: René Scharfe <l.s.r@web.de> 2021-12-18 19:50:02 (GMT)
committer: Junio C Hamano <gitster@pobox.com> 2021-12-20 20:45:02 (GMT)
commit: dc2c44fbb100fa609174d9069a70e2b54b0591ca
tree: ed54967a4db81bca347a379d243850586a035c80 /grep.c
parent: e9d7761bb94f20acc98824275e317fa82436c25d
grep/pcre2: use PCRE2_UTF even with ASCII patterns

compile_pcre2_pattern() currently uses the option PCRE2_UTF only for
patterns with non-ASCII characters. Patterns with ASCII wildcards can
match non-ASCII strings, though. Without that option PCRE2 mishandles
UTF-8 input, though -- it matches parts of multi-byte characters. Fix
that by using PCRE2_UTF even for ASCII-only patterns.

This is a remake of the reverted ae39ba431a (grep/pcre2: fix an edge
case concerning ascii patterns and UTF-8 data, 2021-10-15). The change
to the condition and the test are simplified and more targeted.

Original-patch-by: Hamza Mahfooz <someguy@effective-light.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

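
To illustrate what "matches parts of multi-byte characters" means, here is a minimal sketch of how many bytes a single-character wildcard such as "." would consume in byte mode versus UTF mode. The helper wildcard_len() is hypothetical and written only for this illustration; it is not part of grep.c or PCRE2 and assumes valid, NUL-terminated UTF-8 input.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of grep.c or PCRE2): number of bytes a
 * single-character wildcard consumes at the start of s.
 * In byte mode every byte is treated as one character; in UTF mode the
 * lead byte of a UTF-8 sequence determines the character's length.
 * Assumes s is valid, NUL-terminated UTF-8.
 */
static size_t wildcard_len(const char *s, int utf_mode)
{
	unsigned char c;

	assert(s != NULL);
	c = (unsigned char)s[0];
	if (c == '\0')
		return 0;	/* nothing left to match */
	if (!utf_mode || c < 0x80)
		return 1;	/* byte mode, or a plain ASCII character */
	if ((c & 0xE0) == 0xC0)
		return 2;	/* 110xxxxx: two-byte sequence */
	if ((c & 0xF0) == 0xE0)
		return 3;	/* 1110xxxx: three-byte sequence */
	if ((c & 0xF8) == 0xF0)
		return 4;	/* 11110xxx: four-byte sequence */
	return 1;		/* stray continuation byte: consume one byte */
}
```

For the two-byte character "é" (bytes 0xC3 0xA9), byte mode consumes only 0xC3 and leaves 0xA9 behind, which is the split the commit message describes. UTF mode consumes both bytes as one character.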
Diffstat (limited to 'grep.c')
-rw-r--r-- grep.c 2
1 files changed, 1 insertions, 1 deletions
diff --git a/grep.c b/grep.c
index fe847a0..5badb6d 100644
--- a/grep.c
+++ b/grep.c
@@ -382,7 +382,7 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt
 		}
 		options |= PCRE2_CASELESS;
 	}
-	if (!opt->ignore_locale && is_utf8_locale() && has_non_ascii(p->pattern) &&
+	if (!opt->ignore_locale && is_utf8_locale() &&
 	    !(!opt->ignore_case && (p->fixed || p->is_fixed)))
 		options |= (PCRE2_UTF | PCRE2_MATCH_INVALID_UTF);