author Torsten Bögershausen <tboegi@web.de> 2019-01-30 15:01:52 (GMT)
committer Junio C Hamano <gitster@pobox.com> 2019-01-31 18:27:52 (GMT)
commit aab2a1ae48ff65781a5379a01a4abb4f75e5641d
tree d194277526a0050dff89f418ef2d154999eca31d /t/t0028-working-tree-encoding.sh
parent 0d0ac3826a3bbb9247e39e12623bbcfdd722f24c
Support working-tree-encoding "UTF-16LE-BOM"

Users who want UTF-16 files in the working tree set the .gitattributes
like this:

    test.txt working-tree-encoding=UTF-16

The Unicode standard itself defines 3 allowed ways to encode UTF-16.
All 3 of the following versions convert back to 'g' 'i' 't' in UTF-8:

a) UTF-16, without BOM, big endian:
    $ printf "\000g\000i\000t" | iconv -f UTF-16 -t UTF-8 | od -c
    0000000   g   i   t

b) UTF-16, with BOM, little endian:
    $ printf "\377\376g\000i\000t\000" | iconv -f UTF-16 -t UTF-8 | od -c
    0000000   g   i   t

c) UTF-16, with BOM, big endian:
    $ printf "\376\377\000g\000i\000t" | iconv -f UTF-16 -t UTF-8 | od -c
    0000000   g   i   t

Git uses libiconv to convert from UTF-8 in the index into UTF-16 in the
working tree. After a checkout, the resulting file has a BOM and is
encoded in "UTF-16", in version (c) above. This is what iconv generates;
more details follow below.

iconv (and libiconv) can generate UTF-16, UTF-16LE or UTF-16BE:

d) UTF-16
    $ printf 'git' | iconv -f UTF-8 -t UTF-16 | od -c
    0000000 376 377  \0   g  \0   i  \0   t

e) UTF-16LE
    $ printf 'git' | iconv -f UTF-8 -t UTF-16LE | od -c
    0000000   g  \0   i  \0   t  \0

f) UTF-16BE
    $ printf 'git' | iconv -f UTF-8 -t UTF-16BE | od -c
    0000000  \0   g  \0   i  \0   t

There is no way to generate version (b) from above in a Git working tree,
but that is what some applications need. (All fully Unicode-aware
applications should be able to read all 3 variants, but in practice we
are not there yet.)

When producing UTF-16 as an output, iconv generates the big endian version
with a BOM (big endian is probably chosen for historical reasons).

iconv can produce little endian UTF-16 files by using "UTF-16LE" as the
encoding, but such a file does not have a BOM.

Not all users (especially under Windows) are happy with this. Some tools
are not fully Unicode aware and can only handle version (b).

Today there is no way to produce version (b) with iconv (or libiconv).
Looking into the history of iconv, it seems as if version (c) will be
used in all future iconv versions (for compatibility reasons).

Solve this dilemma and introduce a Git-specific "UTF-16LE-BOM".
libiconv cannot handle the encoding, so Git picks it up, handles the BOM
and uses libiconv to convert the rest of the stream.
(UTF-16BE-BOM is added for consistency.)

Reported-by: Adrián Gimeno Balaguer <adrigibal@gmail.com>
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

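The approach described in the commit message (handle the BOM separately, let iconv convert the rest) can be imitated from the command line. The following is a minimal sketch, not Git's actual implementation; the helper name to_utf16le_bom is hypothetical and only serves as an illustration. It assumes an iconv that supports "UTF-16LE" and writes no BOM for it, as shown in example (e).

```shell
# Sketch only: emulate a "UTF-16LE-BOM" encoder with plain iconv.
# to_utf16le_bom is a hypothetical helper name, not part of Git or iconv.
to_utf16le_bom () {
	# Emit the little endian BOM (bytes 0xFF 0xFE) first ...
	printf '\377\376' &&
	# ... then convert stdin; UTF-16LE output carries no BOM of its own.
	iconv -f UTF-8 -t UTF-16LE
}

# Dump the resulting bytes in hex to inspect the BOM and the payload.
printf 'git' | to_utf16le_bom | od -An -tx1
```

The dumped bytes should be ff fe followed by 67 00 69 00 74 00, which is version (b) from the list above.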
Diffstat (limited to 't/t0028-working-tree-encoding.sh')
-rwxr-xr-x t/t0028-working-tree-encoding.sh 12
1 files changed, 11 insertions, 1 deletions
diff --git a/t/t0028-working-tree-encoding.sh b/t/t0028-working-tree-encoding.sh
index 7e87b5a..e58ecbf 100755
--- a/t/t0028-working-tree-encoding.sh
+++ b/t/t0028-working-tree-encoding.sh
@@ -11,9 +11,12 @@ test_expect_success 'setup test files' '
text="hallo there!\ncan you read me?" &&
echo "*.utf16 text working-tree-encoding=utf-16" >.gitattributes &&
+ echo "*.utf16lebom text working-tree-encoding=UTF-16LE-BOM" >>.gitattributes &&
printf "$text" >test.utf8.raw &&
printf "$text" | iconv -f UTF-8 -t UTF-16 >test.utf16.raw &&
printf "$text" | iconv -f UTF-8 -t UTF-32 >test.utf32.raw &&
+ printf "\377\376" >test.utf16lebom.raw &&
+ printf "$text" | iconv -f UTF-8 -t UTF-16LE >>test.utf16lebom.raw &&
# Line ending tests
printf "one\ntwo\nthree\n" >lf.utf8.raw &&
@@ -32,7 +35,8 @@ test_expect_success 'setup test files' '
# Add only UTF-16 file, we will add the UTF-32 file later
cp test.utf16.raw test.utf16 &&
cp test.utf32.raw test.utf32 &&
- git add .gitattributes test.utf16 &&
+ cp test.utf16lebom.raw test.utf16lebom &&
+ git add .gitattributes test.utf16 test.utf16lebom &&
git commit -m initial
'
@@ -51,6 +55,12 @@ test_expect_success 're-encode to UTF-16 on checkout' '
test_cmp_bin test.utf16.raw test.utf16
'
+test_expect_success 're-encode to UTF-16LE-BOM on checkout' '
+ rm test.utf16lebom &&
+ git checkout test.utf16lebom &&
+ test_cmp_bin test.utf16lebom.raw test.utf16lebom
+'
+
test_expect_success 'check $GIT_DIR/info/attributes support' '
test_when_finished "rm -f test.utf32.git" &&
test_when_finished "git reset --hard HEAD" &&