summaryrefslogtreecommitdiff
path: root/split-index.h
AgeCommit message (Collapse)Author
2014-06-13split-index: the reading partNguyễn Thái Ngọc Duy
CE_REMOVE'd entries are removed here because only parts of the code base (unpack_trees in fact) test this bit when they look for the presence of an entry. Leaving them may confuse the code ignores this bit and expects to see a real entry. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13split-index: the writing partNguyễn Thái Ngọc Duy
prepare_to_write_split_index() does the major work, classifying deleted, updated and added entries. write_link_extension() then just writes it down. An observation is, deleting an entry, then adding it back is recorded as "entry X is deleted, entry X is added", not "entry X is replaced". This is simpler, with small overhead: a replaced entry is stored without its path, a new entry is store with its path. A note about unpack_trees() and the deduplication code inside prepare_to_write_split_index(). Usually tracking updated/removed entries via read-cache API is enough. unpack_trees() manipulates the index in a different way: it throws the entire source index out, builds up a new one, copying/duplicating entries (using dup_entry) from the source index over if necessary, then returns the new index. A naive solution would be marking the entire source index "deleted" and add their duplicates as new. That could bring $GIT_DIR/index back to the original size. So we try harder and memcmp() between the original and the duplicate to see if it needs updating. We could avoid memcmp() too, by avoiding duplicating the original entry in dup_entry(). The performance gain this way is within noise level and it complicates unpack-trees.c. So memcmp() is the preferred way to deal with deduplication. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13read-cache: mark updated entries for split indexNguyễn Thái Ngọc Duy
The large part of this patch just follows CE_ENTRY_CHANGED marks. replace_index_entry() is updated to update split_index->base->cache[] as well so base->cache[] does not reference to a freed entry. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13read-cache: save deleted entries in split indexNguyễn Thái Ngọc Duy
Entries that belong to the base index should not be freed. Mark CE_REMOVE to track them. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13read-cache: split-index modeNguyễn Thái Ngọc Duy
This split-index mode is designed to keep write cost proportional to the number of changes the user has made, not the size of the work tree. (Read cost is another matter, to be dealt separately.) This mode stores index info in a pair of $GIT_DIR/index and $GIT_DIR/sharedindex.<SHA-1>. sharedindex is large and unchanged over time while "index" is smaller and updated often. Format details are in index-format.txt, although not everything is implemented in this patch. Shared indexes are not automatically removed, because it's unclear if the shared index is needed by any (even temporary) indexes by just looking at it. After a while you'll collect stale shared indexes. The good news is one shared index is useable for long, until $GIT_DIR/index becomes too big and sluggish that the new shared index must be created. The safest way to clean shared indexes is to turn off split index mode, so shared files are all garbage, delete them all, then turn on split index mode again. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>