path: root/object.c
AgeCommit message (Collapse)Author
2010-09-06Merge branch 'maint'v1.7.3-rc0Junio C Hamano
* maint: tag.c: whitespace breakages fix Fix whitespace issue in object.c t5505: add missing &&
2010-09-06Merge branch 'xx/trivial' into maintJunio C Hamano
* xx/trivial: tag.c: whitespace breakages fix Fix whitespace issue in object.c t5505: add missing &&
2010-09-06Fix whitespace issue in object.cJared Hance
Change some expanded tabs (spaces) to tabs in object.c. Signed-off-by: Jared Hance <> Signed-off-by: Junio C Hamano <>
2010-09-04Merge branch 'nd/maint-fix-replace'Junio C Hamano
* nd/maint-fix-replace: parse_object: pass on the original sha1, not the replaced one
2010-09-04parse_object: pass on the original sha1, not the replaced oneNguyễn Thái Ngọc Duy
Commit 0e87c36 (object: call "check_sha1_signature" with the replacement sha1) changed the first argument passed to parse_object_buffer() from "sha1" to "repl". With that change, the returned obj pointer has the replacement SHA1 in obj->sha1, not the original one. But when using lookup_commit() and then parse_commit() on a commit, we get an object pointer with the original sha1, but the commit content comes from the replacement commit. So the result we get from using parse_object() is different from the we get from using lookup_commit() followed by parse_commit(). It looks much simpler and safer to fix this inconsistency by passing "sha1" to parse_object_bufer() instead of "repl". The commit comment should be used to tell the the replacement commit is replacing another commit and why. So it should be easy to see that we have a replacement commit instead of an original one. And it is not a problem if the content of the commit is not consistent with the sha1 as cat-file piped to hash-object can be used to see the difference. Signed-off-by: Christian Couder <> Signed-off-by: Junio C Hamano <>
2010-04-20fix "bundle --stdin" segfaultJonathan Nieder
When passed an empty list, objects_array_remove_duplicates() corrupts it by changing the number of entries from 0 to 1. The problem lies in the condition of its main loop: for (ref = 0; ref < array->nr - 1; ref++) { The loop body manipulates the supplied object array. In the case of an empty array, it should not be doing anything at all. But array->nr is an unsigned quantity, so the code enters the loop, in particular increasing array->nr. Fix this by comparing (ref + 1 < array->nr) instead. This bug can be triggered by git bundle --stdin: $ echo HEAD | git bundle create some.bundle --stdin’ Segmentation fault (core dumped) The list of commits to bundle appears to be empty because of another bug: by the time the revision-walking machinery gets to look at it, standard input has already been consumed by rev-list, so this function gets an empty list of revisions. After this patch, git bundle --stdin still does not work; it just doesn’t segfault any more. Reported-by: Joey Hess <> Signed-off-by: Jonathan Nieder <> Signed-off-by: Junio C Hamano <>
2010-01-18object.c: remove unused functionsJunio C Hamano
object_list_append() and object_list_length}() are not used anywhere. Signed-off-by: Junio C Hamano <>
2009-06-01object: call "check_sha1_signature" with the replacement sha1Christian Couder
Otherwise we get a "sha1 mismatch" error for replaced objects. Signed-off-by: Christian Couder <> Signed-off-by: Junio C Hamano <>
2009-05-20Unify signedness in hashing callsDan McGee
Our hash_obj and hashtable_index calls and functions were doing a lot of funny things with signedness. Unify all of it to 'unsigned int'. Signed-off-by: Dan McGee <> Signed-off-by: Junio C Hamano <>
2009-05-17Fix type-punning issuesDan McGee
In these two places we are casting part of our unsigned char sha1 array into an unsigned int, which violates GCCs strict-aliasing rules (and probably other compilers). Signed-off-by: Dan McGee <> Signed-off-by: Junio C Hamano <>
2009-01-18bundle: allow the same ref to be given more than onceJunio C Hamano
"git bundle create x master master" used to create a bundle that lists the same branch (master) twice. Cloning from such a bundle resulted in a needless warning "warning: Duplicated ref: refs/remotes/origin/master". Signed-off-by: Junio C Hamano <>
2008-02-04parse_object_buffer: don't ignore errors from the object specific parsing ↵Martin Koegler
functions In the case of an malformed object, the object specific parsing functions would return an error, which is currently ignored. The object can be partial initialized in this case. This patch make parse_object_buffer propagate such errors. Signed-off-by: Martin Koegler <> Signed-off-by: Junio C Hamano <>
2007-12-22Don't dereference NULL upon lookup failure.Jim Meyering
Instead, signal the error just like the case we do upon encountering an object with an unknown type. Signed-off-by: Jim Meyering <> Signed-off-by: Junio C Hamano <>
2007-06-06Don't assume tree entries that are not dirs are blobsSam Vilain
When scanning the trees in track_tree_refs() there is a "lazy" test that assumes that entries are either directories or files. Don't do that. Signed-off-by: Junio C Hamano <>
2007-05-25Merge branch 'maint-1.5.1' into maintJunio C Hamano
* maint-1.5.1: fix memory leak in parse_object when check_sha1_signature fails name-rev: tolerate clock skew in committer dates
2007-05-25fix memory leak in parse_object when check_sha1_signature failsCarlos Rica
When check_sha1_signature fails, program is not terminated: it prints an error message and returns NULL, so the buffer returned by read_sha1_file should be freed before. Signed-off-by: Carlos Rica <> Signed-off-by: Junio C Hamano <>
2007-04-24add add_object_array_with_modeMartin Koegler
Each object in struct object_array is extended with the mode. If not specified, S_IFINVALID is used. An object with an mode value can be added with add_object_array_with_mode. Signed-off-by: Martin Koegler <> Signed-off-by: Junio C Hamano <>
2007-04-17Clean up object creation to use more common codeLinus Torvalds
This replaces the fairly odd "created_object()" function that did _most_ of the object setup with a more complete "create_object()" function that also has a more natural calling convention. Signed-off-by: Linus Torvalds <> Signed-off-by: Junio C Hamano <>
2007-04-17Use proper object allocators for unknown object nodes tooLinus Torvalds
We used to use a different allocator scheme for when we didn't know the object type. That meant that objects that were created without any up-front knowledge of the type would not go through the same allocation paths as normal object allocations, and would miss out on the statistics. But perhaps more importantly than the statistics (that are useful when looking at memory usage but not much else), if we want to make the object hash tables use a denser object pointer representation, we need to make sure that they all go through the same blocking allocator. Signed-off-by: Linus Torvalds <> Signed-off-by: Junio C Hamano <>
2007-03-21Don't ever return corrupt objects from "parse_object()"Linus Torvalds
Looking at the SHA1 validation code due to the corruption that Alexander Litvinov is seeing under Cygwin, I notice that one of the most central places where we read objects, we actually do end up verifying the SHA1 of the result, but then we happily parse it anyway. And using "printf" to write the error message means that it not only can get lost, but will actually mess up stdout, and cause other strange and hard-to-debug failures downstream. Signed-off-by: Linus Torvalds <> Signed-off-by: Junio C Hamano <>
2007-02-27get rid of lookup_object_type()Nicolas Pitre
This function is called only once in the whole source tree. Let's move its code inline instead, which is also in the spirit of removing as much object type char arrays as possible (not that this patch does anything for that but at least it is now a local matter). Signed-off-by: Nicolas Pitre <> Signed-off-by: Junio C Hamano <>
2007-02-27convert object type handling from a string to a numberNicolas Pitre
We currently have two parallel notation for dealing with object types in the code: a string and a numerical value. One of them is obviously redundent, and the most used one requires more stack space and a bunch of strcmp() all over the place. This is an initial step for the removal of the version using a char array found in object reading code paths. The patch is unfortunately large but there is no sane way to split it in smaller parts without breaking the system. Signed-off-by: Nicolas Pitre <> Signed-off-by: Junio C Hamano <>
2007-02-27formalize typename(), and add its reverse type_from_string()Nicolas Pitre
Sometime typename() is used, sometimes type_names[] is accessed directly. Let's enforce typename() all the time which allows for validating the type. Also let's add a function to go from a name to a type and use it instead of manual memcpy() when appropriate. Signed-off-by: Nicolas Pitre <> Signed-off-by: Junio C Hamano <>
2006-09-16Add git-for-each-ref: helper for language bindingsJunio C Hamano
This adds a new command, git-for-each-ref. You can have it iterate over refs and have it output various aspects of the objects they refer to. Signed-off-by: Junio C Hamano <>
2006-08-28Use xcalloc instead of callocJonas Fonseca
Signed-off-by: Jonas Fonseca <> Signed-off-by: Junio C Hamano <>
2006-08-23Convert memcpy(a,b,20) to hashcpy(a,b).Shawn Pearce
This abstracts away the size of the hash values when copying them from memory location to memory location, much as the introduction of hashcmp abstracted away hash value comparsion. A few call sites were using char* rather than unsigned char* so I added the cast rather than open hashcpy to be void*. This is a reasonable tradeoff as most call sites already use unsigned char* and the existing hashcmp is also declared to be unsigned char*. [jc: Splitted the patch to "master" part, to be followed by a patch for merge-recursive.c which is not in "master" yet. Fixed the cast in the latter hunk to combine-diff.c which was wrong in the original. Also converted ones left-over in combine-diff.c, diff-lib.c and upload-pack.c ] Signed-off-by: Shawn O. Pearce <> Signed-off-by: Junio C Hamano <>
2006-08-17Do not use memcmp(sha1_1, sha1_2, 20) with hardcoded length.David Rientjes
Introduces global inline: hashcmp(const unsigned char *sha1, const unsigned char *sha2) Uses memcmp for comparison and returns the result based on the length of the hash name (a future runtime decision). Acked-by: Alex Riesen <> Signed-off-by: David Rientjes <> Signed-off-by: Junio C Hamano <>
2006-07-13Remove TYPE_* constant macros and use object_type enums consistently.Linus Torvalds
This updates the type-enumeration constants introduced to reduce the memory footprint of "struct object" to match the type bits already used in the packfile format, by removing the former (i.e. TYPE_* constant macros) and using the latter (i.e. enum object_type) throughout the code for consistency. Eventually we can stop passing around the "type strings" entirely, and this will help - no confusion about two different integer enumeration. Signed-off-by: Linus Torvalds <> Signed-off-by: Junio C Hamano <>
2006-07-02git object hash cleanupsv1.4.1Linus Torvalds
This IMNSHO cleans up the object hashing. The hash expansion is separated out into a function of its own, the hash array (and size) names are made more obvious, and the code is generally made to look a bit more like the object-ref hashing. It also gets rid of "find_object()" returning an index (or negative position if no object is found), since that is made redundant by the simplified object rehashing. The basic operation is now "lookup_object()" which just returns the object itself. There's an almost unmeasurable speed increase, but more importantly, I think the end result is more readable. Signed-off-by: Linus Torvalds <> Signed-off-by: Junio C Hamano <>
2006-06-30Abstract out accesses to object hash arrayLinus Torvalds
There are a few special places where some programs accessed the object hash array directly, which bothered me because I wanted to play with some simple re-organizations. So this patch makes the object hash array data structures all entirely local to object.c, and the few users who wanted to look at it now get to use a function to query how many object index entries there can be, and to actually access the array. Signed-off-by: Linus Torvalds <> Signed-off-by: Junio C Hamano <>
2006-06-20Add "named object array" conceptLinus Torvalds
We've had this notion of a "object_list" for a long time, which eventually grew a "name" member because some users (notably git-rev-list) wanted to name each object as it is generated. That object_list is great for some things, but it isn't all that wonderful for others, and the "name" member is generally not used by everybody. This patch splits the users of the object_list array up into two: the traditional list users, who want the list-like format, and who don't actually use or want the name. And another class of users that really used the list as an extensible array, and generally wanted to name the objects. The patch is fairly straightforward, but it's also biggish. Most of it really just cleans things up: switching the revision parsing and listing over to the array makes things like the builtin-diff usage much simpler (we now see exactly how many members the array has, and we don't get the objects reversed from the order they were on the command line). One of the main reasons for doing this at all is that the malloc overhead of the simple object list was actually pretty high, and the array is just a lot denser. So this patch brings down memory usage by git-rev-list by just under 3% (on top of all the other memory use optimizations) on the mozilla archive. It does add more lines than it removes, and more importantly, it adds a whole new infrastructure for maintaining lists of objects, but on the other hand, the new dynamic array code is pretty obvious. The change to builtin-diff-tree.c shows a fairly good example of why an array interface is sometimes more natural, and just much simpler for everybody. Signed-off-by: Linus Torvalds <> Signed-off-by: Junio C Hamano <>
2006-06-18Remove "refs" field from "struct object"Linus Torvalds
This shrinks "struct object" to the absolutely minimal size possible. It now contains /only/ the object flags and the SHA1 hash name of the object. The "refs" field, which is really needed only for fsck, is maintained in a separate hashed lookup-table, allowing all normal users to totally ignore it. This helps memory usage, although not as much as I hoped: it looks like the allocation overhead of malloc (and the alignment constraints in particular) means that while the structure size shrinks, the actual allocation overhead mostly does not. [ That said: memory usage is actually down, but not as much as it should be: I suspect just one of the object types actually ended up shrinking its effective allocation size. To get to the next level, we probably need specialized allocators that don't pad the allocation more than necessary. ] The separation makes for some code cleanup, though, and makes the ref tracking that fsck wants a clearly separate thing. Signed-off-by: Linus Torvalds <> Signed-off-by: Junio C Hamano <>
2006-06-18Shrink "struct object" a bitLinus Torvalds
This shrinks "struct object" by a small amount, by getting rid of the "struct type *" pointer and replacing it with a 3-bit bitfield instead. In addition, we merge the bitfields and the "flags" field, which incidentally should also remove a useless 4-byte padding from the object when in 64-bit mode. Now, our "struct object" is still too damn large, but it's now less obviously bloated, and of the remaining fields, only the "util" (which is not used by most things) is clearly something that should be eventually discarded. This shrinks the "git-rev-list --all" memory use by about 2.5% on the kernel archive (and, perhaps more importantly, on the larger mozilla archive). That may not sound like much, but I suspect it's more on a 64-bit platform. There are other remaining inefficiencies (the parent lists, for example, probably have horrible malloc overhead), but this was pretty obvious. Most of the patch is just changing the comparison of the "type" pointer from one of the constant string pointers to the appropriate new TYPE_xxx small integer constant. Signed-off-by: Linus Torvalds <> Signed-off-by: Junio C Hamano <>
2006-05-30Make "tree_entry" have a SHA1 instead of a union of object pointersLinus Torvalds
This is preparatory work for further cleanups, where we try to make tree_entry look more like the more efficient tree-walk descriptor. Signed-off-by: Linus Torvalds <> Signed-off-by: Junio C Hamano <>
2006-05-30Make "struct tree" contain the pointer to the tree bufferLinus Torvalds
This allows us to avoid allocating information for names etc, because we can just use the information from the tree buffer directly. Signed-off-by: Linus Torvalds <> Signed-off-by: Junio C Hamano <>
2006-04-04Replace xmalloc+memset(0) with xcalloc.Peter Eriksen
Signed-off-by: Peter Eriksen <> Signed-off-by: Junio C Hamano <>
2006-04-04Use blob_, commit_, tag_, and tree_type throughout.Peter Eriksen
This replaces occurences of "blob", "commit", "tag", and "tree", where they're really used as type specifiers, which we already have defined global constants for. Signed-off-by: Peter Eriksen <> Signed-off-by: Junio C Hamano <>
2006-02-12Fix object re-hashingLinus Torvalds
The hashed object lookup had a subtle bug in re-hashing: it did for (i = 0; i < count; i++) if (objs[i]) { .. rehash .. where "count" was the old hash couny. Oon the face of it is obvious, since it clearly re-hashes all the old objects. However, it's wrong. If the last old hash entry before re-hashing was in use (or became in use by the re-hashing), then when re-hashing could have inserted an object into the hash entries with idx >= count due to overflow. When we then rehash the last old entry, that old entry might become empty, which means that the overflow entries should be re-hashed again. In other words, the loop has to be fixed to either traverse the whole array, rather than just the old count. (There's room for a slight optimization: instead of counting all the way up, we can break when we see the first empty slot that is above the old "count". At that point we know we don't have any collissions that we might have to fix up any more. This patch only does the trivial fix) [jc: with trivial fix on trivial fix] Signed-off-by: Linus Torvalds <> Signed-off-by: Junio C Hamano <>
2006-02-12hashtable-based objects: minimum fixups.Junio C Hamano
Calling hashtable_index from find_object before objs is created would result in division by zero failure. Avoid it. Also the given object name may not be aligned suitably for unsigned int; avoid dereferencing casted pointer. Signed-off-by: Junio C Hamano <>
2006-02-12Use a hashtable for objects instead of a sorted listJohannes Schindelin
In a simple test, this brings down the CPU time from 47 sec to 22 sec. Signed-off-by: Johannes Schindelin <> Signed-off-by: Junio C Hamano <>
2006-01-07[PATCH] Compilation: zero-length array declaration.Junio C Hamano
ISO C99 (and GCC 3.x or later) lets you write a flexible array at the end of a structure, like this: struct frotz { int xyzzy; char nitfol[]; /* more */ }; GCC 2.95 and 2.96 let you to do this with "char nitfol[0]"; unfortunately this is not allowed by ISO C90. This declares such construct like this: struct frotz { int xyzzy; char nitfol[FLEX_ARRAY]; /* more */ }; and git-compat-util.h defines FLEX_ARRAY to 0 for gcc 2.95 and empty for others. If you are using a C90 C compiler, you should be able to override this with CFLAGS=-DFLEX_ARRAY=1 from the command line of "make". Signed-off-by: Junio C Hamano <>
2005-12-07qsort() ptrdiff_t may be larger than intJunio C Hamano
Morten Welinder <> writes: > The code looks wrong. It assumes that pointers are no larger than ints. > If pointers are larger than ints, the code does not necessarily compute > a consistent ordering and qsort is allowed to do whatever it wants. > > Morten > > static int compare_object_pointers(const void *a, const void *b) > { > const struct object * const *pa = a; > const struct object * const *pb = b; > return *pa - *pb; > } Signed-off-by: Junio C Hamano <>
2005-11-15Rework object refs tracking to reduce memory usageSergey Vlasov
Store pointers to referenced objects in a variable sized array instead of linked list. This cuts down memory usage of utilities which use object references; e.g., git-fsck-objects --full on the git.git repository consumes about 2 MB of memory tracked by Massif instead of 7 MB before the change. Object refs are still the biggest consumer of memory (57%), but the malloc overhead for a single block instead of a linked list is substantially smaller. Signed-off-by: Sergey Vlasov <> Signed-off-by: Junio C Hamano <>
2005-09-16[PATCH] Avoid building object ref lists when not neededLinus Torvalds
The object parsing code builds a generic "this object references that object" because doing a full connectivity check for fsck requires it. However, nothing else really needs it, and it's quite expensive for git-rev-list that can have tons of objects in flight. So, exactly like the commit buffer save thing, add a global flag to disable it, and use it in git-rev-list. Before: $ /usr/bin/time git-rev-list --objects v2.6.12..HEAD | wc -l 12.28user 0.29system 0:12.57elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+26718minor)pagefaults 0swaps 59124 After this change: $ /usr/bin/time git-rev-list --objects v2.6.12..HEAD | wc -l 10.33user 0.18system 0:10.54elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+18509minor)pagefaults 0swaps 59124 and note how the number of pages touched by git-rev-list for this particular object list has shrunk from 26,718 (104 MB) to 18,509 (72 MB). Calculating the total object difference between two git revisions is still clearly the most expensive git operation (both in memory and CPU time), but it's now less than 40% of what it used to be. Signed-off-by: Linus Torvalds <> Signed-off-by: Junio C Hamano <>
2005-09-11[PATCH] Add function to append to an object_list.Daniel Barkalow
Signed-off-by: Daniel Barkalow <> Signed-off-by: Junio C Hamano <>
2005-08-03[PATCH] Object library
Add function to look up an object which is entirely unknown, so that it can be put in a list. Various other functions related to lists of objects. Signed-off-by: Daniel Barkalow <> Signed-off-by: Junio C Hamano <>
2005-06-27[PATCH] Remove "delta" object representation.Junio C Hamano
Packed delta files created by git-pack-objects seems to be the way to go, and existing "delta" object handling code has exposed the object representation details to too many places. Remove it while we refactor code to come up with a proper interface in sha1_file.c. Signed-off-by: Junio C Hamano <> Signed-off-by: Linus Torvalds <>
2005-06-22[PATCH] Parse tags for absent objectsDaniel Barkalow
Handle parsing a tag for a non-present object. This adds a function to lookup an object with lookup_* for * in a string, so that it can get the right storage based on the "type" line in the tag. Signed-off-by: Daniel Barkalow <> Signed-off-by: Linus Torvalds <>
2005-06-08[PATCH] Anal retentive 'const unsigned char *sha1'Jason McMullan
Make 'sha1' parameters const where possible Signed-off-by: Jason McMullan <> Signed-off-by: Linus Torvalds <>
2005-05-26Make "parse_object()" also fill in commit message buffer data.Linus Torvalds
And teach fsck to free it to save memory.