From 131b8fcbfbea0acee8d78a8f9b2b3fee4285aee5 Mon Sep 17 00:00:00 2001
From: Jeff King
Date: Sat, 26 Jan 2013 17:40:38 -0500
Subject: fetch: run gc --auto after fetching

We generally try to run "gc --auto" after any commands that might
introduce a large number of new objects. An obvious place to do so is
after running "fetch", which may introduce new loose objects or packs
(depending on the size of the fetch).

While an active developer repository will probably eventually trigger
a "gc --auto" on another action (e.g., git-rebase), there are two good
reasons why it is nicer to do it at fetch time:

  1. Read-only repositories which track an upstream (e.g., a
     continuous integration server which fetches and builds, but never
     makes new commits) will accrue loose objects and small packs, but
     never coalesce them into a more efficient larger pack.

  2. Fetching is often already perceived to be slow to the user, since
     they have to wait on the network. It's much more pleasant to
     include a potentially slow auto-gc as part of the already-long
     network fetch than in the middle of productive work with
     git-rebase or similar.

Signed-off-by: Jeff King
Signed-off-by: Junio C Hamano

diff --git a/builtin/fetch.c b/builtin/fetch.c
index f483352..cffa3f7 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -960,6 +960,9 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
 	struct string_list list = STRING_LIST_INIT_NODUP;
 	struct remote *remote;
 	int result = 0;
+	static const char *argv_gc_auto[] = {
+		"gc", "--auto", NULL,
+	};
 
 	packet_trace_identity("fetch");
 
@@ -1027,5 +1030,7 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
 	list.strdup_strings = 1;
 	string_list_clear(&list, 0);
 
+	run_command_v_opt(argv_gc_auto, RUN_GIT_CMD);
+
 	return result;
 }
--
cgit v0.10.2-6-g49f6
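For reference, a minimal standalone sketch of the pattern this patch
uses, assuming git's internal run-command API of that era
(run_command_v_opt() and the RUN_GIT_CMD flag come from run-command.h;
the helper name run_auto_gc() is hypothetical, and the threshold notes
in the comments refer to the gc.auto and gc.autoPackLimit config
variables):

    #include "cache.h"
    #include "run-command.h"

    /*
     * Spawn "git gc --auto" as a child process, mirroring what
     * cmd_fetch() does in the patch above.  RUN_GIT_CMD makes the
     * run-command API execute the argv as a git subcommand rather
     * than as a free-form external program.
     */
    static void run_auto_gc(void)
    {
            static const char *argv_gc_auto[] = {
                    "gc", "--auto", NULL,
            };

            /*
             * "gc --auto" is a no-op unless the repository has
             * crossed its configured thresholds (gc.auto loose
             * objects, default 6700; gc.autoPackLimit packs,
             * default 50), so running it after every fetch is
             * cheap in the common case.
             */
            run_command_v_opt(argv_gc_auto, RUN_GIT_CMD);
    }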
From b495697b8228fdf7cde426ff99fc5ffe7e43d343 Mon Sep 17 00:00:00 2001
From: Jeff King
Date: Sat, 26 Jan 2013 17:40:43 -0500
Subject: fetch-pack: avoid repeatedly re-scanning pack directory

When we look up a sha1 object for reading via the parse_object() =>
read_sha1_file() => read_object() call path, we first check packfiles,
and then loose objects. If we still haven't found it, we re-scan the
list of packfiles in `objects/pack`. This final step ensures that we
can coexist with a simultaneous repack process which creates a new
pack and then prunes the old object.

This extra re-scan usually does not have a performance impact for two
reasons:

  1. If an object is missing, then typically the re-scan will find a
     new pack, and then no more misses will occur. Or if it truly is
     missing, then our next step is usually to die().

  2. Re-scanning is cheap enough that we do not even notice.

However, these do not always hold. The assumption in (1) is that the
caller is expecting to find the object. This is usually the case, but
the call to `parse_object` in `everything_local` does not follow this
pattern. It is looking to see whether we have objects that the remote
side is advertising, not something we expect to have. Therefore if we
are fetching from a remote which has many refs pointing to objects we
do not have, we may end up re-scanning the pack directory many times.

Even with this extra re-scanning, the impact is often not noticeable
due to (2); we just readdir() the packs directory and skip any packs
that are already loaded. However, if there are a large number of
packs, even enumerating the directory can be expensive, especially if
we do it repeatedly. Having this many packs is a good sign the user
should run `git gc`, but it would still be nice to avoid having to
scan the directory at all.

Signed-off-by: Jeff King
Signed-off-by: Junio C Hamano

diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
index fdda36f..827765d 100644
--- a/builtin/fetch-pack.c
+++ b/builtin/fetch-pack.c
@@ -610,6 +610,9 @@ static int everything_local(struct ref **refs, int nr_match, char **match)
 	for (ref = *refs; ref; ref = ref->next) {
 		struct object *o;
 
+		if (!has_sha1_file(ref->old_sha1))
+			continue;
+
 		o = parse_object(ref->old_sha1);
 		if (!o)
 			continue;
--
cgit v0.10.2-6-g49f6
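To make the cost asymmetry concrete, here is a sketch under the
assumptions the message describes: a parse_object() miss falls through
to read_object(), whose miss path re-scans the pack directory via
reprepare_packed_git() before giving up, while has_sha1_file() only
consults the already-loaded pack indexes and the loose-object
directory. The wrapper name parse_object_if_present() is hypothetical,
used here only to isolate the guard the patch adds:

    #include "cache.h"

    /*
     * Look up an object only if it is already present locally.
     *
     * A parse_object() miss re-scans $GIT_DIR/objects/pack via
     * reprepare_packed_git() before giving up, which is expensive
     * when probing many refs we do not have.  has_sha1_file()
     * answers the "do we have it?" question without that re-scan,
     * which is exactly the guard added to everything_local() above.
     */
    static struct object *parse_object_if_present(const unsigned char *sha1)
    {
            if (!has_sha1_file(sha1))
                    return NULL;    /* cheap miss: no directory re-scan */
            return parse_object(sha1);
    }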