
Turn git survey into a deprecated shim over git repo structure #6268

Open
dscho wants to merge 7 commits into main from survey-pivot-into-git-repo

dscho commented Jun 6, 2026

git survey was always experimental, and I never got around to upstreaming it to make it non-experimental.

In the meantime, the git repo structure command was accepted upstream, and it covers most of the same ground with a cleaner option surface and a stable output contract. This PR closes the remaining gaps (annotated-tag breakdown, ref scoping, top-N paths by count/disk/inflated size, and the corresponding configuration knob) and then turns git survey into a thin shim that prints a deprecation warning, translates its old command line into the equivalent git repo structure invocation, and re-execs the canonical command. Net result: one user-facing tool to maintain and to teach instead of two.

The intent is that scripts pinned to git survey keep working (a warning aside), and that operators have a single answer when they ask "how do I see what's making my repository large?". The survey.* configuration keys are intentionally dropped; the only one that mattered, survey.top, has a direct replacement in repo.structure.top.

Mirror what git survey already reports: lightweight tags
(pointing straight at a commit/tree/blob) and annotated tags
(pointing at an OBJ_TAG that is itself stored as a separate
object) are distinct in many monorepo contexts, and the distinction
is one that git survey users routinely care about. Add
an annotated_tags counter to struct ref_stats, populate it in
count_references() by peeking at the ref OID's object type, and
expose it as a sub-row under Tags in the table output and as
references.tags.annotated.count in the machine-readable formats.

Step toward pivoting the standalone git survey command onto
git repo structure; this fills the first of the four feature
gaps documented in the assessment.

Tests in t1901 widened to assert the new row and key.

Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
dscho force-pushed the survey-pivot-into-git-repo branch from dd46870 to d14deae on June 7, 2026 01:05
dscho added 6 commits June 7, 2026 03:19
`git repo structure` walks every reference enumerated by
`refs_for_each_ref()` and feeds each reference's tip into the path
walk that produces the object counts. There is no way to scope the
inquiry to a subset of refs, even though that is the most common
need when an operator is investigating what part of the history is
driving cost: only branches, only release tags, only one remote's
view, etc.

Add a single `--ref-filter=<pattern>` option that, when given,
restricts both the reference count and the object walk to refs whose
full name matches one of the patterns. The option is repeatable;
multiple patterns form a union, so `--ref-filter='refs/heads/*'
--ref-filter='refs/tags/v*'` includes local branches and tags whose
short name starts with `v`. Patterns use `wildmatch()` with
`WM_PATHNAME` semantics so a `*` does not cross `/`, matching the
convention used by `git for-each-ref` positional arguments.

Choosing a single flexible filter, rather than a proliferation of
per-kind flags like `--branches`, `--tags`, `--remotes`, keeps the
option surface small and lets the same mechanism express
narrow selections the per-kind flags could not, such as "only release
tags" (`'refs/tags/v*'`) or "only one remote's branches"
(`'refs/remotes/origin/*'`). Without `--ref-filter`, behaviour is
unchanged: every ref `refs_for_each_ref()` enumerates contributes.

Both the reference counter and the path-walk seeding (via
`add_pending_oid()`) sit on the same callback, so an early return
when no pattern matches naturally excludes a ref from both. No
separate object-walk machinery is needed.

Cover the two interesting code paths with tests in t1901: a single
filter narrowing to branches, and two filters unioning to include
both branches and tags.

Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
`git survey` distinguishes itself from `git repo structure` largely by
its path-level reporting: in addition to whole-repo totals it lists the
paths whose object histories dominate the repository, ranked by raw
count, on-disk size, and inflated size, separately for trees and blobs.
That is often the most actionable output from `git survey`, since it
points an operator at the directories and files that should be reviewed
for cleanup, sparse-checkout exclusion, or rewriting.

`git repo structure` already drives the same path-walk traversal that
`git survey` uses to gather its per-path numbers; the callback simply
discards the path. Aggregate per-(path, type) summaries inside that
existing callback and add a bounded, descending-sorted "top-N" table
keyed by each of the three axes. Gate the feature behind a new
`--top=<n>` option, defaulting to 0, so unadorned invocations are
unaffected and pay no extra work for the top-N tracking.

Mirror the sort and eviction strategy from `builtin/survey.c`: keep an
array of at most N entries sorted from largest to smallest, walk it
from the bottom on each candidate, and shift entries down when a new
one belongs. Compared to `builtin/survey.c`, drop the void-pointer
indirection in the table data, type the comparator's arguments, and
fold the trivial comparators into the `(a > b) - (a < b)` idiom.

For the human-readable `table` output, extend the existing nested
bullet layout with two new top-level sections, `* Top trees` and
`* Top blobs`, each containing three sub-tables (`Top by count`,
`Top by disk size`, `Top by inflated size`). The path becomes the row
name and the relevant scalar becomes the value, reusing
`stats_table_count_addf` and `stats_table_size_addf` so units and
column alignment match the rest of the table.

For the `lines`/`nul` key-value formats, emit one
`objects.<type>.top.by_<axis>.<rank>.path=<path>` entry alongside an
`objects.<type>.top.by_<axis>.<rank>.<axis>=<value>` entry per ranked
path, so consumers can dispatch by axis without parsing the schema.
The root tree's path is the empty string as produced by the path-walk
machinery; preserve that as-is to stay faithful to the upstream
representation rather than fabricating a placeholder.

This is another piece of folding `git survey`'s functionality into
`git repo structure`. Subsequent commits will add the corresponding
configuration knob and, eventually, turn `git survey` into a thin
deprecated shim over `git repo structure`.

Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
The preceding commit added `--top=<n>` to `git repo structure`,
reporting the top-N paths per type ranked by count, on-disk size, and
inflated size. Cover the three behaviors that matter for that option:

  * Without `--top`, the key-value output emits no `top.*` keys, so
    existing parsers stay unaffected.

  * `--top=N` produces exactly N ranked entries on each of the six
    `objects.<type>.top.by_<axis>` axes (count/disk_size/inflated_size
    crossed with trees/blobs), and a constructed input where one blob
    is several orders of magnitude bigger than the other lets us
    assert the ordering on the disk-size and inflated-size axes.

  * A negative `--top` is rejected with a non-zero exit and a message
    naming the constraint, so a typo cannot silently degrade into the
    default zero.

Avoid grep patterns starting with `--`; grep would parse such a
pattern as a command-line option instead of as the text to search for.

Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
`git survey` exposes its `--top` default via `survey.top` so that a
site or per-repository operator can switch the detail tables on once
and have every subsequent invocation include them. Mirror that
ergonomics for `git repo structure` so that, as `git survey`'s
functionality is folded into `git repo structure`, the configuration
side of the migration story stays equivalent.

Add a small config callback that reads `repo.structure.top` via
`git_config_int()` and invoke it before `parse_options()`, so a
`--top=<N>` on the command line cleanly overrides the configured default
(including `--top=0` to opt out of the detail tables when configuration
enables them). Reject negative configured values with the same wording
as the command-line guard, since `git_config_int()` happily returns
negative integers.

Document the new variable in a fresh `Documentation/config/repo.adoc`
and wire it into the alphabetical includes in `Documentation/config.adoc`
between `repack.adoc` and `rerere.adoc`. Cover the precedence
behaviour with a t1901 test: a configured value enables the tables by
default, and a command-line `--top=0` suppresses them again.

Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
`git survey` started life as an experimental scale-measurement tool;
the preceding commits give `git repo structure` the path-level detail
tables and ref-scoping mechanism that were `git survey`'s main draw,
so the two now overlap substantially. Plan the migration explicitly:
add a short notice at the top of the description making clear which
of `git survey`'s knobs map to which `git repo structure` option, and
state that a future release will turn `git survey` into a thin shim
over `git repo structure`.

Putting the notice in the description (rather than only the synopsis)
ensures it shows up in `git help survey` rendering before the reader
sees any option specifics, so an operator skimming the page learns
about the replacement before adopting any survey-specific flags.

Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
`git survey` was an experimental scale-measurement tool whose
distinctive features (ref-kind filters, top-N path tables) are now
all available in `git repo structure`. With the path-level reporting
in place (commits "repo: filter the structure scope via
--ref-filter=<pattern>" and "repo: report top-N paths by count, disk,
and inflated size in structure"), and apart from the two options
discussed below, there is no functionality `git survey` provides that
`git repo structure` cannot.

Replace the 764-line `git survey` implementation with a roughly
hundred-line shim that:

  * Accepts the existing `git survey` command line so callers in
    scripts continue to parse without changes.
  * Emits a deprecation warning naming the replacement command, so
    interactive users learn about the migration target.
  * Translates the survey-specific knobs into the equivalent
    `git repo structure` invocation and re-execs the canonical
    command via `execv_git_cmd()`. Per-kind ref selectors fan out
    into the corresponding `refs/heads/*`, `refs/tags/*`, etc.
    `--ref-filter` patterns; `--top=<N>` is forwarded directly;
    `--all-refs` becomes the absence of any `--ref-filter`.

Two survey options have no `git repo structure` counterpart:
`--verbose` controlled per-step trace output the new command does
not emit, and `--[no-]detached` selected the detached HEAD, which
`git repo structure` does not enumerate separately. Both are still
accepted, but each produces a single warning, so old invocations keep
working while the absence of these knobs in `git repo structure` is
made visible.

Rewrite t8100 to assert the shim's contract: the deprecation
warning is printed, the output is byte-identical to a corresponding
`git repo structure` invocation, and the per-kind selector
translation produces the right `--ref-filter` pattern. The
preceding survey-specific output assertions (the multi-column
plaintext tables) no longer apply, since `git repo structure`'s
output format is now the canonical one and is covered by t1901.

The `survey.*` configuration keys (`survey.top`, `survey.progress`,
`survey.verbose`) are no longer honored by the shim. The most useful
of them, `survey.top`, is mirrored by the `repo.structure.top`
variable added earlier in this series; users with `survey.top` set in
config should migrate to `repo.structure.top`. This is a
backward-incompatible removal, documented by the deprecation notice in
`git-survey.adoc`.

Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
dscho force-pushed the survey-pivot-into-git-repo branch from d14deae to 46e1492 on June 7, 2026 08:52
dscho requested a review from derrickstolee June 7, 2026 10:16
dscho self-assigned this Jun 7, 2026
dscho marked this pull request as ready for review June 7, 2026 10:16

derrickstolee left a comment

Thanks for making progress here. It looks like an interested contributor could take this version at tip and start a patch series to extend git repo structure to get this kind of data upstream.

Comment thread builtin/survey.c
	if (argc)
		usage(_("'git survey' takes no positional arguments"));

	warning(_("'git survey' is deprecated; "

I briefly considered turning this into advice instead of a warning, but this is a good way to make it clear that we will remove this eventually.

Copilot AI left a comment

Pull request overview

This PR deprecates the experimental git survey command by turning it into a compatibility shim that translates legacy flags into the corresponding git repo structure options and then re-execs git repo structure. To close remaining feature gaps, it also extends git repo structure with annotated-tag breakdown, ref scoping via --ref-filter, top-N per-path reporting via --top, and a new repo.structure.top configuration default.

Changes:

  • Replace git survey’s implementation with a deprecated re-exec shim to git repo structure, translating legacy flags (--top, ref-selection flags, progress).
  • Extend git repo structure output and option surface: annotated-tag counts, --ref-filter (repeatable), --top=<n> (with repo.structure.top default), plus corresponding tests.
  • Update documentation for the new git repo structure options and introduce repo.structure.* config docs.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 10 comments.

  • builtin/survey.c: Removes the old survey implementation and replaces it with a deprecated shim that translates options and execs git repo structure.
  • builtin/repo.c: Adds annotated tag counting, --ref-filter, --top, top-path reporting in outputs, and repo.structure.top config handling.
  • t/t8100-git-survey.sh: Updates tests to validate deprecation banner/warning and forwarding behavior from git survey to git repo structure.
  • t/t1901-repo-structure.sh: Expands tests for annotated tags, --ref-filter, --top, and repo.structure.top behavior.
  • Documentation/git-survey.adoc: Marks git survey as deprecated and points users toward git repo structure.
  • Documentation/git-repo.adoc: Documents new git repo structure options (--ref-filter, --top) and updates synopsis.
  • Documentation/config/repo.adoc: Adds documentation for repo.structure.top.
  • Documentation/config.adoc: Includes the new config/repo.adoc documentation file.

Comment thread builtin/survey.c
Comment on lines +64 to +66
	if (want_detached != -1)
		warning(_("--[no-]detached is ignored by "
			  "'git repo structure'"));
Comment thread builtin/survey.c
Comment on lines +56 to +59
	argc = parse_options(argc, argv, prefix, options, survey_usage, 0);
	if (argc)
		usage(_("'git survey' takes no positional arguments"));

Comment thread builtin/survey.c
Comment on lines +73 to +74
	if (top_nr > 0)
		strvec_pushf(&child_argv, "--top=%d", top_nr);
Comment thread builtin/repo.c
Comment on lines +827 to +831
		return 1;
	for (size_t i = 0; i < filters->nr; i++)
		if (!wildmatch(filters->items[i].string, refname, WM_PATHNAME))
			return 1;
	return 0;
Comment thread builtin/repo.c
Comment on lines +709 to +713
static void print_keyvalue_path(const char *key, char key_delim,
				const char *path, char value_delim)
{
	printf("%s%c%s%c", key, key_delim, path, value_delim);
}
Comment thread builtin/repo.c
Comment on lines +1138 to +1140

return 0;
}
Comment on lines +16 to +21
NOTE: `git survey` is being superseded by `git repo structure`. New
deployments and new features should use `git repo structure`; its
`--ref-filter=<pattern>` option subsumes the various `--branches`,
`--tags`, and `--remotes` flags here, and `--top=<N>` provides the
same detail tables. A future release will turn `git survey` into a
thin shim over `git repo structure`. See linkgit:git-repo[1].
Comment thread t/t8100-git-survey.sh
Comment on lines +39 to +43
test_expect_success 'survey --top is translated' '
	git survey --top=3 --all-refs >out &&
	git repo structure --top=3 >expected &&
	test_cmp expected out
'
Comment thread t/t1901-repo-structure.sh
Comment on lines +239 to +243
	git repo structure --format=lines \
		--ref-filter="refs/heads/*" >out &&
	grep "^references.branches.count=1$" out &&
	grep "^references.tags.count=0$" out &&
	grep "^references.remotes.count=0$" out
Comment thread t/t1901-repo-structure.sh
Comment on lines +277 to +289
test_expect_success '--top=N reports the N largest paths per axis' '
	test_when_finished "rm -rf repo" &&
	git init repo &&
	(
		cd repo &&
		mkdir -p dir1 dir2 &&
		echo small >dir1/small.txt &&
		printf "%010000d" 0 >dir2/big.txt &&
		git add . &&
		test_tick &&
		git commit -m commit &&

		git repo structure --format=lines --top=2 >out &&
Comment thread t/t1901-repo-structure.sh
Comment on lines 155 to +160
	git tag -a foo -m bar &&

	cat >expect <<-EOF &&
	references.branches.count=1
	references.tags.count=1
	references.tags.annotated.count=1

It may also be good to have another, lightweight tag be created so we can see that references.tags.count is inclusive of references.tags.annotated.count.