Turn git survey into a deprecated shim over git repo structure #6268
dscho wants to merge 7 commits
Mirror what git survey already reports: lightweight tags (pointing straight at a commit/tree/blob) and annotated tags (pointing at an OBJ_TAG that is itself stored as a separate object) are different things in many monorepo contexts, and one of the differences git survey users routinely care about.

Add an annotated_tags counter to struct ref_stats, populate it in count_references() by peeking at the ref OID's object type, and expose it as a sub-row under Tags in the table output and as references.tags.annotated.count in the machine-readable formats.

Step toward pivoting the standalone git survey command onto git repo structure; this fills the first of the four feature gaps documented in the assessment. Tests in t1901 widened to assert the new row and key.

Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
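The counting rule described above can be sketched in isolation. This is a minimal sketch, not the code from the patch: `enum obj_kind`, `struct tag_counts`, and `count_tag_ref()` are hypothetical stand-ins, and it assumes the total tag count is inclusive of annotated tags, as the test expectation in this PR suggests.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for Git's object types; not Git's real enum. */
enum obj_kind { KIND_COMMIT, KIND_TREE, KIND_BLOB, KIND_TAG };

struct tag_counts {
    size_t total;     /* every tag ref, lightweight or annotated */
    size_t annotated; /* subset whose tip is itself a tag object */
};

/* Count one tag ref, given the type of the object its OID names. */
void count_tag_ref(struct tag_counts *c, enum obj_kind tip)
{
    assert(c != NULL);
    c->total++;
    if (tip == KIND_TAG)
        c->annotated++;
}
```

Under that assumption, a repository with one lightweight and one annotated tag would report a total of 2 and an annotated count of 1.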
dd46870 to
d14deae
`git repo structure` walks every reference enumerated by `refs_for_each_ref()` and feeds each reference's tip into the path walk that produces the object counts. There is no way to scope the inquiry to a subset of refs, even though that is the most common need when an operator is investigating what part of the history is driving cost: only branches, only release tags, only one remote's view, etc.

Add a single `--ref-filter=<pattern>` option that, when given, restricts both the reference count and the object walk to refs whose full name matches one of the patterns. The option is repeatable; multiple patterns form a union, so `--ref-filter='refs/heads/*' --ref-filter='refs/tags/v*'` includes local branches and tags whose short name starts with `v`. Patterns use `wildmatch()` with `WM_PATHNAME` semantics so a `*` does not cross `/`, matching the convention used by `git for-each-ref` positional arguments.

Choosing a single flexible filter, rather than a proliferation of per-kind flags like `--branches`, `--tags`, `--remotes`, keeps the option surface small and lets the same mechanism express narrow selections the per-kind flags could not, such as "only release tags" (`'refs/tags/v*'`) or "only one remote's branches" (`'refs/remotes/origin/*'`).

Without `--ref-filter`, behaviour is unchanged: every ref `refs_for_each_ref()` enumerates contributes.

Both the reference counter and the path-walk seeding (via `add_pending_oid()`) sit on the same callback, so an early return when no pattern matches naturally excludes a ref from both. No separate object-walk machinery is needed.

Cover the two interesting code paths with tests in t1901: a single filter narrowing to branches, and two filters unioning to include both branches and tags.

Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
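The union-of-patterns rule can be tried outside Git. The sketch below uses POSIX `fnmatch()` with `FNM_PATHNAME` as a stand-in for Git's `wildmatch()` with `WM_PATHNAME`; the two are not the same function, but both keep `*` from matching across a `/`. The helper name `ref_matches_filters()` is hypothetical.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>

/*
 * Return 1 if refname should be included, 0 otherwise. An empty filter
 * list means "no filtering"; otherwise a single matching pattern is
 * enough, so the patterns form a union.
 */
int ref_matches_filters(const char *refname,
                        const char *const *patterns, size_t nr)
{
    size_t i;

    assert(refname != NULL);
    if (!nr)
        return 1;
    for (i = 0; i < nr; i++)
        if (fnmatch(patterns[i], refname, FNM_PATHNAME) == 0)
            return 1;
    return 0;
}
```

One consequence of the pathname semantics is worth noting: with this stand-in, a pattern such as `refs/heads/*` matches `refs/heads/main` but not a nested name such as `refs/heads/topic/x`.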
`git survey` distinguishes itself from `git repo structure` largely by its path-level reporting: in addition to whole-repo totals it lists the paths whose object histories dominate the repository, ranked by raw count, on-disk size, and inflated size, separately for trees and blobs. That is often the most actionable output from `git survey`, since it points an operator at the directories and files that should be reviewed for cleanup, sparse-checkout exclusion, or rewriting.

`git repo structure` already drives the same path-walk traversal that `git survey` uses to gather its per-path numbers; the callback simply discards the path. Aggregate per-(path, type) summaries inside that existing callback and add a bounded, descending-sorted "top-N" table keyed by each of the three axes. Gate the feature behind a new `--top=<n>` option, defaulting to 0, so unadorned invocations are unaffected and pay no extra work for the top-N tracking.

Mirror the sort and eviction strategy from `builtin/survey.c`: keep an array of at most N entries sorted from largest to smallest, walk it from the bottom on each candidate, and shift entries down when a new one belongs. Compared to `builtin/survey.c`, drop the void-pointer indirection in the table data, type the comparator's arguments, and fold the trivial comparators into the `(a > b) - (a < b)` idiom.

For the human-readable `table` output, extend the existing nested bullet layout with two new top-level sections, `* Top trees` and `* Top blobs`, each containing three sub-tables (`Top by count`, `Top by disk size`, `Top by inflated size`). The path becomes the row name and the relevant scalar becomes the value, reusing `stats_table_count_addf` and `stats_table_size_addf` so units and column alignment match the rest of the table.

For the `lines`/`nul` key-value formats, emit one `objects.<type>.top.by_<axis>.<rank>.path=<path>` entry alongside an `objects.<type>.top.by_<axis>.<rank>.<axis>=<value>` entry per ranked path, so consumers can dispatch by axis without parsing the schema.

The root tree's path is the empty string as produced by the path-walk machinery; preserve that as-is to stay faithful to the upstream representation rather than fabricating a placeholder.

This is the first piece of folding `git survey`'s functionality into `git repo structure`. Subsequent commits will add the corresponding configuration knob and, eventually, turn `git survey` into a thin deprecated shim over `git repo structure`.

Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
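The bounded, descending-sorted table and the `(a > b) - (a < b)` comparator can be illustrated with a small standalone sketch. It is not the code from `builtin/repo.c` or `builtin/survey.c`: `struct top_table`, `top_table_offer()`, `cmp_u64()`, and the fixed `TOP_MAX` capacity are hypothetical names and simplifications chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOP_MAX 16 /* illustrative fixed capacity */

struct top_entry {
    const char *path; /* borrowed; the caller keeps it alive */
    uint64_t value;   /* count, disk size, or inflated size */
};

struct top_table {
    struct top_entry items[TOP_MAX];
    size_t nr;  /* entries in use */
    size_t cap; /* requested N, at most TOP_MAX */
};

/* Three-way comparison yielding -1, 0, or 1. */
int cmp_u64(uint64_t a, uint64_t b)
{
    return (a > b) - (a < b);
}

/* Offer a candidate; the table stays sorted largest-first. */
void top_table_offer(struct top_table *t, const char *path, uint64_t value)
{
    size_t pos, i;

    assert(t->cap <= TOP_MAX);
    if (!t->cap)
        return; /* N == 0: no tracking at all */

    /* Walk up from the bottom to find where the candidate belongs. */
    pos = t->nr;
    while (pos > 0 && cmp_u64(value, t->items[pos - 1].value) > 0)
        pos--;
    if (pos == t->cap)
        return; /* not larger than anything in a full table */

    /* Grow if there is room; otherwise the last entry is evicted. */
    if (t->nr < t->cap)
        t->nr++;
    for (i = t->nr - 1; i > pos; i--)
        t->items[i] = t->items[i - 1];
    t->items[pos].path = path;
    t->items[pos].value = value;
}
```

Because the comparison is strict, a candidate that ties with an existing entry lands below it, so earlier paths keep their rank on ties. Whether the real implementation breaks ties the same way is not stated in the commit message.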
The preceding commit added `--top=<n>` to `git repo structure`,
reporting the top-N paths per type ranked by count, on-disk size, and
inflated size. Cover the three behaviors that matter for that option:
* Without `--top`, the key-value output emits no `top.*` keys, so
existing parsers stay unaffected.
* `--top=N` produces exactly N ranked entries on each of the six
`objects.<type>.top.by_<axis>` axes (count/disk_size/inflated_size
crossed with trees/blobs), and a constructed input where one blob
is several orders of magnitude bigger than the other lets us
assert the ordering on the disk-size and inflated-size axes.
* A negative `--top` is rejected with a non-zero exit and a message
naming the constraint, so a typo cannot silently degrade into the
default zero.
Avoid grep patterns starting with `--`; grep would parse an argument
with a leading double dash as an option rather than as a pattern.
Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
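To make the key layout that these tests assert on concrete, here is a minimal sketch that formats the pair of `lines`-format entries for one ranked path. The helper `format_top_entry()` is hypothetical, and the 1-based rank is an assumption; the commit messages do not say where ranks start.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Write the two "key=value" lines for one ranked path into buf and
 * return the snprintf() result (the length that was needed).
 */
int format_top_entry(char *buf, size_t len, const char *type,
                     const char *axis, int rank,
                     const char *path, unsigned long value)
{
    assert(rank >= 1); /* assumed 1-based; not confirmed by the source */
    return snprintf(buf, len,
                    "objects.%s.top.by_%s.%d.path=%s\n"
                    "objects.%s.top.by_%s.%d.%s=%lu\n",
                    type, axis, rank, path,
                    type, axis, rank, axis, value);
}
```

With type `blobs`, axis `disk_size`, and rank 1, this yields keys of the form `objects.blobs.top.by_disk_size.1.path` and `objects.blobs.top.by_disk_size.1.disk_size`, which a consumer can select by prefix.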
`git survey` exposes its `--top` default via `survey.top` so that a site or per-repository operator can switch the detail tables on once and have every subsequent invocation include them. Mirror those ergonomics for `git repo structure` so that, as `git survey`'s functionality is folded into `git repo structure`, the configuration side of the migration story stays equivalent.

Add a small `git_config_int` callback bound to `repo.structure.top` and invoke it before `parse_options()`, so a `--top=<N>` on the command line cleanly overrides the configured default (including `--top=0` to opt out of the detail tables when configuration enables them). Reject negative configured values with the same wording as the command-line guard, since `git_config_int()` happily returns negative integers.

Document the new variable in a fresh `Documentation/config/repo.adoc` and wire it into the alphabetical includes in `Documentation/config.adoc` between `repack.adoc` and `rerere.adoc`.

Cover the precedence behaviour with a t1901 test: a configured value enables the tables by default, and a command-line `--top=0` suppresses them again.

Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
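The precedence rule (configuration first, command line overrides, negatives rejected from either source) can be sketched as a pure function. `resolve_top()` is a hypothetical helper; the real code reads the configuration and then runs `parse_options()`, and it reports an error rather than returning -1.

```c
#include <assert.h>

/*
 * Resolve the effective top-N value. `configured` is the value read
 * from configuration (0 when unset); `cli` is only consulted when
 * `cli_given` is non-zero. Returns -1 when either source is negative.
 */
int resolve_top(int configured, int cli_given, int cli)
{
    int top;

    if (configured < 0)
        return -1; /* reject a negative configured value */
    top = configured;
    if (cli_given) {
        if (cli < 0)
            return -1; /* same guard for the command line */
        top = cli; /* the command line wins, including an explicit 0 */
    }
    assert(top >= 0); /* both sources were validated above */
    return top;
}
```

The case that matters for the test described above is a configured value of 3 with `--top=0` on the command line, which resolves to 0 and so suppresses the detail tables.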
`git survey` started life as an experimental scale-measurement tool; the preceding commits give `git repo structure` the path-level detail tables and ref-scoping mechanism that were `git survey`'s main draw, so the two now overlap substantially.

Plan the migration explicitly: add a short notice at the top of the description making clear which of `git survey`'s knobs map to which `git repo structure` option, and state that a future release will turn `git survey` into a thin shim over `git repo structure`.

Putting the notice in the description (rather than only the synopsis) ensures it shows up in `git help survey` rendering before the reader sees any option specifics, so an operator skimming the page learns about the replacement before adopting any survey-specific flags.

Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
`git survey` was an experimental scale-measurement tool whose
distinctive features (ref-kind filters, top-N path tables) are now
all available in `git repo structure`. With the path-level reporting
in place (commits "repo: filter the structure scope via
--ref-filter=<pattern>" and "repo: report top-N paths by count, disk,
and inflated size in structure"), there is no functionality `git
survey` provides that `git repo structure` cannot.
Replace the 764-line `git survey` implementation with a roughly
hundred-line shim that:
* Accepts the existing `git survey` command line so callers in
scripts continue to parse without changes.
* Emits a deprecation warning naming the replacement command, so
interactive users learn about the migration target.
* Translates the survey-specific knobs into the equivalent
`git repo structure` invocation and re-execs the canonical
command via `execv_git_cmd()`. Per-kind ref selectors fan out
into the corresponding `refs/heads/*`, `refs/tags/*`, etc.
`--ref-filter` patterns; `--top=<N>` is forwarded directly;
`--all-refs` becomes the absence of any `--ref-filter`.
Two survey options have no `git repo structure` counterpart:
`--verbose` controlled per-step trace output the new command does
not emit, and `--[no-]detached` selected the detached HEAD which
`git repo structure` does not enumerate separately. Both are still
accepted and produce a single warning each, so old
invocations keep working while the absence of these knobs in `git
repo structure` is made visible.
Rewrite t8100 to assert the shim's contract: the deprecation
warning is printed, the output is byte-identical to a corresponding
`git repo structure` invocation, and the per-kind selector
translation produces the right `--ref-filter` pattern. The
preceding survey-specific output assertions (the multi-column
plaintext tables) no longer apply, since `git repo structure`'s
output format is now the canonical one and is covered by t1901.
The `survey.*` configuration keys (`survey.top`, `survey.progress`,
`survey.verbose`) are no longer honored by the shim. Only the most
useful knob was mirrored, by the preceding `repo.structure.top`
work; users with `survey.top` set in config should migrate
to `repo.structure.top`. This is a backward-incompatible removal
documented by the deprecation notice in `git-survey.adoc`.
Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
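The translation step can be sketched without Git's `strvec` or `execv_git_cmd()`. Everything below is a hypothetical simplification: `struct survey_opts`, `struct shim_argv`, and the argument order are invented for illustration, and the `refs/remotes/*` pattern in particular is an assumption, since the commit message only spells out `refs/heads/*` and `refs/tags/*`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SHIM_ARG_LEN 64
#define SHIM_ARG_MAX 8

/* Hypothetical subset of the legacy survey options. */
struct survey_opts {
    int all_refs;
    int branches, tags, remotes;
    int top;
};

struct shim_argv {
    char args[SHIM_ARG_MAX][SHIM_ARG_LEN];
    size_t nr;
};

void shim_push(struct shim_argv *v, const char *arg)
{
    assert(v->nr < SHIM_ARG_MAX);
    assert(strlen(arg) < SHIM_ARG_LEN);
    snprintf(v->args[v->nr++], SHIM_ARG_LEN, "%s", arg);
}

/* Build the argument list for the replacement command. */
void translate_survey_args(const struct survey_opts *o, struct shim_argv *v)
{
    char top[SHIM_ARG_LEN];

    v->nr = 0;
    shim_push(v, "repo");
    shim_push(v, "structure");
    if (o->top > 0) {
        snprintf(top, sizeof(top), "--top=%d", o->top);
        shim_push(v, top);
    }
    if (o->all_refs)
        return; /* no filter at all means every ref contributes */
    if (o->branches)
        shim_push(v, "--ref-filter=refs/heads/*");
    if (o->tags)
        shim_push(v, "--ref-filter=refs/tags/*");
    if (o->remotes)
        shim_push(v, "--ref-filter=refs/remotes/*"); /* assumed pattern */
}
```

In this sketch, a legacy invocation selecting branches and tags with a top value of 3 becomes `repo structure --top=3 --ref-filter=refs/heads/* --ref-filter=refs/tags/*`, while the all-refs case forwards no filter.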
d14deae to
46e1492
derrickstolee left a comment
Thanks for making progress here. It looks like an interested contributor could take this version at tip and start a patch series to extend git repo structure to get this kind of data upstream.
    if (argc)
        usage(_("'git survey' takes no positional arguments"));

    warning(_("'git survey' is deprecated; "
I briefly considered turning this into advice instead of a warning, but this is a good way to make it clear that we will remove this eventually.
Pull request overview
This PR deprecates the experimental git survey command by turning it into a compatibility shim that translates legacy flags into the corresponding git repo structure options and then re-execs git repo structure. To close remaining feature gaps, it also extends git repo structure with annotated-tag breakdown, ref scoping via --ref-filter, top-N per-path reporting via --top, and a new repo.structure.top configuration default.
Changes:
- Replace `git survey`'s implementation with a deprecated re-exec shim to `git repo structure`, translating legacy flags (`--top`, ref-selection flags, progress).
- Extend `git repo structure` output and option surface: annotated-tag counts, `--ref-filter` (repeatable), `--top=<n>` (with `repo.structure.top` default), plus corresponding tests.
- Update documentation for the new `git repo structure` options and introduce `repo.structure.*` config docs.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| builtin/survey.c | Removes the old survey implementation and replaces it with a deprecated shim that translates options and execs git repo structure. |
| builtin/repo.c | Adds annotated tag counting, --ref-filter, --top, top-path reporting in outputs, and repo.structure.top config handling. |
| t/t8100-git-survey.sh | Updates tests to validate deprecation banner/warning and forwarding behavior from git survey to git repo structure. |
| t/t1901-repo-structure.sh | Expands tests for annotated tags, --ref-filter, --top, and repo.structure.top behavior. |
| Documentation/git-survey.adoc | Marks git survey as deprecated and points users toward git repo structure. |
| Documentation/git-repo.adoc | Documents new git repo structure options (--ref-filter, --top) and updates synopsis. |
| Documentation/config/repo.adoc | Adds documentation for repo.structure.top. |
| Documentation/config.adoc | Includes the new config/repo.adoc documentation file. |
    if (want_detached != -1)
        warning(_("--[no-]detached is ignored by "
                  "'git repo structure'"));

    argc = parse_options(argc, argv, prefix, options, survey_usage, 0);
    if (argc)
        usage(_("'git survey' takes no positional arguments"));

    if (top_nr > 0)
        strvec_pushf(&child_argv, "--top=%d", top_nr);

        return 1;
    for (size_t i = 0; i < filters->nr; i++)
        if (!wildmatch(filters->items[i].string, refname, WM_PATHNAME))
            return 1;
    return 0;

    static void print_keyvalue_path(const char *key, char key_delim,
                                    const char *path, char value_delim)
    {
        printf("%s%c%s%c", key, key_delim, path, value_delim);
    }

        return 0;
    }

    NOTE: `git survey` is being superseded by `git repo structure`. New
    deployments and new features should use `git repo structure`; its
    `--ref-filter=<pattern>` option subsumes the various `--branches`,
    `--tags`, and `--remotes` flags here, and `--top=<N>` provides the
    same detail tables. A future release will turn `git survey` into a
    thin shim over `git repo structure`. See linkgit:git-repo[1].

    test_expect_success 'survey --top is translated' '
        git survey --top=3 --all-refs >out &&
        git repo structure --top=3 >expected &&
        test_cmp expected out
    '

        git repo structure --format=lines \
            --ref-filter="refs/heads/*" >out &&
        grep "^references.branches.count=1$" out &&
        grep "^references.tags.count=0$" out &&
        grep "^references.remotes.count=0$" out

    test_expect_success '--top=N reports the N largest paths per axis' '
        test_when_finished "rm -rf repo" &&
        git init repo &&
        (
            cd repo &&
            mkdir -p dir1 dir2 &&
            echo small >dir1/small.txt &&
            printf "%010000d" 0 >dir2/big.txt &&
            git add . &&
            test_tick &&
            git commit -m commit &&

            git repo structure --format=lines --top=2 >out &&

        git tag -a foo -m bar &&

        cat >expect <<-EOF &&
        references.branches.count=1
        references.tags.count=1
        references.tags.annotated.count=1

It may also be good to have another, lightweight tag be created so we can see that references.tags.count is inclusive of references.tags.annotated.count.
`git survey` was always experimental, and I never got around to upstreaming it to make it non-experimental.

In the meantime, the `git repo structure` command landed upstream; it covers most of the same ground with a cleaner option surface and a stable output contract. This PR closes the remaining gap (annotated-tag breakdown, ref scoping, top-N paths by count/disk/inflated, and the corresponding configuration knob) and then turns `git survey` into a thin shim that warns about deprecation, translates its old command line into the equivalent `git repo structure` invocation, and re-execs the canonical command. Net result: one user-facing tool to maintain and to teach instead of two.

The intent is that scripts pinned to `git survey` keep working (a warning aside), and that operators have a single answer when they ask "how do I see what's making my repository large?". The `survey.*` configuration keys are intentionally dropped; the only one that mattered, `survey.top`, has a direct replacement in `repo.structure.top`.