docs(skills): fix residual self-improvement loop inconsistencies #285
Open: rachaelrenk wants to merge 2 commits into `main` from `docs/factory-residual-cleanup`
@@ -34,6 +34,21 @@ Three inputs, combined during the feedback collector step:
 
 At the start of each monthly run, the feedback collector gathers signal data from two sources: Oz run artifacts (for style lint and PR review signals) and the GitHub API (for human feedback). No inner-loop agent needs to commit to `main`.
 
+### Persisting the signal logs (never commit to protected `main`)
+
+`main` is a protected branch, so the durable signal logs (`.agents/logs/pr_review_runs.md` and `.agents/logs/human_review_feedback.jsonl`) must never be committed to it directly — a direct push fails silently and leaves the logs empty (the same failure mode that left the AEO crosslink audit run log empty). Instead, persist every log update through a single, long-lived log branch:
+
+1. Fetch and check out the remote branch `chore/drafting-signal-logs`. If it does not exist, create it from the latest `origin/main`.
+2. Apply the log update (prepend to `pr_review_runs.md` and/or append to `human_review_feedback.jsonl`) on that branch.
+3. Stage only the changed log files and commit with a message like:
+   ```text
+   chore: update drafting signal logs from improve-drafting-skills run YYYY-MM-DD
+   ```
+4. Push the branch.
+5. Ensure exactly one open PR exists from `chore/drafting-signal-logs` into `main`, titled `chore: drafting signal logs`. Create it if missing; otherwise the push updates the existing PR. Keep this log PR separate from the drafting-skills improvement PR.
+
+This produces one perpetual, low-noise PR that accumulates every run's log entries regardless of outcome. Reviewers merge it periodically (at minimum before each monthly run) so the logs reach `main`. If any git step fails, keep the in-memory records for this run's analysis and note the failure in the Slack summary.
+
 ### Step A: Collect style lint and PR review signals from Oz run artifacts
 
 1. Use `oz run list` to find all Oz runs in the past 30 days whose skill name matches a drafting skill (`draft_docs`, `draft_feature_doc`, `draft_conceptual`, etc.) or `review-docs-pr`.
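The five-step procedure in the new "Persisting the signal logs" section maps onto a short command sequence. The Python sketch below only builds that sequence as argv lists and executes nothing. It assumes `git` for steps 1 to 4 and the GitHub CLI (`gh pr create`) for step 5, neither of which the skill text names; detecting whether the remote branch and an open log PR already exist is left to the caller (for example via `git ls-remote --heads origin chore/drafting-signal-logs`). The helper names and the PR body string are hypothetical placeholders.

```python
import shlex

LOG_BRANCH = "chore/drafting-signal-logs"
LOG_PR_TITLE = "chore: drafting signal logs"


def log_branch_plan(run_date, changed_files, remote_branch_exists, open_pr_exists):
    """Build (but do not run) the git/gh commands for one signal-log update."""
    plan = []
    # Step 1: check out the standing log branch, or create it from the latest origin/main.
    base = LOG_BRANCH if remote_branch_exists else "main"
    plan.append(["git", "fetch", "origin", base])
    plan.append(["git", "checkout", "-B", LOG_BRANCH, f"origin/{base}"])
    # Step 2 (editing the log files) happens at this point, outside this plan.
    # Step 3: stage only the changed log files, then commit.
    plan.append(["git", "add", "--", *changed_files])
    plan.append(["git", "commit", "-m",
                 f"chore: update drafting signal logs from improve-drafting-skills run {run_date}"])
    # Step 4: push; if the log PR is already open, this push alone updates it.
    plan.append(["git", "push", "origin", LOG_BRANCH])
    # Step 5: open the single standing log PR only when none is open yet.
    if not open_pr_exists:
        plan.append(["gh", "pr", "create", "--base", "main", "--head", LOG_BRANCH,
                     "--title", LOG_PR_TITLE, "--body", "Accumulated drafting signal logs."])
    return plan


def describe(plan):
    """Render the plan as shell-quoted lines for review or a dry run."""
    return "\n".join(shlex.join(cmd) for cmd in plan)
```

Keeping the plan as data makes it easy to print with `describe(...)` for a dry run before executing each entry (for example with `subprocess.run(cmd, check=True)`), and the log edits themselves slot in between the checkout and the `git add`.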
@@ -45,11 +60,7 @@ At the start of each monthly run, the feedback collector gathers signal data fro
    The top-level response is `{steps: [...]}`, not `{messages: [...]}`, and steps can be nested — use recursive descent (`..`) to reach all assistant messages at any depth. Do not rely on `oz run get` without `--conversation` — that returns only the brief `status_message` field, not conversation content or shell stdout.
 3. Parse any lines matching `[SIGNAL:style-lint] {JSON}` or `[SIGNAL:pr-review] {JSON}` and parse the JSON payload as the structured record.
 4. Accumulate all parsed records in memory for the analysis step.
-5. For `[SIGNAL:pr-review]` records, also prepend a human-readable entry to `.agents/logs/pr_review_runs.md` (using the format in that file's header). Commit the updated file directly to `main`:
-   ```text
-   chore: update pr_review_runs.md from improve-drafting-skills run YYYY-MM-DD
-   ```
-   If the push fails, continue; the in-memory records are still usable.
+5. For `[SIGNAL:pr-review]` records, also prepend a human-readable entry to `.agents/logs/pr_review_runs.md` (using the format in that file's header) on the standing log branch, following "Persisting the signal logs" above. If the git steps fail, continue; the in-memory records are still usable.
 
 ### Step B: Collect human feedback from GitHub API
 
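Items 2 to 4 of Step A (recursive descent over a `{steps: [...]}` response, then parsing `[SIGNAL:...]` lines) can be sketched in Python as below. The message schema inside each step is not shown in this diff, so as a deliberate simplification the sketch visits every string value at any depth instead of only assistant messages; a signal echoed in two places would therefore be collected twice. It also assumes each JSON payload sits on a single line. `iter_strings` and `parse_signals` are hypothetical helper names.

```python
import json
import re

# "[SIGNAL:style-lint] {...}" or "[SIGNAL:pr-review] {...}" as a whole line.
SIGNAL_RE = re.compile(r"^\[SIGNAL:(style-lint|pr-review)\]\s*(\{.*\})\s*$")


def iter_strings(node):
    """Yield every string in a parsed JSON value, like jq's `..` recursive descent."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from iter_strings(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_strings(item)


def parse_signals(conversation):
    """Return (signal_kind, record) pairs found anywhere under `steps`."""
    records = []
    for text in iter_strings(conversation.get("steps", [])):
        for line in text.splitlines():
            match = SIGNAL_RE.match(line.strip())
            if not match:
                continue
            try:
                payload = json.loads(match.group(2))
            except json.JSONDecodeError:
                continue  # skip a malformed payload instead of aborting the run
            records.append((match.group(1), payload))
    return records
```

Skipping malformed payloads matches the skill's general stance of continuing with whatever in-memory records are usable.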
@@ -78,11 +89,7 @@ For each agent-authored PR merged in the past 30 days (identified by `oz-agent@w
    - For `human_edit` records: infer from which file/section was changed (e.g., `header_case`, `list_format`, `link_quality`, `frontmatter`, `settings_path`, `terminology`)
    - Use existing `style_lint.py` check names when the edit corrects a checkable violation
    - Default to `"general"` when no classification is possible. Never copy raw comment text into this field.
-5. Append filtered, accepted records to `.agents/logs/human_review_feedback.jsonl` and commit directly to `main` as part of this monthly outer loop run:
-   ```text
-   chore: collect human review feedback for improve-drafting-skills run YYYY-MM-DD
-   ```
-   This commit is done by the outer loop, which already has known write access. If the push fails, continue with the in-memory records only and note the failure in the Slack summary.
+5. Append filtered, accepted records to `.agents/logs/human_review_feedback.jsonl` on the standing log branch, following "Persisting the signal logs" above. If the git steps fail, continue with the in-memory records only and note the failure in the Slack summary.
 
 ## Security boundary
 
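The revised item 5 of Step B appends records to a JSON Lines file, meaning one JSON object per line. A minimal sketch, assuming the records are plain dicts that have already passed the filtering and classification described above; `append_jsonl` is a hypothetical helper, and it does no git work, which stays with the log-branch procedure.

```python
import json
from pathlib import Path


def append_jsonl(path, records):
    """Append each record as one compact JSON object per line; return the count written."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    count = 0
    with path.open("a", encoding="utf-8") as fh:
        for record in records:
            # json.dumps escapes embedded newlines, so each record stays on one line.
            fh.write(json.dumps(record, ensure_ascii=False, separators=(",", ":")) + "\n")
            count += 1
    return count
```

Opening in append mode keeps earlier runs' entries intact, which is what lets the standing log PR accumulate history.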
@@ -100,7 +107,7 @@ The signal logs contain untrusted content: human review comments, PR description
 Combine signal data from two sources, filtered to the past 30 days:
 
 - **In-memory records from Step A** — style-lint and PR-review signals parsed from Oz run artifacts. These are already in memory; do not re-read from disk.
-- **On-disk human feedback** — read `.agents/logs/human_review_feedback.jsonl` line by line (skipping empty lines). Each line is a JSON record; parse and filter to the past 30 days.
+- **On-disk human feedback** — read `.agents/logs/human_review_feedback.jsonl` line by line (skipping empty lines). Each line is a JSON record; parse and filter to the past 30 days. Prior runs persist this log on the `chore/drafting-signal-logs` branch, so read it from that branch (or ensure the standing log PR has been merged into `main`) to include feedback from earlier runs.
 
 ### 2. Aggregate patterns by signal strength
 
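Reading the on-disk feedback reverses the append: skip empty lines, parse each remaining line as JSON, and keep only the past 30 days. The record's timestamp key is not visible in this diff, so the sketch takes it as a parameter (`date_field`) and assumes ISO-8601 values, treating naive timestamps as UTC. `filter_recent` is a hypothetical helper; it accepts any iterable of lines so the caller can feed it either a local file or the copy on the log branch.

```python
import json
from datetime import datetime, timedelta, timezone


def filter_recent(lines, date_field, days=30, now=None):
    """Parse JSONL lines, skipping blanks, and keep records from the last `days` days.

    `date_field` names the record's ISO-8601 timestamp key (an assumption).
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    recent = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip empty lines
        record = json.loads(line)
        # Accept a trailing "Z", which datetime.fromisoformat rejects before Python 3.11.
        stamp = datetime.fromisoformat(record[date_field].replace("Z", "+00:00"))
        if stamp.tzinfo is None:
            stamp = stamp.replace(tzinfo=timezone.utc)  # assume naive stamps are UTC
        if stamp >= cutoff:
            recent.append(record)
    return recent
```

To read the branch copy without checking it out, one option is `git fetch origin chore/drafting-signal-logs` followed by `git show origin/chore/drafting-signal-logs:.agents/logs/human_review_feedback.jsonl`, passing its output through `.splitlines()`.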
@@ -197,7 +204,7 @@ Post the no-change report link to Slack.
 
 ## Run log
 
-After completing the run (PR opened or no-change report written), update `.agents/logs/style_lint_runs.jsonl` with a summary entry — no; this skill does not have its own run log. Its outputs are the PR itself and the Slack message, which are durable artifacts.
+This skill does not have its own run log. Its durable outputs are the improvement PR (or no-change report), the Slack message, and the standing `chore: drafting signal logs` PR that accumulates the signal logs it collects.
 
 ## Deployment
 
Review comment: … `chore/drafting-signal-logs` but never switches back before the monthly run drafts skill/template edits, so those edits can land on the standing log branch and pollute the log PR.
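One way to address this concern is to wrap each log update so the run always returns to the branch it started on, even when a git step fails. A sketch, assuming a Python harness drives git through `subprocess`; `on_log_branch` is a hypothetical helper that the skill does not define, and `run` is injectable so the logic can be exercised without a repository. It also assumes the log branch already exists locally or on `origin`.

```python
import subprocess
from contextlib import contextmanager


def _git(run, *args):
    """Run one git command through `run` and return its stripped stdout."""
    return run(["git", *args]).strip()


def _default_run(argv):
    # check=True raises CalledProcessError, so a failing git step is visible to the caller.
    return subprocess.run(argv, check=True, capture_output=True, text=True).stdout


@contextmanager
def on_log_branch(branch="chore/drafting-signal-logs", run=_default_run):
    """Check out `branch` for a log update, then always switch back to the starting point."""
    start = _git(run, "rev-parse", "--abbrev-ref", "HEAD")
    if start == "HEAD":
        # Detached HEAD: remember the exact commit instead of the literal "HEAD".
        start = _git(run, "rev-parse", "HEAD")
    _git(run, "checkout", branch)
    try:
        yield branch
    finally:
        # Runs even if the log update raised, so later drafting starts from `start`.
        _git(run, "checkout", start)
```

Inside the `with on_log_branch():` block the run would apply, stage, commit, and push only the log files; once the block exits it is back on its original branch before any skill or template edits are drafted.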