Timeline agent on the 2026-05-20 full run produced 0 phenomena: initial
round hit max_iterations=60 cap before recording, forced retry then hit
max_iterations=10 cap because every grounding-rejected call burns one
iteration in the new gateway. Two changes restore depth without re-
introducing the original "agent wanders off and never records" failure:
1. Raise retry cap 10 → 30. With grounding auto-rescue (prev commit)
most rejections heal on the first retry, but some still need 2-3
turns; 10 is empirically too tight, 30 leaves headroom.
2. Narrow the retry tool surface to RECORD + graph-write +
read-only-graph-query tools. Investigation tools (list_directory,
sqlite_query, parse_registry_key) are dropped on retry so the agent
can't restart its search loop — the retry is explicitly "record
what you already found, then stop".
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
First full-case run (runs/2026-05-20T20-15-04/) produced 83 GroundingError
rejections, almost all from a single failure mode: LLM cites a plausible-
looking inv-XXXXXXXX that doesn't exist, while the fact's value is in fact
present verbatim in one of its real tool outputs. The agent knew which
tool it read from, it just mis-typed the citation id.
Two-layer fix in evidence_graph.validate_fact_grounding:
Layer A (silent heal): when the cited inv-id misses, search the same
agent / task's invocations for one whose output contains the value
(strict or normalised substring). If exactly one matches, rewrite
fact.invocation_id in place and accept. Multi-match is NOT auto-
rescued — the candidate ids go back to the LLM so it picks deliberately.
Layer B (informative retry): GroundingError now appends the agent's
recent invocation ids and a brief tool-call summary, so the LLM has
the real ids in front of it for the next attempt rather than
fabricating again from memory.
Both layers preserve the design invariant: the fact's value must still
be present in a real tool output — nothing new can land grounded that
wasn't already verifiable. Cross-agent / cross-task isolation is also
preserved (rescue candidates filtered on agent + task_id).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>