MASForensic

Files

BattleTag 6b485b98f7 fix(grounding): auto-rescue hallucinated invocation_id + list real ids in error

First full-case run (runs/2026-05-20T20-15-04/) produced 83 GroundingError
rejections, almost all from a single failure mode: LLM cites a plausible-
looking inv-XXXXXXXX that doesn't exist, while the fact's value is in fact
present verbatim in one of its real tool outputs. The agent knew which
tool it read from, it just mis-typed the citation id.

Two-layer fix in evidence_graph.validate_fact_grounding:

  Layer A (silent heal): when the cited inv-id misses, search the same
  agent / task's invocations for one whose output contains the value
  (strict or normalised substring). If exactly one matches, rewrite
  fact.invocation_id in place and accept. Multi-match is NOT auto-
  rescued — the candidate ids go back to the LLM so it picks deliberately.

  Layer B (informative retry): GroundingError now appends the agent's
  recent invocation ids and a brief tool-call summary, so the LLM has
  the real ids in front of it for the next attempt rather than
  fabricating again from memory.

Both layers preserve the design invariant: the fact's value must still
be present in a real tool output — nothing new can land grounded that
wasn't already verifiable. Cross-agent / cross-task isolation is also
preserved (rescue candidates filtered on agent + task_id).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-21 02:14:20 -10:00

test_optimizations.py

fix(grounding): auto-rescue hallucinated invocation_id + list real ids in error

2026-05-21 02:14:20 -10:00