First full-case run (runs/2026-05-20T20-15-04/) produced 83 GroundingError
rejections, almost all from a single failure mode: LLM cites a plausible-
looking inv-XXXXXXXX that doesn't exist, while the fact's value is in fact
present verbatim in one of its real tool outputs. The agent knew which
tool it read from, it just mis-typed the citation id.
Two-layer fix in evidence_graph.validate_fact_grounding:
Layer A (silent heal): when the cited inv-id misses, search the same
agent / task's invocations for one whose output contains the value
(strict or normalised substring). If exactly one matches, rewrite
fact.invocation_id in place and accept. Multi-match is NOT auto-
rescued — the candidate ids go back to the LLM so it picks deliberately.
Layer B (informative retry): GroundingError now appends the agent's
recent invocation ids and a brief tool-call summary, so the LLM has
the real ids in front of it for the next attempt rather than
fabricating again from memory.
Both layers preserve the design invariant: the fact's value must still
be present in a real tool output — nothing new can land grounded that
wasn't already verifiable. Cross-agent / cross-task isolation is also
preserved (rescue candidates filtered on agent + task_id).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Five interrelated cleanups:
1. Lead -> Phenomenon provenance
- Phenomenon.from_lead_id field on the dataclass
- BaseAgent.run(lead_id=...) writes self._current_lead_id
- _add_phenomenon auto-injects from agent state (LLM unaware)
- Orchestrator dispatch passes lead.id; Phase 1/2-auto/4/5 stay None
- Merge path preserves the first non-None lead_id on collision
2. Unified Phenomenon <-> Hypothesis link path
- HypothesisAgent only adds hypotheses, never links
- link_phenomenon_to_hypothesis tool + executor removed
- All links go through Orchestrator._judge_new_phenomena
- Phase 2 unconditionally judges after hypothesis generation
- Gap Analysis judges after each dispatch round
(Three previously-missing judge calls now in place.)
3. SSOT in agent subclasses
- Remove RoleTemplate dataclass, ROLE_TEMPLATES dict,
_instantiate_from_template method
- Each agent subclass owns name, role, and tool list
- agent_factory.py shrinks from 299 to 153 lines
- All 7 agents now route through _AGENT_CLASSES (filesystem,
registry, communication, network, timeline were previously dead
subclasses overridden by templates)
4. Configurable edge weights
- HYPOTHESIS_EDGE_WEIGHTS -> _DEFAULT_EDGE_WEIGHTS (private default)
- EvidenceGraph(edge_weights=...) override via config.yaml
- hypothesis_edge_weights section in config.yaml (commented example)
- main.py and regenerate_report.py read and pass through
5. regenerate_report.py auto-picks the latest run/*/graph_state.json
when no CLI arg is given (was a hardcoded date path)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>