Note that the hard-coded HYPOTHESIS_EDGE_WEIGHTS table is a temporary
choice; an adaptive scheme should be explored once the full pipeline
is stable.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Issues found running the system end-to-end on the NIST CFReDS Hacking Case
disk image (SCHARDT.001, Mr. Evil). Four interconnected fixes:
1. HypothesisAgent boundary leak (two layers)
B.1 Tool set: BaseAgent._register_graph_tools was registering
add_phenomenon / add_lead / link_to_entity for every agent. With
an empty graph in Phase 2, HypothesisAgent "compensated" by
inventing phenomena, dispatching leads, and linking entities.
B.2 Prompt leak: BaseAgent's shared system prompt hard-coded "Call
investigation tools (list_directory, parse_registry_key, etc.)".
HypothesisAgent hallucinated list_directory and wasted 2 LLM
rounds on 'unknown tool' errors before backing off.
Fix:
- Split _register_graph_tools into _register_graph_read_tools +
_register_graph_write_tools.
- HypothesisAgent, ReportAgent, TimelineAgent override
_register_graph_tools to skip write tools.
- HypothesisAgent and TimelineAgent override _build_system_prompt
with focused, role-specific workflows (no Phase A-D investigation
boilerplate).
2. JSON parse failures in Phase 3 lead generation (5/6 hypotheses lost)
DeepSeek emits JSON with stray backslashes (Windows path references)
and occasional minor syntax slips. Old single-stage sanitize couldn't
recover; per-hypothesis fallback silently swallowed each failure.
Fix:
- _safe_json_loads: progressive — stage 0 as-is, stage 1 escape stray
\X (anything not in valid JSON escape set), log raw input on final
failure for diagnosis.
- New _call_llm_for_json helper: on parse failure, append the error
to the prompt and re-call LLM (self-correcting retry, up to 2).
- All 4 LLM-JSON callsites in orchestrator refactored to use it.
3. Phase 1 sometimes skipped add_phenomenon (LLM treated <answer> as deliverable)
Strengthen BaseAgent's RECORDING REQUIREMENT — explicit "your <answer>
is DISCARDED; only graph mutations propagate" plus a new rule:
negative findings (searched X, found nothing) MUST also be recorded
as phenomena, since they constrain the hypothesis space.
4. Phase 4 Timeline was a no-op
TimelineAgent inherited BaseAgent's Phase A-D prompt and never called
add_temporal_edge — produced 0 temporal edges. Override the prompt
with concrete workflow (build_filesystem_timeline ->
get_timestamped_phenomena -> 15-40 add_temporal_edge calls) and
restrict tool set to read-only + its 3 temporal tools.
Verified end-to-end: HypothesisAgent now 8 tools (no writes), ReportAgent
13 (no graph writes), TimelineAgent 10 (read + temporal + timeline).
All 60 unit tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous LLMClient used raw httpx + Claude Messages API (/v1/messages,
x-api-key, Anthropic SSE event types). Incompatible with DeepSeek.
Rewrite LLMClient.__init__/chat/close to use openai.AsyncOpenAI:
- /v1/chat/completions endpoint, OpenAI message format
- Bearer auth, native SDK error types
- Stream chunks via async for + chunk.choices[0].delta.content
Tool calling protocol (ReAct text-based tags) and all surrounding helpers
(_apply_progressive_decay, _fold_old_messages, _partition_tool_calls,
tool_call_loop, etc.) are unchanged — endpoint-agnostic by design.
New optional config params surfaced to config.yaml.agent:
- reasoning_effort: "high" | "medium" | "low" — DeepSeek/o1-style depth
- thinking_enabled: bool — DeepSeek extra_body.thinking switch
main.py and regenerate_report.py pass these through to LLMClient.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Five interrelated cleanups:
1. Lead -> Phenomenon provenance
- Phenomenon.from_lead_id field on the dataclass
- BaseAgent.run(lead_id=...) writes self._current_lead_id
- _add_phenomenon auto-injects from agent state (LLM unaware)
- Orchestrator dispatch passes lead.id; Phase 1/2-auto/4/5 stay None
- Merge path preserves the first non-None lead_id on collision
2. Unified Phenomenon <-> Hypothesis link path
- HypothesisAgent only adds hypotheses, never links
- link_phenomenon_to_hypothesis tool + executor removed
- All links go through Orchestrator._judge_new_phenomena
- Phase 2 unconditionally judges after hypothesis generation
- Gap Analysis judges after each dispatch round
(Three previously-missing judge calls now in place.)
3. SSOT in agent subclasses
- Remove RoleTemplate dataclass, ROLE_TEMPLATES dict,
_instantiate_from_template method
- Each agent subclass owns name, role, and tool list
- agent_factory.py shrinks from 299 to 153 lines
- All 7 agents now route through _AGENT_CLASSES (filesystem,
registry, communication, network, timeline were previously dead
subclasses overridden by templates)
4. Configurable edge weights
- HYPOTHESIS_EDGE_WEIGHTS -> _DEFAULT_EDGE_WEIGHTS (private default)
- EvidenceGraph(edge_weights=...) override via config.yaml
- hypothesis_edge_weights section in config.yaml (commented example)
- main.py and regenerate_report.py read and pass through
5. regenerate_report.py auto-picks the latest run/*/graph_state.json
when no CLI arg is given (was a hardcoded date path)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previous README described a Blackboard-based 4-phase, 6-agent system.
The actual code uses:
- EvidenceGraph with typed weighted edges (Phenomenon/Hypothesis/Entity)
- 5 phases (explicit Hypothesis Generation between survey and investigation)
- 7 agents (added HypothesisAgent)
Documents the confidence update formula, Phenomenon Jaccard merging,
Asset Library inode dedup, tool-result caching, Gap Analysis coverage
check, auto-persistence, and the resume mechanism.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>