fix: address agent boundary / JSON robustness / Phase 4 no-op from CFReDS run
Issues found running the system end-to-end on the NIST CFReDS Hacking Case
disk image (SCHARDT.001, Mr. Evil). Four interconnected fixes:
1. HypothesisAgent boundary leak (two layers)
B.1 Tool set: BaseAgent._register_graph_tools was registering
add_phenomenon / add_lead / link_to_entity for every agent. With
an empty graph in Phase 2, HypothesisAgent "compensated" by
inventing phenomena, dispatching leads, and linking entities.
B.2 Prompt leak: BaseAgent's shared system prompt hard-coded "Call
investigation tools (list_directory, parse_registry_key, etc.)".
HypothesisAgent hallucinated list_directory and wasted 2 LLM
rounds on 'unknown tool' errors before backing off.
Fix:
- Split _register_graph_tools into _register_graph_read_tools +
_register_graph_write_tools.
- HypothesisAgent, ReportAgent, TimelineAgent override
_register_graph_tools to skip write tools.
- HypothesisAgent and TimelineAgent override _build_system_prompt
with focused, role-specific workflows (no Phase A-D investigation
boilerplate).
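
The read/write split described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the stand-in BaseAgent, its `tools` dict, the two-argument `register_tool`, and the no-op executors are all assumptions. Only the method names and tool names come from this commit.

```python
from typing import Callable, Dict


class BaseAgent:
    """Hypothetical stand-in for the real BaseAgent; only tool registration is modelled."""

    def __init__(self) -> None:
        self.tools: Dict[str, Callable[..., str]] = {}
        self._register_graph_tools()

    def register_tool(self, name: str, executor: Callable[..., str]) -> None:
        # Simplified signature (assumption): the real method takes more arguments.
        self.tools[name] = executor

    def _register_graph_read_tools(self) -> None:
        for name in ("list_phenomena", "search_graph"):
            self.register_tool(name, lambda **kwargs: "ok")

    def _register_graph_write_tools(self) -> None:
        for name in ("add_phenomenon", "add_lead", "link_to_entity"):
            self.register_tool(name, lambda **kwargs: "ok")

    def _register_graph_tools(self) -> None:
        # Default behaviour: investigation agents get both read and write tools.
        self._register_graph_read_tools()
        self._register_graph_write_tools()


class HypothesisAgent(BaseAgent):
    def __init__(self) -> None:
        super().__init__()
        # The only mutation tool this agent is allowed to have.
        self.register_tool("add_hypothesis", lambda **kwargs: "ok")

    def _register_graph_tools(self) -> None:
        # Override: skip the write tools entirely.
        self._register_graph_read_tools()
```

Because the restriction happens at registration time, the write tools never reach the model's tool list, so the boundary does not depend on the prompt being obeyed.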
2. JSON parse failures in Phase 3 lead generation (5/6 hypotheses lost)
DeepSeek emits JSON with stray backslashes (Windows path references)
and occasional minor syntax slips. The old single-stage sanitizer couldn't
recover, and the per-hypothesis fallback silently swallowed each failure.
Fix:
- _safe_json_loads is now progressive: stage 0 parses the input as-is,
stage 1 escapes stray \X sequences (any backslash not followed by a
valid JSON escape character), and the raw input is logged on final
failure for diagnosis.
- New _call_llm_for_json helper: on parse failure, append the error
to the prompt and re-call LLM (self-correcting retry, up to 2).
- All 4 LLM-JSON callsites in orchestrator refactored to use it.
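
A rough sketch of the two helpers, under stated assumptions: the names drop the leading underscore, the LLM is modelled as a plain synchronous callable (the real helper is probably async), "up to 2" is read as two retries after the first call, and the regex-based repair is one possible way to escape stray backslashes, not necessarily the one used in this commit.

```python
import json
import logging
import re
from typing import Any, Callable, Optional

logger = logging.getLogger(__name__)

# Matches a backslash plus what follows it: either a full \uXXXX escape or one character.
_ESCAPE = re.compile(r"\\(u[0-9a-fA-F]{4}|.)", re.DOTALL)


def _escape_stray(match: "re.Match[str]") -> str:
    body = match.group(1)
    if len(body) == 5 or body in '"\\/bfnrt':
        return match.group(0)  # valid JSON escape, keep unchanged
    return "\\\\" + body  # stray backslash: double it so it parses as a literal


def safe_json_loads(text: str) -> Any:
    """Stage 0: parse as-is. Stage 1: escape stray backslashes and retry."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    repaired = _ESCAPE.sub(_escape_stray, text)
    try:
        return json.loads(repaired)
    except json.JSONDecodeError:
        logger.error("JSON parse failed after repair; raw input: %r", text)
        raise


def call_llm_for_json(llm: Callable[[str], str], prompt: str, max_retries: int = 2) -> Any:
    """Call the LLM; on a parse failure, append the error to the prompt and call again."""
    current = prompt
    last_error: Optional[json.JSONDecodeError] = None
    for _ in range(max_retries + 1):
        raw = llm(current)
        try:
            return safe_json_loads(raw)
        except json.JSONDecodeError as exc:
            last_error = exc
            current = (
                f"{prompt}\n\nYour previous reply was not valid JSON ({exc}). "
                "Reply with valid JSON only."
            )
    raise ValueError("LLM did not return valid JSON") from last_error
```

One limitation of this kind of repair: a Windows path such as C:\new contains \n, which is a valid JSON escape, so it parses without error as a newline and cannot be detected at this layer.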
3. Phase 1 sometimes skipped add_phenomenon (LLM treated <answer> as deliverable)
Strengthen BaseAgent's RECORDING REQUIREMENT — explicit "your <answer>
is DISCARDED; only graph mutations propagate" plus a new rule:
negative findings (searched X, found nothing) MUST also be recorded
as phenomena, since they constrain the hypothesis space.
4. Phase 4 Timeline was a no-op
TimelineAgent inherited BaseAgent's Phase A-D prompt and never called
add_temporal_edge, so it produced 0 temporal edges. Override the prompt
with a concrete workflow (build_filesystem_timeline ->
get_timestamped_phenomena -> 15-40 add_temporal_edge calls) and
restrict its tool set to read-only graph tools + its 3 temporal tools.
Verified end-to-end: HypothesisAgent now 8 tools (no writes), ReportAgent
13 (no graph writes), TimelineAgent 10 (read + temporal + timeline).
All 60 unit tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -1,7 +1,9 @@
 """Hypothesis Agent — generates investigative hypotheses from phenomena.

 Generates hypotheses only. Phenomenon→Hypothesis linking is handled centrally
-by Orchestrator._judge_new_phenomena, so all link logic lives in one place.
+by Orchestrator._judge_new_phenomena. Tool set is restricted to read-only
+graph queries + add_hypothesis to prevent the agent from creating phenomena,
+leads, or entity links.
 """

 from __future__ import annotations
@@ -27,6 +29,10 @@ class HypothesisAgent(BaseAgent):
         super().__init__(llm, graph)
         self._register_hypothesis_tools()

+    def _register_graph_tools(self) -> None:
+        """Restrict to read-only graph tools. add_hypothesis is registered separately."""
+        self._register_graph_read_tools()
+
     def _register_hypothesis_tools(self) -> None:
         self.register_tool(
             name="add_hypothesis",
@@ -51,6 +57,32 @@ class HypothesisAgent(BaseAgent):
             executor=self._add_hypothesis,
         )

+    def _build_system_prompt(self, task: str) -> str:
+        """Focused prompt — no INVESTIGATE/RECORD/LINK workflow."""
+        return (
+            f"You are {self.name}, a forensic hypothesis analyst.\n"
+            f"Role: {self.role}\n\n"
+            f"Image: {self.graph.image_path}\n"
+            f"Current investigation state: {self.graph.stats_summary()}\n\n"
+            f"Your task: {task}\n\n"
+            f"WORKFLOW:\n"
+            f"1. Call list_phenomena and search_graph to review existing findings.\n"
+            f"2. For each hypothesis you want to record, call add_hypothesis (title + description).\n"
+            f"3. Wrap a short summary in <answer> when you have generated 3-7 hypotheses.\n\n"
+            f"STRICT BOUNDARIES:\n"
+            f"- Your only mutation tool is add_hypothesis. Do NOT attempt list_directory, "
+            f"parse_registry_key, extract_file, or any disk-image investigation tools — "
+            f"they are not yours and you will get 'unknown tool' errors.\n"
+            f"- You CANNOT create phenomena, leads, or entity links. The orchestrator handles "
+            f"all phenomenon↔hypothesis linking after you finish.\n"
+            f"- Each hypothesis must be specific and testable. Avoid generic templates like "
+            f"'Unauthorized Remote Access' or 'Malware Deployment' unless concrete phenomena "
+            f"in the graph already point to them.\n"
+            f"- If the graph is empty, generate broad starting hypotheses and mark them "
+            f"clearly as exploratory in their description so downstream agents know they "
+            f"still need evidence."
+        )
+
     async def _add_hypothesis(self, title: str, description: str) -> str:
         hid = await self.graph.add_hypothesis(
             title=title,