fix: address agent boundary / JSON robustness / Phase 4 no-op from CFReDS run

Issues found while running the system end-to-end on the NIST CFReDS Hacking Case disk image (SCHARDT.001, Mr. Evil). Four interconnected fixes:

1. HypothesisAgent boundary leak (two layers)

   B.1 Tool set: BaseAgent._register_graph_tools registered add_phenomenon / add_lead / link_to_entity for every agent. With an empty graph in Phase 2, HypothesisAgent "compensated" by inventing phenomena, dispatching leads, and linking entities.

   B.2 Prompt leak: BaseAgent's shared system prompt hard-coded "Call investigation tools (list_directory, parse_registry_key, etc.)". HypothesisAgent hallucinated list_directory calls and wasted 2 LLM rounds on 'unknown tool' errors before backing off.

   Fix:
   - Split _register_graph_tools into _register_graph_read_tools + _register_graph_write_tools.
   - HypothesisAgent, ReportAgent, and TimelineAgent override _register_graph_tools to skip the write tools.
   - HypothesisAgent and TimelineAgent override _build_system_prompt with focused, role-specific workflows (no Phase A-D investigation boilerplate).

2. JSON parse failures in Phase 3 lead generation (5/6 hypotheses lost)

   DeepSeek emits JSON with stray backslashes (from Windows path references) and occasional minor syntax slips. The old single-stage sanitizer couldn't recover, and the per-hypothesis fallback silently swallowed each failure.

   Fix:
   - _safe_json_loads is now progressive: stage 0 parses the input as-is; stage 1 escapes every stray \X (X not in the valid JSON escape set); if both stages fail, the raw input is logged for diagnosis.
   - New _call_llm_for_json helper: on a parse failure, append the error to the prompt and re-call the LLM (self-correcting retry, up to 2 retries).
   - All 4 LLM-JSON call sites in the orchestrator are refactored to use it.

3. Phase 1 sometimes skipped add_phenomenon (the LLM treated <answer> as the deliverable)

   Strengthen BaseAgent's RECORDING REQUIREMENT with an explicit "your <answer> is DISCARDED; only graph mutations propagate", plus a new rule: negative findings (searched X, found nothing) MUST also be recorded as phenomena, since they constrain the hypothesis space.
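A minimal sketch of the two-stage parse and self-correcting retry from item 2, for illustration only. It is not the orchestrator's code: safe_json_loads and call_llm_for_json are hypothetical stand-ins for _safe_json_loads and _call_llm_for_json, and call_llm is assumed to be a plain synchronous str -> str callable (the real helper presumably awaits an LLM client). Stage 1 treats \u as valid only when four hex digits follow, so a path such as C:\users is repaired too; escapes like \t or \n inside a path are still valid JSON and cannot be told apart from intended ones.

```python
import json
import logging
import re
from typing import Any, Callable, Optional

logger = logging.getLogger(__name__)

# A backslash plus what follows it: either \uXXXX or any single char (or nothing at EOF).
# Consuming each escape as a unit keeps an already-escaped "\\" from being split.
_ESCAPE = re.compile(r"\\(u[0-9a-fA-F]{4}|.?)", re.DOTALL)
_VALID_SINGLE = set('"\\/bfnrt')


def _fix_escape(match: re.Match) -> str:
    tail = match.group(1)
    if len(tail) == 5 or tail in _VALID_SINGLE:
        return match.group(0)  # valid JSON escape: keep as-is
    return "\\\\" + tail  # stray \X: double the backslash so X stays literal


def safe_json_loads(raw: str) -> Any:
    """Progressive parse (hypothetical stand-in for _safe_json_loads)."""
    try:
        return json.loads(raw)  # stage 0: parse as-is
    except json.JSONDecodeError:
        pass
    try:
        return json.loads(_ESCAPE.sub(_fix_escape, raw))  # stage 1: escape stray \X
    except json.JSONDecodeError:
        # Final failure: log the raw model output for diagnosis, then re-raise.
        logger.warning("JSON parse failed after repair; raw input: %r", raw)
        raise


def call_llm_for_json(
    call_llm: Callable[[str], str], prompt: str, max_retries: int = 2
) -> Any:
    """Call the LLM and parse JSON; on failure, feed the error back and retry."""
    current = prompt
    last_error: Optional[json.JSONDecodeError] = None
    for _ in range(max_retries + 1):  # first attempt + max_retries retries
        reply = call_llm(current)
        try:
            return safe_json_loads(reply)
        except json.JSONDecodeError as exc:
            last_error = exc
            # Self-correcting retry: tell the model exactly what failed to parse.
            current = (
                f"{prompt}\n\nYour previous reply was not valid JSON "
                f"({exc.msg} at line {exc.lineno}, column {exc.colno}). "
                "Return only corrected JSON."
            )
    raise ValueError(f"no valid JSON after {max_retries} retries") from last_error
```

Raising after the last retry, instead of falling back per item, keeps each lost hypothesis visible in the logs rather than silently dropped.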
4. Phase 4 Timeline was a no-op

   TimelineAgent inherited BaseAgent's Phase A-D prompt and never called add_temporal_edge, so it produced 0 temporal edges. Override the prompt with a concrete workflow (build_filesystem_timeline -> get_timestamped_phenomena -> 15-40 add_temporal_edge calls) and restrict the tool set to the read-only graph tools plus its 3 temporal tools.

Verified end-to-end: HypothesisAgent now has 8 tools (no writes), ReportAgent has 13 (no graph writes), and TimelineAgent has 10 (read + temporal + timeline). All 60 unit tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
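The read/write split from items 1 and 4 can be sketched as a template method: the base class registers both tool groups by default, and restricted agents override the hook. This is a simplified illustration, not the project's BaseAgent: the no-argument constructors, the dictionary registry, call_tool, the stub executors, and the exact grouping of the timeline tools are assumptions (only the tool names themselves appear in this commit).

```python
from typing import Callable, Dict

READ_TOOLS = ("list_phenomena", "search_graph")
WRITE_TOOLS = ("add_phenomenon", "add_lead", "link_to_entity")
TIMELINE_TOOLS = ("build_filesystem_timeline", "get_timestamped_phenomena", "add_temporal_edge")


def _stub(**kwargs: object) -> str:
    """Placeholder executor; a real agent would query or mutate the graph."""
    return "ok"


class BaseAgent:
    def __init__(self) -> None:
        self.tools: Dict[str, Callable[..., str]] = {}
        # Subclasses decide which graph tools exist by overriding this hook.
        self._register_graph_tools()

    def register_tool(self, name: str, executor: Callable[..., str]) -> None:
        self.tools[name] = executor

    def call_tool(self, name: str, **kwargs: object) -> str:
        executor = self.tools.get(name)
        if executor is None:
            return f"Error: unknown tool '{name}'"
        return executor(**kwargs)

    def _register_graph_tools(self) -> None:
        # Default for investigation agents: read and write access.
        self._register_graph_read_tools()
        self._register_graph_write_tools()

    def _register_graph_read_tools(self) -> None:
        for name in READ_TOOLS:
            self.register_tool(name, _stub)

    def _register_graph_write_tools(self) -> None:
        for name in WRITE_TOOLS:
            self.register_tool(name, _stub)


class HypothesisAgent(BaseAgent):
    def __init__(self) -> None:
        super().__init__()
        self.register_tool("add_hypothesis", _stub)  # its only mutation tool

    def _register_graph_tools(self) -> None:
        self._register_graph_read_tools()  # write tools are never registered


class TimelineAgent(BaseAgent):
    def _register_graph_tools(self) -> None:
        self._register_graph_read_tools()
        for name in TIMELINE_TOOLS:
            self.register_tool(name, _stub)
```

Because BaseAgent.__init__ calls self._register_graph_tools(), Python's dynamic dispatch routes that call to the subclass override, so restricted agents never see the write tools at all instead of having them registered and later removed.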
2026-05-12 17:14:16 +08:00
parent 0a966d8476
commit 893f5b5de2
5 changed files with 251 additions and 82 deletions
--- a/agents/hypothesis.py
+++ b/agents/hypothesis.py
@@ -1,7 +1,9 @@
 """Hypothesis Agent — generates investigative hypotheses from phenomena.

 Generates hypotheses only. Phenomenon→Hypothesis linking is handled centrally
-by Orchestrator._judge_new_phenomena, so all link logic lives in one place.
+by Orchestrator._judge_new_phenomena. Tool set is restricted to read-only
+graph queries + add_hypothesis to prevent the agent from creating phenomena,
+leads, or entity links.
 """

 from __future__ import annotations
@@ -27,6 +29,10 @@ class HypothesisAgent(BaseAgent):
        super().__init__(llm, graph)
        self._register_hypothesis_tools()

+    def _register_graph_tools(self) -> None:
+        """Restrict to read-only graph tools. add_hypothesis is registered separately."""
+        self._register_graph_read_tools()
+
    def _register_hypothesis_tools(self) -> None:
        self.register_tool(
            name="add_hypothesis",
@@ -51,6 +57,32 @@ class HypothesisAgent(BaseAgent):
            executor=self._add_hypothesis,
        )

+    def _build_system_prompt(self, task: str) -> str:
+        """Focused prompt — no INVESTIGATE/RECORD/LINK workflow."""
+        return (
+            f"You are {self.name}, a forensic hypothesis analyst.\n"
+            f"Role: {self.role}\n\n"
+            f"Image: {self.graph.image_path}\n"
+            f"Current investigation state: {self.graph.stats_summary()}\n\n"
+            f"Your task: {task}\n\n"
+            "WORKFLOW:\n"
+            "1. Call list_phenomena and search_graph to review existing findings.\n"
+            "2. For each hypothesis you want to record, call add_hypothesis (title + description).\n"
+            "3. Wrap a short summary in <answer> when you have generated 3-7 hypotheses.\n\n"
+            "STRICT BOUNDARIES:\n"
+            "- Your only mutation tool is add_hypothesis. Do NOT attempt list_directory, "
+            "parse_registry_key, extract_file, or any disk-image investigation tools — "
+            "they are not yours and you will get 'unknown tool' errors.\n"
+            "- You CANNOT create phenomena, leads, or entity links. The orchestrator handles "
+            "all phenomenon↔hypothesis linking after you finish.\n"
+            "- Each hypothesis must be specific and testable. Avoid generic templates like "
+            "'Unauthorized Remote Access' or 'Malware Deployment' unless concrete phenomena "
+            "in the graph already point to them.\n"
+            "- If the graph is empty, generate broad starting hypotheses and mark them "
+            "clearly as exploratory in their description so downstream agents know they "
+            "still need evidence."
+        )
+
    async def _add_hypothesis(self, title: str, description: str) -> str:
        hid = await self.graph.add_hypothesis(
            title=title,