refactor: native tool calling + generic forced-retry + terminal exit

- llm_client: switch tool_call_loop from text-based <tool_call> regex to OpenAI-native tools=[...] / structured tool_calls field; accumulate delta.reasoning_content for DeepSeek thinking-mode echo-back; fold preserves system msg and aligns boundary to never orphan role:tool - base_agent: generic forced-retry via mandatory_record_tools class attr (filesystem -> add_phenomenon, timeline -> add_temporal_edge, hypothesis -> add_hypothesis, report -> save_report); count via executor wrapper - terminal_tools class attr + loop short-circuit: when a terminal tool is called, loop exits with its raw return as final_text. ReportAgent declares save_report as terminal - replaces the <answer>-tag stop signal that native tool calling broke - _execute_*: return (raw, formatted) - terminal exit uses untruncated raw, conversation history uses 3000-char-capped formatted - evidence_graph + orchestrator: LLM-derived InvestigationArea support (hypothesis-driven coverage check, replaces hardcoded _AREA_KEYWORDS / _AREA_TOOLS); manual yaml block kept as optional seed - strip <answer> references from agent prompts (no longer load-bearing) Verified on CFReDS image across 4 smoke runs: 0 JSON parse failures (was 3); 22 temporal edges from Phase 4 (was 0); ReportAgent exits via save_report (was max_iterations regression). 78/78 unit tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 13:51:19 +08:00
parent 0a2b344c84
commit 444d58726a
9 changed files with 1356 additions and 298 deletions
--- a/agents/timeline.py
+++ b/agents/timeline.py
@@ -24,6 +24,7 @@ class TimelineAgent(BaseAgent):
        "MAC timestamps and correlate events across all phenomena categories in the "
        "evidence graph to reconstruct the sequence of activities on the system."
    )
+    mandatory_record_tools = ("add_temporal_edge",)

    def __init__(self, llm: LLMClient, graph: EvidenceGraph) -> None:
        super().__init__(llm, graph)
@@ -95,7 +96,7 @@ class TimelineAgent(BaseAgent):
            f"     - 'Tool installation'       (before) 'Tool execution'\n"
            f"4. Aim for 15-40 temporal edges that connect the major events into a "
            f"forensic story.\n"
-            f"5. Wrap a short summary in <answer> when done.\n\n"
+            f"5. STOP after recording all meaningful temporal edges. Do not call any more tools.\n\n"
            f"STRICT BOUNDARIES:\n"
            f"- Your job is to CONNECT existing phenomena, NOT to discover new ones. "
            f"You CANNOT call add_phenomenon — the tool isn't yours.\n"