refactor: native tool calling + generic forced-retry + terminal exit

- llm_client: switch tool_call_loop from text-based <tool_call> regex to OpenAI-native tools=[...] / structured tool_calls field; accumulate delta.reasoning_content for DeepSeek thinking-mode echo-back; fold preserves system msg and aligns boundary to never orphan role:tool - base_agent: generic forced-retry via mandatory_record_tools class attr (filesystem -> add_phenomenon, timeline -> add_temporal_edge, hypothesis -> add_hypothesis, report -> save_report); count via executor wrapper - terminal_tools class attr + loop short-circuit: when a terminal tool is called, loop exits with its raw return as final_text. ReportAgent declares save_report as terminal - replaces the <answer>-tag stop signal that native tool calling broke - _execute_*: return (raw, formatted) - terminal exit uses untruncated raw, conversation history uses 3000-char-capped formatted - evidence_graph + orchestrator: LLM-derived InvestigationArea support (hypothesis-driven coverage check, replaces hardcoded _AREA_KEYWORDS / _AREA_TOOLS); manual yaml block kept as optional seed - strip <answer> references from agent prompts (no longer load-bearing) Verified on CFReDS image across 4 smoke runs: 0 JSON parse failures (was 3); 22 temporal edges from Phase 4 (was 0); ReportAgent exits via save_report (was max_iterations regression). 78/78 unit tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 13:51:19 +08:00
parent 0a2b344c84
commit 444d58726a
9 changed files with 1356 additions and 298 deletions
--- a/README.md
+++ b/README.md
@@ -67,14 +67,18 @@ Phenomenon → Hypothesis 的边类型与权重写死在 `HYPOTHESIS_EDGE_WEIGHT
 | **Phase 4** | TimelineAgent 用 `build_filesystem_timeline` 生成 MAC 时间线，与 Phenomenon 时间戳关联 |
 | **Phase 5** | ReportAgent 综合假设、证据、实体，生成 Markdown 报告 |

-### Gap Analysis（Phase 3 末）
+### Investigation Areas（hypothesis-derived）

-`config.yaml:investigation_areas` 列出必须覆盖的调查领域（系统信息、用户账户、网络配置、邮件配置、IRC 日志、PCAP、删除文件、Prefetch 等）。Orchestrator 两层判定覆盖情况：
+Phase 2 末尾 orchestrator 调一次 LLM 从所有 active hypothesis 派生 5-12 个 **InvestigationArea**（snake_case slug、description、suggested_agent、expected_keywords、expected_tools、priority、motivating_hypothesis_ids）。Areas 存进 `graph.investigation_areas`，序列化到 `runs/<ts>/investigation_areas.json`。两个用途：

-1. **关键词匹配**（`_AREA_KEYWORDS`）— 扫现有 Phenomenon 标题/描述
-2. **工具命中**（`_AREA_TOOLS`）— 检查是否调用过该领域的关键工具（如 `enumerate_users`、`parse_pcap_strings`）
+1. **Phase 3 主循环提示** — 每个 hypothesis 块附 `Expected areas: a, b, c`，LLM 仍自由选 lead 但有软引导
+2. **Phase 3 末尾 Gap Analysis** — 两层判定覆盖情况：
+   - **关键词匹配**：扫 Phenomenon 标题/描述对照 area.expected_keywords
+   - **工具命中**：检查 area.expected_tools 是否实际调用过

-未覆盖的领域自动派发 lead，最多 3 轮补漏。
+未覆盖的 area 自动派 lead（`suggested_agent` + `priority` + `motivating_hypothesis_ids[0]` 透传给 `Lead.hypothesis_id` 保留 provenance），最多 3 轮补漏。
+
+**手动 override**：`config.yaml:investigation_areas` 默认注释掉，纯 LLM 派生。取消注释可添加强制必查的领域，会先于 LLM 写入并通过 slug-based dedupe 保护不被覆盖（LLM 只会 augment keyword/tool 列表）。这是跨案件/跨平台适配的关键 —— 不再 hardcode Windows-specific 领域。

 ## Agent 体系

@@ -183,11 +187,12 @@ max_investigation_rounds: 5          # Phase 3 最大迭代轮数
 #   - title: "嫌疑人主动实施网络嗅探"
 #     description: "..."

-investigation_areas:                 # Gap Analysis 必须覆盖的领域
-  - area: system_info
-    agent: registry
-    task: "..."
-  # ...
+# investigation_areas:                 # 可选：手动 override（默认全 LLM 派生）
+#   - area: shutdown_time              #         LLM 通过 slug dedupe 只 augment
+#     agent: registry                  #         keyword/tool 列表，不覆盖 manual
+#     priority: 3
+#     keywords: [shutdown]
+#     tools: [get_shutdown_time]
 ```

 未配置 `hypotheses` 时由 HypothesisAgent 自动生成。