refactor: native tool calling + generic forced-retry + terminal exit

- llm_client: switch tool_call_loop from text-based <tool_call> regex to OpenAI-native tools=[...] / structured tool_calls field; accumulate delta.reasoning_content for DeepSeek thinking-mode echo-back; fold preserves system msg and aligns boundary to never orphan role:tool
- base_agent: generic forced-retry via mandatory_record_tools class attr (filesystem -> add_phenomenon, timeline -> add_temporal_edge, hypothesis -> add_hypothesis, report -> save_report); count via executor wrapper
- terminal_tools class attr + loop short-circuit: when a terminal tool is called, loop exits with its raw return as final_text. ReportAgent declares save_report as terminal; this replaces the <answer>-tag stop signal that native tool calling broke
- _execute_*: return (raw, formatted); terminal exit uses untruncated raw, conversation history uses 3000-char-capped formatted
- evidence_graph + orchestrator: LLM-derived InvestigationArea support (hypothesis-driven coverage check, replaces hardcoded _AREA_KEYWORDS / _AREA_TOOLS); manual yaml block kept as optional seed
- strip <answer> references from agent prompts (no longer load-bearing)

Verified on CFReDS image across 4 smoke runs: 0 JSON parse failures (was 3); 22 temporal edges from Phase 4 (was 0); ReportAgent exits via save_report (was max_iterations regression). 78/78 unit tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
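
The terminal-exit and `(raw, formatted)` behaviour described in the message can be pictured with a minimal sketch. This is not the repository's code: `execute_tool`, the `next_calls` callback standing in for the LLM request, `scripted_model`, and the `list_files` tool are all hypothetical names; only the 3000-character cap, the terminal-tool idea, and `save_report` come from the message.

```python
from typing import Callable

MAX_FORMATTED_CHARS = 3000  # history cap named in the commit message


def execute_tool(tools: dict, name: str, args: dict) -> tuple:
    """Run one tool and return (raw, formatted); formatted is capped for history."""
    raw = tools[name](**args)
    formatted = raw[:MAX_FORMATTED_CHARS]
    return raw, formatted


def tool_call_loop(next_calls: Callable, tools: dict, terminal_tools: set,
                   max_iterations: int = 10) -> str:
    """Loop until the model answers in plain text or calls a terminal tool.

    `next_calls(history)` is a hypothetical stand-in for the LLM request. It
    returns either a final string or a list of (tool_name, args) pairs.
    """
    history = []
    for _ in range(max_iterations):
        reply = next_calls(history)
        if isinstance(reply, str):
            return reply
        for name, args in reply:
            raw, formatted = execute_tool(tools, name, args)
            if name in terminal_tools:
                # Short-circuit: the untruncated raw return becomes final_text.
                return raw
            history.append({"role": "tool", "name": name, "content": formatted})
    return "max_iterations reached"


def scripted_model(history):
    """Hypothetical model: one ordinary tool call, then the terminal call."""
    if not history:
        return [("list_files", {})]
    return [("save_report", {"body": "x" * 5000})]


TOOLS = {
    "list_files": lambda: "a.txt\nb.txt",
    "save_report": lambda body: body,
}

final_text = tool_call_loop(scripted_model, TOOLS, terminal_tools={"save_report"})
```

In this sketch the 5000-character report comes back whole as `final_text`, while anything appended to `history` would have been cut to 3000 characters.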
25
README.md
@@ -67,14 +67,18 @@ Phenomenon → Hypothesis edge types and weights are hardcoded in `HYPOTHESIS_EDGE_WEIGHT
| **Phase 4** | TimelineAgent builds a MAC timeline with `build_filesystem_timeline` and correlates it with Phenomenon timestamps |
| **Phase 5** | ReportAgent synthesizes hypotheses, evidence, and entities into a Markdown report |

### Gap Analysis (end of Phase 3)
### Investigation Areas (hypothesis-derived)

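`config.yaml:investigation_areas` lists the investigation areas that must be covered (system info, user accounts, network configuration, mail configuration, IRC logs, PCAP, deleted files, Prefetch, and so on). The Orchestrator judges coverage in two layers:
At the end of Phase 2 the orchestrator makes a single LLM call that derives 5-12 **InvestigationArea** entries from all active hypotheses (snake_case slug, description, suggested_agent, expected_keywords, expected_tools, priority, motivating_hypothesis_ids). The areas are stored in `graph.investigation_areas` and serialized to `runs/<ts>/investigation_areas.json`. They serve two purposes:
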
1. **Keyword match** (`_AREA_KEYWORDS`): scan the titles and descriptions of existing Phenomena
2. **Tool hit** (`_AREA_TOOLS`): check whether a key tool for the area was called (e.g. `enumerate_users`, `parse_pcap_strings`)
1. **Phase 3 main-loop prompt**: each hypothesis block carries `Expected areas: a, b, c`; the LLM still picks leads freely but gets a soft hint
2. **Gap Analysis at the end of Phase 3**: coverage is judged in two layers:
   - **Keyword match**: scan Phenomenon titles/descriptions against area.expected_keywords
   - **Tool hit**: check whether the tools in area.expected_tools were actually called

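Uncovered areas automatically get a lead dispatched, with at most 3 gap-filling rounds.
Uncovered areas automatically get a lead dispatched (`suggested_agent` and `priority` are carried over, and `motivating_hypothesis_ids[0]` is passed through as `Lead.hypothesis_id` to preserve provenance), with at most 3 gap-filling rounds.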
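
The two-layer check and the lead dispatch can be sketched roughly as follows. This is an illustration, not the orchestrator's implementation: the dataclass shapes, `is_covered`, and `leads_for_gaps` are hypothetical, and treating an area as covered when either layer hits is an assumption, since the text does not say how the two layers combine. The sketch shows a single pass and leaves out the 3-round limit.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class InvestigationArea:
    # Field names follow the README; the class shape itself is assumed.
    slug: str
    suggested_agent: str
    priority: int
    expected_keywords: list = field(default_factory=list)
    expected_tools: list = field(default_factory=list)
    motivating_hypothesis_ids: list = field(default_factory=list)


@dataclass
class Lead:
    area_slug: str
    agent: str
    priority: int
    hypothesis_id: Optional[str]


def is_covered(area, phenomenon_texts, called_tools):
    """Assumed rule: a keyword hit OR a tool hit marks the area as covered."""
    text = " ".join(phenomenon_texts).lower()
    keyword_hit = any(kw.lower() in text for kw in area.expected_keywords)
    tool_hit = any(tool in called_tools for tool in area.expected_tools)
    return keyword_hit or tool_hit


def leads_for_gaps(areas, phenomenon_texts, called_tools):
    """Create one lead per uncovered area, keeping hypothesis provenance."""
    leads = []
    for area in areas:
        if is_covered(area, phenomenon_texts, called_tools):
            continue
        ids = area.motivating_hypothesis_ids
        leads.append(Lead(area.slug, area.suggested_agent, area.priority,
                          ids[0] if ids else None))
    return leads


areas = [
    InvestigationArea("user_accounts", "registry", 2,
                      ["sam hive"], ["enumerate_users"], ["h1"]),
    InvestigationArea("pcap_capture", "filesystem", 1,
                      ["pcap"], ["parse_pcap_strings"], ["h2", "h3"]),
]
phenomena = ["Found SAM hive with three local accounts"]
gap_leads = leads_for_gaps(areas, phenomena, called_tools={"list_directory"})
```

Here `user_accounts` counts as covered through the keyword layer, so only `pcap_capture` produces a lead, and that lead carries `h2` as its `hypothesis_id`.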
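
**Manual override**: `config.yaml:investigation_areas` is commented out by default, so areas are purely LLM-derived. Uncommenting it adds areas that must be investigated; they are written before the LLM-derived ones and are protected from being overwritten by slug-based dedupe (the LLM only augments their keyword/tool lists). This is the key to adapting across cases and platforms: Windows-specific areas are no longer hardcoded.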
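
One way to picture the slug-based dedupe is the sketch below. It is an assumption-laden illustration rather than the project's merge logic: `merge_areas` is a hypothetical helper, both inputs are assumed to use the `config.yaml` key names (`area`, `agent`, `priority`, `keywords`, `tools`), and the `irc_logs` entry and `last boot` keyword are made-up sample data.

```python
def merge_areas(manual, derived):
    """Seed with manual areas, then fold LLM-derived areas in by slug.

    For a slug that already exists, only the keyword/tool lists grow;
    agent and priority from the manual entry are left untouched.
    """
    merged = {}
    for area in manual:
        merged[area["area"]] = {
            **area,
            "keywords": list(area.get("keywords", [])),
            "tools": list(area.get("tools", [])),
        }
    for area in derived:
        slug = area["area"]
        if slug not in merged:
            merged[slug] = {
                **area,
                "keywords": list(area.get("keywords", [])),
                "tools": list(area.get("tools", [])),
            }
            continue
        existing = merged[slug]
        for key in ("keywords", "tools"):
            for item in area.get(key, []):
                if item not in existing[key]:
                    existing[key].append(item)
    return list(merged.values())


manual = [{"area": "shutdown_time", "agent": "registry", "priority": 3,
           "keywords": ["shutdown"], "tools": ["get_shutdown_time"]}]
derived = [
    {"area": "shutdown_time", "agent": "filesystem", "priority": 1,
     "keywords": ["shutdown", "last boot"], "tools": []},
    {"area": "irc_logs", "agent": "filesystem", "priority": 2,
     "keywords": ["irc"], "tools": []},
]
merged_areas = merge_areas(manual, derived)
```

In this example the manual `shutdown_time` entry keeps `agent: registry` and `priority: 3`, gains the extra keyword, and the new `irc_logs` area is appended after it.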

## Agent System

@@ -183,11 +187,12 @@ max_investigation_rounds: 5 # maximum number of Phase 3 iterations
#   - title: "The suspect actively carried out network sniffing"
#     description: "..."

investigation_areas:            # areas that Gap Analysis must cover
  - area: system_info
    agent: registry
    task: "..."
  # ...
# investigation_areas:          # optional: manual override (default is fully LLM-derived)
#   - area: shutdown_time       # via slug dedupe the LLM only augments the
#     agent: registry           # keyword/tool lists; it does not overwrite manual entries
#     priority: 3
#     keywords: [shutdown]
#     tools: [get_shutdown_time]
```

When `hypotheses` is not configured, HypothesisAgent generates them automatically.