From 8b964b5decf5f2e96f913c14a3d057dafc25d5ee Mon Sep 17 00:00:00 2001
From: BattleTag <hychen3637.com>
Date: Thu, 21 May 2026 02:28:06 -1000
Subject: [PATCH] docs(strategist) S8/9: DESIGN.md updates +
 DESIGN_STRATEGIST.md spec
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Applies the DESIGN.md updates listed in DESIGN_STRATEGIST.md §11. The
strategist refit is the first sub-design big enough to need its own
document, so it lives as a sibling to DESIGN.md rather than inline.

DESIGN_STRATEGIST.md (new, 543 lines) covers:
  §0  Scope, non-goals, invariants preserved
  §1  Data model (Lead extension, InvestigationRound)
  §2  Six tools (graph_overview / source_coverage / marginal_yield /
      budget_status / propose_lead / declare_investigation_complete)
      with full input_schema
  §3  InvestigationStrategist agent class
  §4  Orchestrator Phase 3 loop pseudocode
  §5  Persistence + resume strategy
  §6  config schema
  §7  Test plan (8 scenarios)
  §8  9-step build order (matches commit history)
  §9  Risks + mitigations
  §10 Open questions
  §11 Required DESIGN.md updates (applied here)
  §12 What this design does NOT solve (exam-test coverage, vision-
      capable LLM, blockchain explorer, etc.)

DESIGN.md updates per §11:
  §4.5  Note that harmonic damping has now landed
  §4.9  Phase 3 table row now points at the strategist loop +
        inline summary
  §5    Lead + InvestigationRound rows added to the data-model
        summary table

This commit completes the strategist refit. 174 tests pass, 1 skipped.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 DESIGN.md            |  24 +-
 DESIGN_STRATEGIST.md | 543 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 561 insertions(+), 6 deletions(-)
 create mode 100644 DESIGN_STRATEGIST.md

diff --git a/DESIGN.md b/DESIGN.md
index 731c450..c3a993e 100644
--- a/DESIGN.md
+++ b/DESIGN.md
@@ -172,9 +172,11 @@ verified facts (citations with rerun instructions) and interpretation (explicitly labeled
 
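 - Thresholds are unchanged (≥0.8 supported / ≤0.2 refuted); they are now derived from `L_post`.
 - `prior_prob` becomes configurable (default 0.5 → `L_prior=0`).
-- **Simplifying assumption**: multiple edges are treated as independent (naive Bayes). Repeated evidence of the same kind
-  is not fully independent, so add a knob: cap or decay the number of edges per `(hypothesis, edge_type)` to keep
-  "the same finding entered into the graph repeatedly by multiple agents" from inflating confidence (the existing Jaccard dedup already partly mitigates this).
+- **Harmonic damping of same-kind evidence** (landed 2026-05): the k-th edge with the same `(hypothesis, edge_type)`
+  contributes `log_lr_base / k`. The cumulative total is `log_lr_base · H_N` (harmonic series, ~ ln N).
+  This fixes the breakdown of naive-Bayes independence and the runaway to L=+31 caused by the same finding being entered
+  repeatedly by multiple agents (field data from 2026-05-20). A single edge is unchanged (k=1, damping factor 1.0). The **structural signal**
+  matters more than the absolute value: the strategist judges evidence depth better from `distinct_sources` than from the confidence number.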
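+
+A minimal sketch of the damped sum, assuming base-10 log-odds (matching `log₁₀(LR)`) and edges supplied as `(hypothesis_id, edge_type, log_lr_base)` tuples; `damped_log_odds` and `confidence` are illustrative names, not the actual `evidence_graph.py` API:
+
+```python
+from collections import defaultdict
+
+
+def damped_log_odds(l_prior: float, edges: list[tuple[str, str, float]]) -> dict[str, float]:
+    """Return L_post per hypothesis. The k-th edge sharing a (hypothesis, edge_type)
+    pair contributes log_lr_base / k, so N such edges sum to log_lr_base * H_N."""
+    seen: dict[tuple[str, str], int] = defaultdict(int)
+    l_post: dict[str, float] = defaultdict(lambda: l_prior)
+    for hyp, edge_type, log_lr_base in edges:
+        seen[(hyp, edge_type)] += 1  # k for this (hypothesis, edge_type) pair
+        l_post[hyp] += log_lr_base / seen[(hyp, edge_type)]
+    return dict(l_post)
+
+
+def confidence(l_post: float) -> float:
+    """Map base-10 log-odds back to a probability (L = 0 gives 0.5)."""
+    return 1.0 / (1.0 + 10 ** (-l_post))
+```
+
+With three `supports` edges of base 1.0 the sum is 1 + 1/2 + 1/3 ≈ 1.83 rather than 3.0; an edge of another `edge_type` starts its own count at k=1 and is not damped by them.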
 
 A **hypothesis × evidence matrix** view is also produced as a by-product, for use in reporting and lead selection.
 
@@ -235,11 +237,19 @@ network=browser/PCAP). Reorganized by **investigative function**, and adding platform-
 |---|---|
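 | Phase 1 | "single-image initial survey" → **parallel per-source triage**, each source gets an agent suited to its type |
 | Phase 2 | Hypotheses generated across sources; identity-coreference hypotheses are first registered here |
-| Phase 3 | Leads dispatched to source-aware agents; hypothesis × evidence matrix updated in real time |
+| Phase 3 | **Strategist loop**: each round an LLM meta-agent reads the graph and chooses propose_lead or declare_complete; workers execute the leads; hypothesis edges are re-evaluated. See `DESIGN_STRATEGIST.md` |
 | Phase 4 | Cross-source timeline merge, with **per-source timezone normalization** (iOS UTC vs Android local time) |
 | Phase 5 | One consolidated report per case: hypothesis conclusions, entity relationship graph, provenance citations for every conclusion |
 
-Disconnect recovery and run-archiving logic are kept; `graph_state.json` incrementally takes on the new fields.
+**"LLM decides depth" in Phase 3** (landed after a 2026-05 field run showed that single-round triggering in Phase 3 plus log-odds inflation left all 8 pending leads undispatched): scheduling moves from hard-coded decisions ("max_rounds=N, converged→stop") to an LLM meta-agent.
+
+- New agent `InvestigationStrategist` (`agents/strategist.py`) takes one action per round: propose 1-3 leads, or declare_investigation_complete
+- 4 read-only view tools, `graph_overview` / `source_coverage` / `marginal_yield` / `budget_status` (`tools/strategy.py`), expose scheduling signals to the LLM
+- 2 write/decision tools, `propose_lead` / `declare_investigation_complete`, are the strategist's mandatory_record tools
+- The orchestrator reads `config.yaml:strategist.*` + `config.yaml:budgets.*` to control max_rounds and the hard caps
+- See `[[DESIGN_STRATEGIST]]` for the full data model, prompt design, disconnect recovery, and risks/mitigations
+
+Disconnect recovery and run-archiving logic are kept; `graph_state.json` gains an `investigation_rounds[]` array that persists each strategist round's decision.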
 
 ---
 
@@ -252,8 +262,10 @@ network=browser/PCAP). Reorganized by **investigative function**, and adding platform-
 | `Phenomenon` | + `source_id`; description split into `verified_facts[]` + `interpretation`; clarify or remove the ambiguous `confidence` (default 1.0), since an observation's reliability is expressed through grounding |
 | `Hypothesis` | + `prior_prob`, `log_odds` (accumulator); `confidence` becomes a derived value |
 | `Entity` | + typed identifier set; connected across sources via `same_as` edges |
-| Phenomenon→Hypothesis edge | carries `edge_type`, mapped to `log₁₀(LR)` (replaces `_DEFAULT_EDGE_WEIGHTS`) |
+| Phenomenon→Hypothesis edge | carries `edge_type`, mapped to `log₁₀(LR)` (replaces `_DEFAULT_EDGE_WEIGHTS`); the k-th edge with the same `(hyp, edge_type)` is damped harmonically by `1/k` |
 | Entity→Entity edge | **new** `same_as` (backed by a coref hypothesis, reversible) |
+| `Lead` | + `proposed_by` / `motivating_hypothesis` / `expected_evidence_type` / `round_number` (strategist annotations) |
+| `InvestigationRound` | **new**: provenance of each strategist round's decision + before/after snapshots + yield metrics |
 
 `evidence_graph.py`'s `VALID_EDGE_TYPES`, serialization/deserialization and Jaccard dedup are adapted accordingly.
 
diff --git a/DESIGN_STRATEGIST.md b/DESIGN_STRATEGIST.md
new file mode 100644
index 0000000..67284e3
--- /dev/null
+++ b/DESIGN_STRATEGIST.md
@@ -0,0 +1,543 @@
+# Strategist Loop: Belief-Driven Rework of Phase 3
+
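+> A supplementary design document to DESIGN.md, covering the concrete rewrite of orchestrator Phase 3 (§4.9).
+>
+> **Motivation**: the first full 6-source field run on 2026-05-20 (`runs/2026-05-20T20-15-04/`)
+> showed that Phase 3 did not work: none of the 8 pending leads was dispatched, because
+> log-odds inflation made every hypothesis converge immediately. Even after "harmonic damping" fixed
+> the log-odds math (commit in `evidence_graph.py:update_hypothesis_confidence`),
+> Phase 3 in the current architecture is still a mechanical "single-round trigger, rule-based convergence" flow: the LLM
+> has no say at all at the scheduling layer. This design turns Phase 3 into an LLM-driven exploration loop.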
+
+---
+
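+## 0. Scope
+
+### In scope
+
+Rework `orchestrator.py:Phase 3` from "single-round, rule-triggered" into "strategist loop, belief-driven":
+one new `InvestigationStrategist` agent, 4 decision-view tools and 2 decision-action tools,
+plus a rewrite of the orchestrator loop.
+
+### Out of scope
+
+- Phase 1 is unchanged (per-source triage stays as it is)
+- Phase 2 is unchanged (HypothesisAgent is untouched; the strategist can **invoke** it but does not replace it)
+- Phase 4/5 are unchanged (timeline / report)
+- No expert-level per-source checklists (only **soft-hint** checklists inside the `source_coverage` tool)
+- No new graph node types; leads reuse the existing structure
+
+### Invariants preserved
+
+- DESIGN.md §4.3 grounding gateway: all writes still go through it
+- DESIGN.md §4.5 log-odds + harmonic damping
+- DESIGN.md §4.4 boundary between verified_facts and interpretation
+- Disconnect recovery (`graph_state.json` serialization stays compatible)
+
+### Design principles
+
+1. **Move "LLM proposes, code decides" up to the scheduling layer**: DESIGN.md's first principle is currently honored
+   only at the fact layer (grounding); at the scheduling layer, "whether to dig deeper, where, and when to stop" is
+   still decided by hard-coded logic. This design hands those scheduling decisions to the LLM.
+2. **Exam-style capability is present but not hard-wired**: the tool set and soft-hint checklists cover the artefact
+   categories that exam scenarios require; whether to examine a given artefact, and how deeply, is decided by the
+   strategist from the nature of the case rather than forced by a predefined checklist.
+3. **Explainable and auditable**: every strategist decision, its motivation and its yield are recorded in a
+   persisted `InvestigationRound` for after-the-fact review.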
+---
+
+## 1. Data model changes
+
+### 1.1 `Lead` gains 4 fields
+
+`evidence_graph.py:Lead` currently has `(id, title, description, target_agent, source_id, status, …)`.
+New fields:
+
+```python
+from dataclasses import dataclass
+
+@dataclass
+class Lead:
+    # ... existing fields
+    proposed_by: str = ""            # "strategist" | "filesystem" | ...: the proposing agent
+    motivating_hypothesis: str = ""  # hyp-id this lead is meant to corroborate/refute
+    expected_evidence_type: str = "" # one of edge_types: the edge type it is expected to produce
+    round_number: int = 0            # strategist round that produced it
+```
+
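+`motivating_hypothesis` is the key field: it explicitly ties a lead to a hypothesis, so afterwards we can compute
+"did running this lead actually change the hypothesis's status?", which is the strategist's marginal-yield measure.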
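+
+A minimal sketch of that measure, assuming the hyp_id → status snapshots recorded on `InvestigationRound` (§1.2); `LeadStub`, `lead_changed_status` and `round_hit_rate` are hypothetical names, and `LeadStub` carries only the two `Lead` fields the check needs. If two leads in one round share a motivating hypothesis, the snapshots alone cannot say which of them caused a flip.
+
+```python
+from dataclasses import dataclass
+
+
+@dataclass
+class LeadStub:
+    # Hypothetical subset of Lead: just enough to attribute a status change.
+    id: str
+    motivating_hypothesis: str
+
+
+def lead_changed_status(lead: LeadStub, before: dict[str, str], after: dict[str, str]) -> bool:
+    """True if the lead's motivating hypothesis has a different status after the round."""
+    hyp = lead.motivating_hypothesis
+    return before.get(hyp) != after.get(hyp)
+
+
+def round_hit_rate(leads: list[LeadStub], before: dict[str, str], after: dict[str, str]) -> float:
+    """Fraction of the round's leads whose motivating hypothesis changed status."""
+    if not leads:
+        return 0.0
+    return sum(lead_changed_status(ld, before, after) for ld in leads) / len(leads)
+```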
+
+### 1.2 New `InvestigationRound` record
+
+Records each strategist round's decision itself, because the provenance of decisions must be auditable too:
+
+```python
+from dataclasses import dataclass, field
+
+@dataclass
+class InvestigationRound:
+    id: str                          # "round-001"
+    round_number: int
+    started_at: str
+    completed_at: str = ""
+    strategist_action: str = ""      # "propose_leads" | "declare_complete"
+    leads_proposed: list[str] = field(default_factory=list)
+    leads_executed: list[str] = field(default_factory=list)
+    hypothesis_status_snapshot_before: dict = field(default_factory=dict)  # hyp_id → status
+    hypothesis_status_snapshot_after: dict = field(default_factory=dict)
+    new_phenomena_count: int = 0
+    new_edges_count: int = 0
+    decision_rationale: str = ""     # the strategist's own explanation
+```
+
+Serialized together with the graph (added to `to_dict`/`from_dict`).
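+
+A minimal round-trip sketch built on the standard-library `dataclasses.asdict` and `json`; the helper names are hypothetical, and dropping unknown keys and treating a missing `investigation_rounds` key as `[]` are assumptions meant to keep older `graph_state.json` files loadable:
+
+```python
+import json
+from dataclasses import asdict, dataclass, field, fields
+
+
+@dataclass
+class InvestigationRound:
+    # Same fields as the §1.2 dataclass, repeated so this sketch runs on its own.
+    id: str
+    round_number: int
+    started_at: str
+    completed_at: str = ""
+    strategist_action: str = ""
+    leads_proposed: list[str] = field(default_factory=list)
+    leads_executed: list[str] = field(default_factory=list)
+    hypothesis_status_snapshot_before: dict = field(default_factory=dict)
+    hypothesis_status_snapshot_after: dict = field(default_factory=dict)
+    new_phenomena_count: int = 0
+    new_edges_count: int = 0
+    decision_rationale: str = ""
+
+
+def rounds_to_dict(rounds: list[InvestigationRound]) -> dict:
+    """Serialize rounds under the top-level `investigation_rounds` key (§5)."""
+    return {"investigation_rounds": [asdict(r) for r in rounds]}
+
+
+def rounds_from_dict(data: dict) -> list[InvestigationRound]:
+    """Deserialize; a state file without the key loads as an empty list."""
+    known = {f.name for f in fields(InvestigationRound)}
+    return [
+        InvestigationRound(**{k: v for k, v in raw.items() if k in known})
+        for raw in data.get("investigation_rounds", [])
+    ]
+
+
+def dumps_state(rounds: list[InvestigationRound]) -> str:
+    """JSON text as it might appear inside graph_state.json."""
+    return json.dumps(rounds_to_dict(rounds), ensure_ascii=False)
+```
+
+Because every field is JSON-native, `rounds_from_dict(json.loads(dumps_state(rounds)))` gives back objects equal to the originals.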
+
+---
+
+## 2. New tools
+
+These go in a new file, `tools/strategy.py`, and are registered following the existing `TOOL_CATALOG` pattern.
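+
+The §3 code reads `td.name`, `td.description`, `td.input_schema` and `td.executor` from catalog entries. A hypothetical sketch of an entry type with those four attributes plus a registration helper; `ToolDef` and `register` are assumed names, not the existing code:
+
+```python
+from dataclasses import dataclass
+from typing import Any, Callable
+
+
+@dataclass(frozen=True)
+class ToolDef:
+    # Attribute names mirror what §3 reads from TOOL_CATALOG entries.
+    name: str
+    description: str
+    input_schema: dict[str, Any]
+    executor: Callable[..., str]
+
+
+TOOL_CATALOG: dict[str, ToolDef] = {}
+
+
+def register(td: ToolDef) -> ToolDef:
+    """Add a tool to the catalog, rejecting duplicate names."""
+    if td.name in TOOL_CATALOG:
+        raise ValueError(f"duplicate tool: {td.name}")
+    TOOL_CATALOG[td.name] = td
+    return td
+
+
+def _budget_status() -> str:
+    # Placeholder executor; the real one would render the §2.4 table.
+    return "# Budget Status\n"
+
+
+register(ToolDef(
+    name="budget_status",
+    description="Read-only view of budget usage against caps.",
+    input_schema={"type": "object", "properties": {}},
+    executor=_budget_status,
+))
+```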
+
+### 2.1 `graph_overview()`: global situation (read-only)
+
+**Signature**: `graph_overview() -> str`
+
+**Output** (markdown, which an LLM reads more easily than JSON):
+
+```markdown
+# Investigation State
+
+## Hypotheses (8)
+| id | title | L | conf | status | edges_in | distinct_sources | flipped_in_last_2_rounds |
+|----|-------|---|------|--------|----------|------------------|---------------------------|
+| hyp-83db8748 | Multi-Device Composite | +8.75 | 0.99 | supported | 23 | 1 | no |
+| hyp-daa7c704 | Multiple Identity Aliases | +9.21 | 0.99 | supported | 11 | 3 | no |
+| hyp-7fa9b13e | Sunny.zip contains timer_a | +2.08 | 0.99 | supported | 4 | 1 | yes (active→supported in R2) |
+| ...
+
+## Sources (6)
+| id | type | phenomena | identities | last_touched_in_round |
+|----|------|-----------|------------|-----------------------|
+| src-usb-leung | disk_image | 8 | 1 | R1 |
+| ...
+
+## Pending Leads (3)
+| id | from | targeting | for_hypothesis | reason |
+|----|------|-----------|----------------|--------|
+| lead-aaa | filesystem | src-ios-chan/Safari | hyp-83db8748 | Safari history likely contains device-switching evidence |
+```
+
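+**Key annotation**: the `distinct_sources` column exposes that "this hypothesis rests on a single source". A strategist
+that sees all 23 edges coming from the android source is expected to conclude on its own that "independent evidence from elsewhere is needed".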
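+
+How the `edges_in` and `distinct_sources` columns could be derived, assuming each P→H edge can be attributed to the `source_id` of its phenomenon. The tuple shape, the function names and the `min_edges=5` threshold for the plain-English hint (the mitigation suggested in §9) are all assumptions:
+
+```python
+from collections import defaultdict
+
+
+def hypothesis_support(edges: list[tuple[str, str, str]]) -> dict[str, tuple[int, int]]:
+    """Map hyp_id -> (edges_in, distinct_sources).
+
+    Each edge is (phenomenon_source_id, hypothesis_id, edge_type).
+    """
+    counts: dict[str, int] = defaultdict(int)
+    sources: dict[str, set[str]] = defaultdict(set)
+    for source_id, hyp, _edge_type in edges:
+        counts[hyp] += 1
+        sources[hyp].add(source_id)
+    return {hyp: (counts[hyp], len(sources[hyp])) for hyp in counts}
+
+
+def single_source_hints(support: dict[str, tuple[int, int]], min_edges: int = 5) -> list[str]:
+    """Plain-English hints for hypotheses backed by many edges from a single source."""
+    hints = []
+    for hyp, (edges_in, distinct) in sorted(support.items()):
+        if distinct == 1 and edges_in >= min_edges:
+            hints.append(
+                f"Hypothesis {hyp} has {edges_in} edges but all from one source; "
+                "cross-source corroboration would strengthen it."
+            )
+    return hints
+```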
+
+### 2.2 `source_coverage(source_id: str)`: per-source coverage (read-only)
+
+**Signature**: `source_coverage(source_id: str) -> str`
+
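+**Implementation**: scan `graph.tool_invocations`, keep the entries whose `source_id` is this source, and group them by tool
+name + main args. Then compare against `EXPECTED_ARTEFACTS[source_type]` and mark untouched items with ✗.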
+
+```python
+# tools/strategy.py
+EXPECTED_ARTEFACTS: dict[str, list[dict]] = {
+    "disk_image+windows": [
+        {"name": "filesystem layout", "detector": "fls|mmls", "value_for": "deleted files, hidden partitions"},
+        {"name": "registry hives", "detector": "parse_registry_key", "value_for": "user activity, installed software"},
+        {"name": "browser history", "detector": "list_directory@AppData/.../History", "value_for": "URL access, downloads"},
+        {"name": "prefetch", "detector": "extract_file@Windows/Prefetch", "value_for": "program execution evidence"},
+        # ...
+    ],
+    "mobile_extraction": [
+        {"name": "AddressBook", "detector": "sqlite_query@AddressBook.sqlitedb", "value_for": "contacts"},
+        {"name": "SMS messages", "detector": "sqlite_query@sms.db", "value_for": "messaging content"},
+        {"name": "WhatsApp messages", "detector": "sqlite_query@ChatStorage.sqlite", "value_for": "WhatsApp content"},
+        {"name": "Call history", "detector": "sqlite_query@CallHistoryDB", "value_for": "call records"},
+        {"name": "Safari history", "detector": "sqlite_query@History.db|read_text@Bookmarks.plist", "value_for": "web browsing"},
+        {"name": "Photos library", "detector": "sqlite_query@Photos.sqlite", "value_for": "photo metadata, EXIF, geolocation"},
+        {"name": "iCloud accounts", "detector": "parse_plist@Accounts3.sqlite|parse_keychain", "value_for": "Apple ID, services"},
+        {"name": "App inventory", "detector": "list_directory@var/containers/Bundle/Application", "value_for": "installed apps"},
+    ],
+    "disk_image+android": [...],
+    "media_collection": [
+        {"name": "OCR text", "detector": "ocr_image", "value_for": "screenshot text"},
+        {"name": "EXIF metadata", "detector": "exif_image", "value_for": "device, timestamps, geolocation"},
+    ],
+}
+```
+
+**Soft-hint semantics**: the output always ends with this sentence:
+
+> Coverage hints are heuristics, not requirements. Skip an item if the case theory
+> makes it irrelevant. Investigate ✗ items only when they could materially affect
+> an active hypothesis.
+
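+This sentence is **the key to "exam-style capability is present but not hard-wired"**: on seeing a ✗, the LLM is meant
+not to probe blindly, but first to check the hypothesis list and ask "does this artefact matter for any current hypothesis?"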
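+
+A sketch of the ✓/✗ check, assuming a detector string is a `|`-separated list of alternatives of the form `tool` or `tool@path`, that `...` inside a path acts as a wildcard, and that invocations are available as `(tool_name, args)` pairs. `detector_hit` and `coverage_lines` are hypothetical names; matching uses the standard-library `fnmatchcase`, which is case-sensitive and may be too strict for Windows paths:
+
+```python
+from fnmatch import fnmatchcase
+
+
+def detector_hit(detector: str, invocations: list[tuple[str, dict]]) -> bool:
+    """True if any invocation satisfies any '|'-separated alternative of the detector."""
+    for alt in detector.split("|"):
+        tool, _, path = alt.partition("@")
+        # '...' becomes '*'; the path may appear anywhere inside a string argument.
+        pattern = "*" + path.replace("...", "*") + "*" if path else None
+        for name, args in invocations:
+            if name != tool:
+                continue
+            if pattern is None:
+                return True
+            if any(isinstance(v, str) and fnmatchcase(v, pattern) for v in args.values()):
+                return True
+    return False
+
+
+def coverage_lines(expected: list[dict], invocations: list[tuple[str, dict]]) -> list[str]:
+    """One line per expected artefact: ✓ if some invocation touched it, ✗ otherwise."""
+    lines = []
+    for item in expected:
+        mark = "✓" if detector_hit(item["detector"], invocations) else "✗"
+        lines.append(f"{mark} {item['name']}: {item['value_for']}")
+    return lines
+```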
+
+### 2.3 `marginal_yield(last_n_rounds: int = 2)`: marginal yield (read-only)
+
+**Signature**: `marginal_yield(last_n_rounds: int = 2) -> str`
+
+**Implementation**: scan the last N `InvestigationRound`s and count, per round:
+- new phenomena
+- new P→H edges
+- hypothesis status flips (active→supported, or the reverse)
+
+**Output**:
+
+```markdown
+# Marginal Yield (last 2 rounds)
+
+| round | new_phenomena | new_edges | status_flips |
+|-------|---------------|-----------|--------------|
+|   R3  |  5            |  7        |  1           |
+|   R4  |  2            |  1        |  0           |
+
+Trend: decelerating (R4 yield 25% of R3).
+Recommendation interpretation aid: yield trending to zero suggests diminishing
+returns; consider declare_complete after one more probe.
+```
+
+The last line is LLM-friendly heuristic prose, not a binding signal.
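+
+A sketch of the per-round counting and the trend line. The text does not pin down which quantity the trend compares, so this version assumes new phenomena + new edges, and counts a hypothesis absent from the "before" snapshot as new rather than flipped. `RoundYield`, `status_flips` and `trend` are hypothetical names:
+
+```python
+from dataclasses import dataclass
+
+
+@dataclass
+class RoundYield:
+    # Per-round counters as stored on InvestigationRound (§1.2).
+    round_number: int
+    new_phenomena: int
+    new_edges: int
+    status_flips: int
+
+
+def status_flips(before: dict[str, str], after: dict[str, str]) -> int:
+    """Hypotheses whose status differs between snapshots; new hypotheses do not count."""
+    return sum(1 for hyp, status in after.items() if before.get(hyp, status) != status)
+
+
+def trend(rounds: list[RoundYield]) -> str:
+    """Compare the last two rounds, using new phenomena + new edges as the yield."""
+    if len(rounds) < 2:
+        return "Trend: not enough rounds."
+    prev, last = rounds[-2], rounds[-1]
+    prev_y = prev.new_phenomena + prev.new_edges
+    last_y = last.new_phenomena + last.new_edges
+    if prev_y == 0:
+        return "Trend: flat at zero." if last_y == 0 else "Trend: accelerating."
+    ratio = last_y / prev_y
+    label = "decelerating" if ratio < 1 else "flat" if ratio == 1 else "accelerating"
+    return f"Trend: {label} (R{last.round_number} yield {ratio:.0%} of R{prev.round_number})."
+```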
+
+### 2.4 `budget_status()`: budget view (read-only)
+
+**Signature**: `budget_status() -> str`
+
+```markdown
+# Budget Status
+
+| metric | used | cap | pct |
+|--------|------|-----|-----|
+| tool_calls | 1248 | 5000 | 25% |
+| strategist_rounds | 3 | 10 | 30% |
+| wall_clock_minutes | 142 | 360 | 39% |
+
+Usage split so far: Phase 1 89%, Phase 2 4%, Phase 3 (strategist) 7%.
+```
+
+Budgets are read from config.yaml (new fields in §6). With no budget configured, the loop runs in unbounded mode,
+relying only on the strategist declaring completion plus a hard safety cap.
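+
+A sketch of the percentage column and of a hard check in the spirit of `_budget_exceeded()` (§4.2), assuming a cap of `None` means the metric is not configured (unbounded mode) and that the hard safety cap is a round count. `Budget`, `budget_rows`, `budget_exceeded` and `hard_safety_rounds` are hypothetical names:
+
+```python
+from dataclasses import dataclass
+from typing import Optional
+
+
+@dataclass
+class Budget:
+    metric: str
+    used: float
+    cap: Optional[float]  # None = not configured, i.e. unbounded for this metric
+
+
+def budget_rows(budgets: list[Budget]) -> list[str]:
+    """Markdown rows in the §2.4 layout; unbounded metrics show an infinite cap."""
+    rows = ["| metric | used | cap | pct |", "|---|---|---|---|"]
+    for b in budgets:
+        if b.cap is None:
+            rows.append(f"| {b.metric} | {b.used:g} | ∞ | - |")
+        else:
+            rows.append(f"| {b.metric} | {b.used:g} | {b.cap:g} | {b.used / b.cap:.0%} |")
+    return rows
+
+
+def budget_exceeded(budgets: list[Budget], hard_safety_rounds: int, rounds_done: int) -> bool:
+    """True if the hard round cap is hit or any configured cap is reached."""
+    if rounds_done >= hard_safety_rounds:
+        return True
+    return any(b.cap is not None and b.used >= b.cap for b in budgets)
+```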
+
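+### 2.5 Decision-action tools (write)
+
+Registered as the strategist's `mandatory_record_tools`. The strategist must call at least one of them per round;
+otherwise a forced retry fires (reusing the existing mechanism).
+
+**`propose_lead(...)`**: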
+
+```python
+{
+    "name": "propose_lead",
+    "input_schema": {
+        "type": "object",
+        "required": [
+            "description", "target_agent",
+            "motivating_hypothesis", "expected_evidence_type",
+        ],
+        "properties": {
+            "description": {
+                "type": "string",
+                "description": "1-2 sentence specific investigation request, including target source/artefact",
+            },
+            "target_agent": {
+                "type": "string",
+                # "hypothesis" lets the strategist re-run HypothesisAgent (§4.4).
+                "enum": ["filesystem","registry","communication","network","ios_artifact","android_artifact","media","hypothesis"],
+            },
+            "source_id": {"type": "string", "description": "which source to investigate"},
+            "motivating_hypothesis": {
+                "type": "string",
+                "description": "hyp-id this lead is meant to corroborate or refute",
+            },
+            "expected_evidence_type": {
+                "type": "string",
+                "enum": ["direct_evidence","supports","contradicts","weakens","prerequisite_met","consequence_observed"],
+            },
+            "rationale": {"type": "string", "description": "why this fills a real gap"},
+        }
+    }
+}
+```
+
+**`declare_investigation_complete(...)`**:
+
+```python
+{
+    "name": "declare_investigation_complete",
+    "input_schema": {
+        "type": "object",
+        "required": ["reason"],
+        "properties": {
+            "reason": {
+                "type": "string",
+                "enum": [
+                    "marginal_yield_zero",
+                    "budget_exhausted",
+                    "all_hypotheses_resolved",
+                    "coverage_saturated",
+                    "other",
+                ],
+            },
+            "rationale": {"type": "string"},
+        }
+    }
+}
+```
+
+Terminal tool: calling it ends the loop (reusing the existing `terminal_tools` mechanism).
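+
+A sketch of what the two executors might do, assuming they record into a per-round state object and report validation problems back to the LLM as error strings, so that the forced-retry path can take over. `ProposedLead`, `RoundState` and the `lead-r{round}-{n}` id format are hypothetical; only `required` and the `expected_evidence_type` enum are checked here, not the full JSON Schema:
+
+```python
+from dataclasses import dataclass, field
+
+REQUIRED = ("description", "target_agent", "motivating_hypothesis", "expected_evidence_type")
+EDGE_TYPES = {"direct_evidence", "supports", "contradicts", "weakens",
+              "prerequisite_met", "consequence_observed"}
+
+
+@dataclass
+class ProposedLead:
+    # Hypothetical subset of Lead (§1.1) that the executor fills in.
+    id: str
+    description: str
+    target_agent: str
+    source_id: str
+    motivating_hypothesis: str
+    expected_evidence_type: str
+    proposed_by: str = "strategist"
+    round_number: int = 0
+    status: str = "pending"
+
+
+@dataclass
+class RoundState:
+    round_number: int
+    leads: list[ProposedLead] = field(default_factory=list)
+    terminal: bool = False
+    complete_reason: str = ""
+
+
+def propose_lead(state: RoundState, args: dict) -> str:
+    """Validate one propose_lead call and record the lead on the current round."""
+    missing = [k for k in REQUIRED if not args.get(k)]
+    if missing:
+        return f"ERROR: missing required fields: {', '.join(missing)}"
+    if args["expected_evidence_type"] not in EDGE_TYPES:
+        return f"ERROR: unknown expected_evidence_type {args['expected_evidence_type']!r}"
+    lead = ProposedLead(
+        id=f"lead-r{state.round_number}-{len(state.leads) + 1}",
+        description=args["description"],
+        target_agent=args["target_agent"],
+        source_id=args.get("source_id", ""),
+        motivating_hypothesis=args["motivating_hypothesis"],
+        expected_evidence_type=args["expected_evidence_type"],
+        round_number=state.round_number,
+    )
+    state.leads.append(lead)
+    return f"Recorded {lead.id}"
+
+
+def declare_investigation_complete(state: RoundState, args: dict) -> str:
+    """Terminal tool: mark the round so that the orchestrator loop exits."""
+    state.terminal = True
+    state.complete_reason = args.get("reason", "other")
+    return "Investigation marked complete."
+```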
+
+---
+
+## 3. `InvestigationStrategist` agent
+
+New file `agents/strategist.py`, about 150 lines.
+
+```python
+class InvestigationStrategist(BaseAgent):
+    name = "strategist"
+    role = (
+        "You are the investigation strategist. You do not run forensic tools yourself. "
+        "Your job is to read the current evidence graph and decide ONE of:\n"
+        "  (a) propose 1-3 new investigation leads that would materially affect an active hypothesis, or\n"
+        "  (b) declare the investigation complete.\n"
+        "\n"
+        "Use graph_overview / source_coverage / marginal_yield / budget_status to ground your judgment. "
+        "DO NOT propose a lead that just adds more same-direction evidence to an already-supported hypothesis "
+        "(harmonic damping makes it ~useless). DO propose leads when:\n"
+        "  - A hypothesis is supported by edges from only ONE source — get cross-source corroboration.\n"
+        "  - A hypothesis is in the active band (0.2 < conf < 0.8) — it needs the deciding evidence.\n"
+        "  - A specific high-value artefact is uncovered on a source where the active hypotheses suggest it matters.\n"
+        "\n"
+        "Declare complete when marginal_yield is approaching zero AND no remaining active hypotheses have "
+        "obvious investigation paths."
+    )
+
+    mandatory_record_tools = ("propose_lead", "declare_investigation_complete")
+    terminal_tools = ("declare_investigation_complete",)
+
+    def _register_graph_tools(self):
+        # Read-only tools — strategist NEVER writes phenomena/edges directly.
+        # All graph writes happen via the workers it dispatches.
+        self._register_graph_read_tools()
+        # No graph_write_tools.
+        # Add strategy-specific tools:
+        for tool_name in (
+            "graph_overview", "source_coverage", "marginal_yield", "budget_status",
+            "propose_lead", "declare_investigation_complete",
+        ):
+            td = TOOL_CATALOG[tool_name]
+            self.register_tool(td.name, td.description, td.input_schema, td.executor)
+```
+
+Registered in `agent_factory._AGENT_CLASSES["strategist"]`.
+
+---
+
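+## 4. Orchestrator changes
+
+### 4.1 Removed/replaced: the current Phase 3
+
+The current `orchestrator.py:Phase 3` logic (about 150 lines) checks leads → dispatches workers →
+checks convergence → exits. **Delete it**, apart from keeping it reachable as the `strategist.enabled: false` safety fallback (§6).
+
+### 4.2 New Phase 3: strategist loop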
+
+```python
+async def _phase3_strategist_loop(self, run_dir: Path) -> None:
+    """Belief-driven investigation: strategist proposes, workers execute, repeat."""
+    _log("Phase 3: Strategist-Driven Investigation", event="phase")
+
+    strategist = self.factory.get_or_create_agent("strategist")
+    # The round cap is configured under `strategist.max_rounds` (§6), not under `budgets`.
+    max_rounds = self.config.get("strategist", {}).get("max_rounds", 10)
+
+    for round_num in range(1, max_rounds + 1):
+        # 1. Record round start + "before" snapshot of hypothesis statuses
+        rid = await self.graph.start_investigation_round(round_num)
+
+        # 2. Strategist run (one decision per round)
+        _log(f"Strategist Round {round_num}", event="phase")
+        await strategist.run(
+            f"Review the graph and decide the next investigation action. "
+            f"This is round {round_num}/{max_rounds}. Budget used so far: see budget_status."
+        )
+
+        # 3. Did the strategist declare complete? Close the round first so that
+        #    resume (§5) does not mistake it for an interrupted round.
+        if self.graph.is_round_terminal(rid):
+            await self.graph.complete_investigation_round(rid)
+            _log(f"Strategist declared complete at round {round_num}", event="progress")
+            break
+
+        # 4. Collect the leads proposed in this round
+        new_leads = self.graph.leads_from_round(round_num)
+        if not new_leads:
+            await self.graph.complete_investigation_round(rid)
+            _log(f"No leads proposed in round {round_num}, stopping", event="progress")
+            break
+
+        # 5. Dispatch each lead to its worker
+        for lead in new_leads:
+            await self._execute_lead(lead, round_num)
+
+        # 6. Close round + record yield ("after" snapshot, new phenomena/edges)
+        await self.graph.complete_investigation_round(rid)
+
+        # 7. Hard budget check
+        if self._budget_exceeded():
+            _log(f"Budget exhausted at round {round_num}", event="progress")
+            break
+```
+
+### 4.3 `_execute_lead` reuses the existing worker dispatch logic
+
+```python
+async def _execute_lead(self, lead: Lead, round_num: int) -> None:
+    agent_type = AGENT_ALIASES.get(lead.target_agent, lead.target_agent)
+    worker = self.factory.get_or_create_agent(agent_type)
+    if worker is None:
+        logger.warning(f"No worker for lead {lead.id}: {agent_type}")
+        return
+
+    src = self.graph.case.get_source(lead.source_id) if lead.source_id else None
+    if src:
+        self.graph.set_active_source(src)
+
+    _log(
+        f"Round {round_num} dispatching: {lead.description}",
+        event="dispatch", agent=agent_type,
+    )
+    await worker.run(
+        f"Investigate this specific lead from the strategist:\n\n"
+        f"REQUEST: {lead.description}\n"
+        f"MOTIVATING HYPOTHESIS: {lead.motivating_hypothesis}\n"
+        f"EXPECTED EVIDENCE TYPE: {lead.expected_evidence_type}\n"
+        # Lead (§1.1) has no `rationale` field, so read it defensively.
+        f"RATIONALE: {getattr(lead, 'rationale', '')}\n\n"
+        f"After investigating, record findings via add_phenomenon AND link relevant phenomena "
+        f"to {lead.motivating_hypothesis} via the appropriate edge_type."
+    )
+    lead.status = "completed"
+    self.graph._auto_save()
+```
+
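+### 4.4 Automatic hypothesis regeneration (optional, recommended)
+
+New phenomena may give rise to **new hypotheses**, not just update existing ones. Let the strategist trigger this explicitly with
+`propose_lead(target_agent="hypothesis", description="re-examine recent phenomena for new hypotheses")`.
+This is the strategist's own decision rather than a timed trigger: consistency is preferred over automatic scheduling.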
+
+---
+
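+## 5. State persistence
+
+`graph_state.json` gains a top-level key `investigation_rounds: list[InvestigationRound]`,
+handled by `save_state` / `load_state`. On **disconnect recovery**:
+
+- find the most recent round that was not completed → treat that round as failed
+- restart from the next round
+- phenomena / edges from completed rounds are kept as they are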
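+
+A sketch of the resume rule, assuming a round is unfinished when `completed_at` is empty. `RoundRecord`, `prepare_resume` and the `failed` flag are hypothetical (§1.2 defines no such field); the returned number would replace the `1` in the §4.2 loop's `range(1, max_rounds + 1)`:
+
+```python
+from dataclasses import dataclass
+
+
+@dataclass
+class RoundRecord:
+    # Minimal stand-in for InvestigationRound: empty completed_at means unfinished.
+    round_number: int
+    completed_at: str = ""
+    failed: bool = False  # hypothetical flag, not part of §1.2
+
+
+def prepare_resume(rounds: list[RoundRecord]) -> int:
+    """Mark the latest unfinished round as failed; return the round number to start at."""
+    if not rounds:
+        return 1
+    latest = max(rounds, key=lambda r: r.round_number)
+    if not latest.completed_at:
+        latest.failed = True
+    return latest.round_number + 1
+```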
+
+---
+
+## 6. Configuration
+
+New fields in `config.yaml`:
+
+```yaml
+strategist:
+  enabled: true                     # false = run the old Phase 3 logic (safety fallback)
+  max_rounds: 10
+  hard_stop_marginal_yield_zero_rounds: 3  # force a stop after 3 consecutive rounds with yield=0
+
+budgets:
+  tool_calls_total: 5000
+  wall_clock_minutes_max: 480
+```
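+
+A sketch of loading these sections with defaults and of the `hard_stop_marginal_yield_zero_rounds` rule, assuming `config.yaml` has already been parsed into a dict (for example with PyYAML's `yaml.safe_load`) and that an absent budget means unbounded. `DEFAULTS`, `load_strategist_config` and `zero_yield_hard_stop` are hypothetical names:
+
+```python
+DEFAULTS = {
+    "strategist": {"enabled": True, "max_rounds": 10, "hard_stop_marginal_yield_zero_rounds": 3},
+    "budgets": {"tool_calls_total": None, "wall_clock_minutes_max": None},  # None = unbounded
+}
+
+
+def load_strategist_config(raw: dict) -> dict:
+    """Overlay the parsed config.yaml onto DEFAULTS, one section at a time."""
+    merged = {}
+    for section, defaults in DEFAULTS.items():
+        # `or {}` also covers a section present in the YAML but left empty (None).
+        merged[section] = {**defaults, **(raw.get(section) or {})}
+    return merged
+
+
+def zero_yield_hard_stop(yields: list[int], n: int) -> bool:
+    """True once the last n rounds all had zero yield (n <= 0 disables the rule)."""
+    return n > 0 and len(yields) >= n and all(y == 0 for y in yields[-n:])
+```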
+
+---
+
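+## 7. Test strategy
+
+New file `tests/test_strategist.py`, or add to `test_optimizations.py`. At minimum, test that:
+
+1. When the strategist calls `declare_complete`, the loop exits immediately
+2. When the strategist calls `propose_lead`, the lead enters the graph with the correct round_number
+3. Round snapshots correctly capture before/after status
+4. When the budget is exhausted, the loop is forced to stop even if the strategist wants to continue
+5. Disconnect recovery: after an interruption, a restart resumes from the next round
+6. `graph_overview` output includes the `distinct_sources` column
+7. `source_coverage` marks untouched items with ✗
+8. `marginal_yield` numbers agree with `confidence_log`
+
+No LLM integration tests: strategist behavior is verified with a mock LLM (an existing example of this pattern is
+`test_forced_record_retry_fires_when_zero_phenomena`).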
+
+---
+
+## 8. Build order
+
+Ordered by dependency (**one commit per step**: this is a structural rework, so being able to roll back a single step matters):
+
+| Step | Content | Depends on | Estimated effort |
+|---|---|---|---|
+| 1 | `Lead` +4 fields + `InvestigationRound` dataclass + serialization | none | 60 lines + tests |
+| 2 | `graph_overview` / `source_coverage` / `marginal_yield` / `budget_status` implementation | 1 | 250 lines + tests |
+| 3 | `propose_lead` / `declare_investigation_complete` tools | 1 | 80 lines + tests |
+| 4 | `InvestigationStrategist` agent class | 2, 3 | 120 lines + tests |
+| 5 | Orchestrator Phase 3 rewrite | 4 | 150 lines (replacing ~50 old) + tests |
+| 6 | config schema + loading logic | 5 | 30 lines |
+| 7 | Disconnect recovery handling | 5 | 40 lines + tests |
+| 8 | Real-case smoke run (small scale: USB only) | 7 | no code |
+| 9 | Docs: rewrite DESIGN.md §4.9 + archive this file | 8 | docs |
+
+Total: ~800 lines of new code + tests + docs.
+
+---
+
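+## 9. Risks + mitigations
+
+| Risk | Mitigation |
+|---|---|
+| Strategist too conservative (always declare_complete) | Add prompt examples of "situations that warrant digging deeper"; validate on a small sample during testing |
+| Strategist too aggressive (proposes 7+ leads every round) | Limit `propose_lead` to 3-5 calls per round, enforced in the executor since the schema only validates a single call; the prompt stresses quality over quantity |
+| A single worker cannot finish a lead, causing a budget avalanche | The worker call's own max_iter is unchanged; the strategist budget is separate |
+| The LLM misses hints such as `distinct_sources` | Append 1-2 plain-English sentences to the end of `graph_overview`: "Hypothesis X has 23 edges but all from one source → cross-source corroboration would strengthen it" |
+| Leads produced in Phase 1 are ignored by the strategist | The strategist prompt states explicitly "handle existing pending leads first, then propose new ones" |
+| Infinite loop (the strategist keeps proposing the same lead) | Deduplicate leads on the `(motivating_hyp, expected_type, source_id)` triple |
+| Maintenance cost of the `EXPECTED_ARTEFACTS` list | Deliberately kept as a "soft hint": an incomplete list breaks nothing; some depth just relies more on the LLM's own initiative |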
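+
+A sketch combining two of the mitigations above into a guard that the `propose_lead` executor could consult: a per-round cap (3 by default here, an assumption) and dedup on the `(motivating_hyp, expected_type, source_id)` triple across all rounds. `LeadGate` is a hypothetical name:
+
+```python
+from dataclasses import dataclass, field
+
+
+@dataclass
+class LeadGate:
+    """Hypothetical guard in front of propose_lead: per-round cap + triple dedup."""
+    max_per_round: int = 3
+    seen: set[tuple[str, str, str]] = field(default_factory=set)  # kept across rounds
+    proposed_this_round: int = 0
+
+    def new_round(self) -> None:
+        self.proposed_this_round = 0
+
+    def admit(self, motivating_hyp: str, expected_type: str, source_id: str) -> tuple[bool, str]:
+        key = (motivating_hyp, expected_type, source_id)
+        if self.proposed_this_round >= self.max_per_round:
+            return False, "rejected: per-round lead cap reached"
+        if key in self.seen:
+            return False, "rejected: duplicate of an earlier lead"
+        self.seen.add(key)
+        self.proposed_this_round += 1
+        return True, "accepted"
+```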
+
+---
+
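+## 10. Open questions
+
+1. **Should an InvestigationRound run the hypothesis agent on its own?**
+   Leaning towards the strategist triggering it explicitly via a lead (better consistency), with no timed trigger.
+
+2. **What happens on budget overrun: hard stop or soft warning?**
+   The current design stops hard; we could add schema enforcement so that "once the strategist sees budget < 10%,
+   it may only declare_complete".
+
+3. **Should a cross-source "independence bonus" feed into log-odds?**
+   The damping added last time uses `1/k` and does not distinguish cross-source from same-source edges. To include it,
+   the formula would become `1/k_within_source × bonus_for_distinct_sources`. That is a separate piece of work.
+
+4. **Should the strategist's `rationale` go through grounding?**
+   It never writes phenomena, but the `rationale` field may contain concrete values
+   ("based on inv-12345..."). Leaning towards not enforcing it: this is a meta-level judgment, not a recorded fact.
+
+5. **Keep or delete the current Phase 3 `max_investigation_rounds` config?**
+   Suggest keeping it as the fallback knob for `strategist.enabled=false`.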
+
+---
+
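+## 11. Relationship to DESIGN.md
+
+Once this document lands, DESIGN.md needs the following updates:
+
+- **§4.5**: add a paragraph saying "also look at the **structure** behind log_odds: edges_in count / distinct_sources
+  are the key signals the strategist uses to decide whether to dig deeper, not just the confidence value"
+- **§4.9 Phase 3**: change the table entry from "leads dispatched to source-aware agents" to
+  "strategist loop: read the graph, propose, execute, review, stop / continue"
+- **§8** (design trade-offs): add a 6th item: "the trade-off of making scheduling LLM-driven: the strategist decides depth,
+  but each round is hard-capped by `budgets.*`; this is how the 'LLM proposes, code decides' principle is honored at the scheduling layer"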
+
+---
+
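+## 12. Note: problems this design does **not** solve
+
+- The root cause of the 8% hit rate on exam-style questions is an **incomplete tool set** (no vision, no ZIP brute-forcing,
+  no VeraCrypt mounting, no blockchain explorer), not scheduling. The strategist makes heavier use of the existing tools,
+  but it cannot conjure up new ones.
+- LLM-fabricated `invocation_id`s (already patched, see the `feedback_grounding_pending` memory) and
+  log-odds inflation (already patched: harmonic damping) are **prerequisites** of this design, not within its scope.
+- Finer-grained per-edge-type Bayesian modeling (e.g. a cross-source independence bonus) is a separate piece of work.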