docs(strategist) S8/9: DESIGN.md updates + DESIGN_STRATEGIST.md spec

DESIGN_STRATEGIST.md §11. The strategist refit is the first sub-design big enough to need its own document, so it lives as a sibling to DESIGN.md rather than inline.

DESIGN_STRATEGIST.md (new, 543 lines) covers:
  §0   Scope, non-goals, invariants preserved
  §1   Data model (Lead extension, InvestigationRound)
  §2   Six tools (graph_overview / source_coverage / marginal_yield / budget_status / propose_lead / declare_investigation_complete) with full input_schema
  §3   InvestigationStrategist agent class
  §4   Orchestrator Phase 3 loop pseudocode
  §5   Persistence + resume strategy
  §6   config schema
  §7   Test plan (8 scenarios)
  §8   9-step build order (matches commit history)
  §9   Risks + mitigations
  §10  Open questions
  §11  Required DESIGN.md updates (applied here)
  §12  What this design does NOT solve (exam-test coverage, vision-capable LLM, blockchain explorer, etc.)

DESIGN.md updates per §11:
  §4.5  Note harmonic damping is now landed
  §4.9  Phase 3 table row now points at the strategist loop + inline summary
  §5    Lead + InvestigationRound rows added to the data-model summary table

This commit closes the strategist refit. All 174 tests pass / 1 skipped.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 02:28:06 -10:00
parent 388321ee30
commit 8b964b5dec
2 changed files with 561 additions and 6 deletions
--- a/DESIGN.md
+++ b/DESIGN.md
@@ -172,9 +172,11 @@ verified facts (citations with rerun instructions) and interpretation (explicitly labeled

 - Thresholds are unchanged (≥0.8 supported / ≤0.2 refuted); they are now derived from `L_post`.
 - `prior_prob` becomes configurable (default 0.5 → `L_prior=0`).
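-- **Note on the simplifying assumption**: multiple edges are treated as independent (naive Bayes). Repeated evidence of the same kind is not
-  fully independent, so add a knob: cap or decay the number of edges sharing the same `(hypothesis, edge_type)`, to avoid
-  confidence inflated by "the same finding entered into the graph repeatedly by multiple agents" (the existing Jaccard dedup already partly mitigates this).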
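+- **Harmonic damping of same-kind evidence** (landed 2026-05): the k-th edge with the same `(hypothesis, edge_type)`
+  contributes `log_lr_base / k`. The cumulative total is `log_lr_base · H_N` (the harmonic series, ~ ln N).
+  This fixes the breakdown of the naive-Bayes independence assumption and the runaway L=+31 caused by multiple agents entering the same finding into the graph repeatedly
+  (field data from 2026-05-20). A single edge is unchanged (k=1, damping factor=1.0). **Structural signals** matter more than
+  absolute values: for the strategist, `distinct_sources` is a better gauge of evidence depth than the confidence number.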

 A by-product is a **hypothesis × evidence matrix** view, used for reporting and lead selection.

@@ -235,11 +237,19 @@ network=browser/PCAP). Reorganized by **investigative function**, and adds platform-specific
 |---|---|
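 | Phase 1 | "Single-image initial survey" → **per-source parallel triage**, with a type-matched agent dispatched to each source |
 | Phase 2 | Hypotheses are generated across sources; identity-coreference hypotheses are first registered here |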
-| Phase 3 | Leads are dispatched to source-aware agents; the hypothesis × evidence matrix updates in real time |
+| Phase 3 | **Strategist loop**: each round an LLM meta-agent inspects the graph and chooses propose_lead or declare_investigation_complete; workers execute the leads; hypothesis edges are re-judged. See `DESIGN_STRATEGIST.md` for details |
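 | Phase 4 | Cross-source timeline merge, with **per-source timezone normalization** (iOS UTC vs Android local time) |
 | Phase 5 | One consolidated report per case: hypothesis conclusions, an entity-relationship graph, and provenance citations for every conclusion |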

-Disconnect recovery and run-archiving logic are retained; `graph_state.json` incrementally adopts the new fields.
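+**"LLM decides depth" in Phase 3** (landed after a 2026-05 field run exposed that single-round triggering of Phase 3 plus log-odds inflation left all 8 pending leads undispatched): the scheduling layer moves from hard-coded decisions ("max_rounds=N, converged→stop") to being driven by an LLM meta-agent.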
+
+- The new agent `InvestigationStrategist` (`agents/strategist.py`) takes one action per round: propose 1-3 leads, or declare_investigation_complete
+- 4 read-only view tools, `graph_overview` / `source_coverage` / `marginal_yield` / `budget_status` (`tools/strategy.py`), expose scheduling signals to the LLM
+- 2 decision-writing tools, `propose_lead` / `declare_investigation_complete`, are the strategist's mandatory_record
+- The orchestrator reads `config.yaml:strategist.*` + `config.yaml:budgets.*` to control max_rounds and hard caps
+- See `[[DESIGN_STRATEGIST]]` for the full data model, prompt design, disconnect recovery, and risks/mitigations
+
+Disconnect recovery and run-archiving logic are retained; `graph_state.json` gains an `investigation_rounds[]` array that persists the strategist's decision for each round.

 ---

@@ -252,8 +262,10 @@ network=browser/PCAP). Reorganized by **investigative function**, and adds platform-specific
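 | `Phenomenon` | + `source_id`; description is split into `verified_facts[]` + `interpretation`; the semantically ambiguous `confidence` (default 1.0) is clarified/removed, and observation reliability is expressed through grounding |
 | `Hypothesis` | + `prior_prob`, `log_odds` (an accumulator); `confidence` becomes a derived value |
 | `Entity` | + a set of typed identifiers; connected across sources via `same_as` edges |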
-| Phenomenon→Hypothesis edge | Carries `edge_type`, mapped to `log₁₀(LR)` (replaces `_DEFAULT_EDGE_WEIGHTS`) |
+| Phenomenon→Hypothesis edge | Carries `edge_type`, mapped to `log₁₀(LR)` (replaces `_DEFAULT_EDGE_WEIGHTS`); the k-th edge with the same `(hyp, edge_type)` is harmonically damped by `1/k` |
 | Entity→Entity edge | **New** `same_as` (backed by a coref hypothesis, reversible) |
+| `Lead` | + `proposed_by` / `motivating_hypothesis` / `expected_evidence_type` / `round_number` (strategist annotations) |
+| `InvestigationRound` | **New**: provenance of the strategist's per-round decision + before/after snapshots + yield metrics |

 The `VALID_EDGE_TYPES`, serialization/deserialization, and Jaccard dedup in `evidence_graph.py` are adapted accordingly.