docs(strategist) S8/9: DESIGN.md updates + DESIGN_STRATEGIST.md spec

Applies DESIGN_STRATEGIST.md §11. The strategist refit is the first sub-design
big enough to need its own document, so it lives as a sibling to
DESIGN.md rather than inline.

DESIGN_STRATEGIST.md (new, 543 lines) covers:
  §0  Scope, non-goals, invariants preserved
  §1  Data model (Lead extension, InvestigationRound)
  §2  Six tools (graph_overview / source_coverage / marginal_yield /
      budget_status / propose_lead / declare_investigation_complete)
      with full input_schema
  §3  InvestigationStrategist agent class
  §4  Orchestrator Phase 3 loop pseudocode
  §5  Persistence + resume strategy
  §6  config schema
  §7  Test plan (8 scenarios)
  §8  9-step build order (matches commit history)
  §9  Risks + mitigations
  §10 Open questions
  §11 Required DESIGN.md updates (applied here)
  §12 What this design does NOT solve (exam-test coverage, vision-
      capable LLM, blockchain explorer, etc.)

DESIGN.md updates per §11:
  §4.5  Note harmonic damping is now landed
  §4.9  Phase 3 table row now points at the strategist loop +
        inline summary
  §5    Lead + InvestigationRound rows added to the data-model
        summary table

This commit closes the strategist refit. All 174 tests pass / 1 skipped.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
BattleTag
2026-05-21 02:28:06 -10:00
parent 388321ee30
commit 8b964b5dec
2 changed files with 561 additions and 6 deletions

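@@ -172,9 +172,11 @@ verified facts (citations with re-run commands) and interpretation (explicitly labeled
- Thresholds are unchanged (≥0.8 supported / ≤0.2 refuted); they are now derived from `L_post`.
- `prior_prob` becomes configurable (default 0.5 → `L_prior=0`).
- **Simplifying assumption**: multiple edges are treated as independent (naive Bayes). Repeated evidence of the same kind is not
fully independent, so add a knob: cap or decay the number of edges sharing the same `(hypothesis, edge_type)`, to keep
"the same finding entered into the graph by multiple agents" from inflating confidence (the existing Jaccard dedup already mitigates this in part).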
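- **Harmonic damping of same-kind evidence** (landed 2026-05): the k-th edge with the same `(hypothesis, edge_type)`
contributes `log_lr_base / k`. The cumulative total = `log_lr_base · H_N` (harmonic series, ~ ln N).
This addresses the breakdown of naive-Bayes independence plus the runaway L=+31 caused by the same finding being entered into the graph by multiple agents
(field data from 2026-05-20). A single edge is unchanged (k=1, damping=1.0). **Structural signals** matter more than absolute
values: for the strategist, `distinct_sources` is a better gauge of evidence thickness than the confidence number.
As a by-product, this yields a **hypothesis × evidence matrix** view for use in reporting and lead selection.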
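
The damping rule and the `L_post` thresholds above can be sketched as follows. This is a minimal illustration, not the code in `evidence_graph.py`: the function names, the `(edge_type, log_lr_base)` tuple shape, and the `"undetermined"` label are assumptions made for the example. Only the `1/k` rule, the base-10 log-odds, the 0.5 default prior, and the 0.8 / 0.2 thresholds come from the text.

```python
import math
from collections import defaultdict


def posterior_log_odds(edges, prior_prob=0.5):
    """Sum damped log10(LR) contributions for ONE hypothesis.

    edges: iterable of (edge_type, log_lr_base) in insertion order
    (hypothetical shape, chosen for this sketch).
    """
    # prior_prob = 0.5 gives L_prior = log10(1) = 0
    total = math.log10(prior_prob / (1.0 - prior_prob))
    seen = defaultdict(int)  # edges counted so far per edge_type
    for edge_type, log_lr_base in edges:
        seen[edge_type] += 1
        # the k-th edge of the same type contributes log_lr_base / k
        total += log_lr_base / seen[edge_type]
    return total


def confidence(l_post):
    """Convert base-10 log-odds back to a probability."""
    return 1.0 / (1.0 + 10.0 ** (-l_post))


def verdict(l_post):
    """Apply the unchanged 0.8 / 0.2 thresholds to the derived confidence."""
    c = confidence(l_post)
    if c >= 0.8:
        return "supported"
    if c <= 0.2:
        return "refuted"
    return "undetermined"  # label assumed for this sketch
```

With ten same-type edges of `log_lr_base = 1.0`, the undamped sum would be 10, while the damped sum is H_10 ≈ 2.93, so repeated entries of one finding still add weight but no longer run away.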
@@ -235,11 +237,19 @@ network=browser/PCAP. Reorganize by **investigative function** and add platform-specific
|---|---|
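| Phase 1 | "Single-image initial survey" → **per-source parallel triage**, with a type-appropriate agent dispatched to each source |
| Phase 2 | Hypotheses are generated across sources; identity-coreference hypotheses are first registered here |
| Phase 3 | Leads are dispatched to source-aware agents; the hypothesis × evidence matrix updates in real time |
| Phase 3 | **Strategist loop**: each round an LLM meta-agent inspects the graph and chooses propose_lead or declare_complete; workers execute the leads; hypothesis edges are re-judged. See `DESIGN_STRATEGIST.md` for details |
| Phase 4 | Cross-source timeline merge, with **per-source timezone normalization** (iOS UTC vs. Android local time) |
| Phase 5 | One consolidated report per case: hypothesis conclusions, entity relationship graph, and a provenance citation for every conclusion |
Disconnect recovery and run-archiving logic are retained; `graph_state.json` incrementally incorporates the new fields.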
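**Phase 3: "the LLM decides depth"** (landed after a 2026-05 field run exposed that single-round Phase 3 triggering plus log-odds inflation left all 8 pending leads undispatched): the scheduling layer moves from hard-coded decisions ("max_rounds=N, converged→stop") to being driven by an LLM meta-agent:
- New agent `InvestigationStrategist` (`agents/strategist.py`): takes one action per round, either propose 1-3 leads or declare_investigation_complete
- 4 read-only view tools: `graph_overview` / `source_coverage` / `marginal_yield` / `budget_status` (`tools/strategy.py`) expose scheduling signals to the LLM
- 2 write decision tools: `propose_lead` / `declare_investigation_complete` are the strategist's mandatory_record
- The orchestrator reads `config.yaml:strategist.*` + `config.yaml:budgets.*` to control max_rounds and hard caps
- See `[[DESIGN_STRATEGIST]]` for the full data model, prompt design, disconnect recovery, and risks/mitigations
Disconnect recovery and run-archiving logic are retained; `graph_state.json` gains an `investigation_rounds[]` array that persists the strategist's decision for each round.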
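
The round structure described above might look roughly like the sketch below. It is illustrative only: the real loop lives in the orchestrator and is specified in `DESIGN_STRATEGIST.md`, and here the LLM call is replaced by a plain callable. `Action`, `run_phase3`, `stub_decide`, the `graph` dict shape, and the record keys are all hypothetical names for this example.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Action:
    kind: str  # "propose_lead" or "declare_investigation_complete"
    leads: list = field(default_factory=list)
    rationale: str = ""


def run_phase3(decide: Callable[[dict], Action],
               execute_lead: Callable[[str], int],
               graph: dict,
               max_rounds: int = 5) -> list:
    """Run strategist rounds until completion is declared or the cap is hit."""
    rounds = []
    for round_number in range(1, max_rounds + 1):
        action = decide(graph)  # stands in for the LLM meta-agent
        record = {"round_number": round_number, "action": action.kind,
                  "rationale": action.rationale, "leads": list(action.leads)}
        if action.kind == "declare_investigation_complete":
            rounds.append(record)
            break
        if not 1 <= len(action.leads) <= 3:
            raise ValueError("a propose_lead round must carry 1-3 leads")
        # workers execute each lead; here each returns a count of new edges
        new_edges = sum(execute_lead(lead) for lead in action.leads)
        graph["edges"] += new_edges
        record["new_edges"] = new_edges
        rounds.append(record)
    # persist per-round decisions alongside the graph state
    graph.setdefault("investigation_rounds", []).extend(rounds)
    return rounds


def stub_decide(graph: dict) -> Action:
    """Toy policy: stop once the graph holds four edges."""
    if graph["edges"] >= 4:
        return Action("declare_investigation_complete", rationale="yield flattened")
    return Action("propose_lead", leads=["check chat export"], rationale="thin coverage")


graph = {"edges": 0}
history = run_phase3(stub_decide, lambda lead: 2, graph)
```

The `max_rounds` argument plays the role of the hard cap read from configuration: the strategist chooses when to stop, but the loop still terminates if it never does.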
---
@@ -252,8 +262,10 @@ network=browser/PCAP. Reorganize by **investigative function** and add platform-specific
| `Phenomenon` | + `source_id`; description split into `verified_facts[]` + `interpretation`; clarify/remove the semantically ambiguous `confidence` (default 1.0); observation reliability is expressed through grounding |
| `Hypothesis` | + `prior_prob`, `log_odds` (accumulated); `confidence` becomes a derived value |
| `Entity` | + typed identifier set; connected across sources via `same_as` edges |
| Phenomenon→Hypothesis edge | Carries `edge_type`, mapped to `log₁₀(LR)` (replaces `_DEFAULT_EDGE_WEIGHTS`) |
| Phenomenon→Hypothesis edge | Carries `edge_type`, mapped to `log₁₀(LR)` (replaces `_DEFAULT_EDGE_WEIGHTS`); the k-th edge with the same `(hyp, edge_type)` is harmonically damped by `1/k` |
| Entity→Entity edge | **New**: `same_as` (backed by a coref hypothesis, reversible) |
| `Lead` | + `proposed_by` / `motivating_hypothesis` / `expected_evidence_type` / `round_number` (strategist annotations) |
| `InvestigationRound` | **New**: provenance of the strategist's per-round decision + before/after snapshots + yield metrics |
`VALID_EDGE_TYPES`, serialization/deserialization, and Jaccard dedup in `evidence_graph.py` are adapted accordingly.
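
As a rough picture of the two strategist-related rows, the sketch below models them as dataclasses and round-trips one round through JSON. Only the four `Lead` annotation fields are named in the table. Every other field (`lead_id`, `description`, `action`, `rationale`, `before`, `after`, `yield_metrics`) and the helper `round_from_dict` are assumptions for illustration. The actual schema is defined in `DESIGN_STRATEGIST.md` §1.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class Lead:
    lead_id: str                                  # assumed field
    description: str                              # assumed field
    proposed_by: str = "strategist"
    motivating_hypothesis: Optional[str] = None
    expected_evidence_type: Optional[str] = None
    round_number: Optional[int] = None


@dataclass
class InvestigationRound:
    round_number: int
    action: str                                   # assumed field
    rationale: str                                # assumed: decision provenance
    leads: list = field(default_factory=list)
    before: dict = field(default_factory=dict)    # assumed: snapshot before the round
    after: dict = field(default_factory=dict)     # assumed: snapshot after the round
    yield_metrics: dict = field(default_factory=dict)


def round_from_dict(data: dict) -> InvestigationRound:
    """Rebuild a round (and its nested leads) from decoded JSON."""
    leads = [Lead(**item) for item in data["leads"]]
    return InvestigationRound(**{**data, "leads": leads})


lead = Lead("L-1", "check chat export", motivating_hypothesis="H-2",
            expected_evidence_type="supports", round_number=1)
rnd = InvestigationRound(1, "propose_lead", "thin coverage on source B", [lead],
                         before={"edges": 10}, after={"edges": 12},
                         yield_metrics={"new_edges": 2})

# asdict() recurses into the nested Lead objects, so the result is JSON-safe
state = {"investigation_rounds": [asdict(rnd)]}
restored = round_from_dict(json.loads(json.dumps(state))["investigation_rounds"][0])
```

Keeping each round as a plain, self-describing record is what would let a resumed run see which decisions were already made without replaying the strategist.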