docs(strategist) S8/9: DESIGN.md updates + DESIGN_STRATEGIST.md spec

DESIGN_STRATEGIST.md §11. The strategist refit is the first sub-design big
enough to need its own document, so it lives as a sibling to DESIGN.md rather
than inline.

DESIGN_STRATEGIST.md (new, 543 lines) covers:
  §0   Scope, non-goals, invariants preserved
  §1   Data model (Lead extension, InvestigationRound)
  §2   Six tools (graph_overview / source_coverage / marginal_yield /
       budget_status / propose_lead / declare_investigation_complete)
       with full input_schema
  §3   InvestigationStrategist agent class
  §4   Orchestrator Phase 3 loop pseudocode
  §5   Persistence + resume strategy
  §6   config schema
  §7   Test plan (8 scenarios)
  §8   9-step build order (matches commit history)
  §9   Risks + mitigations
  §10  Open questions
  §11  Required DESIGN.md updates (applied here)
  §12  What this design does NOT solve (exam-test coverage, vision-capable
       LLM, blockchain explorer, etc.)

DESIGN.md updates per §11:
  §4.5  Note harmonic damping is now landed
  §4.9  Phase 3 table row now points at the strategist loop + inline summary
  §5    Lead + InvestigationRound rows added to the data-model summary table

This commit closes the strategist refit. All 174 tests pass / 1 skipped.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
feat(strategist) S7: strategist resume / open-round repair
2026-05-21 02:28:06 -10:00 · 2026-05-21 02:27:05 -10:00 · 2026-05-21 02:26:12 -10:00 · 2026-05-21 02:25:04 -10:00 · 2026-05-21 02:22:05 -10:00 · 2026-05-21 02:21:13 -10:00
30 changed files with 11089 additions and 933 deletions
--- a/DESIGN.md
+++ b/DESIGN.md
@@ -0,0 +1,317 @@
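+# MASForensics System Refit Design
+
+> Goal: turn the current "single Windows disk forensics" system into a complex-forensics system
+> that handles **multiple devices, multiple actors, heterogeneous evidence, and cross-source
+> correlation**. This is the sole authoritative design document
+> (it merges the two earlier drafts, `REFIT_PLAN.md` and `RESEARCH_DESIGN.md`).
+>
+> The real case that triggered this refit: the 2025 Meiya Cup (美亚杯) qualifier, Individual track, with 5 pieces of evidence
+> (1 USB E01, 1 Android full-disk `blk0_sda.bin`, 3 iOS extractions, 1 set of transaction screenshots),
+> spanning at least 3 people: LEUNG YL / CHAN MH / FUNG CC.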
+
+---
+
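+## 1. Design principles (invariants that hold throughout)
+
+1. **The LLM proposes, code adjudicates.** The LLM handles language, classification, and perception; it **does not hold case state,
+   does not produce numeric values, and does not write unverified facts**. All "truth" lives in the symbolic layer.
+2. **Every recorded fact can be re-derived from a single tool invocation.** Conclusions can be independently re-checked.
+3. **The reasoning core is device-agnostic.** All device-specific logic lives in "capability plugins";
+   supporting a new device means writing a plugin, never changing the core.
+4. **Seemingly irreversible operations (such as entity merging) are in fact reversible, evidence-backed assertions** that can be overturned.
+
+These four are not slogans: every design decision below maps to one of them.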
+
+---
+
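+## 2. Diagnosis of current problems
+
+| # | Problem | Location | Consequence |
+|---|---|---|---|
+| P1 | **The single-image assumption is deeply baked in**: tools are closures bound to `image_path`, the graph is single-source, and the main program selects only one image | `tool_registry.py:148` `register_all_tools`, `main.py:91-153` | Cannot ingest multiple pieces of evidence or correlate across devices |
+| P2 | **Anti-hallucination lives only in the prompt** | `base_agent.py` system prompt | Once the LLM disobeys, wrong facts enter the case record and **cannot be identified afterwards** |
+| P3 | **The confidence formula has no statistical meaning and an order-dependence defect**: `delta=weight*(1-conf)` (positive) / `weight*conf` (negative); when positive and negative edges mix, the result depends on the order in which edges arrive | `evidence_graph.py:26-33` | Confidence cannot be calibrated or defended |
+| P4 | **Artifact categorization is Windows-only**: it relies on hive names / `.pf` / `mirc` keywords | `tool_registry.py:80-107` `_auto_categorize` | All iOS/Android artifacts fall into `other` |
+| P5 | **Case information is hardcoded** as `cfreds_hacking_case` | `config.yaml:35-50` | Switching cases requires code changes |
+| P6 | **Image discovery relies on an extension glob**, and `.bin` is not in the list | `main.py:28` `_IMAGE_GLOBS` | `blk0_sda.bin` is not discovered |
+| P7 | **Phenomenon has no source annotation** | `evidence_graph.py:85` `Phenomenon` | No way to know which device a finding came from; cross-source correlation has no anchor |
+
+The refit addresses both "onboarding new evidence" and "fixing the inherent defects P1-P7".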
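To make P3 concrete, here is a minimal sketch of the order dependence. It assumes the old rule adds `weight*(1-conf)` for a positive edge and subtracts `weight*conf` for a negative one; the names `old_update` and `apply_edges` and the 0.4 weights are illustrative and are not taken from `evidence_graph.py`.

```python
def old_update(conf: float, weight: float, positive: bool) -> float:
    """Hypothetical reconstruction of the pre-refit rule described in P3."""
    if positive:
        return conf + weight * (1.0 - conf)
    return conf - weight * conf


def apply_edges(prior: float, edges: list[tuple[float, bool]]) -> float:
    """Apply (weight, positive) edges one by one, in arrival order."""
    conf = prior
    for weight, positive in edges:
        conf = old_update(conf, weight, positive)
    return conf


support = (0.4, True)
contradict = (0.4, False)

# Same two edges, different arrival order.
a = apply_edges(0.5, [support, contradict])
b = apply_edges(0.5, [contradict, support])
print(round(a, 2), round(b, 2))  # 0.42 0.58
```

The two orderings end on opposite sides of the 0.5 prior, which is the defect the commutative log-odds update in §4.5 removes.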
+
+---
+
+## 3. Target architecture
+
+```
+case.yaml ──► Case ──► N × EvidenceSource
+                         ├ id / type / owner / path
+                         └ access_mode: image | tree
+                                 │
+                  ┌──────────────┴───────────────┐
+            image-backed                     tree-backed
+          (TSK, inode addressing)      (path addressing: mounted/unpacked)
+                  │                              │
+                  └────────────┬─────────────────┘
+                               ▼
+        SourceRegistry  ── source_id → SourceHandle (resolves path/offset/mode)
+                               │
+        ToolRegistry    ── tools registered per access_mode, bound to source_id at call time
+                               │
+        ┌──────────────────────┼───────────────────────┐
+        ▼                      ▼                       ▼
+  Knowledge-Source         Graph Write Gateway      ToolInvocationLog
+  Agents (LLM)        ──►  (sole write entry;       (every tool call logged:
+  write the graph only     enforced precondition     args / output / sha256)
+  via the gateway          = grounding)
+        │                      │
+        └──────────────────────┴──► Grounded Evidence Graph (GEG)
+                                     Phenomenon / Hypothesis / Entity
+                                     confidence = log-odds accumulation
+```
+
+**Kept**: the existing five-phase pipeline, disconnect recovery, run archiving, tool-result cache, and
+`AgentFactory` dynamic composition. These designs are sound; they are adapted, not rewritten.
+
+---
+
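+## 4. Core design
+
+### 4.1 Evidence source abstraction (addresses P1/P5/P6/P7; the foundation)
+
+New `case.py`:
+
+- **`EvidenceSource`** dataclass: `id`, `label`, `type`, `owner` (associated person),
+  `path`, `access_mode`, `meta` (type-specific, e.g. partition offset / unpacked root directory).
+- **`Case`**: holds a `list[EvidenceSource]` plus case metadata, loaded from `case.yaml`.
+- **`access_mode` is the key design distinction**:
+  - `image`: block device / disk image, addressed by inode via TSK (USB E01, each partition of the Android `blk0_sda`).
+  - `tree`: mounted filesystem or unpacked directory, addressed by path (unzipped iOS extractions, expanded archives).
+  - Tools are registered in families by access_mode (see 4.2). A piece of evidence can be "prepared" from image into tree
+    (e.g. partition mount, zip unpack).
+
+`select_image_interactive` in `main.py` (:91-153) changes to load/construct a `Case`;
+`_IMAGE_GLOBS` is replaced by type detection (`mmls` probing + file-header sniffing) and no longer relies on extensions.
+`config.yaml` drops `cfreds_hacking_case`; case information moves into `case.yaml`.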
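A minimal sketch of how the two structures in 4.1 might look. The field list of `EvidenceSource` and the `get_source` accessor come from the design text; the `name` field on `Case`, the `from_dict` constructor, and the dictionary layout standing in for a parsed `case.yaml` are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class EvidenceSource:
    id: str
    label: str
    type: str
    owner: str
    path: str
    access_mode: str  # "image" (inode-addressed) or "tree" (path-addressed)
    meta: dict[str, Any] = field(default_factory=dict)


@dataclass
class Case:
    name: str  # assumed metadata field
    sources: list[EvidenceSource] = field(default_factory=list)

    @classmethod
    def from_dict(cls, raw: dict[str, Any]) -> "Case":
        """Build a Case from an already-parsed case.yaml (layout is hypothetical)."""
        sources = []
        for entry in raw.get("sources", []):
            if entry["access_mode"] not in ("image", "tree"):
                raise ValueError(f"unknown access_mode: {entry['access_mode']!r}")
            sources.append(EvidenceSource(**entry))
        return cls(name=raw["name"], sources=sources)

    def get_source(self, source_id: str) -> EvidenceSource:
        for src in self.sources:
            if src.id == source_id:
                return src
        raise KeyError(source_id)


raw_case = {
    "name": "demo-case",
    "sources": [
        {"id": "src-usb", "label": "USB stick", "type": "disk_image",
         "owner": "LEUNG YL", "path": "usb.E01", "access_mode": "image"},
        {"id": "src-ios", "label": "iPhone extraction", "type": "mobile_extraction",
         "owner": "CHAN MH", "path": "ios_unpacked/", "access_mode": "tree",
         "meta": {"root": "ios_unpacked/"}},
    ],
}
case = Case.from_dict(raw_case)
```

Validating `access_mode` at load time keeps a typo in the case file from surfacing later as a tool that silently has no applicable sources.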
+
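+### 4.2 Tool registration parameterized by source (addresses P1)
+
+Today, `register_all_tools(image_path, offset, ...)` closes a single image into every tool
+(`tool_registry.py:159+`). The refit:
+
+- Tool executor signatures gain `source_id`; at execution time `SourceRegistry` resolves the real path/offset/mode.
+- `TOOL_CATALOG` annotates tool applicability by `access_mode`; the tool set an agent receives is determined by
+  the source types it is responsible for.
+- **"Current source" context**: the orchestrator sets a current source for the agent (analogous to the existing
+  `graph._current_agent`), and tools act on it by default, so the LLM need not pass `source_id` every time
+  (fewer mistakes). Cross-source tools (timeline merge, entity queries) span sources explicitly.
+- The cache key `_cache_key` (`tool_registry.py:41`) incorporates `source_id` to prevent cross-source contamination.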
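The last bullet can be sketched as follows. This is not the actual `_cache_key` from `tool_registry.py`; the function name `cache_key`, its signature, and the JSON-plus-SHA-256 scheme are assumptions chosen to show why `source_id` has to be part of the key.

```python
import hashlib
import json


def cache_key(source_id: str, tool: str, args: dict) -> str:
    """Derive a deterministic key from source, tool name, and arguments."""
    # sort_keys makes the key independent of argument ordering.
    payload = json.dumps({"source": source_id, "tool": tool, "args": args}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# The same tool call against two different sources must not share a cache entry.
k_usb = cache_key("src-usb-leung", "list_directory", {"path": "/"})
k_ios = cache_key("src-ios-chan", "list_directory", {"path": "/"})
```

Without the `source` component, listing `/` on the USB image and on the iOS tree would collide, which is the contamination the bullet warns about.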
+
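+### 4.3 Graph write gateway (addresses P2, implements principle 1)
+
+Today, agents write the graph directly through tools such as `add_phenomenon`, and the constraints exist only in the prompt. The refit:
+
+- All graph mutations (`add_phenomenon` / `add_hypothesis` / `link` / `observe_identity` …)
+  converge on **a single write gateway**. The gateway enforces preconditions in code.
+- The "anti-hallucination rules" in the existing prompt move down into hard checks in the gateway. The LLM agent's four-stage workflow
+  (INVESTIGATE→RECORD→LINK→ANSWER) is unchanged; what changes is that the gateway underneath the RECORD step gets stricter.
+- The `mandatory_record_tools` mechanism in `base_agent.py` is kept (it guarantees the agent actually recorded something).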
+
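+### 4.4 Evidence grounding constraint (addresses P2, implements principle 2)
+
+This is the core mechanism for system reliability.
+
+**ToolInvocationLog**: every tool call leaves one record
+`{invocation_id, source_id, tool, args, output, output_sha256, agent, ts}`.
+The existing result cache (`tool_registry.py:29`) already stores deterministic output; extending it into a full log is enough.
+
+**Phenomenon splits in two**, separating "fact" from "interpretation":
+
+- `verified_facts`: `list[{type, value, invocation_id}]`,
+  `type ∈ {path, timestamp, inode, hash, identifier, count, ...}`.
+- `interpretation`: free text, the agent's analytical narrative.
+
+**`add_phenomenon` gateway preconditions**:
+
+1. Every fact must cite an `invocation_id` that **actually occurred for this agent within this task**.
+2. Code verifies that `fact.value` matches the output of that invocation:
+   - text output → verbatim substring match;
+   - structured/binary tool output → match against the parsed fields.
+3. If any fact fails → **the whole record is rejected**, the failing facts are returned, and the agent must fix and retry.
+4. If all pass → the write goes through; each entry in `verified_facts` carries its `invocation_id` (re-runnable for verification),
+   and `interpretation` is marked as "unverified analysis".
+
+**Effect**: within the system, "recording a path/timestamp/hash/identifier that no tool output supports"
+is **structurally impossible**. The LLM may still get `interpretation` wrong, but the report renders
+verified facts (citations with re-run instructions) and interpretation (explicitly labeled analysis)
+**separately**, so a human investigator can tell them apart at a glance. This is a reliability guarantee with honestly drawn boundaries.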
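The gateway preconditions can be sketched roughly as below, for the text-output case only. The `Invocation` class, the `failed_facts` function, and the sample `fls`-style output line are illustrative assumptions; the sketch checks agent ownership but omits the "within this task" scoping and the structured-output branch.

```python
from dataclasses import dataclass


@dataclass
class Invocation:
    invocation_id: str
    agent: str
    output: str  # raw text output captured in the ToolInvocationLog


def failed_facts(facts: list[dict], log: dict[str, Invocation], agent: str) -> list[dict]:
    """Return the facts that fail grounding; an empty list means the write may proceed."""
    failed = []
    for fact in facts:
        inv = log.get(fact["invocation_id"])
        # Precondition 1: the cited invocation exists and was made by this agent.
        # Precondition 2 (text output): the value appears verbatim in that output.
        if inv is None or inv.agent != agent or str(fact["value"]) not in inv.output:
            failed.append(fact)
    return failed


log = {"inv-1": Invocation("inv-1", "filesystem", "r/r 128-1:\tsecret.zip\n")}
facts = [
    {"type": "path", "value": "secret.zip", "invocation_id": "inv-1"},
    {"type": "path", "value": "passwords.txt", "invocation_id": "inv-1"},
]
# The second fact never appeared in the tool output, so the whole record is rejected.
rejected = failed_facts(facts, log, agent="filesystem")
```

Returning the failing facts, rather than a bare boolean, matches precondition 3: the agent needs to know what to correct before retrying.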
+
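+> The existing `_make_auto_record` (`tool_registry.py:126`) turns tool output directly into a phenomenon.
+> That is the special case of "trivial grounding" (the description is the output); the new design generalizes and formalizes it.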
+
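+### 4.5 Hypothesis confidence: likelihood ratio / log-odds (addresses P3)
+
+Replace `_DEFAULT_EDGE_WEIGHTS` at `evidence_graph.py:26`, currently a set of "seat-of-the-pants deltas",
+with log-odds accumulation based on **likelihood ratios (LR)**:
+
+- Each `Phenomenon → Hypothesis` edge represents one likelihood ratio. The LLM still does only **discrete classification**
+  (is this evidence direct_evidence / supports / weakens / contradicts … for this hypothesis),
+  and the value `log₁₀(LR)` is looked up in a calibration table: **the LLM never emits numbers** (this continues the existing "LLM picks the type,
+  code computes the value" philosophy and gives it a statistical basis).
+- Confidence update:
+  ```
+  L_post = L_prior + Σ log₁₀(LR_i)        # log-odds; commutative → no order dependence
+  confidence = 1 / (1 + 10^(−L_post))
+  ```
+- Edge type → `log₁₀(LR)` calibration table (initial values; can later be calibrated from labeled cases):
+
+  | Edge type | log₁₀LR |
+  |---|---:|
+  | `direct_evidence` | +2.0 |
+  | `supports` / `consequence_observed` | +1.0 |
+  | `prerequisite_met` | +0.5 |
+  | `weakens` | −0.5 |
+  | `contradicts` | −2.0 |
+
+- Thresholds are unchanged (≥0.8 supported / ≤0.2 refuted); they are simply derived from `L_post` now.
+- `prior_prob` becomes configurable (default 0.5 → `L_prior=0`).
+- **Harmonic damping for same-type evidence** (landed 2026-05): the k-th edge with the same `(hypothesis, edge_type)`
+  contributes `log_lr_base / k`. Cumulative = `log_lr_base · H_N` (harmonic series, ~ ln N).
+  This addresses the breakdown of the naive-Bayes independence assumption, plus the runaway L=+31 caused by multiple agents
+  entering the same finding into the graph repeatedly (field data from 2026-05-20). A single edge is unchanged (k=1, damping=1.0). The **structural signal** matters
+  more than the absolute value: for the strategist, `distinct_sources` is a better gauge of evidence depth than the confidence number.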
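The update rule and the harmonic damping above can be sketched as follows. The table values are the initial calibration from this section; the function names `posterior_log_odds` and `confidence`, and the choice to pass edges as a plain list of type names for one hypothesis, are assumptions for illustration rather than the code in `evidence_graph.py`.

```python
import math
from collections import defaultdict

# Initial calibration table from the section above: edge type -> log10(LR).
LOG10_LR = {
    "direct_evidence": 2.0,
    "supports": 1.0,
    "consequence_observed": 1.0,
    "prerequisite_met": 0.5,
    "weakens": -0.5,
    "contradicts": -2.0,
}


def posterior_log_odds(edge_types: list[str], prior_prob: float = 0.5) -> float:
    """Accumulate log10 odds for one hypothesis; the k-th edge of a type contributes base/k."""
    log_odds = math.log10(prior_prob / (1.0 - prior_prob))
    seen: dict[str, int] = defaultdict(int)
    for edge_type in edge_types:
        seen[edge_type] += 1
        log_odds += LOG10_LR[edge_type] / seen[edge_type]
    return log_odds


def confidence(log_odds: float) -> float:
    """Map log10 odds back to a probability."""
    return 1.0 / (1.0 + 10.0 ** (-log_odds))


three_supports = posterior_log_odds(["supports"] * 3)   # 1 + 1/2 + 1/3
many_supports = posterior_log_odds(["supports"] * 31)   # H_31, roughly 4.03
mixed_a = posterior_log_odds(["supports", "contradicts", "supports"])
mixed_b = posterior_log_odds(["contradicts", "supports", "supports"])
```

Because each edge type contributes `base · H_n` regardless of where its edges fall in the sequence, `mixed_a` and `mixed_b` agree, and 31 same-type `supports` edges accumulate about 4.03 rather than 31.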
+
+As a by-product, this yields a **hypothesis × evidence matrix** view for use in reports and lead selection.
+
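+### 4.6 Cross-source entity resolution (addresses the correlation problem of "complex scenarios", implements principle 4)
+
+The core difficulty of complex forensics: the Apple ID in an iPhone keychain, a number in an Android SMS database,
+the author of a USB file, a wallet address in a transaction screenshot. **Which of these point to the same actor?**
+
+**Key design: "identity coreference" is itself a hypothesis**, so entity resolution is not a separate subsystem
+but a reuse of the hypothesis mechanism in 4.5:
+
+- When an agent observes an identifier, it calls `observe_identity` through the gateway, recording a **typed** identifier
+  (strong identifiers: IMEI / wallet address / email / phone number; weak identifiers: nickname / display name)
+  attached to a provisional `Entity`.
+- "Entity A ≡ Entity B" is registered as a `Hypothesis`; a shared strong identifier = strong +LR edge,
+  a shared weak identifier = weak +LR edge, conflicting strong identifiers = strong −LR edge, all scored with the same computation as 4.5.
+- **No destructive merging**: on crossing the threshold, a `same_as` edge is added between the two Entities (backed by that coref
+  hypothesis). Queries treat `same_as` connected components as the same actor. **Fully reversible, auditable, and
+  overturnable by later contradicts evidence** (implements principle 4).
+- **Blocking**: coref hypotheses are created only between entity pairs that "share at least one identifier or have highly similar names",
+  avoiding O(n²).
+
+Cross-device timelines and "who did what, when" emerge naturally from the entity graph once it is connected by `same_as`.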
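Treating `same_as` connected components as one actor at query time could look like the sketch below. It uses a small union-find; the function name `same_as_components` and the entity ids are hypothetical, and the design does not say which algorithm the real query uses.

```python
def same_as_components(entities: list[str], same_as: list[tuple[str, str]]) -> list[set[str]]:
    """Group entities into actors by following same_as edges (union-find)."""
    parent = {e: e for e in entities}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in same_as:
        parent[find(a)] = find(b)

    groups: dict[str, set[str]] = {}
    for e in entities:
        groups.setdefault(find(e), set()).add(e)
    return list(groups.values())


entities = ["ent-apple-id", "ent-phone", "ent-wallet", "ent-nickname"]
edges = [("ent-apple-id", "ent-phone"), ("ent-phone", "ent-wallet")]

merged = same_as_components(entities, edges)
# Reversibility: if the second coref hypothesis is later refuted, drop its edge and recompute.
after_retraction = same_as_components(entities, edges[:1])
```

Since the components are recomputed from the edge list, retracting a `same_as` edge needs no un-merge logic, which is the reversibility the section argues for.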
+
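+### 4.7 Capability plugin layer (onboarding 5 evidence categories)
+
+Each evidence category = one `(ingestion handler, tool set, knowledge-source agent)` triple. The reasoning core is untouched.
+
+| Plugin | Ingestion | New tools | Knowledge-source agent |
+|---|---|---|---|
+| **iOS extraction** | `unzip` unpacks into a `tree` source | `parse_plist` (incl. binary plist), `sqlite_tables`/`sqlite_query` (sms.db, WhatsApp `ChatStorage.sqlite`, contacts), `parse_ios_keychain`, `read_idevice_info` | `iOSArtifactAgent` |
+| **Android full disk** | `mmls` partitions → one `image` source per partition; can be mounted as `tree` | Reuses TSK; ext4/F2FS reading; `fsstat` to detect encryption | Reuses filesystem + `AndroidArtifactAgent` |
+| **Disk image (E01)** | Already supported (TSK with ewf) | Existing TSK toolchain | Existing filesystem/registry |
+| **Archive** | `unzip_archive` generic unpacking | - | - |
+| **Media/screenshots** | - | `ocr_image` (tesseract; note that DeepSeek has no vision capability, so OCR is mandatory) | `MediaAgent` |
+
+**Android risk**: the `userdata` partition of `blk0_sda` is very likely FBE-encrypted. Run `fsstat` on each partition first
+to find out: unencrypted → use TSK directly; encrypted with no key → only unencrypted areas such as `EFS`/`PARAM`/`system` can be analyzed.
+
+`_auto_categorize` at `tool_registry.py:80` becomes extensible: each source plugin supplies its own
+artifact categorization table instead of a global Windows keyword table (addresses P4).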
+
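+### 4.8 Agent reorganization
+
+The 7 existing agents are named after Windows artifacts (registry, communication = email/IRC,
+network = browser/PCAP). They are reorganized by **investigative function**, with platform-specific agents added:
+
+- `_AGENT_CLASSES` in `agent_factory.py` (:34-40) is extended with `ios_artifact`,
+  `android_artifact`, `financial` (wallets/transactions), and `media`.
+- `communication` is generalized: email + IM + SMS, cross-platform.
+- A new **source type → suitable agent** mapping lets Phase 1 dispatch a triage agent per source.
+- The dynamic composition mechanism of `create_specialized_agent` (:69) is kept. It was already the right
+  way to handle capability gaps; a larger tool catalog simply gives it more to choose from.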
+
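+### 4.9 Orchestrator multi-source pipeline
+
+| Phase | Change |
+|---|---|
+| Phase 1 | "Single-image initial survey" → **per-source parallel triage**, with a type-appropriate agent per source |
+| Phase 2 | Hypotheses are generated across sources; identity-coreference hypotheses are first registered here |
+| Phase 3 | **Strategist loop**: an LLM meta-agent inspects the graph each round and decides propose_lead or declare_complete; workers execute leads; hypothesis edges are re-evaluated. See `DESIGN_STRATEGIST.md` for details |
+| Phase 4 | Cross-source timeline merge, with **per-source timezone normalization** (iOS UTC vs Android local time) |
+| Phase 5 | One consolidated report per case: hypothesis conclusions, entity relationship graph, and provenance citations for every conclusion |
+
+**"The LLM decides depth" in Phase 3** (landed after a 2026-05 field run showed that Phase 3's single-round trigger plus log-odds inflation left all 8 pending leads undispatched): the scheduling layer moves from hard-coded decisions ("max_rounds=N, converged→stop") to being driven by an LLM meta-agent.
+
+- The new agent `InvestigationStrategist` (`agents/strategist.py`) takes one action per round: propose 1-3 leads, or declare_investigation_complete
+- 4 read-only view tools, `graph_overview` / `source_coverage` / `marginal_yield` / `budget_status` (`tools/strategy.py`), expose scheduling signals to the LLM
+- 2 write decision tools, `propose_lead` / `declare_investigation_complete`, are the strategist's mandatory_record tools
+- The orchestrator reads `config.yaml:strategist.*` + `config.yaml:budgets.*` to control max_rounds and hard caps
+- See `[[DESIGN_STRATEGIST]]` for the full data model, prompt design, disconnect recovery, and risks/mitigations
+
+Disconnect recovery and run archiving logic are kept; `graph_state.json` gains an `investigation_rounds[]` array that persists the strategist's per-round decisions.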
+
+---
+
+## 5. Data model change summary
+
+| Node/structure | Change |
+|---|---|
+| `EvidenceSource` | **New** first-class node (`src-*`) |
+| `ToolInvocation` | **New** log record (`inv-*`), persisted with the graph |
+| `Phenomenon` | + `source_id`; description split into `verified_facts[]` + `interpretation`; the semantically ambiguous `confidence` (default 1.0) is clarified/removed, and observation reliability is expressed through grounding |
+| `Hypothesis` | + `prior_prob`, `log_odds` (accumulator); `confidence` becomes a derived value |
+| `Entity` | + typed identifier set; connected across sources via `same_as` edges |
+| Phenomenon→Hypothesis edge | Carries `edge_type`, mapped to `log₁₀(LR)` (replaces `_DEFAULT_EDGE_WEIGHTS`); the k-th edge with the same `(hyp, edge_type)` is harmonically damped by `1/k` |
+| Entity→Entity edge | **New** `same_as` (backed by a coref hypothesis, reversible) |
+| `Lead` | + `proposed_by` / `motivating_hypothesis` / `expected_evidence_type` / `round_number` (strategist annotations) |
+| `InvestigationRound` | **New**: provenance of each strategist round's decision + before/after snapshots + yield metrics |
+
+`VALID_EDGE_TYPES`, serialization/deserialization, and Jaccard deduplication in `evidence_graph.py` are adapted accordingly.
+
+---
+
+## 6. Component change list
+
+| File | Change |
+|---|---|
+| `case.py` | **New**: `Case` / `EvidenceSource` / `SourceRegistry` |
+| `main.py` | Source selection changes to loading a `Case`; type detection replaces the extension glob |
+| `tool_registry.py` | Tools parameterized by `source_id`; cache key includes the source; `_auto_categorize` made extensible; `ToolInvocationLog` |
+| `evidence_graph.py` | Data model changes (section 5); LR/log-odds confidence; write gateway + grounding checks |
+| `base_agent.py` | RECORD goes through the gateway; `add_phenomenon` changes to a `verified_facts`+`interpretation` interface |
+| `agent_factory.py` | `_AGENT_CLASSES` extended; source type → agent mapping |
+| `orchestrator.py` | Phase 1 per source; Phase 4 cross-source timezone normalization; Phase 5 consolidated report |
+| `agents/` | New `ios_artifact.py` / `android_artifact.py` / `financial.py` / `media.py`; `communication.py` generalized |
+| `tools/` | New `mobile_ios.py` (plist/sqlite/keychain), `media.py` (OCR), `archive.py` (unpacking) |
+| `config.yaml` / `case.yaml` | Remove `cfreds_hacking_case`; new `case.yaml` evidence manifest |
+
+---
+
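+## 7. Build order (sorted by dependency)
+
+| Stage | Content | Depends on | Value |
+|---|---|---|---|
+| **S1** | 4.1 evidence source abstraction + 4.2 tool parameterization + fix P6 | - | Foundation; first get it running on the USB E01 alone to verify existing logic is not broken |
+| **S2** | 4.3 write gateway + 4.4 grounding + ToolInvocationLog | S1 | Reliability core; makes "zero hallucinated records" measurable |
+| **S3** | 4.5 LR/log-odds confidence | Independent (can run in parallel with S2) | Fixes P3; confidence becomes defensible |
+| **S4** | 4.7 iOS plugin + 4.8 agent reorganization | S1 | Coverage 1/5 → 4/5 |
+| **S5** | 4.6 cross-source entity resolution | S1+S3 | Cross-device correlation; complex-scenario capability takes shape |
+| **S6** | 4.7 Android + media plugins + 4.9 orchestrator adaptation | S1+S4 | All 5 pieces of evidence onboarded |
+
+S1+S2+S3 are "making the system correct"; S4-S6 are "rolling out full capability". Following the order strictly is recommended:
+if S1 is unstable, everything built on it is a castle in the air.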
+
+---
+
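+## 8. Design trade-offs and open questions
+
+1. **Boundary of grounding for free text**: only the structured atoms in `verified_facts` are hard-verified;
+   `interpretation` is not verified verbatim (an honestly drawn boundary). A second-level lint could be added: scan
+   interpretation for strings that look like paths/timestamps/hashes but are not covered by any cited invocation, and warn.
+2. **Initial LR calibration values are hand-set**: get things running with the initial values from section 4.5; "learning LRs from labeled cases" is future work.
+3. **Android userdata encryption**: whether a decryption key can be obtained determines the evidence depth of the 4.7 Android plugin, so this needs to be established early.
+4. **Entity resolution, destructive vs reversible**: this design chooses **reversible `same_as` edges** over destructive merging,
+   trading a little query efficiency for full auditability and rollback, in line with principle 4.
+5. **Report granularity**: set to "one consolidated report per case", with an embedded subsection per piece of evidence plus cross-source correlation,
+   rather than a standalone report per piece of evidence.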
--- a/DESIGN_STRATEGIST.md
+++ b/DESIGN_STRATEGIST.md
@@ -0,0 +1,543 @@
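+# Strategist Loop: belief-driven refit of Phase 3
+
+> This is a supplementary design document to DESIGN.md, covering the concrete rewrite of the orchestrator's Phase 3 from §4.9.
+>
+> **Trigger**: the first full 6-source field run on 2026-05-20 (`runs/2026-05-20T20-15-04/`)
+> exposed that Phase 3 does not work: none of the 8 pending leads was dispatched, because
+> log-odds inflation made every hypothesis converge immediately. Even after "harmonic damping" fixed the
+> log-odds math (commit at `evidence_graph.py:update_hypothesis_confidence`),
+> Phase 3 under the current architecture is still a mechanical "single-round trigger, rule-based convergence" process in which the LLM
+> has no say at the scheduling layer. This design turns Phase 3 into an LLM-driven exploration loop.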
+
+---
+
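+## 0. Scope
+
+### In scope
+
+Rebuild `orchestrator.py:Phase 3` from "single-round, rule-triggered" into "strategist loop, belief-driven":
+add one `InvestigationStrategist` agent, 4 decision-view tools, 2 decision-action tools,
+and a rewritten orchestrator loop.
+
+### Out of scope
+
+- No change to Phase 1 (per-source triage stays as is)
+- No change to Phase 2 (HypothesisAgent is untouched; the strategist may **invoke** it but does not replace it)
+- No change to Phase 4/5 (timeline / report)
+- No expert-level per-source checklists (only a **soft-hint** list inside the `source_coverage` tool)
+- No new graph node types; leads reuse the existing structure
+
+### Invariants preserved
+
+- DESIGN.md §4.3 grounding gateway; all writes still go through it
+- DESIGN.md §4.5 log-odds + harmonic damping
+- DESIGN.md §4.4 boundary between verified_facts and interpretation
+- Disconnect recovery (`graph_state.json` serialization stays compatible)
+
+### Design principles
+
+1. **"The LLM proposes, code adjudicates" moves up to the scheduling layer**: DESIGN.md's first principle is currently honored only at the fact layer
+   (grounding), while the scheduling-layer questions of "whether to go deeper, where, and when to stop" are hard-coded decisions.
+   This design gives the LLM authority over scheduling decisions.
+2. **Exam capability exists but is not binding**: the system's tool set and soft-hint lists cover the artifact
+   categories that exam scenarios require; but whether to examine a given artifact, and how deeply, is decided by the strategist based on the nature of the case,
+   not forced by a predefined checklist.
+3. **Explainable and auditable**: each round's strategist decision, motivation, and resulting yield are recorded in a persisted
+   `InvestigationRound` for after-the-fact review.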
+
+---
+
+## 1. Data model changes
+
+### 1.1 `Lead` gains 4 fields
+
+`evidence_graph.py:Lead` currently has `(id, title, description, target_agent, source_id, status, …)`.
+Added:
+
+```python
+from dataclasses import dataclass
+
+
+@dataclass
+class Lead:
+    # ... existing fields
+    proposed_by: str = ""           # "strategist" | "filesystem" | ... (the proposing agent)
+    motivating_hypothesis: str = "" # hyp-id this lead is meant to corroborate/refute
+    expected_evidence_type: str = "" # one of edge_types (the edge type the lead is expected to produce)
+    round_number: int = 0           # strategist round that produced this lead
+```
+
+`motivating_hypothesis` is the key: it explicitly ties a lead to a hypothesis, so that afterwards one can compute
+"did running this lead actually change the hypothesis status", i.e. the strategist's marginal-yield metric.
+
+### 1.2 New `InvestigationRound` node
+
+Records each round's strategist decision itself, since provenance must be auditable too:
+
+```python
+from dataclasses import dataclass, field
+
+
+@dataclass
+class InvestigationRound:
+    id: str                          # "round-001"
+    round_number: int
+    started_at: str
+    completed_at: str = ""
+    strategist_action: str = ""      # "propose_leads" | "declare_complete"
+    leads_proposed: list[str] = field(default_factory=list)
+    leads_executed: list[str] = field(default_factory=list)
+    hypothesis_status_snapshot_before: dict = field(default_factory=dict)  # hyp_id → status
+    hypothesis_status_snapshot_after: dict = field(default_factory=dict)
+    new_phenomena_count: int = 0
+    new_edges_count: int = 0
+    decision_rationale: str = ""     # the strategist's own account
+```
+
+Serialized with the graph (added to `to_dict`/`from_dict`).
+
+---
+
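+## 2. New tools
+
+They live in a new file, `tools/strategy.py`, and are registered following the existing `TOOL_CATALOG` pattern.
+
+### 2.1 `graph_overview()`: global state (read-only)
+
+**Signature**: `graph_overview() -> str`
+
+**Output** (markdown, easier for an LLM to read than JSON):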
+
+```markdown
+# Investigation State
+
+## Hypotheses (8)
+| id | title | L | conf | status | edges_in | distinct_sources | flipped_in_last_2_rounds |
+|----|-------|---|------|--------|----------|------------------|---------------------------|
+| hyp-83db8748 | Multi-Device Composite | +8.75 | 0.99 | supported | 23 | 1 | no |
+| hyp-daa7c704 | Multiple Identity Aliases | +9.21 | 0.99 | supported | 11 | 3 | no |
+| hyp-7fa9b13e | Sunny.zip contains timer_a | +2.08 | 0.99 | supported | 4 | 1 | yes (active→supported in R2) |
+| ...
+
+## Sources (6)
+| id | type | phenomena | identities | last_touched_in_round |
+| src-usb-leung | disk_image | 8 | 1 | R1 |
+| ...
+
+## Pending Leads (3)
+| id | from | targeting | for_hypothesis | reason |
+| lead-aaa | filesystem | src-ios-chan/Safari | hyp-83db8748 | Safari history likely contains device-switching evidence |
+```
+
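+**Key annotation**: the `distinct_sources` column exposes "this hypothesis is supported by only one source". On seeing
+that all 23 edges come from the android source, the strategist will conclude on its own that "independent evidence from elsewhere is needed".
+
+### 2.2 `source_coverage(source_id: str)`: per-source coverage (read-only)
+
+**Signature**: `source_coverage(source_id: str) -> str`
+
+**Implementation**: scan `graph.tool_invocations`, filter on `source_id == this source`, and group by tool name + main args.
+Then compare against `EXPECTED_ARTEFACTS[source_type]` and mark untouched items with ✗.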
+
+```python
+# tools/strategy.py
+EXPECTED_ARTEFACTS: dict[str, list[dict]] = {
+    "disk_image+windows": [
+        {"name": "filesystem layout", "detector": "fls|mmls", "value_for": "deleted files, hidden partitions"},
+        {"name": "registry hives", "detector": "parse_registry_key", "value_for": "user activity, installed software"},
+        {"name": "browser history", "detector": "list_directory@AppData/.../History", "value_for": "URL access, downloads"},
+        {"name": "prefetch", "detector": "extract_file@Windows/Prefetch", "value_for": "program execution evidence"},
+        # ...
+    ],
+    "mobile_extraction": [
+        {"name": "AddressBook", "detector": "sqlite_query@AddressBook.sqlitedb", "value_for": "contacts"},
+        {"name": "SMS messages", "detector": "sqlite_query@sms.db", "value_for": "messaging content"},
+        {"name": "WhatsApp messages", "detector": "sqlite_query@ChatStorage.sqlite", "value_for": "WhatsApp content"},
+        {"name": "Call history", "detector": "sqlite_query@CallHistoryDB", "value_for": "call records"},
+        {"name": "Safari history", "detector": "sqlite_query@History.db|read_text@Bookmarks.plist", "value_for": "web browsing"},
+        {"name": "Photos library", "detector": "sqlite_query@Photos.sqlite", "value_for": "photo metadata, EXIF, geolocation"},
+        {"name": "iCloud accounts", "detector": "parse_plist@Accounts3.sqlite|parse_keychain", "value_for": "Apple ID, services"},
+        {"name": "App inventory", "detector": "list_directory@var/containers/Bundle/Application", "value_for": "installed apps"},
+    ],
+    "disk_image+android": [...],
+    "media_collection": [
+        {"name": "OCR text", "detector": "ocr_image", "value_for": "screenshot text"},
+        {"name": "EXIF metadata", "detector": "exif_image", "value_for": "device, timestamps, geolocation"},
+    ],
+}
+```
+
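+**Soft-hint semantics**: the output always ends with this sentence:
+
+> Coverage hints are heuristics, not requirements. Skip an item if the case theory
+> makes it irrelevant. Investigate ✗ items only when they could materially affect
+> an active hypothesis.
+
+This sentence is **the key to "exam capability exists but is not binding"**: on seeing a ✗ the LLM does not investigate blindly, but first checks
+the hypothesis list and asks "does this artifact matter to any current hypothesis?".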
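A rough sketch of the detector comparison behind `source_coverage`. The detector grammar used here (`|` separates alternatives, `tool@fragment` requires the fragment to appear in some argument value) is an assumed reading of the `EXPECTED_ARTEFACTS` entries, and `detector_hit` / `coverage_report` are hypothetical names; detectors containing elided paths would need a real pattern before they could match.

```python
def detector_hit(detector: str, invocations: list[dict]) -> bool:
    """True if any alternative in the detector matches a logged invocation."""
    for alternative in detector.split("|"):
        tool, _, fragment = alternative.partition("@")
        for inv in invocations:
            if inv["tool"] != tool:
                continue
            # No fragment: the tool name alone counts. Otherwise look in the args.
            if not fragment or any(fragment in str(v) for v in inv["args"].values()):
                return True
    return False


def coverage_report(expected: list[dict], invocations: list[dict]) -> str:
    """Render one line per expected artefact, then the soft-hint reminder."""
    lines = []
    for item in expected:
        mark = "✓" if detector_hit(item["detector"], invocations) else "✗"
        lines.append(f"{mark} {item['name']} ({item['value_for']})")
    lines.append("Coverage hints are heuristics, not requirements.")
    return "\n".join(lines)


expected = [
    {"name": "SMS messages", "detector": "sqlite_query@sms.db", "value_for": "messaging content"},
    {"name": "Safari history", "detector": "sqlite_query@History.db|read_text@Bookmarks.plist",
     "value_for": "web browsing"},
]
invocations = [
    {"tool": "sqlite_query", "args": {"db": "Library/SMS/sms.db", "sql": "SELECT COUNT(*) FROM message"}},
]
report = coverage_report(expected, invocations)
```

Ending the report with the reminder keeps the ✗ marks advisory, so the strategist weighs them against active hypotheses instead of treating them as a to-do list.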
+
+### 2.3 `marginal_yield(last_n_rounds: int = 2)`: marginal yield (read-only)
+
+**Signature**: `marginal_yield(last_n_rounds: int = 2) -> str`
+
+**Implementation**: scan the most recent N `InvestigationRound`s and count:
+- new phenomena per round
+- new P→H edges per round
+- hypothesis status flips per round (active→supported / the reverse)
+
+**Output**:
+
+```markdown
+# Marginal Yield (last 2 rounds)
+
+| round | new_phenomena | new_edges | status_flips |
+|   R3  |  5            |  7        |  1           |
+|   R4  |  2            |  1        |  0           |
+
+Trend: decelerating (R4 yield 33% of R3).
+Recommendation interpretation aid: yield trending to zero suggests diminishing
+returns; consider declare_complete after one more probe.
+```
+
+The last line is LLM-friendly heuristic prose, not a binding signal.
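One way the trend label might be derived, as a sketch only. It assumes a round's yield is the plain sum of the three counters; the design does not define the aggregate, so `RoundStats`, `total`, and `yield_trend` are illustrative and will not necessarily reproduce the percentage shown in the sample output.

```python
from dataclasses import dataclass


@dataclass
class RoundStats:
    round_number: int
    new_phenomena: int
    new_edges: int
    status_flips: int

    @property
    def total(self) -> int:
        # Assumed aggregate: an unweighted sum of the three counters.
        return self.new_phenomena + self.new_edges + self.status_flips


def yield_trend(rounds: list[RoundStats]) -> str:
    """Label the trend between the last two rounds."""
    if len(rounds) < 2:
        return "insufficient data"
    previous, latest = rounds[-2].total, rounds[-1].total
    if latest == 0:
        return "zero"
    if previous == 0 or latest >= previous:
        return "steady or accelerating"
    return "decelerating"


history = [RoundStats(3, 5, 7, 1), RoundStats(4, 2, 1, 0)]
trend = yield_trend(history)
```

The label is advisory text for the strategist; the hard stop on repeated zero-yield rounds is a separate, code-enforced check.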
+
+### 2.4 `budget_status()`: budget view (read-only)
+
+**Signature**: `budget_status() -> str`
+
+```markdown
+# Budget Status
+
+| metric | used | cap | pct |
+| tool_calls | 1248 | 5000 | 25% |
+| strategist_rounds | 3 | 10 | 30% |
+| wall_clock_minutes | 142 | 360 | 39% |
+
+Phase 1 used 89% of allocated. Phase 2 used 4%. Phase 3 (strategist) so far: 7%.
+```
+
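+Budgets are read from config.yaml; the new fields are listed in §6. With no budget configured, the loop runs in unbounded mode (relying only on
+the strategist declaring complete on its own + a hard safety cap).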
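A minimal sketch of the hard budget check, including the unbounded mode just described. The two keys come from the `budgets` block in §6; the function name `budget_exceeded` and its argument list are assumptions, and the hard safety cap is left out.

```python
from typing import Optional


def budget_exceeded(tool_calls_used: int, minutes_elapsed: float, budgets: Optional[dict]) -> bool:
    """Return True once any configured cap is reached; no config means unbounded mode."""
    if not budgets:
        return False  # unbounded: rely on the strategist declaring complete
    call_cap = budgets.get("tool_calls_total")
    time_cap = budgets.get("wall_clock_minutes_max")
    if call_cap is not None and tool_calls_used >= call_cap:
        return True
    if time_cap is not None and minutes_elapsed >= time_cap:
        return True
    return False


budgets = {"tool_calls_total": 5000, "wall_clock_minutes_max": 480}
within = budget_exceeded(1248, 142, budgets)
over_calls = budget_exceeded(5000, 142, budgets)
unbounded = budget_exceeded(10_000, 9_999, None)
```

Keeping this check in code rather than in the prompt means the cap holds even when the strategist would prefer to continue.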
+
+### 2.5 Decision action tools (write)
+
+Registered in the strategist's `mandatory_record_tools`. The strategist must call at least one per round,
+otherwise forced-retry fires (reusing the existing mechanism).
+
+**`propose_lead(...)`**:
+
+```python
+{
+    "name": "propose_lead",
+    "input_schema": {
+        "type": "object",
+        "required": [
+            "description", "target_agent",
+            "motivating_hypothesis", "expected_evidence_type",
+        ],
+        "properties": {
+            "description": {
+                "type": "string",
+                "description": "1-2 sentence specific investigation request, including target source/artefact",
+            },
+            "target_agent": {
+                "type": "string",
+                "enum": ["filesystem","registry","communication","network","ios_artifact","android_artifact","media"],
+            },
+            "source_id": {"type": "string", "description": "which source to investigate"},
+            "motivating_hypothesis": {
+                "type": "string",
+                "description": "hyp-id this lead is meant to corroborate or refute",
+            },
+            "expected_evidence_type": {
+                "type": "string",
+                "enum": ["direct_evidence","supports","contradicts","weakens","prerequisite_met","consequence_observed"],
+            },
+            "rationale": {"type": "string", "description": "why this fills a real gap"},
+        }
+    }
+}
+```
+
+**`declare_investigation_complete(...)`**:
+
+```python
+{
+    "name": "declare_investigation_complete",
+    "input_schema": {
+        "type": "object",
+        "required": ["reason"],
+        "properties": {
+            "reason": {
+                "type": "string",
+                "enum": [
+                    "marginal_yield_zero",
+                    "budget_exhausted",
+                    "all_hypotheses_resolved",
+                    "coverage_saturated",
+                    "other",
+                ],
+            },
+            "rationale": {"type": "string"},
+        }
+    }
+}
+```
+
+Terminal tool: calling it ends the loop (reusing the existing `terminal_tools` mechanism).
+
+---
+
+## 3. `InvestigationStrategist` agent
+
+New file `agents/strategist.py`, about 150 lines.
+
+```python
+class InvestigationStrategist(BaseAgent):
+    name = "strategist"
+    role = (
+        "You are the investigation strategist. You do not run forensic tools yourself. "
+        "Your job is to read the current evidence graph and decide ONE of:\n"
+        "  (a) propose 1-3 new investigation leads that would materially affect an active hypothesis, or\n"
+        "  (b) declare the investigation complete.\n"
+        "\n"
+        "Use graph_overview / source_coverage / marginal_yield / budget_status to ground your judgment. "
+        "DO NOT propose a lead that just adds more same-direction evidence to an already-supported hypothesis "
+        "(harmonic damping makes it ~useless). DO propose leads when:\n"
+        "  - A hypothesis is supported by edges from only ONE source — get cross-source corroboration.\n"
+        "  - A hypothesis is in the active band (0.2 < conf < 0.8) — it needs the deciding evidence.\n"
+        "  - A specific high-value artefact is uncovered on a source where the active hypotheses suggest it matters.\n"
+        "\n"
+        "Declare complete when marginal_yield is approaching zero AND no remaining active hypotheses have "
+        "obvious investigation paths."
+    )
+
+    mandatory_record_tools = ("propose_lead", "declare_investigation_complete")
+    terminal_tools = ("declare_investigation_complete",)
+
+    def _register_graph_tools(self):
+        # Read-only tools — strategist NEVER writes phenomena/edges directly.
+        # All graph writes happen via the workers it dispatches.
+        self._register_graph_read_tools()
+        # No graph_write_tools.
+        # Add strategy-specific tools:
+        for tool_name in (
+            "graph_overview", "source_coverage", "marginal_yield", "budget_status",
+            "propose_lead", "declare_investigation_complete",
+        ):
+            td = TOOL_CATALOG[tool_name]
+            self.register_tool(td.name, td.description, td.input_schema, td.executor)
+```
+
+Registered as `agent_factory._AGENT_CLASSES["strategist"]`.
+
+---
+
+## 4. Orchestrator changes
+
+### 4.1 Remove/replace: the current Phase 3
+
+Current logic of `orchestrator.py:Phase 3` (about 150 lines): check leads → dispatch workers →
+check converged → exit. **Removed**.
+
+### 4.2 New Phase 3: strategist loop
+
+```python
+async def _phase3_strategist_loop(self, run_dir: Path) -> None:
+    """Belief-driven investigation: strategist proposes, workers execute, repeat."""
+    _log("Phase 3: Strategist-Driven Investigation", event="phase")
+
+    strategist = self.factory.get_or_create_agent("strategist")
+    # The round cap is read from `strategist.max_rounds`, the key defined in §6.
+    max_rounds = self.config.get("strategist", {}).get("max_rounds", 10)
+
+    for round_num in range(1, max_rounds + 1):
+        # 1. Record round start + snapshot
+        rid = await self.graph.start_investigation_round(round_num)
+
+        # 2. Strategist run
+        _log(f"Strategist Round {round_num}", event="phase")
+        await strategist.run(
+            f"Review the graph and decide the next investigation action. "
+            f"This is round {round_num}/{max_rounds}. Budget used so far: see budget_status."
+        )
+
+        # 3. Did strategist declare complete?
+        if self.graph.is_round_terminal(rid):
+            _log(f"Strategist declared complete at round {round_num}", event="progress")
+            break
+
+        # 4. Collect new leads proposed this round
+        new_leads = self.graph.leads_from_round(round_num)
+        if not new_leads:
+            _log(f"No leads proposed in round {round_num} — stopping", event="progress")
+            break
+
+        # 5. Dispatch each lead
+        for lead in new_leads:
+            await self._execute_lead(lead, round_num)
+
+        # 6. Close round + record yield
+        await self.graph.complete_investigation_round(rid)
+
+        # 7. Hard budget check
+        if self._budget_exceeded():
+            _log(f"Budget exhausted at round {round_num}", event="progress")
+            break
+```
+
+### 4.3 `_execute_lead` reuses the existing worker dispatch logic
+
+```python
+async def _execute_lead(self, lead: Lead, round_num: int) -> None:
+    agent_type = AGENT_ALIASES.get(lead.target_agent, lead.target_agent)
+    worker = self.factory.get_or_create_agent(agent_type)
+    if worker is None:
+        logger.warning(f"No worker for lead {lead.id}: {agent_type}")
+        return
+
+    src = self.graph.case.get_source(lead.source_id) if lead.source_id else None
+    if src:
+        self.graph.set_active_source(src)
+
+    _log(
+        f"Round {round_num} dispatching: {lead.description}",
+        event="dispatch", agent=agent_type,
+    )
+    await worker.run(
+        f"Investigate this specific lead from the strategist:\n\n"
+        f"REQUEST: {lead.description}\n"
+        f"MOTIVATING HYPOTHESIS: {lead.motivating_hypothesis}\n"
+        f"EXPECTED EVIDENCE TYPE: {lead.expected_evidence_type}\n"
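+        # `rationale` is optional in propose_lead and is not among the Lead fields added in §1.1.
+        f"RATIONALE: {getattr(lead, 'rationale', '')}\n\n"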
+        f"After investigating, record findings via add_phenomenon AND link relevant phenomena "
+        f"to {lead.motivating_hypothesis} via the appropriate edge_type."
+    )
+    lead.status = "completed"
+    self.graph._auto_save()
+```
+
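+### 4.4 Automatic hypothesis regeneration (optional, recommended)
+
+New phenomena may give rise to **new hypotheses** (not just updates to existing ones). Let the strategist trigger this explicitly with
+`propose_lead(target_agent="hypothesis", description="re-examine recent phenomena for new hypotheses")`.
+This is the strategist's own decision, not a timed trigger. Consistency is preferred over automatic scheduling.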
+
+---
+
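+## 5. State persistence
+
+`graph_state.json` gains a top-level key `investigation_rounds: list[InvestigationRound]`,
+handled by `save_state` / `load_state`. On **disconnect recovery**:
+
+- Find the most recent round that is not completed → treat that round as failed
+- Restart from the next round
+- Phenomena / edges from completed rounds are retained as a matter of course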
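The recovery rule above can be sketched as follows, operating on the serialized `investigation_rounds` list. The function name `resume_point` and the timestamp values are hypothetical; how the failed round is marked in the real graph is not specified here, so the sketch only reports its id.

```python
from typing import Optional


def resume_point(rounds: list[dict]) -> tuple[Optional[str], int]:
    """Return (id of the open round to treat as failed, round number to restart from)."""
    open_rounds = [r for r in rounds if not r.get("completed_at")]
    failed_id = None
    if open_rounds:
        # Only the most recent open round is treated as failed.
        failed_id = max(open_rounds, key=lambda r: r["round_number"])["id"]
    last_number = max((r["round_number"] for r in rounds), default=0)
    return failed_id, last_number + 1


saved = [
    {"id": "round-001", "round_number": 1, "completed_at": "2026-05-21T01:00:00"},
    {"id": "round-002", "round_number": 2, "completed_at": ""},  # interrupted mid-round
]
failed, next_round = resume_point(saved)
```

Restarting at `last_number + 1` instead of re-running the open round keeps round ids unique, so phenomena already written during the interrupted round stay attributable to it.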
+
+---
+
+## 6. Configuration
+
+New in `config.yaml`:
+
+```yaml
+strategist:
+  enabled: true                     # false = use the old Phase 3 logic (safety fallback)
+  max_rounds: 10
+  hard_stop_marginal_yield_zero_rounds: 3  # force a stop after 3 consecutive rounds with yield=0
+
+budgets:
+  tool_calls_total: 5000
+  wall_clock_minutes_max: 480
+```
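A sketch of how `hard_stop_marginal_yield_zero_rounds` might be evaluated. It assumes the orchestrator keeps one aggregate yield number per completed round; the function name `should_hard_stop` and that per-round aggregate are illustrative, not part of the spec.

```python
def should_hard_stop(round_yields: list[int], zero_rounds: int = 3) -> bool:
    """True when the most recent `zero_rounds` rounds all produced zero yield."""
    if zero_rounds <= 0 or len(round_yields) < zero_rounds:
        return False
    return all(y == 0 for y in round_yields[-zero_rounds:])


stalled = should_hard_stop([4, 0, 0, 0])        # three zero rounds in a row
recovering = should_hard_stop([0, 0, 3, 0, 0])  # the streak was broken by round 3
too_early = should_hard_stop([0, 0])            # fewer rounds than the threshold
```

Only a trailing streak counts, so a single productive round resets the counter.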
+
+---
+
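+## 7. Test strategy
+
+New file `tests/test_strategist.py`, or add to `test_optimizations.py`. At minimum, test that:
+
+1. The loop exits immediately when the strategist calls `declare_complete`
+2. When the strategist calls `propose_lead`, the lead enters the graph with the correct round_number
+3. The round snapshot correctly captures before/after status
+4. When the budget is exhausted the loop is forced to stop even if the strategist wants to continue
+5. Disconnect recovery: after a mid-run interruption, a restart begins from the next round
+6. `graph_overview` output includes the `distinct_sources` annotation
+7. `source_coverage` marks untouched items with ✗
+8. `marginal_yield` numbers agree with `confidence_log`
+
+No LLM integration tests are written: strategist behavior is verified through a mock LLM (for an existing example of this pattern see
+`test_forced_record_retry_fires_when_zero_phenomena`).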
+
+---
+
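+## 8. Implementation order
+
+Ordered by dependency (**one independent commit per step**: this is a structural refit, so single-point rollback is critical):
+
+| Step | Content | Depends on | Effort estimate |
+|---|---|---|---|
+| 1 | `Lead` gains 4 fields + `InvestigationRound` dataclass + serialization | - | 60 lines + tests |
+| 2 | `graph_overview` / `source_coverage` / `marginal_yield` / `budget_status` implementation | 1 | 250 lines + tests |
+| 3 | `propose_lead` / `declare_investigation_complete` tools | 1 | 80 lines + tests |
+| 4 | `InvestigationStrategist` agent class | 2, 3 | 120 lines + tests |
+| 5 | Orchestrator Phase 3 rewrite | 4 | 150 lines (replacing ~50 old lines) + tests |
+| 6 | config schema + loading logic | 5 | 30 lines |
+| 7 | Disconnect recovery handling | 5 | 40 lines + tests |
+| 8 | Real-case smoke run (small scale: USB only) | 7 | 0 code |
+| 9 | Docs: rewrite DESIGN.md §4.9 + archive this file | 8 | docs |
+
+Total: ~800 lines of new code + tests + docs.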
+
+---
+
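+## 9. Risks + mitigations
+
+| Risk | Mitigation |
+|---|---|
+| Strategist too conservative (always declare_complete) | Add prompt examples showing what "a situation worth going deeper on" looks like; validate on small samples during testing |
+| Strategist too aggressive (proposes 7+ leads every round) | The `propose_lead` tool schema caps leads at 3-5 per round; the prompt stresses "quality over quantity" |
+| A single worker cannot finish a lead, causing a budget avalanche | The worker call's own max_iter is unchanged; the strategist budget is separate |
+| The LLM does not pick up on hints like `distinct_sources` | Append 1-2 plain-English sentences of interpretation to `graph_overview`: "Hypothesis X has 23 edges but all from one source → cross-source corroboration would strengthen it" |
+| Leads produced by Phase 1 are ignored by the strategist | The strategist prompt states explicitly "handle existing pending leads first, then produce new ones" |
+| Infinite loop (the strategist keeps producing the same lead) | Deduplicate on the `(motivating_hyp, expected_type, source_id)` triple in the Lead table |
+| Maintenance cost of the `EXPECTED_ARTEFACTS` list | Deliberately kept as "soft hints": an incomplete list breaks nothing, it only means some depth depends more on the LLM's own initiative |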
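The triple-based deduplication from the infinite-loop row might be sketched like this. Leads are shown as plain dicts using the field names from §1.1; `lead_key` and `is_duplicate_lead` are hypothetical helpers, and whether completed leads should still block a re-proposal is left open.

```python
def lead_key(lead: dict) -> tuple[str, str, str]:
    """The (motivating hypothesis, expected edge type, source) triple used for dedup."""
    return (
        lead["motivating_hypothesis"],
        lead["expected_evidence_type"],
        lead.get("source_id", ""),
    )


def is_duplicate_lead(candidate: dict, existing: list[dict]) -> bool:
    """True if a lead with the same triple has already been proposed."""
    key = lead_key(candidate)
    return any(lead_key(lead) == key for lead in existing)


existing = [
    {"motivating_hypothesis": "hyp-83db8748", "expected_evidence_type": "supports",
     "source_id": "src-ios-chan", "description": "Check Safari history"},
]
repeat = {"motivating_hypothesis": "hyp-83db8748", "expected_evidence_type": "supports",
          "source_id": "src-ios-chan", "description": "Look at Safari history again"}
fresh = {"motivating_hypothesis": "hyp-83db8748", "expected_evidence_type": "contradicts",
         "source_id": "src-ios-chan", "description": "Look for conflicting device use"}
```

Keying on the triple rather than on the free-text description catches a re-worded repeat while still allowing a lead that seeks the opposite kind of evidence on the same source.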
+
+---
+
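+## 10. Open questions
+
+1. **Should InvestigationRound run the hypothesis agent itself?**
+   Leaning towards having the strategist trigger it explicitly through a lead (better consistency), with no timer-based trigger.
+
+2. **What happens when the budget is exceeded: hard stop or soft warning?**
+   The current design hard-stops; a schema enforcement rule could be added:
+   "when the strategist sees budget < 10% it may only call declare_complete".
+
+3. **Should a cross-source "independence bonus" enter the log-odds?**
+   The previous damping used `1/k` and did not distinguish cross-source from same-source edges. To include it, the formula
+   should become `1/k_within_source × bonus_for_distinct_sources`. This is separate follow-up work.
+
+4. **Should the strategist's `rationale` output go through grounding?**
+   It writes no phenomena, but the `rationale` field may contain concrete values
+   ("based on inv-12345..."). Leaning towards not enforcing it: this is a meta-level judgement, not a grounded fact.
+
+5. **Keep or delete the current Phase 3 `max_investigation_rounds` config?**
+   Suggest keeping it as the fallback knob for `strategist.enabled=false`.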
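
For open question 3, the proposed `1/k_within_source × bonus_for_distinct_sources` shape can be sketched as below. This is only an illustration of the idea: the function name, the `(source_id, weight)` edge format, and in particular the linear bonus of 10% per extra source are assumptions, since the text does not define the bonus.

```python
from collections import defaultdict


def damped_log_odds(edges, bonus_per_extra_source=0.1):
    """Sum edge weights with 1/k damping inside each source.

    edges: iterable of (source_id, weight) pairs in arrival order.
    The bonus form (1 + step * extra sources) is a placeholder assumption.
    """
    seen = defaultdict(int)
    total = 0.0
    for source_id, weight in edges:
        seen[source_id] += 1
        # The k-th edge from the same source only counts weight / k.
        total += weight / seen[source_id]
    bonus = 1.0 + bonus_per_extra_source * max(len(seen) - 1, 0)
    return total * bonus
```

Under these assumptions, three edges split across two sources score higher than three edges from a single source, which is the behaviour the open question is asking about.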
+
+---
+
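+## 11. Relationship to DESIGN.md
+
+Once this document lands, DESIGN.md needs the following corresponding updates:
+
+- **§4.5**: add a paragraph: "Also look at the **structure** of log_odds: the edges_in count / distinct_sources
+  are the key signals the strategist uses to decide whether to dig deeper, not just the confidence value"
+- **§4.9 Phase 3**: change the table entry from "leads dispatched to source-aware agents" to
+  "strategist loop: inspect the graph, propose, execute, review, stop / continue"
+- **§8** (design trade-offs): add item 6: "The trade-off of an LLM-driven scheduling layer: the strategist decides depth,
+  but each round's budget is hard-capped by `budgets.*`; this is the 'LLM proposes, code adjudicates' principle applied at the scheduling layer"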
+
+---
+
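+## 12. Note: problems this design does **not** solve
+
+- The root cause of the 8% hit rate on exam questions is an **incomplete toolset** (no vision, no ZIP brute force, no VeraCrypt
+  mounting, no blockchain explorer), not a scheduling problem. The strategist makes the existing tools work harder,
+  but it cannot conjure tools that do not exist.
+- LLM-fabricated `invocation_id` values (already patched, see the `feedback_grounding_pending` memory) and
+  log-odds inflation (already patched: harmonic damping) are **prerequisites** of this design and outside its scope.
+- Finer per-edge-type Bayesian modelling (such as a cross-source independence bonus) is a separate project.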
--- a/README.md
+++ b/README.md
@@ -2,43 +2,120 @@

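 Multi-Agent System for Digital Forensics: an LLM-based multi-agent digital forensics system.

-The system uses 6 specialized Agents working together to run automated forensic analysis on a disk image and produce a structured forensic report.
+The system uses 7 specialized Agents working together to run automated forensic analysis on a disk image and produce a structured forensic report. Agents do not communicate directly; they collaborate through a shared **EvidenceGraph** (evidence knowledge graph).

 ## Architecture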

 ```
-main.py                          Entry point: config loading, resume detection, run management
+main.py                          Entry point: config loading, image selection, disconnect recovery
  │
-  ├── Orchestrator               Four-phase pipeline scheduling
+  ├── Orchestrator               Five-phase pipeline scheduling
  │     │
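-  │     ├── FileSystemAgent      Disk layout, file systems, deleted files, Prefetch
-  │     ├── RegistryAgent        Registry analysis (system/user/network/software)
-  │     ├── CommunicationAgent   Email, IRC chat logs
+  │     ├── FileSystemAgent      Partitions/file systems, directories, deleted files, Prefetch
+  │     ├── HypothesisAgent      Generates hypotheses, links existing evidence
+  │     ├── RegistryAgent        Registry analysis (SYSTEM/SOFTWARE/SAM/NTUSER.DAT)
+  │     ├── CommunicationAgent   Email, IRC/mIRC chat logs
  │     ├── NetworkAgent         Browser history, PCAP captures
  │     ├── TimelineAgent        Cross-category timeline correlation
  │     └── ReportAgent          Consolidated report generation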
  │
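-  ├── Blackboard                 Shared knowledge base (Evidence + Lead)
-  └── LLMClient                  Claude API calls (ReAct pattern)
+  ├── EvidenceGraph              Evidence knowledge graph with typed edges (auto-persisted)
+  ├── AgentFactory               Role templates + dynamic Agent composition
+  ├── ToolRegistry               Tool catalog + result cache
+  └── LLMClient                  Claude API client (async, tool-use)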
 ```

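-Agents do not communicate directly; they share findings (Evidence) and leads (Lead) through the **Blackboard**.
+## EvidenceGraph: evidence knowledge graph

-## Investigation flow
+Three node types + typed, weighted edges:
+
+| Node | Prefix | Meaning |
+|---|---|---|
+| `Phenomenon` | `ph-*` | An observable forensic artifact (one concrete finding) |
+| `Hypothesis` | `hyp-*` | An explanatory hypothesis (a claim awaiting verification) |
+| `Entity` | `ent-*` | A re-identifiable entity such as a person, program, host, or IP |
+
+The edge types and weights for Phenomenon → Hypothesis are hard-coded in `HYPOTHESIS_EDGE_WEIGHTS`:
+# TODO
+Once the current pipeline runs end to end, look for an adaptive scheme.
+
+| Edge type | Weight | Semantics |
+|---|---:|---|
+| `direct_evidence` | +0.25 | The phenomenon is itself the behaviour the hypothesis describes |
+| `supports` | +0.15 | Consistent with the hypothesis but not decisive |
+| `consequence_observed` | +0.15 | An outcome the hypothesis predicts was observed |
+| `prerequisite_met` | +0.10 | A precondition of the hypothesis is satisfied |
+| `weakens` | −0.10 | Lowers the likelihood of the hypothesis |
+| `contradicts` | −0.20 | Directly refutes the hypothesis |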
+
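+Confidence update rule (the result always stays within [0, 1]):
+
+- Positive edge: `delta = weight * (1 - old_conf)`
+- Negative edge: `delta = weight * old_conf`
+
+State transitions fire automatically when a threshold is crossed: ≥ 0.8 → `supported`, ≤ 0.2 → `refuted`, still active at the end of the run → `inconclusive`. The LLM only picks the edge type (a classification task); the weight table and the state transitions are decided by code, which avoids numeric hallucination.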
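
The update rule and thresholds above can be worked through in a few lines. This is a sketch for illustration, not the project's implementation: the weights are copied from the README table, while `update_confidence` and `status_for` are hypothetical helper names.

```python
# Weights copied from the README table above.
HYPOTHESIS_EDGE_WEIGHTS = {
    "direct_evidence": 0.25,
    "supports": 0.15,
    "consequence_observed": 0.15,
    "prerequisite_met": 0.10,
    "weakens": -0.10,
    "contradicts": -0.20,
}


def update_confidence(old_conf: float, edge_type: str) -> float:
    """Apply one edge; hypothetical helper mirroring the README formula."""
    weight = HYPOTHESIS_EDGE_WEIGHTS[edge_type]
    if weight >= 0:
        delta = weight * (1 - old_conf)  # positive edges close part of the gap to 1
    else:
        delta = weight * old_conf        # negative edges remove a share of what is left
    return old_conf + delta


def status_for(conf: float, run_finished: bool = False) -> str:
    """Map a confidence value to the status names used in the README."""
    if conf >= 0.8:
        return "supported"
    if conf <= 0.2:
        return "refuted"
    return "inconclusive" if run_finished else "active"
```

Starting from 0.50, one `direct_evidence` edge gives 0.50 + 0.25 × 0.50 = 0.625 and one `contradicts` edge gives 0.50 − 0.20 × 0.50 = 0.40. Because each step scales with the remaining distance to the bound, the value cannot leave [0, 1].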
+
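+When a new Phenomenon is added, it is merged by Jaccard similarity (title > 0.6 and description > 0.4 counts as a duplicate; merging raises the confidence and appends to `corroborating_agents`), so the same finding does not enter the graph twice.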
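
A minimal sketch of the duplicate check described above. The two thresholds come from the README; the whitespace tokenization and the function names are assumptions, since the README does not say how titles and descriptions are split into tokens.

```python
def jaccard(a: str, b: str) -> float:
    """Jaccard similarity over lowercase whitespace tokens (tokenizer is an assumption)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    if not union:
        return 0.0  # two empty strings are treated as not similar
    return len(sa & sb) / len(union)


def is_duplicate(new_title: str, new_desc: str, old_title: str, old_desc: str) -> bool:
    """Both thresholds from the README must be exceeded."""
    return jaccard(new_title, old_title) > 0.6 and jaccard(new_desc, old_desc) > 0.4
```

Requiring both conditions keeps two findings with similar titles but different descriptions as separate nodes.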
+
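+## Five-phase pipeline

 | Phase | Description |
 |------|------|
-| **Phase 1** | FileSystemAgent surveys the disk image, identifies partitions, directory structure, and key files, and produces the initial Leads |
-| **Phase 2** | Multi-round lead tracking: Leads are grouped by Agent type and dispatched in parallel, up to 10 iterations |
-| **Phase 2.5** | Coverage gap analysis: checks against the 10 investigation areas in config.yaml and fills gaps automatically |
-| **Phase 3** | TimelineAgent combines all evidence into an event timeline |
-| **Phase 4** | ReportAgent generates the forensic report in Markdown |
+| **Phase 1** | FileSystemAgent does an initial survey of the image, identifies partitions/file systems/key paths, and produces the first batch of Phenomena |
+| **Phase 2** | Hypothesis generation: `config.yaml:hypotheses` is read first; if unset, HypothesisAgent generates 3-7 hypotheses automatically from the Phase 1 phenomena |
+| **Phase 3** | Hypothesis-driven investigation (5 iterations by default). Each round: produce leads for all active hypotheses in one pass → dispatch concurrently by agent type (semaphore = 3) → judge in one pass how the new phenomena relate to each hypothesis. Exits early once all hypotheses have converged. At the end: failed leads are retried once + Gap Analysis |
+| **Phase 4** | TimelineAgent builds a MAC timeline with `build_filesystem_timeline` and correlates it with Phenomenon timestamps |
+| **Phase 5** | ReportAgent combines hypotheses, evidence, and entities into a Markdown report |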

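-## Forensic toolchain
+### Investigation Areas (hypothesis-derived)

-### Sleuth Kit (disk forensics)
+At the end of Phase 2 the orchestrator makes one LLM call to derive 5-12 **InvestigationArea** entries from all active hypotheses (snake_case slug, description, suggested_agent, expected_keywords, expected_tools, priority, motivating_hypothesis_ids). Areas are stored in `graph.investigation_areas` and serialized to `runs/<ts>/investigation_areas.json`. They serve two purposes:

-TSK command-line tools are invoked through async subprocesses:
+1. **Phase 3 main-loop hint**: each hypothesis block carries `Expected areas: a, b, c`; the LLM still chooses leads freely but gets soft guidance
+2. **Gap Analysis at the end of Phase 3**: coverage is judged in two layers:
+   - **Keyword match**: scan Phenomenon titles/descriptions against area.expected_keywords
+   - **Tool hit**: check whether area.expected_tools were actually invoked
+
+Uncovered areas automatically get a lead (`suggested_agent` + `priority` + `motivating_hypothesis_ids[0]` are passed through to `Lead.hypothesis_id` to preserve provenance), with at most 3 gap-filling rounds.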
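
The two-layer coverage check can be sketched as follows. This is an illustration under stated assumptions: the README does not say how the two layers combine, so the sketch treats an area as covered when either layer hits; `Area` and `uncovered_areas` are hypothetical names, and only `expected_keywords` / `expected_tools` come from the text.

```python
from dataclasses import dataclass, field


@dataclass
class Area:
    """Hypothetical stand-in for an InvestigationArea (only the fields used here)."""
    slug: str
    expected_keywords: list = field(default_factory=list)
    expected_tools: list = field(default_factory=list)


def uncovered_areas(areas, phenomena_texts, tools_called):
    """Return slugs of areas with neither a keyword match nor a tool hit."""
    haystack = " ".join(phenomena_texts).lower()
    missing = []
    for area in areas:
        keyword_hit = any(k.lower() in haystack for k in area.expected_keywords)
        tool_hit = any(t in tools_called for t in area.expected_tools)
        # Assumption: either layer is enough to count the area as covered.
        if not (keyword_hit or tool_hit):
            missing.append(area.slug)
    return missing
```

Each slug returned here would then be turned into a gap-filling lead for that area's `suggested_agent`.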
+
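+**Manual override**: `config.yaml:investigation_areas` is commented out by default, so areas are purely LLM-derived. Uncommenting it adds mandatory areas; these are written before the LLM's and are protected from being overwritten by slug-based dedupe (the LLM only augments the keyword/tool lists). This is the key to adapting across cases and platforms: Windows-specific areas are no longer hardcoded.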
+
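+## Agent system
+
+`AgentFactory` maintains 7 role templates (`ROLE_TEMPLATES`), each specifying a default toolset. `HypothesisAgent` and `ReportAgent` are subclasses of `BaseAgent` (registering extra dedicated tools); the other 5 Agents are generated directly from `BaseAgent` + a tool list.
+
+### Agent workflow
+
+`BaseAgent.run` enforces four stages in the system prompt:
+
+```
+A. INVESTIGATE   Check graph state / Asset Library first, then call forensic tools
+B. RECORD        Write every finding with add_phenomenon
+C. LINK          Call link_to_entity as needed; never cite a ph-id from memory, always list_phenomena first
+D. ANSWER        Give the final answer only after the steps above are done
+```
+
+The prompt has built-in **anti-hallucination rules**: only content that appears verbatim in tool output may be recorded; timestamps/paths/inodes must come from tool returns; truncated output must be marked `[truncated]`.
+
+### Dynamic Agent composition
+
+`AgentFactory.create_specialized_agent()` handles capability gaps: it feeds the tool catalog and the hypothesis description to the LLM, which picks 3-8 tools and writes a role description; the factory then instantiates the new Agent and caches it.
+
+## Tool system
+
+At startup, `tool_registry.py` calls `register_all_tools(image_path, partition_offset, graph)` to register every tool into the global `TOOL_CATALOG` in one go.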
+
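+### Tool result cache
+
+The `CACHEABLE_TOOLS` set marks pure-read/deterministic tools (partition_info, list_directory, parse_registry_key …). The image is read-only, so calls with the same args yield the same output; a cache hit is reused directly, and error results are never cached.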
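
The caching behaviour described above can be sketched like this. It is a minimal illustration, not the code in `tool_registry.py`: `call_tool` and `_cache` are hypothetical, the cache key format is an assumption, and treating any result that starts with `"Error"` as a failure is also an assumption.

```python
import asyncio
import json

# Subset of tool names taken from the README text; the real set is larger.
CACHEABLE_TOOLS = {"partition_info", "list_directory", "parse_registry_key"}
_cache: dict = {}


async def call_tool(name, executor, **kwargs):
    """Run a tool, reusing the stored result for cacheable tools with identical args."""
    key = (name, json.dumps(kwargs, sort_keys=True))
    if name in CACHEABLE_TOOLS and key in _cache:
        return _cache[key]
    result = await executor(**kwargs)
    # Assumption: error results are strings starting with "Error"; never cache them.
    if name in CACHEABLE_TOOLS and not str(result).startswith("Error"):
        _cache[key] = result
    return result
```

Serializing the arguments with `sort_keys=True` makes the key independent of keyword order, so `path="/", depth=1` and `depth=1, path="/"` share one cache entry.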
+
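+### Asset Library
+
+`EvidenceGraph.asset_library` indexes every extracted file by inode to avoid repeated extraction. Agents query it through the `list_assets` / `find_extracted_file` tools. New files are classified automatically by file name into one of ten categories such as `registry_hive` / `chat_log` / `prefetch` / `network_capture` / `recycle_bin`.
+
+### Forensic toolchain
+
+**Sleuth Kit (disk forensics)**: TSK is invoked through async subprocesses:

 | Tool | Purpose |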
 |------|------|
@@ -49,47 +126,43 @@ Agent 之间不直接通信，通过 **Blackboard（黑板）** 共享发现（E
 | `srch_strings` | Disk string search |
 | `fls -m` | MAC timeline generation |

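-### regipy (registry parsing)
+**regipy (registry parsing)**: reads the SYSTEM / SOFTWARE / SAM / NTUSER.DAT binaries directly and extracts system information, user accounts, network configuration, installed software, email accounts, shutdown time, and more.

-Parses Windows registry hive binaries (SYSTEM, SOFTWARE, SAM, NTUSER.DAT) directly and extracts system information, user accounts, network configuration, installed software, email accounts, shutdown time, and more.
+**File parsers**: Prefetch binaries (`.pf`), PCAP string extraction (HTTP requests / Host / Cookie / UA), generic text and binary reads, regex search, hex dump.

-### File parsers
+## Disconnect recovery and run archive

- **Prefetch**: binary parsing of Windows XP .pf files (run count, last execution time)
- **PCAP**: extracts HTTP requests, Host, Cookie, and User-Agent from capture files
- **Generic text/binary**: read by offset, regex search, hex dump
+Three layers of protection:

-## Disconnect recovery and data archive
+1. **EvidenceGraph auto-persistence**: every write operation such as `add_phenomenon` / `add_hypothesis` / `add_edge` / `add_lead` is flushed to disk automatically (atomic write to `.tmp`, then rename)
+2. **Agent-level fault tolerance**: a single Agent failure marks that lead `failed`; 3 consecutive failures raise `AnalysisAborted` for a graceful exit; at the end of Phase 3 failed leads are retried once (`retry=True` prevents an infinite loop)
+3. **Resume**: on startup `main.py` scans `runs/*/graph_state.json`; any directory that has this file but lacks `run_metadata.json` triggers a resume prompt, and the current graph state decides which phase to resume from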
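
Layers 1 and 3 above can be sketched together. This is an illustration under stated assumptions rather than the project's code: `atomic_write_json` and `find_resumable_runs` are hypothetical helpers, and only the file names `graph_state.json` / `run_metadata.json` and the `.tmp`-then-rename pattern come from the README.

```python
import json
import os
from pathlib import Path


def atomic_write_json(path: Path, data) -> None:
    """Write to a sibling .tmp file, then rename it over the target."""
    tmp = path.with_suffix(path.suffix + ".tmp")
    with open(tmp, "w", encoding="utf-8") as fh:
        json.dump(data, fh, ensure_ascii=False)
        fh.flush()
        os.fsync(fh.fileno())  # make sure the bytes are on disk before the rename
    os.replace(tmp, path)      # readers never observe a half-written file


def find_resumable_runs(runs_dir: Path) -> list:
    """Run dirs that have graph_state.json but no run_metadata.json."""
    return sorted(
        state.parent
        for state in runs_dir.glob("*/graph_state.json")
        if not (state.parent / "run_metadata.json").exists()
    )
```

The absence of `run_metadata.json` works as the "unfinished" marker because that file is only written when a run completes, so a crash at any earlier point leaves a directory this scan will find.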

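-The system has three layers of protection against network interruptions during long runs:
-
-1. **Blackboard auto-persistence**: every add_evidence / add_lead is written to disk automatically (atomic write)
-2. **Agent-level fault tolerance**: a single Agent failure marks the Lead as failed without affecting other Agents, and it is retried once automatically
-3. **Graceful exit**: after 3 consecutive Agent failures, existing results are saved and the system exits cleanly
-
-Each run automatically creates a timestamped archive directory:
+### Run archive directory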

 ```
 runs/
  2026-04-02T14-30-00/
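-    config.yaml              Config snapshot
-    blackboard_state.json    Live state (used for recovery)
-    evidence.json            Structured evidence export
-    leads.json               Leads and their final status
-    report.md                Forensic report
-    run_metadata.json        Run metadata (duration, stats, errors)
-    masforensics.log         Run log
+    config.yaml                    Config snapshot
+    graph_state.json               Live graph state (used for resume)
+    phenomena.json                 Phenomena export
+    hypotheses.json                Hypotheses + confidence log
+    entities.json                  Entities
+    edges.json                     Edges
+    leads.json                     Leads and their final status
+    extracted/                     Files extracted from the image
+    <image>_forensic_report.md     Forensic report
+    run_metadata.json              Run metadata (duration, stats, errors)
+    masforensics.log               Run log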
 ```

-After an interruption, running `python main.py` again makes the system detect the unfinished run automatically and offer to resume.
-
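 ## Quick start

 ### Requirements

 - Python >= 3.14
 - The Sleuth Kit (installed system-wide, provides commands such as `mmls`, `fls`, `icat`)
- Disk image files placed in the `image/` directory
+- Disk image files

 ### Installation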

@@ -99,50 +172,77 @@ uv sync

 ### Configuration

-Edit `config.yaml` and fill in the LLM API address and key:
+Edit `config.yaml`:

 ```yaml
 agent:
  base_url: "https://your-api-proxy.com"
  api_key: "sk-your-key"
  model: "claude-sonnet-4-6"
-  max_tokens: 4096
+  max_tokens: 16384
+
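+max_investigation_rounds: 5          # Maximum number of Phase 3 iterations
+
+# hypotheses:                        # Optional: specify initial hypotheses manually
+#   - title: "The suspect actively carried out network sniffing"
+#     description: "..."
+
+# investigation_areas:                 # Optional: manual override (fully LLM-derived by default)
+#   - area: shutdown_time              #         via slug dedupe the LLM only augments the
+#     agent: registry                  #         keyword/tool lists and never overwrites manual entries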
+#     priority: 3
+#     keywords: [shutdown]
+#     tools: [get_shutdown_time]
 ```

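-The `investigation_areas` section defines the investigation areas that must be covered; add or remove entries as needed.
+When `hypotheses` is not configured, HypothesisAgent generates them automatically.

 ### Running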

 ```bash
-python main.py
+python main.py                       # Pick the image and partition interactively
+python main.py /path/to/image/dir    # Specify the image directory
 ```

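-Reports and all structured data are saved under the `runs/<timestamp>/` directory.
+Running again after an interruption automatically detects the unfinished run and asks whether to resume.
+
+### Regenerate the report only
+
+After a completed run, if you only want to change the prompt or fix the report: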
+
+```bash
+python regenerate_report.py runs/<timestamp>
+```
+
+Skips Phases 1-4 and reruns ReportAgent directly from the existing `graph_state.json`.

 ## Project structure

 ```
 MASForensics/
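-├── main.py              Entry point
-├── orchestrator.py      Pipeline scheduling
-├── blackboard.py        Shared knowledge base
-├── llm_client.py        LLM API client
-├── base_agent.py        Agent base class
-├── config.yaml          Config file
+├── main.py                  Entry point, image selection, disconnect recovery
+├── orchestrator.py          Five-phase pipeline scheduling
+├── evidence_graph.py        Evidence knowledge graph + edge weight table + persistence
+├── base_agent.py            Agent base class + built-in graph tools
+├── agent_factory.py         Role templates + dynamic Agent composition
+├── tool_registry.py         Tool catalog + result cache + auto-classification
+├── llm_client.py            LLM API client
+├── log_config.py            Colored terminal logging + file logging
+├── regenerate_report.py     Regenerate the report from an existing graph_state
+├── config.yaml              Config + investigation areas + optional hypotheses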
 ├── agents/
-│   ├── filesystem.py    File system Agent
-│   ├── registry.py      Registry Agent
-│   ├── communication.py Communication Agent
-│   ├── network.py       Network Agent
-│   ├── timeline.py      Timeline Agent
-│   └── report.py        Report Agent
+│   ├── hypothesis.py        HypothesisAgent (add_hypothesis, link)
+│   ├── report.py            ReportAgent (consolidated report, has its own read tools)
+│   ├── timeline.py          TimelineAgent (kept for future extension)
+│   └── ...                  filesystem/registry/communication/network (same as above)
 ├── tools/
-│   ├── sleuthkit.py     Sleuth Kit wrapper
-│   ├── registry.py      Registry parsing (regipy)
-│   └── parsers.py       File format parsers
-├── image/               Disk images
-├── extracted/           Extracted files (generated at run time)
-└── runs/                Run archive
+│   ├── sleuthkit.py         Async TSK wrapper
+│   ├── registry.py          regipy parsing
+│   └── parsers.py           Prefetch / PCAP / generic file parsing
+├── image/                   Disk images (supplied by the user)
+├── runs/                    Run archive
+└── tests/
+    └── test_optimizations.py
 ```

 ## Dependencies
@@ -152,14 +252,16 @@ MASForensics/
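 | `httpx[socks]` | Async HTTP client (with SOCKS proxy support) |
 | `pyyaml` | Config file parsing |
 | `regipy` | Windows registry hive parsing |
+| `pytest` / `pytest-asyncio` | Testing |

-## Current case
+## Default case

-The default configuration analyzes the **CFReDS Hacking Case** (NIST standard forensic teaching image):
+**CFReDS Hacking Case** (NIST standard forensic teaching image):

- Image: SCHARDT.001 (~4.6GB, IBM hard disk, 8 segments)
+- Image: SCHARDT.001 (~4.6 GB, IBM hard disk, 8 segments)
 - System: Windows XP
 - Scenario: computer forensic analysis of a suspected hacking intrusion
+- Full image MD5: `AEE4FCD9301C03B3B054623CA261959A` (`config.yaml` holds per-segment MD5s for verification)

 ## Tests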

--- a/agent_factory.py
+++ b/agent_factory.py
@@ -1,150 +1,99 @@
-"""Agent Factory — composes agents from tool registry and role templates.
+"""Agent Factory — instantiates agents from registered classes.

-Provides both pre-defined agent templates (filesystem, registry, etc.)
-and LLM-driven dynamic agent composition for capability gaps.
+Each agent type has a dedicated subclass under agents/ that owns its name,
+role description, and tool list (single source of truth). The factory just
+maps agent_type → class. Also supports LLM-driven dynamic composition for
+capability gaps via create_specialized_agent().
 """

 from __future__ import annotations

 import json
 import logging
-from dataclasses import dataclass, field

 from base_agent import BaseAgent
 from evidence_graph import EvidenceGraph
 from llm_client import LLMClient
-from tool_registry import TOOL_CATALOG, ToolDefinition
+from tool_registry import TOOL_CATALOG

-# Agent classes with custom tools — keyed by template name
-_AGENT_CLASSES: dict[str, type] = {}
+# Agent classes keyed by name. Populated lazily to avoid circular imports.
+_AGENT_CLASSES: dict[str, type[BaseAgent]] = {}


 def _load_agent_classes() -> None:
-    """Lazy-import custom agent classes to avoid circular imports."""
+    """Lazy-import agent classes to avoid circular imports."""
    if _AGENT_CLASSES:
        return
+    from agents.android_artifact import AndroidArtifactAgent
+    from agents.communication import CommunicationAgent
+    from agents.filesystem import FileSystemAgent
    from agents.hypothesis import HypothesisAgent
+    from agents.ios_artifact import IOSArtifactAgent
+    from agents.media import MediaAgent
+    from agents.network import NetworkAgent
+    from agents.registry import RegistryAgent
    from agents.report import ReportAgent
+    from agents.strategist import InvestigationStrategist
+    from agents.timeline import TimelineAgent
+    _AGENT_CLASSES["filesystem"] = FileSystemAgent
+    _AGENT_CLASSES["registry"] = RegistryAgent
+    _AGENT_CLASSES["communication"] = CommunicationAgent
+    _AGENT_CLASSES["network"] = NetworkAgent
+    _AGENT_CLASSES["timeline"] = TimelineAgent
    _AGENT_CLASSES["hypothesis"] = HypothesisAgent
    _AGENT_CLASSES["report"] = ReportAgent
+    _AGENT_CLASSES["ios_artifact"] = IOSArtifactAgent
+    _AGENT_CLASSES["android_artifact"] = AndroidArtifactAgent
+    _AGENT_CLASSES["media"] = MediaAgent
+    _AGENT_CLASSES["strategist"] = InvestigationStrategist
+
+
+# Triage agent per (source.type, platform). disk_image is ambiguous on its
+# own — both a Windows USB image and an Android raw dump are disk_image —
+# so the routing helper also looks at source.meta.platform when present.
+SOURCE_TYPE_AGENTS: dict[str, str] = {
+    "disk_image":        "filesystem",       # default for unknown platform
+    "mobile_extraction": "ios_artifact",
+    "archive":           "filesystem",
+    "media_collection":  "media",
+}
+
+# Per-platform overrides for disk_image sources. Keys come from
+# source.meta.platform in case.yaml (lowercased).
+_DISK_IMAGE_PLATFORM_AGENTS: dict[str, str] = {
+    "windows": "filesystem",
+    "linux":   "filesystem",
+    "android": "android_artifact",
+    "ios":     "ios_artifact",
+}
+
+
+def get_triage_agent_type(source) -> str:
+    """Pick the right Phase-1 agent for *source*.
+
+    Accepts either an :class:`EvidenceSource` or a raw source.type string
+    (for back-compat with the S5 signature). Disk-image sources additionally
+    consult ``source.meta.platform`` so Windows USBs and Android raw dumps —
+    both type=disk_image — get different agents.
+    """
+    # Back-compat: accept a plain type string.
+    if isinstance(source, str):
+        return SOURCE_TYPE_AGENTS.get(source, "filesystem")
+
+    src_type = getattr(source, "type", "disk_image")
+    if src_type == "disk_image":
+        meta = getattr(source, "meta", {}) or {}
+        platform = str(meta.get("platform", "")).lower()
+        if platform in _DISK_IMAGE_PLATFORM_AGENTS:
+            return _DISK_IMAGE_PLATFORM_AGENTS[platform]
+    return SOURCE_TYPE_AGENTS.get(src_type, "filesystem")
+

 logger = logging.getLogger(__name__)


-@dataclass
-class RoleTemplate:
-    """Pre-defined agent archetype."""
-
-    name: str
-    role: str
-    default_tools: list[str]    # tool names from TOOL_CATALOG
-    tags: list[str] = field(default_factory=list)
-
-
-# Pre-defined templates matching the original 6 agents + hypothesis agent.
-ROLE_TEMPLATES: dict[str, RoleTemplate] = {
-    "filesystem": RoleTemplate(
-        name="filesystem",
-        role=(
-            "File system forensic analyst. You examine disk image partition layouts, "
-            "directory structures, file metadata, and recover deleted files. "
-            "You identify suspicious files, installed programs, and user data locations. "
-            "You also handle Recycle Bin forensics and Prefetch execution evidence."
-        ),
-        default_tools=[
-            "partition_info", "filesystem_info", "list_directory",
-            "extract_file", "find_file", "search_strings",
-            "parse_prefetch", "count_deleted_files",
-            "read_text_file", "search_text_file", "read_binary_preview",
-        ],
-        tags=["filesystem", "disk", "files", "deleted", "prefetch"],
-    ),
-    "registry": RoleTemplate(
-        name="registry",
-        role=(
-            "Windows registry forensic analyst. You parse registry hive files "
-            "(SYSTEM, SOFTWARE, SAM, NTUSER.DAT) to extract system configuration, "
-            "user accounts, installed software, network settings, email accounts, "
-            "and other Windows artifacts."
-        ),
-        default_tools=[
-            "extract_file", "list_directory",
-            "parse_registry_key", "list_installed_software",
-            "get_user_activity", "search_registry",
-            "get_system_info", "get_timezone_info", "get_computer_name",
-            "get_shutdown_time", "enumerate_users",
-            "get_network_interfaces", "get_email_config",
-        ],
-        tags=["registry", "windows", "system", "user", "software"],
-    ),
-    "communication": RoleTemplate(
-        name="communication",
-        role=(
-            "Communication forensic analyst. You analyze email files (.dbx, .pst), "
-            "IRC/mIRC chat logs, newsgroup data, and other messaging artifacts "
-            "to identify communication patterns and contacts."
-        ),
-        default_tools=[
-            "list_directory", "extract_file",
-            "read_text_file", "read_binary_preview",
-            "list_extracted_dir", "search_strings",
-            "search_text_file", "read_text_file_section",
-        ],
-        tags=["email", "chat", "irc", "messaging", "communication"],
-    ),
-    "network": RoleTemplate(
-        name="network",
-        role=(
-            "Network forensic analyst. You analyze browser history, cookies, "
-            "network captures (PCAP), wireless artifacts, and other network-related "
-            "evidence to reconstruct online activities."
-        ),
-        default_tools=[
-            "list_directory", "extract_file",
-            "read_text_file", "read_binary_preview",
-            "list_extracted_dir", "search_strings",
-            "search_text_file", "read_text_file_section",
-            "parse_pcap_strings",
-        ],
-        tags=["network", "browser", "pcap", "http", "internet"],
-    ),
-    "timeline": RoleTemplate(
-        name="timeline",
-        role=(
-            "Timeline correlation analyst. You build chronological timelines "
-            "by combining filesystem MAC times with evidence from other agents. "
-            "You identify temporal patterns and correlate events across categories."
-        ),
-        default_tools=[
-            "build_filesystem_timeline",
-        ],
-        tags=["timeline", "correlation", "temporal"],
-    ),
-    "report": RoleTemplate(
-        name="report",
-        role=(
-            "Forensic report writer. You synthesize all evidence and hypotheses "
-            "into a comprehensive forensic analysis report with executive summary, "
-            "detailed findings organized by hypothesis, timeline of events, and conclusions."
-        ),
-        default_tools=[],  # Report agent uses only graph query tools
-        tags=["report", "summary", "writing"],
-    ),
-    "hypothesis": RoleTemplate(
-        name="hypothesis",
-        role=(
-            "Hypothesis analyst. You review all phenomena discovered so far "
-            "and formulate investigative hypotheses about what happened on the system. "
-            "For each hypothesis, identify which existing phenomena support or contradict it."
-        ),
-        default_tools=[],  # Uses only graph query + hypothesis tools
-        tags=["hypothesis", "analysis", "reasoning"],
-    ),
-}
-
-
 class AgentFactory:
-    """Creates agents from templates or dynamically via LLM composition."""
+    """Creates agents from registered classes or dynamically via LLM composition."""

    def __init__(self, llm: LLMClient, graph: EvidenceGraph) -> None:
        self.llm = llm
@@ -152,40 +101,20 @@ class AgentFactory:
        self._cache: dict[str, BaseAgent] = {}

    def get_or_create_agent(self, agent_type: str) -> BaseAgent | None:
-        """Get a cached agent or create one from a template."""
+        """Get a cached agent or instantiate one from its registered class."""
        if agent_type in self._cache:
            return self._cache[agent_type]

-        template = ROLE_TEMPLATES.get(agent_type)
-        if template is None:
-            logger.warning("No template for agent type: %s", agent_type)
-            return None
-
-        # Use custom agent class if one exists, otherwise BaseAgent
        _load_agent_classes()
        agent_cls = _AGENT_CLASSES.get(agent_type)
-        if agent_cls is not None:
-            agent = agent_cls(self.llm, self.graph)
-        else:
-            agent = self._instantiate_from_template(template)
+        if agent_cls is None:
+            logger.warning("No agent class for type: %s", agent_type)
+            return None
+
+        agent = agent_cls(self.llm, self.graph)
        self._cache[agent_type] = agent
        return agent

-    def _instantiate_from_template(self, template: RoleTemplate) -> BaseAgent:
-        """Create a BaseAgent from a role template, registering tools from the catalog."""
-        agent = BaseAgent(self.llm, self.graph)
-        agent.name = template.name
-        agent.role = template.role
-
-        for tool_name in template.default_tools:
-            td = TOOL_CATALOG.get(tool_name)
-            if td is None:
-                logger.warning("Tool '%s' not in catalog (template: %s)", tool_name, template.name)
-                continue
-            agent.register_tool(td.name, td.description, td.input_schema, td.executor)
-
-        return agent
-
    async def create_specialized_agent(
        self,
        hypothesis_title: str,
@@ -220,18 +149,15 @@ class AgentFactory:
            messages=[{"role": "user", "content": prompt}],
        )

-        # Parse response — try to extract JSON
        try:
            config = json.loads(response)
        except json.JSONDecodeError:
-            # Try to find JSON in the response
            import re
            match = re.search(r'\{.*\}', response, re.DOTALL)
            if match:
                config = json.loads(match.group())
            else:
                logger.error("Failed to parse agent composition response: %s", response[:300])
-                # Fallback: create a generic agent with all tools
                return self._create_fallback_agent(capability_gap)

        agent_name = config.get("agent_name", "specialized")
@@ -239,13 +165,11 @@ class AgentFactory:
        strategy = config.get("strategy", "")
        tool_names = config.get("tools", [])

-        # Validate tool names against catalog
        valid_tools = [t for t in tool_names if t in TOOL_CATALOG]
        if not valid_tools:
            logger.warning("No valid tools selected by LLM, using fallback")
            return self._create_fallback_agent(capability_gap)

-        # Build agent
        agent = BaseAgent(self.llm, self.graph)
        agent.name = agent_name
        agent.role = f"{role_text}\n\nInvestigation Strategy:\n{strategy}"
--- a/agents/android_artifact.py
+++ b/agents/android_artifact.py
@@ -0,0 +1,58 @@
+"""Android Artifact Agent — multi-partition analysis of raw Android dumps.
+
+DESIGN.md §4.7 (Android): ``mmls`` slices the dump into partitions; each one is
+its own analysable surface. Ext4-backed partitions (typically SYSTEM,
+USERDATA when not FBE-encrypted, EFS in some variants) yield to TSK; raw
+partitions (BOOT, RECOVERY, RADIO, MODEM blobs) are best mined with
+``search_strings``. Userdata is the prize and is often FBE-encrypted on
+modern devices — the agent must check fsstat before assuming readability
+(see ``probe_android_partitions`` for the survey).
+"""
+
+from __future__ import annotations
+
+from base_agent import BaseAgent
+from evidence_graph import EvidenceGraph
+from llm_client import LLMClient
+from tool_registry import TOOL_CATALOG
+
+
+class AndroidArtifactAgent(BaseAgent):
+    name = "android_artifact"
+    role = (
+        "Android forensic analyst. You navigate raw Android disk dumps "
+        "(blk0_sda-style images) partition by partition. Workflow: call "
+        "probe_android_partitions ONCE to map the disk; pick the partitions "
+        "with fs_type=Ext4 or fs_type=F2FS (SYSTEM, USERDATA if readable, "
+        "EFS); for each, call set_active_partition(offset_from_512_sector_column) "
+        "and then list_directory / extract_file / search_strings as usual. "
+        "For raw partitions (BOOT, RECOVERY, RADIO, TOMBSTONES) skip directly "
+        "to search_strings — they have no filesystem. If USERDATA shows "
+        "fs_type=unknown it is almost certainly FBE-encrypted: record that "
+        "as a negative finding (the absence IS evidence) and move on to "
+        "what's reachable."
+    )
+
+    def __init__(self, llm: LLMClient, graph: EvidenceGraph) -> None:
+        super().__init__(llm, graph)
+        self._register_tools()
+
+    def _register_tools(self) -> None:
+        tool_names = [
+            # Android-specific
+            "probe_android_partitions",
+            "set_active_partition",
+            # Reused TSK toolset — partition_offset comes from active_source
+            "partition_info", "filesystem_info", "list_directory",
+            "extract_file", "find_file", "search_strings",
+            "count_deleted_files", "build_filesystem_timeline",
+            # Generic parsers
+            "read_text_file", "read_binary_preview", "search_text_file",
+            "read_text_file_section", "list_extracted_dir", "find_files",
+            # SQLite — Android apps store data in sqlite too (WhatsApp, etc.)
+            "sqlite_tables", "sqlite_query",
+        ]
+        for name in tool_names:
+            td = TOOL_CATALOG.get(name)
+            if td:
+                self.register_tool(td.name, td.description, td.input_schema, td.executor)
--- a/agents/hypothesis.py
+++ b/agents/hypothesis.py
@@ -1,12 +1,17 @@
-"""Hypothesis Agent — analyzes phenomena and generates investigative hypotheses."""
+"""Hypothesis Agent — generates investigative hypotheses from phenomena.
+
+Generates hypotheses only. Phenomenon→Hypothesis linking is handled centrally
+by Orchestrator._judge_new_phenomena. Tool set is restricted to read-only
+graph queries + add_hypothesis to prevent the agent from creating phenomena,
+leads, or entity links.
+"""

 from __future__ import annotations

-import json
 import logging

 from base_agent import BaseAgent
-from evidence_graph import EvidenceGraph, HYPOTHESIS_EDGE_WEIGHTS
+from evidence_graph import EvidenceGraph
 from llm_client import LLMClient

 logger = logging.getLogger(__name__)
@@ -17,19 +22,19 @@ class HypothesisAgent(BaseAgent):
    role = (
        "Hypothesis analyst. You review all phenomena discovered so far "
        "and formulate investigative hypotheses about what happened on this system. "
-        "Your ultimate goal: build the most complete picture of events that occurred. "
-        "For each hypothesis, identify which existing phenomena support or contradict it."
+        "Your ultimate goal: build the most complete picture of events that occurred."
    )
+    mandatory_record_tools = ("add_hypothesis",)

    def __init__(self, llm: LLMClient, graph: EvidenceGraph) -> None:
        super().__init__(llm, graph)
        self._register_hypothesis_tools()

+    def _register_graph_tools(self) -> None:
+        """Restrict to read-only graph tools. add_hypothesis is registered separately."""
+        self._register_graph_read_tools()
+
    def _register_hypothesis_tools(self) -> None:
-        """Register hypothesis-specific tools."""
-
-        valid_edge_types = list(HYPOTHESIS_EDGE_WEIGHTS.keys())
-
        self.register_tool(
            name="add_hypothesis",
            description=(
@@ -53,42 +58,30 @@ class HypothesisAgent(BaseAgent):
            executor=self._add_hypothesis,
        )

-        self.register_tool(
-            name="link_phenomenon_to_hypothesis",
-            description=(
-                "Link an existing phenomenon to a hypothesis with a relationship type. "
-                f"Valid relationship types: {', '.join(valid_edge_types)}. "
-                "direct_evidence = the phenomenon IS the hypothesis. "
-                "supports = consistent with the hypothesis. "
-                "prerequisite_met = a necessary condition is satisfied. "
-                "consequence_observed = an expected result of the hypothesis is found. "
-                "contradicts = directly contradicts the hypothesis. "
-                "weakens = makes the hypothesis less likely."
-            ),
-            input_schema={
-                "type": "object",
-                "properties": {
-                    "phenomenon_id": {
-                        "type": "string",
-                        "description": "ID of the phenomenon (e.g. 'ph-a1b2c3d4').",
-                    },
-                    "hypothesis_id": {
-                        "type": "string",
-                        "description": "ID of the hypothesis (e.g. 'hyp-e5f6g7h8').",
-                    },
-                    "edge_type": {
-                        "type": "string",
-                        "enum": valid_edge_types,
-                        "description": "The edge_type of the relationship.",
-                    },
-                    "reason": {
-                        "type": "string",
-                        "description": "The reason this relationship holds (1-2 sentences).",
-                    },
-                },
-                "required": ["phenomenon_id", "hypothesis_id", "edge_type", "reason"],
-            },
-            executor=self._link_phenomenon_to_hypothesis,
+    def _build_system_prompt(self, task: str) -> str:
+        """Focused prompt — no INVESTIGATE/RECORD/LINK workflow."""
+        return (
+            f"You are {self.name}, a forensic hypothesis analyst.\n"
+            f"Role: {self.role}\n\n"
+            f"Image: {self.graph.image_path}\n"
+            f"Current investigation state: {self.graph.stats_summary()}\n\n"
+            f"Your task: {task}\n\n"
+            f"WORKFLOW:\n"
+            f"1. Call list_phenomena and search_graph to review existing findings.\n"
+            f"2. For each hypothesis you want to record, call add_hypothesis (title + description).\n"
+            f"3. STOP after you have generated 3-7 hypotheses. Do not call any more tools.\n\n"
+            f"STRICT BOUNDARIES:\n"
+            f"- Your only mutation tool is add_hypothesis. Do NOT attempt list_directory, "
+            f"parse_registry_key, extract_file, or any disk-image investigation tools — "
+            f"they are not yours and you will get 'unknown tool' errors.\n"
+            f"- You CANNOT create phenomena, leads, or entity links. The orchestrator handles "
+            f"all phenomenon↔hypothesis linking after you finish.\n"
+            f"- Each hypothesis must be specific and testable. Avoid generic templates like "
+            f"'Unauthorized Remote Access' or 'Malware Deployment' unless concrete phenomena "
+            f"in the graph already point to them.\n"
+            f"- If the graph is empty, generate broad starting hypotheses and mark them "
+            f"clearly as exploratory in their description so downstream agents know they "
+            f"still need evidence."
        )

    async def _add_hypothesis(self, title: str, description: str) -> str:
@@ -98,33 +91,3 @@ class HypothesisAgent(BaseAgent):
            created_by=self.name,
        )
        return f"Hypothesis created: {hid} — {title} (confidence: 0.50)"
-
-    async def _link_phenomenon_to_hypothesis(
-        self,
-        phenomenon_id: str,
-        hypothesis_id: str,
-        edge_type: str = "",
-        reason: str = "",
-        # Common LLM misnaming — accept as fallbacks
-        relationship: str = "",
-        note: str = "",
-    ) -> str:
-        edge_type = edge_type or relationship
-        reason = reason or note
-        if not edge_type:
-            return "Error: edge_type is required."
-        try:
-            new_conf = await self.graph.update_hypothesis_confidence(
-                hyp_id=hypothesis_id,
-                phenomenon_id=phenomenon_id,
-                edge_type=edge_type,
-                reason=reason,
-            )
-            weight = HYPOTHESIS_EDGE_WEIGHTS[edge_type]
-            direction = "+" if weight > 0 else ""
-            return (
-                f"Linked: {phenomenon_id} —[{edge_type}]→ {hypothesis_id} "
-                f"(weight: {direction}{weight}, new confidence: {new_conf:.3f})"
-            )
-        except ValueError as e:
-            return f"Error linking: {e}"
--- a/agents/ios_artifact.py
+++ b/agents/ios_artifact.py
@@ -0,0 +1,49 @@
+"""iOS Artifact Agent — analyses unpacked iOS extractions.
+
+DESIGN.md §4.7/§4.8: tree-mode iOS sources are the third evidence family
+the system handles (alongside disk images and pcaps). This agent owns the
+iOS-specific toolset; the grounded ``add_phenomenon`` contract from
+BaseAgent applies unchanged — every fact must cite a tool invocation.
+"""
+
+from __future__ import annotations
+
+from base_agent import BaseAgent
+from evidence_graph import EvidenceGraph
+from llm_client import LLMClient
+from tool_registry import TOOL_CATALOG
+
+
+class IOSArtifactAgent(BaseAgent):
+    name = "ios_artifact"
+    role = (
+        "iOS forensic analyst. You analyse unpacked iOS extractions — "
+        "binary/XML plists, SQLite databases (sms.db, ChatStorage.sqlite, "
+        "AddressBook.sqlitedb), the keychain (keychain-2.db), and the "
+        "iDevice_info.txt summary — to extract device identity, accounts, "
+        "messaging, contacts, and credential metadata. Domain-rooted iOS "
+        "trees (HomeDomain, AppDomain*, ProtectedDomain, NetworkDomain) "
+        "are your map; navigate by path, not by inode."
+    )
+
+    def __init__(self, llm: LLMClient, graph: EvidenceGraph) -> None:
+        super().__init__(llm, graph)
+        self._register_tools()
+
+    def _register_tools(self) -> None:
+        tool_names = [
+            # navigation — find_files is the workhorse on 10k+-file iOS trees;
+            # list_extracted_dir is for initial layout summary only.
+            "list_extracted_dir", "find_files",
+            "read_text_file", "read_text_file_section", "read_binary_preview",
+            "search_text_file",
+            # iOS-specific parsers
+            "parse_plist",
+            "sqlite_tables", "sqlite_query",
+            "parse_ios_keychain",
+            "read_idevice_info",
+        ]
+        for name in tool_names:
+            td = TOOL_CATALOG.get(name)
+            if td:
+                self.register_tool(td.name, td.description, td.input_schema, td.executor)
--- a/agents/media.py
+++ b/agents/media.py
@@ -0,0 +1,52 @@
+"""Media Agent — OCR-based analysis of screenshot/photo evidence.
+
+DESIGN.md §4.7: the LLM backend has no vision capability, so JPEG/PNG
+evidence must go through tesseract first. The agent runs OCR, then
+records extracted strings — especially identifiers (wallet addresses,
+phone numbers, usernames) — via the grounded observe_identity gateway so
+they participate in cross-source coref the same way iOS keychain entries
+or Windows account names do.
+
+If the OCR runtime is missing on the host, ocr_image returns an explicit
+install hint; the agent should record that as a negative finding ("no
+text extracted — tesseract not installed") rather than guessing.
+"""
+
+from __future__ import annotations
+
+from base_agent import BaseAgent
+from evidence_graph import EvidenceGraph
+from llm_client import LLMClient
+from tool_registry import TOOL_CATALOG
+
+
+class MediaAgent(BaseAgent):
+    name = "media"
+    role = (
+        "Media / OCR forensic analyst. You analyse screenshots, photos, and "
+        "scanned documents — any pixel-based evidence the LLM cannot read "
+        "directly. Workflow: list_extracted_dir to enumerate images, "
+        "ocr_image on each promising one, then add_phenomenon (with the "
+        "OCR'd text as the verified_fact value) and observe_identity for "
+        "any wallet addresses, phone numbers, email addresses, or "
+        "usernames the text contains. If OCR fails because tesseract is "
+        "missing, RECORD that as a negative finding instead of fabricating "
+        "image content — the absence is a real fact about this run."
+    )
+
+    def __init__(self, llm: LLMClient, graph: EvidenceGraph) -> None:
+        super().__init__(llm, graph)
+        self._register_tools()
+
+    def _register_tools(self) -> None:
+        tool_names = [
+            "ocr_image",
+            "list_extracted_dir", "find_files",
+            "read_binary_preview",
+            "read_text_file",
+            "search_text_file",
+        ]
+        for name in tool_names:
+            td = TOOL_CATALOG.get(name)
+            if td:
+                self.register_tool(td.name, td.description, td.input_schema, td.executor)
--- a/agents/report.py
+++ b/agents/report.py
@@ -2,9 +2,6 @@

 from __future__ import annotations

-import json
-import os
-
 from base_agent import BaseAgent
 from evidence_graph import EvidenceGraph
 from llm_client import LLMClient
@@ -15,34 +12,60 @@ class ReportAgent(BaseAgent):
    role = (
        "Forensic report writer. You synthesize all findings from the investigation "
        "into a structured, professional forensic analysis report organized by hypotheses.\n\n"
-        "IMPORTANT: Only include findings that have a source_tool attribution (marked VERIFIED). "
-        "If evidence lacks source attribution, mark it as UNVERIFIED. "
-        "Do NOT invent or fabricate any data, timestamps, or findings not present in the evidence.\n\n"
-        "CRITICAL: You MUST call save_report to write the final report."
+        "Phenomena are marked GROUNDED (verified_facts cite a real tool invocation), "
+        "TOOL-ONLY (source_tool set but no facts), or UNVERIFIED (neither). When "
+        "writing the report, render verified_facts as primary evidence with their "
+        "invocation citations, and render interpretation as 'agent analysis' so the "
+        "reader can tell ground truth from inference. Do NOT invent or fabricate any "
+        "data, timestamps, or findings not present in the evidence.\n\n"
+        "This is a cross-source case: phenomena come from multiple evidence "
+        "sources, and entities discovered on different sources may refer to the "
+        "same real-world actor. ALWAYS include:\n"
+        "  - 'Findings by Source' section sourced from get_phenomena_by_source\n"
+        "  - 'Actor Clusters' section sourced from get_actor_clusters (the "
+        "cross-source attribution view — multi-source clusters answer "
+        "'which findings on different devices belong to the same person')\n"
+        "  - 'Hypothesis × Evidence Matrix' from get_hypothesis_evidence_matrix"
    )
+    # Calling save_report is BOTH the recording action and the completion
+    # signal. tool_call_loop returns the moment save_report executes; the
+    # tool's return value becomes the agent's final_text. The forced-retry
+    # mechanism fires if save_report is never called.
+    mandatory_record_tools = ("save_report",)
+    terminal_tools = ("save_report",)

    def __init__(self, llm: LLMClient, graph: EvidenceGraph) -> None:
        super().__init__(llm, graph)
        self._register_tools()

+    def _register_graph_tools(self) -> None:
+        """Restrict to read-only graph tools. Report agent does not mutate state."""
+        self._register_graph_read_tools()
+
    def _build_system_prompt(self, task: str) -> str:
-        """Report agent gets a clean prompt — no Phase A/B/C/D workflow."""
        return (
            f"You are a forensic report writer.\n"
            f"Role: {self.role}\n\n"
            f"Investigation state:\n{self.graph.stats_summary()}\n\n"
            f"Your task: {task}\n\n"
            f"WORKFLOW:\n"
-            f"1. Call get_hypotheses_with_evidence to get all hypotheses and their linked evidence\n"
-            f"2. Call get_all_phenomena to get detailed findings by category\n"
-            f"3. Call get_entities to get people, programs, and hosts\n"
-            f"4. Call get_case_info for case metadata\n"
-            f"5. Write the complete report directly in your <answer> block\n\n"
+            f"1. Call get_hypotheses_with_evidence, get_all_phenomena, get_entities,\n"
+            f"   get_case_info, get_hypothesis_evidence_matrix, get_actor_clusters,\n"
+            f"   and get_phenomena_by_source in parallel; these are the seven data\n"
+            f"   sources you assemble the report from.\n"
+            f"2. Assemble the complete markdown forensic report. Cross-source\n"
+            f"   actor clusters and per-source breakdown are MANDATORY sections.\n"
+            f"3. Call save_report(content=<full markdown>, output_path=\"report.md\").\n"
+            f"   This single call is the completion signal — the run ENDS the moment it executes.\n"
+            f"   Do NOT call any read tools after this point; they will not run.\n"
+            f"   Do NOT write the report as free text outside of save_report; only the\n"
+            f"   `content` argument of save_report is persisted.\n\n"
            f"RULES:\n"
-            f"- Write the report DIRECTLY in <answer> — do NOT use save_report tool\n"
-            f"- Only include findings present in the evidence graph\n"
-            f"- Do NOT invent timestamps, file paths, or data not in the phenomena\n"
-            f"- The report must be complete — do not cut off mid-section\n"
+            f"- The report must be the complete markdown — do not cut off mid-section.\n"
+            f"- Only include findings present in the evidence graph.\n"
+            f"- Do NOT invent timestamps, file paths, or data not in the phenomena.\n"
+            f"- The `content` argument can be 10K+ chars. JSON-escape inner quotes (\\\") and\n"
+            f"  backslashes (\\\\) and newlines (\\n) correctly.\n"
        )

    def _register_tools(self) -> None:
@@ -74,6 +97,45 @@ class ReportAgent(BaseAgent):
            executor=self._get_entities,
        )

+        self.register_tool(
+            name="get_hypothesis_evidence_matrix",
+            description=(
+                "Render the hypothesis × evidence pivot as a markdown table. "
+                "Columns: per edge_type counts, log_odds, confidence, status. "
+                "Embed this directly in the report to show how each hypothesis "
+                "stands relative to the others on a single screen."
+            ),
+            input_schema={"type": "object", "properties": {}},
+            executor=self._get_hypothesis_evidence_matrix,
+        )
+
+        self.register_tool(
+            name="get_actor_clusters",
+            description=(
+                "Render the cross-source actor clusters: each cluster is the "
+                "set of Entity nodes the system currently treats as the same "
+                "actor (via active same_as edges backed by coref hypotheses "
+                "≥ 0.8). Includes the aggregated identifier evidence per "
+                "cluster. Use this in the report's 'Actor Clusters' "
+                "section so readers see who-is-who across devices, not just "
+                "raw entity rows."
+            ),
+            input_schema={"type": "object", "properties": {}},
+            executor=self._get_actor_clusters,
+        )
+
+        self.register_tool(
+            name="get_phenomena_by_source",
+            description=(
+                "Group every phenomenon by its originating evidence source "
+                "(source_id). Use this to drive the report's 'Findings by "
+                "Source' section so each evidence item's per-device "
+                "contribution is auditable."
+            ),
+            input_schema={"type": "object", "properties": {}},
+            executor=self._get_phenomena_by_source,
+        )
+
        self.register_tool(
            name="save_report",
            description="Save the final report to a file.",
@@ -106,12 +168,24 @@ class ReportAgent(BaseAgent):
            items = [ph for ph in phenomena.values() if ph.category == cat]
            lines.append(f"\n--- {cat.upper()} ({len(items)} entries) ---")
            for ph in items:
-                verified = "VERIFIED" if ph.source_tool else "UNVERIFIED"
-                lines.append(f"\n[{verified}] {ph.title} ({ph.id})")
+                # Grounded = at least one verified fact AND a source_tool.
+                grounded = bool(ph.verified_facts) and bool(ph.source_tool)
+                marker = "GROUNDED" if grounded else (
+                    "TOOL-ONLY" if ph.source_tool else "UNVERIFIED"
+                )
+                lines.append(f"\n[{marker}] {ph.title} ({ph.id})")
                lines.append(f"  Source: {ph.source_agent} | Tool: {ph.source_tool or 'N/A'}")
                if ph.timestamp:
                    lines.append(f"  Timestamp: {ph.timestamp}")
-                lines.append(f"  {ph.description[:500]}")
+                if ph.verified_facts:
+                    lines.append(f"  Verified facts ({len(ph.verified_facts)}):")
+                    for f in ph.verified_facts:
+                        lines.append(
+                            f"    - [{f.get('type','?')}] {str(f.get('value',''))[:200]} "
+                            f"(cite: {f.get('invocation_id','?')})"
+                        )
+                if ph.interpretation:
+                    lines.append(f"  Analysis: {ph.interpretation[:500]}")
        return "\n".join(lines)

    async def _get_hypotheses_with_evidence(self) -> str:
@@ -141,12 +215,87 @@ class ReportAgent(BaseAgent):
        return "\n".join(lines)

    async def _get_case_info(self) -> str:
-        info = self.graph.case_info
        lines = ["=== Case Information ==="]
-        for k, v in info.items():
-            lines.append(f"  {k}: {v}")
-        lines.append(f"  Image path: {self.graph.image_path}")
-        lines.append(f"  Partition offset: {self.graph.partition_offset}")
+        case = self.graph.case
+        if case is not None:
+            lines.append(f"  case_id: {case.case_id}")
+            lines.append(f"  name: {case.name}")
+            for k, v in (case.meta or {}).items():
+                lines.append(f"  {k}: {v}")
+            lines.append(f"  sources: {len(case.sources)}")
+            for s in case.sources:
+                owner = f", owner={s.owner}" if s.owner else ""
+                platform = s.meta.get("platform") if s.meta else None
+                plat = f", platform={platform}" if platform else ""
+                lines.append(
+                    f"    - {s.id}: {s.label} "
+                    f"(type={s.type}, mode={s.access_mode}{plat}{owner})"
+                )
+        else:
+            # Legacy single-image fallback — surface whatever case_info dict
+            # was passed in (e.g. the old CFReDS MD5 block).
+            for k, v in (self.graph.case_info or {}).items():
+                lines.append(f"  {k}: {v}")
+            lines.append(f"  Image path: {self.graph.image_path}")
+            lines.append(f"  Partition offset: {self.graph.partition_offset}")
+        return "\n".join(lines)
+
+    async def _get_hypothesis_evidence_matrix(self) -> str:
+        return self.graph.hypothesis_evidence_matrix_markdown()
+
+    async def _get_actor_clusters(self) -> str:
+        clusters = self.graph.actor_clusters()
+        if not clusters:
+            return "(no entities recorded)"
+        # Show multi-member clusters first — they're the cross-source links
+        # the human reader most needs to see.
+        clusters.sort(key=lambda c: (-len(c["members"]), c["members"]))
+        lines = [f"=== Actor Clusters ({len(clusters)}) ==="]
+        for i, c in enumerate(clusters, 1):
+            members = c["members"]
+            label = "MULTI-SOURCE CLUSTER" if len(members) > 1 else "Single entity"
+            lines.append(f"\n[{label} #{i}] {len(members)} member(s):")
+            for eid in members:
+                ent = self.graph.entities.get(eid)
+                if ent:
+                    lines.append(f"  - {ent.summary()}")
+            if c["identifiers"]:
+                lines.append("  Aggregated identifiers:")
+                for ident in c["identifiers"]:
+                    strong_tag = "strong" if ident.get("strong") else "weak"
+                    lines.append(
+                        f"    [{strong_tag}] {ident.get('type')}={ident.get('value')} "
+                        f"(on {ident.get('on_entity')})"
+                    )
+            if c["coref_hypotheses"]:
+                lines.append("  Backing coref hypotheses (≥0.8 active):")
+                for hid in c["coref_hypotheses"]:
+                    hyp = self.graph.hypotheses.get(hid)
+                    if hyp:
+                        lines.append(f"    - {hid}: conf={hyp.confidence:.2f}, L={hyp.log_odds:+.2f}")
+        return "\n".join(lines)
+
+    async def _get_phenomena_by_source(self) -> str:
+        by_src: dict[str, list] = {}
+        for ph in self.graph.phenomena.values():
+            by_src.setdefault(ph.source_id or "(unbound)", []).append(ph)
+        if not by_src:
+            return "(no phenomena recorded)"
+        # Resolve source labels via graph.case when possible.
+        def _label(src_id: str) -> str:
+            if self.graph.case:
+                src = self.graph.case.get_source(src_id)
+                if src:
+                    return f"{src_id} — {src.label} ({src.type})"
+            return src_id
+
+        lines = [f"=== Phenomena by Source ({len(by_src)} source(s)) ==="]
+        for src_id in sorted(by_src):
+            phs = by_src[src_id]
+            lines.append(f"\n--- {_label(src_id)} ({len(phs)} phenomena) ---")
+            for ph in phs:
+                grounded = "G" if ph.verified_facts and ph.source_tool else "·"
+                lines.append(f"  [{grounded}] {ph.summary()}")
        return "\n".join(lines)

    async def _get_entities(self) -> str:
@@ -165,27 +314,42 @@ class ReportAgent(BaseAgent):
        return "\n".join(lines)

    async def _verify_phenomena(self) -> str:
-        verified = []
-        unverified = []
+        grounded: list[str] = []
+        tool_only: list[str] = []
+        unverified: list[str] = []
        for ph in self.graph.phenomena.values():
-            entry = f"  [{ph.category}] {ph.title} (agent: {ph.source_agent}, tool: {ph.source_tool or 'N/A'})"
-            if ph.source_tool:
-                verified.append(entry)
+            nf = len(ph.verified_facts)
+            entry = (
+                f"  [{ph.category}] {ph.title} "
+                f"(agent: {ph.source_agent}, tool: {ph.source_tool or 'N/A'}, facts: {nf})"
+            )
+            if ph.verified_facts and ph.source_tool:
+                grounded.append(entry)
+            elif ph.source_tool:
+                tool_only.append(entry)
            else:
                unverified.append(entry)

        lines = ["=== Phenomena Verification Report ==="]
-        lines.append(f"\nVERIFIED ({len(verified)} — have source_tool):")
-        lines.extend(verified)
+        lines.append(f"\nGROUNDED ({len(grounded)} — facts + source_tool):")
+        lines.extend(grounded)
+        lines.append(f"\nTOOL-ONLY ({len(tool_only)} — source_tool, no facts):")
+        lines.extend(tool_only)
        lines.append(f"\nUNVERIFIED ({len(unverified)} — no source_tool):")
        lines.extend(unverified)
        return "\n".join(lines)

    async def _save_report(self, content: str, output_path: str) -> str:
-        try:
-            os.makedirs(os.path.dirname(output_path) or ".", exist_ok=True)
-            with open(output_path, "w") as f:
-                f.write(content)
-            return f"Report saved to {output_path} ({len(content)} chars)"
-        except Exception as e:
-            return f"Error saving report: {e}"
+        """Return the report content unchanged; the orchestrator persists it.
+
+        The content is returned (rather than a "saved to ..." status string)
+        so that when tool_call_loop short-circuits on this terminal tool,
+        `final_text` is the full markdown — orchestrator writes it to the
+        canonical report.md path under runs/<ts>/.
+
+        The output_path argument is kept for backward compat but the model's
+        chosen path is ignored — the orchestrator owns the persistence path.
+        """
+        if not content:
+            return ""
+        return content
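
Reviewer sketch (not part of the patch): the GROUNDED / TOOL-ONLY / UNVERIFIED rule is written out three times in this file (`_get_all_phenomena`, `_get_phenomena_by_source`, `_verify_phenomena`). A minimal illustration of the same rule as one helper, assuming a hypothetical `Phenomenon` stand-in that models only the two fields the rule reads; the real node class and the name `grounding_marker` are assumptions:

```python
from dataclasses import dataclass, field


@dataclass
class Phenomenon:
    # Hypothetical stand-in for the graph node; only the fields the
    # marker logic inspects are modelled here.
    title: str
    source_tool: str = ""
    verified_facts: list = field(default_factory=list)


def grounding_marker(ph: Phenomenon) -> str:
    """Classify one phenomenon the way the report code above does."""
    if ph.verified_facts and ph.source_tool:
        return "GROUNDED"      # facts cite a tool invocation
    if ph.source_tool:
        return "TOOL-ONLY"     # tool attribution, but no facts
    return "UNVERIFIED"        # no tool attribution at all


examples = [
    Phenomenon("sms thread", "sqlite_query", [{"type": "row", "value": "x"}]),
    Phenomenon("plist seen", "parse_plist"),
    Phenomenon("hunch"),
]
markers = [grounding_marker(p) for p in examples]
```

Note that, as in the patch, a phenomenon with facts but no `source_tool` still falls through to UNVERIFIED.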
--- a/agents/strategist.py
+++ b/agents/strategist.py
@@ -0,0 +1,134 @@
+"""InvestigationStrategist — the LLM that decides depth vs breadth.
+
+DESIGN_STRATEGIST.md §3.
+
+The strategist does NOT run forensic tools. Its job per round is exactly one
+decision: propose 1-3 leads that would move an active hypothesis, OR declare
+the investigation complete. It reads the graph through four read-only views
+(graph_overview / source_coverage / marginal_yield / budget_status) and
+expresses its decision through two write tools (propose_lead /
+declare_investigation_complete).
+
+This is the smallest possible agent in the system — the entire point is that
+strategy decisions live in one agent so they're auditable and the rest of the
+codebase doesn't carry implicit depth/breadth policy.
+"""
+
+from __future__ import annotations
+
+import logging
+
+from base_agent import BaseAgent
+from evidence_graph import EvidenceGraph
+from llm_client import LLMClient
+from tool_registry import TOOL_CATALOG
+
+logger = logging.getLogger(__name__)
+
+
+class InvestigationStrategist(BaseAgent):
+    name = "strategist"
+    role = (
+        "Investigation strategist. You do not run forensic tools yourself. "
+        "Each round you take ONE decision: propose 1-3 new investigation leads "
+        "that would materially affect an active hypothesis, OR declare the "
+        "investigation complete. Your judgment is grounded in the graph "
+        "(hypotheses, sources, coverage, marginal yield, budget) — never in "
+        "speculation."
+    )
+    # At least one of these must be called every round, otherwise BaseAgent's
+    # forced RECORD retry kicks in and re-prompts the strategist to take a
+    # documented decision.
+    mandatory_record_tools = ("propose_lead", "declare_investigation_complete")
+    # declare_investigation_complete is terminal: calling it short-circuits the
+    # tool loop, which is what we want (strategist returns immediately on "done").
+    terminal_tools = ("declare_investigation_complete",)
+
+    # Strategist-specific tools, plus the read-only graph queries inherited
+    # from BaseAgent. NO graph write tools (no add_phenomenon / link_to_entity
+    # / observe_identity); the strategist must NOT mutate evidence directly.
+    _STRATEGY_TOOLS = (
+        "graph_overview",
+        "source_coverage",
+        "marginal_yield",
+        "budget_status",
+        "propose_lead",
+        "declare_investigation_complete",
+    )
+
+    def _register_graph_tools(self) -> None:
+        """Strategist gets read-only graph queries + the six strategy tools.
+
+        It does NOT get write tools (no add_phenomenon, observe_identity,
+        link_to_entity, add_temporal_edge). Every graph mutation must come
+        from a dispatched worker, not from the planner.
+        """
+        self._register_graph_read_tools()
+        for tool_name in self._STRATEGY_TOOLS:
+            td = TOOL_CATALOG.get(tool_name)
+            if td is None:
+                logger.warning(
+                    "Strategist could not find tool %s in TOOL_CATALOG — "
+                    "register_all_tools must run before agent instantiation.",
+                    tool_name,
+                )
+                continue
+            self.register_tool(td.name, td.description, td.input_schema, td.executor)
+
+    def _build_system_prompt(self, task: str) -> str:
+        """Strategist-specific prompt. Replaces the BaseAgent default which
+        walks an INVESTIGATE→RECORD→LINK workflow that is wrong for a
+        planner agent.
+        """
+        return (
+            f"You are {self.name}, the investigation strategist.\n"
+            f"Role: {self.role}\n\n"
+            f"Your task: {task}\n\n"
+            f"WORKFLOW (do this exactly):\n"
+            f"  1. Call graph_overview FIRST. Look at: which hypotheses are\n"
+            f"     active (conf 0.2-0.8) vs already supported/refuted; which\n"
+            f"     ones have many edges but only 1 distinct_source; which had\n"
+            f"     a recent_flip vs none in two rounds.\n"
+            f"  2. Call marginal_yield to see if the last rounds produced anything.\n"
+            f"  3. Call budget_status to know your runway.\n"
+            f"  4. For each candidate lead direction, call source_coverage on\n"
+            f"     the relevant source to see what's been touched.\n"
+            f"  5. Take exactly ONE of these terminal actions:\n"
+            f"     (a) Call propose_lead 1-3 times for leads that would\n"
+            f"         materially move an active hypothesis. STOP after this.\n"
+            f"     (b) Call declare_investigation_complete with a specific\n"
+            f"         reason. STOP after this.\n"
+            f"\n"
+            f"DECISION CRITERIA — when to propose vs when to stop:\n"
+            f"  PROPOSE when:\n"
+            f"    - A hypothesis is supported only by ONE source — get\n"
+            f"      cross-source corroboration. Same-source repeats add\n"
+            f"      little weight (harmonic damping).\n"
+            f"    - A hypothesis is in the active band (0.2 < conf < 0.8) —\n"
+            f"      it needs the deciding evidence.\n"
+            f"    - A high-value artefact is ✗ on source_coverage AND an\n"
+            f"      active hypothesis depends on the kind of evidence that\n"
+            f"      artefact would produce.\n"
+            f"  STOP (declare_investigation_complete) when:\n"
+            f"    - marginal_yield shows zero across 2+ rounds.\n"
+            f"    - budget_status warns ≥90% on tool_calls or rounds.\n"
+            f"    - no hypothesis remains active (all are supported or refuted).\n"
+            f"    - coverage saturation: every ✗ on every source is irrelevant\n"
+            f"      to active hypotheses.\n"
+            f"\n"
+            f"HARD RULES:\n"
+            f"  - You CANNOT call investigation tools (list_directory,\n"
+            f"    sqlite_query, parse_registry_key, extract_file, etc.) — your\n"
+            f"    job is to direct workers, not to investigate yourself.\n"
+            f"  - You CANNOT call write tools (add_phenomenon, observe_identity,\n"
+            f"    link_to_entity, add_hypothesis, add_temporal_edge). All\n"
+            f"    evidence mutations come from the workers you dispatch.\n"
+            f"  - Every propose_lead MUST cite a real hyp-id from\n"
+            f"    graph_overview's table — fabricated ids will be rejected.\n"
+            f"  - Don't propose more than 3 leads in one round. Quality over\n"
+            f"    quantity — a 4th lead almost always means you're not really\n"
+            f"    sure what would move the graph.\n"
+            f"  - Don't re-propose a lead that's already pending. The system\n"
+            f"    deduplicates (motivating_hyp, expected_type, agent, source)\n"
+            f"    so duplicates silently no-op, but they waste your budget."
+        )
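
Reviewer sketch (not part of the patch): both `ReportAgent` and `InvestigationStrategist` depend on two class attributes, `terminal_tools` and `mandatory_record_tools`, whose handling lives in `BaseAgent` and is not shown in this diff. The following is a rough model of the behaviour the comments describe, under the assumption that the loop stops at the first terminal tool and asks for a retry when no mandatory tool was called. The function `run_tool_calls` and its signature are hypothetical, not the real `tool_call_loop`.

```python
from typing import Any, Callable, Optional

ToolCall = tuple[str, dict[str, Any]]


def run_tool_calls(
    calls: list[ToolCall],
    executors: dict[str, Callable[..., str]],
    terminal_tools: tuple[str, ...] = (),
    mandatory_record_tools: tuple[str, ...] = (),
) -> tuple[Optional[str], bool]:
    """Run calls in order; return (final_text, needs_retry)."""
    recorded = False
    for name, args in calls:
        result = executors[name](**args)
        if name in mandatory_record_tools:
            recorded = True
        if name in terminal_tools:
            # Terminal tool: its return value becomes the final text and
            # any later calls in the batch never execute.
            return result, False
    # No terminal tool ran; retry only if a mandatory tool was skipped.
    return None, bool(mandatory_record_tools) and not recorded


log: list[str] = []
executors = {
    "graph_overview": lambda: log.append("graph_overview") or "overview",
    "save_report": lambda content: log.append("save_report") or content,
}
final_text, needs_retry = run_tool_calls(
    [
        ("graph_overview", {}),
        ("save_report", {"content": "# Report"}),
        ("graph_overview", {}),  # skipped: comes after the terminal tool
    ],
    executors,
    terminal_tools=("save_report",),
    mandatory_record_tools=("save_report",),
)
```

Under this model, `propose_lead` satisfies the strategist's mandatory requirement without ending the loop, while `declare_investigation_complete` does both.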
--- a/agents/timeline.py
+++ b/agents/timeline.py
@@ -1,14 +1,21 @@
-"""Timeline Agent — correlates evidence across time."""
+"""Timeline Agent — connects existing phenomena with temporal edges.
+
+Operates on phenomena already in the graph. Does NOT investigate the disk
+image itself. The agent's only useful output is the temporal edges it
+creates between phenomena.
+"""

 from __future__ import annotations

-import json
+import logging

 from base_agent import BaseAgent
 from evidence_graph import EvidenceGraph
 from llm_client import LLMClient
 from tool_registry import TOOL_CATALOG

+logger = logging.getLogger(__name__)
+

 class TimelineAgent(BaseAgent):
    name = "timeline"
@@ -17,29 +24,39 @@ class TimelineAgent(BaseAgent):
        "MAC timestamps and correlate events across all phenomena categories in the "
        "evidence graph to reconstruct the sequence of activities on the system."
    )
+    mandatory_record_tools = ("add_temporal_edge",)

    def __init__(self, llm: LLMClient, graph: EvidenceGraph) -> None:
        super().__init__(llm, graph)
        self._register_tools()

+    def _register_graph_tools(self) -> None:
+        """Restrict to read-only graph tools — Timeline does not add phenomena."""
+        self._register_graph_read_tools()
+
    def _register_tools(self) -> None:
-        # Filesystem timeline tool from catalog
        td = TOOL_CATALOG.get("build_filesystem_timeline")
        if td:
            self.register_tool(td.name, td.description, td.input_schema, td.executor)

-        # Custom tool to get all phenomena with timestamps for correlation
        self.register_tool(
            name="get_timestamped_phenomena",
-            description="Get all phenomena that have timestamps, sorted chronologically. Use for timeline correlation.",
+            description=(
+                "Get all phenomena that have timestamps, sorted chronologically. "
+                "Returns each phenomenon's id, category, title, and a short description "
+                "preview. Use this as your primary input for temporal correlation."
+            ),
            input_schema={"type": "object", "properties": {}},
            executor=self._get_timestamped_phenomena,
        )

-        # Tool to add temporal edges between phenomena
        self.register_tool(
            name="add_temporal_edge",
-            description="Add a temporal relationship between two phenomena (before, after, or concurrent).",
+            description=(
+                "Add a temporal relationship edge between two existing phenomena. "
+                "Use 'before' when source phenomenon happened before target, "
+                "'concurrent' when they occurred within seconds of each other."
+            ),
            input_schema={
                "type": "object",
                "properties": {
@@ -56,6 +73,42 @@ class TimelineAgent(BaseAgent):
            executor=self._add_temporal_edge,
        )

+    def _build_system_prompt(self, task: str) -> str:
+        """Focused prompt — Timeline connects existing phenomena, doesn't investigate."""
+        return (
+            f"You are {self.name}, a forensic timeline correlation analyst.\n"
+            f"Role: {self.role}\n\n"
+            f"Image: {self.graph.image_path}\n"
+            f"Current state: {self.graph.stats_summary()}\n\n"
+            f"Your task: {task}\n\n"
+            f"WORKFLOW:\n"
+            f"1. Call build_filesystem_timeline once to materialize MAC times for the disk.\n"
+            f"2. Call get_timestamped_phenomena to see all phenomena with timestamps, "
+            f"sorted chronologically. THIS IS YOUR PRIMARY INPUT.\n"
+            f"3. For each meaningful temporal relationship between phenomena, call "
+            f"add_temporal_edge(source_id, target_id, relation). Use 'before' when "
+            f"source happened first (the common case); 'concurrent' for events within "
+            f"a few seconds of each other.\n"
+            f"   Examples of meaningful connections:\n"
+            f"     - 'Cain installer executed' (before) 'Cain.exe first execution'\n"
+            f"     - 'WHOIS first lookup'      (before) 'WHOIS second lookup'\n"
+            f"     - 'Recon tool cluster'      (before) 'Anti-forensics defrag'\n"
+            f"     - 'Tool installation'       (before) 'Tool execution'\n"
+            f"4. Aim for 15-40 temporal edges that connect the major events into a "
+            f"forensic story.\n"
+            f"5. STOP after recording all meaningful temporal edges. Do not call any more tools.\n\n"
+            f"STRICT BOUNDARIES:\n"
+            f"- Your job is to CONNECT existing phenomena, NOT to discover new ones. "
+            f"You CANNOT call add_phenomenon — the tool isn't yours.\n"
+            f"- Use ONLY phenomenon IDs returned by get_timestamped_phenomena or "
+            f"list_phenomena. NEVER fabricate IDs.\n"
+            f"- Connect events that tell a forensic story (recon -> exploit -> cover-up). "
+            f"Do not exhaustively pair every two phenomena; focus on causally-relevant "
+            f"sequences.\n"
+            f"- The orchestrator handles report writing in the next phase. Your only "
+            f"output that propagates is the temporal edges you create."
+        )
+
    async def _get_timestamped_phenomena(self) -> str:
        items = [
            ph for ph in self.graph.phenomena.values()
@@ -69,7 +122,15 @@ class TimelineAgent(BaseAgent):
        lines = []
        for ph in items:
            lines.append(f"{ph.timestamp} | [{ph.category}] {ph.title} ({ph.id})")
-            lines.append(f"  {ph.description[:150]}")
+            preview = ph.interpretation[:150] if ph.interpretation else ""
+            if ph.verified_facts:
+                fact_preview = ", ".join(
+                    f"{f.get('type','?')}={str(f.get('value',''))[:40]}"
+                    for f in ph.verified_facts[:3]
+                )
+                preview = f"{preview} [facts: {fact_preview}]" if preview else f"[facts: {fact_preview}]"
+            if preview:
+                lines.append(f"  {preview}")
        return "\n".join(lines)

    async def _add_temporal_edge(
--- a/base_agent.py
+++ b/base_agent.py
@@ -5,6 +5,7 @@ from __future__ import annotations
 import json
 import logging
 import time
+import uuid
 from typing import Any

 from evidence_graph import EvidenceGraph
@@ -31,12 +32,30 @@ class BaseAgent:
    name: str = "base"
    role: str = "A forensic analysis agent."

+    # Tools the agent MUST invoke at least once for the run to count as productive.
+    # If none of these returned successfully by the time tool_call_loop returns, run() fires a
+    # forced retry with an explicit "you forgot to record" instruction.
+    # Subclasses override to declare their own recording responsibility
+    # (timeline → add_temporal_edge, hypothesis → add_hypothesis, report → save_report).
+    # observe_identity (S5) counts as a recording too — it writes through the
+    # same grounding gateway and produces an identity_observation phenomenon.
+    mandatory_record_tools: tuple[str, ...] = ("add_phenomenon", "observe_identity")
+
+    # Tools whose invocation ends the run immediately. After any terminal tool
+    # is called, tool_call_loop returns with that tool's result text as
+    # final_text. Used by agents whose "completion" is a single explicit
+    # action rather than "model decides to stop calling tools". For multi-call
+    # agents (filesystem records many phenomena) leave empty.
+    terminal_tools: tuple[str, ...] = ()
+
    def __init__(self, llm: LLMClient, graph: EvidenceGraph) -> None:
        self.llm = llm
        self.graph = graph
        self._tools: dict[str, dict] = {}  # name -> schema
        self._executors: dict[str, Any] = {}  # name -> async callable
+        self._record_call_counts: dict[str, int] = {}
        self._work_log: list[str] = []
+        self._current_lead_id: str | None = None

    def register_tool(
        self,
@@ -51,7 +70,18 @@ class BaseAgent:
            "description": description,
            "input_schema": input_schema,
        }
-        self._executors[name] = executor
+        if name in self.mandatory_record_tools:
+            self._executors[name] = self._wrap_record_executor(name, executor)
+        else:
+            self._executors[name] = executor
+
+    def _wrap_record_executor(self, name: str, executor: Any) -> Any:
+        """Wrap a mandatory-record executor to count successful invocations."""
+        async def wrapped(*args, **kwargs):
+            result = await executor(*args, **kwargs)
+            self._record_call_counts[name] = self._record_call_counts.get(name, 0) + 1
+            return result
+        return wrapped

    def get_tool_definitions(self) -> list[dict]:
        """Get tool definitions in Claude API format."""
@@ -83,37 +113,68 @@ class BaseAgent:
            f"  Call investigation tools (list_directory, parse_registry_key, etc.) to gather data.\n"
            f"  Only extract_file for forensically relevant files (user data, logs, configs, hives) — NOT system DLLs or OS files.\n"
            f"  Create add_lead for anything outside your expertise.\n\n"
-            f"Phase B — RECORD PHENOMENA:\n"
-            f"  For EACH significant finding from Phase A, call add_phenomenon.\n"
+            f"Phase B — RECORD PHENOMENA (GROUNDED):\n"
+            f"  For EACH significant finding from Phase A, call add_phenomenon with:\n"
+            f"    * interpretation: your analysis — free text, NOT verified.\n"
+            f"    * verified_facts: one entry per concrete atom (path, timestamp,\n"
+            f"      inode, hash, identifier, count) you want recorded as truth.\n"
+            f"      Each entry MUST have:\n"
+            f"        - type: e.g. 'path', 'timestamp', 'inode', 'hash', 'identifier', 'count'\n"
+            f"        - value: a VERBATIM substring from the tool output\n"
+            f"        - invocation_id: the inv-xxx ID from the '[invocation: inv-xxx]'\n"
+            f"          header at the top of the tool result that produced this value\n"
+            f"  IDENTIFIERS — call observe_identity (in ADDITION to add_phenomenon)\n"
+            f"  whenever you see an email, phone number, Apple ID, IMEI, wallet\n"
+            f"  address, MAC, UDID, persistent nickname, or display name. Same\n"
+            f"  grounding contract: value must be verbatim in the cited tool\n"
+            f"  output. This is HOW cross-source attribution gets built — without\n"
+            f"  it, we can't tell whether the Apple ID in keychain belongs to the\n"
+            f"  same person as the Windows account on the USB.\n"
            f"  Do NOT call link_to_entity yet — just record all phenomena first.\n\n"
            f"Phase C — LINK ENTITIES:\n"
            f"  FIRST call list_phenomena to get the current IDs — do NOT rely on memory.\n"
            f"  Then call link_to_entity for each relevant phenomenon.\n"
            f"  NEVER guess or fabricate a phenomenon ID. If an ID is not in list_phenomena output, it does not exist.\n\n"
-            f"Phase D — ANSWER:\n"
-            f"  Only give your <answer> AFTER completing Phases B and C.\n\n"
-            f"IMPORTANT:\n"
-            f"- You MUST call add_phenomenon at least once before finishing\n"
-            f"- Complete each phase before starting the next\n"
-            f"- Other agents can ONLY see what you write to the graph\n"
-            f"- If you don't record findings, they are LOST\n"
-            f"- Include relevant file paths, inode numbers, timestamps, and raw data\n\n"
-            f"ANTI-HALLUCINATION RULES — STRICTLY ENFORCED:\n"
-            f"- ONLY record findings that appear VERBATIM in tool results you received\n"
-            f"- NEVER invent or guess timestamps, file paths, inode numbers, or program names\n"
-            f"- If tool output was truncated, state '[truncated]' — do NOT fill in the missing data\n"
-            f"- If you are unsure whether something exists, call a tool to verify or create a lead — do NOT assume\n"
-            f"- Quote exact strings from tool output when recording evidence descriptions\n"
-            f"- Do NOT fabricate execution timestamps — only report timestamps returned by tools"
+            f"Phase D — STOP:\n"
+            f"  Once all phenomena are recorded and entities linked, you are DONE.\n"
+            f"  Do not call any more tools. The orchestrator picks up automatically.\n\n"
+            f"CRITICAL — RECORDING REQUIREMENT:\n"
+            f"- Only graph mutations propagate to other agents and the final report.\n"
+            f"- You MUST call add_phenomenon for EVERY significant finding BEFORE you stop.\n"
+            f"- NEGATIVE findings count too. If you searched X (a directory, a pattern, "
+            f"a registry key) and found NOTHING, that absence IS evidence — call "
+            f"add_phenomenon with a 'No matches for X' title, the search scope in "
+            f"raw_data, and cite the search tool's invocation_id (verified_facts may "
+            f"be empty for a true negative; the cited invocation in source_tool still "
+            f"anchors it). Negative findings constrain the hypothesis space.\n"
+            f"- If you stop without having called add_phenomenon at least once, the task "
+            f"is FAILED and a forced retry will fire.\n\n"
+            f"GROUNDING GATEWAY — STRUCTURALLY ENFORCED:\n"
+            f"- Every tool result begins with '[invocation: inv-xxxxxxxx]' — that ID\n"
+            f"  is what you cite in each fact's invocation_id.\n"
+            f"- fact.value must be a substring of the cited invocation's output.\n"
+            f"  Case, whitespace, and path-separator (/ ↔ \\) variants are tolerated;\n"
+            f"  anything else fabricated is REJECTED with a per-fact reason.\n"
+            f"- On REJECTED: quote the literal text from the output (or drop the\n"
+            f"  fact), and put guesses / inferred paths / model names in\n"
+            f"  `interpretation` instead. Then call add_phenomenon again.\n"
+            f"- You may cite ONLY invocations made within THIS task."
        )

-    async def run(self, task: str) -> str:
+    async def run(self, task: str, lead_id: str | None = None) -> str:
        """Run this agent with a specific task."""
        _log(task, event="agent_start", agent=self.name)
        self.graph.agent_status[self.name] = "running"
        self.graph._current_agent = self.name
+        # Fresh task scope per agent run. Used by the grounding gateway to
+        # check that facts in add_phenomenon cite invocations made *within
+        # this run* — preventing the agent from forwarding stale IDs from
+        # earlier work or another agent.
+        self.graph._current_task_id = f"task-{uuid.uuid4().hex[:8]}"
+        self._current_lead_id = lead_id

        self._register_graph_tools()
+        self._record_call_counts.clear()

        system = self._build_system_prompt(task)
        messages = [{"role": "user", "content": task}]
@@ -122,12 +183,75 @@ class BaseAgent:
        ph_before = len(self.graph.phenomena)

        try:
-            final_text, _ = await self.llm.tool_call_loop(
+            final_text, conversation = await self.llm.tool_call_loop(
                messages=messages,
                tools=self.get_tool_definitions(),
                tool_executor=self._executors,
                system=system,
+                terminal_tools=self.terminal_tools,
            )
+
+            # Forced-record retry: if the agent has any mandatory recording
+            # tools but no call to any of them succeeded, force one more round with
+            # an explicit "you forgot to record" instruction. The mandatory
+            # set is declared on the class — Timeline → add_temporal_edge,
+            # Hypothesis → add_hypothesis, ReportAgent → (). For agents with
+            # empty mandatory_record_tools this branch is a no-op.
+            registered_mandatory = [
+                t for t in self.mandatory_record_tools if t in self._executors
+            ]
+            recorded_any = any(
+                self._record_call_counts.get(t, 0) > 0
+                for t in registered_mandatory
+            )
+            if registered_mandatory and not recorded_any:
+                missing = "/".join(registered_mandatory)
+                logger.warning(
+                    "[%s] finished without calling any of [%s] — forcing RECORD retry",
+                    self.name, missing,
+                )
+                conversation.append({
+                    "role": "user",
+                    "content": (
+                        f"STOP. You produced an answer without ever calling "
+                        f"{missing}. Your answer is DISCARDED — only graph "
+                        f"mutations propagate to other agents and the final "
+                        f"report.\n\n"
+                        f"You MUST now call {missing} for every significant "
+                        f"finding from your prior investigation, including "
+                        f"exact identifiers, timestamps, and the source_tool "
+                        f"that produced each finding. If you genuinely found "
+                        f"NOTHING noteworthy, call the recording tool ONCE "
+                        f"with a 'No significant findings' style entry "
+                        f"summarizing what you searched.\n\n"
+                        f"Do not run more investigation tools. Just record "
+                        f"what you already found. Then end."
+                    ),
+                })
+                # Narrow the retry tool surface so the agent can't wander off
+                # to investigate again: only recording / graph-write tools and
+                # read-only graph queries survive. Each grounding-rejected call
+                # burns one iteration, so the cap is 30 (not the original 10): a
+                # Timeline agent writing ~10 temporal edges with one rejection
+                # apiece needs ~20 turns under the rewritten gateway.
+                retry_tool_names = set(registered_mandatory) | {
+                    "list_phenomena", "list_assets", "search_graph",
+                    "add_temporal_edge", "link_to_entity", "add_lead",
+                    "add_hypothesis", "save_report",
+                }
+                retry_tools = [
+                    td for td in self.get_tool_definitions()
+                    if td["name"] in retry_tool_names
+                ]
+                final_text, _ = await self.llm.tool_call_loop(
+                    messages=conversation,
+                    tools=retry_tools,
+                    tool_executor=self._executors,
+                    system=system,
+                    max_iterations=30,
+                    terminal_tools=self.terminal_tools,
+                )
+
            self._work_log.append(f"[Task: {task[:80]}] -> {final_text[:150]}")
        except Exception:
            self.graph.agent_status[self.name] = "failed"
@@ -143,9 +267,17 @@ class BaseAgent:
    # ---- Graph interaction tools --------------------------------------------

    def _register_graph_tools(self) -> None:
-        """Register tools for querying and writing to the evidence graph."""
+        """Register graph query + mutation tools.

-        # --- Read tools ---
+        Subclasses can override to restrict the toolset. For example, a
+        read-only agent (hypothesis, report) overrides this to skip
+        _register_graph_write_tools.
+        """
+        self._register_graph_read_tools()
+        self._register_graph_write_tools()
+
+    def _register_graph_read_tools(self) -> None:
+        """Register read-only graph + asset query tools."""

        self.register_tool(
            name="list_phenomena",
@@ -211,25 +343,114 @@ class BaseAgent:
            executor=self._get_hypothesis_status,
        )

-        # --- Write tools ---
+        self.register_tool(
+            name="list_assets",
+            description=(
+                "List all files extracted from the disk image. "
+                "Shows filename, category, size, local path, and inode. "
+                "Check this before calling extract_file to avoid re-extraction."
+            ),
+            input_schema={
+                "type": "object",
+                "properties": {
+                    "category": {
+                        "type": "string",
+                        "enum": [
+                            "registry_hive", "chat_log", "prefetch", "network_capture",
+                            "config_file", "address_book", "recycle_bin", "executable",
+                            "text_log", "other",
+                        ],
+                        "description": "Filter by category. Omit to list all.",
+                    },
+                },
+            },
+            executor=self._list_assets,
+        )
+
+        self.register_tool(
+            name="find_extracted_file",
+            description=(
+                "Find an already-extracted file by inode or filename. "
+                "Returns the local path so you can use it directly with "
+                "parse_registry_key, read_text_file, etc. without re-extracting."
+            ),
+            input_schema={
+                "type": "object",
+                "properties": {
+                    "inode": {"type": "string", "description": "Inode to look up."},
+                    "filename": {"type": "string", "description": "Filename or partial name to search."},
+                },
+            },
+            executor=self._find_extracted_file,
+        )
+
+    def _register_graph_write_tools(self) -> None:
+        """Register graph mutation tools (add_phenomenon, add_lead, link_to_entity, observe_identity)."""

        self.register_tool(
            name="add_phenomenon",
            description=(
-                "Record a forensic finding (phenomenon) on the evidence graph. "
-                "You MUST specify source_tool: the name of the tool call that produced this finding."
+                "Record a forensic finding on the evidence graph. The finding is "
+                "split into provenance-bound atoms (verified_facts) and free-form "
+                "analysis (interpretation). Each fact MUST cite the invocation_id "
+                "of a tool call you made in THIS task — the gateway checks every "
+                "fact's value against that call's real output (substring match). "
+                "Any fact that fails grounding causes the whole record to be "
+                "rejected with a list of failures; fix the facts and call again."
            ),
            input_schema={
                "type": "object",
                "properties": {
                    "category": {"type": "string", "description": "Category of the finding."},
                    "title": {"type": "string", "description": "Short title."},
-                    "description": {"type": "string", "description": "Detailed description. Quote exact data from tool output."},
+                    "interpretation": {
+                        "type": "string",
+                        "description": (
+                            "Free-form analysis text — your reasoning, why this "
+                            "matters, what it implies. NOT verified by the gateway. "
+                            "Rendered in reports as 'agent analysis', not truth."
+                        ),
+                    },
+                    "verified_facts": {
+                        "type": "array",
+                        "description": (
+                            "Atoms you want preserved as ground truth. Each must "
+                            "appear verbatim in the cited tool output."
+                        ),
+                        "items": {
+                            "type": "object",
+                            "properties": {
+                                "type": {
+                                    "type": "string",
+                                    "description": (
+                                        "Kind of fact: path, timestamp, inode, "
+                                        "hash, identifier, count, raw, ..."
+                                    ),
+                                },
+                                "value": {
+                                    "type": "string",
+                                    "description": (
+                                        "Verbatim substring from the cited tool "
+                                        "output. The gateway does a literal "
+                                        "string-in-string check — no paraphrasing."
+                                    ),
+                                },
+                                "invocation_id": {
+                                    "type": "string",
+                                    "description": (
+                                        "ID from the '[invocation: inv-xxx]' header "
+                                        "of the tool call that produced this value."
+                                    ),
+                                },
+                            },
+                            "required": ["type", "value", "invocation_id"],
+                        },
+                    },
                    "raw_data": {"type": "object", "description": "Structured raw data supporting this finding."},
                    "timestamp": {"type": "string", "description": "Timestamp if any. ONLY use timestamps from tool output."},
                    "source_tool": {"type": "string", "description": "Name of the tool that produced this (e.g. 'list_directory')."},
                },
-                "required": ["category", "title", "description", "source_tool"],
+                "required": ["category", "title", "source_tool"],
            },
            executor=self._add_phenomenon,
        )
@@ -280,47 +501,65 @@ class BaseAgent:
            executor=self._link_to_entity,
        )

-        # --- Asset library tools ---
-
        self.register_tool(
-            name="list_assets",
+            name="observe_identity",
            description=(
-                "List all files extracted from the disk image. "
-                "Shows filename, category, size, local path, and inode. "
-                "Check this before calling extract_file to avoid re-extraction."
+                "Record a typed identifier (email / phone / Apple ID / IMEI / "
+                "wallet address / nickname / display name / …) for an entity. "
+                "Goes through the same grounding gateway as add_phenomenon — "
+                "value MUST be a verbatim substring of the cited tool output. "
+                "After attachment, the engine automatically proposes / "
+                "strengthens / weakens cross-source coreference hypotheses "
+                "between this entity and any others carrying the same or "
+                "conflicting identifiers. This is how 'is the Apple ID in iOS "
+                "keychain the same person as the Windows login name?' gets "
+                "answered. Call this in ADDITION to add_phenomenon for "
+                "identifier-bearing findings."
            ),
            input_schema={
                "type": "object",
                "properties": {
-                    "category": {
+                    "entity_name": {"type": "string", "description": "Human-readable entity name (e.g. 'LEUNG YL', 'alice@example.com')."},
+                    "entity_type": {
                        "type": "string",
-                        "enum": [
-                            "registry_hive", "chat_log", "prefetch", "network_capture",
-                            "config_file", "address_book", "recycle_bin", "executable",
-                            "text_log", "other",
-                        ],
-                        "description": "Filter by category. Omit to list all.",
+                        "enum": ["person", "program", "file", "host", "ip_address"],
+                        "description": "Kind of entity this identifier belongs to (usually 'person').",
+                    },
+                    "identifier_type": {
+                        "type": "string",
+                        "description": (
+                            "Strong (near-unique): email, phone_number, imei, "
+                            "imsi, apple_id, icloud_id, google_account, "
+                            "wallet_address, udid, mac_address, device_serial. "
+                            "Weak (free-form, may collide): nickname, "
+                            "display_name, username, screen_name."
+                        ),
+                    },
+                    "value": {
+                        "type": "string",
+                        "description": (
+                            "The identifier value, quoted VERBATIM from the "
+                            "tool output you cite in invocation_id."
+                        ),
+                    },
+                    "invocation_id": {
+                        "type": "string",
+                        "description": (
+                            "ID from the '[invocation: inv-xxx]' header of "
+                            "the tool call that surfaced this identifier."
+                        ),
+                    },
+                    "source_tool": {
+                        "type": "string",
+                        "description": "Name of the tool that produced the identifier.",
                    },
                },
+                "required": [
+                    "entity_name", "entity_type", "identifier_type",
+                    "value", "invocation_id",
+                ],
            },
-            executor=self._list_assets,
-        )
-
-        self.register_tool(
-            name="find_extracted_file",
-            description=(
-                "Find an already-extracted file by inode or filename. "
-                "Returns the local path so you can use it directly with "
-                "parse_registry_key, read_text_file, etc. without re-extracting."
-            ),
-            input_schema={
-                "type": "object",
-                "properties": {
-                    "inode": {"type": "string", "description": "Inode to look up."},
-                    "filename": {"type": "string", "description": "Filename or partial name to search."},
-                },
-            },
-            executor=self._find_extracted_file,
+            executor=self._observe_identity,
        )

    # ---- Tool executors -----------------------------------------------------
@@ -362,19 +601,33 @@ class BaseAgent:
        self,
        category: str,
        title: str,
-        description: str,
+        interpretation: str = "",
+        verified_facts: list[dict] | None = None,
        raw_data: dict | None = None,
        timestamp: str | None = None,
        source_tool: str = "",
+        # Back-compat: older prompts (and accidental LLM emissions) may pass
+        # ``description``; treat it as ``interpretation`` rather than failing.
+        description: str | None = None,
    ) -> str:
+        if description and not interpretation:
+            interpretation = description
+        # GroundingError propagates: llm_client._execute_single_tool turns
+        # raised exceptions into "Error executing add_phenomenon: <msg>" tool
+        # results the LLM sees, and _wrap_record_executor does NOT increment
+        # the mandatory-record counter (the increment only runs after a
+        # successful return), so the forced-retry mechanism still fires if
+        # the agent never lands a grounded phenomenon.
        pid, merged = await self.graph.add_phenomenon(
            source_agent=self.name,
            category=category,
            title=title,
-            description=description,
+            interpretation=interpretation,
+            verified_facts=verified_facts,
            raw_data=raw_data,
            timestamp=timestamp,
            source_tool=source_tool,
+            from_lead_id=self._current_lead_id,
        )
        if merged:
            return f"Phenomenon merged into existing: {pid} — {title} (corroboration boost)"
@@ -416,6 +669,51 @@ class BaseAgent:
        status = "linked to existing" if existing else "created and linked"
        return f"Entity {status}: {entity_name} ({entity_type}) ←[{edge_type}]— {phenomenon_id}"

+    async def _observe_identity(
+        self,
+        entity_name: str,
+        entity_type: str,
+        identifier_type: str,
+        value: str,
+        invocation_id: str,
+        source_tool: str = "",
+    ) -> str:
+        # GroundingError / ValueError propagate to llm_client's per-tool
+        # exception handler, which formats them back to the LLM. That keeps
+        # the mandatory-record counter honest — only a successful return
+        # triggers the increment in _wrap_record_executor.
+        result = await self.graph.observe_identity(
+            entity_name=entity_name,
+            entity_type=entity_type,
+            identifier_type=identifier_type,
+            value=value,
+            source_agent=self.name,
+            source_tool=source_tool,
+            invocation_id=invocation_id,
+        )
+        lines = [
+            f"Identity observed: {identifier_type}={value} "
+            f"on entity {result['entity_id']} ({entity_name})."
+        ]
+        if result.get("new_identifier"):
+            lines.append(
+                f"  Observation phenomenon: {result['phenomenon_id']}"
+            )
+        else:
+            lines.append("  (identifier already recorded on this entity — idempotent)")
+        for prop in result.get("coref_proposals", []):
+            lines.append(
+                f"  → Coref candidate: {prop['other_entity_id']} via "
+                f"{prop['match']['edge_type']} (conf={prop['confidence']:.2f}, "
+                f"hypothesis={prop['hypothesis_id']})"
+            )
+            for c in prop.get("conflicts", []):
+                lines.append(
+                    f"      ⚠ conflict on {c['type']}: "
+                    f"{c['new_value']} vs {c['other_value']}"
+                )
+        return "\n".join(lines)
+
    async def _list_assets(self, category: str | None = None) -> str:
        results = self.graph.list_assets(category)
        if not results:
--- a/case.example.yaml
+++ b/case.example.yaml
@@ -0,0 +1,41 @@
+# MASForensics case definition — template
+#
+# Copy this file to `case.yaml` and edit it for your case. If `case.yaml`
+# exists in the working directory, `python main.py` loads it automatically;
+# otherwise main.py falls back to interactive single-image selection.
+#
+# A case is a set of evidence sources. Each source has:
+#   id              optional — auto-derived from label if omitted ("src-<slug>")
+#   label           human-readable name
+#   type            disk_image | mobile_extraction | archive | media_collection
+#   access_mode     image | tree   (optional — defaults by type)
+#                     image = block device / disk image, navigated by Sleuth Kit
+#                     tree  = mounted filesystem / unpacked extraction, path-based
+#   owner           optional — the person the source is associated with
+#   path            filesystem path (relative paths resolve against this file)
+#   partition_offset  image-mode only — sector offset of the partition to analyze
+#   meta            optional free-form notes
+#
+# NOTE: at the current refit stage only image-mode (disk) sources are
+# analysable; tree-mode sources are accepted but skipped.
+
+case_id: example-case
+name: "Example forensic case"
+meta:
+  notes: "free-form case-level metadata"
+
+sources:
+  - id: src-suspect-laptop
+    label: "Suspect laptop disk image"
+    type: disk_image
+    access_mode: image
+    owner: "John Doe"
+    path: image/suspect_laptop.E01
+    partition_offset: 0               # run `mmls <image>` to find the right offset
+
+  - id: src-suspect-phone
+    label: "Suspect phone extraction"
+    type: mobile_extraction
+    access_mode: tree
+    owner: "John Doe"
+    path: image/suspect_phone.zip
--- a/case.py
+++ b/case.py
@@ -0,0 +1,226 @@
+"""Case and evidence-source model — the foundation for multi-evidence analysis.
+
+A :class:`Case` is a collection of :class:`EvidenceSource` entries. Each source
+has a *type* (disk image, mobile extraction, archive, ...) and an *access mode*
+that determines how forensic tools reach its contents:
+
+  - ``"image"`` — a block device / disk image, navigated by The Sleuth Kit via
+    inode addressing (raw, E01, dd, ...).
+  - ``"tree"``  — an already-mounted filesystem or unpacked extraction,
+    navigated by ordinary filesystem paths.
+
+This module is pure data model + loading. Partition probing and interactive
+selection live in ``main.py``.
+"""
+
+from __future__ import annotations
+
+import logging
+import re
+from dataclasses import asdict, dataclass, field
+from pathlib import Path
+
+logger = logging.getLogger(__name__)
+
+# Recognised source types and access modes.
+SOURCE_TYPES = {"disk_image", "mobile_extraction", "archive", "media_collection"}
+ACCESS_MODES = {"image", "tree"}
+
+# Disk-image file extensions for interactive discovery.
+# P6 fix: ``.bin`` (and vmdk/vhd) added — extension globbing previously missed
+# raw block-device dumps such as ``blk0_sda.bin``.
+DISK_IMAGE_EXTS = {
+    ".001", ".dd", ".raw", ".img", ".bin", ".e01", ".iso", ".vmdk", ".vhd",
+}
+
+# Default access mode per source type.
+_DEFAULT_ACCESS_MODE = {
+    "disk_image": "image",
+    "mobile_extraction": "tree",
+    "archive": "tree",
+    "media_collection": "tree",
+}
+
+
+def slugify(text: str) -> str:
+    """Reduce *text* to a lowercase, hyphen-separated slug for use in IDs."""
+    slug = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
+    return slug or "src"
+
+
+@dataclass
+class EvidenceSource:
+    """One piece of evidence within a :class:`Case`."""
+
+    id: str                       # "src-<slug>"
+    label: str                    # human-readable name
+    type: str                     # one of SOURCE_TYPES
+    path: str                     # filesystem path to the evidence
+    access_mode: str              # "image" | "tree"
+    owner: str = ""               # associated person, if known
+    partition_offset: int = 0     # sector offset (image-mode sources only)
+    meta: dict = field(default_factory=dict)
+
+    def to_dict(self) -> dict:
+        return asdict(self)
+
+    @classmethod
+    def from_dict(cls, d: dict) -> EvidenceSource:
+        """Reconstruct from a dict, ignoring unknown keys (forward-compatible)."""
+        known = set(cls.__dataclass_fields__)
+        return cls(**{k: v for k, v in d.items() if k in known})
+
+    def summary(self) -> str:
+        loc = (
+            f"@{self.partition_offset}"
+            if self.access_mode == "image" and self.partition_offset
+            else ""
+        )
+        owner = f" owner={self.owner}" if self.owner else ""
+        return f"[{self.id}] {self.label} ({self.type}/{self.access_mode}{loc}){owner}"
+
+
+@dataclass
+class Case:
+    """A forensic case: a set of evidence sources plus metadata."""
+
+    case_id: str
+    name: str
+    sources: list[EvidenceSource] = field(default_factory=list)
+    meta: dict = field(default_factory=dict)
+
+    def to_dict(self) -> dict:
+        return {
+            "case_id": self.case_id,
+            "name": self.name,
+            "sources": [s.to_dict() for s in self.sources],
+            "meta": dict(self.meta),
+        }
+
+    @classmethod
+    def from_dict(cls, d: dict) -> Case:
+        return cls(
+            case_id=d.get("case_id", ""),
+            name=d.get("name", ""),
+            sources=[EvidenceSource.from_dict(s) for s in d.get("sources", [])],
+            meta=dict(d.get("meta") or {}),
+        )
+
+    def get_source(self, source_id: str) -> EvidenceSource | None:
+        for s in self.sources:
+            if s.id == source_id:
+                return s
+        return None
+
+
+# ---------------------------------------------------------------------------
+# case.yaml loading
+# ---------------------------------------------------------------------------
+
+def _build_source(raw: dict, base_dir: Path, index: int) -> EvidenceSource:
+    """Validate and normalise one source entry from case.yaml.
+
+    Missing ``id`` is derived from the label; missing ``access_mode`` defaults
+    by type; relative paths are resolved against *base_dir* (the case file's
+    directory).
+    """
+    label = str(raw.get("label") or raw.get("id") or f"source-{index}")
+    src_type = str(raw.get("type", "disk_image"))
+    if src_type not in SOURCE_TYPES:
+        logger.warning("Unknown source type %r for %r — treating as disk_image",
+                        src_type, label)
+        src_type = "disk_image"
+
+    access_mode = str(raw.get("access_mode") or _DEFAULT_ACCESS_MODE.get(src_type, "tree"))
+    if access_mode not in ACCESS_MODES:
+        logger.warning("Unknown access_mode %r for %r — defaulting", access_mode, label)
+        access_mode = _DEFAULT_ACCESS_MODE.get(src_type, "tree")
+
+    src_id = str(raw.get("id") or f"src-{slugify(label)}")
+    if not src_id.startswith("src-"):
+        src_id = f"src-{slugify(src_id)}"
+
+    raw_path = str(raw.get("path", "")).strip()
+    path = raw_path
+    if raw_path:
+        p = Path(raw_path).expanduser()
+        if not p.is_absolute():
+            p = (base_dir / p)
+        path = str(p)
+
+    return EvidenceSource(
+        id=src_id,
+        label=label,
+        type=src_type,
+        path=path,
+        access_mode=access_mode,
+        owner=str(raw.get("owner") or ""),
+        partition_offset=int(raw.get("partition_offset", 0) or 0),
+        meta=dict(raw.get("meta") or {}),
+    )
+
+
+def build_case(data: dict, base_dir: Path | None = None) -> Case:
+    """Build a validated :class:`Case` from a loosely-typed case.yaml dict."""
+    base_dir = base_dir or Path.cwd()
+    sources: list[EvidenceSource] = []
+    seen_ids: set[str] = set()
+    for i, raw in enumerate(data.get("sources", []) or []):
+        if not isinstance(raw, dict):
+            logger.warning("Skipping malformed source entry #%d", i)
+            continue
+        src = _build_source(raw, base_dir, i)
+        if src.id in seen_ids:
+            src.id = f"{src.id}-{i}"
+        seen_ids.add(src.id)
+        if not src.path:
+            logger.warning("Source %r has no path — keeping but it is not analysable",
+                            src.label)
+        sources.append(src)
+
+    return Case(
+        case_id=str(data.get("case_id") or "case"),
+        name=str(data.get("name") or "Untitled case"),
+        sources=sources,
+        meta=dict(data.get("meta") or {}),
+    )
+
+
+def load_case(path: str | Path = "case.yaml") -> Case | None:
+    """Load a :class:`Case` from a case.yaml file. Returns None if absent."""
+    case_path = Path(path)
+    if not case_path.exists():
+        return None
+    import yaml
+
+    try:
+        data = yaml.safe_load(case_path.read_text(encoding="utf-8")) or {}
+    except Exception as e:
+        logger.error("Failed to parse %s: %s", case_path, e)
+        return None
+    if not isinstance(data, dict):
+        logger.error("%s is not a YAML mapping", case_path)
+        return None
+
+    case = build_case(data, base_dir=case_path.resolve().parent)
+    logger.info("Loaded case %r with %d source(s) from %s",
+                case.name, len(case.sources), case_path)
+    return case
+
+
+def single_source_case(
+    image_path: str,
+    partition_offset: int = 0,
+    label: str | None = None,
+) -> Case:
+    """Wrap a single disk image as a one-source Case (interactive fallback)."""
+    name = label or Path(image_path).name
+    src = EvidenceSource(
+        id=f"src-{slugify(Path(image_path).stem)}",
+        label=name,
+        type="disk_image",
+        path=image_path,
+        access_mode="image",
+        partition_offset=partition_offset,
+    )
+    return Case(case_id="adhoc", name=name, sources=[src])
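The id-derivation rule in `_build_source` can be illustrated in isolation. The sketch below copies `slugify` from `case.py` and wraps the id logic in `derive_source_id`, a hypothetical helper that does not exist in the module; it is meant only to show which ids a given `case.yaml` entry would end up with.

```python
import re


def slugify(text: str) -> str:
    # Same rule as case.py: lowercase, runs of non-alphanumerics become hyphens.
    slug = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
    return slug or "src"


def derive_source_id(raw: dict, index: int = 0) -> str:
    # Hypothetical helper mirroring the id logic inside _build_source.
    label = str(raw.get("label") or raw.get("id") or f"source-{index}")
    src_id = str(raw.get("id") or f"src-{slugify(label)}")
    if not src_id.startswith("src-"):
        # An explicit id without the prefix is slugified and prefixed.
        src_id = f"src-{slugify(src_id)}"
    return src_id


print(derive_source_id({"label": "Suspect phone extraction"}))
print(derive_source_id({"id": "Laptop #1", "label": "Laptop"}))
print(derive_source_id({"id": "src-usb"}))
```

An explicit id that already starts with `src-` is kept verbatim, so it is not slugified; only ids lacking the prefix are normalised.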
--- a/config.example.yaml
+++ b/config.example.yaml
@@ -0,0 +1,71 @@
+# MASForensics Configuration — template.
+#
+# Copy this file to `config.yaml` and fill in your API key. config.yaml is
+# git-ignored so secrets don't land in commits. The two files share schema;
+# only this template is tracked.
+
+agent:
+  base_url: "https://api.deepseek.com"
+  api_key: "YOUR-API-KEY-HERE"
+  model: "deepseek-v4-pro"
+  max_tokens: 16384
+  reasoning_effort: "high"      # DeepSeek/o1-style reasoning depth; omit to disable
+  thinking_enabled: true         # DeepSeek extra_body.thinking switch
+
+# Maximum rounds of hypothesis-directed investigation (Phase 3).
+# Only consulted when strategist.enabled is false (legacy fallback path).
+max_investigation_rounds: 1
+
+# Phase 3 strategist loop (DESIGN_STRATEGIST.md). When enabled, the
+# InvestigationStrategist agent decides each round whether to propose new
+# leads or declare the investigation complete. When disabled, the legacy
+# fixed-round investigation loop runs instead.
+strategist:
+  enabled: true
+  max_rounds: 10
+  # Safety net: if the strategist keeps proposing leads but yield (new
+  # phenomena + edges + status flips) is zero for this many consecutive
+  # rounds, the orchestrator force-stops Phase 3 regardless.
+  hard_stop_marginal_yield_zero_rounds: 3
+
+# Hard caps that bound the whole run. The strategist's budget_status tool
+# reads these to pace its proposals; the orchestrator also enforces them
+# as hard stops (DESIGN_STRATEGIST.md §4.2 step 7). Comment out any cap
+# to make it unbounded.
+budgets:
+  tool_calls_total: 5000
+  strategist_rounds_max: 10
+  wall_clock_minutes_max: 480
+
+# Optional: override the per-edge-type log₁₀(LR) calibration table.
+# Confidence updates accumulate these in odds space (additive,
+# order-independent), then map back to probability via sigmoid. A single
+# edge applied to a neutral 0.5 confidence: ≥ +0.602 reaches the 0.8
+# supported threshold, ≤ −0.602 reaches the 0.2 refuted threshold.
+# If omitted, evidence_graph._DEFAULT_LOG_LR is used.
+# hypothesis_log_lr:
+#   direct_evidence: 2.0
+#   supports: 1.0
+#   consequence_observed: 1.0
+#   prerequisite_met: 0.5
+#   weakens: -0.5
+#   contradicts: -2.0
+
+# Optional: manually specify initial hypotheses. If omitted, the
+# HypothesisAgent auto-generates them from Phase 1 findings.
+# hypotheses:
+#   - title: "..."
+#     description: "..."
+
+# Investigation areas — LLM-derived from active hypotheses after Phase 2.
+# Each entry below acts as a MANUAL OVERRIDE: it is seeded into the graph
+# before the LLM derives areas, so manual entries always survive (slug-based
+# dedupe; LLM only augments keyword/tool lists, never overwrites).
+#
+# investigation_areas:
+#   - area: shutdown_time
+#     description: "Last recorded shutdown time"
+#     agent: registry
+#     priority: 3
+#     keywords: [shutdown, last shutdown]
+#     tools: [get_shutdown_time]
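The ±0.602 figure in the `hypothesis_log_lr` comment follows from log₁₀(0.8 / 0.2) = log₁₀ 4 ≈ 0.602. The sketch below is an illustration only, not the code in `evidence_graph.py`: it assumes a neutral 0.5 prior and a base-10 sigmoid, and `confidence_from_log_lr` is a hypothetical name.

```python
import math


def confidence_from_log_lr(log_lrs, prior=0.5):
    # Work in log10-odds space: edge contributions simply add up,
    # so the order in which they arrive does not matter.
    prior_log_odds = math.log10(prior / (1.0 - prior))
    total = prior_log_odds + sum(log_lrs)
    # Base-10 sigmoid maps log10-odds back to a probability.
    return 1.0 / (1.0 + 10.0 ** (-total))


# One edge at the threshold magnitude lands at roughly 0.8 / 0.2.
print(round(confidence_from_log_lr([math.log10(4)]), 3))
print(round(confidence_from_log_lr([-math.log10(4)]), 3))
# A "supports" edge (1.0) followed by a "weakens" edge (-0.5), using the
# example values from the commented-out table above.
print(round(confidence_from_log_lr([1.0, -0.5]), 3))
```

Under these assumptions a single `prerequisite_met` edge (0.5) would not reach the 0.8 threshold on its own, while a single `supports` edge (1.0) would.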
--- a/evidence_graph.py
+++ b/evidence_graph.py
--- a/llm_client.py
+++ b/llm_client.py
@@ -1,8 +1,10 @@
-"""Custom LLM client using httpx for Claude Messages API via third-party proxy.
+"""LLM client via the OpenAI SDK (works with DeepSeek's OpenAI-compatible API).

-The proxy does not support Claude's native tool_use format (it strips the `tools`
-field from requests). So we embed tool definitions in the system prompt and parse
-structured JSON tool calls from the model's text output (ReAct-style).
+Tool calling uses the OpenAI-native `tools=[...]` parameter. The model
+returns structured tool_calls via the streaming protocol; we accumulate
+them, dispatch to our executors, and feed results back as `role: "tool"`
+messages. This eliminates the fragile "model writes JSON inside free
+text" problem of the previous ReAct text mode.
 """

 from __future__ import annotations
@@ -18,6 +20,7 @@ from dataclasses import dataclass, field
 from typing import Any

 import httpx
+from openai import APIConnectionError, APIError, APITimeoutError, AsyncOpenAI

 logger = logging.getLogger(__name__)

@@ -30,69 +33,81 @@ class LLMAPIError(Exception):
        self.attempts = attempts


-# Markers the model uses to signal tool calls and final answers
-TOOL_CALL_TAG = "<tool_call>"
-TOOL_CALL_END = "</tool_call>"
-TOOL_RESULT_TAG = "<tool_result>"
-TOOL_RESULT_END = "</tool_result>"
+# Optional answer tags — kept for backward compat with prompts that wrap
+# their final response in <answer>...</answer>. Native tool calling does
+# not need these (no tool_calls = final), but if the model continues to
+# emit them, we strip the tags so callers see clean text.
 ANSWER_TAG = "<answer>"
 ANSWER_END = "</answer>"


-def _build_tools_prompt(tools: list[dict]) -> str:
-    """Format tool definitions for inclusion in the system prompt."""
-    lines = ["You have access to the following tools:\n"]
-    for t in tools:
-        schema = t.get("input_schema", {})
-        props = schema.get("properties", {})
-        required = schema.get("required", [])
+def _to_openai_tools(tools: list[dict]) -> list[dict]:
+    """Convert internal tool definitions to OpenAI native function-tools format."""
+    return [
+        {
+            "type": "function",
+            "function": {
+                "name": t["name"],
+                "description": t["description"],
+                "parameters": t.get("input_schema", {"type": "object", "properties": {}}),
+            },
+        }
+        for t in tools
+    ]

-        params = []
-        for pname, pdef in props.items():
-            req = " (required)" if pname in required else ""
-            desc = pdef.get("description", "")
-            ptype = pdef.get("type", "string")
-            enum_vals = pdef.get("enum")
-            if enum_vals:
-                allowed = ", ".join(f'"{v}"' for v in enum_vals)
-                params.append(f"    - {pname}: {ptype}{req} — {desc} Allowed values: [{allowed}]")
-            else:
-                params.append(f"    - {pname}: {ptype}{req} — {desc}")

-        param_block = "\n".join(params) if params else "    (no parameters)"
-        lines.append(f"## {t['name']}\n{t['description']}\nParameters:\n{param_block}\n")
+def _extract_first_balanced(text: str, open_char: str, close_char: str) -> str | None:
+    """Return the first balanced [...] or {...} substring, or None if no balanced pair.

-    lines.append(
-        "## How to use tools\n"
-        "To call a tool, output a JSON block wrapped in XML tags like this:\n"
-        f"{TOOL_CALL_TAG}\n"
-        '{"name": "tool_name", "arguments": {"param1": "value1"}}\n'
-        f"{TOOL_CALL_END}\n\n"
-        "You can call multiple tools in sequence. After each tool call, you will receive the result in:\n"
-        f"{TOOL_RESULT_TAG}\n...result...\n{TOOL_RESULT_END}\n\n"
-        "When you have finished your analysis and have a final answer, wrap it in:\n"
-        f"{ANSWER_TAG}\nyour final answer here\n{ANSWER_END}\n\n"
-        "Think step by step. Call tools to gather evidence before drawing conclusions.\n"
-        "You MUST call at least one tool before giving your final answer."
+    Depth-counting scan that handles nested brackets correctly (regex with
+    .*? would truncate at the first inner closing bracket, regex with .* would
+    over-eat trailing text). Brackets inside JSON string literals are counted
+    too, so a stray bracket in a string can skew the span; callers pass the
+    result through json.loads, which rejects a span that is not valid JSON.
+    """
+    start = text.find(open_char)
+    if start < 0:
+        return None
+    depth = 0
+    for i in range(start, len(text)):
+        c = text[i]
+        if c == open_char:
+            depth += 1
+        elif c == close_char:
+            depth -= 1
+            if depth == 0:
+                return text[start:i + 1]
+    return None
+
+
+def _safe_json_loads(text: str):
+    """Parse JSON with progressive sanitization for LLM-produced output.
+
+    Tries (0) as-is, (1) escape stray backslashes outside valid JSON escapes
+    (\\" \\\\ \\/ \\b \\f \\n \\r \\t \\uXXXX). On final failure, logs raw
+    input (first 600 chars) so we can diagnose what the model emitted.
+
+    Used by orchestrator JSON callsites (_call_llm_for_json) and by
+    tool_call_loop when parsing tool_call arguments returned by the API.
+    """
+    try:
+        return json.loads(text)
+    except json.JSONDecodeError:
+        pass
+
+    stage1 = re.sub(
+        r'\\(?!["\\/bfnrt]|u[0-9a-fA-F]{4})',
+        r'\\\\',
+        text,
    )
-    return "\n".join(lines)
-
-
-def _extract_tool_calls(text: str) -> list[dict]:
-    """Extract tool call JSON blocks from model output."""
-    pattern = re.compile(
-        re.escape(TOOL_CALL_TAG) + r"\s*(.*?)\s*" + re.escape(TOOL_CALL_END),
-        re.DOTALL,
-    )
-    calls = []
-    for match in pattern.finditer(text):
-        raw = match.group(1).strip()
-        try:
-            parsed = json.loads(raw)
-            calls.append(parsed)
-        except json.JSONDecodeError:
-            logger.warning("Failed to parse tool call JSON: %s", raw[:200])
-    return calls
+    try:
+        return json.loads(stage1)
+    except json.JSONDecodeError as e:
+        logger.warning(
+            "_safe_json_loads failed after sanitize (%s); raw head[:600]=%r",
+            e, text[:600],
+        )
+        raise


 def _extract_answer(text: str) -> str | None:
@@ -127,6 +142,14 @@ READ_ONLY_TOOLS: set[str] = {
    # Parser reads
    "read_text_file", "read_binary_preview", "search_text_file",
    "read_text_file_section", "list_extracted_dir", "parse_pcap_strings",
+    "find_files",
+    # iOS plugin reads (S4)
+    "parse_plist", "sqlite_tables", "sqlite_query",
+    "parse_ios_keychain", "read_idevice_info",
+    # Android + media reads (S6) — set_active_partition is NOT read-only.
+    "probe_android_partitions", "ocr_image",
+    # Strategist view tools (DESIGN_STRATEGIST.md §2) — pure renders.
+    "graph_overview", "source_coverage", "marginal_yield", "budget_status",
 }


@@ -234,50 +257,41 @@ _DECAY_TIERS: list[tuple[int, int]] = [


 def _apply_progressive_decay(messages: list[dict]) -> list[dict]:
-    """Truncate tool results in older messages to save context space.
+    """Truncate the `content` of older `role: "tool"` messages to save context.

-    Operates in-place-style on a copy. Only touches user messages that
-    contain <tool_result> blocks (these are the tool-result messages
-    generated by tool_call_loop).
+    Each `role: "tool"` message in the conversation corresponds to one tool
+    call's result. We rank these messages by recency and progressively
+    truncate older ones according to `_DECAY_TIERS`.
    """
-    # Count rounds from the end. A "round" is a (assistant, user) pair.
-    # messages alternate: [user, assistant, user, assistant, user, ...]
-    # The initial user message is index 0, then pairs start at index 1.
    total = len(messages)
-    if total <= 10:  # not enough messages to bother
+    if total <= 10:
        return messages

-    result = []
-    # Count tool-result user messages from the end
-    tool_result_indices = [
-        i for i, m in enumerate(messages)
-        if m["role"] == "user" and TOOL_RESULT_TAG in m.get("content", "")
+    tool_msg_indices = [
+        i for i, m in enumerate(messages) if m.get("role") == "tool"
    ]

-    # Build a set of indices that need decay, mapped to their max_chars
    decay_map: dict[int, int] = {}
-    n_tool_msgs = len(tool_result_indices)
-    for rank, idx in enumerate(reversed(tool_result_indices)):
-        rounds_ago = rank  # 0 = most recent, 1 = second most recent, ...
+    for rank, idx in enumerate(reversed(tool_msg_indices)):
+        rounds_ago = rank
        for threshold, max_chars in _DECAY_TIERS:
            if rounds_ago < threshold:
                decay_map[idx] = max_chars
                break

+    result = []
    for i, msg in enumerate(messages):
        if i in decay_map:
            max_chars = decay_map[i]
-            content = msg["content"]
+            content = msg.get("content", "") or ""
            if len(content) > max_chars + 200:
-                # Truncate but preserve the tool_result tags structure
-                truncated = content[:max_chars]
-                # Count how many tool results are in this message
-                n_results = content.count(TOOL_RESULT_TAG)
-                truncated += (
-                    f"\n... [context compressed: {len(content)} -> {max_chars} chars, "
-                    f"{n_results} tool result(s)]"
+                truncated = (
+                    content[:max_chars]
+                    + f"\n... [context compressed: {len(content)} -> {max_chars} chars]"
                )
-                result.append({"role": msg["role"], "content": truncated})
+                new_msg = dict(msg)
+                new_msg["content"] = truncated
+                result.append(new_msg)
            else:
                result.append(msg)
        else:
@@ -301,44 +315,51 @@ _FOLD_SUMMARY_SYSTEM = (


 class LLMClient:
-    """Calls Claude Messages API through a third-party proxy using raw httpx.
+    """Async LLM client via the OpenAI SDK.

-    Uses prompt-based tool calling (ReAct pattern) since the proxy does not
-    support Claude's native tool_use format.
+    Works with any OpenAI-compatible endpoint (OpenAI, DeepSeek, ...).
+    Tool calling uses the OpenAI-native `tools` parameter; see module docstring.
    """

    def __init__(
        self,
        base_url: str,
        api_key: str,
-        model: str = "claude-sonnet-4-6",
+        model: str = "deepseek-v4-pro",
        max_tokens: int = 4096,
        proxy: str | None = "auto",
+        reasoning_effort: str | None = None,
+        thinking_enabled: bool = False,
    ) -> None:
        self.base_url = base_url.rstrip("/")
        self.api_key = api_key
        self.model = model
        self.max_tokens = max_tokens
-        # proxy="auto": read from env; proxy=None/""/"none": no proxy; proxy="http://...": use it
+        self.reasoning_effort = reasoning_effort
+        self.thinking_enabled = thinking_enabled
+
+        # proxy="auto": read from env; proxy=None/""/"none": no proxy
        if proxy == "auto":
            proxy_url = os.environ.get("https_proxy") or os.environ.get("HTTPS_PROXY")
        elif proxy and proxy.lower() != "none":
            proxy_url = proxy
        else:
            proxy_url = None
-        self._client = httpx.AsyncClient(
+
+        http_client = (
+            httpx.AsyncClient(proxy=proxy_url, timeout=300.0)
+            if proxy_url else None
+        )
+
+        self._client = AsyncOpenAI(
+            api_key=self.api_key,
            base_url=self.base_url,
-            headers={
-                "x-api-key": self.api_key,
-                "anthropic-version": "2023-06-01",
-                "content-type": "application/json",
-            },
            timeout=300.0,
-            proxy=proxy_url,
+            http_client=http_client,
        )

    async def close(self) -> None:
-        await self._client.aclose()
+        await self._client.close()

    async def chat(
        self,
@@ -346,169 +367,304 @@ class LLMClient:
        system: str | None = None,
        max_retries: int = 5,
    ) -> str:
-        """Send a streaming chat request and return the assembled text response.
+        """Send a streaming chat completion and return the assembled text."""
+        full_messages: list[dict] = []
+        if system:
+            full_messages.append({"role": "system", "content": system})
+        full_messages.extend(messages)

-        Uses SSE streaming to keep the connection alive and avoid gateway
-        timeouts (504/524) on long-running completions.
-        """
-        import asyncio as _asyncio
-
-        payload: dict[str, Any] = {
+        kwargs: dict[str, Any] = {
            "model": self.model,
+            "messages": full_messages,
            "max_tokens": self.max_tokens,
-            "messages": messages,
            "stream": True,
        }
-        if system:
-            payload["system"] = system
+        if self.reasoning_effort:
+            kwargs["reasoning_effort"] = self.reasoning_effort
+        if self.thinking_enabled:
+            kwargs["extra_body"] = {"thinking": {"type": "enabled"}}

        for attempt in range(max_retries):
-            logger.debug("LLM request (stream): %d messages (attempt %d)", len(messages), attempt + 1)
+            logger.debug(
+                "LLM request (stream): %d messages (attempt %d)",
+                len(messages), attempt + 1,
+            )
            text_parts: list[str] = []
            try:
-                async with self._client.stream(
-                    "POST", "/v1/messages", json=payload,
-                ) as resp:
-                    # Check for HTTP errors before consuming stream
-                    if resp.status_code >= 400:
-                        body = await resp.aread()
-                        raise httpx.HTTPStatusError(
-                            f"Server error '{resp.status_code}' for url '{resp.url}'",
-                            request=resp.request,
-                            response=resp,
-                        )
-
-                    # Parse SSE events
-                    async for line in resp.aiter_lines():
-                        if not line.startswith("data: "):
-                            continue
-                        data_str = line[6:]  # strip "data: " prefix
-                        if data_str.strip() == "[DONE]":
-                            break
-                        try:
-                            event = json.loads(data_str)
-                        except json.JSONDecodeError:
-                            continue
-
-                        event_type = event.get("type", "")
-                        if event_type == "content_block_delta":
-                            delta = event.get("delta", {})
-                            if delta.get("type") == "text_delta":
-                                text_parts.append(delta["text"])
-                        elif event_type == "message_stop":
-                            break
-                        elif event_type == "error":
-                            err_msg = event.get("error", {}).get("message", "Unknown streaming error")
-                            raise httpx.HTTPStatusError(
-                                err_msg, request=resp.request, response=resp,
-                            )
+                stream = await self._client.chat.completions.create(**kwargs)
+                async for chunk in stream:
+                    if not chunk.choices:
+                        continue
+                    delta = chunk.choices[0].delta
+                    if delta.content:
+                        text_parts.append(delta.content)

                text = "".join(text_parts)
                logger.debug("LLM response (stream): %d chars", len(text))
                return text

-            except (httpx.HTTPStatusError, httpx.ConnectError, httpx.ReadTimeout, httpx.RemoteProtocolError) as e:
+            except (APIConnectionError, APITimeoutError, APIError) as e:
                if attempt < max_retries - 1:
                    wait = 2 ** attempt * 10
                    logger.warning("Request failed (%s), retrying in %ds...", e, wait)
-                    await _asyncio.sleep(wait)
+                    await asyncio.sleep(wait)
                else:
                    raise LLMAPIError(
                        f"LLM API unreachable after {max_retries} attempts: {e}",
                        attempts=max_retries,
                    ) from e

-        # Should not reach here, but just in case
        return ""

+    async def _chat_with_tools(
+        self,
+        messages: list[dict],
+        openai_tools: list[dict],
+        max_retries: int = 5,
+    ) -> tuple[str, str | None, list[dict]]:
+        """Stream a chat completion with native tool calling enabled.
+
+        Returns:
+            (text_content, reasoning_content, raw_tool_calls).
+            - reasoning_content is non-None when DeepSeek thinking mode is
+              active; the caller MUST echo it back in the assistant message
+              on subsequent requests, or the API returns HTTP 400.
+            - raw_tool_calls is a list of {"id","name","arguments"} dicts;
+              arguments is the raw JSON string returned by the API.
+        """
+        kwargs: dict[str, Any] = {
+            "model": self.model,
+            "messages": messages,
+            "max_tokens": self.max_tokens,
+            "stream": True,
+            "tools": openai_tools,
+        }
+        if self.reasoning_effort:
+            kwargs["reasoning_effort"] = self.reasoning_effort
+        if self.thinking_enabled:
+            kwargs["extra_body"] = {"thinking": {"type": "enabled"}}
+
+        for attempt in range(max_retries):
+            logger.debug(
+                "LLM request (stream+tools): %d messages, %d tools (attempt %d)",
+                len(messages), len(openai_tools), attempt + 1,
+            )
+            text_parts: list[str] = []
+            reasoning_parts: list[str] = []
+            tool_calls_acc: dict[int, dict] = {}  # index -> {id, name, arguments}
+            try:
+                stream = await self._client.chat.completions.create(**kwargs)
+                async for chunk in stream:
+                    if not chunk.choices:
+                        continue
+                    delta = chunk.choices[0].delta
+                    if delta.content:
+                        text_parts.append(delta.content)
+                    # DeepSeek thinking-mode: reasoning_content is returned
+                    # alongside content and MUST be echoed back on subsequent
+                    # requests, otherwise the API rejects with HTTP 400.
+                    rc = getattr(delta, "reasoning_content", None)
+                    if rc:
+                        reasoning_parts.append(rc)
+                    if delta.tool_calls:
+                        for tc_delta in delta.tool_calls:
+                            idx = tc_delta.index
+                            entry = tool_calls_acc.setdefault(
+                                idx, {"id": None, "name": None, "arguments": ""},
+                            )
+                            if tc_delta.id:
+                                entry["id"] = tc_delta.id
+                            fn = tc_delta.function
+                            if fn:
+                                if fn.name:
+                                    entry["name"] = fn.name
+                                if fn.arguments:
+                                    entry["arguments"] += fn.arguments
+
+                text = "".join(text_parts)
+                reasoning = "".join(reasoning_parts) or None
+                ordered = [tool_calls_acc[i] for i in sorted(tool_calls_acc)]
+                logger.debug(
+                    "LLM response (stream+tools): %d chars, %d reasoning chars, %d tool calls",
+                    len(text), len(reasoning or ""), len(ordered),
+                )
+                return text, reasoning, ordered
+
+            except (APIConnectionError, APITimeoutError, APIError) as e:
+                if attempt < max_retries - 1:
+                    wait = 2 ** attempt * 10
+                    logger.warning(
+                        "Tool-call request failed (%s), retrying in %ds...", e, wait,
+                    )
+                    await asyncio.sleep(wait)
+                else:
+                    raise LLMAPIError(
+                        f"LLM API unreachable after {max_retries} attempts: {e}",
+                        attempts=max_retries,
+                    ) from e
+
+        return "", None, []
+
    async def tool_call_loop(
        self,
        messages: list[dict],
        tools: list[dict],
        tool_executor: dict[str, Any],
        system: str | None = None,
-        max_iterations: int = 40,
+        max_iterations: int = 60,
+        terminal_tools: tuple[str, ...] = (),
    ) -> tuple[str, list[dict]]:
-        """Run a ReAct-style tool-calling loop.
+        """Run a tool-calling loop using OpenAI-native tool calls.

-        The model outputs <tool_call> blocks which we parse and execute,
-        feeding results back as <tool_result> blocks until the model
-        outputs an <answer> block.
+        The model returns structured `tool_calls` in its message; we
+        dispatch them through our executor dict and feed each result back
+        as a `role: "tool"` message with the matching `tool_call_id`. The
+        loop ends when:
+          - the model returns a message with no tool_calls (normal exit), or
+          - any tool in `terminal_tools` is called — in that case, the loop
+            short-circuits with that tool's result text as final_text. This
+            gives agents (notably ReportAgent) an explicit completion signal
+            that the old `<answer>` text tag used to provide.

        Returns:
-            (final_text, all_messages)
+            (final_text, full_message_history)
        """
-        # Build system prompt with tool definitions
-        tools_prompt = _build_tools_prompt(tools)
-        full_system = f"{system}\n\n{tools_prompt}" if system else tools_prompt
+        terminal_set = set(terminal_tools)
+        openai_tools = _to_openai_tools(tools)

-        messages = list(messages)  # don't mutate caller's list
-        _folded = False  # Track whether we've already folded once this loop
+        # The caller may pass `messages` either as raw conversation (no system)
+        # together with `system=...`, OR as a complete history that already
+        # starts with the system message (retry path). Accept both shapes.
+        if messages and messages[0].get("role") == "system":
+            full_messages: list[dict] = list(messages)
+        else:
+            full_messages = []
+            if system:
+                full_messages.append({"role": "system", "content": system})
+            full_messages.extend(messages)
+        _folded = False

-        for i in range(max_iterations):
+        for _i in range(max_iterations):
            # ── Context compression before each API call ──────────────
-            # Stage A: progressively decay old tool results
-            messages = _apply_progressive_decay(messages)
-
-            # Stage B: fold oldest messages into LLM summary if too long
-            if not _folded and len(messages) > _FOLD_THRESHOLD:
-                messages = await self._fold_old_messages(messages, full_system)
+            full_messages = _apply_progressive_decay(full_messages)
+            if not _folded and len(full_messages) > _FOLD_THRESHOLD:
+                full_messages = await self._fold_old_messages(full_messages)
                _folded = True
-            elif _folded and len(messages) > _FOLD_THRESHOLD + _FOLD_KEEP_RECENT:
-                # Allow a second fold if messages grew back significantly
-                messages = await self._fold_old_messages(messages, full_system)
+            elif _folded and len(full_messages) > _FOLD_THRESHOLD + _FOLD_KEEP_RECENT:
+                full_messages = await self._fold_old_messages(full_messages)

-            text = await self.chat(messages, system=full_system)
+            text, reasoning, raw_tool_calls = await self._chat_with_tools(
+                full_messages, openai_tools,
+            )

-            # Check for final answer
-            answer = _extract_answer(text)
-            if answer is not None:
-                messages.append({"role": "assistant", "content": text})
-                return answer, messages
+            if not raw_tool_calls:
+                # Model produced a final response. Strip optional <answer>
+                # tags for backward compatibility with old prompts.
+                final_msg: dict[str, Any] = {"role": "assistant", "content": text}
+                if reasoning:
+                    final_msg["reasoning_content"] = reasoning
+                full_messages.append(final_msg)
+                answer = _extract_answer(text)
+                return (answer if answer is not None else text), full_messages

-            # Check for tool calls
-            tool_calls = _extract_tool_calls(text)
+            # Parse arguments + build internal call dicts
+            parsed_calls: list[dict] = []
+            for rc in raw_tool_calls:
+                args_str = rc.get("arguments", "") or ""
+                try:
+                    args = _safe_json_loads(args_str) if args_str.strip() else {}
+                except (json.JSONDecodeError, ValueError) as e:
+                    logger.warning(
+                        "Failed to parse arguments for tool %s: %s",
+                        rc.get("name"), e,
+                    )
+                    args = {}
+                parsed_calls.append({
+                    "id": rc.get("id"),
+                    "name": rc.get("name", ""),
+                    "arguments": args,
+                })

-            if not tool_calls:
-                # No tool calls and no answer tag — treat entire text as answer
-                messages.append({"role": "assistant", "content": text})
-                return text, messages
+            # Append the assistant turn with the raw tool_calls (and the
+            # DeepSeek-mandated reasoning_content echo-back), then execute.
+            asst_msg: dict[str, Any] = {
+                "role": "assistant",
+                "content": text or None,
+                "tool_calls": [
+                    {
+                        "id": rc.get("id"),
+                        "type": "function",
+                        "function": {
+                            "name": rc.get("name", ""),
+                            "arguments": rc.get("arguments", "") or "",
+                        },
+                    }
+                    for rc in raw_tool_calls
+                ],
+            }
+            if reasoning:
+                asst_msg["reasoning_content"] = reasoning
+            full_messages.append(asst_msg)

-            # Execute tool calls — read-only tools run in parallel
-            messages.append({"role": "assistant", "content": text})
-
-            result_parts = []
-            batches = _partition_tool_calls(tool_calls)
+            batches = _partition_tool_calls(parsed_calls)
            t_batch_start = time.monotonic()
-
+            # Each entry: (tool_call_dict, raw_result, formatted_for_llm)
+            executed: list[tuple[dict, str, str]] = []
            for batch in batches:
                if batch.is_read_only and len(batch.calls) > 1:
-                    batch_results = await self._execute_tool_batch_parallel(
+                    results = await self._execute_tool_batch_parallel(
                        batch.calls, tool_executor, tools,
                    )
-                    result_parts.extend(batch_results)
+                    for tc, (raw, formatted) in zip(batch.calls, results):
+                        executed.append((tc, raw, formatted))
                else:
                    for tc in batch.calls:
-                        result_parts.append(
-                            await self._execute_single_tool(tc, tool_executor, tools)
+                        raw, formatted = await self._execute_single_tool(
+                            tc, tool_executor, tools,
                        )
+                        executed.append((tc, raw, formatted))

-            # Emit folded tool-call summary for the terminal
            t_batch_elapsed = time.monotonic() - t_batch_start
-            _emit_tool_call_summary(tool_calls, t_batch_elapsed)
+            _emit_tool_call_summary(parsed_calls, t_batch_elapsed)

-            # Feed results back as a user message
-            result_message = "\n\n".join(result_parts)
-            messages.append({"role": "user", "content": result_message})
+            # Append formatted tool results to the conversation (this is
+            # what the LLM sees on subsequent rounds — truncated for context
+            # economy).
+            for tc, _raw, formatted in executed:
+                full_messages.append({
+                    "role": "tool",
+                    "tool_call_id": tc["id"],
+                    "content": formatted,
+                })
+
+            # Terminal-tool short-circuit: if the model called any tool in
+            # `terminal_tools`, end the loop immediately. The terminal tool's
+            # RAW result (untruncated) becomes final_text — the LLM may have
+            # produced a 20K-char report via save_report and we must not
+            # truncate it just because the LLM-facing copy is truncated.
+            if terminal_set:
+                for tc, raw, _formatted in executed:
+                    name = tc.get("name", "")
+                    if name in terminal_set:
+                        logger.info(
+                            "Terminal tool %s called — exiting tool_call_loop", name,
+                        )
+                        return raw, full_messages

        logger.warning("Tool call loop hit max iterations (%d)", max_iterations)
-        return "[Max tool call iterations reached]", messages
+        return "[Max tool call iterations reached]", full_messages

    async def _execute_single_tool(
        self, tc: dict, tool_executor: dict[str, Any],
        tools: list[dict] | None = None,
-    ) -> str:
-        """Execute a single tool call and return the formatted result."""
+    ) -> tuple[str, str]:
+        """Execute a single tool call.
+
+        Returns (raw_result, formatted_for_llm). `raw_result` is the
+        unmodified executor return (used by terminal-tool short-circuit as
+        final_text). `formatted_for_llm` is `[tool_name] {truncated}` and
+        is what gets fed back to the model as the tool message content.
+        """
        tool_name = tc.get("name", "")
        tool_args = tc.get("arguments", {})

@@ -519,72 +675,106 @@ class LLMClient:

        executor = tool_executor.get(tool_name)
        if executor is None:
-            result_text = f"Error: unknown tool '{tool_name}'"
+            raw = f"Error: unknown tool '{tool_name}'"
        else:
            try:
-                result_text = await executor(**tool_args)
+                raw = await executor(**tool_args)
            except Exception as e:
                logger.error("Tool %s failed: %s", tool_name, e)
-                result_text = f"Error executing {tool_name}: {e}"
+                raw = f"Error executing {tool_name}: {e}"

-        return (
-            f"{TOOL_RESULT_TAG}\n"
-            f"[{tool_name}] {_truncate_tool_result(result_text)}\n"
-            f"{TOOL_RESULT_END}"
-        )
+        formatted = f"[{tool_name}] {_truncate_tool_result(raw)}"
+        return raw, formatted

    async def _execute_tool_batch_parallel(
        self, calls: list[dict], tool_executor: dict[str, Any],
        tools: list[dict] | None = None,
-    ) -> list[str]:
-        """Execute multiple read-only tool calls concurrently."""
+    ) -> list[tuple[str, str]]:
+        """Execute multiple read-only tool calls concurrently.
+
+        Returns a list of (raw_result, formatted_for_llm) tuples in the
+        same order as `calls`.
+        """
        logger.info("Executing %d read-only tools in parallel", len(calls))

-        async def _run_one(tc: dict) -> str:
+        async def _run_one(tc: dict) -> tuple[str, str]:
            tool_name = tc.get("name", "")
            tool_args = tc.get("arguments", {})
            if tools:
                tool_args = _fix_tool_args(tool_name, tool_args, tools)
-            logger.info("Calling tool (parallel): %s(%s)", tool_name, json.dumps(tool_args, ensure_ascii=False))
+            logger.info(
+                "Calling tool (parallel): %s(%s)",
+                tool_name, json.dumps(tool_args, ensure_ascii=False),
+            )
            executor = tool_executor.get(tool_name)
            if executor is None:
-                result_text = f"Error: unknown tool '{tool_name}'"
+                raw = f"Error: unknown tool '{tool_name}'"
            else:
                try:
-                    result_text = await executor(**tool_args)
+                    raw = await executor(**tool_args)
                except Exception as e:
                    logger.error("Tool %s failed: %s", tool_name, e)
-                    result_text = f"Error executing {tool_name}: {e}"
-            return (
-                f"{TOOL_RESULT_TAG}\n"
-                f"[{tool_name}] {_truncate_tool_result(result_text)}\n"
-                f"{TOOL_RESULT_END}"
-            )
+                    raw = f"Error executing {tool_name}: {e}"
+            formatted = f"[{tool_name}] {_truncate_tool_result(raw)}"
+            return raw, formatted

        results = await asyncio.gather(*[_run_one(tc) for tc in calls])
        return list(results)

    async def _fold_old_messages(
-        self, messages: list[dict], system: str,
+        self, messages: list[dict],
    ) -> list[dict]:
        """Fold old messages into an LLM-generated summary (Stage B).

-        Keeps the most recent _FOLD_KEEP_RECENT messages intact and
-        replaces earlier ones with a single summary message.
+        Preserves the leading system message (if any), keeps the most
+        recent _FOLD_KEEP_RECENT messages intact, and replaces the older
+        middle slice with a single summary user message.
        """
-        n_to_fold = len(messages) - _FOLD_KEEP_RECENT
+        # Pin the system message — it must NEVER be summarized away.
+        system_msgs: list[dict] = []
+        body = messages
+        if messages and messages[0].get("role") == "system":
+            system_msgs = [messages[0]]
+            body = messages[1:]
+
+        n_to_fold = len(body) - _FOLD_KEEP_RECENT
        if n_to_fold <= 2:
            return messages

-        old_messages = messages[:n_to_fold]
-        recent_messages = messages[n_to_fold:]
+        # Pull the fold boundary forward so we never split an assistant turn
+        # from its matching tool results. The API rejects (HTTP 400) any
+        # `role: "tool"` message that is not preceded by the `assistant`
+        # message whose `tool_calls` it answers. While the head of the
+        # would-be `recent_messages` slice is a `role: "tool"` message, we
+        # advance the boundary so that orphaned result is folded into the
+        # summary together with its assistant turn.
+        while n_to_fold < len(body):
+            head = body[n_to_fold]
+            if head.get("role") == "tool":
+                n_to_fold += 1
+                continue
+            break
+
+        if n_to_fold >= len(body):
+            # No safe split point (tail is all tool results): skip this fold.
+            return messages
+
+        old_messages = body[:n_to_fold]
+        recent_messages = body[n_to_fold:]

-        # Build a text dump of old messages for summarization
        old_text_parts = []
        for msg in old_messages:
-            role = msg["role"]
-            content = msg.get("content", "")
-            # Truncate each message for the summary prompt to avoid overload
+            role = msg.get("role", "?")
+            content = msg.get("content") or ""
+            # Render tool_calls (assistant turn) compactly.
+            if role == "assistant" and msg.get("tool_calls"):
+                tc_names = [
+                    tc.get("function", {}).get("name", "?")
+                    for tc in msg["tool_calls"]
+                ]
+                content = (content + " " if content else "") + (
+                    "called: " + ", ".join(tc_names)
+                )
            if len(content) > 1000:
                content = content[:1000] + "..."
            old_text_parts.append(f"[{role}]: {content}")
@@ -608,7 +798,6 @@ class LLMClient:
            logger.warning("Context folding failed: %s — keeping original messages", e)
            return messages

-        # Replace old messages with a single summary
        summary_message = {
            "role": "user",
            "content": (
@@ -616,4 +805,4 @@ class LLMClient:
                f"messages in this conversation]\n\n{summary}"
            ),
        }
-        return [summary_message] + recent_messages
+        return system_msgs + [summary_message] + recent_messages
--- a/main.py
+++ b/main.py
@@ -15,17 +15,21 @@ from pathlib import Path
 import yaml

 from agent_factory import AgentFactory
+from case import (
+    DISK_IMAGE_EXTS, Case, EvidenceSource, load_case, single_source_case,
+)
 from evidence_graph import EvidenceGraph
 from llm_client import LLMClient
 from log_config import setup_logging
 from orchestrator import AnalysisAborted, Orchestrator
 from tool_registry import register_all_tools
+from tools.archive import unzip_archive_sync

 RUNS_DIR = Path("runs")
 IMAGE_DIR = Path("image")
-
-# Common forensic image extensions (only first segment / single-file formats)
-_IMAGE_GLOBS = ["*.001", "*.dd", "*.raw", "*.img", "*.E01", "*.iso"]
+# Persistent unpack cache for tree-mode sources (zip extractions). Lives
+# at project root so multiple runs can reuse the same unpacked tree.
+SOURCE_CACHE_DIR = Path(".cache/sources")


 def load_config(path: str = "config.yaml") -> dict:
@@ -38,11 +42,13 @@ def load_config(path: str = "config.yaml") -> dict:
 # ---------------------------------------------------------------------------

 def _discover_images(search_dir: Path = IMAGE_DIR) -> list[Path]:
-    """Find forensic disk image files under *search_dir*."""
-    images: set[Path] = set()
-    for glob in _IMAGE_GLOBS:
-        images.update(search_dir.glob(glob))
-    return sorted(images)
+    """Find forensic disk image files under *search_dir* (case-insensitive ext)."""
+    if not search_dir.is_dir():
+        return []
+    return sorted(
+        p for p in search_dir.iterdir()
+        if p.is_file() and p.suffix.lower() in DISK_IMAGE_EXTS
+    )


 def _parse_mmls(output: str) -> list[dict]:
@@ -110,7 +116,7 @@ def select_image_interactive(image_dir: Path | None = None) -> tuple[str, int]:
    images = _discover_images(image_dir)
    if not images:
        print(f"No disk images found in {image_dir}/")
-        print("Supported formats: " + ", ".join(_IMAGE_GLOBS))
+        print("Supported extensions: " + ", ".join(sorted(DISK_IMAGE_EXTS)))
        sys.exit(1)

    if len(images) == 1:
@@ -153,6 +159,118 @@ def select_image_interactive(image_dir: Path | None = None) -> tuple[str, int]:
        print("Invalid choice.")


+def resolve_case() -> Case:
+    """Resolve the Case to analyze.
+
+    Priority: an explicit case file given as a CLI argument, then ./case.yaml
+    in the working directory, then legacy interactive single-image selection.
+    """
+    # 1. Explicit case file passed on the command line
+    if len(sys.argv) > 1 and sys.argv[1].lower().endswith((".yaml", ".yml")):
+        case = load_case(sys.argv[1])
+        if case is None:
+            print(f"Error: could not load case file {sys.argv[1]}")
+            sys.exit(1)
+        print(f"Loaded case: {case.name} ({len(case.sources)} sources)")
+        return case
+
+    # 2. ./case.yaml in the working directory
+    case = load_case()
+    if case is not None:
+        print(f"Loaded case: {case.name} ({len(case.sources)} sources)")
+        return case
+
+    # 3. Legacy interactive single-image selection
+    cli_dir = Path(sys.argv[1]) if len(sys.argv) > 1 else None
+    image_path, partition_offset = select_image_interactive(cli_dir)
+    return single_source_case(image_path, partition_offset)
+
+
+def _is_analysable(src: EvidenceSource) -> bool:
+    """A source is analysable when it has a path AND its mode has tooling.
+
+    S4 enables tree-mode iOS extractions; image-mode disks were already
+    supported. Media-collection sources (screenshots) remain skipped until S6.
+    """
+    if not src.path:
+        return False
+    if src.access_mode == "image":
+        return True
+    if src.access_mode == "tree" and src.type in ("mobile_extraction", "archive"):
+        return True
+    return False
+
+
+def list_analysable_sources(case: Case) -> list[EvidenceSource]:
+    """Return every analysable source in the case (orchestrator iterates them).
+
+    Pre-S6 main.py used to force-choose one source here; the multi-source
+    orchestrator (Phase 1 per-source triage) now consumes the full list.
+    Skipped sources are still reported for visibility.
+    """
+    analysable = [s for s in case.sources if _is_analysable(s)]
+    skipped = [s for s in case.sources if not _is_analysable(s)]
+    if skipped:
+        print(
+            f"Note: {len(skipped)} source(s) not analysable in this build: "
+            + ", ".join(f"{s.label} ({s.type})" for s in skipped)
+        )
+    if not analysable:
+        print("No analysable sources in this case.")
+        sys.exit(1)
+    print(f"Analysing {len(analysable)} source(s) — orchestrator will triage each in Phase 1:")
+    for s in analysable:
+        print(f"  - {s.summary()}")
+    return analysable
+
+
+def prepare_source(src: EvidenceSource) -> EvidenceSource:
+    """Materialise a tree-mode source for analysis.
+
+    Mobile / archive sources arrive as .zip files. We unpack once into a
+    project-level cache (``.cache/sources/<src.id>/``) and rewrite
+    ``src.path`` to point at the unpacked directory. Idempotent — a
+    second run with the cache present is a no-op (unzip_archive_sync
+    skips files that already exist with the matching size).
+
+    Disk-image and already-tree sources pass through unchanged.
+    """
+    if src.access_mode != "tree":
+        return src
+    p = Path(src.path)
+    if p.is_dir():
+        return src  # already a directory, nothing to do
+    if not p.is_file():
+        print(f"Warning: source path {src.path} does not exist; leaving as-is.")
+        return src
+    if p.suffix.lower() != ".zip":
+        # Other archive types (tar, 7z, ...) — not handled yet.
+        print(f"Warning: tree-mode source {src.id} is not a .zip "
+              f"({p.suffix}); leaving as-is.")
+        return src
+
+    dest = SOURCE_CACHE_DIR / src.id
+    dest.mkdir(parents=True, exist_ok=True)
+    # Password-protected zips (e.g. CTF artefacts) carry their key in
+    # case.yaml's meta.password — never logged, never persisted.
+    password = (src.meta or {}).get("password")
+    pw_note = " (password from meta)" if password else ""
+    print(f"Unpacking {p.name} → {dest}{pw_note} (idempotent) ...")
+    result = unzip_archive_sync(str(p), str(dest), password=password)
+    first_line = result.split("\n", 1)[0]
+    print("  " + first_line)
+    if first_line.startswith("Error:"):
+        # Surface the multi-line guidance from _do_extract verbatim.
+        for extra in result.split("\n")[1:]:
+            print("  " + extra)
+        print(f"  Source {src.id} stays unanalysable until this is resolved.")
+        # Leave src.path unchanged so the source remains marked unanalysable.
+        return src
+    src.path = str(dest)
+    src.access_mode = "tree"
+    return src
+
+
 def find_resumable_run() -> Path | None:
    """Find the most recent incomplete run with a saved graph state."""
    if not RUNS_DIR.exists():
@@ -219,25 +337,36 @@ async def async_main() -> None:
        model=agent_cfg["model"],
        max_tokens=agent_cfg.get("max_tokens", 4096),
        proxy=agent_cfg.get("proxy", "auto"),
+        reasoning_effort=agent_cfg.get("reasoning_effort"),
+        thinking_enabled=agent_cfg.get("thinking_enabled", False),
    )

    # Initialize evidence graph
    if graph is None:
-        # CLI arg takes priority, otherwise interactive prompt
-        cli_dir = Path(sys.argv[1]) if len(sys.argv) > 1 else None
-        image_path, partition_offset = select_image_interactive(cli_dir)
+        case = resolve_case()
+        # case_info derived from THIS case's meta (case.yaml), not from
+        # config.yaml's legacy `cfreds_hacking_case` block. Without this,
+        # the old CFReDS evidence MD5s would be embedded in reports for
+        # every subsequent unrelated case.
        graph = EvidenceGraph(
-            case_info=config.get("cfreds_hacking_case", {}),
+            case_info=dict(case.meta or {}),
            persist_path=run_dir / "graph_state.json",
+            edge_log_lr=config.get("hypothesis_log_lr"),
        )
-        graph.image_path = image_path
-        graph.partition_offset = partition_offset
+        graph.case = case
        graph.extracted_dir = str(run_dir / "extracted")
+        analysable = list_analysable_sources(case)
+        # Prepare every analysable source up front (unzip tree-mode zips,
+        # etc.). Idempotent on cache hits — second run is a no-op.
+        prepared = [prepare_source(s) for s in analysable]
+        # Seed the active source so tools that resolve lazily have a target
+        # before Phase 1 begins; the orchestrator resets it per source.
+        graph.set_active_source(prepared[0])
    else:
        graph._persist_path = run_dir / "graph_state.json"

-    # Register all tools with bound image path
-    register_all_tools(graph.image_path, graph.partition_offset, graph, graph.extracted_dir)
+    # Register all tools — they resolve the active evidence source at call time
+    register_all_tools(graph)

    # Create agent factory
    factory = AgentFactory(llm, graph)
--- a/orchestrator.py
+++ b/orchestrator.py
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -5,6 +5,9 @@ description = "Multi-Agent System for Digital Forensics"
 requires-python = ">=3.14"
 dependencies = [
    "httpx[socks]>=0.28.1",
+    "openai>=2.36.0",
+    "pillow>=12.2.0",
+    "pytesseract>=0.3.13",
    "pyyaml",
    "regipy>=6.2.1",
 ]
--- a/regenerate_report.py
+++ b/regenerate_report.py
@@ -13,8 +13,16 @@ from tool_registry import register_all_tools


 async def main() -> None:
-    # Find the run to regenerate from
-    run_dir = Path(sys.argv[1]) if len(sys.argv) > 1 else Path("runs/2026-04-02T15-11-25")
+    # Find the run: CLI arg, or latest run with a graph_state.json
+    if len(sys.argv) > 1:
+        run_dir = Path(sys.argv[1])
+    else:
+        states = sorted(Path("runs").glob("*/graph_state.json"), reverse=True)
+        if not states:
+            print("No runs found in runs/")
+            return
+        run_dir = states[0].parent
+        print(f"Using latest run: {run_dir.name}")
    state_path = run_dir / "graph_state.json"

    if not state_path.exists():
@@ -24,8 +32,11 @@ async def main() -> None:
    config = yaml.safe_load(open("config.yaml"))
    agent_cfg = config["agent"]

-    # Load graph
-    graph = EvidenceGraph.load_state(state_path)
+    # Load graph (edge_log_lr from config — applied to the loaded graph)
+    graph = EvidenceGraph.load_state(
+        state_path,
+        edge_log_lr=config.get("hypothesis_log_lr"),
+    )
    print(f"Loaded: {graph.stats_summary()}")

    # LLM client with larger max_tokens for report
@@ -34,9 +45,11 @@ async def main() -> None:
        api_key=agent_cfg["api_key"],
        model=agent_cfg["model"],
        max_tokens=16384,
+        reasoning_effort=agent_cfg.get("reasoning_effort"),
+        thinking_enabled=agent_cfg.get("thinking_enabled", False),
    )

-    register_all_tools(graph.image_path, graph.partition_offset, graph)
+    register_all_tools(graph)
    factory = AgentFactory(llm, graph)

    # Run only the report agent
--- a/tests/test_optimizations.py
+++ b/tests/test_optimizations.py
--- a/tool_registry.py
+++ b/tool_registry.py
--- a/tools/archive.py
+++ b/tools/archive.py
@@ -0,0 +1,156 @@
+"""Archive extraction tools — generic unzip for tree-mode evidence sources.
+
+Mobile extractions (iOS / Android backups), archive sources, and shared
+work products all arrive as .zip files. The forensic agents work on the
+unpacked tree; this module is the single entry point for safely turning
+an archive into a directory.
+
+Stdlib-only. No graph dependency.
+"""
+
+from __future__ import annotations
+
+import logging
+import os
+import zipfile
+from pathlib import Path
+
+logger = logging.getLogger(__name__)
+
+
+def _is_within(base: Path, target: Path) -> bool:
+    """True when *target* resolves to a path inside *base* — symlink-safe."""
+    try:
+        base_r = base.resolve()
+        target_r = target.resolve()
+    except OSError:
+        return False
+    try:
+        target_r.relative_to(base_r)
+    except ValueError:
+        return False
+    return True
+
+
+def _is_zip_encrypted(zf: zipfile.ZipFile) -> bool:
+    """True when any entry has the zip 'encrypted' flag bit set."""
+    return any(info.flag_bits & 0x1 for info in zf.infolist())
+
+
+def _do_extract(
+    zip_path: str,
+    dest_dir: str,
+    password: str | None = None,
+) -> str:
+    """Shared core for unzip_archive (async) and unzip_archive_sync.
+
+    Pure stdlib + filesystem I/O — no asyncio. Idempotent on rerun (files
+    whose target already exists at the matching size are skipped). Returns
+    a multi-line summary the agent can read directly.
+    """
+    zp = Path(zip_path)
+    if not zp.is_file():
+        return f"Error: {zip_path} is not a file."
+
+    dest = Path(dest_dir)
+    dest.mkdir(parents=True, exist_ok=True)
+
+    extracted = 0
+    skipped: list[str] = []
+    total_bytes = 0
+    pwd_bytes = password.encode("utf-8") if password else None
+
+    try:
+        with zipfile.ZipFile(zp, "r") as zf:
+            encrypted = _is_zip_encrypted(zf)
+            if encrypted and pwd_bytes is None:
+                return (
+                    f"Error: {zip_path} is password-protected. "
+                    f"Provide the password via case.yaml's "
+                    f"meta.password on this source, or pass `password=` "
+                    f"explicitly. Stdlib zipfile only supports the legacy "
+                    f"ZipCrypto algorithm — AES-encrypted zips (created by "
+                    f"7-Zip / WinZip) need an external tool like 7z."
+                )
+            for info in zf.infolist():
+                name = info.filename
+                # Block absolute paths and parent-escape attempts up front.
+                if name.startswith(("/", "\\")) or ".." in Path(name).parts:
+                    skipped.append(f"escape: {name}")
+                    continue
+                target = dest / name
+                if not _is_within(dest, target):
+                    skipped.append(f"escape: {name}")
+                    continue
+                # Symlink entries (S_IFLNK under the S_IFMT mask): skip rather than risk traversing out.
+                if ((info.external_attr >> 16) & 0o170000) == 0o120000:
+                    skipped.append(f"symlink: {name}")
+                    continue
+                if info.is_dir():
+                    target.mkdir(parents=True, exist_ok=True)
+                    continue
+                # Skip if already extracted with matching size (idempotent rerun).
+                if target.exists() and target.stat().st_size == info.file_size:
+                    continue
+                target.parent.mkdir(parents=True, exist_ok=True)
+                try:
+                    with zf.open(info, "r", pwd=pwd_bytes) as src, open(target, "wb") as out:
+                        while True:
+                            chunk = src.read(65536)
+                            if not chunk:
+                                break
+                            out.write(chunk)
+                except RuntimeError as e:
+                    # zipfile raises RuntimeError for a bad or missing password; anything else is re-raised.
+                    msg = str(e)
+                    if "Bad password" in msg or "password required" in msg:
+                        return (
+                            f"Error: bad or missing password for {zip_path}. "
+                            f"If the zip is AES-encrypted (7-Zip/WinZip), stdlib "
+                            f"cannot decrypt it — use `7z x -p<pwd> ...` "
+                            f"externally and point the source path at the result."
+                        )
+                    raise
+                extracted += 1
+                total_bytes += info.file_size
+    except zipfile.BadZipFile as e:
+        return f"Error: {zip_path} is not a valid zip archive: {e}"
+    except Exception as e:
+        return f"Error extracting {zip_path}: {e}"
+
+    parts = [
+        f"Extracted {extracted} file(s), {total_bytes} bytes, into {dest}",
+    ]
+    if skipped:
+        parts.append(f"Skipped {len(skipped)} unsafe entries:")
+        for s in skipped[:10]:
+            parts.append(f"  - {s}")
+        if len(skipped) > 10:
+            parts.append(f"  ... ({len(skipped) - 10} more)")
+    return "\n".join(parts)
+
+
+async def unzip_archive(
+    zip_path: str, dest_dir: str, password: str | None = None,
+) -> str:
+    """Extract *zip_path* into *dest_dir*. Idempotent on rerun.
+
+    Defensive: rejects entries with absolute paths, any '..' component, or that
+    would resolve outside *dest_dir* (the classic zip-slip vector). Symlink
+    entries are skipped (we never follow symlinks into the host filesystem).
+    Password-protected zips need the password argument (or
+    ``meta.password`` on the source in case.yaml) — stdlib ``zipfile``
+    only handles the legacy ZipCrypto algorithm.
+    """
+    return _do_extract(zip_path, dest_dir, password)
+
+
+def unzip_archive_sync(
+    zip_path: str, dest_dir: str, password: str | None = None,
+) -> str:
+    """Synchronous variant of :func:`unzip_archive` for startup-time prepare_source.
+
+    Same behaviour, just no async wrapping — used before the event loop
+    starts so we don't have to spin one up just to unpack a zip.
+    """
+    return _do_extract(zip_path, dest_dir, password)
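Review note on the zip-slip guard in `_do_extract`: the two checks (lexical rejection of absolute paths and `..` components, then a resolved-path containment test) can be exercised in isolation. The sketch below is a minimal stand-alone version; `is_safe_member` is a hypothetical name used only for illustration and is not part of this change.

```python
from pathlib import Path


def is_safe_member(dest: Path, name: str) -> bool:
    """Hypothetical helper: True when archive entry *name* stays inside *dest*."""
    # Lexical check first: absolute paths and any '..' component are rejected.
    if name.startswith(("/", "\\")) or ".." in Path(name).parts:
        return False
    # Then resolve both sides and require the target to sit under dest.
    try:
        (dest / name).resolve().relative_to(dest.resolve())
    except (OSError, ValueError):
        return False
    return True
```

The lexical check is redundant with the resolved-path check in most cases, but keeping both means a malformed name is rejected even if path resolution behaves unexpectedly on a given platform.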
--- a/tools/media.py
+++ b/tools/media.py
@@ -0,0 +1,87 @@
+"""Media plugin — OCR for image evidence.
+
+DESIGN.md §4.7: the model backend (DeepSeek) has no vision, so we MUST run
+OCR locally for any image-bearing evidence. Tesseract via pytesseract is
+the default; if the runtime is missing those packages, the tool returns a
+clear install hint rather than failing silently.
+"""
+
+from __future__ import annotations
+
+import logging
+import os
+from pathlib import Path
+
+logger = logging.getLogger(__name__)
+
+MAX_OUTPUT = 8000
+
+_INSTALL_HINT = (
+    "Error: OCR runtime not available. Install with:\n"
+    "  pip install pytesseract pillow\n"
+    "  sudo apt install tesseract-ocr tesseract-ocr-chi-sim tesseract-ocr-chi-tra\n"
+    "(or the equivalent for your distribution). Then retry."
+)
+
+
+def _has_ocr_runtime() -> tuple[bool, str]:
+    """Return (available, reason). reason is empty when available."""
+    try:
+        import pytesseract  # noqa: F401
+        from PIL import Image  # noqa: F401
+    except ImportError as e:
+        return False, f"missing python package: {e.name}"
+    # Check the tesseract binary too.
+    import shutil
+    if shutil.which("tesseract") is None:
+        return False, "tesseract binary not on PATH"
+    return True, ""
+
+
+async def ocr_image(file_path: str, lang: str = "eng+chi_sim+chi_tra") -> str:
+    """Extract text from an image via tesseract.
+
+    *lang* defaults to English + Simplified + Traditional Chinese, matching
+    the multi-language artefacts the current case involves. Pass a single
+    language code (e.g. ``"eng"``) to skip language packs that aren't
+    installed.
+    """
+    p = Path(file_path)
+    if not p.is_file():
+        return f"Error: {file_path} is not a file."
+    available, reason = _has_ocr_runtime()
+    if not available:
+        return f"{_INSTALL_HINT}\n[detail: {reason}]"
+
+    import pytesseract
+    from PIL import Image
+
+    try:
+        img = Image.open(p)
+    except Exception as e:
+        return f"Error: could not open image {file_path}: {e}"
+
+    try:
+        text = pytesseract.image_to_string(img, lang=lang)
+    except pytesseract.TesseractError as e:
+        msg = str(e)
+        if "Failed loading language" in msg or "Error opening data file" in msg:
+            return (
+                f"Error: tesseract is installed but missing language pack(s) for {lang!r}. "
+                f"Install the language data (e.g. tesseract-ocr-chi-sim) or pass a "
+                f"different `lang`. Detail: {msg}"
+            )
+        return f"Error running tesseract: {msg}"
+    except Exception as e:
+        return f"Error during OCR: {e}"
+
+    size = p.stat().st_size
+    header = (
+        f"ocr: {file_path} ({size} bytes, lang={lang}, "
+        f"{len(text.splitlines())} line(s))\n"
+    )
+    if len(text) > MAX_OUTPUT - len(header):
+        body = text[:MAX_OUTPUT - len(header)] + "\n[truncated]"
+    else:
+        body = text
+    return header + body
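Review note on the output cap in `ocr_image`: the header counts against `MAX_OUTPUT`, so the body gets whatever room is left. A minimal sketch of that rule follows, assuming the same 8000-character budget; `cap_output` is a hypothetical name, not a function in this change.

```python
MAX_OUTPUT = 8000  # same budget as the module constant above


def cap_output(header: str, text: str, limit: int = MAX_OUTPUT) -> str:
    """Hypothetical helper: keep header + body within *limit* characters."""
    room = max(0, limit - len(header))  # characters left for the body
    if len(text) > room:
        return header + text[:room] + "\n[truncated]"
    return header + text
```

Note that the truncation marker itself is appended after the cap, so a truncated result is slightly longer than `limit`; that matches the behaviour of the code in this file.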
--- a/tools/mobile_android.py
+++ b/tools/mobile_android.py
@@ -0,0 +1,160 @@
+"""Android plugin tools — partition survey + sector translation.
+
+DESIGN.md §4.7 (Android): ``mmls`` partitions → per-partition image-mode source;
+``fsstat`` per partition to classify ext4/F2FS/raw/encrypted. The shared TSK
+toolchain already handles ext4/F2FS reads, so once the agent picks a partition
+offset the standard list_directory / extract_file / search_strings tools work.
+
+Quirk: Samsung dumps (e.g. ``blk0_sda.bin``) use 4096-byte image sectors but
+TSK tool flags accept 512-byte sectors by default. ``probe_android_partitions``
+emits BOTH unit systems so the agent can plug the right ``partition_offset``
+value into ``set_active_partition``.
+"""
+
+from __future__ import annotations
+
+import asyncio
+import logging
+import re
+from pathlib import Path
+
+logger = logging.getLogger(__name__)
+
+MAX_OUTPUT = 8000
+
+# Partitions worth flagging when we encounter them — informs the agent's
+# strategy. Not exhaustive; just opinionated hints.
+_PARTITION_HINTS: dict[str, str] = {
+    "EFS":      "modem firmware area; often contains IMEI / MAC / serial",
+    "PARAM":    "boot parameters; cmdline + flags",
+    "BOOT":     "kernel + initramfs (raw image)",
+    "RECOVERY": "recovery image (raw)",
+    "SYSTEM":   "Android /system — read-only OS partition (ext4)",
+    "CACHE":    "downloaded OTA payloads; usually transient",
+    "USERDATA": "/data — user apps, dbs, accounts; FBE-encrypted on modern devices",
+    "PERSISTENT": "Samsung persistent partition; carrier/device flags",
+    "STEADY":   "Samsung steady-state config",
+    "HIDDEN":   "Samsung hidden partition; check before assuming empty",
+    "CP_DEBUG": "modem debug logs",
+    "TOMBSTONES": "userland crash dumps",
+}
+
+
+def _parse_mmls_with_unit(output: str) -> tuple[int, list[dict]]:
+    """Parse mmls output, returning (sector_size_bytes, partitions).
+
+    mmls states ``Units are in N-byte sectors`` near the top; we extract N
+    to translate between image-native units and the 512-byte units TSK
+    tools accept via ``-o``.
+    """
+    sector_size = 512
+    m = re.search(r"Units are in (\d+)-byte sectors", output)
+    if m:
+        sector_size = int(m.group(1))
+
+    parts: list[dict] = []
+    for line in output.splitlines():
+        m = re.match(
+            r"\s*(\d{3}):\s+(\S+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(.*)",
+            line,
+        )
+        if not m:
+            continue
+        _row, slot, start, end, length, desc = m.groups()
+        if slot == "Meta" or slot.startswith("---"):
+            continue
+        parts.append({
+            "slot": slot,
+            "start_native": int(start),
+            "end_native": int(end),
+            "length_native": int(length),
+            "description": desc.strip(),
+        })
+    return sector_size, parts
+
+
+async def _run(cmd: list[str], timeout: int = 30) -> tuple[int, str, str]:
+    """Run *cmd* and return (returncode, stdout, stderr); rc 124 on timeout."""
+    proc = await asyncio.create_subprocess_exec(
+        *cmd, stdout=asyncio.subprocess.PIPE, stderr=asyncio.subprocess.PIPE,
+    )
+    try:
+        stdout, stderr = await asyncio.wait_for(proc.communicate(), timeout=timeout)
+    except asyncio.TimeoutError:
+        proc.kill()
+        await proc.wait()  # reap the killed child so it does not linger as a zombie
+        return 124, "", f"timeout after {timeout}s"
+    return proc.returncode or 0, stdout.decode("utf-8", "replace"), stderr.decode("utf-8", "replace")
+
+
+_FS_TYPE_RE = re.compile(r"File System Type:\s*(\S+)", re.IGNORECASE)
+
+
+async def _classify_partition(image_path: str, sector_offset_512: int) -> str:
+    """Run fsstat on a partition; return its reported type (e.g. 'Ext4', 'FAT') or 'unknown'.
+
+    Any fsstat failure, including "Cannot determine file system type", maps
+    to 'unknown'. That typically means a raw image (BOOT/RECOVERY/RADIO/…)
+    or encrypted data (modern userdata under FBE).
+    """
+    rc, out, _err = await _run(["fsstat", "-o", str(sector_offset_512), image_path], timeout=15)
+    if rc != 0:
+        return "unknown"
+    m = _FS_TYPE_RE.search(out)
+    if m:
+        return m.group(1)
+    return "unknown"
+
+
+async def probe_android_partitions(image_path: str) -> str:
+    """Survey every partition on an Android disk dump and return a table.
+
+    The agent reads this once to plan its work: which partitions are
+    Ext4/F2FS (use TSK), which are raw (extract image / strings only),
+    which are encrypted (skip until decrypted).
+    """
+    p = Path(image_path)
+    if not p.is_file():
+        return f"Error: {image_path} is not a file."
+
+    rc, out, err = await _run(["mmls", str(p)], timeout=30)
+    if rc != 0:
+        return f"Error: mmls failed (rc={rc}): {err.strip() or out.strip()}"
+
+    sector_size, parts = _parse_mmls_with_unit(out)
+    if not parts:
+        return f"No partitions detected in {image_path}."
+
+    lines = [
+        f"Android partition survey: {image_path}",
+        f"  mmls reports {sector_size}-byte sectors (TSK -o expects 512-byte sectors)",
+        f"  {len(parts)} data partitions",
+        "",
+        "| slot | name | start (native) | start (512-sector) | size | fs_type | hint |",
+        "|---|---|---:|---:|---|---|---|",
+    ]
+    for prt in parts:
+        sector_512 = prt["start_native"] * sector_size // 512
+        bytes_size = prt["length_native"] * sector_size
+        # human-readable size
+        if bytes_size >= 1 << 30:
+            size_h = f"{bytes_size / (1 << 30):.1f} GB"
+        elif bytes_size >= 1 << 20:
+            size_h = f"{bytes_size / (1 << 20):.1f} MB"
+        else:
+            size_h = f"{bytes_size // 1024} KB"
+        fs_type = await _classify_partition(str(p), sector_512)
+        # Try to extract a friendly partition name from the description
+        # (mmls description often includes the partition name uppercase).
+        name_match = re.search(r"[A-Z][A-Z0-9_]{2,}", prt["description"])
+        pname = name_match.group(0) if name_match else prt["description"][:20]
+        hint = _PARTITION_HINTS.get(pname, "")
+        lines.append(
+            f"| {prt['slot']} | {pname} | {prt['start_native']} | "
+            f"{sector_512} | {size_h} | {fs_type} | {hint} |"
+        )
+
+    body = "\n".join(lines)
+    if len(body) > MAX_OUTPUT:
+        body = body[:MAX_OUTPUT] + "\n\n[truncated]"
+    return body
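Review note on the sector translation in `probe_android_partitions`: the conversion from image-native sectors to the 512-byte units passed to `-o` is plain arithmetic, shown here as a stand-alone sketch. Both function names are hypothetical and exist only for this illustration.

```python
import re


def sector_size_from_mmls(output: str, default: int = 512) -> int:
    """Hypothetical helper: read N from 'Units are in N-byte sectors'."""
    m = re.search(r"Units are in (\d+)-byte sectors", output)
    return int(m.group(1)) if m else default


def native_to_512(start_native: int, sector_size: int) -> int:
    """Convert a start offset in native sectors to 512-byte sectors."""
    return start_native * sector_size // 512
```

For a dump with 4096-byte sectors, a partition starting at native sector 2048 begins at byte 8,388,608, which is 512-byte sector 16384. With 512-byte native sectors the value passes through unchanged.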
--- a/tools/mobile_ios.py
+++ b/tools/mobile_ios.py
@@ -0,0 +1,274 @@
+"""iOS extraction parsers — plist / sqlite / keychain / iDevice info.
+
+DESIGN.md §4.7 iOS plugin tools. All tree-mode, path-based — no Sleuth
+Kit, no graph dependency. Stdlib + sqlite3 only.
+
+iOS extractions typically arrive as a zip containing domain-rooted trees
+(HomeDomain, AppDomain, etc.) with a flat ``iDevice_info.txt`` summary,
+binary/XML plists, and several SQLite databases (sms.db, AddressBook,
+keychain-2.db, app-specific stores like WhatsApp's ChatStorage.sqlite).
+"""
+
+from __future__ import annotations
+
+import asyncio
+import json
+import logging
+import os
+import plistlib
+import re
+import sqlite3
+from pathlib import Path
+
+logger = logging.getLogger(__name__)
+
+# Output cap (chars) — keeps a single tool result under the LLM context budget.
+MAX_OUTPUT = 8000
+
+
+def _trunc(text: str, limit: int = MAX_OUTPUT) -> str:
+    if len(text) <= limit:
+        return text
+    return text[:limit] + f"\n\n[Output truncated: {len(text)} chars total]"
+
+
+# ---------------------------------------------------------------------------
+# plist
+# ---------------------------------------------------------------------------
+
+def _to_jsonable(obj):
+    """Make plist values JSON-serializable: bytes → hex preview, dates → iso."""
+    import datetime
+    if isinstance(obj, bytes):
+        if len(obj) <= 64:
+            return {"_bytes_hex": obj.hex()}
+        return {"_bytes_hex_preview": obj[:64].hex(), "_total_bytes": len(obj)}
+    if isinstance(obj, datetime.datetime):
+        return obj.isoformat()
+    if isinstance(obj, dict):
+        return {str(k): _to_jsonable(v) for k, v in obj.items()}
+    if isinstance(obj, (list, tuple)):
+        return [_to_jsonable(v) for v in obj]
+    return obj
+
+
+async def parse_plist(file_path: str) -> str:
+    """Parse a .plist file (XML or binary) and return its contents as JSON.
+
+    Both formats are handled transparently by ``plistlib.load``.
+    """
+    p = Path(file_path)
+    if not p.is_file():
+        return f"Error: {file_path} is not a file."
+    try:
+        with open(p, "rb") as f:
+            data = plistlib.load(f)
+    except plistlib.InvalidFileException as e:
+        return f"Error: {file_path} is not a valid plist ({e})"
+    except Exception as e:
+        return f"Error parsing plist {file_path}: {e}"
+
+    serial = _to_jsonable(data)
+    rendered = json.dumps(serial, ensure_ascii=False, indent=2, default=str)
+    header = f"plist: {file_path} ({p.stat().st_size} bytes)\n"
+    return header + _trunc(rendered)
+
+
+# ---------------------------------------------------------------------------
+# sqlite
+# ---------------------------------------------------------------------------
+
+_SELECT_RE = re.compile(r"^\s*SELECT\b", re.IGNORECASE)
+
+
+async def sqlite_tables(db_path: str) -> str:
+    """List user tables in a sqlite file with row counts and column names."""
+    p = Path(db_path)
+    if not p.is_file():
+        return f"Error: {db_path} is not a file."
+    try:
+        conn = sqlite3.connect(f"{p.resolve().as_uri()}?mode=ro", uri=True)
+    except sqlite3.OperationalError as e:
+        return f"Error opening {db_path} (read-only): {e}"
+    try:
+        cur = conn.cursor()
+        cur.execute(
+            "SELECT name FROM sqlite_master "
+            "WHERE type='table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
+        )
+        tables = [r[0] for r in cur.fetchall()]
+        if not tables:
+            return f"No user tables in {db_path}."
+        lines = [f"sqlite: {db_path} ({len(tables)} tables)"]
+        for name in tables:
+            try:
+                cur.execute(f"SELECT COUNT(*) FROM \"{name}\"")
+                count = cur.fetchone()[0]
+            except sqlite3.DatabaseError as e:
+                count = f"(count failed: {e})"
+            try:
+                cur.execute(f"PRAGMA table_info(\"{name}\")")
+                cols = [r[1] for r in cur.fetchall()]
+            except sqlite3.DatabaseError:
+                cols = []
+            lines.append(f"  {name}: {count} row(s); cols: {', '.join(cols)}")
+        return _trunc("\n".join(lines))
+    finally:
+        conn.close()
+
+
+async def sqlite_query(
+    db_path: str,
+    query: str,
+    max_rows: int = 100,
+) -> str:
+    """Run a single read-only SELECT against a sqlite file.
+
+    Multi-statement queries and anything other than a SELECT are rejected
+    (we open the database in read-only mode anyway, so writes would fail
+    too — but the explicit check keeps the agent honest).
+    """
+    if not _SELECT_RE.match(query):
+        return "Error: only single SELECT statements are allowed."
+    if ";" in query.strip().rstrip(";"):
+        return "Error: multi-statement queries are not allowed."
+
+    p = Path(db_path)
+    if not p.is_file():
+        return f"Error: {db_path} is not a file."
+    try:
+        conn = sqlite3.connect(f"{p.resolve().as_uri()}?mode=ro", uri=True)
+    except sqlite3.OperationalError as e:
+        return f"Error opening {db_path} (read-only): {e}"
+
+    try:
+        cur = conn.cursor()
+        try:
+            cur.execute(query)
+        except sqlite3.DatabaseError as e:
+            return f"Error executing query: {e}"
+        cols = [d[0] for d in cur.description] if cur.description else []
+        rows = cur.fetchmany(max(1, int(max_rows)))
+        lines = [
+            f"sqlite query: {db_path}",
+            f"columns: {cols}",
+            f"rows ({len(rows)}, capped at {max_rows}):",
+        ]
+        for row in rows:
+            rendered = [
+                (v.hex() if isinstance(v, bytes) else str(v))
+                for v in row
+            ]
+            lines.append("  " + " | ".join(rendered))
+        return _trunc("\n".join(lines))
+    finally:
+        conn.close()
+
+
+# ---------------------------------------------------------------------------
+# iOS keychain (keychain-2.db)
+# ---------------------------------------------------------------------------
+
+# Standard iOS keychain tables. genp = generic passwords, inet = internet
+# passwords, cert = certificates, keys = key material. Forensic extractions
+# of locked keychains have ``data`` columns NULL but accounting metadata
+# (agrp, acct, svce) intact — already useful for attribution work.
+_KEYCHAIN_TABLES = ("genp", "inet", "cert", "keys")
+
+
+async def parse_ios_keychain(keychain_root: str) -> str:
+    """Locate and summarize iOS keychain entries under *keychain_root*.
+
+    *keychain_root* may be a path to ``keychain-2.db`` directly or to a
+    directory that contains it (e.g. ``.../var/keychains``).
+    """
+    root = Path(keychain_root)
+    db: Path | None = None
+    if root.is_file() and root.name == "keychain-2.db":
+        db = root
+    elif root.is_dir():
+        candidate = root / "keychain-2.db"
+        if candidate.is_file():
+            db = candidate
+        else:
+            # Fall back to a full recursive search; take the first match.
+            for found in root.rglob("keychain-2.db"):
+                db = found
+                break
+    if db is None:
+        return f"No keychain-2.db found under {keychain_root}."
+
+    try:
+        conn = sqlite3.connect(f"{db.resolve().as_uri()}?mode=ro", uri=True)
+    except sqlite3.OperationalError as e:
+        return f"Error opening {db}: {e}"
+
+    try:
+        cur = conn.cursor()
+        cur.execute(
+            "SELECT name FROM sqlite_master "
+            "WHERE type='table' AND name IN ({})".format(
+                ",".join("?" * len(_KEYCHAIN_TABLES))
+            ),
+            _KEYCHAIN_TABLES,
+        )
+        present = [r[0] for r in cur.fetchall()]
+        if not present:
+            return f"keychain-2.db at {db} has no recognised tables."
+
+        lines = [f"keychain: {db}"]
+        for name in present:
+            cur.execute(f"SELECT COUNT(*) FROM \"{name}\"")
+            count = cur.fetchone()[0]
+            lines.append(f"\n[{name}] {count} row(s)")
+            cur.execute(f"PRAGMA table_info(\"{name}\")")
+            cols = [r[1] for r in cur.fetchall()]
+            # Pick a useful subset of accounting columns when present.
+            preferred = [
+                c for c in ("agrp", "acct", "svce", "labl", "desc", "atyp", "srvr")
+                if c in cols
+            ]
+            if not preferred:
+                preferred = cols[:5]
+            sel = ", ".join(f'"{c}"' for c in preferred)
+            cur.execute(f"SELECT {sel} FROM \"{name}\" LIMIT 30")
+            for row in cur.fetchall():
+                lines.append("  " + " | ".join(
+                    (v.hex() if isinstance(v, bytes) else str(v))
+                    for v in row
+                ))
+        return _trunc("\n".join(lines))
+    finally:
+        conn.close()
+
+
+# ---------------------------------------------------------------------------
+# iDevice_info.txt
+# ---------------------------------------------------------------------------
+
+async def read_idevice_info(file_path: str, max_chars: int = 6000) -> str:
+    """Read the standard iDevice_info.txt summary at the root of an iOS extraction.
+
+    The file is a flat ``Key: value`` dump from libimobiledevice / native
+    extraction tools. We surface the first *max_chars* of content verbatim
+    — the agent can search/extract specific keys via search_text_file if
+    the head isn't enough.
+    """
+    p = Path(file_path)
+    if p.is_dir():
+        # Be helpful: if the agent passed the extraction root, find the file.
+        candidate = p / "iDevice_info.txt"
+        if candidate.is_file():
+            p = candidate
+    if not p.is_file():
+        return f"Error: {file_path} is not a file."
+    try:
+        with open(p, "r", encoding="utf-8", errors="replace") as f:
+            content = f.read(max_chars)
+        size = p.stat().st_size
+        header = f"iDevice_info: {p} ({size} bytes)\n"
+        if size > max_chars:
+            content += f"\n\n[Truncated: file is {size} bytes, showing first {max_chars}]"
+        return header + content
+    except Exception as e:
+        return f"Error reading {file_path}: {e}"
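Review note on the read-only SQLite opens in this file: when a filesystem path is embedded in a `file:` URI, characters such as `#`, `?` and `%` have URI meaning and need percent-encoding, which `Path.as_uri()` provides. The sketch below assumes that approach; `open_readonly` and `demo` are hypothetical names for illustration only.

```python
from __future__ import annotations

import sqlite3
import tempfile
from pathlib import Path


def open_readonly(db_path: str) -> sqlite3.Connection:
    """Hypothetical helper: open *db_path* read-only via a percent-encoded URI."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def demo() -> tuple[int, bool]:
    """Create a throwaway database, reopen it read-only, and try to write."""
    with tempfile.TemporaryDirectory() as tmp:
        db_file = Path(tmp) / "case #1.db"  # '#' would start a URI fragment if unescaped
        rw = sqlite3.connect(db_file)
        rw.execute("CREATE TABLE t (x INTEGER)")
        rw.execute("INSERT INTO t VALUES (1)")
        rw.commit()
        rw.close()

        ro = open_readonly(str(db_file))
        try:
            count = ro.execute("SELECT COUNT(*) FROM t").fetchone()[0]
            try:
                ro.execute("INSERT INTO t VALUES (2)")
                write_blocked = False
            except sqlite3.OperationalError:
                write_blocked = True
        finally:
            ro.close()
    return count, write_blocked
```

The read succeeds and the write is refused by SQLite itself, so the SELECT-only check in `sqlite_query` acts as a second layer rather than the only protection for the evidence file.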
--- a/tools/parsers.py
+++ b/tools/parsers.py
@@ -215,20 +215,178 @@ async def parse_prefetch(file_path: str) -> str:
        return f"[Error parsing Prefetch: {e}]"


-async def list_extracted_dir(dir_path: str) -> str:
-    """List files in an extracted directory."""
+async def list_extracted_dir(dir_path: str, max_entries: int = 200) -> str:
+    """Smart summary of a (potentially huge) extracted tree.
+
+    Earlier versions listed the first ~200 entries in walk order and then
+    truncated, which left the agent blind on 10k+-file iOS extractions. This
+    version returns a summary that scales: total counts, an extension
+    breakdown, top-level directories with sizes, and the ten largest files.
+    *max_entries* is currently unused by this summary layout.
+    For targeted lookups (e.g. every ``*.sqlite``) use ``find_files`` instead.
+    """
+    if not os.path.isdir(dir_path):
+        return f"[Error: {dir_path} is not a directory]"
+
    try:
-        entries = []
-        for root, dirs, files in os.walk(dir_path):
+        total_files = 0
+        total_bytes = 0
+        ext_counts: dict[str, int] = {}
+        ext_bytes: dict[str, int] = {}
+        top_level_dirs: dict[str, dict] = {}
+        biggest: list[tuple[int, str]] = []   # (size, relpath)
+
+        dir_path_abs = os.path.abspath(dir_path)
+        for root, dirs, files in os.walk(dir_path_abs):
+            # Track top-level directory aggregates (cheap; no per-entry cost
+            # beyond the walk we're already doing).
+            rel_root = os.path.relpath(root, dir_path_abs)
+            if rel_root == ".":
+                top_dirs = {d: {"files": 0, "bytes": 0} for d in dirs}
+                top_level_dirs.update(top_dirs)
+                top_key = None
+            else:
+                top_key = rel_root.split(os.sep, 1)[0]
+                if top_key not in top_level_dirs:
+                    top_level_dirs[top_key] = {"files": 0, "bytes": 0}
+
            for f in files:
                full = os.path.join(root, f)
-                rel = os.path.relpath(full, dir_path)
-                size = os.path.getsize(full)
-                entries.append(f"  {rel} ({size} bytes)")
-            if len(entries) > 200:
-                entries.append(f"  ... (truncated)")
-                break
+                try:
+                    size = os.path.getsize(full)
+                except OSError:
+                    continue
+                total_files += 1
+                total_bytes += size
+                ext = os.path.splitext(f)[1].lower() or "(no ext)"
+                ext_counts[ext] = ext_counts.get(ext, 0) + 1
+                ext_bytes[ext] = ext_bytes.get(ext, 0) + size
+                if top_key is not None:
+                    top_level_dirs[top_key]["files"] += 1
+                    top_level_dirs[top_key]["bytes"] += size
+                # Maintain a top-10 largest list cheaply (bounded insertion).
+                if len(biggest) < 10:
+                    biggest.append((size, os.path.relpath(full, dir_path_abs)))
+                    biggest.sort(reverse=True)
+                elif size > biggest[-1][0]:
+                    biggest[-1] = (size, os.path.relpath(full, dir_path_abs))
+                    biggest.sort(reverse=True)

-        return f"Directory: {dir_path}\nFiles ({len(entries)}):\n" + "\n".join(entries)
+        def _human(n: int) -> str:
+            for unit in ("B", "KB", "MB", "GB"):
+                if n < 1024:
+                    return f"{n:.1f}{unit}" if unit != "B" else f"{n}B"
+                n /= 1024
+            return f"{n:.1f}TB"
+
+        lines = [
+            f"Directory: {dir_path}",
+            f"  Total: {total_files} file(s), {_human(total_bytes)}",
+        ]
+
+        # Top-level directory layout (immediate children, sorted by file count).
+        if top_level_dirs:
+            lines.append(f"\nTop-level layout ({len(top_level_dirs)} dirs at root):")
+            sorted_tlds = sorted(
+                top_level_dirs.items(), key=lambda kv: -kv[1]["files"],
+            )[:15]
+            for d, stats in sorted_tlds:
+                lines.append(
+                    f"  {d}/  ({stats['files']} files, {_human(stats['bytes'])})"
+                )
+            if len(top_level_dirs) > 15:
+                lines.append(f"  ... ({len(top_level_dirs) - 15} more top-level dirs)")
+
+        # Extension breakdown.
+        if ext_counts:
+            lines.append("\nExtension breakdown (top 15):")
+            for ext, count in sorted(ext_counts.items(), key=lambda kv: -kv[1])[:15]:
+                lines.append(
+                    f"  {ext}: {count} files, {_human(ext_bytes.get(ext, 0))}"
+                )
+
+        # Largest files (often the highest-value forensic targets).
+        if biggest:
+            lines.append("\nLargest files:")
+            for size, rel in biggest:
+                lines.append(f"  {rel} ({_human(size)})")
+
+        lines.append(
+            "\nNext step: call find_files with a pattern like "
+            "'**/*.plist' or '**/keychain-2.db' to locate specific artefacts."
+        )
+
+        return "\n".join(lines)
    except Exception as e:
        return f"[Error listing {dir_path}: {e}]"
+
+
+async def find_files(
+    root: str,
+    pattern: str,
+    max_results: int = 500,
+) -> str:
+    """Recursively find files under *root* whose path matches *pattern*.
+
+    Uses fnmatch-style globs against the *full relative path*; ``**`` is
+    treated as "any number of path segments" (so ``**/*.plist`` finds
+    every plist no matter how deep). Examples:
+
+      - ``**/sms.db``               — iOS SMS database
+      - ``**/keychain-2.db``        — iOS keychain
+      - ``**/ChatStorage.sqlite``   — WhatsApp app store
+      - ``HomeDomain/Library/**``   — anchor at a known iOS domain root
+    Brace sets such as ``*.{plist,sqlite,db}`` are not expanded; use one call per extension.
+
+    Results are sorted by size descending — the biggest hits usually
+    matter most. Capped at *max_results* to keep the LLM context bounded.
+    """
+    import fnmatch
+
+    if not os.path.isdir(root):
+        return f"[Error: {root} is not a directory]"
+
+    root_abs = os.path.abspath(root)
+    # fnmatch's ``*`` matches any characters, including ``/``, so collapsing
+    # ``**`` to ``*`` gives any-depth matching against the full relpath. A leading
+    # ``**/`` must also match files directly under *root*, hence root_pattern.
+    fn_pattern = pattern.replace("**", "*")
+    root_pattern = fn_pattern[2:] if fn_pattern.startswith("*/") else fn_pattern
+
+    hits: list[tuple[int, str]] = []
+    truncated = False
+    try:
+        for dirpath, _dirs, files in os.walk(root_abs):
+            for f in files:
+                full = os.path.join(dirpath, f)
+                rel = os.path.relpath(full, root_abs)
+                if any(fnmatch.fnmatch(rel, pt) for pt in (fn_pattern, root_pattern)) or fnmatch.fnmatch(f, fn_pattern):
+                    try:
+                        size = os.path.getsize(full)
+                    except OSError:
+                        size = 0
+                    hits.append((size, rel))
+                    if len(hits) >= max_results * 4:
+                        # Hard upper bound to keep the walk cheap on huge trees.
+                        truncated = True
+                        break
+            if truncated:
+                break
+    except Exception as e:
+        return f"[Error searching {root}: {e}]"
+
+    hits.sort(reverse=True)
+    if len(hits) > max_results:
+        truncated = True
+        hits = hits[:max_results]
+
+    lines = [
+        f"find_files: pattern={pattern!r} under {root}",
+        f"  matches: {len(hits)}" + (" (truncated)" if truncated else ""),
+    ]
+    if not hits:
+        lines.append("  (no matches)")
+    else:
+        for size, rel in hits:
+            lines.append(f"  {rel} ({size} bytes)")
+    return "\n".join(lines)
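Review note on `**` handling in `find_files`: `fnmatch` has no notion of path segments, so a pattern like `**/sms.db` collapsed to `*/sms.db` requires at least one `/` and does not match a file sitting directly under the search root. The sketch below shows one way to cover that case, under the assumption that a leading `**/` should also match zero directories; `matches_any_depth` is a hypothetical name.

```python
import fnmatch


def matches_any_depth(rel_path: str, pattern: str) -> bool:
    """Hypothetical helper: fnmatch with '**' meaning any number of segments."""
    flat = pattern.replace("**", "*")  # fnmatch's '*' already crosses '/'
    candidates = [flat]
    if flat.startswith("*/"):
        # Let a leading '**/' match zero directories as well.
        candidates.append(flat[2:])
    # fnmatchcase avoids platform-dependent case folding.
    return any(fnmatch.fnmatchcase(rel_path, c) for c in candidates)
```

With this rule, `**/sms.db` matches both `sms.db` and `HomeDomain/Library/SMS/sms.db`, while unrelated names such as `x/notsms.db` are still rejected.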
--- a/tools/strategy.py
+++ b/tools/strategy.py
@@ -0,0 +1,485 @@
+"""Strategist-loop tools — read-only views over graph state that let the
+InvestigationStrategist agent decide whether to keep investigating or to
+declare the investigation complete.
+
+DESIGN_STRATEGIST.md §2. Four read-only views:
+
+    graph_overview()          → hypotheses + sources + pending leads snapshot
+    source_coverage(src_id)   → which artefact categories on this source have
+                                been touched vs are still ✗
+    marginal_yield(n_rounds)  → how much information the last N rounds added
+    budget_status()           → tool calls / rounds / wall-clock against caps
+
+These are pure render functions over the graph — they MUST NOT mutate state.
+The strategist never writes phenomena/edges directly; all graph mutations
+happen through worker agents that the strategist dispatches via propose_lead
+(which is registered separately in tool_registry).
+"""
+
+from __future__ import annotations
+
+import time
+from typing import Any
+
+
+# ---------------------------------------------------------------------------
+# Expected artefact catalogue (per source type)
+#
+# These are SOFT HINTS — items the strategist might want to check on a given
+# source type if any active hypothesis depends on them. The catalogue is
+# intentionally compact; expand it in-place when a new forensic specialty
+# joins the toolset. Each entry:
+#
+#   name       human-readable artefact category
+#   detector   how to recognise that this category has been touched — either
+#              a tool name OR a `<tool>@<path-substring>` pattern, joined with
+#              `|` for alternatives. The matcher is substring on the tool name
+#              and on the args' string representation.
+#   value_for  one-line description of why this category might matter
+# ---------------------------------------------------------------------------
+
+EXPECTED_ARTEFACTS: dict[str, list[dict[str, str]]] = {
+    "disk_image+windows": [
+        {"name": "partition layout",   "detector": "partition_info|mmls",
+         "value_for": "deleted files, hidden partitions"},
+        {"name": "filesystem walk",    "detector": "list_directory|fls",
+         "value_for": "directory tree, recoverable deleted entries"},
+        {"name": "registry hives",     "detector": "parse_registry_key|list_installed_software|get_user_activity",
+         "value_for": "installed software, user activity, timezone"},
+        {"name": "browser history",    "detector": "list_directory@AppData|read_text_file@History|read_text_file@Bookmarks",
+         "value_for": "URL access, downloads, web search terms"},
+        {"name": "prefetch",           "detector": "parse_prefetch|extract_file@Prefetch",
+         "value_for": "program execution evidence"},
+        {"name": "email/IM config",    "detector": "get_email_config",
+         "value_for": "user accounts, configured mail/IM clients"},
+        {"name": "recycle bin",        "detector": "list_directory@$Recycle|count_deleted_files",
+         "value_for": "deleted file metadata and recovery"},
+    ],
+    "disk_image+android": [
+        {"name": "partition probe",    "detector": "probe_android_partitions",
+         "value_for": "discover EFS / SYSTEM / USERDATA layout"},
+        {"name": "system properties",  "detector": "read_text_file@build.prop|read_text_file@default.prop",
+         "value_for": "device model, OS version, CSC region"},
+        {"name": "app inventory",      "detector": "list_directory@data/app|list_directory@data/data",
+         "value_for": "installed apps, package names"},
+        {"name": "user data dbs",      "detector": "list_directory@data/data|sqlite_query",
+         "value_for": "messages, contacts, app-specific data"},
+        {"name": "device identity",    "detector": "search_strings@imei|search_strings@serial|search_strings@DRI",
+         "value_for": "IMEI, serial, device fingerprint"},
+    ],
+    "mobile_extraction": [
+        {"name": "device info",        "detector": "read_idevice_info|read_text_file@iDevice_info",
+         "value_for": "model, iOS version, IMEI, ICCID, Bluetooth MAC, UDID"},
+        {"name": "AddressBook",        "detector": "sqlite_query@AddressBook.sqlitedb",
+         "value_for": "contacts, owner identity"},
+        {"name": "SMS / iMessage",     "detector": "sqlite_query@sms.db",
+         "value_for": "messaging content, OTP / verification codes"},
+        {"name": "WhatsApp messages",  "detector": "sqlite_query@ChatStorage.sqlite|sqlite_query@WhatsApp",
+         "value_for": "WhatsApp content, group membership, call records"},
+        {"name": "WeChat",             "detector": "sqlite_query@MM.sqlite|sqlite_query@wcdb|list_directory@WeChat",
+         "value_for": "WeChat IDs, messages, follow targets"},
+        {"name": "Call history",       "detector": "sqlite_query@CallHistory|sqlite_query@call_history",
+         "value_for": "incoming/outgoing call log"},
+        {"name": "Safari history",     "detector": "sqlite_query@History.db|read_text_file@Bookmarks.plist|parse_plist@Bookmarks",
+         "value_for": "URL access, bookmarks, search queries"},
+        {"name": "Photos library",     "detector": "sqlite_query@Photos.sqlite|parse_plist@Photos",
+         "value_for": "photo metadata, EXIF, geolocation, source app"},
+        {"name": "iCloud accounts",    "detector": "parse_plist@Accounts3|parse_ios_keychain",
+         "value_for": "Apple ID, registered services, authentication tokens"},
+        {"name": "app inventory",      "detector": "list_directory@Bundle/Application|list_directory@Containers",
+         "value_for": "installed apps, app-specific containers"},
+        {"name": "Wi-Fi history",      "detector": "parse_plist@com.apple.wifi|read_text_file@known_networks",
+         "value_for": "connected SSIDs, keys, first/last seen times"},
+    ],
+    "media_collection": [
+        {"name": "archive unpack",     "detector": "unzip_archive|list_directory",
+         "value_for": "extract images / docs for downstream analysis"},
+        {"name": "OCR text",           "detector": "ocr_image",
+         "value_for": "screenshot text content (chat, transaction, IDs)"},
+        {"name": "metadata",           "detector": "read_binary_preview|search_strings",
+         "value_for": "EXIF, embedded timestamps, device fingerprints"},
+    ],
+    "archive": [
+        {"name": "archive unpack",     "detector": "unzip_archive",
+         "value_for": "expose contents for further analysis"},
+    ],
+}
+
+
+def _key_for_source(src) -> str:
+    """Return the EXPECTED_ARTEFACTS key for a source: 'disk_image+platform'
+    when platform is set in meta, otherwise just the source type."""
+    src_type = getattr(src, "type", "")
+    if src_type == "disk_image":
+        platform = str((getattr(src, "meta", None) or {}).get("platform") or "").lower()
+        if platform:
+            return f"disk_image+{platform}"
+    return src_type
+
+
+def _detector_matches(detector: str, tool_name: str, args_str: str) -> bool:
+    """Return True if any '|'-separated branch of `detector` matches.
+
+    A branch like ``sqlite_query@AddressBook.sqlitedb`` requires both the
+    tool name (case-sensitive substring) AND the args (case-insensitive
+    substring) to match. A branch like ``parse_prefetch`` checks the tool name only.
+    """
+    for branch in detector.split("|"):
+        branch = branch.strip()
+        if not branch:
+            continue
+        if "@" in branch:
+            t, sub = branch.split("@", 1)
+            if t in tool_name and sub.lower() in args_str.lower():
+                return True
+        else:
+            if branch in tool_name:
+                return True
+    return False
+
+
+# ---------------------------------------------------------------------------
+# graph_overview()
+# ---------------------------------------------------------------------------
+
+def graph_overview(graph) -> str:
+    """Render hypotheses + sources + pending leads as the strategist's
+    primary decision view.
+
+    Annotates each hypothesis with the count of distinct sources that
+    contribute supporting (positive-LR) edges. A hypothesis with many edges
+    but only one source is a strategist signal to seek cross-source
+    corroboration.
+    """
+    lines: list[str] = ["# Investigation State", ""]
+
+    # Hypotheses table.
+    if graph.hypotheses:
+        lines.append(f"## Hypotheses ({len(graph.hypotheses)})")
+        lines.append("")
+        lines.append(
+            "| id | title | L | conf | status | edges_in | distinct_sources | recent_flip |"
+        )
+        lines.append("|----|-------|---|------|--------|---------:|-----------------:|--------------|")
+        # Sort active hypotheses first, then by absolute log-odds magnitude
+        # descending, so the hypotheses a lead can still move head the table
+        # and the most decided ones come first within each group.
+        for hid, h in sorted(
+            graph.hypotheses.items(),
+            key=lambda kv: (kv[1].status != "active", -abs(kv[1].log_odds)),
+        ):
+            in_edges = graph._adj_rev.get(hid, [])
+            edges_in = len(in_edges)
+            # Distinct sources contributing edges (looked up via source
+            # phenomenon's source_id; entity→entity edges have no source).
+            distinct_sources: set[str] = set()
+            for e in in_edges:
+                src_node = graph.phenomena.get(e.source_id)
+                if src_node is not None and src_node.source_id:
+                    distinct_sources.add(src_node.source_id)
+            # Did this hypothesis's status change in the last 2 rounds?
+            recent = "no"
+            recent_rounds = graph.investigation_rounds[-2:]
+            for r in recent_rounds:
+                before = r.hypothesis_status_snapshot_before.get(hid)
+                after = r.hypothesis_status_snapshot_after.get(hid)
+                if before and after and before != after:
+                    recent = f"yes ({before}→{after} in R{r.round_number})"
+                    break
+            title = (h.title or "")[:60].replace("|", "/")
+            lines.append(
+                f"| {hid[:14]} | {title} | {h.log_odds:+.2f} | "
+                f"{h.confidence:.2f} | {h.status} | {edges_in} | "
+                f"{len(distinct_sources)} | {recent} |"
+            )
+        lines.append("")
+    else:
+        lines.append("## Hypotheses\n\n_(none yet — Phase 2 has not produced any)_\n")
+
+    # Sources table.
+    if graph.case and graph.case.sources:
+        lines.append(f"## Sources ({len(graph.case.sources)})")
+        lines.append("")
+        lines.append(
+            "| id | type | phenomena | identities | last_touched_in_round |"
+        )
+        lines.append("|----|------|----------:|-----------:|----------------------|")
+        for src in graph.case.sources:
+            ph_count = sum(
+                1 for p in graph.phenomena.values() if p.source_id == src.id
+            )
+            id_count = sum(
+                1 for e in graph.entities.values()
+                for i in e.identifiers
+                if any(
+                    p.source_id == src.id
+                    for p in graph.phenomena.values()
+                    if p.id == i.get("phenomenon_id")
+                )
+            )
+            # Latest round that produced a phenomenon on this source (a proxy for "last touched").
+            last_r = "—"
+            for r in reversed(graph.investigation_rounds):
+                if r.new_phenomena_count > 0:
+                    # Heuristic: if any phenomenon created during this round
+                    # was on this source, mark this round as the last touch.
+                    in_round = [
+                        p for p in graph.phenomena.values()
+                        if p.source_id == src.id
+                        and r.started_at <= p.created_at
+                        and (not r.completed_at or p.created_at <= r.completed_at)
+                    ]
+                    if in_round:
+                        last_r = f"R{r.round_number}"
+                        break
+            lines.append(
+                f"| {src.id} | {src.type} | {ph_count} | {id_count} | {last_r} |"
+            )
+        lines.append("")
+
+    # Pending leads.
+    pending = [l for l in graph.leads if l.status == "pending"]
+    if pending:
+        lines.append(f"## Pending Leads ({len(pending)})")
+        lines.append("")
+        lines.append("| id | from | target_agent | for_hypothesis | description |")
+        lines.append("|----|------|--------------|----------------|-------------|")
+        for l in pending[:20]:
+            desc = (l.description or "")[:80].replace("|", "/")
+            mh = l.motivating_hypothesis or l.hypothesis_id or "—"
+            lines.append(
+                f"| {l.id} | {l.proposed_by or '—'} | {l.target_agent} | "
+                f"{mh[:14] if mh != '—' else '—'} | {desc} |"
+            )
+        if len(pending) > 20:
+            lines.append(f"\n_(+{len(pending) - 20} more pending leads not shown)_")
+        lines.append("")
+    else:
+        lines.append("## Pending Leads\n\n_(none — no investigations queued)_\n")
+
+    # Interpretation hint at the end, plain English.
+    lines.append("---")
+    lines.append(
+        "**Interpretation hints**: A hypothesis with many edges but only one "
+        "distinct_source has fragile cross-source independence — a single "
+        "edge from a *different* source would do more for it than another "
+        "edge from the same source (harmonic damping makes repeats cheap). "
+        "Hypotheses in the active band (0.2 < conf < 0.8) are the ones a "
+        "well-targeted lead can flip. recent_flip = 'yes' means belief is "
+        "still moving on that hypothesis; 'no' across 2 rounds suggests "
+        "stability."
+    )
+
+    return "\n".join(lines)
+
+
+# ---------------------------------------------------------------------------
+# source_coverage(source_id)
+# ---------------------------------------------------------------------------
+
+def source_coverage(graph, source_id: str) -> str:
+    """Render which expected artefact categories have been touched on
+    *source_id*, and which remain ✗.
+
+    Output is markdown. The closing paragraph reminds the strategist that
+    coverage hints are heuristics — investigate ✗ items only when an active
+    hypothesis depends on them. This is the design's central guardrail
+    against the system devolving into a fixed forensic checklist.
+    """
+    src = graph.case.get_source(source_id) if graph.case else None
+    if src is None:
+        return f"Error: source_id {source_id!r} not found in case."
+
+    key = _key_for_source(src)
+    expected = EXPECTED_ARTEFACTS.get(key, [])
+
+    # Collect this source's invocation history.
+    invs = [
+        inv for inv in graph.tool_invocations.values()
+        if inv.source_id == source_id
+    ]
+
+    # For each expected category, decide ✓ / ✗ + show example invocation if ✓.
+    rows: list[tuple[str, str, str, str]] = []
+    for entry in expected:
+        name = entry["name"]
+        detector = entry["detector"]
+        value_for = entry["value_for"]
+        matched: str | None = None
+        for inv in invs:
+            args_str = ""
+            try:
+                args_str = " ".join(f"{k}={v}" for k, v in (inv.args or {}).items())
+            except Exception:
+                args_str = str(inv.args)
+            if _detector_matches(detector, inv.tool, args_str):
+                matched = f"{inv.tool}({args_str[:60]})"
+                break
+        mark = "✓" if matched else "✗"
+        evidence = matched or "—"
+        rows.append((mark, name, evidence, value_for))
+
+    lines: list[str] = [
+        f"# Coverage of source `{source_id}` ({src.label})",
+        "",
+        f"Source type: `{src.type}` / access_mode: `{src.access_mode}`",
+        f"Invocations made against this source: **{len(invs)}**",
+        "",
+    ]
+    if not expected:
+        lines.append(
+            f"_(no expected-artefact catalogue entry for source type `{key}` — "
+            "coverage cannot be assessed against a baseline)_"
+        )
+    else:
+        lines.append(
+            "| ✓/✗ | category | example invocation | what it would tell us |"
+        )
+        lines.append("|-----|----------|---------------------|------------------------|")
+        for mark, name, evidence, value_for in rows:
+            lines.append(
+                f"| {mark} | {name} | {evidence[:70].replace('|','/')} | {value_for} |"
+            )
+        n_covered = sum(1 for r in rows if r[0] == "✓")
+        n_total = len(rows)
+        lines.append("")
+        lines.append(f"Coverage: **{n_covered}/{n_total}** ({n_covered*100//max(n_total,1)}%)")
+
+    # Closing guardrail. Invocations on this source that matched no expected
+    # entry (possible novel exploration) are not listed separately here.
+    lines.append("")
+    lines.append("---")
+    lines.append(
+        "**Coverage hints are heuristics, not requirements.** Skip an item if "
+        "the case theory makes it irrelevant — a financial-fraud case has no "
+        "reason to OCR every photo. Investigate ✗ items only when they could "
+        "materially affect an active hypothesis. If you propose a lead just "
+        "because something is ✗, the strategist prompt is being misused."
+    )
+    return "\n".join(lines)
+
+
+# ---------------------------------------------------------------------------
+# marginal_yield(last_n_rounds)
+# ---------------------------------------------------------------------------
+
+def marginal_yield(graph, last_n_rounds: int = 2) -> str:
+    """Render the last N investigation rounds' yield deltas.
+
+    Yield columns:
+      - new_phenomena: phenomena created during the round
+      - new_edges:     edges (any direction) added during the round
+      - status_flips:  hypotheses whose status changed during the round
+
+    A row of zeros means that round didn't move the graph. Two consecutive
+    such rows are strong evidence of diminishing returns; the strategist
+    should consider declare_investigation_complete with reason
+    marginal_yield_zero.
+    """
+    rounds = [r for r in graph.investigation_rounds if r.completed_at]
+    if not rounds:
+        return (
+            "# Marginal Yield\n\n"
+            "_(no completed investigation rounds yet — yield not applicable)_"
+        )
+    recent = rounds[-max(1, last_n_rounds):]
+    lines = [f"# Marginal Yield (last {len(recent)} of {len(rounds)} rounds)", ""]
+    lines.append("| round | new_phenomena | new_edges | status_flips |")
+    lines.append("|-------|--------------:|----------:|-------------:|")
+    yields: list[tuple[int, int, int]] = []
+    for r in recent:
+        yields.append((r.new_phenomena_count, r.new_edges_count, r.status_flips))
+        lines.append(
+            f"| R{r.round_number} | {r.new_phenomena_count} | "
+            f"{r.new_edges_count} | {r.status_flips} |"
+        )
+
+    # Trend interpretation aid.
+    lines.append("")
+    if all(y == (0, 0, 0) for y in yields):
+        trend = (
+            "Yield is zero across these rounds — diminishing returns are "
+            "confirmed. Strongly consider declare_investigation_complete "
+            "(reason: marginal_yield_zero)."
+        )
+    elif len(yields) >= 2:
+        first = yields[0][0] + yields[0][1] + yields[0][2]
+        last = yields[-1][0] + yields[-1][1] + yields[-1][2]
+        if last == 0 and first > 0:
+            trend = (
+                "Yield collapsed to zero in the most recent round. One more "
+                "well-targeted probe is reasonable; another zero-yield round "
+                "after that means stop."
+            )
+        elif last < first / 2 and first > 0:
+            trend = (
+                f"Decelerating ({last}/{first} ≈ "
+                f"{int(100*last/first)}% of the earlier round). Diminishing "
+                "returns are accumulating."
+            )
+        else:
+            trend = "Yield is still active — further investigation is paying off."
+    else:
+        trend = (
+            "Only one completed round — too early to call a trend. Run at "
+            "least one more before considering completion."
+        )
+    lines.append(f"**Trend**: {trend}")
+    return "\n".join(lines)
+
+
+# ---------------------------------------------------------------------------
+# budget_status()
+# ---------------------------------------------------------------------------
+
+def budget_status(graph, budgets: dict[str, Any] | None, start_time: float | None) -> str:
+    """Render budget usage against config.yaml `budgets` block.
+
+    Counters:
+      - tool_calls: len(graph.tool_invocations)
+      - strategist_rounds: len(graph.investigation_rounds)
+      - wall_clock_minutes: time.monotonic() - start_time (start_time must be a monotonic reading)
+    """
+    budgets = budgets or {}
+    tool_calls_used = len(graph.tool_invocations)
+    rounds_used = len(graph.investigation_rounds)
+    minutes_used: float | None = None
+    if start_time is not None:
+        minutes_used = (time.monotonic() - start_time) / 60.0
+
+    def _row(name: str, used: float, cap: Any) -> str:
+        if cap is None:
+            return f"| {name} | {used:g} | — | (unbounded) |"
+        pct = (used / cap) * 100 if cap else 0
+        return f"| {name} | {used:g} | {cap} | {pct:.0f}% |"
+
+    lines = ["# Budget Status", ""]
+    lines.append("| metric | used | cap | pct |")
+    lines.append("|--------|-----:|----:|----:|")
+    lines.append(_row("tool_calls", tool_calls_used, budgets.get("tool_calls_total")))
+    lines.append(_row("strategist_rounds", rounds_used, budgets.get("strategist_rounds_max")))
+    if minutes_used is not None:
+        lines.append(_row(
+            "wall_clock_minutes", round(minutes_used, 1),
+            budgets.get("wall_clock_minutes_max"),
+        ))
+
+    # Pacing hint.
+    lines.append("")
+    flags = []
+    cap_calls = budgets.get("tool_calls_total")
+    cap_rounds = budgets.get("strategist_rounds_max")
+    if cap_calls and tool_calls_used / cap_calls >= 0.9:
+        flags.append("tool_calls budget ≥ 90% used; favour declare_investigation_complete")
+    if cap_rounds and rounds_used / cap_rounds >= 0.7:
+        flags.append("strategist rounds ≥ 70% used; only propose leads with high expected yield")
+    if flags:
+        lines.append("**Budget warnings**:")
+        for f in flags:
+            lines.append(f"- {f}")
+    else:
+        lines.append(
+            "Budget room remains. Standard rule: each propose_lead should "
+            "name a specific hypothesis it expects to move; otherwise skip it."
+        )
+    return "\n".join(lines)
--- a/uv.lock
+++ b/uv.lock
@@ -2,6 +2,15 @@ version = 1
 revision = 3
 requires-python = ">=3.14"

+[[package]]
+name = "annotated-types"
+version = "0.7.0"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/ee/67/531ea369ba64dcff5ec9c3402f9f51bf748cec26dde048a2f973a4eea7f5/annotated_types-0.7.0.tar.gz", hash = "sha256:aff07c09a53a08bc8cfccb9c85b05f1aa9a2a6f23728d790723543408344ce89", size = 16081, upload-time = "2024-05-20T21:33:25.928Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/78/b6/6307fbef88d9b5ee7421e68d78a9f162e0da4900bc5f5793f6d3d0e34fb8/annotated_types-0.7.0-py3-none-any.whl", hash = "sha256:1f02e8b43a8fbbc3f3e0d4f0f4bfc8131bcb4eebe8849b8e5c773f3a1c582a53", size = 13643, upload-time = "2024-05-20T21:33:24.1Z" },
+]
+
 [[package]]
 name = "anyio"
 version = "4.13.0"
@@ -41,6 +50,15 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/b2/fb/08b3f4bf05da99aba8ffea52a558758def16e8516bc75ca94ff73587e7d3/construct-2.10.70-py3-none-any.whl", hash = "sha256:c80be81ef595a1a821ec69dc16099550ed22197615f4320b57cc9ce2a672cb30", size = 63020, upload-time = "2023-11-29T08:44:46.876Z" },
 ]

+[[package]]
+name = "distro"
+version = "1.9.0"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/fc/f8/98eea607f65de6527f8a2e8885fc8015d3e6f5775df186e443e0964a11c3/distro-1.9.0.tar.gz", hash = "sha256:2fa77c6fd8940f116ee1d6b94a2f90b13b5ea8d019b98bc8bafdcabcdd9bdbed", size = 60722, upload-time = "2023-12-24T09:54:32.31Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/12/b3/231ffd4ab1fc9d679809f356cebee130ac7daa00d6d6f3206dd4fd137e9e/distro-1.9.0-py3-none-any.whl", hash = "sha256:7bffd925d65168f85027d8da9af6bddab658135b840670a223589bc0c8ef02b2", size = 20277, upload-time = "2023-12-24T09:54:30.421Z" },
+]
+
 [[package]]
 name = "h11"
 version = "0.16.0"
@@ -110,12 +128,50 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/cb/b1/3846dd7f199d53cb17f49cba7e651e9ce294d8497c8c150530ed11865bb8/iniconfig-2.3.0-py3-none-any.whl", hash = "sha256:f631c04d2c48c52b84d0d0549c99ff3859c98df65b3101406327ecc7d53fbf12", size = 7484, upload-time = "2025-10-18T21:55:41.639Z" },
 ]

+[[package]]
+name = "jiter"
+version = "0.14.0"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/6e/c1/0cddc6eb17d4c53a99840953f95dd3accdc5cfc7a337b0e9b26476276be9/jiter-0.14.0.tar.gz", hash = "sha256:e8a39e66dac7153cf3f964a12aad515afa8d74938ec5cc0018adcdae5367c79e", size = 165725, upload-time = "2026-04-10T14:28:42.01Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/4f/1e/354ed92461b165bd581f9ef5150971a572c873ec3b68a916d5aa91da3cc2/jiter-0.14.0-cp314-cp314-macosx_10_12_x86_64.whl", hash = "sha256:6f396837fc7577871ca8c12edaf239ed9ccef3bbe39904ae9b8b63ce0a48b140", size = 315277, upload-time = "2026-04-10T14:27:18.109Z" },
+    { url = "https://files.pythonhosted.org/packages/a6/95/8c7c7028aa8636ac21b7a55faef3e34215e6ed0cbf5ae58258427f621aa3/jiter-0.14.0-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:a4d50ea3d8ba4176f79754333bd35f1bbcd28e91adc13eb9b7ca91bc52a6cef9", size = 315923, upload-time = "2026-04-10T14:27:19.603Z" },
+    { url = "https://files.pythonhosted.org/packages/47/40/e2a852a44c4a089f2681a16611b7ce113224a80fd8504c46d78491b47220/jiter-0.14.0-cp314-cp314-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:ce17f8a050447d1b4153bda4fb7d26e6a9e74eb4f4a41913f30934c5075bf615", size = 344943, upload-time = "2026-04-10T14:27:21.262Z" },
+    { url = "https://files.pythonhosted.org/packages/fc/1f/670f92adee1e9895eac41e8a4d623b6da68c4d46249d8b556b60b63f949e/jiter-0.14.0-cp314-cp314-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:f4f1c4b125e1652aefbc2e2c1617b60a160ab789d180e3d423c41439e5f32850", size = 369725, upload-time = "2026-04-10T14:27:22.766Z" },
+    { url = "https://files.pythonhosted.org/packages/01/2f/541c9ba567d05de1c4874a0f8f8c5e3fd78e2b874266623da9a775cf46e0/jiter-0.14.0-cp314-cp314-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:be808176a6a3a14321d18c603f2d40741858a7c4fc982f83232842689fe86dd9", size = 461210, upload-time = "2026-04-10T14:27:24.315Z" },
+    { url = "https://files.pythonhosted.org/packages/ce/a9/c31cbec09627e0d5de7aeaec7690dba03e090caa808fefd8133137cf45bc/jiter-0.14.0-cp314-cp314-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:26679d58ba816f88c3849306dd58cb863a90a1cf352cdd4ef67e30ccf8a77994", size = 380002, upload-time = "2026-04-10T14:27:26.155Z" },
+    { url = "https://files.pythonhosted.org/packages/50/02/3c05c1666c41904a2f607475a73e7a4763d1cbde2d18229c4f85b22dc253/jiter-0.14.0-cp314-cp314-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:80381f5a19af8fa9aef743f080e34f6b25ebd89656475f8cf0470ec6157052aa", size = 354678, upload-time = "2026-04-10T14:27:27.701Z" },
+    { url = "https://files.pythonhosted.org/packages/7d/97/e15b33545c2b13518f560d695f974b9891b311641bdcf178d63177e8801e/jiter-0.14.0-cp314-cp314-manylinux_2_31_riscv64.whl", hash = "sha256:004df5fdb8ecbd6d99f3227df18ba1a259254c4359736a2e6f036c944e02d7c5", size = 358920, upload-time = "2026-04-10T14:27:29.256Z" },
+    { url = "https://files.pythonhosted.org/packages/ad/d2/8b1461def6b96ba44530df20d07ef7a1c7da22f3f9bf1727e2d611077bf1/jiter-0.14.0-cp314-cp314-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:cff5708f7ed0fa098f2b53446c6fa74c48469118e5cd7497b4f1cd569ab06928", size = 394512, upload-time = "2026-04-10T14:27:31.344Z" },
+    { url = "https://files.pythonhosted.org/packages/e3/88/837566dd6ed6e452e8d3205355afd484ce44b2533edfa4ed73a298ea893e/jiter-0.14.0-cp314-cp314-musllinux_1_1_aarch64.whl", hash = "sha256:2492e5f06c36a976d25c7cc347a60e26d5470178d44cde1b9b75e60b4e519f28", size = 521120, upload-time = "2026-04-10T14:27:33.299Z" },
+    { url = "https://files.pythonhosted.org/packages/89/6b/b00b45c4d1b4c031777fe161d620b755b5b02cdade1e316dcb46e4471d63/jiter-0.14.0-cp314-cp314-musllinux_1_1_x86_64.whl", hash = "sha256:7609cfbe3a03d37bfdbf5052012d5a879e72b83168a363deae7b3a26564d57de", size = 553668, upload-time = "2026-04-10T14:27:34.868Z" },
+    { url = "https://files.pythonhosted.org/packages/ad/d8/6fe5b42011d19397433d345716eac16728ac241862a2aac9c91923c7509a/jiter-0.14.0-cp314-cp314-win32.whl", hash = "sha256:7282342d32e357543565286b6450378c3cd402eea333fc1ebe146f1fabb306fc", size = 207001, upload-time = "2026-04-10T14:27:36.455Z" },
+    { url = "https://files.pythonhosted.org/packages/e5/43/5c2e08da1efad5e410f0eaaabeadd954812612c33fbbd8fd5328b489139d/jiter-0.14.0-cp314-cp314-win_amd64.whl", hash = "sha256:bd77945f38866a448e73b0b7637366afa814d4617790ecd88a18ca74377e6c02", size = 202187, upload-time = "2026-04-10T14:27:38Z" },
+    { url = "https://files.pythonhosted.org/packages/aa/1f/6e39ac0b4cdfa23e606af5b245df5f9adaa76f35e0c5096790da430ca506/jiter-0.14.0-cp314-cp314-win_arm64.whl", hash = "sha256:f2d4c61da0821ee42e0cdf5489da60a6d074306313a377c2b35af464955a3611", size = 192257, upload-time = "2026-04-10T14:27:39.504Z" },
+    { url = "https://files.pythonhosted.org/packages/05/57/7dbc0ffbbb5176a27e3518716608aa464aee2e2887dc938f0b900a120449/jiter-0.14.0-cp314-cp314t-macosx_11_0_arm64.whl", hash = "sha256:1bf7ff85517dd2f20a5750081d2b75083c1b269cf75afc7511bdf1f9548beb3b", size = 323441, upload-time = "2026-04-10T14:27:41.039Z" },
+    { url = "https://files.pythonhosted.org/packages/83/6e/7b3314398d8983f06b557aa21b670511ec72d3b79a68ee5e4d9bff972286/jiter-0.14.0-cp314-cp314t-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:c8ef8791c3e78d6c6b157c6d360fbb5c715bebb8113bc6a9303c5caff012754a", size = 348109, upload-time = "2026-04-10T14:27:42.552Z" },
+    { url = "https://files.pythonhosted.org/packages/ae/4f/8dc674bcd7db6dba566de73c08c763c337058baff1dbeb34567045b27cdc/jiter-0.14.0-cp314-cp314t-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:e74663b8b10da1fe0f4e4703fd7980d24ad17174b6bb35d8498d6e3ebce2ae6a", size = 368328, upload-time = "2026-04-10T14:27:44.574Z" },
+    { url = "https://files.pythonhosted.org/packages/3b/5f/188e09a1f20906f98bbdec44ed820e19f4e8eb8aff88b9d1a5a497587ff3/jiter-0.14.0-cp314-cp314t-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:1aca29ba52913f78362ec9c2da62f22cdc4c3083313403f90c15460979b84d9b", size = 463301, upload-time = "2026-04-10T14:27:46.717Z" },
+    { url = "https://files.pythonhosted.org/packages/ac/f0/19046ef965ed8f349e8554775bb12ff4352f443fbe12b95d31f575891256/jiter-0.14.0-cp314-cp314t-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:8b39b7d87a952b79949af5fef44d2544e58c21a28da7f1bae3ef166455c61746", size = 378891, upload-time = "2026-04-10T14:27:48.32Z" },
+    { url = "https://files.pythonhosted.org/packages/c4/c3/da43bd8431ee175695777ee78cf0e93eacbb47393ff493f18c45231b427d/jiter-0.14.0-cp314-cp314t-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:78d918a68b26e9fab068c2b5453577ef04943ab2807b9a6275df2a812599a310", size = 360749, upload-time = "2026-04-10T14:27:49.88Z" },
+    { url = "https://files.pythonhosted.org/packages/72/26/e054771be889707c6161dbdec9c23d33a9ec70945395d70f07cfea1e9a6f/jiter-0.14.0-cp314-cp314t-manylinux_2_31_riscv64.whl", hash = "sha256:b08997c35aee1201c1a5361466a8fb9162d03ae7bf6568df70b6c859f1e654a4", size = 358526, upload-time = "2026-04-10T14:27:51.504Z" },
+    { url = "https://files.pythonhosted.org/packages/c3/0f/7bea65ea2a6d91f2bf989ff11a18136644392bf2b0497a1fa50934c30a9c/jiter-0.14.0-cp314-cp314t-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:260bf7ca20704d58d41f669e5e9fe7fe2fa72901a6b324e79056f5d52e9c9be2", size = 393926, upload-time = "2026-04-10T14:27:53.368Z" },
+    { url = "https://files.pythonhosted.org/packages/3c/a1/b1ff7d70deef61ac0b7c6c2f12d2ace950cdeecb4fdc94500a0926802857/jiter-0.14.0-cp314-cp314t-musllinux_1_1_aarch64.whl", hash = "sha256:37826e3df29e60f30a382f9294348d0238ef127f4b5d7f5f8da78b5b9e050560", size = 521052, upload-time = "2026-04-10T14:27:55.058Z" },
+    { url = "https://files.pythonhosted.org/packages/0b/7b/3b0649983cbaf15eda26a414b5b1982e910c67bd6f7b1b490f3cfc76896a/jiter-0.14.0-cp314-cp314t-musllinux_1_1_x86_64.whl", hash = "sha256:645be49c46f2900937ba0eaf871ad5183c96858c0af74b6becc7f4e367e36e06", size = 553716, upload-time = "2026-04-10T14:27:57.269Z" },
+    { url = "https://files.pythonhosted.org/packages/97/f8/33d78c83bd93ae0c0af05293a6660f88a1977caef39a6d72a84afab94ce0/jiter-0.14.0-cp314-cp314t-win32.whl", hash = "sha256:2f7877ed45118de283786178eceaf877110abacd04fde31efff3940ae9672674", size = 207957, upload-time = "2026-04-10T14:27:59.285Z" },
+    { url = "https://files.pythonhosted.org/packages/d6/ac/2b760516c03e2227826d1f7025d89bf6bf6357a28fe75c2a2800873c50bf/jiter-0.14.0-cp314-cp314t-win_amd64.whl", hash = "sha256:14c0cb10337c49f5eafe8e7364daca5e29a020ea03580b8f8e6c597fed4e1588", size = 204690, upload-time = "2026-04-10T14:28:00.962Z" },
+    { url = "https://files.pythonhosted.org/packages/dc/2e/a44c20c58aeed0355f2d326969a181696aeb551a25195f47563908a815be/jiter-0.14.0-cp314-cp314t-win_arm64.whl", hash = "sha256:5419d4aa2024961da9fe12a9cfe7484996735dca99e8e090b5c88595ef1951ff", size = 191338, upload-time = "2026-04-10T14:28:02.853Z" },
+]
+
 [[package]]
 name = "masforensics"
 version = "0.1.0"
 source = { virtual = "." }
 dependencies = [
    { name = "httpx", extra = ["socks"] },
+    { name = "openai" },
+    { name = "pillow" },
+    { name = "pytesseract" },
    { name = "pyyaml" },
    { name = "regipy" },
 ]
@@ -129,6 +185,9 @@ dev = [
 [package.metadata]
 requires-dist = [
    { name = "httpx", extras = ["socks"], specifier = ">=0.28.1" },
+    { name = "openai", specifier = ">=2.36.0" },
+    { name = "pillow", specifier = ">=12.2.0" },
+    { name = "pytesseract", specifier = ">=0.3.13" },
    { name = "pyyaml" },
    { name = "regipy", specifier = ">=6.2.1" },
 ]
@@ -139,6 +198,25 @@ dev = [
    { name = "pytest-asyncio", specifier = ">=1.3.0" },
 ]

+[[package]]
+name = "openai"
+version = "2.36.0"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+    { name = "anyio" },
+    { name = "distro" },
+    { name = "httpx" },
+    { name = "jiter" },
+    { name = "pydantic" },
+    { name = "sniffio" },
+    { name = "tqdm" },
+    { name = "typing-extensions" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/f4/a1/4d5e84cf51720fc1526cc49e10ac1961abcccb55b0efb3d970db1e9a2728/openai-2.36.0.tar.gz", hash = "sha256:139dea0edd2f1b30c33d46ae1a6929e03906254140318e4608e98fe8c566f2e7", size = 753003, upload-time = "2026-05-07T17:33:17.075Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/9d/1c/5d43735b2553baae2a5e899dcbcd0670a86930d993184d72ca909bf11c9b/openai-2.36.0-py3-none-any.whl", hash = "sha256:143f6194b548dbc2c921af1f1b03b9f14c85fed8a75b5b516f5bcc11a2a50c63", size = 1302361, upload-time = "2026-05-07T17:33:15.063Z" },
+]
+
 [[package]]
 name = "packaging"
 version = "26.0"
@@ -148,6 +226,39 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/b7/b9/c538f279a4e237a006a2c98387d081e9eb060d203d8ed34467cc0f0b9b53/packaging-26.0-py3-none-any.whl", hash = "sha256:b36f1fef9334a5588b4166f8bcd26a14e521f2b55e6b9de3aaa80d3ff7a37529", size = 74366, upload-time = "2026-01-21T20:50:37.788Z" },
 ]

+[[package]]
+name = "pillow"
+version = "12.2.0"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/8c/21/c2bcdd5906101a30244eaffc1b6e6ce71a31bd0742a01eb89e660ebfac2d/pillow-12.2.0.tar.gz", hash = "sha256:a830b1a40919539d07806aa58e1b114df53ddd43213d9c8b75847eee6c0182b5", size = 46987819, upload-time = "2026-04-01T14:46:17.687Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/bf/98/4595daa2365416a86cb0d495248a393dfc84e96d62ad080c8546256cb9c0/pillow-12.2.0-cp314-cp314-ios_13_0_arm64_iphoneos.whl", hash = "sha256:3adc9215e8be0448ed6e814966ecf3d9952f0ea40eb14e89a102b87f450660d8", size = 4100848, upload-time = "2026-04-01T14:44:48.48Z" },
+    { url = "https://files.pythonhosted.org/packages/0b/79/40184d464cf89f6663e18dfcf7ca21aae2491fff1a16127681bf1fa9b8cf/pillow-12.2.0-cp314-cp314-ios_13_0_arm64_iphonesimulator.whl", hash = "sha256:6a9adfc6d24b10f89588096364cc726174118c62130c817c2837c60cf08a392b", size = 4176515, upload-time = "2026-04-01T14:44:51.353Z" },
+    { url = "https://files.pythonhosted.org/packages/b0/63/703f86fd4c422a9cf722833670f4f71418fb116b2853ff7da722ea43f184/pillow-12.2.0-cp314-cp314-ios_13_0_x86_64_iphonesimulator.whl", hash = "sha256:6a6e67ea2e6feda684ed370f9a1c52e7a243631c025ba42149a2cc5934dec295", size = 3640159, upload-time = "2026-04-01T14:44:53.588Z" },
+    { url = "https://files.pythonhosted.org/packages/71/e0/fb22f797187d0be2270f83500aab851536101b254bfa1eae10795709d283/pillow-12.2.0-cp314-cp314-macosx_10_15_x86_64.whl", hash = "sha256:2bb4a8d594eacdfc59d9e5ad972aa8afdd48d584ffd5f13a937a664c3e7db0ed", size = 5312185, upload-time = "2026-04-01T14:44:56.039Z" },
+    { url = "https://files.pythonhosted.org/packages/ba/8c/1a9e46228571de18f8e28f16fabdfc20212a5d019f3e3303452b3f0a580d/pillow-12.2.0-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:80b2da48193b2f33ed0c32c38140f9d3186583ce7d516526d462645fd98660ae", size = 4695386, upload-time = "2026-04-01T14:44:58.663Z" },
+    { url = "https://files.pythonhosted.org/packages/70/62/98f6b7f0c88b9addd0e87c217ded307b36be024d4ff8869a812b241d1345/pillow-12.2.0-cp314-cp314-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:22db17c68434de69d8ecfc2fe821569195c0c373b25cccb9cbdacf2c6e53c601", size = 6280384, upload-time = "2026-04-01T14:45:01.5Z" },
+    { url = "https://files.pythonhosted.org/packages/5e/03/688747d2e91cfbe0e64f316cd2e8005698f76ada3130d0194664174fa5de/pillow-12.2.0-cp314-cp314-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:7b14cc0106cd9aecda615dd6903840a058b4700fcb817687d0ee4fc8b6e389be", size = 8091599, upload-time = "2026-04-01T14:45:04.5Z" },
+    { url = "https://files.pythonhosted.org/packages/f6/35/577e22b936fcdd66537329b33af0b4ccfefaeabd8aec04b266528cddb33c/pillow-12.2.0-cp314-cp314-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:8cbeb542b2ebc6fcdacabf8aca8c1a97c9b3ad3927d46b8723f9d4f033288a0f", size = 6396021, upload-time = "2026-04-01T14:45:07.117Z" },
+    { url = "https://files.pythonhosted.org/packages/11/8d/d2532ad2a603ca2b93ad9f5135732124e57811d0168155852f37fbce2458/pillow-12.2.0-cp314-cp314-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:4bfd07bc812fbd20395212969e41931001fd59eb55a60658b0e5710872e95286", size = 7083360, upload-time = "2026-04-01T14:45:09.763Z" },
+    { url = "https://files.pythonhosted.org/packages/5e/26/d325f9f56c7e039034897e7380e9cc202b1e368bfd04d4cbe6a441f02885/pillow-12.2.0-cp314-cp314-musllinux_1_2_aarch64.whl", hash = "sha256:9aba9a17b623ef750a4d11b742cbafffeb48a869821252b30ee21b5e91392c50", size = 6507628, upload-time = "2026-04-01T14:45:12.378Z" },
+    { url = "https://files.pythonhosted.org/packages/5f/f7/769d5632ffb0988f1c5e7660b3e731e30f7f8ec4318e94d0a5d674eb65a4/pillow-12.2.0-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:deede7c263feb25dba4e82ea23058a235dcc2fe1f6021025dc71f2b618e26104", size = 7209321, upload-time = "2026-04-01T14:45:15.122Z" },
+    { url = "https://files.pythonhosted.org/packages/6a/7a/c253e3c645cd47f1aceea6a8bacdba9991bf45bb7dfe927f7c893e89c93c/pillow-12.2.0-cp314-cp314-win32.whl", hash = "sha256:632ff19b2778e43162304d50da0181ce24ac5bb8180122cbe1bf4673428328c7", size = 6479723, upload-time = "2026-04-01T14:45:17.797Z" },
+    { url = "https://files.pythonhosted.org/packages/cd/8b/601e6566b957ca50e28725cb6c355c59c2c8609751efbecd980db44e0349/pillow-12.2.0-cp314-cp314-win_amd64.whl", hash = "sha256:4e6c62e9d237e9b65fac06857d511e90d8461a32adcc1b9065ea0c0fa3a28150", size = 7217400, upload-time = "2026-04-01T14:45:20.529Z" },
+    { url = "https://files.pythonhosted.org/packages/d6/94/220e46c73065c3e2951bb91c11a1fb636c8c9ad427ac3ce7d7f3359b9b2f/pillow-12.2.0-cp314-cp314-win_arm64.whl", hash = "sha256:b1c1fbd8a5a1af3412a0810d060a78b5136ec0836c8a4ef9aa11807f2a22f4e1", size = 2554835, upload-time = "2026-04-01T14:45:23.162Z" },
+    { url = "https://files.pythonhosted.org/packages/b6/ab/1b426a3974cb0e7da5c29ccff4807871d48110933a57207b5a676cccc155/pillow-12.2.0-cp314-cp314t-macosx_10_15_x86_64.whl", hash = "sha256:57850958fe9c751670e49b2cecf6294acc99e562531f4bd317fa5ddee2068463", size = 5314225, upload-time = "2026-04-01T14:45:25.637Z" },
+    { url = "https://files.pythonhosted.org/packages/19/1e/dce46f371be2438eecfee2a1960ee2a243bbe5e961890146d2dee1ff0f12/pillow-12.2.0-cp314-cp314t-macosx_11_0_arm64.whl", hash = "sha256:d5d38f1411c0ed9f97bcb49b7bd59b6b7c314e0e27420e34d99d844b9ce3b6f3", size = 4698541, upload-time = "2026-04-01T14:45:28.355Z" },
+    { url = "https://files.pythonhosted.org/packages/55/c3/7fbecf70adb3a0c33b77a300dc52e424dc22ad8cdc06557a2e49523b703d/pillow-12.2.0-cp314-cp314t-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:5c0a9f29ca8e79f09de89293f82fc9b0270bb4af1d58bc98f540cc4aedf03166", size = 6322251, upload-time = "2026-04-01T14:45:30.924Z" },
+    { url = "https://files.pythonhosted.org/packages/1c/3c/7fbc17cfb7e4fe0ef1642e0abc17fc6c94c9f7a16be41498e12e2ba60408/pillow-12.2.0-cp314-cp314t-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:1610dd6c61621ae1cf811bef44d77e149ce3f7b95afe66a4512f8c59f25d9ebe", size = 8127807, upload-time = "2026-04-01T14:45:33.908Z" },
+    { url = "https://files.pythonhosted.org/packages/ff/c3/a8ae14d6defd2e448493ff512fae903b1e9bd40b72efb6ec55ce0048c8ce/pillow-12.2.0-cp314-cp314t-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:0a34329707af4f73cf1782a36cd2289c0368880654a2c11f027bcee9052d35dd", size = 6433935, upload-time = "2026-04-01T14:45:36.623Z" },
+    { url = "https://files.pythonhosted.org/packages/6e/32/2880fb3a074847ac159d8f902cb43278a61e85f681661e7419e6596803ed/pillow-12.2.0-cp314-cp314t-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:8e9c4f5b3c546fa3458a29ab22646c1c6c787ea8f5ef51300e5a60300736905e", size = 7116720, upload-time = "2026-04-01T14:45:39.258Z" },
+    { url = "https://files.pythonhosted.org/packages/46/87/495cc9c30e0129501643f24d320076f4cc54f718341df18cc70ec94c44e1/pillow-12.2.0-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:fb043ee2f06b41473269765c2feae53fc2e2fbf96e5e22ca94fb5ad677856f06", size = 6540498, upload-time = "2026-04-01T14:45:41.879Z" },
+    { url = "https://files.pythonhosted.org/packages/18/53/773f5edca692009d883a72211b60fdaf8871cbef075eaa9d577f0a2f989e/pillow-12.2.0-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:f278f034eb75b4e8a13a54a876cc4a5ab39173d2cdd93a638e1b467fc545ac43", size = 7239413, upload-time = "2026-04-01T14:45:44.705Z" },
+    { url = "https://files.pythonhosted.org/packages/c9/e4/4b64a97d71b2a83158134abbb2f5bd3f8a2ea691361282f010998f339ec7/pillow-12.2.0-cp314-cp314t-win32.whl", hash = "sha256:6bb77b2dcb06b20f9f4b4a8454caa581cd4dd0643a08bacf821216a16d9c8354", size = 6482084, upload-time = "2026-04-01T14:45:47.568Z" },
+    { url = "https://files.pythonhosted.org/packages/ba/13/306d275efd3a3453f72114b7431c877d10b1154014c1ebbedd067770d629/pillow-12.2.0-cp314-cp314t-win_amd64.whl", hash = "sha256:6562ace0d3fb5f20ed7290f1f929cae41b25ae29528f2af1722966a0a02e2aa1", size = 7225152, upload-time = "2026-04-01T14:45:50.032Z" },
+    { url = "https://files.pythonhosted.org/packages/ff/6e/cf826fae916b8658848d7b9f38d88da6396895c676e8086fc0988073aaf8/pillow-12.2.0-cp314-cp314t-win_arm64.whl", hash = "sha256:aa88ccfe4e32d362816319ed727a004423aab09c5cea43c01a4b435643fa34eb", size = 2556579, upload-time = "2026-04-01T14:45:52.529Z" },
+]
+
 [[package]]
 name = "pluggy"
 version = "1.6.0"
@@ -157,6 +268,62 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/54/20/4d324d65cc6d9205fabedc306948156824eb9f0ee1633355a8f7ec5c66bf/pluggy-1.6.0-py3-none-any.whl", hash = "sha256:e920276dd6813095e9377c0bc5566d94c932c33b27a3e3945d8389c374dd4746", size = 20538, upload-time = "2025-05-15T12:30:06.134Z" },
 ]

+[[package]]
+name = "pydantic"
+version = "2.13.4"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+    { name = "annotated-types" },
+    { name = "pydantic-core" },
+    { name = "typing-extensions" },
+    { name = "typing-inspection" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/18/a5/b60d21ac674192f8ab0ba4e9fd860690f9b4a6e51ca5df118733b487d8d6/pydantic-2.13.4.tar.gz", hash = "sha256:c40756b57adaa8b1efeeced5c196f3f3b7c435f90e84ea7f443901bec8099ef6", size = 844775, upload-time = "2026-05-06T13:43:05.343Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/fd/7b/122376b1fd3c62c1ed9dc80c931ace4844b3c55407b6fb2d199377c9736f/pydantic-2.13.4-py3-none-any.whl", hash = "sha256:45a282cde31d808236fd7ea9d919b128653c8b38b393d1c4ab335c62924d9aba", size = 472262, upload-time = "2026-05-06T13:43:02.641Z" },
+]
+
+[[package]]
+name = "pydantic-core"
+version = "2.46.4"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+    { name = "typing-extensions" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/9d/56/921726b776ace8d8f5db44c4ef961006580d91dc52b803c489fafd1aa249/pydantic_core-2.46.4.tar.gz", hash = "sha256:62f875393d7f270851f20523dd2e29f082bcc82292d66db2b64ea71f64b6e1c1", size = 471464, upload-time = "2026-05-06T13:37:06.98Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/8d/74/228a26ddad29c6672b805d9fd78e8d251cd04004fa7eed0e622096cd0250/pydantic_core-2.46.4-cp314-cp314-macosx_10_12_x86_64.whl", hash = "sha256:428e04521a40150c85216fc8b85e8d39fece235a9cf5e383761238c7fa9b96fb", size = 2102079, upload-time = "2026-05-06T13:38:41.019Z" },
+    { url = "https://files.pythonhosted.org/packages/ad/1f/8970b150a4b4365623ae00fc88603491f763c627311ae8031e3111356d6e/pydantic_core-2.46.4-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:23ace664830ee0bfe014a0c7bc248b1f7f25ed7ad103852c317624a1083af462", size = 1952179, upload-time = "2026-05-06T13:36:59.812Z" },
+    { url = "https://files.pythonhosted.org/packages/95/30/5211a831ae054928054b2f79731661087a2bc5c01e825c672b3a4a8f1b3e/pydantic_core-2.46.4-cp314-cp314-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:ce5c1d2a8b27468f433ca974829c44060b8097eedc39933e3c206a90ee49c4a9", size = 1978926, upload-time = "2026-05-06T13:37:39.933Z" },
+    { url = "https://files.pythonhosted.org/packages/57/e9/689668733b1eb67adeef047db3c2e8788fcf65a7fd9c9e2b46b7744fe245/pydantic_core-2.46.4-cp314-cp314-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:7283d57845ecf5a163403eb0702dfc220cc4fbdd18919cb5ccea4f95ee1cdab4", size = 2046785, upload-time = "2026-05-06T13:38:01.995Z" },
+    { url = "https://files.pythonhosted.org/packages/60/d9/6715260422ff50a2109878fd24d948a6c3446bb2664f34ee78cd972b3acd/pydantic_core-2.46.4-cp314-cp314-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:8daafc69c93ee8a0204506a3b6b30f586ef54028f52aeeeb5c4cfc5184fd5914", size = 2228733, upload-time = "2026-05-06T13:40:50.371Z" },
+    { url = "https://files.pythonhosted.org/packages/18/ae/fdb2f64316afca925640f8e70bb1a564b0ec2721c1389e25b8eb4bf9a299/pydantic_core-2.46.4-cp314-cp314-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:cd2213145bcc2ba85884d0ac63d222fece9209678f77b9b4d76f054c561adb28", size = 2307534, upload-time = "2026-05-06T13:37:21.531Z" },
+    { url = "https://files.pythonhosted.org/packages/89/1d/8eff589b45bb8190a9d12c49cfad0f176a5cbd1534908a6b5125e2886239/pydantic_core-2.46.4-cp314-cp314-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:7a5f930472650a82629163023e630d160863fce524c616f4e5186e5de9d9a49b", size = 2099732, upload-time = "2026-05-06T13:39:31.942Z" },
+    { url = "https://files.pythonhosted.org/packages/06/d5/ee5a3366637fee41dee51a1fc91562dcf12ddbc68fda34e6b253da2324bb/pydantic_core-2.46.4-cp314-cp314-manylinux_2_31_riscv64.whl", hash = "sha256:c1b3f518abeca3aa13c712fd202306e145abf59a18b094a6bafb2d2bbf59192c", size = 2129627, upload-time = "2026-05-06T13:37:25.033Z" },
+    { url = "https://files.pythonhosted.org/packages/94/33/2414be571d2c6a6c4d08be21f9292b6d3fdb08949a97b6dfe985017821db/pydantic_core-2.46.4-cp314-cp314-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:1a7dd0b3ee80d90150e3495a3a13ac34dbcbfd4f012996a6a1d8900e91b5c0fb", size = 2179141, upload-time = "2026-05-06T13:37:14.046Z" },
+    { url = "https://files.pythonhosted.org/packages/7b/79/7daa95be995be0eecc4cf75064cb33f9bbbfe3fe0158caf2f0d4a996a5c7/pydantic_core-2.46.4-cp314-cp314-musllinux_1_1_aarch64.whl", hash = "sha256:3fb702cd90b0446a3a1c5e470bfa0dd23c0233b676a9099ddcc964fa6ca13898", size = 2184325, upload-time = "2026-05-06T13:36:53.615Z" },
+    { url = "https://files.pythonhosted.org/packages/9f/cb/d0a382f5c0de8a222dc61c65348e0ce831b1f68e0a018450d31c2cace3a5/pydantic_core-2.46.4-cp314-cp314-musllinux_1_1_armv7l.whl", hash = "sha256:b8458003118a712e66286df6a707db01c52c0f52f7db8e4a38f0da1d3b94fc4e", size = 2323990, upload-time = "2026-05-06T13:40:29.971Z" },
+    { url = "https://files.pythonhosted.org/packages/05/db/d9ba624cc4a5aced1598e88c04fdbd8310c8a69b9d38b9a3d39ce3a61ed7/pydantic_core-2.46.4-cp314-cp314-musllinux_1_1_x86_64.whl", hash = "sha256:372429a130e469c9cd698925ce5fc50940b7a1336b0d82038e63d5bbc4edc519", size = 2369978, upload-time = "2026-05-06T13:37:23.027Z" },
+    { url = "https://files.pythonhosted.org/packages/f2/20/d15df15ba918c423461905802bfd2981c3af0bfa0e40d05e13edbfa48bc3/pydantic_core-2.46.4-cp314-cp314-win32.whl", hash = "sha256:85bb3611ff1802f3ee7fdd7dbff26b56f343fb432d57a4728fdd49b6ef35e2f4", size = 1966354, upload-time = "2026-05-06T13:38:03.499Z" },
+    { url = "https://files.pythonhosted.org/packages/fc/b6/6b8de4c0a7d7ab3004c439c80c5c1e0a3e8d78bbae19379b01960383d9e5/pydantic_core-2.46.4-cp314-cp314-win_amd64.whl", hash = "sha256:811ff8e9c313ab425368bcbb36e5c4ebd7108c2bbf4e4089cfbb0b01eff63fac", size = 2072238, upload-time = "2026-05-06T13:39:40.807Z" },
+    { url = "https://files.pythonhosted.org/packages/32/36/51eb763beec1f4cf59b1db243a7dcc39cbb41230f050a09b9d69faaf0a48/pydantic_core-2.46.4-cp314-cp314-win_arm64.whl", hash = "sha256:bfec22eab3c8cc2ceec0248aec886624116dc079afa027ecc8ad4a7e62010f8a", size = 2018251, upload-time = "2026-05-06T13:37:26.72Z" },
+    { url = "https://files.pythonhosted.org/packages/e8/91/855af51d625b23aa987116a19e231d2aaef9c4a415273ddc189b79a45fee/pydantic_core-2.46.4-cp314-cp314t-macosx_10_12_x86_64.whl", hash = "sha256:af8244b2bef6aaad6d92cda81372de7f8c8d36c9f0c3ea36e827c60e7d9467a0", size = 2099593, upload-time = "2026-05-06T13:39:47.682Z" },
+    { url = "https://files.pythonhosted.org/packages/fb/1b/8784a54c65edb5f49f0a14d6977cf1b209bba85a4c77445b255c2de58ab3/pydantic_core-2.46.4-cp314-cp314t-macosx_11_0_arm64.whl", hash = "sha256:5a4330cdbc57162e4b3aa303f588ba752257694c9c9be3e7ebb11b4aca659b5d", size = 1935226, upload-time = "2026-05-06T13:40:40.428Z" },
+    { url = "https://files.pythonhosted.org/packages/e8/e7/1955d28d1afc56dd4b3ad7cc0cf39df1b9852964cf16e5d13912756d6d6b/pydantic_core-2.46.4-cp314-cp314t-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:29c61fc04a3d840155ff08e475a04809278972fe6aef51e2720554e96367e34b", size = 1974605, upload-time = "2026-05-06T13:37:32.029Z" },
+    { url = "https://files.pythonhosted.org/packages/93/e2/3fedbf0ba7a22850e6e9fd78117f1c0f10f950182344d8a6c535d468fdd8/pydantic_core-2.46.4-cp314-cp314t-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:c50f2528cf200c5eed56faf3f4e22fcd5f38c157a8b78576e6ba3168ec35f000", size = 2030777, upload-time = "2026-05-06T13:38:55.239Z" },
+    { url = "https://files.pythonhosted.org/packages/f8/61/46be275fcaaba0b4f5b9669dd852267ce1ff616592dccf7a7845588df091/pydantic_core-2.46.4-cp314-cp314t-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:0cbe8b01f948de4286c74cdd6c667aceb38f5c1e26f0693b3983d9d74887c65e", size = 2236641, upload-time = "2026-05-06T13:37:08.096Z" },
+    { url = "https://files.pythonhosted.org/packages/60/db/12e93e46a8bac9988be3c016860f83293daea8c716c029c9ace279036f2f/pydantic_core-2.46.4-cp314-cp314t-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:617d7e2ca7dcb8c5cf6bcb8c59b8832c94b36196bbf1cbd1bfb56ed341905edd", size = 2286404, upload-time = "2026-05-06T13:40:20.221Z" },
+    { url = "https://files.pythonhosted.org/packages/e2/4a/4d8b19008f38d31c53b8219cfedc2e3d5de5fe99d90076b7e767de29274f/pydantic_core-2.46.4-cp314-cp314t-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:7027560ee92211647d0d34e3f7cd6f50da56399d26a9c8ad0da286d3869a53f3", size = 2109219, upload-time = "2026-05-06T13:38:12.153Z" },
+    { url = "https://files.pythonhosted.org/packages/88/70/3cbc40978fefb7bb09c6708d40d4ad1a5d70fd7213c3d17f971de868ec1f/pydantic_core-2.46.4-cp314-cp314t-manylinux_2_31_riscv64.whl", hash = "sha256:f99626688942fb746e545232e7726926f3be91b5975f8b55327665fafda991c7", size = 2110594, upload-time = "2026-05-06T13:40:02.971Z" },
+    { url = "https://files.pythonhosted.org/packages/9d/20/b8d36736216e29491125531685b2f9e61aa5b4b2599893f8268551da3338/pydantic_core-2.46.4-cp314-cp314t-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:fc3e9034a63de20e15e8ade85358bc6efc614008cab72898b4b4952bea0509ff", size = 2159542, upload-time = "2026-05-06T13:39:27.506Z" },
+    { url = "https://files.pythonhosted.org/packages/1d/a2/367df868eb584dacf6bf82a389272406d7178e301c4ac82545ab98bc2dd9/pydantic_core-2.46.4-cp314-cp314t-musllinux_1_1_aarch64.whl", hash = "sha256:97e7cf2be5c77b7d1a9713a05605d49460d02c6078d38d8bef3cbe323c548424", size = 2168146, upload-time = "2026-05-06T13:38:31.93Z" },
+    { url = "https://files.pythonhosted.org/packages/c1/b8/4460f77f7e201893f649a29ab355dddd3beee8a97bcb1a320db414f9a06e/pydantic_core-2.46.4-cp314-cp314t-musllinux_1_1_armv7l.whl", hash = "sha256:3bf92c5d0e00fefaab325a4d27828fe6b6e2a21848686b5b60d2d9eeb09d76c6", size = 2306309, upload-time = "2026-05-06T13:37:44.717Z" },
+    { url = "https://files.pythonhosted.org/packages/64/c4/be2639293acd87dc8ddbcec41a73cee9b2ebf996fe6d892a1a74e88ad3f7/pydantic_core-2.46.4-cp314-cp314t-musllinux_1_1_x86_64.whl", hash = "sha256:3ecbc122d18468d06ca279dc26a8c2e2d5acb10943bb35e36ae92096dc3b5565", size = 2369736, upload-time = "2026-05-06T13:37:05.645Z" },
+    { url = "https://files.pythonhosted.org/packages/30/a6/9f9f380dbb301f67023bf8f707aaa75daadf84f7152d95c410fd7e81d994/pydantic_core-2.46.4-cp314-cp314t-win32.whl", hash = "sha256:e846ae7835bf0703ae43f534ab79a867146dadd59dc9ca5c8b53d5c8f7c9ef02", size = 1955575, upload-time = "2026-05-06T13:38:51.116Z" },
+    { url = "https://files.pythonhosted.org/packages/40/1f/f1eb9eb350e795d1af8586289746f5c5677d16043040d63710e22abc43c9/pydantic_core-2.46.4-cp314-cp314t-win_amd64.whl", hash = "sha256:2108ba5c1c1eca18030634489dc544844144ee36357f2f9f780b93e7ddbb44b5", size = 2051624, upload-time = "2026-05-06T13:38:21.672Z" },
+    { url = "https://files.pythonhosted.org/packages/f6/d2/42dd53d0a85c27606f316d3aa5d2869c4e8470a5ed6dec30e4a1abe19192/pydantic_core-2.46.4-cp314-cp314t-win_arm64.whl", hash = "sha256:4fcbe087dbc2068af7eda3aa87634eba216dbda64d1ae73c8684b621d33f6596", size = 2017325, upload-time = "2026-05-06T13:40:52.723Z" },
+]
+
 [[package]]
 name = "pygments"
 version = "2.20.0"
@@ -166,6 +333,19 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/f4/7e/a72dd26f3b0f4f2bf1dd8923c85f7ceb43172af56d63c7383eb62b332364/pygments-2.20.0-py3-none-any.whl", hash = "sha256:81a9e26dd42fd28a23a2d169d86d7ac03b46e2f8b59ed4698fb4785f946d0176", size = 1231151, upload-time = "2026-03-29T13:29:30.038Z" },
 ]

+[[package]]
+name = "pytesseract"
+version = "0.3.13"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+    { name = "packaging" },
+    { name = "pillow" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/9f/a6/7d679b83c285974a7cb94d739b461fa7e7a9b17a3abfd7bf6cbc5c2394b0/pytesseract-0.3.13.tar.gz", hash = "sha256:4bf5f880c99406f52a3cfc2633e42d9dc67615e69d8a509d74867d3baddb5db9", size = 17689, upload-time = "2024-08-16T02:33:56.762Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/7a/33/8312d7ce74670c9d39a532b2c246a853861120486be9443eebf048043637/pytesseract-0.3.13-py3-none-any.whl", hash = "sha256:7a99c6c2ac598360693d83a416e36e0b33a67638bb9d77fdcac094a3589d4b34", size = 14705, upload-time = "2024-08-16T02:36:10.09Z" },
+]
+
 [[package]]
 name = "pytest"
 version = "9.0.2"
@@ -243,6 +423,15 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/65/eb/db13ab9b8d54e04f42b6619acca417ee37b07eb141a54884d13d20d7459e/regipy-6.2.1-py3-none-any.whl", hash = "sha256:b03110e5c4e12385e1ba53c032ccd120c6dcde1b71afb8c3b7aa4717a5a24e43", size = 134861, upload-time = "2026-01-22T15:26:05.653Z" },
 ]

+[[package]]
+name = "sniffio"
+version = "1.3.1"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/a2/87/a6771e1546d97e7e041b6ae58d80074f81b7d5121207425c964ddf5cfdbd/sniffio-1.3.1.tar.gz", hash = "sha256:f4324edc670a0f49750a81b895f35c3adb843cca46f0530f79fc1babb23789dc", size = 20372, upload-time = "2024-02-25T23:20:04.057Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/e9/44/75a9c9421471a6c4805dbf2356f7c181a29c1879239abab1ea2cc8f38b40/sniffio-1.3.1-py3-none-any.whl", hash = "sha256:2f6da418d1f1e0fddd844478f41680e794e6051915791a034ff65e5f100525a2", size = 10235, upload-time = "2024-02-25T23:20:01.196Z" },
+]
+
 [[package]]
 name = "socksio"
 version = "1.0.0"
@@ -251,3 +440,36 @@ sdist = { url = "https://files.pythonhosted.org/packages/f8/5c/48a7d9495be3d1c65
 wheels = [
    { url = "https://files.pythonhosted.org/packages/37/c3/6eeb6034408dac0fa653d126c9204ade96b819c936e136c5e8a6897eee9c/socksio-1.0.0-py3-none-any.whl", hash = "sha256:95dc1f15f9b34e8d7b16f06d74b8ccf48f609af32ab33c608d08761c5dcbb1f3", size = 12763, upload-time = "2020-04-17T15:50:31.878Z" },
 ]
+
+[[package]]
+name = "tqdm"
+version = "4.67.3"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+    { name = "colorama", marker = "sys_platform == 'win32'" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/09/a9/6ba95a270c6f1fbcd8dac228323f2777d886cb206987444e4bce66338dd4/tqdm-4.67.3.tar.gz", hash = "sha256:7d825f03f89244ef73f1d4ce193cb1774a8179fd96f31d7e1dcde62092b960bb", size = 169598, upload-time = "2026-02-03T17:35:53.048Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/16/e1/3079a9ff9b8e11b846c6ac5c8b5bfb7ff225eee721825310c91b3b50304f/tqdm-4.67.3-py3-none-any.whl", hash = "sha256:ee1e4c0e59148062281c49d80b25b67771a127c85fc9676d3be5f243206826bf", size = 78374, upload-time = "2026-02-03T17:35:50.982Z" },
+]
+
+[[package]]
+name = "typing-extensions"
+version = "4.15.0"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/72/94/1a15dd82efb362ac84269196e94cf00f187f7ed21c242792a923cdb1c61f/typing_extensions-4.15.0.tar.gz", hash = "sha256:0cea48d173cc12fa28ecabc3b837ea3cf6f38c6d1136f85cbaaf598984861466", size = 109391, upload-time = "2025-08-25T13:49:26.313Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/18/67/36e9267722cc04a6b9f15c7f3441c2363321a3ea07da7ae0c0707beb2a9c/typing_extensions-4.15.0-py3-none-any.whl", hash = "sha256:f0fa19c6845758ab08074a0cfa8b7aecb71c999ca73d62883bc25cc018c4e548", size = 44614, upload-time = "2025-08-25T13:49:24.86Z" },
+]
+
+[[package]]
+name = "typing-inspection"
+version = "0.4.2"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+    { name = "typing-extensions" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/55/e3/70399cb7dd41c10ac53367ae42139cf4b1ca5f36bb3dc6c9d33acdb43655/typing_inspection-0.4.2.tar.gz", hash = "sha256:ba561c48a67c5958007083d386c3295464928b01faa735ab8547c5692e87f464", size = 75949, upload-time = "2025-10-01T02:14:41.687Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/dc/9b/47798a6c91d8bdb567fe2698fe81e0c6b7cb7ef4d13da4114b41d239f65d/typing_inspection-0.4.2-py3-none-any.whl", hash = "sha256:4ed1cacbdc298c220f1bd249ed5287caa16f34d44ef4e9c3d0cbad5b521545e7", size = 14611, upload-time = "2025-10-01T02:14:40.154Z" },
+]