feat(refit): complete S1-S6 — case abstraction, grounding, log-odds, plugins, coref, multi-source

Consolidates the long-running refit work (DESIGN.md as authoritative spec) into a single baseline commit. Six stages landed together:

S1 Case + EvidenceSource abstraction; tools parameterised by source_id (case.py, main.py multi-source bootstrap, .bin extension support)

S2 Grounding gateway in add_phenomenon: verified_facts cite real ToolInvocation ids; substring / normalised match enforced; agent + task scope checked. Phenomenon.description split into verified_facts (grounded) + interpretation (free text). [invocation: inv-xxx] prefix on every wrapped tool result so the LLM can cite.

S3 Confidence as additive log-odds: edge_type → log10(LR) calibration table; commutative updates; supported / refuted thresholds derived from log_odds; hypothesis × evidence matrix view.

S4 iOS plugin: unzip_archive + parse_plist / sqlite_tables / sqlite_query / parse_ios_keychain / read_idevice_info; IOSArtifactAgent; SOURCE_TYPE_AGENTS routing.

S5 Cross-source entity resolution: typed identifiers on Entity, observe_identity gateway, auto coref hypothesis with shared / conflicting strong/weak LR edges, reversible same_as edges, actor_clusters() view.

S6 Android partition probe + AndroidArtifactAgent; MediaAgent with OCR fallback; orchestrator Phase 1 iterates every analysable source; platform-aware get_triage_agent_type; ReportAgent renders actor clusters + per-source breakdown.

142 unit tests / 1 skipped: full coverage of the new gateway, log-odds math, coref hypothesis fall-out, and orchestrator multi-source dispatch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 02:12:10 -10:00
parent 444d58726a
commit 81ade8f7ac
24 changed files with 5137 additions and 244 deletions
--- a/DESIGN.md
+++ b/DESIGN.md
@@ -0,0 +1,305 @@
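+# MASForensics System Refit Design
+
+> Goal: refit the current "single Windows disk forensics" system into a complex forensic system that can handle
+> **multiple devices, multiple actors, and heterogeneous evidence that must be correlated across sources**. This is the single authoritative design document
+> (it merges the two earlier drafts, `REFIT_PLAN.md` / `RESEARCH_DESIGN.md`).
+>
+> The actual case that triggered this refit: the 2025 Meiya Cup (美亚杯) qualifier (Individual), with 5 pieces of evidence
+> (1 USB E01, 1 full-disk Android dump `blk0_sda.bin`, 3 iOS extractions, 1 set of transaction screenshots),
+> spanning at least 3 people: LEUNG YL / CHAN MH / FUNG CC.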
+
+---
+
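+## 1. Design principles (invariants that hold throughout)
+
+1. **The LLM proposes, the code decides.** The LLM handles language, classification, and perception; it **holds no case state,
+   produces no numeric values, and writes no unverified facts**. All "truth" lives in the symbolic layer.
+2. **Every recorded fact can be re-derived from a single tool invocation.** Conclusions can be independently re-checked.
+3. **The reasoning core is independent of device type.** All device-specific logic lives in "capability plugins";
+   supporting a new device means writing a plugin, never changing the core.
+4. **Seemingly irreversible operations (such as entity merging) are in fact reversible, evidence-backed assertions** that can be overturned.
+
+These four are not slogans: every design decision below maps to one of them.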
+
+---
+
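+## 2. Diagnosis of current problems
+
+| # | Problem | Location | Consequence |
+|---|---|---|---|
+| P1 | **Single-image assumption is deeply embedded**: tools are closures bound to `image_path`, the graph is single-source, and the main program selects only one image | `tool_registry.py:148` `register_all_tools`, `main.py:91-153` | Cannot ingest multiple pieces of evidence or correlate across devices |
+| P2 | **Anti-hallucination exists only in the prompt** | `base_agent.py` system prompt | Once the LLM disobeys, wrong facts enter the case record and **cannot be identified afterwards** |
+| P3 | **The confidence formula has no statistical meaning and is order-dependent**: `delta=weight*(1-conf)` (positive) / `weight*conf` (negative); when positive and negative edges are mixed, the result depends on the order in which edges arrive | `evidence_graph.py:26-33` | Confidence cannot be calibrated or defended |
+| P4 | **Artifact categorization is Windows-only**: relies on hive names / `.pf` / `mirc` keywords | `tool_registry.py:80-107` `_auto_categorize` | All iOS/Android artifacts fall into `other` |
+| P5 | **Case information is hard-coded** as `cfreds_hacking_case` | `config.yaml:35-50` | Switching cases requires code changes |
+| P6 | **Image discovery relies on extension globs**, and `.bin` is not in the list | `main.py:28` `_IMAGE_GLOBS` | `blk0_sda.bin` is not discovered |
+| P7 | **Phenomenon has no source annotation** | `evidence_graph.py:85` `Phenomenon` | It is unknown which device a finding came from, so cross-source correlation has no anchor |
+
+The refit both "onboards new evidence" and "fixes the inherent defects P1-P7".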
+
+---
+
+## 3. Target architecture
+
+```
+case.yaml ──► Case ──► N × EvidenceSource
+                         ├ id / type / owner / path
+                         └ access_mode: image | tree
+                                 │
+                  ┌──────────────┴───────────────┐
+            image-backed                     tree-backed
+          (TSK, inode addressing)        (path addressing: mounted/unpacked)
+                  │                              │
+                  └────────────┬─────────────────┘
+                               ▼
+        SourceRegistry  ── source_id → SourceHandle (resolves path/offset/mode)
+                               │
+        ToolRegistry    ── tools registered by access_mode, bound to source_id at call time
+                               │
+        ┌──────────────────────┼───────────────────────┐
+        ▼                      ▼                       ▼
+  Knowledge-Source         Graph Write Gateway      ToolInvocationLog
+  Agents (LLM)        ──►  (sole write entry;       (every tool call logged:
+  gateway-only writes      precondition=grounding)   args / output / sha256)
+        │                      │
+        └──────────────────────┴──► Grounded Evidence Graph (GEG)
+                                     Phenomenon / Hypothesis / Entity
+                                     confidence = summed log-odds
+```
+
+**Kept**: the existing five-phase pipeline, disconnect recovery, run archiving, tool-result caching, and
+`AgentFactory` dynamic composition. These designs are sound; they are adapted, not rewritten.
+
+---
+
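+## 4. Core design
+
+### 4.1 Evidence source abstraction (addresses P1/P5/P6/P7; the foundation)
+
+New file `case.py`:
+
+- **`EvidenceSource`** dataclass: `id`, `label`, `type`, `owner` (the associated person),
+  `path`, `access_mode`, `meta` (type-specific, e.g. partition offset / unpacked root directory).
+- **`Case`**: holds `list[EvidenceSource]` plus case metadata, loaded from `case.yaml`.
+- **`access_mode` is the key design distinction**:
+  - `image`: block device / disk image, addressed by inode via TSK (USB E01, each partition of the Android `blk0_sda`).
+  - `tree`: a mounted filesystem or unpacked directory, addressed by path (an unzipped iOS extraction, an expanded archive).
+  - Tools are registered in families by access_mode (see 4.2). A piece of evidence can go from image to tree via "preparation"
+    (e.g. mounting a partition, unpacking a zip).
+
+`select_image_interactive` in `main.py` (:91-153) is changed to load/construct a `Case`;
+`_IMAGE_GLOBS` is replaced by type probing (`mmls` probe + file-header sniffing) rather than relying on extensions.
+`cfreds_hacking_case` is removed from `config.yaml`, and case information moves into `case.yaml`.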
+
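+### 4.2 Tool registration parameterised by source (addresses P1)
+
+Current state: `register_all_tools(image_path, offset, ...)` closes every tool over a single image
+(`tool_registry.py:159+`). Refit:
+
+- Tool executor signatures gain `source_id`; at execution time `SourceRegistry` resolves the real path/offset/mode.
+- `TOOL_CATALOG` annotates tool applicability by `access_mode`; the toolset an agent receives is determined by
+  the source types it is responsible for.
+- **"Current source" context**: the orchestrator sets the current source for an agent (analogous to the existing
+  `graph._current_agent`), and tools act on it by default, so the LLM does not have to pass `source_id` every time
+  (fewer mistakes). Cross-source tools (timeline merge, entity queries) are explicitly cross-source.
+- The cache key `_cache_key` (`tool_registry.py:41`) includes `source_id` to prevent cross-source contamination.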
+
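+### 4.3 Graph write gateway (addresses P2; implements principle 1)
+
+Current state: agents write to the graph directly through tools such as `add_phenomenon`, and constraints live only in the prompt. Refit:
+
+- All graph mutations (`add_phenomenon` / `add_hypothesis` / `link` / `observe_identity` …)
+  converge on **a single write gateway**. The gateway enforces preconditions in code.
+- The "anti-hallucination rules" in the existing prompt are pushed down into hard checks in the gateway. The LLM agent's four-stage workflow
+  (INVESTIGATE→RECORD→LINK→ANSWER) is unchanged; what changes is that the gateway underneath the RECORD step becomes stricter.
+- The `mandatory_record_tools` mechanism in `base_agent.py` is kept (it guarantees the agent actually recorded something).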
+
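+### 4.4 Evidence grounding constraint (addresses P2; implements principle 2)
+
+This is the core mechanism behind the system's reliability.
+
+**ToolInvocationLog**: every tool call leaves one record
+`{invocation_id, source_id, tool, args, output, output_sha256, agent, ts}`.
+The existing result cache (`tool_registry.py:29`) already stores deterministic output; it only needs extending into a full audit trail.
+
+**Phenomenon is split in two**, separating "facts" from "interpretation":
+
+- `verified_facts`: `list[{type, value, invocation_id}]`,
+  `type ∈ {path, timestamp, inode, hash, identifier, count, ...}`.
+- `interpretation`: free text, the agent's analytical narrative.
+
+**`add_phenomenon` gateway preconditions**:
+
+1. Every fact must cite an `invocation_id` that **actually occurred for this agent within this task**.
+2. Code verifies that `fact.value` matches the output of that invocation:
+   - text output → verbatim substring match;
+   - structured/binary tool output → match against the parsed fields.
+3. If any fact fails → **the whole record is rejected**, the failing facts are returned, and the agent must correct and retry.
+4. On success → the record is written; each entry in `verified_facts` carries its `invocation_id` (re-runnable for verification), and
+   `interpretation` is marked as "unverified analysis".
+
+**Effect**: in this system, "recording a path/timestamp/hash/identifier not supported by tool output"
+is **structurally impossible**. The LLM can still get `interpretation` wrong, but the report renders
+verified facts (citations with re-run instructions) and interpretation (explicitly labelled analysis)
+**separately**, so a human investigator can tell them apart at a glance. This is a reliability guarantee with honestly drawn boundaries.
+
+> The existing `_make_auto_record` (`tool_registry.py:126`) turns tool output directly into a phenomenon;
+> that is the special case of "trivial grounding" (the description is the output), and the new design generalises and formalises it.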
+
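+### 4.5 Hypothesis confidence: likelihood ratio / log-odds (addresses P3)
+
+Replace `_DEFAULT_EDGE_WEIGHTS` at `evidence_graph.py:26`, currently "gut-feel deltas",
+with log-odds accumulation based on the **likelihood ratio (LR)**:
+
+- Each `Phenomenon → Hypothesis` edge represents one likelihood ratio. The LLM still performs only **discrete classification**
+  (this evidence is direct_evidence / supports / weakens / contradicts … for this hypothesis);
+  the numeric `log₁₀(LR)` is looked up in a calibration table. **The LLM never emits numbers** (continuing the existing "LLM picks the type,
+  code computes the value" philosophy and giving it a statistical basis).
+- Confidence update:
+  ```
+  L_post = L_prior + Σ log₁₀(LR_i)        # log-odds; commutative → no order dependence
+  confidence = 1 / (1 + 10^(−L_post))
+  ```
+- Edge type → `log₁₀(LR)` calibration table (initial values; can later be calibrated from labelled cases):
+
+  | Edge type | log₁₀LR |
+  |---|---:|
+  | `direct_evidence` | +2.0 |
+  | `supports` / `consequence_observed` | +1.0 |
+  | `prerequisite_met` | +0.5 |
+  | `weakens` | −0.5 |
+  | `contradicts` | −2.0 |
+
+- Thresholds are unchanged (≥0.8 supported / ≤0.2 refuted); they are simply derived from `L_post` now.
+- `prior_prob` becomes configurable (default 0.5 → `L_prior=0`).
+- **Simplifying assumption**: multiple edges are treated as independent (naive Bayes). Repeated evidence of the same kind is not
+  fully independent, so add a knob: cap or decay the number of edges with the same `(hypothesis, edge_type)`, to avoid
+  inflated confidence when "the same finding is added to the graph repeatedly by multiple agents" (the existing Jaccard dedup already partly mitigates this).
+
+A by-product is a **hypothesis × evidence matrix** view, used by reporting and lead selection.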
+
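+### 4.6 Cross-source entity resolution (addresses the correlation problem of "complex scenarios"; implements principle 4)
+
+The core difficulty of complex forensics: the Apple ID in an iPhone keychain, a number in the Android SMS database,
+the author of a USB file, a wallet address in a transaction screenshot. **Which of these point to the same actor?**
+
+**Key design: "identity coreference" is itself a hypothesis**, so entity resolution is not a separate subsystem
+but a reuse of the hypothesis mechanism from 4.5:
+
+- When an agent observes an identifier it goes through the `observe_identity` gateway, recording a **typed** identifier
+  (strong identifiers: IMEI / wallet address / email / phone number; weak identifiers: nickname / display name),
+  attached to a provisional `Entity`.
+- "Entity A ≡ Entity B" is registered as a `Hypothesis`; a shared strong identifier = strong +LR edge,
+  a shared weak identifier = weak +LR edge, conflicting strong identifiers = strong −LR edge, all scored with the same computation as 4.5.
+- **No destructive merging**: when the threshold is crossed, a `same_as` edge is added between the two Entities (backed by that coref
+  hypothesis). At query time, a connected component of `same_as` edges is treated as one actor. **Fully reversible, auditable,
+  and overturnable by later contradicts evidence** (implements principle 4).
+- **Blocking**: coref hypotheses are created only between entity pairs that "share at least one identifier or have highly similar names",
+  avoiding O(n²).
+
+Cross-device timelines and "who did what when" emerge naturally from the entity graph once it is connected by `same_as`.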
+
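+### 4.7 Capability plugin layer (onboarding the 5 evidence types)
+
+Each evidence type = one `(ingest handler, toolset, knowledge-source agent)` triple. The reasoning core is untouched.
+
+| Plugin | Ingest | New tools | Knowledge-source agent |
+|---|---|---|---|
+| **iOS extraction** | `unzip` into a `tree` source | `parse_plist` (incl. binary plist), `sqlite_tables`/`sqlite_query` (sms.db, WhatsApp `ChatStorage.sqlite`, contacts), `parse_ios_keychain`, `read_idevice_info` | `IOSArtifactAgent` |
+| **Android full disk** | `mmls` partitions → one `image` source per partition; can be mounted as `tree` | Reuse TSK; ext4/F2FS reading; `fsstat` to detect encryption | Reuse filesystem + `AndroidArtifactAgent` |
+| **Disk image (E01)** | Already supported (TSK includes ewf) | Existing TSK toolchain | Existing filesystem/registry |
+| **Archive** | `unzip_archive` generic unpacking | none | none |
+| **Media/screenshots** | none | `ocr_image` (tesseract; note that DeepSeek has no vision capability, so OCR is mandatory) | `MediaAgent` |
+
+**Android risk**: the `userdata` partition of `blk0_sda` is very likely FBE-encrypted. First run `fsstat` on each partition
+to find out: unencrypted → use TSK directly; encrypted with no key → only unencrypted areas such as `EFS`/`PARAM`/`system` can be analysed.
+
+`_auto_categorize` at `tool_registry.py:80` becomes extensible: each source plugin supplies its own
+artifact categorization table instead of a global Windows keyword table (addresses P4).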
+
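+### 4.8 Agent system reorganisation
+
+The existing 7 agents are named after Windows artifacts (registry, communication = email/IRC,
+network = browser/PCAP). They are reorganised by **investigative function**, with platform-specific agents added:
+
+- `_AGENT_CLASSES` in `agent_factory.py` (:34-40) is extended with `ios_artifact`,
+  `android_artifact`, `financial` (wallets/transactions), and `media`.
+- `communication` is generalised: email + IM + SMS, cross-platform.
+- A new **source type → suitable agent** mapping is added, used by Phase 1 to dispatch a triage agent per source.
+- The dynamic composition mechanism of `create_specialized_agent` (:69) is kept; it was always the
+  right way to handle capability gaps, and a larger tool catalog simply gives it a richer selection space.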
+
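+### 4.9 Orchestrator multi-source pipeline
+
+| Phase | Change |
+|---|---|
+| Phase 1 | "Single-image initial survey" → **parallel per-source triage**, with a type-appropriate agent dispatched to each source |
+| Phase 2 | Hypotheses are generated across sources; identity coreference hypotheses are first registered here |
+| Phase 3 | Leads are dispatched to source-aware agents; the hypothesis × evidence matrix is updated in real time |
+| Phase 4 | Cross-source timeline merge with **per-source timezone normalisation** (iOS UTC vs Android local time) |
+| Phase 5 | One consolidated report per case: hypothesis conclusions, entity relationship graph, and provenance citations for every conclusion |
+
+Disconnect recovery and run archiving logic are kept; `graph_state.json` incrementally takes on the new fields.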
+
+---
+
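+## 5. Summary of data model changes
+
+| Node/structure | Change |
+|---|---|
+| `EvidenceSource` | **New** first-class node (`src-*`) |
+| `ToolInvocation` | **New** audit record (`inv-*`), persisted with the graph |
+| `Phenomenon` | + `source_id`; description split into `verified_facts[]` + `interpretation`; clarify/remove the semantically ambiguous `confidence` (default 1.0), since observation reliability is expressed by grounding |
+| `Hypothesis` | + `prior_prob`, `log_odds` (the accumulated quantity); `confidence` becomes a derived value |
+| `Entity` | + set of typed identifiers; connected across sources via `same_as` edges |
+| Phenomenon→Hypothesis edge | Carries `edge_type`, mapped to `log₁₀(LR)` (replaces `_DEFAULT_EDGE_WEIGHTS`) |
+| Entity→Entity edge | **New** `same_as` (backed by a coref hypothesis, reversible) |
+
+`VALID_EDGE_TYPES`, serialisation/deserialisation, and Jaccard dedup in `evidence_graph.py` are adapted accordingly.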
+
+---
+
+## 6. Component change list
+
+| File | Change |
+|---|---|
+| `case.py` | **New**: `Case` / `EvidenceSource` / `SourceRegistry` |
+| `main.py` | Source selection loads a `Case`; type probing replaces extension globs |
+| `tool_registry.py` | Tools parameterised by `source_id`; cache key includes source; `_auto_categorize` made extensible; `ToolInvocationLog` |
+| `evidence_graph.py` | Data model changes (section 5); LR/log-odds confidence; write gateway + grounding checks |
+| `base_agent.py` | RECORD goes through the gateway; `add_phenomenon` changes to a `verified_facts`+`interpretation` interface |
+| `agent_factory.py` | `_AGENT_CLASSES` extended; source type → agent mapping |
+| `orchestrator.py` | Phase 1 per source; Phase 4 cross-source timezone normalisation; Phase 5 consolidated report |
+| `agents/` | New `ios_artifact.py` / `android_artifact.py` / `financial.py` / `media.py`; `communication.py` generalised |
+| `tools/` | New `mobile_ios.py` (plist/sqlite/keychain), `media.py` (OCR), `archive.py` (unpacking) |
+| `config.yaml` / `case.yaml` | Remove `cfreds_hacking_case`; create the `case.yaml` evidence manifest |
+
+---
+
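+## 7. Build order (sorted by dependency)
+
+| Stage | Content | Depends on | Value |
+|---|---|---|---|
+| **S1** | 4.1 evidence source abstraction + 4.2 tool parameterisation + fix P6 | none | Foundation; first run end-to-end on the USB E01 only, verifying existing logic is not broken |
+| **S2** | 4.3 write gateway + 4.4 grounding + ToolInvocationLog | S1 | Reliability core; "zero hallucinated records" becomes quantifiable |
+| **S3** | 4.5 LR/log-odds confidence | Independent (can run in parallel with S2) | Fixes P3; confidence becomes defensible |
+| **S4** | 4.7 iOS plugin + 4.8 agent reorganisation | S1 | Coverage 1/5 → 4/5 |
+| **S5** | 4.6 cross-source entity resolution | S1+S3 | Cross-device correlation; complex-scenario capability takes shape |
+| **S6** | 4.7 Android + media plugins + 4.9 orchestrator adaptation | S1+S4 | All 5 pieces of evidence onboarded |
+
+S1+S2+S3 is "making the system correct"; S4-S6 is "rolling out the capabilities". Following this order strictly is recommended:
+if S1 is unstable, everything after it is a castle in the air.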
+
+---
+
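+## 8. Design trade-offs and open questions
+
+1. **Boundary of grounding for free text**: only the structured atoms in `verified_facts` are hard-verified;
+   `interpretation` is not verified verbatim (an honestly drawn boundary). A second-level lint could be added: scan
+   interpretation for strings that look like paths/timestamps/hashes but are not covered by any cited invocation, and warn.
+2. **Initial LR calibration values are set by hand**: start with the initial values from section 4.5; "learning LRs from labelled cases" is future work.
+3. **Android userdata encryption**: whether a decryption key can be obtained determines the evidence depth of the 4.7 Android plugin, so this needs to be established early.
+4. **Entity resolution, destructive vs reversible**: this design chooses **reversible `same_as` edges** over destructive merging,
+   trading a little query efficiency for full auditability and rollback, in line with principle 4.
+5. **Report granularity**: fixed at "one consolidated report per case", with a subsection per piece of evidence plus cross-source correlation,
+   rather than a standalone report per piece of evidence.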
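
The confidence update in DESIGN.md §4.5 can be sketched in a few lines of Python. This is an illustration only, not the code in `evidence_graph.py` (which this view does not show): the function names and the `open` status label are assumptions, while the table values, the default prior of 0.5, and the 0.8 / 0.2 thresholds are taken from the design document.

```python
import math

# Calibration table copied from DESIGN.md section 4.5 (initial values).
LOG10_LR = {
    "direct_evidence": 2.0,
    "supports": 1.0,
    "consequence_observed": 1.0,
    "prerequisite_met": 0.5,
    "weakens": -0.5,
    "contradicts": -2.0,
}


def prior_log_odds(prior_prob: float = 0.5) -> float:
    """Base-10 log-odds of the prior; 0.5 maps to 0.0."""
    return math.log10(prior_prob / (1.0 - prior_prob))


def posterior_log_odds(edge_types, prior_prob: float = 0.5) -> float:
    """L_post = L_prior + sum of log10(LR) over all edges (order-free)."""
    return prior_log_odds(prior_prob) + sum(LOG10_LR[e] for e in edge_types)


def confidence(log_odds: float) -> float:
    """Map base-10 log-odds back to a probability."""
    return 1.0 / (1.0 + 10.0 ** (-log_odds))


def status(log_odds: float) -> str:
    """Thresholds from the design doc; 'open' is a hypothetical label."""
    conf = confidence(log_odds)
    if conf >= 0.8:
        return "supported"
    if conf <= 0.2:
        return "refuted"
    return "open"
```

Because the update is a plain sum, adding edges in any order gives the same `L_post`, which is the property P3 asks for. A confidence of 0.8 corresponds to log-odds of log10(4) ≈ 0.60, so under these initial values a single `supports` edge from a neutral prior already crosses the supported threshold, while a single `prerequisite_met` edge does not.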
--- a/agent_factory.py
+++ b/agent_factory.py
@@ -24,9 +24,12 @@ def _load_agent_classes() -> None:
    """Lazy-import agent classes to avoid circular imports."""
    if _AGENT_CLASSES:
        return
+    from agents.android_artifact import AndroidArtifactAgent
    from agents.communication import CommunicationAgent
    from agents.filesystem import FileSystemAgent
    from agents.hypothesis import HypothesisAgent
+    from agents.ios_artifact import IOSArtifactAgent
+    from agents.media import MediaAgent
    from agents.network import NetworkAgent
    from agents.registry import RegistryAgent
    from agents.report import ReportAgent
@@ -38,6 +41,50 @@ def _load_agent_classes() -> None:
    _AGENT_CLASSES["timeline"] = TimelineAgent
    _AGENT_CLASSES["hypothesis"] = HypothesisAgent
    _AGENT_CLASSES["report"] = ReportAgent
+    _AGENT_CLASSES["ios_artifact"] = IOSArtifactAgent
+    _AGENT_CLASSES["android_artifact"] = AndroidArtifactAgent
+    _AGENT_CLASSES["media"] = MediaAgent
+
+
+# Triage agent per (source.type, platform). disk_image is ambiguous on its
+# own — both a Windows USB image and an Android raw dump are disk_image —
+# so the routing helper also looks at source.meta.platform when present.
+SOURCE_TYPE_AGENTS: dict[str, str] = {
+    "disk_image":        "filesystem",       # default for unknown platform
+    "mobile_extraction": "ios_artifact",
+    "archive":           "filesystem",
+    "media_collection":  "media",
+}
+
+# Per-platform overrides for disk_image sources. Keys come from
+# source.meta.platform in case.yaml (lowercased).
+_DISK_IMAGE_PLATFORM_AGENTS: dict[str, str] = {
+    "windows": "filesystem",
+    "linux":   "filesystem",
+    "android": "android_artifact",
+    "ios":     "ios_artifact",
+}
+
+
+def get_triage_agent_type(source) -> str:
+    """Pick the right Phase-1 agent for *source*.
+
+    Accepts either an :class:`EvidenceSource` or a raw source.type string
+    (for back-compat with the S5 signature). Disk-image sources additionally
+    consult ``source.meta.platform`` so Windows USBs and Android raw dumps —
+    both type=disk_image — get different agents.
+    """
+    # Back-compat: accept a plain type string.
+    if isinstance(source, str):
+        return SOURCE_TYPE_AGENTS.get(source, "filesystem")
+
+    src_type = getattr(source, "type", "disk_image")
+    if src_type == "disk_image":
+        meta = getattr(source, "meta", {}) or {}
+        platform = str(meta.get("platform", "")).lower()
+        if platform in _DISK_IMAGE_PLATFORM_AGENTS:
+            return _DISK_IMAGE_PLATFORM_AGENTS[platform]
+    return SOURCE_TYPE_AGENTS.get(src_type, "filesystem")


 logger = logging.getLogger(__name__)
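
To make the routing added in this hunk easier to review, here is a self-contained sketch that copies the two mapping tables and `get_triage_agent_type` from the diff and exercises them. `FakeSource` is a hypothetical stand-in for `EvidenceSource` (whose real definition lives in `case.py`, not shown here); only the `type` and `meta` attributes are assumed.

```python
from dataclasses import dataclass, field

SOURCE_TYPE_AGENTS = {
    "disk_image": "filesystem",
    "mobile_extraction": "ios_artifact",
    "archive": "filesystem",
    "media_collection": "media",
}

_DISK_IMAGE_PLATFORM_AGENTS = {
    "windows": "filesystem",
    "linux": "filesystem",
    "android": "android_artifact",
    "ios": "ios_artifact",
}


def get_triage_agent_type(source) -> str:
    # Back-compat path: a plain type string skips the platform lookup.
    if isinstance(source, str):
        return SOURCE_TYPE_AGENTS.get(source, "filesystem")
    src_type = getattr(source, "type", "disk_image")
    if src_type == "disk_image":
        meta = getattr(source, "meta", {}) or {}
        platform = str(meta.get("platform", "")).lower()
        if platform in _DISK_IMAGE_PLATFORM_AGENTS:
            return _DISK_IMAGE_PLATFORM_AGENTS[platform]
    return SOURCE_TYPE_AGENTS.get(src_type, "filesystem")


@dataclass
class FakeSource:
    """Hypothetical stand-in for EvidenceSource; only type/meta are used."""
    type: str
    meta: dict = field(default_factory=dict)


android_dump = FakeSource("disk_image", {"platform": "Android"})
usb_image = FakeSource("disk_image")
ios_backup = FakeSource("mobile_extraction")
```

One thing the sketch makes visible: `meta.platform` is only consulted for `disk_image` sources, so a `mobile_extraction` source routes to `ios_artifact` regardless of its platform value, and any unknown type falls back to `filesystem`.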
--- a/agents/android_artifact.py
+++ b/agents/android_artifact.py
@@ -0,0 +1,58 @@
+"""Android Artifact Agent — multi-partition analysis of raw Android dumps.
+
+DESIGN.md §4.7 (Android): ``mmls`` slices the dump into partitions; each one is
+its own analysable surface. Ext4-backed partitions (typically SYSTEM,
+USERDATA when not FBE-encrypted, EFS in some variants) yield to TSK; raw
+partitions (BOOT, RECOVERY, RADIO, MODEM blobs) are best mined with
+``search_strings``. Userdata is the prize and is often FBE-encrypted on
+modern devices — the agent must check fsstat before assuming readability
+(see ``probe_android_partitions`` for the survey).
+"""
+
+from __future__ import annotations
+
+from base_agent import BaseAgent
+from evidence_graph import EvidenceGraph
+from llm_client import LLMClient
+from tool_registry import TOOL_CATALOG
+
+
+class AndroidArtifactAgent(BaseAgent):
+    name = "android_artifact"
+    role = (
+        "Android forensic analyst. You navigate raw Android disk dumps "
+        "(blk0_sda-style images) partition by partition. Workflow: call "
+        "probe_android_partitions ONCE to map the disk; pick the partitions "
+        "with fs_type=Ext4 or fs_type=F2FS (SYSTEM, USERDATA if readable, "
+        "EFS); for each, call set_active_partition(offset_from_512_sector_column) "
+        "and then list_directory / extract_file / search_strings as usual. "
+        "For raw partitions (BOOT, RECOVERY, RADIO, TOMBSTONES) skip directly "
+        "to search_strings — they have no filesystem. If USERDATA shows "
+        "fs_type=unknown it is almost certainly FBE-encrypted: record that "
+        "as a negative finding (the absence IS evidence) and move on to "
+        "what's reachable."
+    )
+
+    def __init__(self, llm: LLMClient, graph: EvidenceGraph) -> None:
+        super().__init__(llm, graph)
+        self._register_tools()
+
+    def _register_tools(self) -> None:
+        tool_names = [
+            # Android-specific
+            "probe_android_partitions",
+            "set_active_partition",
+            # Reused TSK toolset — partition_offset comes from active_source
+            "partition_info", "filesystem_info", "list_directory",
+            "extract_file", "find_file", "search_strings",
+            "count_deleted_files", "build_filesystem_timeline",
+            # Generic parsers
+            "read_text_file", "read_binary_preview", "search_text_file",
+            "read_text_file_section", "list_extracted_dir", "find_files",
+            # SQLite — Android apps store data in sqlite too (WhatsApp, etc.)
+            "sqlite_tables", "sqlite_query",
+        ]
+        for name in tool_names:
+            td = TOOL_CATALOG.get(name)
+            if td:
+                self.register_tool(td.name, td.description, td.input_schema, td.executor)
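
The per-partition workflow that the `AndroidArtifactAgent` role text describes to the LLM can be written down as a small decision function. This is a sketch for readers only: the record shape (`name`, `fs_type`) is an assumption about what `probe_android_partitions` reports, and the returned action labels are hypothetical, not identifiers from the codebase.

```python
def plan_partition(part: dict) -> str:
    """Pick the next step for one partition, following the role text.

    `part` is an assumed record such as {"name": "SYSTEM", "fs_type": "Ext4"}.
    """
    name = str(part.get("name", "")).upper()
    fs_type = str(part.get("fs_type", "unknown")).lower()

    # Readable filesystems: set_active_partition, then the usual TSK tools.
    if fs_type in {"ext4", "f2fs"}:
        return "tsk"

    # USERDATA without a recognisable filesystem is most likely FBE-encrypted;
    # the role text asks for a negative finding instead of further probing.
    if name == "USERDATA":
        return "record_negative_finding"

    # Everything else (BOOT, RECOVERY, RADIO, ...) has no filesystem.
    return "search_strings"


partitions = [
    {"name": "SYSTEM", "fs_type": "Ext4"},
    {"name": "USERDATA", "fs_type": "unknown"},
    {"name": "BOOT", "fs_type": "unknown"},
]
plan = {p["name"]: plan_partition(p) for p in partitions}
```

In the commit itself this decision is left to the LLM via the prompt; the sketch only spells out the intended branches.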
--- a/agents/ios_artifact.py
+++ b/agents/ios_artifact.py
@@ -0,0 +1,49 @@
+"""iOS Artifact Agent — analyses unpacked iOS extractions.
+
+DESIGN.md §4.7/§4.8: tree-mode iOS sources are the third evidence family
+the system handles (alongside disk images and pcaps). This agent owns the
+iOS-specific toolset; the grounded ``add_phenomenon`` contract from
+BaseAgent applies unchanged — every fact must cite a tool invocation.
+"""
+
+from __future__ import annotations
+
+from base_agent import BaseAgent
+from evidence_graph import EvidenceGraph
+from llm_client import LLMClient
+from tool_registry import TOOL_CATALOG
+
+
+class IOSArtifactAgent(BaseAgent):
+    name = "ios_artifact"
+    role = (
+        "iOS forensic analyst. You analyse unpacked iOS extractions — "
+        "binary/XML plists, SQLite databases (sms.db, ChatStorage.sqlite, "
+        "AddressBook.sqlitedb), the keychain (keychain-2.db), and the "
+        "iDevice_info.txt summary — to extract device identity, accounts, "
+        "messaging, contacts, and credential metadata. Domain-rooted iOS "
+        "trees (HomeDomain, AppDomain*, ProtectedDomain, NetworkDomain) "
+        "are your map; navigate by path, not by inode."
+    )
+
+    def __init__(self, llm: LLMClient, graph: EvidenceGraph) -> None:
+        super().__init__(llm, graph)
+        self._register_tools()
+
+    def _register_tools(self) -> None:
+        tool_names = [
+            # navigation — find_files is the workhorse on 10k+-file iOS trees;
+            # list_extracted_dir is for initial layout summary only.
+            "list_extracted_dir", "find_files",
+            "read_text_file", "read_text_file_section", "read_binary_preview",
+            "search_text_file",
+            # iOS-specific parsers
+            "parse_plist",
+            "sqlite_tables", "sqlite_query",
+            "parse_ios_keychain",
+            "read_idevice_info",
+        ]
+        for name in tool_names:
+            td = TOOL_CATALOG.get(name)
+            if td:
+                self.register_tool(td.name, td.description, td.input_schema, td.executor)
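
The docstring above relies on the grounded `add_phenomenon` contract from DESIGN.md §4.4. A minimal sketch of that precondition check follows, assuming text tool output only (the structured/binary branch is omitted). The `Invocation` record, the `task_id` field, and `check_facts` are hypothetical names chosen for illustration; the real gateway lives in files this view does not show.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Invocation:
    """Assumed shape of one ToolInvocationLog record (subset of fields)."""
    invocation_id: str
    agent: str
    task_id: str
    output: str


def check_facts(facts, log, agent, task_id):
    """Return the failing facts; an empty list means the write is accepted."""
    failures = []
    for fact in facts:
        inv = log.get(fact.get("invocation_id"))
        value = str(fact.get("value", ""))
        if inv is None or inv.agent != agent or inv.task_id != task_id:
            # Precondition 1: cited call must exist for this agent and task.
            failures.append({**fact, "reason": "unknown or out-of-scope invocation"})
        elif not value or value not in inv.output:
            # Precondition 2 (text case): verbatim substring of the output.
            failures.append({**fact, "reason": "value not found in tool output"})
    return failures


log = {
    "inv-001": Invocation("inv-001", "ios_artifact", "task-1",
                          "DeviceName: iPhone\nSerialNumber: ABC123"),
}
good = [{"type": "identifier", "value": "ABC123", "invocation_id": "inv-001"}]
bad = [{"type": "identifier", "value": "XYZ999", "invocation_id": "inv-001"}]
```

Per the design, a single failing fact rejects the whole phenomenon, so a caller would write only when `check_facts(...)` returns an empty list and otherwise hand the failures back to the agent for correction.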
--- a/agents/media.py
+++ b/agents/media.py
@@ -0,0 +1,52 @@
+"""Media Agent — OCR-based analysis of screenshot/photo evidence.
+
+DESIGN.md §4.7: the LLM backend has no vision capability, so JPEG/PNG
+evidence must go through tesseract first. The agent runs OCR, then
+records extracted strings — especially identifiers (wallet addresses,
+phone numbers, usernames) — via the grounded observe_identity gateway so
+they participate in cross-source coref the same way iOS keychain entries
+or Windows account names do.
+
+If the OCR runtime is missing on the host, ocr_image returns an explicit
+install hint; the agent should record that as a negative finding ("no
+text extracted — tesseract not installed") rather than guessing.
+"""
+
+from __future__ import annotations
+
+from base_agent import BaseAgent
+from evidence_graph import EvidenceGraph
+from llm_client import LLMClient
+from tool_registry import TOOL_CATALOG
+
+
+class MediaAgent(BaseAgent):
+    name = "media"
+    role = (
+        "Media / OCR forensic analyst. You analyse screenshots, photos, and "
+        "scanned documents — any pixel-based evidence the LLM cannot read "
+        "directly. Workflow: list_extracted_dir to enumerate images, "
+        "ocr_image on each promising one, then add_phenomenon (with the "
+        "OCR'd text as the verified_fact value) and observe_identity for "
+        "any wallet addresses, phone numbers, email addresses, or "
+        "usernames the text contains. If OCR fails because tesseract is "
+        "missing, RECORD that as a negative finding instead of fabricating "
+        "image content — the absence is a real fact about this run."
+    )
+
+    def __init__(self, llm: LLMClient, graph: EvidenceGraph) -> None:
+        super().__init__(llm, graph)
+        self._register_tools()
+
+    def _register_tools(self) -> None:
+        tool_names = [
+            "ocr_image",
+            "list_extracted_dir", "find_files",
+            "read_binary_preview",
+            "read_text_file",
+            "search_text_file",
+        ]
+        for name in tool_names:
+            td = TOOL_CATALOG.get(name)
+            if td:
+                self.register_tool(td.name, td.description, td.input_schema, td.executor)
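
The module docstring above says OCR'd identifiers take part in cross-source coref, and DESIGN.md §4.6 treats each connected component of `same_as` edges as one actor. A minimal sketch of that clustering step is below, using union-find. The function name and input shapes are assumptions; the graph's real `actor_clusters()` also returns identifiers and backing hypotheses, which are left out here.

```python
def cluster_entities(entity_ids, same_as_edges):
    """Group entity ids into connected components of same_as edges.

    Assumes every id in `same_as_edges` also appears in `entity_ids`.
    Returns member lists, largest cluster first.
    """
    parent = {e: e for e in entity_ids}

    def find(x):
        # Path halving keeps the trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in same_as_edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for e in entity_ids:
        groups.setdefault(find(e), []).append(e)
    return sorted((sorted(m) for m in groups.values()),
                  key=lambda m: (-len(m), m))


entities = ["ent-1", "ent-2", "ent-3", "ent-4"]
edges = [("ent-1", "ent-3"), ("ent-3", "ent-4")]
clusters = cluster_entities(entities, edges)
```

Since clusters are recomputed from the currently active edges rather than stored, retracting one `same_as` edge simply yields smaller clusters on the next query, which is what keeps the merge reversible.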
--- a/agents/report.py
+++ b/agents/report.py
@@ -12,9 +12,20 @@ class ReportAgent(BaseAgent):
    role = (
        "Forensic report writer. You synthesize all findings from the investigation "
        "into a structured, professional forensic analysis report organized by hypotheses.\n\n"
-        "Only include findings that have a source_tool attribution (marked VERIFIED). "
-        "If evidence lacks source attribution, mark it as UNVERIFIED. "
-        "Do NOT invent or fabricate any data, timestamps, or findings not present in the evidence."
+        "Phenomena are marked GROUNDED (verified_facts cite a real tool invocation), "
+        "TOOL-ONLY (source_tool set but no facts), or UNVERIFIED (neither). When "
+        "writing the report, render verified_facts as primary evidence with their "
+        "invocation citations, and render interpretation as 'agent analysis' so the "
+        "reader can tell ground truth from inference. Do NOT invent or fabricate any "
+        "data, timestamps, or findings not present in the evidence.\n\n"
+        "This is a cross-source case: phenomena come from multiple evidence "
+        "sources, and entities discovered on different sources may refer to the "
+        "same real-world actor. ALWAYS include:\n"
+        "  - 'Findings by Source' section sourced from get_phenomena_by_source\n"
+        "  - 'Actor Clusters' section sourced from get_actor_clusters (the "
+        "cross-source attribution view — multi-source clusters answer "
+        "'which findings on different devices belong to the same person')\n"
+        "  - 'Hypothesis × Evidence Matrix' from get_hypothesis_evidence_matrix"
    )
    # Calling save_report is BOTH the recording action and the completion
    # signal. tool_call_loop returns the moment save_report executes; the
@@ -38,9 +49,12 @@ class ReportAgent(BaseAgent):
            f"Investigation state:\n{self.graph.stats_summary()}\n\n"
            f"Your task: {task}\n\n"
            f"WORKFLOW:\n"
-            f"1. Call get_hypotheses_with_evidence, get_all_phenomena, get_entities, get_case_info "
-            f"   to gather all the data needed for the report. Make these calls in parallel.\n"
-            f"2. Assemble the complete markdown forensic report.\n"
+            f"1. Call get_hypotheses_with_evidence, get_all_phenomena, get_entities,\n"
+            f"   get_case_info, get_hypothesis_evidence_matrix, get_actor_clusters,\n"
+            f"   and get_phenomena_by_source in parallel; these are the seven data\n"
+            f"   sources you assemble the report from.\n"
+            f"2. Assemble the complete markdown forensic report. Cross-source\n"
+            f"   actor clusters and per-source breakdown are MANDATORY sections.\n"
            f"3. Call save_report(content=<full markdown>, output_path=\"report.md\").\n"
            f"   This single call is the completion signal — the run ENDS the moment it executes.\n"
            f"   Do NOT call any read tools after this point; they will not run.\n"
@@ -83,6 +97,45 @@ class ReportAgent(BaseAgent):
            executor=self._get_entities,
        )

+        self.register_tool(
+            name="get_hypothesis_evidence_matrix",
+            description=(
+                "Render the hypothesis × evidence pivot as a markdown table. "
+                "Columns: per edge_type counts, log_odds, confidence, status. "
+                "Embed this directly in the report to show how each hypothesis "
+                "stands relative to the others on a single screen."
+            ),
+            input_schema={"type": "object", "properties": {}},
+            executor=self._get_hypothesis_evidence_matrix,
+        )
+
+        self.register_tool(
+            name="get_actor_clusters",
+            description=(
+                "Render the cross-source actor clusters: each cluster is the "
+                "set of Entity nodes the system currently treats as the same "
+                "actor (via active same_as edges backed by coref hypotheses "
+                "≥ 0.8). Includes the aggregated identifier evidence per "
+                "cluster. Use this in the report's 'Actor Clusters' "
+                "section so readers see who-is-who across devices, not just "
+                "raw entity rows."
+            ),
+            input_schema={"type": "object", "properties": {}},
+            executor=self._get_actor_clusters,
+        )
+
+        self.register_tool(
+            name="get_phenomena_by_source",
+            description=(
+                "Group every phenomenon by its originating evidence source "
+                "(source_id). Use this to drive the report's 'Findings by "
+                "Source' section so each evidence item's per-device "
+                "contribution is auditable."
+            ),
+            input_schema={"type": "object", "properties": {}},
+            executor=self._get_phenomena_by_source,
+        )
+
        self.register_tool(
            name="save_report",
            description="Save the final report to a file.",
@@ -115,12 +168,24 @@ class ReportAgent(BaseAgent):
            items = [ph for ph in phenomena.values() if ph.category == cat]
            lines.append(f"\n--- {cat.upper()} ({len(items)} entries) ---")
            for ph in items:
-                verified = "VERIFIED" if ph.source_tool else "UNVERIFIED"
-                lines.append(f"\n[{verified}] {ph.title} ({ph.id})")
+                # Grounded = at least one verified fact AND a source_tool.
+                grounded = bool(ph.verified_facts) and bool(ph.source_tool)
+                marker = "GROUNDED" if grounded else (
+                    "TOOL-ONLY" if ph.source_tool else "UNVERIFIED"
+                )
+                lines.append(f"\n[{marker}] {ph.title} ({ph.id})")
                lines.append(f"  Source: {ph.source_agent} | Tool: {ph.source_tool or 'N/A'}")
                if ph.timestamp:
                    lines.append(f"  Timestamp: {ph.timestamp}")
-                lines.append(f"  {ph.description[:500]}")
+                if ph.verified_facts:
+                    lines.append(f"  Verified facts ({len(ph.verified_facts)}):")
+                    for f in ph.verified_facts:
+                        lines.append(
+                            f"    - [{f.get('type','?')}] {str(f.get('value',''))[:200]} "
+                            f"(cite: {f.get('invocation_id','?')})"
+                        )
+                if ph.interpretation:
+                    lines.append(f"  Analysis: {ph.interpretation[:500]}")
        return "\n".join(lines)

    async def _get_hypotheses_with_evidence(self) -> str:
@@ -150,12 +215,87 @@ class ReportAgent(BaseAgent):
        return "\n".join(lines)

    async def _get_case_info(self) -> str:
-        info = self.graph.case_info
        lines = ["=== Case Information ==="]
-        for k, v in info.items():
-            lines.append(f"  {k}: {v}")
-        lines.append(f"  Image path: {self.graph.image_path}")
-        lines.append(f"  Partition offset: {self.graph.partition_offset}")
+        case = self.graph.case
+        if case is not None:
+            lines.append(f"  case_id: {case.case_id}")
+            lines.append(f"  name: {case.name}")
+            for k, v in (case.meta or {}).items():
+                lines.append(f"  {k}: {v}")
+            lines.append(f"  sources: {len(case.sources)}")
+            for s in case.sources:
+                owner = f", owner={s.owner}" if s.owner else ""
+                platform = s.meta.get("platform") if s.meta else None
+                plat = f", platform={platform}" if platform else ""
+                lines.append(
+                    f"    - {s.id}: {s.label} "
+                    f"(type={s.type}, mode={s.access_mode}{plat}{owner})"
+                )
+        else:
+            # Legacy single-image fallback — surface whatever case_info dict
+            # was passed in (e.g. the old CFReDS MD5 block).
+            for k, v in (self.graph.case_info or {}).items():
+                lines.append(f"  {k}: {v}")
+            lines.append(f"  Image path: {self.graph.image_path}")
+            lines.append(f"  Partition offset: {self.graph.partition_offset}")
+        return "\n".join(lines)
+
+    async def _get_hypothesis_evidence_matrix(self) -> str:
+        return self.graph.hypothesis_evidence_matrix_markdown()
+
+    async def _get_actor_clusters(self) -> str:
+        clusters = self.graph.actor_clusters()
+        if not clusters:
+            return "(no entities recorded)"
+        # Show multi-member clusters first — they're the cross-source links
+        # the human reader most needs to see.
+        clusters.sort(key=lambda c: (-len(c["members"]), c["members"]))
+        lines = [f"=== Actor Clusters ({len(clusters)}) ==="]
+        for i, c in enumerate(clusters, 1):
+            members = c["members"]
+            label = "MULTI-SOURCE CLUSTER" if len(members) > 1 else "Single entity"
+            lines.append(f"\n[{label} #{i}] {len(members)} member(s):")
+            for eid in members:
+                ent = self.graph.entities.get(eid)
+                if ent:
+                    lines.append(f"  - {ent.summary()}")
+            if c["identifiers"]:
+                lines.append("  Aggregated identifiers:")
+                for ident in c["identifiers"]:
+                    strong_tag = "strong" if ident.get("strong") else "weak"
+                    lines.append(
+                        f"    [{strong_tag}] {ident.get('type')}={ident.get('value')} "
+                        f"(on {ident.get('on_entity')})"
+                    )
+            if c["coref_hypotheses"]:
+                lines.append("  Backing coref hypotheses (≥0.8 active):")
+                for hid in c["coref_hypotheses"]:
+                    hyp = self.graph.hypotheses.get(hid)
+                    if hyp:
+                        lines.append(f"    - {hid}: conf={hyp.confidence:.2f}, L={hyp.log_odds:+.2f}")
+        return "\n".join(lines)
+
+    async def _get_phenomena_by_source(self) -> str:
+        by_src: dict[str, list] = {}
+        for ph in self.graph.phenomena.values():
+            by_src.setdefault(ph.source_id or "(unbound)", []).append(ph)
+        if not by_src:
+            return "(no phenomena recorded)"
+        # Resolve source labels via graph.case when possible.
+        def _label(src_id: str) -> str:
+            if self.graph.case:
+                src = self.graph.case.get_source(src_id)
+                if src:
+                    return f"{src_id} — {src.label} ({src.type})"
+            return src_id
+
+        lines = [f"=== Phenomena by Source ({len(by_src)} source(s)) ==="]
+        for src_id in sorted(by_src):
+            phs = by_src[src_id]
+            lines.append(f"\n--- {_label(src_id)} ({len(phs)} phenomena) ---")
+            for ph in phs:
+                grounded = "G" if ph.verified_facts and ph.source_tool else "·"
+                lines.append(f"  [{grounded}] {ph.summary()}")
        return "\n".join(lines)

    async def _get_entities(self) -> str:
@@ -174,18 +314,27 @@ class ReportAgent(BaseAgent):
        return "\n".join(lines)

    async def _verify_phenomena(self) -> str:
-        verified = []
-        unverified = []
+        grounded: list[str] = []
+        tool_only: list[str] = []
+        unverified: list[str] = []
        for ph in self.graph.phenomena.values():
-            entry = f"  [{ph.category}] {ph.title} (agent: {ph.source_agent}, tool: {ph.source_tool or 'N/A'})"
-            if ph.source_tool:
-                verified.append(entry)
+            nf = len(ph.verified_facts)
+            entry = (
+                f"  [{ph.category}] {ph.title} "
+                f"(agent: {ph.source_agent}, tool: {ph.source_tool or 'N/A'}, facts: {nf})"
+            )
+            if ph.verified_facts and ph.source_tool:
+                grounded.append(entry)
+            elif ph.source_tool:
+                tool_only.append(entry)
            else:
                unverified.append(entry)

        lines = ["=== Phenomena Verification Report ==="]
-        lines.append(f"\nVERIFIED ({len(verified)} — have source_tool):")
-        lines.extend(verified)
+        lines.append(f"\nGROUNDED ({len(grounded)} — facts + source_tool):")
+        lines.extend(grounded)
+        lines.append(f"\nTOOL-ONLY ({len(tool_only)} — source_tool, no facts):")
+        lines.extend(tool_only)
        lines.append(f"\nUNVERIFIED ({len(unverified)} — no source_tool):")
        lines.extend(unverified)
        return "\n".join(lines)
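
Reviewer note: the three-way split in `_verify_phenomena` can be read as a small pure function. The sketch below is a minimal illustration, assuming a stand-in `Phenomenon` dataclass (hypothetical; the real node class is defined in `evidence_graph.py` and is not shown in this hunk) that carries only the two fields the classification reads.

```python
from dataclasses import dataclass, field


@dataclass
class Phenomenon:
    # Hypothetical stand-in: only the fields the classification reads.
    title: str
    source_tool: str = ""
    verified_facts: list = field(default_factory=list)


def classify(ph: Phenomenon) -> str:
    """Return the bucket that the branch order in _verify_phenomena implies."""
    if ph.verified_facts and ph.source_tool:
        return "grounded"
    if ph.source_tool:
        return "tool_only"
    return "unverified"


samples = [
    Phenomenon(
        "USB first seen",
        "parse_registry_key",
        [{"type": "timestamp", "value": "2024-01-02 03:04:05",
          "invocation_id": "inv-0001"}],
    ),
    Phenomenon("No matches for *.pst", "find_files"),
    Phenomenon("Suspicious gap in logs"),
]
buckets = [classify(ph) for ph in samples]
```

One consequence of the branch order worth keeping in mind: a phenomenon that has facts but an empty `source_tool` still lands in the unverified bucket.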
--- a/agents/timeline.py
+++ b/agents/timeline.py
@@ -122,7 +122,15 @@ class TimelineAgent(BaseAgent):
        lines = []
        for ph in items:
            lines.append(f"{ph.timestamp} | [{ph.category}] {ph.title} ({ph.id})")
-            lines.append(f"  {ph.description[:150]}")
+            preview = ph.interpretation[:150] if ph.interpretation else ""
+            if ph.verified_facts:
+                fact_preview = ", ".join(
+                    f"{f.get('type','?')}={str(f.get('value',''))[:40]}"
+                    for f in ph.verified_facts[:3]
+                )
+                preview = f"{preview} [facts: {fact_preview}]" if preview else f"[facts: {fact_preview}]"
+            if preview:
+                lines.append(f"  {preview}")
        return "\n".join(lines)

    async def _add_temporal_edge(
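
Reviewer note: the preview logic added to `TimelineAgent` can be exercised in isolation. The helper name `fact_preview` below is hypothetical; the body mirrors the hunk above (150-character interpretation cap, first three facts, 40-character value cap) and is offered as a sketch for testing, not as code from the repository.

```python
def fact_preview(interpretation: str, verified_facts: list) -> str:
    """Build the one-line preview shown under each timeline entry."""
    preview = interpretation[:150] if interpretation else ""
    if verified_facts:
        facts = ", ".join(
            f"{f.get('type', '?')}={str(f.get('value', ''))[:40]}"
            for f in verified_facts[:3]  # only the first three facts are shown
        )
        preview = f"{preview} [facts: {facts}]" if preview else f"[facts: {facts}]"
    return preview


only_facts = fact_preview("", [{"type": "path", "value": "/Users/alice/x.db"}])
only_text = fact_preview("Likely exfiltration staging", [])
both = fact_preview("note", [{"value": "x"}])
```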
--- a/base_agent.py
+++ b/base_agent.py
@@ -5,6 +5,7 @@ from __future__ import annotations
 import json
 import logging
 import time
+import uuid
 from typing import Any

 from evidence_graph import EvidenceGraph
@@ -36,7 +37,9 @@ class BaseAgent:
    # forced retry with an explicit "you forgot to record" instruction.
    # Subclasses override to declare their own recording responsibility
    # (timeline → add_temporal_edge, hypothesis → add_hypothesis, report → save_report).
-    mandatory_record_tools: tuple[str, ...] = ("add_phenomenon",)
+    # observe_identity (S5) counts as a recording too — it writes through the
+    # same grounding gateway and produces an identity_observation phenomenon.
+    mandatory_record_tools: tuple[str, ...] = ("add_phenomenon", "observe_identity")

    # Tools whose invocation ends the run immediately. After any terminal tool
    # is called, tool_call_loop returns with that tool's result text as
@@ -110,8 +113,23 @@ class BaseAgent:
            f"  Call investigation tools (list_directory, parse_registry_key, etc.) to gather data.\n"
            f"  Only extract_file for forensically relevant files (user data, logs, configs, hives) — NOT system DLLs or OS files.\n"
            f"  Create add_lead for anything outside your expertise.\n\n"
-            f"Phase B — RECORD PHENOMENA:\n"
-            f"  For EACH significant finding from Phase A, call add_phenomenon.\n"
+            f"Phase B — RECORD PHENOMENA (GROUNDED):\n"
+            f"  For EACH significant finding from Phase A, call add_phenomenon with:\n"
+            f"    * interpretation: your analysis — free text, NOT verified.\n"
+            f"    * verified_facts: one entry per concrete atom (path, timestamp,\n"
+            f"      inode, hash, identifier, count) you want recorded as truth.\n"
+            f"      Each entry MUST have:\n"
+            f"        - type: e.g. 'path', 'timestamp', 'inode', 'hash', 'identifier', 'count'\n"
+            f"        - value: a VERBATIM substring from the tool output\n"
+            f"        - invocation_id: the inv-xxx ID from the '[invocation: inv-xxx]'\n"
+            f"          header at the top of the tool result that produced this value\n"
+            f"  IDENTIFIERS — call observe_identity (in ADDITION to add_phenomenon)\n"
+            f"  whenever you see an email, phone number, Apple ID, IMEI, wallet\n"
+            f"  address, MAC, UDID, persistent nickname, or display name. Same\n"
+            f"  grounding contract: value must be verbatim in the cited tool\n"
+            f"  output. This is HOW cross-source attribution gets built — without\n"
+            f"  it, we can't tell whether the Apple ID in keychain belongs to the\n"
+            f"  same person as the Windows account on the USB.\n"
            f"  Do NOT call link_to_entity yet — just record all phenomena first.\n\n"
            f"Phase C — LINK ENTITIES:\n"
            f"  FIRST call list_phenomena to get the current IDs — do NOT rely on memory.\n"
@@ -125,20 +143,22 @@ class BaseAgent:
            f"- You MUST call add_phenomenon for EVERY significant finding BEFORE you stop.\n"
            f"- NEGATIVE findings count too. If you searched X (a directory, a pattern, "
            f"a registry key) and found NOTHING, that absence IS evidence — call "
-            f"add_phenomenon with a 'No matches for X' title and the search scope in "
-            f"raw_data. Negative findings constrain the hypothesis space and prevent "
-            f"the next agent from wasting time re-searching.\n"
+            f"add_phenomenon with a 'No matches for X' title, the search scope in "
+            f"raw_data, and cite the search tool's invocation_id (verified_facts may "
+            f"be empty for a true negative; the cited invocation in source_tool still "
+            f"anchors it). Negative findings constrain the hypothesis space.\n"
            f"- If you stop without having called add_phenomenon at least once, the task "
-            f"is FAILED and a forced retry will fire.\n"
-            f"- Include exact file paths, inode numbers, timestamps, and the source_tool "
-            f"that produced each finding.\n\n"
-            f"ANTI-HALLUCINATION RULES — STRICTLY ENFORCED:\n"
-            f"- ONLY record findings that appear VERBATIM in tool results you received\n"
-            f"- NEVER invent or guess timestamps, file paths, inode numbers, or program names\n"
-            f"- If tool output was truncated, state '[truncated]' — do NOT fill in the missing data\n"
-            f"- If you are unsure whether something exists, call a tool to verify or create a lead — do NOT assume\n"
-            f"- Quote exact strings from tool output when recording evidence descriptions\n"
-            f"- Do NOT fabricate execution timestamps — only report timestamps returned by tools"
+            f"is FAILED and a forced retry will fire.\n\n"
+            f"GROUNDING GATEWAY — STRUCTURALLY ENFORCED:\n"
+            f"- Every tool result begins with '[invocation: inv-xxxxxxxx]' — that ID\n"
+            f"  is what you cite in each fact's invocation_id.\n"
+            f"- fact.value must be a substring of the cited invocation's output.\n"
+            f"  Case, whitespace, and path-separator (/ ↔ \\) variants are tolerated;\n"
+            f"  anything else fabricated is REJECTED with a per-fact reason.\n"
+            f"- On REJECTED: quote the literal text from the output (or drop the\n"
+            f"  fact), and put guesses / inferred paths / model names in\n"
+            f"  `interpretation` instead. Then call add_phenomenon again.\n"
+            f"- You may cite ONLY invocations made within THIS task."
        )

    async def run(self, task: str, lead_id: str | None = None) -> str:
@@ -146,6 +166,11 @@ class BaseAgent:
        _log(task, event="agent_start", agent=self.name)
        self.graph.agent_status[self.name] = "running"
        self.graph._current_agent = self.name
+        # Fresh task scope per agent run. Used by the grounding gateway to
+        # check that facts in add_phenomenon cite invocations made *within
+        # this run* — preventing the agent from forwarding stale IDs from
+        # earlier work or another agent.
+        self.graph._current_task_id = f"task-{uuid.uuid4().hex[:8]}"
        self._current_lead_id = lead_id

        self._register_graph_tools()
@@ -350,20 +375,67 @@ class BaseAgent:
        self.register_tool(
            name="add_phenomenon",
            description=(
-                "Record a forensic finding (phenomenon) on the evidence graph. "
-                "You MUST specify source_tool: the name of the tool call that produced this finding."
+                "Record a forensic finding on the evidence graph. The finding is "
+                "split into provenance-bound atoms (verified_facts) and free-form "
+                "analysis (interpretation). Each fact MUST cite the invocation_id "
+                "of a tool call you made in THIS task — the gateway checks every "
+                "fact's value against that call's real output as a substring. "
+                "Any fact that fails grounding causes the whole record to be "
+                "rejected with a list of failures; fix the facts and call again."
            ),
            input_schema={
                "type": "object",
                "properties": {
                    "category": {"type": "string", "description": "Category of the finding."},
                    "title": {"type": "string", "description": "Short title."},
-                    "description": {"type": "string", "description": "Detailed description. Quote exact data from tool output."},
+                    "interpretation": {
+                        "type": "string",
+                        "description": (
+                            "Free-form analysis text — your reasoning, why this "
+                            "matters, what it implies. NOT verified by the gateway. "
+                            "Rendered in reports as 'agent analysis', not truth."
+                        ),
+                    },
+                    "verified_facts": {
+                        "type": "array",
+                        "description": (
+                            "Atoms you want preserved as ground truth. Each must "
+                            "appear verbatim in the cited tool output."
+                        ),
+                        "items": {
+                            "type": "object",
+                            "properties": {
+                                "type": {
+                                    "type": "string",
+                                    "description": (
+                                        "Kind of fact: path, timestamp, inode, "
+                                        "hash, identifier, count, raw, ..."
+                                    ),
+                                },
+                                "value": {
+                                    "type": "string",
+                                    "description": (
+                                        "Verbatim substring from the cited tool "
+                                        "output. The gateway does a literal "
+                                        "string-in-string check — no paraphrasing."
+                                    ),
+                                },
+                                "invocation_id": {
+                                    "type": "string",
+                                    "description": (
+                                        "ID from the '[invocation: inv-xxx]' header "
+                                        "of the tool call that produced this value."
+                                    ),
+                                },
+                            },
+                            "required": ["type", "value", "invocation_id"],
+                        },
+                    },
                    "raw_data": {"type": "object", "description": "Structured raw data supporting this finding."},
                    "timestamp": {"type": "string", "description": "Timestamp if any. ONLY use timestamps from tool output."},
                    "source_tool": {"type": "string", "description": "Name of the tool that produced this (e.g. 'list_directory')."},
                },
-                "required": ["category", "title", "description", "source_tool"],
+                "required": ["category", "title", "source_tool"],
            },
            executor=self._add_phenomenon,
        )
@@ -414,6 +486,67 @@ class BaseAgent:
            executor=self._link_to_entity,
        )

+        self.register_tool(
+            name="observe_identity",
+            description=(
+                "Record a typed identifier (email / phone / Apple ID / IMEI / "
+                "wallet address / nickname / display name / …) for an entity. "
+                "Goes through the same grounding gateway as add_phenomenon — "
+                "value MUST be a verbatim substring of the cited tool output. "
+                "After attachment, the engine automatically proposes / "
+                "strengthens / weakens cross-source coreference hypotheses "
+                "between this entity and any others carrying the same or "
+                "conflicting identifiers. This is how 'is the Apple ID in iOS "
+                "keychain the same person as the Windows login name?' gets "
+                "answered. Call this in ADDITION to add_phenomenon for "
+                "identifier-bearing findings."
+            ),
+            input_schema={
+                "type": "object",
+                "properties": {
+                    "entity_name": {"type": "string", "description": "Human-readable entity name (e.g. 'LEUNG YL', 'alice@example.com')."},
+                    "entity_type": {
+                        "type": "string",
+                        "enum": ["person", "program", "file", "host", "ip_address"],
+                        "description": "Kind of entity this identifier belongs to (usually 'person').",
+                    },
+                    "identifier_type": {
+                        "type": "string",
+                        "description": (
+                            "Strong (near-unique): email, phone_number, imei, "
+                            "imsi, apple_id, icloud_id, google_account, "
+                            "wallet_address, udid, mac_address, device_serial. "
+                            "Weak (free-form, may collide): nickname, "
+                            "display_name, username, screen_name."
+                        ),
+                    },
+                    "value": {
+                        "type": "string",
+                        "description": (
+                            "The identifier value, quoted VERBATIM from the "
+                            "tool output you cite in invocation_id."
+                        ),
+                    },
+                    "invocation_id": {
+                        "type": "string",
+                        "description": (
+                            "ID from the '[invocation: inv-xxx]' header of "
+                            "the tool call that surfaced this identifier."
+                        ),
+                    },
+                    "source_tool": {
+                        "type": "string",
+                        "description": "Name of the tool that produced the identifier.",
+                    },
+                },
+                "required": [
+                    "entity_name", "entity_type", "identifier_type",
+                    "value", "invocation_id",
+                ],
+            },
+            executor=self._observe_identity,
+        )
+
    # ---- Tool executors -----------------------------------------------------

    async def _list_phenomena(self, category: str | None = None) -> str:
@@ -453,16 +586,29 @@ class BaseAgent:
        self,
        category: str,
        title: str,
-        description: str,
+        interpretation: str = "",
+        verified_facts: list[dict] | None = None,
        raw_data: dict | None = None,
        timestamp: str | None = None,
        source_tool: str = "",
+        # Back-compat: older prompts (and accidental LLM emissions) may pass
+        # ``description``; treat it as ``interpretation`` rather than failing.
+        description: str | None = None,
    ) -> str:
+        if description and not interpretation:
+            interpretation = description
+        # GroundingError propagates: llm_client._execute_single_tool turns
+        # raised exceptions into "Error executing add_phenomenon: <msg>" tool
+        # results the LLM sees, and _wrap_record_executor does NOT increment
+        # the mandatory-record counter (the increment only runs after a
+        # successful return), so the forced-retry mechanism still fires if
+        # the agent never lands a grounded phenomenon.
        pid, merged = await self.graph.add_phenomenon(
            source_agent=self.name,
            category=category,
            title=title,
-            description=description,
+            interpretation=interpretation,
+            verified_facts=verified_facts,
            raw_data=raw_data,
            timestamp=timestamp,
            source_tool=source_tool,
@@ -508,6 +654,51 @@ class BaseAgent:
        status = "linked to existing" if existing else "created and linked"
        return f"Entity {status}: {entity_name} ({entity_type}) ←[{edge_type}]— {phenomenon_id}"

+    async def _observe_identity(
+        self,
+        entity_name: str,
+        entity_type: str,
+        identifier_type: str,
+        value: str,
+        invocation_id: str,
+        source_tool: str = "",
+    ) -> str:
+        # GroundingError / ValueError propagate to llm_client's per-tool
+        # exception handler, which formats them back to the LLM. That keeps
+        # the mandatory-record counter honest — only a successful return
+        # triggers the increment in _wrap_record_executor.
+        result = await self.graph.observe_identity(
+            entity_name=entity_name,
+            entity_type=entity_type,
+            identifier_type=identifier_type,
+            value=value,
+            source_agent=self.name,
+            source_tool=source_tool,
+            invocation_id=invocation_id,
+        )
+        lines = [
+            f"Identity observed: {identifier_type}={value} "
+            f"on entity {result['entity_id']} ({entity_name})."
+        ]
+        if result.get("new_identifier"):
+            lines.append(
+                f"  Observation phenomenon: {result['phenomenon_id']}"
+            )
+        else:
+            lines.append("  (identifier already recorded on this entity — idempotent)")
+        for prop in result.get("coref_proposals", []):
+            lines.append(
+                f"  → Coref candidate: {prop['other_entity_id']} via "
+                f"{prop['match']['edge_type']} (conf={prop['confidence']:.2f}, "
+                f"hypothesis={prop['hypothesis_id']})"
+            )
+            for c in prop.get("conflicts", []):
+                lines.append(
+                    f"      ⚠ conflict on {c['type']}: "
+                    f"{c['new_value']} vs {c['other_value']}"
+                )
+        return "\n".join(lines)
+
    async def _list_assets(self, category: str | None = None) -> str:
        results = self.graph.list_assets(category)
        if not results:
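
Reviewer note: the gateway itself lives in `evidence_graph.py`, whose diff is not shown here. The sketch below is only an illustration of the contract the prompt describes (substring match, with case, whitespace and path-separator variants tolerated, plus an agent/task scope check). `ToolInvocation`, `_normalise` and `check_fact` are assumed names and shapes, not the repository's actual implementation.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class ToolInvocation:
    # Hypothetical record of one wrapped tool call.
    id: str
    agent: str
    task_id: str
    output: str


def _normalise(text: str) -> str:
    """Lowercase, unify path separators, collapse whitespace runs."""
    text = text.replace("\\", "/").lower()
    return re.sub(r"\s+", " ", text).strip()


def check_fact(fact: dict, invocations: dict[str, ToolInvocation],
               agent: str, task_id: str) -> str | None:
    """Return None when the fact is grounded, else a rejection reason."""
    inv = invocations.get(fact.get("invocation_id", ""))
    if inv is None:
        return "unknown invocation_id"
    if inv.agent != agent or inv.task_id != task_id:
        return "invocation is outside this agent/task scope"
    value = str(fact.get("value", ""))
    if not value.strip():
        return "empty value"
    # Exact substring first, then the tolerant normalised comparison.
    if value in inv.output or _normalise(value) in _normalise(inv.output):
        return None
    return "value not found in cited output"


invs = {
    "inv-1": ToolInvocation(
        "inv-1", "filesystem", "task-a",
        "r/r 128-128-1:  Users\\Alice\\NTUSER.DAT",
    )
}
ok = check_fact(
    {"type": "path", "value": "users/alice/ntuser.dat", "invocation_id": "inv-1"},
    invs, "filesystem", "task-a",
)
fabricated = check_fact(
    {"type": "path", "value": "Users\\Bob", "invocation_id": "inv-1"},
    invs, "filesystem", "task-a",
)
stale = check_fact(
    {"type": "path", "value": "NTUSER.DAT", "invocation_id": "inv-1"},
    invs, "filesystem", "task-b",
)
```

Returning a reason string per fact, rather than raising on the first failure, is one way to produce the per-fact rejection list the tool description promises.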
--- a/case.example.yaml
+++ b/case.example.yaml
@@ -0,0 +1,41 @@
+# MASForensics case definition — template
+#
+# Copy this file to `case.yaml` and edit it for your case. If `case.yaml`
+# exists in the working directory, `python main.py` loads it automatically;
+# otherwise main.py falls back to interactive single-image selection.
+#
+# A case is a set of evidence sources. Each source has:
+#   id              optional — auto-derived from label if omitted ("src-<slug>")
+#   label           human-readable name
+#   type            disk_image | mobile_extraction | archive | media_collection
+#   access_mode     image | tree   (optional — defaults by type)
+#                     image = block device / disk image, navigated by Sleuth Kit
+#                     tree  = mounted filesystem / unpacked extraction, path-based
+#   owner           optional — the person the source is associated with
+#   path            filesystem path (relative paths resolve against this file)
+#   partition_offset  image-mode only — sector offset of the partition to analyze
+#   meta            optional free-form notes
+#
+# NOTE: tree-mode sources given as zip archives are unpacked into a
+# persistent cache (.cache/sources) that later runs can reuse.
+
+case_id: example-case
+name: "Example forensic case"
+meta:
+  notes: "free-form case-level metadata"
+
+sources:
+  - id: src-suspect-laptop
+    label: "Suspect laptop disk image"
+    type: disk_image
+    access_mode: image
+    owner: "John Doe"
+    path: image/suspect_laptop.E01
+    partition_offset: 0               # run `mmls <image>` to find the right offset
+
+  - id: src-suspect-phone
+    label: "Suspect phone extraction"
+    type: mobile_extraction
+    access_mode: tree
+    owner: "John Doe"
+    path: image/suspect_phone.zip
--- a/case.py
+++ b/case.py
@@ -0,0 +1,226 @@
+"""Case and evidence-source model — the foundation for multi-evidence analysis.
+
+A :class:`Case` is a collection of :class:`EvidenceSource` entries. Each source
+has a *type* (disk image, mobile extraction, archive, ...) and an *access mode*
+that determines how forensic tools reach its contents:
+
+  - ``"image"`` — a block device / disk image, navigated by The Sleuth Kit via
+    inode addressing (raw, E01, dd, ...).
+  - ``"tree"``  — an already-mounted filesystem or unpacked extraction,
+    navigated by ordinary filesystem paths.
+
+This module is pure data model + loading. Partition probing and interactive
+selection live in ``main.py``.
+"""
+
+from __future__ import annotations
+
+import logging
+import re
+from dataclasses import asdict, dataclass, field
+from pathlib import Path
+
+logger = logging.getLogger(__name__)
+
+# Recognised source types and access modes.
+SOURCE_TYPES = {"disk_image", "mobile_extraction", "archive", "media_collection"}
+ACCESS_MODES = {"image", "tree"}
+
+# Disk-image file extensions for interactive discovery.
+# P6 fix: ``.bin`` (and vmdk/vhd) added — extension globbing previously missed
+# raw block-device dumps such as ``blk0_sda.bin``.
+DISK_IMAGE_EXTS = {
+    ".001", ".dd", ".raw", ".img", ".bin", ".e01", ".iso", ".vmdk", ".vhd",
+}
+
+# Default access mode per source type.
+_DEFAULT_ACCESS_MODE = {
+    "disk_image": "image",
+    "mobile_extraction": "tree",
+    "archive": "tree",
+    "media_collection": "tree",
+}
+
+
+def slugify(text: str) -> str:
+    """Reduce *text* to a lowercase, hyphen-separated slug for use in IDs."""
+    slug = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
+    return slug or "src"
+
+
+@dataclass
+class EvidenceSource:
+    """One piece of evidence within a :class:`Case`."""
+
+    id: str                       # "src-<slug>"
+    label: str                    # human-readable name
+    type: str                     # one of SOURCE_TYPES
+    path: str                     # filesystem path to the evidence
+    access_mode: str              # "image" | "tree"
+    owner: str = ""               # associated person, if known
+    partition_offset: int = 0     # sector offset (image-mode sources only)
+    meta: dict = field(default_factory=dict)
+
+    def to_dict(self) -> dict:
+        return asdict(self)
+
+    @classmethod
+    def from_dict(cls, d: dict) -> EvidenceSource:
+        """Reconstruct from a dict, ignoring unknown keys (forward-compatible)."""
+        known = set(cls.__dataclass_fields__)
+        return cls(**{k: v for k, v in d.items() if k in known})
+
+    def summary(self) -> str:
+        loc = (
+            f"@{self.partition_offset}"
+            if self.access_mode == "image" and self.partition_offset
+            else ""
+        )
+        owner = f" owner={self.owner}" if self.owner else ""
+        return f"[{self.id}] {self.label} ({self.type}/{self.access_mode}{loc}){owner}"
+
+
+@dataclass
+class Case:
+    """A forensic case: a set of evidence sources plus metadata."""
+
+    case_id: str
+    name: str
+    sources: list[EvidenceSource] = field(default_factory=list)
+    meta: dict = field(default_factory=dict)
+
+    def to_dict(self) -> dict:
+        return {
+            "case_id": self.case_id,
+            "name": self.name,
+            "sources": [s.to_dict() for s in self.sources],
+            "meta": dict(self.meta),
+        }
+
+    @classmethod
+    def from_dict(cls, d: dict) -> Case:
+        return cls(
+            case_id=d.get("case_id", ""),
+            name=d.get("name", ""),
+            sources=[EvidenceSource.from_dict(s) for s in d.get("sources", [])],
+            meta=d.get("meta", {}),
+        )
+
+    def get_source(self, source_id: str) -> EvidenceSource | None:
+        for s in self.sources:
+            if s.id == source_id:
+                return s
+        return None
+
+
+# ---------------------------------------------------------------------------
+# case.yaml loading
+# ---------------------------------------------------------------------------
+
+def _build_source(raw: dict, base_dir: Path, index: int) -> EvidenceSource:
+    """Validate and normalise one source entry from case.yaml.
+
+    Missing ``id`` is derived from the label; missing ``access_mode`` defaults
+    by type; relative paths are resolved against *base_dir* (the case file's
+    directory).
+    """
+    label = str(raw.get("label") or raw.get("id") or f"source-{index}")
+    src_type = str(raw.get("type", "disk_image"))
+    if src_type not in SOURCE_TYPES:
+        logger.warning("Unknown source type %r for %r — treating as disk_image",
+                        src_type, label)
+        src_type = "disk_image"
+
+    access_mode = str(raw.get("access_mode") or _DEFAULT_ACCESS_MODE.get(src_type, "tree"))
+    if access_mode not in ACCESS_MODES:
+        logger.warning("Unknown access_mode %r for %r — defaulting", access_mode, label)
+        access_mode = _DEFAULT_ACCESS_MODE.get(src_type, "tree")
+
+    src_id = str(raw.get("id") or f"src-{slugify(label)}")
+    if not src_id.startswith("src-"):
+        src_id = f"src-{slugify(src_id)}"
+
+    raw_path = str(raw.get("path") or "").strip()
+    path = raw_path
+    if raw_path:
+        p = Path(raw_path).expanduser()
+        if not p.is_absolute():
+            p = (base_dir / p)
+        path = str(p)
+
+    return EvidenceSource(
+        id=src_id,
+        label=label,
+        type=src_type,
+        path=path,
+        access_mode=access_mode,
+        owner=str(raw.get("owner") or ""),
+        partition_offset=int(raw.get("partition_offset", 0) or 0),
+        meta=dict(raw.get("meta") or {}),
+    )
+
+
+def build_case(data: dict, base_dir: Path | None = None) -> Case:
+    """Build a validated :class:`Case` from a loosely-typed case.yaml dict."""
+    base_dir = base_dir or Path.cwd()
+    sources: list[EvidenceSource] = []
+    seen_ids: set[str] = set()
+    for i, raw in enumerate(data.get("sources", []) or []):
+        if not isinstance(raw, dict):
+            logger.warning("Skipping malformed source entry #%d", i)
+            continue
+        src = _build_source(raw, base_dir, i)
+        if src.id in seen_ids:
+            src.id = f"{src.id}-{i}"
+        seen_ids.add(src.id)
+        if not src.path:
+            logger.warning("Source %r has no path — keeping but it is not analysable",
+                            src.label)
+        sources.append(src)
+
+    return Case(
+        case_id=str(data.get("case_id") or "case"),
+        name=str(data.get("name") or "Untitled case"),
+        sources=sources,
+        meta=dict(data.get("meta") or {}),
+    )
+
+
+def load_case(path: str | Path = "case.yaml") -> Case | None:
+    """Load a :class:`Case` from a case.yaml file. Returns None if absent."""
+    case_path = Path(path)
+    if not case_path.exists():
+        return None
+    import yaml
+
+    try:
+        data = yaml.safe_load(case_path.read_text(encoding="utf-8")) or {}
+    except Exception as e:
+        logger.error("Failed to parse %s: %s", case_path, e)
+        return None
+    if not isinstance(data, dict):
+        logger.error("%s is not a YAML mapping", case_path)
+        return None
+
+    case = build_case(data, base_dir=case_path.resolve().parent)
+    logger.info("Loaded case %r with %d source(s) from %s",
+                case.name, len(case.sources), case_path)
+    return case
+
+
+def single_source_case(
+    image_path: str,
+    partition_offset: int = 0,
+    label: str | None = None,
+) -> Case:
+    """Wrap a single disk image as a one-source Case (interactive fallback)."""
+    name = label or Path(image_path).name
+    src = EvidenceSource(
+        id=f"src-{slugify(Path(image_path).stem)}",
+        label=name,
+        type="disk_image",
+        path=image_path,
+        access_mode="image",
+        partition_offset=partition_offset,
+    )
+    return Case(case_id="adhoc", name=name, sources=[src])
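
Reviewer note: the id-derivation rules are spread across `_build_source` and `build_case`. The sketch below pulls them into one function to make the behaviour easy to check. `slugify` is copied from the diff; `derive_ids` is a hypothetical helper that combines the label fallback, the `src-` prefix rule, and the duplicate suffix.

```python
import re


def slugify(text: str) -> str:
    """Same reduction as case.slugify: lowercase, hyphen-separated."""
    slug = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
    return slug or "src"


def derive_ids(entries: list) -> list:
    """Apply the id rules from _build_source plus build_case's de-duplication."""
    ids = []
    seen = set()
    for i, raw in enumerate(entries):
        label = str(raw.get("label") or raw.get("id") or f"source-{i}")
        src_id = str(raw.get("id") or f"src-{slugify(label)}")
        if not src_id.startswith("src-"):
            src_id = f"src-{slugify(src_id)}"
        if src_id in seen:
            src_id = f"{src_id}-{i}"  # suffix with the entry index on collision
        seen.add(src_id)
        ids.append(src_id)
    return ids


ids = derive_ids([
    {"label": "Suspect laptop disk image"},
    {"id": "Phone #2", "label": "Suspect phone"},
    {"label": "Suspect laptop disk image"},
])
non_ascii = slugify("取证镜像")
```

Because the slug pattern keeps only `a-z0-9`, a label written entirely in non-Latin script collapses to the fallback `src`, so such sources are told apart only by the index suffix. Giving those entries an explicit `id` in `case.yaml` avoids that.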
--- a/evidence_graph.py
+++ b/evidence_graph.py
--- a/llm_client.py
+++ b/llm_client.py
@@ -142,6 +142,12 @@ READ_ONLY_TOOLS: set[str] = {
    # Parser reads
    "read_text_file", "read_binary_preview", "search_text_file",
    "read_text_file_section", "list_extracted_dir", "parse_pcap_strings",
+    "find_files",
+    # iOS plugin reads (S4)
+    "parse_plist", "sqlite_tables", "sqlite_query",
+    "parse_ios_keychain", "read_idevice_info",
+    # Android + media reads (S6) — set_active_partition is NOT read-only.
+    "probe_android_partitions", "ocr_image",
 }


@@ -503,7 +509,7 @@ class LLMClient:
        tools: list[dict],
        tool_executor: dict[str, Any],
        system: str | None = None,
-        max_iterations: int = 40,
+        max_iterations: int = 60,
        terminal_tools: tuple[str, ...] = (),
    ) -> tuple[str, list[dict]]:
        """Run a tool-calling loop using OpenAI-native tool calls.
--- a/main.py
+++ b/main.py
@@ -15,17 +15,21 @@ from pathlib import Path
 import yaml

 from agent_factory import AgentFactory
+from case import (
+    DISK_IMAGE_EXTS, Case, EvidenceSource, load_case, single_source_case,
+)
 from evidence_graph import EvidenceGraph
 from llm_client import LLMClient
 from log_config import setup_logging
 from orchestrator import AnalysisAborted, Orchestrator
 from tool_registry import register_all_tools
+from tools.archive import unzip_archive_sync

 RUNS_DIR = Path("runs")
 IMAGE_DIR = Path("image")
-
-# Common forensic image extensions (only first segment / single-file formats)
-_IMAGE_GLOBS = ["*.001", "*.dd", "*.raw", "*.img", "*.E01", "*.iso"]
+# Persistent unpack cache for tree-mode sources (zip extractions). Lives
+# at project root so multiple runs can reuse the same unpacked tree.
+SOURCE_CACHE_DIR = Path(".cache/sources")


 def load_config(path: str = "config.yaml") -> dict:
@@ -38,11 +42,13 @@ def load_config(path: str = "config.yaml") -> dict:
 # ---------------------------------------------------------------------------

 def _discover_images(search_dir: Path = IMAGE_DIR) -> list[Path]:
-    """Find forensic disk image files under *search_dir*."""
-    images: set[Path] = set()
-    for glob in _IMAGE_GLOBS:
-        images.update(search_dir.glob(glob))
-    return sorted(images)
+    """Find forensic disk image files under *search_dir* (case-insensitive ext)."""
+    if not search_dir.is_dir():
+        return []
+    return sorted(
+        p for p in search_dir.iterdir()
+        if p.is_file() and p.suffix.lower() in DISK_IMAGE_EXTS
+    )


 def _parse_mmls(output: str) -> list[dict]:
@@ -110,7 +116,7 @@ def select_image_interactive(image_dir: Path | None = None) -> tuple[str, int]:
    images = _discover_images(image_dir)
    if not images:
        print(f"No disk images found in {image_dir}/")
-        print("Supported formats: " + ", ".join(_IMAGE_GLOBS))
+        print("Supported extensions: " + ", ".join(sorted(DISK_IMAGE_EXTS)))
        sys.exit(1)

    if len(images) == 1:
@@ -153,6 +159,118 @@ def select_image_interactive(image_dir: Path | None = None) -> tuple[str, int]:
        print("Invalid choice.")


+def resolve_case() -> Case:
+    """Resolve the Case to analyze.
+
+    Priority: an explicit case file given as a CLI argument, then ./case.yaml
+    in the working directory, then legacy interactive single-image selection.
+    """
+    # 1. Explicit case file passed on the command line
+    if len(sys.argv) > 1 and sys.argv[1].lower().endswith((".yaml", ".yml")):
+        case = load_case(sys.argv[1])
+        if case is None:
+            print(f"Error: could not load case file {sys.argv[1]}")
+            sys.exit(1)
+        print(f"Loaded case: {case.name} ({len(case.sources)} sources)")
+        return case
+
+    # 2. ./case.yaml in the working directory
+    case = load_case()
+    if case is not None:
+        print(f"Loaded case: {case.name} ({len(case.sources)} sources)")
+        return case
+
+    # 3. Legacy interactive single-image selection
+    cli_dir = Path(sys.argv[1]) if len(sys.argv) > 1 else None
+    image_path, partition_offset = select_image_interactive(cli_dir)
+    return single_source_case(image_path, partition_offset)
+
+
+def _is_analysable(src: EvidenceSource) -> bool:
+    """A source is analysable when it has a path AND its mode has tooling.
+
+    Image-mode disks and tree-mode mobile/archive extractions qualify.
+    Media collections (screenshots) are not accepted by this check.
+    """
+    if not src.path:
+        return False
+    if src.access_mode == "image":
+        return True
+    if src.access_mode == "tree" and src.type in ("mobile_extraction", "archive"):
+        return True
+    return False
+
+
+def list_analysable_sources(case: Case) -> list[EvidenceSource]:
+    """Return every analysable source in the case (orchestrator iterates them).
+
+    Pre-S6 main.py used to force-choose one source here; the multi-source
+    orchestrator (Phase 1 per-source triage) now consumes the full list.
+    Skipped sources are still reported for visibility.
+    """
+    analysable = [s for s in case.sources if _is_analysable(s)]
+    skipped = [s for s in case.sources if not _is_analysable(s)]
+    if skipped:
+        print(
+            f"Note: {len(skipped)} source(s) not analysable in this build: "
+            + ", ".join(f"{s.label} ({s.type})" for s in skipped)
+        )
+    if not analysable:
+        print("No analysable sources in this case.")
+        sys.exit(1)
+    print(f"Analysing {len(analysable)} source(s) — orchestrator will triage each in Phase 1:")
+    for s in analysable:
+        print(f"  - {s.summary()}")
+    return analysable
+
+
+def prepare_source(src: EvidenceSource) -> EvidenceSource:
+    """Materialise a tree-mode source for analysis.
+
+    Mobile / archive sources arrive as .zip files. We unpack once into a
+    project-level cache (``.cache/sources/<src.id>/``) and rewrite
+    ``src.path`` to point at the unpacked directory. Idempotent — a
+    second run with the cache present is a no-op (unzip_archive_sync
+    skips files that already exist with the matching size).
+
+    Disk-image and already-tree sources pass through unchanged.
+    """
+    if src.access_mode != "tree":
+        return src
+    p = Path(src.path)
+    if p.is_dir():
+        return src  # already a directory, nothing to do
+    if not p.is_file():
+        print(f"Warning: source path {src.path} does not exist; leaving as-is.")
+        return src
+    if p.suffix.lower() != ".zip":
+        # Other archive types (tar, 7z, ...) — not handled yet.
+        print(f"Warning: tree-mode source {src.id} is not a .zip "
+                f"({p.suffix}); leaving as-is.")
+        return src
+
+    dest = SOURCE_CACHE_DIR / src.id
+    dest.mkdir(parents=True, exist_ok=True)
+    # Password-protected zips (e.g. CTF artefacts) carry their key in
+    # case.yaml's meta.password — never logged, never persisted.
+    password = (src.meta or {}).get("password")
+    pw_note = " (password from meta)" if password else ""
+    print(f"Unpacking {p.name} → {dest}{pw_note} (idempotent) ...")
+    result = unzip_archive_sync(str(p), str(dest), password=password)
+    first_line = result.split("\n", 1)[0]
+    print("  " + first_line)
+    if first_line.startswith("Error:"):
+        # Surface the multi-line guidance from _do_extract verbatim.
+        for extra in result.split("\n")[1:]:
+            print("  " + extra)
+        print(f"  Source {src.id} stays unanalysable until this is resolved.")
+        # Leave src.path unchanged so the source remains marked unanalysable.
+        return src
+    src.path = str(dest)
+    src.access_mode = "tree"
+    return src
+
+
 def find_resumable_run() -> Path | None:
    """Find the most recent incomplete run with a saved graph state."""
    if not RUNS_DIR.exists():
@@ -225,22 +343,30 @@ async def async_main() -> None:

    # Initialize evidence graph
    if graph is None:
-        # CLI arg takes priority, otherwise interactive prompt
-        cli_dir = Path(sys.argv[1]) if len(sys.argv) > 1 else None
-        image_path, partition_offset = select_image_interactive(cli_dir)
+        case = resolve_case()
+        # case_info derived from THIS case's meta (case.yaml), not from
+        # config.yaml's legacy `cfreds_hacking_case` block. Without this,
+        # the old CFReDS evidence MD5s would be embedded in reports for
+        # every subsequent unrelated case.
        graph = EvidenceGraph(
-            case_info=config.get("cfreds_hacking_case", {}),
+            case_info=dict(case.meta or {}),
            persist_path=run_dir / "graph_state.json",
-            edge_weights=config.get("hypothesis_edge_weights"),
+            edge_log_lr=config.get("hypothesis_log_lr"),
        )
-        graph.image_path = image_path
-        graph.partition_offset = partition_offset
+        graph.case = case
        graph.extracted_dir = str(run_dir / "extracted")
+        analysable = list_analysable_sources(case)
+        # Prepare every analysable source up front (unzip tree-mode zips,
+        # etc.). Idempotent on cache hits — second run is a no-op.
+        prepared = [prepare_source(s) for s in analysable]
+        # Seed the active source so tools that resolve lazily have a target
+        # before Phase 1 begins; the orchestrator resets it per source.
+        graph.set_active_source(prepared[0])
    else:
        graph._persist_path = run_dir / "graph_state.json"

-    # Register all tools with bound image path
-    register_all_tools(graph.image_path, graph.partition_offset, graph, graph.extracted_dir)
+    # Register all tools — they resolve the active evidence source at call time
+    register_all_tools(graph)

    # Create agent factory
    factory = AgentFactory(llm, graph)
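The switch from glob patterns to a case-insensitive suffix check in `_discover_images` can be tried in isolation. This is a minimal sketch: the contents of `DISK_IMAGE_EXTS` below are an assumption (the real set lives in case.py and is not shown in this diff), and `discover_images` is a stand-in name.

```python
import tempfile
from pathlib import Path

# Assumed contents, lower-cased; the real DISK_IMAGE_EXTS is defined in case.py.
DISK_IMAGE_EXTS = {".001", ".dd", ".raw", ".img", ".e01", ".iso", ".bin"}


def discover_images(search_dir: Path) -> list[Path]:
    """Return files whose lower-cased suffix is a known disk image extension."""
    if not search_dir.is_dir():
        return []
    return sorted(
        p for p in search_dir.iterdir()
        if p.is_file() and p.suffix.lower() in DISK_IMAGE_EXTS
    )


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("disk.E01", "phone.BIN", "notes.txt"):
        (root / name).write_bytes(b"")
    found = [p.name for p in discover_images(root)]

print(found)
```

Because the comparison lower-cases the suffix, `disk.E01` and `phone.BIN` are both picked up, while a missing directory yields an empty list instead of an error.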
--- a/orchestrator.py
+++ b/orchestrator.py
@@ -10,7 +10,7 @@ import time
 from datetime import datetime
 from pathlib import Path

-from agent_factory import AgentFactory
+from agent_factory import AgentFactory, get_triage_agent_type
 from evidence_graph import EvidenceGraph
 from llm_client import LLMClient, _extract_first_balanced, _safe_json_loads
 from tool_registry import TOOL_CATALOG
@@ -518,7 +518,7 @@ class Orchestrator:
        if not unlinked:
            return

-        valid_types = list(self.graph.edge_weights.keys())
+        valid_types = list(self.graph.edge_log_lr.keys())

        hyp_section = "\n".join(
            f"  [{h.id}] {h.title}: {h.description}" for h in active
@@ -551,7 +551,7 @@ class Orchestrator:
                if (
                    hyp_id in self.graph.hypotheses
                    and ph_id in self.graph.phenomena
-                    and edge_type in self.graph.edge_weights
+                    and edge_type in self.graph.edge_log_lr
                ):
                    await self.graph.update_hypothesis_confidence(
                        hyp_id=hyp_id,
@@ -593,7 +593,7 @@ class Orchestrator:
                    ph_id = j.get("phenomenon_id", "")
                    edge_type = j.get("edge_type", "")
                    reason = j.get("reason", "")
-                    if ph_id in self.graph.phenomena and edge_type in self.graph.edge_weights:
+                    if ph_id in self.graph.phenomena and edge_type in self.graph.edge_log_lr:
                        await self.graph.update_hypothesis_confidence(
                            hyp_id=hyp.id,
                            phenomenon_id=ph_id,
@@ -618,7 +618,10 @@ class Orchestrator:
        phenomena (deterministic — the canonical tool was actually called).
        """
        evidence_text = " ".join(
-            f"{ph.category} {ph.title} {ph.description}".lower()
+            (
+                f"{ph.category} {ph.title} {ph.interpretation} "
+                + " ".join(str(f.get("value", "")) for f in ph.verified_facts)
+            ).lower()
            for ph in self.graph.phenomena.values()
        )
        used_tools: set[str] = {
@@ -747,28 +750,103 @@ class Orchestrator:

    # ---- Main pipeline -------------------------------------------------------

+    # ---- Phase 1 helpers (multi-source triage) -------------------------------
+
+    @staticmethod
+    def _is_analysable(src) -> bool:
+        """Variant of main._is_analysable, duplicated so the orchestrator need
+        not import main.py. Every source needs a path; tree-mode sources must be
+        mobile_extraction or archive; media_collection is also accepted here.
+        """
+        if not getattr(src, "path", ""):
+            return False
+        if src.access_mode == "image":
+            return True
+        if src.access_mode == "tree" and src.type in ("mobile_extraction", "archive"):
+            return True
+        # media_collection is analysable too once a MediaAgent is registered.
+        if src.type == "media_collection":
+            return True
+        return False
+
+    def _sources_to_triage(self) -> list:
+        """Pick every analysable source in the case (or fall back to the
+        single active_source for the legacy single-image path).
+        """
+        case = self.graph.case
+        if case is None or not case.sources:
+            return [self.graph.active_source] if self.graph.active_source else []
+        return [s for s in case.sources if self._is_analysable(s)]
+
+    async def _phase1_triage_source(self, src) -> tuple[int, int]:
+        """Run the right triage agent on one source. Returns (Δphenomena, Δleads)."""
+        ph_before = len(self.graph.phenomena)
+        leads_before = sum(1 for l in self.graph.leads if l.status == "pending")
+
+        self.graph.set_active_source(src)
+        agent_type = get_triage_agent_type(src)
+        agent = self.factory.get_or_create_agent(agent_type)
+        if agent is None:
+            logger.warning(
+                "No agent registered for type %s — skipping source %s",
+                agent_type, src.id,
+            )
+            return 0, 0
+
+        _log(
+            f"Phase 1 triage: {src.id} ({src.label}) → {agent_type}",
+            event="dispatch", agent=agent_type, source=src.id,
+        )
+        try:
+            await agent.run(
+                f"Perform an initial Phase-1 triage of source {src.id} "
+                f"({src.label}, type={src.type}). Survey the source's "
+                f"structure, identify the most interesting artefacts, and "
+                f"record significant findings via add_phenomenon. Call "
+                f"observe_identity for any concrete identifiers (email, "
+                f"phone, Apple ID, IMEI, wallet address, persistent "
+                f"username) you encounter — that's how this finding will "
+                f"link across the other sources in the case. Create "
+                f"add_lead for follow-up that's outside your scope."
+            )
+        except Exception as e:
+            logger.error("Phase 1 agent [%s] failed on %s: %s", agent_type, src.id, e)
+
+        return (
+            len(self.graph.phenomena) - ph_before,
+            sum(1 for l in self.graph.leads if l.status == "pending") - leads_before,
+        )
+
    async def run(self, resume_phase: int = 1) -> str:
        """Run the 5-phase hypothesis-driven forensic analysis pipeline."""
-        _log(f"Phase 1: Filesystem Survey (image: {Path(self.graph.image_path).name})", event="phase")
+        sources = self._sources_to_triage()
+        _log(
+            f"Phase 1: per-source triage ({len(sources)} source(s))",
+            event="phase",
+        )

        report = ""
        try:
-            # Phase 1: Initial filesystem survey
+            # Phase 1: Initial per-source triage (S6 multi-source).
+            # Runs sequentially so each agent gets its own task_id scope —
+            # the grounding gateway requires that, and shared graph state
+            # (active_source, partition_offset) would race under parallel
+            # dispatch anyway.
            if resume_phase <= 1:
                t0 = time.monotonic()
                ph_before = len(self.graph.phenomena)
-                fs_agent = self.factory.get_or_create_agent("filesystem")
-                if fs_agent:
-                    await fs_agent.run(
-                        "Perform an initial survey of this disk image. "
-                        "Examine the partition table, filesystem type, and root directory structure. "
-                        "List key user directories and identify interesting files (documents, emails, "
-                        "chat logs, installed programs, registry hives). "
-                        "Create leads for other agents based on what you find."
+                for src in sources:
+                    new_ph, new_leads = await self._phase1_triage_source(src)
+                    _log(
+                        f"  {src.id}: +{new_ph} phenomena, +{new_leads} leads",
+                        event="progress", source=src.id,
                    )
-                new_ph = len(self.graph.phenomena) - ph_before
-                new_leads = sum(1 for l in self.graph.leads if l.status == "pending")
-                _log(f"+{new_ph} phenomena, +{new_leads} leads", event="progress", elapsed=time.monotonic() - t0)
+                total_ph = len(self.graph.phenomena) - ph_before
+                total_leads = sum(1 for l in self.graph.leads if l.status == "pending")
+                _log(
+                    f"Phase 1 total: +{total_ph} phenomena, {total_leads} pending leads",
+                    event="progress", elapsed=time.monotonic() - t0,
+                )

            # Phase 2: Hypothesis generation
            if resume_phase <= 2:
@@ -865,8 +943,15 @@ class Orchestrator:
                    "6. Conclusions and Recommendations"
                )

-                image_stem = Path(self.graph.image_path).stem
-                report_name = f"{image_stem}_forensic_report.md"
+                # Multi-source case → name by case_id (stable across sources).
+                # Legacy single-image runs without a Case → fall back to the
+                # last active image's stem so old workflows still produce a
+                # plausible filename.
+                if self.graph.case and self.graph.case.case_id:
+                    stem = self.graph.case.case_id
+                else:
+                    stem = Path(self.graph.image_path).stem or "case"
+                report_name = f"{stem}_forensic_report.md"
                report_path = (self.run_dir / report_name) if self.run_dir else Path(report_name)
                try:
                    report_path.write_text(report)
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -6,6 +6,8 @@ requires-python = ">=3.14"
 dependencies = [
    "httpx[socks]>=0.28.1",
    "openai>=2.36.0",
+    "pillow>=12.2.0",
+    "pytesseract>=0.3.13",
    "pyyaml",
    "regipy>=6.2.1",
 ]
--- a/regenerate_report.py
+++ b/regenerate_report.py
@@ -32,10 +32,10 @@ async def main() -> None:
    config = yaml.safe_load(open("config.yaml"))
    agent_cfg = config["agent"]

-    # Load graph (edge_weights from config — applied to the loaded graph)
+    # Load graph (edge_log_lr from config — applied to the loaded graph)
    graph = EvidenceGraph.load_state(
        state_path,
-        edge_weights=config.get("hypothesis_edge_weights"),
+        edge_log_lr=config.get("hypothesis_log_lr"),
    )
    print(f"Loaded: {graph.stats_summary()}")

@@ -49,7 +49,7 @@ async def main() -> None:
        thinking_enabled=agent_cfg.get("thinking_enabled", False),
    )

-    register_all_tools(graph.image_path, graph.partition_offset, graph)
+    register_all_tools(graph)
    factory = AgentFactory(llm, graph)

    # Run only the report agent
--- a/tests/test_optimizations.py
+++ b/tests/test_optimizations.py
--- a/tool_registry.py
+++ b/tool_registry.py
@@ -1,6 +1,8 @@
 """Central tool registry — catalogs all available forensic tools.

-Tools are registered once at startup with bound image_path and offset.
+Tools are registered once at startup. Sleuth Kit tools resolve their image
+path and partition offset from graph.active_source at call time, so a single
+registered tool follows whichever evidence source is currently active.
 The AgentFactory uses this catalog to compose agents dynamically.
 """

@@ -14,6 +16,11 @@ import re
 from dataclasses import dataclass, field
 from typing import Any

+from evidence_graph import GroundingError
+from tools import archive as arc
+from tools import media as med
+from tools import mobile_android as android
+from tools import mobile_ios as ios
 from tools import parsers
 from tools import registry as reg
 from tools import sleuthkit as tsk
@@ -35,6 +42,13 @@ CACHEABLE_TOOLS: set[str] = {
    "parse_registry_key", "search_registry", "get_user_activity",
    "read_text_file", "read_binary_preview", "search_text_file",
    "read_text_file_section", "list_extracted_dir", "parse_pcap_strings",
+    "find_files",
+    # iOS (read-only file parses):
+    "parse_plist", "sqlite_tables", "sqlite_query",
+    "parse_ios_keychain", "read_idevice_info",
+    # Android + media (read-only):
+    "probe_android_partitions", "ocr_image",
+    # NB: unzip_archive and set_active_partition are NOT cached — they have side effects.
 }


@@ -45,24 +59,106 @@ def _cache_key(tool_name: str, kwargs: dict) -> str:
    return f"{tool_name}:{args_hash}"


+def _looks_like_error(text: str) -> bool:
+    """Heuristic for unsuccessful tool output (mirrors the prior cache filter)."""
+    return text.startswith(("Error", "[Command failed", "[icat failed"))
+
+
 def _make_cached(tool_name: str, executor: Any) -> Any:
-    """Wrap an executor with an in-memory result cache."""
+    """Thin in-memory cache wrapper around a tool executor.
+
+    Kept as a standalone primitive (no graph dependency) so unit tests can
+    exercise caching in isolation. Production wiring composes this with
+    invocation logging via :func:`_make_invocation_executor`.
+    """

    async def wrapper(**kwargs) -> str:
        key = _cache_key(tool_name, kwargs)
-        cached = _tool_result_cache.get(key)
-        if cached is not None:
-            logger.debug("Cache hit: %s(%s)", tool_name, kwargs)
-            return cached
+        hit = _tool_result_cache.get(key)
+        if hit is not None:
+            return hit
        result = await executor(**kwargs)
-        # Only cache successful results (not errors)
-        if not result.startswith("Error") and not result.startswith("[Command failed"):
+        if not _looks_like_error(result):
            _tool_result_cache[key] = result
        return result

    return wrapper


+def _make_invocation_executor(
+    tool_name: str,
+    executor: Any,
+    graph: Any,
+    *,
+    cacheable: bool,
+    auto_record_category: str | None = None,
+) -> Any:
+    """Single uniform wrapper around a forensic tool executor.
+
+    Responsibilities (in order):
+      1. Serve from the result cache when ``cacheable=True`` and the key
+         is hot. Cached hits still produce a fresh ToolInvocation record
+         marked ``cached=True`` so the agent can still cite it.
+      2. Call the underlying executor on cache miss; store on success.
+      3. Record a :class:`ToolInvocation` on the graph (this is the
+         provenance unit the grounding gateway looks up).
+      4. (Optionally) auto-record the raw output as a Phenomenon with a
+         single ``type=raw`` fact citing the invocation just made. This
+         replaces the pre-S2 ``_make_auto_record`` shortcut.
+      5. Return the result with a ``[invocation: inv-xxx]`` header so
+         the LLM learns the ID to put in ``add_phenomenon`` facts.
+    """
+
+    async def wrapper(**kwargs) -> str:
+        cached_flag = False
+        cache_hit_key: str | None = None
+        text: str | None = None
+
+        if cacheable:
+            cache_hit_key = _cache_key(tool_name, kwargs)
+            hit = _tool_result_cache.get(cache_hit_key)
+            if hit is not None:
+                logger.debug("Cache hit: %s(%s)", tool_name, kwargs)
+                text, cached_flag = hit, True
+
+        if text is None:
+            text = await executor(**kwargs)
+            if cacheable and cache_hit_key and not _looks_like_error(text):
+                _tool_result_cache[cache_hit_key] = text
+
+        inv_id = await graph.record_tool_invocation(
+            tool=tool_name, args=kwargs, output=text, cached=cached_flag,
+        )
+
+        # Auto-record the raw output as a phenomenon (single grounded fact).
+        # Skipped on error outputs; the graph is assumed to be present here.
+        if auto_record_category and not _looks_like_error(text):
+            agent = getattr(graph, "_current_agent", "") or "unknown"
+            first_line = text.split("\n", 1)[0][:80]
+            try:
+                await graph.add_phenomenon(
+                    source_agent=agent,
+                    category=auto_record_category,
+                    title=f"{tool_name}: {first_line}",
+                    interpretation="(auto-recorded raw tool output)",
+                    verified_facts=[{
+                        "type": "raw",
+                        "value": text[:2000],
+                        "invocation_id": inv_id,
+                    }],
+                    source_tool=tool_name,
+                )
+            except GroundingError as e:
+                # Should never happen for auto-record (we just wrote the
+                # invocation; value is a literal prefix of output). Log
+                # loudly if it does — that's a bug, not a hallucination.
+                logger.error("Auto-record grounding failed for %s: %s", tool_name, e)
+
+        return f"[invocation: {inv_id}]\n{text}"
+
+    return wrapper
+
+
 def get_cache_stats() -> dict[str, int]:
    """Return cache statistics for diagnostics."""
    return {"entries": len(_tool_result_cache)}
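The cache, record, and header steps listed in the `_make_invocation_executor` docstring can be sketched without the real graph. Everything here is a simplified assumption: `FakeGraph` is a hypothetical stand-in for `EvidenceGraph`, the id format and the cache-key scheme are invented for the example, and the auto-record step is left out.

```python
import asyncio
import hashlib
import json

_cache: dict[str, str] = {}


class FakeGraph:
    """Hypothetical stand-in for EvidenceGraph; it only records invocations."""

    def __init__(self) -> None:
        self.invocations: list[dict] = []

    async def record_tool_invocation(self, *, tool, args, output, cached) -> str:
        inv_id = f"inv-{len(self.invocations) + 1:03d}"  # assumed id format
        self.invocations.append({"id": inv_id, "tool": tool, "cached": cached})
        return inv_id


def wrap(tool_name, executor, graph):
    async def wrapper(**kwargs) -> str:
        # Assumed key scheme: tool name plus a hash of the sorted arguments.
        digest = hashlib.sha256(
            json.dumps(kwargs, sort_keys=True, default=str).encode()
        ).hexdigest()
        key = f"{tool_name}:{digest}"
        text = _cache.get(key)
        cached = text is not None
        if text is None:
            text = await executor(**kwargs)
            if not text.startswith("Error"):
                _cache[key] = text
        # A record is written even on a cache hit, so every result is citable.
        inv_id = await graph.record_tool_invocation(
            tool=tool_name, args=kwargs, output=text, cached=cached
        )
        return f"[invocation: {inv_id}]\n{text}"

    return wrapper


async def fake_plist(path: str) -> str:
    return f"parsed {path}"


async def demo():
    graph = FakeGraph()
    tool = wrap("parse_plist", fake_plist, graph)
    first = await tool(path="Info.plist")
    second = await tool(path="Info.plist")
    return graph, first, second


graph, first, second = asyncio.run(demo())
print(first.splitlines()[0], second.splitlines()[0])
print([inv["cached"] for inv in graph.invocations])
```

The second call is served from the cache yet still receives its own id (`inv-002` in this sketch), which is the property the grounding gateway relies on when an agent cites a cached result.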
@@ -77,12 +173,11 @@ ASSET_CATEGORIES = [
 ]


-def _auto_categorize(filename: str) -> str:
-    """Infer asset category from filename."""
+def _auto_categorize_windows(filename: str) -> str:
+    """Original Windows-leaning heuristic for disk-image-extracted artifacts."""
    name_lower = filename.lower()
    ext = os.path.splitext(name_lower)[1]

-    # Check full name (with extension) and base name against known hive names
    if name_lower in _REGISTRY_HIVE_NAMES:
        return "registry_hive"
    if ext == ".pf":
@@ -93,7 +188,7 @@ def _auto_categorize(filename: str) -> str:
        return "address_book"
    if name_lower == "info2" or re.match(r"dc\d+\.exe", name_lower):
        return "recycle_bin"
-    # Extension-based checks before keyword-based (e.g. mirc.ini → config, not chat)
+    # Extension-based checks before keyword-based (e.g. mirc.ini → config, not chat).
    if ext in (".ini", ".csv", ".dat", ".cfg"):
        return "config_file"
    if ext in (".log", ".lst"):
@@ -107,6 +202,49 @@ def _auto_categorize(filename: str) -> str:
    return "other"


+def _auto_categorize_ios(filename: str) -> str:
+    """iOS extraction heuristic — plist / sqlite / keychain land here.
+
+    Domain-rooted iOS extractions yield specific filenames (sms.db,
+    AddressBook.sqlitedb, keychain-2.db, *.plist) that the Windows
+    categorizer would dump into 'other' — fixing P4.
+    """
+    name_lower = filename.lower()
+    ext = os.path.splitext(name_lower)[1]
+
+    if name_lower == "keychain-2.db":
+        return "ios_keychain"
+    if name_lower in ("sms.db", "chatstorage.sqlite"):
+        return "messaging_db"
+    if name_lower in ("addressbook.sqlitedb", "addressbookimages.sqlitedb"):
+        return "address_book"
+    if name_lower == "idevice_info.txt":
+        return "device_info"
+    if ext in (".sqlite", ".sqlite3", ".sqlitedb", ".db"):
+        return "sqlite_db"
+    if ext == ".plist":
+        return "plist"
+    if ext == ".log":
+        return "text_log"
+    return "other"
+
+
+# Per-source-type categorizers — dispatched by _auto_categorize at call time
+# based on graph.active_source.type. Solves P4 (Windows-only categorization).
+_CATEGORIZERS = {
+    "disk_image": _auto_categorize_windows,
+    "mobile_extraction": _auto_categorize_ios,
+    "archive": _auto_categorize_windows,
+    "media_collection": lambda fn: "other",
+}
+
+
+def _auto_categorize(filename: str, source_type: str = "disk_image") -> str:
+    """Dispatch to a source-type-aware categorizer (defaults to Windows)."""
+    fn = _CATEGORIZERS.get(source_type, _auto_categorize_windows)
+    return fn(filename)
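The source-type dispatch above can be exercised with a reduced table. This is a sketch under stated assumptions: `categorize_ios` copies only a few of the iOS rules from this hunk, and `categorize_default` is a placeholder that does not reproduce the Windows heuristic.

```python
import os


def categorize_ios(filename: str) -> str:
    """Subset of the iOS rules: exact filenames are checked before extensions."""
    name = filename.lower()
    ext = os.path.splitext(name)[1]
    if name == "keychain-2.db":
        return "ios_keychain"
    if name in ("sms.db", "chatstorage.sqlite"):
        return "messaging_db"
    if ext in (".sqlite", ".sqlite3", ".sqlitedb", ".db"):
        return "sqlite_db"
    if ext == ".plist":
        return "plist"
    return "other"


def categorize_default(filename: str) -> str:
    """Placeholder only; the real fallback is the Windows-leaning heuristic."""
    return "other"


CATEGORIZERS = {"mobile_extraction": categorize_ios}


def auto_categorize(filename: str, source_type: str = "disk_image") -> str:
    # Unknown source types fall back to the default categorizer.
    return CATEGORIZERS.get(source_type, categorize_default)(filename)


print(auto_categorize("sms.db", "mobile_extraction"))
print(auto_categorize("Cache.db", "mobile_extraction"))
```

Ordering matters in the iOS function: `sms.db` and `keychain-2.db` both end in `.db`, so the exact-name checks have to run before the generic SQLite extension check.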
+
+
 @dataclass
 class ToolDefinition:
    """A registered tool available for agent composition."""
@@ -123,44 +261,53 @@ class ToolDefinition:
 TOOL_CATALOG: dict[str, ToolDefinition] = {}


-def _make_auto_record(tool_name: str, category: str, executor: Any, graph: Any) -> Any:
-    """Wrap a forensic tool to auto-record its result as a phenomenon."""
-
-    async def wrapper(**kwargs) -> str:
-        result = await executor(**kwargs)
-        if graph is None or not result or result.startswith("Error") or result.startswith("["):
-            return result
-        # Auto-record: the tool produced a forensic fact
-        agent = getattr(graph, "_current_agent", "") or "unknown"
-        title = f"{tool_name}: {result.split(chr(10))[0][:80]}"
-        await graph.add_phenomenon(
-            source_agent=agent,
-            category=category,
-            title=title,
-            description=result[:2000],
-            source_tool=tool_name,
-        )
-        return result
-
-    return wrapper
+# Mapping of tool_name → category for tools that auto-record a phenomenon when run.
+# Replaces the pre-S2 ``_make_auto_record`` per-tool wrapping; the central
+# instrumentation pass at the end of register_all_tools applies these.
+AUTO_RECORD_TOOLS: dict[str, str] = {
+    "list_installed_software": "registry",
+    "get_system_info": "registry",
+    "get_timezone_info": "registry",
+    "get_computer_name": "registry",
+    "get_shutdown_time": "registry",
+    "enumerate_users": "registry",
+    "get_network_interfaces": "registry",
+    "get_email_config": "registry",
+    "parse_prefetch": "filesystem",
+}


-def register_all_tools(
-    image_path: str,
-    partition_offset: int,
-    graph: Any = None,
-    extracted_dir: str = "extracted",
-) -> None:
-    """Populate TOOL_CATALOG with all available tools, pre-bound to image/offset."""
+def register_all_tools(graph: Any) -> None:
+    """Populate TOOL_CATALOG with all available forensic tools.
+
+    Tools no longer close over a fixed image path. The Sleuth Kit tools
+    resolve the image path and partition offset from ``graph.active_source``
+    at call time, so the same registered tool follows whichever evidence
+    source the orchestrator has made active.
+    """
    TOOL_CATALOG.clear()

+    def _img() -> str:
+        """Resolve the active source's image path at tool-call time."""
+        src = getattr(graph, "active_source", None)
+        if src is None or not src.path:
+            raise RuntimeError(
+                "No active evidence source — call graph.set_active_source() first."
+            )
+        return src.path
+
+    def _off() -> int:
+        """Resolve the active source's partition offset at tool-call time."""
+        src = getattr(graph, "active_source", None)
+        return src.partition_offset if src is not None else 0
+
    # ---- Sleuth Kit tools ----

    TOOL_CATALOG["partition_info"] = ToolDefinition(
        name="partition_info",
        description="Get the partition table layout of the disk image. Run this first to understand disk structure.",
        input_schema={"type": "object", "properties": {}},
-        executor=lambda: tsk.partition_info(image_path),
+        executor=lambda: tsk.partition_info(_img()),
        module="sleuthkit",
        tags=["filesystem", "disk", "partition"],
    )
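The effect of replacing the captured `image_path` with `_img()` / `_off()` lookups is that one registered executor follows the active source. A minimal sketch, assuming simplified `Source` and `Graph` stand-ins (not the project's `EvidenceSource` / `EvidenceGraph`) and a tool that merely returns the values it resolved:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Source:
    """Simplified stand-in for EvidenceSource."""
    path: str
    partition_offset: int = 0


class Graph:
    """Simplified stand-in for EvidenceGraph."""
    active_source: Optional[Source] = None


def make_tool(graph: Graph):
    def _img() -> str:
        src = graph.active_source
        if src is None or not src.path:
            raise RuntimeError("No active evidence source")
        return src.path

    def _off() -> int:
        src = graph.active_source
        return src.partition_offset if src is not None else 0

    # Resolved on every call, not once at registration time.
    return lambda: (_img(), _off())


graph = Graph()
tool = make_tool(graph)

graph.active_source = Source("disk_a.dd", 63)
first = tool()
graph.active_source = Source("disk_b.img", 2048)
second = tool()
print(first, second)
```

The same `tool` object returns `("disk_a.dd", 63)` and then `("disk_b.img", 2048)`, which is why the orchestrator only needs to call `set_active_source` between Phase 1 triage runs instead of re-registering tools.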
@@ -169,7 +316,7 @@ def register_all_tools(
        name="filesystem_info",
        description="Get detailed filesystem information (type, block size, volume name, etc.) for the selected partition.",
        input_schema={"type": "object", "properties": {}},
-        executor=lambda: tsk.filesystem_info(image_path, partition_offset),
+        executor=lambda: tsk.filesystem_info(_img(), _off()),
        module="sleuthkit",
        tags=["filesystem", "disk"],
    )
@@ -185,7 +332,7 @@ def register_all_tools(
            },
        },
        executor=lambda inode=None, recursive=False: tsk.list_directory(
-            image_path, partition_offset, inode, recursive
+            _img(), _off(), inode, recursive
        ),
        module="sleuthkit",
        tags=["filesystem", "directory", "listing"],
@@ -204,12 +351,13 @@ def register_all_tools(
                )

        # Resolve real disk path first
-        orig_path = (await tsk.find_file(image_path, inode, partition_offset)).strip()
+        orig_path = (await tsk.find_file(_img(), inode, _off())).strip()
        if not orig_path or "not found" in orig_path.lower():
            return f"Error: inode {inode} not found on the disk image."

        # Derive local filename from real disk path
        filename = os.path.basename(orig_path)
+        extracted_dir = graph.extracted_dir
        local_path = os.path.join(extracted_dir, filename)

        # Handle name collisions by appending inode
@@ -219,12 +367,15 @@ def register_all_tools(
            filename = os.path.basename(local_path)

        # Extract
-        result = await tsk.extract_file(image_path, inode, local_path, partition_offset)
+        result = await tsk.extract_file(_img(), inode, local_path, _off())
        if result.startswith("[icat failed"):
            return result

        size = os.path.getsize(local_path) if os.path.exists(local_path) else 0
-        category = _auto_categorize(os.path.basename(orig_path))
+        src_type = (
+            graph.active_source.type if graph.active_source else "disk_image"
+        )
+        category = _auto_categorize(os.path.basename(orig_path), src_type)

        # Register
        if graph is not None:
@@ -275,7 +426,7 @@ def register_all_tools(
            },
            "required": ["inode"],
        },
-        executor=lambda inode: tsk.find_file(image_path, inode, partition_offset),
+        executor=lambda inode: tsk.find_file(_img(), inode, _off()),
        module="sleuthkit",
        tags=["filesystem"],
    )
@@ -290,7 +441,7 @@ def register_all_tools(
            },
            "required": ["pattern"],
        },
-        executor=lambda pattern: tsk.search_strings(image_path, pattern),
+        executor=lambda pattern: tsk.search_strings(_img(), pattern),
        module="sleuthkit",
        tags=["filesystem", "search", "strings"],
    )
@@ -299,7 +450,7 @@ def register_all_tools(
        name="count_deleted_files",
        description="List and count all deleted files. Shows total count, executables, and extension breakdown.",
        input_schema={"type": "object", "properties": {}},
-        executor=lambda: tsk.count_deleted_files(image_path, partition_offset),
+        executor=lambda: tsk.count_deleted_files(_img(), _off()),
        module="sleuthkit",
        tags=["filesystem", "deleted", "recovery"],
    )
@@ -308,7 +459,7 @@ def register_all_tools(
        name="build_filesystem_timeline",
        description="Build a MAC timeline from the filesystem (Modified/Accessed/Changed times for all files).",
        input_schema={"type": "object", "properties": {}},
-        executor=lambda: tsk.build_timeline(image_path, partition_offset),
+        executor=lambda: tsk.build_timeline(_img(), _off()),
        module="sleuthkit",
        tags=["filesystem", "timeline"],
    )
@@ -341,8 +492,7 @@ def register_all_tools(
            },
            "required": ["hive_path"],
        },
-        executor=_make_auto_record("list_installed_software", "registry",
-                                   lambda hive_path: reg.list_installed_software(hive_path), graph),
+        executor=lambda hive_path: reg.list_installed_software(hive_path),
        module="registry",
        tags=["registry", "software", "installed"],
    )
@@ -390,8 +540,7 @@ def register_all_tools(
            },
            "required": ["hive_path"],
        },
-        executor=_make_auto_record("get_system_info", "registry",
-                                   lambda hive_path: reg.get_system_info(hive_path), graph),
+        executor=lambda hive_path: reg.get_system_info(hive_path),
        module="registry",
        tags=["registry", "system"],
    )
@@ -406,8 +555,7 @@ def register_all_tools(
            },
            "required": ["hive_path"],
        },
-        executor=_make_auto_record("get_timezone_info", "registry",
-                                   lambda hive_path: reg.get_timezone_info(hive_path), graph),
+        executor=lambda hive_path: reg.get_timezone_info(hive_path),
        module="registry",
        tags=["registry", "timezone", "system"],
    )
@@ -422,8 +570,7 @@ def register_all_tools(
            },
            "required": ["hive_path"],
        },
-        executor=_make_auto_record("get_computer_name", "registry",
-                                   lambda hive_path: reg.get_computer_name(hive_path), graph),
+        executor=lambda hive_path: reg.get_computer_name(hive_path),
        module="registry",
        tags=["registry", "system", "hostname"],
    )
@@ -438,8 +585,7 @@ def register_all_tools(
            },
            "required": ["hive_path"],
        },
-        executor=_make_auto_record("get_shutdown_time", "registry",
-                                   lambda hive_path: reg.get_shutdown_time(hive_path), graph),
+        executor=lambda hive_path: reg.get_shutdown_time(hive_path),
        module="registry",
        tags=["registry", "system", "shutdown"],
    )
@@ -454,8 +600,7 @@ def register_all_tools(
            },
            "required": ["hive_path"],
        },
-        executor=_make_auto_record("enumerate_users", "registry",
-                                   lambda hive_path: reg.enumerate_users(hive_path), graph),
+        executor=lambda hive_path: reg.enumerate_users(hive_path),
        module="registry",
        tags=["registry", "user", "accounts", "sam"],
    )
@@ -470,8 +615,7 @@ def register_all_tools(
            },
            "required": ["hive_path"],
        },
-        executor=_make_auto_record("get_network_interfaces", "registry",
-                                   lambda hive_path: reg.get_network_interfaces(hive_path), graph),
+        executor=lambda hive_path: reg.get_network_interfaces(hive_path),
        module="registry",
        tags=["registry", "network", "adapter", "ip"],
    )
@@ -486,8 +630,7 @@ def register_all_tools(
            },
            "required": ["hive_path"],
        },
-        executor=_make_auto_record("get_email_config", "registry",
-                                   lambda hive_path: reg.get_email_config(hive_path), graph),
+        executor=lambda hive_path: reg.get_email_config(hive_path),
        module="registry",
        tags=["registry", "email", "account"],
    )
@@ -504,8 +647,7 @@ def register_all_tools(
            },
            "required": ["file_path"],
        },
-        executor=_make_auto_record("parse_prefetch", "filesystem",
-                                   lambda file_path: parsers.parse_prefetch(file_path), graph),
+        executor=lambda file_path: parsers.parse_prefetch(file_path),
        module="parsers",
        tags=["filesystem", "prefetch", "execution"],
    )
@@ -577,7 +719,13 @@ def register_all_tools(

    TOOL_CATALOG["list_extracted_dir"] = ToolDefinition(
        name="list_extracted_dir",
-        description="List files in an extracted directory with sizes.",
+        description=(
+            "Summarise an extracted directory tree: total counts, "
+            "extension breakdown, top-level layout, largest files. "
+            "Scales to trees of 10k+ files by summarising rather than listing every entry. "
+            "For targeted searches (find every *.plist, locate sms.db, ...) "
+            "use find_files instead."
+        ),
        input_schema={
            "type": "object",
            "properties": {
@@ -590,6 +738,31 @@ def register_all_tools(
        tags=["filesystem", "listing", "extracted"],
    )

+    TOOL_CATALOG["find_files"] = ToolDefinition(
+        name="find_files",
+        description=(
+            "Recursively find files under a directory by glob pattern. "
+            "Use this on tree-mode sources (iOS extractions, archives, "
+            "Android-mounted partitions) to locate specific artefacts in "
+            "huge trees. Patterns are fnmatch-style; '**' means 'any "
+            "depth'. Examples: '**/sms.db', '**/keychain-2.db', "
+            "'**/ChatStorage.sqlite', '**/*.plist', 'HomeDomain/Library/**'. "
+            "Results sort by size descending; capped at max_results."
+        ),
+        input_schema={
+            "type": "object",
+            "properties": {
+                "root": {"type": "string", "description": "Directory to search under."},
+                "pattern": {"type": "string", "description": "fnmatch glob pattern (use '**' for any depth)."},
+                "max_results": {"type": "integer", "description": "Result cap (default 500)."},
+            },
+            "required": ["root", "pattern"],
+        },
+        executor=lambda root, pattern, max_results=500: parsers.find_files(root, pattern, max_results),
+        module="parsers",
+        tags=["filesystem", "search", "extracted", "glob"],
+    )
+
    TOOL_CATALOG["parse_pcap_strings"] = ToolDefinition(
        name="parse_pcap_strings",
        description="Extract HTTP headers, hosts, User-Agent, cookies, and URLs from a PCAP/capture file.",
@@ -605,11 +778,224 @@ def register_all_tools(
        tags=["network", "pcap", "http", "capture"],
    )

-    # ---- Apply result caching to deterministic read-only tools ----
-    # Must come AFTER all tools are registered. Auto-record wrapped tools
-    # (e.g. get_system_info) are NOT in CACHEABLE_TOOLS since they write
-    # to the evidence graph as a side effect.
+    # ---- Archive tools (tree-mode prep) ----
+
+    TOOL_CATALOG["unzip_archive"] = ToolDefinition(
+        name="unzip_archive",
+        description=(
+            "Extract a .zip archive into a target directory. Defensive against "
+            "zip-slip; skips symlinks. Idempotent on rerun. Pass `password` for "
+            "password-protected zips — only the legacy ZipCrypto algorithm is "
+            "supported by stdlib (AES zips need an external `7z x` step)."
+        ),
+        input_schema={
+            "type": "object",
+            "properties": {
+                "zip_path": {"type": "string", "description": "Path to the .zip file."},
+                "dest_dir": {"type": "string", "description": "Directory to extract into (created if missing)."},
+                "password": {"type": "string", "description": "Password for encrypted zips (omit for plain archives)."},
+            },
+            "required": ["zip_path", "dest_dir"],
+        },
+        executor=lambda zip_path, dest_dir, password=None: arc.unzip_archive(zip_path, dest_dir, password),
+        module="archive",
+        tags=["archive", "zip", "extract", "ingest"],
+    )
+
+    # ---- iOS plugin tools (DESIGN.md §4.7) ----
+
+    TOOL_CATALOG["parse_plist"] = ToolDefinition(
+        name="parse_plist",
+        description=(
+            "Parse a .plist file (XML or binary) and return its contents as JSON. "
+            "Bytes are rendered as hex; dates as ISO-8601."
+        ),
+        input_schema={
+            "type": "object",
+            "properties": {
+                "file_path": {"type": "string", "description": "Path to .plist file."},
+            },
+            "required": ["file_path"],
+        },
+        executor=lambda file_path: ios.parse_plist(file_path),
+        module="mobile_ios",
+        tags=["ios", "plist", "parse"],
+    )
+
+    TOOL_CATALOG["sqlite_tables"] = ToolDefinition(
+        name="sqlite_tables",
+        description=(
+            "List user tables in a sqlite database with row counts and column "
+            "names. Use this to scout an unfamiliar .sqlite / .db file before "
+            "querying it."
+        ),
+        input_schema={
+            "type": "object",
+            "properties": {
+                "db_path": {"type": "string", "description": "Path to .sqlite/.db file."},
+            },
+            "required": ["db_path"],
+        },
+        executor=lambda db_path: ios.sqlite_tables(db_path),
+        module="mobile_ios",
+        tags=["sqlite", "schema", "ios", "android"],
+    )
+
+    TOOL_CATALOG["sqlite_query"] = ToolDefinition(
+        name="sqlite_query",
+        description=(
+            "Run a single read-only SELECT against a sqlite file. "
+            "Multi-statement queries and non-SELECT statements are rejected. "
+            "Use this for sms.db / ChatStorage.sqlite / AddressBook.sqlitedb / etc."
+        ),
+        input_schema={
+            "type": "object",
+            "properties": {
+                "db_path": {"type": "string", "description": "Path to .sqlite/.db file."},
+                "query": {"type": "string", "description": "A single SELECT statement."},
+                "max_rows": {"type": "integer", "description": "Row cap (default 100)."},
+            },
+            "required": ["db_path", "query"],
+        },
+        executor=lambda db_path, query, max_rows=100: ios.sqlite_query(db_path, query, max_rows),
+        module="mobile_ios",
+        tags=["sqlite", "query", "ios", "android"],
+    )
+
+    TOOL_CATALOG["parse_ios_keychain"] = ToolDefinition(
+        name="parse_ios_keychain",
+        description=(
+            "Locate and summarise iOS keychain entries (keychain-2.db). "
+            "Pass either the db file directly or the containing directory; "
+            "dumps account metadata from genp/inet/cert/keys tables."
+        ),
+        input_schema={
+            "type": "object",
+            "properties": {
+                "keychain_root": {
+                    "type": "string",
+                    "description": "Path to keychain-2.db or a directory that contains it.",
+                },
+            },
+            "required": ["keychain_root"],
+        },
+        executor=lambda keychain_root: ios.parse_ios_keychain(keychain_root),
+        module="mobile_ios",
+        tags=["ios", "keychain", "credentials"],
+    )
+
+    TOOL_CATALOG["read_idevice_info"] = ToolDefinition(
+        name="read_idevice_info",
+        description=(
+            "Read the iDevice_info.txt summary at the root of an iOS extraction. "
+            "Pass the file path or the extraction root directory."
+        ),
+        input_schema={
+            "type": "object",
+            "properties": {
+                "file_path": {"type": "string", "description": "Path to iDevice_info.txt or extraction root."},
+            },
+            "required": ["file_path"],
+        },
+        executor=lambda file_path: ios.read_idevice_info(file_path),
+        module="mobile_ios",
+        tags=["ios", "device", "metadata"],
+    )
+
+    # ---- Android plugin (DESIGN.md §4.7) ----
+
+    TOOL_CATALOG["probe_android_partitions"] = ToolDefinition(
+        name="probe_android_partitions",
+        description=(
+            "Survey every partition on an Android disk dump (mmls + per-"
+            "partition fsstat). Returns a markdown table with name, native "
+            "and 512-byte sector offsets, filesystem type, and a strategy "
+            "hint per partition. Use this BEFORE deciding which partitions "
+            "to dive into via set_active_partition + list_directory."
+        ),
+        input_schema={"type": "object", "properties": {}},
+        executor=lambda: android.probe_android_partitions(_img()),
+        module="mobile_android",
+        tags=["android", "partition", "survey"],
+    )
+
+    async def _set_active_partition(partition_offset: int) -> str:
+        src = getattr(graph, "active_source", None)
+        if src is None:
+            return "Error: no active evidence source."
+        old = src.partition_offset
+        new = int(partition_offset)
+        src.partition_offset = new
+        # Sync the legacy mirror field so older readers stay consistent.
+        graph.partition_offset = new
+        return (
+            f"Active partition offset: {old} → {new} (512-byte sectors). "
+            f"Subsequent list_directory / extract_file / search_strings "
+            f"calls now target this partition on {src.id} ({src.label})."
+        )
+
+    TOOL_CATALOG["set_active_partition"] = ToolDefinition(
+        name="set_active_partition",
+        description=(
+            "Switch the current partition offset (in 512-byte sectors) on "
+            "the active disk-image source. Use the values from "
+            "probe_android_partitions's '512-sector' column. NOT a "
+            "forensic read — purely repoints the TSK toolset. Mutates "
+            "shared state; call serially within one agent run."
+        ),
+        input_schema={
+            "type": "object",
+            "properties": {
+                "partition_offset": {
+                    "type": "integer",
+                    "description": "Partition start in 512-byte sectors.",
+                },
+            },
+            "required": ["partition_offset"],
+        },
+        executor=_set_active_partition,
+        module="mobile_android",
+        tags=["android", "partition", "navigation"],
+    )
+
+    # ---- Media plugin (DESIGN.md §4.7) ----
+
+    TOOL_CATALOG["ocr_image"] = ToolDefinition(
+        name="ocr_image",
+        description=(
+            "Extract text from an image via tesseract. The LLM backend has "
+            "no vision, so this is the only way to read JPEG/PNG evidence "
+            "(screenshots of chats, transactions, IDs). Default lang covers "
+            "English + Simplified & Traditional Chinese; override `lang` "
+            "if you know the artefact's language. Returns 'Error: OCR "
+            "runtime not available' with an install hint when tesseract "
+            "isn't on the host — record that absence as a negative "
+            "finding rather than guessing."
+        ),
+        input_schema={
+            "type": "object",
+            "properties": {
+                "file_path": {"type": "string", "description": "Path to image file."},
+                "lang": {"type": "string", "description": "Tesseract language code(s), e.g. 'eng' or 'eng+chi_sim'."},
+            },
+            "required": ["file_path"],
+        },
+        executor=lambda file_path, lang="eng+chi_sim+chi_tra": med.ocr_image(file_path, lang),
+        module="media",
+        tags=["media", "ocr", "image"],
+    )
+
+    # ---- Wrap every executor with invocation logging (+ cache + auto-record) ----
+    # Must run AFTER all tools are registered. Every tool call now produces
+    # a ToolInvocation entry on the graph (provenance for grounding), and
+    # returns the result prefixed with ``[invocation: inv-xxx]`` so the LLM
+    # can cite the call in add_phenomenon facts.
    _tool_result_cache.clear()
    for tool_name, td in TOOL_CATALOG.items():
-        if tool_name in CACHEABLE_TOOLS:
-            td.executor = _make_cached(tool_name, td.executor)
+        td.executor = _make_invocation_executor(
+            tool_name,
+            td.executor,
+            graph,
+            cacheable=(tool_name in CACHEABLE_TOOLS),
+            auto_record_category=AUTO_RECORD_TOOLS.get(tool_name),
+        )
--- a/tools/archive.py
+++ b/tools/archive.py
@@ -0,0 +1,156 @@
+"""Archive extraction tools — generic unzip for tree-mode evidence sources.
+
+Mobile extractions (iOS / Android backups), archive sources, and shared
+work products all arrive as .zip files. The forensic agents work on the
+unpacked tree; this module is the single entry point for safely turning
+an archive into a directory.
+
+Stdlib-only. No graph dependency.
+"""
+
+from __future__ import annotations
+
+import logging
+import os
+import zipfile
+from pathlib import Path
+
+logger = logging.getLogger(__name__)
+
+
+def _is_within(base: Path, target: Path) -> bool:
+    """True when *target* resolves to a path inside *base* — symlink-safe."""
+    try:
+        base_r = base.resolve()
+        target_r = target.resolve()
+    except OSError:
+        return False
+    try:
+        target_r.relative_to(base_r)
+    except ValueError:
+        return False
+    return True
+
+
+def _is_zip_encrypted(zf: zipfile.ZipFile) -> bool:
+    """True when any entry has the zip 'encrypted' flag bit set."""
+    return any(info.flag_bits & 0x1 for info in zf.infolist())
+
+
+def _do_extract(
+    zip_path: str,
+    dest_dir: str,
+    password: str | None = None,
+) -> str:
+    """Shared core for unzip_archive (async) and unzip_archive_sync.
+
+    Pure stdlib + filesystem I/O — no asyncio. Idempotent on rerun (files
+    whose target already exists at the matching size are skipped). Returns
+    a multi-line summary the agent can read directly.
+    """
+    zp = Path(zip_path)
+    if not zp.is_file():
+        return f"Error: {zip_path} is not a file."
+
+    dest = Path(dest_dir)
+    dest.mkdir(parents=True, exist_ok=True)
+
+    extracted = 0
+    skipped: list[str] = []
+    total_bytes = 0
+    pwd_bytes = password.encode("utf-8") if password else None
+
+    try:
+        with zipfile.ZipFile(zp, "r") as zf:
+            encrypted = _is_zip_encrypted(zf)
+            if encrypted and pwd_bytes is None:
+                return (
+                    f"Error: {zip_path} is password-protected. "
+                    f"Provide the password via case.yaml's "
+                    f"meta.password on this source, or pass `password=` "
+                    f"explicitly. Stdlib zipfile only supports the legacy "
+                    f"ZipCrypto algorithm — AES-encrypted zips (created by "
+                    f"7-Zip / WinZip) need an external tool like 7z."
+                )
+            for info in zf.infolist():
+                name = info.filename
+                # Block absolute paths and parent-escape attempts up front.
+                if name.startswith(("/", "\\")) or ".." in Path(name).parts:
+                    skipped.append(f"escape: {name}")
+                    continue
+                target = dest / name
+                if not _is_within(dest, target):
+                    skipped.append(f"escape: {name}")
+                    continue
+                # Symlink entries — skip rather than risk traversing out.
+                if ((info.external_attr >> 16) & 0o170000) == 0o120000:
+                    skipped.append(f"symlink: {name}")
+                    continue
+                if info.is_dir():
+                    target.mkdir(parents=True, exist_ok=True)
+                    continue
+                # Skip if already extracted with matching size (idempotent rerun).
+                if target.exists() and target.stat().st_size == info.file_size:
+                    continue
+                target.parent.mkdir(parents=True, exist_ok=True)
+                try:
+                    with zf.open(info, "r", pwd=pwd_bytes) as src, open(target, "wb") as out:
+                        while True:
+                            chunk = src.read(65536)
+                            if not chunk:
+                                break
+                            out.write(chunk)
+                except RuntimeError as e:
+                    # zipfile raises RuntimeError for bad-password / AES-encrypted.
+                    msg = str(e)
+                    if "Bad password" in msg or "password required" in msg:
+                        return (
+                            f"Error: bad or missing password for {zip_path}. "
+                            f"If the zip is AES-encrypted (7-Zip/WinZip), stdlib "
+                            f"cannot decrypt it — use `7z x -p<pwd> ...` "
+                            f"externally and point the source path at the result."
+                        )
+                    raise
+                extracted += 1
+                total_bytes += info.file_size
+    except zipfile.BadZipFile as e:
+        return f"Error: {zip_path} is not a valid zip archive: {e}"
+    except Exception as e:
+        return f"Error extracting {zip_path}: {e}"
+
+    parts = [
+        f"Extracted {extracted} file(s), {total_bytes} bytes, into {dest}",
+    ]
+    if skipped:
+        parts.append(f"Skipped {len(skipped)} unsafe entries:")
+        for s in skipped[:10]:
+            parts.append(f"  - {s}")
+        if len(skipped) > 10:
+            parts.append(f"  ... ({len(skipped) - 10} more)")
+    return "\n".join(parts)
+
+
+async def unzip_archive(
+    zip_path: str, dest_dir: str, password: str | None = None,
+) -> str:
+    """Extract *zip_path* into *dest_dir*. Idempotent on rerun.
+
+    Defensive: skips entries with an absolute path, a '..' component, or a
+    target that resolves outside *dest_dir* (the classic zip-slip vector).
+    Symlink entries are skipped too (never followed into the host filesystem).
+    Password-protected zips need the password argument (or
+    ``meta.password`` on the source in case.yaml) — stdlib ``zipfile``
+    only handles the legacy ZipCrypto algorithm.
+    """
+    return _do_extract(zip_path, dest_dir, password)
+
+
+def unzip_archive_sync(
+    zip_path: str, dest_dir: str, password: str | None = None,
+) -> str:
+    """Synchronous variant of :func:`unzip_archive` for startup-time prepare_source.
+
+    Same behaviour, just no async wrapping — used before the event loop
+    starts so we don't have to spin one up just to unpack a zip.
+    """
+    return _do_extract(zip_path, dest_dir, password)
--- a/tools/media.py
+++ b/tools/media.py
@@ -0,0 +1,87 @@
+"""Media plugin — OCR for image evidence.
+
+DESIGN.md §4.7: the model backend (DeepSeek) has no vision, so we MUST run
+OCR locally for any image-bearing evidence. Tesseract via pytesseract is
+the default; if the runtime is missing those packages, the tool returns a
+clear install hint rather than failing silently.
+"""
+
+from __future__ import annotations
+
+import logging
+import os
+from pathlib import Path
+
+logger = logging.getLogger(__name__)
+
+MAX_OUTPUT = 8000
+
+_INSTALL_HINT = (
+    "Error: OCR runtime not available. Install with:\n"
+    "  pip install pytesseract pillow\n"
+    "  sudo apt install tesseract-ocr tesseract-ocr-chi-sim tesseract-ocr-chi-tra\n"
+    "(or the equivalent for your distribution). Then retry."
+)
+
+
+def _has_ocr_runtime() -> tuple[bool, str]:
+    """Return (available, reason). reason is empty when available."""
+    try:
+        import pytesseract  # noqa: F401
+        from PIL import Image  # noqa: F401
+    except ImportError as e:
+        return False, f"missing python package: {e.name}"
+    # Check the tesseract binary too.
+    import shutil
+    if shutil.which("tesseract") is None:
+        return False, "tesseract binary not on PATH"
+    return True, ""
+
+
+async def ocr_image(file_path: str, lang: str = "eng+chi_sim+chi_tra") -> str:
+    """Extract text from an image via tesseract.
+
+    *lang* defaults to English + Simplified + Traditional Chinese, matching
+    the multi-language artefacts the current case involves. Pass a single
+    language code (e.g. ``"eng"``) to skip language packs that aren't
+    installed.
+    """
+    p = Path(file_path)
+    if not p.is_file():
+        return f"Error: {file_path} is not a file."
+    available, reason = _has_ocr_runtime()
+    if not available:
+        return f"{_INSTALL_HINT}\n[detail: {reason}]"
+
+    import pytesseract
+    from PIL import Image
+
+    try:
+        img = Image.open(p)
+    except Exception as e:
+        return f"Error: could not open image {file_path}: {e}"
+
+    try:
+        text = pytesseract.image_to_string(img, lang=lang)
+    except pytesseract.TesseractError as e:
+        msg = str(e)
+        if "Failed loading language" in msg or "Error opening data file" in msg:
+            return (
+                f"Error: tesseract is installed but missing language pack(s) for {lang!r}. "
+                f"Install the language data (e.g. tesseract-ocr-chi-sim) or pass a "
+                f"different `lang`. Detail: {msg}"
+            )
+        return f"Error running tesseract: {msg}"
+    except Exception as e:
+        return f"Error during OCR: {e}"
+
+    size = p.stat().st_size
+    header = (
+        f"ocr: {file_path} ({size} bytes, lang={lang}, "
+        f"{len(text.splitlines())} line(s))\n"
+    )
+    if len(text) > MAX_OUTPUT - len(header):
+        body = text[:MAX_OUTPUT - len(header)] + "\n[truncated]"
+    else:
+        body = text
+    return header + body
--- a/tools/mobile_android.py
+++ b/tools/mobile_android.py
@@ -0,0 +1,160 @@
+"""Android plugin tools — partition survey + sector translation.
+
+DESIGN.md §4.7 (Android): ``mmls`` partitions → per-partition image-mode source;
+``fsstat`` per partition to classify ext4/F2FS/raw/encrypted. The shared TSK
+toolchain already handles ext4/F2FS reads, so once the agent picks a partition
+offset the standard list_directory / extract_file / search_strings tools work.
+
+Quirk: Samsung dumps (e.g. ``blk0_sda.bin``) use 4096-byte image sectors but
+TSK tool flags accept 512-byte sectors by default. ``probe_android_partitions``
+emits BOTH unit systems so the agent can plug the right ``partition_offset``
+value into ``set_active_partition``.
+"""
+
+from __future__ import annotations
+
+import asyncio
+import logging
+import re
+from pathlib import Path
+
+logger = logging.getLogger(__name__)
+
+MAX_OUTPUT = 8000
+
+# Partitions worth flagging when we encounter them — informs the agent's
+# strategy. Not exhaustive; just opinionated hints.
+_PARTITION_HINTS: dict[str, str] = {
+    "EFS":      "modem firmware area; often contains IMEI / MAC / serial",
+    "PARAM":    "boot parameters; cmdline + flags",
+    "BOOT":     "kernel + initramfs (raw image)",
+    "RECOVERY": "recovery image (raw)",
+    "SYSTEM":   "Android /system — read-only OS partition (ext4)",
+    "CACHE":    "downloaded OTA payloads; usually transient",
+    "USERDATA": "/data — user apps, dbs, accounts; FBE-encrypted on modern devices",
+    "PERSISTENT": "Samsung persistent partition; carrier/device flags",
+    "STEADY":   "Samsung steady-state config",
+    "HIDDEN":   "Samsung hidden partition; check before assuming empty",
+    "CP_DEBUG": "modem debug logs",
+    "TOMBSTONES": "userland crash dumps",
+}
+
+
+def _parse_mmls_with_unit(output: str) -> tuple[int, list[dict]]:
+    """Parse mmls output, returning (sector_size_bytes, partitions).
+
+    mmls states ``Units are in N-byte sectors`` near the top; we extract N
+    to translate between image-native units and the 512-byte units TSK
+    tools accept via ``-o``.
+    """
+    sector_size = 512
+    m = re.search(r"Units are in (\d+)-byte sectors", output)
+    if m:
+        sector_size = int(m.group(1))
+
+    parts: list[dict] = []
+    for line in output.splitlines():
+        m = re.match(
+            r"\s*(\d{3}):\s+(\S+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(.*)",
+            line,
+        )
+        if not m:
+            continue
+        _row, slot, start, end, length, desc = m.groups()
+        if slot == "Meta" or slot.startswith("---"):
+            continue
+        parts.append({
+            "slot": slot,
+            "start_native": int(start),
+            "end_native": int(end),
+            "length_native": int(length),
+            "description": desc.strip(),
+        })
+    return sector_size, parts
+
+
+async def _run(cmd: list[str], timeout: int = 30) -> tuple[int, str, str]:
+    proc = await asyncio.create_subprocess_exec(
+        *cmd,
+        stdout=asyncio.subprocess.PIPE,
+        stderr=asyncio.subprocess.PIPE,
+    )
+    try:
+        stdout, stderr = await asyncio.wait_for(proc.communicate(), timeout=timeout)
+    except asyncio.TimeoutError:
+        proc.kill()
+        return 124, "", f"timeout after {timeout}s"
+    return proc.returncode or 0, stdout.decode("utf-8", "replace"), stderr.decode("utf-8", "replace")
+
+
+_FS_TYPE_RE = re.compile(r"File System Type:\s*(\S+)", re.IGNORECASE)
+
+
+async def _classify_partition(image_path: str, sector_offset_512: int) -> str:
+    """Run fsstat on a partition; return its type (e.g. 'Ext4', 'FAT') or 'unknown'.
+
+    A non-zero fsstat exit (typically "Cannot determine file system type")
+    and unparseable output both map to 'unknown'. That usually means a raw
+    image (BOOT/RECOVERY/RADIO/…) or encrypted data (modern userdata under FBE).
+    """
+    rc, out, _err = await _run(["fsstat", "-o", str(sector_offset_512), image_path], timeout=15)
+    if rc != 0:
+        return "unknown"
+    m = _FS_TYPE_RE.search(out)
+    if m:
+        return m.group(1)
+    return "unknown"
+
+
+async def probe_android_partitions(image_path: str) -> str:
+    """Survey every partition on an Android disk dump and return a table.
+
+    The agent reads this once to plan its work: which partitions are
+    Ext4/F2FS (use TSK), which are raw (extract image / strings only),
+    which are encrypted (skip until decrypted).
+    """
+    p = Path(image_path)
+    if not p.is_file():
+        return f"Error: {image_path} is not a file."
+
+    rc, out, err = await _run(["mmls", str(p)], timeout=30)
+    if rc != 0:
+        return f"Error: mmls failed (rc={rc}): {err.strip() or out.strip()}"
+
+    sector_size, parts = _parse_mmls_with_unit(out)
+    if not parts:
+        return f"No partitions detected in {image_path}."
+
+    lines = [
+        f"Android partition survey: {image_path}",
+        f"  mmls reports {sector_size}-byte sectors (TSK -o expects 512-byte sectors)",
+        f"  {len(parts)} data partitions",
+        "",
+        "| slot | name | start (native) | start (512-sector) | size | fs_type | hint |",
+        "|---|---|---:|---:|---|---|---|",
+    ]
+    for prt in parts:
+        sector_512 = prt["start_native"] * sector_size // 512
+        bytes_size = prt["length_native"] * sector_size
+        # human-readable size
+        if bytes_size >= 1 << 30:
+            size_h = f"{bytes_size / (1 << 30):.1f} GB"
+        elif bytes_size >= 1 << 20:
+            size_h = f"{bytes_size / (1 << 20):.1f} MB"
+        else:
+            size_h = f"{bytes_size // 1024} KB"
+        fs_type = await _classify_partition(str(p), sector_512)
+        # Try to extract a friendly partition name from the description
+        # (mmls description often includes the partition name uppercase).
+        name_match = re.search(r"[A-Z][A-Z0-9_]{2,}", prt["description"])
+        pname = name_match.group(0) if name_match else prt["description"][:20]
+        hint = _PARTITION_HINTS.get(pname, "")
+        lines.append(
+            f"| {prt['slot']} | {pname} | {prt['start_native']} | "
+            f"{sector_512} | {size_h} | {fs_type} | {hint} |"
+        )
+
+    body = "\n".join(lines)
+    if len(body) > MAX_OUTPUT:
+        body = body[:MAX_OUTPUT] + "\n\n[truncated]"
+    return body
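
Review note: the native-to-512 conversion in `probe_android_partitions` can be checked in isolation. The sketch below is a minimal illustration, not code from this commit. The helper names `to_512_sectors` and `human_size` are hypothetical, and it follows the module's own premise that offsets passed via `-o` are counted in 512-byte sectors.

```python
def to_512_sectors(start_native: int, sector_size: int) -> int:
    """Convert a start offset from mmls-native sectors to 512-byte sectors."""
    return start_native * sector_size // 512


def human_size(length_native: int, sector_size: int) -> str:
    """Render a partition length the same way the survey table does."""
    size = length_native * sector_size
    if size >= 1 << 30:
        return f"{size / (1 << 30):.1f} GB"
    if size >= 1 << 20:
        return f"{size / (1 << 20):.1f} MB"
    return f"{size // 1024} KB"


# Hypothetical partition on a 4096-byte-sector image.
print(to_512_sectors(256, 4096))   # 256 * 4096 / 512 = 2048
print(human_size(262144, 4096))    # 262144 * 4096 bytes = 1 GiB
print(to_512_sectors(2048, 512))   # already in 512-byte units
```

On a 512-byte-sector image the conversion is the identity, so the two start columns in the survey table only diverge for images with larger native sectors.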
--- a/tools/mobile_ios.py
+++ b/tools/mobile_ios.py
@@ -0,0 +1,274 @@
+"""iOS extraction parsers — plist / sqlite / keychain / iDevice info.
+
+DESIGN.md §4.7 iOS plugin tools. All tree-mode, path-based — no Sleuth
+Kit, no graph dependency. Stdlib + sqlite3 only.
+
+iOS extractions typically arrive as a zip containing domain-rooted trees
+(HomeDomain, AppDomain, etc.) with a flat ``iDevice_info.txt`` summary,
+binary/XML plists, and several SQLite databases (sms.db, AddressBook,
+keychain-2.db, app-specific stores like WhatsApp's ChatStorage.sqlite).
+"""
+
+from __future__ import annotations
+
+import asyncio
+import json
+import logging
+import os
+import plistlib
+import re
+import sqlite3
+from pathlib import Path
+
+logger = logging.getLogger(__name__)
+
+# Output cap (chars) — keeps a single tool result under the LLM context budget.
+MAX_OUTPUT = 8000
+
+
+def _trunc(text: str, limit: int = MAX_OUTPUT) -> str:
+    if len(text) <= limit:
+        return text
+    return text[:limit] + f"\n\n[Output truncated: {len(text)} chars total]"
+
+
+# ---------------------------------------------------------------------------
+# plist
+# ---------------------------------------------------------------------------
+
+def _to_jsonable(obj):
+    """Make plist values JSON-serializable: bytes → hex preview, dates → iso."""
+    import datetime
+    if isinstance(obj, bytes):
+        if len(obj) <= 64:
+            return {"_bytes_hex": obj.hex()}
+        return {"_bytes_hex_preview": obj[:64].hex(), "_total_bytes": len(obj)}
+    if isinstance(obj, datetime.datetime):
+        return obj.isoformat()
+    if isinstance(obj, dict):
+        return {str(k): _to_jsonable(v) for k, v in obj.items()}
+    if isinstance(obj, (list, tuple)):
+        return [_to_jsonable(v) for v in obj]
+    return obj
+
+
+async def parse_plist(file_path: str) -> str:
+    """Parse a .plist file (XML or binary) and return its contents as JSON.
+
+    Both formats are handled transparently by ``plistlib.load``.
+    """
+    p = Path(file_path)
+    if not p.is_file():
+        return f"Error: {file_path} is not a file."
+    try:
+        with open(p, "rb") as f:
+            data = plistlib.load(f)
+    except plistlib.InvalidFileException as e:
+        return f"Error: {file_path} is not a valid plist ({e})"
+    except Exception as e:
+        return f"Error parsing plist {file_path}: {e}"
+
+    serial = _to_jsonable(data)
+    rendered = json.dumps(serial, ensure_ascii=False, indent=2, default=str)
+    header = f"plist: {file_path} ({p.stat().st_size} bytes)\n"
+    return header + _trunc(rendered)
+
+
+# ---------------------------------------------------------------------------
+# sqlite
+# ---------------------------------------------------------------------------
+
+_SELECT_RE = re.compile(r"^\s*SELECT\b", re.IGNORECASE)
+
+
+async def sqlite_tables(db_path: str) -> str:
+    """List user tables in a sqlite file with row counts and column names."""
+    p = Path(db_path)
+    if not p.is_file():
+        return f"Error: {db_path} is not a file."
+    try:
+        conn = sqlite3.connect(p.resolve().as_uri() + "?mode=ro", uri=True)
+    except sqlite3.OperationalError as e:
+        return f"Error opening {db_path} (read-only): {e}"
+    try:
+        cur = conn.cursor()
+        cur.execute(
+            "SELECT name FROM sqlite_master "
+            "WHERE type='table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
+        )
+        tables = [r[0] for r in cur.fetchall()]
+        if not tables:
+            return f"No user tables in {db_path}."
+        lines = [f"sqlite: {db_path} ({len(tables)} tables)"]
+        for name in tables:
+            try:
+                cur.execute(f"SELECT COUNT(*) FROM \"{name}\"")
+                count = cur.fetchone()[0]
+            except sqlite3.DatabaseError as e:
+                count = f"(count failed: {e})"
+            try:
+                cur.execute(f"PRAGMA table_info(\"{name}\")")
+                cols = [r[1] for r in cur.fetchall()]
+            except sqlite3.DatabaseError:
+                cols = []
+            lines.append(f"  {name}: {count} row(s); cols: {', '.join(cols)}")
+        return _trunc("\n".join(lines))
+    finally:
+        conn.close()
+
+
+async def sqlite_query(
+    db_path: str,
+    query: str,
+    max_rows: int = 100,
+) -> str:
+    """Run a single read-only SELECT against a sqlite file.
+
+    Multi-statement queries and anything other than a SELECT are rejected
+    (we open the database in read-only mode anyway, so writes would fail
+    too — but the explicit check keeps the agent honest).
+    """
+    if not _SELECT_RE.match(query):
+        return "Error: only single SELECT statements are allowed."
+    if ";" in query.strip().rstrip(";"):
+        return "Error: multi-statement queries are not allowed."
+
+    p = Path(db_path)
+    if not p.is_file():
+        return f"Error: {db_path} is not a file."
+    try:
+        conn = sqlite3.connect(p.resolve().as_uri() + "?mode=ro", uri=True)
+    except sqlite3.OperationalError as e:
+        return f"Error opening {db_path} (read-only): {e}"
+
+    try:
+        cur = conn.cursor()
+        try:
+            cur.execute(query)
+        except sqlite3.DatabaseError as e:
+            return f"Error executing query: {e}"
+        cols = [d[0] for d in cur.description] if cur.description else []
+        rows = cur.fetchmany(max(1, int(max_rows)))
+        lines = [
+            f"sqlite query: {db_path}",
+            f"columns: {cols}",
+            f"rows ({len(rows)}, capped at {max_rows}):",
+        ]
+        for row in rows:
+            rendered = [
+                (v.hex() if isinstance(v, bytes) else str(v))
+                for v in row
+            ]
+            lines.append("  " + " | ".join(rendered))
+        return _trunc("\n".join(lines))
+    finally:
+        conn.close()
+
+
+# ---------------------------------------------------------------------------
+# iOS keychain (keychain-2.db)
+# ---------------------------------------------------------------------------
+
+# Standard iOS keychain tables. genp = generic passwords, inet = internet
+# passwords, cert = certificates, keys = key material. Forensic extractions
+# of locked keychains have ``data`` columns NULL but accounting metadata
+# (agrp, acct, svce) intact — already useful for attribution work.
+_KEYCHAIN_TABLES = ("genp", "inet", "cert", "keys")
+
+
+async def parse_ios_keychain(keychain_root: str) -> str:
+    """Locate and summarize iOS keychain entries under *keychain_root*.
+
+    *keychain_root* may be a path to ``keychain-2.db`` directly or to a
+    directory that contains it (e.g. ``.../var/keychains``).
+    """
+    root = Path(keychain_root)
+    db: Path | None = None
+    if root.is_file() and root.name == "keychain-2.db":
+        db = root
+    elif root.is_dir():
+        candidate = root / "keychain-2.db"
+        if candidate.is_file():
+            db = candidate
+        else:
+            # Fall back to a full recursive search; take the first match.
+            for found in root.rglob("keychain-2.db"):
+                db = found
+                break
+    if db is None:
+        return f"No keychain-2.db found under {keychain_root}."
+
+    try:
+        conn = sqlite3.connect(db.resolve().as_uri() + "?mode=ro", uri=True)
+    except sqlite3.OperationalError as e:
+        return f"Error opening {db}: {e}"
+
+    try:
+        cur = conn.cursor()
+        cur.execute(
+            "SELECT name FROM sqlite_master "
+            "WHERE type='table' AND name IN ({})".format(
+                ",".join("?" * len(_KEYCHAIN_TABLES))
+            ),
+            _KEYCHAIN_TABLES,
+        )
+        present = [r[0] for r in cur.fetchall()]
+        if not present:
+            return f"keychain-2.db at {db} has no recognised tables."
+
+        lines = [f"keychain: {db}"]
+        for name in present:
+            cur.execute(f"SELECT COUNT(*) FROM \"{name}\"")
+            count = cur.fetchone()[0]
+            lines.append(f"\n[{name}] {count} row(s)")
+            cur.execute(f"PRAGMA table_info(\"{name}\")")
+            cols = [r[1] for r in cur.fetchall()]
+            # Pick a useful subset of accounting columns when present.
+            preferred = [
+                c for c in ("agrp", "acct", "svce", "labl", "desc", "atyp", "srvr")
+                if c in cols
+            ]
+            if not preferred:
+                preferred = cols[:5]
+            sel = ", ".join(f'"{c}"' for c in preferred)
+            cur.execute(f"SELECT {sel} FROM \"{name}\" LIMIT 30")
+            for row in cur.fetchall():
+                lines.append("  " + " | ".join(
+                    (v.hex() if isinstance(v, bytes) else str(v))
+                    for v in row
+                ))
+        return _trunc("\n".join(lines))
+    finally:
+        conn.close()
+
+
+# ---------------------------------------------------------------------------
+# iDevice_info.txt
+# ---------------------------------------------------------------------------
+
+async def read_idevice_info(file_path: str, max_chars: int = 6000) -> str:
+    """Read the standard iDevice_info.txt summary at the root of an iOS extraction.
+
+    The file is a flat ``Key: value`` dump from libimobiledevice / native
+    extraction tools. We surface the first *max_chars* of content verbatim
+    — the agent can search/extract specific keys via search_text_file if
+    the head isn't enough.
+    """
+    p = Path(file_path)
+    if p.is_dir():
+        # Be helpful: if the agent passed the extraction root, find the file.
+        candidate = p / "iDevice_info.txt"
+        if candidate.is_file():
+            p = candidate
+    if not p.is_file():
+        return f"Error: {file_path} is not a file."
+    try:
+        with open(p, "r", encoding="utf-8", errors="replace") as f:
+            content = f.read(max_chars)
+        size = p.stat().st_size
+        header = f"iDevice_info: {p} ({size} bytes)\n"
+        if size > max_chars:
+            content += f"\n\n[Truncated: file is {size} bytes, showing first {max_chars}]"
+        return header + content
+    except Exception as e:
+        return f"Error reading {file_path}: {e}"
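
Review note: two details of the read-only SQLite access are easy to get wrong. A path containing `#`, `?` or spaces has to be percent-encoded before it goes into a `file:` URI, and a query with a trailing `;` followed by whitespace should not be mistaken for a multi-statement query. The sketch below is a hedged illustration of both. The names `open_readonly`, `is_single_select` and `demo`, and the file name `chat #1.db`, are hypothetical and not part of this commit.

```python
import re
import sqlite3
import tempfile
from pathlib import Path


def open_readonly(db_path: Path) -> sqlite3.Connection:
    """Open *db_path* read-only; as_uri() percent-encodes '#', '?' and spaces."""
    uri = db_path.resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def is_single_select(query: str) -> bool:
    """Accept one SELECT, tolerating surrounding whitespace and a trailing ';'."""
    body = query.strip().rstrip(";")
    return bool(re.match(r"SELECT\b", body, re.IGNORECASE)) and ";" not in body


def demo() -> tuple[list[tuple], bool]:
    """Create a throwaway database, then read it through a read-only handle."""
    with tempfile.TemporaryDirectory() as tmp:
        db = Path(tmp) / "chat #1.db"  # hypothetical awkward file name
        rw = sqlite3.connect(db)
        rw.execute("CREATE TABLE message (id INTEGER, text TEXT)")
        rw.execute("INSERT INTO message VALUES (1, 'hello')")
        rw.commit()
        rw.close()

        ro = open_readonly(db)
        try:
            rows = ro.execute("SELECT id, text FROM message").fetchall()
            try:
                ro.execute("DELETE FROM message")
                write_blocked = False
            except sqlite3.OperationalError:
                # SQLite refuses writes on a mode=ro connection.
                write_blocked = True
        finally:
            ro.close()
    return rows, write_blocked
```

The string-level guard is deliberately simple: a `;` inside a string literal is still rejected, which errs on the side of refusing a query rather than running something unexpected.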
--- a/tools/parsers.py
+++ b/tools/parsers.py
@@ -215,20 +215,178 @@ async def parse_prefetch(file_path: str) -> str:
        return f"[Error parsing Prefetch: {e}]"


-async def list_extracted_dir(dir_path: str) -> str:
-    """List files in an extracted directory."""
+async def list_extracted_dir(dir_path: str, max_entries: int = 200) -> str:
+    """Smart summary of a (potentially huge) extracted tree.
+
+    Earlier versions dumped up to 200 arbitrary entries and then truncated,
+    which left the agent blind on iOS extractions with 10k+ files. This
+    version returns a compact summary that scales: total counts, extension
+    breakdown, top-level directories with their sizes, and the largest
+    files. For targeted lookups (e.g. every ``*.sqlite`` under the tree)
+    use ``find_files`` instead. *max_entries* is currently unused.
+    """
+    if not os.path.isdir(dir_path):
+        return f"[Error: {dir_path} is not a directory]"
+
    try:
-        entries = []
-        for root, dirs, files in os.walk(dir_path):
+        total_files = 0
+        total_bytes = 0
+        ext_counts: dict[str, int] = {}
+        ext_bytes: dict[str, int] = {}
+        top_level_dirs: dict[str, dict] = {}
+        biggest: list[tuple[int, str]] = []   # (size, relpath)
+
+        dir_path_abs = os.path.abspath(dir_path)
+        for root, dirs, files in os.walk(dir_path_abs):
+            # Track top-level directory aggregates (cheap; no per-entry cost
+            # beyond the walk we're already doing).
+            rel_root = os.path.relpath(root, dir_path_abs)
+            if rel_root == ".":
+                top_dirs = {d: {"files": 0, "bytes": 0} for d in dirs}
+                top_level_dirs.update(top_dirs)
+                top_key = None
+            else:
+                top_key = rel_root.split(os.sep, 1)[0]
+                if top_key not in top_level_dirs:
+                    top_level_dirs[top_key] = {"files": 0, "bytes": 0}
+
            for f in files:
                full = os.path.join(root, f)
-                rel = os.path.relpath(full, dir_path)
-                size = os.path.getsize(full)
-                entries.append(f"  {rel} ({size} bytes)")
-            if len(entries) > 200:
-                entries.append(f"  ... (truncated)")
-                break
+                try:
+                    size = os.path.getsize(full)
+                except OSError:
+                    continue
+                total_files += 1
+                total_bytes += size
+                ext = os.path.splitext(f)[1].lower() or "(no ext)"
+                ext_counts[ext] = ext_counts.get(ext, 0) + 1
+                ext_bytes[ext] = ext_bytes.get(ext, 0) + size
+                if top_key is not None:
+                    top_level_dirs[top_key]["files"] += 1
+                    top_level_dirs[top_key]["bytes"] += size
+                # Maintain a top-10 largest list cheaply (bounded insertion).
+                if len(biggest) < 10:
+                    biggest.append((size, os.path.relpath(full, dir_path_abs)))
+                    biggest.sort(reverse=True)
+                elif size > biggest[-1][0]:
+                    biggest[-1] = (size, os.path.relpath(full, dir_path_abs))
+                    biggest.sort(reverse=True)

-        return f"Directory: {dir_path}\nFiles ({len(entries)}):\n" + "\n".join(entries)
+        def _human(n: int) -> str:
+            for unit in ("B", "KB", "MB", "GB"):
+                if n < 1024:
+                    return f"{n:.1f}{unit}" if unit != "B" else f"{n}B"
+                n /= 1024
+            return f"{n:.1f}TB"
+
+        lines = [
+            f"Directory: {dir_path}",
+            f"  Total: {total_files} file(s), {_human(total_bytes)}",
+        ]
+
+        # Top-level directory layout (immediate children, sorted by file count).
+        if top_level_dirs:
+            lines.append(f"\nTop-level layout ({len(top_level_dirs)} dirs at root):")
+            sorted_tlds = sorted(
+                top_level_dirs.items(), key=lambda kv: -kv[1]["files"],
+            )[:15]
+            for d, stats in sorted_tlds:
+                lines.append(
+                    f"  {d}/  ({stats['files']} files, {_human(stats['bytes'])})"
+                )
+            if len(top_level_dirs) > 15:
+                lines.append(f"  ... ({len(top_level_dirs) - 15} more top-level dirs)")
+
+        # Extension breakdown.
+        if ext_counts:
+            lines.append("\nExtension breakdown (top 15):")
+            for ext, count in sorted(ext_counts.items(), key=lambda kv: -kv[1])[:15]:
+                lines.append(
+                    f"  {ext}: {count} files, {_human(ext_bytes.get(ext, 0))}"
+                )
+
+        # Largest files (often the highest-value forensic targets).
+        if biggest:
+            lines.append("\nLargest files:")
+            for size, rel in biggest:
+                lines.append(f"  {rel} ({_human(size)})")
+
+        lines.append(
+            f"\nNext step: call find_files with a pattern like "
+            f"'**/*.plist' or '**/keychain-2.db' to locate specific artefacts."
+        )
+
+        return "\n".join(lines)
    except Exception as e:
        return f"[Error listing {dir_path}: {e}]"
+
+
+async def find_files(
+    root: str,
+    pattern: str,
+    max_results: int = 500,
+) -> str:
+    """Recursively find files under *root* whose path matches *pattern*.
+
+    Uses fnmatch-style globs against the full relative path or file name;
+    ``**`` becomes ``*``, which in fnmatch also crosses ``/``. ``**/*.plist``
+    thus matches at any depth below *root*, but not in *root* itself. Examples:
+
+      - ``**/sms.db``: iOS SMS database
+      - ``**/keychain-2.db``: iOS keychain
+      - ``**/ChatStorage.sqlite``: WhatsApp app store
+      - ``HomeDomain/Library/**``: anchor at a known iOS domain root
+      - ``**/*.plist``: one extension per call; ``{a,b}`` braces are not expanded
+
+    Results are sorted by size descending, since the biggest hits usually
+    matter most. Capped at *max_results* to keep the LLM context bounded.
+    """
+    import fnmatch
+
+    if not os.path.isdir(root):
+        return f"[Error: {root} is not a directory]"
+
+    root_abs = os.path.abspath(root)
+    # Convert ``**`` (any-depth) to fnmatch's ``*`` (any chars including /).
+    # fnmatch doesn't natively distinguish segment vs path; expanding ``**``
+    # to ``*`` and letting fnmatch match the full relpath is good enough for
+    # forensic lookups.
+    fn_pattern = pattern.replace("**", "*")
+
+    hits: list[tuple[int, str]] = []
+    truncated = False
+    try:
+        for dirpath, _dirs, files in os.walk(root_abs):
+            for f in files:
+                full = os.path.join(dirpath, f)
+                rel = os.path.relpath(full, root_abs)
+                if fnmatch.fnmatch(rel, fn_pattern) or fnmatch.fnmatch(f, fn_pattern):
+                    try:
+                        size = os.path.getsize(full)
+                    except OSError:
+                        size = 0
+                    hits.append((size, rel))
+                    if len(hits) >= max_results * 4:
+                        # Hard upper bound to keep the walk cheap on huge trees.
+                        truncated = True
+                        break
+            if truncated:
+                break
+    except Exception as e:
+        return f"[Error searching {root}: {e}]"
+
+    hits.sort(reverse=True)
+    if len(hits) > max_results:
+        truncated = True
+        hits = hits[:max_results]
+
+    lines = [
+        f"find_files: pattern={pattern!r} under {root}",
+        f"  matches: {len(hits)}" + (" (truncated)" if truncated else ""),
+    ]
+    if not hits:
+        lines.append("  (no matches)")
+    else:
+        for size, rel in hits:
+            lines.append(f"  {rel} ({size} bytes)")
+    return "\n".join(lines)
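
Review note: the matching rule in `find_files` has a few non-obvious consequences that a small standalone check makes visible. The sketch below mirrors the `**` to `*` rewrite and the relpath-or-basename test. The helper name `matches` is hypothetical, and it uses `fnmatch.fnmatchcase` instead of `fnmatch.fnmatch` so the results do not depend on the platform's case normalisation.

```python
import fnmatch


def matches(rel_path: str, pattern: str) -> bool:
    """Mirror find_files: rewrite '**' to '*', test relpath, then basename."""
    fn_pattern = pattern.replace("**", "*")
    name = rel_path.rsplit("/", 1)[-1]
    return (
        fnmatch.fnmatchcase(rel_path, fn_pattern)
        or fnmatch.fnmatchcase(name, fn_pattern)
    )


# fnmatch's '*' also matches '/', so one '*' spans several directories.
print(matches("HomeDomain/Library/SMS/sms.db", "**/sms.db"))
# A file directly under the root has no '/' for '*/sms.db' to match.
print(matches("sms.db", "**/sms.db"))
# Braces are literal characters to fnmatch, not alternation.
print(matches("a/b/Info.plist", "**/*.{plist,db}"))
# A bare '*.plist' matches at every depth, including the root.
print(matches("a/b/Info.plist", "*.plist"))
```

Under these rules a caller looking for a root-level file should pass the bare name (for example `sms.db`), and a multi-extension search needs one call per extension.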
--- a/uv.lock
+++ b/uv.lock
@@ -170,6 +170,8 @@ source = { virtual = "." }
 dependencies = [
    { name = "httpx", extra = ["socks"] },
    { name = "openai" },
+    { name = "pillow" },
+    { name = "pytesseract" },
    { name = "pyyaml" },
    { name = "regipy" },
 ]
@@ -184,6 +186,8 @@ dev = [
 requires-dist = [
    { name = "httpx", extras = ["socks"], specifier = ">=0.28.1" },
    { name = "openai", specifier = ">=2.36.0" },
+    { name = "pillow", specifier = ">=12.2.0" },
+    { name = "pytesseract", specifier = ">=0.3.13" },
    { name = "pyyaml" },
    { name = "regipy", specifier = ">=6.2.1" },
 ]
@@ -222,6 +226,39 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/b7/b9/c538f279a4e237a006a2c98387d081e9eb060d203d8ed34467cc0f0b9b53/packaging-26.0-py3-none-any.whl", hash = "sha256:b36f1fef9334a5588b4166f8bcd26a14e521f2b55e6b9de3aaa80d3ff7a37529", size = 74366, upload-time = "2026-01-21T20:50:37.788Z" },
 ]

+[[package]]
+name = "pillow"
+version = "12.2.0"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/8c/21/c2bcdd5906101a30244eaffc1b6e6ce71a31bd0742a01eb89e660ebfac2d/pillow-12.2.0.tar.gz", hash = "sha256:a830b1a40919539d07806aa58e1b114df53ddd43213d9c8b75847eee6c0182b5", size = 46987819, upload-time = "2026-04-01T14:46:17.687Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/bf/98/4595daa2365416a86cb0d495248a393dfc84e96d62ad080c8546256cb9c0/pillow-12.2.0-cp314-cp314-ios_13_0_arm64_iphoneos.whl", hash = "sha256:3adc9215e8be0448ed6e814966ecf3d9952f0ea40eb14e89a102b87f450660d8", size = 4100848, upload-time = "2026-04-01T14:44:48.48Z" },
+    { url = "https://files.pythonhosted.org/packages/0b/79/40184d464cf89f6663e18dfcf7ca21aae2491fff1a16127681bf1fa9b8cf/pillow-12.2.0-cp314-cp314-ios_13_0_arm64_iphonesimulator.whl", hash = "sha256:6a9adfc6d24b10f89588096364cc726174118c62130c817c2837c60cf08a392b", size = 4176515, upload-time = "2026-04-01T14:44:51.353Z" },
+    { url = "https://files.pythonhosted.org/packages/b0/63/703f86fd4c422a9cf722833670f4f71418fb116b2853ff7da722ea43f184/pillow-12.2.0-cp314-cp314-ios_13_0_x86_64_iphonesimulator.whl", hash = "sha256:6a6e67ea2e6feda684ed370f9a1c52e7a243631c025ba42149a2cc5934dec295", size = 3640159, upload-time = "2026-04-01T14:44:53.588Z" },
+    { url = "https://files.pythonhosted.org/packages/71/e0/fb22f797187d0be2270f83500aab851536101b254bfa1eae10795709d283/pillow-12.2.0-cp314-cp314-macosx_10_15_x86_64.whl", hash = "sha256:2bb4a8d594eacdfc59d9e5ad972aa8afdd48d584ffd5f13a937a664c3e7db0ed", size = 5312185, upload-time = "2026-04-01T14:44:56.039Z" },
+    { url = "https://files.pythonhosted.org/packages/ba/8c/1a9e46228571de18f8e28f16fabdfc20212a5d019f3e3303452b3f0a580d/pillow-12.2.0-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:80b2da48193b2f33ed0c32c38140f9d3186583ce7d516526d462645fd98660ae", size = 4695386, upload-time = "2026-04-01T14:44:58.663Z" },
+    { url = "https://files.pythonhosted.org/packages/70/62/98f6b7f0c88b9addd0e87c217ded307b36be024d4ff8869a812b241d1345/pillow-12.2.0-cp314-cp314-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:22db17c68434de69d8ecfc2fe821569195c0c373b25cccb9cbdacf2c6e53c601", size = 6280384, upload-time = "2026-04-01T14:45:01.5Z" },
+    { url = "https://files.pythonhosted.org/packages/5e/03/688747d2e91cfbe0e64f316cd2e8005698f76ada3130d0194664174fa5de/pillow-12.2.0-cp314-cp314-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:7b14cc0106cd9aecda615dd6903840a058b4700fcb817687d0ee4fc8b6e389be", size = 8091599, upload-time = "2026-04-01T14:45:04.5Z" },
+    { url = "https://files.pythonhosted.org/packages/f6/35/577e22b936fcdd66537329b33af0b4ccfefaeabd8aec04b266528cddb33c/pillow-12.2.0-cp314-cp314-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:8cbeb542b2ebc6fcdacabf8aca8c1a97c9b3ad3927d46b8723f9d4f033288a0f", size = 6396021, upload-time = "2026-04-01T14:45:07.117Z" },
+    { url = "https://files.pythonhosted.org/packages/11/8d/d2532ad2a603ca2b93ad9f5135732124e57811d0168155852f37fbce2458/pillow-12.2.0-cp314-cp314-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:4bfd07bc812fbd20395212969e41931001fd59eb55a60658b0e5710872e95286", size = 7083360, upload-time = "2026-04-01T14:45:09.763Z" },
+    { url = "https://files.pythonhosted.org/packages/5e/26/d325f9f56c7e039034897e7380e9cc202b1e368bfd04d4cbe6a441f02885/pillow-12.2.0-cp314-cp314-musllinux_1_2_aarch64.whl", hash = "sha256:9aba9a17b623ef750a4d11b742cbafffeb48a869821252b30ee21b5e91392c50", size = 6507628, upload-time = "2026-04-01T14:45:12.378Z" },
+    { url = "https://files.pythonhosted.org/packages/5f/f7/769d5632ffb0988f1c5e7660b3e731e30f7f8ec4318e94d0a5d674eb65a4/pillow-12.2.0-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:deede7c263feb25dba4e82ea23058a235dcc2fe1f6021025dc71f2b618e26104", size = 7209321, upload-time = "2026-04-01T14:45:15.122Z" },
+    { url = "https://files.pythonhosted.org/packages/6a/7a/c253e3c645cd47f1aceea6a8bacdba9991bf45bb7dfe927f7c893e89c93c/pillow-12.2.0-cp314-cp314-win32.whl", hash = "sha256:632ff19b2778e43162304d50da0181ce24ac5bb8180122cbe1bf4673428328c7", size = 6479723, upload-time = "2026-04-01T14:45:17.797Z" },
+    { url = "https://files.pythonhosted.org/packages/cd/8b/601e6566b957ca50e28725cb6c355c59c2c8609751efbecd980db44e0349/pillow-12.2.0-cp314-cp314-win_amd64.whl", hash = "sha256:4e6c62e9d237e9b65fac06857d511e90d8461a32adcc1b9065ea0c0fa3a28150", size = 7217400, upload-time = "2026-04-01T14:45:20.529Z" },
+    { url = "https://files.pythonhosted.org/packages/d6/94/220e46c73065c3e2951bb91c11a1fb636c8c9ad427ac3ce7d7f3359b9b2f/pillow-12.2.0-cp314-cp314-win_arm64.whl", hash = "sha256:b1c1fbd8a5a1af3412a0810d060a78b5136ec0836c8a4ef9aa11807f2a22f4e1", size = 2554835, upload-time = "2026-04-01T14:45:23.162Z" },
+    { url = "https://files.pythonhosted.org/packages/b6/ab/1b426a3974cb0e7da5c29ccff4807871d48110933a57207b5a676cccc155/pillow-12.2.0-cp314-cp314t-macosx_10_15_x86_64.whl", hash = "sha256:57850958fe9c751670e49b2cecf6294acc99e562531f4bd317fa5ddee2068463", size = 5314225, upload-time = "2026-04-01T14:45:25.637Z" },
+    { url = "https://files.pythonhosted.org/packages/19/1e/dce46f371be2438eecfee2a1960ee2a243bbe5e961890146d2dee1ff0f12/pillow-12.2.0-cp314-cp314t-macosx_11_0_arm64.whl", hash = "sha256:d5d38f1411c0ed9f97bcb49b7bd59b6b7c314e0e27420e34d99d844b9ce3b6f3", size = 4698541, upload-time = "2026-04-01T14:45:28.355Z" },
+    { url = "https://files.pythonhosted.org/packages/55/c3/7fbecf70adb3a0c33b77a300dc52e424dc22ad8cdc06557a2e49523b703d/pillow-12.2.0-cp314-cp314t-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:5c0a9f29ca8e79f09de89293f82fc9b0270bb4af1d58bc98f540cc4aedf03166", size = 6322251, upload-time = "2026-04-01T14:45:30.924Z" },
+    { url = "https://files.pythonhosted.org/packages/1c/3c/7fbc17cfb7e4fe0ef1642e0abc17fc6c94c9f7a16be41498e12e2ba60408/pillow-12.2.0-cp314-cp314t-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:1610dd6c61621ae1cf811bef44d77e149ce3f7b95afe66a4512f8c59f25d9ebe", size = 8127807, upload-time = "2026-04-01T14:45:33.908Z" },
+    { url = "https://files.pythonhosted.org/packages/ff/c3/a8ae14d6defd2e448493ff512fae903b1e9bd40b72efb6ec55ce0048c8ce/pillow-12.2.0-cp314-cp314t-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:0a34329707af4f73cf1782a36cd2289c0368880654a2c11f027bcee9052d35dd", size = 6433935, upload-time = "2026-04-01T14:45:36.623Z" },
+    { url = "https://files.pythonhosted.org/packages/6e/32/2880fb3a074847ac159d8f902cb43278a61e85f681661e7419e6596803ed/pillow-12.2.0-cp314-cp314t-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:8e9c4f5b3c546fa3458a29ab22646c1c6c787ea8f5ef51300e5a60300736905e", size = 7116720, upload-time = "2026-04-01T14:45:39.258Z" },
+    { url = "https://files.pythonhosted.org/packages/46/87/495cc9c30e0129501643f24d320076f4cc54f718341df18cc70ec94c44e1/pillow-12.2.0-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:fb043ee2f06b41473269765c2feae53fc2e2fbf96e5e22ca94fb5ad677856f06", size = 6540498, upload-time = "2026-04-01T14:45:41.879Z" },
+    { url = "https://files.pythonhosted.org/packages/18/53/773f5edca692009d883a72211b60fdaf8871cbef075eaa9d577f0a2f989e/pillow-12.2.0-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:f278f034eb75b4e8a13a54a876cc4a5ab39173d2cdd93a638e1b467fc545ac43", size = 7239413, upload-time = "2026-04-01T14:45:44.705Z" },
+    { url = "https://files.pythonhosted.org/packages/c9/e4/4b64a97d71b2a83158134abbb2f5bd3f8a2ea691361282f010998f339ec7/pillow-12.2.0-cp314-cp314t-win32.whl", hash = "sha256:6bb77b2dcb06b20f9f4b4a8454caa581cd4dd0643a08bacf821216a16d9c8354", size = 6482084, upload-time = "2026-04-01T14:45:47.568Z" },
+    { url = "https://files.pythonhosted.org/packages/ba/13/306d275efd3a3453f72114b7431c877d10b1154014c1ebbedd067770d629/pillow-12.2.0-cp314-cp314t-win_amd64.whl", hash = "sha256:6562ace0d3fb5f20ed7290f1f929cae41b25ae29528f2af1722966a0a02e2aa1", size = 7225152, upload-time = "2026-04-01T14:45:50.032Z" },
+    { url = "https://files.pythonhosted.org/packages/ff/6e/cf826fae916b8658848d7b9f38d88da6396895c676e8086fc0988073aaf8/pillow-12.2.0-cp314-cp314t-win_arm64.whl", hash = "sha256:aa88ccfe4e32d362816319ed727a004423aab09c5cea43c01a4b435643fa34eb", size = 2556579, upload-time = "2026-04-01T14:45:52.529Z" },
+]
+
 [[package]]
 name = "pluggy"
 version = "1.6.0"
@@ -296,6 +333,19 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/f4/7e/a72dd26f3b0f4f2bf1dd8923c85f7ceb43172af56d63c7383eb62b332364/pygments-2.20.0-py3-none-any.whl", hash = "sha256:81a9e26dd42fd28a23a2d169d86d7ac03b46e2f8b59ed4698fb4785f946d0176", size = 1231151, upload-time = "2026-03-29T13:29:30.038Z" },
 ]

+[[package]]
+name = "pytesseract"
+version = "0.3.13"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+    { name = "packaging" },
+    { name = "pillow" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/9f/a6/7d679b83c285974a7cb94d739b461fa7e7a9b17a3abfd7bf6cbc5c2394b0/pytesseract-0.3.13.tar.gz", hash = "sha256:4bf5f880c99406f52a3cfc2633e42d9dc67615e69d8a509d74867d3baddb5db9", size = 17689, upload-time = "2024-08-16T02:33:56.762Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/7a/33/8312d7ce74670c9d39a532b2c246a853861120486be9443eebf048043637/pytesseract-0.3.13-py3-none-any.whl", hash = "sha256:7a99c6c2ac598360693d83a416e36e0b33a67638bb9d77fdcac094a3589d4b34", size = 14705, upload-time = "2024-08-16T02:36:10.09Z" },
+]
+
 [[package]]
 name = "pytest"
 version = "9.0.2"