diff --git a/DESIGN.md b/DESIGN.md
new file mode 100644
index 0000000..731c450
--- /dev/null
+++ b/DESIGN.md
@@ -0,0 +1,305 @@
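+# MASForensics System Refit Design
+
+> Goal: refit the current single-machine Windows disk forensics system into one that can handle
+> **multiple devices, multiple actors, heterogeneous evidence, and cross-source correlation**. This is the sole
+> authoritative design document (it merges the earlier `REFIT_PLAN.md` / `RESEARCH_DESIGN.md` drafts).
+>
+> The case that triggered this refit: 2025 Meiya Cup (美亚杯) qualifier, Individual track. 5 evidence items
+> (1 USB E01, 1 full Android dump `blk0_sda.bin`, 3 iOS extractions, 1 set of transaction screenshots),
+> spanning at least 3 people: LEUNG YL / CHAN MH / FUNG CC.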
+
+---
+
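+## 1. Design principles (invariants that hold throughout)
+
+1. **The LLM proposes, code decides.** The LLM handles language, classification, and perception; it **holds no case state,
+   produces no numeric values, and writes no unverified facts**. All "truth" lives in the symbolic layer.
+2. **Every recorded fact can be re-derived from a single tool invocation.** Conclusions can be independently re-checked.
+3. **The reasoning core is device-agnostic.** All device-specific logic lives in "capability plugins";
+   supporting a new device means writing a plugin, never changing the core.
+4. **Operations that look irreversible (such as entity merging) are in fact reversible, evidence-backed assertions** that can be overturned.
+
+These four are not slogans: every design decision below maps to one of them.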
+
+---
+
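+## 2. Diagnosis of current problems
+
+| # | Problem | Location | Consequence |
+|---|---|---|---|
+| P1 | **Single-image assumption is deeply embedded**: tools are closures bound to `image_path`, the graph is single-source, and the main program selects exactly one image | `tool_registry.py:148` `register_all_tools`, `main.py:91-153` | Cannot ingest multiple evidence items or correlate across devices |
+| P2 | **Anti-hallucination rules exist only in the prompt** | `base_agent.py` system prompt | If the LLM disobeys, wrong facts enter the case record and **cannot be identified afterwards** |
+| P3 | **The confidence formula has no statistical meaning and is order-dependent**: `delta=weight*(1-conf)` (positive) / `weight*conf` (negative); when positive and negative edges mix, the result depends on the order in which edges arrive | `evidence_graph.py:26-33` | Confidence cannot be calibrated or defended |
+| P4 | **Artifact categorization is Windows-only**: it relies on hive names / `.pf` / `mirc` keywords | `tool_registry.py:80-107` `_auto_categorize` | All iOS/Android artifacts fall into `other` |
+| P5 | **Case information is hard-coded** as `cfreds_hacking_case` | `config.yaml:35-50` | Switching cases requires code changes |
+| P6 | **Image discovery relies on an extension glob**, and `.bin` is not in the list | `main.py:28` `_IMAGE_GLOBS` | `blk0_sda.bin` is not discovered |
+| P7 | **Phenomenon carries no source annotation** | `evidence_graph.py:85` `Phenomenon` | No way to know which device a finding came from; cross-source correlation has no anchor |
+
+The refit both brings in the new evidence and fixes the inherent defects P1-P7.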
+
+---
+
+## 3. Target architecture
+
+```
+case.yaml ──► Case ──► N × EvidenceSource
+                         ├ id / type / owner / path
+                         └ access_mode: image | tree
+                                 │
+                  ┌──────────────┴───────────────┐
+            image-backed                     tree-backed
+       (TSK, inode-addressed)      (path-addressed: mounted/unpacked)
+                  │                              │
+                  └────────────┬─────────────────┘
+                               ▼
+        SourceRegistry  ── source_id → SourceHandle (resolves path/offset/mode)
+                               │
+        ToolRegistry    ── tools registered per access_mode, bound to source_id at call time
+                               │
+        ┌──────────────────────┼───────────────────────┐
+        ▼                      ▼                       ▼
+  Knowledge-Source         Graph Write Gateway      ToolInvocationLog
+  Agents (LLM)        ──►  (sole write entry;       (every tool call logged:
+  write only via gateway   precond. = grounding)     args / output / sha256)
+        │                      │
+        └──────────────────────┴──► Grounded Evidence Graph (GEG)
+                                     Phenomenon / Hypothesis / Entity
+                                     confidence = summed log-odds
+```
+
+**Keep** the existing five-phase pipeline, disconnect recovery, run archiving, tool-result caching, and
+`AgentFactory` dynamic composition: these designs are sound and are adapted, not rewritten.
+
+---
+
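+## 4. Core design
+
+### 4.1 Evidence source abstraction (fixes P1/P5/P6/P7; the foundation)
+
+Add `case.py`:
+
+- **`EvidenceSource`** dataclass: `id`, `label`, `type`, `owner` (associated person),
+  `path`, `access_mode`, `meta` (type-specific, e.g. partition offset / unpacked root directory).
+- **`Case`**: holds a `list[EvidenceSource]` plus case metadata, loaded from `case.yaml`.
+- **`access_mode` is the key design distinction**:
+  - `image`: block device / disk image, addressed by inode through TSK (USB E01, each partition of the Android `blk0_sda`).
+  - `tree`: mounted filesystem or unpacked directory, addressed by path (iOS extractions after unzip, expanded archives).
+  - Tools are registered in families per access_mode (see 4.2). A "prepare" step can turn an evidence item from image into tree
+    (e.g. mounting a partition, unpacking a zip).
+
+In `main.py`, `select_image_interactive` (:91-153) changes to load/construct a `Case`;
+`_IMAGE_GLOBS` is replaced by type probing (`mmls` probe + file-header sniffing) instead of extension matching.
+`config.yaml` drops `cfreds_hacking_case`; case information moves into `case.yaml`.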
+
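+### 4.2 Tool registration parameterized by source (fixes P1)
+
+Today: `register_all_tools(image_path, offset, ...)` closes every tool over a single image
+(`tool_registry.py:159+`). After the refit:
+
+- Tool executor signatures gain `source_id`; at execution time `SourceRegistry` resolves the real path/offset/mode.
+- `TOOL_CATALOG` annotates tool applicability by `access_mode`; the toolset an agent receives is determined by the
+  source types it is responsible for.
+- **"Current source" context**: the orchestrator sets a current source for the agent (analogous to the existing
+  `graph._current_agent`), and tools act on it by default, so the LLM need not pass `source_id` every time
+  (fewer mistakes). Cross-source tools (timeline merge, entity queries) span sources explicitly.
+- The cache key `_cache_key` (`tool_registry.py:41`) incorporates `source_id` to prevent cross-source contamination.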
+
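+### 4.3 Graph write gateway (fixes P2; implements principle 1)
+
+Today: agents write to the graph directly through tools such as `add_phenomenon`, and the constraints live only in the prompt. After the refit:
+
+- All graph mutations (`add_phenomenon` / `add_hypothesis` / `link` / `observe_identity` …)
+  converge on **a single write gateway**, which enforces preconditions in code.
+- The "anti-hallucination rules" in the current prompt are pushed down into hard checks in the gateway. The LLM agent's four-stage workflow
+  (INVESTIGATE→RECORD→LINK→ANSWER) is unchanged; what changes is that the gateway beneath the RECORD step becomes stricter.
+- The `mandatory_record_tools` mechanism in `base_agent.py` stays (it guarantees the agent actually recorded something).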
+
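+### 4.4 Grounding constraint (fixes P2; implements principle 2)
+
+This is the core mechanism behind the system's reliability.
+
+**ToolInvocationLog**: every tool call leaves one record,
+`{invocation_id, source_id, tool, args, output, output_sha256, agent, ts}`.
+The existing result cache (`tool_registry.py:29`) already stores deterministic output; extending it into a full audit trail is enough.
+
+**Phenomenon is split in two**, separating "fact" from "interpretation":
+
+- `verified_facts`: `list[{type, value, invocation_id}]`,
+  `type ∈ {path, timestamp, inode, hash, identifier, count, ...}`.
+- `interpretation`: free text, the agent's analytical narrative.
+
+**`add_phenomenon` gateway preconditions**:
+
+1. Every fact must cite an `invocation_id` that **actually occurred for this agent within this task**.
+2. Code verifies that `fact.value` matches the output of that invocation:
+   - text output → verbatim substring match;
+   - structured/binary tool output → match against the parsed fields.
+3. If any fact fails → **the whole record is rejected**, the failing facts are returned, and the agent must fix and retry.
+4. If all pass → write; each entry in `verified_facts` carries its `invocation_id` (re-runnable for re-checking), and
+   `interpretation` is marked as "unverified analysis".
+
+**Effect**: within the system, "recording a path/timestamp/hash/identifier that no tool output supports" becomes
+**structurally impossible**. The LLM can still get `interpretation` wrong, but the report renders
+verified facts (citations with re-run instructions) and interpretation (explicitly labelled analysis)
+**separately**, so a human investigator can tell them apart at a glance. It is a reliability guarantee with honestly drawn boundaries.
+
+> The existing `_make_auto_record` (`tool_registry.py:126`) turns tool output directly into a phenomenon;
+> that is the special case of "trivial grounding" (the description is the output), and the new design generalizes and formalizes it.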
+
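+### 4.5 Hypothesis confidence: likelihood ratio / log-odds (fixes P3)
+
+Replace the `_DEFAULT_EDGE_WEIGHTS` at `evidence_graph.py:26`, currently ad hoc deltas,
+with log-odds accumulation based on **likelihood ratios (LR)**:
+
+- Each `Phenomenon → Hypothesis` edge represents one likelihood ratio. The LLM still performs only **discrete classification**
+  (whether this evidence is direct_evidence / supports / weakens / contradicts … for this hypothesis);
+  the value `log₁₀(LR)` is looked up in a calibration table, so **the LLM never emits numbers** (this continues the existing
+  "LLM picks the type, code computes the value" philosophy and gives it a statistical basis).
+- Confidence update:
+  ```
+  L_post = L_prior + Σ log₁₀(LR_i)        # log-odds; commutative → no order dependence
+  confidence = 1 / (1 + 10^(−L_post))
+  ```
+- Edge type → `log₁₀(LR)` calibration table (initial values, to be calibrated later from labelled cases):
+
+  | Edge type | log₁₀LR |
+  |---|---:|
+  | `direct_evidence` | +2.0 |
+  | `supports` / `consequence_observed` | +1.0 |
+  | `prerequisite_met` | +0.5 |
+  | `weakens` | −0.5 |
+  | `contradicts` | −2.0 |
+
+- Thresholds are unchanged (≥0.8 supported / ≤0.2 refuted); they are simply derived from `L_post` now.
+- `prior_prob` becomes configurable (default 0.5 → `L_prior=0`).
+- **Simplifying assumption**: multiple edges are treated as independent (naive Bayes). Repeated evidence of the same kind is not
+  fully independent, so add one knob: cap or decay the number of edges per `(hypothesis, edge_type)` to keep
+  "the same finding entered into the graph by several agents" from inflating confidence (the existing Jaccard dedup already mitigates this in part).
+
+A by-product is a **hypothesis × evidence matrix** view for use in the report and in lead selection.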
+
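+### 4.6 Cross-source entity resolution (solves the correlation problem of "complex scenarios"; implements principle 4)
+
+The core difficulty of complex forensics: the Apple ID in an iPhone keychain, the number in an Android SMS database,
+the author of a USB file, the wallet address in a transaction screenshot. **Which of them point to the same actor?**
+
+**Key design: "identity coreference" is itself a hypothesis**, so entity resolution is not a separate subsystem
+but a reuse of the 4.5 hypothesis mechanism:
+
+- When an agent observes an identifier, it calls `observe_identity` through the gateway and records a **typed** identifier
+  (strong identifiers: IMEI / wallet address / email / phone number; weak identifiers: nickname / display name),
+  attached to a provisional `Entity`.
+- "Entity A ≡ Entity B" is registered as a `Hypothesis`; a shared strong identifier = a strong +LR edge,
+  a shared weak identifier = a weak +LR edge, conflicting strong identifiers = a strong −LR edge, all scored by the same 4.5 computation.
+- **No destructive merging**: when the threshold is crossed, add a `same_as` edge between the two Entities (backed by that coref
+  hypothesis). Queries treat `same_as` connected components as a single actor. **Fully reversible, auditable, and
+  open to being overturned by later contradicts evidence** (implements principle 4).
+- **Blocking**: create coref hypotheses only between entity pairs that "share at least one identifier or have highly similar names",
+  avoiding O(n²).
+
+Cross-device timelines and "who did what, when" emerge naturally from the entity graph once it is connected by `same_as`.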
+
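+### 4.7 Capability plugin layer (bringing in the 5 evidence types)
+
+Each evidence type = an `(ingest handler, toolset, knowledge-source agent)` triple. The reasoning core is untouched.
+
+| Plugin | Ingest | New tools | Knowledge-source agent |
+|---|---|---|---|
+| **iOS extraction** | `unzip` into a `tree` source | `parse_plist` (incl. binary plist), `sqlite_tables`/`sqlite_query` (sms.db, WhatsApp `ChatStorage.sqlite`, contacts), `parse_ios_keychain`, `read_idevice_info` | `IOSArtifactAgent` |
+| **Full Android dump** | `mmls` partitions → one `image` source per partition; can be mounted as `tree` | Reuse TSK; ext4/F2FS reads; `fsstat` to detect encryption | Reuse filesystem + `AndroidArtifactAgent` |
+| **Disk image (E01)** | Already supported (TSK includes ewf support) | Existing TSK toolchain | Existing filesystem/registry |
+| **Archive** | `unzip_archive` generic unpack | n/a | n/a |
+| **Media/screenshots** | n/a | `ocr_image` (tesseract; note that DeepSeek has no vision capability, so OCR is mandatory) | `MediaAgent` |
+
+**Android risk**: the `userdata` partition of `blk0_sda` is very likely FBE-encrypted. First run `fsstat` on each partition
+to find out: unencrypted → use TSK directly; encrypted with no key → only unencrypted areas such as `EFS`/`PARAM`/`system` can be analysed.
+
+The `_auto_categorize` at `tool_registry.py:80` becomes extensible: each source plugin supplies its own
+artifact categorization table instead of a global Windows keyword table (fixes P4).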
+
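+### 4.8 Agent reorganization
+
+The existing 7 agents are named after Windows artifacts (registry, communication = email/IRC,
+network = browser/PCAP). They are reorganized by **investigative function**, with platform-specific agents added:
+
+- `_AGENT_CLASSES` in `agent_factory.py` (:34-40) is extended with `ios_artifact`,
+  `android_artifact`, `financial` (wallets/transactions), and `media`.
+- `communication` is generalized: email + IM + SMS, across platforms.
+- Add a **source type → suitable agent** mapping, used in Phase 1 to dispatch a triage agent per source.
+- The dynamic composition mechanism of `create_specialized_agent` (:69) stays: it was always the right tool for
+  capability gaps, and a larger tool catalog simply gives it a richer selection space.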
+
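+### 4.9 Multi-source orchestrator pipeline
+
+| Phase | Change |
+|---|---|
+| Phase 1 | "Single-image initial survey" → **parallel per-source triage**, each source getting a type-appropriate agent |
+| Phase 2 | Hypotheses are generated across sources; identity-coreference hypotheses are first registered here |
+| Phase 3 | Leads are dispatched to source-aware agents; the hypothesis × evidence matrix updates in real time |
+| Phase 4 | Cross-source timeline merge with **per-source timezone normalization** (iOS UTC vs Android local time) |
+| Phase 5 | One consolidated report per case: hypothesis conclusions, entity relationship graph, and a provenance citation for every conclusion |
+
+Disconnect recovery and run archiving logic stay; `graph_state.json` takes on the new fields incrementally.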
+
+---
+
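+## 5. Summary of data model changes
+
+| Node/structure | Change |
+|---|---|
+| `EvidenceSource` | **New** first-class node (`src-*`) |
+| `ToolInvocation` | **New** audit record (`inv-*`), persisted with the graph |
+| `Phenomenon` | + `source_id`; description split into `verified_facts[]` + `interpretation`; clarify or remove the semantically ambiguous `confidence` (default 1.0), since grounding now expresses the reliability of an observation |
+| `Hypothesis` | + `prior_prob`, `log_odds` (the accumulator); `confidence` becomes a derived value |
+| `Entity` | + set of typed identifiers; connected across sources through `same_as` edges |
+| Phenomenon→Hypothesis edge | Carries `edge_type`, mapped to `log₁₀(LR)` (replaces `_DEFAULT_EDGE_WEIGHTS`) |
+| Entity→Entity edge | **New** `same_as` (backed by a coref hypothesis, reversible) |
+
+In `evidence_graph.py`, `VALID_EDGE_TYPES`, serialization/deserialization, and Jaccard dedup are adapted accordingly.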
+
+---
+
+## 6. Component change list
+
+| File | Change |
+|---|---|
+| `case.py` | **New**: `Case` / `EvidenceSource` / `SourceRegistry` |
+| `main.py` | Source selection loads a `Case`; type probing replaces the extension glob |
+| `tool_registry.py` | Tools parameterized by `source_id`; cache key includes source; `_auto_categorize` made extensible; `ToolInvocationLog` |
+| `evidence_graph.py` | Data model changes (section 5); LR/log-odds confidence; write gateway + grounding checks |
+| `base_agent.py` | RECORD goes through the gateway; `add_phenomenon` moves to the `verified_facts` + `interpretation` interface |
+| `agent_factory.py` | `_AGENT_CLASSES` extended; source type → agent mapping |
+| `orchestrator.py` | Phase 1 per source; Phase 4 cross-source timezone normalization; Phase 5 consolidated report |
+| `agents/` | Add `ios_artifact.py` / `android_artifact.py` / `financial.py` / `media.py`; generalize `communication.py` |
+| `tools/` | Add `mobile_ios.py` (plist/sqlite/keychain), `media.py` (OCR), `archive.py` (unpack) |
+| `config.yaml` / `case.yaml` | Remove `cfreds_hacking_case`; create the `case.yaml` evidence manifest |
+
+---
+
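+## 7. Build order (sorted by dependency)
+
+| Stage | Content | Depends on | Value |
+|---|---|---|---|
+| **S1** | 4.1 evidence source abstraction + 4.2 tool parameterization + fix P6 | none | Foundation; first run only on the USB E01 to verify that existing logic is not broken |
+| **S2** | 4.3 write gateway + 4.4 grounding + ToolInvocationLog | S1 | Reliability core; "zero hallucinated records" becomes measurable |
+| **S3** | 4.5 LR/log-odds confidence | Independent (can run in parallel with S2) | Fixes P3; confidence becomes defensible |
+| **S4** | 4.7 iOS plugin + 4.8 agent reorganization | S1 | Coverage 1/5 → 4/5 |
+| **S5** | 4.6 cross-source entity resolution | S1+S3 | Cross-device correlation; complex-scenario capability takes shape |
+| **S6** | 4.7 Android + media plugins + 4.9 orchestrator adaptation | S1+S4 | All 5 evidence items connected |
+
+S1+S2+S3 are about "making the system correct"; S4-S6 are about "rolling out full capability". Following the order strictly is advised:
+if S1 is unstable, everything built after it is a castle in the air.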
+
+---
+
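+## 8. Design trade-offs and open questions
+
+1. **Grounding boundary for free text**: only the structured atoms in `verified_facts` are hard-verified;
+   `interpretation` is not verified verbatim (an honest boundary). A second-level lint could be added: scan
+   interpretation for strings that look like paths/timestamps/hashes but are not covered by any cited invocation, and warn.
+2. **Initial LR calibration values are hand-set**: run with the initial values from section 4.5 first; "learning LRs from labelled cases" is follow-up work.
+3. **Android userdata encryption**: whether a decryption key can be obtained determines the evidence depth of the 4.7 Android plugin, so this must be established early.
+4. **Entity resolution, destructive vs reversible**: this design chooses **reversible `same_as` edges** over destructive merging,
+   trading a little query efficiency for full auditability and rollback, in line with principle 4.
+5. **Report granularity**: fixed at "one consolidated report per case", with a subsection per evidence item plus cross-source correlation,
+   rather than a standalone report per evidence item.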
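
Review note on DESIGN.md section 4.5: the log-odds update is small enough to sketch directly. The block below is only an illustration under stated assumptions, not the code in `evidence_graph.py` (which this diff excerpt does not show). The table values are copied from section 4.5; the names `LOG10_LR`, `posterior_confidence`, and `status`, and the `"open"` label for the middle band, are hypothetical.

```python
from __future__ import annotations

import math

# Calibration table copied from DESIGN.md section 4.5 (initial values).
LOG10_LR: dict[str, float] = {
    "direct_evidence": 2.0,
    "supports": 1.0,
    "consequence_observed": 1.0,
    "prerequisite_met": 0.5,
    "weakens": -0.5,
    "contradicts": -2.0,
}


def posterior_confidence(edge_types: list[str], prior_prob: float = 0.5) -> float:
    """Sum log10 likelihood ratios onto the prior log-odds, then map to a probability."""
    l_prior = math.log10(prior_prob / (1.0 - prior_prob))
    l_post = l_prior + sum(LOG10_LR[t] for t in edge_types)
    return 1.0 / (1.0 + 10.0 ** (-l_post))


def status(confidence: float) -> str:
    """Apply the section 4.5 thresholds; 'open' is a hypothetical label."""
    if confidence >= 0.8:
        return "supported"
    if confidence <= 0.2:
        return "refuted"
    return "open"


if __name__ == "__main__":
    edges = ["supports", "contradicts", "weakens"]
    forward = posterior_confidence(edges)
    backward = posterior_confidence(list(reversed(edges)))
    # Addition is commutative, so arrival order does not change the result.
    print(math.isclose(forward, backward), status(forward))
```

Because the update is a plain sum, the order dependence described under P3 cannot recur. The edge cap or decay knob from the "simplifying assumption" bullet is deliberately left out of this sketch.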
diff --git a/agent_factory.py b/agent_factory.py
index 18582f6..b7a1492 100644
--- a/agent_factory.py
+++ b/agent_factory.py
@@ -24,9 +24,12 @@ def _load_agent_classes() -> None:
     """Lazy-import agent classes to avoid circular imports."""
     if _AGENT_CLASSES:
         return
+    from agents.android_artifact import AndroidArtifactAgent
     from agents.communication import CommunicationAgent
     from agents.filesystem import FileSystemAgent
     from agents.hypothesis import HypothesisAgent
+    from agents.ios_artifact import IOSArtifactAgent
+    from agents.media import MediaAgent
     from agents.network import NetworkAgent
     from agents.registry import RegistryAgent
     from agents.report import ReportAgent
@@ -38,6 +41,50 @@ def _load_agent_classes() -> None:
     _AGENT_CLASSES["timeline"] = TimelineAgent
     _AGENT_CLASSES["hypothesis"] = HypothesisAgent
     _AGENT_CLASSES["report"] = ReportAgent
+    _AGENT_CLASSES["ios_artifact"] = IOSArtifactAgent
+    _AGENT_CLASSES["android_artifact"] = AndroidArtifactAgent
+    _AGENT_CLASSES["media"] = MediaAgent
+
+
+# Triage agent per (source.type, platform). disk_image is ambiguous on its
+# own — both a Windows USB image and an Android raw dump are disk_image —
+# so the routing helper also looks at source.meta.platform when present.
+SOURCE_TYPE_AGENTS: dict[str, str] = {
+    "disk_image":        "filesystem",       # default for unknown platform
+    "mobile_extraction": "ios_artifact",
+    "archive":           "filesystem",
+    "media_collection":  "media",
+}
+
+# Per-platform overrides for disk_image sources. Keys come from
+# source.meta.platform in case.yaml (lowercased).
+_DISK_IMAGE_PLATFORM_AGENTS: dict[str, str] = {
+    "windows": "filesystem",
+    "linux":   "filesystem",
+    "android": "android_artifact",
+    "ios":     "ios_artifact",
+}
+
+
+def get_triage_agent_type(source) -> str:
+    """Pick the right Phase-1 agent for *source*.
+
+    Accepts either an :class:`EvidenceSource` or a raw source.type string
+    (for back-compat with the S5 signature). Disk-image sources additionally
+    consult ``source.meta.platform`` so Windows USBs and Android raw dumps —
+    both type=disk_image — get different agents.
+    """
+    # Back-compat: accept a plain type string.
+    if isinstance(source, str):
+        return SOURCE_TYPE_AGENTS.get(source, "filesystem")
+
+    src_type = getattr(source, "type", "disk_image")
+    if src_type == "disk_image":
+        meta = getattr(source, "meta", {}) or {}
+        platform = str(meta.get("platform", "")).lower()
+        if platform in _DISK_IMAGE_PLATFORM_AGENTS:
+            return _DISK_IMAGE_PLATFORM_AGENTS[platform]
+    return SOURCE_TYPE_AGENTS.get(src_type, "filesystem")
 
 
 logger = logging.getLogger(__name__)
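
Review note on the routing helper in the `agent_factory.py` hunk: a short usage sketch may make the precedence easier to check. The two tables and the function body are reproduced from the hunk so the block runs on its own; the `SimpleNamespace` objects are hypothetical stand-ins for `EvidenceSource`, assuming only `.type` and `.meta` are read.

```python
from __future__ import annotations

from types import SimpleNamespace

# Reproduced from the agent_factory.py hunk so the example is standalone.
SOURCE_TYPE_AGENTS: dict[str, str] = {
    "disk_image": "filesystem",
    "mobile_extraction": "ios_artifact",
    "archive": "filesystem",
    "media_collection": "media",
}
_DISK_IMAGE_PLATFORM_AGENTS: dict[str, str] = {
    "windows": "filesystem",
    "linux": "filesystem",
    "android": "android_artifact",
    "ios": "ios_artifact",
}


def get_triage_agent_type(source) -> str:
    if isinstance(source, str):  # back-compat: plain type string
        return SOURCE_TYPE_AGENTS.get(source, "filesystem")
    src_type = getattr(source, "type", "disk_image")
    if src_type == "disk_image":
        meta = getattr(source, "meta", {}) or {}
        platform = str(meta.get("platform", "")).lower()
        if platform in _DISK_IMAGE_PLATFORM_AGENTS:
            return _DISK_IMAGE_PLATFORM_AGENTS[platform]
    return SOURCE_TYPE_AGENTS.get(src_type, "filesystem")


# Hypothetical stand-ins for EvidenceSource instances.
usb = SimpleNamespace(type="disk_image", meta={"platform": "Windows"})
android_dump = SimpleNamespace(type="disk_image", meta={"platform": "android"})
android_extraction = SimpleNamespace(type="mobile_extraction", meta={"platform": "android"})

if __name__ == "__main__":
    for name, src in [("usb", usb), ("android_dump", android_dump),
                      ("android_extraction", android_extraction),
                      ("type string", "media_collection")]:
        print(name, "->", get_triage_agent_type(src))
```

One thing the sketch surfaces: as written, the platform override applies only to `disk_image` sources, so a `mobile_extraction` source with `platform: android` would still be routed to `ios_artifact`. That may be intended for this case (all three mobile extractions are iOS), but it seems worth confirming.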
diff --git a/agents/android_artifact.py b/agents/android_artifact.py
new file mode 100644
index 0000000..be5e6b8
--- /dev/null
+++ b/agents/android_artifact.py
@@ -0,0 +1,58 @@
+"""Android Artifact Agent — multi-partition analysis of raw Android dumps.
+
+DESIGN.md §4.7 (Android): ``mmls`` slices the dump into partitions; each one is
+its own analysable surface. Ext4-backed partitions (typically SYSTEM,
+USERDATA when not FBE-encrypted, EFS in some variants) yield to TSK; raw
+partitions (BOOT, RECOVERY, RADIO, MODEM blobs) are best mined with
+``search_strings``. Userdata is the prize and is often FBE-encrypted on
+modern devices — the agent must check fsstat before assuming readability
+(see ``probe_android_partitions`` for the survey).
+"""
+
+from __future__ import annotations
+
+from base_agent import BaseAgent
+from evidence_graph import EvidenceGraph
+from llm_client import LLMClient
+from tool_registry import TOOL_CATALOG
+
+
+class AndroidArtifactAgent(BaseAgent):
+    name = "android_artifact"
+    role = (
+        "Android forensic analyst. You navigate raw Android disk dumps "
+        "(blk0_sda-style images) partition by partition. Workflow: call "
+        "probe_android_partitions ONCE to map the disk; pick the partitions "
+        "with fs_type=Ext4 or fs_type=F2FS (SYSTEM, USERDATA if readable, "
+        "EFS); for each, call set_active_partition(offset_from_512_sector_column) "
+        "and then list_directory / extract_file / search_strings as usual. "
+        "For raw partitions (BOOT, RECOVERY, RADIO, TOMBSTONES) skip directly "
+        "to search_strings — they have no filesystem. If USERDATA shows "
+        "fs_type=unknown it is almost certainly FBE-encrypted: record that "
+        "as a negative finding (the absence IS evidence) and move on to "
+        "what's reachable."
+    )
+
+    def __init__(self, llm: LLMClient, graph: EvidenceGraph) -> None:
+        super().__init__(llm, graph)
+        self._register_tools()
+
+    def _register_tools(self) -> None:
+        tool_names = [
+            # Android-specific
+            "probe_android_partitions",
+            "set_active_partition",
+            # Reused TSK toolset — partition_offset comes from active_source
+            "partition_info", "filesystem_info", "list_directory",
+            "extract_file", "find_file", "search_strings",
+            "count_deleted_files", "build_filesystem_timeline",
+            # Generic parsers
+            "read_text_file", "read_binary_preview", "search_text_file",
+            "read_text_file_section", "list_extracted_dir", "find_files",
+            # SQLite — Android apps store data in sqlite too (WhatsApp, etc.)
+            "sqlite_tables", "sqlite_query",
+        ]
+        for name in tool_names:
+            td = TOOL_CATALOG.get(name)
+            if td:
+                self.register_tool(td.name, td.description, td.input_schema, td.executor)
diff --git a/agents/ios_artifact.py b/agents/ios_artifact.py
new file mode 100644
index 0000000..9e4fe38
--- /dev/null
+++ b/agents/ios_artifact.py
@@ -0,0 +1,49 @@
+"""iOS Artifact Agent — analyses unpacked iOS extractions.
+
+DESIGN.md §4.7/§4.8: tree-mode iOS sources are the third evidence family
+the system handles (alongside disk images and pcaps). This agent owns the
+iOS-specific toolset; the grounded ``add_phenomenon`` contract from
+BaseAgent applies unchanged — every fact must cite a tool invocation.
+"""
+
+from __future__ import annotations
+
+from base_agent import BaseAgent
+from evidence_graph import EvidenceGraph
+from llm_client import LLMClient
+from tool_registry import TOOL_CATALOG
+
+
+class IOSArtifactAgent(BaseAgent):
+    name = "ios_artifact"
+    role = (
+        "iOS forensic analyst. You analyse unpacked iOS extractions — "
+        "binary/XML plists, SQLite databases (sms.db, ChatStorage.sqlite, "
+        "AddressBook.sqlitedb), the keychain (keychain-2.db), and the "
+        "iDevice_info.txt summary — to extract device identity, accounts, "
+        "messaging, contacts, and credential metadata. Domain-rooted iOS "
+        "trees (HomeDomain, AppDomain*, ProtectedDomain, NetworkDomain) "
+        "are your map; navigate by path, not by inode."
+    )
+
+    def __init__(self, llm: LLMClient, graph: EvidenceGraph) -> None:
+        super().__init__(llm, graph)
+        self._register_tools()
+
+    def _register_tools(self) -> None:
+        tool_names = [
+            # navigation — find_files is the workhorse on 10k+-file iOS trees;
+            # list_extracted_dir is for initial layout summary only.
+            "list_extracted_dir", "find_files",
+            "read_text_file", "read_text_file_section", "read_binary_preview",
+            "search_text_file",
+            # iOS-specific parsers
+            "parse_plist",
+            "sqlite_tables", "sqlite_query",
+            "parse_ios_keychain",
+            "read_idevice_info",
+        ]
+        for name in tool_names:
+            td = TOOL_CATALOG.get(name)
+            if td:
+                self.register_tool(td.name, td.description, td.input_schema, td.executor)
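
Review note on the "every fact must cite a tool invocation" contract mentioned in the iOS agent docstring (DESIGN.md section 4.4): the gateway itself is not part of this diff excerpt, so the following is only a minimal sketch of the text-output case, where a fact passes if its value is a verbatim substring of the cited invocation's output. `ToolInvocation`, `check_facts`, and the sample IDs and outputs are all hypothetical; matching against parsed fields of structured output is not covered.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolInvocation:
    """Hypothetical log record; field names follow DESIGN.md section 4.4."""
    invocation_id: str
    agent: str
    tool: str
    output: str

    @property
    def output_sha256(self) -> str:
        return hashlib.sha256(self.output.encode("utf-8")).hexdigest()


def check_facts(facts: list[dict], log: dict[str, ToolInvocation], agent: str) -> list[dict]:
    """Return the facts that fail grounding; an empty list means the record may be written."""
    failed = []
    for fact in facts:
        inv = log.get(str(fact.get("invocation_id", "")))
        value = str(fact.get("value", ""))
        # Reject unknown invocations, invocations by another agent,
        # empty values, and values absent from the recorded output.
        if inv is None or inv.agent != agent or not value or value not in inv.output:
            failed.append(fact)
    return failed


log = {
    "inv-1": ToolInvocation("inv-1", "ios_artifact", "find_files",
                            "HomeDomain/Library/SMS/sms.db\n"),
}
good = {"type": "path", "value": "HomeDomain/Library/SMS/sms.db", "invocation_id": "inv-1"}
bad = {"type": "path", "value": "HomeDomain/Library/Notes/notes.db", "invocation_id": "inv-1"}

if __name__ == "__main__":
    print(check_facts([good], log, agent="ios_artifact"))       # nothing rejected
    print(check_facts([good, bad], log, agent="ios_artifact"))  # only the unsupported fact
```

Under the "reject the whole record" rule in section 4.4, a caller would refuse the write whenever the returned list is non-empty and hand those facts back to the agent for correction.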
diff --git a/agents/media.py b/agents/media.py
new file mode 100644
index 0000000..018127f
--- /dev/null
+++ b/agents/media.py
@@ -0,0 +1,52 @@
+"""Media Agent — OCR-based analysis of screenshot/photo evidence.
+
+DESIGN.md §4.7: the LLM backend has no vision capability, so JPEG/PNG
+evidence must go through tesseract first. The agent runs OCR, then
+records extracted strings — especially identifiers (wallet addresses,
+phone numbers, usernames) — via the grounded observe_identity gateway so
+they participate in cross-source coref the same way iOS keychain entries
+or Windows account names do.
+
+If the OCR runtime is missing on the host, ocr_image returns an explicit
+install hint; the agent should record that as a negative finding ("no
+text extracted — tesseract not installed") rather than guessing.
+"""
+
+from __future__ import annotations
+
+from base_agent import BaseAgent
+from evidence_graph import EvidenceGraph
+from llm_client import LLMClient
+from tool_registry import TOOL_CATALOG
+
+
+class MediaAgent(BaseAgent):
+    name = "media"
+    role = (
+        "Media / OCR forensic analyst. You analyse screenshots, photos, and "
+        "scanned documents — any pixel-based evidence the LLM cannot read "
+        "directly. Workflow: list_extracted_dir to enumerate images, "
+        "ocr_image on each promising one, then add_phenomenon (with the "
+        "OCR'd text as the verified_fact value) and observe_identity for "
+        "any wallet addresses, phone numbers, email addresses, or "
+        "usernames the text contains. If OCR fails because tesseract is "
+        "missing, RECORD that as a negative finding instead of fabricating "
+        "image content — the absence is a real fact about this run."
+    )
+
+    def __init__(self, llm: LLMClient, graph: EvidenceGraph) -> None:
+        super().__init__(llm, graph)
+        self._register_tools()
+
+    def _register_tools(self) -> None:
+        tool_names = [
+            "ocr_image",
+            "list_extracted_dir", "find_files",
+            "read_binary_preview",
+            "read_text_file",
+            "search_text_file",
+        ]
+        for name in tool_names:
+            td = TOOL_CATALOG.get(name)
+            if td:
+                self.register_tool(td.name, td.description, td.input_schema, td.executor)
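
Review note on the cross-source coref that the media agent docstring refers to (DESIGN.md section 4.6): the design treats `same_as` connected components as one actor instead of merging entities. A rough sketch of that query, using a small union-find, is below. It assumes entities are plain string IDs and that every edge endpoint appears in the entity list; `same_as_components` and the sample IDs are hypothetical, and the real `EvidenceGraph.actor_clusters()` is not shown in this diff.

```python
from __future__ import annotations


def same_as_components(entity_ids: list[str],
                       same_as: list[tuple[str, str]]) -> list[list[str]]:
    """Group entity IDs into connected components over active same_as edges."""
    parent = {e: e for e in entity_ids}

    def find(x: str) -> str:
        # Path halving keeps the trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in same_as:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    groups: dict[str, list[str]] = {}
    for e in entity_ids:
        groups.setdefault(find(e), []).append(e)
    # Largest clusters first, then alphabetical, for stable output.
    return sorted((sorted(m) for m in groups.values()), key=lambda m: (-len(m), m))


entities = ["ent-android-1", "ent-img-1", "ent-ios-1", "ent-usb-1"]
edges = [("ent-ios-1", "ent-img-1"), ("ent-img-1", "ent-android-1")]

if __name__ == "__main__":
    print(same_as_components(entities, edges))
    # Dropping an edge (its coref hypothesis fell below threshold) splits the cluster again.
    print(same_as_components(entities, edges[:1]))
```

Since the components are recomputed from the current edge set, retracting a `same_as` edge undoes the grouping without touching the entities themselves, which is the reversibility that principle 4 asks for.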
diff --git a/agents/report.py b/agents/report.py
index 2df76ea..41d7d21 100644
--- a/agents/report.py
+++ b/agents/report.py
@@ -12,9 +12,20 @@ class ReportAgent(BaseAgent):
     role = (
         "Forensic report writer. You synthesize all findings from the investigation "
         "into a structured, professional forensic analysis report organized by hypotheses.\n\n"
-        "Only include findings that have a source_tool attribution (marked VERIFIED). "
-        "If evidence lacks source attribution, mark it as UNVERIFIED. "
-        "Do NOT invent or fabricate any data, timestamps, or findings not present in the evidence."
+        "Phenomena are marked GROUNDED (verified_facts cite a real tool invocation), "
+        "TOOL-ONLY (source_tool set but no facts), or UNVERIFIED (neither). When "
+        "writing the report, render verified_facts as primary evidence with their "
+        "invocation citations, and render interpretation as 'agent analysis' so the "
+        "reader can tell ground truth from inference. Do NOT invent or fabricate any "
+        "data, timestamps, or findings not present in the evidence.\n\n"
+        "This is a cross-source case: phenomena come from multiple evidence "
+        "sources, and entities discovered on different sources may refer to the "
+        "same real-world actor. ALWAYS include:\n"
+        "  - 'Findings by Source' section sourced from get_phenomena_by_source\n"
+        "  - 'Actor Clusters' section sourced from get_actor_clusters (the "
+        "cross-source attribution view — multi-source clusters answer "
+        "'which findings on different devices belong to the same person')\n"
+        "  - 'Hypothesis × Evidence Matrix' from get_hypothesis_evidence_matrix"
     )
     # Calling save_report is BOTH the recording action and the completion
     # signal. tool_call_loop returns the moment save_report executes; the
@@ -38,9 +49,12 @@ class ReportAgent(BaseAgent):
             f"Investigation state:\n{self.graph.stats_summary()}\n\n"
             f"Your task: {task}\n\n"
             f"WORKFLOW:\n"
-            f"1. Call get_hypotheses_with_evidence, get_all_phenomena, get_entities, get_case_info "
-            f"   to gather all the data needed for the report. Make these calls in parallel.\n"
-            f"2. Assemble the complete markdown forensic report.\n"
+            f"1. Call get_hypotheses_with_evidence, get_all_phenomena, get_entities,\n"
+            f"   get_case_info, get_hypothesis_evidence_matrix, get_actor_clusters,\n"
+            f"   and get_phenomena_by_source in parallel. These are the seven data\n"
+            f"   sources you assemble the report from.\n"
+            f"2. Assemble the complete markdown forensic report. Cross-source\n"
+            f"   actor clusters and per-source breakdown are MANDATORY sections.\n"
             f"3. Call save_report(content=<full markdown>, output_path=\"report.md\").\n"
             f"   This single call is the completion signal — the run ENDS the moment it executes.\n"
             f"   Do NOT call any read tools after this point; they will not run.\n"
@@ -83,6 +97,45 @@ class ReportAgent(BaseAgent):
             executor=self._get_entities,
         )
 
+        self.register_tool(
+            name="get_hypothesis_evidence_matrix",
+            description=(
+                "Render the hypothesis × evidence pivot as a markdown table. "
+                "Columns: per edge_type counts, log_odds, confidence, status. "
+                "Embed this directly in the report to show how each hypothesis "
+                "stands relative to the others on a single screen."
+            ),
+            input_schema={"type": "object", "properties": {}},
+            executor=self._get_hypothesis_evidence_matrix,
+        )
+
+        self.register_tool(
+            name="get_actor_clusters",
+            description=(
+                "Render the cross-source actor clusters: each cluster is the "
+                "set of Entity nodes the system currently treats as the same "
+                "actor (via active same_as edges backed by coref hypotheses "
+                "≥ 0.8). Includes the aggregated identifier evidence per "
+                "cluster. Use this in the report's 'Actor Clusters' "
+                "section so readers see who-is-who across devices, not just "
+                "raw entity rows."
+            ),
+            input_schema={"type": "object", "properties": {}},
+            executor=self._get_actor_clusters,
+        )
+
+        self.register_tool(
+            name="get_phenomena_by_source",
+            description=(
+                "Group every phenomenon by its originating evidence source "
+                "(source_id). Use this to drive the report's 'Findings by "
+                "Source' section so each evidence item's per-device "
+                "contribution is auditable."
+            ),
+            input_schema={"type": "object", "properties": {}},
+            executor=self._get_phenomena_by_source,
+        )
+
         self.register_tool(
             name="save_report",
             description="Save the final report to a file.",
@@ -115,12 +168,24 @@ class ReportAgent(BaseAgent):
             items = [ph for ph in phenomena.values() if ph.category == cat]
             lines.append(f"\n--- {cat.upper()} ({len(items)} entries) ---")
             for ph in items:
-                verified = "VERIFIED" if ph.source_tool else "UNVERIFIED"
-                lines.append(f"\n[{verified}] {ph.title} ({ph.id})")
+                # Grounded = at least one verified fact AND a source_tool.
+                grounded = bool(ph.verified_facts) and bool(ph.source_tool)
+                marker = "GROUNDED" if grounded else (
+                    "TOOL-ONLY" if ph.source_tool else "UNVERIFIED"
+                )
+                lines.append(f"\n[{marker}] {ph.title} ({ph.id})")
                 lines.append(f"  Source: {ph.source_agent} | Tool: {ph.source_tool or 'N/A'}")
                 if ph.timestamp:
                     lines.append(f"  Timestamp: {ph.timestamp}")
-                lines.append(f"  {ph.description[:500]}")
+                if ph.verified_facts:
+                    lines.append(f"  Verified facts ({len(ph.verified_facts)}):")
+                    for f in ph.verified_facts:
+                        lines.append(
+                            f"    - [{f.get('type','?')}] {str(f.get('value',''))[:200]} "
+                            f"(cite: {f.get('invocation_id','?')})"
+                        )
+                if ph.interpretation:
+                    lines.append(f"  Analysis: {ph.interpretation[:500]}")
         return "\n".join(lines)
 
     async def _get_hypotheses_with_evidence(self) -> str:
@@ -150,12 +215,87 @@ class ReportAgent(BaseAgent):
         return "\n".join(lines)
 
     async def _get_case_info(self) -> str:
-        info = self.graph.case_info
         lines = ["=== Case Information ==="]
-        for k, v in info.items():
-            lines.append(f"  {k}: {v}")
-        lines.append(f"  Image path: {self.graph.image_path}")
-        lines.append(f"  Partition offset: {self.graph.partition_offset}")
+        case = self.graph.case
+        if case is not None:
+            lines.append(f"  case_id: {case.case_id}")
+            lines.append(f"  name: {case.name}")
+            for k, v in (case.meta or {}).items():
+                lines.append(f"  {k}: {v}")
+            lines.append(f"  sources: {len(case.sources)}")
+            for s in case.sources:
+                owner = f", owner={s.owner}" if s.owner else ""
+                platform = s.meta.get("platform") if s.meta else None
+                plat = f", platform={platform}" if platform else ""
+                lines.append(
+                    f"    - {s.id}: {s.label} "
+                    f"(type={s.type}, mode={s.access_mode}{plat}{owner})"
+                )
+        else:
+            # Legacy single-image fallback — surface whatever case_info dict
+            # was passed in (e.g. the old CFReDS MD5 block).
+            for k, v in (self.graph.case_info or {}).items():
+                lines.append(f"  {k}: {v}")
+            lines.append(f"  Image path: {self.graph.image_path}")
+            lines.append(f"  Partition offset: {self.graph.partition_offset}")
+        return "\n".join(lines)
+
+    async def _get_hypothesis_evidence_matrix(self) -> str:
+        return self.graph.hypothesis_evidence_matrix_markdown()
+
+    async def _get_actor_clusters(self) -> str:
+        clusters = self.graph.actor_clusters()
+        if not clusters:
+            return "(no entities recorded)"
+        # Show multi-member clusters first — they're the cross-source links
+        # the human reader most needs to see.
+        clusters.sort(key=lambda c: (-len(c["members"]), c["members"]))
+        lines = [f"=== Actor Clusters ({len(clusters)}) ==="]
+        for i, c in enumerate(clusters, 1):
+            members = c["members"]
+            label = "MULTI-SOURCE CLUSTER" if len(members) > 1 else "Single entity"
+            lines.append(f"\n[{label} #{i}] {len(members)} member(s):")
+            for eid in members:
+                ent = self.graph.entities.get(eid)
+                if ent:
+                    lines.append(f"  - {ent.summary()}")
+            if c["identifiers"]:
+                lines.append("  Aggregated identifiers:")
+                for ident in c["identifiers"]:
+                    strong_tag = "strong" if ident.get("strong") else "weak"
+                    lines.append(
+                        f"    [{strong_tag}] {ident.get('type')}={ident.get('value')} "
+                        f"(on {ident.get('on_entity')})"
+                    )
+            if c["coref_hypotheses"]:
+                lines.append("  Backing coref hypotheses (≥0.8 active):")
+                for hid in c["coref_hypotheses"]:
+                    hyp = self.graph.hypotheses.get(hid)
+                    if hyp:
+                        lines.append(f"    - {hid}: conf={hyp.confidence:.2f}, L={hyp.log_odds:+.2f}")
+        return "\n".join(lines)
+
+    async def _get_phenomena_by_source(self) -> str:
+        by_src: dict[str, list] = {}
+        for ph in self.graph.phenomena.values():
+            by_src.setdefault(ph.source_id or "(unbound)", []).append(ph)
+        if not by_src:
+            return "(no phenomena recorded)"
+        # Resolve source labels via graph.case when possible.
+        def _label(src_id: str) -> str:
+            if self.graph.case:
+                src = self.graph.case.get_source(src_id)
+                if src:
+                    return f"{src_id} — {src.label} ({src.type})"
+            return src_id
+
+        lines = [f"=== Phenomena by Source ({len(by_src)} source(s)) ==="]
+        for src_id in sorted(by_src):
+            phs = by_src[src_id]
+            lines.append(f"\n--- {_label(src_id)} ({len(phs)} phenomena) ---")
+            for ph in phs:
+                grounded = "G" if ph.verified_facts and ph.source_tool else "·"
+                lines.append(f"  [{grounded}] {ph.summary()}")
         return "\n".join(lines)
 
     async def _get_entities(self) -> str:
@@ -174,18 +314,27 @@ class ReportAgent(BaseAgent):
         return "\n".join(lines)
 
     async def _verify_phenomena(self) -> str:
-        verified = []
-        unverified = []
+        grounded: list[str] = []
+        tool_only: list[str] = []
+        unverified: list[str] = []
         for ph in self.graph.phenomena.values():
-            entry = f"  [{ph.category}] {ph.title} (agent: {ph.source_agent}, tool: {ph.source_tool or 'N/A'})"
-            if ph.source_tool:
-                verified.append(entry)
+            nf = len(ph.verified_facts)
+            entry = (
+                f"  [{ph.category}] {ph.title} "
+                f"(agent: {ph.source_agent}, tool: {ph.source_tool or 'N/A'}, facts: {nf})"
+            )
+            if ph.verified_facts and ph.source_tool:
+                grounded.append(entry)
+            elif ph.source_tool:
+                tool_only.append(entry)
             else:
                 unverified.append(entry)
 
         lines = ["=== Phenomena Verification Report ==="]
-        lines.append(f"\nVERIFIED ({len(verified)} — have source_tool):")
-        lines.extend(verified)
+        lines.append(f"\nGROUNDED ({len(grounded)} — facts + source_tool):")
+        lines.extend(grounded)
+        lines.append(f"\nTOOL-ONLY ({len(tool_only)} — source_tool, no facts):")
+        lines.extend(tool_only)
         lines.append(f"\nUNVERIFIED ({len(unverified)} — no source_tool):")
         lines.extend(unverified)
         return "\n".join(lines)
diff --git a/agents/timeline.py b/agents/timeline.py
index 8efb955..4704daf 100644
--- a/agents/timeline.py
+++ b/agents/timeline.py
@@ -122,7 +122,15 @@ class TimelineAgent(BaseAgent):
         lines = []
         for ph in items:
             lines.append(f"{ph.timestamp} | [{ph.category}] {ph.title} ({ph.id})")
-            lines.append(f"  {ph.description[:150]}")
+            preview = ph.interpretation[:150] if ph.interpretation else ""
+            if ph.verified_facts:
+                fact_preview = ", ".join(
+                    f"{f.get('type','?')}={str(f.get('value',''))[:40]}"
+                    for f in ph.verified_facts[:3]
+                )
+                preview = f"{preview} [facts: {fact_preview}]" if preview else f"[facts: {fact_preview}]"
+            if preview:
+                lines.append(f"  {preview}")
         return "\n".join(lines)
 
     async def _add_temporal_edge(
diff --git a/base_agent.py b/base_agent.py
index 5357fe7..307887d 100644
--- a/base_agent.py
+++ b/base_agent.py
@@ -5,6 +5,7 @@ from __future__ import annotations
 import json
 import logging
 import time
+import uuid
 from typing import Any
 
 from evidence_graph import EvidenceGraph
@@ -36,7 +37,9 @@ class BaseAgent:
     # forced retry with an explicit "you forgot to record" instruction.
     # Subclasses override to declare their own recording responsibility
     # (timeline → add_temporal_edge, hypothesis → add_hypothesis, report → save_report).
-    mandatory_record_tools: tuple[str, ...] = ("add_phenomenon",)
+    # observe_identity (S5) counts as a recording too — it writes through the
+    # same grounding gateway and produces an identity_observation phenomenon.
+    mandatory_record_tools: tuple[str, ...] = ("add_phenomenon", "observe_identity")
 
     # Tools whose invocation ends the run immediately. After any terminal tool
     # is called, tool_call_loop returns with that tool's result text as
@@ -110,8 +113,23 @@ class BaseAgent:
             f"  Call investigation tools (list_directory, parse_registry_key, etc.) to gather data.\n"
             f"  Only extract_file for forensically relevant files (user data, logs, configs, hives) — NOT system DLLs or OS files.\n"
             f"  Create add_lead for anything outside your expertise.\n\n"
-            f"Phase B — RECORD PHENOMENA:\n"
-            f"  For EACH significant finding from Phase A, call add_phenomenon.\n"
+            f"Phase B — RECORD PHENOMENA (GROUNDED):\n"
+            f"  For EACH significant finding from Phase A, call add_phenomenon with:\n"
+            f"    * interpretation: your analysis — free text, NOT verified.\n"
+            f"    * verified_facts: one entry per concrete atom (path, timestamp,\n"
+            f"      inode, hash, identifier, count) you want recorded as truth.\n"
+            f"      Each entry MUST have:\n"
+            f"        - type: e.g. 'path', 'timestamp', 'inode', 'hash', 'identifier', 'count'\n"
+            f"        - value: a VERBATIM substring from the tool output\n"
+            f"        - invocation_id: the inv-xxx ID from the '[invocation: inv-xxx]'\n"
+            f"          header at the top of the tool result that produced this value\n"
+            f"  IDENTIFIERS — call observe_identity (in ADDITION to add_phenomenon)\n"
+            f"  whenever you see an email, phone number, Apple ID, IMEI, wallet\n"
+            f"  address, MAC, UDID, persistent nickname, or display name. Same\n"
+            f"  grounding contract: value must be verbatim in the cited tool\n"
+            f"  output. This is HOW cross-source attribution gets built — without\n"
+            f"  it, we can't tell whether the Apple ID in keychain belongs to the\n"
+            f"  same person as the Windows account on the USB.\n"
             f"  Do NOT call link_to_entity yet — just record all phenomena first.\n\n"
             f"Phase C — LINK ENTITIES:\n"
             f"  FIRST call list_phenomena to get the current IDs — do NOT rely on memory.\n"
@@ -125,20 +143,22 @@ class BaseAgent:
             f"- You MUST call add_phenomenon for EVERY significant finding BEFORE you stop.\n"
             f"- NEGATIVE findings count too. If you searched X (a directory, a pattern, "
             f"a registry key) and found NOTHING, that absence IS evidence — call "
-            f"add_phenomenon with a 'No matches for X' title and the search scope in "
-            f"raw_data. Negative findings constrain the hypothesis space and prevent "
-            f"the next agent from wasting time re-searching.\n"
+            f"add_phenomenon with a 'No matches for X' title, the search scope in "
+            f"raw_data, and cite the search tool's invocation_id (verified_facts may "
+            f"be empty for a true negative; the cited invocation in source_tool still "
+            f"anchors it). Negative findings constrain the hypothesis space.\n"
             f"- If you stop without having called add_phenomenon at least once, the task "
-            f"is FAILED and a forced retry will fire.\n"
-            f"- Include exact file paths, inode numbers, timestamps, and the source_tool "
-            f"that produced each finding.\n\n"
-            f"ANTI-HALLUCINATION RULES — STRICTLY ENFORCED:\n"
-            f"- ONLY record findings that appear VERBATIM in tool results you received\n"
-            f"- NEVER invent or guess timestamps, file paths, inode numbers, or program names\n"
-            f"- If tool output was truncated, state '[truncated]' — do NOT fill in the missing data\n"
-            f"- If you are unsure whether something exists, call a tool to verify or create a lead — do NOT assume\n"
-            f"- Quote exact strings from tool output when recording evidence descriptions\n"
-            f"- Do NOT fabricate execution timestamps — only report timestamps returned by tools"
+            f"is FAILED and a forced retry will fire.\n\n"
+            f"GROUNDING GATEWAY — STRUCTURALLY ENFORCED:\n"
+            f"- Every tool result begins with '[invocation: inv-xxxxxxxx]' — that ID\n"
+            f"  is what you cite in each fact's invocation_id.\n"
+            f"- fact.value must be a substring of the cited invocation's output.\n"
+            f"  Case, whitespace, and path-separator (/ ↔ \\) variants are tolerated;\n"
+            f"  anything else fabricated is REJECTED with a per-fact reason.\n"
+            f"- On REJECTED: quote the literal text from the output (or drop the\n"
+            f"  fact), and put guesses / inferred paths / model names in\n"
+            f"  `interpretation` instead. Then call add_phenomenon again.\n"
+            f"- You may cite ONLY invocations made within THIS task."
         )
 
     async def run(self, task: str, lead_id: str | None = None) -> str:
@@ -146,6 +166,11 @@ class BaseAgent:
         _log(task, event="agent_start", agent=self.name)
         self.graph.agent_status[self.name] = "running"
         self.graph._current_agent = self.name
+        # Fresh task scope per agent run. Used by the grounding gateway to
+        # check that facts in add_phenomenon cite invocations made *within
+        # this run* — preventing the agent from forwarding stale IDs from
+        # earlier work or another agent.
+        self.graph._current_task_id = f"task-{uuid.uuid4().hex[:8]}"
         self._current_lead_id = lead_id
 
         self._register_graph_tools()
@@ -350,20 +375,67 @@ class BaseAgent:
         self.register_tool(
             name="add_phenomenon",
             description=(
-                "Record a forensic finding (phenomenon) on the evidence graph. "
-                "You MUST specify source_tool: the name of the tool call that produced this finding."
+                "Record a forensic finding on the evidence graph. The finding is "
+                "split into provenance-bound atoms (verified_facts) and free-form "
+                "analysis (interpretation). Each fact MUST cite the invocation_id "
+                "of a tool call you made in THIS task — the gateway checks every "
+                "fact's value against that call's real output (verbatim substring). "
+                "Any fact that fails grounding causes the whole record to be "
+                "rejected with a list of failures; fix the facts and call again."
             ),
             input_schema={
                 "type": "object",
                 "properties": {
                     "category": {"type": "string", "description": "Category of the finding."},
                     "title": {"type": "string", "description": "Short title."},
-                    "description": {"type": "string", "description": "Detailed description. Quote exact data from tool output."},
+                    "interpretation": {
+                        "type": "string",
+                        "description": (
+                            "Free-form analysis text — your reasoning, why this "
+                            "matters, what it implies. NOT verified by the gateway. "
+                            "Rendered in reports as 'agent analysis', not truth."
+                        ),
+                    },
+                    "verified_facts": {
+                        "type": "array",
+                        "description": (
+                            "Atoms you want preserved as ground truth. Each must "
+                            "appear verbatim in the cited tool output."
+                        ),
+                        "items": {
+                            "type": "object",
+                            "properties": {
+                                "type": {
+                                    "type": "string",
+                                    "description": (
+                                        "Kind of fact: path, timestamp, inode, "
+                                        "hash, identifier, count, raw, ..."
+                                    ),
+                                },
+                                "value": {
+                                    "type": "string",
+                                    "description": (
+                                        "Verbatim substring from the cited tool "
+                                        "output. The gateway does a literal "
+                                        "string-in-string check — no paraphrasing."
+                                    ),
+                                },
+                                "invocation_id": {
+                                    "type": "string",
+                                    "description": (
+                                        "ID from the '[invocation: inv-xxx]' header "
+                                        "of the tool call that produced this value."
+                                    ),
+                                },
+                            },
+                            "required": ["type", "value", "invocation_id"],
+                        },
+                    },
                     "raw_data": {"type": "object", "description": "Structured raw data supporting this finding."},
                     "timestamp": {"type": "string", "description": "Timestamp if any. ONLY use timestamps from tool output."},
                     "source_tool": {"type": "string", "description": "Name of the tool that produced this (e.g. 'list_directory')."},
                 },
-                "required": ["category", "title", "description", "source_tool"],
+                "required": ["category", "title", "source_tool"],
             },
             executor=self._add_phenomenon,
         )
@@ -414,6 +486,67 @@ class BaseAgent:
             executor=self._link_to_entity,
         )
 
+        self.register_tool(
+            name="observe_identity",
+            description=(
+                "Record a typed identifier (email / phone / Apple ID / IMEI / "
+                "wallet address / nickname / display name / …) for an entity. "
+                "Goes through the same grounding gateway as add_phenomenon — "
+                "value MUST be a verbatim substring of the cited tool output. "
+                "After attachment, the engine automatically proposes / "
+                "strengthens / weakens cross-source coreference hypotheses "
+                "between this entity and any others carrying the same or "
+                "conflicting identifiers. This is how 'is the Apple ID in iOS "
+                "keychain the same person as the Windows login name?' gets "
+                "answered. Call this in ADDITION to add_phenomenon for "
+                "identifier-bearing findings."
+            ),
+            input_schema={
+                "type": "object",
+                "properties": {
+                    "entity_name": {"type": "string", "description": "Human-readable entity name (e.g. 'LEUNG YL', 'alice@example.com')."},
+                    "entity_type": {
+                        "type": "string",
+                        "enum": ["person", "program", "file", "host", "ip_address"],
+                        "description": "Kind of entity this identifier belongs to (usually 'person').",
+                    },
+                    "identifier_type": {
+                        "type": "string",
+                        "description": (
+                            "Strong (near-unique): email, phone_number, imei, "
+                            "imsi, apple_id, icloud_id, google_account, "
+                            "wallet_address, udid, mac_address, device_serial. "
+                            "Weak (free-form, may collide): nickname, "
+                            "display_name, username, screen_name."
+                        ),
+                    },
+                    "value": {
+                        "type": "string",
+                        "description": (
+                            "The identifier value, quoted VERBATIM from the "
+                            "tool output you cite in invocation_id."
+                        ),
+                    },
+                    "invocation_id": {
+                        "type": "string",
+                        "description": (
+                            "ID from the '[invocation: inv-xxx]' header of "
+                            "the tool call that surfaced this identifier."
+                        ),
+                    },
+                    "source_tool": {
+                        "type": "string",
+                        "description": "Name of the tool that produced the identifier.",
+                    },
+                },
+                "required": [
+                    "entity_name", "entity_type", "identifier_type",
+                    "value", "invocation_id",
+                ],
+            },
+            executor=self._observe_identity,
+        )
+
     # ---- Tool executors -----------------------------------------------------
 
     async def _list_phenomena(self, category: str | None = None) -> str:
@@ -453,16 +586,29 @@ class BaseAgent:
         self,
         category: str,
         title: str,
-        description: str,
+        interpretation: str = "",
+        verified_facts: list[dict] | None = None,
         raw_data: dict | None = None,
         timestamp: str | None = None,
         source_tool: str = "",
+        # Back-compat: older prompts (and accidental LLM emissions) may pass
+        # ``description``; treat it as ``interpretation`` rather than failing.
+        description: str | None = None,
     ) -> str:
+        if description and not interpretation:
+            interpretation = description
+        # GroundingError propagates: llm_client._execute_single_tool turns
+        # raised exceptions into "Error executing add_phenomenon: <msg>" tool
+        # results the LLM sees, and _wrap_record_executor does NOT increment
+        # the mandatory-record counter (the increment only runs after a
+        # successful return), so the forced-retry mechanism still fires if
+        # the agent never lands a grounded phenomenon.
         pid, merged = await self.graph.add_phenomenon(
             source_agent=self.name,
             category=category,
             title=title,
-            description=description,
+            interpretation=interpretation,
+            verified_facts=verified_facts,
             raw_data=raw_data,
             timestamp=timestamp,
             source_tool=source_tool,
@@ -508,6 +654,51 @@ class BaseAgent:
         status = "linked to existing" if existing else "created and linked"
         return f"Entity {status}: {entity_name} ({entity_type}) ←[{edge_type}]— {phenomenon_id}"
 
+    async def _observe_identity(
+        self,
+        entity_name: str,
+        entity_type: str,
+        identifier_type: str,
+        value: str,
+        invocation_id: str,
+        source_tool: str = "",
+    ) -> str:
+        # GroundingError / ValueError propagate to llm_client's per-tool
+        # exception handler, which formats them back to the LLM. That keeps
+        # the mandatory-record counter honest — only a successful return
+        # triggers the increment in _wrap_record_executor.
+        result = await self.graph.observe_identity(
+            entity_name=entity_name,
+            entity_type=entity_type,
+            identifier_type=identifier_type,
+            value=value,
+            source_agent=self.name,
+            source_tool=source_tool,
+            invocation_id=invocation_id,
+        )
+        lines = [
+            f"Identity observed: {identifier_type}={value} "
+            f"on entity {result['entity_id']} ({entity_name})."
+        ]
+        if result.get("new_identifier"):
+            lines.append(
+                f"  Observation phenomenon: {result['phenomenon_id']}"
+            )
+        else:
+            lines.append("  (identifier already recorded on this entity — idempotent)")
+        for prop in result.get("coref_proposals", []):
+            lines.append(
+                f"  → Coref candidate: {prop['other_entity_id']} via "
+                f"{prop['match']['edge_type']} (conf={prop['confidence']:.2f}, "
+                f"hypothesis={prop['hypothesis_id']})"
+            )
+            for c in prop.get("conflicts", []):
+                lines.append(
+                    f"      ⚠ conflict on {c['type']}: "
+                    f"{c['new_value']} vs {c['other_value']}"
+                )
+        return "\n".join(lines)
+
     async def _list_assets(self, category: str | None = None) -> str:
         results = self.graph.list_assets(category)
         if not results:
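
Reviewer note on the grounding contract used by add_phenomenon and observe_identity above. The gateway itself is not part of this hunk, so the following is only a minimal sketch of the check the prompt text describes: a substring match that tolerates case, whitespace, and path-separator differences, with a per-fact rejection reason. The names `is_grounded`, `check_facts`, and `outputs_by_invocation` are hypothetical and chosen for illustration; the real implementation in evidence_graph.py may differ.

```python
import re


def _normalise(text: str) -> str:
    """Lowercase, unify path separators, and collapse whitespace runs."""
    text = text.replace("\\", "/").lower()
    return re.sub(r"\s+", " ", text).strip()


def is_grounded(value: str, tool_output: str) -> bool:
    """True if value occurs in tool_output, ignoring case, whitespace runs,
    and path-separator differences. Empty values never count as grounded."""
    needle = _normalise(value)
    return bool(needle) and needle in _normalise(tool_output)


def check_facts(facts, outputs_by_invocation):
    """Return (index, reason) pairs for every fact that fails grounding.

    outputs_by_invocation is assumed to hold only invocations made in the
    current task, so a stale or foreign invocation_id is simply unknown.
    """
    failures = []
    for i, fact in enumerate(facts):
        output = outputs_by_invocation.get(fact.get("invocation_id", ""))
        if output is None:
            failures.append((i, "unknown invocation_id for this task"))
        elif not is_grounded(str(fact.get("value", "")), output):
            failures.append((i, "value not found in cited output"))
    return failures
```

Under these assumptions an empty list means the whole record can be accepted, and any non-empty list maps onto the "rejected with a list of failures" behaviour described in the tool description.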
diff --git a/case.example.yaml b/case.example.yaml
new file mode 100644
index 0000000..b3ed843
--- /dev/null
+++ b/case.example.yaml
@@ -0,0 +1,41 @@
+# MASForensics case definition — template
+#
+# Copy this file to `case.yaml` and edit it for your case. If `case.yaml`
+# exists in the working directory, `python main.py` loads it automatically;
+# otherwise main.py falls back to interactive single-image selection.
+#
+# A case is a set of evidence sources. Each source has:
+#   id              optional — auto-derived from label if omitted ("src-<slug>")
+#   label           human-readable name
+#   type            disk_image | mobile_extraction | archive | media_collection
+#   access_mode     image | tree   (optional — defaults by type)
+#                     image = block device / disk image, navigated by Sleuth Kit
+#                     tree  = mounted filesystem / unpacked extraction, path-based
+#   owner           optional — the person the source is associated with
+#   path            filesystem path (relative paths resolve against this file)
+#   partition_offset  image-mode only — sector offset of the partition to analyze
+#   meta            optional free-form notes
+#
+# NOTE: at the current refit stage only image-mode (disk) sources are
+# analysable; tree-mode sources are accepted but skipped.
+
+case_id: example-case
+name: "Example forensic case"
+meta:
+  notes: "free-form case-level metadata"
+
+sources:
+  - id: src-suspect-laptop
+    label: "Suspect laptop disk image"
+    type: disk_image
+    access_mode: image
+    owner: "John Doe"
+    path: image/suspect_laptop.E01
+    partition_offset: 0               # run `mmls <image>` to find the right offset
+
+  - id: src-suspect-phone
+    label: "Suspect phone extraction"
+    type: mobile_extraction
+    access_mode: tree
+    owner: "John Doe"
+    path: image/suspect_phone.zip
diff --git a/case.py b/case.py
new file mode 100644
index 0000000..f6c894d
--- /dev/null
+++ b/case.py
@@ -0,0 +1,226 @@
+"""Case and evidence-source model — the foundation for multi-evidence analysis.
+
+A :class:`Case` is a collection of :class:`EvidenceSource` entries. Each source
+has a *type* (disk image, mobile extraction, archive, ...) and an *access mode*
+that determines how forensic tools reach its contents:
+
+  - ``"image"`` — a block device / disk image, navigated by The Sleuth Kit via
+    inode addressing (raw, E01, dd, ...).
+  - ``"tree"``  — an already-mounted filesystem or unpacked extraction,
+    navigated by ordinary filesystem paths.
+
+This module is pure data model + loading. Partition probing and interactive
+selection live in ``main.py``.
+"""
+
+from __future__ import annotations
+
+import logging
+import re
+from dataclasses import asdict, dataclass, field
+from pathlib import Path
+
+logger = logging.getLogger(__name__)
+
+# Recognised source types and access modes.
+SOURCE_TYPES = {"disk_image", "mobile_extraction", "archive", "media_collection"}
+ACCESS_MODES = {"image", "tree"}
+
+# Disk-image file extensions for interactive discovery.
+# P6 fix: ``.bin`` (and vmdk/vhd) added — extension globbing previously missed
+# raw block-device dumps such as ``blk0_sda.bin``.
+DISK_IMAGE_EXTS = {
+    ".001", ".dd", ".raw", ".img", ".bin", ".e01", ".iso", ".vmdk", ".vhd",
+}
+
+# Default access mode per source type.
+_DEFAULT_ACCESS_MODE = {
+    "disk_image": "image",
+    "mobile_extraction": "tree",
+    "archive": "tree",
+    "media_collection": "tree",
+}
+
+
+def slugify(text: str) -> str:
+    """Reduce *text* to a lowercase, hyphen-separated slug for use in IDs."""
+    slug = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
+    return slug or "src"
+
+
+@dataclass
+class EvidenceSource:
+    """One piece of evidence within a :class:`Case`."""
+
+    id: str                       # "src-<slug>"
+    label: str                    # human-readable name
+    type: str                     # one of SOURCE_TYPES
+    path: str                     # filesystem path to the evidence
+    access_mode: str              # "image" | "tree"
+    owner: str = ""               # associated person, if known
+    partition_offset: int = 0     # sector offset (image-mode sources only)
+    meta: dict = field(default_factory=dict)
+
+    def to_dict(self) -> dict:
+        return asdict(self)
+
+    @classmethod
+    def from_dict(cls, d: dict) -> EvidenceSource:
+        """Reconstruct from a dict, ignoring unknown keys (forward-compatible)."""
+        known = set(cls.__dataclass_fields__)
+        return cls(**{k: v for k, v in d.items() if k in known})
+
+    def summary(self) -> str:
+        loc = (
+            f"@{self.partition_offset}"
+            if self.access_mode == "image" and self.partition_offset
+            else ""
+        )
+        owner = f" owner={self.owner}" if self.owner else ""
+        return f"[{self.id}] {self.label} ({self.type}/{self.access_mode}{loc}){owner}"
+
+
+@dataclass
+class Case:
+    """A forensic case: a set of evidence sources plus metadata."""
+
+    case_id: str
+    name: str
+    sources: list[EvidenceSource] = field(default_factory=list)
+    meta: dict = field(default_factory=dict)
+
+    def to_dict(self) -> dict:
+        return {
+            "case_id": self.case_id,
+            "name": self.name,
+            "sources": [s.to_dict() for s in self.sources],
+            "meta": dict(self.meta),
+        }
+
+    @classmethod
+    def from_dict(cls, d: dict) -> Case:
+        return cls(
+            case_id=d.get("case_id", ""),
+            name=d.get("name", ""),
+            sources=[EvidenceSource.from_dict(s) for s in d.get("sources", [])],
+            meta=d.get("meta", {}),
+        )
+
+    def get_source(self, source_id: str) -> EvidenceSource | None:
+        for s in self.sources:
+            if s.id == source_id:
+                return s
+        return None
+
+
+# ---------------------------------------------------------------------------
+# case.yaml loading
+# ---------------------------------------------------------------------------
+
+def _build_source(raw: dict, base_dir: Path, index: int) -> EvidenceSource:
+    """Validate and normalise one source entry from case.yaml.
+
+    Missing ``id`` is derived from the label; missing ``access_mode`` defaults
+    by type; relative paths are resolved against *base_dir* (the case file's
+    directory).
+    """
+    label = str(raw.get("label") or raw.get("id") or f"source-{index}")
+    src_type = str(raw.get("type", "disk_image"))
+    if src_type not in SOURCE_TYPES:
+        logger.warning("Unknown source type %r for %r — treating as disk_image",
+                        src_type, label)
+        src_type = "disk_image"
+
+    access_mode = str(raw.get("access_mode") or _DEFAULT_ACCESS_MODE.get(src_type, "tree"))
+    if access_mode not in ACCESS_MODES:
+        logger.warning("Unknown access_mode %r for %r — defaulting", access_mode, label)
+        access_mode = _DEFAULT_ACCESS_MODE.get(src_type, "tree")
+
+    src_id = str(raw.get("id") or f"src-{slugify(label)}")
+    if not src_id.startswith("src-"):
+        src_id = f"src-{slugify(src_id)}"
+
+    raw_path = str(raw.get("path") or "").strip()  # tolerate an explicit YAML null
+    path = raw_path
+    if raw_path:
+        p = Path(raw_path).expanduser()
+        if not p.is_absolute():
+            p = (base_dir / p)
+        path = str(p)
+
+    return EvidenceSource(
+        id=src_id,
+        label=label,
+        type=src_type,
+        path=path,
+        access_mode=access_mode,
+        owner=str(raw.get("owner") or ""),  # explicit YAML null must not become "None"
+        partition_offset=int(raw.get("partition_offset", 0) or 0),
+        meta=dict(raw.get("meta") or {}),
+    )
+
+
+def build_case(data: dict, base_dir: Path | None = None) -> Case:
+    """Build a validated :class:`Case` from a loosely-typed case.yaml dict."""
+    base_dir = base_dir or Path.cwd()
+    sources: list[EvidenceSource] = []
+    seen_ids: set[str] = set()
+    for i, raw in enumerate(data.get("sources", []) or []):
+        if not isinstance(raw, dict):
+            logger.warning("Skipping malformed source entry #%d", i)
+            continue
+        src = _build_source(raw, base_dir, i)
+        if src.id in seen_ids:
+            src.id = f"{src.id}-{i}"
+        seen_ids.add(src.id)
+        if not src.path:
+            logger.warning("Source %r has no path — keeping but it is not analysable",
+                            src.label)
+        sources.append(src)
+
+    return Case(
+        case_id=str(data.get("case_id", "case")),
+        name=str(data.get("name", "Untitled case")),
+        sources=sources,
+        meta=dict(data.get("meta") or {}),
+    )
+
+
+def load_case(path: str | Path = "case.yaml") -> Case | None:
+    """Load a :class:`Case` from a case.yaml file. Returns None if absent."""
+    case_path = Path(path)
+    if not case_path.exists():
+        return None
+    import yaml
+
+    try:
+        data = yaml.safe_load(case_path.read_text(encoding="utf-8")) or {}
+    except Exception as e:
+        logger.error("Failed to parse %s: %s", case_path, e)
+        return None
+    if not isinstance(data, dict):
+        logger.error("%s is not a YAML mapping", case_path)
+        return None
+
+    case = build_case(data, base_dir=case_path.resolve().parent)
+    logger.info("Loaded case %r with %d source(s) from %s",
+                case.name, len(case.sources), case_path)
+    return case
+
+
+def single_source_case(
+    image_path: str,
+    partition_offset: int = 0,
+    label: str | None = None,
+) -> Case:
+    """Wrap a single disk image as a one-source Case (interactive fallback)."""
+    name = label or Path(image_path).name
+    src = EvidenceSource(
+        id=f"src-{slugify(Path(image_path).stem)}",
+        label=name,
+        type="disk_image",
+        path=image_path,
+        access_mode="image",
+        partition_offset=partition_offset,
+    )
+    return Case(case_id="adhoc", name=name, sources=[src])
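
Reviewer note on the evidence_graph.py change that follows. Its header comment states the update rule L_post = L_prior + Σ log10(LR_i) with confidence = 1 / (1 + 10^(−L_post)). As a rough sketch of why this form is order-independent, the snippet below applies the default table from that hunk to a prior and checks every permutation of the same edges. The function names are hypothetical, and the prior-to-log-odds conversion is an assumption about how the graph seeds L_prior.

```python
import itertools
import math

# Values copied from _DEFAULT_LOG_LR in the evidence_graph.py hunk.
LOG_LR = {
    "direct_evidence": 2.0,
    "supports": 1.0,
    "consequence_observed": 1.0,
    "prerequisite_met": 0.5,
    "weakens": -0.5,
    "contradicts": -2.0,
}


def confidence_from_log_odds(log_odds: float) -> float:
    """Map base-10 log-odds back to a probability."""
    return 1.0 / (1.0 + 10.0 ** (-log_odds))


def posterior_confidence(prior: float, edge_types) -> float:
    """Add each edge's log10(LR) to the prior log-odds, then convert back.

    Assumes 0 < prior < 1 so the prior odds are finite.
    """
    log_odds = math.log10(prior / (1.0 - prior))
    log_odds += sum(LOG_LR[e] for e in edge_types)
    return confidence_from_log_odds(log_odds)


edges = ["supports", "contradicts", "prerequisite_met"]
results = {
    round(posterior_confidence(0.5, perm), 12)
    for perm in itertools.permutations(edges)
}
print(results)  # a single value: addition does not care about arrival order
```

Because the contributions are summed before the single conversion back to a probability, mixing positive and negative edges cannot produce different results for different arrival orders, which is the property the old delta update lacked.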
diff --git a/evidence_graph.py b/evidence_graph.py
index c1f9999..4735b8d 100644
--- a/evidence_graph.py
+++ b/evidence_graph.py
@@ -8,35 +8,180 @@ Edges: typed relationships with predefined weights for hypothesis confidence com
 from __future__ import annotations
 
 import asyncio
+import contextvars
+import hashlib
 import json
 import logging
+import re
 import uuid
 from dataclasses import asdict, dataclass, field
 from datetime import datetime
 from pathlib import Path
 
+from case import Case, EvidenceSource, single_source_case
+
+# Per-asyncio-task scoped values for "which agent is currently running" and
+# "which task scope does that agent's grounding live in". Backed by
+# ContextVars so concurrent agent runs (Phase 3's _dispatch_leads_parallel)
+# don't clobber each other: asyncio.create_task / asyncio.gather copy
+# the parent context per child task, and writes inside one task stay there.
+# Pre-P0 these were plain attributes on EvidenceGraph; the last setter won
+# under concurrency, tagging tool invocations with the WRONG agent and
+# making the grounding gateway falsely reject legitimate facts.
+_current_agent_ctx: contextvars.ContextVar[str] = contextvars.ContextVar(
+    "masforensics_current_agent", default="",
+)
+_current_task_id_ctx: contextvars.ContextVar[str] = contextvars.ContextVar(
+    "masforensics_current_task_id", default="",
+)
+
 logger = logging.getLogger(__name__)
 
 # ---------------------------------------------------------------------------
-# Default edge weights for Phenomenon → Hypothesis relationships.
-# LLM only picks the edge type (categorical); the weight is looked up here.
-# Override per-graph via EvidenceGraph(edge_weights=...) or config.yaml's
-# `hypothesis_edge_weights` section.
+# Per-edge-type log₁₀(LR) — the calibration table backing hypothesis
+# confidence updates (DESIGN.md §4.5).
+#
+# The LLM only picks the *category* (direct_evidence, supports, …); the
+# numerical contribution is looked up here. Updates use the additive,
+# order-independent log-odds form
+#     L_post = L_prior + Σ log10(LR_i)
+#     confidence = 1 / (1 + 10^(−L_post))
+# which fixes the pre-S3 delta-update bug whose result depended on the
+# order edges arrived in.
+#
+# Override per-graph via EvidenceGraph(edge_log_lr=...) or config.yaml's
+# `hypothesis_log_lr` section.
 # ---------------------------------------------------------------------------
-_DEFAULT_EDGE_WEIGHTS: dict[str, float] = {
-    "direct_evidence": +0.25,
-    "supports": +0.15,
-    "prerequisite_met": +0.10,
-    "consequence_observed": +0.15,
-    "contradicts": -0.20,
-    "weakens": -0.10,
+_DEFAULT_LOG_LR: dict[str, float] = {
+    "direct_evidence":      +2.0,
+    "supports":             +1.0,
+    "consequence_observed": +1.0,
+    "prerequisite_met":     +0.5,
+    "weakens":              -0.5,
+    "contradicts":          -2.0,
+    # S5 cross-source coref (DESIGN.md §4.6) — same calibration scale.
+    # A single shared strong identifier (email, phone, wallet, IMEI…) is
+    # near-decisive; weak identifiers (nickname) accumulate slowly; a
+    # conflicting strong identifier is strong negative evidence.
+    "shared_strong_identifier":      +2.0,
+    "shared_weak_identifier":        +0.5,
+    "conflicting_strong_identifier": -2.0,
 }
 
+
+# DESIGN.md §4.6 identifier taxonomy. Strong identifiers approximate
+# global uniqueness, so sharing one is high-confidence coref evidence.
+# Weak identifiers (nicknames, display names) accumulate weight slowly.
+STRONG_IDENTIFIER_TYPES: set[str] = {
+    "email",
+    "phone_number",
+    "imei",
+    "imsi",
+    "apple_id",
+    "icloud_id",
+    "google_account",
+    "wallet_address",
+    "udid",
+    "mac_address",
+    "device_serial",
+}
+
+WEAK_IDENTIFIER_TYPES: set[str] = {
+    "nickname",
+    "display_name",
+    "username",
+    "screen_name",
+}
+
+
+def is_strong_identifier(identifier_type: str) -> bool:
+    """True if the identifier carries enough uniqueness for a strong LR edge."""
+    return identifier_type in STRONG_IDENTIFIER_TYPES
+
+
+def _normalize_identifier(identifier_type: str, value: str) -> str:
+    """Canonicalise an identifier value so trivial spelling variants match.
+
+    - Lowercase for case-insensitive identifiers (email, hostnames, hex).
+    - For phone numbers keep digits only, which drops whitespace, punctuation
+      and the leading '+' of international dialling.
+    - Pass-through for free-form strings (nicknames).
+    """
+    v = (value or "").strip()
+    if identifier_type in {"email", "apple_id", "icloud_id", "google_account",
+                            "mac_address", "wallet_address", "udid",
+                            "imei", "imsi", "device_serial"}:
+        v = v.lower()
+    if identifier_type == "phone_number":
+        # Digits only, so formatting differences ("+", spaces, dashes) don't matter.
+        v = re.sub(r"\D", "", v)
+    return v
+
+
+def prob_to_log_odds(p: float) -> float:
+    """Logit (base 10). Clipped to keep ±∞ out of the graph."""
+    import math
+    p = max(1e-9, min(1 - 1e-9, float(p)))
+    return math.log10(p / (1.0 - p))
+
+
+def log_odds_to_prob(log_odds: float) -> float:
+    """Inverse of :func:`prob_to_log_odds`: 1 / (1 + 10^(−L))."""
+    return 1.0 / (1.0 + 10.0 ** (-float(log_odds)))
+
+
+_WS_RUN = re.compile(r"\s+")
+
+
+def _normalize_for_grounding(s: str) -> str:
+    """Canonicalise a string for the loose-match branch of fact grounding.
+
+    Strict ``value in inv.output`` rejected real evidence because the LLM
+    routinely normalises tool output before quoting:
+      - changes hex case (``89 50 4e 47`` → ``89 50 4E 47``)
+      - flips path separators (``Sunny\\foo.exe`` → ``Sunny/foo.exe``)
+      - collapses whitespace across newlines (``AppleID:\\n  alice@x.com``
+        → ``AppleID: alice@x.com``)
+
+    None of those are hallucinations — they're presentation choices.  This
+    normaliser maps both sides to the same canonical form so they line up:
+      - lowercase everything (handles hex case + email case + MAC case)
+      - collapse any run of whitespace to a single space
+      - replace ``\\`` with ``/`` (path-sep flip)
+
+    Genuine fabrications still fail: a value that doesn't appear (in any
+    form) inside the output normalises to a string that isn't a substring
+    of the normalised output, and the gateway rejects exactly as before.
+    """
+    if not s:
+        return ""
+    s = s.lower().replace("\\", "/")
+    s = _WS_RUN.sub(" ", s)
+    return s.strip()
+
+class GroundingError(ValueError):
+    """Raised by the add_phenomenon gateway when one or more verified_facts
+    fail the grounding check (missing/wrong invocation_id, wrong agent or
+    task, or fact.value not present in the cited tool output).
+
+    Carries the failed facts so callers (BaseAgent) can format them back
+    to the LLM for a corrective retry.
+    """
+
+    def __init__(self, message: str, failures: list[dict]) -> None:
+        super().__init__(message)
+        self.failures = failures
+
+
 # All valid edge types across the graph.
 VALID_EDGE_TYPES: set[str] = {
     # Phenomenon → Hypothesis
     "direct_evidence", "supports", "prerequisite_met",
     "consequence_observed", "contradicts", "weakens",
+    # Phenomenon → Hypothesis (S5 coref-specific — used between identifier
+    # observation phenomena and the "Entity A ≡ Entity B" coref hypothesis)
+    "shared_strong_identifier", "shared_weak_identifier",
+    "conflicting_strong_identifier",
     # Phenomenon → Phenomenon
     "temporal", "causal", "input_to", "modifies", "co_located", "corroborates",
     # Phenomenon → Entity
@@ -44,6 +189,8 @@ VALID_EDGE_TYPES: set[str] = {
     "associated_with", "found_on", "used_by",
     # Hypothesis → Hypothesis
     "refines", "conflicts", "depends_on",
+    # Entity → Entity (S5 — backed by a coref hypothesis ≥ threshold)
+    "same_as",
 }
 
 
@@ -55,22 +202,30 @@ def _compute_quality_score(
     source_tool: str,
     timestamp: str | None,
     raw_data: dict,
-    description: str,
+    interpretation: str,
+    verified_facts: list[dict],
     related_ids: list[str],
 ) -> float:
-    """Compute a quality score (0.0-1.0) based on evidence completeness."""
+    """Compute a quality score (0.0-1.0) based on evidence completeness.
+
+    A grounded phenomenon (any verified_facts) outweighs a long free-text
+    interpretation: the facts carry provenance, the interpretation doesn't.
+    """
     score = 0.0
     if source_tool:
-        score += 0.25
-    if timestamp is not None:
         score += 0.20
-    if raw_data:
-        score += 0.25
-    if len(description) >= 50:
+    if timestamp is not None:
         score += 0.15
+    if raw_data:
+        score += 0.15
+    if verified_facts:
+        # Capped contribution: 0.05 per fact up to 0.25.
+        score += min(0.25, 0.05 * len(verified_facts))
+    if len(interpretation) >= 50:
+        score += 0.10
     if related_ids:
         score += 0.15
-    return score
+    return min(1.0, score)
 
 
 def _jaccard_similarity(a: str, b: str) -> float:
@@ -84,17 +239,29 @@ def _jaccard_similarity(a: str, b: str) -> float:
 
 @dataclass
 class Phenomenon:
-    """Raw observable artifact found on disk."""
+    """Raw observable artifact found on disk.
+
+    DESIGN.md §4.4: a phenomenon is split into provenance-bound *facts*
+    and free-text *interpretation*. The gateway hard-validates every
+    fact against the recorded tool invocation it cites; interpretation
+    is the agent's narrative and is rendered as "agent analysis" in the
+    final report — not as truth.
+    """
 
     id: str                     # "ph-{uuid8}"
     source_agent: str
     category: str               # filesystem, registry, email, network, timeline
     title: str
-    description: str
+    # Free-form analysis text — the agent's reasoning. NOT verified.
+    interpretation: str = ""
+    # Grounded atoms. Each fact: {type, value, invocation_id}.
+    # type ∈ {path, timestamp, inode, hash, identifier, count, raw, ...}
+    verified_facts: list[dict] = field(default_factory=list)
     raw_data: dict = field(default_factory=dict)
     timestamp: str | None = None
     confidence: float = 1.0
     source_tool: str = ""
+    source_id: str = ""         # id of the EvidenceSource this finding came from
     corroborating_agents: list[str] = field(default_factory=list)
     from_lead_id: str | None = None
     created_at: str = ""
@@ -104,46 +271,98 @@ class Phenomenon:
 
     @classmethod
     def from_dict(cls, d: dict) -> Phenomenon:
-        return cls(**d)
+        """Reconstruct from a dict; migrate legacy ``description`` field.
+
+        Older runs persisted free text in ``description``; treat that as
+        ``interpretation`` so old graph_state.json files keep loading.
+        """
+        d = dict(d)
+        if "description" in d:
+            legacy = d.pop("description")
+            d.setdefault("interpretation", legacy or "")
+        d.setdefault("verified_facts", [])
+        known = set(cls.__dataclass_fields__)
+        return cls(**{k: v for k, v in d.items() if k in known})
 
     def summary(self) -> str:
         ts = f" @ {self.timestamp}" if self.timestamp else ""
-        return f"[{self.id}] [{self.category}] {self.title}{ts} (conf={self.confidence:.2f})"
+        nf = len(self.verified_facts)
+        facts_note = f" facts={nf}" if nf else ""
+        return (
+            f"[{self.id}] [{self.category}] {self.title}{ts} "
+            f"(conf={self.confidence:.2f}{facts_note})"
+        )
 
 
 @dataclass
 class Hypothesis:
-    """Interpretive claim about what happened on the system."""
+    """Interpretive claim about what happened on the system.
+
+    Confidence is a *derived* projection of ``log_odds`` (DESIGN.md §4.5):
+    every Phenomenon→Hypothesis edge contributes log₁₀(LR) to ``log_odds``,
+    and ``confidence = 1 / (1 + 10^(−log_odds))``. ``log_odds`` is the
+    canonical state; ``confidence`` is kept in sync for display and
+    threshold checks (≥0.8 supported / ≤0.2 refuted).
+
+    ``prior_prob`` seeds the starting log_odds (default 0.5 → 0.0).
+    """
 
     id: str                     # "hyp-{uuid8}"
     title: str
     description: str
-    confidence: float = 0.5
+    prior_prob: float = 0.5
+    log_odds: float = 0.0
+    confidence: float = 0.5     # derived from log_odds — kept in sync on update
     status: str = "active"      # active, supported, refuted, inconclusive
     parent_id: str | None = None
     created_by: str = ""        # "manual", "hypothesis_agent", agent name
     created_at: str = ""
     confidence_log: list[dict] = field(default_factory=list)
+    # S5 coref-specific: pair of entity ids this hypothesis claims are the
+    # same actor. Lets update_hypothesis_confidence sync the backing
+    # ``same_as`` edge automatically when contradicting evidence arrives.
+    coref_entity_pair: list[str] = field(default_factory=list)
 
     def to_dict(self) -> dict:
         return asdict(self)
 
     @classmethod
     def from_dict(cls, d: dict) -> Hypothesis:
-        return cls(**d)
+        """Reconstruct from a dict. Migrates pre-S3 records that only had
+        ``confidence`` by deriving ``log_odds`` via the logit transform.
+        """
+        d = dict(d)
+        if "log_odds" not in d:
+            d["log_odds"] = prob_to_log_odds(d.get("confidence", 0.5))
+        d.setdefault("prior_prob", 0.5)
+        # Re-sync confidence from log_odds in case of drift in old files.
+        d["confidence"] = log_odds_to_prob(d["log_odds"])
+        known = set(cls.__dataclass_fields__)
+        return cls(**{k: v for k, v in d.items() if k in known})
 
     def summary(self) -> str:
-        return f"[{self.id}] {self.title} (conf={self.confidence:.2f}, {self.status})"
+        return (
+            f"[{self.id}] {self.title} "
+            f"(conf={self.confidence:.2f}, L={self.log_odds:+.2f}, {self.status})"
+        )
 
 
 @dataclass
 class Entity:
-    """Recurring actor or object across phenomena."""
+    """Recurring actor or object across phenomena.
+
+    DESIGN.md §4.6 attaches typed identifiers directly to the entity for
+    fast blocking lookups during coref. Each identifier entry:
+        {type, value, normalized, strong, invocation_id, phenomenon_id, observed_at}
+    where ``normalized`` is the canonicalised form used for matching
+    (lower-cased email, digits-only phone, …).
+    """
 
     id: str                     # "ent-{uuid8}"
     name: str
     entity_type: str            # person, program, file, host, ip_address
     description: str = ""
+    identifiers: list[dict] = field(default_factory=list)
     created_at: str = ""
 
     def to_dict(self) -> dict:
@@ -151,10 +370,29 @@ class Entity:
 
     @classmethod
     def from_dict(cls, d: dict) -> Entity:
-        return cls(**d)
+        d = dict(d)
+        d.setdefault("identifiers", [])
+        known = set(cls.__dataclass_fields__)
+        return cls(**{k: v for k, v in d.items() if k in known})
+
+    def has_identifier(self, identifier_type: str, normalized_value: str) -> bool:
+        return any(
+            i.get("type") == identifier_type
+            and i.get("normalized") == normalized_value
+            for i in self.identifiers
+        )
 
     def summary(self) -> str:
-        return f"[{self.id}] {self.entity_type}: {self.name}"
+        idents = ""
+        if self.identifiers:
+            top = self.identifiers[:3]
+            preview = ", ".join(f"{i.get('type')}={i.get('value')}" for i in top)
+            extra = (
+                f" (+{len(self.identifiers) - 3} more)"
+                if len(self.identifiers) > 3 else ""
+            )
+            idents = f" [{preview}{extra}]"
+        return f"[{self.id}] {self.entity_type}: {self.name}{idents}"
 
 
 @dataclass
@@ -261,6 +499,41 @@ class ExtractedAsset:
         )
 
 
+@dataclass
+class ToolInvocation:
+    """One recorded tool call — the provenance unit for grounded facts.
+
+    Every wrapped tool executor records a ToolInvocation when it runs. The
+    grounding gateway looks these up by id when validating that a fact in
+    an ``add_phenomenon`` call traces back to a real tool output. Persisted
+    with the graph so a re-loaded run can still verify provenance.
+    """
+
+    id: str                     # "inv-{uuid8}"
+    tool: str                   # tool name as registered in TOOL_CATALOG
+    args: dict                  # kwargs passed to the executor
+    output: str                 # the raw output string the tool produced
+    output_sha256: str          # hexdigest — tamper-evident hash of output
+    agent: str                  # agent that issued the call
+    task_id: str                # agent run scope (graph._current_task_id at call time)
+    source_id: str              # active evidence source at call time
+    created_at: str             # ISO timestamp
+    cached: bool = False        # served from result cache without re-running
+
+    def to_dict(self) -> dict:
+        return asdict(self)
+
+    @classmethod
+    def from_dict(cls, d: dict) -> ToolInvocation:
+        return cls(**d)
+
+    def summary(self) -> str:
+        return (
+            f"[{self.id}] {self.tool}({json.dumps(self.args, ensure_ascii=False)}) "
+            f"@{self.created_at} agent={self.agent} cached={self.cached}"
+        )
+
+
 # ---------------------------------------------------------------------------
 # Evidence Graph
 # ---------------------------------------------------------------------------
@@ -277,16 +550,25 @@ class EvidenceGraph:
         self,
         case_info: dict | None = None,
         persist_path: Path | None = None,
-        edge_weights: dict[str, float] | None = None,
+        edge_log_lr: dict[str, float] | None = None,
     ) -> None:
         self.case_info: dict = case_info or {}
-        self.edge_weights: dict[str, float] = (
-            dict(edge_weights) if edge_weights else dict(_DEFAULT_EDGE_WEIGHTS)
+        # log₁₀(LR) per edge type — calibration table for confidence updates.
+        # Renamed from edge_weights (S3): the values are no longer deltas in
+        # confidence space, they are log-likelihood ratios in odds space.
+        self.edge_log_lr: dict[str, float] = (
+            dict(edge_log_lr) if edge_log_lr else dict(_DEFAULT_LOG_LR)
         )
         self.image_path: str = ""
         self.partition_offset: int = 0
         self.extracted_dir: str = "extracted"
 
+        # Multi-evidence: the case and the source tools/phenomena bind to.
+        # image_path / partition_offset above mirror active_source for
+        # backward-compatible readers; set_active_source keeps them in sync.
+        self.case: Case | None = None
+        self.active_source: EvidenceSource | None = None
+
         # Graph storage
         self.phenomena: dict[str, Phenomenon] = {}
         self.hypotheses: dict[str, Hypothesis] = {}
@@ -310,12 +592,43 @@ class EvidenceGraph:
         # gap-analysis coverage check.
         self.investigation_areas: dict[str, InvestigationArea] = {}
 
-        # Set by BaseAgent.run() before each agent execution
-        self._current_agent: str = ""
+        # Tool invocations — provenance log for grounded facts. Every wrapped
+        # tool executor records one entry; add_phenomenon's grounding gateway
+        # looks them up to validate cited invocation_ids and substring-match
+        # claimed fact values against real tool outputs.
+        self.tool_invocations: dict[str, ToolInvocation] = {}
+
+        # _current_agent / _current_task_id are exposed as @property below,
+        # backed by module-level ContextVars (race-free under asyncio.gather).
 
         self._lock = asyncio.Lock()
         self._persist_path: Path | None = persist_path
 
+    # ---- Per-asyncio-task scoped state ---------------------------------------
+    #
+    # Reads/writes through these properties hit ContextVars rather than
+    # instance attributes. Concurrent agent runs (Phase 3 parallel
+    # dispatch) each have their own task-local context, so writes inside
+    # one agent's run() are invisible to siblings — which means
+    # ``record_tool_invocation`` always tags an invocation with the agent
+    # and task scope that actually issued it.
+
+    @property
+    def _current_agent(self) -> str:
+        return _current_agent_ctx.get()
+
+    @_current_agent.setter
+    def _current_agent(self, value: str) -> None:
+        _current_agent_ctx.set(value or "")
+
+    @property
+    def _current_task_id(self) -> str:
+        return _current_task_id_ctx.get()
+
+    @_current_task_id.setter
+    def _current_task_id(self, value: str) -> None:
+        _current_task_id_ctx.set(value or "")
+
     # ---- Persistence -------------------------------------------------------
 
     def _auto_save(self) -> None:
@@ -325,6 +638,10 @@ class EvidenceGraph:
         try:
             state = {
                 "case_info": self.case_info,
+                "case": self.case.to_dict() if self.case else None,
+                "active_source_id": (
+                    self.active_source.id if self.active_source else ""
+                ),
                 "image_path": self.image_path,
                 "partition_offset": self.partition_offset,
                 "extracted_dir": self.extracted_dir,
@@ -338,6 +655,9 @@ class EvidenceGraph:
                 "investigation_areas": {
                     aid: a.to_dict() for aid, a in self.investigation_areas.items()
                 },
+                "tool_invocations": {
+                    iid: inv.to_dict() for iid, inv in self.tool_invocations.items()
+                },
                 "saved_at": datetime.now().isoformat(),
             }
             tmp = self._persist_path.with_suffix(".tmp")
@@ -357,18 +677,32 @@ class EvidenceGraph:
     def load_state(
         cls,
         path: Path,
-        edge_weights: dict[str, float] | None = None,
+        edge_log_lr: dict[str, float] | None = None,
     ) -> EvidenceGraph:
         """Restore an EvidenceGraph from a saved JSON state file."""
         data = json.loads(path.read_text())
         graph = cls(
             case_info=data.get("case_info", {}),
             persist_path=path,
-            edge_weights=edge_weights,
+            edge_log_lr=edge_log_lr,
         )
         graph.image_path = data.get("image_path", "")
         graph.partition_offset = data.get("partition_offset", 0)
         graph.extracted_dir = data.get("extracted_dir", "extracted")
+
+        # Restore the evidence-source model. State files predating the Case
+        # model carry only image_path/partition_offset → wrap as one source.
+        case_data = data.get("case")
+        if case_data:
+            graph.case = Case.from_dict(case_data)
+        elif graph.image_path:
+            graph.case = single_source_case(
+                graph.image_path, graph.partition_offset
+            )
+        if graph.case and graph.case.sources:
+            active = graph.case.get_source(data.get("active_source_id", ""))
+            graph.set_active_source(active or graph.case.sources[0])
+
         graph.phenomena = {
             pid: Phenomenon.from_dict(p)
             for pid, p in data.get("phenomena", {}).items()
@@ -392,6 +726,10 @@ class EvidenceGraph:
             aid: InvestigationArea.from_dict(a)
             for aid, a in data.get("investigation_areas", {}).items()
         }
+        graph.tool_invocations = {
+            iid: ToolInvocation.from_dict(inv)
+            for iid, inv in data.get("tool_invocations", {}).items()
+        }
         graph._rebuild_adjacency()
         logger.info(
             "EvidenceGraph restored: %d phenomena, %d hypotheses, %d entities, "
@@ -409,6 +747,21 @@ class EvidenceGraph:
             self._adj.setdefault(edge.source_id, []).append(edge)
             self._adj_rev.setdefault(edge.target_id, []).append(edge)
 
+    # ---- Evidence source ----------------------------------------------------
+
+    def set_active_source(self, source: EvidenceSource | None) -> None:
+        """Bind tools and newly recorded phenomena to *source*.
+
+        Syncs the legacy image_path / partition_offset fields so existing
+        readers (orchestrator logs, report naming, agent prompts) keep
+        working unchanged. The orchestrator calls this before dispatching an
+        agent; single-source runs call it once at startup.
+        """
+        self.active_source = source
+        if source is not None:
+            self.image_path = source.path
+            self.partition_offset = source.partition_offset
+
     # ---- Node helpers -------------------------------------------------------
 
     def _node_exists(self, node_id: str) -> bool:
@@ -432,7 +785,7 @@ class EvidenceGraph:
     # ---- Similarity merging (Phenomenon only) --------------------------------
 
     def _find_similar_phenomenon(
-        self, title: str, description: str, category: str,
+        self, title: str, interpretation: str, category: str,
     ) -> Phenomenon | None:
         best_match: Phenomenon | None = None
         best_score = 0.0
@@ -442,7 +795,9 @@ class EvidenceGraph:
             title_sim = _jaccard_similarity(ph.title, title)
             if title_sim <= 0.6:
                 continue
-            desc_sim = _jaccard_similarity(ph.description[:200], description[:200])
+            desc_sim = _jaccard_similarity(
+                ph.interpretation[:200], interpretation[:200],
+            )
             if desc_sim <= 0.4:
                 continue
             combined = title_sim * 0.6 + desc_sim * 0.4
@@ -458,19 +813,54 @@ class EvidenceGraph:
         source_agent: str,
         category: str,
         title: str,
-        description: str,
+        interpretation: str = "",
+        verified_facts: list[dict] | None = None,
         raw_data: dict | None = None,
         timestamp: str | None = None,
         source_tool: str = "",
         from_lead_id: str | None = None,
+        task_id: str | None = None,
+        # Pre-S2 callers passed analysis text as ``description``. Accept it
+        # as an alias for ``interpretation`` so legacy tests and any in-flight
+        # tool-call messages don't break. Not advertised in the LLM-facing
+        # tool schema — BaseAgent's add_phenomenon advertises the new fields.
+        description: str | None = None,
     ) -> tuple[str, bool]:
-        """Add a phenomenon. Returns (id, was_merged).
+        """Add a phenomenon under the grounding gateway. Returns (id, was_merged).
 
-        Confidence is auto-computed from evidence completeness (source_tool,
-        timestamp, raw_data, description length).
+        Each fact in ``verified_facts`` must point at a real ToolInvocation
+        made by this agent within ``task_id`` (defaults to the graph's
+        current task scope). Any fact failing grounding raises
+        :class:`GroundingError` — the whole call is rejected; the caller
+        must fix and retry. This is the code-level enforcement of
+        DESIGN.md §4.4.
         """
+        if description and not interpretation:
+            interpretation = description
+        facts = list(verified_facts or [])
+        active_task_id = task_id if task_id is not None else self._current_task_id
+
+        # Grounding gateway: validate every fact BEFORE acquiring the lock
+        # (read-only and synchronous, so no other asyncio task can interleave).
+        failures: list[dict] = []
+        for fact in facts:
+            ok, reason = self.validate_fact_grounding(
+                fact, agent=source_agent, task_id=active_task_id or "",
+            )
+            if not ok:
+                failures.append({"fact": fact, "reason": reason})
+        if failures:
+            raise GroundingError(
+                "Phenomenon rejected — one or more facts are not grounded:\n"
+                + "\n".join(
+                    f"  - {f['reason']}: {json.dumps(f['fact'], ensure_ascii=False)}"
+                    for f in failures
+                ),
+                failures=failures,
+            )
+
         async with self._lock:
-            similar = self._find_similar_phenomenon(title, description, category)
+            similar = self._find_similar_phenomenon(title, interpretation, category)
             if similar is not None:
                 similar.confidence = min(1.0, similar.confidence + 0.15)
                 if source_agent not in similar.corroborating_agents:
@@ -479,6 +869,18 @@ class EvidenceGraph:
                     for k, v in raw_data.items():
                         if k not in similar.raw_data:
                             similar.raw_data[k] = v
+                # Merge any new facts whose (type, value, invocation_id)
+                # tuple isn't already on the existing phenomenon.
+                if facts:
+                    seen = {
+                        (f.get("type"), f.get("value"), f.get("invocation_id"))
+                        for f in similar.verified_facts
+                    }
+                    for f in facts:
+                        key = (f.get("type"), f.get("value"), f.get("invocation_id"))
+                        if key not in seen:
+                            similar.verified_facts.append(f)
+                            seen.add(key)
                 if from_lead_id and similar.from_lead_id is None:
                     similar.from_lead_id = from_lead_id
                 self._auto_save()
@@ -487,18 +889,20 @@ class EvidenceGraph:
             pid = f"ph-{uuid.uuid4().hex[:8]}"
             confidence = _compute_quality_score(
                 source_tool, timestamp, raw_data or {},
-                description, [],
+                interpretation, facts, [],
             )
             ph = Phenomenon(
                 id=pid,
                 source_agent=source_agent,
                 category=category,
                 title=title,
-                description=description,
+                interpretation=interpretation,
+                verified_facts=facts,
                 raw_data=raw_data or {},
                 timestamp=timestamp,
                 confidence=confidence,
                 source_tool=source_tool,
+                source_id=self.active_source.id if self.active_source else "",
                 from_lead_id=from_lead_id,
                 created_at=datetime.now().isoformat(),
             )
@@ -512,15 +916,24 @@ class EvidenceGraph:
         description: str,
         created_by: str = "",
         parent_id: str | None = None,
+        prior_prob: float = 0.5,
     ) -> str:
-        """Add a hypothesis. Returns the hypothesis ID."""
+        """Add a hypothesis. Returns the hypothesis ID.
+
+        ``prior_prob`` seeds the starting log_odds (default 0.5 → 0.0).
+        Pick a different prior when you have base-rate knowledge — e.g.
+        prior_prob=0.1 for an unusual claim, 0.9 for a strong default.
+        """
         async with self._lock:
             hid = f"hyp-{uuid.uuid4().hex[:8]}"
+            l_prior = prob_to_log_odds(prior_prob)
             hyp = Hypothesis(
                 id=hid,
                 title=title,
                 description=description,
-                confidence=0.5,
+                prior_prob=prior_prob,
+                log_odds=l_prior,
+                confidence=log_odds_to_prob(l_prior),
                 status="active",
                 parent_id=parent_id,
                 created_by=created_by,
@@ -593,16 +1006,25 @@ class EvidenceGraph:
         edge_type: str,
         reason: str = "",
     ) -> float:
-        """Update hypothesis confidence based on a phenomenon linkage.
+        """Apply one phenomenon→hypothesis edge as an additive log_odds update.
 
-        The edge_type must be one of self.edge_weights keys.
-        Weight is looked up from the configured table, NOT judged by LLM.
-        Returns the new confidence value.
+        DESIGN.md §4.5: edge_type → log₁₀(LR) is looked up in
+        ``self.edge_log_lr`` (the LLM never emits the number). The update is
+        ``L_post = L_prior + log_lr`` and ``confidence = 1/(1 + 10^(−L_post))``.
+        Addition is commutative, so the result is order-independent (fixes
+        the pre-S3 bug). Status: ≥0.8 supported, ≤0.2 refuted, else active.
+
+        **Idempotency**: if a ``(phenomenon, hypothesis, edge_type)`` edge
+        already exists, this is a no-op — the same agent re-recording the
+        same link (or two agents linking via the orchestrator's batch
+        judge and a manual override) does not double-count. Independent
+        evidence — *different* phenomena pointing the same way — still
+        accumulates fully.
         """
-        if edge_type not in self.edge_weights:
+        if edge_type not in self.edge_log_lr:
             raise ValueError(
                 f"Invalid hypothesis edge type: {edge_type}. "
-                f"Must be one of: {list(self.edge_weights.keys())}"
+                f"Must be one of: {list(self.edge_log_lr.keys())}"
             )
 
         async with self._lock:
@@ -612,27 +1034,37 @@ class EvidenceGraph:
             if hyp is None:
                 raise ValueError(f"Hypothesis not found: {hyp_id}")
 
-            weight = self.edge_weights[edge_type]
+            # Idempotency check — same (ph, hyp, edge_type) already on graph.
+            for existing in self._adj.get(phenomenon_id, []):
+                if (
+                    existing.target_id == hyp_id
+                    and existing.edge_type == edge_type
+                ):
+                    return hyp.confidence
+
+            log_lr = self.edge_log_lr[edge_type]
+            old_log_odds = hyp.log_odds
             old_conf = hyp.confidence
+            new_log_odds = old_log_odds + log_lr
+            new_conf = log_odds_to_prob(new_log_odds)
 
-            if weight > 0:
-                delta = weight * (1 - old_conf)
-            else:
-                delta = weight * old_conf
-
-            new_conf = max(0.0, min(1.0, old_conf + delta))
+            hyp.log_odds = new_log_odds
             hyp.confidence = new_conf
 
             if new_conf >= 0.8:
                 hyp.status = "supported"
             elif new_conf <= 0.2:
                 hyp.status = "refuted"
+            else:
+                hyp.status = "active"
 
             hyp.confidence_log.append({
                 "timestamp": datetime.now().isoformat(),
                 "phenomenon_id": phenomenon_id,
                 "edge_type": edge_type,
-                "weight": weight,
+                "log_lr": log_lr,
+                "old_log_odds": round(old_log_odds, 4),
+                "new_log_odds": round(new_log_odds, 4),
                 "old_confidence": round(old_conf, 4),
                 "new_confidence": round(new_conf, 4),
                 "reason": reason,
@@ -645,7 +1077,7 @@ class EvidenceGraph:
                 source_id=phenomenon_id,
                 target_id=hyp_id,
                 edge_type=edge_type,
-                metadata={"reason": reason},
+                metadata={"reason": reason, "log_lr": log_lr},
                 created_by="hypothesis_engine",
                 created_at=datetime.now().isoformat(),
             )
@@ -654,7 +1086,381 @@ class EvidenceGraph:
             self._adj_rev.setdefault(hyp_id, []).append(edge)
 
             self._auto_save()
-            return new_conf
+
+        # If this is a coref hypothesis, mirror the new confidence into the
+        # entity-level same_as edge. Done OUTSIDE the lock — _sync_same_as_edge
+        # re-acquires it internally — so we avoid reentrant locking.
+        if hyp.coref_entity_pair and len(hyp.coref_entity_pair) == 2:
+            await self._sync_same_as_edge(
+                hyp.coref_entity_pair[0],
+                hyp.coref_entity_pair[1],
+                hyp_id,
+            )
+        return new_conf
+
+    # ---- Cross-source entity coreference (DESIGN.md §4.6) -------------------
+
+    @staticmethod
+    def _coref_hypothesis_id(eid_a: str, eid_b: str) -> str:
+        """Deterministic id for the coref hypothesis between an entity pair.
+
+        Same pair (regardless of arg order) always maps to the same id so
+        repeated observations augment the existing hypothesis rather than
+        spawning duplicates.
+        """
+        pair = "|".join(sorted([eid_a, eid_b]))
+        return f"hyp-coref-{hashlib.sha256(pair.encode()).hexdigest()[:10]}"
+
+    async def get_or_create_coref_hypothesis(
+        self, eid_a: str, eid_b: str,
+    ) -> tuple[str, bool]:
+        """Look up (or insert) the coreference hypothesis for an entity pair.
+
+        Uses a low prior (``prior_prob=0.1``) — saying any two entities are
+        the same actor is a strong claim, so the default should be
+        skeptical and let evidence move the needle.
+        """
+        hid = self._coref_hypothesis_id(eid_a, eid_b)
+        async with self._lock:
+            if hid in self.hypotheses:
+                return hid, False
+            ea = self.entities.get(eid_a)
+            eb = self.entities.get(eid_b)
+            if ea is None or eb is None:
+                raise ValueError(f"Unknown entity in coref pair: {eid_a}, {eid_b}")
+            l_prior = prob_to_log_odds(0.1)
+            self.hypotheses[hid] = Hypothesis(
+                id=hid,
+                title=f"Coreference: {ea.name} ≡ {eb.name}",
+                description=(
+                    f"Hypothesis that {ea.id} ({ea.name}, {ea.entity_type}) "
+                    f"and {eb.id} ({eb.name}, {eb.entity_type}) refer to "
+                    f"the same actor across evidence sources."
+                ),
+                prior_prob=0.1,
+                log_odds=l_prior,
+                confidence=log_odds_to_prob(l_prior),
+                status="active",
+                created_by="coref_engine",
+                created_at=datetime.now().isoformat(),
+                coref_entity_pair=sorted([eid_a, eid_b]),
+            )
+            self._auto_save()
+        return hid, True
+
+    async def _sync_same_as_edge(
+        self, eid_a: str, eid_b: str, hyp_id: str,
+    ) -> None:
+        """Mirror coref hypothesis confidence into a ``same_as`` entity edge.
+
+        - Confidence ≥ 0.8 → ensure an active ``same_as`` edge exists.
+        - Confidence  < 0.8 → mark any existing edge inactive (audit, not delete).
+        Idempotent on both transitions.
+        """
+        hyp = self.hypotheses.get(hyp_id)
+        if hyp is None:
+            return
+        active = hyp.confidence >= 0.8
+        async with self._lock:
+            existing = None
+            for edge in self.edges:
+                if (edge.edge_type == "same_as"
+                    and {edge.source_id, edge.target_id} == {eid_a, eid_b}):
+                    existing = edge
+                    break
+            if active:
+                if existing is None:
+                    eid = f"edge-{uuid.uuid4().hex[:8]}"
+                    edge = Edge(
+                        id=eid,
+                        source_id=eid_a,
+                        target_id=eid_b,
+                        edge_type="same_as",
+                        metadata={
+                            "backed_by": hyp_id,
+                            "active": True,
+                            "confidence_at_creation": hyp.confidence,
+                        },
+                        created_by="coref_engine",
+                        created_at=datetime.now().isoformat(),
+                    )
+                    self.edges.append(edge)
+                    self._adj.setdefault(eid_a, []).append(edge)
+                    self._adj_rev.setdefault(eid_b, []).append(edge)
+                elif not existing.metadata.get("active"):
+                    existing.metadata["active"] = True
+                    existing.metadata["reactivated_at"] = datetime.now().isoformat()
+            else:
+                if existing is not None and existing.metadata.get("active"):
+                    existing.metadata["active"] = False
+                    existing.metadata["deactivated_at"] = datetime.now().isoformat()
+            self._auto_save()
+
+    async def observe_identity(
+        self,
+        entity_name: str,
+        entity_type: str,
+        identifier_type: str,
+        value: str,
+        source_agent: str,
+        invocation_id: str,
+        source_tool: str = "",
+        task_id: str | None = None,
+    ) -> dict:
+        """Record a typed identifier for an entity through the grounding gateway.
+
+        DESIGN.md §4.6. Steps:
+
+        1. Get-or-create the entity.
+        2. Record an ``identity_observation`` phenomenon carrying the
+           identifier as its sole verified fact. ``add_phenomenon`` validates
+           ``invocation_id`` + ``value`` through the grounding gateway and
+           raises :class:`GroundingError` on failure.
+        3. Attach the identifier to the entity (idempotent by
+           ``(type, normalized_value)``).
+        4. If the attachment is new, scan other entities for shared
+           identifiers (strong / weak) and any conflicting strong
+           identifiers, then propose / strengthen / weaken the coref
+           hypothesis between each candidate pair. ``same_as`` edges are
+           kept in sync with the hypothesis confidence.
+
+        Returns a dict summarising the entity id, observation phenomenon,
+        whether the identifier was new, and any coref proposals fired.
+        """
+        if identifier_type not in (STRONG_IDENTIFIER_TYPES | WEAK_IDENTIFIER_TYPES):
+            raise ValueError(
+                f"Unknown identifier_type: {identifier_type}. "
+                f"Strong: {sorted(STRONG_IDENTIFIER_TYPES)}; "
+                f"Weak: {sorted(WEAK_IDENTIFIER_TYPES)}."
+            )
+        if not value:
+            raise ValueError("identifier value must be non-empty")
+
+        # add_phenomenon enforces the grounding contract for the fact below.
+        active_task = task_id if task_id is not None else self._current_task_id
+        fact = {"type": identifier_type, "value": value, "invocation_id": invocation_id}
+
+        # Get-or-create entity first so we can attribute the observation.
+        eid, _existed = await self.add_entity(entity_name, entity_type)
+
+        norm = _normalize_identifier(identifier_type, value)
+        title = f"{identifier_type}={value} on {entity_name}"
+        pid, _merged = await self.add_phenomenon(
+            source_agent=source_agent,
+            category="identity_observation",
+            title=title,
+            interpretation=(
+                f"Agent attributed identifier {identifier_type}={value} "
+                f"(normalized={norm}) to entity {entity_name} ({entity_type})."
+            ),
+            verified_facts=[fact],
+            source_tool=source_tool,
+            task_id=active_task,
+        )
+
+        # Attach identifier to entity (idempotent on type + normalized value).
+        new_identifier = False
+        async with self._lock:
+            ent = self.entities[eid]
+            if not ent.has_identifier(identifier_type, norm):
+                ent.identifiers.append({
+                    "type": identifier_type,
+                    "value": value,
+                    "normalized": norm,
+                    "strong": is_strong_identifier(identifier_type),
+                    "invocation_id": invocation_id,
+                    "phenomenon_id": pid,
+                    "observed_at": datetime.now().isoformat(),
+                })
+                new_identifier = True
+            self._auto_save()
+
+        coref_proposals: list[dict] = []
+        if new_identifier:
+            coref_proposals = await self._propose_coref_for_new_identifier(
+                new_eid=eid,
+                new_type=identifier_type,
+                new_norm=norm,
+                new_phenomenon_id=pid,
+            )
+
+        return {
+            "entity_id": eid,
+            "phenomenon_id": pid,
+            "new_identifier": new_identifier,
+            "coref_proposals": coref_proposals,
+        }
+
+    async def _propose_coref_for_new_identifier(
+        self,
+        new_eid: str,
+        new_type: str,
+        new_norm: str,
+        new_phenomenon_id: str,
+    ) -> list[dict]:
+        """Blocking + propose: find candidate entities that share this
+        identifier with ``new_eid``, register / strengthen the coref
+        hypothesis for each pair, and emit conflicting-identifier edges
+        where the two entities have *different* values for the same
+        strong identifier type. O(|entities| × identifiers) — blocking
+        is implicit in the fact that the new identifier is fixed.
+        """
+        new_ent = self.entities.get(new_eid)
+        if new_ent is None:
+            return []
+
+        is_strong_new = is_strong_identifier(new_type)
+        match_edge = "shared_strong_identifier" if is_strong_new else "shared_weak_identifier"
+
+        proposals: list[dict] = []
+
+        for other_eid, other_ent in list(self.entities.items()):
+            if other_eid == new_eid:
+                continue
+
+            # Match: other entity carries the same (type, normalized).
+            if not other_ent.has_identifier(new_type, new_norm):
+                continue
+
+            # Collect conflicting strong identifiers between the pair;
+            # they'll fire negative log-LR edges on the same coref hypothesis.
+            conflicts: list[dict] = []
+            for a_ident in new_ent.identifiers:
+                if not a_ident.get("strong"):
+                    continue
+                for b_ident in other_ent.identifiers:
+                    if (b_ident.get("type") == a_ident.get("type")
+                        and b_ident.get("strong")
+                        and b_ident.get("normalized") != a_ident.get("normalized")):
+                        conflicts.append({
+                            "type": a_ident.get("type"),
+                            "new_value": a_ident.get("value"),
+                            "other_value": b_ident.get("value"),
+                            "new_phenomenon_id": a_ident.get("phenomenon_id"),
+                        })
+
+            hid, _created = await self.get_or_create_coref_hypothesis(
+                new_eid, other_eid,
+            )
+
+            # +shared identifier edge (one per identifier, anchored to the
+            # newly recorded observation phenomenon). update_hypothesis_
+            # confidence is idempotent on (ph, hyp, edge_type), so re-running
+            # the same observation does not double-count.
+            await self.update_hypothesis_confidence(
+                hid, new_phenomenon_id, match_edge,
+                reason=f"shared {new_type}={new_norm}",
+            )
+
+            # −conflicting strong identifier edges — one per conflict, anchored
+            # to the *new* entity's observation phenomenon for that identifier.
+            for c in conflicts:
+                ph_src = c["new_phenomenon_id"]
+                if not ph_src:
+                    continue
+                await self.update_hypothesis_confidence(
+                    hid, ph_src, "conflicting_strong_identifier",
+                    reason=(
+                        f"conflict {c['type']}: "
+                        f"{c['new_value']} vs {c['other_value']}"
+                    ),
+                )
+
+            await self._sync_same_as_edge(new_eid, other_eid, hid)
+
+            proposals.append({
+                "hypothesis_id": hid,
+                "other_entity_id": other_eid,
+                "match": {"type": new_type, "normalized": new_norm,
+                          "edge_type": match_edge},
+                "conflicts": conflicts,
+                "confidence": self.hypotheses[hid].confidence,
+            })
+
+        return proposals
+
+    # ---- Cross-source entity cluster queries (DESIGN.md §4.6) ----------------
+
+    def _active_same_as_neighbors(self, entity_id: str) -> set[str]:
+        """Neighbours of *entity_id* via ``same_as`` edges that are still active.
+
+        ``same_as`` edges are non-destructive: a coref hypothesis that drops
+        below threshold marks ``metadata['active']=False`` rather than
+        deleting, so the audit trail survives. Cluster queries respect that.
+        """
+        out: set[str] = set()
+        for edge in self.edges:
+            if edge.edge_type != "same_as":
+                continue
+            if not edge.metadata.get("active", True):
+                continue
+            if edge.source_id == entity_id:
+                out.add(edge.target_id)
+            elif edge.target_id == entity_id:
+                out.add(edge.source_id)
+        return out
+
+    def resolve_actor_cluster(self, entity_id: str) -> set[str]:
+        """Return the connected component containing *entity_id* via active
+        ``same_as`` edges — the set of entity ids that current coref evidence
+        treats as the same actor.
+
+        Reversible: deactivating a ``same_as`` edge (because the backing
+        coref hypothesis drops below 0.8) breaks the component, so this
+        always reflects the *current* state of the graph.
+        """
+        if entity_id not in self.entities:
+            return set()
+        seen: set[str] = {entity_id}
+        frontier: list[str] = [entity_id]
+        while frontier:
+            cur = frontier.pop()
+            for nbr in self._active_same_as_neighbors(cur):
+                if nbr not in seen:
+                    seen.add(nbr)
+                    frontier.append(nbr)
+        return seen
+
+    def actor_clusters(self) -> list[dict]:
+        """Group all entities into actor clusters via active ``same_as``.
+
+        Returns a list of ``{members: [...], identifiers: [...], coref_hypotheses: [...]}``
+        for the report agent and the orchestrator's cross-source views.
+        """
+        unseen = set(self.entities.keys())
+        clusters: list[dict] = []
+        while unseen:
+            start = next(iter(unseen))
+            members = self.resolve_actor_cluster(start)
+            unseen -= members
+            # Aggregate identifiers across the cluster (deduped on type+normalized).
+            ident_seen: set[tuple[str, str]] = set()
+            idents: list[dict] = []
+            for eid in members:
+                for ident in self.entities[eid].identifiers:
+                    key = (ident.get("type"), ident.get("normalized"))
+                    if key in ident_seen:
+                        continue
+                    ident_seen.add(key)
+                    idents.append({
+                        "type": ident.get("type"),
+                        "value": ident.get("value"),
+                        "strong": ident.get("strong"),
+                        "on_entity": eid,
+                    })
+            coref_hyps = sorted({
+                e.metadata.get("backed_by", "")
+                for e in self.edges
+                if e.edge_type == "same_as"
+                and e.metadata.get("active", True)
+                and (e.source_id in members or e.target_id in members)
+            } - {""})
+            clusters.append({
+                "members": sorted(members),
+                "identifiers": idents,
+                "coref_hypotheses": coref_hyps,
+            })
+        return clusters
 
     # ---- Lead management (same as old Blackboard) ----------------------------
 
@@ -754,6 +1560,96 @@ class EvidenceGraph:
             self._auto_save()
             return aid, False
 
+    # ---- Tool invocation log -------------------------------------------------
+
+    async def record_tool_invocation(
+        self,
+        tool: str,
+        args: dict,
+        output: str,
+        cached: bool = False,
+    ) -> str:
+        """Record one tool call. Returns the invocation_id.
+
+        Source / agent / task_id are read from the graph's current run
+        context (set by BaseAgent.run and set_active_source) so executors
+        can stay stateless.
+        """
+        iid = f"inv-{uuid.uuid4().hex[:8]}"
+        src_id = self.active_source.id if self.active_source else ""
+        inv = ToolInvocation(
+            id=iid,
+            tool=tool,
+            args=dict(args),
+            output=output,
+            output_sha256=hashlib.sha256(output.encode("utf-8", errors="replace")).hexdigest(),
+            agent=self._current_agent or "unknown",
+            task_id=self._current_task_id or "",
+            source_id=src_id,
+            created_at=datetime.now().isoformat(),
+            cached=cached,
+        )
+        async with self._lock:
+            self.tool_invocations[iid] = inv
+            # Cheap on cache hit; expensive but bounded otherwise. Skip
+            # auto-save here — too noisy if every tool call rewrites the
+            # state file; the next phenomenon write will flush.
+        return iid
+
+    def validate_fact_grounding(
+        self,
+        fact: dict,
+        agent: str,
+        task_id: str,
+    ) -> tuple[bool, str]:
+        """Check a single verified_fact's grounding. Returns (ok, reason).
+
+        Rules (DESIGN.md §4.4, refined after first end-to-end run):
+          1. invocation_id must exist in self.tool_invocations.
+          2. The invocation must have been made by `agent` within `task_id`.
+          3. fact.value must appear in invocation.output — either as a
+             strict substring, OR (loose-match fallback) once both sides
+             are normalised via :func:`_normalize_for_grounding`
+             (case-folded, whitespace-collapsed, path-sep unified).
+
+        The loose match catches the LLM's routine presentation
+        normalisations (case-folded hex, slash-flipped paths, collapsed
+        multi-line labels) without enabling fabrication: a string that
+        isn't present in ANY form still fails the normalised check.
+        """
+        inv_id = fact.get("invocation_id", "")
+        value = fact.get("value", "")
+        if not inv_id:
+            return False, "missing invocation_id"
+        inv = self.tool_invocations.get(inv_id)
+        if inv is None:
+            return False, f"invocation_id {inv_id} not found in invocation log"
+        if inv.agent != agent:
+            return False, (
+                f"invocation {inv_id} was made by agent '{inv.agent}', "
+                f"not '{agent}' — cannot be cited by a different agent"
+            )
+        if task_id and inv.task_id and inv.task_id != task_id:
+            return False, (
+                f"invocation {inv_id} was made in a different task scope "
+                f"({inv.task_id}) — cite only invocations from your current task"
+            )
+        if not isinstance(value, str) or not value:
+            return False, "fact.value must be a non-empty string"
+        if value in inv.output:
+            return True, "ok"
+        # Loose fallback: normalised comparison absorbs case / whitespace /
+        # path-sep differences but a genuinely absent value still fails.
+        if _normalize_for_grounding(value) in _normalize_for_grounding(inv.output):
+            return True, "ok-normalized"
+        return False, (
+            f"fact.value not found in invocation {inv_id} output — even after "
+            f"case/whitespace/path-sep normalisation. Copy a literal substring "
+            f"from that tool's result; if the content is a guess (device model, "
+            f"constructed path, label-joined value), move it into `interpretation` "
+            f"instead of `verified_facts`."
+        )
+
     # ---- Asset library -------------------------------------------------------
 
     async def register_asset(
@@ -838,7 +1734,12 @@ class EvidenceGraph:
         kw = keyword.lower()
         results = []
         for ph in self.phenomena.values():
-            if kw in ph.title.lower() or kw in ph.description.lower():
+            haystack = (
+                ph.title.lower()
+                + " " + ph.interpretation.lower()
+                + " " + " ".join(str(f.get("value", "")).lower() for f in ph.verified_facts)
+            )
+            if kw in haystack:
                 results.append(ph.summary())
         for hyp in self.hypotheses.values():
             if kw in hyp.title.lower() or kw in hyp.description.lower():
@@ -899,6 +1800,87 @@ class EvidenceGraph:
             if h.status == "active":
                 h.status = "inconclusive"
 
+    # ---- Hypothesis × Evidence matrix (DESIGN.md §4.5) -----------------------
+
+    def hypothesis_evidence_matrix(self) -> dict:
+        """Structured pivot of every Phenomenon→Hypothesis edge.
+
+        Returns ``{"hypotheses": [...], "phenomena": [...], "cells": {...},
+        "counts_by_edge_type": {hyp_id: {edge_type: count}}}`` — the cells
+        map ``(hyp_id, ph_id)`` to a *list* of edge_type strings (a single
+        phenomenon may link via several edge_types after a manual override
+        plus an LLM judge call). Drives report rendering and gap selection.
+        """
+        cells: dict[tuple[str, str], list[str]] = {}
+        counts: dict[str, dict[str, int]] = {h: {} for h in self.hypotheses}
+        for edge in self.edges:
+            if not (
+                edge.source_id.startswith("ph-")
+                and edge.target_id.startswith("hyp-")
+                and edge.edge_type in self.edge_log_lr
+            ):
+                continue
+            key = (edge.target_id, edge.source_id)
+            cells.setdefault(key, []).append(edge.edge_type)
+            # Tally per-hypothesis edge-type counts for the markdown pivot.
+            by_type = counts.setdefault(edge.target_id, {})
+            by_type[edge.edge_type] = by_type.get(edge.edge_type, 0) + 1
+
+        hypotheses = [
+            {
+                "id": h.id,
+                "title": h.title,
+                "confidence": h.confidence,
+                "log_odds": h.log_odds,
+                "status": h.status,
+            }
+            for h in self.hypotheses.values()
+        ]
+        referenced = {ph_id for (_, ph_id) in cells}
+        phenomena = [
+            {"id": ph.id, "title": ph.title, "category": ph.category}
+            for ph in self.phenomena.values()
+            if ph.id in referenced
+        ]
+        return {
+            "hypotheses": hypotheses,
+            "phenomena": phenomena,
+            "cells": {f"{h}|{p}": types for (h, p), types in cells.items()},
+            "counts_by_edge_type": counts,
+        }
+
+    def hypothesis_evidence_matrix_markdown(self) -> str:
+        """Render the matrix as a compact markdown pivot.
+
+        Columns are the edge types (counts), plus log_odds, confidence,
+        status — enough for the report agent to ground every hypothesis
+        in its supporting and contradicting evidence at a glance.
+        """
+        if not self.hypotheses:
+            return "(no hypotheses)"
+        matrix = self.hypothesis_evidence_matrix()
+        edge_types = sorted(self.edge_log_lr.keys())
+        header = (
+            "| Hypothesis | "
+            + " | ".join(edge_types)
+            + " | log_odds | conf | status |"
+        )
+        sep = (
+            "|---|"
+            + "|".join(["---:"] * len(edge_types))
+            + "|---:|---:|---|"
+        )
+        rows = [header, sep]
+        for h in matrix["hypotheses"]:
+            counts = matrix["counts_by_edge_type"].get(h["id"], {})
+            cells = [str(counts.get(et, 0)) for et in edge_types]
+            rows.append(
+                f"| {h['title']} | "
+                + " | ".join(cells)
+                + f" | {h['log_odds']:+.2f} | {h['confidence']:.2f} | {h['status']} |"
+            )
+        return "\n".join(rows)
+
     # ---- Summary (lightweight, for system prompt) ----------------------------
 
     def stats_summary(self) -> str:
diff --git a/llm_client.py b/llm_client.py
index 6b6e7d3..cee06a9 100644
--- a/llm_client.py
+++ b/llm_client.py
@@ -142,6 +142,12 @@ READ_ONLY_TOOLS: set[str] = {
     # Parser reads
     "read_text_file", "read_binary_preview", "search_text_file",
     "read_text_file_section", "list_extracted_dir", "parse_pcap_strings",
+    "find_files",
+    # iOS plugin reads (S4)
+    "parse_plist", "sqlite_tables", "sqlite_query",
+    "parse_ios_keychain", "read_idevice_info",
+    # Android + media reads (S6) — set_active_partition is NOT read-only.
+    "probe_android_partitions", "ocr_image",
 }
 
 
@@ -503,7 +509,7 @@ class LLMClient:
         tools: list[dict],
         tool_executor: dict[str, Any],
         system: str | None = None,
-        max_iterations: int = 40,
+        max_iterations: int = 60,
         terminal_tools: tuple[str, ...] = (),
     ) -> tuple[str, list[dict]]:
         """Run a tool-calling loop using OpenAI-native tool calls.
diff --git a/main.py b/main.py
index 51a416d..7f57639 100644
--- a/main.py
+++ b/main.py
@@ -15,17 +15,21 @@ from pathlib import Path
 import yaml
 
 from agent_factory import AgentFactory
+from case import (
+    DISK_IMAGE_EXTS, Case, EvidenceSource, load_case, single_source_case,
+)
 from evidence_graph import EvidenceGraph
 from llm_client import LLMClient
 from log_config import setup_logging
 from orchestrator import AnalysisAborted, Orchestrator
 from tool_registry import register_all_tools
+from tools.archive import unzip_archive_sync
 
 RUNS_DIR = Path("runs")
 IMAGE_DIR = Path("image")
-
-# Common forensic image extensions (only first segment / single-file formats)
-_IMAGE_GLOBS = ["*.001", "*.dd", "*.raw", "*.img", "*.E01", "*.iso"]
+# Persistent unpack cache for tree-mode sources (zip extractions). Lives
+# at project root so multiple runs can reuse the same unpacked tree.
+SOURCE_CACHE_DIR = Path(".cache/sources")
 
 
 def load_config(path: str = "config.yaml") -> dict:
@@ -38,11 +42,13 @@ def load_config(path: str = "config.yaml") -> dict:
 # ---------------------------------------------------------------------------
 
 def _discover_images(search_dir: Path = IMAGE_DIR) -> list[Path]:
-    """Find forensic disk image files under *search_dir*."""
-    images: set[Path] = set()
-    for glob in _IMAGE_GLOBS:
-        images.update(search_dir.glob(glob))
-    return sorted(images)
+    """Find forensic disk image files under *search_dir* (case-insensitive ext)."""
+    if not search_dir.is_dir():
+        return []
+    return sorted(
+        p for p in search_dir.iterdir()
+        if p.is_file() and p.suffix.lower() in DISK_IMAGE_EXTS
+    )
 
 
 def _parse_mmls(output: str) -> list[dict]:
@@ -110,7 +116,7 @@ def select_image_interactive(image_dir: Path | None = None) -> tuple[str, int]:
     images = _discover_images(image_dir)
     if not images:
         print(f"No disk images found in {image_dir}/")
-        print("Supported formats: " + ", ".join(_IMAGE_GLOBS))
+        print("Supported extensions: " + ", ".join(sorted(DISK_IMAGE_EXTS)))
         sys.exit(1)
 
     if len(images) == 1:
@@ -153,6 +159,118 @@ def select_image_interactive(image_dir: Path | None = None) -> tuple[str, int]:
         print("Invalid choice.")
 
 
+def resolve_case() -> Case:
+    """Resolve the Case to analyze.
+
+    Priority: an explicit case file given as a CLI argument, then ./case.yaml
+    in the working directory, then legacy interactive single-image selection.
+    """
+    # 1. Explicit case file passed on the command line
+    if len(sys.argv) > 1 and sys.argv[1].lower().endswith((".yaml", ".yml")):
+        case = load_case(sys.argv[1])
+        if case is None:
+            print(f"Error: could not load case file {sys.argv[1]}")
+            sys.exit(1)
+        print(f"Loaded case: {case.name} ({len(case.sources)} sources)")
+        return case
+
+    # 2. ./case.yaml in the working directory
+    case = load_case()
+    if case is not None:
+        print(f"Loaded case: {case.name} ({len(case.sources)} sources)")
+        return case
+
+    # 3. Legacy interactive single-image selection
+    cli_dir = Path(sys.argv[1]) if len(sys.argv) > 1 else None
+    image_path, partition_offset = select_image_interactive(cli_dir)
+    return single_source_case(image_path, partition_offset)
+
+
+def _is_analysable(src: EvidenceSource) -> bool:
+    """A source is analysable when it has a path AND its mode has tooling.
+
+    S4 lights up tree-mode iOS extractions; image-mode disks were already
+    supported. Media-collection sources (screenshots) remain skipped until S6.
+    """
+    if not src.path:
+        return False
+    if src.access_mode == "image":
+        return True
+    if src.access_mode == "tree" and src.type in ("mobile_extraction", "archive"):
+        return True
+    return False
+
+
+def list_analysable_sources(case: Case) -> list[EvidenceSource]:
+    """Return every analysable source in the case (orchestrator iterates them).
+
+    Pre-S6 main.py used to force-choose one source here; the multi-source
+    orchestrator (Phase 1 per-source triage) now consumes the full list.
+    Skipped sources are still reported for visibility.
+    """
+    analysable = [s for s in case.sources if _is_analysable(s)]
+    skipped = [s for s in case.sources if not _is_analysable(s)]
+    if skipped:
+        print(
+            f"Note: {len(skipped)} source(s) not analysable in this build: "
+            + ", ".join(f"{s.label} ({s.type})" for s in skipped)
+        )
+    if not analysable:
+        print("No analysable sources in this case.")
+        sys.exit(1)
+    print(f"Analysing {len(analysable)} source(s) — orchestrator will triage each in Phase 1:")
+    for s in analysable:
+        print(f"  - {s.summary()}")
+    return analysable
+
+
+def prepare_source(src: EvidenceSource) -> EvidenceSource:
+    """Materialise a tree-mode source for analysis.
+
+    Mobile / archive sources arrive as .zip files. We unpack once into a
+    project-level cache (``.cache/sources/<src.id>/``) and rewrite
+    ``src.path`` to point at the unpacked directory. Idempotent — a
+    second run with the cache present is a no-op (unzip_archive_sync
+    skips files that already exist with the matching size).
+
+    Disk-image and already-tree sources pass through unchanged.
+    """
+    if src.access_mode != "tree":
+        return src
+    p = Path(src.path)
+    if p.is_dir():
+        return src  # already a directory, nothing to do
+    if not p.is_file():
+        print(f"Warning: source path {src.path} does not exist; leaving as-is.")
+        return src
+    if p.suffix.lower() != ".zip":
+        # Other archive types (tar, 7z, ...) — not handled yet.
+        print(f"Warning: tree-mode source {src.id} is not a .zip "
+              f"({p.suffix}); leaving as-is.")
+        return src
+
+    dest = SOURCE_CACHE_DIR / src.id
+    dest.mkdir(parents=True, exist_ok=True)
+    # Password-protected zips (e.g. CTF artefacts) carry their key in
+    # case.yaml's meta.password — never logged, never persisted.
+    password = (src.meta or {}).get("password")
+    pw_note = " (password from meta)" if password else ""
+    print(f"Unpacking {p.name} → {dest}{pw_note} (idempotent) ...")
+    result = unzip_archive_sync(str(p), str(dest), password=password)
+    first_line = result.split("\n", 1)[0]
+    print("  " + first_line)
+    if first_line.startswith("Error:"):
+        # Surface the multi-line guidance from _do_extract verbatim.
+        for extra in result.split("\n")[1:]:
+            print("  " + extra)
+        print(f"  Source {src.id} stays unanalysable until this is resolved.")
+        # Leave src.path pointing at the archive: nothing usable was unpacked.
+        return src
+    src.path = str(dest)
+    src.access_mode = "tree"
+    return src
+
+
 def find_resumable_run() -> Path | None:
     """Find the most recent incomplete run with a saved graph state."""
     if not RUNS_DIR.exists():
@@ -225,22 +343,30 @@ async def async_main() -> None:
 
     # Initialize evidence graph
     if graph is None:
-        # CLI arg takes priority, otherwise interactive prompt
-        cli_dir = Path(sys.argv[1]) if len(sys.argv) > 1 else None
-        image_path, partition_offset = select_image_interactive(cli_dir)
+        case = resolve_case()
+        # case_info derived from THIS case's meta (case.yaml), not from
+        # config.yaml's legacy `cfreds_hacking_case` block. Without this,
+        # the old CFReDS evidence MD5s would be embedded in reports for
+        # every subsequent unrelated case.
         graph = EvidenceGraph(
-            case_info=config.get("cfreds_hacking_case", {}),
+            case_info=dict(case.meta or {}),
             persist_path=run_dir / "graph_state.json",
-            edge_weights=config.get("hypothesis_edge_weights"),
+            edge_log_lr=config.get("hypothesis_log_lr"),
         )
-        graph.image_path = image_path
-        graph.partition_offset = partition_offset
+        graph.case = case
         graph.extracted_dir = str(run_dir / "extracted")
+        analysable = list_analysable_sources(case)
+        # Prepare every analysable source up front (unzip tree-mode zips,
+        # etc.). Idempotent on cache hits — second run is a no-op.
+        prepared = [prepare_source(s) for s in analysable]
+        # Seed the active source so tools that resolve lazily have a target
+        # before Phase 1 begins; the orchestrator resets it per source.
+        graph.set_active_source(prepared[0])
     else:
         graph._persist_path = run_dir / "graph_state.json"
 
-    # Register all tools with bound image path
-    register_all_tools(graph.image_path, graph.partition_offset, graph, graph.extracted_dir)
+    # Register all tools — they resolve the active evidence source at call time
+    register_all_tools(graph)
 
     # Create agent factory
     factory = AgentFactory(llm, graph)
diff --git a/orchestrator.py b/orchestrator.py
index a71ade8..7821ad7 100644
--- a/orchestrator.py
+++ b/orchestrator.py
@@ -10,7 +10,7 @@ import time
 from datetime import datetime
 from pathlib import Path
 
-from agent_factory import AgentFactory
+from agent_factory import AgentFactory, get_triage_agent_type
 from evidence_graph import EvidenceGraph
 from llm_client import LLMClient, _extract_first_balanced, _safe_json_loads
 from tool_registry import TOOL_CATALOG
@@ -518,7 +518,7 @@ class Orchestrator:
         if not unlinked:
             return
 
-        valid_types = list(self.graph.edge_weights.keys())
+        valid_types = list(self.graph.edge_log_lr.keys())
 
         hyp_section = "\n".join(
             f"  [{h.id}] {h.title}: {h.description}" for h in active
@@ -551,7 +551,7 @@ class Orchestrator:
                 if (
                     hyp_id in self.graph.hypotheses
                     and ph_id in self.graph.phenomena
-                    and edge_type in self.graph.edge_weights
+                    and edge_type in self.graph.edge_log_lr
                 ):
                     await self.graph.update_hypothesis_confidence(
                         hyp_id=hyp_id,
@@ -593,7 +593,7 @@ class Orchestrator:
                     ph_id = j.get("phenomenon_id", "")
                     edge_type = j.get("edge_type", "")
                     reason = j.get("reason", "")
-                    if ph_id in self.graph.phenomena and edge_type in self.graph.edge_weights:
+                    if ph_id in self.graph.phenomena and edge_type in self.graph.edge_log_lr:
                         await self.graph.update_hypothesis_confidence(
                             hyp_id=hyp.id,
                             phenomenon_id=ph_id,
@@ -618,7 +618,10 @@ class Orchestrator:
         phenomena (deterministic — the canonical tool was actually called).
         """
         evidence_text = " ".join(
-            f"{ph.category} {ph.title} {ph.description}".lower()
+            (
+                f"{ph.category} {ph.title} {ph.interpretation} "
+                + " ".join(str(f.get("value", "")) for f in ph.verified_facts)
+            ).lower()
             for ph in self.graph.phenomena.values()
         )
         used_tools: set[str] = {
@@ -747,28 +750,103 @@ class Orchestrator:
 
     # ---- Main pipeline -------------------------------------------------------
 
+    # ---- Phase 1 helpers (multi-source triage) -------------------------------
+
+    @staticmethod
+    def _is_analysable(src) -> bool:
+        """Like main._is_analysable (kept local to avoid importing main.py),
+        but also admits media_collection. Every source needs a path; image
+        mode, or tree mode with type mobile_extraction/archive, qualifies.
+        """
+        if not getattr(src, "path", ""):
+            return False
+        if src.access_mode == "image":
+            return True
+        if src.access_mode == "tree" and src.type in ("mobile_extraction", "archive"):
+            return True
+        # media_collection is admitted here; triage skips it if no agent is registered.
+        if src.type == "media_collection":
+            return True
+        return False
+
+    def _sources_to_triage(self) -> list:
+        """Pick every analysable source in the case (or fall back to the
+        single active_source for the legacy single-image path).
+        """
+        case = self.graph.case
+        if case is None or not case.sources:
+            return [self.graph.active_source] if self.graph.active_source else []
+        return [s for s in case.sources if self._is_analysable(s)]
+
+    async def _phase1_triage_source(self, src) -> tuple[int, int]:
+        """Run the right triage agent on one source. Returns (Δphenomena, Δleads)."""
+        ph_before = len(self.graph.phenomena)
+        leads_before = sum(1 for l in self.graph.leads if l.status == "pending")
+
+        self.graph.set_active_source(src)
+        agent_type = get_triage_agent_type(src)
+        agent = self.factory.get_or_create_agent(agent_type)
+        if agent is None:
+            logger.warning(
+                "No agent registered for type %s — skipping source %s",
+                agent_type, src.id,
+            )
+            return 0, 0
+
+        _log(
+            f"Phase 1 triage: {src.id} ({src.label}) → {agent_type}",
+            event="dispatch", agent=agent_type, source=src.id,
+        )
+        try:
+            await agent.run(
+                f"Perform an initial Phase-1 triage of source {src.id} "
+                f"({src.label}, type={src.type}). Survey the source's "
+                f"structure, identify the most interesting artefacts, and "
+                f"record significant findings via add_phenomenon. Call "
+                f"observe_identity for any concrete identifiers (email, "
+                f"phone, Apple ID, IMEI, wallet address, persistent "
+                f"username) you encounter — that's how this finding will "
+                f"link across the other sources in the case. Create "
+                f"add_lead for follow-up that's outside your scope."
+            )
+        except Exception as e:
+            logger.error("Phase 1 agent [%s] failed on %s: %s", agent_type, src.id, e)
+
+        return (
+            len(self.graph.phenomena) - ph_before,
+            sum(1 for l in self.graph.leads if l.status == "pending") - leads_before,
+        )
+
     async def run(self, resume_phase: int = 1) -> str:
         """Run the 5-phase hypothesis-driven forensic analysis pipeline."""
-        _log(f"Phase 1: Filesystem Survey (image: {Path(self.graph.image_path).name})", event="phase")
+        sources = self._sources_to_triage()
+        _log(
+            f"Phase 1: per-source triage ({len(sources)} source(s))",
+            event="phase",
+        )
 
         report = ""
         try:
-            # Phase 1: Initial filesystem survey
+            # Phase 1: Initial per-source triage (S6 multi-source).
+            # Runs sequentially so each agent gets its own task_id scope —
+            # the grounding gateway requires that, and shared graph state
+            # (active_source, partition_offset) would race under parallel
+            # dispatch anyway.
             if resume_phase <= 1:
                 t0 = time.monotonic()
                 ph_before = len(self.graph.phenomena)
-                fs_agent = self.factory.get_or_create_agent("filesystem")
-                if fs_agent:
-                    await fs_agent.run(
-                        "Perform an initial survey of this disk image. "
-                        "Examine the partition table, filesystem type, and root directory structure. "
-                        "List key user directories and identify interesting files (documents, emails, "
-                        "chat logs, installed programs, registry hives). "
-                        "Create leads for other agents based on what you find."
+                for src in sources:
+                    new_ph, new_leads = await self._phase1_triage_source(src)
+                    _log(
+                        f"  {src.id}: +{new_ph} phenomena, +{new_leads} leads",
+                        event="progress", source=src.id,
                     )
-                new_ph = len(self.graph.phenomena) - ph_before
-                new_leads = sum(1 for l in self.graph.leads if l.status == "pending")
-                _log(f"+{new_ph} phenomena, +{new_leads} leads", event="progress", elapsed=time.monotonic() - t0)
+                total_ph = len(self.graph.phenomena) - ph_before
+                total_leads = sum(1 for l in self.graph.leads if l.status == "pending")
+                _log(
+                    f"Phase 1 total: +{total_ph} phenomena, {total_leads} pending leads",
+                    event="progress", elapsed=time.monotonic() - t0,
+                )
 
             # Phase 2: Hypothesis generation
             if resume_phase <= 2:
@@ -865,8 +943,15 @@ class Orchestrator:
                     "6. Conclusions and Recommendations"
                 )
 
-                image_stem = Path(self.graph.image_path).stem
-                report_name = f"{image_stem}_forensic_report.md"
+                # Multi-source case → name by case_id (stable across sources).
+                # Legacy single-image runs without a Case → fall back to the
+                # last active image's stem so old workflows still produce a
+                # plausible filename.
+                if self.graph.case and self.graph.case.case_id:
+                    stem = self.graph.case.case_id
+                else:
+                    stem = Path(self.graph.image_path or "").stem or "case"
+                report_name = f"{stem}_forensic_report.md"
                 report_path = (self.run_dir / report_name) if self.run_dir else Path(report_name)
                 try:
                     report_path.write_text(report)
diff --git a/pyproject.toml b/pyproject.toml
index f53f96d..c163497 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -6,6 +6,8 @@ requires-python = ">=3.14"
 dependencies = [
     "httpx[socks]>=0.28.1",
     "openai>=2.36.0",
+    "pillow>=12.2.0",
+    "pytesseract>=0.3.13",
     "pyyaml",
     "regipy>=6.2.1",
 ]
diff --git a/regenerate_report.py b/regenerate_report.py
index ee13af4..e4cc58c 100644
--- a/regenerate_report.py
+++ b/regenerate_report.py
@@ -32,10 +32,10 @@ async def main() -> None:
     config = yaml.safe_load(open("config.yaml"))
     agent_cfg = config["agent"]
 
-    # Load graph (edge_weights from config — applied to the loaded graph)
+    # Load graph (edge_log_lr from config — applied to the loaded graph)
     graph = EvidenceGraph.load_state(
         state_path,
-        edge_weights=config.get("hypothesis_edge_weights"),
+        edge_log_lr=config.get("hypothesis_log_lr"),
     )
     print(f"Loaded: {graph.stats_summary()}")
 
@@ -49,7 +49,7 @@ async def main() -> None:
         thinking_enabled=agent_cfg.get("thinking_enabled", False),
     )
 
-    register_all_tools(graph.image_path, graph.partition_offset, graph)
+    register_all_tools(graph)
     factory = AgentFactory(llm, graph)
 
     # Run only the report agent
diff --git a/tests/test_optimizations.py b/tests/test_optimizations.py
index 2b703ee..bf139ec 100644
--- a/tests/test_optimizations.py
+++ b/tests/test_optimizations.py
@@ -8,8 +8,9 @@ import time
 import pytest
 
 from evidence_graph import (
-    EvidenceGraph, Phenomenon, Lead,
+    EvidenceGraph, Phenomenon, Hypothesis, Lead, GroundingError,
     _compute_quality_score, _jaccard_similarity,
+    prob_to_log_odds, log_odds_to_prob,
 )
 from llm_client import (
     _truncate_tool_result, _partition_tool_calls, _ToolBatch, READ_ONLY_TOOLS,
@@ -26,11 +27,15 @@ from tool_registry import (
 
 class TestQualityScore:
     def test_full_score(self):
+        # With five grounded facts (5 × 0.05 = max 0.25 contribution) plus
+        # source_tool 0.20 + timestamp 0.15 + raw_data 0.15 + long interp 0.10
+        # + related_ids 0.15 = 1.00.
         score = _compute_quality_score(
             source_tool="list_dir",
             timestamp="2024-01-01",
             raw_data={"key": "val"},
-            description="A" * 50,
+            interpretation="A" * 50,
+            verified_facts=[{"type": "raw", "value": "x", "invocation_id": "inv-1"}] * 5,
             related_ids=["ph-0"],
         )
         assert score == pytest.approx(1.0)
@@ -40,20 +45,23 @@ class TestQualityScore:
             source_tool="",
             timestamp=None,
             raw_data={},
-            description="short",
+            interpretation="short",
+            verified_facts=[],
             related_ids=[],
         )
         assert score == pytest.approx(0.0)
 
     def test_partial_score(self):
+        # source_tool 0.20 + interpretation>=50 0.10 = 0.30 (no facts, no ts, no raw)
         score = _compute_quality_score(
             source_tool="parse_registry_key",
             timestamp=None,
             raw_data={},
-            description="A" * 50,
+            interpretation="A" * 50,
+            verified_facts=[],
             related_ids=[],
         )
-        assert score == pytest.approx(0.40)
+        assert score == pytest.approx(0.30)
 
 
 class TestJaccardSimilarity:
@@ -90,7 +98,9 @@ class TestPhenomenonDedup:
         )
         assert not merged
         ph = graph.phenomena[pid]
-        assert ph.confidence == pytest.approx(0.40)
+        # source_tool 0.20 + interpretation >= 50 chars 0.10 = 0.30
+        # (verified_facts and raw_data both empty here)
+        assert ph.confidence == pytest.approx(0.30)
 
     @pytest.mark.asyncio
     async def test_identical_phenomenon_merges(self, graph):
@@ -198,8 +208,9 @@ class TestHypothesisConfidence:
         pid, _ = await graph.add_phenomenon("fs", "filesystem", "test", "test desc", source_tool="t")
         hid = await graph.add_hypothesis("test hyp", "desc", created_by="test")
         conf = await graph.update_hypothesis_confidence(hid, pid, "direct_evidence", "reason")
-        # direct_evidence weight is +0.25 * (1-0.5) = +0.125
-        assert conf == pytest.approx(0.625)
+        # S3 log-odds: prior 0.5 (L=0) + direct_evidence (+2.0) = L=2.0
+        # confidence = 1 / (1 + 10^-2) ≈ 0.9901
+        assert conf == pytest.approx(0.9901, abs=1e-3)
 
     @pytest.mark.asyncio
     async def test_confidence_log_tracked(self, graph):
@@ -1506,3 +1517,1287 @@ class TestInvestigationAreaDerivation:
         assert graph.leads[0].hypothesis_id == h
         assert graph.leads[0].target_agent == "registry"
         assert graph.leads[0].priority == 2
+
+
+# ---------------------------------------------------------------------------
+# S2 grounding gateway — ToolInvocation + add_phenomenon validation
+# ---------------------------------------------------------------------------
+
+class TestAgentTaskContextIsolation:
+    """P0 race fix: _current_agent / _current_task_id are per-asyncio-task.
+
+    Pre-fix, ``graph._current_agent`` was a plain instance attribute that
+    every concurrent agent stomped over. With contextvars-backed
+    properties, three agents launched via :func:`asyncio.gather` each see
+    their own values, and ``record_tool_invocation`` tags each invocation
+    with the correct issuer.
+    """
+
+    @pytest.mark.asyncio
+    async def test_concurrent_agents_get_isolated_contexts(self):
+        graph = EvidenceGraph()
+
+        async def run_agent(name: str) -> dict:
+            graph._current_agent = name
+            graph._current_task_id = f"task-{name}"
+            # Yield so the other tasks get a chance to overwrite if the
+            # state were shared. Pre-fix, this is what surfaced the race.
+            await asyncio.sleep(0.001)
+            inv = await graph.record_tool_invocation(
+                tool="probe", args={"who": name}, output=f"hello from {name}",
+            )
+            return {
+                "name": name,
+                "read_agent": graph._current_agent,
+                "read_task": graph._current_task_id,
+                "inv": graph.tool_invocations[inv],
+            }
+
+        results = await asyncio.gather(
+            run_agent("agent-A"),
+            run_agent("agent-B"),
+            run_agent("agent-C"),
+        )
+        for r in results:
+            # Each task reads back its own values, not a sibling's.
+            assert r["read_agent"] == r["name"], r
+            assert r["read_task"] == f"task-{r['name']}", r
+            # And the invocation log tags the invocation with the right agent / task.
+            assert r["inv"].agent == r["name"]
+            assert r["inv"].task_id == f"task-{r['name']}"
+
+    @pytest.mark.asyncio
+    async def test_grounding_gateway_under_concurrent_agents(self):
+        """End-to-end: three concurrent agents each record and cite their own
+        invocation; the grounding gateway accepts all three without cross-talk.
+        """
+        graph = EvidenceGraph()
+
+        async def cycle(name: str) -> str:
+            graph._current_agent = name
+            graph._current_task_id = f"task-{name}"
+            await asyncio.sleep(0.001)
+            inv = await graph.record_tool_invocation(
+                tool="t", args={}, output=f"unique-output-for-{name}",
+            )
+            await asyncio.sleep(0.001)
+            pid, _ = await graph.add_phenomenon(
+                source_agent=name, category="x", title=f"finding by {name}",
+                interpretation="",
+                verified_facts=[{
+                    "type": "raw",
+                    "value": f"unique-output-for-{name}",
+                    "invocation_id": inv,
+                }],
+                source_tool="t",
+            )
+            return pid
+
+        pids = await asyncio.gather(
+            cycle("agent-A"),
+            cycle("agent-B"),
+            cycle("agent-C"),
+        )
+        # All three phenomena landed — none was rejected by the gateway.
+        assert all(pid in graph.phenomena for pid in pids)
+
+
+class TestGroundingGateway:
+    """Code-level enforcement of DESIGN.md §4.4.
+
+    A phenomenon's verified_facts must trace back to a real ToolInvocation
+    made by the same agent within the same task scope, and each fact.value
+    must appear verbatim in that invocation's recorded output.
+    """
+
+    @pytest.fixture
+    def graph(self):
+        g = EvidenceGraph()
+        g._current_agent = "fs"
+        g._current_task_id = "task-abc"
+        return g
+
+    @pytest.mark.asyncio
+    async def test_record_invocation_and_cite_succeeds(self, graph):
+        inv_id = await graph.record_tool_invocation(
+            tool="list_directory",
+            args={"inode": "33"},
+            output="d/d 33-128-1: secret.txt\nr/r 33-128-2: another.bin",
+        )
+        pid, merged = await graph.add_phenomenon(
+            source_agent="fs", category="filesystem",
+            title="Secret file present",
+            interpretation="Found a suggestive filename",
+            verified_facts=[
+                {"type": "path", "value": "secret.txt", "invocation_id": inv_id},
+                {"type": "inode", "value": "33-128-1", "invocation_id": inv_id},
+            ],
+            source_tool="list_directory",
+        )
+        assert not merged
+        ph = graph.phenomena[pid]
+        assert len(ph.verified_facts) == 2
+
+    @pytest.mark.asyncio
+    async def test_fact_value_not_in_output_rejected(self, graph):
+        inv_id = await graph.record_tool_invocation(
+            tool="list_directory", args={}, output="just two files",
+        )
+        with pytest.raises(GroundingError) as exc:
+            await graph.add_phenomenon(
+                source_agent="fs", category="filesystem", title="bogus",
+                verified_facts=[
+                    {"type": "path", "value": "/etc/shadow", "invocation_id": inv_id},
+                ],
+                source_tool="list_directory",
+            )
+        # The error message reports that both the strict and the loose match failed.
+        msg = str(exc.value)
+        assert "not found" in msg and "normalisation" in msg
+
+    @pytest.mark.asyncio
+    async def test_loose_match_accepts_case_folded_hex(self, graph):
+        """LLM frequently up-cases hex bytes; the loose match accepts it."""
+        inv_id = await graph.record_tool_invocation(
+            tool="read_binary_preview", args={}, output="header: 89 50 4e 47 ...",
+        )
+        pid, _ = await graph.add_phenomenon(
+            source_agent="fs", category="filesystem", title="PNG header",
+            verified_facts=[{
+                "type": "hash", "value": "89 50 4E 47", "invocation_id": inv_id,
+            }],
+            source_tool="read_binary_preview",
+        )
+        assert pid in graph.phenomena
+
+    @pytest.mark.asyncio
+    async def test_loose_match_accepts_label_collapsed_across_newlines(self, graph):
+        """`AppleID:\\n  whoishogan@gmail.com` ≡ `AppleID: whoishogan@gmail.com`."""
+        inv_id = await graph.record_tool_invocation(
+            tool="parse_plist", args={},
+            output="AppleID:\n  whoishogan@gmail.com\nNext: ...",
+        )
+        pid, _ = await graph.add_phenomenon(
+            source_agent="fs", category="identity_observation",
+            title="Apple ID on burner",
+            verified_facts=[{
+                "type": "email",
+                "value": "AppleID: whoishogan@gmail.com",   # label and value joined on one line
+                "invocation_id": inv_id,
+            }],
+            source_tool="parse_plist",
+        )
+        assert pid in graph.phenomena
+
+    @pytest.mark.asyncio
+    async def test_loose_match_accepts_path_separator_flip(self, graph):
+        """Backslash output, forward-slash value — same file."""
+        inv_id = await graph.record_tool_invocation(
+            tool="search_strings", args={}, output="archive: Sunny\\tor-portable.exe",
+        )
+        pid, _ = await graph.add_phenomenon(
+            source_agent="fs", category="filesystem", title="tor portable in archive",
+            verified_facts=[{
+                "type": "path", "value": "Sunny/tor-portable.exe", "invocation_id": inv_id,
+            }],
+            source_tool="search_strings",
+        )
+        assert pid in graph.phenomena
+
+    @pytest.mark.asyncio
+    async def test_loose_match_does_not_let_genuine_hallucination_through(self, graph):
+        """Even with all normalisations applied, an absent value is still rejected."""
+        inv_id = await graph.record_tool_invocation(
+            tool="t", args={}, output="alice and bob discussed lunch",
+        )
+        with pytest.raises(GroundingError):
+            await graph.add_phenomenon(
+                source_agent="fs", category="x", title="fabricated",
+                verified_facts=[{
+                    "type": "raw", "value": "they planned the heist",
+                    "invocation_id": inv_id,
+                }],
+                source_tool="t",
+            )
+
+    @pytest.mark.asyncio
+    async def test_invocation_from_other_agent_rejected(self, graph):
+        # record under a different agent
+        graph._current_agent = "registry"
+        inv_id = await graph.record_tool_invocation(
+            tool="parse_registry_key", args={}, output="HKLM\\System\\CCS",
+        )
+        # now switch to fs and try to cite registry's invocation
+        graph._current_agent = "fs"
+        with pytest.raises(GroundingError) as exc:
+            await graph.add_phenomenon(
+                source_agent="fs", category="registry", title="hijacked",
+                verified_facts=[
+                    {"type": "raw", "value": "HKLM", "invocation_id": inv_id},
+                ],
+                source_tool="parse_registry_key",
+            )
+        assert "different agent" in str(exc.value) or "not '" in str(exc.value)
+
+    @pytest.mark.asyncio
+    async def test_invocation_from_other_task_rejected(self, graph):
+        # Agent runs task A
+        graph._current_task_id = "task-A"
+        inv_a = await graph.record_tool_invocation(
+            tool="search_strings", args={"pattern": "foo"}, output="foo found at offset 1024",
+        )
+        # Same agent runs task B — must not be allowed to forward task A's id
+        graph._current_task_id = "task-B"
+        with pytest.raises(GroundingError) as exc:
+            await graph.add_phenomenon(
+                source_agent="fs", category="filesystem", title="stale",
+                verified_facts=[
+                    {"type": "raw", "value": "foo found", "invocation_id": inv_a},
+                ],
+                source_tool="search_strings",
+            )
+        assert "different task" in str(exc.value)
+
+    @pytest.mark.asyncio
+    async def test_missing_invocation_id_rejected(self, graph):
+        with pytest.raises(GroundingError):
+            await graph.add_phenomenon(
+                source_agent="fs", category="filesystem", title="ghost",
+                verified_facts=[{"type": "raw", "value": "x"}],
+                source_tool="t",
+            )
+
+    @pytest.mark.asyncio
+    async def test_unknown_invocation_id_rejected(self, graph):
+        with pytest.raises(GroundingError) as exc:
+            await graph.add_phenomenon(
+                source_agent="fs", category="filesystem", title="ghost",
+                verified_facts=[
+                    {"type": "raw", "value": "x", "invocation_id": "inv-does-not-exist"},
+                ],
+                source_tool="t",
+            )
+        assert "not found in invocation log" in str(exc.value)
+
+    @pytest.mark.asyncio
+    async def test_empty_verified_facts_allowed_for_negative_findings(self, graph):
+        # A negative finding ("searched X, found nothing") is permitted —
+        # no fact is required, the interpretation alone is recorded.
+        pid, _ = await graph.add_phenomenon(
+            source_agent="fs", category="filesystem",
+            title="No matches for 'cain' in user dirs",
+            interpretation="Searched C:\\Documents and Settings — no hits.",
+            source_tool="search_strings",
+        )
+        assert pid in graph.phenomena
+        assert graph.phenomena[pid].verified_facts == []
+
+    @pytest.mark.asyncio
+    async def test_persistence_round_trip_includes_invocations(self, graph, tmp_path):
+        inv_id = await graph.record_tool_invocation(
+            tool="list_directory", args={"inode": "1"}, output="root listing here",
+        )
+        path = tmp_path / "state.json"
+        graph.save_state(path)
+        loaded = EvidenceGraph.load_state(path)
+        assert inv_id in loaded.tool_invocations
+        inv = loaded.tool_invocations[inv_id]
+        assert inv.tool == "list_directory"
+        assert inv.output == "root listing here"
+
+    @pytest.mark.asyncio
+    async def test_legacy_description_loaded_as_interpretation(self):
+        # Pre-S2 state files persist description; from_dict must migrate.
+        legacy = {
+            "id": "ph-legacy01",
+            "source_agent": "fs",
+            "category": "filesystem",
+            "title": "old finding",
+            "description": "this was the analysis text in the old schema",
+        }
+        ph = Phenomenon.from_dict(legacy)
+        assert ph.interpretation == "this was the analysis text in the old schema"
+        assert ph.verified_facts == []
+        assert not hasattr(ph, "description")
+
+
+# ---------------------------------------------------------------------------
+# S3 log-odds confidence — order independence, calibration, idempotency
+# ---------------------------------------------------------------------------
+
+class TestLogOddsConfidence:
+    """Verify the additive log-odds update fixes the P3 ordering bug and
+    matches DESIGN.md §4.5 calibration values.
+    """
+
+    @pytest.fixture
+    def graph(self):
+        return EvidenceGraph()
+
+    def test_log_odds_round_trip(self):
+        for p in (0.1, 0.3, 0.5, 0.7, 0.9):
+            assert log_odds_to_prob(prob_to_log_odds(p)) == pytest.approx(p)
+        assert prob_to_log_odds(0.5) == pytest.approx(0.0)
+        assert log_odds_to_prob(0.0) == pytest.approx(0.5)
+
+    @pytest.mark.asyncio
+    async def test_order_independence(self, graph):
+        """Same edges in different orders → same final confidence (P3 fix)."""
+        # Three independent runs, each applying the same edges in a different order.
+        confs = []
+        for order in (["supports", "weakens", "supports"],
+                      ["weakens", "supports", "supports"],
+                      ["supports", "supports", "weakens"]):
+            g = EvidenceGraph()
+            hid = await g.add_hypothesis("h", "d")
+            for i, etype in enumerate(order):
+                pid, _ = await g.add_phenomenon(
+                    "fs", "filesystem", f"ph {i}", f"interp {i}",
+                    source_tool=f"t{i}",
+                )
+                await g.update_hypothesis_confidence(hid, pid, etype, "")
+            confs.append(g.hypotheses[hid].confidence)
+        # All three orderings must agree (up to float tolerance).
+        assert confs[0] == pytest.approx(confs[1])
+        assert confs[1] == pytest.approx(confs[2])
+        # And the value should be 1 + 1 − 0.5 = 1.5 in base-10 log-odds,
+        # i.e. 10**1.5 / (1 + 10**1.5) ≈ 0.9694.
+        assert confs[0] == pytest.approx(0.9694, abs=1e-3)
+
+    @pytest.mark.asyncio
+    async def test_each_edge_type_calibrated(self, graph):
+        """Each edge type produces the documented log_lr contribution."""
+        expected = {
+            "direct_evidence":      +2.0,
+            "supports":             +1.0,
+            "consequence_observed": +1.0,
+            "prerequisite_met":     +0.5,
+            "weakens":              -0.5,
+            "contradicts":          -2.0,
+        }
+        for etype, log_lr in expected.items():
+            g = EvidenceGraph()
+            pid, _ = await g.add_phenomenon(
+                "fs", "filesystem", f"ph-{etype}", "interp",
+                source_tool="t",
+            )
+            hid = await g.add_hypothesis("h", "d")
+            conf = await g.update_hypothesis_confidence(hid, pid, etype, "")
+            assert g.hypotheses[hid].log_odds == pytest.approx(log_lr)
+            assert conf == pytest.approx(log_odds_to_prob(log_lr))
+
+    @pytest.mark.asyncio
+    async def test_status_flips_at_threshold(self, graph):
+        hid = await graph.add_hypothesis("h", "d")
+        pid, _ = await graph.add_phenomenon(
+            "fs", "filesystem", "evidence", "interp", source_tool="t",
+        )
+        # Single supports (+1.0) → conf ≈ 0.909 (above 0.8 threshold)
+        await graph.update_hypothesis_confidence(hid, pid, "supports", "")
+        assert graph.hypotheses[hid].status == "supported"
+
+        # Now stack two contradicts (−2 ×2 = −4); log_odds = 1 − 4 = −3 → conf ≈ 0.001
+        for i in range(2):
+            pid_n, _ = await graph.add_phenomenon(
+                "fs", "filesystem", f"contra {i}", f"d {i}", source_tool="t",
+            )
+            await graph.update_hypothesis_confidence(hid, pid_n, "contradicts", "")
+        assert graph.hypotheses[hid].status == "refuted"
+        assert graph.hypotheses[hid].confidence < 0.05
+
+    @pytest.mark.asyncio
+    async def test_idempotency_same_triple(self, graph):
+        """Re-applying the same (ph, hyp, edge_type) triple must NOT double-count."""
+        pid, _ = await graph.add_phenomenon(
+            "fs", "filesystem", "evidence", "interp", source_tool="t",
+        )
+        hid = await graph.add_hypothesis("h", "d")
+        await graph.update_hypothesis_confidence(hid, pid, "supports", "first")
+        first = graph.hypotheses[hid].confidence
+        first_edges = sum(1 for e in graph.edges if e.edge_type == "supports")
+
+        # Same triple again — should be a no-op (return current confidence).
+        again = await graph.update_hypothesis_confidence(hid, pid, "supports", "dup")
+        assert again == pytest.approx(first)
+        # No additional edge created.
+        assert sum(1 for e in graph.edges if e.edge_type == "supports") == first_edges
+
+    @pytest.mark.asyncio
+    async def test_independent_evidence_accumulates(self, graph):
+        """Distinct phenomena with same edge_type DO accumulate (independent)."""
+        hid = await graph.add_hypothesis("h", "d")
+        for i in range(3):
+            pid, _ = await graph.add_phenomenon(
+                "fs", "filesystem", f"ph {i}", f"d {i}", source_tool="t",
+            )
+            await graph.update_hypothesis_confidence(hid, pid, "supports", "")
+        # 3 × +1.0 = +3.0 log_odds → conf ≈ 0.999
+        assert graph.hypotheses[hid].log_odds == pytest.approx(3.0)
+        assert graph.hypotheses[hid].confidence > 0.99
+
+    @pytest.mark.asyncio
+    async def test_prior_prob_shifts_starting_log_odds(self, graph):
+        # prior 0.9 → log_odds ≈ +0.954
+        hid = await graph.add_hypothesis("h", "d", prior_prob=0.9)
+        h = graph.hypotheses[hid]
+        assert h.log_odds == pytest.approx(0.9542, abs=1e-3)
+        assert h.confidence == pytest.approx(0.9)
+
+    @pytest.mark.asyncio
+    async def test_legacy_hypothesis_migrates_log_odds(self):
+        """Pre-S3 state with confidence only: from_dict derives log_odds."""
+        legacy = {
+            "id": "hyp-old",
+            "title": "legacy",
+            "description": "d",
+            "confidence": 0.8,
+            "status": "supported",
+        }
+        h = Hypothesis.from_dict(legacy)
+        assert h.log_odds == pytest.approx(prob_to_log_odds(0.8))
+        assert h.confidence == pytest.approx(0.8)
+        assert h.prior_prob == 0.5
+
+    @pytest.mark.asyncio
+    async def test_invalid_edge_type_rejected(self, graph):
+        pid, _ = await graph.add_phenomenon("fs", "filesystem", "ph", "d", source_tool="t")
+        hid = await graph.add_hypothesis("h", "d")
+        with pytest.raises(ValueError):
+            await graph.update_hypothesis_confidence(hid, pid, "bogus_edge_type", "")
+
+    @pytest.mark.asyncio
+    async def test_matrix_view_counts_edges(self, graph):
+        h1 = await graph.add_hypothesis("h1", "d")
+        h2 = await graph.add_hypothesis("h2", "d")
+        p1, _ = await graph.add_phenomenon("fs", "filesystem", "a", "x", source_tool="t")
+        p2, _ = await graph.add_phenomenon("fs", "filesystem", "b", "y", source_tool="t")
+        await graph.update_hypothesis_confidence(h1, p1, "supports", "")
+        await graph.update_hypothesis_confidence(h1, p2, "supports", "")
+        await graph.update_hypothesis_confidence(h2, p1, "contradicts", "")
+
+        mat = graph.hypothesis_evidence_matrix()
+        assert mat["counts_by_edge_type"][h1]["supports"] == 2
+        assert mat["counts_by_edge_type"][h2]["contradicts"] == 1
+        md = graph.hypothesis_evidence_matrix_markdown()
+        assert "h1" in md and "h2" in md
+        assert "supports" in md and "contradicts" in md
+
+
+# ---------------------------------------------------------------------------
+# S4 iOS plugin — archive + plist + sqlite + idevice + agent routing
+# ---------------------------------------------------------------------------
+
+class TestArchiveAndIOSPlugin:
+    """End-to-end smoke tests for the iOS toolset.
+
+    Synthesize a tiny iOS-like extraction in tmp_path (XML plist, binary
+    plist, iDevice_info.txt, a sqlite db, a keychain-shaped sqlite),
+    then drive every new tool against it. No network, no external
+    binaries — stdlib zipfile + plistlib + sqlite3 only.
+    """
+
+    @pytest.fixture
+    def fake_extraction_zip(self, tmp_path):
+        """Build a small zip that mimics an iOS extraction."""
+        import plistlib
+        import sqlite3
+        import zipfile
+
+        staging = tmp_path / "staging"
+        staging.mkdir()
+
+        # XML plist
+        xml_plist = staging / "Library" / "Preferences" / "com.example.plist"
+        xml_plist.parent.mkdir(parents=True)
+        with open(xml_plist, "wb") as f:
+            plistlib.dump({"DeviceName": "iPhone-Test", "UDID": "abcd1234"}, f)
+
+        # Binary plist
+        bin_plist = staging / "var" / "Info.plist"
+        bin_plist.parent.mkdir(parents=True)
+        with open(bin_plist, "wb") as f:
+            plistlib.dump(
+                {"ProductVersion": "16.5", "BuildVersion": "20F66"},
+                f, fmt=plistlib.FMT_BINARY,
+            )
+
+        # iDevice_info.txt
+        idinfo = staging / "iDevice_info.txt"
+        idinfo.write_text(
+            "DeviceName: iPhone-Test\n"
+            "ProductType: iPhone14,5\n"
+            "UniqueDeviceID: abcd1234ef567890\n"
+        )
+
+        # Generic SMS-like sqlite db
+        sms_db = staging / "HomeDomain" / "Library" / "SMS" / "sms.db"
+        sms_db.parent.mkdir(parents=True)
+        conn = sqlite3.connect(sms_db)
+        conn.executescript(
+            """
+            CREATE TABLE message (id INTEGER PRIMARY KEY, text TEXT, handle_id INTEGER);
+            CREATE TABLE handle (id INTEGER PRIMARY KEY, contact TEXT);
+            INSERT INTO message VALUES (1, 'meet at 8', 100);
+            INSERT INTO message VALUES (2, 'ok', 101);
+            INSERT INTO handle VALUES (100, '+85291234567');
+            INSERT INTO handle VALUES (101, '+85298765432');
+            """
+        )
+        conn.commit()
+        conn.close()
+
+        # Keychain-shaped sqlite at the canonical path
+        kc_db = staging / "var" / "keychains" / "keychain-2.db"
+        kc_db.parent.mkdir(parents=True)
+        conn = sqlite3.connect(kc_db)
+        conn.executescript(
+            """
+            CREATE TABLE genp (agrp TEXT, acct TEXT, svce TEXT, data BLOB);
+            INSERT INTO genp VALUES ('com.apple.test', 'alice', 'AppleID', NULL);
+            INSERT INTO genp VALUES ('com.apple.test', 'bob',   'iCloud',  NULL);
+            CREATE TABLE inet (agrp TEXT, acct TEXT, srvr TEXT);
+            INSERT INTO inet VALUES ('com.apple.test', 'alice', 'gmail.com');
+            """
+        )
+        conn.commit()
+        conn.close()
+
+        # Zip it
+        zpath = tmp_path / "fake_ios.zip"
+        with zipfile.ZipFile(zpath, "w", zipfile.ZIP_DEFLATED) as zf:
+            for p in staging.rglob("*"):
+                if p.is_file():
+                    zf.write(p, p.relative_to(staging))
+        return zpath
+
+    @pytest.mark.asyncio
+    async def test_unzip_archive_extracts_safely(self, fake_extraction_zip, tmp_path):
+        from tools.archive import unzip_archive
+        dest = tmp_path / "out"
+        result = await unzip_archive(str(fake_extraction_zip), str(dest))
+        assert "Extracted" in result
+        assert (dest / "iDevice_info.txt").is_file()
+        assert (dest / "HomeDomain" / "Library" / "SMS" / "sms.db").is_file()
+        assert (dest / "var" / "keychains" / "keychain-2.db").is_file()
+
+    @pytest.mark.asyncio
+    async def test_unzip_archive_blocks_zip_slip(self, tmp_path):
+        import zipfile
+        from tools.archive import unzip_archive
+        malicious = tmp_path / "evil.zip"
+        with zipfile.ZipFile(malicious, "w") as zf:
+            zf.writestr("../../escape.txt", "should not land here")
+            zf.writestr("/abs/path.txt", "neither should this")
+            zf.writestr("clean.txt", "this is fine")
+        dest = tmp_path / "out"
+        result = await unzip_archive(str(malicious), str(dest))
+        assert "Skipped" in result
+        # Only the clean entry made it in.
+        assert (dest / "clean.txt").is_file()
+        assert not (tmp_path / "escape.txt").exists()
+        assert not (tmp_path.parent / "escape.txt").exists()
+
+    @pytest.mark.asyncio
+    async def test_encrypted_zip_without_password_returns_clear_error(self, tmp_path):
+        import zipfile
+        from tools.archive import unzip_archive
+        enc = tmp_path / "locked.zip"
+        with zipfile.ZipFile(enc, "w") as zf:
+            zf.writestr("secret.txt", "do not read")
+            for info in zf.infolist():
+                info.flag_bits |= 0x1   # mark as encrypted (ZipCrypto-style)
+        result = await unzip_archive(str(enc), str(tmp_path / "out"))
+        assert result.startswith("Error:")
+        assert "password-protected" in result
+        assert "meta.password" in result
+
+    # NB: there's no roundtrip extract-with-correct-password test because
+    # stdlib zipfile cannot WRITE encrypted archives, so we can't synthesize
+    # one in tmp_path without a third-party writer. The encrypted-without-
+    # password path above exercises the new error branch; the happy-path
+    # password decryption is best verified end-to-end against a real
+    # password-protected source from case.yaml.
+
+    @pytest.mark.asyncio
+    async def test_unzip_archive_idempotent(self, fake_extraction_zip, tmp_path):
+        from tools.archive import unzip_archive
+        dest = tmp_path / "out"
+        r1 = await unzip_archive(str(fake_extraction_zip), str(dest))
+        r2 = await unzip_archive(str(fake_extraction_zip), str(dest))
+        assert "Extracted" in r1
+        # Second pass should report zero new extractions.
+        assert "Extracted 0 file(s)" in r2
+
+    @pytest.mark.asyncio
+    async def test_parse_plist_xml_and_binary(self, fake_extraction_zip, tmp_path):
+        from tools.archive import unzip_archive
+        from tools.mobile_ios import parse_plist
+        dest = tmp_path / "out"
+        await unzip_archive(str(fake_extraction_zip), str(dest))
+
+        xml = await parse_plist(str(dest / "Library" / "Preferences" / "com.example.plist"))
+        assert "iPhone-Test" in xml
+        assert "abcd1234" in xml
+
+        bin_ = await parse_plist(str(dest / "var" / "Info.plist"))
+        assert "16.5" in bin_
+        assert "20F66" in bin_
+
+    @pytest.mark.asyncio
+    async def test_sqlite_tables_and_query(self, fake_extraction_zip, tmp_path):
+        from tools.archive import unzip_archive
+        from tools.mobile_ios import sqlite_tables, sqlite_query
+        dest = tmp_path / "out"
+        await unzip_archive(str(fake_extraction_zip), str(dest))
+        sms = dest / "HomeDomain" / "Library" / "SMS" / "sms.db"
+
+        tables = await sqlite_tables(str(sms))
+        assert "message" in tables and "handle" in tables
+        assert "2 row(s)" in tables  # message + handle each have 2 rows
+
+        rows = await sqlite_query(str(sms), "SELECT text FROM message", max_rows=10)
+        assert "meet at 8" in rows
+        assert "ok" in rows
+
+    @pytest.mark.asyncio
+    async def test_sqlite_query_rejects_non_select(self, fake_extraction_zip, tmp_path):
+        from tools.archive import unzip_archive
+        from tools.mobile_ios import sqlite_query
+        dest = tmp_path / "out"
+        await unzip_archive(str(fake_extraction_zip), str(dest))
+        sms = dest / "HomeDomain" / "Library" / "SMS" / "sms.db"
+        result = await sqlite_query(str(sms), "DELETE FROM message")
+        assert "only single SELECT" in result
+        # And a multi-statement chain is also blocked
+        result2 = await sqlite_query(
+            str(sms), "SELECT 1; SELECT 2"
+        )
+        assert "multi-statement" in result2
+
+    @pytest.mark.asyncio
+    async def test_parse_ios_keychain(self, fake_extraction_zip, tmp_path):
+        from tools.archive import unzip_archive
+        from tools.mobile_ios import parse_ios_keychain
+        dest = tmp_path / "out"
+        await unzip_archive(str(fake_extraction_zip), str(dest))
+        # Pass the containing dir — the parser auto-locates keychain-2.db.
+        result = await parse_ios_keychain(str(dest / "var" / "keychains"))
+        assert "genp" in result
+        assert "alice" in result
+        assert "AppleID" in result
+        assert "inet" in result
+
+    @pytest.mark.asyncio
+    async def test_read_idevice_info(self, fake_extraction_zip, tmp_path):
+        from tools.archive import unzip_archive
+        from tools.mobile_ios import read_idevice_info
+        dest = tmp_path / "out"
+        await unzip_archive(str(fake_extraction_zip), str(dest))
+        # Pass the root — the helper finds iDevice_info.txt inside.
+        result = await read_idevice_info(str(dest))
+        assert "DeviceName: iPhone-Test" in result
+        assert "iPhone14,5" in result
+
+    @pytest.mark.asyncio
+    async def test_list_extracted_dir_summarises_huge_tree(self, fake_extraction_zip, tmp_path):
+        from tools.archive import unzip_archive
+        from tools.parsers import list_extracted_dir
+        dest = tmp_path / "out"
+        await unzip_archive(str(fake_extraction_zip), str(dest))
+        summary = await list_extracted_dir(str(dest))
+        # Smart summary: total, extension breakdown, layout, largest.
+        assert "Total:" in summary
+        assert "Extension breakdown" in summary
+        assert "Top-level layout" in summary
+        # The iOS fixture has HomeDomain + var + Library at the top.
+        assert "HomeDomain" in summary or "Library" in summary or "var" in summary
+        # And the breakdown lists the .plist extension (the fixture's databases use .db).
+        assert ".plist" in summary
+        # Steers the agent toward find_files for targeted lookups.
+        assert "find_files" in summary
+
+    @pytest.mark.asyncio
+    async def test_find_files_locates_specific_artefacts(self, fake_extraction_zip, tmp_path):
+        from tools.archive import unzip_archive
+        from tools.parsers import find_files
+        dest = tmp_path / "out"
+        await unzip_archive(str(fake_extraction_zip), str(dest))
+
+        # Glob for sms.db anywhere — the iOS fixture has it under HomeDomain.
+        sms_hits = await find_files(str(dest), "**/sms.db")
+        assert "sms.db" in sms_hits
+        assert "matches: 1" in sms_hits
+
+        # Plists at any depth.
+        plist_hits = await find_files(str(dest), "**/*.plist")
+        # The fixture writes com.example.plist + Info.plist.
+        assert "matches: 2" in plist_hits or "matches: 3" in plist_hits
+
+        # Anchored glob: only entries under var/.
+        var_hits = await find_files(str(dest), "var/**")
+        assert "matches:" in var_hits
+        # Keychain DB is under var/keychains in the fixture, so it shows up.
+        assert "keychain-2.db" in var_hits
+
+        # A pattern that matches nothing returns the empty-result message.
+        none = await find_files(str(dest), "**/nonexistent_pattern_xyz")
+        assert "no matches" in none
+
+
+class TestAgentFactoryRouting:
+    """SOURCE_TYPE_AGENTS maps source.type to the right triage agent."""
+
+    def test_disk_image_routes_to_filesystem(self):
+        from agent_factory import get_triage_agent_type
+        assert get_triage_agent_type("disk_image") == "filesystem"
+
+    def test_mobile_extraction_routes_to_ios_artifact(self):
+        from agent_factory import get_triage_agent_type
+        assert get_triage_agent_type("mobile_extraction") == "ios_artifact"
+
+    def test_unknown_type_falls_back_to_filesystem(self):
+        from agent_factory import get_triage_agent_type
+        assert get_triage_agent_type("totally_unknown_type") == "filesystem"
+
+    def test_ios_artifact_class_is_registered(self):
+        from agent_factory import _AGENT_CLASSES, _load_agent_classes
+        from agents.ios_artifact import IOSArtifactAgent
+        _load_agent_classes()
+        assert _AGENT_CLASSES["ios_artifact"] is IOSArtifactAgent
+
+
+# ---------------------------------------------------------------------------
+# S5 cross-source entity coreference (DESIGN.md §4.6)
+# ---------------------------------------------------------------------------
+
+class TestEntityCoref:
+    """Identity observation + automatic coref hypothesis + same_as edges.
+
+    Setup helper: every test records an invocation whose output literally
+    contains the identifier values being asserted, so the grounding gateway
+    accepts the observe_identity calls.
+    """
+
+    @pytest.fixture
+    def graph(self):
+        g = EvidenceGraph()
+        g._current_agent = "tester"
+        g._current_task_id = "task-coref"
+        return g
+
+    async def _record(self, graph, output: str) -> str:
+        return await graph.record_tool_invocation(
+            tool="probe", args={}, output=output,
+        )
+
+    @pytest.mark.asyncio
+    async def test_strong_shared_identifier_creates_supported_coref(self, graph):
+        inv = await self._record(
+            graph, "Found email alice@example.com on system A\nAlso alice@example.com on system B",
+        )
+        # Two entities (different sources, different names) sharing an email.
+        r1 = await graph.observe_identity(
+            entity_name="alice@example.com (laptop)", entity_type="person",
+            identifier_type="email", value="alice@example.com",
+            source_agent="tester", invocation_id=inv,
+        )
+        r2 = await graph.observe_identity(
+            entity_name="alice@example.com (phone)", entity_type="person",
+            identifier_type="email", value="alice@example.com",
+            source_agent="tester", invocation_id=inv,
+        )
+        # Second observation should fire a coref proposal.
+        assert r2["coref_proposals"], "expected a coref proposal on the second observation"
+        prop = r2["coref_proposals"][0]
+        # log_odds = prior(0.1)→-0.954 + shared_strong(+2.0) = +1.046 → conf ≈ 0.917
+        assert prop["confidence"] > 0.8
+        # And a same_as edge between the two entities is now active.
+        e1, e2 = r1["entity_id"], r2["entity_id"]
+        cluster = graph.resolve_actor_cluster(e1)
+        assert e2 in cluster and e1 in cluster
+
+    @pytest.mark.asyncio
+    async def test_weak_identifier_alone_is_not_enough(self, graph):
+        inv = await self._record(graph, "nickname mr_evil appears here and there")
+        r1 = await graph.observe_identity(
+            entity_name="user-A", entity_type="person",
+            identifier_type="nickname", value="mr_evil",
+            source_agent="tester", invocation_id=inv,
+        )
+        r2 = await graph.observe_identity(
+            entity_name="user-B", entity_type="person",
+            identifier_type="nickname", value="mr_evil",
+            source_agent="tester", invocation_id=inv,
+        )
+        prop = r2["coref_proposals"][0]
+        # log_odds = -0.954 + shared_weak(+0.5) = -0.454 → conf ≈ 0.260
+        assert prop["confidence"] < 0.8
+        # No active same_as edge between them.
+        cluster = graph.resolve_actor_cluster(r1["entity_id"])
+        assert r2["entity_id"] not in cluster
+
+    @pytest.mark.asyncio
+    async def test_conflicting_strong_identifier_blocks_coref(self, graph):
+        # Two entities share a nickname but each has a DIFFERENT strong email.
+        inv = await self._record(
+            graph,
+            "user-A: nickname mr_evil, email alice@example.com\n"
+            "user-B: nickname mr_evil, email bob@other.org",
+        )
+        await graph.observe_identity(
+            entity_name="user-A", entity_type="person",
+            identifier_type="email", value="alice@example.com",
+            source_agent="tester", invocation_id=inv,
+        )
+        await graph.observe_identity(
+            entity_name="user-B", entity_type="person",
+            identifier_type="email", value="bob@other.org",
+            source_agent="tester", invocation_id=inv,
+        )
+        r1 = await graph.observe_identity(
+            entity_name="user-A", entity_type="person",
+            identifier_type="nickname", value="mr_evil",
+            source_agent="tester", invocation_id=inv,
+        )
+        r2 = await graph.observe_identity(
+            entity_name="user-B", entity_type="person",
+            identifier_type="nickname", value="mr_evil",
+            source_agent="tester", invocation_id=inv,
+        )
+        # r2's nickname triggers blocking + conflict detection.
+        prop = r2["coref_proposals"][0]
+        # shared_weak (+0.5) + conflicting_strong (-2.0) = net -1.5
+        # from prior -0.954 → L = -2.454 → conf ≈ 0.0035 (refuted)
+        assert prop["confidence"] < 0.05
+        assert prop["conflicts"], "expected conflicting email to be flagged"
+        # No same_as edge.
+        cluster = graph.resolve_actor_cluster(r1["entity_id"])
+        assert r2["entity_id"] not in cluster
+
+    @pytest.mark.asyncio
+    async def test_observe_identity_grounding_rejects_fabrication(self, graph):
+        inv = await self._record(graph, "only this exact string is allowed")
+        with pytest.raises(GroundingError):
+            await graph.observe_identity(
+                entity_name="ghost", entity_type="person",
+                identifier_type="email", value="not-in-output@nope.com",
+                source_agent="tester", invocation_id=inv,
+            )
+
+    @pytest.mark.asyncio
+    async def test_unknown_identifier_type_rejected(self, graph):
+        inv = await self._record(graph, "value-here")
+        with pytest.raises(ValueError):
+            await graph.observe_identity(
+                entity_name="x", entity_type="person",
+                identifier_type="not_a_real_type", value="value-here",
+                source_agent="tester", invocation_id=inv,
+            )
+
+    @pytest.mark.asyncio
+    async def test_phone_number_normalization(self, graph):
+        # Two formats of the same number should collide.
+        inv = await self._record(
+            graph, "saw +852 9123-4567 on A and 85291234567 on B",
+        )
+        r1 = await graph.observe_identity(
+            entity_name="caller-A", entity_type="person",
+            identifier_type="phone_number", value="+852 9123-4567",
+            source_agent="tester", invocation_id=inv,
+        )
+        r2 = await graph.observe_identity(
+            entity_name="caller-B", entity_type="person",
+            identifier_type="phone_number", value="85291234567",
+            source_agent="tester", invocation_id=inv,
+        )
+        assert r2["coref_proposals"], "phone normalization should match"
+        assert r2["coref_proposals"][0]["confidence"] > 0.8
+
+    @pytest.mark.asyncio
+    async def test_repeating_identifier_is_idempotent(self, graph):
+        inv = await self._record(graph, "alice@example.com everywhere")
+        r1 = await graph.observe_identity(
+            entity_name="alice", entity_type="person",
+            identifier_type="email", value="alice@example.com",
+            source_agent="tester", invocation_id=inv,
+        )
+        r2 = await graph.observe_identity(
+            entity_name="alice", entity_type="person",
+            identifier_type="email", value="alice@example.com",
+            source_agent="tester", invocation_id=inv,
+        )
+        # Same entity, same identifier — second call is a no-op.
+        assert r1["entity_id"] == r2["entity_id"]
+        assert r2["new_identifier"] is False
+        ent = graph.entities[r1["entity_id"]]
+        assert len([i for i in ent.identifiers if i["type"] == "email"]) == 1
+
+    @pytest.mark.asyncio
+    async def test_coref_is_reversible_via_contradicting_evidence(self, graph):
+        # Establish a supported coref via shared email.
+        inv = await self._record(
+            graph, "shared email alice@example.com on A and B",
+        )
+        r1 = await graph.observe_identity(
+            entity_name="A", entity_type="person",
+            identifier_type="email", value="alice@example.com",
+            source_agent="tester", invocation_id=inv,
+        )
+        r2 = await graph.observe_identity(
+            entity_name="B", entity_type="person",
+            identifier_type="email", value="alice@example.com",
+            source_agent="tester", invocation_id=inv,
+        )
+        hid = r2["coref_proposals"][0]["hypothesis_id"]
+        assert graph.hypotheses[hid].confidence > 0.8
+        # Now a separate phenomenon contradicts the coref directly.
+        contra_pid, _ = await graph.add_phenomenon(
+            source_agent="tester", category="identity_observation",
+            title="A and B were online simultaneously on different devices",
+            interpretation="they cannot be the same actor",
+            source_tool="tester",
+        )
+        await graph.update_hypothesis_confidence(
+            hid, contra_pid, "contradicts", "different physical presence",
+        )
+        # log_odds was ~+1.046; contradicts (-2.0) drops to -0.954 → conf 0.10
+        assert graph.hypotheses[hid].confidence < 0.2
+        # same_as edge between the entities is now inactive — cluster shrinks.
+        cluster = graph.resolve_actor_cluster(r1["entity_id"])
+        assert r2["entity_id"] not in cluster
+        # The audit edge still exists, just inactive.
+        edges = [
+            e for e in graph.edges
+            if e.edge_type == "same_as"
+            and {e.source_id, e.target_id} == {r1["entity_id"], r2["entity_id"]}
+        ]
+        assert len(edges) == 1
+        assert edges[0].metadata.get("active") is False
+
+    @pytest.mark.asyncio
+    async def test_blocking_avoids_n_squared(self, graph):
+        # Build 30 entities with NO shared/similar identifiers — verify no
+        # coref hypothesis is created (no O(n²) hyp explosion).
+        for i in range(30):
+            inv = await self._record(graph, f"unique-email-{i}@example.com")
+            await graph.observe_identity(
+                entity_name=f"person-{i}", entity_type="person",
+                identifier_type="email", value=f"unique-email-{i}@example.com",
+                source_agent="tester", invocation_id=inv,
+            )
+        coref_hyps = [
+            h for h in graph.hypotheses.values()
+            if h.id.startswith("hyp-coref-")
+        ]
+        assert coref_hyps == [], "blocking should prevent O(n²) coref hypotheses"
+
+    @pytest.mark.asyncio
+    async def test_actor_cluster_spans_three_entities(self, graph):
+        # A ≡ B via email, B ≡ C via phone → transitively, all three cluster.
+        inv = await self._record(
+            graph,
+            "A: alice@a.com\n"
+            "B: alice@a.com, phone 12345\n"
+            "C: phone 12345",
+        )
+        rA = await graph.observe_identity(
+            entity_name="A", entity_type="person",
+            identifier_type="email", value="alice@a.com",
+            source_agent="tester", invocation_id=inv,
+        )
+        rB1 = await graph.observe_identity(
+            entity_name="B", entity_type="person",
+            identifier_type="email", value="alice@a.com",
+            source_agent="tester", invocation_id=inv,
+        )
+        rB2 = await graph.observe_identity(
+            entity_name="B", entity_type="person",
+            identifier_type="phone_number", value="12345",
+            source_agent="tester", invocation_id=inv,
+        )
+        rC = await graph.observe_identity(
+            entity_name="C", entity_type="person",
+            identifier_type="phone_number", value="12345",
+            source_agent="tester", invocation_id=inv,
+        )
+        cluster = graph.resolve_actor_cluster(rA["entity_id"])
+        assert {rA["entity_id"], rB1["entity_id"], rC["entity_id"]} <= cluster
+
+    @pytest.mark.asyncio
+    async def test_actor_clusters_renders_via_report_tool(self, graph):
+        """Report agent's get_actor_clusters renders cluster + identifiers."""
+        from agents.report import ReportAgent
+        from unittest.mock import AsyncMock
+        inv = await self._record(graph, "shared email alice@a.com on both A and B")
+        await graph.observe_identity(
+            entity_name="A", entity_type="person",
+            identifier_type="email", value="alice@a.com",
+            source_agent="tester", invocation_id=inv,
+        )
+        await graph.observe_identity(
+            entity_name="B", entity_type="person",
+            identifier_type="email", value="alice@a.com",
+            source_agent="tester", invocation_id=inv,
+        )
+        # Drive the read tool directly (no LLM in the loop for this assertion).
+        agent = ReportAgent(AsyncMock(), graph)
+        rendered = await agent._get_actor_clusters()
+        assert "MULTI-SOURCE CLUSTER" in rendered
+        assert "alice@a.com" in rendered
+        assert "Backing coref hypotheses" in rendered
+
+    @pytest.mark.asyncio
+    async def test_actor_clusters_groups_all_entities(self, graph):
+        # Two distinct clusters: A≡B, and C alone.
+        inv = await self._record(graph, "A=alice@a.com B=alice@a.com C=carol@c.com")
+        await graph.observe_identity(
+            entity_name="A", entity_type="person",
+            identifier_type="email", value="alice@a.com",
+            source_agent="tester", invocation_id=inv,
+        )
+        await graph.observe_identity(
+            entity_name="B", entity_type="person",
+            identifier_type="email", value="alice@a.com",
+            source_agent="tester", invocation_id=inv,
+        )
+        await graph.observe_identity(
+            entity_name="C", entity_type="person",
+            identifier_type="email", value="carol@c.com",
+            source_agent="tester", invocation_id=inv,
+        )
+        clusters = graph.actor_clusters()
+        # Two clusters (sizes 2 and 1); only the sizes are asserted here.
+        sizes = sorted(len(c["members"]) for c in clusters)
+        assert sizes == [1, 2]
+        # And each cluster aggregates its identifiers.
+        for c in clusters:
+            assert any(i["type"] == "email" for i in c["identifiers"])
+
+
+# ---------------------------------------------------------------------------
+# S6 Android + Media + multi-source orchestration
+# ---------------------------------------------------------------------------
+
+class TestAndroidPartitionProbe:
+    """probe_android_partitions parses mmls output and translates sector units."""
+
+    def test_parse_mmls_with_4096_byte_sectors(self):
+        from tools.mobile_android import _parse_mmls_with_unit
+        sample = (
+            "GUID Partition Table (EFI)\n"
+            "Offset Sector: 0\n"
+            "Units are in 4096-byte sectors\n"
+            "\n"
+            "      Slot      Start        End          Length       Description\n"
+            "000:  Meta      0000000000   0000000000   0000000001   Safety Table\n"
+            "001:  -------   0000000000   0000001023   0000001024   Unallocated\n"
+            "002:  Meta      0000000001   0000000001   0000000001   GPT Header\n"
+            "003:  Meta      0000000002   0000000005   0000000004   Partition Table\n"
+            "004:  000       0000001024   0000002047   0000001024   BOTA0\n"
+            "017:  013       0000048128   0001148927   0001100800   SYSTEM\n"
+            "021:  017       0001203968   0007806719   0006602752   USERDATA\n"
+        )
+        sector_size, parts = _parse_mmls_with_unit(sample)
+        assert sector_size == 4096
+        # Three real partition rows (rest are Meta / Unallocated).
+        assert len(parts) == 3
+        # Sector translation: BOTA0 at native 1024 (4K sectors) = 4194304 bytes = sector 8192 (512-byte).
+        bota0 = parts[0]
+        assert bota0["start_native"] == 1024
+        # The probe converts to 512-byte sectors when it emits markdown
+        # (tested separately against the live image); this only checks the parse.
+        assert bota0["description"] == "BOTA0"
+
+    def test_parse_mmls_defaults_to_512_when_unit_missing(self):
+        from tools.mobile_android import _parse_mmls_with_unit
+        sample = (
+            "DOS Partition Table\n"
+            "Offset Sector: 0\n"
+            # No "Units are in N-byte sectors" line, so the parser must default to 512.
+            "\n"
+            "002:  000       0000000063   0019999999   0019999937   NTFS\n"
+        )
+        sector_size, parts = _parse_mmls_with_unit(sample)
+        assert sector_size == 512
+        assert len(parts) == 1
+        assert parts[0]["start_native"] == 63
+
+
+class TestMediaOCR:
+    """ocr_image reports a missing OCR runtime or a missing file as an Error: string."""
+
+    @pytest.mark.asyncio
+    async def test_missing_runtime_returns_install_hint(self, tmp_path):
+        from tools.media import ocr_image, _has_ocr_runtime
+        # On a host without the runtime, the tool should not raise; it
+        # should return a string prefixed with "Error:" that the agent can
+        # record as a negative finding.
+        available, _reason = _has_ocr_runtime()
+        if available:
+            pytest.skip("OCR runtime is installed on this host")
+        # Create a placeholder file so the path-existence check passes.
+        fake = tmp_path / "fake.jpg"
+        fake.write_bytes(b"\xff\xd8\xff\xe0not-a-real-image")
+        result = await ocr_image(str(fake))
+        assert result.startswith("Error: OCR runtime not available")
+        assert "pip install pytesseract pillow" in result
+        assert "tesseract-ocr" in result
+
+    @pytest.mark.asyncio
+    async def test_missing_file_returns_clear_error(self, tmp_path):
+        from tools.media import ocr_image
+        result = await ocr_image(str(tmp_path / "no-such-file.jpg"))
+        assert result.startswith("Error: ")
+        assert "is not a file" in result
+
+
+class TestSetActivePartition:
+    """set_active_partition mutates graph.active_source.partition_offset."""
+
+    @pytest.mark.asyncio
+    async def test_mutates_offset(self):
+        from case import EvidenceSource
+        from tool_registry import register_all_tools, TOOL_CATALOG
+        graph = EvidenceGraph()
+        graph.active_source = EvidenceSource(
+            id="src-test", label="test", type="disk_image",
+            path="/tmp/whatever", access_mode="image", partition_offset=0,
+        )
+        graph.partition_offset = 0
+        register_all_tools(graph)
+        td = TOOL_CATALOG["set_active_partition"]
+        # The executor prefixes its output with "[invocation: inv-xxx]\n", so match on a substring.
+        result = await td.executor(partition_offset=614400)
+        assert "0 → 614400" in result
+        assert graph.active_source.partition_offset == 614400
+        # Legacy mirror field kept in sync too.
+        assert graph.partition_offset == 614400
+
+
+class TestPlatformRouting:
+    """get_triage_agent_type distinguishes Windows vs Android disk_images."""
+
+    def test_disk_image_default_falls_back_to_filesystem(self):
+        from agent_factory import get_triage_agent_type
+        from case import EvidenceSource
+        src = EvidenceSource(
+            id="x", label="x", type="disk_image", path="/x",
+            access_mode="image",
+        )
+        assert get_triage_agent_type(src) == "filesystem"
+
+    def test_disk_image_windows_routes_to_filesystem(self):
+        from agent_factory import get_triage_agent_type
+        from case import EvidenceSource
+        src = EvidenceSource(
+            id="x", label="x", type="disk_image", path="/x",
+            access_mode="image", meta={"platform": "windows"},
+        )
+        assert get_triage_agent_type(src) == "filesystem"
+
+    def test_disk_image_android_routes_to_android_artifact(self):
+        from agent_factory import get_triage_agent_type
+        from case import EvidenceSource
+        src = EvidenceSource(
+            id="x", label="x", type="disk_image", path="/x",
+            access_mode="image", meta={"platform": "android"},
+        )
+        assert get_triage_agent_type(src) == "android_artifact"
+
+    def test_media_collection_routes_to_media(self):
+        from agent_factory import get_triage_agent_type
+        from case import EvidenceSource
+        src = EvidenceSource(
+            id="x", label="x", type="media_collection", path="/x",
+            access_mode="tree",
+        )
+        assert get_triage_agent_type(src) == "media"
+
+    def test_string_signature_back_compat(self):
+        # The S5 signature accepted a plain source.type string.
+        from agent_factory import get_triage_agent_type
+        assert get_triage_agent_type("mobile_extraction") == "ios_artifact"
+        assert get_triage_agent_type("disk_image") == "filesystem"
+
+    def test_android_and_media_classes_registered(self):
+        from agent_factory import _AGENT_CLASSES, _load_agent_classes
+        from agents.android_artifact import AndroidArtifactAgent
+        from agents.media import MediaAgent
+        _load_agent_classes()
+        assert _AGENT_CLASSES["android_artifact"] is AndroidArtifactAgent
+        assert _AGENT_CLASSES["media"] is MediaAgent
+
+
+class TestOrchestratorMultiSource:
+    """Phase 1 iterates over every analysable source in the case."""
+
+    @pytest.mark.asyncio
+    async def test_phase1_dispatches_one_agent_per_source(self):
+        from unittest.mock import AsyncMock
+        from agent_factory import AgentFactory
+        from case import Case, EvidenceSource
+        from orchestrator import Orchestrator
+
+        graph = EvidenceGraph()
+        case = Case(
+            case_id="multi",
+            name="multi-source",
+            sources=[
+                EvidenceSource(
+                    id="src-win", label="USB", type="disk_image", path="/tmp/usb",
+                    access_mode="image", meta={"platform": "windows"},
+                ),
+                EvidenceSource(
+                    id="src-ios", label="iPhone", type="mobile_extraction",
+                    path="/tmp/ios-tree", access_mode="tree",
+                ),
+                EvidenceSource(
+                    id="src-droid", label="Android", type="disk_image",
+                    path="/tmp/droid", access_mode="image",
+                    meta={"platform": "android"},
+                ),
+            ],
+        )
+        graph.case = case
+        graph.set_active_source(case.sources[0])
+
+        invoked: list[tuple[str, str]] = []
+
+        class FakeAgent:
+            def __init__(self, name): self.name = name
+            async def run(self, task, lead_id=None):
+                invoked.append((self.name, graph.active_source.id))
+
+        llm = AsyncMock()
+        factory = AgentFactory(llm, graph)
+        factory._cache = {
+            "filesystem":       FakeAgent("filesystem"),
+            "ios_artifact":     FakeAgent("ios_artifact"),
+            "android_artifact": FakeAgent("android_artifact"),
+        }
+        orch = Orchestrator(llm, graph, factory)
+
+        sources = orch._sources_to_triage()
+        for src in sources:
+            await orch._phase1_triage_source(src)
+
+        # Each source got triaged by exactly the right agent, with
+        # active_source pointed at it during the run.
+        assert invoked == [
+            ("filesystem",       "src-win"),
+            ("ios_artifact",     "src-ios"),
+            ("android_artifact", "src-droid"),
+        ]
+
+    def test_is_analysable_filters_correctly(self):
+        from case import EvidenceSource
+        from orchestrator import Orchestrator
+        ok_disk = EvidenceSource(id="a", label="", type="disk_image", path="/x", access_mode="image")
+        ok_ios = EvidenceSource(id="b", label="", type="mobile_extraction", path="/y", access_mode="tree")
+        ok_archive = EvidenceSource(id="c", label="", type="archive", path="/z", access_mode="tree")
+        ok_media = EvidenceSource(id="d", label="", type="media_collection", path="/w", access_mode="tree")
+        no_path = EvidenceSource(id="e", label="", type="disk_image", path="", access_mode="image")
+        assert Orchestrator._is_analysable(ok_disk)
+        assert Orchestrator._is_analysable(ok_ios)
+        assert Orchestrator._is_analysable(ok_archive)
+        assert Orchestrator._is_analysable(ok_media)
+        assert not Orchestrator._is_analysable(no_path)
+
diff --git a/tool_registry.py b/tool_registry.py
index 63b0239..87d5d67 100644
--- a/tool_registry.py
+++ b/tool_registry.py
@@ -1,6 +1,8 @@
 """Central tool registry — catalogs all available forensic tools.
 
-Tools are registered once at startup with bound image_path and offset.
+Tools are registered once at startup. Sleuth Kit tools resolve their image
+path and partition offset from graph.active_source at call time, so a single
+registered tool follows whichever evidence source is currently active.
 The AgentFactory uses this catalog to compose agents dynamically.
 """
 
@@ -14,6 +16,11 @@ import re
 from dataclasses import dataclass, field
 from typing import Any
 
+from evidence_graph import GroundingError
+from tools import archive as arc
+from tools import media as med
+from tools import mobile_android as android
+from tools import mobile_ios as ios
 from tools import parsers
 from tools import registry as reg
 from tools import sleuthkit as tsk
@@ -35,6 +42,13 @@ CACHEABLE_TOOLS: set[str] = {
     "parse_registry_key", "search_registry", "get_user_activity",
     "read_text_file", "read_binary_preview", "search_text_file",
     "read_text_file_section", "list_extracted_dir", "parse_pcap_strings",
+    "find_files",
+    # iOS (read-only file parses):
+    "parse_plist", "sqlite_tables", "sqlite_query",
+    "parse_ios_keychain", "read_idevice_info",
+    # Android + media (read-only):
+    "probe_android_partitions", "ocr_image",
+    # NB: unzip_archive and set_active_partition are NOT cached — they have side effects.
 }
 
 
@@ -45,24 +59,106 @@ def _cache_key(tool_name: str, kwargs: dict) -> str:
     return f"{tool_name}:{args_hash}"
 
 
+def _looks_like_error(text: str) -> bool:
+    """Heuristic for unsuccessful tool output (the prior cache filter, plus icat failures)."""
+    return text.startswith(("Error", "[Command failed", "[icat failed"))
+
+
 def _make_cached(tool_name: str, executor: Any) -> Any:
-    """Wrap an executor with an in-memory result cache."""
+    """Thin in-memory cache wrapper around a tool executor.
+
+    Kept as a standalone primitive (no graph dependency) so unit tests can
+    exercise caching in isolation. Production wiring composes this with
+    invocation logging via :func:`_make_invocation_executor`.
+    """
 
     async def wrapper(**kwargs) -> str:
         key = _cache_key(tool_name, kwargs)
-        cached = _tool_result_cache.get(key)
-        if cached is not None:
-            logger.debug("Cache hit: %s(%s)", tool_name, kwargs)
-            return cached
+        hit = _tool_result_cache.get(key)
+        if hit is not None:
+            return hit
         result = await executor(**kwargs)
-        # Only cache successful results (not errors)
-        if not result.startswith("Error") and not result.startswith("[Command failed"):
+        if not _looks_like_error(result):
             _tool_result_cache[key] = result
         return result
 
     return wrapper
 
 
+def _make_invocation_executor(
+    tool_name: str,
+    executor: Any,
+    graph: Any,
+    *,
+    cacheable: bool,
+    auto_record_category: str | None = None,
+) -> Any:
+    """Single uniform wrapper around a forensic tool executor.
+
+    Responsibilities (in order):
+      1. Serve from the result cache when ``cacheable=True`` and the key
+         is hot. Cached hits still produce a fresh ToolInvocation record
+         marked ``cached=True`` so the agent can still cite them.
+      2. Call the underlying executor on cache miss; store on success.
+      3. Record a :class:`ToolInvocation` on the graph (this is the
+         provenance unit the grounding gateway looks up).
+      4. (Optionally) auto-record the raw output as a Phenomenon with a
+         single ``type=raw`` fact citing the invocation just made. This
+         replaces the pre-S2 ``_make_auto_record`` shortcut.
+      5. Return the result with a ``[invocation: inv-xxx]`` header so
+         the LLM learns the ID to put in ``add_phenomenon`` facts.
+    """
+
+    async def wrapper(**kwargs) -> str:
+        cached_flag = False
+        cache_hit_key: str | None = None
+        text: str | None = None
+
+        if cacheable:
+            cache_hit_key = _cache_key(tool_name, kwargs)
+            hit = _tool_result_cache.get(cache_hit_key)
+            if hit is not None:
+                logger.debug("Cache hit: %s(%s)", tool_name, kwargs)
+                text, cached_flag = hit, True
+
+        if text is None:
+            text = await executor(**kwargs)
+            if cacheable and cache_hit_key and not _looks_like_error(text):
+                _tool_result_cache[cache_hit_key] = text
+
+        inv_id = await graph.record_tool_invocation(
+            tool=tool_name, args=kwargs, output=text, cached=cached_flag,
+        )
+
+        # Auto-record the raw output as a phenomenon (single grounded fact).
+        # Skipped on error outputs; the graph is required here (the invocation above already used it).
+        if auto_record_category and not _looks_like_error(text):
+            agent = getattr(graph, "_current_agent", "") or "unknown"
+            first_line = text.split("\n", 1)[0][:80]
+            try:
+                await graph.add_phenomenon(
+                    source_agent=agent,
+                    category=auto_record_category,
+                    title=f"{tool_name}: {first_line}",
+                    interpretation="(auto-recorded raw tool output)",
+                    verified_facts=[{
+                        "type": "raw",
+                        "value": text[:2000],
+                        "invocation_id": inv_id,
+                    }],
+                    source_tool=tool_name,
+                )
+            except GroundingError as e:
+                # Should never happen for auto-record (we just wrote the
+                # invocation; value is a literal prefix of output). Log
+                # loudly if it does — that's a bug, not a hallucination.
+                logger.error("Auto-record grounding failed for %s: %s", tool_name, e)
+
+        return f"[invocation: {inv_id}]\n{text}"
+
+    return wrapper
+
+
 def get_cache_stats() -> dict[str, int]:
     """Return cache statistics for diagnostics."""
     return {"entries": len(_tool_result_cache)}
@@ -77,12 +173,11 @@ ASSET_CATEGORIES = [
 ]
 
 
-def _auto_categorize(filename: str) -> str:
-    """Infer asset category from filename."""
+def _auto_categorize_windows(filename: str) -> str:
+    """Original Windows-leaning heuristic for disk-image-extracted artifacts."""
     name_lower = filename.lower()
     ext = os.path.splitext(name_lower)[1]
 
-    # Check full name (with extension) and base name against known hive names
     if name_lower in _REGISTRY_HIVE_NAMES:
         return "registry_hive"
     if ext == ".pf":
@@ -93,7 +188,7 @@ def _auto_categorize(filename: str) -> str:
         return "address_book"
     if name_lower == "info2" or re.match(r"dc\d+\.exe", name_lower):
         return "recycle_bin"
-    # Extension-based checks before keyword-based (e.g. mirc.ini → config, not chat)
+    # Extension-based checks before keyword-based (e.g. mirc.ini → config, not chat).
     if ext in (".ini", ".csv", ".dat", ".cfg"):
         return "config_file"
     if ext in (".log", ".lst"):
@@ -107,6 +202,49 @@ def _auto_categorize(filename: str) -> str:
     return "other"
 
 
+def _auto_categorize_ios(filename: str) -> str:
+    """iOS extraction heuristic — plist / sqlite / keychain land here.
+
+    Domain-rooted iOS extractions yield specific filenames (sms.db,
+    AddressBook.sqlitedb, keychain-2.db, *.plist) that the Windows
+    categorizer would dump into 'other' — fixing P4.
+    """
+    name_lower = filename.lower()
+    ext = os.path.splitext(name_lower)[1]
+
+    if name_lower == "keychain-2.db":
+        return "ios_keychain"
+    if name_lower in ("sms.db", "chatstorage.sqlite"):
+        return "messaging_db"
+    if name_lower in ("addressbook.sqlitedb", "addressbookimages.sqlitedb"):
+        return "address_book"
+    if name_lower == "idevice_info.txt":
+        return "device_info"
+    if ext in (".sqlite", ".sqlite3", ".sqlitedb", ".db"):
+        return "sqlite_db"
+    if ext == ".plist":
+        return "plist"
+    if ext == ".log":
+        return "text_log"
+    return "other"
+
+
+# Per-source-type categorizers, dispatched by _auto_categorize on the caller's
+# source_type (normally graph.active_source.type). Addresses P4 for iOS sources.
+_CATEGORIZERS = {
+    "disk_image": _auto_categorize_windows,
+    "mobile_extraction": _auto_categorize_ios,
+    "archive": _auto_categorize_windows,
+    "media_collection": lambda fn: "other",
+}
+
+
+def _auto_categorize(filename: str, source_type: str = "disk_image") -> str:
+    """Dispatch to a source-type-aware categorizer (defaults to Windows)."""
+    fn = _CATEGORIZERS.get(source_type, _auto_categorize_windows)
+    return fn(filename)
+
+
 @dataclass
 class ToolDefinition:
     """A registered tool available for agent composition."""
@@ -123,44 +261,53 @@ class ToolDefinition:
 TOOL_CATALOG: dict[str, ToolDefinition] = {}
 
 
-def _make_auto_record(tool_name: str, category: str, executor: Any, graph: Any) -> Any:
-    """Wrap a forensic tool to auto-record its result as a phenomenon."""
-
-    async def wrapper(**kwargs) -> str:
-        result = await executor(**kwargs)
-        if graph is None or not result or result.startswith("Error") or result.startswith("["):
-            return result
-        # Auto-record: the tool produced a forensic fact
-        agent = getattr(graph, "_current_agent", "") or "unknown"
-        title = f"{tool_name}: {result.split(chr(10))[0][:80]}"
-        await graph.add_phenomenon(
-            source_agent=agent,
-            category=category,
-            title=title,
-            description=result[:2000],
-            source_tool=tool_name,
-        )
-        return result
-
-    return wrapper
+# Mapping of tool_name -> category for tools that auto-record a phenomenon when run.
+# Replaces the pre-S2 ``_make_auto_record`` per-tool wrapping; the central
+# instrumentation pass at the end of register_all_tools applies these.
+AUTO_RECORD_TOOLS: dict[str, str] = {
+    "list_installed_software": "registry",
+    "get_system_info": "registry",
+    "get_timezone_info": "registry",
+    "get_computer_name": "registry",
+    "get_shutdown_time": "registry",
+    "enumerate_users": "registry",
+    "get_network_interfaces": "registry",
+    "get_email_config": "registry",
+    "parse_prefetch": "filesystem",
+}
 
 
-def register_all_tools(
-    image_path: str,
-    partition_offset: int,
-    graph: Any = None,
-    extracted_dir: str = "extracted",
-) -> None:
-    """Populate TOOL_CATALOG with all available tools, pre-bound to image/offset."""
+def register_all_tools(graph: Any) -> None:
+    """Populate TOOL_CATALOG with all available forensic tools.
+
+    Tools no longer close over a fixed image path. The Sleuth Kit tools
+    resolve the image path and partition offset from ``graph.active_source``
+    at call time, so the same registered tool follows whichever evidence
+    source the orchestrator has made active.
+    """
     TOOL_CATALOG.clear()
 
+    def _img() -> str:
+        """Resolve the active source's image path at tool-call time."""
+        src = getattr(graph, "active_source", None)
+        if src is None or not src.path:
+            raise RuntimeError(
+                "No active evidence source — call graph.set_active_source() first."
+            )
+        return src.path
+
+    def _off() -> int:
+        """Resolve the active source's partition offset at tool-call time."""
+        src = getattr(graph, "active_source", None)
+        return src.partition_offset if src is not None else 0
+
     # ---- Sleuth Kit tools ----
 
     TOOL_CATALOG["partition_info"] = ToolDefinition(
         name="partition_info",
         description="Get the partition table layout of the disk image. Run this first to understand disk structure.",
         input_schema={"type": "object", "properties": {}},
-        executor=lambda: tsk.partition_info(image_path),
+        executor=lambda: tsk.partition_info(_img()),
         module="sleuthkit",
         tags=["filesystem", "disk", "partition"],
     )
@@ -169,7 +316,7 @@ def register_all_tools(
         name="filesystem_info",
         description="Get detailed filesystem information (type, block size, volume name, etc.) for the selected partition.",
         input_schema={"type": "object", "properties": {}},
-        executor=lambda: tsk.filesystem_info(image_path, partition_offset),
+        executor=lambda: tsk.filesystem_info(_img(), _off()),
         module="sleuthkit",
         tags=["filesystem", "disk"],
     )
@@ -185,7 +332,7 @@ def register_all_tools(
             },
         },
         executor=lambda inode=None, recursive=False: tsk.list_directory(
-            image_path, partition_offset, inode, recursive
+            _img(), _off(), inode, recursive
         ),
         module="sleuthkit",
         tags=["filesystem", "directory", "listing"],
@@ -204,12 +351,13 @@ def register_all_tools(
                 )
 
         # Resolve real disk path first
-        orig_path = (await tsk.find_file(image_path, inode, partition_offset)).strip()
+        orig_path = (await tsk.find_file(_img(), inode, _off())).strip()
         if not orig_path or "not found" in orig_path.lower():
             return f"Error: inode {inode} not found on the disk image."
 
         # Derive local filename from real disk path
         filename = os.path.basename(orig_path)
+        extracted_dir = graph.extracted_dir
         local_path = os.path.join(extracted_dir, filename)
 
         # Handle name collisions by appending inode
@@ -219,12 +367,15 @@ def register_all_tools(
             filename = os.path.basename(local_path)
 
         # Extract
-        result = await tsk.extract_file(image_path, inode, local_path, partition_offset)
+        result = await tsk.extract_file(_img(), inode, local_path, _off())
         if result.startswith("[icat failed"):
             return result
 
         size = os.path.getsize(local_path) if os.path.exists(local_path) else 0
-        category = _auto_categorize(os.path.basename(orig_path))
+        src_type = (
+            graph.active_source.type if graph.active_source else "disk_image"
+        )
+        category = _auto_categorize(os.path.basename(orig_path), src_type)
 
         # Register
         if graph is not None:
@@ -275,7 +426,7 @@ def register_all_tools(
             },
             "required": ["inode"],
         },
-        executor=lambda inode: tsk.find_file(image_path, inode, partition_offset),
+        executor=lambda inode: tsk.find_file(_img(), inode, _off()),
         module="sleuthkit",
         tags=["filesystem"],
     )
@@ -290,7 +441,7 @@ def register_all_tools(
             },
             "required": ["pattern"],
         },
-        executor=lambda pattern: tsk.search_strings(image_path, pattern),
+        executor=lambda pattern: tsk.search_strings(_img(), pattern),
         module="sleuthkit",
         tags=["filesystem", "search", "strings"],
     )
@@ -299,7 +450,7 @@ def register_all_tools(
         name="count_deleted_files",
         description="List and count all deleted files. Shows total count, executables, and extension breakdown.",
         input_schema={"type": "object", "properties": {}},
-        executor=lambda: tsk.count_deleted_files(image_path, partition_offset),
+        executor=lambda: tsk.count_deleted_files(_img(), _off()),
         module="sleuthkit",
         tags=["filesystem", "deleted", "recovery"],
     )
@@ -308,7 +459,7 @@ def register_all_tools(
         name="build_filesystem_timeline",
         description="Build a MAC timeline from the filesystem (Modified/Accessed/Changed times for all files).",
         input_schema={"type": "object", "properties": {}},
-        executor=lambda: tsk.build_timeline(image_path, partition_offset),
+        executor=lambda: tsk.build_timeline(_img(), _off()),
         module="sleuthkit",
         tags=["filesystem", "timeline"],
     )
@@ -341,8 +492,7 @@ def register_all_tools(
             },
             "required": ["hive_path"],
         },
-        executor=_make_auto_record("list_installed_software", "registry",
-                                   lambda hive_path: reg.list_installed_software(hive_path), graph),
+        executor=lambda hive_path: reg.list_installed_software(hive_path),
         module="registry",
         tags=["registry", "software", "installed"],
     )
@@ -390,8 +540,7 @@ def register_all_tools(
             },
             "required": ["hive_path"],
         },
-        executor=_make_auto_record("get_system_info", "registry",
-                                   lambda hive_path: reg.get_system_info(hive_path), graph),
+        executor=lambda hive_path: reg.get_system_info(hive_path),
         module="registry",
         tags=["registry", "system"],
     )
@@ -406,8 +555,7 @@ def register_all_tools(
             },
             "required": ["hive_path"],
         },
-        executor=_make_auto_record("get_timezone_info", "registry",
-                                   lambda hive_path: reg.get_timezone_info(hive_path), graph),
+        executor=lambda hive_path: reg.get_timezone_info(hive_path),
         module="registry",
         tags=["registry", "timezone", "system"],
     )
@@ -422,8 +570,7 @@ def register_all_tools(
             },
             "required": ["hive_path"],
         },
-        executor=_make_auto_record("get_computer_name", "registry",
-                                   lambda hive_path: reg.get_computer_name(hive_path), graph),
+        executor=lambda hive_path: reg.get_computer_name(hive_path),
         module="registry",
         tags=["registry", "system", "hostname"],
     )
@@ -438,8 +585,7 @@ def register_all_tools(
             },
             "required": ["hive_path"],
         },
-        executor=_make_auto_record("get_shutdown_time", "registry",
-                                   lambda hive_path: reg.get_shutdown_time(hive_path), graph),
+        executor=lambda hive_path: reg.get_shutdown_time(hive_path),
         module="registry",
         tags=["registry", "system", "shutdown"],
     )
@@ -454,8 +600,7 @@ def register_all_tools(
             },
             "required": ["hive_path"],
         },
-        executor=_make_auto_record("enumerate_users", "registry",
-                                   lambda hive_path: reg.enumerate_users(hive_path), graph),
+        executor=lambda hive_path: reg.enumerate_users(hive_path),
         module="registry",
         tags=["registry", "user", "accounts", "sam"],
     )
@@ -470,8 +615,7 @@ def register_all_tools(
             },
             "required": ["hive_path"],
         },
-        executor=_make_auto_record("get_network_interfaces", "registry",
-                                   lambda hive_path: reg.get_network_interfaces(hive_path), graph),
+        executor=lambda hive_path: reg.get_network_interfaces(hive_path),
         module="registry",
         tags=["registry", "network", "adapter", "ip"],
     )
@@ -486,8 +630,7 @@ def register_all_tools(
             },
             "required": ["hive_path"],
         },
-        executor=_make_auto_record("get_email_config", "registry",
-                                   lambda hive_path: reg.get_email_config(hive_path), graph),
+        executor=lambda hive_path: reg.get_email_config(hive_path),
         module="registry",
         tags=["registry", "email", "account"],
     )
@@ -504,8 +647,7 @@ def register_all_tools(
             },
             "required": ["file_path"],
         },
-        executor=_make_auto_record("parse_prefetch", "filesystem",
-                                   lambda file_path: parsers.parse_prefetch(file_path), graph),
+        executor=lambda file_path: parsers.parse_prefetch(file_path),
         module="parsers",
         tags=["filesystem", "prefetch", "execution"],
     )
@@ -577,7 +719,13 @@ def register_all_tools(
 
     TOOL_CATALOG["list_extracted_dir"] = ToolDefinition(
         name="list_extracted_dir",
-        description="List files in an extracted directory with sizes.",
+        description=(
+            "Summarise an extracted directory tree: total counts, "
+            "extension breakdown, top-level layout, largest files. "
+            "Stays useful on trees of 10,000+ files, where a full listing would be truncated. "
+            "For targeted searches (find every *.plist, locate sms.db, ...) "
+            "use find_files instead."
+        ),
         input_schema={
             "type": "object",
             "properties": {
@@ -590,6 +738,31 @@ def register_all_tools(
         tags=["filesystem", "listing", "extracted"],
     )
 
+    TOOL_CATALOG["find_files"] = ToolDefinition(
+        name="find_files",
+        description=(
+            "Recursively find files under a directory by glob pattern. "
+            "Use this on tree-mode sources (iOS extractions, archives, "
+            "Android-mounted partitions) to locate specific artefacts in "
+            "huge trees. Patterns are fnmatch-style; '**' means 'any "
+            "depth'. Examples: '**/sms.db', '**/keychain-2.db', "
+            "'**/ChatStorage.sqlite', '**/*.plist', 'HomeDomain/Library/**'. "
+            "Results are sorted by size, descending, and capped at max_results."
+        ),
+        input_schema={
+            "type": "object",
+            "properties": {
+                "root": {"type": "string", "description": "Directory to search under."},
+                "pattern": {"type": "string", "description": "fnmatch glob pattern (use '**' for any depth)."},
+                "max_results": {"type": "integer", "description": "Result cap (default 500)."},
+            },
+            "required": ["root", "pattern"],
+        },
+        executor=lambda root, pattern, max_results=500: parsers.find_files(root, pattern, max_results),
+        module="parsers",
+        tags=["filesystem", "search", "extracted", "glob"],
+    )
+
     TOOL_CATALOG["parse_pcap_strings"] = ToolDefinition(
         name="parse_pcap_strings",
         description="Extract HTTP headers, hosts, User-Agent, cookies, and URLs from a PCAP/capture file.",
@@ -605,11 +778,224 @@ def register_all_tools(
         tags=["network", "pcap", "http", "capture"],
     )
 
-    # ---- Apply result caching to deterministic read-only tools ----
-    # Must come AFTER all tools are registered. Auto-record wrapped tools
-    # (e.g. get_system_info) are NOT in CACHEABLE_TOOLS since they write
-    # to the evidence graph as a side effect.
+    # ---- Archive tools (tree-mode prep) ----
+
+    TOOL_CATALOG["unzip_archive"] = ToolDefinition(
+        name="unzip_archive",
+        description=(
+            "Extract a .zip archive into a target directory. Defensive against "
+            "zip-slip; skips symlinks. Idempotent on rerun. Pass `password` for "
+            "password-protected zips — only the legacy ZipCrypto algorithm is "
+            "supported by stdlib (AES zips need an external `7z x` step)."
+        ),
+        input_schema={
+            "type": "object",
+            "properties": {
+                "zip_path": {"type": "string", "description": "Path to the .zip file."},
+                "dest_dir": {"type": "string", "description": "Directory to extract into (created if missing)."},
+                "password": {"type": "string", "description": "Password for encrypted zips (omit for plain archives)."},
+            },
+            "required": ["zip_path", "dest_dir"],
+        },
+        executor=lambda zip_path, dest_dir, password=None: arc.unzip_archive(zip_path, dest_dir, password),
+        module="archive",
+        tags=["archive", "zip", "extract", "ingest"],
+    )
+
+    # ---- iOS plugin tools (DESIGN.md §4.7) ----
+
+    TOOL_CATALOG["parse_plist"] = ToolDefinition(
+        name="parse_plist",
+        description=(
+            "Parse a .plist file (XML or binary) and return its contents as JSON. "
+            "Bytes are rendered as hex; dates as ISO-8601."
+        ),
+        input_schema={
+            "type": "object",
+            "properties": {
+                "file_path": {"type": "string", "description": "Path to .plist file."},
+            },
+            "required": ["file_path"],
+        },
+        executor=lambda file_path: ios.parse_plist(file_path),
+        module="mobile_ios",
+        tags=["ios", "plist", "parse"],
+    )
+
+    TOOL_CATALOG["sqlite_tables"] = ToolDefinition(
+        name="sqlite_tables",
+        description=(
+            "List user tables in a sqlite database with row counts and column "
+            "names. Use this to scout an unfamiliar .sqlite / .db file before "
+            "querying it."
+        ),
+        input_schema={
+            "type": "object",
+            "properties": {
+                "db_path": {"type": "string", "description": "Path to .sqlite/.db file."},
+            },
+            "required": ["db_path"],
+        },
+        executor=lambda db_path: ios.sqlite_tables(db_path),
+        module="mobile_ios",
+        tags=["sqlite", "schema", "ios", "android"],
+    )
+
+    TOOL_CATALOG["sqlite_query"] = ToolDefinition(
+        name="sqlite_query",
+        description=(
+            "Run a single read-only SELECT against a sqlite file. "
+            "Multi-statement queries and non-SELECT statements are rejected. "
+            "Use this for sms.db / ChatStorage.sqlite / AddressBook.sqlitedb / etc."
+        ),
+        input_schema={
+            "type": "object",
+            "properties": {
+                "db_path": {"type": "string", "description": "Path to .sqlite/.db file."},
+                "query": {"type": "string", "description": "A single SELECT statement."},
+                "max_rows": {"type": "integer", "description": "Row cap (default 100)."},
+            },
+            "required": ["db_path", "query"],
+        },
+        executor=lambda db_path, query, max_rows=100: ios.sqlite_query(db_path, query, max_rows),
+        module="mobile_ios",
+        tags=["sqlite", "query", "ios", "android"],
+    )
+
+    TOOL_CATALOG["parse_ios_keychain"] = ToolDefinition(
+        name="parse_ios_keychain",
+        description=(
+            "Locate and summarise iOS keychain entries (keychain-2.db). "
+            "Pass either the db file directly or the containing directory; "
+            "dumps account metadata from the genp/inet/cert/keys tables."
+        ),
+        input_schema={
+            "type": "object",
+            "properties": {
+                "keychain_root": {
+                    "type": "string",
+                    "description": "Path to keychain-2.db or a directory that contains it.",
+                },
+            },
+            "required": ["keychain_root"],
+        },
+        executor=lambda keychain_root: ios.parse_ios_keychain(keychain_root),
+        module="mobile_ios",
+        tags=["ios", "keychain", "credentials"],
+    )
+
+    TOOL_CATALOG["read_idevice_info"] = ToolDefinition(
+        name="read_idevice_info",
+        description=(
+            "Read the iDevice_info.txt summary at the root of an iOS extraction. "
+            "Pass the file path or the extraction root directory."
+        ),
+        input_schema={
+            "type": "object",
+            "properties": {
+                "file_path": {"type": "string", "description": "Path to iDevice_info.txt or extraction root."},
+            },
+            "required": ["file_path"],
+        },
+        executor=lambda file_path: ios.read_idevice_info(file_path),
+        module="mobile_ios",
+        tags=["ios", "device", "metadata"],
+    )
+
+    # ---- Android plugin (DESIGN.md §4.7) ----
+
+    TOOL_CATALOG["probe_android_partitions"] = ToolDefinition(
+        name="probe_android_partitions",
+        description=(
+            "Survey every partition on an Android disk dump (mmls + per-"
+            "partition fsstat). Returns a markdown table with name, native "
+            "and 512-byte sector offsets, filesystem type, and a strategy "
+            "hint per partition. Use this BEFORE deciding which partitions "
+            "to dive into via set_active_partition + list_directory."
+        ),
+        input_schema={"type": "object", "properties": {}},
+        executor=lambda: android.probe_android_partitions(_img()),
+        module="mobile_android",
+        tags=["android", "partition", "survey"],
+    )
+
+    async def _set_active_partition(partition_offset: int) -> str:
+        src = getattr(graph, "active_source", None)
+        if src is None:
+            return "Error: no active evidence source."
+        old = src.partition_offset
+        new = int(partition_offset)
+        src.partition_offset = new
+        # Sync the legacy mirror field so older readers stay consistent.
+        graph.partition_offset = new
+        return (
+            f"Active partition offset: {old} → {new} (512-byte sectors). "
+            f"Subsequent list_directory / extract_file / search_strings "
+            f"calls now target this partition on {src.id} ({src.label})."
+        )
+
+    TOOL_CATALOG["set_active_partition"] = ToolDefinition(
+        name="set_active_partition",
+        description=(
+            "Switch the current partition offset (in 512-byte sectors) on "
+            "the active disk-image source. Use the values from the "
+            "'start (512-sector)' column of probe_android_partitions. NOT a "
+            "forensic read — purely repoints the TSK toolset. Mutates "
+            "shared state; call serially within one agent run."
+        ),
+        input_schema={
+            "type": "object",
+            "properties": {
+                "partition_offset": {
+                    "type": "integer",
+                    "description": "Partition start in 512-byte sectors.",
+                },
+            },
+            "required": ["partition_offset"],
+        },
+        executor=_set_active_partition,
+        module="mobile_android",
+        tags=["android", "partition", "navigation"],
+    )
+
+    # ---- Media plugin (DESIGN.md §4.7) ----
+
+    TOOL_CATALOG["ocr_image"] = ToolDefinition(
+        name="ocr_image",
+        description=(
+            "Extract text from an image via tesseract. The LLM backend has "
+            "no vision, so this is the only way to read JPEG/PNG evidence "
+            "(screenshots of chats, transactions, IDs). Default lang covers "
+            "English + Simplified & Traditional Chinese; override `lang` "
+            "if you know the artefact's language. Returns 'Error: OCR "
+            "runtime not available' with an install hint when tesseract "
+            "isn't on the host — record that absence as a negative "
+            "finding rather than guessing."
+        ),
+        input_schema={
+            "type": "object",
+            "properties": {
+                "file_path": {"type": "string", "description": "Path to image file."},
+                "lang": {"type": "string", "description": "Tesseract language code(s), e.g. 'eng' or 'eng+chi_sim'."},
+            },
+            "required": ["file_path"],
+        },
+        executor=lambda file_path, lang="eng+chi_sim+chi_tra": med.ocr_image(file_path, lang),
+        module="media",
+        tags=["media", "ocr", "image"],
+    )
+
+    # ---- Wrap every executor with invocation logging (+ cache + auto-record) ----
+    # Must run AFTER all tools are registered. Every tool call now produces
+    # a ToolInvocation entry on the graph (provenance for grounding), and
+    # returns the result prefixed with ``[invocation: inv-xxx]`` so the LLM
+    # can cite the call in add_phenomenon facts.
     _tool_result_cache.clear()
     for tool_name, td in TOOL_CATALOG.items():
-        if tool_name in CACHEABLE_TOOLS:
-            td.executor = _make_cached(tool_name, td.executor)
+        td.executor = _make_invocation_executor(
+            tool_name,
+            td.executor,
+            graph,
+            cacheable=(tool_name in CACHEABLE_TOOLS),
+            auto_record_category=AUTO_RECORD_TOOLS.get(tool_name),
+        )
diff --git a/tools/archive.py b/tools/archive.py
new file mode 100644
index 0000000..42ab1f7
--- /dev/null
+++ b/tools/archive.py
@@ -0,0 +1,156 @@
+"""Archive extraction tools — generic unzip for tree-mode evidence sources.
+
+Mobile extractions (iOS / Android backups), archive sources, and shared
+work products all arrive as .zip files. The forensic agents work on the
+unpacked tree; this module is the single entry point for safely turning
+an archive into a directory.
+
+Stdlib-only. No graph dependency.
+"""
+
+from __future__ import annotations
+
+import logging
+import os
+import zipfile
+from pathlib import Path
+
+logger = logging.getLogger(__name__)
+
+
+def _is_within(base: Path, target: Path) -> bool:
+    """True when *target* resolves to a path inside *base* — symlink-safe."""
+    try:
+        base_r = base.resolve()
+        target_r = target.resolve()
+    except OSError:
+        return False
+    try:
+        target_r.relative_to(base_r)
+    except ValueError:
+        return False
+    return True
+
+
+def _is_zip_encrypted(zf: zipfile.ZipFile) -> bool:
+    """True when any entry has the zip 'encrypted' flag bit set."""
+    return any(info.flag_bits & 0x1 for info in zf.infolist())
+
+
+def _do_extract(
+    zip_path: str,
+    dest_dir: str,
+    password: str | None = None,
+) -> str:
+    """Shared core for unzip_archive (async) and unzip_archive_sync.
+
+    Pure stdlib + filesystem I/O — no asyncio. Idempotent on rerun (files
+    whose target already exists at the matching size are skipped). Returns
+    a multi-line summary the agent can read directly.
+    """
+    zp = Path(zip_path)
+    if not zp.is_file():
+        return f"Error: {zip_path} is not a file."
+
+    dest = Path(dest_dir)
+    dest.mkdir(parents=True, exist_ok=True)
+
+    extracted = 0
+    skipped: list[str] = []
+    total_bytes = 0
+    pwd_bytes = password.encode("utf-8") if password else None
+
+    try:
+        with zipfile.ZipFile(zp, "r") as zf:
+            encrypted = _is_zip_encrypted(zf)
+            if encrypted and pwd_bytes is None:
+                return (
+                    f"Error: {zip_path} is password-protected. "
+                    f"Provide the password via case.yaml's "
+                    f"meta.password on this source, or pass `password=` "
+                    f"explicitly. Stdlib zipfile only supports the legacy "
+                    f"ZipCrypto algorithm — AES-encrypted zips (created by "
+                    f"7-Zip / WinZip) need an external tool like 7z."
+                )
+            for info in zf.infolist():
+                name = info.filename
+                # Block absolute paths and parent-escape attempts up front.
+                if name.startswith(("/", "\\")) or ".." in Path(name).parts:
+                    skipped.append(f"escape: {name}")
+                    continue
+                target = dest / name
+                if not _is_within(dest, target):
+                    skipped.append(f"escape: {name}")
+                    continue
+                # Symlink entries — skip rather than risk traversing out.
+                if ((info.external_attr >> 16) & 0o170000) == 0o120000:
+                    skipped.append(f"symlink: {name}")
+                    continue
+                if info.is_dir():
+                    target.mkdir(parents=True, exist_ok=True)
+                    continue
+                # Skip if already extracted with matching size (idempotent rerun).
+                if target.exists() and target.stat().st_size == info.file_size:
+                    continue
+                target.parent.mkdir(parents=True, exist_ok=True)
+                try:
+                    with zf.open(info, "r", pwd=pwd_bytes) as src, open(target, "wb") as out:
+                        while True:
+                            chunk = src.read(65536)
+                            if not chunk:
+                                break
+                            out.write(chunk)
+                except RuntimeError as e:
+                    # zipfile raises RuntimeError for bad-password / AES-encrypted.
+                    msg = str(e)
+                    if "Bad password" in msg or "password required" in msg:
+                        return (
+                            f"Error: bad or missing password for {zip_path}. "
+                            f"If the zip is AES-encrypted (7-Zip/WinZip), stdlib "
+                            f"cannot decrypt it — use `7z x -p<pwd> ...` "
+                            f"externally and point the source path at the result."
+                        )
+                    raise
+                extracted += 1
+                total_bytes += info.file_size
+    except zipfile.BadZipFile as e:
+        return f"Error: {zip_path} is not a valid zip archive: {e}"
+    except Exception as e:
+        return f"Error extracting {zip_path}: {e}"
+
+    parts = [
+        f"Extracted {extracted} file(s), {total_bytes} bytes, into {dest}",
+    ]
+    if skipped:
+        parts.append(f"Skipped {len(skipped)} unsafe entries:")
+        for s in skipped[:10]:
+            parts.append(f"  - {s}")
+        if len(skipped) > 10:
+            parts.append(f"  ... ({len(skipped) - 10} more)")
+    return "\n".join(parts)
+
+
+async def unzip_archive(
+    zip_path: str, dest_dir: str, password: str | None = None,
+) -> str:
+    """Extract *zip_path* into *dest_dir*. Idempotent on rerun.
+
+    Defensive: rejects entries with absolute paths, any '..' component, or that
+    would resolve outside *dest_dir* (the classic zip-slip vector). Symlink
+    entries are skipped (we never follow symlinks into the host filesystem).
+    Password-protected zips need the password argument (or
+    ``meta.password`` on the source in case.yaml) — stdlib ``zipfile``
+    only handles the legacy ZipCrypto algorithm.
+    """
+    return _do_extract(zip_path, dest_dir, password)
+
+
+def unzip_archive_sync(
+    zip_path: str, dest_dir: str, password: str | None = None,
+) -> str:
+    """Synchronous variant of :func:`unzip_archive` for startup-time prepare_source.
+
+    Same behaviour, just no async wrapping — used before the event loop
+    starts so we don't have to spin one up just to unpack a zip.
+    """
+    return _do_extract(zip_path, dest_dir, password)
diff --git a/tools/media.py b/tools/media.py
new file mode 100644
index 0000000..7738107
--- /dev/null
+++ b/tools/media.py
@@ -0,0 +1,87 @@
+"""Media plugin — OCR for image evidence.
+
+DESIGN.md §4.7: the model backend (DeepSeek) has no vision, so we MUST run
+OCR locally for any image-bearing evidence. Tesseract via pytesseract is
+the default; if the runtime is missing those packages, the tool returns a
+clear install hint rather than failing silently.
+"""
+
+from __future__ import annotations
+
+import logging
+import os
+from pathlib import Path
+
+logger = logging.getLogger(__name__)
+
+MAX_OUTPUT = 8000
+
+_INSTALL_HINT = (
+    "Error: OCR runtime not available. Install with:\n"
+    "  pip install pytesseract pillow\n"
+    "  sudo apt install tesseract-ocr tesseract-ocr-chi-sim tesseract-ocr-chi-tra\n"
+    "(or the equivalent for your distribution). Then retry."
+)
+
+
+def _has_ocr_runtime() -> tuple[bool, str]:
+    """Return (available, reason). reason is empty when available."""
+    try:
+        import pytesseract  # noqa: F401
+        from PIL import Image  # noqa: F401
+    except ImportError as e:
+        return False, f"missing python package: {e.name}"
+    # Check the tesseract binary too.
+    import shutil
+    if shutil.which("tesseract") is None:
+        return False, "tesseract binary not on PATH"
+    return True, ""
+
+
+async def ocr_image(file_path: str, lang: str = "eng+chi_sim+chi_tra") -> str:
+    """Extract text from an image via tesseract.
+
+    *lang* defaults to English + Simplified + Traditional Chinese, matching
+    the multi-language artefacts the current case involves. Pass a single
+    language code (e.g. ``"eng"``) to skip language packs that aren't
+    installed.
+    """
+    p = Path(file_path)
+    if not p.is_file():
+        return f"Error: {file_path} is not a file."
+    available, reason = _has_ocr_runtime()
+    if not available:
+        return f"{_INSTALL_HINT}\n[detail: {reason}]"
+
+    import pytesseract
+    from PIL import Image
+
+    try:
+        img = Image.open(p)
+    except Exception as e:
+        return f"Error: could not open image {file_path}: {e}"
+
+    try:
+        text = pytesseract.image_to_string(img, lang=lang)
+    except pytesseract.TesseractError as e:
+        msg = str(e)
+        if "Failed loading language" in msg or "Error opening data file" in msg:
+            return (
+                f"Error: tesseract is installed but missing language pack(s) for {lang!r}. "
+                f"Install the language data (e.g. tesseract-ocr-chi-sim) or pass a "
+                f"different `lang`. Detail: {msg}"
+            )
+        return f"Error running tesseract: {msg}"
+    except Exception as e:
+        return f"Error during OCR: {e}"
+
+    size = p.stat().st_size
+    header = (
+        f"ocr: {file_path} ({size} bytes, lang={lang}, "
+        f"{len(text.splitlines())} line(s))\n"
+    )
+    if len(text) > MAX_OUTPUT - len(header):
+        body = text[:MAX_OUTPUT - len(header)] + "\n[truncated]"
+    else:
+        body = text
+    return header + body
diff --git a/tools/mobile_android.py b/tools/mobile_android.py
new file mode 100644
index 0000000..5d6e44b
--- /dev/null
+++ b/tools/mobile_android.py
@@ -0,0 +1,160 @@
+"""Android plugin tools — partition survey + sector translation.
+
+DESIGN.md §4.7 Android: ``mmls`` partitions → per-partition image-mode source;
+``fsstat`` per partition to classify ext4/F2FS/raw/encrypted. The shared TSK
+toolchain already handles ext4/F2FS reads, so once the agent picks a partition
+offset the standard list_directory / extract_file / search_strings tools work.
+
+Quirk: Samsung dumps (e.g. ``blk0_sda.bin``) use 4096-byte image sectors but
+TSK tool flags accept 512-byte sectors by default. ``probe_android_partitions``
+emits BOTH unit systems so the agent can plug the right ``partition_offset``
+value into ``set_active_partition``.
+"""
+
+from __future__ import annotations
+
+import asyncio
+import logging
+import re
+from pathlib import Path
+
+logger = logging.getLogger(__name__)
+
+MAX_OUTPUT = 8000
+
+# Partitions worth flagging when we encounter them — informs the agent's
+# strategy. Not exhaustive; just opinionated hints.
+_PARTITION_HINTS: dict[str, str] = {
+    "EFS":      "modem firmware area; often contains IMEI / MAC / serial",
+    "PARAM":    "boot parameters; cmdline + flags",
+    "BOOT":     "kernel + initramfs (raw image)",
+    "RECOVERY": "recovery image (raw)",
+    "SYSTEM":   "Android /system — read-only OS partition (ext4)",
+    "CACHE":    "downloaded OTA payloads; usually transient",
+    "USERDATA": "/data — user apps, dbs, accounts; FBE-encrypted on modern devices",
+    "PERSISTENT": "Samsung persistent partition; carrier/device flags",
+    "STEADY":   "Samsung steady-state config",
+    "HIDDEN":   "Samsung hidden partition; check before assuming empty",
+    "CP_DEBUG": "modem debug logs",
+    "TOMBSTONES": "userland crash dumps",
+}
+
+
+def _parse_mmls_with_unit(output: str) -> tuple[int, list[dict]]:
+    """Parse mmls output, returning (sector_size_bytes, partitions).
+
+    mmls states ``Units are in N-byte sectors`` near the top; we extract N
+    to translate between image-native units and the 512-byte units TSK
+    tools accept via ``-o``.
+    """
+    sector_size = 512
+    m = re.search(r"Units are in (\d+)-byte sectors", output)
+    if m:
+        sector_size = int(m.group(1))
+
+    parts: list[dict] = []
+    for line in output.splitlines():
+        m = re.match(
+            r"\s*(\d{3}):\s+(\S+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(.*)",
+            line,
+        )
+        if not m:
+            continue
+        _row, slot, start, end, length, desc = m.groups()
+        if slot == "Meta" or slot.startswith("---"):
+            continue
+        parts.append({
+            "slot": slot,
+            "start_native": int(start),
+            "end_native": int(end),
+            "length_native": int(length),
+            "description": desc.strip(),
+        })
+    return sector_size, parts
+
+
+async def _run(cmd: list[str], timeout: int = 30) -> tuple[int, str, str]:
+    proc = await asyncio.create_subprocess_exec(
+        *cmd,
+        stdout=asyncio.subprocess.PIPE,
+        stderr=asyncio.subprocess.PIPE,
+    )
+    try:
+        stdout, stderr = await asyncio.wait_for(proc.communicate(), timeout=timeout)
+    except asyncio.TimeoutError:
+        proc.kill()
+        return 124, "", f"timeout after {timeout}s"
+    return proc.returncode or 0, stdout.decode("utf-8", "replace"), stderr.decode("utf-8", "replace")
+
+
+_FS_TYPE_RE = re.compile(r"File System Type:\s*(\S+)", re.IGNORECASE)
+
+
+async def _classify_partition(image_path: str, sector_offset_512: int) -> str:
+    """Run fsstat on a partition; return its type (e.g. 'Ext4', 'Yaffs2', 'FAT') or 'unknown'.
+
+    Any fsstat failure, including "Cannot determine file system type", is
+    reported as 'unknown'. That typically means a raw image (BOOT/RECOVERY/
+    RADIO/…) or encrypted data (modern userdata under FBE).
+    """
+    rc, out, _err = await _run(["fsstat", "-o", str(sector_offset_512), image_path], timeout=15)
+    if rc != 0:
+        return "unknown"
+    m = _FS_TYPE_RE.search(out)
+    if m:
+        return m.group(1)
+    return "unknown"
+
+
+async def probe_android_partitions(image_path: str) -> str:
+    """Survey every partition on an Android disk dump and return a table.
+
+    The agent reads this once to plan its work: which partitions are
+    Ext4/F2FS (use TSK), which are raw (extract image / strings only),
+    which are encrypted (skip until decrypted).
+    """
+    p = Path(image_path)
+    if not p.is_file():
+        return f"Error: {image_path} is not a file."
+
+    rc, out, err = await _run(["mmls", str(p)], timeout=30)
+    if rc != 0:
+        return f"Error: mmls failed (rc={rc}): {err.strip() or out.strip()}"
+
+    sector_size, parts = _parse_mmls_with_unit(out)
+    if not parts:
+        return f"No partitions detected in {image_path}."
+
+    lines = [
+        f"Android partition survey: {image_path}",
+        f"  mmls reports {sector_size}-byte sectors (TSK -o expects 512-byte sectors)",
+        f"  {len(parts)} data partitions",
+        "",
+        "| slot | name | start (native) | start (512-sector) | size | fs_type | hint |",
+        "|---|---|---:|---:|---|---|---|",
+    ]
+    for prt in parts:
+        sector_512 = prt["start_native"] * sector_size // 512
+        bytes_size = prt["length_native"] * sector_size
+        # human-readable size
+        if bytes_size >= 1 << 30:
+            size_h = f"{bytes_size / (1 << 30):.1f} GB"
+        elif bytes_size >= 1 << 20:
+            size_h = f"{bytes_size / (1 << 20):.1f} MB"
+        else:
+            size_h = f"{bytes_size // 1024} KB"
+        fs_type = await _classify_partition(str(p), sector_512)
+        # Try to extract a friendly partition name from the description
+        # (mmls description often includes the partition name uppercase).
+        name_match = re.search(r"[A-Z][A-Z0-9_]{2,}", prt["description"])
+        pname = name_match.group(0) if name_match else prt["description"][:20]
+        hint = _PARTITION_HINTS.get(pname, "")
+        lines.append(
+            f"| {prt['slot']} | {pname} | {prt['start_native']} | "
+            f"{sector_512} | {size_h} | {fs_type} | {hint} |"
+        )
+
+    body = "\n".join(lines)
+    if len(body) > MAX_OUTPUT:
+        body = body[:MAX_OUTPUT] + "\n\n[truncated]"
+    return body
diff --git a/tools/mobile_ios.py b/tools/mobile_ios.py
new file mode 100644
index 0000000..cfc62d0
--- /dev/null
+++ b/tools/mobile_ios.py
@@ -0,0 +1,274 @@
+"""iOS extraction parsers — plist / sqlite / keychain / iDevice info.
+
+DESIGN.md §4.7 iOS plugin tools. All tree-mode, path-based — no Sleuth
+Kit, no graph dependency. Stdlib only (plistlib, sqlite3, json).
+
+iOS extractions typically arrive as a zip containing domain-rooted trees
+(HomeDomain, AppDomain, etc.) with a flat ``iDevice_info.txt`` summary,
+binary/XML plists, and several SQLite databases (sms.db, AddressBook,
+keychain-2.db, app-specific stores like WhatsApp's ChatStorage.sqlite).
+"""
+
+from __future__ import annotations
+
+import asyncio
+import json
+import logging
+import os
+import plistlib
+import re
+import sqlite3
+from pathlib import Path
+
+logger = logging.getLogger(__name__)
+
+# Output cap (chars) — keeps a single tool result under the LLM context budget.
+MAX_OUTPUT = 8000
+
+
+def _trunc(text: str, limit: int = MAX_OUTPUT) -> str:
+    if len(text) <= limit:
+        return text
+    return text[:limit] + f"\n\n[Output truncated: {len(text)} chars total]"
+
+
+# ---------------------------------------------------------------------------
+# plist
+# ---------------------------------------------------------------------------
+
+def _to_jsonable(obj):
+    """Make plist values JSON-serializable: bytes → hex preview, dates → iso."""
+    import datetime
+    if isinstance(obj, bytes):
+        if len(obj) <= 64:
+            return {"_bytes_hex": obj.hex()}
+        return {"_bytes_hex_preview": obj[:64].hex(), "_total_bytes": len(obj)}
+    if isinstance(obj, datetime.datetime):
+        return obj.isoformat()
+    if isinstance(obj, dict):
+        return {str(k): _to_jsonable(v) for k, v in obj.items()}
+    if isinstance(obj, (list, tuple)):
+        return [_to_jsonable(v) for v in obj]
+    return obj
+
+
+async def parse_plist(file_path: str) -> str:
+    """Parse a .plist file (XML or binary) and return its contents as JSON.
+
+    Both formats are handled transparently by ``plistlib.load``.
+    """
+    p = Path(file_path)
+    if not p.is_file():
+        return f"Error: {file_path} is not a file."
+    try:
+        with open(p, "rb") as f:
+            data = plistlib.load(f)
+    except plistlib.InvalidFileException as e:
+        return f"Error: {file_path} is not a valid plist ({e})"
+    except Exception as e:
+        return f"Error parsing plist {file_path}: {e}"
+
+    serial = _to_jsonable(data)
+    rendered = json.dumps(serial, ensure_ascii=False, indent=2, default=str)
+    header = f"plist: {file_path} ({p.stat().st_size} bytes)\n"
+    return header + _trunc(rendered)
+
+
+# ---------------------------------------------------------------------------
+# sqlite
+# ---------------------------------------------------------------------------
+
+_SELECT_RE = re.compile(r"^\s*SELECT\b", re.IGNORECASE)
+
+
+async def sqlite_tables(db_path: str) -> str:
+    """List user tables in a sqlite file with row counts and column names."""
+    p = Path(db_path)
+    if not p.is_file():
+        return f"Error: {db_path} is not a file."
+    try:
+        conn = sqlite3.connect(f"{p.resolve().as_uri()}?mode=ro", uri=True)
+    except sqlite3.OperationalError as e:
+        return f"Error opening {db_path} (read-only): {e}"
+    try:
+        cur = conn.cursor()
+        cur.execute(
+            "SELECT name FROM sqlite_master "
+            "WHERE type='table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
+        )
+        tables = [r[0] for r in cur.fetchall()]
+        if not tables:
+            return f"No user tables in {db_path}."
+        lines = [f"sqlite: {db_path} ({len(tables)} tables)"]
+        for name in tables:
+            try:
+                cur.execute(f"SELECT COUNT(*) FROM \"{name}\"")
+                count = cur.fetchone()[0]
+            except sqlite3.DatabaseError as e:
+                count = f"(count failed: {e})"
+            try:
+                cur.execute(f"PRAGMA table_info(\"{name}\")")
+                cols = [r[1] for r in cur.fetchall()]
+            except sqlite3.DatabaseError:
+                cols = []
+            lines.append(f"  {name}: {count} row(s); cols: {', '.join(cols)}")
+        return _trunc("\n".join(lines))
+    finally:
+        conn.close()
+
+
+async def sqlite_query(
+    db_path: str,
+    query: str,
+    max_rows: int = 100,
+) -> str:
+    """Run a single read-only SELECT against a sqlite file.
+
+    Multi-statement queries and anything other than a SELECT are rejected
+    (we open the database in read-only mode anyway, so writes would fail
+    too — but the explicit check keeps the agent honest).
+    """
+    if not _SELECT_RE.match(query):
+        return "Error: only single SELECT statements are allowed."
+    if ";" in query.rstrip().rstrip(";"):
+        return "Error: multi-statement queries are not allowed."
+
+    p = Path(db_path)
+    if not p.is_file():
+        return f"Error: {db_path} is not a file."
+    try:
+        conn = sqlite3.connect(f"{p.resolve().as_uri()}?mode=ro", uri=True)
+    except sqlite3.OperationalError as e:
+        return f"Error opening {db_path} (read-only): {e}"
+
+    try:
+        cur = conn.cursor()
+        try:
+            cur.execute(query)
+        except sqlite3.DatabaseError as e:
+            return f"Error executing query: {e}"
+        cols = [d[0] for d in cur.description] if cur.description else []
+        rows = cur.fetchmany(max(1, int(max_rows)))
+        lines = [
+            f"sqlite query: {db_path}",
+            f"columns: {cols}",
+            f"rows ({len(rows)}, capped at {max_rows}):",
+        ]
+        for row in rows:
+            rendered = [
+                (v.hex() if isinstance(v, bytes) else str(v))
+                for v in row
+            ]
+            lines.append("  " + " | ".join(rendered))
+        return _trunc("\n".join(lines))
+    finally:
+        conn.close()
+
+
+# ---------------------------------------------------------------------------
+# iOS keychain (keychain-2.db)
+# ---------------------------------------------------------------------------
+
+# Standard iOS keychain tables. genp = generic passwords, inet = internet
+# passwords, cert = certificates, keys = key material. Forensic extractions
+# of locked keychains have ``data`` columns NULL but accounting metadata
+# (agrp, acct, svce) intact — already useful for attribution work.
+_KEYCHAIN_TABLES = ("genp", "inet", "cert", "keys")
+
+
+async def parse_ios_keychain(keychain_root: str) -> str:
+    """Locate and summarize iOS keychain entries under *keychain_root*.
+
+    *keychain_root* may be a path to ``keychain-2.db`` directly or to a
+    directory that contains it (e.g. ``.../var/keychains``).
+    """
+    root = Path(keychain_root)
+    db: Path | None = None
+    if root.is_file() and root.name == "keychain-2.db":
+        db = root
+    elif root.is_dir():
+        candidate = root / "keychain-2.db"
+        if candidate.is_file():
+            db = candidate
+        else:
+            # Fall back to a full recursive search and take the first match.
+            for found in root.rglob("keychain-2.db"):
+                db = found
+                break
+    if db is None:
+        return f"No keychain-2.db found under {keychain_root}."
+
+    try:
+        conn = sqlite3.connect(f"{db.resolve().as_uri()}?mode=ro", uri=True)
+    except sqlite3.OperationalError as e:
+        return f"Error opening {db}: {e}"
+
+    try:
+        cur = conn.cursor()
+        cur.execute(
+            "SELECT name FROM sqlite_master "
+            "WHERE type='table' AND name IN ({})".format(
+                ",".join("?" * len(_KEYCHAIN_TABLES))
+            ),
+            _KEYCHAIN_TABLES,
+        )
+        present = [r[0] for r in cur.fetchall()]
+        if not present:
+            return f"keychain-2.db at {db} has no recognised tables."
+
+        lines = [f"keychain: {db}"]
+        for name in present:
+            cur.execute(f"SELECT COUNT(*) FROM \"{name}\"")
+            count = cur.fetchone()[0]
+            lines.append(f"\n[{name}] {count} row(s)")
+            cur.execute(f"PRAGMA table_info(\"{name}\")")
+            cols = [r[1] for r in cur.fetchall()]
+            # Pick a useful subset of accounting columns when present.
+            preferred = [
+                c for c in ("agrp", "acct", "svce", "labl", "desc", "atyp", "srvr")
+                if c in cols
+            ]
+            if not preferred:
+                preferred = cols[:5]
+            sel = ", ".join(f'"{c}"' for c in preferred)
+            cur.execute(f"SELECT {sel} FROM \"{name}\" LIMIT 30")
+            for row in cur.fetchall():
+                lines.append("  " + " | ".join(
+                    (v.hex() if isinstance(v, bytes) else str(v))
+                    for v in row
+                ))
+        return _trunc("\n".join(lines))
+    finally:
+        conn.close()
+
+
+# ---------------------------------------------------------------------------
+# iDevice_info.txt
+# ---------------------------------------------------------------------------
+
+async def read_idevice_info(file_path: str, max_chars: int = 6000) -> str:
+    """Read the standard iDevice_info.txt summary at the root of an iOS extraction.
+
+    The file is a flat ``Key: value`` dump from libimobiledevice / native
+    extraction tools. We surface the first *max_chars* of content verbatim
+    — the agent can search/extract specific keys via search_text_file if
+    the head isn't enough.
+    """
+    p = Path(file_path)
+    if p.is_dir():
+        # Be helpful: if the agent passed the extraction root, find the file.
+        candidate = p / "iDevice_info.txt"
+        if candidate.is_file():
+            p = candidate
+    if not p.is_file():
+        return f"Error: {file_path} is not a file."
+    try:
+        with open(p, "r", encoding="utf-8", errors="replace") as f:
+            content = f.read(max_chars)
+        size = p.stat().st_size
+        header = f"iDevice_info: {p} ({size} bytes)\n"
+        if len(content) >= max_chars:
+            content += f"\n\n[Truncated: file is {size} bytes, showing first {max_chars} characters]"
+        return header + content
+    except Exception as e:
+        return f"Error reading {file_path}: {e}"
diff --git a/tools/parsers.py b/tools/parsers.py
index 186613a..967ed9f 100644
--- a/tools/parsers.py
+++ b/tools/parsers.py
@@ -215,20 +215,178 @@ async def parse_prefetch(file_path: str) -> str:
         return f"[Error parsing Prefetch: {e}]"
 
 
-async def list_extracted_dir(dir_path: str) -> str:
-    """List files in an extracted directory."""
+async def list_extracted_dir(dir_path: str, max_entries: int = 200) -> str:
+    """Smart summary of a (potentially huge) extracted tree.
+
+    Earlier versions dumped up to 200 entries and then truncated, which
+    leaves the agent blind on iOS extractions with 10k+ files. This version
+    returns a compact summary that scales: total counts, extension
+    breakdown, top-level directories with their sizes, and the ten largest
+    files. *max_entries* is currently unused (the summary has fixed caps).
+    For targeted lookups (e.g. every ``*.sqlite``) use ``find_files``.
+    """
+    if not os.path.isdir(dir_path):
+        return f"[Error: {dir_path} is not a directory]"
+
     try:
-        entries = []
-        for root, dirs, files in os.walk(dir_path):
+        total_files = 0
+        total_bytes = 0
+        ext_counts: dict[str, int] = {}
+        ext_bytes: dict[str, int] = {}
+        top_level_dirs: dict[str, dict] = {}
+        biggest: list[tuple[int, str]] = []   # (size, relpath)
+
+        dir_path_abs = os.path.abspath(dir_path)
+        for root, dirs, files in os.walk(dir_path_abs):
+            # Track top-level directory aggregates (cheap; no per-entry cost
+            # beyond the walk we're already doing).
+            rel_root = os.path.relpath(root, dir_path_abs)
+            if rel_root == ".":
+                top_dirs = {d: {"files": 0, "bytes": 0} for d in dirs}
+                top_level_dirs.update(top_dirs)
+                top_key = None
+            else:
+                top_key = rel_root.split(os.sep, 1)[0]
+                if top_key not in top_level_dirs:
+                    top_level_dirs[top_key] = {"files": 0, "bytes": 0}
+
             for f in files:
                 full = os.path.join(root, f)
-                rel = os.path.relpath(full, dir_path)
-                size = os.path.getsize(full)
-                entries.append(f"  {rel} ({size} bytes)")
-            if len(entries) > 200:
-                entries.append(f"  ... (truncated)")
-                break
+                try:
+                    size = os.path.getsize(full)
+                except OSError:
+                    continue
+                total_files += 1
+                total_bytes += size
+                ext = os.path.splitext(f)[1].lower() or "(no ext)"
+                ext_counts[ext] = ext_counts.get(ext, 0) + 1
+                ext_bytes[ext] = ext_bytes.get(ext, 0) + size
+                if top_key is not None:
+                    top_level_dirs[top_key]["files"] += 1
+                    top_level_dirs[top_key]["bytes"] += size
+                # Maintain a top-10 largest list cheaply (bounded insertion).
+                if len(biggest) < 10:
+                    biggest.append((size, os.path.relpath(full, dir_path_abs)))
+                    biggest.sort(reverse=True)
+                elif size > biggest[-1][0]:
+                    biggest[-1] = (size, os.path.relpath(full, dir_path_abs))
+                    biggest.sort(reverse=True)
 
-        return f"Directory: {dir_path}\nFiles ({len(entries)}):\n" + "\n".join(entries)
+        def _human(n: float) -> str:
+            for unit in ("B", "KB", "MB", "GB"):
+                if n < 1024:
+                    return f"{n:.1f}{unit}" if unit != "B" else f"{n}B"
+                n /= 1024
+            return f"{n:.1f}TB"
+
+        lines = [
+            f"Directory: {dir_path}",
+            f"  Total: {total_files} file(s), {_human(total_bytes)}",
+        ]
+
+        # Top-level directory layout (immediate children, sorted by file count).
+        if top_level_dirs:
+            lines.append(f"\nTop-level layout ({len(top_level_dirs)} dirs at root):")
+            sorted_tlds = sorted(
+                top_level_dirs.items(), key=lambda kv: -kv[1]["files"],
+            )[:15]
+            for d, stats in sorted_tlds:
+                lines.append(
+                    f"  {d}/  ({stats['files']} files, {_human(stats['bytes'])})"
+                )
+            if len(top_level_dirs) > 15:
+                lines.append(f"  ... ({len(top_level_dirs) - 15} more top-level dirs)")
+
+        # Extension breakdown.
+        if ext_counts:
+            lines.append("\nExtension breakdown (top 15):")
+            for ext, count in sorted(ext_counts.items(), key=lambda kv: -kv[1])[:15]:
+                lines.append(
+                    f"  {ext}: {count} files, {_human(ext_bytes.get(ext, 0))}"
+                )
+
+        # Largest files (often the highest-value forensic targets).
+        if biggest:
+            lines.append("\nLargest files:")
+            for size, rel in biggest:
+                lines.append(f"  {rel} ({_human(size)})")
+
+        lines.append(
+            "\nNext step: call find_files with a pattern like "
+            "'**/*.plist' or '**/keychain-2.db' to locate specific artefacts."
+        )
+
+        return "\n".join(lines)
     except Exception as e:
         return f"[Error listing {dir_path}: {e}]"
+
+
+async def find_files(
+    root: str,
+    pattern: str,
+    max_results: int = 500,
+) -> str:
+    """Recursively find files under *root* whose path matches *pattern*.
+
+    Uses fnmatch-style globs against the *full relative path*; ``**`` is
+    treated as "any number of path segments" (so ``**/*.plist`` finds
+    every plist no matter how deep). Examples:
+
+      - ``**/sms.db``               — iOS SMS database
+      - ``**/keychain-2.db``        — iOS keychain
+      - ``**/ChatStorage.sqlite``   — WhatsApp app store
+      - ``HomeDomain/Library/**``   — anchor at a known iOS domain root
+      - ``**/*.{plist,sqlite,db}``  is NOT supported (no brace expansion); use one call per extension
+
+    Results are sorted by size descending — the biggest hits usually
+    matter most. Capped at *max_results* to keep the LLM context bounded.
+    """
+    import fnmatch
+
+    if not os.path.isdir(root):
+        return f"[Error: {root} is not a directory]"
+
+    root_abs = os.path.abspath(root)
+    # ``**`` becomes fnmatch's ``*``, which also matches "/" in the relpath.
+    # A leading ``**/`` must match zero directories too, so strip it as well.
+    patterns = {pattern.replace("**", "*")}
+    if pattern.startswith("**/"):
+        patterns.add(pattern[3:].replace("**", "*"))
+
+    hits: list[tuple[int, str]] = []
+    truncated = False
+    try:
+        for dirpath, _dirs, files in os.walk(root_abs):
+            for f in files:
+                full = os.path.join(dirpath, f)
+                rel = os.path.relpath(full, root_abs)
+                if any(fnmatch.fnmatch(s, pat) for pat in patterns for s in (rel, f)):
+                    try:
+                        size = os.path.getsize(full)
+                    except OSError:
+                        size = 0
+                    hits.append((size, rel))
+                    if len(hits) >= max_results * 4:
+                        # Hard upper bound to keep the walk cheap on huge trees.
+                        truncated = True
+                        break
+            if truncated:
+                break
+    except Exception as e:
+        return f"[Error searching {root}: {e}]"
+
+    hits.sort(reverse=True)
+    if len(hits) > max_results:
+        truncated = True
+        hits = hits[:max_results]
+
+    lines = [
+        f"find_files: pattern={pattern!r} under {root}",
+        f"  matches: {len(hits)}" + (" (truncated)" if truncated else ""),
+    ]
+    if not hits:
+        lines.append("  (no matches)")
+    else:
+        for size, rel in hits:
+            lines.append(f"  {rel} ({size} bytes)")
+    return "\n".join(lines)
diff --git a/uv.lock b/uv.lock
index 4634a1f..4a39485 100644
--- a/uv.lock
+++ b/uv.lock
@@ -170,6 +170,8 @@ source = { virtual = "." }
 dependencies = [
     { name = "httpx", extra = ["socks"] },
     { name = "openai" },
+    { name = "pillow" },
+    { name = "pytesseract" },
     { name = "pyyaml" },
     { name = "regipy" },
 ]
@@ -184,6 +186,8 @@ dev = [
 requires-dist = [
     { name = "httpx", extras = ["socks"], specifier = ">=0.28.1" },
     { name = "openai", specifier = ">=2.36.0" },
+    { name = "pillow", specifier = ">=12.2.0" },
+    { name = "pytesseract", specifier = ">=0.3.13" },
     { name = "pyyaml" },
     { name = "regipy", specifier = ">=6.2.1" },
 ]
@@ -222,6 +226,39 @@ wheels = [
     { url = "https://files.pythonhosted.org/packages/b7/b9/c538f279a4e237a006a2c98387d081e9eb060d203d8ed34467cc0f0b9b53/packaging-26.0-py3-none-any.whl", hash = "sha256:b36f1fef9334a5588b4166f8bcd26a14e521f2b55e6b9de3aaa80d3ff7a37529", size = 74366, upload-time = "2026-01-21T20:50:37.788Z" },
 ]
 
+[[package]]
+name = "pillow"
+version = "12.2.0"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/8c/21/c2bcdd5906101a30244eaffc1b6e6ce71a31bd0742a01eb89e660ebfac2d/pillow-12.2.0.tar.gz", hash = "sha256:a830b1a40919539d07806aa58e1b114df53ddd43213d9c8b75847eee6c0182b5", size = 46987819, upload-time = "2026-04-01T14:46:17.687Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/bf/98/4595daa2365416a86cb0d495248a393dfc84e96d62ad080c8546256cb9c0/pillow-12.2.0-cp314-cp314-ios_13_0_arm64_iphoneos.whl", hash = "sha256:3adc9215e8be0448ed6e814966ecf3d9952f0ea40eb14e89a102b87f450660d8", size = 4100848, upload-time = "2026-04-01T14:44:48.48Z" },
+    { url = "https://files.pythonhosted.org/packages/0b/79/40184d464cf89f6663e18dfcf7ca21aae2491fff1a16127681bf1fa9b8cf/pillow-12.2.0-cp314-cp314-ios_13_0_arm64_iphonesimulator.whl", hash = "sha256:6a9adfc6d24b10f89588096364cc726174118c62130c817c2837c60cf08a392b", size = 4176515, upload-time = "2026-04-01T14:44:51.353Z" },
+    { url = "https://files.pythonhosted.org/packages/b0/63/703f86fd4c422a9cf722833670f4f71418fb116b2853ff7da722ea43f184/pillow-12.2.0-cp314-cp314-ios_13_0_x86_64_iphonesimulator.whl", hash = "sha256:6a6e67ea2e6feda684ed370f9a1c52e7a243631c025ba42149a2cc5934dec295", size = 3640159, upload-time = "2026-04-01T14:44:53.588Z" },
+    { url = "https://files.pythonhosted.org/packages/71/e0/fb22f797187d0be2270f83500aab851536101b254bfa1eae10795709d283/pillow-12.2.0-cp314-cp314-macosx_10_15_x86_64.whl", hash = "sha256:2bb4a8d594eacdfc59d9e5ad972aa8afdd48d584ffd5f13a937a664c3e7db0ed", size = 5312185, upload-time = "2026-04-01T14:44:56.039Z" },
+    { url = "https://files.pythonhosted.org/packages/ba/8c/1a9e46228571de18f8e28f16fabdfc20212a5d019f3e3303452b3f0a580d/pillow-12.2.0-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:80b2da48193b2f33ed0c32c38140f9d3186583ce7d516526d462645fd98660ae", size = 4695386, upload-time = "2026-04-01T14:44:58.663Z" },
+    { url = "https://files.pythonhosted.org/packages/70/62/98f6b7f0c88b9addd0e87c217ded307b36be024d4ff8869a812b241d1345/pillow-12.2.0-cp314-cp314-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:22db17c68434de69d8ecfc2fe821569195c0c373b25cccb9cbdacf2c6e53c601", size = 6280384, upload-time = "2026-04-01T14:45:01.5Z" },
+    { url = "https://files.pythonhosted.org/packages/5e/03/688747d2e91cfbe0e64f316cd2e8005698f76ada3130d0194664174fa5de/pillow-12.2.0-cp314-cp314-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:7b14cc0106cd9aecda615dd6903840a058b4700fcb817687d0ee4fc8b6e389be", size = 8091599, upload-time = "2026-04-01T14:45:04.5Z" },
+    { url = "https://files.pythonhosted.org/packages/f6/35/577e22b936fcdd66537329b33af0b4ccfefaeabd8aec04b266528cddb33c/pillow-12.2.0-cp314-cp314-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:8cbeb542b2ebc6fcdacabf8aca8c1a97c9b3ad3927d46b8723f9d4f033288a0f", size = 6396021, upload-time = "2026-04-01T14:45:07.117Z" },
+    { url = "https://files.pythonhosted.org/packages/11/8d/d2532ad2a603ca2b93ad9f5135732124e57811d0168155852f37fbce2458/pillow-12.2.0-cp314-cp314-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:4bfd07bc812fbd20395212969e41931001fd59eb55a60658b0e5710872e95286", size = 7083360, upload-time = "2026-04-01T14:45:09.763Z" },
+    { url = "https://files.pythonhosted.org/packages/5e/26/d325f9f56c7e039034897e7380e9cc202b1e368bfd04d4cbe6a441f02885/pillow-12.2.0-cp314-cp314-musllinux_1_2_aarch64.whl", hash = "sha256:9aba9a17b623ef750a4d11b742cbafffeb48a869821252b30ee21b5e91392c50", size = 6507628, upload-time = "2026-04-01T14:45:12.378Z" },
+    { url = "https://files.pythonhosted.org/packages/5f/f7/769d5632ffb0988f1c5e7660b3e731e30f7f8ec4318e94d0a5d674eb65a4/pillow-12.2.0-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:deede7c263feb25dba4e82ea23058a235dcc2fe1f6021025dc71f2b618e26104", size = 7209321, upload-time = "2026-04-01T14:45:15.122Z" },
+    { url = "https://files.pythonhosted.org/packages/6a/7a/c253e3c645cd47f1aceea6a8bacdba9991bf45bb7dfe927f7c893e89c93c/pillow-12.2.0-cp314-cp314-win32.whl", hash = "sha256:632ff19b2778e43162304d50da0181ce24ac5bb8180122cbe1bf4673428328c7", size = 6479723, upload-time = "2026-04-01T14:45:17.797Z" },
+    { url = "https://files.pythonhosted.org/packages/cd/8b/601e6566b957ca50e28725cb6c355c59c2c8609751efbecd980db44e0349/pillow-12.2.0-cp314-cp314-win_amd64.whl", hash = "sha256:4e6c62e9d237e9b65fac06857d511e90d8461a32adcc1b9065ea0c0fa3a28150", size = 7217400, upload-time = "2026-04-01T14:45:20.529Z" },
+    { url = "https://files.pythonhosted.org/packages/d6/94/220e46c73065c3e2951bb91c11a1fb636c8c9ad427ac3ce7d7f3359b9b2f/pillow-12.2.0-cp314-cp314-win_arm64.whl", hash = "sha256:b1c1fbd8a5a1af3412a0810d060a78b5136ec0836c8a4ef9aa11807f2a22f4e1", size = 2554835, upload-time = "2026-04-01T14:45:23.162Z" },
+    { url = "https://files.pythonhosted.org/packages/b6/ab/1b426a3974cb0e7da5c29ccff4807871d48110933a57207b5a676cccc155/pillow-12.2.0-cp314-cp314t-macosx_10_15_x86_64.whl", hash = "sha256:57850958fe9c751670e49b2cecf6294acc99e562531f4bd317fa5ddee2068463", size = 5314225, upload-time = "2026-04-01T14:45:25.637Z" },
+    { url = "https://files.pythonhosted.org/packages/19/1e/dce46f371be2438eecfee2a1960ee2a243bbe5e961890146d2dee1ff0f12/pillow-12.2.0-cp314-cp314t-macosx_11_0_arm64.whl", hash = "sha256:d5d38f1411c0ed9f97bcb49b7bd59b6b7c314e0e27420e34d99d844b9ce3b6f3", size = 4698541, upload-time = "2026-04-01T14:45:28.355Z" },
+    { url = "https://files.pythonhosted.org/packages/55/c3/7fbecf70adb3a0c33b77a300dc52e424dc22ad8cdc06557a2e49523b703d/pillow-12.2.0-cp314-cp314t-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:5c0a9f29ca8e79f09de89293f82fc9b0270bb4af1d58bc98f540cc4aedf03166", size = 6322251, upload-time = "2026-04-01T14:45:30.924Z" },
+    { url = "https://files.pythonhosted.org/packages/1c/3c/7fbc17cfb7e4fe0ef1642e0abc17fc6c94c9f7a16be41498e12e2ba60408/pillow-12.2.0-cp314-cp314t-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:1610dd6c61621ae1cf811bef44d77e149ce3f7b95afe66a4512f8c59f25d9ebe", size = 8127807, upload-time = "2026-04-01T14:45:33.908Z" },
+    { url = "https://files.pythonhosted.org/packages/ff/c3/a8ae14d6defd2e448493ff512fae903b1e9bd40b72efb6ec55ce0048c8ce/pillow-12.2.0-cp314-cp314t-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:0a34329707af4f73cf1782a36cd2289c0368880654a2c11f027bcee9052d35dd", size = 6433935, upload-time = "2026-04-01T14:45:36.623Z" },
+    { url = "https://files.pythonhosted.org/packages/6e/32/2880fb3a074847ac159d8f902cb43278a61e85f681661e7419e6596803ed/pillow-12.2.0-cp314-cp314t-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:8e9c4f5b3c546fa3458a29ab22646c1c6c787ea8f5ef51300e5a60300736905e", size = 7116720, upload-time = "2026-04-01T14:45:39.258Z" },
+    { url = "https://files.pythonhosted.org/packages/46/87/495cc9c30e0129501643f24d320076f4cc54f718341df18cc70ec94c44e1/pillow-12.2.0-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:fb043ee2f06b41473269765c2feae53fc2e2fbf96e5e22ca94fb5ad677856f06", size = 6540498, upload-time = "2026-04-01T14:45:41.879Z" },
+    { url = "https://files.pythonhosted.org/packages/18/53/773f5edca692009d883a72211b60fdaf8871cbef075eaa9d577f0a2f989e/pillow-12.2.0-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:f278f034eb75b4e8a13a54a876cc4a5ab39173d2cdd93a638e1b467fc545ac43", size = 7239413, upload-time = "2026-04-01T14:45:44.705Z" },
+    { url = "https://files.pythonhosted.org/packages/c9/e4/4b64a97d71b2a83158134abbb2f5bd3f8a2ea691361282f010998f339ec7/pillow-12.2.0-cp314-cp314t-win32.whl", hash = "sha256:6bb77b2dcb06b20f9f4b4a8454caa581cd4dd0643a08bacf821216a16d9c8354", size = 6482084, upload-time = "2026-04-01T14:45:47.568Z" },
+    { url = "https://files.pythonhosted.org/packages/ba/13/306d275efd3a3453f72114b7431c877d10b1154014c1ebbedd067770d629/pillow-12.2.0-cp314-cp314t-win_amd64.whl", hash = "sha256:6562ace0d3fb5f20ed7290f1f929cae41b25ae29528f2af1722966a0a02e2aa1", size = 7225152, upload-time = "2026-04-01T14:45:50.032Z" },
+    { url = "https://files.pythonhosted.org/packages/ff/6e/cf826fae916b8658848d7b9f38d88da6396895c676e8086fc0988073aaf8/pillow-12.2.0-cp314-cp314t-win_arm64.whl", hash = "sha256:aa88ccfe4e32d362816319ed727a004423aab09c5cea43c01a4b435643fa34eb", size = 2556579, upload-time = "2026-04-01T14:45:52.529Z" },
+]
+
 [[package]]
 name = "pluggy"
 version = "1.6.0"
@@ -296,6 +333,19 @@ wheels = [
     { url = "https://files.pythonhosted.org/packages/f4/7e/a72dd26f3b0f4f2bf1dd8923c85f7ceb43172af56d63c7383eb62b332364/pygments-2.20.0-py3-none-any.whl", hash = "sha256:81a9e26dd42fd28a23a2d169d86d7ac03b46e2f8b59ed4698fb4785f946d0176", size = 1231151, upload-time = "2026-03-29T13:29:30.038Z" },
 ]
 
+[[package]]
+name = "pytesseract"
+version = "0.3.13"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+    { name = "packaging" },
+    { name = "pillow" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/9f/a6/7d679b83c285974a7cb94d739b461fa7e7a9b17a3abfd7bf6cbc5c2394b0/pytesseract-0.3.13.tar.gz", hash = "sha256:4bf5f880c99406f52a3cfc2633e42d9dc67615e69d8a509d74867d3baddb5db9", size = 17689, upload-time = "2024-08-16T02:33:56.762Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/7a/33/8312d7ce74670c9d39a532b2c246a853861120486be9443eebf048043637/pytesseract-0.3.13-py3-none-any.whl", hash = "sha256:7a99c6c2ac598360693d83a416e36e0b33a67638bb9d77fdcac094a3589d4b34", size = 14705, upload-time = "2024-08-16T02:36:10.09Z" },
+]
+
 [[package]]
 name = "pytest"
 version = "9.0.2"