feat(refit): complete S1-S6 — case abstraction, grounding, log-odds, plugins, coref, multi-source

Consolidates the long-running refit work (DESIGN.md as authoritative spec) into a single baseline commit. Six stages landed together:

S1 Case + EvidenceSource abstraction; tools parameterised by source_id (case.py, main.py multi-source bootstrap, .bin extension support).
S2 Grounding gateway in add_phenomenon: verified_facts cite real ToolInvocation ids; substring / normalised match enforced; agent + task scope checked. Phenomenon.description split into verified_facts (grounded) + interpretation (free text). [invocation: inv-xxx] prefix on every wrapped tool result so the LLM can cite.
S3 Confidence as additive log-odds: edge_type → log10(LR) calibration table; commutative updates; supported / refuted thresholds derived from log_odds; hypothesis × evidence matrix view.
S4 iOS plugin: unzip_archive + parse_plist / sqlite_tables / sqlite_query / parse_ios_keychain / read_idevice_info; IOSArtifactAgent; SOURCE_TYPE_AGENTS routing.
S5 Cross-source entity resolution: typed identifiers on Entity, observe_identity gateway, auto coref hypothesis with shared / conflicting strong/weak LR edges, reversible same_as edges, actor_clusters() view.
S6 Android partition probe + AndroidArtifactAgent; MediaAgent with OCR fallback; orchestrator Phase 1 iterates every analysable source; platform-aware get_triage_agent_type; ReportAgent renders actor clusters + per-source breakdown.

142 unit tests / 1 skipped: full coverage of the new gateway, log-odds math, coref hypothesis fall-out, and orchestrator multi-source dispatch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 02:12:10 -10:00
parent 444d58726a
commit 81ade8f7ac
24 changed files with 5137 additions and 244 deletions
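
Reviewer sketch for S3 in the commit message above: a minimal illustration of why additive log-odds updates are commutative and how supported / refuted thresholds can be read off the running total. This is not the code from this commit. The edge type names, the log10(LR) values, the threshold values, and the function names `update`, `probability`, and `status` are all hypothetical, chosen only for illustration.

```python
import math

# Hypothetical calibration table: edge_type -> log10(likelihood ratio).
# The real table in this commit is not shown here; these values are assumptions.
LOG10_LR = {
    "strong_support": 1.0,
    "weak_support": 0.3,
    "weak_refute": -0.3,
    "strong_refute": -1.0,
}

# Assumed thresholds on the log10-odds scale.
SUPPORTED_AT = 1.0
REFUTED_AT = -1.0


def update(prior_log_odds: float, edge_types: list[str]) -> float:
    """Add the log10(LR) of every evidence edge to the prior log-odds."""
    return prior_log_odds + sum(LOG10_LR[e] for e in edge_types)


def probability(log_odds: float) -> float:
    """Convert log10-odds back to a probability."""
    return 1.0 / (1.0 + 10.0 ** (-log_odds))


def status(log_odds: float) -> str:
    """Derive a verdict from the running log-odds total."""
    if log_odds >= SUPPORTED_AT:
        return "supported"
    if log_odds <= REFUTED_AT:
        return "refuted"
    return "open"


# Addition is order-independent, so evidence can arrive in any order.
a = update(0.0, ["strong_support", "weak_refute"])
b = update(0.0, ["weak_refute", "strong_support"])
assert math.isclose(a, b)
```

Because each edge contributes a fixed additive term, removing an edge is just subtracting that term again, which fits the reversible edges mentioned under S5.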
--- a/case.example.yaml
+++ b/case.example.yaml
@@ -0,0 +1,41 @@
+# MASForensics case definition — template
+#
+# Copy this file to `case.yaml` and edit it for your case. If `case.yaml`
+# exists in the working directory, `python main.py` loads it automatically;
+# otherwise main.py falls back to interactive single-image selection.
+#
+# A case is a set of evidence sources. Each source has:
+#   id              optional — auto-derived from label if omitted ("src-<slug>")
+#   label           human-readable name
+#   type            disk_image | mobile_extraction | archive | media_collection
+#   access_mode     image | tree   (optional — defaults by type)
+#                     image = block device / disk image, navigated by Sleuth Kit
+#                     tree  = mounted filesystem / unpacked extraction, path-based
+#   owner           optional — the person the source is associated with
+#   path            filesystem path (relative paths resolve against this file)
+#   partition_offset  image-mode only — sector offset of the partition to analyze
+#   meta            optional free-form notes
+#
+# NOTE: at the current refit stage only image-mode (disk) sources are
+# analysable; tree-mode sources are accepted but skipped.
+
+case_id: example-case
+name: "Example forensic case"
+meta:
+  notes: "free-form case-level metadata"
+
+sources:
+  - id: src-suspect-laptop
+    label: "Suspect laptop disk image"
+    type: disk_image
+    access_mode: image
+    owner: "John Doe"
+    path: image/suspect_laptop.E01
+    partition_offset: 0               # run `mmls <image>` to find the right offset
+
+  - id: src-suspect-phone
+    label: "Suspect phone extraction"
+    type: mobile_extraction
+    access_mode: tree
+    owner: "John Doe"
+    path: image/suspect_phone.zip
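
Reviewer sketch for the template above: one way the documented defaults could be applied to a single source entry after the YAML has been parsed into a dict. This is not the loader in case.py. The helper names `slugify` and `normalise_source` are hypothetical, the slug rule (lowercase, runs of non-alphanumerics collapsed to one hyphen) is an assumption, and the `access_mode` defaults for `archive` and `media_collection` are guesses, since the template only says the mode "defaults by type".

```python
import re

# Assumed defaults. The template shows disk_image -> image and
# mobile_extraction -> tree; the other two rows are guesses.
DEFAULT_ACCESS_MODE = {
    "disk_image": "image",
    "mobile_extraction": "tree",
    "archive": "tree",
    "media_collection": "tree",
}


def slugify(label: str) -> str:
    """Lowercase the label and collapse non-alphanumeric runs to hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", label.lower()).strip("-")


def normalise_source(raw: dict) -> dict:
    """Return a copy of one `sources:` entry with the optional fields filled in."""
    src = dict(raw)
    if src.get("type") not in DEFAULT_ACCESS_MODE:
        raise ValueError(f"unknown source type: {src.get('type')!r}")
    # id is optional: derive "src-<slug>" from the label when it is missing.
    src.setdefault("id", f"src-{slugify(src['label'])}")
    # access_mode is optional: fall back to the per-type default.
    src.setdefault("access_mode", DEFAULT_ACCESS_MODE[src["type"]])
    # partition_offset is documented as image-mode only.
    if src["access_mode"] != "image" and "partition_offset" in src:
        raise ValueError("partition_offset is only valid for image-mode sources")
    return src


phone = normalise_source({
    "label": "Suspect phone extraction",
    "type": "mobile_extraction",
    "path": "image/suspect_phone.zip",
})
print(phone["id"], phone["access_mode"])
```

The function works on plain dicts so it stays independent of the YAML parser; with PyYAML installed, the entries could come from `yaml.safe_load(...)["sources"]`.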