feat(refit): complete S1-S6 — case abstraction, grounding, log-odds, plugins, coref, multi-source

Consolidates the long-running refit work (DESIGN.md as authoritative spec)
into a single baseline commit. Six stages landed together:

  S1  Case + EvidenceSource abstraction; tools parameterised by source_id
      (case.py, main.py multi-source bootstrap, .bin extension support)
  S2  Grounding gateway in add_phenomenon: verified_facts cite real
      ToolInvocation ids; substring / normalised match enforced; agent +
      task scope checked. Phenomenon.description split into verified_facts
      (grounded) + interpretation (free text). [invocation: inv-xxx]
      prefix on every wrapped tool result so the LLM can cite.
  S3  Confidence as additive log-odds: edge_type → log10(LR) calibration
      table; commutative updates; supported / refuted thresholds derived
      from log_odds; hypothesis × evidence matrix view.
  S4  iOS plugin: unzip_archive + parse_plist / sqlite_tables /
      sqlite_query / parse_ios_keychain / read_idevice_info;
      IOSArtifactAgent; SOURCE_TYPE_AGENTS routing.
  S5  Cross-source entity resolution: typed identifiers on Entity,
      observe_identity gateway, auto coref hypothesis with shared /
      conflicting strong/weak LR edges, reversible same_as edges,
      actor_clusters() view.
  S6  Android partition probe + AndroidArtifactAgent; MediaAgent with
      OCR fallback; orchestrator Phase 1 iterates every analysable
      source; platform-aware get_triage_agent_type; ReportAgent renders
      actor clusters + per-source breakdown.
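
The additive log-odds update from S3 can be pictured with the minimal sketch below. It is not the code in this commit: the edge-type names, the log10(LR) values, and the +1.0 / -1.0 thresholds are hypothetical placeholders chosen only to show why summed contributions are order-independent.

```python
# Hypothetical calibration table: edge_type -> log10(likelihood ratio).
# Names and values are illustrative assumptions, not taken from the commit.
LOG10_LR = {
    "direct_evidence": 1.0,
    "supports": 0.5,
    "weak_supports": 0.25,
    "contradicts": -0.75,
}

SUPPORTED_AT = 1.0   # assumed threshold: 10:1 odds in favour
REFUTED_AT = -1.0    # assumed threshold: 10:1 odds against


def log_odds(edge_types, prior=0.0):
    """Add each edge's log10(LR) to the prior; the order of edges is irrelevant."""
    return prior + sum(LOG10_LR[e] for e in edge_types)


def status(lo):
    """Derive the hypothesis status purely from its log-odds."""
    if lo >= SUPPORTED_AT:
        return "supported"
    if lo <= REFUTED_AT:
        return "refuted"
    return "open"


def probability(lo):
    """Convert log10 odds back to a probability for display."""
    odds = 10.0 ** lo
    return odds / (1.0 + odds)
```

Because every update is an addition, replaying the same edges in any order yields the same log-odds, and the supported / refuted labels follow from the running total alone.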

142 unit tests / 1 skipped — full coverage of the new gateway, log-odds
math, coref hypothesis fall-out, and orchestrator multi-source dispatch.
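
As a rough illustration of the substring / normalised match that the S2 gateway enforces, the check might look like the sketch below. The function names, the shape of the `invocations` mapping, and the normalisation rule (NFKC, case-folding, whitespace collapsing) are all assumptions; the commit message does not spell out the actual rule, and the agent + task scope check is omitted here.

```python
import re
import unicodedata


def normalise(text):
    """Assumed normalisation: NFKC, case-fold, collapse runs of whitespace."""
    text = unicodedata.normalize("NFKC", text).casefold()
    return re.sub(r"\s+", " ", text).strip()


def is_grounded(fact, tool_output):
    """True if the fact appears verbatim, or after normalisation, in the output."""
    if fact in tool_output:
        return True
    return normalise(fact) in normalise(tool_output)


def check_verified_facts(facts, invocations):
    """Return the (invocation_id, fact) pairs that fail grounding.

    facts: list of (invocation_id, fact_text) pairs.
    invocations: dict of invocation_id -> raw tool output (hypothetical shape).
    """
    rejected = []
    for inv_id, fact in facts:
        output = invocations.get(inv_id)
        # A fact is rejected if it cites an unknown invocation or if its
        # text cannot be found in that invocation's output.
        if output is None or not is_grounded(fact, output):
            rejected.append((inv_id, fact))
    return rejected
```

A fact citing an invocation id that was never recorded is rejected outright, which is the property that stops the LLM from citing tool calls that did not happen.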

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
BattleTag
2026-05-21 02:12:10 -10:00
parent 444d58726a
commit 81ade8f7ac
24 changed files with 5137 additions and 244 deletions

case.example.yaml Normal file

@@ -0,0 +1,41 @@
# MASForensics case definition — template
#
# Copy this file to `case.yaml` and edit it for your case. If `case.yaml`
# exists in the working directory, `python main.py` loads it automatically;
# otherwise main.py falls back to interactive single-image selection.
#
# A case is a set of evidence sources. Each source has:
#   id                optional; auto-derived from label if omitted ("src-<slug>")
#   label             human-readable name
#   type              disk_image | mobile_extraction | archive | media_collection
#   access_mode       image | tree   (optional; defaults by type)
#                       image = block device / disk image, navigated by Sleuth Kit
#                       tree  = mounted filesystem / unpacked extraction, path-based
#   owner             optional; the person the source is associated with
#   path              filesystem path (relative paths resolve against this file)
#   partition_offset  image-mode only; sector offset of the partition to analyze
#   meta              optional free-form notes
#
# NOTE: at the current refit stage only image-mode (disk) sources are
# analysable; tree-mode sources are accepted but skipped.
case_id: example-case
name: "Example forensic case"
meta:
  notes: "free-form case-level metadata"
sources:
  - id: src-suspect-laptop
    label: "Suspect laptop disk image"
    type: disk_image
    access_mode: image
    owner: "John Doe"
    path: image/suspect_laptop.E01
    partition_offset: 0   # run `mmls <image>` to find the right offset
  - id: src-suspect-phone
    label: "Suspect phone extraction"
    type: mobile_extraction
    access_mode: tree
    owner: "John Doe"
    path: image/suspect_phone.zip
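
The optional fields described in the template's header comments could be filled in along the lines of the sketch below. It assumes the YAML has already been parsed into plain dicts (for example with a YAML library's safe loader) and is not the loader in case.py: `normalise_source`, `slugify`, the slug rule, and the type-to-access_mode defaults are hypothetical, since the template only says that the id is derived from the label and that access_mode "defaults by type".

```python
import re
from pathlib import Path

# Assumed defaults; the template does not list the actual mapping.
DEFAULT_ACCESS_MODE = {
    "disk_image": "image",
    "mobile_extraction": "tree",
    "archive": "tree",
    "media_collection": "tree",
}


def slugify(label):
    """Assumed slug rule: lowercase, runs of non-alphanumerics become '-'."""
    return re.sub(r"[^a-z0-9]+", "-", label.lower()).strip("-")


def normalise_source(raw, case_file):
    """Fill in the optional fields of one entry under `sources:`."""
    src = dict(raw)  # leave the caller's dict untouched
    if src["type"] not in DEFAULT_ACCESS_MODE:
        raise ValueError(f"unknown source type: {src['type']!r}")
    # Explicit values win; defaults apply only when the key is absent.
    src.setdefault("id", "src-" + slugify(src["label"]))
    src.setdefault("access_mode", DEFAULT_ACCESS_MODE[src["type"]])
    path = Path(src["path"])
    if not path.is_absolute():
        # Relative paths resolve against the directory holding the case file.
        path = Path(case_file).parent / path
    src["path"] = path
    return src
```

Using `setdefault` keeps any explicit `id` or `access_mode` from the case file, so the two example sources above would pass through with only their `path` rewritten.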