MASForensic

Author	SHA1	Message	Date
BattleTag	6ebbc675c1	feat(strategist) S2: graph_overview / source_coverage / marginal_yield / budget_status DESIGN_STRATEGIST.md §2. Four read-only view tools the strategist uses to ground its decision each round. graph_overview(): hypotheses table (log_odds, conf, edges_in, distinct_sources, recent_flip), sources table, pending leads. distinct_sources is the critical signal: a hypothesis with 23 edges but only 1 distinct_source has fragile cross-source independence and is a candidate for a corroboration-seeking lead. source_coverage(src): per-source ✓/✗ against an expected-artefact catalogue. Catalogue is heuristic hints, NOT a forced checklist. Footer reminds the strategist to investigate ✗ items only when an active hypothesis depends on them; this is the "test-taking ability exists but is not binding" guardrail. marginal_yield(N): new phenomena / edges / status flips per recent round. Two consecutive zero-yield rounds = strong signal to declare complete. budget_status(): usage vs caps (tool_calls, rounds, wall clock). Pacing warnings at 70% / 90%. tools/strategy.py also exports EXPECTED_ARTEFACTS, a per-source-type table of (name, detector, value_for) entries. Detectors are substring patterns on tool name + args; the matcher resolves at call time against graph.tool_invocations. Catalogue covers iOS / Android / Windows disk / media-collection / archive source types. All four tools registered in tool_registry, listed as read-only in llm_client.READ_ONLY_TOOLS for parallel execution. They go through the invocation-logging wrapper so the strategist's reads are themselves auditable (the wrapper does NOT cache them, because graph state changes between calls). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-21 02:19:54 -10:00
BattleTag	81ade8f7ac	feat(refit): complete S1-S6 — case abstraction, grounding, log-odds, plugins, coref, multi-source Consolidates the long-running refit work (DESIGN.md as authoritative spec) into a single baseline commit. Six stages landed together: S1 Case + EvidenceSource abstraction; tools parameterised by source_id (case.py, main.py multi-source bootstrap, .bin extension support) S2 Grounding gateway in add_phenomenon: verified_facts cite real ToolInvocation ids; substring / normalised match enforced; agent + task scope checked. Phenomenon.description split into verified_facts (grounded) + interpretation (free text). [invocation: inv-xxx] prefix on every wrapped tool result so the LLM can cite. S3 Confidence as additive log-odds: edge_type → log10(LR) calibration table; commutative updates; supported / refuted thresholds derived from log_odds; hypothesis × evidence matrix view. S4 iOS plugin: unzip_archive + parse_plist / sqlite_tables / sqlite_query / parse_ios_keychain / read_idevice_info; IOSArtifactAgent; SOURCE_TYPE_AGENTS routing. S5 Cross-source entity resolution: typed identifiers on Entity, observe_identity gateway, auto coref hypothesis with shared / conflicting strong/weak LR edges, reversible same_as edges, actor_clusters() view. S6 Android partition probe + AndroidArtifactAgent; MediaAgent with OCR fallback; orchestrator Phase 1 iterates every analysable source; platform-aware get_triage_agent_type; ReportAgent renders actor clusters + per-source breakdown. 142 unit tests / 1 skipped — full coverage of the new gateway, log-odds math, coref hypothesis fall-out, and orchestrator multi-source dispatch. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-21 02:12:10 -10:00
BattleTag	444d58726a	refactor: native tool calling + generic forced-retry + terminal exit - llm_client: switch tool_call_loop from text-based <tool_call> regex to OpenAI-native tools=[...] / structured tool_calls field; accumulate delta.reasoning_content for DeepSeek thinking-mode echo-back; fold preserves system msg and aligns boundary to never orphan role:tool - base_agent: generic forced-retry via mandatory_record_tools class attr (filesystem -> add_phenomenon, timeline -> add_temporal_edge, hypothesis -> add_hypothesis, report -> save_report); count via executor wrapper - terminal_tools class attr + loop short-circuit: when a terminal tool is called, loop exits with its raw return as final_text. ReportAgent declares save_report as terminal - replaces the <answer>-tag stop signal that native tool calling broke - _execute_*: return (raw, formatted) - terminal exit uses untruncated raw, conversation history uses 3000-char-capped formatted - evidence_graph + orchestrator: LLM-derived InvestigationArea support (hypothesis-driven coverage check, replaces hardcoded _AREA_KEYWORDS / _AREA_TOOLS); manual yaml block kept as optional seed - strip <answer> references from agent prompts (no longer load-bearing) Verified on CFReDS image across 4 smoke runs: 0 JSON parse failures (was 3); 22 temporal edges from Phase 4 (was 0); ReportAgent exits via save_report (was max_iterations regression). 78/78 unit tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 13:51:19 +08:00
BattleTag	0a2b344c84	fix: share _safe_json_loads with tool-call parser, not just orchestrator Move _safe_json_loads from orchestrator.py to llm_client.py and have _extract_tool_calls use it when parsing <tool_call> JSON blocks from model output. orchestrator now imports it from llm_client. Background: in the first full DeepSeek run (runs/2026-05-12T17-25-38), ~10 'Failed to parse tool call JSON' warnings appeared, all from regex patterns where the LLM wrote \. or \* inside JSON string values: Failed to parse tool call JSON: {..., "pattern": "Outlook Express\|...\|\.dbx"} Failed to parse tool call JSON: {..., "pattern": "ethereal.\.pcap"} Failed to parse tool call JSON: {..., "pattern": "lookatlan.\.txt\|..."} These are exactly the kind of stray-backslash errors stage-1 sanitize already handles for orchestrator JSON calls — but tool-call extraction was using bare json.loads. Result: each failed tool call silently dropped on the floor, the LLM never got a result, and at least one network agent burned 14m26s spinning before hitting max_iterations=40. Now the sanitize/log-on-failure path is shared. Verified against the three failure cases from yesterday's log: all three now parse cleanly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 20:29:21 +08:00
BattleTag	0a966d8476	feat: switch LLM client to OpenAI SDK for DeepSeek compatibility The previous LLMClient used raw httpx + Claude Messages API (/v1/messages, x-api-key, Anthropic SSE event types). Incompatible with DeepSeek. Rewrite LLMClient.__init__/chat/close to use openai.AsyncOpenAI: - /v1/chat/completions endpoint, OpenAI message format - Bearer auth, native SDK error types - Stream chunks via async for + chunk.choices[0].delta.content Tool calling protocol (ReAct text-based tags) and all surrounding helpers (_apply_progressive_decay, _fold_old_messages, _partition_tool_calls, tool_call_loop, etc.) are unchanged; they are endpoint-agnostic by design. New optional config params surfaced to config.yaml.agent: - reasoning_effort: "high" | "medium" | "low" (DeepSeek/o1-style depth) - thinking_enabled: bool (DeepSeek extra_body.thinking switch) main.py and regenerate_report.py pass these through to LLMClient. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 17:13:54 +08:00
BattleTag	097d2ce472	Initial commit Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-09 17:36:26 +08:00
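
Commit 0a2b344c84 above describes tool-call JSON that fails to parse because the model wrote regex escapes such as \. or \| inside JSON string values. The repository's actual _safe_json_loads is not shown in this log, so the following is only a minimal sketch of one way such a sanitize-then-retry step could work. The function name safe_json_loads_sketch and the choice to return None on failure are assumptions made for illustration.

```python
import json
import logging
import re

logger = logging.getLogger(__name__)

# Matches a backslash plus the character that follows it.
_ESCAPE = re.compile(r"\\(.)", re.DOTALL)
# Characters that may legally follow a backslash in a JSON string.
_VALID_AFTER_BACKSLASH = set('"\\/bfnrtu')


def _double_stray_backslash(match):
    """Keep valid escapes as-is; turn a stray backslash into an escaped one."""
    if match.group(1) in _VALID_AFTER_BACKSLASH:
        return match.group(0)
    return "\\\\" + match.group(1)


def safe_json_loads_sketch(text):
    """Hypothetical helper: parse JSON, retrying once after repairing stray backslashes.

    Returns the parsed object, or None (after logging) if both attempts fail.
    """
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    repaired = _ESCAPE.sub(_double_stray_backslash, text)
    try:
        return json.loads(repaired)
    except json.JSONDecodeError as exc:
        logger.warning("Failed to parse tool call JSON: %s (%s)", text, exc)
        return None


if __name__ == "__main__":
    raw = r'{"name": "search", "arguments": {"pattern": "ethereal.\.pcap"}}'
    print(safe_json_loads_sketch(raw))
```

Because the repair only runs after a plain json.loads has already failed, well-formed payloads are never rewritten. Logging on the second failure keeps a dropped tool call visible instead of silent, which is the failure mode the commit message reports.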

6 Commits
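
Stage S3 in commit 81ade8f7ac describes confidence as additive log-odds: each edge type maps to a log10 likelihood ratio, updates are commutative, and supported / refuted status is derived from the running total. The calibration values are not given in this log, so the sketch below uses made-up numbers. The edge type names, the LOG10_LR table, and the threshold of 1.0 are all assumptions for illustration, not the project's calibration.

```python
import math

# Hypothetical calibration table: edge_type -> log10(likelihood ratio).
LOG10_LR = {
    "strong_support": 1.0,
    "weak_support": 0.3,
    "weak_refute": -0.3,
    "strong_refute": -1.0,
}


def apply_edges(prior_log_odds, edge_types):
    """Add each edge's log10(LR) to the prior; addition makes order irrelevant."""
    return prior_log_odds + sum(LOG10_LR[e] for e in edge_types)


def probability(log_odds):
    """Convert base-10 log-odds back to a probability."""
    odds = 10.0 ** log_odds
    return odds / (1.0 + odds)


def status(log_odds, threshold=1.0):
    """Derive a status label from log-odds (threshold is an assumed value)."""
    if log_odds >= threshold:
        return "supported"
    if log_odds <= -threshold:
        return "refuted"
    return "open"


if __name__ == "__main__":
    edges = ["strong_support", "weak_refute", "weak_support"]
    total = apply_edges(0.0, edges)
    print(status(total), probability(total))
```

Working in log space turns multiplication of likelihood ratios into addition, which is why the same set of edges yields the same total regardless of the order in which agents record them (up to floating-point rounding).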