Commit Graph

20 Commits

Author SHA1 Message Date
BattleTag
388321ee30 feat(strategist) S7: strategist resume / open-round repair
DESIGN_STRATEGIST.md §5. Support resume from a crash mid-strategist-loop.

_resume_strategist_state inspects investigation_rounds for a tail entry
without completed_at — an "open" round, i.e. one that started but never
closed. Two repairs:

  1. Mark the round closed with strategist_action="interrupted_resume"
     so the run history reflects what actually happened.
  2. Walk that round's leads; any still in "assigned" state are
     re-marked as "failed" with failure_reason="interrupted before
     complete". The Retry-failed-leads + Gap-analysis passes that run
     after the strategist loop can pick them up.

Returns max(round_number) + 1 — the round at which to resume the loop.
On a clean graph (no prior rounds) returns 1 and makes no changes.

_phase3_strategist_loop now calls this helper before the main for-loop
and uses its return value as start_round, so a resume run lands at the
right round number rather than restarting from R1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 02:27:05 -10:00
BattleTag
093f3cec1f feat(strategist) S6: config.example.yaml schema for strategist + budgets
DESIGN_STRATEGIST.md §6. Document the strategist loop's tunables so
operators can override defaults without code changes.

config.yaml itself is gitignored (it carries the API key), so this
commit adds config.example.yaml as the tracked schema reference.
The runtime reads config.yaml; operators copy the example as a
starting point.

  strategist.enabled       — default true; false routes Phase 3 through
                             the legacy fixed-round loop instead.
  strategist.max_rounds    — orchestrator cap (default 10).
  strategist.hard_stop_marginal_yield_zero_rounds — safety net for
                             over-eager strategist + zero-yielding
                             workers (default 3).
  budgets.tool_calls_total — global tool-call hard cap.
  budgets.strategist_rounds_max — informational, surfaced via
                             budget_status (orchestrator enforces
                             via strategist.max_rounds instead).
  budgets.wall_clock_minutes_max — wall-clock hard cap.

Comment out any budget cap to make it unbounded — Orchestrator's
_budget_exceeded treats missing caps as no-op.

Legacy max_investigation_rounds is kept as the fallback used only when
strategist.enabled is false; documented inline.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 02:26:12 -10:00
BattleTag
a103c17bdb feat(strategist) S5: Phase 3 strategist loop in orchestrator
DESIGN_STRATEGIST.md §4. Replace the fixed-round hypothesis-directed
loop with a belief-driven strategist loop that runs the strategist
agent once per round and dispatches the leads it proposes.

New helpers on Orchestrator:
  _budget_exceeded()              hard budget caps (tool_calls,
                                  wall_clock_minutes), complementing
                                  strategist self-throttling.
  _execute_strategist_lead(lead)  dispatch one lead serially; the
                                  next strategist round sees the
                                  cumulative effect of this lead's
                                  graph mutations.
  _phase3_strategist_loop()       main loop. Open round, run strategist,
                                  exit on declare_complete or empty
                                  proposals, otherwise dispatch each
                                  lead, judge new phenomena, close round,
                                  apply yield/budget checks.
  _phase3_legacy_loop()           fallback when strategist.enabled is
                                  false. Identical to the
                                  pre-DESIGN_STRATEGIST behaviour.

The run() entry point branches on strategist_cfg.enabled (default
true) and always follows up with _retry_failed_leads() + Gap
Analysis + mark_remaining_inconclusive() regardless of variant.

Orchestrator.__init__ also wires graph.budgets and
graph.run_start_monotonic from config so the budget_status tool
sees real numbers.

Integration tests use a mock strategist + mock workers to verify
declare_complete, propose_lead -> worker dispatch, zero-yield-streak
hard stop, and budget-cap-stops-the-loop.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 02:25:04 -10:00
BattleTag
65745d21dc feat(strategist) S4: InvestigationStrategist agent
DESIGN_STRATEGIST.md §3. The smallest possible agent — its entire
output per round is one decision: propose 1-3 leads (each citing a
real hypothesis it expects to move) OR declare the investigation
complete with a reason.

Constraint surface:
  mandatory_record_tools = ("propose_lead", "declare_investigation_complete")
  terminal_tools         = ("declare_investigation_complete",)

The agent inherits the BaseAgent forced-retry mechanism: if it returns
without calling either action tool, the orchestrator force-prompts a
RECORD-only retry. declare_complete being terminal means the
tool_call_loop short-circuits the moment the strategist decides
we're done.

_register_graph_tools overrides BaseAgent's default to skip
_register_graph_write_tools entirely — the strategist NEVER writes
phenomena, entities, edges, or hypotheses directly. All graph
mutations come from the workers it dispatches via leads. This keeps
the planning agent's responsibility surface narrow: read the graph,
choose what to do next, that's it.

Prompt walks through the workflow (call graph_overview / marginal_
yield / budget_status / source_coverage first, then take exactly
one terminal action) with decision criteria for propose vs stop.

Registered in agent_factory._AGENT_CLASSES["strategist"].

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 02:22:05 -10:00
BattleTag
ff3a05d7ce feat(strategist) S3: propose_lead / declare_investigation_complete
DESIGN_STRATEGIST.md §2.5. The strategist's two write actions.

propose_lead validates motivating_hypothesis exists in the graph,
validates expected_evidence_type is a real edge type, validates
source_id refers to a real source in the case — fast specific
errors so the strategist gets fixable feedback rather than a
generic crash. On success, calls graph.add_lead with proposed_by=
"strategist" and round_number=graph.current_strategist_round so
the round-completion code can collect this round's leads.

declare_investigation_complete sets graph.strategist_complete_requested
which the orchestrator inspects after each strategist run to decide
whether to break the loop. reason must come from a closed enum so
the audit log is consistent.

EvidenceGraph gains two transient run-context fields:
  current_strategist_round       — set by orchestrator at start of round
  strategist_complete_requested  — flipped by declare_complete

These are intentionally NOT persisted — they're per-run flags, not
graph state.

Both tools required to be in InvestigationStrategist.mandatory_record_
tools (added in S4) so the agent's forced-retry mechanism kicks in if
it returns without taking a documented decision.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 02:21:13 -10:00
BattleTag
6ebbc675c1 feat(strategist) S2: graph_overview / source_coverage / marginal_yield / budget_status
DESIGN_STRATEGIST.md §2. Four read-only view tools the strategist uses
to ground its decision each round.

  graph_overview()      — hypotheses table (log_odds, conf, edges_in,
                          distinct_sources, recent_flip), sources table,
                          pending leads. distinct_sources is the
                          critical signal: a hypothesis with 23 edges
                          but only 1 distinct_source has fragile cross-
                          source independence and is a candidate for
                          a corroboration-seeking lead.
  source_coverage(src)  — per-source ✓/✗ against an expected-artefact
                          catalogue. Catalogue is heuristic hints,
                          NOT a forced checklist. Footer reminds the
                          strategist to investigate ✗ items only when
                          an active hypothesis depends on them — this
                          is the "应试能力存在但不被绑死" guardrail.
  marginal_yield(N)     — new phenomena / edges / status flips per
                          recent round. Two consecutive zero-yield
                          rounds = strong signal to declare complete.
  budget_status()       — usage vs caps (tool_calls, rounds, wall
                          clock). Pacing warnings at 70% / 90%.

tools/strategy.py also exports EXPECTED_ARTEFACTS, a per-source-type
table of (name, detector, value_for) entries. Detectors are
substring patterns on tool name + args; the matcher resolves at
call time against graph.tool_invocations. Catalogue covers iOS /
Android / Windows disk / media-collection / archive source types.

All four tools registered in tool_registry, listed as read-only in
llm_client.READ_ONLY_TOOLS for parallel execution. They go through
the invocation-logging wrapper so the strategist's reads are
themselves auditable (the wrapper does NOT cache them — graph
state changes between calls).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 02:19:54 -10:00
BattleTag
ca96f29849 feat(strategist) S1: Lead extension + InvestigationRound model
DESIGN_STRATEGIST.md §1. Foundation for the Phase 3 strategist loop.

Lead now carries four annotations that let the orchestrator measure
marginal yield per lead and dedupe strategist proposals:
  - proposed_by         (agent that proposed it: "strategist", "filesystem", …)
  - motivating_hypothesis (hyp-id the lead is meant to corroborate/refute)
  - expected_evidence_type (edge type the lead's worker should produce)
  - round_number        (0 = Phase 1 lead, ≥1 = strategist-proposed)

add_lead idempotently dedupes strategist proposals on
(motivating_hypothesis, expected_evidence_type, target_agent, source_id)
to prevent the "strategist loops on the same lead" failure mode.

New InvestigationRound dataclass records per-round provenance: before/
after hypothesis status snapshots, phenomena + edge count deltas, and
the strategist's decision_rationale. ``new_phenomena_count``,
``new_edges_count``, ``status_flips`` are derived properties that the
marginal_yield tool will use.

start_investigation_round / complete_investigation_round /
get_investigation_round / latest_round / leads_from_round complete the
lifecycle. complete is idempotent on already-closed rounds.

Lead.from_dict is forward-compat for state files written before this
commit. InvestigationRound persists as a top-level list in
graph_state.json (auto-save + load_state both wired).

EvidenceGraph also gains graph.budgets and graph.run_start_monotonic
fields that the budget_status view (S2) will read; orchestrator
populates them in S5.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 02:18:35 -10:00
BattleTag
8020c24776 fix(graph): harmonic damping for repeated same-edge_type evidence
First full-case run (runs/2026-05-20T20-15-04/) produced hypotheses
with log_odds +31 (8 direct_evidence + 15 supports). That's the
naive-Bayes independence assumption breaking down: 15 different
phenomena all "supporting" the same hypothesis from one source are
not 15 independent pieces of evidence, they're highly correlated.
DESIGN.md §4.5 last bullet flagged this as a "未实施旋钮" — this
commit implements it.

Rule: the k-th edge of a given (hyp_id, edge_type) contributes
log_lr_base / k instead of log_lr_base. Cumulative is harmonic
sum H_N, bounded by ~ ln N. Single-edge hypotheses unaffected
(k=1 → /1 → no change). Replaying the 2026-05-20 graph's 108
edges under the new rule pulls the top hypothesis from +31.0 →
+8.75; the smallest active hypothesis from +4.0 → +2.08.

Also adds rank + log_lr_base to confidence_log entries so the
math is auditable from the persisted graph.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 02:16:37 -10:00
BattleTag
f04ccd4bc7 fix(base_agent): forced-retry iter cap 10→30 + narrow tools to record+read
Timeline agent on the 2026-05-20 full run produced 0 phenomena: initial
round hit max_iterations=60 cap before recording, forced retry then hit
max_iterations=10 cap because every grounding-rejected call burns one
iteration in the new gateway. Two changes restore depth without re-
introducing the original "agent wanders off and never records" failure:

  1. Raise retry cap 10 → 30. With grounding auto-rescue (prev commit)
     most rejections heal on the first retry, but some still need 2-3
     turns; 10 is empirically too tight, 30 leaves headroom.

  2. Narrow the retry tool surface to RECORD + graph-write +
     read-only-graph-query tools. Investigation tools (list_directory,
     sqlite_query, parse_registry_key) are dropped on retry so the agent
     can't restart its search loop — the retry is explicitly "record
     what you already found, then stop".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 02:15:08 -10:00
BattleTag
6b485b98f7 fix(grounding): auto-rescue hallucinated invocation_id + list real ids in error
First full-case run (runs/2026-05-20T20-15-04/) produced 83 GroundingError
rejections, almost all from a single failure mode: LLM cites a plausible-
looking inv-XXXXXXXX that doesn't exist, while the fact's value is in fact
present verbatim in one of its real tool outputs. The agent knew which
tool it read from, it just mis-typed the citation id.

Two-layer fix in evidence_graph.validate_fact_grounding:

  Layer A (silent heal): when the cited inv-id misses, search the same
  agent / task's invocations for one whose output contains the value
  (strict or normalised substring). If exactly one matches, rewrite
  fact.invocation_id in place and accept. Multi-match is NOT auto-
  rescued — the candidate ids go back to the LLM so it picks deliberately.

  Layer B (informative retry): GroundingError now appends the agent's
  recent invocation ids and a brief tool-call summary, so the LLM has
  the real ids in front of it for the next attempt rather than
  fabricating again from memory.

Both layers preserve the design invariant: the fact's value must still
be present in a real tool output — nothing new can land grounded that
wasn't already verifiable. Cross-agent / cross-task isolation is also
preserved (rescue candidates filtered on agent + task_id).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 02:14:20 -10:00
BattleTag
81ade8f7ac feat(refit): complete S1-S6 — case abstraction, grounding, log-odds, plugins, coref, multi-source
Consolidates the long-running refit work (DESIGN.md as authoritative spec)
into a single baseline commit. Six stages landed together:

  S1  Case + EvidenceSource abstraction; tools parameterised by source_id
      (case.py, main.py multi-source bootstrap, .bin extension support)
  S2  Grounding gateway in add_phenomenon: verified_facts cite real
      ToolInvocation ids; substring / normalised match enforced; agent +
      task scope checked. Phenomenon.description split into verified_facts
      (grounded) + interpretation (free text). [invocation: inv-xxx]
      prefix on every wrapped tool result so the LLM can cite.
  S3  Confidence as additive log-odds: edge_type → log10(LR) calibration
      table; commutative updates; supported / refuted thresholds derived
      from log_odds; hypothesis × evidence matrix view.
  S4  iOS plugin: unzip_archive + parse_plist / sqlite_tables /
      sqlite_query / parse_ios_keychain / read_idevice_info;
      IOSArtifactAgent; SOURCE_TYPE_AGENTS routing.
  S5  Cross-source entity resolution: typed identifiers on Entity,
      observe_identity gateway, auto coref hypothesis with shared /
      conflicting strong/weak LR edges, reversible same_as edges,
      actor_clusters() view.
  S6  Android partition probe + AndroidArtifactAgent; MediaAgent with
      OCR fallback; orchestrator Phase 1 iterates every analysable
      source; platform-aware get_triage_agent_type; ReportAgent renders
      actor clusters + per-source breakdown.

142 unit tests / 1 skipped — full coverage of the new gateway, log-odds
math, coref hypothesis fall-out, and orchestrator multi-source dispatch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 02:12:10 -10:00
BattleTag
444d58726a refactor: native tool calling + generic forced-retry + terminal exit
- llm_client: switch tool_call_loop from text-based <tool_call> regex
  to OpenAI-native tools=[...] / structured tool_calls field; accumulate
  delta.reasoning_content for DeepSeek thinking-mode echo-back; fold
  preserves system msg and aligns boundary to never orphan role:tool
- base_agent: generic forced-retry via mandatory_record_tools class attr
  (filesystem -> add_phenomenon, timeline -> add_temporal_edge,
  hypothesis -> add_hypothesis, report -> save_report); count via
  executor wrapper
- terminal_tools class attr + loop short-circuit: when a terminal tool
  is called, loop exits with its raw return as final_text. ReportAgent
  declares save_report as terminal - replaces the <answer>-tag stop
  signal that native tool calling broke
- _execute_*: return (raw, formatted) - terminal exit uses untruncated
  raw, conversation history uses 3000-char-capped formatted
- evidence_graph + orchestrator: LLM-derived InvestigationArea support
  (hypothesis-driven coverage check, replaces hardcoded _AREA_KEYWORDS /
  _AREA_TOOLS); manual yaml block kept as optional seed
- strip <answer> references from agent prompts (no longer load-bearing)

Verified on CFReDS image across 4 smoke runs: 0 JSON parse failures
(was 3); 22 temporal edges from Phase 4 (was 0); ReportAgent exits via
save_report (was max_iterations regression). 78/78 unit tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 13:51:19 +08:00
BattleTag
0a2b344c84 fix: share _safe_json_loads with tool-call parser, not just orchestrator
Move _safe_json_loads from orchestrator.py to llm_client.py and have
_extract_tool_calls use it when parsing <tool_call> JSON blocks from
model output. orchestrator now imports it from llm_client.

Background: in the first full DeepSeek run (runs/2026-05-12T17-25-38),
~10 'Failed to parse tool call JSON' warnings appeared, all from regex
patterns where the LLM wrote \. or \* inside JSON string values:

  Failed to parse tool call JSON: {..., "pattern": "Outlook Express|...|\.dbx"}
  Failed to parse tool call JSON: {..., "pattern": "ethereal.*\.pcap"}
  Failed to parse tool call JSON: {..., "pattern": "lookatlan.*\.txt|..."}

These are exactly the kind of stray-backslash errors stage-1 sanitize
already handles for orchestrator JSON calls — but tool-call extraction
was using bare json.loads. Result: each failed tool call silently dropped
on the floor, the LLM never got a result, and at least one network agent
burned 14m26s spinning before hitting max_iterations=40.

Now the sanitize/log-on-failure path is shared. Verified against the
three failure cases from yesterday's log: all three now parse cleanly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 20:29:21 +08:00
BattleTag
76df34ed79 docs: add TODO marker for adaptive edge weights
Note that the hard-coded HYPOTHESIS_EDGE_WEIGHTS table is a temporary
choice; an adaptive scheme should be explored once the full pipeline
is stable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 17:14:23 +08:00
BattleTag
893f5b5de2 fix: address agent boundary / JSON robustness / Phase 4 no-op from CFReDS run
Issues found running the system end-to-end on the NIST CFReDS Hacking Case
disk image (SCHARDT.001, Mr. Evil). Four interconnected fixes:

1. HypothesisAgent boundary leak (two layers)
   B.1 Tool set: BaseAgent._register_graph_tools was registering
       add_phenomenon / add_lead / link_to_entity for every agent. With
       an empty graph in Phase 2, HypothesisAgent "compensated" by
       inventing phenomena, dispatching leads, and linking entities.
   B.2 Prompt leak: BaseAgent's shared system prompt hard-coded "Call
       investigation tools (list_directory, parse_registry_key, etc.)".
       HypothesisAgent hallucinated list_directory and wasted 2 LLM
       rounds on 'unknown tool' errors before backing off.

   Fix:
   - Split _register_graph_tools into _register_graph_read_tools +
     _register_graph_write_tools.
   - HypothesisAgent, ReportAgent, TimelineAgent override
     _register_graph_tools to skip write tools.
   - HypothesisAgent and TimelineAgent override _build_system_prompt
     with focused, role-specific workflows (no Phase A-D investigation
     boilerplate).

2. JSON parse failures in Phase 3 lead generation (5/6 hypotheses lost)
   DeepSeek emits JSON with stray backslashes (Windows path references)
   and occasional minor syntax slips. Old single-stage sanitize couldn't
   recover; per-hypothesis fallback silently swallowed each failure.

   Fix:
   - _safe_json_loads: progressive — stage 0 as-is, stage 1 escape stray
     \X (anything not in valid JSON escape set), log raw input on final
     failure for diagnosis.
   - New _call_llm_for_json helper: on parse failure, append the error
     to the prompt and re-call LLM (self-correcting retry, up to 2).
   - All 4 LLM-JSON callsites in orchestrator refactored to use it.

3. Phase 1 sometimes skipped add_phenomenon (LLM treated <answer> as deliverable)
   Strengthen BaseAgent's RECORDING REQUIREMENT — explicit "your <answer>
   is DISCARDED; only graph mutations propagate" plus a new rule:
   negative findings (searched X, found nothing) MUST also be recorded
   as phenomena, since they constrain the hypothesis space.

4. Phase 4 Timeline was a no-op
   TimelineAgent inherited BaseAgent's Phase A-D prompt and never called
   add_temporal_edge — produced 0 temporal edges. Override the prompt
   with concrete workflow (build_filesystem_timeline ->
   get_timestamped_phenomena -> 15-40 add_temporal_edge calls) and
   restrict tool set to read-only + its 3 temporal tools.

Verified end-to-end: HypothesisAgent now 8 tools (no writes), ReportAgent
13 (no graph writes), TimelineAgent 10 (read + temporal + timeline).
All 60 unit tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 17:14:16 +08:00
BattleTag
0a966d8476 feat: switch LLM client to OpenAI SDK for DeepSeek compatibility
The previous LLMClient used raw httpx + Claude Messages API (/v1/messages,
x-api-key, Anthropic SSE event types). Incompatible with DeepSeek.

Rewrite LLMClient.__init__/chat/close to use openai.AsyncOpenAI:
- /v1/chat/completions endpoint, OpenAI message format
- Bearer auth, native SDK error types
- Stream chunks via async for + chunk.choices[0].delta.content

Tool calling protocol (ReAct text-based tags) and all surrounding helpers
(_apply_progressive_decay, _fold_old_messages, _partition_tool_calls,
tool_call_loop, etc.) are unchanged — endpoint-agnostic by design.

New optional config params surfaced to config.yaml.agent:
- reasoning_effort: "high" | "medium" | "low" — DeepSeek/o1-style depth
- thinking_enabled: bool — DeepSeek extra_body.thinking switch

main.py and regenerate_report.py pass these through to LLMClient.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 17:13:54 +08:00
BattleTag
31812a72ee test: track tests/ directory in version control
tests/test_optimizations.py — 60 pytest cases covering:
- EvidenceGraph: quality scoring, Jaccard merge, async safety,
  hypothesis confidence updates, asset library
- llm_client: tool-result truncation, parallel batch execution,
  progressive context decay, message folding
- orchestrator: parallel dispatch, batched lead generation,
  batched judging
- tool_registry: result cache key derivation

FakeAgent.run signatures updated to BaseAgent.run(task, lead_id=None).

Previously listed in .gitignore (which is itself untracked, so the
ignore rule lives only locally).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 14:10:31 +08:00
BattleTag
74e6bde13a refactor: lead provenance, unified link path, SSOT cleanup, configurable weights
Five interrelated cleanups:

1. Lead -> Phenomenon provenance
   - Phenomenon.from_lead_id field on the dataclass
   - BaseAgent.run(lead_id=...) writes self._current_lead_id
   - _add_phenomenon auto-injects from agent state (LLM unaware)
   - Orchestrator dispatch passes lead.id; Phase 1/2-auto/4/5 stay None
   - Merge path preserves the first non-None lead_id on collision

2. Unified Phenomenon <-> Hypothesis link path
   - HypothesisAgent only adds hypotheses, never links
   - link_phenomenon_to_hypothesis tool + executor removed
   - All links go through Orchestrator._judge_new_phenomena
   - Phase 2 unconditionally judges after hypothesis generation
   - Gap Analysis judges after each dispatch round
   (Three previously-missing judge calls now in place.)

3. SSOT in agent subclasses
   - Remove RoleTemplate dataclass, ROLE_TEMPLATES dict,
     _instantiate_from_template method
   - Each agent subclass owns name, role, and tool list
   - agent_factory.py shrinks from 299 to 153 lines
   - All 7 agents now route through _AGENT_CLASSES (filesystem,
     registry, communication, network, timeline were previously dead
     subclasses overridden by templates)

4. Configurable edge weights
   - HYPOTHESIS_EDGE_WEIGHTS -> _DEFAULT_EDGE_WEIGHTS (private default)
   - EvidenceGraph(edge_weights=...) override via config.yaml
   - hypothesis_edge_weights section in config.yaml (commented example)
   - main.py and regenerate_report.py read and pass through

5. regenerate_report.py auto-picks the latest run/*/graph_state.json
   when no CLI arg is given (was a hardcoded date path)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 14:10:15 +08:00
BattleTag
fde96c7d9f docs: rewrite README for EvidenceGraph + 5-phase + 7-agent architecture
Previous README described a Blackboard-based 4-phase, 6-agent system.
The actual code uses:
- EvidenceGraph with typed weighted edges (Phenomenon/Hypothesis/Entity)
- 5 phases (explicit Hypothesis Generation between survey and investigation)
- 7 agents (added HypothesisAgent)

Documents the confidence update formula, Phenomenon Jaccard merging,
Asset Library inode dedup, tool-result caching, Gap Analysis coverage
check, auto-persistence, and the resume mechanism.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 14:09:59 +08:00
BattleTag
097d2ce472 Initial commit
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 17:36:26 +08:00