feat(strategist) S2: graph_overview / source_coverage / marginal_yield / budget_status

DESIGN_STRATEGIST.md §2. Four read-only view tools the strategist uses
to ground its decision each round.

  graph_overview()      — hypotheses table (log_odds, conf, edges_in,
                          distinct_sources, recent_flip), sources table,
                          pending leads. distinct_sources is the
                          critical signal: a hypothesis with 23 edges
                          but only 1 distinct source has almost no
                          cross-source corroboration and is a candidate
                          for a corroboration-seeking lead.
  source_coverage(src)  — per-source ✓/✗ against an expected-artefact
                          catalogue. Catalogue is heuristic hints,
                          NOT a forced checklist. Footer reminds the
                          strategist to investigate ✗ items only when
                          an active hypothesis depends on them. This is
                          the "checklist skill is available but not
                          binding" guardrail.
  marginal_yield(N)     — new phenomena / edges / status flips per
                          recent round. Two consecutive zero-yield
                          rounds = strong signal to declare complete.
  budget_status()       — usage vs caps (tool_calls, rounds, wall
                          clock). Pacing warnings at 70% / 90%.
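
The three decision heuristics named in the list above can be sketched as
small predicates. This is a minimal illustration, not the code in
tools/strategy.py: HypothesisRow, needs_corroboration,
should_declare_complete and pacing_warning are hypothetical names, and
the min_sources default of 2 is an assumption. Only the 70% / 90%
thresholds and the "two consecutive zero-yield rounds" rule come from
the description.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class HypothesisRow:
    # Hypothetical row shape; the real hypotheses table may differ.
    hypothesis_id: str
    edges_in: int
    distinct_sources: int


def needs_corroboration(row: HypothesisRow, min_sources: int = 2) -> bool:
    """True when a hypothesis has evidence but too few independent sources."""
    return row.edges_in > 0 and row.distinct_sources < min_sources


def should_declare_complete(yields: Sequence[int], window: int = 2) -> bool:
    """True when each of the last `window` rounds produced nothing new."""
    if len(yields) < window:
        return False
    return all(y == 0 for y in yields[-window:])


def pacing_warning(used: int, cap: Optional[int]) -> Optional[str]:
    """Return a pacing label at the 70% / 90% thresholds, else None."""
    if cap is None or cap <= 0:
        return None  # unbounded budget: nothing to warn about
    # Integer arithmetic avoids float rounding at the exact thresholds.
    if used * 10 >= cap * 9:
        return "≥ 90%"
    if used * 10 >= cap * 7:
        return "≥ 70%"
    return None
```

Under these assumptions, the 23-edge, single-source hypothesis from the
description is flagged by needs_corroboration, and a budget of 1 tool
call with 1 already used reports the 90% label.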

tools/strategy.py also exports EXPECTED_ARTEFACTS, a per-source-type
table of (name, detector, value_for) entries. Detectors are
substring patterns on tool name + args; the matcher resolves at
call time against graph.tool_invocations. Catalogue covers iOS /
Android / Windows disk / media-collection / archive source types.
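
A rough sketch of how substring detectors could be resolved against
recorded invocations. The catalogue entries, the Artefact tuple, the
dict shape of an invocation and the case-sensitive match are all
assumptions made for illustration; the real EXPECTED_ARTEFACTS contents
and the graph.tool_invocations record type are not shown in this
commit message.

```python
import json
from typing import Any, Dict, Iterable, List, Mapping, NamedTuple


class Artefact(NamedTuple):
    name: str
    detector: str   # substring searched for in "tool name + args"
    value_for: str


# Illustrative entries only; not the real EXPECTED_ARTEFACTS table.
EXAMPLE_CATALOGUE: Dict[str, List[Artefact]] = {
    "ios": [
        Artefact("AddressBook", "AddressBook.sqlitedb", "contacts"),
        Artefact("SMS", "sms.db", "messaging"),
    ],
}


def invocation_text(inv: Mapping[str, Any]) -> str:
    """Flatten one invocation into the string the detectors search."""
    args = json.dumps(inv.get("args", {}), sort_keys=True, default=str)
    return f"{inv['tool']} {args}"


def coverage(
    entries: Iterable[Artefact],
    invocations: Iterable[Mapping[str, Any]],
) -> Dict[str, bool]:
    """Map each artefact name to whether any invocation matched it."""
    haystacks = [invocation_text(inv) for inv in invocations]
    return {
        entry.name: any(entry.detector in text for text in haystacks)
        for entry in entries
    }
```

Because matching happens at call time over whatever has been recorded so
far, a coverage view built this way needs no separate bookkeeping when a
new tool call lands.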

All four tools registered in tool_registry, listed as read-only in
llm_client.READ_ONLY_TOOLS for parallel execution. They go through
the invocation-logging wrapper so the strategist's reads are
themselves auditable (the wrapper does NOT cache them — graph
state changes between calls).
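
The "logged but never cached" behaviour can be pictured with a small
decorator. This is only a sketch: logged_no_cache, audit_log and the
toy overview function are hypothetical, and the project's actual
invocation-logging wrapper may be asynchronous and record more fields.

```python
import functools
from typing import Any, Callable, Dict, List


def logged_no_cache(name: str, log: List[Dict[str, Any]]) -> Callable:
    """Record every call in `log`, always re-running the wrapped tool."""
    def decorate(fn: Callable) -> Callable:
        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            result = fn(*args, **kwargs)  # no memoisation: state may have moved
            log.append({"tool": name, "args": args, "kwargs": kwargs})
            return result
        return wrapper
    return decorate


# Toy usage: the view reads mutable state, so two calls can differ.
audit_log: List[Dict[str, Any]] = []
state = {"hypotheses": 0}


@logged_no_cache("graph_overview", audit_log)
def overview() -> str:
    return f"hypotheses: {state['hypotheses']}"
```

Calling overview() before and after mutating state yields two different
strings and two audit entries, which is the property the commit message
describes.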

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
BattleTag
2026-05-21 02:19:54 -10:00
parent ca96f29849
commit 6ebbc675c1
4 changed files with 660 additions and 0 deletions

@@ -3214,3 +3214,84 @@ class TestInvestigationRound:
        assert r.decision_rationale == "probe complete"
        assert hid in r.hypothesis_status_snapshot_before

    @pytest.mark.asyncio
    async def test_strategy_tool_helpers(self):
        """Smoke-test the strategy tool renders against a small graph.

        Renders are markdown strings, so we assert structural anchors rather
        than full text; this keeps the test robust to wording tweaks.
        """
        from tools import strategy
        from case import Case, EvidenceSource

        graph = EvidenceGraph()
        src = EvidenceSource(
            id="src-test", label="test iOS", type="mobile_extraction",
            access_mode="tree", path="/tmp/x",
        )
        graph.case = Case(case_id="c", name="c", sources=[src])
        graph.set_active_source(src)
        graph._current_agent = "ios_artifact"
        graph._current_task_id = "task-1"

        # graph_overview on empty graph still renders.
        ov = strategy.graph_overview(graph)
        assert "# Investigation State" in ov
        assert "_(none yet" in ov  # hypotheses section
        assert "src-test" in ov

        # Source coverage: every entry should be ✗ initially.
        cov = strategy.source_coverage(graph, "src-test")
        assert "Coverage: **0/" in cov
        assert "✗" in cov
        assert "Coverage hints are heuristics" in cov

        # Record one invocation that matches the AddressBook detector.
        await graph.record_tool_invocation(
            tool="sqlite_query",
            args={"db_path": "/x/var/mobile/Library/AddressBook/AddressBook.sqlitedb"},
            output="contact list",
        )
        cov2 = strategy.source_coverage(graph, "src-test")
        assert "Coverage: **1/" in cov2 or "Coverage: **2/" in cov2

        # marginal_yield: no rounds → empty render.
        my = strategy.marginal_yield(graph)
        assert "no completed investigation rounds" in my

        # budget_status with no budgets shows unbounded.
        bs = strategy.budget_status(graph, None, None)
        assert "tool_calls" in bs
        assert "(unbounded)" in bs

        # budget_status with budgets + pacing hint.
        bs2 = strategy.budget_status(
            graph,
            {"tool_calls_total": 1, "strategist_rounds_max": 1},
            None,
        )
        assert "≥ 90%" in bs2  # already over 90% (1 of 1 tool calls used)

    @pytest.mark.asyncio
    async def test_marginal_yield_after_two_rounds(self):
        """Verify marginal_yield captures phenomena/edge/status deltas."""
        from tools import strategy

        graph = EvidenceGraph()
        hid = await graph.add_hypothesis("h", "d")

        # Round 1 adds one phenomenon and one supporting edge.
        rid1 = await graph.start_investigation_round(1)
        pid, _ = await graph.add_phenomenon(
            "fs", "filesystem", "p1", "interp", source_tool="t",
        )
        await graph.update_hypothesis_confidence(hid, pid, "direct_evidence", "")
        await graph.complete_investigation_round(rid1)

        # Round 2 is deliberately empty, so yield should drop.
        rid2 = await graph.start_investigation_round(2)
        await graph.complete_investigation_round(rid2)

        out = strategy.marginal_yield(graph, last_n_rounds=2)
        assert "R1" in out and "R2" in out
        assert "Trend" in out
        assert ("collapsed" in out or "Decelerating" in out
                or "Diminishing" in out or "diminishing" in out)