feat(strategist) S2: graph_overview / source_coverage / marginal_yield / budget_status
DESIGN_STRATEGIST.md §2. Four read-only view tools the strategist uses
to ground its decision each round.
graph_overview() — hypotheses table (log_odds, conf, edges_in,
distinct_sources, recent_flip), sources table,
pending leads. distinct_sources is the
critical signal: a hypothesis with 23 edges
but only 1 distinct_source has fragile cross-
source independence and is a candidate for
a corroboration-seeking lead.
source_coverage(src) — per-source ✓/✗ against an expected-artefact
catalogue. Catalogue is heuristic hints,
NOT a forced checklist. Footer reminds the
strategist to investigate ✗ items only when
an active hypothesis depends on them — this
is the "checklist competence exists but is not binding" guardrail.
marginal_yield(N) — new phenomena / edges / status flips per
recent round. Two consecutive zero-yield
rounds = strong signal to declare complete.
budget_status() — usage vs caps (tool_calls, rounds, wall
clock). Pacing warnings at 70% / 90%.
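The stop signal and the pacing thresholds described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in tools/strategy.py: `RoundYield`, `should_declare_complete`, and `pacing_warning` are hypothetical names, and only the "two consecutive zero-yield rounds" rule and the 70% / 90% thresholds come from the description.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class RoundYield:
    """Hypothetical per-round counters; field names are assumptions."""
    new_phenomena: int = 0
    new_edges: int = 0
    status_flips: int = 0

    @property
    def total(self) -> int:
        return self.new_phenomena + self.new_edges + self.status_flips


def should_declare_complete(rounds: Sequence[RoundYield]) -> bool:
    """True when the two most recent rounds both produced nothing new."""
    if len(rounds) < 2:
        return False
    return all(r.total == 0 for r in rounds[-2:])


def pacing_warning(used: int, cap: Optional[int]) -> Optional[str]:
    """Return a pacing hint at 70% / 90% of a cap; None if unbounded or below 70%."""
    if cap is None or cap <= 0:
        return None  # treated as unbounded in this sketch
    # Integer arithmetic avoids float rounding at the exact thresholds.
    if used * 10 >= cap * 9:
        return "≥ 90%"
    if used * 10 >= cap * 7:
        return "≥ 70%"
    return None
```

In this sketch a round counts as zero-yield only when all three counters are zero; the real tool may weigh them differently.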
tools/strategy.py also exports EXPECTED_ARTEFACTS, a per-source-type
table of (name, detector, value_for) entries. Detectors are
substring patterns on tool name + args; the matcher resolves at
call time against graph.tool_invocations. Catalogue covers iOS /
Android / Windows disk / media-collection / archive source types.
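One way to picture the detector matching is the sketch below. It assumes each detector is a single case-insensitive substring tested against the tool name plus JSON-serialised args; the `Artefact` type, the two catalogue entries, and the rendered line format are illustrative assumptions, not the contents of EXPECTED_ARTEFACTS.

```python
import json
from typing import Any, Dict, List, NamedTuple


class Artefact(NamedTuple):
    name: str        # human-readable artefact label
    detector: str    # substring looked for in "tool name + args"
    value_for: str   # what the artefact is useful for


# Hypothetical catalogue entries for an iOS-like source type.
CATALOGUE = [
    Artefact("AddressBook", "AddressBook.sqlitedb", "contacts"),
    Artefact("SMS", "sms.db", "messaging timeline"),
]


def _haystack(invocation: Dict[str, Any]) -> str:
    """Flatten one logged invocation into a searchable lowercase string."""
    args = json.dumps(invocation.get("args", {}), sort_keys=True)
    return f"{invocation['tool']} {args}".lower()


def coverage(catalogue: List[Artefact],
             invocations: List[Dict[str, Any]]) -> Dict[str, bool]:
    """Resolve at call time: an artefact is covered if any invocation matches."""
    stacks = [_haystack(inv) for inv in invocations]
    return {a.name: any(a.detector.lower() in s for s in stacks)
            for a in catalogue}


def render_coverage(catalogue: List[Artefact],
                    invocations: List[Dict[str, Any]]) -> str:
    hits = coverage(catalogue, invocations)
    lines = [f"Coverage: **{sum(hits.values())}/{len(hits)}**"]
    for a in catalogue:
        mark = "✓" if hits[a.name] else "✗"
        lines.append(f"- {mark} {a.name} (useful for: {a.value_for})")
    return "\n".join(lines)
```

Because matching runs against the invocation log on every call, coverage reflects whatever has been recorded so far and needs no separate bookkeeping.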
All four tools registered in tool_registry, listed as read-only in
llm_client.READ_ONLY_TOOLS for parallel execution. They go through
the invocation-logging wrapper so the strategist's reads are
themselves auditable (the wrapper does NOT cache them — graph
state changes between calls).
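The "logged but not cached" behaviour can be illustrated with a small decorator. This is a sketch only: `logged_read`, `INVOCATION_LOG`, and the toy `graph_overview` below are hypothetical stand-ins, and the real wrapper records into the graph rather than a module-level list.

```python
import functools
import time
from typing import Any, Callable, Dict, List

# Hypothetical stand-in for the graph's invocation log.
INVOCATION_LOG: List[Dict[str, Any]] = []


def logged_read(tool_name: str) -> Callable:
    """Wrap a read-only tool so every call is recorded and always re-executed."""
    def decorate(fn: Callable[..., str]) -> Callable[..., str]:
        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> str:
            result = fn(*args, **kwargs)  # no cache lookup: state may have changed
            INVOCATION_LOG.append({
                "tool": tool_name,
                "kwargs": dict(kwargs),
                "at": time.time(),
                "output_chars": len(result),
            })
            return result
        return wrapper
    return decorate


# Toy mutable state standing in for the evidence graph.
state = {"hypotheses": 0}


@logged_read("graph_overview")
def graph_overview() -> str:
    return f"hypotheses: {state['hypotheses']}"
```

Calling `graph_overview()` before and after mutating `state` returns two different strings and leaves two log entries, which is the property the commit message relies on.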
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -3214,3 +3214,84 @@ class TestInvestigationRound:
         assert r.decision_rationale == "probe complete"
         assert hid in r.hypothesis_status_snapshot_before
 
+    @pytest.mark.asyncio
+    async def test_strategy_tool_helpers(self):
+        """Smoke-test the strategy tool renders against a small graph.
+        Renders are markdown strings — we assert structural anchors rather
+        than full text so the test is robust to wording tweaks.
+        """
+        from tools import strategy
+        from case import Case, EvidenceSource
+
+        graph = EvidenceGraph()
+        src = EvidenceSource(
+            id="src-test", label="test iOS", type="mobile_extraction",
+            access_mode="tree", path="/tmp/x",
+        )
+        graph.case = Case(case_id="c", name="c", sources=[src])
+        graph.set_active_source(src)
+        graph._current_agent = "ios_artifact"
+        graph._current_task_id = "task-1"
+
+        # graph_overview on empty graph still renders.
+        ov = strategy.graph_overview(graph)
+        assert "# Investigation State" in ov
+        assert "_(none yet" in ov  # hypotheses section
+        assert "src-test" in ov
+
+        # Source coverage — every entry should be ✗ initially.
+        cov = strategy.source_coverage(graph, "src-test")
+        assert "Coverage: **0/" in cov
+        assert "✗" in cov
+        assert "Coverage hints are heuristics" in cov
+
+        # Record one invocation that matches the AddressBook detector.
+        await graph.record_tool_invocation(
+            tool="sqlite_query",
+            args={"db_path": "/x/var/mobile/Library/AddressBook/AddressBook.sqlitedb"},
+            output="contact list",
+        )
+        cov2 = strategy.source_coverage(graph, "src-test")
+        assert "Coverage: **1/" in cov2 or "Coverage: **2/" in cov2
+
+        # marginal_yield: no rounds → empty render.
+        my = strategy.marginal_yield(graph)
+        assert "no completed investigation rounds" in my
+
+        # budget_status with no budgets shows unbounded.
+        bs = strategy.budget_status(graph, None, None)
+        assert "tool_calls" in bs
+        assert "(unbounded)" in bs
+
+        # budget_status with budgets + pacing hint.
+        bs2 = strategy.budget_status(
+            graph,
+            {"tool_calls_total": 1, "strategist_rounds_max": 1},
+            None,
+        )
+        assert "≥ 90%" in bs2  # already over 90% (1 of 1 tool calls used)
+
+    @pytest.mark.asyncio
+    async def test_marginal_yield_after_two_rounds(self):
+        """Verify marginal_yield captures phenomena/edge/status deltas."""
+        from tools import strategy
+
+        graph = EvidenceGraph()
+        hid = await graph.add_hypothesis("h", "d")
+
+        rid1 = await graph.start_investigation_round(1)
+        pid, _ = await graph.add_phenomenon(
+            "fs", "filesystem", "p1", "interp", source_tool="t",
+        )
+        await graph.update_hypothesis_confidence(hid, pid, "direct_evidence", "")
+        await graph.complete_investigation_round(rid1)
+
+        rid2 = await graph.start_investigation_round(2)
+        await graph.complete_investigation_round(rid2)
+
+        out = strategy.marginal_yield(graph, last_n_rounds=2)
+        assert "R1" in out and "R2" in out
+        assert "Trend" in out
+        assert ("collapsed" in out or "Decelerating" in out
+                or "Diminishing" in out or "diminishing" in out)
+