fix(graph): harmonic damping for repeated same-edge_type evidence

First full-case run (runs/2026-05-20T20-15-04/) produced hypotheses with log_odds of +31 (8 direct_evidence + 15 supports). That is the naive-Bayes independence assumption breaking down: 15 different phenomena all "supporting" the same hypothesis from one source are not 15 independent pieces of evidence; they are highly correlated. The last bullet of DESIGN.md §4.5 flagged this as an "unimplemented knob"; this commit implements it.

Rule: the k-th edge of a given (hyp_id, edge_type) contributes log_lr_base / k instead of log_lr_base. The cumulative contribution after N such edges is log_lr_base · H_N, where H_N is the N-th harmonic number. H_N is not bounded, but it grows only like ln N (H_N ≈ ln N + 0.577). Single-edge hypotheses are unaffected (k=1 → /1 → no change).

Replaying the 2026-05-20 graph's 108 edges under the new rule pulls the top hypothesis from +31.0 to +8.75, and the smallest active hypothesis from +4.0 to +2.08.

Also adds rank and log_lr_base to confidence_log entries so the math is auditable from the persisted graph.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
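The replay figures in the commit message can be checked with a small standalone sketch. The base log-likelihood ratios below (2.0 for direct_evidence, 1.0 for supports) are assumptions: the real `edge_log_lr` table is not shown in this commit, and these values were picked only because they are consistent with the quoted +31.0 and +8.75. The `replay` helper is hypothetical and starts from a log_odds of 0.

```python
from collections import defaultdict

# Assumed base log-likelihood ratios; the real edge_log_lr table is not
# part of this diff. These values happen to reproduce the quoted figures.
EDGE_LOG_LR = {"direct_evidence": 2.0, "supports": 1.0}


def replay(edge_types, damped=True):
    """Sum the log-odds contributions of edges arriving on one hypothesis.

    With damping, the k-th edge of a given edge_type contributes
    log_lr_base / k; without it, every edge contributes log_lr_base.
    """
    seen = defaultdict(int)  # edges already counted, per edge_type
    log_odds = 0.0
    for edge_type in edge_types:
        seen[edge_type] += 1
        rank = seen[edge_type] if damped else 1
        log_odds += EDGE_LOG_LR[edge_type] / rank
    return log_odds


top = ["direct_evidence"] * 8 + ["supports"] * 15
print(replay(top, damped=False))   # 31.0
print(round(replay(top), 2))       # 8.75

smallest = ["supports"] * 4
print(replay(smallest, damped=False))  # 4.0
print(round(replay(smallest), 2))      # 2.08
```

Each edge_type keeps its own counter, so the eight direct_evidence edges and the fifteen supports edges are damped independently, matching the per-`(hyp_id, edge_type)` rank described above.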
@@ -1020,9 +1020,20 @@ class EvidenceGraph:
         **Idempotency**: if a ``(phenomenon, hypothesis, edge_type)`` edge
         already exists, this is a no-op: the same agent re-recording the
         same link (or two agents linking via the orchestrator's batch
-        judge and a manual override) does not double-count. Independent
-        evidence (*different* phenomena pointing the same way) still
-        accumulates fully.
+        judge and a manual override) does not double-count.
+
+        **Harmonic damping of repeated same-direction evidence** (added
+        post first full-case run, 2026-05-20): independent evidence
+        (different phenomena pointing the same way) still accumulates,
+        but with diminishing returns: the k-th edge of the same
+        ``(hyp_id, edge_type)`` contributes ``log_lr_base / k``. After N
+        same-direction edges the cumulative contribution is
+        ``log_lr_base · H_N`` (harmonic sum, grows as ln N). This
+        formalises the naive-Bayes breakdown DESIGN.md §4.5 calls out:
+        "the same finding entered into the graph repeatedly by multiple
+        agents". Single-edge hypotheses are
+        unaffected (k=1, damping = 1.0). Cross-direction edges (supports
+        vs contradicts) keep their own independent counts so a strong
+        contradicting fact still bites against piled-on supports.
         """
         if edge_type not in self.edge_log_lr:
             raise ValueError(
@@ -1045,7 +1056,17 @@ class EvidenceGraph:
         ):
             return hyp.confidence

-        log_lr = self.edge_log_lr[edge_type]
+        # Harmonic damping rank: count existing edges of the SAME
+        # edge_type already incident on this hypothesis. The new edge
+        # becomes the (rank+1)-th of its kind. _adj_rev is keyed by
+        # target so this is O(in-degree(hyp)) without scanning all edges.
+        existing_same_type = sum(
+            1 for e in self._adj_rev.get(hyp_id, [])
+            if e.edge_type == edge_type
+        )
+        rank = existing_same_type + 1
+        log_lr_base = self.edge_log_lr[edge_type]
+        log_lr = log_lr_base / rank
         old_log_odds = hyp.log_odds
         old_conf = hyp.confidence
         new_log_odds = old_log_odds + log_lr
@@ -1065,7 +1086,9 @@ class EvidenceGraph:
             "timestamp": datetime.now().isoformat(),
             "phenomenon_id": phenomenon_id,
             "edge_type": edge_type,
-            "log_lr": log_lr,
+            "log_lr_base": log_lr_base,
+            "rank": rank,
+            "log_lr": round(log_lr, 4),
             "old_log_odds": round(old_log_odds, 4),
             "new_log_odds": round(new_log_odds, 4),
             "old_confidence": round(old_conf, 4),
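With rank and log_lr_base persisted, each confidence_log entry can be re-derived on its own. The following is a minimal sketch of such an audit, not code from this repository: the `audit_confidence_log` helper and the sample entries are hypothetical, the field names are the ones written in the hunk above, and the tolerance is an assumption chosen to absorb the 4-decimal rounding applied when entries are written.

```python
def audit_confidence_log(entries, tol=2e-4):
    """Return the indices of log entries whose arithmetic does not check out.

    An entry is consistent when log_lr matches log_lr_base / rank and
    new_log_odds matches old_log_odds + log_lr, both within tol.
    """
    bad = []
    for i, entry in enumerate(entries):
        expected_lr = entry["log_lr_base"] / entry["rank"]
        lr_ok = abs(entry["log_lr"] - expected_lr) <= tol
        step = entry["new_log_odds"] - entry["old_log_odds"]
        step_ok = abs(step - entry["log_lr"]) <= tol
        if not (lr_ok and step_ok):
            bad.append(i)
    return bad


# Hypothetical entries for three "supports" edges on one hypothesis,
# using an assumed base log-LR of 1.0.
log = [
    {"edge_type": "supports", "log_lr_base": 1.0, "rank": 1,
     "log_lr": 1.0, "old_log_odds": 0.0, "new_log_odds": 1.0},
    {"edge_type": "supports", "log_lr_base": 1.0, "rank": 2,
     "log_lr": 0.5, "old_log_odds": 1.0, "new_log_odds": 1.5},
    {"edge_type": "supports", "log_lr_base": 1.0, "rank": 3,
     "log_lr": 0.3333, "old_log_odds": 1.5, "new_log_odds": 1.8333},
]
print(audit_confidence_log(log))  # []

# An undamped entry (rank 2 but full base contribution) is flagged.
log[1] = dict(log[1], log_lr=1.0, new_log_odds=2.0)
print(audit_confidence_log(log))  # [1]
```

The check only looks at per-entry arithmetic; verifying that ranks are consecutive per `(hyp_id, edge_type)` would need the hypothesis id, which is not among the fields visible in this hunk.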