How risky is the code your AI just wrote?
CI/CD pipelines treat all AI-generated code equally — a one-line docstring edit and a 500-line autonomous refactor get the same review process. PRISM quantifies risk from VIBES audit signals, so your pipeline can tell the difference.
Built on VIBES audit data · Attested via VERIFY · Feeds EVOLVE learning
AI-generated code is flooding into production at a pace human reviewers cannot match. But not all AI-generated code carries the same risk. A model adding a docstring is fundamentally different from a model autonomously creating an authentication handler — yet most teams have no way to distinguish between them at review time.
Without quantified risk, every AI-generated change is treated identically: a trivial docstring edit and an autonomously generated authentication handler go through the same review process.
VIBES already captures the signals that risk assessment needs — action types, scope, assurance levels, review status, temperature. PRISM defines how to combine those signals into a single quantified score that your CI/CD pipeline can act on.
PRISM (Provenance & Risk Intelligence Scoring Model) is a standalone risk scoring extension built on VIBES audit data. Every annotation in a VIBES audit trail carries contextual signals — what kind of action was taken, how large the change was, what assurance level was configured versus what was actually recorded, whether human review occurred. PRISM combines these signals into a single 0.0–1.0 score.
PRISM is a framework, not a fixed formula. The reference algorithm below uses a weighted average, but implementors are free to substitute their own scoring models as long as the output conforms to the 0.0–1.0 range and the risk_factors array provides transparency into which signals drove the score.
PRISM integrates with VERIFY for attested risk scores and with EVOLVE for agent learning feedback — but operates independently as its own extension.
PRISM scores map to four severity bands, each with a recommended action for CI/CD pipeline integration.
| Band | Range | Meaning | Recommended Action |
|---|---|---|---|
| Low | 0.00 – 0.29 | Routine change with minimal risk signals | Auto-merge permitted |
| Medium | 0.30 – 0.59 | Moderate risk — larger scope or assurance gap | Flag for review; require approval |
| High | 0.60 – 0.79 | Significant risk — complex change or missing review | Block merge; require senior review |
| Critical | 0.80 – 1.00 | Extreme risk — large unreviewed creation at high temperature | Block merge; escalate to security team |
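Applying the band boundaries above to a continuous score, a lookup can be sketched as follows (the function name and string labels are illustrative, not part of the specification):

```python
def prism_band(score: float) -> str:
    """Map a PRISM score (0.0-1.0) to its severity band per the table above."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("PRISM scores must fall within 0.0-1.0")
    if score < 0.30:
        return "Low"       # auto-merge permitted
    if score < 0.60:
        return "Medium"    # flag for review; require approval
    if score < 0.80:
        return "High"      # block merge; require senior review
    return "Critical"      # block merge; escalate to security team
```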
The following signals are available for PRISM computation. Each signal produces a normalized 0.0–1.0 value and carries an implementor-defined weight.
- `temperature` (optional) — Model sampling temperature at generation time. Higher temperatures increase output randomness and reduce reproducibility.
- `action_type` (required) — Whether the change was a create, modify, or review. New file creation carries higher risk than modification of existing, reviewed code.
- `scope_lines` (required) — Total line count of the annotated change. Larger changes have more surface area for defects.
- `assurance_gap` (optional) — Difference between the configured assurance level and the actual assurance data recorded. A project configured for High assurance that only captured Low-level data has a significant gap.
- `review_status` (optional) — Whether the change has been human-reviewed. Unreviewed AI-generated code carries inherently higher risk.
- `prompt_complexity` (optional) — Estimated complexity of the prompt that generated the change, derived from token count and instruction structure. Available at Medium assurance and above.

The reference PRISM computation is a weighted average of available signals. Implementors may substitute any scoring model as long as the output conforms to the 0.0–1.0 range and provides a transparent risk_factors array.
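As an illustration of the reference computation, the sketch below takes a weighted average over whichever normalized signals are present. The weights are placeholder values an implementor would tune, and the shape of the `risk_factors` entries is one possible design, not a normative schema:

```python
# Placeholder weights -- an implementor would tune these per project.
DEFAULT_WEIGHTS = {
    "temperature": 0.10,
    "action_type": 0.25,
    "scope_lines": 0.25,
    "assurance_gap": 0.15,
    "review_status": 0.20,
    "prompt_complexity": 0.05,
}

def prism_score(signals, weights=DEFAULT_WEIGHTS):
    """Combine normalized 0.0-1.0 signals into a score plus a risk_factors array.

    Missing optional signals are simply excluded; the weights of the
    signals that are present are renormalized so the score stays in 0.0-1.0.
    """
    present = {k: v for k, v in signals.items() if k in weights}
    total_weight = sum(weights[k] for k in present)
    if total_weight == 0:
        raise ValueError("no scorable signals present")
    score = sum(weights[k] * v for k, v in present.items()) / total_weight
    # risk_factors makes the score transparent: one entry per contributing signal.
    risk_factors = [
        {
            "signal": k,
            "value": v,
            "weight": weights[k],
            "contribution": round(weights[k] * v / total_weight, 4),
        }
        for k, v in present.items()
    ]
    return round(score, 4), risk_factors
```

Renormalizing over the signals actually present keeps a record with only required signals comparable to one with the full set, at the cost of making scores less comparable across records with different signal coverage.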
PRISM data is stored directly on VIBES annotation records using two optional fields: `risk_score` (the computed 0.0–1.0 value) and `risk_factors` (an array of signal assessments providing transparency into the score).
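An annotation record carrying PRISM data might look like the sketch below. Only `risk_score` and `risk_factors` are defined by PRISM; every other field name, and the internal shape of each `risk_factors` entry, is illustrative:

```json
{
  "file": "src/auth/handler.py",
  "action_type": "create",
  "scope_lines": 214,
  "review_status": "unreviewed",
  "risk_score": 0.82,
  "risk_factors": [
    {"signal": "action_type", "value": 1.0, "weight": 0.25},
    {"signal": "scope_lines", "value": 0.85, "weight": 0.25},
    {"signal": "review_status", "value": 1.0, "weight": 0.20}
  ]
}
```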
PRISM scores are most powerful when they drive automated pipeline decisions. Rather than treating every AI-generated change identically, teams can set thresholds that gate merges based on quantified risk — low-risk changes flow through automatically while high-risk changes require human review.
The vibecheck CLI provides built-in commands for PRISM evaluation. Run these in your CI pipeline to enforce risk-based gating without custom scripting.
Run vibecheck risk in your project directory. This scans the .ai-audit/annotations.jsonl file, computes PRISM scores for every annotation that has signal data, and outputs a summary with per-file scores and an aggregate project score.
Use vibecheck risk --threshold 0.6 --ci to fail the pipeline if any annotation exceeds the threshold. The --ci flag sets the exit code to non-zero on threshold violation, making it compatible with any CI system that checks exit codes.
Use vibecheck risk --format json to produce machine-readable output. Pipe this into your PR bot or GitHub Action to post a risk summary comment on every pull request, giving reviewers immediate visibility into which files carry elevated risk.
For high-stakes repositories, set a hard gate: vibecheck risk --threshold 0.8 --ci --fail-on critical. Any annotation in the Critical band (PRISM ≥ 0.80) blocks the merge and triggers an escalation notification to the security team.
A minimal GitHub Actions step that blocks merges where any annotation exceeds the High severity threshold:
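One possible shape for such a step, assuming the vibecheck CLI is installed earlier in the job:

```yaml
# Illustrative step; 0.6 is the lower bound of the High band.
- name: PRISM risk gate
  run: vibecheck risk --threshold 0.6 --ci
```

Because the `--ci` flag exits non-zero on a threshold violation, the step fails the job, and the job failure blocks the merge under standard branch protection rules.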
The --format json flag produces structured output suitable for dashboards, PR bots, and downstream analysis tools.
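The JSON schema is implementation-defined; drawing on the per-file and aggregate scores described above, the output might look like this (all field names are illustrative):

```json
{
  "aggregate_score": 0.41,
  "files": [
    {"path": "src/auth/handler.py", "risk_score": 0.82, "band": "critical"},
    {"path": "README.md", "risk_score": 0.05, "band": "low"}
  ]
}
```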