Empirical Foundation Model Benchmarks (Jan 1, 2026 - May 31, 2026)
Model Terminology
Key Term | Description
Effort | Measures the relative workload as the proportion of vulnerabilities prioritized out of all the possible CVEs: (True Positives + False Positives) / Everything
Coverage | Measures how well our strategy covers the vulnerabilities that show exploitation activity, that is, the share of all exploited vulnerabilities that we prioritized (False Negatives are not prioritized and exploitation activity is later observed): True Positives / (True Positives + False Negatives)
Efficiency | Measures how accurately our strategy focuses on vulnerabilities that have exploitation activity, that is, the share of prioritized vulnerabilities that were actually exploited (False Positives are prioritized without any exploitation activity observed): True Positives / (True Positives + False Positives)
Effort, Coverage, and Efficiency
All Published Vulns (CVEs we could prioritize)
Effort
Measures the relative workload as the proportion of vulnerabilities prioritized out of all the possible CVEs:
(True Positives + False Positives) / Everything
Coverage
Measures how well our strategy covers the vulnerabilities that show exploitation activity, that is, the share of all exploited vulnerabilities that we prioritized (False Negatives are not prioritized and exploitation activity is later observed):
True Positives / (True Positives + False Negatives)
Efficiency
Measures how accurately our strategy focuses on vulnerabilities that have exploitation activity, that is, the share of prioritized vulnerabilities that were actually exploited (False Positives are prioritized without any exploitation activity observed):
True Positives / (True Positives + False Positives)
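
The three metrics can be computed directly from the four confusion-matrix counts. The sketch below is illustrative only. It assumes "Everything" is the sum of all four counts, and it takes Coverage as True Positives / (True Positives + False Negatives) and Efficiency as True Positives / (True Positives + False Positives), which is what the prose definitions describe. The `Counts` class, the function names, and the example numbers are hypothetical and are not taken from the benchmark data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """Confusion-matrix counts for one prioritization strategy (hypothetical helper)."""
    tp: int  # prioritized, exploitation activity observed
    fp: int  # prioritized, no exploitation activity observed
    fn: int  # not prioritized, exploitation activity later observed
    tn: int  # not prioritized, no exploitation activity observed


def effort(c: Counts) -> float:
    """Share of all CVEs that the strategy asks us to prioritize."""
    everything = c.tp + c.fp + c.fn + c.tn
    return (c.tp + c.fp) / everything if everything else 0.0


def coverage(c: Counts) -> float:
    """Share of exploited CVEs that were prioritized."""
    exploited = c.tp + c.fn
    return c.tp / exploited if exploited else 0.0


def efficiency(c: Counts) -> float:
    """Share of prioritized CVEs that were actually exploited."""
    prioritized = c.tp + c.fp
    return c.tp / prioritized if prioritized else 0.0


# Made-up counts, chosen only to make the arithmetic easy to follow.
example = Counts(tp=30, fp=70, fn=20, tn=880)
print(f"effort={effort(example):.2f}")          # 100 of 1000 CVEs prioritized
print(f"coverage={coverage(example):.2f}")      # 30 of 50 exploited CVEs caught
print(f"efficiency={efficiency(example):.2f}")  # 30 of 100 prioritized CVEs exploited
```

Returning 0.0 when a denominator is empty is a design choice made for this sketch; a real evaluation might prefer to report such cases as undefined.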