ZeroDayBench Replication: What Actually Holds Up in Practice

One of the stranger things about AI security is how many people trust benchmark scores they would never trust anywhere else. If someone told you a new static analyzer catches 90% of vulnerabilities, your first question would be: 90% of what? In what code? Under what assumptions? What did it miss? But when an LLM benchmark shows a leaderboard, people often skip those questions and go straight to conclusions. ...

March 21, 2026 · 4 min · Napat Boonsaeng