LLM | Napat's Inverse Blog

One of the stranger things about AI security is how many people trust benchmark scores they would never trust anywhere else. If someone told you a new static analyzer catches 90% of vulnerabilities, your first question would be: 90% of what? On what code? Under what assumptions? What did it miss? But when an LLM benchmark publishes a leaderboard, people often skip those questions and go straight to conclusions. ...