OpenAI and Paradigm Launch EVMbench to Benchmark AI on EVM Smart-Contract Security

Published: 2026-02-18 22:14:06

OpenAI and crypto venture capital firm Paradigm have launched EVMbench, a benchmarking framework that tests AI agents on detecting, patching and exploiting high‑severity vulnerabilities in Ethereum Virtual Machine (EVM) smart contracts. EVMbench uses 120 curated real‑world vulnerabilities drawn from 40 audits and open‑audit competitions (including Code4rena) and includes scenarios from Stripe’s Tempo audit.

The benchmark runs in three modes: detect (vulnerability recall), patch (automated fixes that preserve intended functionality) and exploit (end‑to‑end fund‑draining attacks executed in a deterministic sandbox). In exploit mode, newer models performed significantly better: GPT‑5.3‑Codex reached a 72.2% success rate versus 31.9% for GPT‑5. Detect and patch scores were weaker, reflecting incomplete audit traces and the difficulty of maintaining contract functionality after fixes. OpenAI stresses that EVMbench does not capture all real‑world complexity but argues that measuring model performance in economically relevant, replayable environments is crucial as AI becomes a tool for both attackers and defenders.

Alongside the benchmark, OpenAI expanded the private beta of its security research agent Aardvark and committed $10 million in API credits through a Cybersecurity Grant Program to support defensive research for open‑source and critical infrastructure projects. The release underscores a growing intersection of AI and blockchain security, with implications for audit automation, attacker tooling and defensive workflows. Traders should monitor these factors because they may affect exploit risk, audit market demand and valuations of EVM‑aligned projects.
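To make the "vulnerability recall" measure used in detect mode concrete, here is a minimal sketch of how recall is conventionally computed: the share of known vulnerabilities that an agent's report also lists. The function name and the finding identifiers are hypothetical and chosen only for illustration; the article does not describe EVMbench's actual scoring code, which may differ.

```python
def detection_recall(known_findings, reported_findings):
    """Return the fraction of known vulnerabilities that were reported.

    Illustrative only: this is the textbook definition of recall,
    not EVMbench's own grader.
    """
    known = set(known_findings)
    if not known:
        raise ValueError("no known findings to score against")
    # Reported items that match no known finding do not affect recall.
    return len(known & set(reported_findings)) / len(known)


# Hypothetical finding identifiers, not taken from any real audit.
known = ["H-01", "H-02", "H-03", "H-04"]
reported = ["H-02", "H-04", "X-99"]

print(f"recall = {detection_recall(known, reported):.2f}")
```

In this made-up case the agent finds two of four known issues, so recall is 0.50. The extra report "X-99" does not lower the score, because recall measures only how much was missed and does not penalize false alarms.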

Neutral

Short-term market impact on native EVM tokens (notably ETH) is likely neutral. EVMbench highlights that AI can speed up exploit development (attacker tooling) but can also improve defensive automation (audit and patching). The benchmark shows that attackers could gain more effective exploit capabilities as models improve, which raises tail risk for vulnerable contracts and could prompt short‑term volatility around exploited projects. However, the emphasis on defensive tools, including OpenAI’s Aardvark beta and $10M in API credits for defensive research, counterbalances that risk by accelerating automated audits and patching, which should reduce long‑term systemic vulnerability. For traders: monitor exploit disclosures, audit adoption and premiums for professionally audited contracts. Sudden exploit successes could cause sharp, asset‑specific drawdowns, while broader uptake of automated security tooling could gradually strengthen confidence in EVM project valuations. Overall, the benefits to defensive workflows and the funding for security research make a significant negative price shock across major EVM tokens unlikely; impacts will be concentrated and event‑driven.