What You'll Learn
- Why the $1.73 reverse engineering result is more important than any benchmark percentage for security leaders
- How the cost of AI-powered offensive security has collapsed while defensive costs remain high
- Why both GPT-5.5 and Claude Mythos passing AISI's "Last Ones" benchmark signals an industry-wide threshold crossing
- What CISOs must do to adapt their threat models for the AI cost-asymmetry era
The $1.73 Figure That Changes Everything
In April 2026, the UK AI Security Institute (AISI) published its evaluation of OpenAI's GPT-5.5 cyber capabilities, and one data point dominated the security analyses that followed: GPT-5.5 solved a Rust virtual machine reverse engineering challenge in 10 minutes and 22 seconds with no human assistance, at a total API cost of $1.73. The same task would take a human expert approximately 12 hours. That is a time ratio of roughly 70:1, and the gap is far wider in dollar terms. It represents a discontinuity in the threat landscape that conventional security frameworks were not built to accommodate.
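The ratios in this paragraph are easy to check. This sketch is plain arithmetic on the reported figures. The $200/hour human rate is an assumption taken from the "Before AI" row of the comparison table later in this article, not an AISI number.

```python
# Human baseline reported alongside the AISI result: roughly 12 hours of expert time.
HUMAN_HOURS = 12
# GPT-5.5 solve time: 10 minutes 22 seconds, expressed in seconds.
MODEL_SECONDS = 10 * 60 + 22
MODEL_COST_USD = 1.73
# Assumption: $200/hour for a human expert (the rate used in the comparison table).
HUMAN_RATE_USD_PER_HOUR = 200

time_ratio = (HUMAN_HOURS * 3600) / MODEL_SECONDS
human_cost_usd = HUMAN_HOURS * HUMAN_RATE_USD_PER_HOUR
cost_ratio = human_cost_usd / MODEL_COST_USD

print(f"time ratio: {time_ratio:.0f}:1")
print(f"cost ratio: {cost_ratio:.0f}:1 (${human_cost_usd} vs ${MODEL_COST_USD})")
```

Under that assumed rate, the dollar ratio comes out near 1,400:1, about twenty times the time ratio.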
The AISI evaluation, published on the official aisi.gov.uk domain, tested GPT-5.5 across a range of offensive cyber tasks including vulnerability discovery, reverse engineering, and multi-step network intrusion. While the model's overall expert-level pass rate of 71.4% was notable, the $1.73 figure became the focal point because it translates abstract capability percentages into a concrete economic reality. A capability that used to require a highly skilled human operator working for half a day can now be replicated by a publicly available API call for less than the price of a coffee.
This is not a theoretical risk. GPT-5.5 is publicly accessible through OpenAI's API. Claude Mythos Preview, which achieved a 78% score on the ExploitBench vulnerability discovery ladder compared to GPT-5.5's approximately 50%, is restricted through Project Glasswing. But the $1.73 figure applies to the publicly available model. The most cost-effective offensive AI tool is the one anyone can use. ExploitBench results show that Mythos is more capable on a per-episode basis, but GPT-5.5's availability makes it the more present threat for most organizations.
The Cost Asymmetry Collapse: Before AI vs After AI
Before frontier AI models reached their current capability level, offense and defense in cybersecurity were at rough cost parity. Both sides required expensive human expertise: a skilled penetration tester cost roughly the same as a skilled security engineer. An attacker had to invest significant resources to develop an exploit, and a defender needed comparable resources to find and patch the same vulnerability. Because the economics were roughly symmetric, the defender's structural disadvantage (needing to be right every time) was partially offset by the attacker's high cost per attempt.
After AI, that parity has collapsed. The marginal cost of an offensive AI operation is approaching zero. GPT-5.5 can attempt a reverse engineering task for $1.73, so an attacker can run the same attempt 10 times for $17.30. AISI's data also shows end-to-end success rates of 20% to 30% on its 32-step attack chain (2 of 10 runs for GPT-5.5, 3 of 10 for Mythos). Combining these figures is only illustrative, since they come from different tasks and, at the 30% rate, a different model. Still, at 30% the arithmetic gives roughly $5.77 per successful run, and at GPT-5.5's own 20% rate roughly $8.65. No human operator can compete with that cost structure, and no traditional security model accounts for adversaries who can launch thousands of intrusion attempts at near-zero marginal cost.
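The per-success arithmetic in this paragraph comes from dividing cost per attempt by success rate. Here is a minimal sketch, assuming independent attempts with a constant success rate. Pairing GPT-5.5's $1.73 reverse engineering price with the 32-step chain's success rates is an illustrative assumption, not an AISI measurement. The sketch also computes the chance that at least one of 10 attempts succeeds.

```python
def cost_per_success(cost_per_attempt: float, success_rate: float) -> float:
    """Expected spend per successful run, assuming independent attempts."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_attempt / success_rate


def p_at_least_one(success_rate: float, attempts: int) -> float:
    """Probability that at least one of `attempts` independent runs succeeds."""
    return 1 - (1 - success_rate) ** attempts


ATTEMPT_COST = 1.73  # GPT-5.5 reverse engineering attempt cost reported by AISI (USD)

# Illustrative pairing: TLO end-to-end rates applied to the RE attempt price.
for label, rate in [("Mythos TLO rate", 0.30), ("GPT-5.5 TLO rate", 0.20)]:
    print(f"{label}: ${cost_per_success(ATTEMPT_COST, rate):.2f} per success, "
          f"P(>=1 success in 10 tries) = {p_at_least_one(rate, 10):.2f}")
```

Even at the lower 20% rate, 10 attempts give close to a 90% chance of at least one success under these assumptions.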
Claude Mythos compounds the asymmetry in a different direction. Mythos costs about four times as much per ExploitBench episode ($203.93 versus $51.40 for GPT-5.5), but it succeeds roughly 18 times as often on autonomous cyber exploitation tasks. Adjusted for success rate, Mythos can therefore be cheaper per successful exploit than GPT-5.5 for certain high-complexity targets. The effective cost of offensive AI is not a single number; it depends on the target, the task, and the model. Across every scenario, however, the trend is the same: costs are collapsing while capabilities are rising. Pricing comparisons across frontier models point the same way, with each new release making offensive AI more cost-effective for attackers.
| Era | Offensive Cost | Defensive Cost | Asymmetry |
|---|---|---|---|
| Before AI (2024) | High: $200+/hr human expert | High: $200+/hr human expert | Parity: both sides expensive |
| GPT-5.5 (2026) | $1.73 per RE attempt | $75K-$200K per red team assessment | Massive: one RE attempt is roughly 43,000x to 116,000x cheaper than one assessment |
| At scale (10 attempts) | $17.30 total (10 x $1.73) | Same fixed cost | Defender cannot scale proportionally |
GPT-5.5 vs Claude Mythos: The Economics of Offense
The cost comparison between GPT-5.5 and Claude Mythos on offensive security tasks reveals a nuanced picture that defies simple conclusions. On ExploitBench, a benchmark that measures the ability to discover and exploit vulnerabilities across 122 distinct episodes, GPT-5.5 costs $51.40 per episode while Mythos costs $203.93, roughly four times as much. However, Mythos succeeds approximately 18 times as often on autonomous cyber exploitation (ACE) tasks. Because its success advantage is far larger than its price premium, Mythos often achieves a lower cost per successful exploit despite its higher per-episode price.
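A one-line calculation shows why an 18-fold success advantage can outweigh a roughly fourfold price premium: cost per success is cost per episode divided by success rate. For illustration only, the sketch assumes that the ExploitBench per-episode costs and the ACE success multiplier describe comparable tasks. In reality they come from different measurements.

```python
# Per-episode ExploitBench costs reported in the AISI comparison (USD).
GPT55_COST_PER_EPISODE = 51.40
MYTHOS_COST_PER_EPISODE = 203.93
# Reported relative success on ACE tasks: Mythos succeeds ~18x as often.
MYTHOS_SUCCESS_MULTIPLIER = 18

price_ratio = MYTHOS_COST_PER_EPISODE / GPT55_COST_PER_EPISODE
# cost_per_success = cost_per_episode / success_rate, so the ratio of per-success
# costs equals the price ratio divided by the success-rate ratio.
per_success_ratio = price_ratio / MYTHOS_SUCCESS_MULTIPLIER

print(f"Mythos costs {price_ratio:.2f}x more per episode")
print(f"Mythos cost per success = {per_success_ratio:.2f}x GPT-5.5's "
      f"(about {1 / per_success_ratio:.1f}x cheaper per success)")
```

The result depends only on the two ratios, so the same check applies to any future pair of models once per-attempt prices and relative success rates are known.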
The practical implication for defenders is that they cannot assume that a particular model's cost structure makes certain attack vectors uneconomical. The cheapest attack per attempt comes from GPT-5.5 at $1.73 per reverse engineering task. The most capable attack per attempt comes from Mythos, which achieves higher success rates on complex multi-step exploits. And for intermediate targets, GPT-5.5's 71.4% expert-level pass rate may be entirely sufficient. The threat landscape is not shaped by a single model but by a portfolio of offensive AI tools that attackers can select based on their specific objectives and budget.
The AISI evaluation further demonstrated that both models passed the "Last Ones" benchmark, a 32-step simulated corporate network attack that AISI estimates would take a human expert 20 hours to complete. Mythos solved it in 3 of 10 attempts; GPT-5.5 solved it in 2 of 10. Two independently developed models from different organizations crossed this threshold in the same evaluation period, which indicates that this is not an Anthropic-specific or OpenAI-specific capability. It is an industry-wide capability threshold, and the economic implications apply to every organization with an internet-connected attack surface. Comparisons between GPT-5.5 and Claude models in other domains show similar convergence at the frontier.
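The 3-of-10 and 2-of-10 results are small samples. As a rough check on what they can and cannot support, the sketch computes a 95% Wilson score interval for each. The choice of the Wilson method and the assumption that runs are independent are made only for this illustration and are not part of AISI's reporting.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a binomial proportion."""
    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return center - half, center + half


for model, solved in [("Mythos", 3), ("GPT-5.5", 2)]:
    lo, hi = wilson_interval(solved, 10)
    print(f"{model}: {solved}/10 solved, 95% CI ~ {lo:.2f} to {hi:.2f}")
```

Both intervals sit well above zero, which fits the reading that both models crossed the threshold. They also overlap heavily, so these runs alone do not show that one model is better at TLO than the other.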
The "Last Ones" Benchmark: An Industry-Wide Threshold
The "Last Ones" (TLO) benchmark, developed by AISI in collaboration with academic researchers, is a 32-step corporate network attack simulation that spans initial reconnaissance, privilege escalation, lateral movement, persistence establishment, and data exfiltration. Each step requires the model to analyze outputs, make decisions, and execute commands in a simulated environment. The benchmark was designed to be difficult enough that no AI model could complete it (hence the name "The Last Ones"), but both GPT-5.5 and Claude Mythos proved that assumption wrong within weeks of each other.
The larger significance of the TLO results is that they demonstrate autonomous multi-step reasoning at a level previously considered years away. End-to-end success rates of 20% to 30% on a 32-step attack chain mean that a model can sometimes execute a full corporate intrusion without human guidance. The steps on which models fail tend to be scattered across the chain rather than concentrated at any particular stage. That pattern suggests a human operator with basic technical skills could help the model past its failure points and achieve a much higher overall success rate.
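The argument about scattered failures can be made concrete under a strong simplifying assumption: every one of the 32 steps succeeds independently with the same probability. AISI does not report per-step rates. The per-step figures here are therefore implied by the end-to-end rates rather than measured. The "assisted retry" model, in which any failed step gets one extra independent try, is a hypothetical stand-in for human help.

```python
def implied_step_success(end_to_end: float, steps: int) -> float:
    """Per-step success rate that yields `end_to_end` if steps are independent."""
    return end_to_end ** (1 / steps)


def assisted_chain_success(step_success: float, steps: int, retries: int) -> float:
    """End-to-end success if each failed step may be retried `retries` times,
    with every retry independent and as likely to succeed as the first try."""
    step_with_retries = 1 - (1 - step_success) ** (retries + 1)
    return step_with_retries ** steps


STEPS = 32
for rate in (0.30, 0.20):
    p = implied_step_success(rate, STEPS)
    print(f"end-to-end {rate:.0%}: per-step {p:.3f}, "
          f"with one assisted retry per step {assisted_chain_success(p, STEPS, 1):.0%}")
```

Under these assumptions, a 30% end-to-end rate corresponds to about 96% per-step reliability. A single assisted retry at each failure point lifts the whole chain to roughly 96%, which is why scattered failures matter more than their raw count suggests.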
AISI's subsequent analysis, published in May 2026, confirmed the accelerating trend. The institute tested an additional set of cyber ranges and found continued improvement across both models, with performance rising as inference-time compute increased. The implication is that as hardware improves and inference costs decline, an attacker with a fixed budget can buy more compute, so success rates on multi-step attack chains should rise even without further model advances. The cost of offensive AI is not just low today; it is on a trajectory toward zero for an expanding set of tasks. Defensive AI security results from Mozilla and OpenBSD show that AI can be used for defense as well as attack, but the defender still carries the burden of patching every vulnerability while the attacker needs only one.
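One simple way to see why cheaper inference raises success at a fixed budget is to treat extra compute as extra independent attempts. The sketch holds the per-attempt success rate at a hypothetical 20% and halves a hypothetical per-attempt cost twice, starting from $1.73. The $10 budget is likewise illustrative. The sketch ignores the other effect AISI describes, where spending more compute on a single attempt raises that attempt's success rate.

```python
def p_success_within_budget(budget: float, cost_per_attempt: float,
                            success_rate: float) -> float:
    """Chance of at least one success from the attempts a fixed budget buys,
    assuming independent attempts with a constant success rate."""
    attempts = int(budget // cost_per_attempt)
    return 1 - (1 - success_rate) ** attempts


BUDGET = 10.00        # hypothetical attacker budget (USD)
SUCCESS_RATE = 0.20   # hypothetical per-attempt success rate, held constant
for cost in (1.73, 0.87, 0.43):  # hypothetical successive halvings of inference cost
    print(f"${cost:.2f}/attempt -> "
          f"{p_success_within_budget(BUDGET, cost, SUCCESS_RATE):.0%} chance of a success")
```

No model improvement appears anywhere in this calculation. The rising probability comes entirely from the budget buying more attempts.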
The Defender's Dilemma
Cybersecurity has always had an asymmetry problem. Defenders need to be right every time; attackers only need to be right once. What the AISI data and the $1.73 figure reveal is that the cost of being an attacker is collapsing while the cost of being a defender remains stubbornly high. An organization can spend $200,000 on a comprehensive red team assessment, and an attacker can probe that same organization's defenses for $17.30 across 10 attempts. The defender's investment is a fixed cost. The attacker's investment is a variable cost that can be scaled linearly with the number of targets.
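The contrast between fixed and variable cost in this paragraph can be sketched as a simple spend model. It reuses the paragraph's figures: $200,000 per assessment, $1.73 per attempt, and 10 attempts per target. Like the paragraph, it treats the reverse engineering price as a stand-in for a generic probe, which is an assumption.

```python
RED_TEAM_BUDGET = 200_000   # defender: one comprehensive assessment (USD), a fixed cost
ATTEMPT_COST = 1.73         # attacker: one GPT-5.5 RE attempt (USD)
ATTEMPTS_PER_TARGET = 10    # attacker probes each target 10 times


def attacker_spend(targets: int) -> float:
    """Variable cost: grows linearly with the number of targets probed."""
    return targets * ATTEMPTS_PER_TARGET * ATTEMPT_COST


# How many targets could an attacker probe for one defender's assessment budget?
targets_for_budget = int(RED_TEAM_BUDGET // (ATTEMPTS_PER_TARGET * ATTEMPT_COST))

print(f"one target: ${attacker_spend(1):.2f}")
print(f"1,000 targets: ${attacker_spend(1000):,.2f}")
print(f"targets probed for ${RED_TEAM_BUDGET:,}: {targets_for_budget:,}")
```

With these inputs, the budget for a single assessment would cover 10 probes against each of more than 11,000 targets.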
The defender's dilemma extends beyond cost to capability. A security team can improve its detection and response capabilities, but each improvement addresses specific attack vectors, while the attacker can shift to a different vector at near-zero marginal cost. AI makes the attacker's job not just cheaper but more adaptable. A human attacker who specializes in web application vulnerabilities may struggle with binary exploitation. An AI model can switch between vulnerability classes, operating systems, and attack patterns without retraining.
For CISOs, the implication is that traditional threat modeling — which assumes that attackers face meaningful resource constraints — is no longer adequate. The threat model must account for adversaries who can run thousands of attack permutations at near-zero cost, who can scale their operations linearly with API access, and who can switch between attack vectors faster than any human team can respond. The question is no longer whether an attacker will find a vulnerability. The question is how much they are willing to spend to find one, and the answer is increasingly measured in cents. Anthropic's safety guardrails attempt to address this asymmetry for Mythos, but GPT-5.5's availability creates a gap that no single company's safety policies can fill.
What CISOs Must Do Now
The first and most urgent action for security leaders is to update their threat models to account for AI-driven cost compression. Any risk assessment that assumes a human attacker with finite time and budget is outdated. The relevant question for each vulnerability class is not "would a human attacker bother?" but "can an AI model exploit this at scale for under $100?" If the answer is yes, the vulnerability must be treated as actively exploitable regardless of its complexity or the skill level required to trigger it manually.
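The "under $100" test translates naturally into a triage rule. This is a hypothetical sketch. The `VulnClass` fields, the example entries, and the practice of estimating a cost per attempt and a success rate for each class are all illustrative assumptions. A real program would need its own source for those estimates, such as internal AI-assisted red-team runs.

```python
from dataclasses import dataclass

THRESHOLD_USD = 100.0  # the "under $100" test from the threat-model question


@dataclass
class VulnClass:
    # Hypothetical fields; the estimates would come from the organization's own testing.
    name: str
    est_cost_per_attempt_usd: float
    est_success_rate: float


def treat_as_actively_exploitable(v: VulnClass) -> bool:
    """Flag a class if an AI model's estimated cost per success is under the threshold.
    Complexity and manual skill level are deliberately ignored, as the text advises."""
    if v.est_success_rate <= 0:
        return False
    return v.est_cost_per_attempt_usd / v.est_success_rate < THRESHOLD_USD


backlog = [
    VulnClass("memory-safety bug in file parser", 5.00, 0.10),      # ~$50 per success
    VulnClass("multi-stage chain in legacy service", 80.00, 0.05),  # ~$1,600 per success
]
for v in backlog:
    verdict = "actively exploitable" if treat_as_actively_exploitable(v) else "standard queue"
    print(f"{v.name} -> {verdict}")
```

Because the rule looks only at estimated cost per success, a bug that would take a human specialist days to exploit still lands in the urgent queue if an AI model can exploit it cheaply.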
The second priority is to invest in AI-powered defensive tools that can match the speed of AI-powered attacks. Manual vulnerability scanning and quarterly penetration testing are no longer sufficient. Continuous AI-assisted code review, automated patch management, and AI-driven security monitoring are becoming essential infrastructure rather than optional enhancements. Organizations that delay this transition will face an expanding window of exposure as offensive AI capabilities outpace their defensive upgrades.
The third priority is to engage with AI security programs that provide structured access to defensive AI tools. Project Glasswing, AISI's evaluation frameworks, and industry-specific AI security initiatives offer templates for how organizations can integrate AI into their security posture without creating additional risk. The window for establishing these capabilities is finite, because offensive AI is already approaching commodity pricing. The $1.73 cyberattack is not a warning about the future. It is a description of the present.
Conclusion
The AISI evaluation of GPT-5.5 established a data point that security leaders cannot ignore: a task that cost 12 hours of human expertise now costs $1.73 and 10 minutes of AI compute. The cost asymmetry between offense and defense has collapsed from rough parity to a factor of thousands. Both GPT-5.5 and Claude Mythos have crossed the "Last Ones" threshold, proving this is an industry-wide capability shift. For CISOs, the message is that the threat model has fundamentally changed. The organizations that adapt their security posture to account for AI-driven cost compression will have a structural advantage over those that continue operating under pre-AI assumptions about attacker economics.