M, the model

M, a famous film by Fritz Lang depicting the contagious hysteria induced by the hunt for a child killer, was released in 1931.

A week ago, Anthropic announced Mythos Preview, a new LLM allegedly too dangerous to be released to everyone.

The headline claim is that it found many legacy bugs, such as a 27-year-old bug in OpenBSD.

It did not take long for clickbait posts and articles to spread across the internet:

  • Any system can be taken down with Mythos
  • It took only $50 to hack OpenBSD
  • Millions of developers reviewed and missed critical vulnerabilities

Let's take a first step back. OpenBSD is an operating system created in 1995; it belongs to the BSD family and is written mostly in C. It puts a strong emphasis on security: with only two remotely exploitable vulnerabilities in its default install, we can assume the team is quite successful, so much so that finding a vulnerability, even a minor one, makes headlines.

OpenBSD also has a small community: one estimate puts its share at 0.04%, and the GitHub contributor count is about 100. The reference repository is in CVS, though, so many members are not listed.

Looking at the patch, we can analyse the git blame of the file: most of the code comes from NetBSD and is 31 years old, and the rest was literally written by a handful of people.

If I had to bet, I would say fewer than 50 people (20 if I want to be playful) have ever had a deep look at it.

Basically, Mythos found a segfault in a 4k LoC implementation of a complex protocol written in an unsafe programming language. Admittedly, most of the code out there fits this description.

Regarding the process:

"In order to increase the diversity of bugs we find [..] we ask each agent to focus on a different file in the project."

"This was the most critical vulnerability we discovered in OpenBSD with Mythos Preview after a thousand runs [..] the total cost was under $20,000 and found several dozen more findings."

Basically, they tried each file independently until they found an interesting bug.

The $50 figure comes from the rest of the paragraph:

"While the specific run that found the bug above cost under $50, that number only makes sense with full hindsight. Like any search process, we can't know in advance which run will succeed."
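To make the gap between the two figures concrete, here is a back-of-the-envelope sketch in Python. It uses only the numbers quoted above, rounded to their upper bounds, and the function names are mine. The one-success-in-a-thousand rate in the last step is an assumption for illustration only, not something the announcement states.

```python
# Back-of-the-envelope arithmetic on the quoted figures.
# All constants are round upper bounds taken from the quotes, not exact values.
TOTAL_COST_USD = 20_000      # "the total cost was under $20,000"
TOTAL_RUNS = 1_000           # "after a thousand runs"
WINNING_RUN_COST_USD = 50    # "cost under $50"


def average_cost_per_run(total_cost: float, runs: int) -> float:
    """Average spend per run across the whole search."""
    return total_cost / runs


def hindsight_factor(total_cost: float, winning_run_cost: float) -> float:
    """How many times cheaper the winning run looks than the full search."""
    return total_cost / winning_run_cost


def expected_search_cost(cost_per_run: float, success_probability: float) -> float:
    """Expected spend until the first success, assuming independent runs
    that each succeed with the same probability (geometric distribution)."""
    if not 0 < success_probability <= 1:
        raise ValueError("success_probability must be in (0, 1]")
    return cost_per_run / success_probability


if __name__ == "__main__":
    per_run = average_cost_per_run(TOTAL_COST_USD, TOTAL_RUNS)
    print("average cost per run (USD):", per_run)
    print("hindsight factor:", hindsight_factor(TOTAL_COST_USD, WINNING_RUN_COST_USD))
    # Hypothetical rate: one critical finding per thousand runs.
    print("expected search cost (USD):", expected_search_cost(per_run, 1 / TOTAL_RUNS))
```

Under these assumptions, the price of the winning run says very little about the price of the search: you pay for every run that finds nothing, because you cannot tell the runs apart beforehand.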

And this is the important point of the whole event.

Not the marketing stunt itself, the promise that, with enough tokens, we can find and fix all bugs.

The point is that, for someone proficient in both software security and LLMs, the cost of discovering and exploiting bugs has dropped significantly, even with the models that are generally available today.

A few years ago, I took an intensive training course in software security. The pool of experts was small at the time, and the pool of people who have mastered LLMs is no bigger.

The overlap is even smaller, which creates a massive shortage and leads us toward an era of daily security breaches.