ZrArm's comments

ZrArm · 2026-05-11T14:20:39 1778509239

> Unless you have reasons to not believe the dozens of credentialed, well respected people in the field that have already shared their opinions after working with mythos.

Exactly the same argument was made about o3-preview, lol. But anyway, do they talk about all the domains where Mythos made a leap in capabilities (math and other research, ML, SWE), or only about cybersec?

> And then there's the team at mozilla. They wrote a blog about this, and they've worked with anthropic before, using opus 4.6 and found and fixed 22 vulnerabilities. Then they worked with mythos and found and fixed 271 vulnerabilities

Those 22 bugs were found in February, when Mozilla was running its first small-scale experiments with Opus 4.6 (i.e. no proper integration into the workflow, likely a relatively simple harness, likely only a small part of the codebase covered). You can't compare "22 bugs found during very early attempts to apply AI" with "271 bugs found during large-scale codebase scanning with properly configured AI". The fact that Mozilla is pretty vague about the "contribution of other AI models" makes it even worse.

> Unless you're going to accuse them of being shills, these are unquestionable numbers. The model is quantitatively better at this thing

They found another ~150 bugs after their first announcement, and only ~35 of those were found by Mythos. That's already a very sharp drop in its contribution.

> I think there are better things to accuse anthropic of, than that they are simply lying for marketing purposes.

Anthropic already used a lot of "technically correct but in fact deceptive" statements in the Mythos system card. They are playing both "It's too dangerous" and "We don't have enough compute for that super model" at the moment (that's usually a big red flag). Opus 4.7 (which was likely supposed to be "Opus 5.0", given various facts) is a disaster from various points of view. Of course people don't really believe Anthropic.

ZrArm · 2026-05-08T04:13:17 1778213597

> Mythos did in fact write PoCs for all bugs that crash with demonstration of memory-unsafe behavior (e.g. use-after-free, out-of-bounds reads/writes, etc).

But report [1] says that "Some of these bugs showed evidence of memory corruption...", which implies that the majority of these (which includes the 271 bugs from Mythos) don't have such evidence at all. Am I misunderstanding something?

> For us this is substantial enough evidence to consider it a security vulnerability at that point

Mythos is supposed to be pretty good at writing actual exploits, so (as I understand it) there shouldn't be any serious problem with checking whether a bug is a vulnerability or not.

[1] https://www.mozilla.org/en-US/security/advisories/mfsa2026-3...

mozdeco · 2026-05-08T12:36:47 1778243807

> But report [1] says that "Some of these bugs showed evidence of memory corruption...", which implies that the majority of these (which includes the 271 bugs from Mythos) don't have such evidence at all. Am I misunderstanding something?

This is just the standard sentence we've been using for years. It has nothing to do with Mythos; for Mythos, almost all bugs show evidence of memory corruption (we do have a handful of bugs in JS IPC / JS Actors; one is in the blog post).

> Mythos is supposed to be pretty good at writing actual exploits, so (as I understand it) there shouldn't be any serious problem with checking whether a bug is a vulnerability or not.

Yes, but if we have a choice between writing exploits and scanning more source, potentially finding more bugs, then of course we prioritize the latter.

sfink · 2026-05-08T05:56:00 1778219760

> But report [1] says that "Some of these bugs showed evidence of memory corruption...", which implies that the majority of these (which includes the 271 bugs from Mythos) don't have such evidence at all. Am I misunderstanding something?

I'm guessing a bit, but for example: out-of-bounds reads are not memory corruption. Assertion failures in debug builds are also usually not memory corruption, and I'd guess that many of these bugs were found through assertions. (Some parts of Firefox, like the SpiderMonkey JS engine, make heavy use of assertions, and that's the biggest signal used for defect validation. An assertion firing is almost always treated as a real and serious problem. Though with our harness, Opus and Mythos try to come up with an exploit PoC anyway.)

ZrArm · 2026-05-08T11:27:32 1778239652

That makes sense, thanks, though the wording is still somewhat confusing.