You haven't needed to in quite a while. Well, assuming you're on desktop? I guess I've never tried it on mobile. That feels like asking for trouble.
I ran Zoom in my Firefox desktop browser for a while, but it tended to overheat my laptop. Other things overheat it too, so I don't know how much was specific to Zoom on Firefox.
I just checked. It still gives me the option ("Join from browser", as a less-prominent link; trying to drive you to their native client, I guess).
> It's still a lot of work to get fixes upstreamed from outside
I'm going to disagree in the specific case of Firefox. First, although it has diverged a long way from its roots, Mozilla still has the community project ideal in its DNA. Enough, at least, that I stumbled while reading the clause "from outside" -- if you're finding and reporting actual relevant security bugs, you're already on the inside. SpiderMonkey in particular still has a good amount of code being written and even maintained by non-employees. (Examples: Temporal and LoongArch64 JIT support).
Second, the bug bounty program still exists[0] and is being used. If someone were sitting on a pile of AI-discovered exploits, that pile has monetary value which rapidly drains away the longer the exploits go unreported.[1] That's incentive to put in the work to report them properly.
Third, I agree that finding bugs is likely not the bottleneck. Validating them is. With previous models, the false positive rate was too high, so the reports required too much work to whittle down to the valid ones. A PoC is a very strong signal that a bug is valid, and that's where I just don't believe you: without a really good harness, I don't think Opus was good enough to find very many bugs with PoCs. It could find some, just not very many.[2]
[0] For now. It remains to be seen how it will adapt to the AI age. For the moment, it hasn't been severely nerfed like Google's.
[1] One could make the argument that people who are inexpert enough to only be able to poke an AI to find bugs are also the people more likely to sell them on the black market rather than disclosing them. It seems plausible. Still, some people would still be disclosing, and not many were filing quality bugs pre-Mythos. Some were, but it was a trickle compared to post-Mythos.
[2] Also note that I personally, as a SpiderMonkey developer, don't find a huge amount of value in the AI-generated patches that accompany these bug reports. Sometimes they're useful to better illustrate the problem, especially since the AI's problem analysis is usually subtly wrong in important ways. They can be a decent starting point for a real patch. But I'll still need to go through my own process of figuring out what the right fix is, even in the handful of cases where I end up with the same thing the AI did.
Hi! First of all, thanks for your incredibly thoughtful and enlightening answers, and most of all for helping keep Firefox alive.
You said:
> Still, some people would still be disclosing, and not many were filing quality bugs pre-Mythos. Some were, but it was a trickle compared to post-Mythos.
How much of this could be just due to focus? i.e. prior to the partnership with Anthropic to test Mythos Preview, has there ever been a similarly focused project, specifically trying to find security bugs in Firefox?
More, many more. See the bug reports linked in the post. I checked a few; all of them had an exploit attached, and there are definitely more than one bug listed.
Well, it depends on how you define "exploit": some might only read from arbitrary pointers, or read out of bounds. Those would be useful primitives in a chain of vulnerabilities, not exploits in themselves. You'll have to read through the first comments yourself, but if you're hoping that this is all nonsense and ignorable hype, you're going to be disappointed.
The best general purpose one, anyway. Specialty tools can be much better for their niches. Heck, compiler warnings are one such niche tool, and some of them are quite good.
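A concrete example, since "quite good" is vague: clang's -Wreturn-stack-address (on by default) flags returning a reference to a local, catching a whole class of dangling-reference bug at compile time:

    // Hypothetical snippet; clang reports "reference to stack memory
    // associated with local variable 'local' returned" here.
    const int& broken() {
      int local = 42;
      return local;  // dangling reference, caught before it ever runs
    }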
The raw number of things found by Claude (Opus or Mythos) was much higher and would be more comparable to a new clang warning. I vaguely remember seeing a number early on in this process that was in the mid-thousands. The 271 is a small, validated subset of that. None of the 271 were deemed false positives iiuc. Most instances of a new clang warning will be false positives. (Same as most of the raw problems reported by the AI.)
It is still unclear, and open to speculation, what percentage of all security bugs in Firefox today are being found by the AIs (as opposed to not being found at all). It might be that AI is very good at certain types of problems, even if we can't put our finger on what those types are, and that after the initial wave of bug reports the AI findings will slow to a trickle even while many, many other bugs remain in the codebase. Or it might be that AI really does detect most instances of some class of problems, and all those bugs will now be gone forever, never to return as long as Mozilla keeps paying the token monster. This is closely related to the oft-asked question: "are we better or worse off after both attackers and defenders have access to this new capability?"
The report I saw kind of seemed to be pointing at a flaw and asking "do you see it?" which is not the same thing. I felt a pretty large difference between Opus 4.6's results and Mythos's, so I would be surprised if even weaker models did anywhere near as well. I'd like to see these results, if they are using a decent methodology.
Of course, even the reports with flawed methodology could be suggesting that a great harness + weak model might achieve a similar level of results as a mediocre harness + strong model. But I'd want to see solid evidence for that.
I've been thinking about this. Static analysis tools can also be much faster and most are fully deterministic, so including them in CI can catch bugs or latent bugs before they have a chance to land.
I maintain a static analysis tool used in Firefox's CI. False positives have to be fixed or annotated as non-problems before you can land a patch in our tree. That means the tool must report zero positives (false or true) on landing code, which is a strict threshold. This is a conscious tradeoff; it requires weakening the analysis and accepting some false negatives (missed bugs) in order to keep the signal-to-noise ratio high enough that people don't just ignore it and annotate everything away, or stop running it. Nearly all static analysis tools have to do this balancing act.
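To make "annotated as non-problems" concrete, here's a hedged sketch of that workflow; every name below is a stand-in, not a real SpiderMonkey API (the real tree uses RAII guards in roughly this spirit):

    struct Object { int field; };
    struct AutoSuppressAnalysis {};   // RAII guard the analysis recognizes
    void callThatLooksLikeItCanGC();  // in reality: cannot GC on this path
    void use(Object*);

    void knownSafePath(Object* obj) {
      // We can prove the call below never GCs here, but the analysis
      // can't, so we annotate this one spot instead of weakening the
      // whole checker and inviting false negatives everywhere.
      AutoSuppressAnalysis suppress;
      callThatLooksLikeItCanGC();
      use(obj);  // without the guard, this line would be flagged
    }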
AI, as commonly used, is given more leeway. It's kind of fundamental that it must be allowed to hallucinate false positives; that's the source of much of its power. Which means you need layers of verification and validation on top of it. It can be slow, you'll never be able to say "it catches 100% of the errors of this particular form: ...", and yet it catches so much stuff.
Data point: my analysis didn't cover one case that I erroneously thought was unlikely to produce true positives (real bugs), and that was more complex to implement than seemed worth the trouble. Opus or Mythos, I'm not sure which, started reporting vulnerabilities stemming from that case, so I scrambled and extended the analysis to cover the gap. It took me long enough to implement that, by the time I had a full scan of the source tree, Claude had already found every important problem the scan reported. The static analysis found several others, and I still honestly don't know whether any of them could ever be triggered in practice.
I still think there's value in the static analysis. Some of those occurrences of the problematic pattern might be reachable now through paths too tricky for the AI to construct. Some of them might turn into real problems when other code changes. It seems worth having fixes for all of them now for both possibilities, and also for the lesser reason of not wanting the AI to waste time trying to exploit them. At the same time, clearly the cost/benefit balance has shifted.
They could also team up: if I relax my standards and allow my analysis to write an additional warnings report of suspected problems, with the clear expectation that they might be false alarms, then I could feed that list to an AI to validate them. Essentially, feed slop to the slop machine and have it nondeterministically filter out the diamonds in the rough.
It's for detecting a specific situation: you grab a pointer to a GC-managed object, call something that might possibly trigger a GC even if it probably won't, and then use the pointer. (The GC might collect that object, or it might move it somewhere else.)
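In code, the hazard looks roughly like this. This is a minimal, self-contained sketch with a fake one-object heap standing in for the GC; none of these names are real SpiderMonkey APIs:

    #include <cstdio>
    #include <cstdlib>
    #include <cstring>

    struct Obj { int value; };
    static Obj* heap = nullptr;  // pretend single-object GC heap

    void maybeGC() {
      // Simulate a *moving* collection: relocate the object and free the
      // old slot, leaving any outstanding raw pointer dangling.
      Obj* moved = static_cast<Obj*>(malloc(sizeof(Obj)));
      memcpy(moved, heap, sizeof(Obj));
      free(heap);
      heap = moved;
    }

    int main() {
      heap = static_cast<Obj*>(malloc(sizeof(Obj)));
      heap->value = 42;
      Obj* obj = heap;             // (1) grab a raw pointer to a GC thing
      maybeGC();                   // (2) call something that might GC
      printf("%d\n", obj->value);  // (3) use the pointer: use-after-free
      return 0;
    }

The fix in a real engine is to hold the object in a rooted handle across the can-GC call, so the collector can trace it and update the pointer if the object moves.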
> But report [1] says that "Some of these bugs showed evidence of memory corruption...", which implies that majority of these (which includes 271 bugs from Mythos) don't have evidence at all. Do I not understand something?
I'm guessing a bit, but for example: out of bounds reads are not memory corruption. Assertion failures in debug builds are also usually not memory corruption, and I'd guess that many of these bugs were found through assertions. (Some parts of Firefox like the SpiderMonkey JS engine make heavy use of assertions, and that's the biggest signal used for defect validation. An assertion firing is almost always treated as a real and serious problem. Though with our harness, Opus and Mythos try to come up with an exploit PoC anyway.)
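To illustrate both halves of that (hypothetical code; in the actual tree this would be MOZ_ASSERT rather than <cassert>):

    #include <cassert>
    #include <cstddef>
    #include <cstdio>

    int get(const int* elems, size_t length, size_t index) {
      assert(index < length);  // stand-in for MOZ_ASSERT(index < length)
      return elems[index];
    }

    int main() {
      int arr[4] = {1, 2, 3, 4};
      // Debug build: the assertion fires -- a loud, easily-harvested
      // signal for a fuzzer or AI harness. Release build: an out-of-
      // bounds *read* that discloses adjacent memory but writes nothing,
      // so it is not memory corruption by itself.
      printf("%d\n", get(arr, 4, 4));
      return 0;
    }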
At Mozilla, everything is called a bug. It's what other systems call an "issue". So it's too late for your terminology at Mozilla. (Example: I have a bug to improve the HTML output of my static analysis tool. There is nothing incorrect or flawed about the current output.)
At Mozilla, but not everywhere: exploits are a subset of vulnerabilities are a subset of bugs.
You keep saying this, when it is counter to the actual text of the article.
> Yet aside from the raw corruption, these incidents also raise a larger question. ... What broader damage does this kind of unchecked insider trading do?
That is what the article is about. It's saying that this is corruption, and discussing what the effect of this corruption is. "Beyond just corruption" is wholly your invention. I don't even know what it would mean?
Corruption is the cause. The article discusses the resulting effect.