I dislike the title because it doesn't clearly state it's a layoff. "Building for the future" gave me the impression that it's about some major new initiative, with a roadmap of upcoming plans.
Yes. We've since changed the top link to a third-party article. We prefer to do this with corporate press releases - this is probably the #1 exception to HN's "please post the original source" rule (https://news.ycombinator.com/newsguidelines.html). If anyone sees a better third-party article, we can change it again.
(Edit: it's not really an exception because the purpose of a corporate press release is usually to obscure the main story, which means it's misleading, so by HN rules we should change it.)
(Edit 2: I feel like I should add that this isn't specific to Cloudflare! It's literally a generic problem.)
It's interesting how every time there's a layoff, the blog post always has a title like "Preparing for what's next" or "An update on our workforce" or "Getting ready for the agentic era"!
I’ll never forget how when I was at Google, every email with subject line “An update on X” meant X was getting axed. Like, just say so in the subject line…
> "Building for the future" gave me the impression that it's about some major new initiative...
If you believe them, it indeed is:
... [the Leadership at Cloudflare] have to be intentional in how we architect our company for the agentic AI era ... reimagining every internal process, team, and role across the company.
... [This layoff is] not a cost-cutting exercise ... [but] Cloudflare defining how a world-class, high-growth company operates.
... We don't want to [mass layoff] again for the foreseeable future.
... [Cloudflare] cannot rest on the workflows and organizational structures that worked yesterday. We're confident that [Cloudflare] will be even faster and more innovative [after layoffs] ...
They're architecting their company for an agentic future? They're reimagining the definition of a world-class, high-growth company? They're not resting on the workflows that worked yesterday?
blegh
What the hell does any of that actually mean? Like in real life words? Because that much corporate bullshit really sounds like it is a cost-cutting exercise.
The blog post was published a couple months ago, and it looks like there hasn't been a follow-up release with the fully trained model. I'm not sure if there's much to take away from an early checkpoint besides the unique architectural choices they made in their model for faster inference.
Some smaller models from the LFM2.5 family were published on Huggingface at the end of March, a month ago.
Presumably this larger model takes more time to post-train, but it should follow those smaller LFM2.5 models in the near future.
I tried the four pieces of text with Opus 4.7 (in incognito) and it guessed correctly on two of them. I made sure to specify no web search, and the model seems to have obeyed that instruction.
Although this is just a single piece of text from a prolific writer, deanonymization will go much further when multiple pieces of text are combined with other contextual information about the writer that might give away their age range, location, and occupation.
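To make the mechanism concrete, here's a minimal sketch of the classical stylometric baseline (illustrative only; this is not what Opus does internally, and all the sample data is made up): character n-gram TF-IDF plus nearest-neighbor matching against known authors' writing.

    # Classical stylometric attribution sketch (requires scikit-learn).
    # Toy data throughout; a real attack needs many samples per author.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    known_texts = [
        "Honestly, I think the argument falls apart on closer reading...",
        "Per my earlier analysis; the data simply does not support it...",
    ]
    authors = ["author_A", "author_B"]
    anonymous_text = "Honestly, I think this critique falls apart too..."

    # Character 3-5-grams capture punctuation habits and spelling quirks,
    # which survive topic changes better than word-level features.
    vec = TfidfVectorizer(analyzer="char", ngram_range=(3, 5))
    X = vec.fit_transform(known_texts + [anonymous_text])

    # Compare the anonymous text against every known sample.
    sims = cosine_similarity(X[-1], X[:-1]).ravel()
    best = sims.argmax()
    print(f"closest match: {authors[best]} (cosine {sims[best]:.2f})")

Even this crude approach works surprisingly well given enough samples per author, which is why an LLM with stylistic priors over prolific writers baked into its weights can plausibly do better still.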
How widely known were the pieces of text? Are we talking about a section of MLK's I Have a Dream speech or handwritten birthday cards from your grandma?
I'm using those as the two extremes, but if it's anything by anyone moderately well known (even a lesser known piece of writing), I'm not too surprised that it didn't need the web to figure it out. It's like if you showed me a Wes Anderson film or played me a Bob Dylan song I'd never seen/heard before, I could probably still figure out who it is without looking anything up. I don't think it's surprising that an LLM can do that much better than a human can.
Now, if you're giving it things like personal emails between you and your family and it's able to guess who you are, that's much, much scarier.
As long as there's otherwise a sufficient online presence, I see no reason why a successful identification wouldn't be made. Unless significant effort is put into making those emails different from the online content, and even then there will probably still be some "tells" that an AI can pick up on.
I tried sending Opus the pieces of text that Kelsey was referring to on her blog, just to independently check the identification claim. Presumably those pieces of text first appeared on the web when the blog post was published a week ago, so no model should have memorized the exact text yet. My prompt had to specify no web search, otherwise Opus would try to search the web, though it didn't seem like Opus could find the blog post even when it did try.
If I'm understanding this correctly, it doesn't simulate arbitrary quantum circuits with 1000 qubits, only ones with enough structure that there's a more efficient strategy than storing the full exponential state, i.e. where exact simulation is feasible.
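For a sense of scale (my back-of-the-envelope numbers, not from the article): a dense statevector needs 16 bytes per amplitude, so memory doubles with every qubit, which is why brute-force simulation tops out far below 1000 qubits and exploiting circuit structure is the only way in.

    # Memory needed for a dense statevector simulation:
    # 2**n amplitudes, each a complex128 (16 bytes).
    def statevector_bytes(n_qubits: int) -> int:
        return (2 ** n_qubits) * 16

    print(statevector_bytes(30) / 2**30, "GiB")  # 16 GiB: a big workstation
    print(statevector_bytes(50) / 2**50, "PiB")  # 16 PiB: beyond any machine
    # 1000 qubits: ~2**1004 bytes, astronomically more than any hardware,
    # hence the need to exploit circuit structure rather than brute force.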
This is the right partnership. SpaceX has all the compute but is missing the talent for training LLMs, especially on the RL side. Cursor has the talent and the RL stack, but doesn't have its own pretrained base model or its own compute. Both would be on a bad trajectory without cooperating, because Claude Code and Codex have already gained so much momentum.
This seems like a wasted effort, since AI will primarily learn the majority consensus view, not one-off misinformation. Training pushes models toward patterns that generalize, so garbage data doesn't so much teach the wrong patterns as slow down learning of the real ones. And with most training compute now spent on curated data and RL rather than random web-scraped data, the impact is likely negligible.
> This seems like a wasted effort when AI will primarily learn the majority consensus view and not one-off misinformation.
We have evidence to the contrary. Two blog articles and two preprints of fake academic articles [0] were able to convince CoPilot, Gemini, ChatGPT and Perplexity AI of the existence of a fake disease, against all majority consensus. And even though the falsity of this information was made public by the author of the experiment and the results of their actions were widely published, it took a while before the models started to get wind of it and stopped treating the fake disease as real. Imagine what you can do if you publish false information and have absolutely no reason to later reveal that you did so in the first place.
> Two blog articles and two preprints of fake academic articles [0] were able to convince CoPilot, Gemini, ChatGPT and Perplexity AI of the existence of a fake disease, against all majority consensus
Wrong. There is no 'majority consensus' against 'bixonimania', because they made it up; that was the point. It's unsurprisingly easy to get LLMs to repeat the only source on a term never seen before. This usually works; made-up neologisms are the fruit fly of data poisoning because they are so easy to do and so unambiguous about where the information came from. (And retrieval-based poisoning is the very easiest and laziest and most meaningless kind of poisoning, tantamount to just copying the poison into the prompt and asking a question about it.) But the problem with them is that, also by definition, it is hard for them to matter; why would anyone be searching or asking about a made-up neologism? And if it gets any criticism, the LLMs will pick that up, as your link discusses. (In contrast, the more sources are affected, the harder it is to assign blame; some papermills picked up 'bixonimania'? Well, they might've gotten it from the poisoned LLMs... or they might've gotten it from the same place that poisoned the LLMs' retrievals, Medium et al.)
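To see why retrieval poisoning is so trivially effective for a neologism, here's a toy sketch (nothing here is a real API; the 'web' is a two-entry dict): for a term that exists nowhere else, the attacker's page is, by construction, the entire context the model sees.

    # Toy RAG sketch: retrieval poisoning ~= pasting poison into the prompt.
    WEB = {
        "bixonimania": "Bixonimania: hyperpigmentation of the eyelids "
                       "caused by blue-light exposure ...",  # the poison
        "influenza": "Influenza is a common viral infection ...",  # real
    }

    def search(query: str) -> list[str]:
        # Crude keyword retrieval: every page mentioning a query term.
        return [text for term, text in WEB.items() if term in query.lower()]

    def build_prompt(question: str) -> str:
        context = "\n".join(search(question))
        return f"Sources:\n{context}\n\nQuestion: {question}\nAnswer:"

    # The sole matching source is the poison, so any faithful model
    # will repeat it:
    print(build_prompt("What are the symptoms of bixonimania?"))

With a made-up term, the attacker's page is 100% of the evidence base, which is why this 'works' so reliably and why it says so little about poisoning the model weights themselves.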
The LLMs didn't only talk about the disease when prompted by the neologism. They also brought it up when asked about the symptoms. From the article:
> OpenAI’s ChatGPT was telling users whether their symptoms amounted to bixonimania. Some of those responses were prompted by asking about bixonimania, and others were in response to questions about hyperpigmentation on the eyelids from blue-light exposure.
And yes, sure, in this example the scientific peer-review process may eventually have criticised and countered 'bixonimania' as a hoax had the researcher never revealed its falsity. Emphasis on 'may': few researchers have the time and energy to trawl through crap papermill articles and publish criticisms. Either way, that is a feature of the scientific process and is not a given for online information in general.
What happens when false information is divulged by other means that do not attempt to self-regulate? And how do we distinguish one-off falsities from the myriad of obscure true things that the public is expecting LLMs to 'know' even when there is comparatively little published information about them and therefore no consensus per se?
"hyperpigmentation on the eyelids from blue-light exposure" is a super specific query almost definitionally 'bixonimania' which probably brought up the 'bixonimania' poison at the time (the search hits for that query right now in Google are weak and poorly relevant so it would not be hard to outrank them or at least get into the top 50 or so where a retrieval LLM would see them and would followup), and so still an instance of what I mean.
> Either way, that is a feature of the scientific process and is not a given to any online information.
Which does not distinguish it in any way from ordinary human falsehoods, like those from a crank or activist etc.
And I don't know, how did we handle false information before, on niche topics no one cared about? It's just noise. The worldwide corpus has always been full of extremely incorrect, mislabeled, corrupted, and distorted information on niche topics of no importance. But it's generally not important.
All the examples you gave are chatbots with web search integrated. Are you sure those chatbots didn't just reference false information they found in web searches? That's fundamentally different from poisoning the training of AI models.
> The problem was that the experiment worked too well. Within weeks of her uploading information about the condition, attributed to a fictional author, major artificial-intelligence systems began repeating the invented condition as if it were real.
This seems to imply the poisoning affected the web search results, not the actual model itself, because it takes months for data to make it into a trained base model.
We already learned how to defeat this from SEO spammers and citation farmers: by building networks that cross-reference and corroborate one another's fake stories.
We’re already at a point where much of the academic research you find in online databases can’t be trusted without vetting through real world trustworthy institutions and experts in relevant fields. How is an LLM supposed to do this kind of vetting without the help of human curators?
If all the LLM training teams have to stop indiscriminate crawling and fall back to human curation and data labeling then the poisoners will have won.
I think of journalism like any other job where there's an expectation to produce results, and the main objective here is to write an article that lots of people read. This is a topic that catches a lot of people's attention, so in a sense they've succeeded by getting a lot of people to read and talk about it.
It's like saying a surgeon's job is like any other job, and that the number of people operated on in the minimum amount of time is all that matters to optimize. But even in the most cynical Machiavelli™ hospital, reputation and actual operational results have to be taken into account if the institution wants to continue to be frequented.