Running LLMs locally is one way to appreciate the scale of hardware and infrastructure that frontier AI companies operate. It makes me wonder about their future strategies.
As one commenter mentioned, two Mac Studio M3 Ultra machines with 512GB each can run frontier models for about $30k (linked with RDMA). Apply an efficiency ratio for being in a datacenter, and you understand why OpenAI and the like spend north of $10k of CAPEX _per customer_.
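The "$10k per customer" figure is easy to sanity-check with a toy calculation. Every number below is an assumption for illustration, not a figure from any provider:

```python
# Back-of-envelope CAPEX per customer. All figures are assumptions.
hardware_cost = 30_000   # USD for a 2x Mac Studio-class inference node
customers_per_node = 3   # heavy users one such node can realistically serve

capex_per_customer = hardware_cost / customers_per_node
print(f"${capex_per_customer:,.0f} CAPEX per customer")  # $10,000 here
```

Change the customers-per-node assumption and the figure moves proportionally; the point is only that plausible assumptions land in the five-figure range.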
Add to that the electricity costs and you've got a very shaky business model. I, for one, would like to thank the VCs for subsidizing my tokens.
That said, the VCs are not crazy and have probably factored in an annual decline in the cost of compute. But how do you make sure users won't just run local LLMs once the hardware becomes affordable, if it ever does?
The answer has always been the same in our industry: vendor lock-in. They are acquiring users at a loss now, hoping for captive revenue later.
So be careful when maintaining your code requires the full context that produced it, and that context lives inside [Claude Code|Codex|Cursor].
Containerizing every app is what iOS / iPadOS already do.
It is regularly pointed out as a drawback by Android users (e.g. "I can't run that doomscrolling blocker in iOS"), but from a security-model perspective it was visionary back in 2008.
If there is any lesson from the AI craze, it is that there is no going back on the pace of vulnerability discovery.
Sure, we are in an acceleration phase, and a wave of patches will follow before things settle. But where we used to find x zero-days per million LoC, we will now find 10x ZD/MLoC. [Hopefully detection will become part of CI, so that number may vary.]
So, we will have more disasters waiting to happen. Assume that they will happen.
My #1 recommendation is to curate a list of the auth tokens you use (keep the list in a central place, not the actual tokens...), and be ready to rotate them as automatically as possible. You already have backups; know how to rotate all your credentials too.
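A minimal sketch of such an inventory, assuming hypothetical per-credential rotation scripts (the inventory holds descriptors only; the secrets themselves never appear in it):

```python
# Credential inventory: names and rotation commands are hypothetical
# placeholders. Store descriptors here, never the secrets themselves.
import subprocess

CREDENTIALS = [
    {"name": "github-pat", "rotate_cmd": ["./rotate_github_pat.sh"]},
    {"name": "aws-access", "rotate_cmd": ["./rotate_aws_keys.sh"]},
    {"name": "smtp-token", "rotate_cmd": ["./rotate_smtp_token.sh"]},
]

def rotate_all(dry_run=True):
    """Walk the inventory; with dry_run=True, only report the plan."""
    rotated = []
    for cred in CREDENTIALS:
        if not dry_run:
            subprocess.run(cred["rotate_cmd"], check=True)
        rotated.append(cred["name"])
    return rotated

print(rotate_all())  # dry run: lists what would be rotated
```

The dry-run default matters: you want to rehearse the full rotation regularly, the same way you rehearse restoring backups, before the day you need it under pressure.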
Funny story, I actually used a very AI-generated Astro template made by someone else, and it's gotten me annoyed enough that I'm now making my own from scratch. So perhaps there will be a writeup about that.
Actually, I thought the same about macOS for certain things: window management was way better on Windows (snapping, etc.). Luckily, after years, macOS finally has something decent for organizing windows. I know there were 3rd-party apps like Rectangle, but having something as simple as window organization missing from a desktop OS always felt weird.
On the other hand, for the last couple of releases, I have had lots of trouble with highlight annotations in Preview, especially in PDFs with tables. So much so that I have to resort to 3rd-party software for that (PDF Expert in my case).
But yeah, PDF support is basically native in macOS since Mac OS X.
This is a weird comment because I feel the same about getting macOS to a useable place.
I probably have 5 or 6 things installed on my Mac, like Scroll Reverser and Rectangle, just trying to beat the window manager into something that resembles usable.
Nuclear is already at a much higher safety standard than 99.99%!
About costs: it is actually cheap. 95% of the average total cost of a MWh is in building the plant. Comparisons sometimes show the cost of a MWh from wind or solar, but that is a fallacy because they assume supporting infrastructure on the side to ensure 24x7 power generation (i.e., they quote a marginal cost instead of the average total cost).
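The average-total-cost vs marginal-cost distinction can be made concrete with a toy amortization. All figures below are round assumptions, not real plant data, and financing costs (which dominate real nuclear economics and push the capex share up further) are omitted:

```python
# Toy decomposition of the cost of a nuclear MWh. Assumed round numbers.
capex = 10_000_000_000   # USD to build the plant
lifetime_years = 60
capacity_mw = 1_000
capacity_factor = 0.9    # fraction of hours actually generating
opex_per_mwh = 10        # fuel + operations, i.e. the marginal cost

mwh_lifetime = capacity_mw * capacity_factor * 24 * 365 * lifetime_years
avg_total_cost = capex / mwh_lifetime + opex_per_mwh
capex_share = (avg_total_cost - opex_per_mwh) / avg_total_cost
print(f"avg total cost: ${avg_total_cost:.2f}/MWh, "
      f"capex share: {capex_share:.0%}")
```

Even with financing ignored, capex dominates; quoting only the `opex_per_mwh` line while someone else pays for firming is the apples-to-oranges comparison the comment objects to.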
Wind / solar + (largely non-existent) batteries are cheap!
Until you factor in the gas peaker plants that must be built watt-for-watt, unless you're okay with poor people freezing in the dark or melting in the heat. Because rich people can afford their own backup generators or on-site batteries.