Hacker News | galkk's comments

Given the rapidly declining quality of, at least, Claude Code output, agentic coding use may decrease. It's insane how bad the results of background agents are now: constant hallucinations, nonsensical outputs.

The heavy users of Claude at my job (me included) disagree: our work gets shipped and the quality has increased by all metrics. Are you talking about enterprise or consumer Claude subscriptions? I think they're serving drastically different quality depending on how much $ you fork over.

I don't see much sense in using HN as a support thread, but here are quotes from a single Claude investigation session of mine, and this happens in every Claude Code session I have, especially with 4.7:

* The first agent's claim that was 3.x-only was wrong
* is nice-to-have but doesn't target our exact case as cleanly as the agent claimed.
* The agent's "direct fix for yyy" is overstated.
* not 57% as the earlier agent claimed

etc etc etc

And I've lost count of how many times my sessions with Claude start with: did you read my personal CLAUDE.md and use background agents for long-running operations?

I use an enterprise subscription with max effort; this was with both 4.6 and 4.7.

And please refrain from comments like "you're using it wrong", as the drop in output quality is very clear and noticeable.


Much like Windows threads where people report strange bugs without us knowing anything about their workloads and tools, it's impossible to say. We've got a team of 30 using it full time, and as a member of eng leadership I would be hearing about it if it was constantly missing expectations. It did take iterations to get here, as with everything.

Some of the usual suspects when people are getting bad results:

* Overbloated claude.md: it should not contain everything, it should be a table of contents pointing to other files (rough sketch at the end of this comment)
* Max effort - why? Overthinking on simpler tasks results in degraded quality, much like in humans.
* You speak of your single session, but with agents reviewing other agents' outputs. Without knowing your goal, your prompt, and what the agents had access to, my first inclination is that the initial request was vague, a bunch of unnecessary info was returned, and your review step caught that extra jank.

I'm not gonna bother making the joke you allude to, but every single employee I've worked with in person has had glaring holes in their setups which, once solved, dramatically reduced stuff like what you're talking about.
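
To illustrate the first point, a "table of contents" style claude.md might look something like this; the file names are made up, adapt them to your repo:

    # CLAUDE.md

    Keep this file short; it is loaded into every session.

    - Build, test, and lint commands: docs/agent/commands.md
    - Architecture overview and module map: docs/agent/architecture.md
    - Code style and review conventions: docs/agent/style.md
    - Long-running operations (use background agents): docs/agent/background.md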


After your comment I went to the original, and it really looks like an AI-assisted rewrite with a prompt like “give more explanations about basic concepts”…

I’m curious: I have a 10 Gb switch and 5 Gb fiber internet. Will such an adapter work on an Xbox Series X?


Don't you know?

World peace and harmony will be achieved when all the good guys gather together and kill all the bad guys.


It was good for you, but you don't address the reality of life.

Here’s one possible scenario: after graduation, you (or somebody else) share the program with a friend, with a promise not to share it further. Soon enough, it's on everybody's calculator. What was a real educational exercise for you is now just a cheat where one presses the right buttons and gets the right answer. This completely destroys the educational purpose, but a significant number of people just don't care and only want to get a pass.

Yes, teachers always have a counter-weapon: for example, pointing to a random line and asking the student to explain it, and so on, but this is not (always) scalable.

I’ve seen this in reality in college: there was a CS/database course final project implementation, written in Delphi (very popular at the time in the ex-USSR), that was passed down from year to year. The professors and TAs were so fed up with it that I got an almost automatic pass because I wrote mine in C++…

——

To summarize: the ever-increasing amount of pure slop is seen everywhere. Regular multi-thousand-line PRs written by AI where the author didn't even bother to look at the code. Just prompt -> commit, push, PR. Nobody wants to deal with that.

The same is happening here: it's not to punish people who use the tool in a proper context, it's to filter out people who just don't give a fuck.


Thank you!

Do you have anything else as useful as this? This is perfect.


Well I suppose I can't miss out on the opportunity to plug my open-source menu bar app for voice-to-text in any app! Going on three years of development, believe it or not.

https://github.com/corlinp/voibe


I ended up with Victor Mono, and indeed I used that font for years before switching to Iosevka.

The game certainly needs a progress bar (I tried it on an iPhone) and an option for “there's no chance in the world that I will ever use any of the proposed options”.

Funnily enough, I realized that every few years I oscillate between trying to find readable narrow fonts (which is what brought me to Iosevka) and wide ones (Azeret Mono, anyone?).


Victor Mono made it fairly late in my rounds, but the "@" character looked terrible as rendered in Firefox. It looks a bit better in my terminal, which points to a downside of this approach that others have observed.


I’m very conflicted about the message. The author takes a specific, rather simple OLTP case, like joining one table with, essentially, dictionary tables (which most database servers can keep in memory and join with hash joins), and ends up with a generic statement. Yes, in the case you're analyzing it's fine.

What I've always wanted is a "joins are expensive" guide for cases like this: here's a query in your relational database, there are multi-table joins, on top of them there are more complex filters (especially if there are subqueries and/or …), the statistics are stale-ish, cardinality estimation goes out the window, and the join ordering problem kills you. It's especially bad when the same query was working with no problem yesterday.

And this is the point where people usually start studying the query hints section for their server of choice (pg_hint_plan for Postgres).
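
For illustration, a minimal pg_hint_plan-style escape hatch looks roughly like this (table names, aliases and the chosen hints are made up; the right ones depend on the actual plan you're fighting):

    /*+ Leading((o c)) HashJoin(o c) */
    SELECT o.id, c.name
    FROM orders o
    JOIN customers c ON c.id = o.customer_id
    WHERE o.created_at > now() - interval '1 day';

The point isn't that hints are good practice, it's that once the estimates go off the rails, pinning the join order and method is often the quickest way to get yesterday's plan back.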

And, as usual, a quote from https://www.vldb.org/pvldb/vol9/p204-leis.pdf:

> … For all systems we routinely observe misestimates by a factor of 1000 or more. Furthermore, as witnessed by the increasing height of the box plots, the errors grow exponentially (note the logarithmic scale) as the number of joins increases [21]. For PostgreSQL 16% of the estimates for 1 join are wrong by a factor of 10 or more. This percentage increases to 32% with 2 joins, and to 52% with 3 joins.


Is there anybody in this story who ends up appearing as not a complete nutjob?


squidfunk, who just rewrote the entire thing as Zensical:

https://github.com/zensical/zensical


Yeah, the author fails to present his case even in the intro:

> A CRDT merge always succeeds by definition, so there are no conflicts in the traditional sense — the key insight is that changes should be flagged as conflicting when they touch each other, giving you informative conflict presentation on top of a system which never actually fails. This project works that out.

There's a clear contradiction. A CRDT merge always succeeds by definition, there are no conflicts in the traditional sense, so (rephrasing) conflicting changes are marked as conflicted. Emm, like in any other source control?

In fact, after rereading that intro while writing this reply, I'm starting to suspect at least a whiff of AI writing.


The benefit of using a CRDT for this is that you can get better merge semantics. Rebase and merge become the same thing. Commits can't somehow conflict with themselves. You can have the system handle 2 non-conflicting changes on the same line of code if you want. You can keep the system in a conflict state and add more changes if you want to. Or undo just a single commit from a long time ago. And you can put non-text data in a CRDT and have all the same merge and branching functionality.
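
As a concrete illustration (using the Automerge JS library here purely as an example; the project under discussion may well use its own CRDT), a merge never fails, and concurrent writes to the same field survive as flagged conflicts instead of blowing up:

    import * as Automerge from "@automerge/automerge"

    // Two peers start from a shared document
    let base = Automerge.change(Automerge.init<{ line: string }>(), d => {
      d.line = "const x = 1"
    })
    let alice = Automerge.clone(base)
    let bob = Automerge.clone(base)

    // Concurrent edits to the same field on each side
    alice = Automerge.change(alice, d => { d.line = "const x = 2" })
    bob = Automerge.change(bob, d => { d.line = "let x = 1" })

    // Merge always succeeds; one value wins deterministically...
    const merged = Automerge.merge(alice, bob)

    // ...but both concurrent values are still there, flagged for presentation
    console.log(Automerge.getConflicts(merged, "line"))

The same mechanism extends to text and richer data, which is where properties like "rebase and merge are the same thing" come from.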


The README of the repo offers a hint:

> The code in this project was written artisanally. This README was not.

