Hacker News | Tarcroi's comments

I find it improbable that a car could have gotten through in a marathon as big as London's, given how well organized it is. In any case, having run quite a few marathons, I can tell you that a marathon is anything but a distraction; it's a real challenge.

This coincides with Anthropic's peak-hour announcement (March 26th). Could the throttling be partly a response to infrastructure load that was itself inflated by the TTL regression?


It would be too fucking funny if this were the case. They're vibe coding their infrastructure and they vibe coded their response to the increased load.


You'd think they would have dashboards for all of this stuff, to easily notice any change in metrics and be able to track down which release was responsible for it.


They probably do; then they pipe it into a bunch of Claude subagents, and you get the current mess.


Different thing. SkillKit distributes skills to agents. Skrun runs them as APIs.


As I mentioned in another comment here, I've been working on an open-source alternative. Multi-model, 5 providers with fallback. Happy to share the repo if you're interested.


I agree on the commodity point; that's why I went multi-model from the start.

The registry question is the one I'm thinking about the most. Right now it's flat. I plan to integrate usage data (success rates, cost, trust scores) so that the registry tells you which skills actually work well, and that's where the value is.
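Roughly, a scored registry entry would look something like this (field names are illustrative, not a final schema):

    from dataclasses import dataclass

    @dataclass
    class SkillEntry:
        name: str
        runs: int = 0
        successes: int = 0
        total_cost_usd: float = 0.0
        trust_score: float = 0.0  # e.g. weighted from reviews and run history

        @property
        def success_rate(self) -> float:
            # Fraction of runs that completed successfully.
            return self.successes / self.runs if self.runs else 0.0

        @property
        def avg_cost_usd(self) -> float:
            # Average spend per run, for ranking skills by cost.
            return self.total_cost_usd / self.runs if self.runs else 0.0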

Your article looks interesting, I'll read it.


I've been building exactly this. It's open source and multi-model (5 providers with fallback). For now it runs locally, but the architecture is designed for self-hosted deployment.
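The fallback loop is nothing fancy, something like this (provider names are placeholders; only the count comes from the project):

    from typing import Callable

    PROVIDERS = ["anthropic", "openai", "google", "mistral", "local"]

    def run_with_fallback(call: Callable[[str, str], str], prompt: str) -> str:
        """Try each provider in order and return the first successful result."""
        last_error: Exception | None = None
        for provider in PROVIDERS:
            try:
                return call(provider, prompt)
            except Exception as exc:  # rate limit, outage, auth error...
                last_error = exc
        raise RuntimeError("all providers failed") from last_error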


You're right. For now, it's only local. For a public deployment, the idea is to have sandboxes and verification steps. That won't eliminate the risk of prompt injection entirely, but so far no solution has fully resolved that problem.


Thanks! Sandbox deployment is on the roadmap. I already have a RuntimeAdapter interface in my architecture that I'll use to isolate the VMs. I'm doing exactly the same thing: I'm cross-referencing the models to challenge their plans, and my code reviewer agent's API is a big help.
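The adapter boundary is roughly this shape (method names are illustrative; only the RuntimeAdapter name is from the actual code):

    import subprocess
    from abc import ABC, abstractmethod

    class RuntimeAdapter(ABC):
        """Abstracts where a skill runs: local process now, sandboxed VM later."""

        @abstractmethod
        def run(self, command: list[str], timeout: float) -> str:
            """Execute a command inside the runtime and return its stdout."""

    class LocalAdapter(RuntimeAdapter):
        """Current local mode; a VM-backed adapter would implement the same API."""

        def run(self, command: list[str], timeout: float) -> str:
            result = subprocess.run(command, capture_output=True, text=True,
                                    timeout=timeout, check=True)
            return result.stdout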


Can this be run a second time and compared against a previous audit?


Curious: are you thinking about this for continuous monitoring, or more for before/after comparison when the agent gets updated?


Both. In my opinion, an agent has a life cycle and needs observability.


It's true. Makes sense.


Thanks for asking. Not yet, but it's in the backlog; I'll be adding it in the future.


Hi, I'm the "colleague". Impatient to have your feedback!


Thanks for sharing a cool project! Just fyi, more idiomatic English would be "eager to have your feedback" since "impatient" implies frustration.


Ha, thanks for the correction! I'll remember that!

