It's a good question and I won't pretend to predict the future on this one. I will say, I think Airbyte Agents is in a good position because our core Data Replication product has always had to mitigate the impacts of rate limiting and cumbersome upstream APIs. The new Agents toolset gives you the ability to query the upstream APIs directly (read: as a passthrough) while also letting you bypass them entirely when your agent can answer its question via the Context Store directly. Time and feedback from our users will confirm, but I do think this gives our customers a good balance of control - when to query upstream directly and when to utilize the Context Store to work around API limitations - whether inherent or artificially enforced by the vendor.
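To make the "when to query upstream, when to use the Context Store" trade-off concrete, here is a minimal sketch of one possible routing policy. It is not Airbyte's API: `Router`, `RateLimited`, `query_store`, and `query_upstream` are hypothetical names invented for illustration, and the policy (cache first unless fresh data is required, fall back to the cache when throttled) is just one assumption about how an agent might choose.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


class RateLimited(Exception):
    """Raised by the (hypothetical) upstream client when the vendor throttles us."""


@dataclass
class Router:
    # Both callables are stand-ins; a real agent would wire in actual clients.
    query_store: Callable[[str], Optional[list]]  # returns None on a cache miss
    query_upstream: Callable[[str], list]         # may raise RateLimited
    log: list = field(default_factory=list)       # records which path served each call

    def fetch(self, question: str, need_fresh: bool = False) -> list:
        # Prefer the cached copy unless the caller explicitly needs live data.
        if not need_fresh:
            cached = self.query_store(question)
            if cached is not None:
                self.log.append("store")
                return cached
        try:
            rows = self.query_upstream(question)
            self.log.append("upstream")
            return rows
        except RateLimited:
            # Throttled: serve possibly stale data rather than fail, if we have any.
            cached = self.query_store(question)
            if cached is None:
                raise
            self.log.append("store-fallback")
            return cached
```

In this sketch the caller stays in control through `need_fresh`, and the rate-limit fallback only applies when a cached answer actually exists.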
Helpful feedback, thank you! And your instincts are spot on. As of now, we have API-based search, with filter predicates and field selection expressed in JSON. While we haven't published anything on the backend implementation, I can say it does use a cloud-native storage medium where the filters are indeed pushed down as SQL. We want to be careful about if/when we offer direct SQL access, specifically because SQL dialects can differ drastically and we wouldn't want to break consumers if/when we change which dialect(s) are supported.
That said, please stay tuned - and thank you again for this valuable feedback.
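For readers wondering what "JSON filter predicates pushed down as SQL" can look like in general, here is a minimal sketch. It does not reflect Airbyte's actual request schema or backend, neither of which is published in this thread: the predicate shape (`field`/`op`/`value`), the operator names, and `build_query` are all assumptions made for illustration.

```python
from typing import Any

# Hypothetical operator vocabulary; the real API's predicate names may differ.
_OPS = {"eq": "=", "ne": "<>", "gt": ">", "gte": ">=", "lt": "<", "lte": "<="}


def _quote_ident(name: str) -> str:
    # Allow only letters, digits, and underscores so identifiers cannot inject SQL.
    if not name.replace("_", "").isalnum():
        raise ValueError(f"unsafe identifier: {name!r}")
    return f'"{name}"'


def build_query(
    table: str, fields: list[str], filters: list[dict[str, Any]]
) -> tuple[str, list[Any]]:
    """Translate a JSON-style search request into parameterized SQL."""
    columns = ", ".join(_quote_ident(f) for f in fields) or "*"
    clauses, params = [], []
    for f in filters:
        op = _OPS[f["op"]]  # unknown operators raise KeyError
        clauses.append(f"{_quote_ident(f['field'])} {op} ?")
        params.append(f["value"])  # values travel as bind parameters, never inline
    sql = f"SELECT {columns} FROM {_quote_ident(table)}"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params
```

Keeping the public surface as JSON like this is also what makes a dialect change survivable: only the translation layer has to move, not the consumers.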
Great question, @Tsarp - Skills and tools work great together. What we've found is that agents generally need both to achieve great results. We're actually not trying to replace skills, but to give them new superpowers.
Are there any examples you've run into where skills were missing tools (or data) that they needed for a specific task?
Hmm, hoping this isn't a generic LLM generated response.
Skills have the scripts folder and you can precisely describe when and when not to use a script. This can end up directly wrapping API(s), CLIs, generic scripts or even other MCP servers.
CC and codex both have the skill creator and you can have them build the skill for you.
Haven't run into any scenarios where skills were missing tools. 1-2 iterations and it's usually taken care of quite quickly.
Hey, fair enough. (100% human here, btw.) I think I misread your original question as asking "why do we need a service (whether accessed via API/SDK/MCP/etc.) vs. just having skills (markdown + scripts)?"
If you are already wrapping scripts and APIs in your skills, then you understand the distinction. I'll attempt to re-answer your question, hopefully now with a better understanding:
I think Airbyte Agents helps your agent by giving access to data across any and all of the systems it may need to get data from, or write data to. While you could hit the service APIs directly (via REST/CLI/etc.), in practice we find that not all use cases are amenable to this. Airbyte Agents does have REST APIs as well as SDKs and of course the MCP interface - so it's not really about MCP tools specifically, more about how you can access the data. The Airbyte Agents interface also reduces the number of creds that the agent needs to handle, giving a single portal (with logging and audit capabilities) for all the actions your agent is taking.
Sorry for the red herring of skills-v-tools. Let me know if you have any additional questions!
The new Airbyte Agents offering brings a ton of new capabilities actually.
1. Programmatic Interfaces: Including a new REST API, SDK, and MCP Server.
2. New action verbs: Not just replication anymore. We have get/set/list/update/upload, and more!
3. New credentials passthrough: For all the above, you OAuth to Airbyte and we OAuth on your behalf to the systems your agent needs. No need to provide your agents dozens of different secrets in order to access the systems it needs.
4. Context Store: Like your agents' own data warehouse, but completely automatic and hands-free. For those use cases that just aren't possible when calling the REST API directly.
Hi, @jessewmc. Thanks for your reply. Regarding your points:
> If I'm reading correctly, the indexing (Context Store) is neutral/unopinionated? How does it select fields for indexing?
While we haven't yet published details on the backend implementation, I can say that our implementation performs very well without needing to prioritize specific fields for indexing. We aim for large text fields to perform decently and retrieval based on small/compressible fields like ints to be fast. (More to come on this in the coming months.)
> Have you done any testing on guided indexing, or metadata layers on top of the data?
We've been testing with different data scales and shapes. Nothing detailed to share yet, but performance has (so far) never itself become the bottleneck in our agent testing. (The LLM thinking itself is often the bottleneck.)
> My experience so far on similar work is that getting data in front of an agent isn't enough context to get useful/reliable answers enough of the time.
Airbyte has rich metadata on our upstream connectors' data models, which I think helps us a lot to deliver helpful context to the agent. Another option, when optimizing for specific use cases, is to build your own agent tools on top of our Agent SDK. This allows you to make the calls organic and build the tools in a way that makes natural sense to the agent, regardless of source shape or which system(s) that data is coming from.
> This does look like a good foundation for that kind of tooling though!
We agree! Thanks again for sharing your thoughts here.
Great launch btw! I have some questions if you don't mind
You mentioned that performance was never an issue; I am really intrigued by how this is achieved.
I have three general questions:
1. How big (estimate in bytes) and complex were the test datasources? I couldn't find this in the benchmark repo.
2. How is the business context managed? The blog post "Airbyte Agents: A New Era for Airbyte" mentions handling business context, but the context layer docs only talk about schema discovery (I got a bit confused).
3. When you said performance was never an issue, do you mean the user always got the answer it was looking for?
> airbyte agents could serve as a form of MCP gateway
Exactly! And a single set of tools for agents to access both realtime (direct reads/writes) as well as cached (Context Store), bringing hopefully the best access path for each different use case.
> would love a "data engineering for ai engineers" type braindump ... at AIE
Great idea - we have a booth at AIE, and we'll submit there for a talk. Mario will reach out to you about this. :)
This is where Airbyte really can shine, I think, and the total can be more than the sum of its parts. Because Airbyte excels at data replication already, we can populate your Agent Context Store without users or agents ever needing to think about the words "ELT" or "ETL".
We're listening carefully to feedback so we hope you will give it a try and let us know how it goes! Thanks!
yeah this is one of the few AI-related products that I have seen that make sense to me
but i also wonder to what extent this needs to be its own thing or if this is just something that it looks like we need but really people just need to shovel more stuff into their data warehouse / data lake that you never had reason to before, because now that's all fodder for agentic search
Great point. Many of Airbyte's customers are doing just that - adding new sources to their warehouses - like Google Drive, Gong, and a ton of sources that weren't as interesting previously for data analytics. But this creates a ton of work for the data engineering teams - to not only load all that extra data, but to deal with rate limits and then to conform the schemas into a usable format after loading.
For now, I think it's 100% appropriate to think of the Context Store as complementing the Warehouse and not replacing it per se. We're evaluating future integration options between the new Context Store and the traditional data warehouse, but there is nothing we have publicly announced as of now. I think both approaches have their strengths and killer use cases.
Hello, Jared! Small world! Yes, we did deprecate our old PbA (Powered by Airbyte) offering, but in many ways our new Agents and Embedded offering is a more robust and agent-friendly successor to that older offering.
I am happy to hear you are still getting value out of PyAirbyte! If you do try out Airbyte Agents, please let us know how it goes! We are always listening to feedback and would love to hear from you as you explore the new tools and capabilities.