Open Sourcing the New Data Stack: Our Seed Investment in Airbyte
Modern data infrastructure at SMBs, mid-sized companies, and enterprises is being reimagined with the emergence of Snowflake, Redshift, BigQuery, and Azure as the central anchor. As data grows geometrically within organizations, data engineers are increasingly shifting away from storing pre-structured and pre-transformed data in data warehouses, because they cannot fully anticipate downstream use cases, which are themselves multiplying at a similar rate. Instead, they are leveraging a “catch-all” data lake and emergent products that let them transform, adapt, and route this raw data for a growing assortment of end-user applications: business intelligence, real-time analytics, customer support, and more.
This standardization around the ELT paradigm has created renewed urgency around better tooling for data ingestion. Conversations with data engineers and data architects at leading companies make it clear that internal implementations today are still frequently a hodgepodge of custom code and various stitched-together enterprise offerings, and that building robust, maintainable data pipelines this way needlessly consumes engineering bandwidth. This is often the end result because companies inevitably have to implement several connectors themselves to account for ad hoc needs outside the “happy path”, or, even more simply, because they find themselves paying much more than they originally expected as their data needs grow inexorably.
For all of these reasons, we believe there is an opportunity for an open source project to effectively commoditize these connectors, giving enterprises the flexibility to adapt them to their individual needs while still benefiting from a collective effort to handle ongoing maintenance, share best practices, and so on. Airbyte is the leading such open source project, having seen a resounding early response from the community to its vision and product velocity. More than 600 companies now rely on Airbyte to replicate data, and there is a growing library of connectors, 20% of which have been contributed by the community. We expect that share to only increase as we start getting into the torso and long tail of connectors.
When I first met with Michel Tricot and John Lafleur, the co-founders of Airbyte, there was an instant alignment around the state of the industry and the opportunity in front of us. As I got to know them better over the course of several conversations and brainstorming sessions, two things stood out and convinced me that they were the right team to go after this. First, Michel previously owned this very same infrastructure at LiveRamp as director of engineering, and has a deep, intuitive understanding of the problem and experience wrangling data at true internet scale. Second, most of the early team (and some of the early angel investors) had previously worked together at LiveRamp; as an ex-founder myself who worked with several of the same teammates more than once, I’m always impressed and encouraged by founders who are adept at recruiting their former coworkers.
Today, I’m thrilled to announce that Accel is leading Airbyte’s seed round, alongside our friends at 8VC and leading angels who know a thing or two about this ecosystem and open source, including Calvin French-Owen, founder of Accel-backed Segment; Charles Zedlewski, former GM of Accel-backed Cloudera; and Auren Hoffman, the founder and former CEO of LiveRamp. This is the latest investment to come out of our long-standing belief in the power of open source and our work with leading companies such as Snyk, Segment, Sentry, Vercel, Tailscale, Altinity, Heptio, and many more. In Airbyte, we see many of the same telltale early signals of community adoption, product execution, and market pull.