The story of Pydantic and Logfire | Samuel Colvin Episode 110

· 35:28

Jack Bridger:

Imagine you started writing a data validation library in 2017. It's just for yourself because you felt it was missing from Python, but a few things go your way. Firstly, there's a big movement in programming in general back towards types. And secondly, in 2020, OpenAI released GPT-3 and gradually, and then suddenly, every company in the world needs to validate unknown data coming back from an API. Your library is used by Facebook, Google, Anthropic, OpenAI, even the NSA. It's one of the most popular libraries in the world.

Jack Bridger:

So how do you build a startup off this? You could launch a hosted version, you could add premium features like Tailwind does, or you could rip up the playbook entirely. This is an interview with Samuel Colvin, the creator of Pydantic. Last month, Samuel announced that he'd raised $12.5 million from Sequoia to build something that has very little to do with data validation. He's building Logfire, an observability tool, and his reasons are very interesting.

Samuel Colvin:

At the risk of getting kind of philosophical and looking into the camera, there's been a, like, progressive change in software over the last, like, 30 years. So in the nineties, everything was strictly typed. We had SQL databases, and we had C++ and Java. And then everyone discovered what they really wanted, with, like, Rails taking off, and everything was relaxed, and you could do it however you wanted. And we had Python, and we had JavaScript.

Samuel Colvin:

We had no type hints. We had NoSQL databases where you could just lob anything into the database, and it would always work. And then there's been this slow progress back the other way, almost since about 2012. We had, like, TypeScript and type hints in Python. No one, or very few people, are building with NoSQL databases now from scratch.

Samuel Colvin:

People are back to wanting SQL. It turns out they want some version of those guardrails. And, obviously, they don't want it to look like it did in the nineties, but people want guardrails when they're building larger applications, because this, like, "anything works" approach runs into limitations when you don't have any warning that you're doing it wrong.

Jack Bridger:

Yeah. Yeah. That makes sense. And it just feels like, well, I'm mostly familiar with JavaScript and then to some smaller extent Python, and it seems like there's just this big, broader movement towards that.

Samuel Colvin:

I think if you had said to everyone in 2008 that in 15 years' time everyone would be spending their time getting the static type checker to pass, people would have been very confused. They would have been like, we just got away from trying to get the compiler to work, and now we can just write Python and JavaScript, and it always just works. Yeah. But it turns out that it's infuriating when "just works" covers up a whole bunch of, you know, data integrity issues or weird runtime exceptions or whatever it might be, and so it turns out people want some version of those guardrails.

Jack Bridger:

Yeah. Yeah. That makes sense. This episode is brought to you by WorkOS. At some point, you're gonna land a big customer, and they're gonna ask you for enterprise features.

Jack Bridger:

That's where WorkOS comes in because they give you these features out of the box. Features like SCIM provisioning, SAML authentication, and audit logs. They have an easy-to-use API, and they're trusted by big dev tools like Vercel as well as smaller fast-growing dev tools like Knock. So if you're looking to cross the enterprise chasm and make yourself enterprise ready, check out WorkOS. We've also done an episode with Michael, the founder of WorkOS, where he shares a lot of tips around crossing the enterprise chasm, landing your first enterprise deals and making sure that you're ready for them.

Jack Bridger:

Thanks WorkOS for sponsoring the podcast, and back to the show. And so when you first started Pydantic, what were, like, the first lines of code that you were writing? Was it just

Samuel Colvin:

I was working on some project of my own and I wanted to be able to validate HTTP headers, and I had this weird situation where I would have code that did that validation and then I would have type hints, and I had to manually keep the two in sync.

Jack Bridger:

And

Samuel Colvin:

that seemed like such a weird thing. If you step back, and you don't think about the history of Python and where it's all come from, and you imagine being someone in your first week of writing code, and you get shown a data class in Python, and you get told this is the correct way to do things: you should put these type hints in here, and that will make your application safer because this clever program will be able to come along and detect certain sorts of error. And then you try using it, and you're like, oh, by the way, you can pass any value you like to that thing that says int next to it. Nothing will go wrong until you try using it.

Samuel Colvin:

That is, like, really weird. I mean, I understand why we are that way. I'm not saying that, like, data classes should be automatically sort of type checked. I think one of the good things Python has done is not bring that into the language so that libraries like Pydantic can innovate more quickly, but it is kind of weird. Yeah.

Samuel Colvin:

And that is the thing that kinda drives people, I think, to use Pydantic, because they use type hints within their program where everything is typed. It's really helpful. But you get to the boundaries of that, whether you're reading an API request or a CSV file or HTTP headers or direct user input or stuff from a database or whatever it might be, then suddenly those type hints are useless.

Jack Bridger:

Mhmm.

Samuel Colvin:

And Pydantic effectively, you know, allows you to use them.
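The data class behaviour described in this answer can be illustrated with a small, standard-library-only sketch. The `Header` class and its field are hypothetical names chosen for illustration, not code from the interview; the point is only that a plain data class accepts a value that contradicts its type hint, and the problem shows up later.

```python
from dataclasses import dataclass


@dataclass
class Header:
    # The hint says int, but dataclasses do not check it at runtime.
    content_length: int


# No error is raised here, even though a string is passed for an int field.
header = Header(content_length="not a number")

# The mistake only surfaces later, when the value is actually used.
try:
    header.content_length + 1
    failed_on_use = False
except TypeError:
    failed_on_use = True

print(failed_on_use)
```

A static type checker would flag the constructor call, but only when the value is known at analysis time; data arriving from a request, file or database is not.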

Jack Bridger:

Did anything exist when you started or was it just like some stuff existed but it wasn't so great?

Samuel Colvin:

So there were libraries like Marshmallow and Django REST framework and Cerberus and a bunch of other ones that did some form of data validation, yes, but none of them were using type hints because they all predated the type hint world. Yeah. So when I first started, like, playing around with "can we use them", I thought I might have to do some horrid thing where I, like, read the raw Python file again to find the type hints. I didn't know about this kind of odd mistake when they were adding type hints to Python, that they had left annotations around at runtime in this dunder annotations attribute. And I was like, oh, this is much easier than I thought it was gonna be and much less hacky than I thought it was gonna be.
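As a rough illustration of why runtime annotations make this easy, here is a minimal sketch that reads a class's annotations and coerces raw input to the annotated types. It is not how Pydantic is implemented; `Headers` and `validate` are hypothetical names, and the coercion simply calls each annotated type as a constructor, which only works for simple types such as `int` and `str`.

```python
from typing import get_type_hints


class Headers:
    # Plain class-level annotations; Python keeps them in __annotations__.
    content_length: int
    user_agent: str


def validate(cls, raw):
    """Coerce raw values to the annotated types (hypothetical helper)."""
    validated = {}
    for name, expected_type in get_type_hints(cls).items():
        if name not in raw:
            raise ValueError(f"missing field: {name}")
        try:
            validated[name] = expected_type(raw[name])
        except (TypeError, ValueError) as exc:
            raise ValueError(f"invalid value for {name}: {raw[name]!r}") from exc
    return validated


# The annotations are available at runtime without re-reading the source file.
annotations = Headers.__annotations__

# HTTP header values arrive as strings; the hint drives the conversion.
result = validate(Headers, {"content_length": "42", "user_agent": "curl/8.0"})
print(result)
```

The type hints are written once and serve both the type checker and the runtime check, which is the "keep the two in sync" problem mentioned earlier.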

Samuel Colvin:

And so I released it and put it on Hacker News, and even then it got a lot more attention than most things I've, you know, ever put on Hacker News since, or almost any open source library I created back then. People were immediately compelled by, oh, it's type hints. Yeah. But there was a sense of, oh, well, Marshmallow is the incumbent. You'll never be able to do all the stuff that Marshmallow can do.

Samuel Colvin:

And, obviously, over time, you know, I added that stuff, but it was definitely a slow progress. And I look now at lots of the people doing kind of open source for startups, as in, I'm gonna build this open source library, and then I'm gonna try and raise money off it 3 weeks later, 3 months later. Some of them are very successful, but lots of them, less so. And I feel like there's a big difference between that and a library like this, which grew organically for year after year before I needed it to be successful for commercial reasons.

Jack Bridger:

What do you think are the biggest differences between

Samuel Colvin:

I go back to downloads as a metric of success, in no small part because it's the metric in which you're most successful. I also think it's the most meaningful one. In some ways, it's better. Like, sure, they're all vanity metrics. They're all, to some extent, to be questioned one way or another, but, like, I think it's more meaningful than GitHub stars, for example.

Samuel Colvin:

Yeah. But that happens because over years, people adopt things. But also there's just the, like, breadth of usage and the knowledge that it definitely can be used in production, yeah, that you don't get with a library that's 3 months old where no one's using it in production yet.

Jack Bridger:

Yeah. It must be challenging if you're someone like LangChain that just, like, completely catches fire, like, immediately and, like, you don't have that time to kind

Samuel Colvin:

of I think that, you know, LangChain has been in some ways very successful, and that's great for them, and I wish them all, you know, the best of luck. But you can see how that has come with some challenges in terms of, yeah. I think if it had had 5 years to, like, rattle around in the open source ecosystem with people coming along and tweaking it here and there and improving it, it would probably be more robust now. It, you know, is almost a victim of its own success in the sense that, you know, people were so desperate to do anything with an LLM that it took off so quickly.

Jack Bridger:

Yeah. Yeah. It's interesting. And then I first came across Pydantic at the AI Engineer Summit. Oh yeah.

Jack Bridger:

And Jason Liu did a talk called "Pydantic is All You Need". Yep. And I'd somehow assumed that this was, like, really important because it just seemed like suddenly everyone was talking about Pydantic in that little world that I was in. But I wondered, was that actually, like, a kind of a noticeable thing?

Samuel Colvin:

I think not in terms of downloads directly, because the numbers were already, like, very large, and there were also, you know, those large organizations where, presumably, if you're inside a company like OpenAI or Anthropic and you have a big monorepo, there are basically builds going on constantly, 24/7, and every single one of them is downloading Pydantic. So I imagine those projects, and inside banks and stuff, are downloading it, you know, multiple times a minute, all the time. So I don't think I saw it in terms of downloads, but definitely in terms of the kind of zeitgeist of how to build things with LLMs, Jason's, like, support of Pydantic has been incredibly valuable, particularly in that there were lots of people building that stuff who were not steeped in the history of Python and how to go and do things, who might not find out about Pydantic if it wasn't for those who are, like, known in AI and are talking about using it.

Jack Bridger:

Yeah. And did that kind of come about well, completely organically?

Samuel Colvin:

Completely organically. I think I'd seen Jason on Twitter at that point, but I don't think I'd spoken to him at all. I think I reached out to him because I suddenly saw on Twitter this random, like, blacked-out room with a massive, like, you know, message: "Pydantic is all you need." You couldn't dream of that kind of marketing, right, at some, like, AI summit.

Samuel Colvin:

So

Jack Bridger:

Yeah. That's why I kind of almost, like, had to ask, because it's literally, like, a sales pitch for Pydantic, but he was completely unaffiliated.

Samuel Colvin:

Jason is successful enough that I definitely couldn't afford to get him to do PR for us if I was having to pay him. So thankfully, he just did it for us.

Jack Bridger:

I think he also did another one. Right? "Pydantic is still all you need."

Samuel Colvin:

He's done a number. I mean, now he's a friend and I speak to him quite a lot, and he's, you know, given us advice on stuff. So now it's less, like, purely organic, but yeah.

Jack Bridger:

That was very cool. Sorry. Once again, this is just water. So, okay, Pydantic, very, very successful. I think you've kind of hit that level where it's, like, hard to find projects that are more successful in terms of open source projects.

Samuel Colvin:

In terms of downloads, definitely. And in terms of, like, adoption. In terms of the vanity metric of stars, we're nowhere, but, like, that's fine. We're a building block, and people, like, know it, and that's great.

Jack Bridger:

Yep. Yeah. Maybe there's a quick aside. Like, why do you think stars is a vanity metric and downloads is

Samuel Colvin:

Because stars is, like, you know, you can go and look at something and go, oh, that looks nice, star. And then maybe you install it and it's useless, or the documentation is lacking, or it just turns out not to be what you need. No one goes and unstars it.

Samuel Colvin:

And so look, I've built libraries like FastUI. I think it's a really exciting idea. I really wanna work on it more. I haven't had time to go and finish it or write the documentation. It has 8,000 stars.

Samuel Colvin:

Why? Because when you read the, like, first paragraph of the README, it seems compelling. And people go, oh, that's cool, star. Then they, you know, recommend it to someone at work.

Samuel Colvin:

Someone goes and tries to use it. It's like, it's crap. It has no documentation. It doesn't seem complete. And we use it, and it's great for our internal use, but that's because I built it so I know how to use it.

Samuel Colvin:

But, like, without, for example, documentation or finishing some of the edge cases, it's not much use. And I think that there are lots of other libraries out there which would fall into the same category.

Jack Bridger:

So it's got 8,000 stars. I think Pydantic has, like, 21,000. Yep. It's not, you know, almost 50% as complete as, yeah.

Samuel Colvin:

Yeah. Yeah. I mean, and it has, I don't know this, but some thousands of downloads a week.

Jack Bridger:

Okay. So Pydantic has, you said.

Samuel Colvin:

Like, whatever that comes out to be. 70 million. Yeah. Like, 70 million a week. Right?

Samuel Colvin:

It's completely different. And, yeah, there are other libraries that I maintain where it is just not a good correlation for whether or not people have found things useful.

Jack Bridger:

Yeah.

Samuel Colvin:

I have this theory that to build a great open source library, it has to be understandable in 30 seconds what it does. It has to be useful in 3 minutes when you start using it, and it has to not have pissed you off after 300 hours of using it. And the stars only really cover the first point. After 30 seconds, you get what it does and it seems compelling, you click star. Even at the 3 minutes of, oh, it actually works and I find it useful, you can fail and still get the star.

Samuel Colvin:

Let alone the, like, a year later of using it in my library, "I actually still think it's a good library."

Jack Bridger:

Yeah. That's a really good framework, actually. You should write an essay on that.

Samuel Colvin:

I mean, I think it also applies to product. Right? As in, like, maybe it is different. Of course, things aren't always as easy to adopt as a library like that, but, you know, you should think about that. Like, it has to carry on being compelling all the way through.

Samuel Colvin:

Yeah. And if you miss any of those three points, no one's gonna use it, in effect. As in, if no one gets it when they look at the README, you're just getting people that are gonna, like, bounce off and go somewhere else. If when they start trying to use it, it's annoying or confusing, a lot of people will abandon it. There are lots of other options.

Samuel Colvin:

We're not in a world of, like, you have to use the Microsoft framework, and therefore, if it takes you a week to understand it, that's just how it is. There's always another library. If it's annoying when you start using it, you'll abandon it. And if you get 2 months down the line, and you're like, it just turns out that it crashes occasionally and we don't know why, you'll end up ripping it out because there'll be something else out there.

Samuel Colvin:

And that's what's great about open source. Right? It's a competitive market even if it's a market of free things.

Jack Bridger:

Yeah. I think this is a really good point, and especially I feel like if you're, like, an investor, it's probably something to really watch out for, if someone's just hitting those, like, star metrics, but, yep, no one's actually using it. I have a lot of people that ask me to, like, star their repos, and I always think it's kind of, like, a ridiculous thing to ask, yeah, because it's like, what do you get from that?

Jack Bridger:

Like

Samuel Colvin:

Well, and also, one star doesn't get you very far. Right? As in, you know, what does LangChain have? 70, 80,000 stars? Same as FastAPI.

Samuel Colvin:

If you wanna be taken seriously on the star count, are you gonna ask 80,000 people to star your repo? Obviously not.

Jack Bridger:

It's better to focus on, like, really hitting a compelling case, yep, and description, and then going viral on Hacker News or something, getting

Samuel Colvin:

Yep.

Jack Bridger:

a thousand in an hour.

Samuel Colvin:

But also, I mean, I think one of the reasons Pydantic is successful is, I said there were some other libraries in the space, and sure there are, but it is nowhere near as competitive a space as, say, databases. Right? Because everyone knows if you build a good open source database, there are so many companies who've then had commercial success from that. And so it's a really tough market to break through in because the threshold of confidence you need to use a database is very high. There's lots of great competition.

Samuel Colvin:

Data validation, I mean, one of the characteristics is just that no one had put as much time into getting it right as we had. And Astral's another example of that. Right? They went and took linting initially where, sure, there were some other libraries in Python, but there weren't funded startups who were, like, going as hard as they could to, you know, work on that stuff at all. They went and made an amazing developer experience in that, and then they went on to formatting, etcetera, etcetera.

Samuel Colvin:

But, like, I think they've done an amazing job from a technical point of view. But whether by happenstance or deliberate choice, they chose a market where there was an enormous opportunity to do better because there just weren't that many people competing in it.

Jack Bridger:

Yeah. That makes sense. So you've got an advantage in that there aren't loads of people, like startups, trying to compete with Pydantic in that sense. Mhmm. But I guess the challenge is, like, if you do build, like, a database project that becomes really popular, your path to kind of build a VC-backed startup is very obvious.

Jack Bridger:

Yeah. Whereas for you, I think there was a lot more consideration in the direction that you take.

Samuel Colvin:

Right. And there is not an obvious, or at least a compelling to us, Pydantic as a service, yeah, in a way that there's an obvious and compelling Sentry as a service or MongoDB as a service that, like, just works, and no one has to think that hard about why you would wanna use the hosted version. There are some things that are close to Pydantic as a service. We talked in our, like, roadmap article last year about building a schema catalog, some way of, like, looking at all of your Pydantic schemas, how they change over time, detecting incompatibilities between them, etcetera, etcetera.

Samuel Colvin:

But, like, I had kind of 2 rules starting a business. Well, 3 rules, one of which I'm prepared to break. First is recurring revenue, just for my own, like, sanity. Having, you know, by default, all the people who paid me last month pay me again this month is so nice. Second is to be my own customer. I required that I would use the thing. Even if there was some enormous market in building something someone else wanted, I just wasn't interested, because I've run a company where you have to ask someone else whether you've built the right thing, and it's annoying.

Samuel Colvin:

And then thirdly, lots of small customers rather than a few big customers. But that one, I'd give up if we got the right big customer. I don't really care. But the first two, I think, are really important.

Jack Bridger:

Yeah. And so how did you kinda land on Logfire?

Samuel Colvin:

Well, I mean, in no small part from that exact rule. I had bought logfire.dev as a domain name in, like, 2019, 2018, something like that, because I had been frustrated with logging forever, like, since I really started writing code seriously. It seemed so weird that logs were not nested at all, that you just had this, like, linear list of things, which you could filter by, like, what the log is called or what message is there, but, like, it just seemed so inferior compared to the programming languages that were producing the logs. And the reason for that is it's very easy to collect standard out and standard error and make a logging application from that.

Samuel Colvin:

But if you are prepared to go to the next level and integrate how you collect that data from your application, you get, well, nested logging, which is really tracing. And OpenTelemetry is an enormous opportunity to disrupt observability and do it better. Mhmm. I was talking to Ben Sigelman, who was at, what were they called, LightStep. He started OpenTracing, which merged with another project to become OpenTelemetry, and worked on OpenTelemetry quite a lot.

Samuel Colvin:

And he said one of the reasons they did that was, back in, it wasn't that long ago, 2015, if you wanted to go and build an observability company, you put, like, he said, 70% of your effort into building the SDK, the library that you integrated with your application to collect the data. And that was an enormous barrier to entry where Datadog had one of them for every language. What's kind of crazy is that since then, the Datadogs and Splunks and everyone else, the big observability players, have really got on board with OpenTelemetry and, you know, support it and integrate with it. And so, like, it's great of them.

Samuel Colvin:

It's very kind of them to basically build this mechanism through which we can compete with them, but I'd love at some point to hear from them what their rationale is. I have my theories, but it's definitely interesting.
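The "nested logging, which is really tracing" idea from earlier in this answer can be sketched with the standard library alone. This is a toy illustration, not the OpenTelemetry SDK or Logfire's API; the `span` helper, the `spans` list and the span names are all hypothetical. Each record remembers its parent, which is exactly what a flat list of log lines lacks.

```python
import time
from contextlib import contextmanager

spans = []   # every span ever started, in start order (a real tracer would export these)
_stack = []  # names of the spans that are currently open


@contextmanager
def span(name):
    """Record a named block of work together with its parent span."""
    record = {
        "name": name,
        "parent": _stack[-1] if _stack else None,
        "depth": len(_stack),
    }
    spans.append(record)  # appended on entry, so spans stay in start order
    _stack.append(name)
    start = time.perf_counter()
    try:
        yield record
    finally:
        record["duration_s"] = time.perf_counter() - start
        _stack.pop()


with span("handle_request"):
    with span("validate_headers"):
        pass
    with span("query_database"):
        pass

# Indentation reflects nesting, unlike a linear list of log messages.
for record in spans:
    print("  " * record["depth"] + record["name"])
```

Collecting this structure requires code running inside the application, which is the SDK effort described above, whereas flat logs only need standard out and standard error.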

Jack Bridger:

As to why they supported open

Samuel Colvin:

Why they're so keen to support OpenTelemetry when, yes, it has some advantages for them. Now it exists, I get it.

Samuel Colvin:

If you are a large company, you basically say, we must be instrumenting with OpenTelemetry so that we're not locked into any single provider. Yeah. And so once that's the case, Datadog have to be able to answer, yes, we do OpenTelemetry.

Samuel Colvin:

Otherwise, they'll miss customers. But why, in 2018 or whatever, they were promoting this way of disrupting the market, who knows? But it's very kind of them, so thank you.

Jack Bridger:

Yeah. That's interesting. And so I guess the obvious question you've already answered a million times at this point, but it's like, what is the link between Logfire and Pydantic?

Samuel Colvin:

And it is the, like, criticism or question that everyone has, from investors to engineers to everyone. I mean, I can do the version of what the link is. The link is the same obsession with developer experience, the same, like, by developers, for developers, etcetera, etcetera. We have a Pydantic integration, which means you can instrument your Pydantic validations and see the percentages that pass and that fail, etcetera. But, really, I'm not claiming there's that strong a correlation.

Samuel Colvin:

I built the thing I wanted to go and build, and it is for exactly the same people who are using Pydantic. So I think the commercial argument is strong in that regard, in that the 350,000 people that come to our documentation every month are incredibly valuable in us promoting Logfire. But, also, it was the thing we wanted to go and build. And I think that has an enormous advantage for the open source community because there is no tension with us about whether or not to take features and put them into the commercial version or the free version, and we're not in that situation, which lots of DevTools startups get into, where your biggest competitor is your own open source. Yeah.

Samuel Colvin:

Or where you're like, oh, our platform's open source, so you can use it or you can use the hosted version. So we're gonna make it as hard as possible to do the hosted version. And it's this kind of, like, veneer of open source, which I'm not into. I wanna do, like, full-blooded. This is MIT licensed.

Samuel Colvin:

It's infrastructural software that anyone can go and really use. Yep. And then we're gonna have a closed source platform that's gonna make us money, that's gonna allow us to invest back in the open source, but we're not gonna pretend that one is the other.

Jack Bridger:

Yeah. It makes sense to me. I was thinking a lot about it, like, before the show and stuff, and, like, how I was kind of rationalising what, like, investors, because you've raised from, like, some of the best. I mean, Sequoia is, like, yeah, obviously, like, great, and I think you got, like, your Series A really fast. Yeah.

Jack Bridger:

And to me it was, correct me, this is my feeling, it's like you've proven that you can build, like, a really good developer tool. Yeah. So it's almost like you're a second-time, you know, successful dev tools founder. It's like if you were backing someone that already built, like, a really good dev tool in terms

Samuel Colvin:

Right. I mean, look, honestly, and I would say Sequoia would agree with this, I raised at first-time founder valuations, but much more easily than most first-time founders. I didn't get to raise at those crazy valuations that second-time founders get, and that's fair. I haven't gone and built and sold a successful company, but I've definitely got more to support the idea that I might be good at building things developers want than, you know, someone who's been a software developer inside, you know, than other people. So, yeah.

Samuel Colvin:

Yeah. And look, I mean, there is actual value in Pydantic's documentation, as an example. So I said we have 350,000 people going to the documentation each month. If you look at what it costs on Google Ads to get people to come to your site for an observability search term, it's somewhere between $10 and $150 per click. So I think right now, Sentry are advertising at somewhere around $150 a click for Django logging, because everyone who's building with Django nowadays is doing it in production, is doing it for business.

Samuel Colvin:

No. It's not the, like, hobby framework anymore. And so if you get that click, that person is very likely to convert into a paying customer. But even if you don't take the $150 a click, you take the, like, $10 lower end of what a click is worth. And if you consider, and this is the, like, questionable bit, that a visit to Pydantic's documentation is as valuable as a Google Ads click, then you get to, whatever, like, $7 million or, well, 3 and a half million if you call it $10 a click, of, like, equivalent marketing value from that documentation.

Samuel Colvin:

Yeah. Now you can call bullshit on that by an order of magnitude, and it's still, like, very worthwhile for us to invest in Pydantic. And I think, arguably, that visit is more valuable, because that person has much better time on site, much better brand recognition, yeah, much better, like, trust in Pydantic than if you just happen to, like, accidentally click on an ad on Google.
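The back-of-the-envelope arithmetic in this answer can be written out as a short sketch. The visitor count and per-click prices are the figures quoted in the interview; treating each documentation visit as equal to one paid click is the speaker's own "questionable" assumption, and the function name is hypothetical.

```python
MONTHLY_DOC_VISITS = 350_000  # documentation visits per month, as quoted


def equivalent_ad_value(visits, cost_per_click):
    """Value of the visits if each one were bought as an ad click (assumption)."""
    return visits * cost_per_click


low = equivalent_ad_value(MONTHLY_DOC_VISITS, 10)    # 3_500_000
high = equivalent_ad_value(MONTHLY_DOC_VISITS, 150)  # 52_500_000

# "Call bullshit by an order of magnitude": discount the low estimate tenfold.
discounted = low // 10  # 350_000

print(low, high, discounted)
```

At $10 a click this gives the 3 and a half million figure; the $7 million mentioned in passing would correspond to $20 a click.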

Jack Bridger:

Yeah. No. I completely agree. And I feel like the brand part is probably more valuable than, yeah, the clicks as well.

Jack Bridger:

And that, like, if you're just a new observability startup, probably hard to get anyone to care about you.

Samuel Colvin:

So we had this crazy time at PyCon US last year. We had a booth in the middle of the conference. Sure, it's called Pydantic, the company, so it had Pydantic on the banner, but it was talking about Logfire everywhere.

Samuel Colvin:

We were not talking to anyone about the open source library, and yet our booth was surrounded by people coming and talking to us all week. And there was the Datadog and the Elastic booth around the corner. And I've got a picture somewhere of, like, the Pydantic booth with literally a crowd around it and, like, one guy walking past the Elastic booth without looking. And that is a, like, $12 billion public company, and Datadog are a $40 billion public company. And yet in Python, more people have heard of and wanna talk to Pydantic than to those 2 combined.

Jack Bridger:

Yeah. It makes 100% sense. And I guess observability is, well, there's not that many proven business models in dev tools, I feel like.

Samuel Colvin:

I mean, I think Sequoia said that basically it's databases and observability. They are the 2, like, places where people have made a lot of money in the past.

Jack Bridger:

Yeah. And so it makes sense to me, you know, why they believe in you, and it's very exciting.

Samuel Colvin:

It's really cool. It's amazing to work with such a talented team of people and, like, be able to go and build something like Logfire that I've been, like, dreaming of for years. And I had some slight feeling I was gonna try and build it myself at some point, and it was gonna have to be incredibly pared back for me to, you know, go and do it. To be able to work with, you know, 15 or so people and actually go build that platform has been a, like, amazing, somewhat, like, hair-raising experience at times, but really exciting.

Jack Bridger:

Yeah. How does it differ from building Pydantic?

Samuel Colvin:

Well, so Pydantic, I did almost all of it on my own for, like, years. If there was a problem, the only person who was ever gonna go and fix it was me. And, you know, there were nice bits of that. I could go and rethink things. There was the time to rewrite something if I wanted to.

Samuel Colvin:

I got my own way all the time because there was no one else to really argue with. Now, in a company of, like, highly opinionated expert engineers, I don't get my own way all the time, but we also move much, much more quickly. And one of the considerations of building something like observability is that no one wants an observability platform in beta. So it's kind of weird that anyone used us in beta, but, you know, some people were kind enough to. No one wants a half-baked observability platform. We're not moving into a, like, greenfield market where we're the only thing available. Right?

Samuel Colvin:

We're competing with established companies, and it needs to be complete to be usable. And so it doesn't work to build a, like, compelling demo and expect people to pay for that. The compelling demo, sure, it makes us feel good, but you actually need it to be fully functional before people will adopt it and pay for it. And so that requires a team to go and do all the complex stuff.

Jack Bridger:

Yeah. And you you've been hiring a lot from

Samuel Colvin:

So we haven't been hiring a lot this year. We were 10 when we did our Series A in the summer, and we're now about to be 15. So not, by any means, like, crazy growth, but, yeah, we're hiring a bit, hiring more in Rust because we've been building this database. So we went through a slightly weird journey with the database.

Samuel Colvin:

So being an observability company is really being a time series database, but just with one schema, is effectively how I would think about it. And so we started off building with ClickHouse because everyone said ClickHouse is the default, and ClickHouse is the obvious thing to use. Great. So we use ClickHouse. And then quite quickly, we realized that allowing end users to write SQL against their data directly was an amazing way of giving people phenomenal power in how they could search, but without having to have some horrific UI for, like, a query builder at all and this, like, cognitive burden of, like, now I need to go and learn a new syntax for putting together queries.
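As a rough illustration of the "one schema plus user-written SQL" idea, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in database. The `spans` table and its column names are hypothetical and are not LogFire's actual schema.

```python
import sqlite3

# Hypothetical single-schema table of spans; column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE spans (
        trace_id  TEXT,
        span_name TEXT,
        start_ns  INTEGER,  -- start time in nanoseconds
        end_ns    INTEGER   -- end time in nanoseconds
    )
    """
)
conn.executemany(
    "INSERT INTO spans VALUES (?, ?, ?, ?)",
    [
        ("t1", "GET /users", 0, 120_000_000),
        ("t1", "db.query", 10_000_000, 90_000_000),
        ("t2", "GET /health", 0, 2_000_000),
    ],
)

# The kind of query an end user (or an LLM on their behalf) might write:
# find spans slower than 50 ms, slowest first.
user_sql = """
    SELECT span_name, (end_ns - start_ns) / 1000000 AS duration_ms
    FROM spans
    WHERE end_ns - start_ns > 50 * 1000000
    ORDER BY duration_ms DESC
"""
slow_spans = conn.execute(user_sql).fetchall()
print(slow_spans)
```

Because every record shares one table shape, plain SQL is enough to express the search, with no separate query-builder syntax to learn.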

Samuel Colvin:

And another kicker is that LLMs are great at writing SQL. So we can have, like, natural language search, which is basically ask the LLM to write SQL. Problem is ClickHouse's flavor of SQL stinks. You can't subtract two datetimes to get an interval, which, when you're doing spans, where all of them are intervals, is, like, very problematic.

Samuel Colvin:

And a bunch of other things. Even if you do use the date sub function and subtract two datetimes to get an interval, ClickHouse, until I complained about it a lot, thought that 2 nanoseconds was longer than one second, because they just ignored the units on all interval comparisons, which was mad. So we quickly realized that ClickHouse's SQL wasn't gonna work. We then went to TimescaleDB, which is a Postgres extension and quite a successful startup, and thought that was gonna be great because it's just Postgres. So, of course, we can optimize it and make it fast, and it has lovely, nice Postgres SQL that everyone understands how to write. Problem was that, again, to allow users to write SQL, we had to have row level permissions.
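The two interval points made above (a span's duration is the difference of two timestamps, and interval comparisons must respect units) can be sketched in Python. The `naive_greater` function is only an illustration of the class of bug described, not ClickHouse's actual implementation, and the unit table and helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# A span is two timestamps; subtracting them yields an interval.
start = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
end = datetime(2024, 1, 1, 12, 0, 1, 500_000, tzinfo=timezone.utc)
duration = end - start  # a timedelta of 1.5 seconds

# Hypothetical unit table: nanoseconds per unit.
NANOS_PER_UNIT = {
    "nanosecond": 1,
    "microsecond": 1_000,
    "millisecond": 1_000_000,
    "second": 1_000_000_000,
}


def to_nanos(value: int, unit: str) -> int:
    """Normalize a (value, unit) interval to nanoseconds."""
    return value * NANOS_PER_UNIT[unit]


def naive_greater(a: tuple[int, str], b: tuple[int, str]) -> bool:
    """Buggy comparison: looks only at the numbers and ignores the units."""
    return a[0] > b[0]


def unit_aware_greater(a: tuple[int, str], b: tuple[int, str]) -> bool:
    """Correct comparison: normalize both intervals before comparing."""
    return to_nanos(*a) > to_nanos(*b)


two_ns = (2, "nanosecond")
one_s = (1, "second")
print(naive_greater(two_ns, one_s))       # the unit-blind answer
print(unit_aware_greater(two_ns, one_s))  # the unit-aware answer
```

Normalizing to a common unit before comparing is what makes "2 nanoseconds versus 1 second" come out the right way round.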

Samuel Colvin:

To have row level permissions, you basically have to turn off all of the features that make Timescale fast. So we ended up with this, like, lovely simple SQL but very slow database. And then we discovered, in, like, scrabbling around trying to work out how to solve this, DataFusion, which is a relatively recent Rust project, part of the Apache Foundation, which is, basically, an analytical database from scratch, built on Arrow. And we've been using that since May. We actually only fully adopted it, switched over from Timescale to what we call FusionFire, our database, like, 3 weeks ago.

Samuel Colvin:

But that's awesome, and I think that gives us the foundations to go build, like, a database that works for the next 5 years for us.

Jack Bridger:

That's awesome.

Samuel Colvin:

But the other, like, subtle point, which I think is quite interesting, is that everyone understands the architecture that you want from a modern database. You want data at rest in object storage like S3, in a format like Parquet or some binary format like that, and then you wanna be able to query that data from nodes nearby with some kind of caching. Basically, the data structure that Snowflake invented or at least pioneered. And both ClickHouse and Timescale have that, but in both cases, that is proprietary, and that's what you get if you use their cloud version. So although they're both open source databases, you basically get a choice between the ideal architecture and paying them masses of money to host your data, or the non-ideal architecture of stuff on SSD and some weird, horrid backup schedule, self-hosted.

Samuel Colvin:

Whereas with DataFusion, because it's a, like, properly open source database without a company behind it, the modern architecture of store your data in object store and query it is actually possible. And because it's Rust and it's so extensible, you can go and customize it however you want.
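A toy sketch of the read path described above: immutable files at rest in an object store, and a query node that keeps a small cache of recently read files. Everything here is hypothetical (`FakeObjectStore`, `CachingQueryNode`, the file names); it uses an in-memory dict in place of S3 and says nothing about how FusionFire or DataFusion actually implement caching.

```python
from collections import OrderedDict


class FakeObjectStore:
    """Stand-in for an object store such as S3: keys map to immutable blobs."""

    def __init__(self, blobs: dict[str, bytes]) -> None:
        self._blobs = dict(blobs)
        self.reads = 0  # counts (slow) round trips to the store

    def get(self, key: str) -> bytes:
        self.reads += 1
        return self._blobs[key]


class CachingQueryNode:
    """Hypothetical query node with a small LRU cache in front of the store."""

    def __init__(self, store: FakeObjectStore, capacity: int = 2) -> None:
        self._store = store
        self._capacity = capacity
        self._cache: OrderedDict[str, bytes] = OrderedDict()

    def read(self, key: str) -> bytes:
        if key in self._cache:
            self._cache.move_to_end(key)  # mark as most recently used
            return self._cache[key]
        blob = self._store.get(key)  # cache miss: fetch from the store
        self._cache[key] = blob
        if len(self._cache) > self._capacity:
            self._cache.popitem(last=False)  # evict least recently used
        return blob


store = FakeObjectStore(
    {
        "2024-01-01.parquet": b"day-1 rows",
        "2024-01-02.parquet": b"day-2 rows",
    }
)
node = CachingQueryNode(store)
node.read("2024-01-01.parquet")  # miss: goes to the store
node.read("2024-01-01.parquet")  # hit: served from the node's cache
print(store.reads)
```

Because the files are immutable, the cache never needs invalidating, which is part of what makes this split between cheap storage and nearby query nodes attractive.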

Jack Bridger:

Yeah. That makes sense. That makes sense. I guess that's one of the challenges of open source startups.

Samuel Colvin:

So I remember talking to Bogomil, the investor from Sequoia, early on, and his take was basically that there were two ways you could have done what we did. Basically, go and use a hosted database like ClickHouse, build your thing as fast as possible, get adoption, and accept you're gonna, like, hemorrhage, like, margin when you get to scale because you're using some other company, and you can solve that problem down the line. Or you basically build a database, and you're a database company for a year. And then once you've got your database right, you can go and get the rest of your application right. And you've got this beautiful architecture, but it'll take longer to get there.

Samuel Colvin:

We've done, depending on how you look at it, the worst of both worlds. As in, we spent a lot of time using a hosted database, and then we've actually gone and built our own. And, you know, we might have done it differently if we had started again, but I think we're now in a really good place. Yeah. And, actually, the thing we didn't realize when we were using ClickHouse and Timescale was we were able to innovate on the rest of the application and the SDK and all that other nice stuff because we had something, even if it wasn't gonna last forever, that allowed us to get building the application.

Jack Bridger:

Yeah. And then once you knew how you wanted it Yeah.

Samuel Colvin:

And we knew a lot more about what we were gonna go and build when we went and built FusionFire. And one of the things that's interesting about us as a team is we are not a bunch of database experts or observability experts. We're not like a team from inside Datadog who are like, we know what the future's gonna be. We know exactly what we're gonna go and build. We're gonna go and build that thing.

Samuel Colvin:

We're a bunch of originally Python developers, like users of the end application, who then have to figure out how to build the database. That means we're slower on getting the database right, but it means we have a much better understanding of what the developer wants than if we were a team who had all worked deep inside Datadog working on databases for the last 10 years.

Jack Bridger:

Yeah. Yeah. The empathy is there.

Samuel Colvin:

The empathy is there. Whereas if you're steeped in observability for years and have never written Python because you're, you know, building a database, like, the empathy is much harder, in a way, to bring along.

Jack Bridger:

Yeah. Yeah. Totally makes sense. I think we're coming towards the end, Samuel. So where can people learn more about LogFire?

Samuel Colvin:

Well, anywhere. I mean, relatively easily by googling it, but pydantic.dev and on Twitter and LinkedIn. But Twitter in particular, I'm, you know, relatively active on there talking about LogFire.

Jack Bridger:

Yeah. And in London as well.

Samuel Colvin:

Yeah. I mean, it doesn't help because it's a podcast, but I'm doing another meetup this evening for Rust and Python. And I'm going to relatively a lot of conferences, on Python in particular. So, yeah, if you're at a Python conference, there's a good chance you'll meet me there.

Jack Bridger:

Yeah. Okay. Amazing.

Creators and Guests

Elliott Roche (Producer): Freelance Podcast Editor
Samuel Colvin (Guest): Building Pydantic Logfire - uncomplicated observability for Python. Sequoia Scout.
