TATC 22 | Data Routing

Episode Summary

On this episode of Thriving at the Crossroads, we talk with Ry Walker, cofounder of Astronomer.io. He explains what Astronomer is, what it does, who it helps, and how it helps. He shares how the company is helping enterprises move up the data maturity mountain as a competitive move.

TATC Ep 22 – Data Routing Up The Mountain with Astronomer


Today, I’m pleased to introduce Ry Walker, cofounder of Astronomer.io, a data routing platform. Welcome to the show today, Ry.

Thanks for having me.

You’re our first startup from Ohio that I have introduced. I’m very excited about that, I’m expanding my territory a little bit within the US. I’d like to hear a little bit more about Astronomer and what you guys are doing. Can you tell us a little bit more about the company?

It’s not the first time we’ve heard we’re the first ones from Ohio. I’ll tell you a funny story real quick. We went through an accelerator called AngelPad, which is based in New York and San Francisco. We were the first Midwest company ever to get accepted into that program. It’s tricky to build a tech company in the middle of the country, but we’re seeing a lot more of that and we’re pretty excited.

We’re building a data routing platform which helps data engineers, the developers who are being asked to create data pipelines for their company. We’re building a data routing platform that makes it easier for them to do that work. We’re building open-source connectors so that they don’t have to recreate the integration between Salesforce and their Oracle database, for example. We build the connectors and we’re building standard recipes for pulling that data down. Again, helping them so that they don’t have to think so much about it and read all the API documents. That’s the main focus, this pretty technical solution.

It’s more important to talk about, why do you want to do that? What’s it good for? That’s where it really comes down to helping larger enterprises move up the data maturity mountain, which is what we call the process of going from spreadsheet land, having all your data in spreadsheets, to centralizing it into a warehouse of some sort, to eventually building BI dashboards. That’s table stakes these days. The more exciting things now are machine learning and trying to get predictive with your data and incorporating external data sets. At the end of the day, we’re helping enterprises move up the mountain by being a catalyst for experimentation, trying some things, trying to connect their data to external analytics tools perhaps, or maybe they’ve hired an internal data science team that needs to pull data from the silos. There are a lot of different use cases but it’s all about moving up the food chain as a competitive move and to not allow your company to die.

It sounds a lot like you’re building the plumbing infrastructure. If we had a house, it’s like how do you move things from place to place and allow these systems to actually connect to each other versus having to build it themselves every time. Is that fair?

Yeah. Envision a world where the plumbing could be done by a non-professional through some modular system, a do-it-yourself plumbing system. That’s more what I would say we’re trying to build. Historically, if you want to get this kind of work done, it requires a person who’s pretty skilled from a tech standpoint. It’s really tough to find those kinds of people anywhere at this point, someone who knows how to use Spark or Kafka, all these technologies that are sometimes required. We bundle all that into the system and then provide a simple interface for the developer to take advantage of those tools.

You’ve also highlighted a key challenge for a lot of companies. You’re right, it’s hard to find the talent that can connect all of these technologies, particularly when you start mixing some of our older systems with the new technologies, as you mentioned, like Spark. If we were to try to find people, like in my world, that would know ERP and Spark, your pool just went down to probably a handful. You get down to much smaller sets of people that can do this. It’s intriguing as you talk about the do-it-yourself tech plumbing that effectively lets organizations handle that migration between those systems.

We want to encapsulate the complexity of connecting to legacy systems into a nice interface that these open-source devs are coming to expect. Like the modern open-source developer, someone who’s using React, I don’t know if you know too much about some cool frontend technologies that are happening. The backend is just not quite there yet. We’re still stuck in an era where there hasn’t been someone thinking about developer experience. That’s really what we’re trying to do on this backend, this data plumbing stuff: think through the developer experience, similar to the way the people who invented Ruby on Rails and Node did it for web application development, which is where I come from originally. I’m not a data guy. Historically, I’m an application developer who saw this growing need to do these data projects and realized that it’s not really been taken care of yet by the existing options.

Isn’t it amazing sometimes that we can have the pretty frontends, but what we actually have to do to make them work on the backside can be ugly, right?

Yeah. I can tell you, we’re hiring 40 people now and we’re having to hire a dedicated specialist for each of those components. We have a Spark person dedicated and that will eventually turn into a Spark team. We use something called Apache Airflow, which came out of Airbnb, in our platform. We use Mesos DC/OS from Mesosphere. There are all these components, and in a bigger company, there’s no way you’re hiring all those specialists. We’re trying to basically aggregate all the specialists, build a simple platform and just basically allow our customers to gain the advantage of a great platform rather than having to roll their own. It’s tempting and interesting to roll your own. It’s that or pay a big vendor for a closed system. But we’re finding that more and more people want to have a more open system, which is the way we view our platform.

As we think about it, it sounds like your target market or your target audience or the kinds of customers or industries that are interested, this is really targeted primarily towards IT organizations. Is that a fair statement?

It’s funny you ask that, because we’re the exact opposite of that. IT organizations are generally the brakeman, they’re pulling the brake. They’re always saying no. That’s not our way. We generally are interacting with the business leaders who want to put their organization’s data to use and they’re being blocked by internal inertia. We’re generally working with people on the edge of the org. More and more, we’ll start to see and have interactions with the central powers, but I view those people as trying to force order on a world that doesn’t want to be orderly. I see it as a force of chaos. I think organizations need to be experimental and try things and fail and that is, by definition, chaos. There are a lot of people in enterprise now who are onboard with that idea. Unfortunately, their IT org isn’t.

Part of it, as I’ve said, is that there’s a natural juxtaposition between businesses and IT organizations. In IT, a lot of the job is about keeping the lights on, making sure production doesn’t go down, and so on. I’m so busy doing that that assessing new technology is often a huge challenge, because I’m constrained enough keeping what I’ve got going. You’re right. A lot of times it can be “pull the brake” because we don’t have room for one more thing. On the business end, it’s always about, what is the next thing? How can I innovate more? How do I beat competitors to the market? There’s a natural tension between those two groups that’s just inevitable because they’ve got slightly different purposes. While IT is there to enable the business, it has to do it without risk because of the criticality of the systems. They’ve got these natural points of tension, but in a lot of IT organizations, it can be just “pull the brake” as opposed to enable and mitigate risk. It can be stop all production, don’t buy anything, which can really be like putting a choke hold on your business from an innovation perspective.

There’s a growing movement, I don’t know if you’ve heard of Bimodal IT, the idea that the org needs to have two IT organizations. One is keeping the lights on with the big things, and no one’s denying that’s important, but there also needs to be an agile, experimental part of the org. We’re working with some companies that have that dichotomy. They have literally two different squads in order to not hold the business back. It’s a smart way to think of it. It’s a lot more complicated but all the CIOs are trying. They’re really trying to help their organization win and be innovative. But it takes a new construct to be able to make that happen.

I haven’t heard it called Bimodal IT before, but we’ve also used that in the ERP world, because often you end up with exactly what you described: your teams that are keeping the lights on and keeping production running smoothly, and then your teams that are deploying new projects. That’s a pretty common one that we’ve seen in a lot of SAP organizations. I haven’t called it Bimodal IT before. It’s an interesting way of looking at it. As for the rest of the organization, I’m not sure how many organizations run that way, but it’s a natural thing in the world of ERP systems just because of the rollouts. In these organizations you’ve been working with, what are some of the initial challenges? Are they cultural? Are they practical, like change management? As you’ve worked with some of the companies that are trying to adopt this, what are your observations about what you’re seeing?

I talk about inertia a lot. Data that has been at rest wants to stay at rest. That inertia definitely exists. Airflow, which is this tool that we are using that Airbnb open-sourced, is all about running a big tree of data processing tasks on your data periodically and building that all out in code. Again, we’re very software focused. You’re basically creating value. Every time you transform raw data into some higher level data, you’ve extracted some value from it. Again, experimentation is really important. Sometimes your first few attempts might be complete and utter failures. We basically see that if companies can start to build out this tree of data processing that’s experimental and agile and changing all the time, which is exactly what we’re providing, there’s a new inertia in play. There’s the inertia of movement. Once everyone sees this data is in motion, it’s easier to envision ways to take advantage of it.

We think of it like a lot of times there’s this boulder that you’ve got to push up a hill and it takes a lot of effort to get moving even one inch per hour. But once you get it moving, it’s a lot easier to start accelerating. That’s the catalyst we’re trying to be with our customers. There are a lot of consultants who are also trying to do this, but they’re hampered by not having the right tools to make it happen. We have a big focus on teaming up with IT consultants, data science consultants, to give them a superpower too beyond directly talking with organizations.
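To make the idea of a tree of data processing tasks concrete, here is a minimal Python sketch. It is not Airflow's API and not Astronomer's code: the task names, the sample rows, and the helpers `run_order` and `run_pipeline` are all hypothetical, and it uses only the standard library's `graphlib` module (Python 3.9+) to order tasks by dependency.

```python
from graphlib import TopologicalSorter

# Hypothetical task tree: each task maps to the set of tasks it depends on.
TASKS = {
    "extract_orders": set(),
    "extract_customers": set(),
    "clean_orders": {"extract_orders"},
    "join_orders_customers": {"clean_orders", "extract_customers"},
    "daily_revenue_report": {"join_orders_customers"},
}


def extract_orders(_inputs):
    # Made-up raw rows; one amount is deliberately not numeric.
    return [
        {"customer_id": 1, "amount": "20.0"},
        {"customer_id": 2, "amount": "n/a"},
        {"customer_id": 1, "amount": "5.5"},
    ]


def extract_customers(_inputs):
    return {1: "Acme", 2: "Globex"}


def clean_orders(inputs):
    # Keep only rows whose amount parses as a number.
    cleaned = []
    for row in inputs["extract_orders"]:
        try:
            cleaned.append({**row, "amount": float(row["amount"])})
        except ValueError:
            continue
    return cleaned


def join_orders_customers(inputs):
    names = inputs["extract_customers"]
    return [
        {**row, "customer": names[row["customer_id"]]}
        for row in inputs["clean_orders"]
    ]


def daily_revenue_report(inputs):
    # The "higher level data": total revenue per customer.
    totals = {}
    for row in inputs["join_orders_customers"]:
        totals[row["customer"]] = totals.get(row["customer"], 0.0) + row["amount"]
    return totals


ACTIONS = {
    "extract_orders": extract_orders,
    "extract_customers": extract_customers,
    "clean_orders": clean_orders,
    "join_orders_customers": join_orders_customers,
    "daily_revenue_report": daily_revenue_report,
}


def run_order(tasks):
    """Return one valid order in which every task follows its dependencies."""
    return list(TopologicalSorter(tasks).static_order())


def run_pipeline(tasks, actions):
    """Run each task in dependency order, handing it the results it depends on."""
    results = {}
    for name in run_order(tasks):
        inputs = {dep: results[dep] for dep in tasks[name]}
        results[name] = actions[name](inputs)
    return results
```

Calling `run_pipeline(TASKS, ACTIONS)` runs the extract steps before the steps that depend on them and the report last. Trying a new experimental task means adding one entry to each dictionary, which is the kind of cheap iteration described above.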

The world I come from and the world most of my guests come from for Thriving at the Crossroads is we’re dealing with these ERP systems as well. Have you guys built a lot of plug-ins to the ERP systems yet? Where are you at just in terms of the maturity curve in terms of what you’re building?

We build exactly the connectors that our paying customers want and need. I know PeopleSoft, we’ve connected to that. Is that an ERP?

Yes, that’s an ERP. You’ve got one.

The reason we call ourselves a data routing platform rather than something that sounds more sophisticated is we really don’t want to know what you’re going to do with the data. That’s not our domain. We’re like FedEx. We want to deliver the box and then go get the next box for you. There are people that know what the data needs to do and all that. We’ve built a few integrations with ERPs. We built them with databases. That’s obviously another popular source: if you have data in Oracle and SQL Server and MySQL and you want to pull those together for some dashboard, our platform is great for that.

We’re also connecting to a lot of SaaS APIs, like Salesforce and MailChimp or whatever. There’s this huge explosion of enterprise SaaS. Every enterprise SaaS product that you use is a new data silo that you’ve created. It’s worthwhile to do it but you just have to recognize that your data is becoming more spread out. As these new applications for data require you to centralize the data, the two are at odds. We just think there’s a growing opportunity and need to have a very light product that will help pull the data together, whether it’s into a data lake or into Hadoop. We don’t care where the data goes, but it does need to go somewhere.
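The "deliver the box" idea of pulling several silos into one place can be sketched roughly as follows. This is an illustration only: in-memory SQLite databases stand in for sources such as Oracle, SQL Server, or MySQL, and the table names, the `contacts` warehouse table, and the `make_source` and `route` helpers are assumptions made for the example, not part of any Astronomer product.

```python
import sqlite3


def make_source(table, rows):
    """Create an in-memory SQLite database standing in for one source silo."""
    conn = sqlite3.connect(":memory:")
    # Table names here are trusted constants from this example, not user input.
    conn.execute(f"CREATE TABLE {table} (id INTEGER, email TEXT)")
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?)", rows)
    conn.commit()
    return conn


def route(sources, warehouse):
    """Copy every row from each source into one warehouse table.

    `sources` maps a source label to a (connection, table name) pair.
    Each copied row is tagged with its label so its origin stays visible.
    Returns the number of rows copied.
    """
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS contacts "
        "(source TEXT, source_id INTEGER, email TEXT)"
    )
    copied = 0
    for label, (conn, table) in sources.items():
        rows = conn.execute(f"SELECT id, email FROM {table}").fetchall()
        warehouse.executemany(
            "INSERT INTO contacts VALUES (?, ?, ?)",
            [(label, row_id, email) for row_id, email in rows],
        )
        copied += len(rows)
    warehouse.commit()
    return copied


if __name__ == "__main__":
    demo_sources = {
        "crm": (make_source("accounts", [(1, "a@example.com")]), "accounts"),
        "billing": (make_source("customers", [(7, "a@example.com")]), "customers"),
    }
    demo_warehouse = sqlite3.connect(":memory:")
    print(route(demo_sources, demo_warehouse), "rows routed")
```

The router does not interpret the rows; it only moves them and records where they came from, which mirrors the point that what happens to the data afterward is someone else's domain.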

Effectively, you also highlighted something I was thinking about as you were talking about this being the plumbing and connecting things, that it does become a real challenge when we start to use more of these software-as-a-service providers. Because you have done exactly that, isolated the data, or maybe it can come out in an Excel spreadsheet. But then you can often run into a challenge, depending on who the provider is and how sophisticated the tools are, to get your data out again so that you can do more with it, right?

Yeah. A lot of those vendors, they want lock-in, and for good reasons. There are a lot of ways to have apps talk to each other. You can get your Salesforce to talk to Workday. The thing is you still don’t have a copy of that data, so by integrating those things, you’ve actually locked yourself in a little bit further. We’re focused on the analytics, the data science side of things. You can’t analyze that data as it sits out there. You’ve got to pull data from both of those down into a place where your data scientists can access it.

Again, the world’s getting very complicated, so I’m not going to deny that there’s a need to integrate data from point to point, and there’s also a need to integrate data back down to a central analytical warehouse too. Unfortunately, you have to do all these things these days with IT. It’s going to be a great place to work for quite some time. It’s getting more complicated.

Since you guys are a startup and you’re coming out of Ohio, how has the market adoption been and how are you doing customer-wise? Do you have your first customers yet? It sounds like you definitely have some customers because you’re working with IT organizations. Where are you at in terms of your overall customer base so far?

We have 25 customers so far, which we’re pretty proud of. Half of them are regional, near us, a few hometown customers, some of the big companies in Cincinnati, some of the big startups in Cincinnati are our customers. We’re also now seeing a lot of inbound interest from elsewhere. We have several international customers now, and I think that will continue.

From a product development standpoint, we have a three-phased product development plan and we’re just exiting phase one and starting phase two. We’ll launch a new version of our platform late this summer. That’s the first step in our phase two, which is basically opening up a platform for external developers to use it directly without having to have a big relationship with us. In our phase one, we were building pipes on behalf of our customers, full service, as we built out the platform. Now, we’ve got that pretty well built out, but it’s not quite polished enough to ask a developer to find this at 1:00 AM and have a pipeline up by 2:00 AM. That’s our vision, that it’s going to be easy. You’re a junior developer and someone asks you to do something that you really don’t want to do, for one, and maybe don’t even know how to do, and you’re going to find us as a magical answer to that problem and actually have it be a fun process.

It sounds like speed to implementation is also an important consideration in what you’re thinking about, how do we help organizations with this plumbing connection just get up and running faster?

That’s a key to experimentation. Imagine a customer says, “We need to get our Salesforce data into our Hadoop.” You could spend a lot of time asking, what exact data do you want? How do you want it? That’s what a consultant would do. That’s certainly what an internal IT person would do. For us, we basically say there’s probably three or four standard ways of pulling that data. Maybe there’s a small, medium, large integration recipe. You just pick one of those three and hit go and it will start flowing. It might not be perfect but it basically lets you get the ball rolling and figure out what does perfect look like. Again, it’s an agile approach. It’s an experimental approach that we advocate. We’ll have what we call happy path that you can just click and turn on and then iterate from there. That’s our vision for how the platform is going to work. It’s not there quite yet, but we’re basically looking for Alpha customers now who are willing to work with us despite our rough edges. That’s where we are.
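One way to picture the small, medium, and large recipes is as named presets that decide how much of a source to pull. The sketch below is hypothetical: the `Recipe` class, the preset contents, and the table and column names are invented for illustration and say nothing about how Astronomer's platform actually defines its recipes.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Recipe:
    """A preset describing how much of a source to pull (hypothetical)."""
    tables: Tuple[str, ...]
    columns: Optional[Tuple[str, ...]]  # None means "every column"


# Made-up presets; a real platform would define its own.
RECIPES: Dict[str, Recipe] = {
    "small": Recipe(tables=("accounts",), columns=("id", "name")),
    "medium": Recipe(tables=("accounts", "contacts"), columns=("id", "name", "email")),
    "large": Recipe(tables=("accounts", "contacts", "deals"), columns=None),
}


def apply_recipe(size: str, source: Dict[str, List[dict]]) -> Dict[str, List[dict]]:
    """Select the tables and columns named by a recipe from an in-memory source."""
    recipe = RECIPES[size]
    output: Dict[str, List[dict]] = {}
    for table in recipe.tables:
        rows = source.get(table, [])
        if recipe.columns is None:
            # Pull everything the table has.
            output[table] = [dict(row) for row in rows]
        else:
            # Keep only the listed columns that the row actually contains.
            output[table] = [
                {col: row[col] for col in recipe.columns if col in row}
                for row in rows
            ]
    return output
```

Starting with `apply_recipe("small", source)` and moving to a larger preset later matches the iterate-from-a-happy-path approach: the first pull may not be perfect, but it gets data flowing so the team can work out what perfect looks like.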

Is there anything I should have asked you that I haven’t asked you so far?

No. These have been great questions. I’m really intrigued. I’m not an ERP person. We’re outsiders in the space, but we know what a good developer experience looks like and it’s not any of the current options. It’s fun for me to get to know some of the people. It’s a little bit scary too. There are a lot of legacy competitors that have been doing this stuff for 20 years. Actually since data existed, I’m sure there’s been data integration. That’s the trick. If you were going to ask me, “Who do you see as your top competitors?” I would have cringed at that question, so I’m glad you didn’t ask that.

I was like, “You’re a startup. We don’t need to talk about your competition. We’re talking about what you’re doing and what you think about you.”

We have a competitors list and I add a new company a day, I would say.

I can imagine. They come and they go, competitors based on which plug-ins and platforms and integrations they provide today. It would change all of the time. I have one final question for you. I am a travel person, I love to travel. I always like to ask, what is your favorite travel destination you’ve ever been to and why?

I have a two-part answer. The lame answer is I love going to San Francisco. I’m a tech startup guy, so of course I like going to Mecca and meeting with all the celebrities. It’s super exciting when I go there, plus I think it’s a great city. I love the air there. I know it’s a weird thing to say, but here in the center of the country, sometimes humidity can get a little excessive. Out there it’s just refreshingly low humidity all the time. I love it and the temperature.

The second part of my answer is I’m going to Ireland for the first time next month. I have a feeling it’s going to win. I have a feeling it’s going to be my new favorite.

I think it’s going to rocket up on your list. Having been there myself, it is a fantastic country, fabulous people, I love Ireland. We wish you safe travels. Thank you so much for joining us today on the show, Ry.

Thank you.

Did you know at ConsultAce we do monthly webinars on all things SAP? If you’re curious to learn more, check out our website at ConsultAce.biz/resources/webinars.