In our Start Up blog series, we dive deep into the stories of influential figures in the data science industry to uncover their unique journeys and success stories. In this blog, we have an exclusive interview featuring Tarush Aggarwal, a trailblazing expert in data-driven organisational growth.
A Carnegie Mellon graduate in Computer Engineering, Tarush got his start as the first Data Engineer on Salesforce’s analytics team back in 2011. In those early days of data analytics, he developed a critical log metric framework that enabled Salesforce to analyse customer data and establish industry benchmarks. Fast forward to more recent times, Tarush spearheaded the data function at WeWork, one of the world’s fastest-growing companies. Eventually, he took his expertise and founded The 5x Company in 2020, a venture that empowers entrepreneurs to scale their businesses effectively.
Stay tuned as we delve into Tarush’s inspiring journey and learn valuable insights from his experience in the ever-evolving world of data science.
What would you say, then, are the fundamentals for a startup in getting the data function built right?
I’m convinced that an early stage data team has only got one job and that is to allow the business to answer questions for itself. If the business can answer questions for itself, all of a sudden, every employee in the company now has the autonomy to make a decision for themselves without depending on the data team. If they depend on the data team, you are setting this habit which is not scalable, which means as your organisation starts to scale, you’re going to have to scale your data team. And that doesn’t really work. So the first thing a data team really has to do is set up reporting in such a way that the business can answer questions for themselves.
Once you do that, and the sweet spot, I would say, is answering 80% of questions in a self-service way, the test is this: if you have an intern who joins tomorrow, can this person answer complex questions on your go-to-market strategy, or on how your customers are using your product, in a self-service way? Once you set up this foundation, now, all of a sudden, the data team can focus on some of the needle-moving work, because the organisation is now self-sufficient and doesn’t depend on the data team to go answer these questions for it. If you initially jump to the recommendations or the insights, what you’re essentially doing is just working on ad hoc analysis for the business. And, just like all of the stakeholders who depend on data, you’re just going back to the raw data and trying to work on an analysis or answer questions for the business. And that breaks very, very quickly.
What this then means is, if an early stage data team is just responsible for setting up the organisation for self-service, then how does a data team get to this point? And in order to do that, there’s a three-step process, right? The first thing you need to do is ingest your data from all your different data systems into a central place. Even in a small business, you might have multiple data sources. You might have your application database, which is responsible for your website or your app. You might have marketing data on Facebook or some other tools. You might have user interaction data in something like Mixpanel. You might have customer information in a CRM like Salesforce. At WeWork we had over 200 different data sources. We were dealing with physical data plus online data. So even for a small business, it’s not uncommon to have 10 different data sources.
If you have to manually go and pull this data every time, it’s just not going to work. Right. And a few years ago, 60, 70, even 80% of the time of data teams was really spent on building these pipelines and moving data. So, you know, today with tools like Fivetran, you can now really start to automate this process. So the first thing you have to do is start to ingest your data centrally. Now, once you have this data, you have all of this raw data inside your warehouse. This is not data which you want to answer questions from. Number one, it’s structured in a way which makes sense for the different applications, not for answering business questions. And number two, this data can change without any notice. So if you build an analysis on top of it and your engineer changes something in your source system, it’s going to break your analysis.
So the second thing we have to do is come up with a new layer, a new data model, which is really built in a way to answer questions for the business. Now, what are the questions you’re trying to answer? What are the questions around your go-to-market strategy? What are the questions around your product, around how your customers are using your product? Figure out all those questions and then design a data model, or a few of these data models, which can answer 80% of these questions. There’s no need to optimize for 100%, but again, can you answer most of these questions from this data model? Once you design this data model, you can now work forward from your raw data and build these transformations. You’ve now created a business layer, which is insulated from the raw data changing.
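As a rough illustration of such a business layer, here is a minimal sketch in Python using the standard-library `sqlite3` module as a stand-in for a warehouse. The table names (`raw_app_users`, `raw_crm_accounts`), the `customers` view, and the helper functions are all hypothetical, chosen only for this example; a real setup would use whatever warehouse and transformation tooling the team has.

```python
import sqlite3


def build_business_layer(conn: sqlite3.Connection) -> None:
    """Create a business-facing view on top of the raw source tables.

    This is the only place that knows the raw column names. If a source
    system changes, this mapping is what gets updated, not every analysis.
    """
    conn.executescript(
        """
        DROP VIEW IF EXISTS customers;
        CREATE VIEW customers AS
        SELECT u.id        AS customer_id,
               u.plan_code AS plan,
               c.region    AS region
        FROM raw_app_users AS u
        LEFT JOIN raw_crm_accounts AS c ON c.user_id = u.id;
        """
    )


def customers_per_plan(conn: sqlite3.Connection) -> dict:
    """An analysis that reads only from the business layer, never raw tables."""
    rows = conn.execute(
        "SELECT plan, COUNT(*) FROM customers GROUP BY plan ORDER BY plan"
    ).fetchall()
    return dict(rows)


# Hypothetical raw data, as an ingestion tool might land it in the warehouse.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE raw_app_users (id INTEGER, plan_code TEXT);
    CREATE TABLE raw_crm_accounts (user_id INTEGER, region TEXT);
    INSERT INTO raw_app_users VALUES (1, 'free'), (2, 'pro'), (3, 'pro');
    INSERT INTO raw_crm_accounts VALUES (1, 'EU'), (2, 'US');
    """
)

build_business_layer(conn)
result = customers_per_plan(conn)
print(result)
```

The design choice the sketch tries to show is that analyses query `customers`, so a change in a raw table only requires updating the one view definition.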
This is step two. And the third step is you now invest in these self-service tools and set them up in a way such that non-technical users can start to answer complicated questions from this data model. And this data model is really where your data scientists or your data analysts work from. They don’t go all the way back to the raw data and work from there. Because if they do that, now you have all of the same problems, where if the source data changes it breaks. Or let’s say you change some sort of business metric. Now you have to go and change every single job which had its own version of that metric, instead of being able to change it in a single place. These are the three steps: ingestion, modeling, and then self-service, and this should be the only thing an early stage data team is focused on. Once you can do this, now go worry about data science, recommendations and insights.
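The point about changing a metric in a single place can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the `Order` record, the `net_revenue` definition, the `METRICS` registry, and the two report functions are hypothetical names, not part of any tool mentioned in the interview.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Order:
    """A hypothetical row from the modeled (business layer) orders table."""
    customer_id: int
    amount: float
    refunded: bool


def net_revenue(orders: List[Order]) -> float:
    """Single definition of the metric: revenue excluding refunded orders."""
    return sum(o.amount for o in orders if not o.refunded)


# One shared registry. Every report looks the metric up here rather than
# re-implementing its own version of the calculation.
METRICS: Dict[str, Callable[[List[Order]], float]] = {
    "net_revenue": net_revenue,
}


def marketing_report(orders: List[Order]) -> dict:
    return {"team": "marketing", "net_revenue": METRICS["net_revenue"](orders)}


def board_report(orders: List[Order]) -> dict:
    return {"team": "board", "net_revenue": METRICS["net_revenue"](orders)}


orders = [
    Order(customer_id=1, amount=100.0, refunded=False),
    Order(customer_id=2, amount=50.0, refunded=True),
    Order(customer_id=3, amount=25.0, refunded=False),
]

marketing = marketing_report(orders)
board = board_report(orders)
print(marketing, board)
```

If the business later decides that refunds should count differently, only `net_revenue` changes, and both reports pick up the new definition without being edited.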
Are the same fundamentals true for large companies?
Large companies split themselves up into different missions, tribes, and chapters. So yes, it is true because again, each chapter can have this fundamental data model.
And this is really how some of the big tech companies go build products. They start off with having small teams and these teams have autonomy on the area. They also are able to ingest all the data, which they depend on, model it in a way to answer questions for themselves, and then allow anyone in those squads to answer questions. And as they scale up, they might have different layers underneath them, but it follows the same general structure. Once you set up in this way, you can support an organisation of a thousand or one. It’s the same thing. It’s the same pyramid. And it’s the same three step process at every different part of the stack.
Now, do you see companies moving to adopt this approach, or is there pushback from competing methodologies?
I think what we’re seeing is that once these organisations hit a certain point they bring on experienced leaders. These experienced leaders are now organising the organisation into this approach with missions and chapters. And with this approach, you naturally tend to work in a way where teams have autonomy over their area. And then the data teams naturally sort of follow in that paradigm. What’s happening is for a lot of the smaller and medium scale organisations they don’t have this awareness as yet. So they are really sort of continuing to add more and more complexity into their stack and at some point it all comes crashing down, right? At some point it becomes very, very difficult to change a small metric. And I’m sure you’ve seen this.
That’s not a problem with the people you have or with their intelligence; it’s just a problem because of how teams are structured. You know, you start spending 80% of your time on the small mundane tasks instead of 80% of the time on the needle-moving work. That’s because you have interdependencies everywhere. So, you know, we haven’t quite seen small companies move in this sort of direction as yet, and that’s something which I’m personally very invested in helping companies with.
I think also at a sort of personal level, if the data engineers or the data scientists are starting to spend a lot of time doing that stuff, I think that can lead to a bit of dissatisfaction as well. You know, it’s not healthy from that perspective either.
I mean, for sure. Being on the data team, you’re trying to discover hidden insights in your data and really enable the organisation to figure out what to build next. Instead of that, if you’re spending time answering ad hoc questions for your marketing team, or if the CEO’s got a board meeting the next day and you’re sort of staying up late because you need to put together some metrics, that’s a high-stress situation. And I would bet it’s not what you had in mind when you signed up for the role. It doesn’t have to be that way. Right. What’s really sad is that there is a best practice for how to set things up so you don’t ever get to that point. And that’s something which is missed by the majority of companies and the majority of data teams.