How not to be data driven as an org
I have spent a considerable amount of time in various organizations as part of their data teams. During this time, I had the opportunity to work closely with data people [Analytics, Decision Sciences, Analytics Engineers, Data Scientists, Applied Scientists, Machine Learning Engineers, Data Engineers, Data Analysts etc]. I also had the opportunity to work with senior leaders, in the technology space and otherwise. Today, while thinking through what makes or breaks an org's attempt to be data driven, I thought I would jot down some of the obvious pointers.
Being data driven doesn’t mean intuitions are bad - Intuitions have a place in decision making and I am not questioning that. The question, however, is whether you are ready to put your intuitions to the test. Are your intuitions well informed? If you are not ready to accept data which goes against your intuition, then there is an obvious problem. If the data leaders or business leaders are not ready to put their intuitions to the test, that is one surefire way of being non-data driven. A highly opinionated data leader might not be the best for the organization either. Some highly opinionated, business-facing data leaders can make the data say what their opinion is, instead of letting the data speak for itself.
Metric definition and metric transparency are sacrosanct - This is a problem which every small, medium, and large company deals with. Various teams, and even sub-teams, have different definitions of the same metric. This can lead to a complete breakdown of company strategy. For two different values of the same metric, executive decision making can be very different: while your tech team, your product team, and your data team might be solving a problem assuming one value (or even definition) of the metric, the execs might be taking a strategic decision assuming a very different value. Transparently putting metric definitions in an extremely visible place can go a long way towards not going wrong on being data driven.
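As a minimal sketch of what "one visible place" could look like, here is a hypothetical metric registry kept in version control that every team reads from. The metric name, definition, owner, and SQL below are illustrative assumptions, not taken from any real company.

```python
# Hypothetical single source of truth for metric definitions.
# Every dashboard, report, and pipeline pulls from here, so no team
# can silently redefine a metric. All entries below are illustrative.
METRICS = {
    "weekly_active_users": {
        "definition": "distinct users with at least 1 session in the trailing 7 days",
        "owner": "analytics",
        "sql": (
            "SELECT COUNT(DISTINCT user_id) FROM sessions "
            "WHERE session_start >= CURRENT_DATE - INTERVAL '7 days'"
        ),
    },
}

def describe(metric_name: str) -> str:
    """Return a one-line, human-readable definition for dashboards and docs."""
    m = METRICS[metric_name]
    return f"{metric_name}: {m['definition']} (owner: {m['owner']})"
```

The exact storage (a Python module, a YAML file, a metrics layer in your BI tool) matters far less than the fact that there is exactly one of it and everyone can see it.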
Invest early in thinking through experimentation - While many older companies might have been built without experimentation, the companies of today have the advantage of massive, large-scale, trustworthy experimentation. This is not to say that you need a full-fledged experimentation platform up and running before your Series A, but a small amount of thought here can go a long way in increasing your rate of experimentation in the future. Data forms the basis for it. For example, hashing your consumers (or whatever your entity of interest is) into equal buckets and rehashing them every few days is something you might want to invest in as soon as you hit 100k users. My experience is that you can get a 10X bump in your future rate of experimentation by having something as simple as this operational. Not thinking through the data which will power your experiments of the future is a recipe for plateaued growth.
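The bucketing described above can be sketched in a few lines. This is a minimal illustration, assuming user IDs are strings and a salt string per experiment cycle; the names and the choice of 100 buckets are my own, not from the post.

```python
import hashlib

def bucket(user_id: str, salt: str, n_buckets: int = 100) -> int:
    """Deterministically map a user to one of n_buckets.

    The same (user_id, salt) pair always lands in the same bucket, so
    assignment is stable for the life of an experiment. Changing the
    salt "rehashes" everyone into fresh buckets, which avoids carryover
    effects from one experiment cycle to the next.
    """
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % n_buckets
```

An experiment then simply claims a disjoint set of bucket IDs for control and another for treatment; running many experiments in parallel becomes a matter of bucket bookkeeping rather than new infrastructure.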
Hiring the data team can be thought through - Hiring a very large Machine Learning team on getting a Series A is not the smartest thing to do, unless your product is based on Machine Learning alone. For a B2C or B2B company which is not all about Machine Learning, the order of hiring can be: a few data analysts first; then, around your Series A, you have the luxury of adding a few data platform engineers; and once you have finalized your first good ML problem statement (generative, recommendations etc) you can hire a few ML scientists.
One of the things which can certainly go wrong with the data team is a love for their data product instead of the metrics of interest. If your ML/data team is more hung up on the algorithms than on the metric of interest, that is another route to the failure of the team.
How many analysts, DEs, and ML scientists you need in a company is a good question, and can be learnt from the more successful companies. 1 (ML) : 2 (DE) : 4 (Analytics Engg) is a good baseline to start with as you head towards your Series B. How lean your data team (or engg team) should be is a very complicated question. On the one hand, lean teams are oftentimes more productive, but then there is also the question of creating too many dependencies on too few people. This applies to software engineering too. Do I know the answer to this question? Not at all.
Build vs Buy - In a rapidly growing org, the initial data platform should always be bought. In fact, bring in a bunch of people who will help you buy the right stuff and plug the pieces together to get you up and running. Stick to the defaults provided by your cloud provider as much as possible. As you hit your Series C, and reach ~5-10 million users, you should start your journey towards building in-house tooling. At this point, if you don't invest in an in-house platform, your costs will bite you. So, up until your Series B, the idea is to work with the default choices of your cloud provider. Moving from one cloud to another is a productivity killer, by the way. A lot of companies fall into that trap; it should not happen at all if possible, or should happen at a much later stage. Either way, moving cloud providers is no fun.
Data and metric literacy across the company - Data and metric literacy amongst product leaders, executives, and even data team members can be very low. I have seen product teams which don't understand metric definitions, engineering teams which have no clue about the metrics they drive, and even data teams which are more hung up on their work than on the metric to move. Data literacy across the org has to be pushed and reinforced as many times as needed. At the risk of sounding autocratic, a distribution of how often employees (from the CEO to SDE1s) look at the standard dashboards is a good measure of whether you are moving towards becoming data driven or NOT.
Sessions around data/metric literacy, including reading experiment results (statistical significance and practical significance for the data/product teams, and things such as Sample Ratio Mismatch for the data team), are vital. Having data engineers who are clueless about the business data is a way to create multiple bugs in your ELTs. Haven't you seen such data engineering teams? Haven't you met data science/analytics leaders who fundamentally fail to understand the Central Limit Theorem? These are all clues to the data puzzle.
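For readers unfamiliar with Sample Ratio Mismatch: it is the situation where the observed split between experiment arms differs from the intended split (say 50/50) by more than chance allows, which usually signals a bug in assignment or logging. A common way to detect it is a chi-square goodness-of-fit test; here is a small self-contained sketch, with the function name, default ratio, and alpha threshold as my own illustrative choices.

```python
import math

def srm_check(control: int, treatment: int,
              expected_ratio: float = 0.5, alpha: float = 0.001) -> bool:
    """Flag a Sample Ratio Mismatch in a two-arm experiment.

    Runs a chi-square goodness-of-fit test (1 degree of freedom) on the
    observed arm counts versus the intended split. Returns True when the
    split is suspiciously far from expected, i.e. the experiment's
    results should not be trusted until the assignment bug is found.
    """
    total = control + treatment
    expected_control = total * expected_ratio
    expected_treatment = total * (1 - expected_ratio)
    stat = ((control - expected_control) ** 2 / expected_control
            + (treatment - expected_treatment) ** 2 / expected_treatment)
    # Survival function of a chi-square with 1 df, via the error function.
    p_value = math.erfc(math.sqrt(stat / 2))
    return p_value < alpha
```

For example, 50,000 vs 50,300 users is well within chance for a 50/50 split, while 50,000 vs 53,000 is not and should stop the analysis cold.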
Thanks a lot for bearing with the first edition of my writings on various things data. Stay tuned for more such posts in the future.