BD #19 - Use Rules of Thumb for Better Team Decision Making
Nothing saps time and energy like arguing about the same things with your team. Setting and using rules of thumb can empower you to do more, faster.
It’s 08:55, Monday morning, late 2016. I’ve just walked into the office of my first ‘official’ data science job. I’ve done the doctorate, I’ve learnt the maths and stats, I’ve taught myself Python and SQL, been through Andrew Ng’s machine learning course, and practised my random forests and clustering. I feel prepared. My last few years working with machine learning and computational physics have primed me for this — the perfect mix of science-like exploration and technical challenges.
I’ve got this.
Or so I think.
I struggled. Struggled to make an impact. Struggled to deliver value. Struggled to get off the ground with things.
It felt like I was spinning my wheels. I’m doing all the stuff I’ve spent the last few years reading that "makes you a good data scientist". I’m using my hard-earned skills to test some hypotheses. Communicating with stakeholders (or so I thought). Building things.
Why wasn’t this easy?
I’m a huge believer in full-on ownership of things — if something’s not quite right I think about what I’m doing wrong. It’s most often NOT the tools or the environment, and you can do something to affect the outcome.
Eventually, I took a step back and looked hard at what I was doing. I thought about solving it like an abstract problem. Where were the cracks and leaky pipes in my process?
It turns out I was focusing on the wrong things. Sometimes even struggling to decide what to improve or tweak next. This drastically slowed me down, wasted time and effort, and led to plenty of frustration.
I didn’t find all the answers right away. In fact, many of the rules I’m going to point out below took me years of analysis and self-reflection to uncover. These heuristics come from years of headache and experience. There are many more, but these are good starters for those beginning their journey.
I’ve found using and sticking to simple rules to be invaluable. They helped me as a practitioner and individual contributor, and I think they can be even more powerful when locked into a team's understanding of how to work. Setting rules like this can empower less senior people to make good decisions in line with the team's approach and mentality. A section setting simple rules like these is great to run in a Same Page Tool session where you're getting your team aligned and on track with ways of working.
Here are three of my personal favourites that I've found help most teams get going quickly.
1. Usefulness > Complexity
When starting, it’s too easy to get swept away with building ever more complicated solutions in the eternal quest for a better model. I remember reading everything I could on Kaggle about the hidden dangers of model stacking. I started layering more complex models on top of one another to improve whatever metrics I was optimising for.
This approach does several things that severely damage your outputs:
Cognitive load: it becomes difficult and eventually intractable to fully understand how your model pieces together and how you should next adjust your approach.
Explainability: with every sliver of additional complexity, you’re making your models more opaque and more confusing to others. It becomes impossible for stakeholders to trust them.
Tunnel vision: you get lost in the model for the model’s sake, like some weird game where all that matters is the high score. Often, it’s more important to zoom out and see how this will be used or to consider other approaches to the desired outcome.
Overwhelm and fatigue: eventually, you go home for the weekend or take some other natural break from work, and when you come back, the whole thing seems super daunting. Or maybe it goes from being fun to a grind and starts to wear you down. You’ve been working on the same thing for what seems like ages.
This can all lead to weeks of hard work getting shelved or binned. Priorities change or things move on. The team concludes it’s just too hard and you chalk it up to ‘lessons learnt’. You’ve delivered nothing.
How do you combat this? Ask yourself:
"What’s the most useful thing I can deliver and how quickly can I deliver it?"
What if you just put a super-simple linear regression model out in a day? This leads to a common phrase you’ll hear variants of bandied about in a lot of blogs and conference talks:
“Average in production beats excellent on the shelf.”
Focusing on what’s useful completely changes the frame. A few percentage points of accuracy will rarely beat getting something into the hands of the end user in a tenth of the time (for a great read on when that 1% makes all the difference, I point you to Peter Cotton's excellent article here). As you build more and more solutions, you’ll realise that requirements almost always change as soon as you give the finished product to the end user.
Why not cut out the tricky middle bit?
This approach will drastically change your contribution to the team. You’ll be able to deliver real value much faster and iterate towards complexity with solid feedback along the way.
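To make the "simple model out in a day" idea concrete, here is a minimal sketch of a one-feature linear regression in plain Python. The function names (fit_line, predict) and the toy ad-spend data are made up for illustration; in practice you would likely reach for whatever library your team already uses.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for a single feature: returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def predict(model, x):
    """Apply a fitted (slope, intercept) pair to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical toy data: weekly ad spend against weekly sales.
spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.0, 5.0, 7.0, 9.0, 11.0]

model = fit_line(spend, sales)
forecast = predict(model, 6.0)
```

Something this small is easy to explain to a stakeholder and gives you a working end-to-end result to gather feedback on before you add any complexity.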
2. Data Quality > Hyperparameter Tuning
This one will become a massive thorn in your side almost immediately. The problem is it takes a while to spot it. Like the Emperor’s new clothes, no one is confident enough to point it out, and we just keep doing what’s expected of us.
I’m talking about data quality.
I’m glad to see many industry leaders now looking at pushing ‘data-centric AI’ and the importance of data quality. It’s a hugely broad topic that’s capturing the attention of many.
It’s not the sexy stuff, though, is it? Why is that?
When we get into data science, there’s something about tweaking loss functions and playing with epochs that appeals to us. It makes you feel like a pilot or something pulling off some complex precision manoeuvre. The thing is, hyperparameters will only ever take you so far. Furthermore, there are plenty of tools and libraries out there now — like HyperOpt — that’ll brute force the fun out of all that anyway.
The big wins come when you start to dig into the ugly stuff.
Why is this data such a mess?
On the surface, looking for rogue whitespace and wonky date formats doesn’t feel like the exciting career in data science I was hoping for. When you get into it, you realise data quality becomes much more. It’s a challenge that requires problem-solving skills, analytical thinking, coding, and domain knowledge (wait — that does sound like data science?!).
Getting stuck into your data quality is often the biggest bang for your buck when it comes to improving model performance. And it’s a messy enough topic that no single tool will solve it for you at the click of a button.
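As a rough illustration of the "rogue whitespace and wonky date formats" kind of check, here is a small sketch of a data quality audit. The accepted date formats, the field names, and the sample rows are all assumptions made up for this example; a real audit would be driven by your own data and domain knowledge.

```python
from datetime import datetime

# Hypothetical list of date formats we are willing to accept.
KNOWN_DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def parse_date(raw):
    """Return a date parsed from any known format, or None if nothing matches."""
    text = raw.strip()
    for fmt in KNOWN_DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None


def audit_rows(rows, date_field):
    """Count rows with stray whitespace and rows whose date cannot be parsed."""
    report = {"rows": 0, "whitespace": 0, "bad_dates": 0}
    for row in rows:
        report["rows"] += 1
        # Flag any string value with leading or trailing whitespace.
        if any(isinstance(v, str) and v != v.strip() for v in row.values()):
            report["whitespace"] += 1
        if parse_date(str(row.get(date_field, ""))) is None:
            report["bad_dates"] += 1
    return report


# Made-up sample records for illustration.
rows = [
    {"customer": "Acme ", "signup": "2021-03-04"},
    {"customer": "Globex", "signup": "04/03/2021"},
    {"customer": " Initech", "signup": "March 4th"},
]
report = audit_rows(rows, "signup")
```

Even a crude count like this tells you where to dig first, and that is usually where the domain questions start.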
3. Simplicity > Novelty
We’ve all been there: it’s 02:12 in the morning, and you’ve been trawling Kaggle notebooks all night. You’ve seen something incredible that you’re just eager to try. You head to work tomorrow and convince yourself, your team, or whoever that this is the solution to your problem. The fact that it’s something Escher might call ‘slightly complicated and unclear’ doesn’t faze you. You’re a data scientist, and this stuff is supposed to be complicated.
It takes you just a bit longer than you thought, but you’ll get it done tomorrow.
Then a bit longer.
Then you get stuck.
Now plenty of time has gone past, and you’ve lost track of where you started. You’re reaching out to the community because this doesn’t match the expected outputs. It’s so fancy, why doesn’t it perform?
This can be a hard one to swallow, but you’ll notice many more senior data professionals have all completed some version of this journey. You start your career eager to use the best and most elaborate tools, and let’s be honest, they’re fun. As you progress though, you start to realise there’s so much more than the modelling, and the fancy part can always come later.
Just get something that works!
In fact, it doesn’t need to work. Anything will do. In one company, I used to tell my team to use a random number generator as the model. This meant that all the ETL, monitoring, deployment, logging etc., could be set up and, most importantly, the scoring and tracking process. Then, when it all works, you already have a nice low target to beat.
Iterate from there.
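Here is a minimal sketch of the random-number-generator idea, assuming a binary classification task scored on accuracy. The class name RandomModel, the accuracy helper, and the synthetic examples are all hypothetical; the point is only that the scoring and tracking plumbing can be exercised before any real model exists.

```python
import random


class RandomModel:
    """Placeholder model: ignores its input and returns a random score in [0, 1)."""

    def __init__(self, seed=None):
        self._rng = random.Random(seed)

    def predict(self, features):
        return self._rng.random()


def accuracy(model, examples, threshold=0.5):
    """Fraction of (features, label) pairs where the thresholded score matches the label."""
    hits = sum(
        (model.predict(features) >= threshold) == bool(label)
        for features, label in examples
    )
    return hits / len(examples)


# Synthetic labelled examples, made up for illustration.
examples = [({"x": i}, i % 2) for i in range(1000)]

# The number any real model now has to beat.
baseline = accuracy(RandomModel(seed=0), examples)
```

Swapping RandomModel for a real model later means changing one line, while everything around it stays as it was.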
When you go into projects from now on, ask yourself, ‘what’s the simplest possible solution that’ll get me an answer here?’. If you get to the point where a simple solution gets the job done, your colleagues and stakeholders are going to appreciate how easy it is to maintain and understand. You won’t regret it.
Data and analytics teams are complicated and complex. I’ve made a lot of mistakes and have seen many teams struggle to make the impact their talent deserves. Using some loose rules of thumb like the above (and others; only you will know what's best in your context) can really help. Even if it's just a starting point, a status quo from which you have to justify other approaches, it can unlock time and get decisions made sooner.
These rules have served me well.
I hope they help you and your teams.
Should I think of any more I’ll be sure to share them.
And if you have any, please let me know - I’ll be very grateful.
All the best,
Adam