Ben Dias originally wrote this article on LinkedIn Pulse in March 2017 to give Data Scientists insight into what hiring managers are looking for in Data Scientists’ CVs. The article went viral and now (June 2018) has over 200,000 views, 8,000+ likes, 600+ comments and 1,400+ shares. Ben has been a close contact of Logikk’s since we placed him at Royal Mail, and he has let us republish the article on our website for others to learn from.
I’ve just completed my first round of recruitment since joining Royal Mail as their first Head of Data Science, with some successful candidates joining my team.
But having been involved in hiring Data Scientists for many years now, I still find myself wishing, all too often at the initial shortlisting stage, that the information I’m looking for was in the CV in front of me.
It seems like Data Scientists, in general, don’t know what they should put in their CVs as they don’t understand what hiring managers are looking for.
This leaves hiring managers like me with the dilemma of either rejecting most of the CVs (and taking the considerable risk of dismissing some potentially excellent candidates) or having to employ an additional telephone screening stage to find out the information I need from the potentially suitable candidates.
Neither scenario is ideal, and both have their pros and cons.
So I thought I would try something different, and instead write down and publish what I would like to see in an ideal CV from someone applying for a Data Scientist position on my team at Royal Mail.
“Ultimately, I hope this improves the quality of CVs across the Data Science community and thereby helps me to streamline my recruitment process.”
1. Educational Background
The first thing I look for in a Data Scientist’s CV is evidence of a solid educational background in a heavily mathematical subject.
Almost anyone can claim to be a Data Scientist these days, just because they know how to use the multitude of machine learning libraries out there to build you a solution to your problem.
But to me, a real Data Scientist is someone who understands the technical details behind the algorithms and knows what assumptions they are making when using one algorithm vs another.
This gives me confidence that they would select the right algorithm for each specific problem and will be able to engineer the most appropriate features for that algorithm.
Therefore, I would like to see an externally validated qualification, such as a University degree or equivalent, and I’d like to view this on the first page (at least mentioned in the personal statement at the start).
2. Independent Research Experience
Data Science is by definition a research activity, where we are always looking to solve a problem where the solution is not always obvious, and success is not guaranteed. Otherwise, we are not doing real Data Science!
This is why I’ve taken Eric Ries’ very appropriate Lean Startup framework for innovating in the midst of lots of uncertainty and adapted it to make it work for Data Science.
And thus, I’m running my Data Science team at Royal Mail as a Lean Startup.
Therefore, next, I’m looking for some evidence in the CV to convince me that the candidate is capable of carrying out independent research.
The most obvious evidence for this would be that the candidate has completed a PhD, or at least an MSc that included a research project.
However, the most common mistake candidates make in this area of the CV is to state only that they have done an MSc or PhD in some specific subject (e.g. MSc in Computer Science or PhD in Statistics), possibly mentioning the university, and nothing more.
But what I want to know is the details of their research activity and how successful it was.
Therefore, I’m more interested in the title and summary of their thesis. If the work is sufficiently novel and completed successfully, it would give me confidence in their ability to carry out independent research.
But of course, attaching their thesis to their CV is not the answer!
In fact, their ability to summarise the key aspects (context, approach, outcomes and novelty) of their research activity in one paragraph is a significant indicator of their excellent written communication skills as well.
I would also consider evidence of alternative equivalent research experience (e.g. experience as a research scientist or Data Scientist).
But in this case, it should ideally be called out, for example in the personal statement, and the research projects themselves should be described in the relevant section of the CV (e.g. in the work experience section).
3. Programming Skills
Next, I’m looking for evidence of the candidate’s programming skills.
Here, some candidates love to list 101 languages, thinking that it makes them look attractive. But in reality, they would only use 2 or 3 languages on a regular basis.
Here the key for a hiring manager like me is to see that the candidate has experience in at least one language of each of the following types:
- A high-level rapid prototyping language such as Python or R.
- A low-level deployment language such as Java, C++, C#, etc.
- A scalable/Big Data language or framework such as Scala with Spark.
I would want to see all three for a Senior Data Scientist, the first two for a Data Scientist and just the first (R or Python) for a Junior Data Scientist.
The other mistake I see in CVs is just having a list of programming languages with no indication of proficiency or experience.
The ideal CVs not only list the languages along with the number of years’ experience in brackets (e.g. Java [6+ years]) but also state which languages were used in each Data Science project mentioned in the work experience section.
The candidate can get lots of brownie-points by also mentioning any open-source code bases they have contributed to, or providing links to their publicly available work (e.g. on GitHub) so that I can go online and view their coding ability.
This will give me significantly more confidence in their programming skills.
4. Impact! Impact!! Impact!!!
For the more Senior Data Scientist roles, I’m next looking for the candidate’s real-world Data Science experience. What really drives me up the wall here is when the CV makes no mention at all of any impact they have had in the real world.
This leaves the hiring manager wondering why any company would ever consider hiring the candidate, as there doesn’t seem to be any indication of ROI (Return on Investment)!
Some candidates focus on who they reported to, others focus on the accuracy and/or complexity of the models they built, while others only mention the types of projects they worked on.
“Rarely does anyone cover the most important thing I’m looking for: what was the impact of their work?”
Why would I even consider paying them to come and work for me?
It doesn’t matter to me if the candidate reported to the CEO or if their models were 99.9% accurate.
What I want to know is what difference they made to the business that hired them.
Here it is essential to remember that all models are wrong, but some are useful!
It doesn’t even matter if your model was only 60% accurate if it improved some aspect of the business (e.g. reduced customer churn) and resulted in tangible business value (e.g. leading to annual incremental revenue of £5 million).
My ethos, which is essential in a commercial environment, is to always start with the simplest possible model and only optimise and/or add complexity if/as required.
This is precisely what the Lean Startup framework mandates and is precisely what we do in my Data Science team at Royal Mail.
This is because you usually hit diminishing returns as you continue to optimise and/or add complexity to a model. The key is to know when your model is good enough to have a tangible business impact, then deliver it, realise the value and move on to the next most crucial problem.
So ideally, in the work experience section of the CV, I would like to see multiple impact statements, at least one for each Data Science role the candidate has held.
This would give me confidence that the candidate has good commercial awareness and is worth investing in, as I can expect a good ROI.
Here again, the candidate’s ability to summarise the key aspects (context, approach, impact) of their Data Science project in one paragraph is a critical indicator of their excellent written communication skills as well.
5. Coaching, Mentoring and Line Management Experience
For the more Senior Data Scientist roles, I’m then searching the CV for the candidate’s experience in coaching, mentoring and line management.
If the candidate is already operating at a Senior level, I would expect to see this mentioned in the work experience section, giving details of how many Data Scientists they have managed and for how long, and/or how many they coached or mentored and in what skills.
Here mention of any formal management, coaching and/or mentoring training courses attended would be a bonus.
What impresses me is a CV that gives an example of how the candidate managed/coached/mentored a Data Scientist who was either a high-performer or someone with development needs, providing the context, their approach and the outcome.
Again, they should showcase their excellent written communication skills by summarising this in one paragraph, instead of writing a thesis!
For a Data Scientist ready to take on their next role as a Senior Data Scientist, I would expect to see some training courses attended and some experience in supervising and mentoring/coaching at least one student and/or contractor/temp.
This will give me confidence that they are ready to take on managing and coaching/mentoring permanent staff as well.
6. Technical Breadth and Depth
Next, I would love to get a feel for the breadth and depth of the candidate’s technical capabilities, especially from the CV of a Senior Data Scientist.
Here the breadth can be demonstrated by mentioning a variety of types of problems they’ve worked on (e.g. forecasting, predicting, optimising, simulating, etc.).
However, I rarely see evidence of technical depth in a CV. One good example I’ve seen in Data Scientists’ CVs is where the candidate mentioned an algorithm/library they had contributed to an open-source package.
Coding up such an algorithm/library would require not only good coding skills but also a profound level of understanding of how the algorithm/algorithms in the library work.
Another good example is where the candidate explains why they chose to use one algorithm over another for a specific project.
Here if they articulate the choice based on the assumptions behind each algorithm and properties of the data and/or problem, it shows that they didn’t just use a standard library, but understand why the chosen one is the best algorithm to use for that specific problem.
Highlighting their external accreditations, such as their Chartered status (e.g. Chartered Mathematician, Chartered Scientist, Chartered Statistician, etc.) is also an excellent way for a candidate to demonstrate their technical depth and breadth.
7. Tools and Processes
Especially for a Data Scientist or Senior Data Scientist role, I would also be looking for some evidence in the CV of the candidate’s Agile experience.
Here I’m looking for them to call out when and where they worked according to an Agile framework, and ideally which framework (e.g. Scrum, Kanban, etc.) and tools (e.g. JIRA, Assembla, etc.) they used.
From experience, I have found that combining the Hypothesis Driven Approach with the Kanban Agile framework supports the Lean Startup framework well.
Therefore, this is what my Data Science team at Royal Mail use, and we are currently in the process of migrating from Assembla to the more fit-for-purpose JIRA Cardwell.
So, although I’m looking for any Agile experience, it would be a bonus to see that a candidate has used Kanban and/or the Hypothesis Driven Approach in at least one Data Science project.
I will also be looking for their experience of using different environments (e.g. Linux, Windows, Hadoop, Cloud, etc.) and of using best practice processes such as version control (e.g. Git) and documentation (e.g. Wiki). Here the best Data Science CVs list these aspects for each of the projects mentioned in the work experience section, while also listing them in a separate section with the corresponding years of experience in brackets.
8. An Open Mindset
An open mindset and a commitment to continuous learning are vital, especially for a Data Scientist, as our field is continuously changing at pace. Therefore, I am always encouraging my Data Science team at Royal Mail to spend some time learning something new each week.
This does, of course, reduce the team’s capacity available for project work. But I consider this an investment rather than a cost, because I’ve seen the benefits of continuous learning realised time and time again.
Therefore, another critical aspect I look for in a data science CV is an open mindset and a commitment to continuous learning and continuous professional development.
The usual evidence for this would be regular MOOCs or other training courses completed, conferences and workshops attended, etc.
And the key is to mention the dates of these learning events, to show that they are a regular commitment.
9. Softer Skills
Finally, and especially for the Senior Data Scientist roles, I’m looking for evidence of the softer skills, such as stakeholder management, influencing senior managers and presenting to business/non-technical audiences.
Here, again, formal training courses and mentoring/coaching received are good evidence to share.
The best data scientist CVs will also highlight any problematic stakeholders and/or collaborators that the candidate had to deal with and manage in order to deliver the projects they mention in the work experience section.
Here the key is also to mention how they dealt with the problematic stakeholder (i.e. the approach they took).
In summary, an ideal data scientist’s CV will contain all of the above information presented in a transparent, concise manner, demonstrating the candidate’s excellent written communication skills as well.
Candidates should not be afraid of having a CV that is longer than a page or two.
I would be happy to read even up to 5 pages of useful and relevant details, especially if it allows me to move faster in the interview process and get the candidate in straight away for a face-to-face interview and offer them a job quickly.
However, even if you don’t quite fulfil all of the criteria, you should still apply for a job.
Especially for permanent roles, hiring managers like me would welcome candidates who have one or two development areas to work on.
I want people who can join my team and contribute, while at the same time continuing to learn and grow themselves.
Continuous development is very important to me, as mentioned before. It is something I am always encouraging my Data Science team at Royal Mail to do, and something I will always ensure they have the time and space to do.