What does a data scientist do, and how do aspiring data scientists show on their resumes that they’re ready for the role? That is by far the #1 question I’ve been asked. Breaking into the field has become a competitive process. I’m updating a post I wrote over 2 years ago about what skills you need to break into the field.
We’re beyond skills now. Breaking into the field requires capabilities. Skills are building blocks, like raw ingredients. Capabilities are soup. Getting hired is a matter of showing potential employers that you can use your skills to make a final product. That raises the question: what final products do employers want to see?
Two years later, I have a comprehensive dataset (thanks Pocket Recruiter!) which allows me to say exactly what kinds of soup data scientists are cooking. I’ve analyzed 18,208 data scientist profiles. I’ve taken a deeper look at how 655 data scientists describe their role in detail. I’ll be sharing what the data says over the next several posts. What data scientists talk about doing, their capabilities, can help guide aspiring data scientists. These capabilities are what differentiate a resume that’s ignored from a resume a hiring manager will call back.
Data Scientist Core Skills Don’t Matter
Data science, machine learning, data mining, analytics, predictive/math/statistical modeling, statistics, data visualization and algorithms are all the obvious core skills or competencies. R and Python are the core languages, although Java, MySQL/SQL, C++, MATLAB and SAS all show up as well. Tableau is the core visualization tool, although d3.js is rising in importance. Hadoop (Spark less so) still shows up, but it’s fading in importance as the machine learning engineer role becomes more common and better defined. I’ll talk about this in a future post about the new roles in the field.
Conventional wisdom says to stuff your resume with these terms so the resume parser will push it up to the top. However, once the resume gets into the hands of a hiring manager, the term stuffing is a detriment. Hiring managers are looking for substance, not key terms.
Core skills should be on your resume, but they’re not indicative of capabilities. Core skills are easily determined using basic frequency analysis of job descriptions. They’re a very high-level indication of basic competency, and that isn’t enough to land a job. Showing capability requires a more detailed description of how those skills have been applied in the real world.
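To make "basic frequency analysis" concrete, here is a minimal sketch of one way it could be done. It is not the analysis behind this post: the job descriptions and the term list below are made up for illustration, and the function name `skill_frequencies` is my own.

```python
import re
from collections import Counter

# Hypothetical job descriptions, invented for illustration only.
JOB_DESCRIPTIONS = [
    "Build predictive models in Python and SQL; strong statistics required.",
    "Data visualization with Tableau, machine learning in Python or R.",
    "Machine learning and statistics background; SQL and Tableau a plus.",
]

# Hypothetical list of skill terms to look for.
SKILL_TERMS = ["python", "r", "sql", "tableau", "statistics", "machine learning"]


def skill_frequencies(descriptions, terms):
    """Count how many descriptions mention each term at least once."""
    counts = Counter()
    for text in descriptions:
        lowered = text.lower()
        for term in terms:
            # The lookarounds keep "r" from matching inside "required".
            pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
            if re.search(pattern, lowered):
                counts[term] += 1
    return counts


freqs = skill_frequencies(JOB_DESCRIPTIONS, SKILL_TERMS)
for term, n in freqs.most_common():
    print(f"{term}: {n} of {len(JOB_DESCRIPTIONS)} descriptions")
```

A count like this tells you which terms are common. It says nothing about whether a candidate can apply them, which is the point of the rest of this post.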
When I looked at resumes that get hired versus resumes that don’t, a key feature that stands out is how key terms are used. Just over 7% of hired data scientists use “understand business needs” in some variant when describing their job experience. 8.5% said they “participate in all phases of data…” (the activity varied between gathering, analysis, etc.). “Develop predictive model” showed up for 9% and “perform data cleaning/cleansing” for 10%. Others included “built predictive models”, “extract real time data”, etc.
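As a rough illustration of what counting a phrase "in some variant" can look like, here is a small sketch using regular expressions. The profile text and the patterns are hypothetical and written only for this example. They are not the dataset or the matching rules behind the percentages above.

```python
import re

# Hypothetical resume text, invented for illustration only.
PROFILES = [
    "Worked with stakeholders to understand business needs and built predictive models.",
    "Performed data cleansing and developed a predictive model for churn.",
    "Skills: Python, R, SQL, Tableau, machine learning.",
    "Understood the business need, then deployed the model to production.",
]

# Hypothetical patterns; each one is meant to catch a few variants of a phrase.
PHRASES = {
    "understand business needs":
        r"\bunderst(?:and|ood)\w*\s+(?:the\s+)?business\s+needs?\b",
    "develop/build predictive model":
        r"\b(?:develop|buil[dt])\w*\s+(?:a\s+)?predictive\s+models?\b",
    "perform data cleaning/cleansing":
        r"\bperform\w*\s+data\s+cleans?ing\b",
}


def phrase_share(profiles, phrases):
    """Return the fraction of profiles matching each phrase pattern."""
    shares = {}
    for label, pattern in phrases.items():
        regex = re.compile(pattern, re.IGNORECASE)
        hits = sum(1 for text in profiles if regex.search(text))
        shares[label] = hits / len(profiles)
    return shares


shares = phrase_share(PROFILES, PHRASES)
for label, share in shares.items():
    print(f"{label}: {share:.0%}")
```

Notice that the third profile, the plain skills list, matches none of the patterns. That is the difference between listing skills and describing activities.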
These aren’t skills. These are activities that demonstrate capabilities. Skills in motion.
Here’s how you differentiate yourself from the crowd. Everyone has a list of skills in their resume, but not everyone has the right capabilities in their resume. It’s time to drop the skills list off to the side or down at the bottom of your resume. Hiring managers no longer read for skills. They want to know if you have a proven ability to apply those skills. Are you ready to contribute from day 1?
Data Scientist Capabilities Do
What’s a capability? Look at Python first. What parts of Python are data scientists using in their daily work? NumPy, scikit-learn, pandas and, overwhelmingly, TensorFlow. It’s not enough to list those as skills. Aspiring data scientists need to show experience applying them to solve real-world problems.
What methodologies are most relevant to showcase? Regression, clustering, Bayesian statistics and basic pattern recognition still rule daily work. Natural language processing and computer vision are the two key areas of focus. Bioinformatics is an area worth noting because it is rising rapidly. If you want a prediction of what else will be an area of focus in the next year, look to cybersecurity.
Showing a capability on a resume means demonstrating that you can use TensorFlow to solve an NLP problem that businesses really face. Simple methodologies display the ability to contribute too. A set of projects using simple regression or clustering to solve a business problem shows capability. That goes beyond skills to activities that describe what you can do. Hiring managers are looking for active language to help them understand that you are ready to contribute, not just ready to spout terms.
The most sought-after capabilities are demonstrated in a business environment rather than an academic setting. Finding projects that demonstrate your capabilities during an internship or junior-level role is key to setting yourself apart from the crowd.
What general activities are most important based on people hired into the field?
Prepare: talk about the preparation done on data and models.
Require: talk about what the requirements were from a project and technical standpoint.
Compare: talk about how you compared or evaluated approaches, models, results, etc.
Integrate: talk about how you integrated multiple approaches and how the model integrated into an existing product, system, or process.
Implement or deploy: talk about pushing the model into an environment where it was usable.
Gather: talk about gathering activities, from requirements to data.
I can go on, but you get the point.
Capabilities are best described briefly. Term stuffing is dead, so there’s no point in extended explanations. Most readers have a 6 to 25 second attention span. The best resumes I’ve read explain what you accomplished and how in 2 to 6 sentences. That goes for work projects or independent/academic work.
Describe your activities succinctly. Don’t talk about Python. Move down to the TensorFlow level. While machine vision and NLP are key capabilities now, it’s likely that this year they will migrate into the core skills category and need to be supported with capabilities as well.
That means moving away from describing a project as NLP and toward naming the specific area of NLP the project focused on. Something like question answering using BERT would be a better project description. Leave out NLP as a descriptor and move it to the bottom with skills.
“I accomplished…” isn’t completed by an accuracy score. It’s completed by an outcome. What did the model do in real life, not in simulations or contrived environments? If the only result of the project is an accuracy score, it’s not showing capability.
Outcomes Are Key
This was true two years ago and still is today. Hiring managers are most interested in what outcomes you can produce with your capabilities. That means skills are glossed over. Projects that are stuffed with key terms but show no results are ignored. Use your resume to teach a potential employer what outcomes you can drive, and you’ll stand out from the crowd.