My Data Science & Machine Learning Learning Path for Beginners

“Where do I start so I can become a data scientist?” This is my imperfect answer. It’s imperfect for a few reasons:

  • It’s missing R, Java, C, C++, C#, and Scala for machine learning. Each is a viable alternative to this Python-centric outline.
  • The math is very light. I’m assuming you’ve already come in with a strong engineering or educational background.
  • I’ve completely skipped over cloud platforms like AWS, Google Cloud, & Azure. That’s a learning path of its own.
  • Computer science patterns, architecture, and best practices are largely ignored. They are critical to becoming employable and are another learning path of their own.
  • Finally, you also need to understand the business strategy and use cases behind data science. I’m building a series now to cover this topic.

I became a data scientist over the course of a 20+ year career, so you won’t learn this in 3 – 6 months. After going through all this material, you’ll be ready to learn more…a lot more. This is a jumping-off point. The end of this document is the beginning of your journey into data science. You’ll understand the field well enough to start independent and guided learning about the areas you find promising or interesting.

You’ll be tempted to start building and I very much encourage you to do just that. Start building. Keep learning. Never stop doing either of those.


What Is Hadoop? Hadoop Tutorial For Beginners

What is Apache Spark? The big data analytics platform explained

Apache Spark Tutorial: ML with PySpark

A Beginner’s Guide To Apache Pig

Realtime Event Processing in Hadoop with NiFi, Kafka and Storm


A Deep Dive Into Linear Algebra

An Introduction to Combinatorics & Graph Theory

Tools & Frameworks:

TensorFlow Tutorial – Deep Learning Using TensorFlow

A 6-part introduction to the MXNet API

Keras Tutorial: The Ultimate Beginner’s Guide to Deep Learning in Python

Data Visualization:

Building Python Data Apps with Blaze and Bokeh

Matplotlib Tutorial: Python Plotting

Python Bokeh Tutorial – Creating Interactive Web Visualizations


Simple Linear Regression

Simple and Multiple Linear Regression in Python

Linear Regression in R

An Introduction To Logistic Regression

Building A Logistic Regression in Python, Step by Step by Susan Li

Supervised and Unsupervised Machine Learning Algorithms

6 Easy Steps to Learn Naive Bayes Algorithm (with codes in Python and R)

A Tutorial on Support Vector Machines for Pattern Recognition

A Complete Tutorial on Tree Based Modeling from Scratch (in R & Python)


A Complete Tutorial to Learn Data Science with Python from Scratch

NumPy Tutorial: Data analysis with Python

Scipy Tutorial: Vectors and Arrays (Linear Algebra)

Python Pandas Tutorial

Machine Learning with scikit learn Part 1 & 2


A Thorough Overview of Computational Logic

Game Theory:

Game Theory – A 3 Part Introduction


Correlation & causality

Analysis of variance (ANOVA)

Understanding Hypothesis Tests: Significance Levels (Alpha) and P values in Statistics

Characteristics of Good Sample Surveys and Comparative Studies

Descriptive and Inferential Statistics

Intro to Probability Theory

Introduction to Conditional Probability & Bayes theorem for data science

Central limit theorem
