“Where do I start so I can become a data scientist?” This is my imperfect answer. It’s imperfect for a few reasons:
- It’s missing R, Java, C, C++, C#, and Scala for machine learning. Each is a viable learning path that could replace this Python-centric outline.
- The math is very light. I’m assuming you’ve already come in with a strong engineering or educational background.
- I’ve completely skipped cloud architectures like AWS, Google Cloud, and Azure. That’s a learning path of its own.
- Computer science patterns, architecture, and best practices are largely ignored. They are critical to becoming employable and are another learning path of their own.
- Finally, you also need to understand the business strategy and use cases behind data science. I’m building a series now to cover this topic.
I became a data scientist over the course of a 20+ year career, so you won’t learn this in 3 – 6 months. After going through all this material, you’ll be ready to learn more…a lot more. This is a jumping-off point. The end of this document is the beginning of your journey into data science. You’ll understand the field well enough to start independent and guided learning about the areas you find promising or interesting.
You’ll be tempted to start building and I very much encourage you to do just that. Start building. Keep learning. Never stop doing either of those.
What Is Hadoop? Hadoop Tutorial For Beginners
What is Apache Spark? The big data analytics platform explained
Apache Spark Tutorial: ML with PySpark
A Beginner’s Guide To Apache Pig
Realtime Event Processing in Hadoop with NiFi, Kafka and Storm
A Deep Dive Into Linear Algebra
An Introduction to Combinatorics & Graph Theory
Tools & Frameworks:
TensorFlow Tutorial – Deep Learning Using TensorFlow
A 6-part introduction to the MXNet API
Keras Tutorial: The Ultimate Beginner’s Guide to Deep Learning in Python
Building Python Data Apps with Blaze and Bokeh
Matplotlib Tutorial: Python Plotting
Python Bokeh Tutorial – Creating Interactive Web Visualizations
Simple Linear Regression
Simple and Multiple Linear Regression in Python
Linear Regression in R
An Introduction To Logistic Regression
Building A Logistic Regression in Python, Step by Step by Susan Li
Supervised and Unsupervised Machine Learning Algorithms
6 Easy Steps to Learn Naive Bayes Algorithm (with codes in Python and R)
A Tutorial on Support Vector Machines for Pattern Recognition
A Complete Tutorial on Tree Based Modeling from Scratch (in R & Python)
A Complete Tutorial to Learn Data Science with Python from Scratch
NumPy Tutorial: Data analysis with Python
Scipy Tutorial: Vectors and Arrays (Linear Algebra)
Python Pandas Tutorial
Machine Learning with scikit-learn Part 1 & 2
A Thorough Overview of Computational Logic
Game Theory – A 3 Part Introduction
Correlation & causality
Analysis of variance (ANOVA)
Understanding Hypothesis Tests: Significance Levels (Alpha) and P values in Statistics
Characteristics of Good Sample Surveys and Comparative Studies
Descriptive and Inferential Statistics
Intro to Probability Theory
Introduction to Conditional Probability & Bayes theorem for data science
Central limit theorem