# My Data Science & Machine Learning, Beginner’s Learning Path

“Where do I start so I can become a data scientist?” This is my imperfect answer. It’s imperfect for a few reasons:

- It’s missing R, Java, C, C++, C#, and Scala for machine learning. Each is a viable learning path that could replace this Python-centric outline.
- The math is very light. I’m assuming you’ve already come in with a strong engineering or educational background.
- I’ve completely skipped cloud platforms like AWS, Google Cloud, & Azure. That’s a learning path of its own.
- Computer science patterns, architecture, and best practices are largely ignored. They are critical to becoming employable and are another learning path of their own.
- Finally, you also need to understand the business strategy and use cases behind data science. I’m building a series now to cover this topic.

I became a data scientist over the course of a 20+ year career, so you won’t learn this in 3 – 6 months. After going through all this material, you’ll be ready to learn more…a lot more. This is a jumping-off point. The end of this document is the beginning of your journey into data science. You’ll understand the field well enough to start independent and guided learning about the areas you find promising or interesting.

You’ll be tempted to start building and I very much encourage you to do just that. Start building. Keep learning. Never stop doing either of those.

**Platforms:**

What Is Hadoop? Hadoop Tutorial For Beginners

What is Apache Spark? The big data analytics platform explained

http://www.techworld.com.au/article/629920/what-apache-spark-big-data-analytics-platform-explained/

Apache Spark Tutorial: ML with PySpark

https://www.datacamp.com/community/tutorials/apache-spark-tutorial-machine-learning

A Beginner’s Guide To Apache Pig

https://hortonworks.com/tutorial/beginners-guide-to-apache-pig/

Realtime Event Processing in Hadoop with NiFi, Kafka and Storm

https://hortonworks.com/tutorial/realtime-event-processing-in-hadoop-with-nifi-kafka-and-storm/

**Math:**

A Deep Dive Into Linear Algebra

https://www.khanacademy.org/math/linear-algebra

An Introduction to Combinatorics & Graph Theory

https://www.whitman.edu/mathematics/cgt_online/cgt.pdf

**Tools & Frameworks:**

TensorFlow Tutorial – Deep Learning Using TensorFlow

A 6-part introduction to the MXNet API

https://becominghuman.ai/an-introduction-to-the-mxnet-api-part-1-848febdcf8ab

Keras Tutorial: The Ultimate Beginner’s Guide to Deep Learning in Python

https://elitedatascience.com/keras-tutorial-deep-learning-in-python

**Data Visualization:**

Building Python Data Apps with Blaze and Bokeh

Matplotlib Tutorial: Python Plotting

https://www.datacamp.com/community/tutorials/matplotlib-tutorial-python

Python Bokeh Tutorial – Creating Interactive Web Visualizations

**Concepts:**

Simple Linear Regression

https://onlinecourses.science.psu.edu/stat501/node/250

Simple and Multiple Linear Regression in Python

https://medium.com/towards-data-science/simple-and-multiple-linear-regression-in-python-c928425168f9

Linear Regression in R

https://www.tutorialspoint.com/r/r_linear_regression.htm

An Introduction To Logistic Regression

http://ufldl.stanford.edu/tutorial/supervised/LogisticRegression/

Building A Logistic Regression in Python, Step by Step by Susan Li

Supervised and Unsupervised Machine Learning Algorithms

https://machinelearningmastery.com/supervised-and-unsupervised-machine-learning-algorithms/

6 Easy Steps to Learn Naive Bayes Algorithm (with codes in Python and R)

https://www.analyticsvidhya.com/blog/2017/09/naive-bayes-explained/

A Tutorial on Support Vector Machines for Pattern Recognition

http://www.cs.northwestern.edu/~pardo/courses/eecs349/readings/support_vector_machines4.pdf

A Complete Tutorial on Tree Based Modeling from Scratch (in R & Python)

**Python:**

A Complete Tutorial to Learn Data Science with Python from Scratch

https://www.analyticsvidhya.com/blog/2016/01/complete-tutorial-learn-data-science-python-scratch-2/

NumPy Tutorial: Data analysis with Python

https://www.dataquest.io/blog/numpy-tutorial-python/

Scipy Tutorial: Vectors and Arrays (Linear Algebra)

https://www.datacamp.com/community/tutorials/python-scipy-tutorial

Python Pandas Tutorial

https://www.tutorialspoint.com/python_pandas/

Machine Learning with scikit learn Part 1 & 2

https://youtu.be/2kT6QOVSgSg

https://youtu.be/WLYzSas511I

**CS:**

A Thorough Overview of Computational Logic

https://www.cs.utexas.edu/users/boyer/acl.pdf

**Game Theory:**

Game Theory – A 3 Part Introduction

**Statistics:**

Correlation & causality

Analysis of variance (ANOVA)

https://www.khanacademy.org/math/statistics-probability/analysis-of-variance-anova-library

Understanding Hypothesis Tests: Significance Levels (Alpha) and P values in Statistics

Characteristics of Good Sample Surveys and Comparative Studies

https://onlinecourses.science.psu.edu/stat100/node/3

Descriptive and Inferential Statistics

https://www.thoughtco.com/differences-in-descriptive-and-inferential-statistics-3126224

Intro to Probability Theory

Introduction to Conditional Probability & Bayes theorem for data science

https://www.analyticsvidhya.com/blog/2017/03/conditional-probability-bayes-theorem/

Central limit theorem
