My Beginner’s Learning Path for Data Science & Machine Learning

“Where do I start so I can become a data scientist?” This is my imperfect answer. It’s imperfect for a few reasons:

  • It’s missing R, Java, C, C++, C#, and Scala for machine learning. Each is a viable learning path to replace this Python-centric outline.
  • The math is very light. I’m assuming you’ve already come in with a strong engineering or educational background.
  • I’ve completely skipped cloud architectures like AWS, Google Cloud, & Azure. That’s a learning path of its own.
  • Computer science patterns, architecture, and best practices are largely ignored. They are critical to becoming employable and are another learning path of their own.
  • Finally, you also need to understand the business strategy and use cases behind data science. I’m building a series now to cover this topic.

I became a data scientist over the course of a 20+ year career, so you won’t learn this in 3 – 6 months. After going through all this material, you’ll be ready to learn more…a lot more. This is a jumping-off point. The end of this document is the beginning of your journey into data science. You’ll understand the field well enough to start independent and guided learning about the areas you find promising or interesting.

You’ll be tempted to start building and I very much encourage you to do just that. Start building. Keep learning. Never stop doing either of those.

Platforms:

What Is Hadoop? Hadoop Tutorial For Beginners

https://youtu.be/n3qnsVFNEIU

What is Apache Spark? The big data analytics platform explained

http://www.techworld.com.au/article/629920/what-apache-spark-big-data-analytics-platform-explained/

Apache Spark Tutorial: ML with PySpark

https://www.datacamp.com/community/tutorials/apache-spark-tutorial-machine-learning

A Beginner’s Guide To Apache Pig

https://hortonworks.com/tutorial/beginners-guide-to-apache-pig/

Realtime Event Processing in Hadoop with NiFi, Kafka and Storm

https://hortonworks.com/tutorial/realtime-event-processing-in-hadoop-with-nifi-kafka-and-storm/

Math:

A Deep Dive Into Linear Algebra

https://www.khanacademy.org/math/linear-algebra

An Introduction to Combinatorics & Graph Theory

https://www.whitman.edu/mathematics/cgt_online/cgt.pdf

Tools & Frameworks:

TensorFlow Tutorial – Deep Learning Using TensorFlow

https://youtu.be/yX8KuPZCAMo

A 6-part introduction to the MXNet API

https://becominghuman.ai/an-introduction-to-the-mxnet-api-part-1-848febdcf8ab

Keras Tutorial: The Ultimate Beginner’s Guide to Deep Learning in Python

https://elitedatascience.com/keras-tutorial-deep-learning-in-python

Data Visualization:

Building Python Data Apps with Blaze and Bokeh

https://youtu.be/1gD9LMqREDs

Matplotlib Tutorial: Python Plotting

https://www.datacamp.com/community/tutorials/matplotlib-tutorial-python

Python Bokeh Tutorial – Creating Interactive Web Visualizations

https://youtu.be/Mz1AXUE0nR4

Concepts:

Simple Linear Regression

https://onlinecourses.science.psu.edu/stat501/node/250

Simple and Multiple Linear Regression in Python

https://medium.com/towards-data-science/simple-and-multiple-linear-regression-in-python-c928425168f9

Linear Regression in R

https://www.tutorialspoint.com/r/r_linear_regression.htm

An Introduction To Logistic Regression

http://ufldl.stanford.edu/tutorial/supervised/LogisticRegression/

Building A Logistic Regression in Python, Step by Step by Susan Li

https://medium.com/towards-data-science/building-a-logistic-regression-in-python-step-by-step-becd4d56c9c8

Supervised and Unsupervised Machine Learning Algorithms

https://machinelearningmastery.com/supervised-and-unsupervised-machine-learning-algorithms/

6 Easy Steps to Learn Naive Bayes Algorithm (with codes in Python and R)

https://www.analyticsvidhya.com/blog/2017/09/naive-bayes-explained/

A Tutorial on Support Vector Machines for Pattern Recognition

http://www.cs.northwestern.edu/~pardo/courses/eecs349/readings/support_vector_machines4.pdf

A Complete Tutorial on Tree Based Modeling from Scratch (in R & Python)

https://www.analyticsvidhya.com/blog/2016/04/complete-tutorial-tree-based-modeling-scratch-in-python/

Python:

A Complete Tutorial to Learn Data Science with Python from Scratch

https://www.analyticsvidhya.com/blog/2016/01/complete-tutorial-learn-data-science-python-scratch-2/

NumPy Tutorial: Data analysis with Python

https://www.dataquest.io/blog/numpy-tutorial-python/

Scipy Tutorial: Vectors and Arrays (Linear Algebra)

https://www.datacamp.com/community/tutorials/python-scipy-tutorial

Python Pandas Tutorial

https://www.tutorialspoint.com/python_pandas/

Machine Learning with scikit learn Part 1 & 2

https://youtu.be/2kT6QOVSgSg

https://youtu.be/WLYzSas511I

CS:

A Thorough Overview of Computational Logic

https://www.cs.utexas.edu/users/boyer/acl.pdf

Game Theory:

Game Theory – A 3 Part Introduction

https://youtu.be/x8gOi7D6QeQ

Statistics:

Correlation & causality

https://www.khanacademy.org/math/probability/scatterplots-a1/creating-interpreting-scatterplots/v/correlation-and-causality

Analysis of variance (ANOVA)

https://www.khanacademy.org/math/statistics-probability/analysis-of-variance-anova-library

Understanding Hypothesis Tests: Significance Levels (Alpha) and P values in Statistics

https://shar.es/1PANrc

Characteristics of Good Sample Surveys and Comparative Studies

https://onlinecourses.science.psu.edu/stat100/node/3

Descriptive and Inferential Statistics

https://www.thoughtco.com/differences-in-descriptive-and-inferential-statistics-3126224

Intro to Probability Theory

https://youtu.be/f9XFM8YLccg

Introduction to Conditional Probability & Bayes theorem for data science

https://www.analyticsvidhya.com/blog/2017/03/conditional-probability-bayes-theorem/

Central limit theorem

https://www.khanacademy.org/math/statistics-probability/sampling-distributions-library/sample-means/v/central-limit-theorem
