
The Open-Source Data Science Masters

https://github.com/datasciencemasters/go#python-learning


Last updated 6 years ago

The open-source curriculum for learning Data Science. Foundational in both theory and technologies, the OSDSM breaks down the core competencies necessary to make use of data.

The Internet is Your Oyster

With Coursera, ebooks, Stack Overflow, and GitHub -- all free and open -- how can you afford not to take advantage of an open source education?

The Motivation

We need more Data Scientists.

...by 2018 the United States will experience a shortage of 190,000 skilled data scientists, and 1.5 million managers and analysts capable of reaping actionable insights from the big data deluge.

-- McKinsey Report Highlights the Impending Data Scientist Shortage, 23 July 2013

There are little to no Data Scientists with 5 years experience, because the job simply did not exist.

-- David Hardtke "How To Hire A Data Scientist" 13 Nov 2012

An Academic Shortfall

Classic academic conduits aren't providing Data Scientists -- this talent gap will be closed differently.

Academic credentials are important but not necessary for high-quality data science. The core aptitudes – curiosity, intellectual agility, statistical fluency, research stamina, scientific rigor, skeptical nature – that distinguish the best data scientists are widely distributed throughout the population.

We’re likely to see more uncredentialed, inexperienced individuals try their hands at data science, bootstrapping their skills on the open-source ecosystem and using the diversity of modeling tools available. Just as data-science platforms and tools are proliferating through the magic of open source, big data’s data-scientist pool will as well.

And there’s yet another trend that will alleviate any talent gap: the democratization of data science. While I agree wholeheartedly with Raden’s statement that “the crème-de-la-crème of data scientists will fill roles in academia, technology vendors, Wall Street, research and government,” I think he’s understating the extent to which autodidacts – the self-taught, uncredentialed, data-passionate people – will come to play a significant role in many organizations’ data science initiatives.

-- James Kobielus, Closing the Talent Gap, 17 Jan 2013

The Open Source Data Science Curriculum

Start here.

  • Intro to Data Science / UW (Videos)
    • Topics: Python NLP on Twitter API, Distributed Computing Paradigm, MapReduce/Hadoop & Pig Script, SQL/NoSQL, Relational Algebra, Experiment design, Statistics, Graphs, Amazon EC2, Visualization.

  • Data Science / Harvard (Videos & Course)
    • Topics: Data wrangling, data management, exploratory data analysis to generate hypotheses and intuition, prediction based on statistical methods such as regression and classification, communication of results through visualization, stories, and summaries.

  • Data Science with Open Source Tools (Book $27)
    • Topics: Visualizing Data, Estimation, Models from Scaling Arguments, Arguments from Probability Models, What you Really Need to Know about Classical Statistics, Data Mining, Clustering, PCA, Map/Reduce, Predictive Analytics
    • Example Code in: R, Python, Sage, C, GNU Scientific Library

A Note About Direction

This is an introduction geared toward those with at least a minimum understanding of programming, and (perhaps obviously) an interest in the components of Data Science (like statistics and distributed computing). Out of personal preference and need for focus, I geared the original curriculum toward Python tools and resources. R resources can be found here.

Ethics in Machine Intelligence

Human impact is a first-class concern when building machine intelligence technology. When we build products, we deduce patterns and then reinforce them in the world. Ethics in any engineering discipline concerns understanding the sociotechnological impact of the products and services we bring to bear in the human world -- and whether they reinforce a future we all want to live in.

  • Index: Cultural Bias in Machine Intelligence

Math

  • What are some good resources for learning about numerical analysis? / Quora

Linear Algebra & Programming

  • Linear Algebra (Khan Academy / Videos)
  • Linear Algebra / Levandosky (Stanford / Book $10)
  • Linear Programming (Math 407) (University of Washington / Course)
  • The Manga Guide to Linear Algebra (Book $19)
  • An Intuitive Guide to Linear Algebra (Better Explained / Article)
  • A Programmer's Intuition for Matrix Multiplication (Better Explained / Article)
  • Vector Calculus: Understanding the Cross Product (Better Explained / Article)
  • Vector Calculus: Understanding the Dot Product (Better Explained / Article)

Convex Optimization

  • Convex Optimization / Boyd (Stanford / Lectures & Book)

Statistics

  • Stats in a Nutshell (Book $29)
  • Think Stats: Probability and Statistics for Programmers (Digital & Book $25)
  • Think Bayes (Digital & Book $25)

Differential Equations & Calculus

  • Differential Equations in Data Science (Python Tutorial)

Problem Solving

  • Problem-Solving Heuristics "How To Solve It" (Polya / Book $10)

Computing

Get your environment up and running with the Data Science Toolbox

Algorithms

  • Algorithms Design & Analysis I (Stanford / Coursera)
  • Algorithm Design, Kleinberg & Tardos (Book $125)

Distributed Computing Paradigms

  • *See Intro to Data Science (UW / Lectures on MapReduce)
  • Intro to Hadoop and MapReduce (Cloudera / Udacity Course) *includes select free excerpts of Hadoop: The Definitive Guide (Book $29)

Databases

  • Introduction to Databases (Stanford / Online Course)
  • SQL School (Mode Analytics / Tutorials)
  • SQL Tutorials (SQLZOO / Tutorials)

Data Mining

  • Mining Massive Data Sets / Stanford (Coursera & Digital & Book $58)
  • Mining The Social Web (Book $30)
  • Introduction to Information Retrieval / Stanford (Digital & Book $56)

Data Design

How does the real world get translated into data? How should one structure that data to make it understandable and usable? Extends beyond database design to usability of schemas and models.

  • Tidy Data in Python

OSDSM Specialization: Web Scraping & Crawling

Machine Learning

Foundational & Theoretical

  • Machine Learning (Ng Stanford / Coursera & Stanford CS 229)
  • A Course in Machine Learning (UMD / Digital Book)
  • The Elements of Statistical Learning / Stanford (Digital & Book $80 & Study Group)
  • Machine Learning (Caltech / Edx)

Practical

  • Programming Collective Intelligence (Book $27)
  • Machine Learning for Hackers (ipynb / digital book)
  • Intro to scikit-learn, SciPy2013 (youtube tutorials)

Probabilistic Modeling

  • Probabilistic Programming and Bayesian Methods for Hackers (Github / Tutorials)
  • Probabilistic Graphical Models (Stanford / Coursera)

Deep Learning (Neural Networks)

  • Neural Networks (Andrej Karpathy / Python Walkthrough)
  • Neural Networks (U Toronto / Coursera)
  • Deep Learning for Natural Language Processing CS224d (Stanford)

Social Network & Graph Analysis

  • Social and Economic Networks: Models and Analysis (Stanford / Coursera)
  • Social Network Analysis for Startups (Book $22)

Natural Language Processing

  • From Languages to Information / Stanford CS147 (Materials)
  • NLP with Python (NLTK library) (Digital, Book $36)
  • How to Write a Spelling Corrector / Norvig (Tutorial): http://norvig.com/spell-correct.html

Data Analysis

One of the "unteachable" skills of data science is an intuition for analysis. What constitutes valuable, achievable, and well-designed analysis is extremely dependent on context and ends at hand.

  • Big Data Analysis with Twitter (UC Berkeley / Lectures)
  • Exploratory Data Analysis (Tukey / Book $81)

in Python

  • Data Analysis in Python (Tutorial)
  • Python for Data Analysis (Book $24)
  • An Example Data Science Process (ipynb)

Data Communication and Design

Visualization

Data Visualization and Communication

  • The Truthful Art: Data, Charts, and Maps for Communication (Cairo / Book $21)

Theoretical Design of Information

  • Envisioning Information (Tufte / Book $36)
  • The Visual Display of Quantitative Information (Tufte / Book $27)

Applied Design of Information

  • Information Dashboard Design: Displaying Data for At-a-Glance Monitoring (Stephen Few / Book $29)

Theoretical Courses / Design & Visualization

  • Data Visualization (University of Washington / Slides & Resources)
  • Berkeley's Viz Class (UC Berkeley / Course Docs)
  • Rice University's Data Viz class (Rice University / Slides)

Practical Visualization Resources

  • D3 Library / Scott Murray (Blog / Tutorials)
  • Interactive Data Visualization for the Web / Scott Murray (Online Book & Book $26)

OSDSM Specialization: Data Journalism

Python (Learning)

  • Learn Python the Hard Way (Digital & Book $23)
  • Python (Class / Google)
  • Think Python (Digital & Book $34)

Python (Libraries)

  • Installing Basic Packages (Python, virtualenv, NumPy, SciPy, matplotlib and IPython & Using Python Scientifically)
  • Command Line Install Script for Scientific Python Packages
  • numpy Tutorial / Stanford CS231N
  • Pandas Cookbook (data structure library)

More Libraries can be found in the "awesome machine learning" repo & in related specializations

Data Structures & Analysis Packages

  • pandas - Flexible and powerful data analysis / manipulation library with labeled data structure objects, statistical functions, etc. & Tutorials (Python for Data Analysis / Book)

Machine Learning Packages

  • scikit-learn - Tools for Data Mining & Analysis

Networks Packages

  • networkx - Network Modeling & Viz

Statistical Packages

  • PyMC - Bayesian Inference & Markov Chain Monte Carlo sampling toolkit
  • Statsmodels - Python module that allows users to explore data, estimate statistical models, and perform statistical tests
  • PyMVPA - Multivariate Pattern Analysis in Python

Natural Language Processing & Understanding

  • NLTK - Natural Language Toolkit
  • Gensim - Python library for topic modeling, document indexing and similarity retrieval with large corpora. Target audience is the natural language processing (NLP) and information retrieval (IR) community.

Data APIs

  • twython - Python wrapper for the Twitter API

Visualization Packages

  • matplotlib - well-integrated with analysis and data manipulation packages like numpy and pandas
  • Seaborn - a high-level statistical visualization package built on top of matplotlib

IPython Data Science Notebooks

  • Data Science in IPython Notebooks (Linear Regression, Logistic Regression, Random Forests, K-Means Clustering)
  • A Gallery of Interesting IPython Notebooks - Pandas for Data Analysis

Datasets are now here

R resources are now here

Data Science as a Profession

  • Doing Data Science: Straight Talk from the Frontline (O'Reilly / Book $25)
  • The Data Science Handbook: Advice and Insights from 25 Amazing Data Scientists (Book $22)

Capstone Project

  • Capstone Analysis of Your Own Design; Quora's Idea Compendium
  • Healthcare Twitter Analysis (Coursolve & UW Data Science)
  • Analyze your LinkedIn Network (Generate & Download Adjacency Matrix)

Resources

Read

  • DataTau - The "Hacker News" of Data Science
  • Wikipedia - The free encyclopedia
  • The Signal and The Noise - Nate Silver $15 - Bestseller Pop Sci
  • Zipfian Academy's List of Resources
  • A Software Engineer's Guide to Getting Started with Data Science
  • Data Scientist Interviews / Metamarkets
  • /r/MachineLearning

Watch & Listen

  • The Life of a Data Scientist / Josh Wills
  • The Talking Machines - Podcast about Machine Learning
  • What Data Science Is / Hilary Mason

Learn

  • Metacademy - Search for a concept you want to learn
  • Coursera - Online university courses
  • Wolfram Alpha - The smart number and info cruncher
  • Khan Academy - High quality, free learning videos