
Decision Trees



Common Decision Tree Algorithms

  1. Gini Index

  2. Chi-Square

  3. Information Gain

  4. Reduction in Variance
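
In scikit-learn, some of these correspond to the criterion argument of the tree estimators; this mapping is my note, not part of the original list (Chi-Square has no built-in scikit-learn equivalent, and 'squared_error' was called 'mse' in older versions):

from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Gini Index and Information Gain (entropy) are classification split criteria
clf_gini = DecisionTreeClassifier(criterion='gini')
clf_ig = DecisionTreeClassifier(criterion='entropy')

# Reduction in Variance corresponds to MSE-based splitting for regression
reg_var = DecisionTreeRegressor(criterion='squared_error')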

Mean Squared Error (MSE)

When performing regression with CART trees (i.e., the target values are continuous), we can evaluate a split using its MSE. The MSE of node $m$ is computed as follows:

\begin{equation} \hat{y}_m = \frac{1}{n_{m}} \sum_{i \in D_m} y_i \end{equation}

\begin{equation} MSE_m = \frac{1}{n_{m}} \sum_{i \in D_m} (y_i - \hat{y}_m)^2 \end{equation}

  • $D_m$: training examples in node $m$

  • $n_m$: total number of training examples in node $m$

  • $y_i$: target value of the $i$-th example

from sklearn.metrics import mean_squared_error as mse
from sklearn.metrics import r2_score

# Assumes `regressor` is an already-fitted tree model and (X_test, y_test) is a held-out split
y_pred = regressor.predict(X_test)

print('MSE score: {}'.format(mse(y_test, y_pred)))
print('R^2 score: {}'.format(r2_score(y_test, y_pred)))

Probably Approximately Correct (PAC)

PAC is a learning model characterized by learning from examples.

"Probably": if the learner outputs a hypothesis with the desired probability and confidence, the results are probably correct.

"Approximately": a hypothesis is approximately correct if its error over the distribution of inputs is bounded by some predefined interval.

In its simplest form, consider $f$ as the target function to be learnt. The learner (the computational model) is provided with some random examples in the form of $(X, f(X))$. Here, $X$ is a binary input instance of length $n$, and $f(X)$ is the value (boolean TRUE or FALSE) of the target function at that instance. Based on these examples, the learner must succeed in deducing the target function $f$, which is expressed as:

\begin{equation} f : \{0, 1\}^n \rightarrow \{0, 1\} \end{equation}

Training Process

The process of training a decision tree and predicting the target features of query instances is as follows (a minimal code sketch follows the list):

  1. Present a dataset of training examples containing features/predictors and a target (similar to the classifiers we have seen earlier).

  2. Train the tree model by making splits on the target using the values of the predictors. The predictor to split on is selected following the idea of feature selection, using measures like "information gain" and "Gini index". We shall cover these shortly.

  3. The tree is grown until some stopping criterion is achieved. This could be a set depth of the tree or any other similar measure.

  4. Show the tree a new set of features with an unknown class and let the example propagate through the trained tree. The resulting leaf node represents the class prediction for this data.
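
A minimal sketch of these four steps with scikit-learn; the iris dataset and the hyperparameter values are illustrative choices, not part of the original notes:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# 1. A dataset of features/predictors and a target
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 2./3. Grow the tree: splits are chosen by an impurity measure ("entropy" here)
#       until a stopping criterion (max_depth) is reached
clf = DecisionTreeClassifier(criterion='entropy', max_depth=3)
clf.fit(X_train, y_train)

# 4. Propagate unseen examples through the trained tree; leaves give the predictions
print(clf.predict(X_test[:5]))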

Entropy

The idea of entropy is derived from Shannon's information-theoretic entropy: it lets us measure the information gained by a split against the cost incurred in the process.

import pandas as pd
from scipy import stats  # scipy submodules must be imported explicitly

# Input: a pandas Series
def ent(data):
    # Empirical probability of each unique value
    p_data = data.value_counts() / len(data)
    # Shannon entropy of the probability distribution (natural log base)
    entropy = stats.entropy(p_data)
    return entropy
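
A quick usage check on a hypothetical series:

s = pd.Series(['a', 'a', 'b', 'b'])
print(ent(s))  # ln(2) ≈ 0.693 for a uniform two-value distribution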

Entropy is a measure of Disorder or Uncertainty.

The entropy of a variable is the "amount of information" contained in the variable.

In terms of data, we can informally describe entropy as:

Entropy is an indicator of how messy your data is.

High entropy reflects "messed-up" data with low or no information content: our uncertainty about the data's content remains (almost) the same after viewing it as it was before the data was available.

Information Gain

Information gain is an impurity/uncertainty-based criterion that uses entropy as the impurity measure.
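
Concretely, for a binary split of node $D$ into $D_{left}$ and $D_{right}$ (notation as in the CART section below), the gain is the parent's entropy minus the weighted average entropy of the children:

\begin{equation} IG(D, \theta) = H(D) - \frac{n_{left}}{n_{total}} H(D_{left}) - \frac{n_{right}}{n_{total}} H(D_{right}) \end{equation}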

Tree Pruning

Now that we know how to grow a decision tree using Python and scikit-learn, let's move on and practice optimizing the classifier. Decision trees offer several options that we can tweak before the actual learning takes place.

A decision tree grown beyond a certain level of complexity leads to overfitting. If we keep growing the tree using poor predictors that don't have any impact on accuracy, we will eventually a) slow down the learning and b) cause overfitting.

This process of trimming decision trees to optimize the learning process is called "Tree Pruning".

We can prune our trees using the following parameters (a sketch mapping them to scikit-learn arguments follows the list):

  • Maximum Depth

Reduce the depth of the tree to build a more generalized tree. Set the depth to, e.g., 3, 5, or 10, depending on verification against test data.

  • Minimum Samples Split

Restrict the minimum number of samples a node must contain before it can be split.

  • Minimum Leaf Sample Size

The size of terminal nodes can be fixed to 30, 100, 300, or 5% of the total.

  • Maximum Leaf Nodes

Reduce the number of leaf nodes.

  • Maximum Features

The maximum number of features to consider when splitting a node.
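
These map naturally onto scikit-learn's DecisionTreeClassifier keyword arguments; the values below are illustrative assumptions, not recommendations:

from sklearn.tree import DecisionTreeClassifier

# Illustrative values only; tune each against held-out data
pruned = DecisionTreeClassifier(
    max_depth=5,            # Maximum Depth
    min_samples_split=30,   # Minimum Samples Split
    min_samples_leaf=0.05,  # Minimum Leaf Sample Size (here 5% of total samples)
    max_leaf_nodes=20,      # Maximum Leaf Nodes
    max_features='sqrt',    # Maximum Features per split
)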

CART Training Algorithm

In this lab we will focus on the CART algorithm (Classification and Regression Trees) for regression.

The CART algorithm builds a binary tree in which every non-leaf node has exactly two children (corresponding to a yes/no answer).

Given a set of training examples and their labels, the algorithm repeatedly splits the training examples $D$ into two subsets $D_{left}, D_{right}$ using some feature $f$ and feature threshold $t_f$ such that samples with the same label are grouped together.

At each node, the algorithm selects the split $\theta = (f, t_f)$ that produces the smallest mean squared error (MSE) (alternatively, we could use the mean absolute error). That is, at each step the algorithm selects the parameters $\theta$ that minimize the following cost function:

\begin{equation} J(D, \theta) = \frac{n_{left}}{n_{total}} MSE_{left} + \frac{n_{right}}{n_{total}} MSE_{right} \end{equation}

  • $D$: remaining training examples

  • $n_{total}$: number of remaining training examples

  • $\theta = (f, t_f)$: feature and feature threshold

  • $n_{left}/n_{right}$: number of samples in the left/right subset

  • $MSE_{left}/MSE_{right}$: MSE of the left/right subset

This step is repeated recursively until the maximum allowable depth is reached or the current number of samples $n_{total}$ drops below some minimum number. The original equations can be found here.
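
A toy, hedged implementation of one step of this search for a single feature (NumPy only; the function and variable names are mine, not from the original notes): every candidate threshold is scored with $J(D, \theta)$ and the best one is kept.

import numpy as np

def node_mse(y):
    # MSE of a node relative to its mean prediction
    return np.mean((y - y.mean()) ** 2) if len(y) else 0.0

def best_split(x, y):
    """Brute-force search for the threshold t_f on one feature x that minimizes
    J = (n_left/n_total) * MSE_left + (n_right/n_total) * MSE_right."""
    n_total = len(y)
    best_t, best_j = None, np.inf
    for t in np.unique(x)[:-1]:  # candidate thresholds between distinct values
        left, right = y[x <= t], y[x > t]
        j = (len(left) / n_total) * node_mse(left) + (len(right) / n_total) * node_mse(right)
        if j < best_j:
            best_t, best_j = t, j
    return best_t, best_j

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 1.1, 3.0, 3.2])
print(best_split(x, y))  # splits at t = 2.0, between the two target clusters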
http://www.r2d3.us/visual-intro-to-machine-learning-part-1/
What is Entropy and why Information gain matter in Decision Trees? (Medium)