TensorFlow Eager

TensorFlow is a great deep learning framework. In fact, it is still the reigning monarch within the deep learning framework kingdom. However, it has some frustrating limitations. One of these is the difficulty that arises during debugging: in TensorFlow, it’s hard to diagnose what is happening in your model. This is due to its static graph structure (for details, see my TensorFlow tutorial) – in TensorFlow the developer has to first create the full set of graph operations, and only then are these operations compiled with a TensorFlow session object and fed data. Wouldn’t it be great if you could define operations, then immediately run data through them to observe the output? Or wouldn’t it be great to set standard Python debug breakpoints within your code, so you can step into your deep learning training loops wherever and whenever you like and examine the tensors and arrays in your models? This is now possible using the TensorFlow Eager API, available in the latest version of TensorFlow.

The TensorFlow Eager API allows you to dynamically create your model in an imperative programming framework. In other words, you can create tensors, operations and other TensorFlow objects by typing commands into Python and run them straight away, without the need to set up the usual session infrastructure. This is useful for debugging, as mentioned above, but it also allows dynamic adjustment of deep learning models as training progresses. In fact, in natural language processing the ability to create dynamic graphs is useful, given that sentences and other utterances in natural language have varying lengths. In this TensorFlow Eager tutorial, I’ll show you the basics of the new API and also show how you can use it to create a fully fledged convolutional neural network.

Recommended video course – If you’d like to learn more about TensorFlow, and you’re more of a video learner, check out this cheap online course: Complete Guide to TensorFlow for Deep Learning with Python.

TensorFlow Eager basics

The first thing you need to do to use TensorFlow Eager is to enable Eager execution. To do so, you can run the following (note, you can type this directly into your Python interpreter):

import tensorflow as tf
tf.enable_eager_execution()

Now you can define TensorFlow operations and run them on the fly. In the code below, a numpy range from 0 to 9 is multiplied by a scalar value of 10, using the TensorFlow multiply operation:

# simple example
import numpy as np
z = tf.constant(np.arange(10))
z_tf = tf.multiply(z, np.array(10))
print(z_tf)

This code snippet will output the following:

tf.Tensor([ 0 10 20 30 40 50 60 70 80 90], shape=(10,), dtype=int32)

Notice we can immediately access the results of the operation. If we ran the above without running the tf.enable_eager_execution() command, we would instead see the definition of the TensorFlow operation i.e.:

Tensor("Mul:0", shape=(10,), dtype=int32)

Notice also how easily TensorFlow Eager interacts with the numpy framework. So far, so good. Now, the main component of any deep learning API is how gradients are handled – this will be addressed in the next section.
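
For instance, here is a minimal sketch (assuming the imports above and that eager execution has been enabled) of how eager tensors convert to and from numpy arrays:

a = tf.constant(np.arange(5))       # an eager tensor backed by a numpy array
b = a.numpy()                       # back to a plain numpy array: [0 1 2 3 4]
c = tf.add(a, np.full(5, 10))       # numpy arrays feed straight into TensorFlow ops
print(type(b), c.numpy())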

Gradients in TensorFlow Eager

Gradient calculation is necessary in neural networks during the back-propagation stage (if you’d like to know more, check out my neural networks tutorial). The gradient calculations in the TensorFlow Eager API work similarly to the autograd package used in PyTorch. To calculate the gradient of an operation using Eager, you can use the gradients_function() operation. The code below calculates the gradient for an x³ function:

import tensorflow.contrib.eager as tfe
def f_cubed(x):
    return x**3
grad = tfe.gradients_function(f_cubed)
grad(3.)[0].numpy()

Notice the use of tfe.gradients_function(f_cubed) – when called, this operation returns the gradient df/dx at the supplied x value. The code above returns the value 27 – this makes sense, as the derivative of x³ is 3x², which evaluated at x = 3 gives 3·3² = 27. The final line shows the grad operation and then the conversion of the output to a numpy scalar, i.e. a float value.
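
To extend the idea a little – this is a small sketch rather than anything from the original post, reusing f_cubed and the tfe import above, with f_two as an illustrative helper – gradients_function also handles functions of several arguments (returning one gradient per argument) and can be nested to obtain higher-order derivatives:

def f_two(x, y):
    return x**2 * y                           # df/dx = 2xy, df/dy = x^2

dx, dy = tfe.gradients_function(f_two)(3., 2.)
print(dx.numpy(), dy.numpy())                 # 12.0 9.0

# second derivative of x**3 by nesting gradients_function
d2 = tfe.gradients_function(lambda x: tfe.gradients_function(f_cubed)(x)[0])
print(d2(3.)[0].numpy())                      # 6x at x = 3, i.e. 18.0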

We can show the use of this gradients_function in a more complicated example – polynomial line fitting. In this example, we will use TensorFlow Eager to discover the weights of a noisy 3rd order polynomial. This is what the line looks like:

import matplotlib.pyplot as plt
x = np.arange(0, 5, 0.1)
y = x**3 - 4*x**2 - 2*x + 2
y_noise = y + np.random.normal(0, 1.5, size=(len(x),))
plt.close("all")
plt.plot(x, y)
plt.scatter(x, y_noise)

A noisy polynomial to fit

As can be observed from the code, the polynomial is expressed as x³ – 4x² – 2x + 2 with some random noise added. Therefore, we want our code to find a “weight” vector of approximately [1, -4, -2, 2]. First, let’s define a few functions:

def get_batch(x, y, batch_size=20):
    idxs = np.random.randint(0, len(x), (batch_size))
    return x[idxs], y[idxs]

class PolyModel(object):
    def __init__(self):
        self.w = tfe.Variable(tf.random_normal([4]))
        
    def f(self, x):
        return self.w[0] * x ** 3 + self.w[1] * x ** 2 + self.w[2] * x + self.w[3]

def loss(model, x, y):
    err = model.f(x) - y
    return tf.reduce_mean(tf.square(err))

The first function is a simple randomized batching function. The second is a class definition for our polynomial model. Upon initialization, we create a weight variable self.w as a TensorFlow Eager variable, randomly initialized as a length-4 vector. Next, we define a function f which evaluates the third-order polynomial using the weight vector. Finally, we define a loss function, which returns the mean squared error between the current model output and the target y values.
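
Because everything runs eagerly, you can sanity-check these pieces interactively before training – a quick sketch using the x, y and functions defined above (model_check is just a hypothetical throw-away instance):

model_check = PolyModel()                          # throw-away instance, only for inspection
x_batch, y_batch = get_batch(x, y)
print(model_check.f(x_batch).numpy().shape)        # (20,) - one prediction per sample
print(loss(model_check, x_batch, y_batch).numpy()) # typically large before any training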

To train the model, we can run the following:

model = PolyModel()
grad = tfe.implicit_gradients(loss)
optimizer = tf.train.AdamOptimizer()
iters = 20000
for i in range(iters):
    x_batch, y_batch = get_batch(x, y)
    optimizer.apply_gradients(grad(model, x_batch, y_batch))
    if i % 1000 == 0:
        print("Iteration {}, loss: {}".format(i+1, loss(model, x_batch, y_batch).numpy()))

First, we create a model and then use the TensorFlow Eager function implicit_gradients. When called, this function computes the gradient of the loss with respect to all of the variables the loss depends on, which is handy. We use a standard Adam optimizer for this task. Finally, a loop begins which supplies batches of data and the model to the gradient function, then applies the returned gradients to the optimizer to perform an optimization step.

After running this code, we get the following output graph:

plt.close("all")
plt.plot(x, y)
plt.plot(x, model.f(x).numpy())
plt.scatter(x, y_noise)

A noisy polynomial with a fitted function

The orange line is the fitted line, the blue is the “ground truth”. Not perfect, but not too bad.
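
Since the model weights are ordinary eager variables, you can also inspect the learned coefficients directly and compare them with the target [1, -4, -2, 2] – a one-line check (exact values will vary with the noise and the random initialization):

print(np.round(model.w.numpy(), 2))   # should land somewhere near [ 1. -4. -2.  2.]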

Next, I’ll show you how to use TensorFlow Eager to create a proper neural network classifier trained on the MNIST dataset.

A neural network with TensorFlow Eager

In the code below, I’ll show you how to create a Convolutional Neural Network to classify MNIST images using TensorFlow Eager. If you’re not sure about Convolutional Neural Networks, you can check out my Convolutional Neural Network tutorial. The first part of the code shows you how to extract the MNIST dataset:

mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
def scale(x, min_val=0.0, max_val=255.0):
    x = tf.to_float(x)
    return tf.div(tf.subtract(x, min_val), tf.subtract(max_val, min_val))
train_ds = tf.data.Dataset.from_tensor_slices((x_train, y_train))
train_ds = train_ds.map(lambda x, y: (scale(x), tf.one_hot(y, 10))).shuffle(10000).batch(30)
test_ds = tf.data.Dataset.from_tensor_slices((x_test, y_test))
test_ds = test_ds.map(lambda x, y: (scale(x), tf.one_hot(y, 10))).shuffle(10000).batch(30)

In the code above, we are making use of the Keras datasets now available in TensorFlow (by the way, the Keras deep learning framework is now heavily embedded within TensorFlow – to learn more about Keras, see my tutorial). The raw MNIST image dataset has values ranging from 0 to 255, which represent grayscale pixel values – these need to be scaled to between 0 and 1, which is what the scale function accomplishes.

Next, to turn the Keras image data into a TensorFlow Dataset object, the remaining lines above create a scaled training and testing dataset, randomly shuffled and ready for batch extraction. They also apply the tf.one_hot function to the labels, converting each integer label into a one-hot vector of length 10 (one for each hand-written digit). If you’re not familiar with the TensorFlow Dataset API, check out my TensorFlow Dataset tutorial.

The next section of code creates the MNIST model itself, which will be trained. The best practice at the moment for TensorFlow Eager is to create a class definition for the model which inherits from the tf.keras.Model class. This is useful for a number of reasons, but the main one for our purposes is the ability to call the model.variables property when determining Eager gradients, which “gathers together” all the trainable variables within the model. The code looks like:

class MNISTModel(tf.keras.Model):
    def __init__(self, device='cpu:0'):
        super(MNISTModel, self).__init__()
        self.device = device
        self._input_shape = [-1, 28, 28, 1]
        self.conv1 = tf.layers.Conv2D(32, 5,
                                  padding='same',
                                  activation=tf.nn.relu)
        self.max_pool2d = tf.layers.MaxPooling2D((2, 2), (2, 2), padding='same')
        self.conv2 = tf.layers.Conv2D(64, 5,
                                      padding='same',
                                      activation=tf.nn.relu)
        self.fc1 = tf.layers.Dense(750, activation=tf.nn.relu)
        self.dropout = tf.layers.Dropout(0.5)
        self.fc2 = tf.layers.Dense(10)
    
    def call(self, x):
        x = tf.reshape(x, self._input_shape)
        x = self.max_pool2d(self.conv1(x))
        x = self.max_pool2d(self.conv2(x))
        x = tf.layers.flatten(x)
        x = self.dropout(self.fc1(x))
        return self.fc2(x)

In the model definition, we create layers to implement the following network structure:

  1. 32 channel, 5×5 convolutional layer with ReLU activation

  2. 2×2 max pooling, with (2,2) strides

  3. 64 channel, 5×5 convolutional layer with ReLU activation

  4. 2×2 max pooling, with (2,2) strides

  5. Flattening

  6. Dense/Fully connected layer with 750 nodes, ReLU activation

  7. Dropout layer

  8. Dense/Fully connected layer with 10 nodes, no activation

As stated above, if you’re not sure what these terms mean, see my Convolutional Neural Network tutorial. Note that the call method is a mandatory method for the tf.keras.Model superclass – it is where the forward pass through the model is defined.

The next function is the loss function for the optimization:

def loss_fn(model, x, y):
    return tf.reduce_mean(
      tf.nn.softmax_cross_entropy_with_logits_v2(
          logits=model(x), labels=y))

Note that this function calls the forward pass through the model (which is an instance of our MNISTModel) and calculates the “raw” output. This raw output, along with the labels, passes through to the TensorFlow function softmax_cross_entropy_with_logits_v2. This applies the softmax activation to the “raw” output from the model, then creates a cross entropy loss.
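
To make the “softmax then cross entropy” step concrete, the fused op is conceptually equivalent to (though more numerically stable than) doing the two stages by hand – a rough sketch, assuming images and labels are a batch drawn from train_ds as in the training loop further below:

logits = model(images)                             # "raw" outputs for a batch of images
probs = tf.nn.softmax(logits)                      # softmax over the 10 classes
manual_loss = tf.reduce_mean(-tf.reduce_sum(labels * tf.log(probs), axis=1))
# manual_loss should closely match loss_fn(model, images, labels)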

Next, I define an accuracy function to keep track of the training set accuracy as training progresses, and also to check the test set accuracy afterwards:

def get_accuracy(model, x, y_true):
    logits = model(x)
    prediction = tf.argmax(logits, 1)
    equality = tf.equal(prediction, tf.argmax(y_true, 1))
    accuracy = tf.reduce_mean(tf.cast(equality, tf.float32))
    return accuracy

Finally, the full training code for the model is shown below:

model = MNISTModel()
optimizer = tf.train.AdamOptimizer()
epochs = 1000
for (batch, (images, labels)) in enumerate(train_ds):
    with tfe.GradientTape() as tape:
        loss = loss_fn(model, images, labels)
    grads = tape.gradient(loss, model.variables)
    optimizer.apply_gradients(zip(grads, model.variables), global_step=tf.train.get_or_create_global_step())
    if batch % 10 == 0:
        acc = get_accuracy(model, images, labels).numpy()
        print("Iteration {}, loss: {:.3f}, train accuracy: {:.2f}%".format(batch, loss_fn(model, images, labels).numpy(), acc*100))
    if batch > epochs:
        break

In the code above, we create the model along with an optimizer. The code then enters the training loop by iterating over the training dataset train_ds. Next comes the definition of the gradients for the model. Here we are using the TensorFlow Eager object GradientTape(). This is an efficient way of defining gradients over all the variables involved in the forward pass: it records all the operations executed during the forward pass and efficiently “plays back” these operations during back-propagation.

Using Python’s with statement, we include the loss_fn call, and all associated upstream variables and operations, within the tape to be recorded. Then, to extract the gradients of the relevant model variables, we call tape.gradient. The first argument is the “target” of the calculation, i.e. the loss, and the second argument is the “source”, i.e. all the model variables.
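
If the tape mechanics are new to you, here is a tiny standalone sketch (not part of the MNIST code) showing the same target/source pattern on a single variable:

w = tfe.Variable(5.0)
with tfe.GradientTape() as tape:
    current_loss = w * w                 # every operation on w inside the block is recorded
dw = tape.gradient(current_loss, [w])    # target = current_loss, sources = [w]
print(dw[0].numpy())                     # d(w^2)/dw at w = 5 is 10.0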

We then pass the gradients and the variables, zipped together, to the Adam optimizer for a training step. Every 10 iterations some results are printed, and the training loop exits once the batch count exceeds the maximum set in epochs.

Running this code for 1000 iterations will give you a loss < 0.05, and training set accuracy approaching 100%. The code below calculates the test set accuracy:

avg_acc = 0
test_epochs = 20
for (batch, (images, labels)) in enumerate(test_ds):
    avg_acc += get_accuracy(model, images, labels).numpy()
    if batch % 100 == 0 and batch != 0:
        print("Iteration:{}, Average test accuracy: {:.2f}%".format(batch, (avg_acc/batch)*100))
print("Final test accuracy: {:.2f}%".format(avg_acc/batch * 100))

You should be able to get a test set accuracy, using the code defined above, on the order of 98% or greater for the trained model.

In this post, I’ve shown you the basics of using the TensorFlow Eager API for imperative deep learning. I’ve also shown you how to use the autograd-like functionality to perform a polynomial line fitting task and build a convolutional neural network which achieves relatively high test set accuracy for the MNIST classification task. Hopefully you can now use this new TensorFlow paradigm to reduce development time and enhance debugging for your future TensorFlow projects. All the best!
