TensorFlow Eager

TensorFlow is a great deep learning framework. In fact, it is still the reigning monarch within the deep learning framework kingdom. However, it has some frustrating limitations. One of these is the difficulties that arise during debugging. In TensorFlow, it’s difficult to diagnose what is happening in your model. This is due to its static graphstructure (for details, see my TensorFlow tutorial) – in TensorFlow the developer has to first create the full set of graph operations, and only then are these operations compiled with a TensorFlow session object and fed data. Wouldn’t it be great if you could define operations, then immediately run data through them to observe what the output was? Or wouldn’t it be great to set standard Python debug breakpoints within your code, so you can step into your deep learning training loops wherever and whenever you like and examine the tensors and arrays in your models? This is now possible using the TensorFlow Eager API, available in the latest version of TensorFlow.

The TensorFlow Eager API allows you to dynamically create your model in an imperative programming framework. In other words, you can create tensors, operations and other TensorFlow objects by typing the command into Python, and run them straight way without the need to set up the usual session infrastructure. This is useful for debugging, as mentioned above, but also it allows dynamic adjustments of deep learning models as training progresses. In fact, in natural language processing, the ability to create dynamic graphs is useful, given that sentences and other utterances in natural language have varying lengths. In this TensorFlow Eager tutorial, I’ll show you the basics of the new API and also show how you can use it to create a fully fledged convolutional neural network.

TensorFlow Eager basics

The first thing you need to do to use TensorFlow Eager is to enable Eager execution. To do so, you can run the following (note, you can type this directly into your Python interpreter):

import tensorflow as tf
tf.enable_eager_execution()

Now you can define TensorFlow operations and run them on the fly. In the code below, a numpy range from 0 to 9 is multiplied by a scalar value of 10, using the TensorFlow multiply operation:

# simple example
z = tf.constant(np.arange(10))
z_tf = tf.multiply(z, np.array(10))
print(z_tf)

This code snippet will output the following:

tf.Tensor([ 0 10 20 30 40 50 60 70 80 90], shape=(10,), dtype=int32)

Notice we can immediately access the results of the operation. If we ran the above without running the tf.enable_eager_execution() command, we would instead see the definition of the TensorFlow operation i.e.:

Tensor(“Mul:0”, shape=(10,), dtype=int32)

Notice also how easily TensorFlow Eager interacts with the numpy framework. So far, so good. Now, the main component of any deep learning API is how gradients are handled – this will be addressed in the next section.

Gradients in TensorFlow Eager

Gradient calculation is necessary in neural networks during the back-propagation stage (if you’d like to know more, check out my neural networks tutorial). The gradient calculations in the TensorFlow Eager API work similarly to the autograd package used in PyTorch. To calculate the gradient of an operation using Eager, you can use the gradients_function() operation. The code below calculates the gradient for an x3x3 function:

import tensorflow.contrib.eager as tfe
def f_cubed(x):
    return x**3
grad = tfe.gradients_function(f_cubed)
grad(3.)[0].numpy()

Notice the use of tfe.gradients_function(f_cubed) – when called, this operation will return the gradient of df/dx for the x value. The code above returns the value 27 – this makes sense as the derivative of x3x3 is 3x2=3∗32=273x2=3∗32=27. The final line shows the grad operation, and then the conversion of the output to a numpy scalar i.e. a float value.

We can show the use of this gradients_function in a more complicated example – polynomial line fitting. In this example, we will use TensorFlow Eager to discover the weights of a noisy 3rd order polynomial. This is what the line looks like:

x = np.arange(0, 5, 0.1)
y = x**3 - 4*x**2 - 2*x + 2
y_noise = y + np.random.normal(0, 1.5, size=(len(x),))
plt.close("all")
plt.plot(x, y)
plt.scatter(x, y_noise)

A noisy polynomial to fit

As can be observed from the code, the polynomial is expressed as x3–4x2–2x+2x3–4x2–2x+2 with some random noise added. Therefore, we want our code to find a “weight” vector of approximately [1, -4, -2, 2]. First, let’s define a few functions:

def get_batch(x, y, batch_size=20):
    idxs = np.random.randint(0, len(x), (batch_size))
    return x[idxs], y[idxs]

class PolyModel(object):
    def __init__(self):
        self.w = tfe.Variable(tf.random_normal([4]))
        
    def f(self, x):
        return self.w[0] * x ** 3 + self.w[1] * x ** 2 + self.w[2] * x + self.w[3]

def loss(model, x, y):
    err = model.f(x) - y
    return tf.reduce_mean(tf.square(err))

The first function is a simple randomized batching function. The second is a class definition for our polynomial model. Upon initialization, we create a weight variable self.w and set to a TensorFlow Eager variable type, randomly initialized as a 4 length vector. Next, we define a function f which returns the weight vector by the third order polynomial form. Finally, we have a loss function defined, which returns the mean squared error between the current model output and the noisy y vector.

To train the model, we can run the following:

model = PolyModel()
grad = tfe.implicit_gradients(loss)
optimizer = tf.train.AdamOptimizer()
iters = 20000
for i in range(iters):
    x_batch, y_batch = get_batch(x, y)
    optimizer.apply_gradients(grad(model, x_batch, y_batch))
    if i % 1000 == 0:
        print("Iteration {}, loss: {}".format(i+1, loss(model, x_batch, y_batch).numpy()))

First, we create a model and then use a TensorFlow Eager function called implicit_gradients. This function will detect any upstream or parent gradients involved in calculating the loss, which is handy. We are using a standard Adam optimizer for this task. Finally a loop begins, which supplies the batch data and the model to the gradient function. Then the program applies the returned gradients to the optimizer to perform the optimizing step.

After running this code, we get the following output graph:

plt.close("all")
plt.plot(x, y)
plt.plot(x, model.f(x).numpy())
plt.scatter(x, y_noise)

A noisy polynomial with a fitted function

The orange line is the fitted line, the blue is the “ground truth”. Not perfect, but not too bad.

Next, I’ll show you how to use TensorFlow Eager to create a proper neural network classifier trained on the MNIST dataset.

A neural network with TensorFlow Eager

In the code below, I’ll show you how to create a Convolutional Neural Network to classify MNIST images using TensorFlow Eager. If you’re not sure about Convolutional Neural Networks, you can check out my tutorial here. The first part of the code shows you how to extract the MNIST dataset:

mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()

In the case above, we are making use of the Keras datasets now available in TensorFlow (by the way, the Keras deep learning framework is now heavily embedded within TensorFlow – to learn more about Keras see my tutorial). The raw MNIST image dataset has values ranging from 0 to 255 which represent the grayscale values – these need to be scaled to between 0 and 1. The function below accomplishes this:

def scale(x, min_val=0.0, max_val=255.0):
    x = tf.to_float(x)
    return tf.div(tf.subtract(x, min_val), tf.subtract(max_val, min_val))

Next, in order to setup the Keras image dataset into a TensorFlow Dataset object, we use the following code. This code creates a scaled training and testing dataset. This dataset is also randomly shuffled and ready for batch extraction. It also applies the tf.one_hot function to the labels to convert the integer label to a one hot vector of length 10 (one for each hand-written digit). If you’re not familiar with the TensorFlow Dataset API, check out my TensorFlow Dataset tutorial.

train_ds = tf.data.Dataset.from_tensor_slices((x_train, y_train))
train_ds = train_ds.map(lambda x, y: (scale(x), tf.one_hot(y, 10))).shuffle(10000).batch(30)
test_ds = tf.data.Dataset.from_tensor_slices((x_test, y_test))
test_ds = test_ds.map(lambda x, y: (scale(x), tf.one_hot(y, 10))).shuffle(10000).batch(30)

The next section of code creates the MNIST model itself, which will be trained. The best practice at the moment for TensorFlow Eager is to create a class definition for the model which inherits from the tf.keras.Model class. This is useful for a number of reasons, but the main one for our purposes is the ability to call on the model.variables property when determining Eager gradients, and this “gathers together” all the trainable variables within the model. The code looks like:

class MNISTModel(tf.keras.Model):
    def __init__(self, device='cpu:0'):
        super(MNISTModel, self).__init__()
        self.device = device
        self._input_shape = [-1, 28, 28, 1]
        self.conv1 = tf.layers.Conv2D(32, 5,
                                  padding='same',
                                  activation=tf.nn.relu)
        self.max_pool2d = tf.layers.MaxPooling2D((2, 2), (2, 2), padding='same')
        self.conv2 = tf.layers.Conv2D(64, 5,
                                      padding='same',
                                      activation=tf.nn.relu)
        self.fc1 = tf.layers.Dense(750, activation=tf.nn.relu)
        self.dropout = tf.layers.Dropout(0.5)
        self.fc2 = tf.layers.Dense(10)
    
    def call(self, x):
        x = tf.reshape(x, self._input_shape)
        x = self.max_pool2d(self.conv1(x))
        x = self.max_pool2d(self.conv2(x))
        x = tf.layers.flatten(x)
        x = self.dropout(self.fc1(x))
        return self.fc2(x)

In the model definition, we create layers to implement the following network structure:

  1. 32 channel, 5×5 convolutional layer with ReLU activation

  2. 2×2 max pooling, with (2,2) strides

  3. 64 channel 5×5 convolutional layer with ReLU activation

  4. Flattening

  5. Dense/Fully connected layer with 750 nodes, ReLU activation

  6. Dropout layer

  7. Dense/Fully connected layer with 10 nodes, no activation

As stated above, if you’re not sure what these terms mean, see my Convolutional Neural Network tutorial. Note that the call method is a mandatory method for the tf.keras.Model superclass – it is where the forward pass through the model is defined.

The next function is the loss function for the optimization:

def loss_fn(model, x, y):
    return tf.reduce_mean(
      tf.nn.softmax_cross_entropy_with_logits_v2(
          logits=model(x), labels=y))

Note that this function calls the forward pass through the model (which is an instance of our MNISTModel) and calculates the “raw” output. This raw output, along with the labels, passes through to the TensorFlow function softmax_cross_entropy_with_logits_v2. This applies the softmax activation to the “raw” output from the model, then creates a cross entropy loss.

Next, I define an accuracy function below, to keep track of how the training is progressing regarding training set accuracy, and also to check test set accuracy:

def get_accuracy(model, x, y_true):
    logits = model(x)
    prediction = tf.argmax(logits, 1)
    equality = tf.equal(prediction, tf.argmax(y_true, 1))
    accuracy = tf.reduce_mean(tf.cast(equality, tf.float32))
    return accuracy

Finally, the full training code for the model is shown below:

model = MNISTModel()
optimizer = tf.train.AdamOptimizer()
epochs = 1000
for (batch, (images, labels)) in enumerate(train_ds):
    with tfe.GradientTape() as tape:
        loss = loss_fn(model, images, labels)
    grads = tape.gradient(loss, model.variables)
    optimizer.apply_gradients(zip(grads, model.variables), global_step=tf.train.get_or_create_global_step())
    if batch % 10 == 0:
        acc = get_accuracy(model, images, labels).numpy()
        print("Iteration {}, loss: {:.3f}, train accuracy: {:.2f}%".format(batch, loss_fn(model, images, labels).numpy(), acc*100))
    if batch > epochs:
        break

In the code above, we create the model along with an optimizer. The code then enters the training loop, by iterating over the training dataset train_ds. Then follows the definition of the gradients for the model. Here we are using the TensorFlow Eager object called GradientTape(). This is an efficient way of defining the gradients over all the variables involved in the forward pass. It will track all the operations during the forward pass and will efficiently “play back” these operations during back-propagation.

Using the Python with functionality, we can include the loss_fn function, and all associated upstream variables and operations, within the tape to be recorded. Then, to extract the gradients of the relevant model variables, we call tape.gradient. The first argument is the “target” for the calculation, i.e. the loss, and the second argument is the “source” i.e. all the model variables.

We then pass the gradients and the variables zipped together to the Adam optimizer for a training step. Every 10 iterations some results are printed and the training loop exits if the iterations number exceeds the maximum number of epochs.

Running this code for 1000 iterations will give you a loss < 0.05, and training set accuracy approaching 100%. The code below calculates the test set accuracy:

avg_acc = 0
test_epochs = 20
for (batch, (images, labels)) in enumerate(test_ds):
    avg_acc += get_accuracy(model, images, labels).numpy()
    if batch % 100 == 0 and batch != 0:
        print("Iteration:{}, Average test accuracy: {:.2f}%".format(batch, (avg_acc/batch)*100))
print("Final test accuracy: {:.2f}%".format(avg_acc/batch * 100))

You should be able to get a test set accuracy, using the code defined above, on the order of 98% or greater for the trained model.

In this post, I’ve shown you the basics of using the TensorFlow Eager API for imperative deep learning. I’ve also shown you how to use the autograd-like functionality to perform a polynomial line fitting task and build a convolutional neural network which achieves relatively high test set accuracy for the MNIST classification task. Hopefully you can now use this new TensorFlow paradigm to reduce development time and enhance debugging for your future TensorFlow projects. All the best!

Last updated