This post contains some of my notes on GradientTape() API from TensorFlow. The notes are kind of random but I hope they are helpful with some useful examples.

A brief overview on how to use gradient tape in TF

First of all let’s enable eager execution


The tf.GradientTape is there for automatic differentiation. A dumb example

x = tf.constant(1)

with tf.GradientTape() as t: # You need this code to let t knows that you need to
             # keep track of x.
  y = 2*x

dy_dx = t.gradient(y,x)


You will get

tf.Tensor(2.0, shape=(), dtype=float32)

Since we have in the code

\[y = 2x\]

and therefore

\[\frac{dy}{dx} = 2\]

Note that TensorFlow operations executed within the context of a GradientTape are “recorded” so that the recorded computation can be later differentiated.
So you should put the t.gradient() outside the context.

Note that according to the documentation “By default, the resources held by a GradientTape are released as soon as GradientTape.gradient() method is called.” That means that you cannot call a second gradient() function with the code above. Let’s suppose you want to evaluate two gradients, then you need the following code

x = tf.constant(2.0)

with tf.GradientTape(persistent = True) as t: # You need this code to let t knows that you need to
                 # keep track of x.
  y = 2*x
  y1 = x*x*x

dy_dx = t.gradient(y,x)
dy1_dx = t.gradient(y1,x)


Note the persistent = True as option whithin GradientTape. In this way you can do multiple calls with t.gradient().

Example - MNIST

Fer example let’s get the MNIST dataset and use the GradientTape to develop our own training loop.

First things first. Let’s load the dataset

(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

# Preprocess the data (these are Numpy arrays)
x_train = x_train.reshape(60000, 784).astype('float32') / 255
x_test = x_test.reshape(10000, 784).astype('float32') / 255

y_train = y_train.astype('float32')
y_test = y_test.astype('float32')

# Reserve 10,000 samples for validation
x_val = x_train[-10000:]
y_val = y_train[-10000:]
x_train = x_train[:-10000]
y_train = y_train[:-10000]

We can use, for example, Keras functional APIs to define the model we want to use

inputs = keras.Input(shape=(784,), name='digits')
x = layers.Dense(64, activation='relu', name='dense_1')(inputs)
x = layers.Dense(64, activation='relu', name='dense_2')(x)
outputs = layers.Dense(10, activation='softmax', name='predictions')(x)
model = keras.Model(inputs=inputs, outputs=outputs)

optimizer = keras.optimizers.SGD(learning_rate=1e-3)
loss_fn = keras.losses.SparseCategoricalCrossentropy()

Then we can prepare the dataset (using

batch_size = 64
train_dataset =, y_train))
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(batch_size)

And now we can do the actual loop

for epoch in range(3):
  print('Start of epoch %d' % (epoch,))
  for step, (x_batch_train, y_batch_train) in enumerate(train_dataset):

    with tf.GradientTape() as tape:
      logits = model(x_batch_train)  # Logits for this minibatch
      loss_value = loss_fn(y_batch_train, logits)

    grads = tape.gradient(loss_value, model.trainable_weights)
    optimizer.apply_gradients(zip(grads, model.trainable_weights))

    if step % 200 == 0:
        print('Training loss (for one batch) at step %s: %s'
             % (step, float(loss_value)))
        print('Seen so far: %s samples' % ((step + 1) * 64))

That should train pretty fast. Now the problem is that since we have not compiled the model we cannot simply evaluate the accuracy, or any metric we want. But is easy to solve. We can simply add two lines during the model definition

train_acc_metric = keras.metrics.SparseCategoricalAccuracy()
val_acc_metric = keras.metrics.SparseCategoricalAccuracy()

then during the training (under the for loop over the batches) we need

train_acc_metric(y_batch_train, logits)

and then for each epoch we can print the metrics

train_acc = train_acc_metric.result()
print('Training acc over epoch: %s' % (float(train_acc),))

and for example to do a loop over the val loops one could do

for x_batch_val, y_batch_val in val_dataset:
    val_logits = model(x_batch_val)
    val_acc_metric(y_batch_val, val_logits)
  val_acc = val_acc_metric.result()
  print('Validation acc: %s' % (float(val_acc),))