
Deep Learning: Coursework 1


IMPORTANT

Please make sure your submission includes all results/plots/tables required for grading. We should not have to re-run your code.

Assignment Description

The Data

Handwritten Digit Recognition Dataset (MNIST)

In this assignment we will be using the MNIST digit dataset (https://yann.lecun.com/exdb/mnist/).

The dataset contains images of hand-written digits (0-9), and the corresponding labels.

The images have a resolution of 28x28 pixels.

The MNIST Dataset in TensorFlow

You can use the TensorFlow built-in functionality to download and import the dataset into Python (see the Setup section below).


The Assignment

Objectives

Familiarise yourselves with TensorFlow and the basic concepts we have covered in the course, such as simple neural network models (fully connected models, convolutional networks) and backpropagation.

You will then train these models to classify handwritten digits from the MNIST dataset.

Variable Initialization

Initialize the variables containing the parameters using Xavier initialization (http://proceedings.mlr.press/v9/glorot10a.html).

initializer = tf.contrib.layers.xavier_initializer()
my_variable = tf.Variable(initializer(shape))
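For example, a minimal sketch (TF 1.x) of using this initializer for the parameters of a 784 -> 10 linear layer; W, b and logits are illustrative names, and x is the input placeholder from the Setup section below:

W = tf.Variable(initializer([784, 10]))  # weights of a 784 -> 10 linear layer
b = tf.Variable(tf.zeros([10]))          # bias, commonly initialized to zero
logits = tf.matmul(x, W) + b             # pre-softmax scores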
Hyper-parameters

For each of these models you will be requested to run experiments with different hyper-parameters.

More specifically, you will be requested to try 3 sets of hyper-parameters per model, and report the resulting model accuracy.

Each combination of hyper-parameters will specify how to set each of the following:

num_epochs : Number of iterations through the training section of the dataset [ a positive integer ].
learning_rate : Learning rate used by the gradient descent optimizer [ a scalar between 0 and 1 ]

In all experiments use a batch_size of 100.

Loss function

All models should be trained so as to minimize the cross-entropy loss function:

$$\text{loss} = -\sum_{i=1}^{N} \log p(y_i \mid x_i) = -\sum_{i=1}^{N} \log \frac{\exp(z_i[y_i])}{\sum_{c=1}^{10} \exp(z_i[c])} = \sum_{i=1}^{N} \left( -z_i[y_i] + \log \sum_{c=1}^{10} \exp(z_i[c]) \right)$$

where $z_i$ is the input to the softmax layer for example $i$ (so $\text{softmax}(z_i)$ is the vector of class probabilities), $z_i[c]$ denotes the $c$-th entry of the vector $z_i$, and $i$ indexes the dataset $\{(x_i, y_i)\}_{i=1}^{N}$.

Note : Sum the loss across the elements of the batch with tf.reduce_sum().

Hint : read about TensorFlow’s tf.nn.softmax_cross_entropy_with_logits (https://www.tensorflow.org/api_docs/python/tf/nn/softmax_cross_entropy_with_logits) function.
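For instance, assuming logits holds the pre-softmax scores z and y_ is the one-hot label placeholder, the loss might be written as follows (an illustrative sketch, not the required implementation):

cross_entropy = tf.reduce_sum(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=logits))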

Optimization

Use stochastic gradient descent (SGD) for optimizing the loss function.

Hint : read about TensorFlow’s tf.train.GradientDescentOptimizer() (https://www.tensorflow.org/api_docs/python/tf/train/GradientDescentOptimizer).
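A minimal sketch, assuming cross_entropy is the loss defined above and learning_rate is the current hyper-parameter:

train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(cross_entropy)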


Training and Evaluation

The TensorFlow built-in functionality for downloading and importing the dataset into Python returns a Datasets object.

This object will have three attributes:

train
validation
test

Use only the train data in order to optimize the model.

Use datasets.train.next_batch(100) in order to sample mini-batches of data.

Every 20000 training samples (i.e. every 200 updates to the model), interrupt training and measure the accuracy of the model, each time evaluating it both on 20% of the train set and on the entire test set.
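As a hedged sketch of this evaluation (the accuracy op, the choice of the first 11,000 of the 55,000 training images as the ~20% subset, and the names logits and sess are illustrative assumptions, not part of the required solution):

# Evaluation metric (sketch).
correct = tf.equal(tf.argmax(logits, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))

# Inside the training loop, every log_period_updates updates:
train_accuracy.append(sess.run(accuracy, feed_dict={
    x: eval_mnist.train.images[:11000],    # ~20% of the train set (one possible choice)
    y_: eval_mnist.train.labels[:11000]}))
test_accuracy.append(sess.run(accuracy, feed_dict={
    x: eval_mnist.test.images, y_: eval_mnist.test.labels}))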

Reporting

For each model i, you will collect the learning curves associated with each combination of hyper-parameters.

Use the utility function plot_learning_curves to plot these learning curves,

and the utility function plot_summary_table to generate a summary table of results.

For each run collect the train and test curves in a tuple, together with the hyper-parameters.

experiments_task_i = [
((num_epochs_1, learning_rate_1), train_accuracy_1, test_accuracy_1),
((num_epochs_2, learning_rate_2), train_accuracy_2, test_accuracy_2),
((num_epochs_3, learning_rate_3), train_accuracy_3, test_accuracy_3)]

Hint

If you need some extra help familiarizing yourselves with the dataset and the task of building models in TensorFlow, you can check the TF tutorial for MNIST (https://www.tensorflow.org/tutorials/mnist/beginners/).

The tutorial will walk you through the MNIST classification task step-by-step, building and optimizing a model in TensorFlow.

(Please do not copy the provided code, though. Walk through the tutorial, but write your own implementation).

Imports and utility functions (do not modify!)

In [0]:

# Import useful libraries.
import tensorflow as tf
import matplotlib.pyplot as plt
from tensorflow.examples.tutorials.mnist import input_data
import numpy as np

# Global variables.
log_period_samples = 20000
batch_size = 100

# Import dataset with one-hot encoding of the class labels.
def get_data():
  return input_data.read_data_sets("MNIST_data/", one_hot=True)

# Placeholders to feed train and test data into the graph.
# Since batch dimension is 'None', we can reuse them both for train and eval.
def get_placeholders():
  x = tf.placeholder(tf.float32, [None, 784])
  y_ = tf.placeholder(tf.float32, [None, 10])
  return x, y_

# Plot learning curves of experiments.
def plot_learning_curves(experiment_data):
  # Generate figure.
  fig, axes = plt.subplots(3, 3, figsize=(16, 12))
  st = fig.suptitle(
      "Learning Curves for all Tasks and Hyper-parameter settings",
      fontsize="x-large")
  # Plot all learning curves.
  for i, results in enumerate(experiment_data):
    for j, (setting, train_accuracy, test_accuracy) in enumerate(results):
      # Plot.
      xs = [x * log_period_samples for x in range(1, len(train_accuracy) + 1)]
      axes[j, i].plot(xs, train_accuracy, label='train_accuracy')
      axes[j, i].plot(xs, test_accuracy, label='test_accuracy')
      # Prettify individual plots.
      axes[j, i].ticklabel_format(style='sci', axis='x', scilimits=(0, 0))
      axes[j, i].set_xlabel('Number of samples processed')
      axes[j, i].set_ylabel(
          'Epochs: {}, Learning rate: {}. Accuracy'.format(*setting))
      axes[j, i].set_title('Task {}'.format(i + 1))
      axes[j, i].legend()
  # Prettify overall figure.
  plt.tight_layout()
  st.set_y(0.95)
  fig.subplots_adjust(top=0.91)
  plt.show()

# Generate summary table of results.
def plot_summary_table(experiment_data):
  # Fill data.
  cell_text = []
  rows = []
  columns = ['Setting 1', 'Setting 2', 'Setting 3']
  for i, results in enumerate(experiment_data):
    rows.append('Model {}'.format(i + 1))
    cell_text.append([])
    for j, (setting, train_accuracy, test_accuracy) in enumerate(results):
      if test_accuracy != []:
        cell_text[i].append(test_accuracy[-1])
      else:
        print('Warning: Something went wrong! Missing testing/training data')
  # Generate table.
  fig = plt.figure(frameon=False)
  ax = plt.gca()
  the_table = ax.table(
      cellText=cell_text,
      rowLabels=rows,
      colLabels=columns,
      loc='center')
  the_table.scale(1, 4)
  # Prettify.
  ax.patch.set_facecolor('None')
  ax.xaxis.set_visible(False)
  ax.yaxis.set_visible(False)

PART 1: TensorFlow + Simple NN models (30 pts)

Model 1 (5 pts)

Network

Train a neural network model consisting of 1 linear layer, followed by a softmax:

(input → linear layer → softmax → class probabilities)

Hyper-parameters

Train the model with three different hyper-parameter settings:

num_epochs=5, learning_rate=0.0001
num_epochs=5, learning_rate=0.001
num_epochs=15, learning_rate=0.1

In [0]:

# Store results of runs with different configurations in a list.
# Use a tuple (num_epochs, learning_rate) as keys, and a tuple
# (training_accuracy, testing_accuracy) as values.
experiments_task1 = []
settings = [(5, 0.0001), (5, 0.001), (15, 0.1)]

print('Training Model 1')

# Train Model 1 with the different hyper-parameter settings.
for (num_epochs, learning_rate) in settings:

# Reset graph, recreate placeholders and dataset.
tf.reset_default_graph()
x, y_ = get_placeholders()
mnist = get_data()
eval_mnist = get_data()
#####################################################
# Define model, loss, update and evaluation metric. #
#####################################################
# Train.
i, train_accuracy, test_accuracy = 0 , [], []
log_period_updates = int(log_period_samples / batch_size)
with tf.train.MonitoredSession() as sess:
while mnist.train.epochs_completed < num_epochs:
# Update.
i += 1
batch_xs, batch_ys = mnist.train.next_batch(batch_size)
#################
# Training step #
#################
pass
# Periodically evaluate.
if i % log_period_updates == 0 :
#####################################
# Compute and store train accuracy. #
#####################################
#####################################
# Compute and store test accuracy. #
#####################################
pass
experiments_task1.append(
((num_epochs, learning_rate), train_accuracy, test_accuracy))

Model 2 (5 pts)

Network

1 hidden layer (32 units) with a ReLU non-linearity, followed by a softmax.

(input → non-linear layer → linear layer → softmax → class probabilities)

Hyper-parameters

Train the model with three different hyper-parameter settings:

num_epochs=15, learning_rate=0.0001
num_epochs=15, learning_rate=0.005
num_epochs=15, learning_rate=0.1

In [0]:

# Store results of runs with different configurations in a list.
# Use a tuple (num_epochs, learning_rate) as keys, and a tuple
# (training_accuracy, testing_accuracy) as values.
experiments_task2 = []
settings = [(15, 0.0001), (15, 0.005), (15, 0.1)]

print('Training Model 2')

# Train Model 2 with the different hyper-parameter settings.
for (num_epochs, learning_rate) in settings:

# Reset graph, recreate placeholders and dataset.
tf.reset_default_graph() # reset the tensorflow graph
x, y_ = get_placeholders()
mnist = get_data() # use for training.
eval_mnist = get_data() # use for evaluation.
#####################################################
# Define model, loss, update and evaluation metric. #
#####################################################
# Train.
i, train_accuracy, test_accuracy = 0 , [], []
log_period_updates = int(log_period_samples / batch_size)
with tf.train.MonitoredSession() as sess:
while mnist.train.epochs_completed < num_epochs:
# Update.
i += 1
batch_xs, batch_ys = mnist.train.next_batch(batch_size)
#################
# Training step #
#################
pass
# Periodically evaluate.
if i % log_period_updates == 0 :
#####################################
# Compute and store train accuracy. #
#####################################
#####################################
# Compute and store test accuracy. #
#####################################
pass
experiments_task2.append(
((num_epochs, learning_rate), train_accuracy, test_accuracy))

Model 3 (5 pts)

Network

2 hidden layers (32 units each), with ReLU non-linearities, followed by a softmax.

(input → non-linear layer → non-linear layer → linear layer → softmax → class probabilities)

Hyper-parameters

Train the model with three different hyper-parameter settings:

num_epochs=5, learning_rate=0.001
num_epochs=40, learning_rate=0.001
num_epochs=40, learning_rate=0.05

In [0]:

# Store results of runs with different configurations in a list.
# Use a tuple (num_epochs, learning_rate) as keys, and a tuple
# (training_accuracy, testing_accuracy) as values.
experiments_task3 = []
settings = [(5, 0.001), (40, 0.001), (40, 0.05)]

print('Training Model 3')

# Train Model 3 with the different hyper-parameter settings.
for (num_epochs, learning_rate) in settings:

# Reset graph, recreate placeholders and dataset.
tf.reset_default_graph() # reset the tensorflow graph
x, y_ = get_placeholders()
mnist = get_data() # use for training.
eval_mnist = get_data() # use for evaluation.
#####################################################
# Define model, loss, update and evaluation metric. #
#####################################################
# Train.
i, train_accuracy, test_accuracy = 0 , [], []
log_period_updates = int(log_period_samples / batch_size)
with tf.train.MonitoredSession() as sess:
while mnist.train.epochs_completed < num_epochs:
# Update.
i += 1
batch_xs, batch_ys = mnist.train.next_batch(batch_size)
#################
# Training step #
#################
pass
# Periodically evaluate.
if i % log_period_updates == 0 :
#####################################
# Compute and store train accuracy. #
#####################################
#####################################
# Compute and store test accuracy. #
#####################################
pass
experiments_task3.append(
((num_epochs, learning_rate), train_accuracy, test_accuracy))

Results

In [0]:

plot_learning_curves([experiments_task1, experiments_task2, experiments_task3])

In [0]:

plot_summary_table([experiments_task1, experiments_task2, experiments_task3])

Questions

Q1.1 (3 pts): Indicate which of the previous experiments constitute an example

of over-fitting. Why is this happening?

Your answer here

Q1.2 (2 pts): Indicate which of the previous experiments constitute an example

of under-fitting. Why is this happening?

Your answer here

Q1.3 (2 pts): How would you prevent over-/under-fitting from happening?

Your answer here

Q1.4 (8 pts): Pick one model that is over-fitting and implement your proposed

fix. Train your model and report your new training/testing curves below.

Your answer here

PART 2: Backpropagation (35 pts)

Objectives

This part mirrors the first one, but this time you are not allowed to use any of the TensorFlow functionality for specifying or optimizing your neural network models. You will instead use your own implementations of the different neural network models (labelled Model 1-3, and described in the corresponding sections of the Colab). This means that for each of these models, and the layers they are composed of, you will need to implement:

Forward pass
Backward pass

Keep in mind, the purpose of this exercise is to implement and optimize your own neural network architectures without a toolbox/library tailored to do so. This also means that, in order to train and evaluate your models, you will need to implement your own optimization procedure. You are to use the same cross-entropy loss as before and your own implementation of SGD.
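For the SGD part, a hedged numpy sketch of a single parameter update (params and grads are hypothetical lists produced by your own backward pass, not names required by the assignment):

# theta <- theta - learning_rate * gradient, applied in place to every parameter array.
for param, grad in zip(params, grads):
  param -= learning_rate * grad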

As before, you will train these models to classify handwritten digits from the MNIST dataset.

Additional instructions

Do not use any other libraries than the ones provided in the imports cell. You should be able to do everything via numpy (especially for the convolutional layer, rely on the in-built matrix/tensor multiplication that numpy offers).

There are a few questions at the end of the colab. Before doing any coding, please take a look at Question 2.1 — this should help you with the implementations, especially the optimization part.

Hints

Remind yourselves of the chain rule and read through the lecture notes on back-propagation (computing the gradients by recursively applying the chain rule). This is a general procedure that applies to all model architectures you will have to code in the following steps. Thus, you are to implement an optimization procedure that generalizes and can be re-used to train all your models. Recall the only things that you will need for each layer are:

(i) the gradients of layer activations with respect to its input

(ii) the gradients with respect to its parameters, if any.

(See Question 2.1).
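As a purely illustrative numpy sketch (not the required implementation), a linear layer could expose a forward pass plus a backward pass returning exactly the quantities (i) and (ii); the class and attribute names are assumptions:

import numpy as np

class Linear(object):
  # Illustrative fully connected layer: y = x W + b.
  def __init__(self, n_in, n_out):
    # Xavier-style initialization, matching Part 1.
    self.W = np.random.randn(n_in, n_out) * np.sqrt(2.0 / (n_in + n_out))
    self.b = np.zeros(n_out)

  def forward(self, x):
    self.x = x                      # cache the input for the backward pass
    return x.dot(self.W) + self.b

  def backward(self, grad_out):
    # (ii) gradients with respect to the parameters.
    self.dW = self.x.T.dot(grad_out)
    self.db = grad_out.sum(axis=0)
    # (i) gradient with respect to the input, passed to the previous layer.
    return grad_out.dot(self.W.T)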

Also, from the previous part you should have a good idea of what to expect, both in terms of behaviour and relative performance. (To keep everything comparable, we kept all the hyper-parameters and reporting the same.)

Model 1 (10 pts)

Network

Train a neural network model consisting of 1 linear layer, followed by a softmax:

(input → linear layer → softmax → class probabilities)

Hyper-parameters

Train the model with three different hyper-parameter settings:

num_epochs=5, learning_rate=0.0001
num_epochs=5, learning_rate=0.001
num_epochs=15, learning_rate=0.1

In [0]:

# Store results of runs with different configurations in a list.
# Use a tuple (num_epochs, learning_rate) as keys, and a tuple
# (training_accuracy, testing_accuracy) as values.
my_experiments_task1 = []
settings = [(5, 0.0001), (5, 0.001), (15, 0.1)]

print('Training Model 1')

# Train Model 1 with the different hyper-parameter settings.
for (num_epochs, learning_rate) in settings:

# Reset graph, recreate placeholders and dataset.
tf.reset_default_graph()
x, y_ = get_placeholders()
mnist = get_data()
eval_mnist = get_data()
#####################################################
# Define model, loss, update and evaluation metric. #
#####################################################
# Train.
i, train_accuracy, test_accuracy = 0 , [], []
log_period_updates = int(log_period_samples / batch_size)
with tf.train.MonitoredSession() as sess:
while mnist.train.epochs_completed < num_epochs:
# Update.
i += 1
batch_xs, batch_ys = mnist.train.next_batch(batch_size)
#################
# Training step #
#################
pass
# Periodically evaluate.
if i % log_period_updates == 0 :
#####################################
# Compute and store train accuracy. #
#####################################
#####################################
# Compute and store test accuracy. #
#####################################
pass
my_experiments_task1.append(
((num_epochs, learning_rate), train_accuracy, test_accuracy))

Model 2 (5 pts)

Network

1 hidden layer (32 units) with a ReLU non-linearity, followed by a softmax.

(input → non-linear layer → linear layer → softmax → class probabilities)

Hyper-parameters

Train the model with three different hyper-parameter settings:

num_epochs=15, learning_rate=0.0001
num_epochs=15, learning_rate=0.005
num_epochs=15, learning_rate=0.1

In [0]:

# Store results of runs with different configurations in a list.
# Use a tuple (num_epochs, learning_rate) as keys, and a tuple
# (training_accuracy, testing_accuracy) as values.
my_experiments_task2 = []
settings = [(15, 0.0001), (15, 0.005), (15, 0.1)]

print('Training Model 2')

# Train Model 2 with the different hyper-parameter settings.
for (num_epochs, learning_rate) in settings:

# Reset graph, recreate placeholders and dataset.
tf.reset_default_graph() # reset the tensorflow graph
x, y_ = get_placeholders()
mnist = get_data() # use for training.
eval_mnist = get_data() # use for evaluation.
#####################################################
# Define model, loss, update and evaluation metric. #
#####################################################
# Train.
i, train_accuracy, test_accuracy = 0 , [], []
log_period_updates = int(log_period_samples / batch_size)
with tf.train.MonitoredSession() as sess:
while mnist.train.epochs_completed < num_epochs:
# Update.
i += 1
batch_xs, batch_ys = mnist.train.next_batch(batch_size)
#################
# Training step #
#################
pass
# Periodically evaluate.
if i % log_period_updates == 0 :
#####################################
# Compute and store train accuracy. #
#####################################
#####################################
# Compute and store test accuracy. #
#####################################
pass
my_experiments_task2.append(
((num_epochs, learning_rate), train_accuracy, test_accuracy))

Model 3 (5 pts)

Network

2 hidden layers (32 units each), with ReLU non-linearities, followed by a softmax.

(input → non-linear layer → non-linear layer → linear layer → softmax → class probabilities)

Hyper-parameters

Train the model with three different hyper-parameter settings:

num_epochs=5, learning_rate=0.001
num_epochs=40, learning_rate=0.001
num_epochs=40, learning_rate=0.05

In [0]:

# Store results of runs with different configurations in a list.
# Use a tuple (num_epochs, learning_rate) as keys, and a tuple
# (training_accuracy, testing_accuracy) as values.
my_experiments_task3 = []
settings = [(5, 0.001), (40, 0.001), (40, 0.05)]

print('Training Model 3')

# Train Model 3 with the different hyper-parameter settings.
for (num_epochs, learning_rate) in settings:

# Reset graph, recreate placeholders and dataset.
tf.reset_default_graph() # reset the tensorflow graph
x, y_ = get_placeholders()
mnist = get_data() # use for training.
eval_mnist = get_data() # use for evaluation.
#####################################################
# Define model, loss, update and evaluation metric. #
#####################################################
# Train.
i, train_accuracy, test_accuracy = 0 , [], []
log_period_updates = int(log_period_samples / batch_size)
with tf.train.MonitoredSession() as sess:
while mnist.train.epochs_completed < num_epochs:
# Update.
i += 1
batch_xs, batch_ys = mnist.train.next_batch(batch_size)
#################
# Training step #
#################
pass
# Periodically evaluate.
if i % log_period_updates == 0 :
#####################################
# Compute and store train accuracy. #
#####################################
#####################################
# Compute and store test accuracy. #
#####################################
pass
my_experiments_task3.append(
((num_epochs, learning_rate), train_accuracy, test_accuracy))

Results

In [0]:

plot_learning_curves([my_experiments_task1, my_experiments_task2, my_experiments_task3])

In [0]:

plot_summary_table([my_experiments_task1, my_experiments_task2, my_experiments_task3])

Questions

Q2.1 (15 pts): Compute the following derivatives

Show all intermediate steps in the derivation (in markdown below). Provide the final results in vector/matrix/tensor form whenever appropriate.

a) [5 pts] Given the cross-entropy loss above, compute the derivative of the loss function with respect to the scores $z$ (the input to the softmax layer):

$$\frac{\partial\,\text{loss}}{\partial z} = ?$$

Your answer here

b) [10 pts] Consider the first model (M1: linear + softmax). Compute the derivative of the loss with respect to

the input $x$: $\frac{\partial\,\text{loss}}{\partial x} = ?$

Your answer here

the parameters of the linear layer, weights $W$ and bias $b$: $\frac{\partial\,\text{loss}}{\partial W} = ?$, $\frac{\partial\,\text{loss}}{\partial b} = ?$

Your answer here

PART 3: Convolution Models (35 pts)

Model 4 (5 pts)

Model

3 layer convolutional model (2 convolutional layers followed by max pooling) + 1 non-linear layer (32 units), followed by softmax.

(input(28×28) → conv(3x3x8) + maxpool(2×2) → conv(3x3x8) + maxpool(2×2) → flatten → non-linear layer → linear layer → softmax → class probabilities)

Use padding = 'SAME' for both the convolution and the max pooling layers.
Employ plain convolution (no stride, i.e. a stride of 1), and for the max pooling operations use 2x2 sliding windows with no overlapping pixels (note: this operation will down-sample the input image by 2x2); a hedged sketch of one such block is shown below.
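As a hedged TF 1.x sketch of one conv + max-pool block (assuming a ReLU after the convolution, the Xavier initializer from the Variable Initialization section, and x_image from the skeleton; variable names are illustrative only):

W_conv1 = tf.Variable(initializer([3, 3, 1, 8]))   # 3x3 kernel, 1 -> 8 channels
b_conv1 = tf.Variable(tf.zeros([8]))
h_conv1 = tf.nn.relu(
    tf.nn.conv2d(x_image, W_conv1, strides=[1, 1, 1, 1], padding='SAME') + b_conv1)
h_pool1 = tf.nn.max_pool(h_conv1, ksize=[1, 2, 2, 1],
                         strides=[1, 2, 2, 1], padding='SAME')   # 28x28 -> 14x14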
Hyper-parameters

Train the model with three different hyper-parameter settings:

num_epochs=5, learning_rate=0.01
num_epochs=10, learning_rate=0.001
num_epochs=20, learning_rate=0.001

In [0]:

# Store results of runs with different configurations in a list.
# Use a tuple (num_epochs, learning_rate) as keys, and a tuple
# (training_accuracy, testing_accuracy) as values.
experiments_task4 = []
settings = [(5, 0.01), (10, 0.001), (20, 0.001)]

print('Training Model 4')

# Train Model 4 with the different hyper-parameter settings.
for (num_epochs, learning_rate) in settings:

# Reset graph, recreate placeholders and dataset.
tf.reset_default_graph() # reset the tensorflow graph
x, y_ = get_placeholders()
x_image = tf.reshape(x, [- 1 , 28 , 28 , 1 ])
mnist = get_data() # use for training.
eval_mnist = get_data() # use for evaluation.
#####################################################
# Define model, loss, update and evaluation metric. #
#####################################################
# Train.
i, train_accuracy, test_accuracy = 0 , [], []
log_period_updates = int(log_period_samples / batch_size)
with tf.train.MonitoredSession() as sess:
while mnist.train.epochs_completed < num_epochs:
# Update.
i += 1
batch_xs, batch_ys = mnist.train.next_batch(batch_size)
#################
# Training step #
#################
# Periodically evaluate.
if i % log_period_updates == 0 :
#####################################
# Compute and store train accuracy. #
#####################################
#####################################
# Compute and store test accuracy. #
#####################################
experiments_task4.append(
((num_epochs, learning_rate), train_accuracy, test_accuracy))

Model 5 (10 pts): Separable Convolutions

Separable Convolutions

The idea behind separable convolutions is very simple. The premise is that if we consider a 2D/3D filter $k$ that we would want to apply to an input tensor $x$, we could produce a very similar effect by instead applying a series of simpler transformations (in our case, convolutions). Doing this typically results in fewer computations and/or fewer parameters (which, in a learning setting, means fewer parameters to learn).

Example 1: A famous example of such a 2D filter is the Sobel kernel:

$$\begin{bmatrix} -1 & 0 & +1 \\ -2 & 0 & +2 \\ -1 & 0 & +1 \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix} \times \begin{bmatrix} -1 & 0 & +1 \end{bmatrix}$$

Thus we can see that this $3 \times 3$ kernel can be expressed as the outer product of a $3 \times 1$ kernel and a $1 \times 3$ kernel. The above is a particular example of a spatially separable convolution, but the principle is more generally applicable. A common way of 'separating' a kernel is essentially splitting the normal convolution process into two parts: a depthwise convolution and a pointwise convolution. The depthwise convolution applies a different convolution kernel to every input channel. This produces an output tensor with the same number of channels as the input. The pointwise convolution then takes this intermediate result and applies a $1 \times 1 \times n_{\text{input channels}}$ kernel to it. As the name suggests, it looks individually at every point in the intermediate output, and we apply as many of these kernels as needed to produce the desired number of output channels.

Example 2: For instance, consider a $3 \times 3$ convolutional kernel with 16 input channels and 64 output channels. The depthwise convolution will be a $3 \times 3 \times 16$ kernel (a $3 \times 3$ kernel for each input channel), and the pointwise convolution will be made of 64 $1 \times 1 \times 16$ kernels.

References and Further Reading: MobileNet (https://arxiv.org/pdf/1704.04861.pdf), Inception Models (https://arxiv.org/abs/1610.02357)
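Concretely, a hedged TF 1.x sketch of Example 2 as a depthwise followed by a pointwise convolution (the filter initialization and variable names are illustrative assumptions):

inputs = tf.placeholder(tf.float32, [None, 28, 28, 16])
# Depthwise: one 3x3 filter per input channel (channel multiplier 1).
depthwise_filter = tf.Variable(tf.truncated_normal([3, 3, 16, 1], stddev=0.1))
# Pointwise: 64 filters of size 1x1x16.
pointwise_filter = tf.Variable(tf.truncated_normal([1, 1, 16, 64], stddev=0.1))

depthwise_out = tf.nn.depthwise_conv2d(
    inputs, depthwise_filter, strides=[1, 1, 1, 1], padding='SAME')
separable_out = tf.nn.conv2d(
    depthwise_out, pointwise_filter, strides=[1, 1, 1, 1], padding='SAME')
# TF also provides a fused op with the same effect:
# tf.nn.separable_conv2d(inputs, depthwise_filter, pointwise_filter,
#                        strides=[1, 1, 1, 1], padding='SAME')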

Model 5

3 layer convolutional model, similar to Model 4, but now with separable convolutions + 1 non-linear layer (32 units), followed by softmax.

(input(28×28) → separable conv(3x3x8) → separable conv(3x3x4) + maxpool(2×2) → flatten → non-linear layer → linear layer → softmax → class probabilities)

Use padding = 'SAME' for both the convolution and the max pooling layers.
No stride (i.e. a stride of 1). Use max pooling with 2x2 sliding windows, with no overlapping pixels (note: this operation will down-sample the input image by 2x2).

In [0]:

# Store results of runs with different configurations in a list.
# Use a tuple (num_epochs, learning_rate) as keys, and a tuple
# (training_accuracy, testing_accuracy) as values.
experiments_task5 = []
settings = [(5, 0.01), (10, 0.001), (20, 0.001)]

print('Training Model 5')

# Train Model 5 with the different hyper-parameter settings.
for (num_epochs, learning_rate) in settings:

# Reset graph, recreate placeholders and dataset.
tf.reset_default_graph() # reset the tensorflow graph
x, y_ = get_placeholders()
x_image = tf.reshape(x, [- 1 , 28 , 28 , 1 ])
mnist = get_data() # use for training.
eval_mnist = get_data() # use for evaluation.
#####################################################
# Define model, loss, update and evaluation metric. #
#####################################################
# Train.
i, train_accuracy, test_accuracy = 0 , [], []
log_period_updates = int(log_period_samples / batch_size)
with tf.train.MonitoredSession() as sess:
while mnist.train.epochs_completed < num_epochs:
# Update.
i += 1
batch_xs, batch_ys = mnist.train.next_batch(batch_size)
#################
# Training step #
#################
# Periodically evaluate.
if i % log_period_updates == 0 :
#####################################
# Compute and store train accuracy. #
#####################################
#####################################
# Compute and store test accuracy. #
#####################################
experiments_task5.append(
((num_epochs, learning_rate), train_accuracy, test_accuracy))

Results

In [0]:

plot_learning_curves([experiments_task3, experiments_task4, experiments_task5])

In [0]:

plot_summary_table([experiments_task3, experiments_task4, experiments_task5])

Questions

Q3.1 (2 pts): Let's revisit Example 2 above (Sec. Separable Convolutions). Given an input image of 28x28x16:

a) What are the dimensions of the result of the depthwise convolution? How many computations were performed in this step?
b) What is the dimension of the output after the pointwise convolution? How many computations were performed in this step?
c) Compare this with applying a normal 2D convolution (3x3 kernel, 64 output channels) to the original image. What is the dimensionality of the output? What about the number of computations?

Your answer here

Q3.2 (3 pts): Convolutions vs Separable Convolutions

Compare the performance of the two convolutional models vs the previous models in Part 1.
Compare the number of parameters in Model 5 vs Model 4. Explicit computation is required here.
Under which conditions could it be advantageous to use separable convolutions instead of normal convolutions?

Hint: Think in terms of storage, speed of training, speed of inference, representation power.

Your answer here

Q3.3 (7 pts): Equivalence between 2D convolutions and separable convolutions.

Let's revisit Example 1 above. Consider a 2D kernel $k$ of dimension $N \times M$ and two 1D kernels: $k_1$, a $1 \times N$ kernel, and $k_2$, a $1 \times M$ kernel, such that $k = k_1^T k_2$.

Prove that the above equality, $k = k_1^T k_2$, means that applying $k$ to an input signal $x$ is equivalent to applying the 1D kernels $k_1$ and $k_2$ consecutively. In which order do these 1D kernels need to be applied for the equivalence to hold?
Does there always exist such a decomposition? That is, for any 2D kernel $k$, can one find $k_1$ and $k_2$ s.t. $k = k_1^T k_2$? If so, provide a proof. If not, provide a counter-example.

Your answer here

Q3.4 (8 pts): Based potentially on insights from Q3.3, propose a different separable conv. model that achieves similar performance as Model 4, but has fewer parameters.

a) Report and justify your choice.

Your answer here

b) Implement and train your model, and compare performance with Model 4 (setting 3). Note: This will likely require a hyper-parameter search for the new setting.

Your answer below

In [0]: