In this assignment, you’ll be experimenting with RNN-based sequence models for sentence classification. During class discussions, and in the assigned readings, we’ve discussed various ways to model language in order to construct word representations. A key purpose of forming such representations is to improve the performance of “downstream” tasks such as question answering, entailment, named entity recognition, sentiment analysis, among others.
Here, we will focus on a particular task that relies on classifying a sentence by encoding it to some latent representation. In particular, sentiment analysis is a neat classification problem to explore.
In this homework, you will be implementing a sequence model to classify sentences according to their sentiment score. For this task, you’ll be using an RNN to encode a sequence of word embeddings and use that encoding to then classify the sentence.
At the core of your model is a recurrent neural network (RNN). An RNN is a type of neural network designed to maintain state across a sequence of inputs. At its most basic, it is a feedforward network that takes a previous state \(h_{t-1}\) and an input \(x_t\) and produces the next state \(h_t\). In practice, RNN variants such as the LSTM (Hochreiter & Schmidhuber, 1997) and the GRU (Cho et al., 2014) are often more effective, especially on longer sequences.
[Figure: An RNN \(A\) unrolled over \(t\) timesteps. Source: Understanding LSTMs.]
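To make the recurrence concrete, here is a minimal sketch of a single vanilla RNN step in PyTorch. The dimensions and weight names (W_x, W_h, b) are illustrative assumptions, not part of the stencil.

```python
import torch

# Toy dimensions, chosen only for illustration.
emb_dim, hidden_dim = 8, 16
W_x = torch.randn(hidden_dim, emb_dim) * 0.1     # input-to-hidden weights
W_h = torch.randn(hidden_dim, hidden_dim) * 0.1  # hidden-to-hidden weights
b = torch.zeros(hidden_dim)

def rnn_step(x_t, h_prev):
    # h_t = tanh(W_x x_t + W_h h_{t-1} + b)
    return torch.tanh(W_x @ x_t + W_h @ h_prev + b)

h = torch.zeros(hidden_dim)
for x_t in torch.randn(5, emb_dim):  # unroll over a 5-step input sequence
    h = rnn_step(x_t, h)
# h is now the final hidden state after seeing the whole sequence.
```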
Once an RNN has been applied to a sequence, the final hidden state is a vector that, in theory, has captured information from an arbitrary number of previous time-steps. We can treat that hidden state as an encoding of the entire sequence. Our sentiment analysis task is a classification problem: given a sentence, classify it as negative/neutral or positive. Using the sentence encoding produced by the RNN, a vanilla feedforward neural network can then produce a vector of class scores for sentiment classification.
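As a rough sketch of how these pieces fit together (the class name, layer sizes, and number of classes below are illustrative assumptions, not the stencil’s interface), an LSTM encoder followed by a linear layer might look like this:

```python
import torch
import torch.nn as nn

class ToySentimentEncoder(nn.Module):
    """Encode a batch of embedded sentences and produce class scores."""
    def __init__(self, emb_dim=100, hidden_dim=128, num_classes=2):
        super().__init__()
        self.rnn = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, embedded):            # embedded: (batch, seq_len, emb_dim)
        _, (h_n, _) = self.rnn(embedded)    # h_n: (num_layers, batch, hidden_dim)
        sentence_encoding = h_n[-1]         # final hidden state of the last layer
        return self.classifier(sentence_encoding)  # (batch, num_classes)

# Example: class scores for a batch of 4 sentences, each 12 tokens long.
scores = ToySentimentEncoder()(torch.randn(4, 12, 100))
```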
How to represent a sequence of words is an important choice when designing neural sequence models. For this assignment, you’ll be examining how different word embeddings affect your model’s performance. GloVe is a commonly-used vector representation of words. As discussed in class, a more recent approach is to learn “Deep Contextualized Word Representations” (Peters et al., 2018) using a bi-directional language model that’s pre-trained on a large corpus of text. Central to this paper is ELMo, which aims to provide a semi-supervised solution to downstream NLP tasks.
For this assignment, you will implement your sentiment classifier to operate on sequences of embeddings. You’ll examine how the choice of embedding affects model performance, by evaluating your model on GloVe, ELMo, and concatenated GloVe+ELMo embeddings. In addition, you’ll want to examine the effectiveness of randomly-initialised embeddings as a baseline against which to compare. You may find that the SST-2 classification task has certain quirks, and a naive baseline can be useful for sorting those out.
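For a sense of what operating on sequences of embeddings can look like, here is a hedged sketch of two of the embedding strategies. The shapes, vocabulary size, and variable names are assumptions for illustration; the stencil’s actual lookup modules may differ.

```python
import torch
import torch.nn as nn

# (1) Randomly-initialised, trainable embeddings as a baseline.
vocab_size, glove_dim, elmo_dim = 10_000, 300, 1024
random_emb = nn.Embedding(vocab_size, glove_dim)
token_ids = torch.randint(0, vocab_size, (4, 12))       # (batch, seq_len)
baseline_vecs = random_emb(token_ids)                   # (4, 12, 300)

# (2) Concatenated GloVe+ELMo: join the two per-token vectors along the
# feature dimension. The random tensors below stand in for real lookups.
glove_vecs = torch.randn(4, 12, glove_dim)
elmo_vecs = torch.randn(4, 12, elmo_dim)
combined = torch.cat([glove_vecs, elmo_vecs], dim=-1)   # (4, 12, 1324)
```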
Your task: implement the SentimentNetwork model as well as the modules for performing embedding lookups.
For your writeup, include the following:
As before, copy the project stencil from /course/cs2952d/stencil/hw3/. The data for this assignment can be found in /course/cs2952d/data/hw3/ and includes train/dev/test splits from the SST-2 corpus as well as some pre-processed data files to take some of the work out of loading the corpus into your model. You will mostly be working with sentiment.py, which serves as the starting point of your program. Run sentiment.py --help to get a feel for how to use it.
You must complete anything marked as TODO. To quickly find which sections must be implemented, run grep -n "TODO" *.
Unlike Homework 1, this assignment will most likely require GPU access in order to complete in a reasonable amount of time.
If you have personal access to a GPU, feel free to use that to complete the assignment. To get started on a machine
with enough disk space, create a Python 3 virtual environment and run pip install -r requirements.txt.
If you do not have access to a GPU and/or a machine with enough disk space to install a GPU environment, please email the
course staff as early as possible, and we can set you up with compute resources.
In this assignment you’ll be using PyTorch again, but the stencil will provide less guidance this time. For the most part, you should refer to the official PyTorch documentation. In addition, the PyTorch website provides an excellent introduction to using sequence models and LSTMs here.
Of particular note is Step 3 in SentimentNetwork.forward, which asks you to call the function torch.nn.utils.rnn.pack_padded_sequence.
This is necessary because our data consists of sentences of varying lengths. With a batch size of 1 this wouldn’t be an issue, but consider how an RNN is unrolled over an entire batch when every example has a different length. The solution is to pad each sequence with enough zeroes that all sequences in the batch are the same length. However, a new problem appears: we don’t actually want these zero-vectors to affect the RNN’s hidden state (and, consequently, our loss function). The pack_padded_sequence utility addresses this issue, as sketched below.
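Here is a small illustration of the idea; the batch contents and sizes are made up, and packed_pytorch_demo.py remains the authoritative walkthrough. Packing lets the LSTM stop at each sequence’s true length:

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

# A toy zero-padded batch of 3 embedded sentences (batch, max_len, emb_dim),
# with true lengths 5, 3, and 2 (sorted longest-first).
batch = torch.zeros(3, 5, 8)
lengths = [5, 3, 2]
for i, n in enumerate(lengths):
    batch[i, :n] = torch.randn(n, 8)

lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)

# Packing tells the LSTM where each sequence really ends, so the padding
# zeroes never update the hidden state.
packed = pack_padded_sequence(batch, lengths, batch_first=True)
packed_out, (h_n, _) = lstm(packed)

# h_n[-1] holds each sequence's hidden state at its *true* final step.
outputs, out_lengths = pad_packed_sequence(packed_out, batch_first=True)
print(h_n[-1].shape)  # torch.Size([3, 16])
```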
We’ve provided you with a walkthrough of padding sequences in PyTorch in packed_pytorch_demo.py.
This article also gives a good overview of how to work with variable-sized batches.
Submit your code and writeup. In addition, include your best saved checkpoint for each embedding type. Please do not submit every saved checkpoint or the data files. Run cs2952d_handin hw3 to submit every file within the current directory.