This assignment is due at 11:59PM on Tuesday, October 23, 2018.

Semantic Parsing : Assignment 2

In this assignment, you’ll be writing your own semantic parser! The approach to semantic parsing in this assignment will follow a two-stage process: parse and interpret. You’ll be constructing useful structured representations of natural language (parsing) and executing them to accomplish a task (interpretation).

Part 1: Arithmetic expressions in natural language

In this part, you’ll focus on a restricted problem: parsing arithmetic expressions. These are sentences in the form of “two plus two”, “four times three”, “two plus two times four”. Each expression has an associated semantic representation and a resulting denotation. The choice of semantic representation is important, since we ultimately want something that’s easily executable. Our representation should be machine-readable and unambiguous. Natural language is usually neither of these things, but for arithmetic expressions we can construct a very straightforward semantic representation: binary expression trees.

Overview

  1. Implement a function execute that takes an expression tree and returns an integer denotation. Refer to the stencil for implementation notes.
  2. Write a parser for arithmetic expressions. Implement the CYK algorithm for parsing CFGs and construct semantic attachments. Your grammar should expect to see integers 1 through 4 and operators +, -, *, and ~ (negation).

Binary expression trees

Recall that a binary expression is usually defined recursively as (operator, left-expression, right-expression). Your first task will be to implement an execute function that operates on a binary expression tree and returns its denotation. Consider the following pseudocode example:

# "two plus two times four"
representation = (+, 2, (*, 2, 4))
denotation = execute(representation) # returns 10

Your implementation should support the following set of operations: addition, subtraction, multiplication, and negation. You should come up with a good way of representing binary expression trees in Python. Note: In the stencil, this involves implementing the eval() methods on Expression subclasses in expressions.py. Look at grammar.py for how these methods are invoked.
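For concreteness, here is one minimal sketch of such a representation, using hypothetical class names (Number, UnOp, BinOp); the stencil's actual classes and eval() signatures in expressions.py may differ, so treat this as an illustration rather than the expected implementation.

# A minimal sketch of one possible Expression hierarchy. The class names and
# eval() interface here are assumptions; follow the stencil's expressions.py.
class Expression:
    def eval(self):
        raise NotImplementedError

class Number(Expression):
    def __init__(self, value):
        self.value = value
    def eval(self):
        return self.value

class UnOp(Expression):
    def __init__(self, op, operand):
        self.op, self.operand = op, operand
    def eval(self):
        if self.op == '~':
            return -self.operand.eval()
        raise ValueError('unknown unary operator: %s' % self.op)

class BinOp(Expression):
    OPS = {'+': lambda a, b: a + b,
           '-': lambda a, b: a - b,
           '*': lambda a, b: a * b}
    def __init__(self, op, left, right):
        self.op, self.left, self.right = op, left, right
    def eval(self):
        return self.OPS[self.op](self.left.eval(), self.right.eval())

# "two plus two times four" -> (+, 2, (*, 2, 4))
expr = BinOp('+', Number(2), BinOp('*', Number(2), Number(4)))
assert expr.eval() == 10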

We’ve provided a small set of test cases that you can use to check your executor.

Constituency parsing using dynamic programming

Now that you have a good semantic representation, it’s time to write a parser that converts natural language arithmetic into a binary expression tree! The goal is to build a tree of constituents from the input, meaning a structured grouping of words and the phrases they belong to. The set of rules that tells us how to build this tree is called a grammar. Our grammar will be a context-free grammar, containing a collection of rules for producing subtrees.

Some background from NLP:

What exactly is a grammar? How can we represent rules?

Consider a grammar for describing arithmetic expressions. We could categorise phrases denoting a number as an $Expr, binary operator words (such as “plus”, “minus”) as $BinOp, and unary operator words (such as “minus”) as $UnOp. Our grammar may then contain rules such as $Expr \(\rightarrow\) two, and $UnOp \(\rightarrow\) minus. From Task 1, we can also infer that a good rule for composing expressions might look like: $Expr \(\rightarrow\) $Expr $BinOp $Expr. In a constituency tree, words are referred to as terminals and non-word categories as non-terminals.

Using these rules, we can parse an expression such as “two times four” as:

($Expr ($Expr two) ($BinOp times) ($Expr four))

Composing the above expression, an example parse for “two plus two times four” might then look like:

($Expr ($Expr two)
       ($BinOp plus)
       ($Expr ($Expr two)
              ($BinOp times)
              ($Expr four)
       )
)

Chart parsing

Chart parsing algorithms are dynamic programming algorithms for applying grammars. In particular, the CYK algorithm is a chart parsing algorithm for CFGs.

Some additional background:

CYK operates on grammars in Chomsky normal form (CNF). Thankfully, every CFG can be converted into an equivalent grammar in CNF (and conversely, every grammar in CNF is context-free).

The key requirement of Chomsky normal form is that every rule must be either a lexical rule or binary compositional rule.

  1. Lexical rules must emit only one symbol, and that symbol must be a terminal ($Expr \(\rightarrow\) two is unary).
  2. Binary compositional rules must emit only two symbols, and both symbols must be non-terminals. ($Expr \(\rightarrow\) $UnOp $Expr is binary).

Any rule in a CFG with more than two symbols on its right-hand side can be binarised by defining a new symbol (e.g. $EBO). If a subtree contains more than two children, we can collect all but one of those children under this new $EBO symbol. In this way, $Expr \(\rightarrow\) $Expr $BinOp $Expr is binarised into two rules: $Expr \(\rightarrow\) $EBO $Expr and $EBO \(\rightarrow\) $Expr $BinOp.

A slightly less-restrictive binarized grammar allows for unary compositional rules as well. For this assignment, your grammar should be binarized but does not need to be strictly CNF. You should also allow for n-ary lexical rules, which can emit more than one terminal symbol.
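As a rough illustration of the chart-filling idea, here is a bare-bones CYK-style recognizer, assuming rules are plain (lhs, rhs) pairs. The stencil's Rule and Grammar classes, and the extra bookkeeping needed to recover parse trees and semantics, will look different; this sketch only records which categories span each substring.

from collections import defaultdict

# Rules are (lhs, rhs) pairs: rhs is a tuple of terminals (lexical rules,
# possibly n-ary) or a pair of non-terminals (binary compositional rules).
def cyk(tokens, lexical_rules, binary_rules):
    n = len(tokens)
    chart = defaultdict(set)   # chart[(i, j)] = categories spanning tokens[i:j]
    # Apply (possibly n-ary) lexical rules to every span of the input.
    for i in range(n):
        for j in range(i + 1, n + 1):
            for lhs, rhs in lexical_rules:
                if list(rhs) == tokens[i:j]:
                    chart[(i, j)].add(lhs)
    # Combine adjacent spans bottom-up with binary compositional rules.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for split in range(i + 1, j):
                for lhs, (b, c) in binary_rules:
                    if b in chart[(i, split)] and c in chart[(split, j)]:
                        chart[(i, j)].add(lhs)
    return chart[(0, n)]       # categories covering the whole input

lexical = [('$Expr', ('two',)), ('$Expr', ('four',)), ('$BinOp', ('times',))]
binary  = [('$EBO', ('$Expr', '$BinOp')), ('$Expr', ('$EBO', '$Expr'))]
print(cyk('two times four'.split(), lexical, binary))  # {'$Expr'}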

Semantic attachments

Given a parse tree, you will need to construct semantic attachments in order to represent… semantics. These are additional representations attached to each node in the parse tree, and they serve as a bridge between a parse tree and the expression tree you completed as part of the first task. When constructing semantic attachments, there’s the option of either eagerly computing semantics during parsing or establishing the process as a separate step which occurs after parsing. The stencil code chooses the former.

When eagerly constructing semantic attachments during parsing, you will modify your representation of Rules to include a semantic component. For lexical rules, this should be the direct semantic representation. For compositional rules, these should be functions that specify how to construct the semantics of a parent given the semantics of its children.

For example, the rule $Expr \(\rightarrow\) one has the semantic representation 1. Semantics for the (non-binarized) compositional rule $Expr \(\rightarrow\) $Expr $BinOp $Expr would be some constructor on binary expressions, e.g. λ(lhs,op,rhs) -> BinOp(lhs,op,rhs). As an implementation note, you would want to use the BinOp data structure you defined in Task 1.
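A small sketch of what rules with semantic attachments might look like is shown below, reusing the hypothetical Number/BinOp classes from the earlier sketch. The Rule constructor and field names here are assumptions; the stencil's Rule class in grammar.py has its own representation.

# Rules carrying semantic attachments (illustrative only).
class Rule:
    def __init__(self, lhs, rhs, sem):
        self.lhs, self.rhs, self.sem = lhs, rhs, sem

# Lexical rules carry a semantic value directly.
lex_one  = Rule('$Expr', ('one',), Number(1))
lex_plus = Rule('$BinOp', ('plus',), '+')

# Binarized compositional rules carry functions over the children's semantics.
ebo_rule  = Rule('$EBO', ('$Expr', '$BinOp'),
                 lambda expr_sem, op_sem: (expr_sem, op_sem))
expr_rule = Rule('$Expr', ('$EBO', '$Expr'),
                 lambda ebo_sem, rhs_sem: BinOp(ebo_sem[1], ebo_sem[0], rhs_sem))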

Separating the parse step from semantics is not conceptually very different. Instead of representing a rule with both syntactic and semantic components, construct a parse tree \(P\) and then map \(P\) to its equivalent AST \(A\) by mapping the syntactic rule at each node to its semantic form.

Either way you construct your semantic attachments, you should end up with an instance of the semantic representation from Task 1 (the binary expression tree data structure). Calling execute on that tree will return the denotation of the input sentence!

Your second task is to write a parser for arithmetic expressions. Use the provided stencil code as a guide for implementing the data structures necessary to represent grammars, rules, parse trees, and expression trees.

Part 2: Geographic queries

In the second part of this assignment, you’ll be focusing on the domain of geography queries. In particular, you’ll be working with the Geo880 corpus - a collection of 880 queries about US geography.

Examples of some geographic queries:

"which states border texas?"
"how many states border the largest state?"
"what is the size of the capital of texas?"

More complicated queries look like:

"what rivers flow through states that border the state with the largest population?"
"what is the population of the capital of the largest state through which the mississippi runs?"
"what is the longest river that passes the states that border the state that borders the most states?"

These types of queries are a big step up in complexity compared to arithmetic. Not only is the vocabulary larger, but there is a greater degree of syntactic and lexical ambiguity, and semantics can be composed in arbitrarily complex ways. Geography queries are also interesting because many other real-world applications (queries about sports, news, trivia, etc.) could ideally be solved with the same approach of resolving natural language into queries against a structured knowledge base.

Of course, a drawback to the Geo880 dataset is that it doesn’t reflect natural language realistically. As you’ll see, getting a semantic parser to work on this corpus will not give you a semantic parser that works on geography questions asked in conversation. But maybe it’ll be able to answer what the longest river is that passes the states that border the state that borders the most states?

Geobase, Knowledge Bases, and a Semantic Representation

You’ll be using Geobase as a knowledge base of facts about US geography. It includes facts about states (capitals, area, population, cities, neighboring states, elevations), cities (state and population), rivers (length and which cities they traverse), mountains (state and height), roads (which states they traverse), and lakes (area and states traversed).

In geobase.py, you’ll find a class GeobaseReader which is responsible for reading and parsing the knowledge base file and constructing tuples representing relationships in the knowledge graph. Additionally, it normalises units to SI (meters and square meters) and fixes some issues present in the original Prolog file. GeobaseReader creates two types of tuples: unary pairs and binary triples.

The key functionality provided by GeobaseReader is the ability to read the dataset into a set of tuples. In order to query the Geobase data, GraphKB implements a graph-structured knowledge base which can be queried using GraphKBExecutor. The language defined by GraphKBExecutor is then a straightforward choice of semantic representation for geographic queries.

The GraphKBExecutor Query Language

A GraphKBExecutor executes formal queries against a GraphKB and returns their denotations. Queries are represented by Python tuples, and can be nested. Denotations are also represented by Python tuples, but are conceptually sets (possibly empty). The elements of these tuples are always sorted in canonical order, so that they can be reliably compared for set equality.

The query language defined by GraphKBExecutor is perhaps most easily explained by example:

queries = [
    'bart',
    'male',
    ('has_sister', 'lisa'),  # who has sister lisa?
    ('lisa', 'has_sister'),  # lisa has sister who, i.e., who is a sister of lisa?
    ('lisa', 'has_brother'), # lisa has brother who, i.e., who is a brother of lisa?
    ('.and', 'male', 'child'),
    ('.or', 'male', 'adult'),
    ('.not', 'child'),
    ('.any',),               # anything
    ('.any', 'has_sister'),  # anything has sister who, i.e., who is a sister of anything?
    ('.and', 'child', ('.not', ('.any', 'has_sister'))),
    ('.count', ('bart', 'has_sister')),
    ('has_age', ('.gt', 21)),
    ('has_age', ('.lt', 2)),
    ('has_age', ('.eq', 10)),
    ('.max', 'has_age', 'female'),
    ('.min', 'has_age', ('bart', 'has_sister')),
    ('.max', 'has_age', '.any'),
    ('.argmax', 'has_age', 'female'),
    ('.argmin', 'has_age', ('bart', 'has_sister')),
    ('.argmax', 'has_age', '.any'),
]

For more details on GraphKBExecutor, look at graph_kb.py.
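As a rough orientation, the intended flow from reader to executor is sketched below. The constructor arguments and method names shown are assumptions made for illustration; the real API is defined in geobase.py and graph_kb.py and may differ.

# Hypothetical usage sketch only; check graph_kb.py for the actual API.
reader = GeobaseReader()                 # reads and normalises the Geobase file
kb = GraphKB(reader.tuples)              # graph-structured knowledge base
executor = GraphKBExecutor(kb)

# Denotations come back as canonically sorted tuples, so they can be compared
# for set equality. Relation and entity names here are guesses.
denotation = executor.execute(('borders', '/state/texas'))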

Grammar engineering

Your primary task for this part of the assignment is to develop a grammar for the geography domain. You will be graded on a performance metric known as oracle accuracy – the percentage of examples for which at least one of the candidate parses is correct (your parser will produce multiple parses for a given example). This is the same metric you’re evaluated on in Part 1. The next step beyond oracle accuracy is denotation accuracy, which measures your model’s ability to pick the right parse out of the list of candidates.

The design of your grammar will include the following major categories: $Entity, $Type, $Collection, $Relation, and $Optional. The stencil provides definitions for the compositional rules – along with their semantic attachments. You’ll need to add all other necessary rules to your grammar.
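To make the categories concrete, a few hypothetical lexical entries are shown below. The semantic values (the entity id and relation name) are guesses in the spirit of the lexicon example in the next section, not the stencil's exact vocabulary.

# Illustrative lexical rules for the geography grammar (names are assumptions).
Rule('$Optional', 'the', None)
Rule('$Type', 'state', 'state')
Rule('$Entity', 'texas', '/state/texas')
Rule('$Relation', 'borders', 'borders')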

Grammar induction

In general, hand-writing production rules is not feasible. Instead, we want to try to learn the logical forms associated with our grammar automatically. In this section, you will attempt to do this using a very simple process in which you generate all possible language/grammar rule alignments, and then prune to the most frequent rules.

Use the provided lexicon.txt file in order to learn grammar rules from existing parses. The file is in the format:

sentence 1|parse of sentence 1|semantic representation
sentence 2|parse of sentence 2|semantic representation
...
sentence N|parse of sentence N|semantic representation

You will only need to generate representations for lexical rules (e.g. for rules like $Entity -> river but not for production rules like $Entity -> $Optional $Entity). We assume you know nothing about what words mean, so you must treat every rule that is consistent with the training data as possible. We can then filter down to the most likely ones to clean up our lexicon (co-occurrence FTW!). Specifically, we ask you to retain only the top 5 rules for each lexical item.

For example, consider the sentence, parse, and semantics below, taken from the training data.

what is the height of mount mckinley ?
($ROOT,
	($Optionals,
		($Optional, what)
		($Optionals,
			($Optional, is)
			($Optionals,
				($Optional, the)
			)
		)
	)
	($ROOT_$Optionals,
		($Query,
			($Collection,
				($Relation,
					($RevRelation,
						($RevHeightRelation, height)
					)
				)
				($Collection_$Relation,
					($Optionals,
						($Optional, of)
					)
					($Collection,
						($Collection,
							($Type, mount)
						)
						($Collection,
							($Entity, mckinley)
						)
					)
				)
			)
		)
		($Optionals,
			($Optional, ?)
		)
	)
)
((.and, mountain, /mountain/mckinley), height)

Given this, you would produce the following rules (among many others). Note that most of these candidate rules are wrong; the frequency-based pruning step is what filters them out.

Rule('$RevHeightRelation', 'height', '.and')
Rule('$RevHeightRelation', 'height', 'height')
Rule('$RevHeightRelation', 'height', 'mountain')
Rule('$RevHeightRelation', 'height', '/mountain/mckinley')
Rule('$Entity', 'mckinley', '.and')
Rule('$Entity', 'mckinley', 'height')
Rule('$Entity', 'mckinley', 'mountain')
Rule('$Entity', 'mckinley', '/mountain/mckinley')
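One simple way to realise the "generate everything, then prune" idea is sketched below. The input format (a list of lexical productions paired with semantic tokens) and the exact tokenisation of the semantic representation are assumptions; the stencil may organise this differently.

from collections import Counter
from itertools import product

# Pair every (category, word) from a parse's lexical productions with every
# token of the corresponding semantic representation, then keep only the
# top_k most frequent semantics per (category, word) pair.
def induce_rules(examples, top_k=5):
    counts = Counter()
    for lexical_productions, semantic_tokens in examples:
        # lexical_productions: e.g. [('$Entity', 'mckinley'), ('$RevHeightRelation', 'height'), ...]
        # semantic_tokens:     e.g. ['.and', 'mountain', '/mountain/mckinley', 'height']
        for (category, word), sem in product(lexical_productions, semantic_tokens):
            counts[(category, word, sem)] += 1
    per_item = Counter()
    rules = []
    for (category, word, sem), _ in counts.most_common():
        if per_item[(category, word)] < top_k:
            per_item[(category, word)] += 1
            rules.append((category, word, sem))
    return rules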

Part 2 builds heavily off of the work from Part 1. Make sure you’ve tested your implementation from Part 1 and have it working on arithmetic queries.

(Hint: You can quickly construct (sentence, parse, semantics) tuples by calling .split("|") on each line. Given a parse string, you’ll want to take a look at NLTK’s nltk.tree module - in particular Tree.productions() - to get a list of production rules. You can parse the semantic representation in a similar manner.)
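A possible starting point for reading the lexicon file is sketched below. It assumes the parse strings can be fed to nltk.tree.Tree.fromstring after stripping commas; the exact preprocessing of the parse and semantics strings depends on the file and is glossed over here.

import nltk

# Rough sketch; the comma-stripping and the file name are assumptions.
examples = []
with open('hw2_lexicon.txt') as f:
    for line in f:
        sentence, parse_str, semantics = line.strip().split('|')
        tree = nltk.tree.Tree.fromstring(parse_str.replace(',', ''))
        # Keep only lexical productions (right-hand side is all terminals).
        lexical = [p for p in tree.productions() if p.is_lexical()]
        examples.append((sentence, lexical, semantics))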

Once you have completed Part 2, answer the following questions:

  1. What lexicon entries did you add for the word “highest”? What about for the word “florida”? Are these entries good? Why or why not, and how could you improve them?
  2. Find an example of a parse for which you do not get the correct denotation. (Hint: you can add print statements in DenotationOracleAccuracyMetric to find these). Diagnose what went wrong–what would you need to fix in order to get this parse right?

Getting started with the stencil

To get a quick overview of what parts you’re expected to implement, you can run grep -n "TODO" *. You must complete everything marked TODO.

Copy the stencil from /course/cs2952d/stencil/hw2. Copy the lexicon data from /course/cs2952d/data/hw2_lexicon.txt.

For Part 1, look at grammar.py and expressions.py. For Part 2, extend the grammar in grammar.py and look at geography.py. The stencil code is much larger than in the last assignment. To ease you into the myriad of small helper functions you’ll implement, we’ve provided a set of unit tests in unit_tests.py.

Rubric

(Hint: Look at unit_tests.py if you are unsure about the expected behavior of a given function.)

Part 1: Arithmetic expressions in natural language (9.5 points)

Part 2: Grammar induction (5.5 points)