COGS1360/CSCI1460: Introduction to Computational Linguistics
Spring semester, 2008
Instructors: Eugene Charniak and Mark Johnson
Meeting time: Tuesday/Thursday 2:30-3:50
Classroom: Watson Center (CIT) 506
TA: Micha Elsner (office hours CIT 409, 11am-12pm Friday)
We're developing completely new materials for this course, and writing an
introductory book on Computational Linguistics in the process. Hold
onto your hats!
- You can get the syllabus for the class here.
- The reading for Week 2 is on language modeling.
- Assignment 1 is on language modeling. The trivia part is due Feb 3 and the
full assignment is due Feb 10. Support code is now available at /course/cs146/code
(see testTranslations for an example of how to use the support
classes to read in the data files).
Trivia assignment: to clarify, you should include the
original sentence as a potential translation of itself, and the data
paths are relative to the course directory /course/cs146.
Solutions: for validate.ex, 1267 sentences with mean expected error 4.6048;
for test.ex, 1179 sentences with mean expected error 5.056. Make sure you
get these numbers before you start the next stage of the assignment.
- My writeup of the assignment
is here. If we feel your assignment
is not complete enough, we will email you before Thursday
night. Otherwise you may assume you have full credit.
- The reading for Week 3 is on machine translation:
/course/cs146/handouts/ibmA.pdf.
- Assignment 2 is to do problems 1.1 and 1.2 in the reading. The
first part (learning the translation parameters) is due on 24
Feb. The second part (writing the very dumb and dumb decoders) is
due on 3 March.
Please report the 10 highest-probability French translations of each of the
words drug, book, association, "," (the comma), does, and fatalities.
- A set of translation parameters for Assignment 2 is here. We are not posting a reference set of decodings, but you can ask your classmates for comparisons if you are interested. If you have turned in all parts of the MT assignment and haven't heard from us by the end of today (Tues. 10 March), you may assume you have credit.
- Readings for the lecture on stack-based decoding:
  - Decoding Algorithm in Structured MT
  - Fast Decoding for Statistical Machine Translation
  - Fast Decoding and Optimal Decoding for Machine Translation
- Reading for the HMM chapter is in the handout directory.
- The midterm is 19 March, in class.
- For the HMM assignment, you should get roughly 92-93% without unknown-word
smoothing and about one percentage point more with it.
- Reading for the parsing chapter is in the handout directory.
- The parsing assignment is due on 14 April. You should expect to
get about 70%.
- Some optional readings on parsing:
  - A maximum-entropy-inspired parser (Eugene's original parser)
  - Coarse-to-fine n-best parsing and max-ent discriminative reranking (Mark's reranker)
  - Accurate unlexicalized parsing (Klein and Manning on tree annotation and Markovized grammars)
  - Learning accurate, compact, and interpretable tree annotation (Petrov et al. on automatic state-splitting using the inside-outside algorithm)
  - Pipeline iteration (Hollingshead and Roark on speeding up chart parsing using a simple phrase detector)
- The pronoun assignment has been out for a while now and is due
Thursday 30 April. The numbers you should get are listed in the
assignment (remember that the updated sheet is the one in the handouts
directory, not the one in the pronoun directory).
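Assignment 1 is on language modeling, but this page doesn't say which model or smoothing method it uses. Purely as a hedged illustration, here is a minimal C++ sketch of an interpolated bigram model that scores a sentence by its log probability. It backs off to an add-one smoothed unigram. The class name BigramLM, the interpolation weight lambda, and the <s>/</s> padding tokens are our own assumptions, not part of the support code in /course/cs146/code.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

using Sentence = std::vector<std::string>;

// Hypothetical interpolated bigram model:
//   P(w | prev) = lambda * c(prev, w) / c(prev) + (1 - lambda) * P_uni(w)
// P_uni is add-one smoothed, with one extra slot shared by all unknown words.
class BigramLM {
 public:
  explicit BigramLM(double lambda) : lambda_(lambda) {
    assert(lambda >= 0.0 && lambda <= 1.0);
  }

  // Count unigrams and bigrams over padded sentences (<s> w1 ... wn </s>).
  void train(const std::vector<Sentence>& sentences) {
    for (const auto& s : sentences) {
      const Sentence w = pad(s);
      for (std::size_t i = 1; i < w.size(); ++i) {
        bigram_[w[i - 1]][w[i]] += 1;
        context_[w[i - 1]] += 1;
        unigram_[w[i]] += 1;
        tokens_ += 1;
      }
    }
  }

  double prob(const std::string& prev, const std::string& w) const {
    const double vocab = unigram_.size() + 1.0;  // +1 for the unknown-word slot
    const double uni = (lookup(unigram_, w) + 1.0) / (tokens_ + vocab);
    auto c = context_.find(prev);
    if (c == context_.end()) return uni;  // unseen context: unigram only
    const double bi = lookup(bigram_.at(prev), w) / c->second;
    return lambda_ * bi + (1.0 - lambda_) * uni;
  }

  // Natural-log probability of a sentence, including the end-of-sentence token.
  double logProb(const Sentence& s) const {
    const Sentence w = pad(s);
    double lp = 0.0;
    for (std::size_t i = 1; i < w.size(); ++i) lp += std::log(prob(w[i - 1], w[i]));
    return lp;
  }

 private:
  using Counts = std::unordered_map<std::string, double>;

  static Sentence pad(const Sentence& s) {
    Sentence w{"<s>"};
    w.insert(w.end(), s.begin(), s.end());
    w.push_back("</s>");
    return w;
  }
  static double lookup(const Counts& m, const std::string& k) {
    auto it = m.find(k);
    return it == m.end() ? 0.0 : it->second;
  }

  double lambda_;
  double tokens_ = 0.0;
  Counts unigram_, context_;
  std::unordered_map<std::string, Counts> bigram_;
};
```

Trained with lambda = 0.5 on the two sentences "the cat" and "the dog", this model gives "the cat" a higher log probability than "cat the". All unseen words share one unknown-word slot, so the probabilities still sum to one. The cost is that the model cannot tell different unknown words apart.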
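The first part of Assignment 2 is learning the translation parameters. Suppose the model in ibmA.pdf is IBM Model 1; this is our assumption, not something this page states. Then one round of EM can be sketched in C++17 as below. The names Corpus, TransTable, uniformInit, and model1Iteration are hypothetical. The sketch also leaves out the NULL word, so if your model needs one, add a "NULL" token to each English sentence.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

using Sentence = std::vector<std::string>;
using Corpus = std::vector<std::pair<Sentence, Sentence>>;  // (English, French) pairs
// t[e][f] holds t(f | e), the probability that English word e translates as French word f.
using TransTable =
    std::unordered_map<std::string, std::unordered_map<std::string, double>>;

// Uniform start: t(f | e) = 1 / |French vocabulary| for every co-occurring pair.
TransTable uniformInit(const Corpus& corpus) {
  std::unordered_set<std::string> frVocab;
  for (const auto& [eng, fr] : corpus) frVocab.insert(fr.begin(), fr.end());
  const double u = 1.0 / frVocab.size();
  TransTable t;
  for (const auto& [eng, fr] : corpus)
    for (const auto& e : eng)
      for (const auto& f : fr) t[e][f] = u;
  return t;
}

// One EM iteration of IBM Model 1; returns the re-estimated table.
TransTable model1Iteration(const Corpus& corpus, const TransTable& t) {
  TransTable counts;                               // expected counts c(e, f)
  std::unordered_map<std::string, double> totals;  // sum over f of c(e, f)
  for (const auto& [eng, fr] : corpus) {
    for (const auto& f : fr) {
      // E-step: spread one count for f over the English words in proportion to t(f | e).
      double z = 0.0;
      for (const auto& e : eng) z += t.at(e).at(f);
      assert(z > 0.0);
      for (const auto& e : eng) {
        const double p = t.at(e).at(f) / z;
        counts[e][f] += p;
        totals[e] += p;
      }
    }
  }
  // M-step: normalize the expected counts separately for each English word.
  for (auto& [e, row] : counts)
    for (auto& [f, c] : row) c /= totals[e];
  return counts;
}
```

Take the toy corpus of two sentence pairs, "the house" / "la maison" and "the" / "la". After two iterations t(maison | house) is already above t(la | house), because the second pair explains la as a translation of the. The E-step only compares t(f | e) across the English words of a single sentence. That is why the uniform starting table never needs to be normalized per English word.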
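To report the 10 highest-probability French translations of a word, std::partial_sort saves you from sorting the word's entire translation row. Here is a minimal sketch. It assumes the parameters are held as a nested hash map from English word to French word to t(f | e). This layout and the function name topTranslations are hypothetical, not taken from the course support code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical layout: t[e][f] = t(f | e).
using TransTable =
    std::unordered_map<std::string, std::unordered_map<std::string, double>>;
using Scored = std::pair<std::string, double>;  // (French word, probability)

// Up to k French translations of English word e, most probable first.
// Ties are broken alphabetically so the output order is deterministic.
std::vector<Scored> topTranslations(const TransTable& t, const std::string& e,
                                    std::size_t k) {
  auto row = t.find(e);
  if (row == t.end()) return {};  // English word never seen in training
  std::vector<Scored> out(row->second.begin(), row->second.end());
  auto better = [](const Scored& a, const Scored& b) {
    return a.second != b.second ? a.second > b.second : a.first < b.first;
  };
  const std::size_t n = std::min(k, out.size());
  // Only the first n positions are put in order; the rest are left unsorted.
  std::partial_sort(out.begin(), out.begin() + static_cast<std::ptrdiff_t>(n),
                    out.end(), better);
  out.resize(n);
  assert(std::is_sorted(out.begin(), out.end(), better));
  return out;
}
```

Calling topTranslations(t, "drug", 10), and likewise for the other words, gives the list to report. The alphabetical tie-break means repeated runs list equal-probability words in the same order.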
C++ resources
There is an overwhelming amount of stuff about C++ on the web.
Wikipedia is a good place to start. In addition, you might find
the following links helpful:
- The standard textbook for C++ is Bjarne Stroustrup,
The C++ Programming Language.
- I found the
C++ FAQ lite helpful when I was learning C++.
- The Standard Template Library (STL) is very important
for the kind of programming we do in this class.
Most compilers come with an STL implementation, but you'll
need documentation to use it effectively.
- Nicolai Josuttis's
The C++ Standard Library: A Tutorial and Reference
is a good textbook on the STL.
- SGI (where a lot of the STL was developed) has an excellent
programmer's guide to the STL. Warning: this site hasn't
been updated to reflect the changes to the STL in Technical
Report 1 (TR1); in particular, hash maps and hash sets will
be replaced by unordered maps and unordered sets in the near
future.
- The
Boost C++ Libraries contain a lot of very useful, high-quality
libraries. Many modern Linux distributions come with Boost preinstalled,
so you can use most of Boost just by including the relevant header
(a few libraries, such as Boost.Regex, must also be linked).
The Boost libraries I use the most are:
- Many of the C++ standards documents are available online.
Don't try to learn C++ from these! But if you like this kind
of thing, you can find: