This course covers statistical methods for learning a natural language and applying that knowledge to specific tasks. Topics include: entropy and cross-entropy of a language, hidden Markov models, the Viterbi algorithm, the forward-backward algorithm, trigram models, part-of-speech tagging, probabilistic context-free parsing, the inside-outside algorithm, learning probabilistic context-free grammars, statistical models of syntactic disambiguation, statistical anaphora resolution, deriving semantic word classes from statistical properties, and word-sense disambiguation.
Grading is based primarily on the project and secondarily on two 40-minute in-class exams. Class participation is also considered. The project is done in groups of 2-4 students, and all groups work on the same project. Collaboration between groups is allowed, indeed encouraged, up to but not including the sharing of code (unless explicitly authorized in class). This semester's project looks at the problem of clustering sentences.
Week of | Reading Assignments |
---|---|
Sept 10 | Ch 14 |
Sept 17 | Ch 2 (minus 2.1.10, 2.2.4); Ch 9 through 9.3.1 |
Sept 24 | Ch 9, Ch 10 |
Oct 1 | Ch 10 |
Oct 8 | Ch 11 |
Oct 15 | Exam, Ch 12 |
Oct 22 | Ch 6 |
Oct 29 | Ch 7 |
Nov 5 | Ch 8 |
Nov 12 | Exam |
Nov 19 | No Class |
Nov 26 | Project Discussions |
Computer files for the project are in /pro/dpg/cs241/.