89-7260 Natural Language Processing and Applications

UPDATE: next time slot for oral exams will be in July 2014, before the semester ends on July 25th

Note: The news items in the side bar are sorted alphabetically, not chronologically!

Natural language processing is a key technology in web search, information retrieval, social network analysis, machine translation, speech recognition, and many other applications. The course introduces students to methods for natural language processing, natural language understanding, and information retrieval. 
  • text processing and encoding
  • string algorithms, edit distance
  • statistical language models
  • spell correction
  • n-gram models
  • word sense disambiguation
  • Markov models, parts-of-speech tagging
  • probabilistic grammars and parsing
  • text alignment, clustering, text categorization
  • statistical machine translation
  • applications in speech recognition, handwriting recognition, and OCR
  • language acquisition
  • machine learning for NLP
  • cognitive and psychological aspects of NLP
The course will combine a statistical, mathematical, and practical approach. Exercises will be in Python and will use some Python toolkits.
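
To give a flavour of the Python exercises, here is a minimal sketch of the edit distance from the topic list above. It is an illustration only and is not taken from the course worksheets: the function name edit_distance is made up for this example, and unit costs for insertion, deletion, and substitution (the Levenshtein variant) are an assumption.

```python
def edit_distance(source, target):
    """Levenshtein distance between two strings.

    Assumption for this sketch: insertion, deletion, and substitution
    each cost 1.
    """
    # previous[j] is the distance between the already processed prefix
    # of source and target[:j]; start with the empty prefix of source.
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        # Turning source[:i] into the empty string takes i deletions.
        current = [i]
        for j, t_char in enumerate(target, start=1):
            substitution = previous[j - 1] + (s_char != t_char)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]
```

For example, edit_distance("kitten", "sitting") evaluates to 3 (two substitutions and one insertion).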

Lectures: Wednesdays, 13:45 - 15:15 @ Room 48-462 
Tutorials: every other Thursday, 17:00 - 18:30 @ Room 32-411 (see announcements on tutorial page)
Tutors: Mayce Al Azawi and Ludwig Schmidt-Hackenberg


Much of the material we cover is contained in this free e-book (also available from O'Reilly and Amazon in printed form):
You should also do background reading on your own, using sources like Wikipedia and Google Scholar when appropriate. In particular, after each lecture, look up online any important terms and ideas introduced in class.


  • Dates: roughly between April 9th and 12th. There will be no exams at the end of the winter semester!
  • Admission (Zulassung): You have to complete at least 50% of the assigned homework, averaged over all tasks handed out.
  • For the exam you have to finish all tasks and bring them with you!
  • FAQ Oral Exams

Lecture 1 (17.10.12):

Lecture 2 (24.10.12):

Lecture 3 (31.10.12):

  • Unix Tools
  • Worksheet*: Downloading Text from the Internet iPyNB, PDF
  • Worksheet*: Simple Word Histogram with the Command Line iPyNB, PDF
  • Worksheet*: find and xargs iPyNB, PDF
  • Worksheet*: Unicode iPyNB, PDF
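
The word histogram worksheet uses command line tools; the same idea can be sketched in a few lines of Python. This is a hedged illustration, not the worksheet's own code: the function name word_histogram is invented here, Python 3 is assumed (so strings are Unicode and \w matches non-ASCII letters), and the tokenization rule "runs of word characters" is a simplifying assumption.

```python
import re
from collections import Counter


def word_histogram(text):
    """Count lowercased word tokens in a Unicode string."""
    # \w+ matches runs of letters, digits, and underscores. In Python 3
    # str patterns are Unicode-aware by default, so "über" is one token.
    tokens = re.findall(r"\w+", text.lower())
    return Counter(tokens)
```

Calling word_histogram(text).most_common(10) then lists the ten most frequent tokens, comparable to piping the sorted word counts through head on the command line.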

Lecture 4 (8.11.12): 

  • Worksheet*: Regular Expressions iPyNB, PDF
  • Worksheet*: Regular Expressions and FSA iPyNB, PDF
    • PyDot was not available on the SCI terminals at first; we asked the SCI to install it. Working as of 30.11.12.
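
The link between regular expressions and finite-state acceptors (FSA) can be sketched without PyDot. The example below is an assumption-laden illustration, not the worksheet's code: the language (ab)+, the state names, and the helper functions fsa_accepts and regex_accepts are all chosen for this sketch. It needs Python 3.4 or later for re.fullmatch.

```python
import re

# Transition table of a deterministic finite-state acceptor for (ab)+.
# State names are arbitrary labels chosen for this sketch.
TRANSITIONS = {
    ("start", "a"): "seen_a",
    ("seen_a", "b"): "accept",
    ("accept", "a"): "seen_a",
}
ACCEPTING = {"accept"}


def fsa_accepts(string):
    """Run the acceptor over the string, one symbol at a time."""
    state = "start"
    for symbol in string:
        state = TRANSITIONS.get((state, symbol))
        if state is None:
            # No transition defined for this symbol: reject.
            return False
    return state in ACCEPTING


def regex_accepts(string):
    """Accept exactly the strings that match (ab)+ in full."""
    return re.fullmatch(r"(ab)+", string) is not None
```

Both functions should agree on every input; for instance they accept "abab" and reject "aba".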

Lecture 5 (21.11.12): 

  • Worksheets*:

Lecture 6 (22.11.12):

  • Worksheets*:

Lecture 7 (28.11.12):

  • Worksheets*:
    • NLTK - Available taggers iPyNB, PDF **
    • NLTK - Ngram taggers iPyNB, PDF **
    • NLTK - Tagging from scratch iPyNB, PDF **

Lecture 8 (5.12.12): 

  • Worksheets*:
    • NLPA - Markov Models iPyNB, PDF
    • NLPA - HMM - OCR (see lecture 9)
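
As a rough idea of what a Markov model looks like in code, the sketch below estimates the transition probabilities of a first-order Markov chain from one observed sequence and scores new sequences with them. It is not the worksheet's implementation: the function names are hypothetical, the estimates are plain maximum-likelihood counts without smoothing, and the probability of the first state is ignored.

```python
from collections import Counter, defaultdict


def estimate_transitions(sequence):
    """Maximum-likelihood transition probabilities of a first-order chain."""
    pair_counts = defaultdict(Counter)
    # Count each (current state, next state) pair in the sequence.
    for current, following in zip(sequence, sequence[1:]):
        pair_counts[current][following] += 1
    transitions = {}
    for state, counts in pair_counts.items():
        total = sum(counts.values())
        transitions[state] = {nxt: n / total for nxt, n in counts.items()}
    return transitions


def sequence_probability(sequence, transitions):
    """Probability of the sequence given its first state.

    Transitions never seen in training get probability 0 (no smoothing).
    """
    probability = 1.0
    for current, following in zip(sequence, sequence[1:]):
        probability *= transitions.get(current, {}).get(following, 0.0)
    return probability
```

The sequence can be a list of tags, words, or simply a string of characters. A hidden Markov model adds emission probabilities on top of these transitions.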

Lecture 9 (12.12.12): 

Lecture 10 (19.12.12):

  • Worksheets*:
    • NLPA - OpenFST2 iPyNB, PDF ***
    • NLPA - OpenFst - Edit - Distance iPyNB, PDF ***

Lecture 11 (9.1.13):

  • Worksheets*:
    • NLPA - Classification - Intro iPyNB, PDF (see below)

Lecture 12 (16.1.13):

  • Worksheets*:
    • NLPA - Classification - Intro iPyNB, PDF (updated)
    • NLPA - Dialog Act Type classification iPyNB, PDF
    • NLPA - Sentence Segmentation classification iPyNB, PDF
    • NLPA - Classification tagging iPyNB, PDF
    • NLPA - Classifier errors iPyNB, PDF

Lecture 13 (23.1.13):

Lecture 14 (30.1.13):

* These are IPython Notebook worksheets. To use them, run ipython notebook --pylab=inline from the folder into which you downloaded the files. You need version 0.13 or later. IPython Notebook is installed on the SCI machines.
** To run these, download tagutils.py and put it in the same folder as the worksheets.
*** To run these, download fstutils.py and put it in the same folder as the worksheets.