Natural Language Processing – Project – Spring 2007

Initial Project Guidelines

  1. The project must apply NLP techniques to a corpus.
  2. Goals of the exercise must be clearly specified.
  3. The text to be analyzed can be obtained from any (legal) source.
  4. Email, IM, web pages, and other on-line text are all fair game.
  5. All projects will be individual projects.
  6. You can choose whatever technology you are comfortable with: .NET, Java, Perl, Lisp, Prolog, Python. However, you need my approval for whatever you choose.

Sample topics:

  1. Analyzing on-line news stories to determine the top topic of the day.
  2. Synthesizing a newspaper based on user interests.
  3. Extracting clinical concepts from medical dictation.
  4. Identifying technical reports that match a user's interests.
  5. Analyzing IM conversations to identify the major topics discussed.
  6. Analyzing press releases to determine what each release is about.
  7. Analyzing email to identify spam.
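
To give a feel for the scale of a starting point, here is a minimal sketch of sample topic 1 in Python. It is only an illustration, not a required approach: it assumes the news stories have already been reduced to a list of headline strings, and it treats the "top topic" as the non-stop-word that appears in the most headlines. The names top_topic and STOP_WORDS, and the tiny stop-word list itself, are hypothetical and chosen for this example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this
# sketch); a real project would use a much fuller one.
STOP_WORDS = {"the", "a", "an", "of", "to", "in", "on", "and", "for", "is", "as"}


def top_topic(headlines, stop_words=STOP_WORDS):
    """Return the word that occurs in the most headlines, or None if there is none."""
    doc_freq = Counter()
    for headline in headlines:
        # Lowercase, keep alphabetic tokens, and count each word once per headline.
        words = set(re.findall(r"[a-z]+", headline.lower()))
        doc_freq.update(words - stop_words)
    if not doc_freq:
        return None
    # most_common(1) returns a list holding the single highest-count (word, count) pair.
    return doc_freq.most_common(1)[0][0]


if __name__ == "__main__":
    sample = [
        "Senate debates budget",
        "Budget vote delayed in Senate",
        "Budget cuts hit schools",
        "Storm hits coast",
    ]
    print(top_topic(sample))
```

Counting each word once per headline keeps a single repetitive story from dominating the result. A full project would go well beyond this, for example by handling word variants, multi-word phrases, and ties.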

Timeline:

  1. Identify your project area by Feb 15, 2007. Submit a half-page proposal describing what you plan to do.
  2. Develop a plan of action and get my concurrence by Feb 28, 2007. At a minimum, the plan should include biweekly progress milestones.
  3. Submit a half-page progress report every two weeks.
  4. First demo of the system right after spring break.
  5. Final demo of the system at the end of the semester.