Natural Language Processing – Project – Spring 2007
Initial Project Guidelines
- You need to apply NLP techniques to a corpus.
- The goals of the exercise must be clearly specified.
- The text to be analyzed can be obtained from any (legal) source.
- Email, IM, web pages, and on-line text are all fair game.
- All projects will be individual projects.
- You can choose whatever technology you are comfortable with: .Net, Java, Perl, Lisp, Prolog, Python. However, you need to get my approval for whatever you plan to use.
Sample topics:
- Analyzing on-line news stories to determine the top topic of the day.
- Synthesizing a newspaper based on user interests.
- Extracting clinical concepts from medical dictation.
- Identifying technical reports that match a user's interests.
- Analyzing IM to identify the major topics discussed.
- Analyzing press releases to determine what each press release is about.
- Analyzing email to identify spam.
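To give a sense of scale for the first sample topic, here is a minimal sketch in Python of one very simple way to guess a "top topic": count how many stories each term appears in and report the most widespread term. The function names, the stopword list, and the sample headlines are all invented for illustration; this is not a required approach, and a real project would need a proper corpus and a more careful technique.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption; a real project would use a fuller one).
STOPWORDS = {"the", "a", "an", "of", "in", "on", "to", "and", "for",
             "is", "as", "at", "by", "with", "after", "over"}

def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def top_topics(stories, n=3):
    """Rank terms by the number of stories they appear in (document frequency)."""
    doc_freq = Counter()
    for story in stories:
        # Use a set so that a term repeated inside one story counts only once.
        terms = {t for t in tokenize(story) if t not in STOPWORDS and len(t) > 2}
        doc_freq.update(terms)
    return doc_freq.most_common(n)

# Hypothetical headlines, invented for illustration only.
sample_stories = [
    "Senate debates new energy bill",
    "Energy prices rise as senate vote nears",
    "Local team wins championship game",
    "Analysts expect energy bill to pass",
]

print(top_topics(sample_stories, n=1))  # prints [('energy', 3)]
```

Counting stories rather than raw occurrences keeps one long article from dominating the result, but it is still only a starting point: it ignores phrases, synonyms, and word forms.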
Timeline:
- You need to identify your project area by Feb 15, 2007. Submit a half-page proposal on what you plan to do.
- Develop a plan of action and get my concurrence by Feb 28, 2007. The plan should, at a minimum, include bi-weekly progress milestones.
- Every two weeks, submit a half-page progress report.
- First demo of the system right after spring break.
- Final demo of the system at the end of the semester.