News...
Dec 1, 2006:
Pamphlet for Dec 7 presentations now available.
Nov 17, 2006:
Project 4 presentations will be on Dec 7 from 3:30 until we finish.
Dinner (pizza) will be provided.
Location: ESB G39
Visitors will be invited, so please: (i) come dressed in business casual;
(ii) thoroughly rehearse your talk, making sure you finish in 15 minutes and leave
5 minutes for questions; (iii) present only in HTML, which saves time when switching between talks.
Nov 16, 2006:
Lecture 8 is now available.
Oct 3, 2006:
Lecture 7 is now available.
This subject is about finding the
diamonds in the dust; that is, the small gems
within mountains of data.
Students in this subject
will learn the core concepts of data mining and how to use those
concepts to build theories from data.
For administrative details about this subject, see the Syllabus.
For some of the motivations of this subject, read on.
The core ideas of this subject are:
Wholes and holes
- Humans and data miners are a natural partnership.
Humans are good at the whole story while data miners are good at filling in the
holes (the details that humans don't have time to tell us, or just don't know).
Much of "mining" is really "data pre-processing."
- So, as well as exploring data mining methods and tools (e.g. WEKA),
students also need to learn the scripting skills required for the pre- and post-processing.
Bias makes us blind, bias lets us see
- The output of a data miner is always biased by the data selected
for the learning, the learning method applied, and so on. Data miners must
be biased since, otherwise, there would be no way to decide which
bits are most important and which bits can be ignored.
Paradoxically, bias blinds us to some things while letting us see (predict) the
future.
- So all theories are biased (but only some admit it). Hence,
we should always be aware of the domain-specific nature of the
conclusions drawn from a learner.
Algorithms need audiences
- Data miners build theories that someone (or something) will use.
People like reading things, and some things are easier to read than others.
- Hence, this subject does not spend too much time on mostly mathematical
methods (e.g. regression, neural nets, Bayes classifiers). Instead, we'll focus
on methods that generate human-readable theories (e.g.
decision trees, rule-based learners, treatment learners).
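To make "human-readable theory" concrete, here is a minimal sketch of a very simple rule-based learner in the style of 1R: it picks the single attribute whose majority-class rules make the fewest mistakes on the training data. The function names (one_r, show) and the tiny weather-style table are made up for illustration; they are not part of this subject's tools or data.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: each row maps attribute names to values.
ROWS = [
    {"outlook": "sunny",    "windy": "no",  "play": "no"},
    {"outlook": "sunny",    "windy": "yes", "play": "no"},
    {"outlook": "overcast", "windy": "no",  "play": "yes"},
    {"outlook": "overcast", "windy": "yes", "play": "yes"},
    {"outlook": "rainy",    "windy": "no",  "play": "yes"},
    {"outlook": "rainy",    "windy": "no",  "play": "yes"},
    {"outlook": "rainy",    "windy": "yes", "play": "no"},
]

def one_r(rows, target):
    """Return (attribute, rules, errors) for the one attribute whose
    majority-class rules misclassify the fewest training rows."""
    best = None
    for attr in (name for name in rows[0] if name != target):
        # Count the class labels seen for each value of this attribute.
        counts = defaultdict(Counter)
        for row in rows:
            counts[row[attr]][row[target]] += 1
        # Each attribute value predicts its most common class.
        rules = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        # Errors are the rows that disagree with their value's majority class.
        errors = sum(sum(c.values()) - c.most_common(1)[0][1]
                     for c in counts.values())
        if best is None or errors < best[2]:
            best = (attr, rules, errors)
    return best

def show(attr, rules, target):
    """Render the learned rules as plain if-then sentences."""
    return [f"if {attr} = {value} then {target} = {label}"
            for value, label in sorted(rules.items())]

if __name__ == "__main__":
    attr, rules, errors = one_r(ROWS, "play")
    print("\n".join(show(attr, rules, "play")))
    print(f"training errors: {errors} of {len(ROWS)}")
```

On this toy table the learned theory is three if-then rules on outlook, which a person can read and argue with directly. That is the property the methods in this subject aim for.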
How do dumb apes get by?
- Here's a puzzle.
People aren't real bright (just look at how badly they write software).
Yet, somehow, people have built the most amazing things, like
the international and domestic airline networks and the Internet. How?
- Maybe the real world
is not as complex as our egos imagine, and seemingly naive
probes tell us most of what can be found using supposedly
more sophisticated methods.
You are responsible
- Very successful data miners can be surprisingly simple. This raises the question:
"Why aren't they used more often, so we can better control the world around us?"
- The answer is that sometimes the world is very, very complicated and no single simple solution
will suffice. But often the world is
a surprisingly simple place (otherwise, dumb apes would not get by),
which means, in turn, that we should be able to predict, control, and select
the future that we want.
- So the curse of data mining is that once you learn how to do it, you become responsible for the future of the human race.
Are you ready for that?