http://lisp.vse.cz/challenge/ecmlpkdd2003/chall2003.htm
Motivation
Knowledge discovery in real-world databases requires a
broad scope of techniques and forms of knowledge. Both the knowledge
and the applied methods should fit the discovery tasks and should
adapt to knowledge hidden in the data. The ECML/PKDD2003 Discovery Challenge
will encourage a collaborative research effort, a broad and unified view of
knowledge and methods of discovery, and an emphasis on business problems and solutions
to those problems.
The idea of the Discovery Challenge came from Jan Zytkow, who suggested
organizing such an event during PKDD'99 in Prague. In contrast to the KDD Cups
held within the KDD Conferences, the Discovery Challenge stresses the aspect
of collaboration.
The Discovery Challenge should constitute a collection of data and problems as a common ground for
better comparisons and discussions of the applicability of KDD methods to real-world problems with
respect to both KDD and application viewpoints. The main goals of the
Discovery Challenge are to
- stimulate an open view of knowledge and discovery
- stimulate a collaborative approach to KDD and research on the unification of
different forms of knowledge and discovery
- integrate into KDD an emphasis on business problems and solutions to
those problems
Time and place
The Discovery Challenge will be held as a workshop during the ECML/PKDD2003 Conference,
September 22-26, 2003, Cavtat-Dubrovnik, Croatia.
Only those registered for ECML/PKDD2003 can participate in the Discovery Challenge.
Data sets
Two data sets from the medical domain
are used for the Discovery Challenge:
- Data about risk factors of patients with
atherosclerosis:
This data set comes from a twenty-year longitudinal study of the risk factors
of atherosclerosis in a population of 1417 middle-aged men.
Four data matrices are included. The first one contains the results of observing
64 attributes at the entry examination of each patient. The second one contains the results
of observing 66 attributes at 10 572 examinations made in the years 1976–1999.
The third data matrix contains additional data about the health status of 403 men, and the
last data matrix concerns the deaths of 389 patients.
Data from the same domain (but only the first two tables) were used
in the previous challenge. To see the results from the ECML/PKDD 2002 Challenge, follow
this link.
The data were prepared in cooperation with the
EuroMISE-CARDIO centre Prague, Czech Republic.
- Data about chronic hepatitis
(coming soon).
The participants in the Challenge can analyze any of these data sets.
To get access to the data, you have to fill in the
registration form.
Discovery Challenge guidelines
- Each participant can use any KDD techniques and discover as much
knowledge as possible.
Ideally each submitted contribution will include
- the proposed business objectives (goals that may be of interest to
database users),
- a brief summary of the data mining effort; this summary may include
the data preprocessing tasks such as data extraction, sampling, data integration and
homogenization, data cleaning, and data transformation, the data mining step, as well as the
evaluation criteria approved,
- a presentation of the discovered knowledge, and
- an explanation of how database users can apply the discovered knowledge.
Since the results may be unexpected, the final applications may be
different from those initially proposed.
- In order to reach a common framework for comparisons, the
presentation of the discovered knowledge should include a clear
summary of the predictions it makes possible. Ideally, such a summary
shows which parts of the entire dataset can be removed from the data
because they can be predicted from the discovered knowledge and the remaining data.
- All presentations will be given during the Discovery Challenge Workshop.
The time allocated for each presentation will be about 20
minutes.
- Ample time will be provided during and after the special sessions
for interaction between participants. The discussion will be aimed at
a joint representation of knowledge and method, and at a synthesis of
all contributions.
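As an illustration of the "removable data" summary mentioned in the guidelines above, the following minimal Python sketch measures how much of one attribute could be dropped because a discovered rule reproduces it from the remaining attributes. The records, the attribute names (smoker, bmi_high, risk_group), the rule, and the helper removable_fraction are all hypothetical and invented for this example; they are not taken from the Challenge data sets, and this is only one possible way to quantify such a summary.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]


def removable_fraction(
    records: List[Record],
    target: str,
    predict: Callable[[Record], object],
) -> float:
    """Return the share of records whose `target` value is reproduced
    by `predict` when it sees only the other attributes."""
    if not records:
        return 0.0
    hits = 0
    for rec in records:
        # Hide the target attribute so the rule uses only the remaining data.
        rest = {k: v for k, v in rec.items() if k != target}
        if predict(rest) == rec[target]:
            hits += 1
    return hits / len(records)


# Toy records, invented purely for illustration.
records: List[Record] = [
    {"smoker": True, "bmi_high": True, "risk_group": "high"},
    {"smoker": True, "bmi_high": False, "risk_group": "high"},
    {"smoker": False, "bmi_high": True, "risk_group": "low"},
    {"smoker": False, "bmi_high": False, "risk_group": "low"},
    {"smoker": False, "bmi_high": True, "risk_group": "high"},
]


def rule(rest: Record) -> str:
    """Hypothetical discovered rule: smokers belong to the high-risk group."""
    return "high" if rest["smoker"] else "low"


fraction = removable_fraction(records, "risk_group", rule)
print(f"Share of 'risk_group' values the rule reproduces: {fraction:.0%}")
```

A summary of this kind gives different contributions a common yardstick: the larger the share of values a piece of discovered knowledge reproduces, the more of the data it accounts for.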
Submitted papers should be in English and should be formatted according to
the Springer-Verlag Lecture Notes in Artificial Intelligence guidelines.
Authors' instructions and style files
can be downloaded from
http://www.springer.de/comp/lncs/authors.html (no copyright form is
requested, use the style for proceedings). The
maximum length of papers is 12 pages.
Papers must be
submitted electronically (as PostScript or PDF files), either by e-mail to
Petr Berka
or using the “submit paper”
option from the Discovery Challenge web page.
The deadline for submission is June 30, 2003. An acceptance notification
will follow. The deadline for camera-ready papers is July 11, 2003.
Acknowledgment
The ECML/PKDD2003 Discovery Challenge is supported by
Petr Berka
Jan Rauch
Shusaku Tsumoto