mirror of
https://github.com/Doctorado-ML/Stree_datasets.git
synced 2025-08-17 16:36:02 +00:00
134 lines
6.7 KiB
Plaintext
Executable File
1. Title: INDUCE Trains Data set

2. Sources:
   - Donor: GMU, Center for AI, Software Librarian,
     Eric E. Bloedorn (bloedorn@aic.gmu.edu)
   - Original owners: Ryszard S. Michalski (michalski@aic.gmu.edu)
     and Robert Stepp
   - Date received: 1 June 1994
   - Date updated: 24 June 1994 (Thanks to Larry Holder (UT Arlington)
     for noticing a translation error)

3. Past usage:
   - This set most closely resembles the data sets described in the following
     two publications:
     1. R.S. Michalski and J.B. Larson "Inductive Inference of VL
        Decision Rules" In Proceedings of the Workshop in Pattern-Directed
        Inference Systems, Hawaii, May 1977. Also published in SIGART
        Newsletter, ACM No. 63, pp. 38-44, June 1977.
     2. Stepp, R.E. and Michalski, R.S. "Conceptual Clustering: Inventing
        Goal-Oriented Classifications of Structured Objects" In
        R.S. Michalski, J.G. Carbonell, and T.M. Mitchell (Eds.) "Machine
        Learning: An Artificial Intelligence Approach, Volume II". Los
        Altos, CA: Morgan Kaufmann.

   Both of these papers describe a set of 10 trains, 5 east-bound and 5
   west-bound. Both refer to the same 10 trains, as the figures in these
   publications show. The differences are:
   1) This dataset has 10 attributes and no wheel color or load color
      attributes.
   2) Reference 2 (Stepp, Michalski) does not completely list the
      attributes used, but does mention wheel color, an attribute not
      present in this dataset.
   3) Reference 1 (Michalski, Larson) mentions 12 attributes, but only 6
      are explicitly described. These 6 are included in both the dataset
      below and the Stepp and Michalski set.

   Results:
   [1] Michalski and Larson found the following decision rules:
       (1) There exist car1, car2, lod1 and lod2 such that
           [infront(car1, car2)][lcont(car1, lod1)][lcont(car2, lod2)]
           [load-shape(lod1)=triangle][load-shape(lod2)=polygon]=>[dir=east]
       (2) There exists a car1 such that
           [ln(car1)=short][car-shape(car1)=closed-top]=>[dir=east]
       (3) [ncar=3] v There exists car1 such that
           [car-shape(car1)=jagged-top]=>[dir=west]
       (4) There exists car1 such that
           [#cars(ln=long)=2][cshape(car1)=open,trapezoid,u-shaped] v
           [location(car1)=2][cshape(car1)=closed, rectangle]=>[dir=west]
           (The first selector in rule 4 uses a meta descriptor generated by
           the program that counts the number of long cars in a train)
   [2] The goal of the cluster research is to develop a general method
       for clustering structured objects that can generate conjunctive
       descriptions that occur in human classifications or invent new
       concepts that have similar appeal. CLUSTER/S was able to find the
       following cognitively appealing clusters:
       1) a) "There are two different car shapes in the train"
          b) "There are three or more different car shapes in the train"
       2) a) "Wheels on all cars have the same color"
          b) "Wheels on all cars do not have the same color"
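As a rough illustration, rules (2) and (3) above can be read as predicates
over the cars of one train. The sketch below is not the INDUCE program: the
Car class, its field names, and the sample train are hypothetical, and it
assumes that "closed-top" covers the values listed under closedtop in the
hierarchy of section 4 and that ncar is simply the number of Car entries.

```python
from dataclasses import dataclass

# Assumption: "closed-top" means any value that the hierarchy in
# section 4 generalizes to closedtop.
CLOSED_TOP = {"hexagon", "ellipse", "closedrect", "jaggedtop", "slopetop", "engine"}


@dataclass
class Car:
    length: str  # "long" or "short"
    shape: str   # a cshape value such as "closedrect"


def rule2_east(cars):
    """Rule (2): some car is short and closed-top => east."""
    return any(c.length == "short" and c.shape in CLOSED_TOP for c in cars)


def rule3_west(cars):
    """Rule (3): the train has 3 cars, or some car is jagged-top => west."""
    return len(cars) == 3 or any(c.shape == "jaggedtop" for c in cars)


# Hypothetical train, not taken from the data files.
train = [
    Car("long", "openrect"),
    Car("short", "closedrect"),
    Car("long", "opentrap"),
    Car("short", "ushaped"),
]
print(rule2_east(train), rule3_west(train))  # True False
```

Each rule is a sufficient condition for one class, so a full classifier
would still need a policy for trains that no rule, or more than one rule,
covers.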

4. Relevant information:
   - Additional "background" knowledge is supplied that provides a partial
     ordering on some of the attribute values.
   - We are providing this dataset both in its original form and in a form
     similar to the more typical propositional datasets in our repository.
     Since the trains dataset records relations between attributes, this
     transformation was somewhat challenging. However, it may offer some
     insight into this problem for people who are more familiar with the
     simple one-instance-per-line dataset format.
   - Hierarchy of values:
       if (cshape is one of {openrect,opentrap,ushaped,dblopnrect})
          then cshape is opentop
       if (cshape is one of {hexagon,ellipse,closedrect,jaggedtop,slopetop,
                             engine})
          then cshape is closedtop
   - Prediction task: Determine concise decision rules distinguishing
     trains traveling east from those traveling west.
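The hierarchy of values above amounts to a lookup from a specific car shape
to its more general class. A minimal sketch, assuming the value spellings
given in the hierarchy (the function name generalize_cshape is hypothetical):

```python
# Value sets copied from the "Hierarchy of values" rules above.
OPEN_TOP = {"openrect", "opentrap", "ushaped", "dblopnrect"}
CLOSED_TOP = {"hexagon", "ellipse", "closedrect", "jaggedtop", "slopetop", "engine"}


def generalize_cshape(cshape):
    """Return the general shape class for a specific cshape value.

    Values that are already general (or unknown) are returned unchanged.
    """
    if cshape in OPEN_TOP:
        return "opentop"
    if cshape in CLOSED_TOP:
        return "closedtop"
    return cshape


for value in ("ushaped", "engine", "opentop"):
    print(value, "->", generalize_cshape(value))
```

A rule learner could use such a mapping to climb from a specific value to
its parent when generalizing a condition.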

5. Number of instances: 10

6. Number of attributes:
   - 10, not including the class attribute
     1. ccont(train idx1, car idx2): car idx2 is contained in train idx1
     2. ncar(train idx): # of cars in train idx (int)
     3. infront(car idx1, car idx2): relative positions of cars in train
     4. loc(car idx): absolute position of car in train (int)
     5. nwhl(car idx): # of wheels of car idx (int)
     6. ln(car idx): length of car idx (long, short)
     7. cshape(car idx): shape of car (engine, dblopnrect,
                         closedrect, openrect, opentrap, ushaped,
                         hexagon, ellipse, jaggedtop, slopetop,
                         opentop, closedtop)
     8. npl(car idx): number of loads in car idx
     9. lcont(car idx, load idx): description of which cars hold which loads
     10. lhshape(load idx): description of load shape (trianglod,
                            rectanglod, circlelod, hexagonlod)
     Class: direction (east, west)
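To make the relational attributes above concrete, the sketch below stores
infront and lcont facts for one hypothetical train and derives a single
propositional feature from them, in the spirit of the "next_to" attributes
of the transformed representation. All identifiers and facts are invented
for illustration, and the sketch assumes that infront relates only
immediately adjacent cars.

```python
# Hypothetical facts for one train (identifiers invented for illustration).
infront = [("car1", "car2"), ("car2", "car3")]  # infront(car idx1, car idx2)
lcont = [("car1", "load1"), ("car2", "load2"), ("car3", "load3")]  # lcont(car, load)
load_shape = {"load1": "trianglod", "load2": "rectanglod", "load3": "circlelod"}


def shapes_by_car(lcont, load_shape):
    """Map each car to the set of load shapes it carries."""
    shapes = {}
    for car, load in lcont:
        shapes.setdefault(car, set()).add(load_shape[load])
    return shapes


def next_to(shape_a, shape_b, infront, shapes):
    """True if loads of shape_a and shape_b sit on adjacent cars, in either order."""
    for front, back in infront:
        f = shapes.get(front, set())
        b = shapes.get(back, set())
        if (shape_a in f and shape_b in b) or (shape_b in f and shape_a in b):
            return True
    return False


shapes = shapes_by_car(lcont, load_shape)
print(next_to("rectanglod", "trianglod", infront, shapes))  # True
print(next_to("trianglod", "circlelod", infront, shapes))   # False
```

The second call is False because the triangle and circle loads sit on car1
and car3, which are not adjacent in this example.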

The following format was used for the "transformed" dataset representation
as found in trains.transformed.data (one instance per line):

Attributes: 33
   1. Number_of_cars (integer in [3-5])
   2. Number_of_different_loads (integer in [1-4])
   3-22: 5 attributes for each of cars 2 through 5: (20 attributes total)
      - num_wheels (integer in [2-3])
      - length (short or long)
      - shape (closedrect, dblopnrect, ellipse, engine, hexagon,
               jaggedtop, openrect, opentrap, slopetop, ushaped)
      - num_loads (integer in [0-3])
      - load_shape (circlelod, hexagonlod, rectanglod, trianglod)
   23-32: 10 Boolean attributes describing whether 2 types of loads are on
          adjacent cars of the train
      - Rectangle_next_to_rectangle (0 if false, 1 if true)
      - Rectangle_next_to_triangle (0 if false, 1 if true)
      - Rectangle_next_to_hexagon (0 if false, 1 if true)
      - Rectangle_next_to_circle (0 if false, 1 if true)
      - Triangle_next_to_triangle (0 if false, 1 if true)
      - Triangle_next_to_hexagon (0 if false, 1 if true)
      - Triangle_next_to_circle (0 if false, 1 if true)
      - Hexagon_next_to_hexagon (0 if false, 1 if true)
      - Hexagon_next_to_circle (0 if false, 1 if true)
      - Circle_next_to_circle (0 if false, 1 if true)
   33. Class attribute (east or west)
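The attribute count can be checked by generating the column names: 2 train
attributes, 5 attributes for each of cars 2 through 5, 10 unordered pairs of
the 4 load types, and the class. The sketch below does this. The per-car
column names (such as car2_num_wheels), the name "class", and the assumption
that the five attributes of one car are stored together are all hypothetical
and not stated in the description.

```python
from itertools import combinations_with_replacement

CAR_FIELDS = ["num_wheels", "length", "shape", "num_loads", "load_shape"]
LOAD_TYPES = ["Rectangle", "Triangle", "Hexagon", "Circle"]


def transformed_columns():
    """Build the 33 column names of the transformed representation."""
    cols = ["Number_of_cars", "Number_of_different_loads"]
    # Attributes 3-22: five fields for each of cars 2 through 5.
    for car in range(2, 6):
        cols += [f"car{car}_{field}" for field in CAR_FIELDS]
    # Attributes 23-32: one Boolean per unordered pair of load types.
    cols += [
        f"{a}_next_to_{b.lower()}"
        for a, b in combinations_with_replacement(LOAD_TYPES, 2)
    ]
    cols.append("class")  # attribute 33
    return cols


columns = transformed_columns()
print(len(columns))  # 33
```

The pair ordering produced by combinations_with_replacement matches the
order in which the ten Boolean attributes are listed above.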

The number of cars varies between 3 and 5. Therefore, attributes referring
to properties of cars that do not exist (such as the 5 attributes for
the "5th" car when the train has fewer than 5 cars) are assigned a value
of "-".
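When reading the transformed file, the "-" placeholder has to be treated as
a missing value and not as a category. A minimal sketch, assuming that the
fields of a record are separated by whitespace (the description does not
state the delimiter); parse_record and the sample fragment are hypothetical:

```python
def parse_record(line):
    """Split one record and convert fields: "-" -> None, digits -> int."""
    values = []
    for token in line.split():
        if token == "-":
            values.append(None)        # attribute of a car that does not exist
        elif token.isdigit():
            values.append(int(token))  # counts and 0/1 Boolean attributes
        else:
            values.append(token)       # nominal values such as "short"
    return values


# Hypothetical fragment of a record, not copied from the data file.
print(parse_record("3 1 2 short openrect 1 trianglod - - - - -"))
# [3, 1, 2, 'short', 'openrect', 1, 'trianglod', None, None, None, None, None]
```

Keeping missing values as None lets later code distinguish "no such car"
from a real attribute value.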

7. Distribution of classes:
   - There are 5 east-bound trains and 5 west-bound trains
     (i.e., 50% east, 50% west)