mirror of
https://github.com/Doctorado-ML/Stree_datasets.git
synced 2025-08-18 17:06:02 +00:00
Initial commit
133
data/tanveer/trains/trains.names
Executable file
@@ -0,0 +1,133 @@
1. Title: INDUCE Trains Data set

2. Sources:
- Donor: GMU, Center for AI, Software Librarian,
  Eric E. Bloedorn (bloedorn@aic.gmu.edu)
- Original owners: Ryszard S. Michalski (michalski@aic.gmu.edu)
  and Robert Stepp
- Date received: 1 June 1994
- Date updated: 24 June 1994 (Thanks to Larry Holder (UT Arlington)
  for noticing a translation error)

3. Past usage:
- This set most closely resembles the data sets described in the following
  two publications:
  1. R.S. Michalski and J.B. Larson, "Inductive Inference of VL
     Decision Rules", in Proceedings of the Workshop in Pattern-Directed
     Inference Systems, Hawaii, May 1977. Also published in SIGART
     Newsletter, ACM, No. 63, pp. 38-44, June 1977.
  2. R.E. Stepp and R.S. Michalski, "Conceptual Clustering: Inventing
     Goal-Oriented Classifications of Structured Objects", in
     R.S. Michalski, J.G. Carbonell, and T.M. Mitchell (Eds.), "Machine
     Learning: An Artificial Intelligence Approach, Volume II". Los
     Altos, CA: Morgan Kaufmann.

  Both of these papers describe a set of 10 trains, 5 east-bound and 5
  west-bound. Both refer to the same 10 trains, as shown by the figures in
  these publications. The differences are:
  1) This dataset has 10 attributes and no wheel color or load color
     attributes.
  2) Reference 2 (Stepp, Michalski) does not completely list the
     attributes used, but does mention wheel color, an attribute not
     present in this dataset.
  3) Reference 1 (Michalski, Larson) mentions 12 attributes, but only 6
     are explicitly described. These 6 are included in both the dataset
     below and the Stepp and Michalski set.

Results:
[1] Michalski and Larson found the following decision rules:
    (1) There exist car1, car2, lod1 and lod2 such that
        [infront(car1, car2)][lcont(car1, lod1)][lcont(car2, lod2)]
        [load-shape(lod1)=triangle][load-shape(lod2)=polygon] => [dir=east]
    (2) There exists a car1 such that
        [ln(car1)=short][car-shape(car1)=closed-top] => [dir=east]
    (3) [ncar=3] v There exists car1 such that
        [car-shape(car1)=jagged-top] => [dir=west]
    (4) There exists car1 such that
        [#cars(ln=long)=2][cshape(car1)=open,trapezoid,u-shaped] v
        [location(car1)=2][cshape(car1)=closed, rectangle] => [dir=west]
    (The first selector in rule 4 uses a meta descriptor generated by
    the program that counts the number of long cars in a train.)
[2] The goal of the cluster research is to develop a general method
    for clustering structured objects that can generate conjunctive
    descriptions that occur in human classifications or invent new
    concepts that have similar appeal. CLUSTER/S was able to find the
    following cognitively appealing clusters:
    1) a) "There are two different car shapes in the train"
       b) "There are three or more different car shapes in the train"
    2) a) "Wheels on all cars have the same color"
       b) "Wheels on all cars do not have the same color"
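As an illustration only, rule (2) above can be read as a simple existence test over the cars of a train. The sketch below is not the original INDUCE program; the `Car` class, the field names, and the two example trains are hypothetical, and the set of closed-top shapes is copied from the "Hierarchy of values" list in section 4 of this file.

```python
from dataclasses import dataclass

# Shapes that count as closed-top, copied from the "Hierarchy of values"
# list in this file.
CLOSED_TOP = {"hexagon", "ellipse", "closedrect", "jaggedtop", "slopetop", "engine"}


@dataclass
class Car:
    # Hypothetical container for the two car attributes rule (2) needs.
    length: str  # "long" or "short"
    shape: str   # a cshape value such as "closedrect"


def rule2_predicts_east(cars):
    """Rule (2): there exists a car that is short and has a closed top."""
    return any(c.length == "short" and c.shape in CLOSED_TOP for c in cars)


# Hypothetical trains, invented for illustration (not rows from the dataset).
train_a = [Car("long", "engine"), Car("short", "closedrect")]
train_b = [Car("long", "openrect"), Car("short", "ushaped")]

print(rule2_predicts_east(train_a))
print(rule2_predicts_east(train_b))
```

The rule only says when a train is east-bound; a train for which the test fails is simply not covered by rule (2).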

4. Relevant information:
- Additional "background" knowledge is supplied that provides a partial
  ordering on some of the attribute values.
- We are providing this dataset both in its original form and in a form
  similar to the more typical propositional datasets in our repository.
  Since the trains dataset records relations between attributes, this
  transformation was somewhat challenging. However, it may shed some
  light on this problem for people who are more familiar with the simple
  one-instance-per-line dataset format.
- Hierarchy of values:
    if (cshape is one of {openrect,opentrap,ushaped,dblopnrect})
      then cshape is opentop
    if (cshape is one of {hexagon,ellipse,closedrect,jaggedtop,slopetop,
                          engine})
      then cshape is closedtop
- Prediction task: Determine concise decision rules distinguishing
  trains traveling east from those traveling west.
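The value hierarchy above amounts to a lookup from a specific car shape to its more general class. A minimal sketch, assuming the shape names are spelled exactly as in the two lists above; the function name `generalize_cshape` is hypothetical and not part of the dataset:

```python
# Shape groups copied from the "Hierarchy of values" entry.
OPEN_TOP = frozenset({"openrect", "opentrap", "ushaped", "dblopnrect"})
CLOSED_TOP = frozenset(
    {"hexagon", "ellipse", "closedrect", "jaggedtop", "slopetop", "engine"}
)


def generalize_cshape(cshape):
    """Map a specific car shape to "opentop" or "closedtop"."""
    if cshape in OPEN_TOP:
        return "opentop"
    if cshape in CLOSED_TOP:
        return "closedtop"
    if cshape in ("opentop", "closedtop"):
        return cshape  # already a general value
    raise ValueError(f"unknown cshape: {cshape!r}")


print(generalize_cshape("ushaped"))
print(generalize_cshape("engine"))
```

Raising on unknown values, rather than returning a default, is a design choice of this sketch so that spelling differences in shape names surface early.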

5. Number of instances: 10

6. Number of attributes:
- 10, not including the class attribute
   1. ccont(train idx1, car idx2): car idx2 is contained in train idx1
   2. ncar(train idx): # of cars in train idx (int)
   3. infront(car idx1, car idx2): relative positions of cars in train
   4. loc(car idx): absolute position of car in train (int)
   5. nwhl(car idx): # of wheels of car idx (int)
   6. ln(car idx): length of car idx (long, short)
   7. cshape(car idx): shape of car (engine, dblopenrect,
                       closedrect, openrect, opentrap, ushaped,
                       hexagon, ellipse, jaggedtop, slopetop,
                       opentop, closedtop)
   8. npl(car idx): number of loads in car idx
   9. lcont(car idx, load idx): description of which cars hold which loads
  10. lhshape(load idx): description of load shape (trianglod,
                         rectanglod, circlelod, hexagonlod)
  Class: direction (east, west)
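To show how the relational attributes relate to one another, the sketch below derives absolute positions (as in loc) and a car count (as in ncar) from infront facts for a single train. It rests on assumptions this file does not state: that infront(a, b) means car a is directly ahead of car b, that positions start at 1, and that the facts for one train form a single chain. The function name and the car labels are hypothetical.

```python
def positions_from_infront(infront_pairs):
    """Return {car: 1-based position} from (ahead, behind) pairs of one train.

    Assumes each pair means "ahead is directly in front of behind" and that
    the pairs form one unbroken chain.
    """
    behind_of = dict(infront_pairs)
    followers = set(behind_of.values())
    heads = [car for car in behind_of if car not in followers]
    if len(heads) != 1:
        raise ValueError("expected exactly one front car")

    loc = {}
    car = heads[0]
    position = 1
    while car is not None:
        if car in loc:
            raise ValueError("infront facts contain a cycle")
        loc[car] = position
        position += 1
        car = behind_of.get(car)  # None once the last car is reached
    return loc


# Hypothetical facts for one four-car train, deliberately out of order.
pairs = [("car2", "car3"), ("car1", "car2"), ("car3", "car4")]
loc = positions_from_infront(pairs)
ncar = len(loc)
print(loc, ncar)
```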

The following format was used for the "transformed" dataset representation
as found in trains.transformed.data (one instance per line):

Attributes: 33
1. Number_of_cars (integer in [3-5])
2. Number_of_different_loads (integer in [1-4])
3-22: 5 attributes for each of cars 2 through 5 (20 attributes total):
  - num_wheels (integer in [2-3])
  - length (short or long)
  - shape (closedrect, dblopnrect, ellipse, engine, hexagon,
    jaggedtop, openrect, opentrap, slopetop, ushaped)
  - num_loads (integer in [0-3])
  - load_shape (circlelod, hexagonlod, rectanglod, trianglod)
23-32: 10 Boolean attributes describing whether 2 types of loads are on
  adjacent cars of the train:
  - Rectangle_next_to_rectangle (0 if false, 1 if true)
  - Rectangle_next_to_triangle (0 if false, 1 if true)
  - Rectangle_next_to_hexagon (0 if false, 1 if true)
  - Rectangle_next_to_circle (0 if false, 1 if true)
  - Triangle_next_to_triangle (0 if false, 1 if true)
  - Triangle_next_to_hexagon (0 if false, 1 if true)
  - Triangle_next_to_circle (0 if false, 1 if true)
  - Hexagon_next_to_hexagon (0 if false, 1 if true)
  - Hexagon_next_to_circle (0 if false, 1 if true)
  - Circle_next_to_circle (0 if false, 1 if true)
33. Class attribute (east or west)

The number of cars varies between 3 and 5. Therefore, attributes referring
to properties of cars that do not exist (such as the 5 attributes for
the "5th" car when the train has fewer than 5 cars) are assigned a value
of "-".
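The 33-column layout described above can be turned into a small reader. This is a sketch under assumptions: it treats each line of trains.transformed.data as whitespace-separated (the delimiter is not stated in this file), the per-car column names such as `car2_shape` are invented here, and the sample line is made up for illustration rather than taken from the data.

```python
CAR_FIELDS = ["num_wheels", "length", "shape", "num_loads", "load_shape"]
LOAD_TYPES = ["Rectangle", "Triangle", "Hexagon", "Circle"]


def column_names():
    """Build the 33 column names in the order listed in this file."""
    names = ["Number_of_cars", "Number_of_different_loads"]
    for car in range(2, 6):  # cars 2 through 5
        names += [f"car{car}_{field}" for field in CAR_FIELDS]
    for i, first in enumerate(LOAD_TYPES):
        for second in LOAD_TYPES[i:]:
            names.append(f"{first}_next_to_{second.lower()}")
    names.append("class")
    return names


def parse_line(line):
    """Parse one instance; "-" becomes None, digit strings become int."""
    tokens = line.split()  # assumption: whitespace-separated values
    names = column_names()
    if len(tokens) != len(names):
        raise ValueError(f"expected {len(names)} values, got {len(tokens)}")
    record = {}
    for name, token in zip(names, tokens):
        if token == "-":
            record[name] = None
        elif token.isdigit():
            record[name] = int(token)
        else:
            record[name] = token
    return record


# Hypothetical three-car train: cars 4 and 5 do not exist.
sample = " ".join(
    ["3", "2"]
    + ["2", "short", "closedrect", "1", "circlelod"]  # car 2
    + ["2", "long", "openrect", "1", "rectanglod"]    # car 3
    + ["-"] * 10                                      # cars 4 and 5 absent
    + ["0", "0", "0", "1", "0", "0", "0", "0", "0", "0"]
    + ["east"]
)
record = parse_line(sample)
print(record["Number_of_cars"], record["car5_shape"], record["class"])
```

Mapping "-" to None keeps missing cars distinguishable from a real value such as a load count of 0.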

7. Distribution of classes:
- There are 5 east-bound trains and 5 west-bound trains
  (i.e., 50% east, 50% west)