mirror of
https://github.com/Doctorado-ML/bayesclass.git
synced 2025-08-15 15:45:54 +00:00
278 lines
8.1 KiB
Plaintext
Executable File
278 lines
8.1 KiB
Plaintext
Executable File
% 1. Title: Hayes-Roth & Hayes-Roth (1977) Database
|
|
%
|
|
% 2. Source Information:
|
|
% (a) Creators: Barbara and Frederick Hayes-Roth
|
|
% (b) Donor: David W. Aha (aha@ics.uci.edu) (714) 856-8779
|
|
% (c) Date: March, 1989
|
|
%
|
|
% 3. Past Usage:
|
|
% 1. Hayes-Roth, B., & Hayes-Roth, F. (1977). Concept learning and the
|
|
% recognition and classification of exemplars. Journal of Verbal Learning
|
|
% and Verbal Behavior, 16, 321-338.
|
|
% -- Results:
|
|
% -- Human subjects classification and recognition performance:
|
|
% 1. decreases with distance from the prototype,
|
|
% 2. is better on unseen prototypes than old instances, and
|
|
% 3. improves with presentation frequency during learning.
|
|
% 2. Anderson, J.R., & Kline, P.J. (1979). A learning system and its
|
|
% psychological implications. In Proceedings of the Sixth International
|
|
% Joint Conference on Artificial Intelligence (pp. 16-21). Tokyo, Japan:
|
|
% Morgan Kaufmann.
|
|
% -- Partitioned the results into 4 classes:
|
|
% 1. prototypes
|
|
% 2. near-prototypes with high presentation frequency during learning
|
|
% 3. near-prototypes with low presentation frequency during learning
|
|
% 4. instances that are far from protoypes
|
|
% -- Described evidence that ACT's classification confidence and
|
|
% recognition behaviors closely simulated human subjects' behaviors.
|
|
% 3. Aha, D.W. (1989). Incremental learning of independent, overlapping, and
|
|
% graded concept descriptions with an instance-based process framework.
|
|
% Manuscript submitted for publication.
|
|
% -- Used same partition as Anderson & Kline
|
|
% -- Described evidence that Bloom's classification confidence behavior
|
|
% is similar to the human subjects' behavior. Bloom fitted the data
|
|
% more closely than did ACT.
|
|
%
|
|
% 4. Relevant Information:
|
|
% This database contains 5 numeric-valued attributes. Only a subset of
|
|
% 3 are used during testing (the latter 3). Furthermore, only 2 of the
|
|
% 3 concepts are "used" during testing (i.e., those with the prototypes
|
|
% 000 and 111). I've mapped all values to their zero-indexing equivalents.
|
|
%
|
|
% Some instances could be placed in either category 0 or 1. I've followed
|
|
% the authors' suggestion, placing them in each category with equal
|
|
% probability.
|
|
%
|
|
% I've replaced the actual values of the attributes (i.e., hobby has values
|
|
% chess, sports and stamps) with numeric values. I think this is how
|
|
% the authors' did this when testing the categorization models described
|
|
% in the paper. I find this unfair. While the subjects were able to bring
|
|
% background knowledge to bear on the attribute values and their
|
|
% relationships, the algorithms were provided with no such knowledge. I'm
|
|
% uncertain whether the 2 distractor attributes (name and hobby) are
|
|
% presented to the authors' algorithms during testing. However, it is clear
|
|
% that only the age, educational status, and marital status attributes are
|
|
% given during the human subjects' transfer tests.
|
|
%
|
|
% 5. Number of Instances: 132 training instances, 28 test instances
|
|
%
|
|
% 6. Number of Attributes: 5 plus the class membership attribute. 3 concepts.
|
|
%
|
|
% 7. Attribute Information:
|
|
% -- 1. name: distinct for each instance and represented numerically
|
|
% -- 2. hobby: nominal values ranging between 1 and 3
|
|
% -- 3. age: nominal values ranging between 1 and 4
|
|
% -- 4. educational level: nominal values ranging between 1 and 4
|
|
% -- 5. marital status: nominal values ranging between 1 and 4
|
|
% -- 6. class: nominal value between 1 and 3
|
|
%
|
|
% 9. Missing Attribute Values: none
|
|
%
|
|
% 10. Class Distribution: see below
|
|
%
|
|
% 11. Detailed description of the experiment:
|
|
% 1. 3 categories (1, 2, and neither -- which I call 3)
|
|
% -- some of the instances could be classified in either class 1 or 2, and
|
|
% they have been evenly distributed between the two classes
|
|
% 2. 5 Attributes
|
|
% -- A. name (a randomly-generated number between 1 and 132)
|
|
% -- B. hobby (a randomly-generated number between 1 and 3)
|
|
% -- C. age (a number between 1 and 4)
|
|
% -- D. education level (a number between 1 and 4)
|
|
% -- E. marital status (a number between 1 and 4)
|
|
% 3. Classification:
|
|
% -- only attributes C-E are diagnostic; values for A and B are ignored
|
|
% -- Class Neither: if a 4 occurs for any attribute C-E
|
|
% -- Class 1: Otherwise, if (# of 1's)>(# of 2's) for attributes C-E
|
|
% -- Class 2: Otherwise, if (# of 2's)>(# of 1's) for attributes C-E
|
|
% -- Either 1 or 2: Otherwise, if (# of 2's)=(# of 1's) for attributes C-E
|
|
% 4. Prototypes:
|
|
% -- Class 1: 111
|
|
% -- Class 2: 222
|
|
% -- Class Either: 333
|
|
% -- Class Neither: 444
|
|
% 5. Number of training instances: 132
|
|
% -- Each instance presented 0, 1, or 10 times
|
|
% -- None of the prototypes seen during training
|
|
% -- 3 instances from each of categories 1, 2, and either are repeated
|
|
% 10 times each
|
|
% -- 3 additional instances from the Either category are shown during
|
|
% learning
|
|
% 5. Number of test instances: 28
|
|
% -- All 9 class 1
|
|
% -- All 9 class 2
|
|
% -- All 6 class Either
|
|
% -- All 4 prototypes
|
|
% --------------------
|
|
% -- 28 total
|
|
%
|
|
% Observations of interest:
|
|
% 1. Relative classification confidence of
|
|
% -- prototypes for classes 1 and 2 (2 instances)
|
|
% (Anderson calls these Class 1 instances)
|
|
% -- instances of class 1 with frequency 10 during training and
|
|
% instances of class 2 with frequency 10 during training that
|
|
% are 1 value away from their respective prototypes (6 instances)
|
|
% (Anderson calls these Class 2 instances)
|
|
% -- instances of class 1 with frequency 1 during training and
|
|
% instances of class 2 with frequency 1 during training that
|
|
% are 1 value away from their respective prototypes (6 instances)
|
|
% (Anderson calls these Class 3 instances)
|
|
% -- instances of class 1 with frequency 1 during training and
|
|
% instances of class 2 with frequency 1 during training that
|
|
% are 2 values away from their respective prototypes (6 instances)
|
|
% (Anderson calls these Class 4 instances)
|
|
% 2. Relative classification recognition of them also
|
|
%
|
|
% Some Expected results:
|
|
% Both frequency and distance from prototype will effect the classification
|
|
% accuracy of instances. Greater the frequency, higher the classification
|
|
% confidence. Closer to prototype, higher the classification confidence.
|
|
%
|
|
% Information about the dataset
|
|
% CLASSTYPE: nominal
|
|
% CLASSINDEX: last
|
|
%
|
|
|
|
@relation hayes-roth
|
|
|
|
@attribute hobby INTEGER
|
|
@attribute age INTEGER
|
|
@attribute educational_level INTEGER
|
|
@attribute marital_status INTEGER
|
|
@attribute class {1,2,3,4}
|
|
|
|
@data
|
|
2,1,1,2,1
|
|
2,1,3,2,2
|
|
3,1,4,1,3
|
|
2,4,2,2,3
|
|
1,1,3,4,3
|
|
1,1,3,2,2
|
|
3,1,3,2,2
|
|
3,4,2,4,3
|
|
2,2,1,1,1
|
|
3,2,1,1,1
|
|
1,2,1,1,1
|
|
2,2,3,4,3
|
|
1,1,2,1,1
|
|
2,1,2,2,2
|
|
2,4,1,4,3
|
|
1,1,3,3,1
|
|
3,2,1,2,2
|
|
1,2,1,1,1
|
|
3,3,2,1,1
|
|
3,1,3,2,1
|
|
1,2,2,1,2
|
|
3,2,1,3,1
|
|
2,1,2,1,1
|
|
3,2,1,3,1
|
|
2,3,2,1,1
|
|
3,2,2,1,2
|
|
3,2,1,3,2
|
|
2,1,2,2,2
|
|
1,1,3,2,1
|
|
3,2,1,1,1
|
|
1,4,1,1,3
|
|
2,2,1,3,1
|
|
1,2,1,3,2
|
|
1,1,1,2,1
|
|
2,4,3,1,3
|
|
3,1,2,2,2
|
|
1,1,2,2,2
|
|
3,2,2,1,2
|
|
1,2,1,2,2
|
|
3,4,3,2,3
|
|
2,2,2,1,2
|
|
2,2,1,2,2
|
|
3,2,1,3,2
|
|
3,2,1,1,1
|
|
3,1,2,1,1
|
|
1,2,1,3,2
|
|
2,1,1,2,1
|
|
1,1,1,2,1
|
|
1,2,2,3,2
|
|
3,3,1,1,1
|
|
3,3,3,1,1
|
|
3,2,1,2,2
|
|
3,2,1,2,2
|
|
3,1,2,1,1
|
|
1,1,1,2,1
|
|
2,1,3,2,1
|
|
2,2,2,1,2
|
|
2,1,2,1,1
|
|
2,2,1,3,1
|
|
2,1,2,2,2
|
|
1,2,4,2,3
|
|
2,2,1,2,2
|
|
1,1,2,4,3
|
|
1,3,2,1,1
|
|
2,4,4,2,3
|
|
2,3,2,1,1
|
|
3,1,2,2,2
|
|
1,1,2,2,2
|
|
1,3,2,4,3
|
|
1,1,2,2,2
|
|
3,1,4,2,3
|
|
2,1,3,2,2
|
|
1,1,3,2,2
|
|
3,1,3,2,1
|
|
1,2,4,4,3
|
|
1,4,2,1,3
|
|
2,1,2,1,1
|
|
3,4,1,2,3
|
|
2,2,1,1,1
|
|
1,1,2,1,1
|
|
2,2,4,3,3
|
|
3,1,2,2,2
|
|
1,1,3,2,1
|
|
1,2,1,3,1
|
|
1,4,4,1,3
|
|
3,3,3,2,2
|
|
2,2,1,3,2
|
|
3,3,2,1,2
|
|
1,1,1,3,1
|
|
2,2,1,2,2
|
|
2,2,2,1,2
|
|
2,3,2,3,2
|
|
1,3,2,1,2
|
|
2,2,1,2,2
|
|
1,1,1,2,1
|
|
3,2,2,1,2
|
|
3,2,1,1,1
|
|
1,1,2,1,1
|
|
3,1,4,4,3
|
|
3,3,2,1,2
|
|
2,3,2,1,2
|
|
2,1,3,1,1
|
|
1,2,1,2,2
|
|
3,1,1,2,1
|
|
2,2,4,1,3
|
|
1,2,2,1,2
|
|
2,3,2,1,2
|
|
2,2,1,4,3
|
|
1,4,2,3,3
|
|
2,2,1,1,1
|
|
1,2,1,1,1
|
|
2,2,3,2,2
|
|
1,3,2,1,1
|
|
3,1,2,1,1
|
|
3,1,1,2,1
|
|
3,3,1,4,3
|
|
2,3,4,1,3
|
|
1,2,3,3,2
|
|
3,3,2,2,2
|
|
3,3,4,2,3
|
|
1,2,2,1,2
|
|
2,1,1,4,3
|
|
3,1,2,2,2
|
|
3,2,2,4,3
|
|
2,3,1,3,1
|
|
2,1,1,2,1
|
|
3,4,1,3,3
|
|
1,1,4,3,3
|
|
2,1,2,1,1
|
|
1,2,1,2,2
|
|
1,2,2,1,2
|
|
3,1,1,2,1
|