bayesclass/datasets/hayes-roth_test.arff

% 1. Title: Hayes-Roth & Hayes-Roth (1977) Database
%
% 2. Source Information:
%    (a) Creators: Barbara and Frederick Hayes-Roth
%    (b) Donor: David W. Aha (aha@ics.uci.edu) (714) 856-8779
%    (c) Date: March, 1989
%
% 3. Past Usage:
%     1. Hayes-Roth, B., & Hayes-Roth, F. (1977).  Concept learning and the
%        recognition and classification of exemplars.  Journal of Verbal Learning
%        and Verbal Behavior, 16, 321-338.
%        -- Results:
%           -- Human subjects' classification and recognition performance:
%              1. decreases with distance from the prototype,
%              2. is better on unseen prototypes than old instances, and
%              3. improves with presentation frequency during learning.
%     2. Anderson, J.R., & Kline, P.J. (1979).  A learning system and its
%        psychological implications.  In Proceedings of the Sixth International
%        Joint Conference on Artificial Intelligence (pp. 16-21).  Tokyo, Japan:
%        Morgan Kaufmann.
%        -- Partitioned the results into 4 classes:
%           1. prototypes
%           2. near-prototypes with high presentation frequency during learning
%           3. near-prototypes with low presentation frequency during learning
%           4. instances that are far from prototypes
%        -- Described evidence that ACT's classification confidence and
%           recognition behaviors closely simulated human subjects' behaviors.
%     3. Aha, D.W. (1989).  Incremental learning of independent, overlapping, and
%        graded concept descriptions with an instance-based process framework.
%        Manuscript submitted for publication.
%        -- Used same partition as Anderson & Kline
%        -- Described evidence that Bloom's classification confidence behavior
%           is similar to the human subjects' behavior.  Bloom fitted the data
%           more closely than did ACT.
%
% 4. Relevant Information:
%      This database contains 5 numeric-valued attributes.  Only a subset of
%      3 are used during testing (the latter 3).  Furthermore, only 2 of the
%      3 concepts are "used" during testing (i.e., those with the prototypes
%      000 and 111).  I've mapped all values to their zero-indexing equivalents.
%
%      Some instances could be placed in either category 0 or 1.  I've followed
%      the authors' suggestion, placing them in each category with equal
%      probability.
%
%      I've replaced the actual values of the attributes (i.e., hobby has values
%      chess, sports and stamps) with numeric values.  I think this is how
%      the authors did this when testing the categorization models described
%      in the paper.  I find this unfair.  While the subjects were able to bring
%      background knowledge to bear on the attribute values and their
%      relationships, the algorithms were provided with no such knowledge.  I'm
%      uncertain whether the 2 distractor attributes (name and hobby) are
%      presented to the authors' algorithms during testing.  However, it is clear
%      that only the age, educational status, and marital status attributes are
%      given during the human subjects' transfer tests.
%
% 5. Number of Instances: 132 training instances, 28 test instances
%
% 6. Number of Attributes: 5 plus the class membership attribute.  3 concepts.
%
% 7. Attribute Information:
%       -- 1. name: distinct for each instance and represented numerically
%       -- 2. hobby: nominal values ranging between 1 and 3
%       -- 3. age: nominal values ranging between 1 and 4
%       -- 4. educational level: nominal values ranging between 1 and 4
%       -- 5. marital status: nominal values ranging between 1 and 4
%       -- 6. class: nominal value between 1 and 3
%
% 9. Missing Attribute Values: none
%
% 10. Class Distribution: see below
%
% 11. Detailed description of the experiment:
%   1. 3 categories (1, 2, and neither -- which I call 3)
%      -- some of the instances could be classified in either class 1 or 2, and
%         they have been evenly distributed between the two classes
%   2. 5 Attributes
%      -- A. name (a randomly-generated number between 1 and 132)
%      -- B. hobby (a randomly-generated number between 1 and 3)
%      -- C. age (a number between 1 and 4)
%      -- D. education level (a number between 1 and 4)
%      -- E. marital status (a number between 1 and 4)
%   3. Classification:
%      -- only attributes C-E are diagnostic; values for A and B are ignored
%      -- Class Neither: if a 4 occurs for any attribute C-E
%      -- Class 1: Otherwise, if (# of 1's)>(# of 2's) for attributes C-E
%      -- Class 2: Otherwise, if (# of 2's)>(# of 1's) for attributes C-E
%      -- Either 1 or 2: Otherwise, if (# of 2's)=(# of 1's) for attributes C-E
%   4. Prototypes:
%      -- Class 1: 111
%      -- Class 2: 222
%      -- Class Either: 333
%      -- Class Neither: 444
%   5. Number of training instances: 132
%      -- Each instance presented 0, 1, or 10 times
%      -- None of the prototypes seen during training
%      -- 3 instances from each of categories 1, 2, and either are repeated
%         10 times each
%      -- 3 additional instances from the Either category are shown during
%         learning
%   6. Number of test instances: 28
%      -- All 9 class 1
%      -- All 9 class 2
%      -- All 6 class Either
%      -- All 4 prototypes
%      --------------------
%      --    28 total
%
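% A minimal sketch of the classification rule in item 3 above, written as
% Python and kept behind the comment marker so this file still parses.  The
% function name and the string labels "either" and "neither" are assumptions
% made for illustration; in the data rows below, "either" instances carry
% class 1 or 2 and "neither" carries class 3.
%
%     def classify(age, education, marital):
%         values = (age, education, marital)   # attributes C-E only
%         if 4 in values:                      # any 4 means Neither
%             return "neither"
%         ones, twos = values.count(1), values.count(2)
%         if ones > twos:
%             return "1"
%         if twos > ones:
%             return "2"
%         return "either"                      # tie between 1's and 2's
%
%     classify(1, 1, 2)   # "1"
%     classify(3, 2, 2)   # "2"
%     classify(1, 3, 2)   # "either"
%     classify(3, 3, 3)   # "either"
%     classify(4, 4, 4)   # "neither"
%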
% Observations of interest:
%   1. Relative classification confidence of
%      -- prototypes for classes 1 and 2 (2 instances)
%         (Anderson calls these Class 1 instances)
%      -- instances of class 1 with frequency 10 during training and
%         instances of class 2 with frequency 10 during training that
%         are 1 value away from their respective prototypes (6 instances)
%         (Anderson calls these Class 2 instances)
%      -- instances of class 1 with frequency 1 during training and
%         instances of class 2 with frequency 1 during training that
%         are 1 value away from their respective prototypes (6 instances)
%         (Anderson calls these Class 3 instances)
%      -- instances of class 1 with frequency 1 during training and
%         instances of class 2 with frequency 1 during training that
%         are 2 values away from their respective prototypes (6 instances)
%         (Anderson calls these Class 4 instances)
%   2. Relative recognition of these same instances
%
% Some Expected results:
%    Both frequency and distance from prototype will affect the classification
%    accuracy of instances.  The greater the frequency, the higher the
%    classification confidence.  The closer to the prototype, the higher the
%    classification confidence.
%
% Information about the dataset
% CLASSTYPE: nominal
% CLASSINDEX: last
%

@relation hayes-roth

@attribute hobby INTEGER
@attribute age INTEGER
@attribute educational_level INTEGER
@attribute marital_status INTEGER
@attribute class {1,2,3,4}

@data
1,1,1,2,1
1,1,2,1,1
1,2,1,1,1
1,1,1,3,1
1,1,3,1,1
1,3,1,1,1
1,1,3,3,1
1,3,1,3,1
1,3,3,1,1
1,2,2,1,2
1,2,1,2,2
1,1,2,2,2
1,2,2,3,2
1,2,3,2,2
1,3,2,2,2
1,2,3,3,2
1,3,2,3,2
1,3,3,2,2
1,1,3,2,1
1,3,2,1,2
1,2,1,3,1
1,2,3,1,2
1,1,2,3,1
1,3,1,2,2
1,1,1,1,1
1,2,2,2,2
1,3,3,3,1
1,4,4,4,3