1. Title: Hill-Valley Dataset

2. Source Information

a) Creators:

Lee Graham (lee@stellaralchemy.com)

Franz Oppacher (oppacher@scs.carleton.ca)
Carleton University, Department of Computer Science
Intelligent Systems Research Unit
1125 Colonel By Drive, Ottawa, Ontario, Canada, K1S5B6

c) Date of release: March 2008

3. Past Usage:

(a) Unpublished. The dataset was evaluated with various learning algorithms in the Waikato Environment for Knowledge Analysis (WEKA).

4. Relevant Information:

Each record represents 100 points on a two-dimensional graph. When the values are plotted in order (from 1 through 100) as Y co-ordinates, the points form either a Hill (a “bump” in the terrain) or a Valley (a “dip” in the terrain).

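The record layout described above can be sketched in Python as follows. This is an illustration only: the function names, the synthetic records, and the median-based guess are assumptions made for this sketch, not part of the dataset or its authors' method, and the guess should not be expected to work on the noisy files.

```python
from statistics import median


def record_to_points(values):
    """Pair each Y value with its 1-based position, giving (x, y) points."""
    return [(x, y) for x, y in enumerate(values, start=1)]


def guess_shape(values):
    """Naive guess (an assumption of this sketch, not the authors' method):
    a hill rises further above the median than it dips below it."""
    mid = median(values)
    rise = max(values) - mid
    dip = mid - min(values)
    return "hill" if rise > dip else "valley"


# Synthetic records for illustration only; they are not taken from the dataset.
flat = [10.0] * 100
bump = [10.0 + h for h in (1, 3, 6, 8, 9, 9, 8, 6, 3, 1)]
hill = flat[:45] + bump + flat[55:]      # 45 + 10 + 45 = 100 values
valley = [20.0 - y for y in hill]        # the same shape mirrored downwards

print(guess_shape(hill), guess_shape(valley))
```
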
There are six files, as follows:

(a) Hill_Valley_without_noise_Training.data
(b) Hill_Valley_without_noise_Testing.data

These first two datasets (without noise) are a training/testing set pair where the hills or valleys have a smooth transition.

(c) Hill_Valley_with_noise_Training.data
(d) Hill_Valley_with_noise_Testing.data

These next two datasets (with noise) are a training/testing set pair where the terrain is uneven, and the hill or valley is not as obvious when viewed closely.

(e) Hill_Valley_sample_arff.text

The sample ARFF file is useful for setting up experiments, but is not necessary.

(f) Hill_Valley_visual_examples.jpg

This graphic file shows two example instances from the data.

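The four data file names listed above share one naming pattern, so an experiment script could build them instead of hard-coding each one. A minimal sketch; the helper name data_file_name and the PAIRS mapping are hypothetical names chosen for this example.

```python
NOISE_LEVELS = ("without", "with")
SPLITS = ("Training", "Testing")


def data_file_name(noise, split):
    """Build one of the four data file names listed above."""
    if noise not in NOISE_LEVELS:
        raise ValueError(f"noise must be one of {NOISE_LEVELS}, got {noise!r}")
    if split not in SPLITS:
        raise ValueError(f"split must be one of {SPLITS}, got {split!r}")
    return f"Hill_Valley_{noise}_noise_{split}.data"


# Map each noise level to its (training, testing) pair of file names.
PAIRS = {
    noise: tuple(data_file_name(noise, split) for split in SPLITS)
    for noise in NOISE_LEVELS
}

for noise, (train, test) in PAIRS.items():
    print(noise, train, test)
```
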
5. Number of Instances: 606 in each training set and in each testing set

6. Number of Attributes: 100 predictive attributes, 1 goal attribute

7. Attribute Information:
1-100: Labeled “X##”. Floating point values (numeric)
101: Labeled “class”. Binary {0, 1} representing {valley, hill}

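The attribute layout above (100 numeric values followed by a binary class) could be parsed along these lines. This is a sketch under stated assumptions: it assumes the .data files are comma-separated and begin with a header row, neither of which is stated in this description, and load_records is a hypothetical helper name.

```python
import csv
import io

N_FEATURES = 100                       # predictive attributes per record
CLASS_NAMES = {0: "valley", 1: "hill"}


def load_records(text, has_header=True):
    """Parse delimited text into (features, labels).

    Assumes comma-separated fields and, by default, one header row;
    both are assumptions of this sketch.
    """
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    if has_header:
        rows = rows[1:]
    features, labels = [], []
    for row in rows:
        if len(row) != N_FEATURES + 1:
            raise ValueError(f"expected {N_FEATURES + 1} fields, got {len(row)}")
        label = int(row[N_FEATURES])
        if label not in CLASS_NAMES:
            raise ValueError(f"unexpected class value {label}")
        features.append([float(v) for v in row[:N_FEATURES]])
        labels.append(label)
    return features, labels
```

A file would be read with something like load_records(open(path).read()), where path points at one of the .data files.
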
8. Missing Attribute Values: None

There is no class noise. The “noisy” datasets are so named because their uneven points represent terrain more accurately, not because any class labels are wrong.

9. Class Distribution:

Hill_Valley_with_noise_Training.data (307 / 299)
Hill_Valley_with_noise_Testing.data (299 / 307)

Hill_Valley_without_noise_Training.data (305 / 301)
Hill_Valley_without_noise_Testing.data (295 / 311)
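The counts above can be checked against the stated 606 instances per file, and they also give the accuracy of always predicting the larger class. A minimal sketch; the counts are copied from the list above, the description does not say which class each number in a pair refers to, and majority_baseline is a hypothetical helper name.

```python
EXPECTED_TOTAL = 606  # instances per file, as stated in section 5

# Pairs copied from the class distribution above; the order of the two
# classes within each pair is not specified in this description.
DISTRIBUTIONS = {
    "Hill_Valley_with_noise_Training.data": (307, 299),
    "Hill_Valley_with_noise_Testing.data": (299, 307),
    "Hill_Valley_without_noise_Training.data": (305, 301),
    "Hill_Valley_without_noise_Testing.data": (295, 311),
}


def majority_baseline(counts):
    """Accuracy obtained by always predicting the larger class."""
    return max(counts) / sum(counts)


for name, counts in DISTRIBUTIONS.items():
    if sum(counts) != EXPECTED_TOTAL:
        raise ValueError(f"{name}: counts sum to {sum(counts)}")
    print(f"{name}: majority baseline {majority_baseline(counts):.4f}")
```

All four files are close to balanced, so this baseline stays near 0.5 in each case.
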