mirror of
https://github.com/Doctorado-ML/Stree_datasets.git
synced 2025-08-17 16:36:02 +00:00
42 lines
1.2 KiB
Plaintext
Executable File
42 lines
1.2 KiB
Plaintext
Executable File
Description of SHUTTLE Dataset (STATLOG VERSION)
|
|
|
|
|
|
THIS DATASET SHOULD BE TACKLED BY TRAIN/TEST.
|
|
|
|
|
|
NUMBER OF EXAMPLES
|
|
training set 43500
|
|
test set 14500
|
|
|
|
NUMBER OF ATTRIBUTES
|
|
9
|
|
|
|
The shuttle dataset contains 9 attributes all of which are numerical.
|
|
The first one being time. The last column is the class which has been
|
|
coded as follows :
|
|
1 Rad Flow
|
|
2 Fpv Close
|
|
3 Fpv Open
|
|
4 High
|
|
5 Bypass
|
|
6 Bpv Close
|
|
7 Bpv Open
|
|
|
|
Approximately 80% of the data belongs to class 1. Therefore the default
|
|
accuracy is about 80%. The aim here is to obtain an accuracy of
|
|
99 - 99.9%.
|
|
|
|
|
|
Validation set:
|
|
The examples in the original dataset were in time order, and this time order
|
|
could presumably be relevant in classification. However, this was not deemed
|
|
relevant for StatLog purposes, so the order of the examples
|
|
in the original dataset was randomised, and
|
|
a portion of the original dataset removed for validation purposes.
|
|
|
|
Acknowledgment:
|
|
Thanks to Jason Catlett of Basser Department of Computer Science,
|
|
University of Sydney, N.S.W., Australia for providing the shuttle dataset.
|
|
Thanks also to NASA for allowing us to use the shuttle datasets.
|
|
|