mirror of
https://github.com/Doctorado-ML/Stree_datasets.git
synced 2025-08-18 17:06:02 +00:00
Initial commit
41
data/tanveer/statlog-shuttle/shuttle.doc
Executable file
@@ -0,0 +1,41 @@
Description of SHUTTLE Dataset (STATLOG VERSION)

THIS DATASET SHOULD BE TACKLED BY TRAIN/TEST.

NUMBER OF EXAMPLES
    training set 43500
    test set     14500

NUMBER OF ATTRIBUTES
    9

The shuttle dataset contains 9 attributes, all of which are numerical;
the first one is time. The last column is the class, which has been
coded as follows:
    1    Rad Flow
    2    Fpv Close
    3    Fpv Open
    4    High
    5    Bypass
    6    Bpv Close
    7    Bpv Open

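As a rough illustration of the layout described above, the following Python sketch parses one row of nine numeric attributes followed by a class code and maps the code to its name. The whitespace-separated layout and the names CLASS_NAMES and parse_row are assumptions made for this example; the description does not specify the file format.

```python
# Class codes as listed in the description above.
CLASS_NAMES = {
    1: "Rad Flow",
    2: "Fpv Close",
    3: "Fpv Open",
    4: "High",
    5: "Bypass",
    6: "Bpv Close",
    7: "Bpv Open",
}

N_ATTRIBUTES = 9  # numerical attributes; the first one is time


def parse_row(line):
    """Split one row into (attributes, class_code).

    Assumes whitespace-separated numeric fields (an assumption, not
    stated in the description): 9 attributes, then the class code.
    """
    fields = line.split()
    if len(fields) != N_ATTRIBUTES + 1:
        raise ValueError(
            f"expected {N_ATTRIBUTES + 1} fields, got {len(fields)}"
        )
    attributes = [float(value) for value in fields[:N_ATTRIBUTES]]
    class_code = int(fields[N_ATTRIBUTES])
    if class_code not in CLASS_NAMES:
        raise ValueError(f"unknown class code {class_code}")
    return attributes, class_code


# Made-up row for illustration only (not taken from the dataset).
attrs, code = parse_row("10 0 80 0 30 0 50 50 0 1")
print(len(attrs), CLASS_NAMES[code])  # 9 Rad Flow
```

Rejecting rows with the wrong field count or an unknown class code is a design choice of this sketch, so that a mismatch with the assumed layout fails early.
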
Approximately 80% of the data belongs to class 1. Therefore the default
accuracy, obtained by always predicting class 1, is about 80%. The aim
here is to obtain an accuracy of 99 - 99.9%.

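The default accuracy mentioned above is the accuracy of a classifier that always predicts the most frequent class. A minimal Python sketch of that calculation follows; the function name default_accuracy and the toy labels are made up for illustration and are not taken from the dataset.

```python
from collections import Counter


def default_accuracy(labels):
    """Accuracy of always predicting the most frequent class."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    _, majority_count = counts.most_common(1)[0]
    return majority_count / len(labels)


# Toy labels (made up): 8 of 10 belong to class 1, mirroring the ~80% figure.
toy_labels = [1, 1, 1, 1, 1, 1, 1, 1, 4, 5]
print(default_accuracy(toy_labels))  # 0.8
```

Any classifier worth reporting on this dataset should therefore be compared against roughly 80%, not against chance over seven classes.
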
Validation set:
The examples in the original dataset were in time order, and this time order
could presumably be relevant in classification. However, this was not deemed
relevant for StatLog purposes, so the order of the examples in the original
dataset was randomised, and a portion of the original dataset was removed
for validation purposes.

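The randomise-then-remove step described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name shuffle_and_hold_out, the fixed seed, and the held-out size are hypothetical, and the description does not say how StatLog actually performed the shuffle or how large the removed portion was.

```python
import random


def shuffle_and_hold_out(examples, n_held_out, seed=0):
    """Randomise the order of examples, then set aside n_held_out of them.

    Returns (kept, held_out). The input sequence is not modified.
    """
    if not 0 <= n_held_out <= len(examples):
        raise ValueError("n_held_out must be between 0 and len(examples)")
    shuffled = list(examples)
    # A seeded generator keeps the sketch reproducible (an assumption;
    # the original procedure is not documented).
    random.Random(seed).shuffle(shuffled)
    held_out = shuffled[:n_held_out]
    kept = shuffled[n_held_out:]
    return kept, held_out


# Toy example: 20 time-ordered items, 5 of them held out.
time_ordered = list(range(20))
kept, held_out = shuffle_and_hold_out(time_ordered, n_held_out=5)
print(len(kept), len(held_out))  # 15 5
```

Because the shuffle discards the time order, any model trained on the result cannot exploit temporal structure, which matches the StatLog decision noted above.
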
Acknowledgment:
Thanks to Jason Catlett of Basser Department of Computer Science,
University of Sydney, N.S.W., Australia for providing the shuttle dataset.
Thanks also to NASA for allowing us to use the shuttle datasets.