mirror of
https://github.com/Doctorado-ML/Stree_datasets.git
synced 2025-08-18 00:46:03 +00:00
Initial commit
2
data/tanveer/pendigits/conxuntos.dat
Executable file
File diff suppressed because one or more lines are too long
8
data/tanveer/pendigits/conxuntos_kfold.dat
Executable file
File diff suppressed because one or more lines are too long
24
data/tanveer/pendigits/le_datos.m
Executable file
@@ -0,0 +1,24 @@
printf('reading problem %s ...\n', problema);

n_entradas= 16; n_clases= 10;
n_fich= 2; fich{1}= 'pendigits.tra'; n_patrons(1)= 7494; fich{2}= 'pendigits.tes'; n_patrons(2)= 3498;

n_max= max(n_patrons);
x = zeros(n_fich, n_max, n_entradas); cl= zeros(n_fich, n_max);

n_patrons_total = sum(n_patrons); n_iter=0;

for i_fich=1:n_fich
  f=fopen(fich{i_fich}, 'r');
  if -1==f
    error('fopen error opening %s\n', fich{i_fich});
  end
  for i=1:n_patrons(i_fich)
    fprintf(2,'%5.1f%%\r', 100*n_iter/n_patrons_total); n_iter = n_iter + 1;  % progress on stderr
    for j = 1:n_entradas
      x(i_fich,i,j) = fscanf(f,'%i',1);
    end
    cl(i_fich,i) = fscanf(f,'%i',1);  % read the class label
  end
  fclose(f);
end
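For readers who do not use Octave, the loop in le_datos.m can be approximated in Python. This is a minimal sketch, not part of the repository: the function name `parse_patterns` is hypothetical, and because the data files are not shown in this diff, the separator is an assumption (the sketch accepts commas, whitespace, or both). As in le_datos.m, each pattern is 16 integer inputs followed by one integer class.

```python
import re

N_INPUTS = 16  # n_entradas in le_datos.m


def parse_patterns(text, n_inputs=N_INPUTS):
    """Split raw text into (features, labels), one pattern per n_inputs + 1 integers."""
    # Assumption: values are separated by commas and/or whitespace.
    values = [int(tok) for tok in re.split(r"[,\s]+", text.strip()) if tok]
    width = n_inputs + 1  # inputs plus the trailing class code
    if len(values) % width != 0:
        raise ValueError("value count is not a multiple of n_inputs + 1")
    rows = [values[i:i + width] for i in range(0, len(values), width)]
    features = [row[:n_inputs] for row in rows]
    labels = [row[n_inputs] for row in rows]
    return features, labels


if __name__ == "__main__":
    # Made-up sample rows, only to show the expected shape of the input.
    sample = " ".join(str(v) for v in range(16)) + " 3\n" + ",".join(["5"] * 16) + ",9\n"
    X, y = parse_patterns(sample)
    print(len(X), "patterns; labels:", y)
```

To load a real file, pass `open("pendigits.tra").read()` to `parse_patterns`, provided the file follows the assumed layout.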
13
data/tanveer/pendigits/pendigits.cost
Executable file
@@ -0,0 +1,13 @@
% Rows Columns
10 10
% Matrix elements
0.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
1.0 0.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
1.0 1.0 0.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
1.0 1.0 1.0 0.0 1.0 1.0 1.0 1.0 1.0 1.0
1.0 1.0 1.0 1.0 0.0 1.0 1.0 1.0 1.0 1.0
1.0 1.0 1.0 1.0 1.0 0.0 1.0 1.0 1.0 1.0
1.0 1.0 1.0 1.0 1.0 1.0 0.0 1.0 1.0 1.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0 0.0 1.0 1.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 0.0 1.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 0.0
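The cost file has zeros on the diagonal and ones everywhere else, so the total cost of a set of predictions equals the number of misclassifications. The sketch below illustrates that reading in Python. The function names are hypothetical, and the row/column convention (rows are true classes, columns are predicted classes) is an assumption; the file itself does not say, and for this symmetric matrix the choice makes no difference.

```python
def parse_cost_matrix(text):
    """Parse the '% Rows Columns' / '% Matrix elements' layout into a list of rows."""
    lines = [ln.strip() for ln in text.splitlines()]
    data = [ln for ln in lines if ln and not ln.startswith("%")]  # drop comments
    n_rows, n_cols = (int(v) for v in data[0].split())
    matrix = [[float(v) for v in ln.split()] for ln in data[1:]]
    if len(matrix) != n_rows or any(len(row) != n_cols for row in matrix):
        raise ValueError("matrix shape does not match the declared size")
    return matrix


def total_cost(cost, true_labels, predicted_labels):
    """Sum cost[true][predicted] over all predictions (assumed convention)."""
    return sum(cost[t][p] for t, p in zip(true_labels, predicted_labels))


if __name__ == "__main__":
    # A 3-class miniature with the same 0/1 structure as pendigits.cost.
    demo = "% Rows Columns\n3 3\n% Matrix elements\n0.0 1.0 1.0\n1.0 0.0 1.0\n1.0 1.0 0.0\n"
    cost = parse_cost_matrix(demo)
    print(total_cost(cost, [0, 1, 2, 2], [0, 2, 2, 1]))  # two errors
```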
136
data/tanveer/pendigits/pendigits.names
Executable file
@@ -0,0 +1,136 @@
1. Title of Database: Pen-Based Recognition of Handwritten Digits

2. Source:
    E. Alpaydin, Fevzi. Alimoglu
    Department of Computer Engineering
    Bogazici University, 80815 Istanbul Turkey
    alpaydin@boun.edu.tr
    July 1998

3. Past Usage:
    F. Alimoglu (1996) Combining Multiple Classifiers for Pen-Based
    Handwritten Digit Recognition,
    MSc Thesis, Institute of Graduate Studies in Science and
    Engineering, Bogazici University.
    http://www.cmpe.boun.edu.tr/~alimoglu/alimoglu.ps.gz

    F. Alimoglu, E. Alpaydin, "Methods of Combining Multiple Classifiers
    Based on Different Representations for Pen-based Handwriting
    Recognition," Proceedings of the Fifth Turkish Artificial
    Intelligence and Artificial Neural Networks Symposium (TAINN 96),
    June 1996, Istanbul, Turkey.
    http://www.cmpe.boun.edu.tr/~alimoglu/tainn96.ps.gz


4. Relevant Information:

    We create a digit database by collecting 250 samples from each of
    44 writers. The samples written by 30 writers are used for training,
    cross-validation and writer dependent testing, and the digits
    written by the other 14 are used for writer independent testing. This
    database is also available in the UNIPEN format.

    We use a WACOM PL-100V pressure sensitive tablet with an integrated
    LCD display and a cordless stylus. The input and display areas are
    located in the same place. Attached to the serial port of an Intel
    486 based PC, it allows us to collect handwriting samples. The tablet
    sends $x$ and $y$ tablet coordinates and pressure level values of the
    pen at fixed time intervals (sampling rate) of 100 milliseconds.

    These writers are asked to write 250 digits in random order inside
    boxes of 500 by 500 tablet pixel resolution. Subjects are monitored
    only during the first entry screens. Each screen contains five boxes
    with the digits to be written displayed above. Subjects are told to
    write only inside these boxes. If they make a mistake or are unhappy
    with their writing, they are instructed to clear the content of a box
    by using an on-screen button. The first ten digits are ignored
    because most writers are not familiar with this type of input device,
    but subjects are not aware of this.

    In our study, we use only ($x, y$) coordinate information. The stylus
    pressure level values are ignored. First we apply normalization to
    make our representation invariant to translations and scale
    distortions. The raw data that we capture from the tablet consist of
    integer values between 0 and 500 (tablet input box resolution). The
    new coordinates are such that the coordinate which has the maximum
    range varies between 0 and 100. Usually $x$ stays in this range, since
    most characters are taller than they are wide.

    In order to train and test our classifiers, we need to represent
    digits as constant length feature vectors. A commonly used technique
    leading to good results is resampling the (x_t, y_t) points.
    Temporal resampling (points regularly spaced in time) or spatial
    resampling (points regularly spaced in arc length) can be used here.
    Raw point data are already regularly spaced in time but the distance
    between them is variable. Previous research showed that spatial
    resampling to obtain a constant number of regularly spaced points
    on the trajectory yields much better performance, because it provides
    a better alignment between points. Our resampling algorithm uses
    simple linear interpolation between pairs of points. The resampled
    digits are represented as a sequence of T points (x_t, y_t)_{t=1}^T,
    regularly spaced in arc length, as opposed to the input sequence,
    which is regularly spaced in time.

    So, the input vector size is 2*T, two times the number of points
    resampled. We considered spatial resampling to T=8,12,16 points in our
    experiments and found that T=8 gave the best trade-off between
    accuracy and complexity.


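The normalization and spatial resampling described in section 4 can be illustrated with a short Python sketch. It is not the authors' implementation: the function names are hypothetical, each digit is treated as a single continuous stroke (pen lifts are ignored), and the normalization shifts each coordinate to start at 0 before scaling both by the larger range, which is one reasonable reading of the description.

```python
import math


def normalize(points, target=100.0):
    """Shift to the origin and scale so the wider coordinate spans 0..target."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    min_x, min_y = min(xs), min(ys)
    span = max(max(xs) - min_x, max(ys) - min_y)
    if span == 0:
        raise ValueError("all points coincide; nothing to scale")
    scale = target / span  # same factor for x and y keeps the aspect ratio
    return [((x - min_x) * scale, (y - min_y) * scale) for x, y in points]


def resample_by_arc_length(points, n_points=8):
    """Return n_points equally spaced in arc length, by linear interpolation."""
    if len(points) < 2 or n_points < 2:
        raise ValueError("need at least two input points and two output points")
    cum = [0.0]  # cumulative arc length at each input point
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        cum.append(cum[-1] + math.hypot(x1 - x0, y1 - y0))
    total = cum[-1]
    out, seg = [], 0
    for k in range(n_points):
        target = total * k / (n_points - 1)
        # Advance to the segment that contains the target arc length.
        while seg < len(points) - 2 and cum[seg + 1] < target:
            seg += 1
        seg_len = cum[seg + 1] - cum[seg]
        t = 0.0 if seg_len == 0 else (target - cum[seg]) / seg_len
        (x0, y0), (x1, y1) = points[seg], points[seg + 1]
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out


def to_feature_vector(points):
    """Flatten T points into the 2*T vector (x_1, y_1, ..., x_T, y_T)."""
    return [coord for point in points for coord in point]


if __name__ == "__main__":
    stroke = [(100, 200), (150, 200), (150, 400)]  # made-up tablet coordinates
    features = to_feature_vector(resample_by_arc_length(normalize(stroke), n_points=8))
    print(len(features))  # 2*T values for T=8
```

With T=8 the flattened vector has 16 entries, matching the 16 input attributes listed for the dataset. The dataset stores integers, so a rounding step would still be needed; how the authors rounded is not stated in the source.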
5. Number of Instances
    pendigits.tra   Training   7494
    pendigits.tes   Testing    3498

    We used the first half of the training set for actual training,
    one-fourth for validation and one-fourth for writer-dependent
    testing. The test set was used for writer-independent testing
    and is the actual quality measure.

6. Number of Attributes
    16 input + 1 class attribute

7. For Each Attribute:
    All input attributes are integers in the range 0..100.
    The last attribute is the class code 0..9

8. Missing Attribute Values
    None

9. Class Distribution
    Class: No of examples in training set
    0: 780
    1: 779
    2: 780
    3: 719
    4: 780
    5: 720
    6: 720
    7: 778
    8: 719
    9: 719
    Class: No of examples in testing set
    0: 363
    1: 364
    2: 364
    3: 336
    4: 364
    5: 335
    6: 336
    7: 364
    8: 336
    9: 336

Accuracy on the testing set with k-nn
using Euclidean distance as the metric

    k =  1 : 97.74
    k =  2 : 97.37
    k =  3 : 97.80
    k =  4 : 97.66
    k =  5 : 97.60
    k =  6 : 97.57
    k =  7 : 97.54
    k =  8 : 97.54
    k =  9 : 97.46
    k = 10 : 97.48
    k = 11 : 97.34
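The k-nn figures above come with no implementation details beyond the Euclidean metric. As a rough guide to how such a baseline could be computed, here is a small pure-Python sketch. The function names are hypothetical, and the tie-breaking rule (among labels with equal votes, the one whose first vote comes from the nearest neighbour wins) is an assumption that may differ from whatever the dataset authors used, so the numbers in the table should not be expected to reproduce exactly.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training points closest to query (Euclidean)."""
    neighbours = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in neighbours[:k])
    # Assumption: most_common keeps first-seen order among equal counts,
    # so ties go to the label that appears earliest in distance order.
    return votes.most_common(1)[0][0]


def accuracy(train_x, train_y, test_x, test_y, k=3):
    """Percentage of test patterns whose predicted class matches the true class."""
    hits = sum(
        knn_predict(train_x, train_y, q, k) == t for q, t in zip(test_x, test_y)
    )
    return 100.0 * hits / len(test_y)


if __name__ == "__main__":
    # Tiny made-up 2-D example; the real data has 16 inputs per pattern.
    train_x = [[0, 0], [0, 1], [10, 10], [10, 11]]
    train_y = [0, 0, 1, 1]
    print(accuracy(train_x, train_y, [[0, 2], [9, 9]], [0, 1], k=3))
```

This brute-force version compares every test pattern against every training pattern, which is slow in pure Python for the full 7494 by 3498 comparison but keeps the logic easy to follow.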
3498
data/tanveer/pendigits/pendigits.tes
Executable file
File diff suppressed because it is too large
7494
data/tanveer/pendigits/pendigits.tra
Executable file
File diff suppressed because it is too large
10
data/tanveer/pendigits/pendigits.txt
Executable file
@@ -0,0 +1,10 @@
n_entradas= 16
n_clases= 10
n_arquivos= 2
fich1= pendigits_train_R.dat
n_patrons1= 7494
fich2= pendigits_test_R.dat
n_patrons2= 3498
n_patrons_entrena= 3747
n_patrons_valida= 3747
n_conxuntos= 1
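The `key= value` layout of pendigits.txt is simple to read programmatically. The sketch below is a hypothetical helper, not part of the repository. Reading `n_patrons_entrena` and `n_patrons_valida` as the training and validation pattern counts is an interpretation of the Galician key names; under that reading the two values add up to `n_patrons1`, which the example checks.

```python
def parse_config(text):
    """Parse 'key= value' lines; integer-looking values become int, others stay str."""
    config = {}
    for line in text.splitlines():
        if "=" not in line:
            continue  # skip blank or malformed lines
        key, value = line.split("=", 1)
        value = value.strip()
        config[key.strip()] = int(value) if value.isdigit() else value
    return config


if __name__ == "__main__":
    text = """n_entradas= 16
n_clases= 10
n_arquivos= 2
fich1= pendigits_train_R.dat
n_patrons1= 7494
fich2= pendigits_test_R.dat
n_patrons2= 3498
n_patrons_entrena= 3747
n_patrons_valida= 3747
n_conxuntos= 1
"""
    cfg = parse_config(text)
    # Under the assumed meaning, the two subsets partition the first file.
    print(cfg["n_patrons_entrena"] + cfg["n_patrons_valida"] == cfg["n_patrons1"])
```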
3517
data/tanveer/pendigits/pendigits_test.arff
Executable file
File diff suppressed because it is too large
3499
data/tanveer/pendigits/pendigits_test_R.dat
Executable file
File diff suppressed because it is too large
7513
data/tanveer/pendigits/pendigits_train.arff
Executable file
File diff suppressed because it is too large
7495
data/tanveer/pendigits/pendigits_train_R.dat
Executable file
File diff suppressed because it is too large