mirror of
https://github.com/Doctorado-ML/Stree_datasets.git
synced 2025-08-18 00:46:03 +00:00
Commit Inicial
This commit is contained in:
88
data/tanveer/ecoli/ecoli.names
Executable file
88
data/tanveer/ecoli/ecoli.names
Executable file
@@ -0,0 +1,88 @@
|
||||
1. Title: Protein Localization Sites
|
||||
|
||||
|
||||
2. Creator and Maintainer:
|
||||
Kenta Nakai
|
||||
Institue of Molecular and Cellular Biology
|
||||
Osaka, University
|
||||
1-3 Yamada-oka, Suita 565 Japan
|
||||
nakai@imcb.osaka-u.ac.jp
|
||||
http://www.imcb.osaka-u.ac.jp/nakai/psort.html
|
||||
Donor: Paul Horton (paulh@cs.berkeley.edu)
|
||||
Date: September, 1996
|
||||
See also: yeast database
|
||||
|
||||
3. Past Usage.
|
||||
Reference: "A Probablistic Classification System for Predicting the Cellular
|
||||
Localization Sites of Proteins", Paul Horton & Kenta Nakai,
|
||||
Intelligent Systems in Molecular Biology, 109-115.
|
||||
St. Louis, USA 1996.
|
||||
Results: 81% for E.coli with an ad hoc structured
|
||||
probability model. Also similar accuracy for Binary Decision Tree and
|
||||
Bayesian Classifier methods applied by the same authors in
|
||||
unpublished results.
|
||||
|
||||
Predicted Attribute: Localization site of protein. ( non-numeric ).
|
||||
|
||||
|
||||
4. The references below describe a predecessor to this dataset and its
|
||||
development. They also give results (not cross-validated) for classification
|
||||
by a rule-based expert system with that version of the dataset.
|
||||
|
||||
Reference: "Expert Sytem for Predicting Protein Localization Sites in
|
||||
Gram-Negative Bacteria", Kenta Nakai & Minoru Kanehisa,
|
||||
PROTEINS: Structure, Function, and Genetics 11:95-110, 1991.
|
||||
|
||||
Reference: "A Knowledge Base for Predicting Protein Localization Sites in
|
||||
Eukaryotic Cells", Kenta Nakai & Minoru Kanehisa,
|
||||
Genomics 14:897-911, 1992.
|
||||
|
||||
|
||||
5. Number of Instances: 336 for the E.coli dataset and
|
||||
|
||||
|
||||
6. Number of Attributes.
|
||||
for E.coli dataset: 8 ( 7 predictive, 1 name )
|
||||
|
||||
|
||||
7. Attribute Information.
|
||||
|
||||
1. Sequence Name: Accession number for the SWISS-PROT database
|
||||
2. mcg: McGeoch's method for signal sequence recognition.
|
||||
3. gvh: von Heijne's method for signal sequence recognition.
|
||||
4. lip: von Heijne's Signal Peptidase II consensus sequence score.
|
||||
Binary attribute.
|
||||
5. chg: Presence of charge on N-terminus of predicted lipoproteins.
|
||||
Binary attribute.
|
||||
6. aac: score of discriminant analysis of the amino acid content of
|
||||
outer membrane and periplasmic proteins.
|
||||
7. alm1: score of the ALOM membrane spanning region prediction program.
|
||||
8. alm2: score of ALOM program after excluding putative cleavable signal
|
||||
regions from the sequence.
|
||||
|
||||
|
||||
|
||||
8. Missing Attribute Values: None.
|
||||
|
||||
|
||||
9. Class Distribution. The class is the localization site. Please see Nakai &
|
||||
Kanehisa referenced above for more details.
|
||||
|
||||
cp (cytoplasm) 143
|
||||
im (inner membrane without signal sequence) 77
|
||||
pp (perisplasm) 52
|
||||
imU (inner membrane, uncleavable signal sequence) 35
|
||||
om (outer membrane) 20
|
||||
omL (outer membrane lipoprotein) 5
|
||||
imL (inner membrane lipoprotein) 2
|
||||
imS (inner membrane, cleavable signal sequence) 2
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
Reference in New Issue
Block a user