120 Commits

Author SHA1 Message Date
fea46834c8 Update bayesclass models 2022-11-24 00:20:29 +01:00
a94a33e028 Update actions 2022-11-23 22:33:22 +01:00
b05a62b2e8 Update requirements and github actions 2022-11-23 22:21:34 +01:00
2baaf753ef Add terminal support to debug github action 2022-11-23 12:58:00 +01:00
b01ee40df2 Update main.yml 2022-11-23 09:43:51 +01:00
ed308773ee Update main.yml 2022-11-23 09:34:43 +01:00
0782736338 Update tests be_init_project_tests 2022-11-23 01:31:01 +01:00
71a11110bd Update tests 2022-11-22 23:32:28 +01:00
3a2ec38671 Update be_list to new formats 2022-11-22 17:38:11 +01:00
f60d9365dd Refactor be_report and fix error in datasets 2022-11-22 16:47:03 +01:00
5d7ed6f1ed Fix be_list Results error 2022-11-22 16:26:24 +01:00
8aa76c27c3 Refactor Datasets 2022-11-22 16:26:04 +01:00
93f0db36fa Fix stratified default value from .env 2022-11-22 01:47:12 +01:00
4e0be95a00 Refactor be_list 2022-11-21 20:22:59 +01:00
e76366561c Add be_init_project to scripts 2022-11-21 00:07:29 +01:00
7e9bd7ae4a Begin refactor be_list arguments 2022-11-20 20:17:58 +01:00
3ade3f4022 Add incompatible hyperparams to be_main 2022-11-20 19:10:28 +01:00
1b8a424ad3 Add subparser to be_report & tests 2022-11-20 18:23:26 +01:00
146304f4b5 Refactor Arguments to be child of ArgumentParser 2022-11-19 21:25:50 +01:00
07172b91c5 Add overrides to args parse for dataset/title in be_main 2022-11-19 21:16:29 +01:00
Ricardo Montañana Gómez
68d9cb776e Merge pull request #7 from Doctorado-ML:add_excel_belist
Add excel output of reports of be_list
2022-11-18 23:37:17 +01:00
c8124be119 Update version info 2022-11-18 23:36:43 +01:00
58c52849d8 Add AODE to models 2022-11-18 23:33:41 +01:00
d68fb47688 Remove extra space in report header 2022-11-17 13:42:27 +01:00
38667d61f7 Refactor be_list 2022-11-17 12:09:02 +01:00
dfd4f8179b Complete tests adding excel to be_list 2022-11-17 12:00:30 +01:00
8a9342c97b Add space to time column in report 2022-11-17 09:41:17 +01:00
974227166c Add excel to be_list 2022-11-17 01:36:19 +01:00
feea9c542a Add KDB model 2022-11-15 22:06:04 +01:00
a53e957c00 fix stochastic error in discretization 2022-11-14 21:51:53 +01:00
a2db4f1f6d Fix lint error in test 2022-11-14 17:27:18 +01:00
5a3ae6f440 Update version info and tests 2022-11-14 00:54:18 +01:00
Ricardo Montañana Gómez
8d06a2c5f6 Merge pull request #6 from Doctorado-ML/language_version
Add Discretizer to Datasets
Add excel to report datasets
Add report datasets sheet to benchmark excel
2022-11-13 22:51:50 +01:00
9039a634cf Exclude macos-latest with python 3.11 (no torch) 2022-11-13 22:14:01 +01:00
5b5d385b4c Fix uppercase mistake in filename 2022-11-13 20:04:26 +01:00
6ebcc31c36 Add bayesclass to requirements 2022-11-13 18:34:54 +01:00
cd2d803ff5 Update requirements 2022-11-13 18:10:42 +01:00
6aec5b2a97 Add tests to excel in report datasets 2022-11-13 17:44:45 +01:00
f1b9dc1fef Add excel to report dataset 2022-11-13 14:46:41 +01:00
2e6f49de8e Add discretize key to .env.dist 2022-11-12 19:38:14 +01:00
2d61cd11c2 refactor Discretization in datasets 2022-11-12 19:37:46 +01:00
4b442a46f2 Add Discretizer to Datasets 2022-11-10 11:47:01 +01:00
feaf85d0b8 Add Dataset load return a pandas dataframe 2022-11-04 18:40:50 +01:00
c62b06f263 Update Readme 2022-11-01 22:30:42 +01:00
Ricardo Montañana Gómez
b9eaa534bc Merge pull request #5 from Doctorado-ML/language_version
Disable sonar quality gate in CI
2022-11-01 21:24:12 +01:00
0d87e670f7 Disable sonar quality gate in CI
Update base score for Arff STree
2022-11-01 16:53:22 +01:00
Ricardo Montañana Gómez
c77feff54b Merge pull request #4 from Doctorado-ML/language_version
Add Language and language version to reports
Add custom seeds to .env
2022-11-01 14:07:59 +01:00
1e83db7956 Fix lint errors and update version info 2022-11-01 13:22:53 +01:00
8cf823e843 Add custom seeds to .env 2022-11-01 12:24:50 +01:00
97718e6e82 Add Language and language version to reports 2022-11-01 02:07:24 +01:00
Ricardo Montañana Gómez
5532beb88a Merge pull request #3 from Doctorado-ML/discretiz
Add Arff data source for experiments
Add consistent comparative results to reports
2022-10-25 16:55:04 +02:00
db61911ca6 Fix CI error 2022-10-25 15:20:12 +02:00
b24a508d1c Add consistent comparative results to reports 2022-10-25 15:01:18 +02:00
29c4b4ceef Update E203 in main.yml
Create tests
2022-10-25 11:36:04 +02:00
2362f66c7a Add nan manage to arff datasets 2022-10-25 00:56:37 +02:00
8001c7f2eb Add a space to #Samples in every report 2022-10-24 22:43:46 +02:00
47bf6eeda6 Add a space to #Samples in dataset report 2022-10-24 21:30:56 +02:00
34b3bd94de Add Arff as source_data for datasets 2022-10-24 21:04:07 +02:00
7875e2e6ac Merge branch 'main' into discretiz 2022-10-24 19:06:52 +02:00
34b25756ea Fix error in tests with STree fixed version 2022-10-24 19:05:13 +02:00
e15ab3dcab Split Datasets class from Experiments 2022-10-24 18:21:08 +02:00
12024df4d8 syntax Issue in gh actions build 2022-05-18 22:10:32 +02:00
9ace64832a Add sonar build github action 2022-05-18 19:07:25 +02:00
9b78c1a73e Set English, if needed, as default language for R 2022-05-12 12:10:13 +02:00
29d17a4072 Set english as default language for R 2022-05-12 11:27:49 +02:00
81e8bbfebb Update codecov badge 2022-05-11 22:58:49 +02:00
58199262c6 Change message language to R script 2022-05-11 22:33:01 +02:00
f254ea77a7 Add GITHUB_PAT env variable 2022-05-11 18:51:52 +02:00
3d12f458e7 try with remotes 2022-05-11 18:38:52 +02:00
a99f8e6916 update main.yml 2022-05-11 18:29:52 +02:00
8c4a5ebae5 Remove codeql and add R env 2022-05-11 18:14:37 +02:00
3c28fa242e Fix issue in ci 2022-05-11 17:16:53 +02:00
1cb916867c Add package to requirements 2022-05-11 17:12:53 +02:00
65a810d60e remove windows from platforms in ci 2022-05-11 16:25:41 +02:00
39c93e8957 fix python version issue in ci 2022-05-11 16:19:12 +02:00
5bf31b1304 debug github ci 2022-05-11 16:16:22 +02:00
e69c8fea59 remove unneeded requirement from tests 2022-05-11 12:56:52 +02:00
302a6d536b update readme 2022-05-11 12:32:06 +02:00
d77e9737fe update ci workflow 2022-05-11 12:24:55 +02:00
c7768ad387 Add github ci and badges
refactor setup
2022-05-11 12:21:55 +02:00
d826a65300 Fix be_print_strees_test 2022-05-10 14:16:51 +02:00
aebf301b29 Fix be_print_strees_test 2022-05-10 14:14:46 +02:00
e16dde713c Fix issue in be_print_strees_test 2022-05-10 12:42:01 +02:00
a649efde73 Fix be_print_strees issues 2022-05-09 16:27:37 +02:00
e45ef1c9fa Add file not found manage to be_report 2022-05-09 12:02:33 +02:00
7501ce7761 Enhance error msgs in be_main 2022-05-09 11:37:53 +02:00
ca96d05124 Complete be_print_strees 2022-05-09 01:34:25 +02:00
b0c94d4983 Begin print_strees_test 2022-05-09 01:00:51 +02:00
534f32b625 Begin print_strees_test 2022-05-09 00:30:33 +02:00
b3bc2fbd2f Complete be_main tests 2022-05-09 00:23:18 +02:00
09b2ede836 refactor remove iwss from results 2022-05-08 22:50:09 +02:00
4a5225d3dc refactor remove iwss from results 2022-05-08 22:49:50 +02:00
80eb9f1db7 Begin be_main tests 2022-05-08 19:59:53 +02:00
e58901a307 Complete be_benchmark tests 2022-05-08 18:14:55 +02:00
bb4769de43 Continue benchmark tests 2022-05-08 17:19:35 +02:00
1db5d8723a Add no .env exception 2022-05-08 16:51:20 +02:00
2c8646c8d8 Begin be_benchmark test 2022-05-08 16:06:14 +02:00
8457c9b531 Complete be_best & be_build_best tests 2022-05-08 02:03:22 +02:00
4fe1e10488 Complete be_grid tests 2022-05-08 01:31:03 +02:00
5c4d5cb99e Continue be_grid tests 2022-05-08 00:12:52 +02:00
986341723c Continue be_grid tests 2022-05-07 23:33:35 +02:00
af95e9c6bc Begin be_grid tests 2022-05-07 23:28:38 +02:00
b8c4e30714 Remove be_td and be_repair 2022-05-07 19:58:20 +02:00
50d0464702 Complete be_list tests 2022-05-07 19:37:36 +02:00
fe0daf6204 Refactor fake_out variable 2022-05-07 19:14:44 +02:00
fb324ad7ad Fix lint issues 2022-05-07 18:48:45 +02:00
40814c6f1f Add be_summary tests 2022-05-07 18:35:04 +02:00
31059ea117 Continue script testing 2022-05-07 02:08:11 +02:00
df757fefcd Refactor testing 2022-05-07 01:33:35 +02:00
3b214773ff Refactor scripts testing 2022-05-06 23:05:43 +02:00
bb0821c56e Begin script testing 2022-05-06 19:35:14 +02:00
3009167813 Add some tests 2022-05-06 17:15:24 +02:00
d87c7064a9 Fix test issues 2022-05-06 11:15:29 +02:00
3056bb649a Fix score names 2022-05-06 10:58:18 +02:00
47749cea94 Add color to summary and fix some issues 2022-05-05 23:37:13 +02:00
1cefc51870 Add be_build_grid and fix some scripts issues 2022-05-05 20:19:50 +02:00
5bcd4beca9 Fix be_list no data 2022-05-05 13:20:10 +02:00
4c7110214b Fix some issues 2022-05-05 13:11:39 +02:00
Ricardo Montañana Gómez
81ecec8846 Merge pull request #2 from Doctorado-ML/refactor_arguments
Refactor arguments
2022-05-05 00:03:04 +02:00
9d5d9ebd13 add nan handling to excel files 2022-05-04 11:37:57 +02:00
130 changed files with 4918 additions and 746 deletions

.env.dist

@@ -4,3 +4,5 @@ n_folds=5
model=ODTE
stratified=0
source_data=Tanveer
seeds=[57, 31, 1714, 17, 23, 79, 83, 97, 7, 1]
discretize=0
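
The hunk above shows the tail of the distributed settings file. `EnvData.load()` (in `benchmark/Arguments.py`, further down this diff) reads it line by line: blank lines and `#` comments are skipped, each remaining line is split on `=`, and every value stays a raw string. A minimal standalone sketch of that parse, using the keys shown here:

```python
# Re-implementation sketch of EnvData.load's parsing rules; values stay
# raw strings (Randomized.seeds() json-loads "seeds" later).
env_text = """model=ODTE
stratified=0
source_data=Tanveer
seeds=[57, 31, 1714, 17, 23, 79, 83, 97, 7, 1]
discretize=0"""

args = {}
for line in env_text.splitlines():
    if line == "" or line.startswith("#"):
        continue
    key, value = line.split("=")
    args[key] = value
print(args["discretize"])  # '0' -> Datasets treats it as "discretization off"
```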

flake8 configuration file

@@ -1,2 +1,3 @@
[flake8]
exclude = .git,__init__.py
ignore = E203, W503

.github/workflows/build.yml (new file, 29 lines)

@@ -0,0 +1,29 @@
name: Build
on:
push:
branches:
- main
jobs:
build:
name: Build
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
with:
fetch-depth: 0
- run: echo "project_version=$(git describe --tags --abbrev=0)" >> $GITHUB_ENV
- uses: sonarsource/sonarqube-scan-action@master
env:
SONAR_TOKEN: ${{ secrets.SONAR_TOKEN }}
SONAR_HOST_URL: ${{ secrets.SONAR_HOST_URL }}
with:
args: >
-Dsonar.projectVersion=${{ env.project_version }}
-Dsonar.python.version=3.10
# If you wish to fail your job when the Quality Gate is red, uncomment the
# following lines. This would typically be used to fail a deployment.
#- uses: sonarsource/sonarqube-quality-gate-action@master
# timeout-minutes: 5
# env:
# SONAR_TOKEN: ${{ secrets.SONAR_TOKEN }}
# SONAR_HOST_URL: ${{ secrets.SONAR_HOST_URL }}

.github/workflows/main.yml (new file, 59 lines)

@@ -0,0 +1,59 @@
name: CI
on:
push:
branches: [main]
pull_request:
branches: [main]
workflow_dispatch:
jobs:
build:
runs-on: ${{ matrix.os }}
strategy:
matrix:
os: [ubuntu-latest]
python: ["3.10"]
steps:
- uses: actions/checkout@v3
- name: Set up Python ${{ matrix.python }}
uses: actions/setup-python@v2
with:
python-version: ${{ matrix.python }}
# Make dot command available in the environment
- name: Setup Graphviz
uses: ts-graphviz/setup-graphviz@v1
- uses: r-lib/actions/setup-r@v2
- name: Install R dependencies
env:
GITHUB_PAT: ${{ secrets.PAT_TOKEN }}
run: |
install.packages("remotes")
remotes::install_github("jacintoarias/exreport")
shell: Rscript {0}
# Allows installing Wodt in dependencies.
- uses: webfactory/ssh-agent@v0.5.4
with:
ssh-private-key: ${{ secrets.SSH_PRIVATE_KEY }}
# - name: Setup tmate session
# uses: mxschmitt/action-tmate@v3
- name: Install dependencies
run: |
pip install -q --upgrade pip
pip install -q -r requirements.txt
pip install -q --upgrade codecov coverage black flake8
git clone https://github.com/Doctorado-ML/bayesclass.git
- name: Lint
run: |
black --check --diff benchmark
flake8 --count benchmark --ignore=E203,W503
- name: Tests
run: |
coverage run -m unittest -v benchmark.tests
coverage xml
- name: Upload coverage to Codecov
uses: codecov/codecov-action@v1
with:
token: ${{ secrets.CODECOV_TOKEN }}
files: ./coverage.xml

.gitignore (1 line added)

@@ -134,3 +134,4 @@ Rplots.pdf
.vscode
.RData
.Rhistory
.pre-commit-config.yaml

README.md

@@ -1,3 +1,9 @@
[![CI](https://github.com/Doctorado-ML/benchmark/actions/workflows/main.yml/badge.svg)](https://github.com/Doctorado-ML/benchmark/actions/workflows/main.yml)
[![codecov](https://codecov.io/gh/Doctorado-ML/benchmark/branch/main/graph/badge.svg?token=ZRP937NDSG)](https://codecov.io/gh/Doctorado-ML/benchmark)
[![Quality Gate Status](https://haystack.rmontanana.es:25000/api/project_badges/measure?project=benchmark&metric=alert_status&token=336a6e501988888543c3153baa91bad4b9914dd2)](https://haystack.rmontanana.es:25000/dashboard?id=benchmark)
[![Technical Debt](https://haystack.rmontanana.es:25000/api/project_badges/measure?project=benchmark&metric=sqale_index&token=336a6e501988888543c3153baa91bad4b9914dd2)](https://haystack.rmontanana.es:25000/dashboard?id=benchmark)
![https://img.shields.io/badge/python-3.8%2B-blue](https://img.shields.io/badge/python-3.8%2B-brightgreen)
# benchmark
Benchmarking models
@@ -6,53 +12,55 @@ Benchmarking models
```python
# 5 Fold 10 seeds with STree with default hyperparameters and report
python src/main.py -m STree -P iMac27 -r 1
be_main -m STree -P iMac27 -r 1
# Setting number of folds, in this case 7
python src/main.py -m STree -P iMac27 -n 7
be_main -m STree -P iMac27 -n 7
# 5 Fold 10 seeds with STree and best results hyperparams
python src/main.py -m STree -P iMac27 -f 1
be_main -m STree -P iMac27 -f 1
# 5 Fold 10 seeds with STree and same hyperparameters
python src/main.py -m STree -P iMac27 -p '{"kernel": "rbf", "gamma": 0.1}'
be_main -m STree -P iMac27 -p '{"kernel": "rbf", "gamma": 0.1}'
```
## Best Results
```python
# Build best results of STree model and print report
python src/build_best.py -m STree -r 1
be_build_best -m STree -r 1
# Report of STree best results
python src/report.py -b STree
be_report -b STree
```
## Reports
```python
# Datasets list
python src/report.py
be_report
# Report of given experiment
python src/report.py -f results/results_STree_iMac27_2021-09-22_17:13:02.json
be_report -f results/results_STree_iMac27_2021-09-22_17:13:02.json
# Report of given experiment building excel file and compare with best results
python src/report.py -f results/results_STree_iMac27_2021-09-22_17:13:02.json -x 1 -c 1
be_report -f results/results_STree_iMac27_2021-09-22_17:13:02.json -x 1 -c 1
# Report of given experiment building sql file
python src/report.py -f results/results_STree_iMac27_2021-09-22_17:13:02.json -q 1
be_report -f results/results_STree_iMac27_2021-09-22_17:13:02.json -q 1
```
## Benchmark
```python
# Do benchmark and print report
python src/benchmark.py
be_benchmark
# Do benchmark, print report and build excel file with data
python src/benchmark.py -x 1
be_benchmark -x 1
# Do benchmark, print report and build tex table with results
be_benchmark -t 1
```
## List
```python
# List of results of given model
python src/list.py -m ODTE
be_list -m ODTE
# List of results of given model and score
python src/list.py -m STree -s f1-macro
be_list -m STree -s f1-macro
# List all results
python src/list.py
be_list
```

benchmark/Arguments.py

@@ -1,6 +1,7 @@
import sys
import argparse
from .Experiments import Models
from .Utils import Files
from .Models import Models
from .Utils import Files, NO_ENV
ALL_METRICS = (
"accuracy",
@@ -15,19 +16,28 @@ class EnvData:
@staticmethod
def load():
args = {}
try:
with open(Files.dot_env) as f:
for line in f.read().splitlines():
if line == "" or line.startswith("#"):
continue
key, value = line.split("=")
args[key] = value
except FileNotFoundError:
print(NO_ENV, file=sys.stderr)
exit(1)
else:
return args
class EnvDefault(argparse.Action):
# Thanks to https://stackoverflow.com/users/445507/russell-heilling
def __init__(self, envvar, required=True, default=None, **kwargs):
def __init__(
self, envvar, required=True, default=None, mandatory=False, **kwargs
):
self._args = EnvData.load()
self._overrides = {}
if required and not mandatory:
default = self._args[envvar]
required = False
super(EnvDefault, self).__init__(
@@ -38,24 +48,27 @@ class EnvDefault(argparse.Action):
setattr(namespace, self.dest, values)
class Arguments:
def __init__(self):
self.ap = argparse.ArgumentParser()
class Arguments(argparse.ArgumentParser):
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
models_data = Models.define_models(random_state=0)
self._overrides = {}
self._subparser = None
self.parameters = {
"best": [
("-b", "--best"),
"best_paramfile": [
("-b", "--best_paramfile"),
{
"type": str,
"action": "store_true",
"required": False,
"help": "best results of models",
"default": False,
"help": "Use best hyperparams file?",
},
],
"color": [
("-c", "--color"),
{
"type": bool,
"required": False,
"action": "store_true",
"default": False,
"help": "use colors for the tree",
},
@@ -63,8 +76,9 @@ class Arguments:
"compare": [
("-c", "--compare"),
{
"type": bool,
"action": "store_true",
"required": False,
"default": False,
"help": "Compare accuracy with best results",
},
],
@@ -72,6 +86,8 @@ class Arguments:
("-d", "--dataset"),
{
"type": str,
"envvar": "dataset", # for compatiblity with EnvDefault
"action": EnvDefault,
"required": False,
"help": "dataset to work with",
},
@@ -79,38 +95,26 @@ class Arguments:
"excel": [
("-x", "--excel"),
{
"type": bool,
"required": False,
"action": "store_true",
"default": False,
"help": "Generate Excel File",
},
],
"file": [
("-f", "--file"),
{"type": str, "required": False, "help": "Result file"},
],
"grid": [
("-g", "--grid"),
{
"type": str,
"required": False,
"help": "grid results of model",
},
],
"grid_paramfile": [
("-g", "--grid_paramfile"),
{
"type": bool,
"required": False,
"action": "store_true",
"default": False,
"help": "Use best hyperparams file?",
"help": "Use grid output hyperparams file?",
},
],
"hidden": [
("--hidden",),
{
"type": str,
"required": False,
"action": "store_true",
"default": False,
"help": "Show hidden results",
},
@@ -131,8 +135,8 @@ class Arguments:
"lose": [
("-l", "--lose"),
{
"type": bool,
"default": False,
"action": "store_true",
"required": False,
"help": "show lose results",
},
@@ -154,8 +158,6 @@ class Arguments:
"type": str,
"required": True,
"choices": list(models_data),
"action": EnvDefault,
"envvar": "model",
"help": "model name",
},
],
@@ -165,17 +167,16 @@ class Arguments:
"type": str,
"required": True,
"choices": list(models_data),
"action": EnvDefault,
"envvar": "model",
"help": "model name",
},
],
"nan": [
("--nan",),
{
"type": bool,
"action": "store_true",
"required": False,
"help": "Move nan results to hidden folder",
"default": False,
"help": "List nan results to hidden folder",
},
],
"number": [
@@ -197,15 +198,6 @@ class Arguments:
"help": "number of folds",
},
],
"paramfile": [
("-f", "--paramfile"),
{
"type": bool,
"required": False,
"default": False,
"help": "Use best hyperparams file?",
},
],
"platform": [
("-P", "--platform"),
{
@@ -219,7 +211,7 @@ class Arguments:
"quiet": [
("-q", "--quiet"),
{
"type": bool,
"action": "store_true",
"required": False,
"default": False,
},
@@ -227,7 +219,7 @@ class Arguments:
"report": [
("-r", "--report"),
{
"type": bool,
"action": "store_true",
"default": False,
"required": False,
"help": "Report results",
@@ -245,14 +237,18 @@ class Arguments:
],
"sql": [
("-q", "--sql"),
{"type": bool, "required": False, "help": "Generate SQL File"},
{
"required": False,
"action": "store_true",
"default": False,
"help": "Generate SQL File",
},
],
"stratified": [
("-t", "--stratified"),
{
"action": EnvDefault,
"envvar": "stratified",
"type": str,
"required": True,
"help": "Stratified",
},
@@ -260,8 +256,8 @@ class Arguments:
"tex_output": [
("-t", "--tex-output"),
{
"type": bool,
"required": False,
"action": "store_true",
"default": False,
"help": "Generate Tex file with the table",
},
@@ -273,8 +269,8 @@ class Arguments:
"win": [
("-w", "--win"),
{
"type": bool,
"default": False,
"action": "store_true",
"required": False,
"help": "show win results",
},
@@ -282,12 +278,43 @@ class Arguments:
}
def xset(self, *arg_name, **kwargs):
names, default = self.parameters[arg_name[0]]
self.ap.add_argument(
names, parameters = self.parameters[arg_name[0]]
if "overrides" in kwargs:
self._overrides[names[0]] = (kwargs["overrides"], kwargs["const"])
del kwargs["overrides"]
self.add_argument(
*names,
**{**default, **kwargs},
**{**parameters, **kwargs},
)
return self
def parse(self):
return self.ap.parse_args()
def add_subparser(
self, dest="subcommand", help_text="help for subcommand"
):
self._subparser = self.add_subparsers(dest=dest, help=help_text)
def add_subparsers_options(self, subparser, arguments):
command, help_text = subparser
parser = self._subparser.add_parser(command, help=help_text)
for name, args in arguments:
try:
names, parameters = self.parameters[name]
except KeyError:
names = (name,)
parameters = {}
# Order of args is important
parser.add_argument(*names, **{**args, **parameters})
def add_exclusive(self, hyperparameters, required=False):
group = self.add_mutually_exclusive_group(required=required)
for name in hyperparameters:
names, parameters = self.parameters[name]
group.add_argument(*names, **parameters)
def parse(self, args=None):
for key, (dest_key, value) in self._overrides.items():
if args is None:
args = sys.argv[1:]
if key in args:
args.extend((f"--{dest_key}", value))
return super().parse_args(args)
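
Taken together, `EnvDefault` and the new `Arguments` parent class mean a flag silently falls back to its `.env` entry unless it is declared `mandatory=True`, and `parse()` can now receive an explicit argv list. A hedged usage sketch, assuming the benchmark package and its model dependencies are installed and a `.env` containing `stratified=0` sits in the working directory:

```python
from benchmark.Arguments import Arguments

# Default comes from .env through EnvDefault, so no flag is needed:
args = Arguments(prog="demo").xset("stratified").parse([])
print(args.stratified)  # '0', taken from .env

# An explicit command-line value still wins:
args = Arguments(prog="demo").xset("stratified").parse(["-t", "1"])
print(args.stratified)  # '1'
```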

benchmark/Datasets.py (new file, 196 lines)

@@ -0,0 +1,196 @@
import os
from types import SimpleNamespace
import pandas as pd
import numpy as np
from scipy.io import arff
from .Utils import Files
from .Arguments import EnvData
from mdlp.discretization import MDLP
class Diterator:
def __init__(self, data):
self._stack = data.copy()
def __next__(self):
if len(self._stack) == 0:
raise StopIteration()
return self._stack.pop(0)
class DatasetsArff:
@staticmethod
def dataset_names(name):
return f"{name}.arff"
@staticmethod
def folder():
return "datasets"
def load(self, name, class_name):
file_name = os.path.join(self.folder(), self.dataset_names(name))
data = arff.loadarff(file_name)
df = pd.DataFrame(data[0])
df.dropna(axis=0, how="any", inplace=True)
self.dataset = df
X = df.drop(class_name, axis=1)
self.features = X.columns
self.class_name = class_name
y, _ = pd.factorize(df[class_name])
X = X.to_numpy()
return X, y
class DatasetsTanveer:
@staticmethod
def dataset_names(name):
return f"{name}_R.dat"
@staticmethod
def folder():
return "data"
def load(self, name, *args):
file_name = os.path.join(self.folder(), self.dataset_names(name))
data = pd.read_csv(
file_name,
sep="\t",
index_col=0,
)
X = data.drop("clase", axis=1)
self.features = X.columns
X = X.to_numpy()
y = data["clase"].to_numpy()
self.dataset = data
self.class_name = "clase"
return X, y
class DatasetsSurcov:
@staticmethod
def dataset_names(name):
return f"{name}.csv"
@staticmethod
def folder():
return "datasets"
def load(self, name, *args):
file_name = os.path.join(self.folder(), self.dataset_names(name))
data = pd.read_csv(
file_name,
index_col=0,
)
data.dropna(axis=0, how="any", inplace=True)
self.columns = data.columns
X = data.drop(["class"], axis=1)
self.features = X.columns
self.class_name = "class"
self.dataset = data
X = X.to_numpy()
y = data["class"].to_numpy()
return X, y
class Datasets:
def __init__(self, dataset_name=None):
envData = EnvData.load()
# DatasetsSurcov, DatasetsTanveer, DatasetsArff,...
source_name = getattr(
__import__(__name__),
f"Datasets{envData['source_data']}",
)
self.discretize = envData["discretize"] == "1"
self.dataset = source_name()
self.class_names = []
self.data_sets = []
# initialize self.class_names & self.data_sets
class_names, sets = self._init_names(dataset_name)
self.class_names = class_names
self.data_sets = sets
def _init_names(self, dataset_name):
file_name = os.path.join(self.dataset.folder(), Files.index)
default_class = "class"
with open(file_name) as f:
sets = f.read().splitlines()
class_names = [default_class] * len(sets)
if "," in sets[0]:
result = []
class_names = []
for data in sets:
name, class_name = data.split(",")
result.append(name)
class_names.append(class_name)
sets = result
# Set as dataset list the dataset passed as argument
if dataset_name is None:
return class_names, sets
try:
class_name = class_names[sets.index(dataset_name)]
except ValueError:
raise ValueError(f"Unknown dataset: {dataset_name}")
return [class_name], [dataset_name]
def get_attributes(self, name):
tmp = self.discretize
self.discretize = False
X, y = self.load(name)
attr = SimpleNamespace()
values, counts = np.unique(y, return_counts=True)
comp = ""
sep = ""
for count in counts:
comp += f"{sep}{count/sum(counts)*100:5.2f}%"
sep = "/ "
attr.balance = comp
attr.classes = len(np.unique(y))
attr.samples = X.shape[0]
attr.features = X.shape[1]
self.discretize = tmp
return attr
def get_features(self):
return self.dataset.features
def get_class_name(self):
return self.dataset.class_name
def get_dataset(self):
return self.dataset.dataset
def load(self, name, dataframe=False):
try:
class_name = self.class_names[self.data_sets.index(name)]
X, y = self.dataset.load(name, class_name)
if self.discretize:
X = self.discretize_dataset(X, y)
dataset = pd.DataFrame(X, columns=self.get_features())
dataset[self.get_class_name()] = y
self.dataset.dataset = dataset
if dataframe:
return self.get_dataset()
return X, y
except (ValueError, FileNotFoundError):
raise ValueError(f"Unknown dataset: {name}")
def discretize_dataset(self, X, y):
"""Supervised discretization with Fayyad and Irani's MDLP algorithm.
Parameters
----------
X : np.ndarray
array (n_samples, n_features) of features
y : np.ndarray
array (n_samples,) of labels
Returns
-------
tuple (X, y) of numpy.ndarray
"""
discretiz = MDLP(random_state=17, dtype=np.int32)
Xdisc = discretiz.fit_transform(X, y)
return Xdisc
def __iter__(self) -> Diterator:
return Diterator(self.data_sets)
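
A usage sketch for the extracted class, assuming a `.env` with `source_data=Arff` and `discretize=1`, plus a `datasets/all.txt` index (one dataset per line, optionally `name,class_name`):

```python
from benchmark.Datasets import Datasets

dt = Datasets()                      # reads the names listed in all.txt
for name in dt:                      # Diterator walks the dataset names
    X, y = dt.load(name)             # MDLP-discretized because discretize=1
    attrs = dt.get_attributes(name)  # SimpleNamespace with dataset stats
    print(name, attrs.samples, attrs.features, attrs.classes, attrs.balance)
```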

benchmark/Experiments.py

@@ -1,4 +1,5 @@
import os
import sys
import json
import random
import warnings
@@ -6,95 +7,22 @@ import time
from datetime import datetime
from tqdm import tqdm
import numpy as np
import pandas as pd
from sklearn.model_selection import (
StratifiedKFold,
KFold,
GridSearchCV,
cross_validate,
)
from .Utils import Folders, Files
from .Utils import Folders, Files, NO_RESULTS
from .Datasets import Datasets
from .Models import Models
from .Arguments import EnvData
class Randomized:
seeds = [57, 31, 1714, 17, 23, 79, 83, 97, 7, 1]
class Diterator:
def __init__(self, data):
self._stack = data.copy()
def __next__(self):
if len(self._stack) == 0:
raise StopIteration()
return self._stack.pop(0)
class DatasetsTanveer:
@staticmethod
def dataset_names(name):
return f"{name}_R.dat"
@staticmethod
def folder():
return "data"
def load(self, name):
file_name = os.path.join(self.folder(), self.dataset_names(name))
data = pd.read_csv(
file_name,
sep="\t",
index_col=0,
)
X = data.drop("clase", axis=1).to_numpy()
y = data["clase"].to_numpy()
return X, y
class DatasetsSurcov:
@staticmethod
def dataset_names(name):
return f"{name}.csv"
@staticmethod
def folder():
return "datasets"
def load(self, name):
file_name = os.path.join(self.folder(), self.dataset_names(name))
data = pd.read_csv(
file_name,
index_col=0,
)
data.dropna(axis=0, how="any", inplace=True)
self.columns = data.columns
X = data.drop("class", axis=1).to_numpy()
y = data["class"].to_numpy()
return X, y
class Datasets:
def __init__(self, dataset_name=None):
envData = EnvData.load()
class_name = getattr(
__import__(__name__),
f"Datasets{envData['source_data']}",
)
self.dataset = class_name()
if dataset_name is None:
file_name = os.path.join(self.dataset.folder(), Files.index)
with open(file_name) as f:
self.data_sets = f.read().splitlines()
else:
self.data_sets = [dataset_name]
def load(self, name):
return self.dataset.load(name)
def __iter__(self) -> Diterator:
return Diterator(self.data_sets)
def seeds():
return json.loads(EnvData.load()["seeds"])
class BestResults:
@@ -144,6 +72,7 @@ class BestResults:
score=self.score_name, model=self.model
)
all_files = sorted(list(os.walk(Folders.results)))
found = False
for root, _, files in tqdm(
all_files, desc="files", disable=self.quiet
):
@@ -153,6 +82,9 @@ class BestResults:
with open(file_name) as fp:
data = json.load(fp)
self._process_datafile(results, data, name)
found = True
if not found:
raise ValueError(NO_RESULTS)
# Build best results json file
output = {}
datasets = Datasets()
@@ -214,8 +146,11 @@ class Experiment:
grid_file = os.path.join(
Folders.results, Files.grid_output(score_name, model_name)
)
try:
with open(grid_file) as f:
self.hyperparameters_dict = json.load(f)
except FileNotFoundError:
raise ValueError(f"{grid_file} does not exist")
else:
self.hyperparameters_dict = hyper.fill(
dictionary=dictionary,
@@ -223,7 +158,7 @@ class Experiment:
self.platform = platform
self.progress_bar = progress_bar
self.folds = folds
self.random_seeds = Randomized.seeds
self.random_seeds = Randomized.seeds()
self.results = []
self.duration = 0
self._init_experiment()
@@ -231,6 +166,10 @@ class Experiment:
def get_output_file(self):
return self.output_file
@staticmethod
def get_python_version():
return "{}.{}".format(sys.version_info.major, sys.version_info.minor)
def _build_classifier(self, random_state, hyperparameters):
self.model = Models.get_model(self.model_name, random_state)
clf = self.model
@@ -262,7 +201,7 @@ class Experiment:
shuffle=True, random_state=random_state, n_splits=self.folds
)
clf = self._build_classifier(random_state, hyperparameters)
self.version = clf.version() if hasattr(clf, "version") else "-"
self.version = Models.get_version(self.model_name, clf)
with warnings.catch_warnings():
warnings.filterwarnings("ignore")
res = cross_validate(
@@ -312,6 +251,8 @@ class Experiment:
output["duration"] = self.duration
output["seeds"] = self.random_seeds
output["platform"] = self.platform
output["language_version"] = self.get_python_version()
output["language"] = "Python"
output["results"] = self.results
with open(self.output_file, "w") as f:
json.dump(output, f)
@@ -370,14 +311,10 @@ class GridSearch:
self.progress_bar = progress_bar
self.folds = folds
self.platform = platform
self.random_seeds = Randomized.seeds
self.random_seeds = Randomized.seeds()
self.grid_file = os.path.join(
Folders.results, Files.grid_input(score_name, model_name)
)
with open(self.grid_file) as f:
self.grid = json.load(f)
self.duration = 0
self._init_data()
def get_output_file(self):
return self.output_file
@@ -426,6 +363,10 @@ class GridSearch:
self.results[name] = [score, hyperparameters, message]
def do_gridsearch(self):
with open(self.grid_file) as f:
self.grid = json.load(f)
self.duration = 0
self._init_data()
now = time.time()
loop = tqdm(
list(self.datasets),
@@ -445,7 +386,7 @@ class GridSearch:
random_state=self.random_seeds[0],
n_splits=self.folds,
)
clf = Models.get_model(self.model_name)
clf = Models.get_model(self.model_name, self.random_seeds[0])
self.version = clf.version() if hasattr(clf, "version") else "-"
self._num_warnings = 0
warnings.warn = self._warn
@@ -455,7 +396,7 @@ class GridSearch:
estimator=clf,
cv=kfold,
param_grid=self.grid,
scoring=self.score_name,
scoring=self.score_name.replace("-", "_"),
n_jobs=-1,
)
grid.fit(X, y)
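
Two small behavior changes in this file are easy to miss: `Randomized.seeds` is now a function that json-loads the list from `.env` instead of a hard-coded class attribute, and the grid-search scoring name is normalized for scikit-learn, which expects underscores. A sketch of both:

```python
import json

# seeds come from configuration now (see Randomized.seeds above)
seeds = json.loads("[57, 31, 1714, 17, 23, 79, 83, 97, 7, 1]")
print(seeds[0])  # 57, also used as the GridSearch fold/model seed

# benchmark metric names use '-', scikit-learn scorer names use '_'
score_name = "f1-macro"
print(score_name.replace("-", "_"))  # 'f1_macro', valid for GridSearchCV
```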

benchmark/Models.py

@@ -8,9 +8,12 @@ from sklearn.ensemble import (
)
from sklearn.svm import SVC
from stree import Stree
from bayesclass.clfs import TAN, KDB, AODE
from wodt import Wodt
from odte import Odte
from xgboost import XGBClassifier
import sklearn
import xgboost
class Models:
@@ -18,6 +21,9 @@ class Models:
def define_models(random_state):
return {
"STree": Stree(random_state=random_state),
"TAN": TAN(random_state=random_state),
"KDB": KDB(k=3),
"AODE": AODE(random_state=random_state),
"Cart": DecisionTreeClassifier(random_state=random_state),
"ExtraTree": ExtraTreeClassifier(random_state=random_state),
"Wodt": Wodt(random_state=random_state),
@@ -89,3 +95,15 @@ class Models:
nodes, leaves = result.nodes_leaves()
depth = result.depth_ if hasattr(result, "depth_") else 0
return nodes, leaves, depth
@staticmethod
def get_version(name, clf):
if hasattr(clf, "version"):
return clf.version()
if name in ["Cart", "ExtraTree", "RandomForest", "GBC", "SVC"]:
return sklearn.__version__
elif name.startswith("Bagging") or name.startswith("AdaBoost"):
return sklearn.__version__
elif name == "XGBoost":
return xgboost.__version__
return "Error"

benchmark/Results.py

@@ -1,14 +1,51 @@
import os
import sys
from operator import itemgetter
from types import SimpleNamespace
import math
import json
import abc
import shutil
import subprocess
import xlsxwriter
from xlsxwriter.exceptions import DuplicateWorksheetName
import numpy as np
from .Experiments import Datasets, BestResults
from .Utils import Folders, Files, Symbols, BEST_ACCURACY_STREE, TextColor
from .Experiments import BestResults
from .Datasets import Datasets
from .Arguments import EnvData, ALL_METRICS
from .Utils import (
Folders,
Files,
Symbols,
TextColor,
NO_RESULTS,
)
from ._version import __version__
def get_input(message="", is_test=False):
return "test" if is_test else input(message)
class BestResultsEver:
def __init__(self):
self.data = {}
for i in ["Tanveer", "Surcov", "Arff"]:
self.data[i] = {}
for metric in ALL_METRICS:
self.data[i][metric.replace("-", "_")] = ["self", 1.0]
self.data[i][metric] = ["self", 1.0]
self.data["Tanveer"]["accuracy"] = [
"STree_default (liblinear-ovr)",
40.282203,
]
self.data["Arff"]["accuracy"] = [
"STree_default (linear-ovo)",
22.109799,
]
def get_name_value(self, key, score):
return self.data[key][score]
class BaseReport(abc.ABC):
@@ -22,7 +59,20 @@ class BaseReport(abc.ABC):
with open(self.file_name) as f:
self.data = json.load(f)
self.best_acc_file = best_file
self.lines = self.data if best_file else self.data["results"]
if best_file:
self.lines = self.data
else:
self.lines = self.data["results"]
self.score_name = self.data["score_name"]
self.__compute_best_results_ever()
def __compute_best_results_ever(self):
args = EnvData.load()
key = args["source_data"]
best = BestResultsEver()
self.best_score_name, self.best_score_value = best.get_name_value(
key, self.score_name
)
def _get_accuracy(self, item):
return self.data[item][0] if self.best_acc_file else item["score"]
@@ -61,6 +111,12 @@ class BaseReport(abc.ABC):
}
return meaning[status]
def _get_best_accuracy(self):
return self.best_score_value
def _get_message_best_accuracy(self):
return f"{self.score_name} compared to {self.best_score_name} .:"
@abc.abstractmethod
def header(self) -> None:
pass
@@ -75,10 +131,10 @@ class BaseReport(abc.ABC):
class Report(BaseReport):
header_lengths = [30, 5, 5, 3, 7, 7, 7, 15, 16, 15]
header_lengths = [30, 6, 5, 3, 7, 7, 7, 15, 17, 15]
header_cols = [
"Dataset",
"Samp",
"Sampl.",
"Feat.",
"Cls",
"Nodes",
@@ -134,7 +190,7 @@ class Report(BaseReport):
)
i += 1
print(
f"{result['time']:9.6f}±{result['time_std']:6.4f} ",
f"{result['time']:10.6f}±{result['time_std']:6.4f} ",
end="",
)
i += 1
@@ -148,7 +204,8 @@ class Report(BaseReport):
self._compare_totals = {}
self.header_line("*")
self.header_line(
f" Report {self.data['model']} ver. {self.data['version']}"
f" {self.data['model']} ver. {self.data['version']}"
f" {self.data['language']} ver. {self.data['language_version']}"
f" with {self.data['folds']} Folds "
f"cross validation and {len(self.data['seeds'])} random seeds. "
f"{self.data['date']} {self.data['time']}"
@@ -180,8 +237,8 @@ class Report(BaseReport):
f" {key} {self._status_meaning(key)} .....: {value:2d}"
)
self.header_line(
f" Accuracy compared to stree_default (liblinear-ovr) .: "
f"{accuracy/BEST_ACCURACY_STREE:7.4f}"
f" {self._get_message_best_accuracy()} "
f"{accuracy/self._get_best_accuracy():7.4f}"
)
self.header_line("*")
@@ -195,18 +252,17 @@ class ReportBest(BaseReport):
"Hyperparameters",
]
def __init__(self, score, model, best, grid):
def __init__(self, score, model, best):
name = (
Files.best_results(score, model)
if best
else Files.grid_output(score, model)
)
self.best = best
self.grid = grid
file_name = os.path.join(Folders.results, name)
super().__init__(file_name, best_file=True)
self.best = best
self.score_name = score
self.model = model
super().__init__(file_name, best_file=True)
def header_line(self, text: str) -> None:
length = sum(self.header_lengths) + len(self.header_lengths) - 3
@@ -246,8 +302,8 @@ class ReportBest(BaseReport):
def footer(self, accuracy):
self.header_line("*")
self.header_line(
f" Scores compared to stree_default accuracy (liblinear-ovr) .: "
f"{accuracy/BEST_ACCURACY_STREE:7.4f}"
f" {self._get_message_best_accuracy()} "
f"{accuracy/self._get_best_accuracy():7.4f}"
)
self.header_line("*")
@@ -269,13 +325,25 @@ class Excel(BaseReport):
self._compare_totals = {}
if book is None:
self.excel_file_name = self.file_name.replace(".json", ".xlsx")
self.book = xlsxwriter.Workbook(self.excel_file_name)
self.book = xlsxwriter.Workbook(
self.excel_file_name, {"nan_inf_to_errors": True}
)
self.set_book_properties()
self.close = True
else:
self.book = book
self.close = False
self.sheet = self.book.add_worksheet(self.data["model"])
suffix = ""
num = 1
while True:
try:
self.sheet = self.book.add_worksheet(
self.data["model"] + suffix
)
break
except DuplicateWorksheetName:
num += 1
suffix = str(num)
self.max_hyper_width = 0
self.col_hyperparams = 0
@@ -297,7 +365,8 @@ class Excel(BaseReport):
def get_title(self):
return (
f" Report {self.data['model']} ver. {self.data['version']}"
f" {self.data['model']} ver. {self.data['version']}"
f" {self.data['language']} ver. {self.data['language_version']}"
f" with {self.data['folds']} Folds "
f"cross validation and {len(self.data['seeds'])} random seeds. "
f"{self.data['date']} {self.data['time']}"
@@ -499,8 +568,8 @@ class Excel(BaseReport):
self.sheet.write(self.row, 3, self._status_meaning(key), bold)
self.row += 1
message = (
f"** Accuracy compared to stree_default (liblinear-ovr) .: "
f"{accuracy/BEST_ACCURACY_STREE:7.4f}"
f"** {self._get_message_best_accuracy()} "
f"{accuracy/self._get_best_accuracy():7.4f}"
)
bold = self.book.add_format({"bold": True, "font_size": 14})
# set width of the hyperparams column with the maximum width
@@ -514,11 +583,253 @@ class Excel(BaseReport):
self.sheet.set_row(c, 20)
self.sheet.set_row(0, 25)
self.sheet.freeze_panes(6, 1)
self.sheet.hide_gridlines()
self.sheet.hide_gridlines(2)
if self.close:
self.book.close()
class ReportDatasets:
row = 6
# alternate lines colors
color1 = "#DCE6F1"
color2 = "#FDE9D9"
color3 = "#B1A0C7"
def __init__(self, excel=False, book=None):
self.excel = excel
self.env = EnvData().load()
self.close = False
self.output = True
self.header_text = f"Datasets used in benchmark ver. {__version__}"
if excel:
self.max_length = 0
if book is None:
self.excel_file_name = Files.datasets_report_excel
self.book = xlsxwriter.Workbook(
self.excel_file_name, {"nan_inf_to_errors": True}
)
self.set_properties(self.get_title())
self.close = True
else:
self.book = book
self.output = False
self.sheet = self.book.add_worksheet("Datasets")
def set_properties(self, title):
self.book.set_properties(
{
"title": title,
"subject": "Machine learning results",
"author": "Ricardo Montañana Gómez",
"manager": "Dr. J. A. Gámez, Dr. J. M. Puerta",
"company": "UCLM",
"comments": "Created with Python and XlsxWriter",
}
)
@staticmethod
def get_python_version():
return "{}.{}".format(sys.version_info.major, sys.version_info.minor)
def get_title(self):
return (
f" Benchmark ver. {__version__} - "
f" Python ver. {self.get_python_version()}"
f" with {self.env['n_folds']} Folds cross validation "
f" Discretization: {self.env['discretize']} "
f"Stratification: {self.env['stratified']}"
)
def get_file_name(self):
return self.excel_file_name
def header(self):
merge_format = self.book.add_format(
{
"border": 1,
"bold": 1,
"align": "center",
"valign": "vcenter",
"font_size": 18,
"bg_color": self.color3,
}
)
merge_format_subheader = self.book.add_format(
{
"border": 1,
"bold": 1,
"align": "center",
"valign": "vcenter",
"font_size": 16,
"bg_color": self.color1,
}
)
merge_format_subheader_right = self.book.add_format(
{
"border": 1,
"bold": 1,
"align": "right",
"valign": "vcenter",
"font_size": 16,
"bg_color": self.color1,
}
)
merge_format_subheader_left = self.book.add_format(
{
"border": 1,
"bold": 1,
"align": "left",
"valign": "vcenter",
"font_size": 16,
"bg_color": self.color1,
}
)
self.sheet.merge_range(0, 0, 0, 4, self.header_text, merge_format)
self.sheet.merge_range(
1,
0,
4,
0,
f" Default score {self.env['score']}",
merge_format_subheader,
)
self.sheet.merge_range(
1,
1,
1,
3,
"Cross validation",
merge_format_subheader_right,
)
self.sheet.write(
1, 4, f"{self.env['n_folds']} Folds", merge_format_subheader_left
)
self.sheet.merge_range(
2,
1,
2,
3,
"Stratified",
merge_format_subheader_right,
)
self.sheet.write(
2,
4,
f"{'True' if self.env['stratified']=='1' else 'False'}",
merge_format_subheader_left,
)
self.sheet.merge_range(
3,
1,
3,
3,
"Discretized",
merge_format_subheader_right,
)
self.sheet.write(
3,
4,
f"{'True' if self.env['discretize']=='1' else 'False'}",
merge_format_subheader_left,
)
self.sheet.merge_range(
4,
1,
4,
3,
"Seeds",
merge_format_subheader_right,
)
self.sheet.write(
4, 4, f"{self.env['seeds']}", merge_format_subheader_left
)
self.update_max_length(len(self.env["seeds"]) + 1)
header_cols = [
("Dataset", 30),
("Samples", 10),
("Features", 10),
("Classes", 10),
("Balance", 50),
]
bold = self.book.add_format(
{
"bold": True,
"font_size": 14,
"bg_color": self.color3,
"border": 1,
}
)
i = 0
for item, length in header_cols:
self.sheet.write(5, i, item, bold)
self.sheet.set_column(i, i, length)
i += 1
def footer(self):
# set Balance column width to max length
self.sheet.set_column(4, 4, self.max_length)
self.sheet.freeze_panes(6, 1)
self.sheet.hide_gridlines(2)
if self.close:
self.book.close()
def print_line(self, result):
size_n = 14
integer = self.book.add_format(
{"num_format": "#,###", "font_size": size_n, "border": 1}
)
normal = self.book.add_format({"font_size": size_n, "border": 1})
col = 0
if self.row % 2 == 0:
normal.set_bg_color(self.color1)
integer.set_bg_color(self.color1)
else:
normal.set_bg_color(self.color2)
integer.set_bg_color(self.color2)
self.sheet.write(self.row, col, result.dataset, normal)
self.sheet.write(self.row, col + 1, result.samples, integer)
self.sheet.write(self.row, col + 2, result.features, integer)
self.sheet.write(self.row, col + 3, result.classes, normal)
self.sheet.write(self.row, col + 4, result.balance, normal)
self.update_max_length(len(result.balance))
self.row += 1
def update_max_length(self, value):
if value > self.max_length:
self.max_length = value
def report(self):
data_sets = Datasets()
color_line = TextColor.LINE1
if self.excel:
self.header()
if self.output:
print(color_line, end="")
print(self.header_text)
print("")
print(f"{'Dataset':30s} Sampl. Feat. Cls Balance")
print("=" * 30 + " ====== ===== === " + "=" * 60)
for dataset in data_sets:
attributes = data_sets.get_attributes(dataset)
attributes.dataset = dataset
if self.excel:
self.print_line(attributes)
color_line = (
TextColor.LINE2
if color_line == TextColor.LINE1
else TextColor.LINE1
)
if self.output:
print(color_line, end="")
print(
f"{dataset:30s} {attributes.samples:6,d} "
f"{attributes.features:5,d} {attributes.classes:3d} "
f"{attributes.balance:40s}"
)
if self.excel:
self.footer()
class SQL(BaseReport):
table_name = "results"
@@ -596,6 +907,13 @@ class Benchmark:
self._report = {}
self._datasets = set()
self.visualize = visualize
self.__compute_best_results_ever()
def __compute_best_results_ever(self):
args = EnvData.load()
key = args["source_data"]
best = BestResultsEver()
_, self.best_score_value = best.get_name_value(key, self._score)
def get_result_file_name(self):
return os.path.join(Folders.exreport, Files.exreport(self._score))
@@ -604,6 +922,8 @@ class Benchmark:
summary = Summary()
summary.acquire(given_score=self._score)
self._models = summary.get_models()
if self._models == []:
raise ValueError(NO_RESULTS)
for model in self._models:
best = summary.best_result(
criterion="model", value=model, score=self._score
@@ -782,7 +1102,9 @@ class Benchmark:
)
def excel(self):
book = xlsxwriter.Workbook(self.get_excel_file_name())
book = xlsxwriter.Workbook(
self.get_excel_file_name(), {"nan_inf_to_errors": True}
)
Excel.set_properties(book, "Experimentation summary")
sheet = book.add_worksheet("Benchmark")
normal = book.add_format({"font_size": 14, "border": 1})
@@ -929,7 +1251,7 @@ class Benchmark:
sheet.write_formula(
row,
col + 1,
f"=sum({range_metric})/{BEST_ACCURACY_STREE}",
f"=sum({range_metric})/{self.best_score_value}",
decimal_total,
)
range_rank = (
@@ -977,7 +1299,12 @@ class Benchmark:
k = Excel(file_name=file_name, book=book)
k.report()
sheet.freeze_panes(6, 1)
sheet.hide_gridlines()
sheet.hide_gridlines(2)
def add_datasets_sheet():
# Add datasets sheet
re = ReportDatasets(excel=True, book=book)
re.report()
def exreport_output():
file_name = os.path.join(
@@ -1005,6 +1332,7 @@ class Benchmark:
footer()
models_files()
exreport_output()
add_datasets_sheet()
book.close()
@@ -1021,16 +1349,18 @@ class StubReport(BaseReport):
def footer(self, accuracy: float) -> None:
self.accuracy = accuracy
self.score = accuracy / BEST_ACCURACY_STREE
self.score = accuracy / self._get_best_accuracy()
class Summary:
def __init__(self, hidden=False) -> None:
def __init__(self, hidden=False, compare=False) -> None:
self.results = Files().get_all_results(hidden=hidden)
self.data = []
self.data_filtered = []
self.datasets = {}
self.models = set()
self.hidden = hidden
self.compare = compare
def get_models(self):
return sorted(self.models)
@@ -1073,18 +1403,15 @@ class Summary:
self.data.append(entry)
def get_results_criteria(
self,
score,
model,
input_data,
sort_key,
number,
self, score, model, input_data, sort_key, number, nan=False
):
data = self.data.copy() if input_data is None else input_data
if score:
data = [x for x in data if x["score"] == score]
if model:
data = [x for x in data if x["model"] == model]
if nan:
data = [x for x in data if x["metric"] != x["metric"]]
keys = (
itemgetter(sort_key, "time")
if sort_key == "date"
@@ -1102,13 +1429,17 @@ class Summary:
input_data=None,
sort_key="date",
number=0,
nan=False,
) -> None:
"""Print the list of results"""
data = self.get_results_criteria(
score, model, input_data, sort_key, number
if self.data_filtered == []:
self.data_filtered = self.get_results_criteria(
score, model, input_data, sort_key, number, nan=nan
)
max_file = max(len(x["file"]) for x in data)
max_title = max(len(x["title"]) for x in data)
if self.data_filtered == []:
raise ValueError(NO_RESULTS)
max_file = max(len(x["file"]) for x in self.data_filtered)
max_title = max(len(x["title"]) for x in self.data_filtered)
if self.hidden:
color1 = TextColor.GREEN
color2 = TextColor.YELLOW
@@ -1117,10 +1448,11 @@ class Summary:
color2 = TextColor.LINE2
print(color1, end="")
print(
f"{'Date':10s} {'File':{max_file}s} {'Score':7s} {'Time(h)':7s} "
f"{'Title':s}"
f" # {'Date':10s} {'File':{max_file}s} {'Score':8s} "
f"{'Time(h)':7s} {'Title':s}"
)
print(
"===",
"=" * 10
+ " "
+ "=" * max_file
@@ -1129,64 +1461,205 @@ class Summary:
+ " "
+ "=" * 7
+ " "
+ "=" * max_title
+ "=" * max_title,
)
print(
"\n".join(
[
(color2 if n % 2 == 0 else color1)
+ f"{x['date']} {x['file']:{max_file}s} "
(color2 if n % 2 == 0 else color1) + f"{n:3d} "
f"{x['date']} {x['file']:{max_file}s} "
f"{x['metric']:8.5f} "
f"{x['duration']/3600:7.3f} "
f"{x['title']}"
for n, x in enumerate(data)
for n, x in enumerate(self.data_filtered)
]
)
)
def manage_results(self):
"""Manage results showed in the summary
return True if excel file is created False otherwise
"""
def process_file(num, command, path):
num = int(num)
name = self.data_filtered[num]["file"]
file_name_result = os.path.join(path, name)
verb1, verb2 = (
("delete", "Deleting")
if command == cmd.delete
else (
"hide",
"Hiding",
)
)
conf_message = (
TextColor.RED
+ f"Are you sure to {verb1} {file_name_result} (y/n)? "
)
confirm = get_input(message=conf_message)
if confirm == "y":
print(TextColor.YELLOW + f"{verb2} {file_name_result}")
if command == cmd.delete:
os.unlink(file_name_result)
else:
os.rename(
os.path.join(Folders.results, name),
os.path.join(Folders.hidden_results, name),
)
self.data_filtered.pop(num)
get_input(message="Press enter to continue")
self.list_results()
cmd = SimpleNamespace(
quit="q", relist="r", delete="d", hide="h", excel="e"
)
message = (
TextColor.ENDC
+ f"Choose option {str(cmd).replace('namespace', '')}: "
)
path = Folders.hidden_results if self.hidden else Folders.results
book = None
max_value = len(self.data)
while True:
match get_input(message=message).split():
case [cmd.relist]:
self.list_results()
case [cmd.quit]:
if book is not None:
book.close()
return True
return False
case [cmd.hide, num] if num.isdigit() and int(num) < max_value:
if self.hidden:
print("Already hidden")
else:
process_file(num, path=path, command=cmd.hide)
case [cmd.delete, num] if num.isdigit() and int(
num
) < max_value:
process_file(num=num, path=path, command=cmd.delete)
case [cmd.excel, num] if num.isdigit() and int(
num
) < max_value:
# Add to excel file result #num
num = int(num)
file_name_result = os.path.join(
path, self.data_filtered[num]["file"]
)
if book is None:
file_name = Files.be_list_excel
book = xlsxwriter.Workbook(
file_name, {"nan_inf_to_errors": True}
)
excel = Excel(
file_name=file_name_result,
book=book,
compare=self.compare,
)
excel.report()
print(f"Added {file_name_result} to {Files.be_list_excel}")
case [num] if num.isdigit() and int(num) < max_value:
# Report the result #num
num = int(num)
file_name_result = os.path.join(
path, self.data_filtered[num]["file"]
)
rep = Report(file_name_result, compare=self.compare)
rep.report()
case _:
print("Invalid option. Try again!")
def show_result(self, data: dict, title: str = "") -> None:
def whites(n: int) -> str:
return " " * n + "*"
return " " * n + color1 + "*"
if data == {}:
print(f"** {title} has No data **")
return
color1 = TextColor.CYAN
color2 = TextColor.YELLOW
file_name = data["file"]
metric = data["metric"]
result = StubReport(os.path.join(Folders.results, file_name))
length = 81
print("*" * length)
print(color1 + "*" * length)
if title != "":
print(f"*{title:^{length - 2}s}*")
print(
"*"
+ color2
+ TextColor.BOLD
+ f"{title:^{length - 2}s}"
+ TextColor.ENDC
+ color1
+ "*"
)
print("*" + "-" * (length - 2) + "*")
print("*" + whites(length - 2))
print(f"* {result.data['title']:^{length - 4}} *")
print("*" + whites(length - 2))
print(
f"* Model: {result.data['model']:15s} "
f"Ver. {result.data['version']:10s} "
f"Score: {result.data['score_name']:10s} "
f"Metric: {metric:10.7f}" + whites(length - 78)
"* "
+ color2
+ f"{result.data['title']:^{length - 4}}"
+ color1
+ " *"
)
print("*" + whites(length - 2))
print(
f"* Date : {result.data['date']:15s} Time: "
f"{result.data['time']:18s} Time Spent: "
f"{result.data['duration']:9,.2f} secs." + whites(length - 78)
"* Model: "
+ color2
+ f"{result.data['model']:15s} "
+ color1
+ "Ver. "
+ color2
+ f"{result.data['version']:10s} "
+ color1
+ "Score: "
+ color2
+ f"{result.data['score_name']:10s} "
+ color1
+ "Metric: "
+ color2
+ f"{metric:10.7f}"
+ whites(length - 78)
)
print(color1 + "*" + whites(length - 2))
print(
"* Date : "
+ color2
+ f"{result.data['date']:15s}"
+ color1
+ " Time: "
+ color2
+ f"{result.data['time']:18s} "
+ color1
+ "Time Spent: "
+ color2
+ f"{result.data['duration']:9,.2f}"
+ color1
+ " secs."
+ whites(length - 78)
)
seeds = str(result.data["seeds"])
seeds_len = len(seeds)
print(
f"* Seeds: {seeds:{seeds_len}s} Platform: "
f"{result.data['platform']:17s} " + whites(length - 79)
"* Seeds: "
+ color2
+ f"{seeds:{seeds_len}s} "
+ color1
+ "Platform: "
+ color2
+ f"{result.data['platform']:17s} "
+ whites(length - 79)
)
print(
f"* Stratified: {str(result.data['stratified']):15s}"
"* Stratified: "
+ color2
+ f"{str(result.data['stratified']):15s}"
+ whites(length - 30)
)
print(f"* {file_name:60s}" + whites(length - 63))
print("*" + whites(length - 2))
print("*" * length)
print("* " + color2 + f"{file_name:60s}" + whites(length - 63))
print(color1 + "*" + whites(length - 2))
print(color1 + "*" * length)
def best_results(self, criterion=None, value=None, score="accuracy", n=10):
# First filter the same score results (accuracy, f1, ...)
@@ -1196,6 +1669,8 @@ class Summary:
if criterion is None or value is None
else [x for x in haystack if x[criterion] == value]
)
if haystack == []:
raise ValueError(NO_RESULTS)
return (
sorted(
haystack,
@@ -1231,11 +1706,14 @@ class Summary:
return best_results
def show_top(self, score="accuracy", n=10):
try:
self.list_results(
score=score,
input_data=self.best_results(score=score, n=n),
sort_key="metric",
)
except ValueError as e:
print(e)
class PairCheck:

benchmark/Utils.py

@@ -1,7 +1,10 @@
import os
import sys
import subprocess
BEST_ACCURACY_STREE = 40.282203
PYTHON_VERSION = "{}.{}".format(sys.version_info.major, sys.version_info.minor)
NO_RESULTS = "** No results found **"
NO_ENV = "File .env not found"
class Folders:
@@ -9,6 +12,7 @@ class Folders:
hidden_results = "hidden_results"
exreport = "exreport"
report = os.path.join(exreport, "exreport_output")
img = "img"
@staticmethod
def src():
@@ -23,6 +27,8 @@ class Files:
exreport_pdf = "Rplots.pdf"
benchmark_r = "benchmark.r"
dot_env = ".env"
datasets_report_excel = "ReportDatasets.xlsx"
be_list_excel = "some_results.xlsx"
@staticmethod
def exreport_output(score):
@@ -140,3 +146,7 @@ class TextColor:
ENDC = "\033[0m"
BOLD = "\033[1m"
UNDERLINE = "\033[4m"
WHITE = "\033[97m"
GREY = "\033[90m"
BLACK = "\033[90m"
DEFAULT = "\033[99m"

benchmark/__init__.py

@@ -1,10 +1,16 @@
from .Experiments import Experiment, Datasets, DatasetsSurcov, DatasetsTanveer
from .Datasets import (
Datasets,
DatasetsSurcov,
DatasetsTanveer,
DatasetsArff,
)
from .Experiments import Experiment
from .Results import Report, Summary
from .Arguments import EnvDefault
from ._version import __version__
__author__ = "Ricardo Montañana Gómez"
__copyright__ = "Copyright 2020-2022, Ricardo Montañana Gómez"
__copyright__ = "Copyright 2020-2023, Ricardo Montañana Gómez"
__license__ = "MIT License"
__author_email__ = "ricardo.montanana@alu.uclm.es"
__all__ = ["Experiment", "Datasets", "Report", "Summary", "EnvDefault"]
__all__ = ["Experiment", "Datasets", "Report", "Summary", __version__]

benchmark/_version.py

@@ -1 +1 @@
__version__ = "0.1.1"
__version__ = "0.4.0"

benchmark.r

@@ -1,4 +1,8 @@
library(glue)
Sys.setenv(LANG = "en")
if (Sys.getlocale("LC_MESSAGES") == "es_ES.UTF-8") {
resoutput <- capture.output(Sys.setlocale("LC_MESSAGES", 'en_GB.UTF-8'))
}
args = commandArgs(trailingOnly=TRUE)
if (length(args)!=3) {
stop("Only two arguments must be supplied (score & input_file & visualize).n", call.=FALSE)

be_benchmark script

@@ -4,17 +4,21 @@ from benchmark.Utils import Files
from benchmark.Arguments import Arguments
def main():
arguments = Arguments()
arguments.xset("score").xset("excel").xset("tex_output")
ar = arguments.parse()
benchmark = Benchmark(score=ar.score, visualize=True)
def main(args_test=None):
arguments = Arguments(prog="be_benchmark")
arguments.xset("score").xset("excel").xset("tex_output").xset("quiet")
args = arguments.parse(args_test)
benchmark = Benchmark(score=args.score, visualize=not args.quiet)
try:
benchmark.compile_results()
except ValueError as e:
print(e)
else:
benchmark.save_results()
benchmark.report(ar.tex_output)
benchmark.report(args.tex_output)
benchmark.exreport()
if ar.excel:
if args.excel:
benchmark.excel()
Files.open(benchmark.get_excel_file_name())
if ar.tex_output:
Files.open(benchmark.get_excel_file_name(), test=args.quiet)
if args.tex_output:
print(f"File {benchmark.get_tex_file()} generated")

be_summary script

@@ -4,12 +4,12 @@ from benchmark.Results import Summary
from benchmark.Arguments import ALL_METRICS, Arguments
def main():
def main(args_test=None):
arguments = Arguments()
metrics = list(ALL_METRICS)
metrics.append("all")
arguments.xset("score", choices=metrics)
args = arguments.parse()
args = arguments.parse(args_test)
metrics = ALL_METRICS if args.score == "all" else [args.score]
summary = Summary()
summary.acquire()

be_build_best script

@@ -1,19 +1,25 @@
#!/usr/bin/env python
from benchmark.Results import ReportBest
from benchmark.Experiments import Datasets, BestResults
from benchmark.Experiments import BestResults
from benchmark.Datasets import Datasets
from benchmark.Arguments import Arguments
"""Build a json file with the best results of a model and its hyperparameters
"""
def main():
def main(args_test=None):
arguments = Arguments()
arguments.xset("score").xset("report").xset("model")
args = arguments.parse()
arguments.xset("score", mandatory=True).xset("report")
arguments.xset("model", mandatory=True)
args = arguments.parse(args_test)
datasets = Datasets()
best = BestResults(args.score, args.model, datasets)
try:
best.build()
except ValueError as e:
print(e)
else:
if args.report:
report = ReportBest(args.score, args.model, best=True, grid=False)
report = ReportBest(args.score, args.model, best=True)
report.report()

be_build_grid script

@@ -2,9 +2,17 @@
import os
import json
from benchmark.Utils import Files, Folders
from benchmark.Arguments import Arguments
"""Build sample grid input file for the model with data taken from the
input grid used optimizing STree
"""
def main():
def main(args_test=None):
arguments = Arguments()
arguments.xset("model", mandatory=True).xset("score", mandatory=True)
args = arguments.parse(args_test)
data = [
'{"C": 1e4, "gamma": 0.1, "kernel": "rbf"}',
'{"C": 7, "gamma": 0.14, "kernel": "rbf"}',
@@ -103,10 +111,9 @@ def main():
t2 = sorted([x for x in value if isinstance(x, str)])
results_tmp[new_key] = t1 + t2
output.append(results_tmp)
# save results
file_name = Files.grid_input("accuracy", "ODTE")
file_name = Files.grid_input(args.score, args.model)
file_output = os.path.join(Folders.results, file_name)
with open(file_output, "w") as f:
json.dump(output, f, indent=4)
print(f"Grid values saved to {file_output}")
print(f"Generated grid input file to {file_output}")

View File

@@ -1,16 +1,19 @@
#!/usr/bin/env python
from benchmark.Experiments import GridSearch, Datasets
from benchmark.Experiments import GridSearch
from benchmark.Datasets import Datasets
from benchmark.Arguments import Arguments
"""Do experiment and build result file, optionally print report with results
"""
def main():
def main(args_test=None):
arguments = Arguments()
arguments.xset("score").xset("platform").xset("model").xset("n_folds")
arguments.xset("quiet").xset("stratified").xset("dataset")
args = arguments.parse()
arguments.xset("score").xset("platform").xset("model", mandatory=True)
arguments.xset("quiet").xset("stratified").xset("dataset").xset("n_folds")
args = arguments.parse(args_test)
if not args.quiet:
print(f"Perform grid search with {args.model} model")
job = GridSearch(
score_name=args.score,
model_name=args.model,
@@ -18,6 +21,9 @@ def main():
datasets=Datasets(dataset_name=args.dataset),
progress_bar=not args.quiet,
platform=args.platform,
folds=args.folds,
folds=args.n_folds,
)
try:
job.do_gridsearch()
except FileNotFoundError:
print(f"** The grid input file [{job.grid_file}] could not be found")

View File

@@ -0,0 +1,33 @@
#!/usr/bin/env python
import os
from benchmark.Utils import Files, Folders
from benchmark.Arguments import Arguments
def main(args_test=None):
arguments = Arguments(prog="be_init_project")
arguments.add_argument("project_name", help="Project name")
args = arguments.parse(args_test)
folders = []
folders.append(args.project_name)
folders.append(os.path.join(args.project_name, Folders.results))
folders.append(os.path.join(args.project_name, Folders.hidden_results))
folders.append(os.path.join(args.project_name, Folders.exreport))
folders.append(os.path.join(args.project_name, Folders.report))
folders.append(os.path.join(args.project_name, Folders.img))
try:
for folder in folders:
print(f"Creating folder {folder}")
os.makedirs(folder)
except FileExistsError as e:
print(e)
exit(1)
env_src = os.path.join(Folders.src(), "..", f"{Files.dot_env}.dist")
env_to = os.path.join(args.project_name, Files.dot_env)
os.system(f"cp {env_src} {env_to}")
print("Done!")
print(
"Please edit the .env file with your settings and add a datasets folder"
)
print("with an all.txt file listing the datasets you want to use,")
print("together with the dataset files themselves.")

View File

@@ -1,49 +1,32 @@
#! /usr/bin/env python
import os
from benchmark.Results import Summary
from benchmark.Utils import Folders
from benchmark.Utils import Files
from benchmark.Arguments import Arguments
"""List experiments of a model
"""
def main():
arguments = Arguments()
arguments.xset("number").xset("model", required=False).xset("score")
arguments.xset("hidden").xset("nan").xset("key")
args = arguments.parse()
data = Summary(hidden=args.hidden)
def main(args_test=None):
arguments = Arguments(prog="be_list")
arguments.xset("number").xset("model", required=False).xset("key")
arguments.xset("score", required=False).xset("compare").xset("hidden")
arguments.xset("nan")
args = arguments.parse(args_test)
data = Summary(hidden=args.hidden, compare=args.compare)
data.acquire()
try:
data.list_results(
score=args.score,
model=args.model,
sort_key=args.key,
number=args.number,
nan=args.nan,
)
if args.nan:
results_nan = []
results = data.get_results_criteria(
score=args.score,
model=args.model,
input_data=None,
sort_key=args.key,
number=args.number,
)
for result in results:
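# NaN is the only value not equal to itself, so this comparison detects results whose metric is NaN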
if result["metric"] != result["metric"]:
results_nan.append(result)
if results_nan != []:
print(
"\n"
+ "*" * 30
+ " Results with nan moved to hidden "
+ "*" * 30
)
data.list_results(input_data=results_nan)
for result in results_nan:
name = result["file"]
os.rename(
os.path.join(Folders.results, name),
os.path.join(Folders.hidden_results, name),
)
except ValueError as e:
print(e)
return
excel_generated = data.manage_results()
if excel_generated:
print(f"Generated file: {Files.be_list_excel}")
Files.open(Files.be_list_excel, test=args_test is not None)
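A sketch of driving the refactored script from a test (module name assumed, flags per the Arguments parameter table):

from be_list import main  # assumed module name for this script

# list up to five STree accuracy results, moving NaN results to hidden
main(["-m", "STree", "-s", "accuracy", "-n", "5", "--nan"])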

View File

@@ -1,6 +1,7 @@
#!/usr/bin/env python
import os
from benchmark.Experiments import Experiment, Datasets
from benchmark.Experiments import Experiment
from benchmark.Datasets import Datasets
from benchmark.Results import Report
from benchmark.Arguments import Arguments
@@ -8,23 +9,29 @@ from benchmark.Arguments import Arguments
"""
def main():
arguments = Arguments()
arguments.xset("stratified").xset("score").xset("model").xset("dataset")
def main(args_test=None):
arguments = Arguments(prog="be_main")
arguments.xset("stratified").xset("score").xset("model", mandatory=True)
arguments.xset("n_folds").xset("platform").xset("quiet").xset("title")
arguments.xset("hyperparameters").xset("paramfile").xset("report")
arguments.xset("grid_paramfile")
args = arguments.parse()
arguments.xset("report")
arguments.add_exclusive(
["grid_paramfile", "best_paramfile", "hyperparameters"]
)
arguments.xset(
"dataset", overrides="title", const="Test with only one dataset"
)
args = arguments.parse(args_test)
report = args.report or args.dataset is not None
if args.grid_paramfile:
args.paramfile = False
args.best_paramfile = False
try:
job = Experiment(
score_name=args.score,
model_name=args.model,
stratified=args.stratified,
datasets=Datasets(dataset_name=args.dataset),
hyperparams_dict=args.hyperparameters,
hyperparams_file=args.paramfile,
hyperparams_file=args.best_paramfile,
grid_paramfile=args.grid_paramfile,
progress_bar=not args.quiet,
platform=args.platform,
@@ -32,6 +39,9 @@ def main():
folds=args.n_folds,
)
job.do_experiment()
except ValueError as e:
print(e)
else:
if report:
result_file = job.get_output_file()
report = Report(result_file)
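The exclusive group and the dataset/title override can be seen in a sketch like this (module name assumed):

from be_main import main  # assumed module name for this script

# -d implies the title "Test with only one dataset" and forces a report;
# -g, -b and -p are mutually exclusive, so at most one may be given
main(["-m", "STree", "-d", "iris", "-g"])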

View File

@@ -1,22 +1,26 @@
#!/usr/bin/env python
from benchmark.Results import PairCheck
from Arguments import Arguments
from benchmark.Arguments import Arguments
"""Check best results of two models giving scores and win-tie-loose results
"""
def main():
def main(args_test=None):
arguments = Arguments()
arguments.xset("score").xset("win").xset("model1").xset("model2")
arguments.xset("lose")
args = arguments.parse()
args = arguments.parse(args_test)
pair_check = PairCheck(
args.score,
args.model1,
args.model2,
args.win_results,
args.lose_results,
args.win,
args.lose,
)
try:
pair_check.compute()
except ValueError as e:
print(str(e))
else:
pair_check.report()
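A sketch of invoking the fixed script (module name assumed; -m1/-m2/-w/-l per the Arguments parameter table):

from be_pair_check import main  # assumed module name for this script

# compare ODTE against STree, listing both wins and losses
main(["-m1", "ODTE", "-m2", "STree", "-w", "-l"])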

View File

@@ -1,18 +1,11 @@
#!/usr/bin/env python
import os
import subprocess
import json
from stree import Stree
from graphviz import Source
from benchmark.Experiments import Datasets
from benchmark.Datasets import Datasets
from benchmark.Utils import Files, Folders
from Arguments import Arguments
def compute_stree(X, y, random_state):
clf = Stree(random_state=random_state)
clf.fit(X, y)
return clf
from benchmark.Arguments import Arguments
def load_hyperparams(score_name, model_name):
@@ -62,7 +55,6 @@ def add_color(source):
def print_stree(clf, dataset, X, y, color, quiet):
output_folder = "img"
samples, features = X.shape
classes = max(y) + 1
accuracy = clf.score(X, y)
@@ -72,20 +64,18 @@ def print_stree(clf, dataset, X, y, color, quiet):
if color:
dot_source = add_color(dot_source)
grp = Source(dot_source)
file_name = os.path.join(output_folder, f"stree_{dataset}")
file_name = os.path.join(Folders.img, f"stree_{dataset}")
grp.render(format="png", filename=f"{file_name}")
os.remove(f"{file_name}")
print(f"File {file_name}.png generated")
if not quiet:
cmd_open = "/usr/bin/open"
if os.path.isfile(cmd_open) and os.access(cmd_open, os.X_OK):
subprocess.run([cmd_open, f"{file_name}.png"])
file_name += ".png"
print(f"File {file_name} generated")
Files.open(name=file_name, test=quiet)
def main():
def main(args_test=None):
arguments = Arguments()
arguments.xset("color").xset("dataset", default="all").xset("quiet")
args = arguments.parse()
args = arguments.parse(args_test)
hyperparameters = load_hyperparams("accuracy", "ODTE")
random_state = 57
dt = Datasets()

View File

@@ -1,23 +0,0 @@
#!/usr/bin/env python
import os
import json
from benchmark.Experiments import Files, Folders
def main():
versions = dict(SVC="-", STree="1.2.3", ODTE="0.3.2")
results = Files().get_all_results(hidden=False)
for result in results:
print(result)
file_name = os.path.join(Folders.results, result)
with open(file_name) as f:
data = json.load(f)
if "title" not in data:
print(f"Repairing title in {result}")
data["title"] = "default"
if "version" not in data:
print(f"Repairing version in {result}")
model = data["model"]
data["version"] = versions[model] if model in versions else "-"
with open(file_name, "w") as f:
json.dump(data, f, indent=4)

View File

@@ -1,69 +1,83 @@
#!/usr/bin/env python
import numpy as np
from benchmark.Experiments import Datasets
from benchmark.Results import Report, Excel, SQL, ReportBest
from benchmark.Utils import (
Files,
TextColor,
)
from benchmark.Results import Report, Excel, SQL, ReportBest, ReportDatasets
from benchmark.Utils import Files
from benchmark.Arguments import Arguments
"""Build report on screen of a result file, optionally generate excel and sql
file, and can compare results of report with best results obtained by model
If no argument is set, displays the datasets and their characteristics
"""
def default_report():
sets = Datasets()
color_line = TextColor.LINE1
print(color_line, end="")
print(f"{'Dataset':30s} Samp. Feat Cls Balance")
print("=" * 30 + " ===== ==== === " + "=" * 40)
for line in sets:
X, y = sets.load(line)
color_line = (
TextColor.LINE2
if color_line == TextColor.LINE1
else TextColor.LINE1
def main(args_test=None):
is_test = args_test is not None
arguments = Arguments(prog="be_report")
arguments.add_subparser()
arguments.add_subparsers_options(
(
"best",
"Report best results obtained by any model/score. "
"See be_build_best",
),
[
("model", dict(required=False)),
("score", dict(required=False)),
],
)
values, counts = np.unique(y, return_counts=True)
comp = ""
sep = ""
for value, count in zip(values, counts):
comp += f"{sep}{count/sum(counts)*100:5.2f}%"
sep = "/ "
print(color_line, end="")
print(
f"{line:30s} {X.shape[0]:5,d} {X.shape[1]:4d} "
f"{len(np.unique(y)):3d} {comp:40s}"
arguments.add_subparsers_options(
(
"grid",
"Report grid results obtained by any model/score. "
"See be_build_grid",
),
[
("model", dict(required=False)),
("score", dict(required=False)),
],
)
def main():
arguments = Arguments()
arguments.xset("file").xset("excel").xset("sql").xset("compare")
arguments.xset("best").xset("grid").xset("model", required=False).xset(
"score"
arguments.add_subparsers_options(
("file", "Report file results"),
[
("file_name", {}),
("excel", {}),
("sql", {}),
("compare", {}),
],
)
args = arguments.parse()
if args.grid:
args.best = False
if args.file is None and args.best is None:
default_report()
else:
if args.best is not None or args.grid is not None:
report = ReportBest(args.score, args.model, args.best, args.grid)
arguments.add_subparsers_options(
("datasets", "Report datasets information"),
[
("excel", {}),
],
)
args = arguments.parse(args_test)
match args.subcommand:
case "best" | "grid":
best = args.subcommand == "best"
report = ReportBest(args.score, args.model, best)
report.report()
else:
report = Report(args.file, args.compare)
case "file":
try:
report = Report(args.file_name, args.compare)
report.report()
except FileNotFoundError as e:
print(e)
return
if args.sql:
sql = SQL(args.file_name)
sql.report()
if args.excel:
excel = Excel(
file_name=args.file_name,
compare=args.compare,
)
excel.report()
Files.open(excel.get_file_name(), is_test)
case "datasets":
report = ReportDatasets(args.excel)
report.report()
if args.excel:
excel = Excel(args.file, args.compare)
excel.report()
Files.open(excel.get_file_name())
if args.sql:
sql = SQL(args.file)
sql.report()
Files.open(report.get_file_name(), is_test)
case _:
arguments.print_help()
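The subcommand dispatch above can be exercised like this sketch (module name assumed; the behaviour of an empty argument list is inferred from the default case):

from be_report import main  # assumed module name for this script

main(["datasets"])                               # dataset report on screen
main(["best", "-m", "STree", "-s", "accuracy"])  # best results of a model
main([])                                         # no subcommand -> help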

View File

@@ -3,22 +3,27 @@ from benchmark.Results import Summary
from benchmark.Arguments import ALL_METRICS, Arguments
def main():
def main(args_test=None):
arguments = Arguments()
metrics = list(ALL_METRICS)
metrics.append("all")
arguments.xset("score", choices=metrics).xset("model", required=False)
args = arguments.parse()
arguments.xset("score", choices=metrics).xset("model")
args = arguments.parse(args_test)
metrics = ALL_METRICS if args.score == "all" else [args.score]
summary = Summary()
summary.acquire()
for metric in metrics:
title = f"BEST RESULT of {metric} for {args.model}"
try:
best = summary.best_result(
criterion="model", value=args.model, score=metric
)
except ValueError as e:
print(e)
else:
summary.show_result(data=best, title=title)
summary.show_result(
summary.best_result(score=metric), title=f"BEST RESULT of {metric}"
summary.best_result(score=metric),
title=f"BEST RESULT of {metric}",
)
summary.show_top(score=metric, n=10)

View File

@@ -1,48 +0,0 @@
#!/usr/bin/env python
import sys
import time
from benchmark.Experiments import Datasets
from mufs import MUFS
def main():
mufs_i = MUFS()
mufs_c = MUFS()
mufs_f = MUFS()
datasets = Datasets()
iwss_t = iwss_tl = cfs_t = cfs_tl = fcbf_t = fcbf_tl = 0
for i in datasets:
X, y = datasets.load(i)
now = time.time()
mufs_i.iwss(X, y, float(sys.argv[1]))
iwss = time.time() - now
iwss_r = len(mufs_i.get_results())
now = time.time()
mufs_c.cfs(X, y)
cfs = time.time() - now
cfs_r = len(mufs_c.get_results())
now = time.time()
mufs_f.fcbf(X, y, 1e-5)
fcbf = time.time() - now
fcbf_r = len(mufs_f.get_results())
print(
f"{i:30s} {iwss:.4f}({iwss_r:2d}) {cfs:.4f}({cfs_r:2d}) {fcbf:.4f}"
f"({fcbf_r:2d})"
)
iwss_t += iwss
iwss_tl += iwss_r
cfs_t += cfs
cfs_tl += cfs_r
fcbf_t += fcbf
fcbf_tl += fcbf_r
num = len(list(datasets))
iwss_t /= num
iwss_tl /= num
cfs_t /= num
cfs_tl /= num
fcbf_t /= num
fcbf_tl /= num
print(
f"{'Average ..: ':30s} {iwss_t:.4f}({iwss_tl:.2f}) {cfs_t:.4f}"
f"({cfs_tl:.2f}) {fcbf_t:.4f}({fcbf_tl:.2f})"
)

View File

@@ -5,3 +5,5 @@ model=ODTE
stratified=0
# Source of data Tanveer/Surcov
source_data=Tanveer
seeds=[57, 31, 1714, 17, 23, 79, 83, 97, 7, 1]
discretize=0

View File

@@ -0,0 +1,8 @@
score=accuracy
platform=MacBookpro16
n_folds=5
model=ODTE
stratified=0
source_data=Arff
seeds=[271, 314, 171]
discretize=1

View File

@@ -5,3 +5,5 @@ model=ODTE
stratified=0
# Source of data Tanveer/Surcov
source_data=Tanveer
seeds=[57, 31, 1714, 17, 23, 79, 83, 97, 7, 1]
discretize=0

View File

@@ -5,3 +5,5 @@ model=ODTE
stratified=0
# Source of data Tanveer/Surcov
source_data=Surcov
seeds=[57, 31, 1714, 17, 23, 79, 83, 97, 7, 1]
discretize=0

benchmark/tests/.gitignore (new file)
View File

@@ -0,0 +1,2 @@
ReportDatasets.xlsx
some_results.xlsx

View File

@@ -0,0 +1,121 @@
import os
from io import StringIO
from unittest.mock import patch
from .TestBase import TestBase
from ..Arguments import Arguments, ALL_METRICS, NO_ENV
class ArgumentsTest(TestBase):
def build_args(self):
arguments = Arguments()
arguments.xset("n_folds").xset("model", mandatory=True)
arguments.xset("key", required=True)
return arguments
def test_build_hyperparams_file(self):
expected_metrics = (
"accuracy",
"f1-macro",
"f1-micro",
"f1-weighted",
"roc-auc-ovr",
)
self.assertSequenceEqual(ALL_METRICS, expected_metrics)
def test_parameters(self):
expected_parameters = {
"best_paramfile": ("-b", "--best_paramfile"),
"color": ("-c", "--color"),
"compare": ("-c", "--compare"),
"dataset": ("-d", "--dataset"),
"excel": ("-x", "--excel"),
"grid_paramfile": ("-g", "--grid_paramfile"),
"hidden": ("--hidden",),
"hyperparameters": ("-p", "--hyperparameters"),
"key": ("-k", "--key"),
"lose": ("-l", "--lose"),
"model": ("-m", "--model"),
"model1": ("-m1", "--model1"),
"model2": ("-m2", "--model2"),
"nan": ("--nan",),
"number": ("-n", "--number"),
"n_folds": ("-n", "--n_folds"),
"platform": ("-P", "--platform"),
"quiet": ("-q", "--quiet"),
"report": ("-r", "--report"),
"score": ("-s", "--score"),
"sql": ("-q", "--sql"),
"stratified": ("-t", "--stratified"),
"tex_output": ("-t", "--tex-output"),
"title": ("--title",),
"win": ("-w", "--win"),
}
arg = Arguments()
for key, value in expected_parameters.items():
self.assertSequenceEqual(arg.parameters[key][0], value, key)
def test_xset(self):
arguments = self.build_args()
test_args = ["-n", "3", "--model", "SVC", "-k", "metric"]
args = arguments.parse(test_args)
self.assertEqual(args.n_folds, 3)
self.assertEqual(args.model, "SVC")
self.assertEqual(args.key, "metric")
@patch("sys.stderr", new_callable=StringIO)
def test_xset_mandatory(self, stderr):
arguments = self.build_args()
test_args = ["-n", "3", "-k", "date"]
with self.assertRaises(SystemExit):
arguments.parse(test_args)
self.assertRegex(
stderr.getvalue(),
r"error: the following arguments are required: -m/--model",
)
@patch("sys.stderr", new_callable=StringIO)
def test_xset_required(self, stderr):
arguments = self.build_args()
test_args = ["-n", "3", "-m", "SVC"]
with self.assertRaises(SystemExit):
arguments.parse(test_args)
self.assertRegex(
stderr.getvalue(),
r"error: the following arguments are required: -k/--key",
)
@patch("sys.stderr", new_callable=StringIO)
def test_no_env(self, stderr):
path = os.getcwd()
os.chdir("..")
try:
self.build_args()
except SystemExit:
pass
finally:
os.chdir(path)
self.assertEqual(stderr.getvalue(), f"{NO_ENV}\n")
@patch("sys.stderr", new_callable=StringIO)
def test_overrides(self, stderr):
arguments = self.build_args()
arguments.xset("title")
arguments.xset("dataset", overrides="title", const="sample text")
test_args = ["-n", "3", "-m", "SVC", "-k", "1", "-d", "dataset"]
args = arguments.parse(test_args)
self.assertEqual(stderr.getvalue(), "")
self.assertEqual(args.title, "sample text")
@patch("sys.stderr", new_callable=StringIO)
def test_overrides_no_args(self, stderr):
arguments = self.build_args()
arguments.xset("title")
arguments.xset("dataset", overrides="title", const="sample text")
test_args = None
with self.assertRaises(SystemExit):
arguments.parse(test_args)
self.assertRegex(
stderr.getvalue(),
r"error: the following arguments are required: -m/--model, "
"-k/--key, --title",
)

View File

@@ -3,24 +3,21 @@ from io import StringIO
from unittest.mock import patch
from openpyxl import load_workbook
from .TestBase import TestBase
from ..Utils import Folders, Files
from ..Utils import Folders, Files, NO_RESULTS
from ..Results import Benchmark
from .._version import __version__
class BenchmarkTest(TestBase):
def tearDown(self) -> None:
benchmark = Benchmark("accuracy", visualize=False)
files = [
"exreport_accuracy.csv",
"exreport_accuracy.txt",
"exreport_accuracy.xlsx",
"exreport_err_accuracy.txt",
"exreport_err_unknown.txt",
"exreport_unknown.csv",
"exreport_unknown.txt",
"Rplots.pdf",
benchmark.get_tex_file(),
]
files = []
for score in ["accuracy", "unknown"]:
files.append(Files.exreport(score))
files.append(Files.exreport_output(score))
files.append(Files.exreport_err(score))
files.append(Files.exreport_excel(score))
files.append(Files.exreport_pdf)
files.append(Files.tex_output("accuracy"))
self.remove_files(files, Folders.exreport)
self.remove_files(files, ".")
return super().tearDown()
@@ -29,27 +26,25 @@ class BenchmarkTest(TestBase):
benchmark = Benchmark("accuracy", visualize=False)
benchmark.compile_results()
benchmark.save_results()
self.check_file_file(
benchmark.get_result_file_name(), "exreport_csv.test"
)
self.check_file_file(benchmark.get_result_file_name(), "exreport_csv")
def test_exreport_report(self):
benchmark = Benchmark("accuracy", visualize=False)
benchmark.compile_results()
benchmark.save_results()
with patch(self.output, new=StringIO()) as fake_out:
with patch(self.output, new=StringIO()) as stdout:
benchmark.report(tex_output=False)
self.check_output_file(fake_out, "exreport_report.test")
self.check_output_file(stdout, "exreport_report")
def test_exreport(self):
benchmark = Benchmark("accuracy", visualize=False)
benchmark.compile_results()
benchmark.save_results()
with patch(self.output, new=StringIO()) as fake_out:
with patch(self.output, new=StringIO()) as stdout:
benchmark.exreport()
with open(os.path.join(self.test_files, "exreport.test")) as f:
expected_t = f.read()
computed_t = fake_out.getvalue()
computed_t = stdout.getvalue()
computed_t = computed_t.split("\n")
computed_t.pop(0)
for computed, expected in zip(computed_t, expected_t.split("\n")):
@@ -70,24 +65,39 @@ class BenchmarkTest(TestBase):
self.assertFalse(os.path.exists(Folders.report))
def test_exreport_error(self):
benchmark = Benchmark("unknown", visualize=False)
benchmark = Benchmark("accuracy", visualize=False)
benchmark.compile_results()
benchmark.save_results()
with patch(self.output, new=StringIO()) as fake_out:
# Make Rscript exreport fail
benchmark._score = "unknown"
with patch(self.output, new=StringIO()) as stdout:
benchmark.exreport()
self.check_output_file(fake_out, "exreport_error.test")
self.check_output_file(stdout, "exreport_error")
def test_exreport_no_data(self):
benchmark = Benchmark("f1-weighted", visualize=False)
with self.assertRaises(ValueError) as msg:
benchmark.compile_results()
self.assertEqual(str(msg.exception), NO_RESULTS)
def test_tex_output(self):
benchmark = Benchmark("accuracy", visualize=False)
benchmark.compile_results()
benchmark.save_results()
with patch(self.output, new=StringIO()) as fake_out:
with patch(self.output, new=StringIO()) as stdout:
benchmark.report(tex_output=True)
with open(os.path.join(self.test_files, "exreport_report.test")) as f:
expected = f.read()
self.assertEqual(fake_out.getvalue(), expected)
self.check_output_file(stdout, "exreport_report")
self.assertTrue(os.path.exists(benchmark.get_tex_file()))
self.check_file_file(benchmark.get_tex_file(), "exreport_tex.test")
self.check_file_file(benchmark.get_tex_file(), "exreport_tex")
@staticmethod
def generate_excel_sheet(test, sheet, file_name):
with open(os.path.join("test_files", file_name), "w") as f:
for row in range(1, sheet.max_row + 1):
for col in range(1, sheet.max_column + 1):
value = sheet.cell(row=row, column=col).value
if value is not None:
print(f'{row};{col};"{value}"', file=f)
def test_excel_output(self):
benchmark = Benchmark("accuracy", visualize=False)
@@ -98,9 +108,16 @@ class BenchmarkTest(TestBase):
benchmark.excel()
file_name = benchmark.get_excel_file_name()
book = load_workbook(file_name)
replace = None
with_this = None
for sheet_name in book.sheetnames:
sheet = book[sheet_name]
self.check_excel_sheet(sheet, f"exreport_excel_{sheet_name}.test")
# ExcelTest.generate_excel_sheet(
# self, sheet, f"exreport_excel_{sheet_name}.test"
# )
if sheet_name == "Datasets":
replace = self.benchmark_version
with_this = __version__
self.check_excel_sheet(
sheet,
f"exreport_excel_{sheet_name}",
replace=replace,
with_this=with_this,
)

View File

@@ -1,6 +1,7 @@
import os
from .TestBase import TestBase
from ..Experiments import BestResults, Datasets
from ..Experiments import BestResults
from ..Datasets import Datasets
class BestResultTest(TestBase):
@@ -8,7 +9,7 @@ class BestResultTest(TestBase):
expected = {
"balance-scale": [
0.98,
{"splitter": "iwss", "max_features": "auto"},
{"splitter": "best", "max_features": "auto"},
"results_accuracy_STree_iMac27_2021-10-27_09:40:40_0.json",
],
"balloons": [
@@ -62,3 +63,12 @@ class BestResultTest(TestBase):
best.fill({}),
{"balance-scale": (0.0, {}, ""), "balloons": (0.0, {}, "")},
)
def test_build_error(self):
dt = Datasets()
model = "SVC"
best = BestResults(
score="accuracy", model=model, datasets=dt, quiet=True
)
with self.assertRaises(ValueError):
best.build()

View File

@@ -1,6 +1,7 @@
import shutil
from .TestBase import TestBase
from ..Experiments import Randomized, Datasets
from ..Experiments import Randomized
from ..Datasets import Datasets
class DatasetTest(TestBase):
@@ -22,12 +23,31 @@ class DatasetTest(TestBase):
def test_Randomized(self):
expected = [57, 31, 1714, 17, 23, 79, 83, 97, 7, 1]
self.assertSequenceEqual(Randomized.seeds, expected)
self.assertSequenceEqual(Randomized.seeds(), expected)
def test_Randomized_3_seeds(self):
self.set_env(".env.arff")
expected = [271, 314, 171]
self.assertSequenceEqual(Randomized.seeds(), expected)
def test_load_dataframe(self):
self.set_env(".env.arff")
dt = Datasets()
X, y = dt.load("iris", dataframe=False)
dataset = dt.load("iris", dataframe=True)
class_name = dt.get_class_name()
features = dt.get_features()
self.assertListEqual(y.tolist(), dataset[class_name].tolist())
for i in range(len(features)):
self.assertListEqual(
X[:, i].tolist(), dataset[features[i]].tolist()
)
def test_Datasets_iterator(self):
test = {
".env.dist": ["balance-scale", "balloons"],
".env.surcov": ["iris", "wine"],
".env.arff": ["iris", "wine"],
}
for key, value in test.items():
self.set_env(key)
@@ -45,10 +65,28 @@ class DatasetTest(TestBase):
self.assertSequenceEqual(computed, value)
self.set_env(".env.dist")
def test_load_dataset(self):
dt = Datasets()
X, y = dt.load("balance-scale")
self.assertSequenceEqual(X.shape, (625, 4))
self.assertSequenceEqual(y.shape, (625,))
def test_create_with_unknown_dataset(self):
with self.assertRaises(ValueError) as msg:
Datasets("unknown")
self.assertEqual(str(msg.exception), "Unknown dataset: unknown")
def test_load_unknown_dataset(self):
dt = Datasets()
with self.assertRaises(ValueError) as msg:
dt.load("unknown")
self.assertEqual(str(msg.exception), "Unknown dataset: unknown")
def test_Datasets_subset(self):
test = {
".env.dist": "balloons",
".env.surcov": "wine",
".env.arff": "iris",
}
for key, value in test.items():
self.set_env(key)

View File

@@ -23,7 +23,7 @@ class ExcelTest(TestBase):
file_output = report.get_file_name()
book = load_workbook(file_output)
sheet = book["STree"]
self.check_excel_sheet(sheet, "excel_compared.test")
self.check_excel_sheet(sheet, "excel_compared")
def test_report_excel(self):
file_name = "results_accuracy_STree_iMac27_2021-09-30_11:42:07_0.json"
@@ -32,7 +32,7 @@ class ExcelTest(TestBase):
file_output = report.get_file_name()
book = load_workbook(file_output)
sheet = book["STree"]
self.check_excel_sheet(sheet, "excel.test")
self.check_excel_sheet(sheet, "excel")
def test_Excel_Add_sheet(self):
file_name = "results_accuracy_STree_iMac27_2021-10-27_09:40:40_0.json"
@@ -48,6 +48,6 @@ class ExcelTest(TestBase):
book.close()
book = load_workbook(os.path.join(Folders.results, excel_file_name))
sheet = book["STree"]
self.check_excel_sheet(sheet, "excel_add_STree.test")
self.check_excel_sheet(sheet, "excel_add_STree")
sheet = book["ODTE"]
self.check_excel_sheet(sheet, "excel_add_ODTE.test")
self.check_excel_sheet(sheet, "excel_add_ODTE")

View File

@@ -1,6 +1,7 @@
import json
from .TestBase import TestBase
from ..Experiments import Experiment, Datasets
from ..Experiments import Experiment
from ..Datasets import Datasets
class ExperimentTest(TestBase):
@@ -36,7 +37,7 @@ class ExperimentTest(TestBase):
expected = {
"balance-scale": [
0.98,
{"splitter": "iwss", "max_features": "auto"},
{"splitter": "best", "max_features": "auto"},
"results_accuracy_STree_iMac27_2021-10-27_09:40:40_0.json",
],
"balloons": [

View File

@@ -1,6 +1,7 @@
import json
from .TestBase import TestBase
from ..Experiments import GridSearch, Datasets
from ..Experiments import GridSearch
from ..Datasets import Datasets
class GridSearchTest(TestBase):
@@ -37,7 +38,8 @@ class GridSearchTest(TestBase):
],
".",
)
_ = self.build_exp()
grid = self.build_exp()
grid._init_data()
# check the output file is initialized
with open(file_name) as f:
data = json.load(f)
@@ -76,7 +78,9 @@ class GridSearchTest(TestBase):
"v. 1.2.4, Computed on Test on 2022-02-22 at 12:00:00 took 1s",
],
}
self.assertSequenceEqual(data, expected)
for key, value in expected.items():
self.assertEqual(data[key][0], value[0])
self.assertDictEqual(data[key][1], value[1])
def test_duration_message(self):
expected = ["47.234s", "5.421m", "1.177h"]

View File

@@ -15,6 +15,8 @@ from odte import Odte
from xgboost import XGBClassifier
from .TestBase import TestBase
from ..Models import Models
import xgboost
import sklearn
class ModelTest(TestBase):
@@ -33,6 +35,38 @@ class ModelTest(TestBase):
for key, value in test.items():
self.assertIsInstance(Models.get_model(key), value)
def test_Models_version(self):
def ver_stree():
return "1.2.3"
def ver_wodt():
return "h.j.k"
def ver_odte():
return "4.5.6"
test = {
"STree": [ver_stree, "1.2.3"],
"Wodt": [ver_wodt, "h.j.k"],
"ODTE": [ver_odte, "4.5.6"],
"RandomForest": [None, "7.8.9"],
"BaggingStree": [None, "x.y.z"],
"AdaBoostStree": [None, "w.x.z"],
"XGBoost": [None, "10.11.12"],
}
for key, value in test.items():
clf = Models.get_model(key)
if key in ["STree", "Wodt", "ODTE"]:
clf.version = value[0]
elif key == "XGBoost":
xgboost.__version__ = value[1]
else:
sklearn.__version__ = value[1]
self.assertEqual(Models.get_version(key, clf), value[1])
def test_bogus_Model_Version(self):
self.assertEqual(Models.get_version("unknown", None), "Error")
def test_BaggingStree(self):
clf = Models.get_model("BaggingStree")
self.assertIsInstance(clf, BaggingClassifier)
@@ -80,7 +114,6 @@ class ModelTest(TestBase):
"GBC": ((15, 8, 3), 1.0),
}
X, y = load_wine(return_X_y=True)
print("")
for key, (value, score_expected) in test.items():
clf = Models.get_model(key, random_state=1)
clf.fit(X, y)
@@ -91,5 +124,16 @@ class ModelTest(TestBase):
# score_expected,
# score_computed,
# )
self.assertSequenceEqual(Models.get_complexity(key, clf), value)
# Fix flaky test
if key == "AdaBoostStree":
# computed values
a_c, b_c, c_c = Models.get_complexity(key, clf)
# expected values
a_e, b_e, c_e = value
for c, e in zip((a_c, b_c, c_c), (a_e, b_e, c_e)):
self.assertAlmostEqual(c, e, delta=0.25)
else:
self.assertSequenceEqual(
Models.get_complexity(key, clf), value
)
self.assertEqual(score_computed, score_expected, key)

View File

@@ -1,4 +1,3 @@
import os
from io import StringIO
from unittest.mock import patch
from .TestBase import TestBase
@@ -19,35 +18,32 @@ class PairCheckTest(TestBase):
def test_pair_check(self):
report = self.build_model(model1="ODTE", model2="STree")
report.compute()
with patch(self.output, new=StringIO()) as fake_out:
with patch(self.output, new=StringIO()) as stdout:
report.report()
computed = fake_out.getvalue()
with open(os.path.join(self.test_files, "paircheck.test"), "r") as f:
expected = f.read()
self.assertEqual(computed, expected)
self.check_output_file(stdout, "paircheck")
def test_pair_check_win(self):
report = self.build_model(win=True)
report.compute()
with patch(self.output, new=StringIO()) as fake_out:
with patch(self.output, new=StringIO()) as stdout:
report.report()
self.check_output_file(fake_out, "paircheck_win.test")
self.check_output_file(stdout, "paircheck_win")
def test_pair_check_lose(self):
report = self.build_model(
model1="RandomForest", model2="STree", lose=True
)
report.compute()
with patch(self.output, new=StringIO()) as fake_out:
with patch(self.output, new=StringIO()) as stdout:
report.report()
self.check_output_file(fake_out, "paircheck_lose.test")
self.check_output_file(stdout, "paircheck_lose")
def test_pair_check_win_lose(self):
report = self.build_model(win=True, lose=True)
report.compute()
with patch(self.output, new=StringIO()) as fake_out:
with patch(self.output, new=StringIO()) as stdout:
report.report()
self.check_output_file(fake_out, "paircheck_win_lose.test")
self.check_output_file(stdout, "paircheck_win_lose")
def test_pair_check_store_result(self):
report = self.build_model(win=True, lose=True)

View File

@@ -1,16 +1,20 @@
import os
from io import StringIO
from unittest.mock import patch
from .TestBase import TestBase
from ..Results import Report, BaseReport, ReportBest
from ..Results import Report, BaseReport, ReportBest, ReportDatasets, get_input
from ..Utils import Symbols
class ReportTest(TestBase):
def test_get_input(self):
self.assertEqual(get_input(is_test=True), "test")
def test_BaseReport(self):
with patch.multiple(BaseReport, __abstractmethods__=set()):
file_name = (
"results/results_accuracy_STree_iMac27_2021-09-30_11:"
"42:07_0.json"
file_name = os.path.join(
"results",
"results_accuracy_STree_iMac27_2021-09-30_11:42:07_0.json",
)
a = BaseReport(file_name)
self.assertIsNone(a.header())
@@ -19,21 +23,23 @@ class ReportTest(TestBase):
def test_report_with_folder(self):
report = Report(
file_name="results/results_accuracy_STree_iMac27_2021-09-30_11:"
"42:07_0.json"
file_name=os.path.join(
"results",
"results_accuracy_STree_iMac27_2021-09-30_11:42:07_0.json",
)
with patch(self.output, new=StringIO()) as fake_out:
)
with patch(self.output, new=StringIO()) as stdout:
report.report()
self.check_output_file(fake_out, "report.test")
self.check_output_file(stdout, "report")
def test_report_without_folder(self):
report = Report(
file_name="results_accuracy_STree_iMac27_2021-09-30_11:42:07_0"
".json"
)
with patch(self.output, new=StringIO()) as fake_out:
with patch(self.output, new=StringIO()) as stdout:
report.report()
self.check_output_file(fake_out, "report.test")
self.check_output_file(stdout, "report")
def test_report_compared(self):
report = Report(
@@ -41,9 +47,9 @@ class ReportTest(TestBase):
".json",
compare=True,
)
with patch(self.output, new=StringIO()) as fake_out:
with patch(self.output, new=StringIO()) as stdout:
report.report()
self.check_output_file(fake_out, "report_compared.test")
self.check_output_file(stdout, "report_compared")
def test_compute_status(self):
file_name = "results_accuracy_STree_iMac27_2021-10-27_09:40:40_0.json"
@@ -63,19 +69,37 @@ class ReportTest(TestBase):
_ = Report("unknown_file")
def test_report_best(self):
report = ReportBest("accuracy", "STree", best=True, grid=False)
with patch(self.output, new=StringIO()) as fake_out:
report = ReportBest("accuracy", "STree", best=True)
with patch(self.output, new=StringIO()) as stdout:
report.report()
self.check_output_file(fake_out, "report_best.test")
self.check_output_file(stdout, "report_best")
def test_report_grid(self):
report = ReportBest("accuracy", "STree", best=False, grid=True)
with patch(self.output, new=StringIO()) as fake_out:
report = ReportBest("accuracy", "STree", best=False)
with patch(self.output, new=StringIO()) as stdout:
report.report()
self.check_output_file(fake_out, "report_grid.test")
file_name = "report_grid.test"
with open(os.path.join(self.test_files, file_name)) as f:
expected = f.read().splitlines()
output_text = stdout.getvalue().splitlines()
# Compare replacing STree version
for index, line in enumerate(expected):
if self.stree_version in line:
# replace STree version
line = self.replace_STree_version(line, output_text, index)
def test_report_best_both(self):
report = ReportBest("accuracy", "STree", best=True, grid=True)
with patch(self.output, new=StringIO()) as fake_out:
self.assertEqual(line, output_text[index])
@patch("sys.stdout", new_callable=StringIO)
def test_report_datasets(self, mock_output):
report = ReportDatasets()
report.report()
self.check_output_file(fake_out, "report_best.test")
file_name = f"report_datasets{self.ext}"
with open(os.path.join(self.test_files, file_name)) as f:
expected = f.read()
output_text = mock_output.getvalue().splitlines()
for index, line in enumerate(expected.splitlines()):
if self.benchmark_version in line:
# replace benchmark version
line = self.replace_benchmark_version(line, output_text, index)
self.assertEqual(line, output_text[index])

View File

@@ -19,4 +19,4 @@ class SQLTest(TestBase):
file_name = os.path.join(
Folders.results, file_name.replace(".json", ".sql")
)
self.check_file_file(file_name, "sql.test")
self.check_file_file(file_name, "sql")

View File

@@ -2,6 +2,7 @@ from io import StringIO
from unittest.mock import patch
from .TestBase import TestBase
from ..Results import Summary
from ..Utils import NO_RESULTS
class SummaryTest(TestBase):
@@ -130,60 +131,60 @@ class SummaryTest(TestBase):
def test_summary_list_results_model(self):
report = Summary()
report.acquire()
with patch(self.output, new=StringIO()) as fake_out:
with patch(self.output, new=StringIO()) as stdout:
report.list_results(model="STree")
self.check_output_file(fake_out, "summary_list_model.test")
self.check_output_file(stdout, "summary_list_model")
def test_summary_list_results_score(self):
report = Summary()
report.acquire()
with patch(self.output, new=StringIO()) as fake_out:
with patch(self.output, new=StringIO()) as stdout:
report.list_results(score="accuracy")
self.check_output_file(fake_out, "summary_list_score.test")
self.check_output_file(stdout, "summary_list_score")
def test_summary_list_results_n(self):
report = Summary()
report.acquire()
with patch(self.output, new=StringIO()) as fake_out:
with patch(self.output, new=StringIO()) as stdout:
report.list_results(score="accuracy", number=3)
self.check_output_file(fake_out, "summary_list_n.test")
self.check_output_file(stdout, "summary_list_n")
def test_summary_list_hidden(self):
report = Summary(hidden=True)
report.acquire()
with patch(self.output, new=StringIO()) as fake_out:
with patch(self.output, new=StringIO()) as stdout:
report.list_results(score="accuracy")
self.check_output_file(fake_out, "summary_list_hidden.test")
self.check_output_file(stdout, "summary_list_hidden")
def test_show_result_no_title(self):
report = Summary()
report.acquire()
with patch(self.output, new=StringIO()) as fake_out:
with patch(self.output, new=StringIO()) as stdout:
title = ""
best = report.best_result(
criterion="model", value="STree", score="accuracy"
)
report.show_result(data=best, title=title)
self.check_output_file(fake_out, "summary_show_results.test")
self.check_output_file(stdout, "summary_show_results")
def test_show_result_title(self):
report = Summary()
report.acquire()
with patch(self.output, new=StringIO()) as fake_out:
with patch(self.output, new=StringIO()) as stdout:
title = "**Title**"
best = report.best_result(
criterion="model", value="STree", score="accuracy"
)
report.show_result(data=best, title=title)
self.check_output_file(fake_out, "summary_show_results_title.test")
self.check_output_file(stdout, "summary_show_results_title")
def test_show_result_no_data(self):
report = Summary()
report.acquire()
with patch(self.output, new=StringIO()) as fake_out:
with patch(self.output, new=StringIO()) as stdout:
title = "**Test**"
report.show_result(data={}, title=title)
computed = fake_out.getvalue()
computed = stdout.getvalue()
expected = "** **Test** has No data **\n"
self.assertEqual(computed, expected)
@@ -212,6 +213,20 @@ class SummaryTest(TestBase):
def test_show_top(self):
report = Summary()
report.acquire()
with patch(self.output, new=StringIO()) as fake_out:
with patch(self.output, new=StringIO()) as stdout:
report.show_top()
self.check_output_file(fake_out, "summary_show_top.test")
self.check_output_file(stdout, "summary_show_top")
@patch("sys.stdout", new_callable=StringIO)
def test_show_top_no_data(self, stdout):
report = Summary()
report.acquire()
report.show_top(score="f1-macro")
self.assertEqual(stdout.getvalue(), f"{NO_RESULTS}\n")
def test_no_data(self):
report = Summary()
report.acquire()
with self.assertRaises(ValueError) as msg:
report.list_results(score="f1-macro", model="STree")
self.assertEqual(str(msg.exception), NO_RESULTS)

View File

@@ -1,6 +1,12 @@
import os
import glob
import pathlib
import sys
import csv
import unittest
from importlib import import_module
from io import StringIO
from unittest.mock import patch
class TestBase(unittest.TestCase):
@@ -8,6 +14,9 @@ class TestBase(unittest.TestCase):
os.chdir(os.path.dirname(os.path.abspath(__file__)))
self.test_files = "test_files"
self.output = "sys.stdout"
self.ext = ".test"
self.benchmark_version = "0.2.0"
self.stree_version = "1.2.4"
super().__init__(*args, **kwargs)
def remove_files(self, files, folder):
@@ -24,7 +33,10 @@ class TestBase(unittest.TestCase):
if value is not None:
print(f'{row};{col};"{value}"', file=f)
def check_excel_sheet(self, sheet, file_name):
def check_excel_sheet(
self, sheet, file_name, replace=None, with_this=None
):
file_name += self.ext
with open(os.path.join(self.test_files, file_name), "r") as f:
expected = csv.reader(f, delimiter=";")
for row, col, value in expected:
@@ -35,16 +47,63 @@ class TestBase(unittest.TestCase):
value = float(value)
except ValueError:
pass
if replace is not None and isinstance(value, str):
if replace in value:
value = value.replace(replace, with_this)
self.assertEqual(sheet.cell(int(row), int(col)).value, value)
def check_output_file(self, output, file_name):
file_name += self.ext
with open(os.path.join(self.test_files, file_name)) as f:
expected = f.read()
self.assertEqual(output.getvalue(), expected)
def replace_STree_version(self, line, output, index):
idx = line.find(self.stree_version)
return line.replace(self.stree_version, output[index][idx : idx + 5])
def replace_benchmark_version(self, line, output, index):
idx = line.find(self.benchmark_version)
return line.replace(
self.benchmark_version, output[index][idx : idx + 5]
)
def check_file_file(self, computed_file, expected_file):
with open(computed_file) as f:
computed = f.read()
expected_file += self.ext
with open(os.path.join(self.test_files, expected_file)) as f:
expected = f.read()
self.assertEqual(computed, expected)
def check_output_lines(self, stdout, file_name, lines_to_compare):
with open(os.path.join(self.test_files, f"{file_name}.test")) as f:
expected = f.read()
computed_data = stdout.getvalue().splitlines()
n_line = 0
# compare only report lines without date, time, duration...
for expected_line, computed in zip(expected.splitlines(), computed_data):
if n_line in lines_to_compare:
self.assertEqual(computed, expected_line, n_line)
n_line += 1
def prepare_scripts_env(self):
self.scripts_folder = os.path.join(
os.path.dirname(os.path.abspath(__file__)), "..", "scripts"
)
sys.path.append(self.scripts_folder)
def search_script(self, name):
py_files = glob.glob(os.path.join(self.scripts_folder, "*.py"))
for py_file in py_files:
module_name = pathlib.Path(py_file).stem
if name == module_name:
module = import_module(module_name)
return module
@patch("sys.stdout", new_callable=StringIO)
@patch("sys.stderr", new_callable=StringIO)
def execute_script(self, script, args, stderr, stdout):
module = self.search_script(script)
module.main(args)
return stdout, stderr
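Together, prepare_scripts_env, search_script and execute_script let the new script tests stay tiny; a sketch of the pattern (script name and expected output are illustrative only):

class BeListTest(TestBase):
    def test_be_list(self):
        self.prepare_scripts_env()
        stdout, stderr = self.execute_script("be_list", ["-m", "STree"])
        self.assertIn("STree", stdout.getvalue())
        self.assertEqual(stderr.getvalue(), "")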

View File

@@ -129,7 +129,11 @@ class UtilTest(TestBase):
)
self.assertCountEqual(
Files().get_all_results(hidden=True),
["results_accuracy_STree_iMac27_2021-11-01_23:55:16_0.json"],
[
"results_accuracy_STree_iMac27_2021-11-01_23:55:16_0.json",
"results_accuracy_XGBoost_MacBookpro16_2022-05-04_11:00:35_"
"0.json",
],
)
def test_Files_get_results_Error(self):
@@ -174,6 +178,8 @@ class UtilTest(TestBase):
"model": "ODTE",
"stratified": "0",
"source_data": "Tanveer",
"seeds": "[57, 31, 1714, 17, 23, 79, 83, 97, 7, 1]",
"discretize": "0",
}
computed = EnvData().load()
self.assertDictEqual(computed, expected)
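Since EnvData().load() returns every value as a raw string, consumers have to convert; a sketch (assuming EnvData lives in benchmark.Utils as these tests suggest; json.loads works here because the seeds value is a valid JSON list):

import json
from benchmark.Utils import EnvData  # assumed location

env = EnvData().load()
seeds = json.loads(env["seeds"])       # "[57, 31, ...]" -> list of ints
discretize = env["discretize"] == "1"  # "0"/"1" flag -> bool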

View File

@@ -10,6 +10,17 @@ from .SQL_test import SQLTest
from .Benchmark_test import BenchmarkTest
from .Summary_test import SummaryTest
from .PairCheck_test import PairCheckTest
from .Arguments_test import ArgumentsTest
from .scripts.Be_Pair_check_test import BePairCheckTest
from .scripts.Be_List_test import BeListTest
from .scripts.Be_Init_Project_test import BeInitProjectTest
from .scripts.Be_Report_test import BeReportTest
from .scripts.Be_Summary_test import BeSummaryTest
from .scripts.Be_Grid_test import BeGridTest
from .scripts.Be_Best_test import BeBestTest
from .scripts.Be_Benchmark_test import BeBenchmarkTest
from .scripts.Be_Main_test import BeMainTest
from .scripts.Be_Print_Strees_test import BePrintStrees
all = [
"UtilTest",
@@ -24,5 +35,14 @@ all = [
"BenchmarkTest",
"SummaryTest",
"PairCheckTest",
"be_list",
"ArgumentsTest",
"BePairCheckTest",
"BeListTest",
"BeReportTest",
"BeSummaryTest",
"BeGridTest",
"BeBestTest",
"BeBenchmarkTest",
"BeMainTest",
"BePrintStrees",
]

View File

@@ -1,2 +1,2 @@
iris
wine
iris,class
wine,class

View File

@@ -0,0 +1,305 @@
% 1. Title: Hayes-Roth & Hayes-Roth (1977) Database
%
% 2. Source Information:
% (a) Creators: Barbara and Frederick Hayes-Roth
% (b) Donor: David W. Aha (aha@ics.uci.edu) (714) 856-8779
% (c) Date: March, 1989
%
% 3. Past Usage:
% 1. Hayes-Roth, B., & Hayes-Roth, F. (1977). Concept learning and the
% recognition and classification of exemplars. Journal of Verbal Learning
% and Verbal Behavior, 16, 321-338.
% -- Results:
% -- Human subjects classification and recognition performance:
% 1. decreases with distance from the prototype,
% 2. is better on unseen prototypes than old instances, and
% 3. improves with presentation frequency during learning.
% 2. Anderson, J.R., & Kline, P.J. (1979). A learning system and its
% psychological implications. In Proceedings of the Sixth International
% Joint Conference on Artificial Intelligence (pp. 16-21). Tokyo, Japan:
% Morgan Kaufmann.
% -- Partitioned the results into 4 classes:
% 1. prototypes
% 2. near-prototypes with high presentation frequency during learning
% 3. near-prototypes with low presentation frequency during learning
% 4. instances that are far from prototypes
% -- Described evidence that ACT's classification confidence and
% recognition behaviors closely simulated human subjects' behaviors.
% 3. Aha, D.W. (1989). Incremental learning of independent, overlapping, and
% graded concept descriptions with an instance-based process framework.
% Manuscript submitted for publication.
% -- Used same partition as Anderson & Kline
% -- Described evidence that Bloom's classification confidence behavior
% is similar to the human subjects' behavior. Bloom fitted the data
% more closely than did ACT.
%
% 4. Relevant Information:
% This database contains 5 numeric-valued attributes. Only a subset of
% 3 are used during testing (the latter 3). Furthermore, only 2 of the
% 3 concepts are "used" during testing (i.e., those with the prototypes
% 000 and 111). I've mapped all values to their zero-indexing equivalents.
%
% Some instances could be placed in either category 0 or 1. I've followed
% the authors' suggestion, placing them in each category with equal
% probability.
%
% I've replaced the actual values of the attributes (i.e., hobby has values
% chess, sports and stamps) with numeric values. I think this is how
% the authors' did this when testing the categorization models described
% in the paper. I find this unfair. While the subjects were able to bring
% background knowledge to bear on the attribute values and their
% relationships, the algorithms were provided with no such knowledge. I'm
% uncertain whether the 2 distractor attributes (name and hobby) are
% presented to the authors' algorithms during testing. However, it is clear
% that only the age, educational status, and marital status attributes are
% given during the human subjects' transfer tests.
%
% 5. Number of Instances: 132 training instances, 28 test instances
%
% 6. Number of Attributes: 5 plus the class membership attribute. 3 concepts.
%
% 7. Attribute Information:
% -- 1. name: distinct for each instance and represented numerically
% -- 2. hobby: nominal values ranging between 1 and 3
% -- 3. age: nominal values ranging between 1 and 4
% -- 4. educational level: nominal values ranging between 1 and 4
% -- 5. marital status: nominal values ranging between 1 and 4
% -- 6. class: nominal value between 1 and 3
%
% 9. Missing Attribute Values: none
%
% 10. Class Distribution: see below
%
% 11. Detailed description of the experiment:
% 1. 3 categories (1, 2, and neither -- which I call 3)
% -- some of the instances could be classified in either class 1 or 2, and
% they have been evenly distributed between the two classes
% 2. 5 Attributes
% -- A. name (a randomly-generated number between 1 and 132)
% -- B. hobby (a randomly-generated number between 1 and 3)
% -- C. age (a number between 1 and 4)
% -- D. education level (a number between 1 and 4)
% -- E. marital status (a number between 1 and 4)
% 3. Classification:
% -- only attributes C-E are diagnostic; values for A and B are ignored
% -- Class Neither: if a 4 occurs for any attribute C-E
% -- Class 1: Otherwise, if (# of 1's)>(# of 2's) for attributes C-E
% -- Class 2: Otherwise, if (# of 2's)>(# of 1's) for attributes C-E
% -- Either 1 or 2: Otherwise, if (# of 2's)=(# of 1's) for attributes C-E
% 4. Prototypes:
% -- Class 1: 111
% -- Class 2: 222
% -- Class Either: 333
% -- Class Neither: 444
% 5. Number of training instances: 132
% -- Each instance presented 0, 1, or 10 times
% -- None of the prototypes seen during training
% -- 3 instances from each of categories 1, 2, and either are repeated
% 10 times each
% -- 3 additional instances from the Either category are shown during
% learning
% 5. Number of test instances: 28
% -- All 9 class 1
% -- All 9 class 2
% -- All 6 class Either
% -- All 4 prototypes
% --------------------
% -- 28 total
%
% Observations of interest:
% 1. Relative classification confidence of
% -- prototypes for classes 1 and 2 (2 instances)
% (Anderson calls these Class 1 instances)
% -- instances of class 1 with frequency 10 during training and
% instances of class 2 with frequency 10 during training that
% are 1 value away from their respective prototypes (6 instances)
% (Anderson calls these Class 2 instances)
% -- instances of class 1 with frequency 1 during training and
% instances of class 2 with frequency 1 during training that
% are 1 value away from their respective prototypes (6 instances)
% (Anderson calls these Class 3 instances)
% -- instances of class 1 with frequency 1 during training and
% instances of class 2 with frequency 1 during training that
% are 2 values away from their respective prototypes (6 instances)
% (Anderson calls these Class 4 instances)
% 2. Relative classification recognition of them also
%
% Some Expected results:
% Both frequency and distance from prototype will affect the classification
% accuracy of instances. Greater the frequency, higher the classification
% confidence. Closer to prototype, higher the classification confidence.
%
% Information about the dataset
% CLASSTYPE: nominal
% CLASSINDEX: last
%
@relation hayes-roth
@attribute hobby INTEGER
@attribute age INTEGER
@attribute educational_level INTEGER
@attribute marital_status INTEGER
@attribute class {1,2,3,4}
@data
2,1,1,2,1
2,1,3,2,2
3,1,4,1,3
2,4,2,2,3
1,1,3,4,3
1,1,3,2,2
3,1,3,2,2
3,4,2,4,3
2,2,1,1,1
3,2,1,1,1
1,2,1,1,1
2,2,3,4,3
1,1,2,1,1
2,1,2,2,2
2,4,1,4,3
1,1,3,3,1
3,2,1,2,2
1,2,1,1,1
3,3,2,1,1
3,1,3,2,1
1,2,2,1,2
3,2,1,3,1
2,1,2,1,1
3,2,1,3,1
2,3,2,1,1
3,2,2,1,2
3,2,1,3,2
2,1,2,2,2
1,1,3,2,1
3,2,1,1,1
1,4,1,1,3
2,2,1,3,1
1,2,1,3,2
1,1,1,2,1
2,4,3,1,3
3,1,2,2,2
1,1,2,2,2
3,2,2,1,2
1,2,1,2,2
3,4,3,2,3
2,2,2,1,2
2,2,1,2,2
3,2,1,3,2
3,2,1,1,1
3,1,2,1,1
1,2,1,3,2
2,1,1,2,1
1,1,1,2,1
1,2,2,3,2
3,3,1,1,1
3,3,3,1,1
3,2,1,2,2
3,2,1,2,2
3,1,2,1,1
1,1,1,2,1
2,1,3,2,1
2,2,2,1,2
2,1,2,1,1
2,2,1,3,1
2,1,2,2,2
1,2,4,2,3
2,2,1,2,2
1,1,2,4,3
1,3,2,1,1
2,4,4,2,3
2,3,2,1,1
3,1,2,2,2
1,1,2,2,2
1,3,2,4,3
1,1,2,2,2
3,1,4,2,3
2,1,3,2,2
1,1,3,2,2
3,1,3,2,1
1,2,4,4,3
1,4,2,1,3
2,1,2,1,1
3,4,1,2,3
2,2,1,1,1
1,1,2,1,1
2,2,4,3,3
3,1,2,2,2
1,1,3,2,1
1,2,1,3,1
1,4,4,1,3
3,3,3,2,2
2,2,1,3,2
3,3,2,1,2
1,1,1,3,1
2,2,1,2,2
2,2,2,1,2
2,3,2,3,2
1,3,2,1,2
2,2,1,2,2
1,1,1,2,1
3,2,2,1,2
3,2,1,1,1
1,1,2,1,1
3,1,4,4,3
3,3,2,1,2
2,3,2,1,2
2,1,3,1,1
1,2,1,2,2
3,1,1,2,1
2,2,4,1,3
1,2,2,1,2
2,3,2,1,2
2,2,1,4,3
1,4,2,3,3
2,2,1,1,1
1,2,1,1,1
2,2,3,2,2
1,3,2,1,1
3,1,2,1,1
3,1,1,2,1
3,3,1,4,3
2,3,4,1,3
1,2,3,3,2
3,3,2,2,2
3,3,4,2,3
1,2,2,1,2
2,1,1,4,3
3,1,2,2,2
3,2,2,4,3
2,3,1,3,1
2,1,1,2,1
3,4,1,3,3
1,1,4,3,3
2,1,2,1,1
1,2,1,2,2
1,2,2,1,2
3,1,1,2,1
1,1,1,2,1
1,1,2,1,1
1,2,1,1,1
1,1,1,3,1
1,1,3,1,1
1,3,1,1,1
1,1,3,3,1
1,3,1,3,1
1,3,3,1,1
1,2,2,1,2
1,2,1,2,2
1,1,2,2,2
1,2,2,3,2
1,2,3,2,2
1,3,2,2,2
1,2,3,3,2
1,3,2,3,2
1,3,3,2,2
1,1,3,2,1
1,3,2,1,2
1,2,1,3,1
1,2,3,1,2
1,1,2,3,1
1,3,1,2,2
1,1,1,1,1
1,2,2,2,2
1,3,3,3,1
1,4,4,4,3

View File

@@ -0,0 +1,225 @@
% 1. Title: Iris Plants Database
%
% 2. Sources:
% (a) Creator: R.A. Fisher
% (b) Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov)
% (c) Date: July, 1988
%
% 3. Past Usage:
% - Publications: too many to mention!!! Here are a few.
% 1. Fisher,R.A. "The use of multiple measurements in taxonomic problems"
% Annual Eugenics, 7, Part II, 179-188 (1936); also in "Contributions
% to Mathematical Statistics" (John Wiley, NY, 1950).
% 2. Duda,R.O., & Hart,P.E. (1973) Pattern Classification and Scene Analysis.
% (Q327.D83) John Wiley & Sons. ISBN 0-471-22361-1. See page 218.
% 3. Dasarathy, B.V. (1980) "Nosing Around the Neighborhood: A New System
% Structure and Classification Rule for Recognition in Partially Exposed
% Environments". IEEE Transactions on Pattern Analysis and Machine
% Intelligence, Vol. PAMI-2, No. 1, 67-71.
% -- Results:
% -- very low misclassification rates (0% for the setosa class)
% 4. Gates, G.W. (1972) "The Reduced Nearest Neighbor Rule". IEEE
% Transactions on Information Theory, May 1972, 431-433.
% -- Results:
% -- very low misclassification rates again
% 5. See also: 1988 MLC Proceedings, 54-64. Cheeseman et al's AUTOCLASS II
% conceptual clustering system finds 3 classes in the data.
%
% 4. Relevant Information:
% --- This is perhaps the best known database to be found in the pattern
% recognition literature. Fisher's paper is a classic in the field
% and is referenced frequently to this day. (See Duda & Hart, for
% example.) The data set contains 3 classes of 50 instances each,
% where each class refers to a type of iris plant. One class is
% linearly separable from the other 2; the latter are NOT linearly
% separable from each other.
% --- Predicted attribute: class of iris plant.
% --- This is an exceedingly simple domain.
%
% 5. Number of Instances: 150 (50 in each of three classes)
%
% 6. Number of Attributes: 4 numeric, predictive attributes and the class
%
% 7. Attribute Information:
% 1. sepal length in cm
% 2. sepal width in cm
% 3. petal length in cm
% 4. petal width in cm
% 5. class:
% -- Iris Setosa
% -- Iris Versicolour
% -- Iris Virginica
%
% 8. Missing Attribute Values: None
%
% Summary Statistics:
% Min Max Mean SD Class Correlation
% sepal length: 4.3 7.9 5.84 0.83 0.7826
% sepal width: 2.0 4.4 3.05 0.43 -0.4194
% petal length: 1.0 6.9 3.76 1.76 0.9490 (high!)
% petal width: 0.1 2.5 1.20 0.76 0.9565 (high!)
%
% 9. Class Distribution: 33.3% for each of 3 classes.
@RELATION iris
@ATTRIBUTE sepallength REAL
@ATTRIBUTE sepalwidth REAL
@ATTRIBUTE petallength REAL
@ATTRIBUTE petalwidth REAL
@ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}
@DATA
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
5.0,3.6,1.4,0.2,Iris-setosa
5.4,3.9,1.7,0.4,Iris-setosa
4.6,3.4,1.4,0.3,Iris-setosa
5.0,3.4,1.5,0.2,Iris-setosa
4.4,2.9,1.4,0.2,Iris-setosa
4.9,3.1,1.5,0.1,Iris-setosa
5.4,3.7,1.5,0.2,Iris-setosa
4.8,3.4,1.6,0.2,Iris-setosa
4.8,3.0,1.4,0.1,Iris-setosa
4.3,3.0,1.1,0.1,Iris-setosa
5.8,4.0,1.2,0.2,Iris-setosa
5.7,4.4,1.5,0.4,Iris-setosa
5.4,3.9,1.3,0.4,Iris-setosa
5.1,3.5,1.4,0.3,Iris-setosa
5.7,3.8,1.7,0.3,Iris-setosa
5.1,3.8,1.5,0.3,Iris-setosa
5.4,3.4,1.7,0.2,Iris-setosa
5.1,3.7,1.5,0.4,Iris-setosa
4.6,3.6,1.0,0.2,Iris-setosa
5.1,3.3,1.7,0.5,Iris-setosa
4.8,3.4,1.9,0.2,Iris-setosa
5.0,3.0,1.6,0.2,Iris-setosa
5.0,3.4,1.6,0.4,Iris-setosa
5.2,3.5,1.5,0.2,Iris-setosa
5.2,3.4,1.4,0.2,Iris-setosa
4.7,3.2,1.6,0.2,Iris-setosa
4.8,3.1,1.6,0.2,Iris-setosa
5.4,3.4,1.5,0.4,Iris-setosa
5.2,4.1,1.5,0.1,Iris-setosa
5.5,4.2,1.4,0.2,Iris-setosa
4.9,3.1,1.5,0.1,Iris-setosa
5.0,3.2,1.2,0.2,Iris-setosa
5.5,3.5,1.3,0.2,Iris-setosa
4.9,3.1,1.5,0.1,Iris-setosa
4.4,3.0,1.3,0.2,Iris-setosa
5.1,3.4,1.5,0.2,Iris-setosa
5.0,3.5,1.3,0.3,Iris-setosa
4.5,2.3,1.3,0.3,Iris-setosa
4.4,3.2,1.3,0.2,Iris-setosa
5.0,3.5,1.6,0.6,Iris-setosa
5.1,3.8,1.9,0.4,Iris-setosa
4.8,3.0,1.4,0.3,Iris-setosa
5.1,3.8,1.6,0.2,Iris-setosa
4.6,3.2,1.4,0.2,Iris-setosa
5.3,3.7,1.5,0.2,Iris-setosa
5.0,3.3,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.4,3.2,4.5,1.5,Iris-versicolor
6.9,3.1,4.9,1.5,Iris-versicolor
5.5,2.3,4.0,1.3,Iris-versicolor
6.5,2.8,4.6,1.5,Iris-versicolor
5.7,2.8,4.5,1.3,Iris-versicolor
6.3,3.3,4.7,1.6,Iris-versicolor
4.9,2.4,3.3,1.0,Iris-versicolor
6.6,2.9,4.6,1.3,Iris-versicolor
5.2,2.7,3.9,1.4,Iris-versicolor
5.0,2.0,3.5,1.0,Iris-versicolor
5.9,3.0,4.2,1.5,Iris-versicolor
6.0,2.2,4.0,1.0,Iris-versicolor
6.1,2.9,4.7,1.4,Iris-versicolor
5.6,2.9,3.6,1.3,Iris-versicolor
6.7,3.1,4.4,1.4,Iris-versicolor
5.6,3.0,4.5,1.5,Iris-versicolor
5.8,2.7,4.1,1.0,Iris-versicolor
6.2,2.2,4.5,1.5,Iris-versicolor
5.6,2.5,3.9,1.1,Iris-versicolor
5.9,3.2,4.8,1.8,Iris-versicolor
6.1,2.8,4.0,1.3,Iris-versicolor
6.3,2.5,4.9,1.5,Iris-versicolor
6.1,2.8,4.7,1.2,Iris-versicolor
6.4,2.9,4.3,1.3,Iris-versicolor
6.6,3.0,4.4,1.4,Iris-versicolor
6.8,2.8,4.8,1.4,Iris-versicolor
6.7,3.0,5.0,1.7,Iris-versicolor
6.0,2.9,4.5,1.5,Iris-versicolor
5.7,2.6,3.5,1.0,Iris-versicolor
5.5,2.4,3.8,1.1,Iris-versicolor
5.5,2.4,3.7,1.0,Iris-versicolor
5.8,2.7,3.9,1.2,Iris-versicolor
6.0,2.7,5.1,1.6,Iris-versicolor
5.4,3.0,4.5,1.5,Iris-versicolor
6.0,3.4,4.5,1.6,Iris-versicolor
6.7,3.1,4.7,1.5,Iris-versicolor
6.3,2.3,4.4,1.3,Iris-versicolor
5.6,3.0,4.1,1.3,Iris-versicolor
5.5,2.5,4.0,1.3,Iris-versicolor
5.5,2.6,4.4,1.2,Iris-versicolor
6.1,3.0,4.6,1.4,Iris-versicolor
5.8,2.6,4.0,1.2,Iris-versicolor
5.0,2.3,3.3,1.0,Iris-versicolor
5.6,2.7,4.2,1.3,Iris-versicolor
5.7,3.0,4.2,1.2,Iris-versicolor
5.7,2.9,4.2,1.3,Iris-versicolor
6.2,2.9,4.3,1.3,Iris-versicolor
5.1,2.5,3.0,1.1,Iris-versicolor
5.7,2.8,4.1,1.3,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
5.8,2.7,5.1,1.9,Iris-virginica
7.1,3.0,5.9,2.1,Iris-virginica
6.3,2.9,5.6,1.8,Iris-virginica
6.5,3.0,5.8,2.2,Iris-virginica
7.6,3.0,6.6,2.1,Iris-virginica
4.9,2.5,4.5,1.7,Iris-virginica
7.3,2.9,6.3,1.8,Iris-virginica
6.7,2.5,5.8,1.8,Iris-virginica
7.2,3.6,6.1,2.5,Iris-virginica
6.5,3.2,5.1,2.0,Iris-virginica
6.4,2.7,5.3,1.9,Iris-virginica
6.8,3.0,5.5,2.1,Iris-virginica
5.7,2.5,5.0,2.0,Iris-virginica
5.8,2.8,5.1,2.4,Iris-virginica
6.4,3.2,5.3,2.3,Iris-virginica
6.5,3.0,5.5,1.8,Iris-virginica
7.7,3.8,6.7,2.2,Iris-virginica
7.7,2.6,6.9,2.3,Iris-virginica
6.0,2.2,5.0,1.5,Iris-virginica
6.9,3.2,5.7,2.3,Iris-virginica
5.6,2.8,4.9,2.0,Iris-virginica
7.7,2.8,6.7,2.0,Iris-virginica
6.3,2.7,4.9,1.8,Iris-virginica
6.7,3.3,5.7,2.1,Iris-virginica
7.2,3.2,6.0,1.8,Iris-virginica
6.2,2.8,4.8,1.8,Iris-virginica
6.1,3.0,4.9,1.8,Iris-virginica
6.4,2.8,5.6,2.1,Iris-virginica
7.2,3.0,5.8,1.6,Iris-virginica
7.4,2.8,6.1,1.9,Iris-virginica
7.9,3.8,6.4,2.0,Iris-virginica
6.4,2.8,5.6,2.2,Iris-virginica
6.3,2.8,5.1,1.5,Iris-virginica
6.1,2.6,5.6,1.4,Iris-virginica
7.7,3.0,6.1,2.3,Iris-virginica
6.3,3.4,5.6,2.4,Iris-virginica
6.4,3.1,5.5,1.8,Iris-virginica
6.0,3.0,4.8,1.8,Iris-virginica
6.9,3.1,5.4,2.1,Iris-virginica
6.7,3.1,5.6,2.4,Iris-virginica
6.9,3.1,5.1,2.3,Iris-virginica
5.8,2.7,5.1,1.9,Iris-virginica
6.8,3.2,5.9,2.3,Iris-virginica
6.7,3.3,5.7,2.5,Iris-virginica
6.7,3.0,5.2,2.3,Iris-virginica
6.3,2.5,5.0,1.9,Iris-virginica
6.5,3.0,5.2,2.0,Iris-virginica
6.2,3.4,5.4,2.3,Iris-virginica
5.9,3.0,5.1,1.8,Iris-virginica
%
%
%
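An ARFF file like the one above can be read into a pandas DataFrame with scipy's ARFF loader. A minimal sketch, assuming scipy and pandas are installed and the file is stored locally as iris.arff (hypothetical path):

from scipy.io import arff
import pandas as pd

# loadarff parses the @ATTRIBUTE declarations and the @DATA section;
# lines starting with % are treated as comments and skipped.
data, meta = arff.loadarff("iris.arff")
df = pd.DataFrame(data)
# Nominal attributes come back as bytes; decode the class column.
df["class"] = df["class"].str.decode("utf-8")
print(meta.names())  # ['sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'class']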


@@ -0,0 +1,302 @@
% 1. Title of Database: Wine recognition data
% Updated Sept 21, 1998 by C.Blake : Added attribute information
%
% 2. Sources:
% (a) Forina, M. et al, PARVUS - An Extendible Package for Data
% Exploration, Classification and Correlation. Institute of Pharmaceutical
% and Food Analysis and Technologies, Via Brigata Salerno,
% 16147 Genoa, Italy.
%
% (b) Stefan Aeberhard, email: stefan@coral.cs.jcu.edu.au
% (c) July 1991
% 3. Past Usage:
%
% (1)
% S. Aeberhard, D. Coomans and O. de Vel,
% Comparison of Classifiers in High Dimensional Settings,
% Tech. Rep. no. 92-02, (1992), Dept. of Computer Science and Dept. of
% Mathematics and Statistics, James Cook University of North Queensland.
% (Also submitted to Technometrics).
%
% The data was used with many others for comparing various
% classifiers. The classes are separable, though only RDA
% has achieved 100% correct classification.
% (RDA : 100%, QDA 99.4%, LDA 98.9%, 1NN 96.1% (z-transformed data))
% (All results using the leave-one-out technique)
%
% In a classification context, this is a well posed problem
% with "well behaved" class structures. A good data set
% for first testing of a new classifier, but not very
% challenging.
%
% (2)
% S. Aeberhard, D. Coomans and O. de Vel,
% "THE CLASSIFICATION PERFORMANCE OF RDA"
% Tech. Rep. no. 92-01, (1992), Dept. of Computer Science and Dept. of
% Mathematics and Statistics, James Cook University of North Queensland.
% (Also submitted to Journal of Chemometrics).
%
% Here, the data was used to illustrate the superior performance of
% a new appreciation function with RDA.
%
% 4. Relevant Information:
%
% -- These data are the results of a chemical analysis of
% wines grown in the same region in Italy but derived from three
% different cultivars.
% The analysis determined the quantities of 13 constituents
% found in each of the three types of wines.
%
% -- I think that the initial data set had around 30 variables, but
% for some reason I only have the 13 dimensional version.
% I had a list of what the 30 or so variables were, but a.)
% I lost it, and b.), I would not know which 13 variables
% are included in the set.
%
% -- The attributes are (donated by Riccardo Leardi,
% riclea@anchem.unige.it )
% 1) Alcohol
% 2) Malic acid
% 3) Ash
% 4) Alcalinity of ash
% 5) Magnesium
% 6) Total phenols
% 7) Flavanoids
% 8) Nonflavanoid phenols
% 9) Proanthocyanins
% 10) Color intensity
% 11) Hue
% 12) OD280/OD315 of diluted wines
% 13) Proline
%
% 5. Number of Instances
%
% class 1 59
% class 2 71
% class 3 48
%
% 6. Number of Attributes
%
% 13
%
% 7. For Each Attribute:
%
% All attributes are continuous
%
% No statistics available, but suggest to standardise
% variables for certain uses (e.g. for use with classifiers
% which are NOT scale invariant)
%
% NOTE: 1st attribute is class identifier (1-3)
%
% 8. Missing Attribute Values:
%
% None
%
% 9. Class Distribution: number of instances per class
%
% class 1 59
% class 2 71
% class 3 48
%
% Information about the dataset
% CLASSTYPE: nominal
% CLASSINDEX: first
%
@relation wine
@attribute class {1,2,3}
@attribute Alcohol REAL
@attribute Malic_acid REAL
@attribute Ash REAL
@attribute Alcalinity_of_ash REAL
@attribute Magnesium INTEGER
@attribute Total_phenols REAL
@attribute Flavanoids REAL
@attribute Nonflavanoid_phenols REAL
@attribute Proanthocyanins REAL
@attribute Color_intensity REAL
@attribute Hue REAL
@attribute OD280/OD315_of_diluted_wines REAL
@attribute Proline INTEGER
@data
1,14.23,1.71,2.43,15.6,127,2.8,3.06,.28,2.29,5.64,1.04,3.92,1065
1,13.2,1.78,2.14,11.2,100,2.65,2.76,.26,1.28,4.38,1.05,3.4,1050
1,13.16,2.36,2.67,18.6,101,2.8,3.24,.3,2.81,5.68,1.03,3.17,1185
1,14.37,1.95,2.5,16.8,113,3.85,3.49,.24,2.18,7.8,.86,3.45,1480
1,13.24,2.59,2.87,21,118,2.8,2.69,.39,1.82,4.32,1.04,2.93,735
1,14.2,1.76,2.45,15.2,112,3.27,3.39,.34,1.97,6.75,1.05,2.85,1450
1,14.39,1.87,2.45,14.6,96,2.5,2.52,.3,1.98,5.25,1.02,3.58,1290
1,14.06,2.15,2.61,17.6,121,2.6,2.51,.31,1.25,5.05,1.06,3.58,1295
1,14.83,1.64,2.17,14,97,2.8,2.98,.29,1.98,5.2,1.08,2.85,1045
1,13.86,1.35,2.27,16,98,2.98,3.15,.22,1.85,7.22,1.01,3.55,1045
1,14.1,2.16,2.3,18,105,2.95,3.32,.22,2.38,5.75,1.25,3.17,1510
1,14.12,1.48,2.32,16.8,95,2.2,2.43,.26,1.57,5,1.17,2.82,1280
1,13.75,1.73,2.41,16,89,2.6,2.76,.29,1.81,5.6,1.15,2.9,1320
1,14.75,1.73,2.39,11.4,91,3.1,3.69,.43,2.81,5.4,1.25,2.73,1150
1,14.38,1.87,2.38,12,102,3.3,3.64,.29,2.96,7.5,1.2,3,1547
1,13.63,1.81,2.7,17.2,112,2.85,2.91,.3,1.46,7.3,1.28,2.88,1310
1,14.3,1.92,2.72,20,120,2.8,3.14,.33,1.97,6.2,1.07,2.65,1280
1,13.83,1.57,2.62,20,115,2.95,3.4,.4,1.72,6.6,1.13,2.57,1130
1,14.19,1.59,2.48,16.5,108,3.3,3.93,.32,1.86,8.7,1.23,2.82,1680
1,13.64,3.1,2.56,15.2,116,2.7,3.03,.17,1.66,5.1,.96,3.36,845
1,14.06,1.63,2.28,16,126,3,3.17,.24,2.1,5.65,1.09,3.71,780
1,12.93,3.8,2.65,18.6,102,2.41,2.41,.25,1.98,4.5,1.03,3.52,770
1,13.71,1.86,2.36,16.6,101,2.61,2.88,.27,1.69,3.8,1.11,4,1035
1,12.85,1.6,2.52,17.8,95,2.48,2.37,.26,1.46,3.93,1.09,3.63,1015
1,13.5,1.81,2.61,20,96,2.53,2.61,.28,1.66,3.52,1.12,3.82,845
1,13.05,2.05,3.22,25,124,2.63,2.68,.47,1.92,3.58,1.13,3.2,830
1,13.39,1.77,2.62,16.1,93,2.85,2.94,.34,1.45,4.8,.92,3.22,1195
1,13.3,1.72,2.14,17,94,2.4,2.19,.27,1.35,3.95,1.02,2.77,1285
1,13.87,1.9,2.8,19.4,107,2.95,2.97,.37,1.76,4.5,1.25,3.4,915
1,14.02,1.68,2.21,16,96,2.65,2.33,.26,1.98,4.7,1.04,3.59,1035
1,13.73,1.5,2.7,22.5,101,3,3.25,.29,2.38,5.7,1.19,2.71,1285
1,13.58,1.66,2.36,19.1,106,2.86,3.19,.22,1.95,6.9,1.09,2.88,1515
1,13.68,1.83,2.36,17.2,104,2.42,2.69,.42,1.97,3.84,1.23,2.87,990
1,13.76,1.53,2.7,19.5,132,2.95,2.74,.5,1.35,5.4,1.25,3,1235
1,13.51,1.8,2.65,19,110,2.35,2.53,.29,1.54,4.2,1.1,2.87,1095
1,13.48,1.81,2.41,20.5,100,2.7,2.98,.26,1.86,5.1,1.04,3.47,920
1,13.28,1.64,2.84,15.5,110,2.6,2.68,.34,1.36,4.6,1.09,2.78,880
1,13.05,1.65,2.55,18,98,2.45,2.43,.29,1.44,4.25,1.12,2.51,1105
1,13.07,1.5,2.1,15.5,98,2.4,2.64,.28,1.37,3.7,1.18,2.69,1020
1,14.22,3.99,2.51,13.2,128,3,3.04,.2,2.08,5.1,.89,3.53,760
1,13.56,1.71,2.31,16.2,117,3.15,3.29,.34,2.34,6.13,.95,3.38,795
1,13.41,3.84,2.12,18.8,90,2.45,2.68,.27,1.48,4.28,.91,3,1035
1,13.88,1.89,2.59,15,101,3.25,3.56,.17,1.7,5.43,.88,3.56,1095
1,13.24,3.98,2.29,17.5,103,2.64,2.63,.32,1.66,4.36,.82,3,680
1,13.05,1.77,2.1,17,107,3,3,.28,2.03,5.04,.88,3.35,885
1,14.21,4.04,2.44,18.9,111,2.85,2.65,.3,1.25,5.24,.87,3.33,1080
1,14.38,3.59,2.28,16,102,3.25,3.17,.27,2.19,4.9,1.04,3.44,1065
1,13.9,1.68,2.12,16,101,3.1,3.39,.21,2.14,6.1,.91,3.33,985
1,14.1,2.02,2.4,18.8,103,2.75,2.92,.32,2.38,6.2,1.07,2.75,1060
1,13.94,1.73,2.27,17.4,108,2.88,3.54,.32,2.08,8.90,1.12,3.1,1260
1,13.05,1.73,2.04,12.4,92,2.72,3.27,.17,2.91,7.2,1.12,2.91,1150
1,13.83,1.65,2.6,17.2,94,2.45,2.99,.22,2.29,5.6,1.24,3.37,1265
1,13.82,1.75,2.42,14,111,3.88,3.74,.32,1.87,7.05,1.01,3.26,1190
1,13.77,1.9,2.68,17.1,115,3,2.79,.39,1.68,6.3,1.13,2.93,1375
1,13.74,1.67,2.25,16.4,118,2.6,2.9,.21,1.62,5.85,.92,3.2,1060
1,13.56,1.73,2.46,20.5,116,2.96,2.78,.2,2.45,6.25,.98,3.03,1120
1,14.22,1.7,2.3,16.3,118,3.2,3,.26,2.03,6.38,.94,3.31,970
1,13.29,1.97,2.68,16.8,102,3,3.23,.31,1.66,6,1.07,2.84,1270
1,13.72,1.43,2.5,16.7,108,3.4,3.67,.19,2.04,6.8,.89,2.87,1285
2,12.37,.94,1.36,10.6,88,1.98,.57,.28,.42,1.95,1.05,1.82,520
2,12.33,1.1,2.28,16,101,2.05,1.09,.63,.41,3.27,1.25,1.67,680
2,12.64,1.36,2.02,16.8,100,2.02,1.41,.53,.62,5.75,.98,1.59,450
2,13.67,1.25,1.92,18,94,2.1,1.79,.32,.73,3.8,1.23,2.46,630
2,12.37,1.13,2.16,19,87,3.5,3.1,.19,1.87,4.45,1.22,2.87,420
2,12.17,1.45,2.53,19,104,1.89,1.75,.45,1.03,2.95,1.45,2.23,355
2,12.37,1.21,2.56,18.1,98,2.42,2.65,.37,2.08,4.6,1.19,2.3,678
2,13.11,1.01,1.7,15,78,2.98,3.18,.26,2.28,5.3,1.12,3.18,502
2,12.37,1.17,1.92,19.6,78,2.11,2,.27,1.04,4.68,1.12,3.48,510
2,13.34,.94,2.36,17,110,2.53,1.3,.55,.42,3.17,1.02,1.93,750
2,12.21,1.19,1.75,16.8,151,1.85,1.28,.14,2.5,2.85,1.28,3.07,718
2,12.29,1.61,2.21,20.4,103,1.1,1.02,.37,1.46,3.05,.906,1.82,870
2,13.86,1.51,2.67,25,86,2.95,2.86,.21,1.87,3.38,1.36,3.16,410
2,13.49,1.66,2.24,24,87,1.88,1.84,.27,1.03,3.74,.98,2.78,472
2,12.99,1.67,2.6,30,139,3.3,2.89,.21,1.96,3.35,1.31,3.5,985
2,11.96,1.09,2.3,21,101,3.38,2.14,.13,1.65,3.21,.99,3.13,886
2,11.66,1.88,1.92,16,97,1.61,1.57,.34,1.15,3.8,1.23,2.14,428
2,13.03,.9,1.71,16,86,1.95,2.03,.24,1.46,4.6,1.19,2.48,392
2,11.84,2.89,2.23,18,112,1.72,1.32,.43,.95,2.65,.96,2.52,500
2,12.33,.99,1.95,14.8,136,1.9,1.85,.35,2.76,3.4,1.06,2.31,750
2,12.7,3.87,2.4,23,101,2.83,2.55,.43,1.95,2.57,1.19,3.13,463
2,12,.92,2,19,86,2.42,2.26,.3,1.43,2.5,1.38,3.12,278
2,12.72,1.81,2.2,18.8,86,2.2,2.53,.26,1.77,3.9,1.16,3.14,714
2,12.08,1.13,2.51,24,78,2,1.58,.4,1.4,2.2,1.31,2.72,630
2,13.05,3.86,2.32,22.5,85,1.65,1.59,.61,1.62,4.8,.84,2.01,515
2,11.84,.89,2.58,18,94,2.2,2.21,.22,2.35,3.05,.79,3.08,520
2,12.67,.98,2.24,18,99,2.2,1.94,.3,1.46,2.62,1.23,3.16,450
2,12.16,1.61,2.31,22.8,90,1.78,1.69,.43,1.56,2.45,1.33,2.26,495
2,11.65,1.67,2.62,26,88,1.92,1.61,.4,1.34,2.6,1.36,3.21,562
2,11.64,2.06,2.46,21.6,84,1.95,1.69,.48,1.35,2.8,1,2.75,680
2,12.08,1.33,2.3,23.6,70,2.2,1.59,.42,1.38,1.74,1.07,3.21,625
2,12.08,1.83,2.32,18.5,81,1.6,1.5,.52,1.64,2.4,1.08,2.27,480
2,12,1.51,2.42,22,86,1.45,1.25,.5,1.63,3.6,1.05,2.65,450
2,12.69,1.53,2.26,20.7,80,1.38,1.46,.58,1.62,3.05,.96,2.06,495
2,12.29,2.83,2.22,18,88,2.45,2.25,.25,1.99,2.15,1.15,3.3,290
2,11.62,1.99,2.28,18,98,3.02,2.26,.17,1.35,3.25,1.16,2.96,345
2,12.47,1.52,2.2,19,162,2.5,2.27,.32,3.28,2.6,1.16,2.63,937
2,11.81,2.12,2.74,21.5,134,1.6,.99,.14,1.56,2.5,.95,2.26,625
2,12.29,1.41,1.98,16,85,2.55,2.5,.29,1.77,2.9,1.23,2.74,428
2,12.37,1.07,2.1,18.5,88,3.52,3.75,.24,1.95,4.5,1.04,2.77,660
2,12.29,3.17,2.21,18,88,2.85,2.99,.45,2.81,2.3,1.42,2.83,406
2,12.08,2.08,1.7,17.5,97,2.23,2.17,.26,1.4,3.3,1.27,2.96,710
2,12.6,1.34,1.9,18.5,88,1.45,1.36,.29,1.35,2.45,1.04,2.77,562
2,12.34,2.45,2.46,21,98,2.56,2.11,.34,1.31,2.8,.8,3.38,438
2,11.82,1.72,1.88,19.5,86,2.5,1.64,.37,1.42,2.06,.94,2.44,415
2,12.51,1.73,1.98,20.5,85,2.2,1.92,.32,1.48,2.94,1.04,3.57,672
2,12.42,2.55,2.27,22,90,1.68,1.84,.66,1.42,2.7,.86,3.3,315
2,12.25,1.73,2.12,19,80,1.65,2.03,.37,1.63,3.4,1,3.17,510
2,12.72,1.75,2.28,22.5,84,1.38,1.76,.48,1.63,3.3,.88,2.42,488
2,12.22,1.29,1.94,19,92,2.36,2.04,.39,2.08,2.7,.86,3.02,312
2,11.61,1.35,2.7,20,94,2.74,2.92,.29,2.49,2.65,.96,3.26,680
2,11.46,3.74,1.82,19.5,107,3.18,2.58,.24,3.58,2.9,.75,2.81,562
2,12.52,2.43,2.17,21,88,2.55,2.27,.26,1.22,2,.9,2.78,325
2,11.76,2.68,2.92,20,103,1.75,2.03,.6,1.05,3.8,1.23,2.5,607
2,11.41,.74,2.5,21,88,2.48,2.01,.42,1.44,3.08,1.1,2.31,434
2,12.08,1.39,2.5,22.5,84,2.56,2.29,.43,1.04,2.9,.93,3.19,385
2,11.03,1.51,2.2,21.5,85,2.46,2.17,.52,2.01,1.9,1.71,2.87,407
2,11.82,1.47,1.99,20.8,86,1.98,1.6,.3,1.53,1.95,.95,3.33,495
2,12.42,1.61,2.19,22.5,108,2,2.09,.34,1.61,2.06,1.06,2.96,345
2,12.77,3.43,1.98,16,80,1.63,1.25,.43,.83,3.4,.7,2.12,372
2,12,3.43,2,19,87,2,1.64,.37,1.87,1.28,.93,3.05,564
2,11.45,2.4,2.42,20,96,2.9,2.79,.32,1.83,3.25,.8,3.39,625
2,11.56,2.05,3.23,28.5,119,3.18,5.08,.47,1.87,6,.93,3.69,465
2,12.42,4.43,2.73,26.5,102,2.2,2.13,.43,1.71,2.08,.92,3.12,365
2,13.05,5.8,2.13,21.5,86,2.62,2.65,.3,2.01,2.6,.73,3.1,380
2,11.87,4.31,2.39,21,82,2.86,3.03,.21,2.91,2.8,.75,3.64,380
2,12.07,2.16,2.17,21,85,2.6,2.65,.37,1.35,2.76,.86,3.28,378
2,12.43,1.53,2.29,21.5,86,2.74,3.15,.39,1.77,3.94,.69,2.84,352
2,11.79,2.13,2.78,28.5,92,2.13,2.24,.58,1.76,3,.97,2.44,466
2,12.37,1.63,2.3,24.5,88,2.22,2.45,.4,1.9,2.12,.89,2.78,342
2,12.04,4.3,2.38,22,80,2.1,1.75,.42,1.35,2.6,.79,2.57,580
3,12.86,1.35,2.32,18,122,1.51,1.25,.21,.94,4.1,.76,1.29,630
3,12.88,2.99,2.4,20,104,1.3,1.22,.24,.83,5.4,.74,1.42,530
3,12.81,2.31,2.4,24,98,1.15,1.09,.27,.83,5.7,.66,1.36,560
3,12.7,3.55,2.36,21.5,106,1.7,1.2,.17,.84,5,.78,1.29,600
3,12.51,1.24,2.25,17.5,85,2,.58,.6,1.25,5.45,.75,1.51,650
3,12.6,2.46,2.2,18.5,94,1.62,.66,.63,.94,7.1,.73,1.58,695
3,12.25,4.72,2.54,21,89,1.38,.47,.53,.8,3.85,.75,1.27,720
3,12.53,5.51,2.64,25,96,1.79,.6,.63,1.1,5,.82,1.69,515
3,13.49,3.59,2.19,19.5,88,1.62,.48,.58,.88,5.7,.81,1.82,580
3,12.84,2.96,2.61,24,101,2.32,.6,.53,.81,4.92,.89,2.15,590
3,12.93,2.81,2.7,21,96,1.54,.5,.53,.75,4.6,.77,2.31,600
3,13.36,2.56,2.35,20,89,1.4,.5,.37,.64,5.6,.7,2.47,780
3,13.52,3.17,2.72,23.5,97,1.55,.52,.5,.55,4.35,.89,2.06,520
3,13.62,4.95,2.35,20,92,2,.8,.47,1.02,4.4,.91,2.05,550
3,12.25,3.88,2.2,18.5,112,1.38,.78,.29,1.14,8.21,.65,2,855
3,13.16,3.57,2.15,21,102,1.5,.55,.43,1.3,4,.6,1.68,830
3,13.88,5.04,2.23,20,80,.98,.34,.4,.68,4.9,.58,1.33,415
3,12.87,4.61,2.48,21.5,86,1.7,.65,.47,.86,7.65,.54,1.86,625
3,13.32,3.24,2.38,21.5,92,1.93,.76,.45,1.25,8.42,.55,1.62,650
3,13.08,3.9,2.36,21.5,113,1.41,1.39,.34,1.14,9.40,.57,1.33,550
3,13.5,3.12,2.62,24,123,1.4,1.57,.22,1.25,8.60,.59,1.3,500
3,12.79,2.67,2.48,22,112,1.48,1.36,.24,1.26,10.8,.48,1.47,480
3,13.11,1.9,2.75,25.5,116,2.2,1.28,.26,1.56,7.1,.61,1.33,425
3,13.23,3.3,2.28,18.5,98,1.8,.83,.61,1.87,10.52,.56,1.51,675
3,12.58,1.29,2.1,20,103,1.48,.58,.53,1.4,7.6,.58,1.55,640
3,13.17,5.19,2.32,22,93,1.74,.63,.61,1.55,7.9,.6,1.48,725
3,13.84,4.12,2.38,19.5,89,1.8,.83,.48,1.56,9.01,.57,1.64,480
3,12.45,3.03,2.64,27,97,1.9,.58,.63,1.14,7.5,.67,1.73,880
3,14.34,1.68,2.7,25,98,2.8,1.31,.53,2.7,13,.57,1.96,660
3,13.48,1.67,2.64,22.5,89,2.6,1.1,.52,2.29,11.75,.57,1.78,620
3,12.36,3.83,2.38,21,88,2.3,.92,.5,1.04,7.65,.56,1.58,520
3,13.69,3.26,2.54,20,107,1.83,.56,.5,.8,5.88,.96,1.82,680
3,12.85,3.27,2.58,22,106,1.65,.6,.6,.96,5.58,.87,2.11,570
3,12.96,3.45,2.35,18.5,106,1.39,.7,.4,.94,5.28,.68,1.75,675
3,13.78,2.76,2.3,22,90,1.35,.68,.41,1.03,9.58,.7,1.68,615
3,13.73,4.36,2.26,22.5,88,1.28,.47,.52,1.15,6.62,.78,1.75,520
3,13.45,3.7,2.6,23,111,1.7,.92,.43,1.46,10.68,.85,1.56,695
3,12.82,3.37,2.3,19.5,88,1.48,.66,.4,.97,10.26,.72,1.75,685
3,13.58,2.58,2.69,24.5,105,1.55,.84,.39,1.54,8.66,.74,1.8,750
3,13.4,4.6,2.86,25,112,1.98,.96,.27,1.11,8.5,.67,1.92,630
3,12.2,3.03,2.32,19,96,1.25,.49,.4,.73,5.5,.66,1.83,510
3,12.77,2.39,2.28,19.5,86,1.39,.51,.48,.64,9.899999,.57,1.63,470
3,14.16,2.51,2.48,20,91,1.68,.7,.44,1.24,9.7,.62,1.71,660
3,13.71,5.65,2.45,20.5,95,1.68,.61,.52,1.06,7.7,.64,1.74,740
3,13.4,3.91,2.48,23,102,1.8,.75,.43,1.41,7.3,.7,1.56,750
3,13.27,4.28,2.26,20,120,1.59,.69,.43,1.35,10.2,.59,1.56,835
3,13.17,2.59,2.37,20,120,1.65,.68,.53,1.46,9.3,.6,1.62,840
3,14.13,4.1,2.74,24.5,96,2.05,.76,.56,1.35,9.2,.61,1.6,560
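As the header note above suggests, the 13 attributes live on very different scales (Proline in the hundreds, Hue around 1), so standardising them is advisable before using scale-sensitive classifiers. A minimal sketch, assuming the file was loaded into a DataFrame df as in the iris example earlier, with "class" as the target column:

from sklearn.preprocessing import StandardScaler

X = df.drop(columns=["class"]).to_numpy()
y = df["class"].to_numpy()
# Rescale each attribute to zero mean and unit variance.
X_std = StandardScaler().fit_transform(X)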

File diff suppressed because one or more lines are too long

benchmark/tests/img/.gitignore vendored Normal file (+2 lines)

@@ -0,0 +1,2 @@
*
!.gitignore


@@ -1 +1 @@
{"balance-scale": [0.98, {"splitter": "iwss", "max_features": "auto"}, "results_accuracy_STree_iMac27_2021-10-27_09:40:40_0.json"], "balloons": [0.86, {"C": 7, "gamma": 0.1, "kernel": "rbf", "max_iter": 10000.0, "multiclass_strategy": "ovr"}, "results_accuracy_STree_iMac27_2021-09-30_11:42:07_0.json"]}
{"balance-scale": [0.98, {"splitter": "best", "max_features": "auto"}, "results_accuracy_STree_iMac27_2021-10-27_09:40:40_0.json"], "balloons": [0.86, {"C": 7, "gamma": 0.1, "kernel": "rbf", "max_iter": 10000.0, "multiclass_strategy": "ovr"}, "results_accuracy_STree_iMac27_2021-09-30_11:42:07_0.json"]}


@@ -0,0 +1,6 @@
[
{
"C": [1.0, 5.0],
"kernel": ["linear", "rbf", "poly"]
}
]
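This grid file uses scikit-learn's param_grid convention: a list of dicts mapping hyperparameter names to lists of candidate values. Under that assumption it can be passed straight to GridSearchCV; a minimal sketch with a hypothetical file name:

import json
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

with open("grid_input_accuracy_SVC.json") as f:  # hypothetical name
    param_grid = json.load(f)  # [{"C": [1.0, 5.0], "kernel": ["linear", "rbf", "poly"]}]

X, y = load_iris(return_X_y=True)
search = GridSearchCV(SVC(), param_grid, scoring="accuracy", cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)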


@@ -0,0 +1,26 @@
{
"balance-scale": [
0.9743999999999999,
{
"base_estimator__C": 57,
"base_estimator__gamma": 0.1,
"base_estimator__kernel": "rbf",
"base_estimator__multiclass_strategy": "ovr",
"n_estimators": 100,
"n_jobs": -1
},
"v. 0.3.2, Computed on bart on 2022-03-10 at 22:56:53 took 12.182 min"
],
"balloons": [
0.7666666666666667,
{
"base_estimator__C": 5,
"base_estimator__gamma": 0.14,
"base_estimator__kernel": "rbf",
"base_estimator__multiclass_strategy": "ovr",
"n_estimators": 100,
"n_jobs": -1
},
"v. 0.3.2, Computed on bart on 2022-03-10 at 23:09:07 took 18.229 s"
]
}
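The best-results format above maps each dataset to a three-element list: best score, the hyperparameters that produced it, and a provenance string. A minimal sketch of consuming such a file (hypothetical name):

import json

with open("best_results_accuracy_ODTE.json") as f:
    best = json.load(f)

for dataset, (score, hyperparams, provenance) in best.items():
    print(f"{dataset}: {score:.4f} with {hyperparams} ({provenance})")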


@@ -6,7 +6,7 @@
"kernel": "liblinear",
"multiclass_strategy": "ovr"
},
"v. 1.2.4, Computed on Test on 2022-02-22 at 12:00:00 took 1s"
"v. 1.3.0, Computed on Test on 2022-02-22 at 12:00:00 took 1s"
],
"balloons": [
0.625,
@@ -15,6 +15,6 @@
"kernel": "linear",
"multiclass_strategy": "ovr"
},
"v. 1.2.4, Computed on Test on 2022-02-22 at 12:00:00 took 1s"
"v. 1.3.0, Computed on Test on 2022-02-22 at 12:00:00 took 1s"
]
}


@@ -3,6 +3,8 @@
"title": "Gridsearched hyperparams v022.1b random_init",
"model": "ODTE",
"version": "0.3.2",
"language_version": "3.11x",
"language": "Python",
"stratified": false,
"folds": 5,
"date": "2022-04-20",


@@ -3,6 +3,8 @@
"title": "Test default paramters with RandomForest",
"model": "RandomForest",
"version": "-",
"language_version": "3.11x",
"language": "Python",
"stratified": false,
"folds": 5,
"date": "2022-01-14",


@@ -3,6 +3,8 @@
"model": "STree",
"stratified": false,
"folds": 5,
"language_version": "3.11x",
"language": "Python",
"date": "2021-09-30",
"time": "11:42:07",
"duration": 624.2505249977112,


@@ -1,6 +1,8 @@
{
"score_name": "accuracy",
"model": "STree",
"language": "Python",
"language_version": "3.11x",
"stratified": false,
"folds": 5,
"date": "2021-10-27",
@@ -15,7 +17,7 @@
"features": 4,
"classes": 3,
"hyperparameters": {
"splitter": "iwss",
"splitter": "best",
"max_features": "auto"
},
"nodes": 11.08,
@@ -32,7 +34,7 @@
"features": 4,
"classes": 2,
"hyperparameters": {
"splitter": "iwss",
"splitter": "best",
"max_features": "auto"
},
"nodes": 4.12,


@@ -1,6 +1,8 @@
{
"score_name": "accuracy",
"model": "STree",
"language_version": "3.11x",
"language": "Python",
"stratified": false,
"folds": 5,
"date": "2021-11-01",


@@ -0,0 +1,77 @@
import os
from openpyxl import load_workbook
from ...Utils import NO_RESULTS, Folders, Files
from ..TestBase import TestBase
from ..._version import __version__


class BeBenchmarkTest(TestBase):
    def setUp(self):
        self.prepare_scripts_env()
        self.score = "accuracy"

    def tearDown(self) -> None:
        files = []
        for score in [self.score, "unknown"]:
            files.append(Files.exreport(score))
            files.append(Files.exreport_output(score))
            files.append(Files.exreport_err(score))
        files.append(Files.exreport_excel(self.score))
        files.append(Files.exreport_pdf)
        files.append(Files.tex_output(self.score))
        self.remove_files(files, Folders.exreport)
        self.remove_files(files, ".")
        return super().tearDown()

    def test_be_benchmark_complete(self):
        stdout, stderr = self.execute_script(
            "be_benchmark", ["-s", self.score, "-q", "-t", "-x"]
        )
        self.assertEqual(stderr.getvalue(), "")
        # Check output
        self.check_output_file(stdout, "be_benchmark_complete")
        # Check csv file
        file_name = os.path.join(Folders.exreport, Files.exreport(self.score))
        self.check_file_file(file_name, "exreport_csv")
        # Check tex file
        file_name = os.path.join(
            Folders.exreport, Files.tex_output(self.score)
        )
        self.assertTrue(os.path.exists(file_name))
        self.check_file_file(file_name, "exreport_tex")
        # Check excel file
        file_name = os.path.join(
            Folders.exreport, Files.exreport_excel(self.score)
        )
        book = load_workbook(file_name)
        replace = None
        with_this = None
        for sheet_name in book.sheetnames:
            sheet = book[sheet_name]
            if sheet_name == "Datasets":
                replace = self.benchmark_version
                with_this = __version__
            self.check_excel_sheet(
                sheet,
                f"exreport_excel_{sheet_name}",
                replace=replace,
                with_this=with_this,
            )

    def test_be_benchmark_single(self):
        stdout, stderr = self.execute_script(
            "be_benchmark", ["-s", self.score, "-q"]
        )
        self.assertEqual(stderr.getvalue(), "")
        # Check output
        self.check_output_file(stdout, "be_benchmark")
        # Check csv file
        file_name = os.path.join(Folders.exreport, Files.exreport(self.score))
        self.check_file_file(file_name, "exreport_csv")

    def test_be_benchmark_no_data(self):
        stdout, stderr = self.execute_script(
            "be_benchmark", ["-s", "f1-weighted"]
        )
        self.assertEqual(stderr.getvalue(), "")
        self.assertEqual(stdout.getvalue(), f"{NO_RESULTS}\n")


@@ -0,0 +1,108 @@
import os
import json
from ...Utils import Folders, Files, NO_RESULTS
from ..TestBase import TestBase


class BeBestTest(TestBase):
    def setUp(self):
        self.prepare_scripts_env()

    def tearDown(self) -> None:
        self.remove_files(
            [Files.best_results("accuracy", "ODTE")],
            Folders.results,
        )
        return super().tearDown()

    def test_be_best_all(self):
        stdout, stderr = self.execute_script("be_best", ["-s", "all"])
        self.assertEqual(stderr.getvalue(), "")
        self.check_output_file(stdout, "be_best_all")

    def test_be_build_best_error(self):
        stdout, _ = self.execute_script(
            "be_build_best", ["-s", "accuracy", "-m", "SVC"]
        )
        self.assertEqual(stdout.getvalue(), f"{NO_RESULTS}\n")

    def test_be_build_best(self):
        self.execute_script("be_build_best", ["-s", "accuracy", "-m", "ODTE"])
        expected_data = {
            "balance-scale": [
                0.96352,
                {
                    "base_estimator__C": 57,
                    "base_estimator__gamma": 0.1,
                    "base_estimator__kernel": "rbf",
                    "base_estimator__multiclass_strategy": "ovr",
                    "n_estimators": 100,
                    "n_jobs": -1,
                },
                "results_accuracy_ODTE_Galgo_2022-04-20_10:52:20_0.json",
            ],
            "balloons": [
                0.785,
                {
                    "base_estimator__C": 5,
                    "base_estimator__gamma": 0.14,
                    "base_estimator__kernel": "rbf",
                    "base_estimator__multiclass_strategy": "ovr",
                    "n_estimators": 100,
                    "n_jobs": -1,
                },
                "results_accuracy_ODTE_Galgo_2022-04-20_10:52:20_0.json",
            ],
        }
        name = Files.best_results("accuracy", "ODTE")
        file_name = os.path.join(Folders.results, name)
        with open(file_name, "r") as f:
            computed_data = json.load(f)
        for computed, expected in zip(computed_data, expected_data):
            self.assertEqual(computed, expected)
        for key, value in expected_data.items():
            self.assertIn(key, computed_data)
            self.assertEqual(computed_data[key][0], value[0])
            self.assertSequenceEqual(computed_data[key][1], value[1])

    def test_be_build_best_report(self):
        stdout, _ = self.execute_script(
            "be_build_best", ["-s", "accuracy", "-m", "ODTE", "-r"]
        )
        expected_data = {
            "balance-scale": [
                0.96352,
                {
                    "base_estimator__C": 57,
                    "base_estimator__gamma": 0.1,
                    "base_estimator__kernel": "rbf",
                    "base_estimator__multiclass_strategy": "ovr",
                    "n_estimators": 100,
                    "n_jobs": -1,
                },
                "results_accuracy_ODTE_Galgo_2022-04-20_10:52:20_0.json",
            ],
            "balloons": [
                0.785,
                {
                    "base_estimator__C": 5,
                    "base_estimator__gamma": 0.14,
                    "base_estimator__kernel": "rbf",
                    "base_estimator__multiclass_strategy": "ovr",
                    "n_estimators": 100,
                    "n_jobs": -1,
                },
                "results_accuracy_ODTE_Galgo_2022-04-20_10:52:20_0.json",
            ],
        }
        name = Files.best_results("accuracy", "ODTE")
        file_name = os.path.join(Folders.results, name)
        with open(file_name, "r") as f:
            computed_data = json.load(f)
        for computed, expected in zip(computed_data, expected_data):
            self.assertEqual(computed, expected)
        for key, value in expected_data.items():
            self.assertIn(key, computed_data)
            self.assertEqual(computed_data[key][0], value[0])
            self.assertSequenceEqual(computed_data[key][1], value[1])
        self.check_output_file(stdout, "be_build_best_report")


@@ -0,0 +1,79 @@
import os
import json
from ...Utils import Folders, Files
from ..TestBase import TestBase


def get_test():
    return "hola"


class BeGridTest(TestBase):
    def setUp(self):
        self.prepare_scripts_env()

    def tearDown(self) -> None:
        self.remove_files(
            [
                Files.grid_input("f1-macro", "STree"),
                Files.grid_output("accuracy", "SVC"),
            ],
            Folders.results,
        )
        return super().tearDown()

    def test_be_build_grid(self):
        stdout, stderr = self.execute_script(
            "be_build_grid", ["-m", "STree", "-s", "f1-macro"]
        )
        self.assertEqual(stderr.getvalue(), "")
        self.assertEqual(
            stdout.getvalue(),
            "Generated grid input file to results/grid_input_f1-macro_STree."
            "json\n",
        )
        name = Files.grid_input("f1-macro", "STree")
        file_name = os.path.join(Folders.results, name)
        self.check_file_file(file_name, "be_build_grid")

    def test_be_grid_(self):
        stdout, stderr = self.execute_script(
            "be_grid",
            ["-m", "SVC", "-s", "accuracy", "--n_folds", "2"],
        )
        expected = "Perform grid search with SVC model\n"
        self.assertTrue(stdout.getvalue().startswith(expected))
        name = Files.grid_output("accuracy", "SVC")
        file_name = os.path.join(Folders.results, name)
        with open(file_name, "r") as f:
            computed_data = json.load(f)
        expected_data = {
            "balance-scale": [
                0.9167895469812403,
                {"C": 5.0, "kernel": "linear"},
                "v. -, Computed on iMac27 on 2022-05-07 at 23:55:03 took",
            ],
            "balloons": [
                0.6875,
                {"C": 5.0, "kernel": "rbf"},
                "v. -, Computed on iMac27 on 2022-05-07 at 23:55:03 took",
            ],
        }
        for computed, expected in zip(computed_data, expected_data):
            self.assertEqual(computed, expected)
        for key, value in expected_data.items():
            self.assertIn(key, computed_data)
            self.assertEqual(computed_data[key][0], value[0])
            self.assertSequenceEqual(computed_data[key][1], value[1])

    def test_be_grid_no_input(self):
        stdout, stderr = self.execute_script(
            "be_grid",
            ["-m", "ODTE", "-s", "f1-weighted", "-q"],
        )
        self.assertEqual(stderr.getvalue(), "")
        grid_file = os.path.join(
            Folders.results, Files.grid_input("f1-weighted", "ODTE")
        )
        expected = f"** The grid input file [{grid_file}] could not be found\n"
        self.assertEqual(stdout.getvalue(), expected)


@@ -0,0 +1,66 @@
import os
from io import StringIO
from unittest.mock import patch
from ..TestBase import TestBase
from ...Utils import Folders


class BeInitProjectTest(TestBase):
    def setUp(self):
        self.prepare_scripts_env()

    def tearDown(self):
        if os.path.exists("test_project"):
            os.system("rm -rf test_project")

    def assertIsFile(self, file_name):
        if not os.path.isfile(file_name):
            raise AssertionError(f"File {str(file_name)} does not exist")

    def assertIsFolder(self, path):
        if not os.path.exists(path):
            raise AssertionError(f"Folder {str(path)} does not exist")

    def test_be_init_project(self):
        test_project = "test_project"
        stdout, stderr = self.execute_script("be_init_project", [test_project])
        self.assertEqual(stderr.getvalue(), "")
        self.check_output_file(stdout, "be_init_project")
        # check folders
        expected = [
            Folders.results,
            Folders.hidden_results,
            Folders.exreport,
            Folders.report,
            Folders.img,
        ]
        for folder in expected:
            self.assertIsFolder(os.path.join(test_project, folder))
        self.assertIsFile(os.path.join(test_project, ".env"))
        os.system(f"rm -rf {test_project}")

    @patch("sys.stdout", new_callable=StringIO)
    @patch("sys.stderr", new_callable=StringIO)
    def test_be_init_project_no_arguments(self, stdout, stderr):
        with self.assertRaises(SystemExit) as cm:
            module = self.search_script("be_init_project")
            module.main("")
        self.assertEqual(cm.exception.code, 2)
        self.check_output_file(stdout, "be_init_project_no_arguments")
        self.assertEqual(stderr.getvalue(), "")

    @patch("sys.stdout", new_callable=StringIO)
    @patch("sys.stderr", new_callable=StringIO)
    def test_be_init_project_twice(self, stdout, stderr):
        test_project = "test_project"
        self.execute_script("be_init_project", [test_project])
        with self.assertRaises(SystemExit) as cm:
            module = self.search_script("be_init_project")
            module.main([test_project])
        self.assertEqual(cm.exception.code, 1)
        self.assertEqual(
            stderr.getvalue(),
            f"Creating folder {test_project}\n"
            f"[Errno 17] File exists: '{test_project}'\n",
        )
        self.assertEqual(stdout.getvalue(), "")


@@ -0,0 +1,152 @@
import os
import shutil
from unittest.mock import patch
from openpyxl import load_workbook
from ...Utils import Folders, Files, NO_RESULTS
from ..TestBase import TestBase


class BeListTest(TestBase):
    def setUp(self):
        self.prepare_scripts_env()

    @patch("benchmark.Results.get_input", return_value="q")
    def test_be_list(self, input_data):
        stdout, stderr = self.execute_script("be_list", ["-m", "STree"])
        self.assertEqual(stderr.getvalue(), "")
        self.check_output_file(stdout, "be_list_model")

    @patch("benchmark.Results.get_input", side_effect=iter(["x", "q"]))
    def test_be_list_invalid_option(self, input_data):
        stdout, stderr = self.execute_script("be_list", ["-m", "STree"])
        self.assertEqual(stderr.getvalue(), "")
        self.check_output_file(stdout, "be_list_model_invalid")

    @patch("benchmark.Results.get_input", side_effect=iter(["0", "q"]))
    def test_be_list_report(self, input_data):
        stdout, stderr = self.execute_script("be_list", ["-m", "STree"])
        self.assertEqual(stderr.getvalue(), "")
        self.check_output_file(stdout, "be_list_report")

    @patch("benchmark.Results.get_input", side_effect=iter(["r", "q"]))
    def test_be_list_twice(self, input_data):
        stdout, stderr = self.execute_script("be_list", ["-m", "STree"])
        self.assertEqual(stderr.getvalue(), "")
        self.check_output_file(stdout, "be_list_model_2")

    @patch("benchmark.Results.get_input", side_effect=iter(["e 2", "q"]))
    def test_be_list_report_excel(self, input_data):
        stdout, stderr = self.execute_script("be_list", ["-m", "STree"])
        self.assertEqual(stderr.getvalue(), "")
        self.check_output_file(stdout, "be_list_report_excel")
        book = load_workbook(Files.be_list_excel)
        sheet = book["STree"]
        self.check_excel_sheet(sheet, "excel")

    @patch(
        "benchmark.Results.get_input", side_effect=iter(["e 2", "e 1", "q"])
    )
    def test_be_list_report_excel_twice(self, input_data):
        stdout, stderr = self.execute_script("be_list", ["-m", "STree"])
        self.assertEqual(stderr.getvalue(), "")
        self.check_output_file(stdout, "be_list_report_excel_2")
        book = load_workbook(Files.be_list_excel)
        sheet = book["STree"]
        self.check_excel_sheet(sheet, "excel")
        sheet = book["STree2"]
        self.check_excel_sheet(sheet, "excel2")

    @patch("benchmark.Results.get_input", return_value="q")
    def test_be_list_no_data(self, input_data):
        stdout, stderr = self.execute_script(
            "be_list", ["-m", "Wodt", "-s", "f1-macro"]
        )
        self.assertEqual(stderr.getvalue(), "")
        self.assertEqual(stdout.getvalue(), f"{NO_RESULTS}\n")

    @patch(
        "benchmark.Results.get_input", side_effect=iter(["d 0", "y", "", "q"])
    )
    # @patch("benchmark.Results.get_input", side_effect=iter(["q"]))
    def test_be_list_delete(self, input_data):
        def copy_files(source_folder, target_folder, file_name):
            source = os.path.join(source_folder, file_name)
            target = os.path.join(target_folder, file_name)
            shutil.copyfile(source, target)

        file_name = (
            "results_accuracy_XGBoost_MacBookpro16_2022-05-04_11:00:"
            "35_0.json"
        )
        # move nan result from hidden to results
        copy_files(Folders.hidden_results, Folders.results, file_name)
        try:
            # list and delete result
            stdout, stderr = self.execute_script("be_list", "")
            self.assertEqual(stderr.getvalue(), "")
            self.check_output_file(stdout, "be_list_delete")
        except Exception:
            # delete the result copied if be_list couldn't
            os.unlink(os.path.join(Folders.results, file_name))
            self.fail("test_be_list_delete() should not raise exception")

    @patch(
        "benchmark.Results.get_input", side_effect=iter(["h 0", "y", "", "q"])
    )
    def test_be_list_hide(self, input_data):
        def swap_files(source_folder, target_folder, file_name):
            source = os.path.join(source_folder, file_name)
            target = os.path.join(target_folder, file_name)
            os.rename(source, target)

        file_name = (
            "results_accuracy_XGBoost_MacBookpro16_2022-05-04_11:00:"
            "35_0.json"
        )
        # move nan result from hidden to results
        swap_files(Folders.hidden_results, Folders.results, file_name)
        try:
            # list and move nan result to hidden again
            stdout, stderr = self.execute_script("be_list", "")
            self.assertEqual(stderr.getvalue(), "")
            self.check_output_file(stdout, "be_list_hide")
        except Exception:
            # delete the result copied if be_list couldn't
            swap_files(Folders.results, Folders.hidden_results, file_name)
            self.fail("test_be_list_hide() should not raise exception")

    @patch("benchmark.Results.get_input", side_effect=iter(["h 0", "q"]))
    def test_be_list_already_hidden(self, input_data):
        stdout, stderr = self.execute_script("be_list", ["--hidden"])
        self.assertEqual(stderr.getvalue(), "")
        self.check_output_file(stdout, "be_list_already_hidden")

    @patch("benchmark.Results.get_input", side_effect=iter(["h 0", "n", "q"]))
    def test_be_list_dont_hide(self, input_data):
        stdout, stderr = self.execute_script("be_list", "")
        self.assertEqual(stderr.getvalue(), "")
        self.check_output_file(stdout, "be_list_default")

    @patch("benchmark.Results.get_input", side_effect=iter(["q"]))
    def test_be_list_hidden_nan(self, input_data):
        stdout, stderr = self.execute_script("be_list", ["--hidden", "--nan"])
        self.assertEqual(stderr.getvalue(), "")
        self.check_output_file(stdout, "be_list_hidden_nan")

    @patch("benchmark.Results.get_input", side_effect=iter(["q"]))
    def test_be_list_hidden(self, input_data):
        stdout, stderr = self.execute_script("be_list", ["--hidden"])
        self.assertEqual(stderr.getvalue(), "")
        self.check_output_file(stdout, "be_list_hidden")

    def test_be_no_env(self):
        path = os.getcwd()
        os.chdir("..")
        stderr = None
        try:
            _, stderr = self.execute_script("be_list", [])
        except SystemExit as e:
            self.assertEqual(e.code, 1)
        finally:
            os.chdir(path)
        self.assertIsNone(stderr)


@@ -0,0 +1,196 @@
import os
import json
from io import StringIO
from unittest.mock import patch
from ...Results import Report
from ...Utils import Files, Folders
from ..TestBase import TestBase


class BeMainTest(TestBase):
    def setUp(self):
        self.prepare_scripts_env()
        self.score = "accuracy"
        self.files = []

    def tearDown(self) -> None:
        self.remove_files(self.files, ".")
        return super().tearDown()

    def test_be_main_dataset(self):
        stdout, _ = self.execute_script(
            "be_main",
            ["-m", "STree", "-d", "balloons", "--title", "test"],
        )
        self.check_output_lines(
            stdout=stdout,
            file_name="be_main_dataset",
            lines_to_compare=[0, 2, 3, 5, 6, 7, 8, 9, 11, 12, 13],
        )

    def test_be_main_complete(self):
        stdout, _ = self.execute_script(
            "be_main",
            ["-s", self.score, "-m", "STree", "--title", "test", "-r"],
        )
        # keep the report name to delete it after
        report_name = stdout.getvalue().splitlines()[-1].split("in ")[1]
        self.files.append(report_name)
        self.check_output_lines(
            stdout, "be_main_complete", [0, 2, 3, 5, 6, 7, 8, 9, 12, 13, 14]
        )

    def test_be_main_no_report(self):
        stdout, _ = self.execute_script(
            "be_main",
            ["-s", self.score, "-m", "STree", "--title", "test"],
        )
        # keep the report name to delete it after
        report_name = stdout.getvalue().splitlines()[-1].split("in ")[1]
        self.files.append(report_name)
        report = Report(file_name=report_name)
        with patch(self.output, new=StringIO()) as stdout:
            report.report()
        self.check_output_lines(
            stdout,
            "be_main_complete",
            [0, 2, 3, 5, 6, 7, 8, 9, 12, 13, 14],
        )

    def test_be_main_best_params(self):
        stdout, _ = self.execute_script(
            "be_main",
            [
                "-s",
                self.score,
                "-m",
                "STree",
                "--title",
                "test",
                "-b",
                "-r",
            ],
        )
        # keep the report name to delete it after
        report_name = stdout.getvalue().splitlines()[-1].split("in ")[1]
        self.files.append(report_name)
        self.check_output_lines(
            stdout, "be_main_best", [0, 2, 3, 5, 6, 7, 8, 9, 12, 13, 14]
        )

    @patch("sys.stdout", new_callable=StringIO)
    @patch("sys.stderr", new_callable=StringIO)
    def test_be_main_incompatible_params(self, stdout, stderr):
        m1 = (
            "be_main: error: argument -b/--best_paramfile: not allowed with "
            "argument -p/--hyperparameters"
        )
        m2 = (
            "be_main: error: argument -g/--grid_paramfile: not allowed with "
            "argument -p/--hyperparameters"
        )
        m3 = (
            "be_main: error: argument -g/--grid_paramfile: not allowed with "
            "argument -p/--hyperparameters"
        )
        m4 = m1
        p0 = [
            "-s",
            self.score,
            "-m",
            "SVC",
            "--title",
            "test",
        ]
        pset = json.dumps(dict(C=17))
        p1 = p0.copy()
        p1.extend(["-p", pset, "-b"])
        p2 = p0.copy()
        p2.extend(["-p", pset, "-g"])
        p3 = p0.copy()
        p3.extend(["-p", pset, "-g", "-b"])
        p4 = p0.copy()
        p4.extend(["-b", "-g"])
        parameters = [(p1, m1), (p2, m2), (p3, m3), (p4, m4)]
        for parameter, message in parameters:
            with self.assertRaises(SystemExit) as msg:
                module = self.search_script("be_main")
                module.main(parameter)
            self.assertEqual(msg.exception.code, 2)
            self.assertEqual(stderr.getvalue(), "")
            self.assertRegexpMatches(stdout.getvalue(), message)

    def test_be_main_best_params_non_existent(self):
        model = "GBC"
        stdout, stderr = self.execute_script(
            "be_main",
            [
                "-s",
                self.score,
                "-m",
                model,
                "--title",
                "test",
                "-b",
                "-r",
            ],
        )
        self.assertEqual(stderr.getvalue(), "")
        file_name = os.path.join(
            Folders.results, Files.best_results(self.score, model)
        )
        self.assertEqual(
            stdout.getvalue(),
            f"{file_name} does not exist\n",
        )

    def test_be_main_grid_non_existent(self):
        model = "GBC"
        stdout, stderr = self.execute_script(
            "be_main",
            [
                "-s",
                self.score,
                "-m",
                model,
                "--title",
                "test",
                "-g",
                "-r",
            ],
        )
        self.assertEqual(stderr.getvalue(), "")
        file_name = os.path.join(
            Folders.results, Files.grid_output(self.score, model)
        )
        self.assertEqual(
            stdout.getvalue(),
            f"{file_name} does not exist\n",
        )

    def test_be_main_grid_params(self):
        stdout, _ = self.execute_script(
            "be_main",
            [
                "-s",
                self.score,
                "-m",
                "STree",
                "--title",
                "test",
                "-g",
                "-r",
            ],
        )
        # keep the report name to delete it after
        report_name = stdout.getvalue().splitlines()[-1].split("in ")[1]
        self.files.append(report_name)
        self.check_output_lines(
            stdout, "be_main_grid", [0, 2, 3, 5, 6, 7, 8, 9, 12, 13, 14]
        )

    def test_be_main_no_data(self):
        stdout, _ = self.execute_script(
            "be_main", ["-m", "STree", "-d", "unknown", "--title", "test"]
        )
        self.assertEqual(stdout.getvalue(), "Unknown dataset: unknown\n")


@@ -0,0 +1,28 @@
from ..TestBase import TestBase
from ...Utils import NO_RESULTS


class BePairCheckTest(TestBase):
    def setUp(self):
        self.prepare_scripts_env()

    def test_be_pair_check(self):
        stdout, stderr = self.execute_script(
            "be_pair_check", ["-m1", "ODTE", "-m2", "STree"]
        )
        self.assertEqual(stderr.getvalue(), "")
        self.check_output_file(stdout, "paircheck")

    def test_be_pair_check_no_data_a(self):
        stdout, stderr = self.execute_script(
            "be_pair_check", ["-m1", "SVC", "-m2", "ODTE"]
        )
        self.assertEqual(stderr.getvalue(), "")
        self.assertEqual(stdout.getvalue(), f"{NO_RESULTS}\n")

    def test_be_pair_check_no_data_b(self):
        stdout, stderr = self.execute_script(
            "be_pair_check", ["-m1", "STree", "-m2", "SVC"]
        )
        self.assertEqual(stderr.getvalue(), "")
        self.assertEqual(stdout.getvalue(), f"{NO_RESULTS}\n")


@@ -0,0 +1,44 @@
import os
from ...Utils import Folders
from ..TestBase import TestBase


class BePrintStrees(TestBase):
    def setUp(self):
        self.prepare_scripts_env()
        self.score = "accuracy"
        self.files = []
        self.datasets = ["balloons", "balance-scale"]

    def tearDown(self) -> None:
        self.remove_files(self.files, ".")
        return super().tearDown()

    def test_be_print_strees_dataset_bn(self):
        for name in self.datasets:
            stdout, _ = self.execute_script(
                "be_print_strees",
                ["-d", name, "-q"],
            )
            file_name = os.path.join(Folders.img, f"stree_{name}.png")
            self.files.append(file_name)
            self.assertTrue(os.path.exists(file_name))
            self.assertEqual(
                stdout.getvalue(), f"File {file_name} generated\n"
            )
            computed_size = os.path.getsize(file_name)
            self.assertGreater(computed_size, 25000)

    def test_be_print_strees_dataset_color(self):
        for name in self.datasets:
            stdout, _ = self.execute_script(
                "be_print_strees",
                ["-d", name, "-q", "-c"],
            )
            file_name = os.path.join(Folders.img, f"stree_{name}.png")
            self.files.append(file_name)
            self.assertEqual(
                stdout.getvalue(), f"File {file_name} generated\n"
            )
            computed_size = os.path.getsize(file_name)
            self.assertGreater(computed_size, 30000)


@@ -0,0 +1,156 @@
import os
from openpyxl import load_workbook
from io import StringIO
from unittest.mock import patch
from ...Utils import Folders, Files
from ..TestBase import TestBase
from ..._version import __version__


class BeReportTest(TestBase):
    def setUp(self):
        self.prepare_scripts_env()

    def tearDown(self) -> None:
        files = [
            "results_accuracy_ODTE_Galgo_2022-04-20_10:52:20_0.sql",
            "results_accuracy_STree_iMac27_2021-09-30_11:42:07_0.xlsx",
        ]
        self.remove_files(files, Folders.results)
        self.remove_files([Files.datasets_report_excel], os.getcwd())
        return super().tearDown()

    def test_be_report(self):
        file_name = os.path.join(
            "results",
            "results_accuracy_STree_iMac27_2021-09-30_11:42:07_0.json",
        )
        stdout, stderr = self.execute_script("be_report", ["file", file_name])
        self.assertEqual(stderr.getvalue(), "")
        self.check_output_file(stdout, "report")

    def test_be_report_not_found(self):
        stdout, stderr = self.execute_script("be_report", ["file", "unknown"])
        self.assertEqual(stderr.getvalue(), "")
        self.assertEqual(stdout.getvalue(), "unknown does not exists!\n")

    def test_be_report_compare(self):
        file_name = "results_accuracy_STree_iMac27_2021-09-30_11:42:07_0.json"
        stdout, stderr = self.execute_script(
            "be_report", ["file", file_name, "-c"]
        )
        self.assertEqual(stderr.getvalue(), "")
        self.check_output_file(stdout, "report_compared")

    def test_be_report_datasets(self):
        stdout, stderr = self.execute_script("be_report", ["datasets"])
        self.assertEqual(stderr.getvalue(), "")
        file_name = f"report_datasets{self.ext}"
        with open(os.path.join(self.test_files, file_name)) as f:
            expected = f.read()
        output_text = stdout.getvalue().splitlines()
        for line, index in zip(expected.splitlines(), range(len(expected))):
            if self.benchmark_version in line:
                # replace benchmark version
                line = self.replace_benchmark_version(line, output_text, index)
            self.assertEqual(line, output_text[index])

    def test_be_report_datasets_excel(self):
        stdout, stderr = self.execute_script("be_report", ["datasets", "-x"])
        self.assertEqual(stderr.getvalue(), "")
        file_name = f"report_datasets{self.ext}"
        with open(os.path.join(self.test_files, file_name)) as f:
            expected = f.read()
        output_text = stdout.getvalue().splitlines()
        for line, index in zip(expected.splitlines(), range(len(expected))):
            if self.benchmark_version in line:
                # replace benchmark version
                line = self.replace_benchmark_version(line, output_text, index)
            self.assertEqual(line, output_text[index])
        file_name = os.path.join(os.getcwd(), Files.datasets_report_excel)
        book = load_workbook(file_name)
        sheet = book["Datasets"]
        self.check_excel_sheet(
            sheet,
            "exreport_excel_Datasets",
            replace=self.benchmark_version,
            with_this=__version__,
        )

    def test_be_report_best(self):
        stdout, stderr = self.execute_script(
            "be_report", ["best", "-s", "accuracy", "-m", "STree"]
        )
        self.assertEqual(stderr.getvalue(), "")
        self.check_output_file(stdout, "report_best")

    def test_be_report_grid(self):
        stdout, stderr = self.execute_script(
            "be_report", ["grid", "-s", "accuracy", "-m", "STree"]
        )
        self.assertEqual(stderr.getvalue(), "")
        file_name = "report_grid.test"
        with open(os.path.join(self.test_files, file_name)) as f:
            expected = f.read().splitlines()
        output_text = stdout.getvalue().splitlines()
        # Compare replacing STree version
        for line, index in zip(expected, range(len(expected))):
            if "1.2.4" in line:
                # replace STree version
                line = self.replace_STree_version(line, output_text, index)
            self.assertEqual(line, output_text[index])

    @patch("sys.stderr", new_callable=StringIO)
    def test_be_report_unknown_subcommand(self, stderr):
        with self.assertRaises(SystemExit) as msg:
            module = self.search_script("be_report")
            module.main(["unknown"])
        self.assertEqual(msg.exception.code, 2)
        self.check_output_file(stderr, "report_unknown_subcommand")

    def test_be_report_without_subcommand(self):
        stdout, stderr = self.execute_script("be_report", "")
        self.assertEqual(stderr.getvalue(), "")
        self.check_output_file(stdout, "report_without_subcommand")

    def test_be_report_excel_compared(self):
        file_name = "results_accuracy_STree_iMac27_2021-09-30_11:42:07_0.json"
        stdout, stderr = self.execute_script(
            "be_report",
            ["file", file_name, "-x", "-c"],
        )
        file_name = os.path.join(
            Folders.results, file_name.replace(".json", ".xlsx")
        )
        book = load_workbook(file_name)
        sheet = book["STree"]
        self.check_excel_sheet(sheet, "excel_compared")
        self.assertEqual(stderr.getvalue(), "")
        self.check_output_file(stdout, "report_compared")

    def test_be_report_excel(self):
        file_name = "results_accuracy_STree_iMac27_2021-09-30_11:42:07_0.json"
        stdout, stderr = self.execute_script(
            "be_report",
            ["file", file_name, "-x"],
        )
        file_name = os.path.join(
            Folders.results, file_name.replace(".json", ".xlsx")
        )
        book = load_workbook(file_name)
        sheet = book["STree"]
        self.check_excel_sheet(sheet, "excel")
        self.assertEqual(stderr.getvalue(), "")
        self.check_output_file(stdout, "report")

    def test_be_report_sql(self):
        file_name = "results_accuracy_ODTE_Galgo_2022-04-20_10:52:20_0.json"
        stdout, stderr = self.execute_script(
            "be_report",
            ["file", file_name, "-q"],
        )
        file_name = os.path.join(
            Folders.results, file_name.replace(".json", ".sql")
        )
        self.check_file_file(file_name, "sql")
        self.assertEqual(stderr.getvalue(), "")


@@ -0,0 +1,31 @@
from ..TestBase import TestBase


class BeSummaryTest(TestBase):
    def setUp(self):
        self.prepare_scripts_env()

    def tearDown(self) -> None:
        pass

    def test_be_summary_list_results_model(self):
        stdout, stderr = self.execute_script("be_summary", ["-m", "STree"])
        self.assertEqual(stderr.getvalue(), "")
        self.check_output_file(stdout, "be_summary_list_model")

    def test_be_summary_list_results_score(self):
        stdout, stderr = self.execute_script("be_summary", ["-s", "accuracy"])
        self.assertEqual(stderr.getvalue(), "")
        self.check_output_file(stdout, "be_summary_list_score")

    def test_be_summary_list_results_score_all(self):
        stdout, stderr = self.execute_script("be_summary", ["-s", "all"])
        self.assertEqual(stderr.getvalue(), "")
        self.check_output_file(stdout, "be_summary_list_score_all")

    def test_summary_list_results_model_score(self):
        stdout, stderr = self.execute_script(
            "be_summary", ["-s", "accuracy", "-m", "ODTE"]
        )
        self.assertEqual(stderr.getvalue(), "")
        self.check_output_file(stdout, "be_summary_list_score_model")


@@ -0,0 +1,32 @@
Dataset                        ODTE          RandomForest  STree
============================== ============= ============= =============
balance-scale                  0.96352±0.025 0.83616±0.026 0.97056±0.015
balloons                       0.78500±0.246 0.62500±0.250 0.86000±0.285
Model                          File Name                                                                      Score
============================== =========================================================================== ========
ODTE                           results/results_accuracy_ODTE_Galgo_2022-04-20_10:52:20_0.json              0.04341
RandomForest                   results/results_accuracy_RandomForest_iMac27_2022-01-14_12:39:30_0.json     0.03627
STree                          results/results_accuracy_STree_iMac27_2021-09-30_11:42:07_0.json            0.04544
****************************************************************************************************
Benchmark Ok
****************************************************************************************************
---------------------------------------------------------------------
Friedman test, objective maximize output variable accuracy. Obtained p-value: 1.3534e-01
Chi squared with 2 degrees of freedom statistic: 4.0000
Test accepted: p-value: 1.3534e-01 >= 0.0500
---------------------------------------------------------------------
Control post hoc test for output accuracy
Adjust method: Holm
Control method: STree
p-values:
ODTE 0.3173
RandomForest 0.0910
---------------------------------------------------------------------
$testMultiple
classifier pvalue rank win tie loss
STree STree NA 1 NA NA NA
ODTE ODTE 0.31731051 2 2 0 0
RandomForest RandomForest 0.09100053 3 2 0 0
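The reported Friedman p-value is easy to sanity-check: for a chi-squared statistic of 4.0000 on 2 degrees of freedom, the upper-tail probability is exp(-4/2) ≈ 0.13534, matching the 1.3534e-01 above. With scipy:

from scipy.stats import chi2

# Upper-tail probability of a chi-squared(2) variate at 4.0
print(chi2.sf(4.0, df=2))  # 0.1353352832366127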


@@ -0,0 +1,33 @@
Dataset                        ODTE          RandomForest  STree
============================== ============= ============= =============
balance-scale                  0.96352±0.025 0.83616±0.026 0.97056±0.015
balloons                       0.78500±0.246 0.62500±0.250 0.86000±0.285
Model                          File Name                                                                      Score
============================== =========================================================================== ========
ODTE                           results/results_accuracy_ODTE_Galgo_2022-04-20_10:52:20_0.json              0.04341
RandomForest                   results/results_accuracy_RandomForest_iMac27_2022-01-14_12:39:30_0.json     0.03627
STree                          results/results_accuracy_STree_iMac27_2021-09-30_11:42:07_0.json            0.04544
****************************************************************************************************
Benchmark Ok
****************************************************************************************************
---------------------------------------------------------------------
Friedman test, objective maximize output variable accuracy. Obtained p-value: 1.3534e-01
Chi squared with 2 degrees of freedom statistic: 4.0000
Test accepted: p-value: 1.3534e-01 >= 0.0500
---------------------------------------------------------------------
Control post hoc test for output accuracy
Adjust method: Holm
Control method: STree
p-values:
ODTE 0.3173
RandomForest 0.0910
---------------------------------------------------------------------
$testMultiple
classifier pvalue rank win tie loss
STree STree NA 1 NA NA NA
ODTE ODTE 0.31731051 2 2 0 0
RandomForest RandomForest 0.09100053 3 2 0 0
File exreport/exreport_accuracy.tex generated


@@ -0,0 +1,60 @@
balance-scale results_accuracy_RandomForest_iMac27_2022-01-14_12:39:30_0.json
----------------------------------------------------------------------------------------------------
0.8361600 {}
----------------------------------------------------------------------------------------------------
Test default parameters with RandomForest
****************************************************************************************************
balloons results_accuracy_STree_macbook-pro_2021-11-01_19:17:07_0.json
----------------------------------------------------------------------------------------------------
0.5566667 {"max_features": "auto", "splitter": "mutual"}
----------------------------------------------------------------------------------------------------
default B
****************************************************************************************************
balance-scale
----------------------------------------------------------------------------------------------------
1.0000000 ""
----------------------------------------------------------------------------------------------------
****************************************************************************************************
balloons
----------------------------------------------------------------------------------------------------
1.0000000 ""
----------------------------------------------------------------------------------------------------
****************************************************************************************************
balance-scale
----------------------------------------------------------------------------------------------------
1.0000000 ""
----------------------------------------------------------------------------------------------------
****************************************************************************************************
balloons
----------------------------------------------------------------------------------------------------
1.0000000 ""
----------------------------------------------------------------------------------------------------
****************************************************************************************************
balance-scale
----------------------------------------------------------------------------------------------------
1.0000000 ""
----------------------------------------------------------------------------------------------------
****************************************************************************************************
balloons
----------------------------------------------------------------------------------------------------
1.0000000 ""
----------------------------------------------------------------------------------------------------
****************************************************************************************************
balance-scale
----------------------------------------------------------------------------------------------------
1.0000000 ""
----------------------------------------------------------------------------------------------------
****************************************************************************************************
balloons
----------------------------------------------------------------------------------------------------
1.0000000 ""
----------------------------------------------------------------------------------------------------
****************************************************************************************************


@@ -0,0 +1,11 @@
******************************************************************************************************************************************************************
* Report Best accuracy Scores with ODTE in any platform *
******************************************************************************************************************************************************************
Dataset                        Score    File/Message                                                                 Hyperparameters
============================== ======== ============================================================================ =============================================
balance-scale                  0.963520 results_accuracy_ODTE_Galgo_2022-04-20_10:52:20_0.json                      {'base_estimator__C': 57, 'base_estimator__gamma': 0.1, 'base_estimator__kernel': 'rbf', 'base_estimator__multiclass_strategy': 'ovr', 'n_estimators': 100, 'n_jobs': -1}
balloons                       0.785000 results_accuracy_ODTE_Galgo_2022-04-20_10:52:20_0.json                      {'base_estimator__C': 5, 'base_estimator__gamma': 0.14, 'base_estimator__kernel': 'rbf', 'base_estimator__multiclass_strategy': 'ovr', 'n_estimators': 100, 'n_jobs': -1}
******************************************************************************************************************************************************************
* accuracy compared to STree_default (liblinear-ovr) .: 0.0434 *
******************************************************************************************************************************************************************


@@ -0,0 +1,105 @@
[
{
"n_jobs": [
-1
],
"n_estimators": [
100
],
"base_estimator__C": [
1.0
],
"base_estimator__kernel": [
"linear"
],
"base_estimator__multiclass_strategy": [
"ovo"
]
},
{
"n_jobs": [
-1
],
"n_estimators": [
100
],
"base_estimator__C": [
0.001,
0.0275,
0.05,
0.08,
0.2,
0.25,
0.95,
1.0,
1.75,
7,
10000.0
],
"base_estimator__kernel": [
"liblinear"
],
"base_estimator__multiclass_strategy": [
"ovr"
]
},
{
"n_jobs": [
-1
],
"n_estimators": [
100
],
"base_estimator__C": [
0.05,
1.0,
1.05,
2,
2.8,
2.83,
5,
7,
57,
10000.0
],
"base_estimator__gamma": [
0.001,
0.1,
0.14,
10.0,
"auto",
"scale"
],
"base_estimator__kernel": [
"rbf"
],
"base_estimator__multiclass_strategy": [
"ovr"
]
},
{
"n_jobs": [
-1
],
"n_estimators": [
100
],
"base_estimator__C": [
0.05,
0.2,
1.0,
8.25
],
"base_estimator__gamma": [
0.1,
"scale"
],
"base_estimator__kernel": [
"poly"
],
"base_estimator__multiclass_strategy": [
"ovo",
"ovr"
]
}
]
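Each dict above is an independent sub-grid, so the total number of candidate hyperparameter combinations is the sum of the per-dict products: 1 + 11 + 60 + 16 = 88. The list-of-dicts layout matches scikit-learn's ParameterGrid, so the count can be verified directly (hypothetical file name):

import json
from sklearn.model_selection import ParameterGrid

with open("grid_input_accuracy_ODTE.json") as f:
    grid = json.load(f)

print(len(ParameterGrid(grid)))  # 88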


@@ -0,0 +1,20 @@
{
"balance-scale": [
0.9119999999999999,
{
"C": 1.0,
"kernel": "liblinear",
"multiclass_strategy": "ovr"
},
"v. 1.2.4, Computed on iMac27 on 2022-05-07 at 23:29:25 took 0.962s"
],
"balloons": [
0.7,
{
"C": 1.0,
"kernel": "linear",
"multiclass_strategy": "ovr"
},
"v. 1.2.4, Computed on iMac27 on 2022-05-07 at 23:29:25 took 1.232s"
]
}


@@ -0,0 +1,10 @@
Creating folder test_project
Creating folder test_project/results
Creating folder test_project/hidden_results
Creating folder test_project/exreport
Creating folder test_project/exreport/exreport_output
Creating folder test_project/img
Done!
Please, edit .env file with your settings and add a datasets folder
with an all.txt file with the datasets you want to use.
In that folder you have to include all the datasets you'll use.
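The scaffolding implied by this output could be reproduced with os.makedirs; a sketch under the assumption that be_init_project simply creates the folder tree shown above:

import os

project = "test_project"
subfolders = ["results", "hidden_results", "exreport",
              os.path.join("exreport", "exreport_output"), "img"]
# Create the project root first, then each subfolder, echoing each step.
for path in [project] + [os.path.join(project, sub) for sub in subfolders]:
    print(f"Creating folder {path}")
    os.makedirs(path, exist_ok=True)
print("Done!")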


@@ -0,0 +1,2 @@
usage: be_init_project [-h] project_name
be_init_project: error: the following arguments are required: project_name
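This usage text matches argparse's default formatting; a sketch of a parser that would produce it (an assumption, not the actual be_init_project source):

import argparse

parser = argparse.ArgumentParser(prog="be_init_project")
parser.add_argument("project_name")
# Running without arguments exits with exactly the error shown above.
args = parser.parse_args()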


@@ -0,0 +1,5 @@
 # Date File Score Time(h) Title
=== ========== ================================================================ ======== ======= =======================
 0 2022-05-04 results_accuracy_XGBoost_MacBookpro16_2022-05-04_11:00:35_0.json nan 3.091 Default hyperparameters
 1 2021-11-01 results_accuracy_STree_iMac27_2021-11-01_23:55:16_0.json 0.97446 0.098 default
Already hidden


@@ -0,0 +1,7 @@
 # Date File Score Time(h) Title
=== ========== =============================================================== ======== ======= ============================================
 0 2022-04-20 results_accuracy_ODTE_Galgo_2022-04-20_10:52:20_0.json 0.04341 6.275 Gridsearched hyperparams v022.1b random_init
 1 2022-01-14 results_accuracy_RandomForest_iMac27_2022-01-14_12:39:30_0.json 0.03627 0.076 Test default parameters with RandomForest
 2 2021-11-01 results_accuracy_STree_macbook-pro_2021-11-01_19:17:07_0.json 0.03790 1.143 default B
 3 2021-10-27 results_accuracy_STree_iMac27_2021-10-27_09:40:40_0.json 0.04158 0.943 default A
 4 2021-09-30 results_accuracy_STree_iMac27_2021-09-30_11:42:07_0.json 0.04544 0.173 With gridsearched hyperparameters


@@ -0,0 +1,16 @@
 # Date File Score Time(h) Title
=== ========== ================================================================ ======== ======= ============================================
 0 2022-05-04 results_accuracy_XGBoost_MacBookpro16_2022-05-04_11:00:35_0.json nan 3.091 Default hyperparameters
 1 2022-04-20 results_accuracy_ODTE_Galgo_2022-04-20_10:52:20_0.json 0.04341 6.275 Gridsearched hyperparams v022.1b random_init
 2 2022-01-14 results_accuracy_RandomForest_iMac27_2022-01-14_12:39:30_0.json 0.03627 0.076 Test default parameters with RandomForest
 3 2021-11-01 results_accuracy_STree_macbook-pro_2021-11-01_19:17:07_0.json 0.03790 1.143 default B
 4 2021-10-27 results_accuracy_STree_iMac27_2021-10-27_09:40:40_0.json 0.04158 0.943 default A
 5 2021-09-30 results_accuracy_STree_iMac27_2021-09-30_11:42:07_0.json 0.04544 0.173 With gridsearched hyperparameters
Deleting results/results_accuracy_XGBoost_MacBookpro16_2022-05-04_11:00:35_0.json
 # Date File Score Time(h) Title
=== ========== =============================================================== ======== ======= ============================================
 0 2022-04-20 results_accuracy_ODTE_Galgo_2022-04-20_10:52:20_0.json 0.04341 6.275 Gridsearched hyperparams v022.1b random_init
 1 2022-01-14 results_accuracy_RandomForest_iMac27_2022-01-14_12:39:30_0.json 0.03627 0.076 Test default parameters with RandomForest
 2 2021-11-01 results_accuracy_STree_macbook-pro_2021-11-01_19:17:07_0.json 0.03790 1.143 default B
 3 2021-10-27 results_accuracy_STree_iMac27_2021-10-27_09:40:40_0.json 0.04158 0.943 default A
 4 2021-09-30 results_accuracy_STree_iMac27_2021-09-30_11:42:07_0.json 0.04544 0.173 With gridsearched hyperparameters


@@ -0,0 +1,4 @@
 # Date File Score Time(h) Title
=== ========== ================================================================ ======== ======= =======================
 0 2022-05-04 results_accuracy_XGBoost_MacBookpro16_2022-05-04_11:00:35_0.json nan 3.091 Default hyperparameters
 1 2021-11-01 results_accuracy_STree_iMac27_2021-11-01_23:55:16_0.json 0.97446 0.098 default


@@ -0,0 +1,3 @@
 # Date File Score Time(h) Title
=== ========== ================================================================ ======== ======= =======================
 0 2022-05-04 results_accuracy_XGBoost_MacBookpro16_2022-05-04_11:00:35_0.json nan 3.091 Default hyperparameters


@@ -0,0 +1,16 @@
 # Date File Score Time(h) Title
=== ========== ================================================================ ======== ======= ============================================
 0 2022-05-04 results_accuracy_XGBoost_MacBookpro16_2022-05-04_11:00:35_0.json nan 3.091 Default hyperparameters
 1 2022-04-20 results_accuracy_ODTE_Galgo_2022-04-20_10:52:20_0.json 0.04341 6.275 Gridsearched hyperparams v022.1b random_init
 2 2022-01-14 results_accuracy_RandomForest_iMac27_2022-01-14_12:39:30_0.json 0.03627 0.076 Test default parameters with RandomForest
 3 2021-11-01 results_accuracy_STree_macbook-pro_2021-11-01_19:17:07_0.json 0.03790 1.143 default B
 4 2021-10-27 results_accuracy_STree_iMac27_2021-10-27_09:40:40_0.json 0.04158 0.943 default A
 5 2021-09-30 results_accuracy_STree_iMac27_2021-09-30_11:42:07_0.json 0.04544 0.173 With gridsearched hyperparameters
Hiding results/results_accuracy_XGBoost_MacBookpro16_2022-05-04_11:00:35_0.json
 # Date File Score Time(h) Title
=== ========== =============================================================== ======== ======= ============================================
 0 2022-04-20 results_accuracy_ODTE_Galgo_2022-04-20_10:52:20_0.json 0.04341 6.275 Gridsearched hyperparams v022.1b random_init
 1 2022-01-14 results_accuracy_RandomForest_iMac27_2022-01-14_12:39:30_0.json 0.03627 0.076 Test default paramters with RandomForest
 2 2021-11-01 results_accuracy_STree_macbook-pro_2021-11-01_19:17:07_0.json 0.03790 1.143 default B
 3 2021-10-27 results_accuracy_STree_iMac27_2021-10-27_09:40:40_0.json 0.04158 0.943 default A
 4 2021-09-30 results_accuracy_STree_iMac27_2021-09-30_11:42:07_0.json 0.04544 0.173 With gridsearched hyperparameters
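Given the hidden_results folder created by be_init_project, the hide action shown above is consistent with moving the result file out of results/; a sketch under that assumption (not the actual be_list implementation):

import os
import shutil

# Hypothetical illustration: move the result file so listings no longer show it.
name = "results_accuracy_XGBoost_MacBookpro16_2022-05-04_11:00:35_0.json"
print(f"Hiding results/{name}")
shutil.move(os.path.join("results", name), os.path.join("hidden_results", name))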


@@ -0,0 +1,5 @@
 # Date File Score Time(h) Title
=== ========== ============================================================= ======== ======= =================================
 0 2021-11-01 results_accuracy_STree_macbook-pro_2021-11-01_19:17:07_0.json 0.03790 1.143 default B
 1 2021-10-27 results_accuracy_STree_iMac27_2021-10-27_09:40:40_0.json 0.04158 0.943 default A
 2 2021-09-30 results_accuracy_STree_iMac27_2021-09-30_11:42:07_0.json 0.04544 0.173 With gridsearched hyperparameters


@@ -0,0 +1,10 @@
 # Date File Score Time(h) Title
=== ========== ============================================================= ======== ======= =================================
 0 2021-11-01 results_accuracy_STree_macbook-pro_2021-11-01_19:17:07_0.json 0.03790 1.143 default B
 1 2021-10-27 results_accuracy_STree_iMac27_2021-10-27_09:40:40_0.json 0.04158 0.943 default A
 2 2021-09-30 results_accuracy_STree_iMac27_2021-09-30_11:42:07_0.json 0.04544 0.173 With gridsearched hyperparameters
 # Date File Score Time(h) Title
=== ========== ============================================================= ======== ======= =================================
 0 2021-11-01 results_accuracy_STree_macbook-pro_2021-11-01_19:17:07_0.json 0.03790 1.143 default B
 1 2021-10-27 results_accuracy_STree_iMac27_2021-10-27_09:40:40_0.json 0.04158 0.943 default A
 2 2021-09-30 results_accuracy_STree_iMac27_2021-09-30_11:42:07_0.json 0.04544 0.173 With gridsearched hyperparameters


@@ -0,0 +1,6 @@
 # Date File Score Time(h) Title
=== ========== ============================================================= ======== ======= =================================
 0 2021-11-01 results_accuracy_STree_macbook-pro_2021-11-01_19:17:07_0.json 0.03790 1.143 default B
 1 2021-10-27 results_accuracy_STree_iMac27_2021-10-27_09:40:40_0.json 0.04158 0.943 default A
 2 2021-09-30 results_accuracy_STree_iMac27_2021-09-30_11:42:07_0.json 0.04544 0.173 With gridsearched hyperparameters
Invalid option. Try again!


@@ -0,0 +1,20 @@
 # Date File Score Time(h) Title
=== ========== ============================================================= ======== ======= =================================
 0 2021-11-01 results_accuracy_STree_macbook-pro_2021-11-01_19:17:07_0.json 0.03790 1.143 default B
 1 2021-10-27 results_accuracy_STree_iMac27_2021-10-27_09:40:40_0.json 0.04158 0.943 default A
 2 2021-09-30 results_accuracy_STree_iMac27_2021-09-30_11:42:07_0.json 0.04544 0.173 With gridsearched hyperparameters
*************************************************************************************************************************
* STree ver. 1.2.3 Python ver. 3.11x with 5 Folds cross validation and 10 random seeds. 2021-11-01 19:17:07 *
* default B *
* Random seeds: [57, 31, 1714, 17, 23, 79, 83, 97, 7, 1] Stratified: False *
* Execution took 4115.04 seconds, 1.14 hours, on macbook-pro *
* Score is accuracy *
*************************************************************************************************************************
Dataset Sampl. Feat. Cls Nodes Leaves Depth Score Time Hyperparameters
============================== ====== ===== === ======= ======= ======= =============== ================= ===============
balance-scale 625 4 3 18.78 9.88 5.90 0.970000±0.0020 0.233304±0.0481 {'max_features': 'auto', 'splitter': 'mutual'}
balloons 16 4 2 4.72 2.86 2.78 0.556667±0.2941 0.021352±0.0058 {'max_features': 'auto', 'splitter': 'mutual'}
*************************************************************************************************************************
* accuracy compared to STree_default (liblinear-ovr) .: 0.0379 *
*************************************************************************************************************************


@@ -0,0 +1,7 @@
 # Date File Score Time(h) Title
=== ========== ============================================================= ======== ======= =================================
 0 2021-11-01 results_accuracy_STree_macbook-pro_2021-11-01_19:17:07_0.json 0.03790 1.143 default B
 1 2021-10-27 results_accuracy_STree_iMac27_2021-10-27_09:40:40_0.json 0.04158 0.943 default A
 2 2021-09-30 results_accuracy_STree_iMac27_2021-09-30_11:42:07_0.json 0.04544 0.173 With gridsearched hyperparameters
Added results/results_accuracy_STree_iMac27_2021-09-30_11:42:07_0.json to some_results.xlsx
Generated file: some_results.xlsx


@@ -0,0 +1,8 @@
 # Date File Score Time(h) Title
=== ========== ============================================================= ======== ======= =================================
 0 2021-11-01 results_accuracy_STree_macbook-pro_2021-11-01_19:17:07_0.json 0.03790 1.143 default B
 1 2021-10-27 results_accuracy_STree_iMac27_2021-10-27_09:40:40_0.json 0.04158 0.943 default A
 2 2021-09-30 results_accuracy_STree_iMac27_2021-09-30_11:42:07_0.json 0.04544 0.173 With gridsearched hyperparameters
Added results/results_accuracy_STree_iMac27_2021-09-30_11:42:07_0.json to some_results.xlsx
Added results/results_accuracy_STree_iMac27_2021-10-27_09:40:40_0.json to some_results.xlsx
Generated file: some_results.xlsx
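A sketch of how result files might be appended to a workbook with openpyxl (the actual report writer is not shown here; the "score" key and row layout are assumptions):

import json
from openpyxl import Workbook

wb = Workbook()
ws = wb.active
for name in ["results/results_accuracy_STree_iMac27_2021-09-30_11:42:07_0.json",
             "results/results_accuracy_STree_iMac27_2021-10-27_09:40:40_0.json"]:
    with open(name) as f:
        data = json.load(f)
    ws.append([name, data.get("score")])  # "score" key is an assumption
    print(f"Added {name} to some_results.xlsx")
wb.save("some_results.xlsx")
print("Generated file: some_results.xlsx")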


@@ -0,0 +1,16 @@
*************************************************************************************************************************
* STree ver. 1.2.4 Python ver. 3.11x with 5 Folds cross validation and 10 random seeds. 2022-05-09 00:15:25 *
* test *
* Random seeds: [57, 31, 1714, 17, 23, 79, 83, 97, 7, 1] Stratified: False *
* Execution took 0.80 seconds, 0.00 hours, on iMac27 *
* Score is accuracy *
*************************************************************************************************************************
Dataset Sampl. Feat. Cls Nodes Leaves Depth Score Time Hyperparameters
============================== ====== ===== === ======= ======= ======= =============== ================= ===============
balance-scale 625 4 3 23.32 12.16 6.44 0.840160±0.0304 0.013745±0.0019 {'splitter': 'best', 'max_features': 'auto'}
balloons 16 4 2 3.00 2.00 2.00 0.860000±0.2850 0.000388±0.0000 {'C': 7, 'gamma': 0.1, 'kernel': 'rbf', 'max_iter': 10000.0, 'multiclass_strategy': 'ovr'}
*************************************************************************************************************************
* accuracy compared to STree_default (liblinear-ovr) .: 0.0422 *
*************************************************************************************************************************
Results in results/results_accuracy_STree_iMac27_2022-05-09_00:15:25_0.json


@@ -0,0 +1,16 @@
*************************************************************************************************************************
* STree ver. 1.2.4 Python ver. 3.11x with 5 Folds cross validation and 10 random seeds. 2022-05-08 20:14:43 *
* test *
* Random seeds: [57, 31, 1714, 17, 23, 79, 83, 97, 7, 1] Stratified: False *
* Execution took 0.48 seconds, 0.00 hours, on iMac27 *
* Score is accuracy *
*************************************************************************************************************************
Dataset Sampl. Feat. Cls Nodes Leaves Depth Score Time Hyperparameters
============================== ====== ===== === ======= ======= ======= =============== ================= ===============
balance-scale 625 4 3 17.36 9.18 6.18 0.908480±0.0247 0.007388±0.0013 {}
balloons 16 4 2 4.64 2.82 2.66 0.663333±0.3009 0.000664±0.0002 {}
*************************************************************************************************************************
* accuracy compared to STree_default (liblinear-ovr) .: 0.0390 *
*************************************************************************************************************************
Results in results/results_accuracy_STree_iMac27_2022-05-08_20:14:43_0.json


@@ -0,0 +1,15 @@
*************************************************************************************************************************
* STree ver. 1.2.4 Python ver. 3.11x with 5 Folds cross validation and 10 random seeds. 2022-05-08 19:38:28 *
* Test with only one dataset *
* Random seeds: [57, 31, 1714, 17, 23, 79, 83, 97, 7, 1] Stratified: False *
* Execution took 0.06 seconds, 0.00 hours, on iMac27 *
* Score is accuracy *
*************************************************************************************************************************
Dataset Sampl. Feat. Cls Nodes Leaves Depth Score Time Hyperparameters
============================== ====== ===== === ======= ======= ======= =============== ================= ===============
balloons 16 4 2 4.64 2.82 2.66 0.663333±0.3009 0.000671±0.0001 {}
*************************************************************************************************************************
* accuracy compared to STree_default (liblinear-ovr) .: 0.0165 *
*************************************************************************************************************************
Partial result file removed: results/results_accuracy_STree_iMac27_2022-05-08_19:38:28_0.json


@@ -0,0 +1,16 @@
*************************************************************************************************************************
* STree ver. 1.2.4 Python ver. 3.11x with 5 Folds cross validation and 10 random seeds. 2022-05-09 00:21:06 *
* test *
* Random seeds: [57, 31, 1714, 17, 23, 79, 83, 97, 7, 1] Stratified: False *
* Execution took 0.89 seconds, 0.00 hours, on iMac27 *
* Score is accuracy *
*************************************************************************************************************************
Dataset Sampl. Feat. Cls Nodes Leaves Depth Score Time Hyperparameters
============================== ====== ===== === ======= ======= ======= =============== ================= ===============
balance-scale 625 4 3 26.12 13.56 7.94 0.910720±0.0249 0.015852±0.0027 {'C': 1.0, 'kernel': 'liblinear', 'multiclass_strategy': 'ovr'}
balloons 16 4 2 4.64 2.82 2.66 0.663333±0.3009 0.000640±0.0001 {'C': 1.0, 'kernel': 'linear', 'multiclass_strategy': 'ovr'}
*************************************************************************************************************************
* accuracy compared to STree_default (liblinear-ovr) .: 0.0391 *
*************************************************************************************************************************
Results in results/results_accuracy_STree_iMac27_2022-05-09_00:21:06_0.json


@@ -0,0 +1,35 @@
*********************************************************************************
* BEST RESULT of accuracy for STree *
*-------------------------------------------------------------------------------*
* *
*  With gridsearched hyperparameters  *
* *
* Model: STree Ver. 1.2.3 Score: accuracy Metric:  0.0454434 *
* *
* Date : 2021-09-30  Time: 11:42:07 Time Spent:  624.25 secs. *
* Seeds: [57, 31, 1714, 17, 23, 79, 83, 97, 7, 1] Platform: iMac27 *
* Stratified: False *
* results_accuracy_STree_iMac27_2021-09-30_11:42:07_0.json *
* *
*********************************************************************************
*********************************************************************************
* BEST RESULT of accuracy *
*-------------------------------------------------------------------------------*
* *
*  With gridsearched hyperparameters  *
* *
* Model: STree Ver. 1.2.3 Score: accuracy Metric:  0.0454434 *
* *
* Date : 2021-09-30  Time: 11:42:07 Time Spent:  624.25 secs. *
* Seeds: [57, 31, 1714, 17, 23, 79, 83, 97, 7, 1] Platform: iMac27 *
* Stratified: False *
* results_accuracy_STree_iMac27_2021-09-30_11:42:07_0.json *
* *
*********************************************************************************
 # Date File Score Time(h) Title
=== ========== =============================================================== ======== ======= ============================================
 0 2021-09-30 results_accuracy_STree_iMac27_2021-09-30_11:42:07_0.json 0.04544 0.173 With gridsearched hyperparameters
 1 2022-04-20 results_accuracy_ODTE_Galgo_2022-04-20_10:52:20_0.json 0.04341 6.275 Gridsearched hyperparams v022.1b random_init
 2 2021-10-27 results_accuracy_STree_iMac27_2021-10-27_09:40:40_0.json 0.04158 0.943 default A
 3 2021-11-01 results_accuracy_STree_macbook-pro_2021-11-01_19:17:07_0.json 0.03790 1.143 default B
 4 2022-01-14 results_accuracy_RandomForest_iMac27_2022-01-14_12:39:30_0.json 0.03627 0.076 Test default parameters with RandomForest


@@ -0,0 +1,35 @@
*********************************************************************************
* BEST RESULT of accuracy for ODTE *
*-------------------------------------------------------------------------------*
* *
*  Gridsearched hyperparams v022.1b random_init  *
* *
* Model: ODTE Ver. 0.3.2 Score: accuracy Metric:  0.0434068 *
* *
* Date : 2022-04-20  Time: 10:52:20 Time Spent: 22,591.47 secs. *
* Seeds: [57, 31, 1714, 17, 23, 79, 83, 97, 7, 1] Platform: Galgo *
* Stratified: False *
* results_accuracy_ODTE_Galgo_2022-04-20_10:52:20_0.json *
* *
*********************************************************************************
*********************************************************************************
* BEST RESULT of accuracy *
*-------------------------------------------------------------------------------*
* *
*  With gridsearched hyperparameters  *
* *
* Model: STree Ver. 1.2.3 Score: accuracy Metric:  0.0454434 *
* *
* Date : 2021-09-30  Time: 11:42:07 Time Spent:  624.25 secs. *
* Seeds: [57, 31, 1714, 17, 23, 79, 83, 97, 7, 1] Platform: iMac27 *
* Stratified: False *
* results_accuracy_STree_iMac27_2021-09-30_11:42:07_0.json *
* *
*********************************************************************************
 # Date File Score Time(h) Title
=== ========== =============================================================== ======== ======= ============================================
 0 2021-09-30 results_accuracy_STree_iMac27_2021-09-30_11:42:07_0.json 0.04544 0.173 With gridsearched hyperparameters
 1 2022-04-20 results_accuracy_ODTE_Galgo_2022-04-20_10:52:20_0.json 0.04341 6.275 Gridsearched hyperparams v022.1b random_init
 2 2021-10-27 results_accuracy_STree_iMac27_2021-10-27_09:40:40_0.json 0.04158 0.943 default A
 3 2021-11-01 results_accuracy_STree_macbook-pro_2021-11-01_19:17:07_0.json 0.03790 1.143 default B
 4 2022-01-14 results_accuracy_RandomForest_iMac27_2022-01-14_12:39:30_0.json 0.03627 0.076 Test default parameters with RandomForest

Some files were not shown because too many files have changed in this diff.