From ba193c01dc5b34633d7a66483f3454edd939241b Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ricardo=20Monta=C3=B1ana=20G=C3=B3mez?= Date: Mon, 10 May 2021 12:35:34 +0200 Subject: [PATCH] Updated Home (markdown) --- Home.md | 43 +++++++++++++++++++++++++------------------ 1 file changed, 25 insertions(+), 18 deletions(-) diff --git a/Home.md b/Home.md index 567b016..b8ac6e9 100644 --- a/Home.md +++ b/Home.md @@ -8,15 +8,16 @@ Oblique Tree classifier based on SVM nodes. The nodes are built and splitted wit ![Stree](https://raw.github.com/doctorado-ml/stree/master/example.png) -[[Information relative to STree experimentation]] - - ## Installation ```bash pip install git+https://github.com/doctorado-ml/stree ``` +## Documentation + +Can be found in + ## Examples ### Jupyter notebooks @@ -33,21 +34,23 @@ pip install git+https://github.com/doctorado-ml/stree ## Hyperparameters -| | **Hyperparameter** | **Type/Values** | **Default** | **Meaning** | -| --- | ------------------ | ------------------------------------------------------ | ----------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| \* | C | \ | 1.0 | Regularization parameter. The strength of the regularization is inversely proportional to C. Must be strictly positive. | -| \* | kernel | {"linear", "poly", "rbf"} | linear | Specifies the kernel type to be used in the algorithm. It must be one of ‘linear’, ‘poly’ or ‘rbf’. | -| \* | max_iter | \ | 1e5 | Hard limit on iterations within solver, or -1 for no limit. 
| -| \* | random_state | \ | None | Controls the pseudo random number generation for shuffling the data for probability estimates. Ignored when probability is False.
Pass an int for reproducible output across multiple function calls | -| | max_depth | \ | None | Specifies the maximum depth of the tree | -| \* | tol | \ | 1e-4 | Tolerance for stopping criterion. | -| \* | degree | \ | 3 | Degree of the polynomial kernel function (‘poly’). Ignored by all other kernels. | -| \* | gamma | {"scale", "auto"} or \ | scale | Kernel coefficient for ‘rbf’ and ‘poly’.
if gamma='scale' (default) is passed then it uses 1 / (n_features \* X.var()) as value of gamma,
if ‘auto’, uses 1 / n_features. | -| | split_criteria | {"impurity", "max_samples"} | impurity | Decides (just in case of a multi class classification) which column (class) use to split the dataset in a node\*\* | -| | criterion | {“gini”, “entropy”} | entropy | The function to measure the quality of a split (only used if max_features != num_features).
Supported criteria are “gini” for the Gini impurity and “entropy” for the information gain. | -| | min_samples_split | \ | 0 | The minimum number of samples required to split an internal node. 0 (default) for any | -| | max_features | \, \

or {“auto”, “sqrt”, “log2”} | None | The number of features to consider when looking for the split:
If int, then consider max_features features at each split.
If float, then max_features is a fraction and int(max_features \* n_features) features are considered at each split.
If “auto”, then max_features=sqrt(n_features).
If “sqrt”, then max_features=sqrt(n_features).
If “log2”, then max_features=log2(n_features).
If None, then max_features=n_features. | -| | splitter | {"best", "random"} | random | The strategy used to choose the feature set at each node (only used if max_features != num_features).
Supported strategies are “best” to choose the best feature set and “random” to choose a random combination.
The algorithm generates 5 candidates at most to choose from in both strategies. | +| | **Hyperparameter** | **Type/Values** | **Default** | **Meaning** | +| --- | ------------------- | ------------------------------------------------------ | ----------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| \* | C | \ | 1.0 | Regularization parameter. The strength of the regularization is inversely proportional to C. Must be strictly positive. | +| \* | kernel | {"liblinear", "linear", "poly", "rbf", "sigmoid"} | linear | Specifies the kernel type to be used in the algorithm. It must be one of ‘liblinear’, ‘linear’, ‘poly’ or ‘rbf’. liblinear uses [liblinear](https://www.csie.ntu.edu.tw/~cjlin/liblinear/) library and the rest uses [libsvm](https://www.csie.ntu.edu.tw/~cjlin/libsvm/) library through scikit-learn library | +| \* | max_iter | \ | 1e5 | Hard limit on iterations within solver, or -1 for no limit. | +| \* | random_state | \ | None | Controls the pseudo random number generation for shuffling the data for probability estimates. Ignored when probability is False.
Pass an int for reproducible output across multiple function calls | +| | max_depth | \ | None | Specifies the maximum depth of the tree | +| \* | tol | \ | 1e-4 | Tolerance for stopping criterion. | +| \* | degree | \ | 3 | Degree of the polynomial kernel function (‘poly’). Ignored by all other kernels. | +| \* | gamma | {"scale", "auto"} or \ | scale | Kernel coefficient for ‘rbf’ and ‘poly’.
if gamma='scale' (default) is passed then it uses 1 / (n_features \* X.var()) as value of gamma,
if ‘auto’, uses 1 / n_features. | +| | split_criteria | {"impurity", "max_samples"} | impurity | Decides (only in multiclass classification) which column (class) to use to split the dataset in a node\*\*. max_samples is incompatible with the 'ovo' multiclass_strategy | +| | criterion | {“gini”, “entropy”} | entropy | The function to measure the quality of a split (only used if max_features != num_features).
Supported criteria are “gini” for the Gini impurity and “entropy” for the information gain. | +| | min_samples_split | \ | 0 | The minimum number of samples required to split an internal node. 0 (default) for any | +| | max_features | \, \

or {“auto”, “sqrt”, “log2”} | None | The number of features to consider when looking for the split:
If int, then consider max_features features at each split.
If float, then max_features is a fraction and int(max_features \* n_features) features are considered at each split.
If “auto”, then max_features=sqrt(n_features).
If “sqrt”, then max_features=sqrt(n_features).
If “log2”, then max_features=log2(n_features).
If None, then max_features=n_features. | +| | splitter | {"best", "random", "mutual"} | "random" | The strategy used to choose the feature set at each node (only used if max_features < num_features). Supported strategies are: **“best”**: scikit-learn's SelectKBest is used at every node to choose the max_features best features. **“random”**: The algorithm generates 5 candidates and chooses one randomly. **"mutual"**: Chooses the best features according to their mutual information with the label | +| | normalize | \ | False | Whether to standardize the features at each node using the samples that reach it | +| \* | multiclass_strategy | {"ovo", "ovr"} | "ovo" | Strategy to use with multiclass datasets. **"ovo"**: one versus one. **"ovr"**: one versus rest | \* Hyperparameter used by the support vector classifier of every node @@ -64,3 +67,7 @@ Once we have the column to take into account for the split, the algorithm splits ```bash python -m unittest -v stree.tests ``` + +## License + +STree is [MIT](https://github.com/doctorado-ml/stree/blob/master/LICENSE) licensed