mirror of
https://github.com/rmontanana/mdlp.git
synced 2025-08-15 15:35:55 +00:00
v1.1.2
- Fix a major bug in the sortIndices method (removed an unneeded loop).
- Add three hyperparameters to the algorithm:
  * max_depth: maximum level of recursion when looking for cut point candidates.
  * min_length: minimum length of the interval of samples to be searched for candidates.
  * max_cut: maximum number of cut points. This can be given in two ways: as a natural number, meaning the maximum number of cut points in each feature of the dataset, or as a number in the range (0, 1), meaning a proportion of the number of samples.
mdlp
Discretization algorithm based on the paper by Fayyad & Irani, "Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning".
The implementation tries to mitigate the problem of different label values with the same value of the variable:
- Sorts the values of the variable using the label values as a tie-breaker
- Once a valid candidate for the split is found, it checks whether the previous value is the same as the current one; if so, it tries to use the previous value instead, or the next one if that is not possible.
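The tie-breaking sort described above can be sketched as follows. This is only an illustration under stated assumptions: `sortIndicesSketch` is a hypothetical name, not the library's API, and the feature is assumed to be a `std::vector<float>` with integer labels.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper (not the library's API): returns the positions of X
// ordered by value, using the label in y to break ties between equal values.
std::vector<std::size_t> sortIndicesSketch(const std::vector<float>& X,
                                           const std::vector<int>& y)
{
    assert(X.size() == y.size());
    std::vector<std::size_t> idx(X.size());
    std::iota(idx.begin(), idx.end(), 0);  // fill with 0, 1, ..., n-1
    std::stable_sort(idx.begin(), idx.end(),
        [&](std::size_t a, std::size_t b) {
            if (X[a] == X[b]) {
                return y[a] < y[b];  // same value: order by label
            }
            return X[a] < X[b];
        });
    return idx;
}
```

Sorting indices rather than the values themselves leaves the original data untouched, so samples with the same value but different labels end up grouped by label inside each run of equal values.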
Other features:
- Intervals with the same value of the variable are not taken into account for cutpoints.
- Intervals must contain more than two examples to be evaluated.
The algorithm returns the cut points for the variable.
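For context, the acceptance test from the Fayyad & Irani paper that decides whether a candidate becomes a cut point can be sketched as below. This is a minimal sketch, not the library's code: `classStats` and `mdlpAccepts` are hypothetical names, the labels are assumed to be already ordered by the value of the variable, and the hyperparameters and tie handling described in this README are not modelled.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <vector>

// Entropy and number of distinct classes of the labels in [begin, end).
struct ClassStats {
    double entropy;
    std::size_t classes;
};

// Hypothetical helper: class entropy (in bits) of a slice of the labels.
ClassStats classStats(const std::vector<int>& y, std::size_t begin, std::size_t end)
{
    std::map<int, std::size_t> counts;
    for (std::size_t i = begin; i < end; ++i) {
        ++counts[y[i]];
    }
    const double n = static_cast<double>(end - begin);
    double entropy = 0.0;
    for (const auto& kv : counts) {
        const double p = static_cast<double>(kv.second) / n;
        entropy -= p * std::log2(p);
    }
    return {entropy, counts.size()};
}

// Hypothetical helper: MDLP criterion for splitting y into [0, cut) and [cut, n).
// Accept when Gain > (log2(N - 1) + delta) / N, with
// delta = log2(3^k - 2) - (k*Ent(S) - k1*Ent(S1) - k2*Ent(S2)).
bool mdlpAccepts(const std::vector<int>& y, std::size_t cut)
{
    const std::size_t n = y.size();
    assert(cut > 0 && cut < n);
    const ClassStats s = classStats(y, 0, n);
    const ClassStats s1 = classStats(y, 0, cut);
    const ClassStats s2 = classStats(y, cut, n);
    const double N = static_cast<double>(n);
    const double gain = s.entropy
        - (static_cast<double>(cut) / N) * s1.entropy
        - (static_cast<double>(n - cut) / N) * s2.entropy;
    const double delta = std::log2(std::pow(3.0, static_cast<double>(s.classes)) - 2.0)
        - (static_cast<double>(s.classes) * s.entropy
           - static_cast<double>(s1.classes) * s1.entropy
           - static_cast<double>(s2.classes) * s2.entropy);
    return gain > (std::log2(N - 1.0) + delta) / N;
}
```

With labels `{0, 0, 0, 0, 1, 1, 1, 1}` a cut at position 4 separates the classes perfectly and is accepted, while the same cut on `{0, 1, 0, 1, 0, 1, 0, 1}` yields no information gain and is rejected.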
Sample
To run the sample, just execute the following commands:
cd sample
mkdir build
cd build
cmake ..
make
./sample -f iris -m 2
./sample -h
Test
To run the tests, execute the following commands:
cd tests
./test