Ricardo Montañana Gómez e36d9af8f9 Fix BinDisc quantile mistakes (#9)
2024-07-04 17:27:39 +02:00


mdlp

Discretization algorithm based on the paper by Fayyad & Irani, "Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning".

The implementation tries to mitigate the problem of examples that share the same value of the variable but have different label values:

  • Sorts the values of the variable using the label values as a tie-breaker
  • Once a valid candidate for the split is found, it checks whether the previous value is the same as the current one; if so, it tries to use the previous one, or the next one if that is not possible.
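
The sorting step above can be sketched as follows. This is a minimal illustration, not the library's code: the helper name `sorted_indices` and the `float`/`int` element types are assumptions made for the example. It orders the sample indices by the value of the variable and falls back to the label when two values are equal.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper (not part of the library's API): returns the indices
// of X ordered by value, using the label in y as a tie-breaker when two
// values of the variable are equal.
std::vector<std::size_t> sorted_indices(const std::vector<float>& X,
                                        const std::vector<int>& y)
{
    std::vector<std::size_t> idx(X.size());
    std::iota(idx.begin(), idx.end(), 0);  // 0, 1, 2, ...
    std::stable_sort(idx.begin(), idx.end(),
        [&](std::size_t a, std::size_t b) {
            if (X[a] == X[b]) {
                return y[a] < y[b];  // same value: order by label
            }
            return X[a] < X[b];      // otherwise: order by value
        });
    return idx;
}
```

With this ordering, examples that share a value are grouped by label, so a run of identical values is visited in a deterministic order regardless of how the input was arranged.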

Other features:

  • Intervals with the same value of the variable are not taken into account for cutpoints.

  • Intervals must contain more than two examples to be evaluated (mdlp).

  • The algorithm returns the cut points for the variable.

  • The transform method maps each value x to the index i of the interval that contains it, determined from the cut points in the following way:

      cut[i - 1] <= x < cut[i]
    

    using std::upper_bound.

  • K-Bins discretization is also implemented, with "quantile" and "uniform" strategies available.
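
As an illustration of the transform rule listed above, the following sketch looks up the interval index of a single value with `std::upper_bound`. The function name `interval_index` and the element types are hypothetical, and the sketch assumes that the cut points are sorted in ascending order and contain only interior boundaries; the library's own handling of the range limits may differ.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper (not part of the library's API): returns the index i
// of the interval that contains x, so that cuts[i - 1] <= x < cuts[i].
// Assumes cuts is sorted in ascending order.
// Values below cuts[0] get index 0; values at or above the last cut point
// get index cuts.size().
int interval_index(const std::vector<float>& cuts, float x)
{
    // std::upper_bound returns an iterator to the first cut point that is
    // strictly greater than x, or cuts.end() if there is none.
    auto it = std::upper_bound(cuts.begin(), cuts.end(), x);
    return static_cast<int>(it - cuts.begin());
}
```

Because `std::upper_bound` finds the first element strictly greater than x, a value equal to a cut point falls into the interval that starts at that cut point, which matches the closed lower bound in the rule.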

Sample

To run the sample, just execute the following commands:

make build
build_release/sample/sample -f iris -m 2
build_release/sample/sample -h

Test

To run the tests and see coverage (llvm with lcov and genhtml have to be installed), execute the following commands:

make test