mirror of
https://github.com/rmontanana/mdlp.git
synced 2025-08-15 07:25:56 +00:00
CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Project Overview
This is a C++ implementation of the MDLP (Minimum Description Length Principle) discretization algorithm based on Fayyad & Irani's paper. The library provides discretization methods for continuous-valued attributes in classification learning.
Build System
The project uses CMake with a Makefile wrapper for common tasks:
Common Commands
- `make build` - Build release version with sample program
- `make test` - Run full test suite with coverage report
- `make install` - Install the library
Build Configurations
- Release: Built in the `build_release/` directory
- Debug: Built in the `build_debug/` directory (for testing)
Dependencies
- PyTorch (libtorch) - Required dependency
- GoogleTest - Fetched automatically for testing
- Coverage tools: lcov, genhtml
Code Architecture
Core Components
- Discretizer (`src/Discretizer.h/cpp`) - Abstract base class for all discretizers
- CPPFImdlp (`src/CPPFImdlp.h/cpp`) - Main MDLP algorithm implementation
- BinDisc (`src/BinDisc.h/cpp`) - K-bins discretization (quantile/uniform strategies)
- Metrics (`src/Metrics.h/cpp`) - Entropy and information gain calculations
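To make the two K-bins strategies concrete, here is a minimal sketch of how uniform and quantile cut points could be computed. It is not taken from `BinDisc`: the function names `uniform_cut_points` and `quantile_cut_points` are hypothetical, plain `std::vector<double>` stands in for the library's own types, and the linear interpolation used for quantiles is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: interior cut points for n_bins equal-width bins
// spanning [min(x), max(x)].
std::vector<double> uniform_cut_points(const std::vector<double>& x, std::size_t n_bins) {
    assert(!x.empty() && n_bins >= 1);
    const auto mm = std::minmax_element(x.begin(), x.end());
    const double lo = *mm.first;
    const double width = (*mm.second - lo) / static_cast<double>(n_bins);
    std::vector<double> cuts;
    for (std::size_t i = 1; i < n_bins; ++i) {
        cuts.push_back(lo + width * static_cast<double>(i));
    }
    return cuts;
}

// Hypothetical helper: interior cut points at the i/n_bins quantiles
// (i = 1..n_bins-1), using linear interpolation between sorted neighbours.
std::vector<double> quantile_cut_points(std::vector<double> x, std::size_t n_bins) {
    assert(!x.empty() && n_bins >= 1);
    std::sort(x.begin(), x.end());
    std::vector<double> cuts;
    for (std::size_t i = 1; i < n_bins; ++i) {
        const double pos = static_cast<double>(i) * static_cast<double>(x.size() - 1) /
                           static_cast<double>(n_bins);
        const std::size_t below = static_cast<std::size_t>(pos);
        const std::size_t above = std::min(below + 1, x.size() - 1);
        const double frac = pos - static_cast<double>(below);
        cuts.push_back(x[below] + frac * (x[above] - x[below]));
    }
    return cuts;
}
```

On skewed data the two strategies diverge: uniform cuts follow the value range, while quantile cuts follow where the samples actually lie.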
Key Data Types
- `samples_t` - Input data samples
- `labels_t` - Classification labels
- `indices_t` - Index arrays for sorting/processing
- `precision_t` - Floating-point precision type
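As an illustration of how these types could fit together, the sketch below sorts an index array by sample value, breaking ties by label. The alias definitions are assumptions made for this example (the library headers are authoritative), and `sorted_indices` is a hypothetical helper, not a library function.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Assumed definitions for illustration only; check the headers in src/
// for the real ones.
using precision_t = float;
using samples_t = std::vector<precision_t>;
using labels_t = std::vector<int>;
using indices_t = std::vector<std::size_t>;

// Hypothetical helper: returns the positions of X ordered by value,
// using the label as a tie-breaker for identical values.
indices_t sorted_indices(const samples_t& X, const labels_t& y) {
    assert(X.size() == y.size());
    indices_t idx(X.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    std::stable_sort(idx.begin(), idx.end(), [&](std::size_t a, std::size_t b) {
        if (X[a] == X[b]) return y[a] < y[b];
        return X[a] < X[b];
    });
    return idx;
}
```

Sorting indices rather than the samples themselves leaves the input untouched and keeps values and labels aligned through a single permutation.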
Algorithm Flow
- Data is sorted using labels as tie-breakers for identical values
- MDLP recursively finds optimal cut points using entropy-based criteria
- Cut points are validated to ensure meaningful splits
- Transform method maps continuous values to discrete bins
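The flow above can be sketched in a few dozen lines. This is a simplified illustration under stated assumptions, not the `CPPFImdlp` implementation: it expects data that is already sorted, recomputes entropies naively, places each cut at the midpoint between neighbouring values, and uses hypothetical names (`RangeStats`, `range_stats`, `find_cuts`, `to_bin`). The acceptance test is the MDLP criterion from Fayyad & Irani: a cut is kept only if its information gain exceeds (log2(N - 1) + delta) / N.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <vector>

struct RangeStats {
    double entropy;       // Shannon entropy in bits
    std::size_t classes;  // number of distinct labels
};

// Entropy and class count of y[start, end).
RangeStats range_stats(const std::vector<int>& y, std::size_t start, std::size_t end) {
    std::map<int, std::size_t> counts;
    for (std::size_t i = start; i < end; ++i) ++counts[y[i]];
    const double n = static_cast<double>(end - start);
    double h = 0.0;
    for (const auto& kv : counts) {
        const double p = static_cast<double>(kv.second) / n;
        h -= p * std::log2(p);
    }
    return {h, counts.size()};
}

// Recursively collects accepted cut points for x[start, end), where x is
// already sorted and y holds the matching labels.
void find_cuts(const std::vector<double>& x, const std::vector<int>& y,
               std::size_t start, std::size_t end, std::vector<double>& cuts) {
    assert(x.size() == y.size() && start <= end && end <= x.size());
    const std::size_t n = end - start;
    if (n < 2) return;
    const RangeStats all = range_stats(y, start, end);

    // Pick the boundary between distinct values with the highest gain.
    std::size_t best = start;  // start itself is never a valid cut
    double best_gain = 0.0;
    for (std::size_t i = start + 1; i < end; ++i) {
        if (x[i - 1] == x[i]) continue;  // never cut between identical values
        const double weighted = ((i - start) * range_stats(y, start, i).entropy +
                                 (end - i) * range_stats(y, i, end).entropy) / n;
        const double gain = all.entropy - weighted;
        if (gain > best_gain) { best_gain = gain; best = i; }
    }
    if (best == start) return;

    // MDLP acceptance test (Fayyad & Irani).
    const RangeStats left = range_stats(y, start, best);
    const RangeStats right = range_stats(y, best, end);
    const double k = all.classes, k1 = left.classes, k2 = right.classes;
    const double delta = std::log2(std::pow(3.0, k) - 2.0) -
                         (k * all.entropy - k1 * left.entropy - k2 * right.entropy);
    if (best_gain <= (std::log2(n - 1.0) + delta) / n) return;

    // Recurse left, record the cut, recurse right: cuts come out sorted.
    find_cuts(x, y, start, best, cuts);
    cuts.push_back((x[best - 1] + x[best]) / 2.0);  // midpoint is an assumption
    find_cuts(x, y, best, end, cuts);
}

// Maps a value to a bin index; in this sketch a value equal to a cut point
// falls into the upper bin.
std::size_t to_bin(const std::vector<double>& cuts, double value) {
    return static_cast<std::size_t>(
        std::upper_bound(cuts.begin(), cuts.end(), value) - cuts.begin());
}
```

Calling `find_cuts(x, y, 0, x.size(), cuts)` on sorted data fills `cuts`, and `to_bin` then plays the role of the transform step. A production implementation would avoid recounting labels for every candidate cut.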
Testing
Tests are built with GoogleTest and include:
- `Metrics_unittest` - Entropy/information gain tests
- `FImdlp_unittest` - Core MDLP algorithm tests
- `BinDisc_unittest` - K-bins discretization tests
- `Discretizer_unittest` - Base class functionality tests
Running Tests
make test # Runs all tests and generates coverage report
cd build_debug/tests && ctest # Run tests directly
Coverage reports are generated at `build_debug/tests/coverage/index.html`.
Sample Usage
The sample program demonstrates basic usage:
build_release/sample/sample -f iris -m 2
Development Notes
- The library uses PyTorch tensors for efficient numerical operations
- Code follows C++17 standards
- Coverage is maintained at 100%
- The implementation handles edge cases like duplicate values and small intervals
- Conan package manager support is available via `conanfile.py`