mirror of
https://github.com/rmontanana/mdlp.git
synced 2025-08-16 07:55:58 +00:00
Begin adding conan dependency manager
77
CLAUDE.md
Normal file
@@ -0,0 +1,77 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

This is a C++ implementation of the MDLP (Minimum Description Length Principle) discretization algorithm based on Fayyad & Irani's paper. The library provides discretization methods for continuous-valued attributes in classification learning.

## Build System

The project uses CMake with a Makefile wrapper for common tasks:

### Common Commands
- `make build` - Build release version with sample program
- `make test` - Run full test suite with coverage report
- `make install` - Install the library

### Build Configurations
- **Release**: Built in `build_release/` directory
- **Debug**: Built in `build_debug/` directory (for testing)

### Dependencies
- PyTorch (libtorch) - Required dependency
- GoogleTest - Fetched automatically for testing
- Coverage tools: lcov, genhtml

## Code Architecture

### Core Components

1. **Discretizer** (`src/Discretizer.h/cpp`) - Abstract base class for all discretizers
2. **CPPFImdlp** (`src/CPPFImdlp.h/cpp`) - Main MDLP algorithm implementation
3. **BinDisc** (`src/BinDisc.h/cpp`) - K-bins discretization (quantile/uniform strategies)
4. **Metrics** (`src/Metrics.h/cpp`) - Entropy and information gain calculations
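
As a rough illustration of the kind of calculation the Metrics component is described as performing, the following standalone sketch computes class entropy and the information gain of a candidate cut over a label vector. The function names, signatures, and the use of half-open index ranges are assumptions made for this example; they do not mirror the library's `Metrics` class.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical helper (not the library's API): Shannon entropy, in bits,
// of the labels in the half-open index range [start, end).
double entropy(const std::vector<int>& labels, std::size_t start, std::size_t end)
{
    assert(start <= end && end <= labels.size());
    if (start == end) {
        return 0.0;  // an empty range carries no information
    }
    std::map<int, std::size_t> counts;
    for (std::size_t i = start; i < end; ++i) {
        ++counts[labels[i]];
    }
    const double n = static_cast<double>(end - start);
    double h = 0.0;
    for (const auto& kv : counts) {
        const double p = static_cast<double>(kv.second) / n;
        h -= p * std::log2(p);
    }
    return h;
}

// Hypothetical helper: information gain obtained by splitting [start, end)
// into [start, cut) and [cut, end). Both sides must be non-empty.
double information_gain(const std::vector<int>& labels,
                        std::size_t start, std::size_t cut, std::size_t end)
{
    assert(start < cut && cut < end);
    const double n = static_cast<double>(end - start);
    const double left = static_cast<double>(cut - start) / n;
    const double right = static_cast<double>(end - cut) / n;
    return entropy(labels, start, end)
         - left * entropy(labels, start, cut)
         - right * entropy(labels, cut, end);
}
```

For the labels `{0, 0, 1, 1}`, the whole range has an entropy of 1 bit, and cutting at index 2 separates the classes completely, so the gain of that cut is also 1 bit.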

### Key Data Types
- `samples_t` - Input data samples
- `labels_t` - Classification labels
- `indices_t` - Index arrays for sorting/processing
- `precision_t` - Floating-point precision type
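
To make the roles of these types concrete, here is a minimal sketch of how such aliases could be declared and used together. The underlying types chosen here (`float`, `int`, `std::size_t`) and the helper `make_indices` are assumptions for illustration only; the project's own headers are the authority on the real definitions.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Assumed alias definitions, for illustration only.
using precision_t = float;                     // assumption: single precision
using samples_t = std::vector<precision_t>;    // one continuous feature
using labels_t = std::vector<int>;             // assumption: integer class ids
using indices_t = std::vector<std::size_t>;    // positions into samples/labels

// Hypothetical helper: build the identity index array 0..n-1 for a feature
// and its labels, which a discretizer could later reorder by sorting.
indices_t make_indices(const samples_t& X, const labels_t& y)
{
    assert(X.size() == y.size());  // every sample needs exactly one label
    indices_t idx(X.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    return idx;
}
```

Keeping a separate index array lets the samples and labels stay in their original order while the algorithm works on a sorted view of them.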

### Algorithm Flow
1. Data is sorted using labels as tie-breakers for identical values
2. MDLP recursively finds optimal cut points using entropy-based criteria
3. Cut points are validated to ensure meaningful splits
4. Transform method maps continuous values to discrete bins
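
Steps 1 and 4 above can be sketched in a few lines of standalone C++. This is a simplified illustration, not the library's code: the function names `sort_indices` and `assign_bins` are hypothetical, and the choice to place a value equal to a cut point in the upper bin is an assumption of this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Step 1 (sketch): return indices that order X ascending, using the label
// as a tie-breaker when two values are identical.
std::vector<std::size_t> sort_indices(const std::vector<float>& X,
                                      const std::vector<int>& y)
{
    assert(X.size() == y.size());
    std::vector<std::size_t> idx(X.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    std::stable_sort(idx.begin(), idx.end(),
                     [&](std::size_t a, std::size_t b) {
                         if (X[a] != X[b]) {
                             return X[a] < X[b];
                         }
                         return y[a] < y[b];  // identical values: order by label
                     });
    return idx;
}

// Step 4 (sketch): map each value to the index of its bin, given sorted cut
// points. Assumption: a value equal to a cut point falls in the upper bin.
std::vector<int> assign_bins(const std::vector<float>& X,
                             const std::vector<float>& cuts)
{
    assert(std::is_sorted(cuts.begin(), cuts.end()));
    std::vector<int> bins;
    bins.reserve(X.size());
    for (float v : X) {
        // Number of cut points that are <= v gives the bin index.
        const auto it = std::upper_bound(cuts.begin(), cuts.end(), v);
        bins.push_back(static_cast<int>(it - cuts.begin()));
    }
    return bins;
}
```

With cut points `{1.0, 2.0}`, the values `0.5`, `1.5`, and `2.5` land in bins 0, 1, and 2 respectively. Sorting by label within equal values gives a deterministic order for duplicates, which matters when candidate cut points are evaluated between adjacent samples.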

## Testing

Tests are built with GoogleTest and include:
- `Metrics_unittest` - Entropy/information gain tests
- `FImdlp_unittest` - Core MDLP algorithm tests
- `BinDisc_unittest` - K-bins discretization tests
- `Discretizer_unittest` - Base class functionality tests

### Running Tests
```bash
make test                      # Runs all tests and generates coverage report
cd build_debug/tests && ctest  # Run tests directly
```

Coverage reports are generated at `build_debug/tests/coverage/index.html`.

## Sample Usage

The sample program demonstrates basic usage:
```bash
build_release/sample/sample -f iris -m 2
```

## Development Notes

- The library uses PyTorch tensors for efficient numerical operations
- Code follows the C++17 standard
- Coverage is maintained at 100%
- The implementation handles edge cases such as duplicate values and small intervals
- Conan package manager support is available via `conanfile.py`