Begin conan integration

2025-07-03 01:40:30 +02:00
parent 1ef7ca6180
commit 3d814a79c6
16 changed files with 350 additions and 127 deletions
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -0,0 +1,139 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## Project Overview
+
+Platform is a C++ machine learning framework for running experiments with Bayesian Networks and other classifiers. It supports both research-focused experimental classifiers and production-ready models through a unified interface.
+
+## Build System
+
+The project uses CMake, with a top-level Makefile wrapping the common tasks:
+
+- **Release build**: `make release` (creates `build_Release/` directory)
+- **Debug build**: `make debug` (creates `build_Debug/` directory with testing and coverage enabled)
+- **Install binaries**: `make install` (copies executables to `~/bin` by default)
+- **Clean project**: `make clean` (removes build directories)
+- **Initialize dependencies**: `make init` (runs `conan install` for both the Release and Debug configurations)
+
+### Testing
+
+- **Run tests**: `make test` (builds debug version and runs all tests)
+- **Coverage report**: `make coverage` (runs tests and generates coverage with gcovr)
+- **Test options**: `make test opt="-s"` (verbose output that also lists passing assertions) or `make test opt="-c='Test Name'"` (run only the Catch2 section with that name)
+
+### Build Targets
+
+Main executables (built from `src/commands/`):
+- `b_main`: Main experiment runner
+- `b_grid`: Grid search over hyperparameters
+- `b_best`: Best results analysis and comparison
+- `b_list`: Dataset listing and properties
+- `b_manage`: Results management interface
+- `b_results`: Results processing
+
+## Dependencies
+
+The project uses Conan for package management with these key dependencies:
+- **libtorch**: PyTorch C++ backend for tensor operations
+- **nlohmann_json**: JSON processing
+- **catch2**: Unit testing framework
+- **cli11**: Command-line argument parsing (replacement for argparse)
+
+Custom dependencies (not available in ConanCenter):
+- **fimdlp**: MDLP discretization library (needs manual integration)
+- **folding**: Cross-validation utilities (needs manual integration)
+- **arff-files**: ARFF dataset file handling (needs manual integration)
+
+External dependencies (managed separately):
+- **BayesNet**: Core Bayesian network classifiers (from `../lib/`)
+- **PyClassifiers**: Python classifier wrappers (from `../lib/`)
+- **MPI**: Message Passing Interface for parallel processing
+- **Boost**: Python integration and utilities
+
+**Note**: Because fimdlp, folding and arff-files are not in ConanCenter, each of them must be:
+- built as a custom Conan package,
+- integrated with CMake `FetchContent`, or
+- built separately and located with `find_package`.
+
+## Architecture
+
+### Core Components
+
+**Experiment Framework** (`src/main/`):
+- `Experiment.cpp/h`: Main experiment orchestration
+- `Models.cpp/h`: Classifier factory and registration system
+- `Scores.cpp/h`: Performance metrics calculation
+- `HyperParameters.cpp/h`: Parameter management
+- `ArgumentsExperiment.cpp/h`: Command-line argument handling
+
+**Data Handling** (`src/common/`):
+- `Dataset.cpp/h`: Individual dataset representation
+- `Datasets.cpp/h`: Dataset collection management
+- `Discretization.cpp/h`: Data discretization utilities
+
+**Classifiers** (`src/experimental_clfs/`):
+- `AdaBoost.cpp/h`: Multi-class SAMME AdaBoost implementation
+- `DecisionTree.cpp/h`: Decision tree base classifier
+- `XA1DE.cpp/h`: Extended AODE variants
+- Experimental implementations of Bayesian network classifiers
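+
+As a rough illustration of "multi-class SAMME", the sketch below performs one boosting round with the standard SAMME update. For `K` classes the estimator weight is `log((1 - err) / err) + log(K - 1)`, and misclassified samples have their weights multiplied by `exp(alpha)`. The function `samme_round` is hypothetical and uses plain `std::vector`s. `AdaBoost.cpp` may structure this differently, for example with tensors or a learning rate.
+
+```cpp
+#include <cmath>
+#include <cstddef>
+#include <stdexcept>
+#include <vector>
+
+// Hypothetical helper (not the project's AdaBoost API): performs one SAMME boosting round.
+// `weights` must sum to 1; it is updated in place and the estimator weight alpha is returned.
+double samme_round(std::vector<double>& weights,
+                   const std::vector<int>& y_true,
+                   const std::vector<int>& y_pred,
+                   int n_classes)
+{
+    if (weights.size() != y_true.size() || y_true.size() != y_pred.size())
+        throw std::invalid_argument("weights and labels must have the same length");
+    // Weighted error of the weak learner.
+    double err = 0.0;
+    for (std::size_t i = 0; i < weights.size(); ++i)
+        if (y_true[i] != y_pred[i]) err += weights[i];
+    // SAMME only needs the learner to beat random guessing (err < 1 - 1/K).
+    // A perfect learner (err == 0) would make alpha infinite; a real implementation stops boosting instead.
+    if (err <= 0.0 || err >= 1.0 - 1.0 / n_classes)
+        throw std::domain_error("weak learner is perfect or no better than chance");
+    // SAMME estimator weight: log((1 - err) / err) + log(K - 1).
+    const double alpha = std::log((1.0 - err) / err) + std::log(n_classes - 1.0);
+    // Up-weight misclassified samples, then renormalize so the weights sum to 1 again.
+    const double boost = std::exp(alpha);
+    double total = 0.0;
+    for (std::size_t i = 0; i < weights.size(); ++i) {
+        if (y_true[i] != y_pred[i]) weights[i] *= boost;
+        total += weights[i];
+    }
+    for (double& w : weights) w /= total;
+    return alpha;
+}
+```
+
+Take four samples with weight 0.25 each, three classes and one mistake. Then `err = 0.25` and `alpha = ln 3 + ln 2 = ln 6`, and the misclassified sample's weight grows to 2/3 after renormalization. With two classes the `log(K - 1)` term vanishes, and the update reduces to classic AdaBoost.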
+
+**Grid Search** (`src/grid/`):
+- `GridSearch.cpp/h`: Hyperparameter optimization
+- `GridExperiment.cpp/h`: Grid search experiment management
+- Uses MPI for parallel hyperparameter evaluation
+
+**Results & Reporting** (`src/results/`, `src/reports/`):
+- JSON-based result storage with schema validation
+- Excel export via libxlsxwriter
+- Console and paginated result display
+
+### Model Registration System
+
+The framework uses a factory pattern with automatic registration:
+- All classifiers inherit from `bayesnet::BaseClassifier`
+- Registration happens in `src/main/modelRegister.h`
+- Factory creates instances by string name via `Models::create()`
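+
+Below is a minimal, self-contained sketch of the registration idea. It assumes a map from names to creator functions plus static registrar objects. `Classifier`, `ModelRegistry`, `Registrar` and `DummyTree` are hypothetical stand-ins. The real `Models` class, `modelRegister.h` and `bayesnet::BaseClassifier` have their own interfaces.
+
+```cpp
+#include <functional>
+#include <map>
+#include <memory>
+#include <stdexcept>
+#include <string>
+#include <utility>
+#include <vector>
+
+// Hypothetical stand-in for bayesnet::BaseClassifier; the real interface is much richer.
+class Classifier {
+public:
+    virtual ~Classifier() = default;
+    virtual std::string name() const = 0;
+};
+
+// Hypothetical registry mapping a model name to a function that builds a new instance.
+class ModelRegistry {
+public:
+    using Creator = std::function<std::unique_ptr<Classifier>()>;
+
+    static ModelRegistry& instance()
+    {
+        static ModelRegistry registry; // built on first use, so registration order across files is safe
+        return registry;
+    }
+    void add(const std::string& name, Creator creator) { creators_[name] = std::move(creator); }
+    std::unique_ptr<Classifier> create(const std::string& name) const
+    {
+        const auto it = creators_.find(name);
+        if (it == creators_.end())
+            throw std::out_of_range("unknown model: " + name);
+        return it->second();
+    }
+    std::vector<std::string> names() const
+    {
+        std::vector<std::string> result;
+        for (const auto& entry : creators_) result.push_back(entry.first);
+        return result; // alphabetical, since std::map keeps its keys sorted
+    }
+
+private:
+    std::map<std::string, Creator> creators_;
+};
+
+// Constructing a Registrar at namespace scope registers T before main() runs.
+template <typename T>
+struct Registrar {
+    explicit Registrar(const std::string& name)
+    {
+        ModelRegistry::instance().add(name, [] { return std::make_unique<T>(); });
+    }
+};
+
+// Hypothetical model, registered under the name "DummyTree".
+class DummyTree : public Classifier {
+public:
+    std::string name() const override { return "DummyTree"; }
+};
+static Registrar<DummyTree> registerDummyTree("DummyTree");
+```
+
+`ModelRegistry::instance().create("DummyTree")` returns a new `DummyTree`, and an unregistered name throws `std::out_of_range`. The map lives inside a function-local static. As a result, registrars in other translation units cannot run before the registry exists.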
+
+## Configuration
+
+**Environment Configuration** (`.env` file):
+- `experiment`: Experiment name/type
+- `n_folds`: Cross-validation folds (default: 5)
+- `seeds`: Random seeds for reproducibility
+- `model`: Default classifier name
+- `score`: Primary evaluation metric
+- `platform`: System identifier for results
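+
+The sketch below shows one way such a file could be read. It assumes one `key=value` pair per line and `#` comment lines. `parse_env`, `get_int` and `trim` are hypothetical helpers, not the project's own configuration loader. Values such as `seeds` are returned as raw strings.
+
+```cpp
+#include <istream>
+#include <map>
+#include <string>
+
+// Hypothetical helper: strip leading and trailing blanks (including '\r' from Windows line endings).
+std::string trim(const std::string& text)
+{
+    const auto first = text.find_first_not_of(" \t\r");
+    if (first == std::string::npos) return "";
+    const auto last = text.find_last_not_of(" \t\r");
+    return text.substr(first, last - first + 1);
+}
+
+// Hypothetical parser: one key=value pair per line; blank lines and lines starting with '#' are skipped.
+// Only the first '=' splits key from value, so a value may itself contain '='.
+std::map<std::string, std::string> parse_env(std::istream& in)
+{
+    std::map<std::string, std::string> config;
+    std::string line;
+    while (std::getline(in, line)) {
+        line = trim(line);
+        if (line.empty() || line[0] == '#') continue;
+        const auto eq = line.find('=');
+        if (eq == std::string::npos) continue; // skip malformed lines
+        config[trim(line.substr(0, eq))] = trim(line.substr(eq + 1));
+    }
+    return config;
+}
+
+// Hypothetical accessor returning an integer setting, or `fallback` when the key is absent.
+int get_int(const std::map<std::string, std::string>& config, const std::string& key, int fallback)
+{
+    const auto it = config.find(key);
+    return it == config.end() ? fallback : std::stoi(it->second);
+}
+```
+
+A caller would open the file with `std::ifstream env(".env");` and pass the stream to `parse_env`. `get_int(config, "n_folds", 5)` then mirrors the documented default of 5 folds.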
+
+**Grid Search Configuration**:
+- `grid_<model_name>_input.json`: Hyperparameter search space
+- `grid_<model_name>_output.json`: Search results
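+
+Grid search evaluates every combination of candidate values. The sketch below expands a search space into that Cartesian product. It assumes a flat map from hyperparameter name to a list of candidate values. The actual schema of `grid_<model_name>_input.json` is not described here and may differ. `SearchSpace`, `Combination` and `expand_grid` are hypothetical names.
+
+```cpp
+#include <map>
+#include <string>
+#include <utility>
+#include <vector>
+
+// Hypothetical representation: each hyperparameter maps to its candidate values (kept as strings).
+using SearchSpace = std::map<std::string, std::vector<std::string>>;
+using Combination = std::map<std::string, std::string>;
+
+// Expand the search space into every combination (Cartesian product) of candidate values.
+// A parameter with no candidates yields an empty result; an empty space yields one empty combination.
+std::vector<Combination> expand_grid(const SearchSpace& space)
+{
+    std::vector<Combination> result{Combination{}}; // start from a single empty combination
+    for (const auto& [param, values] : space) {
+        std::vector<Combination> next;
+        for (const auto& partial : result) {
+            for (const auto& value : values) {
+                Combination combo = partial;
+                combo[param] = value;
+                next.push_back(std::move(combo));
+            }
+        }
+        result = std::move(next);
+    }
+    return result;
+}
+```
+
+Two values for one parameter and three for another yield six combinations. The count grows multiplicatively with each parameter, which is why parallel evaluation with MPI pays off.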
+
+## Data Format
+
+**Dataset Requirements**:
+- ARFF format files in `datasets/` directory
+- `all.txt` file listing datasets: `<name>,<class_name>,<real_features>`
+- Supports both discrete and continuous features
+- Automatic discretization available via MDLP
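+
+The following is a hedged sketch of reading one `all.txt` line. The syntax of `<real_features>` is not specified above, and the field could itself contain commas. The parser therefore splits only on the first two commas and keeps the rest verbatim. `DatasetEntry` and `parse_dataset_line` are hypothetical.
+
+```cpp
+#include <optional>
+#include <string>
+
+// Hypothetical record for one line of all.txt.
+struct DatasetEntry {
+    std::string name;
+    std::string class_name;
+    std::string real_features; // kept verbatim; its exact syntax is not specified here
+};
+
+// Split on the first two commas only, so a <real_features> field containing commas survives intact.
+// Returns std::nullopt for lines with fewer than three fields.
+std::optional<DatasetEntry> parse_dataset_line(const std::string& line)
+{
+    const auto first = line.find(',');
+    if (first == std::string::npos) return std::nullopt;
+    const auto second = line.find(',', first + 1);
+    if (second == std::string::npos) return std::nullopt;
+    return DatasetEntry{line.substr(0, first),
+                        line.substr(first + 1, second - first - 1),
+                        line.substr(second + 1)};
+}
+```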
+
+**Experimental Data**:
+- Results stored in JSON format with versioned schemas
+- Test data in `tests/data/` for unit testing
+- Sample datasets: iris, diabetes, ecoli, glass, etc.
+
+## Development Workflow
+
+1. **Setup**: Run `make init` to install dependencies via Conan
+2. **Development**: Use `make debug` for development builds with testing
+3. **Testing**: Run `make test` after changes
+4. **Release**: Use `make release` for optimized builds
+5. **Experiments**: Use `.env` configuration and run `b_main` with appropriate flags
+
+## Key Features
+
+- **Parallel execution**: Uses MPI (message passing between processes) for parallel grid search and experiments
+- **Cross-platform**: Supports Linux and macOS, with dependencies managed through Conan
+- **Extensible**: Easy classifier registration and integration
+- **Research-focused**: Designed for machine learning experimentation
+- **Visualization**: DOT graph generation for decision trees and networks