# CLAUDE.md This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. ## Project Overview Platform is a C++ machine learning framework for running experiments with Bayesian Networks and other classifiers. It supports both research-focused experimental classifiers and production-ready models through a unified interface. ## Build System The project uses CMake with Make as the primary build system: - **Release build**: `make release` (creates `build_Release/` directory) - **Debug build**: `make debug` (creates `build_Debug/` directory with testing and coverage enabled) - **Install binaries**: `make install` (copies executables to `~/bin` by default) - **Clean project**: `make clean` (removes build directories) - **Initialize dependencies**: `make init` (runs conan install for both Release and Debug) ### Testing - **Run tests**: `make test` (builds debug version and runs all tests) - **Coverage report**: `make coverage` (runs tests and generates coverage with gcovr) - **Single test with options**: `make test opt="-s"` (verbose) or `make test opt="-c='Test Name'"` (specific test) ### Build Targets Main executables (built from `src/commands/`): - `b_main`: Main experiment runner - `b_grid`: Grid search over hyperparameters - `b_best`: Best results analysis and comparison - `b_list`: Dataset listing and properties - `b_manage`: Results management interface - `b_results`: Results processing ## Dependencies The project uses Conan for package management with these key dependencies: - **libtorch**: PyTorch C++ backend for tensor operations - **nlohmann_json**: JSON processing - **catch2**: Unit testing framework - **cli11**: Command-line argument parsing (replacement for argparse) Custom dependencies (not available in ConanCenter): - **fimdlp**: MDLP discretization library (needs manual integration) - **folding**: Cross-validation utilities (needs manual integration) - **arff-files**: ARFF dataset file handling (needs manual integration) External dependencies (managed separately): - **BayesNet**: Core Bayesian network classifiers (from `../lib/`) - **PyClassifiers**: Python classifier wrappers (from `../lib/`) - **MPI**: Message Passing Interface for parallel processing - **Boost**: Python integration and utilities **Note**: Some dependencies (fimdlp, folding, arff-files) are not available in ConanCenter and need to be: - Built as custom Conan packages, or - Integrated using CMake FetchContent, or - Built separately and found via find_package ## Architecture ### Core Components **Experiment Framework** (`src/main/`): - `Experiment.cpp/h`: Main experiment orchestration - `Models.cpp/h`: Classifier factory and registration system - `Scores.cpp/h`: Performance metrics calculation - `HyperParameters.cpp/h`: Parameter management - `ArgumentsExperiment.cpp/h`: Command-line argument handling **Data Handling** (`src/common/`): - `Dataset.cpp/h`: Individual dataset representation - `Datasets.cpp/h`: Dataset collection management - `Discretization.cpp/h`: Data discretization utilities **Classifiers** (`src/experimental_clfs/`): - `AdaBoost.cpp/h`: Multi-class SAMME AdaBoost implementation - `DecisionTree.cpp/h`: Decision tree base classifier - `XA1DE.cpp/h`: Extended AODE variants - Experimental implementations of Bayesian network classifiers **Grid Search** (`src/grid/`): - `GridSearch.cpp/h`: Hyperparameter optimization - `GridExperiment.cpp/h`: Grid search experiment management - Uses MPI for parallel hyperparameter evaluation **Results & Reporting** (`src/results/`, `src/reports/`): - JSON-based result storage with schema validation - Excel export capabilities via libxlsxwriter - Console and paginated result display ### Model Registration System The framework uses a factory pattern with automatic registration: - All classifiers inherit from `bayesnet::BaseClassifier` - Registration happens in `src/main/modelRegister.h` - Factory creates instances by string name via `Models::create()` ## Configuration **Environment Configuration** (`.env` file): - `experiment`: Experiment name/type - `n_folds`: Cross-validation folds (default: 5) - `seeds`: Random seeds for reproducibility - `model`: Default classifier name - `score`: Primary evaluation metric - `platform`: System identifier for results **Grid Search Configuration**: - `grid__input.json`: Hyperparameter search space - `grid__output.json`: Search results ## Data Format **Dataset Requirements**: - ARFF format files in `datasets/` directory - `all.txt` file listing datasets: `,,` - Supports both discrete and continuous features - Automatic discretization available via MDLP **Experimental Data**: - Results stored in JSON format with versioned schemas - Test data in `tests/data/` for unit testing - Sample datasets: iris, diabetes, ecoli, glass, etc. ## Development Workflow 1. **Setup**: Run `make init` to install dependencies via Conan 2. **Development**: Use `make debug` for development builds with testing 3. **Testing**: Run `make test` after changes 4. **Release**: Use `make release` for optimized builds 5. **Experiments**: Use `.env` configuration and run `b_main` with appropriate flags ## Key Features - **Multi-threaded**: Uses MPI for parallel grid search and experiments - **Cross-platform**: Supports Linux and macOS via vcpkg - **Extensible**: Easy classifier registration and integration - **Research-focused**: Designed for machine learning experimentation - **Visualization**: DOT graph generation for decision trees and networks