CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Project Overview
Platform is a C++ machine learning framework for running experiments with Bayesian Networks and other classifiers. It supports both research-focused experimental classifiers and production-ready models through a unified interface.
Build System
The project uses CMake with Make as the primary build system:
- Release build: make release (creates build_Release/ directory)
- Debug build: make debug (creates build_Debug/ directory with testing and coverage enabled)
- Install binaries: make install (copies executables to ~/bin by default)
- Clean project: make clean (removes build directories)
- Initialize dependencies: make init (runs conan install for both Release and Debug)
Testing
- Run tests: make test (builds debug version and runs all tests)
- Coverage report: make coverage (runs tests and generates coverage with gcovr)
- Single test with options: make test opt="-s" (verbose) or make test opt="-c='Test Name'" (specific test)
Build Targets
Main executables (built from src/commands/):
- b_main: Main experiment runner
- b_grid: Grid search over hyperparameters
- b_best: Best results analysis and comparison
- b_list: Dataset listing and properties
- b_manage: Results management interface
- b_results: Results processing
Dependencies
The project uses Conan for package management with these key dependencies:
- libtorch: PyTorch C++ backend for tensor operations
- nlohmann_json: JSON processing
- catch2: Unit testing framework
- cli11: Command-line argument parsing (replacement for argparse)
Custom dependencies (not available in ConanCenter):
- fimdlp: MDLP discretization library (needs manual integration)
- folding: Cross-validation utilities (needs manual integration)
- arff-files: ARFF dataset file handling (needs manual integration)
External dependencies (managed separately):
- BayesNet: Core Bayesian network classifiers (from ../lib/)
- PyClassifiers: Python classifier wrappers (from ../lib/)
- MPI: Message Passing Interface for parallel processing
- Boost: Python integration and utilities
Note: Some dependencies (fimdlp, folding, arff-files) are not available in ConanCenter and need to be:
- Built as custom Conan packages, or
- Integrated using CMake FetchContent, or
- Built separately and found via find_package
Architecture
Core Components
Experiment Framework (src/main/):
- Experiment.cpp/h: Main experiment orchestration
- Models.cpp/h: Classifier factory and registration system
- Scores.cpp/h: Performance metrics calculation
- HyperParameters.cpp/h: Parameter management
- ArgumentsExperiment.cpp/h: Command-line argument handling
Data Handling (src/common/):
- Dataset.cpp/h: Individual dataset representation
- Datasets.cpp/h: Dataset collection management
- Discretization.cpp/h: Data discretization utilities
Classifiers (src/experimental_clfs/):
- AdaBoost.cpp/h: Multi-class SAMME AdaBoost implementation
- DecisionTree.cpp/h: Decision tree base classifier
- XA1DE.cpp/h: Extended AODE variants
- Experimental implementations of Bayesian network classifiers
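As a reminder of what distinguishes SAMME from binary AdaBoost, the sketch below computes the weight given to one weak learner. It uses the standard SAMME formula, alpha = log((1 - err) / err) + log(K - 1); the function names are hypothetical and are not taken from AdaBoost.cpp, whose actual code may differ.

```cpp
#include <cmath>
#include <stdexcept>

// Weight (alpha) assigned to a weak learner in multi-class SAMME AdaBoost.
// weighted_error: weighted training error of the weak learner, in (0, 1).
// n_classes: number of classes K (K >= 2).
// Hypothetical helper for illustration; not taken from AdaBoost.cpp.
double samme_alpha(double weighted_error, int n_classes)
{
    if (n_classes < 2) {
        throw std::invalid_argument("n_classes must be >= 2");
    }
    if (weighted_error <= 0.0 || weighted_error >= 1.0) {
        throw std::invalid_argument("weighted_error must be in (0, 1)");
    }
    return std::log((1.0 - weighted_error) / weighted_error)
         + std::log(n_classes - 1.0);
}

// A weak learner only helps when it beats random guessing, that is, when
// its weighted error is below 1 - 1/K. That is exactly when alpha > 0.
bool samme_is_useful(double weighted_error, int n_classes)
{
    return samme_alpha(weighted_error, n_classes) > 0.0;
}
```

The extra log(K - 1) term is what lets a learner with error above 0.5 still receive a positive weight when there are more than two classes.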
Grid Search (src/grid/):
- GridSearch.cpp/h: Hyperparameter optimization
- GridExperiment.cpp/h: Grid search experiment management
- Uses MPI for parallel hyperparameter evaluation
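To illustrate what a grid search has to enumerate, here is a minimal sketch that expands a search space into every hyperparameter combination. The names (Combination, expand_grid) are hypothetical, and values are kept as plain strings to avoid depending on any JSON library; the real GridSearch class may represent the space differently.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// One point of the grid: hyperparameter name -> chosen value.
using Combination = std::map<std::string, std::string>;

// Expand a search space (name -> candidate values) into every combination.
// Hypothetical helper; an empty candidate list yields an empty grid.
std::vector<Combination> expand_grid(
    const std::map<std::string, std::vector<std::string>>& space)
{
    std::vector<Combination> result(1);  // start with one empty combination
    for (const auto& entry : space) {
        std::vector<Combination> next;
        for (const auto& partial : result) {
            for (const auto& value : entry.second) {
                Combination extended = partial;
                extended[entry.first] = value;
                next.push_back(extended);
            }
        }
        result = std::move(next);
    }
    return result;
}
```

Because the result is an indexable vector, one plausible way to spread the work over MPI processes is to have each rank evaluate the combinations whose index modulo the number of ranks equals its own rank. That scheme is an assumption, not a description of the existing code.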
Results & Reporting (src/results/, src/reports/):
- JSON-based result storage with schema validation
- Excel export capabilities via libxlsxwriter
- Console and paginated result display
Model Registration System
The framework uses a factory pattern with automatic registration:
- All classifiers inherit from bayesnet::BaseClassifier
- Registration happens in src/main/modelRegister.h
- Factory creates instances by string name via Models::create()
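The general shape of a factory with automatic registration can be sketched as follows. This is a self-contained illustration, not the code in Models.cpp or modelRegister.h: BaseClassifier here is a stand-in for bayesnet::BaseClassifier, and ModelRegistry, Registrar, and DummyClassifier are hypothetical names.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Stand-in for bayesnet::BaseClassifier (the real interface is not shown here).
struct BaseClassifier {
    virtual ~BaseClassifier() = default;
    virtual std::string name() const = 0;
};

// Hypothetical registry: maps a model name to a function that builds it.
class ModelRegistry {
public:
    using Creator = std::function<std::unique_ptr<BaseClassifier>()>;

    static ModelRegistry& instance()
    {
        static ModelRegistry registry;  // constructed on first use
        return registry;
    }
    void add(const std::string& name, Creator creator)
    {
        creators_[name] = std::move(creator);
    }
    std::unique_ptr<BaseClassifier> create(const std::string& name) const
    {
        auto it = creators_.find(name);
        if (it == creators_.end()) {
            throw std::invalid_argument("Unknown model: " + name);
        }
        return it->second();
    }

private:
    std::map<std::string, Creator> creators_;
};

// A Registrar object at namespace scope registers a model before main() runs.
struct Registrar {
    Registrar(const std::string& name, ModelRegistry::Creator creator)
    {
        ModelRegistry::instance().add(name, std::move(creator));
    }
};

// Example classifier and its registration (illustrative names only).
struct DummyClassifier : BaseClassifier {
    std::string name() const override { return "Dummy"; }
};
static Registrar dummy_registrar("Dummy", [] {
    return std::unique_ptr<BaseClassifier>(new DummyClassifier());
});
```

With this pattern, adding a classifier means writing the class plus one registration line; the code that creates models by name does not change.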
Configuration
Environment Configuration (.env file):
- experiment: Experiment name/type
- n_folds: Cross-validation folds (default: 5)
- seeds: Random seeds for reproducibility
- model: Default classifier name
- score: Primary evaluation metric
- platform: System identifier for results
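A minimal sketch of reading such a file is shown below. It assumes the common key=value syntax with # comment lines, which is not confirmed here; trim and parse_env are hypothetical helpers, not the project's own loader.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Remove leading and trailing spaces, tabs and carriage returns.
std::string trim(const std::string& text)
{
    const char* blanks = " \t\r";
    const auto first = text.find_first_not_of(blanks);
    if (first == std::string::npos) {
        return "";
    }
    const auto last = text.find_last_not_of(blanks);
    return text.substr(first, last - first + 1);
}

// Parse "key=value" lines; blank lines and lines starting with '#' are skipped.
// The key=value syntax is an assumption about the .env file format.
std::map<std::string, std::string> parse_env(std::istream& input)
{
    std::map<std::string, std::string> settings;
    std::string line;
    while (std::getline(input, line)) {
        line = trim(line);
        if (line.empty() || line[0] == '#') {
            continue;
        }
        const auto eq = line.find('=');
        if (eq == std::string::npos) {
            continue;  // ignore lines without a key=value pair
        }
        settings[trim(line.substr(0, eq))] = trim(line.substr(eq + 1));
    }
    return settings;
}
```

Taking a std::istream rather than a file name keeps the parser easy to exercise from a std::istringstream in unit tests.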
Grid Search Configuration:
- grid_<model_name>_input.json: Hyperparameter search space
- grid_<model_name>_output.json: Search results
Data Format
Dataset Requirements:
- ARFF format files in datasets/ directory
- all.txt file listing datasets: <name>,<class_name>,<real_features>
- Supports both discrete and continuous features
- Automatic discretization available via MDLP
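The all.txt line layout above can be split into its three fields with a sketch like the following. DatasetEntry and parse_dataset_line are hypothetical names; because the inner format of the real_features field is not specified here, the code keeps it as a raw string and splits only on the first two commas.

```cpp
#include <stdexcept>
#include <string>

// One entry of all.txt: <name>,<class_name>,<real_features>
struct DatasetEntry {
    std::string name;
    std::string class_name;
    std::string real_features;  // kept verbatim; inner format not specified here
};

// Split a line on its first two commas only, so that any commas inside the
// third field are preserved. Throws if fewer than three fields are present.
DatasetEntry parse_dataset_line(const std::string& line)
{
    const auto first = line.find(',');
    if (first == std::string::npos) {
        throw std::invalid_argument("Expected 3 fields: " + line);
    }
    const auto second = line.find(',', first + 1);
    if (second == std::string::npos) {
        throw std::invalid_argument("Expected 3 fields: " + line);
    }
    return DatasetEntry{ line.substr(0, first),
                         line.substr(first + 1, second - first - 1),
                         line.substr(second + 1) };
}
```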
Experimental Data:
- Results stored in JSON format with versioned schemas
- Test data in tests/data/ for unit testing
- Sample datasets: iris, diabetes, ecoli, glass, etc.
Development Workflow
- Setup: Run make init to install dependencies via Conan
- Development: Use make debug for development builds with testing
- Testing: Run make test after changes
- Release: Use make release for optimized builds
- Experiments: Use .env configuration and run b_main with appropriate flags
Key Features
- Parallel: Uses MPI (multi-process) for parallel grid search and experiments
- Cross-platform: Supports Linux and macOS, with dependencies managed via Conan
- Extensible: Easy classifier registration and integration
- Research-focused: Designed for machine learning experimentation
- Visualization: DOT graph generation for decision trees and networks
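As an illustration of the DOT output mentioned in the last item, the sketch below renders a list of directed edges as a Graphviz digraph. The to_dot function is hypothetical and does not reflect the classifiers' actual graph-export interface.

```cpp
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Render directed edges (parent -> child) as a Graphviz DOT digraph.
// Node names are quoted so names with spaces or symbols stay valid, but
// embedded double quotes are not escaped in this sketch, and graph_name
// is assumed to be a valid DOT identifier.
std::string to_dot(const std::string& graph_name,
                   const std::vector<std::pair<std::string, std::string>>& edges)
{
    std::ostringstream out;
    out << "digraph " << graph_name << " {\n";
    for (const auto& edge : edges) {
        out << "    \"" << edge.first << "\" -> \"" << edge.second << "\";\n";
    }
    out << "}\n";
    return out.str();
}
```

The resulting text can be saved to a file and rendered with the Graphviz dot tool, for example dot -Tpng model.dot -o model.png.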