CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

PyClassifiers is a C++ library that provides wrappers for Python machine learning classifiers. It enables C++ applications to use Python-based ML algorithms (scikit-learn, XGBoost, custom implementations) through a unified interface.

Essential Commands

Build System

# Setup build configurations
make debug           # Configure debug build with testing and coverage
make release         # Configure release build

# Build targets
make buildd          # Build debug version
make buildr          # Build release version

# Testing
make test            # Run all unit tests
make test opt="-s"   # Run tests with verbose output
make test opt="-c='Test Name'" # Run specific test section

# Coverage
make coverage        # Run tests and generate coverage report

# Installation
sudo make install    # Install library to system (requires release build)

# Utilities
make clean           # Clean test artifacts
make help            # Show all available targets

Dependencies

Conan package manager (pip install conan)
Miniconda installation, required for the Python classifiers
Boost library (preferably the system package: sudo dnf install boost-devel)

Architecture

Core Components

PyWrap (pyclfs/PyWrap.h): Singleton that manages the Python interpreter lifecycle and provides thread-safe communication between Python and C++.

PyClassifier (pyclfs/PyClassifier.h): Abstract base class inheriting from bayesnet::BaseClassifier. All Python classifier wrappers extend this class.

Individual Classifiers: Each classifier (STree, ODTE, SVC, RandomForest, XGBoost, AdaBoostPy) wraps a specific Python module behind a consistent C++ interface.
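
The singleton role described for PyWrap can be illustrated with a minimal sketch. This is not the code in pyclfs/PyWrap.h: the class InterpreterSketch and its methods registerClassifier() and registeredCount() are hypothetical names invented for illustration, and no Python interpreter is involved. It only shows one common way to get a single shared instance whose state is guarded by a mutex.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical stand-in for an interpreter wrapper; this is NOT the real PyWrap.
class InterpreterSketch {
public:
    // Meyers singleton: the function-local static is initialized exactly once,
    // and that initialization is thread-safe since C++11.
    static InterpreterSketch& getInstance() {
        static InterpreterSketch instance;
        return instance;
    }

    // Copying would defeat the single-instance guarantee.
    InterpreterSketch(const InterpreterSketch&) = delete;
    InterpreterSketch& operator=(const InterpreterSketch&) = delete;

    // Every call that touches shared state takes the same lock.
    void registerClassifier(const std::string& name) {
        assert(!name.empty() && "classifier name must not be empty");
        std::lock_guard<std::mutex> lock(mutex_);
        registered_.push_back(name);
    }

    std::size_t registeredCount() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return registered_.size();
    }

private:
    // A real wrapper would start the interpreter here.
    InterpreterSketch() = default;

    mutable std::mutex mutex_;
    std::vector<std::string> registered_;
};
```

Callers would obtain the shared object with InterpreterSketch::getInstance() and never construct one directly, which mirrors the PyWrap::getInstance() access pattern this file mentions.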

Data Flow

Uses PyTorch tensors for efficient C++/Python data exchange
JSON-based hyperparameter configuration
Automatic memory management for Python objects

Key Directories

pyclfs/ - Core library source code
tests/ - Catch2 unit tests with ARFF test datasets
build_debug/ - Debug build artifacts
build_release/ - Release build artifacts
cmake/modules/ - Custom CMake modules

Development Patterns

Adding New Classifiers

Inherit from PyClassifier base class
Implement required virtual methods: fit(), predict(), predict_proba()
Use PyWrap::getInstance() for Python interpreter access
Handle hyperparameters via JSON configuration
Add corresponding unit tests in tests/TestPythonClassifiers.cc
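
The inheritance step above can be sketched as follows. Everything here is an assumption made for illustration: ClassifierSketch stands in for PyClassifier, std::vector types stand in for PyTorch tensors, and the method signatures are guesses rather than the library's actual ones. MajorityClassifier is a toy model that never calls Python, and hyperparameter handling is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <vector>

// Stand-in types: the real library exchanges PyTorch tensors.
using Matrix = std::vector<std::vector<double>>;  // one row per sample
using Labels = std::vector<int>;

// Hypothetical base class playing the role of PyClassifier.
class ClassifierSketch {
public:
    virtual ~ClassifierSketch() = default;
    virtual void fit(const Matrix& X, const Labels& y) = 0;
    virtual Labels predict(const Matrix& X) const = 0;
    virtual Matrix predict_proba(const Matrix& X) const = 0;
};

// Toy classifier: always predicts the most frequent training label.
class MajorityClassifier : public ClassifierSketch {
public:
    void fit(const Matrix& X, const Labels& y) override {
        if (X.size() != y.size() || y.empty()) {
            throw std::invalid_argument("X and y must be non-empty and the same length");
        }
        counts_.clear();
        for (int label : y) ++counts_[label];
        // On ties, the smallest label wins because std::map iterates in key order.
        majority_ = std::max_element(
            counts_.begin(), counts_.end(),
            [](const auto& a, const auto& b) { return a.second < b.second; })->first;
        total_ = y.size();
    }

    Labels predict(const Matrix& X) const override {
        assert(total_ > 0 && "fit() must be called before predict()");
        return Labels(X.size(), majority_);
    }

    // One column per class seen in training, ordered by label value;
    // every row holds the training class frequencies.
    Matrix predict_proba(const Matrix& X) const override {
        assert(total_ > 0 && "fit() must be called before predict_proba()");
        std::vector<double> row;
        for (const auto& entry : counts_) {
            row.push_back(static_cast<double>(entry.second) / static_cast<double>(total_));
        }
        return Matrix(X.size(), row);
    }

private:
    std::map<int, std::size_t> counts_;
    int majority_ = 0;
    std::size_t total_ = 0;
};
```

Client code can hold a ClassifierSketch reference and call fit(), predict(), and predict_proba() without knowing the concrete type, which is the point of routing every wrapper through one base class.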

Python Integration

All Python interactions go through PyWrap singleton
Use RAII pattern for Python object management
Convert data using PyTorch tensors (discrete/continuous data support)
Handle Python exceptions and convert to C++ exceptions
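
The RAII and exception-conversion points above can be sketched without any Python dependency. The names ForeignError, ScopedRelease, callGuarded, and demoTranslation are hypothetical and do not come from this repository: ForeignError stands in for an error raised on the Python side, and the release action stands in for dropping a reference to a Python object.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for an error raised on the Python side.
struct ForeignError {
    std::string type;
    std::string message;
};

// RAII guard: runs its release action exactly once when it goes out of scope,
// including when the scope is left because of an exception.
class ScopedRelease {
public:
    explicit ScopedRelease(std::function<void()> release) : release_(std::move(release)) {
        assert(release_ && "release action must be callable");
    }
    ~ScopedRelease() { release_(); }
    ScopedRelease(const ScopedRelease&) = delete;
    ScopedRelease& operator=(const ScopedRelease&) = delete;

private:
    std::function<void()> release_;
};

// Runs `call` and rethrows any ForeignError as std::runtime_error,
// so callers only ever deal with standard C++ exceptions.
template <typename Callable>
auto callGuarded(Callable&& call) -> decltype(call()) {
    try {
        return call();
    } catch (const ForeignError& e) {
        throw std::runtime_error(e.type + ": " + e.message);
    }
}

// Usage: the guard still releases its resource although the call throws.
inline std::string demoTranslation(int& releases) {
    try {
        ScopedRelease guard([&releases] { ++releases; });
        callGuarded([]() -> int { throw ForeignError{"ValueError", "bad input"}; });
    } catch (const std::runtime_error& e) {
        return e.what();
    }
    return "no error";
}
```

Keeping the translation in one helper means each wrapper method does not need its own try/catch, and the guard keeps cleanup correct on both the normal and the error path.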

Testing

Catch2 framework with parameterized tests using GENERATE()
Test data in ARFF format located in tests/data/
Benchmark tests check each classifier against its expected accuracy score
Coverage reports generated with gcovr

Important Files

pyclfs/PyWrap.h - Python interpreter management
pyclfs/PyClassifier.h - Base classifier interface
CMakeLists.txt - Main build configuration
Makefile - Build automation and common tasks
conanfile.py - Package dependencies
tests/TestPythonClassifiers.cc - Main test suite

Technical Requirements

C++17 standard compliance
Python 3.11+ required
Boost library built with the Boost.Python component and its NumPy extension
PyTorch for tensor operations
Thread-safe design for concurrent usage
