# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

PyClassifiers is a C++ library that provides wrappers for Python machine learning classifiers. It enables C++ applications to use Python-based ML algorithms (scikit-learn, XGBoost, custom implementations) through a unified interface.

## Essential Commands

### Build System
```bash
# Setup build configurations
make debug           # Configure debug build with testing and coverage
make release         # Configure release build

# Build targets
make buildd          # Build debug version
make buildr          # Build release version

# Testing
make test            # Run all unit tests
make test opt="-s"   # Run tests with verbose output
make test opt="-c='Test Name'" # Run specific test section

# Coverage
make coverage        # Run tests and generate coverage report

# Installation
sudo make install    # Install library to system (requires release build)

# Utilities
make clean           # Clean test artifacts
make help            # Show all available targets
```

### Dependencies
- The `VCPKG_ROOT` environment variable must be set
- A Miniconda installation is required for the Python classifiers
- Boost library (preferably the system package, e.g. `sudo dnf install boost-devel` on Fedora/RHEL)

## Architecture

### Core Components

**PyWrap** (`pyclfs/PyWrap.h`): Singleton managing Python interpreter lifecycle and thread-safe Python/C++ communication.

**PyClassifier** (`pyclfs/PyClassifier.h`): Abstract base class inheriting from `bayesnet::BaseClassifier`. All Python classifier wrappers extend this class.

**Individual Classifiers**: Each classifier (STree, ODTE, SVC, RandomForest, XGBoost, AdaBoostPy) wraps a specific Python module behind a consistent C++ interface.
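
The singleton role described for PyWrap can be illustrated with a minimal, standard-library-only sketch. `InterpreterManager`, `registerClassifier()` and `count()` are hypothetical names chosen for this example; they are not taken from `pyclfs/PyWrap.h`, and the real class manages an actual Python interpreter rather than a list of names.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical stand-in for an interpreter manager; NOT the real PyWrap API.
class InterpreterManager {
public:
    // Function-local static: initialised exactly once, thread-safely (C++11 and later).
    static InterpreterManager& getInstance() {
        static InterpreterManager instance;
        return instance;
    }

    // A singleton must not be copied or assigned.
    InterpreterManager(const InterpreterManager&) = delete;
    InterpreterManager& operator=(const InterpreterManager&) = delete;

    // Every access to shared state takes the mutex, so callers on different
    // threads are serialised. Returns the index assigned to the new entry.
    int registerClassifier(const std::string& name) {
        std::lock_guard<std::mutex> lock(mutex_);
        names_.push_back(name);
        return static_cast<int>(names_.size()) - 1;
    }

    std::size_t count() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return names_.size();
    }

private:
    InterpreterManager() = default;

    mutable std::mutex mutex_;
    std::vector<std::string> names_;
};
```

Every call to `InterpreterManager::getInstance()` returns a reference to the same object, which is the property a single shared interpreter relies on.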

### Data Flow
- Uses PyTorch tensors for efficient C++/Python data exchange
- JSON-based hyperparameter configuration
- Automatic memory management for Python objects
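
One way to picture hyperparameter handling is an allow-list check before values are forwarded to Python. This is only a sketch under assumptions: a `std::map` stands in for a parsed JSON object, and `Hyperparameters` and `validateHyperparameters()` are hypothetical names, not part of the library.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a parsed JSON object: key -> raw value text.
using Hyperparameters = std::map<std::string, std::string>;

// Throws std::invalid_argument if any key in `given` is missing from `valid`.
// Values are not inspected in this sketch.
void validateHyperparameters(const Hyperparameters& given,
                             const std::set<std::string>& valid) {
    for (const auto& entry : given) {
        if (valid.count(entry.first) == 0) {
            throw std::invalid_argument("Unknown hyperparameter: " + entry.first);
        }
    }
}
```

Rejecting unknown keys on the C++ side surfaces typos early instead of leaving them to fail inside the Python call.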

## Key Directories

- `pyclfs/` - Core library source code
- `tests/` - Catch2 unit tests with ARFF test datasets
- `build_debug/` - Debug build artifacts
- `build_release/` - Release build artifacts
- `cmake/modules/` - Custom CMake modules

## Development Patterns

### Adding New Classifiers
1. Inherit from the `PyClassifier` base class
2. Implement required virtual methods: `fit()`, `predict()`, `predict_proba()`
3. Use `PyWrap::getInstance()` for Python interpreter access
4. Handle hyperparameters via JSON configuration
5. Add corresponding unit tests in `tests/TestPythonClassifiers.cc`
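
The shape of steps 1 and 2 can be sketched without any Python or tensor dependency. Everything here is a simplified assumption: `ClassifierSketch` and `MajorityClassifier` are hypothetical names, and the signatures use `std::vector` instead of the tensors the real `PyClassifier` interface works with.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Hypothetical, simplified interface; NOT the real PyClassifier signatures.
class ClassifierSketch {
public:
    virtual ~ClassifierSketch() = default;
    virtual void fit(const Matrix& X, const std::vector<int>& y) = 0;
    virtual std::vector<int> predict(const Matrix& X) const = 0;
    virtual Matrix predict_proba(const Matrix& X) const = 0;
};

// Toy implementation: always predicts the most frequent training label.
class MajorityClassifier : public ClassifierSketch {
public:
    void fit(const Matrix& X, const std::vector<int>& y) override {
        if (y.empty() || X.size() != y.size()) {
            throw std::invalid_argument("X and y must be non-empty and the same length");
        }
        counts_.clear();
        for (int label : y) ++counts_[label];
        total_ = y.size();
        // Ties are resolved in favour of the smallest label (map order).
        majority_ = counts_.begin()->first;
        for (const auto& kv : counts_) {
            if (kv.second > counts_.at(majority_)) majority_ = kv.first;
        }
        fitted_ = true;
    }

    std::vector<int> predict(const Matrix& X) const override {
        requireFitted();
        return std::vector<int>(X.size(), majority_);
    }

    // One row per sample; one column per observed label in ascending order,
    // holding that label's frequency in the training data.
    Matrix predict_proba(const Matrix& X) const override {
        requireFitted();
        std::vector<double> row;
        for (const auto& kv : counts_) {
            row.push_back(static_cast<double>(kv.second) / static_cast<double>(total_));
        }
        return Matrix(X.size(), row);
    }

private:
    void requireFitted() const {
        if (!fitted_) throw std::logic_error("fit() must be called first");
    }

    std::map<int, std::size_t> counts_;
    std::size_t total_ = 0;
    int majority_ = 0;
    bool fitted_ = false;
};
```

A real wrapper would delegate `fit()` and `predict()` to the Python object obtained through `PyWrap::getInstance()` rather than computing anything in C++.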

### Python Integration
- All Python interactions go through the PyWrap singleton
- Use the RAII pattern for Python object management
- Convert data using PyTorch tensors (discrete and continuous data are supported)
- Catch Python exceptions and convert them to C++ exceptions
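
The RAII and exception-conversion points can be sketched with the standard library alone. `ForeignError`, `ScopedRelease` and `translateErrors()` are hypothetical names for this illustration; `ForeignError` merely stands in for whatever exception type the Python binding layer raises.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for an error raised on the Python side.
struct ForeignError {
    std::string message;
};

// Hypothetical RAII guard: runs the release action once when it leaves scope,
// including when the scope is left because an exception was thrown.
class ScopedRelease {
public:
    explicit ScopedRelease(std::function<void()> release)
        : release_(std::move(release)) {}
    ~ScopedRelease() {
        if (release_) release_();
    }
    ScopedRelease(const ScopedRelease&) = delete;
    ScopedRelease& operator=(const ScopedRelease&) = delete;

private:
    std::function<void()> release_;
};

// Runs `call` and rethrows any ForeignError as std::runtime_error, so callers
// only ever see standard C++ exceptions.
template <typename Callable>
auto translateErrors(Callable&& call) -> decltype(call()) {
    try {
        return call();
    } catch (const ForeignError& e) {
        throw std::runtime_error("Python error: " + e.message);
    }
}
```

Wrapping each call into Python this way keeps cleanup and error conversion in one place instead of repeating them in every classifier.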

### Testing
- Catch2 framework with parameterized tests using `GENERATE()`
- Test data in ARFF format located in `tests/data/`
- Performance benchmarks validate expected accuracy scores
- Coverage reports generated with gcovr

## Important Files

- `pyclfs/PyWrap.h` - Python interpreter management
- `pyclfs/PyClassifier.h` - Base classifier interface
- `CMakeLists.txt` - Main build configuration
- `Makefile` - Build automation and common tasks
- `vcpkg.json` - Package dependencies
- `tests/TestPythonClassifiers.cc` - Main test suite

## Technical Requirements

- C++17 standard compliance
- Python 3.11+ required
- Boost library with Python and NumPy support
- PyTorch for tensor operations
- Thread-safe design for concurrent usage