CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

PyClassifiers is a C++ library that provides wrappers for Python machine learning classifiers. It enables C++ applications to use Python-based ML algorithms (scikit-learn, XGBoost, custom implementations) through a unified interface.

Essential Commands

Build System

# Setup build configurations
make debug           # Configure debug build with testing and coverage
make release         # Configure release build

# Build targets
make buildd          # Build debug version
make buildr          # Build release version

# Testing
make test            # Run all unit tests
make test opt="-s"   # Run tests with verbose output
make test opt="-c='Test Name'" # Run specific test section

# Coverage
make coverage        # Run tests and generate coverage report

# Installation
sudo make install    # Install library to system (requires release build)

# Utilities
make clean           # Clean test artifacts
make help            # Show all available targets

Dependencies

  • Conan package manager (pip install conan)
  • Miniconda installation, required for the Python classifiers
  • Boost library (preferably the system package, e.g. on Fedora: sudo dnf install boost-devel)

Architecture

Core Components

PyWrap (pyclfs/PyWrap.h): Singleton that manages the Python interpreter lifecycle and provides thread-safe Python/C++ communication.
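
As a rough illustration of the singleton idea, the sketch below shows one common way to get a single, thread-safe instance in C++17. It is not the real PyWrap code: the class name InterpreterManager and its methods registerClassifier() and registeredCount() are hypothetical stand-ins, and no Python interpreter is involved.

```cpp
#include <cstddef>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical stand-in for an interpreter manager; NOT the real PyWrap API.
class InterpreterManager {
public:
    // A function-local static is initialized exactly once, and that
    // initialization is thread-safe since C++11.
    static InterpreterManager& getInstance() {
        static InterpreterManager instance;
        return instance;
    }

    // A singleton must not be copied or assigned.
    InterpreterManager(const InterpreterManager&) = delete;
    InterpreterManager& operator=(const InterpreterManager&) = delete;

    // The mutex serializes access to shared state, standing in for
    // calls that would otherwise touch the interpreter concurrently.
    void registerClassifier(const std::string& name) {
        std::lock_guard<std::mutex> lock(mutex_);
        registered_.push_back(name);
    }

    std::size_t registeredCount() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return registered_.size();
    }

private:
    InterpreterManager() = default;

    mutable std::mutex mutex_;
    std::vector<std::string> registered_;
};
```

Every call to InterpreterManager::getInstance() returns a reference to the same object, so state registered through one reference is visible through any other.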

PyClassifier (pyclfs/PyClassifier.h): Abstract base class inheriting from bayesnet::BaseClassifier. All Python classifier wrappers extend this class.

Individual Classifiers: Each classifier (STree, ODTE, SVC, RandomForest, XGBoost, AdaBoostPy) wraps a specific Python module behind a consistent C++ interface.

Data Flow

  • Uses PyTorch tensors for efficient C++/Python data exchange
  • JSON-based hyperparameter configuration
  • Automatic memory management for Python objects

Key Directories

  • pyclfs/ - Core library source code
  • tests/ - Catch2 unit tests with ARFF test datasets
  • build_debug/ - Debug build artifacts
  • build_release/ - Release build artifacts
  • cmake/modules/ - Custom CMake modules

Development Patterns

Adding New Classifiers

  1. Inherit from PyClassifier base class
  2. Implement required virtual methods: fit(), predict(), predict_proba()
  3. Use PyWrap::getInstance() for Python interpreter access
  4. Handle hyperparameters via JSON configuration
  5. Add corresponding unit tests in tests/TestPythonClassifiers.cc
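
The steps above can be sketched with a toy example. Everything here is an assumption made for illustration: ClassifierBase, MajorityClassifier, the Matrix and Params aliases, and the "n_classes" key are hypothetical. A real wrapper would derive from PyClassifier, exchange PyTorch tensors, read JSON hyperparameters, and delegate to Python through PyWrap, none of which this self-contained sketch does.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

using Matrix = std::vector<std::vector<double>>;   // stand-in for a tensor of samples
using Params = std::map<std::string, std::string>; // stand-in for a JSON object

// Hypothetical base class; the real PyClassifier interface may differ.
class ClassifierBase {
public:
    virtual ~ClassifierBase() = default;
    virtual void setHyperparameters(const Params& params) = 0;
    virtual void fit(const Matrix& X, const std::vector<int>& y) = 0;
    virtual std::vector<int> predict(const Matrix& X) const = 0;
    virtual Matrix predict_proba(const Matrix& X) const = 0;
};

// Toy classifier that always predicts the most frequent training label.
class MajorityClassifier : public ClassifierBase {
public:
    // Reject unknown keys so configuration typos fail loudly.
    void setHyperparameters(const Params& params) override {
        for (const auto& entry : params) {
            if (entry.first != "n_classes") {
                throw std::invalid_argument("Unknown hyperparameter: " + entry.first);
            }
            const int parsed = std::stoi(entry.second);
            if (parsed < 1) throw std::invalid_argument("n_classes must be positive");
            nClasses_ = static_cast<std::size_t>(parsed);
        }
    }

    // Estimate class priors from the label frequencies.
    void fit(const Matrix& X, const std::vector<int>& y) override {
        if (y.empty() || X.size() != y.size()) {
            throw std::invalid_argument("X and y must be non-empty and the same length");
        }
        std::vector<double> counts(nClasses_, 0.0);
        for (int label : y) {
            if (label < 0 || static_cast<std::size_t>(label) >= nClasses_) {
                throw std::out_of_range("Label outside [0, n_classes)");
            }
            counts[static_cast<std::size_t>(label)] += 1.0;
        }
        for (double& c : counts) c /= static_cast<double>(y.size());
        priors_ = counts;
        majority_ = static_cast<int>(std::distance(
            priors_.begin(), std::max_element(priors_.begin(), priors_.end())));
    }

    std::vector<int> predict(const Matrix& X) const override {
        requireFitted();
        return std::vector<int>(X.size(), majority_);
    }

    // Every sample receives the same prior distribution.
    Matrix predict_proba(const Matrix& X) const override {
        requireFitted();
        return Matrix(X.size(), priors_);
    }

private:
    void requireFitted() const {
        if (priors_.empty()) throw std::logic_error("Call fit() before predicting");
    }

    std::size_t nClasses_ = 2;
    int majority_ = 0;
    std::vector<double> priors_;
};
```

Keeping hyperparameter validation inside setHyperparameters() means a misspelled key is reported at configuration time and not silently ignored during training.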

Python Integration

  • All Python interactions go through PyWrap singleton
  • Use RAII pattern for Python object management
  • Convert data using PyTorch tensors (discrete/continuous data support)
  • Catch Python exceptions and convert them to C++ exceptions
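
The RAII and exception-conversion points can be illustrated without any Python headers. In this sketch ForeignObject stands in for a Python object, ForeignError for an error raised on the Python side, and acquireObject(), releaseObject(), callTranslated(), and doubleOrFail() are hypothetical helpers. None of these names come from the library, and the actual code may do this differently.

```cpp
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a foreign (Python-side) object.
struct ForeignObject { int value; };

// Counts outstanding objects so the sketch can show that nothing leaks.
int g_liveObjects = 0;

ForeignObject* acquireObject(int value) {
    ++g_liveObjects;
    return new ForeignObject{value};
}

void releaseObject(ForeignObject* obj) {
    --g_liveObjects;
    delete obj;
}

// RAII owner: the deleter runs on every exit path, including exceptions.
struct ObjectReleaser {
    void operator()(ForeignObject* obj) const { releaseObject(obj); }
};
using ObjectHandle = std::unique_ptr<ForeignObject, ObjectReleaser>;

// Hypothetical stand-in for an error raised on the Python side.
struct ForeignError { std::string message; };

// Run a callable and translate foreign errors into a standard C++ exception.
template <typename Fn>
auto callTranslated(Fn&& fn) -> decltype(fn()) {
    try {
        return fn();
    } catch (const ForeignError& e) {
        throw std::runtime_error("Python error: " + e.message);
    }
}

// Example operation: the handle is released whether or not an error is raised.
int doubleOrFail(int value) {
    ObjectHandle handle(acquireObject(value));
    if (handle->value < 0) throw ForeignError{"negative input"};
    return handle->value * 2;
}
```

Because the handle owns the object, callers of callTranslated() only ever see std::runtime_error, and the live-object count returns to zero on both the success and the failure path.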

Testing

  • Catch2 framework with parameterized tests using GENERATE()
  • Test data in ARFF format located in tests/data/
  • Benchmark tests check that classifiers reach their expected accuracy scores
  • Coverage reports generated with gcovr

Important Files

  • pyclfs/PyWrap.h - Python interpreter management
  • pyclfs/PyClassifier.h - Base classifier interface
  • CMakeLists.txt - Main build configuration
  • Makefile - Build automation and common tasks
  • conanfile.py - Package dependencies
  • tests/TestPythonClassifiers.cc - Main test suite

Technical Requirements

  • C++17 standard compliance
  • Python 3.11+ required
  • Boost library with Python and NumPy support
  • PyTorch for tensor operations
  • Thread-safe design for concurrent usage