Add technical analysis

2025-06-29 14:51:10 +02:00
parent d37c686e05
commit 004528be8c
2 changed files with 391 additions and 0 deletions
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -0,0 +1,101 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## Project Overview
+
+PyClassifiers is a C++ library that provides wrappers for Python machine learning classifiers. It enables C++ applications to use Python-based ML algorithms (scikit-learn, XGBoost, custom implementations) through a unified interface.
+
+## Essential Commands
+
+### Build System
+```bash
+# Setup build configurations
+make debug           # Configure debug build with testing and coverage
+make release         # Configure release build
+
+# Build targets
+make buildd          # Build debug version
+make buildr          # Build release version
+
+# Testing
+make test            # Run all unit tests
+make test opt="-s"   # Run tests with verbose output
+make test opt="-c='Test Name'" # Run specific test section
+
+# Coverage
+make coverage        # Run tests and generate coverage report
+
+# Installation
+sudo make install    # Install library to system (requires release build)
+
+# Utilities
+make clean           # Clean test artifacts
+make help            # Show all available targets
+```
+
+### Dependencies
+- Requires the `VCPKG_ROOT` environment variable to be set
+- Miniconda installation required for Python classifiers
+- Boost library (preferably system package: `sudo dnf install boost-devel`)
+
+## Architecture
+
+### Core Components
+
+**PyWrap** (`pyclfs/PyWrap.h`): Singleton managing Python interpreter lifecycle and thread-safe Python/C++ communication.
+
+**PyClassifier** (`pyclfs/PyClassifier.h`): Abstract base class inheriting from `bayesnet::BaseClassifier`. All Python classifier wrappers extend this class.
+
+**Individual Classifiers**: Each classifier (STree, ODTE, SVC, RandomForest, XGBoost, AdaBoostPy) wraps a specific Python module behind a consistent C++ interface.
+
+### Data Flow
+- Uses PyTorch tensors for efficient C++/Python data exchange
+- JSON-based hyperparameter configuration
+- Automatic memory management for Python objects
+
+## Key Directories
+
+- `pyclfs/` - Core library source code
+- `tests/` - Catch2 unit tests with ARFF test datasets
+- `build_debug/` - Debug build artifacts
+- `build_release/` - Release build artifacts
+- `cmake/modules/` - Custom CMake modules
+
+## Development Patterns
+
+### Adding New Classifiers
+1. Inherit from `PyClassifier` base class
+2. Implement required virtual methods: `fit()`, `predict()`, `predict_proba()`
+3. Use `PyWrap::getInstance()` for Python interpreter access
+4. Handle hyperparameters via JSON configuration
+5. Add corresponding unit tests in `tests/TestPythonClassifiers.cc`
+
+### Python Integration
+- All Python interactions go through the PyWrap singleton
+- Use RAII pattern for Python object management
+- Convert data using PyTorch tensors (discrete/continuous data support)
+- Handle Python exceptions and convert to C++ exceptions
+
+### Testing
+- Catch2 framework with parameterized tests using GENERATE()
+- Test data in ARFF format located in `tests/data/`
+- Performance benchmarks validate expected accuracy scores
+- Coverage reports generated with gcovr
+
+## Important Files
+
+- `pyclfs/PyWrap.h` - Python interpreter management
+- `pyclfs/PyClassifier.h` - Base classifier interface
+- `CMakeLists.txt` - Main build configuration
+- `Makefile` - Build automation and common tasks
+- `vcpkg.json` - Package dependencies
+- `tests/TestPythonClassifiers.cc` - Main test suite
+
+## Technical Requirements
+
+- C++17 standard compliance
+- Python 3.11+ required
+- Boost library with Python and NumPy support
+- PyTorch for tensor operations
+- Thread-safe design for concurrent usage
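
The "Adding New Classifiers" steps in the CLAUDE.md added above (inherit from the base class, implement `fit()`, `predict()`, `predict_proba()`, reach the interpreter through a singleton, take hyperparameters from a configuration object) can be sketched in plain C++. This is only an illustration under stated assumptions: `InterpreterStub`, `ClassifierBase`, `Hyperparameters`, and `PriorClassifier` are hypothetical stand-ins for `PyWrap`, `PyClassifier`, the JSON configuration, and a concrete wrapper. None of the signatures are taken from the PyClassifiers sources, which exchange PyTorch tensors with a real Python interpreter.

```cpp
#include <algorithm>
#include <map>
#include <mutex>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for the PyWrap singleton: one shared instance
// whose mutex is held while a classifier "talks to the interpreter".
class InterpreterStub {
public:
    static InterpreterStub& getInstance() {
        static InterpreterStub instance;  // function-local static: initialized once
        return instance;
    }
    InterpreterStub(const InterpreterStub&) = delete;
    InterpreterStub& operator=(const InterpreterStub&) = delete;
    std::mutex& lock() { return mutex_; }
private:
    InterpreterStub() = default;
    std::mutex mutex_;
};

using Matrix = std::vector<std::vector<double>>;
// Hypothetical stand-in for the JSON hyperparameter object.
using Hyperparameters = std::map<std::string, double>;

// Hypothetical stand-in for the PyClassifier abstract base class.
class ClassifierBase {
public:
    virtual ~ClassifierBase() = default;
    virtual void fit(const Matrix& X, const std::vector<int>& y) = 0;
    virtual std::vector<int> predict(const Matrix& X) = 0;
    virtual Matrix predict_proba(const Matrix& X) = 0;
};

// Toy "wrapper": ignores the features and predicts from class priors.
class PriorClassifier : public ClassifierBase {
public:
    explicit PriorClassifier(const Hyperparameters& hp = Hyperparameters()) {
        auto it = hp.find("smoothing");  // step 4: read hyperparameters
        if (it != hp.end()) smoothing_ = it->second;
        if (smoothing_ < 0.0) throw std::invalid_argument("smoothing must be >= 0");
    }

    void fit(const Matrix& X, const std::vector<int>& y) override {
        if (y.empty() || X.size() != y.size())
            throw std::invalid_argument("X and y must be non-empty and equally long");
        if (*std::min_element(y.begin(), y.end()) < 0)
            throw std::invalid_argument("labels must be non-negative");
        // Step 3: all "interpreter" access goes through the singleton.
        std::lock_guard<std::mutex> guard(InterpreterStub::getInstance().lock());
        const int n_classes = *std::max_element(y.begin(), y.end()) + 1;
        std::vector<double> counts(n_classes, smoothing_);
        for (int label : y) counts[label] += 1.0;
        const double total = static_cast<double>(y.size()) + smoothing_ * n_classes;
        for (double& c : counts) c /= total;
        priors_ = counts;
    }

    Matrix predict_proba(const Matrix& X) override {
        if (priors_.empty()) throw std::logic_error("call fit() first");
        return Matrix(X.size(), priors_);  // same prior row for every sample
    }

    std::vector<int> predict(const Matrix& X) override {
        if (priors_.empty()) throw std::logic_error("call fit() first");
        const int best = static_cast<int>(
            std::max_element(priors_.begin(), priors_.end()) - priors_.begin());
        return std::vector<int>(X.size(), best);
    }

private:
    double smoothing_ = 1.0;
    std::vector<double> priors_;
};
```

A real wrapper would replace the body of `fit()` with calls into the Python module and would translate Python exceptions into C++ ones at the same boundary where this sketch throws `std::invalid_argument`.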