Files
SVMClassifier/DEVELOPMENT.md
Ricardo Montañana Gómez d6dc083a5a
Some checks failed
CI/CD Pipeline / Code Linting (push) Failing after 22s
CI/CD Pipeline / Build and Test (Debug, clang, ubuntu-latest) (push) Failing after 5m44s
CI/CD Pipeline / Build and Test (Debug, gcc, ubuntu-latest) (push) Failing after 5m33s
CI/CD Pipeline / Build and Test (Release, clang, ubuntu-20.04) (push) Failing after 6m12s
CI/CD Pipeline / Build and Test (Release, clang, ubuntu-latest) (push) Failing after 5m13s
CI/CD Pipeline / Build and Test (Release, gcc, ubuntu-20.04) (push) Failing after 5m30s
CI/CD Pipeline / Build and Test (Release, gcc, ubuntu-latest) (push) Failing after 5m33s
CI/CD Pipeline / Docker Build Test (push) Failing after 13s
CI/CD Pipeline / Performance Benchmarks (push) Has been skipped
CI/CD Pipeline / Build Documentation (push) Successful in 31s
CI/CD Pipeline / Create Release Package (push) Has been skipped
Initial commit as Claude developed it
2025-06-22 12:50:10 +02:00

537 lines
16 KiB
Markdown

# Development Guide
This guide provides comprehensive information for developers who want to contribute to the SVM Classifier C++ project.
## Table of Contents
- [Development Environment Setup](#development-environment-setup)
- [Project Structure](#project-structure)
- [Building from Source](#building-from-source)
- [Testing](#testing)
- [Code Style and Standards](#code-style-and-standards)
- [Contributing Guidelines](#contributing-guidelines)
- [Debugging and Profiling](#debugging-and-profiling)
- [Documentation](#documentation)
- [Release Process](#release-process)
## Development Environment Setup
### Prerequisites
**Required:**
- C++17 compatible compiler (GCC 7+, Clang 5+, MSVC 2019+)
- CMake 3.15+
- Git
- libtorch (PyTorch C++)
- pkg-config
**Optional (but recommended):**
- Doxygen (for documentation)
- Valgrind (for memory checking)
- lcov/gcov (for coverage analysis)
- clang-format (for code formatting)
- clang-tidy (for static analysis)
### Quick Setup
```bash
# Clone the repository
git clone https://github.com/your-username/svm-classifier.git
cd svm-classifier
# Run the automated setup
chmod +x install.sh
./install.sh --build-type Debug
# Or use the validation script for comprehensive testing
chmod +x validate_build.sh
./validate_build.sh --verbose --performance --memory-check
```
### Docker Development Environment
```bash
# Build development image
docker build --target development -t svm-dev .
# Run development container
docker run --rm -it -v $(pwd):/workspace svm-dev
# Inside container:
cd /workspace
mkdir build && cd build
cmake .. -DCMAKE_PREFIX_PATH=/opt/libtorch -DCMAKE_BUILD_TYPE=Debug
make -j$(nproc)
```
## Project Structure
```
svm-classifier/
├── include/svm_classifier/ # Public header files
│ ├── svm_classifier.hpp # Main classifier interface
│ ├── data_converter.hpp # Tensor conversion utilities
│ ├── multiclass_strategy.hpp # Multiclass strategies
│ ├── kernel_parameters.hpp # Parameter management
│ └── types.hpp # Common types and enums
├── src/ # Implementation files
│ ├── svm_classifier.cpp
│ ├── data_converter.cpp
│ ├── multiclass_strategy.cpp
│ └── kernel_parameters.cpp
├── tests/ # Test suite
│ ├── test_main.cpp # Test runner
│ ├── test_svm_classifier.cpp # Integration tests
│ ├── test_data_converter.cpp # Unit tests
│ ├── test_multiclass_strategy.cpp
│ ├── test_kernel_parameters.cpp
│ └── test_performance.cpp # Performance benchmarks
├── examples/ # Usage examples
│ ├── basic_usage.cpp
│ └── advanced_usage.cpp
├── external/ # Third-party dependencies
├── cmake/ # CMake configuration files
├── .github/workflows/ # CI/CD configuration
├── docs/ # Documentation (generated)
├── CMakeLists.txt # Main build configuration
├── Doxyfile # Documentation configuration
├── Dockerfile # Container configuration
├── README.md # Project overview
├── QUICK_START.md # Getting started guide
├── DEVELOPMENT.md # This file
└── LICENSE # License information
```
### Architecture Overview
```
┌─────────────────────────────────────────────────────────────┐
│ SVMClassifier │
│ ┌─────────────────┐ ┌──────────────────────────────────┐ │
│ │ KernelParameters│ │ MulticlassStrategy │ │
│ │ │ │ ┌─────────────┐┌─────────────┐ │ │
│ │ - JSON config │ │ │OneVsRest ││OneVsOne │ │ │
│ │ - Validation │ │ │Strategy ││Strategy │ │ │
│ │ - Defaults │ │ └─────────────┘└─────────────┘ │ │
│ └─────────────────┘ └──────────────────────────────────┘ │
│ │ │ │
│ └─────────┬───────────────┘ │
│ │ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ DataConverter │ │
│ │ ┌─────────────────┐ ┌─────────────────────────────┐ │ │
│ │ │ Tensor → libsvm │ │ Tensor → liblinear │ │ │
│ │ │ Tensor → liblinear│ Results → Tensor │ │ │
│ │ └─────────────────┘ └─────────────────────────────┘ │ │
│ └─────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
│ │
┌─────────────┐ ┌─────────────┐
│ libsvm │ │ liblinear │
│ │ │ │
│ - RBF │ │ - Linear │
│ - Polynomial│ │ - Fast │
│ - Sigmoid │ │ - Scalable │
└─────────────┘ └─────────────┘
```
## Building from Source
### Debug Build
```bash
mkdir build-debug && cd build-debug
cmake .. \
-DCMAKE_BUILD_TYPE=Debug \
-DCMAKE_PREFIX_PATH=/path/to/libtorch \
-DCMAKE_CXX_FLAGS="-g -O0 -Wall -Wextra"
make -j$(nproc)
```
### Release Build
```bash
mkdir build-release && cd build-release
cmake .. \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_PREFIX_PATH=/path/to/libtorch \
-DCMAKE_CXX_FLAGS="-O3 -DNDEBUG"
make -j$(nproc)
```
### Build Options
| Option | Description | Default |
|--------|-------------|---------|
| `CMAKE_BUILD_TYPE` | Build configuration | `Release` |
| `CMAKE_PREFIX_PATH` | PyTorch installation path | Auto-detect |
| `CMAKE_INSTALL_PREFIX` | Installation directory | `/usr/local` |
| `BUILD_TESTING` | Enable testing | `ON` |
| `BUILD_EXAMPLES` | Build examples | `ON` |
### Cross-Platform Building
#### Windows (MSVC)
```cmd
mkdir build && cd build
cmake .. -G "Visual Studio 16 2019" -A x64 ^
-DCMAKE_PREFIX_PATH=C:\libtorch ^
-DCMAKE_TOOLCHAIN_FILE=C:\vcpkg\scripts\buildsystems\vcpkg.cmake
cmake --build . --config Release
```
#### macOS
```bash
# Install dependencies with Homebrew
brew install cmake pkg-config openblas
# Build
mkdir build && cd build
cmake .. -DCMAKE_PREFIX_PATH=/opt/libtorch
make -j$(sysctl -n hw.ncpu)
```
## Testing
### Test Categories
- **Unit Tests** (`[unit]`): Test individual components
- **Integration Tests** (`[integration]`): Test component interactions
- **Performance Tests** (`[performance]`): Benchmark performance
### Running Tests
```bash
cd build
# Run all tests
ctest --output-on-failure
# Run specific test categories
./svm_classifier_tests "[unit]"
./svm_classifier_tests "[integration]"
./svm_classifier_tests "[performance]"
# Run with verbose output
./svm_classifier_tests "[unit]" --reporter console
# Run specific test
./svm_classifier_tests "SVMClassifier Construction"
```
### Coverage Analysis
```bash
# Build with coverage
cmake .. -DCMAKE_BUILD_TYPE=Debug -DCMAKE_CXX_FLAGS="--coverage"
make -j$(nproc)
# Run tests
./svm_classifier_tests
# Generate coverage report
make coverage
# View HTML report
open coverage_html/index.html
```
### Memory Testing
```bash
# Run with Valgrind
valgrind --tool=memcheck --leak-check=full --show-leak-kinds=all \
./svm_classifier_tests "[unit]"
# Or use the provided target
make test_memcheck
```
### Adding New Tests
1. **Unit Tests**: Add to appropriate `test_*.cpp` file
2. **Integration Tests**: Add to `test_svm_classifier.cpp`
3. **Performance Tests**: Add to `test_performance.cpp`
Example test structure:
```cpp
TEST_CASE("Feature Description", "[category][subcategory]") {
SECTION("Specific behavior") {
// Arrange
auto svm = SVMClassifier(KernelType::LINEAR);
auto X = torch::randn({100, 10});
auto y = torch::randint(0, 2, {100});
// Act
auto metrics = svm.fit(X, y);
// Assert
REQUIRE(svm.is_fitted());
REQUIRE(metrics.status == TrainingStatus::SUCCESS);
}
}
```
## Code Style and Standards
### C++ Standards
- **Language Standard**: C++17
- **Naming Convention**: `snake_case` for functions/variables, `PascalCase` for classes
- **File Naming**: `snake_case.hpp` and `snake_case.cpp`
- **Indentation**: 4 spaces (no tabs)
### Code Formatting
Use clang-format with the provided configuration:
```bash
# Format all source files
find src include tests examples -name "*.cpp" -o -name "*.hpp" | \
xargs clang-format -i
# Check formatting
find src include tests examples -name "*.cpp" -o -name "*.hpp" | \
xargs clang-format --dry-run --Werror
```
### Static Analysis
```bash
# Run clang-tidy
clang-tidy src/*.cpp include/svm_classifier/*.hpp \
-- -I include -I /opt/libtorch/include
```
### Documentation Standards
- Use Doxygen-style comments for public APIs
- Include `@brief`, `@param`, `@return`, `@throws` as appropriate
- Provide usage examples for complex functions
Example:
```cpp
/**
* @brief Train the SVM classifier on the provided dataset
* @param X Feature tensor of shape (n_samples, n_features)
* @param y Target tensor of shape (n_samples,) with class labels
* @return Training metrics including timing and convergence info
* @throws std::invalid_argument if input data is invalid
* @throws std::runtime_error if training fails
*
* @code
* auto X = torch::randn({100, 4});
* auto y = torch::randint(0, 3, {100});
* SVMClassifier svm(KernelType::RBF, 1.0);
* auto metrics = svm.fit(X, y);
* @endcode
*/
TrainingMetrics fit(const torch::Tensor& X, const torch::Tensor& y);
```
### Error Handling
- Use exceptions for error conditions
- Provide meaningful error messages
- Validate inputs at public API boundaries
- Use RAII for resource management
### Performance Guidelines
- Minimize memory allocations in hot paths
- Use move semantics where appropriate
- Prefer algorithms from STL
- Profile before optimizing
## Contributing Guidelines
### Workflow
1. **Fork** the repository
2. **Create** a feature branch: `git checkout -b feature/amazing-feature`
3. **Implement** your changes
4. **Add** tests for new functionality
5. **Run** the validation script: `./validate_build.sh`
6. **Commit** with descriptive messages
7. **Push** to your fork
8. **Create** a Pull Request
### Commit Message Format
```
type(scope): short description
Longer description if needed
- Bullet points for details
- Reference issues: Fixes #123
```
Types: `feat`, `fix`, `docs`, `test`, `refactor`, `perf`, `ci`
### Pull Request Requirements
- [ ] All tests pass
- [ ] Code follows style guidelines
- [ ] New features have tests
- [ ] Documentation is updated
- [ ] Performance impact is considered
- [ ] Breaking changes are documented
### Code Review Process
1. Automated checks must pass (CI/CD)
2. At least one maintainer review required
3. Address all review comments
4. Ensure branch is up-to-date with main
## Debugging and Profiling
### Debugging Builds
```bash
# Debug build with symbols
cmake .. -DCMAKE_BUILD_TYPE=Debug -DCMAKE_CXX_FLAGS="-g -O0"
# Run with GDB
gdb ./svm_classifier_tests
(gdb) run "[unit]"
```
### Common Debugging Scenarios
```cpp
// Enable verbose logging (if implemented)
torch::set_num_threads(1); // Single-threaded for reproducibility
// Print tensor information
std::cout << "X shape: " << X.sizes() << std::endl;
std::cout << "X dtype: " << X.dtype() << std::endl;
std::cout << "X device: " << X.device() << std::endl;
// Check for NaN/Inf values
if (torch::any(torch::isnan(X)).item<bool>()) {
throw std::runtime_error("X contains NaN values");
}
```
### Performance Profiling
```bash
# Build with profiling
cmake .. -DCMAKE_BUILD_TYPE=RelWithDebInfo
# Profile with perf
perf record ./svm_classifier_tests "[performance]"
perf report
# Profile with gprof
g++ -pg -o program program.cpp
./program
gprof program gmon.out > analysis.txt
```
## Documentation
### Building Documentation
```bash
# Generate API documentation
doxygen Doxyfile
# View documentation
open docs/html/index.html
```
### Documentation Structure
- **README.md**: Project overview and quick start
- **QUICK_START.md**: Step-by-step getting started guide
- **DEVELOPMENT.md**: This development guide
- **API Reference**: Generated from source code comments
### Contributing to Documentation
- Keep documentation up-to-date with code changes
- Use clear, concise language
- Include practical examples
- Test all code examples
## Release Process
### Version Numbering
We follow [Semantic Versioning](https://semver.org/):
- **MAJOR.MINOR.PATCH**
- **MAJOR**: Incompatible API changes
- **MINOR**: Backward-compatible new features
- **PATCH**: Backward-compatible bug fixes
### Release Checklist
1. **Update version** in CMakeLists.txt
2. **Update CHANGELOG.md** with new features/fixes
3. **Run full validation**: `./validate_build.sh --performance --memory-check`
4. **Update documentation** if needed
5. **Create release tag**: `git tag -a v1.0.0 -m "Release 1.0.0"`
6. **Push tag**: `git push origin v1.0.0`
7. **Create GitHub release** with release notes
8. **Update package managers** (if applicable)
### Continuous Integration
Our CI/CD pipeline runs on every PR and includes:
- **Build testing** on multiple platforms (Ubuntu, macOS, Windows)
- **Compiler compatibility** (GCC, Clang, MSVC)
- **Code quality** checks (formatting, static analysis)
- **Test execution** (unit, integration, performance)
- **Coverage analysis**
- **Memory leak detection**
- **Documentation generation**
- **Package creation**
### Branch Strategy
- **main**: Stable releases
- **develop**: Integration branch for features
- **feature/***: Individual feature development
- **hotfix/***: Critical bug fixes
- **release/***: Release preparation
## Getting Help
### Resources
- 📖 [Project Documentation](README.md)
- 🐛 [Issue Tracker](https://github.com/your-username/svm-classifier/issues)
- 💬 [Discussions](https://github.com/your-username/svm-classifier/discussions)
- 📧 Email: svm-classifier@example.com
### Reporting Issues
When reporting issues, please include:
1. **Environment**: OS, compiler, library versions
2. **Reproduction**: Minimal code example
3. **Expected vs Actual**: What should happen vs what happens
4. **Logs**: Error messages, stack traces
5. **Investigation**: What you've tried already
### Feature Requests
For new features:
1. **Check existing issues** to avoid duplicates
2. **Describe the use case** and motivation
3. **Propose an API** if applicable
4. **Consider implementation** complexity
5. **Offer to contribute** if possible
---
**Thank you for contributing to SVM Classifier C++! 🎯**