Files
ArffFiles/CLAUDE.md

83 lines
2.9 KiB
Markdown

# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
ArffFiles is a header-only C++ library for reading ARFF (Attribute-Relation File Format) files and converting them into STL vectors. The library handles both numeric and categorical features, automatically factorizing categorical attributes.
## Build System
This project uses CMake with Conan for package management:
- **CMake**: Primary build system (requires CMake 3.20+)
- **Conan**: Package management for dependencies
- **Makefile**: Convenience wrapper for common tasks
## Common Development Commands
### Building and Testing
```bash
# Build and run tests (recommended)
make build && make test
# Alternative manual build process
mkdir build_debug
cmake -S . -B build_debug -D CMAKE_BUILD_TYPE=Debug -D ENABLE_TESTING=ON -D CODE_COVERAGE=ON
cmake --build build_debug -t unit_tests_arffFiles -j 16
cd build_debug/tests && ./unit_tests_arffFiles
```
### Testing Options
```bash
# Run tests with verbose output
make test opt="-s"
# Clean test artifacts
make clean
```
### Code Coverage
Code coverage is enabled when building with `-D CODE_COVERAGE=ON` and `-D ENABLE_TESTING=ON`. Coverage reports are generated during test runs.
## Architecture
### Core Components
**Single Header Library**: `ArffFiles.hpp` contains the complete implementation.
**Main Class**: `ArffFiles`
- Header-only design for easy integration
- Handles ARFF file parsing and data conversion
- Automatically determines numeric vs categorical features
- Supports flexible class attribute positioning
### Key Methods
- `load(fileName, classLast=true)`: Load with class attribute at end/beginning
- `load(fileName, className)`: Load with specific named class attribute
- `getX()`: Returns feature vectors as `std::vector<std::vector<float>>`
- `getY()`: Returns labels as `std::vector<int>`
- `getNumericAttributes()`: Returns feature type mapping
### Data Processing Pipeline
1. **File Parsing**: Reads ARFF format, extracts attributes and data
2. **Feature Detection**: Automatically identifies numeric vs categorical attributes
3. **Preprocessing**: Handles missing values (lines with '?' are skipped)
4. **Factorization**: Converts categorical features to numeric codes
5. **Dataset Generation**: Creates final X (features) and y (labels) vectors
### Dependencies
- **Catch2**: Testing framework (fetched via CMake FetchContent)
- **Standard Library**: Uses STL containers (vector, map, string)
- **C++17**: Minimum required standard
### Test Structure
- Tests located in `tests/` directory
- Sample ARFF files in `tests/data/`
- Single test executable: `unit_tests_arffFiles`
- Uses Catch2 v3.3.2 for test framework
### Conan Integration
The project includes a `conanfile.py` that:
- Automatically extracts version from CMakeLists.txt
- Packages as a header-only library
- Exports only the main header file