CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Project Overview
ArffFiles is a header-only C++ library for reading ARFF (Attribute-Relation File Format) files and converting them into STL vectors. The library handles both numeric and categorical features, automatically factorizing categorical attributes.
Build System
This project uses CMake with Conan for package management:
- CMake: Primary build system (requires CMake 3.20+)
- Conan: Package management for dependencies
- Makefile: Convenience wrapper for common tasks
Common Development Commands
Building and Testing
# Build and run tests (recommended)
make build && make test
# Alternative manual build process
mkdir build_debug
cmake -S . -B build_debug -D CMAKE_BUILD_TYPE=Debug -D ENABLE_TESTING=ON -D CODE_COVERAGE=ON
cmake --build build_debug -t unit_tests_arffFiles -j 16
cd build_debug/tests && ./unit_tests_arffFiles
Testing Options
# Run tests with verbose output
make test opt="-s"
# Clean test artifacts
make clean
Code Coverage
Code coverage is enabled when building with -D CODE_COVERAGE=ON and -D ENABLE_TESTING=ON. Coverage reports are generated during test runs.
Architecture
Core Components
Single Header Library: ArffFiles.hpp contains the complete implementation.
Main Class: ArffFiles
- Header-only design for easy integration
- Handles ARFF file parsing and data conversion
- Automatically determines numeric vs categorical features
- Supports flexible class attribute positioning
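The numeric vs categorical decision can be illustrated with a minimal sketch. It assumes the decision is made from the type field of an ARFF `@attribute <name> <type>` declaration, with `numeric`, `real` and `integer` treated as numeric and everything else as categorical. `isNumericType` is a hypothetical helper written for this example; the rule actually used inside ArffFiles.hpp may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper (not part of ArffFiles): decide whether the type field
// of an ARFF "@attribute <name> <type>" declaration denotes a numeric feature.
// Assumption: numeric, real and integer count as numeric; everything else
// (nominal sets such as {a,b,c}, string, date) is treated as categorical.
inline bool isNumericType(std::string type)
{
    // Trim surrounding whitespace.
    const auto notSpace = [](unsigned char c) { return !std::isspace(c); };
    type.erase(type.begin(), std::find_if(type.begin(), type.end(), notSpace));
    type.erase(std::find_if(type.rbegin(), type.rend(), notSpace).base(), type.end());

    // ARFF type keywords are case-insensitive, so compare in lower case.
    std::transform(type.begin(), type.end(), type.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

    return type == "numeric" || type == "real" || type == "integer";
}
```

For example, `isNumericType("REAL")` is true under these assumptions, while a nominal declaration such as `{yes,no}` is classified as categorical.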
Key Methods
- load(fileName, classLast=true): Load with the class attribute at the end or the beginning
- load(fileName, className): Load with a specific named class attribute
- getX(): Returns feature vectors as std::vector<std::vector<float>>
- getY(): Returns labels as std::vector<int>
- getNumericAttributes(): Returns feature type mapping
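Because the getters return plain STL containers, downstream code does not need to depend on the library's types. As a small illustration, the sketch below counts samples per class from a label vector of the type getY() is documented to return. `classCounts` is a hypothetical helper, not part of ArffFiles.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical helper: count how many samples carry each label code.
// `y` has the same type that getY() is documented to return.
inline std::map<int, std::size_t> classCounts(const std::vector<int>& y)
{
    std::map<int, std::size_t> counts;
    for (int label : y) {
        ++counts[label];  // operator[] value-initializes missing keys to 0
    }
    return counts;
}
```

With a loaded dataset, this would presumably be called as `classCounts(arff.getY())`, where `arff` is an ArffFiles instance on which load() has already succeeded.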
Data Processing Pipeline
- File Parsing: Reads ARFF format, extracts attributes and data
- Feature Detection: Automatically identifies numeric vs categorical attributes
- Preprocessing: Handles missing values (lines with '?' are skipped)
- Factorization: Converts categorical features to numeric codes
- Dataset Generation: Creates final X (features) and y (labels) vectors
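The preprocessing and factorization steps above can be sketched as follows. This is a minimal illustration, not the library's code: `hasMissingValue` and `factorize` are hypothetical names, and the sketch assumes category codes are assigned in order of first appearance starting at 0, which the actual implementation may or may not do.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical helpers, not taken from ArffFiles.hpp.

// A data line is skipped when it contains '?', the ARFF missing-value marker.
inline bool hasMissingValue(const std::string& line)
{
    return line.find('?') != std::string::npos;
}

// Map each distinct category to an integer code.
// Assumption: codes follow order of first appearance, starting at 0.
inline std::vector<int> factorize(const std::vector<std::string>& values)
{
    std::map<std::string, int> codes;
    std::vector<int> result;
    result.reserve(values.size());
    for (const auto& value : values) {
        auto it = codes.find(value);
        if (it == codes.end()) {
            // New category: its code is the number of categories seen so far.
            it = codes.emplace(value, static_cast<int>(codes.size())).first;
        }
        result.push_back(it->second);
    }
    return result;
}
```

Under these assumptions, the column `b, a, b, c` factorizes to the codes `0, 1, 0, 2`, and a line such as `5.1,?,1.4,setosa` is dropped before factorization.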
Dependencies
- Catch2: Testing framework (fetched via CMake FetchContent)
- Standard Library: Uses STL containers (vector, map, string)
- C++17: Minimum required standard
Test Structure
- Tests located in the tests/ directory
- Sample ARFF files in tests/data/
- Single test executable: unit_tests_arffFiles
- Uses Catch2 v3.3.2 as the test framework
Conan Integration
The project includes a conanfile.py that:
- Automatically extracts version from CMakeLists.txt
- Packages as a header-only library
- Exports only the main header file