Compare commits


54 Commits

SHA1 Message Date
89142f8997 Update version number 2025-07-19 22:47:32 +02:00
17ee6a909a Merge pull request 'Create version 1.2.1' (#40) from ldi into main
Reviewed-on: #40
2025-07-19 20:42:25 +00:00
56d85b1a43 Update test libraries version number 2025-07-19 22:25:17 +02:00
481c702302 Update libraries versions 2025-07-19 22:12:27 +02:00
3e0b790cfe Update Changelog 2025-07-08 18:57:57 +02:00
e2a0c5f4a5 Add Notes to Proposal convergence 2025-07-08 18:50:09 +02:00
aa77745e55 Fix TANLd valid_hyperparameters 2025-07-08 17:28:27 +02:00
e5227c5f4b Add dataset tests to Ld models 2025-07-08 16:07:16 +02:00
ed380b1494 Complete implementation with tests 2025-07-08 11:42:20 +02:00
2c7352ac38 Fix classifier build in proposal 2025-07-07 02:10:08 +02:00
0ce7f664b4 remove unneeded files 2025-07-07 00:38:00 +02:00
62fa85a1b3 Complete proposal 2025-07-07 00:37:16 +02:00
97894cc49c First approach with derived class 2025-07-06 18:49:05 +02:00
090172c6c5 Add Claude local discretization analysis 2025-07-04 12:19:58 +02:00
3048244a27 Add cache clean to conan-clean 2025-07-04 11:56:55 +02:00
c142ff2c4a Compact Makefile and remove unneeded in CMakeLists 2025-07-03 09:55:05 +02:00
a5841000d3 Change optimization flag in Release 2025-07-02 13:56:54 +02:00
e7e80cfa9c Update CHANGELOG 2025-07-02 00:52:53 +02:00
1d58cea276 Add build_type option to sample target in Makefile 2025-07-02 00:51:31 +02:00
189d314990 Fix Conan debug build
Fix smell issues in markdown and python
2025-07-02 00:44:24 +02:00
dfa74056f5 Fix conan debug build 2025-07-02 00:38:47 +02:00
839be5335d Fix smell issues in markdown and python 2025-07-01 19:16:48 +02:00
28be43db02 Update sample target in Makefile 2025-07-01 18:42:20 +02:00
55a24fbaf0 Update optimization flag 2025-07-01 16:49:04 +02:00
3b170324f4 Merge pull request 'conan' (#38) from conan into main
Reviewed-on: #38
2025-07-01 14:33:50 +00:00
8ccc7e263c Update .gitignore 2025-07-01 14:14:38 +02:00
b1e25a7d05 Update Coverage Makefile 2025-07-01 14:13:45 +02:00
3cb454d4aa Fix conan build and remove vcpkg 2025-07-01 13:56:28 +02:00
3178bcbda9 Fix conan build 2025-07-01 12:24:29 +02:00
32d231cdaf Update Makefile 2025-07-01 09:59:29 +02:00
526d036d75 Remove cmake modules unneeded 2025-06-30 22:41:04 +02:00
7a9d4178d9 First profiles
Signed-off-by: Ricardo Montañana Gómez <rmontanana@gmail.com>
2025-06-30 22:40:35 +02:00
3e94d400e2 Fix conan-init 2025-06-30 09:50:27 +02:00
31fa9cd498 First approach 2025-06-29 18:46:11 +02:00
676637fb1b Merge pull request 'Fix vcpkg build and installation' (#36) from fix_vcpkg into main
Reviewed-on: #36
2025-06-29 11:01:08 +00:00
9f3de4d924 Add new hyperparameters to the Ld classifiers
- *ld_algorithm*: algorithm to use for local discretization, with the following options: "MDLP", "BINQ", "BINU".
  - *ld_proposed_cuts*: number of cut points to return.
  - *mdlp_min_length*: minimum length of a partition in MDLP algorithm to be evaluated for partition.
  - *mdlp_max_depth*: maximum level of recursion in MDLP algorithm.
2025-06-29 13:00:34 +02:00
dafd5672bc Add Claude config and report 2025-06-25 14:17:10 +02:00
70545ee0ad Add docs generation and remove 2 code smells 2025-06-24 19:06:41 +02:00
7917a7598b Update json version in vcpkg 2025-06-19 12:17:50 +02:00
bb547a3347 Remove tests/lib 2025-06-04 16:42:01 +02:00
23d74c4643 Add L1FS feature selection 2025-06-04 11:54:36 +02:00
fcccbc15dd Fix iwss selection of second feature 2025-06-02 17:11:20 +02:00
c68b75fcc1 Update version number 2025-06-01 18:28:39 +02:00
ab86dae90d Add tests for Ld models predict_proba 2025-06-01 14:55:31 +02:00
ad72bb355b Fix CFS merit computation error 2025-06-01 13:54:18 +02:00
da357ac5ba remove lib 2025-05-31 20:01:42 +02:00
833455803e Update changelog 2025-05-31 20:01:22 +02:00
74a9d29dc1 Merge pull request 'Fix some issues in FeatureSelect' (#37) from FixSelectFeatures into fix_vcpkg
Reviewed-on: #37
2025-05-31 16:47:03 +00:00
3615a1463c Fix some issues in FeatureSelect 2025-05-31 14:36:51 +02:00
36ce6effe9 Optimize ComputeCPT method with a approx. 30% reducing time 2025-05-19 17:00:07 +02:00
250036f224 ComputeCPT Optimization 2025-05-13 17:43:17 +02:00
b11620bbe8 Add predict_proba to Ld classifiers 2025-05-12 19:47:04 +02:00
8a02a3a5cb Update CHANGELOG 2025-05-08 12:33:48 +02:00
7f6f49b3d0 Update project version to 1.1.1
Fix CMakeLists and different configurations to fix vcpkg build & installation
Fix sample build
Update CHANGELOG
2025-05-08 12:33:11 +02:00
57 changed files with 3330 additions and 1446 deletions

.gitignore (1 changed line)

@@ -46,3 +46,4 @@ docs/man
docs/Doxyfile
.cache
vcpkg_installed
CMakeUserPresets.json

CHANGELOG.md

@@ -5,13 +5,51 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [Unreleased]
## [1.2.1] - 2025-07-19
### Internal
- Update Libtorch to version 2.7.1
- Update libraries versions:
  - mdlp: 2.1.1
  - Folding: 1.1.2
  - ArffFiles: 1.2.1
## [1.2.0] - 2025-07-08
### Internal
- Add docs generation to CMakeLists.txt.
- Add new hyperparameters to the Ld classifiers:
  - *ld_algorithm*: algorithm to use for local discretization, with the following options: "MDLP", "BINQ", "BINU".
  - *ld_proposed_cuts*: number of cut points to return.
  - *mdlp_min_length*: minimum length of a partition in MDLP algorithm to be evaluated for partition.
  - *mdlp_max_depth*: maximum level of recursion in MDLP algorithm.
  - *max_iterations*: maximum number of iterations of discretization-build model loop.
  - *verbose_convergence*: display status messages during the convergence process.
- Remove vcpkg as a dependency manager, now the library is built with Conan package manager and CMake.
- Add `build_type` option to the sample target in the Makefile to allow building in *Debug* or *Release* mode. Default is *Debug*.
## [1.1.1] - 2025-05-20
### Internal
- Fix CFS metric expression in the FeatureSelection class.
- Fix the vcpkg configuration in building the library.
- Fix the sample app to use the vcpkg configuration.
- Refactor the computeCPT method in the Node class with libtorch vectorized operations.
- Refactor the sample to use local discretization models.
### Added
- Add predict_proba method to all Ld classifiers.
- Add L1FS feature selection methods to the FeatureSelection class.
## [1.1.0] - 2025-04-27
### Internal
- Add changes to .clang-format to ajust to vscode format style thanks to <https://clang-format-configurator.site/>
- Add changes to .clang-format to adjust to vscode format style thanks to <https://clang-format-configurator.site/>
- Remove all the dependencies as git submodules and add them as vcpkg dependencies.
- Fix the dependencies versions for this specific BayesNet version.
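For illustration, the Ld hyperparameters listed under 1.2.0 could be checked up front by a small validation routine such as the C++ sketch below. It is not BayesNet's implementation: the struct `LdHyperparameters`, the function `validateLdHyperparameters`, the default values and the numeric bounds are all assumptions made for this example. Only the parameter names and the three `ld_algorithm` options come from the changelog.

```cpp
#include <cassert>
#include <set>
#include <stdexcept>
#include <string>

// Hypothetical holder for the Ld hyperparameters named in the 1.2.0 changelog.
// The default values are placeholders for this sketch, NOT BayesNet's defaults.
struct LdHyperparameters {
    std::string ld_algorithm = "MDLP"; // "MDLP", "BINQ" or "BINU"
    int ld_proposed_cuts = 0;          // number of cut points to return
    int mdlp_min_length = 3;           // minimum partition length MDLP evaluates
    int mdlp_max_depth = 10;           // maximum MDLP recursion level
    int max_iterations = 10;           // cap on the discretize/build-model loop
    bool verbose_convergence = false;  // print status messages while converging
};

// Throws std::invalid_argument on the first invalid value found.
// The numeric lower bounds are assumptions made for this sketch.
void validateLdHyperparameters(const LdHyperparameters& hp) {
    static const std::set<std::string> algorithms{"MDLP", "BINQ", "BINU"};
    if (algorithms.count(hp.ld_algorithm) == 0)
        throw std::invalid_argument("ld_algorithm must be MDLP, BINQ or BINU");
    if (hp.ld_proposed_cuts < 0)
        throw std::invalid_argument("ld_proposed_cuts must be >= 0");
    if (hp.mdlp_min_length < 1)
        throw std::invalid_argument("mdlp_min_length must be >= 1");
    if (hp.mdlp_max_depth < 1)
        throw std::invalid_argument("mdlp_max_depth must be >= 1");
    if (hp.max_iterations < 1)
        throw std::invalid_argument("max_iterations must be >= 1");
}
```

Rejecting an unknown `ld_algorithm` before any discretization runs gives a clearer error than failing inside the discretizer; whether BayesNet validates in this way is not shown in this diff.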

CLAUDE.md (new file, 191 lines)

@@ -0,0 +1,191 @@
# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
BayesNet is a C++ library implementing Bayesian Network Classifiers. It provides various algorithms for machine learning classification including TAN, KDB, SPODE, SPnDE, AODE, A2DE, and their ensemble variants (Boost, XB). The library also includes local discretization variants (Ld) and feature selection algorithms.
## Build System & Dependencies
### Dependency Management
The project supports **two package managers**:
#### vcpkg (Default)
- Uses vcpkg with private registry at <https://github.com/rmontanana/vcpkg-stash>
- Core dependencies: libtorch, nlohmann-json, folding, fimdlp, arff-files, catch2
- All dependencies defined in `vcpkg.json` with version overrides
#### Conan (Alternative)
- Modern C++ package manager with better dependency resolution
- Configured via `conanfile.py` for packaging and distribution
- Supports subset of dependencies (libtorch, nlohmann-json, catch2)
- Custom dependencies (folding, fimdlp, arff-files) need custom Conan recipes
### Build Commands
#### Using vcpkg (Default)
```bash
# Initialize dependencies
make init
# Build debug version (with tests and coverage)
make debug
make buildd
# Build release version
make release
make buildr
# Run tests
make test
# Generate coverage report
make coverage
make viewcoverage
# Clean project
make clean
```
#### Using Conan
```bash
# Install Conan first: pip install conan
# Initialize dependencies
make conan-init
# Build debug version (with tests and coverage)
make conan-debug
make buildd
# Build release version
make conan-release
make buildr
# Create and test Conan package
make conan-create
# Upload to Conan remote
make conan-upload remote=myremote
# Clean Conan cache and builds
make conan-clean
```
### CMake Configuration
- Uses CMake 3.27+ with C++17 standard
- Debug builds automatically enable testing and coverage
- Release builds optimize with `-Ofast`
- **Automatic package manager detection**: CMake detects whether Conan or vcpkg is being used
- Supports both static library and package manager installation
- Conditional dependency linking based on availability
## Testing Framework
- **Catch2** testing framework (version 3.8.1)
- Test executable: `TestBayesNet` in `build_Debug/tests/`
- Individual test categories can be run: `./TestBayesNet "[CategoryName]"`
- Coverage reporting with lcov/genhtml
### Test Categories
- A2DE, BoostA2DE, BoostAODE, XSPODE, XSPnDE, XBAODE, XBA2DE
- Classifier, Ensemble, FeatureSelection, Metrics, Models
- Network, Node, MST, Modules
## Code Architecture
### Core Structure
```
bayesnet/
├── BaseClassifier.h # Abstract base for all classifiers
├── classifiers/ # Basic Bayesian classifiers (TAN, KDB, SPODE, etc.)
├── ensembles/ # Ensemble methods (AODE, A2DE, Boost variants)
├── feature_selection/ # Feature selection algorithms (CFS, FCBF, IWSS, L1FS)
├── network/ # Bayesian network structure (Network, Node)
└── utils/ # Utilities (metrics, MST, tensor operations)
```
### Key Design Patterns
- **BaseClassifier** abstract interface for all algorithms
- Template-based design with both std::vector and torch::Tensor support
- Network/Node abstraction for Bayesian network representation
- Feature selection as separate, composable modules
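The 1.1.1 changelog mentions a fix to the CFS metric expression. As a reference point, the textbook correlation-based feature selection (CFS) merit of a subset of k features is `k * mean(r_cf) / sqrt(k + k * (k - 1) * mean(r_ff))`, where `r_cf` are feature-class correlations and `r_ff` are feature-feature correlations. The sketch below computes that formula from precomputed correlations. The function name `cfsMerit` is hypothetical, and which correlation measure BayesNet feeds into it (for example symmetrical uncertainty) is not shown in this diff.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Textbook CFS merit of a feature subset S with k features:
//   merit = k * mean(r_cf) / sqrt(k + k * (k - 1) * mean(r_ff))
// classCorr[i]      : correlation between feature i of S and the class (r_cf)
// featureCorr[i][j] : correlation between features i and j of S (r_ff),
//                     given as a symmetric k x k matrix; only i < j is read.
double cfsMerit(const std::vector<double>& classCorr,
                const std::vector<std::vector<double>>& featureCorr) {
    const std::size_t k = classCorr.size();
    assert(k >= 1 && featureCorr.size() == k);
    double sumCf = 0.0;
    for (double r : classCorr) sumCf += r;
    const double meanCf = sumCf / k;
    // Average over the k * (k - 1) / 2 distinct feature pairs.
    double sumFf = 0.0;
    for (std::size_t i = 0; i < k; ++i)
        for (std::size_t j = i + 1; j < k; ++j) sumFf += featureCorr[i][j];
    const double pairs = k * (k - 1) / 2.0;
    const double meanFf = pairs > 0 ? sumFf / pairs : 0.0;
    return k * meanCf / std::sqrt(k + k * (k - 1.0) * meanFf);
}
```

The denominator grows with feature-feature redundancy, so a subset of features that are individually predictive but highly correlated with each other scores lower than a less redundant one.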
### Data Handling
- Supports both discrete integer data and continuous data with discretization
- ARFF file format support through arff-files library
- Tensor operations via PyTorch C++ (libtorch)
- Local discretization variants use fimdlp library
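The `ld_algorithm` hyperparameter accepts `"BINQ"` and `"BINU"` besides `"MDLP"`. Assuming these stand for quantile (equal-frequency) and uniform (equal-width) binning, which the names suggest but this diff does not confirm, the two strategies can be sketched as follows. `uniformCuts` and `quantileCuts` are hypothetical names; fimdlp's actual implementation, and the quantile definition it uses, may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Equal-width ("uniform") cut points: n_bins - 1 interior boundaries
// evenly spaced between the minimum and the maximum value.
std::vector<double> uniformCuts(const std::vector<double>& values, int n_bins) {
    assert(!values.empty() && n_bins >= 1);
    auto [lo, hi] = std::minmax_element(values.begin(), values.end());
    const double width = (*hi - *lo) / n_bins;
    std::vector<double> cuts;
    for (int i = 1; i < n_bins; ++i) cuts.push_back(*lo + i * width);
    return cuts;
}

// Equal-frequency ("quantile") cut points, using linear interpolation
// between order statistics (one of several common quantile definitions).
std::vector<double> quantileCuts(std::vector<double> values, int n_bins) {
    assert(!values.empty() && n_bins >= 1);
    std::sort(values.begin(), values.end());
    std::vector<double> cuts;
    const double last = static_cast<double>(values.size() - 1);
    for (int i = 1; i < n_bins; ++i) {
        const double pos = last * i / n_bins;            // fractional index
        const auto below = static_cast<std::size_t>(pos);
        const double frac = pos - below;
        const double next = below + 1 < values.size() ? values[below + 1] : values[below];
        cuts.push_back(values[below] + frac * (next - values[below]));
    }
    return cuts;
}
```

Equal-width bins are cheap but sensitive to outliers; equal-frequency bins adapt to skewed distributions at the cost of a sort. In this sketch, `n_bins` bins correspond to `n_bins - 1` cut points, which is how a parameter like `ld_proposed_cuts` might map onto it.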
## Documentation & Tools
- **Doxygen** for API documentation: `make doc`
- **lcov** for coverage reports: `make coverage`
- **plantuml + clang-uml** for UML diagrams: `make diagrams`
- Man pages available in `docs/man3/`
## Sample Applications
Sample code in `sample/` directory demonstrates library usage:
```bash
make sample fname=tests/data/iris.arff model=TANLd
```
## Package Distribution
### Creating Conan Packages
```bash
# Create package locally
make conan-create
# Test package installation
cd test_package
conan create ..
# Upload to remote repository
make conan-upload remote=myremote profile=myprofile
```
### Using the Library
With Conan:
```ini
# conanfile.txt
[requires]
bayesnet/1.1.2@user/channel
[generators]
CMakeDeps
CMakeToolchain
```
With vcpkg:
```json
{
"dependencies": ["bayesnet"]
}
```
## Common Development Tasks
- **Add new classifier**: Extend BaseClassifier, implement in appropriate subdirectory
- **Add new test**: Update `tests/CMakeLists.txt` and create test in `tests/`
- **Modify build**: Edit main `CMakeLists.txt` or use Makefile targets
- **Update dependencies**:
- vcpkg: Modify `vcpkg.json` and run `make init`
- Conan: Modify `conanfile.py` and run `make conan-init`
- **Package for distribution**: Use `make conan-create` for Conan packaging

CMakeLists.txt

@@ -1,21 +1,14 @@
cmake_minimum_required(VERSION 3.20)
cmake_minimum_required(VERSION 3.27)
project(BayesNet
VERSION 1.1.0
project(bayesnet
VERSION 1.2.1
DESCRIPTION "Bayesian Network and basic classifiers Library."
HOMEPAGE_URL "https://github.com/rmontanana/bayesnet"
LANGUAGES CXX
)
if (CODE_COVERAGE AND NOT ENABLE_TESTING)
MESSAGE(FATAL_ERROR "Code coverage requires testing enabled")
endif (CODE_COVERAGE AND NOT ENABLE_TESTING)
find_package(Torch REQUIRED)
if (POLICY CMP0135)
cmake_policy(SET CMP0135 NEW)
endif ()
set(CMAKE_CXX_STANDARD 17)
cmake_policy(SET CMP0135 NEW)
# Global CMake variables
# ----------------------
@@ -25,73 +18,106 @@ set(CMAKE_CXX_EXTENSIONS OFF)
set(CMAKE_EXPORT_COMPILE_COMMANDS ON)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${TORCH_CXX_FLAGS}")
SET(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -pthread")
set(CMAKE_CXX_FLAGS_DEBUG "${CMAKE_CXX_FLAGS_DEBUG} -fprofile-arcs -ftest-coverage -fno-elide-constructors")
set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} -Ofast")
if (NOT ${CMAKE_SYSTEM_NAME} MATCHES "Darwin")
set(CMAKE_CXX_FLAGS_DEBUG "${CMAKE_CXX_FLAGS_DEBUG} -fno-default-inline")
endif()
set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} -O3")
if (CMAKE_BUILD_TYPE STREQUAL "Debug")
MESSAGE("Debug mode")
else(CMAKE_BUILD_TYPE STREQUAL "Debug")
MESSAGE("Release mode")
endif (CMAKE_BUILD_TYPE STREQUAL "Debug")
# Options
# -------
option(ENABLE_CLANG_TIDY "Enable to add clang tidy." OFF)
option(ENABLE_TESTING "Unit testing build" OFF)
option(CODE_COVERAGE "Collect coverage from test library" OFF)
option(INSTALL_GTEST "Enable installation of googletest." OFF)
# CMakes modules
# --------------
set(CMAKE_MODULE_PATH ${CMAKE_CURRENT_SOURCE_DIR}/cmake/modules ${CMAKE_MODULE_PATH})
if (CMAKE_BUILD_TYPE STREQUAL "Debug")
MESSAGE("Debug mode")
set(ENABLE_TESTING ON)
set(CODE_COVERAGE ON)
endif (CMAKE_BUILD_TYPE STREQUAL "Debug")
get_property(LANGUAGES GLOBAL PROPERTY ENABLED_LANGUAGES)
message(STATUS "Languages=${LANGUAGES}")
if (CODE_COVERAGE)
enable_testing()
include(CodeCoverage)
MESSAGE(STATUS "Code coverage enabled")
SET(GCC_COVERAGE_LINK_FLAGS " ${GCC_COVERAGE_LINK_FLAGS} -lgcov --coverage")
endif (CODE_COVERAGE)
if (ENABLE_CLANG_TIDY)
include(StaticAnalyzers) # clang-tidy
endif (ENABLE_CLANG_TIDY)
# External libraries - dependencies of BayesNet
# ---------------------------------------------
option(ENABLE_TESTING "Unit testing build" OFF)
find_package(Torch CONFIG REQUIRED)
find_package(fimdlp CONFIG REQUIRED)
find_package(nlohmann_json CONFIG REQUIRED)
find_package(folding CONFIG REQUIRED)
if(NOT TARGET torch::torch)
add_library(torch::torch INTERFACE IMPORTED GLOBAL)
# expose include paths and libraries that the find-module discovered
set_target_properties(torch::torch PROPERTIES
INTERFACE_INCLUDE_DIRECTORIES "${TORCH_INCLUDE_DIRS}"
INTERFACE_LINK_LIBRARIES "${TORCH_LIBRARIES}")
endif()
find_package(fimdlp CONFIG REQUIRED)
find_package(folding CONFIG REQUIRED)
find_package(nlohmann_json REQUIRED)
# Subdirectories
# --------------
add_subdirectory(config)
add_subdirectory(bayesnet)
# Add the library
# ---------------
include_directories(
${bayesnet_SOURCE_DIR}
${CMAKE_BINARY_DIR}/configured_files/include
)
file(GLOB_RECURSE Sources "bayesnet/*.cc")
add_library(bayesnet ${Sources})
target_link_libraries(bayesnet
nlohmann_json::nlohmann_json
folding::folding
fimdlp::fimdlp
torch::torch
arff-files::arff-files
)
# Testing
# -------
if (ENABLE_TESTING)
MESSAGE(STATUS "Testing enabled")
find_package(Catch2 CONFIG REQUIRED)
include(CTest)
add_subdirectory(tests)
MESSAGE(STATUS "Testing enabled")
set(CMAKE_CXX_FLAGS_DEBUG "${CMAKE_CXX_FLAGS_DEBUG} -fprofile-arcs -ftest-coverage -fno-elide-constructors")
if (NOT ${CMAKE_SYSTEM_NAME} MATCHES "Darwin")
set(CMAKE_CXX_FLAGS_DEBUG "${CMAKE_CXX_FLAGS_DEBUG} -fno-default-inline")
endif()
find_package(Catch2 CONFIG REQUIRED)
find_package(arff-files CONFIG REQUIRED)
enable_testing()
include(CTest)
add_subdirectory(tests)
endif (ENABLE_TESTING)
# Installation
# ------------
install(TARGETS BayesNet
ARCHIVE DESTINATION lib
LIBRARY DESTINATION lib
CONFIGURATIONS Release)
install(DIRECTORY bayesnet/ DESTINATION include/bayesnet FILES_MATCHING CONFIGURATIONS Release PATTERN "*.h")
install(FILES ${CMAKE_BINARY_DIR}/configured_files/include/bayesnet/config.h DESTINATION include/bayesnet CONFIGURATIONS Release)
include(CMakePackageConfigHelpers)
write_basic_package_version_file(
"${CMAKE_CURRENT_BINARY_DIR}/bayesnetConfigVersion.cmake"
VERSION ${PROJECT_VERSION}
COMPATIBILITY AnyNewerVersion
)
configure_package_config_file(
${CMAKE_CURRENT_SOURCE_DIR}/bayesnetConfig.cmake.in
"${CMAKE_CURRENT_BINARY_DIR}/bayesnetConfig.cmake"
INSTALL_DESTINATION share/bayesnet)
install(TARGETS bayesnet
EXPORT bayesnetTargets
ARCHIVE DESTINATION lib
LIBRARY DESTINATION lib)
install(DIRECTORY bayesnet/
DESTINATION include/bayesnet
FILES_MATCHING
PATTERN "*.h")
install(FILES ${CMAKE_BINARY_DIR}/configured_files/include/bayesnet/config.h
DESTINATION include/bayesnet)
install(EXPORT bayesnetTargets
FILE bayesnetTargets.cmake
NAMESPACE bayesnet::
DESTINATION share/bayesnet)
install(FILES
"${CMAKE_CURRENT_BINARY_DIR}/bayesnetConfig.cmake"
"${CMAKE_CURRENT_BINARY_DIR}/bayesnetConfigVersion.cmake"
DESTINATION share/bayesnet
)
# Documentation
# -------------
find_package(Doxygen)
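The new install rules generate `bayesnetConfigVersion.cmake` with `COMPATIBILITY AnyNewerVersion`, so a consumer's `find_package(bayesnet <version>)` accepts any installed version equal to or newer than the one requested. The C++ sketch below mimics that rule for plain dotted versions. It is a simplification (CMake's own logic also covers cases such as version ranges), and the function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Split a dotted version such as "1.2.1" into its numeric components {1, 2, 1}.
std::vector<int> parseVersion(const std::string& text) {
    std::vector<int> parts;
    std::stringstream stream(text);
    std::string item;
    while (std::getline(stream, item, '.')) parts.push_back(std::stoi(item));
    return parts;
}

// Simplified AnyNewerVersion rule: the installed version satisfies the request
// when it is greater than or equal to it. Missing components count as 0.
bool anyNewerVersionCompatible(const std::string& installed, const std::string& requested) {
    std::vector<int> have = parseVersion(installed);
    std::vector<int> want = parseVersion(requested);
    const std::size_t width = std::max(have.size(), want.size());
    have.resize(width, 0);
    want.resize(width, 0);
    return have >= want; // std::vector compares lexicographically
}
```

Because the CHANGELOG states that the project follows Semantic Versioning, `AnyNewerVersion` would also accept a future 2.x release for a 1.x request; CMake's `SameMajorVersion` is the stricter alternative if that is not intended.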

CONAN_README.md (new file, 86 lines)

@@ -0,0 +1,86 @@
# Using BayesNet with Conan
This document explains how to use Conan as an alternative package manager for BayesNet.
## Prerequisites
```bash
pip install conan
conan remote add Cimmeria https://conan.rmontanana.es/artifactory/api/conan/Cimmeria
conan profile detect
```
## Quick Start
### As a Consumer
1. Create a `conanfile.txt` in your project:
```ini
[requires]
libtorch/2.7.0
bayesnet/1.2.0
[generators]
CMakeDeps
CMakeToolchain
```
1. Install dependencies:
```bash
conan install . --build=missing
```
1. In your CMakeLists.txt:
```cmake
find_package(bayesnet REQUIRED)
target_link_libraries(your_target bayesnet::bayesnet)
```
### Building BayesNet with Conan
```bash
# Install dependencies
make conan-init
# Build debug version
make debug
make buildd
# Build release version
make release
make buildr
# Create package
make conan-create
```
## Current Limitations
- Custom dependencies (folding, fimdlp, arff-files) are not available in ConanCenter
- These need to be built as custom Conan packages or replaced with alternatives
- The conanfile.py currently comments out these dependencies
## Creating Custom Dependency Packages
For the custom dependencies, you'll need to create Conan recipes:
1. **folding**: Cross-validation library
1. **fimdlp**: Discretization library
1. **arff-files**: ARFF file format parser
Contact the maintainer or create custom recipes for these packages.
## Package Distribution
Once custom dependencies are resolved:
```bash
# Create and test package
make conan-create
# Upload to your remote
conan upload bayesnet/1.2.0 -r myremote
```

Makefile (164 changed lines)

@@ -1,11 +1,11 @@
SHELL := /bin/bash
.DEFAULT_GOAL := help
.PHONY: viewcoverage coverage setup help install uninstall diagrams buildr buildd test clean debug release sample updatebadge doc doc-install init clean-test
.PHONY: viewcoverage coverage setup help install uninstall diagrams buildr buildd test clean updatebadge doc doc-install init clean-test debug release conan-create conan-upload conan-clean sample
f_release = build_Release
f_debug = build_Debug
f_diagrams = diagrams
app_targets = BayesNet
app_targets = bayesnet
test_targets = TestBayesNet
clang-uml = clang-uml
plantuml = plantuml
@@ -17,6 +17,14 @@ mansrcdir = docs/man3
mandestdir = /usr/local/share/man
sed_command_link = 's/e">LCOV -/e"><a href="https:\/\/rmontanana.github.io\/bayesnet">Back to manual<\/a> LCOV -/g'
sed_command_diagram = 's/Diagram"/Diagram" width="100%" height="100%" /g'
# Set the number of parallel jobs to the number of available processors minus 7
CPUS := $(shell getconf _NPROCESSORS_ONLN 2>/dev/null \
|| nproc --all 2>/dev/null \
|| sysctl -n hw.ncpu)
# --- Your desired job count: CPUs - 7, but never less than 1 --------------
JOBS := $(shell n=$(CPUS); [ $${n} -gt 7 ] && echo $$((n-7)) || echo 1)
define ClearTests
@for t in $(test_targets); do \
@@ -31,6 +39,14 @@ define ClearTests
fi ;
endef
define setup_target
@echo ">>> Setup the project for $(1)..."
@if [ -d $(2) ]; then rm -fr $(2); fi
@conan install . --build=missing -of $(2) -s build_type=$(1)
@cmake -S . -B $(2) -DCMAKE_TOOLCHAIN_FILE=$(2)/build/$(1)/generators/conan_toolchain.cmake -DCMAKE_BUILD_TYPE=$(1) -D$(3)
@echo ">>> Will build using $(JOBS) parallel jobs"
@echo ">>> Done"
endef
setup: ## Install dependencies for tests and coverage
@if [ "$(shell uname)" = "Darwin" ]; then \
@@ -43,30 +59,36 @@ setup: ## Install dependencies for tests and coverage
fi
@echo "* You should install plantuml & graphviz for the diagrams"
diagrams: ## Create a UML class diagram & dependency of the project (diagrams/BayesNet.png)
@which $(plantuml) || (echo ">>> Please install plantuml"; exit 1)
@which $(dot) || (echo ">>> Please install graphviz"; exit 1)
@which $(clang-uml) || (echo ">>> Please install clang-uml"; exit 1)
@export PLANTUML_LIMIT_SIZE=16384
@echo ">>> Creating UML class diagram of the project...";
@$(clang-uml) -p
@cd $(f_diagrams); \
$(plantuml) -tsvg BayesNet.puml
@echo ">>> Creating dependency graph diagram of the project...";
$(MAKE) debug
cd $(f_debug) && cmake .. --graphviz=dependency.dot
@$(dot) -Tsvg $(f_debug)/dependency.dot.BayesNet -o $(f_diagrams)/dependency.svg
clean: ## Clean the project
@echo ">>> Cleaning the project..."
@if test -f CMakeCache.txt ; then echo "- Deleting CMakeCache.txt"; rm -f CMakeCache.txt; fi
@for folder in $(f_release) $(f_debug) vcpkg_installed install_test ; do \
if test -d "$$folder" ; then \
echo "- Deleting $$folder folder" ; \
rm -rf "$$folder"; \
fi; \
done
@$(MAKE) clean-test
@echo ">>> Done";
# Build targets
# =============
debug: ## Setup debug version using Conan
@$(call setup_target,"Debug","$(f_debug)","ENABLE_TESTING=ON")
release: ## Setup release version using Conan
@$(call setup_target,"Release","$(f_release)","ENABLE_TESTING=OFF")
buildd: ## Build the debug targets
cmake --build $(f_debug) -t $(app_targets) --parallel $(CMAKE_BUILD_PARALLEL_LEVEL)
cmake --build $(f_debug) --config Debug -t $(app_targets) --parallel $(JOBS)
buildr: ## Build the release targets
cmake --build $(f_release) -t $(app_targets) --parallel $(CMAKE_BUILD_PARALLEL_LEVEL)
cmake --build $(f_release) --config Release -t $(app_targets) --parallel $(JOBS)
clean-test: ## Clean the tests info
@echo ">>> Cleaning Debug BayesNet tests...";
$(call ClearTests)
@echo ">>> Done";
# Install targets
# ===============
uninstall: ## Uninstall library
@echo ">>> Uninstalling BayesNet...";
@@ -79,56 +101,20 @@ install: ## Install library
@cmake --install $(f_release) --prefix $(prefix)
@echo ">>> Done";
init: ## Initialize the project installing dependencies
@echo ">>> Installing dependencies"
@vcpkg install
@echo ">>> Done";
clean: ## Clean the project
@echo ">>> Cleaning the project..."
@if test -d build_Debug ; then echo "- Deleting build_Debug folder" ; rm -rf build_Debug; fi
@if test -d build_Release ; then echo "- Deleting build_Release folder" ; rm -rf build_Release; fi
@if test -f CMakeCache.txt ; then echo "- Deleting CMakeCache.txt"; rm -f CMakeCache.txt; fi
@if test -d vcpkg_installed ; then echo "- Deleting vcpkg_installed folder" ; rm -rf vcpkg_installed; fi
@$(MAKE) clean-test
@echo ">>> Done";
# Test targets
# ============
debug: ## Build a debug version of the project
@echo ">>> Building Debug BayesNet...";
@if [ -d ./$(f_debug) ]; then rm -rf ./$(f_debug); fi
@mkdir $(f_debug);
@cmake -S . -B $(f_debug) -D CMAKE_BUILD_TYPE=Debug -D ENABLE_TESTING=ON -D CODE_COVERAGE=ON -DCMAKE_TOOLCHAIN_FILE=${VCPKG_ROOT}/scripts/buildsystems/vcpkg.cmake
@echo ">>> Done";
release: ## Build a Release version of the project
@echo ">>> Building Release BayesNet...";
@if [ -d ./$(f_release) ]; then rm -rf ./$(f_release); fi
@mkdir $(f_release);
@cmake -S . -B $(f_release) -D CMAKE_BUILD_TYPE=Release -DCMAKE_TOOLCHAIN_FILE=${VCPKG_ROOT}/scripts/buildsystems/vcpkg.cmake
@echo ">>> Done";
fname = "tests/data/iris.arff"
sample: ## Build sample
@echo ">>> Building Sample...";
@if [ -d ./sample/build ]; then rm -rf ./sample/build; fi
@cd sample && cmake -B build -S . -D CMAKE_BUILD_TYPE=Release -DCMAKE_TOOLCHAIN_FILE=${VCPKG_ROOT}/scripts/buildsystems/vcpkg.cmake && \
cmake --build build -t bayesnet_sample
sample/build/bayesnet_sample $(fname)
@echo ">>> Done";
fname = "tests/data/iris.arff"
sample2: ## Build sample2
@echo ">>> Building Sample...";
@if [ -d ./sample/build ]; then rm -rf ./sample/build; fi
@cd sample && cmake -B build -S . -D CMAKE_BUILD_TYPE=Debug && cmake --build build -t bayesnet_sample_xspode
sample/build/bayesnet_sample_xspode $(fname)
clean-test: ## Clean the tests info
@echo ">>> Cleaning Debug BayesNet tests...";
$(call ClearTests)
@echo ">>> Done";
opt = ""
test: ## Run tests (opt="-s") to verbose output the tests, (opt="-c='Test Maximum Spanning Tree'") to run only that section
@echo ">>> Running BayesNet tests...";
@$(MAKE) clean-test
@cmake --build $(f_debug) -t $(test_targets) --parallel $(CMAKE_BUILD_PARALLEL_LEVEL)
@cmake --build $(f_debug) -t $(test_targets) --parallel $(JOBS)
@for t in $(test_targets); do \
echo ">>> Running $$t...";\
if [ -f $(f_debug)/tests/$$t ]; then \
@@ -153,6 +139,7 @@ coverage: ## Run tests and generate coverage report (build/index.html)
$(lcov) --remove coverage.info 'tests/*' --output-file coverage.info >/dev/null 2>&1; \
$(lcov) --remove coverage.info 'bayesnet/utils/loguru.*' --ignore-errors unused --output-file coverage.info >/dev/null 2>&1; \
$(lcov) --remove coverage.info '/opt/miniconda/*' --ignore-errors unused --output-file coverage.info >/dev/null 2>&1; \
$(lcov) --remove coverage.info '*/.conan2/*' --ignore-errors unused --output-file coverage.info >/dev/null 2>&1; \
$(lcov) --summary coverage.info
@$(MAKE) updatebadge
@echo ">>> Done";
@@ -178,6 +165,9 @@ updatebadge: ## Update the coverage badge in README.md
@env python update_coverage.py $(f_debug)/tests
@echo ">>> Done";
# Documentation targets
# =====================
doc: ## Generate documentation
@echo ">>> Generating documentation..."
@cmake --build $(f_release) -t doxygen
@@ -192,6 +182,22 @@ doc: ## Generate documentation
fi
@echo ">>> Done";
diagrams: ## Create a UML class diagram & dependency of the project (diagrams/BayesNet.png)
@echo ">>> Creating diagrams..."
@which $(plantuml) || (echo ">>> Please install plantuml"; exit 1)
@which $(dot) || (echo ">>> Please install graphviz"; exit 1)
@which $(clang-uml) || (echo ">>> Please install clang-uml"; exit 1)
@export PLANTUML_LIMIT_SIZE=16384
@echo ">>> Creating UML class diagram of the project...";
@$(clang-uml) -p
@cd $(f_diagrams); \
$(plantuml) -tsvg BayesNet.puml
@echo ">>> Creating dependency graph diagram of the project...";
$(MAKE) debug
cd $(f_debug) && cmake .. --graphviz=dependency.dot
@$(dot) -Tsvg $(f_debug)/dependency.dot.BayesNet -o $(f_diagrams)/dependency.svg
@echo ">>> Done";
docdir = ""
doc-install: ## Install documentation
@echo ">>> Installing documentation..."
@@ -206,6 +212,38 @@ doc-install: ## Install documentation
@sudo cp -rp $(mansrcdir) $(mandestdir)
@echo ">>> Done";
# Conan package manager targets
# =============================
conan-create: ## Create Conan package
@echo ">>> Creating Conan package..."
@conan create . --build=missing -tf "" -s:a build_type=Release
@conan create . --build=missing -tf "" -s:a build_type=Debug -o "&:enable_coverage=False" -o "&:enable_testing=False"
@echo ">>> Done"
conan-clean: ## Clean Conan cache and build folders
@echo ">>> Cleaning Conan cache and build folders..."
@conan remove "*" --confirm
@conan cache clean
@if test -d "$(f_release)" ; then rm -rf "$(f_release)"; fi
@if test -d "$(f_debug)" ; then rm -rf "$(f_debug)"; fi
@echo ">>> Done"
fname = "tests/data/iris.arff"
model = "TANLd"
build_type = "Debug"
sample: ## Build sample with Conan
@echo ">>> Building Sample with Conan...";
@if [ -d ./sample/build ]; then rm -rf ./sample/build; fi
@cd sample && conan install . --output-folder=build --build=missing -s build_type=$(build_type) -o "&:enable_coverage=False" -o "&:enable_testing=False"
@cd sample && cmake -B build -S . -DCMAKE_BUILD_TYPE=$(build_type) -DCMAKE_TOOLCHAIN_FILE=build/conan_toolchain.cmake && \
cmake --build build -t bayesnet_sample --parallel $(JOBS)
sample/build/bayesnet_sample $(fname) $(model)
@echo ">>> Done";
# Help target
# ===========
help: ## Show help message
@IFS=$$'\n' ; \
help_lines=(`fgrep -h "##" $(MAKEFILE_LIST) | fgrep -v fgrep | sed -e 's/\\$$//' | sed -e 's/##/:/'`); \
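The `JOBS` variable introduced in this Makefile keeps seven processors free and never drops below one job. Writing the same arithmetic out in C++ makes the edge cases explicit; the function names are hypothetical and nothing in the build uses them.

```cpp
#include <cassert>
#include <thread>

// Same rule as the Makefile's JOBS variable: use (CPUs - 7) when more than
// 7 processors are available, otherwise fall back to a single job.
unsigned parallelJobs(unsigned cpus) {
    return cpus > 7 ? cpus - 7 : 1;
}

// std::thread::hardware_concurrency() may return 0 when the count is unknown;
// parallelJobs(0) then yields 1.
unsigned parallelJobsForThisMachine() {
    return parallelJobs(std::thread::hardware_concurrency());
}
```

With 8 or fewer processors the build therefore runs a single job. Because variables given on the `make` command line override assignments in the Makefile, a value such as `make buildd JOBS=4` can be used instead.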

README.md (119 changed lines)

@@ -6,120 +6,121 @@
[![Codacy Badge](https://app.codacy.com/project/badge/Grade/cf3e0ac71d764650b1bf4d8d00d303b1)](https://app.codacy.com/gh/Doctorado-ML/BayesNet/dashboard?utm_source=gh&utm_medium=referral&utm_content=&utm_campaign=Badge_grade)
[![Security Rating](https://sonarcloud.io/api/project_badges/measure?project=rmontanana_BayesNet&metric=security_rating)](https://sonarcloud.io/summary/new_code?id=rmontanana_BayesNet)
[![Reliability Rating](https://sonarcloud.io/api/project_badges/measure?project=rmontanana_BayesNet&metric=reliability_rating)](https://sonarcloud.io/summary/new_code?id=rmontanana_BayesNet)
[![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/Doctorado-ML/BayesNet)
![Gitea Last Commit](https://img.shields.io/gitea/last-commit/rmontanana/bayesnet?gitea_url=https://gitea.rmontanana.es&logo=gitea)
[![Coverage Badge](https://img.shields.io/badge/Coverage-99,1%25-green)](https://gitea.rmontanana.es/rmontanana/BayesNet)
[![DOI](https://zenodo.org/badge/667782806.svg)](https://doi.org/10.5281/zenodo.14210344)
Bayesian Network Classifiers library
## Setup
## Using the Library
### Using the vcpkg library
### Using Conan Package Manager
You can use the library with the vcpkg library manager. In your project you have to add the following files:
You can use the library with the [Conan](https://conan.io/) package manager. In your project you need to add the following files:
#### vcpkg.json
#### conanfile.txt
```json
{
"name": "sample-project",
"version-string": "0.1.0",
"dependencies": [
"bayesnet"
]
}
```
```txt
[requires]
bayesnet/1.1.2
#### vcpkg-configuration.json
```json
{
"registries": [
{
"kind": "git",
"repository": "https://github.com/rmontanana/vcpkg-stash",
"baseline": "393efa4e74e053b6f02c4ab03738c8fe796b28e5",
"packages": [
"folding",
"bayesnet",
"arff-files",
"fimdlp",
"libtorch-bin"
]
}
],
"default-registry": {
"kind": "git",
"repository": "https://github.com/microsoft/vcpkg",
"baseline": "760bfd0c8d7c89ec640aec4df89418b7c2745605"
}
}
[generators]
CMakeDeps
CMakeToolchain
```
#### CMakeLists.txt
You have to include the following lines in your `CMakeLists.txt` file:
Include the following lines in your `CMakeLists.txt` file:
```cmake
find_package(bayesnet CONFIG REQUIRED)
find_package(bayesnet REQUIRED)
add_executable(myapp main.cpp)
target_link_libraries(myapp PRIVATE bayesnet::bayesnet)
```
After that, you can use the `vcpkg` command to install the dependencies:
Then install the dependencies and build your project:
```bash
vcpkg install
conan install . --output-folder=build --build=missing
cmake -B build -S . -DCMAKE_BUILD_TYPE=Release -DCMAKE_TOOLCHAIN_FILE=build/conan_toolchain.cmake
cmake --build build
```
**Note: In the `sample` folder you can find a sample application that uses the library. You can use it as a reference to create your own application.**
## Playing with the library
## Building and Testing
The dependencies are managed with [vcpkg](https://vcpkg.io/) and supported by a private vcpkg repository in [https://github.com/rmontanana/vcpkg-stash](https://github.com/rmontanana/vcpkg-stash).
The project uses [Conan](https://conan.io/) for dependency management and provides convenient Makefile targets for common tasks.
### Prerequisites
- [Conan](https://conan.io/) package manager (`pip install conan`)
- CMake 3.27+
- C++17 compatible compiler
### Getting the code
```bash
git clone https://github.com/doctorado-ml/bayesnet
cd bayesnet
```
Once you have the code, use `make` to build the project. The `Makefile` manages the build process and automatically downloads and installs the dependencies.
### Build Commands
#### Release Build
```bash
make release # Configure release build with Conan
make buildr # Build the release version
```
#### Debug Build & Tests
```bash
make debug # Configure debug build with Conan
make buildd # Build the debug version
make test # Run the tests
```
#### Coverage Analysis
```bash
make coverage # Run tests with coverage analysis
make viewcoverage # View coverage report in browser
```
#### Sample Application
Run the sample application with different datasets and models:
```bash
make sample # Run with default settings
make sample fname=tests/data/glass.arff # Use glass dataset
make sample fname=tests/data/iris.arff model=AODE # Use specific model
```
### Available Makefile Targets
- `debug` - Configure debug build using Conan
- `release` - Configure release build using Conan
- `buildd` - Build debug targets
- `buildr` - Build release targets
- `test` - Run all tests (use `opt="-s"` for verbose output)
- `coverage` - Generate test coverage report
- `viewcoverage` - Open coverage report in browser
- `sample` - Build and run sample application
- `conan-create` - Create Conan package
- `conan-upload` - Upload package to Conan remote
- `conan-clean` - Clean Conan cache and build folders
- `clean` - Clean all build artifacts
- `doc` - Generate documentation
- `diagrams` - Generate UML diagrams
- `help` - Show all available targets
## Models
#### - TAN

@@ -0,0 +1,518 @@
# BayesNet Technical Review - Full Report
## Executive Summary
As an expert C++ developer, I carried out a thorough technical review of the BayesNet library, assessing its architecture, code quality, performance, and maintainability. Below is a detailed analysis with prioritized recommendations for improving the library.
## 1. Identified Strengths
### 1.1 Architecture and Design
- **✅ Well-structured object-oriented design** with a clear class hierarchy
- **✅ Appropriate use of smart pointers** (std::unique_ptr) in most of the code
- **✅ Consistent abstraction** through BaseClassifier
- **✅ Clear separation of responsibilities** between modules
- **✅ Doxygen API documentation**, complete and up to date
### 1.2 Dependency Management and Build
- **✅ Well-configured vcpkg setup** for dependency management
- **✅ Modern CMake** (3.27+) with a robust configuration
- **✅ Debug/Release separation** with appropriate optimizations
- **✅ Integrated testing system** with Catch2
### 1.3 Testing and Coverage
- **✅ 17 test files** covering the main components
- **✅ Parameterized tests** with multiple datasets
- **✅ lcov integration** for coverage reports
- **✅ Automatic tests** as part of the build process
## 2. Weaknesses and Critical Issues
### 2.1 Memory Management Issues
#### **🔴 CRITICAL: Potential Memory Leak**
**File:** `/bayesnet/ensembles/Boost.cc` (lines 124-141)
```cpp
// PROBLEM: raw pointer without RAII
FeatureSelect* featureSelector = nullptr;
if (select_features_algorithm == SelectFeatures.CFS) {
    featureSelector = new CFS(...); // ❌ Leak risk
}
// ...
delete featureSelector; // ❌ Never reached if an exception is thrown in between
```
**Impact:** memory leak if an exception is thrown between `new` and `delete`
**Priority:** HIGH
### 2.2 Performance Issues
#### **🔴 CRITICAL: O(n³) Complexity**
**File:** `/bayesnet/utils/BayesMetrics.cc` (lines 41-53)
```cpp
for (int i = 0; i < n - 1; ++i) {
    if (std::find(featuresExcluded.begin(), featuresExcluded.end(), i) != featuresExcluded.end()) {
        continue; // ❌ O(|excluded|) linear scan inside the nested loop
    }
    for (int j = i + 1; j < n; ++j) {
        if (std::find(featuresExcluded.begin(), featuresExcluded.end(), j) != featuresExcluded.end()) {
            continue; // ❌ O(|excluded|) linear scan inside the nested loop
        }
        // More expensive operations...
    }
}
```
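**Impact:** every candidate pair pays for a linear scan of `featuresExcluded`, so the loop costs O(n²·|excluded|), which becomes O(n³) when the exclusion list grows with n. With 100 features (4,950 pairs) this means tens of thousands of extra comparisons (about 65,000 when features 0-49 are excluded).
**Priority:** HIGH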
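The cost of those lookups can be checked with a small standalone sketch. It is not the real `SelectKPairs`; it only replays the two nested `std::find` lookups and counts every element inspected. The function name `countLookupComparisons` and the choice of excluding features 0-49 out of 100 are illustrative assumptions.

```cpp
#include <cassert>
#include <vector>

// Counts the elements a std::find-style linear search inspects in the original nested loops.
long long countLookupComparisons(int n, const std::vector<int>& featuresExcluded)
{
    long long comparisons = 0;
    // Linear search that stops at the first match, like std::find, counting each inspected element
    auto isExcluded = [&](int value) {
        for (int excluded : featuresExcluded) {
            ++comparisons;
            if (excluded == value) return true;
        }
        return false;
    };
    for (int i = 0; i < n - 1; ++i) {
        if (isExcluded(i)) continue;
        for (int j = i + 1; j < n; ++j) {
            if (isExcluded(j)) continue;
            // pair (i, j) would be scored here
        }
    }
    return comparisons;
}
```

With `n = 100` and features 0-49 excluded, this counts 64,975 comparisons while only 1,225 pairs are actually scored.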
#### **🔴 CRITICAL: Inefficient Threading**
**File:** `/bayesnet/network/Network.cc` (lines 269-273)
```cpp
for (int i = 0; i < samples.size(1); ++i) {
threads.emplace_back(worker, sample, i); // ❌ Thread per sample
}
```
**Impact:** with 10,000 samples, 10,000 threads are created (excessive thread-creation and context-switching overhead)
**Priority:** HIGH
### 2.3 Code Quality Issues
#### **🟡 MODERATE: Excessively Long Functions**
- `XSP2DE.cc`: 575 lines (violates the SRP)
- `Boost::setHyperparameters()`: 150+ lines
- `L1FS::fitLasso()`: 200+ lines with high algorithmic complexity
#### **🟡 MODERATE: Insufficient Validation**
```cpp
// In several files: missing input validation
if (features.empty()) {
    // Edge case not handled
}
```
### 2.4 Algorithm Issues
#### **🟡 MODERATE: Suboptimal Union-Find**
**File:** `/bayesnet/utils/Mst.cc`
```cpp
// ❌ No path compression or union by rank
int find_set(int i) {
    if (i != parent[i])
        i = find_set(parent[i]); // O(n) per call in the worst case
    return i;
}
```
**Impact:** each `find_set` call can walk a parent chain of length O(V), so the MST construction degrades to O(E·V) instead of O(E log V)
## 3. Prioritized Improvement Plan
### 3.1 Phase 1: Critical Issues (Weeks 1-2)
#### **Task 1.1: Remove the Memory Leak in Boost.cc**
```cpp
// BEFORE (line 51 in Boost.h):
FeatureSelect* featureSelector = nullptr;
// AFTER:
std::unique_ptr<FeatureSelect> featureSelector;
// BEFORE (lines 124-141 in Boost.cc):
if (select_features_algorithm == SelectFeatures.CFS) {
    featureSelector = new CFS(...);
}
// ...
delete featureSelector;
// AFTER:
if (select_features_algorithm == SelectFeatures.CFS) {
    featureSelector = std::make_unique<CFS>(...);
}
// ... no delete needed: the smart pointer releases the object automatically
```
**Estimate:** 2 hours
**Priority:** CRITICAL
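To illustrate why the `std::unique_ptr` version is exception-safe, here is a self-contained sketch. `FeatureSelect` and `CFS` below are hypothetical stand-ins that only count live instances (the real classes take dataset arguments), and `selectFeatures`/`runAndCatch` are illustrative helpers; the counter shows the selector is destroyed even when the code after construction throws.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>

// Hypothetical stand-ins for bayesnet's FeatureSelect/CFS; they only track live instances.
struct FeatureSelect {
    static inline int live = 0;
    FeatureSelect() { ++live; }
    virtual ~FeatureSelect() { --live; }
    virtual void fit() = 0;
};

struct CFS : FeatureSelect {
    bool fail;
    explicit CFS(bool fail_) : fail(fail_) {}
    void fit() override {
        if (fail) throw std::runtime_error("fit failed");
    }
};

// Mirrors the "AFTER" version: the unique_ptr owns the selector, so no delete is written.
void selectFeatures(bool fail)
{
    std::unique_ptr<FeatureSelect> featureSelector = std::make_unique<CFS>(fail);
    featureSelector->fit(); // may throw; the destructor still runs during stack unwinding
}

// Returns true if selectFeatures threw, swallowing the exception for the demo.
bool runAndCatch(bool fail)
{
    try {
        selectFeatures(fail);
    } catch (const std::runtime_error&) {
        return true;
    }
    return false;
}
```

After both `runAndCatch(false)` and `runAndCatch(true)`, `FeatureSelect::live` is back to 0, whereas the raw-pointer version would leave one instance alive in the throwing case.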
#### **Task 1.2: Optimize BayesMetrics::SelectKPairs()**
```cpp
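// PROPOSED SOLUTION:
// Assumes scoresKPairs holds std::pair<std::pair<int, int>, double> entries and
// pairsKBest is the std::vector<std::pair<int, int>> member that is returned.
std::vector<std::pair<int, int>> Metrics::SelectKPairs(
    const torch::Tensor& weights,
    std::vector<int>& featuresExcluded,
    bool ascending, unsigned k)
{
    // ✅ O(1) average lookups instead of O(n) linear scans
    std::unordered_set<int> excludedSet(featuresExcluded.begin(), featuresExcluded.end());
    const int n = static_cast<int>(features.size()); // signed, so n - 1 cannot wrap around
    scoresKPairs.clear();
    scoresKPairs.reserve(static_cast<size_t>(n) * (n - 1) / 2); // ✅ Reserve memory up front
    for (int i = 0; i < n - 1; ++i) {
        if (excludedSet.count(i)) continue; // ✅ O(1)
        for (int j = i + 1; j < n; ++j) {
            if (excludedSet.count(j)) continue; // ✅ O(1)
            // rest of the processing (score the pair and append it to scoresKPairs)...
        }
    }
    // Order by score, not by the pair indices
    auto byScore = [ascending](const auto& a, const auto& b) {
        return ascending ? a.second < b.second : a.second > b.second;
    };
    // ✅ nth_element instead of a full sort; afterwards only the selected pairs are sorted
    if (k > 0 && k < scoresKPairs.size()) {
        std::nth_element(scoresKPairs.begin(), scoresKPairs.begin() + k, scoresKPairs.end(), byScore);
        scoresKPairs.resize(k);
    }
    std::sort(scoresKPairs.begin(), scoresKPairs.end(), byScore);
    pairsKBest.clear();
    for (const auto& [pair, score] : scoresKPairs) {
        pairsKBest.push_back(pair);
    }
    return pairsKBest;
}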
```
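**Benefit:** an estimated 50x speedup: exclusion lookups become O(1), so the pair loop is O(n²), and `nth_element` selects the k best pairs in average linear time instead of a full O(n² log n) sort
**Estimate:** 4 hours
**Priority:** CRITICAL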
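Stripped of the libtorch types, the selection strategy of the proposed `SelectKPairs` can be sketched as a standalone function. Assumptions: the pair score is supplied by the caller as a function (the real code computes it from the data and `weights`), and the name `selectKPairs` and its signature are illustrative only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_set>
#include <utility>
#include <vector>

using Pair = std::pair<int, int>;

// Hypothetical standalone version: score(i, j) is provided by the caller.
std::vector<Pair> selectKPairs(int n, const std::vector<int>& featuresExcluded,
                               const std::function<double(int, int)>& score,
                               bool ascending, unsigned k)
{
    // O(1) average membership tests instead of linear std::find scans
    const std::unordered_set<int> excluded(featuresExcluded.begin(), featuresExcluded.end());
    std::vector<std::pair<Pair, double>> scored;
    if (n > 1) scored.reserve(static_cast<std::size_t>(n) * (n - 1) / 2);
    for (int i = 0; i < n - 1; ++i) {
        if (excluded.count(i)) continue;
        for (int j = i + 1; j < n; ++j) {
            if (excluded.count(j)) continue;
            scored.push_back({ { i, j }, score(i, j) });
        }
    }
    // Compare by score only, honoring the requested order
    auto better = [ascending](const auto& a, const auto& b) {
        return ascending ? a.second < b.second : a.second > b.second;
    };
    // Move the k best to the front (average linear time), then sort just those k
    if (k > 0 && k < scored.size()) {
        std::nth_element(scored.begin(), scored.begin() + k, scored.end(), better);
        scored.resize(k);
    }
    std::sort(scored.begin(), scored.end(), better);
    std::vector<Pair> result;
    result.reserve(scored.size());
    for (const auto& entry : scored) result.push_back(entry.first);
    return result;
}
```

For example, with `n = 4` and `score(i, j) = 10 * i + j`, asking for the two highest-scoring pairs returns `(2, 3)` and `(1, 3)`; `k = 0` keeps every non-excluded pair, fully sorted.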
#### **Task 1.3: Implement a Thread Pool**
```cpp
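// PROPOSED SOLUTION for Network.cc: a fixed number of worker threads,
// each predicting a contiguous batch of samples.
// Assumes predict_sample() returns std::vector<double> with one entry per class
// and classNumStates holds the number of classes.
torch::Tensor Network::predict_tensor_optimized(const torch::Tensor& samples, const bool proba)
{
    const int n_samples = static_cast<int>(samples.size(1));
    // hardware_concurrency() may return 0, so use at least one thread
    const int num_threads = std::max(1, std::min(
        static_cast<int>(std::thread::hardware_concurrency()), n_samples));
    const int batch_size = (n_samples + num_threads - 1) / num_threads;
    torch::Tensor result = torch::zeros({ n_samples, classNumStates }, torch::kFloat64);
    std::mutex result_mutex;
    std::vector<std::thread> threads;
    threads.reserve(num_threads);
    for (int t = 0; t < num_threads; ++t) {
        const int start = t * batch_size;
        const int end = std::min(start + batch_size, n_samples);
        threads.emplace_back([this, &samples, &result, &result_mutex, start, end]() {
            for (int i = start; i < end; ++i) {
                const auto sample = samples.index({ "...", i });
                auto prediction = predict_sample(sample);
                // Thread-safe write
                std::lock_guard<std::mutex> lock(result_mutex);
                result.index_put_({ i, "..." }, torch::tensor(prediction));
            }
        });
    }
    for (auto& thread : threads) {
        thread.join();
    }
    // Class probabilities, or the index of the most probable class per sample
    return proba ? result : result.argmax(1);
}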
```
**Benefit:** an estimated 4-8x faster prediction on multi-core machines
**Estimate:** 6 hours
**Priority:** CRITICAL
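The batching logic can be tried out without libtorch. In this sketch, `predictAll` and its `predictSample` callback are hypothetical stand-ins for the network's tensor prediction and per-sample prediction. Each worker owns a contiguous, non-overlapping range of the output vector, so writes never touch the same element and no mutex is needed; this is a design alternative to the lock used in the proposal, not a description of the current code.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical sketch: computes predictSample(i) for every sample with a bounded number of threads.
std::vector<double> predictAll(int nSamples, const std::function<double(int)>& predictSample)
{
    std::vector<double> result(nSamples);
    if (nSamples == 0) return result;
    // hardware_concurrency() may return 0, so fall back to a single thread
    const int hw = static_cast<int>(std::max(1u, std::thread::hardware_concurrency()));
    const int numThreads = std::min(hw, nSamples);
    const int batchSize = (nSamples + numThreads - 1) / numThreads;
    std::vector<std::thread> threads;
    threads.reserve(numThreads);
    for (int t = 0; t < numThreads; ++t) {
        const int start = t * batchSize;
        const int end = std::min(start + batchSize, nSamples);
        if (start >= end) break; // trailing batches can be empty
        // Each worker writes only result[start, end), so the writes never overlap
        threads.emplace_back([&result, &predictSample, start, end]() {
            for (int i = start; i < end; ++i) result[i] = predictSample(i);
        });
    }
    for (auto& thread : threads) thread.join();
    return result;
}
```

Whatever the core count, at most `hardware_concurrency()` threads are created, instead of one per sample.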
### 3.2 Phase 2: Important Optimizations (Weeks 3-4)
#### **Task 2.1: Refactor Long Functions**
**XSP2DE.cc** - split into smaller functions:
```cpp
// BEFORE: a single 575-line function
void XSP2DE::buildModel(const torch::Tensor& weights) {
    // ... 575 lines of code
}
// AFTER: specialized functions, declared as new private members of XSP2DE
// (the base class and the remaining members stay unchanged):
//     void initializeHyperparameters();
//     void selectFeatures(const torch::Tensor& weights);
//     void buildSubModels();
//     void trainIndividualModels(const torch::Tensor& weights);
void XSP2DE::buildModel(const torch::Tensor& weights)
{
    initializeHyperparameters();
    selectFeatures(weights);
    buildSubModels();
    trainIndividualModels(weights);
}
```
**Estimate:** 8 hours
**Benefit:** improved maintainability and testability
#### **Task 2.2: Optimize Union-Find in the MST**
```cpp
// PROPOSED SOLUTION for Mst.cc:
#include <numeric>  // std::iota
#include <utility>  // std::swap
#include <vector>
class UnionFind {
private:
    std::vector<int> parent, rank;
public:
    explicit UnionFind(int n) : parent(n), rank(n, 0) {
        std::iota(parent.begin(), parent.end(), 0);
    }
    int find_set(int i) {
        if (i != parent[i])
            parent[i] = find_set(parent[i]); // ✅ Path compression
        return parent[i];
    }
    bool union_set(int u, int v) {
        u = find_set(u);
        v = find_set(v);
        if (u == v) return false;
        // ✅ Union by rank
        if (rank[u] < rank[v]) std::swap(u, v);
        parent[v] = u;
        if (rank[u] == rank[v]) rank[u]++;
        return true;
    }
};
```
**Benefit:** MST construction improves from O(E·V) to O(E log V), with sorting the edges as the dominant cost
**Estimate:** 4 hours
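To show where the union-find structure plugs in, here is a minimal Kruskal sketch. The `Edge` struct and `kruskalMST` function are illustrative, not the signatures used in `Mst.cc`, and the proposed `UnionFind` class is repeated so the block compiles on its own.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <utility>
#include <vector>

class UnionFind {
    std::vector<int> parent, rank;
public:
    explicit UnionFind(int n) : parent(n), rank(n, 0) { std::iota(parent.begin(), parent.end(), 0); }
    int find_set(int i) {
        if (i != parent[i]) parent[i] = find_set(parent[i]); // path compression
        return parent[i];
    }
    bool union_set(int u, int v) {
        u = find_set(u);
        v = find_set(v);
        if (u == v) return false;
        if (rank[u] < rank[v]) std::swap(u, v); // union by rank
        parent[v] = u;
        if (rank[u] == rank[v]) ++rank[u];
        return true;
    }
};

struct Edge { int u, v; double weight; };

// Kruskal: returns the edges of a minimum spanning tree (a forest if the graph is disconnected).
std::vector<Edge> kruskalMST(int nVertices, std::vector<Edge> edges)
{
    // Sorting the edges is the O(E log E) = O(E log V) step that dominates the cost
    std::sort(edges.begin(), edges.end(),
              [](const Edge& a, const Edge& b) { return a.weight < b.weight; });
    UnionFind uf(nVertices);
    std::vector<Edge> tree;
    for (const Edge& e : edges) {
        if (uf.union_set(e.u, e.v)) tree.push_back(e); // keep e only if it joins two components
    }
    return tree;
}
```

For the maximum-weight spanning tree that TAN-style structure learning needs, sort the edges by descending weight instead.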
#### **Task 2.3: Remove Unnecessary Tensor Copies**
```cpp
// BEFORE (several files):
X = X.to(torch::kFloat32); // ❌ Full copy (whenever the dtype differs)
y = y.to(torch::kFloat32); // ❌ Full copy (whenever the dtype differs)
// AFTER (n_features: number of feature rows in samples):
using torch::indexing::Slice;
torch::Tensor X = samples.index({Slice(0, n_features), Slice()})
    .t()
    .to(torch::kFloat32); // ✅ A single conversion
torch::Tensor y = samples.index({-1, Slice()})
    .to(torch::kFloat32); // ✅ A single conversion
```
**Benefit:** an estimated ~30% reduction in memory use
**Estimate:** 6 hours
### 3.3 Phase 3: Robustness Improvements (Weeks 5-6)
#### **Task 3.1: Implement Comprehensive Validation**
```cpp
// VALIDATION TEMPLATE:
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>
#include <torch/torch.h>
template<typename T>
void validateInput(const std::vector<T>& data, const std::string& name) {
    if (data.empty()) {
        throw std::invalid_argument(name + " cannot be empty");
    }
}
void validateTensorDimensions(const torch::Tensor& tensor,
                              const std::vector<int64_t>& expected_dims) {
    if (tensor.sizes().vec() != expected_dims) {
        throw std::invalid_argument("Tensor dimensions mismatch");
    }
}
```
#### **Task 3.2: Implement an Exception Hierarchy**
```cpp
// PROPOSED HIERARCHY:
#include <exception>
#include <string>
namespace bayesnet {
class BayesNetException : public std::exception {
public:
explicit BayesNetException(const std::string& msg) : message(msg) {}
const char* what() const noexcept override { return message.c_str(); }
private:
std::string message;
};
class InvalidInputException : public BayesNetException {
public:
explicit InvalidInputException(const std::string& msg)
: BayesNetException("Invalid input: " + msg) {}
};
class ModelNotFittedException : public BayesNetException {
public:
ModelNotFittedException()
: BayesNetException("Model has not been fitted") {}
};
class DimensionMismatchException : public BayesNetException {
public:
explicit DimensionMismatchException(const std::string& msg)
: BayesNetException("Dimension mismatch: " + msg) {}
};
}
```
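A short, self-contained sketch of how callers could use the hierarchy: catching `BayesNetException` handles every library-specific error in one place. The hierarchy is repeated in abridged form so the block compiles alone, and `requireFitted` and `tryPredict` are hypothetical helpers, not existing BayesNet functions.

```cpp
#include <cassert>
#include <exception>
#include <string>

namespace bayesnet {
    class BayesNetException : public std::exception {
    public:
        explicit BayesNetException(const std::string& msg) : message(msg) {}
        const char* what() const noexcept override { return message.c_str(); }
    private:
        std::string message;
    };
    class ModelNotFittedException : public BayesNetException {
    public:
        ModelNotFittedException() : BayesNetException("Model has not been fitted") {}
    };
    // Hypothetical guard a classifier could call at the start of predict()
    inline void requireFitted(bool fitted)
    {
        if (!fitted) throw ModelNotFittedException();
    }
}

// Catching the base class handles any library-specific error
std::string tryPredict(bool fitted)
{
    try {
        bayesnet::requireFitted(fitted);
        return "ok";
    } catch (const bayesnet::BayesNetException& e) {
        return e.what();
    }
}
```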
#### **Task 3.3: Improve Test Coverage**
```cpp
// ADDITIONAL TESTS NEEDED:
TEST_CASE("Edge Cases", "[FeatureSelection]") {
    SECTION("Empty dataset") {
        torch::Tensor empty_dataset = torch::empty({0, 0});
        std::vector<std::string> empty_features;
        REQUIRE_THROWS_AS(
            CFS(empty_dataset, empty_features, "class", 0, 2, torch::ones({1})),
            InvalidInputException
        );
    }
    SECTION("Single feature") {
        // Test the behavior with a single feature
    }
    SECTION("All features excluded") {
        // Test the behavior when every feature is excluded
    }
}
```
### 3.4 Phase 4: Advanced Performance Improvements (Weeks 7-8)
#### **Task 4.1: Parallelization with OpenMP**
```cpp
// EXAMPLE FOR CRITICAL LOOPS (expensiveComputation is a placeholder for the per-row work):
#include <omp.h>
void computeIntensiveOperation(const torch::Tensor& data) {
const int n = data.size(0);
std::vector<double> results(n);
#pragma omp parallel for
for (int i = 0; i < n; ++i) {
results[i] = expensiveComputation(data[i]);
}
}
```
#### **Task 4.2: Memory Pool for Frequent Operations**
```cpp
// PROPOSED MEMORY POOL:
#include <mutex>
#include <stack>
#include <utility>
#include <vector>
#include <torch/torch.h>
class TensorPool {
private:
    std::stack<torch::Tensor> available_tensors;
    std::mutex pool_mutex;
public:
    torch::Tensor acquire(const std::vector<int64_t>& shape) {
        std::lock_guard<std::mutex> lock(pool_mutex);
        if (!available_tensors.empty()) {
            auto tensor = available_tensors.top();
            available_tensors.pop();
            // Reuse the storage but clear stale values, so both paths return zeros
            tensor.resize_(shape);
            return tensor.zero_();
        }
        return torch::zeros(shape);
    }
    void release(torch::Tensor tensor) {
        std::lock_guard<std::mutex> lock(pool_mutex);
        available_tensors.push(std::move(tensor));
    }
};
```
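The acquire/release protocol can be exercised without libtorch. This hypothetical `BufferPool` keeps released `std::vector<double>` buffers and hands them back zero-filled, mirroring what the tensor pool is meant to do; whether pooling actually pays off over the default allocator would need to be measured.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <stack>
#include <utility>
#include <vector>

// Hypothetical thread-safe pool of reusable buffers.
class BufferPool {
    std::stack<std::vector<double>> available;
    std::mutex poolMutex;
public:
    std::vector<double> acquire(std::size_t size)
    {
        std::lock_guard<std::mutex> lock(poolMutex);
        if (!available.empty()) {
            std::vector<double> buffer = std::move(available.top());
            available.pop();
            buffer.assign(size, 0.0); // keeps the allocation when capacity suffices and clears old data
            return buffer;
        }
        return std::vector<double>(size, 0.0);
    }
    void release(std::vector<double> buffer)
    {
        std::lock_guard<std::mutex> lock(poolMutex);
        available.push(std::move(buffer));
    }
    std::size_t pooled()
    {
        std::lock_guard<std::mutex> lock(poolMutex);
        return available.size();
    }
};
```

A buffer released and re-acquired with the same size keeps its original storage but comes back filled with zeros.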
## 4. Estimates and Timeline
### 4.1 Effort Summary
| Phase | Tasks | Estimate | Benefit |
|-------|-------|----------|---------|
| Phase 1 | Critical issues | 12 hours | 10-50x performance improvement |
| Phase 2 | Optimizations | 18 hours | Maintainability + 30% less memory |
| Phase 3 | Robustness | 16 hours | Stability and debugging |
| Phase 4 | Advanced performance | 12 hours | Scalability |
| **Total** | | **58 hours** | **Significant transformation** |
### 4.2 Suggested Timeline
```
Week 1: [CRITICAL] Memory leak + BayesMetrics
Week 2: [CRITICAL] Thread pool + basic validation
Week 3: [IMPORTANT] XSP2DE refactoring + MST
Week 4: [IMPORTANT] Tensor optimization + duplication removal
Week 5: [ROBUSTNESS] Validation + exceptions
Week 6: [ROBUSTNESS] Additional tests + edge cases
Week 7: [ADVANCED] OpenMP parallelization
Week 8: [ADVANCED] Memory pool + final optimizations
```
## 5. Expected Impact
### 5.1 Performance
- **Up to 50x faster** feature selection (estimated)
- **4-8x faster** prediction on large datasets (estimated)
- **30% less memory use** by eliminating unnecessary copies
- **Improved scalability** through parallelization
### 5.2 Maintainability
- **Smaller, specialized functions**
- **Better separation of responsibilities**
- **More comprehensive testing**
- **Easier debugging** with specific exceptions
### 5.3 Robustness
- **Elimination of memory leaks**
- **Comprehensive input validation**
- **Robust handling of edge cases**
- **Better error reporting**
## 6. Additional Recommendations
### 6.1 Development Tools
- **Static analysis:** adopt the Clang Static Analyzer and cppcheck
- **Sanitizers:** run AddressSanitizer and ThreadSanitizer in CI
- **Profiling:** integrate valgrind and perf for performance analysis
- **Benchmarking:** add Google Benchmark for performance regression tests
### 6.2 Development Process
- **Mandatory code reviews** for critical changes
- **CI/CD with automatic tests** on multiple platforms
- **Integrated quality metrics** (coverage, cyclomatic complexity)
- **Algorithm documentation** with complexity and references
### 6.3 Performance Monitoring
```cpp
// PROPOSED INTEGRATED PROFILING:
#include <chrono>
#include <string>
#include <unordered_map>
class PerformanceProfiler {
private:
    std::unordered_map<std::string, std::chrono::duration<double>> timings;
public:
    class ScopedTimer {
        // RAII timer that measures automatically
    };
    void startProfiling(const std::string& operation);
    void endProfiling(const std::string& operation);
    void generateReport();
};
```
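The skeleton leaves `ScopedTimer` empty. One possible way to fill it in, shown as a hypothetical sketch built on `std::chrono::steady_clock`, is to record the elapsed time for an operation name when the timer goes out of scope; `record`, `totalSeconds`, and `callCount` are illustrative additions, not part of the proposal above.

```cpp
#include <cassert>
#include <chrono>
#include <string>
#include <unordered_map>
#include <utility>

class PerformanceProfiler {
    std::unordered_map<std::string, std::chrono::duration<double>> timings;
    std::unordered_map<std::string, int> calls;
public:
    // RAII timer: starts on construction, records the elapsed time on destruction
    class ScopedTimer {
        PerformanceProfiler& profiler;
        std::string operation;
        std::chrono::steady_clock::time_point start;
    public:
        ScopedTimer(PerformanceProfiler& p, std::string op)
            : profiler(p), operation(std::move(op)), start(std::chrono::steady_clock::now()) {}
        ScopedTimer(const ScopedTimer&) = delete;
        ScopedTimer& operator=(const ScopedTimer&) = delete;
        ~ScopedTimer() { profiler.record(operation, std::chrono::steady_clock::now() - start); }
    };
    void record(const std::string& operation, std::chrono::duration<double> elapsed)
    {
        timings[operation] += elapsed; // accumulated seconds per operation
        ++calls[operation];
    }
    double totalSeconds(const std::string& operation) const
    {
        auto it = timings.find(operation);
        return it == timings.end() ? 0.0 : it->second.count();
    }
    int callCount(const std::string& operation) const
    {
        auto it = calls.find(operation);
        return it == calls.end() ? 0 : it->second;
    }
};
```

Usage is a single line at the top of the scope to measure, for example `PerformanceProfiler::ScopedTimer timer(profiler, "fit");`.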
## 7. Conclusions
BayesNet is a solid library with a well-designed architecture and appropriate use of modern C++ techniques. However, there are significant opportunities for improvement that could substantially raise its performance and maintainability.
### Immediate Priorities:
1. **Remove the critical memory leak** in Boost.cc
2. **Optimize the O(n³) algorithm** in BayesMetrics.cc
3. **Implement an efficient thread pool** in Network.cc
### Benefits of the Improvement Plan:
- **Performance:** an estimated 10-50x improvement in critical operations
- **Memory:** 30% reduction in memory usage
- **Maintainability:** more modular code and comprehensive testing
- **Robustness:** elimination of crashes and better error handling
Implementing these improvements would turn BayesNet into an industrial-grade library, ready for production in high-performance, mission-critical environments.
---
**Recommended Next Steps:**
1. Review and approve this improvement plan
2. Set priorities based on the project's needs
3. Implement the improvements in the suggested order
4. Define success metrics for each phase
5. Configure CI/CD to validate the improvements automatically

@@ -1,13 +0,0 @@
include_directories(
${BayesNet_SOURCE_DIR}/lib/log
${BayesNet_SOURCE_DIR}/lib/mdlp/src
${BayesNet_SOURCE_DIR}/lib/folding
${BayesNet_SOURCE_DIR}/lib/json/include
${BayesNet_SOURCE_DIR}
${CMAKE_BINARY_DIR}/configured_files/include
)
file(GLOB_RECURSE Sources "*.cc")
add_library(BayesNet ${Sources})
target_link_libraries(BayesNet fimdlp "${TORCH_LIBRARIES}")

@@ -37,6 +37,7 @@ namespace bayesnet {
std::vector<std::string> getNotes() const override { return notes; }
std::string dump_cpt() const override;
void setHyperparameters(const nlohmann::json& hyperparameters) override; //For classifiers that don't have hyperparameters
Network& getModel() { return model; }
protected:
bool fitted;
unsigned int m, n; // m: number of samples, n: number of features

@@ -10,17 +10,16 @@
#include "Classifier.h"
namespace bayesnet {
class KDB : public Classifier {
public:
explicit KDB(int k, float theta = 0.03);
virtual ~KDB() = default;
void setHyperparameters(const nlohmann::json& hyperparameters_) override;
std::vector<std::string> graph(const std::string& name = "KDB") const override;
protected:
int k;
float theta;
void add_m_edges(int idx, std::vector<int>& S, torch::Tensor& weights);
void buildModel(const torch::Tensor& weights) override;
};
}
#endif

@@ -5,22 +5,38 @@
// ***************************************************************
#include "KDBLd.h"
#include <memory>
namespace bayesnet {
KDBLd::KDBLd(int k) : KDB(k), Proposal(dataset, features, className, KDB::notes)
{
validHyperparameters = validHyperparameters_ld;
validHyperparameters.push_back("k");
validHyperparameters.push_back("theta");
}
KDBLd& KDBLd::fit(torch::Tensor& X_, torch::Tensor& y_, const std::vector<std::string>& features_, const std::string& className_, map<std::string, std::vector<int>>& states_, const Smoothing_t smoothing)
{
checkInput(X_, y_);
features = features_;
className = className_;
Xf = X_;
y = y_;
return commonFit(features_, className_, states_, smoothing);
}
KDBLd& KDBLd::fit(torch::Tensor& dataset, const std::vector<std::string>& features_, const std::string& className_, map<std::string, std::vector<int>>& states_, const Smoothing_t smoothing)
{
if (!torch::is_floating_point(dataset)) {
throw std::runtime_error("Dataset must be a floating point tensor");
}
Xf = dataset.index({ torch::indexing::Slice(0, dataset.size(0) - 1), "..." }).clone();
y = dataset.index({ -1, "..." }).clone().to(torch::kInt32);
return commonFit(features_, className_, states_, smoothing);
}
KDBLd& KDBLd::commonFit(const std::vector<std::string>& features_, const std::string& className_, map<std::string, std::vector<int>>& states_, const Smoothing_t smoothing)
{
features = features_;
className = className_;
states = iterativeLocalDiscretization(y, static_cast<KDB*>(this), dataset, features, className, states_, smoothing);
KDB::fit(dataset, features, className, states, smoothing);
states = localDiscretizationProposal(states, model);
return *this;
}
torch::Tensor KDBLd::predict(torch::Tensor& X)
@@ -28,6 +44,11 @@ namespace bayesnet {
auto Xt = prepareX(X);
return KDB::predict(Xt);
}
torch::Tensor KDBLd::predict_proba(torch::Tensor& X)
{
auto Xt = prepareX(X);
return KDB::predict_proba(Xt);
}
std::vector<std::string> KDBLd::graph(const std::string& name) const
{
return KDB::graph(name);

@@ -11,13 +11,21 @@
namespace bayesnet {
class KDBLd : public KDB, public Proposal {
private:
public:
explicit KDBLd(int k);
virtual ~KDBLd() = default;
KDBLd& fit(torch::Tensor& X, torch::Tensor& y, const std::vector<std::string>& features, const std::string& className, map<std::string, std::vector<int>>& states, const Smoothing_t smoothing) override;
KDBLd& fit(torch::Tensor& dataset, const std::vector<std::string>& features, const std::string& className, map<std::string, std::vector<int>>& states, const Smoothing_t smoothing) override;
KDBLd& commonFit(const std::vector<std::string>& features, const std::string& className, map<std::string, std::vector<int>>& states, const Smoothing_t smoothing);
std::vector<std::string> graph(const std::string& name = "KDB") const override;
void setHyperparameters(const nlohmann::json& hyperparameters_) override
{
auto hyperparameters = hyperparameters_;
Proposal::setHyperparameters(hyperparameters);
KDB::setHyperparameters(hyperparameters);
}
torch::Tensor predict(torch::Tensor& X) override;
torch::Tensor predict_proba(torch::Tensor& X) override;
static inline std::string version() { return "0.0.1"; };
};
}

@@ -5,15 +5,58 @@
// ***************************************************************
#include "Proposal.h"
#include <iostream>
#include <cmath>
#include <limits>
#include "Classifier.h"
#include "KDB.h"
#include "TAN.h"
#include "SPODE.h"
#include "KDBLd.h"
#include "TANLd.h"
namespace bayesnet {
Proposal::Proposal(torch::Tensor& dataset_, std::vector<std::string>& features_, std::string& className_, std::vector<std::string>& notes_) : pDataset(dataset_), pFeatures(features_), pClassName(className_), notes(notes_)
{
}
void Proposal::setHyperparameters(nlohmann::json& hyperparameters)
{
if (hyperparameters.contains("ld_proposed_cuts")) {
ld_params.proposed_cuts = hyperparameters["ld_proposed_cuts"];
hyperparameters.erase("ld_proposed_cuts");
}
if (hyperparameters.contains("mdlp_max_depth")) {
ld_params.max_depth = hyperparameters["mdlp_max_depth"];
hyperparameters.erase("mdlp_max_depth");
}
if (hyperparameters.contains("mdlp_min_length")) {
ld_params.min_length = hyperparameters["mdlp_min_length"];
hyperparameters.erase("mdlp_min_length");
}
if (hyperparameters.contains("ld_algorithm")) {
auto algorithm = hyperparameters["ld_algorithm"];
hyperparameters.erase("ld_algorithm");
if (algorithm == "MDLP") {
discretizationType = discretization_t::MDLP;
} else if (algorithm == "BINQ") {
discretizationType = discretization_t::BINQ;
} else if (algorithm == "BINU") {
discretizationType = discretization_t::BINU;
} else {
throw std::invalid_argument("Invalid discretization algorithm: " + algorithm.get<std::string>());
}
}
// Convergence parameters
if (hyperparameters.contains("max_iterations")) {
convergence_params.maxIterations = hyperparameters["max_iterations"];
hyperparameters.erase("max_iterations");
}
if (hyperparameters.contains("verbose_convergence")) {
convergence_params.verbose = hyperparameters["verbose_convergence"];
hyperparameters.erase("verbose_convergence");
}
}
void Proposal::checkInput(const torch::Tensor& X, const torch::Tensor& y)
{
if (!torch::is_floating_point(X)) {
@@ -23,6 +66,7 @@ namespace bayesnet {
throw std::invalid_argument("y must be an integer tensor");
}
}
// Fit method for single classifier
map<std::string, std::vector<int>> Proposal::localDiscretizationProposal(const map<std::string, std::vector<int>>& oldStates, Network& model)
{
// order of local discretization is important. no good 0, 1, 2...
@@ -83,8 +127,15 @@ namespace bayesnet {
pDataset = torch::zeros({ n + 1, m }, torch::kInt32);
auto yv = std::vector<int>(y.data_ptr<int>(), y.data_ptr<int>() + y.size(0));
// discretize input data by feature(row)
std::unique_ptr<mdlp::Discretizer> discretizer;
for (auto i = 0; i < pFeatures.size(); ++i) {
if (discretizationType == discretization_t::BINQ) {
discretizer = std::make_unique<mdlp::BinDisc>(ld_params.proposed_cuts, mdlp::strategy_t::QUANTILE);
} else if (discretizationType == discretization_t::BINU) {
discretizer = std::make_unique<mdlp::BinDisc>(ld_params.proposed_cuts, mdlp::strategy_t::UNIFORM);
} else { // Default is MDLP
discretizer = std::make_unique<mdlp::CPPFImdlp>(ld_params.min_length, ld_params.max_depth, ld_params.proposed_cuts);
}
auto Xt_ptr = Xf.index({ i }).data_ptr<float>();
auto Xt = std::vector<float>(Xt_ptr, Xt_ptr + Xf.size(1));
discretizer->fit(Xt, yv);
@@ -92,7 +143,7 @@ namespace bayesnet {
auto xStates = std::vector<int>(discretizer->getCutPoints().size() + 1);
iota(xStates.begin(), xStates.end(), 0);
states[pFeatures[i]] = xStates;
discretizers[pFeatures[i]] = std::move(discretizer);
}
int n_classes = torch::max(y).item<int>() + 1;
auto yStates = std::vector<int>(n_classes);
@@ -126,4 +177,65 @@ namespace bayesnet {
}
return yy;
}
template<typename Classifier>
map<std::string, std::vector<int>> Proposal::iterativeLocalDiscretization(
const torch::Tensor& y,
Classifier* classifier,
torch::Tensor& dataset,
const std::vector<std::string>& features,
const std::string& className,
const map<std::string, std::vector<int>>& initialStates,
Smoothing_t smoothing
)
{
// Phase 1: Initial discretization (same as original)
auto currentStates = fit_local_discretization(y);
auto previousModel = Network();
if (convergence_params.verbose) {
std::cout << "Starting iterative local discretization with "
<< convergence_params.maxIterations << " max iterations" << std::endl;
}
const torch::Tensor weights = torch::full({ pDataset.size(1) }, 1.0 / pDataset.size(1), torch::kDouble);
for (int iteration = 0; iteration < convergence_params.maxIterations; ++iteration) {
if (convergence_params.verbose) {
std::cout << "Iteration " << (iteration + 1) << "/" << convergence_params.maxIterations << std::endl;
}
// Phase 2: Build model with current discretization
classifier->fit(dataset, features, className, currentStates, weights, smoothing);
// Phase 3: Network-aware discretization refinement
currentStates = localDiscretizationProposal(currentStates, classifier->getModel());
// Check convergence
if (iteration > 0 && previousModel == classifier->getModel()) {
if (convergence_params.verbose) {
std::cout << "Converged after " << (iteration + 1) << " iterations" << std::endl;
}
notes.push_back("Converged after " + std::to_string(iteration + 1) + " of "
+ std::to_string(convergence_params.maxIterations) + " iterations");
break;
}
// Update for next iteration
previousModel = classifier->getModel();
}
return currentStates;
}
// Explicit template instantiation for common classifier types
template map<std::string, std::vector<int>> Proposal::iterativeLocalDiscretization<KDB>(
const torch::Tensor&, KDB*, torch::Tensor&, const std::vector<std::string>&,
const std::string&, const map<std::string, std::vector<int>>&, Smoothing_t);
template map<std::string, std::vector<int>> Proposal::iterativeLocalDiscretization<TAN>(
const torch::Tensor&, TAN*, torch::Tensor&, const std::vector<std::string>&,
const std::string&, const map<std::string, std::vector<int>>&, Smoothing_t);
template map<std::string, std::vector<int>> Proposal::iterativeLocalDiscretization<SPODE>(
const torch::Tensor&, SPODE*, torch::Tensor&, const std::vector<std::string>&,
const std::string&, const map<std::string, std::vector<int>>&, Smoothing_t);
}

@@ -10,27 +10,66 @@
#include <map>
#include <torch/torch.h>
#include <fimdlp/CPPFImdlp.h>
#include <fimdlp/BinDisc.h>
#include "bayesnet/network/Network.h"
#include <nlohmann/json.hpp>
#include "Classifier.h"
namespace bayesnet {
class Proposal {
public:
Proposal(torch::Tensor& pDataset, std::vector<std::string>& features_, std::string& className_, std::vector<std::string>& notes);
void setHyperparameters(nlohmann::json& hyperparameters_);
protected:
void checkInput(const torch::Tensor& X, const torch::Tensor& y);
torch::Tensor prepareX(torch::Tensor& X);
map<std::string, std::vector<int>> localDiscretizationProposal(const map<std::string, std::vector<int>>& states, Network& model);
map<std::string, std::vector<int>> fit_local_discretization(const torch::Tensor& y);
// Iterative discretization method
template<typename Classifier>
map<std::string, std::vector<int>> iterativeLocalDiscretization(
const torch::Tensor& y,
Classifier* classifier,
torch::Tensor& dataset,
const std::vector<std::string>& features,
const std::string& className,
const map<std::string, std::vector<int>>& initialStates,
const Smoothing_t smoothing
);
torch::Tensor Xf; // X continuous nxm tensor
torch::Tensor y; // y discrete nx1 tensor
map<std::string, std::unique_ptr<mdlp::Discretizer>> discretizers;
// MDLP parameters
struct {
size_t min_length = 3; // Minimum length of the interval to consider it in mdlp
float proposed_cuts = 0.0; // Proposed cuts for the Discretization algorithm
int max_depth = std::numeric_limits<int>::max(); // Maximum depth of the MDLP tree
} ld_params;
// Convergence parameters
struct {
int maxIterations = 10;
bool verbose = false;
} convergence_params;
nlohmann::json validHyperparameters_ld = {
"ld_algorithm", "ld_proposed_cuts", "mdlp_min_length", "mdlp_max_depth",
"max_iterations", "verbose_convergence"
};
private:
std::vector<int> factorize(const std::vector<std::string>& labels_t);
std::vector<std::string>& notes; // Notes during fit from BaseClassifier
torch::Tensor& pDataset; // (n+1)xm tensor
std::vector<std::string>& pFeatures;
std::string& pClassName;
enum class discretization_t {
MDLP,
BINQ,
BINU
} discretizationType = discretization_t::MDLP; // Default discretization type
};
}

@@ -7,7 +7,11 @@
#include "SPODELd.h"
namespace bayesnet {
SPODELd::SPODELd(int root) : SPODE(root), Proposal(dataset, features, className, SPODE::notes)
{
validHyperparameters = validHyperparameters_ld; // Inherits the valid hyperparameters from Proposal
}
SPODELd& SPODELd::fit(torch::Tensor& X_, torch::Tensor& y_, const std::vector<std::string>& features_, const std::string& className_, map<std::string, std::vector<int>>& states_, const Smoothing_t smoothing)
{
checkInput(X_, y_);
@@ -30,12 +34,8 @@ namespace bayesnet {
{
features = features_;
className = className_;
states = iterativeLocalDiscretization(y, static_cast<SPODE*>(this), dataset, features, className, states_, smoothing);
SPODE::fit(dataset, features, className, states, smoothing);
states = localDiscretizationProposal(states, model);
return *this;
}
torch::Tensor SPODELd::predict(torch::Tensor& X)
@@ -43,6 +43,11 @@ namespace bayesnet {
auto Xt = prepareX(X);
return SPODE::predict(Xt);
}
torch::Tensor SPODELd::predict_proba(torch::Tensor& X)
{
auto Xt = prepareX(X);
return SPODE::predict_proba(Xt);
}
std::vector<std::string> SPODELd::graph(const std::string& name) const
{
return SPODE::graph(name);

@@ -18,7 +18,14 @@ namespace bayesnet {
SPODELd& fit(torch::Tensor& dataset, const std::vector<std::string>& features, const std::string& className, map<std::string, std::vector<int>>& states, const Smoothing_t smoothing) override;
SPODELd& commonFit(const std::vector<std::string>& features, const std::string& className, map<std::string, std::vector<int>>& states, const Smoothing_t smoothing);
std::vector<std::string> graph(const std::string& name = "SPODELd") const override;
void setHyperparameters(const nlohmann::json& hyperparameters_) override
{
auto hyperparameters = hyperparameters_;
Proposal::setHyperparameters(hyperparameters);
SPODE::setHyperparameters(hyperparameters);
}
torch::Tensor predict(torch::Tensor& X) override;
torch::Tensor predict_proba(torch::Tensor& X) override;
static inline std::string version() { return "0.0.1"; };
};
}

@@ -5,30 +5,48 @@
// ***************************************************************
#include "TANLd.h"
#include <memory>
namespace bayesnet {
TANLd::TANLd() : TAN(), Proposal(dataset, features, className) {}
TANLd::TANLd() : TAN(), Proposal(dataset, features, className, TAN::notes)
{
validHyperparameters = validHyperparameters_ld; // Inherits the valid hyperparameters from Proposal
}
TANLd& TANLd::fit(torch::Tensor& X_, torch::Tensor& y_, const std::vector<std::string>& features_, const std::string& className_, map<std::string, std::vector<int>>& states_, const Smoothing_t smoothing)
{
checkInput(X_, y_);
features = features_;
className = className_;
Xf = X_;
y = y_;
// Fills std::vectors Xv & yv with the data from tensors X_ (discretized) & y
states = fit_local_discretization(y);
// We have discretized the input data
// 1st we need to fit the model to build the normal TAN structure, TAN::fit initializes the base Bayesian network
TAN::fit(dataset, features, className, states, smoothing);
states = localDiscretizationProposal(states, model);
return *this;
return commonFit(features_, className_, states_, smoothing);
}
TANLd& TANLd::fit(torch::Tensor& dataset, const std::vector<std::string>& features_, const std::string& className_, map<std::string, std::vector<int>>& states_, const Smoothing_t smoothing)
{
if (!torch::is_floating_point(dataset)) {
throw std::runtime_error("Dataset must be a floating point tensor");
}
Xf = dataset.index({ torch::indexing::Slice(0, dataset.size(0) - 1), "..." }).clone();
y = dataset.index({ -1, "..." }).clone().to(torch::kInt32);
return commonFit(features_, className_, states_, smoothing);
}
TANLd& TANLd::commonFit(const std::vector<std::string>& features_, const std::string& className_, map<std::string, std::vector<int>>& states_, const Smoothing_t smoothing)
{
features = features_;
className = className_;
states = iterativeLocalDiscretization(y, static_cast<TAN*>(this), dataset, features, className, states_, smoothing);
TAN::fit(dataset, features, className, states, smoothing);
return *this;
}
torch::Tensor TANLd::predict(torch::Tensor& X)
{
auto Xt = prepareX(X);
return TAN::predict(Xt);
}
torch::Tensor TANLd::predict_proba(torch::Tensor& X)
{
auto Xt = prepareX(X);
return TAN::predict_proba(Xt);
}
std::vector<std::string> TANLd::graph(const std::string& name) const
{
return TAN::graph(name);

@@ -16,8 +16,17 @@ namespace bayesnet {
TANLd();
virtual ~TANLd() = default;
TANLd& fit(torch::Tensor& X, torch::Tensor& y, const std::vector<std::string>& features, const std::string& className, map<std::string, std::vector<int>>& states, const Smoothing_t smoothing) override;
TANLd& fit(torch::Tensor& dataset, const std::vector<std::string>& features, const std::string& className, map<std::string, std::vector<int>>& states, const Smoothing_t smoothing) override;
TANLd& commonFit(const std::vector<std::string>& features, const std::string& className, map<std::string, std::vector<int>>& states, const Smoothing_t smoothing);
std::vector<std::string> graph(const std::string& name = "TANLd") const override;
void setHyperparameters(const nlohmann::json& hyperparameters_) override
{
auto hyperparameters = hyperparameters_;
Proposal::setHyperparameters(hyperparameters);
TAN::setHyperparameters(hyperparameters);
}
torch::Tensor predict(torch::Tensor& X) override;
torch::Tensor predict_proba(torch::Tensor& X) override;
};
}
#endif // !TANLD_H

@@ -7,8 +7,9 @@
#include "AODELd.h"
namespace bayesnet {
AODELd::AODELd(bool predict_voting) : Ensemble(predict_voting), Proposal(dataset, features, className)
AODELd::AODELd(bool predict_voting) : Ensemble(predict_voting), Proposal(dataset, features, className, Ensemble::notes)
{
validHyperparameters = validHyperparameters_ld; // Inherits the valid hyperparameters from Proposal
}
AODELd& AODELd::fit(torch::Tensor& X_, torch::Tensor& y_, const std::vector<std::string>& features_, const std::string& className_, map<std::string, std::vector<int>>& states_, const Smoothing_t smoothing)
{
@@ -31,6 +32,7 @@ namespace bayesnet {
models.clear();
for (int i = 0; i < features.size(); ++i) {
models.push_back(std::make_unique<SPODELd>(i));
models.back()->setHyperparameters(hyperparameters);
}
n_models = models.size();
significanceModels = std::vector<double>(n_models, 1.0);

@@ -17,9 +17,15 @@ namespace bayesnet {
virtual ~AODELd() = default;
AODELd& fit(torch::Tensor& X_, torch::Tensor& y_, const std::vector<std::string>& features_, const std::string& className_, map<std::string, std::vector<int>>& states_, const Smoothing_t smoothing) override;
std::vector<std::string> graph(const std::string& name = "AODELd") const override;
void setHyperparameters(const nlohmann::json& hyperparameters_) override
{
hyperparameters = hyperparameters_;
}
protected:
void trainModel(const torch::Tensor& weights, const Smoothing_t smoothing) override;
void buildModel(const torch::Tensor& weights) override;
private:
nlohmann::json hyperparameters = {}; // Hyperparameters for the model
};
}
#endif // !AODELD_H

@@ -4,81 +4,136 @@
// SPDX-License-Identifier: MIT
// ***************************************************************
#include <limits>
#include "bayesnet/utils/bayesnetUtils.h"
#include "FeatureSelect.h"
namespace bayesnet {
FeatureSelect::FeatureSelect(const torch::Tensor& samples, const std::vector<std::string>& features, const std::string& className, const int maxFeatures, const int classNumStates, const torch::Tensor& weights) :
Metrics(samples, features, className, classNumStates), maxFeatures(maxFeatures == 0 ? samples.size(0) - 1 : maxFeatures), weights(weights)
namespace bayesnet {
using namespace torch::indexing; // for Ellipsis constant
//---------------------------------------------------------------------
// ctor
//---------------------------------------------------------------------
FeatureSelect::FeatureSelect(const torch::Tensor& samples,
const std::vector<std::string>& features,
const std::string& className,
int maxFeatures,
int classNumStates,
const torch::Tensor& weights)
: Metrics(samples, features, className, classNumStates),
maxFeatures(maxFeatures == 0 ? samples.size(0) - 1 : maxFeatures),
weights(weights)
{
}
//---------------------------------------------------------------------
// public helpers
//---------------------------------------------------------------------
void FeatureSelect::initialize()
{
selectedFeatures.clear();
selectedScores.clear();
suLabels.clear();
suFeatures.clear();
fitted = false;
}
//---------------------------------------------------------------------
// Symmetrical Uncertainty (SU)
//---------------------------------------------------------------------
double FeatureSelect::symmetricalUncertainty(int a, int b)
{
/*
Compute symmetrical uncertainty. Normalize* information gain (mutual
information) with the entropies of the features in order to compensate
the bias due to high cardinality features. *Range [0, 1]
(https://www.sciencedirect.com/science/article/pii/S0020025519303603)
*/
auto x = samples.index({ a, "..." });
auto y = samples.index({ b, "..." });
auto mu = mutualInformation(x, y, weights);
auto hx = entropy(x, weights);
auto hy = entropy(y, weights);
return 2.0 * mu / (hx + hy);
* Compute symmetrical uncertainty. Normalises the information gain
* (mutual information) with the entropies of the variables to compensate
* the bias due to high-cardinality features. Range: [0, 1]
* See: https://www.sciencedirect.com/science/article/pii/S0020025519303603
*/
auto x = samples.index({ a, Ellipsis }); // row a => feature a
auto y = (b >= 0) ? samples.index({ b, Ellipsis }) // row b (>=0) => feature b
: samples.index({ -1, Ellipsis }); // -1 treated as last row = labels
double mu = mutualInformation(x, y, weights);
double hx = entropy(x, weights);
double hy = entropy(y, weights);
const double denom = hx + hy;
if (denom == 0.0) return 0.0; // both variables are constant (zero entropy)
return 2.0 * mu / denom;
}
//---------------------------------------------------------------------
// SU feature-class
//---------------------------------------------------------------------
void FeatureSelect::computeSuLabels()
{
// Compute Simmetrical Uncertainty between features and labels
// Compute Symmetrical Uncertainty between each feature and the class labels
// https://en.wikipedia.org/wiki/Symmetric_uncertainty
for (int i = 0; i < features.size(); ++i) {
suLabels.push_back(symmetricalUncertainty(i, -1));
const int classIdx = static_cast<int>(samples.size(0)) - 1; // labels in last row
suLabels.reserve(features.size());
for (int i = 0; i < static_cast<int>(features.size()); ++i) {
suLabels.emplace_back(symmetricalUncertainty(i, classIdx));
}
}
double FeatureSelect::computeSuFeatures(const int firstFeature, const int secondFeature)
//---------------------------------------------------------------------
// SU feature-feature with cache
//---------------------------------------------------------------------
double FeatureSelect::computeSuFeatures(int firstFeature, int secondFeature)
{
// Compute Simmetrical Uncertainty between features
// https://en.wikipedia.org/wiki/Symmetric_uncertainty
try {
return suFeatures.at({ firstFeature, secondFeature });
}
catch (const std::out_of_range& e) {
double result = symmetricalUncertainty(firstFeature, secondFeature);
suFeatures[{firstFeature, secondFeature}] = result;
return result;
}
// Order the pair to exploit symmetry => only one entry in the map
auto ordered = std::minmax(firstFeature, secondFeature);
const std::pair<int, int> key{ ordered.first, ordered.second };
auto it = suFeatures.find(key);
if (it != suFeatures.end()) return it->second;
double result = symmetricalUncertainty(key.first, key.second);
suFeatures[key] = result; // store once (symmetry handled by ordering)
return result;
}
//---------------------------------------------------------------------
// Correlation-based Feature Selection (CFS) merit
//---------------------------------------------------------------------
double FeatureSelect::computeMeritCFS()
{
double rcf = 0;
for (auto feature : selectedFeatures) {
rcf += suLabels[feature];
}
double rff = 0;
int n = selectedFeatures.size();
for (const auto& item : doCombinations(selectedFeatures)) {
rff += computeSuFeatures(item.first, item.second);
}
return rcf / sqrt(n + (n * n - n) * rff);
const int n = static_cast<int>(selectedFeatures.size());
if (n == 0) return 0.0;
// average r_cf (feature-class)
double rcf_sum = 0.0;
for (int f : selectedFeatures) rcf_sum += suLabels[f];
const double rcf_avg = rcf_sum / n;
// average r_ff (feature-feature)
double rff_sum = 0.0;
const auto& pairs = doCombinations(selectedFeatures); // generates each unordered pair once
for (const auto& p : pairs) rff_sum += computeSuFeatures(p.first, p.second);
const double numPairs = n * (n - 1) * 0.5;
const double rff_avg = (numPairs > 0) ? rff_sum / numPairs : 0.0;
// Merit_S = k * r_cf / sqrt(k + k*(k-1) * r_ff)  (Hall, 1999)
const double k = static_cast<double>(n);
return (k * rcf_avg) / std::sqrt(k + k * (k - 1) * rff_avg);
}
//---------------------------------------------------------------------
// getters
//---------------------------------------------------------------------
std::vector<int> FeatureSelect::getFeatures() const
{
if (!fitted) {
throw std::runtime_error("FeatureSelect not fitted");
}
if (!fitted) throw std::runtime_error("FeatureSelect not fitted");
return selectedFeatures;
}
std::vector<double> FeatureSelect::getScores() const
{
if (!fitted) {
throw std::runtime_error("FeatureSelect not fitted");
}
if (!fitted) throw std::runtime_error("FeatureSelect not fitted");
return selectedScores;
}
}
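The rewritten `symmetricalUncertainty` and `computeMeritCFS` implement two standard formulas. The first is SU(X, Y) = 2 I(X;Y) / (H(X) + H(Y)). The second is Hall's CFS merit, k * mean(r_cf) / sqrt(k + k(k-1) * mean(r_ff)). The standard-library sketch below recomputes both from weighted discrete vectors so they can be checked by hand. It uses the natural logarithm. Whatever base `Metrics::entropy` actually uses, SU is a ratio, so the base cancels. All function names here are illustrative stand-ins, not the BayesNet API.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Weighted Shannon entropy (natural log) of a discrete variable.
double entropy(const std::vector<int>& x, const std::vector<double>& w)
{
    assert(x.size() == w.size());
    std::map<int, double> mass;
    double total = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) { mass[x[i]] += w[i]; total += w[i]; }
    double h = 0.0;
    for (const auto& kv : mass) {
        const double p = kv.second / total;
        if (p > 0.0) h -= p * std::log(p);
    }
    return h;
}

// Weighted mutual information I(X;Y) = H(X) + H(Y) - H(X,Y).
double mutualInformation(const std::vector<int>& x, const std::vector<int>& y,
                         const std::vector<double>& w)
{
    assert(x.size() == y.size() && x.size() == w.size());
    std::map<std::pair<int, int>, double> joint;
    double total = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) { joint[{ x[i], y[i] }] += w[i]; total += w[i]; }
    double hxy = 0.0;
    for (const auto& kv : joint) {
        const double p = kv.second / total;
        if (p > 0.0) hxy -= p * std::log(p);
    }
    return entropy(x, w) + entropy(y, w) - hxy;
}

// SU = 2 I(X;Y) / (H(X) + H(Y)), defined as 0 when both variables are constant.
double symmetricalUncertainty(const std::vector<int>& x, const std::vector<int>& y,
                              const std::vector<double>& w)
{
    const double denom = entropy(x, w) + entropy(y, w);
    if (denom == 0.0) return 0.0;
    return 2.0 * mutualInformation(x, y, w) / denom;
}

// CFS merit: suClass holds SU(feature, class) for the k selected features,
// suPairs holds SU for each of the k(k-1)/2 unordered feature pairs.
double cfsMerit(const std::vector<double>& suClass, const std::vector<double>& suPairs)
{
    const std::size_t k = suClass.size();
    if (k == 0) return 0.0;
    assert(suPairs.size() == k * (k - 1) / 2);
    double rcf = 0.0;
    for (double v : suClass) rcf += v;
    rcf /= static_cast<double>(k);                 // mean feature-class SU
    double rff = 0.0;
    for (double v : suPairs) rff += v;
    if (!suPairs.empty()) rff /= static_cast<double>(suPairs.size()); // mean feature-feature SU
    const double kd = static_cast<double>(k);
    return (kd * rcf) / std::sqrt(kd + kd * (kd - 1.0) * rff);
}
```

Because SU is symmetric, `computeSuFeatures` can key its cache on the ordered pair from `std::minmax`. As a result, (a, b) and (b, a) share a single entry.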

@@ -26,10 +26,26 @@ namespace bayesnet {
auto first_feature = pop_first(featureOrderCopy);
selectedFeatures.push_back(first_feature);
selectedScores.push_back(suLabels.at(first_feature));
// Second with the score of the candidates
selectedFeatures.push_back(pop_first(featureOrderCopy));
auto merit = computeMeritCFS();
selectedScores.push_back(merit);
// Select second feature that maximizes merit with first
double maxMerit = 0.0;
int secondFeature = -1;
for (const auto& candidate : featureOrderCopy) {
selectedFeatures.push_back(candidate);
double candidateMerit = computeMeritCFS();
if (candidateMerit > maxMerit) {
maxMerit = candidateMerit;
secondFeature = candidate;
}
selectedFeatures.pop_back();
}
if (secondFeature != -1) {
selectedFeatures.push_back(secondFeature);
selectedScores.push_back(maxMerit);
// Remove from featureOrderCopy
featureOrderCopy.erase(std::remove(featureOrderCopy.begin(), featureOrderCopy.end(), secondFeature), featureOrderCopy.end());
}
double merit = maxMerit;
for (const auto feature : featureOrderCopy) {
selectedFeatures.push_back(feature);
// Compute merit with selectedFeatures
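The new CFS code chooses the second feature by trying each remaining candidate and keeping the one that maximizes the merit. Previously it simply took the next feature in SU order. The hunk is cut off before the loop over the remaining features finishes. The sketch below therefore shows a generic greedy forward search under an assumed stopping rule: stop when no candidate improves the current merit, or when `maxFeatures` is reached. `greedyForward` and the toy merit function are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Greedy forward search: start from `seed` (e.g. the feature with the highest SU
// with the class), then repeatedly add the candidate that maximizes `merit`.
// Assumed stopping rule: no candidate improves the current merit, or the limit is hit.
std::vector<int> greedyForward(int seed, std::vector<int> candidates,
                               const std::function<double(const std::vector<int>&)>& merit,
                               std::size_t maxFeatures)
{
    assert(maxFeatures >= 1 && "the seed feature is always selected");
    std::vector<int> selected{ seed };
    double current = merit(selected);
    while (selected.size() < maxFeatures && !candidates.empty()) {
        double best = current;
        int bestIdx = -1;
        for (std::size_t i = 0; i < candidates.size(); ++i) {
            selected.push_back(candidates[i]);       // tentatively add the candidate
            const double m = merit(selected);
            selected.pop_back();                     // and undo it
            if (m > best) { best = m; bestIdx = static_cast<int>(i); }
        }
        if (bestIdx < 0) break;                      // no candidate improves the merit
        selected.push_back(candidates[bestIdx]);
        candidates.erase(candidates.begin() + bestIdx);
        current = best;
    }
    return selected;
}
```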

@@ -0,0 +1,279 @@
// ***************************************************************
// SPDX-FileCopyrightText: Copyright 2024 Ricardo Montañana Gómez
// SPDX-FileType: SOURCE
// SPDX-License-Identifier: MIT
// ***************************************************************
#include <algorithm>
#include <cmath>
#include <numeric>
#include "bayesnet/utils/bayesnetUtils.h"
#include "L1FS.h"
namespace bayesnet {
using namespace torch::indexing;
L1FS::L1FS(const torch::Tensor& samples,
const std::vector<std::string>& features,
const std::string& className,
const int maxFeatures,
const int classNumStates,
const torch::Tensor& weights,
const double alpha,
const int maxIter,
const double tolerance,
const bool fitIntercept)
: FeatureSelect(samples, features, className, maxFeatures, classNumStates, weights),
alpha(alpha), maxIter(maxIter), tolerance(tolerance), fitIntercept(fitIntercept)
{
if (alpha < 0) {
throw std::invalid_argument("Alpha (regularization strength) must be non-negative");
}
if (maxIter < 1) {
throw std::invalid_argument("Maximum iterations must be positive");
}
if (tolerance <= 0) {
throw std::invalid_argument("Tolerance must be positive");
}
// Determine if this is a regression or classification task
// For simplicity, assume binary classification if classNumStates == 2
// and regression otherwise (this can be refined based on your needs)
isRegression = (classNumStates > 2 || classNumStates == 0);
}
void L1FS::fit()
{
initialize();
// Prepare data
int n_samples = samples.size(1);
int n_features = features.size();
// Extract features (all rows except last)
auto X = samples.index({ Slice(0, n_features), Slice() }).t().contiguous();
// Extract labels (last row)
auto y = samples.index({ -1, Slice() }).contiguous();
// Convert to float for numerical operations
X = X.to(torch::kFloat32);
y = y.to(torch::kFloat32);
// Normalize features for better convergence
auto X_mean = X.mean(0);
auto X_std = X.std(0);
X_std = torch::where(X_std == 0, torch::ones_like(X_std), X_std);
X = (X - X_mean) / X_std;
if (isRegression) {
// Normalize y for regression
auto y_mean = y.mean();
auto y_std = y.std();
if (y_std.item<double>() > 0) {
y = (y - y_mean) / y_std;
}
fitLasso(X, y, weights);
} else {
// For binary classification
fitL1Logistic(X, y, weights);
}
// Select features based on non-zero coefficients
std::vector<std::pair<int, double>> featureImportance;
for (int i = 0; i < n_features; ++i) {
double coef_magnitude = std::abs(coefficients[i]);
if (coef_magnitude > 1e-10) { // Threshold for numerical zero
featureImportance.push_back({ i, coef_magnitude });
}
}
// If all coefficients are zero (high regularization), select based on original feature-class correlation
if (featureImportance.empty() && maxFeatures > 0) {
// Compute SU with labels as fallback
computeSuLabels();
auto featureOrder = argsort(suLabels);
// Select top features by SU score
int numToSelect = std::min(static_cast<int>(featureOrder.size()),
std::min(maxFeatures, 3)); // At most 3 features as fallback
for (int i = 0; i < numToSelect; ++i) {
selectedFeatures.push_back(featureOrder[i]);
selectedScores.push_back(suLabels[featureOrder[i]]);
}
} else {
// Sort by importance (absolute coefficient value)
std::sort(featureImportance.begin(), featureImportance.end(),
[](const auto& a, const auto& b) { return a.second > b.second; });
// Select top features up to maxFeatures
int numToSelect = std::min(static_cast<int>(featureImportance.size()),
maxFeatures);
for (int i = 0; i < numToSelect; ++i) {
selectedFeatures.push_back(featureImportance[i].first);
selectedScores.push_back(featureImportance[i].second);
}
}
fitted = true;
}
void L1FS::fitLasso(const torch::Tensor& X, const torch::Tensor& y,
const torch::Tensor& sampleWeights)
{
int n_samples = X.size(0);
int n_features = X.size(1);
// Initialize coefficients
coefficients.assign(n_features, 0.0); // reset: the residuals below assume all coefficients start at zero
double intercept = 0.0;
// Ensure consistent types
torch::Tensor weights = sampleWeights.to(torch::kFloat32);
// Coordinate descent for Lasso
torch::Tensor residuals = y.clone();
if (fitIntercept) {
intercept = (y * weights).sum().item<float>() / weights.sum().item<float>();
residuals = y - intercept;
}
// Precompute feature norms
std::vector<double> featureNorms(n_features);
for (int j = 0; j < n_features; ++j) {
auto Xj = X.index({ Slice(), j });
featureNorms[j] = (Xj * Xj * weights).sum().item<float>();
}
// Coordinate descent iterations
for (int iter = 0; iter < maxIter; ++iter) {
double maxChange = 0.0;
// Update each coordinate
for (int j = 0; j < n_features; ++j) {
auto Xj = X.index({ Slice(), j });
// Compute partial residuals (excluding feature j)
torch::Tensor partialResiduals = residuals + coefficients[j] * Xj;
// Compute rho (correlation with residuals)
double rho = (Xj * partialResiduals * weights).sum().item<float>();
// Soft thresholding
double oldCoef = coefficients[j];
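// Guard all-zero columns (e.g. constant features after standardization), which would give 0/0
coefficients[j] = (featureNorms[j] > 0.0) ? softThreshold(rho, alpha) / featureNorms[j] : 0.0;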
// Update residuals
residuals = partialResiduals - coefficients[j] * Xj;
maxChange = std::max(maxChange, std::abs(coefficients[j] - oldCoef));
}
// Update intercept if needed
if (fitIntercept) {
// Shift the intercept by the weighted mean of the current residuals;
// assigning that mean directly would discard the previous intercept
double shift = (residuals * weights).sum().item<double>() /
weights.sum().item<double>();
intercept += shift;
residuals = residuals - shift;
maxChange = std::max(maxChange, std::abs(shift));
}
// Check convergence
if (maxChange < tolerance) {
break;
}
}
}
void L1FS::fitL1Logistic(const torch::Tensor& X, const torch::Tensor& y,
const torch::Tensor& sampleWeights)
{
int n_samples = X.size(0);
int n_features = X.size(1);
// Initialize coefficients
torch::Tensor coef = torch::zeros({ n_features }, torch::kFloat32);
double intercept = 0.0;
// Ensure consistent types
torch::Tensor weights = sampleWeights.to(torch::kFloat32);
// Learning rate (can be adaptive)
double learningRate = 0.01;
// Proximal gradient descent
for (int iter = 0; iter < maxIter; ++iter) {
// Compute predictions
torch::Tensor linearPred = X.matmul(coef);
if (fitIntercept) {
linearPred = linearPred + intercept;
}
torch::Tensor pred = sigmoid(linearPred);
// Compute gradient
torch::Tensor diff = pred - y;
torch::Tensor grad = X.t().matmul(diff * weights) / n_samples;
// Gradient descent step
torch::Tensor coef_new = coef - learningRate * grad;
// Proximal step (soft thresholding)
for (int j = 0; j < n_features; ++j) {
coef_new[j] = softThreshold(coef_new[j].item<float>(),
learningRate * alpha);
}
// Update intercept if needed
if (fitIntercept) {
double grad_intercept = (diff * weights).sum().item<float>() / n_samples;
intercept -= learningRate * grad_intercept;
}
// Check convergence
double change = (coef_new - coef).abs().max().item<float>();
coef = coef_new;
if (change < tolerance) {
break;
}
// Adaptive learning rate (optional)
if (iter > 0 && iter % 100 == 0) { // decay every 100 iterations, not at iteration 0
learningRate *= 0.9;
}
}
// Store final coefficients
coefficients.resize(n_features);
for (int j = 0; j < n_features; ++j) {
coefficients[j] = coef[j].item<float>();
}
}
double L1FS::softThreshold(double x, double lambda) const
{
if (x > lambda) {
return x - lambda;
} else if (x < -lambda) {
return x + lambda;
} else {
return 0.0;
}
}
torch::Tensor L1FS::sigmoid(const torch::Tensor& z) const
{
return 1.0 / (1.0 + torch::exp(-z));
}
std::vector<double> L1FS::getCoefficients() const
{
if (!fitted) {
throw std::runtime_error("L1FS not fitted");
}
return coefficients;
}
} // namespace bayesnet
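`fitLasso` performs weighted coordinate descent. Each update `softThreshold(rho, alpha) / featureNorms[j]` is the exact minimizer, along feature j, of 0.5 * sum_i w_i (y_i - x_i . beta)^2 + alpha * ||beta||_1. This objective is not divided by the number of samples, so the effective strength of `alpha` depends on how the sample weights are normalized. The standard-library sketch below omits the intercept and standardization. It also skips all-zero columns instead of dividing 0 by 0. `lassoCD` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Soft-thresholding operator S(x, lambda) = sign(x) * max(|x| - lambda, 0).
double softThreshold(double x, double lambda)
{
    if (x > lambda) return x - lambda;
    if (x < -lambda) return x + lambda;
    return 0.0;
}

// Weighted coordinate-descent Lasso without intercept, minimizing
//   0.5 * sum_i w_i (y_i - x_i . beta)^2 + alpha * ||beta||_1
// X is stored column-wise: X[j][i] is feature j of sample i.
std::vector<double> lassoCD(const std::vector<std::vector<double>>& X,
                            const std::vector<double>& y,
                            const std::vector<double>& w,
                            double alpha, int maxIter = 1000, double tol = 1e-8)
{
    const std::size_t p = X.size(), n = y.size();
    std::vector<double> beta(p, 0.0);   // fresh start on every call
    std::vector<double> r(y);           // residuals y - X beta, with beta == 0
    std::vector<double> norm(p, 0.0);   // sum_i w_i x_ij^2
    for (std::size_t j = 0; j < p; ++j) {
        assert(X[j].size() == n);
        for (std::size_t i = 0; i < n; ++i) norm[j] += w[i] * X[j][i] * X[j][i];
    }
    for (int iter = 0; iter < maxIter; ++iter) {
        double maxChange = 0.0;
        for (std::size_t j = 0; j < p; ++j) {
            // rho: weighted correlation of feature j with the partial residual (feature j added back)
            double rho = 0.0;
            for (std::size_t i = 0; i < n; ++i) rho += w[i] * X[j][i] * (r[i] + beta[j] * X[j][i]);
            const double updated = norm[j] > 0.0 ? softThreshold(rho, alpha) / norm[j] : 0.0;
            const double delta = updated - beta[j];
            for (std::size_t i = 0; i < n; ++i) r[i] -= delta * X[j][i]; // keep residuals in sync
            beta[j] = updated;
            maxChange = std::max(maxChange, std::abs(delta));
        }
        if (maxChange < tol) break;
    }
    return beta;
}
```

For instance, with x = {1, 2, 3}, y = 2x and unit weights, `alpha = 0` recovers beta = 2. `alpha = 7` shrinks it to (28 - 7) / 14 = 1.5.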

@@ -0,0 +1,83 @@
// ***************************************************************
// SPDX-FileCopyrightText: Copyright 2025 Ricardo Montañana Gómez
// SPDX-FileType: SOURCE
// SPDX-License-Identifier: MIT
// ***************************************************************
#ifndef L1FS_H
#define L1FS_H
#include <torch/torch.h>
#include <vector>
#include "bayesnet/feature_selection/FeatureSelect.h"
namespace bayesnet {
/**
* L1-Regularized Feature Selection (L1FS)
*
* This class implements feature selection using L1-regularized linear models.
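* For binary classification (classNumStates == 2) it fits an L1-penalized logistic regression.
* For any other classNumStates (0 or > 2) it fits Lasso regression on the label values,
* treating multiclass labels as a numeric target rather than using one-vs-rest.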
*
* The L1 penalty induces sparsity in the model coefficients, effectively
* performing feature selection by setting irrelevant feature weights to zero.
*/
class L1FS : public FeatureSelect {
public:
/**
* Constructor for L1FS
* @param samples (n+1) x m tensor whose last row (samples[-1]) is the target variable
* @param features vector of feature names
* @param className name of the class/target variable
* @param maxFeatures maximum number of features to select (0 = all)
* @param classNumStates number of class states; 2 selects L1-logistic regression, any other value (including 0) selects Lasso regression
* @param weights sample weights
* @param alpha L1 regularization strength (higher = more sparsity)
* @param maxIter maximum iterations for optimization
* @param tolerance convergence tolerance
* @param fitIntercept whether to fit an intercept term
*/
L1FS(const torch::Tensor& samples,
const std::vector<std::string>& features,
const std::string& className,
const int maxFeatures,
const int classNumStates,
const torch::Tensor& weights,
const double alpha = 1.0,
const int maxIter = 1000,
const double tolerance = 1e-4,
const bool fitIntercept = true);
virtual ~L1FS() {};
void fit() override;
// Get the learned coefficients for each feature
std::vector<double> getCoefficients() const;
private:
double alpha; // L1 regularization strength
int maxIter; // Maximum iterations for optimization
double tolerance; // Convergence tolerance
bool fitIntercept; // Whether to fit intercept
bool isRegression; // Task type (regression vs classification)
std::vector<double> coefficients; // Learned coefficients
// Coordinate descent for Lasso regression
void fitLasso(const torch::Tensor& X, const torch::Tensor& y, const torch::Tensor& sampleWeights);
// Proximal gradient descent for L1-regularized logistic regression
void fitL1Logistic(const torch::Tensor& X, const torch::Tensor& y, const torch::Tensor& sampleWeights);
// Soft thresholding operator for L1 regularization
double softThreshold(double x, double lambda) const;
// Logistic function
torch::Tensor sigmoid(const torch::Tensor& z) const;
// Compute logistic loss
double logisticLoss(const torch::Tensor& X, const torch::Tensor& y,
const torch::Tensor& coef, const torch::Tensor& sampleWeights) const;
};
}
#endif
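`fitL1Logistic` follows the proximal gradient (ISTA) pattern. It takes a gradient step on the weighted log-loss averaged over samples, then soft-thresholds every coefficient with threshold `learningRate * alpha`. The sketch below keeps that structure with a fixed step size and no intercept. `istaStep` and `fitL1LogisticSketch` are hypothetical names, and labels are assumed to be 0 or 1.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

double sigmoid(double z) { return 1.0 / (1.0 + std::exp(-z)); }

double softThreshold(double x, double lambda)
{
    if (x > lambda) return x - lambda;
    if (x < -lambda) return x + lambda;
    return 0.0;
}

// One ISTA step: gradient step on the mean weighted log-loss, then the L1 proximal step.
// X is row-major: X[i] is sample i.
std::vector<double> istaStep(const std::vector<std::vector<double>>& X, const std::vector<double>& y,
                             const std::vector<double>& w, const std::vector<double>& coef,
                             double lr, double alpha)
{
    const std::size_t n = X.size(), p = coef.size();
    std::vector<double> grad(p, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        assert(X[i].size() == p);
        double z = 0.0;
        for (std::size_t j = 0; j < p; ++j) z += X[i][j] * coef[j];
        const double diff = sigmoid(z) - y[i];   // prediction error for sample i
        for (std::size_t j = 0; j < p; ++j) grad[j] += X[i][j] * diff * w[i] / static_cast<double>(n);
    }
    std::vector<double> next(p);
    for (std::size_t j = 0; j < p; ++j) next[j] = softThreshold(coef[j] - lr * grad[j], lr * alpha);
    return next;
}

std::vector<double> fitL1LogisticSketch(const std::vector<std::vector<double>>& X, const std::vector<double>& y,
                                        const std::vector<double>& w, double alpha,
                                        double lr = 0.1, int maxIter = 1000, double tol = 1e-6)
{
    std::vector<double> coef(X.empty() ? 0 : X.front().size(), 0.0);
    for (int iter = 0; iter < maxIter; ++iter) {
        const std::vector<double> next = istaStep(X, y, w, coef, lr, alpha);
        double change = 0.0;
        for (std::size_t j = 0; j < coef.size(); ++j) change = std::max(change, std::abs(next[j] - coef[j]));
        coef = next;
        if (change < tol) break;                 // same max-abs-change criterion as fitL1Logistic
    }
    return coef;
}
```

A feature whose gradient step stays inside [-lr * alpha, lr * alpha] is set exactly to zero. That property is what lets the non-zero coefficients serve as the selected features.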

@@ -17,14 +17,90 @@ namespace bayesnet {
Network::Network() : fitted{ false }, classNumStates{ 0 }
{
}
Network::Network(const Network& other) : features(other.features), className(other.className), classNumStates(other.getClassNumStates()),
fitted(other.fitted), samples(other.samples)
Network::Network(const Network& other)
: features(other.features), className(other.className), classNumStates(other.classNumStates),
fitted(other.fitted)
{
if (samples.defined())
samples = samples.clone();
// Deep copy the samples tensor
if (other.samples.defined()) {
samples = other.samples.clone();
}
// First, create all nodes (without relationships)
for (const auto& node : other.nodes) {
nodes[node.first] = std::make_unique<Node>(*node.second);
}
// Second, reconstruct the relationships between nodes
for (const auto& node : other.nodes) {
const std::string& nodeName = node.first;
Node* originalNode = node.second.get();
Node* newNode = nodes[nodeName].get();
// Reconstruct parent relationships
for (Node* parent : originalNode->getParents()) {
const std::string& parentName = parent->getName();
if (nodes.find(parentName) != nodes.end()) {
newNode->addParent(nodes[parentName].get());
}
}
// Reconstruct child relationships
for (Node* child : originalNode->getChildren()) {
const std::string& childName = child->getName();
if (nodes.find(childName) != nodes.end()) {
newNode->addChild(nodes[childName].get());
}
}
}
}
Network& Network::operator=(const Network& other)
{
if (this != &other) {
// Clear existing state
nodes.clear();
features = other.features;
className = other.className;
classNumStates = other.classNumStates;
fitted = other.fitted;
// Deep copy the samples tensor
if (other.samples.defined()) {
samples = other.samples.clone();
} else {
samples = torch::Tensor();
}
// First, create all nodes (without relationships)
for (const auto& node : other.nodes) {
nodes[node.first] = std::make_unique<Node>(*node.second);
}
// Second, reconstruct the relationships between nodes
for (const auto& node : other.nodes) {
const std::string& nodeName = node.first;
Node* originalNode = node.second.get();
Node* newNode = nodes[nodeName].get();
// Reconstruct parent relationships
for (Node* parent : originalNode->getParents()) {
const std::string& parentName = parent->getName();
if (nodes.find(parentName) != nodes.end()) {
newNode->addParent(nodes[parentName].get());
}
}
// Reconstruct child relationships
for (Node* child : originalNode->getChildren()) {
const std::string& childName = child->getName();
if (nodes.find(childName) != nodes.end()) {
newNode->addChild(nodes[childName].get());
}
}
}
}
return *this;
}
void Network::initialize()
{
@@ -503,4 +579,41 @@ namespace bayesnet {
}
return oss.str();
}
bool Network::operator==(const Network& other) const
{
// Compare number of nodes
if (nodes.size() != other.nodes.size()) {
return false;
}
// Compare if all node names exist in both networks
for (const auto& node : nodes) {
if (other.nodes.find(node.first) == other.nodes.end()) {
return false;
}
}
// Compare edges (topology)
auto thisEdges = getEdges();
auto otherEdges = other.getEdges();
// Compare number of edges
if (thisEdges.size() != otherEdges.size()) {
return false;
}
// Sort both edge lists for comparison
std::sort(thisEdges.begin(), thisEdges.end());
std::sort(otherEdges.begin(), otherEdges.end());
// Compare each edge
for (size_t i = 0; i < thisEdges.size(); ++i) {
if (thisEdges[i] != otherEdges[i]) {
return false;
}
}
return true;
}
}
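The new `Network` copy constructor and copy assignment work in two passes. First they clone every `Node`; the `Node` copy constructor deliberately skips the parent and child pointers. Then they re-link edges by node name, so the pointers target the new network's nodes rather than the source's. The self-contained sketch below shows the same idea with hypothetical `MiniNode` and `MiniNetwork` types. Its `operator=` uses copy-and-swap, which is one way to avoid duplicating the re-linking loop; the commit instead repeats that loop in both functions. Equality compares node names and the sorted edge list, as `Network::operator==` does.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct MiniNode {
    std::string name;
    std::vector<MiniNode*> parents, children;        // non-owning links
    explicit MiniNode(std::string n) : name(std::move(n)) {}
    MiniNode(const MiniNode& other) : name(other.name) {} // payload only, links rebuilt by the owner
};

struct MiniNetwork {
    std::map<std::string, std::unique_ptr<MiniNode>> nodes; // owning

    MiniNetwork() = default;
    MiniNetwork(const MiniNetwork& other)
    {
        // Pass 1: clone every node without relationships
        for (const auto& kv : other.nodes) nodes[kv.first] = std::make_unique<MiniNode>(*kv.second);
        // Pass 2: re-link by name so pointers target this network's nodes
        for (const auto& kv : other.nodes) {
            MiniNode* copy = nodes.at(kv.first).get();
            for (const MiniNode* parent : kv.second->parents) copy->parents.push_back(nodes.at(parent->name).get());
            for (const MiniNode* child : kv.second->children) copy->children.push_back(nodes.at(child->name).get());
        }
    }
    // Copy-and-swap: the by-value parameter is built by the copy constructor above
    MiniNetwork& operator=(MiniNetwork other) { nodes.swap(other.nodes); return *this; }

    void addNode(const std::string& name) { nodes[name] = std::make_unique<MiniNode>(name); }
    void addEdge(const std::string& from, const std::string& to)
    {
        MiniNode* p = nodes.at(from).get();
        MiniNode* c = nodes.at(to).get();
        assert(p != c && "self-loops are not expected");
        p->children.push_back(c);
        c->parents.push_back(p);
    }
    std::vector<std::pair<std::string, std::string>> edges() const
    {
        std::vector<std::pair<std::string, std::string>> out;
        for (const auto& kv : nodes)
            for (const MiniNode* c : kv.second->children) out.emplace_back(kv.first, c->name);
        std::sort(out.begin(), out.end());
        return out;
    }
    // Structural equality: same node names and the same sorted edge list
    bool operator==(const MiniNetwork& other) const
    {
        if (nodes.size() != other.nodes.size()) return false;
        for (const auto& kv : nodes)
            if (other.nodes.count(kv.first) == 0) return false;
        return edges() == other.edges();
    }
};
```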

@@ -17,7 +17,8 @@ namespace bayesnet {
class Network {
public:
Network();
explicit Network(const Network&);
Network(const Network& other);
Network& operator=(const Network& other);
~Network() = default;
torch::Tensor& getSamples();
void addNode(const std::string&);
@@ -47,6 +48,7 @@ namespace bayesnet {
void initialize();
std::string dump_cpt() const;
inline std::string version() { return { project_version.begin(), project_version.end() }; }
bool operator==(const Network& other) const;
private:
std::map<std::string, std::unique_ptr<Node>> nodes;
bool fitted;

@@ -5,6 +5,7 @@
// ***************************************************************
#include "Node.h"
#include <iterator>
namespace bayesnet {
@@ -12,6 +13,41 @@ namespace bayesnet {
: name(name)
{
}
Node::Node(const Node& other)
: name(other.name), numStates(other.numStates), dimensions(other.dimensions)
{
// Deep copy the CPT tensor
if (other.cpTable.defined()) {
cpTable = other.cpTable.clone();
}
// Note: parent and children pointers are NOT copied here
// They will be reconstructed by the Network copy constructor
// to maintain proper object relationships
}
Node& Node::operator=(const Node& other)
{
if (this != &other) {
name = other.name;
numStates = other.numStates;
dimensions = other.dimensions;
// Deep copy the CPT tensor
if (other.cpTable.defined()) {
cpTable = other.cpTable.clone();
} else {
cpTable = torch::Tensor();
}
// Clear existing relationships
parents.clear();
children.clear();
// Note: parent and children pointers are NOT copied here
// They must be reconstructed to maintain proper object relationships
}
return *this;
}
void Node::clear()
{
parents.clear();
@@ -94,39 +130,51 @@ namespace bayesnet {
{
dimensions.clear();
dimensions.reserve(parents.size() + 1);
// Get dimensions of the CPT
dimensions.push_back(numStates);
for (const auto& parent : parents) {
dimensions.push_back(parent->getNumStates());
}
//transform(parents.begin(), parents.end(), back_inserter(dimensions), [](const auto& parent) { return parent->getNumStates(); });
// Create a tensor initialized with smoothing
cpTable = torch::full(dimensions, smoothing, torch::kDouble);
// Create a map for quick feature index lookup
// Build feature index map
std::unordered_map<std::string, int> featureIndexMap;
for (size_t i = 0; i < features.size(); ++i) {
featureIndexMap[features[i]] = i;
}
// Fill table with counts
// Get the index of this node's feature
int name_index = featureIndexMap[name];
// Get parent indices in dataset
std::vector<int> parent_indices;
parent_indices.reserve(parents.size());
// Gather indices for node and parents
std::vector<int64_t> all_indices;
all_indices.push_back(featureIndexMap[name]);
for (const auto& parent : parents) {
parent_indices.push_back(featureIndexMap[parent->getName()]);
all_indices.push_back(featureIndexMap[parent->getName()]);
}
c10::List<c10::optional<at::Tensor>> coordinates;
for (int n_sample = 0; n_sample < dataset.size(1); ++n_sample) {
coordinates.clear();
auto sample = dataset.index({ "...", n_sample });
coordinates.push_back(sample[name_index]);
for (size_t i = 0; i < parent_indices.size(); ++i) {
coordinates.push_back(sample[parent_indices[i]]);
// Extract relevant columns: shape (num_features, num_samples)
auto indices_tensor = dataset.index_select(0, torch::tensor(all_indices, torch::kLong));
indices_tensor = indices_tensor.transpose(0, 1).to(torch::kLong); // (num_samples, num_features)
// Manual flattening of indices
std::vector<int64_t> strides(all_indices.size(), 1);
for (int i = static_cast<int>(strides.size()) - 2; i >= 0; --i) { // cast first: size() - 2 underflows for a parentless node
strides[i] = strides[i + 1] * cpTable.size(i + 1);
}
auto indices_tensor_cpu = indices_tensor.cpu();
auto indices_accessor = indices_tensor_cpu.accessor<int64_t, 2>();
std::vector<int64_t> flat_indices(indices_tensor.size(0));
for (int64_t i = 0; i < indices_tensor.size(0); ++i) {
int64_t idx = 0;
for (size_t j = 0; j < strides.size(); ++j) {
idx += indices_accessor[i][j] * strides[j];
}
// Increment the count of the corresponding coordinate
cpTable.index_put_({ coordinates }, weights.index({ n_sample }), true);
flat_indices[i] = idx;
}
// Accumulate weights into flat CPT
auto flat_cpt = cpTable.flatten();
auto flat_indices_tensor = torch::from_blob(flat_indices.data(), { (int64_t)flat_indices.size() }, torch::kLong).clone();
flat_cpt.index_add_(0, flat_indices_tensor, weights.cpu());
cpTable = flat_cpt.view(cpTable.sizes());
// Normalize the counts over the node's own states (dim 0), so each parent configuration sums to 1
cpTable /= cpTable.sum(0, true);
}
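The hunk above fills `cpTable` differently from before. It drops the per-sample `index_put_` loop, computes one row-major flat offset per sample, and accumulates all sample weights with a single `index_add_`. The following is a minimal, dependency-free Python sketch of that idea. It assumes the CPT is a dense row-major array stored as a flat list, with the node's own variable on axis 0 and its parents on the later axes. It also assumes `dataset` is indexed as `dataset[feature][sample]`, like the `(num_features, num_samples)` tensor. The names `row_major_strides` and `compute_cpt` are hypothetical.

```python
from math import prod


def row_major_strides(shape):
    """Strides for a row-major (C-order) array: the last axis varies fastest."""
    strides = [1] * len(shape)
    for i in range(len(shape) - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]
    return strides


def compute_cpt(dataset, weights, all_indices, shape, smoothing=1.0):
    """Hypothetical helper mirroring the flattened-index CPT update.

    dataset     -- dataset[feature][sample], discrete integer states
    weights     -- one weight per sample
    all_indices -- [node_feature, parent_1, parent_2, ...] rows of dataset
    shape       -- number of states of each of those variables, same order
    """
    strides = row_major_strides(shape)
    flat_cpt = [smoothing] * prod(shape)  # like torch::full(dimensions, smoothing)
    for s in range(len(dataset[0])):
        # Flatten the (node, parents...) coordinate of sample s into one offset
        idx = sum(dataset[f][s] * stride for f, stride in zip(all_indices, strides))
        flat_cpt[idx] += weights[s]  # the index_add_ step
    # Normalize over axis 0 (the node's states) for every parent configuration
    n_parent_configs = strides[0]
    for p in range(n_parent_configs):
        total = sum(flat_cpt[v * n_parent_configs + p] for v in range(shape[0]))
        for v in range(shape[0]):
            flat_cpt[v * n_parent_configs + p] /= total
    return flat_cpt
```

Computing every offset first and adding the weights in one pass avoids a separate indexing call per sample. That is the same motivation as batching the update into a single `index_add_` on the flattened tensor.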


@@ -14,6 +14,9 @@ namespace bayesnet {
class Node {
public:
explicit Node(const std::string&);
Node(const Node& other);
Node& operator=(const Node& other);
~Node() = default;
void clear();
void addParent(Node*);
void addChild(Node*);


@@ -4,9 +4,6 @@
#include <condition_variable>
#include <algorithm>
#include <thread>
#include <mutex>
#include <condition_variable>
#include <thread>
class CountingSemaphore {
public:

bayesnetConfig.cmake.in

@@ -0,0 +1,4 @@
@PACKAGE_INIT@
include("${CMAKE_CURRENT_LIST_DIR}/bayesnetTargets.cmake")


@@ -1,12 +0,0 @@
function(add_git_submodule dir)
find_package(Git REQUIRED)
if(NOT EXISTS ${dir}/CMakeLists.txt)
message(STATUS "🚨 Adding git submodule => ${dir}")
execute_process(COMMAND ${GIT_EXECUTABLE}
submodule update --init --recursive -- ${dir}
WORKING_DIRECTORY ${PROJECT_SOURCE_DIR})
endif()
add_subdirectory(${dir})
endfunction(add_git_submodule)


@@ -1,746 +0,0 @@
# Copyright (c) 2012 - 2017, Lars Bilke
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without modification,
# are permitted provided that the following conditions are met:
#
# 1. Redistributions of source code must retain the above copyright notice, this
# list of conditions and the following disclaimer.
#
# 2. Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
#
# 3. Neither the name of the copyright holder nor the names of its contributors
# may be used to endorse or promote products derived from this software without
# specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
# WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR
# ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
# (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
# LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
# ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
# SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#
# CHANGES:
#
# 2012-01-31, Lars Bilke
# - Enable Code Coverage
#
# 2013-09-17, Joakim Söderberg
# - Added support for Clang.
# - Some additional usage instructions.
#
# 2016-02-03, Lars Bilke
# - Refactored functions to use named parameters
#
# 2017-06-02, Lars Bilke
# - Merged with modified version from github.com/ufz/ogs
#
# 2019-05-06, Anatolii Kurotych
# - Remove unnecessary --coverage flag
#
# 2019-12-13, FeRD (Frank Dana)
# - Deprecate COVERAGE_LCOVR_EXCLUDES and COVERAGE_GCOVR_EXCLUDES lists in favor
# of tool-agnostic COVERAGE_EXCLUDES variable, or EXCLUDE setup arguments.
# - CMake 3.4+: All excludes can be specified relative to BASE_DIRECTORY
# - All setup functions: accept BASE_DIRECTORY, EXCLUDE list
# - Set lcov basedir with -b argument
# - Add automatic --demangle-cpp in lcovr, if 'c++filt' is available (can be
# overridden with NO_DEMANGLE option in setup_target_for_coverage_lcovr().)
# - Delete output dir, .info file on 'make clean'
# - Remove Python detection, since version mismatches will break gcovr
# - Minor cleanup (lowercase function names, update examples...)
#
# 2019-12-19, FeRD (Frank Dana)
# - Rename Lcov outputs, make filtered file canonical, fix cleanup for targets
#
# 2020-01-19, Bob Apthorpe
# - Added gfortran support
#
# 2020-02-17, FeRD (Frank Dana)
# - Make all add_custom_target()s VERBATIM to auto-escape wildcard characters
# in EXCLUDEs, and remove manual escaping from gcovr targets
#
# 2021-01-19, Robin Mueller
# - Add CODE_COVERAGE_VERBOSE option which will allow to print out commands which are run
# - Added the option for users to set the GCOVR_ADDITIONAL_ARGS variable to supply additional
# flags to the gcovr command
#
# 2020-05-04, Mihchael Davis
# - Add -fprofile-abs-path to make gcno files contain absolute paths
# - Fix BASE_DIRECTORY not working when defined
# - Change BYPRODUCT from folder to index.html to stop ninja from complaining about double defines
#
# 2021-05-10, Martin Stump
# - Check if the generator is multi-config before warning about non-Debug builds
#
# 2022-02-22, Marko Wehle
# - Change gcovr output from -o <filename> for --xml <filename> and --html <filename> output respectively.
# This will allow for Multiple Output Formats at the same time by making use of GCOVR_ADDITIONAL_ARGS, e.g. GCOVR_ADDITIONAL_ARGS "--txt".
#
# 2022-09-28, Sebastian Mueller
# - fix append_coverage_compiler_flags_to_target to correctly add flags
# - replace "-fprofile-arcs -ftest-coverage" with "--coverage" (equivalent)
#
# USAGE:
#
# 1. Copy this file into your cmake modules path.
#
# 2. Add the following line to your CMakeLists.txt (best inside an if-condition
# using a CMake option() to enable it just optionally):
# include(CodeCoverage)
#
# 3. Append necessary compiler flags for all supported source files:
# append_coverage_compiler_flags()
# Or for specific target:
# append_coverage_compiler_flags_to_target(YOUR_TARGET_NAME)
#
# 3.a (OPTIONAL) Set appropriate optimization flags, e.g. -O0, -O1 or -Og
#
# 4. If you need to exclude additional directories from the report, specify them
# using full paths in the COVERAGE_EXCLUDES variable before calling
# setup_target_for_coverage_*().
# Example:
# set(COVERAGE_EXCLUDES
# '${PROJECT_SOURCE_DIR}/src/dir1/*'
# '/path/to/my/src/dir2/*')
# Or, use the EXCLUDE argument to setup_target_for_coverage_*().
# Example:
# setup_target_for_coverage_lcov(
# NAME coverage
# EXECUTABLE testrunner
# EXCLUDE "${PROJECT_SOURCE_DIR}/src/dir1/*" "/path/to/my/src/dir2/*")
#
# 4.a NOTE: With CMake 3.4+, COVERAGE_EXCLUDES or EXCLUDE can also be set
# relative to the BASE_DIRECTORY (default: PROJECT_SOURCE_DIR)
# Example:
# set(COVERAGE_EXCLUDES "dir1/*")
# setup_target_for_coverage_gcovr_html(
# NAME coverage
# EXECUTABLE testrunner
# BASE_DIRECTORY "${PROJECT_SOURCE_DIR}/src"
# EXCLUDE "dir2/*")
#
# 5. Use the functions described below to create a custom make target which
# runs your test executable and produces a code coverage report.
#
# 6. Build a Debug build:
# cmake -DCMAKE_BUILD_TYPE=Debug ..
# make
# make my_coverage_target
#
include(CMakeParseArguments)
option(CODE_COVERAGE_VERBOSE "Verbose information" TRUE)
# Check prereqs
find_program( GCOV_PATH gcov )
find_program( LCOV_PATH NAMES lcov lcov.bat lcov.exe lcov.perl)
find_program( FASTCOV_PATH NAMES fastcov fastcov.py )
find_program( GENHTML_PATH NAMES genhtml genhtml.perl genhtml.bat )
find_program( GCOVR_PATH gcovr PATHS ${CMAKE_SOURCE_DIR}/scripts/test)
find_program( CPPFILT_PATH NAMES c++filt )
if(NOT GCOV_PATH)
message(FATAL_ERROR "gcov not found! Aborting...")
endif() # NOT GCOV_PATH
# Check supported compiler (Clang, GNU and Flang)
get_property(LANGUAGES GLOBAL PROPERTY ENABLED_LANGUAGES)
foreach(LANG ${LANGUAGES})
if("${CMAKE_${LANG}_COMPILER_ID}" MATCHES "(Apple)?[Cc]lang")
if("${CMAKE_${LANG}_COMPILER_VERSION}" VERSION_LESS 3)
message(FATAL_ERROR "Clang version must be 3.0.0 or greater! Aborting...")
endif()
elseif(NOT "${CMAKE_${LANG}_COMPILER_ID}" MATCHES "GNU"
AND NOT "${CMAKE_${LANG}_COMPILER_ID}" MATCHES "(LLVM)?[Ff]lang")
if ("${LANG}" MATCHES "CUDA")
message(STATUS "Ignoring CUDA")
else()
message(FATAL_ERROR "Compiler is not GNU or Flang! Aborting...")
endif()
endif()
endforeach()
set(COVERAGE_COMPILER_FLAGS "-g --coverage"
CACHE INTERNAL "")
if(CMAKE_CXX_COMPILER_ID MATCHES "(GNU|Clang)")
include(CheckCXXCompilerFlag)
check_cxx_compiler_flag(-fprofile-abs-path HAVE_fprofile_abs_path)
if(HAVE_fprofile_abs_path)
set(COVERAGE_COMPILER_FLAGS "${COVERAGE_COMPILER_FLAGS} -fprofile-abs-path")
endif()
endif()
set(CMAKE_Fortran_FLAGS_COVERAGE
${COVERAGE_COMPILER_FLAGS}
CACHE STRING "Flags used by the Fortran compiler during coverage builds."
FORCE )
set(CMAKE_CXX_FLAGS_COVERAGE
${COVERAGE_COMPILER_FLAGS}
CACHE STRING "Flags used by the C++ compiler during coverage builds."
FORCE )
set(CMAKE_C_FLAGS_COVERAGE
${COVERAGE_COMPILER_FLAGS}
CACHE STRING "Flags used by the C compiler during coverage builds."
FORCE )
set(CMAKE_EXE_LINKER_FLAGS_COVERAGE
""
CACHE STRING "Flags used for linking binaries during coverage builds."
FORCE )
set(CMAKE_SHARED_LINKER_FLAGS_COVERAGE
""
CACHE STRING "Flags used by the shared libraries linker during coverage builds."
FORCE )
mark_as_advanced(
CMAKE_Fortran_FLAGS_COVERAGE
CMAKE_CXX_FLAGS_COVERAGE
CMAKE_C_FLAGS_COVERAGE
CMAKE_EXE_LINKER_FLAGS_COVERAGE
CMAKE_SHARED_LINKER_FLAGS_COVERAGE )
get_property(GENERATOR_IS_MULTI_CONFIG GLOBAL PROPERTY GENERATOR_IS_MULTI_CONFIG)
if(NOT (CMAKE_BUILD_TYPE STREQUAL "Debug" OR GENERATOR_IS_MULTI_CONFIG))
message(WARNING "Code coverage results with an optimised (non-Debug) build may be misleading")
endif() # NOT (CMAKE_BUILD_TYPE STREQUAL "Debug" OR GENERATOR_IS_MULTI_CONFIG)
if(CMAKE_C_COMPILER_ID STREQUAL "GNU" OR CMAKE_Fortran_COMPILER_ID STREQUAL "GNU")
link_libraries(gcov)
endif()
# Defines a target for running and collecting code coverage information
# Builds dependencies, runs the given executable and outputs reports.
# NOTE! The executable should always have a ZERO as exit code otherwise
# the coverage generation will not complete.
#
# setup_target_for_coverage_lcov(
# NAME testrunner_coverage # New target name
# EXECUTABLE testrunner -j ${PROCESSOR_COUNT} # Executable in PROJECT_BINARY_DIR
# DEPENDENCIES testrunner # Dependencies to build first
# BASE_DIRECTORY "../" # Base directory for report
# # (defaults to PROJECT_SOURCE_DIR)
# EXCLUDE "src/dir1/*" "src/dir2/*" # Patterns to exclude (can be relative
# # to BASE_DIRECTORY, with CMake 3.4+)
# NO_DEMANGLE # Don't demangle C++ symbols
# # even if c++filt is found
# )
function(setup_target_for_coverage_lcov)
set(options NO_DEMANGLE SONARQUBE)
set(oneValueArgs BASE_DIRECTORY NAME)
set(multiValueArgs EXCLUDE EXECUTABLE EXECUTABLE_ARGS DEPENDENCIES LCOV_ARGS GENHTML_ARGS)
cmake_parse_arguments(Coverage "${options}" "${oneValueArgs}" "${multiValueArgs}" ${ARGN})
if(NOT LCOV_PATH)
message(FATAL_ERROR "lcov not found! Aborting...")
endif() # NOT LCOV_PATH
if(NOT GENHTML_PATH)
message(FATAL_ERROR "genhtml not found! Aborting...")
endif() # NOT GENHTML_PATH
# Set base directory (as absolute path), or default to PROJECT_SOURCE_DIR
if(DEFINED Coverage_BASE_DIRECTORY)
get_filename_component(BASEDIR ${Coverage_BASE_DIRECTORY} ABSOLUTE)
else()
set(BASEDIR ${PROJECT_SOURCE_DIR})
endif()
# Collect excludes (CMake 3.4+: Also compute absolute paths)
set(LCOV_EXCLUDES "")
foreach(EXCLUDE ${Coverage_EXCLUDE} ${COVERAGE_EXCLUDES} ${COVERAGE_LCOV_EXCLUDES})
if(CMAKE_VERSION VERSION_GREATER 3.4)
get_filename_component(EXCLUDE ${EXCLUDE} ABSOLUTE BASE_DIR ${BASEDIR})
endif()
list(APPEND LCOV_EXCLUDES "${EXCLUDE}")
endforeach()
list(REMOVE_DUPLICATES LCOV_EXCLUDES)
# Conditional arguments
if(CPPFILT_PATH AND NOT ${Coverage_NO_DEMANGLE})
set(GENHTML_EXTRA_ARGS "--demangle-cpp")
endif()
# Setting up commands which will be run to generate coverage data.
# Cleanup lcov
set(LCOV_CLEAN_CMD
${LCOV_PATH} ${Coverage_LCOV_ARGS} --gcov-tool ${GCOV_PATH} -directory .
-b ${BASEDIR} --zerocounters
)
# Create baseline to make sure untouched files show up in the report
set(LCOV_BASELINE_CMD
${LCOV_PATH} ${Coverage_LCOV_ARGS} --gcov-tool ${GCOV_PATH} -c -i -d . -b
${BASEDIR} -o ${Coverage_NAME}.base
)
# Run tests
set(LCOV_EXEC_TESTS_CMD
${Coverage_EXECUTABLE} ${Coverage_EXECUTABLE_ARGS}
)
# Capturing lcov counters and generating report
set(LCOV_CAPTURE_CMD
${LCOV_PATH} ${Coverage_LCOV_ARGS} --gcov-tool ${GCOV_PATH} --directory . -b
${BASEDIR} --capture --output-file ${Coverage_NAME}.capture
)
# add baseline counters
set(LCOV_BASELINE_COUNT_CMD
${LCOV_PATH} ${Coverage_LCOV_ARGS} --gcov-tool ${GCOV_PATH} -a ${Coverage_NAME}.base
-a ${Coverage_NAME}.capture --output-file ${Coverage_NAME}.total
)
# filter collected data to final coverage report
set(LCOV_FILTER_CMD
${LCOV_PATH} ${Coverage_LCOV_ARGS} --gcov-tool ${GCOV_PATH} --remove
${Coverage_NAME}.total ${LCOV_EXCLUDES} --output-file ${Coverage_NAME}.info
)
# Generate HTML output
set(LCOV_GEN_HTML_CMD
${GENHTML_PATH} ${GENHTML_EXTRA_ARGS} ${Coverage_GENHTML_ARGS} -o
${Coverage_NAME} ${Coverage_NAME}.info
)
if(${Coverage_SONARQUBE})
# Generate SonarQube output
set(GCOVR_XML_CMD
${GCOVR_PATH} --sonarqube ${Coverage_NAME}_sonarqube.xml -r ${BASEDIR} ${GCOVR_ADDITIONAL_ARGS}
${GCOVR_EXCLUDE_ARGS} --object-directory=${PROJECT_BINARY_DIR}
)
set(GCOVR_XML_CMD_COMMAND
COMMAND ${GCOVR_XML_CMD}
)
set(GCOVR_XML_CMD_BYPRODUCTS ${Coverage_NAME}_sonarqube.xml)
set(GCOVR_XML_CMD_COMMENT COMMENT "SonarQube code coverage info report saved in ${Coverage_NAME}_sonarqube.xml.")
endif()
if(CODE_COVERAGE_VERBOSE)
message(STATUS "Executed command report")
message(STATUS "Command to clean up lcov: ")
string(REPLACE ";" " " LCOV_CLEAN_CMD_SPACED "${LCOV_CLEAN_CMD}")
message(STATUS "${LCOV_CLEAN_CMD_SPACED}")
message(STATUS "Command to create baseline: ")
string(REPLACE ";" " " LCOV_BASELINE_CMD_SPACED "${LCOV_BASELINE_CMD}")
message(STATUS "${LCOV_BASELINE_CMD_SPACED}")
message(STATUS "Command to run the tests: ")
string(REPLACE ";" " " LCOV_EXEC_TESTS_CMD_SPACED "${LCOV_EXEC_TESTS_CMD}")
message(STATUS "${LCOV_EXEC_TESTS_CMD_SPACED}")
message(STATUS "Command to capture counters and generate report: ")
string(REPLACE ";" " " LCOV_CAPTURE_CMD_SPACED "${LCOV_CAPTURE_CMD}")
message(STATUS "${LCOV_CAPTURE_CMD_SPACED}")
message(STATUS "Command to add baseline counters: ")
string(REPLACE ";" " " LCOV_BASELINE_COUNT_CMD_SPACED "${LCOV_BASELINE_COUNT_CMD}")
message(STATUS "${LCOV_BASELINE_COUNT_CMD_SPACED}")
message(STATUS "Command to filter collected data: ")
string(REPLACE ";" " " LCOV_FILTER_CMD_SPACED "${LCOV_FILTER_CMD}")
message(STATUS "${LCOV_FILTER_CMD_SPACED}")
message(STATUS "Command to generate lcov HTML output: ")
string(REPLACE ";" " " LCOV_GEN_HTML_CMD_SPACED "${LCOV_GEN_HTML_CMD}")
message(STATUS "${LCOV_GEN_HTML_CMD_SPACED}")
if(${Coverage_SONARQUBE})
message(STATUS "Command to generate SonarQube XML output: ")
string(REPLACE ";" " " GCOVR_XML_CMD_SPACED "${GCOVR_XML_CMD}")
message(STATUS "${GCOVR_XML_CMD_SPACED}")
endif()
endif()
# Setup target
add_custom_target(${Coverage_NAME}
COMMAND ${LCOV_CLEAN_CMD}
COMMAND ${LCOV_BASELINE_CMD}
COMMAND ${LCOV_EXEC_TESTS_CMD}
COMMAND ${LCOV_CAPTURE_CMD}
COMMAND ${LCOV_BASELINE_COUNT_CMD}
COMMAND ${LCOV_FILTER_CMD}
COMMAND ${LCOV_GEN_HTML_CMD}
${GCOVR_XML_CMD_COMMAND}
# Set output files as GENERATED (will be removed on 'make clean')
BYPRODUCTS
${Coverage_NAME}.base
${Coverage_NAME}.capture
${Coverage_NAME}.total
${Coverage_NAME}.info
${GCOVR_XML_CMD_BYPRODUCTS}
${Coverage_NAME}/index.html
WORKING_DIRECTORY ${PROJECT_BINARY_DIR}
DEPENDS ${Coverage_DEPENDENCIES}
VERBATIM # Protect arguments to commands
COMMENT "Resetting code coverage counters to zero.\nProcessing code coverage counters and generating report."
)
# Show where to find the lcov info report
add_custom_command(TARGET ${Coverage_NAME} POST_BUILD
COMMAND ;
COMMENT "Lcov code coverage info report saved in ${Coverage_NAME}.info."
${GCOVR_XML_CMD_COMMENT}
)
# Show info where to find the report
add_custom_command(TARGET ${Coverage_NAME} POST_BUILD
COMMAND ;
COMMENT "Open ./${Coverage_NAME}/index.html in your browser to view the coverage report."
)
endfunction() # setup_target_for_coverage_lcov
# Defines a target for running and collecting code coverage information
# Builds dependencies, runs the given executable and outputs reports.
# NOTE! The executable should always have a ZERO as exit code otherwise
# the coverage generation will not complete.
#
# setup_target_for_coverage_gcovr_xml(
# NAME ctest_coverage # New target name
# EXECUTABLE ctest -j ${PROCESSOR_COUNT} # Executable in PROJECT_BINARY_DIR
# DEPENDENCIES executable_target # Dependencies to build first
# BASE_DIRECTORY "../" # Base directory for report
# # (defaults to PROJECT_SOURCE_DIR)
# EXCLUDE "src/dir1/*" "src/dir2/*" # Patterns to exclude (can be relative
# # to BASE_DIRECTORY, with CMake 3.4+)
# )
# The user can set the variable GCOVR_ADDITIONAL_ARGS to supply additional flags to the
# gcovr command.
function(setup_target_for_coverage_gcovr_xml)
set(options NONE)
set(oneValueArgs BASE_DIRECTORY NAME)
set(multiValueArgs EXCLUDE EXECUTABLE EXECUTABLE_ARGS DEPENDENCIES)
cmake_parse_arguments(Coverage "${options}" "${oneValueArgs}" "${multiValueArgs}" ${ARGN})
if(NOT GCOVR_PATH)
message(FATAL_ERROR "gcovr not found! Aborting...")
endif() # NOT GCOVR_PATH
# Set base directory (as absolute path), or default to PROJECT_SOURCE_DIR
if(DEFINED Coverage_BASE_DIRECTORY)
get_filename_component(BASEDIR ${Coverage_BASE_DIRECTORY} ABSOLUTE)
else()
set(BASEDIR ${PROJECT_SOURCE_DIR})
endif()
# Collect excludes (CMake 3.4+: Also compute absolute paths)
set(GCOVR_EXCLUDES "")
foreach(EXCLUDE ${Coverage_EXCLUDE} ${COVERAGE_EXCLUDES} ${COVERAGE_GCOVR_EXCLUDES})
if(CMAKE_VERSION VERSION_GREATER 3.4)
get_filename_component(EXCLUDE ${EXCLUDE} ABSOLUTE BASE_DIR ${BASEDIR})
endif()
list(APPEND GCOVR_EXCLUDES "${EXCLUDE}")
endforeach()
list(REMOVE_DUPLICATES GCOVR_EXCLUDES)
# Combine excludes to several -e arguments
set(GCOVR_EXCLUDE_ARGS "")
foreach(EXCLUDE ${GCOVR_EXCLUDES})
list(APPEND GCOVR_EXCLUDE_ARGS "-e")
list(APPEND GCOVR_EXCLUDE_ARGS "${EXCLUDE}")
endforeach()
# Set up commands which will be run to generate coverage data
# Run tests
set(GCOVR_XML_EXEC_TESTS_CMD
${Coverage_EXECUTABLE} ${Coverage_EXECUTABLE_ARGS}
)
# Running gcovr
set(GCOVR_XML_CMD
${GCOVR_PATH} --xml ${Coverage_NAME}.xml -r ${BASEDIR} ${GCOVR_ADDITIONAL_ARGS}
${GCOVR_EXCLUDE_ARGS} --object-directory=${PROJECT_BINARY_DIR}
)
if(CODE_COVERAGE_VERBOSE)
message(STATUS "Executed command report")
message(STATUS "Command to run tests: ")
string(REPLACE ";" " " GCOVR_XML_EXEC_TESTS_CMD_SPACED "${GCOVR_XML_EXEC_TESTS_CMD}")
message(STATUS "${GCOVR_XML_EXEC_TESTS_CMD_SPACED}")
message(STATUS "Command to generate gcovr XML coverage data: ")
string(REPLACE ";" " " GCOVR_XML_CMD_SPACED "${GCOVR_XML_CMD}")
message(STATUS "${GCOVR_XML_CMD_SPACED}")
endif()
add_custom_target(${Coverage_NAME}
COMMAND ${GCOVR_XML_EXEC_TESTS_CMD}
COMMAND ${GCOVR_XML_CMD}
BYPRODUCTS ${Coverage_NAME}.xml
WORKING_DIRECTORY ${PROJECT_BINARY_DIR}
DEPENDS ${Coverage_DEPENDENCIES}
VERBATIM # Protect arguments to commands
COMMENT "Running gcovr to produce Cobertura code coverage report."
)
# Show info where to find the report
add_custom_command(TARGET ${Coverage_NAME} POST_BUILD
COMMAND ;
COMMENT "Cobertura code coverage report saved in ${Coverage_NAME}.xml."
)
endfunction() # setup_target_for_coverage_gcovr_xml
# Defines a target for running and collecting code coverage information
# Builds dependencies, runs the given executable and outputs reports.
# NOTE! The executable should always have a ZERO as exit code otherwise
# the coverage generation will not complete.
#
# setup_target_for_coverage_gcovr_html(
# NAME ctest_coverage # New target name
# EXECUTABLE ctest -j ${PROCESSOR_COUNT} # Executable in PROJECT_BINARY_DIR
# DEPENDENCIES executable_target # Dependencies to build first
# BASE_DIRECTORY "../" # Base directory for report
# # (defaults to PROJECT_SOURCE_DIR)
# EXCLUDE "src/dir1/*" "src/dir2/*" # Patterns to exclude (can be relative
# # to BASE_DIRECTORY, with CMake 3.4+)
# )
# The user can set the variable GCOVR_ADDITIONAL_ARGS to supply additional flags to the
# gcovr command.
function(setup_target_for_coverage_gcovr_html)
set(options NONE)
set(oneValueArgs BASE_DIRECTORY NAME)
set(multiValueArgs EXCLUDE EXECUTABLE EXECUTABLE_ARGS DEPENDENCIES)
cmake_parse_arguments(Coverage "${options}" "${oneValueArgs}" "${multiValueArgs}" ${ARGN})
if(NOT GCOVR_PATH)
message(FATAL_ERROR "gcovr not found! Aborting...")
endif() # NOT GCOVR_PATH
# Set base directory (as absolute path), or default to PROJECT_SOURCE_DIR
if(DEFINED Coverage_BASE_DIRECTORY)
get_filename_component(BASEDIR ${Coverage_BASE_DIRECTORY} ABSOLUTE)
else()
set(BASEDIR ${PROJECT_SOURCE_DIR})
endif()
# Collect excludes (CMake 3.4+: Also compute absolute paths)
set(GCOVR_EXCLUDES "")
foreach(EXCLUDE ${Coverage_EXCLUDE} ${COVERAGE_EXCLUDES} ${COVERAGE_GCOVR_EXCLUDES})
if(CMAKE_VERSION VERSION_GREATER 3.4)
get_filename_component(EXCLUDE ${EXCLUDE} ABSOLUTE BASE_DIR ${BASEDIR})
endif()
list(APPEND GCOVR_EXCLUDES "${EXCLUDE}")
endforeach()
list(REMOVE_DUPLICATES GCOVR_EXCLUDES)
# Combine excludes to several -e arguments
set(GCOVR_EXCLUDE_ARGS "")
foreach(EXCLUDE ${GCOVR_EXCLUDES})
list(APPEND GCOVR_EXCLUDE_ARGS "-e")
list(APPEND GCOVR_EXCLUDE_ARGS "${EXCLUDE}")
endforeach()
# Set up commands which will be run to generate coverage data
# Run tests
set(GCOVR_HTML_EXEC_TESTS_CMD
${Coverage_EXECUTABLE} ${Coverage_EXECUTABLE_ARGS}
)
# Create folder
set(GCOVR_HTML_FOLDER_CMD
${CMAKE_COMMAND} -E make_directory ${PROJECT_BINARY_DIR}/${Coverage_NAME}
)
# Running gcovr
set(GCOVR_HTML_CMD
${GCOVR_PATH} --html ${Coverage_NAME}/index.html --html-details -r ${BASEDIR} ${GCOVR_ADDITIONAL_ARGS}
${GCOVR_EXCLUDE_ARGS} --object-directory=${PROJECT_BINARY_DIR}
)
if(CODE_COVERAGE_VERBOSE)
message(STATUS "Executed command report")
message(STATUS "Command to run tests: ")
string(REPLACE ";" " " GCOVR_HTML_EXEC_TESTS_CMD_SPACED "${GCOVR_HTML_EXEC_TESTS_CMD}")
message(STATUS "${GCOVR_HTML_EXEC_TESTS_CMD_SPACED}")
message(STATUS "Command to create a folder: ")
string(REPLACE ";" " " GCOVR_HTML_FOLDER_CMD_SPACED "${GCOVR_HTML_FOLDER_CMD}")
message(STATUS "${GCOVR_HTML_FOLDER_CMD_SPACED}")
message(STATUS "Command to generate gcovr HTML coverage data: ")
string(REPLACE ";" " " GCOVR_HTML_CMD_SPACED "${GCOVR_HTML_CMD}")
message(STATUS "${GCOVR_HTML_CMD_SPACED}")
endif()
add_custom_target(${Coverage_NAME}
COMMAND ${GCOVR_HTML_EXEC_TESTS_CMD}
COMMAND ${GCOVR_HTML_FOLDER_CMD}
COMMAND ${GCOVR_HTML_CMD}
BYPRODUCTS ${PROJECT_BINARY_DIR}/${Coverage_NAME}/index.html # report directory
WORKING_DIRECTORY ${PROJECT_BINARY_DIR}
DEPENDS ${Coverage_DEPENDENCIES}
VERBATIM # Protect arguments to commands
COMMENT "Running gcovr to produce HTML code coverage report."
)
# Show info where to find the report
add_custom_command(TARGET ${Coverage_NAME} POST_BUILD
COMMAND ;
COMMENT "Open ./${Coverage_NAME}/index.html in your browser to view the coverage report."
)
endfunction() # setup_target_for_coverage_gcovr_html
# Defines a target for running and collecting code coverage information
# Builds dependencies, runs the given executable and outputs reports.
# NOTE! The executable should always have a ZERO as exit code otherwise
# the coverage generation will not complete.
#
# setup_target_for_coverage_fastcov(
# NAME testrunner_coverage # New target name
# EXECUTABLE testrunner -j ${PROCESSOR_COUNT} # Executable in PROJECT_BINARY_DIR
# DEPENDENCIES testrunner # Dependencies to build first
# BASE_DIRECTORY "../" # Base directory for report
# # (defaults to PROJECT_SOURCE_DIR)
# EXCLUDE "src/dir1/" "src/dir2/" # Patterns to exclude.
# NO_DEMANGLE # Don't demangle C++ symbols
# # even if c++filt is found
# SKIP_HTML # Don't create html report
# POST_CMD perl -i -pe s!${PROJECT_SOURCE_DIR}/!!g ctest_coverage.json # E.g. for stripping source dir from file paths
# )
function(setup_target_for_coverage_fastcov)
set(options NO_DEMANGLE SKIP_HTML)
set(oneValueArgs BASE_DIRECTORY NAME)
set(multiValueArgs EXCLUDE EXECUTABLE EXECUTABLE_ARGS DEPENDENCIES FASTCOV_ARGS GENHTML_ARGS POST_CMD)
cmake_parse_arguments(Coverage "${options}" "${oneValueArgs}" "${multiValueArgs}" ${ARGN})
if(NOT FASTCOV_PATH)
message(FATAL_ERROR "fastcov not found! Aborting...")
endif()
if(NOT Coverage_SKIP_HTML AND NOT GENHTML_PATH)
message(FATAL_ERROR "genhtml not found! Aborting...")
endif()
# Set base directory (as absolute path), or default to PROJECT_SOURCE_DIR
if(Coverage_BASE_DIRECTORY)
get_filename_component(BASEDIR ${Coverage_BASE_DIRECTORY} ABSOLUTE)
else()
set(BASEDIR ${PROJECT_SOURCE_DIR})
endif()
# Collect excludes (Patterns, not paths, for fastcov)
set(FASTCOV_EXCLUDES "")
foreach(EXCLUDE ${Coverage_EXCLUDE} ${COVERAGE_EXCLUDES} ${COVERAGE_FASTCOV_EXCLUDES})
list(APPEND FASTCOV_EXCLUDES "${EXCLUDE}")
endforeach()
list(REMOVE_DUPLICATES FASTCOV_EXCLUDES)
# Conditional arguments
if(CPPFILT_PATH AND NOT ${Coverage_NO_DEMANGLE})
set(GENHTML_EXTRA_ARGS "--demangle-cpp")
endif()
# Set up commands which will be run to generate coverage data
set(FASTCOV_EXEC_TESTS_CMD ${Coverage_EXECUTABLE} ${Coverage_EXECUTABLE_ARGS})
set(FASTCOV_CAPTURE_CMD ${FASTCOV_PATH} ${Coverage_FASTCOV_ARGS} --gcov ${GCOV_PATH}
--search-directory ${BASEDIR}
--process-gcno
--output ${Coverage_NAME}.json
--exclude ${FASTCOV_EXCLUDES}
)
set(FASTCOV_CONVERT_CMD ${FASTCOV_PATH}
-C ${Coverage_NAME}.json --lcov --output ${Coverage_NAME}.info
)
if(Coverage_SKIP_HTML)
set(FASTCOV_HTML_CMD ";")
else()
set(FASTCOV_HTML_CMD ${GENHTML_PATH} ${GENHTML_EXTRA_ARGS} ${Coverage_GENHTML_ARGS}
-o ${Coverage_NAME} ${Coverage_NAME}.info
)
endif()
set(FASTCOV_POST_CMD ";")
if(Coverage_POST_CMD)
set(FASTCOV_POST_CMD ${Coverage_POST_CMD})
endif()
if(CODE_COVERAGE_VERBOSE)
message(STATUS "Code coverage commands for target ${Coverage_NAME} (fastcov):")
message(" Running tests:")
string(REPLACE ";" " " FASTCOV_EXEC_TESTS_CMD_SPACED "${FASTCOV_EXEC_TESTS_CMD}")
message(" ${FASTCOV_EXEC_TESTS_CMD_SPACED}")
message(" Capturing fastcov counters and generating report:")
string(REPLACE ";" " " FASTCOV_CAPTURE_CMD_SPACED "${FASTCOV_CAPTURE_CMD}")
message(" ${FASTCOV_CAPTURE_CMD_SPACED}")
message(" Converting fastcov .json to lcov .info:")
string(REPLACE ";" " " FASTCOV_CONVERT_CMD_SPACED "${FASTCOV_CONVERT_CMD}")
message(" ${FASTCOV_CONVERT_CMD_SPACED}")
if(NOT Coverage_SKIP_HTML)
message(" Generating HTML report: ")
string(REPLACE ";" " " FASTCOV_HTML_CMD_SPACED "${FASTCOV_HTML_CMD}")
message(" ${FASTCOV_HTML_CMD_SPACED}")
endif()
if(Coverage_POST_CMD)
message(" Running post command: ")
string(REPLACE ";" " " FASTCOV_POST_CMD_SPACED "${FASTCOV_POST_CMD}")
message(" ${FASTCOV_POST_CMD_SPACED}")
endif()
endif()
# Setup target
add_custom_target(${Coverage_NAME}
# Cleanup fastcov
COMMAND ${FASTCOV_PATH} ${Coverage_FASTCOV_ARGS} --gcov ${GCOV_PATH}
--search-directory ${BASEDIR}
--zerocounters
COMMAND ${FASTCOV_EXEC_TESTS_CMD}
COMMAND ${FASTCOV_CAPTURE_CMD}
COMMAND ${FASTCOV_CONVERT_CMD}
COMMAND ${FASTCOV_HTML_CMD}
COMMAND ${FASTCOV_POST_CMD}
# Set output files as GENERATED (will be removed on 'make clean')
BYPRODUCTS
${Coverage_NAME}.info
${Coverage_NAME}.json
${Coverage_NAME}/index.html # report directory
WORKING_DIRECTORY ${PROJECT_BINARY_DIR}
DEPENDS ${Coverage_DEPENDENCIES}
VERBATIM # Protect arguments to commands
COMMENT "Resetting code coverage counters to zero. Processing code coverage counters and generating report."
)
set(INFO_MSG "fastcov code coverage info report saved in ${Coverage_NAME}.info and ${Coverage_NAME}.json.")
if(NOT Coverage_SKIP_HTML)
string(APPEND INFO_MSG " Open ${PROJECT_BINARY_DIR}/${Coverage_NAME}/index.html in your browser to view the coverage report.")
endif()
# Show where to find the fastcov info report
add_custom_command(TARGET ${Coverage_NAME} POST_BUILD
COMMAND ${CMAKE_COMMAND} -E echo ${INFO_MSG}
)
endfunction() # setup_target_for_coverage_fastcov
function(append_coverage_compiler_flags)
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${COVERAGE_COMPILER_FLAGS}" PARENT_SCOPE)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${COVERAGE_COMPILER_FLAGS}" PARENT_SCOPE)
set(CMAKE_Fortran_FLAGS "${CMAKE_Fortran_FLAGS} ${COVERAGE_COMPILER_FLAGS}" PARENT_SCOPE)
message(STATUS "Appending code coverage compiler flags: ${COVERAGE_COMPILER_FLAGS}")
endfunction() # append_coverage_compiler_flags
# Setup coverage for specific library
function(append_coverage_compiler_flags_to_target name)
separate_arguments(_flag_list NATIVE_COMMAND "${COVERAGE_COMPILER_FLAGS}")
target_compile_options(${name} PRIVATE ${_flag_list})
if(CMAKE_C_COMPILER_ID STREQUAL "GNU" OR CMAKE_Fortran_COMPILER_ID STREQUAL "GNU")
target_link_libraries(${name} PRIVATE gcov)
endif()
endfunction()


@@ -1,22 +0,0 @@
if(ENABLE_CLANG_TIDY)
find_program(CLANG_TIDY_COMMAND NAMES clang-tidy)
if(NOT CLANG_TIDY_COMMAND)
message(WARNING "🔴 CMake_RUN_CLANG_TIDY is ON but clang-tidy is not found!")
set(CMAKE_CXX_CLANG_TIDY "" CACHE STRING "" FORCE)
else()
message(STATUS "🟢 CMake_RUN_CLANG_TIDY is ON")
set(CLANGTIDY_EXTRA_ARGS
"-extra-arg=-Wno-unknown-warning-option"
)
set(CMAKE_CXX_CLANG_TIDY "${CLANG_TIDY_COMMAND};-p=${CMAKE_BINARY_DIR};${CLANGTIDY_EXTRA_ARGS}" CACHE STRING "" FORCE)
add_custom_target(clang-tidy
COMMAND ${CMAKE_COMMAND} --build ${CMAKE_BINARY_DIR} --target ${CMAKE_PROJECT_NAME}
COMMAND ${CMAKE_COMMAND} --build ${CMAKE_BINARY_DIR} --target clang-tidy
COMMENT "Running clang-tidy..."
)
set(CMAKE_EXPORT_COMPILE_COMMANDS ON)
endif()
endif(ENABLE_CLANG_TIDY)

conandata.yml

@@ -0,0 +1,10 @@
sources:
"1.1.2":
url: "https://github.com/rmontanana/BayesNet/archive/v1.1.2.tar.gz"
sha256: "placeholder_sha256" # Replace with actual SHA256 when releasing
"1.0.7":
url: "https://github.com/rmontanana/BayesNet/archive/v1.0.7.tar.gz"
sha256: "placeholder_sha256" # Replace with actual SHA256 when releasing
patches:
# Add patches here if needed for specific versions
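The `sha256` entries above are placeholders that must be replaced with real digests at release time. One way to compute the digest of a downloaded release tarball is Python's standard `hashlib` module. The sketch below assumes the archive already sits at a local path. The helper name `file_sha256` is hypothetical.

```python
import hashlib
from pathlib import Path


def file_sha256(path, chunk_size=1 << 16):
    """Return the hex SHA-256 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"" (EOF)
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For example, `file_sha256("v1.1.2.tar.gz")` on a locally downloaded copy of the archive would give the value for the `"1.1.2"` entry. Reading in chunks keeps memory use constant regardless of the archive size.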

conanfile.py

@@ -0,0 +1,108 @@
import os, re, pathlib
from conan import ConanFile
from conan.tools.cmake import CMakeToolchain, CMake, cmake_layout, CMakeDeps
from conan.tools.files import copy


class BayesNetConan(ConanFile):
    name = "bayesnet"
    settings = "os", "compiler", "build_type", "arch"
    options = {
        "shared": [True, False],
        "fPIC": [True, False],
        "enable_testing": [True, False],
        "enable_coverage": [True, False],
    }
    default_options = {
        "shared": False,
        "fPIC": True,
        "enable_testing": False,
        "enable_coverage": False,
    }
    # Sources are located in the same place as this recipe; copy them to the recipe
    exports_sources = (
        "CMakeLists.txt",
        "bayesnet/*",
        "config/*",
        "cmake/*",
        "docs/*",
        "tests/*",
        "bayesnetConfig.cmake.in",
    )

    def set_version(self) -> None:
        cmake = pathlib.Path(self.recipe_folder) / "CMakeLists.txt"
        text = cmake.read_text(encoding="utf-8")
        # Take the version from: project(foo VERSION 1.2.3)
        match = re.search(
            r"""project\s*\([^\)]*VERSION\s+([0-9]+\.[0-9]+\.[0-9]+)""",
            text,
            re.IGNORECASE | re.VERBOSE,
        )
        if not match:
            raise Exception("Version not found in CMakeLists.txt")
        self.version = match.group(1)

    def config_options(self):
        if self.settings.os == "Windows":
            del self.options.fPIC

    def configure(self):
        if self.options.shared:
            self.options.rm_safe("fPIC")

    def requirements(self):
        # Core dependencies
        self.requires("libtorch/2.7.1")
        self.requires("nlohmann_json/3.11.3")
        self.requires("folding/1.1.2")  # Custom package
        self.requires("fimdlp/2.1.1")  # Custom package

    def build_requirements(self):
        self.tool_requires("cmake/[>=3.27]")
        self.test_requires("arff-files/1.2.1")  # Custom package
        self.test_requires("catch2/3.8.1")

    def layout(self):
        cmake_layout(self)

    def generate(self):
        deps = CMakeDeps(self)
        deps.generate()
        tc = CMakeToolchain(self)
        tc.variables["ENABLE_TESTING"] = self.options.enable_testing
        tc.variables["CODE_COVERAGE"] = self.options.enable_coverage
        tc.generate()

    def build(self):
        cmake = CMake(self)
        cmake.configure()
        cmake.build()
        if self.options.enable_testing:
            # Run tests only if we're building with testing enabled
            self.run("ctest --output-on-failure", cwd=self.build_folder)

    def package(self):
        copy(
            self,
            "LICENSE",
            src=self.source_folder,
            dst=os.path.join(self.package_folder, "licenses"),
        )
        cmake = CMake(self)
        cmake.install()

    def package_info(self):
        self.cpp_info.libs = ["bayesnet"]
        self.cpp_info.includedirs = ["include"]
        self.cpp_info.set_property("cmake_find_mode", "both")
        self.cpp_info.set_property("cmake_target_name", "bayesnet::bayesnet")
        # Link system libraries that might be needed
        if self.settings.os == "Linux":
            self.cpp_info.system_libs = ["pthread"]

@@ -3,12 +3,8 @@
#include <string>
#include <string_view>
#define PROJECT_VERSION_MAJOR @PROJECT_VERSION_MAJOR@
#define PROJECT_VERSION_MINOR @PROJECT_VERSION_MINOR@
#define PROJECT_VERSION_PATCH @PROJECT_VERSION_PATCH@
static constexpr std::string_view project_name = "@PROJECT_NAME@";
static constexpr std::string_view project_version = "@PROJECT_VERSION@";
static constexpr std::string_view project_description = "@PROJECT_DESCRIPTION@";
static constexpr std::string_view git_sha = "@GIT_SHA@";
static constexpr std::string_view data_path = "@BayesNet_SOURCE_DIR@/tests/data/";
static constexpr std::string_view data_path = "@bayesnet_SOURCE_DIR@/tests/data/";

@@ -0,0 +1,235 @@
# Local Discretization Analysis - BayesNet Library
## Overview
This document analyzes the local discretization implementation in the BayesNet library, specifically focusing on the `Proposal.cc` implementation, and evaluates the feasibility of implementing an iterative discretization approach.
## Current Local Discretization Implementation
### Core Architecture
The local discretization functionality is implemented through a **Proposal class** (`bayesnet/classifiers/Proposal.h`) that serves as a mixin/base class for creating "Ld" (Local Discretization) variants of existing classifiers.
### Key Components
#### 1. The Proposal Class
- **Purpose**: Handles continuous data by applying local discretization using discretization algorithms
- **Dependencies**: Uses the `fimdlp` library for discretization algorithms
- **Supported Algorithms**:
- **MDLP** (Minimum Description Length Principle) - Default
- **BINQ** - Quantile-based binning
- **BINU** - Uniform binning
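The sketch below makes the difference between the two binning strategies concrete. It computes equal-width (BINU-style) and equal-frequency (BINQ-style) cut points for a single feature.

This is not the `fimdlp` implementation:
- The helper names `uniformCuts`, `quantileCuts` and `binOf` are hypothetical.
- Linear interpolation between sorted samples is an assumed quantile rule.

Uniform cuts depend only on the feature's range, so one outlier can stretch them. Quantile cuts follow how the samples are distributed.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Equal-width (uniform) cut points: n_bins - 1 interior cuts between min and max.
std::vector<double> uniformCuts(const std::vector<double>& x, int n_bins)
{
    if (x.empty() || n_bins < 1) throw std::invalid_argument("empty data or n_bins < 1");
    auto [lo, hi] = std::minmax_element(x.begin(), x.end());
    double width = (*hi - *lo) / n_bins;
    std::vector<double> cuts;
    for (int k = 1; k < n_bins; ++k) cuts.push_back(*lo + k * width);
    return cuts;
}

// Equal-frequency (quantile) cut points, interpolating linearly between
// sorted samples; repeated values can produce equal cuts, which are dropped.
std::vector<double> quantileCuts(std::vector<double> x, int n_bins)
{
    if (x.empty() || n_bins < 1) throw std::invalid_argument("empty data or n_bins < 1");
    std::sort(x.begin(), x.end());
    std::vector<double> cuts;
    for (int k = 1; k < n_bins; ++k) {
        double pos = static_cast<double>(k) / n_bins * static_cast<double>(x.size() - 1);
        std::size_t i = static_cast<std::size_t>(pos);
        double frac = pos - static_cast<double>(i);
        double q = (i + 1 < x.size()) ? x[i] + frac * (x[i + 1] - x[i]) : x[i];
        if (cuts.empty() || q > cuts.back()) cuts.push_back(q);
    }
    return cuts;
}

// Bin index = number of cuts strictly below the value
// (a value equal to a cut falls in the lower bin).
int binOf(double value, const std::vector<double>& cuts)
{
    return static_cast<int>(std::lower_bound(cuts.begin(), cuts.end(), value) - cuts.begin());
}
```

Take the skewed sample `{0, 0, 0, 0, 0, 0, 0, 0, 1, 100}` with two bins:
- `uniformCuts` returns `{50}`, pulled up by the outlier.
- `quantileCuts` returns `{0}`.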
#### 2. Local Discretization Variants
The codebase implements Ld variants using multiple inheritance:
**Individual Classifiers:**
- `TANLd` - Tree Augmented Naive Bayes with Local Discretization
- `KDBLd` - K-Dependence Bayesian with Local Discretization
- `SPODELd` - Super-Parent One-Dependence Estimator with Local Discretization
**Ensemble Classifiers:**
- `AODELd` - Averaged One-Dependence Estimator with Local Discretization
### Implementation Pattern
All Ld variants follow a consistent pattern using **multiple inheritance**:
```cpp
class TANLd : public TAN, public Proposal {
// Inherits from both the base classifier and Proposal
};
```
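Here is a deliberately tiny, self-contained sketch of how the pattern composes behaviour. The classes `Tan`, `LocalDiscretizer` and `TanLd`, the method `fitDiscrete`, and the fixed threshold are all hypothetical stand-ins, not BayesNet's real types or signatures.

The only point it illustrates is the call order an Ld variant can follow:
1. Discretize the continuous input in the mixin.
2. Pass the discrete data to the unchanged base classifier.

```cpp
#include <vector>

using Matrix = std::vector<std::vector<double>>;      // continuous data, one row per feature
using DiscreteMatrix = std::vector<std::vector<int>>; // discrete data, same layout

// Hypothetical base classifier that only understands discrete data.
class Tan {
public:
    virtual ~Tan() = default;
    void fitDiscrete(const DiscreteMatrix& Xd) { trainedOn = Xd; }
    DiscreteMatrix trainedOn; // public only so the example can inspect it
};

// Hypothetical mixin playing the role of Proposal: owns the discretization step.
class LocalDiscretizer {
protected:
    // Toy rule for illustration only: a single fixed cut at `cut`.
    DiscreteMatrix discretize(const Matrix& X, double cut) const
    {
        DiscreteMatrix Xd;
        for (const auto& feature : X) {
            std::vector<int> row;
            for (double v : feature) row.push_back(v < cut ? 0 : 1);
            Xd.push_back(row);
        }
        return Xd;
    }
};

// The Ld variant inherits from both and wires them together.
class TanLd : public Tan, public LocalDiscretizer {
public:
    void fit(const Matrix& X) { fitDiscrete(discretize(X, 0.5)); }
};
```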
### Two-Phase Discretization Process
#### Phase 1: Initial Discretization (`fit_local_discretization`)
- Each continuous feature is discretized independently using the chosen algorithm
- Creates initial discrete dataset
- Uses only class labels for discretization decisions
#### Phase 2: Network-Aware Refinement (`localDiscretizationProposal`)
- After building the initial Bayesian network structure
- Features are re-discretized considering their parent nodes in the network
- Uses topological ordering to ensure proper dependency handling
- Creates more informed discretization boundaries based on network relationships
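Processing features in topological order guarantees that a node's parents are discretized before the node itself. The sketch below uses Kahn's algorithm over a hypothetical `parents` map from node name to parent names.

BayesNet exposes its own `topological_order()`, so this only illustrates the ordering property. It is not the library's code.

```cpp
#include <map>
#include <queue>
#include <stdexcept>
#include <string>
#include <vector>

// Kahn's algorithm: returns nodes so that each node appears after all its parents.
// `parents` maps every node to the list of its parent nodes (hypothetical input format).
std::vector<std::string> topologicalOrder(const std::map<std::string, std::vector<std::string>>& parents)
{
    std::map<std::string, int> inDegree;
    std::map<std::string, std::vector<std::string>> children;
    for (const auto& [node, ps] : parents) {
        inDegree[node] += 0; // make sure every node is registered
        for (const auto& p : ps) {
            inDegree[node]++;
            inDegree[p] += 0;
            children[p].push_back(node);
        }
    }
    std::queue<std::string> ready; // nodes whose parents are all processed
    for (const auto& [node, deg] : inDegree)
        if (deg == 0) ready.push(node);
    std::vector<std::string> order;
    while (!ready.empty()) {
        auto node = ready.front();
        ready.pop();
        order.push_back(node);
        for (const auto& child : children[node])
            if (--inDegree[child] == 0) ready.push(child);
    }
    if (order.size() != inDegree.size()) throw std::runtime_error("graph has a cycle");
    return order;
}
```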
### Hyperparameter Support
The Proposal class supports several configurable hyperparameters:
- `ld_algorithm`: Choice of discretization algorithm (MDLP, BINQ, BINU)
- `ld_proposed_cuts`: Number of proposed cuts for discretization
- `mdlp_min_length`: Minimum interval length for MDLP
- `mdlp_max_depth`: Maximum depth for MDLP tree
## Current Implementation Strengths
1. **Sophisticated Approach**: Considers network structure in discretization decisions
2. **Modular Design**: Clean separation through Proposal class mixin
3. **Multiple Algorithm Support**: Flexible discretization strategies
4. **Proper Dependency Handling**: Topological ordering ensures correct processing
5. **Well-Integrated**: Seamless integration with existing classifier architecture
## Areas for Improvement
### Code Quality Issues
1. **Dead Code**: Line 161 in `Proposal.cc` contains unused variable `allDigits`
2. **Performance Issues**:
- String concatenation in tight loop (lines 82-84) using `+=` operator
- Memory allocations could be optimized
- Tensor operations could be batched better
3. **Error Handling**: Could be more robust with better exception handling
### Algorithm Clarity
1. **Logic Clarity**: The `upgrade` flag logic could be more descriptive
2. **Variable Naming**: Some variables need more descriptive names
3. **Documentation**: Better inline documentation of the two-phase process
4. **Method Complexity**: `localDiscretizationProposal` method is quite long and complex
### Suggested Code Improvements
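```cpp
// Instead of string concatenation in loop:
for (auto idx : indices) {
    for (int i = 0; i < Xf.size(1); ++i) {
        yJoinParents[i] += to_string(pDataset.index({ idx, i }).item<int>());
    }
}
// Consider building each sample's key in a single pass with a reused buffer.
// Samples go in the outer loop so that every parent's value is appended to the
// same key (resetting a buffer per parent, as in a naive stringstream version,
// would keep only the last parent's value):
std::string key;
for (int i = 0; i < Xf.size(1); ++i) {
    key.clear();
    for (auto idx : indices) {
        key += std::to_string(pDataset.index({ idx, i }).item<int>());
    }
    yJoinParents[i] = key;
}
// The per-element pDataset.index(...).item<int>() call is itself costly; reading
// through pDataset.accessor<int, 2>() (assuming a CPU int32 tensor) avoids it.
```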
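The loops above concatenate values without a separator, so different parent configurations can collide. For example, parent values `1, 12` and `11, 2` both yield `"112"`. A minimal sketch of a collision-free key builder follows.

Assumptions in this sketch:
- The discrete data is available as a plain `std::vector<std::vector<int>>` with one row per feature. This layout is hypothetical; the library keeps the data in a `torch::Tensor`.
- `joinParentKeys` is an illustrative name, not a library function.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Build one key per sample from the values of the selected (parent) features.
// data[f][s] is the discrete value of feature f for sample s (hypothetical layout).
// A ',' separator keeps keys unambiguous when values have more than one digit.
std::vector<std::string> joinParentKeys(const std::vector<std::vector<int>>& data,
                                        const std::vector<std::size_t>& parentIdx,
                                        std::size_t nSamples)
{
    std::vector<std::string> keys(nSamples);
    for (std::size_t s = 0; s < nSamples; ++s) {
        std::string key;
        for (std::size_t p : parentIdx) {
            if (!key.empty()) key += ',';
            key += std::to_string(data[p][s]);
        }
        keys[s] = std::move(key);
    }
    return keys;
}
```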
## Iterative Discretization Proposal
### Concept
Implement an iterative process: discretize → build model → re-discretize → rebuild model → repeat until convergence.
### Feasibility Assessment
**Highly Feasible** - The current implementation already provides a solid foundation with its two-phase approach, making extension straightforward.
### Proposed Implementation Strategy
```cpp
class IterativeProposal : public Proposal {
public:
struct ConvergenceParams {
int max_iterations = 10;
double tolerance = 1e-6;
bool check_network_structure = true;
bool check_discretization_stability = true;
};
ConvergenceParams convergenceParams; // stopping criteria used by the loop below
private:
map<string, vector<int>> iterativeLocalDiscretization(const torch::Tensor& y) {
auto states = fit_local_discretization(y); // Initial discretization
Network previousModel, currentModel;
int iteration = 0;
do {
previousModel = currentModel;
// Build model with current discretization
const torch::Tensor weights = torch::full({ pDataset.size(1) }, 1.0 / pDataset.size(1), torch::kDouble);
currentModel.fit(pDataset, weights, pFeatures, pClassName, states, Smoothing_t::ORIGINAL);
// Apply local discretization based on current model
auto newStates = localDiscretizationProposal(states, currentModel);
// Check for convergence
if (hasConverged(previousModel, currentModel, states, newStates)) {
break;
}
states = newStates;
iteration++;
} while (iteration < convergenceParams.max_iterations);
return states;
}
bool hasConverged(const Network& prev, const Network& curr,
const map<string, vector<int>>& oldStates,
const map<string, vector<int>>& newStates) {
// Implementation of convergence criteria
return checkNetworkStructureConvergence(prev, curr) &&
checkDiscretizationStability(oldStates, newStates);
}
};
```
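The loop structure of `iterativeLocalDiscretization` can be exercised without BayesNet at all. The sketch below factors it into a generic, hypothetical `iterateToFixedPoint` helper. It applies a step function until the state stops changing or `max_iterations` is reached, and reports which of the two ended the run.

`State` stands in for the proposal's discretization map. This illustrates the control flow only; it is not the planned implementation.

```cpp
#include <utility>

// Result of an iterative refinement run (hypothetical helper type).
template <typename State>
struct IterationResult {
    State state;
    int iterations;  // number of step() applications performed
    bool converged;  // true if a fixed point was reached within max_iterations
};

// Repeatedly apply `step` until it returns a state equal to its input
// (a fixed point) or `max_iterations` steps have been taken.
template <typename State, typename Step>
IterationResult<State> iterateToFixedPoint(State state, Step step, int max_iterations)
{
    for (int it = 1; it <= max_iterations; ++it) {
        State next = step(state);
        if (next == state) return { state, it, true };
        state = std::move(next);
    }
    return { state, max_iterations, false };
}
```

A step that oscillates between two states, such as `v -> 1 - v`, never converges. It stops at `max_iterations` with `converged == false`, which is why the iteration cap matters.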
### Convergence Criteria Options
1. **Network Structure Comparison**: Compare edge sets between iterations
```cpp
bool checkNetworkStructureConvergence(const Network& prev, const Network& curr) {
// Compare adjacency matrices or edge lists
return prev.getEdges() == curr.getEdges();
}
```
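2. **Discretization Stability**: Check whether the discretization changed between iterations. In the sample code, `states` maps each feature to its state indices `0..k-1`. Comparing `states` therefore only detects a change in the number of intervals; detecting shifted cut points requires comparing the cut points themselves.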
```cpp
bool checkDiscretizationStability(const map<string, vector<int>>& oldStates,
const map<string, vector<int>>& newStates) {
for (const auto& [feature, states] : oldStates) {
if (states != newStates.at(feature)) {
return false;
}
}
return true;
}
```
3. **Performance Metrics**: Monitor accuracy/likelihood convergence
4. **Maximum Iterations**: Prevent infinite loops
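The two checks above compare containers exactly. The sketch below shows slightly more forgiving variants, under these assumptions:
- **Edges** are hypothetical `(parent, child)` string pairs compared as a set, so listing order cannot cause a false mismatch.
- **Cut points** are kept per feature as `std::vector<double>` and compared with an absolute tolerance. This matches "change significantly" more closely than exact equality.
- **Performance metric**: the criterion is written as a relative change in log-likelihood.

The function names `sameEdgeSet`, `cutsStable` and `likelihoodConverged` are illustrative.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

using Edge = std::pair<std::string, std::string>; // (parent, child)

// Same structure if the edge sets match, regardless of listing order.
bool sameEdgeSet(const std::vector<Edge>& prev, const std::vector<Edge>& curr)
{
    return std::set<Edge>(prev.begin(), prev.end()) == std::set<Edge>(curr.begin(), curr.end());
}

// Cut points are stable if every feature keeps the same number of cuts
// and no cut moves by more than `tol`.
bool cutsStable(const std::map<std::string, std::vector<double>>& oldCuts,
                const std::map<std::string, std::vector<double>>& newCuts, double tol)
{
    if (oldCuts.size() != newCuts.size()) return false;
    for (const auto& [feature, cuts] : oldCuts) {
        auto it = newCuts.find(feature);
        if (it == newCuts.end() || it->second.size() != cuts.size()) return false;
        for (std::size_t i = 0; i < cuts.size(); ++i)
            if (std::fabs(cuts[i] - it->second[i]) > tol) return false;
    }
    return true;
}

// Likelihood criterion: converged when the relative change falls below `tol`.
bool likelihoodConverged(double prevLogLik, double currLogLik, double tol)
{
    double denom = std::max(std::fabs(prevLogLik), 1e-12); // avoid division by zero
    return std::fabs(currLogLik - prevLogLik) / denom < tol;
}
```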
### Expected Benefits
1. **Better Discretization Quality**: Each iteration refines boundaries based on learned dependencies
2. **Improved Model Accuracy**: More informed discretization leads to better classification
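3. **Adaptive Process**: Lets discretization boundaries and network structure adjust to each other instead of fixing the discretization up front, though there is no guarantee of reaching an optimal combination
4. **Principled Approach**: A feedback loop between learned structure and discretization, whose benefit still needs empirical validation
5. **Reduced Manual Tuning**: May lessen the need for discretization hyperparameter tuning; this remains to be confirmed experimentally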
### Implementation Considerations
1. **Convergence Detection**: Need robust criteria to detect when to stop
2. **Performance Impact**: Multiple iterations increase computational cost
3. **Overfitting Prevention**: May need regularization to avoid over-discretization
4. **Stability Guarantees**: Ensure the process doesn't oscillate indefinitely
5. **Memory Management**: Handle multiple model instances efficiently
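For point 4, a simple guard is to remember every discretization already visited and stop when one repeats. If the refinement step is deterministic, a repeated state means the process will cycle.

Below is a minimal sketch with a hypothetical `CycleDetector` template, keyed on per-feature cut points. The `CutPoints` alias is an assumption, not a library type. Memory grows by one state per iteration and is therefore bounded by `max_iterations`.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical state type: cut points per feature.
using CutPoints = std::map<std::string, std::vector<double>>;

// Remembers visited states; State must be ordered with operator<.
template <typename State>
class CycleDetector {
public:
    // Returns true if `s` was already recorded, otherwise records it and returns false.
    bool seenBefore(const State& s) { return !visited.insert(s).second; }

private:
    std::set<State> visited;
};
```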
### Integration Strategy
1. **Backward Compatibility**: Keep existing two-phase approach as default
2. **Optional Feature**: Add iterative mode as configurable option
3. **Hyperparameter Extension**: Add convergence-related parameters
4. **Testing Framework**: Comprehensive testing on standard datasets
## Conclusion
The current local discretization implementation in BayesNet is well-designed and functional, providing a solid foundation for the proposed iterative enhancement. The iterative approach is expected to improve discretization quality by creating a feedback loop between model structure and discretization decisions, although this has yet to be measured on benchmark datasets.
The implementation is highly feasible given the existing architecture. Whether the expected benefits justify the additional computational cost should be confirmed by profiling and benchmarking. The key to success will be implementing robust convergence criteria and maintaining the modularity of the current design.
## Recommendations
1. **Immediate Improvements**: Fix code quality issues and optimize performance bottlenecks
2. **Iterative Implementation**: Develop the iterative approach as an optional enhancement
3. **Comprehensive Testing**: Validate improvements on standard benchmark datasets
4. **Documentation**: Enhance inline documentation and user guides
5. **Performance Profiling**: Monitor computational overhead and optimize where necessary

@@ -1,22 +1,22 @@
cmake_minimum_required(VERSION 3.20)
project(bayesnet_sample)
project(bayesnet_sample VERSION 0.1.0 LANGUAGES CXX)
set(CMAKE_CXX_STANDARD 17)
find_package(Torch CONFIG REQUIRED)
find_package(bayesnet CONFIG REQUIRED)
find_package(fimdlp CONFIG REQUIRED)
find_package(folding CONFIG REQUIRED)
find_package(arff-files CONFIG REQUIRED)
find_package(nlohman_json CONFIG REQUIRED)
find_package(nlohmann_json REQUIRED)
find_package(bayesnet CONFIG REQUIRED)
add_executable(bayesnet_sample sample.cc)
target_link_libraries(bayesnet_sample PRIVATE
fimdlp::fimdlp
arff-files::arff-files
"${TORCH_LIBRARIES}"
torch::torch
bayesnet::bayesnet
nlohmann_json::nlohmann_json
folding::folding
nlohmann_json::nlohmann_json
)

@@ -0,0 +1,9 @@
{
"version": 4,
"vendor": {
"conan": {}
},
"include": [
"build/CMakePresets.json"
]
}

sample/conanfile.txt

@@ -0,0 +1,14 @@
[requires]
libtorch/2.7.0
arff-files/1.2.0
fimdlp/2.1.0
folding/1.1.1
bayesnet/1.2.0
nlohmann_json/3.11.3
[generators]
CMakeToolchain
CMakeDeps
[options]
libtorch/2.7.0:shared=True

@@ -4,9 +4,22 @@
// SPDX-License-Identifier: MIT
// ***************************************************************
#include <ArffFiles/ArffFiles.hpp>
#include <map>
#include <string>
#include <ArffFiles.hpp>
#include <fimdlp/CPPFImdlp.h>
#include <bayesnet/ensembles/XBAODE.h>
#include <bayesnet/classifiers/TANLd.h>
#include <bayesnet/classifiers/KDBLd.h>
#include <bayesnet/ensembles/AODELd.h>
torch::Tensor matrix2tensor(const std::vector<std::vector<float>>& matrix)
{
auto tensor = torch::empty({ static_cast<int>(matrix.size()), static_cast<int>(matrix[0].size()) }, torch::kFloat32);
for (int i = 0; i < matrix.size(); ++i) {
tensor.index_put_({ i, "..." }, torch::tensor(matrix[i], torch::kFloat32));
}
return tensor;
}
std::vector<mdlp::labels_t> discretizeDataset(std::vector<mdlp::samples_t>& X, mdlp::labels_t& y)
{
@@ -19,63 +32,89 @@ std::vector<mdlp::labels_t> discretizeDataset(std::vector<mdlp::samples_t>& X, m
}
return Xd;
}
tuple<torch::Tensor, torch::Tensor, std::vector<std::string>, std::string, map<std::string, std::vector<int>>> loadDataset(const std::string& name, bool class_last)
std::tuple<torch::Tensor, torch::Tensor, std::vector<std::string>, std::string> loadArff(const std::string& name, bool class_last)
{
auto handler = ArffFiles();
handler.load(name, class_last);
// Get Dataset X, y
std::vector<mdlp::samples_t>& X = handler.getX();
mdlp::labels_t& y = handler.getY();
// Get className & Features
auto className = handler.getClassName();
std::vector<mdlp::samples_t> X = handler.getX();
mdlp::labels_t y = handler.getY();
std::vector<std::string> features;
auto attributes = handler.getAttributes();
transform(attributes.begin(), attributes.end(), back_inserter(features), [](const auto& pair) { return pair.first; });
torch::Tensor Xd;
auto states = map<std::string, std::vector<int>>();
auto Xr = discretizeDataset(X, y);
Xd = torch::zeros({ static_cast<int>(Xr.size()), static_cast<int>(Xr[0].size()) }, torch::kInt32);
for (int i = 0; i < features.size(); ++i) {
states[features[i]] = std::vector<int>(*max_element(Xr[i].begin(), Xr[i].end()) + 1);
auto item = states.at(features[i]);
iota(begin(item), end(item), 0);
Xd.index_put_({ i, "..." }, torch::tensor(Xr[i], torch::kInt32));
}
states[className] = std::vector<int>(*max_element(y.begin(), y.end()) + 1);
iota(begin(states.at(className)), end(states.at(className)), 0);
return { Xd, torch::tensor(y, torch::kInt32), features, className, states };
auto Xt = matrix2tensor(X);
auto yt = torch::tensor(y, torch::kInt32);
return { Xt, yt, features, handler.getClassName() };
}
// tuple<torch::Tensor, torch::Tensor, std::vector<std::string>, std::string, map<std::string, std::vector<int>>> loadDataset(const std::string& name, bool class_last)
// {
// auto [X, y, features, className] = loadArff(name, class_last);
// // Discretize the dataset
// torch::Tensor Xd;
// auto states = map<std::string, std::vector<int>>();
// // Fill the class states
// states[className] = std::vector<int>(*max_element(y.begin(), y.end()) + 1);
// iota(begin(states.at(className)), end(states.at(className)), 0);
// auto Xr = discretizeDataset(X, y);
// Xd = torch::zeros({ static_cast<int>(Xr.size()), static_cast<int>(Xr[0].size()) }, torch::kInt32);
// for (int i = 0; i < features.size(); ++i) {
// states[features[i]] = std::vector<int>(*max_element(Xr[i].begin(), Xr[i].end()) + 1);
// auto item = states.at(features[i]);
// iota(begin(item), end(item), 0);
// Xd.index_put_({ i, "..." }, torch::tensor(Xr[i], torch::kInt32));
// }
// auto yt = torch::tensor(y, torch::kInt32);
// return { Xd, yt, features, className, states };
// }
int main(int argc, char* argv[])
{
if (argc < 2) {
std::cerr << "Usage: " << argv[0] << " <file_name>" << std::endl;
if (argc < 3) {
std::cerr << "Usage: " << argv[0] << " <arff_file_name> <model>" << std::endl;
return 1;
}
std::string file_name = argv[1];
torch::Tensor X, y;
std::vector<std::string> features;
std::string className;
map<std::string, std::vector<int>> states;
auto clf = bayesnet::XBAODE(); // false for not using voting in predict
std::cout << "Library version: " << clf.getVersion() << std::endl;
tie(X, y, features, className, states) = loadDataset(file_name, true);
torch::Tensor weights = torch::full({ X.size(1) }, 15, torch::kDouble);
torch::Tensor dataset;
try {
auto yresized = torch::transpose(y.view({ y.size(0), 1 }), 0, 1);
dataset = torch::cat({ X, yresized }, 0);
std::string model_name = argv[2];
std::map<std::string, bayesnet::Classifier*> models{ {"TANLd", new bayesnet::TANLd()}, {"KDBLd", new bayesnet::KDBLd(2)}, {"AODELd", new bayesnet::AODELd() }
};
if (models.find(model_name) == models.end()) {
std::cerr << "Model not found: " << model_name << std::endl;
std::cerr << "Available models: ";
for (const auto& model : models) {
std::cerr << model.first << " ";
}
std::cerr << std::endl;
return 1;
}
catch (const std::exception& e) {
std::stringstream oss;
oss << "* Error in X and y dimensions *\n";
oss << "X dimensions: " << dataset.sizes() << "\n";
oss << "y dimensions: " << y.sizes();
throw std::runtime_error(oss.str());
auto clf = models[model_name];
std::cout << "Library version: " << clf->getVersion() << std::endl;
// auto [X, y, features, className, states] = loadDataset(file_name, true);
auto [Xt, yt, features, className] = loadArff(file_name, true);
std::map<std::string, std::vector<int>> states;
// int m = Xt.size(1);
// auto weights = torch::full({ m }, 1 / m, torch::kDouble);
// auto dataset = buildDataset(Xv, yv);
// try {
// auto yresized = torch::transpose(y.view({ y.size(0), 1 }), 0, 1);
// dataset = torch::cat({ X, yresized }, 0);
// }
// catch (const std::exception& e) {
// std::stringstream oss;
// oss << "* Error in X and y dimensions *\n";
// oss << "X dimensions: " << dataset.sizes() << "\n";
// oss << "y dimensions: " << y.sizes();
// throw std::runtime_error(oss.str());
// }
clf->fit(Xt, yt, features, className, states, bayesnet::Smoothing_t::ORIGINAL);
auto total = yt.size(0);
auto y_proba = clf->predict_proba(Xt);
auto y_pred = y_proba.argmax(1);
auto accuracy_value = (y_pred == yt).sum().item<float>() / total;
auto score = clf->score(Xt, yt);
std::cout << "File: " << file_name << " Model: " << model_name << " score: " << score << " Computed accuracy: " << accuracy_value << std::endl;
for (const auto clf : models) {
delete clf.second;
}
clf.fit(dataset, features, className, states, weights, bayesnet::Smoothing_t::LAPLACE);
auto score = clf.score(X, y);
std::cout << "File: " << file_name << " Model: BoostAODE score: " << score << std::endl;
return 0;
}

@@ -1,21 +1,21 @@
{
"default-registry": {
"kind": "git",
"baseline": "760bfd0c8d7c89ec640aec4df89418b7c2745605",
"repository": "https://github.com/microsoft/vcpkg"
},
"registries": [
{
"kind": "git",
"repository": "https://github.com/rmontanana/vcpkg-stash",
"baseline": "393efa4e74e053b6f02c4ab03738c8fe796b28e5",
"baseline": "1ea69243c0e8b0de77c9d1dd6e1d7593ae7f3627",
"packages": [
"folding",
"bayesnet",
"arff-files",
"bayesnet",
"fimdlp",
"folding",
"libtorch-bin"
]
}
],
"default-registry": {
"kind": "git",
"repository": "https://github.com/microsoft/vcpkg",
"baseline": "760bfd0c8d7c89ec640aec4df89418b7c2745605"
}
]
}

@@ -2,11 +2,32 @@
"name": "sample-project",
"version-string": "0.1.0",
"dependencies": [
"bayesnet",
"folding",
"arff-files",
"fimdlp",
"nlohmann-json",
"libtorch-bin"
"libtorch-bin",
"folding",
"nlohmann-json"
],
"overrides": [
{
"name": "arff-files",
"version": "1.1.0"
},
{
"name": "fimdlp",
"version": "2.0.1"
},
{
"name": "libtorch-bin",
"version": "2.7.0"
},
{
"name": "bayesnet",
"version": "1.1.1"
},
{
"name": "folding",
"version": "1.1.1"
}
]
}

@@ -1,18 +1,14 @@
if(ENABLE_TESTING)
include_directories(
${BayesNet_SOURCE_DIR}/tests/lib/Files
${BayesNet_SOURCE_DIR}/lib/folding
${BayesNet_SOURCE_DIR}/lib/mdlp/src
${BayesNet_SOURCE_DIR}/lib/log
${BayesNet_SOURCE_DIR}/lib/json/include
${BayesNet_SOURCE_DIR}
${CMAKE_BINARY_DIR}/configured_files/include
${nlohmann_json_INCLUDE_DIRS}
)
file(GLOB_RECURSE BayesNet_SOURCES "${BayesNet_SOURCE_DIR}/bayesnet/*.cc")
file(GLOB_RECURSE BayesNet_SOURCES "${bayesnet_SOURCE_DIR}/bayesnet/*.cc")
add_executable(TestBayesNet TestBayesNetwork.cc TestBayesNode.cc TestBayesClassifier.cc TestXSPnDE.cc TestXBA2DE.cc
TestBayesModels.cc TestBayesMetrics.cc TestFeatureSelection.cc TestBoostAODE.cc TestXBAODE.cc TestA2DE.cc
TestUtils.cc TestBayesEnsemble.cc TestModulesVersions.cc TestBoostA2DE.cc TestMST.cc TestXSPODE.cc ${BayesNet_SOURCES})
target_link_libraries(TestBayesNet PUBLIC "${TORCH_LIBRARIES}" fimdlp PRIVATE Catch2::Catch2WithMain)
target_link_libraries(TestBayesNet PRIVATE torch::torch fimdlp::fimdlp Catch2::Catch2WithMain folding::folding)
add_test(NAME BayesNetworkTest COMMAND TestBayesNet)
add_test(NAME A2DE COMMAND TestBayesNet "[A2DE]")
add_test(NAME BoostA2DE COMMAND TestBayesNet "[BoostA2DE]")

@@ -20,7 +20,7 @@
#include "bayesnet/ensembles/AODELd.h"
#include "bayesnet/ensembles/BoostAODE.h"
const std::string ACTUAL_VERSION = "1.1.0";
const std::string ACTUAL_VERSION = "1.2.1";
TEST_CASE("Test Bayesian Classifiers score & version", "[Models]")
{
@@ -31,9 +31,9 @@ TEST_CASE("Test Bayesian Classifiers score & version", "[Models]")
{{"diabetes", "SPODE"}, 0.802083},
{{"diabetes", "TAN"}, 0.821615},
{{"diabetes", "AODELd"}, 0.8125f},
{{"diabetes", "KDBLd"}, 0.80208f},
{{"diabetes", "KDBLd"}, 0.804688f},
{{"diabetes", "SPODELd"}, 0.7890625f},
{{"diabetes", "TANLd"}, 0.803385437f},
{{"diabetes", "TANLd"}, 0.8125f},
{{"diabetes", "BoostAODE"}, 0.83984f},
// Ecoli
{{"ecoli", "AODE"}, 0.889881},
@@ -42,9 +42,9 @@ TEST_CASE("Test Bayesian Classifiers score & version", "[Models]")
{{"ecoli", "SPODE"}, 0.880952},
{{"ecoli", "TAN"}, 0.892857},
{{"ecoli", "AODELd"}, 0.875f},
{{"ecoli", "KDBLd"}, 0.880952358f},
{{"ecoli", "KDBLd"}, 0.872024f},
{{"ecoli", "SPODELd"}, 0.839285731f},
{{"ecoli", "TANLd"}, 0.848214269f},
{{"ecoli", "TANLd"}, 0.869047642f},
{{"ecoli", "BoostAODE"}, 0.89583f},
// Glass
{{"glass", "AODE"}, 0.79439},
@@ -53,9 +53,9 @@ TEST_CASE("Test Bayesian Classifiers score & version", "[Models]")
{{"glass", "SPODE"}, 0.775701},
{{"glass", "TAN"}, 0.827103},
{{"glass", "AODELd"}, 0.799065411f},
{{"glass", "KDBLd"}, 0.82710278f},
{{"glass", "KDBLd"}, 0.864485979f},
{{"glass", "SPODELd"}, 0.780373812f},
{{"glass", "TANLd"}, 0.869158864f},
{{"glass", "TANLd"}, 0.831775725f},
{{"glass", "BoostAODE"}, 0.84579f},
// Iris
{{"iris", "AODE"}, 0.973333},
@@ -68,29 +68,29 @@ TEST_CASE("Test Bayesian Classifiers score & version", "[Models]")
{{"iris", "SPODELd"}, 0.96f},
{{"iris", "TANLd"}, 0.97333f},
{{"iris", "BoostAODE"}, 0.98f} };
std::map<std::string, bayesnet::BaseClassifier*> models{ {"AODE", new bayesnet::AODE()},
{"AODELd", new bayesnet::AODELd()},
{"BoostAODE", new bayesnet::BoostAODE()},
{"KDB", new bayesnet::KDB(2)},
{"KDBLd", new bayesnet::KDBLd(2)},
{"XSPODE", new bayesnet::XSpode(1)},
{"SPODE", new bayesnet::SPODE(1)},
{"SPODELd", new bayesnet::SPODELd(1)},
{"TAN", new bayesnet::TAN()},
{"TANLd", new bayesnet::TANLd()} };
std::map<std::string, std::unique_ptr<bayesnet::BaseClassifier>> models;
models["AODE"] = std::make_unique<bayesnet::AODE>();
models["AODELd"] = std::make_unique<bayesnet::AODELd>();
models["BoostAODE"] = std::make_unique<bayesnet::BoostAODE>();
models["KDB"] = std::make_unique<bayesnet::KDB>(2);
models["KDBLd"] = std::make_unique<bayesnet::KDBLd>(2);
models["XSPODE"] = std::make_unique<bayesnet::XSpode>(1);
models["SPODE"] = std::make_unique<bayesnet::SPODE>(1);
models["SPODELd"] = std::make_unique<bayesnet::SPODELd>(1);
models["TAN"] = std::make_unique<bayesnet::TAN>();
models["TANLd"] = std::make_unique<bayesnet::TANLd>();
std::string name = GENERATE("AODE", "AODELd", "KDB", "KDBLd", "SPODE", "XSPODE", "SPODELd", "TAN", "TANLd");
auto clf = models[name];
auto clf = std::move(models[name]);
SECTION("Test " + name + " classifier")
{
for (const std::string& file_name : { "glass", "iris", "ecoli", "diabetes" }) {
auto clf = models[name];
auto discretize = name.substr(name.length() - 2) != "Ld";
auto raw = RawDatasets(file_name, discretize);
clf->fit(raw.Xt, raw.yt, raw.features, raw.className, raw.states, raw.smoothing);
auto score = clf->score(raw.Xt, raw.yt);
// std::cout << "Classifier: " << name << " File: " << file_name << " Score: " << score << " expected = " <<
// scores[{file_name, name}] << std::endl;
// scores[{file_name, name}] << std::endl;
INFO("Classifier: " << name << " File: " << file_name);
REQUIRE(score == Catch::Approx(scores[{file_name, name}]).epsilon(raw.epsilon));
REQUIRE(clf->getStatus() == bayesnet::NORMAL);
@@ -101,7 +101,6 @@ TEST_CASE("Test Bayesian Classifiers score & version", "[Models]")
INFO("Checking version of " << name << " classifier");
REQUIRE(clf->getVersion() == ACTUAL_VERSION);
}
delete clf;
}
TEST_CASE("Models features & Graph", "[Models]")
{
@@ -133,7 +132,7 @@ TEST_CASE("Models features & Graph", "[Models]")
clf.fit(raw.Xt, raw.yt, raw.features, raw.className, raw.states, raw.smoothing);
REQUIRE(clf.getNumberOfNodes() == 5);
REQUIRE(clf.getNumberOfEdges() == 7);
REQUIRE(clf.getNumberOfStates() == 27);
REQUIRE(clf.getNumberOfStates() == 26);
REQUIRE(clf.getClassNumStates() == 3);
REQUIRE(clf.show() == std::vector<std::string>{"class -> sepallength, sepalwidth, petallength, petalwidth, ",
"petallength -> sepallength, ", "petalwidth -> ",
@@ -149,10 +148,9 @@ TEST_CASE("Get num features & num edges", "[Models]")
REQUIRE(clf.getNumberOfNodes() == 5);
REQUIRE(clf.getNumberOfEdges() == 8);
}
TEST_CASE("Model predict_proba", "[Models]")
{
std::string model = GENERATE("TAN", "SPODE", "BoostAODEproba", "BoostAODEvoting");
std::string model = GENERATE("TAN", "SPODE", "BoostAODEproba", "BoostAODEvoting", "TANLd", "SPODELd", "KDBLd");
auto res_prob_tan = std::vector<std::vector<double>>({ {0.00375671, 0.994457, 0.00178621},
{0.00137462, 0.992734, 0.00589123},
{0.00137462, 0.992734, 0.00589123},
@@ -180,56 +178,111 @@ TEST_CASE("Model predict_proba", "[Models]")
{0.0284828, 0.770524, 0.200993},
{0.0213182, 0.857189, 0.121493},
{0.00868436, 0.949494, 0.0418215} });
auto res_prob_tanld = std::vector<std::vector<double>>({ {0.000597557, 0.9957, 0.00370254},
{0.000731377, 0.997914, 0.0013544},
{0.000731377, 0.997914, 0.0013544},
{0.000731377, 0.997914, 0.0013544},
{0.000838614, 0.998122, 0.00103923},
{0.00130852, 0.0659492, 0.932742},
{0.00365946, 0.979412, 0.0169281},
{0.00435035, 0.986248, 0.00940212},
{0.000583815, 0.997746, 0.00167066} });
auto res_prob_spodeld = std::vector<std::vector<double>>({ {0.000908024, 0.993742, 0.00535024 },
{0.00187726, 0.99167, 0.00645308 },
{0.00187726, 0.99167, 0.00645308 },
{0.00187726, 0.99167, 0.00645308 },
{0.00287539, 0.993736, 0.00338846 },
{0.00294402, 0.268495, 0.728561 },
{0.0132381, 0.873282, 0.113479 },
{0.0159412, 0.969228, 0.0148308 },
{0.00203487, 0.989762, 0.00820356 } });
auto res_prob_kdbld = std::vector<std::vector<double>>({ {0.000738981, 0.997208, 0.00205272 },
{0.00087708, 0.996687, 0.00243633 },
{0.00087708, 0.996687, 0.00243633 },
{0.00087708, 0.996687, 0.00243633 },
{0.000738981, 0.997208, 0.00205272 },
{0.00512442, 0.0455504, 0.949325 },
{0.0023632, 0.976631, 0.0210063 },
{0.00189194, 0.992853, 0.00525538 },
{0.00189194, 0.992853, 0.00525538, } });
auto res_prob_voting = std::vector<std::vector<double>>(
{ {0, 1, 0}, {0, 1, 0}, {0, 1, 0}, {0, 1, 0}, {0, 1, 0}, {0, 0, 1}, {0, 1, 0}, {0, 1, 0}, {0, 1, 0} });
std::map<std::string, std::vector<std::vector<double>>> res_prob{ {"TAN", res_prob_tan},
{"SPODE", res_prob_spode},
{"BoostAODEproba", res_prob_baode},
{"BoostAODEvoting", res_prob_voting} };
std::map<std::string, bayesnet::BaseClassifier*> models{ {"TAN", new bayesnet::TAN()},
{"SPODE", new bayesnet::SPODE(0)},
{"BoostAODEproba", new bayesnet::BoostAODE(false)},
{"BoostAODEvoting", new bayesnet::BoostAODE(true)} };
{"BoostAODEvoting", res_prob_voting},
{"TANLd", res_prob_tanld},
{"SPODELd", res_prob_spodeld},
{"KDBLd", res_prob_kdbld} };
std::map<std::string, std::unique_ptr<bayesnet::BaseClassifier>> models;
models["TAN"] = std::make_unique<bayesnet::TAN>();
models["SPODE"] = std::make_unique<bayesnet::SPODE>(0);
models["BoostAODEproba"] = std::make_unique<bayesnet::BoostAODE>(false);
models["BoostAODEvoting"] = std::make_unique<bayesnet::BoostAODE>(true);
models["TANLd"] = std::make_unique<bayesnet::TANLd>();
models["SPODELd"] = std::make_unique<bayesnet::SPODELd>(0);
models["KDBLd"] = std::make_unique<bayesnet::KDBLd>(2);
int init_index = 78;
auto raw = RawDatasets("iris", true);
SECTION("Test " + model + " predict_proba")
{
auto clf = models[model];
clf->fit(raw.Xv, raw.yv, raw.features, raw.className, raw.states, raw.smoothing);
auto y_pred_proba = clf->predict_proba(raw.Xv);
auto yt_pred_proba = clf->predict_proba(raw.Xt);
auto y_pred = clf->predict(raw.Xv);
auto yt_pred = clf->predict(raw.Xt);
REQUIRE(y_pred.size() == yt_pred.size(0));
REQUIRE(y_pred.size() == y_pred_proba.size());
REQUIRE(y_pred.size() == yt_pred_proba.size(0));
REQUIRE(y_pred.size() == raw.yv.size());
REQUIRE(y_pred_proba[0].size() == 3);
REQUIRE(yt_pred_proba.size(1) == y_pred_proba[0].size());
for (int i = 0; i < 9; ++i) {
auto maxElem = max_element(y_pred_proba[i].begin(), y_pred_proba[i].end());
int predictedClass = distance(y_pred_proba[i].begin(), maxElem);
REQUIRE(predictedClass == y_pred[i]);
// Check predict is coherent with predict_proba
REQUIRE(yt_pred_proba[i].argmax().item<int>() == y_pred[i]);
for (int j = 0; j < yt_pred_proba.size(1); j++) {
REQUIRE(yt_pred_proba[i][j].item<double>() == Catch::Approx(y_pred_proba[i][j]).epsilon(raw.epsilon));
INFO("Testing " << model << " predict_proba");
auto ld_model = model.substr(model.length() - 2) == "Ld";
auto discretize = !ld_model;
auto raw = RawDatasets("iris", discretize);
auto& clf = *models[model];
clf.fit(raw.Xt, raw.yt, raw.features, raw.className, raw.states, raw.smoothing);
auto yt_pred_proba = clf.predict_proba(raw.Xt);
auto yt_pred = clf.predict(raw.Xt);
std::vector<int> y_pred;
std::vector<std::vector<double>> y_pred_proba;
if (!ld_model) {
y_pred = clf.predict(raw.Xv);
y_pred_proba = clf.predict_proba(raw.Xv);
REQUIRE(y_pred.size() == y_pred_proba.size());
REQUIRE(y_pred.size() == yt_pred.size(0));
REQUIRE(y_pred.size() == yt_pred_proba.size(0));
REQUIRE(y_pred_proba[0].size() == 3);
REQUIRE(y_pred.size() == raw.yv.size());
REQUIRE(yt_pred_proba.size(1) == y_pred_proba[0].size());
for (int i = 0; i < 9; ++i) {
auto maxElem = max_element(y_pred_proba[i].begin(), y_pred_proba[i].end());
int predictedClass = distance(y_pred_proba[i].begin(), maxElem);
REQUIRE(predictedClass == y_pred[i]);
// Check predict is coherent with predict_proba
REQUIRE(yt_pred_proba[i].argmax().item<int>() == y_pred[i]);
for (int j = 0; j < yt_pred_proba.size(1); j++) {
REQUIRE(yt_pred_proba[i][j].item<double>() == Catch::Approx(y_pred_proba[i][j]).epsilon(raw.epsilon));
}
}
// Check predict_proba values for vectors and tensors
for (int i = 0; i < 9; i++) {
REQUIRE(y_pred[i] == yt_pred[i].item<int>());
for (int j = 0; j < 3; j++) {
REQUIRE(res_prob[model][i][j] == Catch::Approx(y_pred_proba[i + init_index][j]).epsilon(raw.epsilon));
REQUIRE(res_prob[model][i][j] ==
Catch::Approx(yt_pred_proba[i + init_index][j].item<double>()).epsilon(raw.epsilon));
}
}
} else {
// Check predict_proba values for vectors and tensors
auto predictedClasses = yt_pred_proba.argmax(1);
// std::cout << model << std::endl;
for (int i = 0; i < 9; i++) {
REQUIRE(predictedClasses[i].item<int>() == yt_pred[i].item<int>());
// std::cout << "{";
for (int j = 0; j < 3; j++) {
// std::cout << yt_pred_proba[i + init_index][j].item<double>() << ", ";
REQUIRE(res_prob[model][i][j] ==
Catch::Approx(yt_pred_proba[i + init_index][j].item<double>()).epsilon(raw.epsilon));
}
// std::cout << "\b\b}," << std::endl;
}
}
// Check predict_proba values for vectors and tensors
for (int i = 0; i < 9; i++) {
REQUIRE(y_pred[i] == yt_pred[i].item<int>());
for (int j = 0; j < 3; j++) {
REQUIRE(res_prob[model][i][j] == Catch::Approx(y_pred_proba[i + init_index][j]).epsilon(raw.epsilon));
REQUIRE(res_prob[model][i][j] ==
Catch::Approx(yt_pred_proba[i + init_index][j].item<double>()).epsilon(raw.epsilon));
}
}
delete clf;
}
}
TEST_CASE("AODE voting-proba", "[Models]")
{
auto raw = RawDatasets("glass", true);
@@ -248,17 +301,30 @@ TEST_CASE("AODE voting-proba", "[Models]")
REQUIRE(pred_proba[67][0] == Catch::Approx(0.702184).epsilon(raw.epsilon));
REQUIRE(clf.topological_order() == std::vector<std::string>());
}
TEST_CASE("SPODELd dataset", "[Models]")
TEST_CASE("Ld models with dataset", "[Models]")
{
auto raw = RawDatasets("iris", false);
auto clf = bayesnet::SPODELd(0);
// raw.dataset.to(torch::kFloat32);
clf.fit(raw.dataset, raw.features, raw.className, raw.states, raw.smoothing);
auto score = clf.score(raw.Xt, raw.yt);
clf.fit(raw.Xt, raw.yt, raw.features, raw.className, raw.states, raw.smoothing);
auto scoret = clf.score(raw.Xt, raw.yt);
REQUIRE(score == Catch::Approx(0.97333f).epsilon(raw.epsilon));
REQUIRE(scoret == Catch::Approx(0.97333f).epsilon(raw.epsilon));
auto clf2 = bayesnet::TANLd();
clf2.fit(raw.dataset, raw.features, raw.className, raw.states, raw.smoothing);
auto score2 = clf2.score(raw.Xt, raw.yt);
clf2.fit(raw.Xt, raw.yt, raw.features, raw.className, raw.states, raw.smoothing);
auto score2t = clf2.score(raw.Xt, raw.yt);
REQUIRE(score2 == Catch::Approx(0.97333f).epsilon(raw.epsilon));
REQUIRE(score2t == Catch::Approx(0.97333f).epsilon(raw.epsilon));
auto clf3 = bayesnet::KDBLd(2);
clf3.fit(raw.dataset, raw.features, raw.className, raw.states, raw.smoothing);
auto score3 = clf3.score(raw.Xt, raw.yt);
clf3.fit(raw.Xt, raw.yt, raw.features, raw.className, raw.states, raw.smoothing);
auto score3t = clf3.score(raw.Xt, raw.yt);
REQUIRE(score3 == Catch::Approx(0.97333f).epsilon(raw.epsilon));
REQUIRE(score3t == Catch::Approx(0.97333f).epsilon(raw.epsilon));
}
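The score assertions compare against `Catch::Approx(x).epsilon(raw.epsilon)`. As we understand Catch2's `Approx`, a match succeeds in either of two cases. The first is a difference within a fixed margin, which defaults to 0. The second is a difference within `epsilon * (scale + |x|)`, where `x` is the value wrapped by `Approx` and `scale` defaults to 0. The sketch below mirrors that rule so the tolerance can be reasoned about by hand. `approx_equal` is a hypothetical helper, and Catch2's own documentation is the authority on the exact formula.

```cpp
#include <cmath>

// Approximate equality modelled on Catch::Approx(expected).epsilon(epsilon):
// exact matches pass, then an absolute margin is tried, then a relative
// tolerance scaled by (scale + |expected|). The relative tolerance depends on
// the expected (wrapped) value only, not on the value being tested.
bool approx_equal(double actual, double expected, double epsilon,
                  double margin = 0.0, double scale = 0.0)
{
    if (actual == expected) {
        return true;
    }
    const double diff = std::fabs(actual - expected);
    if (diff <= margin) {
        return true;
    }
    const double magnitude = std::isinf(expected) ? 0.0 : std::fabs(expected);
    return diff <= epsilon * (scale + magnitude);
}
```

With `epsilon(1e-5)` around 0.97333, for example, roughly 1e-5 of absolute slack is allowed. A score of 0.9733333 therefore passes, while it would fail at `epsilon(1e-7)`.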
TEST_CASE("KDB with hyperparameters", "[Models]")
{
@@ -275,11 +341,15 @@ TEST_CASE("KDB with hyperparameters", "[Models]")
REQUIRE(score == Catch::Approx(0.827103).epsilon(raw.epsilon));
REQUIRE(scoret == Catch::Approx(0.761682).epsilon(raw.epsilon));
}
TEST_CASE("Incorrect type of data for SPODELd", "[Models]")
TEST_CASE("Incorrect type of data for Ld models", "[Models]")
{
auto raw = RawDatasets("iris", true);
auto clf = bayesnet::SPODELd(0);
REQUIRE_THROWS_AS(clf.fit(raw.dataset, raw.features, raw.className, raw.states, raw.smoothing), std::runtime_error);
auto clfs = bayesnet::SPODELd(0);
REQUIRE_THROWS_AS(clfs.fit(raw.dataset, raw.features, raw.className, raw.states, raw.smoothing), std::runtime_error);
auto clft = bayesnet::TANLd();
REQUIRE_THROWS_AS(clft.fit(raw.dataset, raw.features, raw.className, raw.states, raw.smoothing), std::runtime_error);
auto clfk = bayesnet::KDBLd(0);
REQUIRE_THROWS_AS(clfk.fit(raw.dataset, raw.features, raw.className, raw.states, raw.smoothing), std::runtime_error);
}
TEST_CASE("Predict, predict_proba & score without fitting", "[Models]")
{
@@ -337,14 +407,15 @@ TEST_CASE("Check proposal checkInput", "[Models]")
{
class testProposal : public bayesnet::Proposal {
public:
testProposal(torch::Tensor& dataset_, std::vector<std::string>& features_, std::string& className_)
: Proposal(dataset_, features_, className_)
testProposal(torch::Tensor& dataset_, std::vector<std::string>& features_, std::string& className_, std::vector<std::string>& notes_)
: Proposal(dataset_, features_, className_, notes_)
{
}
void test_X_y(const torch::Tensor& X, const torch::Tensor& y) { checkInput(X, y); }
};
auto raw = RawDatasets("iris", true);
auto clf = testProposal(raw.dataset, raw.features, raw.className);
std::vector<std::string> notes;
auto clf = testProposal(raw.dataset, raw.features, raw.className, notes);
torch::Tensor X = torch::randint(0, 3, { 10, 4 });
torch::Tensor y = torch::rand({ 10 });
INFO("Check X is not float");
@@ -379,3 +450,49 @@ TEST_CASE("Check KDB loop detection", "[Models]")
REQUIRE_NOTHROW(clf.test_add_m_edges(features, 0, S, weights));
REQUIRE_NOTHROW(clf.test_add_m_edges(features, 1, S, weights));
}
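In the updated `testProposal`, `Proposal` now receives a `std::vector<std::string>& notes_` alongside the dataset, features and class name. This suggests the proposal appends convergence notes to a vector owned by the caller. The sketch below shows that pattern. `NoteWriter`, `report_convergence` and the note wording are hypothetical, not the bayesnet API. Because the object stores a reference, the caller's vector must outlive it.

```cpp
#include <string>
#include <vector>

// Holds a reference to a caller-owned notes vector and appends to it.
// The caller must keep the vector alive for as long as this object is used.
class NoteWriter {
public:
    explicit NoteWriter(std::vector<std::string>& notes) : notes_(notes) {}

    // Record how many iterations a (hypothetical) convergence loop used.
    void report_convergence(int iterations, bool converged)
    {
        notes_.push_back((converged ? "Converged after " : "Stopped without convergence after ")
                         + std::to_string(iterations) + " iterations");
    }

private:
    std::vector<std::string>& notes_;
};
```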
TEST_CASE("Local discretization hyperparameters", "[Models]")
{
auto raw = RawDatasets("iris", false);
auto clfs = bayesnet::SPODELd(0);
clfs.setHyperparameters({
{"max_iterations", 7},
{"verbose_convergence", true},
});
REQUIRE_NOTHROW(clfs.fit(raw.Xt, raw.yt, raw.features, raw.className, raw.states, raw.smoothing));
REQUIRE(clfs.getStatus() == bayesnet::NORMAL);
auto clfk = bayesnet::KDBLd(0);
clfk.setHyperparameters({
{"k", 3},
{"theta", 1e-4},
});
REQUIRE_NOTHROW(clfk.fit(raw.Xt, raw.yt, raw.features, raw.className, raw.states, raw.smoothing));
REQUIRE(clfk.getStatus() == bayesnet::NORMAL);
auto clfa = bayesnet::AODELd();
clfa.setHyperparameters({
{"ld_proposed_cuts", 9},
{"ld_algorithm", "BINQ"},
});
REQUIRE_NOTHROW(clfa.fit(raw.Xt, raw.yt, raw.features, raw.className, raw.states, raw.smoothing));
REQUIRE(clfa.getStatus() == bayesnet::NORMAL);
auto clft = bayesnet::TANLd();
clft.setHyperparameters({
{"ld_proposed_cuts", 7},
{"mdlp_max_depth", 5},
{"mdlp_min_length", 3},
{"ld_algorithm", "MDLP"},
});
REQUIRE_NOTHROW(clft.fit(raw.Xt, raw.yt, raw.features, raw.className, raw.states, raw.smoothing));
REQUIRE(clft.getStatus() == bayesnet::NORMAL);
clft.setHyperparameters({
{"ld_proposed_cuts", 9},
{"ld_algorithm", "BINQ"},
});
REQUIRE_NOTHROW(clft.fit(raw.Xt, raw.yt, raw.features, raw.className, raw.states, raw.smoothing));
REQUIRE(clft.getStatus() == bayesnet::NORMAL);
clft.setHyperparameters({
{"ld_proposed_cuts", 5},
{"ld_algorithm", "BINU"},
});
REQUIRE_NOTHROW(clft.fit(raw.Xt, raw.yt, raw.features, raw.className, raw.states, raw.smoothing));
REQUIRE(clft.getStatus() == bayesnet::NORMAL);
}
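The hyperparameter test switches `ld_algorithm` between `MDLP`, `BINQ` and `BINU`, each with a different `ld_proposed_cuts`. The sketch below rests on three assumptions: `BINU` means equal-width (uniform) binning, `BINQ` means equal-frequency (quantile) binning, and `ld_proposed_cuts` is the number of bins. Under those assumptions, cut points for a single continuous feature could be computed as follows. `uniform_cuts` and `quantile_cuts` are hypothetical helpers, and the library's discretizers may place cuts differently (for example, at midpoints between samples).

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Equal-width binning: n_bins - 1 interior cuts evenly spaced between min and max.
std::vector<double> uniform_cuts(const std::vector<double>& x, int n_bins)
{
    if (x.empty() || n_bins < 1) {
        throw std::invalid_argument("need data and at least one bin");
    }
    const auto [lo, hi] = std::minmax_element(x.begin(), x.end());
    const double width = (*hi - *lo) / n_bins;
    std::vector<double> cuts;
    for (int i = 1; i < n_bins; ++i) {
        cuts.push_back(*lo + i * width);
    }
    return cuts;
}

// Equal-frequency binning: cuts at the sorted values that split the sample
// into n_bins groups of (roughly) equal size; repeated cut values are dropped.
std::vector<double> quantile_cuts(std::vector<double> x, int n_bins)
{
    if (x.empty() || n_bins < 1) {
        throw std::invalid_argument("need data and at least one bin");
    }
    std::sort(x.begin(), x.end());
    std::vector<double> cuts;
    for (int i = 1; i < n_bins; ++i) {
        const std::size_t idx = i * x.size() / n_bins;
        if (cuts.empty() || x[idx] > cuts.back()) {
            cuts.push_back(x[idx]);
        }
    }
    return cuts;
}
```

Quantile cuts collapse when a feature has many tied values. This is one reason a discretizer may end up with fewer cuts than proposed.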


@@ -338,6 +338,190 @@ TEST_CASE("Test Bayesian Network", "[Network]")
REQUIRE_THROWS_AS(net5.addEdge("A", "B"), std::logic_error);
REQUIRE_THROWS_WITH(net5.addEdge("A", "B"), "Cannot add edge to a fitted network. Initialize first.");
}
SECTION("Test assignment operator")
{
INFO("Test assignment operator");
// Create original network
auto net1 = bayesnet::Network();
buildModel(net1, raw.features, raw.className);
net1.fit(raw.Xv, raw.yv, raw.weightsv, raw.features, raw.className, raw.states, raw.smoothing);
// Create empty network and assign
auto net2 = bayesnet::Network();
net2.addNode("TempNode"); // Add something to make sure it gets cleared
net2 = net1;
// Verify they are equal
REQUIRE(net1.getFeatures() == net2.getFeatures());
REQUIRE(net1.getEdges() == net2.getEdges());
REQUIRE(net1.getNumEdges() == net2.getNumEdges());
REQUIRE(net1.getStates() == net2.getStates());
REQUIRE(net1.getClassName() == net2.getClassName());
REQUIRE(net1.getClassNumStates() == net2.getClassNumStates());
REQUIRE(net1.getSamples().size(0) == net2.getSamples().size(0));
REQUIRE(net1.getSamples().size(1) == net2.getSamples().size(1));
REQUIRE(net1.getNodes().size() == net2.getNodes().size());
// Verify topology equality
REQUIRE(net1 == net2);
// Verify they are separate objects by modifying one
net2.initialize();
net2.addNode("OnlyInNet2");
REQUIRE(net1.getNodes().size() != net2.getNodes().size());
REQUIRE_FALSE(net1 == net2);
}
SECTION("Test self assignment")
{
INFO("Test self assignment");
buildModel(net, raw.features, raw.className);
net.fit(raw.Xv, raw.yv, raw.weightsv, raw.features, raw.className, raw.states, raw.smoothing);
int original_edges = net.getNumEdges();
int original_nodes = net.getNodes().size();
// Self assignment should not corrupt the network
net = net;
auto all_features = raw.features;
all_features.push_back(raw.className);
REQUIRE(net.getNumEdges() == original_edges);
REQUIRE(net.getNodes().size() == original_nodes);
REQUIRE(net.getFeatures() == all_features);
REQUIRE(net.getClassName() == raw.className);
}
SECTION("Test operator== topology comparison")
{
INFO("Test operator== topology comparison");
// Test 1: Two identical networks
auto net1 = bayesnet::Network();
auto net2 = bayesnet::Network();
net1.addNode("A");
net1.addNode("B");
net1.addNode("C");
net1.addEdge("A", "B");
net1.addEdge("B", "C");
net2.addNode("A");
net2.addNode("B");
net2.addNode("C");
net2.addEdge("A", "B");
net2.addEdge("B", "C");
REQUIRE(net1 == net2);
// Test 2: Different nodes
auto net3 = bayesnet::Network();
net3.addNode("A");
net3.addNode("D"); // Different node
REQUIRE_FALSE(net1 == net3);
// Test 3: Same nodes, different edges
auto net4 = bayesnet::Network();
net4.addNode("A");
net4.addNode("B");
net4.addNode("C");
net4.addEdge("A", "C"); // Different topology
net4.addEdge("B", "C");
REQUIRE_FALSE(net1 == net4);
// Test 4: Empty networks
auto net5 = bayesnet::Network();
auto net6 = bayesnet::Network();
REQUIRE(net5 == net6);
// Test 5: Same topology, different edge order
auto net7 = bayesnet::Network();
net7.addNode("A");
net7.addNode("B");
net7.addNode("C");
net7.addEdge("B", "C"); // Add edges in different order
net7.addEdge("A", "B");
REQUIRE(net1 == net7); // Should still be equal
}
SECTION("Test RAII compliance with smart pointers")
{
INFO("Test RAII compliance with smart pointers");
std::unique_ptr<bayesnet::Network> net1 = std::make_unique<bayesnet::Network>();
buildModel(*net1, raw.features, raw.className);
net1->fit(raw.Xv, raw.yv, raw.weightsv, raw.features, raw.className, raw.states, raw.smoothing);
// Test that copy constructor works with smart pointers
std::unique_ptr<bayesnet::Network> net2 = std::make_unique<bayesnet::Network>(*net1);
REQUIRE(*net1 == *net2);
REQUIRE(net1->getNumEdges() == net2->getNumEdges());
REQUIRE(net1->getNodes().size() == net2->getNodes().size());
// Destroy original
net1.reset();
// Test predictions still work
std::vector<std::vector<int>> test = { {1}, {2}, {0}, {1} };
REQUIRE_NOTHROW(net2->predict(test));
// net2 should still be valid and functional
net2->initialize();
REQUIRE_NOTHROW(net2->addNode("NewNode"));
REQUIRE(net2->getNodes().count("NewNode") == 1);
}
SECTION("Test complex topology copy")
{
INFO("Test complex topology copy");
auto original = bayesnet::Network();
// Create a more complex network
original.addNode("Root");
original.addNode("Child1");
original.addNode("Child2");
original.addNode("Grandchild1");
original.addNode("Grandchild2");
original.addNode("Grandchild3");
original.addEdge("Root", "Child1");
original.addEdge("Root", "Child2");
original.addEdge("Child1", "Grandchild1");
original.addEdge("Child1", "Grandchild2");
original.addEdge("Child2", "Grandchild3");
// Copy it
auto copy = original;
// Verify topology is identical
REQUIRE(original == copy);
REQUIRE(original.getNodes().size() == copy.getNodes().size());
REQUIRE(original.getNumEdges() == copy.getNumEdges());
// Verify edges are properly reconstructed
auto originalEdges = original.getEdges();
auto copyEdges = copy.getEdges();
REQUIRE(originalEdges.size() == copyEdges.size());
// Verify node relationships are properly copied
for (const auto& nodePair : original.getNodes()) {
const std::string& nodeName = nodePair.first;
auto* originalNode = nodePair.second.get();
auto* copyNode = copy.getNodes().at(nodeName).get();
REQUIRE(originalNode->getParents().size() == copyNode->getParents().size());
REQUIRE(originalNode->getChildren().size() == copyNode->getChildren().size());
// Verify parent names match
for (size_t i = 0; i < originalNode->getParents().size(); ++i) {
REQUIRE(originalNode->getParents()[i]->getName() ==
copyNode->getParents()[i]->getName());
}
// Verify child names match
for (size_t i = 0; i < originalNode->getChildren().size(); ++i) {
REQUIRE(originalNode->getChildren()[i]->getName() ==
copyNode->getChildren()[i]->getName());
}
}
}
}
TEST_CASE("Test and empty Node", "[Network]")

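The new `Network` sections make three requirements:

- Copy construction and assignment produce an independent network whose parent/child links are rebuilt by name.
- Self-assignment leaves the network intact.
- `operator==` compares topology regardless of the order in which edges were added.

The sketch below shows one way to meet all three with owned nodes. `MiniNetwork` and `MiniNode` are hypothetical stand-ins, not bayesnet's `Network` and `Node`. Using copy-and-swap for the assignment operator is our own design choice.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <set>
#include <string>
#include <utility>
#include <vector>

struct MiniNode {
    std::string name;
    std::vector<MiniNode*> parents;   // non-owning links into the same network
    std::vector<MiniNode*> children;
};

class MiniNetwork {
public:
    MiniNetwork() = default;

    // Deep copy: clone every node, then rebuild the links by name in the new
    // map, so no pointer refers back into `other`.
    MiniNetwork(const MiniNetwork& other)
    {
        for (const auto& entry : other.nodes_) {
            addNode(entry.first);
        }
        for (const auto& edge : other.edges()) {
            addEdge(edge.first, edge.second);
        }
    }

    // Copy-and-swap: `other` is already a copy, so self-assignment is safe.
    MiniNetwork& operator=(MiniNetwork other)
    {
        std::swap(nodes_, other.nodes_);
        return *this;
    }

    void addNode(const std::string& name)
    {
        nodes_.emplace(name, std::make_unique<MiniNode>(MiniNode{ name, {}, {} }));
    }

    void addEdge(const std::string& from, const std::string& to)
    {
        MiniNode* parent = nodes_.at(from).get();
        MiniNode* child = nodes_.at(to).get();
        parent->children.push_back(child);
        child->parents.push_back(parent);
    }

    // Edges as an ordered set of (parent, child) names.
    std::set<std::pair<std::string, std::string>> edges() const
    {
        std::set<std::pair<std::string, std::string>> result;
        for (const auto& entry : nodes_) {
            for (const MiniNode* child : entry.second->children) {
                result.emplace(entry.first, child->name);
            }
        }
        return result;
    }

    std::size_t size() const { return nodes_.size(); }

    // Topology equality: same node names and same edge set. Both are compared
    // as sorted containers, so insertion order does not matter.
    bool operator==(const MiniNetwork& other) const
    {
        if (nodes_.size() != other.nodes_.size()) {
            return false;
        }
        for (auto a = nodes_.begin(), b = other.nodes_.begin(); a != nodes_.end(); ++a, ++b) {
            if (a->first != b->first) {
                return false;
            }
        }
        return edges() == other.edges();
    }

private:
    std::map<std::string, std::unique_ptr<MiniNode>> nodes_;
};
```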

@@ -159,3 +159,47 @@ TEST_CASE("TEST MinFill method", "[Node]")
REQUIRE(node_3.minFill() == 3);
REQUIRE(node_4.minFill() == 1);
}
TEST_CASE("Test operator =", "[Node]")
{
// Test the copy assignment operator of the Node class
// Create a node with one parent and one child
auto node = bayesnet::Node("N1");
auto parent_1 = bayesnet::Node("P1");
parent_1.setNumStates(3);
auto child_1 = bayesnet::Node("H1");
child_1.setNumStates(2);
node.addParent(&parent_1);
node.addChild(&child_1);
// Create a cpt in the node using computeCPT
auto dataset = torch::tensor({ {1, 0, 0, 1}, {0, 1, 2, 1}, {0, 1, 1, 0} });
auto states = std::vector<int>({ 2, 3, 3 });
auto features = std::vector<std::string>{ "N1", "P1", "H1" };
auto className = std::string("Class");
auto weights = torch::tensor({ 1.0, 1.0, 1.0, 1.0 }, torch::kDouble);
node.setNumStates(2);
node.computeCPT(dataset, features, 0.0, weights);
// Get the cpt of the node
auto cpt = node.getCPT();
// Check that the cpt is not empty
REQUIRE(cpt.numel() > 0);
// Check that the cpt has the correct dimensions
auto dimensions = cpt.sizes();
REQUIRE(dimensions.size() == 2);
REQUIRE(dimensions[0] == 2); // Number of states of the node
REQUIRE(dimensions[1] == 3); // Number of states of the first parent
// Create a copy of the node
bayesnet::Node node_copy("XX");
node_copy = node;
// Check that the copy has no parents or children (links are not copied)
auto parents = node_copy.getParents();
auto children = node_copy.getChildren();
REQUIRE(parents.size() == 0);
REQUIRE(children.size() == 0);
// Check that the copy has the same name
REQUIRE(node_copy.getName() == "N1");
// Check that the copy has the same cpt
auto cpt_copy = node_copy.getCPT();
REQUIRE(cpt_copy.equal(cpt));
// Check that the copy has the same number of states
REQUIRE(node_copy.getNumStates() == node.getNumStates());
}

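`Test operator =` pins down a specific contract. Assigning a `Node` copies its name, number of states and CPT, but the target ends up with no parents or children, since those are pointers into another network. The sketch below shows that contract with a hypothetical `SketchNode`. Its CPT is a flat `std::vector<double>` rather than a `torch::Tensor`, and its copy constructor is assumed to follow the same rule.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for bayesnet::Node; the CPT is flattened to a vector.
class SketchNode {
public:
    explicit SketchNode(std::string name) : name_(std::move(name)) {}

    // Copying takes the node's own data but none of its links: parents and
    // children point at nodes owned by another network.
    SketchNode(const SketchNode& other)
        : name_(other.name_), numStates_(other.numStates_), cpt_(other.cpt_) {}

    SketchNode& operator=(const SketchNode& other)
    {
        if (this != &other) {
            name_ = other.name_;
            numStates_ = other.numStates_;
            cpt_ = other.cpt_;
            parents_.clear();   // links must be rebuilt by the owning network
            children_.clear();
        }
        return *this;
    }

    void addParent(SketchNode* parent) { parents_.push_back(parent); }
    void addChild(SketchNode* child) { children_.push_back(child); }
    void setNumStates(int states) { numStates_ = states; }
    void setCPT(std::vector<double> cpt) { cpt_ = std::move(cpt); }

    const std::string& getName() const { return name_; }
    int getNumStates() const { return numStates_; }
    const std::vector<double>& getCPT() const { return cpt_; }
    const std::vector<SketchNode*>& getParents() const { return parents_; }
    const std::vector<SketchNode*>& getChildren() const { return children_; }

private:
    std::string name_;
    int numStates_ = 0;
    std::vector<double> cpt_;
    std::vector<SketchNode*> parents_;
    std::vector<SketchNode*> children_;
};
```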

@@ -33,13 +33,11 @@ TEST_CASE("Feature_select IWSS", "[BoostA2DE]")
auto clf = bayesnet::BoostA2DE();
clf.setHyperparameters({ {"select_features", "IWSS"}, {"threshold", 0.5 } });
clf.fit(raw.Xv, raw.yv, raw.features, raw.className, raw.states, raw.smoothing);
REQUIRE(clf.getNumberOfNodes() == 140);
REQUIRE(clf.getNumberOfEdges() == 294);
REQUIRE(clf.getNotes().size() == 4);
REQUIRE(clf.getNotes()[0] == "Used features in initialization: 4 of 9 with IWSS");
REQUIRE(clf.getNotes()[1] == "Convergence threshold reached & 15 models eliminated");
REQUIRE(clf.getNotes()[2] == "Pairs not used in train: 2");
REQUIRE(clf.getNotes()[3] == "Number of models: 14");
REQUIRE(clf.getNumberOfNodes() == 360);
REQUIRE(clf.getNumberOfEdges() == 756);
REQUIRE(clf.getNotes().size() == 2);
REQUIRE(clf.getNotes()[0] == "Used features in initialization: 9 of 9 with IWSS");
REQUIRE(clf.getNotes()[1] == "Number of models: 36");
}
TEST_CASE("Feature_select FCBF", "[BoostA2DE]")
{
@@ -64,15 +62,15 @@ TEST_CASE("Test used features in train note and score", "[BoostA2DE]")
{"select_features","CFS"},
});
clf.fit(raw.Xv, raw.yv, raw.features, raw.className, raw.states, raw.smoothing);
REQUIRE(clf.getNumberOfNodes() == 144);
REQUIRE(clf.getNumberOfEdges() == 288);
REQUIRE(clf.getNumberOfNodes() == 189);
REQUIRE(clf.getNumberOfEdges() == 378);
REQUIRE(clf.getNotes().size() == 2);
REQUIRE(clf.getNotes()[0] == "Used features in initialization: 6 of 8 with CFS");
REQUIRE(clf.getNotes()[1] == "Number of models: 16");
REQUIRE(clf.getNotes()[0] == "Used features in initialization: 7 of 8 with CFS");
REQUIRE(clf.getNotes()[1] == "Number of models: 21");
auto score = clf.score(raw.Xv, raw.yv);
auto scoret = clf.score(raw.Xt, raw.yt);
REQUIRE(score == Catch::Approx(0.856771).epsilon(raw.epsilon));
REQUIRE(scoret == Catch::Approx(0.856771).epsilon(raw.epsilon));
REQUIRE(score == Catch::Approx(0.85546875f).epsilon(raw.epsilon));
REQUIRE(scoret == Catch::Approx(0.85546875f).epsilon(raw.epsilon));
}
TEST_CASE("Voting vs proba", "[BoostA2DE]")
{

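The BoostA2DE expectations in this file, both old and new, fit a simple counting rule. We infer the rule from the numbers in the diff, not from the library source. Each model has n + 1 nodes for n features. It has 3(n - 2) edges: the class and both super-parents each point into the other n - 2 features. When all pairs of the k features chosen at initialization are used, there are C(k, 2) models. For glass (n = k = 9), that gives 36 models, 360 nodes and 756 edges. For diabetes with 7 of 8 features, it gives 21 models, 189 nodes and 378 edges. The previous expectation of 16 models for 6 CFS features shows that the model count is not always C(k, 2). `pair_count` and `a2de_size` are hypothetical helpers.

```cpp
#include <cstdint>

struct EnsembleSize {
    std::int64_t nodes;
    std::int64_t edges;
};

// Number of unordered super-parent pairs among k selected features: C(k, 2).
std::int64_t pair_count(std::int64_t k)
{
    return k * (k - 1) / 2;
}

// Nodes and edges of `models` A2DE sub-models over n features, assuming each
// model holds n + 1 nodes (features plus class) and 3 * (n - 2) edges.
EnsembleSize a2de_size(std::int64_t models, std::int64_t n)
{
    return { models * (n + 1), models * 3 * (n - 2) };
}
```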

@@ -11,32 +11,35 @@
#include "TestUtils.h"
#include "bayesnet/ensembles/BoostAODE.h"
TEST_CASE("Feature_select CFS", "[BoostAODE]") {
TEST_CASE("Feature_select CFS", "[BoostAODE]")
{
auto raw = RawDatasets("glass", true);
auto clf = bayesnet::BoostAODE();
clf.setHyperparameters({{"select_features", "CFS"}});
clf.setHyperparameters({ {"select_features", "CFS"} });
clf.fit(raw.Xv, raw.yv, raw.features, raw.className, raw.states, raw.smoothing);
REQUIRE(clf.getNumberOfNodes() == 90);
REQUIRE(clf.getNumberOfEdges() == 153);
REQUIRE(clf.getNotes().size() == 2);
REQUIRE(clf.getNotes()[0] == "Used features in initialization: 6 of 9 with CFS");
REQUIRE(clf.getNotes()[0] == "Used features in initialization: 9 of 9 with CFS");
REQUIRE(clf.getNotes()[1] == "Number of models: 9");
}
TEST_CASE("Feature_select IWSS", "[BoostAODE]") {
TEST_CASE("Feature_select IWSS", "[BoostAODE]")
{
auto raw = RawDatasets("glass", true);
auto clf = bayesnet::BoostAODE();
clf.setHyperparameters({{"select_features", "IWSS"}, {"threshold", 0.5}});
clf.setHyperparameters({ {"select_features", "IWSS"}, {"threshold", 0.5} });
clf.fit(raw.Xv, raw.yv, raw.features, raw.className, raw.states, raw.smoothing);
REQUIRE(clf.getNumberOfNodes() == 90);
REQUIRE(clf.getNumberOfEdges() == 153);
REQUIRE(clf.getNotes().size() == 2);
REQUIRE(clf.getNotes()[0] == "Used features in initialization: 4 of 9 with IWSS");
REQUIRE(clf.getNotes()[0] == "Used features in initialization: 9 of 9 with IWSS");
REQUIRE(clf.getNotes()[1] == "Number of models: 9");
}
TEST_CASE("Feature_select FCBF", "[BoostAODE]") {
TEST_CASE("Feature_select FCBF", "[BoostAODE]")
{
auto raw = RawDatasets("glass", true);
auto clf = bayesnet::BoostAODE();
clf.setHyperparameters({{"select_features", "FCBF"}, {"threshold", 1e-7}});
clf.setHyperparameters({ {"select_features", "FCBF"}, {"threshold", 1e-7} });
clf.fit(raw.Xv, raw.yv, raw.features, raw.className, raw.states, raw.smoothing);
REQUIRE(clf.getNumberOfNodes() == 90);
REQUIRE(clf.getNumberOfEdges() == 153);
@@ -44,26 +47,28 @@ TEST_CASE("Feature_select FCBF", "[BoostAODE]") {
REQUIRE(clf.getNotes()[0] == "Used features in initialization: 4 of 9 with FCBF");
REQUIRE(clf.getNotes()[1] == "Number of models: 9");
}
TEST_CASE("Test used features in train note and score", "[BoostAODE]") {
TEST_CASE("Test used features in train note and score", "[BoostAODE]")
{
auto raw = RawDatasets("diabetes", true);
auto clf = bayesnet::BoostAODE(true);
clf.setHyperparameters({
{"order", "asc"},
{"convergence", true},
{"select_features", "CFS"},
});
});
clf.fit(raw.Xv, raw.yv, raw.features, raw.className, raw.states, raw.smoothing);
REQUIRE(clf.getNumberOfNodes() == 72);
REQUIRE(clf.getNumberOfEdges() == 120);
REQUIRE(clf.getNotes().size() == 2);
REQUIRE(clf.getNotes()[0] == "Used features in initialization: 6 of 8 with CFS");
REQUIRE(clf.getNotes()[0] == "Used features in initialization: 7 of 8 with CFS");
REQUIRE(clf.getNotes()[1] == "Number of models: 8");
auto score = clf.score(raw.Xv, raw.yv);
auto scoret = clf.score(raw.Xt, raw.yt);
REQUIRE(score == Catch::Approx(0.809895813).epsilon(raw.epsilon));
REQUIRE(scoret == Catch::Approx(0.809895813).epsilon(raw.epsilon));
REQUIRE(score == Catch::Approx(0.8046875f).epsilon(raw.epsilon));
REQUIRE(scoret == Catch::Approx(0.8046875f).epsilon(raw.epsilon));
}
TEST_CASE("Voting vs proba", "[BoostAODE]") {
TEST_CASE("Voting vs proba", "[BoostAODE]")
{
auto raw = RawDatasets("iris", true);
auto clf = bayesnet::BoostAODE(false);
clf.fit(raw.Xv, raw.yv, raw.features, raw.className, raw.states, raw.smoothing);
@@ -71,7 +76,7 @@ TEST_CASE("Voting vs proba", "[BoostAODE]") {
auto pred_proba = clf.predict_proba(raw.Xv);
clf.setHyperparameters({
{"predict_voting", true},
});
});
auto score_voting = clf.score(raw.Xv, raw.yv);
auto pred_voting = clf.predict_proba(raw.Xv);
REQUIRE(score_proba == Catch::Approx(0.97333).epsilon(raw.epsilon));
@@ -81,17 +86,18 @@ TEST_CASE("Voting vs proba", "[BoostAODE]") {
REQUIRE(clf.dump_cpt().size() == 7004);
REQUIRE(clf.topological_order() == std::vector<std::string>());
}
TEST_CASE("Order asc, desc & random", "[BoostAODE]") {
TEST_CASE("Order asc, desc & random", "[BoostAODE]")
{
auto raw = RawDatasets("glass", true);
std::map<std::string, double> scores{{"asc", 0.83645f}, {"desc", 0.84579f}, {"rand", 0.84112}};
for (const std::string &order : {"asc", "desc", "rand"}) {
std::map<std::string, double> scores{ {"asc", 0.83645f}, {"desc", 0.84579f}, {"rand", 0.84112} };
for (const std::string& order : { "asc", "desc", "rand" }) {
auto clf = bayesnet::BoostAODE();
clf.setHyperparameters({
{"order", order},
{"bisection", false},
{"maxTolerance", 1},
{"convergence", false},
});
});
clf.fit(raw.Xv, raw.yv, raw.features, raw.className, raw.states, raw.smoothing);
auto score = clf.score(raw.Xv, raw.yv);
auto scoret = clf.score(raw.Xt, raw.yt);
@@ -100,7 +106,8 @@ TEST_CASE("Order asc, desc & random", "[BoostAODE]") {
REQUIRE(scoret == Catch::Approx(scores[order]).epsilon(raw.epsilon));
}
}
TEST_CASE("Oddities", "[BoostAODE]") {
TEST_CASE("Oddities", "[BoostAODE]")
{
auto clf = bayesnet::BoostAODE();
auto raw = RawDatasets("iris", true);
auto bad_hyper = nlohmann::json{
@@ -109,34 +116,35 @@ TEST_CASE("Oddities", "[BoostAODE]") {
{{"maxTolerance", 0}},
{{"maxTolerance", 7}},
};
for (const auto &hyper : bad_hyper.items()) {
for (const auto& hyper : bad_hyper.items()) {
INFO("BoostAODE hyper: " << hyper.value().dump());
REQUIRE_THROWS_AS(clf.setHyperparameters(hyper.value()), std::invalid_argument);
}
REQUIRE_THROWS_AS(clf.setHyperparameters({{"maxTolerance", 0}}), std::invalid_argument);
REQUIRE_THROWS_AS(clf.setHyperparameters({ {"maxTolerance", 0} }), std::invalid_argument);
auto bad_hyper_fit = nlohmann::json{
{{"select_features", "IWSS"}, {"threshold", -0.01}},
{{"select_features", "IWSS"}, {"threshold", 0.51}},
{{"select_features", "FCBF"}, {"threshold", 1e-8}},
{{"select_features", "FCBF"}, {"threshold", 1.01}},
};
for (const auto &hyper : bad_hyper_fit.items()) {
for (const auto& hyper : bad_hyper_fit.items()) {
INFO("BoostAODE hyper: " << hyper.value().dump());
clf.setHyperparameters(hyper.value());
REQUIRE_THROWS_AS(clf.fit(raw.Xv, raw.yv, raw.features, raw.className, raw.states, raw.smoothing),
std::invalid_argument);
std::invalid_argument);
}
auto bad_hyper_fit2 = nlohmann::json{
{{"alpha_block", true}, {"block_update", true}},
{{"bisection", false}, {"block_update", true}},
};
for (const auto &hyper : bad_hyper_fit2.items()) {
for (const auto& hyper : bad_hyper_fit2.items()) {
INFO("BoostAODE hyper: " << hyper.value().dump());
REQUIRE_THROWS_AS(clf.setHyperparameters(hyper.value()), std::invalid_argument);
}
}
TEST_CASE("Bisection Best", "[BoostAODE]") {
TEST_CASE("Bisection Best", "[BoostAODE]")
{
auto clf = bayesnet::BoostAODE();
auto raw = RawDatasets("kdd_JapaneseVowels", true, 1200, true, false);
clf.setHyperparameters({
@@ -145,7 +153,7 @@ TEST_CASE("Bisection Best", "[BoostAODE]") {
{"convergence", true},
{"block_update", false},
{"convergence_best", false},
});
});
clf.fit(raw.X_train, raw.y_train, raw.features, raw.className, raw.states, raw.smoothing);
REQUIRE(clf.getNumberOfNodes() == 210);
REQUIRE(clf.getNumberOfEdges() == 378);
@@ -156,7 +164,8 @@ TEST_CASE("Bisection Best", "[BoostAODE]") {
REQUIRE(score == Catch::Approx(0.991666675f).epsilon(raw.epsilon));
REQUIRE(scoret == Catch::Approx(0.991666675f).epsilon(raw.epsilon));
}
TEST_CASE("Bisection Best vs Last", "[BoostAODE]") {
TEST_CASE("Bisection Best vs Last", "[BoostAODE]")
{
auto raw = RawDatasets("kdd_JapaneseVowels", true, 1500, true, false);
auto clf = bayesnet::BoostAODE(true);
auto hyperparameters = nlohmann::json{
@@ -176,7 +185,8 @@ TEST_CASE("Bisection Best vs Last", "[BoostAODE]") {
auto score_last = clf.score(raw.X_test, raw.y_test);
REQUIRE(score_last == Catch::Approx(0.976666689f).epsilon(raw.epsilon));
}
TEST_CASE("Block Update", "[BoostAODE]") {
TEST_CASE("Block Update", "[BoostAODE]")
{
auto clf = bayesnet::BoostAODE();
auto raw = RawDatasets("mfeat-factors", true, 500);
clf.setHyperparameters({
@@ -184,7 +194,7 @@ TEST_CASE("Block Update", "[BoostAODE]") {
{"block_update", true},
{"maxTolerance", 3},
{"convergence", true},
});
});
clf.fit(raw.X_train, raw.y_train, raw.features, raw.className, raw.states, raw.smoothing);
REQUIRE(clf.getNumberOfNodes() == 868);
REQUIRE(clf.getNumberOfEdges() == 1724);
@@ -205,13 +215,14 @@ TEST_CASE("Block Update", "[BoostAODE]") {
// }
// std::cout << "Score " << score << std::endl;
}
TEST_CASE("Alphablock", "[BoostAODE]") {
TEST_CASE("Alphablock", "[BoostAODE]")
{
auto clf_alpha = bayesnet::BoostAODE();
auto clf_no_alpha = bayesnet::BoostAODE();
auto raw = RawDatasets("diabetes", true);
clf_alpha.setHyperparameters({
{"alpha_block", true},
});
});
clf_alpha.fit(raw.X_train, raw.y_train, raw.features, raw.className, raw.states, raw.smoothing);
clf_no_alpha.fit(raw.X_train, raw.y_train, raw.features, raw.className, raw.states, raw.smoothing);
auto score_alpha = clf_alpha.score(raw.X_test, raw.y_test);

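The BoostAODE node and edge expectations follow a similar one-super-parent rule, again inferred from this diff. Each SPODE has n + 1 nodes and 2n - 1 edges: the class points into every feature (n edges), and the super-parent points into the other n - 1. Glass (n = 9, 9 models) gives 90 nodes and 153 edges. Diabetes (n = 8, 8 models) gives 72 nodes and 120 edges. `spode_size` is a hypothetical helper for checking such expectations by hand.

```cpp
#include <cstdint>
#include <utility>

// (nodes, edges) of `models` SPODE sub-models over n features, assuming each
// model has n + 1 nodes and 2n - 1 edges: class -> every feature (n edges)
// plus super-parent -> every other feature (n - 1 edges).
std::pair<std::int64_t, std::int64_t> spode_size(std::int64_t models, std::int64_t n)
{
    return { models * (n + 1), models * (2 * n - 1) };
}
```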

@@ -12,6 +12,7 @@
#include "bayesnet/feature_selection/CFS.h"
#include "bayesnet/feature_selection/FCBF.h"
#include "bayesnet/feature_selection/IWSS.h"
#include "bayesnet/feature_selection/L1FS.h"
#include "TestUtils.h"
bayesnet::FeatureSelect* build_selector(RawDatasets& raw, std::string selector, double threshold, int max_features = 0)
@@ -23,6 +24,9 @@ bayesnet::FeatureSelect* build_selector(RawDatasets& raw, std::string selector,
return new bayesnet::FCBF(raw.dataset, raw.features, raw.className, max_features, raw.classNumStates, raw.weights, threshold);
} else if (selector == "IWSS") {
return new bayesnet::IWSS(raw.dataset, raw.features, raw.className, max_features, raw.classNumStates, raw.weights, threshold);
} else if (selector == "L1FS") {
// For L1FS, threshold is used as alpha parameter
return new bayesnet::L1FS(raw.dataset, raw.features, raw.className, max_features, raw.classNumStates, raw.weights, threshold);
}
return nullptr;
}
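`build_selector` returns a raw owning pointer, or `nullptr` for an unknown name, so every test must remember to call `delete`. If a `REQUIRE` aborts the test case first, the selector leaks. A variant that returns `std::unique_ptr` frees the selector automatically. The sketch below uses hypothetical `Selector` classes rather than the bayesnet constructors shown above.

```cpp
#include <memory>
#include <string>

// Hypothetical stand-ins for bayesnet::FeatureSelect and two subclasses.
struct Selector {
    virtual ~Selector() = default;
    virtual std::string name() const = 0;
};
struct CfsSelector : Selector {
    std::string name() const override { return "CFS"; }
};
struct L1Selector : Selector {
    explicit L1Selector(double a) : alpha(a) {}
    std::string name() const override { return "L1FS"; }
    double alpha;  // the threshold argument is reused as alpha, as in build_selector
};

// Owning factory: the caller never calls delete; an unknown name yields nullptr.
std::unique_ptr<Selector> make_selector(const std::string& kind, double threshold)
{
    if (kind == "CFS") {
        return std::make_unique<CfsSelector>();
    }
    if (kind == "L1FS") {
        return std::make_unique<L1Selector>(threshold);
    }
    return nullptr;
}
```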
@@ -36,25 +40,30 @@ TEST_CASE("Features Selected", "[FeatureSelection]")
SECTION("Test features selected, scores and sizes")
{
map<pair<std::string, std::string>, pair<std::vector<int>, std::vector<double>>> results = {
{ {"glass", "CFS"}, { { 2, 3, 6, 1, 8, 4 }, {0.365513, 0.42895, 0.369809, 0.298294, 0.240952, 0.200915} } },
{ {"iris", "CFS"}, { { 3, 2, 1, 0 }, {0.870521, 0.890375, 0.588155, 0.41843} } },
{ {"ecoli", "CFS"}, { { 5, 0, 4, 2, 1, 6 }, {0.512319, 0.565381, 0.486025, 0.41087, 0.331423, 0.266251} } },
{ {"diabetes", "CFS"}, { { 1, 5, 7, 6, 4, 2 }, {0.132858, 0.151209, 0.14244, 0.126591, 0.106028, 0.0825904} } },
{ {"glass", "IWSS" }, { { 2, 3, 5, 7, 6 }, {0.365513, 0.42895, 0.359907, 0.273784, 0.223346} } },
{ {"iris", "IWSS"}, { { 3, 2, 0 }, {0.870521, 0.890375, 0.585426} }},
{ {"ecoli", "IWSS"}, { { 5, 6, 0, 1, 4 }, {0.512319, 0.550978, 0.475025, 0.382607, 0.308203} } },
{ {"diabetes", "IWSS"}, { { 1, 5, 4, 7, 3 }, {0.132858, 0.151209, 0.136576, 0.122097, 0.0802232} } },
{ {"glass", "CFS"}, { { 2, 3, 5, 6, 7, 1, 0, 8, 4 }, {0.365513, 0.42895, 0.46186, 0.481897, 0.500943, 0.504027, 0.505625, 0.493256, 0.478226} } },
{ {"iris", "CFS"}, { { 3, 2, 0, 1 }, {0.870521, 0.890375, 0.84104719, 0.799310961} } },
{ {"ecoli", "CFS"}, { { 5, 0, 6, 1, 4, 2, 3 }, {0.512319, 0.565381, 0.61824, 0.637094, 0.637759, 0.633802, 0.598266} } },
{ {"diabetes", "CFS"}, { { 1, 5, 7, 4, 6, 0 }, {0.132858, 0.151209, 0.148887, 0.14862, 0.142902, 0.137233} } },
{ {"glass", "IWSS" }, { { 2, 3, 5, 7, 6, 1, 0, 8, 4 }, {0.365513, 0.42895, 0.46186, 0.479866, 0.500943, 0.504027, 0.505625, 0.493256, 0.478226} } },
{ {"iris", "IWSS"}, { { 3, 2, 0 }, {0.870521, 0.890375, 0.841047} }},
{ {"ecoli", "IWSS"}, { { 5, 0, 6, 1, 4, 2, 3}, {0.512319, 0.565381, 0.61824, 0.637094, 0.637759, 0.633802, 0.598266} } },
{ {"diabetes", "IWSS"}, { { 1, 5, 4, 7, 3 }, {0.132858, 0.151209, 0.146771, 0.14862, 0.136493,} } },
{ {"glass", "FCBF" }, { { 2, 3, 5, 7, 6 }, {0.365513, 0.304911, 0.302109, 0.281621, 0.253297} } },
{ {"iris", "FCBF"}, {{ 3, 2 }, {0.870521, 0.816401} }},
{ {"ecoli", "FCBF"}, {{ 5, 0, 1, 4, 2 }, {0.512319, 0.350406, 0.260905, 0.203132, 0.11229} }},
{ {"diabetes", "FCBF"}, {{ 1, 5, 7, 6 }, {0.132858, 0.083191, 0.0480135, 0.0224186} }}
{ {"diabetes", "FCBF"}, {{ 1, 5, 7, 6 }, {0.132858, 0.083191, 0.0480135, 0.0224186} }},
{ {"glass", "L1FS" }, { { 2, 3, 5}, { 0.365513, 0.304911, 0.302109 } } },
{ {"iris", "L1FS"}, {{ 3, 2, 1, 0 }, { 0.570928, 0.37569, 0.0774792, 0.00835904 }}},
{ {"ecoli", "L1FS"}, {{ 0, 1, 6, 5, 2, 3 }, {0.490179, 0.365944, 0.291177, 0.199171, 0.0400928, 0.0192575} }},
{ {"diabetes", "L1FS"}, {{ 1, 5, 4 }, {0.132858, 0.083191, 0.0486187} }}
};
double threshold;
std::string selector;
std::vector<std::pair<std::string, double>> selectors = {
{ "CFS", 0.0 },
{ "IWSS", 0.5 },
{ "FCBF", 1e-7 }
{ "IWSS", 0.1 },
{ "FCBF", 1e-7 },
{ "L1FS", 0.01 }
};
for (const auto& item : selectors) {
selector = item.first; threshold = item.second;
@@ -76,17 +85,144 @@ TEST_CASE("Features Selected", "[FeatureSelection]")
delete featureSelector;
}
}
SECTION("Test L1FS")
{
bayesnet::L1FS* featureSelector = new bayesnet::L1FS(
raw.dataset, raw.features, raw.className,
raw.features.size(), raw.classNumStates, raw.weights,
0.01, 1000, 1e-4, true
);
featureSelector->fit();
std::vector<int> selected_features = featureSelector->getFeatures();
std::vector<double> selected_scores = featureSelector->getScores();
// Check if features are selected
REQUIRE(selected_features.size() > 0);
REQUIRE(selected_scores.size() == selected_features.size());
// Scores should be non-negative (absolute coefficient values)
for (double score : selected_scores) {
REQUIRE(score >= 0.0);
}
// Scores should be in descending order
// std::cout << file_name << " " << selected_features << std::endl << "{";
for (size_t i = 1; i < selected_scores.size(); i++) {
// std::cout << selected_scores[i - 1] << ", ";
REQUIRE(selected_scores[i - 1] >= selected_scores[i]);
}
// std::cout << selected_scores[selected_scores.size() - 1];
// std::cout << "}" << std::endl;
delete featureSelector;
}
}
TEST_CASE("L1FS Features Selected", "[FeatureSelection]")
{
auto raw = RawDatasets("ecoli", true);
SECTION("Test L1FS with different alpha values")
{
std::vector<double> alphas = { 0.01, 0.1, 0.5 };
for (double alpha : alphas) {
bayesnet::L1FS* featureSelector = new bayesnet::L1FS(
raw.dataset, raw.features, raw.className,
raw.features.size(), raw.classNumStates, raw.weights,
alpha, 1000, 1e-4, true
);
featureSelector->fit();
INFO("Alpha: " << alpha);
std::vector<int> selected_features = featureSelector->getFeatures();
std::vector<double> selected_scores = featureSelector->getScores();
// Higher alpha tends to select fewer features; only basic invariants are checked here
REQUIRE(selected_features.size() > 0);
REQUIRE(selected_features.size() <= raw.features.size());
REQUIRE(selected_scores.size() == selected_features.size());
// Scores should be non-negative (absolute coefficient values)
for (double score : selected_scores) {
REQUIRE(score >= 0.0);
}
// Scores should be in descending order
for (size_t i = 1; i < selected_scores.size(); i++) {
REQUIRE(selected_scores[i - 1] >= selected_scores[i]);
}
delete featureSelector;
}
}
SECTION("Test L1FS with max features limit")
{
int max_features = 2;
bayesnet::L1FS* featureSelector = new bayesnet::L1FS(
raw.dataset, raw.features, raw.className,
max_features, raw.classNumStates, raw.weights,
0.1, 1000, 1e-4, true
);
featureSelector->fit();
std::vector<int> selected_features = featureSelector->getFeatures();
REQUIRE(selected_features.size() <= max_features);
delete featureSelector;
}
SECTION("Test L1FS getCoefficients method")
{
bayesnet::L1FS* featureSelector = new bayesnet::L1FS(
raw.dataset, raw.features, raw.className,
raw.features.size(), raw.classNumStates, raw.weights,
0.1, 1000, 1e-4, true
);
// Should throw before fitting
REQUIRE_THROWS_AS(featureSelector->getCoefficients(), std::runtime_error);
REQUIRE_THROWS_WITH(featureSelector->getCoefficients(), "L1FS not fitted");
featureSelector->fit();
// Should work after fitting
auto coefficients = featureSelector->getCoefficients();
REQUIRE(coefficients.size() == raw.features.size());
delete featureSelector;
}
}
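The L1FS tests check four properties:

- Selected scores are non-negative absolute coefficients.
- Scores are sorted in descending order.
- Results are capped by `max_features`.
- A negative alpha is rejected.

The sketch below illustrates the idea in a self-contained way. It fits an L1-penalised linear regression by cyclic coordinate descent with soft-thresholding and keeps the features whose coefficients are non-zero. This is not bayesnet's L1FS. The objective (squared loss, no intercept, no standardisation), the stopping rule and the name `l1_select` are our assumptions.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0).
double soft_threshold(double z, double t)
{
    if (z > t) return z - t;
    if (z < -t) return z + t;
    return 0.0;
}

// Minimise (1 / 2n) * ||y - X w||^2 + alpha * ||w||_1 by cyclic coordinate descent,
// then return (feature index, |w_j|) for non-zero weights, largest first,
// truncated to max_features (0 means no limit). X is row-major: X[i][j].
std::vector<std::pair<int, double>> l1_select(const std::vector<std::vector<double>>& X,
                                              const std::vector<double>& y, double alpha,
                                              std::size_t max_features = 0,
                                              int max_iter = 1000, double tol = 1e-4)
{
    if (alpha < 0.0) {
        throw std::invalid_argument("Alpha (regularization strength) must be non-negative");
    }
    const std::size_t n = X.size();
    const std::size_t p = n ? X[0].size() : 0;
    std::vector<double> w(p, 0.0);
    std::vector<double> residual = y;  // y - X w, with w = 0 initially
    for (int iter = 0; iter < max_iter; ++iter) {
        double max_change = 0.0;
        for (std::size_t j = 0; j < p; ++j) {
            double rho = 0.0, norm = 0.0;  // x_j . partial residual, x_j . x_j
            for (std::size_t i = 0; i < n; ++i) {
                rho += X[i][j] * (residual[i] + X[i][j] * w[j]);
                norm += X[i][j] * X[i][j];
            }
            if (norm == 0.0) continue;  // all-zero column: weight stays 0
            const double updated = soft_threshold(rho / n, alpha) / (norm / n);
            const double delta = updated - w[j];
            for (std::size_t i = 0; i < n; ++i) residual[i] -= X[i][j] * delta;
            w[j] = updated;
            max_change = std::max(max_change, std::fabs(delta));
        }
        if (max_change < tol) break;
    }
    std::vector<std::pair<int, double>> selected;
    for (std::size_t j = 0; j < p; ++j) {
        if (w[j] != 0.0) selected.emplace_back(static_cast<int>(j), std::fabs(w[j]));
    }
    std::stable_sort(selected.begin(), selected.end(),
                     [](const auto& a, const auto& b) { return a.second > b.second; });
    if (max_features > 0 && selected.size() > max_features) selected.resize(max_features);
    return selected;
}
```

With orthogonal, unit-scaled columns, each weight is simply its least-squares value shrunk toward zero by alpha. That makes it easy to see why raising alpha drops weak features. On correlated real data, this is a tendency rather than a guarantee.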
TEST_CASE("Oddities", "[FeatureSelection]")
{
auto raw = RawDatasets("iris", true);
// FCBF Limits
REQUIRE_THROWS_AS(bayesnet::FCBF(raw.dataset, raw.features, raw.className, raw.features.size(), raw.classNumStates, raw.weights, 1e-8), std::invalid_argument);
REQUIRE_THROWS_WITH(bayesnet::FCBF(raw.dataset, raw.features, raw.className, raw.features.size(), raw.classNumStates, raw.weights, 1e-8), "Threshold cannot be less than 1e-7");
// IWSS Limits
REQUIRE_THROWS_AS(bayesnet::IWSS(raw.dataset, raw.features, raw.className, raw.features.size(), raw.classNumStates, raw.weights, -1e4), std::invalid_argument);
REQUIRE_THROWS_WITH(bayesnet::IWSS(raw.dataset, raw.features, raw.className, raw.features.size(), raw.classNumStates, raw.weights, -1e4), "Threshold has to be in [0, 0.5]");
REQUIRE_THROWS_AS(bayesnet::IWSS(raw.dataset, raw.features, raw.className, raw.features.size(), raw.classNumStates, raw.weights, 0.501), std::invalid_argument);
REQUIRE_THROWS_WITH(bayesnet::IWSS(raw.dataset, raw.features, raw.className, raw.features.size(), raw.classNumStates, raw.weights, 0.501), "Threshold has to be in [0, 0.5]");
// L1FS Limits
REQUIRE_THROWS_AS(bayesnet::L1FS(raw.dataset, raw.features, raw.className, raw.features.size(), raw.classNumStates, raw.weights, -0.1), std::invalid_argument);
REQUIRE_THROWS_WITH(bayesnet::L1FS(raw.dataset, raw.features, raw.className, raw.features.size(), raw.classNumStates, raw.weights, -0.1), "Alpha (regularization strength) must be non-negative");
REQUIRE_THROWS_AS(bayesnet::L1FS(raw.dataset, raw.features, raw.className, raw.features.size(), raw.classNumStates, raw.weights, 1.0, 0), std::invalid_argument);
REQUIRE_THROWS_WITH(bayesnet::L1FS(raw.dataset, raw.features, raw.className, raw.features.size(), raw.classNumStates, raw.weights, 1.0, 0), "Maximum iterations must be positive");
REQUIRE_THROWS_AS(bayesnet::L1FS(raw.dataset, raw.features, raw.className, raw.features.size(), raw.classNumStates, raw.weights, 1.0, 1000, 0.0), std::invalid_argument);
REQUIRE_THROWS_WITH(bayesnet::L1FS(raw.dataset, raw.features, raw.className, raw.features.size(), raw.classNumStates, raw.weights, 1.0, 1000, 0.0), "Tolerance must be positive");
REQUIRE_THROWS_AS(bayesnet::L1FS(raw.dataset, raw.features, raw.className, raw.features.size(), raw.classNumStates, raw.weights, 1.0, 1000, -1e-4), std::invalid_argument);
REQUIRE_THROWS_WITH(bayesnet::L1FS(raw.dataset, raw.features, raw.className, raw.features.size(), raw.classNumStates, raw.weights, 1.0, 1000, -1e-4), "Tolerance must be positive");
// Not fitted error
auto selector = build_selector(raw, "CFS", 0);
const std::string message = "FeatureSelect not fitted";
@@ -96,6 +232,7 @@ TEST_CASE("Oddities", "[FeatureSelection]")
REQUIRE_THROWS_WITH(selector->getScores(), message);
delete selector;
}
TEST_CASE("Test threshold limits", "[FeatureSelection]")
{
auto raw = RawDatasets("diabetes", true);
@@ -112,4 +249,77 @@ TEST_CASE("Test threshold limits", "[FeatureSelection]")
selector->fit();
REQUIRE(selector->getFeatures().size() == 5);
delete selector;
// L1FS with different alpha values
selector = build_selector(raw, "L1FS", 0.01); // Low alpha - more features
selector->fit();
int num_features_low_alpha = selector->getFeatures().size();
delete selector;
selector = build_selector(raw, "L1FS", 0.9); // High alpha - fewer features
selector->fit();
int num_features_high_alpha = selector->getFeatures().size();
REQUIRE(num_features_high_alpha <= num_features_low_alpha);
delete selector;
// L1FS with max features limit
selector = build_selector(raw, "L1FS", 0.01, 4);
selector->fit();
REQUIRE(selector->getFeatures().size() <= 4);
delete selector;
}
TEST_CASE("L1FS Regression vs Classification", "[FeatureSelection]")
{
SECTION("Regression Task")
{
auto raw = RawDatasets("diabetes", true);
// diabetes dataset should be treated as regression (classNumStates > 2)
bayesnet::L1FS* l1fs = new bayesnet::L1FS(
raw.dataset, raw.features, raw.className,
raw.features.size(), raw.classNumStates, raw.weights,
0.1, 1000, 1e-4, true
);
l1fs->fit();
auto features = l1fs->getFeatures();
REQUIRE(features.size() > 0);
delete l1fs;
}
SECTION("Binary Classification Task")
{
// Create a simple binary classification dataset
int n_samples = 100;
int n_features = 5;
torch::Tensor X = torch::randn({ n_features, n_samples });
torch::Tensor y = (X[0] + X[2] > 0).to(torch::kFloat32);
torch::Tensor samples = torch::cat({ X, y.unsqueeze(0) }, 0);
std::vector<std::string> features;
for (int i = 0; i < n_features; ++i) {
features.push_back("feature_" + std::to_string(i));
}
torch::Tensor weights = torch::ones({ n_samples });
bayesnet::L1FS* l1fs = new bayesnet::L1FS(
samples, features, "target",
n_features, 2, weights, // 2 states = binary classification
0.1, 1000, 1e-4, true
);
l1fs->fit();
auto selected_features = l1fs->getFeatures();
REQUIRE(selected_features.size() > 0);
// Features 0 and 2 should be among the top selected
bool has_feature_0 = std::find(selected_features.begin(), selected_features.end(), 0) != selected_features.end();
bool has_feature_2 = std::find(selected_features.begin(), selected_features.end(), 2) != selected_features.end();
REQUIRE((has_feature_0 || has_feature_2));
delete l1fs;
}
}

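The `Test threshold limits` case expects `L1FS` with alpha 0.9 to keep no more features than the same selector with alpha 0.01. The sketch below shows why, under two assumptions. The first is a lasso objective of ½‖y − Xw‖² + α‖w‖₁. The second is an orthonormal design (XᵀX = I). Under those conditions, the minimizer is the soft-thresholded least-squares coefficient vector. Every coefficient whose magnitude is at most α is set to exactly zero, so raising α can only shrink the set of surviving features. This illustrates L1 sparsity in general and is not the solver `L1FS` actually uses. The names `soft_threshold` and `count_selected` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Soft-thresholding operator S(z, alpha) = sign(z) * max(|z| - alpha, 0),
// i.e. the proximal operator of the penalty alpha * |w|.
double soft_threshold(double z, double alpha)
{
    if (z > alpha) return z - alpha;
    if (z < -alpha) return z + alpha;
    return 0.0; // |z| <= alpha: the coefficient is zeroed out
}

// Hypothetical helper: with an orthonormal design, the lasso solution is the
// soft-thresholded least-squares solution, so the number of selected features
// is the number of coefficients that survive thresholding at this alpha.
std::size_t count_selected(const std::vector<double>& ols_coefficients, double alpha)
{
    std::size_t selected = 0;
    for (double z : ols_coefficients) {
        if (soft_threshold(z, alpha) != 0.0) {
            ++selected;
        }
    }
    return selected;
}
```

For example, take the least-squares coefficients {0.8, −0.05, 0.3, 0.0, −0.6}. An alpha of 0.01 keeps four features, and an alpha of 0.9 keeps none. This mirrors the `num_features_high_alpha <= num_features_low_alpha` expectation.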

@@ -16,10 +16,10 @@
#include "TestUtils.h"
std::map<std::string, std::string> modules = {
{ "mdlp", "2.0.1" },
{ "Folding", "1.1.1" },
{ "json", "3.12" },
{ "ArffFiles", "1.1.0" }
{ "mdlp", "2.1.1" },
{ "Folding", "1.1.2" },
{ "json", "3.11" },
{ "ArffFiles", "1.2.1" }
};
TEST_CASE("MDLP", "[Modules]")

@@ -11,7 +11,7 @@
#include <vector>
#include <map>
#include <tuple>
#include <ArffFiles/ArffFiles.hpp>
#include <ArffFiles.hpp>
#include <fimdlp/CPPFImdlp.h>
#include <folding.hpp>
#include <bayesnet/network/Network.h>

@@ -11,7 +11,8 @@
#include "TestUtils.h"
#include "bayesnet/ensembles/XBA2DE.h"
TEST_CASE("Normal test", "[XBA2DE]") {
TEST_CASE("Normal test", "[XBA2DE]")
{
auto raw = RawDatasets("iris", true);
auto clf = bayesnet::XBA2DE();
clf.fit(raw.Xv, raw.yv, raw.features, raw.className, raw.states, raw.smoothing);
@@ -25,37 +26,38 @@ TEST_CASE("Normal test", "[XBA2DE]") {
REQUIRE(clf.score(raw.X_test, raw.y_test) == Catch::Approx(1.0f));
REQUIRE(clf.graph().size() == 1);
}
TEST_CASE("Feature_select CFS", "[XBA2DE]") {
TEST_CASE("Feature_select CFS", "[XBA2DE]")
{
auto raw = RawDatasets("glass", true);
auto clf = bayesnet::XBA2DE();
clf.setHyperparameters({{"select_features", "CFS"}});
clf.setHyperparameters({ {"select_features", "CFS"} });
clf.fit(raw.Xv, raw.yv, raw.features, raw.className, raw.states, raw.smoothing);
REQUIRE(clf.getNumberOfNodes() == 220);
REQUIRE(clf.getNumberOfEdges() == 506);
REQUIRE(clf.getNumberOfNodes() == 360);
REQUIRE(clf.getNumberOfEdges() == 828);
REQUIRE(clf.getNotes().size() == 2);
REQUIRE(clf.getNotes()[0] == "Used features in initialization: 6 of 9 with CFS");
REQUIRE(clf.getNotes()[1] == "Number of models: 22");
REQUIRE(clf.getNotes()[0] == "Used features in initialization: 9 of 9 with CFS");
REQUIRE(clf.getNotes()[1] == "Number of models: 36");
REQUIRE(clf.score(raw.X_test, raw.y_test) == Catch::Approx(0.720930219));
}
TEST_CASE("Feature_select IWSS", "[XBA2DE]") {
TEST_CASE("Feature_select IWSS", "[XBA2DE]")
{
auto raw = RawDatasets("glass", true);
auto clf = bayesnet::XBA2DE();
clf.setHyperparameters({{"select_features", "IWSS"}, {"threshold", 0.5}});
clf.setHyperparameters({ {"select_features", "IWSS"}, {"threshold", 0.5} });
clf.fit(raw.Xv, raw.yv, raw.features, raw.className, raw.states, raw.smoothing);
REQUIRE(clf.getNumberOfNodes() == 220);
REQUIRE(clf.getNumberOfEdges() == 506);
REQUIRE(clf.getNotes().size() == 4);
REQUIRE(clf.getNotes()[0] == "Used features in initialization: 4 of 9 with IWSS");
REQUIRE(clf.getNotes()[1] == "Convergence threshold reached & 15 models eliminated");
REQUIRE(clf.getNotes()[2] == "Pairs not used in train: 2");
REQUIRE(clf.getNotes()[3] == "Number of models: 22");
REQUIRE(clf.getNumberOfStates() == 5346);
REQUIRE(clf.getNumberOfNodes() == 360);
REQUIRE(clf.getNumberOfEdges() == 828);
REQUIRE(clf.getNotes().size() == 2);
REQUIRE(clf.getNotes()[0] == "Used features in initialization: 9 of 9 with IWSS");
REQUIRE(clf.getNotes()[1] == "Number of models: 36");
REQUIRE(clf.getNumberOfStates() == 8748);
REQUIRE(clf.score(raw.X_test, raw.y_test) == Catch::Approx(0.72093));
}
TEST_CASE("Feature_select FCBF", "[XBA2DE]") {
TEST_CASE("Feature_select FCBF", "[XBA2DE]")
{
auto raw = RawDatasets("glass", true);
auto clf = bayesnet::XBA2DE();
clf.setHyperparameters({{"select_features", "FCBF"}, {"threshold", 1e-7}});
clf.setHyperparameters({ {"select_features", "FCBF"}, {"threshold", 1e-7} });
clf.fit(raw.Xv, raw.yv, raw.features, raw.className, raw.states, raw.smoothing);
REQUIRE(clf.getNumberOfNodes() == 290);
REQUIRE(clf.getNumberOfEdges() == 667);
@@ -66,37 +68,39 @@ TEST_CASE("Feature_select FCBF", "[XBA2DE]") {
REQUIRE(clf.getNotes()[2] == "Number of models: 29");
REQUIRE(clf.score(raw.X_test, raw.y_test) == Catch::Approx(0.744186));
}
TEST_CASE("Test used features in train note and score", "[XBA2DE]") {
TEST_CASE("Test used features in train note and score", "[XBA2DE]")
{
auto raw = RawDatasets("diabetes", true);
auto clf = bayesnet::XBA2DE();
clf.setHyperparameters({
{"order", "asc"},
{"convergence", true},
{"select_features", "CFS"},
});
});
clf.fit(raw.Xv, raw.yv, raw.features, raw.className, raw.states, raw.smoothing);
REQUIRE(clf.getNumberOfNodes() == 144);
REQUIRE(clf.getNumberOfEdges() == 320);
REQUIRE(clf.getNumberOfStates() == 5504);
REQUIRE(clf.getNumberOfNodes() == 189);
REQUIRE(clf.getNumberOfEdges() == 420);
REQUIRE(clf.getNumberOfStates() == 7224);
REQUIRE(clf.getNotes().size() == 2);
REQUIRE(clf.getNotes()[0] == "Used features in initialization: 6 of 8 with CFS");
REQUIRE(clf.getNotes()[1] == "Number of models: 16");
REQUIRE(clf.getNotes()[0] == "Used features in initialization: 7 of 8 with CFS");
REQUIRE(clf.getNotes()[1] == "Number of models: 21");
auto score = clf.score(raw.Xv, raw.yv);
auto scoret = clf.score(raw.Xt, raw.yt);
REQUIRE(score == Catch::Approx(0.850260437f).epsilon(raw.epsilon));
REQUIRE(scoret == Catch::Approx(0.850260437f).epsilon(raw.epsilon));
REQUIRE(score == Catch::Approx(0.854166687f).epsilon(raw.epsilon));
REQUIRE(scoret == Catch::Approx(0.854166687f).epsilon(raw.epsilon));
}
TEST_CASE("Order asc, desc & random", "[XBA2DE]") {
TEST_CASE("Order asc, desc & random", "[XBA2DE]")
{
auto raw = RawDatasets("glass", true);
std::map<std::string, double> scores{{"asc", 0.827103}, {"desc", 0.808411}, {"rand", 0.827103}};
for (const std::string &order : {"asc", "desc", "rand"}) {
std::map<std::string, double> scores{ {"asc", 0.827103}, {"desc", 0.808411}, {"rand", 0.827103} };
for (const std::string& order : { "asc", "desc", "rand" }) {
auto clf = bayesnet::XBA2DE();
clf.setHyperparameters({
{"order", order},
{"bisection", false},
{"maxTolerance", 1},
{"convergence", true},
});
});
clf.fit(raw.Xv, raw.yv, raw.features, raw.className, raw.states, raw.smoothing);
auto score = clf.score(raw.Xv, raw.yv);
auto scoret = clf.score(raw.Xt, raw.yt);
@@ -105,7 +109,8 @@ TEST_CASE("Order asc, desc & random", "[XBA2DE]") {
REQUIRE(scoret == Catch::Approx(scores[order]).epsilon(raw.epsilon));
}
}
TEST_CASE("Oddities", "[XBA2DE]") {
TEST_CASE("Oddities", "[XBA2DE]")
{
auto clf = bayesnet::XBA2DE();
auto raw = RawDatasets("iris", true);
auto bad_hyper = nlohmann::json{
@@ -114,28 +119,28 @@ TEST_CASE("Oddities", "[XBA2DE]") {
{{"maxTolerance", 0}},
{{"maxTolerance", 7}},
};
for (const auto &hyper : bad_hyper.items()) {
for (const auto& hyper : bad_hyper.items()) {
INFO("XBA2DE hyper: " << hyper.value().dump());
REQUIRE_THROWS_AS(clf.setHyperparameters(hyper.value()), std::invalid_argument);
}
REQUIRE_THROWS_AS(clf.setHyperparameters({{"maxTolerance", 0}}), std::invalid_argument);
REQUIRE_THROWS_AS(clf.setHyperparameters({ {"maxTolerance", 0} }), std::invalid_argument);
auto bad_hyper_fit = nlohmann::json{
{{"select_features", "IWSS"}, {"threshold", -0.01}},
{{"select_features", "IWSS"}, {"threshold", 0.51}},
{{"select_features", "FCBF"}, {"threshold", 1e-8}},
{{"select_features", "FCBF"}, {"threshold", 1.01}},
};
for (const auto &hyper : bad_hyper_fit.items()) {
for (const auto& hyper : bad_hyper_fit.items()) {
INFO("XBA2DE hyper: " << hyper.value().dump());
clf.setHyperparameters(hyper.value());
REQUIRE_THROWS_AS(clf.fit(raw.Xv, raw.yv, raw.features, raw.className, raw.states, raw.smoothing),
std::invalid_argument);
std::invalid_argument);
}
auto bad_hyper_fit2 = nlohmann::json{
{{"alpha_block", true}, {"block_update", true}},
{{"bisection", false}, {"block_update", true}},
};
for (const auto &hyper : bad_hyper_fit2.items()) {
for (const auto& hyper : bad_hyper_fit2.items()) {
INFO("XBA2DE hyper: " << hyper.value().dump());
REQUIRE_THROWS_AS(clf.setHyperparameters(hyper.value()), std::invalid_argument);
}
@@ -146,12 +151,13 @@ TEST_CASE("Oddities", "[XBA2DE]") {
raw.features.pop_back();
raw.features.pop_back();
raw.features.pop_back();
clf.setHyperparameters({{"select_features", "CFS"}, {"alpha_block", false}, {"block_update", false}});
clf.setHyperparameters({ {"select_features", "CFS"}, {"alpha_block", false}, {"block_update", false} });
clf.fit(raw.Xv, raw.yv, raw.features, raw.className, raw.states, raw.smoothing);
REQUIRE(clf.getNotes().size() == 1);
REQUIRE(clf.getNotes()[0] == "No features selected in initialization");
}
TEST_CASE("Bisection Best", "[XBA2DE]") {
TEST_CASE("Bisection Best", "[XBA2DE]")
{
auto clf = bayesnet::XBA2DE();
auto raw = RawDatasets("kdd_JapaneseVowels", true, 1200, true, false);
clf.setHyperparameters({
@@ -159,7 +165,7 @@ TEST_CASE("Bisection Best", "[XBA2DE]") {
{"maxTolerance", 3},
{"convergence", true},
{"convergence_best", false},
});
});
clf.fit(raw.X_train, raw.y_train, raw.features, raw.className, raw.states, raw.smoothing);
REQUIRE(clf.getNumberOfNodes() == 330);
REQUIRE(clf.getNumberOfEdges() == 836);
@@ -173,7 +179,8 @@ TEST_CASE("Bisection Best", "[XBA2DE]") {
REQUIRE(score == Catch::Approx(0.975).epsilon(raw.epsilon));
REQUIRE(scoret == Catch::Approx(0.975).epsilon(raw.epsilon));
}
TEST_CASE("Bisection Best vs Last", "[XBA2DE]") {
TEST_CASE("Bisection Best vs Last", "[XBA2DE]")
{
auto raw = RawDatasets("kdd_JapaneseVowels", true, 1500, true, false);
auto clf = bayesnet::XBA2DE();
auto hyperparameters = nlohmann::json{
@@ -193,7 +200,8 @@ TEST_CASE("Bisection Best vs Last", "[XBA2DE]") {
auto score_last = clf.score(raw.X_test, raw.y_test);
REQUIRE(score_last == Catch::Approx(0.99).epsilon(raw.epsilon));
}
TEST_CASE("Block Update", "[XBA2DE]") {
TEST_CASE("Block Update", "[XBA2DE]")
{
auto clf = bayesnet::XBA2DE();
auto raw = RawDatasets("kdd_JapaneseVowels", true, 1500, true, false);
clf.setHyperparameters({
@@ -201,7 +209,7 @@ TEST_CASE("Block Update", "[XBA2DE]") {
{"block_update", true},
{"maxTolerance", 3},
{"convergence", true},
});
});
clf.fit(raw.X_train, raw.y_train, raw.features, raw.className, raw.states, raw.smoothing);
REQUIRE(clf.getNumberOfNodes() == 120);
REQUIRE(clf.getNumberOfEdges() == 304);
@@ -221,13 +229,14 @@ TEST_CASE("Block Update", "[XBA2DE]") {
/*}*/
/*std::cout << "Score " << score << std::endl;*/
}
TEST_CASE("Alphablock", "[XBA2DE]") {
TEST_CASE("Alphablock", "[XBA2DE]")
{
auto clf_alpha = bayesnet::XBA2DE();
auto clf_no_alpha = bayesnet::XBA2DE();
auto raw = RawDatasets("diabetes", true);
clf_alpha.setHyperparameters({
{"alpha_block", true},
});
});
clf_alpha.fit(raw.X_train, raw.y_train, raw.features, raw.className, raw.states, raw.smoothing);
clf_no_alpha.fit(raw.X_train, raw.y_train, raw.features, raw.className, raw.states, raw.smoothing);
auto score_alpha = clf_alpha.score(raw.X_test, raw.y_test);

@@ -11,7 +11,8 @@
#include "TestUtils.h"
#include "bayesnet/ensembles/XBAODE.h"
TEST_CASE("Normal test", "[XBAODE]") {
TEST_CASE("Normal test", "[XBAODE]")
{
auto raw = RawDatasets("iris", true);
auto clf = bayesnet::XBAODE();
clf.fit(raw.Xv, raw.yv, raw.features, raw.className, raw.states, raw.smoothing);
@@ -23,34 +24,37 @@ TEST_CASE("Normal test", "[XBAODE]") {
REQUIRE(clf.getNumberOfStates() == 256);
REQUIRE(clf.score(raw.X_test, raw.y_test) == Catch::Approx(0.933333));
}
TEST_CASE("Feature_select CFS", "[XBAODE]") {
TEST_CASE("Feature_select CFS", "[XBAODE]")
{
auto raw = RawDatasets("glass", true);
auto clf = bayesnet::XBAODE();
clf.setHyperparameters({{"select_features", "CFS"}});
clf.setHyperparameters({ {"select_features", "CFS"} });
clf.fit(raw.Xv, raw.yv, raw.features, raw.className, raw.states, raw.smoothing);
REQUIRE(clf.getNumberOfNodes() == 90);
REQUIRE(clf.getNumberOfEdges() == 171);
REQUIRE(clf.getNotes().size() == 2);
REQUIRE(clf.getNotes()[0] == "Used features in initialization: 6 of 9 with CFS");
REQUIRE(clf.getNotes()[0] == "Used features in initialization: 9 of 9 with CFS");
REQUIRE(clf.getNotes()[1] == "Number of models: 9");
REQUIRE(clf.score(raw.X_test, raw.y_test) == Catch::Approx(0.720930219));
}
TEST_CASE("Feature_select IWSS", "[XBAODE]") {
TEST_CASE("Feature_select IWSS", "[XBAODE]")
{
auto raw = RawDatasets("glass", true);
auto clf = bayesnet::XBAODE();
clf.setHyperparameters({{"select_features", "IWSS"}, {"threshold", 0.5}});
clf.setHyperparameters({ {"select_features", "IWSS"}, {"threshold", 0.5} });
clf.fit(raw.Xv, raw.yv, raw.features, raw.className, raw.states, raw.smoothing);
REQUIRE(clf.getNumberOfNodes() == 90);
REQUIRE(clf.getNumberOfEdges() == 171);
REQUIRE(clf.getNotes().size() == 2);
REQUIRE(clf.getNotes()[0] == "Used features in initialization: 4 of 9 with IWSS");
REQUIRE(clf.getNotes()[0] == "Used features in initialization: 9 of 9 with IWSS");
REQUIRE(clf.getNotes()[1] == "Number of models: 9");
REQUIRE(clf.score(raw.X_test, raw.y_test) == Catch::Approx(0.697674394));
REQUIRE(clf.score(raw.X_test, raw.y_test) == Catch::Approx(0.720930219f));
}
TEST_CASE("Feature_select FCBF", "[XBAODE]") {
TEST_CASE("Feature_select FCBF", "[XBAODE]")
{
auto raw = RawDatasets("glass", true);
auto clf = bayesnet::XBAODE();
clf.setHyperparameters({{"select_features", "FCBF"}, {"threshold", 1e-7}});
clf.setHyperparameters({ {"select_features", "FCBF"}, {"threshold", 1e-7} });
clf.fit(raw.Xv, raw.yv, raw.features, raw.className, raw.states, raw.smoothing);
REQUIRE(clf.getNumberOfNodes() == 90);
REQUIRE(clf.getNumberOfEdges() == 171);
@@ -59,36 +63,38 @@ TEST_CASE("Feature_select FCBF", "[XBAODE]") {
REQUIRE(clf.getNotes()[1] == "Number of models: 9");
REQUIRE(clf.score(raw.X_test, raw.y_test) == Catch::Approx(0.720930219));
}
TEST_CASE("Test used features in train note and score", "[XBAODE]") {
TEST_CASE("Test used features in train note and score", "[XBAODE]")
{
auto raw = RawDatasets("diabetes", true);
auto clf = bayesnet::XBAODE();
clf.setHyperparameters({
{"order", "asc"},
{"convergence", true},
{"select_features", "CFS"},
});
});
clf.fit(raw.Xv, raw.yv, raw.features, raw.className, raw.states, raw.smoothing);
REQUIRE(clf.getNumberOfNodes() == 72);
REQUIRE(clf.getNumberOfEdges() == 136);
REQUIRE(clf.getNotes().size() == 2);
REQUIRE(clf.getNotes()[0] == "Used features in initialization: 6 of 8 with CFS");
REQUIRE(clf.getNotes()[0] == "Used features in initialization: 7 of 8 with CFS");
REQUIRE(clf.getNotes()[1] == "Number of models: 8");
auto score = clf.score(raw.Xv, raw.yv);
auto scoret = clf.score(raw.Xt, raw.yt);
REQUIRE(score == Catch::Approx(0.819010437f).epsilon(raw.epsilon));
REQUIRE(scoret == Catch::Approx(0.819010437f).epsilon(raw.epsilon));
REQUIRE(score == Catch::Approx(0.82421875f).epsilon(raw.epsilon));
REQUIRE(scoret == Catch::Approx(0.82421875f).epsilon(raw.epsilon));
}
TEST_CASE("Order asc, desc & random", "[XBAODE]") {
TEST_CASE("Order asc, desc & random", "[XBAODE]")
{
auto raw = RawDatasets("glass", true);
std::map<std::string, double> scores{{"asc", 0.83645f}, {"desc", 0.84579f}, {"rand", 0.84112}};
for (const std::string &order : {"asc", "desc", "rand"}) {
std::map<std::string, double> scores{ {"asc", 0.83645f}, {"desc", 0.84579f}, {"rand", 0.84112} };
for (const std::string& order : { "asc", "desc", "rand" }) {
auto clf = bayesnet::XBAODE();
clf.setHyperparameters({
{"order", order},
{"bisection", false},
{"maxTolerance", 1},
{"convergence", false},
});
});
clf.fit(raw.Xv, raw.yv, raw.features, raw.className, raw.states, raw.smoothing);
auto score = clf.score(raw.Xv, raw.yv);
auto scoret = clf.score(raw.Xt, raw.yt);
@@ -97,7 +103,8 @@ TEST_CASE("Order asc, desc & random", "[XBAODE]") {
REQUIRE(scoret == Catch::Approx(scores[order]).epsilon(raw.epsilon));
}
}
TEST_CASE("Oddities", "[XBAODE]") {
TEST_CASE("Oddities", "[XBAODE]")
{
auto clf = bayesnet::XBAODE();
auto raw = RawDatasets("iris", true);
auto bad_hyper = nlohmann::json{
@@ -106,33 +113,34 @@ TEST_CASE("Oddities", "[XBAODE]") {
{{"maxTolerance", 0}},
{{"maxTolerance", 7}},
};
for (const auto &hyper : bad_hyper.items()) {
for (const auto& hyper : bad_hyper.items()) {
INFO("XBAODE hyper: " << hyper.value().dump());
REQUIRE_THROWS_AS(clf.setHyperparameters(hyper.value()), std::invalid_argument);
}
REQUIRE_THROWS_AS(clf.setHyperparameters({{"maxTolerance", 0}}), std::invalid_argument);
REQUIRE_THROWS_AS(clf.setHyperparameters({ {"maxTolerance", 0} }), std::invalid_argument);
auto bad_hyper_fit = nlohmann::json{
{{"select_features", "IWSS"}, {"threshold", -0.01}},
{{"select_features", "IWSS"}, {"threshold", 0.51}},
{{"select_features", "FCBF"}, {"threshold", 1e-8}},
{{"select_features", "FCBF"}, {"threshold", 1.01}},
};
for (const auto &hyper : bad_hyper_fit.items()) {
for (const auto& hyper : bad_hyper_fit.items()) {
INFO("XBAODE hyper: " << hyper.value().dump());
clf.setHyperparameters(hyper.value());
REQUIRE_THROWS_AS(clf.fit(raw.Xv, raw.yv, raw.features, raw.className, raw.states, raw.smoothing),
std::invalid_argument);
std::invalid_argument);
}
auto bad_hyper_fit2 = nlohmann::json{
{{"alpha_block", true}, {"block_update", true}},
{{"bisection", false}, {"block_update", true}},
};
for (const auto &hyper : bad_hyper_fit2.items()) {
for (const auto& hyper : bad_hyper_fit2.items()) {
INFO("XBAODE hyper: " << hyper.value().dump());
REQUIRE_THROWS_AS(clf.setHyperparameters(hyper.value()), std::invalid_argument);
}
}
TEST_CASE("Bisection Best", "[XBAODE]") {
TEST_CASE("Bisection Best", "[XBAODE]")
{
auto clf = bayesnet::XBAODE();
auto raw = RawDatasets("kdd_JapaneseVowels", true, 1200, true, false);
clf.setHyperparameters({
@@ -140,7 +148,7 @@ TEST_CASE("Bisection Best", "[XBAODE]") {
{"maxTolerance", 3},
{"convergence", true},
{"convergence_best", false},
});
});
clf.fit(raw.X_train, raw.y_train, raw.features, raw.className, raw.states, raw.smoothing);
REQUIRE(clf.getNumberOfNodes() == 210);
REQUIRE(clf.getNumberOfEdges() == 406);
@@ -151,7 +159,8 @@ TEST_CASE("Bisection Best", "[XBAODE]") {
REQUIRE(score == Catch::Approx(0.991666675f).epsilon(raw.epsilon));
REQUIRE(scoret == Catch::Approx(0.991666675f).epsilon(raw.epsilon));
}
TEST_CASE("Bisection Best vs Last", "[XBAODE]") {
TEST_CASE("Bisection Best vs Last", "[XBAODE]")
{
auto raw = RawDatasets("kdd_JapaneseVowels", true, 1500, true, false);
auto clf = bayesnet::XBAODE();
auto hyperparameters = nlohmann::json{
@@ -171,7 +180,8 @@ TEST_CASE("Bisection Best vs Last", "[XBAODE]") {
auto score_last = clf.score(raw.X_test, raw.y_test);
REQUIRE(score_last == Catch::Approx(0.976666689f).epsilon(raw.epsilon));
}
TEST_CASE("Block Update", "[XBAODE]") {
TEST_CASE("Block Update", "[XBAODE]")
{
auto clf = bayesnet::XBAODE();
auto raw = RawDatasets("mfeat-factors", true, 500);
clf.setHyperparameters({
@@ -179,7 +189,7 @@ TEST_CASE("Block Update", "[XBAODE]") {
{"block_update", true},
{"maxTolerance", 3},
{"convergence", true},
});
});
clf.fit(raw.X_train, raw.y_train, raw.features, raw.className, raw.states, raw.smoothing);
REQUIRE(clf.getNumberOfNodes() == 1085);
REQUIRE(clf.getNumberOfEdges() == 2165);
@@ -200,13 +210,14 @@ TEST_CASE("Block Update", "[XBAODE]") {
// }
// std::cout << "Score " << score << std::endl;
}
TEST_CASE("Alphablock", "[XBAODE]") {
TEST_CASE("Alphablock", "[XBAODE]")
{
auto clf_alpha = bayesnet::XBAODE();
auto clf_no_alpha = bayesnet::XBAODE();
auto raw = RawDatasets("diabetes", true);
clf_alpha.setHyperparameters({
{"alpha_block", true},
});
});
clf_alpha.fit(raw.X_train, raw.y_train, raw.features, raw.className, raw.states, raw.smoothing);
clf_no_alpha.fit(raw.X_train, raw.y_train, raw.features, raw.className, raw.states, raw.smoothing);
auto score_alpha = clf_alpha.score(raw.X_test, raw.y_test);

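The expected node counts in the XBA2DE and XBAODE tests above fit a simple relationship. Each sub-model appears to contain every dataset feature plus the class node, so nodes = models × (features + 1). For instance, 36 × (9 + 1) = 360 on glass and 21 × (8 + 1) = 189 on diabetes. This relationship is inferred from the numbers in this diff rather than stated by it. Here `features` means the total number of features in the dataset, not the number selected in initialization. The old diabetes expectation illustrates this: it pairs 144 nodes with 16 models, which uses all 8 features even though CFS selected only 6. The helper `expected_nodes` is hypothetical.

```cpp
#include <cassert>

// Hypothetical helper: node count of an ensemble of `models` sub-networks,
// assuming each sub-network holds every feature plus one class node.
int expected_nodes(int models, int features)
{
    return models * (features + 1);
}
```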

@@ -1,21 +0,0 @@
{
"default-registry": {
"kind": "git",
"baseline": "760bfd0c8d7c89ec640aec4df89418b7c2745605",
"repository": "https://github.com/microsoft/vcpkg"
},
"registries": [
{
"kind": "git",
"repository": "https://github.com/rmontanana/vcpkg-stash",
"baseline": "393efa4e74e053b6f02c4ab03738c8fe796b28e5",
"packages": [
"arff-files",
"fimdlp",
"libtorch-bin",
"bayesnet",
"folding"
]
}
]
}

@@ -1,40 +0,0 @@
{
"name": "bayesnet",
"version": "1.0.7",
"description": "Bayesian Network C++ Library",
"license": "MIT",
"dependencies": [
"arff-files",
"folding",
"fimdlp",
"libtorch-bin",
"nlohmann-json",
"catch2"
],
"overrides": [
{
"name": "arff-files",
"version": "1.1.0"
},
{
"name": "fimdlp",
"version": "2.0.1"
},
{
"name": "libtorch-bin",
"version": "2.7.0"
},
{
"name": "folding",
"version": "1.1.1"
},
{
"name": "nlohmann-json",
"version": "3.12.0"
},
{
"name": "catch2",
"version": "3.8.1"
}
]
}