Compare commits

...

55 Commits

Author SHA1 Message Date
015b1b0c0f Fix diagram size in manual 2024-05-28 11:43:39 +02:00
7bb8e4df01 Fix back to manual link 2024-05-23 18:59:08 +00:00
53710378de Fix manual generation and deploy 2024-05-23 17:34:48 +00:00
c833e9ba32 Remove coverage report from html folder and integrate in doc 2024-05-23 16:27:02 +02:00
f5cb46ee29 Add doc-install to Makefile 2024-05-22 12:09:58 +02:00
fa35681abe Add documentation link to readme 2024-05-22 11:39:33 +02:00
b0bd0e6eee Create doc target to build documentation 2024-05-22 11:10:21 +02:00
d43be27821 Remove manual and doc pages 2024-05-22 10:17:49 +02:00
a2853dd2e5 Add Doxygen to generate man and manual pages 2024-05-21 23:38:10 +02:00
0341bd5648 Refactor ArffFiles library as a git submodule only for tests 2024-05-21 11:50:19 +00:00
22b742f068 Convert ArffFile library to header only library 2024-05-21 10:11:33 +02:00
2584e8294d Force mutual information methods to be at least 0
There were cases where a tiny negative number was returned (less than -1e-7)
Fix mst glass test that is affected with this change
2024-05-17 11:15:45 +02:00
291ba0fb0e First functional BoostA2DE with its 1st test 2024-05-16 16:33:33 +02:00
80043d5181 First approach to BoostA2DE::trainModel 2024-05-16 14:32:59 +02:00
677ec5613d Add features used to selectKPairs 2024-05-16 14:18:45 +02:00
cccaa6e0af Complete selectKPairs method & test 2024-05-16 13:46:38 +02:00
2e3e0e0fc2 Add selectKParis method 2024-05-16 11:17:21 +02:00
8784a24898 Extract buildModel method to parent class in Boost 2024-05-15 20:00:44 +02:00
54496c68f1 Create Boost class as Boost<x> classifiers parent 2024-05-15 19:49:15 +02:00
1f236a70db Create BoostA2DE base class 2024-05-15 11:53:17 +02:00
ef3c74633c Conditional Entropy test 2024-05-15 11:28:09 +02:00
7efd95095c Merge pull request 'AnDE' (#28) from AnDE into main
Reviewed-on: #28
2024-05-15 09:16:12 +00:00
0e24135d46 Complete Conditional Mutual Information and test 2024-05-15 11:09:23 +02:00
521bfd2a8e Remove unoptimized implementation of conditionalEntropy 2024-05-15 01:24:27 +02:00
e2e0fb0c40 Implement Conditional Mutual Information 2024-05-15 00:48:02 +02:00
56b62a67cc Change BoostAODE tests results because folding upgrade 2024-05-12 20:23:05 +02:00
c0fc107abb Fix catch2 submodule config 2024-05-12 19:05:36 +02:00
d8c44b3b7c Add tests to check the correct version of the mdlp, folding and json libraries 2024-05-12 12:22:44 +02:00
6ab7cd2cbd Remove submodule catch from tests/lib 2024-05-12 11:05:53 +02:00
b578ea8a2d Remove module lib/catch2 2024-05-12 11:04:42 +02:00
9a752d15dc Change build cmake folder names to Debug & Release 2024-05-09 10:51:52 +02:00
4992685e94 Add devcontainer to repository
Fix update_coverage.py with lcov2.1 output
2024-05-08 06:42:19 +00:00
346b693c79 Update pdf coverage report 2024-05-06 18:28:15 +02:00
164c8bd90c Update changelog 2024-05-06 18:02:18 +02:00
ced29a2c2e Refactor coverage report generation
Add some tests to reach 99%
2024-05-06 17:56:00 +02:00
0ec53f405f Fix mistakes in feature selection in SPnDE
Complete the first A2DE test
Update version number
2024-05-05 11:14:01 +02:00
f806015b29 Implement SPnDE and A2DE 2024-05-05 01:35:17 +02:00
8115f25c06 Fix mispell mistake in doc 2024-05-02 10:53:15 +02:00
618a1e539c Return File Library to /lib as it is needed by Local Discretization (factorize) 2024-04-30 20:31:14 +02:00
7aeffba740 Add list of models to README 2024-04-30 18:59:38 +02:00
e79ea63afb Merge pull request 'convergence_best' (#27) from convergence_best into main
Add convergence_best as hyperparameter to allow to take the last or the best accuracy as the accuracy to compare to in convergence

Reviewed-on: #27
2024-04-30 16:22:08 +00:00
3c7382a93a Enhance tests coverage and report output 2024-04-30 14:00:24 +02:00
b4a222b100 Update gcovr configuration 2024-04-30 12:06:32 +02:00
23ef0cc5f7 Remove catch2 as submodule
Add link to pdf coverage report
2024-04-30 11:02:23 +02:00
793b2d3cd5 Refactor TestUtils to allow partial and shuffle dataset load 2024-04-30 02:11:14 +02:00
ae469b8146 Add hyperparameter convergence_best
move test libraries to test folder
2024-04-30 00:52:09 +02:00
f014928411 Update Makefile actions for coverage 2024-04-21 18:54:13 +02:00
c4b563a339 Add link to the coverage report in the README.md coverage label 2024-04-21 16:44:35 +02:00
49bb0582e6 Add Library Logo 2024-04-21 11:31:27 +02:00
b4c5261e01 Delete .github/workflows/main.yml 2024-04-20 17:54:56 +00:00
b956aa3873 Upgrade version number to 1.0.5
Fix dependency graph
Remove loguru library
2024-04-20 18:00:40 +02:00
1f06631f69 Add check dependencies in make diagrams endpoint 2024-04-19 19:47:37 +02:00
6dd589bd61 Add diagram changes to CHANGELOG 2024-04-19 18:29:43 +02:00
6475f10825 Add class and dependency diagrams 2024-04-19 14:33:00 +02:00
7d906b24d1 Merge pull request 'block_update' (#26) from block_update into main
Reviewed-on: #26
2024-04-15 10:26:50 +00:00
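A note on commit 2584e8294d ("Force mutual information methods to be at least 0") before the file changes: mutual information is mathematically non-negative, but floating-point accumulation can leave a tiny negative residue, and the commit also adjusts the mst glass test affected by the change. A minimal sketch of the kind of clamp that commit message implies (clampMI is an illustrative name, not the library's API):

```cpp
#include <algorithm>
#include <iostream>

// Hypothetical helper: clamp a mutual-information estimate at zero, since
// rounding error can push a mathematically non-negative quantity slightly below it.
double clampMI(double mi) {
    return std::max(0.0, mi);
}

int main() {
    std::cout << clampMI(-3.2e-8) << "\n"; // prints 0
    std::cout << clampMI(0.125) << "\n";   // prints 0.125
    return 0;
}
```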
77 changed files with 10918 additions and 4561 deletions

39
.clang-uml Normal file

@@ -0,0 +1,39 @@
compilation_database_dir: build_debug
output_directory: diagrams
diagrams:
  BayesNet:
    type: class
    glob:
      - bayesnet/*.h
      - bayesnet/classifiers/*.h
      - bayesnet/classifiers/*.cc
      - bayesnet/ensembles/*.h
      - bayesnet/ensembles/*.cc
      - bayesnet/feature_selection/*.h
      - bayesnet/feature_selection/*.cc
      - bayesnet/network/*.h
      - bayesnet/network/*.cc
      - bayesnet/utils/*.h
      - bayesnet/utils/*.cc
    include:
      # Only include entities from the following namespaces
      namespaces:
        - bayesnet
    exclude:
      access:
        - private
    plantuml:
      style:
        # Apply this style to all classes in the diagram
        class: "#aliceblue;line:blue;line.dotted;text:blue"
        # Apply this style to all packages in the diagram
        package: "#back:grey"
        # Make all template instantiation relations point upwards and draw them
        # as green and dotted lines
        instantiation: "up[#green,dotted]"
      cmd: "/usr/bin/plantuml -tsvg \"diagrams/{}.puml\""
      before:
        - 'title clang-uml class diagram model'
    mermaid:
      before:
        - 'classDiagram'

57
.devcontainer/Dockerfile Normal file

@@ -0,0 +1,57 @@
FROM mcr.microsoft.com/devcontainers/cpp:ubuntu22.04
ARG REINSTALL_CMAKE_VERSION_FROM_SOURCE="3.22.2"
# Optionally install the cmake for vcpkg
COPY ./reinstall-cmake.sh /tmp/
RUN if [ "${REINSTALL_CMAKE_VERSION_FROM_SOURCE}" != "none" ]; then \
chmod +x /tmp/reinstall-cmake.sh && /tmp/reinstall-cmake.sh ${REINSTALL_CMAKE_VERSION_FROM_SOURCE}; \
fi \
&& rm -f /tmp/reinstall-cmake.sh
# [Optional] Uncomment this section to install additional vcpkg ports.
# RUN su vscode -c "${VCPKG_ROOT}/vcpkg install <your-port-name-here>"
# [Optional] Uncomment this section to install additional packages.
RUN apt-get update && export DEBIAN_FRONTEND=noninteractive \
&& apt-get -y install --no-install-recommends wget software-properties-common libdatetime-perl libcapture-tiny-perl libdatetime-format-dateparse-perl libgd-perl
# Add PPA for GCC 13
RUN add-apt-repository ppa:ubuntu-toolchain-r/test
RUN apt-get update
# Install GCC 13.1
RUN apt-get install -y gcc-13 g++-13
# Install lcov 2.1
RUN wget --quiet https://github.com/linux-test-project/lcov/releases/download/v2.1/lcov-2.1.tar.gz && \
tar -xvf lcov-2.1.tar.gz && \
cd lcov-2.1 && \
make install
RUN rm lcov-2.1.tar.gz
RUN rm -fr lcov-2.1
# Install Miniconda
RUN mkdir -p /opt/conda
RUN wget --quiet "https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-aarch64.sh" -O /opt/conda/miniconda.sh && \
bash /opt/conda/miniconda.sh -b -p /opt/miniconda
# Add conda to PATH
ENV PATH=/opt/miniconda/bin:$PATH
# add CXX and CC to the environment with gcc 13
ENV CXX=/usr/bin/g++-13
ENV CC=/usr/bin/gcc-13
# link the last gcov version
RUN rm /usr/bin/gcov
RUN ln -s /usr/bin/gcov-13 /usr/bin/gcov
# change ownership of /opt/miniconda to vscode user
RUN chown -R vscode:vscode /opt/miniconda
USER vscode
RUN conda init
RUN conda install -y -c conda-forge yaml pytorch

37
.devcontainer/devcontainer.json Normal file

@@ -0,0 +1,37 @@
// For format details, see https://aka.ms/devcontainer.json. For config options, see the
// README at: https://github.com/devcontainers/templates/tree/main/src/cpp
{
    "name": "C++",
    "build": {
        "dockerfile": "Dockerfile"
    },
    // "features": {
    //     "ghcr.io/devcontainers/features/conda:1": {}
    // }
    // Features to add to the dev container. More info: https://containers.dev/features.
    // "features": {},
    // Use 'forwardPorts' to make a list of ports inside the container available locally.
    // "forwardPorts": [],
    // Use 'postCreateCommand' to run commands after the container is created.
    "postCreateCommand": "make release && make debug && echo 'Done!'",
    // Configure tool-specific properties.
    // "customizations": {},
    "customizations": {
        // Configure properties specific to VS Code.
        "vscode": {
            "settings": {},
            "extensions": [
                "ms-vscode.cpptools",
                "ms-vscode.cpptools-extension-pack",
                "ms-vscode.cpptools-themes",
                "ms-vscode.cmake-tools",
                "ms-azuretools.vscode-docker",
                "jbenden.c-cpp-flylint",
                "matepek.vscode-catch2-test-adapter",
                "GitHub.copilot"
            ]
        }
    }
    // Uncomment to connect as root instead. More info: https://aka.ms/dev-containers-non-root.
    // "remoteUser": "root"
}

59
.devcontainer/reinstall-cmake.sh Normal file

@@ -0,0 +1,59 @@
#!/usr/bin/env bash
#-------------------------------------------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License. See https://go.microsoft.com/fwlink/?linkid=2090316 for license information.
#-------------------------------------------------------------------------------------------------------------
#
set -e
CMAKE_VERSION=${1:-"none"}
if [ "${CMAKE_VERSION}" = "none" ]; then
echo "No CMake version specified, skipping CMake reinstallation"
exit 0
fi
# Cleanup temporary directory and associated files when exiting the script.
cleanup() {
EXIT_CODE=$?
set +e
if [[ -n "${TMP_DIR}" ]]; then
echo "Executing cleanup of tmp files"
rm -Rf "${TMP_DIR}"
fi
exit $EXIT_CODE
}
trap cleanup EXIT
echo "Installing CMake..."
apt-get -y purge --auto-remove cmake
mkdir -p /opt/cmake
architecture=$(dpkg --print-architecture)
case "${architecture}" in
arm64)
ARCH=aarch64 ;;
amd64)
ARCH=x86_64 ;;
*)
echo "Unsupported architecture ${architecture}."
exit 1
;;
esac
CMAKE_BINARY_NAME="cmake-${CMAKE_VERSION}-linux-${ARCH}.sh"
CMAKE_CHECKSUM_NAME="cmake-${CMAKE_VERSION}-SHA-256.txt"
TMP_DIR=$(mktemp -d -t cmake-XXXXXXXXXX)
echo "${TMP_DIR}"
cd "${TMP_DIR}"
curl -sSL "https://github.com/Kitware/CMake/releases/download/v${CMAKE_VERSION}/${CMAKE_BINARY_NAME}" -O
curl -sSL "https://github.com/Kitware/CMake/releases/download/v${CMAKE_VERSION}/${CMAKE_CHECKSUM_NAME}" -O
sha256sum -c --ignore-missing "${CMAKE_CHECKSUM_NAME}"
sh "${TMP_DIR}/${CMAKE_BINARY_NAME}" --prefix=/opt/cmake --skip-license
ln -s /opt/cmake/bin/cmake /usr/local/bin/cmake
ln -s /opt/cmake/bin/ctest /usr/local/bin/ctest

12
.github/dependabot.yml vendored Normal file

@@ -0,0 +1,12 @@
# To get started with Dependabot version updates, you'll need to specify which
# package ecosystems to update and where the package manifests are located.
# Please see the documentation for more information:
# https://docs.github.com/github/administering-a-repository/configuration-options-for-dependency-updates
# https://containers.dev/guide/dependabot
version: 2
updates:
  - package-ecosystem: "devcontainers"
    directory: "/"
    schedule:
      interval: weekly

5
.gitignore vendored

@@ -39,4 +39,9 @@ cmake-build*/**
 puml/**
 .vscode/settings.json
 sample/build
+**/.DS_Store
+docs/manual
+docs/man3
+docs/man
+docs/Doxyfile

13
.gitmodules vendored

@@ -3,11 +3,6 @@
 	url = https://github.com/rmontanana/mdlp
 	main = main
 	update = merge
-[submodule "lib/catch2"]
-	path = lib/catch2
-	main = v2.x
-	update = merge
-	url = https://github.com/catchorg/Catch2.git
 [submodule "lib/json"]
 	path = lib/json
 	url = https://github.com/nlohmann/json.git
@@ -18,3 +13,11 @@
 	url = https://github.com/rmontanana/folding
 	main = main
 	update = merge
+[submodule "tests/lib/catch2"]
+	path = tests/lib/catch2
+	url = https://github.com/catchorg/Catch2.git
+	main = main
+	update = merge
+[submodule "tests/lib/Files"]
+	path = tests/lib/Files
+	url = https://github.com/rmontanana/ArffFiles


@@ -0,0 +1,4 @@
{
    "sonarCloudOrganization": "rmontanana",
    "projectKey": "rmontanana_BayesNet"
}

2
.vscode/launch.json vendored

@@ -16,7 +16,7 @@
             "name": "test",
             "program": "${workspaceFolder}/build_debug/tests/TestBayesNet",
             "args": [
-                "Block Update"
+                "[Node]"
             ],
             "cwd": "${workspaceFolder}/build_debug/tests"
         },

CHANGELOG.md

@@ -5,7 +5,34 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
-## [unreleased]
+## [Unreleased]
+
+### Added
+
+- Library logo generated with <https://openart.ai> to README.md
+- Link to the coverage report in the README.md coverage label.
+- *convergence_best* hyperparameter to the BoostAODE class, to control the way the prior accuracy is computed if convergence is set. Default value is *false*.
+- SPnDE model.
+- A2DE model.
+- A2DE & SPnDE tests.
+- Add tests to reach 99% of coverage.
+- Add tests to check the correct version of the mdlp, folding and json libraries.
+- Library documentation generated with Doxygen.
+- Link to documentation in the README.md.
+
+### Internal
+
+- Create library ShuffleArffFile to limit the number of samples with a parameter and shuffle them.
+- Refactor catch2 library location to test/lib
+- Refactor loadDataset function in tests.
+- Remove conditionalEdgeWeights method in BayesMetrics.
+- Refactor Coverage Report generation.
+- Add devcontainer to work on apple silicon.
+- Change build cmake folder names to Debug & Release.
+- Add a Makefile target (doc) to generate the documentation.
+- Add a Makefile target (doc-install) to install the documentation.
 
 ## [1.0.5] 2024-04-20
 
 ### Added
@@ -16,6 +43,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - Badges of coverage and code quality (codacy) in README.md. Coverage badge is updated with *make viewcoverage*
 - Tests to reach 97% of coverage.
 - Copyright header to source files.
+- Diagrams to README.md: UML class diagram & dependency diagram
+- Action to create diagrams to Makefile: *make diagrams*
 
 ### Changed
 
@@ -23,6 +52,11 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - The worse model count in BoostAODE is reset to 0 every time a new model produces better accuracy, so the tolerance of the model is meant to be the number of **consecutive** models that produce worse accuracy.
 - Default hyperparameter values in BoostAODE: bisection is true, maxTolerance is 3, convergence is true
+
+### Removed
+
+- The 'predict_single' hyperparameter from the BoostAODE class.
+- The 'repeatSparent' hyperparameter from the BoostAODE class.
 
 ## [1.0.4] 2024-03-06
 
 ### Added

5
CMakeGraphVizOptions.cmake Normal file

@@ -0,0 +1,5 @@
# Set the default graph title
set(GRAPHVIZ_GRAPH_NAME "BayesNet dependency graph")
set(GRAPHVIZ_SHARED_LIBS OFF)
set(GRAPHVIZ_STATIC_LIBS ON)

View File

@@ -1,7 +1,7 @@
 cmake_minimum_required(VERSION 3.20)
 project(BayesNet
-    VERSION 1.0.4.1
+    VERSION 1.0.5.1
     DESCRIPTION "Bayesian Network and basic classifiers Library."
     HOMEPAGE_URL "https://github.com/rmontanana/bayesnet"
     LANGUAGES CXX
@@ -25,8 +25,12 @@ set(CMAKE_CXX_EXTENSIONS OFF)
 set(CMAKE_EXPORT_COMPILE_COMMANDS ON)
 set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${TORCH_CXX_FLAGS}")
 SET(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -pthread")
-set(CMAKE_CXX_FLAGS_DEBUG "${CMAKE_CXX_FLAGS_DEBUG} -fprofile-arcs -ftest-coverage -O0 -g")
+set(CMAKE_CXX_FLAGS_DEBUG "${CMAKE_CXX_FLAGS_DEBUG} -fprofile-arcs -ftest-coverage -fno-elide-constructors")
 set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} -O3")
+if (NOT ${CMAKE_SYSTEM_NAME} MATCHES "Darwin")
+    set(CMAKE_CXX_FLAGS_DEBUG "${CMAKE_CXX_FLAGS_DEBUG} -fno-default-inline")
+endif()
 
 # Options
 # -------
 option(ENABLE_CLANG_TIDY "Enable to add clang tidy." OFF)
@@ -60,20 +64,19 @@ endif (ENABLE_CLANG_TIDY)
 # External libraries - dependencies of BayesNet
 # ---------------------------------------------
 # include(FetchContent)
-add_git_submodule("lib/mdlp")
 add_git_submodule("lib/json")
+add_git_submodule("lib/mdlp")
 
 # Subdirectories
 # --------------
 add_subdirectory(config)
-add_subdirectory(lib/Files)
 add_subdirectory(bayesnet)
 
 # Testing
 # -------
 if (ENABLE_TESTING)
     MESSAGE("Testing enabled")
-    add_git_submodule("lib/catch2")
+    add_subdirectory(tests/lib/catch2)
     include(CTest)
     add_subdirectory(tests)
 endif (ENABLE_TESTING)
@@ -86,3 +89,14 @@ install(TARGETS BayesNet
     CONFIGURATIONS Release)
 install(DIRECTORY bayesnet/ DESTINATION include/bayesnet FILES_MATCHING CONFIGURATIONS Release PATTERN "*.h")
 install(FILES ${CMAKE_BINARY_DIR}/configured_files/include/bayesnet/config.h DESTINATION include/bayesnet CONFIGURATIONS Release)
+
+# Documentation
+# -------------
+find_package(Doxygen)
+set(DOC_DIR ${CMAKE_CURRENT_SOURCE_DIR}/docs)
+set(doxyfile_in ${DOC_DIR}/Doxyfile.in)
+set(doxyfile ${DOC_DIR}/Doxyfile)
+configure_file(${doxyfile_in} ${doxyfile} @ONLY)
+doxygen_add_docs(doxygen
+    WORKING_DIRECTORY ${DOC_DIR}
+    CONFIG_FILE ${doxyfile})

102
Makefile

@@ -1,12 +1,23 @@
 SHELL := /bin/bash
 .DEFAULT_GOAL := help
-.PHONY: viewcoverage coverage setup help install uninstall buildr buildd test clean debug release sample updatebadge
+.PHONY: viewcoverage coverage setup help install uninstall diagrams buildr buildd test clean debug release sample updatebadge doc doc-install
 
-f_release = build_release
-f_debug = build_debug
+f_release = build_Release
+f_debug = build_Debug
+f_diagrams = diagrams
 app_targets = BayesNet
 test_targets = TestBayesNet
+clang-uml = clang-uml
+plantuml = plantuml
+lcov = lcov
+genhtml = genhtml
+dot = dot
 n_procs = -j 16
+docsrcdir = docs/manual
+mansrcdir = docs/man3
+mandestdir = /usr/local/share/man
+sed_command_link = 's/e">LCOV -/e"><a href="https:\/\/rmontanana.github.io\/bayesnet">Back to manual<\/a> LCOV -/g'
+sed_command_diagram = 's/Diagram"/Diagram" width="100%" height="100%" /g'
 
 define ClearTests
 	@for t in $(test_targets); do \
@@ -31,11 +42,21 @@ setup: ## Install dependencies for tests and coverage
 	pip install gcovr; \
 	sudo dnf install lcov;\
 	fi
+	@echo "* You should install plantuml & graphviz for the diagrams"
 
-dependency: ## Create a dependency graph diagram of the project (build/dependency.png)
+diagrams: ## Create a UML class diagram & dependency diagram of the project (diagrams/BayesNet.png)
+	@which $(plantuml) || (echo ">>> Please install plantuml"; exit 1)
+	@which $(dot) || (echo ">>> Please install graphviz"; exit 1)
+	@which $(clang-uml) || (echo ">>> Please install clang-uml"; exit 1)
+	@export PLANTUML_LIMIT_SIZE=16384
+	@echo ">>> Creating UML class diagram of the project...";
+	@$(clang-uml) -p
+	@cd $(f_diagrams); \
+	$(plantuml) -tsvg BayesNet.puml
 	@echo ">>> Creating dependency graph diagram of the project...";
 	$(MAKE) debug
-	cd $(f_debug) && cmake .. --graphviz=dependency.dot && dot -Tpng dependency.dot -o dependency.png
+	cd $(f_debug) && cmake .. --graphviz=dependency.dot
+	@$(dot) -Tsvg $(f_debug)/dependency.dot.BayesNet -o $(f_diagrams)/dependency.svg
 
 buildd: ## Build the debug targets
 	cmake --build $(f_debug) -t $(app_targets) $(n_procs)
@@ -83,7 +104,7 @@ sample: ## Build sample
 opt = ""
 test: ## Run tests (opt="-s") to verbose output the tests, (opt="-c='Test Maximum Spanning Tree'") to run only that section
-	@echo ">>> Running BayesNet & Platform tests...";
+	@echo ">>> Running BayesNet tests...";
 	@$(MAKE) clean
 	@cmake --build $(f_debug) -t $(test_targets) $(n_procs)
 	@for t in $(test_targets); do \
@@ -98,31 +119,70 @@ test: ## Run tests (opt="-s") to verbose output the tests, (opt="-c='Test Maximu
 coverage: ## Run tests and generate coverage report (build/index.html)
 	@echo ">>> Building tests with coverage..."
-	@$(MAKE) test
-	@gcovr $(f_debug)/tests
-	@echo ">>> Done";
-
-viewcoverage: ## Run tests, generate coverage report and upload it to codecov (build/index.html)
-	@echo ">>> Building tests with coverage..."
-	@$(MAKE) coverage
+	@which $(lcov) || (echo ">>> Please install lcov"; exit 1)
+	@if [ ! -f $(f_debug)/tests/coverage.info ] ; then $(MAKE) test ; fi
 	@echo ">>> Building report..."
 	@cd $(f_debug)/tests; \
-	lcov --directory . --capture --output-file coverage.info >/dev/null 2>&1; \
-	lcov --remove coverage.info '/usr/*' --output-file coverage.info >/dev/null 2>&1; \
-	lcov --remove coverage.info 'lib/*' --output-file coverage.info >/dev/null 2>&1; \
-	lcov --remove coverage.info 'libtorch/*' --output-file coverage.info >/dev/null 2>&1; \
-	lcov --remove coverage.info 'tests/*' --output-file coverage.info >/dev/null 2>&1; \
-	lcov --remove coverage.info 'bayesnet/utils/loguru.*' --output-file coverage.info >/dev/null 2>&1; \
-	genhtml coverage.info --output-directory coverage >/dev/null 2>&1;
+	$(lcov) --directory CMakeFiles --capture --demangle-cpp --ignore-errors source,source --output-file coverage.info >/dev/null 2>&1; \
+	$(lcov) --remove coverage.info '/usr/*' --output-file coverage.info >/dev/null 2>&1; \
+	$(lcov) --remove coverage.info 'lib/*' --output-file coverage.info >/dev/null 2>&1; \
+	$(lcov) --remove coverage.info 'libtorch/*' --output-file coverage.info >/dev/null 2>&1; \
+	$(lcov) --remove coverage.info 'tests/*' --output-file coverage.info >/dev/null 2>&1; \
+	$(lcov) --remove coverage.info 'bayesnet/utils/loguru.*' --ignore-errors unused --output-file coverage.info >/dev/null 2>&1; \
+	$(lcov) --remove coverage.info '/opt/miniconda/*' --ignore-errors unused --output-file coverage.info >/dev/null 2>&1; \
+	$(lcov) --summary coverage.info
 	@$(MAKE) updatebadge
-	@xdg-open $(f_debug)/tests/coverage/index.html || open $(f_debug)/tests/coverage/index.html 2>/dev/null
+	@echo ">>> Done";
+
+viewcoverage: ## View the html coverage report
+	@which $(genhtml) >/dev/null || (echo ">>> Please install lcov (genhtml not found)"; exit 1)
+	@if [ ! -d $(docsrcdir)/coverage ]; then mkdir -p $(docsrcdir)/coverage; fi
+	@if [ ! -f $(f_debug)/tests/coverage.info ]; then \
+		echo ">>> No coverage.info file found. Run make coverage first!"; \
+		exit 1; \
+	fi
+	@$(genhtml) $(f_debug)/tests/coverage.info --demangle-cpp --output-directory $(docsrcdir)/coverage --title "BayesNet Coverage Report" -s -k -f --legend >/dev/null 2>&1;
+	@xdg-open $(docsrcdir)/coverage/index.html || open $(docsrcdir)/coverage/index.html 2>/dev/null
 	@echo ">>> Done";
 
 updatebadge: ## Update the coverage badge in README.md
+	@which python || (echo ">>> Please install python"; exit 1)
+	@if [ ! -f $(f_debug)/tests/coverage.info ]; then \
+		echo ">>> No coverage.info file found. Run make coverage first!"; \
+		exit 1; \
+	fi
 	@echo ">>> Updating coverage badge..."
 	@env python update_coverage.py $(f_debug)/tests
 	@echo ">>> Done";
 
+doc: ## Generate documentation
+	@echo ">>> Generating documentation..."
+	@cmake --build $(f_release) -t doxygen
+	@cp -rp diagrams $(docsrcdir)
+	@
+	@if [ "$(shell uname)" = "Darwin" ]; then \
+		sed -i "" $(sed_command_link) $(docsrcdir)/coverage/index.html ; \
+		sed -i "" $(sed_command_diagram) $(docsrcdir)/index.html ; \
+	else \
+		sed -i $(sed_command_link) $(docsrcdir)/coverage/index.html ; \
+		sed -i $(sed_command_diagram) $(docsrcdir)/index.html ; \
+	fi
+	@echo ">>> Done";
+
+docdir = ""
+doc-install: ## Install documentation
+	@echo ">>> Installing documentation..."
+	@if [ "$(docdir)" = "" ]; then \
+		echo "docdir parameter has to be set when calling doc-install"; \
+		exit 1; \
+	fi
+	@if [ ! -d $(docdir) ]; then \
+		@$(MAKE) doc; \
+	fi
+	@cp -rp $(docsrcdir)/* $(docdir)
+	@sudo cp -rp $(mansrcdir) $(mandestdir)
+	@echo ">>> Done";
+
 help: ## Show help message
 	@IFS=$$'\n' ; \
 	help_lines=(`fgrep -h "##" $(MAKEFILE_LIST) | fgrep -v fgrep | sed -e 's/\\$$//' | sed -e 's/##/:/'`); \

README.md

@@ -1,11 +1,13 @@
-# BayesNet
+# <img src="logo.png" alt="logo" width="50"/> BayesNet
 
 ![C++](https://img.shields.io/badge/c++-%2300599C.svg?style=flat&logo=c%2B%2B&logoColor=white)
 [![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](<https://opensource.org/licenses/MIT>)
 ![Gitea Release](https://img.shields.io/gitea/v/release/rmontanana/bayesnet?gitea_url=https://gitea.rmontanana.es:3000)
 [![Codacy Badge](https://app.codacy.com/project/badge/Grade/cf3e0ac71d764650b1bf4d8d00d303b1)](https://app.codacy.com/gh/Doctorado-ML/BayesNet/dashboard?utm_source=gh&utm_medium=referral&utm_content=&utm_campaign=Badge_grade)
+[![Security Rating](https://sonarcloud.io/api/project_badges/measure?project=rmontanana_BayesNet&metric=security_rating)](https://sonarcloud.io/summary/new_code?id=rmontanana_BayesNet)
+[![Reliability Rating](https://sonarcloud.io/api/project_badges/measure?project=rmontanana_BayesNet&metric=reliability_rating)](https://sonarcloud.io/summary/new_code?id=rmontanana_BayesNet)
 ![Gitea Last Commit](https://img.shields.io/gitea/last-commit/rmontanana/bayesnet?gitea_url=https://gitea.rmontanana.es:3000&logo=gitea)
-![Static Badge](https://img.shields.io/badge/Coverage-97,2%25-green)
+[![Coverage Badge](https://img.shields.io/badge/Coverage-97,3%25-green)](html/index.html)
 
 Bayesian Network Classifiers using libtorch from scratch
 
@@ -20,6 +22,12 @@ unzip libtorch-shared-with-deps-latest.zips
 
 ## Setup
 
+### Getting the code
+
+```bash
+git clone --recurse-submodules https://github.com/doctorado-ml/bayesnet
+```
+
 ### Release
 
 ```bash
@@ -33,7 +41,13 @@ sudo make install
 ```bash
 make debug
 make test
+```
+
+### Coverage
+
+```bash
 make coverage
+make viewcoverage
 ```
 
 ### Sample app
@@ -47,4 +61,42 @@ make sample fname=tests/data/glass.arff
 
 ## Models
 
-### [BoostAODE](docs/BoostAODE.md)
+#### - TAN
+#### - KDB
+#### - SPODE
+#### - SPnDE
+#### - AODE
+#### - [BoostAODE](docs/BoostAODE.md)
+#### - BoostA2DE
+
+### With Local Discretization
+
+#### - TANLd
+#### - KDBLd
+#### - SPODELd
+#### - AODELd
+
+## Documentation
+
+### [Manual](https://rmontanana.github.io/bayesnet/)
+
+### [Coverage report](https://rmontanana.github.io/bayesnet/coverage/index.html)
+
+## Diagrams
+
+### UML Class Diagram
+
+![BayesNet UML Class Diagram](diagrams/BayesNet.svg)
+
+### Dependency Diagram
+
+![BayesNet Dependency Diagram](diagrams/dependency.svg)

bayesnet/CMakeLists.txt

@@ -1,6 +1,5 @@
 include_directories(
     ${BayesNet_SOURCE_DIR}/lib/mdlp
-    ${BayesNet_SOURCE_DIR}/lib/Files
     ${BayesNet_SOURCE_DIR}/lib/folding
     ${BayesNet_SOURCE_DIR}/lib/json/include
     ${BayesNet_SOURCE_DIR}

bayesnet/classifiers/Proposal.cc

@@ -4,7 +4,6 @@
 // SPDX-License-Identifier: MIT
 // ***************************************************************
 
-#include <ArffFiles.h>
 #include "Proposal.h"
 
 namespace bayesnet {
@@ -54,8 +53,7 @@ namespace bayesnet {
                 yJoinParents[i] += to_string(pDataset.index({ idx, i }).item<int>());
             }
         }
-        auto arff = ArffFiles();
-        auto yxv = arff.factorize(yJoinParents);
+        auto yxv = factorize(yJoinParents);
         auto xvf_ptr = Xf.index({ index }).data_ptr<float>();
         auto xvf = std::vector<mdlp::precision_t>(xvf_ptr, xvf_ptr + Xf.size(1));
         discretizers[feature]->fit(xvf, yxv);
@@ -113,4 +111,19 @@ namespace bayesnet {
         }
         return Xtd;
     }
+    std::vector<int> Proposal::factorize(const std::vector<std::string>& labels_t)
+    {
+        std::vector<int> yy;
+        yy.reserve(labels_t.size());
+        std::map<std::string, int> labelMap;
+        int i = 0;
+        for (const std::string& label : labels_t) {
+            if (labelMap.find(label) == labelMap.end()) {
+                labelMap[label] = i++;
+                bool allDigits = std::all_of(label.begin(), label.end(), ::isdigit);
+            }
+            yy.push_back(labelMap[label]);
+        }
+        return yy;
+    }
 }
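To make the new Proposal::factorize concrete: it assigns each distinct label the next integer code in order of first appearance, replacing the encoding previously borrowed from ArffFiles. A standalone sketch of the same logic with a tiny example (a free function here, not the class member):

```cpp
#include <iostream>
#include <map>
#include <string>
#include <vector>

// Standalone sketch of the factorize logic above: each distinct label gets
// the next integer code in order of first appearance.
std::vector<int> factorize(const std::vector<std::string>& labels) {
    std::vector<int> codes;
    std::map<std::string, int> labelMap;
    int next = 0;
    for (const auto& label : labels) {
        if (labelMap.find(label) == labelMap.end()) {
            labelMap[label] = next++;
        }
        codes.push_back(labelMap[label]);
    }
    return codes;
}

int main() {
    for (int code : factorize({ "b", "a", "b", "c", "a" })) {
        std::cout << code << ' '; // prints: 0 1 0 2 1
    }
    std::cout << '\n';
    return 0;
}
```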

bayesnet/classifiers/Proposal.h

@@ -27,6 +27,7 @@ namespace bayesnet {
         torch::Tensor y; // y discrete nx1 tensor
         map<std::string, mdlp::CPPFImdlp*> discretizers;
     private:
+        std::vector<int> factorize(const std::vector<std::string>& labels_t);
         torch::Tensor& pDataset; // (n+1)xm tensor
         std::vector<std::string>& pFeatures;
         std::string& pClassName;

38
bayesnet/classifiers/SPnDE.cc Normal file

@@ -0,0 +1,38 @@
// ***************************************************************
// SPDX-FileCopyrightText: Copyright 2024 Ricardo Montañana Gómez
// SPDX-FileType: SOURCE
// SPDX-License-Identifier: MIT
// ***************************************************************
#include "SPnDE.h"
namespace bayesnet {
    SPnDE::SPnDE(std::vector<int> parents) : Classifier(Network()), parents(parents) {}
    void SPnDE::buildModel(const torch::Tensor& weights)
    {
        // 0. Add all nodes to the model
        addNodes();
        std::vector<int> attributes;
        for (int i = 0; i < static_cast<int>(features.size()); ++i) {
            if (std::find(parents.begin(), parents.end(), i) == parents.end()) {
                attributes.push_back(i);
            }
        }
        // 1. Add edges from the class node to all other nodes
        // 2. Add edges from the parents nodes to all other nodes
        for (const auto& attribute : attributes) {
            model.addEdge(className, features[attribute]);
            for (const auto& root : parents) {
                model.addEdge(features[root], features[attribute]);
            }
        }
    }
    std::vector<std::string> SPnDE::graph(const std::string& name) const
    {
        return model.graph(name);
    }
}
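The structure buildModel creates is easy to trace by hand: with features {F0, F1, F2, F3} and parents = {0, 1}, the non-parent attributes are F2 and F3, so the loops add the edges class->F2, class->F3, F0->F2, F1->F2, F0->F3 and F1->F3. A library-independent sketch of that enumeration:

```cpp
#include <algorithm>
#include <iostream>
#include <string>
#include <vector>

// Illustrative only: list the edges an SPnDE-style structure adds for a given
// set of super-parent indices, mirroring the loops in buildModel above.
int main() {
    std::vector<std::string> features = { "F0", "F1", "F2", "F3" };
    std::vector<int> parents = { 0, 1 };
    for (int i = 0; i < static_cast<int>(features.size()); ++i) {
        if (std::find(parents.begin(), parents.end(), i) != parents.end()) {
            continue; // the super-parents themselves get no incoming edges here
        }
        std::cout << "class -> " << features[i] << "\n";
        for (int p : parents) {
            std::cout << features[p] << " -> " << features[i] << "\n";
        }
    }
    return 0;
}
```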

26
bayesnet/classifiers/SPnDE.h Normal file

@@ -0,0 +1,26 @@
// ***************************************************************
// SPDX-FileCopyrightText: Copyright 2024 Ricardo Montañana Gómez
// SPDX-FileType: SOURCE
// SPDX-License-Identifier: MIT
// ***************************************************************
#ifndef SPnDE_H
#define SPnDE_H
#include <vector>
#include "Classifier.h"
namespace bayesnet {
    class SPnDE : public Classifier {
    public:
        explicit SPnDE(std::vector<int> parents);
        virtual ~SPnDE() = default;
        std::vector<std::string> graph(const std::string& name = "SPnDE") const override;
    protected:
        void buildModel(const torch::Tensor& weights) override;
    private:
        std::vector<int> parents;
    };
}
#endif

40
bayesnet/ensembles/A2DE.cc Normal file

@@ -0,0 +1,40 @@
// ***************************************************************
// SPDX-FileCopyrightText: Copyright 2024 Ricardo Montañana Gómez
// SPDX-FileType: SOURCE
// SPDX-License-Identifier: MIT
// ***************************************************************
#include "A2DE.h"
namespace bayesnet {
    A2DE::A2DE(bool predict_voting) : Ensemble(predict_voting)
    {
        validHyperparameters = { "predict_voting" };
    }
    void A2DE::setHyperparameters(const nlohmann::json& hyperparameters_)
    {
        auto hyperparameters = hyperparameters_;
        if (hyperparameters.contains("predict_voting")) {
            predict_voting = hyperparameters["predict_voting"];
            hyperparameters.erase("predict_voting");
        }
        Classifier::setHyperparameters(hyperparameters);
    }
    void A2DE::buildModel(const torch::Tensor& weights)
    {
        models.clear();
        significanceModels.clear();
        for (int i = 0; i < features.size() - 1; ++i) {
            for (int j = i + 1; j < features.size(); ++j) {
                auto model = std::make_unique<SPnDE>(std::vector<int>({ i, j }));
                models.push_back(std::move(model));
            }
        }
        n_models = static_cast<unsigned>(models.size());
        significanceModels = std::vector<double>(n_models, 1.0);
    }
    std::vector<std::string> A2DE::graph(const std::string& title) const
    {
        return Ensemble::graph(title);
    }
}
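buildModel enumerates every unordered pair of features, so an A2DE ensemble holds n(n-1)/2 SPnDE members: 6 models for 4 features, 36 for 9. A quick sketch of that enumeration outside the library:

```cpp
#include <iostream>
#include <utility>
#include <vector>

// Illustrative only: enumerate the feature pairs A2DE::buildModel iterates,
// one SPnDE member per unordered pair (i, j) with i < j.
int main() {
    int n_features = 4;
    std::vector<std::pair<int, int>> pairs;
    for (int i = 0; i < n_features - 1; ++i) {
        for (int j = i + 1; j < n_features; ++j) {
            pairs.emplace_back(i, j);
        }
    }
    std::cout << pairs.size() << " models\n"; // n(n-1)/2 = 6 for n = 4
    for (const auto& [i, j] : pairs) {
        std::cout << "SPnDE(" << i << ", " << j << ")\n";
    }
    return 0;
}
```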

22
bayesnet/ensembles/A2DE.h Normal file

@@ -0,0 +1,22 @@
// ***************************************************************
// SPDX-FileCopyrightText: Copyright 2024 Ricardo Montañana Gómez
// SPDX-FileType: SOURCE
// SPDX-License-Identifier: MIT
// ***************************************************************
#ifndef A2DE_H
#define A2DE_H
#include "bayesnet/classifiers/SPnDE.h"
#include "Ensemble.h"
namespace bayesnet {
    class A2DE : public Ensemble {
    public:
        A2DE(bool predict_voting = false);
        virtual ~A2DE() {};
        void setHyperparameters(const nlohmann::json& hyperparameters) override;
        std::vector<std::string> graph(const std::string& title = "A2DE") const override;
    protected:
        void buildModel(const torch::Tensor& weights) override;
    };
}
#endif

246
bayesnet/ensembles/Boost.cc Normal file

@@ -0,0 +1,246 @@
// ***************************************************************
// SPDX-FileCopyrightText: Copyright 2024 Ricardo Montañana Gómez
// SPDX-FileType: SOURCE
// SPDX-License-Identifier: MIT
// ***************************************************************
#include <folding.hpp>
#include "bayesnet/feature_selection/CFS.h"
#include "bayesnet/feature_selection/FCBF.h"
#include "bayesnet/feature_selection/IWSS.h"
#include "Boost.h"
namespace bayesnet {
    Boost::Boost(bool predict_voting) : Ensemble(predict_voting)
    {
        validHyperparameters = { "order", "convergence", "convergence_best", "bisection", "threshold", "maxTolerance",
            "predict_voting", "select_features", "block_update" };
    }
    void Boost::setHyperparameters(const nlohmann::json& hyperparameters_)
    {
        auto hyperparameters = hyperparameters_;
        if (hyperparameters.contains("order")) {
            std::vector<std::string> algos = { Orders.ASC, Orders.DESC, Orders.RAND };
            order_algorithm = hyperparameters["order"];
            if (std::find(algos.begin(), algos.end(), order_algorithm) == algos.end()) {
                throw std::invalid_argument("Invalid order algorithm, valid values [" + Orders.ASC + ", " + Orders.DESC + ", " + Orders.RAND + "]");
            }
            hyperparameters.erase("order");
        }
        if (hyperparameters.contains("convergence")) {
            convergence = hyperparameters["convergence"];
            hyperparameters.erase("convergence");
        }
        if (hyperparameters.contains("convergence_best")) {
            convergence_best = hyperparameters["convergence_best"];
            hyperparameters.erase("convergence_best");
        }
        if (hyperparameters.contains("bisection")) {
            bisection = hyperparameters["bisection"];
            hyperparameters.erase("bisection");
        }
        if (hyperparameters.contains("threshold")) {
            threshold = hyperparameters["threshold"];
            hyperparameters.erase("threshold");
        }
        if (hyperparameters.contains("maxTolerance")) {
            maxTolerance = hyperparameters["maxTolerance"];
            if (maxTolerance < 1 || maxTolerance > 4)
                throw std::invalid_argument("Invalid maxTolerance value, must be in [1, 4]");
            hyperparameters.erase("maxTolerance");
        }
        if (hyperparameters.contains("predict_voting")) {
            predict_voting = hyperparameters["predict_voting"];
            hyperparameters.erase("predict_voting");
        }
        if (hyperparameters.contains("select_features")) {
            auto selectedAlgorithm = hyperparameters["select_features"];
            std::vector<std::string> algos = { SelectFeatures.IWSS, SelectFeatures.CFS, SelectFeatures.FCBF };
            selectFeatures = true;
            select_features_algorithm = selectedAlgorithm;
            if (std::find(algos.begin(), algos.end(), selectedAlgorithm) == algos.end()) {
                throw std::invalid_argument("Invalid selectFeatures value, valid values [" + SelectFeatures.IWSS + ", " + SelectFeatures.CFS + ", " + SelectFeatures.FCBF + "]");
            }
            hyperparameters.erase("select_features");
        }
        if (hyperparameters.contains("block_update")) {
            block_update = hyperparameters["block_update"];
            hyperparameters.erase("block_update");
        }
        Classifier::setHyperparameters(hyperparameters);
    }
    void Boost::buildModel(const torch::Tensor& weights)
    {
        // Models shall be built in trainModel
        models.clear();
        significanceModels.clear();
        n_models = 0;
        // Prepare the validation dataset
        auto y_ = dataset.index({ -1, "..." });
        if (convergence) {
            // Prepare train & validation sets from train data
            auto fold = folding::StratifiedKFold(5, y_, 271);
            auto [train, test] = fold.getFold(0);
            auto train_t = torch::tensor(train);
            auto test_t = torch::tensor(test);
            // Get train and validation sets
            X_train = dataset.index({ torch::indexing::Slice(0, dataset.size(0) - 1), train_t });
            y_train = dataset.index({ -1, train_t });
            X_test = dataset.index({ torch::indexing::Slice(0, dataset.size(0) - 1), test_t });
            y_test = dataset.index({ -1, test_t });
            dataset = X_train;
            m = X_train.size(1);
            auto n_classes = states.at(className).size();
            // Build dataset with train data
            buildDataset(y_train);
            metrics = Metrics(dataset, features, className, n_classes);
        } else {
            // Use all data to train
            X_train = dataset.index({ torch::indexing::Slice(0, dataset.size(0) - 1), "..." });
            y_train = y_;
        }
    }
    std::vector<int> Boost::featureSelection(torch::Tensor& weights_)
    {
        int maxFeatures = 0;
        if (select_features_algorithm == SelectFeatures.CFS) {
            featureSelector = new CFS(dataset, features, className, maxFeatures, states.at(className).size(), weights_);
        } else if (select_features_algorithm == SelectFeatures.IWSS) {
            if (threshold < 0 || threshold > 0.5) {
                throw std::invalid_argument("Invalid threshold value for " + SelectFeatures.IWSS + " [0, 0.5]");
            }
            featureSelector = new IWSS(dataset, features, className, maxFeatures, states.at(className).size(), weights_, threshold);
        } else if (select_features_algorithm == SelectFeatures.FCBF) {
            if (threshold < 1e-7 || threshold > 1) {
                throw std::invalid_argument("Invalid threshold value for " + SelectFeatures.FCBF + " [1e-7, 1]");
            }
            featureSelector = new FCBF(dataset, features, className, maxFeatures, states.at(className).size(), weights_, threshold);
        }
        featureSelector->fit();
        auto featuresUsed = featureSelector->getFeatures();
        delete featureSelector;
        return featuresUsed;
    }
    std::tuple<torch::Tensor&, double, bool> Boost::update_weights(torch::Tensor& ytrain, torch::Tensor& ypred, torch::Tensor& weights)
    {
        bool terminate = false;
        double alpha_t = 0;
        auto mask_wrong = ypred != ytrain;
        auto mask_right = ypred == ytrain;
        auto masked_weights = weights * mask_wrong.to(weights.dtype());
        double epsilon_t = masked_weights.sum().item<double>();
        if (epsilon_t > 0.5) {
            // Inverse the weights policy (plot ln(wt))
            // "In each round of AdaBoost, there is a sanity check to ensure that the current base
            // learner is better than random guess" (Zhi-Hua Zhou, 2012)
            terminate = true;
        } else {
            double wt = (1 - epsilon_t) / epsilon_t;
            alpha_t = epsilon_t == 0 ? 1 : 0.5 * log(wt);
            // Step 3.2: Update weights for next classifier
            // Step 3.2.1: Update weights of wrong samples
            weights += mask_wrong.to(weights.dtype()) * exp(alpha_t) * weights;
            // Step 3.2.2: Update weights of right samples
            weights += mask_right.to(weights.dtype()) * exp(-alpha_t) * weights;
            // Step 3.3: Normalise the weights
            double totalWeights = torch::sum(weights).item<double>();
            weights = weights / totalWeights;
        }
        return { weights, alpha_t, terminate };
    }
    std::tuple<torch::Tensor&, double, bool> Boost::update_weights_block(int k, torch::Tensor& ytrain, torch::Tensor& weights)
    {
        /* Update Block algorithm
            k = # of models in block
            n_models = # of models in ensemble to make predictions
            n_models_bak = # models saved
            models = vector of models to make predictions
            models_bak = models not used to make predictions
            significances_bak = backup of significances vector
            Case list
            A) k = 1, n_models = 1     => n = 0, n_models = n + k
            B) k = 1, n_models = n + 1 => n_models = n + k
            C) k > 1, n_models = k + 1 => n = 1, n_models = n + k
            D) k > 1, n_models = k     => n = 0, n_models = n + k
            E) k > 1, n_models = k + n => n_models = n + k
            A, D) n = 0, k > 0, n_models == k
            1. n_models_bak <- n_models
            2. significances_bak <- significances
            3. significances = vector(k, 1)
            4. Don't move any classifiers out of models
            5. n_models <- k
            6. Make prediction, compute alpha, update weights
            7. Don't restore any classifiers to models
            8. significances <- significances_bak
            9. Update last k significances
            10. n_models <- n_models_bak
            B, C, E) n > 0, k > 0, n_models == n + k
            1. n_models_bak <- n_models
            2. significances_bak <- significances
            3. significances = vector(k, 1)
            4. Move first n classifiers to models_bak
            5. n_models <- k
            6. Make prediction, compute alpha, update weights
            7. Insert classifiers in models_bak to be the first n models
            8. significances <- significances_bak
            9. Update last k significances
            10. n_models <- n_models_bak
        */
        //
        // Make predict with only the last k models
        //
        std::unique_ptr<Classifier> model;
        std::vector<std::unique_ptr<Classifier>> models_bak;
        // 1. n_models_bak <- n_models 2. significances_bak <- significances
        auto significance_bak = significanceModels;
        auto n_models_bak = n_models;
        // 3. significances = vector(k, 1)
        significanceModels = std::vector<double>(k, 1.0);
        // 4. Move first n classifiers to models_bak
        // backup the first n_models - k models (if n_models == k, don't backup any)
        for (int i = 0; i < n_models - k; ++i) {
            model = std::move(models[0]);
            models.erase(models.begin());
            models_bak.push_back(std::move(model));
        }
        assert(models.size() == k);
        // 5. n_models <- k
        n_models = k;
        // 6. Make prediction, compute alpha, update weights
        auto ypred = predict(X_train);
        //
        // Update weights
        //
        double alpha_t;
        bool terminate;
        std::tie(weights, alpha_t, terminate) = update_weights(y_train, ypred, weights);
        //
        // Restore the models if needed
        //
        // 7. Insert classifiers in models_bak to be the first n models
        // if n_models_bak == k, don't restore any, because none of them were moved
        if (k != n_models_bak) {
            // Insert in the same order as they were extracted
            int bak_size = models_bak.size();
            for (int i = 0; i < bak_size; ++i) {
                model = std::move(models_bak[bak_size - 1 - i]);
                models_bak.erase(models_bak.end() - 1);
                models.insert(models.begin(), std::move(model));
            }
        }
        // 8. significances <- significances_bak
        significanceModels = significance_bak;
        //
        // Update the significance of the last k models
        //
        // 9. Update last k significances
        for (int i = 0; i < k; ++i) {
            significanceModels[n_models_bak - k + i] = alpha_t;
        }
        // 10. n_models <- n_models_bak
        n_models = n_models_bak;
        return { weights, alpha_t, terminate };
    }
}
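The heart of update_weights is the standard AdaBoost update: the weighted error ε_t yields α_t = ½·ln((1-ε_t)/ε_t), misclassified samples are scaled up, correct ones down, and the weights are renormalised. Because the scaled term is added to the original weights (w += mask * exp(±α_t) * w), the effective factors are 1 + e^(±α_t). A torch-free sketch of the same arithmetic on a toy four-sample case:

```cpp
#include <cmath>
#include <iostream>
#include <vector>

// Torch-free sketch of the arithmetic in Boost::update_weights above:
// four samples with uniform weights, one of them misclassified.
int main() {
    std::vector<double> weights(4, 0.25);
    std::vector<bool> wrong = { false, false, false, true };
    // Weighted error: sum of the weights of the misclassified samples
    double epsilon_t = 0.0;
    for (size_t i = 0; i < weights.size(); ++i) {
        if (wrong[i]) epsilon_t += weights[i];                    // 0.25
    }
    double alpha_t = 0.5 * std::log((1 - epsilon_t) / epsilon_t); // ~0.5493
    // w += mask * exp(±alpha) * w, i.e. w *= 1 + exp(±alpha)
    double total = 0.0;
    for (size_t i = 0; i < weights.size(); ++i) {
        weights[i] *= 1 + std::exp(wrong[i] ? alpha_t : -alpha_t);
        total += weights[i];
    }
    for (auto& w : weights) w /= total;   // renormalise to sum 1
    for (auto w : weights) std::cout << w << ' ';
    std::cout << "\n"; // ~0.211 0.211 0.211 0.366: the wrong sample now weighs most
    return 0;
}
```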

52
bayesnet/ensembles/Boost.h Normal file

@@ -0,0 +1,52 @@
// ***************************************************************
// SPDX-FileCopyrightText: Copyright 2024 Ricardo Montañana Gómez
// SPDX-FileType: SOURCE
// SPDX-License-Identifier: MIT
// ***************************************************************
#ifndef BOOST_H
#define BOOST_H
#include <string>
#include <tuple>
#include <vector>
#include <nlohmann/json.hpp>
#include <torch/torch.h>
#include "Ensemble.h"
#include "bayesnet/feature_selection/FeatureSelect.h"
namespace bayesnet {
    const struct {
        std::string CFS = "CFS";
        std::string FCBF = "FCBF";
        std::string IWSS = "IWSS";
    } SelectFeatures;
    const struct {
        std::string ASC = "asc";
        std::string DESC = "desc";
        std::string RAND = "rand";
    } Orders;
    class Boost : public Ensemble {
    public:
        explicit Boost(bool predict_voting = false);
        virtual ~Boost() = default;
        void setHyperparameters(const nlohmann::json& hyperparameters_) override;
    protected:
        std::vector<int> featureSelection(torch::Tensor& weights_);
        void buildModel(const torch::Tensor& weights) override;
        std::tuple<torch::Tensor&, double, bool> update_weights(torch::Tensor& ytrain, torch::Tensor& ypred, torch::Tensor& weights);
        std::tuple<torch::Tensor&, double, bool> update_weights_block(int k, torch::Tensor& ytrain, torch::Tensor& weights);
        torch::Tensor X_train, y_train, X_test, y_test;
        // Hyperparameters
        bool bisection = true; // if true, use the bisection strategy to add k models at once to the ensemble
        int maxTolerance = 3;
        std::string order_algorithm; // order to process the KBest features: asc, desc, rand
        bool convergence = true; // if true, stop when the model does not improve
        bool convergence_best = false; // whether to use the best accuracy so far or the last accuracy as the prior accuracy
        bool selectFeatures = false; // if true, use feature selection
        std::string select_features_algorithm = Orders.DESC; // selected feature selection algorithm
        FeatureSelect* featureSelector = nullptr;
        double threshold = -1;
        bool block_update = false;
    };
}
#endif
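Given validHyperparameters above, a Boost-derived classifier is configured through a nlohmann::json object whose keys match that list. A sketch of one plausible configuration (values echo the member initialisers above; makeBoostHyperparameters is an illustrative helper, not part of the library):

```cpp
#include <nlohmann/json.hpp>

// Illustrative helper: build a hyperparameter set using only the keys
// declared in validHyperparameters / parsed by Boost::setHyperparameters.
nlohmann::json makeBoostHyperparameters() {
    return {
        { "order", "desc" },           // KBest processing order: asc, desc or rand
        { "convergence", true },       // stop when validation accuracy stops improving
        { "convergence_best", false }, // compare against the last (false) or best (true) accuracy
        { "bisection", true },         // add 2^tolerance models per pack
        { "maxTolerance", 3 },         // non-improving packs allowed, in [1, 4]
        { "select_features", "CFS" },  // initial feature selection: CFS, FCBF or IWSS
        { "block_update", false }      // use the block weight-update algorithm
    };
}
```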

167
bayesnet/ensembles/BoostA2DE.cc Normal file

@@ -0,0 +1,167 @@
// ***************************************************************
// SPDX-FileCopyrightText: Copyright 2024 Ricardo Montañana Gómez
// SPDX-FileType: SOURCE
// SPDX-License-Identifier: MIT
// ***************************************************************
#include <set>
#include <functional>
#include <limits.h>
#include <tuple>
#include <folding.hpp>
#include "bayesnet/feature_selection/CFS.h"
#include "bayesnet/feature_selection/FCBF.h"
#include "bayesnet/feature_selection/IWSS.h"
#include "BoostA2DE.h"
namespace bayesnet {
    BoostA2DE::BoostA2DE(bool predict_voting) : Boost(predict_voting)
    {
    }
    std::vector<int> BoostA2DE::initializeModels()
    {
        torch::Tensor weights_ = torch::full({ m }, 1.0 / m, torch::kFloat64);
        std::vector<int> featuresSelected = featureSelection(weights_);
        if (featuresSelected.size() < 2) {
            notes.push_back("No features selected in initialization");
            status = ERROR;
            return std::vector<int>();
        }
        for (int i = 0; i < featuresSelected.size() - 1; i++) {
            for (int j = i + 1; j < featuresSelected.size(); j++) {
                auto parents = { featuresSelected[i], featuresSelected[j] };
                std::unique_ptr<Classifier> model = std::make_unique<SPnDE>(parents);
                model->fit(dataset, features, className, states, weights_);
                models.push_back(std::move(model));
                significanceModels.push_back(1.0); // They will be updated later in trainModel
                n_models++;
            }
        }
        notes.push_back("Used features in initialization: " + std::to_string(featuresSelected.size()) + " of " + std::to_string(features.size()) + " with " + select_features_algorithm);
        return featuresSelected;
    }
    void BoostA2DE::trainModel(const torch::Tensor& weights)
    {
        //
        // Logging setup
        //
        // loguru::set_thread_name("BoostA2DE");
        // loguru::g_stderr_verbosity = loguru::Verbosity_OFF;
        // loguru::add_file("boostA2DE.log", loguru::Truncate, loguru::Verbosity_MAX);

        // Algorithm based on the adaboost algorithm for classification
        // as explained in Ensemble methods (Zhi-Hua Zhou, 2012)
        fitted = true;
        double alpha_t = 0;
        torch::Tensor weights_ = torch::full({ m }, 1.0 / m, torch::kFloat64);
        bool finished = false;
        std::vector<int> featuresUsed;
        if (selectFeatures) {
            featuresUsed = initializeModels();
            auto ypred = predict(X_train);
            std::tie(weights_, alpha_t, finished) = update_weights(y_train, ypred, weights_);
            // Update significance of the models
            for (int i = 0; i < n_models; ++i) {
                significanceModels[i] = alpha_t;
            }
            if (finished) {
                return;
            }
        }
        int numItemsPack = 0; // The counter of the models inserted in the current pack
        // Variables to control the accuracy finish condition
        double priorAccuracy = 0.0;
        double improvement = 1.0;
        double convergence_threshold = 1e-4;
        int tolerance = 0; // number of times the accuracy is lower than the convergence_threshold
        // Step 0: Set the finish condition
        // epsilon sub t > 0.5 => inverse the weights policy
        // validation error is not decreasing
        // run out of features
        bool ascending = order_algorithm == Orders.ASC;
        std::mt19937 g{ 173 };
        std::vector<std::pair<int, int>> pairSelection;
        while (!finished) {
            // Step 1: Build ranking with mutual information
            pairSelection = metrics.SelectKPairs(weights_, featuresUsed, ascending, 0); // Get all the pairs sorted
            if (order_algorithm == Orders.RAND) {
                std::shuffle(pairSelection.begin(), pairSelection.end(), g);
            }
            int k = bisection ? pow(2, tolerance) : 1;
            int counter = 0; // The model counter of the current pack
            // VLOG_SCOPE_F(1, "counter=%d k=%d featureSelection.size: %zu", counter, k, featureSelection.size());
            while (counter++ < k && pairSelection.size() > 0) {
                auto feature_pair = pairSelection[0];
                pairSelection.erase(pairSelection.begin());
                std::unique_ptr<Classifier> model;
                model = std::make_unique<SPnDE>(std::vector<int>({ feature_pair.first, feature_pair.second }));
                model->fit(dataset, features, className, states, weights_);
                alpha_t = 0.0;
                if (!block_update) {
                    auto ypred = model->predict(X_train);
                    // Step 3.1: Compute the classifier amount of say
                    std::tie(weights_, alpha_t, finished) = update_weights(y_train, ypred, weights_);
                }
                // Step 3.4: Store classifier and its accuracy to weigh its future vote
                numItemsPack++;
                models.push_back(std::move(model));
                significanceModels.push_back(alpha_t);
                n_models++;
                // VLOG_SCOPE_F(2, "numItemsPack: %d n_models: %d featuresUsed: %zu", numItemsPack, n_models, featuresUsed.size());
            }
            if (block_update) {
                std::tie(weights_, alpha_t, finished) = update_weights_block(k, y_train, weights_);
            }
            if (convergence && !finished) {
                auto y_val_predict = predict(X_test);
                double accuracy = (y_val_predict == y_test).sum().item<double>() / (double)y_test.size(0);
                if (priorAccuracy == 0) {
                    priorAccuracy = accuracy;
                } else {
                    improvement = accuracy - priorAccuracy;
                }
                if (improvement < convergence_threshold) {
                    // VLOG_SCOPE_F(3, " (improvement<threshold) tolerance: %d numItemsPack: %d improvement: %f prior: %f current: %f", tolerance, numItemsPack, improvement, priorAccuracy, accuracy);
                    tolerance++;
                } else {
                    // VLOG_SCOPE_F(3, "* (improvement>=threshold) Reset. tolerance: %d numItemsPack: %d improvement: %f prior: %f current: %f", tolerance, numItemsPack, improvement, priorAccuracy, accuracy);
                    tolerance = 0; // Reset the counter if the model performs better
                    numItemsPack = 0;
                }
                if (convergence_best) {
                    // Keep the best accuracy until now as the prior accuracy
                    priorAccuracy = std::max(accuracy, priorAccuracy);
                } else {
                    // Keep the last accuracy obtained as the prior accuracy
                    priorAccuracy = accuracy;
                }
            }
            // VLOG_SCOPE_F(1, "tolerance: %d featuresUsed.size: %zu features.size: %zu", tolerance, featuresUsed.size(), features.size());
            finished = finished || tolerance > maxTolerance || pairSelection.size() == 0;
        }
        if (tolerance > maxTolerance) {
            if (numItemsPack < n_models) {
                notes.push_back("Convergence threshold reached & " + std::to_string(numItemsPack) + " models eliminated");
                // VLOG_SCOPE_F(4, "Convergence threshold reached & %d models eliminated of %d", numItemsPack, n_models);
                for (int i = 0; i < numItemsPack; ++i) {
                    significanceModels.pop_back();
                    models.pop_back();
                    n_models--;
                }
            } else {
                notes.push_back("Convergence threshold reached & 0 models eliminated");
                // VLOG_SCOPE_F(4, "Convergence threshold reached & 0 models eliminated n_models=%d numItemsPack=%d", n_models, numItemsPack);
            }
        }
        if (pairSelection.size() > 0) {
            notes.push_back("Pairs not used in train: " + std::to_string(pairSelection.size()));
            status = WARNING;
        }
        notes.push_back("Number of models: " + std::to_string(n_models));
    }
    std::vector<std::string> BoostA2DE::graph(const std::string& title) const
    {
        return Ensemble::graph(title);
    }
}
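One detail of trainModel worth tracing: with bisection enabled the pack size is k = 2^tolerance, so every consecutive non-improving round doubles the number of SPnDE models added in the next pack until maxTolerance ends training. A small sketch of that schedule:

```cpp
#include <cmath>
#include <iostream>

// Sketch of the pack-size schedule in trainModel above: with bisection on,
// k = 2^tolerance, so stalled rounds add models in packs of 1, 2, 4, 8.
int main() {
    int maxTolerance = 3; // the default declared in Boost
    for (int tolerance = 0; tolerance <= maxTolerance; ++tolerance) {
        int k = static_cast<int>(std::pow(2, tolerance));
        std::cout << "tolerance=" << tolerance << " -> pack of " << k << " models\n";
    }
    return 0;
}
```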

25
bayesnet/ensembles/BoostA2DE.h Normal file

@@ -0,0 +1,25 @@
// ***************************************************************
// SPDX-FileCopyrightText: Copyright 2024 Ricardo Montañana Gómez
// SPDX-FileType: SOURCE
// SPDX-License-Identifier: MIT
// ***************************************************************
#ifndef BOOSTA2DE_H
#define BOOSTA2DE_H
#include <string>
#include <vector>
#include "bayesnet/classifiers/SPnDE.h"
#include "Boost.h"
namespace bayesnet {
    class BoostA2DE : public Boost {
    public:
        explicit BoostA2DE(bool predict_voting = false);
        virtual ~BoostA2DE() = default;
        std::vector<std::string> graph(const std::string& title = "BoostA2DE") const override;
    protected:
        void trainModel(const torch::Tensor& weights) override;
    private:
        std::vector<int> initializeModels();
    };
}
#endif

bayesnet/ensembles/BoostAODE.cc

@@ -4,276 +4,41 @@
// SPDX-License-Identifier: MIT // SPDX-License-Identifier: MIT
// *************************************************************** // ***************************************************************
#include <random>
#include <set> #include <set>
#include <functional> #include <functional>
#include <limits.h> #include <limits.h>
#include <tuple> #include <tuple>
#include <folding.hpp>
#include "bayesnet/feature_selection/CFS.h"
#include "bayesnet/feature_selection/FCBF.h"
#include "bayesnet/feature_selection/IWSS.h"
#include "BoostAODE.h" #include "BoostAODE.h"
#include "bayesnet/utils/loguru.cpp"
namespace bayesnet { namespace bayesnet {
BoostAODE::BoostAODE(bool predict_voting) : Ensemble(predict_voting) BoostAODE::BoostAODE(bool predict_voting) : Boost(predict_voting)
{ {
validHyperparameters = {
"maxModels", "bisection", "order", "convergence", "threshold",
"select_features", "maxTolerance", "predict_voting", "block_update"
};
}
void BoostAODE::buildModel(const torch::Tensor& weights)
{
// Models shall be built in trainModel
models.clear();
significanceModels.clear();
n_models = 0;
// Prepare the validation dataset
auto y_ = dataset.index({ -1, "..." });
if (convergence) {
// Prepare train & validation sets from train data
auto fold = folding::StratifiedKFold(5, y_, 271);
auto [train, test] = fold.getFold(0);
auto train_t = torch::tensor(train);
auto test_t = torch::tensor(test);
// Get train and validation sets
X_train = dataset.index({ torch::indexing::Slice(0, dataset.size(0) - 1), train_t });
y_train = dataset.index({ -1, train_t });
X_test = dataset.index({ torch::indexing::Slice(0, dataset.size(0) - 1), test_t });
y_test = dataset.index({ -1, test_t });
dataset = X_train;
m = X_train.size(1);
auto n_classes = states.at(className).size();
// Build dataset with train data
buildDataset(y_train);
metrics = Metrics(dataset, features, className, n_classes);
} else {
// Use all data to train
X_train = dataset.index({ torch::indexing::Slice(0, dataset.size(0) - 1), "..." });
y_train = y_;
}
}
void BoostAODE::setHyperparameters(const nlohmann::json& hyperparameters_)
{
auto hyperparameters = hyperparameters_;
if (hyperparameters.contains("order")) {
std::vector<std::string> algos = { Orders.ASC, Orders.DESC, Orders.RAND };
order_algorithm = hyperparameters["order"];
if (std::find(algos.begin(), algos.end(), order_algorithm) == algos.end()) {
throw std::invalid_argument("Invalid order algorithm, valid values [" + Orders.ASC + ", " + Orders.DESC + ", " + Orders.RAND + "]");
}
hyperparameters.erase("order");
}
if (hyperparameters.contains("convergence")) {
convergence = hyperparameters["convergence"];
hyperparameters.erase("convergence");
}
if (hyperparameters.contains("bisection")) {
bisection = hyperparameters["bisection"];
hyperparameters.erase("bisection");
}
if (hyperparameters.contains("threshold")) {
threshold = hyperparameters["threshold"];
hyperparameters.erase("threshold");
}
if (hyperparameters.contains("maxTolerance")) {
maxTolerance = hyperparameters["maxTolerance"];
if (maxTolerance < 1 || maxTolerance > 4)
throw std::invalid_argument("Invalid maxTolerance value, must be greater in [1, 4]");
hyperparameters.erase("maxTolerance");
}
if (hyperparameters.contains("predict_voting")) {
predict_voting = hyperparameters["predict_voting"];
hyperparameters.erase("predict_voting");
}
if (hyperparameters.contains("select_features")) {
auto selectedAlgorithm = hyperparameters["select_features"];
std::vector<std::string> algos = { SelectFeatures.IWSS, SelectFeatures.CFS, SelectFeatures.FCBF };
selectFeatures = true;
select_features_algorithm = selectedAlgorithm;
if (std::find(algos.begin(), algos.end(), selectedAlgorithm) == algos.end()) {
throw std::invalid_argument("Invalid selectFeatures value, valid values [" + SelectFeatures.IWSS + ", " + SelectFeatures.CFS + ", " + SelectFeatures.FCBF + "]");
}
hyperparameters.erase("select_features");
}
if (hyperparameters.contains("block_update")) {
block_update = hyperparameters["block_update"];
hyperparameters.erase("block_update");
}
Classifier::setHyperparameters(hyperparameters);
}
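// Editorial example of a hyperparameter set the method above would accept
// (the values are illustrative, every key matches a branch in the parser):
// { "order": "desc", "convergence": true, "maxTolerance": 3,
//   "select_features": "CFS", "block_update": false }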
std::tuple<torch::Tensor&, double, bool> update_weights(torch::Tensor& ytrain, torch::Tensor& ypred, torch::Tensor& weights)
{
bool terminate = false;
double alpha_t = 0;
auto mask_wrong = ypred != ytrain;
auto mask_right = ypred == ytrain;
auto masked_weights = weights * mask_wrong.to(weights.dtype());
double epsilon_t = masked_weights.sum().item<double>();
if (epsilon_t > 0.5) {
// Inverse the weights policy (plot ln(wt))
// "In each round of AdaBoost, there is a sanity check to ensure that the current base
// learner is better than random guess" (Zhi-Hua Zhou, 2012)
terminate = true;
} else {
double wt = (1 - epsilon_t) / epsilon_t;
alpha_t = epsilon_t == 0 ? 1 : 0.5 * log(wt);
// Step 3.2: Update weights for next classifier
// Step 3.2.1: Update weights of wrong samples
weights += mask_wrong.to(weights.dtype()) * exp(alpha_t) * weights;
// Step 3.2.2: Update weights of right samples
weights += mask_right.to(weights.dtype()) * exp(-alpha_t) * weights;
// Step 3.3: Normalise the weights
double totalWeights = torch::sum(weights).item<double>();
weights = weights / totalWeights;
}
return { weights, alpha_t, terminate };
}
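// Editorial note on the update above: with epsilon_t the weighted error,
// alpha_t = 0.5 * ln((1 - epsilon_t) / epsilon_t) (and alpha_t = 1 when
// epsilon_t == 0). As written, misclassified samples are scaled by
// (1 + exp(alpha_t)) and correct ones by (1 + exp(-alpha_t)) before the whole
// vector is renormalised to sum to 1, which preserves the AdaBoost ordering of
// weights even though the classic rule multiplies by exp(+/-alpha_t). The early
// exit for epsilon_t > 0.5 is the sanity check quoted from Zhou (2012).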
std::tuple<torch::Tensor&, double, bool> BoostAODE::update_weights_block(int k, torch::Tensor& ytrain, torch::Tensor& weights)
{
/* Update Block algorithm
k = # of models in block
n_models = # of models in ensemble to make predictions
n_models_bak = # models saved
models = vector of models to make predictions
models_bak = models not used to make predictions
significances_bak = backup of significances vector
Case list
A) k = 1, n_models = 1 => n = 0 , n_models = n + k
B) k = 1, n_models = n + 1 => n_models = n + k
C) k > 1, n_models = k + 1 => n= 1, n_models = n + k
D) k > 1, n_models = k => n = 0, n_models = n + k
E) k > 1, n_models = k + n => n_models = n + k
A, D) n=0, k > 0, n_models == k
1. n_models_bak <- n_models
2. significances_bak <- significances
3. significances = vector(k, 1)
4. Don't move any classifiers out of models
5. n_models <- k
6. Make prediction, compute alpha, update weights
7. Don't restore any classifiers to models
8. significances <- significances_bak
9. Update last k significances
10. n_models <- n_models_bak
B, C, E) n > 0, k > 0, n_models == n + k
1. n_models_bak <- n_models
2. significances_bak <- significances
3. significances = vector(k, 1)
4. Move first n classifiers to models_bak
5. n_models <- k
6. Make prediction, compute alpha, update weights
7. Insert classifiers in models_bak to be the first n models
8. significances <- significances_bak
9. Update last k significances
10. n_models <- n_models_bak
*/
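// Worked example (editorial, case E): with n_models = 5 and k = 2, steps 4-5
// move the first 3 classifiers to models_bak so the prediction uses only the
// last 2; after update_weights, step 7 re-inserts the 3 saved classifiers in
// front, step 8 restores the saved significances, and step 9 overwrites the
// last 2 significances with the alpha_t just computed, leaving n_models = 5.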
//
// Make predict with only the last k models
//
std::unique_ptr<Classifier> model;
std::vector<std::unique_ptr<Classifier>> models_bak;
// 1. n_models_bak <- n_models 2. significances_bak <- significances
auto significance_bak = significanceModels;
auto n_models_bak = n_models;
// 3. significances = vector(k, 1)
significanceModels = std::vector<double>(k, 1.0);
// 4. Move first n classifiers to models_bak
// backup the first n_models - k models (if n_models == k, don't backup any)
VLOG_SCOPE_F(1, "upd_weights_block n_models=%d k=%d", n_models, k);
for (int i = 0; i < n_models - k; ++i) {
model = std::move(models[0]);
models.erase(models.begin());
models_bak.push_back(std::move(model));
}
assert(models.size() == k);
// 5. n_models <- k
n_models = k;
// 6. Make prediction, compute alpha, update weights
auto ypred = predict(X_train);
//
// Update weights
//
double alpha_t;
bool terminate;
std::tie(weights, alpha_t, terminate) = update_weights(y_train, ypred, weights);
//
// Restore the models if needed
//
// 7. Insert classifiers in models_bak to be the first n models
// if n_models_bak == k, don't restore any, because none of them were moved
if (k != n_models_bak) {
// Insert in the same order as they were extracted
int bak_size = models_bak.size();
for (int i = 0; i < bak_size; ++i) {
model = std::move(models_bak[bak_size - 1 - i]);
models_bak.erase(models_bak.end() - 1);
models.insert(models.begin(), std::move(model));
}
}
// 8. significances <- significances_bak
significanceModels = significance_bak;
//
// Update the significance of the last k models
//
// 9. Update last k significances
for (int i = 0; i < k; ++i) {
significanceModels[n_models_bak - k + i] = alpha_t;
}
// 10. n_models <- n_models_bak
n_models = n_models_bak;
return { weights, alpha_t, terminate };
}
std::vector<int> BoostAODE::initializeModels()
{
std::vector<int> featuresUsed;
torch::Tensor weights_ = torch::full({ m }, 1.0 / m, torch::kFloat64);
int maxFeatures = 0;
std::vector<int> featuresSelected = featureSelection(weights_);
if (select_features_algorithm == SelectFeatures.CFS) {
featureSelector = new CFS(dataset, features, className, maxFeatures, states.at(className).size(), weights_);
} else if (select_features_algorithm == SelectFeatures.IWSS) {
if (threshold < 0 || threshold > 0.5) {
throw std::invalid_argument("Invalid threshold value for " + SelectFeatures.IWSS + " [0, 0.5]");
}
featureSelector = new IWSS(dataset, features, className, maxFeatures, states.at(className).size(), weights_, threshold);
} else if (select_features_algorithm == SelectFeatures.FCBF) {
if (threshold < 1e-7 || threshold > 1) {
throw std::invalid_argument("Invalid threshold value for " + SelectFeatures.FCBF + " [1e-7, 1]");
}
featureSelector = new FCBF(dataset, features, className, maxFeatures, states.at(className).size(), weights_, threshold);
}
featureSelector->fit();
auto cfsFeatures = featureSelector->getFeatures();
auto scores = featureSelector->getScores();
for (int i = 0; i < cfsFeatures.size(); ++i) {
LOG_F(INFO, "Feature: %d Score: %f", cfsFeatures[i], scores[i]);
}
for (const int& feature : cfsFeatures) {
for (const int& feature : featuresSelected) {
featuresUsed.push_back(feature);
std::unique_ptr<Classifier> model = std::make_unique<SPODE>(feature);
model->fit(dataset, features, className, states, weights_);
models.push_back(std::move(model));
significanceModels.push_back(1.0); // They will be updated later in trainModel
n_models++;
}
notes.push_back("Used features in initialization: " + std::to_string(featuresUsed.size()) + " of " + std::to_string(features.size()) + " with " + select_features_algorithm);
notes.push_back("Used features in initialization: " + std::to_string(featuresSelected.size()) + " of " + std::to_string(features.size()) + " with " + select_features_algorithm);
delete featureSelector;
return featuresUsed;
return featuresSelected;
}
void BoostAODE::trainModel(const torch::Tensor& weights)
{
//
// Logging setup
//
loguru::set_thread_name("BoostAODE");
// loguru::set_thread_name("BoostAODE");
loguru::g_stderr_verbosity = loguru::Verbosity_OFF;
// loguru::g_stderr_verbosity = loguru::Verbosity_OFF;
loguru::add_file("boostAODE.log", loguru::Truncate, loguru::Verbosity_MAX);
// loguru::add_file("boostAODE.log", loguru::Truncate, loguru::Verbosity_MAX);
// Algorithm based on the adaboost algorithm for classification
// as explained in Ensemble methods (Zhi-Hua Zhou, 2012)
fitted = true;
@@ -292,11 +57,6 @@ namespace bayesnet {
if (finished) {
return;
}
LOG_F(INFO, "Initial models: %d", n_models);
LOG_F(INFO, "Significances: ");
for (int i = 0; i < n_models; ++i) {
LOG_F(INFO, "i=%d significance=%f", i, significanceModels[i]);
}
}
int numItemsPack = 0; // The counter of the models inserted in the current pack
// Variables to control the accuracy finish condition
@@ -313,7 +73,6 @@ namespace bayesnet {
while (!finished) {
// Step 1: Build ranking with mutual information
auto featureSelection = metrics.SelectKBestWeighted(weights_, ascending, n); // Get all the features sorted
VLOG_SCOPE_F(1, "featureSelection.size: %zu featuresUsed.size: %zu", featureSelection.size(), featuresUsed.size());
if (order_algorithm == Orders.RAND) {
std::shuffle(featureSelection.begin(), featureSelection.end(), g);
}
@@ -322,9 +81,9 @@ namespace bayesnet {
{ return std::find(begin(featuresUsed), end(featuresUsed), x) != end(featuresUsed);}),
end(featureSelection)
);
int k = pow(2, tolerance);
int k = bisection ? pow(2, tolerance) : 1;
int counter = 0; // The model counter of the current pack
VLOG_SCOPE_F(1, "counter=%d k=%d featureSelection.size: %zu", counter, k, featureSelection.size());
// VLOG_SCOPE_F(1, "counter=%d k=%d featureSelection.size: %zu", counter, k, featureSelection.size());
while (counter++ < k && featureSelection.size() > 0) {
auto feature = featureSelection[0];
featureSelection.erase(featureSelection.begin());
@@ -336,10 +95,6 @@ namespace bayesnet {
auto ypred = model->predict(X_train);
// Step 3.1: Compute the classifier amount of say
std::tie(weights_, alpha_t, finished) = update_weights(y_train, ypred, weights_);
if (finished) {
VLOG_SCOPE_F(2, "** epsilon_t > 0.5 **");
break;
}
}
// Step 3.4: Store classifier and its accuracy to weigh its future vote
numItemsPack++;
@@ -347,7 +102,7 @@ namespace bayesnet {
models.push_back(std::move(model));
significanceModels.push_back(alpha_t);
n_models++;
VLOG_SCOPE_F(2, "numItemsPack: %d n_models: %d featuresUsed: %zu", numItemsPack, n_models, featuresUsed.size());
// VLOG_SCOPE_F(2, "numItemsPack: %d n_models: %d featuresUsed: %zu", numItemsPack, n_models, featuresUsed.size());
}
if (block_update) {
std::tie(weights_, alpha_t, finished) = update_weights_block(k, y_train, weights_);
@@ -357,37 +112,40 @@ namespace bayesnet {
double accuracy = (y_val_predict == y_test).sum().item<double>() / (double)y_test.size(0);
if (priorAccuracy == 0) {
priorAccuracy = accuracy;
VLOG_SCOPE_F(3, "First accuracy: %f", priorAccuracy);
} else {
improvement = accuracy - priorAccuracy;
}
if (improvement < convergence_threshold) {
VLOG_SCOPE_F(3, "(improvement<threshold) tolerance: %d numItemsPack: %d improvement: %f prior: %f current: %f", tolerance, numItemsPack, improvement, priorAccuracy, accuracy);
// VLOG_SCOPE_F(3, " (improvement<threshold) tolerance: %d numItemsPack: %d improvement: %f prior: %f current: %f", tolerance, numItemsPack, improvement, priorAccuracy, accuracy);
tolerance++;
} else {
VLOG_SCOPE_F(3, "*(improvement>=threshold) Reset. tolerance: %d numItemsPack: %d improvement: %f prior: %f current: %f", tolerance, numItemsPack, improvement, priorAccuracy, accuracy);
// VLOG_SCOPE_F(3, "* (improvement>=threshold) Reset. tolerance: %d numItemsPack: %d improvement: %f prior: %f current: %f", tolerance, numItemsPack, improvement, priorAccuracy, accuracy);
tolerance = 0; // Reset the counter if the model performs better
numItemsPack = 0;
}
if (convergence_best) {
// Keep the best accuracy until now as the prior accuracy
priorAccuracy = std::max(accuracy, priorAccuracy);
// priorAccuracy = accuracy;
} else {
// Keep the last accuracy obtained as the prior accuracy
priorAccuracy = accuracy;
}
VLOG_SCOPE_F(1, "tolerance: %d featuresUsed.size: %zu features.size: %zu", tolerance, featuresUsed.size(), features.size());
}
// VLOG_SCOPE_F(1, "tolerance: %d featuresUsed.size: %zu features.size: %zu", tolerance, featuresUsed.size(), features.size());
finished = finished || tolerance > maxTolerance || featuresUsed.size() == features.size();
}
if (tolerance > maxTolerance) {
if (numItemsPack < n_models) {
notes.push_back("Convergence threshold reached & " + std::to_string(numItemsPack) + " models eliminated");
VLOG_SCOPE_F(4, "Convergence threshold reached & %d models eliminated of %d", numItemsPack, n_models);
// VLOG_SCOPE_F(4, "Convergence threshold reached & %d models eliminated of %d", numItemsPack, n_models);
for (int i = 0; i < numItemsPack; ++i) {
significanceModels.pop_back();
models.pop_back();
n_models--;
}
} else {
VLOG_SCOPE_F(4, "Convergence threshold reached & 0 models eliminated n_models=%d numItemsPack=%d", n_models, numItemsPack);
notes.push_back("Convergence threshold reached & 0 models eliminated");
// VLOG_SCOPE_F(4, "Convergence threshold reached & 0 models eliminated n_models=%d numItemsPack=%d", n_models, numItemsPack);
}
}
if (featuresUsed.size() != features.size()) {

View File

@@ -6,44 +6,21 @@
#ifndef BOOSTAODE_H
#define BOOSTAODE_H
#include <map>
#include <string>
#include <vector>
#include "bayesnet/classifiers/SPODE.h"
#include "bayesnet/feature_selection/FeatureSelect.h"
#include "Boost.h"
#include "Ensemble.h"
namespace bayesnet {
struct {
std::string CFS = "CFS";
std::string FCBF = "FCBF";
std::string IWSS = "IWSS";
}SelectFeatures;
struct {
std::string ASC = "asc";
std::string DESC = "desc";
std::string RAND = "rand";
}Orders;
class BoostAODE : public Ensemble {
class BoostAODE : public Boost {
public:
BoostAODE(bool predict_voting = false);
explicit BoostAODE(bool predict_voting = false);
virtual ~BoostAODE() = default;
std::vector<std::string> graph(const std::string& title = "BoostAODE") const override;
void setHyperparameters(const nlohmann::json& hyperparameters_) override;
protected:
void buildModel(const torch::Tensor& weights) override;
void trainModel(const torch::Tensor& weights) override;
private:
std::tuple<torch::Tensor&, double, bool> update_weights_block(int k, torch::Tensor& ytrain, torch::Tensor& weights);
std::vector<int> initializeModels();
torch::Tensor X_train, y_train, X_test, y_test;
// Hyperparameters
bool bisection = true; // if true, use bisection strategy to add k models at once to the ensemble
int maxTolerance = 3;
std::string order_algorithm; // order to process the KBest features asc, desc, rand
bool convergence = true; // if true, stop when the model does not improve
bool selectFeatures = false; // if true, use feature selection
std::string select_features_algorithm = Orders.DESC; // Selected feature selection algorithm
FeatureSelect* featureSelector = nullptr;
double threshold = -1;
bool block_update = false;
};
}
#endif

View File

@@ -410,11 +410,7 @@ namespace bayesnet {
result.insert(it2, fatherName);
ending = false;
}
} else {
throw std::logic_error("Error in topological sort because of node " + feature + " is not in result");
}
} else {
throw std::logic_error("Error in topological sort because of node father " + fatherName + " is not in result");
}
}
}

View File

@@ -9,7 +9,7 @@
namespace bayesnet {
Node::Node(const std::string& name)
: name(name), numStates(0), cpTable(torch::Tensor()), parents(std::vector<Node*>()), children(std::vector<Node*>())
: name(name)
{
}
void Node::clear()
@@ -96,7 +96,6 @@ namespace bayesnet {
// Get dimensions of the CPT
dimensions.push_back(numStates);
transform(parents.begin(), parents.end(), back_inserter(dimensions), [](const auto& parent) { return parent->getNumStates(); });
// Create a tensor of zeros with the dimensions of the CPT
cpTable = torch::zeros(dimensions, torch::kFloat) + laplaceSmoothing;
// Fill table with counts

View File

@@ -12,14 +12,6 @@
#include <torch/torch.h>
namespace bayesnet {
class Node {
private:
std::string name;
std::vector<Node*> parents;
std::vector<Node*> children;
int numStates; // number of states of the variable
torch::Tensor cpTable; // Order of indices is 0-> node variable, 1-> 1st parent, 2-> 2nd parent, ...
std::vector<int64_t> dimensions; // dimensions of the cpTable
std::vector<std::pair<std::string, std::string>> combinations(const std::vector<std::string>&);
public:
explicit Node(const std::string&);
void clear();
@@ -37,6 +29,14 @@ namespace bayesnet {
unsigned minFill();
std::vector<std::string> graph(const std::string& clasName); // Returns a std::vector of std::strings representing the graph in graphviz format
float getFactorValue(std::map<std::string, int>&);
private:
std::string name;
std::vector<Node*> parents;
std::vector<Node*> children;
int numStates = 0; // number of states of the variable
torch::Tensor cpTable; // Order of indices is 0-> node variable, 1-> 1st parent, 2-> 2nd parent, ...
std::vector<int64_t> dimensions; // dimensions of the cpTable
std::vector<std::pair<std::string, std::string>> combinations(const std::vector<std::string>&);
};
}
#endif

View File

@@ -4,29 +4,79 @@
// SPDX-License-Identifier: MIT
// ***************************************************************
#include <map>
#include <unordered_map>
#include <tuple>
#include "Mst.h"
#include "BayesMetrics.h"
namespace bayesnet {
//samples is n+1xm tensor used to fit the model
Metrics::Metrics(const torch::Tensor& samples, const std::vector<std::string>& features, const std::string& className, const int classNumStates)
: samples(samples)
, features(features)
, className(className)
, classNumStates(classNumStates)
{
}
//samples is n+1xm std::vector used to fit the model
Metrics::Metrics(const std::vector<std::vector<int>>& vsamples, const std::vector<int>& labels, const std::vector<std::string>& features, const std::string& className, const int classNumStates)
: samples(torch::zeros({ static_cast<int>(vsamples.size() + 1), static_cast<int>(vsamples[0].size()) }, torch::kInt32))
, className(className)
, features(features)
, classNumStates(classNumStates)
{
for (int i = 0; i < vsamples.size(); ++i) {
samples.index_put_({ i, "..." }, torch::tensor(vsamples[i], torch::kInt32));
}
samples.index_put_({ -1, "..." }, torch::tensor(labels, torch::kInt32));
}
std::vector<std::pair<int, int>> Metrics::SelectKPairs(const torch::Tensor& weights, std::vector<int>& featuresExcluded, bool ascending, unsigned k)
{
// Return the K best feature pairs
auto n = features.size();
// compute scores
scoresKPairs.clear();
pairsKBest.clear();
auto labels = samples.index({ -1, "..." });
for (int i = 0; i < n - 1; ++i) {
if (std::find(featuresExcluded.begin(), featuresExcluded.end(), i) != featuresExcluded.end()) {
continue;
}
for (int j = i + 1; j < n; ++j) {
if (std::find(featuresExcluded.begin(), featuresExcluded.end(), j) != featuresExcluded.end()) {
continue;
}
auto key = std::make_pair(i, j);
auto value = conditionalMutualInformation(samples.index({ i, "..." }), samples.index({ j, "..." }), labels, weights);
scoresKPairs.push_back({ key, value });
}
}
// sort scores
if (ascending) {
sort(scoresKPairs.begin(), scoresKPairs.end(), [](auto& a, auto& b)
{ return a.second < b.second; });
} else {
sort(scoresKPairs.begin(), scoresKPairs.end(), [](auto& a, auto& b)
{ return a.second > b.second; });
}
for (auto& [pairs, score] : scoresKPairs) {
pairsKBest.push_back(pairs);
}
if (k != 0 && k < pairsKBest.size()) {
if (ascending) {
int limit = pairsKBest.size() - k;
for (int i = 0; i < limit; i++) {
pairsKBest.erase(pairsKBest.begin());
scoresKPairs.erase(scoresKPairs.begin());
}
} else {
pairsKBest.resize(k);
scoresKPairs.resize(k);
}
}
return pairsKBest;
}
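// Editorial usage sketch for the method above (names are placeholders):
//   std::vector<int> excluded;  // features to skip, none here
//   auto pairs = metrics.SelectKPairs(weights, excluded, false, 10);
// With k == 0 all pairs are returned, ranked by conditional mutual information
// I(X_i; X_j | class); otherwise only the k best-scoring pairs are kept.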
std::vector<int> Metrics::SelectKBestWeighted(const torch::Tensor& weights, bool ascending, unsigned k)
{
// Return the K Best features
@@ -66,7 +116,10 @@ namespace bayesnet {
{
return scoresKBest;
}
std::vector<std::pair<std::pair<int, int>, double>> Metrics::getScoresKPairs() const
{
return scoresKPairs;
}
torch::Tensor Metrics::conditionalEdge(const torch::Tensor& weights)
{
auto result = std::vector<double>();
@@ -105,14 +158,8 @@ namespace bayesnet {
}
return matrix;
}
// To use in Python
std::vector<float> Metrics::conditionalEdgeWeights(std::vector<float>& weights_)
{
const torch::Tensor weights = torch::tensor(weights_);
auto matrix = conditionalEdge(weights);
std::vector<float> v(matrix.data_ptr<float>(), matrix.data_ptr<float>() + matrix.numel());
return v;
}
// Measured in nats (natural logarithm (log) base e)
// Elements of Information Theory, 2nd Edition, Thomas M. Cover, Joy A. Thomas p. 14
double Metrics::entropy(const torch::Tensor& feature, const torch::Tensor& weights)
{
torch::Tensor counts = feature.bincount(weights);
@@ -151,10 +198,54 @@ namespace bayesnet {
}
return entropyValue;
}
// I(X;Y) = H(Y) - H(Y|X)
// H(X|Y,C) = sum_{y in Y, c in C} p(y,c) H(X|Y=y,C=c)
double Metrics::conditionalEntropy(const torch::Tensor& firstFeature, const torch::Tensor& secondFeature, const torch::Tensor& labels, const torch::Tensor& weights)
{
// Ensure the tensors are of the same length
assert(firstFeature.size(0) == secondFeature.size(0) && firstFeature.size(0) == labels.size(0) && firstFeature.size(0) == weights.size(0));
// Convert tensors to vectors for easier processing
auto firstFeatureData = firstFeature.accessor<int, 1>();
auto secondFeatureData = secondFeature.accessor<int, 1>();
auto labelsData = labels.accessor<int, 1>();
auto weightsData = weights.accessor<double, 1>();
int numSamples = firstFeature.size(0);
// Maps for joint and marginal probabilities
std::map<std::tuple<int, int, int>, double> jointCount;
std::map<std::tuple<int, int>, double> marginalCount;
// Compute joint counts and the marginal over (y, c), so the quotient below is p(x|y,c)
for (int i = 0; i < numSamples; ++i) {
auto keyJoint = std::make_tuple(firstFeatureData[i], labelsData[i], secondFeatureData[i]);
auto keyMarginal = std::make_tuple(secondFeatureData[i], labelsData[i]);
jointCount[keyJoint] += weightsData[i];
marginalCount[keyMarginal] += weightsData[i];
}
// Total weight sum
double totalWeight = torch::sum(weights).item<double>();
if (totalWeight == 0)
return 0;
// Compute the conditional entropy H(X|Y,C) = -sum_{x,y,c} p(x,y,c) log p(x|y,c)
double conditionalEntropy = 0.0;
for (const auto& [keyJoint, jointFreq] : jointCount) {
auto [x, c, y] = keyJoint;
auto keyMarginal = std::make_tuple(y, c);
double p_x_given_yc = jointFreq / marginalCount[keyMarginal];
if (p_x_given_yc > 0) {
conditionalEntropy -= (jointFreq / totalWeight) * std::log(p_x_given_yc);
}
}
return conditionalEntropy;
}
// I(X;Y) = H(Y) - H(Y|X) ; I(X;Y) >= 0
double Metrics::mutualInformation(const torch::Tensor& firstFeature, const torch::Tensor& secondFeature, const torch::Tensor& weights)
{
return entropy(firstFeature, weights) - conditionalEntropy(firstFeature, secondFeature, weights);
return std::max(entropy(firstFeature, weights) - conditionalEntropy(firstFeature, secondFeature, weights), 0.0);
}
// I(X;Y|C) = H(X|C) - H(X|Y,C) >= 0
double Metrics::conditionalMutualInformation(const torch::Tensor& firstFeature, const torch::Tensor& secondFeature, const torch::Tensor& labels, const torch::Tensor& weights)
{
return std::max(conditionalEntropy(firstFeature, labels, weights) - conditionalEntropy(firstFeature, secondFeature, labels, weights), 0.0);
}
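For reference, a math sketch of the two quantities implemented above, in the notation of the comments:

\[
H(X \mid Y, C) = -\sum_{x,y,c} p(x,y,c)\,\log p(x \mid y, c)
\]
\[
I(X;Y \mid C) = H(X \mid C) - H(X \mid Y, C) \;\ge\; 0
\]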
/*
Compute the maximum spanning tree considering the weights as distances

View File

@@ -16,21 +16,26 @@ namespace bayesnet {
Metrics(const torch::Tensor& samples, const std::vector<std::string>& features, const std::string& className, const int classNumStates);
Metrics(const std::vector<std::vector<int>>& vsamples, const std::vector<int>& labels, const std::vector<std::string>& features, const std::string& className, const int classNumStates);
std::vector<int> SelectKBestWeighted(const torch::Tensor& weights, bool ascending = false, unsigned k = 0);
std::vector<std::pair<int, int>> SelectKPairs(const torch::Tensor& weights, std::vector<int>& featuresExcluded, bool ascending = false, unsigned k = 0);
std::vector<double> getScoresKBest() const;
std::vector<std::pair<std::pair<int, int>, double>> getScoresKPairs() const;
double mutualInformation(const torch::Tensor& firstFeature, const torch::Tensor& secondFeature, const torch::Tensor& weights);
std::vector<float> conditionalEdgeWeights(std::vector<float>& weights); // To use in Python
double conditionalMutualInformation(const torch::Tensor& firstFeature, const torch::Tensor& secondFeature, const torch::Tensor& labels, const torch::Tensor& weights);
torch::Tensor conditionalEdge(const torch::Tensor& weights);
std::vector<std::pair<int, int>> maximumSpanningTree(const std::vector<std::string>& features, const torch::Tensor& weights, const int root);
// Measured in nats (natural logarithm (log) base e)
// Elements of Information Theory, 2nd Edition, Thomas M. Cover, Joy A. Thomas p. 14
double entropy(const torch::Tensor& feature, const torch::Tensor& weights);
double conditionalEntropy(const torch::Tensor& firstFeature, const torch::Tensor& secondFeature, const torch::Tensor& labels, const torch::Tensor& weights);
protected:
torch::Tensor samples; // n+1xm torch::Tensor used to fit the model where samples[-1] is the y std::vector
std::string className;
double entropy(const torch::Tensor& feature, const torch::Tensor& weights);
std::vector<std::string> features;
template <class T>
std::vector<std::pair<T, T>> doCombinations(const std::vector<T>& source)
{
std::vector<std::pair<T, T>> result;
for (int i = 0; i < source.size(); ++i) {
for (int i = 0; i < source.size() - 1; ++i) {
T temp = source[i];
for (int j = i + 1; j < source.size(); ++j) {
result.push_back({ temp, source[j] });
@@ -49,6 +54,8 @@ namespace bayesnet {
int classNumStates = 0;
std::vector<double> scoresKBest;
std::vector<int> featuresKBest; // sorted indices of the features
std::vector<std::pair<int, int>> pairsKBest; // sorted indices of the pairs
std::vector<std::pair<std::pair<int, int>, double>> scoresKPairs;
double conditionalEntropy(const torch::Tensor& firstFeature, const torch::Tensor& secondFeature, const torch::Tensor& weights);
};
}
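
A hypothetical sketch of the ranking API declared above; the tensors and names are placeholders, while the calls match the declarations in this header.

#include <string>
#include <vector>
#include <torch/torch.h>
#include "BayesMetrics.h"

void rankExample(const torch::Tensor& samples, const std::vector<std::string>& features,
                 const std::string& className, int classNumStates,
                 const torch::Tensor& weights)
{
    auto metrics = bayesnet::Metrics(samples, features, className, classNumStates);
    auto bestFeatures = metrics.SelectKBestWeighted(weights, false, 5); // top 5 single features
    std::vector<int> excluded;                                          // exclude nothing
    auto bestPairs = metrics.SelectKPairs(weights, excluded, false, 5); // top 5 feature pairs
}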

File diff suppressed because it is too large

File diff suppressed because it is too large

Binary file not shown.

diagrams/BayesNet.puml Normal file (412 lines)
View File

@@ -0,0 +1,412 @@
@startuml
title clang-uml class diagram model
class "bayesnet::Metrics" as C_0000736965376885623323
class C_0000736965376885623323 #aliceblue;line:blue;line.dotted;text:blue {
+Metrics() = default : void
+Metrics(const torch::Tensor & samples, const std::vector<std::string> & features, const std::string & className, const int classNumStates) : void
+Metrics(const std::vector<std::vector<int>> & vsamples, const std::vector<int> & labels, const std::vector<std::string> & features, const std::string & className, const int classNumStates) : void
..
+SelectKBestWeighted(const torch::Tensor & weights, bool ascending = false, unsigned int k = 0) : std::vector<int>
+conditionalEdge(const torch::Tensor & weights) : torch::Tensor
+conditionalEdgeWeights(std::vector<float> & weights) : std::vector<float>
#doCombinations<T>(const std::vector<T> & source) : std::vector<std::pair<T, T> >
#entropy(const torch::Tensor & feature, const torch::Tensor & weights) : double
+getScoresKBest() const : std::vector<double>
+maximumSpanningTree(const std::vector<std::string> & features, const torch::Tensor & weights, const int root) : std::vector<std::pair<int,int>>
+mutualInformation(const torch::Tensor & firstFeature, const torch::Tensor & secondFeature, const torch::Tensor & weights) : double
#pop_first<T>(std::vector<T> & v) : T
__
#className : std::string
#features : std::vector<std::string>
#samples : torch::Tensor
}
class "bayesnet::Node" as C_0001303524929067080934
class C_0001303524929067080934 #aliceblue;line:blue;line.dotted;text:blue {
+Node(const std::string &) : void
..
+addChild(Node *) : void
+addParent(Node *) : void
+clear() : void
+computeCPT(const torch::Tensor & dataset, const std::vector<std::string> & features, const double laplaceSmoothing, const torch::Tensor & weights) : void
+getCPT() : torch::Tensor &
+getChildren() : std::vector<Node *> &
+getFactorValue(std::map<std::string,int> &) : float
+getName() const : std::string
+getNumStates() const : int
+getParents() : std::vector<Node *> &
+graph(const std::string & clasName) : std::vector<std::string>
+minFill() : unsigned int
+removeChild(Node *) : void
+removeParent(Node *) : void
+setNumStates(int) : void
__
}
class "bayesnet::Network" as C_0001186707649890429575
class C_0001186707649890429575 #aliceblue;line:blue;line.dotted;text:blue {
+Network() : void
+Network(float) : void
+Network(const Network &) : void
+~Network() = default : void
..
+addEdge(const std::string &, const std::string &) : void
+addNode(const std::string &) : void
+dump_cpt() const : std::string
+fit(const torch::Tensor & samples, const torch::Tensor & weights, const std::vector<std::string> & featureNames, const std::string & className, const std::map<std::string,std::vector<int>> & states) : void
+fit(const torch::Tensor & X, const torch::Tensor & y, const torch::Tensor & weights, const std::vector<std::string> & featureNames, const std::string & className, const std::map<std::string,std::vector<int>> & states) : void
+fit(const std::vector<std::vector<int>> & input_data, const std::vector<int> & labels, const std::vector<double> & weights, const std::vector<std::string> & featureNames, const std::string & className, const std::map<std::string,std::vector<int>> & states) : void
+getClassName() const : std::string
+getClassNumStates() const : int
+getEdges() const : std::vector<std::pair<std::string,std::string>>
+getFeatures() const : std::vector<std::string>
+getMaxThreads() const : float
+getNodes() : std::map<std::string,std::unique_ptr<Node>> &
+getNumEdges() const : int
+getSamples() : torch::Tensor &
+getStates() const : int
+graph(const std::string & title) const : std::vector<std::string>
+initialize() : void
+predict(const std::vector<std::vector<int>> &) : std::vector<int>
+predict(const torch::Tensor &) : torch::Tensor
+predict_proba(const std::vector<std::vector<int>> &) : std::vector<std::vector<double>>
+predict_proba(const torch::Tensor &) : torch::Tensor
+predict_tensor(const torch::Tensor & samples, const bool proba) : torch::Tensor
+score(const std::vector<std::vector<int>> &, const std::vector<int> &) : double
+show() const : std::vector<std::string>
+topological_sort() : std::vector<std::string>
+version() : std::string
__
}
enum "bayesnet::status_t" as C_0000738420730783851375
enum C_0000738420730783851375 {
NORMAL
WARNING
ERROR
}
abstract "bayesnet::BaseClassifier" as C_0000327135989451974539
abstract C_0000327135989451974539 #aliceblue;line:blue;line.dotted;text:blue {
+~BaseClassifier() = default : void
..
{abstract} +dump_cpt() const = 0 : std::string
{abstract} +fit(torch::Tensor & X, torch::Tensor & y, const std::vector<std::string> & features, const std::string & className, std::map<std::string,std::vector<int>> & states) = 0 : BaseClassifier &
{abstract} +fit(torch::Tensor & dataset, const std::vector<std::string> & features, const std::string & className, std::map<std::string,std::vector<int>> & states) = 0 : BaseClassifier &
{abstract} +fit(torch::Tensor & dataset, const std::vector<std::string> & features, const std::string & className, std::map<std::string,std::vector<int>> & states, const torch::Tensor & weights) = 0 : BaseClassifier &
{abstract} +fit(std::vector<std::vector<int>> & X, std::vector<int> & y, const std::vector<std::string> & features, const std::string & className, std::map<std::string,std::vector<int>> & states) = 0 : BaseClassifier &
{abstract} +getClassNumStates() const = 0 : int
{abstract} +getNotes() const = 0 : std::vector<std::string>
{abstract} +getNumberOfEdges() const = 0 : int
{abstract} +getNumberOfNodes() const = 0 : int
{abstract} +getNumberOfStates() const = 0 : int
{abstract} +getStatus() const = 0 : status_t
+getValidHyperparameters() : std::vector<std::string> &
{abstract} +getVersion() = 0 : std::string
{abstract} +graph(const std::string & title = "") const = 0 : std::vector<std::string>
{abstract} +predict(std::vector<std::vector<int>> & X) = 0 : std::vector<int>
{abstract} +predict(torch::Tensor & X) = 0 : torch::Tensor
{abstract} +predict_proba(std::vector<std::vector<int>> & X) = 0 : std::vector<std::vector<double>>
{abstract} +predict_proba(torch::Tensor & X) = 0 : torch::Tensor
{abstract} +score(std::vector<std::vector<int>> & X, std::vector<int> & y) = 0 : float
{abstract} +score(torch::Tensor & X, torch::Tensor & y) = 0 : float
{abstract} +setHyperparameters(const nlohmann::json & hyperparameters) = 0 : void
{abstract} +show() const = 0 : std::vector<std::string>
{abstract} +topological_order() = 0 : std::vector<std::string>
{abstract} #trainModel(const torch::Tensor & weights) = 0 : void
__
#validHyperparameters : std::vector<std::string>
}
abstract "bayesnet::Classifier" as C_0002043996622900301644
abstract C_0002043996622900301644 #aliceblue;line:blue;line.dotted;text:blue {
+Classifier(Network model) : void
+~Classifier() = default : void
..
+addNodes() : void
#buildDataset(torch::Tensor & y) : void
{abstract} #buildModel(const torch::Tensor & weights) = 0 : void
#checkFitParameters() : void
+dump_cpt() const : std::string
+fit(torch::Tensor & X, torch::Tensor & y, const std::vector<std::string> & features, const std::string & className, std::map<std::string,std::vector<int>> & states) : Classifier &
+fit(std::vector<std::vector<int>> & X, std::vector<int> & y, const std::vector<std::string> & features, const std::string & className, std::map<std::string,std::vector<int>> & states) : Classifier &
+fit(torch::Tensor & dataset, const std::vector<std::string> & features, const std::string & className, std::map<std::string,std::vector<int>> & states) : Classifier &
+fit(torch::Tensor & dataset, const std::vector<std::string> & features, const std::string & className, std::map<std::string,std::vector<int>> & states, const torch::Tensor & weights) : Classifier &
+getClassNumStates() const : int
+getNotes() const : std::vector<std::string>
+getNumberOfEdges() const : int
+getNumberOfNodes() const : int
+getNumberOfStates() const : int
+getStatus() const : status_t
+getVersion() : std::string
+predict(std::vector<std::vector<int>> & X) : std::vector<int>
+predict(torch::Tensor & X) : torch::Tensor
+predict_proba(std::vector<std::vector<int>> & X) : std::vector<std::vector<double>>
+predict_proba(torch::Tensor & X) : torch::Tensor
+score(torch::Tensor & X, torch::Tensor & y) : float
+score(std::vector<std::vector<int>> & X, std::vector<int> & y) : float
+setHyperparameters(const nlohmann::json & hyperparameters) : void
+show() const : std::vector<std::string>
+topological_order() : std::vector<std::string>
#trainModel(const torch::Tensor & weights) : void
__
#className : std::string
#dataset : torch::Tensor
#features : std::vector<std::string>
#fitted : bool
#m : unsigned int
#metrics : Metrics
#model : Network
#n : unsigned int
#notes : std::vector<std::string>
#states : std::map<std::string,std::vector<int>>
#status : status_t
}
class "bayesnet::KDB" as C_0001112865019015250005
class C_0001112865019015250005 #aliceblue;line:blue;line.dotted;text:blue {
+KDB(int k, float theta = 0.03) : void
+~KDB() = default : void
..
#buildModel(const torch::Tensor & weights) : void
+graph(const std::string & name = "KDB") const : std::vector<std::string>
+setHyperparameters(const nlohmann::json & hyperparameters_) : void
__
}
class "bayesnet::TAN" as C_0001760994424884323017
class C_0001760994424884323017 #aliceblue;line:blue;line.dotted;text:blue {
+TAN() : void
+~TAN() = default : void
..
#buildModel(const torch::Tensor & weights) : void
+graph(const std::string & name = "TAN") const : std::vector<std::string>
__
}
class "bayesnet::Proposal" as C_0002219995589162262979
class C_0002219995589162262979 #aliceblue;line:blue;line.dotted;text:blue {
+Proposal(torch::Tensor & pDataset, std::vector<std::string> & features_, std::string & className_) : void
+~Proposal() : void
..
#checkInput(const torch::Tensor & X, const torch::Tensor & y) : void
#fit_local_discretization(const torch::Tensor & y) : std::map<std::string,std::vector<int>>
#localDiscretizationProposal(const std::map<std::string,std::vector<int>> & states, Network & model) : std::map<std::string,std::vector<int>>
#prepareX(torch::Tensor & X) : torch::Tensor
__
#Xf : torch::Tensor
#discretizers : map<std::string,mdlp::CPPFImdlp *>
#y : torch::Tensor
}
class "bayesnet::TANLd" as C_0001668829096702037834
class C_0001668829096702037834 #aliceblue;line:blue;line.dotted;text:blue {
+TANLd() : void
+~TANLd() = default : void
..
+fit(torch::Tensor & X, torch::Tensor & y, const std::vector<std::string> & features, const std::string & className, std::map<std::string,std::vector<int>> & states) : TANLd &
+graph(const std::string & name = "TAN") const : std::vector<std::string>
+predict(torch::Tensor & X) : torch::Tensor
{static} +version() : std::string
__
}
abstract "bayesnet::FeatureSelect" as C_0001695326193250580823
abstract C_0001695326193250580823 #aliceblue;line:blue;line.dotted;text:blue {
+FeatureSelect(const torch::Tensor & samples, const std::vector<std::string> & features, const std::string & className, const int maxFeatures, const int classNumStates, const torch::Tensor & weights) : void
+~FeatureSelect() : void
..
#computeMeritCFS() : double
#computeSuFeatures(const int a, const int b) : double
#computeSuLabels() : void
{abstract} +fit() = 0 : void
+getFeatures() const : std::vector<int>
+getScores() const : std::vector<double>
#initialize() : void
#symmetricalUncertainty(int a, int b) : double
__
#fitted : bool
#maxFeatures : int
#selectedFeatures : std::vector<int>
#selectedScores : std::vector<double>
#suFeatures : std::map<std::pair<int,int>,double>
#suLabels : std::vector<double>
#weights : const torch::Tensor &
}
class "bayesnet::CFS" as C_0000011627355691342494
class C_0000011627355691342494 #aliceblue;line:blue;line.dotted;text:blue {
+CFS(const torch::Tensor & samples, const std::vector<std::string> & features, const std::string & className, const int maxFeatures, const int classNumStates, const torch::Tensor & weights) : void
+~CFS() : void
..
+fit() : void
__
}
class "bayesnet::FCBF" as C_0000144682015341746929
class C_0000144682015341746929 #aliceblue;line:blue;line.dotted;text:blue {
+FCBF(const torch::Tensor & samples, const std::vector<std::string> & features, const std::string & className, const int maxFeatures, const int classNumStates, const torch::Tensor & weights, const double threshold) : void
+~FCBF() : void
..
+fit() : void
__
}
class "bayesnet::IWSS" as C_0000008268514674428553
class C_0000008268514674428553 #aliceblue;line:blue;line.dotted;text:blue {
+IWSS(const torch::Tensor & samples, const std::vector<std::string> & features, const std::string & className, const int maxFeatures, const int classNumStates, const torch::Tensor & weights, const double threshold) : void
+~IWSS() : void
..
+fit() : void
__
}
class "bayesnet::SPODE" as C_0000512022813807538451
class C_0000512022813807538451 #aliceblue;line:blue;line.dotted;text:blue {
+SPODE(int root) : void
+~SPODE() = default : void
..
#buildModel(const torch::Tensor & weights) : void
+graph(const std::string & name = "SPODE") const : std::vector<std::string>
__
}
class "bayesnet::Ensemble" as C_0001985241386355360576
class C_0001985241386355360576 #aliceblue;line:blue;line.dotted;text:blue {
+Ensemble(bool predict_voting = true) : void
+~Ensemble() = default : void
..
#compute_arg_max(std::vector<std::vector<double>> & X) : std::vector<int>
#compute_arg_max(torch::Tensor & X) : torch::Tensor
+dump_cpt() const : std::string
+getNumberOfEdges() const : int
+getNumberOfNodes() const : int
+getNumberOfStates() const : int
+graph(const std::string & title) const : std::vector<std::string>
+predict(std::vector<std::vector<int>> & X) : std::vector<int>
+predict(torch::Tensor & X) : torch::Tensor
#predict_average_proba(torch::Tensor & X) : torch::Tensor
#predict_average_proba(std::vector<std::vector<int>> & X) : std::vector<std::vector<double>>
#predict_average_voting(torch::Tensor & X) : torch::Tensor
#predict_average_voting(std::vector<std::vector<int>> & X) : std::vector<std::vector<double>>
+predict_proba(std::vector<std::vector<int>> & X) : std::vector<std::vector<double>>
+predict_proba(torch::Tensor & X) : torch::Tensor
+score(std::vector<std::vector<int>> & X, std::vector<int> & y) : float
+score(torch::Tensor & X, torch::Tensor & y) : float
+show() const : std::vector<std::string>
+topological_order() : std::vector<std::string>
#trainModel(const torch::Tensor & weights) : void
#voting(torch::Tensor & votes) : torch::Tensor
__
#models : std::vector<std::unique_ptr<Classifier>>
#n_models : unsigned int
#predict_voting : bool
#significanceModels : std::vector<double>
}
class "bayesnet::(anonymous_45089536)" as C_0001186398587753535158
class C_0001186398587753535158 #aliceblue;line:blue;line.dotted;text:blue {
__
+CFS : std::string
+FCBF : std::string
+IWSS : std::string
}
class "bayesnet::(anonymous_45090163)" as C_0000602764946063116717
class C_0000602764946063116717 #aliceblue;line:blue;line.dotted;text:blue {
__
+ASC : std::string
+DESC : std::string
+RAND : std::string
}
class "bayesnet::BoostAODE" as C_0000358471592399852382
class C_0000358471592399852382 #aliceblue;line:blue;line.dotted;text:blue {
+BoostAODE(bool predict_voting = false) : void
+~BoostAODE() = default : void
..
#buildModel(const torch::Tensor & weights) : void
+graph(const std::string & title = "BoostAODE") const : std::vector<std::string>
+setHyperparameters(const nlohmann::json & hyperparameters_) : void
#trainModel(const torch::Tensor & weights) : void
__
}
class "bayesnet::MST" as C_0000131858426172291700
class C_0000131858426172291700 #aliceblue;line:blue;line.dotted;text:blue {
+MST() = default : void
+MST(const std::vector<std::string> & features, const torch::Tensor & weights, const int root) : void
..
+maximumSpanningTree() : std::vector<std::pair<int,int>>
__
}
class "bayesnet::Graph" as C_0001197041682001898467
class C_0001197041682001898467 #aliceblue;line:blue;line.dotted;text:blue {
+Graph(int V) : void
..
+addEdge(int u, int v, float wt) : void
+find_set(int i) : int
+get_mst() : std::vector<std::pair<float,std::pair<int,int>>>
+kruskal_algorithm() : void
+union_set(int u, int v) : void
__
}
class "bayesnet::KDBLd" as C_0000344502277874806837
class C_0000344502277874806837 #aliceblue;line:blue;line.dotted;text:blue {
+KDBLd(int k) : void
+~KDBLd() = default : void
..
+fit(torch::Tensor & X, torch::Tensor & y, const std::vector<std::string> & features, const std::string & className, std::map<std::string,std::vector<int>> & states) : KDBLd &
+graph(const std::string & name = "KDB") const : std::vector<std::string>
+predict(torch::Tensor & X) : torch::Tensor
{static} +version() : std::string
__
}
class "bayesnet::AODE" as C_0000786111576121788282
class C_0000786111576121788282 #aliceblue;line:blue;line.dotted;text:blue {
+AODE(bool predict_voting = false) : void
+~AODE() : void
..
#buildModel(const torch::Tensor & weights) : void
+graph(const std::string & title = "AODE") const : std::vector<std::string>
+setHyperparameters(const nlohmann::json & hyperparameters) : void
__
}
class "bayesnet::SPODELd" as C_0001369655639257755354
class C_0001369655639257755354 #aliceblue;line:blue;line.dotted;text:blue {
+SPODELd(int root) : void
+~SPODELd() = default : void
..
+commonFit(const std::vector<std::string> & features, const std::string & className, std::map<std::string,std::vector<int>> & states) : SPODELd &
+fit(torch::Tensor & X, torch::Tensor & y, const std::vector<std::string> & features, const std::string & className, std::map<std::string,std::vector<int>> & states) : SPODELd &
+fit(torch::Tensor & dataset, const std::vector<std::string> & features, const std::string & className, std::map<std::string,std::vector<int>> & states) : SPODELd &
+graph(const std::string & name = "SPODE") const : std::vector<std::string>
+predict(torch::Tensor & X) : torch::Tensor
{static} +version() : std::string
__
}
class "bayesnet::AODELd" as C_0000487273479333793647
class C_0000487273479333793647 #aliceblue;line:blue;line.dotted;text:blue {
+AODELd(bool predict_voting = true) : void
+~AODELd() = default : void
..
#buildModel(const torch::Tensor & weights) : void
+fit(torch::Tensor & X_, torch::Tensor & y_, const std::vector<std::string> & features_, const std::string & className_, std::map<std::string,std::vector<int>> & states_) : AODELd &
+graph(const std::string & name = "AODELd") const : std::vector<std::string>
#trainModel(const torch::Tensor & weights) : void
__
}
C_0001303524929067080934 --> C_0001303524929067080934 : -parents
C_0001303524929067080934 --> C_0001303524929067080934 : -children
C_0001186707649890429575 o-- C_0001303524929067080934 : -nodes
C_0000327135989451974539 ..> C_0000738420730783851375
C_0002043996622900301644 o-- C_0001186707649890429575 : #model
C_0002043996622900301644 o-- C_0000736965376885623323 : #metrics
C_0002043996622900301644 o-- C_0000738420730783851375 : #status
C_0000327135989451974539 <|-- C_0002043996622900301644
C_0002043996622900301644 <|-- C_0001112865019015250005
C_0002043996622900301644 <|-- C_0001760994424884323017
C_0002219995589162262979 ..> C_0001186707649890429575
C_0001760994424884323017 <|-- C_0001668829096702037834
C_0002219995589162262979 <|-- C_0001668829096702037834
C_0000736965376885623323 <|-- C_0001695326193250580823
C_0001695326193250580823 <|-- C_0000011627355691342494
C_0001695326193250580823 <|-- C_0000144682015341746929
C_0001695326193250580823 <|-- C_0000008268514674428553
C_0002043996622900301644 <|-- C_0000512022813807538451
C_0001985241386355360576 o-- C_0002043996622900301644 : #models
C_0002043996622900301644 <|-- C_0001985241386355360576
C_0000358471592399852382 --> C_0001695326193250580823 : -featureSelector
C_0001985241386355360576 <|-- C_0000358471592399852382
C_0001112865019015250005 <|-- C_0000344502277874806837
C_0002219995589162262979 <|-- C_0000344502277874806837
C_0001985241386355360576 <|-- C_0000786111576121788282
C_0000512022813807538451 <|-- C_0001369655639257755354
C_0002219995589162262979 <|-- C_0001369655639257755354
C_0001985241386355360576 <|-- C_0000487273479333793647
C_0002219995589162262979 <|-- C_0000487273479333793647
'Generated with clang-uml, version 0.5.1
'LLVM version clang version 17.0.6 (Fedora 17.0.6-2.fc39)
@enduml
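
Editorial note: this diagram can be rendered locally with the PlantUML CLI, e.g. plantuml diagrams/BayesNet.puml, which writes an image next to the source (assuming PlantUML is installed).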

diagrams/BayesNet.svg Normal file (1 line)

File diff suppressed because one or more lines are too long

(image, 139 KiB)

diagrams/dependency.svg Normal file (128 lines)
View File

@@ -0,0 +1,128 @@
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN"
"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<!-- Generated by graphviz version 8.1.0 (20230707.0739)
-->
<!-- Title: BayesNet Pages: 1 -->
<svg width="1632pt" height="288pt"
viewBox="0.00 0.00 1631.95 287.80" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
<g id="graph0" class="graph" transform="scale(1 1) rotate(0) translate(4 283.8)">
<title>BayesNet</title>
<polygon fill="white" stroke="none" points="-4,4 -4,-283.8 1627.95,-283.8 1627.95,4 -4,4"/>
<!-- node1 -->
<g id="node1" class="node">
<title>node1</title>
<polygon fill="none" stroke="black" points="826.43,-254.35 826.43,-269.26 796.69,-279.8 754.63,-279.8 724.89,-269.26 724.89,-254.35 754.63,-243.8 796.69,-243.8 826.43,-254.35"/>
<text text-anchor="middle" x="775.66" y="-257.53" font-family="Times,serif" font-size="12.00">BayesNet</text>
</g>
<!-- node2 -->
<g id="node2" class="node">
<title>node2</title>
<polygon fill="none" stroke="black" points="413.32,-185.8 372.39,-201.03 206.66,-207.8 40.93,-201.03 0,-185.8 114.69,-173.59 298.64,-173.59 413.32,-185.8"/>
<text text-anchor="middle" x="206.66" y="-185.53" font-family="Times,serif" font-size="12.00">/home/rmontanana/Code/libtorch/lib/libc10.so</text>
</g>
<!-- node1&#45;&gt;node2 -->
<g id="edge1" class="edge">
<title>node1&#45;&gt;node2</title>
<path fill="none" stroke="black" d="M724.41,-254.5C634.7,-243.46 447.04,-220.38 324.01,-205.24"/>
<polygon fill="black" stroke="black" points="324.77,-201.69 314.42,-203.94 323.92,-208.63 324.77,-201.69"/>
</g>
<!-- node3 -->
<g id="node3" class="node">
<title>node3</title>
<polygon fill="none" stroke="black" points="857.68,-185.8 815.49,-201.03 644.66,-207.8 473.84,-201.03 431.65,-185.8 549.86,-173.59 739.46,-173.59 857.68,-185.8"/>
<text text-anchor="middle" x="644.66" y="-185.53" font-family="Times,serif" font-size="12.00">/home/rmontanana/Code/libtorch/lib/libkineto.a</text>
</g>
<!-- node1&#45;&gt;node3 -->
<g id="edge2" class="edge">
<title>node1&#45;&gt;node3</title>
<path fill="none" stroke="black" d="M747.56,-245.79C729.21,-235.98 704.97,-223.03 684.63,-212.16"/>
<polygon fill="black" stroke="black" points="686.47,-208.64 676,-207.02 683.17,-214.82 686.47,-208.64"/>
</g>
<!-- node4 -->
<g id="node4" class="node">
<title>node4</title>
<polygon fill="none" stroke="black" points="939.33,-182.35 939.33,-197.26 920.78,-207.8 894.54,-207.8 875.99,-197.26 875.99,-182.35 894.54,-171.8 920.78,-171.8 939.33,-182.35"/>
<text text-anchor="middle" x="907.66" y="-185.53" font-family="Times,serif" font-size="12.00">mdlp</text>
</g>
<!-- node1&#45;&gt;node4 -->
<g id="edge3" class="edge">
<title>node1&#45;&gt;node4</title>
<path fill="none" stroke="black" d="M803.66,-245.96C824.66,-234.82 853.45,-219.56 875.41,-207.91"/>
<polygon fill="black" stroke="black" points="876.78,-210.61 883.97,-202.84 873.5,-204.43 876.78,-210.61"/>
</g>
<!-- node9 -->
<g id="node5" class="node">
<title>node9</title>
<polygon fill="none" stroke="black" points="1107.74,-195.37 1032.66,-207.8 957.58,-195.37 986.26,-175.24 1079.06,-175.24 1107.74,-195.37"/>
<text text-anchor="middle" x="1032.66" y="-185.53" font-family="Times,serif" font-size="12.00">torch_library</text>
</g>
<!-- node1&#45;&gt;node9 -->
<g id="edge4" class="edge">
<title>node1&#45;&gt;node9</title>
<path fill="none" stroke="black" d="M815.25,-250.02C860.25,-237.77 933.77,-217.74 982.68,-204.42"/>
<polygon fill="black" stroke="black" points="983.3,-207.61 992.02,-201.6 981.46,-200.85 983.3,-207.61"/>
</g>
<!-- node10 -->
<g id="node6" class="node">
<title>node10</title>
<polygon fill="none" stroke="black" points="1159.81,-113.8 1086.89,-129.03 791.66,-135.8 496.43,-129.03 423.52,-113.8 627.82,-101.59 955.5,-101.59 1159.81,-113.8"/>
<text text-anchor="middle" x="791.66" y="-113.53" font-family="Times,serif" font-size="12.00">&#45;Wl,&#45;&#45;no&#45;as&#45;needed,&quot;/home/rmontanana/Code/libtorch/lib/libtorch.so&quot; &#45;Wl,&#45;&#45;as&#45;needed</text>
</g>
<!-- node9&#45;&gt;node10 -->
<g id="edge5" class="edge">
<title>node9&#45;&gt;node10</title>
<path fill="none" stroke="black" stroke-dasharray="5,2" d="M985.62,-175.14C949.2,-164.56 898.31,-149.78 857.79,-138.01"/>
<polygon fill="black" stroke="black" points="859.04,-134.44 848.46,-135.01 857.09,-141.16 859.04,-134.44"/>
</g>
<!-- node5 -->
<g id="node7" class="node">
<title>node5</title>
<polygon fill="none" stroke="black" points="1371.56,-123.37 1274.66,-135.8 1177.77,-123.37 1214.78,-103.24 1334.55,-103.24 1371.56,-123.37"/>
<text text-anchor="middle" x="1274.66" y="-113.53" font-family="Times,serif" font-size="12.00">torch_cpu_library</text>
</g>
<!-- node9&#45;&gt;node5 -->
<g id="edge6" class="edge">
<title>node9&#45;&gt;node5</title>
<path fill="none" stroke="black" stroke-dasharray="5,2" d="M1079.61,-175.22C1120.66,-163.35 1180.2,-146.13 1222.68,-133.84"/>
<polygon fill="black" stroke="black" points="1223.46,-136.97 1232.09,-130.83 1221.51,-130.24 1223.46,-136.97"/>
</g>
<!-- node6 -->
<g id="node8" class="node">
<title>node6</title>
<polygon fill="none" stroke="black" points="1191.4,-27.9 1114.6,-43.12 803.66,-49.9 492.72,-43.12 415.93,-27.9 631.1,-15.68 976.22,-15.68 1191.4,-27.9"/>
<text text-anchor="middle" x="803.66" y="-27.63" font-family="Times,serif" font-size="12.00">&#45;Wl,&#45;&#45;no&#45;as&#45;needed,&quot;/home/rmontanana/Code/libtorch/lib/libtorch_cpu.so&quot; &#45;Wl,&#45;&#45;as&#45;needed</text>
</g>
<!-- node5&#45;&gt;node6 -->
<g id="edge7" class="edge">
<title>node5&#45;&gt;node6</title>
<path fill="none" stroke="black" stroke-dasharray="5,2" d="M1210.16,-105.31C1130.55,-91.13 994.37,-66.87 901.77,-50.38"/>
<polygon fill="black" stroke="black" points="902.44,-46.77 891.98,-48.46 901.22,-53.66 902.44,-46.77"/>
</g>
<!-- node7 -->
<g id="node9" class="node">
<title>node7</title>
<polygon fill="none" stroke="black" points="1339.72,-37.46 1274.66,-49.9 1209.61,-37.46 1234.46,-17.34 1314.87,-17.34 1339.72,-37.46"/>
<text text-anchor="middle" x="1274.66" y="-27.63" font-family="Times,serif" font-size="12.00">caffe2::mkl</text>
</g>
<!-- node5&#45;&gt;node7 -->
<g id="edge8" class="edge">
<title>node5&#45;&gt;node7</title>
<path fill="none" stroke="black" stroke-dasharray="5,2" d="M1274.66,-102.95C1274.66,-91.56 1274.66,-75.07 1274.66,-60.95"/>
<polygon fill="black" stroke="black" points="1278.16,-61.27 1274.66,-51.27 1271.16,-61.27 1278.16,-61.27"/>
</g>
<!-- node8 -->
<g id="node10" class="node">
<title>node8</title>
<polygon fill="none" stroke="black" points="1623.95,-41.76 1490.66,-63.8 1357.37,-41.76 1408.28,-6.09 1573.04,-6.09 1623.95,-41.76"/>
<text text-anchor="middle" x="1490.66" y="-34.75" font-family="Times,serif" font-size="12.00">dummy</text>
<text text-anchor="middle" x="1490.66" y="-20.5" font-family="Times,serif" font-size="12.00">(protobuf::libprotobuf)</text>
</g>
<!-- node5&#45;&gt;node8 -->
<g id="edge9" class="edge">
<title>node5&#45;&gt;node8</title>
<path fill="none" stroke="black" stroke-dasharray="5,2" d="M1310.82,-102.76C1341.68,-90.77 1386.88,-73.21 1424.25,-58.7"/>
<polygon fill="black" stroke="black" points="1425.01,-61.77 1433.06,-54.89 1422.47,-55.25 1425.01,-61.77"/>
</g>
</g>
</svg>


@@ -5,6 +5,7 @@
The hyperparameters defined in the algorithm are: The hyperparameters defined in the algorithm are:
- ***bisection*** (*boolean*): If set to true allows the algorithm to add *k* models at once (as specified in the algorithm) to the ensemble. Default value: *true*. - ***bisection*** (*boolean*): If set to true allows the algorithm to add *k* models at once (as specified in the algorithm) to the ensemble. Default value: *true*.
- ***bisection_best*** (*boolean*): If set to *true*, the algorithm will take as *priorAccuracy* the best accuracy computed. If set to *false*, it will take the last accuracy as *priorAccuracy*. Default value: *false*.
- ***order*** (*{"asc", "desc", "rand"}*): Sets the order (ascending/descending/random) in which dataset variables will be processed to choose the parents of the *SPODEs*. Default value: *"desc"*. - ***order*** (*{"asc", "desc", "rand"}*): Sets the order (ascending/descending/random) in which dataset variables will be processed to choose the parents of the *SPODEs*. Default value: *"desc"*.
@@ -26,4 +27,4 @@ The hyperparameters defined in the algorithm are:
## Operation ## Operation
### [Algorithm](./algorithm.md) ### [Base Algorithm](./algorithm.md)

docs/Doxyfile.in (new file, 2912 lines; diff suppressed because it is too large)


@@ -105,8 +105,7 @@
2. $numItemsPack \leftarrow 0$ 2. $numItemsPack \leftarrow 0$
10. If 10. If $(Vars == \emptyset \lor tolerance>maxTolerance) \; finished \leftarrow True$
$(Vars == \emptyset \lor tolerance>maxTolerance) \; finished \leftarrow True$
11. $lastAccuracy \leftarrow max(lastAccuracy, actualAccuracy)$ 11. $lastAccuracy \leftarrow max(lastAccuracy, actualAccuracy)$

docs/logo_small.png (new binary file, 11 KiB)


@@ -1,5 +0,0 @@
filter = bayesnet/
exclude-directories = build_debug/lib/
exclude = bayesnet/utils/loguru.*
print-summary = yes
sort = uncovered-percent
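For context, the removed file appears to be a gcovr configuration; the same run can be sketched on the command line with the matching flags (spelling taken from the config keys above; exact option names can vary between gcovr releases):

gcovr --filter bayesnet/ --exclude-directories build_debug/lib/ \
      --exclude 'bayesnet/utils/loguru.*' --print-summary --sort uncovered-percent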


@@ -1,168 +0,0 @@
#include "ArffFiles.h"
#include <fstream>
#include <sstream>
#include <map>
#include <iostream>
ArffFiles::ArffFiles() = default;
std::vector<std::string> ArffFiles::getLines() const
{
return lines;
}
unsigned long int ArffFiles::getSize() const
{
return lines.size();
}
std::vector<std::pair<std::string, std::string>> ArffFiles::getAttributes() const
{
return attributes;
}
std::string ArffFiles::getClassName() const
{
return className;
}
std::string ArffFiles::getClassType() const
{
return classType;
}
std::vector<std::vector<float>>& ArffFiles::getX()
{
return X;
}
std::vector<int>& ArffFiles::getY()
{
return y;
}
void ArffFiles::loadCommon(std::string fileName)
{
std::ifstream file(fileName);
if (!file.is_open()) {
throw std::invalid_argument("Unable to open file");
}
std::string line;
std::string keyword;
std::string attribute;
std::string type;
std::string type_w;
while (getline(file, line)) {
if (line.empty() || line[0] == '%' || line == "\r" || line == " ") {
continue;
}
if (line.find("@attribute") != std::string::npos || line.find("@ATTRIBUTE") != std::string::npos) {
std::stringstream ss(line);
ss >> keyword >> attribute;
type = "";
while (ss >> type_w)
type += type_w + " ";
attributes.emplace_back(trim(attribute), trim(type));
continue;
}
if (line[0] == '@') {
continue;
}
lines.push_back(line);
}
file.close();
if (attributes.empty())
throw std::invalid_argument("No attributes found");
}
void ArffFiles::load(const std::string& fileName, bool classLast)
{
int labelIndex;
loadCommon(fileName);
if (classLast) {
className = std::get<0>(attributes.back());
classType = std::get<1>(attributes.back());
attributes.pop_back();
labelIndex = static_cast<int>(attributes.size());
} else {
className = std::get<0>(attributes.front());
classType = std::get<1>(attributes.front());
attributes.erase(attributes.begin());
labelIndex = 0;
}
generateDataset(labelIndex);
}
void ArffFiles::load(const std::string& fileName, const std::string& name)
{
int labelIndex;
loadCommon(fileName);
bool found = false;
for (int i = 0; i < attributes.size(); ++i) {
if (attributes[i].first == name) {
className = std::get<0>(attributes[i]);
classType = std::get<1>(attributes[i]);
attributes.erase(attributes.begin() + i);
labelIndex = i;
found = true;
break;
}
}
if (!found) {
throw std::invalid_argument("Class name not found");
}
generateDataset(labelIndex);
}
void ArffFiles::generateDataset(int labelIndex)
{
X = std::vector<std::vector<float>>(attributes.size(), std::vector<float>(lines.size()));
auto yy = std::vector<std::string>(lines.size(), "");
auto removeLines = std::vector<int>(); // Lines with missing values
for (size_t i = 0; i < lines.size(); i++) {
std::stringstream ss(lines[i]);
std::string value;
int pos = 0;
int xIndex = 0;
while (getline(ss, value, ',')) {
if (pos++ == labelIndex) {
yy[i] = value;
} else {
if (value == "?") {
X[xIndex++][i] = -1;
removeLines.push_back(i);
} else
X[xIndex++][i] = stof(value);
}
}
}
for (auto i : removeLines) {
yy.erase(yy.begin() + i);
for (auto& x : X) {
x.erase(x.begin() + i);
}
}
y = factorize(yy);
}
std::string ArffFiles::trim(const std::string& source)
{
std::string s(source);
s.erase(0, s.find_first_not_of(" '\n\r\t"));
s.erase(s.find_last_not_of(" '\n\r\t") + 1);
return s;
}
std::vector<int> ArffFiles::factorize(const std::vector<std::string>& labels_t)
{
std::vector<int> yy;
yy.reserve(labels_t.size());
std::map<std::string, int> labelMap;
int i = 0;
for (const std::string& label : labels_t) {
if (labelMap.find(label) == labelMap.end()) {
labelMap[label] = i++;
}
yy.push_back(labelMap[label]);
}
return yy;
}
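One thing worth flagging in the removed generateDataset above: removeLines is filled in ascending order (and can hold duplicate indices when a line has several '?'), yet the cleanup loop erases by stored index, so every erase shifts the indices that follow it. A safer sketch of the same cleanup, assuming the intent is to drop every sample with a missing value (requires <algorithm>):

std::sort(removeLines.begin(), removeLines.end());
removeLines.erase(std::unique(removeLines.begin(), removeLines.end()), removeLines.end());
for (auto it = removeLines.rbegin(); it != removeLines.rend(); ++it) { // back to front keeps indices valid
    yy.erase(yy.begin() + *it);
    for (auto& x : X) {
        x.erase(x.begin() + *it);
    }
}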


@@ -1,32 +0,0 @@
#ifndef ARFFFILES_H
#define ARFFFILES_H
#include <string>
#include <vector>
class ArffFiles {
private:
std::vector<std::string> lines;
std::vector<std::pair<std::string, std::string>> attributes;
std::string className;
std::string classType;
std::vector<std::vector<float>> X;
std::vector<int> y;
void generateDataset(int);
void loadCommon(std::string);
public:
ArffFiles();
void load(const std::string&, bool = true);
void load(const std::string&, const std::string&);
std::vector<std::string> getLines() const;
unsigned long int getSize() const;
std::string getClassName() const;
std::string getClassType() const;
static std::string trim(const std::string&);
std::vector<std::vector<float>>& getX();
std::vector<int>& getY();
std::vector<std::pair<std::string, std::string>> getAttributes() const;
static std::vector<int> factorize(const std::vector<std::string>& labels_t);
};
#endif


@@ -1 +0,0 @@
add_library(ArffFiles ArffFiles.cc)

Submodule lib/catch2 updated: bff6e35e2b...029fe3b460

lib/log/loguru.cpp (new file, 2009 lines; diff suppressed because it is too large)
lib/log/loguru.hpp (new file, 1475 lines; diff suppressed because it is too large)

logo.png (new binary file, 543 KiB)


@@ -14,7 +14,6 @@ include_directories(
/usr/local/include /usr/local/include
) )
add_subdirectory(lib/Files)
add_subdirectory(lib/mdlp) add_subdirectory(lib/mdlp)
add_executable(bayesnet_sample sample.cc) add_executable(bayesnet_sample sample.cc)
target_link_libraries(bayesnet_sample ArffFiles mdlp "${TORCH_LIBRARIES}" "${BayesNet}") target_link_libraries(bayesnet_sample mdlp "${TORCH_LIBRARIES}" "${BayesNet}")


@@ -1,174 +0,0 @@
// ***************************************************************
// SPDX-FileCopyrightText: Copyright 2024 Ricardo Montañana Gómez
// SPDX-FileType: SOURCE
// SPDX-License-Identifier: MIT
// ***************************************************************
#include "ArffFiles.h"
#include <fstream>
#include <sstream>
#include <map>
#include <iostream>
ArffFiles::ArffFiles() = default;
std::vector<std::string> ArffFiles::getLines() const
{
return lines;
}
unsigned long int ArffFiles::getSize() const
{
return lines.size();
}
std::vector<std::pair<std::string, std::string>> ArffFiles::getAttributes() const
{
return attributes;
}
std::string ArffFiles::getClassName() const
{
return className;
}
std::string ArffFiles::getClassType() const
{
return classType;
}
std::vector<std::vector<float>>& ArffFiles::getX()
{
return X;
}
std::vector<int>& ArffFiles::getY()
{
return y;
}
void ArffFiles::loadCommon(std::string fileName)
{
std::ifstream file(fileName);
if (!file.is_open()) {
throw std::invalid_argument("Unable to open file");
}
std::string line;
std::string keyword;
std::string attribute;
std::string type;
std::string type_w;
while (getline(file, line)) {
if (line.empty() || line[0] == '%' || line == "\r" || line == " ") {
continue;
}
if (line.find("@attribute") != std::string::npos || line.find("@ATTRIBUTE") != std::string::npos) {
std::stringstream ss(line);
ss >> keyword >> attribute;
type = "";
while (ss >> type_w)
type += type_w + " ";
attributes.emplace_back(trim(attribute), trim(type));
continue;
}
if (line[0] == '@') {
continue;
}
lines.push_back(line);
}
file.close();
if (attributes.empty())
throw std::invalid_argument("No attributes found");
}
void ArffFiles::load(const std::string& fileName, bool classLast)
{
int labelIndex;
loadCommon(fileName);
if (classLast) {
className = std::get<0>(attributes.back());
classType = std::get<1>(attributes.back());
attributes.pop_back();
labelIndex = static_cast<int>(attributes.size());
} else {
className = std::get<0>(attributes.front());
classType = std::get<1>(attributes.front());
attributes.erase(attributes.begin());
labelIndex = 0;
}
generateDataset(labelIndex);
}
void ArffFiles::load(const std::string& fileName, const std::string& name)
{
int labelIndex;
loadCommon(fileName);
bool found = false;
for (int i = 0; i < attributes.size(); ++i) {
if (attributes[i].first == name) {
className = std::get<0>(attributes[i]);
classType = std::get<1>(attributes[i]);
attributes.erase(attributes.begin() + i);
labelIndex = i;
found = true;
break;
}
}
if (!found) {
throw std::invalid_argument("Class name not found");
}
generateDataset(labelIndex);
}
void ArffFiles::generateDataset(int labelIndex)
{
X = std::vector<std::vector<float>>(attributes.size(), std::vector<float>(lines.size()));
auto yy = std::vector<std::string>(lines.size(), "");
auto removeLines = std::vector<int>(); // Lines with missing values
for (size_t i = 0; i < lines.size(); i++) {
std::stringstream ss(lines[i]);
std::string value;
int pos = 0;
int xIndex = 0;
while (getline(ss, value, ',')) {
if (pos++ == labelIndex) {
yy[i] = value;
} else {
if (value == "?") {
X[xIndex++][i] = -1;
removeLines.push_back(i);
} else
X[xIndex++][i] = stof(value);
}
}
}
for (auto i : removeLines) {
yy.erase(yy.begin() + i);
for (auto& x : X) {
x.erase(x.begin() + i);
}
}
y = factorize(yy);
}
std::string ArffFiles::trim(const std::string& source)
{
std::string s(source);
s.erase(0, s.find_first_not_of(" '\n\r\t"));
s.erase(s.find_last_not_of(" '\n\r\t") + 1);
return s;
}
std::vector<int> ArffFiles::factorize(const std::vector<std::string>& labels_t)
{
std::vector<int> yy;
yy.reserve(labels_t.size());
std::map<std::string, int> labelMap;
int i = 0;
for (const std::string& label : labels_t) {
if (labelMap.find(label) == labelMap.end()) {
labelMap[label] = i++;
}
yy.push_back(labelMap[label]);
}
return yy;
}


@@ -1,38 +0,0 @@
// ***************************************************************
// SPDX-FileCopyrightText: Copyright 2024 Ricardo Montañana Gómez
// SPDX-FileType: SOURCE
// SPDX-License-Identifier: MIT
// ***************************************************************
#ifndef ARFFFILES_H
#define ARFFFILES_H
#include <string>
#include <vector>
class ArffFiles {
private:
std::vector<std::string> lines;
std::vector<std::pair<std::string, std::string>> attributes;
std::string className;
std::string classType;
std::vector<std::vector<float>> X;
std::vector<int> y;
void generateDataset(int);
void loadCommon(std::string);
public:
ArffFiles();
void load(const std::string&, bool = true);
void load(const std::string&, const std::string&);
std::vector<std::string> getLines() const;
unsigned long int getSize() const;
std::string getClassName() const;
std::string getClassType() const;
static std::string trim(const std::string&);
std::vector<std::vector<float>>& getX();
std::vector<int>& getY();
std::vector<std::pair<std::string, std::string>> getAttributes() const;
static std::vector<int> factorize(const std::vector<std::string>& labels_t);
};
#endif


@@ -1 +0,0 @@
add_library(ArffFiles ArffFiles.cc)


@@ -4,7 +4,7 @@
// SPDX-License-Identifier: MIT // SPDX-License-Identifier: MIT
// *************************************************************** // ***************************************************************
#include <ArffFiles.h> #include <ArffFiles.hpp>
#include <CPPFImdlp.h> #include <CPPFImdlp.h>
#include <bayesnet/ensembles/BoostAODE.h> #include <bayesnet/ensembles/BoostAODE.h>


@@ -1,24 +1,27 @@
if(ENABLE_TESTING) if(ENABLE_TESTING)
include_directories( include_directories(
${BayesNet_SOURCE_DIR}/lib/Files ${BayesNet_SOURCE_DIR}/tests/lib/Files
${BayesNet_SOURCE_DIR}/lib/mdlp
${BayesNet_SOURCE_DIR}/lib/folding ${BayesNet_SOURCE_DIR}/lib/folding
${BayesNet_SOURCE_DIR}/lib/mdlp
${BayesNet_SOURCE_DIR}/lib/json/include ${BayesNet_SOURCE_DIR}/lib/json/include
${BayesNet_SOURCE_DIR} ${BayesNet_SOURCE_DIR}
${CMAKE_BINARY_DIR}/configured_files/include ${CMAKE_BINARY_DIR}/configured_files/include
) )
file(GLOB_RECURSE BayesNet_SOURCES "${BayesNet_SOURCE_DIR}/bayesnet/*.cc") file(GLOB_RECURSE BayesNet_SOURCES "${BayesNet_SOURCE_DIR}/bayesnet/*.cc")
add_executable(TestBayesNet TestBayesNetwork.cc TestBayesNode.cc TestBayesClassifier.cc add_executable(TestBayesNet TestBayesNetwork.cc TestBayesNode.cc TestBayesClassifier.cc
TestBayesModels.cc TestBayesMetrics.cc TestFeatureSelection.cc TestBoostAODE.cc TestBayesModels.cc TestBayesMetrics.cc TestFeatureSelection.cc TestBoostAODE.cc TestA2DE.cc
TestUtils.cc TestBayesEnsemble.cc ${BayesNet_SOURCES}) TestUtils.cc TestBayesEnsemble.cc TestModulesVersions.cc TestBoostA2DE.cc ${BayesNet_SOURCES})
target_link_libraries(TestBayesNet PUBLIC "${TORCH_LIBRARIES}" ArffFiles mdlp Catch2::Catch2WithMain ) target_link_libraries(TestBayesNet PUBLIC "${TORCH_LIBRARIES}" mdlp PRIVATE Catch2::Catch2WithMain)
add_test(NAME BayesNetworkTest COMMAND TestBayesNet) add_test(NAME BayesNetworkTest COMMAND TestBayesNet)
add_test(NAME Network COMMAND TestBayesNet "[Network]") add_test(NAME A2DE COMMAND TestBayesNet "[A2DE]")
add_test(NAME Node COMMAND TestBayesNet "[Node]") add_test(NAME BoostA2DE COMMAND TestBayesNet "[BoostA2DE]")
add_test(NAME Metrics COMMAND TestBayesNet "[Metrics]") add_test(NAME BoostAODE COMMAND TestBayesNet "[BoostAODE]")
add_test(NAME FeatureSelection COMMAND TestBayesNet "[FeatureSelection]")
add_test(NAME Classifier COMMAND TestBayesNet "[Classifier]") add_test(NAME Classifier COMMAND TestBayesNet "[Classifier]")
add_test(NAME Ensemble COMMAND TestBayesNet "[Ensemble]") add_test(NAME Ensemble COMMAND TestBayesNet "[Ensemble]")
add_test(NAME FeatureSelection COMMAND TestBayesNet "[FeatureSelection]")
add_test(NAME Metrics COMMAND TestBayesNet "[Metrics]")
add_test(NAME Models COMMAND TestBayesNet "[Models]") add_test(NAME Models COMMAND TestBayesNet "[Models]")
add_test(NAME BoostAODE COMMAND TestBayesNet "[BoostAODE]") add_test(NAME Modules COMMAND TestBayesNet "[Modules]")
add_test(NAME Network COMMAND TestBayesNet "[Network]")
add_test(NAME Node COMMAND TestBayesNet "[Node]")
endif(ENABLE_TESTING) endif(ENABLE_TESTING)
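With the test registrations above, a single suite can be run either through CTest or by passing the Catch2 tag straight to the binary; a sketch, with the binary path assumed:

ctest -R '^A2DE$'               # via CTest, anchored so it does not also match BoostA2DE
./tests/TestBayesNet "[A2DE]"   # via the Catch2 tag filter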

tests/TestA2DE.cc (new file, 49 lines)

@@ -0,0 +1,49 @@
// ***************************************************************
// SPDX-FileCopyrightText: Copyright 2024 Ricardo Montañana Gómez
// SPDX-FileType: SOURCE
// SPDX-License-Identifier: MIT
// ***************************************************************
#include <type_traits>
#include <catch2/catch_test_macros.hpp>
#include <catch2/catch_approx.hpp>
#include <catch2/generators/catch_generators.hpp>
#include "bayesnet/ensembles/A2DE.h"
#include "TestUtils.h"
TEST_CASE("Fit and Score", "[A2DE]")
{
auto raw = RawDatasets("glass", true);
auto clf = bayesnet::A2DE();
clf.fit(raw.Xv, raw.yv, raw.features, raw.className, raw.states);
REQUIRE(clf.score(raw.Xv, raw.yv) == Catch::Approx(0.831776).epsilon(raw.epsilon));
REQUIRE(clf.getNumberOfNodes() == 360);
REQUIRE(clf.getNumberOfEdges() == 756);
REQUIRE(clf.getNotes().size() == 0);
}
TEST_CASE("Test score with predict_voting", "[A2DE]")
{
auto raw = RawDatasets("glass", true);
auto clf = bayesnet::A2DE(true);
auto hyperparameters = nlohmann::json{
{"predict_voting", true},
};
clf.setHyperparameters(hyperparameters);
clf.fit(raw.Xv, raw.yv, raw.features, raw.className, raw.states);
REQUIRE(clf.score(raw.Xv, raw.yv) == Catch::Approx(0.82243).epsilon(raw.epsilon));
hyperparameters["predict_voting"] = false;
clf.setHyperparameters(hyperparameters);
clf.fit(raw.Xv, raw.yv, raw.features, raw.className, raw.states);
REQUIRE(clf.score(raw.Xv, raw.yv) == Catch::Approx(0.83178).epsilon(raw.epsilon));
}
TEST_CASE("Test graph", "[A2DE]")
{
auto raw = RawDatasets("iris", true);
auto clf = bayesnet::A2DE();
clf.fit(raw.Xv, raw.yv, raw.features, raw.className, raw.states);
auto graph = clf.graph();
REQUIRE(graph.size() == 78);
REQUIRE(graph[0] == "digraph BayesNet {\nlabel=<BayesNet A2DE_0>\nfontsize=30\nfontcolor=blue\nlabelloc=t\nlayout=circo\n");
REQUIRE(graph[1] == "class [shape=circle, fontcolor=red, fillcolor=lightblue, style=filled ] \n");
}
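A quick consistency check on the node count in the Fit and Score case: glass has 9 features, and A2DE builds one SPnDE per unordered feature pair, so $\binom{9}{2} = 36$ models of $9 + 1 = 10$ nodes each give the expected $36 \times 10 = 360$ nodes (this reading of the model count is an inference from the tests, not from the A2DE source).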


@@ -18,47 +18,47 @@ TEST_CASE("Test Cannot build dataset with wrong data vector", "[Classifier]")
auto model = bayesnet::TAN(); auto model = bayesnet::TAN();
auto raw = RawDatasets("iris", true); auto raw = RawDatasets("iris", true);
raw.yv.pop_back(); raw.yv.pop_back();
REQUIRE_THROWS_AS(model.fit(raw.Xv, raw.yv, raw.featuresv, raw.classNamev, raw.statesv), std::runtime_error); REQUIRE_THROWS_AS(model.fit(raw.Xv, raw.yv, raw.features, raw.className, raw.states), std::runtime_error);
REQUIRE_THROWS_WITH(model.fit(raw.Xv, raw.yv, raw.featuresv, raw.classNamev, raw.statesv), "* Error in X and y dimensions *\nX dimensions: [4, 150]\ny dimensions: [149]"); REQUIRE_THROWS_WITH(model.fit(raw.Xv, raw.yv, raw.features, raw.className, raw.states), "* Error in X and y dimensions *\nX dimensions: [4, 150]\ny dimensions: [149]");
} }
TEST_CASE("Test Cannot build dataset with wrong data tensor", "[Classifier]") TEST_CASE("Test Cannot build dataset with wrong data tensor", "[Classifier]")
{ {
auto model = bayesnet::TAN(); auto model = bayesnet::TAN();
auto raw = RawDatasets("iris", true); auto raw = RawDatasets("iris", true);
auto yshort = torch::zeros({ 149 }, torch::kInt32); auto yshort = torch::zeros({ 149 }, torch::kInt32);
REQUIRE_THROWS_AS(model.fit(raw.Xt, yshort, raw.featurest, raw.classNamet, raw.statest), std::runtime_error); REQUIRE_THROWS_AS(model.fit(raw.Xt, yshort, raw.features, raw.className, raw.states), std::runtime_error);
REQUIRE_THROWS_WITH(model.fit(raw.Xt, yshort, raw.featurest, raw.classNamet, raw.statest), "* Error in X and y dimensions *\nX dimensions: [4, 150]\ny dimensions: [149]"); REQUIRE_THROWS_WITH(model.fit(raw.Xt, yshort, raw.features, raw.className, raw.states), "* Error in X and y dimensions *\nX dimensions: [4, 150]\ny dimensions: [149]");
} }
TEST_CASE("Invalid data type", "[Classifier]") TEST_CASE("Invalid data type", "[Classifier]")
{ {
auto model = bayesnet::TAN(); auto model = bayesnet::TAN();
auto raw = RawDatasets("iris", false); auto raw = RawDatasets("iris", false);
REQUIRE_THROWS_AS(model.fit(raw.Xt, raw.yt, raw.featurest, raw.classNamet, raw.statest), std::invalid_argument); REQUIRE_THROWS_AS(model.fit(raw.Xt, raw.yt, raw.features, raw.className, raw.states), std::invalid_argument);
REQUIRE_THROWS_WITH(model.fit(raw.Xt, raw.yt, raw.featurest, raw.classNamet, raw.statest), "dataset (X, y) must be of type Integer"); REQUIRE_THROWS_WITH(model.fit(raw.Xt, raw.yt, raw.features, raw.className, raw.states), "dataset (X, y) must be of type Integer");
} }
TEST_CASE("Invalid number of features", "[Classifier]") TEST_CASE("Invalid number of features", "[Classifier]")
{ {
auto model = bayesnet::TAN(); auto model = bayesnet::TAN();
auto raw = RawDatasets("iris", true); auto raw = RawDatasets("iris", true);
auto Xt = torch::cat({ raw.Xt, torch::zeros({ 1, 150 }, torch::kInt32) }, 0); auto Xt = torch::cat({ raw.Xt, torch::zeros({ 1, 150 }, torch::kInt32) }, 0);
REQUIRE_THROWS_AS(model.fit(Xt, raw.yt, raw.featurest, raw.classNamet, raw.statest), std::invalid_argument); REQUIRE_THROWS_AS(model.fit(Xt, raw.yt, raw.features, raw.className, raw.states), std::invalid_argument);
REQUIRE_THROWS_WITH(model.fit(Xt, raw.yt, raw.featurest, raw.classNamet, raw.statest), "Classifier: X 5 and features 4 must have the same number of features"); REQUIRE_THROWS_WITH(model.fit(Xt, raw.yt, raw.features, raw.className, raw.states), "Classifier: X 5 and features 4 must have the same number of features");
} }
TEST_CASE("Invalid class name", "[Classifier]") TEST_CASE("Invalid class name", "[Classifier]")
{ {
auto model = bayesnet::TAN(); auto model = bayesnet::TAN();
auto raw = RawDatasets("iris", true); auto raw = RawDatasets("iris", true);
REQUIRE_THROWS_AS(model.fit(raw.Xt, raw.yt, raw.featurest, "duck", raw.statest), std::invalid_argument); REQUIRE_THROWS_AS(model.fit(raw.Xt, raw.yt, raw.features, "duck", raw.states), std::invalid_argument);
REQUIRE_THROWS_WITH(model.fit(raw.Xt, raw.yt, raw.featurest, "duck", raw.statest), "class name not found in states"); REQUIRE_THROWS_WITH(model.fit(raw.Xt, raw.yt, raw.features, "duck", raw.states), "class name not found in states");
} }
TEST_CASE("Invalid feature name", "[Classifier]") TEST_CASE("Invalid feature name", "[Classifier]")
{ {
auto model = bayesnet::TAN(); auto model = bayesnet::TAN();
auto raw = RawDatasets("iris", true); auto raw = RawDatasets("iris", true);
auto statest = raw.statest; auto statest = raw.states;
statest.erase("petallength"); statest.erase("petallength");
REQUIRE_THROWS_AS(model.fit(raw.Xt, raw.yt, raw.featurest, raw.classNamet, statest), std::invalid_argument); REQUIRE_THROWS_AS(model.fit(raw.Xt, raw.yt, raw.features, raw.className, statest), std::invalid_argument);
REQUIRE_THROWS_WITH(model.fit(raw.Xt, raw.yt, raw.featurest, raw.classNamet, statest), "feature [petallength] not found in states"); REQUIRE_THROWS_WITH(model.fit(raw.Xt, raw.yt, raw.features, raw.className, statest), "feature [petallength] not found in states");
} }
TEST_CASE("Invalid hyperparameter", "[Classifier]") TEST_CASE("Invalid hyperparameter", "[Classifier]")
{ {
@@ -71,7 +71,7 @@ TEST_CASE("Topological order", "[Classifier]")
{ {
auto model = bayesnet::TAN(); auto model = bayesnet::TAN();
auto raw = RawDatasets("iris", true); auto raw = RawDatasets("iris", true);
model.fit(raw.Xt, raw.yt, raw.featurest, raw.classNamet, raw.statest); model.fit(raw.Xt, raw.yt, raw.features, raw.className, raw.states);
auto order = model.topological_order(); auto order = model.topological_order();
REQUIRE(order.size() == 4); REQUIRE(order.size() == 4);
REQUIRE(order[0] == "petallength"); REQUIRE(order[0] == "petallength");
@@ -83,7 +83,7 @@ TEST_CASE("Dump_cpt", "[Classifier]")
{ {
auto model = bayesnet::TAN(); auto model = bayesnet::TAN();
auto raw = RawDatasets("iris", true); auto raw = RawDatasets("iris", true);
model.fit(raw.Xt, raw.yt, raw.featurest, raw.classNamet, raw.statest); model.fit(raw.Xt, raw.yt, raw.features, raw.className, raw.states);
auto cpt = model.dump_cpt(); auto cpt = model.dump_cpt();
REQUIRE(cpt.size() == 1713); REQUIRE(cpt.size() == 1713);
} }
@@ -111,7 +111,7 @@ TEST_CASE("KDB Graph", "[Classifier]")
{ {
auto model = bayesnet::KDB(2); auto model = bayesnet::KDB(2);
auto raw = RawDatasets("iris", true); auto raw = RawDatasets("iris", true);
model.fit(raw.Xv, raw.yv, raw.featuresv, raw.classNamev, raw.statesv); model.fit(raw.Xv, raw.yv, raw.features, raw.className, raw.states);
auto graph = model.graph(); auto graph = model.graph();
REQUIRE(graph.size() == 15); REQUIRE(graph.size() == 15);
} }
@@ -119,7 +119,7 @@ TEST_CASE("KDBLd Graph", "[Classifier]")
{ {
auto model = bayesnet::KDBLd(2); auto model = bayesnet::KDBLd(2);
auto raw = RawDatasets("iris", false); auto raw = RawDatasets("iris", false);
model.fit(raw.Xt, raw.yt, raw.featurest, raw.classNamet, raw.statest); model.fit(raw.Xt, raw.yt, raw.features, raw.className, raw.states);
auto graph = model.graph(); auto graph = model.graph();
REQUIRE(graph.size() == 15); REQUIRE(graph.size() == 15);
} }


@@ -18,7 +18,7 @@ TEST_CASE("Topological Order", "[Ensemble]")
{ {
auto raw = RawDatasets("glass", true); auto raw = RawDatasets("glass", true);
auto clf = bayesnet::BoostAODE(); auto clf = bayesnet::BoostAODE();
clf.fit(raw.Xv, raw.yv, raw.featuresv, raw.classNamev, raw.statesv); clf.fit(raw.Xv, raw.yv, raw.features, raw.className, raw.states);
auto order = clf.topological_order(); auto order = clf.topological_order();
REQUIRE(order.size() == 0); REQUIRE(order.size() == 0);
} }
@@ -26,7 +26,7 @@ TEST_CASE("Dump CPT", "[Ensemble]")
{ {
auto raw = RawDatasets("glass", true); auto raw = RawDatasets("glass", true);
auto clf = bayesnet::BoostAODE(); auto clf = bayesnet::BoostAODE();
clf.fit(raw.Xv, raw.yv, raw.featuresv, raw.classNamev, raw.statesv); clf.fit(raw.Xv, raw.yv, raw.features, raw.className, raw.states);
auto dump = clf.dump_cpt(); auto dump = clf.dump_cpt();
REQUIRE(dump == ""); REQUIRE(dump == "");
} }
@@ -34,7 +34,7 @@ TEST_CASE("Number of States", "[Ensemble]")
{ {
auto clf = bayesnet::BoostAODE(); auto clf = bayesnet::BoostAODE();
auto raw = RawDatasets("iris", true); auto raw = RawDatasets("iris", true);
clf.fit(raw.Xv, raw.yv, raw.featuresv, raw.classNamev, raw.statesv); clf.fit(raw.Xv, raw.yv, raw.features, raw.className, raw.states);
REQUIRE(clf.getNumberOfStates() == 76); REQUIRE(clf.getNumberOfStates() == 76);
} }
TEST_CASE("Show", "[Ensemble]") TEST_CASE("Show", "[Ensemble]")
@@ -46,7 +46,7 @@ TEST_CASE("Show", "[Ensemble]")
{"maxTolerance", 1}, {"maxTolerance", 1},
{"convergence", false}, {"convergence", false},
}); });
clf.fit(raw.Xv, raw.yv, raw.featuresv, raw.classNamev, raw.statesv); clf.fit(raw.Xv, raw.yv, raw.features, raw.className, raw.states);
std::vector<std::string> expected = { std::vector<std::string> expected = {
"class -> sepallength, sepalwidth, petallength, petalwidth, ", "class -> sepallength, sepalwidth, petallength, petalwidth, ",
"petallength -> sepallength, sepalwidth, petalwidth, ", "petallength -> sepallength, sepalwidth, petalwidth, ",
@@ -78,16 +78,16 @@ TEST_CASE("Graph", "[Ensemble]")
{ {
auto clf = bayesnet::BoostAODE(); auto clf = bayesnet::BoostAODE();
auto raw = RawDatasets("iris", true); auto raw = RawDatasets("iris", true);
clf.fit(raw.Xv, raw.yv, raw.featuresv, raw.classNamev, raw.statesv); clf.fit(raw.Xv, raw.yv, raw.features, raw.className, raw.states);
auto graph = clf.graph(); auto graph = clf.graph();
REQUIRE(graph.size() == 56); REQUIRE(graph.size() == 56);
auto clf2 = bayesnet::AODE(); auto clf2 = bayesnet::AODE();
clf2.fit(raw.Xv, raw.yv, raw.featuresv, raw.classNamev, raw.statesv); clf2.fit(raw.Xv, raw.yv, raw.features, raw.className, raw.states);
graph = clf2.graph(); graph = clf2.graph();
REQUIRE(graph.size() == 56); REQUIRE(graph.size() == 56);
raw = RawDatasets("glass", false); raw = RawDatasets("glass", false);
auto clf3 = bayesnet::AODELd(); auto clf3 = bayesnet::AODELd();
clf3.fit(raw.Xt, raw.yt, raw.featurest, raw.classNamet, raw.statest); clf3.fit(raw.Xt, raw.yt, raw.features, raw.className, raw.states);
graph = clf3.graph(); graph = clf3.graph();
REQUIRE(graph.size() == 261); REQUIRE(graph.size() == 261);
} }


@@ -9,7 +9,7 @@
#include <catch2/generators/catch_generators.hpp> #include <catch2/generators/catch_generators.hpp>
#include "bayesnet/utils/BayesMetrics.h" #include "bayesnet/utils/BayesMetrics.h"
#include "TestUtils.h" #include "TestUtils.h"
#include "Timer.h"
TEST_CASE("Metrics Test", "[Metrics]") TEST_CASE("Metrics Test", "[Metrics]")
{ {
@@ -27,8 +27,8 @@ TEST_CASE("Metrics Test", "[Metrics]")
{"diabetes", 0.0345470614} {"diabetes", 0.0345470614}
}; };
map<pair<std::string, int>, std::vector<pair<int, int>>> resultsMST = { map<pair<std::string, int>, std::vector<pair<int, int>>> resultsMST = {
{ {"glass", 0}, { {0, 6}, {0, 5}, {0, 3}, {5, 1}, {5, 8}, {5, 4}, {6, 2}, {6, 7} } }, { {"glass", 0}, { {0, 6}, {0, 5}, {0, 3}, {3, 4}, {5, 1}, {5, 8}, {6, 2}, {6, 7} } },
{ {"glass", 1}, { {1, 5}, {5, 0}, {5, 8}, {5, 4}, {0, 6}, {0, 3}, {6, 2}, {6, 7} } }, { {"glass", 1}, { {1, 5}, {5, 0}, {5, 8}, {0, 6}, {0, 3}, {3, 4}, {6, 2}, {6, 7} } },
{ {"iris", 0}, { {0, 1}, {0, 2}, {1, 3} } }, { {"iris", 0}, { {0, 1}, {0, 2}, {1, 3} } },
{ {"iris", 1}, { {1, 0}, {1, 3}, {0, 2} } }, { {"iris", 1}, { {1, 0}, {1, 3}, {0, 2} } },
{ {"ecoli", 0}, { {0, 1}, {0, 2}, {1, 5}, {1, 3}, {5, 6}, {5, 4} } }, { {"ecoli", 0}, { {0, 1}, {0, 2}, {1, 5}, {1, 3}, {5, 6}, {5, 4} } },
@@ -37,8 +37,8 @@ TEST_CASE("Metrics Test", "[Metrics]")
{ {"diabetes", 1}, { {1, 4}, {4, 3}, {3, 2}, {3, 5}, {2, 0}, {0, 7}, {0, 6} } } { {"diabetes", 1}, { {1, 4}, {4, 3}, {3, 2}, {3, 5}, {2, 0}, {0, 7}, {0, 6} } }
}; };
auto raw = RawDatasets(file_name, true); auto raw = RawDatasets(file_name, true);
bayesnet::Metrics metrics(raw.dataset, raw.featurest, raw.classNamet, raw.classNumStates); bayesnet::Metrics metrics(raw.dataset, raw.features, raw.className, raw.classNumStates);
bayesnet::Metrics metricsv(raw.Xv, raw.yv, raw.featurest, raw.classNamet, raw.classNumStates); bayesnet::Metrics metricsv(raw.Xv, raw.yv, raw.features, raw.className, raw.classNumStates);
SECTION("Test Constructor") SECTION("Test Constructor")
{ {
@@ -69,10 +69,199 @@ TEST_CASE("Metrics Test", "[Metrics]")
auto weights_matrix = metrics.conditionalEdge(raw.weights); auto weights_matrix = metrics.conditionalEdge(raw.weights);
auto weights_matrixv = metricsv.conditionalEdge(raw.weights); auto weights_matrixv = metricsv.conditionalEdge(raw.weights);
for (int i = 0; i < 2; ++i) { for (int i = 0; i < 2; ++i) {
auto result = metrics.maximumSpanningTree(raw.featurest, weights_matrix, i); auto result = metrics.maximumSpanningTree(raw.features, weights_matrix, i);
auto resultv = metricsv.maximumSpanningTree(raw.featurest, weights_matrixv, i); auto resultv = metricsv.maximumSpanningTree(raw.features, weights_matrixv, i);
REQUIRE(result == resultsMST.at({ file_name, i })); REQUIRE(result == resultsMST.at({ file_name, i }));
REQUIRE(resultv == resultsMST.at({ file_name, i })); REQUIRE(resultv == resultsMST.at({ file_name, i }));
} }
} }
} }
TEST_CASE("Select all features ordered by Mutual Information", "[Metrics]")
{
auto raw = RawDatasets("iris", true);
bayesnet::Metrics metrics(raw.dataset, raw.features, raw.className, raw.classNumStates);
auto kBest = metrics.SelectKBestWeighted(raw.weights, true, 0);
REQUIRE(kBest.size() == raw.features.size());
REQUIRE(kBest == std::vector<int>({ 1, 0, 3, 2 }));
}
TEST_CASE("Entropy Test", "[Metrics]")
{
auto raw = RawDatasets("iris", true);
bayesnet::Metrics metrics(raw.dataset, raw.features, raw.className, raw.classNumStates);
auto result = metrics.entropy(raw.dataset.index({ 0, "..." }), raw.weights);
REQUIRE(result == Catch::Approx(0.9848175048828125).epsilon(raw.epsilon));
auto data = torch::tensor({ 0, 0, 0, 0, 0, 0, 0, 1, 1, 1 }, torch::kInt32);
auto weights = torch::tensor({ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 }, torch::kFloat32);
result = metrics.entropy(data, weights);
REQUIRE(result == Catch::Approx(0.61086434125900269).epsilon(raw.epsilon));
data = torch::tensor({ 0, 0, 0, 0, 0, 1, 1, 1, 1, 1 }, torch::kInt32);
result = metrics.entropy(data, weights);
REQUIRE(result == Catch::Approx(0.693147180559945).epsilon(raw.epsilon));
}
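A quick sanity check on the constants in this test: for the 7/3 split the weighted entropy is $-(0.7\ln 0.7 + 0.3\ln 0.3) \approx 0.610864$, and for the even 5/5 split it is $-2 \cdot 0.5\ln 0.5 = \ln 2 \approx 0.693147$; both match the expected values above and confirm the metric is computed in nats rather than bits.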
TEST_CASE("Conditional Entropy", "[Metrics]")
{
auto raw = RawDatasets("iris", true);
bayesnet::Metrics metrics(raw.dataset, raw.features, raw.className, raw.classNumStates);
auto expected = std::map<std::pair<int, int>, double>{
{ { 0, 1 }, 1.32674 },
{ { 0, 2 }, 0.236253 },
{ { 0, 3 }, 0.1202 },
{ { 1, 2 }, 0.252551 },
{ { 1, 3 }, 0.10515 },
{ { 2, 3 }, 0.108323 },
};
for (int i = 0; i < raw.features.size() - 1; ++i) {
for (int j = i + 1; j < raw.features.size(); ++j) {
double result = metrics.conditionalEntropy(raw.dataset.index({ i, "..." }), raw.dataset.index({ j, "..." }), raw.yt, raw.weights);
REQUIRE(result == Catch::Approx(expected.at({ i, j })).epsilon(raw.epsilon));
}
}
}
TEST_CASE("Conditional Mutual Information", "[Metrics]")
{
auto raw = RawDatasets("iris", true);
bayesnet::Metrics metrics(raw.dataset, raw.features, raw.className, raw.classNumStates);
auto expected = std::map<std::pair<int, int>, double>{
{ { 0, 1 }, 0.0 },
{ { 0, 2 }, 0.287696 },
{ { 0, 3 }, 0.403749 },
{ { 1, 2 }, 1.17112 },
{ { 1, 3 }, 1.31852 },
{ { 2, 3 }, 0.210068 },
};
for (int i = 0; i < raw.features.size() - 1; ++i) {
for (int j = i + 1; j < raw.features.size(); ++j) {
double result = metrics.conditionalMutualInformation(raw.dataset.index({ i, "..." }), raw.dataset.index({ j, "..." }), raw.yt, raw.weights);
REQUIRE(result == Catch::Approx(expected.at({ i, j })).epsilon(raw.epsilon));
}
}
}
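For reference, the quantity exercised here appears to be the class-conditional mutual information between feature pairs, which satisfies the standard identity $I(X_i; X_j \mid C) = H(X_i \mid C) + H(X_j \mid C) - H(X_i, X_j \mid C)$ and is therefore non-negative by construction, consistent with the 0.0 expected for the pair (0, 1).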
TEST_CASE("Select K Pairs descending", "[Metrics]")
{
auto raw = RawDatasets("iris", true);
bayesnet::Metrics metrics(raw.dataset, raw.features, raw.className, raw.classNumStates);
std::vector<int> empty;
auto results = metrics.SelectKPairs(raw.weights, empty, false);
auto expected = std::vector<std::pair<std::pair<int, int>, double>>{
{ { 1, 3 }, 1.31852 },
{ { 1, 2 }, 1.17112 },
{ { 0, 3 }, 0.403749 },
{ { 0, 2 }, 0.287696 },
{ { 2, 3 }, 0.210068 },
{ { 0, 1 }, 0.0 },
};
auto scores = metrics.getScoresKPairs();
for (int i = 0; i < results.size(); ++i) {
auto result = results[i];
auto expect = expected[i];
auto score = scores[i];
REQUIRE(result.first == expect.first.first);
REQUIRE(result.second == expect.first.second);
REQUIRE(score.first.first == expect.first.first);
REQUIRE(score.first.second == expect.first.second);
REQUIRE(score.second == Catch::Approx(expect.second).epsilon(raw.epsilon));
}
REQUIRE(results.size() == 6);
REQUIRE(scores.size() == 6);
}
TEST_CASE("Select K Pairs ascending", "[Metrics]")
{
auto raw = RawDatasets("iris", true);
bayesnet::Metrics metrics(raw.dataset, raw.features, raw.className, raw.classNumStates);
std::vector<int> empty;
auto results = metrics.SelectKPairs(raw.weights, empty, true);
auto expected = std::vector<std::pair<std::pair<int, int>, double>>{
{ { 0, 1 }, 0.0 },
{ { 2, 3 }, 0.210068 },
{ { 0, 2 }, 0.287696 },
{ { 0, 3 }, 0.403749 },
{ { 1, 2 }, 1.17112 },
{ { 1, 3 }, 1.31852 },
};
auto scores = metrics.getScoresKPairs();
for (int i = 0; i < results.size(); ++i) {
auto result = results[i];
auto expect = expected[i];
auto score = scores[i];
REQUIRE(result.first == expect.first.first);
REQUIRE(result.second == expect.first.second);
REQUIRE(score.first.first == expect.first.first);
REQUIRE(score.first.second == expect.first.second);
REQUIRE(score.second == Catch::Approx(expect.second).epsilon(raw.epsilon));
}
REQUIRE(results.size() == 6);
REQUIRE(scores.size() == 6);
}
TEST_CASE("Select K Pairs with features excluded", "[Metrics]")
{
auto raw = RawDatasets("iris", true);
bayesnet::Metrics metrics(raw.dataset, raw.features, raw.className, raw.classNumStates);
std::vector<int> excluded = { 0, 3 };
auto results = metrics.SelectKPairs(raw.weights, excluded, true);
auto expected = std::vector<std::pair<std::pair<int, int>, double>>{
{ { 1, 2 }, 1.17112 },
};
auto scores = metrics.getScoresKPairs();
for (int i = 0; i < results.size(); ++i) {
auto result = results[i];
auto expect = expected[i];
auto score = scores[i];
REQUIRE(result.first == expect.first.first);
REQUIRE(result.second == expect.first.second);
REQUIRE(score.first.first == expect.first.first);
REQUIRE(score.first.second == expect.first.second);
REQUIRE(score.second == Catch::Approx(expect.second).epsilon(raw.epsilon));
}
REQUIRE(results.size() == 1);
REQUIRE(scores.size() == 1);
}
TEST_CASE("Select K Pairs with number of pairs descending", "[Metrics]")
{
auto raw = RawDatasets("iris", true);
bayesnet::Metrics metrics(raw.dataset, raw.features, raw.className, raw.classNumStates);
std::vector<int> empty;
auto results = metrics.SelectKPairs(raw.weights, empty, false, 3);
auto expected = std::vector<std::pair<std::pair<int, int>, double>>{
{ { 1, 3 }, 1.31852 },
{ { 1, 2 }, 1.17112 },
{ { 0, 3 }, 0.403749 }
};
auto scores = metrics.getScoresKPairs();
REQUIRE(results.size() == 3);
REQUIRE(scores.size() == 3);
for (int i = 0; i < results.size(); ++i) {
auto result = results[i];
auto expect = expected[i];
auto score = scores[i];
REQUIRE(result.first == expect.first.first);
REQUIRE(result.second == expect.first.second);
REQUIRE(score.first.first == expect.first.first);
REQUIRE(score.first.second == expect.first.second);
REQUIRE(score.second == Catch::Approx(expect.second).epsilon(raw.epsilon));
}
}
TEST_CASE("Select K Pairs with number of pairs ascending", "[Metrics]")
{
auto raw = RawDatasets("iris", true);
bayesnet::Metrics metrics(raw.dataset, raw.features, raw.className, raw.classNumStates);
std::vector<int> empty;
auto results = metrics.SelectKPairs(raw.weights, empty, true, 3);
auto expected = std::vector<std::pair<std::pair<int, int>, double>>{
{ { 0, 3 }, 0.403749 },
{ { 1, 2 }, 1.17112 },
{ { 1, 3 }, 1.31852 }
};
auto scores = metrics.getScoresKPairs();
REQUIRE(results.size() == 3);
REQUIRE(scores.size() == 3);
for (int i = 0; i < results.size(); ++i) {
auto result = results[i];
auto expect = expected[i];
auto score = scores[i];
REQUIRE(result.first == expect.first.first);
REQUIRE(result.second == expect.first.second);
REQUIRE(score.first.first == expect.first.first);
REQUIRE(score.first.second == expect.first.second);
REQUIRE(score.second == Catch::Approx(expect.second).epsilon(raw.epsilon));
}
}
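Taken together, these SelectKPairs tests pin down the method's contract as exercised here: it scores every feature pair by conditional mutual information, skips pairs touching any excluded feature, sorts ascending or descending per the flag, truncates to k pairs when a positive k is given, and leaves the (pair, score) list retrievable through getScoresKPairs. A condensed sketch using the same names as the tests:

bayesnet::Metrics metrics(raw.dataset, raw.features, raw.className, raw.classNumStates);
std::vector<int> excluded;                                          // empty: consider all features
auto pairs = metrics.SelectKPairs(raw.weights, excluded, false, 3); // top 3 pairs, descending
auto scored = metrics.getScoresKPairs();                            // parallel ((i, j), score) list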


@@ -20,7 +20,7 @@
#include "bayesnet/ensembles/BoostAODE.h" #include "bayesnet/ensembles/BoostAODE.h"
#include "TestUtils.h" #include "TestUtils.h"
const std::string ACTUAL_VERSION = "1.0.4.1"; const std::string ACTUAL_VERSION = "1.0.5.1";
TEST_CASE("Test Bayesian Classifiers score & version", "[Models]") TEST_CASE("Test Bayesian Classifiers score & version", "[Models]")
{ {
@@ -54,16 +54,16 @@ TEST_CASE("Test Bayesian Classifiers score & version", "[Models]")
auto clf = models[name]; auto clf = models[name];
auto discretize = name.substr(name.length() - 2) != "Ld"; auto discretize = name.substr(name.length() - 2) != "Ld";
auto raw = RawDatasets(file_name, discretize); auto raw = RawDatasets(file_name, discretize);
clf->fit(raw.Xt, raw.yt, raw.featurest, raw.classNamet, raw.statest); clf->fit(raw.Xt, raw.yt, raw.features, raw.className, raw.states);
auto score = clf->score(raw.Xt, raw.yt); auto score = clf->score(raw.Xt, raw.yt);
INFO("Classifier: " + name + " File: " + file_name); INFO("Classifier: " << name << " File: " << file_name);
REQUIRE(score == Catch::Approx(scores[{file_name, name}]).epsilon(raw.epsilon)); REQUIRE(score == Catch::Approx(scores[{file_name, name}]).epsilon(raw.epsilon));
REQUIRE(clf->getStatus() == bayesnet::NORMAL); REQUIRE(clf->getStatus() == bayesnet::NORMAL);
} }
} }
SECTION("Library check version") SECTION("Library check version")
{ {
INFO("Checking version of " + name + " classifier"); INFO("Checking version of " << name << " classifier");
REQUIRE(clf->getVersion() == ACTUAL_VERSION); REQUIRE(clf->getVersion() == ACTUAL_VERSION);
} }
delete clf; delete clf;
@@ -81,7 +81,7 @@ TEST_CASE("Models features & Graph", "[Models]")
{ {
auto raw = RawDatasets("iris", true); auto raw = RawDatasets("iris", true);
auto clf = bayesnet::TAN(); auto clf = bayesnet::TAN();
clf.fit(raw.Xv, raw.yv, raw.featuresv, raw.classNamev, raw.statesv); clf.fit(raw.Xv, raw.yv, raw.features, raw.className, raw.states);
REQUIRE(clf.getNumberOfNodes() == 5); REQUIRE(clf.getNumberOfNodes() == 5);
REQUIRE(clf.getNumberOfEdges() == 7); REQUIRE(clf.getNumberOfEdges() == 7);
REQUIRE(clf.getNumberOfStates() == 19); REQUIRE(clf.getNumberOfStates() == 19);
@@ -93,7 +93,7 @@ TEST_CASE("Models features & Graph", "[Models]")
{ {
auto clf = bayesnet::TANLd(); auto clf = bayesnet::TANLd();
auto raw = RawDatasets("iris", false); auto raw = RawDatasets("iris", false);
clf.fit(raw.Xt, raw.yt, raw.featurest, raw.classNamet, raw.statest); clf.fit(raw.Xt, raw.yt, raw.features, raw.className, raw.states);
REQUIRE(clf.getNumberOfNodes() == 5); REQUIRE(clf.getNumberOfNodes() == 5);
REQUIRE(clf.getNumberOfEdges() == 7); REQUIRE(clf.getNumberOfEdges() == 7);
REQUIRE(clf.getNumberOfStates() == 19); REQUIRE(clf.getNumberOfStates() == 19);
@@ -106,7 +106,7 @@ TEST_CASE("Get num features & num edges", "[Models]")
{ {
auto raw = RawDatasets("iris", true); auto raw = RawDatasets("iris", true);
auto clf = bayesnet::KDB(2); auto clf = bayesnet::KDB(2);
clf.fit(raw.Xv, raw.yv, raw.featuresv, raw.classNamev, raw.statesv); clf.fit(raw.Xv, raw.yv, raw.features, raw.className, raw.states);
REQUIRE(clf.getNumberOfNodes() == 5); REQUIRE(clf.getNumberOfNodes() == 5);
REQUIRE(clf.getNumberOfEdges() == 8); REQUIRE(clf.getNumberOfEdges() == 8);
} }
@@ -166,7 +166,7 @@ TEST_CASE("Model predict_proba", "[Models]")
SECTION("Test " + model + " predict_proba") SECTION("Test " + model + " predict_proba")
{ {
auto clf = models[model]; auto clf = models[model];
clf->fit(raw.Xv, raw.yv, raw.featuresv, raw.classNamev, raw.statesv); clf->fit(raw.Xv, raw.yv, raw.features, raw.className, raw.states);
auto y_pred_proba = clf->predict_proba(raw.Xv); auto y_pred_proba = clf->predict_proba(raw.Xv);
auto yt_pred_proba = clf->predict_proba(raw.Xt); auto yt_pred_proba = clf->predict_proba(raw.Xt);
auto y_pred = clf->predict(raw.Xv); auto y_pred = clf->predict(raw.Xv);
@@ -203,7 +203,7 @@ TEST_CASE("AODE voting-proba", "[Models]")
{ {
auto raw = RawDatasets("glass", true); auto raw = RawDatasets("glass", true);
auto clf = bayesnet::AODE(false); auto clf = bayesnet::AODE(false);
clf.fit(raw.Xv, raw.yv, raw.featuresv, raw.classNamev, raw.statesv); clf.fit(raw.Xv, raw.yv, raw.features, raw.className, raw.states);
auto score_proba = clf.score(raw.Xv, raw.yv); auto score_proba = clf.score(raw.Xv, raw.yv);
auto pred_proba = clf.predict_proba(raw.Xv); auto pred_proba = clf.predict_proba(raw.Xv);
clf.setHyperparameters({ clf.setHyperparameters({
@@ -222,9 +222,9 @@ TEST_CASE("SPODELd dataset", "[Models]")
auto raw = RawDatasets("iris", false); auto raw = RawDatasets("iris", false);
auto clf = bayesnet::SPODELd(0); auto clf = bayesnet::SPODELd(0);
// raw.dataset.to(torch::kFloat32); // raw.dataset.to(torch::kFloat32);
clf.fit(raw.dataset, raw.featuresv, raw.classNamev, raw.statesv); clf.fit(raw.dataset, raw.features, raw.className, raw.states);
auto score = clf.score(raw.Xt, raw.yt); auto score = clf.score(raw.Xt, raw.yt);
clf.fit(raw.Xt, raw.yt, raw.featurest, raw.classNamet, raw.statest); clf.fit(raw.Xt, raw.yt, raw.features, raw.className, raw.states);
auto scoret = clf.score(raw.Xt, raw.yt); auto scoret = clf.score(raw.Xt, raw.yt);
REQUIRE(score == Catch::Approx(0.97333f).epsilon(raw.epsilon)); REQUIRE(score == Catch::Approx(0.97333f).epsilon(raw.epsilon));
REQUIRE(scoret == Catch::Approx(0.97333f).epsilon(raw.epsilon)); REQUIRE(scoret == Catch::Approx(0.97333f).epsilon(raw.epsilon));
@@ -233,13 +233,13 @@ TEST_CASE("KDB with hyperparameters", "[Models]")
{ {
auto raw = RawDatasets("glass", true); auto raw = RawDatasets("glass", true);
auto clf = bayesnet::KDB(2); auto clf = bayesnet::KDB(2);
clf.fit(raw.Xv, raw.yv, raw.featuresv, raw.classNamev, raw.statesv); clf.fit(raw.Xv, raw.yv, raw.features, raw.className, raw.states);
auto score = clf.score(raw.Xv, raw.yv); auto score = clf.score(raw.Xv, raw.yv);
clf.setHyperparameters({ clf.setHyperparameters({
{"k", 3}, {"k", 3},
{"theta", 0.7}, {"theta", 0.7},
}); });
clf.fit(raw.Xv, raw.yv, raw.featuresv, raw.classNamev, raw.statesv); clf.fit(raw.Xv, raw.yv, raw.features, raw.className, raw.states);
auto scoret = clf.score(raw.Xv, raw.yv); auto scoret = clf.score(raw.Xv, raw.yv);
REQUIRE(score == Catch::Approx(0.827103).epsilon(raw.epsilon)); REQUIRE(score == Catch::Approx(0.827103).epsilon(raw.epsilon));
REQUIRE(scoret == Catch::Approx(0.761682).epsilon(raw.epsilon)); REQUIRE(scoret == Catch::Approx(0.761682).epsilon(raw.epsilon));
@@ -248,7 +248,7 @@ TEST_CASE("Incorrect type of data for SPODELd", "[Models]")
{ {
auto raw = RawDatasets("iris", true); auto raw = RawDatasets("iris", true);
auto clf = bayesnet::SPODELd(0); auto clf = bayesnet::SPODELd(0);
REQUIRE_THROWS_AS(clf.fit(raw.dataset, raw.featurest, raw.classNamet, raw.statest), std::runtime_error); REQUIRE_THROWS_AS(clf.fit(raw.dataset, raw.features, raw.className, raw.states), std::runtime_error);
} }
TEST_CASE("Predict, predict_proba & score without fitting", "[Models]") TEST_CASE("Predict, predict_proba & score without fitting", "[Models]")
{ {


@@ -12,6 +12,7 @@
#include <string> #include <string>
#include "TestUtils.h" #include "TestUtils.h"
#include "bayesnet/network/Network.h" #include "bayesnet/network/Network.h"
#include "bayesnet/network/Node.h"
#include "bayesnet/utils/bayesnetUtils.h" #include "bayesnet/utils/bayesnetUtils.h"
void buildModel(bayesnet::Network& net, const std::vector<std::string>& features, const std::string& className) void buildModel(bayesnet::Network& net, const std::vector<std::string>& features, const std::string& className)
@@ -73,9 +74,9 @@ TEST_CASE("Test Bayesian Network", "[Network]")
net3.initialize(); net3.initialize();
net2.initialize(); net2.initialize();
net.initialize(); net.initialize();
buildModel(net, raw.featuresv, raw.classNamev); buildModel(net, raw.features, raw.className);
buildModel(net2, raw.featurest, raw.classNamet); buildModel(net2, raw.features, raw.className);
buildModel(net3, raw.featurest, raw.classNamet); buildModel(net3, raw.features, raw.className);
std::vector<pair<std::string, std::string>> edges = { std::vector<pair<std::string, std::string>> edges = {
{"class", "sepallength"}, {"class", "sepalwidth"}, {"class", "petallength"}, {"class", "sepallength"}, {"class", "sepalwidth"}, {"class", "petallength"},
{"class", "petalwidth" }, {"sepallength", "sepalwidth"}, {"sepallength", "petallength"}, {"class", "petalwidth" }, {"sepallength", "sepalwidth"}, {"sepallength", "petallength"},
@@ -114,9 +115,9 @@ TEST_CASE("Test Bayesian Network", "[Network]")
REQUIRE(children == children3); REQUIRE(children == children3);
} }
// Fit networks // Fit networks
net.fit(raw.Xv, raw.yv, raw.weightsv, raw.featuresv, raw.classNamev, raw.statesv); net.fit(raw.Xv, raw.yv, raw.weightsv, raw.features, raw.className, raw.states);
net2.fit(raw.dataset, raw.weights, raw.featurest, raw.classNamet, raw.statest); net2.fit(raw.dataset, raw.weights, raw.features, raw.className, raw.states);
net3.fit(raw.Xt, raw.yt, raw.weights, raw.featurest, raw.classNamet, raw.statest); net3.fit(raw.Xt, raw.yt, raw.weights, raw.features, raw.className, raw.states);
REQUIRE(net.getStates() == net2.getStates()); REQUIRE(net.getStates() == net2.getStates());
REQUIRE(net.getStates() == net3.getStates()); REQUIRE(net.getStates() == net3.getStates());
REQUIRE(net.getFeatures() == net2.getFeatures()); REQUIRE(net.getFeatures() == net2.getFeatures());
@@ -192,8 +193,8 @@ TEST_CASE("Test Bayesian Network", "[Network]")
} }
SECTION("Test predict") SECTION("Test predict")
{ {
buildModel(net, raw.featuresv, raw.classNamev); buildModel(net, raw.features, raw.className);
net.fit(raw.Xv, raw.yv, raw.weightsv, raw.featuresv, raw.classNamev, raw.statesv); net.fit(raw.Xv, raw.yv, raw.weightsv, raw.features, raw.className, raw.states);
std::vector<std::vector<int>> test = { {1, 2, 0, 1, 1}, {0, 1, 2, 0, 1}, {0, 0, 0, 0, 1}, {2, 2, 2, 2, 1} }; std::vector<std::vector<int>> test = { {1, 2, 0, 1, 1}, {0, 1, 2, 0, 1}, {0, 0, 0, 0, 1}, {2, 2, 2, 2, 1} };
std::vector<int> y_test = { 2, 2, 0, 2, 1 }; std::vector<int> y_test = { 2, 2, 0, 2, 1 };
auto y_pred = net.predict(test); auto y_pred = net.predict(test);
@@ -201,8 +202,8 @@ TEST_CASE("Test Bayesian Network", "[Network]")
} }
SECTION("Test predict_proba") SECTION("Test predict_proba")
{ {
buildModel(net, raw.featuresv, raw.classNamev); buildModel(net, raw.features, raw.className);
net.fit(raw.Xv, raw.yv, raw.weightsv, raw.featuresv, raw.classNamev, raw.statesv); net.fit(raw.Xv, raw.yv, raw.weightsv, raw.features, raw.className, raw.states);
std::vector<std::vector<int>> test = { {1, 2, 0, 1, 1}, {0, 1, 2, 0, 1}, {0, 0, 0, 0, 1}, {2, 2, 2, 2, 1} }; std::vector<std::vector<int>> test = { {1, 2, 0, 1, 1}, {0, 1, 2, 0, 1}, {0, 0, 0, 0, 1}, {2, 2, 2, 2, 1} };
std::vector<std::vector<double>> y_test = { std::vector<std::vector<double>> y_test = {
{0.450237, 0.0866621, 0.463101}, {0.450237, 0.0866621, 0.463101},
@@ -222,15 +223,15 @@ TEST_CASE("Test Bayesian Network", "[Network]")
} }
SECTION("Test score") SECTION("Test score")
{ {
buildModel(net, raw.featuresv, raw.classNamev); buildModel(net, raw.features, raw.className);
net.fit(raw.Xv, raw.yv, raw.weightsv, raw.featuresv, raw.classNamev, raw.statesv); net.fit(raw.Xv, raw.yv, raw.weightsv, raw.features, raw.className, raw.states);
auto score = net.score(raw.Xv, raw.yv); auto score = net.score(raw.Xv, raw.yv);
REQUIRE(score == Catch::Approx(0.97333333).margin(threshold)); REQUIRE(score == Catch::Approx(0.97333333).margin(threshold));
} }
SECTION("Copy constructor") SECTION("Copy constructor")
{ {
buildModel(net, raw.featuresv, raw.classNamev); buildModel(net, raw.features, raw.className);
net.fit(raw.Xv, raw.yv, raw.weightsv, raw.featuresv, raw.classNamev, raw.statesv); net.fit(raw.Xv, raw.yv, raw.weightsv, raw.features, raw.className, raw.states);
auto net2 = bayesnet::Network(net); auto net2 = bayesnet::Network(net);
REQUIRE(net.getFeatures() == net2.getFeatures()); REQUIRE(net.getFeatures() == net2.getFeatures());
REQUIRE(net.getEdges() == net2.getEdges()); REQUIRE(net.getEdges() == net2.getEdges());
@@ -252,7 +253,7 @@ TEST_CASE("Test Bayesian Network", "[Network]")
} }
SECTION("Test oddities") SECTION("Test oddities")
{ {
buildModel(net, raw.featuresv, raw.classNamev); buildModel(net, raw.features, raw.className);
// predict without fitting // predict without fitting
std::vector<std::vector<int>> test = { {1, 2, 0, 1, 1}, {0, 1, 2, 0, 1}, {0, 0, 0, 0, 1}, {2, 2, 2, 2, 1} }; std::vector<std::vector<int>> test = { {1, 2, 0, 1, 1}, {0, 1, 2, 0, 1}, {0, 0, 0, 0, 1}, {2, 2, 2, 2, 1} };
auto test_tensor = bayesnet::vectorToTensor(test); auto test_tensor = bayesnet::vectorToTensor(test);
@@ -266,8 +267,8 @@ TEST_CASE("Test Bayesian Network", "[Network]")
REQUIRE_THROWS_WITH(net.score(raw.Xv, raw.yv), "You must call fit() before calling predict()"); REQUIRE_THROWS_WITH(net.score(raw.Xv, raw.yv), "You must call fit() before calling predict()");
// predict with wrong data // predict with wrong data
auto netx = bayesnet::Network(); auto netx = bayesnet::Network();
buildModel(netx, raw.featuresv, raw.classNamev); buildModel(netx, raw.features, raw.className);
netx.fit(raw.Xv, raw.yv, raw.weightsv, raw.featuresv, raw.classNamev, raw.statesv); netx.fit(raw.Xv, raw.yv, raw.weightsv, raw.features, raw.className, raw.states);
std::vector<std::vector<int>> test2 = { {1, 2, 0, 1, 1}, {0, 1, 2, 0, 1}, {0, 0, 0, 0, 1} }; std::vector<std::vector<int>> test2 = { {1, 2, 0, 1, 1}, {0, 1, 2, 0, 1}, {0, 0, 0, 0, 1} };
auto test_tensor2 = bayesnet::vectorToTensor(test2, false); auto test_tensor2 = bayesnet::vectorToTensor(test2, false);
REQUIRE_THROWS_AS(netx.predict(test2), std::logic_error); REQUIRE_THROWS_AS(netx.predict(test2), std::logic_error);
@@ -277,41 +278,50 @@ TEST_CASE("Test Bayesian Network", "[Network]")
// fit with wrong data // fit with wrong data
// Weights // Weights
auto net2 = bayesnet::Network(); auto net2 = bayesnet::Network();
REQUIRE_THROWS_AS(net2.fit(raw.Xv, raw.yv, std::vector<double>(), raw.featuresv, raw.classNamev, raw.statesv), std::invalid_argument); REQUIRE_THROWS_AS(net2.fit(raw.Xv, raw.yv, std::vector<double>(), raw.features, raw.className, raw.states), std::invalid_argument);
std::string invalid_weights = "Weights (0) must have the same number of elements as samples (150) in Network::fit"; std::string invalid_weights = "Weights (0) must have the same number of elements as samples (150) in Network::fit";
REQUIRE_THROWS_WITH(net2.fit(raw.Xv, raw.yv, std::vector<double>(), raw.featuresv, raw.classNamev, raw.statesv), invalid_weights); REQUIRE_THROWS_WITH(net2.fit(raw.Xv, raw.yv, std::vector<double>(), raw.features, raw.className, raw.states), invalid_weights);
// X & y // X & y
std::string invalid_labels = "X and y must have the same number of samples in Network::fit (150 != 0)"; std::string invalid_labels = "X and y must have the same number of samples in Network::fit (150 != 0)";
REQUIRE_THROWS_AS(net2.fit(raw.Xv, std::vector<int>(), raw.weightsv, raw.featuresv, raw.classNamev, raw.statesv), std::invalid_argument); REQUIRE_THROWS_AS(net2.fit(raw.Xv, std::vector<int>(), raw.weightsv, raw.features, raw.className, raw.states), std::invalid_argument);
REQUIRE_THROWS_WITH(net2.fit(raw.Xv, std::vector<int>(), raw.weightsv, raw.featuresv, raw.classNamev, raw.statesv), invalid_labels); REQUIRE_THROWS_WITH(net2.fit(raw.Xv, std::vector<int>(), raw.weightsv, raw.features, raw.className, raw.states), invalid_labels);
// Features // Features
std::string invalid_features = "X and features must have the same number of features in Network::fit (4 != 0)"; std::string invalid_features = "X and features must have the same number of features in Network::fit (4 != 0)";
REQUIRE_THROWS_AS(net2.fit(raw.Xv, raw.yv, raw.weightsv, std::vector<std::string>(), raw.classNamev, raw.statesv), std::invalid_argument); REQUIRE_THROWS_AS(net2.fit(raw.Xv, raw.yv, raw.weightsv, std::vector<std::string>(), raw.className, raw.states), std::invalid_argument);
REQUIRE_THROWS_WITH(net2.fit(raw.Xv, raw.yv, raw.weightsv, std::vector<std::string>(), raw.classNamev, raw.statesv), invalid_features); REQUIRE_THROWS_WITH(net2.fit(raw.Xv, raw.yv, raw.weightsv, std::vector<std::string>(), raw.className, raw.states), invalid_features);
// Different number of features // Different number of features
auto net3 = bayesnet::Network(); auto net3 = bayesnet::Network();
auto test2y = { 1, 2, 3, 4, 5 }; auto test2y = { 1, 2, 3, 4, 5 };
buildModel(net3, raw.featuresv, raw.classNamev); buildModel(net3, raw.features, raw.className);
auto features3 = raw.featuresv; auto features3 = raw.features;
features3.pop_back(); features3.pop_back();
std::string invalid_features2 = "X and local features must have the same number of features in Network::fit (3 != 4)"; std::string invalid_features2 = "X and local features must have the same number of features in Network::fit (3 != 4)";
REQUIRE_THROWS_AS(net3.fit(test2, test2y, std::vector<double>(5, 0), features3, raw.classNamev, raw.statesv), std::invalid_argument); REQUIRE_THROWS_AS(net3.fit(test2, test2y, std::vector<double>(5, 0), features3, raw.className, raw.states), std::invalid_argument);
REQUIRE_THROWS_WITH(net3.fit(test2, test2y, std::vector<double>(5, 0), features3, raw.classNamev, raw.statesv), invalid_features2); REQUIRE_THROWS_WITH(net3.fit(test2, test2y, std::vector<double>(5, 0), features3, raw.className, raw.states), invalid_features2);
// Uninitialized network // Uninitialized network
std::string network_invalid = "The network has not been initialized. You must call addNode() before calling fit()"; std::string network_invalid = "The network has not been initialized. You must call addNode() before calling fit()";
REQUIRE_THROWS_AS(net2.fit(raw.Xv, raw.yv, raw.weightsv, raw.featuresv, "duck", raw.statesv), std::invalid_argument); REQUIRE_THROWS_AS(net2.fit(raw.Xv, raw.yv, raw.weightsv, raw.features, "duck", raw.states), std::invalid_argument);
REQUIRE_THROWS_WITH(net2.fit(raw.Xv, raw.yv, raw.weightsv, raw.featuresv, "duck", raw.statesv), network_invalid); REQUIRE_THROWS_WITH(net2.fit(raw.Xv, raw.yv, raw.weightsv, raw.features, "duck", raw.states), network_invalid);
// Classname // Classname
std::string invalid_classname = "Class Name not found in Network::features"; std::string invalid_classname = "Class Name not found in Network::features";
REQUIRE_THROWS_AS(net.fit(raw.Xv, raw.yv, raw.weightsv, raw.featuresv, "duck", raw.statesv), std::invalid_argument); REQUIRE_THROWS_AS(net.fit(raw.Xv, raw.yv, raw.weightsv, raw.features, "duck", raw.states), std::invalid_argument);
REQUIRE_THROWS_WITH(net.fit(raw.Xv, raw.yv, raw.weightsv, raw.featuresv, "duck", raw.statesv), invalid_classname); REQUIRE_THROWS_WITH(net.fit(raw.Xv, raw.yv, raw.weightsv, raw.features, "duck", raw.states), invalid_classname);
// Invalid feature // Invalid feature
auto features2 = raw.featuresv; auto features2 = raw.features;
features2.pop_back(); features2.pop_back();
features2.push_back("duck"); features2.push_back("duck");
std::string invalid_feature = "Feature duck not found in Network::features"; std::string invalid_feature = "Feature duck not found in Network::features";
REQUIRE_THROWS_AS(net.fit(raw.Xv, raw.yv, raw.weightsv, features2, raw.classNamev, raw.statesv), std::invalid_argument); REQUIRE_THROWS_AS(net.fit(raw.Xv, raw.yv, raw.weightsv, features2, raw.className, raw.states), std::invalid_argument);
REQUIRE_THROWS_WITH(net.fit(raw.Xv, raw.yv, raw.weightsv, features2, raw.classNamev, raw.statesv), invalid_feature); REQUIRE_THROWS_WITH(net.fit(raw.Xv, raw.yv, raw.weightsv, features2, raw.className, raw.states), invalid_feature);
// Adding the same node name twice to the network should do nothing
net.addNode("A");
net.addNode("A");
// invalid state in checkfit
auto net4 = bayesnet::Network();
buildModel(net4, raw.features, raw.className);
std::string invalid_state = "Feature sepallength not found in states";
REQUIRE_THROWS_AS(net4.fit(raw.Xv, raw.yv, raw.weightsv, raw.features, raw.className, std::map<std::string, std::vector<int>>()), std::invalid_argument);
REQUIRE_THROWS_WITH(net4.fit(raw.Xv, raw.yv, raw.weightsv, raw.features, raw.className, std::map<std::string, std::vector<int>>()), invalid_state);
} }
} }
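For context on the states argument exercised above, here is a minimal hypothetical sketch (not from the repository) of the map Network::fit expects: every feature name and the class name map to the list of discrete values it can take, and an empty map triggers the "Feature sepallength not found in states" error tested in the oddities section.
// Hypothetical sketch: building a states map for the iris tests above.
// Three bins per feature and the class name "class" are illustrative assumptions,
// not the real MDLP discretization output.
#include <map>
#include <string>
#include <vector>

std::map<std::string, std::vector<int>> makeIrisStates() {
    std::map<std::string, std::vector<int>> states;
    for (const std::string feature : { "sepallength", "sepalwidth", "petallength", "petalwidth" })
        states[feature] = { 0, 1, 2 };
    states["class"] = { 0, 1, 2 }; // iris has three classes
    return states;
}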
@@ -355,8 +365,8 @@ TEST_CASE("Dump CPT", "[Network]")
{ {
auto net = bayesnet::Network(); auto net = bayesnet::Network();
auto raw = RawDatasets("iris", true); auto raw = RawDatasets("iris", true);
buildModel(net, raw.featuresv, raw.classNamev); buildModel(net, raw.features, raw.className);
net.fit(raw.Xv, raw.yv, raw.weightsv, raw.featuresv, raw.classNamev, raw.statesv); net.fit(raw.Xv, raw.yv, raw.weightsv, raw.features, raw.className, raw.states);
auto res = net.dump_cpt(); auto res = net.dump_cpt();
std::string expected = R"(* class: (3) : [3] std::string expected = R"(* class: (3) : [3]
0.3333 0.3333


@@ -7,7 +7,9 @@
#include <catch2/catch_test_macros.hpp> #include <catch2/catch_test_macros.hpp>
#include <catch2/catch_approx.hpp> #include <catch2/catch_approx.hpp>
#include <catch2/generators/catch_generators.hpp> #include <catch2/generators/catch_generators.hpp>
#include <catch2/matchers/catch_matchers.hpp>
#include <string> #include <string>
#include <vector>
#include "TestUtils.h" #include "TestUtils.h"
#include "bayesnet/network/Network.h" #include "bayesnet/network/Network.h"
@@ -48,6 +50,73 @@ TEST_CASE("Test Node children and parents", "[Node]")
REQUIRE(parents.size() == 0); REQUIRE(parents.size() == 0);
REQUIRE(children.size() == 0); REQUIRE(children.size() == 0);
} }
TEST_CASE("Test Node computeCPT", "[Node]")
{
// Test the computeCPT method of the Node class
// Create a dataset with 3 features and 4 samples:
// a 2D tensor with 4 rows (3 features + the class) and 4 columns (one per sample)
auto dataset = torch::tensor({ {1, 0, 0, 1}, {1, 1, 2, 0}, {0, 1, 2, 1}, {0, 1, 0, 1} });
auto states = std::vector<int>({ 2, 3, 3 });
// Create a vector with the names of the features
auto features = std::vector<std::string>{ "F1", "F2", "F3" };
// Create a vector with the names of the classes
auto className = std::string("Class");
// weights
auto weights = torch::tensor({ 1.0, 1.0, 1.0, 1.0 });
std::vector<bayesnet::Node> nodes;
for (int i = 0; i < features.size(); i++) {
auto node = bayesnet::Node(features[i]);
node.setNumStates(states[i]);
nodes.push_back(node);
}
nodes.push_back(bayesnet::Node(className));
nodes[features.size()].setNumStates(2);
for (int i = 0; i < features.size(); i++) {
// Add class node as parent of all feature nodes
nodes[i].addParent(&nodes[features.size()]);
// Node[0] -> Node[1], Node[2]
if (i > 0)
nodes[i].addParent(&nodes[0]);
}
features.push_back(className);
// Compute the conditional probability table
nodes[1].computeCPT(dataset, features, 0.0, weights);
// Get the conditional probability table
auto cpTable = nodes[1].getCPT();
// Get the dimensions of the conditional probability table
auto dimensions = cpTable.sizes();
// Check the dimensions of the conditional probability table
REQUIRE(dimensions.size() == 3);
REQUIRE(dimensions[0] == 3);
REQUIRE(dimensions[1] == 2);
REQUIRE(dimensions[2] == 2);
// Check the values of the conditional probability table
REQUIRE(cpTable[0][0][0].item<float>() == Catch::Approx(0));
REQUIRE(cpTable[0][0][1].item<float>() == Catch::Approx(0));
REQUIRE(cpTable[0][1][0].item<float>() == Catch::Approx(0));
REQUIRE(cpTable[0][1][1].item<float>() == Catch::Approx(1));
REQUIRE(cpTable[1][0][0].item<float>() == Catch::Approx(0));
REQUIRE(cpTable[1][0][1].item<float>() == Catch::Approx(1));
REQUIRE(cpTable[1][1][0].item<float>() == Catch::Approx(1));
REQUIRE(cpTable[1][1][1].item<float>() == Catch::Approx(0));
// Compute evidence
for (auto& node : nodes) {
node.computeCPT(dataset, features, 0.0, weights);
}
auto evidence = std::map<std::string, int>{ { "F1", 0 }, { "F2", 1 }, { "F3", 1 } };
REQUIRE(nodes[3].getFactorValue(evidence) == 0.5);
// Oddities
auto features_back = features;
// Remove a parent from features
features.pop_back();
REQUIRE_THROWS_AS(nodes[0].computeCPT(dataset, features, 0.0, weights), std::logic_error);
REQUIRE_THROWS_WITH(nodes[0].computeCPT(dataset, features, 0.0, weights), "Feature parent Class not found in dataset");
// Remove a feature from features
features = features_back;
features.erase(features.begin());
REQUIRE_THROWS_AS(nodes[0].computeCPT(dataset, features, 0.0, weights), std::logic_error);
REQUIRE_THROWS_WITH(nodes[0].computeCPT(dataset, features, 0.0, weights), "Feature F1 not found in dataset");
}
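The expected CPT values above can be verified by hand; below is a standalone sketch (not part of the test suite) that recomputes one entry by counting, assuming the CPT axes are ordered [child state, class state, F1 state].
// Recompute P(F2 = 1 | Class = 0, F1 = 1) from the same 4-sample dataset.
#include <iostream>
#include <vector>

int main() {
    // Rows: F1, F2, F3, Class; columns: samples, as in the test above.
    std::vector<std::vector<int>> data = { {1, 0, 0, 1}, {1, 1, 2, 0}, {0, 1, 2, 1}, {0, 1, 0, 1} };
    int match = 0, total = 0;
    for (int s = 0; s < 4; s++) {
        if (data[3][s] == 0 && data[0][s] == 1) { // parent configuration: Class = 0, F1 = 1
            total++;
            if (data[1][s] == 1) match++;         // child value: F2 = 1
        }
    }
    // With zero smoothing this prints 1/1, matching cpTable[1][0][1] == Catch::Approx(1).
    std::cout << match << "/" << total << std::endl;
    return 0;
}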
TEST_CASE("TEST MinFill method", "[Node]") TEST_CASE("TEST MinFill method", "[Node]")
{ {
// Test the minFill method of the Node class // Test the minFill method of the Node class

tests/TestBoostA2DE.cc Normal file

@@ -0,0 +1,215 @@
// ***************************************************************
// SPDX-FileCopyrightText: Copyright 2024 Ricardo Montañana Gómez
// SPDX-FileType: SOURCE
// SPDX-License-Identifier: MIT
// ***************************************************************
#include <type_traits>
#include <catch2/catch_test_macros.hpp>
#include <catch2/catch_approx.hpp>
#include <catch2/generators/catch_generators.hpp>
#include "bayesnet/utils/BayesMetrics.h"
#include "bayesnet/ensembles/BoostA2DE.h"
#include "TestUtils.h"
TEST_CASE("Build basic model", "[BoostA2DE]")
{
auto raw = RawDatasets("diabetes", true);
auto clf = bayesnet::BoostA2DE();
clf.fit(raw.Xv, raw.yv, raw.features, raw.className, raw.states);
REQUIRE(clf.getNumberOfNodes() == 342);
REQUIRE(clf.getNumberOfEdges() == 684);
REQUIRE(clf.getNotes().size() == 3);
REQUIRE(clf.getNotes()[0] == "Convergence threshold reached & 15 models eliminated");
REQUIRE(clf.getNotes()[1] == "Pairs not used in train: 20");
REQUIRE(clf.getNotes()[2] == "Number of models: 38");
auto score = clf.score(raw.Xv, raw.yv);
REQUIRE(score == Catch::Approx(0.919271).epsilon(raw.epsilon));
}
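A quick sanity check on the node count asserted above: diabetes has 8 features, and each ensemble member is an SPnDE over all of them plus the class node, so
38 models × (8 features + 1 class node) = 342 nodes.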
// TEST_CASE("Feature_select IWSS", "[BoostAODE]")
// {
// auto raw = RawDatasets("glass", true);
// auto clf = bayesnet::BoostAODE();
// clf.setHyperparameters({ {"select_features", "IWSS"}, {"threshold", 0.5 } });
// clf.fit(raw.Xv, raw.yv, raw.features, raw.className, raw.states);
// REQUIRE(clf.getNumberOfNodes() == 90);
// REQUIRE(clf.getNumberOfEdges() == 153);
// REQUIRE(clf.getNotes().size() == 2);
// REQUIRE(clf.getNotes()[0] == "Used features in initialization: 4 of 9 with IWSS");
// REQUIRE(clf.getNotes()[1] == "Number of models: 9");
// }
// TEST_CASE("Feature_select FCBF", "[BoostAODE]")
// {
// auto raw = RawDatasets("glass", true);
// auto clf = bayesnet::BoostAODE();
// clf.setHyperparameters({ {"select_features", "FCBF"}, {"threshold", 1e-7 } });
// clf.fit(raw.Xv, raw.yv, raw.features, raw.className, raw.states);
// REQUIRE(clf.getNumberOfNodes() == 90);
// REQUIRE(clf.getNumberOfEdges() == 153);
// REQUIRE(clf.getNotes().size() == 2);
// REQUIRE(clf.getNotes()[0] == "Used features in initialization: 4 of 9 with FCBF");
// REQUIRE(clf.getNotes()[1] == "Number of models: 9");
// }
// TEST_CASE("Test used features in train note and score", "[BoostAODE]")
// {
// auto raw = RawDatasets("diabetes", true);
// auto clf = bayesnet::BoostAODE(true);
// clf.setHyperparameters({
// {"order", "asc"},
// {"convergence", true},
// {"select_features","CFS"},
// });
// clf.fit(raw.Xv, raw.yv, raw.features, raw.className, raw.states);
// REQUIRE(clf.getNumberOfNodes() == 72);
// REQUIRE(clf.getNumberOfEdges() == 120);
// REQUIRE(clf.getNotes().size() == 2);
// REQUIRE(clf.getNotes()[0] == "Used features in initialization: 6 of 8 with CFS");
// REQUIRE(clf.getNotes()[1] == "Number of models: 8");
// auto score = clf.score(raw.Xv, raw.yv);
// auto scoret = clf.score(raw.Xt, raw.yt);
// REQUIRE(score == Catch::Approx(0.809895813).epsilon(raw.epsilon));
// REQUIRE(scoret == Catch::Approx(0.809895813).epsilon(raw.epsilon));
// }
// TEST_CASE("Voting vs proba", "[BoostAODE]")
// {
// auto raw = RawDatasets("iris", true);
// auto clf = bayesnet::BoostAODE(false);
// clf.fit(raw.Xv, raw.yv, raw.features, raw.className, raw.states);
// auto score_proba = clf.score(raw.Xv, raw.yv);
// auto pred_proba = clf.predict_proba(raw.Xv);
// clf.setHyperparameters({
// {"predict_voting",true},
// });
// auto score_voting = clf.score(raw.Xv, raw.yv);
// auto pred_voting = clf.predict_proba(raw.Xv);
// REQUIRE(score_proba == Catch::Approx(0.97333).epsilon(raw.epsilon));
// REQUIRE(score_voting == Catch::Approx(0.98).epsilon(raw.epsilon));
// REQUIRE(pred_voting[83][2] == Catch::Approx(1.0).epsilon(raw.epsilon));
// REQUIRE(pred_proba[83][2] == Catch::Approx(0.86121525).epsilon(raw.epsilon));
// REQUIRE(clf.dump_cpt() == "");
// REQUIRE(clf.topological_order() == std::vector<std::string>());
// }
// TEST_CASE("Order asc, desc & random", "[BoostAODE]")
// {
// auto raw = RawDatasets("glass", true);
// std::map<std::string, double> scores{
// {"asc", 0.83645f }, { "desc", 0.84579f }, { "rand", 0.84112 }
// };
// for (const std::string& order : { "asc", "desc", "rand" }) {
// auto clf = bayesnet::BoostAODE();
// clf.setHyperparameters({
// {"order", order},
// {"bisection", false},
// {"maxTolerance", 1},
// {"convergence", false},
// });
// clf.fit(raw.Xv, raw.yv, raw.features, raw.className, raw.states);
// auto score = clf.score(raw.Xv, raw.yv);
// auto scoret = clf.score(raw.Xt, raw.yt);
// INFO("BoostAODE order: " + order);
// REQUIRE(score == Catch::Approx(scores[order]).epsilon(raw.epsilon));
// REQUIRE(scoret == Catch::Approx(scores[order]).epsilon(raw.epsilon));
// }
// }
// TEST_CASE("Oddities", "[BoostAODE]")
// {
// auto clf = bayesnet::BoostAODE();
// auto raw = RawDatasets("iris", true);
// auto bad_hyper = nlohmann::json{
// { { "order", "duck" } },
// { { "select_features", "duck" } },
// { { "maxTolerance", 0 } },
// { { "maxTolerance", 5 } },
// };
// for (const auto& hyper : bad_hyper.items()) {
// INFO("BoostAODE hyper: " + hyper.value().dump());
// REQUIRE_THROWS_AS(clf.setHyperparameters(hyper.value()), std::invalid_argument);
// }
// REQUIRE_THROWS_AS(clf.setHyperparameters({ {"maxTolerance", 0 } }), std::invalid_argument);
// auto bad_hyper_fit = nlohmann::json{
// { { "select_features","IWSS" }, { "threshold", -0.01 } },
// { { "select_features","IWSS" }, { "threshold", 0.51 } },
// { { "select_features","FCBF" }, { "threshold", 1e-8 } },
// { { "select_features","FCBF" }, { "threshold", 1.01 } },
// };
// for (const auto& hyper : bad_hyper_fit.items()) {
// INFO("BoostAODE hyper: " + hyper.value().dump());
// clf.setHyperparameters(hyper.value());
// REQUIRE_THROWS_AS(clf.fit(raw.Xv, raw.yv, raw.features, raw.className, raw.states), std::invalid_argument);
// }
// }
// TEST_CASE("Bisection Best", "[BoostAODE]")
// {
// auto clf = bayesnet::BoostAODE();
// auto raw = RawDatasets("kdd_JapaneseVowels", true, 1200, true, false);
// clf.setHyperparameters({
// {"bisection", true},
// {"maxTolerance", 3},
// {"convergence", true},
// {"block_update", false},
// {"convergence_best", false},
// });
// clf.fit(raw.X_train, raw.y_train, raw.features, raw.className, raw.states);
// REQUIRE(clf.getNumberOfNodes() == 210);
// REQUIRE(clf.getNumberOfEdges() == 378);
// REQUIRE(clf.getNotes().size() == 1);
// REQUIRE(clf.getNotes().at(0) == "Number of models: 14");
// auto score = clf.score(raw.X_test, raw.y_test);
// auto scoret = clf.score(raw.X_test, raw.y_test);
// REQUIRE(score == Catch::Approx(0.991666675f).epsilon(raw.epsilon));
// REQUIRE(scoret == Catch::Approx(0.991666675f).epsilon(raw.epsilon));
// }
// TEST_CASE("Bisection Best vs Last", "[BoostAODE]")
// {
// auto raw = RawDatasets("kdd_JapaneseVowels", true, 1500, true, false);
// auto clf = bayesnet::BoostAODE(true);
// auto hyperparameters = nlohmann::json{
// {"bisection", true},
// {"maxTolerance", 3},
// {"convergence", true},
// {"convergence_best", true},
// };
// clf.setHyperparameters(hyperparameters);
// clf.fit(raw.X_train, raw.y_train, raw.features, raw.className, raw.states);
// auto score_best = clf.score(raw.X_test, raw.y_test);
// REQUIRE(score_best == Catch::Approx(0.980000019f).epsilon(raw.epsilon));
// // Now we will set the hyperparameter to use the last accuracy
// hyperparameters["convergence_best"] = false;
// clf.setHyperparameters(hyperparameters);
// clf.fit(raw.X_train, raw.y_train, raw.features, raw.className, raw.states);
// auto score_last = clf.score(raw.X_test, raw.y_test);
// REQUIRE(score_last == Catch::Approx(0.976666689f).epsilon(raw.epsilon));
// }
// TEST_CASE("Block Update", "[BoostAODE]")
// {
// auto clf = bayesnet::BoostAODE();
// auto raw = RawDatasets("mfeat-factors", true, 500);
// clf.setHyperparameters({
// {"bisection", true},
// {"block_update", true},
// {"maxTolerance", 3},
// {"convergence", true},
// });
// clf.fit(raw.X_train, raw.y_train, raw.features, raw.className, raw.states);
// REQUIRE(clf.getNumberOfNodes() == 868);
// REQUIRE(clf.getNumberOfEdges() == 1724);
// REQUIRE(clf.getNotes().size() == 3);
// REQUIRE(clf.getNotes()[0] == "Convergence threshold reached & 15 models eliminated");
// REQUIRE(clf.getNotes()[1] == "Used features in train: 19 of 216");
// REQUIRE(clf.getNotes()[2] == "Number of models: 4");
// auto score = clf.score(raw.X_test, raw.y_test);
// auto scoret = clf.score(raw.X_test, raw.y_test);
// REQUIRE(score == Catch::Approx(0.99f).epsilon(raw.epsilon));
// REQUIRE(scoret == Catch::Approx(0.99f).epsilon(raw.epsilon));
// //
// // std::cout << "Number of nodes " << clf.getNumberOfNodes() << std::endl;
// // std::cout << "Number of edges " << clf.getNumberOfEdges() << std::endl;
// // std::cout << "Notes size " << clf.getNotes().size() << std::endl;
// // for (auto note : clf.getNotes()) {
// // std::cout << note << std::endl;
// // }
// // std::cout << "Score " << score << std::endl;
// }


@@ -8,6 +8,7 @@
#include <catch2/catch_test_macros.hpp> #include <catch2/catch_test_macros.hpp>
#include <catch2/catch_approx.hpp> #include <catch2/catch_approx.hpp>
#include <catch2/generators/catch_generators.hpp> #include <catch2/generators/catch_generators.hpp>
#include <catch2/matchers/catch_matchers.hpp>
#include "bayesnet/ensembles/BoostAODE.h" #include "bayesnet/ensembles/BoostAODE.h"
#include "TestUtils.h" #include "TestUtils.h"
@@ -17,7 +18,7 @@ TEST_CASE("Feature_select CFS", "[BoostAODE]")
auto raw = RawDatasets("glass", true); auto raw = RawDatasets("glass", true);
auto clf = bayesnet::BoostAODE(); auto clf = bayesnet::BoostAODE();
clf.setHyperparameters({ {"select_features", "CFS"} }); clf.setHyperparameters({ {"select_features", "CFS"} });
clf.fit(raw.Xv, raw.yv, raw.featuresv, raw.classNamev, raw.statesv); clf.fit(raw.Xv, raw.yv, raw.features, raw.className, raw.states);
REQUIRE(clf.getNumberOfNodes() == 90); REQUIRE(clf.getNumberOfNodes() == 90);
REQUIRE(clf.getNumberOfEdges() == 153); REQUIRE(clf.getNumberOfEdges() == 153);
REQUIRE(clf.getNotes().size() == 2); REQUIRE(clf.getNotes().size() == 2);
@@ -29,7 +30,7 @@ TEST_CASE("Feature_select IWSS", "[BoostAODE]")
auto raw = RawDatasets("glass", true); auto raw = RawDatasets("glass", true);
auto clf = bayesnet::BoostAODE(); auto clf = bayesnet::BoostAODE();
clf.setHyperparameters({ {"select_features", "IWSS"}, {"threshold", 0.5 } }); clf.setHyperparameters({ {"select_features", "IWSS"}, {"threshold", 0.5 } });
clf.fit(raw.Xv, raw.yv, raw.featuresv, raw.classNamev, raw.statesv); clf.fit(raw.Xv, raw.yv, raw.features, raw.className, raw.states);
REQUIRE(clf.getNumberOfNodes() == 90); REQUIRE(clf.getNumberOfNodes() == 90);
REQUIRE(clf.getNumberOfEdges() == 153); REQUIRE(clf.getNumberOfEdges() == 153);
REQUIRE(clf.getNotes().size() == 2); REQUIRE(clf.getNotes().size() == 2);
@@ -41,11 +42,11 @@ TEST_CASE("Feature_select FCBF", "[BoostAODE]")
auto raw = RawDatasets("glass", true); auto raw = RawDatasets("glass", true);
auto clf = bayesnet::BoostAODE(); auto clf = bayesnet::BoostAODE();
clf.setHyperparameters({ {"select_features", "FCBF"}, {"threshold", 1e-7 } }); clf.setHyperparameters({ {"select_features", "FCBF"}, {"threshold", 1e-7 } });
clf.fit(raw.Xv, raw.yv, raw.featuresv, raw.classNamev, raw.statesv); clf.fit(raw.Xv, raw.yv, raw.features, raw.className, raw.states);
REQUIRE(clf.getNumberOfNodes() == 90); REQUIRE(clf.getNumberOfNodes() == 90);
REQUIRE(clf.getNumberOfEdges() == 153); REQUIRE(clf.getNumberOfEdges() == 153);
REQUIRE(clf.getNotes().size() == 2); REQUIRE(clf.getNotes().size() == 2);
REQUIRE(clf.getNotes()[0] == "Used features in initialization: 5 of 9 with FCBF"); REQUIRE(clf.getNotes()[0] == "Used features in initialization: 4 of 9 with FCBF");
REQUIRE(clf.getNotes()[1] == "Number of models: 9"); REQUIRE(clf.getNotes()[1] == "Number of models: 9");
} }
TEST_CASE("Test used features in train note and score", "[BoostAODE]") TEST_CASE("Test used features in train note and score", "[BoostAODE]")
@@ -57,7 +58,7 @@ TEST_CASE("Test used features in train note and score", "[BoostAODE]")
{"convergence", true}, {"convergence", true},
{"select_features","CFS"}, {"select_features","CFS"},
}); });
clf.fit(raw.Xv, raw.yv, raw.featuresv, raw.classNamev, raw.statesv); clf.fit(raw.Xv, raw.yv, raw.features, raw.className, raw.states);
REQUIRE(clf.getNumberOfNodes() == 72); REQUIRE(clf.getNumberOfNodes() == 72);
REQUIRE(clf.getNumberOfEdges() == 120); REQUIRE(clf.getNumberOfEdges() == 120);
REQUIRE(clf.getNotes().size() == 2); REQUIRE(clf.getNotes().size() == 2);
@@ -65,14 +66,14 @@ TEST_CASE("Test used features in train note and score", "[BoostAODE]")
REQUIRE(clf.getNotes()[1] == "Number of models: 8"); REQUIRE(clf.getNotes()[1] == "Number of models: 8");
auto score = clf.score(raw.Xv, raw.yv); auto score = clf.score(raw.Xv, raw.yv);
auto scoret = clf.score(raw.Xt, raw.yt); auto scoret = clf.score(raw.Xt, raw.yt);
REQUIRE(score == Catch::Approx(0.80078).epsilon(raw.epsilon)); REQUIRE(score == Catch::Approx(0.809895813).epsilon(raw.epsilon));
REQUIRE(scoret == Catch::Approx(0.80078).epsilon(raw.epsilon)); REQUIRE(scoret == Catch::Approx(0.809895813).epsilon(raw.epsilon));
} }
TEST_CASE("Voting vs proba", "[BoostAODE]") TEST_CASE("Voting vs proba", "[BoostAODE]")
{ {
auto raw = RawDatasets("iris", true); auto raw = RawDatasets("iris", true);
auto clf = bayesnet::BoostAODE(false); auto clf = bayesnet::BoostAODE(false);
clf.fit(raw.Xv, raw.yv, raw.featuresv, raw.classNamev, raw.statesv); clf.fit(raw.Xv, raw.yv, raw.features, raw.className, raw.states);
auto score_proba = clf.score(raw.Xv, raw.yv); auto score_proba = clf.score(raw.Xv, raw.yv);
auto pred_proba = clf.predict_proba(raw.Xv); auto pred_proba = clf.predict_proba(raw.Xv);
clf.setHyperparameters({ clf.setHyperparameters({
@@ -101,10 +102,10 @@ TEST_CASE("Order asc, desc & random", "[BoostAODE]")
{"maxTolerance", 1}, {"maxTolerance", 1},
{"convergence", false}, {"convergence", false},
}); });
clf.fit(raw.Xv, raw.yv, raw.featuresv, raw.classNamev, raw.statesv); clf.fit(raw.Xv, raw.yv, raw.features, raw.className, raw.states);
auto score = clf.score(raw.Xv, raw.yv); auto score = clf.score(raw.Xv, raw.yv);
auto scoret = clf.score(raw.Xt, raw.yt); auto scoret = clf.score(raw.Xt, raw.yt);
INFO("BoostAODE order: " + order); INFO("BoostAODE order: " << order);
REQUIRE(score == Catch::Approx(scores[order]).epsilon(raw.epsilon)); REQUIRE(score == Catch::Approx(scores[order]).epsilon(raw.epsilon));
REQUIRE(scoret == Catch::Approx(scores[order]).epsilon(raw.epsilon)); REQUIRE(scoret == Catch::Approx(scores[order]).epsilon(raw.epsilon));
} }
@@ -120,7 +121,7 @@ TEST_CASE("Oddities", "[BoostAODE]")
{ { "maxTolerance", 5 } }, { { "maxTolerance", 5 } },
}; };
for (const auto& hyper : bad_hyper.items()) { for (const auto& hyper : bad_hyper.items()) {
INFO("BoostAODE hyper: " + hyper.value().dump()); INFO("BoostAODE hyper: " << hyper.value().dump());
REQUIRE_THROWS_AS(clf.setHyperparameters(hyper.value()), std::invalid_argument); REQUIRE_THROWS_AS(clf.setHyperparameters(hyper.value()), std::invalid_argument);
} }
REQUIRE_THROWS_AS(clf.setHyperparameters({ {"maxTolerance", 0 } }), std::invalid_argument); REQUIRE_THROWS_AS(clf.setHyperparameters({ {"maxTolerance", 0 } }), std::invalid_argument);
@@ -131,54 +132,82 @@ TEST_CASE("Oddities", "[BoostAODE]")
{ { "select_features","FCBF" }, { "threshold", 1.01 } }, { { "select_features","FCBF" }, { "threshold", 1.01 } },
}; };
for (const auto& hyper : bad_hyper_fit.items()) { for (const auto& hyper : bad_hyper_fit.items()) {
INFO("BoostAODE hyper: " + hyper.value().dump()); INFO("BoostAODE hyper: " << hyper.value().dump());
clf.setHyperparameters(hyper.value()); clf.setHyperparameters(hyper.value());
REQUIRE_THROWS_AS(clf.fit(raw.Xv, raw.yv, raw.featuresv, raw.classNamev, raw.statesv), std::invalid_argument); REQUIRE_THROWS_AS(clf.fit(raw.Xv, raw.yv, raw.features, raw.className, raw.states), std::invalid_argument);
} }
} }
TEST_CASE("Bisection", "[BoostAODE]") TEST_CASE("Bisection Best", "[BoostAODE]")
{ {
auto clf = bayesnet::BoostAODE(); auto clf = bayesnet::BoostAODE();
auto raw = RawDatasets("mfeat-factors", true); auto raw = RawDatasets("kdd_JapaneseVowels", true, 1200, true, false);
clf.setHyperparameters({ clf.setHyperparameters({
{"bisection", true}, {"bisection", true},
{"maxTolerance", 3}, {"maxTolerance", 3},
{"convergence", true}, {"convergence", true},
{"block_update", false}, {"block_update", false},
{"convergence_best", false},
}); });
clf.fit(raw.Xv, raw.yv, raw.featuresv, raw.classNamev, raw.statesv); clf.fit(raw.X_train, raw.y_train, raw.features, raw.className, raw.states);
REQUIRE(clf.getNumberOfNodes() == 217); REQUIRE(clf.getNumberOfNodes() == 210);
REQUIRE(clf.getNumberOfEdges() == 431); REQUIRE(clf.getNumberOfEdges() == 378);
REQUIRE(clf.getNotes().size() == 3); REQUIRE(clf.getNotes().size() == 1);
REQUIRE(clf.getNotes()[0] == "Convergence threshold reached & 15 models eliminated"); REQUIRE(clf.getNotes().at(0) == "Number of models: 14");
REQUIRE(clf.getNotes()[1] == "Used features in train: 16 of 216"); auto score = clf.score(raw.X_test, raw.y_test);
REQUIRE(clf.getNotes()[2] == "Number of models: 1"); auto scoret = clf.score(raw.X_test, raw.y_test);
auto score = clf.score(raw.Xv, raw.yv); REQUIRE(score == Catch::Approx(0.991666675f).epsilon(raw.epsilon));
auto scoret = clf.score(raw.Xt, raw.yt); REQUIRE(scoret == Catch::Approx(0.991666675f).epsilon(raw.epsilon));
REQUIRE(score == Catch::Approx(1.0f).epsilon(raw.epsilon)); }
REQUIRE(scoret == Catch::Approx(1.0f).epsilon(raw.epsilon)); TEST_CASE("Bisection Best vs Last", "[BoostAODE]")
{
auto raw = RawDatasets("kdd_JapaneseVowels", true, 1500, true, false);
auto clf = bayesnet::BoostAODE(true);
auto hyperparameters = nlohmann::json{
{"bisection", true},
{"maxTolerance", 3},
{"convergence", true},
{"convergence_best", true},
};
clf.setHyperparameters(hyperparameters);
clf.fit(raw.X_train, raw.y_train, raw.features, raw.className, raw.states);
auto score_best = clf.score(raw.X_test, raw.y_test);
REQUIRE(score_best == Catch::Approx(0.980000019f).epsilon(raw.epsilon));
// Now we will set the hyperparameter to use the last accuracy
hyperparameters["convergence_best"] = false;
clf.setHyperparameters(hyperparameters);
clf.fit(raw.X_train, raw.y_train, raw.features, raw.className, raw.states);
auto score_last = clf.score(raw.X_test, raw.y_test);
REQUIRE(score_last == Catch::Approx(0.976666689f).epsilon(raw.epsilon));
} }
TEST_CASE("Block Update", "[BoostAODE]") TEST_CASE("Block Update", "[BoostAODE]")
{ {
auto clf = bayesnet::BoostAODE(); auto clf = bayesnet::BoostAODE();
auto raw = RawDatasets("mfeat-factors", true); auto raw = RawDatasets("mfeat-factors", true, 500);
clf.setHyperparameters({ clf.setHyperparameters({
{"bisection", true}, {"bisection", true},
{"block_update", true}, {"block_update", true},
{"maxTolerance", 3}, {"maxTolerance", 3},
{"convergence", true}, {"convergence", true},
}); });
clf.fit(raw.Xv, raw.yv, raw.featuresv, raw.classNamev, raw.statesv); clf.fit(raw.X_train, raw.y_train, raw.features, raw.className, raw.states);
REQUIRE(clf.getNumberOfNodes() == 217); REQUIRE(clf.getNumberOfNodes() == 868);
REQUIRE(clf.getNumberOfEdges() == 431); REQUIRE(clf.getNumberOfEdges() == 1724);
REQUIRE(clf.getNotes().size() == 3); REQUIRE(clf.getNotes().size() == 3);
REQUIRE(clf.getNotes()[0] == "Convergence threshold reached & 15 models eliminated"); REQUIRE(clf.getNotes()[0] == "Convergence threshold reached & 15 models eliminated");
REQUIRE(clf.getNotes()[1] == "Used features in train: 16 of 216"); REQUIRE(clf.getNotes()[1] == "Used features in train: 19 of 216");
REQUIRE(clf.getNotes()[2] == "Number of models: 1"); REQUIRE(clf.getNotes()[2] == "Number of models: 4");
auto score = clf.score(raw.Xv, raw.yv); auto score = clf.score(raw.X_test, raw.y_test);
auto scoret = clf.score(raw.Xt, raw.yt); auto scoret = clf.score(raw.X_test, raw.y_test);
REQUIRE(score == Catch::Approx(1.0f).epsilon(raw.epsilon)); REQUIRE(score == Catch::Approx(0.99f).epsilon(raw.epsilon));
REQUIRE(scoret == Catch::Approx(1.0f).epsilon(raw.epsilon)); REQUIRE(scoret == Catch::Approx(0.99f).epsilon(raw.epsilon));
//
// std::cout << "Number of nodes " << clf.getNumberOfNodes() << std::endl;
// std::cout << "Number of edges " << clf.getNumberOfEdges() << std::endl;
// std::cout << "Notes size " << clf.getNotes().size() << std::endl;
// for (auto note : clf.getNotes()) {
// std::cout << note << std::endl;
// }
// std::cout << "Score " << score << std::endl;
} }


@@ -14,14 +14,15 @@
#include "bayesnet/feature_selection/IWSS.h" #include "bayesnet/feature_selection/IWSS.h"
#include "TestUtils.h" #include "TestUtils.h"
bayesnet::FeatureSelect* build_selector(RawDatasets& raw, std::string selector, double threshold) bayesnet::FeatureSelect* build_selector(RawDatasets& raw, std::string selector, double threshold, int max_features = 0)
{ {
max_features = max_features == 0 ? raw.features.size() : max_features;
if (selector == "CFS") { if (selector == "CFS") {
return new bayesnet::CFS(raw.dataset, raw.featuresv, raw.classNamev, raw.featuresv.size(), raw.classNumStates, raw.weights); return new bayesnet::CFS(raw.dataset, raw.features, raw.className, max_features, raw.classNumStates, raw.weights);
} else if (selector == "FCBF") { } else if (selector == "FCBF") {
return new bayesnet::FCBF(raw.dataset, raw.featuresv, raw.classNamev, raw.featuresv.size(), raw.classNumStates, raw.weights, threshold); return new bayesnet::FCBF(raw.dataset, raw.features, raw.className, max_features, raw.classNumStates, raw.weights, threshold);
} else if (selector == "IWSS") { } else if (selector == "IWSS") {
return new bayesnet::IWSS(raw.dataset, raw.featuresv, raw.classNamev, raw.featuresv.size(), raw.classNumStates, raw.weights, threshold); return new bayesnet::IWSS(raw.dataset, raw.features, raw.className, max_features, raw.classNumStates, raw.weights, threshold);
} }
return nullptr; return nullptr;
} }
@@ -80,10 +81,35 @@ TEST_CASE("Oddities", "[FeatureSelection]")
{ {
auto raw = RawDatasets("iris", true); auto raw = RawDatasets("iris", true);
// FCBF Limits // FCBF Limits
REQUIRE_THROWS_AS(bayesnet::FCBF(raw.dataset, raw.featuresv, raw.classNamev, raw.featuresv.size(), raw.classNumStates, raw.weights, 1e-8), std::invalid_argument); REQUIRE_THROWS_AS(bayesnet::FCBF(raw.dataset, raw.features, raw.className, raw.features.size(), raw.classNumStates, raw.weights, 1e-8), std::invalid_argument);
REQUIRE_THROWS_WITH(bayesnet::FCBF(raw.dataset, raw.featuresv, raw.classNamev, raw.featuresv.size(), raw.classNumStates, raw.weights, 1e-8), "Threshold cannot be less than 1e-7"); REQUIRE_THROWS_WITH(bayesnet::FCBF(raw.dataset, raw.features, raw.className, raw.features.size(), raw.classNumStates, raw.weights, 1e-8), "Threshold cannot be less than 1e-7");
REQUIRE_THROWS_AS(bayesnet::IWSS(raw.dataset, raw.featuresv, raw.classNamev, raw.featuresv.size(), raw.classNumStates, raw.weights, -1e4), std::invalid_argument); REQUIRE_THROWS_AS(bayesnet::IWSS(raw.dataset, raw.features, raw.className, raw.features.size(), raw.classNumStates, raw.weights, -1e4), std::invalid_argument);
REQUIRE_THROWS_WITH(bayesnet::IWSS(raw.dataset, raw.featuresv, raw.classNamev, raw.featuresv.size(), raw.classNumStates, raw.weights, -1e4), "Threshold has to be in [0, 0.5]"); REQUIRE_THROWS_WITH(bayesnet::IWSS(raw.dataset, raw.features, raw.className, raw.features.size(), raw.classNumStates, raw.weights, -1e4), "Threshold has to be in [0, 0.5]");
REQUIRE_THROWS_AS(bayesnet::IWSS(raw.dataset, raw.featuresv, raw.classNamev, raw.featuresv.size(), raw.classNumStates, raw.weights, 0.501), std::invalid_argument); REQUIRE_THROWS_AS(bayesnet::IWSS(raw.dataset, raw.features, raw.className, raw.features.size(), raw.classNumStates, raw.weights, 0.501), std::invalid_argument);
REQUIRE_THROWS_WITH(bayesnet::IWSS(raw.dataset, raw.featuresv, raw.classNamev, raw.featuresv.size(), raw.classNumStates, raw.weights, 0.501), "Threshold has to be in [0, 0.5]"); REQUIRE_THROWS_WITH(bayesnet::IWSS(raw.dataset, raw.features, raw.className, raw.features.size(), raw.classNumStates, raw.weights, 0.501), "Threshold has to be in [0, 0.5]");
// Not fitted error
auto selector = build_selector(raw, "CFS", 0);
const std::string message = "FeatureSelect not fitted";
REQUIRE_THROWS_AS(selector->getFeatures(), std::runtime_error);
REQUIRE_THROWS_AS(selector->getScores(), std::runtime_error);
REQUIRE_THROWS_WITH(selector->getFeatures(), message);
REQUIRE_THROWS_WITH(selector->getScores(), message);
delete selector;
}
TEST_CASE("Test threshold limits", "[FeatureSelection]")
{
auto raw = RawDatasets("diabetes", true);
// FCBF Limits
auto selector = build_selector(raw, "FCBF", 0.051);
selector->fit();
REQUIRE(selector->getFeatures().size() == 2);
delete selector;
selector = build_selector(raw, "FCBF", 1e-7, 3);
selector->fit();
REQUIRE(selector->getFeatures().size() == 3);
delete selector;
selector = build_selector(raw, "IWSS", 0.5, 5);
selector->fit();
REQUIRE(selector->getFeatures().size() == 5);
delete selector;
} }


@@ -0,0 +1,43 @@
// ***************************************************************
// SPDX-FileCopyrightText: Copyright 2024 Ricardo Montañana Gómez
// SPDX-FileType: SOURCE
// SPDX-License-Identifier: MIT
// ***************************************************************
#include <catch2/catch_test_macros.hpp>
#include <catch2/matchers/catch_matchers.hpp>
#include <string>
#include <CPPFImdlp.h>
#include <folding.hpp>
#include <nlohmann/json.hpp>
#define TO_STR2(x) #x
#define TO_STR(x) TO_STR2(x)
#define JSON_VERSION (TO_STR(NLOHMANN_JSON_VERSION_MAJOR) "." TO_STR(NLOHMANN_JSON_VERSION_MINOR))
#include "TestUtils.h"
std::map<std::string, std::string> modules = {
{ "mdlp", "1.1.2" },
{ "Folding", "1.1.0" },
{ "json", "3.11" },
{ "ArffFiles", "1.0.0" }
};
TEST_CASE("MDLP", "[Modules]")
{
auto fimdlp = mdlp::CPPFImdlp();
REQUIRE(fimdlp.version() == modules["mdlp"]);
}
TEST_CASE("Folding", "[Modules]")
{
auto folding = folding::KFold(5, 200);
REQUIRE(folding.version() == modules["Folding"]);
}
TEST_CASE("NLOHMANN_JSON", "[Modules]")
{
REQUIRE(JSON_VERSION == modules["json"]);
}
TEST_CASE("ArffFiles", "[Modules]")
{
auto handler = ArffFiles();
REQUIRE(handler.version() == modules["ArffFiles"]);
}
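The two-level TO_STR macro used above is the standard preprocessor idiom for stringifying a macro's value rather than its name; a minimal illustration (the version number is only an example):
#define TO_STR2(x) #x
#define TO_STR(x) TO_STR2(x)
#define EXAMPLE_MAJOR 3
// TO_STR2(EXAMPLE_MAJOR) -> "EXAMPLE_MAJOR" (the argument is stringified before expansion)
// TO_STR(EXAMPLE_MAJOR)  -> "3"             (the argument is expanded first, then stringified)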


@@ -4,6 +4,7 @@
// SPDX-License-Identifier: MIT // SPDX-License-Identifier: MIT
// *************************************************************** // ***************************************************************
#include <random>
#include "TestUtils.h" #include "TestUtils.h"
#include "bayesnet/config.h" #include "bayesnet/config.h"
@@ -15,97 +16,110 @@ public:
} }
}; };
pair<std::vector<mdlp::labels_t>, map<std::string, int>> discretize(std::vector<mdlp::samples_t>& X, mdlp::labels_t& y, std::vector<std::string> features) class ShuffleArffFiles : public ArffFiles {
public:
ShuffleArffFiles(int num_samples = 0, bool shuffle = false) : ArffFiles(), num_samples(num_samples), shuffle(shuffle) {}
void load(const std::string& file_name, bool class_last = true)
{ {
std::vector<mdlp::labels_t> Xd; ArffFiles::load(file_name, class_last);
if (num_samples > 0) {
if (num_samples > getY().size()) {
throw std::invalid_argument("num_lines must be less than the number of lines in the file");
}
auto indices = std::vector<int>(num_samples);
std::iota(indices.begin(), indices.end(), 0);
if (shuffle) {
std::mt19937 g{ 173 };
std::shuffle(indices.begin(), indices.end(), g);
}
auto XX = std::vector<std::vector<float>>(attributes.size(), std::vector<float>(num_samples));
auto yy = std::vector<int>(num_samples);
for (int i = 0; i < num_samples; i++) {
yy[i] = getY()[indices[i]];
for (int j = 0; j < attributes.size(); j++) {
XX[j][i] = X[j][indices[i]];
}
}
X = XX;
y = yy;
}
}
private:
int num_samples;
bool shuffle;
};
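ShuffleArffFiles above subsamples and reorders the loaded dataset with a fixed seed, so truncated test datasets are reproducible across runs; a self-contained sketch of just the index permutation it applies:
#include <algorithm>
#include <numeric>
#include <random>
#include <vector>

std::vector<int> shuffledIndices(int num_samples, bool shuffle) {
    std::vector<int> indices(num_samples);
    std::iota(indices.begin(), indices.end(), 0); // 0, 1, ..., num_samples - 1
    if (shuffle) {
        std::mt19937 g{ 173 }; // same fixed seed as the loader above => deterministic order
        std::shuffle(indices.begin(), indices.end(), g);
    }
    return indices;
}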
RawDatasets::RawDatasets(const std::string& file_name, bool discretize_, int num_samples_, bool shuffle_, bool class_last, bool debug)
{
num_samples = num_samples_;
shuffle = shuffle_;
discretize = discretize_;
// Xt can be either discretized or not
// Xv is always discretized
loadDataset(file_name, class_last);
auto yresized = torch::transpose(yt.view({ yt.size(0), 1 }), 0, 1);
dataset = torch::cat({ Xt, yresized }, 0);
nSamples = dataset.size(1);
weights = torch::full({ nSamples }, 1.0 / nSamples, torch::kDouble);
weightsv = std::vector<double>(nSamples, 1.0 / nSamples);
classNumStates = discretize ? states.at(className).size() : 0;
auto fold = folding::StratifiedKFold(5, yt, 271);
auto [train, test] = fold.getFold(0);
auto train_t = torch::tensor(train);
auto test_t = torch::tensor(test);
// Get train and validation sets
X_train = dataset.index({ torch::indexing::Slice(0, dataset.size(0) - 1), train_t });
y_train = dataset.index({ -1, train_t });
X_test = dataset.index({ torch::indexing::Slice(0, dataset.size(0) - 1), test_t });
y_test = dataset.index({ -1, test_t });
if (debug)
std::cout << to_string();
}
map<std::string, int> RawDatasets::discretizeDataset(std::vector<mdlp::samples_t>& X)
{
map<std::string, int> maxes; map<std::string, int> maxes;
auto fimdlp = mdlp::CPPFImdlp(); auto fimdlp = mdlp::CPPFImdlp();
for (int i = 0; i < X.size(); i++) { for (int i = 0; i < X.size(); i++) {
fimdlp.fit(X[i], y); fimdlp.fit(X[i], yv);
mdlp::labels_t& xd = fimdlp.transform(X[i]); mdlp::labels_t& xd = fimdlp.transform(X[i]);
maxes[features[i]] = *max_element(xd.begin(), xd.end()) + 1; maxes[features[i]] = *max_element(xd.begin(), xd.end()) + 1;
Xd.push_back(xd); Xv.push_back(xd);
} }
return { Xd, maxes }; return maxes;
} }
std::vector<mdlp::labels_t> discretizeDataset(std::vector<mdlp::samples_t>& X, mdlp::labels_t& y) void RawDatasets::loadDataset(const std::string& name, bool class_last)
{ {
std::vector<mdlp::labels_t> Xd; auto handler = ShuffleArffFiles(num_samples, shuffle);
auto fimdlp = mdlp::CPPFImdlp();
for (int i = 0; i < X.size(); i++) {
fimdlp.fit(X[i], y);
mdlp::labels_t& xd = fimdlp.transform(X[i]);
Xd.push_back(xd);
}
return Xd;
}
bool file_exists(const std::string& name)
{
if (FILE* file = fopen(name.c_str(), "r")) {
fclose(file);
return true;
} else {
return false;
}
}
tuple<torch::Tensor, torch::Tensor, std::vector<std::string>, std::string, map<std::string, std::vector<int>>> loadDataset(const std::string& name, bool class_last, bool discretize_dataset)
{
auto handler = ArffFiles();
handler.load(Paths::datasets() + static_cast<std::string>(name) + ".arff", class_last); handler.load(Paths::datasets() + static_cast<std::string>(name) + ".arff", class_last);
// Get Dataset X, y // Get Dataset X, y
std::vector<mdlp::samples_t>& X = handler.getX(); std::vector<mdlp::samples_t>& X = handler.getX();
mdlp::labels_t& y = handler.getY(); yv = handler.getY();
// Get className & Features // Get className & Features
auto className = handler.getClassName(); className = handler.getClassName();
std::vector<std::string> features;
auto attributes = handler.getAttributes();
transform(attributes.begin(), attributes.end(), back_inserter(features), [](const auto& pair) { return pair.first; });
torch::Tensor Xd;
auto states = map<std::string, std::vector<int>>();
if (discretize_dataset) {
auto Xr = discretizeDataset(X, y);
Xd = torch::zeros({ static_cast<int>(Xr.size()), static_cast<int>(Xr[0].size()) }, torch::kInt32);
for (int i = 0; i < features.size(); ++i) {
states[features[i]] = std::vector<int>(*max_element(Xr[i].begin(), Xr[i].end()) + 1);
auto item = states.at(features[i]);
iota(begin(item), end(item), 0);
Xd.index_put_({ i, "..." }, torch::tensor(Xr[i], torch::kInt32));
}
states[className] = std::vector<int>(*max_element(y.begin(), y.end()) + 1);
iota(begin(states.at(className)), end(states.at(className)), 0);
} else {
Xd = torch::zeros({ static_cast<int>(X.size()), static_cast<int>(X[0].size()) }, torch::kFloat32);
for (int i = 0; i < features.size(); ++i) {
Xd.index_put_({ i, "..." }, torch::tensor(X[i]));
}
}
return { Xd, torch::tensor(y, torch::kInt32), features, className, states };
}
tuple<std::vector<std::vector<int>>, std::vector<int>, std::vector<std::string>, std::string, map<std::string, std::vector<int>>> loadFile(const std::string& name)
{
auto handler = ArffFiles();
handler.load(Paths::datasets() + static_cast<std::string>(name) + ".arff");
// Get Dataset X, y
std::vector<mdlp::samples_t>& X = handler.getX();
mdlp::labels_t& y = handler.getY();
// Get className & Features
auto className = handler.getClassName();
std::vector<std::string> features;
auto attributes = handler.getAttributes(); auto attributes = handler.getAttributes();
transform(attributes.begin(), attributes.end(), back_inserter(features), [](const auto& pair) { return pair.first; }); transform(attributes.begin(), attributes.end(), back_inserter(features), [](const auto& pair) { return pair.first; });
// Discretize Dataset // Discretize Dataset
std::vector<mdlp::labels_t> Xd; auto maxValues = discretizeDataset(X);
map<std::string, int> maxes; maxValues[className] = *max_element(yv.begin(), yv.end()) + 1;
tie(Xd, maxes) = discretize(X, y, features); if (discretize) {
maxes[className] = *max_element(y.begin(), y.end()) + 1; // discretize the tensor as well
map<std::string, std::vector<int>> states; Xt = torch::zeros({ static_cast<int>(Xv.size()), static_cast<int>(Xv[0].size()) }, torch::kInt32);
for (auto feature : features) { for (int i = 0; i < features.size(); ++i) {
states[feature] = std::vector<int>(maxes[feature]); states[features[i]] = std::vector<int>(maxValues[features[i]]);
iota(begin(states.at(features[i])), end(states.at(features[i])), 0);
Xt.index_put_({ i, "..." }, torch::tensor(Xv[i], torch::kInt32));
} }
states[className] = std::vector<int>(maxes[className]); states[className] = std::vector<int>(maxValues[className]);
return { Xd, y, features, className, states }; iota(begin(states.at(className)), end(states.at(className)), 0);
} else {
Xt = torch::zeros({ static_cast<int>(X.size()), static_cast<int>(X[0].size()) }, torch::kFloat32);
for (int i = 0; i < features.size(); ++i) {
Xt.index_put_({ i, "..." }, torch::tensor(X[i]));
} }
}
yt = torch::tensor(yv, torch::kInt32);
}


@@ -11,39 +11,60 @@
#include <vector> #include <vector>
#include <map> #include <map>
#include <tuple> #include <tuple>
#include <ArffFiles.h> #include <ArffFiles.hpp>
#include <CPPFImdlp.h> #include <CPPFImdlp.h>
#include <folding.hpp>
bool file_exists(const std::string& name);
std::pair<vector<mdlp::labels_t>, map<std::string, int>> discretize(std::vector<mdlp::samples_t>& X, mdlp::labels_t& y, std::vector<string> features);
std::vector<mdlp::labels_t> discretizeDataset(std::vector<mdlp::samples_t>& X, mdlp::labels_t& y);
std::tuple<vector<vector<int>>, std::vector<int>, std::vector<string>, std::string, map<std::string, std::vector<int>>> loadFile(const std::string& name);
std::tuple<torch::Tensor, torch::Tensor, std::vector<string>, std::string, map<std::string, std::vector<int>>> loadDataset(const std::string& name, bool class_last, bool discretize_dataset);
class RawDatasets { class RawDatasets {
public: public:
RawDatasets(const std::string& file_name, bool discretize) RawDatasets(const std::string& file_name, bool discretize_, int num_samples_ = 0, bool shuffle_ = false, bool class_last = true, bool debug = false);
{
// Xt can be either discretized or not
tie(Xt, yt, featurest, classNamet, statest) = loadDataset(file_name, true, discretize);
// Xv is always discretized
tie(Xv, yv, featuresv, classNamev, statesv) = loadFile(file_name);
auto yresized = torch::transpose(yt.view({ yt.size(0), 1 }), 0, 1);
dataset = torch::cat({ Xt, yresized }, 0);
nSamples = dataset.size(1);
weights = torch::full({ nSamples }, 1.0 / nSamples, torch::kDouble);
weightsv = std::vector<double>(nSamples, 1.0 / nSamples);
classNumStates = discretize ? statest.at(classNamet).size() : 0;
}
torch::Tensor Xt, yt, dataset, weights; torch::Tensor Xt, yt, dataset, weights;
torch::Tensor X_train, y_train, X_test, y_test;
std::vector<vector<int>> Xv; std::vector<vector<int>> Xv;
std::vector<double> weightsv;
std::vector<int> yv; std::vector<int> yv;
std::vector<string> featurest, featuresv; std::vector<double> weightsv;
map<std::string, std::vector<int>> statest, statesv; std::vector<string> features;
std::string classNamet, classNamev; std::string className;
map<std::string, std::vector<int>> states;
int nSamples, classNumStates; int nSamples, classNumStates;
double epsilon = 1e-5; double epsilon = 1e-5;
bool discretize;
int num_samples = 0;
bool shuffle = false;
private:
std::string to_string()
{
std::string features_ = "";
for (auto& f : features) {
features_ += f + " ";
}
std::string states_ = "";
for (auto& s : states) {
states_ += s.first + " ";
for (auto& v : s.second) {
states_ += std::to_string(v) + " ";
}
states_ += "\n";
}
return "Xt dimensions: " + std::to_string(Xt.size(0)) + " " + std::to_string(Xt.size(1)) + "\n"
"Xv dimensions: " + std::to_string(Xv.size()) + " " + std::to_string(Xv[0].size()) + "\n"
+ "yt dimensions: " + std::to_string(yt.size(0)) + "\n"
+ "yv dimensions: " + std::to_string(yv.size()) + "\n"
+ "X_train dimensions: " + std::to_string(X_train.size(0)) + " " + std::to_string(X_train.size(1)) + "\n"
+ "X_test dimensions: " + std::to_string(X_test.size(0)) + " " + std::to_string(X_test.size(1)) + "\n"
+ "y_train dimensions: " + std::to_string(y_train.size(0)) + "\n"
+ "y_test dimensions: " + std::to_string(y_test.size(0)) + "\n"
+ "features: " + std::to_string(features.size()) + "\n"
+ features_ + "\n"
+ "className: " + className + "\n"
+ "states: " + std::to_string(states.size()) + "\n"
+ "nSamples: " + std::to_string(nSamples) + "\n"
+ "classNumStates: " + std::to_string(classNumStates) + "\n"
+ "states: " + states_ + "\n";
}
map<std::string, int> discretizeDataset(std::vector<mdlp::samples_t>& X);
void loadDataset(const std::string& name, bool class_last);
}; };
#endif //TEST_UTILS_H #endif //TEST_UTILS_H

tests/Timer.h Normal file

@@ -0,0 +1,41 @@
#pragma once
#include <chrono>
#include <iomanip>
#include <sstream>
#include <string>
namespace platform {
class Timer {
private:
std::chrono::high_resolution_clock::time_point begin;
std::chrono::high_resolution_clock::time_point end;
public:
Timer() = default;
~Timer() = default;
void start() { begin = std::chrono::high_resolution_clock::now(); }
void stop() { end = std::chrono::high_resolution_clock::now(); }
double getDuration()
{
stop();
std::chrono::duration<double> time_span = std::chrono::duration_cast<std::chrono::duration<double >> (end - begin);
return time_span.count();
}
double getLapse()
{
std::chrono::duration<double> time_span = std::chrono::duration_cast<std::chrono::duration<double >> (std::chrono::high_resolution_clock::now() - begin);
return time_span.count();
}
std::string getDurationString(bool lapse = false)
{
double duration = lapse ? getLapse() : getDuration();
return translate2String(duration);
}
std::string translate2String(double duration)
{
double durationShow = duration > 3600 ? duration / 3600 : duration > 60 ? duration / 60 : duration;
std::string durationUnit = duration > 3600 ? "h" : duration > 60 ? "m" : "s";
std::stringstream ss;
ss << std::setprecision(2) << std::fixed << durationShow << " " << durationUnit;
return ss.str();
}
};
} /* namespace platform */
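A hypothetical usage sketch for the Timer class above (not part of the repository):
#include <iostream>
#include "Timer.h"

int main() {
    platform::Timer timer;
    timer.start();
    // ... work to be measured ...
    std::cout << timer.getDurationString(true) << std::endl;  // lapse: the timer keeps running
    std::cout << timer.getDurationString() << std::endl;      // duration: stops the timer
    return 0;
}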

File diff suppressed because it is too large

Submodule tests/lib/Files added at 40ac38011a

Submodule tests/lib/catch2 added at 4e8d92bf02


@@ -11,24 +11,27 @@ readme_file = "README.md"
print("Updating coverage...") print("Updating coverage...")
# Generate badge line # Generate badge line
output = subprocess.check_output( output = subprocess.check_output(
"lcov --summary " + sys.argv[1] + "/coverage.info|cut -d' ' -f4 |head -2|" "lcov --summary " + sys.argv[1] + "/coverage.info",
"tail -1",
shell=True, shell=True,
) )
value = float(output.decode("utf-8").strip().replace("%", "")) value = output.decode("utf-8").strip()
if value < 90: percentage = 0
for line in value.splitlines():
if "lines" in line:
percentage = float(line.split(":")[1].split("%")[0])
break
print(f"Coverage: {percentage}%")
if percentage < 90:
print("⛔Coverage is less than 90%. I won't update the badge.") print("⛔Coverage is less than 90%. I won't update the badge.")
sys.exit(1) sys.exit(1)
percentage = output.decode("utf-8").strip().replace(".", ",") percentage_label = str(percentage).replace('.', ',')
coverage_line = ( coverage_line = f"[![Coverage Badge](https://img.shields.io/badge/Coverage-{percentage_label}%25-green)](html/index.html)"
f"![Static Badge](https://img.shields.io/badge/Coverage-{percentage}25-green)"
)
# Update README.md # Update README.md
with open(readme_file, "r") as f: with open(readme_file, "r") as f:
lines = f.readlines() lines = f.readlines()
with open(readme_file, "w") as f: with open(readme_file, "w") as f:
for line in lines: for line in lines:
if "Coverage" in line: if "img.shields.io/badge/Coverage" in line:
f.write(coverage_line + "\n") f.write(coverage_line + "\n")
else: else:
f.write(line) f.write(line)
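For reference, the loop above scans lcov's summary for the "lines" row; the output it parses looks roughly like this (the exact format varies between lcov 1.x and 2.x, hence the line-by-line search):
Summary coverage rate:
  lines......: 99.1% (2830 of 2856 lines)
  functions..: 98.2% (548 of 558 functions)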