Compare commits

...

165 Commits

Author SHA1 Message Date
b90e558238
Hyperparameter *maxTolerance* in the BoostAODE class is now in [1, 6] range (it was in [1, 4] range before) 2025-01-23 00:56:18 +01:00
64970cf7f7 Merge pull request 'alphablock' (#32) from alphablock into main
Reviewed-on: #32
Added

- Add a new hyperparameter to the BoostAODE class, alphablock, to control the way α is computed, with the last model or with the ensmble built so far. Default value is false.
- Add a new hyperparameter to the SPODE class, parent, to set the root node of the model. If no value is set the root parameter of the constructor is used.
- Add a new hyperparameter to the TAN class, parent, to set the root node of the model. If not set the first feature is used as root.
2025-01-22 11:48:09 +00:00
b571a4da4d
Fix typo in CHANGELOG 2025-01-22 12:43:40 +01:00
8a9f329ff9
Remove typo in README 2024-12-18 14:29:12 +01:00
e2781ee525
Add parent hyperparameter to TAN & SPODE 2024-12-17 10:14:14 +01:00
56a2d3ead0
remove uneeded submodule 2024-12-14 20:27:07 +01:00
dc32a0fc47
Fix tests & update dependencies versions 2024-12-14 14:32:51 +01:00
3d6b4f0614
Implement the functionality of the hyperparameter alpha_block with test 2024-12-14 14:02:45 +01:00
18844c7da7
Add hyperparameter to ChangeLog and Boost class 2024-12-14 14:02:10 +01:00
43ceefd2c9
Fix comment in AODELd 2024-12-10 13:35:23 +01:00
e6501502d1
Update docs and help 2024-11-23 20:28:16 +01:00
d84adf6172
Add model to changelog 2024-11-23 19:13:54 +01:00
268a86cbe0
Actualiza Changelog 2024-11-23 19:11:00 +01:00
fc4c93b299
Fix Mst test 2024-11-23 19:07:35 +01:00
86f2bc44fc libmdlp (#31)
Add mdlp as library in lib/
Fix tests to reach 99.1% of coverage

Reviewed-on: #31
2024-11-23 17:22:41 +00:00
f0f3d9ad6e
Fix CUDA and mdlp library issues 2024-11-20 21:02:56 +01:00
9a323cd7a3
Remove mdlp submodule 2024-11-20 20:15:49 +01:00
cb949ac7e5
Update dependecies versions 2024-09-29 13:17:44 +02:00
2c297ea15d
Control optional doxygen dependency 2024-09-29 12:48:15 +02:00
4e4b6e67f4
Add env parallel variable to Makefile 2024-09-18 11:05:19 +02:00
82847774ee
Update Dockerfile 2024-09-13 09:42:06 +02:00
d0955d9369 Merge pull request 'smoothing' (#30) from smoothing into main
Reviewed-on: #30
2024-09-12 20:28:33 +00:00
2d34eb8c89
Update Makefile to get parallel info from env 2024-08-31 12:43:39 +02:00
0159c397fa
Update optimization flag in CMakeLists 2024-07-11 12:29:57 +02:00
0bbc8328a9
Change cpt table type to float 2024-07-08 13:27:55 +02:00
35ca862eca
Don't allow add node nor add edge on fitted networks 2024-07-07 21:06:59 +02:00
26eb58b104
Forbids to insert the same edge twice 2024-07-04 18:52:41 +02:00
6fcc15d39a
Upgrade mdlp library 2024-06-24 12:38:44 +02:00
9a14133be5
Add thread control to vectors predict 2024-06-23 13:02:40 +02:00
59c1cf5b3b
Fix number of threads spawned 2024-06-21 19:56:35 +02:00
8e9090d283
Fix tests 2024-06-21 13:58:42 +02:00
02bcab01be
Refactor CountingSemaphore as singleton 2024-06-21 09:30:24 +02:00
716748e18c
Add Counting Semaphore class
Fix threading in Network
2024-06-20 10:36:09 +02:00
0b31780d39
Add Thread max spawning to Network 2024-06-18 23:18:24 +02:00
fa26aa80f7
Rename OLD_LAPLACE to ORIGINAL 2024-06-13 15:04:15 +02:00
3eb61905fb
Upgrade ArffFiles Module version 2024-06-13 12:33:54 +02:00
ca0ae4dacf
Refactor Cestnik smoothin factor assuming m=1 2024-06-13 09:11:47 +02:00
b34869cc61
Set smoothing as fit parameter 2024-06-11 11:40:45 +02:00
27a3e5a5e0
Implement 3 types of smoothing 2024-06-10 15:49:01 +02:00
684443a788
Implement Cestnik & Laplace smoothing 2024-06-09 17:19:38 +02:00
6d9badc33b Merge pull request 'BoostA2DE' (#29) from BoostA2DE into main
Reviewed-on: #29
2024-06-09 10:02:47 +00:00
015b1b0c0f
Fix diagram size in manual 2024-05-28 11:43:39 +02:00
7bb8e4df01
Fix back to manual link 2024-05-23 18:59:08 +00:00
53710378de
Fix manual generation and deploy 2024-05-23 17:34:48 +00:00
c833e9ba32
Remove coverage report from html folder and integrate in doc 2024-05-23 16:27:02 +02:00
f5cb46ee29
Add doc-install to Makefile 2024-05-22 12:09:58 +02:00
fa35681abe
Add documentation link to readme 2024-05-22 11:39:33 +02:00
b0bd0e6eee
Create doc target to build documentation 2024-05-22 11:10:21 +02:00
d43be27821
Remove manual and doc pages 2024-05-22 10:17:49 +02:00
a2853dd2e5
Add Doxygen to generate man and manual pages 2024-05-21 23:38:10 +02:00
0341bd5648
Refactor ArffFiles library as a git submodule only for tests 2024-05-21 11:50:19 +00:00
22b742f068
Convert ArffFile library to header only library 2024-05-21 10:11:33 +02:00
2584e8294d
Force mutual information methods to be at least 0
There were cases where a tiny negative number was returned (less than -1e-7)
Fix mst glass test that is affected with this change
2024-05-17 11:15:45 +02:00
291ba0fb0e
First functional BoostA2DE with its 1st test 2024-05-16 16:33:33 +02:00
80043d5181
First approach to BoostA2DE::trainModel 2024-05-16 14:32:59 +02:00
677ec5613d
Add features used to selectKPairs 2024-05-16 14:18:45 +02:00
cccaa6e0af
Complete selectKPairs method & test 2024-05-16 13:46:38 +02:00
2e3e0e0fc2
Add selectKParis method 2024-05-16 11:17:21 +02:00
8784a24898
Extract buildModel method to parent class in Boost 2024-05-15 20:00:44 +02:00
54496c68f1
Create Boost class as Boost<x> classifiers parent 2024-05-15 19:49:15 +02:00
1f236a70db
Create BoostA2DE base class 2024-05-15 11:53:17 +02:00
ef3c74633c
Conditional Entropy test 2024-05-15 11:28:09 +02:00
7efd95095c Merge pull request 'AnDE' (#28) from AnDE into main
Reviewed-on: #28
2024-05-15 09:16:12 +00:00
0e24135d46
Complete Conditional Mutual Information and test 2024-05-15 11:09:23 +02:00
521bfd2a8e
Remove unoptimized implementation of conditionalEntropy 2024-05-15 01:24:27 +02:00
e2e0fb0c40
Implement Conditional Mutual Information 2024-05-15 00:48:02 +02:00
56b62a67cc
Change BoostAODE tests results because folding upgrade 2024-05-12 20:23:05 +02:00
c0fc107abb
Fix catch2 submodule config 2024-05-12 19:05:36 +02:00
d8c44b3b7c
Add tests to check the correct version of the mdlp, folding and json libraries 2024-05-12 12:22:44 +02:00
6ab7cd2cbd
Remove submodule catch from tests/lib 2024-05-12 11:05:53 +02:00
b578ea8a2d
Remove module lib/catch2 2024-05-12 11:04:42 +02:00
9a752d15dc
Change build cmake folder names to Debug & Release 2024-05-09 10:51:52 +02:00
4992685e94
Add devcontainer to repository
Fix update_coverage.py with lcov2.1 output
2024-05-08 06:42:19 +00:00
346b693c79
Update pdf coverage report 2024-05-06 18:28:15 +02:00
164c8bd90c
Update changelog 2024-05-06 18:02:18 +02:00
ced29a2c2e
Refactor coverage report generation
Add some tests to reach 99%
2024-05-06 17:56:00 +02:00
0ec53f405f
Fix mistakes in feature selection in SPnDE
Complete the first A2DE test
Update version number
2024-05-05 11:14:01 +02:00
f806015b29
Implement SPnDE and A2DE 2024-05-05 01:35:17 +02:00
8115f25c06
Fix mispell mistake in doc 2024-05-02 10:53:15 +02:00
618a1e539c
Return File Library to /lib as it is needed by Local Discretization (factorize) 2024-04-30 20:31:14 +02:00
7aeffba740
Add list of models to README 2024-04-30 18:59:38 +02:00
e79ea63afb Merge pull request 'convergence_best' (#27) from convergence_best into main
Add convergence_best as hyperparameter to allow to take the last or the best accuracy as the accuracy to compare to in convergence

Reviewed-on: #27
2024-04-30 16:22:08 +00:00
3c7382a93a
Enhance tests coverage and report output 2024-04-30 14:00:24 +02:00
b4a222b100
Update gcovr configuration 2024-04-30 12:06:32 +02:00
23ef0cc5f7
Remove catch2 as submodule
Add link to pdf coverage report
2024-04-30 11:02:23 +02:00
793b2d3cd5
Refactor TestUtils to allow partial and shuffle dataset load 2024-04-30 02:11:14 +02:00
ae469b8146
Add hyperparameter convergence_best
move test libraries to test folder
2024-04-30 00:52:09 +02:00
f014928411
Update Makefile actions for coverage 2024-04-21 18:54:13 +02:00
c4b563a339
Add link to the coverage report in the README.md coverage label 2024-04-21 16:44:35 +02:00
49bb0582e6
Add Library Logo 2024-04-21 11:31:27 +02:00
b4c5261e01 Delete .github/workflows/main.yml 2024-04-20 17:54:56 +00:00
b956aa3873
Upgrade version number to 1.0.5
Fix dependency graph
Remove loguru library
2024-04-20 18:00:40 +02:00
1f06631f69
Add check dependencies in make diagrams endpoint 2024-04-19 19:47:37 +02:00
6dd589bd61
Add diagram changes to CHANGELOG 2024-04-19 18:29:43 +02:00
6475f10825
Add class and dependency diagrams 2024-04-19 14:33:00 +02:00
7d906b24d1 Merge pull request 'block_update' (#26) from block_update into main
Reviewed-on: #26
2024-04-15 10:26:50 +00:00
464fe029ea
Add dump_cpt classifier test 2024-04-11 18:16:06 +02:00
09a1369122
Add copyright header to source files 2024-04-11 18:02:49 +02:00
503ad687dc
Add some more tests to 97% coverage 2024-04-11 17:29:46 +02:00
8eeaa1beee
Update changelog with the latest changes 2024-04-11 00:35:55 +02:00
a2de1c9522
Implement block update algorithm fix in BoostAODE 2024-04-11 00:02:43 +02:00
cf9b5716ac
block_update and install in local folder 2024-04-10 00:55:36 +02:00
1326891d6a
Fix previous tests of BoostAODE
Due to the change of default values for hyperparameters in BoostAODE
2024-04-09 00:13:45 +02:00
da2a969686
Create hyperparameter block_update 2024-04-08 23:36:05 +02:00
f9553a38d7
Fix BoostAODE.md doc 2024-04-08 22:45:32 +02:00
8b6121eaf2
Update readme and boostAODE docs 2024-04-08 22:41:23 +02:00
fbbed8ad68
Make some boostAODE tests 2024-04-08 22:30:55 +02:00
a1178554ff
Add Ensemble tests 2024-04-08 19:09:51 +02:00
d12a779bd9 Merge pull request 'bisection proposal' (#24) from bisection into main
Reviewed-on: #24
2024-04-08 14:29:25 +00:00
a8fc29e2b2
Create coverage badge 2024-04-08 11:24:25 +02:00
50543e7929
Add tests for Classifier class 2024-04-08 01:25:14 +02:00
9014649a0d
Refactor hyperparameters classifier management 2024-04-08 00:55:30 +02:00
0d6a081d01
Add tests to reach 90% coverage 2024-04-08 00:13:59 +02:00
46cb8d30eb
Add codacy code quality badge 2024-04-07 12:35:21 +02:00
cb26ef2562
Add some tests and code quality badge 2024-04-07 02:08:37 +02:00
df45fddd45
Update folding library and test result due to change in random engine 2024-04-05 19:17:53 +02:00
a1f9086780
Fix CFS mistake 2024-04-02 22:53:00 +02:00
e55365c41c
Update test Models 2024-04-02 17:56:23 +02:00
de23303801
Refactor tests and add FeatureSelection tests 2024-04-02 17:38:48 +02:00
56b5158ff3
Update BoostAODE class structure 2024-04-02 09:52:40 +02:00
a5a29eb66f
Update compiler configuration for Mac 2024-04-02 09:48:03 +02:00
d5eba5710a
Update pseudocode 2024-04-01 18:37:51 +02:00
8c61840d81
Update tests 2024-04-01 11:51:29 +02:00
bc0b938cfc
Remove dataset clone in BoostAODE 2024-03-21 19:35:08 +01:00
58d5a35a35
Update log output size type 2024-03-21 19:24:51 +01:00
45c048f635
Add initial models to log 2024-03-21 11:23:41 +01:00
6e854dfda3
Fix metrics error in BoostAODE Convergence
Update algorithm
2024-03-20 23:33:02 +01:00
5826702fc7
Remove weights backup 2024-03-20 12:01:57 +01:00
42e2be3263
Implement algorithm and add logging 2024-03-20 11:30:02 +01:00
827b0dd893
Add optimization flags to release 2024-03-19 17:24:21 +01:00
882d905a28
First approach to bisection 2024-03-19 14:13:40 +01:00
422129802a
Remove predict_single max_models 2024-03-19 11:35:43 +01:00
eb97a5a14b
Remove repeatSparent hyperparameter 2024-03-19 09:42:03 +01:00
eb72f13bf0
Make predict_voting default value false in AODE 2024-03-12 00:27:50 +01:00
5db168d87b
Make predict_voting default value false in BoostAODE 2024-03-12 00:26:28 +01:00
8f3bb47cfd
Fix Initialize worse_model_count if model accuracy is better in BoostAODE 2024-03-11 22:33:50 +01:00
1986d05c34
Initialize worse_model_count if model accuracy is better in BoostAODE 2024-03-11 21:30:01 +01:00
7c98ba9bea
Update License & Readme 2024-03-11 10:57:27 +01:00
538af0253b
Fix test & sample issue 2024-03-09 12:45:03 +01:00
0b65e34772
Fix config.h location problem 2024-03-09 12:27:05 +01:00
635ef22520
Refactor library structure 2024-03-08 22:20:54 +01:00
1231f4522a Merge pull request 'Create installation process' (#23) from install_lib into main
Reviewed-on: #23
2024-03-08 11:51:59 +00:00
cc34f79b91
Update changelog and readme 2024-03-08 09:02:22 +01:00
6899033806
Change include of library headers 2024-03-08 01:13:30 +01:00
8e2d05e663
Refactor sample to be out of main CMakeLists 2024-03-08 01:09:39 +01:00
eba2095718
Create installation process 2024-03-08 00:37:36 +01:00
199ffc95d2
Update dates on changelog 2024-03-06 23:42:14 +01:00
cbe15e317d
Fix FCBF in select_features 2024-03-06 18:24:27 +01:00
debd890519
Update version number in tests 2024-03-06 17:22:45 +01:00
46e929ff4d Merge pull request 'predict_single' (#22) from predict_single into main
Reviewed-on: #22

close #19
2024-03-06 16:16:15 +00:00
d858e26e4b
Update version number and Changelog 2024-03-06 17:04:16 +01:00
0ee3eaed53
Update select features models significance 2024-03-05 12:10:58 +01:00
093c197f0a
Replace constant strings in BoostAODE 2024-03-05 11:05:11 +01:00
78d7ea7c4d
Add predict_single proposal detailed info 2024-03-03 22:56:01 +01:00
d6af1ffe8e
Update gcovr config and fix some warnings 2024-02-28 11:51:37 +01:00
20669dd161
Translate BoostAODE.md to English 2024-02-27 20:29:01 +01:00
272dbad4f3
Update README and docs 2024-02-27 17:16:26 +01:00
8bccc3e4bc
Update boostaode algorithm explain 2024-02-27 14:24:58 +01:00
903b143338
Refactor library structure and add sample 2024-02-27 13:06:13 +01:00
f10d0daf2e
Update test 2024-02-27 10:16:20 +01:00
d39a17089e
Begin implementing predict_single hyperparameter in BoostAODE 2024-02-26 20:29:08 +01:00
2e325cd114 Merge pull request 'change boostaode ascending hyperparameter to order {asc,desc,rand}' (#21) from baode_random into main
Reviewed-on: #21

This PR closes #18
2024-02-26 16:28:48 +00:00
fc3d63b7db
change boostaode ascending hyperparameter to order {asc,desc,rand} 2024-02-26 17:07:57 +01:00
43dc79a345
Update version number in ChangeLog 2024-02-25 18:07:50 +01:00
b8589bcd0a Merge pull request 'Add the probabilities aggregation method to compute prediction with ensembles' (#16) from baode_proba into main
Reviewed-on: #16

As only the voting method was implemented, this approach computes the classifiers prediction using a weighted average of the probabilities computed by each model.
Added the predict_proba methods to BaseClassifier - Classifier and Ensemble classes.
Add a hyperparameter to decide the type of computation for ensembles voting - probability aggregation
2024-02-25 11:26:26 +00:00
167 changed files with 41594 additions and 1275 deletions

View File

@ -5,11 +5,12 @@ Checks: '-*,
cppcoreguidelines-*, cppcoreguidelines-*,
modernize-*, modernize-*,
performance-*, performance-*,
-modernize-use-nodiscard,
-cppcoreguidelines-pro-type-vararg, -cppcoreguidelines-pro-type-vararg,
-modernize-use-trailing-return-type, -modernize-use-trailing-return-type,
-bugprone-exception-escape' -bugprone-exception-escape'
HeaderFilterRegex: 'src/*' HeaderFilterRegex: 'bayesnet/*'
AnalyzeTemporaryDtors: false AnalyzeTemporaryDtors: false
WarningsAsErrors: '' WarningsAsErrors: ''
FormatStyle: file FormatStyle: file

39
.clang-uml Normal file
View File

@ -0,0 +1,39 @@
compilation_database_dir: build_Debug
output_directory: diagrams
diagrams:
BayesNet:
type: class
glob:
- bayesnet/*.h
- bayesnet/classifiers/*.h
- bayesnet/classifiers/*.cc
- bayesnet/ensembles/*.h
- bayesnet/ensembles/*.cc
- bayesnet/feature_selection/*.h
- bayesnet/feature_selection/*.cc
- bayesnet/network/*.h
- bayesnet/network/*.cc
- bayesnet/utils/*.h
- bayesnet/utils/*.cc
include:
# Only include entities from the following namespaces
namespaces:
- bayesnet
exclude:
access:
- private
plantuml:
style:
# Apply this style to all classes in the diagram
class: "#aliceblue;line:blue;line.dotted;text:blue"
# Apply this style to all packages in the diagram
package: "#back:grey"
# Make all template instantiation relations point upwards and draw them
# as green and dotted lines
instantiation: "up[#green,dotted]"
cmd: "/usr/bin/plantuml -tsvg \"diagrams/{}.puml\""
before:
- 'title clang-uml class diagram model'
mermaid:
before:
- 'classDiagram'

57
.devcontainer/Dockerfile Normal file
View File

@ -0,0 +1,57 @@
FROM mcr.microsoft.com/devcontainers/cpp:ubuntu22.04
ARG REINSTALL_CMAKE_VERSION_FROM_SOURCE="3.29.3"
# Optionally install the cmake for vcpkg
COPY ./reinstall-cmake.sh /tmp/
RUN if [ "${REINSTALL_CMAKE_VERSION_FROM_SOURCE}" != "none" ]; then \
chmod +x /tmp/reinstall-cmake.sh && /tmp/reinstall-cmake.sh ${REINSTALL_CMAKE_VERSION_FROM_SOURCE}; \
fi \
&& rm -f /tmp/reinstall-cmake.sh
# [Optional] Uncomment this section to install additional vcpkg ports.
# RUN su vscode -c "${VCPKG_ROOT}/vcpkg install <your-port-name-here>"
# [Optional] Uncomment this section to install additional packages.
RUN apt-get update && export DEBIAN_FRONTEND=noninteractive \
&& apt-get -y install --no-install-recommends wget software-properties-common libdatetime-perl libcapture-tiny-perl libdatetime-format-dateparse-perl libgd-perl
# Add PPA for GCC 13
RUN add-apt-repository ppa:ubuntu-toolchain-r/test
RUN apt-get update
# Install GCC 13.1
RUN apt-get install -y gcc-13 g++-13 doxygen
# Install lcov 2.1
RUN wget --quiet https://github.com/linux-test-project/lcov/releases/download/v2.1/lcov-2.1.tar.gz && \
tar -xvf lcov-2.1.tar.gz && \
cd lcov-2.1 && \
make install
RUN rm lcov-2.1.tar.gz
RUN rm -fr lcov-2.1
# Install Miniconda
RUN mkdir -p /opt/conda
RUN wget --quiet "https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-aarch64.sh" -O /opt/conda/miniconda.sh && \
bash /opt/conda/miniconda.sh -b -p /opt/miniconda
# Add conda to PATH
ENV PATH=/opt/miniconda/bin:$PATH
# add CXX and CC to the environment with gcc 13
ENV CXX=/usr/bin/g++-13
ENV CC=/usr/bin/gcc-13
# link the last gcov version
RUN rm /usr/bin/gcov
RUN ln -s /usr/bin/gcov-13 /usr/bin/gcov
# change ownership of /opt/miniconda to vscode user
RUN chown -R vscode:vscode /opt/miniconda
USER vscode
RUN conda init
RUN conda install -y -c conda-forge yaml pytorch

View File

@ -0,0 +1,37 @@
// For format details, see https://aka.ms/devcontainer.json. For config options, see the
// README at: https://github.com/devcontainers/templates/tree/main/src/cpp
{
"name": "C++",
"build": {
"dockerfile": "Dockerfile"
},
// "features": {
// "ghcr.io/devcontainers/features/conda:1": {}
// }
// Features to add to the dev container. More info: https://containers.dev/features.
// "features": {},
// Use 'forwardPorts' to make a list of ports inside the container available locally.
// "forwardPorts": [],
// Use 'postCreateCommand' to run commands after the container is created.
"postCreateCommand": "make release && make debug && echo 'Done!'",
// Configure tool-specific properties.
// "customizations": {},
"customizations": {
// Configure properties specific to VS Code.
"vscode": {
"settings": {},
"extensions": [
"ms-vscode.cpptools",
"ms-vscode.cpptools-extension-pack",
"ms-vscode.cpptools-themes",
"ms-vscode.cmake-tools",
"ms-azuretools.vscode-docker",
"jbenden.c-cpp-flylint",
"matepek.vscode-catch2-test-adapter",
"GitHub.copilot"
]
}
}
// Uncomment to connect as root instead. More info: https://aka.ms/dev-containers-non-root.
// "remoteUser": "root"
}

View File

@ -0,0 +1,59 @@
#!/usr/bin/env bash
#-------------------------------------------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License. See https://go.microsoft.com/fwlink/?linkid=2090316 for license information.
#-------------------------------------------------------------------------------------------------------------
#
set -e
CMAKE_VERSION=${1:-"none"}
if [ "${CMAKE_VERSION}" = "none" ]; then
echo "No CMake version specified, skipping CMake reinstallation"
exit 0
fi
# Cleanup temporary directory and associated files when exiting the script.
cleanup() {
EXIT_CODE=$?
set +e
if [[ -n "${TMP_DIR}" ]]; then
echo "Executing cleanup of tmp files"
rm -Rf "${TMP_DIR}"
fi
exit $EXIT_CODE
}
trap cleanup EXIT
echo "Installing CMake..."
apt-get -y purge --auto-remove cmake
mkdir -p /opt/cmake
architecture=$(dpkg --print-architecture)
case "${architecture}" in
arm64)
ARCH=aarch64 ;;
amd64)
ARCH=x86_64 ;;
*)
echo "Unsupported architecture ${architecture}."
exit 1
;;
esac
CMAKE_BINARY_NAME="cmake-${CMAKE_VERSION}-linux-${ARCH}.sh"
CMAKE_CHECKSUM_NAME="cmake-${CMAKE_VERSION}-SHA-256.txt"
TMP_DIR=$(mktemp -d -t cmake-XXXXXXXXXX)
echo "${TMP_DIR}"
cd "${TMP_DIR}"
curl -sSL "https://github.com/Kitware/CMake/releases/download/v${CMAKE_VERSION}/${CMAKE_BINARY_NAME}" -O
curl -sSL "https://github.com/Kitware/CMake/releases/download/v${CMAKE_VERSION}/${CMAKE_CHECKSUM_NAME}" -O
sha256sum -c --ignore-missing "${CMAKE_CHECKSUM_NAME}"
sh "${TMP_DIR}/${CMAKE_BINARY_NAME}" --prefix=/opt/cmake --skip-license
ln -s /opt/cmake/bin/cmake /usr/local/bin/cmake
ln -s /opt/cmake/bin/ctest /usr/local/bin/ctest

12
.github/dependabot.yml vendored Normal file
View File

@ -0,0 +1,12 @@
# To get started with Dependabot version updates, you'll need to specify which
# package ecosystems to update and where the package manifests are located.
# Please see the documentation for more information:
# https://docs.github.com/github/administering-a-repository/configuration-options-for-dependency-updates
# https://containers.dev/guide/dependabot
version: 2
updates:
- package-ecosystem: "devcontainers"
directory: "/"
schedule:
interval: weekly

7
.gitignore vendored
View File

@ -38,3 +38,10 @@ cmake-build*/**
.idea .idea
puml/** puml/**
.vscode/settings.json .vscode/settings.json
sample/build
**/.DS_Store
docs/manual
docs/man3
docs/man
docs/Doxyfile

21
.gitmodules vendored
View File

@ -1,13 +1,3 @@
[submodule "lib/mdlp"]
path = lib/mdlp
url = https://github.com/rmontanana/mdlp
main = main
update = merge
[submodule "lib/catch2"]
path = lib/catch2
main = v2.x
update = merge
url = https://github.com/catchorg/Catch2.git
[submodule "lib/json"] [submodule "lib/json"]
path = lib/json path = lib/json
url = https://github.com/nlohmann/json.git url = https://github.com/nlohmann/json.git
@ -18,3 +8,14 @@
url = https://github.com/rmontanana/folding url = https://github.com/rmontanana/folding
main = main main = main
update = merge update = merge
[submodule "tests/lib/catch2"]
path = tests/lib/catch2
url = https://github.com/catchorg/Catch2.git
main = main
update = merge
[submodule "tests/lib/Files"]
path = tests/lib/Files
url = https://github.com/rmontanana/ArffFiles
[submodule "lib/mdlp"]
path = lib/mdlp
url = https://github.com/rmontanana/mdlp

View File

@ -0,0 +1,4 @@
{
"sonarCloudOrganization": "rmontanana",
"projectKey": "rmontanana_BayesNet"
}

View File

@ -3,15 +3,47 @@
{ {
"name": "Mac", "name": "Mac",
"includePath": [ "includePath": [
"${workspaceFolder}/**" "/Users/rmontanana/Code/BayesNet/**"
], ],
"defines": [], "defines": [],
"macFrameworkPath": [ "macFrameworkPath": [
"/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/System/Library/Frameworks" "/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include"
], ],
"cStandard": "c17", "cStandard": "c17",
"cppStandard": "c++17", "cppStandard": "c++17",
"compileCommands": "${workspaceFolder}/cmake-build-release/compile_commands.json" "compileCommands": "",
"intelliSenseMode": "macos-clang-arm64",
"mergeConfigurations": false,
"browse": {
"path": [
"/Users/rmontanana/Code/BayesNet/**",
"${workspaceFolder}"
],
"limitSymbolsToIncludedHeaders": true
},
"configurationProvider": "ms-vscode.cmake-tools"
},
{
"name": "Linux",
"includePath": [
"/home/rmontanana/Code/BayesNet/**",
"/home/rmontanana/Code/libtorch/include/torch/csrc/api/include/",
"/home/rmontanana/Code/BayesNet/lib/"
],
"defines": [],
"cStandard": "c17",
"cppStandard": "c++17",
"intelliSenseMode": "linux-gcc-x64",
"mergeConfigurations": false,
"compilerPath": "/usr/bin/g++",
"browse": {
"path": [
"/home/rmontanana/Code/BayesNet/**",
"${workspaceFolder}"
],
"limitSymbolsToIncludedHeaders": true
},
"configurationProvider": "ms-vscode.cmake-tools"
} }
], ],
"version": 4 "version": 4

126
.vscode/launch.json vendored
View File

@ -5,126 +5,44 @@
"type": "lldb", "type": "lldb",
"request": "launch", "request": "launch",
"name": "sample", "name": "sample",
"program": "${workspaceFolder}/build_debug/sample/BayesNetSample", "program": "${workspaceFolder}/build_release/sample/bayesnet_sample",
"args": [ "args": [
"-d", "${workspaceFolder}/tests/data/glass.arff"
"iris", ]
"-m",
"TANLd",
"-s",
"271",
"-p",
"/Users/rmontanana/Code/discretizbench/datasets/",
],
//"cwd": "${workspaceFolder}/build/sample/",
},
{
"type": "lldb",
"request": "launch",
"name": "experimentPy",
"program": "${workspaceFolder}/build_debug/src/Platform/b_main",
"args": [
"-m",
"STree",
"--stratified",
"-d",
"iris",
//"--discretize"
// "--hyperparameters",
// "{\"repeatSparent\": true, \"maxModels\": 12}"
],
"cwd": "${workspaceFolder}/../discretizbench",
},
{
"type": "lldb",
"request": "launch",
"name": "gridsearch",
"program": "${workspaceFolder}/build_debug/src/Platform/b_grid",
"args": [
"-m",
"KDB",
"--discretize",
"--continue",
"glass",
"--only",
"--compute"
],
"cwd": "${workspaceFolder}/../discretizbench",
},
{
"type": "lldb",
"request": "launch",
"name": "experimentBayes",
"program": "${workspaceFolder}/build_debug/src/Platform/b_main",
"args": [
"-m",
"TAN",
"--stratified",
"--discretize",
"-d",
"iris",
"--hyperparameters",
"{\"repeatSparent\": true, \"maxModels\": 12}"
],
"cwd": "/home/rmontanana/Code/discretizbench",
},
{
"type": "lldb",
"request": "launch",
"name": "best",
"program": "${workspaceFolder}/build_debug/src/Platform/b_best",
"args": [
"-m",
"BoostAODE",
"-s",
"accuracy",
"--build",
],
"cwd": "${workspaceFolder}/../discretizbench",
},
{
"type": "lldb",
"request": "launch",
"name": "manage",
"program": "${workspaceFolder}/build_debug/src/Platform/b_manage",
"args": [
"-n",
"20"
],
"cwd": "${workspaceFolder}/../discretizbench",
},
{
"type": "lldb",
"request": "launch",
"name": "list",
"program": "${workspaceFolder}/build_debug/src/Platform/b_list",
"args": [],
//"cwd": "/Users/rmontanana/Code/discretizbench",
"cwd": "${workspaceFolder}/../discretizbench",
}, },
{ {
"type": "lldb", "type": "lldb",
"request": "launch", "request": "launch",
"name": "test", "name": "test",
"program": "${workspaceFolder}/build_debug/tests/unit_tests_bayesnet", "program": "${workspaceFolder}/build_Debug/tests/TestBayesNet",
"args": [ "args": [
//"-c=\"Metrics Test\"", "No features selected"
// "-s",
], ],
"cwd": "${workspaceFolder}/build_debug/tests", "cwd": "${workspaceFolder}/build_Debug/tests"
}, },
{ {
"name": "Build & debug active file", "name": "(gdb) Launch",
"type": "cppdbg", "type": "cppdbg",
"request": "launch", "request": "launch",
"program": "${workspaceFolder}/build_debug/bayesnet", "program": "enter program name, for example ${workspaceFolder}/a.out",
"args": [], "args": [],
"stopAtEntry": false, "stopAtEntry": false,
"cwd": "${workspaceFolder}", "cwd": "${fileDirname}",
"environment": [], "environment": [],
"externalConsole": false, "externalConsole": false,
"MIMode": "lldb", "MIMode": "gdb",
"preLaunchTask": "CMake: build" "setupCommands": [
{
"description": "Enable pretty-printing for gdb",
"text": "-enable-pretty-printing",
"ignoreFailures": true
},
{
"description": "Set Disassembly Flavor to Intel",
"text": "-gdb-set disassembly-flavor intel",
"ignoreFailures": true
}
]
} }
] ]
} }

View File

@ -9,6 +9,104 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
### Added ### Added
- Add a new hyperparameter to the BoostAODE class, *alphablock*, to control the way &alpha; is computed, with the last model or with the ensmble built so far. Default value is *false*.
- Add a new hyperparameter to the SPODE class, *parent*, to set the root node of the model. If no value is set the root parameter of the constructor is used.
- Add a new hyperparameter to the TAN class, *parent*, to set the root node of the model. If not set the first feature is used as root.
### Changed
- Hyperparameter *maxTolerance* in the BoostAODE class is now in [1, 6] range (it was in [1, 4] range before).
## [1.0.6] 2024-11-23
### Fixed
- Prevent existing edges to be added to the network in the `add_edge` method.
- Don't allow to add nodes or edges on already fiited networks.
- Number of threads spawned
- Network class tests
### Added
- Library logo generated with <https://openart.ai> to README.md
- Link to the coverage report in the README.md coverage label.
- *convergence_best* hyperparameter to the BoostAODE class, to control the way the prior accuracy is computed if convergence is set. Default value is *false*.
- SPnDE model.
- A2DE model.
- BoostA2DE model.
- A2DE & SPnDE tests.
- Add tests to reach 99% of coverage.
- Add tests to check the correct version of the mdlp, folding and json libraries.
- Library documentation generated with Doxygen.
- Link to documentation in the README.md.
- Three types of smoothing the Bayesian Network ORIGINAL, LAPLACE and CESTNIK.
### Internal
- Fixed doxygen optional dependency
- Add env parallel variable to Makefile
- Add CountingSemaphore class to manage the number of threads spawned.
- Ignore CUDA language in CMake CodeCoverage module.
- Update mdlp library as a git submodule.
- Create library ShuffleArffFile to limit the number of samples with a parameter and shuffle them.
- Refactor catch2 library location to test/lib
- Refactor loadDataset function in tests.
- Remove conditionalEdgeWeights method in BayesMetrics.
- Refactor Coverage Report generation.
- Add devcontainer to work on apple silicon.
- Change build cmake folder names to Debug & Release.
- Add a Makefile target (doc) to generate the documentation.
- Add a Makefile target (doc-install) to install the documentation.
### Libraries versions
- mdlp: 2.0.1
- Folding: 1.1.0
- json: 3.11
- ArffFiles: 1.1.0
## [1.0.5] 2024-04-20
### Added
- Install command and instructions in README.md
- Prefix to install command to install the package in the any location.
- The 'block_update' hyperparameter to the BoostAODE class, to control the way weights/significances are updated. Default value is false.
- Html report of coverage in the coverage folder. It is created with *make viewcoverage*
- Badges of coverage and code quality (codacy) in README.md. Coverage badge is updated with *make viewcoverage*
- Tests to reach 97% of coverage.
- Copyright header to source files.
- Diagrams to README.md: UML class diagram & dependency diagram
- Action to create diagrams to Makefile: *make diagrams*
### Changed
- Sample app now is a separate target in the Makefile and shows how to use the library with a sample dataset
- The worse model count in BoostAODE is reset to 0 every time a new model produces better accuracy, so the tolerance of the model is meant to be the number of **consecutive** models that produce worse accuracy.
- Default hyperparameter values in BoostAODE: bisection is true, maxTolerance is 3, convergence is true
### Removed
- The 'predict_single' hyperparameter from the BoostAODE class.
- The 'repeatSparent' hyperparameter from the BoostAODE class.
## [1.0.4] 2024-03-06
### Added
- Change *ascending* hyperparameter to *order* with these possible values *{"asc", "desc", "rand"}*, Default is *"desc"*.
- Add the *predict_single* hyperparameter to control if only the last model created is used to predict in boost training or the whole ensemble (all the models built so far). Default is true.
- sample app to show how to use the library (make sample)
### Changed
- Change the library structure adding folders for each group of classes (classifiers, ensembles, etc).
- The significances of the models generated under the feature selection algorithm are now computed after all the models have been generated and an &alpha;<sub>t</sub> value is computed and assigned to each model.
## [1.0.3] 2024-02-25
### Added
- Voting / probability aggregation in Ensemble classes - Voting / probability aggregation in Ensemble classes
- predict_proba method in Classifier - predict_proba method in Classifier
- predict_proba method in BoostAODE - predict_proba method in BoostAODE

View File

@ -0,0 +1,5 @@
# Set the default graph title
set(GRAPHVIZ_GRAPH_NAME "BayesNet dependency graph")
set(GRAPHVIZ_SHARED_LIBS OFF)
set(GRAPHVIZ_STATIC_LIBS ON)

View File

@ -1,7 +1,7 @@
cmake_minimum_required(VERSION 3.20) cmake_minimum_required(VERSION 3.20)
project(BayesNet project(BayesNet
VERSION 1.0.3 VERSION 1.0.6
DESCRIPTION "Bayesian Network and basic classifiers Library." DESCRIPTION "Bayesian Network and basic classifiers Library."
HOMEPAGE_URL "https://github.com/rmontanana/bayesnet" HOMEPAGE_URL "https://github.com/rmontanana/bayesnet"
LANGUAGES CXX LANGUAGES CXX
@ -25,23 +25,37 @@ set(CMAKE_CXX_EXTENSIONS OFF)
set(CMAKE_EXPORT_COMPILE_COMMANDS ON) set(CMAKE_EXPORT_COMPILE_COMMANDS ON)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${TORCH_CXX_FLAGS}") set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${TORCH_CXX_FLAGS}")
SET(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -pthread") SET(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -pthread")
set(CMAKE_CXX_FLAGS_DEBUG "${CMAKE_CXX_FLAGS_DEBUG} -fprofile-arcs -ftest-coverage -fno-elide-constructors")
set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} -Ofast")
if (NOT ${CMAKE_SYSTEM_NAME} MATCHES "Darwin")
set(CMAKE_CXX_FLAGS_DEBUG "${CMAKE_CXX_FLAGS_DEBUG} -fno-default-inline")
endif()
# Options # Options
# ------- # -------
option(ENABLE_CLANG_TIDY "Enable to add clang tidy." OFF) option(ENABLE_CLANG_TIDY "Enable to add clang tidy." OFF)
option(ENABLE_TESTING "Unit testing build" OFF) option(ENABLE_TESTING "Unit testing build" OFF)
option(CODE_COVERAGE "Collect coverage from test library" OFF) option(CODE_COVERAGE "Collect coverage from test library" OFF)
option(INSTALL_GTEST "Enable installation of googletest." OFF)
# CMakes modules # CMakes modules
# -------------- # --------------
set(CMAKE_MODULE_PATH ${CMAKE_CURRENT_SOURCE_DIR}/cmake/modules ${CMAKE_MODULE_PATH}) set(CMAKE_MODULE_PATH ${CMAKE_CURRENT_SOURCE_DIR}/cmake/modules ${CMAKE_MODULE_PATH})
include(AddGitSubmodule) include(AddGitSubmodule)
if (CMAKE_BUILD_TYPE STREQUAL "Debug")
MESSAGE("Debug mode")
set(ENABLE_TESTING ON)
set(CODE_COVERAGE ON)
endif (CMAKE_BUILD_TYPE STREQUAL "Debug")
get_property(LANGUAGES GLOBAL PROPERTY ENABLED_LANGUAGES)
message(STATUS "Languages=${LANGUAGES}")
if (CODE_COVERAGE) if (CODE_COVERAGE)
enable_testing() enable_testing()
include(CodeCoverage) include(CodeCoverage)
MESSAGE("Code coverage enabled") MESSAGE(STATUS "Code coverage enabled")
set(CMAKE_CXX_FLAGS " ${CMAKE_CXX_FLAGS} -fprofile-arcs -ftest-coverage -O0 -g") SET(GCC_COVERAGE_LINK_FLAGS " ${GCC_COVERAGE_LINK_FLAGS} -lgcov --coverage")
SET(GCC_COVERAGE_LINK_FLAGS " ${GCC_COVERAGE_LINK_FLAGS} -lgcov --coverage")
endif (CODE_COVERAGE) endif (CODE_COVERAGE)
if (ENABLE_CLANG_TIDY) if (ENABLE_CLANG_TIDY)
@ -50,23 +64,45 @@ endif (ENABLE_CLANG_TIDY)
# External libraries - dependencies of BayesNet # External libraries - dependencies of BayesNet
# --------------------------------------------- # ---------------------------------------------
# include(FetchContent) # include(FetchContent)
add_git_submodule("lib/mdlp")
add_git_submodule("lib/json") add_git_submodule("lib/json")
add_git_submodule("lib/mdlp")
# Subdirectories # Subdirectories
# -------------- # --------------
add_subdirectory(config) add_subdirectory(config)
add_subdirectory(lib/Files) add_subdirectory(bayesnet)
add_subdirectory(src)
file(GLOB BayesNet_SOURCES CONFIGURE_DEPENDS ${BayesNet_SOURCE_DIR}/src/*.cc)
# Testing # Testing
# ------- # -------
if (ENABLE_TESTING) if (ENABLE_TESTING)
MESSAGE("Testing enabled") MESSAGE(STATUS "Testing enabled")
add_git_submodule("lib/catch2") add_subdirectory(tests/lib/catch2)
include(CTest) include(CTest)
add_subdirectory(tests) add_subdirectory(tests)
endif (ENABLE_TESTING) endif (ENABLE_TESTING)
# Installation
# ------------
install(TARGETS BayesNet
ARCHIVE DESTINATION lib
LIBRARY DESTINATION lib
CONFIGURATIONS Release)
install(DIRECTORY bayesnet/ DESTINATION include/bayesnet FILES_MATCHING CONFIGURATIONS Release PATTERN "*.h")
install(FILES ${CMAKE_BINARY_DIR}/configured_files/include/bayesnet/config.h DESTINATION include/bayesnet CONFIGURATIONS Release)
# Documentation
# -------------
find_package(Doxygen)
if (Doxygen_FOUND)
set(DOC_DIR ${CMAKE_CURRENT_SOURCE_DIR}/docs)
set(doxyfile_in ${DOC_DIR}/Doxyfile.in)
set(doxyfile ${DOC_DIR}/Doxyfile)
configure_file(${doxyfile_in} ${doxyfile} @ONLY)
doxygen_add_docs(doxygen
WORKING_DIRECTORY ${DOC_DIR}
CONFIG_FILE ${doxyfile})
else (Doxygen_FOUND)
MESSAGE("* Doxygen not found")
endif (Doxygen_FOUND)

View File

@ -1,6 +1,6 @@
MIT License MIT License
Copyright (c) <year> <copyright holders> Copyright (c) 2023 Ricardo Montañana Gómez
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

127
Makefile
View File

@ -1,12 +1,22 @@
SHELL := /bin/bash SHELL := /bin/bash
.DEFAULT_GOAL := help .DEFAULT_GOAL := help
.PHONY: coverage setup help buildr buildd test clean debug release .PHONY: viewcoverage coverage setup help install uninstall diagrams buildr buildd test clean debug release sample updatebadge doc doc-install
f_release = build_release f_release = build_Release
f_debug = build_debug f_debug = build_Debug
f_diagrams = diagrams
app_targets = BayesNet app_targets = BayesNet
test_targets = unit_tests_bayesnet test_targets = TestBayesNet
n_procs = -j 16 clang-uml = clang-uml
plantuml = plantuml
lcov = lcov
genhtml = genhtml
dot = dot
docsrcdir = docs/manual
mansrcdir = docs/man3
mandestdir = /usr/local/share/man
sed_command_link = 's/e">LCOV -/e"><a href="https:\/\/rmontanana.github.io\/bayesnet">Back to manual<\/a> LCOV -/g'
sed_command_diagram = 's/Diagram"/Diagram" width="100%" height="100%" /g'
define ClearTests define ClearTests
@for t in $(test_targets); do \ @for t in $(test_targets); do \
@ -29,24 +39,46 @@ setup: ## Install dependencies for tests and coverage
fi fi
@if [ "$(shell uname)" = "Linux" ]; then \ @if [ "$(shell uname)" = "Linux" ]; then \
pip install gcovr; \ pip install gcovr; \
sudo dnf install lcov;\
fi fi
@echo "* You should install plantuml & graphviz for the diagrams"
dependency: ## Create a dependency graph diagram of the project (build/dependency.png) diagrams: ## Create an UML class diagram & dependency of the project (diagrams/BayesNet.png)
@which $(plantuml) || (echo ">>> Please install plantuml"; exit 1)
@which $(dot) || (echo ">>> Please install graphviz"; exit 1)
@which $(clang-uml) || (echo ">>> Please install clang-uml"; exit 1)
@export PLANTUML_LIMIT_SIZE=16384
@echo ">>> Creating UML class diagram of the project...";
@$(clang-uml) -p
@cd $(f_diagrams); \
$(plantuml) -tsvg BayesNet.puml
@echo ">>> Creating dependency graph diagram of the project..."; @echo ">>> Creating dependency graph diagram of the project...";
$(MAKE) debug $(MAKE) debug
cd $(f_debug) && cmake .. --graphviz=dependency.dot && dot -Tpng dependency.dot -o dependency.png cd $(f_debug) && cmake .. --graphviz=dependency.dot
@$(dot) -Tsvg $(f_debug)/dependency.dot.BayesNet -o $(f_diagrams)/dependency.svg
buildd: ## Build the debug targets buildd: ## Build the debug targets
cmake --build $(f_debug) -t $(app_targets) $(n_procs) cmake --build $(f_debug) -t $(app_targets) --parallel $(CMAKE_BUILD_PARALLEL_LEVEL)
buildr: ## Build the release targets buildr: ## Build the release targets
cmake --build $(f_release) -t $(app_targets) $(n_procs) cmake --build $(f_release) -t $(app_targets) --parallel $(CMAKE_BUILD_PARALLEL_LEVEL)
clean: ## Clean the tests info clean: ## Clean the tests info
@echo ">>> Cleaning Debug BayesNet tests..."; @echo ">>> Cleaning Debug BayesNet tests...";
$(call ClearTests) $(call ClearTests)
@echo ">>> Done"; @echo ">>> Done";
uninstall: ## Uninstall library
@echo ">>> Uninstalling BayesNet...";
xargs rm < $(f_release)/install_manifest.txt
@echo ">>> Done";
prefix = "/usr/local"
install: ## Install library
@echo ">>> Installing BayesNet...";
@cmake --install $(f_release) --prefix $(prefix)
@echo ">>> Done";
debug: ## Build a debug version of the project debug: ## Build a debug version of the project
@echo ">>> Building Debug BayesNet..."; @echo ">>> Building Debug BayesNet...";
@if [ -d ./$(f_debug) ]; then rm -rf ./$(f_debug); fi @if [ -d ./$(f_debug) ]; then rm -rf ./$(f_debug); fi
@ -61,25 +93,94 @@ release: ## Build a Release version of the project
@cmake -S . -B $(f_release) -D CMAKE_BUILD_TYPE=Release @cmake -S . -B $(f_release) -D CMAKE_BUILD_TYPE=Release
@echo ">>> Done"; @echo ">>> Done";
fname = "tests/data/iris.arff"
sample: ## Build sample
@echo ">>> Building Sample...";
@if [ -d ./sample/build ]; then rm -rf ./sample/build; fi
@cd sample && cmake -B build -S . && cmake --build build -t bayesnet_sample
sample/build/bayesnet_sample $(fname)
@echo ">>> Done";
opt = "" opt = ""
test: ## Run tests (opt="-s") to verbose output the tests, (opt="-c='Test Maximum Spanning Tree'") to run only that section test: ## Run tests (opt="-s") to verbose output the tests, (opt="-c='Test Maximum Spanning Tree'") to run only that section
@echo ">>> Running BayesNet & Platform tests..."; @echo ">>> Running BayesNet tests...";
@$(MAKE) clean @$(MAKE) clean
@cmake --build $(f_debug) -t $(test_targets) $(n_procs) @cmake --build $(f_debug) -t $(test_targets) --parallel $(CMAKE_BUILD_PARALLEL_LEVEL)
@for t in $(test_targets); do \ @for t in $(test_targets); do \
echo ">>> Running $$t...";\
if [ -f $(f_debug)/tests/$$t ]; then \ if [ -f $(f_debug)/tests/$$t ]; then \
cd $(f_debug)/tests ; \ cd $(f_debug)/tests ; \
./$$t $(opt) ; \ ./$$t $(opt) ; \
cd ../.. ; \
fi ; \ fi ; \
done done
@echo ">>> Done"; @echo ">>> Done";
coverage: ## Run tests and generate coverage report (build/index.html) coverage: ## Run tests and generate coverage report (build/index.html)
@echo ">>> Building tests with coverage..." @echo ">>> Building tests with coverage..."
@$(MAKE) test @which $(lcov) || (echo ">>ease install lcov"; exit 1)
@gcovr $(f_debug)/tests @if [ ! -f $(f_debug)/tests/coverage.info ] ; then $(MAKE) test ; fi
@echo ">>> Building report..."
@cd $(f_debug)/tests; \
$(lcov) --directory CMakeFiles --capture --demangle-cpp --ignore-errors source,source --output-file coverage.info >/dev/null 2>&1; \
$(lcov) --remove coverage.info '/usr/*' --output-file coverage.info >/dev/null 2>&1; \
$(lcov) --remove coverage.info 'lib/*' --output-file coverage.info >/dev/null 2>&1; \
$(lcov) --remove coverage.info 'libtorch/*' --output-file coverage.info >/dev/null 2>&1; \
$(lcov) --remove coverage.info 'tests/*' --output-file coverage.info >/dev/null 2>&1; \
$(lcov) --remove coverage.info 'bayesnet/utils/loguru.*' --ignore-errors unused --output-file coverage.info >/dev/null 2>&1; \
$(lcov) --remove coverage.info '/opt/miniconda/*' --ignore-errors unused --output-file coverage.info >/dev/null 2>&1; \
$(lcov) --summary coverage.info
@$(MAKE) updatebadge
@echo ">>> Done"; @echo ">>> Done";
viewcoverage: ## View the html coverage report
@which $(genhtml) >/dev/null || (echo ">>> Please install lcov (genhtml not found)"; exit 1)
@if [ ! -d $(docsrcdir)/coverage ]; then mkdir -p $(docsrcdir)/coverage; fi
@if [ ! -f $(f_debug)/tests/coverage.info ]; then \
echo ">>> No coverage.info file found. Run make coverage first!"; \
exit 1; \
fi
@$(genhtml) $(f_debug)/tests/coverage.info --demangle-cpp --output-directory $(docsrcdir)/coverage --title "BayesNet Coverage Report" -s -k -f --legend >/dev/null 2>&1;
@xdg-open $(docsrcdir)/coverage/index.html || open $(docsrcdir)/coverage/index.html 2>/dev/null
@echo ">>> Done";
updatebadge: ## Update the coverage badge in README.md
@which python || (echo ">>> Please install python"; exit 1)
@if [ ! -f $(f_debug)/tests/coverage.info ]; then \
echo ">>> No coverage.info file found. Run make coverage first!"; \
exit 1; \
fi
@echo ">>> Updating coverage badge..."
@env python update_coverage.py $(f_debug)/tests
@echo ">>> Done";
doc: ## Generate documentation
@echo ">>> Generating documentation..."
@cmake --build $(f_release) -t doxygen
@cp -rp diagrams $(docsrcdir)
@
@if [ "$(shell uname)" = "Darwin" ]; then \
sed -i "" $(sed_command_link) $(docsrcdir)/coverage/index.html ; \
sed -i "" $(sed_command_diagram) $(docsrcdir)/index.html ; \
else \
sed -i $(sed_command_link) $(docsrcdir)/coverage/index.html ; \
sed -i $(sed_command_diagram) $(docsrcdir)/index.html ; \
fi
@echo ">>> Done";
docdir = ""
doc-install: ## Install documentation
@echo ">>> Installing documentation..."
@if [ "$(docdir)" = "" ]; then \
echo "docdir parameter has to be set when calling doc-install, i.e. docdir=../bayesnet_help"; \
exit 1; \
fi
@if [ ! -d $(docdir) ]; then \
@$(MAKE) doc; \
fi
@cp -rp $(docsrcdir)/* $(docdir)
@sudo cp -rp $(mansrcdir) $(mandestdir)
@echo ">>> Done";
help: ## Show help message help: ## Show help message
@IFS=$$'\n' ; \ @IFS=$$'\n' ; \

View File

@ -1,14 +1,40 @@
# BayesNet # <img src="logo.png" alt="logo" width="50"/> BayesNet
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT) ![C++](https://img.shields.io/badge/c++-%2300599C.svg?style=flat&logo=c%2B%2B&logoColor=white)
[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](<https://opensource.org/licenses/MIT>)
![Gitea Release](https://img.shields.io/gitea/v/release/rmontanana/bayesnet?gitea_url=https://gitea.rmontanana.es:3000)
[![Codacy Badge](https://app.codacy.com/project/badge/Grade/cf3e0ac71d764650b1bf4d8d00d303b1)](https://app.codacy.com/gh/Doctorado-ML/BayesNet/dashboard?utm_source=gh&utm_medium=referral&utm_content=&utm_campaign=Badge_grade)
[![Security Rating](https://sonarcloud.io/api/project_badges/measure?project=rmontanana_BayesNet&metric=security_rating)](https://sonarcloud.io/summary/new_code?id=rmontanana_BayesNet)
[![Reliability Rating](https://sonarcloud.io/api/project_badges/measure?project=rmontanana_BayesNet&metric=reliability_rating)](https://sonarcloud.io/summary/new_code?id=rmontanana_BayesNet)
![Gitea Last Commit](https://img.shields.io/gitea/last-commit/rmontanana/bayesnet?gitea_url=https://gitea.rmontanana.es:3000&logo=gitea)
[![Coverage Badge](https://img.shields.io/badge/Coverage-99,1%25-green)](html/index.html)
[![DOI](https://zenodo.org/badge/667782806.svg)](https://doi.org/10.5281/zenodo.14210344)
Bayesian Network Classifiers using libtorch from scratch Bayesian Network Classifiers library
## Dependencies
The only external dependency is [libtorch](https://pytorch.org/cppdocs/installing.html) which can be installed with the following commands:
```bash
wget https://download.pytorch.org/libtorch/nightly/cpu/libtorch-shared-with-deps-latest.zip
unzip libtorch-shared-with-deps-latest.zip
```
## Setup
### Getting the code
```bash
git clone --recurse-submodules https://github.com/doctorado-ml/bayesnet
```
### Release ### Release
```bash ```bash
make release make release
make buildr make buildr
sudo make install
``` ```
### Debug & Tests ### Debug & Tests
@ -16,7 +42,64 @@ make buildr
```bash ```bash
make debug make debug
make test make test
make coverage
``` ```
## 1. Introduction ### Coverage
```bash
make coverage
make viewcoverage
```
### Sample app
After building and installing the release version, you can run the sample app with the following commands:
```bash
make sample
make sample fname=tests/data/glass.arff
```
## Models
#### - TAN
#### - KDB
#### - SPODE
#### - SPnDE
#### - AODE
#### - A2DE
#### - [BoostAODE](docs/BoostAODE.md)
#### - BoostA2DE
### With Local Discretization
#### - TANLd
#### - KDBLd
#### - SPODELd
#### - AODELd
## Documentation
### [Manual](https://rmontanana.github.io/bayesnet/)
### [Coverage report](https://rmontanana.github.io/bayesnet/coverage/index.html)
## Diagrams
### UML Class Diagram
![BayesNet UML Class Diagram](diagrams/BayesNet.svg)
### Dependency Diagram
![BayesNet Dependency Diagram](diagrams/dependency.svg)

View File

@ -1,18 +1,25 @@
#ifndef BASE_H // ***************************************************************
#define BASE_H // SPDX-FileCopyrightText: Copyright 2024 Ricardo Montañana Gómez
// SPDX-FileType: SOURCE
// SPDX-License-Identifier: MIT
// ***************************************************************
#pragma once
#include <vector>
#include <torch/torch.h> #include <torch/torch.h>
#include <nlohmann/json.hpp> #include <nlohmann/json.hpp>
#include <vector> #include "bayesnet/network/Network.h"
namespace bayesnet { namespace bayesnet {
enum status_t { NORMAL, WARNING, ERROR }; enum status_t { NORMAL, WARNING, ERROR };
class BaseClassifier { class BaseClassifier {
public: public:
// X is nxm std::vector, y is nx1 std::vector // X is nxm std::vector, y is nx1 std::vector
virtual BaseClassifier& fit(std::vector<std::vector<int>>& X, std::vector<int>& y, const std::vector<std::string>& features, const std::string& className, std::map<std::string, std::vector<int>>& states) = 0; virtual BaseClassifier& fit(std::vector<std::vector<int>>& X, std::vector<int>& y, const std::vector<std::string>& features, const std::string& className, std::map<std::string, std::vector<int>>& states, const Smoothing_t smoothing) = 0;
// X is nxm tensor, y is nx1 tensor // X is nxm tensor, y is nx1 tensor
virtual BaseClassifier& fit(torch::Tensor& X, torch::Tensor& y, const std::vector<std::string>& features, const std::string& className, std::map<std::string, std::vector<int>>& states) = 0; virtual BaseClassifier& fit(torch::Tensor& X, torch::Tensor& y, const std::vector<std::string>& features, const std::string& className, std::map<std::string, std::vector<int>>& states, const Smoothing_t smoothing) = 0;
virtual BaseClassifier& fit(torch::Tensor& dataset, const std::vector<std::string>& features, const std::string& className, std::map<std::string, std::vector<int>>& states) = 0; virtual BaseClassifier& fit(torch::Tensor& dataset, const std::vector<std::string>& features, const std::string& className, std::map<std::string, std::vector<int>>& states, const Smoothing_t smoothing) = 0;
virtual BaseClassifier& fit(torch::Tensor& dataset, const std::vector<std::string>& features, const std::string& className, std::map<std::string, std::vector<int>>& states, const torch::Tensor& weights) = 0; virtual BaseClassifier& fit(torch::Tensor& dataset, const std::vector<std::string>& features, const std::string& className, std::map<std::string, std::vector<int>>& states, const torch::Tensor& weights, const Smoothing_t smoothing) = 0;
virtual ~BaseClassifier() = default; virtual ~BaseClassifier() = default;
torch::Tensor virtual predict(torch::Tensor& X) = 0; torch::Tensor virtual predict(torch::Tensor& X) = 0;
std::vector<int> virtual predict(std::vector<std::vector<int >>& X) = 0; std::vector<int> virtual predict(std::vector<std::vector<int >>& X) = 0;
@ -30,12 +37,11 @@ namespace bayesnet {
virtual std::string getVersion() = 0; virtual std::string getVersion() = 0;
std::vector<std::string> virtual topological_order() = 0; std::vector<std::string> virtual topological_order() = 0;
std::vector<std::string> virtual getNotes() const = 0; std::vector<std::string> virtual getNotes() const = 0;
void virtual dump_cpt()const = 0; std::string virtual dump_cpt()const = 0;
virtual void setHyperparameters(const nlohmann::json& hyperparameters) = 0; virtual void setHyperparameters(const nlohmann::json& hyperparameters) = 0;
std::vector<std::string>& getValidHyperparameters() { return validHyperparameters; } std::vector<std::string>& getValidHyperparameters() { return validHyperparameters; }
protected: protected:
virtual void trainModel(const torch::Tensor& weights) = 0; virtual void trainModel(const torch::Tensor& weights, const Smoothing_t smoothing) = 0;
std::vector<std::string> validHyperparameters; std::vector<std::string> validHyperparameters;
}; };
} }
#endif

12
bayesnet/CMakeLists.txt Normal file
View File

@ -0,0 +1,12 @@
include_directories(
${BayesNet_SOURCE_DIR}/lib/mdlp/src
${BayesNet_SOURCE_DIR}/lib/folding
${BayesNet_SOURCE_DIR}/lib/json/include
${BayesNet_SOURCE_DIR}
${CMAKE_BINARY_DIR}/configured_files/include
)
file(GLOB_RECURSE Sources "*.cc")
add_library(BayesNet ${Sources})
target_link_libraries(BayesNet fimdlp "${TORCH_LIBRARIES}")

View File

@ -1,22 +1,29 @@
// ***************************************************************
// SPDX-FileCopyrightText: Copyright 2024 Ricardo Montañana Gómez
// SPDX-FileType: SOURCE
// SPDX-License-Identifier: MIT
// ***************************************************************
#include <sstream>
#include "bayesnet/utils/bayesnetUtils.h"
#include "Classifier.h" #include "Classifier.h"
#include "bayesnetUtils.h"
namespace bayesnet { namespace bayesnet {
Classifier::Classifier(Network model) : model(model), m(0), n(0), metrics(Metrics()), fitted(false) {} Classifier::Classifier(Network model) : model(model), m(0), n(0), metrics(Metrics()), fitted(false) {}
const std::string CLASSIFIER_NOT_FITTED = "Classifier has not been fitted"; const std::string CLASSIFIER_NOT_FITTED = "Classifier has not been fitted";
Classifier& Classifier::build(const std::vector<std::string>& features, const std::string& className, std::map<std::string, std::vector<int>>& states, const torch::Tensor& weights) Classifier& Classifier::build(const std::vector<std::string>& features, const std::string& className, std::map<std::string, std::vector<int>>& states, const torch::Tensor& weights, const Smoothing_t smoothing)
{ {
this->features = features; this->features = features;
this->className = className; this->className = className;
this->states = states; this->states = states;
m = dataset.size(1); m = dataset.size(1);
n = dataset.size(0) - 1; n = features.size();
checkFitParameters(); checkFitParameters();
auto n_classes = states.at(className).size(); auto n_classes = states.at(className).size();
metrics = Metrics(dataset, features, className, n_classes); metrics = Metrics(dataset, features, className, n_classes);
model.initialize(); model.initialize();
buildModel(weights); buildModel(weights);
trainModel(weights); trainModel(weights, smoothing);
fitted = true; fitted = true;
return *this; return *this;
} }
@ -27,26 +34,27 @@ namespace bayesnet {
dataset = torch::cat({ dataset, yresized }, 0); dataset = torch::cat({ dataset, yresized }, 0);
} }
catch (const std::exception& e) { catch (const std::exception& e) {
std::cerr << e.what() << '\n'; std::stringstream oss;
std::cout << "X dimensions: " << dataset.sizes() << "\n"; oss << "* Error in X and y dimensions *\n";
std::cout << "y dimensions: " << ytmp.sizes() << "\n"; oss << "X dimensions: " << dataset.sizes() << "\n";
exit(1); oss << "y dimensions: " << ytmp.sizes();
throw std::runtime_error(oss.str());
} }
} }
void Classifier::trainModel(const torch::Tensor& weights) void Classifier::trainModel(const torch::Tensor& weights, Smoothing_t smoothing)
{ {
model.fit(dataset, weights, features, className, states); model.fit(dataset, weights, features, className, states, smoothing);
} }
// X is nxm where n is the number of features and m the number of samples // X is nxm where n is the number of features and m the number of samples
Classifier& Classifier::fit(torch::Tensor& X, torch::Tensor& y, const std::vector<std::string>& features, const std::string& className, std::map<std::string, std::vector<int>>& states) Classifier& Classifier::fit(torch::Tensor& X, torch::Tensor& y, const std::vector<std::string>& features, const std::string& className, std::map<std::string, std::vector<int>>& states, const Smoothing_t smoothing)
{ {
dataset = X; dataset = X;
buildDataset(y); buildDataset(y);
const torch::Tensor weights = torch::full({ dataset.size(1) }, 1.0 / dataset.size(1), torch::kDouble); const torch::Tensor weights = torch::full({ dataset.size(1) }, 1.0 / dataset.size(1), torch::kDouble);
return build(features, className, states, weights); return build(features, className, states, weights, smoothing);
} }
// X is nxm where n is the number of features and m the number of samples // X is nxm where n is the number of features and m the number of samples
Classifier& Classifier::fit(std::vector<std::vector<int>>& X, std::vector<int>& y, const std::vector<std::string>& features, const std::string& className, std::map<std::string, std::vector<int>>& states) Classifier& Classifier::fit(std::vector<std::vector<int>>& X, std::vector<int>& y, const std::vector<std::string>& features, const std::string& className, std::map<std::string, std::vector<int>>& states, const Smoothing_t smoothing)
{ {
dataset = torch::zeros({ static_cast<int>(X.size()), static_cast<int>(X[0].size()) }, torch::kInt32); dataset = torch::zeros({ static_cast<int>(X.size()), static_cast<int>(X[0].size()) }, torch::kInt32);
for (int i = 0; i < X.size(); ++i) { for (int i = 0; i < X.size(); ++i) {
@ -55,29 +63,29 @@ namespace bayesnet {
auto ytmp = torch::tensor(y, torch::kInt32); auto ytmp = torch::tensor(y, torch::kInt32);
buildDataset(ytmp); buildDataset(ytmp);
const torch::Tensor weights = torch::full({ dataset.size(1) }, 1.0 / dataset.size(1), torch::kDouble); const torch::Tensor weights = torch::full({ dataset.size(1) }, 1.0 / dataset.size(1), torch::kDouble);
return build(features, className, states, weights); return build(features, className, states, weights, smoothing);
} }
Classifier& Classifier::fit(torch::Tensor& dataset, const std::vector<std::string>& features, const std::string& className, std::map<std::string, std::vector<int>>& states) Classifier& Classifier::fit(torch::Tensor& dataset, const std::vector<std::string>& features, const std::string& className, std::map<std::string, std::vector<int>>& states, const Smoothing_t smoothing)
{ {
this->dataset = dataset; this->dataset = dataset;
const torch::Tensor weights = torch::full({ dataset.size(1) }, 1.0 / dataset.size(1), torch::kDouble); const torch::Tensor weights = torch::full({ dataset.size(1) }, 1.0 / dataset.size(1), torch::kDouble);
return build(features, className, states, weights); return build(features, className, states, weights, smoothing);
} }
Classifier& Classifier::fit(torch::Tensor& dataset, const std::vector<std::string>& features, const std::string& className, std::map<std::string, std::vector<int>>& states, const torch::Tensor& weights) Classifier& Classifier::fit(torch::Tensor& dataset, const std::vector<std::string>& features, const std::string& className, std::map<std::string, std::vector<int>>& states, const torch::Tensor& weights, const Smoothing_t smoothing)
{ {
this->dataset = dataset; this->dataset = dataset;
return build(features, className, states, weights); return build(features, className, states, weights, smoothing);
} }
void Classifier::checkFitParameters() void Classifier::checkFitParameters()
{ {
if (torch::is_floating_point(dataset)) { if (torch::is_floating_point(dataset)) {
throw std::invalid_argument("dataset (X, y) must be of type Integer"); throw std::invalid_argument("dataset (X, y) must be of type Integer");
} }
if (n != features.size()) { if (dataset.size(0) - 1 != features.size()) {
throw std::invalid_argument("Classifier: X " + std::to_string(n) + " and features " + std::to_string(features.size()) + " must have the same number of features"); throw std::invalid_argument("Classifier: X " + std::to_string(dataset.size(0) - 1) + " and features " + std::to_string(features.size()) + " must have the same number of features");
} }
if (states.find(className) == states.end()) { if (states.find(className) == states.end()) {
throw std::invalid_argument("className not found in states"); throw std::invalid_argument("class name not found in states");
} }
for (auto feature : features) { for (auto feature : features) {
if (states.find(feature) == states.end()) { if (states.find(feature) == states.end()) {
@ -173,12 +181,14 @@ namespace bayesnet {
{ {
return model.topological_sort(); return model.topological_sort();
} }
void Classifier::dump_cpt() const std::string Classifier::dump_cpt() const
{ {
model.dump_cpt(); return model.dump_cpt();
} }
void Classifier::setHyperparameters(const nlohmann::json& hyperparameters) void Classifier::setHyperparameters(const nlohmann::json& hyperparameters)
{ {
//For classifiers that don't have hyperparameters if (!hyperparameters.empty()) {
throw std::invalid_argument("Invalid hyperparameters" + hyperparameters.dump());
}
} }
} }

View File

@ -1,19 +1,24 @@
// ***************************************************************
// SPDX-FileCopyrightText: Copyright 2024 Ricardo Montañana Gómez
// SPDX-FileType: SOURCE
// SPDX-License-Identifier: MIT
// ***************************************************************
#ifndef CLASSIFIER_H #ifndef CLASSIFIER_H
#define CLASSIFIER_H #define CLASSIFIER_H
#include <torch/torch.h> #include <torch/torch.h>
#include "BaseClassifier.h" #include "bayesnet/utils/BayesMetrics.h"
#include "Network.h" #include "bayesnet/BaseClassifier.h"
#include "BayesMetrics.h"
namespace bayesnet { namespace bayesnet {
class Classifier : public BaseClassifier { class Classifier : public BaseClassifier {
public: public:
Classifier(Network model); Classifier(Network model);
virtual ~Classifier() = default; virtual ~Classifier() = default;
Classifier& fit(std::vector<std::vector<int>>& X, std::vector<int>& y, const std::vector<std::string>& features, const std::string& className, std::map<std::string, std::vector<int>>& states) override; Classifier& fit(std::vector<std::vector<int>>& X, std::vector<int>& y, const std::vector<std::string>& features, const std::string& className, std::map<std::string, std::vector<int>>& states, const Smoothing_t smoothing) override;
Classifier& fit(torch::Tensor& X, torch::Tensor& y, const std::vector<std::string>& features, const std::string& className, std::map<std::string, std::vector<int>>& states) override; Classifier& fit(torch::Tensor& X, torch::Tensor& y, const std::vector<std::string>& features, const std::string& className, std::map<std::string, std::vector<int>>& states, const Smoothing_t smoothing) override;
Classifier& fit(torch::Tensor& dataset, const std::vector<std::string>& features, const std::string& className, std::map<std::string, std::vector<int>>& states) override; Classifier& fit(torch::Tensor& dataset, const std::vector<std::string>& features, const std::string& className, std::map<std::string, std::vector<int>>& states, const Smoothing_t smoothing) override;
Classifier& fit(torch::Tensor& dataset, const std::vector<std::string>& features, const std::string& className, std::map<std::string, std::vector<int>>& states, const torch::Tensor& weights) override; Classifier& fit(torch::Tensor& dataset, const std::vector<std::string>& features, const std::string& className, std::map<std::string, std::vector<int>>& states, const torch::Tensor& weights, const Smoothing_t smoothing) override;
void addNodes(); void addNodes();
int getNumberOfNodes() const override; int getNumberOfNodes() const override;
int getNumberOfEdges() const override; int getNumberOfEdges() const override;
@ -30,7 +35,7 @@ namespace bayesnet {
std::vector<std::string> show() const override; std::vector<std::string> show() const override;
std::vector<std::string> topological_order() override; std::vector<std::string> topological_order() override;
std::vector<std::string> getNotes() const override { return notes; } std::vector<std::string> getNotes() const override { return notes; }
void dump_cpt() const override; std::string dump_cpt() const override;
void setHyperparameters(const nlohmann::json& hyperparameters) override; //For classifiers that don't have hyperparameters void setHyperparameters(const nlohmann::json& hyperparameters) override; //For classifiers that don't have hyperparameters
protected: protected:
bool fitted; bool fitted;
@ -45,10 +50,10 @@ namespace bayesnet {
std::vector<std::string> notes; // Used to store messages occurred during the fit process std::vector<std::string> notes; // Used to store messages occurred during the fit process
void checkFitParameters(); void checkFitParameters();
virtual void buildModel(const torch::Tensor& weights) = 0; virtual void buildModel(const torch::Tensor& weights) = 0;
void trainModel(const torch::Tensor& weights) override; void trainModel(const torch::Tensor& weights, const Smoothing_t smoothing) override;
void buildDataset(torch::Tensor& y); void buildDataset(torch::Tensor& y);
private: private:
Classifier& build(const std::vector<std::string>& features, const std::string& className, std::map<std::string, std::vector<int>>& states, const torch::Tensor& weights); Classifier& build(const std::vector<std::string>& features, const std::string& className, std::map<std::string, std::vector<int>>& states, const torch::Tensor& weights, const Smoothing_t smoothing);
}; };
} }
#endif #endif

View File

@ -1,3 +1,9 @@
// ***************************************************************
// SPDX-FileCopyrightText: Copyright 2024 Ricardo Montañana Gómez
// SPDX-FileType: SOURCE
// SPDX-License-Identifier: MIT
// ***************************************************************
#include "KDB.h" #include "KDB.h"
namespace bayesnet { namespace bayesnet {
@ -6,14 +12,18 @@ namespace bayesnet {
validHyperparameters = { "k", "theta" }; validHyperparameters = { "k", "theta" };
} }
void KDB::setHyperparameters(const nlohmann::json& hyperparameters) void KDB::setHyperparameters(const nlohmann::json& hyperparameters_)
{ {
auto hyperparameters = hyperparameters_;
if (hyperparameters.contains("k")) { if (hyperparameters.contains("k")) {
k = hyperparameters["k"]; k = hyperparameters["k"];
hyperparameters.erase("k");
} }
if (hyperparameters.contains("theta")) { if (hyperparameters.contains("theta")) {
theta = hyperparameters["theta"]; theta = hyperparameters["theta"];
hyperparameters.erase("theta");
} }
Classifier::setHyperparameters(hyperparameters);
} }
void KDB::buildModel(const torch::Tensor& weights) void KDB::buildModel(const torch::Tensor& weights)
{ {

View File

@ -1,8 +1,14 @@
// ***************************************************************
// SPDX-FileCopyrightText: Copyright 2024 Ricardo Montañana Gómez
// SPDX-FileType: SOURCE
// SPDX-License-Identifier: MIT
// ***************************************************************
#ifndef KDB_H #ifndef KDB_H
#define KDB_H #define KDB_H
#include <torch/torch.h> #include <torch/torch.h>
#include "bayesnet/utils/bayesnetUtils.h"
#include "Classifier.h" #include "Classifier.h"
#include "bayesnetUtils.h"
namespace bayesnet { namespace bayesnet {
class KDB : public Classifier { class KDB : public Classifier {
private: private:
@ -14,7 +20,7 @@ namespace bayesnet {
public: public:
explicit KDB(int k, float theta = 0.03); explicit KDB(int k, float theta = 0.03);
virtual ~KDB() = default; virtual ~KDB() = default;
void setHyperparameters(const nlohmann::json& hyperparameters) override; void setHyperparameters(const nlohmann::json& hyperparameters_) override;
std::vector<std::string> graph(const std::string& name = "KDB") const override; std::vector<std::string> graph(const std::string& name = "KDB") const override;
}; };
} }

View File

@ -1,8 +1,14 @@
// ***************************************************************
// SPDX-FileCopyrightText: Copyright 2024 Ricardo Montañana Gómez
// SPDX-FileType: SOURCE
// SPDX-License-Identifier: MIT
// ***************************************************************
#include "KDBLd.h" #include "KDBLd.h"
namespace bayesnet { namespace bayesnet {
KDBLd::KDBLd(int k) : KDB(k), Proposal(dataset, features, className) {} KDBLd::KDBLd(int k) : KDB(k), Proposal(dataset, features, className) {}
KDBLd& KDBLd::fit(torch::Tensor& X_, torch::Tensor& y_, const std::vector<std::string>& features_, const std::string& className_, map<std::string, std::vector<int>>& states_) KDBLd& KDBLd::fit(torch::Tensor& X_, torch::Tensor& y_, const std::vector<std::string>& features_, const std::string& className_, map<std::string, std::vector<int>>& states_, const Smoothing_t smoothing)
{ {
checkInput(X_, y_); checkInput(X_, y_);
features = features_; features = features_;
@ -13,7 +19,7 @@ namespace bayesnet {
states = fit_local_discretization(y); states = fit_local_discretization(y);
// We have discretized the input data // We have discretized the input data
// 1st we need to fit the model to build the normal KDB structure, KDB::fit initializes the base Bayesian network // 1st we need to fit the model to build the normal KDB structure, KDB::fit initializes the base Bayesian network
KDB::fit(dataset, features, className, states); KDB::fit(dataset, features, className, states, smoothing);
states = localDiscretizationProposal(states, model); states = localDiscretizationProposal(states, model);
return *this; return *this;
} }

View File

@ -1,7 +1,13 @@
// ***************************************************************
// SPDX-FileCopyrightText: Copyright 2024 Ricardo Montañana Gómez
// SPDX-FileType: SOURCE
// SPDX-License-Identifier: MIT
// ***************************************************************
#ifndef KDBLD_H #ifndef KDBLD_H
#define KDBLD_H #define KDBLD_H
#include "KDB.h"
#include "Proposal.h" #include "Proposal.h"
#include "KDB.h"
namespace bayesnet { namespace bayesnet {
class KDBLd : public KDB, public Proposal { class KDBLd : public KDB, public Proposal {
@ -9,7 +15,7 @@ namespace bayesnet {
public: public:
explicit KDBLd(int k); explicit KDBLd(int k);
virtual ~KDBLd() = default; virtual ~KDBLd() = default;
KDBLd& fit(torch::Tensor& X, torch::Tensor& y, const std::vector<std::string>& features, const std::string& className, map<std::string, std::vector<int>>& states) override; KDBLd& fit(torch::Tensor& X, torch::Tensor& y, const std::vector<std::string>& features, const std::string& className, map<std::string, std::vector<int>>& states, const Smoothing_t smoothing) override;
std::vector<std::string> graph(const std::string& name = "KDB") const override; std::vector<std::string> graph(const std::string& name = "KDB") const override;
torch::Tensor predict(torch::Tensor& X) override; torch::Tensor predict(torch::Tensor& X) override;
static inline std::string version() { return "0.0.1"; }; static inline std::string version() { return "0.0.1"; };

View File

@ -1,5 +1,10 @@
// ***************************************************************
// SPDX-FileCopyrightText: Copyright 2024 Ricardo Montañana Gómez
// SPDX-FileType: SOURCE
// SPDX-License-Identifier: MIT
// ***************************************************************
#include "Proposal.h" #include "Proposal.h"
#include "ArffFiles.h"
namespace bayesnet { namespace bayesnet {
Proposal::Proposal(torch::Tensor& dataset_, std::vector<std::string>& features_, std::string& className_) : pDataset(dataset_), pFeatures(features_), pClassName(className_) {} Proposal::Proposal(torch::Tensor& dataset_, std::vector<std::string>& features_, std::string& className_) : pDataset(dataset_), pFeatures(features_), pClassName(className_) {}
@ -48,8 +53,7 @@ namespace bayesnet {
yJoinParents[i] += to_string(pDataset.index({ idx, i }).item<int>()); yJoinParents[i] += to_string(pDataset.index({ idx, i }).item<int>());
} }
} }
auto arff = ArffFiles(); auto yxv = factorize(yJoinParents);
auto yxv = arff.factorize(yJoinParents);
auto xvf_ptr = Xf.index({ index }).data_ptr<float>(); auto xvf_ptr = Xf.index({ index }).data_ptr<float>();
auto xvf = std::vector<mdlp::precision_t>(xvf_ptr, xvf_ptr + Xf.size(1)); auto xvf = std::vector<mdlp::precision_t>(xvf_ptr, xvf_ptr + Xf.size(1));
discretizers[feature]->fit(xvf, yxv); discretizers[feature]->fit(xvf, yxv);
@ -66,7 +70,7 @@ namespace bayesnet {
states[pFeatures[index]] = xStates; states[pFeatures[index]] = xStates;
} }
const torch::Tensor weights = torch::full({ pDataset.size(1) }, 1.0 / pDataset.size(1), torch::kDouble); const torch::Tensor weights = torch::full({ pDataset.size(1) }, 1.0 / pDataset.size(1), torch::kDouble);
model.fit(pDataset, weights, pFeatures, pClassName, states); model.fit(pDataset, weights, pFeatures, pClassName, states, Smoothing_t::ORIGINAL);
} }
return states; return states;
} }
@ -107,4 +111,19 @@ namespace bayesnet {
} }
return Xtd; return Xtd;
} }
std::vector<int> Proposal::factorize(const std::vector<std::string>& labels_t)
{
std::vector<int> yy;
yy.reserve(labels_t.size());
std::map<std::string, int> labelMap;
int i = 0;
for (const std::string& label : labels_t) {
if (labelMap.find(label) == labelMap.end()) {
labelMap[label] = i++;
bool allDigits = std::all_of(label.begin(), label.end(), ::isdigit);
}
yy.push_back(labelMap[label]);
}
return yy;
}
} }

View File

@ -1,10 +1,16 @@
// ***************************************************************
// SPDX-FileCopyrightText: Copyright 2024 Ricardo Montañana Gómez
// SPDX-FileType: SOURCE
// SPDX-License-Identifier: MIT
// ***************************************************************
#ifndef PROPOSAL_H #ifndef PROPOSAL_H
#define PROPOSAL_H #define PROPOSAL_H
#include <string> #include <string>
#include <map> #include <map>
#include <torch/torch.h> #include <torch/torch.h>
#include "Network.h" #include <CPPFImdlp.h>
#include "CPPFImdlp.h" #include "bayesnet/network/Network.h"
#include "Classifier.h" #include "Classifier.h"
namespace bayesnet { namespace bayesnet {
@ -21,6 +27,7 @@ namespace bayesnet {
torch::Tensor y; // y discrete nx1 tensor torch::Tensor y; // y discrete nx1 tensor
map<std::string, mdlp::CPPFImdlp*> discretizers; map<std::string, mdlp::CPPFImdlp*> discretizers;
private: private:
std::vector<int> factorize(const std::vector<std::string>& labels_t);
torch::Tensor& pDataset; // (n+1)xm tensor torch::Tensor& pDataset; // (n+1)xm tensor
std::vector<std::string>& pFeatures; std::vector<std::string>& pFeatures;
std::string& pClassName; std::string& pClassName;

View File

@ -0,0 +1,46 @@
// ***************************************************************
// SPDX-FileCopyrightText: Copyright 2024 Ricardo Montañana Gómez
// SPDX-FileType: SOURCE
// SPDX-License-Identifier: MIT
// ***************************************************************
#include "SPODE.h"
namespace bayesnet {
SPODE::SPODE(int root) : Classifier(Network()), root(root)
{
validHyperparameters = { "parent" };
}
void SPODE::setHyperparameters(const nlohmann::json& hyperparameters_)
{
auto hyperparameters = hyperparameters_;
if (hyperparameters.contains("parent")) {
root = hyperparameters["parent"];
hyperparameters.erase("parent");
}
Classifier::setHyperparameters(hyperparameters);
}
void SPODE::buildModel(const torch::Tensor& weights)
{
// 0. Add all nodes to the model
addNodes();
// 1. Add edges from the class node to all other nodes
// 2. Add edges from the root node to all other nodes
if (root >= static_cast<int>(features.size())) {
throw std::invalid_argument("The parent node is not in the dataset");
}
for (int i = 0; i < static_cast<int>(features.size()); ++i) {
model.addEdge(className, features[i]);
if (i != root) {
model.addEdge(features[root], features[i]);
}
}
}
std::vector<std::string> SPODE::graph(const std::string& name) const
{
return model.graph(name);
}
}

View File

@ -1,17 +1,24 @@
// ***************************************************************
// SPDX-FileCopyrightText: Copyright 2024 Ricardo Montañana Gómez
// SPDX-FileType: SOURCE
// SPDX-License-Identifier: MIT
// ***************************************************************
#ifndef SPODE_H #ifndef SPODE_H
#define SPODE_H #define SPODE_H
#include "Classifier.h" #include "Classifier.h"
namespace bayesnet { namespace bayesnet {
class SPODE : public Classifier { class SPODE : public Classifier {
private:
int root;
protected:
void buildModel(const torch::Tensor& weights) override;
public: public:
explicit SPODE(int root); explicit SPODE(int root);
virtual ~SPODE() = default; virtual ~SPODE() = default;
void setHyperparameters(const nlohmann::json& hyperparameters_) override;
std::vector<std::string> graph(const std::string& name = "SPODE") const override; std::vector<std::string> graph(const std::string& name = "SPODE") const override;
protected:
void buildModel(const torch::Tensor& weights) override;
private:
int root;
}; };
} }
#endif #endif

View File

@ -1,40 +1,43 @@
// ***************************************************************
// SPDX-FileCopyrightText: Copyright 2024 Ricardo Montañana Gómez
// SPDX-FileType: SOURCE
// SPDX-License-Identifier: MIT
// ***************************************************************
#include "SPODELd.h" #include "SPODELd.h"
namespace bayesnet { namespace bayesnet {
SPODELd::SPODELd(int root) : SPODE(root), Proposal(dataset, features, className) {} SPODELd::SPODELd(int root) : SPODE(root), Proposal(dataset, features, className) {}
SPODELd& SPODELd::fit(torch::Tensor& X_, torch::Tensor& y_, const std::vector<std::string>& features_, const std::string& className_, map<std::string, std::vector<int>>& states_) SPODELd& SPODELd::fit(torch::Tensor& X_, torch::Tensor& y_, const std::vector<std::string>& features_, const std::string& className_, map<std::string, std::vector<int>>& states_, const Smoothing_t smoothing)
{ {
checkInput(X_, y_); checkInput(X_, y_);
features = features_;
className = className_;
Xf = X_; Xf = X_;
y = y_; y = y_;
// Fills std::vectors Xv & yv with the data from tensors X_ (discretized) & y return commonFit(features_, className_, states_, smoothing);
states = fit_local_discretization(y);
// We have discretized the input data
// 1st we need to fit the model to build the normal SPODE structure, SPODE::fit initializes the base Bayesian network
SPODE::fit(dataset, features, className, states);
states = localDiscretizationProposal(states, model);
return *this;
} }
SPODELd& SPODELd::fit(torch::Tensor& dataset, const std::vector<std::string>& features_, const std::string& className_, map<std::string, std::vector<int>>& states_)
SPODELd& SPODELd::fit(torch::Tensor& dataset, const std::vector<std::string>& features_, const std::string& className_, map<std::string, std::vector<int>>& states_, const Smoothing_t smoothing)
{ {
if (!torch::is_floating_point(dataset)) { if (!torch::is_floating_point(dataset)) {
throw std::runtime_error("Dataset must be a floating point tensor"); throw std::runtime_error("Dataset must be a floating point tensor");
} }
Xf = dataset.index({ torch::indexing::Slice(0, dataset.size(0) - 1), "..." }).clone(); Xf = dataset.index({ torch::indexing::Slice(0, dataset.size(0) - 1), "..." }).clone();
y = dataset.index({ -1, "..." }).clone(); y = dataset.index({ -1, "..." }).clone().to(torch::kInt32);
return commonFit(features_, className_, states_, smoothing);
}
SPODELd& SPODELd::commonFit(const std::vector<std::string>& features_, const std::string& className_, map<std::string, std::vector<int>>& states_, const Smoothing_t smoothing)
{
features = features_; features = features_;
className = className_; className = className_;
// Fills std::vectors Xv & yv with the data from tensors X_ (discretized) & y // Fills std::vectors Xv & yv with the data from tensors X_ (discretized) & y
states = fit_local_discretization(y); states = fit_local_discretization(y);
// We have discretized the input data // We have discretized the input data
// 1st we need to fit the model to build the normal SPODE structure, SPODE::fit initializes the base Bayesian network // 1st we need to fit the model to build the normal SPODE structure, SPODE::fit initializes the base Bayesian network
SPODE::fit(dataset, features, className, states); SPODE::fit(dataset, features, className, states, smoothing);
states = localDiscretizationProposal(states, model); states = localDiscretizationProposal(states, model);
return *this; return *this;
} }
torch::Tensor SPODELd::predict(torch::Tensor& X) torch::Tensor SPODELd::predict(torch::Tensor& X)
{ {
auto Xt = prepareX(X); auto Xt = prepareX(X);

View File

@ -1,3 +1,9 @@
// ***************************************************************
// SPDX-FileCopyrightText: Copyright 2024 Ricardo Montañana Gómez
// SPDX-FileType: SOURCE
// SPDX-License-Identifier: MIT
// ***************************************************************
#ifndef SPODELD_H #ifndef SPODELD_H
#define SPODELD_H #define SPODELD_H
#include "SPODE.h" #include "SPODE.h"
@ -8,9 +14,10 @@ namespace bayesnet {
public: public:
explicit SPODELd(int root); explicit SPODELd(int root);
virtual ~SPODELd() = default; virtual ~SPODELd() = default;
SPODELd& fit(torch::Tensor& X, torch::Tensor& y, const std::vector<std::string>& features, const std::string& className, map<std::string, std::vector<int>>& states) override; SPODELd& fit(torch::Tensor& X, torch::Tensor& y, const std::vector<std::string>& features, const std::string& className, map<std::string, std::vector<int>>& states, const Smoothing_t smoothing) override;
SPODELd& fit(torch::Tensor& dataset, const std::vector<std::string>& features, const std::string& className, map<std::string, std::vector<int>>& states) override; SPODELd& fit(torch::Tensor& dataset, const std::vector<std::string>& features, const std::string& className, map<std::string, std::vector<int>>& states, const Smoothing_t smoothing) override;
std::vector<std::string> graph(const std::string& name = "SPODE") const override; SPODELd& commonFit(const std::vector<std::string>& features, const std::string& className, map<std::string, std::vector<int>>& states, const Smoothing_t smoothing);
std::vector<std::string> graph(const std::string& name = "SPODELd") const override;
torch::Tensor predict(torch::Tensor& X) override; torch::Tensor predict(torch::Tensor& X) override;
static inline std::string version() { return "0.0.1"; }; static inline std::string version() { return "0.0.1"; };
}; };

View File

@ -0,0 +1,38 @@
// ***************************************************************
// SPDX-FileCopyrightText: Copyright 2024 Ricardo Montañana Gómez
// SPDX-FileType: SOURCE
// SPDX-License-Identifier: MIT
// ***************************************************************
#include "SPnDE.h"
namespace bayesnet {
SPnDE::SPnDE(std::vector<int> parents) : Classifier(Network()), parents(parents) {}
void SPnDE::buildModel(const torch::Tensor& weights)
{
// 0. Add all nodes to the model
addNodes();
std::vector<int> attributes;
for (int i = 0; i < static_cast<int>(features.size()); ++i) {
if (std::find(parents.begin(), parents.end(), i) == parents.end()) {
attributes.push_back(i);
}
}
// 1. Add edges from the class node to all other nodes
// 2. Add edges from the parents nodes to all other nodes
for (const auto& attribute : attributes) {
model.addEdge(className, features[attribute]);
for (const auto& root : parents) {
model.addEdge(features[root], features[attribute]);
}
}
}
std::vector<std::string> SPnDE::graph(const std::string& name) const
{
return model.graph(name);
}
}

View File

@ -0,0 +1,26 @@
// ***************************************************************
// SPDX-FileCopyrightText: Copyright 2024 Ricardo Montañana Gómez
// SPDX-FileType: SOURCE
// SPDX-License-Identifier: MIT
// ***************************************************************
#ifndef SPnDE_H
#define SPnDE_H
#include <vector>
#include "Classifier.h"
namespace bayesnet {
class SPnDE : public Classifier {
public:
explicit SPnDE(std::vector<int> parents);
virtual ~SPnDE() = default;
std::vector<std::string> graph(const std::string& name = "SPnDE") const override;
protected:
void buildModel(const torch::Tensor& weights) override;
private:
std::vector<int> parents;
};
}
#endif

View File

@ -1,8 +1,26 @@
// ***************************************************************
// SPDX-FileCopyrightText: Copyright 2024 Ricardo Montañana Gómez
// SPDX-FileType: SOURCE
// SPDX-License-Identifier: MIT
// ***************************************************************
#include "TAN.h" #include "TAN.h"
namespace bayesnet { namespace bayesnet {
TAN::TAN() : Classifier(Network()) {} TAN::TAN() : Classifier(Network())
{
validHyperparameters = { "parent" };
}
void TAN::setHyperparameters(const nlohmann::json& hyperparameters_)
{
auto hyperparameters = hyperparameters_;
if (hyperparameters.contains("parent")) {
parent = hyperparameters["parent"];
hyperparameters.erase("parent");
}
Classifier::setHyperparameters(hyperparameters);
}
void TAN::buildModel(const torch::Tensor& weights) void TAN::buildModel(const torch::Tensor& weights)
{ {
// 0. Add all nodes to the model // 0. Add all nodes to the model
@ -17,7 +35,10 @@ namespace bayesnet {
mi.push_back({ i, mi_value }); mi.push_back({ i, mi_value });
} }
sort(mi.begin(), mi.end(), [](const auto& left, const auto& right) {return left.second < right.second;}); sort(mi.begin(), mi.end(), [](const auto& left, const auto& right) {return left.second < right.second;});
auto root = mi[mi.size() - 1].first; auto root = parent == -1 ? mi[mi.size() - 1].first : parent;
if (root >= static_cast<int>(features.size())) {
throw std::invalid_argument("The parent node is not in the dataset");
}
// 2. Compute mutual information between each feature and the class // 2. Compute mutual information between each feature and the class
auto weights_matrix = metrics.conditionalEdge(weights); auto weights_matrix = metrics.conditionalEdge(weights);
// 3. Compute the maximum spanning tree // 3. Compute the maximum spanning tree

View File

@ -0,0 +1,23 @@
// ***************************************************************
// SPDX-FileCopyrightText: Copyright 2024 Ricardo Montañana Gómez
// SPDX-FileType: SOURCE
// SPDX-License-Identifier: MIT
// ***************************************************************
#ifndef TAN_H
#define TAN_H
#include "Classifier.h"
namespace bayesnet {
class TAN : public Classifier {
public:
TAN();
virtual ~TAN() = default;
void setHyperparameters(const nlohmann::json& hyperparameters_) override;
std::vector<std::string> graph(const std::string& name = "TAN") const override;
protected:
void buildModel(const torch::Tensor& weights) override;
private:
int parent = -1;
};
}
#endif

View File

@ -1,8 +1,14 @@
// ***************************************************************
// SPDX-FileCopyrightText: Copyright 2024 Ricardo Montañana Gómez
// SPDX-FileType: SOURCE
// SPDX-License-Identifier: MIT
// ***************************************************************
#include "TANLd.h" #include "TANLd.h"
namespace bayesnet { namespace bayesnet {
TANLd::TANLd() : TAN(), Proposal(dataset, features, className) {} TANLd::TANLd() : TAN(), Proposal(dataset, features, className) {}
TANLd& TANLd::fit(torch::Tensor& X_, torch::Tensor& y_, const std::vector<std::string>& features_, const std::string& className_, map<std::string, std::vector<int>>& states_) TANLd& TANLd::fit(torch::Tensor& X_, torch::Tensor& y_, const std::vector<std::string>& features_, const std::string& className_, map<std::string, std::vector<int>>& states_, const Smoothing_t smoothing)
{ {
checkInput(X_, y_); checkInput(X_, y_);
features = features_; features = features_;
@ -13,7 +19,7 @@ namespace bayesnet {
states = fit_local_discretization(y); states = fit_local_discretization(y);
// We have discretized the input data // We have discretized the input data
// 1st we need to fit the model to build the normal TAN structure, TAN::fit initializes the base Bayesian network // 1st we need to fit the model to build the normal TAN structure, TAN::fit initializes the base Bayesian network
TAN::fit(dataset, features, className, states); TAN::fit(dataset, features, className, states, smoothing);
states = localDiscretizationProposal(states, model); states = localDiscretizationProposal(states, model);
return *this; return *this;

View File

@ -1,3 +1,9 @@
// ***************************************************************
// SPDX-FileCopyrightText: Copyright 2024 Ricardo Montañana Gómez
// SPDX-FileType: SOURCE
// SPDX-License-Identifier: MIT
// ***************************************************************
#ifndef TANLD_H #ifndef TANLD_H
#define TANLD_H #define TANLD_H
#include "TAN.h" #include "TAN.h"
@ -9,10 +15,9 @@ namespace bayesnet {
public: public:
TANLd(); TANLd();
virtual ~TANLd() = default; virtual ~TANLd() = default;
TANLd& fit(torch::Tensor& X, torch::Tensor& y, const std::vector<std::string>& features, const std::string& className, map<std::string, std::vector<int>>& states) override; TANLd& fit(torch::Tensor& X, torch::Tensor& y, const std::vector<std::string>& features, const std::string& className, map<std::string, std::vector<int>>& states, const Smoothing_t smoothing) override;
std::vector<std::string> graph(const std::string& name = "TAN") const override; std::vector<std::string> graph(const std::string& name = "TANLd") const override;
torch::Tensor predict(torch::Tensor& X) override; torch::Tensor predict(torch::Tensor& X) override;
static inline std::string version() { return "0.0.1"; };
}; };
} }
#endif // !TANLD_H #endif // !TANLD_H

View File

@ -0,0 +1,40 @@
// ***************************************************************
// SPDX-FileCopyrightText: Copyright 2024 Ricardo Montañana Gómez
// SPDX-FileType: SOURCE
// SPDX-License-Identifier: MIT
// ***************************************************************
#include "A2DE.h"
namespace bayesnet {
A2DE::A2DE(bool predict_voting) : Ensemble(predict_voting)
{
validHyperparameters = { "predict_voting" };
}
void A2DE::setHyperparameters(const nlohmann::json& hyperparameters_)
{
auto hyperparameters = hyperparameters_;
if (hyperparameters.contains("predict_voting")) {
predict_voting = hyperparameters["predict_voting"];
hyperparameters.erase("predict_voting");
}
Classifier::setHyperparameters(hyperparameters);
}
void A2DE::buildModel(const torch::Tensor& weights)
{
models.clear();
significanceModels.clear();
for (int i = 0; i < features.size() - 1; ++i) {
for (int j = i + 1; j < features.size(); ++j) {
auto model = std::make_unique<SPnDE>(std::vector<int>({ i, j }));
models.push_back(std::move(model));
}
}
n_models = static_cast<unsigned>(models.size());
significanceModels = std::vector<double>(n_models, 1.0);
}
std::vector<std::string> A2DE::graph(const std::string& title) const
{
return Ensemble::graph(title);
}
}

22
bayesnet/ensembles/A2DE.h Normal file
View File

@ -0,0 +1,22 @@
// ***************************************************************
// SPDX-FileCopyrightText: Copyright 2024 Ricardo Montañana Gómez
// SPDX-FileType: SOURCE
// SPDX-License-Identifier: MIT
// ***************************************************************
#ifndef A2DE_H
#define A2DE_H
#include "bayesnet/classifiers/SPnDE.h"
#include "Ensemble.h"
namespace bayesnet {
class A2DE : public Ensemble {
public:
A2DE(bool predict_voting = false);
virtual ~A2DE() {};
void setHyperparameters(const nlohmann::json& hyperparameters) override;
std::vector<std::string> graph(const std::string& title = "A2DE") const override;
protected:
void buildModel(const torch::Tensor& weights) override;
};
}
#endif

View File

@ -1,3 +1,9 @@
// ***************************************************************
// SPDX-FileCopyrightText: Copyright 2024 Ricardo Montañana Gómez
// SPDX-FileType: SOURCE
// SPDX-License-Identifier: MIT
// ***************************************************************
#include "AODE.h" #include "AODE.h"
namespace bayesnet { namespace bayesnet {
@ -13,9 +19,7 @@ namespace bayesnet {
predict_voting = hyperparameters["predict_voting"]; predict_voting = hyperparameters["predict_voting"];
hyperparameters.erase("predict_voting"); hyperparameters.erase("predict_voting");
} }
if (!hyperparameters.empty()) { Classifier::setHyperparameters(hyperparameters);
throw std::invalid_argument("Invalid hyperparameters" + hyperparameters.dump());
}
} }
void AODE::buildModel(const torch::Tensor& weights) void AODE::buildModel(const torch::Tensor& weights)
{ {

View File

@ -1,11 +1,17 @@
// ***************************************************************
// SPDX-FileCopyrightText: Copyright 2024 Ricardo Montañana Gómez
// SPDX-FileType: SOURCE
// SPDX-License-Identifier: MIT
// ***************************************************************
#ifndef AODE_H #ifndef AODE_H
#define AODE_H #define AODE_H
#include "bayesnet/classifiers/SPODE.h"
#include "Ensemble.h" #include "Ensemble.h"
#include "SPODE.h"
namespace bayesnet { namespace bayesnet {
class AODE : public Ensemble { class AODE : public Ensemble {
public: public:
AODE(bool predict_voting = true); AODE(bool predict_voting = false);
virtual ~AODE() {}; virtual ~AODE() {};
void setHyperparameters(const nlohmann::json& hyperparameters) override; void setHyperparameters(const nlohmann::json& hyperparameters) override;
std::vector<std::string> graph(const std::string& title = "AODE") const override; std::vector<std::string> graph(const std::string& title = "AODE") const override;

View File

@ -1,23 +1,16 @@
// ***************************************************************
// SPDX-FileCopyrightText: Copyright 2024 Ricardo Montañana Gómez
// SPDX-FileType: SOURCE
// SPDX-License-Identifier: MIT
// ***************************************************************
#include "AODELd.h" #include "AODELd.h"
namespace bayesnet { namespace bayesnet {
AODELd::AODELd(bool predict_voting) : Ensemble(predict_voting), Proposal(dataset, features, className) AODELd::AODELd(bool predict_voting) : Ensemble(predict_voting), Proposal(dataset, features, className)
{ {
validHyperparameters = { "predict_voting" };
} }
void AODELd::setHyperparameters(const nlohmann::json& hyperparameters_) AODELd& AODELd::fit(torch::Tensor& X_, torch::Tensor& y_, const std::vector<std::string>& features_, const std::string& className_, map<std::string, std::vector<int>>& states_, const Smoothing_t smoothing)
{
auto hyperparameters = hyperparameters_;
if (hyperparameters.contains("predict_voting")) {
predict_voting = hyperparameters["predict_voting"];
hyperparameters.erase("predict_voting");
}
if (!hyperparameters.empty()) {
throw std::invalid_argument("Invalid hyperparameters" + hyperparameters.dump());
}
}
AODELd& AODELd::fit(torch::Tensor& X_, torch::Tensor& y_, const std::vector<std::string>& features_, const std::string& className_, map<std::string, std::vector<int>>& states_)
{ {
checkInput(X_, y_); checkInput(X_, y_);
features = features_; features = features_;
@ -27,8 +20,9 @@ namespace bayesnet {
// Fills std::vectors Xv & yv with the data from tensors X_ (discretized) & y // Fills std::vectors Xv & yv with the data from tensors X_ (discretized) & y
states = fit_local_discretization(y); states = fit_local_discretization(y);
// We have discretized the input data // We have discretized the input data
// 1st we need to fit the model to build the normal TAN structure, TAN::fit initializes the base Bayesian network // 1st we need to fit the model to build the normal AODE structure, Ensemble::fit
Ensemble::fit(dataset, features, className, states); // calls buildModel to initialize the base models
Ensemble::fit(dataset, features, className, states, smoothing);
return *this; return *this;
} }
@ -41,10 +35,10 @@ namespace bayesnet {
n_models = models.size(); n_models = models.size();
significanceModels = std::vector<double>(n_models, 1.0); significanceModels = std::vector<double>(n_models, 1.0);
} }
void AODELd::trainModel(const torch::Tensor& weights) void AODELd::trainModel(const torch::Tensor& weights, const Smoothing_t smoothing)
{ {
for (const auto& model : models) { for (const auto& model : models) {
model->fit(Xf, y, features, className, states); model->fit(Xf, y, features, className, states, smoothing);
} }
} }
std::vector<std::string> AODELd::graph(const std::string& name) const std::vector<std::string> AODELd::graph(const std::string& name) const

View File

@ -1,19 +1,24 @@
// ***************************************************************
// SPDX-FileCopyrightText: Copyright 2024 Ricardo Montañana Gómez
// SPDX-FileType: SOURCE
// SPDX-License-Identifier: MIT
// ***************************************************************
#ifndef AODELD_H #ifndef AODELD_H
#define AODELD_H #define AODELD_H
#include "bayesnet/classifiers/Proposal.h"
#include "bayesnet/classifiers/SPODELd.h"
#include "Ensemble.h" #include "Ensemble.h"
#include "Proposal.h"
#include "SPODELd.h"
namespace bayesnet { namespace bayesnet {
class AODELd : public Ensemble, public Proposal { class AODELd : public Ensemble, public Proposal {
public: public:
AODELd(bool predict_voting = true); AODELd(bool predict_voting = true);
virtual ~AODELd() = default; virtual ~AODELd() = default;
AODELd& fit(torch::Tensor& X_, torch::Tensor& y_, const std::vector<std::string>& features_, const std::string& className_, map<std::string, std::vector<int>>& states_) override; AODELd& fit(torch::Tensor& X_, torch::Tensor& y_, const std::vector<std::string>& features_, const std::string& className_, map<std::string, std::vector<int>>& states_, const Smoothing_t smoothing) override;
void setHyperparameters(const nlohmann::json& hyperparameters) override;
std::vector<std::string> graph(const std::string& name = "AODELd") const override; std::vector<std::string> graph(const std::string& name = "AODELd") const override;
protected: protected:
void trainModel(const torch::Tensor& weights) override; void trainModel(const torch::Tensor& weights, const Smoothing_t smoothing) override;
void buildModel(const torch::Tensor& weights) override; void buildModel(const torch::Tensor& weights) override;
}; };
} }

256
bayesnet/ensembles/Boost.cc Normal file
View File

@ -0,0 +1,256 @@
// ***************************************************************
// SPDX-FileCopyrightText: Copyright 2024 Ricardo Montañana Gómez
// SPDX-FileType: SOURCE
// SPDX-License-Identifier: MIT
// ***************************************************************
#include <folding.hpp>
#include "bayesnet/feature_selection/CFS.h"
#include "bayesnet/feature_selection/FCBF.h"
#include "bayesnet/feature_selection/IWSS.h"
#include "Boost.h"
namespace bayesnet {
Boost::Boost(bool predict_voting) : Ensemble(predict_voting)
{
validHyperparameters = { "alpha_block", "order", "convergence", "convergence_best", "bisection", "threshold", "maxTolerance",
"predict_voting", "select_features", "block_update" };
}
void Boost::setHyperparameters(const nlohmann::json& hyperparameters_)
{
auto hyperparameters = hyperparameters_;
if (hyperparameters.contains("order")) {
std::vector<std::string> algos = { Orders.ASC, Orders.DESC, Orders.RAND };
order_algorithm = hyperparameters["order"];
if (std::find(algos.begin(), algos.end(), order_algorithm) == algos.end()) {
throw std::invalid_argument("Invalid order algorithm, valid values [" + Orders.ASC + ", " + Orders.DESC + ", " + Orders.RAND + "]");
}
hyperparameters.erase("order");
}
if (hyperparameters.contains("alpha_block")) {
alpha_block = hyperparameters["alpha_block"];
hyperparameters.erase("alpha_block");
}
if (hyperparameters.contains("convergence")) {
convergence = hyperparameters["convergence"];
hyperparameters.erase("convergence");
}
if (hyperparameters.contains("convergence_best")) {
convergence_best = hyperparameters["convergence_best"];
hyperparameters.erase("convergence_best");
}
if (hyperparameters.contains("bisection")) {
bisection = hyperparameters["bisection"];
hyperparameters.erase("bisection");
}
if (hyperparameters.contains("threshold")) {
threshold = hyperparameters["threshold"];
hyperparameters.erase("threshold");
}
if (hyperparameters.contains("maxTolerance")) {
maxTolerance = hyperparameters["maxTolerance"];
if (maxTolerance < 1 || maxTolerance > 6)
throw std::invalid_argument("Invalid maxTolerance value, must be greater in [1, 6]");
hyperparameters.erase("maxTolerance");
}
if (hyperparameters.contains("predict_voting")) {
predict_voting = hyperparameters["predict_voting"];
hyperparameters.erase("predict_voting");
}
if (hyperparameters.contains("select_features")) {
auto selectedAlgorithm = hyperparameters["select_features"];
std::vector<std::string> algos = { SelectFeatures.IWSS, SelectFeatures.CFS, SelectFeatures.FCBF };
selectFeatures = true;
select_features_algorithm = selectedAlgorithm;
if (std::find(algos.begin(), algos.end(), selectedAlgorithm) == algos.end()) {
throw std::invalid_argument("Invalid selectFeatures value, valid values [" + SelectFeatures.IWSS + ", " + SelectFeatures.CFS + ", " + SelectFeatures.FCBF + "]");
}
hyperparameters.erase("select_features");
}
if (hyperparameters.contains("block_update")) {
block_update = hyperparameters["block_update"];
hyperparameters.erase("block_update");
}
if (block_update && alpha_block) {
throw std::invalid_argument("alpha_block and block_update cannot be true at the same time");
}
if (block_update && !bisection) {
throw std::invalid_argument("block_update needs bisection to be true");
}
Classifier::setHyperparameters(hyperparameters);
}
void Boost::buildModel(const torch::Tensor& weights)
{
// Models shall be built in trainModel
models.clear();
significanceModels.clear();
n_models = 0;
// Prepare the validation dataset
auto y_ = dataset.index({ -1, "..." });
if (convergence) {
// Prepare train & validation sets from train data
auto fold = folding::StratifiedKFold(5, y_, 271);
auto [train, test] = fold.getFold(0);
auto train_t = torch::tensor(train);
auto test_t = torch::tensor(test);
// Get train and validation sets
X_train = dataset.index({ torch::indexing::Slice(0, dataset.size(0) - 1), train_t });
y_train = dataset.index({ -1, train_t });
X_test = dataset.index({ torch::indexing::Slice(0, dataset.size(0) - 1), test_t });
y_test = dataset.index({ -1, test_t });
dataset = X_train;
m = X_train.size(1);
auto n_classes = states.at(className).size();
// Build dataset with train data
buildDataset(y_train);
metrics = Metrics(dataset, features, className, n_classes);
} else {
// Use all data to train
X_train = dataset.index({ torch::indexing::Slice(0, dataset.size(0) - 1), "..." });
y_train = y_;
}
}
std::vector<int> Boost::featureSelection(torch::Tensor& weights_)
{
int maxFeatures = 0;
if (select_features_algorithm == SelectFeatures.CFS) {
featureSelector = new CFS(dataset, features, className, maxFeatures, states.at(className).size(), weights_);
} else if (select_features_algorithm == SelectFeatures.IWSS) {
if (threshold < 0 || threshold >0.5) {
throw std::invalid_argument("Invalid threshold value for " + SelectFeatures.IWSS + " [0, 0.5]");
}
featureSelector = new IWSS(dataset, features, className, maxFeatures, states.at(className).size(), weights_, threshold);
} else if (select_features_algorithm == SelectFeatures.FCBF) {
if (threshold < 1e-7 || threshold > 1) {
throw std::invalid_argument("Invalid threshold value for " + SelectFeatures.FCBF + " [1e-7, 1]");
}
featureSelector = new FCBF(dataset, features, className, maxFeatures, states.at(className).size(), weights_, threshold);
}
featureSelector->fit();
auto featuresUsed = featureSelector->getFeatures();
delete featureSelector;
return featuresUsed;
}
std::tuple<torch::Tensor&, double, bool> Boost::update_weights(torch::Tensor& ytrain, torch::Tensor& ypred, torch::Tensor& weights)
{
bool terminate = false;
double alpha_t = 0;
auto mask_wrong = ypred != ytrain;
auto mask_right = ypred == ytrain;
auto masked_weights = weights * mask_wrong.to(weights.dtype());
double epsilon_t = masked_weights.sum().item<double>();
if (epsilon_t > 0.5) {
// Inverse the weights policy (plot ln(wt))
// "In each round of AdaBoost, there is a sanity check to ensure that the current base
// learner is better than random guess" (Zhi-Hua Zhou, 2012)
terminate = true;
} else {
double wt = (1 - epsilon_t) / epsilon_t;
alpha_t = epsilon_t == 0 ? 1 : 0.5 * log(wt);
// Step 3.2: Update weights for next classifier
// Step 3.2.1: Update weights of wrong samples
weights += mask_wrong.to(weights.dtype()) * exp(alpha_t) * weights;
// Step 3.2.2: Update weights of right samples
weights += mask_right.to(weights.dtype()) * exp(-alpha_t) * weights;
// Step 3.3: Normalise the weights
double totalWeights = torch::sum(weights).item<double>();
weights = weights / totalWeights;
}
return { weights, alpha_t, terminate };
}
std::tuple<torch::Tensor&, double, bool> Boost::update_weights_block(int k, torch::Tensor& ytrain, torch::Tensor& weights)
{
/* Update Block algorithm
k = # of models in block
n_models = # of models in ensemble to make predictions
n_models_bak = # models saved
models = vector of models to make predictions
models_bak = models not used to make predictions
significances_bak = backup of significances vector
Case list
A) k = 1, n_models = 1 => n = 0 , n_models = n + k
B) k = 1, n_models = n + 1 => n_models = n + k
C) k > 1, n_models = k + 1 => n= 1, n_models = n + k
D) k > 1, n_models = k => n = 0, n_models = n + k
E) k > 1, n_models = k + n => n_models = n + k
A, D) n=0, k > 0, n_models == k
1. n_models_bak <- n_models
2. significances_bak <- significances
3. significances = vector(k, 1)
4. Dont move any classifiers out of models
5. n_models <- k
6. Make prediction, compute alpha, update weights
7. Dont restore any classifiers to models
8. significances <- significances_bak
9. Update last k significances
10. n_models <- n_models_bak
B, C, E) n > 0, k > 0, n_models == n + k
1. n_models_bak <- n_models
2. significances_bak <- significances
3. significances = vector(k, 1)
4. Move first n classifiers to models_bak
5. n_models <- k
6. Make prediction, compute alpha, update weights
7. Insert classifiers in models_bak to be the first n models
8. significances <- significances_bak
9. Update last k significances
10. n_models <- n_models_bak
*/
//
// Make predict with only the last k models
//
std::unique_ptr<Classifier> model;
std::vector<std::unique_ptr<Classifier>> models_bak;
// 1. n_models_bak <- n_models 2. significances_bak <- significances
auto significance_bak = significanceModels;
auto n_models_bak = n_models;
// 3. significances = vector(k, 1)
significanceModels = std::vector<double>(k, 1.0);
// 4. Move first n classifiers to models_bak
// backup the first n_models - k models (if n_models == k, don't backup any)
for (int i = 0; i < n_models - k; ++i) {
model = std::move(models[0]);
models.erase(models.begin());
models_bak.push_back(std::move(model));
}
assert(models.size() == k);
// 5. n_models <- k
n_models = k;
// 6. Make prediction, compute alpha, update weights
auto ypred = predict(X_train);
//
// Update weights
//
double alpha_t;
bool terminate;
std::tie(weights, alpha_t, terminate) = update_weights(y_train, ypred, weights);
//
// Restore the models if needed
//
// 7. Insert classifiers in models_bak to be the first n models
// if n_models_bak == k, don't restore any, because none of them were moved
if (k != n_models_bak) {
// Insert in the same order as they were extracted
int bak_size = models_bak.size();
for (int i = 0; i < bak_size; ++i) {
model = std::move(models_bak[bak_size - 1 - i]);
models_bak.erase(models_bak.end() - 1);
models.insert(models.begin(), std::move(model));
}
}
// 8. significances <- significances_bak
significanceModels = significance_bak;
//
// Update the significance of the last k models
//
// 9. Update last k significances
for (int i = 0; i < k; ++i) {
significanceModels[n_models_bak - k + i] = alpha_t;
}
// 10. n_models <- n_models_bak
n_models = n_models_bak;
return { weights, alpha_t, terminate };
}
}

View File

@ -0,0 +1,52 @@
// ***************************************************************
// SPDX-FileCopyrightText: Copyright 2024 Ricardo Montañana Gómez
// SPDX-FileType: SOURCE
// SPDX-License-Identifier: MIT
// ***************************************************************
#ifndef BOOST_H
#define BOOST_H
#include <string>
#include <tuple>
#include <vector>
#include <nlohmann/json.hpp>
#include <torch/torch.h>
#include "Ensemble.h"
#include "bayesnet/feature_selection/FeatureSelect.h"
namespace bayesnet {
const struct {
std::string CFS = "CFS";
std::string FCBF = "FCBF";
std::string IWSS = "IWSS";
}SelectFeatures;
const struct {
std::string ASC = "asc";
std::string DESC = "desc";
std::string RAND = "rand";
}Orders;
class Boost : public Ensemble {
public:
explicit Boost(bool predict_voting = false);
virtual ~Boost() = default;
void setHyperparameters(const nlohmann::json& hyperparameters_) override;
protected:
std::vector<int> featureSelection(torch::Tensor& weights_);
void buildModel(const torch::Tensor& weights) override;
std::tuple<torch::Tensor&, double, bool> update_weights(torch::Tensor& ytrain, torch::Tensor& ypred, torch::Tensor& weights);
std::tuple<torch::Tensor&, double, bool> update_weights_block(int k, torch::Tensor& ytrain, torch::Tensor& weights);
torch::Tensor X_train, y_train, X_test, y_test;
// Hyperparameters
bool bisection = true; // if true, use bisection stratety to add k models at once to the ensemble
int maxTolerance = 3;
std::string order_algorithm; // order to process the KBest features asc, desc, rand
bool convergence = true; //if true, stop when the model does not improve
bool convergence_best = false; // wether to keep the best accuracy to the moment or the last accuracy as prior accuracy
bool selectFeatures = false; // if true, use feature selection
std::string select_features_algorithm = Orders.DESC; // Selected feature selection algorithm
FeatureSelect* featureSelector = nullptr;
double threshold = -1;
bool block_update = false; // if true, use block update algorithm, only meaningful if bisection is true
bool alpha_block = false; // if true, the alpha is computed with the ensemble built so far and the new model
};
}
#endif

View File

@ -0,0 +1,170 @@
// ***************************************************************
// SPDX-FileCopyrightText: Copyright 2024 Ricardo Montañana Gómez
// SPDX-FileType: SOURCE
// SPDX-License-Identifier: MIT
// ***************************************************************
#include <set>
#include <functional>
#include <limits.h>
#include <tuple>
#include <folding.hpp>
#include "bayesnet/feature_selection/CFS.h"
#include "bayesnet/feature_selection/FCBF.h"
#include "bayesnet/feature_selection/IWSS.h"
#include "BoostA2DE.h"
namespace bayesnet {
BoostA2DE::BoostA2DE(bool predict_voting) : Boost(predict_voting)
{
}
std::vector<int> BoostA2DE::initializeModels(const Smoothing_t smoothing)
{
torch::Tensor weights_ = torch::full({ m }, 1.0 / m, torch::kFloat64);
std::vector<int> featuresSelected = featureSelection(weights_);
if (featuresSelected.size() < 2) {
notes.push_back("No features selected in initialization");
status = ERROR;
return std::vector<int>();
}
for (int i = 0; i < featuresSelected.size() - 1; i++) {
for (int j = i + 1; j < featuresSelected.size(); j++) {
auto parents = { featuresSelected[i], featuresSelected[j] };
std::unique_ptr<Classifier> model = std::make_unique<SPnDE>(parents);
model->fit(dataset, features, className, states, weights_, smoothing);
models.push_back(std::move(model));
significanceModels.push_back(1.0); // They will be updated later in trainModel
n_models++;
}
}
notes.push_back("Used features in initialization: " + std::to_string(featuresSelected.size()) + " of " + std::to_string(features.size()) + " with " + select_features_algorithm);
return featuresSelected;
}
void BoostA2DE::trainModel(const torch::Tensor& weights, const Smoothing_t smoothing)
{
//
// Logging setup
//
// loguru::set_thread_name("BoostA2DE");
// loguru::g_stderr_verbosity = loguru::Verbosity_OFF;
// loguru::add_file("boostA2DE.log", loguru::Truncate, loguru::Verbosity_MAX);
// Algorithm based on the adaboost algorithm for classification
// as explained in Ensemble methods (Zhi-Hua Zhou, 2012)
fitted = true;
double alpha_t = 0;
torch::Tensor weights_ = torch::full({ m }, 1.0 / m, torch::kFloat64);
bool finished = false;
std::vector<int> featuresUsed;
if (selectFeatures) {
featuresUsed = initializeModels(smoothing);
if (featuresUsed.size() == 0) {
return;
}
auto ypred = predict(X_train);
std::tie(weights_, alpha_t, finished) = update_weights(y_train, ypred, weights_);
// Update significance of the models
for (int i = 0; i < n_models; ++i) {
significanceModels[i] = alpha_t;
}
if (finished) {
return;
}
}
int numItemsPack = 0; // The counter of the models inserted in the current pack
// Variables to control the accuracy finish condition
double priorAccuracy = 0.0;
double improvement = 1.0;
double convergence_threshold = 1e-4;
int tolerance = 0; // number of times the accuracy is lower than the convergence_threshold
// Step 0: Set the finish condition
// epsilon sub t > 0.5 => inverse the weights policy
// validation error is not decreasing
// run out of features
bool ascending = order_algorithm == Orders.ASC;
std::mt19937 g{ 173 };
std::vector<std::pair<int, int>> pairSelection;
while (!finished) {
// Step 1: Build ranking with mutual information
pairSelection = metrics.SelectKPairs(weights_, featuresUsed, ascending, 0); // Get all the pairs sorted
if (order_algorithm == Orders.RAND) {
std::shuffle(pairSelection.begin(), pairSelection.end(), g);
}
int k = bisection ? pow(2, tolerance) : 1;
int counter = 0; // The model counter of the current pack
// VLOG_SCOPE_F(1, "counter=%d k=%d featureSelection.size: %zu", counter, k, featureSelection.size());
while (counter++ < k && pairSelection.size() > 0) {
auto feature_pair = pairSelection[0];
pairSelection.erase(pairSelection.begin());
std::unique_ptr<Classifier> model;
model = std::make_unique<SPnDE>(std::vector<int>({ feature_pair.first, feature_pair.second }));
model->fit(dataset, features, className, states, weights_, smoothing);
alpha_t = 0.0;
if (!block_update) {
auto ypred = model->predict(X_train);
// Step 3.1: Compute the classifier amout of say
std::tie(weights_, alpha_t, finished) = update_weights(y_train, ypred, weights_);
}
// Step 3.4: Store classifier and its accuracy to weigh its future vote
numItemsPack++;
models.push_back(std::move(model));
significanceModels.push_back(alpha_t);
n_models++;
// VLOG_SCOPE_F(2, "numItemsPack: %d n_models: %d featuresUsed: %zu", numItemsPack, n_models, featuresUsed.size());
}
if (block_update) {
std::tie(weights_, alpha_t, finished) = update_weights_block(k, y_train, weights_);
}
if (convergence && !finished) {
auto y_val_predict = predict(X_test);
double accuracy = (y_val_predict == y_test).sum().item<double>() / (double)y_test.size(0);
if (priorAccuracy == 0) {
priorAccuracy = accuracy;
} else {
improvement = accuracy - priorAccuracy;
}
if (improvement < convergence_threshold) {
// VLOG_SCOPE_F(3, " (improvement<threshold) tolerance: %d numItemsPack: %d improvement: %f prior: %f current: %f", tolerance, numItemsPack, improvement, priorAccuracy, accuracy);
tolerance++;
} else {
// VLOG_SCOPE_F(3, "* (improvement>=threshold) Reset. tolerance: %d numItemsPack: %d improvement: %f prior: %f current: %f", tolerance, numItemsPack, improvement, priorAccuracy, accuracy);
tolerance = 0; // Reset the counter if the model performs better
numItemsPack = 0;
}
if (convergence_best) {
// Keep the best accuracy until now as the prior accuracy
priorAccuracy = std::max(accuracy, priorAccuracy);
} else {
// Keep the last accuray obtained as the prior accuracy
priorAccuracy = accuracy;
}
}
// VLOG_SCOPE_F(1, "tolerance: %d featuresUsed.size: %zu features.size: %zu", tolerance, featuresUsed.size(), features.size());
finished = finished || tolerance > maxTolerance || pairSelection.size() == 0;
}
if (tolerance > maxTolerance) {
if (numItemsPack < n_models) {
notes.push_back("Convergence threshold reached & " + std::to_string(numItemsPack) + " models eliminated");
// VLOG_SCOPE_F(4, "Convergence threshold reached & %d models eliminated of %d", numItemsPack, n_models);
for (int i = 0; i < numItemsPack; ++i) {
significanceModels.pop_back();
models.pop_back();
n_models--;
}
} else {
notes.push_back("Convergence threshold reached & 0 models eliminated");
// VLOG_SCOPE_F(4, "Convergence threshold reached & 0 models eliminated n_models=%d numItemsPack=%d", n_models, numItemsPack);
}
}
if (pairSelection.size() > 0) {
notes.push_back("Pairs not used in train: " + std::to_string(pairSelection.size()));
status = WARNING;
}
notes.push_back("Number of models: " + std::to_string(n_models));
}
std::vector<std::string> BoostA2DE::graph(const std::string& title) const
{
return Ensemble::graph(title);
}
}

View File

@ -0,0 +1,25 @@
// ***************************************************************
// SPDX-FileCopyrightText: Copyright 2024 Ricardo Montañana Gómez
// SPDX-FileType: SOURCE
// SPDX-License-Identifier: MIT
// ***************************************************************
#ifndef BOOSTA2DE_H
#define BOOSTA2DE_H
#include <string>
#include <vector>
#include "bayesnet/classifiers/SPnDE.h"
#include "Boost.h"
namespace bayesnet {
class BoostA2DE : public Boost {
public:
explicit BoostA2DE(bool predict_voting = false);
virtual ~BoostA2DE() = default;
std::vector<std::string> graph(const std::string& title = "BoostA2DE") const override;
protected:
void trainModel(const torch::Tensor& weights, const Smoothing_t smoothing) override;
private:
std::vector<int> initializeModels(const Smoothing_t smoothing);
};
}
#endif

View File

@ -0,0 +1,179 @@
// ***************************************************************
// SPDX-FileCopyrightText: Copyright 2024 Ricardo Montañana Gómez
// SPDX-FileType: SOURCE
// SPDX-License-Identifier: MIT
// ***************************************************************
#include <random>
#include <set>
#include <functional>
#include <limits.h>
#include <tuple>
#include "BoostAODE.h"
namespace bayesnet {
BoostAODE::BoostAODE(bool predict_voting) : Boost(predict_voting)
{
}
std::vector<int> BoostAODE::initializeModels(const Smoothing_t smoothing)
{
torch::Tensor weights_ = torch::full({ m }, 1.0 / m, torch::kFloat64);
std::vector<int> featuresSelected = featureSelection(weights_);
for (const int& feature : featuresSelected) {
std::unique_ptr<Classifier> model = std::make_unique<SPODE>(feature);
model->fit(dataset, features, className, states, weights_, smoothing);
models.push_back(std::move(model));
significanceModels.push_back(1.0); // They will be updated later in trainModel
n_models++;
}
notes.push_back("Used features in initialization: " + std::to_string(featuresSelected.size()) + " of " + std::to_string(features.size()) + " with " + select_features_algorithm);
return featuresSelected;
}
void BoostAODE::trainModel(const torch::Tensor& weights, const Smoothing_t smoothing)
{
//
// Logging setup
//
// loguru::set_thread_name("BoostAODE");
// loguru::g_stderr_verbosity = loguru::Verbosity_OFF;
// loguru::add_file("boostAODE.log", loguru::Truncate, loguru::Verbosity_MAX);
// Algorithm based on the adaboost algorithm for classification
// as explained in Ensemble methods (Zhi-Hua Zhou, 2012)
fitted = true;
double alpha_t = 0;
torch::Tensor weights_ = torch::full({ m }, 1.0 / m, torch::kFloat64);
bool finished = false;
std::vector<int> featuresUsed;
if (selectFeatures) {
featuresUsed = initializeModels(smoothing);
auto ypred = predict(X_train);
std::tie(weights_, alpha_t, finished) = update_weights(y_train, ypred, weights_);
// Update significance of the models
for (int i = 0; i < n_models; ++i) {
significanceModels[i] = alpha_t;
}
if (finished) {
return;
}
}
int numItemsPack = 0; // The counter of the models inserted in the current pack
// Variables to control the accuracy finish condition
double priorAccuracy = 0.0;
double improvement = 1.0;
double convergence_threshold = 1e-4;
int tolerance = 0; // number of times the accuracy is lower than the convergence_threshold
// Step 0: Set the finish condition
// epsilon sub t > 0.5 => inverse the weights policy
// validation error is not decreasing
// run out of features
bool ascending = order_algorithm == Orders.ASC;
std::mt19937 g{ 173 };
while (!finished) {
// Step 1: Build ranking with mutual information
auto featureSelection = metrics.SelectKBestWeighted(weights_, ascending, n); // Get all the features sorted
if (order_algorithm == Orders.RAND) {
std::shuffle(featureSelection.begin(), featureSelection.end(), g);
}
// Remove used features
featureSelection.erase(remove_if(begin(featureSelection), end(featureSelection), [&](auto x)
{ return std::find(begin(featuresUsed), end(featuresUsed), x) != end(featuresUsed);}),
end(featureSelection)
);
int k = bisection ? pow(2, tolerance) : 1;
int counter = 0; // The model counter of the current pack
// VLOG_SCOPE_F(1, "counter=%d k=%d featureSelection.size: %zu", counter, k, featureSelection.size());
while (counter++ < k && featureSelection.size() > 0) {
auto feature = featureSelection[0];
featureSelection.erase(featureSelection.begin());
std::unique_ptr<Classifier> model;
model = std::make_unique<SPODE>(feature);
model->fit(dataset, features, className, states, weights_, smoothing);
alpha_t = 0.0;
if (!block_update) {
torch::Tensor ypred;
if (alpha_block) {
//
// Compute the prediction with the current ensemble + model
//
// Add the model to the ensemble
n_models++;
models.push_back(std::move(model));
significanceModels.push_back(1);
// Compute the prediction
ypred = predict(X_train);
// Remove the model from the ensemble
model = std::move(models.back());
models.pop_back();
significanceModels.pop_back();
n_models--;
} else {
ypred = model->predict(X_train);
}
// Step 3.1: Compute the classifier amout of say
std::tie(weights_, alpha_t, finished) = update_weights(y_train, ypred, weights_);
}
// Step 3.4: Store classifier and its accuracy to weigh its future vote
numItemsPack++;
featuresUsed.push_back(feature);
models.push_back(std::move(model));
significanceModels.push_back(alpha_t);
n_models++;
// VLOG_SCOPE_F(2, "numItemsPack: %d n_models: %d featuresUsed: %zu", numItemsPack, n_models, featuresUsed.size());
}
if (block_update) {
std::tie(weights_, alpha_t, finished) = update_weights_block(k, y_train, weights_);
}
if (convergence && !finished) {
auto y_val_predict = predict(X_test);
double accuracy = (y_val_predict == y_test).sum().item<double>() / (double)y_test.size(0);
if (priorAccuracy == 0) {
priorAccuracy = accuracy;
} else {
improvement = accuracy - priorAccuracy;
}
if (improvement < convergence_threshold) {
// VLOG_SCOPE_F(3, " (improvement<threshold) tolerance: %d numItemsPack: %d improvement: %f prior: %f current: %f", tolerance, numItemsPack, improvement, priorAccuracy, accuracy);
tolerance++;
} else {
// VLOG_SCOPE_F(3, "* (improvement>=threshold) Reset. tolerance: %d numItemsPack: %d improvement: %f prior: %f current: %f", tolerance, numItemsPack, improvement, priorAccuracy, accuracy);
tolerance = 0; // Reset the counter if the model performs better
numItemsPack = 0;
}
if (convergence_best) {
// Keep the best accuracy until now as the prior accuracy
priorAccuracy = std::max(accuracy, priorAccuracy);
} else {
// Keep the last accuray obtained as the prior accuracy
priorAccuracy = accuracy;
}
}
// VLOG_SCOPE_F(1, "tolerance: %d featuresUsed.size: %zu features.size: %zu", tolerance, featuresUsed.size(), features.size());
finished = finished || tolerance > maxTolerance || featuresUsed.size() == features.size();
}
if (tolerance > maxTolerance) {
if (numItemsPack < n_models) {
notes.push_back("Convergence threshold reached & " + std::to_string(numItemsPack) + " models eliminated");
// VLOG_SCOPE_F(4, "Convergence threshold reached & %d models eliminated of %d", numItemsPack, n_models);
for (int i = 0; i < numItemsPack; ++i) {
significanceModels.pop_back();
models.pop_back();
n_models--;
}
} else {
notes.push_back("Convergence threshold reached & 0 models eliminated");
// VLOG_SCOPE_F(4, "Convergence threshold reached & 0 models eliminated n_models=%d numItemsPack=%d", n_models, numItemsPack);
}
}
if (featuresUsed.size() != features.size()) {
notes.push_back("Used features in train: " + std::to_string(featuresUsed.size()) + " of " + std::to_string(features.size()));
status = WARNING;
}
notes.push_back("Number of models: " + std::to_string(n_models));
}
std::vector<std::string> BoostAODE::graph(const std::string& title) const
{
return Ensemble::graph(title);
}
}

View File

@ -0,0 +1,26 @@
// ***************************************************************
// SPDX-FileCopyrightText: Copyright 2024 Ricardo Montañana Gómez
// SPDX-FileType: SOURCE
// SPDX-License-Identifier: MIT
// ***************************************************************
#ifndef BOOSTAODE_H
#define BOOSTAODE_H
#include <string>
#include <vector>
#include "bayesnet/classifiers/SPODE.h"
#include "Boost.h"
namespace bayesnet {
class BoostAODE : public Boost {
public:
explicit BoostAODE(bool predict_voting = false);
virtual ~BoostAODE() = default;
std::vector<std::string> graph(const std::string& title = "BoostAODE") const override;
protected:
void trainModel(const torch::Tensor& weights, const Smoothing_t smoothing) override;
private:
std::vector<int> initializeModels(const Smoothing_t smoothing);
};
}
#endif

View File

@ -1,18 +1,23 @@
// ***************************************************************
// SPDX-FileCopyrightText: Copyright 2024 Ricardo Montañana Gómez
// SPDX-FileType: SOURCE
// SPDX-License-Identifier: MIT
// ***************************************************************
#include "Ensemble.h" #include "Ensemble.h"
#include "bayesnet/utils/CountingSemaphore.h"
namespace bayesnet { namespace bayesnet {
Ensemble::Ensemble(bool predict_voting) : Classifier(Network()), n_models(0), predict_voting(predict_voting) Ensemble::Ensemble(bool predict_voting) : Classifier(Network()), n_models(0), predict_voting(predict_voting)
{ {
}; };
const std::string ENSEMBLE_NOT_FITTED = "Ensemble has not been fitted"; const std::string ENSEMBLE_NOT_FITTED = "Ensemble has not been fitted";
void Ensemble::trainModel(const torch::Tensor& weights) void Ensemble::trainModel(const torch::Tensor& weights, const Smoothing_t smoothing)
{ {
n_models = models.size(); n_models = models.size();
for (auto i = 0; i < n_models; ++i) { for (auto i = 0; i < n_models; ++i) {
// fit with std::vectors // fit with std::vectors
models[i]->fit(dataset, features, className, states); models[i]->fit(dataset, features, className, states, smoothing);
} }
} }
std::vector<int> Ensemble::compute_arg_max(std::vector<std::vector<double>>& X) std::vector<int> Ensemble::compute_arg_max(std::vector<std::vector<double>>& X)
@ -79,17 +84,9 @@ namespace bayesnet {
{ {
auto n_states = models[0]->getClassNumStates(); auto n_states = models[0]->getClassNumStates();
torch::Tensor y_pred = torch::zeros({ X.size(1), n_states }, torch::kFloat32); torch::Tensor y_pred = torch::zeros({ X.size(1), n_states }, torch::kFloat32);
auto threads{ std::vector<std::thread>() };
std::mutex mtx;
for (auto i = 0; i < n_models; ++i) { for (auto i = 0; i < n_models; ++i) {
threads.push_back(std::thread([&, i]() { auto ypredict = models[i]->predict_proba(X);
auto ypredict = models[i]->predict_proba(X); y_pred += ypredict * significanceModels[i];
std::lock_guard<std::mutex> lock(mtx);
y_pred += ypredict * significanceModels[i];
}));
}
for (auto& thread : threads) {
thread.join();
} }
auto sum = std::reduce(significanceModels.begin(), significanceModels.end()); auto sum = std::reduce(significanceModels.begin(), significanceModels.end());
y_pred /= sum; y_pred /= sum;
@ -99,23 +96,15 @@ namespace bayesnet {
{ {
auto n_states = models[0]->getClassNumStates(); auto n_states = models[0]->getClassNumStates();
std::vector<std::vector<double>> y_pred(X[0].size(), std::vector<double>(n_states, 0.0)); std::vector<std::vector<double>> y_pred(X[0].size(), std::vector<double>(n_states, 0.0));
auto threads{ std::vector<std::thread>() };
std::mutex mtx;
for (auto i = 0; i < n_models; ++i) { for (auto i = 0; i < n_models; ++i) {
threads.push_back(std::thread([&, i]() { auto ypredict = models[i]->predict_proba(X);
auto ypredict = models[i]->predict_proba(X); assert(ypredict.size() == y_pred.size());
assert(ypredict.size() == y_pred.size()); assert(ypredict[0].size() == y_pred[0].size());
assert(ypredict[0].size() == y_pred[0].size()); // Multiply each prediction by the significance of the model and then add it to the final prediction
std::lock_guard<std::mutex> lock(mtx); for (auto j = 0; j < ypredict.size(); ++j) {
// Multiply each prediction by the significance of the model and then add it to the final prediction std::transform(y_pred[j].begin(), y_pred[j].end(), ypredict[j].begin(), y_pred[j].begin(),
for (auto j = 0; j < ypredict.size(); ++j) { [significanceModels = significanceModels[i]](double x, double y) { return x + y * significanceModels; });
std::transform(y_pred[j].begin(), y_pred[j].end(), ypredict[j].begin(), y_pred[j].begin(), }
[significanceModels = significanceModels[i]](double x, double y) { return x + y * significanceModels; });
}
}));
}
for (auto& thread : threads) {
thread.join();
} }
auto sum = std::reduce(significanceModels.begin(), significanceModels.end()); auto sum = std::reduce(significanceModels.begin(), significanceModels.end());
//Divide each element of the prediction by the sum of the significances //Divide each element of the prediction by the sum of the significances
@ -135,17 +124,9 @@ namespace bayesnet {
{ {
// Build a m x n_models tensor with the predictions of each model // Build a m x n_models tensor with the predictions of each model
torch::Tensor y_pred = torch::zeros({ X.size(1), n_models }, torch::kInt32); torch::Tensor y_pred = torch::zeros({ X.size(1), n_models }, torch::kInt32);
auto threads{ std::vector<std::thread>() };
std::mutex mtx;
for (auto i = 0; i < n_models; ++i) { for (auto i = 0; i < n_models; ++i) {
threads.push_back(std::thread([&, i]() { auto ypredict = models[i]->predict(X);
auto ypredict = models[i]->predict(X); y_pred.index_put_({ "...", i }, ypredict);
std::lock_guard<std::mutex> lock(mtx);
y_pred.index_put_({ "...", i }, ypredict);
}));
}
for (auto& thread : threads) {
thread.join();
} }
return voting(y_pred); return voting(y_pred);
} }

View File

@ -1,9 +1,15 @@
// ***************************************************************
// SPDX-FileCopyrightText: Copyright 2024 Ricardo Montañana Gómez
// SPDX-FileType: SOURCE
// SPDX-License-Identifier: MIT
// ***************************************************************
#ifndef ENSEMBLE_H #ifndef ENSEMBLE_H
#define ENSEMBLE_H #define ENSEMBLE_H
#include <torch/torch.h> #include <torch/torch.h>
#include "Classifier.h" #include "bayesnet/utils/BayesMetrics.h"
#include "BayesMetrics.h" #include "bayesnet/utils/bayesnetUtils.h"
#include "bayesnetUtils.h" #include "bayesnet/classifiers/Classifier.h"
namespace bayesnet { namespace bayesnet {
class Ensemble : public Classifier { class Ensemble : public Classifier {
@ -25,8 +31,9 @@ namespace bayesnet {
{ {
return std::vector<std::string>(); return std::vector<std::string>();
} }
void dump_cpt() const override std::string dump_cpt() const override
{ {
return "";
} }
protected: protected:
torch::Tensor predict_average_voting(torch::Tensor& X); torch::Tensor predict_average_voting(torch::Tensor& X);
@ -39,7 +46,7 @@ namespace bayesnet {
unsigned n_models; unsigned n_models;
std::vector<std::unique_ptr<Classifier>> models; std::vector<std::unique_ptr<Classifier>> models;
std::vector<double> significanceModels; std::vector<double> significanceModels;
void trainModel(const torch::Tensor& weights) override; void trainModel(const torch::Tensor& weights, const Smoothing_t smoothing) override;
bool predict_voting; bool predict_voting;
}; };
} }

View File

@ -1,6 +1,12 @@
#include "CFS.h" // ***************************************************************
// SPDX-FileCopyrightText: Copyright 2024 Ricardo Montañana Gómez
// SPDX-FileType: SOURCE
// SPDX-License-Identifier: MIT
// ***************************************************************
#include <limits> #include <limits>
#include "bayesnetUtils.h" #include "bayesnet/utils/bayesnetUtils.h"
#include "CFS.h"
namespace bayesnet { namespace bayesnet {
void CFS::fit() void CFS::fit()
{ {
@ -11,7 +17,7 @@ namespace bayesnet {
auto feature = featureOrder[0]; auto feature = featureOrder[0];
selectedFeatures.push_back(feature); selectedFeatures.push_back(feature);
selectedScores.push_back(suLabels[feature]); selectedScores.push_back(suLabels[feature]);
selectedFeatures.erase(selectedFeatures.begin()); featureOrder.erase(featureOrder.begin());
while (continueCondition) { while (continueCondition) {
double merit = std::numeric_limits<double>::lowest(); double merit = std::numeric_limits<double>::lowest();
int bestFeature = -1; int bestFeature = -1;

View File

@ -1,8 +1,14 @@
// ***************************************************************
// SPDX-FileCopyrightText: Copyright 2024 Ricardo Montañana Gómez
// SPDX-FileType: SOURCE
// SPDX-License-Identifier: MIT
// ***************************************************************
#ifndef CFS_H #ifndef CFS_H
#define CFS_H #define CFS_H
#include <torch/torch.h> #include <torch/torch.h>
#include <vector> #include <vector>
#include "FeatureSelect.h" #include "bayesnet/feature_selection/FeatureSelect.h"
namespace bayesnet { namespace bayesnet {
class CFS : public FeatureSelect { class CFS : public FeatureSelect {
public: public:

View File

@ -1,4 +1,10 @@
#include "bayesnetUtils.h" // ***************************************************************
// SPDX-FileCopyrightText: Copyright 2024 Ricardo Montañana Gómez
// SPDX-FileType: SOURCE
// SPDX-License-Identifier: MIT
// ***************************************************************
#include "bayesnet/utils/bayesnetUtils.h"
#include "FCBF.h" #include "FCBF.h"
namespace bayesnet { namespace bayesnet {

View File

@ -1,8 +1,14 @@
// ***************************************************************
// SPDX-FileCopyrightText: Copyright 2024 Ricardo Montañana Gómez
// SPDX-FileType: SOURCE
// SPDX-License-Identifier: MIT
// ***************************************************************
#ifndef FCBF_H #ifndef FCBF_H
#define FCBF_H #define FCBF_H
#include <torch/torch.h> #include <torch/torch.h>
#include <vector> #include <vector>
#include "FeatureSelect.h" #include "bayesnet/feature_selection/FeatureSelect.h"
namespace bayesnet { namespace bayesnet {
class FCBF : public FeatureSelect { class FCBF : public FeatureSelect {
public: public:

View File

@ -1,6 +1,12 @@
#include "FeatureSelect.h" // ***************************************************************
// SPDX-FileCopyrightText: Copyright 2024 Ricardo Montañana Gómez
// SPDX-FileType: SOURCE
// SPDX-License-Identifier: MIT
// ***************************************************************
#include <limits> #include <limits>
#include "bayesnetUtils.h" #include "bayesnet/utils/bayesnetUtils.h"
#include "FeatureSelect.h"
namespace bayesnet { namespace bayesnet {
FeatureSelect::FeatureSelect(const torch::Tensor& samples, const std::vector<std::string>& features, const std::string& className, const int maxFeatures, const int classNumStates, const torch::Tensor& weights) : FeatureSelect::FeatureSelect(const torch::Tensor& samples, const std::vector<std::string>& features, const std::string& className, const int maxFeatures, const int classNumStates, const torch::Tensor& weights) :
Metrics(samples, features, className, classNumStates), maxFeatures(maxFeatures == 0 ? samples.size(0) - 1 : maxFeatures), weights(weights) Metrics(samples, features, className, classNumStates), maxFeatures(maxFeatures == 0 ? samples.size(0) - 1 : maxFeatures), weights(weights)
@ -50,7 +56,6 @@ namespace bayesnet {
} }
double FeatureSelect::computeMeritCFS() double FeatureSelect::computeMeritCFS()
{ {
double result;
double rcf = 0; double rcf = 0;
for (auto feature : selectedFeatures) { for (auto feature : selectedFeatures) {
rcf += suLabels[feature]; rcf += suLabels[feature];

View File

@ -1,8 +1,14 @@
// ***************************************************************
// SPDX-FileCopyrightText: Copyright 2024 Ricardo Montañana Gómez
// SPDX-FileType: SOURCE
// SPDX-License-Identifier: MIT
// ***************************************************************
#ifndef FEATURE_SELECT_H #ifndef FEATURE_SELECT_H
#define FEATURE_SELECT_H #define FEATURE_SELECT_H
#include <torch/torch.h> #include <torch/torch.h>
#include <vector> #include <vector>
#include "BayesMetrics.h" #include "bayesnet/utils/BayesMetrics.h"
namespace bayesnet { namespace bayesnet {
class FeatureSelect : public Metrics { class FeatureSelect : public Metrics {
public: public:

View File

@ -1,6 +1,12 @@
#include "IWSS.h" // ***************************************************************
// SPDX-FileCopyrightText: Copyright 2024 Ricardo Montañana Gómez
// SPDX-FileType: SOURCE
// SPDX-License-Identifier: MIT
// ***************************************************************
#include <limits> #include <limits>
#include "bayesnetUtils.h" #include "bayesnet/utils/bayesnetUtils.h"
#include "IWSS.h"
namespace bayesnet { namespace bayesnet {
IWSS::IWSS(const torch::Tensor& samples, const std::vector<std::string>& features, const std::string& className, const int maxFeatures, const int classNumStates, const torch::Tensor& weights, const double threshold) : IWSS::IWSS(const torch::Tensor& samples, const std::vector<std::string>& features, const std::string& className, const int maxFeatures, const int classNumStates, const torch::Tensor& weights, const double threshold) :
FeatureSelect(samples, features, className, maxFeatures, classNumStates, weights), threshold(threshold) FeatureSelect(samples, features, className, maxFeatures, classNumStates, weights), threshold(threshold)
@ -28,7 +34,7 @@ namespace bayesnet {
selectedFeatures.push_back(feature); selectedFeatures.push_back(feature);
// Compute merit with selectedFeatures // Compute merit with selectedFeatures
auto meritNew = computeMeritCFS(); auto meritNew = computeMeritCFS();
double delta = merit != 0.0 ? abs(merit - meritNew) / merit : 0.0; double delta = merit != 0.0 ? std::abs(merit - meritNew) / merit : 0.0;
if (meritNew > merit || delta < threshold) { if (meritNew > merit || delta < threshold) {
if (meritNew > merit) { if (meritNew > merit) {
merit = meritNew; merit = meritNew;

View File

@ -1,7 +1,13 @@
// ***************************************************************
// SPDX-FileCopyrightText: Copyright 2024 Ricardo Montañana Gómez
// SPDX-FileType: SOURCE
// SPDX-License-Identifier: MIT
// ***************************************************************
#ifndef IWSS_H #ifndef IWSS_H
#define IWSS_H #define IWSS_H
#include <torch/torch.h>
#include <vector> #include <vector>
#include <torch/torch.h>
#include "FeatureSelect.h" #include "FeatureSelect.h"
namespace bayesnet { namespace bayesnet {
class IWSS : public FeatureSelect { class IWSS : public FeatureSelect {

View File

@ -1,36 +1,49 @@
// ***************************************************************
// SPDX-FileCopyrightText: Copyright 2024 Ricardo Montañana Gómez
// SPDX-FileType: SOURCE
// SPDX-License-Identifier: MIT
// ***************************************************************
#include <thread> #include <thread>
#include <mutex> #include <sstream>
#include <numeric>
#include <algorithm>
#include "Network.h" #include "Network.h"
#include "bayesnetUtils.h" #include "bayesnet/utils/bayesnetUtils.h"
#include "bayesnet/utils/CountingSemaphore.h"
#include <pthread.h>
#include <fstream>
namespace bayesnet { namespace bayesnet {
Network::Network() : features(std::vector<std::string>()), className(""), classNumStates(0), fitted(false), laplaceSmoothing(0) {} Network::Network() : fitted{ false }, classNumStates{ 0 }
Network::Network(float maxT) : features(std::vector<std::string>()), className(""), classNumStates(0), maxThreads(maxT), fitted(false), laplaceSmoothing(0) {}
Network::Network(Network& other) : laplaceSmoothing(other.laplaceSmoothing), features(other.features), className(other.className), classNumStates(other.getClassNumStates()), maxThreads(other.
getmaxThreads()), fitted(other.fitted)
{ {
}
Network::Network(const Network& other) : features(other.features), className(other.className), classNumStates(other.getClassNumStates()),
fitted(other.fitted), samples(other.samples)
{
if (samples.defined())
samples = samples.clone();
for (const auto& node : other.nodes) { for (const auto& node : other.nodes) {
nodes[node.first] = std::make_unique<Node>(*node.second); nodes[node.first] = std::make_unique<Node>(*node.second);
} }
} }
void Network::initialize() void Network::initialize()
{ {
features = std::vector<std::string>(); features.clear();
className = ""; className = "";
classNumStates = 0; classNumStates = 0;
fitted = false; fitted = false;
nodes.clear(); nodes.clear();
samples = torch::Tensor(); samples = torch::Tensor();
} }
float Network::getmaxThreads()
{
return maxThreads;
}
torch::Tensor& Network::getSamples() torch::Tensor& Network::getSamples()
{ {
return samples; return samples;
} }
void Network::addNode(const std::string& name) void Network::addNode(const std::string& name)
{ {
if (fitted) {
throw std::invalid_argument("Cannot add node to a fitted network. Initialize first.");
}
if (name == "") { if (name == "") {
throw std::invalid_argument("Node name cannot be empty"); throw std::invalid_argument("Node name cannot be empty");
} }
@ -71,7 +84,7 @@ namespace bayesnet {
for (Node* child : nodes[nodeId]->getChildren()) { for (Node* child : nodes[nodeId]->getChildren()) {
if (visited.find(child->getName()) == visited.end() && isCyclic(child->getName(), visited, recStack)) if (visited.find(child->getName()) == visited.end() && isCyclic(child->getName(), visited, recStack))
return true; return true;
else if (recStack.find(child->getName()) != recStack.end()) if (recStack.find(child->getName()) != recStack.end())
return true; return true;
} }
} }
@ -80,12 +93,21 @@ namespace bayesnet {
} }
void Network::addEdge(const std::string& parent, const std::string& child) void Network::addEdge(const std::string& parent, const std::string& child)
{ {
if (fitted) {
throw std::invalid_argument("Cannot add edge to a fitted network. Initialize first.");
}
if (nodes.find(parent) == nodes.end()) { if (nodes.find(parent) == nodes.end()) {
throw std::invalid_argument("Parent node " + parent + " does not exist"); throw std::invalid_argument("Parent node " + parent + " does not exist");
} }
if (nodes.find(child) == nodes.end()) { if (nodes.find(child) == nodes.end()) {
throw std::invalid_argument("Child node " + child + " does not exist"); throw std::invalid_argument("Child node " + child + " does not exist");
} }
// Check if the edge is already in the graph
for (auto& node : nodes[parent]->getChildren()) {
if (node->getName() == child) {
throw std::invalid_argument("Edge " + parent + " -> " + child + " already exists");
}
}
// Temporarily add edge to check for cycles // Temporarily add edge to check for cycles
nodes[parent]->addChild(nodes[child].get()); nodes[parent]->addChild(nodes[child].get());
nodes[child]->addParent(nodes[parent].get()); nodes[child]->addParent(nodes[parent].get());
@ -114,11 +136,14 @@ namespace bayesnet {
if (n_features != featureNames.size()) { if (n_features != featureNames.size()) {
throw std::invalid_argument("X and features must have the same number of features in Network::fit (" + std::to_string(n_features) + " != " + std::to_string(featureNames.size()) + ")"); throw std::invalid_argument("X and features must have the same number of features in Network::fit (" + std::to_string(n_features) + " != " + std::to_string(featureNames.size()) + ")");
} }
if (features.size() == 0) {
throw std::invalid_argument("The network has not been initialized. You must call addNode() before calling fit()");
}
if (n_features != features.size() - 1) { if (n_features != features.size() - 1) {
throw std::invalid_argument("X and local features must have the same number of features in Network::fit (" + std::to_string(n_features) + " != " + std::to_string(features.size() - 1) + ")"); throw std::invalid_argument("X and local features must have the same number of features in Network::fit (" + std::to_string(n_features) + " != " + std::to_string(features.size() - 1) + ")");
} }
if (find(features.begin(), features.end(), className) == features.end()) { if (find(features.begin(), features.end(), className) == features.end()) {
throw std::invalid_argument("className not found in Network::features"); throw std::invalid_argument("Class Name not found in Network::features");
} }
for (auto& feature : featureNames) { for (auto& feature : featureNames) {
if (find(features.begin(), features.end(), feature) == features.end()) { if (find(features.begin(), features.end(), feature) == features.end()) {
@ -138,7 +163,7 @@ namespace bayesnet {
classNumStates = nodes.at(className)->getNumStates(); classNumStates = nodes.at(className)->getNumStates();
} }
// X comes in nxm, where n is the number of features and m the number of samples // X comes in nxm, where n is the number of features and m the number of samples
void Network::fit(const torch::Tensor& X, const torch::Tensor& y, const torch::Tensor& weights, const std::vector<std::string>& featureNames, const std::string& className, const std::map<std::string, std::vector<int>>& states) void Network::fit(const torch::Tensor& X, const torch::Tensor& y, const torch::Tensor& weights, const std::vector<std::string>& featureNames, const std::string& className, const std::map<std::string, std::vector<int>>& states, const Smoothing_t smoothing)
{ {
checkFitData(X.size(1), X.size(0), y.size(0), featureNames, className, states, weights); checkFitData(X.size(1), X.size(0), y.size(0), featureNames, className, states, weights);
this->className = className; this->className = className;
@ -147,17 +172,17 @@ namespace bayesnet {
for (int i = 0; i < featureNames.size(); ++i) { for (int i = 0; i < featureNames.size(); ++i) {
auto row_feature = X.index({ i, "..." }); auto row_feature = X.index({ i, "..." });
} }
completeFit(states, weights); completeFit(states, weights, smoothing);
} }
void Network::fit(const torch::Tensor& samples, const torch::Tensor& weights, const std::vector<std::string>& featureNames, const std::string& className, const std::map<std::string, std::vector<int>>& states) void Network::fit(const torch::Tensor& samples, const torch::Tensor& weights, const std::vector<std::string>& featureNames, const std::string& className, const std::map<std::string, std::vector<int>>& states, const Smoothing_t smoothing)
{ {
checkFitData(samples.size(1), samples.size(0) - 1, samples.size(1), featureNames, className, states, weights); checkFitData(samples.size(1), samples.size(0) - 1, samples.size(1), featureNames, className, states, weights);
this->className = className; this->className = className;
this->samples = samples; this->samples = samples;
completeFit(states, weights); completeFit(states, weights, smoothing);
} }
// input_data comes in nxm, where n is the number of features and m the number of samples // input_data comes in nxm, where n is the number of features and m the number of samples
void Network::fit(const std::vector<std::vector<int>>& input_data, const std::vector<int>& labels, const std::vector<double>& weights_, const std::vector<std::string>& featureNames, const std::string& className, const std::map<std::string, std::vector<int>>& states) void Network::fit(const std::vector<std::vector<int>>& input_data, const std::vector<int>& labels, const std::vector<double>& weights_, const std::vector<std::string>& featureNames, const std::string& className, const std::map<std::string, std::vector<int>>& states, const Smoothing_t smoothing)
{ {
const torch::Tensor weights = torch::tensor(weights_, torch::kFloat64); const torch::Tensor weights = torch::tensor(weights_, torch::kFloat64);
checkFitData(input_data[0].size(), input_data.size(), labels.size(), featureNames, className, states, weights); checkFitData(input_data[0].size(), input_data.size(), labels.size(), featureNames, className, states, weights);
@ -168,17 +193,43 @@ namespace bayesnet {
samples.index_put_({ i, "..." }, torch::tensor(input_data[i], torch::kInt32)); samples.index_put_({ i, "..." }, torch::tensor(input_data[i], torch::kInt32));
} }
samples.index_put_({ -1, "..." }, torch::tensor(labels, torch::kInt32)); samples.index_put_({ -1, "..." }, torch::tensor(labels, torch::kInt32));
completeFit(states, weights); completeFit(states, weights, smoothing);
} }
void Network::completeFit(const std::map<std::string, std::vector<int>>& states, const torch::Tensor& weights) void Network::completeFit(const std::map<std::string, std::vector<int>>& states, const torch::Tensor& weights, const Smoothing_t smoothing)
{ {
setStates(states); setStates(states);
laplaceSmoothing = 1.0 / samples.size(1); // To use in CPT computation
std::vector<std::thread> threads; std::vector<std::thread> threads;
auto& semaphore = CountingSemaphore::getInstance();
const double n_samples = static_cast<double>(samples.size(1));
auto worker = [&](std::pair<const std::string, std::unique_ptr<Node>>& node, int i) {
std::string threadName = "FitWorker-" + std::to_string(i);
#if defined(__linux__)
pthread_setname_np(pthread_self(), threadName.c_str());
#else
pthread_setname_np(threadName.c_str());
#endif
double numStates = static_cast<double>(node.second->getNumStates());
double smoothing_factor;
switch (smoothing) {
case Smoothing_t::ORIGINAL:
smoothing_factor = 1.0 / n_samples;
break;
case Smoothing_t::LAPLACE:
smoothing_factor = 1.0;
break;
case Smoothing_t::CESTNIK:
smoothing_factor = 1 / numStates;
break;
default:
smoothing_factor = 0.0; // No smoothing
}
node.second->computeCPT(samples, features, smoothing_factor, weights);
semaphore.release();
};
int i = 0;
for (auto& node : nodes) { for (auto& node : nodes) {
threads.emplace_back([this, &node, &weights]() { semaphore.acquire();
node.second->computeCPT(samples, features, laplaceSmoothing, weights); threads.emplace_back(worker, std::ref(node), i++);
});
} }
for (auto& thread : threads) { for (auto& thread : threads) {
thread.join(); thread.join();
@ -190,14 +241,38 @@ namespace bayesnet {
if (!fitted) { if (!fitted) {
throw std::logic_error("You must call fit() before calling predict()"); throw std::logic_error("You must call fit() before calling predict()");
} }
// Ensure the sample size is equal to the number of features
if (samples.size(0) != features.size() - 1) {
throw std::invalid_argument("(T) Sample size (" + std::to_string(samples.size(0)) +
") does not match the number of features (" + std::to_string(features.size() - 1) + ")");
}
torch::Tensor result; torch::Tensor result;
std::vector<std::thread> threads;
std::mutex mtx;
auto& semaphore = CountingSemaphore::getInstance();
result = torch::zeros({ samples.size(1), classNumStates }, torch::kFloat64); result = torch::zeros({ samples.size(1), classNumStates }, torch::kFloat64);
for (int i = 0; i < samples.size(1); ++i) { auto worker = [&](const torch::Tensor& sample, int i) {
const torch::Tensor sample = samples.index({ "...", i }); std::string threadName = "PredictWorker-" + std::to_string(i);
#if defined(__linux__)
pthread_setname_np(pthread_self(), threadName.c_str());
#else
pthread_setname_np(threadName.c_str());
#endif
auto psample = predict_sample(sample); auto psample = predict_sample(sample);
auto temp = torch::tensor(psample, torch::kFloat64); auto temp = torch::tensor(psample, torch::kFloat64);
// result.index_put_({ i, "..." }, torch::tensor(predict_sample(sample), torch::kFloat64)); {
result.index_put_({ i, "..." }, temp); std::lock_guard<std::mutex> lock(mtx);
result.index_put_({ i, "..." }, temp);
}
semaphore.release();
};
for (int i = 0; i < samples.size(1); ++i) {
semaphore.acquire();
const torch::Tensor sample = samples.index({ "...", i });
threads.emplace_back(worker, sample, i);
}
for (auto& thread : threads) {
thread.join();
} }
if (proba) if (proba)
return result; return result;
@ -222,18 +297,38 @@ namespace bayesnet {
if (!fitted) { if (!fitted) {
throw std::logic_error("You must call fit() before calling predict()"); throw std::logic_error("You must call fit() before calling predict()");
} }
std::vector<int> predictions; // Ensure the sample size is equal to the number of features
if (tsamples.size() != features.size() - 1) {
throw std::invalid_argument("(V) Sample size (" + std::to_string(tsamples.size()) +
") does not match the number of features (" + std::to_string(features.size() - 1) + ")");
}
std::vector<int> predictions(tsamples[0].size(), 0);
std::vector<int> sample; std::vector<int> sample;
std::vector<std::thread> threads;
auto& semaphore = CountingSemaphore::getInstance();
auto worker = [&](const std::vector<int>& sample, const int row, int& prediction) {
std::string threadName = "(V)PWorker-" + std::to_string(row);
#if defined(__linux__)
pthread_setname_np(pthread_self(), threadName.c_str());
#else
pthread_setname_np(threadName.c_str());
#endif
auto classProbabilities = predict_sample(sample);
auto maxElem = max_element(classProbabilities.begin(), classProbabilities.end());
int predictedClass = distance(classProbabilities.begin(), maxElem);
prediction = predictedClass;
semaphore.release();
};
for (int row = 0; row < tsamples[0].size(); ++row) { for (int row = 0; row < tsamples[0].size(); ++row) {
sample.clear(); sample.clear();
for (int col = 0; col < tsamples.size(); ++col) { for (int col = 0; col < tsamples.size(); ++col) {
sample.push_back(tsamples[col][row]); sample.push_back(tsamples[col][row]);
} }
std::vector<double> classProbabilities = predict_sample(sample); semaphore.acquire();
// Find the class with the maximum posterior probability threads.emplace_back(worker, sample, row, std::ref(predictions[row]));
auto maxElem = max_element(classProbabilities.begin(), classProbabilities.end()); }
int predictedClass = distance(classProbabilities.begin(), maxElem); for (auto& thread : threads) {
predictions.push_back(predictedClass); thread.join();
} }
return predictions; return predictions;
} }
@ -244,14 +339,36 @@ namespace bayesnet {
if (!fitted) { if (!fitted) {
throw std::logic_error("You must call fit() before calling predict_proba()"); throw std::logic_error("You must call fit() before calling predict_proba()");
} }
std::vector<std::vector<double>> predictions; // Ensure the sample size is equal to the number of features
if (tsamples.size() != features.size() - 1) {
throw std::invalid_argument("(V) Sample size (" + std::to_string(tsamples.size()) +
") does not match the number of features (" + std::to_string(features.size() - 1) + ")");
}
std::vector<std::vector<double>> predictions(tsamples[0].size(), std::vector<double>(classNumStates, 0.0));
std::vector<int> sample; std::vector<int> sample;
std::vector<std::thread> threads;
auto& semaphore = CountingSemaphore::getInstance();
auto worker = [&](const std::vector<int>& sample, int row, std::vector<double>& predictions) {
std::string threadName = "(V)PWorker-" + std::to_string(row);
#if defined(__linux__)
pthread_setname_np(pthread_self(), threadName.c_str());
#else
pthread_setname_np(threadName.c_str());
#endif
std::vector<double> classProbabilities = predict_sample(sample);
predictions = classProbabilities;
semaphore.release();
};
for (int row = 0; row < tsamples[0].size(); ++row) { for (int row = 0; row < tsamples[0].size(); ++row) {
sample.clear(); sample.clear();
for (int col = 0; col < tsamples.size(); ++col) { for (int col = 0; col < tsamples.size(); ++col) {
sample.push_back(tsamples[col][row]); sample.push_back(tsamples[col][row]);
} }
predictions.push_back(predict_sample(sample)); semaphore.acquire();
threads.emplace_back(worker, sample, row, std::ref(predictions[row]));
}
for (auto& thread : threads) {
thread.join();
} }
return predictions; return predictions;
} }
@ -269,11 +386,6 @@ namespace bayesnet {
// Return 1xn std::vector of probabilities // Return 1xn std::vector of probabilities
std::vector<double> Network::predict_sample(const std::vector<int>& sample) std::vector<double> Network::predict_sample(const std::vector<int>& sample)
{ {
// Ensure the sample size is equal to the number of features
if (sample.size() != features.size() - 1) {
throw std::invalid_argument("Sample size (" + std::to_string(sample.size()) +
") does not match the number of features (" + std::to_string(features.size() - 1) + ")");
}
std::map<std::string, int> evidence; std::map<std::string, int> evidence;
for (int i = 0; i < sample.size(); ++i) { for (int i = 0; i < sample.size(); ++i) {
evidence[features[i]] = sample[i]; evidence[features[i]] = sample[i];
@ -283,44 +395,26 @@ namespace bayesnet {
// Return 1xn std::vector of probabilities // Return 1xn std::vector of probabilities
std::vector<double> Network::predict_sample(const torch::Tensor& sample) std::vector<double> Network::predict_sample(const torch::Tensor& sample)
{ {
// Ensure the sample size is equal to the number of features
if (sample.size(0) != features.size() - 1) {
throw std::invalid_argument("Sample size (" + std::to_string(sample.size(0)) +
") does not match the number of features (" + std::to_string(features.size() - 1) + ")");
}
std::map<std::string, int> evidence; std::map<std::string, int> evidence;
for (int i = 0; i < sample.size(0); ++i) { for (int i = 0; i < sample.size(0); ++i) {
evidence[features[i]] = sample[i].item<int>(); evidence[features[i]] = sample[i].item<int>();
} }
return exactInference(evidence); return exactInference(evidence);
} }
double Network::computeFactor(std::map<std::string, int>& completeEvidence)
{
double result = 1.0;
for (auto& node : getNodes()) {
result *= node.second->getFactorValue(completeEvidence);
}
return result;
}
std::vector<double> Network::exactInference(std::map<std::string, int>& evidence) std::vector<double> Network::exactInference(std::map<std::string, int>& evidence)
{ {
std::vector<double> result(classNumStates, 0.0); std::vector<double> result(classNumStates, 0.0);
std::vector<std::thread> threads; auto completeEvidence = std::map<std::string, int>(evidence);
std::mutex mtx;
for (int i = 0; i < classNumStates; ++i) { for (int i = 0; i < classNumStates; ++i) {
threads.emplace_back([this, &result, &evidence, i, &mtx]() { completeEvidence[getClassName()] = i;
auto completeEvidence = std::map<std::string, int>(evidence); double partial = 1.0;
completeEvidence[getClassName()] = i; for (auto& node : getNodes()) {
double factor = computeFactor(completeEvidence); partial *= node.second->getFactorValue(completeEvidence);
std::lock_guard<std::mutex> lock(mtx); }
result[i] = factor; result[i] = partial;
});
}
for (auto& thread : threads) {
thread.join();
} }
// Normalize result // Normalize result
double sum = accumulate(result.begin(), result.end(), 0.0); double sum = std::accumulate(result.begin(), result.end(), 0.0);
transform(result.begin(), result.end(), result.begin(), [sum](const double& value) { return value / sum; }); transform(result.begin(), result.end(), result.begin(), [sum](const double& value) { return value / sum; });
return result; return result;
} }
@ -393,22 +487,20 @@ namespace bayesnet {
result.insert(it2, fatherName); result.insert(it2, fatherName);
ending = false; ending = false;
} }
} else {
throw std::logic_error("Error in topological sort because of node " + feature + " is not in result");
} }
} else {
throw std::logic_error("Error in topological sort because of node father " + fatherName + " is not in result");
} }
} }
} }
} }
return result; return result;
} }
void Network::dump_cpt() const std::string Network::dump_cpt() const
{ {
std::stringstream oss;
for (auto& node : nodes) { for (auto& node : nodes) {
std::cout << "* " << node.first << ": (" << node.second->getNumStates() << ") : " << node.second->getCPT().sizes() << std::endl; oss << "* " << node.first << ": (" << node.second->getNumStates() << ") : " << node.second->getCPT().sizes() << std::endl;
std::cout << node.second->getCPT() << std::endl; oss << node.second->getCPT() << std::endl;
} }
return oss.str();
} }
} }

View File

@ -1,19 +1,29 @@
// ***************************************************************
// SPDX-FileCopyrightText: Copyright 2024 Ricardo Montañana Gómez
// SPDX-FileType: SOURCE
// SPDX-License-Identifier: MIT
// ***************************************************************
#ifndef NETWORK_H #ifndef NETWORK_H
#define NETWORK_H #define NETWORK_H
#include "Node.h"
#include <map> #include <map>
#include <vector> #include <vector>
#include "config.h" #include "bayesnet/config.h"
#include "Node.h"
namespace bayesnet { namespace bayesnet {
enum class Smoothing_t {
NONE = -1,
ORIGINAL = 0,
LAPLACE,
CESTNIK
};
class Network { class Network {
public: public:
Network(); Network();
explicit Network(float); explicit Network(const Network&);
explicit Network(Network&);
~Network() = default; ~Network() = default;
torch::Tensor& getSamples(); torch::Tensor& getSamples();
float getmaxThreads();
void addNode(const std::string&); void addNode(const std::string&);
void addEdge(const std::string&, const std::string&); void addEdge(const std::string&, const std::string&);
std::map<std::string, std::unique_ptr<Node>>& getNodes(); std::map<std::string, std::unique_ptr<Node>>& getNodes();
@ -26,9 +36,9 @@ namespace bayesnet {
/* /*
Notice: Nodes have to be inserted in the same order as they are in the dataset, i.e., first node is first column and so on. Notice: Nodes have to be inserted in the same order as they are in the dataset, i.e., first node is first column and so on.
*/ */
void fit(const std::vector<std::vector<int>>& input_data, const std::vector<int>& labels, const std::vector<double>& weights, const std::vector<std::string>& featureNames, const std::string& className, const std::map<std::string, std::vector<int>>& states); void fit(const std::vector<std::vector<int>>& input_data, const std::vector<int>& labels, const std::vector<double>& weights, const std::vector<std::string>& featureNames, const std::string& className, const std::map<std::string, std::vector<int>>& states, const Smoothing_t smoothing);
void fit(const torch::Tensor& X, const torch::Tensor& y, const torch::Tensor& weights, const std::vector<std::string>& featureNames, const std::string& className, const std::map<std::string, std::vector<int>>& states); void fit(const torch::Tensor& X, const torch::Tensor& y, const torch::Tensor& weights, const std::vector<std::string>& featureNames, const std::string& className, const std::map<std::string, std::vector<int>>& states, const Smoothing_t smoothing);
void fit(const torch::Tensor& samples, const torch::Tensor& weights, const std::vector<std::string>& featureNames, const std::string& className, const std::map<std::string, std::vector<int>>& states); void fit(const torch::Tensor& samples, const torch::Tensor& weights, const std::vector<std::string>& featureNames, const std::string& className, const std::map<std::string, std::vector<int>>& states, const Smoothing_t smoothing);
std::vector<int> predict(const std::vector<std::vector<int>>&); // Return mx1 std::vector of predictions std::vector<int> predict(const std::vector<std::vector<int>>&); // Return mx1 std::vector of predictions
torch::Tensor predict(const torch::Tensor&); // Return mx1 tensor of predictions torch::Tensor predict(const torch::Tensor&); // Return mx1 tensor of predictions
torch::Tensor predict_tensor(const torch::Tensor& samples, const bool proba); torch::Tensor predict_tensor(const torch::Tensor& samples, const bool proba);
@ -39,24 +49,21 @@ namespace bayesnet {
std::vector<std::string> show() const; std::vector<std::string> show() const;
std::vector<std::string> graph(const std::string& title) const; // Returns a std::vector of std::strings representing the graph in graphviz format std::vector<std::string> graph(const std::string& title) const; // Returns a std::vector of std::strings representing the graph in graphviz format
void initialize(); void initialize();
void dump_cpt() const; std::string dump_cpt() const;
inline std::string version() { return { project_version.begin(), project_version.end() }; } inline std::string version() { return { project_version.begin(), project_version.end() }; }
private: private:
std::map<std::string, std::unique_ptr<Node>> nodes; std::map<std::string, std::unique_ptr<Node>> nodes;
bool fitted; bool fitted;
float maxThreads = 0.95;
int classNumStates; int classNumStates;
std::vector<std::string> features; // Including classname std::vector<std::string> features; // Including classname
std::string className; std::string className;
double laplaceSmoothing; torch::Tensor samples; // n+1xm tensor used to fit the model
torch::Tensor samples; // nxm tensor used to fit the model
bool isCyclic(const std::string&, std::unordered_set<std::string>&, std::unordered_set<std::string>&); bool isCyclic(const std::string&, std::unordered_set<std::string>&, std::unordered_set<std::string>&);
std::vector<double> predict_sample(const std::vector<int>&); std::vector<double> predict_sample(const std::vector<int>&);
std::vector<double> predict_sample(const torch::Tensor&); std::vector<double> predict_sample(const torch::Tensor&);
std::vector<double> exactInference(std::map<std::string, int>&); std::vector<double> exactInference(std::map<std::string, int>&);
double computeFactor(std::map<std::string, int>&); void completeFit(const std::map<std::string, std::vector<int>>& states, const torch::Tensor& weights, const Smoothing_t smoothing);
void completeFit(const std::map<std::string, std::vector<int>>& states, const torch::Tensor& weights); void checkFitData(int n_samples, int n_features, int n_samples_y, const std::vector<std::string>& featureNames, const std::string& className, const std::map<std::string, std::vector<int>>& states, const torch::Tensor& weights);
void checkFitData(int n_features, int n_samples, int n_samples_y, const std::vector<std::string>& featureNames, const std::string& className, const std::map<std::string, std::vector<int>>& states, const torch::Tensor& weights);
void setStates(const std::map<std::string, std::vector<int>>&); void setStates(const std::map<std::string, std::vector<int>>&);
}; };
} }

View File

@ -1,9 +1,15 @@
// ***************************************************************
// SPDX-FileCopyrightText: Copyright 2024 Ricardo Montañana Gómez
// SPDX-FileType: SOURCE
// SPDX-License-Identifier: MIT
// ***************************************************************
#include "Node.h" #include "Node.h"
namespace bayesnet { namespace bayesnet {
Node::Node(const std::string& name) Node::Node(const std::string& name)
: name(name), numStates(0), cpTable(torch::Tensor()), parents(std::vector<Node*>()), children(std::vector<Node*>()) : name(name)
{ {
} }
void Node::clear() void Node::clear()
@ -84,52 +90,54 @@ namespace bayesnet {
} }
return result; return result;
} }
void Node::computeCPT(const torch::Tensor& dataset, const std::vector<std::string>& features, const double laplaceSmoothing, const torch::Tensor& weights) void Node::computeCPT(const torch::Tensor& dataset, const std::vector<std::string>& features, const double smoothing, const torch::Tensor& weights)
{ {
dimensions.clear(); dimensions.clear();
// Get dimensions of the CPT // Get dimensions of the CPT
dimensions.push_back(numStates); dimensions.push_back(numStates);
transform(parents.begin(), parents.end(), back_inserter(dimensions), [](const auto& parent) { return parent->getNumStates(); }); transform(parents.begin(), parents.end(), back_inserter(dimensions), [](const auto& parent) { return parent->getNumStates(); });
// Create a tensor of zeros with the dimensions of the CPT // Create a tensor of zeros with the dimensions of the CPT
cpTable = torch::zeros(dimensions, torch::kFloat) + laplaceSmoothing; cpTable = torch::zeros(dimensions, torch::kDouble) + smoothing;
// Fill table with counts // Fill table with counts
auto pos = find(features.begin(), features.end(), name); auto pos = find(features.begin(), features.end(), name);
if (pos == features.end()) { if (pos == features.end()) {
throw std::logic_error("Feature " + name + " not found in dataset"); throw std::logic_error("Feature " + name + " not found in dataset");
} }
int name_index = pos - features.begin(); int name_index = pos - features.begin();
c10::List<c10::optional<at::Tensor>> coordinates;
for (int n_sample = 0; n_sample < dataset.size(1); ++n_sample) { for (int n_sample = 0; n_sample < dataset.size(1); ++n_sample) {
c10::List<c10::optional<at::Tensor>> coordinates; coordinates.clear();
coordinates.push_back(dataset.index({ name_index, n_sample })); auto sample = dataset.index({ "...", n_sample });
coordinates.push_back(sample[name_index]);
for (auto parent : parents) { for (auto parent : parents) {
pos = find(features.begin(), features.end(), parent->getName()); pos = find(features.begin(), features.end(), parent->getName());
if (pos == features.end()) { if (pos == features.end()) {
throw std::logic_error("Feature parent " + parent->getName() + " not found in dataset"); throw std::logic_error("Feature parent " + parent->getName() + " not found in dataset");
} }
int parent_index = pos - features.begin(); int parent_index = pos - features.begin();
coordinates.push_back(dataset.index({ parent_index, n_sample })); coordinates.push_back(sample[parent_index]);
} }
// Increment the count of the corresponding coordinate // Increment the count of the corresponding coordinate
cpTable.index_put_({ coordinates }, cpTable.index({ coordinates }) + weights.index({ n_sample }).item<double>()); cpTable.index_put_({ coordinates }, weights.index({ n_sample }), true);
} }
// Normalize the counts // Normalize the counts
// Divide each row by the sum of the row
cpTable = cpTable / cpTable.sum(0); cpTable = cpTable / cpTable.sum(0);
} }
float Node::getFactorValue(std::map<std::string, int>& evidence) double Node::getFactorValue(std::map<std::string, int>& evidence)
{ {
c10::List<c10::optional<at::Tensor>> coordinates; c10::List<c10::optional<at::Tensor>> coordinates;
// following predetermined order of indices in the cpTable (see Node.h) // following predetermined order of indices in the cpTable (see Node.h)
coordinates.push_back(at::tensor(evidence[name])); coordinates.push_back(at::tensor(evidence[name]));
transform(parents.begin(), parents.end(), std::back_inserter(coordinates), [&evidence](const auto& parent) { return at::tensor(evidence[parent->getName()]); }); transform(parents.begin(), parents.end(), std::back_inserter(coordinates), [&evidence](const auto& parent) { return at::tensor(evidence[parent->getName()]); });
return cpTable.index({ coordinates }).item<float>(); return cpTable.index({ coordinates }).item<double>();
} }
std::vector<std::string> Node::graph(const std::string& className) std::vector<std::string> Node::graph(const std::string& className)
{ {
auto output = std::vector<std::string>(); auto output = std::vector<std::string>();
auto suffix = name == className ? ", fontcolor=red, fillcolor=lightblue, style=filled " : ""; auto suffix = name == className ? ", fontcolor=red, fillcolor=lightblue, style=filled " : "";
output.push_back(name + " [shape=circle" + suffix + "] \n"); output.push_back("\"" + name + "\" [shape=circle" + suffix + "] \n");
transform(children.begin(), children.end(), back_inserter(output), [this](const auto& child) { return name + " -> " + child->getName(); }); transform(children.begin(), children.end(), back_inserter(output), [this](const auto& child) { return "\"" + name + "\" -> \"" + child->getName() + "\""; });
return output; return output;
} }
} }

View File

@ -1,19 +1,17 @@
// ***************************************************************
// SPDX-FileCopyrightText: Copyright 2024 Ricardo Montañana Gómez
// SPDX-FileType: SOURCE
// SPDX-License-Identifier: MIT
// ***************************************************************
#ifndef NODE_H #ifndef NODE_H
#define NODE_H #define NODE_H
#include <torch/torch.h>
#include <unordered_set> #include <unordered_set>
#include <vector> #include <vector>
#include <string> #include <string>
#include <torch/torch.h>
namespace bayesnet { namespace bayesnet {
class Node { class Node {
private:
std::string name;
std::vector<Node*> parents;
std::vector<Node*> children;
int numStates; // number of states of the variable
torch::Tensor cpTable; // Order of indices is 0-> node variable, 1-> 1st parent, 2-> 2nd parent, ...
std::vector<int64_t> dimensions; // dimensions of the cpTable
std::vector<std::pair<std::string, std::string>> combinations(const std::vector<std::string>&);
public: public:
explicit Node(const std::string&); explicit Node(const std::string&);
void clear(); void clear();
@ -25,12 +23,20 @@ namespace bayesnet {
std::vector<Node*>& getParents(); std::vector<Node*>& getParents();
std::vector<Node*>& getChildren(); std::vector<Node*>& getChildren();
torch::Tensor& getCPT(); torch::Tensor& getCPT();
void computeCPT(const torch::Tensor& dataset, const std::vector<std::string>& features, const double laplaceSmoothing, const torch::Tensor& weights); void computeCPT(const torch::Tensor& dataset, const std::vector<std::string>& features, const double smoothing, const torch::Tensor& weights);
int getNumStates() const; int getNumStates() const;
void setNumStates(int); void setNumStates(int);
unsigned minFill(); unsigned minFill();
std::vector<std::string> graph(const std::string& clasName); // Returns a std::vector of std::strings representing the graph in graphviz format std::vector<std::string> graph(const std::string& clasName); // Returns a std::vector of std::strings representing the graph in graphviz format
float getFactorValue(std::map<std::string, int>&); double getFactorValue(std::map<std::string, int>&);
private:
std::string name;
std::vector<Node*> parents;
std::vector<Node*> children;
int numStates = 0; // number of states of the variable
torch::Tensor cpTable; // Order of indices is 0-> node variable, 1-> 1st parent, 2-> 2nd parent, ...
std::vector<int64_t> dimensions; // dimensions of the cpTable
std::vector<std::pair<std::string, std::string>> combinations(const std::vector<std::string>&);
}; };
} }
#endif #endif

View File

@ -1,30 +1,86 @@
#include "BayesMetrics.h" // ***************************************************************
// SPDX-FileCopyrightText: Copyright 2024 Ricardo Montañana Gómez
// SPDX-FileType: SOURCE
// SPDX-License-Identifier: MIT
// ***************************************************************
#include <map>
#include <unordered_map>
#include <tuple>
#include "Mst.h" #include "Mst.h"
#include "BayesMetrics.h"
namespace bayesnet { namespace bayesnet {
//samples is n+1xm tensor used to fit the model //samples is n+1xm tensor used to fit the model
Metrics::Metrics(const torch::Tensor& samples, const std::vector<std::string>& features, const std::string& className, const int classNumStates) Metrics::Metrics(const torch::Tensor& samples, const std::vector<std::string>& features, const std::string& className, const int classNumStates)
: samples(samples) : samples(samples)
, features(features)
, className(className) , className(className)
, features(features)
, classNumStates(classNumStates) , classNumStates(classNumStates)
{ {
} }
//samples is nxm std::vector used to fit the model //samples is n+1xm std::vector used to fit the model
Metrics::Metrics(const std::vector<std::vector<int>>& vsamples, const std::vector<int>& labels, const std::vector<std::string>& features, const std::string& className, const int classNumStates) Metrics::Metrics(const std::vector<std::vector<int>>& vsamples, const std::vector<int>& labels, const std::vector<std::string>& features, const std::string& className, const int classNumStates)
: features(features) : samples(torch::zeros({ static_cast<int>(vsamples.size() + 1), static_cast<int>(vsamples[0].size()) }, torch::kInt32))
, className(className) , className(className)
, features(features)
, classNumStates(classNumStates) , classNumStates(classNumStates)
, samples(torch::zeros({ static_cast<int>(vsamples[0].size()), static_cast<int>(vsamples.size() + 1) }, torch::kInt32))
{ {
for (int i = 0; i < vsamples.size(); ++i) { for (int i = 0; i < vsamples.size(); ++i) {
samples.index_put_({ i, "..." }, torch::tensor(vsamples[i], torch::kInt32)); samples.index_put_({ i, "..." }, torch::tensor(vsamples[i], torch::kInt32));
} }
samples.index_put_({ -1, "..." }, torch::tensor(labels, torch::kInt32)); samples.index_put_({ -1, "..." }, torch::tensor(labels, torch::kInt32));
} }
std::vector<std::pair<int, int>> Metrics::SelectKPairs(const torch::Tensor& weights, std::vector<int>& featuresExcluded, bool ascending, unsigned k)
{
// Return the K Best features
auto n = features.size();
// compute scores
scoresKPairs.clear();
pairsKBest.clear();
auto labels = samples.index({ -1, "..." });
for (int i = 0; i < n - 1; ++i) {
if (std::find(featuresExcluded.begin(), featuresExcluded.end(), i) != featuresExcluded.end()) {
continue;
}
for (int j = i + 1; j < n; ++j) {
if (std::find(featuresExcluded.begin(), featuresExcluded.end(), j) != featuresExcluded.end()) {
continue;
}
auto key = std::make_pair(i, j);
auto value = conditionalMutualInformation(samples.index({ i, "..." }), samples.index({ j, "..." }), labels, weights);
scoresKPairs.push_back({ key, value });
}
}
// sort scores
if (ascending) {
sort(scoresKPairs.begin(), scoresKPairs.end(), [](auto& a, auto& b)
{ return a.second < b.second; });
} else {
sort(scoresKPairs.begin(), scoresKPairs.end(), [](auto& a, auto& b)
{ return a.second > b.second; });
}
for (auto& [pairs, score] : scoresKPairs) {
pairsKBest.push_back(pairs);
}
if (k != 0 && k < pairsKBest.size()) {
if (ascending) {
int limit = pairsKBest.size() - k;
for (int i = 0; i < limit; i++) {
pairsKBest.erase(pairsKBest.begin());
scoresKPairs.erase(scoresKPairs.begin());
}
} else {
pairsKBest.resize(k);
scoresKPairs.resize(k);
}
}
return pairsKBest;
}
std::vector<int> Metrics::SelectKBestWeighted(const torch::Tensor& weights, bool ascending, unsigned k) std::vector<int> Metrics::SelectKBestWeighted(const torch::Tensor& weights, bool ascending, unsigned k)
{ {
// Return the K Best features // Return the K Best features
auto n = samples.size(0) - 1; auto n = features.size();
if (k == 0) { if (k == 0) {
k = n; k = n;
} }
@ -60,7 +116,10 @@ namespace bayesnet {
{ {
return scoresKBest; return scoresKBest;
} }
std::vector<std::pair<std::pair<int, int>, double>> Metrics::getScoresKPairs() const
{
return scoresKPairs;
}
torch::Tensor Metrics::conditionalEdge(const torch::Tensor& weights) torch::Tensor Metrics::conditionalEdge(const torch::Tensor& weights)
{ {
auto result = std::vector<double>(); auto result = std::vector<double>();
@ -99,14 +158,8 @@ namespace bayesnet {
} }
return matrix; return matrix;
} }
// To use in Python // Measured in nats (natural logarithm (log) base e)
std::vector<float> Metrics::conditionalEdgeWeights(std::vector<float>& weights_) // Elements of Information Theory, 2nd Edition, Thomas M. Cover, Joy A. Thomas p. 14
{
const torch::Tensor weights = torch::tensor(weights_);
auto matrix = conditionalEdge(weights);
std::vector<float> v(matrix.data_ptr<float>(), matrix.data_ptr<float>() + matrix.numel());
return v;
}
double Metrics::entropy(const torch::Tensor& feature, const torch::Tensor& weights) double Metrics::entropy(const torch::Tensor& feature, const torch::Tensor& weights)
{ {
torch::Tensor counts = feature.bincount(weights); torch::Tensor counts = feature.bincount(weights);
@ -145,10 +198,54 @@ namespace bayesnet {
} }
return entropyValue; return entropyValue;
} }
// I(X;Y) = H(Y) - H(Y|X) // H(X|Y,C) = sum_{y in Y, c in C} p(x,c) H(X|Y=y,C=c)
double Metrics::conditionalEntropy(const torch::Tensor& firstFeature, const torch::Tensor& secondFeature, const torch::Tensor& labels, const torch::Tensor& weights)
{
// Ensure the tensors are of the same length
assert(firstFeature.size(0) == secondFeature.size(0) && firstFeature.size(0) == labels.size(0) && firstFeature.size(0) == weights.size(0));
// Convert tensors to vectors for easier processing
auto firstFeatureData = firstFeature.accessor<int, 1>();
auto secondFeatureData = secondFeature.accessor<int, 1>();
auto labelsData = labels.accessor<int, 1>();
auto weightsData = weights.accessor<double, 1>();
int numSamples = firstFeature.size(0);
// Maps for joint and marginal probabilities
std::map<std::tuple<int, int, int>, double> jointCount;
std::map<std::tuple<int, int>, double> marginalCount;
// Compute joint and marginal counts
for (int i = 0; i < numSamples; ++i) {
auto keyJoint = std::make_tuple(firstFeatureData[i], labelsData[i], secondFeatureData[i]);
auto keyMarginal = std::make_tuple(firstFeatureData[i], labelsData[i]);
jointCount[keyJoint] += weightsData[i];
marginalCount[keyMarginal] += weightsData[i];
}
// Total weight sum
double totalWeight = torch::sum(weights).item<double>();
if (totalWeight == 0)
return 0;
// Compute the conditional entropy
double conditionalEntropy = 0.0;
for (const auto& [keyJoint, jointFreq] : jointCount) {
auto [x, c, y] = keyJoint;
auto keyMarginal = std::make_tuple(x, c);
//double p_xc = marginalCount[keyMarginal] / totalWeight;
double p_y_given_xc = jointFreq / marginalCount[keyMarginal];
if (p_y_given_xc > 0) {
conditionalEntropy -= (jointFreq / totalWeight) * std::log(p_y_given_xc);
}
}
return conditionalEntropy;
}
// I(X;Y) = H(Y) - H(Y|X) ; I(X;Y) >= 0
double Metrics::mutualInformation(const torch::Tensor& firstFeature, const torch::Tensor& secondFeature, const torch::Tensor& weights) double Metrics::mutualInformation(const torch::Tensor& firstFeature, const torch::Tensor& secondFeature, const torch::Tensor& weights)
{ {
return entropy(firstFeature, weights) - conditionalEntropy(firstFeature, secondFeature, weights); return std::max(entropy(firstFeature, weights) - conditionalEntropy(firstFeature, secondFeature, weights), 0.0);
}
// I(X;Y|C) = H(X|C) - H(X|Y,C) >= 0
double Metrics::conditionalMutualInformation(const torch::Tensor& firstFeature, const torch::Tensor& secondFeature, const torch::Tensor& labels, const torch::Tensor& weights)
{
return std::max(conditionalEntropy(firstFeature, labels, weights) - conditionalEntropy(firstFeature, secondFeature, labels, weights), 0.0);
} }
/* /*
Compute the maximum spanning tree considering the weights as distances Compute the maximum spanning tree considering the weights as distances

View File

@ -1,25 +1,41 @@
// ***************************************************************
// SPDX-FileCopyrightText: Copyright 2024 Ricardo Montañana Gómez
// SPDX-FileType: SOURCE
// SPDX-License-Identifier: MIT
// ***************************************************************
#ifndef BAYESNET_METRICS_H #ifndef BAYESNET_METRICS_H
#define BAYESNET_METRICS_H #define BAYESNET_METRICS_H
#include <torch/torch.h>
#include <vector> #include <vector>
#include <string> #include <string>
#include <torch/torch.h>
namespace bayesnet { namespace bayesnet {
class Metrics { class Metrics {
private: public:
int classNumStates = 0; Metrics() = default;
std::vector<double> scoresKBest; Metrics(const torch::Tensor& samples, const std::vector<std::string>& features, const std::string& className, const int classNumStates);
std::vector<int> featuresKBest; // sorted indices of the features Metrics(const std::vector<std::vector<int>>& vsamples, const std::vector<int>& labels, const std::vector<std::string>& features, const std::string& className, const int classNumStates);
double conditionalEntropy(const torch::Tensor& firstFeature, const torch::Tensor& secondFeature, const torch::Tensor& weights); std::vector<int> SelectKBestWeighted(const torch::Tensor& weights, bool ascending = false, unsigned k = 0);
std::vector<std::pair<int, int>> SelectKPairs(const torch::Tensor& weights, std::vector<int>& featuresExcluded, bool ascending = false, unsigned k = 0);
std::vector<double> getScoresKBest() const;
std::vector<std::pair<std::pair<int, int>, double>> getScoresKPairs() const;
double mutualInformation(const torch::Tensor& firstFeature, const torch::Tensor& secondFeature, const torch::Tensor& weights);
double conditionalMutualInformation(const torch::Tensor& firstFeature, const torch::Tensor& secondFeature, const torch::Tensor& labels, const torch::Tensor& weights);
torch::Tensor conditionalEdge(const torch::Tensor& weights);
std::vector<std::pair<int, int>> maximumSpanningTree(const std::vector<std::string>& features, const torch::Tensor& weights, const int root);
// Measured in nats (natural logarithm (log) base e)
// Elements of Information Theory, 2nd Edition, Thomas M. Cover, Joy A. Thomas p. 14
double entropy(const torch::Tensor& feature, const torch::Tensor& weights);
double conditionalEntropy(const torch::Tensor& firstFeature, const torch::Tensor& secondFeature, const torch::Tensor& labels, const torch::Tensor& weights);
protected: protected:
torch::Tensor samples; // n+1xm torch::Tensor used to fit the model where samples[-1] is the y std::vector torch::Tensor samples; // n+1xm torch::Tensor used to fit the model where samples[-1] is the y std::vector
std::string className; std::string className;
double entropy(const torch::Tensor& feature, const torch::Tensor& weights);
std::vector<std::string> features; std::vector<std::string> features;
template <class T> template <class T>
std::vector<std::pair<T, T>> doCombinations(const std::vector<T>& source) std::vector<std::pair<T, T>> doCombinations(const std::vector<T>& source)
{ {
std::vector<std::pair<T, T>> result; std::vector<std::pair<T, T>> result;
for (int i = 0; i < source.size(); ++i) { for (int i = 0; i < source.size() - 1; ++i) {
T temp = source[i]; T temp = source[i];
for (int j = i + 1; j < source.size(); ++j) { for (int j = i + 1; j < source.size(); ++j) {
result.push_back({ temp, source[j] }); result.push_back({ temp, source[j] });
@ -34,16 +50,13 @@ namespace bayesnet {
v.erase(v.begin()); v.erase(v.begin());
return temp; return temp;
} }
public: private:
Metrics() = default; int classNumStates = 0;
Metrics(const torch::Tensor& samples, const std::vector<std::string>& features, const std::string& className, const int classNumStates); std::vector<double> scoresKBest;
Metrics(const std::vector<std::vector<int>>& vsamples, const std::vector<int>& labels, const std::vector<std::string>& features, const std::string& className, const int classNumStates); std::vector<int> featuresKBest; // sorted indices of the features
std::vector<int> SelectKBestWeighted(const torch::Tensor& weights, bool ascending = false, unsigned k = 0); std::vector<std::pair<int, int>> pairsKBest; // sorted indices of the pairs
std::vector<double> getScoresKBest() const; std::vector<std::pair<std::pair<int, int>, double>> scoresKPairs;
double mutualInformation(const torch::Tensor& firstFeature, const torch::Tensor& secondFeature, const torch::Tensor& weights); double conditionalEntropy(const torch::Tensor& firstFeature, const torch::Tensor& secondFeature, const torch::Tensor& weights);
std::vector<float> conditionalEdgeWeights(std::vector<float>& weights); // To use in Python
torch::Tensor conditionalEdge(const torch::Tensor& weights);
std::vector<std::pair<int, int>> maximumSpanningTree(const std::vector<std::string>& features, const torch::Tensor& weights, const int root);
}; };
} }
#endif #endif

View File

@ -0,0 +1,46 @@
#ifndef COUNTING_SEMAPHORE_H
#define COUNTING_SEMAPHORE_H
#include <mutex>
#include <condition_variable>
#include <algorithm>
#include <thread>
#include <mutex>
#include <condition_variable>
#include <thread>
class CountingSemaphore {
public:
static CountingSemaphore& getInstance()
{
static CountingSemaphore instance;
return instance;
}
// Delete copy constructor and assignment operator
CountingSemaphore(const CountingSemaphore&) = delete;
CountingSemaphore& operator=(const CountingSemaphore&) = delete;
void acquire()
{
std::unique_lock<std::mutex> lock(mtx_);
cv_.wait(lock, [this]() { return count_ > 0; });
--count_;
}
void release()
{
std::lock_guard<std::mutex> lock(mtx_);
++count_;
if (count_ <= max_count_) {
cv_.notify_one();
}
}
private:
CountingSemaphore()
: max_count_(std::max(1u, static_cast<uint>(0.95 * std::thread::hardware_concurrency()))),
count_(max_count_)
{
}
std::mutex mtx_;
std::condition_variable cv_;
const uint max_count_;
uint count_;
};
#endif

View File

@ -1,6 +1,13 @@
#include "Mst.h" // ***************************************************************
// SPDX-FileCopyrightText: Copyright 2024 Ricardo Montañana Gómez
// SPDX-FileType: SOURCE
// SPDX-License-Identifier: MIT
// ***************************************************************
#include <sstream>
#include <vector> #include <vector>
#include <list> #include <list>
#include "Mst.h"
/* /*
Based on the code from https://www.softwaretestinghelp.com/minimum-spanning-tree-tutorial/ Based on the code from https://www.softwaretestinghelp.com/minimum-spanning-tree-tutorial/
@ -45,24 +52,15 @@ namespace bayesnet {
} }
} }
} }
void Graph::display_mst()
{
std::cout << "Edge :" << " Weight" << std::endl;
for (int i = 0; i < T.size(); i++) {
std::cout << T[i].second.first << " - " << T[i].second.second << " : "
<< T[i].first;
std::cout << std::endl;
}
}
void insertElement(std::list<int>& variables, int variable) void MST::insertElement(std::list<int>& variables, int variable)
{ {
if (std::find(variables.begin(), variables.end(), variable) == variables.end()) { if (std::find(variables.begin(), variables.end(), variable) == variables.end()) {
variables.push_front(variable); variables.push_front(variable);
} }
} }
std::vector<std::pair<int, int>> reorder(std::vector<std::pair<float, std::pair<int, int>>> T, int root_original) std::vector<std::pair<int, int>> MST::reorder(std::vector<std::pair<float, std::pair<int, int>>> T, int root_original)
{ {
// Create the edges of a DAG from the MST // Create the edges of a DAG from the MST
// replacing unordered_set with list because unordered_set cannot guarantee the order of the elements inserted // replacing unordered_set with list because unordered_set cannot guarantee the order of the elements inserted

View File

@ -1,33 +1,40 @@
// ***************************************************************
// SPDX-FileCopyrightText: Copyright 2024 Ricardo Montañana Gómez
// SPDX-FileType: SOURCE
// SPDX-License-Identifier: MIT
// ***************************************************************
#ifndef MST_H #ifndef MST_H
#define MST_H #define MST_H
#include <torch/torch.h>
#include <vector> #include <vector>
#include <string> #include <string>
#include <torch/torch.h>
namespace bayesnet { namespace bayesnet {
class MST { class MST {
public:
MST() = default;
MST(const std::vector<std::string>& features, const torch::Tensor& weights, const int root);
void insertElement(std::list<int>& variables, int variable);
std::vector<std::pair<int, int>> reorder(std::vector<std::pair<float, std::pair<int, int>>> T, int root_original);
std::vector<std::pair<int, int>> maximumSpanningTree();
private: private:
torch::Tensor weights; torch::Tensor weights;
std::vector<std::string> features; std::vector<std::string> features;
int root = 0; int root = 0;
public:
MST() = default;
MST(const std::vector<std::string>& features, const torch::Tensor& weights, const int root);
std::vector<std::pair<int, int>> maximumSpanningTree();
}; };
class Graph { class Graph {
private:
int V; // number of nodes in graph
std::vector <std::pair<float, std::pair<int, int>>> G; // std::vector for graph
std::vector <std::pair<float, std::pair<int, int>>> T; // std::vector for mst
std::vector<int> parent;
public: public:
explicit Graph(int V); explicit Graph(int V);
void addEdge(int u, int v, float wt); void addEdge(int u, int v, float wt);
int find_set(int i); int find_set(int i);
void union_set(int u, int v); void union_set(int u, int v);
void kruskal_algorithm(); void kruskal_algorithm();
void display_mst();
std::vector <std::pair<float, std::pair<int, int>>> get_mst() { return T; } std::vector <std::pair<float, std::pair<int, int>>> get_mst() { return T; }
private:
int V; // number of nodes in graph
std::vector <std::pair<float, std::pair<int, int>>> G; // std::vector for graph
std::vector <std::pair<float, std::pair<int, int>>> T; // std::vector for mst
std::vector<int> parent;
}; };
} }
#endif #endif

View File

@ -1,3 +1,9 @@
// ***************************************************************
// SPDX-FileCopyrightText: Copyright 2024 Ricardo Montañana Gómez
// SPDX-FileType: SOURCE
// SPDX-License-Identifier: MIT
// ***************************************************************
#include "bayesnetUtils.h" #include "bayesnetUtils.h"
namespace bayesnet { namespace bayesnet {
@ -10,18 +16,6 @@ namespace bayesnet {
sort(indices.begin(), indices.end(), [&nums](int i, int j) {return nums[i] > nums[j];}); sort(indices.begin(), indices.end(), [&nums](int i, int j) {return nums[i] > nums[j];});
return indices; return indices;
} }
std::vector<std::vector<int>> tensorToVector(torch::Tensor& dtensor)
{
// convert mxn tensor to nxm std::vector
std::vector<std::vector<int>> result;
// Iterate over cols
for (int i = 0; i < dtensor.size(1); ++i) {
auto col_tensor = dtensor.index({ "...", i });
auto col = std::vector<int>(col_tensor.data_ptr<int>(), col_tensor.data_ptr<int>() + dtensor.size(0));
result.push_back(col);
}
return result;
}
std::vector<std::vector<double>> tensorToVectorDouble(torch::Tensor& dtensor) std::vector<std::vector<double>> tensorToVectorDouble(torch::Tensor& dtensor)
{ {
// convert mxn tensor to mxn std::vector // convert mxn tensor to mxn std::vector

View File

@ -1,10 +1,15 @@
// ***************************************************************
// SPDX-FileCopyrightText: Copyright 2024 Ricardo Montañana Gómez
// SPDX-FileType: SOURCE
// SPDX-License-Identifier: MIT
// ***************************************************************
#ifndef BAYESNET_UTILS_H #ifndef BAYESNET_UTILS_H
#define BAYESNET_UTILS_H #define BAYESNET_UTILS_H
#include <torch/torch.h>
#include <vector> #include <vector>
#include <torch/torch.h>
namespace bayesnet { namespace bayesnet {
std::vector<int> argsort(std::vector<double>& nums); std::vector<int> argsort(std::vector<double>& nums);
std::vector<std::vector<int>> tensorToVector(torch::Tensor& dtensor);
std::vector<std::vector<double>> tensorToVectorDouble(torch::Tensor& dtensor); std::vector<std::vector<double>> tensorToVectorDouble(torch::Tensor& dtensor);
torch::Tensor vectorToTensor(std::vector<std::vector<int>>& vector, bool transpose = true); torch::Tensor vectorToTensor(std::vector<std::vector<int>>& vector, bool transpose = true);
} }

View File

@ -137,7 +137,7 @@
include(CMakeParseArguments) include(CMakeParseArguments)
option(CODE_COVERAGE_VERBOSE "Verbose information" FALSE) option(CODE_COVERAGE_VERBOSE "Verbose information" TRUE)
# Check prereqs # Check prereqs
find_program( GCOV_PATH gcov ) find_program( GCOV_PATH gcov )
@ -160,7 +160,11 @@ foreach(LANG ${LANGUAGES})
endif() endif()
elseif(NOT "${CMAKE_${LANG}_COMPILER_ID}" MATCHES "GNU" elseif(NOT "${CMAKE_${LANG}_COMPILER_ID}" MATCHES "GNU"
AND NOT "${CMAKE_${LANG}_COMPILER_ID}" MATCHES "(LLVM)?[Ff]lang") AND NOT "${CMAKE_${LANG}_COMPILER_ID}" MATCHES "(LLVM)?[Ff]lang")
message(FATAL_ERROR "Compiler is not GNU or Flang! Aborting...") if ("${LANG}" MATCHES "CUDA")
message(STATUS "Ignoring CUDA")
else()
message(FATAL_ERROR "Compiler is not GNU or Flang! Aborting...")
endif()
endif() endif()
endforeach() endforeach()

View File

@ -1,4 +1,4 @@
configure_file( configure_file(
"config.h.in" "config.h.in"
"${CMAKE_BINARY_DIR}/configured_files/include/config.h" ESCAPE_QUOTES "${CMAKE_BINARY_DIR}/configured_files/include/bayesnet/config.h" ESCAPE_QUOTES
) )

Binary file not shown.

580
diagrams/BayesNet.puml Normal file
View File

@ -0,0 +1,580 @@
@startuml
title clang-uml class diagram model
class "bayesnet::Node" as C_0010428199432536647474
class C_0010428199432536647474 #aliceblue;line:blue;line.dotted;text:blue {
+Node(const std::string &) : void
..
+addChild(Node *) : void
+addParent(Node *) : void
+clear() : void
+computeCPT(const torch::Tensor & dataset, const std::vector<std::string> & features, const double smoothing, const torch::Tensor & weights) : void
+getCPT() : torch::Tensor &
+getChildren() : std::vector<Node *> &
+getFactorValue(std::map<std::string,int> &) : double
+getName() const : std::string
+getNumStates() const : int
+getParents() : std::vector<Node *> &
+graph(const std::string & clasName) : std::vector<std::string>
+minFill() : unsigned int
+removeChild(Node *) : void
+removeParent(Node *) : void
+setNumStates(int) : void
__
}
enum "bayesnet::Smoothing_t" as C_0013393078277439680282
enum C_0013393078277439680282 {
NONE
ORIGINAL
LAPLACE
CESTNIK
}
class "bayesnet::Network" as C_0009493661199123436603
class C_0009493661199123436603 #aliceblue;line:blue;line.dotted;text:blue {
+Network() : void
+Network(const Network &) : void
+~Network() = default : void
..
+addEdge(const std::string &, const std::string &) : void
+addNode(const std::string &) : void
+dump_cpt() const : std::string
+fit(const torch::Tensor & samples, const torch::Tensor & weights, const std::vector<std::string> & featureNames, const std::string & className, const std::map<std::string,std::vector<int>> & states, const Smoothing_t smoothing) : void
+fit(const torch::Tensor & X, const torch::Tensor & y, const torch::Tensor & weights, const std::vector<std::string> & featureNames, const std::string & className, const std::map<std::string,std::vector<int>> & states, const Smoothing_t smoothing) : void
+fit(const std::vector<std::vector<int>> & input_data, const std::vector<int> & labels, const std::vector<double> & weights, const std::vector<std::string> & featureNames, const std::string & className, const std::map<std::string,std::vector<int>> & states, const Smoothing_t smoothing) : void
+getClassName() const : std::string
+getClassNumStates() const : int
+getEdges() const : std::vector<std::pair<std::string,std::string>>
+getFeatures() const : std::vector<std::string>
+getNodes() : std::map<std::string,std::unique_ptr<Node>> &
+getNumEdges() const : int
+getSamples() : torch::Tensor &
+getStates() const : int
+graph(const std::string & title) const : std::vector<std::string>
+initialize() : void
+predict(const std::vector<std::vector<int>> &) : std::vector<int>
+predict(const torch::Tensor &) : torch::Tensor
+predict_proba(const std::vector<std::vector<int>> &) : std::vector<std::vector<double>>
+predict_proba(const torch::Tensor &) : torch::Tensor
+predict_tensor(const torch::Tensor & samples, const bool proba) : torch::Tensor
+score(const std::vector<std::vector<int>> &, const std::vector<int> &) : double
+show() const : std::vector<std::string>
+topological_sort() : std::vector<std::string>
+version() : std::string
__
}
enum "bayesnet::status_t" as C_0005907365846270811004
enum C_0005907365846270811004 {
NORMAL
WARNING
ERROR
}
abstract "bayesnet::BaseClassifier" as C_0002617087915615796317
abstract C_0002617087915615796317 #aliceblue;line:blue;line.dotted;text:blue {
+~BaseClassifier() = default : void
..
{abstract} +dump_cpt() const = 0 : std::string
{abstract} +fit(torch::Tensor & X, torch::Tensor & y, const std::vector<std::string> & features, const std::string & className, std::map<std::string,std::vector<int>> & states, const Smoothing_t smoothing) = 0 : BaseClassifier &
{abstract} +fit(torch::Tensor & dataset, const std::vector<std::string> & features, const std::string & className, std::map<std::string,std::vector<int>> & states, const Smoothing_t smoothing) = 0 : BaseClassifier &
{abstract} +fit(torch::Tensor & dataset, const std::vector<std::string> & features, const std::string & className, std::map<std::string,std::vector<int>> & states, const torch::Tensor & weights, const Smoothing_t smoothing) = 0 : BaseClassifier &
{abstract} +fit(std::vector<std::vector<int>> & X, std::vector<int> & y, const std::vector<std::string> & features, const std::string & className, std::map<std::string,std::vector<int>> & states, const Smoothing_t smoothing) = 0 : BaseClassifier &
{abstract} +getClassNumStates() const = 0 : int
{abstract} +getNotes() const = 0 : std::vector<std::string>
{abstract} +getNumberOfEdges() const = 0 : int
{abstract} +getNumberOfNodes() const = 0 : int
{abstract} +getNumberOfStates() const = 0 : int
{abstract} +getStatus() const = 0 : status_t
+getValidHyperparameters() : std::vector<std::string> &
{abstract} +getVersion() = 0 : std::string
{abstract} +graph(const std::string & title = "") const = 0 : std::vector<std::string>
{abstract} +predict(std::vector<std::vector<int>> & X) = 0 : std::vector<int>
{abstract} +predict(torch::Tensor & X) = 0 : torch::Tensor
{abstract} +predict_proba(std::vector<std::vector<int>> & X) = 0 : std::vector<std::vector<double>>
{abstract} +predict_proba(torch::Tensor & X) = 0 : torch::Tensor
{abstract} +score(std::vector<std::vector<int>> & X, std::vector<int> & y) = 0 : float
{abstract} +score(torch::Tensor & X, torch::Tensor & y) = 0 : float
{abstract} +setHyperparameters(const nlohmann::json & hyperparameters) = 0 : void
{abstract} +show() const = 0 : std::vector<std::string>
{abstract} +topological_order() = 0 : std::vector<std::string>
{abstract} #trainModel(const torch::Tensor & weights, const Smoothing_t smoothing) = 0 : void
__
#validHyperparameters : std::vector<std::string>
}
class "bayesnet::Metrics" as C_0005895723015084986588
class C_0005895723015084986588 #aliceblue;line:blue;line.dotted;text:blue {
+Metrics() = default : void
+Metrics(const torch::Tensor & samples, const std::vector<std::string> & features, const std::string & className, const int classNumStates) : void
+Metrics(const std::vector<std::vector<int>> & vsamples, const std::vector<int> & labels, const std::vector<std::string> & features, const std::string & className, const int classNumStates) : void
..
+SelectKBestWeighted(const torch::Tensor & weights, bool ascending = false, unsigned int k = 0) : std::vector<int>
+SelectKPairs(const torch::Tensor & weights, std::vector<int> & featuresExcluded, bool ascending = false, unsigned int k = 0) : std::vector<std::pair<int,int>>
+conditionalEdge(const torch::Tensor & weights) : torch::Tensor
+conditionalEntropy(const torch::Tensor & firstFeature, const torch::Tensor & secondFeature, const torch::Tensor & labels, const torch::Tensor & weights) : double
+conditionalMutualInformation(const torch::Tensor & firstFeature, const torch::Tensor & secondFeature, const torch::Tensor & labels, const torch::Tensor & weights) : double
#doCombinations<T>(const std::vector<T> & source) : std::vector<std::pair<T, T> >
+entropy(const torch::Tensor & feature, const torch::Tensor & weights) : double
+getScoresKBest() const : std::vector<double>
+getScoresKPairs() const : std::vector<std::pair<std::pair<int,int>,double>>
+maximumSpanningTree(const std::vector<std::string> & features, const torch::Tensor & weights, const int root) : std::vector<std::pair<int,int>>
+mutualInformation(const torch::Tensor & firstFeature, const torch::Tensor & secondFeature, const torch::Tensor & weights) : double
#pop_first<T>(std::vector<T> & v) : T
__
#className : std::string
#features : std::vector<std::string>
#samples : torch::Tensor
}
abstract "bayesnet::Classifier" as C_0016351972983202413152
abstract C_0016351972983202413152 #aliceblue;line:blue;line.dotted;text:blue {
+Classifier(Network model) : void
+~Classifier() = default : void
..
+addNodes() : void
#buildDataset(torch::Tensor & y) : void
{abstract} #buildModel(const torch::Tensor & weights) = 0 : void
#checkFitParameters() : void
+dump_cpt() const : std::string
+fit(torch::Tensor & X, torch::Tensor & y, const std::vector<std::string> & features, const std::string & className, std::map<std::string,std::vector<int>> & states, const Smoothing_t smoothing) : Classifier &
+fit(std::vector<std::vector<int>> & X, std::vector<int> & y, const std::vector<std::string> & features, const std::string & className, std::map<std::string,std::vector<int>> & states, const Smoothing_t smoothing) : Classifier &
+fit(torch::Tensor & dataset, const std::vector<std::string> & features, const std::string & className, std::map<std::string,std::vector<int>> & states, const Smoothing_t smoothing) : Classifier &
+fit(torch::Tensor & dataset, const std::vector<std::string> & features, const std::string & className, std::map<std::string,std::vector<int>> & states, const torch::Tensor & weights, const Smoothing_t smoothing) : Classifier &
+getClassNumStates() const : int
+getNotes() const : std::vector<std::string>
+getNumberOfEdges() const : int
+getNumberOfNodes() const : int
+getNumberOfStates() const : int
+getStatus() const : status_t
+getVersion() : std::string
+predict(std::vector<std::vector<int>> & X) : std::vector<int>
+predict(torch::Tensor & X) : torch::Tensor
+predict_proba(std::vector<std::vector<int>> & X) : std::vector<std::vector<double>>
+predict_proba(torch::Tensor & X) : torch::Tensor
+score(torch::Tensor & X, torch::Tensor & y) : float
+score(std::vector<std::vector<int>> & X, std::vector<int> & y) : float
+setHyperparameters(const nlohmann::json & hyperparameters) : void
+show() const : std::vector<std::string>
+topological_order() : std::vector<std::string>
#trainModel(const torch::Tensor & weights, const Smoothing_t smoothing) : void
__
#className : std::string
#dataset : torch::Tensor
#features : std::vector<std::string>
#fitted : bool
#m : unsigned int
#metrics : Metrics
#model : Network
#n : unsigned int
#notes : std::vector<std::string>
#states : std::map<std::string,std::vector<int>>
#status : status_t
}
class "bayesnet::KDB" as C_0008902920152122000044
class C_0008902920152122000044 #aliceblue;line:blue;line.dotted;text:blue {
+KDB(int k, float theta = 0.03) : void
+~KDB() = default : void
..
#buildModel(const torch::Tensor & weights) : void
+graph(const std::string & name = "KDB") const : std::vector<std::string>
+setHyperparameters(const nlohmann::json & hyperparameters_) : void
__
}
class "bayesnet::SPODE" as C_0004096182510460307610
class C_0004096182510460307610 #aliceblue;line:blue;line.dotted;text:blue {
+SPODE(int root) : void
+~SPODE() = default : void
..
#buildModel(const torch::Tensor & weights) : void
+graph(const std::string & name = "SPODE") const : std::vector<std::string>
__
}
class "bayesnet::SPnDE" as C_0016268916386101512883
class C_0016268916386101512883 #aliceblue;line:blue;line.dotted;text:blue {
+SPnDE(std::vector<int> parents) : void
+~SPnDE() = default : void
..
#buildModel(const torch::Tensor & weights) : void
+graph(const std::string & name = "SPnDE") const : std::vector<std::string>
__
}
class "bayesnet::TAN" as C_0014087955399074584137
class C_0014087955399074584137 #aliceblue;line:blue;line.dotted;text:blue {
+TAN() : void
+~TAN() = default : void
..
#buildModel(const torch::Tensor & weights) : void
+graph(const std::string & name = "TAN") const : std::vector<std::string>
__
}
class "bayesnet::Proposal" as C_0017759964713298103839
class C_0017759964713298103839 #aliceblue;line:blue;line.dotted;text:blue {
+Proposal(torch::Tensor & pDataset, std::vector<std::string> & features_, std::string & className_) : void
+~Proposal() : void
..
#checkInput(const torch::Tensor & X, const torch::Tensor & y) : void
#fit_local_discretization(const torch::Tensor & y) : std::map<std::string,std::vector<int>>
#localDiscretizationProposal(const std::map<std::string,std::vector<int>> & states, Network & model) : std::map<std::string,std::vector<int>>
#prepareX(torch::Tensor & X) : torch::Tensor
__
#Xf : torch::Tensor
#discretizers : map<std::string,mdlp::CPPFImdlp *>
#y : torch::Tensor
}
class "bayesnet::KDBLd" as C_0002756018222998454702
class C_0002756018222998454702 #aliceblue;line:blue;line.dotted;text:blue {
+KDBLd(int k) : void
+~KDBLd() = default : void
..
+fit(torch::Tensor & X, torch::Tensor & y, const std::vector<std::string> & features, const std::string & className, std::map<std::string,std::vector<int>> & states, const Smoothing_t smoothing) : KDBLd &
+graph(const std::string & name = "KDB") const : std::vector<std::string>
+predict(torch::Tensor & X) : torch::Tensor
{static} +version() : std::string
__
}
class "bayesnet::SPODELd" as C_0010957245114062042836
class C_0010957245114062042836 #aliceblue;line:blue;line.dotted;text:blue {
+SPODELd(int root) : void
+~SPODELd() = default : void
..
+commonFit(const std::vector<std::string> & features, const std::string & className, std::map<std::string,std::vector<int>> & states, const Smoothing_t smoothing) : SPODELd &
+fit(torch::Tensor & X, torch::Tensor & y, const std::vector<std::string> & features, const std::string & className, std::map<std::string,std::vector<int>> & states, const Smoothing_t smoothing) : SPODELd &
+fit(torch::Tensor & dataset, const std::vector<std::string> & features, const std::string & className, std::map<std::string,std::vector<int>> & states, const Smoothing_t smoothing) : SPODELd &
+graph(const std::string & name = "SPODELd") const : std::vector<std::string>
+predict(torch::Tensor & X) : torch::Tensor
{static} +version() : std::string
__
}
class "bayesnet::TANLd" as C_0013350632773616302678
class C_0013350632773616302678 #aliceblue;line:blue;line.dotted;text:blue {
+TANLd() : void
+~TANLd() = default : void
..
+fit(torch::Tensor & X, torch::Tensor & y, const std::vector<std::string> & features, const std::string & className, std::map<std::string,std::vector<int>> & states, const Smoothing_t smoothing) : TANLd &
+graph(const std::string & name = "TANLd") const : std::vector<std::string>
+predict(torch::Tensor & X) : torch::Tensor
__
}
class "bayesnet::Ensemble" as C_0015881931090842884611
class C_0015881931090842884611 #aliceblue;line:blue;line.dotted;text:blue {
+Ensemble(bool predict_voting = true) : void
+~Ensemble() = default : void
..
#compute_arg_max(std::vector<std::vector<double>> & X) : std::vector<int>
#compute_arg_max(torch::Tensor & X) : torch::Tensor
+dump_cpt() const : std::string
+getNumberOfEdges() const : int
+getNumberOfNodes() const : int
+getNumberOfStates() const : int
+graph(const std::string & title) const : std::vector<std::string>
+predict(std::vector<std::vector<int>> & X) : std::vector<int>
+predict(torch::Tensor & X) : torch::Tensor
#predict_average_proba(torch::Tensor & X) : torch::Tensor
#predict_average_proba(std::vector<std::vector<int>> & X) : std::vector<std::vector<double>>
#predict_average_voting(torch::Tensor & X) : torch::Tensor
#predict_average_voting(std::vector<std::vector<int>> & X) : std::vector<std::vector<double>>
+predict_proba(std::vector<std::vector<int>> & X) : std::vector<std::vector<double>>
+predict_proba(torch::Tensor & X) : torch::Tensor
+score(std::vector<std::vector<int>> & X, std::vector<int> & y) : float
+score(torch::Tensor & X, torch::Tensor & y) : float
+show() const : std::vector<std::string>
+topological_order() : std::vector<std::string>
#trainModel(const torch::Tensor & weights, const Smoothing_t smoothing) : void
#voting(torch::Tensor & votes) : torch::Tensor
__
#models : std::vector<std::unique_ptr<Classifier>>
#n_models : unsigned int
#predict_voting : bool
#significanceModels : std::vector<double>
}
class "bayesnet::A2DE" as C_0001410789567057647859
class C_0001410789567057647859 #aliceblue;line:blue;line.dotted;text:blue {
+A2DE(bool predict_voting = false) : void
+~A2DE() : void
..
#buildModel(const torch::Tensor & weights) : void
+graph(const std::string & title = "A2DE") const : std::vector<std::string>
+setHyperparameters(const nlohmann::json & hyperparameters) : void
__
}
class "bayesnet::AODE" as C_0006288892608974306258
class C_0006288892608974306258 #aliceblue;line:blue;line.dotted;text:blue {
+AODE(bool predict_voting = false) : void
+~AODE() : void
..
#buildModel(const torch::Tensor & weights) : void
+graph(const std::string & title = "AODE") const : std::vector<std::string>
+setHyperparameters(const nlohmann::json & hyperparameters) : void
__
}
abstract "bayesnet::FeatureSelect" as C_0013562609546004646591
abstract C_0013562609546004646591 #aliceblue;line:blue;line.dotted;text:blue {
+FeatureSelect(const torch::Tensor & samples, const std::vector<std::string> & features, const std::string & className, const int maxFeatures, const int classNumStates, const torch::Tensor & weights) : void
+~FeatureSelect() : void
..
#computeMeritCFS() : double
#computeSuFeatures(const int a, const int b) : double
#computeSuLabels() : void
{abstract} +fit() = 0 : void
+getFeatures() const : std::vector<int>
+getScores() const : std::vector<double>
#initialize() : void
#symmetricalUncertainty(int a, int b) : double
__
#fitted : bool
#maxFeatures : int
#selectedFeatures : std::vector<int>
#selectedScores : std::vector<double>
#suFeatures : std::map<std::pair<int,int>,double>
#suLabels : std::vector<double>
#weights : const torch::Tensor &
}
class "bayesnet::(anonymous_60342586)" as C_0005584545181746538542
class C_0005584545181746538542 #aliceblue;line:blue;line.dotted;text:blue {
__
+CFS : std::string
+FCBF : std::string
+IWSS : std::string
}
class "bayesnet::(anonymous_60343240)" as C_0016227156982041949444
class C_0016227156982041949444 #aliceblue;line:blue;line.dotted;text:blue {
__
+ASC : std::string
+DESC : std::string
+RAND : std::string
}
class "bayesnet::Boost" as C_0009819322948617116148
class C_0009819322948617116148 #aliceblue;line:blue;line.dotted;text:blue {
+Boost(bool predict_voting = false) : void
+~Boost() = default : void
..
#buildModel(const torch::Tensor & weights) : void
#featureSelection(torch::Tensor & weights_) : std::vector<int>
+setHyperparameters(const nlohmann::json & hyperparameters_) : void
#update_weights(torch::Tensor & ytrain, torch::Tensor & ypred, torch::Tensor & weights) : std::tuple<torch::Tensor &,double,bool>
#update_weights_block(int k, torch::Tensor & ytrain, torch::Tensor & weights) : std::tuple<torch::Tensor &,double,bool>
__
#X_test : torch::Tensor
#X_train : torch::Tensor
#bisection : bool
#block_update : bool
#convergence : bool
#convergence_best : bool
#featureSelector : FeatureSelect *
#maxTolerance : int
#order_algorithm : std::string
#selectFeatures : bool
#select_features_algorithm : std::string
#threshold : double
#y_test : torch::Tensor
#y_train : torch::Tensor
}
class "bayesnet::AODELd" as C_0003898187834670349177
class C_0003898187834670349177 #aliceblue;line:blue;line.dotted;text:blue {
+AODELd(bool predict_voting = true) : void
+~AODELd() = default : void
..
#buildModel(const torch::Tensor & weights) : void
+fit(torch::Tensor & X_, torch::Tensor & y_, const std::vector<std::string> & features_, const std::string & className_, std::map<std::string,std::vector<int>> & states_, const Smoothing_t smoothing) : AODELd &
+graph(const std::string & name = "AODELd") const : std::vector<std::string>
#trainModel(const torch::Tensor & weights, const Smoothing_t smoothing) : void
__
}
class "bayesnet::(anonymous_60275628)" as C_0009086919615463763584
class C_0009086919615463763584 #aliceblue;line:blue;line.dotted;text:blue {
__
+CFS : std::string
+FCBF : std::string
+IWSS : std::string
}
class "bayesnet::(anonymous_60276282)" as C_0015251985607563196159
class C_0015251985607563196159 #aliceblue;line:blue;line.dotted;text:blue {
__
+ASC : std::string
+DESC : std::string
+RAND : std::string
}
class "bayesnet::BoostA2DE" as C_0000272055465257861326
class C_0000272055465257861326 #aliceblue;line:blue;line.dotted;text:blue {
+BoostA2DE(bool predict_voting = false) : void
+~BoostA2DE() = default : void
..
+graph(const std::string & title = "BoostA2DE") const : std::vector<std::string>
#trainModel(const torch::Tensor & weights, const Smoothing_t smoothing) : void
__
}
class "bayesnet::(anonymous_60275502)" as C_0016033655851510053155
class C_0016033655851510053155 #aliceblue;line:blue;line.dotted;text:blue {
__
+CFS : std::string
+FCBF : std::string
+IWSS : std::string
}
class "bayesnet::(anonymous_60276156)" as C_0000379522761622473555
class C_0000379522761622473555 #aliceblue;line:blue;line.dotted;text:blue {
__
+ASC : std::string
+DESC : std::string
+RAND : std::string
}
class "bayesnet::BoostAODE" as C_0002867772739198819061
class C_0002867772739198819061 #aliceblue;line:blue;line.dotted;text:blue {
+BoostAODE(bool predict_voting = false) : void
+~BoostAODE() = default : void
..
+graph(const std::string & title = "BoostAODE") const : std::vector<std::string>
#trainModel(const torch::Tensor & weights, const Smoothing_t smoothing) : void
__
}
class "bayesnet::CFS" as C_0000093018845530739957
class C_0000093018845530739957 #aliceblue;line:blue;line.dotted;text:blue {
+CFS(const torch::Tensor & samples, const std::vector<std::string> & features, const std::string & className, const int maxFeatures, const int classNumStates, const torch::Tensor & weights) : void
+~CFS() : void
..
+fit() : void
__
}
class "bayesnet::FCBF" as C_0001157456122733975432
class C_0001157456122733975432 #aliceblue;line:blue;line.dotted;text:blue {
+FCBF(const torch::Tensor & samples, const std::vector<std::string> & features, const std::string & className, const int maxFeatures, const int classNumStates, const torch::Tensor & weights, const double threshold) : void
+~FCBF() : void
..
+fit() : void
__
}
class "bayesnet::IWSS" as C_0000066148117395428429
class C_0000066148117395428429 #aliceblue;line:blue;line.dotted;text:blue {
+IWSS(const torch::Tensor & samples, const std::vector<std::string> & features, const std::string & className, const int maxFeatures, const int classNumStates, const torch::Tensor & weights, const double threshold) : void
+~IWSS() : void
..
+fit() : void
__
}
class "bayesnet::(anonymous_60730495)" as C_0004857727320042830573
class C_0004857727320042830573 #aliceblue;line:blue;line.dotted;text:blue {
__
+CFS : std::string
+FCBF : std::string
+IWSS : std::string
}
class "bayesnet::(anonymous_60731150)" as C_0000076541533312623385
class C_0000076541533312623385 #aliceblue;line:blue;line.dotted;text:blue {
__
+ASC : std::string
+DESC : std::string
+RAND : std::string
}
class "bayesnet::(anonymous_60653004)" as C_0001444063444142949758
class C_0001444063444142949758 #aliceblue;line:blue;line.dotted;text:blue {
__
+CFS : std::string
+FCBF : std::string
+IWSS : std::string
}
class "bayesnet::(anonymous_60653658)" as C_0007139277546931322856
class C_0007139277546931322856 #aliceblue;line:blue;line.dotted;text:blue {
__
+ASC : std::string
+DESC : std::string
+RAND : std::string
}
class "bayesnet::(anonymous_60731375)" as C_0010493853592456211189
class C_0010493853592456211189 #aliceblue;line:blue;line.dotted;text:blue {
__
+CFS : std::string
+FCBF : std::string
+IWSS : std::string
}
class "bayesnet::(anonymous_60732030)" as C_0007011438637915849564
class C_0007011438637915849564 #aliceblue;line:blue;line.dotted;text:blue {
__
+ASC : std::string
+DESC : std::string
+RAND : std::string
}
class "bayesnet::MST" as C_0001054867409378333602
class C_0001054867409378333602 #aliceblue;line:blue;line.dotted;text:blue {
+MST() = default : void
+MST(const std::vector<std::string> & features, const torch::Tensor & weights, const int root) : void
..
+insertElement(std::list<int> & variables, int variable) : void
+maximumSpanningTree() : std::vector<std::pair<int,int>>
+reorder(std::vector<std::pair<float,std::pair<int,int>>> T, int root_original) : std::vector<std::pair<int,int>>
__
}
class "bayesnet::Graph" as C_0009576333456015187741
class C_0009576333456015187741 #aliceblue;line:blue;line.dotted;text:blue {
+Graph(int V) : void
..
+addEdge(int u, int v, float wt) : void
+find_set(int i) : int
+get_mst() : std::vector<std::pair<float,std::pair<int,int>>>
+kruskal_algorithm() : void
+union_set(int u, int v) : void
__
}
C_0010428199432536647474 --> C_0010428199432536647474 : -parents
C_0010428199432536647474 --> C_0010428199432536647474 : -children
C_0009493661199123436603 ..> C_0013393078277439680282
C_0009493661199123436603 o-- C_0010428199432536647474 : -nodes
C_0002617087915615796317 ..> C_0013393078277439680282
C_0002617087915615796317 ..> C_0005907365846270811004
C_0016351972983202413152 ..> C_0013393078277439680282
C_0016351972983202413152 o-- C_0009493661199123436603 : #model
C_0016351972983202413152 o-- C_0005895723015084986588 : #metrics
C_0016351972983202413152 o-- C_0005907365846270811004 : #status
C_0002617087915615796317 <|-- C_0016351972983202413152
C_0016351972983202413152 <|-- C_0008902920152122000044
C_0016351972983202413152 <|-- C_0004096182510460307610
C_0016351972983202413152 <|-- C_0016268916386101512883
C_0016351972983202413152 <|-- C_0014087955399074584137
C_0017759964713298103839 ..> C_0009493661199123436603
C_0002756018222998454702 ..> C_0013393078277439680282
C_0008902920152122000044 <|-- C_0002756018222998454702
C_0017759964713298103839 <|-- C_0002756018222998454702
C_0010957245114062042836 ..> C_0013393078277439680282
C_0004096182510460307610 <|-- C_0010957245114062042836
C_0017759964713298103839 <|-- C_0010957245114062042836
C_0013350632773616302678 ..> C_0013393078277439680282
C_0014087955399074584137 <|-- C_0013350632773616302678
C_0017759964713298103839 <|-- C_0013350632773616302678
C_0015881931090842884611 ..> C_0013393078277439680282
C_0015881931090842884611 o-- C_0016351972983202413152 : #models
C_0016351972983202413152 <|-- C_0015881931090842884611
C_0015881931090842884611 <|-- C_0001410789567057647859
C_0015881931090842884611 <|-- C_0006288892608974306258
C_0005895723015084986588 <|-- C_0013562609546004646591
C_0009819322948617116148 --> C_0013562609546004646591 : #featureSelector
C_0015881931090842884611 <|-- C_0009819322948617116148
C_0003898187834670349177 ..> C_0013393078277439680282
C_0015881931090842884611 <|-- C_0003898187834670349177
C_0017759964713298103839 <|-- C_0003898187834670349177
C_0000272055465257861326 ..> C_0013393078277439680282
C_0009819322948617116148 <|-- C_0000272055465257861326
C_0002867772739198819061 ..> C_0013393078277439680282
C_0009819322948617116148 <|-- C_0002867772739198819061
C_0013562609546004646591 <|-- C_0000093018845530739957
C_0013562609546004646591 <|-- C_0001157456122733975432
C_0013562609546004646591 <|-- C_0000066148117395428429
'Generated with clang-uml, version 0.5.5
'LLVM version clang version 18.1.8 (Fedora 18.1.8-5.fc41)
@enduml

1
diagrams/BayesNet.svg Normal file

File diff suppressed because one or more lines are too long

After

Width:  |  Height:  |  Size: 196 KiB

314
diagrams/dependency.svg Normal file
View File

@ -0,0 +1,314 @@
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN"
"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<!-- Generated by graphviz version 12.1.0 (20240811.2233)
-->
<!-- Title: BayesNet Pages: 1 -->
<svg width="3725pt" height="432pt"
viewBox="0.00 0.00 3724.84 431.80" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
<g id="graph0" class="graph" transform="scale(1 1) rotate(0) translate(4 427.8)">
<title>BayesNet</title>
<polygon fill="white" stroke="none" points="-4,4 -4,-427.8 3720.84,-427.8 3720.84,4 -4,4"/>
<!-- node0 -->
<g id="node1" class="node">
<title>node0</title>
<polygon fill="none" stroke="black" points="1655.43,-398.35 1655.43,-413.26 1625.69,-423.8 1583.63,-423.8 1553.89,-413.26 1553.89,-398.35 1583.63,-387.8 1625.69,-387.8 1655.43,-398.35"/>
<text text-anchor="middle" x="1604.66" y="-401.53" font-family="Times,serif" font-size="12.00">BayesNet</text>
</g>
<!-- node1 -->
<g id="node2" class="node">
<title>node1</title>
<polygon fill="none" stroke="black" points="413.32,-257.8 372.39,-273.03 206.66,-279.8 40.93,-273.03 0,-257.8 114.69,-245.59 298.64,-245.59 413.32,-257.8"/>
<text text-anchor="middle" x="206.66" y="-257.53" font-family="Times,serif" font-size="12.00">/home/rmontanana/Code/libtorch/lib/libc10.so</text>
</g>
<!-- node0&#45;&gt;node1 -->
<g id="edge1" class="edge">
<title>node0&#45;&gt;node1</title>
<path fill="none" stroke="black" d="M1553.59,-400.53C1451.65,-391.91 1215.69,-371.61 1017.66,-351.8 773.36,-327.37 488.07,-295.22 329.31,-277.01"/>
<polygon fill="black" stroke="black" points="329.93,-273.56 319.6,-275.89 329.14,-280.51 329.93,-273.56"/>
</g>
<!-- node2 -->
<g id="node3" class="node">
<title>node2</title>
<polygon fill="none" stroke="black" points="894.21,-257.8 848.35,-273.03 662.66,-279.8 476.98,-273.03 431.12,-257.8 559.61,-245.59 765.71,-245.59 894.21,-257.8"/>
<text text-anchor="middle" x="662.66" y="-257.53" font-family="Times,serif" font-size="12.00">/home/rmontanana/Code/libtorch/lib/libc10_cuda.so</text>
</g>
<!-- node0&#45;&gt;node2 -->
<g id="edge2" class="edge">
<title>node0&#45;&gt;node2</title>
<path fill="none" stroke="black" d="M1555.34,-397.37C1408.12,-375.18 969.52,-309.06 767.13,-278.55"/>
<polygon fill="black" stroke="black" points="767.81,-275.12 757.4,-277.09 766.77,-282.04 767.81,-275.12"/>
</g>
<!-- node3 -->
<g id="node4" class="node">
<title>node3</title>
<polygon fill="none" stroke="black" points="1338.68,-257.8 1296.49,-273.03 1125.66,-279.8 954.84,-273.03 912.65,-257.8 1030.86,-245.59 1220.46,-245.59 1338.68,-257.8"/>
<text text-anchor="middle" x="1125.66" y="-257.53" font-family="Times,serif" font-size="12.00">/home/rmontanana/Code/libtorch/lib/libkineto.a</text>
</g>
<!-- node0&#45;&gt;node3 -->
<g id="edge3" class="edge">
<title>node0&#45;&gt;node3</title>
<path fill="none" stroke="black" d="M1566.68,-393.54C1484.46,-369.17 1289.3,-311.32 1188.44,-281.41"/>
<polygon fill="black" stroke="black" points="1189.53,-278.09 1178.95,-278.6 1187.54,-284.8 1189.53,-278.09"/>
</g>
<!-- node4 -->
<g id="node5" class="node">
<title>node4</title>
<polygon fill="none" stroke="black" points="1552.26,-257.8 1532.93,-273.03 1454.66,-279.8 1376.4,-273.03 1357.07,-257.8 1411.23,-245.59 1498.1,-245.59 1552.26,-257.8"/>
<text text-anchor="middle" x="1454.66" y="-257.53" font-family="Times,serif" font-size="12.00">/usr/lib64/libcuda.so</text>
</g>
<!-- node0&#45;&gt;node4 -->
<g id="edge4" class="edge">
<title>node0&#45;&gt;node4</title>
<path fill="none" stroke="black" d="M1586.27,-387.39C1559.5,-362.05 1509.72,-314.92 1479.65,-286.46"/>
<polygon fill="black" stroke="black" points="1482.13,-283.99 1472.46,-279.65 1477.31,-289.07 1482.13,-283.99"/>
</g>
<!-- node5 -->
<g id="node6" class="node">
<title>node5</title>
<polygon fill="none" stroke="black" points="1873.26,-257.8 1843.23,-273.03 1721.66,-279.8 1600.09,-273.03 1570.06,-257.8 1654.19,-245.59 1789.13,-245.59 1873.26,-257.8"/>
<text text-anchor="middle" x="1721.66" y="-257.53" font-family="Times,serif" font-size="12.00">/usr/local/cuda/lib64/libcudart.so</text>
</g>
<!-- node0&#45;&gt;node5 -->
<g id="edge5" class="edge">
<title>node0&#45;&gt;node5</title>
<path fill="none" stroke="black" d="M1619.76,-387.77C1628.83,-377.46 1640.53,-363.98 1650.66,-351.8 1668.32,-330.59 1687.84,-306.03 1701.94,-288.1"/>
<polygon fill="black" stroke="black" points="1704.43,-290.59 1707.84,-280.56 1698.92,-286.27 1704.43,-290.59"/>
</g>
<!-- node6 -->
<g id="node7" class="node">
<title>node6</title>
<polygon fill="none" stroke="black" points="2231.79,-257.8 2198.1,-273.03 2061.66,-279.8 1925.23,-273.03 1891.53,-257.8 1985.95,-245.59 2137.38,-245.59 2231.79,-257.8"/>
<text text-anchor="middle" x="2061.66" y="-257.53" font-family="Times,serif" font-size="12.00">/usr/local/cuda/lib64/libnvToolsExt.so</text>
</g>
<!-- node0&#45;&gt;node6 -->
<g id="edge6" class="edge">
<title>node0&#45;&gt;node6</title>
<path fill="none" stroke="black" d="M1642.06,-393.18C1721.31,-368.56 1906.71,-310.95 2002.32,-281.24"/>
<polygon fill="black" stroke="black" points="2003.28,-284.61 2011.79,-278.3 2001.21,-277.92 2003.28,-284.61"/>
</g>
<!-- node7 -->
<g id="node8" class="node">
<title>node7</title>
<polygon fill="none" stroke="black" points="2541.44,-257.8 2512.56,-273.03 2395.66,-279.8 2278.76,-273.03 2249.89,-257.8 2330.79,-245.59 2460.54,-245.59 2541.44,-257.8"/>
<text text-anchor="middle" x="2395.66" y="-257.53" font-family="Times,serif" font-size="12.00">/usr/local/cuda/lib64/libnvrtc.so</text>
</g>
<!-- node0&#45;&gt;node7 -->
<g id="edge7" class="edge">
<title>node0&#45;&gt;node7</title>
<path fill="none" stroke="black" d="M1651.19,-396.45C1780.36,-373.26 2144.76,-307.85 2311.05,-277.99"/>
<polygon fill="black" stroke="black" points="2311.47,-281.47 2320.7,-276.26 2310.24,-274.58 2311.47,-281.47"/>
</g>
<!-- node8 -->
<g id="node9" class="node">
<title>node8</title>
<polygon fill="none" stroke="black" points="1642.01,-326.35 1642.01,-341.26 1620.13,-351.8 1589.19,-351.8 1567.31,-341.26 1567.31,-326.35 1589.19,-315.8 1620.13,-315.8 1642.01,-326.35"/>
<text text-anchor="middle" x="1604.66" y="-329.53" font-family="Times,serif" font-size="12.00">fimdlp</text>
</g>
<!-- node0&#45;&gt;node8 -->
<g id="edge8" class="edge">
<title>node0&#45;&gt;node8</title>
<path fill="none" stroke="black" d="M1604.66,-387.5C1604.66,-380.21 1604.66,-371.53 1604.66,-363.34"/>
<polygon fill="black" stroke="black" points="1608.16,-363.42 1604.66,-353.42 1601.16,-363.42 1608.16,-363.42"/>
</g>
<!-- node19 -->
<g id="node10" class="node">
<title>node19</title>
<polygon fill="none" stroke="black" points="2709.74,-267.37 2634.66,-279.8 2559.58,-267.37 2588.26,-247.24 2681.06,-247.24 2709.74,-267.37"/>
<text text-anchor="middle" x="2634.66" y="-257.53" font-family="Times,serif" font-size="12.00">torch_library</text>
</g>
<!-- node0&#45;&gt;node19 -->
<g id="edge29" class="edge">
<title>node0&#45;&gt;node19</title>
<path fill="none" stroke="black" d="M1655.87,-399.32C1798.23,-383.79 2210.64,-336.94 2550.66,-279.8 2559.43,-278.33 2568.68,-276.62 2577.72,-274.86"/>
<polygon fill="black" stroke="black" points="2578.38,-278.3 2587.5,-272.92 2577.01,-271.43 2578.38,-278.3"/>
</g>
<!-- node8&#45;&gt;node1 -->
<g id="edge9" class="edge">
<title>node8&#45;&gt;node1</title>
<path fill="none" stroke="black" d="M1566.84,-331.58C1419.81,-326.72 872.06,-307.69 421.66,-279.8 401.07,-278.53 379.38,-277.02 358.03,-275.43"/>
<polygon fill="black" stroke="black" points="358.3,-271.94 348.06,-274.67 357.77,-278.92 358.3,-271.94"/>
</g>
<!-- node8&#45;&gt;node2 -->
<g id="edge10" class="edge">
<title>node8&#45;&gt;node2</title>
<path fill="none" stroke="black" d="M1566.86,-330C1445.11,-320.95 1057.97,-292.18 831.67,-275.36"/>
<polygon fill="black" stroke="black" points="832.09,-271.89 821.86,-274.63 831.57,-278.87 832.09,-271.89"/>
</g>
<!-- node8&#45;&gt;node3 -->
<g id="edge11" class="edge">
<title>node8&#45;&gt;node3</title>
<path fill="none" stroke="black" d="M1567.08,-327.31C1495.4,-316.84 1336.86,-293.67 1230.62,-278.14"/>
<polygon fill="black" stroke="black" points="1231.44,-274.72 1221.04,-276.74 1230.42,-281.65 1231.44,-274.72"/>
</g>
<!-- node8&#45;&gt;node4 -->
<g id="edge12" class="edge">
<title>node8&#45;&gt;node4</title>
<path fill="none" stroke="black" d="M1578.53,-320.61C1555.96,-310.08 1522.92,-294.66 1496.64,-282.4"/>
<polygon fill="black" stroke="black" points="1498.12,-279.22 1487.58,-278.17 1495.16,-285.57 1498.12,-279.22"/>
</g>
<!-- node8&#45;&gt;node5 -->
<g id="edge13" class="edge">
<title>node8&#45;&gt;node5</title>
<path fill="none" stroke="black" d="M1627.78,-318.97C1644.15,-309.18 1666.44,-295.84 1685.2,-284.62"/>
<polygon fill="black" stroke="black" points="1686.83,-287.73 1693.61,-279.59 1683.23,-281.72 1686.83,-287.73"/>
</g>
<!-- node8&#45;&gt;node6 -->
<g id="edge14" class="edge">
<title>node8&#45;&gt;node6</title>
<path fill="none" stroke="black" d="M1642.45,-327.02C1712.36,-316.31 1863.89,-293.1 1964.32,-277.71"/>
<polygon fill="black" stroke="black" points="1964.84,-281.18 1974.2,-276.2 1963.78,-274.26 1964.84,-281.18"/>
</g>
<!-- node8&#45;&gt;node7 -->
<g id="edge15" class="edge">
<title>node8&#45;&gt;node7</title>
<path fill="none" stroke="black" d="M1642.33,-330.01C1740.75,-322.64 2013.75,-301.7 2240.66,-279.8 2254.16,-278.5 2268.32,-277.06 2282.35,-275.58"/>
<polygon fill="black" stroke="black" points="2282.49,-279.08 2292.06,-274.54 2281.75,-272.12 2282.49,-279.08"/>
</g>
<!-- node8&#45;&gt;node19 -->
<g id="edge16" class="edge">
<title>node8&#45;&gt;node19</title>
<path fill="none" stroke="black" d="M1642.25,-332.63C1770.06,-331.64 2199.48,-324.94 2550.66,-279.8 2560.1,-278.59 2570.07,-276.92 2579.71,-275.1"/>
<polygon fill="black" stroke="black" points="2580.21,-278.57 2589.34,-273.21 2578.86,-271.7 2580.21,-278.57"/>
</g>
<!-- node20 -->
<g id="node11" class="node">
<title>node20</title>
<polygon fill="none" stroke="black" points="2606.81,-185.8 2533.89,-201.03 2238.66,-207.8 1943.43,-201.03 1870.52,-185.8 2074.82,-173.59 2402.5,-173.59 2606.81,-185.8"/>
<text text-anchor="middle" x="2238.66" y="-185.53" font-family="Times,serif" font-size="12.00">&#45;Wl,&#45;&#45;no&#45;as&#45;needed,&quot;/home/rmontanana/Code/libtorch/lib/libtorch.so&quot; &#45;Wl,&#45;&#45;as&#45;needed</text>
</g>
<!-- node19&#45;&gt;node20 -->
<g id="edge17" class="edge">
<title>node19&#45;&gt;node20</title>
<path fill="none" stroke="black" stroke-dasharray="5,2" d="M2583.63,-250.21C2572.76,-248.03 2561.34,-245.79 2550.66,-243.8 2482.14,-231.05 2404.92,-217.93 2344.44,-207.93"/>
<polygon fill="black" stroke="black" points="2345.28,-204.52 2334.84,-206.34 2344.14,-211.42 2345.28,-204.52"/>
</g>
<!-- node9 -->
<g id="node12" class="node">
<title>node9</title>
<polygon fill="none" stroke="black" points="2542.56,-123.37 2445.66,-135.8 2348.77,-123.37 2385.78,-103.24 2505.55,-103.24 2542.56,-123.37"/>
<text text-anchor="middle" x="2445.66" y="-113.53" font-family="Times,serif" font-size="12.00">torch_cpu_library</text>
</g>
<!-- node19&#45;&gt;node9 -->
<g id="edge18" class="edge">
<title>node19&#45;&gt;node9</title>
<path fill="none" stroke="black" stroke-dasharray="5,2" d="M2635.72,-246.84C2636.4,-227.49 2634.61,-192.58 2615.66,-171.8 2601.13,-155.87 2551.93,-141.56 2510.18,-131.84"/>
<polygon fill="black" stroke="black" points="2511.2,-128.48 2500.67,-129.68 2509.65,-135.31 2511.2,-128.48"/>
</g>
<!-- node13 -->
<g id="node16" class="node">
<title>node13</title>
<polygon fill="none" stroke="black" points="3056.45,-195.37 2953.66,-207.8 2850.87,-195.37 2890.13,-175.24 3017.19,-175.24 3056.45,-195.37"/>
<text text-anchor="middle" x="2953.66" y="-185.53" font-family="Times,serif" font-size="12.00">torch_cuda_library</text>
</g>
<!-- node19&#45;&gt;node13 -->
<g id="edge22" class="edge">
<title>node19&#45;&gt;node13</title>
<path fill="none" stroke="black" stroke-dasharray="5,2" d="M2685.21,-249.71C2741.11,-237.45 2831.21,-217.67 2891.42,-204.46"/>
<polygon fill="black" stroke="black" points="2891.8,-207.96 2900.82,-202.4 2890.3,-201.13 2891.8,-207.96"/>
</g>
<!-- node10 -->
<g id="node13" class="node">
<title>node10</title>
<polygon fill="none" stroke="black" points="2362.4,-27.9 2285.6,-43.12 1974.66,-49.9 1663.72,-43.12 1586.93,-27.9 1802.1,-15.68 2147.22,-15.68 2362.4,-27.9"/>
<text text-anchor="middle" x="1974.66" y="-27.63" font-family="Times,serif" font-size="12.00">&#45;Wl,&#45;&#45;no&#45;as&#45;needed,&quot;/home/rmontanana/Code/libtorch/lib/libtorch_cpu.so&quot; &#45;Wl,&#45;&#45;as&#45;needed</text>
</g>
<!-- node9&#45;&gt;node10 -->
<g id="edge19" class="edge">
<title>node9&#45;&gt;node10</title>
<path fill="none" stroke="black" stroke-dasharray="5,2" d="M2381.16,-105.31C2301.63,-91.15 2165.65,-66.92 2073.05,-50.43"/>
<polygon fill="black" stroke="black" points="2073.93,-47.03 2063.48,-48.72 2072.71,-53.92 2073.93,-47.03"/>
</g>
<!-- node11 -->
<g id="node14" class="node">
<title>node11</title>
<polygon fill="none" stroke="black" points="2510.72,-37.46 2445.66,-49.9 2380.61,-37.46 2405.46,-17.34 2485.87,-17.34 2510.72,-37.46"/>
<text text-anchor="middle" x="2445.66" y="-27.63" font-family="Times,serif" font-size="12.00">caffe2::mkl</text>
</g>
<!-- node9&#45;&gt;node11 -->
<g id="edge20" class="edge">
<title>node9&#45;&gt;node11</title>
<path fill="none" stroke="black" stroke-dasharray="5,2" d="M2445.66,-102.95C2445.66,-91.68 2445.66,-75.4 2445.66,-61.37"/>
<polygon fill="black" stroke="black" points="2449.16,-61.78 2445.66,-51.78 2442.16,-61.78 2449.16,-61.78"/>
</g>
<!-- node12 -->
<g id="node15" class="node">
<title>node12</title>
<polygon fill="none" stroke="black" points="2794.95,-41.76 2661.66,-63.8 2528.37,-41.76 2579.28,-6.09 2744.04,-6.09 2794.95,-41.76"/>
<text text-anchor="middle" x="2661.66" y="-34.75" font-family="Times,serif" font-size="12.00">dummy</text>
<text text-anchor="middle" x="2661.66" y="-20.5" font-family="Times,serif" font-size="12.00">(protobuf::libprotobuf)</text>
</g>
<!-- node9&#45;&gt;node12 -->
<g id="edge21" class="edge">
<title>node9&#45;&gt;node12</title>
<path fill="none" stroke="black" stroke-dasharray="5,2" d="M2481.82,-102.76C2512.55,-90.82 2557.5,-73.36 2594.77,-58.89"/>
<polygon fill="black" stroke="black" points="2595.6,-62.32 2603.65,-55.44 2593.06,-55.79 2595.6,-62.32"/>
</g>
<!-- node13&#45;&gt;node9 -->
<g id="edge28" class="edge">
<title>node13&#45;&gt;node9</title>
<path fill="none" stroke="black" stroke-dasharray="5,2" d="M2880.59,-179.79C2799.97,-169.71 2666.42,-152.57 2551.66,-135.8 2540.2,-134.13 2528.06,-132.27 2516.24,-130.41"/>
<polygon fill="black" stroke="black" points="2516.96,-126.98 2506.54,-128.86 2515.87,-133.89 2516.96,-126.98"/>
</g>
<!-- node14 -->
<g id="node17" class="node">
<title>node14</title>
<polygon fill="none" stroke="black" points="3346.69,-113.8 3268.85,-129.03 2953.66,-135.8 2638.48,-129.03 2560.63,-113.8 2778.75,-101.59 3128.58,-101.59 3346.69,-113.8"/>
<text text-anchor="middle" x="2953.66" y="-113.53" font-family="Times,serif" font-size="12.00">&#45;Wl,&#45;&#45;no&#45;as&#45;needed,&quot;/home/rmontanana/Code/libtorch/lib/libtorch_cuda.so&quot; &#45;Wl,&#45;&#45;as&#45;needed</text>
</g>
<!-- node13&#45;&gt;node14 -->
<g id="edge23" class="edge">
<title>node13&#45;&gt;node14</title>
<path fill="none" stroke="black" stroke-dasharray="5,2" d="M2953.66,-174.97C2953.66,-167.13 2953.66,-157.01 2953.66,-147.53"/>
<polygon fill="black" stroke="black" points="2957.16,-147.59 2953.66,-137.59 2950.16,-147.59 2957.16,-147.59"/>
</g>
<!-- node15 -->
<g id="node18" class="node">
<title>node15</title>
<polygon fill="none" stroke="black" points="3514.74,-123.37 3439.66,-135.8 3364.58,-123.37 3393.26,-103.24 3486.06,-103.24 3514.74,-123.37"/>
<text text-anchor="middle" x="3439.66" y="-113.53" font-family="Times,serif" font-size="12.00">torch::cudart</text>
</g>
<!-- node13&#45;&gt;node15 -->
<g id="edge24" class="edge">
<title>node13&#45;&gt;node15</title>
<path fill="none" stroke="black" stroke-dasharray="5,2" d="M3028.35,-180.51C3109.24,-171.17 3241.96,-154.78 3355.66,-135.8 3364.43,-134.34 3373.69,-132.63 3382.72,-130.88"/>
<polygon fill="black" stroke="black" points="3383.38,-134.31 3392.51,-128.93 3382.02,-127.45 3383.38,-134.31"/>
</g>
<!-- node17 -->
<g id="node20" class="node">
<title>node17</title>
<polygon fill="none" stroke="black" points="3716.84,-123.37 3624.66,-135.8 3532.48,-123.37 3567.69,-103.24 3681.63,-103.24 3716.84,-123.37"/>
<text text-anchor="middle" x="3624.66" y="-113.53" font-family="Times,serif" font-size="12.00">torch::nvtoolsext</text>
</g>
<!-- node13&#45;&gt;node17 -->
<g id="edge26" class="edge">
<title>node13&#45;&gt;node17</title>
<path fill="none" stroke="black" stroke-dasharray="5,2" d="M3033.64,-183.25C3144.1,-175.14 3349.47,-158.53 3523.66,-135.8 3534.84,-134.35 3546.67,-132.57 3558.15,-130.72"/>
<polygon fill="black" stroke="black" points="3558.68,-134.18 3567.98,-129.1 3557.54,-127.27 3558.68,-134.18"/>
</g>
<!-- node16 -->
<g id="node19" class="node">
<title>node16</title>
<polygon fill="none" stroke="black" points="3510.78,-27.9 3496.7,-43.12 3439.66,-49.9 3382.63,-43.12 3368.54,-27.9 3408.01,-15.68 3471.31,-15.68 3510.78,-27.9"/>
<text text-anchor="middle" x="3439.66" y="-27.63" font-family="Times,serif" font-size="12.00">CUDA::cudart</text>
</g>
<!-- node15&#45;&gt;node16 -->
<g id="edge25" class="edge">
<title>node15&#45;&gt;node16</title>
<path fill="none" stroke="black" stroke-dasharray="5,2" d="M3439.66,-102.95C3439.66,-91.68 3439.66,-75.4 3439.66,-61.37"/>
<polygon fill="black" stroke="black" points="3443.16,-61.78 3439.66,-51.78 3436.16,-61.78 3443.16,-61.78"/>
</g>
<!-- node18 -->
<g id="node21" class="node">
<title>node18</title>
<polygon fill="none" stroke="black" points="3714.32,-27.9 3696.56,-43.12 3624.66,-49.9 3552.77,-43.12 3535.01,-27.9 3584.76,-15.68 3664.56,-15.68 3714.32,-27.9"/>
<text text-anchor="middle" x="3624.66" y="-27.63" font-family="Times,serif" font-size="12.00">CUDA::nvToolsExt</text>
</g>
<!-- node17&#45;&gt;node18 -->
<g id="edge27" class="edge">
<title>node17&#45;&gt;node18</title>
<path fill="none" stroke="black" stroke-dasharray="5,2" d="M3624.66,-102.95C3624.66,-91.68 3624.66,-75.4 3624.66,-61.37"/>
<polygon fill="black" stroke="black" points="3628.16,-61.78 3624.66,-51.78 3621.16,-61.78 3628.16,-61.78"/>
</g>
</g>
</svg>

After

Width:  |  Height:  |  Size: 18 KiB

Binary file not shown.

30
docs/BoostAODE.md Normal file
View File

@ -0,0 +1,30 @@
# BoostAODE Algorithm Operation
## Hyperparameters
The hyperparameters defined in the algorithm are:
- ***bisection*** (*boolean*): If set to true allows the algorithm to add *k* models at once (as specified in the algorithm) to the ensemble. Default value: *true*.
- ***bisection_best*** (*boolean*): If set to *true*, the algorithm will take as *priorAccuracy* the best accuracy computed. If set to *false⁺ it will take the last accuracy as *priorAccuracy*. Default value: *false*.
- ***order*** (*{"asc", "desc", "rand"}*): Sets the order (ascending/descending/random) in which dataset variables will be processed to choose the parents of the *SPODEs*. Default value: *"desc"*.
- ***block_update*** (*boolean*): Sets whether the algorithm will update the weights of the models in blocks. If set to false, the algorithm will update the weights of the models one by one. Default value: *false*.
- ***convergence*** (*boolean*): Sets whether the convergence of the result will be used as a termination condition. If this hyperparameter is set to true, the training dataset passed to the model is divided into two sets, one serving as training data and the other as a test set (so the original test partition will become a validation partition in this case). The partition is made by taking the first partition generated by a process of generating a 5 fold partition with stratification using a predetermined seed. The exit condition used in this *convergence* is that the difference between the accuracy obtained by the current model and that obtained by the previous model is greater than *1e-4*; otherwise, one will be added to the number of models that worsen the result (see next hyperparameter). Default value: *true*.
- ***maxTolerance*** (*int*): Sets the maximum number of models that can worsen the result without constituting a termination condition. if ***bisection*** is set to *true*, the value of this hyperparameter will be exponent of base 2 to compute the number of models to insert at once. Default value: *3*
- ***select_features*** (*{"IWSS", "FCBF", "CFS", ""}*): Selects the variable selection method to be used to build initial models for the ensemble that will be included without considering any of the other exit conditions. Once the models of the selected variables are built, the algorithm will update the weights using the ensemble and set the significance of all the models built with the same &alpha;<sub>t</sub>. Default value: *""*.
- ***threshold*** (*double*): Sets the necessary value for the IWSS and FCBF algorithms to function. Accepted values are:
- IWSS: $threshold \in [0, 0.5]$
- FCBF: $threshold \in [10^{-7}, 1]$
Default value is *-1* so every time any of those algorithms are called, the threshold has to be set to the desired value.
- ***predict_voting*** (*boolean*): Sets whether the algorithm will use *model voting* to predict the result. If set to false, the weighted average of the probabilities of each model's prediction will be used. Default value: *false*.
## Operation
### [Base Algorithm](./algorithm.md)

Binary file not shown.

Binary file not shown.

2912
docs/Doxyfile.in Normal file

File diff suppressed because it is too large Load Diff

117
docs/algorithm.md Normal file
View File

@ -0,0 +1,117 @@
# Algorithm
- // notation
- $n$ features ${\cal{X}} = \{X_1, \dots, X_n\}$ and the class $Y$
- $m$ instances.
- $D = \{ (x_1^i, \dots, x_n^i, y^i) \}_{i=1}^{m}$
- $W$ a weights vector. $W_0$ are the initial weights.
- $D[W]$ dataset with weights $W$ for the instances.
1. // initialization
2. $W_0 \leftarrow (w_1, \dots, w_m) \leftarrow 1/m$
3. $W \leftarrow W_0$
4. $Vars \leftarrow {\cal{X}}$
5. $\delta \leftarrow 10^{-4}$
6. $convergence \leftarrow True$ // hyperparameter
7. $maxTolerancia \leftarrow 3$ // hyperparameter
8. $bisection \leftarrow False$ // hyperparameter
9. $finished \leftarrow False$
10. $AODE \leftarrow \emptyset$ // the ensemble
11. $tolerance \leftarrow 0$
12. $numModelsInPack \leftarrow 0$
13. $maxAccuracy \leftarrow -1$
14.
15. // main loop
16. While $(\lnot finished)$
1. $\pi \leftarrow SortFeatures(Vars, criterio, D[W])$
2. $k \leftarrow 2^{tolerance}$
3. if ($tolerance == 0$) $numItemsPack \leftarrow0$
4. $P \leftarrow Head(\pi,k)$ // first k features in order
5. $spodes \leftarrow \emptyset$
6. $i \leftarrow 0$
7. While ($i < size(P)$)
1. $X \leftarrow P[i]$
2. $i \leftarrow i + 1$
3. $numItemsPack \leftarrow numItemsPack + 1$
4. $Vars.remove(X)$
5. $spode \leftarrow BuildSpode(X, {\cal{X}}, D[W])$
6. $\hat{y}[] \leftarrow spode.Predict(D)$
7. $\epsilon \leftarrow error(\hat{y}[], y[])$
8. $\alpha \leftarrow \frac{1}{2} ln \left ( \frac{1-\epsilon}{\epsilon} \right )$
9. if ($\epsilon > 0.5$)
1. $finished \leftarrow True$
2. break
10. $spodes.add( (spode,\alpha_t) )$
11. $W \leftarrow UpdateWeights(W,\alpha,y[],\hat{y}[])$
8. $AODE.add( spodes )$
9. if ($convergence \land \lnot finished$)
1. $\hat{y}[] \leftarrow AODE.Predict(D)$
2. $actualAccuracy \leftarrow accuracy(\hat{y}[], y[])$
3. $if (maxAccuracy == -1)\; maxAccuracy \leftarrow actualAccuracy$
4. if $((accuracy - maxAccuracy) < \delta)$ // result doesn't
improve enough
1. $tolerance \leftarrow tolerance + 1$
5. else
1. $tolerance \leftarrow 0$
2. $numItemsPack \leftarrow 0$
10. If $(Vars == \emptyset \lor tolerance>maxTolerance) \; finished \leftarrow True$
11. $lastAccuracy \leftarrow max(lastAccuracy, actualAccuracy)$
17. if ($tolerance > maxTolerance$) // algorithm finished because of
lack of convergence
1. $removeModels(AODE, numItemsPack)$
18. Return $AODE$

80
docs/algorithm.tex Normal file
View File

@ -0,0 +1,80 @@
\section{Algorithm}
\begin{itemize}
\item[] // notation
\item $n$ features ${\cal{X}} = \{X_1, \dots, X_n\}$ and the class $Y$
\item $m$ instances.
\item $D = \{ (x_1^i, \dots, x_n^i, y^i) \}_{i=1}^{m}$
\item $W$ a weights vector. $W_0$ are the initial weights.
\item $D[W]$ dataset with weights $W$ for the instances.
\end{itemize}
\bigskip
\begin{enumerate}
\item[] // initialization
\item $W_0 \leftarrow (w_1, \dots, w_m) \leftarrow 1/m$
\item $W \leftarrow W_0$
\item $Vars \leftarrow {\cal{X}}$
\item $\delta \leftarrow 10^{-4}$
\item $convergence \leftarrow True$ // hyperparameter
\item $maxTolerancia \leftarrow 3$ // hyperparameter
\item $bisection \leftarrow False$ // hyperparameter
\item $finished \leftarrow False$
\item $AODE \leftarrow \emptyset$ \hspace*{2cm} // the ensemble
\item $tolerance \leftarrow 0$
\item $numModelsInPack \leftarrow 0$
\item $maxAccuracy \leftarrow -1$
\item[]
\newpage
\item[] // main loop
\item While $(\lnot finished)$
\begin{enumerate}
\item $\pi \leftarrow SortFeatures(Vars, criterio, D[W])$
\item $k \leftarrow 2^{tolerance}$
\item if ($tolerance == 0$) $numItemsPack \leftarrow0$
\item $P \leftarrow Head(\pi,k)$ \hspace*{2cm} // first k features in order
\item $spodes \leftarrow \emptyset$
\item $i \leftarrow 0$
\item While ($ i < size(P)$)
\begin{enumerate}
\item $X \leftarrow P[i]$
\item $i \leftarrow i + 1$
\item $numItemsPack \leftarrow numItemsPack + 1$
\item $Vars.remove(X)$
\item $spode \leftarrow BuildSpode(X, {\cal{X}}, D[W])$
\item $\hat{y}[] \leftarrow spode.Predict(D)$
\item $\epsilon \leftarrow error(\hat{y}[], y[])$
\item $\alpha \leftarrow \frac{1}{2} ln \left ( \frac{1-\epsilon}{\epsilon} \right )$
\item if ($\epsilon > 0.5$)
\begin{enumerate}
\item $finished \leftarrow True$
\item break
\end{enumerate}
\item $spodes.add( (spode,\alpha_t) )$
\item $W \leftarrow UpdateWeights(W,\alpha,y[],\hat{y}[])$
\end{enumerate}
\item $AODE.add( spodes )$
\item if ($convergence \land \lnot finished$)
\begin{enumerate}
\item $\hat{y}[] \leftarrow AODE.Predict(D)$
\item $actualAccuracy \leftarrow accuracy(\hat{y}[], y[])$
\item $if (maxAccuracy == -1)\; maxAccuracy \leftarrow actualAccuracy$
\item if $((accuracy - maxAccuracy) < \delta)$\hspace*{2cm} // result doesn't improve enough
\begin{enumerate}
\item $tolerance \leftarrow tolerance + 1$
\end{enumerate}
\item else
\begin{enumerate}
\item $tolerance \leftarrow 0$
\item $numItemsPack \leftarrow 0$
\end{enumerate}
\end{enumerate}
\item If $(Vars == \emptyset \lor tolerance>maxTolerance) \; finished \leftarrow True$
\item $lastAccuracy \leftarrow max(lastAccuracy, actualAccuracy)$
\end{enumerate}
\item if ($tolerance > maxTolerance$) \hspace*{1cm} // algorithm finished because of lack of convergence
\begin{enumerate}
\item $removeModels(AODE, numItemsPack)$
\end{enumerate}
\item Return $AODE$
\end{enumerate}

BIN
docs/logo_small.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 11 KiB

View File

@ -1,4 +0,0 @@
filter = src/
exclude-directories = build_debug/lib/
print-summary = yes
sort-percentage = yes

View File

@ -1,168 +0,0 @@
#include "ArffFiles.h"
#include <fstream>
#include <sstream>
#include <map>
#include <iostream>
ArffFiles::ArffFiles() = default;
std::vector<std::string> ArffFiles::getLines() const
{
return lines;
}
unsigned long int ArffFiles::getSize() const
{
return lines.size();
}
std::vector<std::pair<std::string, std::string>> ArffFiles::getAttributes() const
{
return attributes;
}
std::string ArffFiles::getClassName() const
{
return className;
}
std::string ArffFiles::getClassType() const
{
return classType;
}
std::vector<std::vector<float>>& ArffFiles::getX()
{
return X;
}
std::vector<int>& ArffFiles::getY()
{
return y;
}
void ArffFiles::loadCommon(std::string fileName)
{
std::ifstream file(fileName);
if (!file.is_open()) {
throw std::invalid_argument("Unable to open file");
}
std::string line;
std::string keyword;
std::string attribute;
std::string type;
std::string type_w;
while (getline(file, line)) {
if (line.empty() || line[0] == '%' || line == "\r" || line == " ") {
continue;
}
if (line.find("@attribute") != std::string::npos || line.find("@ATTRIBUTE") != std::string::npos) {
std::stringstream ss(line);
ss >> keyword >> attribute;
type = "";
while (ss >> type_w)
type += type_w + " ";
attributes.emplace_back(trim(attribute), trim(type));
continue;
}
if (line[0] == '@') {
continue;
}
lines.push_back(line);
}
file.close();
if (attributes.empty())
throw std::invalid_argument("No attributes found");
}
void ArffFiles::load(const std::string& fileName, bool classLast)
{
int labelIndex;
loadCommon(fileName);
if (classLast) {
className = std::get<0>(attributes.back());
classType = std::get<1>(attributes.back());
attributes.pop_back();
labelIndex = static_cast<int>(attributes.size());
} else {
className = std::get<0>(attributes.front());
classType = std::get<1>(attributes.front());
attributes.erase(attributes.begin());
labelIndex = 0;
}
generateDataset(labelIndex);
}
void ArffFiles::load(const std::string& fileName, const std::string& name)
{
int labelIndex;
loadCommon(fileName);
bool found = false;
for (int i = 0; i < attributes.size(); ++i) {
if (attributes[i].first == name) {
className = std::get<0>(attributes[i]);
classType = std::get<1>(attributes[i]);
attributes.erase(attributes.begin() + i);
labelIndex = i;
found = true;
break;
}
}
if (!found) {
throw std::invalid_argument("Class name not found");
}
generateDataset(labelIndex);
}
void ArffFiles::generateDataset(int labelIndex)
{
X = std::vector<std::vector<float>>(attributes.size(), std::vector<float>(lines.size()));
auto yy = std::vector<std::string>(lines.size(), "");
auto removeLines = std::vector<int>(); // Lines with missing values
for (size_t i = 0; i < lines.size(); i++) {
std::stringstream ss(lines[i]);
std::string value;
int pos = 0;
int xIndex = 0;
while (getline(ss, value, ',')) {
if (pos++ == labelIndex) {
yy[i] = value;
} else {
if (value == "?") {
X[xIndex++][i] = -1;
removeLines.push_back(i);
} else
X[xIndex++][i] = stof(value);
}
}
}
for (auto i : removeLines) {
yy.erase(yy.begin() + i);
for (auto& x : X) {
x.erase(x.begin() + i);
}
}
y = factorize(yy);
}
std::string ArffFiles::trim(const std::string& source)
{
std::string s(source);
s.erase(0, s.find_first_not_of(" '\n\r\t"));
s.erase(s.find_last_not_of(" '\n\r\t") + 1);
return s;
}
std::vector<int> ArffFiles::factorize(const std::vector<std::string>& labels_t)
{
std::vector<int> yy;
yy.reserve(labels_t.size());
std::map<std::string, int> labelMap;
int i = 0;
for (const std::string& label : labels_t) {
if (labelMap.find(label) == labelMap.end()) {
labelMap[label] = i++;
}
yy.push_back(labelMap[label]);
}
return yy;
}

View File

@ -1,32 +0,0 @@
#ifndef ARFFFILES_H
#define ARFFFILES_H
#include <string>
#include <vector>
class ArffFiles {
private:
std::vector<std::string> lines;
std::vector<std::pair<std::string, std::string>> attributes;
std::string className;
std::string classType;
std::vector<std::vector<float>> X;
std::vector<int> y;
void generateDataset(int);
void loadCommon(std::string);
public:
ArffFiles();
void load(const std::string&, bool = true);
void load(const std::string&, const std::string&);
std::vector<std::string> getLines() const;
unsigned long int getSize() const;
std::string getClassName() const;
std::string getClassType() const;
static std::string trim(const std::string&);
std::vector<std::vector<float>>& getX();
std::vector<int>& getY();
std::vector<std::pair<std::string, std::string>> getAttributes() const;
static std::vector<int> factorize(const std::vector<std::string>& labels_t);
};
#endif

View File

@ -1 +0,0 @@
add_library(ArffFiles ArffFiles.cc)

@ -1 +0,0 @@
Subproject commit ed6ac8a629f9a4206575be784c1e340da2a94855

@ -1 +1 @@
Subproject commit 37316a54e0d558555ae02ae95c8bb083ec063874 Subproject commit 9652853d692ed3b8a38d89f70559209ffb988020

@ -1 +1 @@
Subproject commit 0457de21cffb298c22b629e538036bfeb96130b7 Subproject commit 620034ececc93991c5c1183b73c3768d81ca84b3

2009
lib/log/loguru.cpp Normal file

File diff suppressed because it is too large Load Diff

1475
lib/log/loguru.hpp Normal file

File diff suppressed because it is too large Load Diff

@ -1 +1 @@
Subproject commit 5708dc3de944fc22d61a2dd071b63aa338e04db3 Subproject commit 7d62d6af4a6ca944a3bbde0b61f651fd4b2d3f57

BIN
logo.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 543 KiB

25
sample/CMakeLists.txt Normal file
View File

@ -0,0 +1,25 @@
cmake_minimum_required(VERSION 3.20)
project(bayesnet_sample)
set(CMAKE_CXX_STANDARD 17)
find_package(Torch REQUIRED)
find_library(BayesNet NAMES libBayesNet BayesNet libBayesNet.a REQUIRED)
find_path(Bayesnet_INCLUDE_DIRS REQUIRED NAMES bayesnet)
find_library(FImdlp NAMES libfimdlp.a PATHS REQUIRED)
message(STATUS "FImdlp=${FImdlp}")
message(STATUS "FImdlp_INCLUDE_DIRS=${FImdlp_INCLUDE_DIRS}")
message(STATUS "BayesNet=${BayesNet}")
message(STATUS "Bayesnet_INCLUDE_DIRS=${Bayesnet_INCLUDE_DIRS}")
include_directories(
../tests/lib/Files
lib/json/include
/usr/local/include
${FImdlp_INCLUDE_DIRS}
)
add_executable(bayesnet_sample sample.cc)
target_link_libraries(bayesnet_sample fimdlp "${TORCH_LIBRARIES}" "${BayesNet}")

View File

@ -0,0 +1,55 @@
// __ _____ _____ _____
// __| | __| | | | JSON for Modern C++
// | | |__ | | | | | | version 3.11.3
// |_____|_____|_____|_|___| https://github.com/nlohmann/json
//
// SPDX-FileCopyrightText: 2013-2023 Niels Lohmann <https://nlohmann.me>
// SPDX-License-Identifier: MIT
#pragma once
#include <utility>
#include <nlohmann/detail/abi_macros.hpp>
#include <nlohmann/detail/conversions/from_json.hpp>
#include <nlohmann/detail/conversions/to_json.hpp>
#include <nlohmann/detail/meta/identity_tag.hpp>
NLOHMANN_JSON_NAMESPACE_BEGIN
/// @sa https://json.nlohmann.me/api/adl_serializer/
template<typename ValueType, typename>
struct adl_serializer
{
/// @brief convert a JSON value to any value type
/// @sa https://json.nlohmann.me/api/adl_serializer/from_json/
template<typename BasicJsonType, typename TargetType = ValueType>
static auto from_json(BasicJsonType && j, TargetType& val) noexcept(
noexcept(::nlohmann::from_json(std::forward<BasicJsonType>(j), val)))
-> decltype(::nlohmann::from_json(std::forward<BasicJsonType>(j), val), void())
{
::nlohmann::from_json(std::forward<BasicJsonType>(j), val);
}
/// @brief convert a JSON value to any value type
/// @sa https://json.nlohmann.me/api/adl_serializer/from_json/
template<typename BasicJsonType, typename TargetType = ValueType>
static auto from_json(BasicJsonType && j) noexcept(
noexcept(::nlohmann::from_json(std::forward<BasicJsonType>(j), detail::identity_tag<TargetType> {})))
-> decltype(::nlohmann::from_json(std::forward<BasicJsonType>(j), detail::identity_tag<TargetType> {}))
{
return ::nlohmann::from_json(std::forward<BasicJsonType>(j), detail::identity_tag<TargetType> {});
}
/// @brief convert any value type to a JSON value
/// @sa https://json.nlohmann.me/api/adl_serializer/to_json/
template<typename BasicJsonType, typename TargetType = ValueType>
static auto to_json(BasicJsonType& j, TargetType && val) noexcept(
noexcept(::nlohmann::to_json(j, std::forward<TargetType>(val))))
-> decltype(::nlohmann::to_json(j, std::forward<TargetType>(val)), void())
{
::nlohmann::to_json(j, std::forward<TargetType>(val));
}
};
NLOHMANN_JSON_NAMESPACE_END

View File

@ -0,0 +1,103 @@
// __ _____ _____ _____
// __| | __| | | | JSON for Modern C++
// | | |__ | | | | | | version 3.11.3
// |_____|_____|_____|_|___| https://github.com/nlohmann/json
//
// SPDX-FileCopyrightText: 2013-2023 Niels Lohmann <https://nlohmann.me>
// SPDX-License-Identifier: MIT
#pragma once
#include <cstdint> // uint8_t, uint64_t
#include <tuple> // tie
#include <utility> // move
#include <nlohmann/detail/abi_macros.hpp>
NLOHMANN_JSON_NAMESPACE_BEGIN
/// @brief an internal type for a backed binary type
/// @sa https://json.nlohmann.me/api/byte_container_with_subtype/
template<typename BinaryType>
class byte_container_with_subtype : public BinaryType
{
public:
using container_type = BinaryType;
using subtype_type = std::uint64_t;
/// @sa https://json.nlohmann.me/api/byte_container_with_subtype/byte_container_with_subtype/
byte_container_with_subtype() noexcept(noexcept(container_type()))
: container_type()
{}
/// @sa https://json.nlohmann.me/api/byte_container_with_subtype/byte_container_with_subtype/
byte_container_with_subtype(const container_type& b) noexcept(noexcept(container_type(b)))
: container_type(b)
{}
/// @sa https://json.nlohmann.me/api/byte_container_with_subtype/byte_container_with_subtype/
byte_container_with_subtype(container_type&& b) noexcept(noexcept(container_type(std::move(b))))
: container_type(std::move(b))
{}
/// @sa https://json.nlohmann.me/api/byte_container_with_subtype/byte_container_with_subtype/
byte_container_with_subtype(const container_type& b, subtype_type subtype_) noexcept(noexcept(container_type(b)))
: container_type(b)
, m_subtype(subtype_)
, m_has_subtype(true)
{}
/// @sa https://json.nlohmann.me/api/byte_container_with_subtype/byte_container_with_subtype/
byte_container_with_subtype(container_type&& b, subtype_type subtype_) noexcept(noexcept(container_type(std::move(b))))
: container_type(std::move(b))
, m_subtype(subtype_)
, m_has_subtype(true)
{}
bool operator==(const byte_container_with_subtype& rhs) const
{
return std::tie(static_cast<const BinaryType&>(*this), m_subtype, m_has_subtype) ==
std::tie(static_cast<const BinaryType&>(rhs), rhs.m_subtype, rhs.m_has_subtype);
}
bool operator!=(const byte_container_with_subtype& rhs) const
{
return !(rhs == *this);
}
/// @brief sets the binary subtype
/// @sa https://json.nlohmann.me/api/byte_container_with_subtype/set_subtype/
void set_subtype(subtype_type subtype_) noexcept
{
m_subtype = subtype_;
m_has_subtype = true;
}
/// @brief return the binary subtype
/// @sa https://json.nlohmann.me/api/byte_container_with_subtype/subtype/
constexpr subtype_type subtype() const noexcept
{
return m_has_subtype ? m_subtype : static_cast<subtype_type>(-1);
}
/// @brief return whether the value has a subtype
/// @sa https://json.nlohmann.me/api/byte_container_with_subtype/has_subtype/
constexpr bool has_subtype() const noexcept
{
return m_has_subtype;
}
/// @brief clears the binary subtype
/// @sa https://json.nlohmann.me/api/byte_container_with_subtype/clear_subtype/
void clear_subtype() noexcept
{
m_subtype = 0;
m_has_subtype = false;
}
private:
subtype_type m_subtype = 0;
bool m_has_subtype = false;
};
NLOHMANN_JSON_NAMESPACE_END

View File

@ -0,0 +1,100 @@
// __ _____ _____ _____
// __| | __| | | | JSON for Modern C++
// | | |__ | | | | | | version 3.11.3
// |_____|_____|_____|_|___| https://github.com/nlohmann/json
//
// SPDX-FileCopyrightText: 2013-2023 Niels Lohmann <https://nlohmann.me>
// SPDX-License-Identifier: MIT
#pragma once
// This file contains all macro definitions affecting or depending on the ABI
#ifndef JSON_SKIP_LIBRARY_VERSION_CHECK
#if defined(NLOHMANN_JSON_VERSION_MAJOR) && defined(NLOHMANN_JSON_VERSION_MINOR) && defined(NLOHMANN_JSON_VERSION_PATCH)
#if NLOHMANN_JSON_VERSION_MAJOR != 3 || NLOHMANN_JSON_VERSION_MINOR != 11 || NLOHMANN_JSON_VERSION_PATCH != 3
#warning "Already included a different version of the library!"
#endif
#endif
#endif
#define NLOHMANN_JSON_VERSION_MAJOR 3 // NOLINT(modernize-macro-to-enum)
#define NLOHMANN_JSON_VERSION_MINOR 11 // NOLINT(modernize-macro-to-enum)
#define NLOHMANN_JSON_VERSION_PATCH 3 // NOLINT(modernize-macro-to-enum)
#ifndef JSON_DIAGNOSTICS
#define JSON_DIAGNOSTICS 0
#endif
#ifndef JSON_USE_LEGACY_DISCARDED_VALUE_COMPARISON
#define JSON_USE_LEGACY_DISCARDED_VALUE_COMPARISON 0
#endif
#if JSON_DIAGNOSTICS
#define NLOHMANN_JSON_ABI_TAG_DIAGNOSTICS _diag
#else
#define NLOHMANN_JSON_ABI_TAG_DIAGNOSTICS
#endif
#if JSON_USE_LEGACY_DISCARDED_VALUE_COMPARISON
#define NLOHMANN_JSON_ABI_TAG_LEGACY_DISCARDED_VALUE_COMPARISON _ldvcmp
#else
#define NLOHMANN_JSON_ABI_TAG_LEGACY_DISCARDED_VALUE_COMPARISON
#endif
#ifndef NLOHMANN_JSON_NAMESPACE_NO_VERSION
#define NLOHMANN_JSON_NAMESPACE_NO_VERSION 0
#endif
// Construct the namespace ABI tags component
#define NLOHMANN_JSON_ABI_TAGS_CONCAT_EX(a, b) json_abi ## a ## b
#define NLOHMANN_JSON_ABI_TAGS_CONCAT(a, b) \
NLOHMANN_JSON_ABI_TAGS_CONCAT_EX(a, b)
#define NLOHMANN_JSON_ABI_TAGS \
NLOHMANN_JSON_ABI_TAGS_CONCAT( \
NLOHMANN_JSON_ABI_TAG_DIAGNOSTICS, \
NLOHMANN_JSON_ABI_TAG_LEGACY_DISCARDED_VALUE_COMPARISON)
// Construct the namespace version component
#define NLOHMANN_JSON_NAMESPACE_VERSION_CONCAT_EX(major, minor, patch) \
_v ## major ## _ ## minor ## _ ## patch
#define NLOHMANN_JSON_NAMESPACE_VERSION_CONCAT(major, minor, patch) \
NLOHMANN_JSON_NAMESPACE_VERSION_CONCAT_EX(major, minor, patch)
#if NLOHMANN_JSON_NAMESPACE_NO_VERSION
#define NLOHMANN_JSON_NAMESPACE_VERSION
#else
#define NLOHMANN_JSON_NAMESPACE_VERSION \
NLOHMANN_JSON_NAMESPACE_VERSION_CONCAT(NLOHMANN_JSON_VERSION_MAJOR, \
NLOHMANN_JSON_VERSION_MINOR, \
NLOHMANN_JSON_VERSION_PATCH)
#endif
// Combine namespace components
#define NLOHMANN_JSON_NAMESPACE_CONCAT_EX(a, b) a ## b
#define NLOHMANN_JSON_NAMESPACE_CONCAT(a, b) \
NLOHMANN_JSON_NAMESPACE_CONCAT_EX(a, b)
#ifndef NLOHMANN_JSON_NAMESPACE
#define NLOHMANN_JSON_NAMESPACE \
nlohmann::NLOHMANN_JSON_NAMESPACE_CONCAT( \
NLOHMANN_JSON_ABI_TAGS, \
NLOHMANN_JSON_NAMESPACE_VERSION)
#endif
#ifndef NLOHMANN_JSON_NAMESPACE_BEGIN
#define NLOHMANN_JSON_NAMESPACE_BEGIN \
namespace nlohmann \
{ \
inline namespace NLOHMANN_JSON_NAMESPACE_CONCAT( \
NLOHMANN_JSON_ABI_TAGS, \
NLOHMANN_JSON_NAMESPACE_VERSION) \
{
#endif
#ifndef NLOHMANN_JSON_NAMESPACE_END
#define NLOHMANN_JSON_NAMESPACE_END \
} /* namespace (inline namespace) NOLINT(readability/namespace) */ \
} // namespace nlohmann
#endif

View File

@ -0,0 +1,497 @@
// __ _____ _____ _____
// __| | __| | | | JSON for Modern C++
// | | |__ | | | | | | version 3.11.3
// |_____|_____|_____|_|___| https://github.com/nlohmann/json
//
// SPDX-FileCopyrightText: 2013-2023 Niels Lohmann <https://nlohmann.me>
// SPDX-License-Identifier: MIT
#pragma once
#include <algorithm> // transform
#include <array> // array
#include <forward_list> // forward_list
#include <iterator> // inserter, front_inserter, end
#include <map> // map
#include <string> // string
#include <tuple> // tuple, make_tuple
#include <type_traits> // is_arithmetic, is_same, is_enum, underlying_type, is_convertible
#include <unordered_map> // unordered_map
#include <utility> // pair, declval
#include <valarray> // valarray
#include <nlohmann/detail/exceptions.hpp>
#include <nlohmann/detail/macro_scope.hpp>
#include <nlohmann/detail/meta/cpp_future.hpp>
#include <nlohmann/detail/meta/identity_tag.hpp>
#include <nlohmann/detail/meta/std_fs.hpp>
#include <nlohmann/detail/meta/type_traits.hpp>
#include <nlohmann/detail/string_concat.hpp>
#include <nlohmann/detail/value_t.hpp>
NLOHMANN_JSON_NAMESPACE_BEGIN
namespace detail
{
template<typename BasicJsonType>
inline void from_json(const BasicJsonType& j, typename std::nullptr_t& n)
{
if (JSON_HEDLEY_UNLIKELY(!j.is_null()))
{
JSON_THROW(type_error::create(302, concat("type must be null, but is ", j.type_name()), &j));
}
n = nullptr;
}
// overloads for basic_json template parameters
template < typename BasicJsonType, typename ArithmeticType,
enable_if_t < std::is_arithmetic<ArithmeticType>::value&&
!std::is_same<ArithmeticType, typename BasicJsonType::boolean_t>::value,
int > = 0 >
void get_arithmetic_value(const BasicJsonType& j, ArithmeticType& val)
{
switch (static_cast<value_t>(j))
{
case value_t::number_unsigned:
{
val = static_cast<ArithmeticType>(*j.template get_ptr<const typename BasicJsonType::number_unsigned_t*>());
break;
}
case value_t::number_integer:
{
val = static_cast<ArithmeticType>(*j.template get_ptr<const typename BasicJsonType::number_integer_t*>());
break;
}
case value_t::number_float:
{
val = static_cast<ArithmeticType>(*j.template get_ptr<const typename BasicJsonType::number_float_t*>());
break;
}
case value_t::null:
case value_t::object:
case value_t::array:
case value_t::string:
case value_t::boolean:
case value_t::binary:
case value_t::discarded:
default:
JSON_THROW(type_error::create(302, concat("type must be number, but is ", j.type_name()), &j));
}
}
template<typename BasicJsonType>
inline void from_json(const BasicJsonType& j, typename BasicJsonType::boolean_t& b)
{
if (JSON_HEDLEY_UNLIKELY(!j.is_boolean()))
{
JSON_THROW(type_error::create(302, concat("type must be boolean, but is ", j.type_name()), &j));
}
b = *j.template get_ptr<const typename BasicJsonType::boolean_t*>();
}
template<typename BasicJsonType>
inline void from_json(const BasicJsonType& j, typename BasicJsonType::string_t& s)
{
if (JSON_HEDLEY_UNLIKELY(!j.is_string()))
{
JSON_THROW(type_error::create(302, concat("type must be string, but is ", j.type_name()), &j));
}
s = *j.template get_ptr<const typename BasicJsonType::string_t*>();
}
template <
typename BasicJsonType, typename StringType,
enable_if_t <
std::is_assignable<StringType&, const typename BasicJsonType::string_t>::value
&& is_detected_exact<typename BasicJsonType::string_t::value_type, value_type_t, StringType>::value
&& !std::is_same<typename BasicJsonType::string_t, StringType>::value
&& !is_json_ref<StringType>::value, int > = 0 >
inline void from_json(const BasicJsonType& j, StringType& s)
{
if (JSON_HEDLEY_UNLIKELY(!j.is_string()))
{
JSON_THROW(type_error::create(302, concat("type must be string, but is ", j.type_name()), &j));
}
s = *j.template get_ptr<const typename BasicJsonType::string_t*>();
}
template<typename BasicJsonType>
inline void from_json(const BasicJsonType& j, typename BasicJsonType::number_float_t& val)
{
get_arithmetic_value(j, val);
}
template<typename BasicJsonType>
inline void from_json(const BasicJsonType& j, typename BasicJsonType::number_unsigned_t& val)
{
get_arithmetic_value(j, val);
}
template<typename BasicJsonType>
inline void from_json(const BasicJsonType& j, typename BasicJsonType::number_integer_t& val)
{
get_arithmetic_value(j, val);
}
#if !JSON_DISABLE_ENUM_SERIALIZATION
template<typename BasicJsonType, typename EnumType,
enable_if_t<std::is_enum<EnumType>::value, int> = 0>
inline void from_json(const BasicJsonType& j, EnumType& e)
{
typename std::underlying_type<EnumType>::type val;
get_arithmetic_value(j, val);
e = static_cast<EnumType>(val);
}
#endif // JSON_DISABLE_ENUM_SERIALIZATION
// forward_list doesn't have an insert method
template<typename BasicJsonType, typename T, typename Allocator,
enable_if_t<is_getable<BasicJsonType, T>::value, int> = 0>
inline void from_json(const BasicJsonType& j, std::forward_list<T, Allocator>& l)
{
if (JSON_HEDLEY_UNLIKELY(!j.is_array()))
{
JSON_THROW(type_error::create(302, concat("type must be array, but is ", j.type_name()), &j));
}
l.clear();
std::transform(j.rbegin(), j.rend(),
std::front_inserter(l), [](const BasicJsonType & i)
{
return i.template get<T>();
});
}
// valarray doesn't have an insert method
template<typename BasicJsonType, typename T,
enable_if_t<is_getable<BasicJsonType, T>::value, int> = 0>
inline void from_json(const BasicJsonType& j, std::valarray<T>& l)
{
if (JSON_HEDLEY_UNLIKELY(!j.is_array()))
{
JSON_THROW(type_error::create(302, concat("type must be array, but is ", j.type_name()), &j));
}
l.resize(j.size());
std::transform(j.begin(), j.end(), std::begin(l),
[](const BasicJsonType & elem)
{
return elem.template get<T>();
});
}
template<typename BasicJsonType, typename T, std::size_t N>
auto from_json(const BasicJsonType& j, T (&arr)[N]) // NOLINT(cppcoreguidelines-avoid-c-arrays,hicpp-avoid-c-arrays,modernize-avoid-c-arrays)
-> decltype(j.template get<T>(), void())
{
for (std::size_t i = 0; i < N; ++i)
{
arr[i] = j.at(i).template get<T>();
}
}
template<typename BasicJsonType>
inline void from_json_array_impl(const BasicJsonType& j, typename BasicJsonType::array_t& arr, priority_tag<3> /*unused*/)
{
arr = *j.template get_ptr<const typename BasicJsonType::array_t*>();
}
template<typename BasicJsonType, typename T, std::size_t N>
auto from_json_array_impl(const BasicJsonType& j, std::array<T, N>& arr,
priority_tag<2> /*unused*/)
-> decltype(j.template get<T>(), void())
{
for (std::size_t i = 0; i < N; ++i)
{
arr[i] = j.at(i).template get<T>();
}
}
template<typename BasicJsonType, typename ConstructibleArrayType,
enable_if_t<
std::is_assignable<ConstructibleArrayType&, ConstructibleArrayType>::value,
int> = 0>
auto from_json_array_impl(const BasicJsonType& j, ConstructibleArrayType& arr, priority_tag<1> /*unused*/)
-> decltype(
arr.reserve(std::declval<typename ConstructibleArrayType::size_type>()),
j.template get<typename ConstructibleArrayType::value_type>(),
void())
{
using std::end;
ConstructibleArrayType ret;
ret.reserve(j.size());
std::transform(j.begin(), j.end(),
std::inserter(ret, end(ret)), [](const BasicJsonType & i)
{
// get<BasicJsonType>() returns *this, this won't call a from_json
// method when value_type is BasicJsonType
return i.template get<typename ConstructibleArrayType::value_type>();
});
arr = std::move(ret);
}
template<typename BasicJsonType, typename ConstructibleArrayType,
enable_if_t<
std::is_assignable<ConstructibleArrayType&, ConstructibleArrayType>::value,
int> = 0>
inline void from_json_array_impl(const BasicJsonType& j, ConstructibleArrayType& arr,
priority_tag<0> /*unused*/)
{
using std::end;
ConstructibleArrayType ret;
std::transform(
j.begin(), j.end(), std::inserter(ret, end(ret)),
[](const BasicJsonType & i)
{
// get<BasicJsonType>() returns *this, this won't call a from_json
// method when value_type is BasicJsonType
return i.template get<typename ConstructibleArrayType::value_type>();
});
arr = std::move(ret);
}
template < typename BasicJsonType, typename ConstructibleArrayType,
enable_if_t <
is_constructible_array_type<BasicJsonType, ConstructibleArrayType>::value&&
!is_constructible_object_type<BasicJsonType, ConstructibleArrayType>::value&&
!is_constructible_string_type<BasicJsonType, ConstructibleArrayType>::value&&
!std::is_same<ConstructibleArrayType, typename BasicJsonType::binary_t>::value&&
!is_basic_json<ConstructibleArrayType>::value,
int > = 0 >
auto from_json(const BasicJsonType& j, ConstructibleArrayType& arr)
-> decltype(from_json_array_impl(j, arr, priority_tag<3> {}),
j.template get<typename ConstructibleArrayType::value_type>(),
void())
{
if (JSON_HEDLEY_UNLIKELY(!j.is_array()))
{
JSON_THROW(type_error::create(302, concat("type must be array, but is ", j.type_name()), &j));
}
from_json_array_impl(j, arr, priority_tag<3> {});
}
template < typename BasicJsonType, typename T, std::size_t... Idx >
std::array<T, sizeof...(Idx)> from_json_inplace_array_impl(BasicJsonType&& j,
identity_tag<std::array<T, sizeof...(Idx)>> /*unused*/, index_sequence<Idx...> /*unused*/)
{
return { { std::forward<BasicJsonType>(j).at(Idx).template get<T>()... } };
}
template < typename BasicJsonType, typename T, std::size_t N >
auto from_json(BasicJsonType&& j, identity_tag<std::array<T, N>> tag)
-> decltype(from_json_inplace_array_impl(std::forward<BasicJsonType>(j), tag, make_index_sequence<N> {}))
{
if (JSON_HEDLEY_UNLIKELY(!j.is_array()))
{
JSON_THROW(type_error::create(302, concat("type must be array, but is ", j.type_name()), &j));
}
return from_json_inplace_array_impl(std::forward<BasicJsonType>(j), tag, make_index_sequence<N> {});
}
template<typename BasicJsonType>
inline void from_json(const BasicJsonType& j, typename BasicJsonType::binary_t& bin)
{
if (JSON_HEDLEY_UNLIKELY(!j.is_binary()))
{
JSON_THROW(type_error::create(302, concat("type must be binary, but is ", j.type_name()), &j));
}
bin = *j.template get_ptr<const typename BasicJsonType::binary_t*>();
}
template<typename BasicJsonType, typename ConstructibleObjectType,
enable_if_t<is_constructible_object_type<BasicJsonType, ConstructibleObjectType>::value, int> = 0>
inline void from_json(const BasicJsonType& j, ConstructibleObjectType& obj)
{
if (JSON_HEDLEY_UNLIKELY(!j.is_object()))
{
JSON_THROW(type_error::create(302, concat("type must be object, but is ", j.type_name()), &j));
}
ConstructibleObjectType ret;
const auto* inner_object = j.template get_ptr<const typename BasicJsonType::object_t*>();
using value_type = typename ConstructibleObjectType::value_type;
std::transform(
inner_object->begin(), inner_object->end(),
std::inserter(ret, ret.begin()),
[](typename BasicJsonType::object_t::value_type const & p)
{
return value_type(p.first, p.second.template get<typename ConstructibleObjectType::mapped_type>());
});
obj = std::move(ret);
}
// overload for arithmetic types, not chosen for basic_json template arguments
// (BooleanType, etc..); note: Is it really necessary to provide explicit
// overloads for boolean_t etc. in case of a custom BooleanType which is not
// an arithmetic type?
template < typename BasicJsonType, typename ArithmeticType,
enable_if_t <
std::is_arithmetic<ArithmeticType>::value&&
!std::is_same<ArithmeticType, typename BasicJsonType::number_unsigned_t>::value&&
!std::is_same<ArithmeticType, typename BasicJsonType::number_integer_t>::value&&
!std::is_same<ArithmeticType, typename BasicJsonType::number_float_t>::value&&
!std::is_same<ArithmeticType, typename BasicJsonType::boolean_t>::value,
int > = 0 >
inline void from_json(const BasicJsonType& j, ArithmeticType& val)
{
switch (static_cast<value_t>(j))
{
case value_t::number_unsigned:
{
val = static_cast<ArithmeticType>(*j.template get_ptr<const typename BasicJsonType::number_unsigned_t*>());
break;
}
case value_t::number_integer:
{
val = static_cast<ArithmeticType>(*j.template get_ptr<const typename BasicJsonType::number_integer_t*>());
break;
}
case value_t::number_float:
{
val = static_cast<ArithmeticType>(*j.template get_ptr<const typename BasicJsonType::number_float_t*>());
break;
}
case value_t::boolean:
{
val = static_cast<ArithmeticType>(*j.template get_ptr<const typename BasicJsonType::boolean_t*>());
break;
}
case value_t::null:
case value_t::object:
case value_t::array:
case value_t::string:
case value_t::binary:
case value_t::discarded:
default:
JSON_THROW(type_error::create(302, concat("type must be number, but is ", j.type_name()), &j));
}
}
template<typename BasicJsonType, typename... Args, std::size_t... Idx>
std::tuple<Args...> from_json_tuple_impl_base(BasicJsonType&& j, index_sequence<Idx...> /*unused*/)
{
return std::make_tuple(std::forward<BasicJsonType>(j).at(Idx).template get<Args>()...);
}
template < typename BasicJsonType, class A1, class A2 >
std::pair<A1, A2> from_json_tuple_impl(BasicJsonType&& j, identity_tag<std::pair<A1, A2>> /*unused*/, priority_tag<0> /*unused*/)
{
return {std::forward<BasicJsonType>(j).at(0).template get<A1>(),
std::forward<BasicJsonType>(j).at(1).template get<A2>()};
}
template<typename BasicJsonType, typename A1, typename A2>
inline void from_json_tuple_impl(BasicJsonType&& j, std::pair<A1, A2>& p, priority_tag<1> /*unused*/)
{
p = from_json_tuple_impl(std::forward<BasicJsonType>(j), identity_tag<std::pair<A1, A2>> {}, priority_tag<0> {});
}
template<typename BasicJsonType, typename... Args>
std::tuple<Args...> from_json_tuple_impl(BasicJsonType&& j, identity_tag<std::tuple<Args...>> /*unused*/, priority_tag<2> /*unused*/)
{
return from_json_tuple_impl_base<BasicJsonType, Args...>(std::forward<BasicJsonType>(j), index_sequence_for<Args...> {});
}
template<typename BasicJsonType, typename... Args>
inline void from_json_tuple_impl(BasicJsonType&& j, std::tuple<Args...>& t, priority_tag<3> /*unused*/)
{
t = from_json_tuple_impl_base<BasicJsonType, Args...>(std::forward<BasicJsonType>(j), index_sequence_for<Args...> {});
}
template<typename BasicJsonType, typename TupleRelated>
auto from_json(BasicJsonType&& j, TupleRelated&& t)
-> decltype(from_json_tuple_impl(std::forward<BasicJsonType>(j), std::forward<TupleRelated>(t), priority_tag<3> {}))
{
if (JSON_HEDLEY_UNLIKELY(!j.is_array()))
{
JSON_THROW(type_error::create(302, concat("type must be array, but is ", j.type_name()), &j));
}
return from_json_tuple_impl(std::forward<BasicJsonType>(j), std::forward<TupleRelated>(t), priority_tag<3> {});
}
template < typename BasicJsonType, typename Key, typename Value, typename Compare, typename Allocator,
typename = enable_if_t < !std::is_constructible <
typename BasicJsonType::string_t, Key >::value >>
inline void from_json(const BasicJsonType& j, std::map<Key, Value, Compare, Allocator>& m)
{
if (JSON_HEDLEY_UNLIKELY(!j.is_array()))
{
JSON_THROW(type_error::create(302, concat("type must be array, but is ", j.type_name()), &j));
}
m.clear();
for (const auto& p : j)
{
if (JSON_HEDLEY_UNLIKELY(!p.is_array()))
{
JSON_THROW(type_error::create(302, concat("type must be array, but is ", p.type_name()), &j));
}
m.emplace(p.at(0).template get<Key>(), p.at(1).template get<Value>());
}
}
template < typename BasicJsonType, typename Key, typename Value, typename Hash, typename KeyEqual, typename Allocator,
typename = enable_if_t < !std::is_constructible <
typename BasicJsonType::string_t, Key >::value >>
inline void from_json(const BasicJsonType& j, std::unordered_map<Key, Value, Hash, KeyEqual, Allocator>& m)
{
if (JSON_HEDLEY_UNLIKELY(!j.is_array()))
{
JSON_THROW(type_error::create(302, concat("type must be array, but is ", j.type_name()), &j));
}
m.clear();
for (const auto& p : j)
{
if (JSON_HEDLEY_UNLIKELY(!p.is_array()))
{
JSON_THROW(type_error::create(302, concat("type must be array, but is ", p.type_name()), &j));
}
m.emplace(p.at(0).template get<Key>(), p.at(1).template get<Value>());
}
}
#if JSON_HAS_FILESYSTEM || JSON_HAS_EXPERIMENTAL_FILESYSTEM
template<typename BasicJsonType>
inline void from_json(const BasicJsonType& j, std_fs::path& p)
{
if (JSON_HEDLEY_UNLIKELY(!j.is_string()))
{
JSON_THROW(type_error::create(302, concat("type must be string, but is ", j.type_name()), &j));
}
p = *j.template get_ptr<const typename BasicJsonType::string_t*>();
}
#endif
struct from_json_fn
{
template<typename BasicJsonType, typename T>
auto operator()(const BasicJsonType& j, T&& val) const
noexcept(noexcept(from_json(j, std::forward<T>(val))))
-> decltype(from_json(j, std::forward<T>(val)))
{
return from_json(j, std::forward<T>(val));
}
};
} // namespace detail
#ifndef JSON_HAS_CPP_17
/// namespace to hold default `from_json` function
/// to see why this is required:
/// http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n4381.html
namespace // NOLINT(cert-dcl59-cpp,fuchsia-header-anon-namespaces,google-build-namespaces)
{
#endif
JSON_INLINE_VARIABLE constexpr const auto& from_json = // NOLINT(misc-definitions-in-headers)
detail::static_const<detail::from_json_fn>::value;
#ifndef JSON_HAS_CPP_17
} // namespace
#endif
NLOHMANN_JSON_NAMESPACE_END

Some files were not shown because too many files have changed in this diff Show More