-------------------------------------
--- Python interface of LIBLINEAR ---
-------------------------------------

Table of Contents
=================

- Introduction
- Installation via PyPI
- Installation via Sources
- Quick Start
- Quick Start with Scipy
- Design Description
- Data Structures
- Utility Functions
- Additional Information

Introduction
============

Python (http://www.python.org/) is a programming language suitable for rapid
development. This tool provides a simple Python interface to LIBLINEAR, a
library for large-scale linear classification
(http://www.csie.ntu.edu.tw/~cjlin/liblinear). The interface is easy to use,
as its usage is the same as that of LIBLINEAR. The interface is developed
with the built-in Python library "ctypes."

Installation via PyPI
=====================

To install the interface from PyPI, execute the following command:

> pip install -U liblinear-official

Installation via Sources
========================

Alternatively, you may install the interface from sources by
generating the LIBLINEAR shared library.

Depending on your use cases, you can choose between local-directory
and system-wide installation.

- Local-directory installation:

    On Unix systems, type

    > make

    This generates a .so file in the LIBLINEAR main directory and you
    can run the interface in the current python directory.

    For Windows, starting from version 2.48, we no longer provide the
    pre-built shared library liblinear.dll. To run the interface in the
    current python directory, please follow the instructions for building
    Windows binaries in the LIBLINEAR README. You can copy liblinear.dll to
    the system directory (e.g., `C:\WINDOWS\system32\') to make it
    available system-wide.

- System-wide installation:

    Type

    > pip install -e .

    or

    > pip install --user -e .

    The option --user installs the package in the home directory
    instead of the system directory, and thus does not require
    root privileges.

    Please note that you must keep the sources after the installation.

    For Windows, to run the above command, Microsoft Visual C++ and
    other tools are needed.

    In addition, DON'T use the following FAILED commands:

    > python setup.py install (fails when run in the python directory)
    > pip install .

Quick Start
===========

"Quick Start with Scipy" is in the next section.

There are two levels of usage. The high-level one uses utility
functions in liblinearutil.py and commonutil.py (shared with LIBSVM
and imported by svmutil.py). The usage is the same as that of the
LIBLINEAR MATLAB interface.

>>> from liblinear.liblinearutil import *
# Read data in LIBSVM format
>>> y, x = svm_read_problem('../heart_scale')
>>> m = train(y[:200], x[:200], '-c 4')
>>> p_label, p_acc, p_val = predict(y[200:], x[200:], m)

# Construct problem in python format
# Dense data
>>> y, x = [1,-1], [[1,0,1], [-1,0,-1]]
# Sparse data
>>> y, x = [1,-1], [{1:1, 3:1}, {1:-1,3:-1}]
>>> prob = problem(y, x)
>>> param = parameter('-s 0 -c 4 -B 1')
>>> m = train(prob, param)

# Other utility functions
>>> save_model('heart_scale.model', m)
>>> m = load_model('heart_scale.model')
>>> p_label, p_acc, p_val = predict(y, x, m, '-b 1')
>>> ACC, MSE, SCC = evaluations(y, p_label)

# Getting online help
>>> help(train)

The low-level usage directly calls the C interfaces imported by
liblinear.py. Note that all arguments and return values are in ctypes
format; you need to handle them carefully.

>>> from liblinear.liblinear import *
>>> prob = problem([1,-1], [{1:1, 3:1}, {1:-1,3:-1}])
>>> param = parameter('-c 4')
>>> m = liblinear.train(prob, param) # m is a ctype pointer to a model
# Convert a Python-format instance to feature_nodearray, a ctypes structure
>>> x0, max_idx = gen_feature_nodearray({1:1, 3:1})
>>> label = liblinear.predict(m, x0)

Quick Start with Scipy
======================

Make sure you have Scipy installed to proceed in this section.
If numba (http://numba.pydata.org) is installed, some operations will be much faster.

There are two levels of usage. The high-level one uses utility functions
in liblinearutil.py and the usage is the same as that of the LIBLINEAR
MATLAB interface.

>>> import numpy as np
>>> import scipy
>>> from liblinear.liblinearutil import *
# Read data in LIBSVM format
>>> y, x = svm_read_problem('../heart_scale', return_scipy = True) # y: ndarray, x: csr_matrix
>>> m = train(y[:200], x[:200, :], '-c 4')
>>> p_label, p_acc, p_val = predict(y[200:], x[200:, :], m)

# Construct problem in Scipy format
# Dense data: numpy ndarray
>>> y, x = np.asarray([1,-1]), np.asarray([[1,0,1], [-1,0,-1]])
# Sparse data: scipy csr_matrix((data, (row_ind, col_ind)))
>>> y, x = np.asarray([1,-1]), scipy.sparse.csr_matrix(([1, 1, -1, -1], ([0, 0, 1, 1], [0, 2, 0, 2])))
>>> prob = problem(y, x)
>>> param = parameter('-s 0 -c 4 -B 1')
>>> m = train(prob, param)

# Apply data scaling in Scipy format
>>> y, x = svm_read_problem('../heart_scale', return_scipy=True)
>>> scale_param = csr_find_scale_param(x, lower=0)
>>> scaled_x = csr_scale(x, scale_param)

# Other utility functions
>>> save_model('heart_scale.model', m)
>>> m = load_model('heart_scale.model')
>>> p_label, p_acc, p_val = predict(y, x, m, '-b 1')
>>> ACC, MSE, SCC = evaluations(y, p_label)

# Getting online help
>>> help(train)

The low-level usage directly calls the C interfaces imported by
liblinear.py. Note that all arguments and return values are in ctypes
format; you need to handle them carefully.

>>> from liblinear.liblinear import *
>>> prob = problem(np.asarray([1,-1]), scipy.sparse.csr_matrix(([1, 1, -1, -1], ([0, 0, 1, 1], [0, 2, 0, 2]))))
>>> param = parameter('-s 1 -c 4')
# One may also directly assign the options after creating the parameter instance
>>> param = parameter()
>>> param.solver_type = 1
>>> param.C = 4
>>> m = liblinear.train(prob, param) # m is a ctype pointer to a model
# Convert a tuple of ndarray (index, data) to feature_nodearray, a ctypes structure
# Note that index starts from 0, though the following example will be changed to 1:1, 3:1 internally
>>> x0, max_idx = gen_feature_nodearray((np.asarray([0,2]), np.asarray([1,1])))
>>> label = liblinear.predict(m, x0)

Design Description
==================

There are two files, liblinear.py and liblinearutil.py, which respectively
correspond to low-level and high-level use of the interface.

In liblinear.py, we adopt the Python built-in library "ctypes," so that
Python can directly access C structures and interface functions defined
in linear.h.

While advanced users can use structures/functions in liblinear.py, to
avoid handling ctypes structures we provide some easy-to-use functions in
liblinearutil.py. The usage is similar to that of the LIBLINEAR MATLAB
interface.

Data Structures
===============

Three data structures derived from linear.h are node, problem, and
parameter. They all contain fields with the same names as in
linear.h. Access these fields carefully, because you directly use a C
structure instead of a Python object. The following description introduces
additional fields and methods.

Before using the data structures, execute the following command to load the
LIBLINEAR shared library:

>>> from liblinear.liblinear import *

- class feature_node:

    Construct a feature_node.

    >>> node = feature_node(idx, val)

    idx: an integer indicating the feature index.

    val: a float indicating the feature value.

    Show the index and the value of a node.

    >>> print(node)

- Function: gen_feature_nodearray(xi [,feature_max=None])

    Generate a feature vector from a Python list/tuple/dictionary, numpy
    ndarray, or tuple of (index, data):

    >>> xi_ctype, max_idx = gen_feature_nodearray({1:1, 3:1, 5:-2})

    xi_ctype: the returned feature_nodearray (a ctypes structure)

    max_idx: the maximal feature index of xi

    feature_max: if feature_max is assigned, features with indices larger
                 than feature_max are removed.

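Conceptually, the conversion above turns a sparse {index: value} dictionary into an ordered array of (index, value) nodes terminated by a sentinel node with index -1, which is how LIBLINEAR's C code detects the end of an instance. The following pure-Python sketch illustrates that idea only; it is not the actual ctypes implementation, and sketch_feature_nodearray is a made-up name.

```python
# Illustration only: mimic the shape of gen_feature_nodearray's result
# using plain Python tuples instead of a ctypes feature_nodearray.

def sketch_feature_nodearray(xi, feature_max=None):
    """Return a list of (index, value) pairs plus the max feature index."""
    pairs = sorted(xi.items())                     # order by feature index
    if feature_max is not None:
        # Drop features beyond feature_max, mirroring the documented option.
        pairs = [(i, v) for (i, v) in pairs if i <= feature_max]
    max_idx = pairs[-1][0] if pairs else 0
    pairs.append((-1, 0.0))                        # sentinel: end of instance
    return pairs, max_idx
```

For example, sketch_feature_nodearray({1:1, 3:1, 5:-2}) yields the three nodes in index order followed by the (-1, 0.0) sentinel, with max_idx = 5.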
- class problem:

    Construct a problem instance

    >>> prob = problem(y, x [,bias=-1])

    y: a Python list/tuple/ndarray of l labels (type must be int/double).

    x: 1. a list/tuple of l training instances. The feature vector of
          each training instance is a list/tuple or dictionary.

       2. an l * n numpy ndarray or scipy spmatrix (n: number of features).

    bias: if bias >= 0, instance x becomes [x; bias]; if < 0, no bias term
          is added (default -1).

    You can also modify the bias value by

    >>> prob.set_bias(1)

    Note that if your x contains sparse data (i.e., dictionaries), the
    internal ctypes data format is still sparse.

    Copy a problem instance.
    DON'T use this unless you know what it does.

    >>> prob_copy = prob.copy()

    The reason we need to copy a problem instance is that, for example,
    in multi-label tasks using the OVR setting, we need to train a binary
    classification problem for each label on data with/without that label.
    Since each training uses the same x but a different y, simply looping
    over labels and creating a new problem instance for each training would
    introduce overhead, either from repeatedly transforming x into
    feature_node or from allocating additional memory space for x during
    parallel training.

    With problem copying, suppose x represents the data, y1 represents the
    labels for class 1, and y2 represents the labels for class 2.
    We can do:

    >>> class1_prob = problem(y1, x)
    >>> class2_prob = class1_prob.copy()
    >>> class2_prob.y = (ctypes.c_double * class1_prob.l)(*y2)

    Note that although the copied problem is a new instance, attributes such
    as y (POINTER(c_double)), x (POINTER(POINTER(feature_node))), and
    x_space (list/np.ndarray) are copied by reference. That is, class1_prob
    and class2_prob share the same y, x and x_space after the copy.

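The copy-by-reference caveat above can be demonstrated with plain ctypes: a shallow copy shares the same underlying c_double array, and rebinding y on one copy leaves the other untouched. The Holder class below is a made-up stand-in for the real problem structure, used only to illustrate the sharing behavior.

```python
import copy
import ctypes

class Holder:
    """Toy stand-in for problem: holds l and a ctypes double array y."""
    def __init__(self, y):
        self.l = len(y)
        self.y = (ctypes.c_double * self.l)(*y)
    def copy(self):
        # Shallow copy: attributes such as y are copied by reference.
        return copy.copy(self)

a = Holder([1.0, -1.0])
b = a.copy()
shared_after_copy = b.y is a.y            # True: same underlying array
b.y = (ctypes.c_double * a.l)(*[-1.0, 1.0])  # rebind b's y only
```

After the rebind, a.y still holds [1.0, -1.0] while b.y holds [-1.0, 1.0], which is exactly the pattern the class1_prob/class2_prob example relies on.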
- class parameter:

    Construct a parameter instance

    >>> param = parameter('training_options')

    If 'training_options' is empty, LIBLINEAR default values are applied.

    Set param to LIBLINEAR default values.

    >>> param.set_to_default_values()

    Parse a string of options.

    >>> param.parse_options('training_options')

    Show values of parameters.

    >>> print(param)

- class model:

    There are two ways to obtain an instance of model:

    >>> model_ = train(y, x)
    >>> model_ = load_model('model_file_name')

    Note that the structure returned by the interface functions
    liblinear.train and liblinear.load_model is a ctypes pointer to a
    model, which is different from the model object returned
    by train and load_model in liblinearutil.py. We provide a
    function toPyModel for the conversion:

    >>> model_ptr = liblinear.train(prob, param)
    >>> model_ = toPyModel(model_ptr)

    If you obtain a model in a way other than the above approaches,
    handle it carefully to avoid memory leaks or segmentation faults.

    Some interface functions to access LIBLINEAR models are wrapped as
    members of the class model:

    >>> nr_feature = model_.get_nr_feature()
    >>> nr_class = model_.get_nr_class()
    >>> class_labels = model_.get_labels()
    >>> is_prob_model = model_.is_probability_model()
    >>> is_regression_model = model_.is_regression_model()

    The decision function is W*x + b, where
        W is an nr_class-by-nr_feature matrix, and
        b is a vector of size nr_class.
    To access W_kj (i.e., the coefficient for the k-th class and the j-th
    feature) and b_k (i.e., the bias for the k-th class), use the following
    functions.

    >>> W_kj = model_.get_decfun_coef(feat_idx=j, label_idx=k)
    >>> b_k = model_.get_decfun_bias(label_idx=k)

    We also provide a function to extract w_k (i.e., the k-th row of W) and
    b_k directly, as follows.

    >>> [w_k, b_k] = model_.get_decfun(label_idx=k)

    Note that w_k is a Python list of length nr_feature, which means that
    w_k[0] = W_k1.
    For regression models, W is just a vector of length nr_feature. Either
    set label_idx=0 or omit the label_idx parameter to access the
    coefficients.

    >>> W_j = model_.get_decfun_coef(feat_idx=j)
    >>> b = model_.get_decfun_bias()
    >>> [W, b] = model_.get_decfun()

    For one-class SVM models, label_idx is ignored and b=-rho is
    returned from get_decfun(). That is, the decision function is
    w*x+b = w*x-rho.

    >>> rho = model_.get_decfun_rho()
    >>> [W, b] = model_.get_decfun()

    Note that in get_decfun_coef, get_decfun_bias, and get_decfun, feat_idx
    starts from 1, while label_idx starts from 0. If label_idx is not in the
    valid range (0 to nr_class-1), a NaN will be returned; and if feat_idx
    is not in the valid range (1 to nr_feature), a zero value will be
    returned. For regression models, label_idx is ignored.

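The decision function W*x + b described above drives multi-class prediction: compute one score per class and report the class with the largest score. The following self-contained sketch shows that computation on made-up W, b, and label values (they are not output from any trained model); the function name decide is hypothetical.

```python
# Illustration of using per-class rows of W and entries of b, as
# returned by get_decfun(label_idx=k), to predict a label.

def decide(W, b, labels, x):
    """W: nr_class rows of nr_feature coefficients; b: per-class biases."""
    scores = [sum(w_kj * x_j for w_kj, x_j in zip(w_k, x)) + b_k
              for w_k, b_k in zip(W, b)]
    best = max(range(len(labels)), key=lambda k: scores[k])
    return labels[best], scores

W = [[0.5, -1.0], [-0.5, 1.0]]   # 2 classes, 2 features (made-up values)
b = [0.1, -0.1]
labels = [1, -1]                 # ordered as in model.label
label, scores = decide(W, b, labels, [1.0, 0.0])
```

Note that, matching the text, x here is indexed from 0 in Python while feat_idx in the API starts from 1, so w_k[0] corresponds to W_k1.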
Utility Functions
=================

To use utility functions, type

>>> from liblinear.liblinearutil import *

The above command loads
    train()                : train a linear model
    predict()              : predict testing data
    svm_read_problem()     : read the data from a LIBSVM-format file or object
    load_model()           : load a LIBLINEAR model
    save_model()           : save a model to a file
    evaluations()          : evaluate prediction results
    csr_find_scale_param() : find scaling parameters for data in csr format
    csr_scale()            : apply data scaling to data in csr format

- Function: train

    There are three ways to call train():

    >>> model = train(y, x [, 'training_options'])
    >>> model = train(prob [, 'training_options'])
    >>> model = train(prob, param)

    y: a list/tuple/ndarray of l training labels (type must be int/double).

    x: 1. a list/tuple of l training instances. The feature vector of
          each training instance is a list/tuple or dictionary.

       2. an l * n numpy ndarray or scipy spmatrix (n: number of features).

    training_options: a string in the same form as that for LIBLINEAR
                      command mode.

    prob: a problem instance generated by calling
          problem(y, x).

    param: a parameter instance generated by calling
           parameter('training_options').

    model: the returned model instance. See linear.h for details of this
           structure. If the '-v' option is specified, cross validation is
           conducted and the returned model is just a scalar: cross-validation
           accuracy for classification and mean squared error for regression.

           If the '-C' option is specified, the best parameters are found
           by cross validation. The parameter selection utility is supported
           only by -s 0, -s 2 (for finding C) and -s 11 (for finding C, p).
           The returned structure is a triple with the best C, the best p,
           and the corresponding cross-validation accuracy or mean squared
           error. The returned best p for -s 0 and -s 2 is set to -1 because
           the p parameter is not used by classification models.

    To train the same data many times with different parameters, the
    second and the third ways should be faster.

    Examples:

    >>> y, x = svm_read_problem('../heart_scale')
    >>> prob = problem(y, x)
    >>> param = parameter('-s 3 -c 5 -q')
    >>> m = train(y, x, '-c 5')
    >>> m = train(prob, '-w1 5 -c 5')
    >>> m = train(prob, param)
    >>> CV_ACC = train(y, x, '-v 3')
    >>> best_C, best_p, best_rate = train(y, x, '-C -s 0') # best_p is only for -s 11
    >>> m = train(y, x, '-c {0} -s 0'.format(best_C)) # use the same solver: -s 0

- Function: predict

    To predict testing data with a model, use

    >>> p_labs, p_acc, p_vals = predict(y, x, model [,'predicting_options'])

    y: a list/tuple/ndarray of l true labels (type must be int/double).
       It is used for calculating the accuracy. Use [] if true labels are
       unavailable.

    x: 1. a list/tuple of l testing instances. The feature vector of
          each testing instance is a list/tuple or dictionary.

       2. an l * n numpy ndarray or scipy spmatrix (n: number of features).

    predicting_options: a string of predicting options in the same format as
                        that of LIBLINEAR.

    model: a model instance.

    p_labels: a list of predicted labels.

    p_acc: a tuple including accuracy (for classification), mean
           squared error, and squared correlation coefficient (for
           regression).

    p_vals: a list of decision values or probability estimates (if '-b 1'
            is specified). If k is the number of classes, for decision
            values, each element includes the results of predicting k
            binary-class SVMs. If k = 2 and the solver is not MCSVM_CS,
            only one decision value is returned. For probabilities, each
            element contains k values indicating the probability that the
            testing instance is in each class.
            Note that the order of classes here is the same as the
            'model.label' field in the model structure.

    Example:

    >>> m = train(y, x, '-c 5')
    >>> p_labels, p_acc, p_vals = predict(y, x, m)

- Functions: svm_read_problem/load_model/save_model

    See the usage by examples:

    >>> y, x = svm_read_problem('data.txt')
    >>> with open('data.txt') as f:
    >>>     y, x = svm_read_problem(f)
    >>> m = load_model('model_file')
    >>> save_model('model_file', m)

- Function: evaluations

    Calculate some evaluations using the true values (ty) and the predicted
    values (pv):

    >>> (ACC, MSE, SCC) = evaluations(ty, pv, useScipy)

    ty: a list/tuple/ndarray of true values.

    pv: a list/tuple/ndarray of predicted values.

    useScipy: convert ty, pv to ndarrays and use scipy functions for the
              evaluation.

    ACC: accuracy.

    MSE: mean squared error.

    SCC: squared correlation coefficient.

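The three quantities can be sketched in a few lines of pure Python: accuracy as the percentage of exact matches, MSE as the mean of squared differences, and SCC as the square of the Pearson correlation between ty and pv. This mirrors the documented outputs; it is not the actual implementation in commonutil.py, and sketch_evaluations is a made-up name.

```python
def sketch_evaluations(ty, pv):
    """Return (ACC, MSE, SCC) for true values ty and predictions pv."""
    l = len(ty)
    ACC = 100.0 * sum(t == p for t, p in zip(ty, pv)) / l   # percentage
    MSE = sum((t - p) ** 2 for t, p in zip(ty, pv)) / l
    mean_t = sum(ty) / l
    mean_p = sum(pv) / l
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(ty, pv))
    var_t = sum((t - mean_t) ** 2 for t in ty)
    var_p = sum((p - mean_p) ** 2 for p in pv)
    # Squared Pearson correlation; undefined when either side is constant.
    SCC = cov * cov / (var_t * var_p) if var_t and var_p else float('nan')
    return ACC, MSE, SCC
```

For example, with ty = [1, -1, 1, 1] and pv = [1, -1, -1, 1], three of four labels match, so ACC is 75.0 and MSE is 1.0.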
- Function: csr_find_scale_param/csr_scale

    Scale data in csr format.

    >>> param = csr_find_scale_param(x [, lower=l, upper=u])
    >>> x = csr_scale(x, param)

    x: a csr_matrix of data.

    l: x scaling lower limit; default -1.

    u: x scaling upper limit; default 1.

    The scaling process is: x * diag(coef) + ones(l, 1) * offset'

    param: a dictionary of scaling parameters, where param['coef'] = coef
           and param['offset'] = offset.

    coef: a scipy array of scaling coefficients.

    offset: a scipy array of scaling offsets.

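The scaling formula x * diag(coef) + ones(l, 1) * offset' means each column j is transformed as x[:, j] * coef[j] + offset[j]. The sketch below applies that formula to dense rows with made-up coef/offset values (in practice they come from csr_find_scale_param); sketch_scale is a hypothetical name for illustration.

```python
def sketch_scale(x, coef, offset):
    """x: list of dense rows; returns rows scaled per column."""
    return [[v * c + o for v, c, o in zip(row, coef, offset)]
            for row in x]

x = [[0.0, 10.0],
     [4.0, 30.0]]
# Made-up parameters that map column 0 from [0, 4] and column 1 from
# [10, 30] onto [0, 1], i.e. lower=0, upper=1 in the API's terms:
coef = [0.25, 0.05]
offset = [0.0, -0.5]
scaled = sketch_scale(x, coef, offset)
```

With these values the column minima land on 0 and the column maxima on 1, matching the lower/upper limits described above.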
Additional Information
======================

This interface was originally written by Hsiang-Fu Yu from the Department of
Computer Science, National Taiwan University. If you find this tool useful,
please cite LIBLINEAR as follows:

R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin.
LIBLINEAR: A Library for Large Linear Classification, Journal of
Machine Learning Research 9(2008), 1871-1874. Software available at
http://www.csie.ntu.edu.tw/~cjlin/liblinear

For any questions, please contact Chih-Jen Lin <cjlin@csie.ntu.edu.tw>,
or check the FAQ page:

http://www.csie.ntu.edu.tw/~cjlin/liblinear/faq.html