QuaPy/quapy/method/base.py

from abc import ABCMeta, abstractmethod
from copy import deepcopy

from joblib import Parallel, delayed
from sklearn.base import BaseEstimator

import quapy as qp
from quapy.data import LabelledCollection
import numpy as np


# Base Quantifier abstract class
# ------------------------------------
class BaseQuantifier(BaseEstimator):
    """
    Abstract Quantifier. A quantifier is defined as an object of a class that implements the method :meth:`fit` on
    :class:`quapy.data.base.LabelledCollection`, the method :meth:`quantify`, and the methods :meth:`set_params` and
    :meth:`get_params` for model selection (see :class:`quapy.model_selection.GridSearchQ`).
    """

    @abstractmethod
    def fit(self, data: LabelledCollection):
        """
        Trains a quantifier.

        :param data: a :class:`quapy.data.base.LabelledCollection` consisting of the training data
        :return: self
        """
        ...

    @abstractmethod
    def quantify(self, instances):
        """
        Generates class prevalence estimates for the instances of a sample.

        :param instances: array-like
        :return: `np.ndarray` of shape `(n_classes,)` with class prevalence estimates.
        """
        ...


class BinaryQuantifier(BaseQuantifier):
    """
    Abstract class of binary quantifiers, i.e., quantifiers estimating class prevalence values for only two classes
    (typically, to be interpreted as one class and its complement).
    """

    def _check_binary(self, data: LabelledCollection, quantifier_name):
        assert data.binary, f'{quantifier_name} works only on problems of binary classification. ' \
                            f'Use the class OneVsAll to enable {quantifier_name} to work on single-label data.'


class OneVsAll:
    """
    Marker base class for one-vs-all quantifiers (see :class:`OneVsAllGeneric`).
    """


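def newOneVsAll(binary_quantifier, n_jobs=None):
    """
    Wraps a binary quantifier in a one-vs-all quantifier: :class:`quapy.method.aggregative.OneVsAllAggregative`
    if the quantifier is aggregative, and :class:`OneVsAllGeneric` otherwise.
    """
    assert isinstance(binary_quantifier, BaseQuantifier), \
        f'{binary_quantifier} does not seem to be a Quantifier'
    if isinstance(binary_quantifier, qp.method.aggregative.AggregativeQuantifier):
        return qp.method.aggregative.OneVsAllAggregative(binary_quantifier, n_jobs)
    else:
        return OneVsAllGeneric(binary_quantifier, n_jobs)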


class OneVsAllGeneric(OneVsAll, BaseQuantifier):
    """
    Allows any binary quantifier to perform quantification on single-label datasets. The method maintains one binary
    quantifier for each class, and then l1-normalizes the outputs so that the class prevalence values sum up to 1.
    """

    def __init__(self, binary_quantifier, n_jobs=None):
        assert isinstance(binary_quantifier, BaseQuantifier), \
            f'{binary_quantifier} does not seem to be a Quantifier'
        if isinstance(binary_quantifier, qp.method.aggregative.AggregativeQuantifier):
            print('[warning] the quantifier seems to be an instance of qp.method.aggregative.AggregativeQuantifier; '
                  f'you might prefer instantiating {qp.method.aggregative.OneVsAllAggregative.__name__}')
        self.binary_quantifier = binary_quantifier
        self.n_jobs = qp._get_njobs(n_jobs)

    def fit(self, data: LabelledCollection, fit_classifier=True):
        assert not data.binary, f'{self.__class__.__name__} expects non-binary data'
        assert fit_classifier == True, 'fit_classifier must be True'

        self.dict_binary_quantifiers = {c: deepcopy(self.binary_quantifier) for c in data.classes_}
        self._parallel(self._delayed_binary_fit, data)
        return self

    def _parallel(self, func, *args, **kwargs):
        return np.asarray(
            Parallel(n_jobs=self.n_jobs, backend='threading')(
                delayed(func)(c, *args, **kwargs) for c in self.classes_
            )
        )

    def quantify(self, instances):
        prevalences = self._parallel(self._delayed_binary_predict, instances)
        return qp.functional.normalize_prevalence(prevalences)

    @property
    def classes_(self):
        return sorted(self.dict_binary_quantifiers.keys())

    def _delayed_binary_predict(self, c, X):
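        # index 1 is the prevalence of the positive class (the binary data is built with classes=[False, True])
        return self.dict_binary_quantifiers[c].quantify(X)[1]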

    def _delayed_binary_fit(self, c, data):
        bindata = LabelledCollection(data.instances, data.labels == c, classes=[False, True])
        self.dict_binary_quantifiers[c].fit(bindata)
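

# ----------------------------------------------------------------------------------------------------
# Illustrative sketch (not part of the QuaPy API). It mirrors, in plain Python, what
# OneVsAllGeneric.quantify does with the per-class outputs: each binary quantifier contributes the
# estimated prevalence of its positive class, and the resulting vector is l1-normalized so that it
# sums to 1. The function name and the uniform fallback for an all-zero vector are assumptions of this
# sketch; the class itself delegates the normalization to qp.functional.normalize_prevalence.
# ----------------------------------------------------------------------------------------------------
def _sketch_l1_normalize(positive_prevalences):
    """Rescales non-negative per-class scores so that they sum to 1."""
    scores = [float(p) for p in positive_prevalences]
    if not scores:
        return []
    total = sum(scores)
    if total == 0:
        # assumption: with no evidence for any class, fall back to the uniform distribution
        return [1.0 / len(scores)] * len(scores)
    return [p / total for p in scores]


# Usage: if three binary quantifiers report 0.2, 0.2 and 0.4 for their positive classes, then
# _sketch_l1_normalize([0.2, 0.2, 0.4]) rescales them proportionally into a vector that sums to 1.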