quapy.method package

Submodules

quapy.method.aggregative module

class quapy.method.aggregative.ACC(learner: sklearn.base.BaseEstimator, val_split=0.4)

Bases: quapy.method.aggregative.AggregativeQuantifier

aggregate(classif_predictions)
classify(data)
fit(data: quapy.data.base.LabelledCollection, fit_learner=True, val_split: Optional[Union[float, int, quapy.data.base.LabelledCollection]] = None)

Trains an ACC quantifier (a usage sketch appears below, after the alias entry).

Parameters

data – the training set

fit_learner – set to False to bypass the training (the learner is assumed to be already fit)

val_split – either a float in (0,1) indicating the proportion of training instances to use for validation (e.g., 0.3 for using 30% of the training set as validation data), a LabelledCollection indicating the validation set itself, or an int indicating the number k of folds to be used in kFCV to estimate the parameters

Returns

self

classmethod solve_adjustment(PteCondEstim, prevs_estim)
quapy.method.aggregative.AdjustedClassifyAndCount

alias of quapy.method.aggregative.ACC
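A minimal usage sketch (not taken from the QuaPy documentation) illustrating the signatures above with a scikit-learn classifier; the toy data, and the assumption that LabelledCollection accepts (instances, labels) positionally, are illustrative:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    from quapy.data.base import LabelledCollection
    from quapy.method.aggregative import ACC

    # toy binary dataset standing in for a real corpus
    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
    train = LabelledCollection(X[:1000], y[:1000])   # assumed constructor: (instances, labels)
    test_instances = X[1000:]

    # 40% of the training collection is held out (val_split=0.4) to estimate the
    # misclassification rates used for the adjustment
    model = ACC(LogisticRegression(), val_split=0.4)
    model.fit(train)                    # fit_learner=True by default
    prevalence = model.quantify(test_instances)
    print(prevalence)                   # estimated class prevalences, summing to 1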

class quapy.method.aggregative.AggregativeProbabilisticQuantifier

Bases: quapy.method.aggregative.AggregativeQuantifier

Abstract class for quantification methods that base their estimations on the aggregation of posterior probabilities as returned by a probabilistic classifier. Aggregative Probabilistic Quantifiers thus extend Aggregative Quantifiers by implementing a _posterior_probabilities_ method returning values in [0,1] – the posterior probabilities.

posterior_probabilities(instances)
predict_proba(instances)
property probabilistic
quantify(instances)
set_params(**parameters)
class quapy.method.aggregative.AggregativeQuantifier

Bases: quapy.method.base.BaseQuantifier

Abstract class for quantification methods that base their estimations on the aggregation of classification results. Aggregative Quantifiers thus implement a _classify_ method and maintain a _learner_ attribute.

abstract aggregate(classif_predictions: numpy.ndarray)
property aggregative
property classes_
classify(instances)
abstract fit(data: quapy.data.base.LabelledCollection, fit_learner=True)
get_params(deep=True)
property learner
property n_classes
quantify(instances)
set_params(**parameters)
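The description above states that aggregative quantifiers implement a classify method and maintain a learner attribute; the following illustrative class (not part of QuaPy) sketches how quantify is typically obtained as the composition of classification and aggregation, under the assumption that concrete subclasses follow this pattern:

    import numpy as np

    class SketchAggregativeQuantifier:
        """Illustrative only: mirrors the AggregativeQuantifier contract."""

        def __init__(self, learner):
            self.learner = learner              # the wrapped classifier

        def classify(self, instances):
            return self.learner.predict(instances)

        def aggregate(self, classif_predictions: np.ndarray):
            raise NotImplementedError           # implemented by each concrete method

        def quantify(self, instances):
            # prevalence estimate = aggregation of the individual classifications
            return self.aggregate(self.classify(instances))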
class quapy.method.aggregative.CC(learner: sklearn.base.BaseEstimator)

Bases: quapy.method.aggregative.AggregativeQuantifier

The most basic quantification method: it simply classifies all instances and counts how many have been attributed to each class in order to compute class prevalence estimates.

aggregate(classif_predictions)
fit(data: quapy.data.base.LabelledCollection, fit_learner=True)

Trains the Classify & Count method unless _fit_learner_ is False, in which case the learner is assumed to be already fit.

Parameters

data – training data

fit_learner – if False, the classifier is assumed to be already fit

Returns

self

quapy.method.aggregative.ClassifyAndCount

alias of quapy.method.aggregative.CC
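The counting step described above can be sketched in plain numpy (illustrative helper name, not part of the module):

    import numpy as np

    def classify_and_count(classif_predictions: np.ndarray, classes: np.ndarray) -> np.ndarray:
        """Relative frequency of each class among the predicted labels."""
        counts = np.array([(classif_predictions == c).sum() for c in classes])
        return counts / counts.sum()

    # example: 7 instances predicted as class 0, 3 as class 1 -> [0.7, 0.3]
    preds = np.array([0, 0, 0, 0, 0, 0, 0, 1, 1, 1])
    print(classify_and_count(preds, classes=np.array([0, 1])))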

class quapy.method.aggregative.ELM(svmperf_base=None, loss='01', **kwargs)

Bases: quapy.method.aggregative.AggregativeQuantifier, quapy.method.base.BinaryQuantifier

aggregate(classif_predictions: numpy.ndarray)
classify(X, y=None)
fit(data: quapy.data.base.LabelledCollection, fit_learner=True)
class quapy.method.aggregative.EMQ(learner: sklearn.base.BaseEstimator)

Bases: quapy.method.aggregative.AggregativeProbabilisticQuantifier

The method is described in: Saerens, M., Latinne, P., and Decaestecker, C. (2002). Adjusting the outputs of a classifier to new a priori probabilities: A simple procedure. Neural Computation, 14(1): 21–41.

classmethod EM(tr_prev, posterior_probabilities, epsilon=0.0001)
EPSILON = 0.0001
MAX_ITER = 1000
aggregate(classif_posteriors, epsilon=0.0001)
fit(data: quapy.data.base.LabelledCollection, fit_learner=True)
predict_proba(instances, epsilon=0.0001)
quapy.method.aggregative.ExpectationMaximizationQuantifier

alias of quapy.method.aggregative.EMQ
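The EM procedure of Saerens et al. (2002) iteratively rescales the classifier's posteriors by the ratio between the current prevalence estimate and the training prevalence, and re-estimates the prevalence from the rescaled posteriors. A minimal numpy sketch (illustrative, not the class's actual code; the stopping rule is an assumption):

    import numpy as np

    def em_adjust(tr_prev, posteriors, epsilon=1e-4, max_iter=1000):
        """Saerens et al. (2002) prior adjustment: returns the adjusted prevalence."""
        qs = np.copy(tr_prev)                        # current prevalence estimate
        for _ in range(max_iter):
            # E-step: rescale posteriors by the ratio new-prior / training-prior
            ps = posteriors * (qs / tr_prev)
            ps /= ps.sum(axis=1, keepdims=True)
            # M-step: new prevalence = mean of the adjusted posteriors
            qs_new = ps.mean(axis=0)
            if np.abs(qs_new - qs).max() < epsilon:
                return qs_new
            qs = qs_new
        return qs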

quapy.method.aggregative.ExplicitLossMinimisation

alias of quapy.method.aggregative.ELM

class quapy.method.aggregative.HDy(learner: sklearn.base.BaseEstimator, val_split=0.4)

Bases: quapy.method.aggregative.AggregativeProbabilisticQuantifier, quapy.method.base.BinaryQuantifier

Implementation of the method based on the Hellinger Distance y (HDy) proposed by González-Castro, V., Alaiz-Rodrı́guez, R., and Alegre, E. (2013). Class distribution estimation based on the Hellinger distance. Information Sciences, 218:146–164.

aggregate(classif_posteriors)
fit(data: quapy.data.base.LabelledCollection, fit_learner=True, val_split: Optional[Union[float, quapy.data.base.LabelledCollection]] = None)

Trains an HDy quantifier.

Parameters

data – the training set

fit_learner – set to False to bypass the training (the learner is assumed to be already fit)

val_split – either a float in (0,1) indicating the proportion of training instances to use for validation (e.g., 0.3 for using 30% of the training set as validation data), or a LabelledCollection indicating the validation set itself

Returns

self

quapy.method.aggregative.HellingerDistanceY

alias of quapy.method.aggregative.HDy
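HDy, as described in the reference above, searches for the class prevalence whose mixture of validation posterior histograms (from positive and negative examples) is closest, in Hellinger distance, to the histogram of the test posteriors. A minimal numpy sketch with illustrative binning and search grid (the actual implementation details may differ):

    import numpy as np

    def hellinger(p, q):
        return np.sqrt(np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)) / np.sqrt(2)

    def hdy_search(val_pos_post, val_neg_post, test_post, bins=10):
        """Return the positive-class prevalence minimizing the Hellinger distance."""
        edges = np.linspace(0, 1, bins + 1)
        Ppos = np.histogram(val_pos_post, bins=edges)[0] / len(val_pos_post)
        Pneg = np.histogram(val_neg_post, bins=edges)[0] / len(val_neg_post)
        Ptest = np.histogram(test_post, bins=edges)[0] / len(test_post)
        candidates = np.linspace(0, 1, 101)
        distances = [hellinger(a * Ppos + (1 - a) * Pneg, Ptest) for a in candidates]
        return candidates[int(np.argmin(distances))]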

class quapy.method.aggregative.MAX(learner: sklearn.base.BaseEstimator, val_split=0.4)

Bases: quapy.method.aggregative.ThresholdOptimization

class quapy.method.aggregative.MS(learner: sklearn.base.BaseEstimator, val_split=0.4)

Bases: quapy.method.aggregative.ThresholdOptimization

optimize_threshold(y, probabilities)
class quapy.method.aggregative.MS2(learner: sklearn.base.BaseEstimator, val_split=0.4)

Bases: quapy.method.aggregative.MS

optimize_threshold(y, probabilities)
quapy.method.aggregative.MedianSweep

alias of quapy.method.aggregative.MS

quapy.method.aggregative.MedianSweep2

alias of quapy.method.aggregative.MS2

class quapy.method.aggregative.OneVsAll(binary_quantifier, n_jobs=-1)

Bases: quapy.method.aggregative.AggregativeQuantifier

Allows any binary quantifier to perform quantification on single-label datasets. The method maintains one binary quantifier for each class, and then l1-normalizes the outputs so that the class prevalences sum up to 1 (this normalization is sketched after this entry). This variant was used, along with the ExplicitLossMinimisation quantifier, in Gao, W., Sebastiani, F.: From classification to quantification in tweet sentiment analysis. Social Network Analysis and Mining 6(19), 1–22 (2016)

aggregate(classif_predictions_bin)
property binary
property classes_
classify(instances)
fit(data: quapy.data.base.LabelledCollection, fit_learner=True)
get_params(deep=True)
posterior_probabilities(instances)
property probabilistic
quantify(X)
set_params(**parameters)
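The l1-normalization mentioned above simply rescales the per-class positive prevalences returned by the independent binary quantifiers so that they form a proper distribution; a minimal sketch (the uniform fallback for an all-zero vector is an assumption):

    import numpy as np

    def l1_normalize(per_class_positive_prevalences: np.ndarray) -> np.ndarray:
        """Rescale independent binary estimates so that they sum to 1."""
        total = per_class_positive_prevalences.sum()
        if total == 0:
            # degenerate case: fall back to the uniform distribution
            n = len(per_class_positive_prevalences)
            return np.full(n, 1 / n)
        return per_class_positive_prevalences / total

    # three binary quantifiers estimated 0.5, 0.3 and 0.4 -> normalized to sum to 1
    print(l1_normalize(np.array([0.5, 0.3, 0.4])))   # [0.4167, 0.25, 0.3333]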
class quapy.method.aggregative.PACC(learner: sklearn.base.BaseEstimator, val_split=0.4)

Bases: quapy.method.aggregative.AggregativeProbabilisticQuantifier

aggregate(classif_posteriors)
classify(data)
fit(data: quapy.data.base.LabelledCollection, fit_learner=True, val_split: Optional[Union[float, int, quapy.data.base.LabelledCollection]] = None)

Trains a PACC quantifier.

Parameters

data – the training set

fit_learner – set to False to bypass the training (the learner is assumed to be already fit)

val_split – either a float in (0,1) indicating the proportion of training instances to use for validation (e.g., 0.3 for using 30% of the training set as validation data), a LabelledCollection indicating the validation set itself, or an int indicating the number k of folds to be used in kFCV to estimate the parameters

Returns

self

class quapy.method.aggregative.PCC(learner: sklearn.base.BaseEstimator)

Bases: quapy.method.aggregative.AggregativeProbabilisticQuantifier

aggregate(classif_posteriors)
fit(data: quapy.data.base.LabelledCollection, fit_learner=True)
quapy.method.aggregative.ProbabilisticAdjustedClassifyAndCount

alias of quapy.method.aggregative.PACC

quapy.method.aggregative.ProbabilisticClassifyAndCount

alias of quapy.method.aggregative.PCC

class quapy.method.aggregative.SVMAE(svmperf_base=None, **kwargs)

Bases: quapy.method.aggregative.ELM

class quapy.method.aggregative.SVMKLD(svmperf_base=None, **kwargs)

Bases: quapy.method.aggregative.ELM

Esuli, A. and Sebastiani, F. (2015). Optimizing text quantifiers for multivariate loss functions. ACM Transactions on Knowledge Discovery and Data, 9(4):Article 27.

class quapy.method.aggregative.SVMNKLD(svmperf_base=None, **kwargs)

Bases: quapy.method.aggregative.ELM

Esuli, A. and Sebastiani, F. (2015). Optimizing text quantifiers for multivariate loss functions. ACM Transactions on Knowledge Discovery and Data, 9(4):Article 27.

class quapy.method.aggregative.SVMQ(svmperf_base=None, **kwargs)

Bases: quapy.method.aggregative.ELM

Barranquero, J., Díez, J., and del Coz, J. J. (2015). Quantification-oriented learning based on reliable classifiers. Pattern Recognition, 48(2):591–604.

class quapy.method.aggregative.SVMRAE(svmperf_base=None, **kwargs)

Bases: quapy.method.aggregative.ELM

class quapy.method.aggregative.T50(learner: sklearn.base.BaseEstimator, val_split=0.4)

Bases: quapy.method.aggregative.ThresholdOptimization

class quapy.method.aggregative.ThresholdOptimization(learner: sklearn.base.BaseEstimator, val_split=0.4)

Bases: quapy.method.aggregative.AggregativeQuantifier, quapy.method.base.BinaryQuantifier

aggregate(classif_predictions)
compute_fpr(FP, TN)
compute_table(y, y_)
compute_tpr(TP, FP)
fit(data: quapy.data.base.LabelledCollection, fit_learner=True, val_split: Optional[Union[float, int, quapy.data.base.LabelledCollection]] = None)
optimize_threshold(y, probabilities)
class quapy.method.aggregative.X(learner: sklearn.base.BaseEstimator, val_split=0.4)

Bases: quapy.method.aggregative.ThresholdOptimization

quapy.method.aggregative.training_helper(learner, data: quapy.data.base.LabelledCollection, fit_learner: bool = True, ensure_probabilistic=False, val_split: Optional[Union[float, quapy.data.base.LabelledCollection]] = None)

Training procedure common to all Aggregative Quantifiers.

Parameters

learner – the learner to be fit

data – the data on which to fit the learner. If requested, the data will be split before fitting the learner.

fit_learner – whether or not to fit the learner (if False, then bypasses any action)

ensure_probabilistic – if True, guarantees that the resulting classifier implements predict_proba (if the learner is not probabilistic, then a CalibratedClassifierCV instance of it is trained)

val_split – if specified as a float, indicates the proportion of training instances that will define the validation split (e.g., 0.3 for using 30% of the training set as validation data); if specified as a LabelledCollection, represents the validation split itself

Returns

the learner trained on the training set, and the unused data (a _LabelledCollection_ if train_val_split>0 or None otherwise) to be used as a validation set for any subsequent parameter fitting
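A minimal sketch of the split-then-fit logic described above, using standard scikit-learn utilities; this illustrative helper does not reproduce the actual implementation:

    from sklearn.calibration import CalibratedClassifierCV
    from sklearn.model_selection import train_test_split

    def training_helper_sketch(learner, X, y, fit_learner=True,
                               ensure_probabilistic=False, val_split=None):
        """Illustrative only: split off a validation set, optionally calibrate, then fit."""
        X_val = y_val = None
        if isinstance(val_split, float):
            X, X_val, y, y_val = train_test_split(X, y, test_size=val_split, stratify=y)
        if ensure_probabilistic and not hasattr(learner, 'predict_proba'):
            learner = CalibratedClassifierCV(learner)   # wrap non-probabilistic learners
        if fit_learner:
            learner.fit(X, y)
        return learner, (X_val, y_val)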

quapy.method.base module

class quapy.method.base.BaseQuantifier

Bases: object

property aggregative
property binary
abstract property classes_
abstract fit(data: quapy.data.base.LabelledCollection)
abstract get_params(deep=True)
property probabilistic
abstract quantify(instances)
abstract set_params(**parameters)
class quapy.method.base.BinaryQuantifier

Bases: quapy.method.base.BaseQuantifier

property binary
quapy.method.base.isaggregative(model: quapy.method.base.BaseQuantifier)
quapy.method.base.isbinary(model: quapy.method.base.BaseQuantifier)
quapy.method.base.isprobabilistic(model: quapy.method.base.BaseQuantifier)

quapy.method.meta module

quapy.method.meta.EACC(learner, param_grid=None, optim=None, param_mod_sel=None, **kwargs)
quapy.method.meta.ECC(learner, param_grid=None, optim=None, param_mod_sel=None, **kwargs)
quapy.method.meta.EEMQ(learner, param_grid=None, optim=None, param_mod_sel=None, **kwargs)
quapy.method.meta.EHDy(learner, param_grid=None, optim=None, param_mod_sel=None, **kwargs)
quapy.method.meta.EPACC(learner, param_grid=None, optim=None, param_mod_sel=None, **kwargs)
class quapy.method.meta.Ensemble(quantifier: quapy.method.base.BaseQuantifier, size=50, red_size=25, min_pos=5, policy='ave', max_sample_size=None, val_split=None, n_jobs=1, verbose=False)

Bases: quapy.method.base.BaseQuantifier

VALID_POLICIES = {'ave', 'ds', 'mae', 'mkld', 'mnkld', 'mrae', 'mse', 'ptr'}

Methods from the articles: Pérez-Gállego, P., Quevedo, J. R., & del Coz, J. J. (2017). Using ensembles for problems with characterizable changes in data distribution: A case study on quantification. Information Fusion, 34, 87-100. and Pérez-Gállego, P., Castano, A., Quevedo, J. R., & del Coz, J. J. (2019). Dynamic ensemble selection for quantification tasks. Information Fusion, 45, 1-15.

accuracy_policy(error_name)

Selects the red_size best-performing quantifiers in a static way (i.e., dropping all non-selected members). For each model in the ensemble, the performance is measured in terms of _error_name_ on the quantification of the samples used for training the rest of the models in the ensemble.

property aggregative
property binary
property classes_
ds_policy(predictions, test)
ds_policy_get_posteriors(data: quapy.data.base.LabelledCollection)

In the original article, this procedure is not described in sufficient detail. The paper only says that the distribution of posterior probabilities from training and test examples is compared by means of the Hellinger Distance. However, how these posterior probabilities are generated is not specified. In the article, a Logistic Regressor (LR) is used as the classifier device, and that could be used for this purpose. However, in general, a quantifier is not necessarily an instance of Aggregative Probabilistic Quantifiers, and so the fact that the quantifier builds on top of a probabilistic classifier cannot be taken for granted. Additionally, it would not be correct to generate the posterior probabilities for training documents with the very classifier they helped to train. This function thus generates the posterior probabilities for all training documents in a cross-validation way, using an LR with hyperparameters that have previously been optimized via grid search in 5FCV (a minimal sketch of this procedure follows this class entry).

Returns

P, f, where P is an ndarray containing the posterior probabilities of the training data, generated via cross-validation and using an optimized LR, and f is the function to be used in order to generate posterior probabilities for test instances.

fit(data: quapy.data.base.LabelledCollection, val_split: Optional[Union[float, quapy.data.base.LabelledCollection]] = None)
get_params(deep=True)
property probabilistic
ptr_policy(predictions)

Selects the predictions made by models that have been trained on samples with a prevalence that is most similar to a first approximation of the test prevalence as made by all models in the ensemble.

quantify(instances)
set_params(**parameters)
sout(msg)
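A minimal sketch of the cross-validated posterior generation described in ds_policy_get_posteriors; the grid of regularization values is an assumption (the docstring only specifies an LR optimized via grid search in 5FCV):

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV, cross_val_predict

    def crossval_posteriors(X, y):
        """Posteriors for training data, each produced by a model not trained on it."""
        grid = GridSearchCV(LogisticRegression(max_iter=1000),
                            param_grid={'C': np.logspace(-3, 3, 7)}, cv=5)
        grid.fit(X, y)
        best_lr = grid.best_estimator_
        # out-of-fold posteriors for the training documents
        P = cross_val_predict(best_lr, X, y, cv=5, method='predict_proba')
        # function to be used for generating posteriors for test instances
        posteriors_fn = best_lr.predict_proba
        return P, posteriors_fn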
quapy.method.meta.ensembleFactory(learner, base_quantifier_class, param_grid=None, optim=None, param_model_sel: Optional[dict] = None, **kwargs)
quapy.method.meta.get_probability_distribution(posterior_probabilities, bins=8)

quapy.method.neural module

class quapy.method.neural.QuaNetModule(doc_embedding_size, n_classes, stats_size, lstm_hidden_size=64, lstm_nlayers=1, ff_layers=[1024, 512], bidirectional=True, qdrop_p=0.5, order_by=0)

Bases: torch.nn.modules.module.Module

property device
forward(doc_embeddings, doc_posteriors, statistics)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

init_hidden()
class quapy.method.neural.QuaNetTrainer(learner, sample_size, n_epochs=100, tr_iter_per_poch=500, va_iter_per_poch=100, lr=0.001, lstm_hidden_size=64, lstm_nlayers=1, ff_layers=[1024, 512], bidirectional=True, qdrop_p=0.5, patience=10, checkpointdir='../checkpoint', checkpointname=None, device='cuda')

Bases: quapy.method.base.BaseQuantifier

property classes_
clean_checkpoint()
clean_checkpoint_dir()
epoch(data: quapy.data.base.LabelledCollection, posteriors, iterations, epoch, early_stop, train)
fit(data: quapy.data.base.LabelledCollection, fit_learner=True)
Parameters

data – the training data on which to train QuaNet. If fit_learner=True, the data will be split 40/40/20 for training the classifier, training QuaNet, and validating QuaNet, respectively. If fit_learner=False, the data will be split 66/34 for training QuaNet and validating it, respectively (a sketch of these proportions follows this class entry).

fit_learner – if True, trains the classifier on a split containing 40% of the data

Returns

self

get_aggregative_estims(posteriors)
get_params(deep=True)
quantify(instances, *args)
set_params(**parameters)
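A minimal sketch of the split proportions described in the fit docstring above (illustrative arithmetic only; the actual split is performed internally by QuaNetTrainer):

    def quanet_split_sizes(n, fit_learner=True):
        """Sizes of the splits used by QuaNetTrainer.fit, per the docstring above."""
        if fit_learner:
            # 40% classifier training, 40% QuaNet training, 20% QuaNet validation
            return int(0.4 * n), int(0.4 * n), n - 2 * int(0.4 * n)
        # classifier already fit: 66% QuaNet training, 34% QuaNet validation
        return 0, int(0.66 * n), n - int(0.66 * n)

    print(quanet_split_sizes(1000))         # (400, 400, 200)
    print(quanet_split_sizes(1000, False))  # (0, 660, 340)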
quapy.method.neural.mae_loss(output, target)

quapy.method.non_aggregative module

class quapy.method.non_aggregative.MaximumLikelihoodPrevalenceEstimation(**kwargs)

Bases: quapy.method.base.BaseQuantifier

property classes_
fit(data: quapy.data.base.LabelledCollection, *args)
get_params()
quantify(documents, *args)
set_params(**parameters)

Module contents