quapy.method package¶
Submodules¶
quapy.method.aggregative module¶
- class quapy.method.aggregative.ACC(learner: sklearn.base.BaseEstimator, val_split=0.4)¶
Bases:
quapy.method.aggregative.AggregativeQuantifier
- aggregate(classif_predictions)¶
- classify(data)¶
- fit(data: quapy.data.base.LabelledCollection, fit_learner=True, val_split: Optional[Union[float, int, quapy.data.base.LabelledCollection]] = None)¶
Trains an ACC quantifier.
- Parameters
data – the training set
fit_learner – set to False to bypass the training (the learner is assumed to be already fit)
val_split – either a float in (0, 1) indicating the proportion of training instances to use for validation (e.g., 0.3 for using 30% of the training set as validation data), or a LabelledCollection indicating the validation set itself, or an int indicating the number k of folds to be used in kFCV to estimate the parameters
- Returns
self
- classmethod solve_adjustment(PteCondEstim, prevs_estim)¶
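A minimal usage sketch (not part of the original documentation; it assumes QuaPy and scikit-learn are installed, and the synthetic data is illustrative only):

import numpy as np
from sklearn.linear_model import LogisticRegression
from quapy.data.base import LabelledCollection
from quapy.method.aggregative import ACC

# synthetic binary data, for illustration only
X = np.random.rand(1000, 10)
y = (X[:, 0] > 0.5).astype(int)
train = LabelledCollection(X[:800], y[:800])

# val_split=0.4 holds out 40% of the training set to estimate the
# misclassification rates used by the adjustment
model = ACC(LogisticRegression(), val_split=0.4)
model.fit(train)
prevalences = model.quantify(X[800:])  # estimated class prevalences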
- quapy.method.aggregative.AdjustedClassifyAndCount¶
alias of
quapy.method.aggregative.ACC
- class quapy.method.aggregative.AggregativeProbabilisticQuantifier¶
Bases:
quapy.method.aggregative.AggregativeQuantifier
Abstract class for quantification methods that base their estimations on the aggregation of posterior probabilities as returned by a probabilistic classifier. Aggregative Probabilistic Quantifiers thus extend Aggregative Quantifiers by implementing a _posterior_probabilities_ method returning values in [0,1] – the posterior probabilities.
- posterior_probabilities(instances)¶
- predict_proba(instances)¶
- property probabilistic¶
- quantify(instances)¶
- set_params(**parameters)¶
- class quapy.method.aggregative.AggregativeQuantifier¶
Bases:
quapy.method.base.BaseQuantifier
Abstract class for quantification methods that base their estimations on the aggregation of classification results. Aggregative Quantifiers thus implement a _classify_ method and maintain a _learner_ attribute.
- abstract aggregate(classif_predictions: numpy.ndarray)¶
- property aggregative¶
- property classes_¶
- classify(instances)¶
- abstract fit(data: quapy.data.base.LabelledCollection, fit_learner=True)¶
- get_params(deep=True)¶
- property learner¶
- property n_classes¶
- quantify(instances)¶
- set_params(**parameters)¶
- class quapy.method.aggregative.CC(learner: sklearn.base.BaseEstimator)¶
Bases:
quapy.method.aggregative.AggregativeQuantifier
The most basic quantification method: it simply classifies all instances and counts how many have been attributed to each class in order to compute class prevalence estimates.
- aggregate(classif_predictions)¶
- fit(data: quapy.data.base.LabelledCollection, fit_learner=True)¶
Trains the Classify & Count method, unless _fit_learner_ is False, in which case the learner is assumed to be already fit.
- Parameters
data – training data
fit_learner – if False, the classifier is assumed to be fit
- Returns
self
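Conceptually, CC's aggregation step just counts the predicted labels and normalizes; a minimal sketch in plain NumPy (illustrative, not the library's code):

import numpy as np

# label predicted by the classifier for each test instance
classif_predictions = np.array([0, 1, 1, 0, 1])
n_classes = 2
# count how many instances were attributed to each class, then normalize
prevalences = np.bincount(classif_predictions, minlength=n_classes) / len(classif_predictions)
# -> array([0.4, 0.6])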
- quapy.method.aggregative.ClassifyAndCount¶
alias of
quapy.method.aggregative.CC
- class quapy.method.aggregative.ELM(svmperf_base=None, loss='01', **kwargs)¶
Bases:
quapy.method.aggregative.AggregativeQuantifier, quapy.method.base.BinaryQuantifier
- aggregate(classif_predictions: numpy.ndarray)¶
- classify(X, y=None)¶
- fit(data: quapy.data.base.LabelledCollection, fit_learner=True)¶
- class quapy.method.aggregative.EMQ(learner: sklearn.base.BaseEstimator)¶
Bases:
quapy.method.aggregative.AggregativeProbabilisticQuantifier
The method is described in: Saerens, M., Latinne, P., and Decaestecker, C. (2002). Adjusting the outputs of a classifier to new a priori probabilities: A simple procedure. Neural Computation, 14(1): 21–41.
- classmethod EM(tr_prev, posterior_probabilities, epsilon=0.0001)¶
- EPSILON = 0.0001¶
- MAX_ITER = 1000¶
- aggregate(classif_posteriors, epsilon=0.0001)¶
- fit(data: quapy.data.base.LabelledCollection, fit_learner=True)¶
- predict_proba(instances, epsilon=0.0001)¶
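The EM classmethod implements the iterative adjustment of Saerens et al. (2002); the following is a minimal NumPy sketch of that procedure (illustrative, not the library's exact code):

import numpy as np

def em_adjust(tr_prev, posteriors, epsilon=1e-4, max_iter=1000):
    # tr_prev: training prevalences, shape (n_classes,)
    # posteriors: classifier outputs, shape (n_instances, n_classes)
    prev = tr_prev.copy()
    for _ in range(max_iter):
        # E-step: rescale posteriors by the ratio of current to training priors
        adjusted = (prev / tr_prev) * posteriors
        adjusted /= adjusted.sum(axis=1, keepdims=True)
        # M-step: the new prevalence estimate is the mean adjusted posterior
        new_prev = adjusted.mean(axis=0)
        if np.abs(new_prev - prev).max() < epsilon:
            break
        prev = new_prev
    return prev, adjusted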
- quapy.method.aggregative.ExpectationMaximizationQuantifier¶
alias of
quapy.method.aggregative.EMQ
- quapy.method.aggregative.ExplicitLossMinimisation¶
alias of
quapy.method.aggregative.ELM
- class quapy.method.aggregative.HDy(learner: sklearn.base.BaseEstimator, val_split=0.4)¶
Bases:
quapy.method.aggregative.AggregativeProbabilisticQuantifier, quapy.method.base.BinaryQuantifier
Implementation of the method based on the Hellinger Distance y (HDy) proposed by González-Castro, V., Alaiz-Rodríguez, R., and Alegre, E. (2013). Class distribution estimation based on the Hellinger distance. Information Sciences, 218:146–164.
- aggregate(classif_posteriors)¶
- fit(data: quapy.data.base.LabelledCollection, fit_learner=True, val_split: Optional[Union[float, quapy.data.base.LabelledCollection]] = None)¶
Trains an HDy quantifier.
- Parameters
data – the training set
fit_learner – set to False to bypass the training (the learner is assumed to be already fit)
val_split – either a float in (0, 1) indicating the proportion of training instances to use for validation (e.g., 0.3 for using 30% of the training set as validation data), or a LabelledCollection indicating the validation set itself
- Returns
self
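A minimal usage sketch (not part of the original documentation; HDy is binary-only, and the data set-up is illustrative):

import numpy as np
from sklearn.linear_model import LogisticRegression
from quapy.data.base import LabelledCollection
from quapy.method.aggregative import HDy

X = np.random.rand(1000, 10)
y = (X[:, 0] > 0.6).astype(int)  # binary labels
train = LabelledCollection(X[:800], y[:800])

# 40% of the training set is held out to model the distributions of
# posterior probabilities on which the Hellinger distance is computed
model = HDy(LogisticRegression(), val_split=0.4)
model.fit(train)
prevalences = model.quantify(X[800:])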
- quapy.method.aggregative.HellingerDistanceY¶
alias of
quapy.method.aggregative.HDy
- class quapy.method.aggregative.MAX(learner: sklearn.base.BaseEstimator, val_split=0.4)¶
- class quapy.method.aggregative.MS(learner: sklearn.base.BaseEstimator, val_split=0.4)¶
Bases:
quapy.method.aggregative.ThresholdOptimization
- optimize_threshold(y, probabilities)¶
- class quapy.method.aggregative.MS2(learner: sklearn.base.BaseEstimator, val_split=0.4)¶
Bases:
quapy.method.aggregative.MS
- optimize_threshold(y, probabilities)¶
- quapy.method.aggregative.MedianSweep¶
alias of
quapy.method.aggregative.MS
- quapy.method.aggregative.MedianSweep2¶
alias of
quapy.method.aggregative.MS2
- class quapy.method.aggregative.OneVsAll(binary_quantifier, n_jobs=-1)¶
Bases:
quapy.method.aggregative.AggregativeQuantifier
Allows any binary quantifier to perform quantification on single-label datasets. The method maintains one binary quantifier for each class, and then l1-normalizes the outputs so that the class prevalences sum up to 1. This variant was used, along with the ExplicitLossMinimisation quantifier, in Gao, W., Sebastiani, F.: From classification to quantification in tweet sentiment analysis. Social Network Analysis and Mining 6(19), 1–22 (2016). A usage sketch follows this class entry.
- aggregate(classif_predictions_bin)¶
- property binary¶
- property classes_¶
- classify(instances)¶
- fit(data: quapy.data.base.LabelledCollection, fit_learner=True)¶
- get_params(deep=True)¶
- posterior_probabilities(instances)¶
- property probabilistic¶
- quantify(X)¶
- set_params(**parameters)¶
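A minimal sketch of lifting a binary quantifier to a multiclass problem (illustrative; assumes QuaPy and scikit-learn are installed):

import numpy as np
from sklearn.linear_model import LogisticRegression
from quapy.data.base import LabelledCollection
from quapy.method.aggregative import HDy, OneVsAll

# synthetic 3-class data, for illustration only
X = np.random.rand(1500, 10)
y = np.digitize(X[:, 0], [0.33, 0.66])  # labels in {0, 1, 2}
train = LabelledCollection(X[:1200], y[:1200])

# HDy is binary-only; OneVsAll trains one HDy per class and
# l1-normalizes the per-class outputs into a prevalence vector
ova = OneVsAll(HDy(LogisticRegression()))
ova.fit(train)
prevalences = ova.quantify(X[1200:])  # sums to 1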
- class quapy.method.aggregative.PACC(learner: sklearn.base.BaseEstimator, val_split=0.4)¶
Bases:
quapy.method.aggregative.AggregativeProbabilisticQuantifier
- aggregate(classif_posteriors)¶
- classify(data)¶
- fit(data: quapy.data.base.LabelledCollection, fit_learner=True, val_split: Optional[Union[float, int, quapy.data.base.LabelledCollection]] = None)¶
Trains a PACC quantifier.
- Parameters
data – the training set
fit_learner – set to False to bypass the training (the learner is assumed to be already fit)
val_split – either a float in (0, 1) indicating the proportion of training instances to use for validation (e.g., 0.3 for using 30% of the training set as validation data), or a LabelledCollection indicating the validation set itself, or an int indicating the number k of folds to be used in kFCV to estimate the parameters
- Returns
self
- class quapy.method.aggregative.PCC(learner: sklearn.base.BaseEstimator)¶
Bases:
quapy.method.aggregative.AggregativeProbabilisticQuantifier
- aggregate(classif_posteriors)¶
- fit(data: quapy.data.base.LabelledCollection, fit_learner=True)¶
- quapy.method.aggregative.ProbabilisticAdjustedClassifyAndCount¶
alias of
quapy.method.aggregative.PACC
- quapy.method.aggregative.ProbabilisticClassifyAndCount¶
alias of
quapy.method.aggregative.PCC
- class quapy.method.aggregative.SVMAE(svmperf_base=None, **kwargs)¶
Bases:
quapy.method.aggregative.ELM
- class quapy.method.aggregative.SVMKLD(svmperf_base=None, **kwargs)¶
Bases:
quapy.method.aggregative.ELM
Esuli, A. and Sebastiani, F. (2015). Optimizing text quantifiers for multivariate loss functions. ACM Transactions on Knowledge Discovery from Data, 9(4):Article 27.
- class quapy.method.aggregative.SVMNKLD(svmperf_base=None, **kwargs)¶
Bases:
quapy.method.aggregative.ELM
Esuli, A. and Sebastiani, F. (2015). Optimizing text quantifiers for multivariate loss functions. ACM Transactions on Knowledge Discovery from Data, 9(4):Article 27.
- class quapy.method.aggregative.SVMQ(svmperf_base=None, **kwargs)¶
Bases:
quapy.method.aggregative.ELM
Barranquero, J., Díez, J., and del Coz, J. J. (2015). Quantification-oriented learning based on reliable classifiers. Pattern Recognition, 48(2):591–604.
- class quapy.method.aggregative.SVMRAE(svmperf_base=None, **kwargs)¶
Bases:
quapy.method.aggregative.ELM
- class quapy.method.aggregative.T50(learner: sklearn.base.BaseEstimator, val_split=0.4)¶
- class quapy.method.aggregative.ThresholdOptimization(learner: sklearn.base.BaseEstimator, val_split=0.4)¶
Bases:
quapy.method.aggregative.AggregativeQuantifier, quapy.method.base.BinaryQuantifier
- aggregate(classif_predictions)¶
- compute_fpr(FP, TN)¶
- compute_table(y, y_)¶
- compute_tpr(TP, FP)¶
- fit(data: quapy.data.base.LabelledCollection, fit_learner=True, val_split: Optional[Union[float, int, quapy.data.base.LabelledCollection]] = None)¶
- optimize_threshold(y, probabilities)¶
- class quapy.method.aggregative.X(learner: sklearn.base.BaseEstimator, val_split=0.4)¶
- quapy.method.aggregative.training_helper(learner, data: quapy.data.base.LabelledCollection, fit_learner: bool = True, ensure_probabilistic=False, val_split: Optional[Union[float, quapy.data.base.LabelledCollection]] = None)¶
Training procedure common to all Aggregative Quantifiers.
- Parameters
learner – the learner to be fit
data – the data on which to fit the learner. If requested, the data will be split before fitting the learner.
fit_learner – whether to fit the learner (if False, any fitting action is bypassed)
ensure_probabilistic – if True, guarantees that the resulting classifier implements predict_proba (if the learner is not probabilistic, a CalibratedClassifierCV wrapper of it is trained)
val_split – if specified as a float, indicates the proportion of training instances that will define the validation split (e.g., 0.3 for using 30% of the training set as validation data); if specified as a LabelledCollection, represents the validation split itself
- Returns
the learner trained on the training set, and the unused data (a _LabelledCollection_ if a validation split was requested, None otherwise) to be used as a validation set for any subsequent parameter fitting
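A minimal call sketch (illustrative; it assumes `data` is a quapy LabelledCollection already in scope):

from sklearn.linear_model import LogisticRegression
from quapy.method.aggregative import training_helper

# fits the learner on 70% of data and returns the remaining 30%
# as a validation split for subsequent parameter fitting
learner, val_data = training_helper(
    LogisticRegression(), data,
    fit_learner=True,
    ensure_probabilistic=True,  # wraps non-probabilistic learners if needed
    val_split=0.3)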
quapy.method.base module¶
- class quapy.method.base.BaseQuantifier¶
Bases:
object
- property aggregative¶
- property binary¶
- abstract property classes_¶
- abstract fit(data: quapy.data.base.LabelledCollection)¶
- abstract get_params(deep=True)¶
- property probabilistic¶
- abstract quantify(instances)¶
- abstract set_params(**parameters)¶
- class quapy.method.base.BinaryQuantifier¶
Bases:
quapy.method.base.BaseQuantifier
- property binary¶
- quapy.method.base.isaggregative(model: quapy.method.base.BaseQuantifier)¶
- quapy.method.base.isbinary(model: quapy.method.base.BaseQuantifier)¶
- quapy.method.base.isprobabilistic(model: quapy.method.base.BaseQuantifier)¶
quapy.method.meta module¶
- quapy.method.meta.EACC(learner, param_grid=None, optim=None, param_mod_sel=None, **kwargs)¶
- quapy.method.meta.ECC(learner, param_grid=None, optim=None, param_mod_sel=None, **kwargs)¶
- quapy.method.meta.EEMQ(learner, param_grid=None, optim=None, param_mod_sel=None, **kwargs)¶
- quapy.method.meta.EHDy(learner, param_grid=None, optim=None, param_mod_sel=None, **kwargs)¶
- quapy.method.meta.EPACC(learner, param_grid=None, optim=None, param_mod_sel=None, **kwargs)¶
- class quapy.method.meta.Ensemble(quantifier: quapy.method.base.BaseQuantifier, size=50, red_size=25, min_pos=5, policy='ave', max_sample_size=None, val_split=None, n_jobs=1, verbose=False)¶
Bases:
quapy.method.base.BaseQuantifier
Implementation of the ensemble methods from: Pérez-Gállego, P., Quevedo, J. R., & del Coz, J. J. (2017). Using ensembles for problems with characterizable changes in data distribution: A case study on quantification. Information Fusion, 34, 87-100; and Pérez-Gállego, P., Castano, A., Quevedo, J. R., & del Coz, J. J. (2019). Dynamic ensemble selection for quantification tasks. Information Fusion, 45, 1-15.
- VALID_POLICIES = {'ave', 'ds', 'mae', 'mkld', 'mnkld', 'mrae', 'mse', 'ptr'}¶
- accuracy_policy(error_name)¶
Selects the _red_size_ best-performing quantifiers in a static way (i.e., dropping all non-selected ensemble members). For each model in the ensemble, performance is measured in terms of _error_name_ on the quantification of the samples used for training the rest of the models in the ensemble.
- property aggregative¶
- property binary¶
- property classes_¶
- ds_policy(predictions, test)¶
- ds_policy_get_posteriors(data: quapy.data.base.LabelledCollection)¶
In the original article, this procedure is not described in sufficient detail. The paper only says that the distributions of posterior probabilities from training and test examples are compared by means of the Hellinger Distance; how these posterior probabilities are generated is not specified. In the article, a Logistic Regressor (LR) is used as the classification device, and it could be used for this purpose too. In general, however, a quantifier is not necessarily an instance of Aggregative Probabilistic Quantifiers, so it cannot be taken for granted that the quantifier builds on top of a probabilistic classifier. Additionally, it would not be correct to generate the posterior probabilities for training documents that were used to train the classifier that generates them. This function thus generates the posterior probabilities for all training documents in a cross-validation way, using an LR with hyperparameters that have previously been optimized via grid search in 5FCV. A sketch of this procedure follows this class entry.
- Returns
P, f, where P is an ndarray containing the posterior probabilities of the training data, generated via cross-validation with an optimized LR, and f is the function to be used to generate posterior probabilities for test instances.
- fit(data: quapy.data.base.LabelledCollection, val_split: Optional[Union[float, quapy.data.base.LabelledCollection]] = None)¶
- get_params(deep=True)¶
- property probabilistic¶
- ptr_policy(predictions)¶
Selects the predictions made by models that have been trained on samples with a prevalence that is most similar to a first approximation of the test prevalence as made by all models in the ensemble.
- quantify(instances)¶
- set_params(**parameters)¶
- sout(msg)¶
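A minimal sketch of the cross-validated posterior generation described under ds_policy_get_posteriors (illustrative; the parameter grid and solver settings are assumptions, not the library's exact choices):

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_predict

def cv_posteriors(X, y):
    # optimize an LR via grid search in 5FCV...
    search = GridSearchCV(LogisticRegression(max_iter=1000),
                          param_grid={'C': [0.1, 1, 10, 100]}, cv=5)
    search.fit(X, y)
    best = search.best_estimator_
    # ...then produce out-of-fold posteriors for every training document,
    # so no document is scored by a classifier trained on it
    P = cross_val_predict(best, X, y, cv=5, method='predict_proba')
    return P, best.predict_proba  # f: generates posteriors for test instances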
- quapy.method.meta.ensembleFactory(learner, base_quantifier_class, param_grid=None, optim=None, param_model_sel: Optional[dict] = None, **kwargs)¶
- quapy.method.meta.get_probability_distribution(posterior_probabilities, bins=8)¶
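A plausible sketch of what get_probability_distribution computes (illustrative; the library's exact binning may differ): it histograms the positive-class posteriors into bins and normalizes, producing the distribution that the 'ds' policy compares via the Hellinger distance.

import numpy as np

def get_probability_distribution_sketch(posterior_probabilities, bins=8):
    # take the positive-class column of the (n_instances, 2) posterior matrix
    posteriors = posterior_probabilities[:, 1]
    # histogram over [0, 1] and normalize so the mass sums to 1
    hist, _ = np.histogram(posteriors, bins=bins, range=(0, 1))
    return hist / hist.sum()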
quapy.method.neural module¶
- class quapy.method.neural.QuaNetModule(doc_embedding_size, n_classes, stats_size, lstm_hidden_size=64, lstm_nlayers=1, ff_layers=[1024, 512], bidirectional=True, qdrop_p=0.5, order_by=0)¶
Bases:
torch.nn.modules.module.Module
- property device¶
- forward(doc_embeddings, doc_posteriors, statistics)¶
Defines the computation performed at every call. Should be overridden by all subclasses.
Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- class quapy.method.neural.QuaNetTrainer(learner, sample_size, n_epochs=100, tr_iter_per_poch=500, va_iter_per_poch=100, lr=0.001, lstm_hidden_size=64, lstm_nlayers=1, ff_layers=[1024, 512], bidirectional=True, qdrop_p=0.5, patience=10, checkpointdir='../checkpoint', checkpointname=None, device='cuda')¶
Bases:
quapy.method.base.BaseQuantifier
- property classes_¶
- clean_checkpoint()¶
- clean_checkpoint_dir()¶
- epoch(data: quapy.data.base.LabelledCollection, posteriors, iterations, epoch, early_stop, train)¶
- fit(data: quapy.data.base.LabelledCollection, fit_learner=True)¶
- Parameters
data – the training data on which to train QuaNet. If fit_learner=True, the data will be split 40/40/20 for training the classifier, training QuaNet, and validating QuaNet, respectively. If fit_learner=False, the data will be split 66/34 for training QuaNet and validating it, respectively.
fit_learner – if True, trains the classifier on a split containing 40% of the data
- Returns
self
A usage sketch follows this class entry.
- get_aggregative_estims(posteriors)¶
- get_params(deep=True)¶
- quantify(instances, *args)¶
- set_params(**parameters)¶
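A call-pattern sketch (illustrative only; QuaNet requires a neural classifier able to produce document embeddings and posterior probabilities, so `my_neural_classifier`, `train`, and `test_instances` below are hypothetical stand-ins, not library objects):

from quapy.method.neural import QuaNetTrainer

# my_neural_classifier: hypothetical learner exposing posteriors and
# document embeddings, as QuaNet requires
quanet = QuaNetTrainer(learner=my_neural_classifier,
                       sample_size=100,  # size of the samples drawn for training
                       device='cuda')
quanet.fit(train)  # splits the data 40/40/20 when fit_learner=True
prevalences = quanet.quantify(test_instances)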
- quapy.method.neural.mae_loss(output, target)¶
quapy.method.non_aggregative module¶
- class quapy.method.non_aggregative.MaximumLikelihoodPrevalenceEstimation(**kwargs)¶
Bases:
quapy.method.base.BaseQuantifier
- property classes_¶
- fit(data: quapy.data.base.LabelledCollection, *args)¶
- get_params()¶
- quantify(documents, *args)¶
- set_params(**parameters)¶
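MaximumLikelihoodPrevalenceEstimation (MLPE) is the trivial baseline that ignores the test instances and simply returns the prevalence observed in the training set; a minimal sketch (illustrative data):

import numpy as np
from quapy.data.base import LabelledCollection
from quapy.method.non_aggregative import MaximumLikelihoodPrevalenceEstimation

y = np.array([0] * 70 + [1] * 30)
train = LabelledCollection(np.random.rand(100, 5), y)

mlpe = MaximumLikelihoodPrevalenceEstimation()
mlpe.fit(train)
# the test documents are ignored: the estimate is the training prevalence
mlpe.quantify(np.random.rand(20, 5))  # -> [0.7, 0.3]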