quapy package

Submodules

quapy.error module

quapy.error.absolute_error(p, p_hat)
quapy.error.acc_error(y_true, y_pred)
quapy.error.acce(y_true, y_pred)
quapy.error.ae(p, p_hat)
quapy.error.f1_error(y_true, y_pred)
quapy.error.f1e(y_true, y_pred)
quapy.error.from_name(err_name)
quapy.error.kld(p, p_hat, eps=None)
quapy.error.mae(prevs, prevs_hat)
quapy.error.mean_absolute_error(prevs, prevs_hat)
quapy.error.mean_relative_absolute_error(p, p_hat, eps=None)
quapy.error.mkld(prevs, prevs_hat, eps=None)
quapy.error.mnkld(prevs, prevs_hat, eps=None)
quapy.error.mrae(p, p_hat, eps=None)
quapy.error.mse(prevs, prevs_hat)
quapy.error.nkld(p, p_hat, eps=None)
quapy.error.rae(p, p_hat, eps=None)
quapy.error.relative_absolute_error(p, p_hat, eps=None)
quapy.error.se(p, p_hat)
quapy.error.smooth(p, eps)

quapy.evaluation module

quapy.evaluation.artificial_prevalence_prediction(model: quapy.method.base.BaseQuantifier, test: quapy.data.base.LabelledCollection, sample_size, n_prevpoints=210, n_repetitions=1, eval_budget: Optional[int] = None, n_jobs=1, random_seed=42, verbose=False)

Performs the predictions for all samples generated according to the artificial sampling protocol.

Parameters
  • model – the model in charge of generating the class prevalence estimations

  • test – the test set on which to perform artificial sampling

  • sample_size – the size of the samples

  • n_prevpoints – the number of different prevalences to sample (or set to None if eval_budget is specified)

  • n_repetitions – the number of repetitions for each prevalence

  • eval_budget – if specified, sets a ceiling on the number of evaluations to perform. For example, if there are 3 classes, n_repetitions=1 and eval_budget=20, then n_prevpoints will be set to 5, since this will generate 15 different prevalences ([0, 0, 1], [0, 0.25, 0.75], [0, 0.5, 0.5] … [1, 0, 0]), whereas setting n_prevpoints to 6 would produce more than 20 evaluations.

  • n_jobs – number of jobs to be run in parallel

  • random_seed – allows the samplings to be replicated. The seed is local to the method and does not affect any other random process.

  • verbose – if True, shows a progress bar

Returns

two ndarrays of shape (m,n), with m the number of samples (n_prevpoints*n_repetitions) and n the number of classes. The first one contains the true prevalences for the samples generated, while the second one contains the prevalence estimations

quapy.evaluation.artificial_prevalence_protocol(model: quapy.method.base.BaseQuantifier, test: quapy.data.base.LabelledCollection, sample_size, n_prevpoints=210, n_repetitions=1, eval_budget: Optional[int] = None, n_jobs=1, random_seed=42, error_metric: Union[str, Callable] = 'mae', verbose=False)
quapy.evaluation.artificial_prevalence_report(model: quapy.method.base.BaseQuantifier, test: quapy.data.base.LabelledCollection, sample_size, n_prevpoints=210, n_repetitions=1, eval_budget: Optional[int] = None, n_jobs=1, random_seed=42, error_metrics: Iterable[Union[str, Callable]] = 'mae', verbose=False)
quapy.evaluation.evaluate(model: quapy.method.base.BaseQuantifier, test_samples: Iterable[quapy.data.base.LabelledCollection], err: Union[str, Callable], n_jobs: int = -1)
quapy.evaluation.gen_prevalence_prediction(model: quapy.method.base.BaseQuantifier, gen_fn: Callable, eval_budget=None)
quapy.evaluation.natural_prevalence_prediction(model: quapy.method.base.BaseQuantifier, test: quapy.data.base.LabelledCollection, sample_size, n_repetitions=1, n_jobs=1, random_seed=42, verbose=False)

Performs the predictions for all samples generated according to the natural sampling protocol.

Parameters
  • model – the model in charge of generating the class prevalence estimations

  • test – the test set on which to perform natural sampling

  • sample_size – the size of the samples

  • n_repetitions – the number of repetitions (i.e., the number of samples to generate)

  • n_jobs – number of jobs to be run in parallel

  • random_seed – allows the samplings to be replicated. The seed is local to the method and does not affect any other random process.

  • verbose – if True, shows a progress bar

Returns

two ndarrays of shape (m,n), with m the number of samples (n_repetitions) and n the number of classes. The first one contains the true prevalences for the samples generated, while the second one contains the prevalence estimations

quapy.evaluation.natural_prevalence_protocol(model: quapy.method.base.BaseQuantifier, test: quapy.data.base.LabelledCollection, sample_size, n_repetitions=1, n_jobs=1, random_seed=42, error_metric: Union[str, Callable] = 'mae', verbose=False)
quapy.evaluation.natural_prevalence_report(model: quapy.method.base.BaseQuantifier, test: quapy.data.base.LabelledCollection, sample_size, n_repetitions=1, n_jobs=1, random_seed=42, error_metrics: Iterable[Union[str, Callable]] = 'mae', verbose=False)

quapy.functional module

quapy.functional.HellingerDistance(P, Q)
quapy.functional.adjusted_quantification(prevalence_estim, tpr, fpr, clip=True)
quapy.functional.artificial_prevalence_sampling(dimensions, n_prevalences=21, repeat=1, return_constrained_dim=False)
quapy.functional.get_nprevpoints_approximation(combinations_budget: int, n_classes: int, n_repeats: int = 1)

Searches for the largest number of (equidistant) prevalence points to define for each of the n_classes classes so that the number of valid prevalences generated as combinations of prevalence points (points in an n_classes-dimensional simplex) does not exceed combinations_budget.

Parameters
  • n_classes – number of classes

  • n_repeats – number of repetitions for each prevalence combination

  • combinations_budget – maximum number of combinations allowed

Returns

the largest number of prevalence points for which the number of valid prevalences does not exceed combinations_budget
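The search described above can be sketched in a few lines. This is an illustration rather than QuaPy's own code: it assumes the number of valid prevalences is the stars-and-bars count C(n_prevpoints + n_classes - 2, n_classes - 1) multiplied by n_repeats, which agrees with the examples given in this documentation (5 combinations for 2 classes and 5 points, 15 for 3 classes and 5 points). Both function names are hypothetical.

```python
from math import comb


def count_prevalence_combinations(n_prevpoints, n_classes, n_repeats=1):
    # Assumed closed form: ways to split (n_prevpoints - 1) equal steps
    # among n_classes classes, times the number of repetitions.
    return comb(n_prevpoints + n_classes - 2, n_classes - 1) * n_repeats


def largest_nprevpoints(combinations_budget, n_classes, n_repeats=1):
    # With a single class the count never grows, so the search would not end.
    if n_classes < 2 or n_repeats < 1 or combinations_budget < 1:
        raise ValueError("expected n_classes >= 2 and positive budget/repeats")
    n_prevpoints = 1
    # Increase the number of points until the budget is exceeded,
    # then step back to the last value that still fits.
    while count_prevalence_combinations(n_prevpoints, n_classes, n_repeats) <= combinations_budget:
        n_prevpoints += 1
    return n_prevpoints - 1


# Example from the eval_budget description: 3 classes, budget of 20.
best = largest_nprevpoints(combinations_budget=20, n_classes=3)
print(best, count_prevalence_combinations(best, 3))
```

A linear search is used here because the count grows quickly with n_prevpoints, so only a handful of iterations are needed for realistic budgets.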

quapy.functional.normalize_prevalence(prevalences)
quapy.functional.num_prevalence_combinations(n_prevpoints: int, n_classes: int, n_repeats: int = 1)

Computes the number of prevalence combinations in the n_classes-dimensional simplex if n_prevpoints equally distant prevalences are generated and n_repeats repetitions are requested.

Parameters
  • n_classes – number of classes

  • n_prevpoints – number of prevalence points

  • n_repeats – number of repetitions for each prevalence combination

Returns

The number of possible combinations. For example, if n_classes=2, n_prevpoints=5, n_repeats=1, then the number of possible combinations is 5, i.e.: [0,1], [0.25,0.75], [0.50,0.50], [0.75,0.25], and [1.0,0.0]
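To make the count concrete, the grid can also be enumerated by brute force. The sketch below is only an illustration (the function name is hypothetical and the library may compute the count differently): it works with integer step counts so that the "sums to 1" check is exact, and it ignores n_repeats, which would simply multiply the total.

```python
from itertools import product


def enumerate_prevalence_grid(n_prevpoints, n_classes):
    """Return every prevalence vector on the equidistant grid that sums to 1."""
    if n_prevpoints < 2:
        raise ValueError("at least two prevalence points are needed")
    steps = n_prevpoints - 1  # number of equal intervals in [0, 1]
    grid = []
    # Each class receives an integer number of steps; a combination is
    # valid only when the steps of all classes add up to the total.
    for counts in product(range(steps + 1), repeat=n_classes):
        if sum(counts) == steps:
            grid.append([c / steps for c in counts])
    return grid


binary = enumerate_prevalence_grid(n_prevpoints=5, n_classes=2)
for prevalence in binary:
    print(prevalence)

ternary = enumerate_prevalence_grid(n_prevpoints=5, n_classes=3)
print(len(binary), len(ternary))
```

The brute-force loop visits n_prevpoints ** n_classes candidates, so it is only practical for small grids; it is meant for checking the examples, not for production use.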

quapy.functional.prevalence_from_labels(labels, classes_)
quapy.functional.prevalence_from_probabilities(posteriors, binarize: bool = False)
quapy.functional.prevalence_linspace(n_prevalences=21, repeat=1, smooth_limits_epsilon=0.01)

Produces uniformly separated prevalence values. By default, produces an array of 21 prevalences, with step 0.05 and with the limits smoothed, i.e.: [0.01, 0.05, 0.10, 0.15, …, 0.90, 0.95, 0.99]

Parameters
  • n_prevalences – the number of prevalence values to sample from the [0,1] interval (default 21)

  • repeat – number of times each prevalence is to be repeated (defaults to 1)

  • smooth_limits_epsilon – the quantity to add and subtract to the limits 0 and 1

Returns

an array of uniformly separated prevalence values
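The behaviour described above can be reproduced with a short pure-Python sketch. It is not the library's implementation (which returns an array); the function name is hypothetical, and placing the repeated copies of each value next to each other is an assumption.

```python
def linspace_prevalences(n_prevalences=21, repeat=1, smooth_limits_epsilon=0.01):
    """Evenly spaced prevalences in [0, 1] with the two limits pulled inwards."""
    if n_prevalences < 2:
        raise ValueError("at least two prevalence values are needed")
    last = n_prevalences - 1
    prevalences = [i / last for i in range(n_prevalences)]
    # Smooth the limits so that no prevalence is exactly 0 or 1.
    prevalences[0] += smooth_limits_epsilon
    prevalences[-1] -= smooth_limits_epsilon
    # Assumption: repeated values are kept adjacent, e.g. [a, a, b, b, ...].
    return [p for p in prevalences for _ in range(repeat)]


default_grid = linspace_prevalences()
print([round(p, 2) for p in default_grid])
```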

quapy.functional.strprev(prevalences, prec=3)
quapy.functional.uniform_prevalence_sampling(n_classes, size=1)
quapy.functional.uniform_simplex_sampling(n_classes, size=1)

quapy.model_selection module

class quapy.model_selection.GridSearchQ(model: quapy.method.base.BaseQuantifier, param_grid: dict, sample_size: Optional[int], protocol='app', n_prevpoints: Optional[int] = None, n_repetitions: int = 1, eval_budget: Optional[int] = None, error: Union[Callable, str] = <function mae>, refit=True, val_split=0.4, n_jobs=1, random_seed=42, timeout=-1, verbose=False)

Bases: quapy.method.base.BaseQuantifier

Grid Search optimization targeting a quantification-oriented metric.

Optimizes the hyperparameters of a quantification method, based on an evaluation method and on an evaluation protocol for quantification.

Parameters
  • model (BaseQuantifier) – the quantifier to optimize

  • param_grid – a dictionary with keys the parameter names and values the list of values to explore

  • sample_size – the size of the samples to extract from the validation set (ignored if protocol=’gen’)

  • protocol – either ‘app’ for the artificial prevalence protocol, ‘npp’ for the natural prevalence protocol, or ‘gen’ for using a custom sampling generator function

  • n_prevpoints – if specified, indicates the number of equally distant points to extract from the interval [0,1] in order to define the prevalences of the samples; e.g., if n_prevpoints=5, then the prevalences for each class will be explored in [0.00, 0.25, 0.50, 0.75, 1.00]. If not specified, then eval_budget is requested. Ignored if protocol!=’app’.

  • n_repetitions – the number of repetitions for each combination of prevalences. This parameter is ignored for protocol=’app’ if eval_budget is set and is lower than the number of combinations that would be generated using the value assigned to n_prevpoints (for the current number of classes and n_repetitions). Ignored for protocol=’npp’ and protocol=’gen’ (use eval_budget for setting a maximum number of samples in those cases).

  • eval_budget – if specified, sets a ceiling on the number of evaluations to perform for each hyper-parameter combination. For example, if protocol=’app’, there are 3 classes, n_repetitions=1 and eval_budget=20, then n_prevpoints will be set to 5, since this will generate 15 different prevalences, i.e., [0, 0, 1], [0, 0.25, 0.75], [0, 0.5, 0.5] … [1, 0, 0], whereas setting it to 6 would generate more than 20. When protocol=’gen’, it indicates the maximum number of samples to generate, although fewer samples will be generated if the generator yields fewer.

  • error – an error function (callable) or a string indicating the name of an error function (valid ones are those in qp.error.QUANTIFICATION_ERROR)

  • refit – whether or not to refit the model on the whole labelled collection (training+validation) with the best chosen hyperparameter combination. Ignored if protocol=’gen’

  • val_split – either a LabelledCollection on which to test the performance of the different settings, or a float in [0,1] indicating the proportion of labelled data to extract from the training set, or a callable returning a generator function each time it is invoked (only for protocol=’gen’).

  • n_jobs – number of parallel jobs

  • random_seed – set the seed of the random generator to replicate experiments. Ignored if protocol=’gen’.

  • timeout – establishes a timer (in seconds) for each of the hyperparameter configurations being tested. Whenever a run takes longer than this timer, that configuration will be ignored. If all configurations end up being ignored, a TimeoutError exception is raised. If -1 (default), no time bound is set.

  • verbose – set to True to get information through the stdout
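As an illustration of what a param_grid of this shape describes, the sketch below expands a dictionary of candidate values into the individual configurations a grid search would try. It does not reproduce GridSearchQ's internals; the helper name and the hyperparameter names ('C', 'class_weight') are hypothetical and chosen only for the example.

```python
from itertools import product


def expand_param_grid(param_grid):
    """Turn {name: [values...]} into a list of {name: value} configurations."""
    names = list(param_grid)
    value_lists = [param_grid[name] for name in names]
    # The Cartesian product pairs every value of one parameter
    # with every value of all the others.
    return [dict(zip(names, values)) for values in product(*value_lists)]


# Hypothetical grid: 3 values for one parameter, 2 for another.
param_grid = {'C': [0.1, 1, 10], 'class_weight': [None, 'balanced']}
configurations = expand_param_grid(param_grid)
for configuration in configurations:
    print(configuration)
```

Each configuration would then be fitted and scored with the chosen error function under the chosen protocol, and the one with the lowest error kept.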

best_model()
property classes_
fit(training: quapy.data.base.LabelledCollection, val_split: Optional[Union[quapy.data.base.LabelledCollection, float, Callable]] = None)
Learning routine. Fits methods with all combinations of hyperparameters and selects the one minimizing the error metric.

Parameters
  • training – the training set on which to optimize the hyperparameters

  • val_split – either a LabelledCollection on which to test the performance of the different settings, or a float in [0,1] indicating the proportion of labelled data to extract from the training set

get_params(deep=True)

Returns the dictionary of hyper-parameters to explore (param_grid)

Parameters

deep – Unused

Returns

the dictionary param_grid

quantify(instances)

Estimate class prevalence values

Parameters

instances – sample containing the instances

set_params(**parameters)

Sets the hyper-parameters to explore.

Parameters

parameters – a dictionary with keys the parameter names and values the list of values to explore

quapy.plot module

quapy.plot.binary_bias_bins(method_names, true_prevs, estim_prevs, pos_class=1, title=None, nbins=5, colormap=<matplotlib.colors.ListedColormap object>, vertical_xticks=False, legend=True, savepath=None)
quapy.plot.binary_bias_global(method_names, true_prevs, estim_prevs, pos_class=1, title=None, savepath=None)
quapy.plot.binary_diagonal(method_names, true_prevs, estim_prevs, pos_class=1, title=None, show_std=True, legend=True, train_prev=None, savepath=None)
quapy.plot.error_by_drift(method_names, true_prevs, estim_prevs, tr_prevs, n_bins=20, error_name='ae', show_std=True, logscale=False, title='Quantification error as a function of distribution shift', savepath=None)
quapy.plot.save_or_show(savepath)

quapy.util module

class quapy.util.EarlyStop(patience, lower_is_better=True)

Bases: object

quapy.util.create_if_not_exist(path)
quapy.util.create_parent_dir(path)
quapy.util.download_file(url, archive_filename)
quapy.util.download_file_if_not_exists(url, archive_path)
quapy.util.get_quapy_home()
quapy.util.map_parallel(func, args, n_jobs)

Applies func to n_jobs slices of args. E.g., if args is an array of 99 items and n_jobs=2, then func is applied in two parallel processes to args[0:50] and to args[50:99]
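The slicing in this example can be sketched as follows. The helper is hypothetical, and the rule it uses (slices as equal as possible, with the earlier slices taking any extra items) is an assumption chosen because it reproduces the 99-item example above. The parallel dispatch itself is left out, so func is applied to each slice sequentially.

```python
def slice_bounds(n_items, n_jobs):
    """Return (start, stop) pairs splitting n_items into n_jobs contiguous slices."""
    base, extra = divmod(n_items, n_jobs)
    bounds = []
    start = 0
    for job in range(n_jobs):
        # The first `extra` slices get one additional item each.
        size = base + (1 if job < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds


args = list(range(99))
bounds = slice_bounds(len(args), n_jobs=2)
print(bounds)

# Sequential stand-in for the parallel step: apply func to each slice.
partial_sums = [sum(args[start:stop]) for start, stop in bounds]
print(partial_sums)
```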

quapy.util.parallel(func, args, n_jobs)

A wrapper of multiprocessing: Parallel(n_jobs=n_jobs)(delayed(func)(args_i) for args_i in args) that takes the quapy.environ variable as input silently

quapy.util.pickled_resource(pickle_path: str, generation_func: callable, *args)

Allows for fast reuse of resources that are generated only once by calling generation_func(*args). On subsequent invocations, it loads the pickled resource. Example:

    import numpy as np

    def some_array(n):
        return np.random.rand(n)

    pickled_resource('./my_array.pkl', some_array, 10)  # the resource does not exist: it is created by some_array(10)
    pickled_resource('./my_array.pkl', some_array, 10)  # the resource exists: it is loaded from './my_array.pkl'

Parameters
  • pickle_path – the path where to save (first time) and load (next times) the resource

  • generation_func – the function that generates the resource, in case it does not exist in pickle_path

  • args – any arg that generation_func uses for generating the resources

Returns

the resource
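The generate-once, load-afterwards pattern described here can be sketched with the standard library alone. This is not QuaPy's implementation; cached_resource and squares are hypothetical names, and creating the parent directory on first save is an assumption.

```python
import os
import pickle
import tempfile


def cached_resource(pickle_path, generation_func, *args):
    """Load the resource from pickle_path, generating and saving it if absent."""
    if os.path.exists(pickle_path):
        with open(pickle_path, 'rb') as handle:
            return pickle.load(handle)
    resource = generation_func(*args)
    parent = os.path.dirname(pickle_path)
    if parent:
        os.makedirs(parent, exist_ok=True)  # assumption: create missing folders
    with open(pickle_path, 'wb') as handle:
        pickle.dump(resource, handle)
    return resource


calls = []  # records every time the generator actually runs


def squares(n):
    calls.append(n)
    return [i * i for i in range(n)]


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'squares.pkl')
    first = cached_resource(path, squares, 5)   # generated and saved
    second = cached_resource(path, squares, 5)  # loaded from disk

print(first, second, len(calls))
```

Note that the cache is keyed only on the path: calling it again with different args but the same pickle_path would return the stored resource unchanged.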

quapy.util.save_text_file(path, text)
quapy.util.temp_seed(seed)

Can be used in a “with” context to set a temporary seed without modifying the outer numpy’s current state. E.g.:

    with temp_seed(random_seed):
        # do any computation depending on np.random functionality

Parameters

seed – the seed to set within the “with” context
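The save, seed, restore pattern behind such a context manager can be sketched as below. To keep the example dependency-free it uses Python's standard random module instead of np.random, so it is an analogue of the idea rather than the library's code; temporary_seed is a hypothetical name.

```python
import random
from contextlib import contextmanager


@contextmanager
def temporary_seed(seed):
    """Seed the generator inside the block, then restore the previous state."""
    state = random.getstate()  # remember the outer generator state
    random.seed(seed)
    try:
        yield
    finally:
        random.setstate(state)  # restore it even if the block raises


# Reference: the value the outer generator would produce next.
random.seed(0)
expected_next = random.random()

random.seed(0)
with temporary_seed(42):
    inside_first = random.random()
outer_after = random.random()  # unaffected by the seeded block

with temporary_seed(42):
    inside_second = random.random()

print(inside_first == inside_second, outer_after == expected_next)
```

Restoring the state in a finally clause means the outer random sequence continues as if the seeded block had never run, which is what makes the seed local.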

Module contents

quapy.isbinary(x)