quapy package
Submodules
quapy.error module
- quapy.error.absolute_error(p, p_hat)
- quapy.error.acc_error(y_true, y_pred)
- quapy.error.acce(y_true, y_pred)
- quapy.error.ae(p, p_hat)
- quapy.error.f1_error(y_true, y_pred)
- quapy.error.f1e(y_true, y_pred)
- quapy.error.from_name(err_name)
- quapy.error.kld(p, p_hat, eps=None)
- quapy.error.mae(prevs, prevs_hat)
- quapy.error.mean_absolute_error(prevs, prevs_hat)
- quapy.error.mean_relative_absolute_error(p, p_hat, eps=None)
- quapy.error.mkld(prevs, prevs_hat, eps=None)
- quapy.error.mnkld(prevs, prevs_hat, eps=None)
- quapy.error.mrae(p, p_hat, eps=None)
- quapy.error.mse(prevs, prevs_hat)
- quapy.error.nkld(p, p_hat, eps=None)
- quapy.error.rae(p, p_hat, eps=None)
- quapy.error.relative_absolute_error(p, p_hat, eps=None)
- quapy.error.se(p, p_hat)
- quapy.error.smooth(p, eps)
quapy.evaluation module
- quapy.evaluation.artificial_prevalence_prediction(model: quapy.method.base.BaseQuantifier, test: quapy.data.base.LabelledCollection, sample_size, n_prevpoints=210, n_repetitions=1, eval_budget: Optional[int] = None, n_jobs=1, random_seed=42, verbose=False)
Performs the predictions for all samples generated according to the artificial sampling protocol.
- Parameters
model – the model in charge of generating the class prevalence estimations
test – the test set on which to perform artificial sampling
sample_size – the size of the samples
n_prevpoints – the number of different prevalences to sample (or set to None if eval_budget is specified)
n_repetitions – the number of repetitions for each prevalence
eval_budget – if specified, sets a ceiling on the number of evaluations to perform. For example, if there are 3 classes, n_repetitions=1 and eval_budget=20, then n_prevpoints will be set to 5, since this generates 15 different prevalences ([0, 0, 1], [0, 0.25, 0.75], [0, 0.5, 0.5] … [1, 0, 0]), whereas setting n_prevpoints to 6 would produce more than 20 evaluations.
n_jobs – number of jobs to be run in parallel
random_seed – allows the samplings to be replicated. The seed is local to the method and does not affect any other random process.
verbose – if True, shows a progress bar
- Returns
two ndarrays of shape (m,n), with m the number of samples (n_prevpoints*n_repetitions) and n the number of classes. The first contains the true prevalences of the generated samples, while the second contains the prevalence estimations
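The eval_budget example (3 classes, 5 prevalence points, 15 prevalences) can be reproduced with a short enumeration. This is a minimal sketch that assumes the prevalences are the grid points whose components sum to 1; the function name prevalence_grid is hypothetical and is not part of quapy.

```python
from itertools import product

def prevalence_grid(n_classes, n_prevpoints):
    """Enumerate every prevalence vector whose components lie on an
    equidistant grid of n_prevpoints values in [0, 1] and sum to 1."""
    steps = n_prevpoints - 1  # number of grid intervals
    grid = []
    # Work with integer step counts to avoid floating-point comparisons.
    for counts in product(range(steps + 1), repeat=n_classes - 1):
        remainder = steps - sum(counts)  # steps left for the last class
        if remainder >= 0:
            grid.append([c / steps for c in counts] + [remainder / steps])
    return grid

grid = prevalence_grid(n_classes=3, n_prevpoints=5)
print(len(grid))   # 15
print(grid[0])     # [0.0, 0.0, 1.0]
print(grid[1])     # [0.0, 0.25, 0.75]
print(grid[-1])    # [1.0, 0.0, 0.0]
```

With 6 prevalence points the same enumeration yields 21 vectors, which is why a budget of 20 settles on 5 points.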
- quapy.evaluation.artificial_prevalence_protocol(model: quapy.method.base.BaseQuantifier, test: quapy.data.base.LabelledCollection, sample_size, n_prevpoints=210, n_repetitions=1, eval_budget: Optional[int] = None, n_jobs=1, random_seed=42, error_metric: Union[str, Callable] = 'mae', verbose=False)
- quapy.evaluation.artificial_prevalence_report(model: quapy.method.base.BaseQuantifier, test: quapy.data.base.LabelledCollection, sample_size, n_prevpoints=210, n_repetitions=1, eval_budget: Optional[int] = None, n_jobs=1, random_seed=42, error_metrics: Iterable[Union[str, Callable]] = 'mae', verbose=False)
- quapy.evaluation.evaluate(model: quapy.method.base.BaseQuantifier, test_samples: Iterable[quapy.data.base.LabelledCollection], err: Union[str, Callable], n_jobs: int = -1)
- quapy.evaluation.gen_prevalence_prediction(model: quapy.method.base.BaseQuantifier, gen_fn: Callable, eval_budget=None)
- quapy.evaluation.natural_prevalence_prediction(model: quapy.method.base.BaseQuantifier, test: quapy.data.base.LabelledCollection, sample_size, n_repetitions=1, n_jobs=1, random_seed=42, verbose=False)
Performs the predictions for all samples generated according to the natural sampling protocol.
- Parameters
model – the model in charge of generating the class prevalence estimations
test – the test set on which to perform natural sampling
sample_size – the size of the samples
n_repetitions – the number of samples to generate
n_jobs – number of jobs to be run in parallel
random_seed – allows the samplings to be replicated. The seed is local to the method and does not affect any other random process.
verbose – if True, shows a progress bar
- Returns
two ndarrays of shape (m,n), with m the number of samples (n_repetitions) and n the number of classes. The first contains the true prevalences of the generated samples, while the second contains the prevalence estimations
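The shape of the result of natural_prevalence_prediction (one row of true prevalences and one row of estimations per sample) can be illustrated without the library. The sketch below assumes each sample is a uniform random draw without replacement from the test set, and uses a local random generator so that the seed does not leak into other random processes. All names (natural_prevalence_sketch, toy_quantify) are hypothetical, and plain lists stand in for ndarrays.

```python
import random
from collections import Counter

def natural_prevalence_sketch(instances, labels, classes, quantify,
                              sample_size, n_repetitions=1, random_seed=42):
    rng = random.Random(random_seed)  # local generator: global state untouched
    true_prevs, estim_prevs = [], []
    for _ in range(n_repetitions):
        # Uniform draw of indices (assumption: sampling without replacement).
        idx = rng.sample(range(len(labels)), sample_size)
        counts = Counter(labels[i] for i in idx)
        true_prevs.append([counts[c] / sample_size for c in classes])
        estim_prevs.append(quantify([instances[i] for i in idx]))
    return true_prevs, estim_prevs

# Toy data: 100 scores, of which the top 30 are positives.
instances = list(range(100))
labels = [1 if x >= 70 else 0 for x in instances]

def toy_quantify(sample):
    # Hypothetical "classify and count" with a deliberately biased threshold.
    pos = sum(1 for x in sample if x >= 60) / len(sample)
    return [1 - pos, pos]

true_prevs, estim_prevs = natural_prevalence_sketch(
    instances, labels, classes=[0, 1], quantify=toy_quantify,
    sample_size=20, n_repetitions=5)
print(len(true_prevs), len(true_prevs[0]))  # 5 2
```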
- quapy.evaluation.natural_prevalence_protocol(model: quapy.method.base.BaseQuantifier, test: quapy.data.base.LabelledCollection, sample_size, n_repetitions=1, n_jobs=1, random_seed=42, error_metric: Union[str, Callable] = 'mae', verbose=False)
- quapy.evaluation.natural_prevalence_report(model: quapy.method.base.BaseQuantifier, test: quapy.data.base.LabelledCollection, sample_size, n_repetitions=1, n_jobs=1, random_seed=42, error_metrics: Iterable[Union[str, Callable]] = 'mae', verbose=False)
quapy.functional module
- quapy.functional.HellingerDistance(P, Q)
- quapy.functional.adjusted_quantification(prevalence_estim, tpr, fpr, clip=True)
- quapy.functional.artificial_prevalence_sampling(dimensions, n_prevalences=21, repeat=1, return_constrained_dim=False)
- quapy.functional.get_nprevpoints_approximation(combinations_budget: int, n_classes: int, n_repeats: int = 1)
Searches for the largest number of (equidistant) prevalence points to define for each of the n_classes classes so that the number of valid prevalences generated as combinations of prevalence points (points in an n_classes-dimensional simplex) does not exceed combinations_budget.
- Parameters
n_classes – number of classes
n_repeats – number of repetitions for each prevalence combination
combinations_budget – maximum number of combinations allowed
- Returns
the largest number of prevalence points that generates no more than combinations_budget valid prevalences
- quapy.functional.normalize_prevalence(prevalences)
- quapy.functional.num_prevalence_combinations(n_prevpoints: int, n_classes: int, n_repeats: int = 1)
Computes the number of prevalence combinations in the n_classes-dimensional simplex if n_prevpoints equally distant prevalences are generated and n_repeats repetitions are requested.
- Parameters
n_classes – number of classes
n_prevpoints – number of prevalence points
n_repeats – number of repetitions for each prevalence combination
- Returns
the number of possible combinations. For example, if n_classes=2, n_prevpoints=5, n_repeats=1, then the number of possible combinations is 5, i.e.: [0,1], [0.25,0.75], [0.50,0.50], [0.75,0.25], and [1.0,0.0]
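This count has a closed form if one assumes that valid prevalences are exactly the grid vectors whose components sum to 1 (a stars-and-bars argument). The following is a sketch under that assumption; the name n_combinations_sketch is hypothetical.

```python
from math import comb

def n_combinations_sketch(n_prevpoints, n_classes, n_repeats=1):
    # Distributing (n_prevpoints - 1) grid steps among n_classes classes
    # gives C(n_prevpoints + n_classes - 2, n_classes - 1) vectors.
    return comb(n_prevpoints + n_classes - 2, n_classes - 1) * n_repeats

print(n_combinations_sketch(5, 2))     # 5
print(n_combinations_sketch(5, 3))     # 15
print(n_combinations_sketch(5, 3, 2))  # 30
```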
- quapy.functional.prevalence_from_labels(labels, classes_)
- quapy.functional.prevalence_from_probabilities(posteriors, binarize: bool = False)
- quapy.functional.prevalence_linspace(n_prevalences=21, repeat=1, smooth_limits_epsilon=0.01)
Produces uniformly separated values of prevalence. By default, produces an array of 21 prevalences, with step 0.05 and with the limits smoothed, i.e.: [0.01, 0.05, 0.10, 0.15, …, 0.90, 0.95, 0.99]
- Parameters
n_prevalences – the number of prevalence values to sample from the [0,1] interval (default 21)
repeat – number of times each prevalence is to be repeated (defaults to 1)
smooth_limits_epsilon – the quantity to add and subtract to the limits 0 and 1
- Returns
an array of uniformly separated prevalence values
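The default output can be reproduced with a few lines of plain Python. This is a sketch only: it returns a list rather than an array, and it assumes that repeated values are placed next to each other, which the description above does not specify. The function name is hypothetical.

```python
def prevalence_linspace_sketch(n_prevalences=21, repeat=1, smooth_limits_epsilon=0.01):
    # Equally spaced values covering [0, 1], both ends included.
    prevs = [i / (n_prevalences - 1) for i in range(n_prevalences)]
    # Smooth the limits: move 0 up and 1 down by epsilon.
    prevs[0] += smooth_limits_epsilon
    prevs[-1] -= smooth_limits_epsilon
    # Assumption: each value is repeated consecutively.
    return [p for p in prevs for _ in range(repeat)]

prevs = prevalence_linspace_sketch()
print(len(prevs))                        # 21
print([round(p, 2) for p in prevs[:4]])  # [0.01, 0.05, 0.1, 0.15]
print(round(prevs[-1], 2))               # 0.99
```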
- quapy.functional.strprev(prevalences, prec=3)
- quapy.functional.uniform_prevalence_sampling(n_classes, size=1)
- quapy.functional.uniform_simplex_sampling(n_classes, size=1)
quapy.model_selection module
- class quapy.model_selection.GridSearchQ(model: quapy.method.base.BaseQuantifier, param_grid: dict, sample_size: Optional[int], protocol='app', n_prevpoints: Optional[int] = None, n_repetitions: int = 1, eval_budget: Optional[int] = None, error: Union[Callable, str] = <function mae>, refit=True, val_split=0.4, n_jobs=1, random_seed=42, timeout=-1, verbose=False)
Bases: quapy.method.base.BaseQuantifier
Grid Search optimization targeting a quantification-oriented metric.
Optimizes the hyperparameters of a quantification method, based on an evaluation method and on an evaluation protocol for quantification.
- Parameters
model (BaseQuantifier) – the quantifier to optimize
param_grid – a dictionary with keys the parameter names and values the list of values to explore
sample_size – the size of the samples to extract from the validation set (ignored if protocol=’gen’)
protocol – either ‘app’ for the artificial prevalence protocol, ‘npp’ for the natural prevalence protocol, or ‘gen’ for using a custom sampling generator function
n_prevpoints – if specified, indicates the number of equally distant points to extract from the interval [0,1] in order to define the prevalences of the samples; e.g., if n_prevpoints=5, then the prevalences for each class will be explored in [0.00, 0.25, 0.50, 0.75, 1.00]. If not specified, then eval_budget is requested. Ignored if protocol!=’app’.
n_repetitions – the number of repetitions for each combination of prevalences. This parameter is ignored for the protocol=’app’ if eval_budget is set and is lower than the number of combinations that would be generated using the value assigned to n_prevpoints (for the current number of classes and n_repetitions). Ignored for protocol=’npp’ and protocol=’gen’ (use eval_budget for setting a maximum number of samples in those cases).
eval_budget – if specified, sets a ceiling on the number of evaluations to perform for each hyper-parameter combination. For example, if protocol=’app’, there are 3 classes, n_repetitions=1 and eval_budget=20, then n_prevpoints will be set to 5, since this will generate 15 different prevalences, i.e., [0, 0, 1], [0, 0.25, 0.75], [0, 0.5, 0.5] … [1, 0, 0], whereas setting it to 6 would generate more than 20. When protocol=’gen’, indicates the maximum number of samples to generate; fewer samples will be generated if the generator yields fewer.
error – an error function (callable) or a string indicating the name of an error function (valid ones are those in qp.error.QUANTIFICATION_ERROR)
refit – whether or not to refit the model on the whole labelled collection (training+validation) with the best chosen hyperparameter combination. Ignored if protocol=’gen’
val_split – either a LabelledCollection on which to test the performance of the different settings, or a float in [0,1] indicating the proportion of labelled data to extract from the training set, or a callable returning a generator function each time it is invoked (only for protocol=’gen’).
n_jobs – number of parallel jobs
random_seed – set the seed of the random generator to replicate experiments. Ignored if protocol=’gen’.
timeout – establishes a timer (in seconds) for each of the hyperparameter configurations being tested. Whenever a run takes longer than this timer, that configuration is ignored. If all configurations end up being ignored, a TimeoutError exception is raised. If -1 (default), no time bound is set.
verbose – set to True to get information through the stdout
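To make the role of param_grid concrete, the sketch below expands a grid into individual configurations and picks the one with the lowest error. It is a generic illustration, not the class's actual code: the hyperparameter names 'C' and 'class_weight' and the toy_error function are hypothetical stand-ins for a real quantifier's parameters and for an error measured through one of the protocols.

```python
from itertools import product

def expand_grid(param_grid):
    """Turn {name: [values...]} into a list of {name: value} configurations."""
    names = list(param_grid)
    value_lists = [param_grid[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]

param_grid = {'C': [0.1, 1, 10], 'class_weight': [None, 'balanced']}
configs = expand_grid(param_grid)
print(len(configs))  # 6
print(configs[0])    # {'C': 0.1, 'class_weight': None}

def toy_error(config):
    # Hypothetical error; a real search would fit and evaluate a quantifier.
    penalty = 0.0 if config['class_weight'] == 'balanced' else 0.5
    return abs(config['C'] - 1) + penalty

best = min(configs, key=toy_error)
print(best)          # {'C': 1, 'class_weight': 'balanced'}
```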
- best_model()
- property classes_
- fit(training: quapy.data.base.LabelledCollection, val_split: Optional[Union[quapy.data.base.LabelledCollection, float, Callable]] = None)
Learning routine. Fits methods with all combinations of hyperparameters and selects the one minimizing the error metric.
- Parameters
training – the training set on which to optimize the hyperparameters
val_split – either a LabelledCollection on which to test the performance of the different settings, or a float in [0,1] indicating the proportion of labelled data to extract from the training set
- get_params(deep=True)
Returns the dictionary of hyper-parameters to explore (param_grid)
- Parameters
deep – Unused
- Returns
the dictionary param_grid
- quantify(instances)
Estimate class prevalence values
- Parameters
instances – sample containing the instances
- set_params(**parameters)
Sets the hyper-parameters to explore.
- Parameters
parameters – a dictionary with keys the parameter names and values the list of values to explore
quapy.plot module
- quapy.plot.binary_bias_bins(method_names, true_prevs, estim_prevs, pos_class=1, title=None, nbins=5, colormap=<matplotlib.colors.ListedColormap object>, vertical_xticks=False, legend=True, savepath=None)
- quapy.plot.binary_bias_global(method_names, true_prevs, estim_prevs, pos_class=1, title=None, savepath=None)
- quapy.plot.binary_diagonal(method_names, true_prevs, estim_prevs, pos_class=1, title=None, show_std=True, legend=True, train_prev=None, savepath=None)
- quapy.plot.error_by_drift(method_names, true_prevs, estim_prevs, tr_prevs, n_bins=20, error_name='ae', show_std=True, logscale=False, title='Quantification error as a function of distribution shift', savepath=None)
- quapy.plot.save_or_show(savepath)
quapy.util module
- class quapy.util.EarlyStop(patience, lower_is_better=True)
Bases: object
- quapy.util.create_if_not_exist(path)
- quapy.util.create_parent_dir(path)
- quapy.util.download_file(url, archive_filename)
- quapy.util.download_file_if_not_exists(url, archive_path)
- quapy.util.get_quapy_home()
- quapy.util.map_parallel(func, args, n_jobs)
Applies func to n_jobs slices of args. E.g., if args is an array of 99 items and n_jobs=2, then func is applied in two parallel processes to args[0:50] and to args[50:99]
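The slicing in this example (99 items, 2 jobs, slices [0:50] and [50:99]) can be sketched as follows. This is an illustration under stated assumptions: the leading slices absorb the remainder, a thread pool stands in for the parallel processes mentioned above, and one result per slice is returned. The names slice_bounds and map_parallel_sketch are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def slice_bounds(n_items, n_jobs):
    """Split range(n_items) into n_jobs contiguous, near-equal slices."""
    base, extra = divmod(n_items, n_jobs)
    bounds, start = [], 0
    for job in range(n_jobs):
        # The first `extra` slices get one additional item each.
        stop = start + base + (1 if job < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds

def map_parallel_sketch(func, args, n_jobs):
    chunks = [args[a:b] for a, b in slice_bounds(len(args), n_jobs)]
    # Threads are used here for simplicity (assumption, see lead-in).
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        return list(pool.map(func, chunks))

print(slice_bounds(99, 2))                           # [(0, 50), (50, 99)]
print(map_parallel_sketch(sum, list(range(99)), 2))  # [1225, 3626]
```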
- quapy.util.parallel(func, args, n_jobs)
A wrapper of multiprocessing: Parallel(n_jobs=n_jobs)(delayed(func)(args_i) for args_i in args) that silently takes the quapy.environ variable as input
- quapy.util.pickled_resource(pickle_path: str, generation_func: callable, *args)
Allows for fast reuse of resources that are generated only once by calling generation_func(*args). On subsequent invocations, the function loads the pickled resource instead. Example:
    import numpy as np
    from quapy.util import pickled_resource
    def some_array(n):
        return np.random.rand(n)
    pickled_resource('./my_array.pkl', some_array, 10)  # the resource does not exist: it is created by some_array(10)
    pickled_resource('./my_array.pkl', some_array, 10)  # the resource exists: it is loaded from './my_array.pkl'
- Parameters
pickle_path – the path where to save (first time) and load (next times) the resource
generation_func – the function that generates the resource, in case it does not exist in pickle_path
args – any arg that generation_func uses for generating the resources
- Returns
the resource
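The generate-once, load-afterwards behaviour can be sketched with the standard library alone. This is a minimal illustration, not the library's code: the name pickled_resource_sketch is hypothetical, and creating the parent directory is an added assumption.

```python
import os
import pickle
import tempfile

def pickled_resource_sketch(pickle_path, generation_func, *args):
    # Resource already on disk: load it and skip generation.
    if os.path.exists(pickle_path):
        with open(pickle_path, 'rb') as fin:
            return pickle.load(fin)
    # First call: generate, then save for later reuse.
    resource = generation_func(*args)
    parent = os.path.dirname(pickle_path)
    if parent:
        os.makedirs(parent, exist_ok=True)  # assumption: create missing dirs
    with open(pickle_path, 'wb') as fout:
        pickle.dump(resource, fout, pickle.HIGHEST_PROTOCOL)
    return resource

calls = []  # records how often the generator actually runs

def some_list(n):
    calls.append(n)
    return list(range(n))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'my_list.pkl')
    first = pickled_resource_sketch(path, some_list, 10)   # generated and saved
    second = pickled_resource_sketch(path, some_list, 10)  # loaded from disk

print(first == second, len(calls))  # True 1
```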
- quapy.util.save_text_file(path, text)
- quapy.util.temp_seed(seed)
Can be used in a “with” context to set a temporary seed without modifying the current state of the outer numpy random generator. E.g.:
    with temp_seed(random_seed):
        pass  # do any computation depending on np.random functionality
- Parameters
seed – the seed to set within the “with” context
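One way to obtain this behaviour is to save numpy's global random state, seed it, and restore the saved state on exit. The sketch below assumes that approach; the name temp_seed_sketch is hypothetical and the library's own implementation may differ.

```python
import contextlib
import numpy as np

@contextlib.contextmanager
def temp_seed_sketch(seed):
    state = np.random.get_state()   # remember the outer generator state
    np.random.seed(seed)
    try:
        yield
    finally:
        np.random.set_state(state)  # restore it, even if the body raised

np.random.seed(0)
expected = np.random.rand(3)        # what the outer stream yields first

np.random.seed(0)
with temp_seed_sketch(42):
    a = np.random.rand(3)
with temp_seed_sketch(42):
    b = np.random.rand(3)
after = np.random.rand(3)           # outer stream continues as if untouched

print(np.array_equal(a, b))             # True
print(np.array_equal(after, expected))  # True
```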