diff --git a/README.md b/README.md
index 5f8c1f1..7138260 100644
--- a/README.md
+++ b/README.md
@@ -1,15 +1,15 @@
 # QuaPy

-QuaPy is an open source framework for Quantification (a.k.a. Supervised Prevalence Estimation)
+QuaPy is an open source framework for quantification (a.k.a. supervised prevalence estimation, or learning to quantify)
 written in Python.

-QuaPy roots on the concept of data sample, and provides implementations of
-most important concepts in quantification literature, such as the most important
-quantification baselines, many advanced quantification methods,
-quantification-oriented model selection, many evaluation measures and protocols
+QuaPy is based on the concept of "data sample", and provides implementations of the
+most important aspects of the quantification workflow, such as (baseline and advanced)
+quantification methods,
+quantification-oriented model selection mechanisms, evaluation measures, and evaluations protocols
 used for evaluating quantification methods.
-QuaPy also integrates commonly used datasets and offers visualization tools
-for facilitating the analysis and interpretation of results.
+QuaPy also makes available commonly used datasets, and offers visualization tools
+for facilitating the analysis and interpretation of the experimental results.

 ### Installation

@@ -19,9 +19,9 @@ pip install quapy

 ## A quick example:

-The following script fetchs a Twitter dataset, trains and evaluates an
-_Adjusted Classify & Count_ model in terms of the _Mean Absolute Error_ (MAE)
-between the class prevalence values estimated for the test set and the true prevalence values
+The following script fetches a dataset of tweets, trains, applies, and evaluates a quantifier based on the
+_Adjusted Classify & Count_ quantification method, using, as the evaluation measure, the _Mean Absolute Error_ (MAE)
+between the predicted and the true class prevalence values
 of the test set.
 ```python
@@ -42,27 +42,28 @@ error = qp.error.mae(true_prevalence, estim_prevalence)
 print(f'Mean Absolute Error (MAE)={error:.3f}')
 ```

-Quantification is useful in scenarios of prior probability shift. In other
-words, we would not be interested in estimating the class prevalence values of the test set if
-we could assume the IID assumption to hold, as this prevalence would simply coincide with the
-class prevalence of the training set. For this reason, any Quantification model
-should be tested across samples characterized by different class prevalence values.
-QuaPy implements sampling procedures and evaluation protocols that automate this endeavour.
+Quantification is useful in scenarios characterized by prior probability shift. In other
+words, we would be little interested in estimating the class prevalence values of the test set if
+we could assume the IID assumption to hold, as this prevalence would be roughly equivalent to the
+class prevalence of the training set. For this reason, any quantification model
+should be tested across many samples, even ones characterized by class prevalence
+values different or very different from those found in the training set.
+QuaPy implements sampling procedures and evaluation protocols that automate this workflow.
 See the [Wiki](https://github.com/HLT-ISTI/QuaPy/wiki) for detailed examples.

 ## Features

-* Implementation of most popular quantification methods (Classify-&-Count variants, Expectation-Maximization,
-SVM-based variants for quantification, HDy, QuaNet, and Ensembles).
+* Implementation of many popular quantification methods (Classify-&-Count and its variants, Expectation Maximization,
+quantification methods based on structured output learning, HDy, QuaNet, and quantification ensembles).
 * Versatile functionality for performing evaluation based on artificial sampling protocols.
-* Implementation of most commonly used evaluation metrics (e.g., MAE, MRAE, MSE, NKLD, etc.).
-* Popular datasets for Quantification (textual and numeric) available, including:
+* Implementation of most commonly used evaluation metrics (e.g., AE, RAE, SE, KLD, NKLD, etc.).
+* Datasets frequently used in quantification (textual and numeric), including:
     * 32 UCI Machine Learning datasets.
-    * 11 Twitter Sentiment datasets.
-    * 3 Reviews Sentiment datasets.
-* Native supports for binary and single-label scenarios of quantification.
-* Model selection functionality targeting quantification-oriented losses.
-* Visualization tools for analysing results.
+    * 11 Twitter quantification-by-sentiment datasets.
+    * 3 product reviews quantification-by-sentiment datasets.
+* Native support for binary and single-label multiclass quantification scenarios.
+* Model selection functionality that minimizes quantification-oriented loss functions.
+* Visualization tools for analysing the experimental results.

 ## Requirements

@@ -93,15 +94,14 @@ The [svm-perf-quantification-ext.patch](./svm-perf-quantification-ext.patch) is
 [Esuli et al. 2015](https://dl.acm.org/doi/abs/10.1145/2700406?casa_token=8D2fHsGCVn0AAAAA:ZfThYOvrzWxMGfZYlQW_y8Cagg-o_l6X_PcF09mdETQ4Tu7jK98mxFbGSXp9ZSO14JkUIYuDGFG0)
 that allows SVMperf to optimize for the _Q_ measure as proposed by
 [Barranquero et al. 2015](https://www.sciencedirect.com/science/article/abs/pii/S003132031400291X)
-and for the _KLD_ and _NKLD_ as proposed by [Esuli et al. 2015](https://dl.acm.org/doi/abs/10.1145/2700406?casa_token=8D2fHsGCVn0AAAAA:ZfThYOvrzWxMGfZYlQW_y8Cagg-o_l6X_PcF09mdETQ4Tu7jK98mxFbGSXp9ZSO14JkUIYuDGFG0)
-for quantification.
-This patch extends the former by also allowing SVMperf to optimize for
+and for the _KLD_ and _NKLD_ measures as proposed by [Esuli et al. 2015](https://dl.acm.org/doi/abs/10.1145/2700406?casa_token=8D2fHsGCVn0AAAAA:ZfThYOvrzWxMGfZYlQW_y8Cagg-o_l6X_PcF09mdETQ4Tu7jK98mxFbGSXp9ZSO14JkUIYuDGFG0).
+This patch extends the above one by also allowing SVMperf to optimize for
 _AE_ and _RAE_.
 ## Wiki

-Check out our [Wiki](https://github.com/HLT-ISTI/QuaPy/wiki) in which many examples
+Check out our [Wiki](https://github.com/HLT-ISTI/QuaPy/wiki), in which many examples
 are provided:

 * [Datasets](https://github.com/HLT-ISTI/QuaPy/wiki/Datasets)