From a8ef7a6ed3b9616453a26ca19cbc398a85e495d7 Mon Sep 17 00:00:00 2001
From: Alejandro Moreo
Date: Tue, 10 Aug 2021 11:44:44 +0200
Subject: [PATCH 1/2] Update README.md

---
 README.md | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/README.md b/README.md
index 9a64c4f..5f8c1f1 100644
--- a/README.md
+++ b/README.md
@@ -21,7 +21,7 @@ pip install quapy
 
 The following script fetchs a Twitter dataset, trains and evaluates an
 _Adjusted Classify & Count_ model in terms of the _Mean Absolute Error_ (MAE)
-between the class prevalences estimated for the test set and the true prevalences
+between the class prevalence values estimated for the test set and the true prevalence values
 of the test set.
 
 ```python
@@ -34,20 +34,20 @@ dataset = qp.datasets.fetch_twitter('semeval16')
 model = qp.method.aggregative.ACC(LogisticRegression())
 model.fit(dataset.training)
 
-estim_prevalences = model.quantify(dataset.test.instances)
-true_prevalences = dataset.test.prevalence()
+estim_prevalence = model.quantify(dataset.test.instances)
+true_prevalence = dataset.test.prevalence()
 
-error = qp.error.mae(true_prevalences, estim_prevalences)
+error = qp.error.mae(true_prevalence, estim_prevalence)
 
 print(f'Mean Absolute Error (MAE)={error:.3f}')
 ```
 
 Quantification is useful in scenarios of prior probability shift. In other
-words, we would not be interested in estimating the class prevalences of the test set if
+words, we would not be interested in estimating the class prevalence values of the test set if
 we could assume the IID assumption to hold, as this prevalence would simply coincide with the
 class prevalence of the training set. For this reason, any Quantification model
-should be tested across samples characterized by different class prevalences.
-QuaPy implements sampling procedures and evaluation protocols that automates this endeavour.
+should be tested across samples characterized by different class prevalence values.
+QuaPy implements sampling procedures and evaluation protocols that automate this endeavour.
 See the [Wiki](https://github.com/HLT-ISTI/QuaPy/wiki) for detailed examples.
 
 ## Features
@@ -108,4 +108,4 @@ are provided:
 * [Evaluation](https://github.com/HLT-ISTI/QuaPy/wiki/Evaluation)
 * [Methods](https://github.com/HLT-ISTI/QuaPy/wiki/Methods)
 * [Model Selection](https://github.com/HLT-ISTI/QuaPy/wiki/Model-Selection)
-* [Plotting](https://github.com/HLT-ISTI/QuaPy/wiki/Plotting)
\ No newline at end of file
+* [Plotting](https://github.com/HLT-ISTI/QuaPy/wiki/Plotting)

From 789ebe450a2f057db64067518ff8787ccd04c902 Mon Sep 17 00:00:00 2001
From: fabseb60 <92160733+fabseb60@users.noreply.github.com>
Date: Tue, 23 Nov 2021 18:49:48 +0100
Subject: [PATCH 2/2] Update README.md

---
 README.md | 58 +++++++++++++++++++++++++++----------------------------
 1 file changed, 29 insertions(+), 29 deletions(-)

diff --git a/README.md b/README.md
index 5f8c1f1..7138260 100644
--- a/README.md
+++ b/README.md
@@ -1,15 +1,15 @@
 # QuaPy
 
-QuaPy is an open source framework for Quantification (a.k.a. Supervised Prevalence Estimation)
+QuaPy is an open source framework for quantification (a.k.a. supervised prevalence estimation, or learning to quantify)
 written in Python.
 
-QuaPy roots on the concept of data sample, and provides implementations of
-most important concepts in quantification literature, such as the most important
-quantification baselines, many advanced quantification methods,
-quantification-oriented model selection, many evaluation measures and protocols
+QuaPy is based on the concept of "data sample", and provides implementations of the
+most important aspects of the quantification workflow, such as (baseline and advanced)
+quantification methods,
+quantification-oriented model selection mechanisms, evaluation measures, and evaluations protocols
 used for evaluating quantification methods.
-QuaPy also integrates commonly used datasets and offers visualization tools
-for facilitating the analysis and interpretation of results.
+QuaPy also makes available commonly used datasets, and offers visualization tools
+for facilitating the analysis and interpretation of the experimental results.
 
 ### Installation
 
@@ -19,9 +19,9 @@ pip install quapy
 
 ## A quick example:
 
-The following script fetchs a Twitter dataset, trains and evaluates an
-_Adjusted Classify & Count_ model in terms of the _Mean Absolute Error_ (MAE)
-between the class prevalence values estimated for the test set and the true prevalence values
+The following script fetches a dataset of tweets, trains, applies, and evaluates a quantifier based on the
+_Adjusted Classify & Count_ quantification method, using, as the evaluation measure, the _Mean Absolute Error_ (MAE)
+between the predicted and the true class prevalence values
 of the test set.
 
 ```python
@@ -42,27 +42,28 @@ error = qp.error.mae(true_prevalence, estim_prevalence)
 print(f'Mean Absolute Error (MAE)={error:.3f}')
 ```
 
-Quantification is useful in scenarios of prior probability shift. In other
-words, we would not be interested in estimating the class prevalence values of the test set if
-we could assume the IID assumption to hold, as this prevalence would simply coincide with the
-class prevalence of the training set. For this reason, any Quantification model
-should be tested across samples characterized by different class prevalence values.
-QuaPy implements sampling procedures and evaluation protocols that automate this endeavour.
+Quantification is useful in scenarios characterized by prior probability shift. In other
+words, we would be little interested in estimating the class prevalence values of the test set if
+we could assume the IID assumption to hold, as this prevalence would be roughly equivalent to the
+class prevalence of the training set. For this reason, any quantification model
+should be tested across many samples, even ones characterized by class prevalence
+values different or very different from those found in the training set.
+QuaPy implements sampling procedures and evaluation protocols that automate this workflow.
 See the [Wiki](https://github.com/HLT-ISTI/QuaPy/wiki) for detailed examples.
 
 ## Features
 
-* Implementation of most popular quantification methods (Classify-&-Count variants, Expectation-Maximization,
-SVM-based variants for quantification, HDy, QuaNet, and Ensembles).
+* Implementation of many popular quantification methods (Classify-&-Count and its variants, Expectation Maximization,
+quantification methods based on structured output learning, HDy, QuaNet, and quantification ensembles).
 * Versatile functionality for performing evaluation based on artificial sampling protocols.
-* Implementation of most commonly used evaluation metrics (e.g., MAE, MRAE, MSE, NKLD, etc.).
-* Popular datasets for Quantification (textual and numeric) available, including:
+* Implementation of most commonly used evaluation metrics (e.g., AE, RAE, SE, KLD, NKLD, etc.).
+* Datasets frequently used in quantification (textual and numeric), including:
     * 32 UCI Machine Learning datasets.
-    * 11 Twitter Sentiment datasets.
-    * 3 Reviews Sentiment datasets.
-* Native supports for binary and single-label scenarios of quantification.
-* Model selection functionality targeting quantification-oriented losses.
-* Visualization tools for analysing results.
+    * 11 Twitter quantification-by-sentiment datasets.
+    * 3 product reviews quantification-by-sentiment datasets.
+* Native support for binary and single-label multiclass quantification scenarios.
+* Model selection functionality that minimizes quantification-oriented loss functions.
+* Visualization tools for analysing the experimental results.
 
 ## Requirements
 
@@ -93,15 +94,14 @@ The [svm-perf-quantification-ext.patch](./svm-perf-quantification-ext.patch) is
 [Esuli et al. 2015](https://dl.acm.org/doi/abs/10.1145/2700406?casa_token=8D2fHsGCVn0AAAAA:ZfThYOvrzWxMGfZYlQW_y8Cagg-o_l6X_PcF09mdETQ4Tu7jK98mxFbGSXp9ZSO14JkUIYuDGFG0)
 that allows SVMperf to optimize for
 the _Q_ measure as proposed by [Barranquero et al. 2015](https://www.sciencedirect.com/science/article/abs/pii/S003132031400291X)
-and for the _KLD_ and _NKLD_ as proposed by [Esuli et al. 2015](https://dl.acm.org/doi/abs/10.1145/2700406?casa_token=8D2fHsGCVn0AAAAA:ZfThYOvrzWxMGfZYlQW_y8Cagg-o_l6X_PcF09mdETQ4Tu7jK98mxFbGSXp9ZSO14JkUIYuDGFG0)
-for quantification.
-This patch extends the former by also allowing SVMperf to optimize for
+and for the _KLD_ and _NKLD_ measures as proposed by [Esuli et al. 2015](https://dl.acm.org/doi/abs/10.1145/2700406?casa_token=8D2fHsGCVn0AAAAA:ZfThYOvrzWxMGfZYlQW_y8Cagg-o_l6X_PcF09mdETQ4Tu7jK98mxFbGSXp9ZSO14JkUIYuDGFG0).
+This patch extends the above one by also allowing SVMperf to optimize for
 _AE_ and _RAE_.
 
 
 ## Wiki
 
-Check out our [Wiki](https://github.com/HLT-ISTI/QuaPy/wiki) in which many examples
+Check out our [Wiki](https://github.com/HLT-ISTI/QuaPy/wiki), in which many examples
 are provided:
 
 * [Datasets](https://github.com/HLT-ISTI/QuaPy/wiki/Datasets)