
Merge branch 'master' of github.com:HLT-ISTI/QuaPy

Alejandro Moreo Fernandez 2021-12-15 16:50:22 +01:00
commit e64a6e989a
1 changed file with 33 additions and 33 deletions


# QuaPy
QuaPy is an open source framework for quantification (a.k.a. supervised prevalence estimation, or learning to quantify)
written in Python.
QuaPy is based on the concept of "data sample", and provides implementations of the
most important aspects of the quantification workflow, such as (baseline and advanced)
quantification methods, quantification-oriented model selection mechanisms,
evaluation measures, and evaluation protocols
used for evaluating quantification methods.
QuaPy also makes available commonly used datasets, and offers visualization tools
for facilitating the analysis and interpretation of the experimental results.
### Last updates:
## A quick example:
The following script fetches a dataset of tweets, trains, applies, and evaluates a quantifier based on the
_Adjusted Classify & Count_ quantification method, using, as the evaluation measure, the _Mean Absolute Error_ (MAE)
between the predicted and the true class prevalence values
of the test set.
```python
import quapy as qp
from sklearn.linear_model import LogisticRegression

dataset = qp.datasets.fetch_twitter('semeval16')

model = qp.method.aggregative.ACC(LogisticRegression())
model.fit(dataset.training)
estim_prevalence = model.quantify(dataset.test.instances)
true_prevalence = dataset.test.prevalence()

error = qp.error.mae(true_prevalence, estim_prevalence)

print(f'Mean Absolute Error (MAE)={error:.3f}')
```
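
For intuition on what the "adjustment" in _Adjusted Classify & Count_ does in the binary case: the raw
classify-and-count estimate is corrected using the classifier's true and false positive rates (estimated,
e.g., by cross-validation on the training data). Below is a minimal sketch of that correction, for
illustration only (not QuaPy's actual implementation):

```python
import numpy as np

def acc_adjustment(prev_cc, tpr, fpr):
    """Binary ACC correction: adjusts a raw classify-&-count prevalence estimate
    `prev_cc` using the classifier's (estimated) true/false positive rates."""
    if tpr == fpr:
        return prev_cc  # correction undefined; fall back to the raw estimate
    return float(np.clip((prev_cc - fpr) / (tpr - fpr), 0., 1.))

# e.g., a raw estimate of 0.45 with tpr=0.8 and fpr=0.1 is corrected to 0.5
print(acc_adjustment(0.45, tpr=0.8, fpr=0.1))
```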
Quantification is useful in scenarios characterized by prior probability shift. In other
words, there would be little interest in estimating the class prevalence values of the test set if
the IID assumption held, since this prevalence would be roughly equivalent to the
class prevalence of the training set. For this reason, any quantification model
should be tested across many samples, including samples characterized by class prevalence
values different or very different from those found in the training set.
QuaPy implements sampling procedures and evaluation protocols that automate this workflow.
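
To make this concrete, the following is a minimal sketch of such an evaluation for a binary problem.
It is illustrative only and does not use QuaPy's built-in protocols; `X_test`, `y_test` (0/1 labels) and the
trained `model` are hypothetical, with `model.quantify` behaving as in the example above:

```python
import numpy as np

def sample_at_prevalence(X, y, size, prev, rng):
    # draw a sample of `size` instances whose positive-class prevalence is (approximately) `prev`
    n_pos = int(round(size * prev))
    pos = rng.choice(np.where(y == 1)[0], n_pos, replace=True)
    neg = rng.choice(np.where(y == 0)[0], size - n_pos, replace=True)
    idx = np.concatenate([pos, neg])
    return X[idx], y[idx]

rng = np.random.default_rng(0)
errors = []
for prev in np.linspace(0.05, 0.95, 19):                   # sweep a grid of prevalence values
    Xs, ys = sample_at_prevalence(X_test, y_test, size=500, prev=prev, rng=rng)
    estim_prev = model.quantify(Xs)                         # estimated [negative, positive] prevalences
    true_prev = np.asarray([1 - ys.mean(), ys.mean()])
    errors.append(np.abs(true_prev - estim_prev).mean())   # absolute error on this sample

print(f'MAE across shifted samples: {np.mean(errors):.3f}')
```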
See the [Wiki](https://github.com/HLT-ISTI/QuaPy/wiki) for detailed examples.
## Features
* Implementation of many popular quantification methods (Classify-&-Count and its variants, Expectation Maximization,
quantification methods based on structured output learning, HDy, QuaNet, and quantification ensembles).
* Versatile functionality for performing evaluation based on artificial sampling protocols.
* Implementation of most commonly used evaluation metrics (e.g., AE, RAE, SE, KLD, NKLD); a minimal sketch of AE and RAE is given right after this list.
* Datasets frequently used in quantification (textual and numeric), including:
    * 32 UCI Machine Learning datasets.
    * 11 Twitter quantification-by-sentiment datasets.
    * 3 product reviews quantification-by-sentiment datasets.
* Native support for binary and single-label multiclass quantification scenarios.
* Model selection functionality that minimizes quantification-oriented loss functions.
* Visualization tools for analysing the experimental results.
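
As referenced in the metrics bullet above, here is a minimal sketch of the two most basic measures from that
list, AE and RAE, as commonly defined in the quantification literature (for illustration only; QuaPy's own
implementations live in `qp.error`):

```python
import numpy as np

def absolute_error(true_prev, estim_prev):
    """AE: mean absolute difference between true and estimated class prevalence values."""
    true_prev, estim_prev = np.asarray(true_prev), np.asarray(estim_prev)
    return np.abs(true_prev - estim_prev).mean()

def relative_absolute_error(true_prev, estim_prev, eps=1e-8):
    """RAE: as AE, but each per-class error is taken relative to the true prevalence;
    `eps` is a naive guard against division by zero, for illustration only."""
    true_prev, estim_prev = np.asarray(true_prev), np.asarray(estim_prev)
    return (np.abs(true_prev - estim_prev) / np.maximum(true_prev, eps)).mean()

print(absolute_error([0.8, 0.2], [0.7, 0.3]))           # 0.1
print(relative_absolute_error([0.8, 0.2], [0.7, 0.3]))  # (0.1/0.8 + 0.1/0.2) / 2 = 0.3125
```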
## Requirements
The [svm-perf-quantification-ext.patch](./svm-perf-quantification-ext.patch) is an extension of the patch made available by
[Esuli et al. 2015](https://dl.acm.org/doi/abs/10.1145/2700406?casa_token=8D2fHsGCVn0AAAAA:ZfThYOvrzWxMGfZYlQW_y8Cagg-o_l6X_PcF09mdETQ4Tu7jK98mxFbGSXp9ZSO14JkUIYuDGFG0)
that allows SVMperf to optimize for
the _Q_ measure as proposed by [Barranquero et al. 2015](https://www.sciencedirect.com/science/article/abs/pii/S003132031400291X)
and for the _KLD_ and _NKLD_ measures as proposed by [Esuli et al. 2015](https://dl.acm.org/doi/abs/10.1145/2700406?casa_token=8D2fHsGCVn0AAAAA:ZfThYOvrzWxMGfZYlQW_y8Cagg-o_l6X_PcF09mdETQ4Tu7jK98mxFbGSXp9ZSO14JkUIYuDGFG0).
This patch extends the above one by also allowing SVMperf to optimize for
_AE_ and _RAE_.
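
For context, _KLD_ is the Kullback-Leibler divergence between the true and the predicted class prevalence
distributions, and _NKLD_ is its logistic normalization to a bounded range. Below is a minimal sketch with a
naive smoothing term (for illustration; this is neither the patched SVMperf code nor QuaPy's implementation):

```python
import numpy as np

def kld(true_prev, estim_prev, eps=1e-8):
    """KLD(p, p_hat) = sum_c p(c) * log(p(c) / p_hat(c)), with naive eps-smoothing
    to avoid division by zero and log(0)."""
    p = np.asarray(true_prev) + eps
    q = np.asarray(estim_prev) + eps
    return float(np.sum(p * np.log(p / q)))

def nkld(true_prev, estim_prev, eps=1e-8):
    """NKLD: logistic normalization of KLD, i.e., 2*exp(KLD)/(1+exp(KLD)) - 1, in [0, 1)."""
    k = kld(true_prev, estim_prev, eps)
    return 2. * np.exp(k) / (1. + np.exp(k)) - 1.

print(nkld([0.8, 0.2], [0.8, 0.2]))  # ~0.0: identical distributions (up to smoothing)
```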
The [developer API documentation](https://hlt-isti.github.io/QuaPy/build/html/modules.html) is available [here](https://hlt-isti.github.io/QuaPy/build/html/index.html).
Check out our [Wiki](https://github.com/HLT-ISTI/QuaPy/wiki), in which many examples
are provided:
* [Datasets](https://github.com/HLT-ISTI/QuaPy/wiki/Datasets)
* [Evaluation](https://github.com/HLT-ISTI/QuaPy/wiki/Evaluation)
* [Methods](https://github.com/HLT-ISTI/QuaPy/wiki/Methods)
* [Model Selection](https://github.com/HLT-ISTI/QuaPy/wiki/Model-Selection)
* [Plotting](https://github.com/HLT-ISTI/QuaPy/wiki/Plotting)