
Merge branch 'master' of github.com:HLT-ISTI/QuaPy

Alejandro Moreo Fernandez 2021-12-15 16:50:22 +01:00
commit e64a6e989a
1 changed files with 33 additions and 33 deletions


@@ -1,15 +1,15 @@
# QuaPy # QuaPy
QuaPy is an open source framework for Quantification (a.k.a. Supervised Prevalence Estimation) QuaPy is an open source framework for quantification (a.k.a. supervised prevalence estimation, or learning to quantify)
written in Python. written in Python.
QuaPy roots on the concept of data sample, and provides implementations of QuaPy is based on the concept of "data sample", and provides implementations of the
most important concepts in quantification literature, such as the most important most important aspects of the quantification workflow, such as (baseline and advanced)
quantification baselines, many advanced quantification methods, quantification methods,
quantification-oriented model selection, many evaluation measures and protocols quantification-oriented model selection mechanisms, evaluation measures, and evaluation protocols
used for evaluating quantification methods. used for evaluating quantification methods.
QuaPy also integrates commonly used datasets and offers visualization tools QuaPy also makes available commonly used datasets, and offers visualization tools
for facilitating the analysis and interpretation of results. for facilitating the analysis and interpretation of the experimental results.
### Last updates: ### Last updates:
@@ -24,9 +24,9 @@ pip install quapy
## A quick example: ## A quick example:
The following script fetchs a Twitter dataset, trains and evaluates an The following script fetches a dataset of tweets, trains, applies, and evaluates a quantifier based on the
_Adjusted Classify & Count_ model in terms of the _Mean Absolute Error_ (MAE) _Adjusted Classify & Count_ quantification method, using, as the evaluation measure, the _Mean Absolute Error_ (MAE)
between the class prevalences estimated for the test set and the true prevalences between the predicted and the true class prevalence values
of the test set. of the test set.
```python ```python
@@ -39,35 +39,36 @@ dataset = qp.datasets.fetch_twitter('semeval16')
model = qp.method.aggregative.ACC(LogisticRegression()) model = qp.method.aggregative.ACC(LogisticRegression())
model.fit(dataset.training) model.fit(dataset.training)
estim_prevalences = model.quantify(dataset.test.instances) estim_prevalence = model.quantify(dataset.test.instances)
true_prevalences = dataset.test.prevalence() true_prevalence = dataset.test.prevalence()
error = qp.error.mae(true_prevalences, estim_prevalences) error = qp.error.mae(true_prevalence, estim_prevalence)
print(f'Mean Absolute Error (MAE)={error:.3f}') print(f'Mean Absolute Error (MAE)={error:.3f}')
``` ```
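To illustrate what the "adjusted" in _Adjusted Classify & Count_ refers to, here is a minimal NumPy sketch of the binary correction as it is usually described in the quantification literature. It is not QuaPy's implementation: the function name `adjusted_classify_and_count` and its parameters are hypothetical, and the true/false positive rates are assumed to have been estimated beforehand on held-out data.

```python
import numpy as np

def adjusted_classify_and_count(cc_prevalence, tpr, fpr):
    """Correct a raw Classify & Count estimate of the positive-class prevalence.

    cc_prevalence: fraction of test items the classifier labelled positive.
    tpr, fpr: true/false positive rates of the classifier (assumed to be
              estimated on held-out data; hypothetical inputs for this sketch).
    """
    if np.isclose(tpr, fpr):
        # The correction is undefined when tpr == fpr; fall back to the raw estimate.
        return float(cc_prevalence)
    adjusted = (cc_prevalence - fpr) / (tpr - fpr)
    # The corrected value may fall outside [0, 1], so clip it.
    return float(np.clip(adjusted, 0.0, 1.0))

# A classifier with tpr=0.8 and fpr=0.1 applied to a sample whose true
# positive prevalence is 0.3 labels 0.3*0.8 + 0.7*0.1 = 0.31 of the items positive.
print(adjusted_classify_and_count(0.31, tpr=0.8, fpr=0.1))
```

The correction inverts the relation `cc = p * tpr + (1 - p) * fpr`, which is why a plain count of positive predictions is biased whenever the classifier is imperfect.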
Quantification is useful in scenarios of prior probability shift. In other Quantification is useful in scenarios characterized by prior probability shift. In other
words, we would not be interested in estimating the class prevalences of the test set if words, we would be little interested in estimating the class prevalence values of the test set if
we could assume the IID assumption to hold, as this prevalence would simply coincide with the we could assume the IID assumption to hold, as this prevalence would be roughly equivalent to the
class prevalence of the training set. For this reason, any Quantification model class prevalence of the training set. For this reason, any quantification model
should be tested across samples characterized by different class prevalences. should be tested across many samples, even ones characterized by class prevalence
QuaPy implements sampling procedures and evaluation protocols that automates this endeavour. values different or very different from those found in the training set.
QuaPy implements sampling procedures and evaluation protocols that automate this workflow.
See the [Wiki](https://github.com/HLT-ISTI/QuaPy/wiki) for detailed examples. See the [Wiki](https://github.com/HLT-ISTI/QuaPy/wiki) for detailed examples.
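The idea of testing across samples with different class prevalence values can be sketched in plain NumPy as follows. This is an illustration only, not QuaPy's sampling API: the helper `sample_at_prevalence` is hypothetical, and it assumes that `prevalence` lists one value per class (in sorted class order) and sums to 1.

```python
import numpy as np

def sample_at_prevalence(labels, prevalence, size, rng):
    """Return indices of `size` items whose class proportions approximate `prevalence`."""
    labels = np.asarray(labels)
    classes = np.unique(labels)
    raw = np.asarray(prevalence, dtype=float) * size
    counts = np.floor(raw).astype(int)
    # Hand out the items lost to rounding to the classes with the largest fractional part.
    remainder = size - counts.sum()
    order = np.argsort(raw - counts)[::-1]
    counts[order[:remainder]] += 1
    chosen = []
    for cls, n in zip(classes, counts):
        pool = np.flatnonzero(labels == cls)
        # Sample with replacement only if the class has too few items.
        chosen.append(rng.choice(pool, size=n, replace=n > len(pool)))
    return np.concatenate(chosen)

rng = np.random.default_rng(0)
labels = np.array([0] * 500 + [1] * 500)  # toy labelled pool, balanced
for pos_prev in (0.0, 0.25, 0.5, 0.75, 1.0):
    idx = sample_at_prevalence(labels, [1 - pos_prev, pos_prev], size=100, rng=rng)
    print(pos_prev, labels[idx].mean())
```

A quantifier would then be applied to each sample and its estimate compared with the prevalence that was imposed on that sample.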
## Features ## Features
* Implementation of most popular quantification methods (Classify-&-Count variants, Expectation-Maximization, * Implementation of many popular quantification methods (Classify-&-Count and its variants, Expectation Maximization,
SVM-based variants for quantification, HDy, QuaNet, and Ensembles). quantification methods based on structured output learning, HDy, QuaNet, and quantification ensembles).
* Versatile functionality for performing evaluation based on artificial sampling protocols. * Versatile functionality for performing evaluation based on artificial sampling protocols.
* Implementation of most commonly used evaluation metrics (e.g., MAE, MRAE, MSE, NKLD, etc.). * Implementation of most commonly used evaluation metrics (e.g., AE, RAE, SE, KLD, NKLD, etc.).
* Popular datasets for Quantification (textual and numeric) available, including: * Datasets frequently used in quantification (textual and numeric), including:
* 32 UCI Machine Learning datasets. * 32 UCI Machine Learning datasets.
* 11 Twitter Sentiment datasets. * 11 Twitter quantification-by-sentiment datasets.
* 3 Reviews Sentiment datasets. * 3 product reviews quantification-by-sentiment datasets.
* Native supports for binary and single-label scenarios of quantification. * Native support for binary and single-label multiclass quantification scenarios.
* Model selection functionality targeting quantification-oriented losses. * Model selection functionality that minimizes quantification-oriented loss functions.
* Visualization tools for analysing results. * Visualization tools for analysing the experimental results.
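For reference, the evaluation measures named in the list above can be sketched in NumPy using the definitions commonly given in the quantification literature. This is not QuaPy's code: the function names, the additive smoothing, and the default `eps` value are assumptions made for this sketch.

```python
import numpy as np

def _smooth(p, eps):
    # Additive smoothing so that no prevalence value is exactly zero (assumed scheme).
    p = np.asarray(p, dtype=float)
    return (p + eps) / (1.0 + eps * len(p))

def absolute_error(p, p_hat):
    p, p_hat = np.asarray(p, dtype=float), np.asarray(p_hat, dtype=float)
    return float(np.mean(np.abs(p_hat - p)))

def squared_error(p, p_hat):
    p, p_hat = np.asarray(p, dtype=float), np.asarray(p_hat, dtype=float)
    return float(np.mean((p_hat - p) ** 2))

def relative_absolute_error(p, p_hat, eps=1e-6):
    p, p_hat = _smooth(p, eps), _smooth(p_hat, eps)
    return float(np.mean(np.abs(p_hat - p) / p))

def kld(p, p_hat, eps=1e-6):
    p, p_hat = _smooth(p, eps), _smooth(p_hat, eps)
    return float(np.sum(p * np.log(p / p_hat)))

def nkld(p, p_hat, eps=1e-6):
    # Maps KLD from [0, inf) into [0, 1).
    e = np.exp(kld(p, p_hat, eps))
    return float(2.0 * e / (1.0 + e) - 1.0)

true_prev, estim_prev = [0.5, 0.5], [0.4, 0.6]
print(absolute_error(true_prev, estim_prev), relative_absolute_error(true_prev, estim_prev))
```

Each function scores a single sample; the "M" variants such as MAE are the mean of these per-sample scores over many test samples.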
## Requirements ## Requirements
@@ -98,9 +99,8 @@ The [svm-perf-quantification-ext.patch](./svm-perf-quantification-ext.patch) is
[Esuli et al. 2015](https://dl.acm.org/doi/abs/10.1145/2700406?casa_token=8D2fHsGCVn0AAAAA:ZfThYOvrzWxMGfZYlQW_y8Cagg-o_l6X_PcF09mdETQ4Tu7jK98mxFbGSXp9ZSO14JkUIYuDGFG0) [Esuli et al. 2015](https://dl.acm.org/doi/abs/10.1145/2700406?casa_token=8D2fHsGCVn0AAAAA:ZfThYOvrzWxMGfZYlQW_y8Cagg-o_l6X_PcF09mdETQ4Tu7jK98mxFbGSXp9ZSO14JkUIYuDGFG0)
that allows SVMperf to optimize for that allows SVMperf to optimize for
the _Q_ measure as proposed by [Barranquero et al. 2015](https://www.sciencedirect.com/science/article/abs/pii/S003132031400291X) the _Q_ measure as proposed by [Barranquero et al. 2015](https://www.sciencedirect.com/science/article/abs/pii/S003132031400291X)
and for the _KLD_ and _NKLD_ as proposed by [Esuli et al. 2015](https://dl.acm.org/doi/abs/10.1145/2700406?casa_token=8D2fHsGCVn0AAAAA:ZfThYOvrzWxMGfZYlQW_y8Cagg-o_l6X_PcF09mdETQ4Tu7jK98mxFbGSXp9ZSO14JkUIYuDGFG0) and for the _KLD_ and _NKLD_ measures as proposed by [Esuli et al. 2015](https://dl.acm.org/doi/abs/10.1145/2700406?casa_token=8D2fHsGCVn0AAAAA:ZfThYOvrzWxMGfZYlQW_y8Cagg-o_l6X_PcF09mdETQ4Tu7jK98mxFbGSXp9ZSO14JkUIYuDGFG0).
for quantification. This patch extends the above one by also allowing SVMperf to optimize for
This patch extends the former by also allowing SVMperf to optimize for
_AE_ and _RAE_. _AE_ and _RAE_.
@@ -108,11 +108,11 @@ _AE_ and _RAE_.
The [developer API documentation](https://hlt-isti.github.io/QuaPy/build/html/modules.html) is available [here](https://hlt-isti.github.io/QuaPy/build/html/index.html). The [developer API documentation](https://hlt-isti.github.io/QuaPy/build/html/modules.html) is available [here](https://hlt-isti.github.io/QuaPy/build/html/index.html).
Check out our [Wiki](https://github.com/HLT-ISTI/QuaPy/wiki) in which many examples Check out our [Wiki](https://github.com/HLT-ISTI/QuaPy/wiki), in which many examples
are provided: are provided:
* [Datasets](https://github.com/HLT-ISTI/QuaPy/wiki/Datasets) * [Datasets](https://github.com/HLT-ISTI/QuaPy/wiki/Datasets)
* [Evaluation](https://github.com/HLT-ISTI/QuaPy/wiki/Evaluation) * [Evaluation](https://github.com/HLT-ISTI/QuaPy/wiki/Evaluation)
* [Methods](https://github.com/HLT-ISTI/QuaPy/wiki/Methods) * [Methods](https://github.com/HLT-ISTI/QuaPy/wiki/Methods)
* [Model Selection](https://github.com/HLT-ISTI/QuaPy/wiki/Model-Selection) * [Model Selection](https://github.com/HLT-ISTI/QuaPy/wiki/Model-Selection)
* [Plotting](https://github.com/HLT-ISTI/QuaPy/wiki/Plotting) * [Plotting](https://github.com/HLT-ISTI/QuaPy/wiki/Plotting)