forked from moreo/QuaPy
Update README.md
parent a8ef7a6ed3
commit 789ebe450a
README.md (58 lines changed)

````diff
@@ -1,15 +1,15 @@
 # QuaPy
 
-QuaPy is an open source framework for Quantification (a.k.a. Supervised Prevalence Estimation)
+QuaPy is an open source framework for quantification (a.k.a. supervised prevalence estimation, or learning to quantify)
 written in Python.
 
-QuaPy roots on the concept of data sample, and provides implementations of
-most important concepts in quantification literature, such as the most important
-quantification baselines, many advanced quantification methods,
-quantification-oriented model selection, many evaluation measures and protocols
+QuaPy is based on the concept of "data sample", and provides implementations of the
+most important aspects of the quantification workflow, such as (baseline and advanced)
+quantification methods,
+quantification-oriented model selection mechanisms, evaluation measures, and evaluations protocols
 used for evaluating quantification methods.
-QuaPy also integrates commonly used datasets and offers visualization tools
-for facilitating the analysis and interpretation of results.
+QuaPy also makes available commonly used datasets, and offers visualization tools
+for facilitating the analysis and interpretation of the experimental results.
 
 ### Installation
 
````

````diff
@@ -19,9 +19,9 @@ pip install quapy
 
 ## A quick example:
 
-The following script fetchs a Twitter dataset, trains and evaluates an
-_Adjusted Classify & Count_ model in terms of the _Mean Absolute Error_ (MAE)
-between the class prevalence values estimated for the test set and the true prevalence values
+The following script fetches a dataset of tweets, trains, applies, and evaluates a quantifier based on the
+_Adjusted Classify & Count_ quantification method, using, as the evaluation measure, the _Mean Absolute Error_ (MAE)
+between the predicted and the true class prevalence values
 of the test set.
 
 ```python
````

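The quick example introduced in the hunk above trains an _Adjusted Classify & Count_ (ACC) quantifier and scores it by MAE. As a rough illustration of what those two terms compute, here is a minimal NumPy sketch for the binary case only. It is not QuaPy's implementation: the function names are hypothetical, and the sketch assumes the classifier's true positive rate (`tpr`) and false positive rate (`fpr`) have already been estimated, e.g., on held-out validation data.

```python
import numpy as np

# Illustrative sketch; function names are hypothetical and not part of QuaPy's API.


def classify_and_count(predicted_labels: np.ndarray) -> float:
    """CC estimate: fraction of test items the classifier labels as positive (1)."""
    return float(np.mean(predicted_labels == 1))


def adjusted_classify_and_count(predicted_labels: np.ndarray, tpr: float, fpr: float) -> float:
    """Binary ACC estimate of the positive-class prevalence.

    The expected CC output is tpr * p + fpr * (1 - p); solving for the true
    prevalence p gives (p_cc - fpr) / (tpr - fpr), clipped here to [0, 1].
    """
    p_cc = classify_and_count(predicted_labels)
    if np.isclose(tpr, fpr):
        # The correction is undefined when tpr == fpr; fall back to plain CC.
        return p_cc
    return float(np.clip((p_cc - fpr) / (tpr - fpr), 0.0, 1.0))


def mean_absolute_error(true_prev, estim_prev) -> float:
    """Absolute difference between two prevalence vectors, averaged across classes."""
    return float(np.mean(np.abs(np.asarray(true_prev) - np.asarray(estim_prev))))


# Made-up numbers: 3 of 10 test items are predicted positive.
predicted = np.array([1, 1, 1, 0, 0, 0, 0, 0, 0, 0])
p_pos = adjusted_classify_and_count(predicted, tpr=0.8, fpr=0.1)
estim_prevalence = np.array([1.0 - p_pos, p_pos])  # [negative, positive]
true_prevalence = np.array([0.75, 0.25])
error = mean_absolute_error(true_prevalence, estim_prevalence)
print(f'Mean Absolute Error (MAE)={error:.3f}')
```

Here CC would report 0.30 positives; ACC corrects that to about 0.286 by discounting the false positives the classifier is expected to produce. With a single sample, as in this sketch, MAE reduces to the absolute error averaged across the classes.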
````diff
@@ -42,27 +42,28 @@ error = qp.error.mae(true_prevalence, estim_prevalence)
 print(f'Mean Absolute Error (MAE)={error:.3f}')
 ```
 
-Quantification is useful in scenarios of prior probability shift. In other
-words, we would not be interested in estimating the class prevalence values of the test set if
-we could assume the IID assumption to hold, as this prevalence would simply coincide with the
-class prevalence of the training set. For this reason, any Quantification model
-should be tested across samples characterized by different class prevalence values.
-QuaPy implements sampling procedures and evaluation protocols that automate this endeavour.
+Quantification is useful in scenarios characterized by prior probability shift. In other
+words, we would be little interested in estimating the class prevalence values of the test set if
+we could assume the IID assumption to hold, as this prevalence would be roughly equivalent to the
+class prevalence of the training set. For this reason, any quantification model
+should be tested across many samples, even ones characterized by class prevalence
+values different or very different from those found in the training set.
+QuaPy implements sampling procedures and evaluation protocols that automate this workflow.
 See the [Wiki](https://github.com/HLT-ISTI/QuaPy/wiki) for detailed examples.
 
 ## Features
 
-* Implementation of most popular quantification methods (Classify-&-Count variants, Expectation-Maximization,
-SVM-based variants for quantification, HDy, QuaNet, and Ensembles).
+* Implementation of many popular quantification methods (Classify-&-Count and its variants, Expectation Maximization,
+quantification methods based on structured output learning, HDy, QuaNet, and quantification ensembles).
 * Versatile functionality for performing evaluation based on artificial sampling protocols.
-* Implementation of most commonly used evaluation metrics (e.g., MAE, MRAE, MSE, NKLD, etc.).
-* Popular datasets for Quantification (textual and numeric) available, including:
+* Implementation of most commonly used evaluation metrics (e.g., AE, RAE, SE, KLD, NKLD, etc.).
+* Datasets frequently used in quantification (textual and numeric), including:
 * 32 UCI Machine Learning datasets.
-* 11 Twitter Sentiment datasets.
-* 3 Reviews Sentiment datasets.
-* Native supports for binary and single-label scenarios of quantification.
-* Model selection functionality targeting quantification-oriented losses.
-* Visualization tools for analysing results.
+* 11 Twitter quantification-by-sentiment datasets.
+* 3 product reviews quantification-by-sentiment datasets.
+* Native support for binary and single-label multiclass quantification scenarios.
+* Model selection functionality that minimizes quantification-oriented loss functions.
+* Visualization tools for analysing the experimental results.
 
 ## Requirements
 
````

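The README text changed in the hunk above argues that a quantifier should be tested on many samples, including samples whose class prevalence differs a lot from that of the training set. A minimal sketch of that idea for the binary case follows: it draws samples at prevalence values spread evenly over [0, 1] and averages the absolute error of the positive-class estimate. This is a simplified take on an artificial sampling protocol, not QuaPy's own protocol code; every name in it is hypothetical, and `quantify` stands for any function that maps a sample (given here as row indices) to an estimated positive prevalence.

```python
import numpy as np

# Illustrative sketch; all names are hypothetical and not part of QuaPy's API.


def sample_at_prevalence(labels, prevalence, sample_size, rng):
    """Return indices of a sample whose positive-class prevalence is `prevalence`.

    Items are drawn with replacement within each class, so any prevalence
    in [0, 1] is reachable even when the pool is small or imbalanced.
    """
    pos_idx = np.flatnonzero(labels == 1)
    neg_idx = np.flatnonzero(labels == 0)
    n_pos = int(round(prevalence * sample_size))
    n_neg = sample_size - n_pos
    return np.concatenate([
        rng.choice(pos_idx, size=n_pos, replace=True),
        rng.choice(neg_idx, size=n_neg, replace=True),
    ])


def evaluate_over_prevalence_grid(labels, quantify, sample_size=100,
                                  n_points=11, repeats=10, seed=0):
    """Mean absolute error of `quantify` over samples drawn at prevalence
    values 0, 1/(n_points-1), ..., 1, with `repeats` samples per value."""
    rng = np.random.default_rng(seed)
    errors = []
    for prevalence in np.linspace(0.0, 1.0, n_points):
        for _ in range(repeats):
            idx = sample_at_prevalence(labels, prevalence, sample_size, rng)
            true_p = float(np.mean(labels[idx] == 1))
            errors.append(abs(true_p - quantify(idx)))
    return float(np.mean(errors))


# A balanced pool of 100 labelled items (made-up data).
labels = np.array([0] * 50 + [1] * 50)

# Always answers with the training prevalence (0.5): adequate only under IID.
constant_error = evaluate_over_prevalence_grid(labels, lambda idx: 0.5)

# An oracle that reads the true labels, included as a lower bound (error 0).
oracle_error = evaluate_over_prevalence_grid(
    labels, lambda idx: float(np.mean(labels[idx] == 1)))
```

With these made-up labels the constant estimator averages an error of about 0.27 (it is exact only at prevalence 0.5), while the oracle's error is 0. A test set drawn at the training prevalence alone would have hidden that gap entirely.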
````diff
@@ -93,15 +94,14 @@ The [svm-perf-quantification-ext.patch](./svm-perf-quantification-ext.patch) is
 [Esuli et al. 2015](https://dl.acm.org/doi/abs/10.1145/2700406?casa_token=8D2fHsGCVn0AAAAA:ZfThYOvrzWxMGfZYlQW_y8Cagg-o_l6X_PcF09mdETQ4Tu7jK98mxFbGSXp9ZSO14JkUIYuDGFG0)
 that allows SVMperf to optimize for
 the _Q_ measure as proposed by [Barranquero et al. 2015](https://www.sciencedirect.com/science/article/abs/pii/S003132031400291X)
-and for the _KLD_ and _NKLD_ as proposed by [Esuli et al. 2015](https://dl.acm.org/doi/abs/10.1145/2700406?casa_token=8D2fHsGCVn0AAAAA:ZfThYOvrzWxMGfZYlQW_y8Cagg-o_l6X_PcF09mdETQ4Tu7jK98mxFbGSXp9ZSO14JkUIYuDGFG0)
-for quantification.
-This patch extends the former by also allowing SVMperf to optimize for
+and for the _KLD_ and _NKLD_ measures as proposed by [Esuli et al. 2015](https://dl.acm.org/doi/abs/10.1145/2700406?casa_token=8D2fHsGCVn0AAAAA:ZfThYOvrzWxMGfZYlQW_y8Cagg-o_l6X_PcF09mdETQ4Tu7jK98mxFbGSXp9ZSO14JkUIYuDGFG0).
+This patch extends the above one by also allowing SVMperf to optimize for
 _AE_ and _RAE_.
 
 
 ## Wiki
 
-Check out our [Wiki](https://github.com/HLT-ISTI/QuaPy/wiki) in which many examples
+Check out our [Wiki](https://github.com/HLT-ISTI/QuaPy/wiki), in which many examples
 are provided:
 
 * [Datasets](https://github.com/HLT-ISTI/QuaPy/wiki/Datasets)
````

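The updated Features list and the SVMperf patch description name several evaluation measures (AE, RAE, SE, KLD, NKLD). The NumPy sketch below shows one common way of computing them for a single sample, given a true prevalence vector `p` and an estimate `p_hat` over the same classes. It is an illustration, not QuaPy's code: the function names are hypothetical, and the additive smoothing applied before RAE and KLD (controlled by `eps`; a common choice in the quantification literature is 1/(2 × sample size)) is an assumption made here so that neither measure divides by, or takes the logarithm of, zero.

```python
import numpy as np

# Illustrative sketch; function names are hypothetical and not part of QuaPy's API.
# The smoothing step used by rae/kld/nkld is an assumption of this sketch.


def smooth(prev, eps):
    """Additive smoothing: keeps every prevalence value strictly positive."""
    prev = np.asarray(prev, dtype=float)
    return (prev + eps) / (eps * prev.size + 1.0)


def ae(p, p_hat):
    """Absolute error, averaged across classes."""
    return float(np.mean(np.abs(np.asarray(p_hat, dtype=float) - np.asarray(p, dtype=float))))


def se(p, p_hat):
    """Squared error, averaged across classes."""
    return float(np.mean((np.asarray(p_hat, dtype=float) - np.asarray(p, dtype=float)) ** 2))


def rae(p, p_hat, eps):
    """Relative absolute error on smoothed prevalences, averaged across classes."""
    p_s, p_hat_s = smooth(p, eps), smooth(p_hat, eps)
    return float(np.mean(np.abs(p_hat_s - p_s) / p_s))


def kld(p, p_hat, eps):
    """Kullback-Leibler divergence KL(p || p_hat) on smoothed prevalences."""
    p_s, p_hat_s = smooth(p, eps), smooth(p_hat, eps)
    return float(np.sum(p_s * np.log(p_s / p_hat_s)))


def nkld(p, p_hat, eps):
    """Normalized KLD: 2 * e^KLD / (e^KLD + 1) - 1, which lies in [0, 1)."""
    e = np.exp(kld(p, p_hat, eps))
    return float(2.0 * e / (e + 1.0) - 1.0)


# Made-up binary example on a hypothetical sample of 100 items.
p, p_hat = [0.2, 0.8], [0.4, 0.6]
eps = 1.0 / (2 * 100)
for name, value in [("AE", ae(p, p_hat)), ("SE", se(p, p_hat)),
                    ("RAE", rae(p, p_hat, eps)), ("KLD", kld(p, p_hat, eps)),
                    ("NKLD", nkld(p, p_hat, eps))]:
    print(f"{name}={value:.4f}")
```

Because RAE divides by the true prevalence of each class, the same absolute mistake costs far more on a rare class than on a frequent one, which is why it needs smoothing while AE and SE do not.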