* Argument `norm` specifies how to normalize the estimate `p` when the vector lies outside of the probability simplex.
Options are:
*`"clip"` which clips the values to range `[0, 1]` and then L1-normalizes the vector
*`"mapsimplex"` which projects the results on the probability simplex, as proposed by Vaz et al. in
[Remark 5 of Vaz, et. (2019)](https://jmlr.csail.mit.edu/papers/volume20/18-456/18-456.pdf). This implementation
relies on [Mathieu Blondel's `projection_simplex_sort`](https://gist.github.com/mblondel/6f3b7aaad90606b98f71))
*`"condsoftmax"` applies softmax normalization only if the prevalence vector lies outside of the probability simplex.
#### BayesianCC (_New in v0.1.9_!)
The `BayesianCC` is a variant of ACC introduced in
[Ziegler, A. and Czyż, P. "Bayesian quantification with black-box estimators", arXiv (2023)](https://arxiv.org/abs/2302.09159),
which models the probabilities `q = Mp` using latent random variables with weak Bayesian priors, rather than
plug-in probability estimates. In particular, it uses Markov Chain Monte Carlo sampling to find the values of
`p` compatible with the observed quantities.
The `aggregate` method returns the posterior mean and the `get_prevalence_samples` method can be used to find
uncertainty around `p` estimates (conditional on the observed data and the trained classifier)
and is suitable for problems in which the `q = Mp` matrix is nearly non-invertible.
Note that this quantification method requires `val_split` to be a `float` and installation of additional dependencies (`$ pip install quapy[bayes]`) needed to run Markov chain Monte Carlo sampling. Markov Chain Monte Carlo is is slower than matrix inversion methods, but is guaranteed to sample proper probability vectors, so no clipping strategies are required.
An example presenting how to run the method and use posterior samples is available in `examples/bayesian_quantification.py`.
### Expectation Maximization (EMQ)
The Expectation Maximization Quantifier (EMQ), also known as
the SLD, is available at `qp.method.aggregative.EMQ` or via the
alias `qp.method.aggregative.ExpectationMaximizationQuantifier`.
The method is described in:
_Saerens, M., Latinne, P., and Decaestecker, C. (2002). Adjusting the outputs of a classifier
to new a priori probabilities: A simple procedure. Neural Computation, 14(1):21–41._
EMQ works with a probabilistic classifier (if the classifier
given as input is a hard one, a calibration will be attempted).
Although this method was originally proposed for improving the
posterior probabilities of a probabilistic classifier, and not
for improving the estimation of prior probabilities, EMQ ranks
almost always among the most effective quantifiers in the
experiments we have carried out.
An example of use can be found below:
```python
import quapy as qp
from sklearn.linear_model import LogisticRegression
_New in v0.1.7:_ QuaPy now provides an implementation of the "DyS"
framework proposed by [Maletzke et al (2020)](https://ojs.aaai.org/index.php/AAAI/article/view/4376)
and the "SMM" method proposed by [Hassan et al (2019)](https://ieeexplore.ieee.org/document/9260028)
(thanks to _Pablo González_ for the contributions!)
### Threshold Optimization methods
_New in v0.1.7:_ QuaPy now implements Forman's threshold optimization methods;
see, e.g., [(Forman 2006)](https://dl.acm.org/doi/abs/10.1145/1150402.1150423)
and [(Forman 2008)](https://link.springer.com/article/10.1007/s10618-008-0097-y).
These include: T50, MAX, X, Median Sweep (MS), and its variant MS2.
### Explicit Loss Minimization
The Explicit Loss Minimization (ELM) represent a family of methods
based on structured output learning, i.e., quantifiers relying on
classifiers that have been optimized targeting a
quantification-oriented evaluation measure.
The original methods are implemented in QuaPy as classify & count (CC)
quantifiers that use Joachim's [SVMperf](https://www.cs.cornell.edu/people/tj/svm_light/svm_perf.html)
as the underlying classifier, properly set to optimize for the desired loss.
In QuaPy, this can be more achieved by calling the functions:
*`newSVMQ`: returns the quantification method called SVM(Q) that optimizes for the metric _Q_ defined
in [_Barranquero, J., Díez, J., and del Coz, J. J. (2015). Quantification-oriented learning based
on reliable classifiers. Pattern Recognition, 48(2):591–604._](https://www.sciencedirect.com/science/article/pii/S003132031400291X)
*`newSVMKLD` and `newSVMNKLD`: returns the quantification method called SVM(KLD) and SVM(nKLD), standing for
Kullback-Leibler Divergence and Normalized Kullback-Leibler Divergence, as proposed in [_Esuli, A. and Sebastiani, F. (2015).
Optimizing text quantifiers for multivariate loss functions.
ACM Transactions on Knowledge Discovery and Data, 9(4):Article 27._](https://dl.acm.org/doi/abs/10.1145/2700406)
*`newSVMAE` and `newSVMRAE`: returns a quantification method called SVM(AE) and SVM(RAE) that optimizes for the (Mean) Absolute Error and for the
(Mean) Relative Absolute Error, as first used by
[_Moreo, A. and Sebastiani, F. (2021). Tweet sentiment quantification: An experimental re-evaluation. PLOS ONE 17 (9), 1-23._](https://arxiv.org/abs/2011.02552)
the last two methods (SVM(AE) and SVM(RAE)) have been implemented in
QuaPy in order to make available ELM variants for what nowadays
are considered the most well-behaved evaluation metrics in quantification.
In order to make these models work, you would need to run the script
`prepare_svmperf.sh` (distributed along with QuaPy) that
downloads `SVMperf`' source code, applies a patch that
implements the quantification oriented losses, and compiles the
sources.
If you want to add any custom loss, you would need to modify
the source code of `SVMperf` in order to implement it, and
assign a valid loss code to it. Then you must re-compile
the whole thing and instantiate the quantifier in QuaPy
as follows:
```python
# you can either set the path to your custom svm_perf_quantification implementation
# in the environment variable, or as an argument to the constructor of ELM
The [](quapy.method.composable) module allows the composition of quantification methods from loss functions and feature transformations. Any composed method solves a linear system of equations by minimizing the loss after transforming the data. Methods of this kind include ACC, PACC, HDx, HDy, and many other well-known methods, as well as an unlimited number of re-combinations of their building blocks.
The composition of a method is implemented through the [](quapy.method.composable.ComposableQuantifier) class. Its documentation also features an example to get you started in composing your own methods.
```python
ComposableQuantifier( # ordinal ACC, as proposed by Bunse et al., 2022
More exhaustive examples of method compositions, including hyper-parameter optimization, can be found in [the example directory](https://github.com/HLT-ISTI/QuaPy/tree/master/examples).
To implement your own loss functions and feature representations, follow the corresponding manual of the [qunfold package](https://github.com/mirkobunse/qunfold), which provides the back-end of QuaPy's composable module.
The [](quapy.method.composable.ClassTransformer) requires the classifier to have a property `oob_score==True` and to produce a property `oob_decision_function` during fitting. In [scikit-learn](https://scikit-learn.org/), this requirement is fulfilled by any bagging classifier, such as random forests. Any other classifier needs to be cross-validated through the [](quapy.method.composable.CVClassifier).
Please, check the [model selection manual](./model-selection) if you want to optimize the hyperparameters of ensemble for classification or quantification.
QuaPy offers an implementation of QuaNet, a deep learning model presented in:
[_Esuli, A., Moreo, A., & Sebastiani, F. (2018, October).
A recurrent neural network for sentiment quantification.
In Proceedings of the 27th ACM International Conference on
Information and Knowledge Management (pp. 1775-1778)._](https://dl.acm.org/doi/abs/10.1145/3269206.3269287)
This model requires `torch` to be installed.
QuaNet also requires a classifier that can provide embedded representations
of the inputs.
In the original paper, QuaNet was tested using an LSTM as the base classifier.
In the following example, we show an instantiation of QuaNet that instead uses CNN as a probabilistic classifier, taking its last layer representation as the document embedding:
```python
import quapy as qp
from quapy.method.meta import QuaNet
from quapy.classification.neural import NeuralClassifierTrainer, CNNnet
# use samples of 100 elements
qp.environ['SAMPLE_SIZE'] = 100
# load the kindle dataset as text, and convert words to numerical indexes