1
0
Fork 0
This commit is contained in:
Alejandro Moreo Fernandez 2023-02-17 12:54:15 +01:00
commit bfaa5678d7
85 changed files with 7638 additions and 17312 deletions

View File

@ -13,6 +13,7 @@ for facilitating the analysis and interpretation of the experimental results.
### Last updates:
* Version 0.1.7 is released! major changes can be consulted [here](quapy/CHANGE_LOG.txt).
* A detailed documentation is now available [here](https://hlt-isti.github.io/QuaPy/)
* The developer API documentation is available [here](https://hlt-isti.github.io/QuaPy/build/html/modules.html)
@ -22,6 +23,20 @@ for facilitating the analysis and interpretation of the experimental results.
pip install quapy
```
### Cite QuaPy
If you find QuaPy useful (and we hope you will), plese consider citing the original paper in your research:
```
@inproceedings{moreo2021quapy,
title={QuaPy: a python-based framework for quantification},
author={Moreo, Alejandro and Esuli, Andrea and Sebastiani, Fabrizio},
booktitle={Proceedings of the 30th ACM International Conference on Information \& Knowledge Management},
pages={4534--4543},
year={2021}
}
```
## A quick example:
The following script fetches a dataset of tweets, trains, applies, and evaluates a quantifier based on the
@ -59,13 +74,14 @@ See the [Wiki](https://github.com/HLT-ISTI/QuaPy/wiki) for detailed examples.
## Features
* Implementation of many popular quantification methods (Classify-&-Count and its variants, Expectation Maximization,
quantification methods based on structured output learning, HDy, QuaNet, and quantification ensembles).
* Versatile functionality for performing evaluation based on artificial sampling protocols.
quantification methods based on structured output learning, HDy, QuaNet, quantification ensembles, among others).
* Versatile functionality for performing evaluation based on sampling generation protocols (e.g., APP, NPP, etc.).
* Implementation of most commonly used evaluation metrics (e.g., AE, RAE, SE, KLD, NKLD, etc.).
* Datasets frequently used in quantification (textual and numeric), including:
* 32 UCI Machine Learning datasets.
* 11 Twitter quantification-by-sentiment datasets.
* 3 product reviews quantification-by-sentiment datasets.
* 4 tasks from LeQua competition (_new in v0.1.7!_)
* Native support for binary and single-label multiclass quantification scenarios.
* Model selection functionality that minimizes quantification-oriented loss functions.
* Visualization tools for analysing the experimental results.
@ -80,29 +96,6 @@ quantification methods based on structured output learning, HDy, QuaNet, and qua
* pandas, xlrd
* matplotlib
## SVM-perf with quantification-oriented losses
In order to run experiments involving SVM(Q), SVM(KLD), SVM(NKLD),
SVM(AE), or SVM(RAE), you have to first download the
[svmperf](http://www.cs.cornell.edu/people/tj/svm_light/svm_perf.html)
package, apply the patch
[svm-perf-quantification-ext.patch](./svm-perf-quantification-ext.patch), and compile the sources.
The script [prepare_svmperf.sh](prepare_svmperf.sh) does all the job. Simply run:
```
./prepare_svmperf.sh
```
The resulting directory [svm_perf_quantification](./svm_perf_quantification) contains the
patched version of _svmperf_ with quantification-oriented losses.
The [svm-perf-quantification-ext.patch](./svm-perf-quantification-ext.patch) is an extension of the patch made available by
[Esuli et al. 2015](https://dl.acm.org/doi/abs/10.1145/2700406?casa_token=8D2fHsGCVn0AAAAA:ZfThYOvrzWxMGfZYlQW_y8Cagg-o_l6X_PcF09mdETQ4Tu7jK98mxFbGSXp9ZSO14JkUIYuDGFG0)
that allows SVMperf to optimize for
the _Q_ measure as proposed by [Barranquero et al. 2015](https://www.sciencedirect.com/science/article/abs/pii/S003132031400291X)
and for the _KLD_ and _NKLD_ measures as proposed by [Esuli et al. 2015](https://dl.acm.org/doi/abs/10.1145/2700406?casa_token=8D2fHsGCVn0AAAAA:ZfThYOvrzWxMGfZYlQW_y8Cagg-o_l6X_PcF09mdETQ4Tu7jK98mxFbGSXp9ZSO14JkUIYuDGFG0).
This patch extends the above one by also allowing SVMperf to optimize for
_AE_ and _RAE_.
## Documentation
@ -113,6 +106,8 @@ are provided:
* [Datasets](https://github.com/HLT-ISTI/QuaPy/wiki/Datasets)
* [Evaluation](https://github.com/HLT-ISTI/QuaPy/wiki/Evaluation)
* [Protocols](https://github.com/HLT-ISTI/QuaPy/wiki/Protocols)
* [Methods](https://github.com/HLT-ISTI/QuaPy/wiki/Methods)
* [SVMperf](https://github.com/HLT-ISTI/QuaPy/wiki/ExplicitLossMinimization)
* [Model Selection](https://github.com/HLT-ISTI/QuaPy/wiki/Model-Selection)
* [Plotting](https://github.com/HLT-ISTI/QuaPy/wiki/Plotting)

View File

@ -1,7 +1,20 @@
sample_size should not be mandatory when qp.environ['SAMPLE_SIZE'] has been specified
clean all the cumbersome methods that have to be implemented for new quantifiers (e.g., n_classes_ prop, etc.)
make truly parallel the GridSearchQ
make more examples in the "examples" directory
merge with master, because I had to fix some problems with QuaNet due to an issue notified via GitHub!
added cross_val_predict in qp.model_selection (i.e., a cross_val_predict for quantification) --would be nice to have
it parallelized
check the OneVsAll module(s)
check the set_params de neural.py, because the separation of estimator__<param> is not implemented; see also
__check_params_colision
HDy can be customized so that the number of bins is specified, instead of explored within the fit method
Packaging:
==========================================
Documentation with sphinx
Document methods with paper references
unit-tests
clean wiki_examples!

View File

@ -2,23 +2,26 @@
<!doctype html>
<html>
<html lang="en">
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Datasets &#8212; QuaPy 0.1.6 documentation</title>
<meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="generator" content="Docutils 0.19: https://docutils.sourceforge.io/" />
<title>Datasets &#8212; QuaPy 0.1.7 documentation</title>
<link rel="stylesheet" type="text/css" href="_static/pygments.css" />
<link rel="stylesheet" type="text/css" href="_static/bizstyle.css" />
<script data-url_root="./" id="documentation_options" src="_static/documentation_options.js"></script>
<script src="_static/jquery.js"></script>
<script src="_static/underscore.js"></script>
<script src="_static/_sphinx_javascript_frameworks_compat.js"></script>
<script src="_static/doctools.js"></script>
<script src="_static/sphinx_highlight.js"></script>
<script src="_static/bizstyle.js"></script>
<link rel="index" title="Index" href="genindex.html" />
<link rel="search" title="Search" href="search.html" />
<link rel="next" title="quapy" href="modules.html" />
<link rel="prev" title="Getting Started" href="readme.html" />
<link rel="next" title="Evaluation" href="Evaluation.html" />
<link rel="prev" title="Installation" href="Installation.html" />
<meta name="viewport" content="width=device-width,initial-scale=1.0" />
<!--[if lt IE 9]>
<script src="_static/css3-mediaqueries.js"></script>
@ -34,12 +37,12 @@
<a href="py-modindex.html" title="Python Module Index"
>modules</a> |</li>
<li class="right" >
<a href="modules.html" title="quapy"
<a href="Evaluation.html" title="Evaluation"
accesskey="N">next</a> |</li>
<li class="right" >
<a href="readme.html" title="Getting Started"
<a href="Installation.html" title="Installation"
accesskey="P">previous</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.6 documentation</a> &#187;</li>
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.7 documentation</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">Datasets</a></li>
</ul>
</div>
@ -49,8 +52,8 @@
<div class="bodywrapper">
<div class="body" role="main">
<div class="tex2jax_ignore mathjax_ignore section" id="datasets">
<h1>Datasets<a class="headerlink" href="#datasets" title="Permalink to this headline"></a></h1>
<section id="datasets">
<h1>Datasets<a class="headerlink" href="#datasets" title="Permalink to this heading"></a></h1>
<p>QuaPy makes available several datasets that have been used in
quantification literature, as well as an interface to allow
anyone import their custom datasets.</p>
@ -77,7 +80,7 @@ Take a look at the following code:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="p">[</span><span class="mf">0.17</span><span class="p">,</span> <span class="mf">0.50</span><span class="p">,</span> <span class="mf">0.33</span><span class="p">]</span>
</pre></div>
</div>
<p>One can easily produce new samples at desired class prevalences:</p>
<p>One can easily produce new samples at desired class prevalence values:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">sample_size</span> <span class="o">=</span> <span class="mi">10</span>
<span class="n">prev</span> <span class="o">=</span> <span class="p">[</span><span class="mf">0.4</span><span class="p">,</span> <span class="mf">0.1</span><span class="p">,</span> <span class="mf">0.5</span><span class="p">]</span>
<span class="n">sample</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">sampling</span><span class="p">(</span><span class="n">sample_size</span><span class="p">,</span> <span class="o">*</span><span class="n">prev</span><span class="p">)</span>
@ -106,31 +109,12 @@ the indexes, that can then be used to generate the sample:</p>
<span class="o">...</span>
</pre></div>
</div>
<p>QuaPy also implements the artificial sampling protocol that produces (via a
Pythons generator) a series of <em>LabelledCollection</em> objects with equidistant
prevalences ranging across the entire prevalence spectrum in the simplex space, e.g.:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="k">for</span> <span class="n">sample</span> <span class="ow">in</span> <span class="n">data</span><span class="o">.</span><span class="n">artificial_sampling_generator</span><span class="p">(</span><span class="n">sample_size</span><span class="o">=</span><span class="mi">100</span><span class="p">,</span> <span class="n">n_prevalences</span><span class="o">=</span><span class="mi">5</span><span class="p">):</span>
<span class="nb">print</span><span class="p">(</span><span class="n">F</span><span class="o">.</span><span class="n">strprev</span><span class="p">(</span><span class="n">sample</span><span class="o">.</span><span class="n">prevalence</span><span class="p">(),</span> <span class="n">prec</span><span class="o">=</span><span class="mi">2</span><span class="p">))</span>
</pre></div>
</div>
<p>produces one sampling for each (valid) combination of prevalences originating from
splitting the range [0,1] into n_prevalences=5 points (i.e., [0, 0.25, 0.5, 0.75, 1]),
that is:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="p">[</span><span class="mf">0.00</span><span class="p">,</span> <span class="mf">0.00</span><span class="p">,</span> <span class="mf">1.00</span><span class="p">]</span>
<span class="p">[</span><span class="mf">0.00</span><span class="p">,</span> <span class="mf">0.25</span><span class="p">,</span> <span class="mf">0.75</span><span class="p">]</span>
<span class="p">[</span><span class="mf">0.00</span><span class="p">,</span> <span class="mf">0.50</span><span class="p">,</span> <span class="mf">0.50</span><span class="p">]</span>
<span class="p">[</span><span class="mf">0.00</span><span class="p">,</span> <span class="mf">0.75</span><span class="p">,</span> <span class="mf">0.25</span><span class="p">]</span>
<span class="p">[</span><span class="mf">0.00</span><span class="p">,</span> <span class="mf">1.00</span><span class="p">,</span> <span class="mf">0.00</span><span class="p">]</span>
<span class="p">[</span><span class="mf">0.25</span><span class="p">,</span> <span class="mf">0.00</span><span class="p">,</span> <span class="mf">0.75</span><span class="p">]</span>
<span class="o">...</span>
<span class="p">[</span><span class="mf">1.00</span><span class="p">,</span> <span class="mf">0.00</span><span class="p">,</span> <span class="mf">0.00</span><span class="p">]</span>
</pre></div>
</div>
<p>See the <a class="reference external" href="https://github.com/HLT-ISTI/QuaPy/wiki/Evaluation">Evaluation wiki</a> for
further details on how to use the artificial sampling protocol to properly
evaluate a quantification method.</p>
<div class="section" id="reviews-datasets">
<h2>Reviews Datasets<a class="headerlink" href="#reviews-datasets" title="Permalink to this headline"></a></h2>
<p>However, generating samples for evaluation purposes is tackled in QuaPy
by means of the evaluation protocols (see the dedicated entries in the Wiki
for <a class="reference external" href="https://github.com/HLT-ISTI/QuaPy/wiki/Evaluation">evaluation</a> and
<a class="reference external" href="https://github.com/HLT-ISTI/QuaPy/wiki/Protocols">protocols</a>).</p>
<section id="reviews-datasets">
<h2>Reviews Datasets<a class="headerlink" href="#reviews-datasets" title="Permalink to this heading"></a></h2>
<p>Three datasets of reviews about Kindle devices, Harry Potters series, and
the well-known IMDb movie reviews can be fetched using a unified interface.
For example:</p>
@ -150,47 +134,47 @@ For example:</p>
</pre></div>
</div>
<p>Some statistics of the fhe available datasets are summarized below:</p>
<table class="colwidths-auto docutils align-default">
<table class="docutils align-default">
<thead>
<tr class="row-odd"><th class="head"><p>Dataset</p></th>
<th class="text-align:center head"><p>classes</p></th>
<th class="text-align:center head"><p>train size</p></th>
<th class="text-align:center head"><p>test size</p></th>
<th class="text-align:center head"><p>train prev</p></th>
<th class="text-align:center head"><p>test prev</p></th>
<th class="head text-center"><p>classes</p></th>
<th class="head text-center"><p>train size</p></th>
<th class="head text-center"><p>test size</p></th>
<th class="head text-center"><p>train prev</p></th>
<th class="head text-center"><p>test prev</p></th>
<th class="head"><p>type</p></th>
</tr>
</thead>
<tbody>
<tr class="row-even"><td><p>hp</p></td>
<td class="text-align:center"><p>2</p></td>
<td class="text-align:center"><p>9533</p></td>
<td class="text-align:center"><p>18399</p></td>
<td class="text-align:center"><p>[0.018, 0.982]</p></td>
<td class="text-align:center"><p>[0.065, 0.935]</p></td>
<td class="text-center"><p>2</p></td>
<td class="text-center"><p>9533</p></td>
<td class="text-center"><p>18399</p></td>
<td class="text-center"><p>[0.018, 0.982]</p></td>
<td class="text-center"><p>[0.065, 0.935]</p></td>
<td><p>text</p></td>
</tr>
<tr class="row-odd"><td><p>kindle</p></td>
<td class="text-align:center"><p>2</p></td>
<td class="text-align:center"><p>3821</p></td>
<td class="text-align:center"><p>21591</p></td>
<td class="text-align:center"><p>[0.081, 0.919]</p></td>
<td class="text-align:center"><p>[0.063, 0.937]</p></td>
<td class="text-center"><p>2</p></td>
<td class="text-center"><p>3821</p></td>
<td class="text-center"><p>21591</p></td>
<td class="text-center"><p>[0.081, 0.919]</p></td>
<td class="text-center"><p>[0.063, 0.937]</p></td>
<td><p>text</p></td>
</tr>
<tr class="row-even"><td><p>imdb</p></td>
<td class="text-align:center"><p>2</p></td>
<td class="text-align:center"><p>25000</p></td>
<td class="text-align:center"><p>25000</p></td>
<td class="text-align:center"><p>[0.500, 0.500]</p></td>
<td class="text-align:center"><p>[0.500, 0.500]</p></td>
<td class="text-center"><p>2</p></td>
<td class="text-center"><p>25000</p></td>
<td class="text-center"><p>25000</p></td>
<td class="text-center"><p>[0.500, 0.500]</p></td>
<td class="text-center"><p>[0.500, 0.500]</p></td>
<td><p>text</p></td>
</tr>
</tbody>
</table>
</div>
<div class="section" id="twitter-sentiment-datasets">
<h2>Twitter Sentiment Datasets<a class="headerlink" href="#twitter-sentiment-datasets" title="Permalink to this headline"></a></h2>
</section>
<section id="twitter-sentiment-datasets">
<h2>Twitter Sentiment Datasets<a class="headerlink" href="#twitter-sentiment-datasets" title="Permalink to this heading"></a></h2>
<p>11 Twitter datasets for sentiment analysis.
Text is not accessible, and the documents were made available
in tf-idf format. Each dataset presents two splits: a train/val
@ -221,123 +205,123 @@ The lists of the Twitter datasets ids can be consulted in:</p>
</pre></div>
</div>
<p>Some details can be found below:</p>
<table class="colwidths-auto docutils align-default">
<table class="docutils align-default">
<thead>
<tr class="row-odd"><th class="head"><p>Dataset</p></th>
<th class="text-align:center head"><p>classes</p></th>
<th class="text-align:center head"><p>train size</p></th>
<th class="text-align:center head"><p>test size</p></th>
<th class="text-align:center head"><p>features</p></th>
<th class="text-align:center head"><p>train prev</p></th>
<th class="text-align:center head"><p>test prev</p></th>
<th class="head text-center"><p>classes</p></th>
<th class="head text-center"><p>train size</p></th>
<th class="head text-center"><p>test size</p></th>
<th class="head text-center"><p>features</p></th>
<th class="head text-center"><p>train prev</p></th>
<th class="head text-center"><p>test prev</p></th>
<th class="head"><p>type</p></th>
</tr>
</thead>
<tbody>
<tr class="row-even"><td><p>gasp</p></td>
<td class="text-align:center"><p>3</p></td>
<td class="text-align:center"><p>8788</p></td>
<td class="text-align:center"><p>3765</p></td>
<td class="text-align:center"><p>694582</p></td>
<td class="text-align:center"><p>[0.421, 0.496, 0.082]</p></td>
<td class="text-align:center"><p>[0.407, 0.507, 0.086]</p></td>
<td class="text-center"><p>3</p></td>
<td class="text-center"><p>8788</p></td>
<td class="text-center"><p>3765</p></td>
<td class="text-center"><p>694582</p></td>
<td class="text-center"><p>[0.421, 0.496, 0.082]</p></td>
<td class="text-center"><p>[0.407, 0.507, 0.086]</p></td>
<td><p>sparse</p></td>
</tr>
<tr class="row-odd"><td><p>hcr</p></td>
<td class="text-align:center"><p>3</p></td>
<td class="text-align:center"><p>1594</p></td>
<td class="text-align:center"><p>798</p></td>
<td class="text-align:center"><p>222046</p></td>
<td class="text-align:center"><p>[0.546, 0.211, 0.243]</p></td>
<td class="text-align:center"><p>[0.640, 0.167, 0.193]</p></td>
<td class="text-center"><p>3</p></td>
<td class="text-center"><p>1594</p></td>
<td class="text-center"><p>798</p></td>
<td class="text-center"><p>222046</p></td>
<td class="text-center"><p>[0.546, 0.211, 0.243]</p></td>
<td class="text-center"><p>[0.640, 0.167, 0.193]</p></td>
<td><p>sparse</p></td>
</tr>
<tr class="row-even"><td><p>omd</p></td>
<td class="text-align:center"><p>3</p></td>
<td class="text-align:center"><p>1839</p></td>
<td class="text-align:center"><p>787</p></td>
<td class="text-align:center"><p>199151</p></td>
<td class="text-align:center"><p>[0.463, 0.271, 0.266]</p></td>
<td class="text-align:center"><p>[0.437, 0.283, 0.280]</p></td>
<td class="text-center"><p>3</p></td>
<td class="text-center"><p>1839</p></td>
<td class="text-center"><p>787</p></td>
<td class="text-center"><p>199151</p></td>
<td class="text-center"><p>[0.463, 0.271, 0.266]</p></td>
<td class="text-center"><p>[0.437, 0.283, 0.280]</p></td>
<td><p>sparse</p></td>
</tr>
<tr class="row-odd"><td><p>sanders</p></td>
<td class="text-align:center"><p>3</p></td>
<td class="text-align:center"><p>2155</p></td>
<td class="text-align:center"><p>923</p></td>
<td class="text-align:center"><p>229399</p></td>
<td class="text-align:center"><p>[0.161, 0.691, 0.148]</p></td>
<td class="text-align:center"><p>[0.164, 0.688, 0.148]</p></td>
<td class="text-center"><p>3</p></td>
<td class="text-center"><p>2155</p></td>
<td class="text-center"><p>923</p></td>
<td class="text-center"><p>229399</p></td>
<td class="text-center"><p>[0.161, 0.691, 0.148]</p></td>
<td class="text-center"><p>[0.164, 0.688, 0.148]</p></td>
<td><p>sparse</p></td>
</tr>
<tr class="row-even"><td><p>semeval13</p></td>
<td class="text-align:center"><p>3</p></td>
<td class="text-align:center"><p>11338</p></td>
<td class="text-align:center"><p>3813</p></td>
<td class="text-align:center"><p>1215742</p></td>
<td class="text-align:center"><p>[0.159, 0.470, 0.372]</p></td>
<td class="text-align:center"><p>[0.158, 0.430, 0.412]</p></td>
<td class="text-center"><p>3</p></td>
<td class="text-center"><p>11338</p></td>
<td class="text-center"><p>3813</p></td>
<td class="text-center"><p>1215742</p></td>
<td class="text-center"><p>[0.159, 0.470, 0.372]</p></td>
<td class="text-center"><p>[0.158, 0.430, 0.412]</p></td>
<td><p>sparse</p></td>
</tr>
<tr class="row-odd"><td><p>semeval14</p></td>
<td class="text-align:center"><p>3</p></td>
<td class="text-align:center"><p>11338</p></td>
<td class="text-align:center"><p>1853</p></td>
<td class="text-align:center"><p>1215742</p></td>
<td class="text-align:center"><p>[0.159, 0.470, 0.372]</p></td>
<td class="text-align:center"><p>[0.109, 0.361, 0.530]</p></td>
<td class="text-center"><p>3</p></td>
<td class="text-center"><p>11338</p></td>
<td class="text-center"><p>1853</p></td>
<td class="text-center"><p>1215742</p></td>
<td class="text-center"><p>[0.159, 0.470, 0.372]</p></td>
<td class="text-center"><p>[0.109, 0.361, 0.530]</p></td>
<td><p>sparse</p></td>
</tr>
<tr class="row-even"><td><p>semeval15</p></td>
<td class="text-align:center"><p>3</p></td>
<td class="text-align:center"><p>11338</p></td>
<td class="text-align:center"><p>2390</p></td>
<td class="text-align:center"><p>1215742</p></td>
<td class="text-align:center"><p>[0.159, 0.470, 0.372]</p></td>
<td class="text-align:center"><p>[0.153, 0.413, 0.434]</p></td>
<td class="text-center"><p>3</p></td>
<td class="text-center"><p>11338</p></td>
<td class="text-center"><p>2390</p></td>
<td class="text-center"><p>1215742</p></td>
<td class="text-center"><p>[0.159, 0.470, 0.372]</p></td>
<td class="text-center"><p>[0.153, 0.413, 0.434]</p></td>
<td><p>sparse</p></td>
</tr>
<tr class="row-odd"><td><p>semeval16</p></td>
<td class="text-align:center"><p>3</p></td>
<td class="text-align:center"><p>8000</p></td>
<td class="text-align:center"><p>2000</p></td>
<td class="text-align:center"><p>889504</p></td>
<td class="text-align:center"><p>[0.157, 0.351, 0.492]</p></td>
<td class="text-align:center"><p>[0.163, 0.341, 0.497]</p></td>
<td class="text-center"><p>3</p></td>
<td class="text-center"><p>8000</p></td>
<td class="text-center"><p>2000</p></td>
<td class="text-center"><p>889504</p></td>
<td class="text-center"><p>[0.157, 0.351, 0.492]</p></td>
<td class="text-center"><p>[0.163, 0.341, 0.497]</p></td>
<td><p>sparse</p></td>
</tr>
<tr class="row-even"><td><p>sst</p></td>
<td class="text-align:center"><p>3</p></td>
<td class="text-align:center"><p>2971</p></td>
<td class="text-align:center"><p>1271</p></td>
<td class="text-align:center"><p>376132</p></td>
<td class="text-align:center"><p>[0.261, 0.452, 0.288]</p></td>
<td class="text-align:center"><p>[0.207, 0.481, 0.312]</p></td>
<td class="text-center"><p>3</p></td>
<td class="text-center"><p>2971</p></td>
<td class="text-center"><p>1271</p></td>
<td class="text-center"><p>376132</p></td>
<td class="text-center"><p>[0.261, 0.452, 0.288]</p></td>
<td class="text-center"><p>[0.207, 0.481, 0.312]</p></td>
<td><p>sparse</p></td>
</tr>
<tr class="row-odd"><td><p>wa</p></td>
<td class="text-align:center"><p>3</p></td>
<td class="text-align:center"><p>2184</p></td>
<td class="text-align:center"><p>936</p></td>
<td class="text-align:center"><p>248563</p></td>
<td class="text-align:center"><p>[0.305, 0.414, 0.281]</p></td>
<td class="text-align:center"><p>[0.282, 0.446, 0.272]</p></td>
<td class="text-center"><p>3</p></td>
<td class="text-center"><p>2184</p></td>
<td class="text-center"><p>936</p></td>
<td class="text-center"><p>248563</p></td>
<td class="text-center"><p>[0.305, 0.414, 0.281]</p></td>
<td class="text-center"><p>[0.282, 0.446, 0.272]</p></td>
<td><p>sparse</p></td>
</tr>
<tr class="row-even"><td><p>wb</p></td>
<td class="text-align:center"><p>3</p></td>
<td class="text-align:center"><p>4259</p></td>
<td class="text-align:center"><p>1823</p></td>
<td class="text-align:center"><p>404333</p></td>
<td class="text-align:center"><p>[0.270, 0.392, 0.337]</p></td>
<td class="text-align:center"><p>[0.274, 0.392, 0.335]</p></td>
<td class="text-center"><p>3</p></td>
<td class="text-center"><p>4259</p></td>
<td class="text-center"><p>1823</p></td>
<td class="text-center"><p>404333</p></td>
<td class="text-center"><p>[0.270, 0.392, 0.337]</p></td>
<td class="text-center"><p>[0.274, 0.392, 0.335]</p></td>
<td><p>sparse</p></td>
</tr>
</tbody>
</table>
</div>
<div class="section" id="uci-machine-learning">
<h2>UCI Machine Learning<a class="headerlink" href="#uci-machine-learning" title="Permalink to this headline"></a></h2>
</section>
<section id="uci-machine-learning">
<h2>UCI Machine Learning<a class="headerlink" href="#uci-machine-learning" title="Permalink to this heading"></a></h2>
<p>A set of 32 datasets from the <a class="reference external" href="https://archive.ics.uci.edu/ml/datasets.php">UCI Machine Learning repository</a>
used in:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">Pérez</span><span class="o">-</span><span class="n">Gállego</span><span class="p">,</span> <span class="n">P</span><span class="o">.</span><span class="p">,</span> <span class="n">Quevedo</span><span class="p">,</span> <span class="n">J</span><span class="o">.</span> <span class="n">R</span><span class="o">.</span><span class="p">,</span> <span class="o">&amp;</span> <span class="k">del</span> <span class="n">Coz</span><span class="p">,</span> <span class="n">J</span><span class="o">.</span> <span class="n">J</span><span class="o">.</span> <span class="p">(</span><span class="mi">2017</span><span class="p">)</span><span class="o">.</span>
@ -371,252 +355,252 @@ training+test dataset at a time, following a kFCV protocol:</p>
<p>Above code will allow to conduct a 2x5FCV evaluation on the “yeast” dataset.</p>
<p>All datasets come in numerical form (dense matrices); some statistics
are summarized below.</p>
<table class="colwidths-auto docutils align-default">
<table class="docutils align-default">
<thead>
<tr class="row-odd"><th class="head"><p>Dataset</p></th>
<th class="text-align:center head"><p>classes</p></th>
<th class="text-align:center head"><p>instances</p></th>
<th class="text-align:center head"><p>features</p></th>
<th class="text-align:center head"><p>prev</p></th>
<th class="head text-center"><p>classes</p></th>
<th class="head text-center"><p>instances</p></th>
<th class="head text-center"><p>features</p></th>
<th class="head text-center"><p>prev</p></th>
<th class="head"><p>type</p></th>
</tr>
</thead>
<tbody>
<tr class="row-even"><td><p>acute.a</p></td>
<td class="text-align:center"><p>2</p></td>
<td class="text-align:center"><p>120</p></td>
<td class="text-align:center"><p>6</p></td>
<td class="text-align:center"><p>[0.508, 0.492]</p></td>
<td class="text-center"><p>2</p></td>
<td class="text-center"><p>120</p></td>
<td class="text-center"><p>6</p></td>
<td class="text-center"><p>[0.508, 0.492]</p></td>
<td><p>dense</p></td>
</tr>
<tr class="row-odd"><td><p>acute.b</p></td>
<td class="text-align:center"><p>2</p></td>
<td class="text-align:center"><p>120</p></td>
<td class="text-align:center"><p>6</p></td>
<td class="text-align:center"><p>[0.583, 0.417]</p></td>
<td class="text-center"><p>2</p></td>
<td class="text-center"><p>120</p></td>
<td class="text-center"><p>6</p></td>
<td class="text-center"><p>[0.583, 0.417]</p></td>
<td><p>dense</p></td>
</tr>
<tr class="row-even"><td><p>balance.1</p></td>
<td class="text-align:center"><p>2</p></td>
<td class="text-align:center"><p>625</p></td>
<td class="text-align:center"><p>4</p></td>
<td class="text-align:center"><p>[0.539, 0.461]</p></td>
<td class="text-center"><p>2</p></td>
<td class="text-center"><p>625</p></td>
<td class="text-center"><p>4</p></td>
<td class="text-center"><p>[0.539, 0.461]</p></td>
<td><p>dense</p></td>
</tr>
<tr class="row-odd"><td><p>balance.2</p></td>
<td class="text-align:center"><p>2</p></td>
<td class="text-align:center"><p>625</p></td>
<td class="text-align:center"><p>4</p></td>
<td class="text-align:center"><p>[0.922, 0.078]</p></td>
<td class="text-center"><p>2</p></td>
<td class="text-center"><p>625</p></td>
<td class="text-center"><p>4</p></td>
<td class="text-center"><p>[0.922, 0.078]</p></td>
<td><p>dense</p></td>
</tr>
<tr class="row-even"><td><p>balance.3</p></td>
<td class="text-align:center"><p>2</p></td>
<td class="text-align:center"><p>625</p></td>
<td class="text-align:center"><p>4</p></td>
<td class="text-align:center"><p>[0.539, 0.461]</p></td>
<td class="text-center"><p>2</p></td>
<td class="text-center"><p>625</p></td>
<td class="text-center"><p>4</p></td>
<td class="text-center"><p>[0.539, 0.461]</p></td>
<td><p>dense</p></td>
</tr>
<tr class="row-odd"><td><p>breast-cancer</p></td>
<td class="text-align:center"><p>2</p></td>
<td class="text-align:center"><p>683</p></td>
<td class="text-align:center"><p>9</p></td>
<td class="text-align:center"><p>[0.350, 0.650]</p></td>
<td class="text-center"><p>2</p></td>
<td class="text-center"><p>683</p></td>
<td class="text-center"><p>9</p></td>
<td class="text-center"><p>[0.350, 0.650]</p></td>
<td><p>dense</p></td>
</tr>
<tr class="row-even"><td><p>cmc.1</p></td>
<td class="text-align:center"><p>2</p></td>
<td class="text-align:center"><p>1473</p></td>
<td class="text-align:center"><p>9</p></td>
<td class="text-align:center"><p>[0.573, 0.427]</p></td>
<td class="text-center"><p>2</p></td>
<td class="text-center"><p>1473</p></td>
<td class="text-center"><p>9</p></td>
<td class="text-center"><p>[0.573, 0.427]</p></td>
<td><p>dense</p></td>
</tr>
<tr class="row-odd"><td><p>cmc.2</p></td>
<td class="text-align:center"><p>2</p></td>
<td class="text-align:center"><p>1473</p></td>
<td class="text-align:center"><p>9</p></td>
<td class="text-align:center"><p>[0.774, 0.226]</p></td>
<td class="text-center"><p>2</p></td>
<td class="text-center"><p>1473</p></td>
<td class="text-center"><p>9</p></td>
<td class="text-center"><p>[0.774, 0.226]</p></td>
<td><p>dense</p></td>
</tr>
<tr class="row-even"><td><p>cmc.3</p></td>
<td class="text-align:center"><p>2</p></td>
<td class="text-align:center"><p>1473</p></td>
<td class="text-align:center"><p>9</p></td>
<td class="text-align:center"><p>[0.653, 0.347]</p></td>
<td class="text-center"><p>2</p></td>
<td class="text-center"><p>1473</p></td>
<td class="text-center"><p>9</p></td>
<td class="text-center"><p>[0.653, 0.347]</p></td>
<td><p>dense</p></td>
</tr>
<tr class="row-odd"><td><p>ctg.1</p></td>
<td class="text-align:center"><p>2</p></td>
<td class="text-align:center"><p>2126</p></td>
<td class="text-align:center"><p>22</p></td>
<td class="text-align:center"><p>[0.222, 0.778]</p></td>
<td class="text-center"><p>2</p></td>
<td class="text-center"><p>2126</p></td>
<td class="text-center"><p>22</p></td>
<td class="text-center"><p>[0.222, 0.778]</p></td>
<td><p>dense</p></td>
</tr>
<tr class="row-even"><td><p>ctg.2</p></td>
<td class="text-align:center"><p>2</p></td>
<td class="text-align:center"><p>2126</p></td>
<td class="text-align:center"><p>22</p></td>
<td class="text-align:center"><p>[0.861, 0.139]</p></td>
<td class="text-center"><p>2</p></td>
<td class="text-center"><p>2126</p></td>
<td class="text-center"><p>22</p></td>
<td class="text-center"><p>[0.861, 0.139]</p></td>
<td><p>dense</p></td>
</tr>
<tr class="row-odd"><td><p>ctg.3</p></td>
<td class="text-align:center"><p>2</p></td>
<td class="text-align:center"><p>2126</p></td>
<td class="text-align:center"><p>22</p></td>
<td class="text-align:center"><p>[0.917, 0.083]</p></td>
<td class="text-center"><p>2</p></td>
<td class="text-center"><p>2126</p></td>
<td class="text-center"><p>22</p></td>
<td class="text-center"><p>[0.917, 0.083]</p></td>
<td><p>dense</p></td>
</tr>
<tr class="row-even"><td><p>german</p></td>
<td class="text-align:center"><p>2</p></td>
<td class="text-align:center"><p>1000</p></td>
<td class="text-align:center"><p>24</p></td>
<td class="text-align:center"><p>[0.300, 0.700]</p></td>
<td class="text-center"><p>2</p></td>
<td class="text-center"><p>1000</p></td>
<td class="text-center"><p>24</p></td>
<td class="text-center"><p>[0.300, 0.700]</p></td>
<td><p>dense</p></td>
</tr>
<tr class="row-odd"><td><p>haberman</p></td>
<td class="text-align:center"><p>2</p></td>
<td class="text-align:center"><p>306</p></td>
<td class="text-align:center"><p>3</p></td>
<td class="text-align:center"><p>[0.735, 0.265]</p></td>
<td class="text-center"><p>2</p></td>
<td class="text-center"><p>306</p></td>
<td class="text-center"><p>3</p></td>
<td class="text-center"><p>[0.735, 0.265]</p></td>
<td><p>dense</p></td>
</tr>
<tr class="row-even"><td><p>ionosphere</p></td>
<td class="text-align:center"><p>2</p></td>
<td class="text-align:center"><p>351</p></td>
<td class="text-align:center"><p>34</p></td>
<td class="text-align:center"><p>[0.641, 0.359]</p></td>
<td class="text-center"><p>2</p></td>
<td class="text-center"><p>351</p></td>
<td class="text-center"><p>34</p></td>
<td class="text-center"><p>[0.641, 0.359]</p></td>
<td><p>dense</p></td>
</tr>
<tr class="row-odd"><td><p>iris.1</p></td>
<td class="text-align:center"><p>2</p></td>
<td class="text-align:center"><p>150</p></td>
<td class="text-align:center"><p>4</p></td>
<td class="text-align:center"><p>[0.667, 0.333]</p></td>
<td class="text-center"><p>2</p></td>
<td class="text-center"><p>150</p></td>
<td class="text-center"><p>4</p></td>
<td class="text-center"><p>[0.667, 0.333]</p></td>
<td><p>dense</p></td>
</tr>
<tr class="row-even"><td><p>iris.2</p></td>
<td class="text-align:center"><p>2</p></td>
<td class="text-align:center"><p>150</p></td>
<td class="text-align:center"><p>4</p></td>
<td class="text-align:center"><p>[0.667, 0.333]</p></td>
<td class="text-center"><p>2</p></td>
<td class="text-center"><p>150</p></td>
<td class="text-center"><p>4</p></td>
<td class="text-center"><p>[0.667, 0.333]</p></td>
<td><p>dense</p></td>
</tr>
<tr class="row-odd"><td><p>iris.3</p></td>
<td class="text-align:center"><p>2</p></td>
<td class="text-align:center"><p>150</p></td>
<td class="text-align:center"><p>4</p></td>
<td class="text-align:center"><p>[0.667, 0.333]</p></td>
<td class="text-center"><p>2</p></td>
<td class="text-center"><p>150</p></td>
<td class="text-center"><p>4</p></td>
<td class="text-center"><p>[0.667, 0.333]</p></td>
<td><p>dense</p></td>
</tr>
<tr class="row-even"><td><p>mammographic</p></td>
<td class="text-align:center"><p>2</p></td>
<td class="text-align:center"><p>830</p></td>
<td class="text-align:center"><p>5</p></td>
<td class="text-align:center"><p>[0.514, 0.486]</p></td>
<td class="text-center"><p>2</p></td>
<td class="text-center"><p>830</p></td>
<td class="text-center"><p>5</p></td>
<td class="text-center"><p>[0.514, 0.486]</p></td>
<td><p>dense</p></td>
</tr>
<tr class="row-odd"><td><p>pageblocks.5</p></td>
<td class="text-align:center"><p>2</p></td>
<td class="text-align:center"><p>5473</p></td>
<td class="text-align:center"><p>10</p></td>
<td class="text-align:center"><p>[0.979, 0.021]</p></td>
<td class="text-center"><p>2</p></td>
<td class="text-center"><p>5473</p></td>
<td class="text-center"><p>10</p></td>
<td class="text-center"><p>[0.979, 0.021]</p></td>
<td><p>dense</p></td>
</tr>
<tr class="row-even"><td><p>semeion</p></td>
<td class="text-align:center"><p>2</p></td>
<td class="text-align:center"><p>1593</p></td>
<td class="text-align:center"><p>256</p></td>
<td class="text-align:center"><p>[0.901, 0.099]</p></td>
<td class="text-center"><p>2</p></td>
<td class="text-center"><p>1593</p></td>
<td class="text-center"><p>256</p></td>
<td class="text-center"><p>[0.901, 0.099]</p></td>
<td><p>dense</p></td>
</tr>
<tr class="row-odd"><td><p>sonar</p></td>
<td class="text-align:center"><p>2</p></td>
<td class="text-align:center"><p>208</p></td>
<td class="text-align:center"><p>60</p></td>
<td class="text-align:center"><p>[0.534, 0.466]</p></td>
<td class="text-center"><p>2</p></td>
<td class="text-center"><p>208</p></td>
<td class="text-center"><p>60</p></td>
<td class="text-center"><p>[0.534, 0.466]</p></td>
<td><p>dense</p></td>
</tr>
<tr class="row-even"><td><p>spambase</p></td>
<td class="text-align:center"><p>2</p></td>
<td class="text-align:center"><p>4601</p></td>
<td class="text-align:center"><p>57</p></td>
<td class="text-align:center"><p>[0.606, 0.394]</p></td>
<td class="text-center"><p>2</p></td>
<td class="text-center"><p>4601</p></td>
<td class="text-center"><p>57</p></td>
<td class="text-center"><p>[0.606, 0.394]</p></td>
<td><p>dense</p></td>
</tr>
<tr class="row-odd"><td><p>spectf</p></td>
<td class="text-align:center"><p>2</p></td>
<td class="text-align:center"><p>267</p></td>
<td class="text-align:center"><p>44</p></td>
<td class="text-align:center"><p>[0.794, 0.206]</p></td>
<td class="text-center"><p>2</p></td>
<td class="text-center"><p>267</p></td>
<td class="text-center"><p>44</p></td>
<td class="text-center"><p>[0.794, 0.206]</p></td>
<td><p>dense</p></td>
</tr>
<tr class="row-even"><td><p>tictactoe</p></td>
<td class="text-align:center"><p>2</p></td>
<td class="text-align:center"><p>958</p></td>
<td class="text-align:center"><p>9</p></td>
<td class="text-align:center"><p>[0.653, 0.347]</p></td>
<td class="text-center"><p>2</p></td>
<td class="text-center"><p>958</p></td>
<td class="text-center"><p>9</p></td>
<td class="text-center"><p>[0.653, 0.347]</p></td>
<td><p>dense</p></td>
</tr>
<tr class="row-odd"><td><p>transfusion</p></td>
<td class="text-align:center"><p>2</p></td>
<td class="text-align:center"><p>748</p></td>
<td class="text-align:center"><p>4</p></td>
<td class="text-align:center"><p>[0.762, 0.238]</p></td>
<td class="text-center"><p>2</p></td>
<td class="text-center"><p>748</p></td>
<td class="text-center"><p>4</p></td>
<td class="text-center"><p>[0.762, 0.238]</p></td>
<td><p>dense</p></td>
</tr>
<tr class="row-even"><td><p>wdbc</p></td>
<td class="text-align:center"><p>2</p></td>
<td class="text-align:center"><p>569</p></td>
<td class="text-align:center"><p>30</p></td>
<td class="text-align:center"><p>[0.627, 0.373]</p></td>
<td class="text-center"><p>2</p></td>
<td class="text-center"><p>569</p></td>
<td class="text-center"><p>30</p></td>
<td class="text-center"><p>[0.627, 0.373]</p></td>
<td><p>dense</p></td>
</tr>
<tr class="row-odd"><td><p>wine.1</p></td>
<td class="text-align:center"><p>2</p></td>
<td class="text-align:center"><p>178</p></td>
<td class="text-align:center"><p>13</p></td>
<td class="text-align:center"><p>[0.669, 0.331]</p></td>
<td class="text-center"><p>2</p></td>
<td class="text-center"><p>178</p></td>
<td class="text-center"><p>13</p></td>
<td class="text-center"><p>[0.669, 0.331]</p></td>
<td><p>dense</p></td>
</tr>
<tr class="row-even"><td><p>wine.2</p></td>
<td class="text-align:center"><p>2</p></td>
<td class="text-align:center"><p>178</p></td>
<td class="text-align:center"><p>13</p></td>
<td class="text-align:center"><p>[0.601, 0.399]</p></td>
<td class="text-center"><p>2</p></td>
<td class="text-center"><p>178</p></td>
<td class="text-center"><p>13</p></td>
<td class="text-center"><p>[0.601, 0.399]</p></td>
<td><p>dense</p></td>
</tr>
<tr class="row-odd"><td><p>wine.3</p></td>
<td class="text-align:center"><p>2</p></td>
<td class="text-align:center"><p>178</p></td>
<td class="text-align:center"><p>13</p></td>
<td class="text-align:center"><p>[0.730, 0.270]</p></td>
<td class="text-center"><p>2</p></td>
<td class="text-center"><p>178</p></td>
<td class="text-center"><p>13</p></td>
<td class="text-center"><p>[0.730, 0.270]</p></td>
<td><p>dense</p></td>
</tr>
<tr class="row-even"><td><p>wine-q-red</p></td>
<td class="text-align:center"><p>2</p></td>
<td class="text-align:center"><p>1599</p></td>
<td class="text-align:center"><p>11</p></td>
<td class="text-align:center"><p>[0.465, 0.535]</p></td>
<td class="text-center"><p>2</p></td>
<td class="text-center"><p>1599</p></td>
<td class="text-center"><p>11</p></td>
<td class="text-center"><p>[0.465, 0.535]</p></td>
<td><p>dense</p></td>
</tr>
<tr class="row-odd"><td><p>wine-q-white</p></td>
<td class="text-align:center"><p>2</p></td>
<td class="text-align:center"><p>4898</p></td>
<td class="text-align:center"><p>11</p></td>
<td class="text-align:center"><p>[0.335, 0.665]</p></td>
<td class="text-center"><p>2</p></td>
<td class="text-center"><p>4898</p></td>
<td class="text-center"><p>11</p></td>
<td class="text-center"><p>[0.335, 0.665]</p></td>
<td><p>dense</p></td>
</tr>
<tr class="row-even"><td><p>yeast</p></td>
<td class="text-align:center"><p>2</p></td>
<td class="text-align:center"><p>1484</p></td>
<td class="text-align:center"><p>8</p></td>
<td class="text-align:center"><p>[0.711, 0.289]</p></td>
<td class="text-center"><p>2</p></td>
<td class="text-center"><p>1484</p></td>
<td class="text-center"><p>8</p></td>
<td class="text-center"><p>[0.711, 0.289]</p></td>
<td><p>dense</p></td>
</tr>
</tbody>
</table>
<div class="section" id="issues">
<h3>Issues:<a class="headerlink" href="#issues" title="Permalink to this headline"></a></h3>
<section id="issues">
<h3>Issues:<a class="headerlink" href="#issues" title="Permalink to this heading"></a></h3>
<p>All datasets will be downloaded automatically the first time they are requested, and
stored in the <em>quapy_data</em> folder for faster further reuse.
However, some datasets require special actions that at the moment are not fully
@ -631,10 +615,82 @@ standard Pythons packages like gzip or zip. This file would need to be uncompres
OS-dependent software manually. Information on how to do it will be printed the first
time the dataset is invoked.</p></li>
</ul>
</section>
</section>
<section id="lequa-datasets">
<h2>LeQua Datasets<a class="headerlink" href="#lequa-datasets" title="Permalink to this heading"></a></h2>
<p>QuaPy also provides the datasets used for the LeQua competition.
In brief, there are 4 tasks (T1A, T1B, T2A, T2B) having to do with text quantification
problems. Tasks T1A and T1B provide documents in vector form, while T2A and T2B provide
raw documents instead.
Tasks T1A and T2A are binary sentiment quantification problems, while T2A and T2B
are multiclass quantification problems consisting of estimating the class prevalence
values of 28 different merchandise products.</p>
<p>Every task consists of a training set, a set of validation samples (for model selection)
and a set of test samples (for evaluation). QuaPy returns this data as a LabelledCollection
(training) and two generation protocols (for validation and test samples), as follows:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">training</span><span class="p">,</span> <span class="n">val_generator</span><span class="p">,</span> <span class="n">test_generator</span> <span class="o">=</span> <span class="n">fetch_lequa2022</span><span class="p">(</span><span class="n">task</span><span class="o">=</span><span class="n">task</span><span class="p">)</span>
</pre></div>
</div>
<p>See the <code class="docutils literal notranslate"><span class="pre">lequa2022_experiments.py</span></code> in the examples folder for further details on how to
carry out experiments using these datasets.</p>
<p>The datasets are downloaded only once, and stored for fast reuse.</p>
<p>Some statistics are summarized below:</p>
<table class="docutils align-default">
<thead>
<tr class="row-odd"><th class="head"><p>Dataset</p></th>
<th class="head text-center"><p>classes</p></th>
<th class="head text-center"><p>train size</p></th>
<th class="head text-center"><p>validation samples</p></th>
<th class="head text-center"><p>test samples</p></th>
<th class="head text-center"><p>docs by sample</p></th>
<th class="head text-center"><p>type</p></th>
</tr>
</thead>
<tbody>
<tr class="row-even"><td><p>T1A</p></td>
<td class="text-center"><p>2</p></td>
<td class="text-center"><p>5000</p></td>
<td class="text-center"><p>1000</p></td>
<td class="text-center"><p>5000</p></td>
<td class="text-center"><p>250</p></td>
<td class="text-center"><p>vector</p></td>
</tr>
<tr class="row-odd"><td><p>T1B</p></td>
<td class="text-center"><p>28</p></td>
<td class="text-center"><p>20000</p></td>
<td class="text-center"><p>1000</p></td>
<td class="text-center"><p>5000</p></td>
<td class="text-center"><p>1000</p></td>
<td class="text-center"><p>vector</p></td>
</tr>
<tr class="row-even"><td><p>T2A</p></td>
<td class="text-center"><p>2</p></td>
<td class="text-center"><p>5000</p></td>
<td class="text-center"><p>1000</p></td>
<td class="text-center"><p>5000</p></td>
<td class="text-center"><p>250</p></td>
<td class="text-center"><p>text</p></td>
</tr>
<tr class="row-odd"><td><p>T2B</p></td>
<td class="text-center"><p>28</p></td>
<td class="text-center"><p>20000</p></td>
<td class="text-center"><p>1000</p></td>
<td class="text-center"><p>5000</p></td>
<td class="text-center"><p>1000</p></td>
<td class="text-center"><p>text</p></td>
</tr>
</tbody>
</table>
<p>For further details on the datasets, we refer to the original
<a class="reference external" href="https://ceur-ws.org/Vol-3180/paper-146.pdf">paper</a>:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">Esuli</span><span class="p">,</span> <span class="n">A</span><span class="o">.</span><span class="p">,</span> <span class="n">Moreo</span><span class="p">,</span> <span class="n">A</span><span class="o">.</span><span class="p">,</span> <span class="n">Sebastiani</span><span class="p">,</span> <span class="n">F</span><span class="o">.</span><span class="p">,</span> <span class="o">&amp;</span> <span class="n">Sperduti</span><span class="p">,</span> <span class="n">G</span><span class="o">.</span> <span class="p">(</span><span class="mi">2022</span><span class="p">)</span><span class="o">.</span>
<span class="n">A</span> <span class="n">Detailed</span> <span class="n">Overview</span> <span class="n">of</span> <span class="n">LeQua</span><span class="o">@</span> <span class="n">CLEF</span> <span class="mi">2022</span><span class="p">:</span> <span class="n">Learning</span> <span class="n">to</span> <span class="n">Quantify</span><span class="o">.</span>
</pre></div>
</div>
<div class="section" id="adding-custom-datasets">
<h2>Adding Custom Datasets<a class="headerlink" href="#adding-custom-datasets" title="Permalink to this headline"></a></h2>
</section>
<section id="adding-custom-datasets">
<h2>Adding Custom Datasets<a class="headerlink" href="#adding-custom-datasets" title="Permalink to this heading"></a></h2>
<p>QuaPy provides data loaders for simple formats dealing with
text, following the format:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">class</span><span class="o">-</span><span class="nb">id</span> \<span class="n">t</span> <span class="n">first</span> <span class="n">document</span><span class="s1">&#39;s pre-processed text </span><span class="se">\n</span>
@ -664,17 +720,20 @@ all classes to be present in the collection).</p>
paths, in order to create a training and test pair of <em>LabelledCollection</em>,
e.g.:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">quapy</span> <span class="k">as</span> <span class="nn">qp</span>
<span class="n">train_path</span> <span class="o">=</span> <span class="s1">&#39;../my_data/train.dat&#39;</span>
<span class="n">test_path</span> <span class="o">=</span> <span class="s1">&#39;../my_data/test.dat&#39;</span>
<span class="k">def</span> <span class="nf">my_custom_loader</span><span class="p">(</span><span class="n">path</span><span class="p">):</span>
<span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="n">path</span><span class="p">,</span> <span class="s1">&#39;rb&#39;</span><span class="p">)</span> <span class="k">as</span> <span class="n">fin</span><span class="p">:</span>
<span class="o">...</span>
<span class="k">return</span> <span class="n">instances</span><span class="p">,</span> <span class="n">labels</span>
<span class="n">data</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">data</span><span class="o">.</span><span class="n">Dataset</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="n">train_path</span><span class="p">,</span> <span class="n">test_path</span><span class="p">,</span> <span class="n">my_custom_loader</span><span class="p">)</span>
</pre></div>
</div>
<div class="section" id="data-processing">
<h3>Data Processing<a class="headerlink" href="#data-processing" title="Permalink to this headline"></a></h3>
<section id="data-processing">
<h3>Data Processing<a class="headerlink" href="#data-processing" title="Permalink to this heading"></a></h3>
<p>QuaPy implements a number of preprocessing functions in the package <em>qp.data.preprocessing</em>, including:</p>
<ul class="simple">
<li><p><em>text2tfidf</em>: tfidf vectorization</p></li>
@ -683,9 +742,9 @@ e.g.:</p>
that the column values have zero mean and unit variance).</p></li>
<li><p><em>index</em>: transforms textual tokens into lists of numeric ids)</p></li>
</ul>
</div>
</div>
</div>
</section>
</section>
</section>
<div class="clearer"></div>
@ -694,8 +753,9 @@ that the column values have zero mean and unit variance).</p></li>
</div>
<div class="sphinxsidebar" role="navigation" aria-label="main navigation">
<div class="sphinxsidebarwrapper">
<h3><a href="index.html">Table of Contents</a></h3>
<ul>
<div>
<h3><a href="index.html">Table of Contents</a></h3>
<ul>
<li><a class="reference internal" href="#">Datasets</a><ul>
<li><a class="reference internal" href="#reviews-datasets">Reviews Datasets</a></li>
<li><a class="reference internal" href="#twitter-sentiment-datasets">Twitter Sentiment Datasets</a></li>
@ -703,6 +763,7 @@ that the column values have zero mean and unit variance).</p></li>
<li><a class="reference internal" href="#issues">Issues:</a></li>
</ul>
</li>
<li><a class="reference internal" href="#lequa-datasets">LeQua Datasets</a></li>
<li><a class="reference internal" href="#adding-custom-datasets">Adding Custom Datasets</a><ul>
<li><a class="reference internal" href="#data-processing">Data Processing</a></li>
</ul>
@ -711,12 +772,17 @@ that the column values have zero mean and unit variance).</p></li>
</li>
</ul>
<h4>Previous topic</h4>
<p class="topless"><a href="readme.html"
title="previous chapter">Getting Started</a></p>
<h4>Next topic</h4>
<p class="topless"><a href="modules.html"
title="next chapter">quapy</a></p>
</div>
<div>
<h4>Previous topic</h4>
<p class="topless"><a href="Installation.html"
title="previous chapter">Installation</a></p>
</div>
<div>
<h4>Next topic</h4>
<p class="topless"><a href="Evaluation.html"
title="next chapter">Evaluation</a></p>
</div>
<div role="note" aria-label="source link">
<h3>This Page</h3>
<ul class="this-page-menu">
@ -733,7 +799,7 @@ that the column values have zero mean and unit variance).</p></li>
</form>
</div>
</div>
<script>$('#searchbox').show(0);</script>
<script>document.getElementById('searchbox').style.display = "block"</script>
</div>
</div>
<div class="clearer"></div>
@ -748,18 +814,18 @@ that the column values have zero mean and unit variance).</p></li>
<a href="py-modindex.html" title="Python Module Index"
>modules</a> |</li>
<li class="right" >
<a href="modules.html" title="quapy"
<a href="Evaluation.html" title="Evaluation"
>next</a> |</li>
<li class="right" >
<a href="readme.html" title="Getting Started"
<a href="Installation.html" title="Installation"
>previous</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.6 documentation</a> &#187;</li>
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.7 documentation</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">Datasets</a></li>
</ul>
</div>
<div class="footer" role="contentinfo">
&#169; Copyright 2021, Alejandro Moreo.
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 4.2.0.
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 5.3.0.
</div>
</body>
</html>

View File

@ -2,22 +2,25 @@
<!doctype html>
<html>
<html lang="en">
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Evaluation &#8212; QuaPy 0.1.6 documentation</title>
<meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="generator" content="Docutils 0.19: https://docutils.sourceforge.io/" />
<title>Evaluation &#8212; QuaPy 0.1.7 documentation</title>
<link rel="stylesheet" type="text/css" href="_static/pygments.css" />
<link rel="stylesheet" type="text/css" href="_static/bizstyle.css" />
<script data-url_root="./" id="documentation_options" src="_static/documentation_options.js"></script>
<script src="_static/jquery.js"></script>
<script src="_static/underscore.js"></script>
<script src="_static/_sphinx_javascript_frameworks_compat.js"></script>
<script src="_static/doctools.js"></script>
<script src="_static/sphinx_highlight.js"></script>
<script src="_static/bizstyle.js"></script>
<link rel="index" title="Index" href="genindex.html" />
<link rel="search" title="Search" href="search.html" />
<link rel="next" title="Quantification Methods" href="Methods.html" />
<link rel="next" title="Protocols" href="Protocols.html" />
<link rel="prev" title="Datasets" href="Datasets.html" />
<meta name="viewport" content="width=device-width,initial-scale=1.0" />
<!--[if lt IE 9]>
@ -34,12 +37,12 @@
<a href="py-modindex.html" title="Python Module Index"
>modules</a> |</li>
<li class="right" >
<a href="Methods.html" title="Quantification Methods"
<a href="Protocols.html" title="Protocols"
accesskey="N">next</a> |</li>
<li class="right" >
<a href="Datasets.html" title="Datasets"
accesskey="P">previous</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.6 documentation</a> &#187;</li>
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.7 documentation</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">Evaluation</a></li>
</ul>
</div>
@ -49,8 +52,8 @@
<div class="bodywrapper">
<div class="body" role="main">
<div class="tex2jax_ignore mathjax_ignore section" id="evaluation">
<h1>Evaluation<a class="headerlink" href="#evaluation" title="Permalink to this headline"></a></h1>
<section id="evaluation">
<h1>Evaluation<a class="headerlink" href="#evaluation" title="Permalink to this heading"></a></h1>
<p>Quantification is an appealing tool in scenarios of dataset shift,
and particularly in scenarios of prior-probability shift.
That is, the interest in estimating the class prevalences arises
@ -62,8 +65,8 @@ to be unlikely (as is the case in general scenarios of
machine learning governed by the iid assumption).
In brief, quantification requires dedicated evaluation protocols,
which are implemented in QuaPy and explained here.</p>
<div class="section" id="error-measures">
<h2>Error Measures<a class="headerlink" href="#error-measures" title="Permalink to this headline"></a></h2>
<section id="error-measures">
<h2>Error Measures<a class="headerlink" href="#error-measures" title="Permalink to this heading"></a></h2>
<p>The module quapy.error implements the following error measures for quantification:</p>
<ul class="simple">
<li><p><em>mae</em>: mean absolute error</p></li>
@ -96,13 +99,13 @@ third argument, e.g.:</p>
Traditionally, this value is set to 1/(2T) in past literature,
with T the sampling size. One could either pass this value
to the function each time, or to set a QuaPys environment
variable <em>SAMPLE_SIZE</em> once, and ommit this argument
variable <em>SAMPLE_SIZE</em> once, and omit this argument
thereafter (recommended);
e.g.:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">qp</span><span class="o">.</span><span class="n">environ</span><span class="p">[</span><span class="s1">&#39;SAMPLE_SIZE&#39;</span><span class="p">]</span> <span class="o">=</span> <span class="mi">100</span> <span class="c1"># once for all</span>
<span class="n">true_prev</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">asarray</span><span class="p">([</span><span class="mf">0.5</span><span class="p">,</span> <span class="mf">0.3</span><span class="p">,</span> <span class="mf">0.2</span><span class="p">])</span> <span class="c1"># let&#39;s assume 3 classes</span>
<span class="n">estim_prev</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">asarray</span><span class="p">([</span><span class="mf">0.1</span><span class="p">,</span> <span class="mf">0.3</span><span class="p">,</span> <span class="mf">0.6</span><span class="p">])</span>
<span class="n">error</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">ae_</span><span class="o">.</span><span class="n">mrae</span><span class="p">(</span><span class="n">true_prev</span><span class="p">,</span> <span class="n">estim_prev</span><span class="p">)</span>
<span class="n">error</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">error</span><span class="o">.</span><span class="n">mrae</span><span class="p">(</span><span class="n">true_prev</span><span class="p">,</span> <span class="n">estim_prev</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s1">&#39;mrae(</span><span class="si">{</span><span class="n">true_prev</span><span class="si">}</span><span class="s1">, </span><span class="si">{</span><span class="n">estim_prev</span><span class="si">}</span><span class="s1">) = </span><span class="si">{</span><span class="n">error</span><span class="si">:</span><span class="s1">.3f</span><span class="si">}</span><span class="s1">&#39;</span><span class="p">)</span>
</pre></div>
</div>
@ -112,150 +115,95 @@ e.g.:</p>
</div>
<p>Finally, it is possible to instantiate QuaPys quantification
error functions from strings using, e.g.:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">error_function</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">ae_</span><span class="o">.</span><span class="n">from_name</span><span class="p">(</span><span class="s1">&#39;mse&#39;</span><span class="p">)</span>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">error_function</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">error</span><span class="o">.</span><span class="n">from_name</span><span class="p">(</span><span class="s1">&#39;mse&#39;</span><span class="p">)</span>
<span class="n">error</span> <span class="o">=</span> <span class="n">error_function</span><span class="p">(</span><span class="n">true_prev</span><span class="p">,</span> <span class="n">estim_prev</span><span class="p">)</span>
</pre></div>
</div>
</div>
<div class="section" id="evaluation-protocols">
<h2>Evaluation Protocols<a class="headerlink" href="#evaluation-protocols" title="Permalink to this headline"></a></h2>
<p>QuaPy implements the so-called “artificial sampling protocol”,
according to which a test set is used to generate samplings at
desired prevalences of fixed size and covering the full spectrum
of prevalences. This protocol is called “artificial” in contrast
to the “natural prevalence sampling” protocol that,
despite introducing some variability during sampling, approximately
preserves the training class prevalence.</p>
<p>In the artificial sampling procol, the user specifies the number
of (equally distant) points to be generated from the interval [0,1].</p>
<p>For example, if n_prevpoints=11 then, for each class, the prevalences
[0., 0.1, 0.2, …, 1.] will be used. This means that, for two classes,
the number of different prevalences will be 11 (since, once the prevalence
of one class is determined, the other one is constrained). For 3 classes,
the number of valid combinations can be obtained as 11 + 10 + … + 1 = 66.
In general, the number of valid combinations that will be produced for a given
value of n_prevpoints can be consulted by invoking
quapy.functional.num_prevalence_combinations, e.g.:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">quapy.functional</span> <span class="k">as</span> <span class="nn">F</span>
<span class="n">n_prevpoints</span> <span class="o">=</span> <span class="mi">21</span>
<span class="n">n_classes</span> <span class="o">=</span> <span class="mi">4</span>
<span class="n">n</span> <span class="o">=</span> <span class="n">F</span><span class="o">.</span><span class="n">num_prevalence_combinations</span><span class="p">(</span><span class="n">n_prevpoints</span><span class="p">,</span> <span class="n">n_classes</span><span class="p">,</span> <span class="n">n_repeats</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
</section>
<section id="evaluation-protocols">
<h2>Evaluation Protocols<a class="headerlink" href="#evaluation-protocols" title="Permalink to this heading"></a></h2>
<p>An <em>evaluation protocol</em> is an evaluation procedure that uses
one specific <em>sample generation procotol</em> to genereate many
samples, typically characterized by widely varying amounts of
<em>shift</em> with respect to the original distribution, that are then
used to evaluate the performance of a (trained) quantifier.
These protocols are explained in more detail in a dedicated <a class="reference internal" href="Protocols.html"><span class="doc std std-doc">entry
in the wiki</span></a>. For the moment being, let us assume we already have
chosen and instantiated one specific such protocol, that we here
simply call <em>prot</em>. Let also assume our model is called
<em>quantifier</em> and that our evaluatio measure of choice is
<em>mae</em>. The evaluation comes down to:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">mae</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">evaluation</span><span class="o">.</span><span class="n">evaluate</span><span class="p">(</span><span class="n">quantifier</span><span class="p">,</span> <span class="n">protocol</span><span class="o">=</span><span class="n">prot</span><span class="p">,</span> <span class="n">error_metric</span><span class="o">=</span><span class="s1">&#39;mae&#39;</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s1">&#39;MAE = </span><span class="si">{</span><span class="n">mae</span><span class="si">:</span><span class="s1">.4f</span><span class="si">}</span><span class="s1">&#39;</span><span class="p">)</span>
</pre></div>
</div>
<p>in this example, n=1771. Note the last argument, n_repeats, that
informs of the number of examples that will be generated for any
valid combination (typical values are, e.g., 1 for a single sample,
or 10 or higher for computing standard deviations of performing statistical
significance tests).</p>
<p>One can instead work the other way around, i.e., one could set a
maximum budged of evaluations and get the number of prevalence points that
will generate a number of evaluations close, but not higher, than
the fixed budget. This can be achieved with the function
quapy.functional.get_nprevpoints_approximation, e.g.:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">budget</span> <span class="o">=</span> <span class="mi">5000</span>
<span class="n">n_prevpoints</span> <span class="o">=</span> <span class="n">F</span><span class="o">.</span><span class="n">get_nprevpoints_approximation</span><span class="p">(</span><span class="n">budget</span><span class="p">,</span> <span class="n">n_classes</span><span class="p">,</span> <span class="n">n_repeats</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="n">n</span> <span class="o">=</span> <span class="n">F</span><span class="o">.</span><span class="n">num_prevalence_combinations</span><span class="p">(</span><span class="n">n_prevpoints</span><span class="p">,</span> <span class="n">n_classes</span><span class="p">,</span> <span class="n">n_repeats</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s1">&#39;by setting n_prevpoints=</span><span class="si">{</span><span class="n">n_prevpoints</span><span class="si">}</span><span class="s1"> the number of evaluations for </span><span class="si">{</span><span class="n">n_classes</span><span class="si">}</span><span class="s1"> classes will be </span><span class="si">{</span><span class="n">n</span><span class="si">}</span><span class="s1">&#39;</span><span class="p">)</span>
<p>It is often desirable to evaluate our system using more than one
single evaluatio measure. In this case, it is convenient to generate
a <em>report</em>. A report in QuaPy is a dataframe accounting for all the
true prevalence values with their corresponding prevalence values
as estimated by the quantifier, along with the error each has given
rise.</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">report</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">evaluation</span><span class="o">.</span><span class="n">evaluation_report</span><span class="p">(</span><span class="n">quantifier</span><span class="p">,</span> <span class="n">protocol</span><span class="o">=</span><span class="n">prot</span><span class="p">,</span> <span class="n">error_metrics</span><span class="o">=</span><span class="p">[</span><span class="s1">&#39;mae&#39;</span><span class="p">,</span> <span class="s1">&#39;mrae&#39;</span><span class="p">,</span> <span class="s1">&#39;mkld&#39;</span><span class="p">])</span>
</pre></div>
</div>
<p>that will print:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">by</span> <span class="n">setting</span> <span class="n">n_prevpoints</span><span class="o">=</span><span class="mi">30</span> <span class="n">the</span> <span class="n">number</span> <span class="n">of</span> <span class="n">evaluations</span> <span class="k">for</span> <span class="mi">4</span> <span class="n">classes</span> <span class="n">will</span> <span class="n">be</span> <span class="mi">4960</span>
</pre></div>
</div>
<p>The cost of evaluation will depend on the values of <em>n_prevpoints</em>, <em>n_classes</em>,
and <em>n_repeats</em>. Since it might sometimes be cumbersome to control the overall
cost of an experiment having to do with the number of combinations that
will be generated for a particular setting of these arguments (particularly
when <em>n_classes&gt;2</em>), evaluation functions
typically allow the user to rather specify an <em>evaluation budget</em>, i.e., a maximum
number of samplings to generate. By specifying this argument, one could avoid
specifying <em>n_prevpoints</em>, and the value for it that would lead to a closer
number of evaluation budget, without surpassing it, will be automatically set.</p>
<p>The following script shows a full example in which a PACC model relying
on a Logistic Regressor classifier is
tested on the <em>kindle</em> dataset by means of the artificial prevalence
sampling protocol on samples of size 500, in terms of various
evaluation metrics.</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">quapy</span> <span class="k">as</span> <span class="nn">qp</span>
<span class="kn">import</span> <span class="nn">quapy.functional</span> <span class="k">as</span> <span class="nn">F</span>
<span class="kn">from</span> <span class="nn">sklearn.linear_model</span> <span class="kn">import</span> <span class="n">LogisticRegression</span>
<p>From a pandas dataframe, it is straightforward to visualize all the results,
and compute the averaged values, e.g.:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">pd</span><span class="o">.</span><span class="n">set_option</span><span class="p">(</span><span class="s1">&#39;display.expand_frame_repr&#39;</span><span class="p">,</span> <span class="kc">False</span><span class="p">)</span>
<span class="n">report</span><span class="p">[</span><span class="s1">&#39;estim-prev&#39;</span><span class="p">]</span> <span class="o">=</span> <span class="n">report</span><span class="p">[</span><span class="s1">&#39;estim-prev&#39;</span><span class="p">]</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="n">F</span><span class="o">.</span><span class="n">strprev</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="n">report</span><span class="p">)</span>
<span class="n">qp</span><span class="o">.</span><span class="n">environ</span><span class="p">[</span><span class="s1">&#39;SAMPLE_SIZE&#39;</span><span class="p">]</span> <span class="o">=</span> <span class="mi">500</span>
<span class="n">dataset</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">datasets</span><span class="o">.</span><span class="n">fetch_reviews</span><span class="p">(</span><span class="s1">&#39;kindle&#39;</span><span class="p">)</span>
<span class="n">qp</span><span class="o">.</span><span class="n">data</span><span class="o">.</span><span class="n">preprocessing</span><span class="o">.</span><span class="n">text2tfidf</span><span class="p">(</span><span class="n">dataset</span><span class="p">,</span> <span class="n">min_df</span><span class="o">=</span><span class="mi">5</span><span class="p">,</span> <span class="n">inplace</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
<span class="n">training</span> <span class="o">=</span> <span class="n">dataset</span><span class="o">.</span><span class="n">training</span>
<span class="n">test</span> <span class="o">=</span> <span class="n">dataset</span><span class="o">.</span><span class="n">test</span>
<span class="n">lr</span> <span class="o">=</span> <span class="n">LogisticRegression</span><span class="p">()</span>
<span class="n">pacc</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">method</span><span class="o">.</span><span class="n">aggregative</span><span class="o">.</span><span class="n">PACC</span><span class="p">(</span><span class="n">lr</span><span class="p">)</span>
<span class="n">pacc</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">training</span><span class="p">)</span>
<span class="n">df</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">evaluation</span><span class="o">.</span><span class="n">artificial_sampling_report</span><span class="p">(</span>
<span class="n">pacc</span><span class="p">,</span> <span class="c1"># the quantification method</span>
<span class="n">test</span><span class="p">,</span> <span class="c1"># the test set on which the method will be evaluated</span>
<span class="n">sample_size</span><span class="o">=</span><span class="n">qp</span><span class="o">.</span><span class="n">environ</span><span class="p">[</span><span class="s1">&#39;SAMPLE_SIZE&#39;</span><span class="p">],</span> <span class="c1">#indicates the size of samples to be drawn</span>
<span class="n">n_prevpoints</span><span class="o">=</span><span class="mi">11</span><span class="p">,</span> <span class="c1"># how many prevalence points will be extracted from the interval [0, 1] for each category</span>
<span class="n">n_repetitions</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="c1"># number of times each prevalence will be used to generate a test sample</span>
<span class="n">n_jobs</span><span class="o">=-</span><span class="mi">1</span><span class="p">,</span> <span class="c1"># indicates the number of parallel workers (-1 indicates, as in sklearn, all CPUs)</span>
<span class="n">random_seed</span><span class="o">=</span><span class="mi">42</span><span class="p">,</span> <span class="c1"># setting a random seed allows to replicate the test samples across runs</span>
<span class="n">error_metrics</span><span class="o">=</span><span class="p">[</span><span class="s1">&#39;mae&#39;</span><span class="p">,</span> <span class="s1">&#39;mrae&#39;</span><span class="p">,</span> <span class="s1">&#39;mkld&#39;</span><span class="p">],</span> <span class="c1"># specify the evaluation metrics</span>
<span class="n">verbose</span><span class="o">=</span><span class="kc">True</span> <span class="c1"># set to True to show some standard-line outputs</span>
<span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="s1">&#39;Averaged values:&#39;</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="n">report</span><span class="o">.</span><span class="n">mean</span><span class="p">())</span>
</pre></div>
</div>
<p>The resulting report is a pandas dataframe that can be directly printed.
Here, we set some display options from pandas just to make the output clearer;
note also that the estimated prevalences are shown as strings using the
function strprev function that simply converts a prevalence into a
string representing it, with a fixed decimal precision (default 3):</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">pandas</span> <span class="k">as</span> <span class="nn">pd</span>
<span class="n">pd</span><span class="o">.</span><span class="n">set_option</span><span class="p">(</span><span class="s1">&#39;display.expand_frame_repr&#39;</span><span class="p">,</span> <span class="kc">False</span><span class="p">)</span>
<span class="n">pd</span><span class="o">.</span><span class="n">set_option</span><span class="p">(</span><span class="s2">&quot;precision&quot;</span><span class="p">,</span> <span class="mi">3</span><span class="p">)</span>
<span class="n">df</span><span class="p">[</span><span class="s1">&#39;estim-prev&#39;</span><span class="p">]</span> <span class="o">=</span> <span class="n">df</span><span class="p">[</span><span class="s1">&#39;estim-prev&#39;</span><span class="p">]</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="n">F</span><span class="o">.</span><span class="n">strprev</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="n">df</span><span class="p">)</span>
</pre></div>
</div>
<p>The output should look like:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span> <span class="n">true</span><span class="o">-</span><span class="n">prev</span> <span class="n">estim</span><span class="o">-</span><span class="n">prev</span> <span class="n">mae</span> <span class="n">mrae</span> <span class="n">mkld</span>
<span class="mi">0</span> <span class="p">[</span><span class="mf">0.0</span><span class="p">,</span> <span class="mf">1.0</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.000</span><span class="p">,</span> <span class="mf">1.000</span><span class="p">]</span> <span class="mf">0.000</span> <span class="mf">0.000</span> <span class="mf">0.000e+00</span>
<span class="mi">1</span> <span class="p">[</span><span class="mf">0.1</span><span class="p">,</span> <span class="mf">0.9</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.091</span><span class="p">,</span> <span class="mf">0.909</span><span class="p">]</span> <span class="mf">0.009</span> <span class="mf">0.048</span> <span class="mf">4.426e-04</span>
<span class="mi">2</span> <span class="p">[</span><span class="mf">0.2</span><span class="p">,</span> <span class="mf">0.8</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.163</span><span class="p">,</span> <span class="mf">0.837</span><span class="p">]</span> <span class="mf">0.037</span> <span class="mf">0.114</span> <span class="mf">4.633e-03</span>
<span class="mi">3</span> <span class="p">[</span><span class="mf">0.3</span><span class="p">,</span> <span class="mf">0.7</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.283</span><span class="p">,</span> <span class="mf">0.717</span><span class="p">]</span> <span class="mf">0.017</span> <span class="mf">0.041</span> <span class="mf">7.383e-04</span>
<span class="mi">4</span> <span class="p">[</span><span class="mf">0.4</span><span class="p">,</span> <span class="mf">0.6</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.366</span><span class="p">,</span> <span class="mf">0.634</span><span class="p">]</span> <span class="mf">0.034</span> <span class="mf">0.070</span> <span class="mf">2.412e-03</span>
<span class="mi">5</span> <span class="p">[</span><span class="mf">0.5</span><span class="p">,</span> <span class="mf">0.5</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.459</span><span class="p">,</span> <span class="mf">0.541</span><span class="p">]</span> <span class="mf">0.041</span> <span class="mf">0.082</span> <span class="mf">3.387e-03</span>
<span class="mi">6</span> <span class="p">[</span><span class="mf">0.6</span><span class="p">,</span> <span class="mf">0.4</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.565</span><span class="p">,</span> <span class="mf">0.435</span><span class="p">]</span> <span class="mf">0.035</span> <span class="mf">0.073</span> <span class="mf">2.535e-03</span>
<span class="mi">7</span> <span class="p">[</span><span class="mf">0.7</span><span class="p">,</span> <span class="mf">0.3</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.654</span><span class="p">,</span> <span class="mf">0.346</span><span class="p">]</span> <span class="mf">0.046</span> <span class="mf">0.108</span> <span class="mf">4.701e-03</span>
<span class="mi">8</span> <span class="p">[</span><span class="mf">0.8</span><span class="p">,</span> <span class="mf">0.2</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.725</span><span class="p">,</span> <span class="mf">0.275</span><span class="p">]</span> <span class="mf">0.075</span> <span class="mf">0.235</span> <span class="mf">1.515e-02</span>
<span class="mi">9</span> <span class="p">[</span><span class="mf">0.9</span><span class="p">,</span> <span class="mf">0.1</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.858</span><span class="p">,</span> <span class="mf">0.142</span><span class="p">]</span> <span class="mf">0.042</span> <span class="mf">0.229</span> <span class="mf">7.740e-03</span>
<span class="mi">10</span> <span class="p">[</span><span class="mf">1.0</span><span class="p">,</span> <span class="mf">0.0</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.945</span><span class="p">,</span> <span class="mf">0.055</span><span class="p">]</span> <span class="mf">0.055</span> <span class="mf">27.357</span> <span class="mf">5.219e-02</span>
</pre></div>
</div>
<p>One can get the averaged scores using standard pandas
functions, i.e.:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="nb">print</span><span class="p">(</span><span class="n">df</span><span class="o">.</span><span class="n">mean</span><span class="p">())</span>
</pre></div>
</div>
<p>will produce the following output:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">true</span><span class="o">-</span><span class="n">prev</span> <span class="mf">0.500</span>
<span class="n">mae</span> <span class="mf">0.035</span>
<span class="n">mrae</span> <span class="mf">2.578</span>
<span class="n">mkld</span> <span class="mf">0.009</span>
<p>This will produce an output like:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span> <span class="n">true</span><span class="o">-</span><span class="n">prev</span> <span class="n">estim</span><span class="o">-</span><span class="n">prev</span> <span class="n">mae</span> <span class="n">mrae</span> <span class="n">mkld</span>
<span class="mi">0</span> <span class="p">[</span><span class="mf">0.308</span><span class="p">,</span> <span class="mf">0.692</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.314</span><span class="p">,</span> <span class="mf">0.686</span><span class="p">]</span> <span class="mf">0.005649</span> <span class="mf">0.013182</span> <span class="mf">0.000074</span>
<span class="mi">1</span> <span class="p">[</span><span class="mf">0.896</span><span class="p">,</span> <span class="mf">0.104</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.909</span><span class="p">,</span> <span class="mf">0.091</span><span class="p">]</span> <span class="mf">0.013145</span> <span class="mf">0.069323</span> <span class="mf">0.000985</span>
<span class="mi">2</span> <span class="p">[</span><span class="mf">0.848</span><span class="p">,</span> <span class="mf">0.152</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.809</span><span class="p">,</span> <span class="mf">0.191</span><span class="p">]</span> <span class="mf">0.039063</span> <span class="mf">0.149806</span> <span class="mf">0.005175</span>
<span class="mi">3</span> <span class="p">[</span><span class="mf">0.016</span><span class="p">,</span> <span class="mf">0.984</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.033</span><span class="p">,</span> <span class="mf">0.967</span><span class="p">]</span> <span class="mf">0.017236</span> <span class="mf">0.487529</span> <span class="mf">0.005298</span>
<span class="mi">4</span> <span class="p">[</span><span class="mf">0.728</span><span class="p">,</span> <span class="mf">0.272</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.751</span><span class="p">,</span> <span class="mf">0.249</span><span class="p">]</span> <span class="mf">0.022769</span> <span class="mf">0.057146</span> <span class="mf">0.001350</span>
<span class="o">...</span> <span class="o">...</span> <span class="o">...</span> <span class="o">...</span> <span class="o">...</span> <span class="o">...</span>
<span class="mi">4995</span> <span class="p">[</span><span class="mf">0.72</span><span class="p">,</span> <span class="mf">0.28</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.698</span><span class="p">,</span> <span class="mf">0.302</span><span class="p">]</span> <span class="mf">0.021752</span> <span class="mf">0.053631</span> <span class="mf">0.001133</span>
<span class="mi">4996</span> <span class="p">[</span><span class="mf">0.868</span><span class="p">,</span> <span class="mf">0.132</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.888</span><span class="p">,</span> <span class="mf">0.112</span><span class="p">]</span> <span class="mf">0.020490</span> <span class="mf">0.088230</span> <span class="mf">0.001985</span>
<span class="mi">4997</span> <span class="p">[</span><span class="mf">0.292</span><span class="p">,</span> <span class="mf">0.708</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.298</span><span class="p">,</span> <span class="mf">0.702</span><span class="p">]</span> <span class="mf">0.006149</span> <span class="mf">0.014788</span> <span class="mf">0.000090</span>
<span class="mi">4998</span> <span class="p">[</span><span class="mf">0.24</span><span class="p">,</span> <span class="mf">0.76</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.220</span><span class="p">,</span> <span class="mf">0.780</span><span class="p">]</span> <span class="mf">0.019950</span> <span class="mf">0.054309</span> <span class="mf">0.001127</span>
<span class="mi">4999</span> <span class="p">[</span><span class="mf">0.948</span><span class="p">,</span> <span class="mf">0.052</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.965</span><span class="p">,</span> <span class="mf">0.035</span><span class="p">]</span> <span class="mf">0.016941</span> <span class="mf">0.165776</span> <span class="mf">0.003538</span>
<span class="p">[</span><span class="mi">5000</span> <span class="n">rows</span> <span class="n">x</span> <span class="mi">5</span> <span class="n">columns</span><span class="p">]</span>
<span class="n">Averaged</span> <span class="n">values</span><span class="p">:</span>
<span class="n">mae</span> <span class="mf">0.023588</span>
<span class="n">mrae</span> <span class="mf">0.108779</span>
<span class="n">mkld</span> <span class="mf">0.003631</span>
<span class="n">dtype</span><span class="p">:</span> <span class="n">float64</span>
<span class="n">Process</span> <span class="n">finished</span> <span class="k">with</span> <span class="n">exit</span> <span class="n">code</span> <span class="mi">0</span>
</pre></div>
</div>
<p>Other evaluation functions include:</p>
<ul class="simple">
<li><p><em>artificial_sampling_eval</em>: that computes the evaluation for a
given evaluation metric, returning the average instead of a dataframe.</p></li>
<li><p><em>artificial_sampling_prediction</em>: that returns two np.arrays containing the
true prevalences and the estimated prevalences.</p></li>
</ul>
<p>See the documentation for further details.</p>
</div>
<p>Alternatively, we can simply generate all the predictions by:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">true_prevs</span><span class="p">,</span> <span class="n">estim_prevs</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">evaluation</span><span class="o">.</span><span class="n">prediction</span><span class="p">(</span><span class="n">quantifier</span><span class="p">,</span> <span class="n">protocol</span><span class="o">=</span><span class="n">prot</span><span class="p">)</span>
</pre></div>
</div>
<p>All the evaluation functions implement specific optimizations for speeding-up
the evaluation of aggregative quantifiers (i.e., of instances of <em>AggregativeQuantifier</em>).
The optimization comes down to generating classification predictions (either crisp or soft)
only once for the entire test set, and then applying the sampling procedure to the
predictions, instead of generating samples of instances and then computing the
classification predictions every time. This is only possible when the protocol
is an instance of <em>OnLabelledCollectionProtocol</em>. The optimization is only
carried out when the number of classification predictions thus generated would be
smaller than the number of predictions required for the entire protocol; e.g.,
if the original dataset contains 1M instances, but the protocol is such that it would
at most generate 20 samples of 100 instances, then it would be preferable to postpone the
classification for each sample. This behaviour is indicated by setting
<em>aggr_speedup=”auto”</em>. Conversely, when indicating <em>aggr_speedup=”force”</em> QuaPy will
precompute all the predictions irrespectively of the number of instances and number of samples.
Finally, this can be deactivated by setting <em>aggr_speedup=False</em>. Note that this optimization
is not only applied for the final evaluation, but also for the internal evaluations carried
out during <em>model selection</em>. Since these are typically many, the heuristic can help reduce the
execution time a lot.</p>
</section>
</section>
<div class="clearer"></div>
@ -264,8 +212,9 @@ true prevalences and the estimated prevalences.</p></li>
</div>
<div class="sphinxsidebar" role="navigation" aria-label="main navigation">
<div class="sphinxsidebarwrapper">
<h3><a href="index.html">Table of Contents</a></h3>
<ul>
<div>
<h3><a href="index.html">Table of Contents</a></h3>
<ul>
<li><a class="reference internal" href="#">Evaluation</a><ul>
<li><a class="reference internal" href="#error-measures">Error Measures</a></li>
<li><a class="reference internal" href="#evaluation-protocols">Evaluation Protocols</a></li>
@ -273,12 +222,17 @@ true prevalences and the estimated prevalences.</p></li>
</li>
</ul>
<h4>Previous topic</h4>
<p class="topless"><a href="Datasets.html"
title="previous chapter">Datasets</a></p>
<h4>Next topic</h4>
<p class="topless"><a href="Methods.html"
title="next chapter">Quantification Methods</a></p>
</div>
<div>
<h4>Previous topic</h4>
<p class="topless"><a href="Datasets.html"
title="previous chapter">Datasets</a></p>
</div>
<div>
<h4>Next topic</h4>
<p class="topless"><a href="Protocols.html"
title="next chapter">Protocols</a></p>
</div>
<div role="note" aria-label="source link">
<h3>This Page</h3>
<ul class="this-page-menu">
@ -295,7 +249,7 @@ true prevalences and the estimated prevalences.</p></li>
</form>
</div>
</div>
<script>$('#searchbox').show(0);</script>
<script>document.getElementById('searchbox').style.display = "block"</script>
</div>
</div>
<div class="clearer"></div>
@ -310,18 +264,18 @@ true prevalences and the estimated prevalences.</p></li>
<a href="py-modindex.html" title="Python Module Index"
>modules</a> |</li>
<li class="right" >
<a href="Methods.html" title="Quantification Methods"
<a href="Protocols.html" title="Protocols"
>next</a> |</li>
<li class="right" >
<a href="Datasets.html" title="Datasets"
>previous</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.6 documentation</a> &#187;</li>
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.7 documentation</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">Evaluation</a></li>
</ul>
</div>
<div class="footer" role="contentinfo">
&#169; Copyright 2021, Alejandro Moreo.
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 4.2.0.
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 5.3.0.
</div>
</body>
</html>

View File

@ -2,18 +2,21 @@
<!doctype html>
<html>
<html lang="en">
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Installation &#8212; QuaPy 0.1.6 documentation</title>
<meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="generator" content="Docutils 0.19: https://docutils.sourceforge.io/" />
<title>Installation &#8212; QuaPy 0.1.7 documentation</title>
<link rel="stylesheet" type="text/css" href="_static/pygments.css" />
<link rel="stylesheet" type="text/css" href="_static/bizstyle.css" />
<script data-url_root="./" id="documentation_options" src="_static/documentation_options.js"></script>
<script src="_static/jquery.js"></script>
<script src="_static/underscore.js"></script>
<script src="_static/_sphinx_javascript_frameworks_compat.js"></script>
<script src="_static/doctools.js"></script>
<script src="_static/sphinx_highlight.js"></script>
<script src="_static/bizstyle.js"></script>
<link rel="index" title="Index" href="genindex.html" />
<link rel="search" title="Search" href="search.html" />
@ -39,7 +42,7 @@
<li class="right" >
<a href="index.html" title="Welcome to QuaPys documentation!"
accesskey="P">previous</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.6 documentation</a> &#187;</li>
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.7 documentation</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">Installation</a></li>
</ul>
</div>
@ -49,15 +52,15 @@
<div class="bodywrapper">
<div class="body" role="main">
<div class="section" id="installation">
<h1>Installation<a class="headerlink" href="#installation" title="Permalink to this headline"></a></h1>
<section id="installation">
<h1>Installation<a class="headerlink" href="#installation" title="Permalink to this heading"></a></h1>
<p>QuaPy can be easily installed via <cite>pip</cite></p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">pip</span> <span class="n">install</span> <span class="n">quapy</span>
</pre></div>
</div>
<p>See <a class="reference external" href="https://pypi.org/project/QuaPy/">pip page</a> for older versions.</p>
<div class="section" id="requirements">
<h2>Requirements<a class="headerlink" href="#requirements" title="Permalink to this headline"></a></h2>
<section id="requirements">
<h2>Requirements<a class="headerlink" href="#requirements" title="Permalink to this heading"></a></h2>
<ul class="simple">
<li><p>scikit-learn, numpy, scipy</p></li>
<li><p>pytorch (for QuaNet)</p></li>
@ -67,9 +70,9 @@
<li><p>pandas, xlrd</p></li>
<li><p>matplotlib</p></li>
</ul>
</div>
<div class="section" id="svm-perf-with-quantification-oriented-losses">
<h2>SVM-perf with quantification-oriented losses<a class="headerlink" href="#svm-perf-with-quantification-oriented-losses" title="Permalink to this headline"></a></h2>
</section>
<section id="svm-perf-with-quantification-oriented-losses">
<h2>SVM-perf with quantification-oriented losses<a class="headerlink" href="#svm-perf-with-quantification-oriented-losses" title="Permalink to this heading"></a></h2>
<p>In order to run experiments involving SVM(Q), SVM(KLD), SVM(NKLD),
SVM(AE), or SVM(RAE), you have to first download the
<a class="reference external" href="http://www.cs.cornell.edu/people/tj/svm_light/svm_perf.html">svmperf</a>
@ -96,8 +99,8 @@ and for the <cite>KLD</cite> and <cite>NKLD</cite> as proposed by
for quantification.
This patch extends the former by also allowing SVMperf to optimize for
<cite>AE</cite> and <cite>RAE</cite>.</p>
</div>
</div>
</section>
</section>
<div class="clearer"></div>
@ -106,8 +109,9 @@ This patch extends the former by also allowing SVMperf to optimize for
</div>
<div class="sphinxsidebar" role="navigation" aria-label="main navigation">
<div class="sphinxsidebarwrapper">
<h3><a href="index.html">Table of Contents</a></h3>
<ul>
<div>
<h3><a href="index.html">Table of Contents</a></h3>
<ul>
<li><a class="reference internal" href="#">Installation</a><ul>
<li><a class="reference internal" href="#requirements">Requirements</a></li>
<li><a class="reference internal" href="#svm-perf-with-quantification-oriented-losses">SVM-perf with quantification-oriented losses</a></li>
@ -115,12 +119,17 @@ This patch extends the former by also allowing SVMperf to optimize for
</li>
</ul>
<h4>Previous topic</h4>
<p class="topless"><a href="index.html"
title="previous chapter">Welcome to QuaPys documentation!</a></p>
<h4>Next topic</h4>
<p class="topless"><a href="Datasets.html"
title="next chapter">Datasets</a></p>
</div>
<div>
<h4>Previous topic</h4>
<p class="topless"><a href="index.html"
title="previous chapter">Welcome to QuaPys documentation!</a></p>
</div>
<div>
<h4>Next topic</h4>
<p class="topless"><a href="Datasets.html"
title="next chapter">Datasets</a></p>
</div>
<div role="note" aria-label="source link">
<h3>This Page</h3>
<ul class="this-page-menu">
@ -137,7 +146,7 @@ This patch extends the former by also allowing SVMperf to optimize for
</form>
</div>
</div>
<script>$('#searchbox').show(0);</script>
<script>document.getElementById('searchbox').style.display = "block"</script>
</div>
</div>
<div class="clearer"></div>
@ -157,13 +166,13 @@ This patch extends the former by also allowing SVMperf to optimize for
<li class="right" >
<a href="index.html" title="Welcome to QuaPys documentation!"
>previous</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.6 documentation</a> &#187;</li>
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.7 documentation</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">Installation</a></li>
</ul>
</div>
<div class="footer" role="contentinfo">
&#169; Copyright 2021, Alejandro Moreo.
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 4.2.0.
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 5.3.0.
</div>
</body>
</html>

View File

@ -2,23 +2,26 @@
<!doctype html>
<html>
<html lang="en">
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Quantification Methods &#8212; QuaPy 0.1.6 documentation</title>
<meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="generator" content="Docutils 0.19: https://docutils.sourceforge.io/" />
<title>Quantification Methods &#8212; QuaPy 0.1.7 documentation</title>
<link rel="stylesheet" type="text/css" href="_static/pygments.css" />
<link rel="stylesheet" type="text/css" href="_static/bizstyle.css" />
<script data-url_root="./" id="documentation_options" src="_static/documentation_options.js"></script>
<script src="_static/jquery.js"></script>
<script src="_static/underscore.js"></script>
<script src="_static/_sphinx_javascript_frameworks_compat.js"></script>
<script src="_static/doctools.js"></script>
<script src="_static/sphinx_highlight.js"></script>
<script src="_static/bizstyle.js"></script>
<link rel="index" title="Index" href="genindex.html" />
<link rel="search" title="Search" href="search.html" />
<link rel="next" title="Plotting" href="Plotting.html" />
<link rel="prev" title="Evaluation" href="Evaluation.html" />
<link rel="next" title="Model Selection" href="Model-Selection.html" />
<link rel="prev" title="Protocols" href="Protocols.html" />
<meta name="viewport" content="width=device-width,initial-scale=1.0" />
<!--[if lt IE 9]>
<script src="_static/css3-mediaqueries.js"></script>
@ -34,12 +37,12 @@
<a href="py-modindex.html" title="Python Module Index"
>modules</a> |</li>
<li class="right" >
<a href="Plotting.html" title="Plotting"
<a href="Model-Selection.html" title="Model Selection"
accesskey="N">next</a> |</li>
<li class="right" >
<a href="Evaluation.html" title="Evaluation"
<a href="Protocols.html" title="Protocols"
accesskey="P">previous</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.6 documentation</a> &#187;</li>
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.7 documentation</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">Quantification Methods</a></li>
</ul>
</div>
@ -49,8 +52,8 @@
<div class="bodywrapper">
<div class="body" role="main">
<div class="tex2jax_ignore mathjax_ignore section" id="quantification-methods">
<h1>Quantification Methods<a class="headerlink" href="#quantification-methods" title="Permalink to this headline"></a></h1>
<section id="quantification-methods">
<h1>Quantification Methods<a class="headerlink" href="#quantification-methods" title="Permalink to this heading"></a></h1>
<p>Quantification methods can be categorized as belonging to
<em>aggregative</em> and <em>non-aggregative</em> groups.
Most methods included in QuaPy at the moment are of type <em>aggregative</em>
@ -65,12 +68,6 @@ and implement some abstract methods:</p>
<span class="nd">@abstractmethod</span>
<span class="k">def</span> <span class="nf">quantify</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">instances</span><span class="p">):</span> <span class="o">...</span>
<span class="nd">@abstractmethod</span>
<span class="k">def</span> <span class="nf">set_params</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="o">**</span><span class="n">parameters</span><span class="p">):</span> <span class="o">...</span>
<span class="nd">@abstractmethod</span>
<span class="k">def</span> <span class="nf">get_params</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">deep</span><span class="o">=</span><span class="kc">True</span><span class="p">):</span> <span class="o">...</span>
</pre></div>
</div>
<p>The meaning of those functions should be familiar to those
@ -82,12 +79,12 @@ scikit-learn structure has not been adopted <em>as is</em> in QuaPy responds
the fact that scikit-learns <em>predict</em> function is expected to return
one output for each input element e.g., a predicted label for each
instance in a sample while in quantification the output for a sample
is one single array of class prevalences), while functions <em>set_params</em>
and <em>get_params</em> allow a
<a class="reference external" href="https://github.com/HLT-ISTI/QuaPy/wiki/Model-Selection">model selector</a>
to automate the process of hyperparameter search.</p>
<div class="section" id="aggregative-methods">
<h2>Aggregative Methods<a class="headerlink" href="#aggregative-methods" title="Permalink to this headline"></a></h2>
is one single array of class prevalences).
Quantifiers also extend from scikit-learns <code class="docutils literal notranslate"><span class="pre">BaseEstimator</span></code>, in order
to simplify the use of <em>set_params</em> and <em>get_params</em> used in
<a class="reference external" href="https://github.com/HLT-ISTI/QuaPy/wiki/Model-Selection">model selector</a>.</p>
<section id="aggregative-methods">
<h2>Aggregative Methods<a class="headerlink" href="#aggregative-methods" title="Permalink to this heading"></a></h2>
<p>All quantification methods are implemented as part of the
<em>qp.method</em> package. In particular, <em>aggregative</em> methods are defined in
<em>qp.method.aggregative</em>, and extend <em>AggregativeQuantifier(BaseQuantifier)</em>.
@ -103,12 +100,12 @@ The methods that any <em>aggregative</em> quantifier must implement are:</p>
individual predictions of a classifier. Indeed, a default implementation
of <em>BaseQuantifier.quantify</em> is already provided, which looks like:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span> <span class="k">def</span> <span class="nf">quantify</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">instances</span><span class="p">):</span>
<span class="n">classif_predictions</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">preclassify</span><span class="p">(</span><span class="n">instances</span><span class="p">)</span>
<span class="n">classif_predictions</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">classify</span><span class="p">(</span><span class="n">instances</span><span class="p">)</span>
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">aggregate</span><span class="p">(</span><span class="n">classif_predictions</span><span class="p">)</span>
</pre></div>
</div>
<p>Aggregative quantifiers are expected to maintain a classifier (which is
accessed through the <em>&#64;property</em> <em>learner</em>). This classifier is
accessed through the <em>&#64;property</em> <em>classifier</em>). This classifier is
given as input to the quantifier, and can be already fit
on external data (in which case, the <em>fit_learner</em> argument should
be set to False), or be fit by the quantifiers fit (default).</p>
@ -118,12 +115,8 @@ aggregative methods, that should inherit from the abstract class
The particularity of <em>probabilistic</em> aggregative methods (w.r.t.
non-probabilistic ones), is that the default quantifier is defined
in terms of the posterior probabilities returned by a probabilistic
classifier, and not by the crisp decisions of a hard classifier; i.e.:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span> <span class="k">def</span> <span class="nf">quantify</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">instances</span><span class="p">):</span>
<span class="n">classif_posteriors</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">posterior_probabilities</span><span class="p">(</span><span class="n">instances</span><span class="p">)</span>
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">aggregate</span><span class="p">(</span><span class="n">classif_posteriors</span><span class="p">)</span>
</pre></div>
</div>
classifier, and not by the crisp decisions of a hard classifier.
In any case, the interface <em>classify(instances)</em> remains unchanged.</p>
<p>One advantage of <em>aggregative</em> methods (either probabilistic or not)
is that the evaluation according to any sampling procedure (e.g.,
the <a class="reference external" href="https://github.com/HLT-ISTI/QuaPy/wiki/Evaluation">artificial sampling protocol</a>)
@ -133,8 +126,8 @@ reuse these predictions, without requiring to classify each element every time.
QuaPy leverages this property to speed-up any procedure having to do with
quantification over samples, as is customarily done in model selection or
in evaluation.</p>
<div class="section" id="the-classify-count-variants">
<h3>The Classify &amp; Count variants<a class="headerlink" href="#the-classify-count-variants" title="Permalink to this headline"></a></h3>
<section id="the-classify-count-variants">
<h3>The Classify &amp; Count variants<a class="headerlink" href="#the-classify-count-variants" title="Permalink to this heading"></a></h3>
<p>QuaPy implements the four CC variants, i.e.:</p>
<ul class="simple">
<li><p><em>CC</em> (Classify &amp; Count), the simplest aggregative quantifier; one that
@ -150,9 +143,7 @@ with a SVM as the classifier:</p>
<span class="kn">import</span> <span class="nn">quapy.functional</span> <span class="k">as</span> <span class="nn">F</span>
<span class="kn">from</span> <span class="nn">sklearn.svm</span> <span class="kn">import</span> <span class="n">LinearSVC</span>
<span class="n">dataset</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">datasets</span><span class="o">.</span><span class="n">fetch_twitter</span><span class="p">(</span><span class="s1">&#39;hcr&#39;</span><span class="p">,</span> <span class="n">pickle</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
<span class="n">training</span> <span class="o">=</span> <span class="n">dataset</span><span class="o">.</span><span class="n">training</span>
<span class="n">test</span> <span class="o">=</span> <span class="n">dataset</span><span class="o">.</span><span class="n">test</span>
<span class="n">training</span><span class="p">,</span> <span class="n">test</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">datasets</span><span class="o">.</span><span class="n">fetch_twitter</span><span class="p">(</span><span class="s1">&#39;hcr&#39;</span><span class="p">,</span> <span class="n">pickle</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span><span class="o">.</span><span class="n">train_test</span>
<span class="c1"># instantiate a classifier learner, in this case a SVM</span>
<span class="n">svm</span> <span class="o">=</span> <span class="n">LinearSVC</span><span class="p">()</span>
@ -196,7 +187,7 @@ e.g.:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">model</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">method</span><span class="o">.</span><span class="n">aggregative</span><span class="o">.</span><span class="n">PCC</span><span class="p">(</span><span class="n">svm</span><span class="p">)</span>
<span class="n">model</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">training</span><span class="p">)</span>
<span class="n">estim_prevalence</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">quantify</span><span class="p">(</span><span class="n">test</span><span class="o">.</span><span class="n">instances</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="s1">&#39;classifier:&#39;</span><span class="p">,</span> <span class="n">model</span><span class="o">.</span><span class="n">learner</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="s1">&#39;classifier:&#39;</span><span class="p">,</span> <span class="n">model</span><span class="o">.</span><span class="n">classifier</span><span class="p">)</span>
</pre></div>
</div>
<p>In this case, QuaPy will print:</p>
@ -214,9 +205,9 @@ be applied to hard classifiers when <em>fit_learner=True</em>; an exception
will be raised otherwise.</p>
<p>Lastly, everything we said aboud ACC and PCC
applies to PACC as well.</p>
</div>
<div class="section" id="expectation-maximization-emq">
<h3>Expectation Maximization (EMQ)<a class="headerlink" href="#expectation-maximization-emq" title="Permalink to this headline"></a></h3>
</section>
<section id="expectation-maximization-emq">
<h3>Expectation Maximization (EMQ)<a class="headerlink" href="#expectation-maximization-emq" title="Permalink to this heading"></a></h3>
<p>The Expectation Maximization Quantifier (EMQ), also known as
the SLD, is available at <em>qp.method.aggregative.EMQ</em> or via the
alias <em>qp.method.aggregative.ExpectationMaximizationQuantifier</em>.
@ -241,13 +232,21 @@ experiments we have carried out.</p>
<span class="n">estim_prevalence</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">quantify</span><span class="p">(</span><span class="n">dataset</span><span class="o">.</span><span class="n">test</span><span class="o">.</span><span class="n">instances</span><span class="p">)</span>
</pre></div>
</div>
</div>
<div class="section" id="hellinger-distance-y-hdy">
<h3>Hellinger Distance y (HDy)<a class="headerlink" href="#hellinger-distance-y-hdy" title="Permalink to this headline"></a></h3>
<p>The method HDy is described in:</p>
<p><em>Implementation of the method based on the Hellinger Distance y (HDy) proposed by
González-Castro, V., Alaiz-Rodrı́guez, R., and Alegre, E. (2013). Class distribution
estimation based on the Hellinger distance. Information Sciences, 218:146164.</em></p>
<p><em>New in v0.1.7</em>: EMQ now accepts two new parameters in the construction method, namely
<em>exact_train_prev</em> which allows to use the true training prevalence as the departing
prevalence estimation (default behaviour), or instead an approximation of it as
suggested by <a class="reference external" href="http://proceedings.mlr.press/v119/alexandari20a.html">Alexandari et al. (2020)</a>
(by setting <em>exact_train_prev=False</em>).
The other parameter is <em>recalib</em> which allows to indicate a calibration method, among those
proposed by <a class="reference external" href="http://proceedings.mlr.press/v119/alexandari20a.html">Alexandari et al. (2020)</a>,
including the Bias-Corrected Temperature Scaling, Vector Scaling, etc.
See the API documentation for further details.</p>
</section>
<section id="hellinger-distance-y-hdy">
<h3>Hellinger Distance y (HDy)<a class="headerlink" href="#hellinger-distance-y-hdy" title="Permalink to this heading"></a></h3>
<p>Implementation of the method based on the Hellinger Distance y (HDy) proposed by
<a class="reference external" href="https://www.sciencedirect.com/science/article/pii/S0020025512004069">González-Castro, V., Alaiz-Rodrı́guez, R., and Alegre, E. (2013). Class distribution
estimation based on the Hellinger distance. Information Sciences, 218:146164.</a></p>
<p>It is implemented in <em>qp.method.aggregative.HDy</em> (also accessible
through the allias <em>qp.method.aggregative.HellingerDistanceY</em>).
This method works with a probabilistic classifier (hard classifiers
@ -274,30 +273,48 @@ provided in QuaPy accepts only binary datasets.</p>
<span class="n">estim_prevalence</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">quantify</span><span class="p">(</span><span class="n">dataset</span><span class="o">.</span><span class="n">test</span><span class="o">.</span><span class="n">instances</span><span class="p">)</span>
</pre></div>
</div>
<p><em>New in v0.1.7:</em> QuaPy now provides an implementation of the generalized
“Distribution Matching” approaches for multiclass, inspired by the framework
of <a class="reference external" href="https://arxiv.org/abs/1606.00868">Firat (2016)</a>. One can instantiate
a variant of HDy for multiclass quantification as follows:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">mutliclassHDy</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">method</span><span class="o">.</span><span class="n">aggregative</span><span class="o">.</span><span class="n">DistributionMatching</span><span class="p">(</span><span class="n">classifier</span><span class="o">=</span><span class="n">LogisticRegression</span><span class="p">(),</span> <span class="n">divergence</span><span class="o">=</span><span class="s1">&#39;HD&#39;</span><span class="p">,</span> <span class="n">cdf</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span>
</pre></div>
</div>
<div class="section" id="explicit-loss-minimization">
<h3>Explicit Loss Minimization<a class="headerlink" href="#explicit-loss-minimization" title="Permalink to this headline"></a></h3>
<p><em>New in v0.1.7:</em> QuaPy now provides an implementation of the “DyS”
framework proposed by <a class="reference external" href="https://ojs.aaai.org/index.php/AAAI/article/view/4376">Maletzke et al (2020)</a>
and the “SMM” method proposed by <a class="reference external" href="https://ieeexplore.ieee.org/document/9260028">Hassan et al (2019)</a>
(thanks to <em>Pablo González</em> for the contributions!)</p>
</section>
<section id="threshold-optimization-methods">
<h3>Threshold Optimization methods<a class="headerlink" href="#threshold-optimization-methods" title="Permalink to this heading"></a></h3>
<p><em>New in v0.1.7:</em> QuaPy now implements Formans threshold optimization methods;
see, e.g., <a class="reference external" href="https://dl.acm.org/doi/abs/10.1145/1150402.1150423">(Forman 2006)</a>
and <a class="reference external" href="https://link.springer.com/article/10.1007/s10618-008-0097-y">(Forman 2008)</a>.
These include: T50, MAX, X, Median Sweep (MS), and its variant MS2.</p>
</section>
<section id="explicit-loss-minimization">
<h3>Explicit Loss Minimization<a class="headerlink" href="#explicit-loss-minimization" title="Permalink to this heading"></a></h3>
<p>The Explicit Loss Minimization (ELM) represent a family of methods
based on structured output learning, i.e., quantifiers relying on
classifiers that have been optimized targeting a
quantification-oriented evaluation measure.</p>
<p>In QuaPy, the following methods, all relying on Joachims
<a class="reference external" href="https://www.cs.cornell.edu/people/tj/svm_light/svm_perf.html">SVMperf</a>
implementation, are available in <em>qp.method.aggregative</em>:</p>
quantification-oriented evaluation measure.
The original methods are implemented in QuaPy as classify &amp; count (CC)
quantifiers that use Joachims <a class="reference external" href="https://www.cs.cornell.edu/people/tj/svm_light/svm_perf.html">SVMperf</a>
as the underlying classifier, properly set to optimize for the desired loss.</p>
<p>In QuaPy, this can be more achieved by calling the functions:</p>
<ul class="simple">
<li><p>SVMQ (SVM-Q) is a quantification method optimizing the metric <em>Q</em> defined
in <em>Barranquero, J., Díez, J., and del Coz, J. J. (2015). Quantification-oriented learning based
on reliable classifiers. Pattern Recognition, 48(2):591604.</em></p></li>
<li><p>SVMKLD (SVM for Kullback-Leibler Divergence) proposed in <em>Esuli, A. and Sebastiani, F. (2015).
<li><p><em>newSVMQ</em>: returns the quantification method called SVM(Q) that optimizes for the metric <em>Q</em> defined
in <a class="reference external" href="https://www.sciencedirect.com/science/article/pii/S003132031400291X"><em>Barranquero, J., Díez, J., and del Coz, J. J. (2015). Quantification-oriented learning based
on reliable classifiers. Pattern Recognition, 48(2):591604.</em></a></p></li>
<li><p><em>newSVMKLD</em> and <em>newSVMNKLD</em>: returns the quantification method called SVM(KLD) and SVM(nKLD), standing for
Kullback-Leibler Divergence and Normalized Kullback-Leibler Divergence, as proposed in <a class="reference external" href="https://dl.acm.org/doi/abs/10.1145/2700406"><em>Esuli, A. and Sebastiani, F. (2015).
Optimizing text quantifiers for multivariate loss functions.
ACM Transactions on Knowledge Discovery and Data, 9(4):Article 27.</em></p></li>
<li><p>SVMNKLD (SVM for Normalized Kullback-Leibler Divergence) proposed in <em>Esuli, A. and Sebastiani, F. (2015).
Optimizing text quantifiers for multivariate loss functions.
ACM Transactions on Knowledge Discovery and Data, 9(4):Article 27.</em></p></li>
<li><p>SVMAE (SVM for Mean Absolute Error)</p></li>
<li><p>SVMRAE (SVM for Mean Relative Absolute Error)</p></li>
ACM Transactions on Knowledge Discovery and Data, 9(4):Article 27.</em></a></p></li>
<li><p><em>newSVMAE</em> and <em>newSVMRAE</em>: returns a quantification method called SVM(AE) and SVM(RAE) that optimizes for the (Mean) Absolute Error and for the
(Mean) Relative Absolute Error, as first used by
<a class="reference external" href="https://arxiv.org/abs/2011.02552"><em>Moreo, A. and Sebastiani, F. (2021). Tweet sentiment quantification: An experimental re-evaluation. PLOS ONE 17 (9), 1-23.</em></a></p></li>
</ul>
<p>the last two methods (SVMAE and SVMRAE) have been implemented in
<p>the last two methods (SVM(AE) and SVM(RAE)) have been implemented in
QuaPy in order to make available ELM variants for what nowadays
are considered the most well-behaved evaluation metrics in quantification.</p>
<p>In order to make these models work, you would need to run the script
@ -327,11 +344,15 @@ currently supports only binary classification.
ELM variants (any binary quantifier in general) can be extended
to operate in single-label scenarios trivially by adopting a
“one-vs-all” strategy (as, e.g., in
<em>Gao, W. and Sebastiani, F. (2016). From classification to quantification in tweet sentiment
analysis. Social Network Analysis and Mining, 6(19):122</em>).
In QuaPy this is possible by using the <em>OneVsAll</em> class:</p>
<a class="reference external" href="https://link.springer.com/article/10.1007/s13278-016-0327-z"><em>Gao, W. and Sebastiani, F. (2016). From classification to quantification in tweet sentiment
analysis. Social Network Analysis and Mining, 6(19):122</em></a>).
In QuaPy this is possible by using the <em>OneVsAll</em> class.</p>
<p>There are two ways for instantiating this class, <em>OneVsAllGeneric</em> that works for
any quantifier, and <em>OneVsAllAggregative</em> that is optimized for aggregative quantifiers.
In general, you can simply use the <em>getOneVsAll</em> function and QuaPy will choose
the more convenient of the two.</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">quapy</span> <span class="k">as</span> <span class="nn">qp</span>
<span class="kn">from</span> <span class="nn">quapy.method.aggregative</span> <span class="kn">import</span> <span class="n">SVMQ</span><span class="p">,</span> <span class="n">OneVsAll</span>
<span class="kn">from</span> <span class="nn">quapy.method.aggregative</span> <span class="kn">import</span> <span class="n">SVMQ</span>
<span class="c1"># load a single-label dataset (this one contains 3 classes)</span>
<span class="n">dataset</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">datasets</span><span class="o">.</span><span class="n">fetch_twitter</span><span class="p">(</span><span class="s1">&#39;hcr&#39;</span><span class="p">,</span> <span class="n">pickle</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
@ -339,30 +360,32 @@ In QuaPy this is possible by using the <em>OneVsAll</em> class:</p>
<span class="c1"># let qp know where svmperf is</span>
<span class="n">qp</span><span class="o">.</span><span class="n">environ</span><span class="p">[</span><span class="s1">&#39;SVMPERF_HOME&#39;</span><span class="p">]</span> <span class="o">=</span> <span class="s1">&#39;../svm_perf_quantification&#39;</span>
<span class="n">model</span> <span class="o">=</span> <span class="n">OneVsAll</span><span class="p">(</span><span class="n">SVMQ</span><span class="p">(),</span> <span class="n">n_jobs</span><span class="o">=-</span><span class="mi">1</span><span class="p">)</span> <span class="c1"># run them on parallel</span>
<span class="n">model</span> <span class="o">=</span> <span class="n">getOneVsAll</span><span class="p">(</span><span class="n">SVMQ</span><span class="p">(),</span> <span class="n">n_jobs</span><span class="o">=-</span><span class="mi">1</span><span class="p">)</span> <span class="c1"># run them on parallel</span>
<span class="n">model</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">dataset</span><span class="o">.</span><span class="n">training</span><span class="p">)</span>
<span class="n">estim_prevalence</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">quantify</span><span class="p">(</span><span class="n">dataset</span><span class="o">.</span><span class="n">test</span><span class="o">.</span><span class="n">instances</span><span class="p">)</span>
</pre></div>
</div>
</div>
</div>
<div class="section" id="meta-models">
<h2>Meta Models<a class="headerlink" href="#meta-models" title="Permalink to this headline"></a></h2>
<p>Check the examples <em><span class="xref myst">explicit_loss_minimization.py</span></em>
and <span class="xref myst">one_vs_all.py</span> for more details.</p>
</section>
</section>
<section id="meta-models">
<h2>Meta Models<a class="headerlink" href="#meta-models" title="Permalink to this heading"></a></h2>
<p>By <em>meta</em> models we mean quantification methods that are defined on top of other
quantification methods, and that thus do not squarely belong to the aggregative nor
the non-aggregative group (indeed, <em>meta</em> models could use quantifiers from any of those
groups).
<em>Meta</em> models are implemented in the <em>qp.method.meta</em> module.</p>
<div class="section" id="ensembles">
<h3>Ensembles<a class="headerlink" href="#ensembles" title="Permalink to this headline"></a></h3>
<section id="ensembles">
<h3>Ensembles<a class="headerlink" href="#ensembles" title="Permalink to this heading"></a></h3>
<p>QuaPy implements (some of) the variants proposed in:</p>
<ul class="simple">
<li><p><em>Pérez-Gállego, P., Quevedo, J. R., &amp; del Coz, J. J. (2017).
<li><p><a class="reference external" href="https://www.sciencedirect.com/science/article/pii/S1566253516300628"><em>Pérez-Gállego, P., Quevedo, J. R., &amp; del Coz, J. J. (2017).
Using ensembles for problems with characterizable changes in data distribution: A case study on quantification.
Information Fusion, 34, 87-100.</em></p></li>
<li><p><em>Pérez-Gállego, P., Castano, A., Quevedo, J. R., &amp; del Coz, J. J. (2019).
Information Fusion, 34, 87-100.</em></a></p></li>
<li><p><a class="reference external" href="https://www.sciencedirect.com/science/article/pii/S1566253517303652"><em>Pérez-Gállego, P., Castano, A., Quevedo, J. R., &amp; del Coz, J. J. (2019).
Dynamic ensemble selection for quantification tasks.
Information Fusion, 45, 1-15.</em></p></li>
Information Fusion, 45, 1-15.</em></a></p></li>
</ul>
<p>The following code shows how to instantiate an Ensemble of 30 <em>Adjusted Classify &amp; Count</em> (ACC)
quantifiers operating with a <em>Logistic Regressor</em> (LR) as the base classifier, and using the
@ -391,14 +414,14 @@ the performance estimated for each member of the ensemble in terms of that evalu
informs of the number of members to retain.</p>
<p>Please, check the <a class="reference external" href="https://github.com/HLT-ISTI/QuaPy/wiki/Model-Selection">model selection</a>
wiki if you want to optimize the hyperparameters of ensemble for classification or quantification.</p>
</div>
<div class="section" id="the-quanet-neural-network">
<h3>The QuaNet neural network<a class="headerlink" href="#the-quanet-neural-network" title="Permalink to this headline"></a></h3>
</section>
<section id="the-quanet-neural-network">
<h3>The QuaNet neural network<a class="headerlink" href="#the-quanet-neural-network" title="Permalink to this heading"></a></h3>
<p>QuaPy offers an implementation of QuaNet, a deep learning model presented in:</p>
<p><em>Esuli, A., Moreo, A., &amp; Sebastiani, F. (2018, October).
<p><a class="reference external" href="https://dl.acm.org/doi/abs/10.1145/3269206.3269287"><em>Esuli, A., Moreo, A., &amp; Sebastiani, F. (2018, October).
A recurrent neural network for sentiment quantification.
In Proceedings of the 27th ACM International Conference on
Information and Knowledge Management (pp. 1775-1778).</em></p>
Information and Knowledge Management (pp. 1775-1778).</em></a></p>
<p>This model requires <em>torch</em> to be installed.
QuaNet also requires a classifier that can provide embedded representations
of the inputs.
@ -420,14 +443,14 @@ In the following example, we show an instantiation of QuaNet that instead uses C
<span class="n">learner</span> <span class="o">=</span> <span class="n">NeuralClassifierTrainer</span><span class="p">(</span><span class="n">cnn</span><span class="p">,</span> <span class="n">device</span><span class="o">=</span><span class="s1">&#39;cuda&#39;</span><span class="p">)</span>
<span class="c1"># train QuaNet</span>
<span class="n">model</span> <span class="o">=</span> <span class="n">QuaNet</span><span class="p">(</span><span class="n">learner</span><span class="p">,</span> <span class="n">qp</span><span class="o">.</span><span class="n">environ</span><span class="p">[</span><span class="s1">&#39;SAMPLE_SIZE&#39;</span><span class="p">],</span> <span class="n">device</span><span class="o">=</span><span class="s1">&#39;cuda&#39;</span><span class="p">)</span>
<span class="n">model</span> <span class="o">=</span> <span class="n">QuaNet</span><span class="p">(</span><span class="n">learner</span><span class="p">,</span> <span class="n">device</span><span class="o">=</span><span class="s1">&#39;cuda&#39;</span><span class="p">)</span>
<span class="n">model</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">dataset</span><span class="o">.</span><span class="n">training</span><span class="p">)</span>
<span class="n">estim_prevalence</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">quantify</span><span class="p">(</span><span class="n">dataset</span><span class="o">.</span><span class="n">test</span><span class="o">.</span><span class="n">instances</span><span class="p">)</span>
</pre></div>
</div>
</div>
</div>
</div>
</section>
</section>
</section>
<div class="clearer"></div>
@ -436,13 +459,15 @@ In the following example, we show an instantiation of QuaNet that instead uses C
</div>
<div class="sphinxsidebar" role="navigation" aria-label="main navigation">
<div class="sphinxsidebarwrapper">
<h3><a href="index.html">Table of Contents</a></h3>
<ul>
<div>
<h3><a href="index.html">Table of Contents</a></h3>
<ul>
<li><a class="reference internal" href="#">Quantification Methods</a><ul>
<li><a class="reference internal" href="#aggregative-methods">Aggregative Methods</a><ul>
<li><a class="reference internal" href="#the-classify-count-variants">The Classify &amp; Count variants</a></li>
<li><a class="reference internal" href="#expectation-maximization-emq">Expectation Maximization (EMQ)</a></li>
<li><a class="reference internal" href="#hellinger-distance-y-hdy">Hellinger Distance y (HDy)</a></li>
<li><a class="reference internal" href="#threshold-optimization-methods">Threshold Optimization methods</a></li>
<li><a class="reference internal" href="#explicit-loss-minimization">Explicit Loss Minimization</a></li>
</ul>
</li>
@ -455,12 +480,17 @@ In the following example, we show an instantiation of QuaNet that instead uses C
</li>
</ul>
<h4>Previous topic</h4>
<p class="topless"><a href="Evaluation.html"
title="previous chapter">Evaluation</a></p>
<h4>Next topic</h4>
<p class="topless"><a href="Plotting.html"
title="next chapter">Plotting</a></p>
</div>
<div>
<h4>Previous topic</h4>
<p class="topless"><a href="Protocols.html"
title="previous chapter">Protocols</a></p>
</div>
<div>
<h4>Next topic</h4>
<p class="topless"><a href="Model-Selection.html"
title="next chapter">Model Selection</a></p>
</div>
<div role="note" aria-label="source link">
<h3>This Page</h3>
<ul class="this-page-menu">
@ -477,7 +507,7 @@ In the following example, we show an instantiation of QuaNet that instead uses C
</form>
</div>
</div>
<script>$('#searchbox').show(0);</script>
<script>document.getElementById('searchbox').style.display = "block"</script>
</div>
</div>
<div class="clearer"></div>
@ -492,18 +522,18 @@ In the following example, we show an instantiation of QuaNet that instead uses C
<a href="py-modindex.html" title="Python Module Index"
>modules</a> |</li>
<li class="right" >
<a href="Plotting.html" title="Plotting"
<a href="Model-Selection.html" title="Model Selection"
>next</a> |</li>
<li class="right" >
<a href="Evaluation.html" title="Evaluation"
<a href="Protocols.html" title="Protocols"
>previous</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.6 documentation</a> &#187;</li>
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.7 documentation</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">Quantification Methods</a></li>
</ul>
</div>
<div class="footer" role="contentinfo">
&#169; Copyright 2021, Alejandro Moreo.
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 4.2.0.
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 5.3.0.
</div>
</body>
</html>

View File

@ -2,21 +2,26 @@
<!doctype html>
<html>
<html lang="en">
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Model Selection &#8212; QuaPy 0.1.6 documentation</title>
<meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="generator" content="Docutils 0.19: https://docutils.sourceforge.io/" />
<title>Model Selection &#8212; QuaPy 0.1.7 documentation</title>
<link rel="stylesheet" type="text/css" href="_static/pygments.css" />
<link rel="stylesheet" type="text/css" href="_static/bizstyle.css" />
<script data-url_root="./" id="documentation_options" src="_static/documentation_options.js"></script>
<script src="_static/jquery.js"></script>
<script src="_static/underscore.js"></script>
<script src="_static/_sphinx_javascript_frameworks_compat.js"></script>
<script src="_static/doctools.js"></script>
<script src="_static/sphinx_highlight.js"></script>
<script src="_static/bizstyle.js"></script>
<link rel="index" title="Index" href="genindex.html" />
<link rel="search" title="Search" href="search.html" />
<link rel="next" title="Plotting" href="Plotting.html" />
<link rel="prev" title="Quantification Methods" href="Methods.html" />
<meta name="viewport" content="width=device-width,initial-scale=1.0" />
<!--[if lt IE 9]>
<script src="_static/css3-mediaqueries.js"></script>
@ -31,7 +36,13 @@
<li class="right" >
<a href="py-modindex.html" title="Python Module Index"
>modules</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.6 documentation</a> &#187;</li>
<li class="right" >
<a href="Plotting.html" title="Plotting"
accesskey="N">next</a> |</li>
<li class="right" >
<a href="Methods.html" title="Quantification Methods"
accesskey="P">previous</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.7 documentation</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">Model Selection</a></li>
</ul>
</div>
@ -41,8 +52,8 @@
<div class="bodywrapper">
<div class="body" role="main">
<div class="tex2jax_ignore mathjax_ignore section" id="model-selection">
<h1>Model Selection<a class="headerlink" href="#model-selection" title="Permalink to this headline"></a></h1>
<section id="model-selection">
<h1>Model Selection<a class="headerlink" href="#model-selection" title="Permalink to this heading"></a></h1>
<p>As a supervised machine learning task, quantification methods
can strongly depend on a good choice of model hyper-parameters.
The process whereby those hyper-parameters are chosen is
@ -50,8 +61,8 @@ typically known as <em>Model Selection</em>, and typically consists of
testing different settings and picking the one that performed
best in a held-out validation set in terms of any given
evaluation measure.</p>
<div class="section" id="targeting-a-quantification-oriented-loss">
<h2>Targeting a Quantification-oriented loss<a class="headerlink" href="#targeting-a-quantification-oriented-loss" title="Permalink to this headline"></a></h2>
<section id="targeting-a-quantification-oriented-loss">
<h2>Targeting a Quantification-oriented loss<a class="headerlink" href="#targeting-a-quantification-oriented-loss" title="Permalink to this heading"></a></h2>
<p>The task being optimized determines the evaluation protocol,
i.e., the criteria according to which the performance of
any given method for solving is to be assessed.
@ -63,81 +74,91 @@ specifically designed for the task of quantification.</p>
classification, and thus the model selection strategies
customarily adopted in classification have simply been
applied to quantification (see the next section).
It has been argued in <em>Moreo, Alejandro, and Fabrizio Sebastiani.
Re-Assessing the Classify and Count” Quantification Method.
arXiv preprint arXiv:2011.02552 (2020).</em>
It has been argued in <a class="reference external" href="https://link.springer.com/chapter/10.1007/978-3-030-72240-1_6">Moreo, Alejandro, and Fabrizio Sebastiani.
Re-Assessing the Classify and Count” Quantification Method.
ECIR 2021: Advances in Information Retrieval pp 7591.</a>
that specific model selection strategies should
be adopted for quantification. That is, model selection
strategies for quantification should target
quantification-oriented losses and be tested in a variety
of scenarios exhibiting different degrees of prior
probability shift.</p>
<p>The class
<em>qp.model_selection.GridSearchQ</em>
implements a grid-search exploration over the space of
hyper-parameter combinations that evaluates each<br />
combination of hyper-parameters
by means of a given quantification-oriented
<p>The class <em>qp.model_selection.GridSearchQ</em> implements a grid-search exploration over the space of
hyper-parameter combinations that <a class="reference external" href="https://github.com/HLT-ISTI/QuaPy/wiki/Evaluation">evaluates</a>
each combination of hyper-parameters by means of a given quantification-oriented
error metric (e.g., any of the error functions implemented
in <em>qp.error</em>) and according to the
<a class="reference external" href="https://github.com/HLT-ISTI/QuaPy/wiki/Evaluation"><em>artificial sampling protocol</em></a>.</p>
<p>The following is an example of model selection for quantification:</p>
in <em>qp.error</em>) and according to a
<a class="reference external" href="https://github.com/HLT-ISTI/QuaPy/wiki/Protocols">sampling generation protocol</a>.</p>
<p>The following is an example (also included in the examples folder) of model selection for quantification:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">quapy</span> <span class="k">as</span> <span class="nn">qp</span>
<span class="kn">from</span> <span class="nn">quapy.method.aggregative</span> <span class="kn">import</span> <span class="n">PCC</span>
<span class="kn">from</span> <span class="nn">quapy.protocol</span> <span class="kn">import</span> <span class="n">APP</span>
<span class="kn">from</span> <span class="nn">quapy.method.aggregative</span> <span class="kn">import</span> <span class="n">DistributionMatching</span>
<span class="kn">from</span> <span class="nn">sklearn.linear_model</span> <span class="kn">import</span> <span class="n">LogisticRegression</span>
<span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span>
<span class="c1"># set a seed to replicate runs</span>
<span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">seed</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
<span class="n">qp</span><span class="o">.</span><span class="n">environ</span><span class="p">[</span><span class="s1">&#39;SAMPLE_SIZE&#39;</span><span class="p">]</span> <span class="o">=</span> <span class="mi">500</span>
<span class="sd">&quot;&quot;&quot;</span>
<span class="sd">In this example, we show how to perform model selection on a DistributionMatching quantifier.</span>
<span class="sd">&quot;&quot;&quot;</span>
<span class="n">dataset</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">datasets</span><span class="o">.</span><span class="n">fetch_reviews</span><span class="p">(</span><span class="s1">&#39;hp&#39;</span><span class="p">,</span> <span class="n">tfidf</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">min_df</span><span class="o">=</span><span class="mi">5</span><span class="p">)</span>
<span class="n">model</span> <span class="o">=</span> <span class="n">DistributionMatching</span><span class="p">(</span><span class="n">LogisticRegression</span><span class="p">())</span>
<span class="n">qp</span><span class="o">.</span><span class="n">environ</span><span class="p">[</span><span class="s1">&#39;SAMPLE_SIZE&#39;</span><span class="p">]</span> <span class="o">=</span> <span class="mi">100</span>
<span class="n">qp</span><span class="o">.</span><span class="n">environ</span><span class="p">[</span><span class="s1">&#39;N_JOBS&#39;</span><span class="p">]</span> <span class="o">=</span> <span class="o">-</span><span class="mi">1</span> <span class="c1"># explore hyper-parameters in parallel</span>
<span class="n">training</span><span class="p">,</span> <span class="n">test</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">datasets</span><span class="o">.</span><span class="n">fetch_reviews</span><span class="p">(</span><span class="s1">&#39;imdb&#39;</span><span class="p">,</span> <span class="n">tfidf</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">min_df</span><span class="o">=</span><span class="mi">5</span><span class="p">)</span><span class="o">.</span><span class="n">train_test</span>
<span class="c1"># The model will be returned by the fit method of GridSearchQ.</span>
<span class="c1"># Model selection will be performed with a fixed budget of 1000 evaluations</span>
<span class="c1"># for each hyper-parameter combination. The error to optimize is the MAE for</span>
<span class="c1"># quantification, as evaluated on artificially drawn samples at prevalences </span>
<span class="c1"># covering the entire spectrum on a held-out portion (40%) of the training set.</span>
<span class="c1"># Every combination of hyper-parameters will be evaluated by confronting the</span>
<span class="c1"># quantifier thus configured against a series of samples generated by means</span>
<span class="c1"># of a sample generation protocol. For this example, we will use the</span>
<span class="c1"># artificial-prevalence protocol (APP), that generates samples with prevalence</span>
<span class="c1"># values in the entire range of values from a grid (e.g., [0, 0.1, 0.2, ..., 1]).</span>
<span class="c1"># We devote 30% of the dataset for this exploration.</span>
<span class="n">training</span><span class="p">,</span> <span class="n">validation</span> <span class="o">=</span> <span class="n">training</span><span class="o">.</span><span class="n">split_stratified</span><span class="p">(</span><span class="n">train_prop</span><span class="o">=</span><span class="mf">0.7</span><span class="p">)</span>
<span class="n">protocol</span> <span class="o">=</span> <span class="n">APP</span><span class="p">(</span><span class="n">validation</span><span class="p">)</span>
<span class="c1"># We will explore a classification-dependent hyper-parameter (e.g., the &#39;C&#39;</span>
<span class="c1"># hyper-parameter of LogisticRegression) and a quantification-dependent hyper-parameter</span>
<span class="c1"># (e.g., the number of bins in a DistributionMatching quantifier.</span>
<span class="c1"># Classifier-dependent hyper-parameters have to be marked with a prefix &quot;classifier__&quot;</span>
<span class="c1"># in order to let the quantifier know this hyper-parameter belongs to its underlying</span>
<span class="c1"># classifier.</span>
<span class="n">param_grid</span> <span class="o">=</span> <span class="p">{</span>
<span class="s1">&#39;classifier__C&#39;</span><span class="p">:</span> <span class="n">np</span><span class="o">.</span><span class="n">logspace</span><span class="p">(</span><span class="o">-</span><span class="mi">3</span><span class="p">,</span><span class="mi">3</span><span class="p">,</span><span class="mi">7</span><span class="p">),</span>
<span class="s1">&#39;nbins&#39;</span><span class="p">:</span> <span class="p">[</span><span class="mi">8</span><span class="p">,</span> <span class="mi">16</span><span class="p">,</span> <span class="mi">32</span><span class="p">,</span> <span class="mi">64</span><span class="p">],</span>
<span class="p">}</span>
<span class="n">model</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">model_selection</span><span class="o">.</span><span class="n">GridSearchQ</span><span class="p">(</span>
<span class="n">model</span><span class="o">=</span><span class="n">PCC</span><span class="p">(</span><span class="n">LogisticRegression</span><span class="p">()),</span>
<span class="n">param_grid</span><span class="o">=</span><span class="p">{</span><span class="s1">&#39;C&#39;</span><span class="p">:</span> <span class="n">np</span><span class="o">.</span><span class="n">logspace</span><span class="p">(</span><span class="o">-</span><span class="mi">4</span><span class="p">,</span><span class="mi">5</span><span class="p">,</span><span class="mi">10</span><span class="p">),</span> <span class="s1">&#39;class_weight&#39;</span><span class="p">:</span> <span class="p">[</span><span class="s1">&#39;balanced&#39;</span><span class="p">,</span> <span class="kc">None</span><span class="p">]},</span>
<span class="n">sample_size</span><span class="o">=</span><span class="n">qp</span><span class="o">.</span><span class="n">environ</span><span class="p">[</span><span class="s1">&#39;SAMPLE_SIZE&#39;</span><span class="p">],</span>
<span class="n">eval_budget</span><span class="o">=</span><span class="mi">1000</span><span class="p">,</span>
<span class="n">error</span><span class="o">=</span><span class="s1">&#39;mae&#39;</span><span class="p">,</span>
<span class="n">refit</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="c1"># retrain on the whole labelled set</span>
<span class="n">val_split</span><span class="o">=</span><span class="mf">0.4</span><span class="p">,</span>
<span class="n">model</span><span class="o">=</span><span class="n">model</span><span class="p">,</span>
<span class="n">param_grid</span><span class="o">=</span><span class="n">param_grid</span><span class="p">,</span>
<span class="n">protocol</span><span class="o">=</span><span class="n">protocol</span><span class="p">,</span>
<span class="n">error</span><span class="o">=</span><span class="s1">&#39;mae&#39;</span><span class="p">,</span> <span class="c1"># the error to optimize is the MAE (a quantification-oriented loss)</span>
<span class="n">refit</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="c1"># retrain on the whole labelled set once done</span>
<span class="n">verbose</span><span class="o">=</span><span class="kc">True</span> <span class="c1"># show information as the process goes on</span>
<span class="p">)</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">dataset</span><span class="o">.</span><span class="n">training</span><span class="p">)</span>
<span class="p">)</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">training</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s1">&#39;model selection ended: best hyper-parameters=</span><span class="si">{</span><span class="n">model</span><span class="o">.</span><span class="n">best_params_</span><span class="si">}</span><span class="s1">&#39;</span><span class="p">)</span>
<span class="n">model</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">best_model_</span>
<span class="c1"># evaluation in terms of MAE</span>
<span class="n">results</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">evaluation</span><span class="o">.</span><span class="n">artificial_sampling_eval</span><span class="p">(</span>
<span class="n">model</span><span class="p">,</span>
<span class="n">dataset</span><span class="o">.</span><span class="n">test</span><span class="p">,</span>
<span class="n">sample_size</span><span class="o">=</span><span class="n">qp</span><span class="o">.</span><span class="n">environ</span><span class="p">[</span><span class="s1">&#39;SAMPLE_SIZE&#39;</span><span class="p">],</span>
<span class="n">n_prevpoints</span><span class="o">=</span><span class="mi">101</span><span class="p">,</span>
<span class="n">n_repetitions</span><span class="o">=</span><span class="mi">10</span><span class="p">,</span>
<span class="n">error_metric</span><span class="o">=</span><span class="s1">&#39;mae&#39;</span>
<span class="p">)</span>
<span class="c1"># we use the same evaluation protocol (APP) on the test set</span>
<span class="n">mae_score</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">evaluation</span><span class="o">.</span><span class="n">evaluate</span><span class="p">(</span><span class="n">model</span><span class="p">,</span> <span class="n">protocol</span><span class="o">=</span><span class="n">APP</span><span class="p">(</span><span class="n">test</span><span class="p">),</span> <span class="n">error_metric</span><span class="o">=</span><span class="s1">&#39;mae&#39;</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s1">&#39;MAE=</span><span class="si">{</span><span class="n">results</span><span class="si">:</span><span class="s1">.5f</span><span class="si">}</span><span class="s1">&#39;</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s1">&#39;MAE=</span><span class="si">{</span><span class="n">mae_score</span><span class="si">:</span><span class="s1">.5f</span><span class="si">}</span><span class="s1">&#39;</span><span class="p">)</span>
</pre></div>
</div>
<p>In this example, the system outputs:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>[GridSearchQ]: starting optimization with n_jobs=1
[GridSearchQ]: checking hyperparams={&#39;C&#39;: 0.0001, &#39;class_weight&#39;: &#39;balanced&#39;} got mae score 0.24987
[GridSearchQ]: checking hyperparams={&#39;C&#39;: 0.0001, &#39;class_weight&#39;: None} got mae score 0.48135
[GridSearchQ]: checking hyperparams={&#39;C&#39;: 0.001, &#39;class_weight&#39;: &#39;balanced&#39;} got mae score 0.24866
[...]
[GridSearchQ]: checking hyperparams={&#39;C&#39;: 100000.0, &#39;class_weight&#39;: None} got mae score 0.43676
[GridSearchQ]: optimization finished: best params {&#39;C&#39;: 0.1, &#39;class_weight&#39;: &#39;balanced&#39;} (score=0.19982)
[GridSearchQ]: refitting on the whole development set
model selection ended: best hyper-parameters={&#39;C&#39;: 0.1, &#39;class_weight&#39;: &#39;balanced&#39;}
1010 evaluations will be performed for each combination of hyper-parameters
[artificial sampling protocol] generating predictions: 100%|██████████| 1010/1010 [00:00&lt;00:00, 5005.54it/s]
MAE=0.20342
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="p">[</span><span class="n">GridSearchQ</span><span class="p">]:</span> <span class="n">starting</span> <span class="n">model</span> <span class="n">selection</span> <span class="k">with</span> <span class="bp">self</span><span class="o">.</span><span class="n">n_jobs</span> <span class="o">=-</span><span class="mi">1</span>
<span class="p">[</span><span class="n">GridSearchQ</span><span class="p">]:</span> <span class="n">hyperparams</span><span class="o">=</span><span class="p">{</span><span class="s1">&#39;classifier__C&#39;</span><span class="p">:</span> <span class="mf">0.01</span><span class="p">,</span> <span class="s1">&#39;nbins&#39;</span><span class="p">:</span> <span class="mi">64</span><span class="p">}</span> <span class="n">got</span> <span class="n">mae</span> <span class="n">score</span> <span class="mf">0.04021</span> <span class="p">[</span><span class="n">took</span> <span class="mf">1.1356</span><span class="n">s</span><span class="p">]</span>
<span class="p">[</span><span class="n">GridSearchQ</span><span class="p">]:</span> <span class="n">hyperparams</span><span class="o">=</span><span class="p">{</span><span class="s1">&#39;classifier__C&#39;</span><span class="p">:</span> <span class="mf">0.01</span><span class="p">,</span> <span class="s1">&#39;nbins&#39;</span><span class="p">:</span> <span class="mi">32</span><span class="p">}</span> <span class="n">got</span> <span class="n">mae</span> <span class="n">score</span> <span class="mf">0.04286</span> <span class="p">[</span><span class="n">took</span> <span class="mf">1.2139</span><span class="n">s</span><span class="p">]</span>
<span class="p">[</span><span class="n">GridSearchQ</span><span class="p">]:</span> <span class="n">hyperparams</span><span class="o">=</span><span class="p">{</span><span class="s1">&#39;classifier__C&#39;</span><span class="p">:</span> <span class="mf">0.01</span><span class="p">,</span> <span class="s1">&#39;nbins&#39;</span><span class="p">:</span> <span class="mi">16</span><span class="p">}</span> <span class="n">got</span> <span class="n">mae</span> <span class="n">score</span> <span class="mf">0.04888</span> <span class="p">[</span><span class="n">took</span> <span class="mf">1.2491</span><span class="n">s</span><span class="p">]</span>
<span class="p">[</span><span class="n">GridSearchQ</span><span class="p">]:</span> <span class="n">hyperparams</span><span class="o">=</span><span class="p">{</span><span class="s1">&#39;classifier__C&#39;</span><span class="p">:</span> <span class="mf">0.001</span><span class="p">,</span> <span class="s1">&#39;nbins&#39;</span><span class="p">:</span> <span class="mi">8</span><span class="p">}</span> <span class="n">got</span> <span class="n">mae</span> <span class="n">score</span> <span class="mf">0.05163</span> <span class="p">[</span><span class="n">took</span> <span class="mf">1.5372</span><span class="n">s</span><span class="p">]</span>
<span class="p">[</span><span class="o">...</span><span class="p">]</span>
<span class="p">[</span><span class="n">GridSearchQ</span><span class="p">]:</span> <span class="n">hyperparams</span><span class="o">=</span><span class="p">{</span><span class="s1">&#39;classifier__C&#39;</span><span class="p">:</span> <span class="mf">1000.0</span><span class="p">,</span> <span class="s1">&#39;nbins&#39;</span><span class="p">:</span> <span class="mi">32</span><span class="p">}</span> <span class="n">got</span> <span class="n">mae</span> <span class="n">score</span> <span class="mf">0.02445</span> <span class="p">[</span><span class="n">took</span> <span class="mf">2.9056</span><span class="n">s</span><span class="p">]</span>
<span class="p">[</span><span class="n">GridSearchQ</span><span class="p">]:</span> <span class="n">optimization</span> <span class="n">finished</span><span class="p">:</span> <span class="n">best</span> <span class="n">params</span> <span class="p">{</span><span class="s1">&#39;classifier__C&#39;</span><span class="p">:</span> <span class="mf">100.0</span><span class="p">,</span> <span class="s1">&#39;nbins&#39;</span><span class="p">:</span> <span class="mi">32</span><span class="p">}</span> <span class="p">(</span><span class="n">score</span><span class="o">=</span><span class="mf">0.02234</span><span class="p">)</span> <span class="p">[</span><span class="n">took</span> <span class="mf">7.3114</span><span class="n">s</span><span class="p">]</span>
<span class="p">[</span><span class="n">GridSearchQ</span><span class="p">]:</span> <span class="n">refitting</span> <span class="n">on</span> <span class="n">the</span> <span class="n">whole</span> <span class="n">development</span> <span class="nb">set</span>
<span class="n">model</span> <span class="n">selection</span> <span class="n">ended</span><span class="p">:</span> <span class="n">best</span> <span class="n">hyper</span><span class="o">-</span><span class="n">parameters</span><span class="o">=</span><span class="p">{</span><span class="s1">&#39;classifier__C&#39;</span><span class="p">:</span> <span class="mf">100.0</span><span class="p">,</span> <span class="s1">&#39;nbins&#39;</span><span class="p">:</span> <span class="mi">32</span><span class="p">}</span>
<span class="n">MAE</span><span class="o">=</span><span class="mf">0.03102</span>
</pre></div>
</div>
<p>The parameter <em>val_split</em> can alternatively be used to indicate
@ -145,9 +166,9 @@ a validation set (i.e., an instance of <em>LabelledCollection</em>) instead
of a proportion. This could be useful if one wants to have control
on the specific data split to be used across different model selection
experiments.</p>
</div>
<div class="section" id="targeting-a-classification-oriented-loss">
<h2>Targeting a Classification-oriented loss<a class="headerlink" href="#targeting-a-classification-oriented-loss" title="Permalink to this headline"></a></h2>
</section>
<section id="targeting-a-classification-oriented-loss">
<h2>Targeting a Classification-oriented loss<a class="headerlink" href="#targeting-a-classification-oriented-loss" title="Permalink to this heading"></a></h2>
<p>Optimizing a model for quantification could rather be
computationally costly.
In aggregative methods, one could alternatively try to optimize
@ -161,32 +182,15 @@ The following code illustrates how to do that:</p>
<span class="n">LogisticRegression</span><span class="p">(),</span>
<span class="n">param_grid</span><span class="o">=</span><span class="p">{</span><span class="s1">&#39;C&#39;</span><span class="p">:</span> <span class="n">np</span><span class="o">.</span><span class="n">logspace</span><span class="p">(</span><span class="o">-</span><span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">10</span><span class="p">),</span> <span class="s1">&#39;class_weight&#39;</span><span class="p">:</span> <span class="p">[</span><span class="s1">&#39;balanced&#39;</span><span class="p">,</span> <span class="kc">None</span><span class="p">]},</span>
<span class="n">cv</span><span class="o">=</span><span class="mi">5</span><span class="p">)</span>
<span class="n">model</span> <span class="o">=</span> <span class="n">PCC</span><span class="p">(</span><span class="n">learner</span><span class="p">)</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">dataset</span><span class="o">.</span><span class="n">training</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s1">&#39;model selection ended: best hyper-parameters=</span><span class="si">{</span><span class="n">model</span><span class="o">.</span><span class="n">learner</span><span class="o">.</span><span class="n">best_params_</span><span class="si">}</span><span class="s1">&#39;</span><span class="p">)</span>
<span class="n">model</span> <span class="o">=</span> <span class="n">DistributionMatching</span><span class="p">(</span><span class="n">learner</span><span class="p">)</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">dataset</span><span class="o">.</span><span class="n">training</span><span class="p">)</span>
</pre></div>
</div>
<p>In this example, the system outputs:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>model selection ended: best hyper-parameters={&#39;C&#39;: 10000.0, &#39;class_weight&#39;: None}
1010 evaluations will be performed for each combination of hyper-parameters
[artificial sampling protocol] generating predictions: 100%|██████████| 1010/1010 [00:00&lt;00:00, 5379.55it/s]
MAE=0.41734
</pre></div>
</div>
<p>Note that the MAE is worse than the one we obtained when optimizing
for quantification and, indeed, the hyper-parameters found optimal
largely differ between the two selection modalities. The
hyper-parameters C=10000 and class_weight=None have been found
to work well for the specific training prevalence of the HP dataset,
but these hyper-parameters turned out to be suboptimal when the
class prevalences of the test set differs (as is indeed tested
in scenarios of quantification).</p>
<p>This is, however, not always the case, and one could, in practice,
find examples
in which optimizing for classification ends up resulting in a better
quantifier than when optimizing for quantification.
Nonetheless, this is theoretically unlikely to happen.</p>
</div>
</div>
<p>However, this is conceptually flawed, since the model should be
optimized for the task at hand (quantification), and not for a surrogate task (classification),
i.e., the model should be requested to deliver low quantification errors, rather
than low classification errors.</p>
</section>
</section>
<div class="clearer"></div>
@ -195,8 +199,9 @@ Nonetheless, this is theoretically unlikely to happen.</p>
</div>
<div class="sphinxsidebar" role="navigation" aria-label="main navigation">
<div class="sphinxsidebarwrapper">
<h3><a href="index.html">Table of Contents</a></h3>
<ul>
<div>
<h3><a href="index.html">Table of Contents</a></h3>
<ul>
<li><a class="reference internal" href="#">Model Selection</a><ul>
<li><a class="reference internal" href="#targeting-a-quantification-oriented-loss">Targeting a Quantification-oriented loss</a></li>
<li><a class="reference internal" href="#targeting-a-classification-oriented-loss">Targeting a Classification-oriented loss</a></li>
@ -204,6 +209,17 @@ Nonetheless, this is theoretically unlikely to happen.</p>
</li>
</ul>
</div>
<div>
<h4>Previous topic</h4>
<p class="topless"><a href="Methods.html"
title="previous chapter">Quantification Methods</a></p>
</div>
<div>
<h4>Next topic</h4>
<p class="topless"><a href="Plotting.html"
title="next chapter">Plotting</a></p>
</div>
<div role="note" aria-label="source link">
<h3>This Page</h3>
<ul class="this-page-menu">
@ -220,7 +236,7 @@ Nonetheless, this is theoretically unlikely to happen.</p>
</form>
</div>
</div>
<script>$('#searchbox').show(0);</script>
<script>document.getElementById('searchbox').style.display = "block"</script>
</div>
</div>
<div class="clearer"></div>
@ -234,13 +250,19 @@ Nonetheless, this is theoretically unlikely to happen.</p>
<li class="right" >
<a href="py-modindex.html" title="Python Module Index"
>modules</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.6 documentation</a> &#187;</li>
<li class="right" >
<a href="Plotting.html" title="Plotting"
>next</a> |</li>
<li class="right" >
<a href="Methods.html" title="Quantification Methods"
>previous</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.7 documentation</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">Model Selection</a></li>
</ul>
</div>
<div class="footer" role="contentinfo">
&#169; Copyright 2021, Alejandro Moreo.
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 4.2.0.
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 5.3.0.
</div>
</body>
</html>

View File

@ -2,23 +2,26 @@
<!doctype html>
<html>
<html lang="en">
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Plotting &#8212; QuaPy 0.1.6 documentation</title>
<meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="generator" content="Docutils 0.19: https://docutils.sourceforge.io/" />
<title>Plotting &#8212; QuaPy 0.1.7 documentation</title>
<link rel="stylesheet" type="text/css" href="_static/pygments.css" />
<link rel="stylesheet" type="text/css" href="_static/bizstyle.css" />
<script data-url_root="./" id="documentation_options" src="_static/documentation_options.js"></script>
<script src="_static/jquery.js"></script>
<script src="_static/underscore.js"></script>
<script src="_static/_sphinx_javascript_frameworks_compat.js"></script>
<script src="_static/doctools.js"></script>
<script src="_static/sphinx_highlight.js"></script>
<script src="_static/bizstyle.js"></script>
<link rel="index" title="Index" href="genindex.html" />
<link rel="search" title="Search" href="search.html" />
<link rel="next" title="quapy" href="modules.html" />
<link rel="prev" title="Quantification Methods" href="Methods.html" />
<link rel="prev" title="Model Selection" href="Model-Selection.html" />
<meta name="viewport" content="width=device-width,initial-scale=1.0" />
<!--[if lt IE 9]>
<script src="_static/css3-mediaqueries.js"></script>
@ -37,9 +40,9 @@
<a href="modules.html" title="quapy"
accesskey="N">next</a> |</li>
<li class="right" >
<a href="Methods.html" title="Quantification Methods"
<a href="Model-Selection.html" title="Model Selection"
accesskey="P">previous</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.6 documentation</a> &#187;</li>
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.7 documentation</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">Plotting</a></li>
</ul>
</div>
@ -49,8 +52,8 @@
<div class="bodywrapper">
<div class="body" role="main">
<div class="tex2jax_ignore mathjax_ignore section" id="plotting">
<h1>Plotting<a class="headerlink" href="#plotting" title="Permalink to this headline"></a></h1>
<section id="plotting">
<h1>Plotting<a class="headerlink" href="#plotting" title="Permalink to this heading"></a></h1>
<p>The module <em>qp.plot</em> implements some basic plotting functions
that can help analyse the performance of a quantification method.</p>
<p>All plotting functions receive as inputs the outcomes of
@ -91,7 +94,7 @@ quantification methods across different scenarios showcasing
the accuracy of the quantifier in predicting class prevalences
for a wide range of prior distributions. This can easily be
achieved by means of the
<a class="reference external" href="https://github.com/HLT-ISTI/QuaPy/wiki/Evaluation">artificial sampling protocol</a>
<a class="reference external" href="https://github.com/HLT-ISTI/QuaPy/wiki/Protocols">artificial sampling protocol</a>
that is implemented in QuaPy.</p>
<p>The following code shows how to perform one simple experiment
in which the 4 <em>CC-variants</em>, all equipped with a linear SVM, are
@ -100,6 +103,7 @@ tested across the entire spectrum of class priors (taking 21 splits
of the interval [0,1], i.e., using prevalence steps of 0.05, and
generating 100 random samples at each prevalence).</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">quapy</span> <span class="k">as</span> <span class="nn">qp</span>
<span class="kn">from</span> <span class="nn">protocol</span> <span class="kn">import</span> <span class="n">APP</span>
<span class="kn">from</span> <span class="nn">quapy.method.aggregative</span> <span class="kn">import</span> <span class="n">CC</span><span class="p">,</span> <span class="n">ACC</span><span class="p">,</span> <span class="n">PCC</span><span class="p">,</span> <span class="n">PACC</span>
<span class="kn">from</span> <span class="nn">sklearn.svm</span> <span class="kn">import</span> <span class="n">LinearSVC</span>
@ -108,28 +112,26 @@ generating 100 random samples at each prevalence).</p>
<span class="k">def</span> <span class="nf">gen_data</span><span class="p">():</span>
<span class="k">def</span> <span class="nf">base_classifier</span><span class="p">():</span>
<span class="k">return</span> <span class="n">LinearSVC</span><span class="p">()</span>
<span class="k">return</span> <span class="n">LinearSVC</span><span class="p">(</span><span class="n">class_weight</span><span class="o">=</span><span class="s1">&#39;balanced&#39;</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">models</span><span class="p">():</span>
<span class="k">yield</span> <span class="n">CC</span><span class="p">(</span><span class="n">base_classifier</span><span class="p">())</span>
<span class="k">yield</span> <span class="n">ACC</span><span class="p">(</span><span class="n">base_classifier</span><span class="p">())</span>
<span class="k">yield</span> <span class="n">PCC</span><span class="p">(</span><span class="n">base_classifier</span><span class="p">())</span>
<span class="k">yield</span> <span class="n">PACC</span><span class="p">(</span><span class="n">base_classifier</span><span class="p">())</span>
<span class="k">yield</span> <span class="s1">&#39;CC&#39;</span><span class="p">,</span> <span class="n">CC</span><span class="p">(</span><span class="n">base_classifier</span><span class="p">())</span>
<span class="k">yield</span> <span class="s1">&#39;ACC&#39;</span><span class="p">,</span> <span class="n">ACC</span><span class="p">(</span><span class="n">base_classifier</span><span class="p">())</span>
<span class="k">yield</span> <span class="s1">&#39;PCC&#39;</span><span class="p">,</span> <span class="n">PCC</span><span class="p">(</span><span class="n">base_classifier</span><span class="p">())</span>
<span class="k">yield</span> <span class="s1">&#39;PACC&#39;</span><span class="p">,</span> <span class="n">PACC</span><span class="p">(</span><span class="n">base_classifier</span><span class="p">())</span>
<span class="n">data</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">datasets</span><span class="o">.</span><span class="n">fetch_reviews</span><span class="p">(</span><span class="s1">&#39;kindle&#39;</span><span class="p">,</span> <span class="n">tfidf</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">min_df</span><span class="o">=</span><span class="mi">5</span><span class="p">)</span>
<span class="n">train</span><span class="p">,</span> <span class="n">test</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">datasets</span><span class="o">.</span><span class="n">fetch_reviews</span><span class="p">(</span><span class="s1">&#39;kindle&#39;</span><span class="p">,</span> <span class="n">tfidf</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">min_df</span><span class="o">=</span><span class="mi">5</span><span class="p">)</span><span class="o">.</span><span class="n">train_test</span>
<span class="n">method_names</span><span class="p">,</span> <span class="n">true_prevs</span><span class="p">,</span> <span class="n">estim_prevs</span><span class="p">,</span> <span class="n">tr_prevs</span> <span class="o">=</span> <span class="p">[],</span> <span class="p">[],</span> <span class="p">[],</span> <span class="p">[]</span>
<span class="k">for</span> <span class="n">model</span> <span class="ow">in</span> <span class="n">models</span><span class="p">():</span>
<span class="n">model</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">data</span><span class="o">.</span><span class="n">training</span><span class="p">)</span>
<span class="n">true_prev</span><span class="p">,</span> <span class="n">estim_prev</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">evaluation</span><span class="o">.</span><span class="n">artificial_sampling_prediction</span><span class="p">(</span>
<span class="n">model</span><span class="p">,</span> <span class="n">data</span><span class="o">.</span><span class="n">test</span><span class="p">,</span> <span class="n">qp</span><span class="o">.</span><span class="n">environ</span><span class="p">[</span><span class="s1">&#39;SAMPLE_SIZE&#39;</span><span class="p">],</span> <span class="n">n_repetitions</span><span class="o">=</span><span class="mi">100</span><span class="p">,</span> <span class="n">n_prevpoints</span><span class="o">=</span><span class="mi">21</span>
<span class="p">)</span>
<span class="k">for</span> <span class="n">method_name</span><span class="p">,</span> <span class="n">model</span> <span class="ow">in</span> <span class="n">models</span><span class="p">():</span>
<span class="n">model</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">train</span><span class="p">)</span>
<span class="n">true_prev</span><span class="p">,</span> <span class="n">estim_prev</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">evaluation</span><span class="o">.</span><span class="n">prediction</span><span class="p">(</span><span class="n">model</span><span class="p">,</span> <span class="n">APP</span><span class="p">(</span><span class="n">test</span><span class="p">,</span> <span class="n">repeats</span><span class="o">=</span><span class="mi">100</span><span class="p">,</span> <span class="n">random_state</span><span class="o">=</span><span class="mi">0</span><span class="p">))</span>
<span class="n">method_names</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">model</span><span class="o">.</span><span class="vm">__class__</span><span class="o">.</span><span class="vm">__name__</span><span class="p">)</span>
<span class="n">method_names</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">method_name</span><span class="p">)</span>
<span class="n">true_prevs</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">true_prev</span><span class="p">)</span>
<span class="n">estim_prevs</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">estim_prev</span><span class="p">)</span>
<span class="n">tr_prevs</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">data</span><span class="o">.</span><span class="n">training</span><span class="o">.</span><span class="n">prevalence</span><span class="p">())</span>
<span class="n">tr_prevs</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">train</span><span class="o">.</span><span class="n">prevalence</span><span class="p">())</span>
<span class="k">return</span> <span class="n">method_names</span><span class="p">,</span> <span class="n">true_prevs</span><span class="p">,</span> <span class="n">estim_prevs</span><span class="p">,</span> <span class="n">tr_prevs</span>
@ -137,8 +139,8 @@ generating 100 random samples at each prevalence).</p>
</pre></div>
</div>
<p>the plots that can be generated are explained below.</p>
<div class="section" id="diagonal-plot">
<h2>Diagonal Plot<a class="headerlink" href="#diagonal-plot" title="Permalink to this headline"></a></h2>
<section id="diagonal-plot">
<h2>Diagonal Plot<a class="headerlink" href="#diagonal-plot" title="Permalink to this heading"></a></h2>
<p>The <em>diagonal</em> plot shows a very insightful view of the
quantifiers performance. It plots the predicted class
prevalence (in the y-axis) against the true class prevalence
@ -164,9 +166,9 @@ the complete list of arguments in the documentation).</p>
<p>Finally, note how most quantifiers, and specially the “unadjusted”
variants CC and PCC, are strongly biased towards the
prevalence seen during training.</p>
</div>
<div class="section" id="quantification-bias">
<h2>Quantification bias<a class="headerlink" href="#quantification-bias" title="Permalink to this headline"></a></h2>
</section>
<section id="quantification-bias">
<h2>Quantification bias<a class="headerlink" href="#quantification-bias" title="Permalink to this heading"></a></h2>
<p>This plot aims at evincing the bias that any quantifier
displays with respect to the training prevalences by
means of <a class="reference external" href="https://en.wikipedia.org/wiki/Box_plot">box plots</a>.
@ -196,21 +198,19 @@ IMDb dataset, and generate the bias plot again.
This example can be run by rewritting the <em>gen_data()</em> function
like this:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">gen_data</span><span class="p">():</span>
<span class="n">data</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">datasets</span><span class="o">.</span><span class="n">fetch_reviews</span><span class="p">(</span><span class="s1">&#39;imdb&#39;</span><span class="p">,</span> <span class="n">tfidf</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">min_df</span><span class="o">=</span><span class="mi">5</span><span class="p">)</span>
<span class="n">train</span><span class="p">,</span> <span class="n">test</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">datasets</span><span class="o">.</span><span class="n">fetch_reviews</span><span class="p">(</span><span class="s1">&#39;imdb&#39;</span><span class="p">,</span> <span class="n">tfidf</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">min_df</span><span class="o">=</span><span class="mi">5</span><span class="p">)</span><span class="o">.</span><span class="n">train_test</span>
<span class="n">model</span> <span class="o">=</span> <span class="n">CC</span><span class="p">(</span><span class="n">LinearSVC</span><span class="p">())</span>
<span class="n">method_data</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">for</span> <span class="n">training_prevalence</span> <span class="ow">in</span> <span class="n">np</span><span class="o">.</span><span class="n">linspace</span><span class="p">(</span><span class="mf">0.1</span><span class="p">,</span> <span class="mf">0.9</span><span class="p">,</span> <span class="mi">9</span><span class="p">):</span>
<span class="n">training_size</span> <span class="o">=</span> <span class="mi">5000</span>
<span class="c1"># since the problem is binary, it suffices to specify the negative prevalence (the positive is constrained)</span>
<span class="n">training</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">training</span><span class="o">.</span><span class="n">sampling</span><span class="p">(</span><span class="n">training_size</span><span class="p">,</span> <span class="mi">1</span> <span class="o">-</span> <span class="n">training_prevalence</span><span class="p">)</span>
<span class="n">model</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">training</span><span class="p">)</span>
<span class="n">true_prev</span><span class="p">,</span> <span class="n">estim_prev</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">evaluation</span><span class="o">.</span><span class="n">artificial_sampling_prediction</span><span class="p">(</span>
<span class="n">model</span><span class="p">,</span> <span class="n">data</span><span class="o">.</span><span class="n">sample</span><span class="p">,</span> <span class="n">qp</span><span class="o">.</span><span class="n">environ</span><span class="p">[</span><span class="s1">&#39;SAMPLE_SIZE&#39;</span><span class="p">],</span> <span class="n">n_repetitions</span><span class="o">=</span><span class="mi">100</span><span class="p">,</span> <span class="n">n_prevpoints</span><span class="o">=</span><span class="mi">21</span>
<span class="p">)</span>
<span class="c1"># method names can contain Latex syntax</span>
<span class="n">method_name</span> <span class="o">=</span> <span class="s1">&#39;CC$_{&#39;</span> <span class="o">+</span> <span class="sa">f</span><span class="s1">&#39;</span><span class="si">{</span><span class="nb">int</span><span class="p">(</span><span class="mi">100</span> <span class="o">*</span> <span class="n">training_prevalence</span><span class="p">)</span><span class="si">}</span><span class="s1">&#39;</span> <span class="o">+</span> <span class="s1">&#39;\%}$&#39;</span>
<span class="n">method_data</span><span class="o">.</span><span class="n">append</span><span class="p">((</span><span class="n">method_name</span><span class="p">,</span> <span class="n">true_prev</span><span class="p">,</span> <span class="n">estim_prev</span><span class="p">,</span> <span class="n">training</span><span class="o">.</span><span class="n">prevalence</span><span class="p">()))</span>
<span class="c1"># since the problem is binary, it suffices to specify the negative prevalence, since the positive is constrained</span>
<span class="n">train_sample</span> <span class="o">=</span> <span class="n">train</span><span class="o">.</span><span class="n">sampling</span><span class="p">(</span><span class="n">training_size</span><span class="p">,</span> <span class="mi">1</span><span class="o">-</span><span class="n">training_prevalence</span><span class="p">)</span>
<span class="n">model</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">train_sample</span><span class="p">)</span>
<span class="n">true_prev</span><span class="p">,</span> <span class="n">estim_prev</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">evaluation</span><span class="o">.</span><span class="n">prediction</span><span class="p">(</span><span class="n">model</span><span class="p">,</span> <span class="n">APP</span><span class="p">(</span><span class="n">test</span><span class="p">,</span> <span class="n">repeats</span><span class="o">=</span><span class="mi">100</span><span class="p">,</span> <span class="n">random_state</span><span class="o">=</span><span class="mi">0</span><span class="p">))</span>
<span class="n">method_name</span> <span class="o">=</span> <span class="s1">&#39;CC$_{&#39;</span><span class="o">+</span><span class="sa">f</span><span class="s1">&#39;</span><span class="si">{</span><span class="nb">int</span><span class="p">(</span><span class="mi">100</span><span class="o">*</span><span class="n">training_prevalence</span><span class="p">)</span><span class="si">}</span><span class="s1">&#39;</span> <span class="o">+</span> <span class="s1">&#39;\%}$&#39;</span>
<span class="n">method_data</span><span class="o">.</span><span class="n">append</span><span class="p">((</span><span class="n">method_name</span><span class="p">,</span> <span class="n">true_prev</span><span class="p">,</span> <span class="n">estim_prev</span><span class="p">,</span> <span class="n">train_sample</span><span class="o">.</span><span class="n">prevalence</span><span class="p">()))</span>
<span class="k">return</span> <span class="nb">zip</span><span class="p">(</span><span class="o">*</span><span class="n">method_data</span><span class="p">)</span>
</pre></div>
@ -237,9 +237,9 @@ and a negative bias (or a tendency to underestimate) in cases of high prevalence
<p><img alt="diag plot on IMDb" src="_images/bin_diag_cc.png" /></p>
<p>showing pretty clearly the dependency of CC on the prior probabilities
of the labeled set it was trained on.</p>
</div>
<div class="section" id="error-by-drift">
<h2>Error by Drift<a class="headerlink" href="#error-by-drift" title="Permalink to this headline"></a></h2>
</section>
<section id="error-by-drift">
<h2>Error by Drift<a class="headerlink" href="#error-by-drift" title="Permalink to this heading"></a></h2>
<p>Above discussed plots are useful for analyzing and comparing
the performance of different quantification methods, but are
limited to the binary case. The “error by drift” is a plot
@ -270,8 +270,8 @@ In those cases, however, it is likely that the variances of each
method get higher, to the detriment of the visualization.
We recommend to set <em>show_std=False</em> in those cases
in order to hide the color bands.</p>
</div>
</div>
</section>
</section>
<div class="clearer"></div>
@ -280,8 +280,9 @@ in order to hide the color bands.</p>
</div>
<div class="sphinxsidebar" role="navigation" aria-label="main navigation">
<div class="sphinxsidebarwrapper">
<h3><a href="index.html">Table of Contents</a></h3>
<ul>
<div>
<h3><a href="index.html">Table of Contents</a></h3>
<ul>
<li><a class="reference internal" href="#">Plotting</a><ul>
<li><a class="reference internal" href="#diagonal-plot">Diagonal Plot</a></li>
<li><a class="reference internal" href="#quantification-bias">Quantification bias</a></li>
@ -290,12 +291,17 @@ in order to hide the color bands.</p>
</li>
</ul>
<h4>Previous topic</h4>
<p class="topless"><a href="Methods.html"
title="previous chapter">Quantification Methods</a></p>
<h4>Next topic</h4>
<p class="topless"><a href="modules.html"
title="next chapter">quapy</a></p>
</div>
<div>
<h4>Previous topic</h4>
<p class="topless"><a href="Model-Selection.html"
title="previous chapter">Model Selection</a></p>
</div>
<div>
<h4>Next topic</h4>
<p class="topless"><a href="modules.html"
title="next chapter">quapy</a></p>
</div>
<div role="note" aria-label="source link">
<h3>This Page</h3>
<ul class="this-page-menu">
@ -312,7 +318,7 @@ in order to hide the color bands.</p>
</form>
</div>
</div>
<script>$('#searchbox').show(0);</script>
<script>document.getElementById('searchbox').style.display = "block"</script>
</div>
</div>
<div class="clearer"></div>
@ -330,15 +336,15 @@ in order to hide the color bands.</p>
<a href="modules.html" title="quapy"
>next</a> |</li>
<li class="right" >
<a href="Methods.html" title="Quantification Methods"
<a href="Model-Selection.html" title="Model Selection"
>previous</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.6 documentation</a> &#187;</li>
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.7 documentation</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">Plotting</a></li>
</ul>
</div>
<div class="footer" role="contentinfo">
&#169; Copyright 2021, Alejandro Moreo.
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 4.2.0.
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 5.3.0.
</div>
</body>
</html>

View File

@ -30,7 +30,8 @@ Output the class prevalences (showing 2 digit precision):
[0.17, 0.50, 0.33]
```
One can easily produce new samples at desired class prevalences:
One can easily produce new samples at desired class prevalence values:
```python
sample_size = 10
prev = [0.4, 0.1, 0.5]
@ -63,32 +64,10 @@ for method in methods:
...
```
QuaPy also implements the artificial sampling protocol that produces (via a
Python's generator) a series of _LabelledCollection_ objects with equidistant
prevalences ranging across the entire prevalence spectrum in the simplex space, e.g.:
```python
for sample in data.artificial_sampling_generator(sample_size=100, n_prevalences=5):
print(F.strprev(sample.prevalence(), prec=2))
```
produces one sampling for each (valid) combination of prevalences originating from
splitting the range [0,1] into n_prevalences=5 points (i.e., [0, 0.25, 0.5, 0.75, 1]),
that is:
```
[0.00, 0.00, 1.00]
[0.00, 0.25, 0.75]
[0.00, 0.50, 0.50]
[0.00, 0.75, 0.25]
[0.00, 1.00, 0.00]
[0.25, 0.00, 0.75]
...
[1.00, 0.00, 0.00]
```
See the [Evaluation wiki](https://github.com/HLT-ISTI/QuaPy/wiki/Evaluation) for
further details on how to use the artificial sampling protocol to properly
evaluate a quantification method.
However, generating samples for evaluation purposes is tackled in QuaPy
by means of the evaluation protocols (see the dedicated entries in the Wiki
for [evaluation](https://github.com/HLT-ISTI/QuaPy/wiki/Evaluation) and
[protocols](https://github.com/HLT-ISTI/QuaPy/wiki/Protocols)).
## Reviews Datasets
@ -177,6 +156,8 @@ Some details can be found below:
| sst | 3 | 2971 | 1271 | 376132 | [0.261, 0.452, 0.288] | [0.207, 0.481, 0.312] | sparse |
| wa | 3 | 2184 | 936 | 248563 | [0.305, 0.414, 0.281] | [0.282, 0.446, 0.272] | sparse |
| wb | 3 | 4259 | 1823 | 404333 | [0.270, 0.392, 0.337] | [0.274, 0.392, 0.335] | sparse |
## UCI Machine Learning
A set of 32 datasets from the [UCI Machine Learning repository](https://archive.ics.uci.edu/ml/datasets.php)
@ -272,6 +253,46 @@ standard Pythons packages like gzip or zip. This file would need to be uncompres
OS-dependent software manually. Information on how to do it will be printed the first
time the dataset is invoked.
## LeQua Datasets
QuaPy also provides the datasets used for the LeQua competition.
In brief, there are 4 tasks (T1A, T1B, T2A, T2B) having to do with text quantification
problems. Tasks T1A and T1B provide documents in vector form, while T2A and T2B provide
raw documents instead.
Tasks T1A and T2A are binary sentiment quantification problems, while T2A and T2B
are multiclass quantification problems consisting of estimating the class prevalence
values of 28 different merchandise products.
Every task consists of a training set, a set of validation samples (for model selection)
and a set of test samples (for evaluation). QuaPy returns this data as a LabelledCollection
(training) and two generation protocols (for validation and test samples), as follows:
```python
training, val_generator, test_generator = fetch_lequa2022(task=task)
```
See the `lequa2022_experiments.py` in the examples folder for further details on how to
carry out experiments using these datasets.
The datasets are downloaded only once, and stored for fast reuse.
Some statistics are summarized below:
| Dataset | classes | train size | validation samples | test samples | docs by sample | type |
|---------|:-------:|:----------:|:------------------:|:------------:|:----------------:|:--------:|
| T1A | 2 | 5000 | 1000 | 5000 | 250 | vector |
| T1B | 28 | 20000 | 1000 | 5000 | 1000 | vector |
| T2A | 2 | 5000 | 1000 | 5000 | 250 | text |
| T2B | 28 | 20000 | 1000 | 5000 | 1000 | text |
For further details on the datasets, we refer to the original
[paper](https://ceur-ws.org/Vol-3180/paper-146.pdf):
```
Esuli, A., Moreo, A., Sebastiani, F., & Sperduti, G. (2022).
A Detailed Overview of LeQua@ CLEF 2022: Learning to Quantify.
```
## Adding Custom Datasets
QuaPy provides data loaders for simple formats dealing with
@ -312,12 +333,15 @@ e.g.:
```python
import quapy as qp
train_path = '../my_data/train.dat'
test_path = '../my_data/test.dat'
def my_custom_loader(path):
with open(path, 'rb') as fin:
...
return instances, labels
data = qp.data.Dataset.load(train_path, test_path, my_custom_loader)
```

View File

@ -50,7 +50,7 @@ indicating the value for the smoothing parameter epsilon.
Traditionally, this value is set to 1/(2T) in past literature,
with T the sampling size. One could either pass this value
to the function each time, or to set a QuaPy's environment
variable _SAMPLE_SIZE_ once, and ommit this argument
variable _SAMPLE_SIZE_ once, and omit this argument
thereafter (recommended);
e.g.:
@ -58,7 +58,7 @@ e.g.:
qp.environ['SAMPLE_SIZE'] = 100 # once for all
true_prev = np.asarray([0.5, 0.3, 0.2]) # let's assume 3 classes
estim_prev = np.asarray([0.1, 0.3, 0.6])
error = qp.ae_.mrae(true_prev, estim_prev)
error = qp.error.mrae(true_prev, estim_prev)
print(f'mrae({true_prev}, {estim_prev}) = {error:.3f}')
```
@ -71,162 +71,99 @@ Finally, it is possible to instantiate QuaPy's quantification
error functions from strings using, e.g.:
```python
error_function = qp.ae_.from_name('mse')
error_function = qp.error.from_name('mse')
error = error_function(true_prev, estim_prev)
```
## Evaluation Protocols
QuaPy implements the so-called "artificial sampling protocol",
according to which a test set is used to generate samplings at
desired prevalences of fixed size and covering the full spectrum
of prevalences. This protocol is called "artificial" in contrast
to the "natural prevalence sampling" protocol that,
despite introducing some variability during sampling, approximately
preserves the training class prevalence.
In the artificial sampling procol, the user specifies the number
of (equally distant) points to be generated from the interval [0,1].
For example, if n_prevpoints=11 then, for each class, the prevalences
[0., 0.1, 0.2, ..., 1.] will be used. This means that, for two classes,
the number of different prevalences will be 11 (since, once the prevalence
of one class is determined, the other one is constrained). For 3 classes,
the number of valid combinations can be obtained as 11 + 10 + ... + 1 = 66.
In general, the number of valid combinations that will be produced for a given
value of n_prevpoints can be consulted by invoking
quapy.functional.num_prevalence_combinations, e.g.:
An _evaluation protocol_ is an evaluation procedure that uses
one specific _sample generation procotol_ to genereate many
samples, typically characterized by widely varying amounts of
_shift_ with respect to the original distribution, that are then
used to evaluate the performance of a (trained) quantifier.
These protocols are explained in more detail in a dedicated [entry
in the wiki](Protocols.md). For the moment being, let us assume we already have
chosen and instantiated one specific such protocol, that we here
simply call _prot_. Let also assume our model is called
_quantifier_ and that our evaluatio measure of choice is
_mae_. The evaluation comes down to:
```python
import quapy.functional as F
n_prevpoints = 21
n_classes = 4
n = F.num_prevalence_combinations(n_prevpoints, n_classes, n_repeats=1)
mae = qp.evaluation.evaluate(quantifier, protocol=prot, error_metric='mae')
print(f'MAE = {mae:.4f}')
```
in this example, n=1771. Note the last argument, n_repeats, that
informs of the number of examples that will be generated for any
valid combination (typical values are, e.g., 1 for a single sample,
or 10 or higher for computing standard deviations of performing statistical
significance tests).
One can instead work the other way around, i.e., one could set a
maximum budged of evaluations and get the number of prevalence points that
will generate a number of evaluations close, but not higher, than
the fixed budget. This can be achieved with the function
quapy.functional.get_nprevpoints_approximation, e.g.:
It is often desirable to evaluate our system using more than one
single evaluatio measure. In this case, it is convenient to generate
a _report_. A report in QuaPy is a dataframe accounting for all the
true prevalence values with their corresponding prevalence values
as estimated by the quantifier, along with the error each has given
rise.
```python
budget = 5000
n_prevpoints = F.get_nprevpoints_approximation(budget, n_classes, n_repeats=1)
n = F.num_prevalence_combinations(n_prevpoints, n_classes, n_repeats=1)
print(f'by setting n_prevpoints={n_prevpoints} the number of evaluations for {n_classes} classes will be {n}')
```
that will print:
```
by setting n_prevpoints=30 the number of evaluations for 4 classes will be 4960
report = qp.evaluation.evaluation_report(quantifier, protocol=prot, error_metrics=['mae', 'mrae', 'mkld'])
```
The cost of evaluation will depend on the values of _n_prevpoints_, _n_classes_,
and _n_repeats_. Since it might sometimes be cumbersome to control the overall
cost of an experiment having to do with the number of combinations that
will be generated for a particular setting of these arguments (particularly
when _n_classes>2_), evaluation functions
typically allow the user to rather specify an _evaluation budget_, i.e., a maximum
number of samplings to generate. By specifying this argument, one could avoid
specifying _n_prevpoints_, and the value for it that would lead to a closer
number of evaluation budget, without surpassing it, will be automatically set.
The following script shows a full example in which a PACC model relying
on a Logistic Regressor classifier is
tested on the _kindle_ dataset by means of the artificial prevalence
sampling protocol on samples of size 500, in terms of various
evaluation metrics.
````python
import quapy as qp
import quapy.functional as F
from sklearn.linear_model import LogisticRegression
qp.environ['SAMPLE_SIZE'] = 500
dataset = qp.datasets.fetch_reviews('kindle')
qp.data.preprocessing.text2tfidf(dataset, min_df=5, inplace=True)
training = dataset.training
test = dataset.test
lr = LogisticRegression()
pacc = qp.method.aggregative.PACC(lr)
pacc.fit(training)
df = qp.evaluation.artificial_sampling_report(
pacc, # the quantification method
test, # the test set on which the method will be evaluated
sample_size=qp.environ['SAMPLE_SIZE'], #indicates the size of samples to be drawn
n_prevpoints=11, # how many prevalence points will be extracted from the interval [0, 1] for each category
n_repetitions=1, # number of times each prevalence will be used to generate a test sample
n_jobs=-1, # indicates the number of parallel workers (-1 indicates, as in sklearn, all CPUs)
random_seed=42, # setting a random seed allows to replicate the test samples across runs
error_metrics=['mae', 'mrae', 'mkld'], # specify the evaluation metrics
verbose=True # set to True to show some standard-line outputs
)
````
The resulting report is a pandas' dataframe that can be directly printed.
Here, we set some display options from pandas just to make the output clearer;
note also that the estimated prevalences are shown as strings using the
function strprev function that simply converts a prevalence into a
string representing it, with a fixed decimal precision (default 3):
From a pandas' dataframe, it is straightforward to visualize all the results,
and compute the averaged values, e.g.:
```python
import pandas as pd
pd.set_option('display.expand_frame_repr', False)
pd.set_option("precision", 3)
df['estim-prev'] = df['estim-prev'].map(F.strprev)
print(df)
report['estim-prev'] = report['estim-prev'].map(F.strprev)
print(report)
print('Averaged values:')
print(report.mean())
```
The output should look like:
This will produce an output like:
```
true-prev estim-prev mae mrae mkld
0 [0.0, 1.0] [0.000, 1.000] 0.000 0.000 0.000e+00
1 [0.1, 0.9] [0.091, 0.909] 0.009 0.048 4.426e-04
2 [0.2, 0.8] [0.163, 0.837] 0.037 0.114 4.633e-03
3 [0.3, 0.7] [0.283, 0.717] 0.017 0.041 7.383e-04
4 [0.4, 0.6] [0.366, 0.634] 0.034 0.070 2.412e-03
5 [0.5, 0.5] [0.459, 0.541] 0.041 0.082 3.387e-03
6 [0.6, 0.4] [0.565, 0.435] 0.035 0.073 2.535e-03
7 [0.7, 0.3] [0.654, 0.346] 0.046 0.108 4.701e-03
8 [0.8, 0.2] [0.725, 0.275] 0.075 0.235 1.515e-02
9 [0.9, 0.1] [0.858, 0.142] 0.042 0.229 7.740e-03
10 [1.0, 0.0] [0.945, 0.055] 0.055 27.357 5.219e-02
true-prev estim-prev mae mrae mkld
0 [0.308, 0.692] [0.314, 0.686] 0.005649 0.013182 0.000074
1 [0.896, 0.104] [0.909, 0.091] 0.013145 0.069323 0.000985
2 [0.848, 0.152] [0.809, 0.191] 0.039063 0.149806 0.005175
3 [0.016, 0.984] [0.033, 0.967] 0.017236 0.487529 0.005298
4 [0.728, 0.272] [0.751, 0.249] 0.022769 0.057146 0.001350
... ... ... ... ... ...
4995 [0.72, 0.28] [0.698, 0.302] 0.021752 0.053631 0.001133
4996 [0.868, 0.132] [0.888, 0.112] 0.020490 0.088230 0.001985
4997 [0.292, 0.708] [0.298, 0.702] 0.006149 0.014788 0.000090
4998 [0.24, 0.76] [0.220, 0.780] 0.019950 0.054309 0.001127
4999 [0.948, 0.052] [0.965, 0.035] 0.016941 0.165776 0.003538
[5000 rows x 5 columns]
Averaged values:
mae 0.023588
mrae 0.108779
mkld 0.003631
dtype: float64
Process finished with exit code 0
```
One can get the averaged scores using standard pandas'
functions, i.e.:
Alternatively, we can simply generate all the predictions by:
```python
print(df.mean())
true_prevs, estim_prevs = qp.evaluation.prediction(quantifier, protocol=prot)
```
will produce the following output:
```
true-prev 0.500
mae 0.035
mrae 2.578
mkld 0.009
dtype: float64
```
Other evaluation functions include:
* _artificial_sampling_eval_: that computes the evaluation for a
given evaluation metric, returning the average instead of a dataframe.
* _artificial_sampling_prediction_: that returns two np.arrays containing the
true prevalences and the estimated prevalences.
See the documentation for further details.
All the evaluation functions implement specific optimizations for speeding-up
the evaluation of aggregative quantifiers (i.e., of instances of _AggregativeQuantifier_).
The optimization comes down to generating classification predictions (either crisp or soft)
only once for the entire test set, and then applying the sampling procedure to the
predictions, instead of generating samples of instances and then computing the
classification predictions every time. This is only possible when the protocol
is an instance of _OnLabelledCollectionProtocol_. The optimization is only
carried out when the number of classification predictions thus generated would be
smaller than the number of predictions required for the entire protocol; e.g.,
if the original dataset contains 1M instances, but the protocol is such that it would
at most generate 20 samples of 100 instances, then it would be preferable to postpone the
classification for each sample. This behaviour is indicated by setting
_aggr_speedup="auto"_. Conversely, when indicating _aggr_speedup="force"_ QuaPy will
precompute all the predictions irrespectively of the number of instances and number of samples.
Finally, this can be deactivated by setting _aggr_speedup=False_. Note that this optimization
is not only applied for the final evaluation, but also for the internal evaluations carried
out during _model selection_. Since these are typically many, the heuristic can help reduce the
execution time a lot.

View File

@ -16,12 +16,6 @@ and implement some abstract methods:
@abstractmethod
def quantify(self, instances): ...
@abstractmethod
def set_params(self, **parameters): ...
@abstractmethod
def get_params(self, deep=True): ...
```
The meaning of those functions should be familiar to those
used to work with scikit-learn since the class structure of QuaPy
@ -32,10 +26,10 @@ scikit-learn' structure has not been adopted _as is_ in QuaPy responds to
the fact that scikit-learn's _predict_ function is expected to return
one output for each input element --e.g., a predicted label for each
instance in a sample-- while in quantification the output for a sample
is one single array of class prevalences), while functions _set_params_
and _get_params_ allow a
[model selector](https://github.com/HLT-ISTI/QuaPy/wiki/Model-Selection)
to automate the process of hyperparameter search.
is one single array of class prevalences).
Quantifiers also extend from scikit-learn's `BaseEstimator`, in order
to simplify the use of _set_params_ and _get_params_ used in
[model selector](https://github.com/HLT-ISTI/QuaPy/wiki/Model-Selection).
## Aggregative Methods
@ -58,11 +52,11 @@ of _BaseQuantifier.quantify_ is already provided, which looks like:
```python
def quantify(self, instances):
classif_predictions = self.preclassify(instances)
classif_predictions = self.classify(instances)
return self.aggregate(classif_predictions)
```
Aggregative quantifiers are expected to maintain a classifier (which is
accessed through the _@property_ _learner_). This classifier is
accessed through the _@property_ _classifier_). This classifier is
given as input to the quantifier, and can be already fit
on external data (in which case, the _fit_learner_ argument should
be set to False), or be fit by the quantifier's fit (default).
@ -73,13 +67,8 @@ _AggregativeProbabilisticQuantifier(AggregativeQuantifier)_.
The particularity of _probabilistic_ aggregative methods (w.r.t.
non-probabilistic ones), is that the default quantifier is defined
in terms of the posterior probabilities returned by a probabilistic
classifier, and not by the crisp decisions of a hard classifier; i.e.:
```python
def quantify(self, instances):
classif_posteriors = self.posterior_probabilities(instances)
return self.aggregate(classif_posteriors)
```
classifier, and not by the crisp decisions of a hard classifier.
In any case, the interface _classify(instances)_ remains unchanged.
One advantage of _aggregative_ methods (either probabilistic or not)
is that the evaluation according to any sampling procedure (e.g.,
@ -110,9 +99,7 @@ import quapy as qp
import quapy.functional as F
from sklearn.svm import LinearSVC
dataset = qp.datasets.fetch_twitter('hcr', pickle=True)
training = dataset.training
test = dataset.test
training, test = qp.datasets.fetch_twitter('hcr', pickle=True).train_test
# instantiate a classifier learner, in this case a SVM
svm = LinearSVC()
@ -156,11 +143,12 @@ model.fit(training, val_split=5)
```
The following code illustrates the case in which PCC is used:
```python
model = qp.method.aggregative.PCC(svm)
model.fit(training)
estim_prevalence = model.quantify(test.instances)
print('classifier:', model.learner)
print('classifier:', model.classifier)
```
In this case, QuaPy will print:
```
@ -211,14 +199,22 @@ model.fit(dataset.training)
estim_prevalence = model.quantify(dataset.test.instances)
```
_New in v0.1.7_: EMQ now accepts two new parameters in the construction method, namely
_exact_train_prev_ which allows to use the true training prevalence as the departing
prevalence estimation (default behaviour), or instead an approximation of it as
suggested by [Alexandari et al. (2020)](http://proceedings.mlr.press/v119/alexandari20a.html)
(by setting _exact_train_prev=False_).
The other parameter is _recalib_ which allows to indicate a calibration method, among those
proposed by [Alexandari et al. (2020)](http://proceedings.mlr.press/v119/alexandari20a.html),
including the Bias-Corrected Temperature Scaling, Vector Scaling, etc.
See the API documentation for further details.
### Hellinger Distance y (HDy)
The method HDy is described in:
_Implementation of the method based on the Hellinger Distance y (HDy) proposed by
González-Castro, V., Alaiz-Rodrı́guez, R., and Alegre, E. (2013). Class distribution
estimation based on the Hellinger distance. Information Sciences, 218:146164._
Implementation of the method based on the Hellinger Distance y (HDy) proposed by
[González-Castro, V., Alaiz-Rodrı́guez, R., and Alegre, E. (2013). Class distribution
estimation based on the Hellinger distance. Information Sciences, 218:146164.](https://www.sciencedirect.com/science/article/pii/S0020025512004069)
It is implemented in _qp.method.aggregative.HDy_ (also accessible
through the allias _qp.method.aggregative.HellingerDistanceY_).
@ -249,30 +245,51 @@ model.fit(dataset.training)
estim_prevalence = model.quantify(dataset.test.instances)
```
_New in v0.1.7:_ QuaPy now provides an implementation of the generalized
"Distribution Matching" approaches for multiclass, inspired by the framework
of [Firat (2016)](https://arxiv.org/abs/1606.00868). One can instantiate
a variant of HDy for multiclass quantification as follows:
```python
mutliclassHDy = qp.method.aggregative.DistributionMatching(classifier=LogisticRegression(), divergence='HD', cdf=False)
```
_New in v0.1.7:_ QuaPy now provides an implementation of the "DyS"
framework proposed by [Maletzke et al (2020)](https://ojs.aaai.org/index.php/AAAI/article/view/4376)
and the "SMM" method proposed by [Hassan et al (2019)](https://ieeexplore.ieee.org/document/9260028)
(thanks to _Pablo González_ for the contributions!)
### Threshold Optimization methods
_New in v0.1.7:_ QuaPy now implements Forman's threshold optimization methods;
see, e.g., [(Forman 2006)](https://dl.acm.org/doi/abs/10.1145/1150402.1150423)
and [(Forman 2008)](https://link.springer.com/article/10.1007/s10618-008-0097-y).
These include: T50, MAX, X, Median Sweep (MS), and its variant MS2.
### Explicit Loss Minimization
The Explicit Loss Minimization (ELM) represent a family of methods
based on structured output learning, i.e., quantifiers relying on
classifiers that have been optimized targeting a
quantification-oriented evaluation measure.
The original methods are implemented in QuaPy as classify & count (CC)
quantifiers that use Joachim's [SVMperf](https://www.cs.cornell.edu/people/tj/svm_light/svm_perf.html)
as the underlying classifier, properly set to optimize for the desired loss.
In QuaPy, the following methods, all relying on Joachim's
[SVMperf](https://www.cs.cornell.edu/people/tj/svm_light/svm_perf.html)
implementation, are available in _qp.method.aggregative_:
In QuaPy, this can be more achieved by calling the functions:
* SVMQ (SVM-Q) is a quantification method optimizing the metric _Q_ defined
in _Barranquero, J., Díez, J., and del Coz, J. J. (2015). Quantification-oriented learning based
on reliable classifiers. Pattern Recognition, 48(2):591604._
* SVMKLD (SVM for Kullback-Leibler Divergence) proposed in _Esuli, A. and Sebastiani, F. (2015).
* _newSVMQ_: returns the quantification method called SVM(Q) that optimizes for the metric _Q_ defined
in [_Barranquero, J., Díez, J., and del Coz, J. J. (2015). Quantification-oriented learning based
on reliable classifiers. Pattern Recognition, 48(2):591604._](https://www.sciencedirect.com/science/article/pii/S003132031400291X)
* _newSVMKLD_ and _newSVMNKLD_: returns the quantification method called SVM(KLD) and SVM(nKLD), standing for
Kullback-Leibler Divergence and Normalized Kullback-Leibler Divergence, as proposed in [_Esuli, A. and Sebastiani, F. (2015).
Optimizing text quantifiers for multivariate loss functions.
ACM Transactions on Knowledge Discovery and Data, 9(4):Article 27._
* SVMNKLD (SVM for Normalized Kullback-Leibler Divergence) proposed in _Esuli, A. and Sebastiani, F. (2015).
Optimizing text quantifiers for multivariate loss functions.
ACM Transactions on Knowledge Discovery and Data, 9(4):Article 27._
* SVMAE (SVM for Mean Absolute Error)
* SVMRAE (SVM for Mean Relative Absolute Error)
ACM Transactions on Knowledge Discovery and Data, 9(4):Article 27._](https://dl.acm.org/doi/abs/10.1145/2700406)
* _newSVMAE_ and _newSVMRAE_: returns a quantification method called SVM(AE) and SVM(RAE) that optimizes for the (Mean) Absolute Error and for the
(Mean) Relative Absolute Error, as first used by
[_Moreo, A. and Sebastiani, F. (2021). Tweet sentiment quantification: An experimental re-evaluation. PLOS ONE 17 (9), 1-23._](https://arxiv.org/abs/2011.02552)
the last two methods (SVMAE and SVMRAE) have been implemented in
the last two methods (SVM(AE) and SVM(RAE)) have been implemented in
QuaPy in order to make available ELM variants for what nowadays
are considered the most well-behaved evaluation metrics in quantification.
@ -306,13 +323,18 @@ currently supports only binary classification.
ELM variants (any binary quantifier in general) can be extended
to operate in single-label scenarios trivially by adopting a
"one-vs-all" strategy (as, e.g., in
_Gao, W. and Sebastiani, F. (2016). From classification to quantification in tweet sentiment
analysis. Social Network Analysis and Mining, 6(19):122_).
In QuaPy this is possible by using the _OneVsAll_ class:
[_Gao, W. and Sebastiani, F. (2016). From classification to quantification in tweet sentiment
analysis. Social Network Analysis and Mining, 6(19):122_](https://link.springer.com/article/10.1007/s13278-016-0327-z)).
In QuaPy this is possible by using the _OneVsAll_ class.
There are two ways for instantiating this class, _OneVsAllGeneric_ that works for
any quantifier, and _OneVsAllAggregative_ that is optimized for aggregative quantifiers.
In general, you can simply use the _getOneVsAll_ function and QuaPy will choose
the more convenient of the two.
```python
import quapy as qp
from quapy.method.aggregative import SVMQ, OneVsAll
from quapy.method.aggregative import SVMQ
# load a single-label dataset (this one contains 3 classes)
dataset = qp.datasets.fetch_twitter('hcr', pickle=True)
@ -320,11 +342,14 @@ dataset = qp.datasets.fetch_twitter('hcr', pickle=True)
# let qp know where svmperf is
qp.environ['SVMPERF_HOME'] = '../svm_perf_quantification'
model = OneVsAll(SVMQ(), n_jobs=-1) # run them on parallel
model = getOneVsAll(SVMQ(), n_jobs=-1) # run them on parallel
model.fit(dataset.training)
estim_prevalence = model.quantify(dataset.test.instances)
```
Check the examples _[explicit_loss_minimization.py](..%2Fexamples%2Fexplicit_loss_minimization.py)_
and [one_vs_all.py](..%2Fexamples%2Fone_vs_all.py) for more details.
## Meta Models
By _meta_ models we mean quantification methods that are defined on top of other
@ -337,12 +362,12 @@ _Meta_ models are implemented in the _qp.method.meta_ module.
QuaPy implements (some of) the variants proposed in:
* _Pérez-Gállego, P., Quevedo, J. R., & del Coz, J. J. (2017).
* [_Pérez-Gállego, P., Quevedo, J. R., & del Coz, J. J. (2017).
Using ensembles for problems with characterizable changes in data distribution: A case study on quantification.
Information Fusion, 34, 87-100._
* _Pérez-Gállego, P., Castano, A., Quevedo, J. R., & del Coz, J. J. (2019).
Information Fusion, 34, 87-100._](https://www.sciencedirect.com/science/article/pii/S1566253516300628)
* [_Pérez-Gállego, P., Castano, A., Quevedo, J. R., & del Coz, J. J. (2019).
Dynamic ensemble selection for quantification tasks.
Information Fusion, 45, 1-15._
Information Fusion, 45, 1-15._](https://www.sciencedirect.com/science/article/pii/S1566253517303652)
The following code shows how to instantiate an Ensemble of 30 _Adjusted Classify & Count_ (ACC)
quantifiers operating with a _Logistic Regressor_ (LR) as the base classifier, and using the
@ -378,10 +403,10 @@ wiki if you want to optimize the hyperparameters of ensemble for classification
QuaPy offers an implementation of QuaNet, a deep learning model presented in:
_Esuli, A., Moreo, A., & Sebastiani, F. (2018, October).
[_Esuli, A., Moreo, A., & Sebastiani, F. (2018, October).
A recurrent neural network for sentiment quantification.
In Proceedings of the 27th ACM International Conference on
Information and Knowledge Management (pp. 1775-1778)._
Information and Knowledge Management (pp. 1775-1778)._](https://dl.acm.org/doi/abs/10.1145/3269206.3269287)
This model requires _torch_ to be installed.
QuaNet also requires a classifier that can provide embedded representations
@ -406,7 +431,8 @@ cnn = CNNnet(dataset.vocabulary_size, dataset.n_classes)
learner = NeuralClassifierTrainer(cnn, device='cuda')
# train QuaNet
model = QuaNet(learner, qp.environ['SAMPLE_SIZE'], device='cuda')
model = QuaNet(learner, device='cuda')
model.fit(dataset.training)
estim_prevalence = model.quantify(dataset.test.instances)
```

View File

@ -22,9 +22,9 @@ Quantification has long been regarded as an add-on of
classification, and thus the model selection strategies
customarily adopted in classification have simply been
applied to quantification (see the next section).
It has been argued in _Moreo, Alejandro, and Fabrizio Sebastiani.
"Re-Assessing the" Classify and Count" Quantification Method."
arXiv preprint arXiv:2011.02552 (2020)._
It has been argued in [Moreo, Alejandro, and Fabrizio Sebastiani.
Re-Assessing the "Classify and Count" Quantification Method.
ECIR 2021: Advances in Information Retrieval pp 7591.](https://link.springer.com/chapter/10.1007/978-3-030-72240-1_6)
that specific model selection strategies should
be adopted for quantification. That is, model selection
strategies for quantification should target
@ -32,76 +32,86 @@ quantification-oriented losses and be tested in a variety
of scenarios exhibiting different degrees of prior
probability shift.
The class
_qp.model_selection.GridSearchQ_
implements a grid-search exploration over the space of
hyper-parameter combinations that evaluates each
combination of hyper-parameters
by means of a given quantification-oriented
The class _qp.model_selection.GridSearchQ_ implements a grid-search exploration over the space of
hyper-parameter combinations that [evaluates](https://github.com/HLT-ISTI/QuaPy/wiki/Evaluation)
each combination of hyper-parameters by means of a given quantification-oriented
error metric (e.g., any of the error functions implemented
in _qp.error_) and according to the
[_artificial sampling protocol_](https://github.com/HLT-ISTI/QuaPy/wiki/Evaluation).
in _qp.error_) and according to a
[sampling generation protocol](https://github.com/HLT-ISTI/QuaPy/wiki/Protocols).
The following is an example of model selection for quantification:
The following is an example (also included in the examples folder) of model selection for quantification:
```python
import quapy as qp
from quapy.method.aggregative import PCC
from quapy.protocol import APP
from quapy.method.aggregative import DistributionMatching
from sklearn.linear_model import LogisticRegression
import numpy as np
# set a seed to replicate runs
np.random.seed(0)
qp.environ['SAMPLE_SIZE'] = 500
"""
In this example, we show how to perform model selection on a DistributionMatching quantifier.
"""
dataset = qp.datasets.fetch_reviews('hp', tfidf=True, min_df=5)
model = DistributionMatching(LogisticRegression())
qp.environ['SAMPLE_SIZE'] = 100
qp.environ['N_JOBS'] = -1 # explore hyper-parameters in parallel
training, test = qp.datasets.fetch_reviews('imdb', tfidf=True, min_df=5).train_test
# The model will be returned by the fit method of GridSearchQ.
# Model selection will be performed with a fixed budget of 1000 evaluations
# for each hyper-parameter combination. The error to optimize is the MAE for
# quantification, as evaluated on artificially drawn samples at prevalences
# covering the entire spectrum on a held-out portion (40%) of the training set.
# Every combination of hyper-parameters will be evaluated by confronting the
# quantifier thus configured against a series of samples generated by means
# of a sample generation protocol. For this example, we will use the
# artificial-prevalence protocol (APP), that generates samples with prevalence
# values in the entire range of values from a grid (e.g., [0, 0.1, 0.2, ..., 1]).
# We devote 30% of the dataset for this exploration.
training, validation = training.split_stratified(train_prop=0.7)
protocol = APP(validation)
# We will explore a classification-dependent hyper-parameter (e.g., the 'C'
# hyper-parameter of LogisticRegression) and a quantification-dependent hyper-parameter
# (e.g., the number of bins in a DistributionMatching quantifier.
# Classifier-dependent hyper-parameters have to be marked with a prefix "classifier__"
# in order to let the quantifier know this hyper-parameter belongs to its underlying
# classifier.
param_grid = {
'classifier__C': np.logspace(-3,3,7),
'nbins': [8, 16, 32, 64],
}
model = qp.model_selection.GridSearchQ(
model=PCC(LogisticRegression()),
param_grid={'C': np.logspace(-4,5,10), 'class_weight': ['balanced', None]},
sample_size=qp.environ['SAMPLE_SIZE'],
eval_budget=1000,
error='mae',
refit=True, # retrain on the whole labelled set
val_split=0.4,
model=model,
param_grid=param_grid,
protocol=protocol,
error='mae', # the error to optimize is the MAE (a quantification-oriented loss)
refit=True, # retrain on the whole labelled set once done
verbose=True # show information as the process goes on
).fit(dataset.training)
).fit(training)
print(f'model selection ended: best hyper-parameters={model.best_params_}')
model = model.best_model_
# evaluation in terms of MAE
results = qp.evaluation.artificial_sampling_eval(
model,
dataset.test,
sample_size=qp.environ['SAMPLE_SIZE'],
n_prevpoints=101,
n_repetitions=10,
error_metric='mae'
)
# we use the same evaluation protocol (APP) on the test set
mae_score = qp.evaluation.evaluate(model, protocol=APP(test), error_metric='mae')
print(f'MAE={results:.5f}')
print(f'MAE={mae_score:.5f}')
```
In this example, the system outputs:
```
[GridSearchQ]: starting optimization with n_jobs=1
[GridSearchQ]: checking hyperparams={'C': 0.0001, 'class_weight': 'balanced'} got mae score 0.24987
[GridSearchQ]: checking hyperparams={'C': 0.0001, 'class_weight': None} got mae score 0.48135
[GridSearchQ]: checking hyperparams={'C': 0.001, 'class_weight': 'balanced'} got mae score 0.24866
[GridSearchQ]: starting model selection with self.n_jobs =-1
[GridSearchQ]: hyperparams={'classifier__C': 0.01, 'nbins': 64} got mae score 0.04021 [took 1.1356s]
[GridSearchQ]: hyperparams={'classifier__C': 0.01, 'nbins': 32} got mae score 0.04286 [took 1.2139s]
[GridSearchQ]: hyperparams={'classifier__C': 0.01, 'nbins': 16} got mae score 0.04888 [took 1.2491s]
[GridSearchQ]: hyperparams={'classifier__C': 0.001, 'nbins': 8} got mae score 0.05163 [took 1.5372s]
[...]
[GridSearchQ]: checking hyperparams={'C': 100000.0, 'class_weight': None} got mae score 0.43676
[GridSearchQ]: optimization finished: best params {'C': 0.1, 'class_weight': 'balanced'} (score=0.19982)
[GridSearchQ]: hyperparams={'classifier__C': 1000.0, 'nbins': 32} got mae score 0.02445 [took 2.9056s]
[GridSearchQ]: optimization finished: best params {'classifier__C': 100.0, 'nbins': 32} (score=0.02234) [took 7.3114s]
[GridSearchQ]: refitting on the whole development set
model selection ended: best hyper-parameters={'C': 0.1, 'class_weight': 'balanced'}
1010 evaluations will be performed for each combination of hyper-parameters
[artificial sampling protocol] generating predictions: 100%|██████████| 1010/1010 [00:00<00:00, 5005.54it/s]
MAE=0.20342
model selection ended: best hyper-parameters={'classifier__C': 100.0, 'nbins': 32}
MAE=0.03102
```
The parameter _val_split_ can alternatively be used to indicate
@ -121,39 +131,20 @@ quantification literature have opted for this strategy.
In QuaPy, this is achieved by simply instantiating the
classifier learner as a GridSearchCV from scikit-learn.
The following code illustrates how to do that:
The following code illustrates how to do that:
```python
learner = GridSearchCV(
LogisticRegression(),
param_grid={'C': np.logspace(-4, 5, 10), 'class_weight': ['balanced', None]},
cv=5)
model = PCC(learner).fit(dataset.training)
print(f'model selection ended: best hyper-parameters={model.learner.best_params_}')
model = DistributionMatching(learner).fit(dataset.training)
```
In this example, the system outputs:
```
model selection ended: best hyper-parameters={'C': 10000.0, 'class_weight': None}
1010 evaluations will be performed for each combination of hyper-parameters
[artificial sampling protocol] generating predictions: 100%|██████████| 1010/1010 [00:00<00:00, 5379.55it/s]
MAE=0.41734
```
Note that the MAE is worse than the one we obtained when optimizing
for quantification and, indeed, the hyper-parameters found optimal
largely differ between the two selection modalities. The
hyper-parameters C=10000 and class_weight=None have been found
to work well for the specific training prevalence of the HP dataset,
but these hyper-parameters turned out to be suboptimal when the
class prevalences of the test set differs (as is indeed tested
in scenarios of quantification).
This is, however, not always the case, and one could, in practice,
find examples
in which optimizing for classification ends up resulting in a better
quantifier than when optimizing for quantification.
Nonetheless, this is theoretically unlikely to happen.
However, this is conceptually flawed, since the model should be
optimized for the task at hand (quantification), and not for a surrogate task (classification),
i.e., the model should be requested to deliver low quantification errors, rather
than low classification errors.

View File

@ -43,7 +43,7 @@ quantification methods across different scenarios showcasing
the accuracy of the quantifier in predicting class prevalences
for a wide range of prior distributions. This can easily be
achieved by means of the
[artificial sampling protocol](https://github.com/HLT-ISTI/QuaPy/wiki/Evaluation)
[artificial sampling protocol](https://github.com/HLT-ISTI/QuaPy/wiki/Protocols)
that is implemented in QuaPy.
The following code shows how to perform one simple experiment
@ -55,6 +55,7 @@ generating 100 random samples at each prevalence).
```python
import quapy as qp
from protocol import APP
from quapy.method.aggregative import CC, ACC, PCC, PACC
from sklearn.svm import LinearSVC
@ -63,28 +64,26 @@ qp.environ['SAMPLE_SIZE'] = 500
def gen_data():
def base_classifier():
return LinearSVC()
return LinearSVC(class_weight='balanced')
def models():
yield CC(base_classifier())
yield ACC(base_classifier())
yield PCC(base_classifier())
yield PACC(base_classifier())
yield 'CC', CC(base_classifier())
yield 'ACC', ACC(base_classifier())
yield 'PCC', PCC(base_classifier())
yield 'PACC', PACC(base_classifier())
data = qp.datasets.fetch_reviews('kindle', tfidf=True, min_df=5)
train, test = qp.datasets.fetch_reviews('kindle', tfidf=True, min_df=5).train_test
method_names, true_prevs, estim_prevs, tr_prevs = [], [], [], []
for model in models():
model.fit(data.training)
true_prev, estim_prev = qp.evaluation.artificial_sampling_prediction(
model, data.test, qp.environ['SAMPLE_SIZE'], n_repetitions=100, n_prevpoints=21
)
for method_name, model in models():
model.fit(train)
true_prev, estim_prev = qp.evaluation.prediction(model, APP(test, repeats=100, random_state=0))
method_names.append(model.__class__.__name__)
method_names.append(method_name)
true_prevs.append(true_prev)
estim_prevs.append(estim_prev)
tr_prevs.append(data.training.prevalence())
tr_prevs.append(train.prevalence())
return method_names, true_prevs, estim_prevs, tr_prevs
@ -163,21 +162,19 @@ like this:
```python
def gen_data():
data = qp.datasets.fetch_reviews('imdb', tfidf=True, min_df=5)
train, test = qp.datasets.fetch_reviews('imdb', tfidf=True, min_df=5).train_test
model = CC(LinearSVC())
method_data = []
for training_prevalence in np.linspace(0.1, 0.9, 9):
training_size = 5000
# since the problem is binary, it suffices to specify the negative prevalence (the positive is constrained)
training = data.training.sampling(training_size, 1 - training_prevalence)
model.fit(training)
true_prev, estim_prev = qp.evaluation.artificial_sampling_prediction(
model, data.sample, qp.environ['SAMPLE_SIZE'], n_repetitions=100, n_prevpoints=21
)
# method names can contain Latex syntax
method_name = 'CC$_{' + f'{int(100 * training_prevalence)}' + '\%}$'
method_data.append((method_name, true_prev, estim_prev, training.prevalence()))
# since the problem is binary, it suffices to specify the negative prevalence, since the positive is constrained
train_sample = train.sampling(training_size, 1-training_prevalence)
model.fit(train_sample)
true_prev, estim_prev = qp.evaluation.prediction(model, APP(test, repeats=100, random_state=0))
method_name = 'CC$_{'+f'{int(100*training_prevalence)}' + '\%}$'
method_data.append((method_name, true_prev, estim_prev, train_sample.prevalence()))
return zip(*method_data)
```

View File

@ -64,6 +64,7 @@ Features
* 32 UCI Machine Learning datasets.
* 11 Twitter Sentiment datasets.
* 3 Reviews Sentiment datasets.
* 4 tasks from LeQua competition (_new in v0.1.7!_)
* Native supports for binary and single-label scenarios of quantification.
* Model selection functionality targeting quantification-oriented losses.
* Visualization tools for analysing results.
@ -75,8 +76,9 @@ Features
Installation
Datasets
Evaluation
Protocols
Methods
Model Selection
Model-Selection
Plotting
API Developers documentation<modules>

View File

@ -1,27 +1,38 @@
:tocdepth: 2
quapy.classification package
============================
Submodules
----------
quapy.classification.methods module
-----------------------------------
quapy.classification.calibration
--------------------------------
.. versionadded:: 0.1.7
.. automodule:: quapy.classification.calibration
:members:
:undoc-members:
:show-inheritance:
quapy.classification.methods
----------------------------
.. automodule:: quapy.classification.methods
:members:
:undoc-members:
:show-inheritance:
quapy.classification.neural module
----------------------------------
quapy.classification.neural
---------------------------
.. automodule:: quapy.classification.neural
:members:
:undoc-members:
:show-inheritance:
quapy.classification.svmperf module
-----------------------------------
quapy.classification.svmperf
----------------------------
.. automodule:: quapy.classification.svmperf
:members:

View File

@ -1,35 +1,37 @@
:tocdepth: 2
quapy.data package
==================
Submodules
----------
quapy.data.base module
----------------------
quapy.data.base
---------------
.. automodule:: quapy.data.base
:members:
:undoc-members:
:show-inheritance:
quapy.data.datasets module
--------------------------
quapy.data.datasets
-------------------
.. automodule:: quapy.data.datasets
:members:
:undoc-members:
:show-inheritance:
quapy.data.preprocessing module
-------------------------------
quapy.data.preprocessing
------------------------
.. automodule:: quapy.data.preprocessing
:members:
:undoc-members:
:show-inheritance:
quapy.data.reader module
------------------------
quapy.data.reader
-----------------
.. automodule:: quapy.data.reader
:members:

View File

@ -1,43 +1,45 @@
:tocdepth: 2
quapy.method package
====================
Submodules
----------
quapy.method.aggregative module
-------------------------------
quapy.method.aggregative
------------------------
.. automodule:: quapy.method.aggregative
:members:
:undoc-members:
:show-inheritance:
quapy.method.base module
------------------------
quapy.method.base
-----------------
.. automodule:: quapy.method.base
:members:
:undoc-members:
:show-inheritance:
quapy.method.meta module
------------------------
quapy.method.meta
-----------------
.. automodule:: quapy.method.meta
:members:
:undoc-members:
:show-inheritance:
quapy.method.neural module
--------------------------
quapy.method.neural
-------------------
.. automodule:: quapy.method.neural
:members:
:undoc-members:
:show-inheritance:
quapy.method.non\_aggregative module
------------------------------------
quapy.method.non\_aggregative
-----------------------------
.. automodule:: quapy.method.non_aggregative
:members:

View File

@ -1,68 +1,79 @@
:tocdepth: 2
quapy package
=============
Subpackages
-----------
.. toctree::
:maxdepth: 4
quapy.classification
quapy.data
quapy.method
quapy.tests
Submodules
----------
quapy.error module
------------------
quapy.error
-----------
.. automodule:: quapy.error
:members:
:undoc-members:
:show-inheritance:
quapy.evaluation module
-----------------------
quapy.evaluation
----------------
.. automodule:: quapy.evaluation
:members:
:undoc-members:
:show-inheritance:
quapy.functional module
-----------------------
quapy.protocol
--------------
.. versionadded:: 0.1.7
.. automodule:: quapy.protocol
:members:
:undoc-members:
:show-inheritance:
quapy.functional
----------------
.. automodule:: quapy.functional
:members:
:undoc-members:
:show-inheritance:
quapy.model\_selection module
-----------------------------
quapy.model\_selection
----------------------
.. automodule:: quapy.model_selection
:members:
:undoc-members:
:show-inheritance:
quapy.plot module
-----------------
quapy.plot
----------
.. automodule:: quapy.plot
:members:
:undoc-members:
:show-inheritance:
quapy.util module
-----------------
quapy.util
----------
.. automodule:: quapy.util
:members:
:undoc-members:
:show-inheritance:
Subpackages
-----------
.. toctree::
:maxdepth: 3
quapy.classification
quapy.data
quapy.method
Module contents
---------------
@ -70,3 +81,4 @@ Module contents
:members:
:undoc-members:
:show-inheritance:

View File

@ -1,37 +0,0 @@
quapy.tests package
===================
Submodules
----------
quapy.tests.test\_base module
-----------------------------
.. automodule:: quapy.tests.test_base
:members:
:undoc-members:
:show-inheritance:
quapy.tests.test\_datasets module
---------------------------------
.. automodule:: quapy.tests.test_datasets
:members:
:undoc-members:
:show-inheritance:
quapy.tests.test\_methods module
--------------------------------
.. automodule:: quapy.tests.test_methods
:members:
:undoc-members:
:show-inheritance:
Module contents
---------------
.. automodule:: quapy.tests
:members:
:undoc-members:
:show-inheritance:

View File

@ -1,7 +0,0 @@
Getting Started
===============
QuaPy is an open source framework for Quantification (a.k.a. Supervised Prevalence Estimation) written in Python.
Installation
------------
>>> pip install quapy

View File

@ -1 +0,0 @@
.. include:: ../../README.md

View File

@ -1,701 +0,0 @@
@import url("basic.css");
/* -- page layout ----------------------------------------------------------- */
body {
font-family: Georgia, serif;
font-size: 17px;
background-color: #fff;
color: #000;
margin: 0;
padding: 0;
}
div.document {
width: 940px;
margin: 30px auto 0 auto;
}
div.documentwrapper {
float: left;
width: 100%;
}
div.bodywrapper {
margin: 0 0 0 220px;
}
div.sphinxsidebar {
width: 220px;
font-size: 14px;
line-height: 1.5;
}
hr {
border: 1px solid #B1B4B6;
}
div.body {
background-color: #fff;
color: #3E4349;
padding: 0 30px 0 30px;
}
div.body > .section {
text-align: left;
}
div.footer {
width: 940px;
margin: 20px auto 30px auto;
font-size: 14px;
color: #888;
text-align: right;
}
div.footer a {
color: #888;
}
p.caption {
font-family: inherit;
font-size: inherit;
}
div.relations {
display: none;
}
div.sphinxsidebar a {
color: #444;
text-decoration: none;
border-bottom: 1px dotted #999;
}
div.sphinxsidebar a:hover {
border-bottom: 1px solid #999;
}
div.sphinxsidebarwrapper {
padding: 18px 10px;
}
div.sphinxsidebarwrapper p.logo {
padding: 0;
margin: -10px 0 0 0px;
text-align: center;
}
div.sphinxsidebarwrapper h1.logo {
margin-top: -10px;
text-align: center;
margin-bottom: 5px;
text-align: left;
}
div.sphinxsidebarwrapper h1.logo-name {
margin-top: 0px;
}
div.sphinxsidebarwrapper p.blurb {
margin-top: 0;
font-style: normal;
}
div.sphinxsidebar h3,
div.sphinxsidebar h4 {
font-family: Georgia, serif;
color: #444;
font-size: 24px;
font-weight: normal;
margin: 0 0 5px 0;
padding: 0;
}
div.sphinxsidebar h4 {
font-size: 20px;
}
div.sphinxsidebar h3 a {
color: #444;
}
div.sphinxsidebar p.logo a,
div.sphinxsidebar h3 a,
div.sphinxsidebar p.logo a:hover,
div.sphinxsidebar h3 a:hover {
border: none;
}
div.sphinxsidebar p {
color: #555;
margin: 10px 0;
}
div.sphinxsidebar ul {
margin: 10px 0;
padding: 0;
color: #000;
}
div.sphinxsidebar ul li.toctree-l1 > a {
font-size: 120%;
}
div.sphinxsidebar ul li.toctree-l2 > a {
font-size: 110%;
}
div.sphinxsidebar input {
border: 1px solid #CCC;
font-family: Georgia, serif;
font-size: 1em;
}
div.sphinxsidebar hr {
border: none;
height: 1px;
color: #AAA;
background: #AAA;
text-align: left;
margin-left: 0;
width: 50%;
}
div.sphinxsidebar .badge {
border-bottom: none;
}
div.sphinxsidebar .badge:hover {
border-bottom: none;
}
/* To address an issue with donation coming after search */
div.sphinxsidebar h3.donation {
margin-top: 10px;
}
/* -- body styles ----------------------------------------------------------- */
a {
color: #004B6B;
text-decoration: underline;
}
a:hover {
color: #6D4100;
text-decoration: underline;
}
div.body h1,
div.body h2,
div.body h3,
div.body h4,
div.body h5,
div.body h6 {
font-family: Georgia, serif;
font-weight: normal;
margin: 30px 0px 10px 0px;
padding: 0;
}
div.body h1 { margin-top: 0; padding-top: 0; font-size: 240%; }
div.body h2 { font-size: 180%; }
div.body h3 { font-size: 150%; }
div.body h4 { font-size: 130%; }
div.body h5 { font-size: 100%; }
div.body h6 { font-size: 100%; }
a.headerlink {
color: #DDD;
padding: 0 4px;
text-decoration: none;
}
a.headerlink:hover {
color: #444;
background: #EAEAEA;
}
div.body p, div.body dd, div.body li {
line-height: 1.4em;
}
div.admonition {
margin: 20px 0px;
padding: 10px 30px;
background-color: #EEE;
border: 1px solid #CCC;
}
div.admonition tt.xref, div.admonition code.xref, div.admonition a tt {
background-color: #FBFBFB;
border-bottom: 1px solid #fafafa;
}
div.admonition p.admonition-title {
font-family: Georgia, serif;
font-weight: normal;
font-size: 24px;
margin: 0 0 10px 0;
padding: 0;
line-height: 1;
}
div.admonition p.last {
margin-bottom: 0;
}
div.highlight {
background-color: #fff;
}
dt:target, .highlight {
background: #FAF3E8;
}
div.warning {
background-color: #FCC;
border: 1px solid #FAA;
}
div.danger {
background-color: #FCC;
border: 1px solid #FAA;
-moz-box-shadow: 2px 2px 4px #D52C2C;
-webkit-box-shadow: 2px 2px 4px #D52C2C;
box-shadow: 2px 2px 4px #D52C2C;
}
div.error {
background-color: #FCC;
border: 1px solid #FAA;
-moz-box-shadow: 2px 2px 4px #D52C2C;
-webkit-box-shadow: 2px 2px 4px #D52C2C;
box-shadow: 2px 2px 4px #D52C2C;
}
div.caution {
background-color: #FCC;
border: 1px solid #FAA;
}
div.attention {
background-color: #FCC;
border: 1px solid #FAA;
}
div.important {
background-color: #EEE;
border: 1px solid #CCC;
}
div.note {
background-color: #EEE;
border: 1px solid #CCC;
}
div.tip {
background-color: #EEE;
border: 1px solid #CCC;
}
div.hint {
background-color: #EEE;
border: 1px solid #CCC;
}
div.seealso {
background-color: #EEE;
border: 1px solid #CCC;
}
div.topic {
background-color: #EEE;
}
p.admonition-title {
display: inline;
}
p.admonition-title:after {
content: ":";
}
pre, tt, code {
font-family: 'Consolas', 'Menlo', 'DejaVu Sans Mono', 'Bitstream Vera Sans Mono', monospace;
font-size: 0.9em;
}
.hll {
background-color: #FFC;
margin: 0 -12px;
padding: 0 12px;
display: block;
}
img.screenshot {
}
tt.descname, tt.descclassname, code.descname, code.descclassname {
font-size: 0.95em;
}
tt.descname, code.descname {
padding-right: 0.08em;
}
img.screenshot {
-moz-box-shadow: 2px 2px 4px #EEE;
-webkit-box-shadow: 2px 2px 4px #EEE;
box-shadow: 2px 2px 4px #EEE;
}
table.docutils {
border: 1px solid #888;
-moz-box-shadow: 2px 2px 4px #EEE;
-webkit-box-shadow: 2px 2px 4px #EEE;
box-shadow: 2px 2px 4px #EEE;
}
table.docutils td, table.docutils th {
border: 1px solid #888;
padding: 0.25em 0.7em;
}
table.field-list, table.footnote {
border: none;
-moz-box-shadow: none;
-webkit-box-shadow: none;
box-shadow: none;
}
table.footnote {
margin: 15px 0;
width: 100%;
border: 1px solid #EEE;
background: #FDFDFD;
font-size: 0.9em;
}
table.footnote + table.footnote {
margin-top: -15px;
border-top: none;
}
table.field-list th {
padding: 0 0.8em 0 0;
}
table.field-list td {
padding: 0;
}
table.field-list p {
margin-bottom: 0.8em;
}
/* Cloned from
* https://github.com/sphinx-doc/sphinx/commit/ef60dbfce09286b20b7385333d63a60321784e68
*/
.field-name {
-moz-hyphens: manual;
-ms-hyphens: manual;
-webkit-hyphens: manual;
hyphens: manual;
}
table.footnote td.label {
width: .1px;
padding: 0.3em 0 0.3em 0.5em;
}
table.footnote td {
padding: 0.3em 0.5em;
}
dl {
margin: 0;
padding: 0;
}
dl dd {
margin-left: 30px;
}
blockquote {
margin: 0 0 0 30px;
padding: 0;
}
ul, ol {
/* Matches the 30px from the narrow-screen "li > ul" selector below */
margin: 10px 0 10px 30px;
padding: 0;
}
pre {
background: #EEE;
padding: 7px 30px;
margin: 15px 0px;
line-height: 1.3em;
}
div.viewcode-block:target {
background: #ffd;
}
dl pre, blockquote pre, li pre {
margin-left: 0;
padding-left: 30px;
}
tt, code {
background-color: #ecf0f3;
color: #222;
/* padding: 1px 2px; */
}
tt.xref, code.xref, a tt {
background-color: #FBFBFB;
border-bottom: 1px solid #fff;
}
a.reference {
text-decoration: none;
border-bottom: 1px dotted #004B6B;
}
/* Don't put an underline on images */
a.image-reference, a.image-reference:hover {
border-bottom: none;
}
a.reference:hover {
border-bottom: 1px solid #6D4100;
}
a.footnote-reference {
text-decoration: none;
font-size: 0.7em;
vertical-align: top;
border-bottom: 1px dotted #004B6B;
}
a.footnote-reference:hover {
border-bottom: 1px solid #6D4100;
}
a:hover tt, a:hover code {
background: #EEE;
}
@media screen and (max-width: 870px) {
div.sphinxsidebar {
display: none;
}
div.document {
width: 100%;
}
div.documentwrapper {
margin-left: 0;
margin-top: 0;
margin-right: 0;
margin-bottom: 0;
}
div.bodywrapper {
margin-top: 0;
margin-right: 0;
margin-bottom: 0;
margin-left: 0;
}
ul {
margin-left: 0;
}
li > ul {
/* Matches the 30px from the "ul, ol" selector above */
margin-left: 30px;
}
.document {
width: auto;
}
.footer {
width: auto;
}
.bodywrapper {
margin: 0;
}
.footer {
width: auto;
}
.github {
display: none;
}
}
@media screen and (max-width: 875px) {
body {
margin: 0;
padding: 20px 30px;
}
div.documentwrapper {
float: none;
background: #fff;
}
div.sphinxsidebar {
display: block;
float: none;
width: 102.5%;
margin: 50px -30px -20px -30px;
padding: 10px 20px;
background: #333;
color: #FFF;
}
div.sphinxsidebar h3, div.sphinxsidebar h4, div.sphinxsidebar p,
div.sphinxsidebar h3 a {
color: #fff;
}
div.sphinxsidebar a {
color: #AAA;
}
div.sphinxsidebar p.logo {
display: none;
}
div.document {
width: 100%;
margin: 0;
}
div.footer {
display: none;
}
div.bodywrapper {
margin: 0;
}
div.body {
min-height: 0;
padding: 0;
}
.rtd_doc_footer {
display: none;
}
.document {
width: auto;
}
.footer {
width: auto;
}
.footer {
width: auto;
}
.github {
display: none;
}
}
/* misc. */
.revsys-inline {
display: none!important;
}
/* Make nested-list/multi-paragraph items look better in Releases changelog
* pages. Without this, docutils' magical list fuckery causes inconsistent
* formatting between different release sub-lists.
*/
div#changelog > div.section > ul > li > p:only-child {
margin-bottom: 0;
}
/* Hide fugly table cell borders in ..bibliography:: directive output */
table.docutils.citation, table.docutils.citation td, table.docutils.citation th {
border: none;
/* Below needed in some edge cases; if not applied, bottom shadows appear */
-moz-box-shadow: none;
-webkit-box-shadow: none;
box-shadow: none;
}
/* relbar */
.related {
line-height: 30px;
width: 100%;
font-size: 0.9rem;
}
.related.top {
border-bottom: 1px solid #EEE;
margin-bottom: 20px;
}
.related.bottom {
border-top: 1px solid #EEE;
}
.related ul {
padding: 0;
margin: 0;
list-style: none;
}
.related li {
display: inline;
}
nav#rellinks {
float: right;
}
nav#rellinks li+li:before {
content: "|";
}
nav#breadcrumbs li+li:before {
content: "\00BB";
}
/* Hide certain items when printing */
@media print {
div.related {
display: none;
}
}

View File

@ -4,7 +4,7 @@
*
* Sphinx stylesheet -- basic theme.
*
* :copyright: Copyright 2007-2021 by the Sphinx team, see AUTHORS.
* :copyright: Copyright 2007-2022 by the Sphinx team, see AUTHORS.
* :license: BSD, see LICENSE for details.
*
*/
@ -222,7 +222,7 @@ table.modindextable td {
/* -- general body styles --------------------------------------------------- */
div.body {
min-width: 450px;
min-width: 360px;
max-width: 800px;
}
@ -237,16 +237,6 @@ a.headerlink {
visibility: hidden;
}
a.brackets:before,
span.brackets > a:before{
content: "[";
}
a.brackets:after,
span.brackets > a:after {
content: "]";
}
h1:hover > a.headerlink,
h2:hover > a.headerlink,
h3:hover > a.headerlink,
@ -334,13 +324,15 @@ aside.sidebar {
p.sidebar-title {
font-weight: bold;
}
nav.contents,
aside.topic,
div.admonition, div.topic, blockquote {
clear: left;
}
/* -- topics ---------------------------------------------------------------- */
nav.contents,
aside.topic,
div.topic {
border: 1px solid #ccc;
padding: 7px;
@ -379,6 +371,8 @@ div.body p.centered {
div.sidebar > :last-child,
aside.sidebar > :last-child,
nav.contents > :last-child,
aside.topic > :last-child,
div.topic > :last-child,
div.admonition > :last-child {
margin-bottom: 0;
@ -386,6 +380,8 @@ div.admonition > :last-child {
div.sidebar::after,
aside.sidebar::after,
nav.contents::after,
aside.topic::after,
div.topic::after,
div.admonition::after,
blockquote::after {
@ -428,10 +424,6 @@ table.docutils td, table.docutils th {
border-bottom: 1px solid #aaa;
}
table.footnote td, table.footnote th {
border: 0 !important;
}
th {
text-align: left;
padding-right: 5px;
@ -614,20 +606,26 @@ ol.simple p,
ul.simple p {
margin-bottom: 0;
}
dl.footnote > dt,
dl.citation > dt {
aside.footnote > span,
div.citation > span {
float: left;
margin-right: 0.5em;
}
dl.footnote > dd,
dl.citation > dd {
aside.footnote > span:last-of-type,
div.citation > span:last-of-type {
padding-right: 0.5em;
}
aside.footnote > p {
margin-left: 2em;
}
div.citation > p {
margin-left: 4em;
}
aside.footnote > p:last-of-type,
div.citation > p:last-of-type {
margin-bottom: 0em;
}
dl.footnote > dd:after,
dl.citation > dd:after {
aside.footnote > p:last-of-type:after,
div.citation > p:last-of-type:after {
content: "";
clear: both;
}
@ -644,10 +642,6 @@ dl.field-list > dt {
padding-right: 5px;
}
dl.field-list > dt:after {
content: ":";
}
dl.field-list > dd {
padding-left: 0.5em;
margin-top: 0em;
@ -731,8 +725,9 @@ dl.glossary dt {
.classifier:before {
font-style: normal;
margin: 0.5em;
margin: 0 0.5em;
content: ":";
display: inline-block;
}
abbr, acronym {
@ -756,6 +751,7 @@ span.pre {
-ms-hyphens: none;
-webkit-hyphens: none;
hyphens: none;
white-space: nowrap;
}
div[class*="highlight-"] {

View File

@ -294,6 +294,8 @@ div.quotebar {
padding: 2px 7px;
border: 1px solid #ccc;
}
nav.contents,
aside.topic,
div.topic {
background-color: #f8f8f8;

View File

@ -9,33 +9,22 @@
// :copyright: Copyright 2012-2014 by Sphinx team, see AUTHORS.
// :license: BSD, see LICENSE for details.
//
$(document).ready(function(){
if (navigator.userAgent.indexOf('iPhone') > 0 ||
navigator.userAgent.indexOf('Android') > 0) {
$("li.nav-item-0 a").text("Top");
const initialiseBizStyle = () => {
if (navigator.userAgent.indexOf("iPhone") > 0 || navigator.userAgent.indexOf("Android") > 0) {
document.querySelector("li.nav-item-0 a").innerText = "Top"
}
const truncator = item => {if (item.textContent.length > 20) {
item.title = item.innerText
item.innerText = item.innerText.substr(0, 17) + "..."
}
}
document.querySelectorAll("div.related:first ul li:not(.right) a").slice(1).forEach(truncator);
document.querySelectorAll("div.related:last ul li:not(.right) a").slice(1).forEach(truncator);
}
$("div.related:first ul li:not(.right) a").slice(1).each(function(i, item){
if (item.text.length > 20) {
var tmpstr = item.text
$(item).attr("title", tmpstr);
$(item).text(tmpstr.substr(0, 17) + "...");
}
});
$("div.related:last ul li:not(.right) a").slice(1).each(function(i, item){
if (item.text.length > 20) {
var tmpstr = item.text
$(item).attr("title", tmpstr);
$(item).text(tmpstr.substr(0, 17) + "...");
}
});
});
window.addEventListener("resize",
() => (document.querySelector("li.nav-item-0 a").innerText = (window.innerWidth <= 776) ? "Top" : "QuaPy 0.1.7 documentation")
)
$(window).resize(function(){
if ($(window).width() <= 776) {
$("li.nav-item-0 a").text("Top");
}
else {
$("li.nav-item-0 a").text("QuaPy 0.1.6 documentation");
}
});
if (document.readyState !== "loading") initialiseBizStyle()
else document.addEventListener("DOMContentLoaded", initialiseBizStyle)

View File

@ -1 +0,0 @@
/* This file intentionally left blank. */

View File

@ -2,322 +2,155 @@
* doctools.js
* ~~~~~~~~~~~
*
* Sphinx JavaScript utilities for all documentation.
* Base JavaScript utilities for all Sphinx HTML documentation.
*
* :copyright: Copyright 2007-2021 by the Sphinx team, see AUTHORS.
* :copyright: Copyright 2007-2022 by the Sphinx team, see AUTHORS.
* :license: BSD, see LICENSE for details.
*
*/
"use strict";
/**
* select a different prefix for underscore
*/
$u = _.noConflict();
const BLACKLISTED_KEY_CONTROL_ELEMENTS = new Set([
"TEXTAREA",
"INPUT",
"SELECT",
"BUTTON",
]);
/**
* make the code below compatible with browsers without
* an installed firebug like debugger
if (!window.console || !console.firebug) {
var names = ["log", "debug", "info", "warn", "error", "assert", "dir",
"dirxml", "group", "groupEnd", "time", "timeEnd", "count", "trace",
"profile", "profileEnd"];
window.console = {};
for (var i = 0; i < names.length; ++i)
window.console[names[i]] = function() {};
}
*/
/**
* small helper function to urldecode strings
*
* See https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/decodeURIComponent#Decoding_query_parameters_from_a_URL
*/
jQuery.urldecode = function(x) {
if (!x) {
return x
const _ready = (callback) => {
if (document.readyState !== "loading") {
callback();
} else {
document.addEventListener("DOMContentLoaded", callback);
}
return decodeURIComponent(x.replace(/\+/g, ' '));
};
/**
* small helper function to urlencode strings
*/
jQuery.urlencode = encodeURIComponent;
/**
* This function returns the parsed url parameters of the
* current request. Multiple values per key are supported,
* it will always return arrays of strings for the value parts.
*/
jQuery.getQueryParameters = function(s) {
if (typeof s === 'undefined')
s = document.location.search;
var parts = s.substr(s.indexOf('?') + 1).split('&');
var result = {};
for (var i = 0; i < parts.length; i++) {
var tmp = parts[i].split('=', 2);
var key = jQuery.urldecode(tmp[0]);
var value = jQuery.urldecode(tmp[1]);
if (key in result)
result[key].push(value);
else
result[key] = [value];
}
return result;
};
/**
* highlight a given string on a jquery object by wrapping it in
* span elements with the given class name.
*/
jQuery.fn.highlightText = function(text, className) {
function highlight(node, addItems) {
if (node.nodeType === 3) {
var val = node.nodeValue;
var pos = val.toLowerCase().indexOf(text);
if (pos >= 0 &&
!jQuery(node.parentNode).hasClass(className) &&
!jQuery(node.parentNode).hasClass("nohighlight")) {
var span;
var isInSVG = jQuery(node).closest("body, svg, foreignObject").is("svg");
if (isInSVG) {
span = document.createElementNS("http://www.w3.org/2000/svg", "tspan");
} else {
span = document.createElement("span");
span.className = className;
}
span.appendChild(document.createTextNode(val.substr(pos, text.length)));
node.parentNode.insertBefore(span, node.parentNode.insertBefore(
document.createTextNode(val.substr(pos + text.length)),
node.nextSibling));
node.nodeValue = val.substr(0, pos);
if (isInSVG) {
var rect = document.createElementNS("http://www.w3.org/2000/svg", "rect");
var bbox = node.parentElement.getBBox();
rect.x.baseVal.value = bbox.x;
rect.y.baseVal.value = bbox.y;
rect.width.baseVal.value = bbox.width;
rect.height.baseVal.value = bbox.height;
rect.setAttribute('class', className);
addItems.push({
"parent": node.parentNode,
"target": rect});
}
}
}
else if (!jQuery(node).is("button, select, textarea")) {
jQuery.each(node.childNodes, function() {
highlight(this, addItems);
});
}
}
var addItems = [];
var result = this.each(function() {
highlight(this, addItems);
});
for (var i = 0; i < addItems.length; ++i) {
jQuery(addItems[i].parent).before(addItems[i].target);
}
return result;
};
/*
* backward compatibility for jQuery.browser
* This will be supported until firefox bug is fixed.
*/
if (!jQuery.browser) {
jQuery.uaMatch = function(ua) {
ua = ua.toLowerCase();
var match = /(chrome)[ \/]([\w.]+)/.exec(ua) ||
/(webkit)[ \/]([\w.]+)/.exec(ua) ||
/(opera)(?:.*version|)[ \/]([\w.]+)/.exec(ua) ||
/(msie) ([\w.]+)/.exec(ua) ||
ua.indexOf("compatible") < 0 && /(mozilla)(?:.*? rv:([\w.]+)|)/.exec(ua) ||
[];
return {
browser: match[ 1 ] || "",
version: match[ 2 ] || "0"
};
};
jQuery.browser = {};
jQuery.browser[jQuery.uaMatch(navigator.userAgent).browser] = true;
}
/**
* Small JavaScript module for the documentation.
*/
var Documentation = {
init : function() {
this.fixFirefoxAnchorBug();
this.highlightSearchWords();
this.initIndexTable();
if (DOCUMENTATION_OPTIONS.NAVIGATION_WITH_KEYS) {
this.initOnKeyListeners();
}
const Documentation = {
init: () => {
Documentation.initDomainIndexTable();
Documentation.initOnKeyListeners();
},
/**
* i18n support
*/
TRANSLATIONS : {},
PLURAL_EXPR : function(n) { return n === 1 ? 0 : 1; },
LOCALE : 'unknown',
TRANSLATIONS: {},
PLURAL_EXPR: (n) => (n === 1 ? 0 : 1),
LOCALE: "unknown",
// gettext and ngettext don't access this so that the functions
// can safely bound to a different name (_ = Documentation.gettext)
gettext : function(string) {
var translated = Documentation.TRANSLATIONS[string];
if (typeof translated === 'undefined')
return string;
return (typeof translated === 'string') ? translated : translated[0];
gettext: (string) => {
const translated = Documentation.TRANSLATIONS[string];
switch (typeof translated) {
case "undefined":
return string; // no translation
case "string":
return translated; // translation exists
default:
return translated[0]; // (singular, plural) translation tuple exists
}
},
ngettext : function(singular, plural, n) {
var translated = Documentation.TRANSLATIONS[singular];
if (typeof translated === 'undefined')
return (n == 1) ? singular : plural;
return translated[Documentation.PLURALEXPR(n)];
ngettext: (singular, plural, n) => {
const translated = Documentation.TRANSLATIONS[singular];
if (typeof translated !== "undefined")
return translated[Documentation.PLURAL_EXPR(n)];
return n === 1 ? singular : plural;
},
addTranslations : function(catalog) {
for (var key in catalog.messages)
this.TRANSLATIONS[key] = catalog.messages[key];
this.PLURAL_EXPR = new Function('n', 'return +(' + catalog.plural_expr + ')');
this.LOCALE = catalog.locale;
addTranslations: (catalog) => {
Object.assign(Documentation.TRANSLATIONS, catalog.messages);
Documentation.PLURAL_EXPR = new Function(
"n",
`return (${catalog.plural_expr})`
);
Documentation.LOCALE = catalog.locale;
},
/**
* add context elements like header anchor links
* helper function to focus on search bar
*/
addContextElements : function() {
$('div[id] > :header:first').each(function() {
$('<a class="headerlink">\u00B6</a>').
attr('href', '#' + this.id).
attr('title', _('Permalink to this headline')).
appendTo(this);
});
$('dt[id]').each(function() {
$('<a class="headerlink">\u00B6</a>').
attr('href', '#' + this.id).
attr('title', _('Permalink to this definition')).
appendTo(this);
});
focusSearchBar: () => {
document.querySelectorAll("input[name=q]")[0]?.focus();
},
/**
* workaround a firefox stupidity
* see: https://bugzilla.mozilla.org/show_bug.cgi?id=645075
* Initialise the domain index toggle buttons
*/
fixFirefoxAnchorBug : function() {
if (document.location.hash && $.browser.mozilla)
window.setTimeout(function() {
document.location.href += '';
}, 10);
},
/**
* highlight the search words provided in the url in the text
*/
highlightSearchWords : function() {
var params = $.getQueryParameters();
var terms = (params.highlight) ? params.highlight[0].split(/\s+/) : [];
if (terms.length) {
var body = $('div.body');
if (!body.length) {
body = $('body');
initDomainIndexTable: () => {
const toggler = (el) => {
const idNumber = el.id.substr(7);
const toggledRows = document.querySelectorAll(`tr.cg-${idNumber}`);
if (el.src.substr(-9) === "minus.png") {
el.src = `${el.src.substr(0, el.src.length - 9)}plus.png`;
toggledRows.forEach((el) => (el.style.display = "none"));
} else {
el.src = `${el.src.substr(0, el.src.length - 8)}minus.png`;
toggledRows.forEach((el) => (el.style.display = ""));
}
window.setTimeout(function() {
$.each(terms, function() {
body.highlightText(this.toLowerCase(), 'highlighted');
});
}, 10);
$('<p class="highlight-link"><a href="javascript:Documentation.' +
'hideSearchWords()">' + _('Hide Search Matches') + '</a></p>')
.appendTo($('#searchbox'));
}
};
const togglerElements = document.querySelectorAll("img.toggler");
togglerElements.forEach((el) =>
el.addEventListener("click", (event) => toggler(event.currentTarget))
);
togglerElements.forEach((el) => (el.style.display = ""));
if (DOCUMENTATION_OPTIONS.COLLAPSE_INDEX) togglerElements.forEach(toggler);
},
/**
* init the domain index toggle buttons
*/
initIndexTable : function() {
var togglers = $('img.toggler').click(function() {
var src = $(this).attr('src');
var idnum = $(this).attr('id').substr(7);
$('tr.cg-' + idnum).toggle();
if (src.substr(-9) === 'minus.png')
$(this).attr('src', src.substr(0, src.length-9) + 'plus.png');
else
$(this).attr('src', src.substr(0, src.length-8) + 'minus.png');
}).css('display', '');
if (DOCUMENTATION_OPTIONS.COLLAPSE_INDEX) {
togglers.click();
}
},
initOnKeyListeners: () => {
// only install a listener if it is really needed
if (
!DOCUMENTATION_OPTIONS.NAVIGATION_WITH_KEYS &&
!DOCUMENTATION_OPTIONS.ENABLE_SEARCH_SHORTCUTS
)
return;
/**
* helper function to hide the search marks again
*/
hideSearchWords : function() {
$('#searchbox .highlight-link').fadeOut(300);
$('span.highlighted').removeClass('highlighted');
},
document.addEventListener("keydown", (event) => {
// bail for input elements
if (BLACKLISTED_KEY_CONTROL_ELEMENTS.has(document.activeElement.tagName)) return;
// bail with special keys
if (event.altKey || event.ctrlKey || event.metaKey) return;
/**
* make the url absolute
*/
makeURL : function(relativeURL) {
return DOCUMENTATION_OPTIONS.URL_ROOT + '/' + relativeURL;
},
if (!event.shiftKey) {
switch (event.key) {
case "ArrowLeft":
if (!DOCUMENTATION_OPTIONS.NAVIGATION_WITH_KEYS) break;
/**
* get the current relative url
*/
getCurrentURL : function() {
var path = document.location.pathname;
var parts = path.split(/\//);
$.each(DOCUMENTATION_OPTIONS.URL_ROOT.split(/\//), function() {
if (this === '..')
parts.pop();
});
var url = parts.join('/');
return path.substring(url.lastIndexOf('/') + 1, path.length - 1);
},
initOnKeyListeners: function() {
$(document).keydown(function(event) {
var activeElementType = document.activeElement.tagName;
// don't navigate when in search box, textarea, dropdown or button
if (activeElementType !== 'TEXTAREA' && activeElementType !== 'INPUT' && activeElementType !== 'SELECT'
&& activeElementType !== 'BUTTON' && !event.altKey && !event.ctrlKey && !event.metaKey
&& !event.shiftKey) {
switch (event.keyCode) {
case 37: // left
var prevHref = $('link[rel="prev"]').prop('href');
if (prevHref) {
window.location.href = prevHref;
return false;
const prevLink = document.querySelector('link[rel="prev"]');
if (prevLink && prevLink.href) {
window.location.href = prevLink.href;
event.preventDefault();
}
break;
case 39: // right
var nextHref = $('link[rel="next"]').prop('href');
if (nextHref) {
window.location.href = nextHref;
return false;
case "ArrowRight":
if (!DOCUMENTATION_OPTIONS.NAVIGATION_WITH_KEYS) break;
const nextLink = document.querySelector('link[rel="next"]');
if (nextLink && nextLink.href) {
window.location.href = nextLink.href;
event.preventDefault();
}
break;
}
}
// some keyboard layouts may need Shift to get /
switch (event.key) {
case "/":
if (!DOCUMENTATION_OPTIONS.ENABLE_SEARCH_SHORTCUTS) break;
Documentation.focusSearchBar();
event.preventDefault();
}
});
}
},
};
// quick alias for translations
_ = Documentation.gettext;
const _ = Documentation.gettext;
$(document).ready(function() {
Documentation.init();
});
_ready(Documentation.init);

View File

@ -1,12 +1,14 @@
var DOCUMENTATION_OPTIONS = {
URL_ROOT: document.getElementById("documentation_options").getAttribute('data-url_root'),
VERSION: '0.1.6',
LANGUAGE: 'None',
VERSION: '0.1.7',
LANGUAGE: 'en',
COLLAPSE_INDEX: false,
BUILDER: 'html',
FILE_SUFFIX: '.html',
LINK_SUFFIX: '.html',
HAS_SOURCE: true,
SOURCELINK_SUFFIX: '.txt',
NAVIGATION_WITH_KEYS: false
NAVIGATION_WITH_KEYS: false,
SHOW_SEARCH_SUMMARY: true,
ENABLE_SEARCH_SHORTCUTS: true,
};

File diff suppressed because it is too large Load Diff

File diff suppressed because one or more lines are too long

View File

@ -5,12 +5,12 @@
* This script contains the language-specific data used by searchtools.js,
* namely the list of stopwords, stemmer, scorer and splitter.
*
* :copyright: Copyright 2007-2021 by the Sphinx team, see AUTHORS.
* :copyright: Copyright 2007-2022 by the Sphinx team, see AUTHORS.
* :license: BSD, see LICENSE for details.
*
*/
var stopwords = ["a","and","are","as","at","be","but","by","for","if","in","into","is","it","near","no","not","of","on","or","such","that","the","their","then","there","these","they","this","to","was","will","with"];
var stopwords = ["a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "near", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"];
/* Non-minified version is copied as a separate JS file, is available */
@ -197,101 +197,3 @@ var Stemmer = function() {
}
}
var splitChars = (function() {
var result = {};
var singles = [96, 180, 187, 191, 215, 247, 749, 885, 903, 907, 909, 930, 1014, 1648,
1748, 1809, 2416, 2473, 2481, 2526, 2601, 2609, 2612, 2615, 2653, 2702,
2706, 2729, 2737, 2740, 2857, 2865, 2868, 2910, 2928, 2948, 2961, 2971,
2973, 3085, 3089, 3113, 3124, 3213, 3217, 3241, 3252, 3295, 3341, 3345,
3369, 3506, 3516, 3633, 3715, 3721, 3736, 3744, 3748, 3750, 3756, 3761,
3781, 3912, 4239, 4347, 4681, 4695, 4697, 4745, 4785, 4799, 4801, 4823,
4881, 5760, 5901, 5997, 6313, 7405, 8024, 8026, 8028, 8030, 8117, 8125,
8133, 8181, 8468, 8485, 8487, 8489, 8494, 8527, 11311, 11359, 11687, 11695,
11703, 11711, 11719, 11727, 11735, 12448, 12539, 43010, 43014, 43019, 43587,
43696, 43713, 64286, 64297, 64311, 64317, 64319, 64322, 64325, 65141];
var i, j, start, end;
for (i = 0; i < singles.length; i++) {
result[singles[i]] = true;
}
var ranges = [[0, 47], [58, 64], [91, 94], [123, 169], [171, 177], [182, 184], [706, 709],
[722, 735], [741, 747], [751, 879], [888, 889], [894, 901], [1154, 1161],
[1318, 1328], [1367, 1368], [1370, 1376], [1416, 1487], [1515, 1519], [1523, 1568],
[1611, 1631], [1642, 1645], [1750, 1764], [1767, 1773], [1789, 1790], [1792, 1807],
[1840, 1868], [1958, 1968], [1970, 1983], [2027, 2035], [2038, 2041], [2043, 2047],
[2070, 2073], [2075, 2083], [2085, 2087], [2089, 2307], [2362, 2364], [2366, 2383],
[2385, 2391], [2402, 2405], [2419, 2424], [2432, 2436], [2445, 2446], [2449, 2450],
[2483, 2485], [2490, 2492], [2494, 2509], [2511, 2523], [2530, 2533], [2546, 2547],
[2554, 2564], [2571, 2574], [2577, 2578], [2618, 2648], [2655, 2661], [2672, 2673],
[2677, 2692], [2746, 2748], [2750, 2767], [2769, 2783], [2786, 2789], [2800, 2820],
[2829, 2830], [2833, 2834], [2874, 2876], [2878, 2907], [2914, 2917], [2930, 2946],
[2955, 2957], [2966, 2968], [2976, 2978], [2981, 2983], [2987, 2989], [3002, 3023],
[3025, 3045], [3059, 3076], [3130, 3132], [3134, 3159], [3162, 3167], [3170, 3173],
[3184, 3191], [3199, 3204], [3258, 3260], [3262, 3293], [3298, 3301], [3312, 3332],
[3386, 3388], [3390, 3423], [3426, 3429], [3446, 3449], [3456, 3460], [3479, 3481],
[3518, 3519], [3527, 3584], [3636, 3647], [3655, 3663], [3674, 3712], [3717, 3718],
[3723, 3724], [3726, 3731], [3752, 3753], [3764, 3772], [3774, 3775], [3783, 3791],
[3802, 3803], [3806, 3839], [3841, 3871], [3892, 3903], [3949, 3975], [3980, 4095],
[4139, 4158], [4170, 4175], [4182, 4185], [4190, 4192], [4194, 4196], [4199, 4205],
[4209, 4212], [4226, 4237], [4250, 4255], [4294, 4303], [4349, 4351], [4686, 4687],
[4702, 4703], [4750, 4751], [4790, 4791], [4806, 4807], [4886, 4887], [4955, 4968],
[4989, 4991], [5008, 5023], [5109, 5120], [5741, 5742], [5787, 5791], [5867, 5869],
[5873, 5887], [5906, 5919], [5938, 5951], [5970, 5983], [6001, 6015], [6068, 6102],
[6104, 6107], [6109, 6111], [6122, 6127], [6138, 6159], [6170, 6175], [6264, 6271],
[6315, 6319], [6390, 6399], [6429, 6469], [6510, 6511], [6517, 6527], [6572, 6592],
[6600, 6607], [6619, 6655], [6679, 6687], [6741, 6783], [6794, 6799], [6810, 6822],
[6824, 6916], [6964, 6980], [6988, 6991], [7002, 7042], [7073, 7085], [7098, 7167],
[7204, 7231], [7242, 7244], [7294, 7400], [7410, 7423], [7616, 7679], [7958, 7959],
[7966, 7967], [8006, 8007], [8014, 8015], [8062, 8063], [8127, 8129], [8141, 8143],
[8148, 8149], [8156, 8159], [8173, 8177], [8189, 8303], [8306, 8307], [8314, 8318],
[8330, 8335], [8341, 8449], [8451, 8454], [8456, 8457], [8470, 8472], [8478, 8483],
[8506, 8507], [8512, 8516], [8522, 8525], [8586, 9311], [9372, 9449], [9472, 10101],
[10132, 11263], [11493, 11498], [11503, 11516], [11518, 11519], [11558, 11567],
[11622, 11630], [11632, 11647], [11671, 11679], [11743, 11822], [11824, 12292],
[12296, 12320], [12330, 12336], [12342, 12343], [12349, 12352], [12439, 12444],
[12544, 12548], [12590, 12592], [12687, 12689], [12694, 12703], [12728, 12783],
[12800, 12831], [12842, 12880], [12896, 12927], [12938, 12976], [12992, 13311],
[19894, 19967], [40908, 40959], [42125, 42191], [42238, 42239], [42509, 42511],
[42540, 42559], [42592, 42593], [42607, 42622], [42648, 42655], [42736, 42774],
[42784, 42785], [42889, 42890], [42893, 43002], [43043, 43055], [43062, 43071],
[43124, 43137], [43188, 43215], [43226, 43249], [43256, 43258], [43260, 43263],
[43302, 43311], [43335, 43359], [43389, 43395], [43443, 43470], [43482, 43519],
[43561, 43583], [43596, 43599], [43610, 43615], [43639, 43641], [43643, 43647],
[43698, 43700], [43703, 43704], [43710, 43711], [43715, 43738], [43742, 43967],
[44003, 44015], [44026, 44031], [55204, 55215], [55239, 55242], [55292, 55295],
[57344, 63743], [64046, 64047], [64110, 64111], [64218, 64255], [64263, 64274],
[64280, 64284], [64434, 64466], [64830, 64847], [64912, 64913], [64968, 65007],
[65020, 65135], [65277, 65295], [65306, 65312], [65339, 65344], [65371, 65381],
[65471, 65473], [65480, 65481], [65488, 65489], [65496, 65497]];
for (i = 0; i < ranges.length; i++) {
start = ranges[i][0];
end = ranges[i][1];
for (j = start; j <= end; j++) {
result[j] = true;
}
}
return result;
})();
function splitQuery(query) {
var result = [];
var start = -1;
for (var i = 0; i < query.length; i++) {
if (splitChars[query.charCodeAt(i)]) {
if (start !== -1) {
result.push(query.slice(start, i));
start = -1;
}
} else if (start === -1) {
start = i;
}
}
if (start !== -1) {
result.push(query.slice(start));
}
return result;
}

View File

@ -4,22 +4,24 @@
*
* Sphinx JavaScript utilities for the full-text search.
*
* :copyright: Copyright 2007-2021 by the Sphinx team, see AUTHORS.
* :copyright: Copyright 2007-2022 by the Sphinx team, see AUTHORS.
* :license: BSD, see LICENSE for details.
*
*/
"use strict";
if (!Scorer) {
/**
* Simple result scoring code.
*/
/**
* Simple result scoring code.
*/
if (typeof Scorer === "undefined") {
var Scorer = {
// Implement the following function to further tweak the score for each result
// The function takes a result array [filename, title, anchor, descr, score]
// The function takes a result array [docname, title, anchor, descr, score, filename]
// and returns the new score.
/*
score: function(result) {
return result[4];
score: result => {
const [docname, title, anchor, descr, score, filename] = result
return score
},
*/
@ -28,9 +30,11 @@ if (!Scorer) {
// or matches in the last dotted part of the object name
objPartialMatch: 6,
// Additive scores depending on the priority of the object
objPrio: {0: 15, // used to be importantResults
1: 5, // used to be objectResults
2: -5}, // used to be unimportantResults
objPrio: {
0: 15, // used to be importantResults
1: 5, // used to be objectResults
2: -5, // used to be unimportantResults
},
// Used when the priority is not in the mapping.
objPrioDefault: 0,
@ -39,455 +43,495 @@ if (!Scorer) {
partialTitle: 7,
// query found in terms
term: 5,
partialTerm: 2
partialTerm: 2,
};
}
if (!splitQuery) {
function splitQuery(query) {
return query.split(/\s+/);
const _removeChildren = (element) => {
while (element && element.lastChild) element.removeChild(element.lastChild);
};
/**
* See https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions#escaping
*/
const _escapeRegExp = (string) =>
string.replace(/[.*+\-?^${}()|[\]\\]/g, "\\$&"); // $& means the whole matched string
const _displayItem = (item, searchTerms) => {
const docBuilder = DOCUMENTATION_OPTIONS.BUILDER;
const docUrlRoot = DOCUMENTATION_OPTIONS.URL_ROOT;
const docFileSuffix = DOCUMENTATION_OPTIONS.FILE_SUFFIX;
const docLinkSuffix = DOCUMENTATION_OPTIONS.LINK_SUFFIX;
const showSearchSummary = DOCUMENTATION_OPTIONS.SHOW_SEARCH_SUMMARY;
const [docName, title, anchor, descr, score, _filename] = item;
let listItem = document.createElement("li");
let requestUrl;
let linkUrl;
if (docBuilder === "dirhtml") {
// dirhtml builder
let dirname = docName + "/";
if (dirname.match(/\/index\/$/))
dirname = dirname.substring(0, dirname.length - 6);
else if (dirname === "index/") dirname = "";
requestUrl = docUrlRoot + dirname;
linkUrl = requestUrl;
} else {
// normal html builders
requestUrl = docUrlRoot + docName + docFileSuffix;
linkUrl = docName + docLinkSuffix;
}
let linkEl = listItem.appendChild(document.createElement("a"));
linkEl.href = linkUrl + anchor;
linkEl.dataset.score = score;
linkEl.innerHTML = title;
if (descr)
listItem.appendChild(document.createElement("span")).innerHTML =
" (" + descr + ")";
else if (showSearchSummary)
fetch(requestUrl)
.then((responseData) => responseData.text())
.then((data) => {
if (data)
listItem.appendChild(
Search.makeSearchSummary(data, searchTerms)
);
});
Search.output.appendChild(listItem);
};
const _finishSearch = (resultCount) => {
Search.stopPulse();
Search.title.innerText = _("Search Results");
if (!resultCount)
Search.status.innerText = Documentation.gettext(
"Your search did not match any documents. Please make sure that all words are spelled correctly and that you've selected enough categories."
);
else
Search.status.innerText = _(
`Search finished, found ${resultCount} page(s) matching the search query.`
);
};
const _displayNextItem = (
results,
resultCount,
searchTerms
) => {
// results left, load the summary and display it
// this is intended to be dynamic (don't sub resultsCount)
if (results.length) {
_displayItem(results.pop(), searchTerms);
setTimeout(
() => _displayNextItem(results, resultCount, searchTerms),
5
);
}
// search finished, update title and status message
else _finishSearch(resultCount);
};
/**
* Default splitQuery function. Can be overridden in ``sphinx.search`` with a
* custom function per language.
*
* The regular expression works by splitting the string on consecutive characters
* that are not Unicode letters, numbers, underscores, or emoji characters.
* This is the same as ``\W+`` in Python, preserving the surrogate pair area.
*/
if (typeof splitQuery === "undefined") {
var splitQuery = (query) => query
.split(/[^\p{Letter}\p{Number}_\p{Emoji_Presentation}]+/gu)
.filter(term => term) // remove remaining empty strings
}
/**
* Search Module
*/
var Search = {
const Search = {
_index: null,
_queued_query: null,
_pulse_status: -1,
_index : null,
_queued_query : null,
_pulse_status : -1,
htmlToText : function(htmlString) {
var virtualDocument = document.implementation.createHTMLDocument('virtual');
var htmlElement = $(htmlString, virtualDocument);
htmlElement.find('.headerlink').remove();
docContent = htmlElement.find('[role=main]')[0];
if(docContent === undefined) {
console.warn("Content block not found. Sphinx search tries to obtain it " +
"via '[role=main]'. Could you check your theme or template.");
return "";
}
return docContent.textContent || docContent.innerText;
htmlToText: (htmlString) => {
const htmlElement = new DOMParser().parseFromString(htmlString, 'text/html');
htmlElement.querySelectorAll(".headerlink").forEach((el) => { el.remove() });
const docContent = htmlElement.querySelector('[role="main"]');
if (docContent !== undefined) return docContent.textContent;
console.warn(
"Content block not found. Sphinx search tries to obtain it via '[role=main]'. Could you check your theme or template."
);
return "";
},
init : function() {
var params = $.getQueryParameters();
if (params.q) {
var query = params.q[0];
$('input[name="q"]')[0].value = query;
this.performSearch(query);
}
init: () => {
const query = new URLSearchParams(window.location.search).get("q");
document
.querySelectorAll('input[name="q"]')
.forEach((el) => (el.value = query));
if (query) Search.performSearch(query);
},
loadIndex : function(url) {
$.ajax({type: "GET", url: url, data: null,
dataType: "script", cache: true,
complete: function(jqxhr, textstatus) {
if (textstatus != "success") {
document.getElementById("searchindexloader").src = url;
}
}});
},
loadIndex: (url) =>
(document.body.appendChild(document.createElement("script")).src = url),
setIndex : function(index) {
var q;
this._index = index;
if ((q = this._queued_query) !== null) {
this._queued_query = null;
Search.query(q);
setIndex: (index) => {
Search._index = index;
if (Search._queued_query !== null) {
const query = Search._queued_query;
Search._queued_query = null;
Search.query(query);
}
},
hasIndex : function() {
return this._index !== null;
},
hasIndex: () => Search._index !== null,
deferQuery : function(query) {
this._queued_query = query;
},
deferQuery: (query) => (Search._queued_query = query),
stopPulse : function() {
this._pulse_status = 0;
},
stopPulse: () => (Search._pulse_status = -1),
startPulse : function() {
if (this._pulse_status >= 0)
return;
function pulse() {
var i;
startPulse: () => {
if (Search._pulse_status >= 0) return;
const pulse = () => {
Search._pulse_status = (Search._pulse_status + 1) % 4;
var dotString = '';
for (i = 0; i < Search._pulse_status; i++)
dotString += '.';
Search.dots.text(dotString);
if (Search._pulse_status > -1)
window.setTimeout(pulse, 500);
}
Search.dots.innerText = ".".repeat(Search._pulse_status);
if (Search._pulse_status >= 0) window.setTimeout(pulse, 500);
};
pulse();
},
/**
* perform a search for something (or wait until index is loaded)
*/
performSearch : function(query) {
performSearch: (query) => {
// create the required interface elements
this.out = $('#search-results');
this.title = $('<h2>' + _('Searching') + '</h2>').appendTo(this.out);
this.dots = $('<span></span>').appendTo(this.title);
this.status = $('<p class="search-summary">&nbsp;</p>').appendTo(this.out);
this.output = $('<ul class="search"/>').appendTo(this.out);
const searchText = document.createElement("h2");
searchText.textContent = _("Searching");
const searchSummary = document.createElement("p");
searchSummary.classList.add("search-summary");
searchSummary.innerText = "";
const searchList = document.createElement("ul");
searchList.classList.add("search");
$('#search-progress').text(_('Preparing search...'));
this.startPulse();
const out = document.getElementById("search-results");
Search.title = out.appendChild(searchText);
Search.dots = Search.title.appendChild(document.createElement("span"));
Search.status = out.appendChild(searchSummary);
Search.output = out.appendChild(searchList);
const searchProgress = document.getElementById("search-progress");
// Some themes don't use the search progress node
if (searchProgress) {
searchProgress.innerText = _("Preparing search...");
}
Search.startPulse();
// index already loaded, the browser was quick!
if (this.hasIndex())
this.query(query);
else
this.deferQuery(query);
if (Search.hasIndex()) Search.query(query);
else Search.deferQuery(query);
},
/**
* execute search (requires search index to be loaded)
*/
query : function(query) {
var i;
query: (query) => {
const filenames = Search._index.filenames;
const docNames = Search._index.docnames;
const titles = Search._index.titles;
const allTitles = Search._index.alltitles;
const indexEntries = Search._index.indexentries;
// stem the searchterms and add them to the correct list
var stemmer = new Stemmer();
var searchterms = [];
var excluded = [];
var hlterms = [];
var tmp = splitQuery(query);
var objectterms = [];
for (i = 0; i < tmp.length; i++) {
if (tmp[i] !== "") {
objectterms.push(tmp[i].toLowerCase());
}
// stem the search terms and add them to the correct list
const stemmer = new Stemmer();
const searchTerms = new Set();
const excludedTerms = new Set();
const highlightTerms = new Set();
const objectTerms = new Set(splitQuery(query.toLowerCase().trim()));
splitQuery(query.trim()).forEach((queryTerm) => {
const queryTermLower = queryTerm.toLowerCase();
// maybe skip this "word"
// stopwords array is from language_data.js
if (
stopwords.indexOf(queryTermLower) !== -1 ||
queryTerm.match(/^\d+$/)
)
return;
if ($u.indexOf(stopwords, tmp[i].toLowerCase()) != -1 || tmp[i] === "") {
// skip this "word"
continue;
}
// stem the word
var word = stemmer.stemWord(tmp[i].toLowerCase());
// prevent stemmer from cutting word smaller than two chars
if(word.length < 3 && tmp[i].length >= 3) {
word = tmp[i];
}
var toAppend;
let word = stemmer.stemWord(queryTermLower);
// select the correct list
if (word[0] == '-') {
toAppend = excluded;
word = word.substr(1);
}
if (word[0] === "-") excludedTerms.add(word.substr(1));
else {
toAppend = searchterms;
hlterms.push(tmp[i].toLowerCase());
searchTerms.add(word);
highlightTerms.add(queryTermLower);
}
// only add if not already in the list
if (!$u.contains(toAppend, word))
toAppend.push(word);
});
if (SPHINX_HIGHLIGHT_ENABLED) { // set in sphinx_highlight.js
localStorage.setItem("sphinx_highlight_terms", [...highlightTerms].join(" "))
}
var highlightstring = '?highlight=' + $.urlencode(hlterms.join(" "));
// console.debug('SEARCH: searching for:');
// console.info('required: ', searchterms);
// console.info('excluded: ', excluded);
// console.debug("SEARCH: searching for:");
// console.info("required: ", [...searchTerms]);
// console.info("excluded: ", [...excludedTerms]);
// prepare search
var terms = this._index.terms;
var titleterms = this._index.titleterms;
// array of [docname, title, anchor, descr, score, filename]
let results = [];
_removeChildren(document.getElementById("search-progress"));
// array of [filename, title, anchor, descr, score]
var results = [];
$('#search-progress').empty();
const queryLower = query.toLowerCase();
for (const [title, foundTitles] of Object.entries(allTitles)) {
if (title.toLowerCase().includes(queryLower) && (queryLower.length >= title.length/2)) {
for (const [file, id] of foundTitles) {
let score = Math.round(100 * queryLower.length / title.length)
results.push([
docNames[file],
titles[file] !== title ? `${titles[file]} > ${title}` : title,
id !== null ? "#" + id : "",
null,
score,
filenames[file],
]);
}
}
}
// search for explicit entries in index directives
for (const [entry, foundEntries] of Object.entries(indexEntries)) {
if (entry.includes(queryLower) && (queryLower.length >= entry.length/2)) {
for (const [file, id] of foundEntries) {
let score = Math.round(100 * queryLower.length / entry.length)
results.push([
docNames[file],
titles[file],
id ? "#" + id : "",
null,
score,
filenames[file],
]);
}
}
}
// lookup as object
for (i = 0; i < objectterms.length; i++) {
var others = [].concat(objectterms.slice(0, i),
objectterms.slice(i+1, objectterms.length));
results = results.concat(this.performObjectSearch(objectterms[i], others));
}
objectTerms.forEach((term) =>
results.push(...Search.performObjectSearch(term, objectTerms))
);
// lookup as search terms in fulltext
results = results.concat(this.performTermsSearch(searchterms, excluded, terms, titleterms));
results.push(...Search.performTermsSearch(searchTerms, excludedTerms));
// let the scorer override scores with a custom scoring function
if (Scorer.score) {
for (i = 0; i < results.length; i++)
results[i][4] = Scorer.score(results[i]);
}
if (Scorer.score) results.forEach((item) => (item[4] = Scorer.score(item)));
// now sort the results by score (in opposite order of appearance, since the
// display function below uses pop() to retrieve items) and then
// alphabetically
results.sort(function(a, b) {
var left = a[4];
var right = b[4];
if (left > right) {
return 1;
} else if (left < right) {
return -1;
} else {
results.sort((a, b) => {
const leftScore = a[4];
const rightScore = b[4];
if (leftScore === rightScore) {
// same score: sort alphabetically
left = a[1].toLowerCase();
right = b[1].toLowerCase();
return (left > right) ? -1 : ((left < right) ? 1 : 0);
const leftTitle = a[1].toLowerCase();
const rightTitle = b[1].toLowerCase();
if (leftTitle === rightTitle) return 0;
return leftTitle > rightTitle ? -1 : 1; // inverted is intentional
}
return leftScore > rightScore ? 1 : -1;
});
// remove duplicate search results
// note the reversing of results, so that in the case of duplicates, the highest-scoring entry is kept
let seen = new Set();
results = results.reverse().reduce((acc, result) => {
let resultStr = result.slice(0, 4).concat([result[5]]).map(v => String(v)).join(',');
if (!seen.has(resultStr)) {
acc.push(result);
seen.add(resultStr);
}
return acc;
}, []);
results = results.reverse();
// for debugging
//Search.lastresults = results.slice(); // a copy
//console.info('search results:', Search.lastresults);
// console.info("search results:", Search.lastresults);
// print the results
var resultCount = results.length;
function displayNextItem() {
// results left, load the summary and display it
if (results.length) {
var item = results.pop();
var listItem = $('<li></li>');
var requestUrl = "";
var linkUrl = "";
if (DOCUMENTATION_OPTIONS.BUILDER === 'dirhtml') {
// dirhtml builder
var dirname = item[0] + '/';
if (dirname.match(/\/index\/$/)) {
dirname = dirname.substring(0, dirname.length-6);
} else if (dirname == 'index/') {
dirname = '';
}
requestUrl = DOCUMENTATION_OPTIONS.URL_ROOT + dirname;
linkUrl = requestUrl;
} else {
// normal html builders
requestUrl = DOCUMENTATION_OPTIONS.URL_ROOT + item[0] + DOCUMENTATION_OPTIONS.FILE_SUFFIX;
linkUrl = item[0] + DOCUMENTATION_OPTIONS.LINK_SUFFIX;
}
listItem.append($('<a/>').attr('href',
linkUrl +
highlightstring + item[2]).html(item[1]));
if (item[3]) {
listItem.append($('<span> (' + item[3] + ')</span>'));
Search.output.append(listItem);
setTimeout(function() {
displayNextItem();
}, 5);
} else if (DOCUMENTATION_OPTIONS.HAS_SOURCE) {
$.ajax({url: requestUrl,
dataType: "text",
complete: function(jqxhr, textstatus) {
var data = jqxhr.responseText;
if (data !== '' && data !== undefined) {
var summary = Search.makeSearchSummary(data, searchterms, hlterms);
if (summary) {
listItem.append(summary);
}
}
Search.output.append(listItem);
setTimeout(function() {
displayNextItem();
}, 5);
}});
} else {
// no source available, just display title
Search.output.append(listItem);
setTimeout(function() {
displayNextItem();
}, 5);
}
}
// search finished, update title and status message
else {
Search.stopPulse();
Search.title.text(_('Search Results'));
if (!resultCount)
Search.status.text(_('Your search did not match any documents. Please make sure that all words are spelled correctly and that you\'ve selected enough categories.'));
else
Search.status.text(_('Search finished, found %s page(s) matching the search query.').replace('%s', resultCount));
Search.status.fadeIn(500);
}
}
displayNextItem();
_displayNextItem(results, results.length, searchTerms);
},
/**
* search for object names
*/
performObjectSearch : function(object, otherterms) {
var filenames = this._index.filenames;
var docnames = this._index.docnames;
var objects = this._index.objects;
var objnames = this._index.objnames;
var titles = this._index.titles;
performObjectSearch: (object, objectTerms) => {
const filenames = Search._index.filenames;
const docNames = Search._index.docnames;
const objects = Search._index.objects;
const objNames = Search._index.objnames;
const titles = Search._index.titles;
var i;
var results = [];
const results = [];
for (var prefix in objects) {
for (var name in objects[prefix]) {
var fullname = (prefix ? prefix + '.' : '') + name;
var fullnameLower = fullname.toLowerCase()
if (fullnameLower.indexOf(object) > -1) {
var score = 0;
var parts = fullnameLower.split('.');
// check for different match types: exact matches of full name or
// "last name" (i.e. last dotted part)
if (fullnameLower == object || parts[parts.length - 1] == object) {
score += Scorer.objNameMatch;
// matches in last name
} else if (parts[parts.length - 1].indexOf(object) > -1) {
score += Scorer.objPartialMatch;
}
var match = objects[prefix][name];
var objname = objnames[match[1]][2];
var title = titles[match[0]];
// If more than one term searched for, we require other words to be
// found in the name/title/description
if (otherterms.length > 0) {
var haystack = (prefix + ' ' + name + ' ' +
objname + ' ' + title).toLowerCase();
var allfound = true;
for (i = 0; i < otherterms.length; i++) {
if (haystack.indexOf(otherterms[i]) == -1) {
allfound = false;
break;
}
}
if (!allfound) {
continue;
}
}
var descr = objname + _(', in ') + title;
const objectSearchCallback = (prefix, match) => {
const name = match[4]
const fullname = (prefix ? prefix + "." : "") + name;
const fullnameLower = fullname.toLowerCase();
if (fullnameLower.indexOf(object) < 0) return;
var anchor = match[3];
if (anchor === '')
anchor = fullname;
else if (anchor == '-')
anchor = objnames[match[1]][1] + '-' + fullname;
// add custom score for some objects according to scorer
if (Scorer.objPrio.hasOwnProperty(match[2])) {
score += Scorer.objPrio[match[2]];
} else {
score += Scorer.objPrioDefault;
}
results.push([docnames[match[0]], fullname, '#'+anchor, descr, score, filenames[match[0]]]);
}
let score = 0;
const parts = fullnameLower.split(".");
// check for different match types: exact matches of full name or
// "last name" (i.e. last dotted part)
if (fullnameLower === object || parts.slice(-1)[0] === object)
score += Scorer.objNameMatch;
else if (parts.slice(-1)[0].indexOf(object) > -1)
score += Scorer.objPartialMatch; // matches in last name
const objName = objNames[match[1]][2];
const title = titles[match[0]];
// If more than one term searched for, we require other words to be
// found in the name/title/description
const otherTerms = new Set(objectTerms);
otherTerms.delete(object);
if (otherTerms.size > 0) {
const haystack = `${prefix} ${name} ${objName} ${title}`.toLowerCase();
if (
[...otherTerms].some((otherTerm) => haystack.indexOf(otherTerm) < 0)
)
return;
}
}
let anchor = match[3];
if (anchor === "") anchor = fullname;
else if (anchor === "-") anchor = objNames[match[1]][1] + "-" + fullname;
const descr = objName + _(", in ") + title;
// add custom score for some objects according to scorer
if (Scorer.objPrio.hasOwnProperty(match[2]))
score += Scorer.objPrio[match[2]];
else score += Scorer.objPrioDefault;
results.push([
docNames[match[0]],
fullname,
"#" + anchor,
descr,
score,
filenames[match[0]],
]);
};
Object.keys(objects).forEach((prefix) =>
objects[prefix].forEach((array) =>
objectSearchCallback(prefix, array)
)
);
return results;
},
/**
* See https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions
*/
escapeRegExp : function(string) {
return string.replace(/[.*+\-?^${}()|[\]\\]/g, '\\$&'); // $& means the whole matched string
},
/**
* search for full-text terms in the index
*/
performTermsSearch : function(searchterms, excluded, terms, titleterms) {
var docnames = this._index.docnames;
var filenames = this._index.filenames;
var titles = this._index.titles;
performTermsSearch: (searchTerms, excludedTerms) => {
// prepare search
const terms = Search._index.terms;
const titleTerms = Search._index.titleterms;
const filenames = Search._index.filenames;
const docNames = Search._index.docnames;
const titles = Search._index.titles;
var i, j, file;
var fileMap = {};
var scoreMap = {};
var results = [];
const scoreMap = new Map();
const fileMap = new Map();
// perform the search on the required terms
for (i = 0; i < searchterms.length; i++) {
var word = searchterms[i];
var files = [];
var _o = [
{files: terms[word], score: Scorer.term},
{files: titleterms[word], score: Scorer.title}
searchTerms.forEach((word) => {
const files = [];
const arr = [
{ files: terms[word], score: Scorer.term },
{ files: titleTerms[word], score: Scorer.title },
];
// add support for partial matches
if (word.length > 2) {
var word_regex = this.escapeRegExp(word);
for (var w in terms) {
if (w.match(word_regex) && !terms[word]) {
_o.push({files: terms[w], score: Scorer.partialTerm})
}
}
for (var w in titleterms) {
if (w.match(word_regex) && !titleterms[word]) {
_o.push({files: titleterms[w], score: Scorer.partialTitle})
}
}
const escapedWord = _escapeRegExp(word);
Object.keys(terms).forEach((term) => {
if (term.match(escapedWord) && !terms[word])
arr.push({ files: terms[term], score: Scorer.partialTerm });
});
Object.keys(titleTerms).forEach((term) => {
if (term.match(escapedWord) && !titleTerms[word])
arr.push({ files: titleTerms[word], score: Scorer.partialTitle });
});
}
// no match but word was a required one
if ($u.every(_o, function(o){return o.files === undefined;})) {
break;
}
if (arr.every((record) => record.files === undefined)) return;
// found search word in contents
$u.each(_o, function(o) {
var _files = o.files;
if (_files === undefined)
return
arr.forEach((record) => {
if (record.files === undefined) return;
if (_files.length === undefined)
_files = [_files];
files = files.concat(_files);
let recordFiles = record.files;
if (recordFiles.length === undefined) recordFiles = [recordFiles];
files.push(...recordFiles);
// set score for the word in each file to Scorer.term
for (j = 0; j < _files.length; j++) {
file = _files[j];
if (!(file in scoreMap))
scoreMap[file] = {};
scoreMap[file][word] = o.score;
}
// set score for the word in each file
recordFiles.forEach((file) => {
if (!scoreMap.has(file)) scoreMap.set(file, {});
scoreMap.get(file)[word] = record.score;
});
});
// create the mapping
for (j = 0; j < files.length; j++) {
file = files[j];
if (file in fileMap && fileMap[file].indexOf(word) === -1)
fileMap[file].push(word);
else
fileMap[file] = [word];
}
}
files.forEach((file) => {
if (fileMap.has(file) && fileMap.get(file).indexOf(word) === -1)
fileMap.get(file).push(word);
else fileMap.set(file, [word]);
});
});
// now check if the files don't contain excluded terms
for (file in fileMap) {
var valid = true;
const results = [];
for (const [file, wordList] of fileMap) {
// check if all requirements are matched
var filteredTermCount = // as search terms with length < 3 are discarded: ignore
searchterms.filter(function(term){return term.length > 2}).length
// as search terms with length < 3 are discarded
const filteredTermCount = [...searchTerms].filter(
(term) => term.length > 2
).length;
if (
fileMap[file].length != searchterms.length &&
fileMap[file].length != filteredTermCount
) continue;
wordList.length !== searchTerms.size &&
wordList.length !== filteredTermCount
)
continue;
// ensure that none of the excluded terms is in the search result
for (i = 0; i < excluded.length; i++) {
if (terms[excluded[i]] == file ||
titleterms[excluded[i]] == file ||
$u.contains(terms[excluded[i]] || [], file) ||
$u.contains(titleterms[excluded[i]] || [], file)) {
valid = false;
break;
}
}
if (
[...excludedTerms].some(
(term) =>
terms[term] === file ||
titleTerms[term] === file ||
(terms[term] || []).includes(file) ||
(titleTerms[term] || []).includes(file)
)
)
break;
// if we have still a valid result we can add it to the result list
if (valid) {
// select one (max) score for the file.
// for better ranking, we should calculate ranking by using words statistics like basic tf-idf...
var score = $u.max($u.map(fileMap[file], function(w){return scoreMap[file][w]}));
results.push([docnames[file], titles[file], '', null, score, filenames[file]]);
}
// select one (max) score for the file.
const score = Math.max(...wordList.map((w) => scoreMap.get(file)[w]));
// add result to the result list
results.push([
docNames[file],
titles[file],
"",
null,
score,
filenames[file],
]);
}
return results;
},
@ -495,34 +539,28 @@ var Search = {
/**
* helper function to return a node containing the
* search summary for a given text. keywords is a list
* of stemmed words, hlwords is the list of normal, unstemmed
* words. the first one is used to find the occurrence, the
* latter for highlighting it.
* of stemmed words.
*/
makeSearchSummary : function(htmlText, keywords, hlwords) {
var text = Search.htmlToText(htmlText);
if (text == "") {
return null;
}
var textLower = text.toLowerCase();
var start = 0;
$.each(keywords, function() {
var i = textLower.indexOf(this.toLowerCase());
if (i > -1)
start = i;
});
start = Math.max(start - 120, 0);
var excerpt = ((start > 0) ? '...' : '') +
$.trim(text.substr(start, 240)) +
((start + 240 - text.length) ? '...' : '');
var rv = $('<p class="context"></p>').text(excerpt);
$.each(hlwords, function() {
rv = rv.highlightText(this, 'highlighted');
});
return rv;
}
makeSearchSummary: (htmlText, keywords) => {
const text = Search.htmlToText(htmlText);
if (text === "") return null;
const textLower = text.toLowerCase();
const actualStartPosition = [...keywords]
.map((k) => textLower.indexOf(k.toLowerCase()))
.filter((i) => i > -1)
.slice(-1)[0];
const startWithContext = Math.max(actualStartPosition - 120, 0);
const top = startWithContext === 0 ? "" : "...";
const tail = startWithContext + 240 < text.length ? "..." : "";
let summary = document.createElement("p");
summary.classList.add("context");
summary.textContent = top + text.substr(startWithContext, 240).trim() + tail;
return summary;
},
};
$(document).ready(function() {
Search.init();
});
_ready(Search.init);

View File

@ -2,18 +2,20 @@
<!doctype html>
<html>
<html lang="en">
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Index &#8212; QuaPy 0.1.6 documentation</title>
<title>Index &#8212; QuaPy 0.1.7 documentation</title>
<link rel="stylesheet" type="text/css" href="_static/pygments.css" />
<link rel="stylesheet" type="text/css" href="_static/bizstyle.css" />
<script data-url_root="./" id="documentation_options" src="_static/documentation_options.js"></script>
<script src="_static/jquery.js"></script>
<script src="_static/underscore.js"></script>
<script src="_static/_sphinx_javascript_frameworks_compat.js"></script>
<script src="_static/doctools.js"></script>
<script src="_static/sphinx_highlight.js"></script>
<script src="_static/bizstyle.js"></script>
<link rel="index" title="Index" href="#" />
<link rel="search" title="Search" href="search.html" />
@ -31,7 +33,7 @@
<li class="right" >
<a href="py-modindex.html" title="Python Module Index"
>modules</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.6 documentation</a> &#187;</li>
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.7 documentation</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">Index</a></li>
</ul>
</div>
@ -54,6 +56,7 @@
| <a href="#G"><strong>G</strong></a>
| <a href="#H"><strong>H</strong></a>
| <a href="#I"><strong>I</strong></a>
| <a href="#J"><strong>J</strong></a>
| <a href="#K"><strong>K</strong></a>
| <a href="#L"><strong>L</strong></a>
| <a href="#M"><strong>M</strong></a>
@ -68,12 +71,17 @@
| <a href="#V"><strong>V</strong></a>
| <a href="#W"><strong>W</strong></a>
| <a href="#X"><strong>X</strong></a>
| <a href="#Y"><strong>Y</strong></a>
</div>
<h2 id="A">A</h2>
<table style="width: 100%" class="indextable genindextable"><tr>
<td style="width: 33%; vertical-align: top;"><ul>
<li><a href="quapy.html#quapy.error.absolute_error">absolute_error() (in module quapy.error)</a>
</li>
<li><a href="quapy.html#quapy.protocol.AbstractProtocol">AbstractProtocol (class in quapy.protocol)</a>
</li>
<li><a href="quapy.html#quapy.protocol.AbstractStochasticSeededProtocol">AbstractStochasticSeededProtocol (class in quapy.protocol)</a>
</li>
<li><a href="quapy.method.html#quapy.method.aggregative.ACC">ACC (class in quapy.method.aggregative)</a>
</li>
@ -96,46 +104,36 @@
</li>
<li><a href="quapy.method.html#quapy.method.aggregative.CC.aggregate">(quapy.method.aggregative.CC method)</a>
</li>
<li><a href="quapy.method.html#quapy.method.aggregative.ELM.aggregate">(quapy.method.aggregative.ELM method)</a>
<li><a href="quapy.method.html#quapy.method.aggregative.DistributionMatching.aggregate">(quapy.method.aggregative.DistributionMatching method)</a>
</li>
<li><a href="quapy.method.html#quapy.method.aggregative.DyS.aggregate">(quapy.method.aggregative.DyS method)</a>
</li>
<li><a href="quapy.method.html#quapy.method.aggregative.EMQ.aggregate">(quapy.method.aggregative.EMQ method)</a>
</li>
<li><a href="quapy.method.html#quapy.method.aggregative.HDy.aggregate">(quapy.method.aggregative.HDy method)</a>
</li>
<li><a href="quapy.method.html#quapy.method.aggregative.OneVsAll.aggregate">(quapy.method.aggregative.OneVsAll method)</a>
<li><a href="quapy.method.html#quapy.method.aggregative.OneVsAllAggregative.aggregate">(quapy.method.aggregative.OneVsAllAggregative method)</a>
</li>
<li><a href="quapy.method.html#quapy.method.aggregative.PACC.aggregate">(quapy.method.aggregative.PACC method)</a>
</li>
<li><a href="quapy.method.html#quapy.method.aggregative.PCC.aggregate">(quapy.method.aggregative.PCC method)</a>
</li>
<li><a href="quapy.method.html#quapy.method.aggregative.SMM.aggregate">(quapy.method.aggregative.SMM method)</a>
</li>
<li><a href="quapy.method.html#quapy.method.aggregative.ThresholdOptimization.aggregate">(quapy.method.aggregative.ThresholdOptimization method)</a>
</li>
</ul></li>
</ul></td>
<td style="width: 33%; vertical-align: top;"><ul>
<li><a href="quapy.method.html#quapy.method.aggregative.AggregativeQuantifier.aggregative">aggregative (quapy.method.aggregative.AggregativeQuantifier property)</a>
<ul>
<li><a href="quapy.method.html#quapy.method.base.BaseQuantifier.aggregative">(quapy.method.base.BaseQuantifier property)</a>
<li><a href="quapy.method.html#quapy.method.meta.Ensemble.aggregative">aggregative (quapy.method.meta.Ensemble property)</a>
</li>
<li><a href="quapy.method.html#quapy.method.meta.Ensemble.aggregative">(quapy.method.meta.Ensemble property)</a>
</li>
</ul></li>
<li><a href="quapy.method.html#quapy.method.aggregative.AggregativeProbabilisticQuantifier">AggregativeProbabilisticQuantifier (class in quapy.method.aggregative)</a>
</li>
<li><a href="quapy.method.html#quapy.method.aggregative.AggregativeQuantifier">AggregativeQuantifier (class in quapy.method.aggregative)</a>
</li>
<li><a href="quapy.html#quapy.evaluation.artificial_prevalence_prediction">artificial_prevalence_prediction() (in module quapy.evaluation)</a>
<li><a href="quapy.html#quapy.protocol.APP">APP (class in quapy.protocol)</a>
</li>
<li><a href="quapy.html#quapy.evaluation.artificial_prevalence_protocol">artificial_prevalence_protocol() (in module quapy.evaluation)</a>
</li>
<li><a href="quapy.html#quapy.evaluation.artificial_prevalence_report">artificial_prevalence_report() (in module quapy.evaluation)</a>
</li>
<li><a href="quapy.html#quapy.functional.artificial_prevalence_sampling">artificial_prevalence_sampling() (in module quapy.functional)</a>
</li>
<li><a href="quapy.data.html#quapy.data.base.LabelledCollection.artificial_sampling_generator">artificial_sampling_generator() (quapy.data.base.LabelledCollection method)</a>
</li>
<li><a href="quapy.data.html#quapy.data.base.LabelledCollection.artificial_sampling_index_generator">artificial_sampling_index_generator() (quapy.data.base.LabelledCollection method)</a>
<li><a href="quapy.html#quapy.protocol.ArtificialPrevalenceProtocol">ArtificialPrevalenceProtocol (in module quapy.protocol)</a>
</li>
<li><a href="quapy.classification.html#quapy.classification.neural.TorchDataset.asDataloader">asDataloader() (quapy.classification.neural.TorchDataset method)</a>
</li>
@ -146,6 +144,8 @@
<table style="width: 100%" class="indextable genindextable"><tr>
<td style="width: 33%; vertical-align: top;"><ul>
<li><a href="quapy.method.html#quapy.method.base.BaseQuantifier">BaseQuantifier (class in quapy.method.base)</a>
</li>
<li><a href="quapy.classification.html#quapy.classification.calibration.BCTSCalibration">BCTSCalibration (class in quapy.classification.calibration)</a>
</li>
<li><a href="quapy.html#quapy.model_selection.GridSearchQ.best_model">best_model() (quapy.model_selection.GridSearchQ method)</a>
</li>
@ -155,14 +155,6 @@
<ul>
<li><a href="quapy.data.html#quapy.data.base.LabelledCollection.binary">(quapy.data.base.LabelledCollection property)</a>
</li>
<li><a href="quapy.method.html#quapy.method.aggregative.OneVsAll.binary">(quapy.method.aggregative.OneVsAll property)</a>
</li>
<li><a href="quapy.method.html#quapy.method.base.BaseQuantifier.binary">(quapy.method.base.BaseQuantifier property)</a>
</li>
<li><a href="quapy.method.html#quapy.method.base.BinaryQuantifier.binary">(quapy.method.base.BinaryQuantifier property)</a>
</li>
<li><a href="quapy.method.html#quapy.method.meta.Ensemble.binary">(quapy.method.meta.Ensemble property)</a>
</li>
</ul></li>
</ul></td>
@ -185,32 +177,30 @@
<td style="width: 33%; vertical-align: top;"><ul>
<li><a href="quapy.method.html#quapy.method.aggregative.CC">CC (class in quapy.method.aggregative)</a>
</li>
<li><a href="quapy.data.html#quapy.data.base.Dataset.classes_">classes_ (quapy.data.base.Dataset property)</a>
<li><a href="quapy.html#quapy.functional.check_prevalence_vector">check_prevalence_vector() (in module quapy.functional)</a>
</li>
<li><a href="quapy.classification.html#quapy.classification.calibration.RecalibratedProbabilisticClassifierBase.classes_">classes_ (quapy.classification.calibration.RecalibratedProbabilisticClassifierBase property)</a>
<ul>
<li><a href="quapy.data.html#quapy.data.base.Dataset.classes_">(quapy.data.base.Dataset property)</a>
</li>
<li><a href="quapy.method.html#quapy.method.aggregative.AggregativeQuantifier.classes_">(quapy.method.aggregative.AggregativeQuantifier property)</a>
</li>
<li><a href="quapy.method.html#quapy.method.aggregative.OneVsAll.classes_">(quapy.method.aggregative.OneVsAll property)</a>
</li>
<li><a href="quapy.method.html#quapy.method.base.BaseQuantifier.classes_">(quapy.method.base.BaseQuantifier property)</a>
</li>
<li><a href="quapy.method.html#quapy.method.meta.Ensemble.classes_">(quapy.method.meta.Ensemble property)</a>
<li><a href="quapy.method.html#quapy.method.base.OneVsAllGeneric.classes_">(quapy.method.base.OneVsAllGeneric property)</a>
</li>
<li><a href="quapy.method.html#quapy.method.neural.QuaNetTrainer.classes_">(quapy.method.neural.QuaNetTrainer property)</a>
</li>
<li><a href="quapy.method.html#quapy.method.non_aggregative.MaximumLikelihoodPrevalenceEstimation.classes_">(quapy.method.non_aggregative.MaximumLikelihoodPrevalenceEstimation property)</a>
</li>
<li><a href="quapy.html#quapy.model_selection.GridSearchQ.classes_">(quapy.model_selection.GridSearchQ property)</a>
</li>
</ul></li>
<li><a href="quapy.method.html#quapy.method.aggregative.AggregativeQuantifier.classifier">classifier (quapy.method.aggregative.AggregativeQuantifier property)</a>
</li>
<li><a href="quapy.method.html#quapy.method.aggregative.ACC.classify">classify() (quapy.method.aggregative.ACC method)</a>
<ul>
<li><a href="quapy.method.html#quapy.method.aggregative.AggregativeProbabilisticQuantifier.classify">(quapy.method.aggregative.AggregativeProbabilisticQuantifier method)</a>
</li>
<li><a href="quapy.method.html#quapy.method.aggregative.AggregativeQuantifier.classify">(quapy.method.aggregative.AggregativeQuantifier method)</a>
</li>
<li><a href="quapy.method.html#quapy.method.aggregative.ELM.classify">(quapy.method.aggregative.ELM method)</a>
</li>
<li><a href="quapy.method.html#quapy.method.aggregative.OneVsAll.classify">(quapy.method.aggregative.OneVsAll method)</a>
<li><a href="quapy.method.html#quapy.method.aggregative.OneVsAllAggregative.classify">(quapy.method.aggregative.OneVsAllAggregative method)</a>
</li>
<li><a href="quapy.method.html#quapy.method.aggregative.PACC.classify">(quapy.method.aggregative.PACC method)</a>
</li>
@ -224,12 +214,20 @@
<li><a href="quapy.method.html#quapy.method.neural.QuaNetTrainer.clean_checkpoint_dir">clean_checkpoint_dir() (quapy.method.neural.QuaNetTrainer method)</a>
</li>
<li><a href="quapy.classification.html#quapy.classification.neural.CNNnet">CNNnet (class in quapy.classification.neural)</a>
</li>
<li><a href="quapy.html#quapy.protocol.AbstractStochasticSeededProtocol.collator">collator() (quapy.protocol.AbstractStochasticSeededProtocol method)</a>
</li>
<li><a href="quapy.data.html#quapy.data.base.LabelledCollection.counts">counts() (quapy.data.base.LabelledCollection method)</a>
</li>
<li><a href="quapy.html#quapy.util.create_if_not_exist">create_if_not_exist() (in module quapy.util)</a>
</li>
<li><a href="quapy.html#quapy.util.create_parent_dir">create_parent_dir() (in module quapy.util)</a>
</li>
<li><a href="quapy.method.html#quapy.method.aggregative.cross_generate_predictions">cross_generate_predictions() (in module quapy.method.aggregative)</a>
</li>
<li><a href="quapy.method.html#quapy.method.aggregative.cross_generate_predictions_depr">cross_generate_predictions_depr() (in module quapy.method.aggregative)</a>
</li>
<li><a href="quapy.html#quapy.model_selection.cross_val_predict">cross_val_predict() (in module quapy.model_selection)</a>
</li>
</ul></td>
</tr></table>
@ -248,6 +246,8 @@
</li>
</ul></li>
<li><a href="quapy.classification.html#quapy.classification.neural.TextClassifierNet.dimensions">dimensions() (quapy.classification.neural.TextClassifierNet method)</a>
</li>
<li><a href="quapy.method.html#quapy.method.aggregative.DistributionMatching">DistributionMatching (class in quapy.method.aggregative)</a>
</li>
</ul></td>
<td style="width: 33%; vertical-align: top;"><ul>
@ -259,9 +259,13 @@
<li><a href="quapy.classification.html#quapy.classification.neural.TextClassifierNet.document_embedding">(quapy.classification.neural.TextClassifierNet method)</a>
</li>
</ul></li>
<li><a href="quapy.html#quapy.protocol.DomainMixer">DomainMixer (class in quapy.protocol)</a>
</li>
<li><a href="quapy.html#quapy.util.download_file">download_file() (in module quapy.util)</a>
</li>
<li><a href="quapy.html#quapy.util.download_file_if_not_exists">download_file_if_not_exists() (in module quapy.util)</a>
</li>
<li><a href="quapy.method.html#quapy.method.aggregative.DyS">DyS (class in quapy.method.aggregative)</a>
</li>
</ul></td>
</tr></table>
@ -278,17 +282,15 @@
<li><a href="quapy.method.html#quapy.method.meta.EEMQ">EEMQ() (in module quapy.method.meta)</a>
</li>
<li><a href="quapy.method.html#quapy.method.meta.EHDy">EHDy() (in module quapy.method.meta)</a>
</li>
<li><a href="quapy.method.html#quapy.method.aggregative.ELM">ELM (class in quapy.method.aggregative)</a>
</li>
<li><a href="quapy.method.html#quapy.method.aggregative.EMQ.EM">EM() (quapy.method.aggregative.EMQ class method)</a>
</li>
<li><a href="quapy.method.html#quapy.method.aggregative.EMQ">EMQ (class in quapy.method.aggregative)</a>
</li>
</ul></td>
<td style="width: 33%; vertical-align: top;"><ul>
<li><a href="quapy.method.html#quapy.method.meta.Ensemble">Ensemble (class in quapy.method.meta)</a>
</li>
</ul></td>
<td style="width: 33%; vertical-align: top;"><ul>
<li><a href="quapy.method.html#quapy.method.meta.ensembleFactory">ensembleFactory() (in module quapy.method.meta)</a>
</li>
<li><a href="quapy.method.html#quapy.method.meta.EPACC">EPACC() (in module quapy.method.meta)</a>
@ -299,9 +301,11 @@
</li>
<li><a href="quapy.html#quapy.evaluation.evaluate">evaluate() (in module quapy.evaluation)</a>
</li>
<li><a href="quapy.method.html#quapy.method.aggregative.ExpectationMaximizationQuantifier">ExpectationMaximizationQuantifier (in module quapy.method.aggregative)</a>
<li><a href="quapy.html#quapy.evaluation.evaluate_on_samples">evaluate_on_samples() (in module quapy.evaluation)</a>
</li>
<li><a href="quapy.method.html#quapy.method.aggregative.ExplicitLossMinimisation">ExplicitLossMinimisation (in module quapy.method.aggregative)</a>
<li><a href="quapy.html#quapy.evaluation.evaluation_report">evaluation_report() (in module quapy.evaluation)</a>
</li>
<li><a href="quapy.method.html#quapy.method.aggregative.ExpectationMaximizationQuantifier">ExpectationMaximizationQuantifier (in module quapy.method.aggregative)</a>
</li>
</ul></td>
</tr></table>
@ -312,6 +316,8 @@
<li><a href="quapy.html#quapy.error.f1_error">f1_error() (in module quapy.error)</a>
</li>
<li><a href="quapy.html#quapy.error.f1e">f1e() (in module quapy.error)</a>
</li>
<li><a href="quapy.data.html#quapy.data.datasets.fetch_lequa2022">fetch_lequa2022() (in module quapy.data.datasets)</a>
</li>
<li><a href="quapy.data.html#quapy.data.datasets.fetch_reviews">fetch_reviews() (in module quapy.data.datasets)</a>
</li>
@ -321,9 +327,11 @@
</li>
<li><a href="quapy.data.html#quapy.data.datasets.fetch_UCILabelledCollection">fetch_UCILabelledCollection() (in module quapy.data.datasets)</a>
</li>
<li><a href="quapy.classification.html#quapy.classification.methods.LowRankLogisticRegression.fit">fit() (quapy.classification.methods.LowRankLogisticRegression method)</a>
<li><a href="quapy.classification.html#quapy.classification.calibration.RecalibratedProbabilisticClassifierBase.fit">fit() (quapy.classification.calibration.RecalibratedProbabilisticClassifierBase method)</a>
<ul>
<li><a href="quapy.classification.html#quapy.classification.methods.LowRankLogisticRegression.fit">(quapy.classification.methods.LowRankLogisticRegression method)</a>
</li>
<li><a href="quapy.classification.html#quapy.classification.neural.NeuralClassifierTrainer.fit">(quapy.classification.neural.NeuralClassifierTrainer method)</a>
</li>
<li><a href="quapy.classification.html#quapy.classification.svmperf.SVMperf.fit">(quapy.classification.svmperf.SVMperf method)</a>
@ -336,21 +344,25 @@
</li>
<li><a href="quapy.method.html#quapy.method.aggregative.CC.fit">(quapy.method.aggregative.CC method)</a>
</li>
<li><a href="quapy.method.html#quapy.method.aggregative.ELM.fit">(quapy.method.aggregative.ELM method)</a>
<li><a href="quapy.method.html#quapy.method.aggregative.DistributionMatching.fit">(quapy.method.aggregative.DistributionMatching method)</a>
</li>
<li><a href="quapy.method.html#quapy.method.aggregative.DyS.fit">(quapy.method.aggregative.DyS method)</a>
</li>
<li><a href="quapy.method.html#quapy.method.aggregative.EMQ.fit">(quapy.method.aggregative.EMQ method)</a>
</li>
<li><a href="quapy.method.html#quapy.method.aggregative.HDy.fit">(quapy.method.aggregative.HDy method)</a>
</li>
<li><a href="quapy.method.html#quapy.method.aggregative.OneVsAll.fit">(quapy.method.aggregative.OneVsAll method)</a>
</li>
<li><a href="quapy.method.html#quapy.method.aggregative.PACC.fit">(quapy.method.aggregative.PACC method)</a>
</li>
<li><a href="quapy.method.html#quapy.method.aggregative.PCC.fit">(quapy.method.aggregative.PCC method)</a>
</li>
<li><a href="quapy.method.html#quapy.method.aggregative.SMM.fit">(quapy.method.aggregative.SMM method)</a>
</li>
<li><a href="quapy.method.html#quapy.method.aggregative.ThresholdOptimization.fit">(quapy.method.aggregative.ThresholdOptimization method)</a>
</li>
<li><a href="quapy.method.html#quapy.method.base.BaseQuantifier.fit">(quapy.method.base.BaseQuantifier method)</a>
</li>
<li><a href="quapy.method.html#quapy.method.base.OneVsAllGeneric.fit">(quapy.method.base.OneVsAllGeneric method)</a>
</li>
<li><a href="quapy.method.html#quapy.method.meta.Ensemble.fit">(quapy.method.meta.Ensemble method)</a>
</li>
@ -363,6 +375,10 @@
</ul></li>
</ul></td>
<td style="width: 33%; vertical-align: top;"><ul>
<li><a href="quapy.classification.html#quapy.classification.calibration.RecalibratedProbabilisticClassifierBase.fit_cv">fit_cv() (quapy.classification.calibration.RecalibratedProbabilisticClassifierBase method)</a>
</li>
<li><a href="quapy.classification.html#quapy.classification.calibration.RecalibratedProbabilisticClassifierBase.fit_tr_val">fit_tr_val() (quapy.classification.calibration.RecalibratedProbabilisticClassifierBase method)</a>
</li>
<li><a href="quapy.data.html#quapy.data.preprocessing.IndexTransformer.fit_transform">fit_transform() (quapy.data.preprocessing.IndexTransformer method)</a>
</li>
<li><a href="quapy.classification.html#quapy.classification.neural.TextClassifierNet.forward">forward() (quapy.classification.neural.TextClassifierNet method)</a>
@ -385,9 +401,9 @@
<h2 id="G">G</h2>
<table style="width: 100%" class="indextable genindextable"><tr>
<td style="width: 33%; vertical-align: top;"><ul>
<li><a href="quapy.html#quapy.evaluation.gen_prevalence_prediction">gen_prevalence_prediction() (in module quapy.evaluation)</a>
<li><a href="quapy.html#quapy.protocol.OnLabelledCollectionProtocol.get_collator">get_collator() (quapy.protocol.OnLabelledCollectionProtocol class method)</a>
</li>
<li><a href="quapy.html#quapy.evaluation.gen_prevalence_report">gen_prevalence_report() (in module quapy.evaluation)</a>
<li><a href="quapy.html#quapy.protocol.OnLabelledCollectionProtocol.get_labelled_collection">get_labelled_collection() (quapy.protocol.OnLabelledCollectionProtocol method)</a>
</li>
<li><a href="quapy.html#quapy.functional.get_nprevpoints_approximation">get_nprevpoints_approximation() (in module quapy.functional)</a>
</li>
@ -401,18 +417,10 @@
<li><a href="quapy.classification.html#quapy.classification.neural.NeuralClassifierTrainer.get_params">(quapy.classification.neural.NeuralClassifierTrainer method)</a>
</li>
<li><a href="quapy.classification.html#quapy.classification.neural.TextClassifierNet.get_params">(quapy.classification.neural.TextClassifierNet method)</a>
</li>
<li><a href="quapy.method.html#quapy.method.aggregative.AggregativeQuantifier.get_params">(quapy.method.aggregative.AggregativeQuantifier method)</a>
</li>
<li><a href="quapy.method.html#quapy.method.aggregative.OneVsAll.get_params">(quapy.method.aggregative.OneVsAll method)</a>
</li>
<li><a href="quapy.method.html#quapy.method.base.BaseQuantifier.get_params">(quapy.method.base.BaseQuantifier method)</a>
</li>
<li><a href="quapy.method.html#quapy.method.meta.Ensemble.get_params">(quapy.method.meta.Ensemble method)</a>
</li>
<li><a href="quapy.method.html#quapy.method.neural.QuaNetTrainer.get_params">(quapy.method.neural.QuaNetTrainer method)</a>
</li>
<li><a href="quapy.method.html#quapy.method.non_aggregative.MaximumLikelihoodPrevalenceEstimation.get_params">(quapy.method.non_aggregative.MaximumLikelihoodPrevalenceEstimation method)</a>
</li>
<li><a href="quapy.html#quapy.model_selection.GridSearchQ.get_params">(quapy.model_selection.GridSearchQ method)</a>
</li>
@ -423,6 +431,12 @@
</li>
<li><a href="quapy.html#quapy.util.get_quapy_home">get_quapy_home() (in module quapy.util)</a>
</li>
<li><a href="quapy.method.html#quapy.method.aggregative.ACC.getPteCondEstim">getPteCondEstim() (quapy.method.aggregative.ACC class method)</a>
<ul>
<li><a href="quapy.method.html#quapy.method.aggregative.PACC.getPteCondEstim">(quapy.method.aggregative.PACC class method)</a>
</li>
</ul></li>
<li><a href="quapy.html#quapy.model_selection.GridSearchQ">GridSearchQ (class in quapy.model_selection)</a>
</li>
</ul></td>
@ -446,22 +460,20 @@
<table style="width: 100%" class="indextable genindextable"><tr>
<td style="width: 33%; vertical-align: top;"><ul>
<li><a href="quapy.data.html#quapy.data.preprocessing.index">index() (in module quapy.data.preprocessing)</a>
</li>
<li><a href="quapy.data.html#quapy.data.preprocessing.IndexTransformer">IndexTransformer (class in quapy.data.preprocessing)</a>
</li>
<li><a href="quapy.method.html#quapy.method.base.isaggregative">isaggregative() (in module quapy.method.base)</a>
</li>
</ul></td>
<td style="width: 33%; vertical-align: top;"><ul>
<li><a href="quapy.html#quapy.isbinary">isbinary() (in module quapy)</a>
<li><a href="quapy.data.html#quapy.data.preprocessing.IndexTransformer">IndexTransformer (class in quapy.data.preprocessing)</a>
</li>
<li><a href="quapy.html#quapy.protocol.IterateProtocol">IterateProtocol (class in quapy.protocol)</a>
</li>
</ul></td>
</tr></table>
<ul>
<li><a href="quapy.data.html#quapy.data.base.isbinary">(in module quapy.data.base)</a>
</li>
<li><a href="quapy.method.html#quapy.method.base.isbinary">(in module quapy.method.base)</a>
</li>
</ul></li>
<li><a href="quapy.method.html#quapy.method.base.isprobabilistic">isprobabilistic() (in module quapy.method.base)</a>
<h2 id="J">J</h2>
<table style="width: 100%" class="indextable genindextable"><tr>
<td style="width: 33%; vertical-align: top;"><ul>
<li><a href="quapy.data.html#quapy.data.base.LabelledCollection.join">join() (quapy.data.base.LabelledCollection class method)</a>
</li>
</ul></td>
</tr></table>
@ -486,8 +498,6 @@
<table style="width: 100%" class="indextable genindextable"><tr>
<td style="width: 33%; vertical-align: top;"><ul>
<li><a href="quapy.data.html#quapy.data.base.LabelledCollection">LabelledCollection (class in quapy.data.base)</a>
</li>
<li><a href="quapy.method.html#quapy.method.aggregative.AggregativeQuantifier.learner">learner (quapy.method.aggregative.AggregativeQuantifier property)</a>
</li>
<li><a href="quapy.data.html#quapy.data.base.Dataset.load">load() (quapy.data.base.Dataset class method)</a>
@ -538,6 +548,8 @@
<li><a href="quapy.html#module-quapy">quapy</a>
</li>
<li><a href="quapy.classification.html#module-quapy.classification">quapy.classification</a>
</li>
<li><a href="quapy.classification.html#module-quapy.classification.calibration">quapy.classification.calibration</a>
</li>
<li><a href="quapy.classification.html#module-quapy.classification.methods">quapy.classification.methods</a>
</li>
@ -576,6 +588,8 @@
<li><a href="quapy.html#module-quapy.model_selection">quapy.model_selection</a>
</li>
<li><a href="quapy.html#module-quapy.plot">quapy.plot</a>
</li>
<li><a href="quapy.html#module-quapy.protocol">quapy.protocol</a>
</li>
<li><a href="quapy.html#module-quapy.util">quapy.util</a>
</li>
@ -600,27 +614,33 @@
<ul>
<li><a href="quapy.data.html#quapy.data.base.LabelledCollection.n_classes">(quapy.data.base.LabelledCollection property)</a>
</li>
<li><a href="quapy.method.html#quapy.method.base.BaseQuantifier.n_classes">(quapy.method.base.BaseQuantifier property)</a>
</li>
</ul></li>
<li><a href="quapy.html#quapy.evaluation.natural_prevalence_prediction">natural_prevalence_prediction() (in module quapy.evaluation)</a>
<li><a href="quapy.html#quapy.protocol.NaturalPrevalenceProtocol">NaturalPrevalenceProtocol (in module quapy.protocol)</a>
</li>
<li><a href="quapy.html#quapy.evaluation.natural_prevalence_protocol">natural_prevalence_protocol() (in module quapy.evaluation)</a>
<li><a href="quapy.classification.html#quapy.classification.calibration.NBVSCalibration">NBVSCalibration (class in quapy.classification.calibration)</a>
</li>
<li><a href="quapy.html#quapy.evaluation.natural_prevalence_report">natural_prevalence_report() (in module quapy.evaluation)</a>
<li><a href="quapy.classification.html#quapy.classification.neural.NeuralClassifierTrainer">NeuralClassifierTrainer (class in quapy.classification.neural)</a>
</li>
<li><a href="quapy.method.html#quapy.method.aggregative.newELM">newELM() (in module quapy.method.aggregative)</a>
</li>
<li><a href="quapy.method.html#quapy.method.base.newOneVsAll">newOneVsAll() (in module quapy.method.base)</a>
</li>
</ul></td>
<td style="width: 33%; vertical-align: top;"><ul>
<li><a href="quapy.data.html#quapy.data.base.LabelledCollection.natural_sampling_generator">natural_sampling_generator() (quapy.data.base.LabelledCollection method)</a>
<li><a href="quapy.method.html#quapy.method.aggregative.newSVMAE">newSVMAE() (in module quapy.method.aggregative)</a>
</li>
<li><a href="quapy.data.html#quapy.data.base.LabelledCollection.natural_sampling_index_generator">natural_sampling_index_generator() (quapy.data.base.LabelledCollection method)</a>
<li><a href="quapy.method.html#quapy.method.aggregative.newSVMKLD">newSVMKLD() (in module quapy.method.aggregative)</a>
</li>
<li><a href="quapy.classification.html#quapy.classification.neural.NeuralClassifierTrainer">NeuralClassifierTrainer (class in quapy.classification.neural)</a>
<li><a href="quapy.method.html#quapy.method.aggregative.newSVMQ">newSVMQ() (in module quapy.method.aggregative)</a>
</li>
<li><a href="quapy.method.html#quapy.method.aggregative.newSVMRAE">newSVMRAE() (in module quapy.method.aggregative)</a>
</li>
<li><a href="quapy.html#quapy.error.nkld">nkld() (in module quapy.error)</a>
</li>
<li><a href="quapy.html#quapy.functional.normalize_prevalence">normalize_prevalence() (in module quapy.functional)</a>
</li>
<li><a href="quapy.html#quapy.protocol.NPP">NPP (class in quapy.protocol)</a>
</li>
<li><a href="quapy.html#quapy.functional.num_prevalence_combinations">num_prevalence_combinations() (in module quapy.functional)</a>
</li>
@ -630,7 +650,17 @@
<h2 id="O">O</h2>
<table style="width: 100%" class="indextable genindextable"><tr>
<td style="width: 33%; vertical-align: top;"><ul>
<li><a href="quapy.method.html#quapy.method.aggregative.OneVsAll">OneVsAll (class in quapy.method.aggregative)</a>
<li><a href="quapy.html#quapy.protocol.OnLabelledCollectionProtocol.on_preclassified_instances">on_preclassified_instances() (quapy.protocol.OnLabelledCollectionProtocol method)</a>
</li>
<li><a href="quapy.method.html#quapy.method.base.OneVsAll">OneVsAll (class in quapy.method.base)</a>
</li>
</ul></td>
<td style="width: 33%; vertical-align: top;"><ul>
<li><a href="quapy.method.html#quapy.method.aggregative.OneVsAllAggregative">OneVsAllAggregative (class in quapy.method.aggregative)</a>
</li>
<li><a href="quapy.method.html#quapy.method.base.OneVsAllGeneric">OneVsAllGeneric (class in quapy.method.base)</a>
</li>
<li><a href="quapy.html#quapy.protocol.OnLabelledCollectionProtocol">OnLabelledCollectionProtocol (class in quapy.protocol)</a>
</li>
</ul></td>
</tr></table>
@ -638,6 +668,8 @@
<h2 id="P">P</h2>
<table style="width: 100%" class="indextable genindextable"><tr>
<td style="width: 33%; vertical-align: top;"><ul>
<li><a href="quapy.data.html#quapy.data.base.LabelledCollection.p">p (quapy.data.base.LabelledCollection property)</a>
</li>
<li><a href="quapy.method.html#quapy.method.aggregative.PACC">PACC (class in quapy.method.aggregative)</a>
</li>
<li><a href="quapy.html#quapy.util.parallel">parallel() (in module quapy.util)</a>
@ -646,52 +678,44 @@
</li>
<li><a href="quapy.html#quapy.util.pickled_resource">pickled_resource() (in module quapy.util)</a>
</li>
<li><a href="quapy.method.html#quapy.method.aggregative.AggregativeProbabilisticQuantifier.posterior_probabilities">posterior_probabilities() (quapy.method.aggregative.AggregativeProbabilisticQuantifier method)</a>
<li><a href="quapy.classification.html#quapy.classification.calibration.RecalibratedProbabilisticClassifierBase.predict">predict() (quapy.classification.calibration.RecalibratedProbabilisticClassifierBase method)</a>
<ul>
<li><a href="quapy.method.html#quapy.method.aggregative.OneVsAll.posterior_probabilities">(quapy.method.aggregative.OneVsAll method)</a>
<li><a href="quapy.classification.html#quapy.classification.methods.LowRankLogisticRegression.predict">(quapy.classification.methods.LowRankLogisticRegression method)</a>
</li>
</ul></li>
<li><a href="quapy.classification.html#quapy.classification.methods.LowRankLogisticRegression.predict">predict() (quapy.classification.methods.LowRankLogisticRegression method)</a>
<ul>
<li><a href="quapy.classification.html#quapy.classification.neural.NeuralClassifierTrainer.predict">(quapy.classification.neural.NeuralClassifierTrainer method)</a>
</li>
<li><a href="quapy.classification.html#quapy.classification.svmperf.SVMperf.predict">(quapy.classification.svmperf.SVMperf method)</a>
</li>
</ul></li>
<li><a href="quapy.classification.html#quapy.classification.methods.LowRankLogisticRegression.predict_proba">predict_proba() (quapy.classification.methods.LowRankLogisticRegression method)</a>
<li><a href="quapy.classification.html#quapy.classification.calibration.RecalibratedProbabilisticClassifierBase.predict_proba">predict_proba() (quapy.classification.calibration.RecalibratedProbabilisticClassifierBase method)</a>
<ul>
<li><a href="quapy.classification.html#quapy.classification.methods.LowRankLogisticRegression.predict_proba">(quapy.classification.methods.LowRankLogisticRegression method)</a>
</li>
<li><a href="quapy.classification.html#quapy.classification.neural.NeuralClassifierTrainer.predict_proba">(quapy.classification.neural.NeuralClassifierTrainer method)</a>
</li>
<li><a href="quapy.classification.html#quapy.classification.neural.TextClassifierNet.predict_proba">(quapy.classification.neural.TextClassifierNet method)</a>
</li>
<li><a href="quapy.method.html#quapy.method.aggregative.AggregativeProbabilisticQuantifier.predict_proba">(quapy.method.aggregative.AggregativeProbabilisticQuantifier method)</a>
</li>
<li><a href="quapy.method.html#quapy.method.aggregative.EMQ.predict_proba">(quapy.method.aggregative.EMQ method)</a>
</li>
</ul></li>
</ul></td>
<td style="width: 33%; vertical-align: top;"><ul>
<li><a href="quapy.html#quapy.evaluation.prediction">prediction() (in module quapy.evaluation)</a>
</li>
<li><a href="quapy.data.html#quapy.data.base.LabelledCollection.prevalence">prevalence() (quapy.data.base.LabelledCollection method)</a>
</li>
<li><a href="quapy.html#quapy.functional.prevalence_from_labels">prevalence_from_labels() (in module quapy.functional)</a>
</li>
<li><a href="quapy.html#quapy.functional.prevalence_from_probabilities">prevalence_from_probabilities() (in module quapy.functional)</a>
</li>
<li><a href="quapy.html#quapy.protocol.APP.prevalence_grid">prevalence_grid() (quapy.protocol.APP method)</a>
</li>
<li><a href="quapy.html#quapy.functional.prevalence_linspace">prevalence_linspace() (in module quapy.functional)</a>
</li>
<li><a href="quapy.method.html#quapy.method.aggregative.AggregativeProbabilisticQuantifier.probabilistic">probabilistic (quapy.method.aggregative.AggregativeProbabilisticQuantifier property)</a>
<ul>
<li><a href="quapy.method.html#quapy.method.aggregative.OneVsAll.probabilistic">(quapy.method.aggregative.OneVsAll property)</a>
<li><a href="quapy.method.html#quapy.method.meta.Ensemble.probabilistic">probabilistic (quapy.method.meta.Ensemble property)</a>
</li>
<li><a href="quapy.method.html#quapy.method.base.BaseQuantifier.probabilistic">(quapy.method.base.BaseQuantifier property)</a>
</li>
<li><a href="quapy.method.html#quapy.method.meta.Ensemble.probabilistic">(quapy.method.meta.Ensemble property)</a>
</li>
</ul></li>
<li><a href="quapy.method.html#quapy.method.aggregative.ProbabilisticAdjustedClassifyAndCount">ProbabilisticAdjustedClassifyAndCount (in module quapy.method.aggregative)</a>
</li>
<li><a href="quapy.method.html#quapy.method.aggregative.ProbabilisticClassifyAndCount">ProbabilisticClassifyAndCount (in module quapy.method.aggregative)</a>
@ -706,14 +730,12 @@
</li>
<li><a href="quapy.method.html#quapy.method.neural.QuaNetTrainer">QuaNetTrainer (class in quapy.method.neural)</a>
</li>
<li><a href="quapy.method.html#quapy.method.aggregative.AggregativeProbabilisticQuantifier.quantify">quantify() (quapy.method.aggregative.AggregativeProbabilisticQuantifier method)</a>
<li><a href="quapy.method.html#quapy.method.aggregative.AggregativeQuantifier.quantify">quantify() (quapy.method.aggregative.AggregativeQuantifier method)</a>
<ul>
<li><a href="quapy.method.html#quapy.method.aggregative.AggregativeQuantifier.quantify">(quapy.method.aggregative.AggregativeQuantifier method)</a>
</li>
<li><a href="quapy.method.html#quapy.method.aggregative.OneVsAll.quantify">(quapy.method.aggregative.OneVsAll method)</a>
</li>
<li><a href="quapy.method.html#quapy.method.base.BaseQuantifier.quantify">(quapy.method.base.BaseQuantifier method)</a>
</li>
<li><a href="quapy.method.html#quapy.method.base.OneVsAllGeneric.quantify">(quapy.method.base.OneVsAllGeneric method)</a>
</li>
<li><a href="quapy.method.html#quapy.method.meta.Ensemble.quantify">(quapy.method.meta.Ensemble method)</a>
</li>
@ -736,6 +758,13 @@
<ul>
<li><a href="quapy.classification.html#module-quapy.classification">module</a>
</li>
</ul></li>
<li>
quapy.classification.calibration
<ul>
<li><a href="quapy.classification.html#module-quapy.classification.calibration">module</a>
</li>
</ul></li>
<li>
@ -871,6 +900,13 @@
<ul>
<li><a href="quapy.html#module-quapy.plot">module</a>
</li>
</ul></li>
<li>
quapy.protocol
<ul>
<li><a href="quapy.html#module-quapy.protocol">module</a>
</li>
</ul></li>
<li>
@ -888,15 +924,25 @@
<td style="width: 33%; vertical-align: top;"><ul>
<li><a href="quapy.html#quapy.error.rae">rae() (in module quapy.error)</a>
</li>
<li><a href="quapy.data.html#quapy.data.preprocessing.reduce_columns">reduce_columns() (in module quapy.data.preprocessing)</a>
<li><a href="quapy.html#quapy.protocol.AbstractStochasticSeededProtocol.random_state">random_state (quapy.protocol.AbstractStochasticSeededProtocol property)</a>
</li>
<li><a href="quapy.classification.html#quapy.classification.calibration.RecalibratedProbabilisticClassifier">RecalibratedProbabilisticClassifier (class in quapy.classification.calibration)</a>
</li>
<li><a href="quapy.classification.html#quapy.classification.calibration.RecalibratedProbabilisticClassifierBase">RecalibratedProbabilisticClassifierBase (class in quapy.classification.calibration)</a>
</li>
<li><a href="quapy.data.html#quapy.data.base.Dataset.reduce">reduce() (quapy.data.base.Dataset method)</a>
</li>
</ul></td>
<td style="width: 33%; vertical-align: top;"><ul>
<li><a href="quapy.data.html#quapy.data.preprocessing.reduce_columns">reduce_columns() (in module quapy.data.preprocessing)</a>
</li>
<li><a href="quapy.data.html#quapy.data.reader.reindex_labels">reindex_labels() (in module quapy.data.reader)</a>
</li>
<li><a href="quapy.html#quapy.error.relative_absolute_error">relative_absolute_error() (in module quapy.error)</a>
</li>
<li><a href="quapy.classification.html#quapy.classification.neural.NeuralClassifierTrainer.reset_net_params">reset_net_params() (quapy.classification.neural.NeuralClassifierTrainer method)</a>
</li>
<li><a href="quapy.html#quapy.protocol.OnLabelledCollectionProtocol.RETURN_TYPES">RETURN_TYPES (quapy.protocol.OnLabelledCollectionProtocol attribute)</a>
</li>
</ul></td>
</tr></table>
@ -904,6 +950,30 @@
<h2 id="S">S</h2>
<table style="width: 100%" class="indextable genindextable"><tr>
<td style="width: 33%; vertical-align: top;"><ul>
<li><a href="quapy.html#quapy.protocol.AbstractStochasticSeededProtocol.sample">sample() (quapy.protocol.AbstractStochasticSeededProtocol method)</a>
<ul>
<li><a href="quapy.html#quapy.protocol.APP.sample">(quapy.protocol.APP method)</a>
</li>
<li><a href="quapy.html#quapy.protocol.DomainMixer.sample">(quapy.protocol.DomainMixer method)</a>
</li>
<li><a href="quapy.html#quapy.protocol.NPP.sample">(quapy.protocol.NPP method)</a>
</li>
<li><a href="quapy.html#quapy.protocol.UPP.sample">(quapy.protocol.UPP method)</a>
</li>
</ul></li>
<li><a href="quapy.html#quapy.protocol.AbstractStochasticSeededProtocol.samples_parameters">samples_parameters() (quapy.protocol.AbstractStochasticSeededProtocol method)</a>
<ul>
<li><a href="quapy.html#quapy.protocol.APP.samples_parameters">(quapy.protocol.APP method)</a>
</li>
<li><a href="quapy.html#quapy.protocol.DomainMixer.samples_parameters">(quapy.protocol.DomainMixer method)</a>
</li>
<li><a href="quapy.html#quapy.protocol.NPP.samples_parameters">(quapy.protocol.NPP method)</a>
</li>
<li><a href="quapy.html#quapy.protocol.UPP.samples_parameters">(quapy.protocol.UPP method)</a>
</li>
</ul></li>
<li><a href="quapy.data.html#quapy.data.base.LabelledCollection.sampling">sampling() (quapy.data.base.LabelledCollection method)</a>
</li>
<li><a href="quapy.data.html#quapy.data.base.LabelledCollection.sampling_from_index">sampling_from_index() (quapy.data.base.LabelledCollection method)</a>
@ -918,22 +988,10 @@
<ul>
<li><a href="quapy.classification.html#quapy.classification.neural.NeuralClassifierTrainer.set_params">(quapy.classification.neural.NeuralClassifierTrainer method)</a>
</li>
<li><a href="quapy.classification.html#quapy.classification.svmperf.SVMperf.set_params">(quapy.classification.svmperf.SVMperf method)</a>
</li>
<li><a href="quapy.method.html#quapy.method.aggregative.AggregativeProbabilisticQuantifier.set_params">(quapy.method.aggregative.AggregativeProbabilisticQuantifier method)</a>
</li>
<li><a href="quapy.method.html#quapy.method.aggregative.AggregativeQuantifier.set_params">(quapy.method.aggregative.AggregativeQuantifier method)</a>
</li>
<li><a href="quapy.method.html#quapy.method.aggregative.OneVsAll.set_params">(quapy.method.aggregative.OneVsAll method)</a>
</li>
<li><a href="quapy.method.html#quapy.method.base.BaseQuantifier.set_params">(quapy.method.base.BaseQuantifier method)</a>
</li>
<li><a href="quapy.method.html#quapy.method.meta.Ensemble.set_params">(quapy.method.meta.Ensemble method)</a>
</li>
<li><a href="quapy.method.html#quapy.method.neural.QuaNetTrainer.set_params">(quapy.method.neural.QuaNetTrainer method)</a>
</li>
<li><a href="quapy.method.html#quapy.method.non_aggregative.MaximumLikelihoodPrevalenceEstimation.set_params">(quapy.method.non_aggregative.MaximumLikelihoodPrevalenceEstimation method)</a>
</li>
<li><a href="quapy.html#quapy.model_selection.GridSearchQ.set_params">(quapy.model_selection.GridSearchQ method)</a>
</li>
@ -941,10 +999,14 @@
</ul></td>
<td style="width: 33%; vertical-align: top;"><ul>
<li><a href="quapy.method.html#quapy.method.aggregative.SLD">SLD (in module quapy.method.aggregative)</a>
</li>
<li><a href="quapy.method.html#quapy.method.aggregative.SMM">SMM (class in quapy.method.aggregative)</a>
</li>
<li><a href="quapy.html#quapy.error.smooth">smooth() (in module quapy.error)</a>
</li>
<li><a href="quapy.method.html#quapy.method.aggregative.ACC.solve_adjustment">solve_adjustment() (quapy.method.aggregative.ACC class method)</a>
</li>
<li><a href="quapy.data.html#quapy.data.base.LabelledCollection.split_random">split_random() (quapy.data.base.LabelledCollection method)</a>
</li>
<li><a href="quapy.data.html#quapy.data.base.LabelledCollection.split_stratified">split_stratified() (quapy.data.base.LabelledCollection method)</a>
</li>
@ -959,18 +1021,8 @@
</li>
</ul></li>
<li><a href="quapy.html#quapy.functional.strprev">strprev() (in module quapy.functional)</a>
</li>
<li><a href="quapy.method.html#quapy.method.aggregative.SVMAE">SVMAE (class in quapy.method.aggregative)</a>
</li>
<li><a href="quapy.method.html#quapy.method.aggregative.SVMKLD">SVMKLD (class in quapy.method.aggregative)</a>
</li>
<li><a href="quapy.method.html#quapy.method.aggregative.SVMNKLD">SVMNKLD (class in quapy.method.aggregative)</a>
</li>
<li><a href="quapy.classification.html#quapy.classification.svmperf.SVMperf">SVMperf (class in quapy.classification.svmperf)</a>
</li>
<li><a href="quapy.method.html#quapy.method.aggregative.SVMQ">SVMQ (class in quapy.method.aggregative)</a>
</li>
<li><a href="quapy.method.html#quapy.method.aggregative.SVMRAE">SVMRAE (class in quapy.method.aggregative)</a>
</li>
</ul></td>
</tr></table>
@ -986,12 +1038,40 @@
</li>
<li><a href="quapy.classification.html#quapy.classification.neural.TextClassifierNet">TextClassifierNet (class in quapy.classification.neural)</a>
</li>
</ul></td>
<td style="width: 33%; vertical-align: top;"><ul>
<li><a href="quapy.method.html#quapy.method.aggregative.ThresholdOptimization">ThresholdOptimization (class in quapy.method.aggregative)</a>
</li>
<li><a href="quapy.html#quapy.functional.TopsoeDistance">TopsoeDistance() (in module quapy.functional)</a>
</li>
<li><a href="quapy.classification.html#quapy.classification.neural.TorchDataset">TorchDataset (class in quapy.classification.neural)</a>
</li>
<li><a href="quapy.html#quapy.protocol.AbstractProtocol.total">total() (quapy.protocol.AbstractProtocol method)</a>
<ul>
<li><a href="quapy.html#quapy.protocol.APP.total">(quapy.protocol.APP method)</a>
</li>
<li><a href="quapy.html#quapy.protocol.DomainMixer.total">(quapy.protocol.DomainMixer method)</a>
</li>
<li><a href="quapy.html#quapy.protocol.IterateProtocol.total">(quapy.protocol.IterateProtocol method)</a>
</li>
<li><a href="quapy.html#quapy.protocol.NPP.total">(quapy.protocol.NPP method)</a>
</li>
<li><a href="quapy.html#quapy.protocol.UPP.total">(quapy.protocol.UPP method)</a>
</li>
</ul></li>
</ul></td>
<td style="width: 33%; vertical-align: top;"><ul>
<li><a href="quapy.data.html#quapy.data.base.Dataset.train_test">train_test (quapy.data.base.Dataset property)</a>
</li>
<li><a href="quapy.classification.html#quapy.classification.neural.CNNnet.training">training (quapy.classification.neural.CNNnet attribute)</a>
<ul>
<li><a href="quapy.classification.html#quapy.classification.neural.LSTMnet.training">(quapy.classification.neural.LSTMnet attribute)</a>
</li>
<li><a href="quapy.classification.html#quapy.classification.neural.TextClassifierNet.training">(quapy.classification.neural.TextClassifierNet attribute)</a>
</li>
<li><a href="quapy.method.html#quapy.method.neural.QuaNetModule.training">(quapy.method.neural.QuaNetModule attribute)</a>
</li>
</ul></li>
<li><a href="quapy.classification.html#quapy.classification.methods.LowRankLogisticRegression.transform">transform() (quapy.classification.methods.LowRankLogisticRegression method)</a>
<ul>
@ -1000,6 +1080,8 @@
<li><a href="quapy.data.html#quapy.data.preprocessing.IndexTransformer.transform">(quapy.data.preprocessing.IndexTransformer method)</a>
</li>
</ul></li>
<li><a href="quapy.classification.html#quapy.classification.calibration.TSCalibration">TSCalibration (class in quapy.classification.calibration)</a>
</li>
</ul></td>
</tr></table>
@ -1010,11 +1092,15 @@
</li>
<li><a href="quapy.data.html#quapy.data.base.LabelledCollection.uniform_sampling">uniform_sampling() (quapy.data.base.LabelledCollection method)</a>
</li>
</ul></td>
<td style="width: 33%; vertical-align: top;"><ul>
<li><a href="quapy.data.html#quapy.data.base.LabelledCollection.uniform_sampling_index">uniform_sampling_index() (quapy.data.base.LabelledCollection method)</a>
</li>
</ul></td>
<td style="width: 33%; vertical-align: top;"><ul>
<li><a href="quapy.html#quapy.functional.uniform_simplex_sampling">uniform_simplex_sampling() (in module quapy.functional)</a>
</li>
<li><a href="quapy.html#quapy.protocol.UniformPrevalenceProtocol">UniformPrevalenceProtocol (in module quapy.protocol)</a>
</li>
<li><a href="quapy.html#quapy.protocol.UPP">UPP (class in quapy.protocol)</a>
</li>
</ul></td>
</tr></table>
@ -1039,6 +1125,8 @@
</ul></td>
<td style="width: 33%; vertical-align: top;"><ul>
<li><a href="quapy.data.html#quapy.data.preprocessing.IndexTransformer.vocabulary_size">vocabulary_size() (quapy.data.preprocessing.IndexTransformer method)</a>
</li>
<li><a href="quapy.classification.html#quapy.classification.calibration.VSCalibration">VSCalibration (class in quapy.classification.calibration)</a>
</li>
</ul></td>
</tr></table>
@ -1055,16 +1143,30 @@
<table style="width: 100%" class="indextable genindextable"><tr>
<td style="width: 33%; vertical-align: top;"><ul>
<li><a href="quapy.method.html#quapy.method.aggregative.X">X (class in quapy.method.aggregative)</a>
<ul>
<li><a href="quapy.data.html#quapy.data.base.LabelledCollection.X">(quapy.data.base.LabelledCollection property)</a>
</li>
</ul></li>
</ul></td>
<td style="width: 33%; vertical-align: top;"><ul>
<li><a href="quapy.classification.html#quapy.classification.neural.TextClassifierNet.xavier_uniform">xavier_uniform() (quapy.classification.neural.TextClassifierNet method)</a>
</li>
<li><a href="quapy.data.html#quapy.data.base.LabelledCollection.Xp">Xp (quapy.data.base.LabelledCollection property)</a>
</li>
<li><a href="quapy.data.html#quapy.data.base.LabelledCollection.Xy">Xy (quapy.data.base.LabelledCollection property)</a>
</li>
</ul></td>
</tr></table>
<h2 id="Y">Y</h2>
<table style="width: 100%" class="indextable genindextable"><tr>
<td style="width: 33%; vertical-align: top;"><ul>
<li><a href="quapy.data.html#quapy.data.base.LabelledCollection.y">y (quapy.data.base.LabelledCollection property)</a>
</li>
</ul></td>
</tr></table>
<div class="clearer"></div>
@ -1082,7 +1184,7 @@
</form>
</div>
</div>
<script>$('#searchbox').show(0);</script>
<script>document.getElementById('searchbox').style.display = "block"</script>
</div>
</div>
<div class="clearer"></div>
@ -1096,13 +1198,13 @@
<li class="right" >
<a href="py-modindex.html" title="Python Module Index"
>modules</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.6 documentation</a> &#187;</li>
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.7 documentation</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">Index</a></li>
</ul>
</div>
<div class="footer" role="contentinfo">
&#169; Copyright 2021, Alejandro Moreo.
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 4.2.0.
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 5.3.0.
</div>
</body>
</html>

View File

@ -2,19 +2,21 @@
<!doctype html>
<html>
<html lang="en">
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="generator" content="Docutils 0.17.1: http://docutils.sourceforge.net/" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="generator" content="Docutils 0.19: https://docutils.sourceforge.io/" />
<title>Welcome to QuaPys documentation! &#8212; QuaPy 0.1.6 documentation</title>
<title>Welcome to QuaPys documentation! &#8212; QuaPy 0.1.7 documentation</title>
<link rel="stylesheet" type="text/css" href="_static/pygments.css" />
<link rel="stylesheet" type="text/css" href="_static/bizstyle.css" />
<script data-url_root="./" id="documentation_options" src="_static/documentation_options.js"></script>
<script src="_static/jquery.js"></script>
<script src="_static/underscore.js"></script>
<script src="_static/_sphinx_javascript_frameworks_compat.js"></script>
<script src="_static/doctools.js"></script>
<script src="_static/sphinx_highlight.js"></script>
<script src="_static/bizstyle.js"></script>
<link rel="index" title="Index" href="genindex.html" />
<link rel="search" title="Search" href="search.html" />
@ -36,7 +38,7 @@
<li class="right" >
<a href="Installation.html" title="Installation"
accesskey="N">next</a> |</li>
<li class="nav-item nav-item-0"><a href="#">QuaPy 0.1.6 documentation</a> &#187;</li>
<li class="nav-item nav-item-0"><a href="#">QuaPy 0.1.7 documentation</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">Welcome to QuaPys documentation!</a></li>
</ul>
</div>
@ -47,11 +49,11 @@
<div class="body" role="main">
<section id="welcome-to-quapy-s-documentation">
<h1>Welcome to QuaPys documentation!<a class="headerlink" href="#welcome-to-quapy-s-documentation" title="Permalink to this headline"></a></h1>
<h1>Welcome to QuaPys documentation!<a class="headerlink" href="#welcome-to-quapy-s-documentation" title="Permalink to this heading"></a></h1>
<p>QuaPy is an open source framework for Quantification (a.k.a. Supervised Prevalence Estimation)
written in Python.</p>
<section id="introduction">
<h2>Introduction<a class="headerlink" href="#introduction" title="Permalink to this headline"></a></h2>
<h2>Introduction<a class="headerlink" href="#introduction" title="Permalink to this heading"></a></h2>
<p>QuaPy roots on the concept of data sample, and provides implementations of most important concepts
in quantification literature, such as the most important quantification baselines, many advanced
quantification methods, quantification-oriented model selection, many evaluation measures and protocols
@ -59,7 +61,7 @@ used for evaluating quantification methods.
QuaPy also integrates commonly used datasets and offers visualization tools for facilitating the analysis and
interpretation of results.</p>
<section id="a-quick-example">
<h3>A quick example:<a class="headerlink" href="#a-quick-example" title="Permalink to this headline"></a></h3>
<h3>A quick example:<a class="headerlink" href="#a-quick-example" title="Permalink to this heading"></a></h3>
<p>The following script fetchs a Twitter dataset, trains and evaluates an
<cite>Adjusted Classify &amp; Count</cite> model in terms of the <cite>Mean Absolute Error</cite> (MAE)
between the class prevalences estimated for the test set and the true prevalences
@ -90,7 +92,7 @@ QuaPy implements sampling procedures and evaluation protocols that automates thi
See the <a class="reference internal" href="Evaluation.html"><span class="doc">Evaluation</span></a> for detailed examples.</p>
</section>
<section id="features">
<h3>Features<a class="headerlink" href="#features" title="Permalink to this headline"></a></h3>
<h3>Features<a class="headerlink" href="#features" title="Permalink to this heading"></a></h3>
<ul class="simple">
<li><p>Implementation of most popular quantification methods (Classify-&amp;-Count variants, Expectation-Maximization, SVM-based variants for quantification, HDy, QuaNet, and Ensembles).</p></li>
<li><p>Versatile functionality for performing evaluation based on artificial sampling protocols.</p></li>
@ -100,6 +102,7 @@ See the <a class="reference internal" href="Evaluation.html"><span class="doc">E
<li><p>32 UCI Machine Learning datasets.</p></li>
<li><p>11 Twitter Sentiment datasets.</p></li>
<li><p>3 Reviews Sentiment datasets.</p></li>
<li><p>4 tasks from LeQua competition (_new in v0.1.7!_)</p></li>
</ul>
</dd>
</dl>
@ -120,6 +123,7 @@ See the <a class="reference internal" href="Evaluation.html"><span class="doc">E
<li class="toctree-l2"><a class="reference internal" href="Datasets.html#reviews-datasets">Reviews Datasets</a></li>
<li class="toctree-l2"><a class="reference internal" href="Datasets.html#twitter-sentiment-datasets">Twitter Sentiment Datasets</a></li>
<li class="toctree-l2"><a class="reference internal" href="Datasets.html#uci-machine-learning">UCI Machine Learning</a></li>
<li class="toctree-l2"><a class="reference internal" href="Datasets.html#lequa-datasets">LeQua Datasets</a></li>
<li class="toctree-l2"><a class="reference internal" href="Datasets.html#adding-custom-datasets">Adding Custom Datasets</a></li>
</ul>
</li>
@ -128,11 +132,23 @@ See the <a class="reference internal" href="Evaluation.html"><span class="doc">E
<li class="toctree-l2"><a class="reference internal" href="Evaluation.html#evaluation-protocols">Evaluation Protocols</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="Protocols.html">Protocols</a><ul>
<li class="toctree-l2"><a class="reference internal" href="Protocols.html#artificial-prevalence-protocol">Artificial-Prevalence Protocol</a></li>
<li class="toctree-l2"><a class="reference internal" href="Protocols.html#sampling-from-the-unit-simplex-the-uniform-prevalence-protocol-upp">Sampling from the unit-simplex, the Uniform-Prevalence Protocol (UPP)</a></li>
<li class="toctree-l2"><a class="reference internal" href="Protocols.html#natural-prevalence-protocol">Natural-Prevalence Protocol</a></li>
<li class="toctree-l2"><a class="reference internal" href="Protocols.html#other-protocols">Other protocols</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="Methods.html">Quantification Methods</a><ul>
<li class="toctree-l2"><a class="reference internal" href="Methods.html#aggregative-methods">Aggregative Methods</a></li>
<li class="toctree-l2"><a class="reference internal" href="Methods.html#meta-models">Meta Models</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="Model-Selection.html">Model Selection</a><ul>
<li class="toctree-l2"><a class="reference internal" href="Model-Selection.html#targeting-a-quantification-oriented-loss">Targeting a Quantification-oriented loss</a></li>
<li class="toctree-l2"><a class="reference internal" href="Model-Selection.html#targeting-a-classification-oriented-loss">Targeting a Classification-oriented loss</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="Plotting.html">Plotting</a><ul>
<li class="toctree-l2"><a class="reference internal" href="Plotting.html#diagonal-plot">Diagonal Plot</a></li>
<li class="toctree-l2"><a class="reference internal" href="Plotting.html#quantification-bias">Quantification bias</a></li>
@ -149,7 +165,7 @@ See the <a class="reference internal" href="Evaluation.html"><span class="doc">E
</section>
</section>
<section id="indices-and-tables">
<h1>Indices and tables<a class="headerlink" href="#indices-and-tables" title="Permalink to this headline"></a></h1>
<h1>Indices and tables<a class="headerlink" href="#indices-and-tables" title="Permalink to this heading"></a></h1>
<ul class="simple">
<li><p><a class="reference internal" href="genindex.html"><span class="std std-ref">Index</span></a></p></li>
<li><p><a class="reference internal" href="py-modindex.html"><span class="std std-ref">Module Index</span></a></p></li>
@ -164,8 +180,9 @@ See the <a class="reference internal" href="Evaluation.html"><span class="doc">E
</div>
<div class="sphinxsidebar" role="navigation" aria-label="main navigation">
<div class="sphinxsidebarwrapper">
<h3><a href="#">Table of Contents</a></h3>
<ul>
<div>
<h3><a href="#">Table of Contents</a></h3>
<ul>
<li><a class="reference internal" href="#">Welcome to QuaPys documentation!</a><ul>
<li><a class="reference internal" href="#introduction">Introduction</a><ul>
<li><a class="reference internal" href="#a-quick-example">A quick example:</a></li>
@ -177,9 +194,12 @@ See the <a class="reference internal" href="Evaluation.html"><span class="doc">E
<li><a class="reference internal" href="#indices-and-tables">Indices and tables</a></li>
</ul>
<h4>Next topic</h4>
<p class="topless"><a href="Installation.html"
title="next chapter">Installation</a></p>
</div>
<div>
<h4>Next topic</h4>
<p class="topless"><a href="Installation.html"
title="next chapter">Installation</a></p>
</div>
<div role="note" aria-label="source link">
<h3>This Page</h3>
<ul class="this-page-menu">
@ -196,7 +216,7 @@ See the <a class="reference internal" href="Evaluation.html"><span class="doc">E
</form>
</div>
</div>
<script>$('#searchbox').show(0);</script>
<script>document.getElementById('searchbox').style.display = "block"</script>
</div>
</div>
<div class="clearer"></div>
@ -213,13 +233,13 @@ See the <a class="reference internal" href="Evaluation.html"><span class="doc">E
<li class="right" >
<a href="Installation.html" title="Installation"
>next</a> |</li>
<li class="nav-item nav-item-0"><a href="#">QuaPy 0.1.6 documentation</a> &#187;</li>
<li class="nav-item nav-item-0"><a href="#">QuaPy 0.1.7 documentation</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">Welcome to QuaPys documentation!</a></li>
</ul>
</div>
<div class="footer" role="contentinfo">
&#169; Copyright 2021, Alejandro Moreo.
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 4.2.0.
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 5.3.0.
</div>
</body>
</html>

View File

@ -2,19 +2,21 @@
<!doctype html>
<html>
<html lang="en">
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="generator" content="Docutils 0.17.1: http://docutils.sourceforge.net/" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="generator" content="Docutils 0.19: https://docutils.sourceforge.io/" />
<title>quapy &#8212; QuaPy 0.1.6 documentation</title>
<title>quapy &#8212; QuaPy 0.1.7 documentation</title>
<link rel="stylesheet" type="text/css" href="_static/pygments.css" />
<link rel="stylesheet" type="text/css" href="_static/bizstyle.css" />
<script data-url_root="./" id="documentation_options" src="_static/documentation_options.js"></script>
<script src="_static/jquery.js"></script>
<script src="_static/underscore.js"></script>
<script src="_static/_sphinx_javascript_frameworks_compat.js"></script>
<script src="_static/doctools.js"></script>
<script src="_static/sphinx_highlight.js"></script>
<script src="_static/bizstyle.js"></script>
<link rel="index" title="Index" href="genindex.html" />
<link rel="search" title="Search" href="search.html" />
@ -40,7 +42,7 @@
<li class="right" >
<a href="Plotting.html" title="Plotting"
accesskey="P">previous</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.6 documentation</a> &#187;</li>
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.7 documentation</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">quapy</a></li>
</ul>
</div>
@ -51,47 +53,19 @@
<div class="body" role="main">
<section id="quapy">
<h1>quapy<a class="headerlink" href="#quapy" title="Permalink to this headline"></a></h1>
<h1>quapy<a class="headerlink" href="#quapy" title="Permalink to this heading"></a></h1>
<div class="toctree-wrapper compound">
<ul>
<li class="toctree-l1"><a class="reference internal" href="quapy.html">quapy package</a><ul>
<li class="toctree-l2"><a class="reference internal" href="quapy.html#subpackages">Subpackages</a><ul>
<li class="toctree-l3"><a class="reference internal" href="quapy.classification.html">quapy.classification package</a><ul>
<li class="toctree-l4"><a class="reference internal" href="quapy.classification.html#submodules">Submodules</a></li>
<li class="toctree-l4"><a class="reference internal" href="quapy.classification.html#module-quapy.classification.methods">quapy.classification.methods module</a></li>
<li class="toctree-l4"><a class="reference internal" href="quapy.classification.html#module-quapy.classification.neural">quapy.classification.neural module</a></li>
<li class="toctree-l4"><a class="reference internal" href="quapy.classification.html#module-quapy.classification.svmperf">quapy.classification.svmperf module</a></li>
<li class="toctree-l4"><a class="reference internal" href="quapy.classification.html#module-quapy.classification">Module contents</a></li>
</ul>
</li>
<li class="toctree-l3"><a class="reference internal" href="quapy.data.html">quapy.data package</a><ul>
<li class="toctree-l4"><a class="reference internal" href="quapy.data.html#submodules">Submodules</a></li>
<li class="toctree-l4"><a class="reference internal" href="quapy.data.html#module-quapy.data.base">quapy.data.base module</a></li>
<li class="toctree-l4"><a class="reference internal" href="quapy.data.html#module-quapy.data.datasets">quapy.data.datasets module</a></li>
<li class="toctree-l4"><a class="reference internal" href="quapy.data.html#module-quapy.data.preprocessing">quapy.data.preprocessing module</a></li>
<li class="toctree-l4"><a class="reference internal" href="quapy.data.html#module-quapy.data.reader">quapy.data.reader module</a></li>
<li class="toctree-l4"><a class="reference internal" href="quapy.data.html#module-quapy.data">Module contents</a></li>
</ul>
</li>
<li class="toctree-l3"><a class="reference internal" href="quapy.method.html">quapy.method package</a><ul>
<li class="toctree-l4"><a class="reference internal" href="quapy.method.html#submodules">Submodules</a></li>
<li class="toctree-l4"><a class="reference internal" href="quapy.method.html#module-quapy.method.aggregative">quapy.method.aggregative module</a></li>
<li class="toctree-l4"><a class="reference internal" href="quapy.method.html#module-quapy.method.base">quapy.method.base module</a></li>
<li class="toctree-l4"><a class="reference internal" href="quapy.method.html#module-quapy.method.meta">quapy.method.meta module</a></li>
<li class="toctree-l4"><a class="reference internal" href="quapy.method.html#module-quapy.method.neural">quapy.method.neural module</a></li>
<li class="toctree-l4"><a class="reference internal" href="quapy.method.html#module-quapy.method.non_aggregative">quapy.method.non_aggregative module</a></li>
<li class="toctree-l4"><a class="reference internal" href="quapy.method.html#module-quapy.method">Module contents</a></li>
</ul>
</li>
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="quapy.html#submodules">Submodules</a></li>
<li class="toctree-l2"><a class="reference internal" href="quapy.html#module-quapy.error">quapy.error module</a></li>
<li class="toctree-l2"><a class="reference internal" href="quapy.html#module-quapy.evaluation">quapy.evaluation module</a></li>
<li class="toctree-l2"><a class="reference internal" href="quapy.html#module-quapy.functional">quapy.functional module</a></li>
<li class="toctree-l2"><a class="reference internal" href="quapy.html#module-quapy.model_selection">quapy.model_selection module</a></li>
<li class="toctree-l2"><a class="reference internal" href="quapy.html#module-quapy.plot">quapy.plot module</a></li>
<li class="toctree-l2"><a class="reference internal" href="quapy.html#module-quapy.util">quapy.util module</a></li>
<li class="toctree-l2"><a class="reference internal" href="quapy.html#module-quapy.error">quapy.error</a></li>
<li class="toctree-l2"><a class="reference internal" href="quapy.html#module-quapy.evaluation">quapy.evaluation</a></li>
<li class="toctree-l2"><a class="reference internal" href="quapy.html#quapy-protocol">quapy.protocol</a></li>
<li class="toctree-l2"><a class="reference internal" href="quapy.html#module-quapy.functional">quapy.functional</a></li>
<li class="toctree-l2"><a class="reference internal" href="quapy.html#module-quapy.model_selection">quapy.model_selection</a></li>
<li class="toctree-l2"><a class="reference internal" href="quapy.html#module-quapy.plot">quapy.plot</a></li>
<li class="toctree-l2"><a class="reference internal" href="quapy.html#module-quapy.util">quapy.util</a></li>
<li class="toctree-l2"><a class="reference internal" href="quapy.html#subpackages">Subpackages</a></li>
<li class="toctree-l2"><a class="reference internal" href="quapy.html#module-quapy">Module contents</a></li>
</ul>
</li>
@ -106,12 +80,16 @@
</div>
<div class="sphinxsidebar" role="navigation" aria-label="main navigation">
<div class="sphinxsidebarwrapper">
<h4>Previous topic</h4>
<p class="topless"><a href="Plotting.html"
title="previous chapter">Plotting</a></p>
<h4>Next topic</h4>
<p class="topless"><a href="quapy.html"
title="next chapter">quapy package</a></p>
<div>
<h4>Previous topic</h4>
<p class="topless"><a href="Plotting.html"
title="previous chapter">Plotting</a></p>
</div>
<div>
<h4>Next topic</h4>
<p class="topless"><a href="quapy.html"
title="next chapter">quapy package</a></p>
</div>
<div role="note" aria-label="source link">
<h3>This Page</h3>
<ul class="this-page-menu">
@ -128,7 +106,7 @@
</form>
</div>
</div>
<script>$('#searchbox').show(0);</script>
<script>document.getElementById('searchbox').style.display = "block"</script>
</div>
</div>
<div class="clearer"></div>
@ -148,13 +126,13 @@
<li class="right" >
<a href="Plotting.html" title="Plotting"
>previous</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.6 documentation</a> &#187;</li>
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.7 documentation</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">quapy</a></li>
</ul>
</div>
<div class="footer" role="contentinfo">
&#169; Copyright 2021, Alejandro Moreo.
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 4.2.0.
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 5.3.0.
</div>
</body>
</html>

Binary file not shown.

View File

@ -2,18 +2,20 @@
<!doctype html>
<html>
<html lang="en">
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Python Module Index &#8212; QuaPy 0.1.6 documentation</title>
<title>Python Module Index &#8212; QuaPy 0.1.7 documentation</title>
<link rel="stylesheet" type="text/css" href="_static/pygments.css" />
<link rel="stylesheet" type="text/css" href="_static/bizstyle.css" />
<script data-url_root="./" id="documentation_options" src="_static/documentation_options.js"></script>
<script src="_static/jquery.js"></script>
<script src="_static/underscore.js"></script>
<script src="_static/_sphinx_javascript_frameworks_compat.js"></script>
<script src="_static/doctools.js"></script>
<script src="_static/sphinx_highlight.js"></script>
<script src="_static/bizstyle.js"></script>
<link rel="index" title="Index" href="genindex.html" />
<link rel="search" title="Search" href="search.html" />
@ -34,7 +36,7 @@
<li class="right" >
<a href="#" title="Python Module Index"
>modules</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.6 documentation</a> &#187;</li>
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.7 documentation</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">Python Module Index</a></li>
</ul>
</div>
@ -66,6 +68,11 @@
<td>&#160;&#160;&#160;
<a href="quapy.classification.html#module-quapy.classification"><code class="xref">quapy.classification</code></a></td><td>
<em></em></td></tr>
<tr class="cg-1">
<td></td>
<td>&#160;&#160;&#160;
<a href="quapy.classification.html#module-quapy.classification.calibration"><code class="xref">quapy.classification.calibration</code></a></td><td>
<em></em></td></tr>
<tr class="cg-1">
<td></td>
<td>&#160;&#160;&#160;
@ -161,6 +168,11 @@
<td>&#160;&#160;&#160;
<a href="quapy.html#module-quapy.plot"><code class="xref">quapy.plot</code></a></td><td>
<em></em></td></tr>
<tr class="cg-1">
<td></td>
<td>&#160;&#160;&#160;
<a href="quapy.html#module-quapy.protocol"><code class="xref">quapy.protocol</code></a></td><td>
<em></em></td></tr>
<tr class="cg-1">
<td></td>
<td>&#160;&#160;&#160;
@ -184,7 +196,7 @@
</form>
</div>
</div>
<script>$('#searchbox').show(0);</script>
<script>document.getElementById('searchbox').style.display = "block"</script>
</div>
</div>
<div class="clearer"></div>
@ -198,13 +210,13 @@
<li class="right" >
<a href="#" title="Python Module Index"
>modules</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.6 documentation</a> &#187;</li>
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.7 documentation</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">Python Module Index</a></li>
</ul>
</div>
<div class="footer" role="contentinfo">
&#169; Copyright 2021, Alejandro Moreo.
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 4.2.0.
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 5.3.0.
</div>
</body>
</html>

View File

@ -2,19 +2,21 @@
<!doctype html>
<html>
<html lang="en">
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="generator" content="Docutils 0.17.1: http://docutils.sourceforge.net/" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="generator" content="Docutils 0.19: https://docutils.sourceforge.io/" />
<title>quapy.classification package &#8212; QuaPy 0.1.6 documentation</title>
<title>quapy.classification package &#8212; QuaPy 0.1.7 documentation</title>
<link rel="stylesheet" type="text/css" href="_static/pygments.css" />
<link rel="stylesheet" type="text/css" href="_static/bizstyle.css" />
<script data-url_root="./" id="documentation_options" src="_static/documentation_options.js"></script>
<script src="_static/jquery.js"></script>
<script src="_static/underscore.js"></script>
<script src="_static/_sphinx_javascript_frameworks_compat.js"></script>
<script src="_static/doctools.js"></script>
<script src="_static/sphinx_highlight.js"></script>
<script src="_static/bizstyle.js"></script>
<link rel="index" title="Index" href="genindex.html" />
<link rel="search" title="Search" href="search.html" />
@ -40,7 +42,7 @@
<li class="right" >
<a href="quapy.html" title="quapy package"
accesskey="P">previous</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.6 documentation</a> &#187;</li>
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.7 documentation</a> &#187;</li>
<li class="nav-item nav-item-1"><a href="modules.html" >quapy</a> &#187;</li>
<li class="nav-item nav-item-2"><a href="quapy.html" accesskey="U">quapy package</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">quapy.classification package</a></li>
@ -53,16 +55,232 @@
<div class="body" role="main">
<section id="quapy-classification-package">
<h1>quapy.classification package<a class="headerlink" href="#quapy-classification-package" title="Permalink to this headline"></a></h1>
<h1>quapy.classification package<a class="headerlink" href="#quapy-classification-package" title="Permalink to this heading"></a></h1>
<section id="submodules">
<h2>Submodules<a class="headerlink" href="#submodules" title="Permalink to this headline"></a></h2>
<h2>Submodules<a class="headerlink" href="#submodules" title="Permalink to this heading"></a></h2>
</section>
<section id="quapy-classification-calibration">
<h2>quapy.classification.calibration<a class="headerlink" href="#quapy-classification-calibration" title="Permalink to this heading"></a></h2>
<div class="versionadded">
<p><span class="versionmodified added">New in version 0.1.7.</span></p>
</div>
<span class="target" id="module-quapy.classification.calibration"></span><dl class="py class">
<dt class="sig sig-object py" id="quapy.classification.calibration.BCTSCalibration">
<em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">quapy.classification.calibration.</span></span><span class="sig-name descname"><span class="pre">BCTSCalibration</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">classifier</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">val_split</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">5</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">n_jobs</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">verbose</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">False</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#quapy.classification.calibration.BCTSCalibration" title="Permalink to this definition"></a></dt>
<dd><p>Bases: <a class="reference internal" href="#quapy.classification.calibration.RecalibratedProbabilisticClassifierBase" title="quapy.classification.calibration.RecalibratedProbabilisticClassifierBase"><code class="xref py py-class docutils literal notranslate"><span class="pre">RecalibratedProbabilisticClassifierBase</span></code></a></p>
<p>Applies the Bias-Corrected Temperature Scaling (BCTS) calibration method from <cite>abstention.calibration</cite>, as defined in
<a class="reference external" href="http://proceedings.mlr.press/v119/alexandari20a.html">Alexandari et al. paper</a>:</p>
<dl class="field-list simple">
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
<dd class="field-odd"><ul class="simple">
<li><p><strong>classifier</strong> a scikit-learn probabilistic classifier</p></li>
<li><p><strong>val_split</strong> indicate an integer k for performing kFCV to obtain the posterior prevalences, or a float p
in (0,1) to indicate that the posteriors are obtained in a stratified validation split containing p% of the
training instances (the rest is used for training). In any case, the classifier is retrained in the whole
training set afterwards. Default value is 5.</p></li>
<li><p><strong>n_jobs</strong> indicate the number of parallel workers (only when val_split is an integer)</p></li>
<li><p><strong>verbose</strong> whether or not to display information in the standard output</p></li>
</ul>
</dd>
</dl>
</dd></dl>
<dl class="py class">
<dt class="sig sig-object py" id="quapy.classification.calibration.NBVSCalibration">
<em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">quapy.classification.calibration.</span></span><span class="sig-name descname"><span class="pre">NBVSCalibration</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">classifier</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">val_split</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">5</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">n_jobs</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">verbose</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">False</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#quapy.classification.calibration.NBVSCalibration" title="Permalink to this definition"></a></dt>
<dd><p>Bases: <a class="reference internal" href="#quapy.classification.calibration.RecalibratedProbabilisticClassifierBase" title="quapy.classification.calibration.RecalibratedProbabilisticClassifierBase"><code class="xref py py-class docutils literal notranslate"><span class="pre">RecalibratedProbabilisticClassifierBase</span></code></a></p>
<p>Applies the No-Bias Vector Scaling (NBVS) calibration method from <cite>abstention.calibration</cite>, as defined in
<a class="reference external" href="http://proceedings.mlr.press/v119/alexandari20a.html">Alexandari et al. paper</a>:</p>
<dl class="field-list simple">
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
<dd class="field-odd"><ul class="simple">
<li><p><strong>classifier</strong> a scikit-learn probabilistic classifier</p></li>
<li><p><strong>val_split</strong> indicate an integer k for performing kFCV to obtain the posterior prevalences, or a float p
in (0,1) to indicate that the posteriors are obtained in a stratified validation split containing p% of the
training instances (the rest is used for training). In any case, the classifier is retrained in the whole
training set afterwards. Default value is 5.</p></li>
<li><p><strong>n_jobs</strong> indicate the number of parallel workers (only when val_split is an integer)</p></li>
<li><p><strong>verbose</strong> whether or not to display information in the standard output</p></li>
</ul>
</dd>
</dl>
</dd></dl>
<dl class="py class">
<dt class="sig sig-object py" id="quapy.classification.calibration.RecalibratedProbabilisticClassifier">
<em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">quapy.classification.calibration.</span></span><span class="sig-name descname"><span class="pre">RecalibratedProbabilisticClassifier</span></span><a class="headerlink" href="#quapy.classification.calibration.RecalibratedProbabilisticClassifier" title="Permalink to this definition"></a></dt>
<dd><p>Bases: <code class="xref py py-class docutils literal notranslate"><span class="pre">object</span></code></p>
<p>Abstract class for (re)calibration method from <cite>abstention.calibration</cite>, as defined in
<a class="reference external" href="http://proceedings.mlr.press/v119/alexandari20a.html">Alexandari, A., Kundaje, A., &amp; Shrikumar, A. (2020, November). Maximum likelihood with bias-corrected calibration
is hard-to-beat at label shift adaptation. In International Conference on Machine Learning (pp. 222-232). PMLR.</a>:</p>
</dd></dl>
<dl class="py class">
<dt class="sig sig-object py" id="quapy.classification.calibration.RecalibratedProbabilisticClassifierBase">
<em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">quapy.classification.calibration.</span></span><span class="sig-name descname"><span class="pre">RecalibratedProbabilisticClassifierBase</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">classifier</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">calibrator</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">val_split</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">5</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">n_jobs</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">verbose</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">False</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#quapy.classification.calibration.RecalibratedProbabilisticClassifierBase" title="Permalink to this definition"></a></dt>
<dd><p>Bases: <code class="xref py py-class docutils literal notranslate"><span class="pre">BaseEstimator</span></code>, <a class="reference internal" href="#quapy.classification.calibration.RecalibratedProbabilisticClassifier" title="quapy.classification.calibration.RecalibratedProbabilisticClassifier"><code class="xref py py-class docutils literal notranslate"><span class="pre">RecalibratedProbabilisticClassifier</span></code></a></p>
<p>Applies a (re)calibration method from <cite>abstention.calibration</cite>, as defined in
<a class="reference external" href="http://proceedings.mlr.press/v119/alexandari20a.html">Alexandari et al. paper</a>:</p>
<dl class="field-list simple">
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
<dd class="field-odd"><ul class="simple">
<li><p><strong>classifier</strong> a scikit-learn probabilistic classifier</p></li>
<li><p><strong>calibrator</strong> the calibration object (an instance of abstention.calibration.CalibratorFactory)</p></li>
<li><p><strong>val_split</strong> indicate an integer k for performing kFCV to obtain the posterior probabilities, or a float p
in (0,1) to indicate that the posteriors are obtained in a stratified validation split containing p% of the
training instances (the rest is used for training). In any case, the classifier is retrained in the whole
training set afterwards. Default value is 5.</p></li>
<li><p><strong>n_jobs</strong> indicate the number of parallel workers (only when val_split is an integer); default=None</p></li>
<li><p><strong>verbose</strong> whether or not to display information in the standard output</p></li>
</ul>
</dd>
</dl>
<dl class="py property">
<dt class="sig sig-object py" id="quapy.classification.calibration.RecalibratedProbabilisticClassifierBase.classes_">
<em class="property"><span class="pre">property</span><span class="w"> </span></em><span class="sig-name descname"><span class="pre">classes_</span></span><a class="headerlink" href="#quapy.classification.calibration.RecalibratedProbabilisticClassifierBase.classes_" title="Permalink to this definition"></a></dt>
<dd><p>Returns the classes on which the classifier has been trained on</p>
<dl class="field-list simple">
<dt class="field-odd">Returns<span class="colon">:</span></dt>
<dd class="field-odd"><p>array-like of shape <cite>(n_classes)</cite></p>
</dd>
</dl>
</dd></dl>
<dl class="py method">
<dt class="sig sig-object py" id="quapy.classification.calibration.RecalibratedProbabilisticClassifierBase.fit">
<span class="sig-name descname"><span class="pre">fit</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">X</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">y</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#quapy.classification.calibration.RecalibratedProbabilisticClassifierBase.fit" title="Permalink to this definition"></a></dt>
<dd><p>Fits the calibration for the probabilistic classifier.</p>
<dl class="field-list simple">
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
<dd class="field-odd"><ul class="simple">
<li><p><strong>X</strong> array-like of shape <cite>(n_samples, n_features)</cite> with the data instances</p></li>
<li><p><strong>y</strong> array-like of shape <cite>(n_samples,)</cite> with the class labels</p></li>
</ul>
</dd>
<dt class="field-even">Returns<span class="colon">:</span></dt>
<dd class="field-even"><p>self</p>
</dd>
</dl>
</dd></dl>
<dl class="py method">
<dt class="sig sig-object py" id="quapy.classification.calibration.RecalibratedProbabilisticClassifierBase.fit_cv">
<span class="sig-name descname"><span class="pre">fit_cv</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">X</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">y</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#quapy.classification.calibration.RecalibratedProbabilisticClassifierBase.fit_cv" title="Permalink to this definition"></a></dt>
<dd><p>Fits the calibration in a cross-validation manner, i.e., it generates posterior probabilities for all
training instances via cross-validation, and then retrains the classifier on all training instances.
The posterior probabilities thus generated are used for calibrating the outputs of the classifier.</p>
<dl class="field-list simple">
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
<dd class="field-odd"><ul class="simple">
<li><p><strong>X</strong> array-like of shape <cite>(n_samples, n_features)</cite> with the data instances</p></li>
<li><p><strong>y</strong> array-like of shape <cite>(n_samples,)</cite> with the class labels</p></li>
</ul>
</dd>
<dt class="field-even">Returns<span class="colon">:</span></dt>
<dd class="field-even"><p>self</p>
</dd>
</dl>
</dd></dl>
<dl class="py method">
<dt class="sig sig-object py" id="quapy.classification.calibration.RecalibratedProbabilisticClassifierBase.fit_tr_val">
<span class="sig-name descname"><span class="pre">fit_tr_val</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">X</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">y</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#quapy.classification.calibration.RecalibratedProbabilisticClassifierBase.fit_tr_val" title="Permalink to this definition"></a></dt>
<dd><p>Fits the calibration in a train/val-split manner, i.e.t, it partitions the training instances into a
training and a validation set, and then uses the training samples to learn classifier which is then used
to generate posterior probabilities for the held-out validation data. These posteriors are used to calibrate
the classifier. The classifier is not retrained on the whole dataset.</p>
<dl class="field-list simple">
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
<dd class="field-odd"><ul class="simple">
<li><p><strong>X</strong> array-like of shape <cite>(n_samples, n_features)</cite> with the data instances</p></li>
<li><p><strong>y</strong> array-like of shape <cite>(n_samples,)</cite> with the class labels</p></li>
</ul>
</dd>
<dt class="field-even">Returns<span class="colon">:</span></dt>
<dd class="field-even"><p>self</p>
</dd>
</dl>
</dd></dl>
<dl class="py method">
<dt class="sig sig-object py" id="quapy.classification.calibration.RecalibratedProbabilisticClassifierBase.predict">
<span class="sig-name descname"><span class="pre">predict</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">X</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#quapy.classification.calibration.RecalibratedProbabilisticClassifierBase.predict" title="Permalink to this definition"></a></dt>
<dd><p>Predicts class labels for the data instances in <cite>X</cite></p>
<dl class="field-list simple">
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
<dd class="field-odd"><p><strong>X</strong> array-like of shape <cite>(n_samples, n_features)</cite> with the data instances</p>
</dd>
<dt class="field-even">Returns<span class="colon">:</span></dt>
<dd class="field-even"><p>array-like of shape <cite>(n_samples,)</cite> with the class label predictions</p>
</dd>
</dl>
</dd></dl>
<dl class="py method">
<dt class="sig sig-object py" id="quapy.classification.calibration.RecalibratedProbabilisticClassifierBase.predict_proba">
<span class="sig-name descname"><span class="pre">predict_proba</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">X</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#quapy.classification.calibration.RecalibratedProbabilisticClassifierBase.predict_proba" title="Permalink to this definition"></a></dt>
<dd><p>Generates posterior probabilities for the data instances in <cite>X</cite></p>
<dl class="field-list simple">
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
<dd class="field-odd"><p><strong>X</strong> array-like of shape <cite>(n_samples, n_features)</cite> with the data instances</p>
</dd>
<dt class="field-even">Returns<span class="colon">:</span></dt>
<dd class="field-even"><p>array-like of shape <cite>(n_samples, n_classes)</cite> with posterior probabilities</p>
</dd>
</dl>
</dd></dl>
</dd></dl>
<dl class="py class">
<dt class="sig sig-object py" id="quapy.classification.calibration.TSCalibration">
<em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">quapy.classification.calibration.</span></span><span class="sig-name descname"><span class="pre">TSCalibration</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">classifier</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">val_split</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">5</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">n_jobs</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">verbose</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">False</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#quapy.classification.calibration.TSCalibration" title="Permalink to this definition"></a></dt>
<dd><p>Bases: <a class="reference internal" href="#quapy.classification.calibration.RecalibratedProbabilisticClassifierBase" title="quapy.classification.calibration.RecalibratedProbabilisticClassifierBase"><code class="xref py py-class docutils literal notranslate"><span class="pre">RecalibratedProbabilisticClassifierBase</span></code></a></p>
<p>Applies the Temperature Scaling (TS) calibration method from <cite>abstention.calibration</cite>, as defined in
<a class="reference external" href="http://proceedings.mlr.press/v119/alexandari20a.html">Alexandari et al. paper</a>:</p>
<dl class="field-list simple">
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
<dd class="field-odd"><ul class="simple">
<li><p><strong>classifier</strong> a scikit-learn probabilistic classifier</p></li>
<li><p><strong>val_split</strong> indicate an integer k for performing kFCV to obtain the posterior prevalences, or a float p
in (0,1) to indicate that the posteriors are obtained in a stratified validation split containing p% of the
training instances (the rest is used for training). In any case, the classifier is retrained in the whole
training set afterwards. Default value is 5.</p></li>
<li><p><strong>n_jobs</strong> indicate the number of parallel workers (only when val_split is an integer)</p></li>
<li><p><strong>verbose</strong> whether or not to display information in the standard output</p></li>
</ul>
</dd>
</dl>
</dd></dl>
<dl class="py class">
<dt class="sig sig-object py" id="quapy.classification.calibration.VSCalibration">
<em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">quapy.classification.calibration.</span></span><span class="sig-name descname"><span class="pre">VSCalibration</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">classifier</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">val_split</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">5</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">n_jobs</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">verbose</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">False</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#quapy.classification.calibration.VSCalibration" title="Permalink to this definition"></a></dt>
<dd><p>Bases: <a class="reference internal" href="#quapy.classification.calibration.RecalibratedProbabilisticClassifierBase" title="quapy.classification.calibration.RecalibratedProbabilisticClassifierBase"><code class="xref py py-class docutils literal notranslate"><span class="pre">RecalibratedProbabilisticClassifierBase</span></code></a></p>
<p>Applies the Vector Scaling (VS) calibration method from <cite>abstention.calibration</cite>, as defined in
<a class="reference external" href="http://proceedings.mlr.press/v119/alexandari20a.html">Alexandari et al. paper</a>:</p>
<dl class="field-list simple">
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
<dd class="field-odd"><ul class="simple">
<li><p><strong>classifier</strong> a scikit-learn probabilistic classifier</p></li>
<li><p><strong>val_split</strong> indicate an integer k for performing kFCV to obtain the posterior prevalences, or a float p
in (0,1) to indicate that the posteriors are obtained in a stratified validation split containing p% of the
training instances (the rest is used for training). In any case, the classifier is retrained in the whole
training set afterwards. Default value is 5.</p></li>
<li><p><strong>n_jobs</strong> indicate the number of parallel workers (only when val_split is an integer)</p></li>
<li><p><strong>verbose</strong> whether or not to display information in the standard output</p></li>
</ul>
</dd>
</dl>
</dd></dl>
</section>
<section id="module-quapy.classification.methods">
<span id="quapy-classification-methods-module"></span><h2>quapy.classification.methods module<a class="headerlink" href="#module-quapy.classification.methods" title="Permalink to this headline"></a></h2>
<span id="quapy-classification-methods"></span><h2>quapy.classification.methods<a class="headerlink" href="#module-quapy.classification.methods" title="Permalink to this heading"></a></h2>
<dl class="py class">
<dt class="sig sig-object py" id="quapy.classification.methods.LowRankLogisticRegression">
<em class="property"><span class="pre">class</span> </em><span class="sig-prename descclassname"><span class="pre">quapy.classification.methods.</span></span><span class="sig-name descname"><span class="pre">LowRankLogisticRegression</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">n_components</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">100</span></span></em>, <em class="sig-param"><span class="o"><span class="pre">**</span></span><span class="n"><span class="pre">kwargs</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#quapy.classification.methods.LowRankLogisticRegression" title="Permalink to this definition"></a></dt>
<dd><p>Bases: <code class="xref py py-class docutils literal notranslate"><span class="pre">sklearn.base.BaseEstimator</span></code></p>
<em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">quapy.classification.methods.</span></span><span class="sig-name descname"><span class="pre">LowRankLogisticRegression</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">n_components</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">100</span></span></em>, <em class="sig-param"><span class="o"><span class="pre">**</span></span><span class="n"><span class="pre">kwargs</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#quapy.classification.methods.LowRankLogisticRegression" title="Permalink to this definition"></a></dt>
<dd><p>Bases: <code class="xref py py-class docutils literal notranslate"><span class="pre">BaseEstimator</span></code></p>
<p>An example of a classification method (i.e., an object that implements <cite>fit</cite>, <cite>predict</cite>, and <cite>predict_proba</cite>)
that also generates embedded inputs (i.e., that implements <cite>transform</cite>), as those required for
<code class="xref py py-class docutils literal notranslate"><span class="pre">quapy.method.neural.QuaNet</span></code>. This is a mock method to allow for easily instantiating
@ -70,7 +288,7 @@ that also generates embedded inputs (i.e., that implements <cite>transform</cite
The transformation consists of applying <code class="xref py py-class docutils literal notranslate"><span class="pre">sklearn.decomposition.TruncatedSVD</span></code>
while classification is performed using <code class="xref py py-class docutils literal notranslate"><span class="pre">sklearn.linear_model.LogisticRegression</span></code> on the low-rank space.</p>
<dl class="field-list simple">
<dt class="field-odd">Parameters</dt>
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
<dd class="field-odd"><ul class="simple">
<li><p><strong>n_components</strong> the number of principal components to retain</p></li>
<li><p><strong>kwargs</strong> parameters for the
@ -84,13 +302,13 @@ while classification is performed using <code class="xref py py-class docutils l
<dd><p>Fit the model according to the given training data. The fit consists of
fitting <cite>TruncatedSVD</cite> and then <cite>LogisticRegression</cite> on the low-rank representation.</p>
<dl class="field-list simple">
<dt class="field-odd">Parameters</dt>
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
<dd class="field-odd"><ul class="simple">
<li><p><strong>X</strong> array-like of shape <cite>(n_samples, n_features)</cite> with the instances</p></li>
<li><p><strong>y</strong> array-like of shape <cite>(n_samples, n_classes)</cite> with the class labels</p></li>
</ul>
</dd>
<dt class="field-even">Returns</dt>
<dt class="field-even">Returns<span class="colon">:</span></dt>
<dd class="field-even"><p><cite>self</cite></p>
</dd>
</dl>
@ -101,7 +319,7 @@ fitting <cite>TruncatedSVD</cite> and then <cite>LogisticRegression</cite> on th
<span class="sig-name descname"><span class="pre">get_params</span></span><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="headerlink" href="#quapy.classification.methods.LowRankLogisticRegression.get_params" title="Permalink to this definition"></a></dt>
<dd><p>Get hyper-parameters for this estimator.</p>
<dl class="field-list simple">
<dt class="field-odd">Returns</dt>
<dt class="field-odd">Returns<span class="colon">:</span></dt>
<dd class="field-odd"><p>a dictionary with parameter names mapped to their values</p>
</dd>
</dl>
@ -112,10 +330,10 @@ fitting <cite>TruncatedSVD</cite> and then <cite>LogisticRegression</cite> on th
<span class="sig-name descname"><span class="pre">predict</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">X</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#quapy.classification.methods.LowRankLogisticRegression.predict" title="Permalink to this definition"></a></dt>
<dd><p>Predicts labels for the instances <cite>X</cite> embedded into the low-rank space.</p>
<dl class="field-list simple">
<dt class="field-odd">Parameters</dt>
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
<dd class="field-odd"><p><strong>X</strong> array-like of shape <cite>(n_samples, n_features)</cite> instances to classify</p>
</dd>
<dt class="field-even">Returns</dt>
<dt class="field-even">Returns<span class="colon">:</span></dt>
<dd class="field-even"><p>a <cite>numpy</cite> array of length <cite>n</cite> containing the label predictions, where <cite>n</cite> is the number of
instances in <cite>X</cite></p>
</dd>
@ -127,10 +345,10 @@ instances in <cite>X</cite></p>
<span class="sig-name descname"><span class="pre">predict_proba</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">X</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#quapy.classification.methods.LowRankLogisticRegression.predict_proba" title="Permalink to this definition"></a></dt>
<dd><p>Predicts posterior probabilities for the instances <cite>X</cite> embedded into the low-rank space.</p>
<dl class="field-list simple">
<dt class="field-odd">Parameters</dt>
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
<dd class="field-odd"><p><strong>X</strong> array-like of shape <cite>(n_samples, n_features)</cite> instances to classify</p>
</dd>
<dt class="field-even">Returns</dt>
<dt class="field-even">Returns<span class="colon">:</span></dt>
<dd class="field-even"><p>array-like of shape <cite>(n_samples, n_classes)</cite> with the posterior probabilities</p>
</dd>
</dl>
@ -141,7 +359,7 @@ instances in <cite>X</cite></p>
<span class="sig-name descname"><span class="pre">set_params</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="o"><span class="pre">**</span></span><span class="n"><span class="pre">params</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#quapy.classification.methods.LowRankLogisticRegression.set_params" title="Permalink to this definition"></a></dt>
<dd><p>Set the parameters of this estimator.</p>
<dl class="field-list simple">
<dt class="field-odd">Parameters</dt>
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
<dd class="field-odd"><p><strong>parameters</strong> a <cite>**kwargs</cite> dictionary with the estimator parameters for
<a class="reference external" href="https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html">Logistic Regression</a>
and eventually also <cite>n_components</cite> for <cite>TruncatedSVD</cite></p>
@ -155,10 +373,10 @@ and eventually also <cite>n_components</cite> for <cite>TruncatedSVD</cite></p>
<dd><p>Returns the low-rank approximation of <cite>X</cite> with <cite>n_components</cite> dimensions, or <cite>X</cite> unaltered if
<cite>n_components</cite> &gt;= <cite>X.shape[1]</cite>.</p>
<dl class="field-list simple">
<dt class="field-odd">Parameters</dt>
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
<dd class="field-odd"><p><strong>X</strong> array-like of shape <cite>(n_samples, n_features)</cite> instances to embed</p>
</dd>
<dt class="field-even">Returns</dt>
<dt class="field-even">Returns<span class="colon">:</span></dt>
<dd class="field-even"><p>array-like of shape <cite>(n_samples, n_components)</cite> with the embedded instances</p>
</dd>
</dl>
@ -168,15 +386,15 @@ and eventually also <cite>n_components</cite> for <cite>TruncatedSVD</cite></p>
</section>
<section id="module-quapy.classification.neural">
<span id="quapy-classification-neural-module"></span><h2>quapy.classification.neural module<a class="headerlink" href="#module-quapy.classification.neural" title="Permalink to this headline"></a></h2>
<span id="quapy-classification-neural"></span><h2>quapy.classification.neural<a class="headerlink" href="#module-quapy.classification.neural" title="Permalink to this heading"></a></h2>
<dl class="py class">
<dt class="sig sig-object py" id="quapy.classification.neural.CNNnet">
<em class="property"><span class="pre">class</span> </em><span class="sig-prename descclassname"><span class="pre">quapy.classification.neural.</span></span><span class="sig-name descname"><span class="pre">CNNnet</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">vocabulary_size</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">n_classes</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">embedding_size</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">100</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">hidden_size</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">256</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">repr_size</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">100</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">kernel_heights</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">[3,</span> <span class="pre">5,</span> <span class="pre">7]</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">stride</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">1</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">padding</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">0</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">drop_p</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">0.5</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#quapy.classification.neural.CNNnet" title="Permalink to this definition"></a></dt>
<dd><p>Bases: <a class="reference internal" href="#quapy.classification.neural.TextClassifierNet" title="quapy.classification.neural.TextClassifierNet"><code class="xref py py-class docutils literal notranslate"><span class="pre">quapy.classification.neural.TextClassifierNet</span></code></a></p>
<em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">quapy.classification.neural.</span></span><span class="sig-name descname"><span class="pre">CNNnet</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">vocabulary_size</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">n_classes</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">embedding_size</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">100</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">hidden_size</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">256</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">repr_size</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">100</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">kernel_heights</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">[3,</span> <span class="pre">5,</span> <span class="pre">7]</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">stride</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">1</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">padding</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">0</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">drop_p</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">0.5</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#quapy.classification.neural.CNNnet" title="Permalink to this definition"></a></dt>
<dd><p>Bases: <a class="reference internal" href="#quapy.classification.neural.TextClassifierNet" title="quapy.classification.neural.TextClassifierNet"><code class="xref py py-class docutils literal notranslate"><span class="pre">TextClassifierNet</span></code></a></p>
<p>An implementation of <a class="reference internal" href="#quapy.classification.neural.TextClassifierNet" title="quapy.classification.neural.TextClassifierNet"><code class="xref py py-class docutils literal notranslate"><span class="pre">quapy.classification.neural.TextClassifierNet</span></code></a> based on
Convolutional Neural Networks.</p>
<dl class="field-list simple">
<dt class="field-odd">Parameters</dt>
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
<dd class="field-odd"><ul class="simple">
<li><p><strong>vocabulary_size</strong> the size of the vocabulary</p></li>
<li><p><strong>n_classes</strong> number of target classes</p></li>
@ -197,11 +415,11 @@ consecutive tokens that each kernel covers</p></li>
<dd><p>Embeds documents (i.e., performs the forward pass up to the
next-to-last layer).</p>
<dl class="field-list simple">
<dt class="field-odd">Parameters</dt>
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
<dd class="field-odd"><p><strong>input</strong> a batch of instances, typically generated by a torchs <cite>DataLoader</cite>
instance (see <a class="reference internal" href="#quapy.classification.neural.TorchDataset" title="quapy.classification.neural.TorchDataset"><code class="xref py py-class docutils literal notranslate"><span class="pre">quapy.classification.neural.TorchDataset</span></code></a>)</p>
</dd>
<dt class="field-even">Returns</dt>
<dt class="field-even">Returns<span class="colon">:</span></dt>
<dd class="field-even"><p>a torch tensor of shape <cite>(n_samples, n_dimensions)</cite>, where
<cite>n_samples</cite> is the number of documents, and <cite>n_dimensions</cite> is the
dimensionality of the embedding</p>
@ -214,18 +432,23 @@ dimensionality of the embedding</p>
<span class="sig-name descname"><span class="pre">get_params</span></span><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="headerlink" href="#quapy.classification.neural.CNNnet.get_params" title="Permalink to this definition"></a></dt>
<dd><p>Get hyper-parameters for this estimator</p>
<dl class="field-list simple">
<dt class="field-odd">Returns</dt>
<dt class="field-odd">Returns<span class="colon">:</span></dt>
<dd class="field-odd"><p>a dictionary with parameter names mapped to their values</p>
</dd>
</dl>
</dd></dl>
<dl class="py attribute">
<dt class="sig sig-object py" id="quapy.classification.neural.CNNnet.training">
<span class="sig-name descname"><span class="pre">training</span></span><em class="property"><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="pre">bool</span></em><a class="headerlink" href="#quapy.classification.neural.CNNnet.training" title="Permalink to this definition"></a></dt>
<dd></dd></dl>
<dl class="py property">
<dt class="sig sig-object py" id="quapy.classification.neural.CNNnet.vocabulary_size">
<em class="property"><span class="pre">property</span> </em><span class="sig-name descname"><span class="pre">vocabulary_size</span></span><a class="headerlink" href="#quapy.classification.neural.CNNnet.vocabulary_size" title="Permalink to this definition"></a></dt>
<em class="property"><span class="pre">property</span><span class="w"> </span></em><span class="sig-name descname"><span class="pre">vocabulary_size</span></span><a class="headerlink" href="#quapy.classification.neural.CNNnet.vocabulary_size" title="Permalink to this definition"></a></dt>
<dd><p>Return the size of the vocabulary</p>
<dl class="field-list simple">
<dt class="field-odd">Returns</dt>
<dt class="field-odd">Returns<span class="colon">:</span></dt>
<dd class="field-odd"><p>integer</p>
</dd>
</dl>
@ -235,12 +458,12 @@ dimensionality of the embedding</p>
<dl class="py class">
<dt class="sig sig-object py" id="quapy.classification.neural.LSTMnet">
<em class="property"><span class="pre">class</span> </em><span class="sig-prename descclassname"><span class="pre">quapy.classification.neural.</span></span><span class="sig-name descname"><span class="pre">LSTMnet</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">vocabulary_size</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">n_classes</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">embedding_size</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">100</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">hidden_size</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">256</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">repr_size</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">100</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">lstm_class_nlayers</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">1</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">drop_p</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">0.5</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#quapy.classification.neural.LSTMnet" title="Permalink to this definition"></a></dt>
<dd><p>Bases: <a class="reference internal" href="#quapy.classification.neural.TextClassifierNet" title="quapy.classification.neural.TextClassifierNet"><code class="xref py py-class docutils literal notranslate"><span class="pre">quapy.classification.neural.TextClassifierNet</span></code></a></p>
<em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">quapy.classification.neural.</span></span><span class="sig-name descname"><span class="pre">LSTMnet</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">vocabulary_size</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">n_classes</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">embedding_size</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">100</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">hidden_size</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">256</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">repr_size</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">100</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">lstm_class_nlayers</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">1</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">drop_p</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">0.5</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#quapy.classification.neural.LSTMnet" title="Permalink to this definition"></a></dt>
<dd><p>Bases: <a class="reference internal" href="#quapy.classification.neural.TextClassifierNet" title="quapy.classification.neural.TextClassifierNet"><code class="xref py py-class docutils literal notranslate"><span class="pre">TextClassifierNet</span></code></a></p>
<p>An implementation of <a class="reference internal" href="#quapy.classification.neural.TextClassifierNet" title="quapy.classification.neural.TextClassifierNet"><code class="xref py py-class docutils literal notranslate"><span class="pre">quapy.classification.neural.TextClassifierNet</span></code></a> based on
Long Short Term Memory networks.</p>
<dl class="field-list simple">
<dt class="field-odd">Parameters</dt>
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
<dd class="field-odd"><ul class="simple">
<li><p><strong>vocabulary_size</strong> the size of the vocabulary</p></li>
<li><p><strong>n_classes</strong> number of target classes</p></li>
@ -258,11 +481,11 @@ Long Short Term Memory networks.</p>
<dd><p>Embeds documents (i.e., performs the forward pass up to the
next-to-last layer).</p>
<dl class="field-list simple">
<dt class="field-odd">Parameters</dt>
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
<dd class="field-odd"><p><strong>x</strong> a batch of instances, typically generated by a torchs <cite>DataLoader</cite>
instance (see <a class="reference internal" href="#quapy.classification.neural.TorchDataset" title="quapy.classification.neural.TorchDataset"><code class="xref py py-class docutils literal notranslate"><span class="pre">quapy.classification.neural.TorchDataset</span></code></a>)</p>
</dd>
<dt class="field-even">Returns</dt>
<dt class="field-even">Returns<span class="colon">:</span></dt>
<dd class="field-even"><p>a torch tensor of shape <cite>(n_samples, n_dimensions)</cite>, where
<cite>n_samples</cite> is the number of documents, and <cite>n_dimensions</cite> is the
dimensionality of the embedding</p>
@ -275,18 +498,23 @@ dimensionality of the embedding</p>
<span class="sig-name descname"><span class="pre">get_params</span></span><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="headerlink" href="#quapy.classification.neural.LSTMnet.get_params" title="Permalink to this definition"></a></dt>
<dd><p>Get hyper-parameters for this estimator</p>
<dl class="field-list simple">
<dt class="field-odd">Returns</dt>
<dt class="field-odd">Returns<span class="colon">:</span></dt>
<dd class="field-odd"><p>a dictionary with parameter names mapped to their values</p>
</dd>
</dl>
</dd></dl>
<dl class="py attribute">
<dt class="sig sig-object py" id="quapy.classification.neural.LSTMnet.training">
<span class="sig-name descname"><span class="pre">training</span></span><em class="property"><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="pre">bool</span></em><a class="headerlink" href="#quapy.classification.neural.LSTMnet.training" title="Permalink to this definition"></a></dt>
<dd></dd></dl>
<dl class="py property">
<dt class="sig sig-object py" id="quapy.classification.neural.LSTMnet.vocabulary_size">
<em class="property"><span class="pre">property</span> </em><span class="sig-name descname"><span class="pre">vocabulary_size</span></span><a class="headerlink" href="#quapy.classification.neural.LSTMnet.vocabulary_size" title="Permalink to this definition"></a></dt>
<em class="property"><span class="pre">property</span><span class="w"> </span></em><span class="sig-name descname"><span class="pre">vocabulary_size</span></span><a class="headerlink" href="#quapy.classification.neural.LSTMnet.vocabulary_size" title="Permalink to this definition"></a></dt>
<dd><p>Return the size of the vocabulary</p>
<dl class="field-list simple">
<dt class="field-odd">Returns</dt>
<dt class="field-odd">Returns<span class="colon">:</span></dt>
<dd class="field-odd"><p>integer</p>
</dd>
</dl>
@ -296,11 +524,11 @@ dimensionality of the embedding</p>
<dl class="py class">
<dt class="sig sig-object py" id="quapy.classification.neural.NeuralClassifierTrainer">
<em class="property"><span class="pre">class</span> </em><span class="sig-prename descclassname"><span class="pre">quapy.classification.neural.</span></span><span class="sig-name descname"><span class="pre">NeuralClassifierTrainer</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">net</span></span><span class="p"><span class="pre">:</span></span> <span class="n"><a class="reference internal" href="#quapy.classification.neural.TextClassifierNet" title="quapy.classification.neural.TextClassifierNet"><span class="pre">quapy.classification.neural.TextClassifierNet</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">lr</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">0.001</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">weight_decay</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">0</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">patience</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">10</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">epochs</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">200</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">batch_size</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">64</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">batch_size_test</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">512</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">padding_length</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">300</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">device</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">'cpu'</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">checkpointpath</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">'../checkpoint/classifier_net.dat'</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#quapy.classification.neural.NeuralClassifierTrainer" title="Permalink to this definition"></a></dt>
<em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">quapy.classification.neural.</span></span><span class="sig-name descname"><span class="pre">NeuralClassifierTrainer</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">net</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#quapy.classification.neural.TextClassifierNet" title="quapy.classification.neural.TextClassifierNet"><span class="pre">TextClassifierNet</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">lr</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">0.001</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">weight_decay</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">0</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">patience</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">10</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">epochs</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">200</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">batch_size</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">64</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">batch_size_test</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">512</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">padding_length</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">300</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">device</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">'cpu'</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">checkpointpath</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">'../checkpoint/classifier_net.dat'</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#quapy.classification.neural.NeuralClassifierTrainer" title="Permalink to this definition"></a></dt>
<dd><p>Bases: <code class="xref py py-class docutils literal notranslate"><span class="pre">object</span></code></p>
<p>Trains a neural network for text classification.</p>
<dl class="field-list simple">
<dt class="field-odd">Parameters</dt>
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
<dd class="field-odd"><ul class="simple">
<li><p><strong>net</strong> an instance of <cite>TextClassifierNet</cite> implementing the forward pass</p></li>
<li><p><strong>lr</strong> learning rate (default 1e-3)</p></li>
@ -319,10 +547,10 @@ according to the evaluation in the held-out validation split (default ../chec
</dl>
<dl class="py property">
<dt class="sig sig-object py" id="quapy.classification.neural.NeuralClassifierTrainer.device">
<em class="property"><span class="pre">property</span> </em><span class="sig-name descname"><span class="pre">device</span></span><a class="headerlink" href="#quapy.classification.neural.NeuralClassifierTrainer.device" title="Permalink to this definition"></a></dt>
<em class="property"><span class="pre">property</span><span class="w"> </span></em><span class="sig-name descname"><span class="pre">device</span></span><a class="headerlink" href="#quapy.classification.neural.NeuralClassifierTrainer.device" title="Permalink to this definition"></a></dt>
<dd><p>Gets the device in which the network is allocated</p>
<dl class="field-list simple">
<dt class="field-odd">Returns</dt>
<dt class="field-odd">Returns<span class="colon">:</span></dt>
<dd class="field-odd"><p>device</p>
</dd>
</dl>
@ -333,14 +561,14 @@ according to the evaluation in the held-out validation split (default ../chec
<span class="sig-name descname"><span class="pre">fit</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">instances</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">labels</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">val_split</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">0.3</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#quapy.classification.neural.NeuralClassifierTrainer.fit" title="Permalink to this definition"></a></dt>
<dd><p>Fits the model according to the given training data.</p>
<dl class="field-list simple">
<dt class="field-odd">Parameters</dt>
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
<dd class="field-odd"><ul class="simple">
<li><p><strong>instances</strong> list of lists of indexed tokens</p></li>
<li><p><strong>labels</strong> array-like of shape <cite>(n_samples, n_classes)</cite> with the class labels</p></li>
<li><p><strong>val_split</strong> proportion of training documents to be taken as the validation set (default 0.3)</p></li>
</ul>
</dd>
<dt class="field-even">Returns</dt>
<dt class="field-even">Returns<span class="colon">:</span></dt>
<dd class="field-even"><p></p>
</dd>
</dl>
@ -351,7 +579,7 @@ according to the evaluation in the held-out validation split (default ../chec
<span class="sig-name descname"><span class="pre">get_params</span></span><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="headerlink" href="#quapy.classification.neural.NeuralClassifierTrainer.get_params" title="Permalink to this definition"></a></dt>
<dd><p>Get hyper-parameters for this estimator</p>
<dl class="field-list simple">
<dt class="field-odd">Returns</dt>
<dt class="field-odd">Returns<span class="colon">:</span></dt>
<dd class="field-odd"><p>a dictionary with parameter names mapped to their values</p>
</dd>
</dl>
@ -362,10 +590,10 @@ according to the evaluation in the held-out validation split (default ../chec
<span class="sig-name descname"><span class="pre">predict</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">instances</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#quapy.classification.neural.NeuralClassifierTrainer.predict" title="Permalink to this definition"></a></dt>
<dd><p>Predicts labels for the instances</p>
<dl class="field-list simple">
<dt class="field-odd">Parameters</dt>
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
<dd class="field-odd"><p><strong>instances</strong> list of lists of indexed tokens</p>
</dd>
<dt class="field-even">Returns</dt>
<dt class="field-even">Returns<span class="colon">:</span></dt>
<dd class="field-even"><p>a <cite>numpy</cite> array of length <cite>n</cite> containing the label predictions, where <cite>n</cite> is the number of
instances in <cite>X</cite></p>
</dd>
@ -377,10 +605,10 @@ instances in <cite>X</cite></p>
<span class="sig-name descname"><span class="pre">predict_proba</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">instances</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#quapy.classification.neural.NeuralClassifierTrainer.predict_proba" title="Permalink to this definition"></a></dt>
<dd><p>Predicts posterior probabilities for the instances</p>
<dl class="field-list simple">
<dt class="field-odd">Parameters</dt>
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
<dd class="field-odd"><p><strong>X</strong> array-like of shape <cite>(n_samples, n_features)</cite> instances to classify</p>
</dd>
<dt class="field-even">Returns</dt>
<dt class="field-even">Returns<span class="colon">:</span></dt>
<dd class="field-even"><p>array-like of shape <cite>(n_samples, n_classes)</cite> with the posterior probabilities</p>
</dd>
</dl>
@ -391,7 +619,7 @@ instances in <cite>X</cite></p>
<span class="sig-name descname"><span class="pre">reset_net_params</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">vocab_size</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">n_classes</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#quapy.classification.neural.NeuralClassifierTrainer.reset_net_params" title="Permalink to this definition"></a></dt>
<dd><p>Reinitialize the network parameters</p>
<dl class="field-list simple">
<dt class="field-odd">Parameters</dt>
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
<dd class="field-odd"><ul class="simple">
<li><p><strong>vocab_size</strong> the size of the vocabulary</p></li>
<li><p><strong>n_classes</strong> the number of target classes</p></li>
@ -407,7 +635,7 @@ instances in <cite>X</cite></p>
In this current version, parameter names for the trainer and learner should
be disjoint.</p>
<dl class="field-list simple">
<dt class="field-odd">Parameters</dt>
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
<dd class="field-odd"><p><strong>params</strong> a <cite>**kwargs</cite> dictionary with the parameters</p>
</dd>
</dl>
@ -418,10 +646,10 @@ be disjoint.</p>
<span class="sig-name descname"><span class="pre">transform</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">instances</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#quapy.classification.neural.NeuralClassifierTrainer.transform" title="Permalink to this definition"></a></dt>
<dd><p>Returns the embeddings of the instances</p>
<dl class="field-list simple">
<dt class="field-odd">Parameters</dt>
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
<dd class="field-odd"><p><strong>instances</strong> list of lists of indexed tokens</p>
</dd>
<dt class="field-even">Returns</dt>
<dt class="field-even">Returns<span class="colon">:</span></dt>
<dd class="field-even"><p>array-like of shape <cite>(n_samples, embed_size)</cite> with the embedded instances,
where <cite>embed_size</cite> is defined by the classification network</p>
</dd>
@ -432,15 +660,15 @@ where <cite>embed_size</cite> is defined by the classification network</p>
<dl class="py class">
<dt class="sig sig-object py" id="quapy.classification.neural.TextClassifierNet">
<em class="property"><span class="pre">class</span> </em><span class="sig-prename descclassname"><span class="pre">quapy.classification.neural.</span></span><span class="sig-name descname"><span class="pre">TextClassifierNet</span></span><a class="headerlink" href="#quapy.classification.neural.TextClassifierNet" title="Permalink to this definition"></a></dt>
<dd><p>Bases: <code class="xref py py-class docutils literal notranslate"><span class="pre">torch.nn.modules.module.Module</span></code></p>
<em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">quapy.classification.neural.</span></span><span class="sig-name descname"><span class="pre">TextClassifierNet</span></span><a class="headerlink" href="#quapy.classification.neural.TextClassifierNet" title="Permalink to this definition"></a></dt>
<dd><p>Bases: <code class="xref py py-class docutils literal notranslate"><span class="pre">Module</span></code></p>
<p>Abstract Text classifier (<cite>torch.nn.Module</cite>)</p>
<dl class="py method">
<dt class="sig sig-object py" id="quapy.classification.neural.TextClassifierNet.dimensions">
<span class="sig-name descname"><span class="pre">dimensions</span></span><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="headerlink" href="#quapy.classification.neural.TextClassifierNet.dimensions" title="Permalink to this definition"></a></dt>
<dd><p>Gets the number of dimensions of the embedding space</p>
<dl class="field-list simple">
<dt class="field-odd">Returns</dt>
<dt class="field-odd">Returns<span class="colon">:</span></dt>
<dd class="field-odd"><p>integer</p>
</dd>
</dl>
@ -448,15 +676,15 @@ where <cite>embed_size</cite> is defined by the classification network</p>
<dl class="py method">
<dt class="sig sig-object py" id="quapy.classification.neural.TextClassifierNet.document_embedding">
<em class="property"><span class="pre">abstract</span> </em><span class="sig-name descname"><span class="pre">document_embedding</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">x</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#quapy.classification.neural.TextClassifierNet.document_embedding" title="Permalink to this definition"></a></dt>
<em class="property"><span class="pre">abstract</span><span class="w"> </span></em><span class="sig-name descname"><span class="pre">document_embedding</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">x</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#quapy.classification.neural.TextClassifierNet.document_embedding" title="Permalink to this definition"></a></dt>
<dd><p>Embeds documents (i.e., performs the forward pass up to the
next-to-last layer).</p>
<dl class="field-list simple">
<dt class="field-odd">Parameters</dt>
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
<dd class="field-odd"><p><strong>x</strong> a batch of instances, typically generated by a torchs <cite>DataLoader</cite>
instance (see <a class="reference internal" href="#quapy.classification.neural.TorchDataset" title="quapy.classification.neural.TorchDataset"><code class="xref py py-class docutils literal notranslate"><span class="pre">quapy.classification.neural.TorchDataset</span></code></a>)</p>
</dd>
<dt class="field-even">Returns</dt>
<dt class="field-even">Returns<span class="colon">:</span></dt>
<dd class="field-even"><p>a torch tensor of shape <cite>(n_samples, n_dimensions)</cite>, where
<cite>n_samples</cite> is the number of documents, and <cite>n_dimensions</cite> is the
dimensionality of the embedding</p>
@ -469,11 +697,11 @@ dimensionality of the embedding</p>
<span class="sig-name descname"><span class="pre">forward</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">x</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#quapy.classification.neural.TextClassifierNet.forward" title="Permalink to this definition"></a></dt>
<dd><p>Performs the forward pass.</p>
<dl class="field-list simple">
<dt class="field-odd">Parameters</dt>
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
<dd class="field-odd"><p><strong>x</strong> a batch of instances, typically generated by a torchs <cite>DataLoader</cite>
instance (see <a class="reference internal" href="#quapy.classification.neural.TorchDataset" title="quapy.classification.neural.TorchDataset"><code class="xref py py-class docutils literal notranslate"><span class="pre">quapy.classification.neural.TorchDataset</span></code></a>)</p>
</dd>
<dt class="field-even">Returns</dt>
<dt class="field-even">Returns<span class="colon">:</span></dt>
<dd class="field-even"><p>a tensor of shape <cite>(n_instances, n_classes)</cite> with the decision scores
for each of the instances and classes</p>
</dd>
@ -482,10 +710,10 @@ for each of the instances and classes</p>
<dl class="py method">
<dt class="sig sig-object py" id="quapy.classification.neural.TextClassifierNet.get_params">
<em class="property"><span class="pre">abstract</span> </em><span class="sig-name descname"><span class="pre">get_params</span></span><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="headerlink" href="#quapy.classification.neural.TextClassifierNet.get_params" title="Permalink to this definition"></a></dt>
<em class="property"><span class="pre">abstract</span><span class="w"> </span></em><span class="sig-name descname"><span class="pre">get_params</span></span><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="headerlink" href="#quapy.classification.neural.TextClassifierNet.get_params" title="Permalink to this definition"></a></dt>
<dd><p>Get hyper-parameters for this estimator</p>
<dl class="field-list simple">
<dt class="field-odd">Returns</dt>
<dt class="field-odd">Returns<span class="colon">:</span></dt>
<dd class="field-odd"><p>a dictionary with parameter names mapped to their values</p>
</dd>
</dl>
@ -496,23 +724,28 @@ for each of the instances and classes</p>
<span class="sig-name descname"><span class="pre">predict_proba</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">x</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#quapy.classification.neural.TextClassifierNet.predict_proba" title="Permalink to this definition"></a></dt>
<dd><p>Predicts posterior probabilities for the instances in <cite>x</cite></p>
<dl class="field-list simple">
<dt class="field-odd">Parameters</dt>
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
<dd class="field-odd"><p><strong>x</strong> a torch tensor of indexed tokens with shape <cite>(n_instances, pad_length)</cite>
where <cite>n_instances</cite> is the number of instances in the batch, and <cite>pad_length</cite>
is length of the pad in the batch</p>
</dd>
<dt class="field-even">Returns</dt>
<dt class="field-even">Returns<span class="colon">:</span></dt>
<dd class="field-even"><p>array-like of shape <cite>(n_samples, n_classes)</cite> with the posterior probabilities</p>
</dd>
</dl>
</dd></dl>
<dl class="py attribute">
<dt class="sig sig-object py" id="quapy.classification.neural.TextClassifierNet.training">
<span class="sig-name descname"><span class="pre">training</span></span><em class="property"><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="pre">bool</span></em><a class="headerlink" href="#quapy.classification.neural.TextClassifierNet.training" title="Permalink to this definition"></a></dt>
<dd></dd></dl>
<dl class="py property">
<dt class="sig sig-object py" id="quapy.classification.neural.TextClassifierNet.vocabulary_size">
<em class="property"><span class="pre">property</span> </em><span class="sig-name descname"><span class="pre">vocabulary_size</span></span><a class="headerlink" href="#quapy.classification.neural.TextClassifierNet.vocabulary_size" title="Permalink to this definition"></a></dt>
<em class="property"><span class="pre">property</span><span class="w"> </span></em><span class="sig-name descname"><span class="pre">vocabulary_size</span></span><a class="headerlink" href="#quapy.classification.neural.TextClassifierNet.vocabulary_size" title="Permalink to this definition"></a></dt>
<dd><p>Return the size of the vocabulary</p>
<dl class="field-list simple">
<dt class="field-odd">Returns</dt>
<dt class="field-odd">Returns<span class="colon">:</span></dt>
<dd class="field-odd"><p>integer</p>
</dd>
</dl>
@ -528,11 +761,11 @@ is length of the pad in the batch</p>
<dl class="py class">
<dt class="sig sig-object py" id="quapy.classification.neural.TorchDataset">
<em class="property"><span class="pre">class</span> </em><span class="sig-prename descclassname"><span class="pre">quapy.classification.neural.</span></span><span class="sig-name descname"><span class="pre">TorchDataset</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">instances</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">labels</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#quapy.classification.neural.TorchDataset" title="Permalink to this definition"></a></dt>
<dd><p>Bases: <code class="xref py py-class docutils literal notranslate"><span class="pre">torch.utils.data.dataset.Dataset</span></code></p>
<em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">quapy.classification.neural.</span></span><span class="sig-name descname"><span class="pre">TorchDataset</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">instances</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">labels</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#quapy.classification.neural.TorchDataset" title="Permalink to this definition"></a></dt>
<dd><p>Bases: <code class="xref py py-class docutils literal notranslate"><span class="pre">Dataset</span></code></p>
<p>Transforms labelled instances into a Torchs <code class="xref py py-class docutils literal notranslate"><span class="pre">torch.utils.data.DataLoader</span></code> object</p>
<dl class="field-list simple">
<dt class="field-odd">Parameters</dt>
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
<dd class="field-odd"><ul class="simple">
<li><p><strong>instances</strong> list of lists of indexed tokens</p></li>
<li><p><strong>labels</strong> array-like of shape <cite>(n_samples, n_classes)</cite> with the class labels</p></li>
@ -545,7 +778,7 @@ is length of the pad in the batch</p>
<dd><p>Converts the labelled collection into a Torch DataLoader with dynamic padding for
the batch</p>
<dl class="field-list simple">
<dt class="field-odd">Parameters</dt>
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
<dd class="field-odd"><ul class="simple">
<li><p><strong>batch_size</strong> batch size</p></li>
<li><p><strong>shuffle</strong> whether or not to shuffle instances</p></li>
@ -555,7 +788,7 @@ applied, meaning that if the longest document in the batch is shorter than
<li><p><strong>device</strong> whether to allocate tensors in cpu or in cuda</p></li>
</ul>
</dd>
<dt class="field-even">Returns</dt>
<dt class="field-even">Returns<span class="colon">:</span></dt>
<dd class="field-even"><p>a <code class="xref py py-class docutils literal notranslate"><span class="pre">torch.utils.data.DataLoader</span></code> object</p>
</dd>
</dl>
@ -565,11 +798,11 @@ applied, meaning that if the longest document in the batch is shorter than
</section>
<section id="module-quapy.classification.svmperf">
<span id="quapy-classification-svmperf-module"></span><h2>quapy.classification.svmperf module<a class="headerlink" href="#module-quapy.classification.svmperf" title="Permalink to this headline"></a></h2>
<span id="quapy-classification-svmperf"></span><h2>quapy.classification.svmperf<a class="headerlink" href="#module-quapy.classification.svmperf" title="Permalink to this heading"></a></h2>
<dl class="py class">
<dt class="sig sig-object py" id="quapy.classification.svmperf.SVMperf">
<em class="property"><span class="pre">class</span> </em><span class="sig-prename descclassname"><span class="pre">quapy.classification.svmperf.</span></span><span class="sig-name descname"><span class="pre">SVMperf</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">svmperf_base</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">C</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">0.01</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">verbose</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">False</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">loss</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">'01'</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#quapy.classification.svmperf.SVMperf" title="Permalink to this definition"></a></dt>
<dd><p>Bases: <code class="xref py py-class docutils literal notranslate"><span class="pre">sklearn.base.BaseEstimator</span></code>, <code class="xref py py-class docutils literal notranslate"><span class="pre">sklearn.base.ClassifierMixin</span></code></p>
<em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">quapy.classification.svmperf.</span></span><span class="sig-name descname"><span class="pre">SVMperf</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">svmperf_base</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">C</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">0.01</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">verbose</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">False</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">loss</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">'01'</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">host_folder</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#quapy.classification.svmperf.SVMperf" title="Permalink to this definition"></a></dt>
<dd><p>Bases: <code class="xref py py-class docutils literal notranslate"><span class="pre">BaseEstimator</span></code>, <code class="xref py py-class docutils literal notranslate"><span class="pre">ClassifierMixin</span></code></p>
<p>A wrapper for the <a class="reference external" href="https://www.cs.cornell.edu/people/tj/svm_light/svm_perf.html">SVM-perf package</a> by Thorsten Joachims.
When using losses for quantification, the source code has to be patched. See
the <a class="reference external" href="https://hlt-isti.github.io/QuaPy/build/html/Installation.html#svm-perf-with-quantification-oriented-losses">installation documentation</a>
@ -582,12 +815,14 @@ for further details.</p>
</ul>
</div></blockquote>
<dl class="field-list simple">
<dt class="field-odd">Parameters</dt>
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
<dd class="field-odd"><ul class="simple">
<li><p><strong>svmperf_base</strong> path to directory containing the binary files <cite>svm_perf_learn</cite> and <cite>svm_perf_classify</cite></p></li>
<li><p><strong>C</strong> trade-off between training error and margin (default 0.01)</p></li>
<li><p><strong>verbose</strong> set to True to print svm-perf std outputs</p></li>
<li><p><strong>loss</strong> the loss to optimize for. Available losses are “01”, “f1”, “kld”, “nkld”, “q”, “qacc”, “qf1”, “qgm”, “mae”, “mrae”.</p></li>
<li><p><strong>host_folder</strong> directory where to store the trained model; set to None (default) for using a tmp directory
(temporal directories are automatically deleted)</p></li>
</ul>
</dd>
</dl>
@ -596,13 +831,13 @@ for further details.</p>
<span class="sig-name descname"><span class="pre">decision_function</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">X</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">y</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#quapy.classification.svmperf.SVMperf.decision_function" title="Permalink to this definition"></a></dt>
<dd><p>Evaluate the decision function for the samples in <cite>X</cite>.</p>
<dl class="field-list simple">
<dt class="field-odd">Parameters</dt>
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
<dd class="field-odd"><ul class="simple">
<li><p><strong>X</strong> array-like of shape <cite>(n_samples, n_features)</cite> containing the instances to classify</p></li>
<li><p><strong>y</strong> unused</p></li>
</ul>
</dd>
<dt class="field-even">Returns</dt>
<dt class="field-even">Returns<span class="colon">:</span></dt>
<dd class="field-even"><p>array-like of shape <cite>(n_samples,)</cite> containing the decision scores of the instances</p>
</dd>
</dl>
@ -613,13 +848,13 @@ for further details.</p>
<span class="sig-name descname"><span class="pre">fit</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">X</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">y</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#quapy.classification.svmperf.SVMperf.fit" title="Permalink to this definition"></a></dt>
<dd><p>Trains the SVM for the multivariate performance loss</p>
<dl class="field-list simple">
<dt class="field-odd">Parameters</dt>
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
<dd class="field-odd"><ul class="simple">
<li><p><strong>X</strong> training instances</p></li>
<li><p><strong>y</strong> a binary vector of labels</p></li>
</ul>
</dd>
<dt class="field-even">Returns</dt>
<dt class="field-even">Returns<span class="colon">:</span></dt>
<dd class="field-even"><p><cite>self</cite></p>
</dd>
</dl>
@ -628,35 +863,28 @@ for further details.</p>
<dl class="py method">
<dt class="sig sig-object py" id="quapy.classification.svmperf.SVMperf.predict">
<span class="sig-name descname"><span class="pre">predict</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">X</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#quapy.classification.svmperf.SVMperf.predict" title="Permalink to this definition"></a></dt>
<dd><p>Predicts labels for the instances <cite>X</cite>
:param X: array-like of shape <cite>(n_samples, n_features)</cite> instances to classify
:return: a <cite>numpy</cite> array of length <cite>n</cite> containing the label predictions, where <cite>n</cite> is the number of</p>
<blockquote>
<div><p>instances in <cite>X</cite></p>
</div></blockquote>
</dd></dl>
<dl class="py method">
<dt class="sig sig-object py" id="quapy.classification.svmperf.SVMperf.set_params">
<span class="sig-name descname"><span class="pre">set_params</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="o"><span class="pre">**</span></span><span class="n"><span class="pre">parameters</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#quapy.classification.svmperf.SVMperf.set_params" title="Permalink to this definition"></a></dt>
<dd><p>Set the hyper-parameters for svm-perf. Currently, only the <cite>C</cite> parameter is supported</p>
<dd><p>Predicts labels for the instances <cite>X</cite></p>
<dl class="field-list simple">
<dt class="field-odd">Parameters</dt>
<dd class="field-odd"><p><strong>parameters</strong> a <cite>**kwargs</cite> dictionary <cite>{C: &lt;float&gt;}</cite></p>
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
<dd class="field-odd"><p><strong>X</strong> array-like of shape <cite>(n_samples, n_features)</cite> instances to classify</p>
</dd>
<dt class="field-even">Returns<span class="colon">:</span></dt>
<dd class="field-even"><p>a <cite>numpy</cite> array of length <cite>n</cite> containing the label predictions, where <cite>n</cite> is the number of
instances in <cite>X</cite></p>
</dd>
</dl>
</dd></dl>
<dl class="py attribute">
<dt class="sig sig-object py" id="quapy.classification.svmperf.SVMperf.valid_losses">
<span class="sig-name descname"><span class="pre">valid_losses</span></span><em class="property"> <span class="pre">=</span> <span class="pre">{'01':</span> <span class="pre">0,</span> <span class="pre">'f1':</span> <span class="pre">1,</span> <span class="pre">'kld':</span> <span class="pre">12,</span> <span class="pre">'mae':</span> <span class="pre">26,</span> <span class="pre">'mrae':</span> <span class="pre">27,</span> <span class="pre">'nkld':</span> <span class="pre">13,</span> <span class="pre">'q':</span> <span class="pre">22,</span> <span class="pre">'qacc':</span> <span class="pre">23,</span> <span class="pre">'qf1':</span> <span class="pre">24,</span> <span class="pre">'qgm':</span> <span class="pre">25}</span></em><a class="headerlink" href="#quapy.classification.svmperf.SVMperf.valid_losses" title="Permalink to this definition"></a></dt>
<span class="sig-name descname"><span class="pre">valid_losses</span></span><em class="property"><span class="w"> </span><span class="p"><span class="pre">=</span></span><span class="w"> </span><span class="pre">{'01':</span> <span class="pre">0,</span> <span class="pre">'f1':</span> <span class="pre">1,</span> <span class="pre">'kld':</span> <span class="pre">12,</span> <span class="pre">'mae':</span> <span class="pre">26,</span> <span class="pre">'mrae':</span> <span class="pre">27,</span> <span class="pre">'nkld':</span> <span class="pre">13,</span> <span class="pre">'q':</span> <span class="pre">22,</span> <span class="pre">'qacc':</span> <span class="pre">23,</span> <span class="pre">'qf1':</span> <span class="pre">24,</span> <span class="pre">'qgm':</span> <span class="pre">25}</span></em><a class="headerlink" href="#quapy.classification.svmperf.SVMperf.valid_losses" title="Permalink to this definition"></a></dt>
<dd></dd></dl>
</dd></dl>
</section>
<section id="module-quapy.classification">
<span id="module-contents"></span><h2>Module contents<a class="headerlink" href="#module-quapy.classification" title="Permalink to this headline"></a></h2>
<span id="module-contents"></span><h2>Module contents<a class="headerlink" href="#module-quapy.classification" title="Permalink to this heading"></a></h2>
</section>
</section>
@ -667,24 +895,31 @@ for further details.</p>
</div>
<div class="sphinxsidebar" role="navigation" aria-label="main navigation">
<div class="sphinxsidebarwrapper">
<h3><a href="index.html">Table of Contents</a></h3>
<ul>
<div>
<h3><a href="index.html">Table of Contents</a></h3>
<ul>
<li><a class="reference internal" href="#">quapy.classification package</a><ul>
<li><a class="reference internal" href="#submodules">Submodules</a></li>
<li><a class="reference internal" href="#module-quapy.classification.methods">quapy.classification.methods module</a></li>
<li><a class="reference internal" href="#module-quapy.classification.neural">quapy.classification.neural module</a></li>
<li><a class="reference internal" href="#module-quapy.classification.svmperf">quapy.classification.svmperf module</a></li>
<li><a class="reference internal" href="#quapy-classification-calibration">quapy.classification.calibration</a></li>
<li><a class="reference internal" href="#module-quapy.classification.methods">quapy.classification.methods</a></li>
<li><a class="reference internal" href="#module-quapy.classification.neural">quapy.classification.neural</a></li>
<li><a class="reference internal" href="#module-quapy.classification.svmperf">quapy.classification.svmperf</a></li>
<li><a class="reference internal" href="#module-quapy.classification">Module contents</a></li>
</ul>
</li>
</ul>
<h4>Previous topic</h4>
<p class="topless"><a href="quapy.html"
title="previous chapter">quapy package</a></p>
<h4>Next topic</h4>
<p class="topless"><a href="quapy.data.html"
title="next chapter">quapy.data package</a></p>
</div>
<div>
<h4>Previous topic</h4>
<p class="topless"><a href="quapy.html"
title="previous chapter">quapy package</a></p>
</div>
<div>
<h4>Next topic</h4>
<p class="topless"><a href="quapy.data.html"
title="next chapter">quapy.data package</a></p>
</div>
<div role="note" aria-label="source link">
<h3>This Page</h3>
<ul class="this-page-menu">
@ -701,7 +936,7 @@ for further details.</p>
</form>
</div>
</div>
<script>$('#searchbox').show(0);</script>
<script>document.getElementById('searchbox').style.display = "block"</script>
</div>
</div>
<div class="clearer"></div>
@ -721,7 +956,7 @@ for further details.</p>
<li class="right" >
<a href="quapy.html" title="quapy package"
>previous</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.6 documentation</a> &#187;</li>
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.7 documentation</a> &#187;</li>
<li class="nav-item nav-item-1"><a href="modules.html" >quapy</a> &#187;</li>
<li class="nav-item nav-item-2"><a href="quapy.html" >quapy package</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">quapy.classification package</a></li>
@ -729,7 +964,7 @@ for further details.</p>
</div>
<div class="footer" role="contentinfo">
&#169; Copyright 2021, Alejandro Moreo.
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 4.2.0.
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 5.3.0.
</div>
</body>
</html>

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

View File

@ -1,135 +0,0 @@
<!doctype html>
<html>
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>quapy.tests package &#8212; QuaPy 0.1.6 documentation</title>
<link rel="stylesheet" type="text/css" href="_static/pygments.css" />
<link rel="stylesheet" type="text/css" href="_static/bizstyle.css" />
<script data-url_root="./" id="documentation_options" src="_static/documentation_options.js"></script>
<script src="_static/jquery.js"></script>
<script src="_static/underscore.js"></script>
<script src="_static/doctools.js"></script>
<script src="_static/bizstyle.js"></script>
<link rel="index" title="Index" href="genindex.html" />
<link rel="search" title="Search" href="search.html" />
<link rel="prev" title="quapy.method package" href="quapy.method.html" />
<meta name="viewport" content="width=device-width,initial-scale=1.0" />
<!--[if lt IE 9]>
<script src="_static/css3-mediaqueries.js"></script>
<![endif]-->
</head><body>
<div class="related" role="navigation" aria-label="related navigation">
<h3>Navigation</h3>
<ul>
<li class="right" style="margin-right: 10px">
<a href="genindex.html" title="General Index"
accesskey="I">index</a></li>
<li class="right" >
<a href="py-modindex.html" title="Python Module Index"
>modules</a> |</li>
<li class="right" >
<a href="quapy.method.html" title="quapy.method package"
accesskey="P">previous</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.6 documentation</a> &#187;</li>
<li class="nav-item nav-item-1"><a href="modules.html" >quapy</a> &#187;</li>
<li class="nav-item nav-item-2"><a href="quapy.html" accesskey="U">quapy package</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">quapy.tests package</a></li>
</ul>
</div>
<div class="document">
<div class="documentwrapper">
<div class="bodywrapper">
<div class="body" role="main">
<div class="section" id="quapy-tests-package">
<h1>quapy.tests package<a class="headerlink" href="#quapy-tests-package" title="Permalink to this headline"></a></h1>
<div class="section" id="submodules">
<h2>Submodules<a class="headerlink" href="#submodules" title="Permalink to this headline"></a></h2>
</div>
<div class="section" id="quapy-tests-test-base-module">
<h2>quapy.tests.test_base module<a class="headerlink" href="#quapy-tests-test-base-module" title="Permalink to this headline"></a></h2>
</div>
<div class="section" id="quapy-tests-test-datasets-module">
<h2>quapy.tests.test_datasets module<a class="headerlink" href="#quapy-tests-test-datasets-module" title="Permalink to this headline"></a></h2>
</div>
<div class="section" id="quapy-tests-test-methods-module">
<h2>quapy.tests.test_methods module<a class="headerlink" href="#quapy-tests-test-methods-module" title="Permalink to this headline"></a></h2>
</div>
<div class="section" id="module-quapy.tests">
<span id="module-contents"></span><h2>Module contents<a class="headerlink" href="#module-quapy.tests" title="Permalink to this headline"></a></h2>
</div>
</div>
<div class="clearer"></div>
</div>
</div>
</div>
<div class="sphinxsidebar" role="navigation" aria-label="main navigation">
<div class="sphinxsidebarwrapper">
<h3><a href="index.html">Table of Contents</a></h3>
<ul>
<li><a class="reference internal" href="#">quapy.tests package</a><ul>
<li><a class="reference internal" href="#submodules">Submodules</a></li>
<li><a class="reference internal" href="#quapy-tests-test-base-module">quapy.tests.test_base module</a></li>
<li><a class="reference internal" href="#quapy-tests-test-datasets-module">quapy.tests.test_datasets module</a></li>
<li><a class="reference internal" href="#quapy-tests-test-methods-module">quapy.tests.test_methods module</a></li>
<li><a class="reference internal" href="#module-quapy.tests">Module contents</a></li>
</ul>
</li>
</ul>
<h4>Previous topic</h4>
<p class="topless"><a href="quapy.method.html"
title="previous chapter">quapy.method package</a></p>
<div role="note" aria-label="source link">
<h3>This Page</h3>
<ul class="this-page-menu">
<li><a href="_sources/quapy.tests.rst.txt"
rel="nofollow">Show Source</a></li>
</ul>
</div>
<div id="searchbox" style="display: none" role="search">
<h3 id="searchlabel">Quick search</h3>
<div class="searchformwrapper">
<form class="search" action="search.html" method="get">
<input type="text" name="q" aria-labelledby="searchlabel" autocomplete="off" autocorrect="off" autocapitalize="off" spellcheck="false"/>
<input type="submit" value="Go" />
</form>
</div>
</div>
<script>$('#searchbox').show(0);</script>
</div>
</div>
<div class="clearer"></div>
</div>
<div class="related" role="navigation" aria-label="related navigation">
<h3>Navigation</h3>
<ul>
<li class="right" style="margin-right: 10px">
<a href="genindex.html" title="General Index"
>index</a></li>
<li class="right" >
<a href="py-modindex.html" title="Python Module Index"
>modules</a> |</li>
<li class="right" >
<a href="quapy.method.html" title="quapy.method package"
>previous</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.6 documentation</a> &#187;</li>
<li class="nav-item nav-item-1"><a href="modules.html" >quapy</a> &#187;</li>
<li class="nav-item nav-item-2"><a href="quapy.html" >quapy package</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">quapy.tests package</a></li>
</ul>
</div>
<div class="footer" role="contentinfo">
&#169; Copyright 2021, Alejandro Moreo.
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 4.2.0.
</div>
</body>
</html>

View File

@ -1,129 +0,0 @@
<!doctype html>
<html>
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Getting Started &#8212; QuaPy 0.1.6 documentation</title>
<link rel="stylesheet" type="text/css" href="_static/pygments.css" />
<link rel="stylesheet" type="text/css" href="_static/bizstyle.css" />
<script data-url_root="./" id="documentation_options" src="_static/documentation_options.js"></script>
<script src="_static/jquery.js"></script>
<script src="_static/underscore.js"></script>
<script src="_static/doctools.js"></script>
<script src="_static/bizstyle.js"></script>
<link rel="index" title="Index" href="genindex.html" />
<link rel="search" title="Search" href="search.html" />
<link rel="next" title="quapy" href="modules.html" />
<link rel="prev" title="Welcome to QuaPys documentation!" href="index.html" />
<meta name="viewport" content="width=device-width,initial-scale=1.0" />
<!--[if lt IE 9]>
<script src="_static/css3-mediaqueries.js"></script>
<![endif]-->
</head><body>
<div class="related" role="navigation" aria-label="related navigation">
<h3>Navigation</h3>
<ul>
<li class="right" style="margin-right: 10px">
<a href="genindex.html" title="General Index"
accesskey="I">index</a></li>
<li class="right" >
<a href="py-modindex.html" title="Python Module Index"
>modules</a> |</li>
<li class="right" >
<a href="modules.html" title="quapy"
accesskey="N">next</a> |</li>
<li class="right" >
<a href="index.html" title="Welcome to QuaPys documentation!"
accesskey="P">previous</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.6 documentation</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">Getting Started</a></li>
</ul>
</div>
<div class="document">
<div class="documentwrapper">
<div class="bodywrapper">
<div class="body" role="main">
<div class="section" id="getting-started">
<h1>Getting Started<a class="headerlink" href="#getting-started" title="Permalink to this headline"></a></h1>
<p>QuaPy is an open source framework for Quantification (a.k.a. Supervised Prevalence Estimation) written in Python.</p>
<div class="section" id="installation">
<h2>Installation<a class="headerlink" href="#installation" title="Permalink to this headline"></a></h2>
<div class="doctest highlight-default notranslate"><div class="highlight"><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="n">pip</span> <span class="n">install</span> <span class="n">quapy</span>
</pre></div>
</div>
</div>
</div>
<div class="clearer"></div>
</div>
</div>
</div>
<div class="sphinxsidebar" role="navigation" aria-label="main navigation">
<div class="sphinxsidebarwrapper">
<h3><a href="index.html">Table of Contents</a></h3>
<ul>
<li><a class="reference internal" href="#">Getting Started</a><ul>
<li><a class="reference internal" href="#installation">Installation</a></li>
</ul>
</li>
</ul>
<h4>Previous topic</h4>
<p class="topless"><a href="index.html"
title="previous chapter">Welcome to QuaPys documentation!</a></p>
<h4>Next topic</h4>
<p class="topless"><a href="modules.html"
title="next chapter">quapy</a></p>
<div role="note" aria-label="source link">
<h3>This Page</h3>
<ul class="this-page-menu">
<li><a href="_sources/readme.rst.txt"
rel="nofollow">Show Source</a></li>
</ul>
</div>
<div id="searchbox" style="display: none" role="search">
<h3 id="searchlabel">Quick search</h3>
<div class="searchformwrapper">
<form class="search" action="search.html" method="get">
<input type="text" name="q" aria-labelledby="searchlabel" autocomplete="off" autocorrect="off" autocapitalize="off" spellcheck="false"/>
<input type="submit" value="Go" />
</form>
</div>
</div>
<script>$('#searchbox').show(0);</script>
</div>
</div>
<div class="clearer"></div>
</div>
<div class="related" role="navigation" aria-label="related navigation">
<h3>Navigation</h3>
<ul>
<li class="right" style="margin-right: 10px">
<a href="genindex.html" title="General Index"
>index</a></li>
<li class="right" >
<a href="py-modindex.html" title="Python Module Index"
>modules</a> |</li>
<li class="right" >
<a href="modules.html" title="quapy"
>next</a> |</li>
<li class="right" >
<a href="index.html" title="Welcome to QuaPys documentation!"
>previous</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.6 documentation</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">Getting Started</a></li>
</ul>
</div>
<div class="footer" role="contentinfo">
&#169; Copyright 2021, Alejandro Moreo.
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 4.2.0.
</div>
</body>
</html>

View File

@ -1,92 +0,0 @@
<!doctype html>
<html>
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>&lt;no title&gt; &#8212; QuaPy 0.1.6 documentation</title>
<link rel="stylesheet" type="text/css" href="_static/pygments.css" />
<link rel="stylesheet" type="text/css" href="_static/bizstyle.css" />
<script data-url_root="./" id="documentation_options" src="_static/documentation_options.js"></script>
<script src="_static/jquery.js"></script>
<script src="_static/underscore.js"></script>
<script src="_static/doctools.js"></script>
<script src="_static/bizstyle.js"></script>
<link rel="index" title="Index" href="genindex.html" />
<link rel="search" title="Search" href="search.html" />
<meta name="viewport" content="width=device-width,initial-scale=1.0" />
<!--[if lt IE 9]>
<script src="_static/css3-mediaqueries.js"></script>
<![endif]-->
</head><body>
<div class="related" role="navigation" aria-label="related navigation">
<h3>Navigation</h3>
<ul>
<li class="right" style="margin-right: 10px">
<a href="genindex.html" title="General Index"
accesskey="I">index</a></li>
<li class="right" >
<a href="py-modindex.html" title="Python Module Index"
>modules</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.6 documentation</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">&lt;no title&gt;</a></li>
</ul>
</div>
<div class="document">
<div class="documentwrapper">
<div class="bodywrapper">
<div class="body" role="main">
<p>.. include:: ../../README.md</p>
<div class="clearer"></div>
</div>
</div>
</div>
<div class="sphinxsidebar" role="navigation" aria-label="main navigation">
<div class="sphinxsidebarwrapper">
<div role="note" aria-label="source link">
<h3>This Page</h3>
<ul class="this-page-menu">
<li><a href="_sources/readme2.md.txt"
rel="nofollow">Show Source</a></li>
</ul>
</div>
<div id="searchbox" style="display: none" role="search">
<h3 id="searchlabel">Quick search</h3>
<div class="searchformwrapper">
<form class="search" action="search.html" method="get">
<input type="text" name="q" aria-labelledby="searchlabel" autocomplete="off" autocorrect="off" autocapitalize="off" spellcheck="false"/>
<input type="submit" value="Go" />
</form>
</div>
</div>
<script>$('#searchbox').show(0);</script>
</div>
</div>
<div class="clearer"></div>
</div>
<div class="related" role="navigation" aria-label="related navigation">
<h3>Navigation</h3>
<ul>
<li class="right" style="margin-right: 10px">
<a href="genindex.html" title="General Index"
>index</a></li>
<li class="right" >
<a href="py-modindex.html" title="Python Module Index"
>modules</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.6 documentation</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">&lt;no title&gt;</a></li>
</ul>
</div>
<div class="footer" role="contentinfo">
&#169; Copyright 2021, Alejandro Moreo.
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 4.2.0.
</div>
</body>
</html>

View File

@ -2,11 +2,11 @@
<!doctype html>
<html>
<html lang="en">
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Search &#8212; QuaPy 0.1.6 documentation</title>
<title>Search &#8212; QuaPy 0.1.7 documentation</title>
<link rel="stylesheet" type="text/css" href="_static/pygments.css" />
<link rel="stylesheet" type="text/css" href="_static/bizstyle.css" />
@ -14,7 +14,9 @@
<script data-url_root="./" id="documentation_options" src="_static/documentation_options.js"></script>
<script src="_static/jquery.js"></script>
<script src="_static/underscore.js"></script>
<script src="_static/_sphinx_javascript_frameworks_compat.js"></script>
<script src="_static/doctools.js"></script>
<script src="_static/sphinx_highlight.js"></script>
<script src="_static/bizstyle.js"></script>
<script src="_static/searchtools.js"></script>
<script src="_static/language_data.js"></script>
@ -37,7 +39,7 @@
<li class="right" >
<a href="py-modindex.html" title="Python Module Index"
>modules</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.6 documentation</a> &#187;</li>
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.7 documentation</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">Search</a></li>
</ul>
</div>
@ -97,13 +99,13 @@
<li class="right" >
<a href="py-modindex.html" title="Python Module Index"
>modules</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.6 documentation</a> &#187;</li>
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.7 documentation</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">Search</a></li>
</ul>
</div>
<div class="footer" role="contentinfo">
&#169; Copyright 2021, Alejandro Moreo.
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 4.2.0.
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 5.3.0.
</div>
</body>
</html>

File diff suppressed because one or more lines are too long

View File

@ -0,0 +1,69 @@
import quapy as qp
from quapy.data import LabelledCollection
from quapy.method.base import BinaryQuantifier
from quapy.model_selection import GridSearchQ
from quapy.method.aggregative import AggregativeProbabilisticQuantifier
from quapy.protocol import APP
import numpy as np
from sklearn.linear_model import LogisticRegression
# Define a custom quantifier: for this example, we will consider a new quantification algorithm that uses a
# logistic regressor for generating posterior probabilities, and then applies a custom threshold value to the
# posteriors. Since the quantifier internally uses a classifier, it is an aggregative quantifier; and since it
# relies on posterior probabilities, it is a probabilistic-aggregative quantifier. Note also it has an
# internal hyperparameter (let say, alpha) which is the decision threshold. Let's also assume the quantifier
# is binary, for simplicity.
class MyQuantifier(AggregativeProbabilisticQuantifier, BinaryQuantifier):
def __init__(self, classifier, alpha=0.5):
self.alpha = alpha
# aggregative quantifiers have an internal self.classifier attribute
self.classifier = classifier
def fit(self, data: LabelledCollection, fit_classifier=True):
assert fit_classifier, 'this quantifier needs to fit the classifier!'
self.classifier.fit(*data.Xy)
return self
# in general, we would need to implement the method quantify(self, instances) but, since this method is of
# type aggregative, we can simply implement the method aggregate, which has the following interface
def aggregate(self, classif_predictions: np.ndarray):
# the posterior probabilities have already been generated by the quantify method; we only need to
# specify what to do with them
positive_probabilities = classif_predictions[:, 1]
crisp_decisions = positive_probabilities > self.alpha
pos_prev = crisp_decisions.mean()
neg_prev = 1-pos_prev
return np.asarray([neg_prev, pos_prev])
if __name__ == '__main__':
qp.environ['SAMPLE_SIZE'] = 100
# define an instance of our custom quantifier
quantifier = MyQuantifier(LogisticRegression(), alpha=0.5)
# load the IMDb dataset
train, test = qp.datasets.fetch_reviews('imdb', tfidf=True, min_df=5).train_test
# model selection
# let us assume we want to explore our hyperparameter alpha along with one hyperparameter of the classifier
train, val = train.split_stratified(train_prop=0.75)
param_grid = {
'alpha': np.linspace(0, 1, 11), # quantifier-dependent hyperparameter
'classifier__C': np.logspace(-2, 2, 5) # classifier-dependent hyperparameter
}
quantifier = GridSearchQ(quantifier, param_grid, protocol=APP(val), n_jobs=-1, verbose=True).fit(train)
# evaluation
mae = qp.evaluation.evaluate(quantifier, protocol=APP(test), error_metric='mae')
print(f'MAE = {mae:.4f}')
# final remarks: this method is only for demonstration purposes and makes little sense in general. The method relies
# on an hyperparameter alpha for binarizing the posterior probabilities. A much better way for fulfilling this
# goal would be to calibrate the classifier (LogisticRegression is already reasonably well calibrated) and then
# simply cut at 0.5.

View File

@ -0,0 +1,72 @@
import quapy as qp
from quapy.method.aggregative import newELM
from quapy.method.base import newOneVsAll
from quapy.model_selection import GridSearchQ
from quapy.protocol import UPP
"""
In this example, we will show hoy to define a quantifier based on explicit loss minimization (ELM).
ELM is a family of quantification methods relying on structured output learning. In particular, we will
showcase how to instantiate SVM(Q) as proposed by `Barranquero et al. 2015
<https://www.sciencedirect.com/science/article/pii/S003132031400291X>`_, and SVM(KLD) and SVM(nKLD) as proposed by
`Esuli et al. 2015 <https://dl.acm.org/doi/abs/10.1145/2700406>`_.
All ELM quantifiers rely on SVMperf for optimizing a structured loss function (Q, KLD, or nKLD). Since these are
not part of the original SVMperf package by Joachims, you have to first download the SVMperf package, apply the
patch svm-perf-quantification-ext.patch (provided with QuaPy library), and compile the sources.
The script prepare_svmperf.sh does all the job. Simply run:
>>> ./prepare_svmperf.sh
Note that ELM quantifiers are nothing but a classify and count (CC) model instantiated with SVMperf as the
underlying classifier. E.g., SVM(Q) comes down to:
>>> CC(SVMperf(svmperf_base, loss='q'))
this means that ELM are aggregative quantifiers (since CC is an aggregative quantifier). QuaPy provides some helper
functions for simplify this; for example:
>>> newSVMQ(svmperf_base)
returns an instance of SVM(Q) (i.e., an instance of CC properly set to work with SVMperf optimizing for Q.
Since we wan to explore the losses, we will instead use newELM. For this example we will create a quantifier for tweet
sentiment analysis considering three classes: negative, neutral, and positive. Since SVMperf is a binary classifier,
our quantifier will be binary as well. We will use a one-vs-all approach to work in multiclass model.
For more details about how one-vs-all works, we refer to the example "one_vs_all.py" and to the API documentation.
"""
qp.environ['SAMPLE_SIZE'] = 100
qp.environ['N_JOBS'] = -1
qp.environ['SVMPERF_HOME'] = '../svm_perf_quantification'
quantifier = newOneVsAll(newELM())
print(f'the quantifier is an instance of {quantifier.__class__.__name__}')
# load a ternary dataset
train_modsel, val = qp.datasets.fetch_twitter('hcr', for_model_selection=True, pickle=True).train_test
"""
model selection:
We explore the classifier's loss and the classifier's C hyperparameters.
Since our model is actually an instance of OneVsAllAggregative, we need to add the prefix "binary_quantifier", and
since our binary quantifier is an instance of CC, we need to add the prefix "classifier".
"""
param_grid = {
'binary_quantifier__classifier__loss': ['q', 'kld', 'mae'], # classifier-dependent hyperparameter
'binary_quantifier__classifier__C': [0.01, 1, 100], # classifier-dependent hyperparameter
}
print('starting model selection')
model_selection = GridSearchQ(quantifier, param_grid, protocol=UPP(val), verbose=True, refit=False)
quantifier = model_selection.fit(train_modsel).best_model()
print('training on the whole training set')
train, test = qp.datasets.fetch_twitter('hcr', for_model_selection=False, pickle=True).train_test
quantifier.fit(train)
# evaluation
mae = qp.evaluation.evaluate(quantifier, protocol=UPP(test), error_metric='mae')
print(f'MAE = {mae:.4f}')

View File

@ -0,0 +1,53 @@
import numpy as np
from sklearn.linear_model import LogisticRegression
import quapy as qp
import quapy.functional as F
from quapy.data.datasets import LEQUA2022_SAMPLE_SIZE, fetch_lequa2022
from quapy.evaluation import evaluation_report
from quapy.method.aggregative import EMQ
from quapy.model_selection import GridSearchQ
import pandas as pd
"""
This example shows hoy to use the LeQua datasets (new in v0.1.7). For more information about the datasets, and the
LeQua competition itself, check:
https://lequa2022.github.io/index (the site of the competition)
https://ceur-ws.org/Vol-3180/paper-146.pdf (the overview paper)
"""
# there are 4 tasks (T1A, T1B, T2A, T2B)
task = 'T1A'
# set the sample size in the environment. The sample size is task-dendendent and can be consulted by doing:
qp.environ['SAMPLE_SIZE'] = LEQUA2022_SAMPLE_SIZE[task]
qp.environ['N_JOBS'] = -1
# the fetch method returns a training set (an instance of LabelledCollection) and two generators: one for the
# validation set and another for the test sets. These generators are both instances of classes that extend
# AbstractProtocol (i.e., classes that implement sampling generation procedures) and, in particular, are instances
# of SamplesFromDir, a protocol that simply iterates over pre-generated samples (those provided for the competition)
# stored in a directory.
training, val_generator, test_generator = fetch_lequa2022(task=task)
# define the quantifier
quantifier = EMQ(classifier=LogisticRegression())
# model selection
param_grid = {
'classifier__C': np.logspace(-3, 3, 7), # classifier-dependent: inverse of regularization strength
'classifier__class_weight': ['balanced', None], # classifier-dependent: weights of each class
'recalib': ['bcts', 'platt', None] # quantifier-dependent: recalibration method (new in v0.1.7)
}
model_selection = GridSearchQ(quantifier, param_grid, protocol=val_generator, error='mrae', refit=False, verbose=True)
quantifier = model_selection.fit(training)
# evaluation
report = evaluation_report(quantifier, protocol=test_generator, error_metrics=['mae', 'mrae', 'mkld'], verbose=True)
# printing results
pd.set_option('display.expand_frame_repr', False)
report['estim-prev'] = report['estim-prev'].map(F.strprev)
print(report)
print('Averaged values:')
print(report.mean())

View File

@ -0,0 +1,63 @@
import numpy as np
from abstention.calibration import NoBiasVectorScaling, VectorScaling, TempScaling
from sklearn.calibration import CalibratedClassifierCV
from sklearn.linear_model import LogisticRegression
import quapy as qp
import quapy.functional as F
from classification.calibration import RecalibratedProbabilisticClassifierBase, NBVSCalibration, \
BCTSCalibration
from data.datasets import LEQUA2022_SAMPLE_SIZE, fetch_lequa2022
from evaluation import evaluation_report
from method.aggregative import EMQ
from model_selection import GridSearchQ
import pandas as pd
for task in ['T1A', 'T1B']:
# calibration = TempScaling(verbose=False, bias_positions='all')
qp.environ['SAMPLE_SIZE'] = LEQUA2022_SAMPLE_SIZE[task]
training, val_generator, test_generator = fetch_lequa2022(task=task)
# define the quantifier
# learner = BCTSCalibration(LogisticRegression(), n_jobs=-1)
# learner = CalibratedClassifierCV(LogisticRegression())
learner = LogisticRegression()
quantifier = EMQ(classifier=learner)
# model selection
param_grid = {
'classifier__C': np.logspace(-3, 3, 7),
'classifier__class_weight': ['balanced', None],
'recalib': ['platt', 'ts', 'vs', 'nbvs', 'bcts', None],
'exact_train_prev': [False, True]
}
model_selection = GridSearchQ(quantifier, param_grid, protocol=val_generator, error='mrae', n_jobs=-1, refit=False, verbose=True)
quantifier = model_selection.fit(training)
# evaluation
report = evaluation_report(quantifier, protocol=test_generator, error_metrics=['mae', 'mrae', 'mkld'], verbose=True)
# import os
# os.makedirs(f'./out', exist_ok=True)
# with open(f'./out/EMQ_{calib}_{task}.txt', 'wt') as foo:
# estim_prev = report['estim-prev'].values
# nclasses = len(estim_prev[0])
# foo.write(f'id,'+','.join([str(x) for x in range(nclasses)])+'\n')
# for id, prev in enumerate(estim_prev):
# foo.write(f'{id},'+','.join([f'{p:.5f}' for p in prev])+'\n')
#
# #os.makedirs(f'./errors/{task}', exist_ok=True)
# with open(f'./out/EMQ_{calib}_{task}_errors.txt', 'wt') as foo:
# maes, mraes = report['mae'].values, report['mrae'].values
# foo.write(f'id,AE,RAE\n')
# for id, (ae_i, rae_i) in enumerate(zip(maes, mraes)):
# foo.write(f'{id},{ae_i:.5f},{rae_i:.5f}\n')
# printing results
pd.set_option('display.expand_frame_repr', False)
report['estim-prev'] = report['estim-prev'].map(F.strprev)
print(report)
print('Averaged values:')
print(report.mean())

View File

@ -0,0 +1,57 @@
import quapy as qp
from quapy.protocol import APP
from quapy.method.aggregative import DistributionMatching
from sklearn.linear_model import LogisticRegression
import numpy as np
"""
In this example, we show how to perform model selection on a DistributionMatching quantifier.
"""
model = DistributionMatching(LogisticRegression())
qp.environ['SAMPLE_SIZE'] = 100
qp.environ['N_JOBS'] = -1
training, test = qp.datasets.fetch_reviews('imdb', tfidf=True, min_df=5).train_test
# The model will be returned by the fit method of GridSearchQ.
# Every combination of hyper-parameters will be evaluated by confronting the
# quantifier thus configured against a series of samples generated by means
# of a sample generation protocol. For this example, we will use the
# artificial-prevalence protocol (APP), that generates samples with prevalence
# values in the entire range of values from a grid (e.g., [0, 0.1, 0.2, ..., 1]).
# We devote 30% of the dataset for this exploration.
training, validation = training.split_stratified(train_prop=0.7)
protocol = APP(validation)
# We will explore a classification-dependent hyper-parameter (e.g., the 'C'
# hyper-parameter of LogisticRegression) and a quantification-dependent hyper-parameter
# (e.g., the number of bins in a DistributionMatching quantifier.
# Classifier-dependent hyper-parameters have to be marked with a prefix "classifier__"
# in order to let the quantifier know this hyper-parameter belongs to its underlying
# classifier.
param_grid = {
'classifier__C': np.logspace(-3,3,7),
'nbins': [8, 16, 32, 64],
}
model = qp.model_selection.GridSearchQ(
model=model,
param_grid=param_grid,
protocol=protocol,
error='mae', # the error to optimize is the MAE (a quantification-oriented loss)
refit=True, # retrain on the whole labelled set once done
verbose=True # show information as the process goes on
).fit(training)
print(f'model selection ended: best hyper-parameters={model.best_params_}')
model = model.best_model_
# evaluation in terms of MAE
# we use the same evaluation protocol (APP) on the test set
mae_score = qp.evaluation.evaluate(model, protocol=APP(test), error_metric='mae')
print(f'MAE={mae_score:.5f}')

54
examples/one_vs_all.py Normal file
View File

@ -0,0 +1,54 @@
import quapy as qp
from quapy.method.aggregative import MS2
from quapy.method.base import newOneVsAll
from quapy.model_selection import GridSearchQ
from quapy.protocol import UPP
from sklearn.linear_model import LogisticRegression
import numpy as np
"""
In this example, we will create a quantifier for tweet sentiment analysis considering three classes: negative, neutral,
and positive. We will use a one-vs-all approach using a binary quantifier for demonstration purposes.
"""
qp.environ['SAMPLE_SIZE'] = 100
qp.environ['N_JOBS'] = -1
"""
Any binary quantifier can be turned into a single-label quantifier by means of getOneVsAll function.
This function returns an instance of OneVsAll quantifier. Actually, it either returns the subclass OneVsAllGeneric
when the quantifier is an instance of BaseQuantifier, and it returns OneVsAllAggregative when the quantifier is
an instance of AggregativeQuantifier. Although OneVsAllGeneric works in all cases, using OneVsAllAggregative has
some additional advantages (namely, all the advantages that AggregativeQuantifiers enjoy, i.e., faster predictions
during evaluation).
"""
quantifier = newOneVsAll(MS2(LogisticRegression()))
print(f'the quantifier is an instance of {quantifier.__class__.__name__}')
# load a ternary dataset
train_modsel, val = qp.datasets.fetch_twitter('hcr', for_model_selection=True, pickle=True).train_test
"""
model selection: for this example, we are relying on the UPP protocol, i.e., a variant of the
artificial-prevalence protocol that generates random samples (100 in this case) for randomly picked priors
from the unit simplex. The priors are sampled using the Kraemer algorithm. Note this is in contrast to the
standard APP protocol, that instead explores a prefixed grid of prevalence values.
"""
param_grid = {
'binary_quantifier__classifier__C': np.logspace(-2,2,5), # classifier-dependent hyperparameter
'binary_quantifier__classifier__class_weight': ['balanced', None] # classifier-dependent hyperparameter
}
print('starting model selection')
model_selection = GridSearchQ(quantifier, param_grid, protocol=UPP(val), verbose=True, refit=False)
quantifier = model_selection.fit(train_modsel).best_model()
print('training on the whole training set')
train, test = qp.datasets.fetch_twitter('hcr', for_model_selection=False, pickle=True).train_test
quantifier.fit(train)
# evaluation
mae = qp.evaluation.evaluate(quantifier, protocol=UPP(test), error_metric='mae')
print(f'MAE = {mae:.4f}')

View File

@ -0,0 +1,35 @@
import quapy as qp
from quapy.classification.neural import CNNnet
from quapy.classification.neural import NeuralClassifierTrainer
from quapy.method.meta import QuaNet
import quapy.functional as F
"""
This example shows how to train QuaNet. The internal classifier is a word-based CNN.
"""
# set the sample size in the environment
qp.environ["SAMPLE_SIZE"] = 100
# the dataset is textual (Kindle reviews from Amazon), so we need to index terms, i.e.,
# we need to convert distinct terms into numerical ids
dataset = qp.datasets.fetch_reviews('kindle', pickle=True)
qp.data.preprocessing.index(dataset, min_df=5, inplace=True)
train, test = dataset.train_test
# train the text classifier:
cnn_module = CNNnet(dataset.vocabulary_size, dataset.training.n_classes)
cnn_classifier = NeuralClassifierTrainer(cnn_module, device='cuda')
cnn_classifier.fit(*dataset.training.Xy)
# train QuaNet (alternatively, we can set fit_classifier=True and let QuaNet train the classifier)
quantifier = QuaNet(cnn_classifier, device='cuda')
quantifier.fit(train, fit_classifier=False)
# prediction and evaluation
estim_prevalence = quantifier.quantify(test.instances)
mae = qp.error.mae(test.prevalence(), estim_prevalence)
print(f'true prevalence: {F.strprev(test.prevalence())}')
print(f'estim prevalence: {F.strprev(estim_prevalence)}')
print(f'MAE = {mae:.4f}')

79
quapy/CHANGE_LOG.txt Normal file
View File

@ -0,0 +1,79 @@
Change Log 0.1.7
----------------
- Protocols are now abstracted as instances of AbstractProtocol. There is a new class extending AbstractProtocol called
AbstractStochasticSeededProtocol, which implements a seeding policy to allow replicate the series of samplings.
There are some examples of protocols, APP, NPP, UPP, DomainMixer (experimental).
The idea is to start the sample generation by simply calling the __call__ method.
This change has a great impact in the framework, since many functions in qp.evaluation, qp.model_selection,
and sampling functions in LabelledCollection relied of the old functions. E.g., the functionality of
qp.evaluation.artificial_prevalence_report or qp.evaluation.natural_prevalence_report is now obtained by means of
qp.evaluation.report which takes a protocol as an argument. I have not maintained compatibility with the old
interfaces because I did not really like them. Check the wiki guide and the examples for more details.
- Exploration of hyperparameters in Model selection can now be run in parallel (there was a n_jobs argument in
QuaPy 0.1.6 but only the evaluation part for one specific hyperparameter was run in parallel).
- The prediction function has been refactored, so it applies the optimization for aggregative quantifiers (that
consists in pre-classifying all instances, and then only invoking aggregate on the samples) only in cases in
which the total number of classifications would be smaller than the number of classifications with the standard
procedure. The user can now specify "force", "auto", True of False, in order to actively decide for applying it
or not.
- examples directory created!
- DyS, Topsoe distance and binary search (thanks to Pablo González)
- Multi-thread reproducibility via seeding (thanks to Pablo González)
- n_jobs is now taken from the environment if set to None
- ACC, PACC, Forman's threshold variants have been parallelized.
- cross_val_predict (for quantification) added to model_selection: would be nice to allow the user specifies a
test protocol maybe, or None for bypassing it?
- Bugfix: adding two labelled collections (with +) now checks for consistency in the classes
- newer versions of numpy raise a warning when accessing types (e.g., np.float). I have replaced all such instances
with the plain python type (e.g., float).
- new dependency "abstention" (to add to the project requirements and setup). Calibration methods from
https://github.com/kundajelab/abstention added.
- the internal classifier of aggregative methods is now called "classifier" instead of "learner"
- when optimizing the hyperparameters of an aggregative quantifier, the classifier's specific hyperparameters
should be marked with a "classifier__" prefix (just like in scikit-learn with estimators), while the quantifier's
specific hyperparameters are named directly. For example, PCC(LogisticRegression()) quantifier has hyperparameters
"classifier__C", "classifier__class_weight", etc., instead of "C" and "class_weight" as in v0.1.6.
- hyperparameters yielding to inconsistent runs raise a ValueError exception, while hyperparameter combinations
yielding to internal errors of surrogate functions are reported and skipped, without stopping the grid search.
- DistributionMatching methods added. This is a general framework for distribution matching methods that catters for
multiclass quantification. That is to say, one could get a multiclass variant of the (originally binary) HDy
method aligned with the Firat's formulation.
- internal method properties "binary", "aggregative", and "probabilistic" have been removed; these conditions are
checked via isinstance
- quantifiers (i.e., classes that inherit from BaseQuantifier) are not forced to implement classes_ or n_classes;
these can be used anyway internally, but the framework will not suppose (nor impose) that a quantifier implements
them
- qp.evaluation.prediction has been optimized so that, if a quantifier is of type aggregative, and if the evaluation
protocol is of type OnLabelledCollection, then the computation is faster. In this specific case, the predictions
are issued only once and for all, and not for each sample. An exception to this (which is implement also), is
when the number of instances across all samples is anyway smaller than the number of instances in the original
labelled collection; in this case the heuristic is of no help, and is therefore not applied.
- the distinction between "classify" and "posterior_probabilities" has been removed in Aggregative quantifiers,
so that probabilistic classifiers return posterior probabilities, while non-probabilistic quantifiers
return crisp decisions.
- OneVsAll fixed. There are now two classes: a generic one OneVsAllGeneric that works with any quantifier (e.g.,
any instance of BaseQuantifier), and a subclass of it called OneVsAllAggregative which implements the
classify / aggregate interface. Both are instances of OneVsAll. There is a method getOneVsAll that returns the
best instance based on the type of quantifier.

View File

@ -2,15 +2,15 @@ from . import error
from . import data
from quapy.data import datasets
from . import functional
from . import method
# from . import method
from . import evaluation
from . import protocol
from . import plot
from . import util
from . import model_selection
from . import classification
from quapy.method.base import isprobabilistic, isaggregative
__version__ = '0.1.6'
__version__ = '0.1.7'
environ = {
'SAMPLE_SIZE': None,
@ -18,8 +18,33 @@ environ = {
'UNK_INDEX': 0,
'PAD_TOKEN': '[PAD]',
'PAD_INDEX': 1,
'SVMPERF_HOME': './svm_perf_quantification'
'SVMPERF_HOME': './svm_perf_quantification',
'N_JOBS': 1
}
def isbinary(x):
return x.binary
def _get_njobs(n_jobs):
"""
If `n_jobs` is None, then it returns `environ['N_JOBS']`; if otherwise, returns `n_jobs`.
:param n_jobs: the number of `n_jobs` or None if not specified
:return: int
"""
return environ['N_JOBS'] if n_jobs is None else n_jobs
def _get_sample_size(sample_size):
"""
If `sample_size` is None, then it returns `environ['SAMPLE_SIZE']`; if otherwise, returns `sample_size`.
If none of these are set, then a ValueError exception is raised.
:param sample_size: integer or None
:return: int
"""
sample_size = environ['SAMPLE_SIZE'] if sample_size is None else sample_size
if sample_size is None:
raise ValueError('neither sample_size nor qp.environ["SAMPLE_SIZE"] have been specified')
return sample_size

View File

@ -0,0 +1,215 @@
from copy import deepcopy
from abstention.calibration import NoBiasVectorScaling, TempScaling, VectorScaling
from sklearn.base import BaseEstimator, clone
from sklearn.model_selection import cross_val_predict, train_test_split
import numpy as np
# Wrappers of calibration defined by Alexandari et al. in paper <http://proceedings.mlr.press/v119/alexandari20a.html>
# requires "pip install abstension"
# see https://github.com/kundajelab/abstention
class RecalibratedProbabilisticClassifier:
"""
Abstract class for (re)calibration method from `abstention.calibration`, as defined in
`Alexandari, A., Kundaje, A., & Shrikumar, A. (2020, November). Maximum likelihood with bias-corrected calibration
is hard-to-beat at label shift adaptation. In International Conference on Machine Learning (pp. 222-232). PMLR.
<http://proceedings.mlr.press/v119/alexandari20a.html>`_:
"""
pass
class RecalibratedProbabilisticClassifierBase(BaseEstimator, RecalibratedProbabilisticClassifier):
"""
Applies a (re)calibration method from `abstention.calibration`, as defined in
`Alexandari et al. paper <http://proceedings.mlr.press/v119/alexandari20a.html>`_:
:param classifier: a scikit-learn probabilistic classifier
:param calibrator: the calibration object (an instance of abstention.calibration.CalibratorFactory)
:param val_split: indicate an integer k for performing kFCV to obtain the posterior probabilities, or a float p
in (0,1) to indicate that the posteriors are obtained in a stratified validation split containing p% of the
training instances (the rest is used for training). In any case, the classifier is retrained in the whole
training set afterwards. Default value is 5.
:param n_jobs: indicate the number of parallel workers (only when val_split is an integer); default=None
:param verbose: whether or not to display information in the standard output
"""
def __init__(self, classifier, calibrator, val_split=5, n_jobs=None, verbose=False):
self.classifier = classifier
self.calibrator = calibrator
self.val_split = val_split
self.n_jobs = n_jobs
self.verbose = verbose
def fit(self, X, y):
"""
Fits the calibration for the probabilistic classifier.
:param X: array-like of shape `(n_samples, n_features)` with the data instances
:param y: array-like of shape `(n_samples,)` with the class labels
:return: self
"""
k = self.val_split
if isinstance(k, int):
if k < 2:
raise ValueError('wrong value for val_split: the number of folds must be > 2')
return self.fit_cv(X, y)
elif isinstance(k, float):
if not (0 < k < 1):
raise ValueError('wrong value for val_split: the proportion of validation documents must be in (0,1)')
return self.fit_cv(X, y)
def fit_cv(self, X, y):
"""
Fits the calibration in a cross-validation manner, i.e., it generates posterior probabilities for all
training instances via cross-validation, and then retrains the classifier on all training instances.
The posterior probabilities thus generated are used for calibrating the outputs of the classifier.
:param X: array-like of shape `(n_samples, n_features)` with the data instances
:param y: array-like of shape `(n_samples,)` with the class labels
:return: self
"""
posteriors = cross_val_predict(
self.classifier, X, y, cv=self.val_split, n_jobs=self.n_jobs, verbose=self.verbose, method='predict_proba'
)
self.classifier.fit(X, y)
nclasses = len(np.unique(y))
self.calibration_function = self.calibrator(posteriors, np.eye(nclasses)[y], posterior_supplied=True)
return self
def fit_tr_val(self, X, y):
"""
Fits the calibration in a train/val-split manner, i.e.t, it partitions the training instances into a
training and a validation set, and then uses the training samples to learn classifier which is then used
to generate posterior probabilities for the held-out validation data. These posteriors are used to calibrate
the classifier. The classifier is not retrained on the whole dataset.
:param X: array-like of shape `(n_samples, n_features)` with the data instances
:param y: array-like of shape `(n_samples,)` with the class labels
:return: self
"""
Xtr, Xva, ytr, yva = train_test_split(X, y, test_size=self.val_split, stratify=y)
self.classifier.fit(Xtr, ytr)
posteriors = self.classifier.predict_proba(Xva)
nclasses = len(np.unique(yva))
self.calibrator = self.calibrator(posteriors, np.eye(nclasses)[yva], posterior_supplied=True)
return self
def predict(self, X):
"""
Predicts class labels for the data instances in `X`
:param X: array-like of shape `(n_samples, n_features)` with the data instances
:return: array-like of shape `(n_samples,)` with the class label predictions
"""
return self.classifier.predict(X)
def predict_proba(self, X):
"""
Generates posterior probabilities for the data instances in `X`
:param X: array-like of shape `(n_samples, n_features)` with the data instances
:return: array-like of shape `(n_samples, n_classes)` with posterior probabilities
"""
posteriors = self.classifier.predict_proba(X)
return self.calibration_function(posteriors)
@property
def classes_(self):
"""
Returns the classes on which the classifier has been trained on
:return: array-like of shape `(n_classes)`
"""
return self.classifier.classes_
class NBVSCalibration(RecalibratedProbabilisticClassifierBase):
"""
Applies the No-Bias Vector Scaling (NBVS) calibration method from `abstention.calibration`, as defined in
`Alexandari et al. paper <http://proceedings.mlr.press/v119/alexandari20a.html>`_:
:param classifier: a scikit-learn probabilistic classifier
:param val_split: indicate an integer k for performing kFCV to obtain the posterior prevalences, or a float p
in (0,1) to indicate that the posteriors are obtained in a stratified validation split containing p% of the
training instances (the rest is used for training). In any case, the classifier is retrained in the whole
training set afterwards. Default value is 5.
:param n_jobs: indicate the number of parallel workers (only when val_split is an integer)
:param verbose: whether or not to display information in the standard output
"""
def __init__(self, classifier, val_split=5, n_jobs=None, verbose=False):
self.classifier = classifier
self.calibrator = NoBiasVectorScaling(verbose=verbose)
self.val_split = val_split
self.n_jobs = n_jobs
self.verbose = verbose
class BCTSCalibration(RecalibratedProbabilisticClassifierBase):
"""
Applies the Bias-Corrected Temperature Scaling (BCTS) calibration method from `abstention.calibration`, as defined in
`Alexandari et al. paper <http://proceedings.mlr.press/v119/alexandari20a.html>`_:
:param classifier: a scikit-learn probabilistic classifier
:param val_split: indicate an integer k for performing kFCV to obtain the posterior prevalences, or a float p
in (0,1) to indicate that the posteriors are obtained in a stratified validation split containing p% of the
training instances (the rest is used for training). In any case, the classifier is retrained in the whole
training set afterwards. Default value is 5.
:param n_jobs: indicate the number of parallel workers (only when val_split is an integer)
:param verbose: whether or not to display information in the standard output
"""
def __init__(self, classifier, val_split=5, n_jobs=None, verbose=False):
self.classifier = classifier
self.calibrator = TempScaling(verbose=verbose, bias_positions='all')
self.val_split = val_split
self.n_jobs = n_jobs
self.verbose = verbose
class TSCalibration(RecalibratedProbabilisticClassifierBase):
"""
Applies the Temperature Scaling (TS) calibration method from `abstention.calibration`, as defined in
`Alexandari et al. paper <http://proceedings.mlr.press/v119/alexandari20a.html>`_:
:param classifier: a scikit-learn probabilistic classifier
:param val_split: indicate an integer k for performing kFCV to obtain the posterior prevalences, or a float p
in (0,1) to indicate that the posteriors are obtained in a stratified validation split containing p% of the
training instances (the rest is used for training). In any case, the classifier is retrained in the whole
training set afterwards. Default value is 5.
:param n_jobs: indicate the number of parallel workers (only when val_split is an integer)
:param verbose: whether or not to display information in the standard output
"""
def __init__(self, classifier, val_split=5, n_jobs=None, verbose=False):
self.classifier = classifier
self.calibrator = TempScaling(verbose=verbose)
self.val_split = val_split
self.n_jobs = n_jobs
self.verbose = verbose
class VSCalibration(RecalibratedProbabilisticClassifierBase):
"""
Applies the Vector Scaling (VS) calibration method from `abstention.calibration`, as defined in
`Alexandari et al. paper <http://proceedings.mlr.press/v119/alexandari20a.html>`_:
:param classifier: a scikit-learn probabilistic classifier
:param val_split: indicate an integer k for performing kFCV to obtain the posterior prevalences, or a float p
in (0,1) to indicate that the posteriors are obtained in a stratified validation split containing p% of the
training instances (the rest is used for training). In any case, the classifier is retrained in the whole
training set afterwards. Default value is 5.
:param n_jobs: indicate the number of parallel workers (only when val_split is an integer)
:param verbose: whether or not to display information in the standard output
"""
def __init__(self, classifier, val_split=5, n_jobs=None, verbose=False):
self.classifier = classifier
self.calibrator = VectorScaling(verbose=verbose)
self.val_split = val_split
self.n_jobs = n_jobs
self.verbose = verbose

View File

@ -42,7 +42,7 @@ class NeuralClassifierTrainer:
batch_size=64,
batch_size_test=512,
padding_length=300,
device='cpu',
device='cuda',
checkpointpath='../checkpoint/classifier_net.dat'):
super().__init__()
@ -62,7 +62,6 @@ class NeuralClassifierTrainer:
}
self.learner_hyperparams = self.net.get_params()
self.checkpointpath = checkpointpath
self.classes_ = np.asarray([0, 1])
print(f'[NeuralNetwork running on {device}]')
os.makedirs(Path(checkpointpath).parent, exist_ok=True)
@ -174,6 +173,7 @@ class NeuralClassifierTrainer:
:return:
"""
train, val = LabelledCollection(instances, labels).split_stratified(1-val_split)
self.classes_ = train.classes_
opt = self.trainer_hyperparams
checkpoint = self.checkpointpath
self.reset_net_params(self.vocab_size, train.n_classes)
@ -229,11 +229,11 @@ class NeuralClassifierTrainer:
self.net.eval()
opt = self.trainer_hyperparams
with torch.no_grad():
positive_probs = []
posteriors = []
for xi in TorchDataset(instances).asDataloader(
opt['batch_size_test'], shuffle=False, pad_length=opt['padding_length'], device=opt['device']):
positive_probs.append(self.net.predict_proba(xi))
return np.concatenate(positive_probs)
posteriors.append(self.net.predict_proba(xi))
return np.concatenate(posteriors)
def transform(self, instances):
"""

View File

@ -1,5 +1,7 @@
import random
import shutil
import subprocess
import tempfile
from os import remove, makedirs
from os.path import join, exists
from subprocess import PIPE, STDOUT
@ -23,26 +25,34 @@ class SVMperf(BaseEstimator, ClassifierMixin):
:param C: trade-off between training error and margin (default 0.01)
:param verbose: set to True to print svm-perf std outputs
:param loss: the loss to optimize for. Available losses are "01", "f1", "kld", "nkld", "q", "qacc", "qf1", "qgm", "mae", "mrae".
:param host_folder: directory where to store the trained model; set to None (default) for using a tmp directory
(temporal directories are automatically deleted)
"""
# losses with their respective codes in svm_perf implementation
valid_losses = {'01':0, 'f1':1, 'kld':12, 'nkld':13, 'q':22, 'qacc':23, 'qf1':24, 'qgm':25, 'mae':26, 'mrae':27}
def __init__(self, svmperf_base, C=0.01, verbose=False, loss='01'):
def __init__(self, svmperf_base, C=0.01, verbose=False, loss='01', host_folder=None):
assert exists(svmperf_base), f'path {svmperf_base} does not seem to point to a valid path'
self.svmperf_base = svmperf_base
self.C = C
self.verbose = verbose
self.loss = loss
self.host_folder = host_folder
def set_params(self, **parameters):
"""
Set the hyper-parameters for svm-perf. Currently, only the `C` parameter is supported
:param parameters: a `**kwargs` dictionary `{'C': <float>}`
"""
assert list(parameters.keys()) == ['C'], 'currently, only the C parameter is supported'
self.C = parameters['C']
# def set_params(self, **parameters):
# """
# Set the hyper-parameters for svm-perf. Currently, only the `C` and `loss` parameters are supported
#
# :param parameters: a `**kwargs` dictionary `{'C': <float>}`
# """
# assert sorted(list(parameters.keys())) == ['C', 'loss'], \
# 'currently, only the C and loss parameters are supported'
# self.C = parameters.get('C', self.C)
# self.loss = parameters.get('loss', self.loss)
#
# def get_params(self, deep=True):
# return {'C': self.C, 'loss': self.loss}
def fit(self, X, y):
"""
@ -65,14 +75,14 @@ class SVMperf(BaseEstimator, ClassifierMixin):
local_random = random.Random()
# this would allow to run parallel instances of predict
random_code = '-'.join(str(local_random.randint(0,1000000)) for _ in range(5))
# self.tmpdir = tempfile.TemporaryDirectory(suffix=random_code)
# tmp dir are removed after the fit terminates in multiprocessing... moving to regular directories + __del__
self.tmpdir = '.svmperf-' + random_code
random_code = 'svmperfprocess'+'-'.join(str(local_random.randint(0, 1000000)) for _ in range(5))
if self.host_folder is None:
# tmp dir are removed after the fit terminates in multiprocessing...
self.tmpdir = tempfile.TemporaryDirectory(suffix=random_code).name
else:
self.tmpdir = join(self.host_folder, '.' + random_code)
makedirs(self.tmpdir, exist_ok=True)
# self.model = join(self.tmpdir.name, 'model-'+random_code)
# traindat = join(self.tmpdir.name, f'train-{random_code}.dat')
self.model = join(self.tmpdir, 'model-'+random_code)
traindat = join(self.tmpdir, f'train-{random_code}.dat')
@ -94,6 +104,7 @@ class SVMperf(BaseEstimator, ClassifierMixin):
def predict(self, X):
"""
Predicts labels for the instances `X`
:param X: array-like of shape `(n_samples, n_features)` instances to classify
:return: a `numpy` array of length `n` containing the label predictions, where `n` is the number of
instances in `X`
@ -119,8 +130,6 @@ class SVMperf(BaseEstimator, ClassifierMixin):
# in order to allow for parallel runs of predict, a random code is assigned
local_random = random.Random()
random_code = '-'.join(str(local_random.randint(0, 1000000)) for _ in range(5))
# predictions_path = join(self.tmpdir.name, 'predictions'+random_code+'.dat')
# testdat = join(self.tmpdir.name, 'test'+random_code+'.dat')
predictions_path = join(self.tmpdir, 'predictions' + random_code + '.dat')
testdat = join(self.tmpdir, 'test' + random_code + '.dat')
dump_svmlight_file(X, y, testdat, zero_based=False)
@ -141,5 +150,5 @@ class SVMperf(BaseEstimator, ClassifierMixin):
def __del__(self):
if hasattr(self, 'tmpdir'):
pass # shutil.rmtree(self.tmpdir, ignore_errors=True)
shutil.rmtree(self.tmpdir, ignore_errors=True)

169
quapy/data/_lequa2022.py Normal file
View File

@ -0,0 +1,169 @@
from typing import Tuple, Union
import pandas as pd
import numpy as np
import os
from quapy.protocol import AbstractProtocol
DEV_SAMPLES = 1000
TEST_SAMPLES = 5000
ERROR_TOL = 1E-3
def load_category_map(path):
cat2code = {}
with open(path, 'rt') as fin:
for line in fin:
category, code = line.split()
cat2code[category] = int(code)
code2cat = [cat for cat, code in sorted(cat2code.items(), key=lambda x: x[1])]
return cat2code, code2cat
def load_raw_documents(path):
df = pd.read_csv(path)
documents = list(df["text"].values)
labels = None
if "label" in df.columns:
labels = df["label"].values.astype(int)
return documents, labels
def load_vector_documents(path):
D = pd.read_csv(path).to_numpy(dtype=float)
labelled = D.shape[1] == 301
if labelled:
X, y = D[:, 1:], D[:, 0].astype(int).flatten()
else:
X, y = D, None
return X, y
class SamplesFromDir(AbstractProtocol):
def __init__(self, path_dir:str, ground_truth_path:str, load_fn):
self.path_dir = path_dir
self.load_fn = load_fn
self.true_prevs = ResultSubmission.load(ground_truth_path)
def __call__(self):
for id, prevalence in self.true_prevs.iterrows():
sample, _ = self.load_fn(os.path.join(self.path_dir, f'{id}.txt'))
yield sample, prevalence
class ResultSubmission:
def __init__(self):
self.df = None
def __init_df(self, categories: int):
if not isinstance(categories, int) or categories < 2:
raise TypeError('wrong format for categories: an int (>=2) was expected')
df = pd.DataFrame(columns=list(range(categories)))
df.index.set_names('id', inplace=True)
self.df = df
@property
def n_categories(self):
return len(self.df.columns.values)
def add(self, sample_id: int, prevalence_values: np.ndarray):
if not isinstance(sample_id, int):
raise TypeError(f'error: expected int for sample_sample, found {type(sample_id)}')
if not isinstance(prevalence_values, np.ndarray):
raise TypeError(f'error: expected np.ndarray for prevalence_values, found {type(prevalence_values)}')
if self.df is None:
self.__init_df(categories=len(prevalence_values))
if sample_id in self.df.index.values:
raise ValueError(f'error: prevalence values for "{sample_id}" already added')
if prevalence_values.ndim != 1 and prevalence_values.size != self.n_categories:
raise ValueError(f'error: wrong shape found for prevalence vector {prevalence_values}')
if (prevalence_values < 0).any() or (prevalence_values > 1).any():
raise ValueError(f'error: prevalence values out of range [0,1] for "{sample_id}"')
if np.abs(prevalence_values.sum() - 1) > ERROR_TOL:
raise ValueError(f'error: prevalence values do not sum up to one for "{sample_id}"'
f'(error tolerance {ERROR_TOL})')
self.df.loc[sample_id] = prevalence_values
def __len__(self):
return len(self.df)
@classmethod
def load(cls, path: str) -> 'ResultSubmission':
df = ResultSubmission.check_file_format(path)
r = ResultSubmission()
r.df = df
return r
def dump(self, path: str):
ResultSubmission.check_dataframe_format(self.df)
self.df.to_csv(path)
def prevalence(self, sample_id: int):
sel = self.df.loc[sample_id]
if sel.empty:
return None
else:
return sel.values.flatten()
def iterrows(self):
for index, row in self.df.iterrows():
prevalence = row.values.flatten()
yield index, prevalence
@classmethod
def check_file_format(cls, path) -> Union[pd.DataFrame, Tuple[pd.DataFrame, str]]:
try:
df = pd.read_csv(path, index_col=0)
except Exception as e:
print(f'the file {path} does not seem to be a valid csv file. ')
print(e)
return ResultSubmission.check_dataframe_format(df, path=path)
@classmethod
def check_dataframe_format(cls, df, path=None) -> Union[pd.DataFrame, Tuple[pd.DataFrame, str]]:
hint_path = '' # if given, show the data path in the error message
if path is not None:
hint_path = f' in {path}'
if df.index.name != 'id' or len(df.columns) < 2:
raise ValueError(f'wrong header{hint_path}, '
f'the format of the header should be "id,0,...,n-1", '
f'where n is the number of categories')
if [int(ci) for ci in df.columns.values] != list(range(len(df.columns))):
raise ValueError(f'wrong header{hint_path}, category ids should be 0,1,2,...,n-1, '
f'where n is the number of categories')
if df.empty:
raise ValueError(f'error{hint_path}: results file is empty')
elif len(df) != DEV_SAMPLES and len(df) != TEST_SAMPLES:
raise ValueError(f'wrong number of prevalence values found{hint_path}; '
f'expected {DEV_SAMPLES} for development sets and '
f'{TEST_SAMPLES} for test sets; found {len(df)}')
ids = set(df.index.values)
expected_ids = set(range(len(df)))
if ids != expected_ids:
missing = expected_ids - ids
if missing:
raise ValueError(f'there are {len(missing)} missing ids{hint_path}: {sorted(missing)}')
unexpected = ids - expected_ids
if unexpected:
raise ValueError(f'there are {len(missing)} unexpected ids{hint_path}: {sorted(unexpected)}')
for category_id in df.columns:
if (df[category_id] < 0).any() or (df[category_id] > 1).any():
raise ValueError(f'error{hint_path} column "{category_id}" contains values out of range [0,1]')
prevs = df.values
round_errors = np.abs(prevs.sum(axis=-1) - 1.) > ERROR_TOL
if round_errors.any():
raise ValueError(f'warning: prevalence values in rows with id {np.where(round_errors)[0].tolist()} '
f'do not sum up to 1 (error tolerance {ERROR_TOL}), '
f'probably due to some rounding errors.')
return df

View File

@ -1,24 +1,29 @@
import itertools
from functools import cached_property
from typing import Iterable
import numpy as np
from scipy.sparse import issparse
from scipy.sparse import vstack
from sklearn.model_selection import train_test_split, RepeatedStratifiedKFold
from quapy.functional import artificial_prevalence_sampling, strprev
from numpy.random import RandomState
from quapy.functional import strprev
from quapy.util import temp_seed
class LabelledCollection:
"""
A LabelledCollection is a set of objects each with a label associated to it. This class implements many sampling
routines.
A LabelledCollection is a set of objects each with a label attached to each of them.
This class implements several sampling routines and other utilities.
:param instances: array-like (np.ndarray, list, or csr_matrix are supported)
:param labels: array-like with the same length of instances
:param classes_: optional, list of classes from which labels are taken. If not specified, the classes are inferred
:param classes: optional, list of classes from which labels are taken. If not specified, the classes are inferred
from the labels. The classes must be indicated in cases in which some of the labels might have no examples
(i.e., a prevalence of 0)
"""
def __init__(self, instances, labels, classes_=None):
def __init__(self, instances, labels, classes=None):
if issparse(instances):
self.instances = instances
elif isinstance(instances, list) and len(instances) > 0 and isinstance(instances[0], str):
@ -28,14 +33,14 @@ class LabelledCollection:
self.instances = np.asarray(instances)
self.labels = np.asarray(labels)
n_docs = len(self)
if classes_ is None:
if classes is None:
self.classes_ = np.unique(self.labels)
self.classes_.sort()
else:
self.classes_ = np.unique(np.asarray(classes_))
self.classes_ = np.unique(np.asarray(classes))
self.classes_.sort()
if len(set(self.labels).difference(set(classes_))) > 0:
raise ValueError(f'labels ({set(self.labels)}) contain values not included in classes_ ({set(classes_)})')
if len(set(self.labels).difference(set(classes))) > 0:
raise ValueError(f'labels ({set(self.labels)}) contain values not included in classes_ ({set(classes)})')
self.index = {class_: np.arange(n_docs)[self.labels == class_] for class_ in self.classes_}
@classmethod
@ -65,7 +70,7 @@ class LabelledCollection:
def prevalence(self):
"""
Returns the prevalence, or relative frequency, of the classes of interest.
Returns the prevalence, or relative frequency, of the classes in the codeframe.
:return: a np.ndarray of shape `(n_classes)` with the relative frequencies of each class, in the same order
as listed by `self.classes_`
@ -74,7 +79,7 @@ class LabelledCollection:
def counts(self):
"""
Returns the number of instances for each of the classes of interest.
Returns the number of instances for each of the classes in the codeframe.
:return: a np.ndarray of shape `(n_classes)` with the number of instances of each class, in the same order
as listed by `self.classes_`
@ -99,7 +104,7 @@ class LabelledCollection:
"""
return self.n_classes == 2
def sampling_index(self, size, *prevs, shuffle=True):
def sampling_index(self, size, *prevs, shuffle=True, random_state=None):
"""
Returns an index to be used to extract a random sample of desired size and desired prevalence values. If the
prevalence values are not specified, then returns the index of a uniform sampling.
@ -111,50 +116,72 @@ class LabelledCollection:
it is constrained. E.g., for binary collections, only the prevalence `p` for the first class (as listed in
`self.classes_` can be specified, while the other class takes prevalence value `1-p`
:param shuffle: if set to True (default), shuffles the index before returning it
:param random_state: seed for reproducing sampling
:return: a np.ndarray of shape `(size)` with the indexes
"""
if len(prevs) == 0: # no prevalence was indicated; returns an index for uniform sampling
return self.uniform_sampling_index(size)
return self.uniform_sampling_index(size, random_state=random_state)
if len(prevs) == self.n_classes - 1:
prevs = prevs + (1 - sum(prevs),)
assert len(prevs) == self.n_classes, 'unexpected number of prevalences'
assert sum(prevs) == 1, f'prevalences ({prevs}) wrong range (sum={sum(prevs)})'
taken = 0
indexes_sample = []
for i, class_ in enumerate(self.classes_):
if i == self.n_classes - 1:
n_requested = size - taken
else:
n_requested = int(size * prevs[i])
# Decide how many instances should be taken for each class in order to satisfy the requested prevalence
# accurately, and the number of instances in the sample (exactly). If int(size * prevs[i]) (which is
# <= size * prevs[i]) examples are drawn from class i, there could be a remainder number of instances to take
# to satisfy the size constrain. The remainder is distributed along the classes with probability = prevs.
# (This aims at avoiding the remainder to be placed in a class for which the prevalence requested is 0.)
n_requests = {class_: round(size * prevs[i]) for i, class_ in enumerate(self.classes_)}
remainder = size - sum(n_requests.values())
with temp_seed(random_state):
# due to rounding, the remainder can be 0, >0, or <0
if remainder > 0:
# when the remainder is >0 we randomly add 1 to the requests for each class;
# more prevalent classes are more likely to be taken in order to minimize the impact in the final prevalence
for rand_class in np.random.choice(self.classes_, size=remainder, p=prevs):
n_requests[rand_class] += 1
elif remainder < 0:
# when the remainder is <0 we randomly remove 1 from the requests, unless the request is 0 for a chosen
# class; we repeat until remainder==0
while remainder!=0:
rand_class = np.random.choice(self.classes_, p=prevs)
if n_requests[rand_class] > 0:
n_requests[rand_class] -= 1
remainder += 1
n_candidates = len(self.index[class_])
index_sample = self.index[class_][
np.random.choice(n_candidates, size=n_requested, replace=(n_requested > n_candidates))
] if n_requested > 0 else []
indexes_sample = []
for class_, n_requested in n_requests.items():
n_candidates = len(self.index[class_])
index_sample = self.index[class_][
np.random.choice(n_candidates, size=n_requested, replace=(n_requested > n_candidates))
] if n_requested > 0 else []
indexes_sample.append(index_sample)
taken += n_requested
indexes_sample.append(index_sample)
indexes_sample = np.concatenate(indexes_sample).astype(int)
indexes_sample = np.concatenate(indexes_sample).astype(int)
if shuffle:
indexes_sample = np.random.permutation(indexes_sample)
if shuffle:
indexes_sample = np.random.permutation(indexes_sample)
return indexes_sample
def uniform_sampling_index(self, size):
def uniform_sampling_index(self, size, random_state=None):
"""
Returns an index to be used to extract a uniform sample of desired size. The sampling is drawn
with replacement if the requested size is greater than the number of instances, or without replacement
otherwise.
:param size: integer, the size of the uniform sample
:param random_state: if specified, guarantees reproducibility of the split.
:return: a np.ndarray of shape `(size)` with the indexes
"""
return np.random.choice(len(self), size, replace=False)
if random_state is not None:
ng = RandomState(seed=random_state)
else:
ng = np.random
return ng.choice(len(self), size, replace=size > len(self))
def sampling(self, size, *prevs, shuffle=True):
def sampling(self, size, *prevs, shuffle=True, random_state=None):
"""
Return a random sample (an instance of :class:`LabelledCollection`) of desired size and desired prevalence
values. For each class, the sampling is drawn without replacement if the requested prevalence is larger than
@ -165,22 +192,24 @@ class LabelledCollection:
it is constrained. E.g., for binary collections, only the prevalence `p` for the first class (as listed in
`self.classes_` can be specified, while the other class takes prevalence value `1-p`
:param shuffle: if set to True (default), shuffles the index before returning it
:param random_state: seed for reproducing sampling
:return: an instance of :class:`LabelledCollection` with length == `size` and prevalence close to `prevs` (or
prevalence == `prevs` if the exact prevalence values can be met as proportions of instances)
"""
prev_index = self.sampling_index(size, *prevs, shuffle=shuffle)
prev_index = self.sampling_index(size, *prevs, shuffle=shuffle, random_state=random_state)
return self.sampling_from_index(prev_index)
def uniform_sampling(self, size):
def uniform_sampling(self, size, random_state=None):
"""
Returns a uniform sample (an instance of :class:`LabelledCollection`) of desired size. The sampling is drawn
with replacement if the requested size is greater than the number of instances, or without replacement
otherwise.
:param size: integer, the requested size
:param random_state: if specified, guarantees reproducibility of the split.
:return: an instance of :class:`LabelledCollection` with length == `size`
"""
unif_index = self.uniform_sampling_index(size)
unif_index = self.uniform_sampling_index(size, random_state=random_state)
return self.sampling_from_index(unif_index)
def sampling_from_index(self, index):
@ -193,7 +222,7 @@ class LabelledCollection:
"""
documents = self.instances[index]
labels = self.labels[index]
return LabelledCollection(documents, labels, classes_=self.classes_)
return LabelledCollection(documents, labels, classes=self.classes_)
def split_stratified(self, train_prop=0.6, random_state=None):
"""
@ -207,92 +236,91 @@ class LabelledCollection:
:return: two instances of :class:`LabelledCollection`, the first one with `train_prop` elements, and the
second one with `1-train_prop` elements
"""
tr_docs, te_docs, tr_labels, te_labels = \
train_test_split(self.instances, self.labels, train_size=train_prop, stratify=self.labels,
random_state=random_state)
return LabelledCollection(tr_docs, tr_labels), LabelledCollection(te_docs, te_labels)
tr_docs, te_docs, tr_labels, te_labels = train_test_split(
self.instances, self.labels, train_size=train_prop, stratify=self.labels, random_state=random_state
)
training = LabelledCollection(tr_docs, tr_labels, classes=self.classes_)
test = LabelledCollection(te_docs, te_labels, classes=self.classes_)
return training, test
def artificial_sampling_generator(self, sample_size, n_prevalences=101, repeats=1):
def split_random(self, train_prop=0.6, random_state=None):
"""
A generator of samples that implements the artificial prevalence protocol (APP).
The APP consists of exploring a grid of prevalence values containing `n_prevalences` points (e.g.,
[0, 0.05, 0.1, 0.15, ..., 1], if `n_prevalences=21`), and generating all valid combinations of
prevalence values for all classes (e.g., for 3 classes, samples with [0, 0, 1], [0, 0.05, 0.95], ...,
[1, 0, 0] prevalence values of size `sample_size` will be yielded). The number of samples for each valid
combination of prevalence values is indicated by `repeats`.
Returns two instances of :class:`LabelledCollection` split randomly from this collection, at desired
proportion.
:param sample_size: the number of instances in each sample
:param n_prevalences: the number of prevalence points to be taken from the [0,1] interval (including the
limits {0,1}). E.g., if `n_prevalences=11`, then the prevalence points to take are [0, 0.1, 0.2, ..., 1]
:param repeats: the number of samples to generate for each valid combination of prevalence values (default 1)
:return: yield samples generated at artificially controlled prevalence values
:param train_prop: the proportion of elements to include in the left-most returned collection (typically used
as the training collection). The rest of elements are included in the right-most returned collection
(typically used as a test collection).
:param random_state: if specified, guarantees reproducibility of the split.
:return: two instances of :class:`LabelledCollection`, the first one with `train_prop` elements, and the
second one with `1-train_prop` elements
"""
dimensions = self.n_classes
for prevs in artificial_prevalence_sampling(dimensions, n_prevalences, repeats):
yield self.sampling(sample_size, *prevs)
def artificial_sampling_index_generator(self, sample_size, n_prevalences=101, repeats=1):
"""
A generator of sample indexes implementing the artificial prevalence protocol (APP).
The APP consists of exploring
a grid of prevalence values (e.g., [0, 0.05, 0.1, 0.15, ..., 1]), and generating all valid combinations of
prevalence values for all classes (e.g., for 3 classes, samples with [0, 0, 1], [0, 0.05, 0.95], ...,
[1, 0, 0] prevalence values of size `sample_size` will be yielded). The number of sample indexes for each valid
combination of prevalence values is indicated by `repeats`
:param sample_size: the number of instances in each sample (i.e., length of each index)
:param n_prevalences: the number of prevalence points to be taken from the [0,1] interval (including the
limits {0,1}). E.g., if `n_prevalences=11`, then the prevalence points to take are [0, 0.1, 0.2, ..., 1]
:param repeats: the number of samples to generate for each valid combination of prevalence values (default 1)
:return: yield the indexes that generate the samples according to APP
"""
dimensions = self.n_classes
for prevs in artificial_prevalence_sampling(dimensions, n_prevalences, repeats):
yield self.sampling_index(sample_size, *prevs)
def natural_sampling_generator(self, sample_size, repeats=100):
"""
A generator of samples that implements the natural prevalence protocol (NPP). The NPP consists of drawing
samples uniformly at random, therefore approximately preserving the natural prevalence of the collection.
:param sample_size: integer, the number of instances in each sample
:param repeats: the number of samples to generate
:return: yield instances of :class:`LabelledCollection`
"""
for _ in range(repeats):
yield self.uniform_sampling(sample_size)
def natural_sampling_index_generator(self, sample_size, repeats=100):
"""
A generator of sample indexes according to the natural prevalence protocol (NPP). The NPP consists of drawing
samples uniformly at random, therefore approximately preserving the natural prevalence of the collection.
:param sample_size: integer, the number of instances in each sample (i.e., the length of each index)
:param repeats: the number of indexes to generate
:return: yield `repeats` instances of np.ndarray with shape `(sample_size,)`
"""
for _ in range(repeats):
yield self.uniform_sampling_index(sample_size)
indexes = np.random.RandomState(seed=random_state).permutation(len(self))
if isinstance(train_prop, int):
assert train_prop < len(self), \
'argument train_prop cannot be greater than the number of elements in the collection'
splitpoint = train_prop
elif isinstance(train_prop, float):
assert 0 < train_prop < 1, \
'argument train_prop out of range (0,1)'
splitpoint = int(np.round(len(self)*train_prop))
left, right = indexes[:splitpoint], indexes[splitpoint:]
training = self.sampling_from_index(left)
test = self.sampling_from_index(right)
return training, test
def __add__(self, other):
"""
Returns a new :class:`LabelledCollection` as the union of this collection with another collection
Returns a new :class:`LabelledCollection` as the union of this collection with another collection.
Both labelled collections must have the same classes.
:param other: another :class:`LabelledCollection`
:return: a :class:`LabelledCollection` representing the union of both collections
"""
if other is None:
return self
elif issparse(self.instances) and issparse(other.instances):
join_instances = vstack([self.instances, other.instances])
elif isinstance(self.instances, list) and isinstance(other.instances, list):
join_instances = self.instances + other.instances
elif isinstance(self.instances, np.ndarray) and isinstance(other.instances, np.ndarray):
join_instances = np.concatenate([self.instances, other.instances])
if not all(np.sort(self.classes_)==np.sort(other.classes_)):
raise NotImplementedError(f'unsupported operation for collections on different classes; '
f'expected {self.classes_}, found {other.classes_}')
return LabelledCollection.join(self, other)
@classmethod
def join(cls, *args: Iterable['LabelledCollection']):
"""
Returns a new :class:`LabelledCollection` as the union of the collections given in input.
:param args: instances of :class:`LabelledCollection`
:return: a :class:`LabelledCollection` representing the union of both collections
"""
args = [lc for lc in args if lc is not None]
assert len(args) > 0, 'empty list is not allowed for mix'
assert all([isinstance(lc, LabelledCollection) for lc in args]), \
'only instances of LabelledCollection allowed'
first_instances = args[0].instances
first_type = type(first_instances)
assert all([type(lc.instances)==first_type for lc in args[1:]]), \
'not all the collections are of instances of the same type'
if issparse(first_instances) or isinstance(first_instances, np.ndarray):
first_ndim = first_instances.ndim
assert all([lc.instances.ndim == first_ndim for lc in args[1:]]), \
'not all the ndarrays are of the same dimension'
if first_ndim > 1:
first_shape = first_instances.shape[1:]
assert all([lc.instances.shape[1:] == first_shape for lc in args[1:]]), \
'not all the ndarrays are of the same shape'
if issparse(first_instances):
instances = vstack([lc.instances for lc in args])
else:
instances = np.concatenate([lc.instances for lc in args])
elif isinstance(first_instances, list):
instances = list(itertools.chain(lc.instances for lc in args))
else:
raise NotImplementedError('unsupported operation for collection types')
labels = np.concatenate([self.labels, other.labels])
return LabelledCollection(join_instances, labels)
labels = np.concatenate([lc.labels for lc in args])
classes = np.unique(labels).sort()
return LabelledCollection(instances, labels, classes=classes)
@property
def Xy(self):
@ -305,6 +333,44 @@ class LabelledCollection:
"""
return self.instances, self.labels
@property
def Xp(self):
"""
Gets the instances and the true prevalence. This is useful when implementing evaluation protocols from
a :class:`LabelledCollection` object.
:return: a tuple `(instances, prevalence)` from this collection
"""
return self.instances, self.prevalence()
@property
def X(self):
"""
An alias to self.instances
:return: self.instances
"""
return self.instances
@property
def y(self):
"""
An alias to self.labels
:return: self.labels
"""
return self.labels
@property
def p(self):
"""
An alias to self.prevalence()
:return: self.prevalence()
"""
return self.prevalence()
def stats(self, show=True):
"""
Returns (and eventually prints) a dictionary with some stats of this collection. E.g.,:
@ -337,7 +403,7 @@ class LabelledCollection:
f'#classes={stats_["classes"]}, prevs={stats_["prevs"]}')
return stats_
def kFCV(self, nfolds=5, nrepeats=1, random_state=0):
def kFCV(self, nfolds=5, nrepeats=1, random_state=None):
"""
Generator of stratified folds to be used in k-fold cross validation.
@ -439,7 +505,17 @@ class Dataset:
"""
return len(self.vocabulary)
def stats(self, show):
@property
def train_test(self):
"""
Alias to `self.training` and `self.test`
:return: the training and test collections
:return: the training and test collections
"""
return self.training, self.test
def stats(self, show=True):
"""
Returns (and eventually prints) a dictionary with some stats of this dataset. E.g.,:
@ -477,13 +553,14 @@ class Dataset:
yield Dataset(train, test, name=f'fold {(i % nfolds) + 1}/{nfolds} (round={(i // nfolds) + 1})')
def isbinary(data):
"""
Returns True if `data` is either a binary :class:`Dataset` or a binary :class:`LabelledCollection`
def reduce(self, n_train=100, n_test=100):
"""
Reduce the number of instances in place for quick experiments. Preserves the prevalence of each set.
:param data: a :class:`Dataset` or a :class:`LabelledCollection` object
:return: True if labelled according to two classes
"""
if isinstance(data, Dataset) or isinstance(data, LabelledCollection):
return data.binary
return False
:param n_train: number of training documents to keep (default 100)
:param n_test: number of test documents to keep (default 100)
:return: self
"""
self.training = self.training.sampling(n_train, *self.training.prevalence())
self.test = self.test.sampling(n_test, *self.test.prevalence())
return self

View File

@ -6,12 +6,14 @@ import os
import zipfile
from os.path import join
import pandas as pd
import scipy
from quapy.data.base import Dataset, LabelledCollection
from quapy.data.preprocessing import text2tfidf, reduce_columns
from quapy.data.reader import *
from quapy.util import download_file_if_not_exists, download_file, get_quapy_home, pickled_resource
REVIEWS_SENTIMENT_DATASETS = ['hp', 'kindle', 'imdb']
TWITTER_SENTIMENT_DATASETS_TEST = ['gasp', 'hcr', 'omd', 'sanders',
'semeval13', 'semeval14', 'semeval15', 'semeval16',
@ -43,6 +45,22 @@ UCI_DATASETS = ['acute.a', 'acute.b',
'wine-q-red', 'wine-q-white',
'yeast']
LEQUA2022_TASKS = ['T1A', 'T1B', 'T2A', 'T2B']
_TXA_SAMPLE_SIZE = 250
_TXB_SAMPLE_SIZE = 1000
LEQUA2022_SAMPLE_SIZE = {
'TXA': _TXA_SAMPLE_SIZE,
'TXB': _TXB_SAMPLE_SIZE,
'T1A': _TXA_SAMPLE_SIZE,
'T1B': _TXB_SAMPLE_SIZE,
'T2A': _TXA_SAMPLE_SIZE,
'T2B': _TXB_SAMPLE_SIZE,
'binary': _TXA_SAMPLE_SIZE,
'multiclass': _TXB_SAMPLE_SIZE
}
def fetch_reviews(dataset_name, tfidf=False, min_df=None, data_home=None, pickle=False) -> Dataset:
"""
@ -532,4 +550,77 @@ def fetch_UCILabelledCollection(dataset_name, data_home=None, verbose=False) ->
def _df_replace(df, col, repl={'yes': 1, 'no':0}, astype=float):
df[col] = df[col].apply(lambda x:repl[x]).astype(astype, copy=False)
df[col] = df[col].apply(lambda x:repl[x]).astype(astype, copy=False)
def fetch_lequa2022(task, data_home=None):
"""
Loads the official datasets provided for the `LeQua <https://lequa2022.github.io/index>`_ competition.
In brief, there are 4 tasks (T1A, T1B, T2A, T2B) having to do with text quantification
problems. Tasks T1A and T1B provide documents in vector form, while T2A and T2B provide raw documents instead.
Tasks T1A and T2A are binary sentiment quantification problems, while T2A and T2B are multiclass quantification
problems consisting of estimating the class prevalence values of 28 different merchandise products.
We refer to the `Esuli, A., Moreo, A., Sebastiani, F., & Sperduti, G. (2022).
A Detailed Overview of LeQua@ CLEF 2022: Learning to Quantify.
<https://ceur-ws.org/Vol-3180/paper-146.pdf>`_ for a detailed description
on the tasks and datasets.
The datasets are downloaded only once, and stored for fast reuse.
See `lequa2022_experiments.py` provided in the example folder, that can serve as a guide on how to use these
datasets.
:param task: a string representing the task name; valid ones are T1A, T1B, T2A, and T2B
:param data_home: specify the quapy home directory where collections will be dumped (leave empty to use the default
~/quay_data/ directory)
:return: a tuple `(train, val_gen, test_gen)` where `train` is an instance of
:class:`quapy.data.base.LabelledCollection`, `val_gen` and `test_gen` are instances of
:class:`quapy.protocol.SamplesFromDir`, i.e., are sampling protocols that return a series of samples
labelled by prevalence.
"""
from quapy.data._lequa2022 import load_raw_documents, load_vector_documents, SamplesFromDir
assert task in LEQUA2022_TASKS, \
f'Unknown task {task}. Valid ones are {LEQUA2022_TASKS}'
if data_home is None:
data_home = get_quapy_home()
URL_TRAINDEV=f'https://zenodo.org/record/6546188/files/{task}.train_dev.zip'
URL_TEST=f'https://zenodo.org/record/6546188/files/{task}.test.zip'
URL_TEST_PREV=f'https://zenodo.org/record/6546188/files/{task}.test_prevalences.zip'
lequa_dir = join(data_home, 'lequa2022')
os.makedirs(lequa_dir, exist_ok=True)
def download_unzip_and_remove(unzipped_path, url):
tmp_path = join(lequa_dir, task + '_tmp.zip')
download_file_if_not_exists(url, tmp_path)
with zipfile.ZipFile(tmp_path) as file:
file.extractall(unzipped_path)
os.remove(tmp_path)
if not os.path.exists(join(lequa_dir, task)):
download_unzip_and_remove(lequa_dir, URL_TRAINDEV)
download_unzip_and_remove(lequa_dir, URL_TEST)
download_unzip_and_remove(lequa_dir, URL_TEST_PREV)
if task in ['T1A', 'T1B']:
load_fn = load_vector_documents
elif task in ['T2A', 'T2B']:
load_fn = load_raw_documents
tr_path = join(lequa_dir, task, 'public', 'training_data.txt')
train = LabelledCollection.load(tr_path, loader_func=load_fn)
val_samples_path = join(lequa_dir, task, 'public', 'dev_samples')
val_true_prev_path = join(lequa_dir, task, 'public', 'dev_prevalences.txt')
val_gen = SamplesFromDir(val_samples_path, val_true_prev_path, load_fn=load_fn)
test_samples_path = join(lequa_dir, task, 'public', 'test_samples')
test_true_prev_path = join(lequa_dir, task, 'public', 'test_prevalences.txt')
test_gen = SamplesFromDir(test_samples_path, test_true_prev_path, load_fn=load_fn)
return train, val_gen, test_gen

View File

@ -88,7 +88,7 @@ def standardize(dataset: Dataset, inplace=False):
:param dataset: a :class:`quapy.data.base.Dataset` object
:param inplace: set to True if the transformation is to be applied inplace, or to False (default) if a new
:class:`quapy.data.base.Dataset` is to be returned
:return:
:return: an instance of :class:`quapy.data.base.Dataset`
"""
s = StandardScaler(copy=not inplace)
training = s.fit_transform(dataset.training.instances)
@ -110,7 +110,7 @@ def index(dataset: Dataset, min_df=5, inplace=False, **kwargs):
:param min_df: minimum number of occurrences below which the term is replaced by a `UNK` index
:param inplace: whether or not to apply the transformation inplace (True), or to a new copy (False, default)
:param kwargs: the rest of parameters of the transformation (as for sklearn's
`CountVectorizer <https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.CountVectorizer.html>_`)
`CountVectorizer <https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.CountVectorizer.html>_`)
:return: a new :class:`quapy.data.base.Dataset` (if inplace=False) or a reference to the current
:class:`quapy.data.base.Dataset` (inplace=True) consisting of lists of integer values representing indices.
"""
@ -121,6 +121,9 @@ def index(dataset: Dataset, min_df=5, inplace=False, **kwargs):
training_index = indexer.fit_transform(dataset.training.instances)
test_index = indexer.transform(dataset.test.instances)
training_index = np.asarray(training_index, dtype=object)
test_index = np.asarray(test_index, dtype=object)
if inplace:
dataset.training = LabelledCollection(training_index, dataset.training.labels, dataset.classes_)
dataset.test = LabelledCollection(test_index, dataset.test.labels, dataset.classes_)
@ -147,7 +150,8 @@ class IndexTransformer:
contains, and that would be generated by sklearn's
`CountVectorizer <https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.CountVectorizer.html>`_
:param kwargs: keyworded arguments from `CountVectorizer <https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.CountVectorizer.html>`_
:param kwargs: keyworded arguments from
`CountVectorizer <https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.CountVectorizer.html>`_
"""
def __init__(self, **kwargs):
@ -169,7 +173,7 @@ class IndexTransformer:
self.pad = self.add_word(qp.environ['PAD_TOKEN'], qp.environ['PAD_INDEX'])
return self
def transform(self, X, n_jobs=-1):
def transform(self, X, n_jobs=None):
"""
Transforms the strings in `X` as lists of numerical ids
@ -179,14 +183,15 @@ class IndexTransformer:
"""
# given the number of tasks and the number of jobs, generates the slices for the parallel processes
assert self.unk != -1, 'transform called before fit'
indexed = map_parallel(func=self._index, args=X, n_jobs=n_jobs)
return np.asarray(indexed)
n_jobs = qp._get_njobs(n_jobs)
return map_parallel(func=self._index, args=X, n_jobs=n_jobs)
def _index(self, documents):
vocab = self.vocabulary_.copy()
return [[vocab.prevalence(word, self.unk) for word in self.analyzer(doc)] for doc in tqdm(documents, 'indexing')]
return [[vocab.get(word, self.unk) for word in self.analyzer(doc)] for doc in tqdm(documents, 'indexing')]
def fit_transform(self, X, n_jobs=-1):
def fit_transform(self, X, n_jobs=None):
"""
Fits the transform on `X` and transforms it.

View File

@ -102,7 +102,7 @@ def reindex_labels(y):
y = np.asarray(y)
classnames = np.asarray(sorted(np.unique(y)))
label2index = {label: index for index, label in enumerate(classnames)}
indexed = np.empty(y.shape, dtype=np.int)
indexed = np.empty(y.shape, dtype=int)
for label in classnames:
indexed[y==label] = label2index[label]
return indexed, classnames
@ -121,7 +121,7 @@ def binarize(y, pos_class):
0 otherwise
"""
y = np.asarray(y)
ybin = np.zeros(y.shape, dtype=np.int)
ybin = np.zeros(y.shape, dtype=int)
ybin[y == pos_class] = 1
return ybin

View File

@ -11,11 +11,6 @@ def from_name(err_name):
"""
assert err_name in ERROR_NAMES, f'unknown error {err_name}'
callable_error = globals()[err_name]
if err_name in QUANTIFICATION_ERROR_SMOOTH_NAMES:
eps = __check_eps()
def bound_callable_error(y_true, y_pred):
return callable_error(y_true, y_pred, eps)
return bound_callable_error
return callable_error
@ -215,12 +210,14 @@ def __check_eps(eps=None):
CLASSIFICATION_ERROR = {f1e, acce}
QUANTIFICATION_ERROR = {mae, mrae, mse, mkld, mnkld, ae, rae, se, kld, nkld}
QUANTIFICATION_ERROR = {mae, mrae, mse, mkld, mnkld}
QUANTIFICATION_ERROR_SINGLE = {ae, rae, se, kld, nkld}
QUANTIFICATION_ERROR_SMOOTH = {kld, nkld, rae, mkld, mnkld, mrae}
CLASSIFICATION_ERROR_NAMES = {func.__name__ for func in CLASSIFICATION_ERROR}
QUANTIFICATION_ERROR_NAMES = {func.__name__ for func in QUANTIFICATION_ERROR}
QUANTIFICATION_ERROR_SINGLE_NAMES = {func.__name__ for func in QUANTIFICATION_ERROR_SINGLE}
QUANTIFICATION_ERROR_SMOOTH_NAMES = {func.__name__ for func in QUANTIFICATION_ERROR_SMOOTH}
ERROR_NAMES = CLASSIFICATION_ERROR_NAMES | QUANTIFICATION_ERROR_NAMES
ERROR_NAMES = CLASSIFICATION_ERROR_NAMES | QUANTIFICATION_ERROR_NAMES | QUANTIFICATION_ERROR_SINGLE_NAMES
f1_error = f1e
acc_error = acce

View File

@ -1,296 +1,122 @@
from typing import Union, Callable, Iterable
import numpy as np
from tqdm import tqdm
import inspect
import quapy as qp
from quapy.data import LabelledCollection
from quapy.protocol import AbstractProtocol, OnLabelledCollectionProtocol, IterateProtocol
from quapy.method.base import BaseQuantifier
from quapy.util import temp_seed, _check_sample_size
import quapy.functional as F
import pandas as pd
def artificial_prevalence_prediction(
def prediction(
model: BaseQuantifier,
test: LabelledCollection,
sample_size=None,
n_prevpoints=101,
n_repetitions=1,
eval_budget: int = None,
n_jobs=1,
random_seed=42,
protocol: AbstractProtocol,
aggr_speedup: Union[str, bool] = 'auto',
verbose=False):
"""
Performs the predictions for all samples generated according to the Artificial Prevalence Protocol (APP).
The APP consists of exploring a grid of prevalence values containing `n_prevalences` points (e.g.,
[0, 0.05, 0.1, 0.15, ..., 1], if `n_prevalences=21`), and generating all valid combinations of
prevalence values for all classes (e.g., for 3 classes, samples with [0, 0, 1], [0, 0.05, 0.95], ...,
[1, 0, 0] prevalence values of size `sample_size` will be considered). The number of samples for each valid
combination of prevalence values is indicated by `repeats`.
Uses a quantification model to generate predictions for the samples generated via a specific protocol.
This function is central to all evaluation processes, and is endowed with an optimization to speed-up the
prediction of protocols that generate samples from a large collection. The optimization applies to aggregative
quantifiers only, and to OnLabelledCollectionProtocol protocols, and comes down to generating the classification
predictions once and for all, and then generating samples over the classification predictions (instead of over
the raw instances), so that the classifier prediction is never called again. This behaviour is obtained by
setting `aggr_speedup` to 'auto' or True, and is only carried out if the overall process is convenient in terms
of computations (e.g., if the number of classification predictions needed for the original collection exceed the
number of classification predictions needed for all samples, then the optimization is not undertaken).
:param model: the model in charge of generating the class prevalence estimations
:param test: the test set on which to perform APP
:param sample_size: integer, the size of the samples; if None, then the sample size is
taken from qp.environ['SAMPLE_SIZE']
:param n_prevpoints: integer, the number of different prevalences to sample (or set to None if eval_budget
is specified; default 101, i.e., steps of 1%)
:param n_repetitions: integer, the number of repetitions for each prevalence (default 1)
:param eval_budget: integer, if specified, sets a ceil on the number of evaluations to perform. For example, if
there are 3 classes, `repeats=1`, and `eval_budget=20`, then `n_prevpoints` will be set to 5, since this
will generate 15 different prevalence vectors ([0, 0, 1], [0, 0.25, 0.75], [0, 0.5, 0.5] ... [1, 0, 0]) and
since setting `n_prevpoints=6` would produce more than 20 evaluations.
:param n_jobs: integer, number of jobs to be run in parallel (default 1)
:param random_seed: integer, allows to replicate the samplings. The seed is local to the method and does not affect
any other random process (default 42)
:param verbose: if True, shows a progress bar
:return: a tuple containing two `np.ndarrays` of shape `(m,n,)` with `m` the number of samples
`(n_prevpoints*repeats)` and `n` the number of classes. The first one contains the true prevalence values
for the samples generated while the second one contains the prevalence estimations
:param model: a quantifier, instance of :class:`quapy.method.base.BaseQuantifier`
:param protocol: :class:`quapy.protocol.AbstractProtocol`; if this object is also instance of
:class:`quapy.protocol.OnLabelledCollectionProtocol`, then the aggregation speed-up can be run. This is the protocol
in charge of generating the samples for which the model has to issue class prevalence predictions.
:param aggr_speedup: whether or not to apply the speed-up. Set to "force" for applying it even if the number of
instances in the original collection on which the protocol acts is larger than the number of instances
in the samples to be generated. Set to True or "auto" (default) for letting QuaPy decide whether it is
convenient or not. Set to False to deactivate.
:param verbose: boolean, show or not information in stdout
:return: a tuple `(true_prevs, estim_prevs)` in which each element in the tuple is an array of shape
`(n_samples, n_classes)` containing the true, or predicted, prevalence values for each sample
"""
assert aggr_speedup in [False, True, 'auto', 'force'], 'invalid value for aggr_speedup'
sample_size = _check_sample_size(sample_size)
n_prevpoints, _ = qp.evaluation._check_num_evals(test.n_classes, n_prevpoints, eval_budget, n_repetitions, verbose)
sout = lambda x: print(x) if verbose else None
with temp_seed(random_seed):
indexes = list(test.artificial_sampling_index_generator(sample_size, n_prevpoints, n_repetitions))
apply_optimization = False
return _predict_from_indexes(indexes, model, test, n_jobs, verbose)
if aggr_speedup in [True, 'auto', 'force']:
# checks whether the prediction can be made more efficiently; this check consists in verifying if the model is
# of type aggregative, if the protocol is based on LabelledCollection, and if the total number of documents to
# classify using the protocol would exceed the number of test documents in the original collection
from quapy.method.aggregative import AggregativeQuantifier
if isinstance(model, AggregativeQuantifier) and isinstance(protocol, OnLabelledCollectionProtocol):
if aggr_speedup == 'force':
apply_optimization = True
sout(f'forcing aggregative speedup')
elif hasattr(protocol, 'sample_size'):
nD = len(protocol.get_labelled_collection())
samplesD = protocol.total() * protocol.sample_size
if nD < samplesD:
apply_optimization = True
sout(f'speeding up the prediction for the aggregative quantifier, '
f'total classifications {nD} instead of {samplesD}')
def natural_prevalence_prediction(
model: BaseQuantifier,
test: LabelledCollection,
sample_size=None,
repeats=100,
n_jobs=1,
random_seed=42,
verbose=False):
"""
Performs the predictions for all samples generated according to the Natural Prevalence Protocol (NPP).
The NPP consists of drawing samples uniformly at random, therefore approximately preserving the natural
prevalence of the collection.
:param model: the model in charge of generating the class prevalence estimations
:param test: the test set on which to perform NPP
:param sample_size: integer, the size of the samples; if None, then the sample size is
taken from qp.environ['SAMPLE_SIZE']
:param repeats: integer, the number of samples to generate (default 100)
:param n_jobs: integer, number of jobs to be run in parallel (default 1)
:param random_seed: allows to replicate the samplings. The seed is local to the method and does not affect
any other random process (default 42)
:param verbose: if True, shows a progress bar
:return: a tuple containing two `np.ndarrays` of shape `(m,n,)` with `m` the number of samples
`(repeats)` and `n` the number of classes. The first one contains the true prevalence values
for the samples generated while the second one contains the prevalence estimations
"""
sample_size = _check_sample_size(sample_size)
with temp_seed(random_seed):
indexes = list(test.natural_sampling_index_generator(sample_size, repeats))
return _predict_from_indexes(indexes, model, test, n_jobs, verbose)
def gen_prevalence_prediction(model: BaseQuantifier, gen_fn: Callable, eval_budget=None):
"""
Generates prevalence predictions for a custom protocol defined as a generator function that yields
samples at each iteration. The sequence of samples is processed exhaustively if `eval_budget=None`
or up to the `eval_budget` iterations if specified.
:param model: the model in charge of generating the class prevalence estimations
:param gen_fn: a generator function yielding one sample at each iteration
:param eval_budget: a maximum number of evaluations to run. Set to None (default) for exploring the
entire sequence
:return: a tuple containing two `np.ndarrays` of shape `(m,n,)` with `m` the number of samples
generated and `n` the number of classes. The first one contains the true prevalence values
for the samples generated while the second one contains the prevalence estimations
"""
if not inspect.isgenerator(gen_fn()):
raise ValueError('param "gen_fun" is not a callable returning a generator')
if not isinstance(eval_budget, int):
eval_budget = -1
true_prevalences, estim_prevalences = [], []
for sample_instances, true_prev in gen_fn():
true_prevalences.append(true_prev)
estim_prevalences.append(model.quantify(sample_instances))
eval_budget -= 1
if eval_budget == 0:
break
true_prevalences = np.asarray(true_prevalences)
estim_prevalences = np.asarray(estim_prevalences)
return true_prevalences, estim_prevalences
def _predict_from_indexes(
indexes,
model: BaseQuantifier,
test: LabelledCollection,
n_jobs=1,
verbose=False):
if model.aggregative: #isinstance(model, qp.method.aggregative.AggregativeQuantifier):
# print('\tinstance of aggregative-quantifier')
quantification_func = model.aggregate
if model.probabilistic: # isinstance(model, qp.method.aggregative.AggregativeProbabilisticQuantifier):
# print('\t\tinstance of probabilitstic-aggregative-quantifier')
preclassified_instances = model.posterior_probabilities(test.instances)
else:
# print('\t\tinstance of hard-aggregative-quantifier')
preclassified_instances = model.classify(test.instances)
test = LabelledCollection(preclassified_instances, test.labels)
if apply_optimization:
pre_classified = model.classify(protocol.get_labelled_collection().instances)
protocol_with_predictions = protocol.on_preclassified_instances(pre_classified)
return __prediction_helper(model.aggregate, protocol_with_predictions, verbose)
else:
# print('\t\tinstance of base-quantifier')
quantification_func = model.quantify
def _predict_prevalences(index):
sample = test.sampling_from_index(index)
true_prevalence = sample.prevalence()
estim_prevalence = quantification_func(sample.instances)
return true_prevalence, estim_prevalence
pbar = tqdm(indexes, desc='[artificial sampling protocol] generating predictions') if verbose else indexes
results = qp.util.parallel(_predict_prevalences, pbar, n_jobs=n_jobs)
true_prevalences, estim_prevalences = zip(*results)
true_prevalences = np.asarray(true_prevalences)
estim_prevalences = np.asarray(estim_prevalences)
return true_prevalences, estim_prevalences
return __prediction_helper(model.quantify, protocol, verbose)
def artificial_prevalence_report(
model: BaseQuantifier,
test: LabelledCollection,
sample_size=None,
n_prevpoints=101,
n_repetitions=1,
eval_budget: int = None,
n_jobs=1,
random_seed=42,
error_metrics:Iterable[Union[str,Callable]]='mae',
verbose=False):
def __prediction_helper(quantification_fn, protocol: AbstractProtocol, verbose=False):
true_prevs, estim_prevs = [], []
for sample_instances, sample_prev in tqdm(protocol(), total=protocol.total(), desc='predicting') if verbose else protocol():
estim_prevs.append(quantification_fn(sample_instances))
true_prevs.append(sample_prev)
true_prevs = np.asarray(true_prevs)
estim_prevs = np.asarray(estim_prevs)
return true_prevs, estim_prevs
def evaluation_report(model: BaseQuantifier,
protocol: AbstractProtocol,
error_metrics: Iterable[Union[str,Callable]] = 'mae',
aggr_speedup: Union[str, bool] = 'auto',
verbose=False):
"""
Generates an evaluation report for all samples generated according to the Artificial Prevalence Protocol (APP).
The APP consists of exploring a grid of prevalence values containing `n_prevalences` points (e.g.,
[0, 0.05, 0.1, 0.15, ..., 1], if `n_prevalences=21`), and generating all valid combinations of
prevalence values for all classes (e.g., for 3 classes, samples with [0, 0, 1], [0, 0.05, 0.95], ...,
[1, 0, 0] prevalence values of size `sample_size` will be considered). The number of samples for each valid
combination of prevalence values is indicated by `repeats`.
Te report takes the form of a
pandas' `dataframe <https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html>`_
in which the rows correspond to different samples, and the columns inform of the true prevalence values,
the estimated prevalence values, and the score obtained by each of the evaluation measures indicated.
Generates a report (a pandas' DataFrame) containing information of the evaluation of the model as according
to a specific protocol and in terms of one or more evaluation metrics (errors).
:param model: the model in charge of generating the class prevalence estimations
:param test: the test set on which to perform APP
:param sample_size: integer, the size of the samples; if None, then the sample size is
taken from qp.environ['SAMPLE_SIZE']
:param n_prevpoints: integer, the number of different prevalences to sample (or set to None if eval_budget
is specified; default 101, i.e., steps of 1%)
:param n_repetitions: integer, the number of repetitions for each prevalence (default 1)
:param eval_budget: integer, if specified, sets a ceil on the number of evaluations to perform. For example, if
there are 3 classes, `repeats=1`, and `eval_budget=20`, then `n_prevpoints` will be set to 5, since this
will generate 15 different prevalence vectors ([0, 0, 1], [0, 0.25, 0.75], [0, 0.5, 0.5] ... [1, 0, 0]) and
since setting `n_prevpoints=6` would produce more than 20 evaluations.
:param n_jobs: integer, number of jobs to be run in parallel (default 1)
:param random_seed: integer, allows to replicate the samplings. The seed is local to the method and does not affect
any other random process (default 42)
:param error_metrics: a string indicating the name of the error (as defined in :mod:`quapy.error`) or a
callable error function; optionally, a list of strings or callables can be indicated, if the results
are to be evaluated with more than one error metric. Default is "mae"
:param verbose: if True, shows a progress bar
:return: pandas' dataframe with rows corresponding to different samples, and with columns informing of the
true prevalence values, the estimated prevalence values, and the score obtained by each of the evaluation
measures indicated.
:param model: a quantifier, instance of :class:`quapy.method.base.BaseQuantifier`
:param protocol: :class:`quapy.protocol.AbstractProtocol`; if this object is also instance of
:class:`quapy.protocol.OnLabelledCollectionProtocol`, then the aggregation speed-up can be run. This is the protocol
in charge of generating the samples in which the model is evaluated.
:param error_metrics: a string, or list of strings, representing the name(s) of an error function in `qp.error`
(e.g., 'mae', the default value), or a callable function, or a list of callable functions, implementing
the error function itself.
:param aggr_speedup: whether or not to apply the speed-up. Set to "force" for applying it even if the number of
instances in the original collection on which the protocol acts is larger than the number of instances
in the samples to be generated. Set to True or "auto" (default) for letting QuaPy decide whether it is
convenient or not. Set to False to deactivate.
:param verbose: boolean, show or not information in stdout
:return: a pandas' DataFrame containing the columns 'true-prev' (the true prevalence of each sample),
'estim-prev' (the prevalence estimated by the model for each sample), and as many columns as error metrics
have been indicated, each displaying the score in terms of that metric for every sample.
"""
true_prevs, estim_prevs = artificial_prevalence_prediction(
model, test, sample_size, n_prevpoints, n_repetitions, eval_budget, n_jobs, random_seed, verbose
)
true_prevs, estim_prevs = prediction(model, protocol, aggr_speedup=aggr_speedup, verbose=verbose)
return _prevalence_report(true_prevs, estim_prevs, error_metrics)
def natural_prevalence_report(
model: BaseQuantifier,
test: LabelledCollection,
sample_size=None,
repeats=100,
n_jobs=1,
random_seed=42,
error_metrics:Iterable[Union[str,Callable]]='mae',
verbose=False):
"""
Generates an evaluation report for all samples generated according to the Natural Prevalence Protocol (NPP).
The NPP consists of drawing samples uniformly at random, therefore approximately preserving the natural
prevalence of the collection.
Te report takes the form of a
pandas' `dataframe <https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html>`_
in which the rows correspond to different samples, and the columns inform of the true prevalence values,
the estimated prevalence values, and the score obtained by each of the evaluation measures indicated.
:param model: the model in charge of generating the class prevalence estimations
:param test: the test set on which to perform NPP
:param sample_size: integer, the size of the samples; if None, then the sample size is
taken from qp.environ['SAMPLE_SIZE']
:param repeats: integer, the number of samples to generate (default 100)
:param n_jobs: integer, number of jobs to be run in parallel (default 1)
:param random_seed: allows to replicate the samplings. The seed is local to the method and does not affect
any other random process (default 42)
:param error_metrics: a string indicating the name of the error (as defined in :mod:`quapy.error`) or a
callable error function; optionally, a list of strings or callables can be indicated, if the results
are to be evaluated with more than one error metric. Default is "mae"
:param verbose: if True, shows a progress bar
:return: a tuple containing two `np.ndarrays` of shape `(m,n,)` with `m` the number of samples
`(repeats)` and `n` the number of classes. The first one contains the true prevalence values
for the samples generated while the second one contains the prevalence estimations
"""
sample_size = _check_sample_size(sample_size)
true_prevs, estim_prevs = natural_prevalence_prediction(
model, test, sample_size, repeats, n_jobs, random_seed, verbose
)
return _prevalence_report(true_prevs, estim_prevs, error_metrics)
def gen_prevalence_report(model: BaseQuantifier, gen_fn: Callable, eval_budget=None,
error_metrics:Iterable[Union[str,Callable]]='mae'):
"""
GGenerates an evaluation report for a custom protocol defined as a generator function that yields
samples at each iteration. The sequence of samples is processed exhaustively if `eval_budget=None`
or up to the `eval_budget` iterations if specified.
Te report takes the form of a
pandas' `dataframe <https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html>`_
in which the rows correspond to different samples, and the columns inform of the true prevalence values,
the estimated prevalence values, and the score obtained by each of the evaluation measures indicated.
:param model: the model in charge of generating the class prevalence estimations
:param gen_fn: a generator function yielding one sample at each iteration
:param eval_budget: a maximum number of evaluations to run. Set to None (default) for exploring the
entire sequence
:return: a tuple containing two `np.ndarrays` of shape `(m,n,)` with `m` the number of samples
generated. The first one contains the true prevalence values
for the samples generated while the second one contains the prevalence estimations
"""
true_prevs, estim_prevs = gen_prevalence_prediction(model, gen_fn, eval_budget)
return _prevalence_report(true_prevs, estim_prevs, error_metrics)
def _prevalence_report(
true_prevs,
estim_prevs,
error_metrics: Iterable[Union[str, Callable]] = 'mae'):
def _prevalence_report(true_prevs, estim_prevs, error_metrics: Iterable[Union[str, Callable]] = 'mae'):
if isinstance(error_metrics, str):
error_metrics = [error_metrics]
error_names = [e if isinstance(e, str) else e.__name__ for e in error_metrics]
error_funcs = [qp.error.from_name(e) if isinstance(e, str) else e for e in error_metrics]
assert all(hasattr(e, '__call__') for e in error_funcs), 'invalid error functions'
error_names = [e.__name__ for e in error_funcs]
df = pd.DataFrame(columns=['true-prev', 'estim-prev'] + error_names)
for true_prev, estim_prev in zip(true_prevs, estim_prevs):
@ -303,145 +129,59 @@ def _prevalence_report(
return df
def artificial_prevalence_protocol(
def evaluate(
model: BaseQuantifier,
test: LabelledCollection,
sample_size=None,
n_prevpoints=101,
repeats=1,
eval_budget: int = None,
n_jobs=1,
random_seed=42,
error_metric:Union[str,Callable]='mae',
protocol: AbstractProtocol,
error_metric: Union[str, Callable],
aggr_speedup: Union[str, bool] = 'auto',
verbose=False):
"""
Generates samples according to the Artificial Prevalence Protocol (APP).
The APP consists of exploring a grid of prevalence values containing `n_prevalences` points (e.g.,
[0, 0.05, 0.1, 0.15, ..., 1], if `n_prevalences=21`), and generating all valid combinations of
prevalence values for all classes (e.g., for 3 classes, samples with [0, 0, 1], [0, 0.05, 0.95], ...,
[1, 0, 0] prevalence values of size `sample_size` will be considered). The number of samples for each valid
combination of prevalence values is indicated by `repeats`.
Evaluates a quantification model according to a specific sample generation protocol and in terms of one
evaluation metric (error).
:param model: the model in charge of generating the class prevalence estimations
:param test: the test set on which to perform APP
:param sample_size: integer, the size of the samples; if None, then the sample size is
taken from qp.environ['SAMPLE_SIZE']
:param n_prevpoints: integer, the number of different prevalences to sample (or set to None if eval_budget
is specified; default 101, i.e., steps of 1%)
:param repeats: integer, the number of repetitions for each prevalence (default 1)
:param eval_budget: integer, if specified, sets a ceil on the number of evaluations to perform. For example, if
there are 3 classes, `repeats=1`, and `eval_budget=20`, then `n_prevpoints` will be set to 5, since this
will generate 15 different prevalence vectors ([0, 0, 1], [0, 0.25, 0.75], [0, 0.5, 0.5] ... [1, 0, 0]) and
since setting `n_prevpoints=6` would produce more than 20 evaluations.
:param n_jobs: integer, number of jobs to be run in parallel (default 1)
:param random_seed: integer, allows to replicate the samplings. The seed is local to the method and does not affect
any other random process (default 42)
:param error_metric: a string indicating the name of the error (as defined in :mod:`quapy.error`) or a
callable error function
:param verbose: set to True (default False) for displaying some information on standard output
:return: yields one sample at a time
:param model: a quantifier, instance of :class:`quapy.method.base.BaseQuantifier`
:param protocol: :class:`quapy.protocol.AbstractProtocol`; if this object is also instance of
:class:`quapy.protocol.OnLabelledCollectionProtocol`, then the aggregation speed-up can be run. This is the
protocol in charge of generating the samples in which the model is evaluated.
:param error_metric: a string representing the name(s) of an error function in `qp.error`
(e.g., 'mae'), or a callable function implementing the error function itself.
:param aggr_speedup: whether or not to apply the speed-up. Set to "force" for applying it even if the number of
instances in the original collection on which the protocol acts is larger than the number of instances
in the samples to be generated. Set to True or "auto" (default) for letting QuaPy decide whether it is
convenient or not. Set to False to deactivate.
:param verbose: boolean, show or not information in stdout
:return: if the error metric is not averaged (e.g., 'ae', 'rae'), returns an array of shape `(n_samples,)` with
the error scores for each sample; if the error metric is averaged (e.g., 'mae', 'mrae') then returns
a single float
"""
if isinstance(error_metric, str):
error_metric = qp.error.from_name(error_metric)
assert hasattr(error_metric, '__call__'), 'invalid error function'
true_prevs, estim_prevs = artificial_prevalence_prediction(
model, test, sample_size, n_prevpoints, repeats, eval_budget, n_jobs, random_seed, verbose
)
true_prevs, estim_prevs = prediction(model, protocol, aggr_speedup=aggr_speedup, verbose=verbose)
return error_metric(true_prevs, estim_prevs)
def natural_prevalence_protocol(
def evaluate_on_samples(
model: BaseQuantifier,
test: LabelledCollection,
sample_size=None,
repeats=100,
n_jobs=1,
random_seed=42,
error_metric:Union[str,Callable]='mae',
samples: Iterable[qp.data.LabelledCollection],
error_metric: Union[str, Callable],
verbose=False):
"""
Generates samples according to the Natural Prevalence Protocol (NPP).
The NPP consists of drawing samples uniformly at random, therefore approximately preserving the natural
prevalence of the collection.
Evaluates a quantification model on a given set of samples and in terms of one evaluation metric (error).
:param model: the model in charge of generating the class prevalence estimations
:param test: the test set on which to perform NPP
:param sample_size: integer, the size of the samples; if None, then the sample size is
taken from qp.environ['SAMPLE_SIZE']
:param repeats: integer, the number of samples to generate
:param n_jobs: integer, number of jobs to be run in parallel (default 1)
:param random_seed: allows to replicate the samplings. The seed is local to the method and does not affect
any other random process (default 42)
:param error_metric: a string indicating the name of the error (as defined in :mod:`quapy.error`) or a
callable error function
:param verbose: if True, shows a progress bar
:return: yields one sample at a time
:param model: a quantifier, instance of :class:`quapy.method.base.BaseQuantifier`
:param samples: a list of samples on which the quantifier is to be evaluated
:param error_metric: a string representing the name(s) of an error function in `qp.error`
(e.g., 'mae'), or a callable function implementing the error function itself.
:param verbose: boolean, show or not information in stdout
:return: if the error metric is not averaged (e.g., 'ae', 'rae'), returns an array of shape `(n_samples,)` with
the error scores for each sample; if the error metric is averaged (e.g., 'mae', 'mrae') then returns
a single float
"""
if isinstance(error_metric, str):
error_metric = qp.error.from_name(error_metric)
assert hasattr(error_metric, '__call__'), 'invalid error function'
true_prevs, estim_prevs = natural_prevalence_prediction(
model, test, sample_size, repeats, n_jobs, random_seed, verbose
)
return error_metric(true_prevs, estim_prevs)
return evaluate(model, IterateProtocol(samples), error_metric, aggr_speedup=False, verbose=verbose)
def evaluate(model: BaseQuantifier, test_samples:Iterable[LabelledCollection], error_metric:Union[str, Callable], n_jobs:int=-1):
"""
Evaluates a model on a sequence of test samples in terms of a given error metric.
:param model: the model in charge of generating the class prevalence estimations
:param test_samples: an iterable yielding one sample at a time
:param error_metric: a string indicating the name of the error (as defined in :mod:`quapy.error`) or a
callable error function
:param n_jobs: integer, number of jobs to be run in parallel (default 1)
:return: the score obtained using `error_metric`
"""
if isinstance(error_metric, str):
error_metric = qp.error.from_name(error_metric)
scores = qp.util.parallel(_delayed_eval, ((model, Ti, error_metric) for Ti in test_samples), n_jobs=n_jobs)
return np.mean(scores)
def _delayed_eval(args):
model, test, error = args
prev_estim = model.quantify(test.instances)
prev_true = test.prevalence()
return error(prev_true, prev_estim)
def _check_num_evals(n_classes, n_prevpoints=None, eval_budget=None, repeats=1, verbose=False):
if n_prevpoints is None and eval_budget is None:
raise ValueError('either n_prevpoints or eval_budget has to be specified')
elif n_prevpoints is None:
assert eval_budget > 0, 'eval_budget must be a positive integer'
n_prevpoints = F.get_nprevpoints_approximation(eval_budget, n_classes, repeats)
eval_computations = F.num_prevalence_combinations(n_prevpoints, n_classes, repeats)
if verbose:
print(f'setting n_prevpoints={n_prevpoints} so that the number of '
f'evaluations ({eval_computations}) does not exceed the evaluation '
f'budget ({eval_budget})')
elif eval_budget is None:
eval_computations = F.num_prevalence_combinations(n_prevpoints, n_classes, repeats)
if verbose:
print(f'{eval_computations} evaluations will be performed for each '
f'combination of hyper-parameters')
else:
eval_computations = F.num_prevalence_combinations(n_prevpoints, n_classes, repeats)
if eval_computations > eval_budget:
n_prevpoints = F.get_nprevpoints_approximation(eval_budget, n_classes, repeats)
new_eval_computations = F.num_prevalence_combinations(n_prevpoints, n_classes, repeats)
if verbose:
print(f'the budget of evaluations would be exceeded with '
f'n_prevpoints={n_prevpoints}. Chaning to n_prevpoints={n_prevpoints}. This will produce '
f'{new_eval_computations} evaluation computations for each hyper-parameter combination.')
return n_prevpoints, eval_computations

View File

@ -4,37 +4,6 @@ import scipy
import numpy as np
def artificial_prevalence_sampling(dimensions, n_prevalences=21, repeat=1, return_constrained_dim=False):
"""
Generates vectors of prevalence values artificially drawn from an exhaustive grid of prevalence values. The
number of prevalence values explored for each dimension depends on `n_prevalences`, so that, if, for example,
`n_prevalences=11` then the prevalence values of the grid are taken from [0, 0.1, 0.2, ..., 0.9, 1]. Only
valid prevalence distributions are returned, i.e., vectors of prevalence values that sum up to 1. For each
valid vector of prevalence values, `repeat` copies are returned. The vector of prevalence values can be
implicit (by setting `return_constrained_dim=False`), meaning that the last dimension (which is constrained
to 1 - sum of the rest) is not returned (note that, quite obviously, in this case the vector does not sum up to 1).
:param dimensions: the number of classes
:param n_prevalences: the number of equidistant prevalence points to extract from the [0,1] interval for the grid
(default is 21)
:param repeat: number of copies for each valid prevalence vector (default is 1)
:param return_constrained_dim: set to True to return all dimensions, or to False (default) for ommitting the
constrained dimension
:return: a `np.ndarray` of shape `(n, dimensions)` if `return_constrained_dim=True` or of shape `(n, dimensions-1)`
if `return_constrained_dim=False`, where `n` is the number of valid combinations found in the grid multiplied
by `repeat`
"""
s = np.linspace(0., 1., n_prevalences, endpoint=True)
s = [s] * (dimensions - 1)
prevs = [p for p in itertools.product(*s, repeat=1) if sum(p)<=1]
if return_constrained_dim:
prevs = [p+(1-sum(p),) for p in prevs]
prevs = np.asarray(prevs).reshape(len(prevs), -1)
if repeat>1:
prevs = np.repeat(prevs, repeat, axis=0)
return prevs
def prevalence_linspace(n_prevalences=21, repeats=1, smooth_limits_epsilon=0.01):
"""
Produces an array of uniformly separated values of prevalence.
@ -70,7 +39,7 @@ def prevalence_from_labels(labels, classes):
raise ValueError(f'param labels does not seem to be a ndarray of label predictions')
unique, counts = np.unique(labels, return_counts=True)
by_class = defaultdict(lambda:0, dict(zip(unique, counts)))
prevalences = np.asarray([by_class[class_] for class_ in classes], dtype=np.float)
prevalences = np.asarray([by_class[class_] for class_ in classes], dtype=float)
prevalences /= prevalences.sum()
return prevalences
@ -101,7 +70,7 @@ def HellingerDistance(P, Q):
The HD for two discrete distributions of `k` bins is defined as:
.. math::
HD(P,Q) = \\frac{ 1 }{ \\sqrt{ 2 } } \\sqrt{ \sum_{i=1}^k ( \\sqrt{p_i} - \\sqrt{q_i} )^2 }
HD(P,Q) = \\frac{ 1 }{ \\sqrt{ 2 } } \\sqrt{ \\sum_{i=1}^k ( \\sqrt{p_i} - \\sqrt{q_i} )^2 }
:param P: real-valued array-like of shape `(k,)` representing a discrete distribution
:param Q: real-valued array-like of shape `(k,)` representing a discrete distribution
@ -110,6 +79,22 @@ def HellingerDistance(P, Q):
return np.sqrt(np.sum((np.sqrt(P) - np.sqrt(Q))**2))
def TopsoeDistance(P, Q, epsilon=1e-20):
"""
Topsoe distance between two (discretized) distributions `P` and `Q`.
The Topsoe distance for two discrete distributions of `k` bins is defined as:
.. math::
Topsoe(P,Q) = \\sum_{i=1}^k \\left( p_i \\log\\left(\\frac{ 2 p_i + \\epsilon }{ p_i+q_i+\\epsilon }\\right) +
q_i \\log\\left(\\frac{ 2 q_i + \\epsilon }{ p_i+q_i+\\epsilon }\\right) \\right)
:param P: real-valued array-like of shape `(k,)` representing a discrete distribution
:param Q: real-valued array-like of shape `(k,)` representing a discrete distribution
:return: float
"""
return np.sum(P*np.log((2*P+epsilon)/(P+Q+epsilon)) + Q*np.log((2*Q+epsilon)/(P+Q+epsilon)))
def uniform_prevalence_sampling(n_classes, size=1):
"""
Implements the `Kraemer algorithm <http://www.cs.cmu.edu/~nasmith/papers/smith+tromble.tr04.pdf>`_
@ -161,7 +146,6 @@ def adjusted_quantification(prevalence_estim, tpr, fpr, clip=True):
.. math::
ACC(p) = \\frac{ p - fpr }{ tpr - fpr }
:param prevalence_estim: float, the estimated value for the positive class
:param tpr: float, the true positive rate of the classifier
:param fpr: float, the false positive rate of the classifier
@ -209,7 +193,7 @@ def __num_prevalence_combinations_depr(n_prevpoints:int, n_classes:int, n_repeat
:param n_prevpoints: integer, number of prevalence points.
:param n_repeats: integer, number of repetitions for each prevalence combination
:return: The number of possible combinations. For example, if n_classes=2, n_prevpoints=5, n_repeats=1, then the
number of possible combinations are 5, i.e.: [0,1], [0.25,0.75], [0.50,0.50], [0.75,0.25], and [1.0,0.0]
number of possible combinations are 5, i.e.: [0,1], [0.25,0.75], [0.50,0.50], [0.75,0.25], and [1.0,0.0]
"""
__cache={}
def __f(nc,np):
@ -241,7 +225,7 @@ def num_prevalence_combinations(n_prevpoints:int, n_classes:int, n_repeats:int=1
:param n_prevpoints: integer, number of prevalence points.
:param n_repeats: integer, number of repetitions for each prevalence combination
:return: The number of possible combinations. For example, if n_classes=2, n_prevpoints=5, n_repeats=1, then the
number of possible combinations are 5, i.e.: [0,1], [0.25,0.75], [0.50,0.50], [0.75,0.25], and [1.0,0.0]
number of possible combinations are 5, i.e.: [0,1], [0.25,0.75], [0.50,0.50], [0.75,0.25], and [1.0,0.0]
"""
N = n_prevpoints-1
C = n_classes
@ -255,7 +239,7 @@ def get_nprevpoints_approximation(combinations_budget:int, n_classes:int, n_repe
that the number of valid prevalence values generated as combinations of prevalence points (points in a
`n_classes`-dimensional simplex) do not exceed combinations_budget.
:param combinations_budget: integer, maximum number of combinatios allowed
:param combinations_budget: integer, maximum number of combinations allowed
:param n_classes: integer, number of classes
:param n_repeats: integer, number of repetitions for each prevalence combination
:return: the largest number of prevalence points that generate less than combinations_budget valid prevalences
@ -269,3 +253,26 @@ def get_nprevpoints_approximation(combinations_budget:int, n_classes:int, n_repe
else:
n_prevpoints += 1
def check_prevalence_vector(p, raise_exception=False, toleranze=1e-08):
"""
Checks that p is a valid prevalence vector, i.e., that it contains values in [0,1] and that the values sum up to 1.
:param p: the prevalence vector to check
:return: True if `p` is valid, False otherwise
"""
p = np.asarray(p)
if not all(p>=0):
if raise_exception:
raise ValueError('the prevalence vector contains negative numbers')
return False
if not all(p<=1):
if raise_exception:
raise ValueError('the prevalence vector contains values >1')
return False
if not np.isclose(p.sum(), 1, atol=toleranze):
if raise_exception:
raise ValueError('the prevalence vector does not sum up to 1')
return False
return True

View File

@ -3,15 +3,6 @@ from . import base
from . import meta
from . import non_aggregative
EXPLICIT_LOSS_MINIMIZATION_METHODS = {
aggregative.ELM,
aggregative.SVMQ,
aggregative.SVMAE,
aggregative.SVMKLD,
aggregative.SVMRAE,
aggregative.SVMNKLD
}
AGGREGATIVE_METHODS = {
aggregative.CC,
aggregative.ACC,
@ -19,12 +10,14 @@ AGGREGATIVE_METHODS = {
aggregative.PACC,
aggregative.EMQ,
aggregative.HDy,
aggregative.DyS,
aggregative.SMM,
aggregative.X,
aggregative.T50,
aggregative.MAX,
aggregative.MS,
aggregative.MS2,
} | EXPLICIT_LOSS_MINIMIZATION_METHODS
}
NON_AGGREGATIVE_METHODS = {

File diff suppressed because it is too large Load Diff

View File

@ -1,11 +1,17 @@
from abc import ABCMeta, abstractmethod
from copy import deepcopy
from joblib import Parallel, delayed
from sklearn.base import BaseEstimator
import quapy as qp
from quapy.data import LabelledCollection
import numpy as np
# Base Quantifier abstract class
# ------------------------------------
class BaseQuantifier(metaclass=ABCMeta):
class BaseQuantifier(BaseEstimator):
"""
Abstract Quantifier. A quantifier is defined as an object of a class that implements the method :meth:`fit` on
:class:`quapy.data.base.LabelledCollection`, the method :meth:`quantify`, and the :meth:`set_params` and
@ -28,79 +34,10 @@ class BaseQuantifier(metaclass=ABCMeta):
Generate class prevalence estimates for the sample's instances
:param instances: array-like
:return: `np.ndarray` of shape `(self.n_classes_,)` with class prevalence estimates.
:return: `np.ndarray` of shape `(n_classes,)` with class prevalence estimates.
"""
...
@abstractmethod
def set_params(self, **parameters):
"""
Set the parameters of the quantifier.
:param parameters: dictionary of param-value pairs
"""
...
@abstractmethod
def get_params(self, deep=True):
"""
Return the current parameters of the quantifier.
:param deep: for compatibility with sklearn
:return: a dictionary of param-value pairs
"""
...
@property
@abstractmethod
def classes_(self):
"""
Class labels, in the same order in which class prevalence values are to be computed.
:return: array-like
"""
...
@property
def n_classes(self):
"""
Returns the number of classes
:return: integer
"""
return len(self.classes_)
# these methods allows meta-learners to reimplement the decision based on their constituents, and not
# based on class structure
@property
def binary(self):
"""
Indicates whether the quantifier is binary or not.
:return: False (to be overridden)
"""
return False
@property
def aggregative(self):
"""
Indicates whether the quantifier is of type aggregative or not
:return: False (to be overridden)
"""
return False
@property
def probabilistic(self):
"""
Indicates whether the quantifier is of type probabilistic or not
:return: False (to be overridden)
"""
return False
class BinaryQuantifier(BaseQuantifier):
"""
@ -112,90 +49,61 @@ class BinaryQuantifier(BaseQuantifier):
assert data.binary, f'{quantifier_name} works only on problems of binary classification. ' \
f'Use the class OneVsAll to enable {quantifier_name} work on single-label data.'
class OneVsAll:
pass
def newOneVsAll(binary_quantifier, n_jobs=None):
assert isinstance(binary_quantifier, BaseQuantifier), \
f'{binary_quantifier} does not seem to be a Quantifier'
if isinstance(binary_quantifier, qp.method.aggregative.AggregativeQuantifier):
return qp.method.aggregative.OneVsAllAggregative(binary_quantifier, n_jobs)
else:
return OneVsAllGeneric(binary_quantifier, n_jobs)
class OneVsAllGeneric(OneVsAll,BaseQuantifier):
"""
Allows any binary quantifier to perform quantification on single-label datasets. The method maintains one binary
quantifier for each class, and then l1-normalizes the outputs so that the class prevelence values sum up to 1.
"""
def __init__(self, binary_quantifier, n_jobs=None):
assert isinstance(binary_quantifier, BaseQuantifier), \
f'{binary_quantifier} does not seem to be a Quantifier'
if isinstance(binary_quantifier, qp.method.aggregative.AggregativeQuantifier):
print('[warning] the quantifier seems to be an instance of qp.method.aggregative.AggregativeQuantifier; '
f'you might prefer instantiating {qp.method.aggregative.OneVsAllAggregative.__name__}')
self.binary_quantifier = binary_quantifier
self.n_jobs = qp._get_njobs(n_jobs)
def fit(self, data: LabelledCollection, fit_classifier=True):
assert not data.binary, f'{self.__class__.__name__} expect non-binary data'
assert fit_classifier == True, 'fit_classifier must be True'
self.dict_binary_quantifiers = {c: deepcopy(self.binary_quantifier) for c in data.classes_}
self._parallel(self._delayed_binary_fit, data)
return self
def _parallel(self, func, *args, **kwargs):
return np.asarray(
Parallel(n_jobs=self.n_jobs, backend='threading')(
delayed(func)(c, *args, **kwargs) for c in self.classes_
)
)
def quantify(self, instances):
prevalences = self._parallel(self._delayed_binary_predict, instances)
return qp.functional.normalize_prevalence(prevalences)
@property
def binary(self):
"""
Informs that the quantifier is binary
:return: True
"""
return True
def isbinary(model:BaseQuantifier):
"""
Alias for property `binary`
:param model: the model
:return: True if the model is binary, False otherwise
"""
return model.binary
def isaggregative(model:BaseQuantifier):
"""
Alias for property `aggregative`
:param model: the model
:return: True if the model is aggregative, False otherwise
"""
return model.aggregative
def isprobabilistic(model:BaseQuantifier):
"""
Alias for property `probabilistic`
:param model: the model
:return: True if the model is probabilistic, False otherwise
"""
return model.probabilistic
# class OneVsAll:
# """
# Allows any binary quantifier to perform quantification on single-label datasets. The method maintains one binary
# quantifier for each class, and then l1-normalizes the outputs so that the class prevelences sum up to 1.
# """
#
# def __init__(self, binary_method, n_jobs=-1):
# self.binary_method = binary_method
# self.n_jobs = n_jobs
#
# def fit(self, data: LabelledCollection, **kwargs):
# assert not data.binary, f'{self.__class__.__name__} expect non-binary data'
# assert isinstance(self.binary_method, BaseQuantifier), f'{self.binary_method} does not seem to be a Quantifier'
# self.class_method = {c: deepcopy(self.binary_method) for c in data.classes_}
# Parallel(n_jobs=self.n_jobs, backend='threading')(
# delayed(self._delayed_binary_fit)(c, self.class_method, data, **kwargs) for c in data.classes_
# )
# return self
#
# def quantify(self, X, *args):
# prevalences = np.asarray(
# Parallel(n_jobs=self.n_jobs, backend='threading')(
# delayed(self._delayed_binary_predict)(c, self.class_method, X) for c in self.classes
# )
# )
# return F.normalize_prevalence(prevalences)
#
# @property
# def classes(self):
# return sorted(self.class_method.keys())
#
# def set_params(self, **parameters):
# self.binary_method.set_params(**parameters)
#
# def get_params(self, deep=True):
# return self.binary_method.get_params()
#
# def _delayed_binary_predict(self, c, learners, X):
# return learners[c].quantify(X)[:,1] # the mean is the estimation for the positive class prevalence
#
# def _delayed_binary_fit(self, c, learners, data, **kwargs):
# bindata = LabelledCollection(data.instances, data.labels == c, n_classes=2)
# learners[c].fit(bindata, **kwargs)
def classes_(self):
return sorted(self.dict_binary_quantifiers.keys())
def _delayed_binary_predict(self, c, X):
return self.dict_binary_quantifiers[c].quantify(X)[1]
def _delayed_binary_fit(self, c, data):
bindata = LabelledCollection(data.instances, data.labels == c, classes=[False, True])
self.dict_binary_quantifiers[c].fit(bindata)

View File

@ -9,7 +9,6 @@ from tqdm import tqdm
import quapy as qp
from quapy import functional as F
from quapy.data import LabelledCollection
from quapy.evaluation import evaluate
from quapy.model_selection import GridSearchQ
try:
@ -73,7 +72,7 @@ class Ensemble(BaseQuantifier):
policy='ave',
max_sample_size=None,
val_split:Union[qp.data.LabelledCollection, float]=None,
n_jobs=1,
n_jobs=None,
verbose=False):
assert policy in Ensemble.VALID_POLICIES, \
f'unknown policy={policy}; valid are {Ensemble.VALID_POLICIES}'
@ -85,7 +84,7 @@ class Ensemble(BaseQuantifier):
self.red_size = red_size
self.policy = policy
self.val_split = val_split
self.n_jobs = n_jobs
self.n_jobs = qp._get_njobs(n_jobs)
self.post_proba_fn = None
self.verbose = verbose
self.max_sample_size = max_sample_size
@ -147,15 +146,15 @@ class Ensemble(BaseQuantifier):
This function should not be used within :class:`quapy.model_selection.GridSearchQ` (is here for compatibility
with the abstract class).
Instead, use `Ensemble(GridSearchQ(q),...)`, with `q` a Quantifier (recommended), or
`Ensemble(Q(GridSearchCV(l)))` with `Q` a quantifier class that has a learner `l` optimized for
classification (not recommended).
`Ensemble(Q(GridSearchCV(l)))` with `Q` a quantifier class that has a classifier `l` optimized for
classification (not recommended).
:param parameters: dictionary
:return: raises an Exception
"""
raise NotImplementedError(f'{self.__class__.__name__} should not be used within GridSearchQ; '
f'instead, use Ensemble(GridSearchQ(q),...), with q a Quantifier (recommended), '
f'or Ensemble(Q(GridSearchCV(l))) with Q a quantifier class that has a learner '
f'or Ensemble(Q(GridSearchCV(l))) with Q a quantifier class that has a classifier '
f'l optimized for classification (not recommended).')
def get_params(self, deep=True):
@ -163,11 +162,13 @@ class Ensemble(BaseQuantifier):
This function should not be used within :class:`quapy.model_selection.GridSearchQ` (is here for compatibility
with the abstract class).
Instead, use `Ensemble(GridSearchQ(q),...)`, with `q` a Quantifier (recommended), or
`Ensemble(Q(GridSearchCV(l)))` with `Q` a quantifier class that has a learner `l` optimized for
classification (not recommended).
`Ensemble(Q(GridSearchCV(l)))` with `Q` a quantifier class that has a classifier `l` optimized for
classification (not recommended).
:param deep: for compatibility with scikit-learn
:return: raises an Exception
"""
raise NotImplementedError()
def _accuracy_policy(self, error_name):
@ -176,11 +177,12 @@ class Ensemble(BaseQuantifier):
For each model in the ensemble, the performance is measured in terms of _error_name_ on the quantification of
the samples used for training the rest of the models in the ensemble.
"""
from quapy.evaluation import evaluate_on_samples
error = qp.error.from_name(error_name)
tests = [m[3] for m in self.ensemble]
scores = []
for i, model in enumerate(self.ensemble):
scores.append(evaluate(model[0], tests[:i] + tests[i + 1:], error, self.n_jobs))
scores.append(evaluate_on_samples(model[0], tests[:i] + tests[i + 1:], error))
order = np.argsort(scores)
self.ensemble = _select_k(self.ensemble, order, k=self.red_size)
@ -234,19 +236,6 @@ class Ensemble(BaseQuantifier):
order = np.argsort(dist)
return _select_k(predictions, order, k=self.red_size)
@property
def classes_(self):
return self.base_quantifier.classes_
@property
def binary(self):
"""
Returns a boolean indicating whether the base quantifiers are binary or not
:return: boolean
"""
return self.base_quantifier.binary
@property
def aggregative(self):
"""
@ -339,18 +328,18 @@ def _draw_simplex(ndim, min_val, max_trials=100):
f'>= {min_val} is unlikely (it failed after {max_trials} trials)')
def _instantiate_ensemble(learner, base_quantifier_class, param_grid, optim, param_model_sel, **kwargs):
def _instantiate_ensemble(classifier, base_quantifier_class, param_grid, optim, param_model_sel, **kwargs):
if optim is None:
base_quantifier = base_quantifier_class(learner)
base_quantifier = base_quantifier_class(classifier)
elif optim in qp.error.CLASSIFICATION_ERROR:
if optim == qp.error.f1e:
scoring = make_scorer(f1_score)
elif optim == qp.error.acce:
scoring = make_scorer(accuracy_score)
learner = GridSearchCV(learner, param_grid, scoring=scoring)
base_quantifier = base_quantifier_class(learner)
classifier = GridSearchCV(classifier, param_grid, scoring=scoring)
base_quantifier = base_quantifier_class(classifier)
else:
base_quantifier = GridSearchQ(base_quantifier_class(learner),
base_quantifier = GridSearchQ(base_quantifier_class(classifier),
param_grid=param_grid,
**param_model_sel,
error=optim)
@ -370,7 +359,7 @@ def _check_error(error):
f'the name of an error function in {qp.error.ERROR_NAMES}')
def ensembleFactory(learner, base_quantifier_class, param_grid=None, optim=None, param_model_sel: dict = None,
def ensembleFactory(classifier, base_quantifier_class, param_grid=None, optim=None, param_model_sel: dict = None,
**kwargs):
"""
Ensemble factory. Provides a unified interface for instantiating ensembles that can be optimized (via model
@ -403,7 +392,7 @@ def ensembleFactory(learner, base_quantifier_class, param_grid=None, optim=None,
>>>
>>> ensembleFactory(LogisticRegression(), PACC, optim='mae', policy='mae', **common)
:param learner: sklearn's Estimator that generates a classifier
:param classifier: sklearn's Estimator that generates a classifier
:param base_quantifier_class: a class of quantifiers
:param param_grid: a dictionary with the grid of parameters to optimize for
:param optim: a valid quantification or classification error, or a string name of it
@ -418,21 +407,21 @@ def ensembleFactory(learner, base_quantifier_class, param_grid=None, optim=None,
if param_model_sel is None:
raise ValueError(f'param_model_sel is None but optim was requested.')
error = _check_error(optim)
return _instantiate_ensemble(learner, base_quantifier_class, param_grid, error, param_model_sel, **kwargs)
return _instantiate_ensemble(classifier, base_quantifier_class, param_grid, error, param_model_sel, **kwargs)
def ECC(learner, param_grid=None, optim=None, param_mod_sel=None, **kwargs):
def ECC(classifier, param_grid=None, optim=None, param_mod_sel=None, **kwargs):
"""
Implements an ensemble of :class:`quapy.method.aggregative.CC` quantifiers, as used by
`Pérez-Gállego et al., 2019 <https://www.sciencedirect.com/science/article/pii/S1566253517303652>`_.
Equivalent to:
>>> ensembleFactory(learner, CC, param_grid, optim, param_mod_sel, **kwargs)
>>> ensembleFactory(classifier, CC, param_grid, optim, param_mod_sel, **kwargs)
See :meth:`ensembleFactory` for further details.
:param learner: sklearn's Estimator that generates a classifier
:param classifier: sklearn's Estimator that generates a classifier
:param param_grid: a dictionary with the grid of parameters to optimize for
:param optim: a valid quantification or classification error, or a string name of it
:param param_model_sel: a dictionary containing any keyworded argument to pass to
@ -441,21 +430,21 @@ def ECC(learner, param_grid=None, optim=None, param_mod_sel=None, **kwargs):
:return: an instance of :class:`Ensemble`
"""
return ensembleFactory(learner, CC, param_grid, optim, param_mod_sel, **kwargs)
return ensembleFactory(classifier, CC, param_grid, optim, param_mod_sel, **kwargs)
def EACC(learner, param_grid=None, optim=None, param_mod_sel=None, **kwargs):
def EACC(classifier, param_grid=None, optim=None, param_mod_sel=None, **kwargs):
"""
Implements an ensemble of :class:`quapy.method.aggregative.ACC` quantifiers, as used by
`Pérez-Gállego et al., 2019 <https://www.sciencedirect.com/science/article/pii/S1566253517303652>`_.
Equivalent to:
>>> ensembleFactory(learner, ACC, param_grid, optim, param_mod_sel, **kwargs)
>>> ensembleFactory(classifier, ACC, param_grid, optim, param_mod_sel, **kwargs)
See :meth:`ensembleFactory` for further details.
:param learner: sklearn's Estimator that generates a classifier
:param classifier: sklearn's Estimator that generates a classifier
:param param_grid: a dictionary with the grid of parameters to optimize for
:param optim: a valid quantification or classification error, or a string name of it
:param param_model_sel: a dictionary containing any keyworded argument to pass to
@ -464,20 +453,20 @@ def EACC(learner, param_grid=None, optim=None, param_mod_sel=None, **kwargs):
:return: an instance of :class:`Ensemble`
"""
return ensembleFactory(learner, ACC, param_grid, optim, param_mod_sel, **kwargs)
return ensembleFactory(classifier, ACC, param_grid, optim, param_mod_sel, **kwargs)
def EPACC(learner, param_grid=None, optim=None, param_mod_sel=None, **kwargs):
def EPACC(classifier, param_grid=None, optim=None, param_mod_sel=None, **kwargs):
"""
Implements an ensemble of :class:`quapy.method.aggregative.PACC` quantifiers.
Equivalent to:
>>> ensembleFactory(learner, PACC, param_grid, optim, param_mod_sel, **kwargs)
>>> ensembleFactory(classifier, PACC, param_grid, optim, param_mod_sel, **kwargs)
See :meth:`ensembleFactory` for further details.
:param learner: sklearn's Estimator that generates a classifier
:param classifier: sklearn's Estimator that generates a classifier
:param param_grid: a dictionary with the grid of parameters to optimize for
:param optim: a valid quantification or classification error, or a string name of it
:param param_model_sel: a dictionary containing any keyworded argument to pass to
@ -486,21 +475,21 @@ def EPACC(learner, param_grid=None, optim=None, param_mod_sel=None, **kwargs):
:return: an instance of :class:`Ensemble`
"""
return ensembleFactory(learner, PACC, param_grid, optim, param_mod_sel, **kwargs)
return ensembleFactory(classifier, PACC, param_grid, optim, param_mod_sel, **kwargs)
def EHDy(learner, param_grid=None, optim=None, param_mod_sel=None, **kwargs):
def EHDy(classifier, param_grid=None, optim=None, param_mod_sel=None, **kwargs):
"""
Implements an ensemble of :class:`quapy.method.aggregative.HDy` quantifiers, as used by
`Pérez-Gállego et al., 2019 <https://www.sciencedirect.com/science/article/pii/S1566253517303652>`_.
Equivalent to:
>>> ensembleFactory(learner, HDy, param_grid, optim, param_mod_sel, **kwargs)
>>> ensembleFactory(classifier, HDy, param_grid, optim, param_mod_sel, **kwargs)
See :meth:`ensembleFactory` for further details.
:param learner: sklearn's Estimator that generates a classifier
:param classifier: sklearn's Estimator that generates a classifier
:param param_grid: a dictionary with the grid of parameters to optimize for
:param optim: a valid quantification or classification error, or a string name of it
:param param_model_sel: a dictionary containing any keyworded argument to pass to
@ -509,20 +498,20 @@ def EHDy(learner, param_grid=None, optim=None, param_mod_sel=None, **kwargs):
:return: an instance of :class:`Ensemble`
"""
return ensembleFactory(learner, HDy, param_grid, optim, param_mod_sel, **kwargs)
return ensembleFactory(classifier, HDy, param_grid, optim, param_mod_sel, **kwargs)
def EEMQ(learner, param_grid=None, optim=None, param_mod_sel=None, **kwargs):
def EEMQ(classifier, param_grid=None, optim=None, param_mod_sel=None, **kwargs):
"""
Implements an ensemble of :class:`quapy.method.aggregative.EMQ` quantifiers.
Equivalent to:
>>> ensembleFactory(learner, EMQ, param_grid, optim, param_mod_sel, **kwargs)
>>> ensembleFactory(classifier, EMQ, param_grid, optim, param_mod_sel, **kwargs)
See :meth:`ensembleFactory` for further details.
:param learner: sklearn's Estimator that generates a classifier
:param classifier: sklearn's Estimator that generates a classifier
:param param_grid: a dictionary with the grid of parameters to optimize for
:param optim: a valid quantification or classification error, or a string name of it
:param param_model_sel: a dictionary containing any keyworded argument to pass to
@ -531,4 +520,4 @@ def EEMQ(learner, param_grid=None, optim=None, param_mod_sel=None, **kwargs):
:return: an instance of :class:`Ensemble`
"""
return ensembleFactory(learner, EMQ, param_grid, optim, param_mod_sel, **kwargs)
return ensembleFactory(classifier, EMQ, param_grid, optim, param_mod_sel, **kwargs)

View File

@ -6,6 +6,7 @@ import torch
from torch.nn import MSELoss
from torch.nn.functional import relu
from quapy.protocol import UPP
from quapy.method.aggregative import *
from quapy.util import EarlyStop
@ -31,17 +32,18 @@ class QuaNetTrainer(BaseQuantifier):
>>>
>>> # the text classifier is a CNN trained by NeuralClassifierTrainer
>>> cnn = CNNnet(dataset.vocabulary_size, dataset.n_classes)
>>> learner = NeuralClassifierTrainer(cnn, device='cuda')
>>> classifier = NeuralClassifierTrainer(cnn, device='cuda')
>>>
>>> # train QuaNet (QuaNet is an alias to QuaNetTrainer)
>>> model = QuaNet(learner, qp.environ['SAMPLE_SIZE'], device='cuda')
>>> model = QuaNet(classifier, qp.environ['SAMPLE_SIZE'], device='cuda')
>>> model.fit(dataset.training)
>>> estim_prevalence = model.quantify(dataset.test.instances)
:param learner: an object implementing `fit` (i.e., that can be trained on labelled data),
:param classifier: an object implementing `fit` (i.e., that can be trained on labelled data),
`predict_proba` (i.e., that can generate posterior probabilities of unlabelled examples) and
`transform` (i.e., that can generate embedded representations of the unlabelled instances).
:param sample_size: integer, the sample size
:param sample_size: integer, the sample size; default is None, meaning that the sample size should be
taken from qp.environ["SAMPLE_SIZE"]
:param n_epochs: integer, maximum number of training epochs
:param tr_iter_per_poch: integer, number of training iterations before considering an epoch complete
:param va_iter_per_poch: integer, number of validation iterations to perform after each epoch
@ -60,8 +62,8 @@ class QuaNetTrainer(BaseQuantifier):
"""
def __init__(self,
learner,
sample_size,
classifier,
sample_size=None,
n_epochs=100,
tr_iter_per_poch=500,
va_iter_per_poch=100,
@ -76,14 +78,14 @@ class QuaNetTrainer(BaseQuantifier):
checkpointname=None,
device='cuda'):
assert hasattr(learner, 'transform'), \
f'the learner {learner.__class__.__name__} does not seem to be able to produce document embeddings ' \
assert hasattr(classifier, 'transform'), \
f'the classifier {classifier.__class__.__name__} does not seem to be able to produce document embeddings ' \
f'since it does not implement the method "transform"'
assert hasattr(learner, 'predict_proba'), \
f'the learner {learner.__class__.__name__} does not seem to be able to produce posterior probabilities ' \
assert hasattr(classifier, 'predict_proba'), \
f'the classifier {classifier.__class__.__name__} does not seem to be able to produce posterior probabilities ' \
f'since it does not implement the method "predict_proba"'
self.learner = learner
self.sample_size = sample_size
self.classifier = classifier
self.sample_size = qp._get_sample_size(sample_size)
self.n_epochs = n_epochs
self.tr_iter = tr_iter_per_poch
self.va_iter = va_iter_per_poch
@ -105,26 +107,26 @@ class QuaNetTrainer(BaseQuantifier):
self.checkpoint = os.path.join(checkpointdir, checkpointname)
self.device = torch.device(device)
self.__check_params_colision(self.quanet_params, self.learner.get_params())
self.__check_params_colision(self.quanet_params, self.classifier.get_params())
self._classes_ = None
def fit(self, data: LabelledCollection, fit_learner=True):
def fit(self, data: LabelledCollection, fit_classifier=True):
"""
Trains QuaNet.
:param data: the training data on which to train QuaNet. If `fit_learner=True`, the data will be split in
:param data: the training data on which to train QuaNet. If `fit_classifier=True`, the data will be split in
40/40/20 for training the classifier, training QuaNet, and validating QuaNet, respectively. If
`fit_learner=False`, the data will be split in 66/34 for training QuaNet and validating it, respectively.
:param fit_learner: if True, trains the classifier on a split containing 40% of the data
`fit_classifier=False`, the data will be split in 66/34 for training QuaNet and validating it, respectively.
:param fit_classifier: if True, trains the classifier on a split containing 40% of the data
:return: self
"""
self._classes_ = data.classes_
os.makedirs(self.checkpointdir, exist_ok=True)
if fit_learner:
if fit_classifier:
classifier_data, unused_data = data.split_stratified(0.4)
train_data, valid_data = unused_data.split_stratified(0.66) # 0.66 split of 60% makes 40% and 20%
self.learner.fit(*classifier_data.Xy)
self.classifier.fit(*classifier_data.Xy)
else:
classifier_data = None
train_data, valid_data = data.split_stratified(0.66)
@ -133,21 +135,21 @@ class QuaNetTrainer(BaseQuantifier):
self.tr_prev = data.prevalence()
# compute the posterior probabilities of the instances
valid_posteriors = self.learner.predict_proba(valid_data.instances)
train_posteriors = self.learner.predict_proba(train_data.instances)
valid_posteriors = self.classifier.predict_proba(valid_data.instances)
train_posteriors = self.classifier.predict_proba(train_data.instances)
# turn instances' original representations into embeddings
valid_data_embed = LabelledCollection(self.learner.transform(valid_data.instances), valid_data.labels, self._classes_)
train_data_embed = LabelledCollection(self.learner.transform(train_data.instances), train_data.labels, self._classes_)
valid_data_embed = LabelledCollection(self.classifier.transform(valid_data.instances), valid_data.labels, self._classes_)
train_data_embed = LabelledCollection(self.classifier.transform(train_data.instances), train_data.labels, self._classes_)
self.quantifiers = {
'cc': CC(self.learner).fit(None, fit_learner=False),
'acc': ACC(self.learner).fit(None, fit_learner=False, val_split=valid_data),
'pcc': PCC(self.learner).fit(None, fit_learner=False),
'pacc': PACC(self.learner).fit(None, fit_learner=False, val_split=valid_data),
'cc': CC(self.classifier).fit(None, fit_classifier=False),
'acc': ACC(self.classifier).fit(None, fit_classifier=False, val_split=valid_data),
'pcc': PCC(self.classifier).fit(None, fit_classifier=False),
'pacc': PACC(self.classifier).fit(None, fit_classifier=False, val_split=valid_data),
}
if classifier_data is not None:
self.quantifiers['emq'] = EMQ(self.learner).fit(classifier_data, fit_learner=False)
self.quantifiers['emq'] = EMQ(self.classifier).fit(classifier_data, fit_classifier=False)
self.status = {
'tr-loss': -1,
@ -191,7 +193,7 @@ class QuaNetTrainer(BaseQuantifier):
label_predictions = np.argmax(posteriors, axis=-1)
prevs_estim = []
for quantifier in self.quantifiers.values():
predictions = posteriors if quantifier.probabilistic else label_predictions
predictions = posteriors if isinstance(quantifier, AggregativeProbabilisticQuantifier) else label_predictions
prevs_estim.extend(quantifier.aggregate(predictions))
# there is no real need for adding static estims like the TPR or FPR from training since those are constant
@ -199,8 +201,8 @@ class QuaNetTrainer(BaseQuantifier):
return prevs_estim
def quantify(self, instances):
posteriors = self.learner.predict_proba(instances)
embeddings = self.learner.transform(instances)
posteriors = self.classifier.predict_proba(instances)
embeddings = self.classifier.transform(instances)
quant_estims = self._get_aggregative_estims(posteriors)
self.quanet.eval()
with torch.no_grad():
@ -216,16 +218,13 @@ class QuaNetTrainer(BaseQuantifier):
self.quanet.train(mode=train)
losses = []
mae_errors = []
if train==False:
prevpoints = F.get_nprevpoints_approximation(iterations, self.quanet.n_classes)
iterations = F.num_prevalence_combinations(prevpoints, self.quanet.n_classes)
with qp.util.temp_seed(0):
sampling_index_gen = data.artificial_sampling_index_generator(self.sample_size, prevpoints)
else:
sampling_index_gen = [data.sampling_index(self.sample_size, *prev) for prev in
F.uniform_simplex_sampling(data.n_classes, iterations)]
pbar = tqdm(sampling_index_gen, total=iterations) if train else sampling_index_gen
sampler = UPP(
data,
sample_size=self.sample_size,
repeats=iterations,
random_state=None if train else 0 # different samples during train, same samples during validation
)
pbar = tqdm(sampler.samples_parameters(), total=sampler.total())
for it, index in enumerate(pbar):
sample_data = data.sampling_from_index(index)
sample_posteriors = posteriors[index]
@ -264,7 +263,7 @@ class QuaNetTrainer(BaseQuantifier):
f'patience={early_stop.patience}/{early_stop.PATIENCE_LIMIT}')
def get_params(self, deep=True):
return {**self.learner.get_params(), **self.quanet_params}
return {**self.classifier.get_params(), **self.quanet_params}
def set_params(self, **parameters):
learner_params = {}
@ -273,7 +272,7 @@ class QuaNetTrainer(BaseQuantifier):
self.quanet_params[key] = val
else:
learner_params[key] = val
self.learner.set_params(**learner_params)
self.classifier.set_params(**learner_params)
def __check_params_colision(self, quanet_params, learner_params):
quanet_keys = set(quanet_params.keys())
@ -281,7 +280,7 @@ class QuaNetTrainer(BaseQuantifier):
intersection = quanet_keys.intersection(learner_keys)
if len(intersection) > 0:
raise ValueError(f'the use of parameters {intersection} is ambiguous sine those can refer to '
f'the parameters of QuaNet or the learner {self.learner.__class__.__name__}')
f'the parameters of QuaNet or the learner {self.classifier.__class__.__name__}')
def clean_checkpoint(self):
"""

View File

@ -21,7 +21,6 @@ class MaximumLikelihoodPrevalenceEstimation(BaseQuantifier):
:param data: the training sample
:return: self
"""
self._classes_ = data.classes_
self.estimated_prevalence = data.prevalence()
return self
@ -34,29 +33,3 @@ class MaximumLikelihoodPrevalenceEstimation(BaseQuantifier):
"""
return self.estimated_prevalence
@property
def classes_(self):
"""
Number of classes
:return: integer
"""
return self._classes_
def get_params(self, deep=True):
"""
Does nothing, since this learner has no parameters.
:param deep: for compatibility with sklearn
:return: `None`
"""
return None
def set_params(self, **parameters):
"""
Does nothing, since this learner has no parameters.
:param parameters: dictionary of param-value pairs (ignored)
"""
pass

View File

@ -4,14 +4,14 @@ from copy import deepcopy
from typing import Union, Callable
import numpy as np
from sklearn import clone
import quapy as qp
from quapy import evaluation
from quapy.protocol import AbstractProtocol, OnLabelledCollectionProtocol
from quapy.data.base import LabelledCollection
from quapy.evaluation import artificial_prevalence_prediction, natural_prevalence_prediction, gen_prevalence_prediction
from quapy.method.aggregative import BaseQuantifier
import inspect
from util import _check_sample_size
from time import time
class GridSearchQ(BaseQuantifier):
@ -23,33 +23,11 @@ class GridSearchQ(BaseQuantifier):
:param model: the quantifier to optimize
:type model: BaseQuantifier
:param param_grid: a dictionary with keys the parameter names and values the list of values to explore
:param sample_size: the size of the samples to extract from the validation set (ignored if protocl='gen')
:param protocol: either 'app' for the artificial prevalence protocol, 'npp' for the natural prevalence
protocol, or 'gen' for using a custom sampling generator function
:param n_prevpoints: if specified, indicates the number of equally distant points to extract from the interval
[0,1] in order to define the prevalences of the samples; e.g., if n_prevpoints=5, then the prevalences for
each class will be explored in [0.00, 0.25, 0.50, 0.75, 1.00]. If not specified, then eval_budget is requested.
Ignored if protocol!='app'.
:param n_repetitions: the number of repetitions for each combination of prevalences. This parameter is ignored
for the protocol='app' if eval_budget is set and is lower than the number of combinations that would be
generated using the value assigned to n_prevpoints (for the current number of classes and n_repetitions).
Ignored for protocol='npp' and protocol='gen' (use eval_budget for setting a maximum number of samples in
those cases).
:param eval_budget: if specified, sets a ceil on the number of evaluations to perform for each hyper-parameter
combination. For example, if protocol='app', there are 3 classes, n_repetitions=1 and eval_budget=20, then
n_prevpoints will be set to 5, since this will generate 15 different prevalences, i.e., [0, 0, 1],
[0, 0.25, 0.75], [0, 0.5, 0.5] ... [1, 0, 0], and since setting it to 6 would generate more than
20. When protocol='gen', indicates the maximum number of samples to generate, but less samples will be
generated if the generator yields less samples.
:param protocol: a sample generation protocol, an instance of :class:`quapy.protocol.AbstractProtocol`
:param error: an error function (callable) or a string indicating the name of an error function (valid ones
are those in qp.error.QUANTIFICATION_ERROR
are those in :class:`quapy.error.QUANTIFICATION_ERROR`
:param refit: whether or not to refit the model on the whole labelled collection (training+validation) with
the best chosen hyperparameter combination. Ignored if protocol='gen'
:param val_split: either a LabelledCollection on which to test the performance of the different settings, or
a float in [0,1] indicating the proportion of labelled data to extract from the training set, or a callable
returning a generator function each time it is invoked (only for protocol='gen').
:param n_jobs: number of parallel jobs
:param random_seed: set the seed of the random generator to replicate experiments. Ignored if protocol='gen'.
:param timeout: establishes a timer (in seconds) for each of the hyperparameters configurations being tested.
Whenever a run takes longer than this timer, that configuration will be ignored. If all configurations end up
being ignored, a TimeoutError exception is raised. If -1 (default) then no time bound is set.
@ -59,65 +37,27 @@ class GridSearchQ(BaseQuantifier):
def __init__(self,
model: BaseQuantifier,
param_grid: dict,
sample_size: Union[int, None] = None,
protocol='app',
n_prevpoints: int = None,
n_repetitions: int = 1,
eval_budget: int = None,
protocol: AbstractProtocol,
error: Union[Callable, str] = qp.error.mae,
refit=True,
val_split=0.4,
n_jobs=1,
random_seed=42,
timeout=-1,
n_jobs=None,
verbose=False):
self.model = model
self.param_grid = param_grid
self.sample_size = sample_size
self.protocol = protocol.lower()
self.n_prevpoints = n_prevpoints
self.n_repetitions = n_repetitions
self.eval_budget = eval_budget
self.protocol = protocol
self.refit = refit
self.val_split = val_split
self.n_jobs = n_jobs
self.random_seed = random_seed
self.timeout = timeout
self.n_jobs = qp._get_njobs(n_jobs)
self.verbose = verbose
self.__check_error(error)
assert self.protocol in {'app', 'npp', 'gen'}, \
'unknown protocol: valid ones are "app" or "npp" for the "artificial" or the "natural" prevalence ' \
'protocols. Use protocol="gen" when passing a generator function thorough val_split that yields a ' \
'sample (instances) and their prevalence (ndarray) at each iteration.'
assert self.eval_budget is None or isinstance(self.eval_budget, int)
if self.protocol in ['npp', 'gen']:
if self.protocol=='npp' and (self.eval_budget is None or self.eval_budget <= 0):
raise ValueError(f'when protocol="npp" the parameter eval_budget should be '
f'indicated (and should be >0).')
if self.n_repetitions != 1:
print('[warning] n_repetitions has been set and will be ignored for the selected protocol')
assert isinstance(protocol, AbstractProtocol), 'unknown protocol'
def _sout(self, msg):
if self.verbose:
print(f'[{self.__class__.__name__}]: {msg}')
def __check_training_validation(self, training, validation):
if isinstance(validation, LabelledCollection):
return training, validation
elif isinstance(validation, float):
assert 0. < validation < 1., 'validation proportion should be in (0,1)'
training, validation = training.split_stratified(train_prop=1 - validation, random_state=self.random_seed)
return training, validation
elif self.protocol=='gen' and inspect.isgenerator(validation()):
return training, validation
else:
raise ValueError(f'"validation" must either be a LabelledCollection or a float in (0,1) indicating the'
f'proportion of training documents to extract (type found: {type(validation)}). '
f'Optionally, "validation" can be a callable function returning a generator that yields '
f'the sample instances along with their true prevalence at each iteration by '
f'setting protocol="gen".')
def __check_error(self, error):
if error in qp.error.QUANTIFICATION_ERROR:
self.error = error
@ -129,95 +69,103 @@ class GridSearchQ(BaseQuantifier):
raise ValueError(f'unexpected error type; must either be a callable function or a str representing\n'
f'the name of an error function in {qp.error.QUANTIFICATION_ERROR_NAMES}')
def __generate_predictions(self, model, val_split):
commons = {
'n_repetitions': self.n_repetitions,
'n_jobs': self.n_jobs,
'random_seed': self.random_seed,
'verbose': False
}
if self.protocol == 'app':
return artificial_prevalence_prediction(
model, val_split, self.sample_size,
n_prevpoints=self.n_prevpoints,
eval_budget=self.eval_budget,
**commons
)
elif self.protocol == 'npp':
return natural_prevalence_prediction(
model, val_split, self.sample_size,
**commons)
elif self.protocol == 'gen':
return gen_prevalence_prediction(model, gen_fn=val_split, eval_budget=self.eval_budget)
else:
raise ValueError('unknown protocol')
def fit(self, training: LabelledCollection, val_split: Union[LabelledCollection, float, Callable] = None):
def fit(self, training: LabelledCollection):
""" Learning routine. Fits methods with all combinations of hyperparameters and selects the one minimizing
the error metric.
:param training: the training set on which to optimize the hyperparameters
:param val_split: either a LabelledCollection on which to test the performance of the different settings, or
a float in [0,1] indicating the proportion of labelled data to extract from the training set
:return: self
"""
if val_split is None:
val_split = self.val_split
training, val_split = self.__check_training_validation(training, val_split)
if self.protocol != 'gen':
self.sample_size = _check_sample_size(self.sample_size)
params_keys = list(self.param_grid.keys())
params_values = list(self.param_grid.values())
model = self.model
protocol = self.protocol
self.param_scores_ = {}
self.best_score_ = None
tinit = time()
hyper = [dict({k: val[i] for i, k in enumerate(params_keys)}) for val in itertools.product(*params_values)]
self._sout(f'starting model selection with {self.n_jobs =}')
#pass a seed to parallel so it is set in clild processes
scores = qp.util.parallel(
self._delayed_eval,
((params, training) for params in hyper),
seed=qp.environ.get('_R_SEED', None),
n_jobs=self.n_jobs
)
for params, score, model in scores:
if score is not None:
if self.best_score_ is None or score < self.best_score_:
self.best_score_ = score
self.best_params_ = params
self.best_model_ = model
self.param_scores_[str(params)] = score
else:
self.param_scores_[str(params)] = 'timeout'
tend = time()-tinit
if self.best_score_ is None:
raise TimeoutError('no combination of hyperparameters seem to work')
self._sout(f'optimization finished: best params {self.best_params_} (score={self.best_score_:.5f}) '
f'[took {tend:.4f}s]')
if self.refit:
if isinstance(protocol, OnLabelledCollectionProtocol):
self._sout(f'refitting on the whole development set')
self.best_model_.fit(training + protocol.get_labelled_collection())
else:
raise RuntimeWarning(f'"refit" was requested, but the protocol does not '
f'implement the {OnLabelledCollectionProtocol.__name__} interface')
return self
def _delayed_eval(self, args):
params, training = args
protocol = self.protocol
error = self.error
if self.timeout > 0:
def handler(signum, frame):
self._sout('timeout reached')
raise TimeoutError()
signal.signal(signal.SIGALRM, handler)
self.param_scores_ = {}
self.best_score_ = None
some_timeouts = False
for values in itertools.product(*params_values):
params = dict({k: values[i] for i, k in enumerate(params_keys)})
tinit = time()
if self.timeout > 0:
signal.alarm(self.timeout)
try:
model = deepcopy(self.model)
# overrides default parameters with the parameters being explored at this iteration
model.set_params(**params)
model.fit(training)
score = evaluation.evaluate(model, protocol=protocol, error_metric=error)
ttime = time()-tinit
self._sout(f'hyperparams={params}\t got {error.__name__} score {score:.5f} [took {ttime:.4f}s]')
if self.timeout > 0:
signal.alarm(self.timeout)
signal.alarm(0)
except TimeoutError:
self._sout(f'timeout ({self.timeout}s) reached for config {params}')
score = None
except ValueError as e:
self._sout(f'the combination of hyperparameters {params} is invalid')
raise e
except Exception as e:
self._sout(f'something went wrong for config {params}; skipping:')
self._sout(f'\tException: {e}')
score = None
try:
# overrides default parameters with the parameters being explored at this iteration
model.set_params(**params)
model.fit(training)
true_prevalences, estim_prevalences = self.__generate_predictions(model, val_split)
score = self.error(true_prevalences, estim_prevalences)
return params, score, model
self._sout(f'checking hyperparams={params} got {self.error.__name__} score {score:.5f}')
if self.best_score_ is None or score < self.best_score_:
self.best_score_ = score
self.best_params_ = params
self.best_model_ = deepcopy(model)
self.param_scores_[str(params)] = score
if self.timeout > 0:
signal.alarm(0)
except TimeoutError:
print(f'timeout reached for config {params}')
some_timeouts = True
if self.best_score_ is None and some_timeouts:
raise TimeoutError('all jobs took more than the timeout time to end')
self._sout(f'optimization finished: best params {self.best_params_} (score={self.best_score_:.5f})')
if self.refit:
self._sout(f'refitting on the whole development set')
self.best_model_.fit(training + val_split)
return self
def quantify(self, instances):
"""Estimate class prevalence values using the best model found after calling the :meth:`fit` method.
@ -229,14 +177,6 @@ class GridSearchQ(BaseQuantifier):
assert hasattr(self, 'best_model_'), 'quantify called before fit'
return self.best_model().quantify(instances)
@property
def classes_(self):
"""
Classes on which the quantifier has been trained on.
:return: a ndarray of shape `(n_classes)` with the class identifiers
"""
return self.best_model().classes_
def set_params(self, **parameters):
"""Sets the hyper-parameters to explore.
@ -262,3 +202,30 @@ class GridSearchQ(BaseQuantifier):
if hasattr(self, 'best_model_'):
return self.best_model_
raise ValueError('best_model called before fit')
def cross_val_predict(quantifier: BaseQuantifier, data: LabelledCollection, nfolds=3, random_state=0):
"""
Akin to `scikit-learn's cross_val_predict <https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.cross_val_predict.html>`_
but for quantification.
:param quantifier: a quantifier issuing class prevalence values
:param data: a labelled collection
:param nfolds: number of folds for k-fold cross validation generation
:param random_state: random seed for reproducibility
:return: a vector of class prevalence values
"""
total_prev = np.zeros(shape=data.n_classes)
for train, test in data.kFCV(nfolds=nfolds, random_state=random_state):
quantifier.fit(train)
fold_prev = quantifier.quantify(test.X)
rel_size = len(test.X)/len(data)
total_prev += fold_prev*rel_size
return total_prev

View File

@ -4,6 +4,8 @@ from matplotlib.cm import get_cmap
import numpy as np
from matplotlib import cm
from scipy.stats import ttest_ind_from_stats
from matplotlib.ticker import ScalarFormatter
import math
import quapy as qp
@ -49,9 +51,10 @@ def binary_diagonal(method_names, true_prevs, estim_prevs, pos_class=1, title=No
table = {method_name:[true_prev, estim_prev] for method_name, true_prev, estim_prev in order}
order = [(method_name, *table[method_name]) for method_name in method_order]
#cm = plt.get_cmap('tab20')
#NUM_COLORS = len(method_names)
#ax.set_prop_cycle(color=[cm(1. * i / NUM_COLORS) for i in range(NUM_COLORS)])
NUM_COLORS = len(method_names)
if NUM_COLORS>10:
cm = plt.get_cmap('tab20')
ax.set_prop_cycle(color=[cm(1. * i / NUM_COLORS) for i in range(NUM_COLORS)])
for method, true_prev, estim_prev in order:
true_prev = true_prev[:,pos_class]
estim_prev = estim_prev[:,pos_class]
@ -74,13 +77,12 @@ def binary_diagonal(method_names, true_prevs, estim_prevs, pos_class=1, title=No
ax.set_xlim(0, 1)
if legend:
ax.legend(loc='center left', bbox_to_anchor=(1, 0.5))
# box = ax.get_position()
# ax.set_position([box.x0, box.y0, box.width * 0.8, box.height])
ax.legend(loc='center left', bbox_to_anchor=(1, 0.5))
# ax.set_position([box.x0, box.y0, box.width * 0.8, box.height])
#ax.legend(loc='lower center',
# bbox_to_anchor=(1, -0.5),
# ncol=(len(method_names)+1)//2)
# ax.legend(loc='lower center',
# bbox_to_anchor=(1, -0.5),
# ncol=(len(method_names)+1)//2)
_save_or_show(savepath)
@ -212,6 +214,7 @@ def binary_bias_bins(method_names, true_prevs, estim_prevs, pos_class=1, title=N
def error_by_drift(method_names, true_prevs, estim_prevs, tr_prevs,
n_bins=20, error_name='ae', show_std=False,
show_density=True,
show_legend=True,
logscale=False,
title=f'Quantification error as a function of distribution shift',
vlines=None,
@ -234,6 +237,7 @@ def error_by_drift(method_names, true_prevs, estim_prevs, tr_prevs,
:param error_name: a string representing the name of an error function (as defined in `quapy.error`, default is "ae")
:param show_std: whether or not to show standard deviations as color bands (default is False)
:param show_density: whether or not to display the distribution of experiments for each bin (default is True)
:param show_density: whether or not to display the legend of the chart (default is True)
:param logscale: whether or not to log-scale the y-error measure (default is False)
:param title: title of the plot (default is "Quantification error as a function of distribution shift")
:param vlines: array-like list of values (default is None). If indicated, highlights some regions of the space
@ -254,6 +258,9 @@ def error_by_drift(method_names, true_prevs, estim_prevs, tr_prevs,
# x_error function) and 'y' is the estim-test shift (computed as according to y_error)
data = _join_data_by_drift(method_names, true_prevs, estim_prevs, tr_prevs, x_error, y_error, method_order)
if method_order is None:
method_order = method_names
_set_colors(ax, n_methods=len(method_order))
bins = np.linspace(0, 1, n_bins+1)
@ -264,7 +271,10 @@ def error_by_drift(method_names, true_prevs, estim_prevs, tr_prevs,
tr_test_drifts = data[method]['x']
method_drifts = data[method]['y']
if logscale:
method_drifts=np.log(1+method_drifts)
ax.set_yscale("log")
ax.yaxis.set_major_formatter(ScalarFormatter())
ax.yaxis.get_major_formatter().set_scientific(False)
ax.minorticks_off()
inds = np.digitize(tr_test_drifts, bins, right=True)
@ -295,9 +305,14 @@ def error_by_drift(method_names, true_prevs, estim_prevs, tr_prevs,
ax.fill_between(xs, ys-ystds, ys+ystds, alpha=0.25)
if show_density:
ax.bar([ind * binwidth-binwidth/2 for ind in range(len(bins))],
max_y*npoints/np.max(npoints), alpha=0.15, color='g', width=binwidth, label='density')
ax2 = ax.twinx()
densities = npoints/np.sum(npoints)
ax2.bar([ind * binwidth-binwidth/2 for ind in range(len(bins))],
densities, alpha=0.15, color='g', width=binwidth, label='density')
ax2.set_ylim(0,max(densities))
ax2.spines['right'].set_color('g')
ax2.tick_params(axis='y', colors='g')
ax.set(xlabel=f'Distribution shift between training set and test sample',
ylabel=f'{error_name.upper()} (true distribution, predicted distribution)',
title=title)
@ -306,9 +321,18 @@ def error_by_drift(method_names, true_prevs, estim_prevs, tr_prevs,
if vlines:
for vline in vlines:
ax.axvline(vline, 0, 1, linestyle='--', color='k')
ax.set_xlim(0, max_x)
ax.legend(loc='center left', bbox_to_anchor=(1, 0.5))
ax.set_xlim(min_x, max_x)
if logscale:
#nice scale for the logaritmic axis
ax.set_ylim(0,10 ** math.ceil(math.log10(max_y)))
if show_legend:
fig.legend(loc='lower center',
bbox_to_anchor=(1, 0.5),
ncol=(len(method_names)+1)//2)
_save_or_show(savepath)
@ -370,7 +394,7 @@ def brokenbar_supremacy_by_drift(method_names, true_prevs, estim_prevs, tr_prevs
bins[-1] += 0.001
# we use this to keep track of how many datapoits contribute to each bin
inds_histogram_global = np.zeros(n_bins, dtype=np.float)
inds_histogram_global = np.zeros(n_bins, dtype=float)
n_methods = len(method_order)
buckets = np.zeros(shape=(n_methods, n_bins, 3))
for i, method in enumerate(method_order):

490
quapy/protocol.py Normal file
View File

@ -0,0 +1,490 @@
from copy import deepcopy
import quapy as qp
import numpy as np
import itertools
from contextlib import ExitStack
from abc import ABCMeta, abstractmethod
from quapy.data import LabelledCollection
import quapy.functional as F
from os.path import exists
from glob import glob
class AbstractProtocol(metaclass=ABCMeta):
"""
Abstract parent class for sample generation protocols.
"""
@abstractmethod
def __call__(self):
"""
Implements the protocol. Yields one sample at a time along with its prevalence
:return: yields a tuple `(sample, prev) at a time, where `sample` is a set of instances
and in which `prev` is an `nd.array` with the class prevalence values
"""
...
def total(self):
"""
Indicates the total number of samples that the protocol generates.
:return: The number of samples to generate if known, or `None` otherwise.
"""
return None
class IterateProtocol(AbstractProtocol):
"""
A very simple protocol which simply iterates over a list of previously generated samples
:param samples: a list of :class:`quapy.data.base.LabelledCollection`
"""
def __init__(self, samples: [LabelledCollection]):
self.samples = samples
def __call__(self):
"""
Yields one sample from the initial list at a time
:return: yields a tuple `(sample, prev) at a time, where `sample` is a set of instances
and in which `prev` is an `nd.array` with the class prevalence values
"""
for sample in self.samples:
yield sample.Xp
def total(self):
"""
Returns the number of samples in this protocol
:return: int
"""
return len(self.samples)
class AbstractStochasticSeededProtocol(AbstractProtocol):
"""
An `AbstractStochasticSeededProtocol` is a protocol that generates, via any random procedure (e.g.,
via random sampling), sequences of :class:`quapy.data.base.LabelledCollection` samples.
The protocol abstraction enforces
the object to be instantiated using a seed, so that the sequence can be fully replicated.
In order to make this functionality possible, the classes extending this abstraction need to
implement only two functions, :meth:`samples_parameters` which generates all the parameters
needed for extracting the samples, and :meth:`sample` that, given some parameters as input,
deterministically generates a sample.
:param random_state: the seed for allowing to replicate any sequence of samples. Default is 0, meaning that
the sequence will be consistent every time the protocol is called.
"""
_random_state = -1 # means "not set"
def __init__(self, random_state=0):
self.random_state = random_state
@property
def random_state(self):
return self._random_state
@random_state.setter
def random_state(self, random_state):
self._random_state = random_state
@abstractmethod
def samples_parameters(self):
"""
This function has to return all the necessary parameters to replicate the samples
:return: a list of parameters, each of which serves to deterministically generate a sample
"""
...
@abstractmethod
def sample(self, params):
"""
Extract one sample determined by the given parameters
:param params: all the necessary parameters to generate a sample
:return: one sample (the same sample has to be generated for the same parameters)
"""
...
def __call__(self):
"""
Yields one sample at a time. The type of object returned depends on the `collator` function. The
default behaviour returns tuples of the form `(sample, prevalence)`.
:return: a tuple `(sample, prevalence)` if return_type='sample_prev', or an instance of
:class:`qp.data.LabelledCollection` if return_type='labelled_collection'
"""
with ExitStack() as stack:
if self.random_state == -1:
raise ValueError('The random seed has never been initialized. '
'Set it to None not to impose replicability.')
if self.random_state is not None:
stack.enter_context(qp.util.temp_seed(self.random_state))
for params in self.samples_parameters():
yield self.collator(self.sample(params))
def collator(self, sample, *args):
"""
The collator prepares the sample to accommodate the desired output format before returning the output.
This collator simply returns the sample as it is. Classes inheriting from this abstract class can
implement their custom collators.
:param sample: the sample to be returned
:param args: additional arguments
:return: the sample adhering to a desired output format (in this case, the sample is returned as it is)
"""
return sample
class OnLabelledCollectionProtocol:
"""
Protocols that generate samples from a :class:`qp.data.LabelledCollection` object.
"""
RETURN_TYPES = ['sample_prev', 'labelled_collection', 'index']
def get_labelled_collection(self):
"""
Returns the labelled collection on which this protocol acts.
:return: an object of type :class:`qp.data.LabelledCollection`
"""
return self.data
def on_preclassified_instances(self, pre_classifications, in_place=False):
"""
Returns a copy of this protocol that acts on a modified version of the original
:class:`qp.data.LabelledCollection` in which the original instances have been replaced
with the outputs of a classifier for each instance. (This is convenient for speeding-up
the evaluation procedures for many samples, by pre-classifying the instances in advance.)
:param pre_classifications: the predictions issued by a classifier, typically an array-like
with shape `(n_instances,)` when the classifier is a hard one, or with shape
`(n_instances, n_classes)` when the classifier is a probabilistic one.
:param in_place: whether or not to apply the modification in-place or in a new copy (default).
:return: a copy of this protocol
"""
assert len(pre_classifications) == len(self.data), \
f'error: the pre-classified data has different shape ' \
f'(expected {len(self.data)}, found {len(pre_classifications)})'
if in_place:
self.data.instances = pre_classifications
return self
else:
new = deepcopy(self)
return new.on_preclassified_instances(pre_classifications, in_place=True)
@classmethod
def get_collator(cls, return_type='sample_prev'):
"""
Returns a collator function, i.e., a function that prepares the yielded data
:param return_type: either 'sample_prev' (default) if the collator is requested to yield tuples of
`(sample, prevalence)`, or 'labelled_collection' when it is requested to yield instances of
:class:`qp.data.LabelledCollection`
:return: the collator function (a callable function that takes as input an instance of
:class:`qp.data.LabelledCollection`)
"""
assert return_type in cls.RETURN_TYPES, \
f'unknown return type passed as argument; valid ones are {cls.RETURN_TYPES}'
if return_type=='sample_prev':
return lambda lc:lc.Xp
elif return_type=='labelled_collection':
return lambda lc:lc
class APP(AbstractStochasticSeededProtocol, OnLabelledCollectionProtocol):
"""
Implementation of the artificial prevalence protocol (APP).
The APP consists of exploring a grid of prevalence values containing `n_prevalences` points (e.g.,
[0, 0.05, 0.1, 0.15, ..., 1], if `n_prevalences=21`), and generating all valid combinations of
prevalence values for all classes (e.g., for 3 classes, samples with [0, 0, 1], [0, 0.05, 0.95], ...,
[1, 0, 0] prevalence values of size `sample_size` will be yielded). The number of samples for each valid
combination of prevalence values is indicated by `repeats`.
:param data: a `LabelledCollection` from which the samples will be drawn
:param sample_size: integer, number of instances in each sample; if None (default) then it is taken from
qp.environ["SAMPLE_SIZE"]. If this is not set, a ValueError exception is raised.
:param n_prevalences: the number of equidistant prevalence points to extract from the [0,1] interval for the
grid (default is 21)
:param repeats: number of copies for each valid prevalence vector (default is 10)
:param smooth_limits_epsilon: the quantity to add and subtract to the limits 0 and 1
:param random_state: allows replicating samples across runs (default 0, meaning that the sequence of samples
will be the same every time the protocol is called)
:param return_type: set to "sample_prev" (default) to get the pairs of (sample, prevalence) at each iteration, or
to "labelled_collection" to get instead instances of LabelledCollection
"""
def __init__(self, data:LabelledCollection, sample_size=None, n_prevalences=21, repeats=10,
smooth_limits_epsilon=0, random_state=0, return_type='sample_prev'):
super(APP, self).__init__(random_state)
self.data = data
self.sample_size = qp._get_sample_size(sample_size)
self.n_prevalences = n_prevalences
self.repeats = repeats
self.smooth_limits_epsilon = smooth_limits_epsilon
self.collator = OnLabelledCollectionProtocol.get_collator(return_type)
def prevalence_grid(self):
"""
Generates vectors of prevalence values from an exhaustive grid of prevalence values. The
number of prevalence values explored for each dimension depends on `n_prevalences`, so that, if, for example,
`n_prevalences=11` then the prevalence values of the grid are taken from [0, 0.1, 0.2, ..., 0.9, 1]. Only
valid prevalence distributions are returned, i.e., vectors of prevalence values that sum up to 1. For each
valid vector of prevalence values, `repeat` copies are returned. The vector of prevalence values can be
implicit (by setting `return_constrained_dim=False`), meaning that the last dimension (which is constrained
to 1 - sum of the rest) is not returned (note that, quite obviously, in this case the vector does not sum up to
1). Note that this method is deterministic, i.e., there is no random sampling anywhere.
:return: a `np.ndarray` of shape `(n, dimensions)` if `return_constrained_dim=True` or of shape
`(n, dimensions-1)` if `return_constrained_dim=False`, where `n` is the number of valid combinations found
in the grid multiplied by `repeat`
"""
dimensions = self.data.n_classes
s = F.prevalence_linspace(self.n_prevalences, repeats=1, smooth_limits_epsilon=self.smooth_limits_epsilon)
s = [s] * (dimensions - 1)
prevs = [p for p in itertools.product(*s, repeat=1) if (sum(p) <= 1.0)]
prevs = np.asarray(prevs).reshape(len(prevs), -1)
if self.repeats > 1:
prevs = np.repeat(prevs, self.repeats, axis=0)
return prevs
def samples_parameters(self):
"""
Return all the necessary parameters to replicate the samples as according to the APP protocol.
:return: a list of indexes that realize the APP sampling
"""
indexes = []
for prevs in self.prevalence_grid():
index = self.data.sampling_index(self.sample_size, *prevs)
indexes.append(index)
return indexes
def sample(self, index):
"""
Realizes the sample given the index of the instances.
:param index: indexes of the instances to select
:return: an instance of :class:`qp.data.LabelledCollection`
"""
return self.data.sampling_from_index(index)
def total(self):
"""
Returns the number of samples that will be generated
:return: int
"""
return F.num_prevalence_combinations(self.n_prevalences, self.data.n_classes, self.repeats)
class NPP(AbstractStochasticSeededProtocol, OnLabelledCollectionProtocol):
"""
A generator of samples that implements the natural prevalence protocol (NPP). The NPP consists of drawing
samples uniformly at random, therefore approximately preserving the natural prevalence of the collection.
:param data: a `LabelledCollection` from which the samples will be drawn
:param sample_size: integer, the number of instances in each sample; if None (default) then it is taken from
qp.environ["SAMPLE_SIZE"]. If this is not set, a ValueError exception is raised.
:param repeats: the number of samples to generate. Default is 100.
:param random_state: allows replicating samples across runs (default 0, meaning that the sequence of samples
will be the same every time the protocol is called)
:param return_type: set to "sample_prev" (default) to get the pairs of (sample, prevalence) at each iteration, or
to "labelled_collection" to get instead instances of LabelledCollection
"""
def __init__(self, data:LabelledCollection, sample_size=None, repeats=100, random_state=0,
return_type='sample_prev'):
super(NPP, self).__init__(random_state)
self.data = data
self.sample_size = qp._get_sample_size(sample_size)
self.repeats = repeats
self.random_state = random_state
self.collator = OnLabelledCollectionProtocol.get_collator(return_type)
def samples_parameters(self):
"""
Return all the necessary parameters to replicate the samples as according to the NPP protocol.
:return: a list of indexes that realize the NPP sampling
"""
indexes = []
for _ in range(self.repeats):
index = self.data.uniform_sampling_index(self.sample_size)
indexes.append(index)
return indexes
def sample(self, index):
"""
Realizes the sample given the index of the instances.
:param index: indexes of the instances to select
:return: an instance of :class:`qp.data.LabelledCollection`
"""
return self.data.sampling_from_index(index)
def total(self):
"""
Returns the number of samples that will be generated (equals to "repeats")
:return: int
"""
return self.repeats
class UPP(AbstractStochasticSeededProtocol, OnLabelledCollectionProtocol):
"""
A variant of :class:`APP` that, instead of using a grid of equidistant prevalence values,
relies on the Kraemer algorithm for sampling unit (k-1)-simplex uniformly at random, with
k the number of classes. This protocol covers the entire range of prevalence values in a
statistical sense, i.e., unlike APP there is no guarantee that it is covered precisely
equally for all classes, but it is preferred in cases in which the number of possible
combinations of the grid values of APP makes this endeavour intractable.
:param data: a `LabelledCollection` from which the samples will be drawn
:param sample_size: integer, the number of instances in each sample; if None (default) then it is taken from
qp.environ["SAMPLE_SIZE"]. If this is not set, a ValueError exception is raised.
:param repeats: the number of samples to generate. Default is 100.
:param random_state: allows replicating samples across runs (default 0, meaning that the sequence of samples
will be the same every time the protocol is called)
:param return_type: set to "sample_prev" (default) to get the pairs of (sample, prevalence) at each iteration, or
to "labelled_collection" to get instead instances of LabelledCollection
"""
def __init__(self, data: LabelledCollection, sample_size=None, repeats=100, random_state=0,
return_type='sample_prev'):
super(UPP, self).__init__(random_state)
self.data = data
self.sample_size = qp._get_sample_size(sample_size)
self.repeats = repeats
self.random_state = random_state
self.collator = OnLabelledCollectionProtocol.get_collator(return_type)
def samples_parameters(self):
"""
Return all the necessary parameters to replicate the samples as according to the UPP protocol.
:return: a list of indexes that realize the UPP sampling
"""
indexes = []
for prevs in F.uniform_simplex_sampling(n_classes=self.data.n_classes, size=self.repeats):
index = self.data.sampling_index(self.sample_size, *prevs)
indexes.append(index)
return indexes
def sample(self, index):
"""
Realizes the sample given the index of the instances.
:param index: indexes of the instances to select
:return: an instance of :class:`qp.data.LabelledCollection`
"""
return self.data.sampling_from_index(index)
def total(self):
"""
Returns the number of samples that will be generated (equals to "repeats")
:return: int
"""
return self.repeats
class DomainMixer(AbstractStochasticSeededProtocol):
"""
Generates mixtures of two domains (A and B) at controlled rates, but preserving the original class prevalence.
:param domainA: one domain, an object of :class:`qp.data.LabelledCollection`
:param domainB: another domain, an object of :class:`qp.data.LabelledCollection`
:param sample_size: integer, the number of instances in each sample; if None (default) then it is taken from
qp.environ["SAMPLE_SIZE"]. If this is not set, a ValueError exception is raised.
:param repeats: int, number of samples to draw for every mixture rate
:param prevalence: the prevalence to preserv along the mixtures. If specified, should be an array containing
one prevalence value (positive float) for each class and summing up to one. If not specified, the prevalence
will be taken from the domain A (default).
:param mixture_points: an integer indicating the number of points to take from a linear scale (e.g., 21 will
generate the mixture points [1, 0.95, 0.9, ..., 0]), or the array of mixture values itself.
the specific points
:param random_state: allows replicating samples across runs (default 0, meaning that the sequence of samples
will be the same every time the protocol is called)
"""
def __init__(
self,
domainA: LabelledCollection,
domainB: LabelledCollection,
sample_size,
repeats=1,
prevalence=None,
mixture_points=11,
random_state=0,
return_type='sample_prev'):
super(DomainMixer, self).__init__(random_state)
self.A = domainA
self.B = domainB
self.sample_size = qp._get_sample_size(sample_size)
self.repeats = repeats
if prevalence is None:
self.prevalence = domainA.prevalence()
else:
self.prevalence = np.asarray(prevalence)
assert len(self.prevalence) == domainA.n_classes, \
f'wrong shape for the vector prevalence (expected {domainA.n_classes})'
assert F.check_prevalence_vector(self.prevalence), \
f'the prevalence vector is not valid (either it contains values outside [0,1] or does not sum up to 1)'
if isinstance(mixture_points, int):
self.mixture_points = np.linspace(0, 1, mixture_points)[::-1]
else:
self.mixture_points = np.asarray(mixture_points)
assert all(np.logical_and(self.mixture_points >= 0, self.mixture_points<=1)), \
'mixture_model datatype not understood (expected int or a sequence of real values in [0,1])'
self.random_state = random_state
self.collator = OnLabelledCollectionProtocol.get_collator(return_type)
def samples_parameters(self):
"""
Return all the necessary parameters to replicate the samples as according to the this protocol.
:return: a list of zipped indexes (from A and B) that realize the sampling
"""
indexesA, indexesB = [], []
for propA in self.mixture_points:
for _ in range(self.repeats):
nA = int(np.round(self.sample_size * propA))
nB = self.sample_size-nA
sampleAidx = self.A.sampling_index(nA, *self.prevalence)
sampleBidx = self.B.sampling_index(nB, *self.prevalence)
indexesA.append(sampleAidx)
indexesB.append(sampleBidx)
return list(zip(indexesA, indexesB))
def sample(self, indexes):
"""
Realizes the sample given a pair of indexes of the instances from A and B.
:param indexes: indexes of the instances to select from A and B
:return: an instance of :class:`qp.data.LabelledCollection`
"""
indexesA, indexesB = indexes
sampleA = self.A.sampling_from_index(indexesA)
sampleB = self.B.sampling_from_index(indexesB)
return sampleA+sampleB
def total(self):
"""
Returns the number of samples that will be generated (equals to "repeats * mixture_points")
:return: int
"""
return self.repeats * len(self.mixture_points)
# aliases
ArtificialPrevalenceProtocol = APP
NaturalPrevalenceProtocol = NPP
UniformPrevalenceProtocol = UPP

View File

@ -1,7 +1,8 @@
import pytest
from quapy.data.datasets import REVIEWS_SENTIMENT_DATASETS, TWITTER_SENTIMENT_DATASETS_TEST, \
TWITTER_SENTIMENT_DATASETS_TRAIN, UCI_DATASETS, fetch_reviews, fetch_twitter, fetch_UCIDataset
TWITTER_SENTIMENT_DATASETS_TRAIN, UCI_DATASETS, LEQUA2022_TASKS, \
fetch_reviews, fetch_twitter, fetch_UCIDataset, fetch_lequa2022
@pytest.mark.parametrize('dataset_name', REVIEWS_SENTIMENT_DATASETS)
@ -41,3 +42,11 @@ def test_fetch_UCIDataset(dataset_name):
print('Training set stats')
dataset.training.stats()
print('Test set stats')
@pytest.mark.parametrize('dataset_name', LEQUA2022_TASKS)
def test_fetch_lequa2022(dataset_name):
train, gen_val, gen_test = fetch_lequa2022(dataset_name)
print(train.stats())
print('Val:', gen_val.total())
print('Test:', gen_test.total())

View File

@ -0,0 +1,84 @@
import unittest
import numpy as np
import quapy as qp
from sklearn.linear_model import LogisticRegression
from time import time
from error import QUANTIFICATION_ERROR_SINGLE, QUANTIFICATION_ERROR, QUANTIFICATION_ERROR_NAMES, \
QUANTIFICATION_ERROR_SINGLE_NAMES
from quapy.method.aggregative import EMQ, PCC
from quapy.method.base import BaseQuantifier
class EvalTestCase(unittest.TestCase):
def test_eval_speedup(self):
data = qp.datasets.fetch_reviews('hp', tfidf=True, min_df=10, pickle=True)
train, test = data.training, data.test
protocol = qp.protocol.APP(test, sample_size=1000, n_prevalences=11, repeats=1, random_state=1)
class SlowLR(LogisticRegression):
def predict_proba(self, X):
import time
time.sleep(1)
return super().predict_proba(X)
emq = EMQ(SlowLR()).fit(train)
tinit = time()
score = qp.evaluation.evaluate(emq, protocol, error_metric='mae', verbose=True, aggr_speedup='force')
tend_optim = time()-tinit
print(f'evaluation (with optimization) took {tend_optim}s [MAE={score:.4f}]')
class NonAggregativeEMQ(BaseQuantifier):
def __init__(self, cls):
self.emq = EMQ(cls)
def quantify(self, instances):
return self.emq.quantify(instances)
def fit(self, data):
self.emq.fit(data)
return self
emq = NonAggregativeEMQ(SlowLR()).fit(train)
tinit = time()
score = qp.evaluation.evaluate(emq, protocol, error_metric='mae', verbose=True)
tend_no_optim = time() - tinit
print(f'evaluation (w/o optimization) took {tend_no_optim}s [MAE={score:.4f}]')
self.assertEqual(tend_no_optim>(tend_optim/2), True)
def test_evaluation_output(self):
data = qp.datasets.fetch_reviews('hp', tfidf=True, min_df=10, pickle=True)
train, test = data.training, data.test
qp.environ['SAMPLE_SIZE']=100
protocol = qp.protocol.APP(test, random_state=0)
q = PCC(LogisticRegression()).fit(train)
single_errors = list(QUANTIFICATION_ERROR_SINGLE_NAMES)
averaged_errors = ['m'+e for e in single_errors]
single_errors = single_errors + [qp.error.from_name(e) for e in single_errors]
averaged_errors = averaged_errors + [qp.error.from_name(e) for e in averaged_errors]
for error_metric, averaged_error_metric in zip(single_errors, averaged_errors):
score = qp.evaluation.evaluate(q, protocol, error_metric=averaged_error_metric)
self.assertTrue(isinstance(score, float))
scores = qp.evaluation.evaluate(q, protocol, error_metric=error_metric)
self.assertTrue(isinstance(scores, np.ndarray))
self.assertEqual(scores.mean(), score)
if __name__ == '__main__':
unittest.main()

View File

@ -0,0 +1,31 @@
import unittest
from sklearn.linear_model import LogisticRegression
import quapy as qp
from quapy.method.aggregative import *
class HierarchyTestCase(unittest.TestCase):
def test_aggregative(self):
lr = LogisticRegression()
for m in [CC(lr), PCC(lr), ACC(lr), PACC(lr)]:
self.assertEqual(isinstance(m, AggregativeQuantifier), True)
def test_binary(self):
lr = LogisticRegression()
for m in [HDy(lr)]:
self.assertEqual(isinstance(m, BinaryQuantifier), True)
def test_probabilistic(self):
lr = LogisticRegression()
for m in [CC(lr), ACC(lr)]:
self.assertEqual(isinstance(m, AggregativeProbabilisticQuantifier), False)
for m in [PCC(lr), PACC(lr)]:
self.assertEqual(isinstance(m, AggregativeProbabilisticQuantifier), True)
if __name__ == '__main__':
unittest.main()

View File

@ -0,0 +1,65 @@
import unittest
import numpy as np
from scipy.sparse import csr_matrix
import quapy as qp
class LabelCollectionTestCase(unittest.TestCase):
def test_split(self):
x = np.arange(100)
y = np.random.randint(0,5,100)
data = qp.data.LabelledCollection(x,y)
tr, te = data.split_random(0.7)
check_prev = tr.prevalence()*0.7 + te.prevalence()*0.3
self.assertEqual(len(tr), 70)
self.assertEqual(len(te), 30)
self.assertEqual(np.allclose(check_prev, data.prevalence()), True)
self.assertEqual(len(tr+te), len(data))
def test_join(self):
x = np.arange(50)
y = np.random.randint(2, 5, 50)
data1 = qp.data.LabelledCollection(x, y)
x = np.arange(200)
y = np.random.randint(0, 3, 200)
data2 = qp.data.LabelledCollection(x, y)
x = np.arange(100)
y = np.random.randint(0, 6, 100)
data3 = qp.data.LabelledCollection(x, y)
combined = qp.data.LabelledCollection.join(data1, data2, data3)
self.assertEqual(len(combined), len(data1)+len(data2)+len(data3))
self.assertEqual(all(combined.classes_ == np.arange(6)), True)
x = np.random.rand(10, 3)
y = np.random.randint(0, 1, 10)
data4 = qp.data.LabelledCollection(x, y)
with self.assertRaises(Exception):
combined = qp.data.LabelledCollection.join(data1, data2, data3, data4)
x = np.random.rand(20, 3)
y = np.random.randint(0, 1, 20)
data5 = qp.data.LabelledCollection(x, y)
combined = qp.data.LabelledCollection.join(data4, data5)
self.assertEqual(len(combined), len(data4)+len(data5))
x = np.random.rand(10, 4)
y = np.random.randint(0, 1, 10)
data6 = qp.data.LabelledCollection(x, y)
with self.assertRaises(Exception):
combined = qp.data.LabelledCollection.join(data4, data5, data6)
data4.instances = csr_matrix(data4.instances)
with self.assertRaises(Exception):
combined = qp.data.LabelledCollection.join(data4, data5)
data5.instances = csr_matrix(data5.instances)
combined = qp.data.LabelledCollection.join(data4, data5)
self.assertEqual(len(combined), len(data4) + len(data5))
if __name__ == '__main__':
unittest.main()

View File

@ -4,24 +4,28 @@ from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
import quapy as qp
from quapy.method.base import BinaryQuantifier
from quapy.data import Dataset, LabelledCollection
from quapy.method import AGGREGATIVE_METHODS, NON_AGGREGATIVE_METHODS, EXPLICIT_LOSS_MINIMIZATION_METHODS
from quapy.method import AGGREGATIVE_METHODS, NON_AGGREGATIVE_METHODS
from quapy.method.aggregative import ACC, PACC, HDy
from quapy.method.meta import Ensemble
datasets = [pytest.param(qp.datasets.fetch_twitter('hcr'), id='hcr'),
datasets = [pytest.param(qp.datasets.fetch_twitter('hcr', pickle=True), id='hcr'),
pytest.param(qp.datasets.fetch_UCIDataset('ionosphere'), id='ionosphere')]
tinydatasets = [pytest.param(qp.datasets.fetch_twitter('hcr', pickle=True).reduce(), id='tiny_hcr'),
pytest.param(qp.datasets.fetch_UCIDataset('ionosphere').reduce(), id='tiny_ionosphere')]
learners = [LogisticRegression, LinearSVC]
@pytest.mark.parametrize('dataset', datasets)
@pytest.mark.parametrize('aggregative_method', AGGREGATIVE_METHODS.difference(EXPLICIT_LOSS_MINIMIZATION_METHODS))
@pytest.mark.parametrize('aggregative_method', AGGREGATIVE_METHODS)
@pytest.mark.parametrize('learner', learners)
def test_aggregative_methods(dataset: Dataset, aggregative_method, learner):
model = aggregative_method(learner())
if model.binary and not dataset.binary:
if isinstance(model, BinaryQuantifier) and not dataset.binary:
print(f'skipping the test of binary model {type(model)} on non-binary dataset {dataset}')
return
@ -35,36 +39,12 @@ def test_aggregative_methods(dataset: Dataset, aggregative_method, learner):
assert type(error) == numpy.float64
@pytest.mark.parametrize('dataset', datasets)
@pytest.mark.parametrize('elm_method', EXPLICIT_LOSS_MINIMIZATION_METHODS)
def test_elm_methods(dataset: Dataset, elm_method):
try:
model = elm_method()
except AssertionError as ae:
if ae.args[0].find('does not seem to point to a valid path') > 0:
print('Missing SVMperf binary program, skipping test')
return
if model.binary and not dataset.binary:
print(f'skipping the test of binary model {model} on non-binary dataset {dataset}')
return
model.fit(dataset.training)
estim_prevalences = model.quantify(dataset.test.instances)
true_prevalences = dataset.test.prevalence()
error = qp.error.mae(true_prevalences, estim_prevalences)
assert type(error) == numpy.float64
@pytest.mark.parametrize('dataset', datasets)
@pytest.mark.parametrize('non_aggregative_method', NON_AGGREGATIVE_METHODS)
def test_non_aggregative_methods(dataset: Dataset, non_aggregative_method):
model = non_aggregative_method()
if model.binary and not dataset.binary:
if isinstance(model, BinaryQuantifier) and not dataset.binary:
print(f'skipping the test of binary model {model} on non-binary dataset {dataset}')
return
@ -78,16 +58,20 @@ def test_non_aggregative_methods(dataset: Dataset, non_aggregative_method):
assert type(error) == numpy.float64
@pytest.mark.parametrize('base_method', AGGREGATIVE_METHODS.difference(EXPLICIT_LOSS_MINIMIZATION_METHODS))
@pytest.mark.parametrize('learner', learners)
@pytest.mark.parametrize('dataset', datasets)
@pytest.mark.parametrize('base_method', AGGREGATIVE_METHODS)
@pytest.mark.parametrize('learner', [LogisticRegression])
@pytest.mark.parametrize('dataset', tinydatasets)
@pytest.mark.parametrize('policy', Ensemble.VALID_POLICIES)
def test_ensemble_method(base_method, learner, dataset: Dataset, policy):
qp.environ['SAMPLE_SIZE'] = len(dataset.training)
model = Ensemble(quantifier=base_method(learner()), size=5, policy=policy, n_jobs=-1)
if model.binary and not dataset.binary:
print(f'skipping the test of binary model {model} on non-binary dataset {dataset}')
qp.environ['SAMPLE_SIZE'] = 20
base_quantifier=base_method(learner())
if isinstance(base_quantifier, BinaryQuantifier) and not dataset.binary:
print(f'skipping the test of binary model {base_quantifier} on non-binary dataset {dataset}')
return
if not dataset.binary and policy=='ds':
print(f'skipping the test of binary policy ds on non-binary dataset {dataset}')
return
model = Ensemble(quantifier=base_quantifier, size=5, policy=policy, n_jobs=-1)
model.fit(dataset.training)
@ -106,21 +90,25 @@ def test_quanet_method():
print('skipping QuaNet test due to missing torch package')
return
qp.environ['SAMPLE_SIZE'] = 100
# load the kindle dataset as text, and convert words to numerical indexes
dataset = qp.datasets.fetch_reviews('kindle', pickle=True)
dataset = Dataset(dataset.training.sampling(100, *dataset.training.prevalence()),
dataset.test.sampling(100, *dataset.test.prevalence()))
dataset = Dataset(dataset.training.sampling(200, *dataset.training.prevalence()),
dataset.test.sampling(200, *dataset.test.prevalence()))
qp.data.preprocessing.index(dataset, min_df=5, inplace=True)
from quapy.classification.neural import CNNnet
cnn = CNNnet(dataset.vocabulary_size, dataset.training.n_classes)
cnn = CNNnet(dataset.vocabulary_size, dataset.n_classes)
from quapy.classification.neural import NeuralClassifierTrainer
learner = NeuralClassifierTrainer(cnn, device='cuda')
from quapy.method.meta import QuaNet
model = QuaNet(learner, sample_size=len(dataset.training), device='cuda')
model = QuaNet(learner, device='cuda')
if model.binary and not dataset.binary:
if isinstance(model, BinaryQuantifier) and not dataset.binary:
print(f'skipping the test of binary model {model} on non-binary dataset {dataset}')
return
@ -134,28 +122,15 @@ def test_quanet_method():
assert type(error) == numpy.float64
def models_to_test_for_str_label_names():
models = list()
learner = LogisticRegression
for method in AGGREGATIVE_METHODS.difference(EXPLICIT_LOSS_MINIMIZATION_METHODS):
models.append(method(learner()))
for method in NON_AGGREGATIVE_METHODS:
models.append(method())
return models
@pytest.mark.parametrize('model', models_to_test_for_str_label_names())
def test_str_label_names(model):
if type(model) in {ACC, PACC, HDy}:
print(
f'skipping the test of binary model {type(model)} because it currently does not support random seed control.')
return
def test_str_label_names():
model = qp.method.aggregative.CC(LogisticRegression())
dataset = qp.datasets.fetch_reviews('imdb', pickle=True)
dataset = Dataset(dataset.training.sampling(1000, *dataset.training.prevalence()),
dataset.test.sampling(1000, *dataset.test.prevalence()))
dataset.test.sampling(1000, 0.25, 0.75))
qp.data.preprocessing.text2tfidf(dataset, min_df=5, inplace=True)
numpy.random.seed(0)
model.fit(dataset.training)
int_estim_prevalences = model.quantify(dataset.test.instances)
@ -168,7 +143,8 @@ def test_str_label_names(model):
['one' if label == 1 else 'zero' for label in dataset.training.labels]),
LabelledCollection(dataset.test.instances,
['one' if label == 1 else 'zero' for label in dataset.test.labels]))
assert all(dataset_str.training.classes_ == dataset_str.test.classes_), 'wrong indexation'
numpy.random.seed(0)
model.fit(dataset_str.training)
str_estim_prevalences = model.quantify(dataset_str.test.instances)

108
quapy/tests/test_modsel.py Normal file
View File

@ -0,0 +1,108 @@
import unittest
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
import quapy as qp
from quapy.method.aggregative import PACC
from quapy.model_selection import GridSearchQ
from quapy.protocol import APP
import time
class ModselTestCase(unittest.TestCase):
def test_modsel(self):
q = PACC(LogisticRegression(random_state=1, max_iter=5000))
data = qp.datasets.fetch_reviews('imdb', tfidf=True, min_df=10)
training, validation = data.training.split_stratified(0.7, random_state=1)
param_grid = {'classifier__C': np.logspace(-3,3,7)}
app = APP(validation, sample_size=100, random_state=1)
q = GridSearchQ(
q, param_grid, protocol=app, error='mae', refit=True, timeout=-1, verbose=True
).fit(training)
print('best params', q.best_params_)
print('best score', q.best_score_)
self.assertEqual(q.best_params_['classifier__C'], 10.0)
self.assertEqual(q.best_model().get_params()['classifier__C'], 10.0)
def test_modsel_parallel(self):
q = PACC(LogisticRegression(random_state=1, max_iter=5000))
data = qp.datasets.fetch_reviews('imdb', tfidf=True, min_df=10)
training, validation = data.training.split_stratified(0.7, random_state=1)
# test = data.test
param_grid = {'classifier__C': np.logspace(-3,3,7)}
app = APP(validation, sample_size=100, random_state=1)
q = GridSearchQ(
q, param_grid, protocol=app, error='mae', refit=True, timeout=-1, n_jobs=-1, verbose=True
).fit(training)
print('best params', q.best_params_)
print('best score', q.best_score_)
self.assertEqual(q.best_params_['classifier__C'], 10.0)
self.assertEqual(q.best_model().get_params()['classifier__C'], 10.0)
def test_modsel_parallel_speedup(self):
class SlowLR(LogisticRegression):
def fit(self, X, y, sample_weight=None):
time.sleep(1)
return super(SlowLR, self).fit(X, y, sample_weight)
q = PACC(SlowLR(random_state=1, max_iter=5000))
data = qp.datasets.fetch_reviews('imdb', tfidf=True, min_df=10)
training, validation = data.training.split_stratified(0.7, random_state=1)
param_grid = {'classifier__C': np.logspace(-3, 3, 7)}
app = APP(validation, sample_size=100, random_state=1)
tinit = time.time()
GridSearchQ(
q, param_grid, protocol=app, error='mae', refit=False, timeout=-1, n_jobs=1, verbose=True
).fit(training)
tend_nooptim = time.time()-tinit
tinit = time.time()
GridSearchQ(
q, param_grid, protocol=app, error='mae', refit=False, timeout=-1, n_jobs=-1, verbose=True
).fit(training)
tend_optim = time.time() - tinit
print(f'parallel training took {tend_optim:.4f}s')
print(f'sequential training took {tend_nooptim:.4f}s')
self.assertEqual(tend_optim < (0.5*tend_nooptim), True)
def test_modsel_timeout(self):
class SlowLR(LogisticRegression):
def fit(self, X, y, sample_weight=None):
import time
time.sleep(10)
super(SlowLR, self).fit(X, y, sample_weight)
q = PACC(SlowLR())
data = qp.datasets.fetch_reviews('imdb', tfidf=True, min_df=10)
training, validation = data.training.split_stratified(0.7, random_state=1)
# test = data.test
param_grid = {'classifier__C': np.logspace(-3,3,7)}
app = APP(validation, sample_size=100, random_state=1)
q = GridSearchQ(
q, param_grid, protocol=app, error='mae', refit=True, timeout=3, n_jobs=-1, verbose=True
)
with self.assertRaises(TimeoutError):
q.fit(training)
if __name__ == '__main__':
unittest.main()

View File

@ -0,0 +1,179 @@
import unittest
import numpy as np
from quapy.data import LabelledCollection
from quapy.protocol import APP, NPP, UPP, DomainMixer, AbstractStochasticSeededProtocol
def mock_labelled_collection(prefix=''):
y = [0] * 250 + [1] * 250 + [2] * 250 + [3] * 250
X = [prefix + str(i) + '-' + str(yi) for i, yi in enumerate(y)]
return LabelledCollection(X, y, classes=sorted(np.unique(y)))
def samples_to_str(protocol):
samples_str = ""
for instances, prev in protocol():
samples_str += f'{instances}\t{prev}\n'
return samples_str
class TestProtocols(unittest.TestCase):
def test_app_replicate(self):
data = mock_labelled_collection()
p = APP(data, sample_size=5, n_prevalences=11, random_state=42)
samples1 = samples_to_str(p)
samples2 = samples_to_str(p)
self.assertEqual(samples1, samples2)
p = APP(data, sample_size=5, n_prevalences=11) # <- random_state is by default set to 0
samples1 = samples_to_str(p)
samples2 = samples_to_str(p)
self.assertEqual(samples1, samples2)
def test_app_not_replicate(self):
data = mock_labelled_collection()
p = APP(data, sample_size=5, n_prevalences=11, random_state=None)
samples1 = samples_to_str(p)
samples2 = samples_to_str(p)
self.assertNotEqual(samples1, samples2)
p = APP(data, sample_size=5, n_prevalences=11, random_state=42)
samples1 = samples_to_str(p)
p = APP(data, sample_size=5, n_prevalences=11, random_state=0)
samples2 = samples_to_str(p)
self.assertNotEqual(samples1, samples2)
def test_app_number(self):
data = mock_labelled_collection()
p = APP(data, sample_size=100, n_prevalences=10, repeats=1)
# surprisingly enough, for some n_prevalences the test fails, notwithstanding
# everything is correct. The problem is that in function APP.prevalence_grid()
# there is sometimes one rounding error that gets cumulated and
# surpasses 1.0 (by a very small float value, 0.0000000000002 or sthe like)
# so these tuples are mistakenly removed... I have tried with np.close, and
# other workarounds, but eventually happens that there is some negative probability
# in the sampling function...
count = 0
for _ in p():
count+=1
self.assertEqual(count, p.total())
def test_npp_replicate(self):
data = mock_labelled_collection()
p = NPP(data, sample_size=5, repeats=5, random_state=42)
samples1 = samples_to_str(p)
samples2 = samples_to_str(p)
self.assertEqual(samples1, samples2)
p = NPP(data, sample_size=5, repeats=5) # <- random_state is by default set to 0
samples1 = samples_to_str(p)
samples2 = samples_to_str(p)
self.assertEqual(samples1, samples2)
def test_npp_not_replicate(self):
data = mock_labelled_collection()
p = NPP(data, sample_size=5, repeats=5, random_state=None)
samples1 = samples_to_str(p)
samples2 = samples_to_str(p)
self.assertNotEqual(samples1, samples2)
p = NPP(data, sample_size=5, repeats=5, random_state=42)
samples1 = samples_to_str(p)
p = NPP(data, sample_size=5, repeats=5, random_state=0)
samples2 = samples_to_str(p)
self.assertNotEqual(samples1, samples2)
def test_kraemer_replicate(self):
data = mock_labelled_collection()
p = UPP(data, sample_size=5, repeats=10, random_state=42)
samples1 = samples_to_str(p)
samples2 = samples_to_str(p)
self.assertEqual(samples1, samples2)
p = UPP(data, sample_size=5, repeats=10) # <- random_state is by default set to 0
samples1 = samples_to_str(p)
samples2 = samples_to_str(p)
self.assertEqual(samples1, samples2)
def test_kraemer_not_replicate(self):
data = mock_labelled_collection()
p = UPP(data, sample_size=5, repeats=10, random_state=None)
samples1 = samples_to_str(p)
samples2 = samples_to_str(p)
self.assertNotEqual(samples1, samples2)
def test_covariate_shift_replicate(self):
dataA = mock_labelled_collection('domA')
dataB = mock_labelled_collection('domB')
p = DomainMixer(dataA, dataB, sample_size=10, mixture_points=11, random_state=1)
samples1 = samples_to_str(p)
samples2 = samples_to_str(p)
self.assertEqual(samples1, samples2)
p = DomainMixer(dataA, dataB, sample_size=10, mixture_points=11) # <- random_state is by default set to 0
samples1 = samples_to_str(p)
samples2 = samples_to_str(p)
self.assertEqual(samples1, samples2)
def test_covariate_shift_not_replicate(self):
dataA = mock_labelled_collection('domA')
dataB = mock_labelled_collection('domB')
p = DomainMixer(dataA, dataB, sample_size=10, mixture_points=11, random_state=None)
samples1 = samples_to_str(p)
samples2 = samples_to_str(p)
self.assertNotEqual(samples1, samples2)
def test_no_seed_init(self):
class NoSeedInit(AbstractStochasticSeededProtocol):
def __init__(self):
self.data = mock_labelled_collection()
def samples_parameters(self):
# return a matrix containing sampling indexes in the rows
return np.random.randint(0, len(self.data), 10*10).reshape(10, 10)
def sample(self, params):
index = np.unique(params)
return self.data.sampling_from_index(index)
p = NoSeedInit()
# this should raise a ValueError, since the class is said to be AbstractStochasticSeededProtocol but the
# random_seed has never been passed to super(NoSeedInit, self).__init__(random_seed)
with self.assertRaises(ValueError):
for sample in p():
pass
print('done')
if __name__ == '__main__':
unittest.main()

View File

@ -0,0 +1,78 @@
import unittest
import quapy as qp
from quapy.data import LabelledCollection
from quapy.functional import strprev
from sklearn.linear_model import LogisticRegression
from quapy.method.aggregative import PACC
class MyTestCase(unittest.TestCase):
def test_prediction_replicability(self):
dataset = qp.datasets.fetch_UCIDataset('yeast')
with qp.util.temp_seed(0):
lr = LogisticRegression(random_state=0, max_iter=10000)
pacc = PACC(lr)
prev = pacc.fit(dataset.training).quantify(dataset.test.X)
str_prev1 = strprev(prev, prec=5)
with qp.util.temp_seed(0):
lr = LogisticRegression(random_state=0, max_iter=10000)
pacc = PACC(lr)
prev2 = pacc.fit(dataset.training).quantify(dataset.test.X)
str_prev2 = strprev(prev2, prec=5)
self.assertEqual(str_prev1, str_prev2) # add assertion here
def test_samping_replicability(self):
import numpy as np
def equal_collections(c1, c2, value=True):
self.assertEqual(np.all(c1.X == c2.X), value)
self.assertEqual(np.all(c1.y == c2.y), value)
if value:
self.assertEqual(np.all(c1.classes_ == c2.classes_), value)
X = list(map(str, range(100)))
y = np.random.randint(0, 2, 100)
data = LabelledCollection(instances=X, labels=y)
sample1 = data.sampling(50)
sample2 = data.sampling(50)
equal_collections(sample1, sample2, False)
sample1 = data.sampling(50, random_state=0)
sample2 = data.sampling(50, random_state=0)
equal_collections(sample1, sample2, True)
sample1 = data.sampling(50, *[0.7, 0.3], random_state=0)
sample2 = data.sampling(50, *[0.7, 0.3], random_state=0)
equal_collections(sample1, sample2, True)
with qp.util.temp_seed(0):
sample1 = data.sampling(50, *[0.7, 0.3])
with qp.util.temp_seed(0):
sample2 = data.sampling(50, *[0.7, 0.3])
equal_collections(sample1, sample2, True)
sample1 = data.sampling(50, *[0.7, 0.3], random_state=0)
sample2 = data.sampling(50, *[0.7, 0.3], random_state=0)
equal_collections(sample1, sample2, True)
sample1_tr, sample1_te = data.split_stratified(train_prop=0.7, random_state=0)
sample2_tr, sample2_te = data.split_stratified(train_prop=0.7, random_state=0)
equal_collections(sample1_tr, sample2_tr, True)
equal_collections(sample1_te, sample2_te, True)
with qp.util.temp_seed(0):
sample1_tr, sample1_te = data.split_stratified(train_prop=0.7)
with qp.util.temp_seed(0):
sample2_tr, sample2_te = data.split_stratified(train_prop=0.7)
equal_collections(sample1_tr, sample2_tr, True)
equal_collections(sample1_te, sample2_te, True)
if __name__ == '__main__':
unittest.main()

View File

@ -5,13 +5,14 @@ import os
import pickle
import urllib
from pathlib import Path
from contextlib import ExitStack
import quapy as qp
import numpy as np
from joblib import Parallel, delayed
def _get_parallel_slices(n_tasks, n_jobs=-1):
def _get_parallel_slices(n_tasks, n_jobs):
if n_jobs == -1:
n_jobs = multiprocessing.cpu_count()
batch = int(n_tasks / n_jobs)
@ -21,8 +22,9 @@ def _get_parallel_slices(n_tasks, n_jobs=-1):
def map_parallel(func, args, n_jobs):
"""
Applies func to n_jobs slices of args. E.g., if args is an array of 99 items and n_jobs=2, then
func is applied in two parallel processes to args[0:50] and to args[50:99]
Applies func to n_jobs slices of args. E.g., if args is an array of 99 items and `n_jobs`=2, then
func is applied in two parallel processes to args[0:50] and to args[50:99]. func is a function
that already works with a list of arguments.
:param func: function to be parallelized
:param args: array-like of arguments to be passed to the function in different parallel calls
@ -36,7 +38,7 @@ def map_parallel(func, args, n_jobs):
return list(itertools.chain.from_iterable(results))
def parallel(func, args, n_jobs):
def parallel(func, args, n_jobs, seed=None):
"""
A wrapper of multiprocessing:
@ -44,32 +46,43 @@ def parallel(func, args, n_jobs):
>>> delayed(func)(args_i) for args_i in args
>>> )
that takes the `quapy.environ` variable as input silently
that takes the `quapy.environ` variable as input silently.
Seeds the child processes to ensure reproducibility when n_jobs>1
"""
def func_dec(environ, *args):
qp.environ = environ
return func(*args)
def func_dec(environ, seed, *args):
qp.environ = environ.copy()
qp.environ['N_JOBS'] = 1
#set a context with a temporal seed to ensure results are reproducibles in parallel
with ExitStack() as stack:
if seed is not None:
stack.enter_context(qp.util.temp_seed(seed))
return func(*args)
return Parallel(n_jobs=n_jobs)(
delayed(func_dec)(qp.environ, args_i) for args_i in args
delayed(func_dec)(qp.environ, None if seed is None else seed+i, args_i) for i, args_i in enumerate(args)
)
@contextlib.contextmanager
def temp_seed(seed):
def temp_seed(random_state):
"""
Can be used in a "with" context to set a temporal seed without modifying the outer numpy's current state. E.g.:
>>> with temp_seed(random_seed):
>>> pass # do any computation depending on np.random functionality
:param seed: the seed to set within the "with" context
:param random_state: the seed to set within the "with" context
"""
state = np.random.get_state()
np.random.seed(seed)
if random_state is not None:
state = np.random.get_state()
#save the seed just in case is needed (for instance for setting the seed to child processes)
qp.environ['_R_SEED'] = random_state
np.random.seed(random_state)
try:
yield
finally:
np.random.set_state(state)
if random_state is not None:
np.random.set_state(state)
def download_file(url, archive_filename):
@ -117,6 +130,7 @@ def create_if_not_exist(path):
def get_quapy_home():
"""
Gets the home directory of QuaPy, i.e., the directory where QuaPy saves permanent data, such as dowloaded datasets.
This directory is `~/quapy_data`
:return: a string representing the path
"""
@ -151,7 +165,7 @@ def save_text_file(path, text):
def pickled_resource(pickle_path:str, generation_func:callable, *args):
"""
Allows for fast reuse of resources that are generated only once by calling generation_func(*args). The next times
Allows for fast reuse of resources that are generated only once by calling generation_func(\\*args). The next times
this function is invoked, it loads the pickled resource. Example:
>>> def some_array(n): # a mock resource created with one parameter (`n`)
@ -190,10 +204,6 @@ class EarlyStop:
"""
A class implementing the early-stopping condition typically used for training neural networks.
:param patience: the number of (consecutive) times that a monitored evaluation metric (typically obtaind in a
held-out validation split) can be found to be worse than the best one obtained so far, before flagging the
stopping condition. An instance of this class is `callable`, and is to be used as follows:
>>> earlystop = EarlyStop(patience=2, lower_is_better=True)
>>> earlystop(0.9, epoch=0)
>>> earlystop(0.7, epoch=1)
@ -205,14 +215,14 @@ class EarlyStop:
>>> earlystop.best_epoch # is 1
>>> earlystop.best_score # is 0.7
:param patience: the number of (consecutive) times that a monitored evaluation metric (typically obtaind in a
held-out validation split) can be found to be worse than the best one obtained so far, before flagging the
stopping condition. An instance of this class is `callable`, and is to be used as follows:
:param lower_is_better: if True (default) the metric is to be minimized.
:ivar best_score: keeps track of the best value seen so far
:ivar best_epoch: keeps track of the epoch in which the best score was set
:ivar STOP: flag (boolean) indicating the stopping condition
:ivar IMPROVED: flag (boolean) indicating whether there was an improvement in the last call
"""
def __init__(self, patience, lower_is_better=True):
@ -242,4 +252,5 @@ class EarlyStop:
else:
self.patience -= 1
if self.patience <= 0:
self.STOP = True
self.STOP = True

View File

@ -14,6 +14,7 @@ def get_version(rel_path):
return line.split(delim)[1]
else:
raise RuntimeError("Unable to find version string.")
# Arguments marked as "Required" below must be included for upload to PyPI.
# Fields marked as "Optional" may be commented out.
@ -114,7 +115,7 @@ setup(
python_requires='>=3.6, <4',
install_requires=['scikit-learn', 'pandas', 'tqdm', 'matplotlib'],
install_requires=['scikit-learn', 'pandas', 'tqdm', 'matplotlib', 'joblib', 'xlrd', 'abstention'],
# List additional groups of dependencies here (e.g. development
# dependencies). Users will be able to install these using the "extras"
@ -158,7 +159,8 @@ setup(
project_urls={ # Optional
'Contributors': 'https://github.com/HLT-ISTI/QuaPy/graphs/contributors',
'Bug Reports': 'https://github.com/HLT-ISTI/QuaPy/issues',
'Documentation': 'https://github.com/HLT-ISTI/QuaPy/wiki',
'Wiki': 'https://github.com/HLT-ISTI/QuaPy/wiki',
'Documentation': 'https://hlt-isti.github.io/QuaPy/build/html/index.html',
'Source': 'https://github.com/HLT-ISTI/QuaPy/',
},
)