1
0
Fork 0

merging from protocols (aka v0.1.7)

This commit is contained in:
Alejandro Moreo Fernandez 2023-02-14 18:13:21 +01:00
commit e0718b6e1b
84 changed files with 7623 additions and 17299 deletions

View File

@ -13,6 +13,7 @@ for facilitating the analysis and interpretation of the experimental results.
### Last updates:
* Version 0.1.7 is released! major changes can be consulted [here](quapy/FCHANGE_LOG.txt).
* A detailed documentation is now available [here](https://hlt-isti.github.io/QuaPy/)
* The developer API documentation is available [here](https://hlt-isti.github.io/QuaPy/build/html/modules.html)
@ -73,13 +74,14 @@ See the [Wiki](https://github.com/HLT-ISTI/QuaPy/wiki) for detailed examples.
## Features
* Implementation of many popular quantification methods (Classify-&-Count and its variants, Expectation Maximization,
quantification methods based on structured output learning, HDy, QuaNet, and quantification ensembles).
* Versatile functionality for performing evaluation based on artificial sampling protocols.
quantification methods based on structured output learning, HDy, QuaNet, quantification ensembles, among others).
* Versatile functionality for performing evaluation based on sampling generation protocols (e.g., APP, NPP, etc.).
* Implementation of most commonly used evaluation metrics (e.g., AE, RAE, SE, KLD, NKLD, etc.).
* Datasets frequently used in quantification (textual and numeric), including:
* 32 UCI Machine Learning datasets.
* 11 Twitter quantification-by-sentiment datasets.
* 3 product reviews quantification-by-sentiment datasets.
* 4 tasks from LeQua competition (_new in v0.1.7!_)
* Native support for binary and single-label multiclass quantification scenarios.
* Model selection functionality that minimizes quantification-oriented loss functions.
* Visualization tools for analysing the experimental results.
@ -94,29 +96,6 @@ quantification methods based on structured output learning, HDy, QuaNet, and qua
* pandas, xlrd
* matplotlib
## SVM-perf with quantification-oriented losses
In order to run experiments involving SVM(Q), SVM(KLD), SVM(NKLD),
SVM(AE), or SVM(RAE), you have to first download the
[svmperf](http://www.cs.cornell.edu/people/tj/svm_light/svm_perf.html)
package, apply the patch
[svm-perf-quantification-ext.patch](./svm-perf-quantification-ext.patch), and compile the sources.
The script [prepare_svmperf.sh](prepare_svmperf.sh) does all the job. Simply run:
```
./prepare_svmperf.sh
```
The resulting directory [svm_perf_quantification](./svm_perf_quantification) contains the
patched version of _svmperf_ with quantification-oriented losses.
The [svm-perf-quantification-ext.patch](./svm-perf-quantification-ext.patch) is an extension of the patch made available by
[Esuli et al. 2015](https://dl.acm.org/doi/abs/10.1145/2700406?casa_token=8D2fHsGCVn0AAAAA:ZfThYOvrzWxMGfZYlQW_y8Cagg-o_l6X_PcF09mdETQ4Tu7jK98mxFbGSXp9ZSO14JkUIYuDGFG0)
that allows SVMperf to optimize for
the _Q_ measure as proposed by [Barranquero et al. 2015](https://www.sciencedirect.com/science/article/abs/pii/S003132031400291X)
and for the _KLD_ and _NKLD_ measures as proposed by [Esuli et al. 2015](https://dl.acm.org/doi/abs/10.1145/2700406?casa_token=8D2fHsGCVn0AAAAA:ZfThYOvrzWxMGfZYlQW_y8Cagg-o_l6X_PcF09mdETQ4Tu7jK98mxFbGSXp9ZSO14JkUIYuDGFG0).
This patch extends the above one by also allowing SVMperf to optimize for
_AE_ and _RAE_.
## Documentation
@ -127,6 +106,8 @@ are provided:
* [Datasets](https://github.com/HLT-ISTI/QuaPy/wiki/Datasets)
* [Evaluation](https://github.com/HLT-ISTI/QuaPy/wiki/Evaluation)
* [Protocols](https://github.com/HLT-ISTI/QuaPy/wiki/Protocols)
* [Methods](https://github.com/HLT-ISTI/QuaPy/wiki/Methods)
* [SVMperf](https://github.com/HLT-ISTI/QuaPy/wiki/ExplicitLossMinimization)
* [Model Selection](https://github.com/HLT-ISTI/QuaPy/wiki/Model-Selection)
* [Plotting](https://github.com/HLT-ISTI/QuaPy/wiki/Plotting)

View File

@ -1,7 +1,20 @@
sample_size should not be mandatory when qp.environ['SAMPLE_SIZE'] has been specified
clean all the cumbersome methods that have to be implemented for new quantifiers (e.g., n_classes_ prop, etc.)
make truly parallel the GridSearchQ
make more examples in the "examples" directory
merge with master, because I had to fix some problems with QuaNet due to an issue notified via GitHub!
added cross_val_predict in qp.model_selection (i.e., a cross_val_predict for quantification) --would be nice to have
it parallelized
check the OneVsAll module(s)
check the set_params de neural.py, because the separation of estimator__<param> is not implemented; see also
__check_params_colision
HDy can be customized so that the number of bins is specified, instead of explored within the fit method
Packaging:
==========================================
Documentation with sphinx
Document methods with paper references
unit-tests
clean wiki_examples!

View File

@ -2,23 +2,26 @@
<!doctype html>
<html>
<html lang="en">
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Datasets &#8212; QuaPy 0.1.6 documentation</title>
<meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="generator" content="Docutils 0.19: https://docutils.sourceforge.io/" />
<title>Datasets &#8212; QuaPy 0.1.7 documentation</title>
<link rel="stylesheet" type="text/css" href="_static/pygments.css" />
<link rel="stylesheet" type="text/css" href="_static/bizstyle.css" />
<script data-url_root="./" id="documentation_options" src="_static/documentation_options.js"></script>
<script src="_static/jquery.js"></script>
<script src="_static/underscore.js"></script>
<script src="_static/_sphinx_javascript_frameworks_compat.js"></script>
<script src="_static/doctools.js"></script>
<script src="_static/sphinx_highlight.js"></script>
<script src="_static/bizstyle.js"></script>
<link rel="index" title="Index" href="genindex.html" />
<link rel="search" title="Search" href="search.html" />
<link rel="next" title="quapy" href="modules.html" />
<link rel="prev" title="Getting Started" href="readme.html" />
<link rel="next" title="Evaluation" href="Evaluation.html" />
<link rel="prev" title="Installation" href="Installation.html" />
<meta name="viewport" content="width=device-width,initial-scale=1.0" />
<!--[if lt IE 9]>
<script src="_static/css3-mediaqueries.js"></script>
@ -34,12 +37,12 @@
<a href="py-modindex.html" title="Python Module Index"
>modules</a> |</li>
<li class="right" >
<a href="modules.html" title="quapy"
<a href="Evaluation.html" title="Evaluation"
accesskey="N">next</a> |</li>
<li class="right" >
<a href="readme.html" title="Getting Started"
<a href="Installation.html" title="Installation"
accesskey="P">previous</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.6 documentation</a> &#187;</li>
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.7 documentation</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">Datasets</a></li>
</ul>
</div>
@ -49,8 +52,8 @@
<div class="bodywrapper">
<div class="body" role="main">
<div class="tex2jax_ignore mathjax_ignore section" id="datasets">
<h1>Datasets<a class="headerlink" href="#datasets" title="Permalink to this headline"></a></h1>
<section id="datasets">
<h1>Datasets<a class="headerlink" href="#datasets" title="Permalink to this heading"></a></h1>
<p>QuaPy makes available several datasets that have been used in
quantification literature, as well as an interface to allow
anyone import their custom datasets.</p>
@ -77,7 +80,7 @@ Take a look at the following code:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="p">[</span><span class="mf">0.17</span><span class="p">,</span> <span class="mf">0.50</span><span class="p">,</span> <span class="mf">0.33</span><span class="p">]</span>
</pre></div>
</div>
<p>One can easily produce new samples at desired class prevalences:</p>
<p>One can easily produce new samples at desired class prevalence values:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">sample_size</span> <span class="o">=</span> <span class="mi">10</span>
<span class="n">prev</span> <span class="o">=</span> <span class="p">[</span><span class="mf">0.4</span><span class="p">,</span> <span class="mf">0.1</span><span class="p">,</span> <span class="mf">0.5</span><span class="p">]</span>
<span class="n">sample</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">sampling</span><span class="p">(</span><span class="n">sample_size</span><span class="p">,</span> <span class="o">*</span><span class="n">prev</span><span class="p">)</span>
@ -106,31 +109,12 @@ the indexes, that can then be used to generate the sample:</p>
<span class="o">...</span>
</pre></div>
</div>
<p>QuaPy also implements the artificial sampling protocol that produces (via a
Pythons generator) a series of <em>LabelledCollection</em> objects with equidistant
prevalences ranging across the entire prevalence spectrum in the simplex space, e.g.:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="k">for</span> <span class="n">sample</span> <span class="ow">in</span> <span class="n">data</span><span class="o">.</span><span class="n">artificial_sampling_generator</span><span class="p">(</span><span class="n">sample_size</span><span class="o">=</span><span class="mi">100</span><span class="p">,</span> <span class="n">n_prevalences</span><span class="o">=</span><span class="mi">5</span><span class="p">):</span>
<span class="nb">print</span><span class="p">(</span><span class="n">F</span><span class="o">.</span><span class="n">strprev</span><span class="p">(</span><span class="n">sample</span><span class="o">.</span><span class="n">prevalence</span><span class="p">(),</span> <span class="n">prec</span><span class="o">=</span><span class="mi">2</span><span class="p">))</span>
</pre></div>
</div>
<p>produces one sampling for each (valid) combination of prevalences originating from
splitting the range [0,1] into n_prevalences=5 points (i.e., [0, 0.25, 0.5, 0.75, 1]),
that is:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="p">[</span><span class="mf">0.00</span><span class="p">,</span> <span class="mf">0.00</span><span class="p">,</span> <span class="mf">1.00</span><span class="p">]</span>
<span class="p">[</span><span class="mf">0.00</span><span class="p">,</span> <span class="mf">0.25</span><span class="p">,</span> <span class="mf">0.75</span><span class="p">]</span>
<span class="p">[</span><span class="mf">0.00</span><span class="p">,</span> <span class="mf">0.50</span><span class="p">,</span> <span class="mf">0.50</span><span class="p">]</span>
<span class="p">[</span><span class="mf">0.00</span><span class="p">,</span> <span class="mf">0.75</span><span class="p">,</span> <span class="mf">0.25</span><span class="p">]</span>
<span class="p">[</span><span class="mf">0.00</span><span class="p">,</span> <span class="mf">1.00</span><span class="p">,</span> <span class="mf">0.00</span><span class="p">]</span>
<span class="p">[</span><span class="mf">0.25</span><span class="p">,</span> <span class="mf">0.00</span><span class="p">,</span> <span class="mf">0.75</span><span class="p">]</span>
<span class="o">...</span>
<span class="p">[</span><span class="mf">1.00</span><span class="p">,</span> <span class="mf">0.00</span><span class="p">,</span> <span class="mf">0.00</span><span class="p">]</span>
</pre></div>
</div>
<p>See the <a class="reference external" href="https://github.com/HLT-ISTI/QuaPy/wiki/Evaluation">Evaluation wiki</a> for
further details on how to use the artificial sampling protocol to properly
evaluate a quantification method.</p>
<div class="section" id="reviews-datasets">
<h2>Reviews Datasets<a class="headerlink" href="#reviews-datasets" title="Permalink to this headline"></a></h2>
<p>However, generating samples for evaluation purposes is tackled in QuaPy
by means of the evaluation protocols (see the dedicated entries in the Wiki
for <a class="reference external" href="https://github.com/HLT-ISTI/QuaPy/wiki/Evaluation">evaluation</a> and
<a class="reference external" href="https://github.com/HLT-ISTI/QuaPy/wiki/Protocols">protocols</a>).</p>
<section id="reviews-datasets">
<h2>Reviews Datasets<a class="headerlink" href="#reviews-datasets" title="Permalink to this heading"></a></h2>
<p>Three datasets of reviews about Kindle devices, Harry Potters series, and
the well-known IMDb movie reviews can be fetched using a unified interface.
For example:</p>
@ -150,47 +134,47 @@ For example:</p>
</pre></div>
</div>
<p>Some statistics of the fhe available datasets are summarized below:</p>
<table class="colwidths-auto docutils align-default">
<table class="docutils align-default">
<thead>
<tr class="row-odd"><th class="head"><p>Dataset</p></th>
<th class="text-align:center head"><p>classes</p></th>
<th class="text-align:center head"><p>train size</p></th>
<th class="text-align:center head"><p>test size</p></th>
<th class="text-align:center head"><p>train prev</p></th>
<th class="text-align:center head"><p>test prev</p></th>
<th class="head text-center"><p>classes</p></th>
<th class="head text-center"><p>train size</p></th>
<th class="head text-center"><p>test size</p></th>
<th class="head text-center"><p>train prev</p></th>
<th class="head text-center"><p>test prev</p></th>
<th class="head"><p>type</p></th>
</tr>
</thead>
<tbody>
<tr class="row-even"><td><p>hp</p></td>
<td class="text-align:center"><p>2</p></td>
<td class="text-align:center"><p>9533</p></td>
<td class="text-align:center"><p>18399</p></td>
<td class="text-align:center"><p>[0.018, 0.982]</p></td>
<td class="text-align:center"><p>[0.065, 0.935]</p></td>
<td class="text-center"><p>2</p></td>
<td class="text-center"><p>9533</p></td>
<td class="text-center"><p>18399</p></td>
<td class="text-center"><p>[0.018, 0.982]</p></td>
<td class="text-center"><p>[0.065, 0.935]</p></td>
<td><p>text</p></td>
</tr>
<tr class="row-odd"><td><p>kindle</p></td>
<td class="text-align:center"><p>2</p></td>
<td class="text-align:center"><p>3821</p></td>
<td class="text-align:center"><p>21591</p></td>
<td class="text-align:center"><p>[0.081, 0.919]</p></td>
<td class="text-align:center"><p>[0.063, 0.937]</p></td>
<td class="text-center"><p>2</p></td>
<td class="text-center"><p>3821</p></td>
<td class="text-center"><p>21591</p></td>
<td class="text-center"><p>[0.081, 0.919]</p></td>
<td class="text-center"><p>[0.063, 0.937]</p></td>
<td><p>text</p></td>
</tr>
<tr class="row-even"><td><p>imdb</p></td>
<td class="text-align:center"><p>2</p></td>
<td class="text-align:center"><p>25000</p></td>
<td class="text-align:center"><p>25000</p></td>
<td class="text-align:center"><p>[0.500, 0.500]</p></td>
<td class="text-align:center"><p>[0.500, 0.500]</p></td>
<td class="text-center"><p>2</p></td>
<td class="text-center"><p>25000</p></td>
<td class="text-center"><p>25000</p></td>
<td class="text-center"><p>[0.500, 0.500]</p></td>
<td class="text-center"><p>[0.500, 0.500]</p></td>
<td><p>text</p></td>
</tr>
</tbody>
</table>
</div>
<div class="section" id="twitter-sentiment-datasets">
<h2>Twitter Sentiment Datasets<a class="headerlink" href="#twitter-sentiment-datasets" title="Permalink to this headline"></a></h2>
</section>
<section id="twitter-sentiment-datasets">
<h2>Twitter Sentiment Datasets<a class="headerlink" href="#twitter-sentiment-datasets" title="Permalink to this heading"></a></h2>
<p>11 Twitter datasets for sentiment analysis.
Text is not accessible, and the documents were made available
in tf-idf format. Each dataset presents two splits: a train/val
@ -221,123 +205,123 @@ The lists of the Twitter datasets ids can be consulted in:</p>
</pre></div>
</div>
<p>Some details can be found below:</p>
<table class="colwidths-auto docutils align-default">
<table class="docutils align-default">
<thead>
<tr class="row-odd"><th class="head"><p>Dataset</p></th>
<th class="text-align:center head"><p>classes</p></th>
<th class="text-align:center head"><p>train size</p></th>
<th class="text-align:center head"><p>test size</p></th>
<th class="text-align:center head"><p>features</p></th>
<th class="text-align:center head"><p>train prev</p></th>
<th class="text-align:center head"><p>test prev</p></th>
<th class="head text-center"><p>classes</p></th>
<th class="head text-center"><p>train size</p></th>
<th class="head text-center"><p>test size</p></th>
<th class="head text-center"><p>features</p></th>
<th class="head text-center"><p>train prev</p></th>
<th class="head text-center"><p>test prev</p></th>
<th class="head"><p>type</p></th>
</tr>
</thead>
<tbody>
<tr class="row-even"><td><p>gasp</p></td>
<td class="text-align:center"><p>3</p></td>
<td class="text-align:center"><p>8788</p></td>
<td class="text-align:center"><p>3765</p></td>
<td class="text-align:center"><p>694582</p></td>
<td class="text-align:center"><p>[0.421, 0.496, 0.082]</p></td>
<td class="text-align:center"><p>[0.407, 0.507, 0.086]</p></td>
<td class="text-center"><p>3</p></td>
<td class="text-center"><p>8788</p></td>
<td class="text-center"><p>3765</p></td>
<td class="text-center"><p>694582</p></td>
<td class="text-center"><p>[0.421, 0.496, 0.082]</p></td>
<td class="text-center"><p>[0.407, 0.507, 0.086]</p></td>
<td><p>sparse</p></td>
</tr>
<tr class="row-odd"><td><p>hcr</p></td>
<td class="text-align:center"><p>3</p></td>
<td class="text-align:center"><p>1594</p></td>
<td class="text-align:center"><p>798</p></td>
<td class="text-align:center"><p>222046</p></td>
<td class="text-align:center"><p>[0.546, 0.211, 0.243]</p></td>
<td class="text-align:center"><p>[0.640, 0.167, 0.193]</p></td>
<td class="text-center"><p>3</p></td>
<td class="text-center"><p>1594</p></td>
<td class="text-center"><p>798</p></td>
<td class="text-center"><p>222046</p></td>
<td class="text-center"><p>[0.546, 0.211, 0.243]</p></td>
<td class="text-center"><p>[0.640, 0.167, 0.193]</p></td>
<td><p>sparse</p></td>
</tr>
<tr class="row-even"><td><p>omd</p></td>
<td class="text-align:center"><p>3</p></td>
<td class="text-align:center"><p>1839</p></td>
<td class="text-align:center"><p>787</p></td>
<td class="text-align:center"><p>199151</p></td>
<td class="text-align:center"><p>[0.463, 0.271, 0.266]</p></td>
<td class="text-align:center"><p>[0.437, 0.283, 0.280]</p></td>
<td class="text-center"><p>3</p></td>
<td class="text-center"><p>1839</p></td>
<td class="text-center"><p>787</p></td>
<td class="text-center"><p>199151</p></td>
<td class="text-center"><p>[0.463, 0.271, 0.266]</p></td>
<td class="text-center"><p>[0.437, 0.283, 0.280]</p></td>
<td><p>sparse</p></td>
</tr>
<tr class="row-odd"><td><p>sanders</p></td>
<td class="text-align:center"><p>3</p></td>
<td class="text-align:center"><p>2155</p></td>
<td class="text-align:center"><p>923</p></td>
<td class="text-align:center"><p>229399</p></td>
<td class="text-align:center"><p>[0.161, 0.691, 0.148]</p></td>
<td class="text-align:center"><p>[0.164, 0.688, 0.148]</p></td>
<td class="text-center"><p>3</p></td>
<td class="text-center"><p>2155</p></td>
<td class="text-center"><p>923</p></td>
<td class="text-center"><p>229399</p></td>
<td class="text-center"><p>[0.161, 0.691, 0.148]</p></td>
<td class="text-center"><p>[0.164, 0.688, 0.148]</p></td>
<td><p>sparse</p></td>
</tr>
<tr class="row-even"><td><p>semeval13</p></td>
<td class="text-align:center"><p>3</p></td>
<td class="text-align:center"><p>11338</p></td>
<td class="text-align:center"><p>3813</p></td>
<td class="text-align:center"><p>1215742</p></td>
<td class="text-align:center"><p>[0.159, 0.470, 0.372]</p></td>
<td class="text-align:center"><p>[0.158, 0.430, 0.412]</p></td>
<td class="text-center"><p>3</p></td>
<td class="text-center"><p>11338</p></td>
<td class="text-center"><p>3813</p></td>
<td class="text-center"><p>1215742</p></td>
<td class="text-center"><p>[0.159, 0.470, 0.372]</p></td>
<td class="text-center"><p>[0.158, 0.430, 0.412]</p></td>
<td><p>sparse</p></td>
</tr>
<tr class="row-odd"><td><p>semeval14</p></td>
<td class="text-align:center"><p>3</p></td>
<td class="text-align:center"><p>11338</p></td>
<td class="text-align:center"><p>1853</p></td>
<td class="text-align:center"><p>1215742</p></td>
<td class="text-align:center"><p>[0.159, 0.470, 0.372]</p></td>
<td class="text-align:center"><p>[0.109, 0.361, 0.530]</p></td>
<td class="text-center"><p>3</p></td>
<td class="text-center"><p>11338</p></td>
<td class="text-center"><p>1853</p></td>
<td class="text-center"><p>1215742</p></td>
<td class="text-center"><p>[0.159, 0.470, 0.372]</p></td>
<td class="text-center"><p>[0.109, 0.361, 0.530]</p></td>
<td><p>sparse</p></td>
</tr>
<tr class="row-even"><td><p>semeval15</p></td>
<td class="text-align:center"><p>3</p></td>
<td class="text-align:center"><p>11338</p></td>
<td class="text-align:center"><p>2390</p></td>
<td class="text-align:center"><p>1215742</p></td>
<td class="text-align:center"><p>[0.159, 0.470, 0.372]</p></td>
<td class="text-align:center"><p>[0.153, 0.413, 0.434]</p></td>
<td class="text-center"><p>3</p></td>
<td class="text-center"><p>11338</p></td>
<td class="text-center"><p>2390</p></td>
<td class="text-center"><p>1215742</p></td>
<td class="text-center"><p>[0.159, 0.470, 0.372]</p></td>
<td class="text-center"><p>[0.153, 0.413, 0.434]</p></td>
<td><p>sparse</p></td>
</tr>
<tr class="row-odd"><td><p>semeval16</p></td>
<td class="text-align:center"><p>3</p></td>
<td class="text-align:center"><p>8000</p></td>
<td class="text-align:center"><p>2000</p></td>
<td class="text-align:center"><p>889504</p></td>
<td class="text-align:center"><p>[0.157, 0.351, 0.492]</p></td>
<td class="text-align:center"><p>[0.163, 0.341, 0.497]</p></td>
<td class="text-center"><p>3</p></td>
<td class="text-center"><p>8000</p></td>
<td class="text-center"><p>2000</p></td>
<td class="text-center"><p>889504</p></td>
<td class="text-center"><p>[0.157, 0.351, 0.492]</p></td>
<td class="text-center"><p>[0.163, 0.341, 0.497]</p></td>
<td><p>sparse</p></td>
</tr>
<tr class="row-even"><td><p>sst</p></td>
<td class="text-align:center"><p>3</p></td>
<td class="text-align:center"><p>2971</p></td>
<td class="text-align:center"><p>1271</p></td>
<td class="text-align:center"><p>376132</p></td>
<td class="text-align:center"><p>[0.261, 0.452, 0.288]</p></td>
<td class="text-align:center"><p>[0.207, 0.481, 0.312]</p></td>
<td class="text-center"><p>3</p></td>
<td class="text-center"><p>2971</p></td>
<td class="text-center"><p>1271</p></td>
<td class="text-center"><p>376132</p></td>
<td class="text-center"><p>[0.261, 0.452, 0.288]</p></td>
<td class="text-center"><p>[0.207, 0.481, 0.312]</p></td>
<td><p>sparse</p></td>
</tr>
<tr class="row-odd"><td><p>wa</p></td>
<td class="text-align:center"><p>3</p></td>
<td class="text-align:center"><p>2184</p></td>
<td class="text-align:center"><p>936</p></td>
<td class="text-align:center"><p>248563</p></td>
<td class="text-align:center"><p>[0.305, 0.414, 0.281]</p></td>
<td class="text-align:center"><p>[0.282, 0.446, 0.272]</p></td>
<td class="text-center"><p>3</p></td>
<td class="text-center"><p>2184</p></td>
<td class="text-center"><p>936</p></td>
<td class="text-center"><p>248563</p></td>
<td class="text-center"><p>[0.305, 0.414, 0.281]</p></td>
<td class="text-center"><p>[0.282, 0.446, 0.272]</p></td>
<td><p>sparse</p></td>
</tr>
<tr class="row-even"><td><p>wb</p></td>
<td class="text-align:center"><p>3</p></td>
<td class="text-align:center"><p>4259</p></td>
<td class="text-align:center"><p>1823</p></td>
<td class="text-align:center"><p>404333</p></td>
<td class="text-align:center"><p>[0.270, 0.392, 0.337]</p></td>
<td class="text-align:center"><p>[0.274, 0.392, 0.335]</p></td>
<td class="text-center"><p>3</p></td>
<td class="text-center"><p>4259</p></td>
<td class="text-center"><p>1823</p></td>
<td class="text-center"><p>404333</p></td>
<td class="text-center"><p>[0.270, 0.392, 0.337]</p></td>
<td class="text-center"><p>[0.274, 0.392, 0.335]</p></td>
<td><p>sparse</p></td>
</tr>
</tbody>
</table>
</div>
<div class="section" id="uci-machine-learning">
<h2>UCI Machine Learning<a class="headerlink" href="#uci-machine-learning" title="Permalink to this headline"></a></h2>
</section>
<section id="uci-machine-learning">
<h2>UCI Machine Learning<a class="headerlink" href="#uci-machine-learning" title="Permalink to this heading"></a></h2>
<p>A set of 32 datasets from the <a class="reference external" href="https://archive.ics.uci.edu/ml/datasets.php">UCI Machine Learning repository</a>
used in:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">Pérez</span><span class="o">-</span><span class="n">Gállego</span><span class="p">,</span> <span class="n">P</span><span class="o">.</span><span class="p">,</span> <span class="n">Quevedo</span><span class="p">,</span> <span class="n">J</span><span class="o">.</span> <span class="n">R</span><span class="o">.</span><span class="p">,</span> <span class="o">&amp;</span> <span class="k">del</span> <span class="n">Coz</span><span class="p">,</span> <span class="n">J</span><span class="o">.</span> <span class="n">J</span><span class="o">.</span> <span class="p">(</span><span class="mi">2017</span><span class="p">)</span><span class="o">.</span>
@ -371,252 +355,252 @@ training+test dataset at a time, following a kFCV protocol:</p>
<p>Above code will allow to conduct a 2x5FCV evaluation on the “yeast” dataset.</p>
<p>All datasets come in numerical form (dense matrices); some statistics
are summarized below.</p>
<table class="colwidths-auto docutils align-default">
<table class="docutils align-default">
<thead>
<tr class="row-odd"><th class="head"><p>Dataset</p></th>
<th class="text-align:center head"><p>classes</p></th>
<th class="text-align:center head"><p>instances</p></th>
<th class="text-align:center head"><p>features</p></th>
<th class="text-align:center head"><p>prev</p></th>
<th class="head text-center"><p>classes</p></th>
<th class="head text-center"><p>instances</p></th>
<th class="head text-center"><p>features</p></th>
<th class="head text-center"><p>prev</p></th>
<th class="head"><p>type</p></th>
</tr>
</thead>
<tbody>
<tr class="row-even"><td><p>acute.a</p></td>
<td class="text-align:center"><p>2</p></td>
<td class="text-align:center"><p>120</p></td>
<td class="text-align:center"><p>6</p></td>
<td class="text-align:center"><p>[0.508, 0.492]</p></td>
<td class="text-center"><p>2</p></td>
<td class="text-center"><p>120</p></td>
<td class="text-center"><p>6</p></td>
<td class="text-center"><p>[0.508, 0.492]</p></td>
<td><p>dense</p></td>
</tr>
<tr class="row-odd"><td><p>acute.b</p></td>
<td class="text-align:center"><p>2</p></td>
<td class="text-align:center"><p>120</p></td>
<td class="text-align:center"><p>6</p></td>
<td class="text-align:center"><p>[0.583, 0.417]</p></td>
<td class="text-center"><p>2</p></td>
<td class="text-center"><p>120</p></td>
<td class="text-center"><p>6</p></td>
<td class="text-center"><p>[0.583, 0.417]</p></td>
<td><p>dense</p></td>
</tr>
<tr class="row-even"><td><p>balance.1</p></td>
<td class="text-align:center"><p>2</p></td>
<td class="text-align:center"><p>625</p></td>
<td class="text-align:center"><p>4</p></td>
<td class="text-align:center"><p>[0.539, 0.461]</p></td>
<td class="text-center"><p>2</p></td>
<td class="text-center"><p>625</p></td>
<td class="text-center"><p>4</p></td>
<td class="text-center"><p>[0.539, 0.461]</p></td>
<td><p>dense</p></td>
</tr>
<tr class="row-odd"><td><p>balance.2</p></td>
<td class="text-align:center"><p>2</p></td>
<td class="text-align:center"><p>625</p></td>
<td class="text-align:center"><p>4</p></td>
<td class="text-align:center"><p>[0.922, 0.078]</p></td>
<td class="text-center"><p>2</p></td>
<td class="text-center"><p>625</p></td>
<td class="text-center"><p>4</p></td>
<td class="text-center"><p>[0.922, 0.078]</p></td>
<td><p>dense</p></td>
</tr>
<tr class="row-even"><td><p>balance.3</p></td>
<td class="text-align:center"><p>2</p></td>
<td class="text-align:center"><p>625</p></td>
<td class="text-align:center"><p>4</p></td>
<td class="text-align:center"><p>[0.539, 0.461]</p></td>
<td class="text-center"><p>2</p></td>
<td class="text-center"><p>625</p></td>
<td class="text-center"><p>4</p></td>
<td class="text-center"><p>[0.539, 0.461]</p></td>
<td><p>dense</p></td>
</tr>
<tr class="row-odd"><td><p>breast-cancer</p></td>
<td class="text-align:center"><p>2</p></td>
<td class="text-align:center"><p>683</p></td>
<td class="text-align:center"><p>9</p></td>
<td class="text-align:center"><p>[0.350, 0.650]</p></td>
<td class="text-center"><p>2</p></td>
<td class="text-center"><p>683</p></td>
<td class="text-center"><p>9</p></td>
<td class="text-center"><p>[0.350, 0.650]</p></td>
<td><p>dense</p></td>
</tr>
<tr class="row-even"><td><p>cmc.1</p></td>
<td class="text-align:center"><p>2</p></td>
<td class="text-align:center"><p>1473</p></td>
<td class="text-align:center"><p>9</p></td>
<td class="text-align:center"><p>[0.573, 0.427]</p></td>
<td class="text-center"><p>2</p></td>
<td class="text-center"><p>1473</p></td>
<td class="text-center"><p>9</p></td>
<td class="text-center"><p>[0.573, 0.427]</p></td>
<td><p>dense</p></td>
</tr>
<tr class="row-odd"><td><p>cmc.2</p></td>
<td class="text-align:center"><p>2</p></td>
<td class="text-align:center"><p>1473</p></td>
<td class="text-align:center"><p>9</p></td>
<td class="text-align:center"><p>[0.774, 0.226]</p></td>
<td class="text-center"><p>2</p></td>
<td class="text-center"><p>1473</p></td>
<td class="text-center"><p>9</p></td>
<td class="text-center"><p>[0.774, 0.226]</p></td>
<td><p>dense</p></td>
</tr>
<tr class="row-even"><td><p>cmc.3</p></td>
<td class="text-align:center"><p>2</p></td>
<td class="text-align:center"><p>1473</p></td>
<td class="text-align:center"><p>9</p></td>
<td class="text-align:center"><p>[0.653, 0.347]</p></td>
<td class="text-center"><p>2</p></td>
<td class="text-center"><p>1473</p></td>
<td class="text-center"><p>9</p></td>
<td class="text-center"><p>[0.653, 0.347]</p></td>
<td><p>dense</p></td>
</tr>
<tr class="row-odd"><td><p>ctg.1</p></td>
<td class="text-align:center"><p>2</p></td>
<td class="text-align:center"><p>2126</p></td>
<td class="text-align:center"><p>22</p></td>
<td class="text-align:center"><p>[0.222, 0.778]</p></td>
<td class="text-center"><p>2</p></td>
<td class="text-center"><p>2126</p></td>
<td class="text-center"><p>22</p></td>
<td class="text-center"><p>[0.222, 0.778]</p></td>
<td><p>dense</p></td>
</tr>
<tr class="row-even"><td><p>ctg.2</p></td>
<td class="text-align:center"><p>2</p></td>
<td class="text-align:center"><p>2126</p></td>
<td class="text-align:center"><p>22</p></td>
<td class="text-align:center"><p>[0.861, 0.139]</p></td>
<td class="text-center"><p>2</p></td>
<td class="text-center"><p>2126</p></td>
<td class="text-center"><p>22</p></td>
<td class="text-center"><p>[0.861, 0.139]</p></td>
<td><p>dense</p></td>
</tr>
<tr class="row-odd"><td><p>ctg.3</p></td>
<td class="text-align:center"><p>2</p></td>
<td class="text-align:center"><p>2126</p></td>
<td class="text-align:center"><p>22</p></td>
<td class="text-align:center"><p>[0.917, 0.083]</p></td>
<td class="text-center"><p>2</p></td>
<td class="text-center"><p>2126</p></td>
<td class="text-center"><p>22</p></td>
<td class="text-center"><p>[0.917, 0.083]</p></td>
<td><p>dense</p></td>
</tr>
<tr class="row-even"><td><p>german</p></td>
<td class="text-align:center"><p>2</p></td>
<td class="text-align:center"><p>1000</p></td>
<td class="text-align:center"><p>24</p></td>
<td class="text-align:center"><p>[0.300, 0.700]</p></td>
<td class="text-center"><p>2</p></td>
<td class="text-center"><p>1000</p></td>
<td class="text-center"><p>24</p></td>
<td class="text-center"><p>[0.300, 0.700]</p></td>
<td><p>dense</p></td>
</tr>
<tr class="row-odd"><td><p>haberman</p></td>
<td class="text-align:center"><p>2</p></td>
<td class="text-align:center"><p>306</p></td>
<td class="text-align:center"><p>3</p></td>
<td class="text-align:center"><p>[0.735, 0.265]</p></td>
<td class="text-center"><p>2</p></td>
<td class="text-center"><p>306</p></td>
<td class="text-center"><p>3</p></td>
<td class="text-center"><p>[0.735, 0.265]</p></td>
<td><p>dense</p></td>
</tr>
<tr class="row-even"><td><p>ionosphere</p></td>
<td class="text-align:center"><p>2</p></td>
<td class="text-align:center"><p>351</p></td>
<td class="text-align:center"><p>34</p></td>
<td class="text-align:center"><p>[0.641, 0.359]</p></td>
<td class="text-center"><p>2</p></td>
<td class="text-center"><p>351</p></td>
<td class="text-center"><p>34</p></td>
<td class="text-center"><p>[0.641, 0.359]</p></td>
<td><p>dense</p></td>
</tr>
<tr class="row-odd"><td><p>iris.1</p></td>
<td class="text-align:center"><p>2</p></td>
<td class="text-align:center"><p>150</p></td>
<td class="text-align:center"><p>4</p></td>
<td class="text-align:center"><p>[0.667, 0.333]</p></td>
<td class="text-center"><p>2</p></td>
<td class="text-center"><p>150</p></td>
<td class="text-center"><p>4</p></td>
<td class="text-center"><p>[0.667, 0.333]</p></td>
<td><p>dense</p></td>
</tr>
<tr class="row-even"><td><p>iris.2</p></td>
<td class="text-align:center"><p>2</p></td>
<td class="text-align:center"><p>150</p></td>
<td class="text-align:center"><p>4</p></td>
<td class="text-align:center"><p>[0.667, 0.333]</p></td>
<td class="text-center"><p>2</p></td>
<td class="text-center"><p>150</p></td>
<td class="text-center"><p>4</p></td>
<td class="text-center"><p>[0.667, 0.333]</p></td>
<td><p>dense</p></td>
</tr>
<tr class="row-odd"><td><p>iris.3</p></td>
<td class="text-align:center"><p>2</p></td>
<td class="text-align:center"><p>150</p></td>
<td class="text-align:center"><p>4</p></td>
<td class="text-align:center"><p>[0.667, 0.333]</p></td>
<td class="text-center"><p>2</p></td>
<td class="text-center"><p>150</p></td>
<td class="text-center"><p>4</p></td>
<td class="text-center"><p>[0.667, 0.333]</p></td>
<td><p>dense</p></td>
</tr>
<tr class="row-even"><td><p>mammographic</p></td>
<td class="text-align:center"><p>2</p></td>
<td class="text-align:center"><p>830</p></td>
<td class="text-align:center"><p>5</p></td>
<td class="text-align:center"><p>[0.514, 0.486]</p></td>
<td class="text-center"><p>2</p></td>
<td class="text-center"><p>830</p></td>
<td class="text-center"><p>5</p></td>
<td class="text-center"><p>[0.514, 0.486]</p></td>
<td><p>dense</p></td>
</tr>
<tr class="row-odd"><td><p>pageblocks.5</p></td>
<td class="text-align:center"><p>2</p></td>
<td class="text-align:center"><p>5473</p></td>
<td class="text-align:center"><p>10</p></td>
<td class="text-align:center"><p>[0.979, 0.021]</p></td>
<td class="text-center"><p>2</p></td>
<td class="text-center"><p>5473</p></td>
<td class="text-center"><p>10</p></td>
<td class="text-center"><p>[0.979, 0.021]</p></td>
<td><p>dense</p></td>
</tr>
<tr class="row-even"><td><p>semeion</p></td>
<td class="text-align:center"><p>2</p></td>
<td class="text-align:center"><p>1593</p></td>
<td class="text-align:center"><p>256</p></td>
<td class="text-align:center"><p>[0.901, 0.099]</p></td>
<td class="text-center"><p>2</p></td>
<td class="text-center"><p>1593</p></td>
<td class="text-center"><p>256</p></td>
<td class="text-center"><p>[0.901, 0.099]</p></td>
<td><p>dense</p></td>
</tr>
<tr class="row-odd"><td><p>sonar</p></td>
<td class="text-align:center"><p>2</p></td>
<td class="text-align:center"><p>208</p></td>
<td class="text-align:center"><p>60</p></td>
<td class="text-align:center"><p>[0.534, 0.466]</p></td>
<td class="text-center"><p>2</p></td>
<td class="text-center"><p>208</p></td>
<td class="text-center"><p>60</p></td>
<td class="text-center"><p>[0.534, 0.466]</p></td>
<td><p>dense</p></td>
</tr>
<tr class="row-even"><td><p>spambase</p></td>
<td class="text-align:center"><p>2</p></td>
<td class="text-align:center"><p>4601</p></td>
<td class="text-align:center"><p>57</p></td>
<td class="text-align:center"><p>[0.606, 0.394]</p></td>
<td class="text-center"><p>2</p></td>
<td class="text-center"><p>4601</p></td>
<td class="text-center"><p>57</p></td>
<td class="text-center"><p>[0.606, 0.394]</p></td>
<td><p>dense</p></td>
</tr>
<tr class="row-odd"><td><p>spectf</p></td>
<td class="text-align:center"><p>2</p></td>
<td class="text-align:center"><p>267</p></td>
<td class="text-align:center"><p>44</p></td>
<td class="text-align:center"><p>[0.794, 0.206]</p></td>
<td class="text-center"><p>2</p></td>
<td class="text-center"><p>267</p></td>
<td class="text-center"><p>44</p></td>
<td class="text-center"><p>[0.794, 0.206]</p></td>
<td><p>dense</p></td>
</tr>
<tr class="row-even"><td><p>tictactoe</p></td>
<td class="text-align:center"><p>2</p></td>
<td class="text-align:center"><p>958</p></td>
<td class="text-align:center"><p>9</p></td>
<td class="text-align:center"><p>[0.653, 0.347]</p></td>
<td class="text-center"><p>2</p></td>
<td class="text-center"><p>958</p></td>
<td class="text-center"><p>9</p></td>
<td class="text-center"><p>[0.653, 0.347]</p></td>
<td><p>dense</p></td>
</tr>
<tr class="row-odd"><td><p>transfusion</p></td>
<td class="text-align:center"><p>2</p></td>
<td class="text-align:center"><p>748</p></td>
<td class="text-align:center"><p>4</p></td>
<td class="text-align:center"><p>[0.762, 0.238]</p></td>
<td class="text-center"><p>2</p></td>
<td class="text-center"><p>748</p></td>
<td class="text-center"><p>4</p></td>
<td class="text-center"><p>[0.762, 0.238]</p></td>
<td><p>dense</p></td>
</tr>
<tr class="row-even"><td><p>wdbc</p></td>
<td class="text-align:center"><p>2</p></td>
<td class="text-align:center"><p>569</p></td>
<td class="text-align:center"><p>30</p></td>
<td class="text-align:center"><p>[0.627, 0.373]</p></td>
<td class="text-center"><p>2</p></td>
<td class="text-center"><p>569</p></td>
<td class="text-center"><p>30</p></td>
<td class="text-center"><p>[0.627, 0.373]</p></td>
<td><p>dense</p></td>
</tr>
<tr class="row-odd"><td><p>wine.1</p></td>
<td class="text-align:center"><p>2</p></td>
<td class="text-align:center"><p>178</p></td>
<td class="text-align:center"><p>13</p></td>
<td class="text-align:center"><p>[0.669, 0.331]</p></td>
<td class="text-center"><p>2</p></td>
<td class="text-center"><p>178</p></td>
<td class="text-center"><p>13</p></td>
<td class="text-center"><p>[0.669, 0.331]</p></td>
<td><p>dense</p></td>
</tr>
<tr class="row-even"><td><p>wine.2</p></td>
<td class="text-align:center"><p>2</p></td>
<td class="text-align:center"><p>178</p></td>
<td class="text-align:center"><p>13</p></td>
<td class="text-align:center"><p>[0.601, 0.399]</p></td>
<td class="text-center"><p>2</p></td>
<td class="text-center"><p>178</p></td>
<td class="text-center"><p>13</p></td>
<td class="text-center"><p>[0.601, 0.399]</p></td>
<td><p>dense</p></td>
</tr>
<tr class="row-odd"><td><p>wine.3</p></td>
<td class="text-align:center"><p>2</p></td>
<td class="text-align:center"><p>178</p></td>
<td class="text-align:center"><p>13</p></td>
<td class="text-align:center"><p>[0.730, 0.270]</p></td>
<td class="text-center"><p>2</p></td>
<td class="text-center"><p>178</p></td>
<td class="text-center"><p>13</p></td>
<td class="text-center"><p>[0.730, 0.270]</p></td>
<td><p>dense</p></td>
</tr>
<tr class="row-even"><td><p>wine-q-red</p></td>
<td class="text-align:center"><p>2</p></td>
<td class="text-align:center"><p>1599</p></td>
<td class="text-align:center"><p>11</p></td>
<td class="text-align:center"><p>[0.465, 0.535]</p></td>
<td class="text-center"><p>2</p></td>
<td class="text-center"><p>1599</p></td>
<td class="text-center"><p>11</p></td>
<td class="text-center"><p>[0.465, 0.535]</p></td>
<td><p>dense</p></td>
</tr>
<tr class="row-odd"><td><p>wine-q-white</p></td>
<td class="text-align:center"><p>2</p></td>
<td class="text-align:center"><p>4898</p></td>
<td class="text-align:center"><p>11</p></td>
<td class="text-align:center"><p>[0.335, 0.665]</p></td>
<td class="text-center"><p>2</p></td>
<td class="text-center"><p>4898</p></td>
<td class="text-center"><p>11</p></td>
<td class="text-center"><p>[0.335, 0.665]</p></td>
<td><p>dense</p></td>
</tr>
<tr class="row-even"><td><p>yeast</p></td>
<td class="text-align:center"><p>2</p></td>
<td class="text-align:center"><p>1484</p></td>
<td class="text-align:center"><p>8</p></td>
<td class="text-align:center"><p>[0.711, 0.289]</p></td>
<td class="text-center"><p>2</p></td>
<td class="text-center"><p>1484</p></td>
<td class="text-center"><p>8</p></td>
<td class="text-center"><p>[0.711, 0.289]</p></td>
<td><p>dense</p></td>
</tr>
</tbody>
</table>
<div class="section" id="issues">
<h3>Issues:<a class="headerlink" href="#issues" title="Permalink to this headline"></a></h3>
<section id="issues">
<h3>Issues:<a class="headerlink" href="#issues" title="Permalink to this heading"></a></h3>
<p>All datasets will be downloaded automatically the first time they are requested, and
stored in the <em>quapy_data</em> folder for faster further reuse.
However, some datasets require special actions that at the moment are not fully
@ -631,10 +615,82 @@ standard Pythons packages like gzip or zip. This file would need to be uncompres
OS-dependent software manually. Information on how to do it will be printed the first
time the dataset is invoked.</p></li>
</ul>
</section>
</section>
<section id="lequa-datasets">
<h2>LeQua Datasets<a class="headerlink" href="#lequa-datasets" title="Permalink to this heading"></a></h2>
<p>QuaPy also provides the datasets used for the LeQua competition.
In brief, there are 4 tasks (T1A, T1B, T2A, T2B) having to do with text quantification
problems. Tasks T1A and T1B provide documents in vector form, while T2A and T2B provide
raw documents instead.
Tasks T1A and T2A are binary sentiment quantification problems, while T2A and T2B
are multiclass quantification problems consisting of estimating the class prevalence
values of 28 different merchandise products.</p>
<p>Every task consists of a training set, a set of validation samples (for model selection)
and a set of test samples (for evaluation). QuaPy returns this data as a LabelledCollection
(training) and two generation protocols (for validation and test samples), as follows:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">training</span><span class="p">,</span> <span class="n">val_generator</span><span class="p">,</span> <span class="n">test_generator</span> <span class="o">=</span> <span class="n">fetch_lequa2022</span><span class="p">(</span><span class="n">task</span><span class="o">=</span><span class="n">task</span><span class="p">)</span>
</pre></div>
</div>
<p>See the <code class="docutils literal notranslate"><span class="pre">lequa2022_experiments.py</span></code> in the examples folder for further details on how to
carry out experiments using these datasets.</p>
<p>The datasets are downloaded only once, and stored for fast reuse.</p>
<p>Some statistics are summarized below:</p>
<table class="docutils align-default">
<thead>
<tr class="row-odd"><th class="head"><p>Dataset</p></th>
<th class="head text-center"><p>classes</p></th>
<th class="head text-center"><p>train size</p></th>
<th class="head text-center"><p>validation samples</p></th>
<th class="head text-center"><p>test samples</p></th>
<th class="head text-center"><p>docs by sample</p></th>
<th class="head text-center"><p>type</p></th>
</tr>
</thead>
<tbody>
<tr class="row-even"><td><p>T1A</p></td>
<td class="text-center"><p>2</p></td>
<td class="text-center"><p>5000</p></td>
<td class="text-center"><p>1000</p></td>
<td class="text-center"><p>5000</p></td>
<td class="text-center"><p>250</p></td>
<td class="text-center"><p>vector</p></td>
</tr>
<tr class="row-odd"><td><p>T1B</p></td>
<td class="text-center"><p>28</p></td>
<td class="text-center"><p>20000</p></td>
<td class="text-center"><p>1000</p></td>
<td class="text-center"><p>5000</p></td>
<td class="text-center"><p>1000</p></td>
<td class="text-center"><p>vector</p></td>
</tr>
<tr class="row-even"><td><p>T2A</p></td>
<td class="text-center"><p>2</p></td>
<td class="text-center"><p>5000</p></td>
<td class="text-center"><p>1000</p></td>
<td class="text-center"><p>5000</p></td>
<td class="text-center"><p>250</p></td>
<td class="text-center"><p>text</p></td>
</tr>
<tr class="row-odd"><td><p>T2B</p></td>
<td class="text-center"><p>28</p></td>
<td class="text-center"><p>20000</p></td>
<td class="text-center"><p>1000</p></td>
<td class="text-center"><p>5000</p></td>
<td class="text-center"><p>1000</p></td>
<td class="text-center"><p>text</p></td>
</tr>
</tbody>
</table>
<p>For further details on the datasets, we refer to the original
<a class="reference external" href="https://ceur-ws.org/Vol-3180/paper-146.pdf">paper</a>:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">Esuli</span><span class="p">,</span> <span class="n">A</span><span class="o">.</span><span class="p">,</span> <span class="n">Moreo</span><span class="p">,</span> <span class="n">A</span><span class="o">.</span><span class="p">,</span> <span class="n">Sebastiani</span><span class="p">,</span> <span class="n">F</span><span class="o">.</span><span class="p">,</span> <span class="o">&amp;</span> <span class="n">Sperduti</span><span class="p">,</span> <span class="n">G</span><span class="o">.</span> <span class="p">(</span><span class="mi">2022</span><span class="p">)</span><span class="o">.</span>
<span class="n">A</span> <span class="n">Detailed</span> <span class="n">Overview</span> <span class="n">of</span> <span class="n">LeQua</span><span class="o">@</span> <span class="n">CLEF</span> <span class="mi">2022</span><span class="p">:</span> <span class="n">Learning</span> <span class="n">to</span> <span class="n">Quantify</span><span class="o">.</span>
</pre></div>
</div>
<div class="section" id="adding-custom-datasets">
<h2>Adding Custom Datasets<a class="headerlink" href="#adding-custom-datasets" title="Permalink to this headline"></a></h2>
</section>
<section id="adding-custom-datasets">
<h2>Adding Custom Datasets<a class="headerlink" href="#adding-custom-datasets" title="Permalink to this heading"></a></h2>
<p>QuaPy provides data loaders for simple formats dealing with
text, following the format:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">class</span><span class="o">-</span><span class="nb">id</span> \<span class="n">t</span> <span class="n">first</span> <span class="n">document</span><span class="s1">&#39;s pre-processed text </span><span class="se">\n</span>
@ -664,17 +720,20 @@ all classes to be present in the collection).</p>
paths, in order to create a training and test pair of <em>LabelledCollection</em>,
e.g.:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">quapy</span> <span class="k">as</span> <span class="nn">qp</span>
<span class="n">train_path</span> <span class="o">=</span> <span class="s1">&#39;../my_data/train.dat&#39;</span>
<span class="n">test_path</span> <span class="o">=</span> <span class="s1">&#39;../my_data/test.dat&#39;</span>
<span class="k">def</span> <span class="nf">my_custom_loader</span><span class="p">(</span><span class="n">path</span><span class="p">):</span>
<span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="n">path</span><span class="p">,</span> <span class="s1">&#39;rb&#39;</span><span class="p">)</span> <span class="k">as</span> <span class="n">fin</span><span class="p">:</span>
<span class="o">...</span>
<span class="k">return</span> <span class="n">instances</span><span class="p">,</span> <span class="n">labels</span>
<span class="n">data</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">data</span><span class="o">.</span><span class="n">Dataset</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="n">train_path</span><span class="p">,</span> <span class="n">test_path</span><span class="p">,</span> <span class="n">my_custom_loader</span><span class="p">)</span>
</pre></div>
</div>
<div class="section" id="data-processing">
<h3>Data Processing<a class="headerlink" href="#data-processing" title="Permalink to this headline"></a></h3>
<section id="data-processing">
<h3>Data Processing<a class="headerlink" href="#data-processing" title="Permalink to this heading"></a></h3>
<p>QuaPy implements a number of preprocessing functions in the package <em>qp.data.preprocessing</em>, including:</p>
<ul class="simple">
<li><p><em>text2tfidf</em>: tfidf vectorization</p></li>
@ -683,9 +742,9 @@ e.g.:</p>
that the column values have zero mean and unit variance).</p></li>
<li><p><em>index</em>: transforms textual tokens into lists of numeric ids)</p></li>
</ul>
</div>
</div>
</div>
</section>
</section>
</section>
<div class="clearer"></div>
@ -694,8 +753,9 @@ that the column values have zero mean and unit variance).</p></li>
</div>
<div class="sphinxsidebar" role="navigation" aria-label="main navigation">
<div class="sphinxsidebarwrapper">
<h3><a href="index.html">Table of Contents</a></h3>
<ul>
<div>
<h3><a href="index.html">Table of Contents</a></h3>
<ul>
<li><a class="reference internal" href="#">Datasets</a><ul>
<li><a class="reference internal" href="#reviews-datasets">Reviews Datasets</a></li>
<li><a class="reference internal" href="#twitter-sentiment-datasets">Twitter Sentiment Datasets</a></li>
@ -703,6 +763,7 @@ that the column values have zero mean and unit variance).</p></li>
<li><a class="reference internal" href="#issues">Issues:</a></li>
</ul>
</li>
<li><a class="reference internal" href="#lequa-datasets">LeQua Datasets</a></li>
<li><a class="reference internal" href="#adding-custom-datasets">Adding Custom Datasets</a><ul>
<li><a class="reference internal" href="#data-processing">Data Processing</a></li>
</ul>
@ -711,12 +772,17 @@ that the column values have zero mean and unit variance).</p></li>
</li>
</ul>
<h4>Previous topic</h4>
<p class="topless"><a href="readme.html"
title="previous chapter">Getting Started</a></p>
<h4>Next topic</h4>
<p class="topless"><a href="modules.html"
title="next chapter">quapy</a></p>
</div>
<div>
<h4>Previous topic</h4>
<p class="topless"><a href="Installation.html"
title="previous chapter">Installation</a></p>
</div>
<div>
<h4>Next topic</h4>
<p class="topless"><a href="Evaluation.html"
title="next chapter">Evaluation</a></p>
</div>
<div role="note" aria-label="source link">
<h3>This Page</h3>
<ul class="this-page-menu">
@ -733,7 +799,7 @@ that the column values have zero mean and unit variance).</p></li>
</form>
</div>
</div>
<script>$('#searchbox').show(0);</script>
<script>document.getElementById('searchbox').style.display = "block"</script>
</div>
</div>
<div class="clearer"></div>
@ -748,18 +814,18 @@ that the column values have zero mean and unit variance).</p></li>
<a href="py-modindex.html" title="Python Module Index"
>modules</a> |</li>
<li class="right" >
<a href="modules.html" title="quapy"
<a href="Evaluation.html" title="Evaluation"
>next</a> |</li>
<li class="right" >
<a href="readme.html" title="Getting Started"
<a href="Installation.html" title="Installation"
>previous</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.6 documentation</a> &#187;</li>
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.7 documentation</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">Datasets</a></li>
</ul>
</div>
<div class="footer" role="contentinfo">
&#169; Copyright 2021, Alejandro Moreo.
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 4.2.0.
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 5.3.0.
</div>
</body>
</html>

View File

@ -2,22 +2,25 @@
<!doctype html>
<html>
<html lang="en">
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Evaluation &#8212; QuaPy 0.1.6 documentation</title>
<meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="generator" content="Docutils 0.19: https://docutils.sourceforge.io/" />
<title>Evaluation &#8212; QuaPy 0.1.7 documentation</title>
<link rel="stylesheet" type="text/css" href="_static/pygments.css" />
<link rel="stylesheet" type="text/css" href="_static/bizstyle.css" />
<script data-url_root="./" id="documentation_options" src="_static/documentation_options.js"></script>
<script src="_static/jquery.js"></script>
<script src="_static/underscore.js"></script>
<script src="_static/_sphinx_javascript_frameworks_compat.js"></script>
<script src="_static/doctools.js"></script>
<script src="_static/sphinx_highlight.js"></script>
<script src="_static/bizstyle.js"></script>
<link rel="index" title="Index" href="genindex.html" />
<link rel="search" title="Search" href="search.html" />
<link rel="next" title="Quantification Methods" href="Methods.html" />
<link rel="next" title="Protocols" href="Protocols.html" />
<link rel="prev" title="Datasets" href="Datasets.html" />
<meta name="viewport" content="width=device-width,initial-scale=1.0" />
<!--[if lt IE 9]>
@ -34,12 +37,12 @@
<a href="py-modindex.html" title="Python Module Index"
>modules</a> |</li>
<li class="right" >
<a href="Methods.html" title="Quantification Methods"
<a href="Protocols.html" title="Protocols"
accesskey="N">next</a> |</li>
<li class="right" >
<a href="Datasets.html" title="Datasets"
accesskey="P">previous</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.6 documentation</a> &#187;</li>
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.7 documentation</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">Evaluation</a></li>
</ul>
</div>
@ -49,8 +52,8 @@
<div class="bodywrapper">
<div class="body" role="main">
<div class="tex2jax_ignore mathjax_ignore section" id="evaluation">
<h1>Evaluation<a class="headerlink" href="#evaluation" title="Permalink to this headline"></a></h1>
<section id="evaluation">
<h1>Evaluation<a class="headerlink" href="#evaluation" title="Permalink to this heading"></a></h1>
<p>Quantification is an appealing tool in scenarios of dataset shift,
and particularly in scenarios of prior-probability shift.
That is, the interest in estimating the class prevalences arises
@ -62,8 +65,8 @@ to be unlikely (as is the case in general scenarios of
machine learning governed by the iid assumption).
In brief, quantification requires dedicated evaluation protocols,
which are implemented in QuaPy and explained here.</p>
<div class="section" id="error-measures">
<h2>Error Measures<a class="headerlink" href="#error-measures" title="Permalink to this headline"></a></h2>
<section id="error-measures">
<h2>Error Measures<a class="headerlink" href="#error-measures" title="Permalink to this heading"></a></h2>
<p>The module quapy.error implements the following error measures for quantification:</p>
<ul class="simple">
<li><p><em>mae</em>: mean absolute error</p></li>
@ -96,13 +99,13 @@ third argument, e.g.:</p>
Traditionally, this value is set to 1/(2T) in past literature,
with T the sampling size. One could either pass this value
to the function each time, or to set a QuaPys environment
variable <em>SAMPLE_SIZE</em> once, and ommit this argument
variable <em>SAMPLE_SIZE</em> once, and omit this argument
thereafter (recommended);
e.g.:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">qp</span><span class="o">.</span><span class="n">environ</span><span class="p">[</span><span class="s1">&#39;SAMPLE_SIZE&#39;</span><span class="p">]</span> <span class="o">=</span> <span class="mi">100</span> <span class="c1"># once for all</span>
<span class="n">true_prev</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">asarray</span><span class="p">([</span><span class="mf">0.5</span><span class="p">,</span> <span class="mf">0.3</span><span class="p">,</span> <span class="mf">0.2</span><span class="p">])</span> <span class="c1"># let&#39;s assume 3 classes</span>
<span class="n">estim_prev</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">asarray</span><span class="p">([</span><span class="mf">0.1</span><span class="p">,</span> <span class="mf">0.3</span><span class="p">,</span> <span class="mf">0.6</span><span class="p">])</span>
<span class="n">error</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">ae_</span><span class="o">.</span><span class="n">mrae</span><span class="p">(</span><span class="n">true_prev</span><span class="p">,</span> <span class="n">estim_prev</span><span class="p">)</span>
<span class="n">error</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">error</span><span class="o">.</span><span class="n">mrae</span><span class="p">(</span><span class="n">true_prev</span><span class="p">,</span> <span class="n">estim_prev</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s1">&#39;mrae(</span><span class="si">{</span><span class="n">true_prev</span><span class="si">}</span><span class="s1">, </span><span class="si">{</span><span class="n">estim_prev</span><span class="si">}</span><span class="s1">) = </span><span class="si">{</span><span class="n">error</span><span class="si">:</span><span class="s1">.3f</span><span class="si">}</span><span class="s1">&#39;</span><span class="p">)</span>
</pre></div>
</div>
@ -112,150 +115,95 @@ e.g.:</p>
</div>
<p>Finally, it is possible to instantiate QuaPys quantification
error functions from strings using, e.g.:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">error_function</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">ae_</span><span class="o">.</span><span class="n">from_name</span><span class="p">(</span><span class="s1">&#39;mse&#39;</span><span class="p">)</span>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">error_function</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">error</span><span class="o">.</span><span class="n">from_name</span><span class="p">(</span><span class="s1">&#39;mse&#39;</span><span class="p">)</span>
<span class="n">error</span> <span class="o">=</span> <span class="n">error_function</span><span class="p">(</span><span class="n">true_prev</span><span class="p">,</span> <span class="n">estim_prev</span><span class="p">)</span>
</pre></div>
</div>
</div>
<div class="section" id="evaluation-protocols">
<h2>Evaluation Protocols<a class="headerlink" href="#evaluation-protocols" title="Permalink to this headline"></a></h2>
<p>QuaPy implements the so-called “artificial sampling protocol”,
according to which a test set is used to generate samplings at
desired prevalences of fixed size and covering the full spectrum
of prevalences. This protocol is called “artificial” in contrast
to the “natural prevalence sampling” protocol that,
despite introducing some variability during sampling, approximately
preserves the training class prevalence.</p>
<p>In the artificial sampling procol, the user specifies the number
of (equally distant) points to be generated from the interval [0,1].</p>
<p>For example, if n_prevpoints=11 then, for each class, the prevalences
[0., 0.1, 0.2, …, 1.] will be used. This means that, for two classes,
the number of different prevalences will be 11 (since, once the prevalence
of one class is determined, the other one is constrained). For 3 classes,
the number of valid combinations can be obtained as 11 + 10 + … + 1 = 66.
In general, the number of valid combinations that will be produced for a given
value of n_prevpoints can be consulted by invoking
quapy.functional.num_prevalence_combinations, e.g.:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">quapy.functional</span> <span class="k">as</span> <span class="nn">F</span>
<span class="n">n_prevpoints</span> <span class="o">=</span> <span class="mi">21</span>
<span class="n">n_classes</span> <span class="o">=</span> <span class="mi">4</span>
<span class="n">n</span> <span class="o">=</span> <span class="n">F</span><span class="o">.</span><span class="n">num_prevalence_combinations</span><span class="p">(</span><span class="n">n_prevpoints</span><span class="p">,</span> <span class="n">n_classes</span><span class="p">,</span> <span class="n">n_repeats</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
</section>
<section id="evaluation-protocols">
<h2>Evaluation Protocols<a class="headerlink" href="#evaluation-protocols" title="Permalink to this heading"></a></h2>
<p>An <em>evaluation protocol</em> is an evaluation procedure that uses
one specific <em>sample generation procotol</em> to genereate many
samples, typically characterized by widely varying amounts of
<em>shift</em> with respect to the original distribution, that are then
used to evaluate the performance of a (trained) quantifier.
These protocols are explained in more detail in a dedicated <a class="reference internal" href="Protocols.html"><span class="doc std std-doc">entry
in the wiki</span></a>. For the moment being, let us assume we already have
chosen and instantiated one specific such protocol, that we here
simply call <em>prot</em>. Let also assume our model is called
<em>quantifier</em> and that our evaluatio measure of choice is
<em>mae</em>. The evaluation comes down to:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">mae</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">evaluation</span><span class="o">.</span><span class="n">evaluate</span><span class="p">(</span><span class="n">quantifier</span><span class="p">,</span> <span class="n">protocol</span><span class="o">=</span><span class="n">prot</span><span class="p">,</span> <span class="n">error_metric</span><span class="o">=</span><span class="s1">&#39;mae&#39;</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s1">&#39;MAE = </span><span class="si">{</span><span class="n">mae</span><span class="si">:</span><span class="s1">.4f</span><span class="si">}</span><span class="s1">&#39;</span><span class="p">)</span>
</pre></div>
</div>
<p>in this example, n=1771. Note the last argument, n_repeats, that
informs of the number of examples that will be generated for any
valid combination (typical values are, e.g., 1 for a single sample,
or 10 or higher for computing standard deviations of performing statistical
significance tests).</p>
<p>One can instead work the other way around, i.e., one could set a
maximum budged of evaluations and get the number of prevalence points that
will generate a number of evaluations close, but not higher, than
the fixed budget. This can be achieved with the function
quapy.functional.get_nprevpoints_approximation, e.g.:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">budget</span> <span class="o">=</span> <span class="mi">5000</span>
<span class="n">n_prevpoints</span> <span class="o">=</span> <span class="n">F</span><span class="o">.</span><span class="n">get_nprevpoints_approximation</span><span class="p">(</span><span class="n">budget</span><span class="p">,</span> <span class="n">n_classes</span><span class="p">,</span> <span class="n">n_repeats</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="n">n</span> <span class="o">=</span> <span class="n">F</span><span class="o">.</span><span class="n">num_prevalence_combinations</span><span class="p">(</span><span class="n">n_prevpoints</span><span class="p">,</span> <span class="n">n_classes</span><span class="p">,</span> <span class="n">n_repeats</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s1">&#39;by setting n_prevpoints=</span><span class="si">{</span><span class="n">n_prevpoints</span><span class="si">}</span><span class="s1"> the number of evaluations for </span><span class="si">{</span><span class="n">n_classes</span><span class="si">}</span><span class="s1"> classes will be </span><span class="si">{</span><span class="n">n</span><span class="si">}</span><span class="s1">&#39;</span><span class="p">)</span>
<p>It is often desirable to evaluate our system using more than one
single evaluatio measure. In this case, it is convenient to generate
a <em>report</em>. A report in QuaPy is a dataframe accounting for all the
true prevalence values with their corresponding prevalence values
as estimated by the quantifier, along with the error each has given
rise.</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">report</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">evaluation</span><span class="o">.</span><span class="n">evaluation_report</span><span class="p">(</span><span class="n">quantifier</span><span class="p">,</span> <span class="n">protocol</span><span class="o">=</span><span class="n">prot</span><span class="p">,</span> <span class="n">error_metrics</span><span class="o">=</span><span class="p">[</span><span class="s1">&#39;mae&#39;</span><span class="p">,</span> <span class="s1">&#39;mrae&#39;</span><span class="p">,</span> <span class="s1">&#39;mkld&#39;</span><span class="p">])</span>
</pre></div>
</div>
<p>that will print:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">by</span> <span class="n">setting</span> <span class="n">n_prevpoints</span><span class="o">=</span><span class="mi">30</span> <span class="n">the</span> <span class="n">number</span> <span class="n">of</span> <span class="n">evaluations</span> <span class="k">for</span> <span class="mi">4</span> <span class="n">classes</span> <span class="n">will</span> <span class="n">be</span> <span class="mi">4960</span>
</pre></div>
</div>
<p>The cost of evaluation will depend on the values of <em>n_prevpoints</em>, <em>n_classes</em>,
and <em>n_repeats</em>. Since it might sometimes be cumbersome to control the overall
cost of an experiment having to do with the number of combinations that
will be generated for a particular setting of these arguments (particularly
when <em>n_classes&gt;2</em>), evaluation functions
typically allow the user to rather specify an <em>evaluation budget</em>, i.e., a maximum
number of samplings to generate. By specifying this argument, one could avoid
specifying <em>n_prevpoints</em>, and the value for it that would lead to a closer
number of evaluation budget, without surpassing it, will be automatically set.</p>
<p>The following script shows a full example in which a PACC model relying
on a Logistic Regressor classifier is
tested on the <em>kindle</em> dataset by means of the artificial prevalence
sampling protocol on samples of size 500, in terms of various
evaluation metrics.</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">quapy</span> <span class="k">as</span> <span class="nn">qp</span>
<span class="kn">import</span> <span class="nn">quapy.functional</span> <span class="k">as</span> <span class="nn">F</span>
<span class="kn">from</span> <span class="nn">sklearn.linear_model</span> <span class="kn">import</span> <span class="n">LogisticRegression</span>
<p>From a pandas dataframe, it is straightforward to visualize all the results,
and compute the averaged values, e.g.:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">pd</span><span class="o">.</span><span class="n">set_option</span><span class="p">(</span><span class="s1">&#39;display.expand_frame_repr&#39;</span><span class="p">,</span> <span class="kc">False</span><span class="p">)</span>
<span class="n">report</span><span class="p">[</span><span class="s1">&#39;estim-prev&#39;</span><span class="p">]</span> <span class="o">=</span> <span class="n">report</span><span class="p">[</span><span class="s1">&#39;estim-prev&#39;</span><span class="p">]</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="n">F</span><span class="o">.</span><span class="n">strprev</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="n">report</span><span class="p">)</span>
<span class="n">qp</span><span class="o">.</span><span class="n">environ</span><span class="p">[</span><span class="s1">&#39;SAMPLE_SIZE&#39;</span><span class="p">]</span> <span class="o">=</span> <span class="mi">500</span>
<span class="n">dataset</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">datasets</span><span class="o">.</span><span class="n">fetch_reviews</span><span class="p">(</span><span class="s1">&#39;kindle&#39;</span><span class="p">)</span>
<span class="n">qp</span><span class="o">.</span><span class="n">data</span><span class="o">.</span><span class="n">preprocessing</span><span class="o">.</span><span class="n">text2tfidf</span><span class="p">(</span><span class="n">dataset</span><span class="p">,</span> <span class="n">min_df</span><span class="o">=</span><span class="mi">5</span><span class="p">,</span> <span class="n">inplace</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
<span class="n">training</span> <span class="o">=</span> <span class="n">dataset</span><span class="o">.</span><span class="n">training</span>
<span class="n">test</span> <span class="o">=</span> <span class="n">dataset</span><span class="o">.</span><span class="n">test</span>
<span class="n">lr</span> <span class="o">=</span> <span class="n">LogisticRegression</span><span class="p">()</span>
<span class="n">pacc</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">method</span><span class="o">.</span><span class="n">aggregative</span><span class="o">.</span><span class="n">PACC</span><span class="p">(</span><span class="n">lr</span><span class="p">)</span>
<span class="n">pacc</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">training</span><span class="p">)</span>
<span class="n">df</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">evaluation</span><span class="o">.</span><span class="n">artificial_sampling_report</span><span class="p">(</span>
<span class="n">pacc</span><span class="p">,</span> <span class="c1"># the quantification method</span>
<span class="n">test</span><span class="p">,</span> <span class="c1"># the test set on which the method will be evaluated</span>
<span class="n">sample_size</span><span class="o">=</span><span class="n">qp</span><span class="o">.</span><span class="n">environ</span><span class="p">[</span><span class="s1">&#39;SAMPLE_SIZE&#39;</span><span class="p">],</span> <span class="c1">#indicates the size of samples to be drawn</span>
<span class="n">n_prevpoints</span><span class="o">=</span><span class="mi">11</span><span class="p">,</span> <span class="c1"># how many prevalence points will be extracted from the interval [0, 1] for each category</span>
<span class="n">n_repetitions</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="c1"># number of times each prevalence will be used to generate a test sample</span>
<span class="n">n_jobs</span><span class="o">=-</span><span class="mi">1</span><span class="p">,</span> <span class="c1"># indicates the number of parallel workers (-1 indicates, as in sklearn, all CPUs)</span>
<span class="n">random_seed</span><span class="o">=</span><span class="mi">42</span><span class="p">,</span> <span class="c1"># setting a random seed allows to replicate the test samples across runs</span>
<span class="n">error_metrics</span><span class="o">=</span><span class="p">[</span><span class="s1">&#39;mae&#39;</span><span class="p">,</span> <span class="s1">&#39;mrae&#39;</span><span class="p">,</span> <span class="s1">&#39;mkld&#39;</span><span class="p">],</span> <span class="c1"># specify the evaluation metrics</span>
<span class="n">verbose</span><span class="o">=</span><span class="kc">True</span> <span class="c1"># set to True to show some standard-line outputs</span>
<span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="s1">&#39;Averaged values:&#39;</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="n">report</span><span class="o">.</span><span class="n">mean</span><span class="p">())</span>
</pre></div>
</div>
<p>The resulting report is a pandas dataframe that can be directly printed.
Here, we set some display options from pandas just to make the output clearer;
note also that the estimated prevalences are shown as strings using the
function strprev function that simply converts a prevalence into a
string representing it, with a fixed decimal precision (default 3):</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">pandas</span> <span class="k">as</span> <span class="nn">pd</span>
<span class="n">pd</span><span class="o">.</span><span class="n">set_option</span><span class="p">(</span><span class="s1">&#39;display.expand_frame_repr&#39;</span><span class="p">,</span> <span class="kc">False</span><span class="p">)</span>
<span class="n">pd</span><span class="o">.</span><span class="n">set_option</span><span class="p">(</span><span class="s2">&quot;precision&quot;</span><span class="p">,</span> <span class="mi">3</span><span class="p">)</span>
<span class="n">df</span><span class="p">[</span><span class="s1">&#39;estim-prev&#39;</span><span class="p">]</span> <span class="o">=</span> <span class="n">df</span><span class="p">[</span><span class="s1">&#39;estim-prev&#39;</span><span class="p">]</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="n">F</span><span class="o">.</span><span class="n">strprev</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="n">df</span><span class="p">)</span>
</pre></div>
</div>
<p>The output should look like:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span> <span class="n">true</span><span class="o">-</span><span class="n">prev</span> <span class="n">estim</span><span class="o">-</span><span class="n">prev</span> <span class="n">mae</span> <span class="n">mrae</span> <span class="n">mkld</span>
<span class="mi">0</span> <span class="p">[</span><span class="mf">0.0</span><span class="p">,</span> <span class="mf">1.0</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.000</span><span class="p">,</span> <span class="mf">1.000</span><span class="p">]</span> <span class="mf">0.000</span> <span class="mf">0.000</span> <span class="mf">0.000e+00</span>
<span class="mi">1</span> <span class="p">[</span><span class="mf">0.1</span><span class="p">,</span> <span class="mf">0.9</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.091</span><span class="p">,</span> <span class="mf">0.909</span><span class="p">]</span> <span class="mf">0.009</span> <span class="mf">0.048</span> <span class="mf">4.426e-04</span>
<span class="mi">2</span> <span class="p">[</span><span class="mf">0.2</span><span class="p">,</span> <span class="mf">0.8</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.163</span><span class="p">,</span> <span class="mf">0.837</span><span class="p">]</span> <span class="mf">0.037</span> <span class="mf">0.114</span> <span class="mf">4.633e-03</span>
<span class="mi">3</span> <span class="p">[</span><span class="mf">0.3</span><span class="p">,</span> <span class="mf">0.7</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.283</span><span class="p">,</span> <span class="mf">0.717</span><span class="p">]</span> <span class="mf">0.017</span> <span class="mf">0.041</span> <span class="mf">7.383e-04</span>
<span class="mi">4</span> <span class="p">[</span><span class="mf">0.4</span><span class="p">,</span> <span class="mf">0.6</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.366</span><span class="p">,</span> <span class="mf">0.634</span><span class="p">]</span> <span class="mf">0.034</span> <span class="mf">0.070</span> <span class="mf">2.412e-03</span>
<span class="mi">5</span> <span class="p">[</span><span class="mf">0.5</span><span class="p">,</span> <span class="mf">0.5</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.459</span><span class="p">,</span> <span class="mf">0.541</span><span class="p">]</span> <span class="mf">0.041</span> <span class="mf">0.082</span> <span class="mf">3.387e-03</span>
<span class="mi">6</span> <span class="p">[</span><span class="mf">0.6</span><span class="p">,</span> <span class="mf">0.4</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.565</span><span class="p">,</span> <span class="mf">0.435</span><span class="p">]</span> <span class="mf">0.035</span> <span class="mf">0.073</span> <span class="mf">2.535e-03</span>
<span class="mi">7</span> <span class="p">[</span><span class="mf">0.7</span><span class="p">,</span> <span class="mf">0.3</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.654</span><span class="p">,</span> <span class="mf">0.346</span><span class="p">]</span> <span class="mf">0.046</span> <span class="mf">0.108</span> <span class="mf">4.701e-03</span>
<span class="mi">8</span> <span class="p">[</span><span class="mf">0.8</span><span class="p">,</span> <span class="mf">0.2</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.725</span><span class="p">,</span> <span class="mf">0.275</span><span class="p">]</span> <span class="mf">0.075</span> <span class="mf">0.235</span> <span class="mf">1.515e-02</span>
<span class="mi">9</span> <span class="p">[</span><span class="mf">0.9</span><span class="p">,</span> <span class="mf">0.1</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.858</span><span class="p">,</span> <span class="mf">0.142</span><span class="p">]</span> <span class="mf">0.042</span> <span class="mf">0.229</span> <span class="mf">7.740e-03</span>
<span class="mi">10</span> <span class="p">[</span><span class="mf">1.0</span><span class="p">,</span> <span class="mf">0.0</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.945</span><span class="p">,</span> <span class="mf">0.055</span><span class="p">]</span> <span class="mf">0.055</span> <span class="mf">27.357</span> <span class="mf">5.219e-02</span>
</pre></div>
</div>
<p>One can get the averaged scores using standard pandas
functions, i.e.:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="nb">print</span><span class="p">(</span><span class="n">df</span><span class="o">.</span><span class="n">mean</span><span class="p">())</span>
</pre></div>
</div>
<p>will produce the following output:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">true</span><span class="o">-</span><span class="n">prev</span> <span class="mf">0.500</span>
<span class="n">mae</span> <span class="mf">0.035</span>
<span class="n">mrae</span> <span class="mf">2.578</span>
<span class="n">mkld</span> <span class="mf">0.009</span>
<p>This will produce an output like:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span> <span class="n">true</span><span class="o">-</span><span class="n">prev</span> <span class="n">estim</span><span class="o">-</span><span class="n">prev</span> <span class="n">mae</span> <span class="n">mrae</span> <span class="n">mkld</span>
<span class="mi">0</span> <span class="p">[</span><span class="mf">0.308</span><span class="p">,</span> <span class="mf">0.692</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.314</span><span class="p">,</span> <span class="mf">0.686</span><span class="p">]</span> <span class="mf">0.005649</span> <span class="mf">0.013182</span> <span class="mf">0.000074</span>
<span class="mi">1</span> <span class="p">[</span><span class="mf">0.896</span><span class="p">,</span> <span class="mf">0.104</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.909</span><span class="p">,</span> <span class="mf">0.091</span><span class="p">]</span> <span class="mf">0.013145</span> <span class="mf">0.069323</span> <span class="mf">0.000985</span>
<span class="mi">2</span> <span class="p">[</span><span class="mf">0.848</span><span class="p">,</span> <span class="mf">0.152</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.809</span><span class="p">,</span> <span class="mf">0.191</span><span class="p">]</span> <span class="mf">0.039063</span> <span class="mf">0.149806</span> <span class="mf">0.005175</span>
<span class="mi">3</span> <span class="p">[</span><span class="mf">0.016</span><span class="p">,</span> <span class="mf">0.984</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.033</span><span class="p">,</span> <span class="mf">0.967</span><span class="p">]</span> <span class="mf">0.017236</span> <span class="mf">0.487529</span> <span class="mf">0.005298</span>
<span class="mi">4</span> <span class="p">[</span><span class="mf">0.728</span><span class="p">,</span> <span class="mf">0.272</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.751</span><span class="p">,</span> <span class="mf">0.249</span><span class="p">]</span> <span class="mf">0.022769</span> <span class="mf">0.057146</span> <span class="mf">0.001350</span>
<span class="o">...</span> <span class="o">...</span> <span class="o">...</span> <span class="o">...</span> <span class="o">...</span> <span class="o">...</span>
<span class="mi">4995</span> <span class="p">[</span><span class="mf">0.72</span><span class="p">,</span> <span class="mf">0.28</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.698</span><span class="p">,</span> <span class="mf">0.302</span><span class="p">]</span> <span class="mf">0.021752</span> <span class="mf">0.053631</span> <span class="mf">0.001133</span>
<span class="mi">4996</span> <span class="p">[</span><span class="mf">0.868</span><span class="p">,</span> <span class="mf">0.132</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.888</span><span class="p">,</span> <span class="mf">0.112</span><span class="p">]</span> <span class="mf">0.020490</span> <span class="mf">0.088230</span> <span class="mf">0.001985</span>
<span class="mi">4997</span> <span class="p">[</span><span class="mf">0.292</span><span class="p">,</span> <span class="mf">0.708</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.298</span><span class="p">,</span> <span class="mf">0.702</span><span class="p">]</span> <span class="mf">0.006149</span> <span class="mf">0.014788</span> <span class="mf">0.000090</span>
<span class="mi">4998</span> <span class="p">[</span><span class="mf">0.24</span><span class="p">,</span> <span class="mf">0.76</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.220</span><span class="p">,</span> <span class="mf">0.780</span><span class="p">]</span> <span class="mf">0.019950</span> <span class="mf">0.054309</span> <span class="mf">0.001127</span>
<span class="mi">4999</span> <span class="p">[</span><span class="mf">0.948</span><span class="p">,</span> <span class="mf">0.052</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.965</span><span class="p">,</span> <span class="mf">0.035</span><span class="p">]</span> <span class="mf">0.016941</span> <span class="mf">0.165776</span> <span class="mf">0.003538</span>
<span class="p">[</span><span class="mi">5000</span> <span class="n">rows</span> <span class="n">x</span> <span class="mi">5</span> <span class="n">columns</span><span class="p">]</span>
<span class="n">Averaged</span> <span class="n">values</span><span class="p">:</span>
<span class="n">mae</span> <span class="mf">0.023588</span>
<span class="n">mrae</span> <span class="mf">0.108779</span>
<span class="n">mkld</span> <span class="mf">0.003631</span>
<span class="n">dtype</span><span class="p">:</span> <span class="n">float64</span>
<span class="n">Process</span> <span class="n">finished</span> <span class="k">with</span> <span class="n">exit</span> <span class="n">code</span> <span class="mi">0</span>
</pre></div>
</div>
<p>Other evaluation functions include:</p>
<ul class="simple">
<li><p><em>artificial_sampling_eval</em>: that computes the evaluation for a
given evaluation metric, returning the average instead of a dataframe.</p></li>
<li><p><em>artificial_sampling_prediction</em>: that returns two np.arrays containing the
true prevalences and the estimated prevalences.</p></li>
</ul>
<p>See the documentation for further details.</p>
</div>
<p>Alternatively, we can simply generate all the predictions by:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">true_prevs</span><span class="p">,</span> <span class="n">estim_prevs</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">evaluation</span><span class="o">.</span><span class="n">prediction</span><span class="p">(</span><span class="n">quantifier</span><span class="p">,</span> <span class="n">protocol</span><span class="o">=</span><span class="n">prot</span><span class="p">)</span>
</pre></div>
</div>
<p>All the evaluation functions implement specific optimizations for speeding-up
the evaluation of aggregative quantifiers (i.e., of instances of <em>AggregativeQuantifier</em>).
The optimization comes down to generating classification predictions (either crisp or soft)
only once for the entire test set, and then applying the sampling procedure to the
predictions, instead of generating samples of instances and then computing the
classification predictions every time. This is only possible when the protocol
is an instance of <em>OnLabelledCollectionProtocol</em>. The optimization is only
carried out when the number of classification predictions thus generated would be
smaller than the number of predictions required for the entire protocol; e.g.,
if the original dataset contains 1M instances, but the protocol is such that it would
at most generate 20 samples of 100 instances, then it would be preferable to postpone the
classification for each sample. This behaviour is indicated by setting
<em>aggr_speedup=”auto”</em>. Conversely, when indicating <em>aggr_speedup=”force”</em> QuaPy will
precompute all the predictions irrespectively of the number of instances and number of samples.
Finally, this can be deactivated by setting <em>aggr_speedup=False</em>. Note that this optimization
is not only applied for the final evaluation, but also for the internal evaluations carried
out during <em>model selection</em>. Since these are typically many, the heuristic can help reduce the
execution time a lot.</p>
</section>
</section>
<div class="clearer"></div>
@ -264,8 +212,9 @@ true prevalences and the estimated prevalences.</p></li>
</div>
<div class="sphinxsidebar" role="navigation" aria-label="main navigation">
<div class="sphinxsidebarwrapper">
<h3><a href="index.html">Table of Contents</a></h3>
<ul>
<div>
<h3><a href="index.html">Table of Contents</a></h3>
<ul>
<li><a class="reference internal" href="#">Evaluation</a><ul>
<li><a class="reference internal" href="#error-measures">Error Measures</a></li>
<li><a class="reference internal" href="#evaluation-protocols">Evaluation Protocols</a></li>
@ -273,12 +222,17 @@ true prevalences and the estimated prevalences.</p></li>
</li>
</ul>
<h4>Previous topic</h4>
<p class="topless"><a href="Datasets.html"
title="previous chapter">Datasets</a></p>
<h4>Next topic</h4>
<p class="topless"><a href="Methods.html"
title="next chapter">Quantification Methods</a></p>
</div>
<div>
<h4>Previous topic</h4>
<p class="topless"><a href="Datasets.html"
title="previous chapter">Datasets</a></p>
</div>
<div>
<h4>Next topic</h4>
<p class="topless"><a href="Protocols.html"
title="next chapter">Protocols</a></p>
</div>
<div role="note" aria-label="source link">
<h3>This Page</h3>
<ul class="this-page-menu">
@ -295,7 +249,7 @@ true prevalences and the estimated prevalences.</p></li>
</form>
</div>
</div>
<script>$('#searchbox').show(0);</script>
<script>document.getElementById('searchbox').style.display = "block"</script>
</div>
</div>
<div class="clearer"></div>
@ -310,18 +264,18 @@ true prevalences and the estimated prevalences.</p></li>
<a href="py-modindex.html" title="Python Module Index"
>modules</a> |</li>
<li class="right" >
<a href="Methods.html" title="Quantification Methods"
<a href="Protocols.html" title="Protocols"
>next</a> |</li>
<li class="right" >
<a href="Datasets.html" title="Datasets"
>previous</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.6 documentation</a> &#187;</li>
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.7 documentation</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">Evaluation</a></li>
</ul>
</div>
<div class="footer" role="contentinfo">
&#169; Copyright 2021, Alejandro Moreo.
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 4.2.0.
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 5.3.0.
</div>
</body>
</html>

View File

@ -2,18 +2,21 @@
<!doctype html>
<html>
<html lang="en">
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Installation &#8212; QuaPy 0.1.6 documentation</title>
<meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="generator" content="Docutils 0.19: https://docutils.sourceforge.io/" />
<title>Installation &#8212; QuaPy 0.1.7 documentation</title>
<link rel="stylesheet" type="text/css" href="_static/pygments.css" />
<link rel="stylesheet" type="text/css" href="_static/bizstyle.css" />
<script data-url_root="./" id="documentation_options" src="_static/documentation_options.js"></script>
<script src="_static/jquery.js"></script>
<script src="_static/underscore.js"></script>
<script src="_static/_sphinx_javascript_frameworks_compat.js"></script>
<script src="_static/doctools.js"></script>
<script src="_static/sphinx_highlight.js"></script>
<script src="_static/bizstyle.js"></script>
<link rel="index" title="Index" href="genindex.html" />
<link rel="search" title="Search" href="search.html" />
@ -39,7 +42,7 @@
<li class="right" >
<a href="index.html" title="Welcome to QuaPys documentation!"
accesskey="P">previous</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.6 documentation</a> &#187;</li>
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.7 documentation</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">Installation</a></li>
</ul>
</div>
@ -49,15 +52,15 @@
<div class="bodywrapper">
<div class="body" role="main">
<div class="section" id="installation">
<h1>Installation<a class="headerlink" href="#installation" title="Permalink to this headline"></a></h1>
<section id="installation">
<h1>Installation<a class="headerlink" href="#installation" title="Permalink to this heading"></a></h1>
<p>QuaPy can be easily installed via <cite>pip</cite></p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">pip</span> <span class="n">install</span> <span class="n">quapy</span>
</pre></div>
</div>
<p>See <a class="reference external" href="https://pypi.org/project/QuaPy/">pip page</a> for older versions.</p>
<div class="section" id="requirements">
<h2>Requirements<a class="headerlink" href="#requirements" title="Permalink to this headline"></a></h2>
<section id="requirements">
<h2>Requirements<a class="headerlink" href="#requirements" title="Permalink to this heading"></a></h2>
<ul class="simple">
<li><p>scikit-learn, numpy, scipy</p></li>
<li><p>pytorch (for QuaNet)</p></li>
@ -67,9 +70,9 @@
<li><p>pandas, xlrd</p></li>
<li><p>matplotlib</p></li>
</ul>
</div>
<div class="section" id="svm-perf-with-quantification-oriented-losses">
<h2>SVM-perf with quantification-oriented losses<a class="headerlink" href="#svm-perf-with-quantification-oriented-losses" title="Permalink to this headline"></a></h2>
</section>
<section id="svm-perf-with-quantification-oriented-losses">
<h2>SVM-perf with quantification-oriented losses<a class="headerlink" href="#svm-perf-with-quantification-oriented-losses" title="Permalink to this heading"></a></h2>
<p>In order to run experiments involving SVM(Q), SVM(KLD), SVM(NKLD),
SVM(AE), or SVM(RAE), you have to first download the
<a class="reference external" href="http://www.cs.cornell.edu/people/tj/svm_light/svm_perf.html">svmperf</a>
@ -96,8 +99,8 @@ and for the <cite>KLD</cite> and <cite>NKLD</cite> as proposed by
for quantification.
This patch extends the former by also allowing SVMperf to optimize for
<cite>AE</cite> and <cite>RAE</cite>.</p>
</div>
</div>
</section>
</section>
<div class="clearer"></div>
@ -106,8 +109,9 @@ This patch extends the former by also allowing SVMperf to optimize for
</div>
<div class="sphinxsidebar" role="navigation" aria-label="main navigation">
<div class="sphinxsidebarwrapper">
<h3><a href="index.html">Table of Contents</a></h3>
<ul>
<div>
<h3><a href="index.html">Table of Contents</a></h3>
<ul>
<li><a class="reference internal" href="#">Installation</a><ul>
<li><a class="reference internal" href="#requirements">Requirements</a></li>
<li><a class="reference internal" href="#svm-perf-with-quantification-oriented-losses">SVM-perf with quantification-oriented losses</a></li>
@ -115,12 +119,17 @@ This patch extends the former by also allowing SVMperf to optimize for
</li>
</ul>
<h4>Previous topic</h4>
<p class="topless"><a href="index.html"
title="previous chapter">Welcome to QuaPys documentation!</a></p>
<h4>Next topic</h4>
<p class="topless"><a href="Datasets.html"
title="next chapter">Datasets</a></p>
</div>
<div>
<h4>Previous topic</h4>
<p class="topless"><a href="index.html"
title="previous chapter">Welcome to QuaPys documentation!</a></p>
</div>
<div>
<h4>Next topic</h4>
<p class="topless"><a href="Datasets.html"
title="next chapter">Datasets</a></p>
</div>
<div role="note" aria-label="source link">
<h3>This Page</h3>
<ul class="this-page-menu">
@ -137,7 +146,7 @@ This patch extends the former by also allowing SVMperf to optimize for
</form>
</div>
</div>
<script>$('#searchbox').show(0);</script>
<script>document.getElementById('searchbox').style.display = "block"</script>
</div>
</div>
<div class="clearer"></div>
@ -157,13 +166,13 @@ This patch extends the former by also allowing SVMperf to optimize for
<li class="right" >
<a href="index.html" title="Welcome to QuaPys documentation!"
>previous</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.6 documentation</a> &#187;</li>
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.7 documentation</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">Installation</a></li>
</ul>
</div>
<div class="footer" role="contentinfo">
&#169; Copyright 2021, Alejandro Moreo.
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 4.2.0.
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 5.3.0.
</div>
</body>
</html>

View File

@ -2,23 +2,26 @@
<!doctype html>
<html>
<html lang="en">
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Quantification Methods &#8212; QuaPy 0.1.6 documentation</title>
<meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="generator" content="Docutils 0.19: https://docutils.sourceforge.io/" />
<title>Quantification Methods &#8212; QuaPy 0.1.7 documentation</title>
<link rel="stylesheet" type="text/css" href="_static/pygments.css" />
<link rel="stylesheet" type="text/css" href="_static/bizstyle.css" />
<script data-url_root="./" id="documentation_options" src="_static/documentation_options.js"></script>
<script src="_static/jquery.js"></script>
<script src="_static/underscore.js"></script>
<script src="_static/_sphinx_javascript_frameworks_compat.js"></script>
<script src="_static/doctools.js"></script>
<script src="_static/sphinx_highlight.js"></script>
<script src="_static/bizstyle.js"></script>
<link rel="index" title="Index" href="genindex.html" />
<link rel="search" title="Search" href="search.html" />
<link rel="next" title="Plotting" href="Plotting.html" />
<link rel="prev" title="Evaluation" href="Evaluation.html" />
<link rel="next" title="Model Selection" href="Model-Selection.html" />
<link rel="prev" title="Protocols" href="Protocols.html" />
<meta name="viewport" content="width=device-width,initial-scale=1.0" />
<!--[if lt IE 9]>
<script src="_static/css3-mediaqueries.js"></script>
@ -34,12 +37,12 @@
<a href="py-modindex.html" title="Python Module Index"
>modules</a> |</li>
<li class="right" >
<a href="Plotting.html" title="Plotting"
<a href="Model-Selection.html" title="Model Selection"
accesskey="N">next</a> |</li>
<li class="right" >
<a href="Evaluation.html" title="Evaluation"
<a href="Protocols.html" title="Protocols"
accesskey="P">previous</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.6 documentation</a> &#187;</li>
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.7 documentation</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">Quantification Methods</a></li>
</ul>
</div>
@ -49,8 +52,8 @@
<div class="bodywrapper">
<div class="body" role="main">
<div class="tex2jax_ignore mathjax_ignore section" id="quantification-methods">
<h1>Quantification Methods<a class="headerlink" href="#quantification-methods" title="Permalink to this headline"></a></h1>
<section id="quantification-methods">
<h1>Quantification Methods<a class="headerlink" href="#quantification-methods" title="Permalink to this heading"></a></h1>
<p>Quantification methods can be categorized as belonging to
<em>aggregative</em> and <em>non-aggregative</em> groups.
Most methods included in QuaPy at the moment are of type <em>aggregative</em>
@ -65,12 +68,6 @@ and implement some abstract methods:</p>
<span class="nd">@abstractmethod</span>
<span class="k">def</span> <span class="nf">quantify</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">instances</span><span class="p">):</span> <span class="o">...</span>
<span class="nd">@abstractmethod</span>
<span class="k">def</span> <span class="nf">set_params</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="o">**</span><span class="n">parameters</span><span class="p">):</span> <span class="o">...</span>
<span class="nd">@abstractmethod</span>
<span class="k">def</span> <span class="nf">get_params</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">deep</span><span class="o">=</span><span class="kc">True</span><span class="p">):</span> <span class="o">...</span>
</pre></div>
</div>
<p>The meaning of those functions should be familiar to those
@ -82,12 +79,12 @@ scikit-learn structure has not been adopted <em>as is</em> in QuaPy responds
the fact that scikit-learns <em>predict</em> function is expected to return
one output for each input element e.g., a predicted label for each
instance in a sample while in quantification the output for a sample
is one single array of class prevalences), while functions <em>set_params</em>
and <em>get_params</em> allow a
<a class="reference external" href="https://github.com/HLT-ISTI/QuaPy/wiki/Model-Selection">model selector</a>
to automate the process of hyperparameter search.</p>
<div class="section" id="aggregative-methods">
<h2>Aggregative Methods<a class="headerlink" href="#aggregative-methods" title="Permalink to this headline"></a></h2>
is one single array of class prevalences).
Quantifiers also extend from scikit-learns <code class="docutils literal notranslate"><span class="pre">BaseEstimator</span></code>, in order
to simplify the use of <em>set_params</em> and <em>get_params</em> used in
<a class="reference external" href="https://github.com/HLT-ISTI/QuaPy/wiki/Model-Selection">model selector</a>.</p>
<section id="aggregative-methods">
<h2>Aggregative Methods<a class="headerlink" href="#aggregative-methods" title="Permalink to this heading"></a></h2>
<p>All quantification methods are implemented as part of the
<em>qp.method</em> package. In particular, <em>aggregative</em> methods are defined in
<em>qp.method.aggregative</em>, and extend <em>AggregativeQuantifier(BaseQuantifier)</em>.
@ -103,12 +100,12 @@ The methods that any <em>aggregative</em> quantifier must implement are:</p>
individual predictions of a classifier. Indeed, a default implementation
of <em>BaseQuantifier.quantify</em> is already provided, which looks like:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span> <span class="k">def</span> <span class="nf">quantify</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">instances</span><span class="p">):</span>
<span class="n">classif_predictions</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">preclassify</span><span class="p">(</span><span class="n">instances</span><span class="p">)</span>
<span class="n">classif_predictions</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">classify</span><span class="p">(</span><span class="n">instances</span><span class="p">)</span>
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">aggregate</span><span class="p">(</span><span class="n">classif_predictions</span><span class="p">)</span>
</pre></div>
</div>
<p>Aggregative quantifiers are expected to maintain a classifier (which is
accessed through the <em>&#64;property</em> <em>learner</em>). This classifier is
accessed through the <em>&#64;property</em> <em>classifier</em>). This classifier is
given as input to the quantifier, and can be already fit
on external data (in which case, the <em>fit_learner</em> argument should
be set to False), or be fit by the quantifiers fit (default).</p>
@ -118,12 +115,8 @@ aggregative methods, that should inherit from the abstract class
The particularity of <em>probabilistic</em> aggregative methods (w.r.t.
non-probabilistic ones), is that the default quantifier is defined
in terms of the posterior probabilities returned by a probabilistic
classifier, and not by the crisp decisions of a hard classifier; i.e.:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span> <span class="k">def</span> <span class="nf">quantify</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">instances</span><span class="p">):</span>
<span class="n">classif_posteriors</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">posterior_probabilities</span><span class="p">(</span><span class="n">instances</span><span class="p">)</span>
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">aggregate</span><span class="p">(</span><span class="n">classif_posteriors</span><span class="p">)</span>
</pre></div>
</div>
classifier, and not by the crisp decisions of a hard classifier.
In any case, the interface <em>classify(instances)</em> remains unchanged.</p>
<p>One advantage of <em>aggregative</em> methods (either probabilistic or not)
is that the evaluation according to any sampling procedure (e.g.,
the <a class="reference external" href="https://github.com/HLT-ISTI/QuaPy/wiki/Evaluation">artificial sampling protocol</a>)
@ -133,8 +126,8 @@ reuse these predictions, without requiring to classify each element every time.
QuaPy leverages this property to speed-up any procedure having to do with
quantification over samples, as is customarily done in model selection or
in evaluation.</p>
<div class="section" id="the-classify-count-variants">
<h3>The Classify &amp; Count variants<a class="headerlink" href="#the-classify-count-variants" title="Permalink to this headline"></a></h3>
<section id="the-classify-count-variants">
<h3>The Classify &amp; Count variants<a class="headerlink" href="#the-classify-count-variants" title="Permalink to this heading"></a></h3>
<p>QuaPy implements the four CC variants, i.e.:</p>
<ul class="simple">
<li><p><em>CC</em> (Classify &amp; Count), the simplest aggregative quantifier; one that
@ -150,9 +143,7 @@ with a SVM as the classifier:</p>
<span class="kn">import</span> <span class="nn">quapy.functional</span> <span class="k">as</span> <span class="nn">F</span>
<span class="kn">from</span> <span class="nn">sklearn.svm</span> <span class="kn">import</span> <span class="n">LinearSVC</span>
<span class="n">dataset</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">datasets</span><span class="o">.</span><span class="n">fetch_twitter</span><span class="p">(</span><span class="s1">&#39;hcr&#39;</span><span class="p">,</span> <span class="n">pickle</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
<span class="n">training</span> <span class="o">=</span> <span class="n">dataset</span><span class="o">.</span><span class="n">training</span>
<span class="n">test</span> <span class="o">=</span> <span class="n">dataset</span><span class="o">.</span><span class="n">test</span>
<span class="n">training</span><span class="p">,</span> <span class="n">test</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">datasets</span><span class="o">.</span><span class="n">fetch_twitter</span><span class="p">(</span><span class="s1">&#39;hcr&#39;</span><span class="p">,</span> <span class="n">pickle</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span><span class="o">.</span><span class="n">train_test</span>
<span class="c1"># instantiate a classifier learner, in this case a SVM</span>
<span class="n">svm</span> <span class="o">=</span> <span class="n">LinearSVC</span><span class="p">()</span>
@ -196,7 +187,7 @@ e.g.:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">model</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">method</span><span class="o">.</span><span class="n">aggregative</span><span class="o">.</span><span class="n">PCC</span><span class="p">(</span><span class="n">svm</span><span class="p">)</span>
<span class="n">model</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">training</span><span class="p">)</span>
<span class="n">estim_prevalence</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">quantify</span><span class="p">(</span><span class="n">test</span><span class="o">.</span><span class="n">instances</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="s1">&#39;classifier:&#39;</span><span class="p">,</span> <span class="n">model</span><span class="o">.</span><span class="n">learner</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="s1">&#39;classifier:&#39;</span><span class="p">,</span> <span class="n">model</span><span class="o">.</span><span class="n">classifier</span><span class="p">)</span>
</pre></div>
</div>
<p>In this case, QuaPy will print:</p>
@ -214,9 +205,9 @@ be applied to hard classifiers when <em>fit_learner=True</em>; an exception
will be raised otherwise.</p>
<p>Lastly, everything we said aboud ACC and PCC
applies to PACC as well.</p>
</div>
<div class="section" id="expectation-maximization-emq">
<h3>Expectation Maximization (EMQ)<a class="headerlink" href="#expectation-maximization-emq" title="Permalink to this headline"></a></h3>
</section>
<section id="expectation-maximization-emq">
<h3>Expectation Maximization (EMQ)<a class="headerlink" href="#expectation-maximization-emq" title="Permalink to this heading"></a></h3>
<p>The Expectation Maximization Quantifier (EMQ), also known as
the SLD, is available at <em>qp.method.aggregative.EMQ</em> or via the
alias <em>qp.method.aggregative.ExpectationMaximizationQuantifier</em>.
@ -241,13 +232,21 @@ experiments we have carried out.</p>
<span class="n">estim_prevalence</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">quantify</span><span class="p">(</span><span class="n">dataset</span><span class="o">.</span><span class="n">test</span><span class="o">.</span><span class="n">instances</span><span class="p">)</span>
</pre></div>
</div>
</div>
<div class="section" id="hellinger-distance-y-hdy">
<h3>Hellinger Distance y (HDy)<a class="headerlink" href="#hellinger-distance-y-hdy" title="Permalink to this headline"></a></h3>
<p>The method HDy is described in:</p>
<p><em>Implementation of the method based on the Hellinger Distance y (HDy) proposed by
González-Castro, V., Alaiz-Rodrı́guez, R., and Alegre, E. (2013). Class distribution
estimation based on the Hellinger distance. Information Sciences, 218:146164.</em></p>
<p><em>New in v0.1.7</em>: EMQ now accepts two new parameters in the construction method, namely
<em>exact_train_prev</em> which allows to use the true training prevalence as the departing
prevalence estimation (default behaviour), or instead an approximation of it as
suggested by <a class="reference external" href="http://proceedings.mlr.press/v119/alexandari20a.html">Alexandari et al. (2020)</a>
(by setting <em>exact_train_prev=False</em>).
The other parameter is <em>recalib</em> which allows to indicate a calibration method, among those
proposed by <a class="reference external" href="http://proceedings.mlr.press/v119/alexandari20a.html">Alexandari et al. (2020)</a>,
including the Bias-Corrected Temperature Scaling, Vector Scaling, etc.
See the API documentation for further details.</p>
</section>
<section id="hellinger-distance-y-hdy">
<h3>Hellinger Distance y (HDy)<a class="headerlink" href="#hellinger-distance-y-hdy" title="Permalink to this heading"></a></h3>
<p>Implementation of the method based on the Hellinger Distance y (HDy) proposed by
<a class="reference external" href="https://www.sciencedirect.com/science/article/pii/S0020025512004069">González-Castro, V., Alaiz-Rodrı́guez, R., and Alegre, E. (2013). Class distribution
estimation based on the Hellinger distance. Information Sciences, 218:146164.</a></p>
<p>It is implemented in <em>qp.method.aggregative.HDy</em> (also accessible
through the allias <em>qp.method.aggregative.HellingerDistanceY</em>).
This method works with a probabilistic classifier (hard classifiers
@ -274,30 +273,48 @@ provided in QuaPy accepts only binary datasets.</p>
<span class="n">estim_prevalence</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">quantify</span><span class="p">(</span><span class="n">dataset</span><span class="o">.</span><span class="n">test</span><span class="o">.</span><span class="n">instances</span><span class="p">)</span>
</pre></div>
</div>
<p><em>New in v0.1.7:</em> QuaPy now provides an implementation of the generalized
“Distribution Matching” approaches for multiclass, inspired by the framework
of <a class="reference external" href="https://arxiv.org/abs/1606.00868">Firat (2016)</a>. One can instantiate
a variant of HDy for multiclass quantification as follows:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">mutliclassHDy</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">method</span><span class="o">.</span><span class="n">aggregative</span><span class="o">.</span><span class="n">DistributionMatching</span><span class="p">(</span><span class="n">classifier</span><span class="o">=</span><span class="n">LogisticRegression</span><span class="p">(),</span> <span class="n">divergence</span><span class="o">=</span><span class="s1">&#39;HD&#39;</span><span class="p">,</span> <span class="n">cdf</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span>
</pre></div>
</div>
<div class="section" id="explicit-loss-minimization">
<h3>Explicit Loss Minimization<a class="headerlink" href="#explicit-loss-minimization" title="Permalink to this headline"></a></h3>
<p><em>New in v0.1.7:</em> QuaPy now provides an implementation of the “DyS”
framework proposed by <a class="reference external" href="https://ojs.aaai.org/index.php/AAAI/article/view/4376">Maletzke et al (2020)</a>
and the “SMM” method proposed by <a class="reference external" href="https://ieeexplore.ieee.org/document/9260028">Hassan et al (2019)</a>
(thanks to <em>Pablo González</em> for the contributions!)</p>
</section>
<section id="threshold-optimization-methods">
<h3>Threshold Optimization methods<a class="headerlink" href="#threshold-optimization-methods" title="Permalink to this heading"></a></h3>
<p><em>New in v0.1.7:</em> QuaPy now implements Formans threshold optimization methods;
see, e.g., <a class="reference external" href="https://dl.acm.org/doi/abs/10.1145/1150402.1150423">(Forman 2006)</a>
and <a class="reference external" href="https://link.springer.com/article/10.1007/s10618-008-0097-y">(Forman 2008)</a>.
These include: T50, MAX, X, Median Sweep (MS), and its variant MS2.</p>
</section>
<section id="explicit-loss-minimization">
<h3>Explicit Loss Minimization<a class="headerlink" href="#explicit-loss-minimization" title="Permalink to this heading"></a></h3>
<p>The Explicit Loss Minimization (ELM) represent a family of methods
based on structured output learning, i.e., quantifiers relying on
classifiers that have been optimized targeting a
quantification-oriented evaluation measure.</p>
<p>In QuaPy, the following methods, all relying on Joachims
<a class="reference external" href="https://www.cs.cornell.edu/people/tj/svm_light/svm_perf.html">SVMperf</a>
implementation, are available in <em>qp.method.aggregative</em>:</p>
quantification-oriented evaluation measure.
The original methods are implemented in QuaPy as classify &amp; count (CC)
quantifiers that use Joachims <a class="reference external" href="https://www.cs.cornell.edu/people/tj/svm_light/svm_perf.html">SVMperf</a>
as the underlying classifier, properly set to optimize for the desired loss.</p>
<p>In QuaPy, this can be more achieved by calling the functions:</p>
<ul class="simple">
<li><p>SVMQ (SVM-Q) is a quantification method optimizing the metric <em>Q</em> defined
in <em>Barranquero, J., Díez, J., and del Coz, J. J. (2015). Quantification-oriented learning based
on reliable classifiers. Pattern Recognition, 48(2):591604.</em></p></li>
<li><p>SVMKLD (SVM for Kullback-Leibler Divergence) proposed in <em>Esuli, A. and Sebastiani, F. (2015).
<li><p><em>newSVMQ</em>: returns the quantification method called SVM(Q) that optimizes for the metric <em>Q</em> defined
in <a class="reference external" href="https://www.sciencedirect.com/science/article/pii/S003132031400291X"><em>Barranquero, J., Díez, J., and del Coz, J. J. (2015). Quantification-oriented learning based
on reliable classifiers. Pattern Recognition, 48(2):591604.</em></a></p></li>
<li><p><em>newSVMKLD</em> and <em>newSVMNKLD</em>: returns the quantification method called SVM(KLD) and SVM(nKLD), standing for
Kullback-Leibler Divergence and Normalized Kullback-Leibler Divergence, as proposed in <a class="reference external" href="https://dl.acm.org/doi/abs/10.1145/2700406"><em>Esuli, A. and Sebastiani, F. (2015).
Optimizing text quantifiers for multivariate loss functions.
ACM Transactions on Knowledge Discovery and Data, 9(4):Article 27.</em></p></li>
<li><p>SVMNKLD (SVM for Normalized Kullback-Leibler Divergence) proposed in <em>Esuli, A. and Sebastiani, F. (2015).
Optimizing text quantifiers for multivariate loss functions.
ACM Transactions on Knowledge Discovery and Data, 9(4):Article 27.</em></p></li>
<li><p>SVMAE (SVM for Mean Absolute Error)</p></li>
<li><p>SVMRAE (SVM for Mean Relative Absolute Error)</p></li>
ACM Transactions on Knowledge Discovery and Data, 9(4):Article 27.</em></a></p></li>
<li><p><em>newSVMAE</em> and <em>newSVMRAE</em>: returns a quantification method called SVM(AE) and SVM(RAE) that optimizes for the (Mean) Absolute Error and for the
(Mean) Relative Absolute Error, as first used by
<a class="reference external" href="https://arxiv.org/abs/2011.02552"><em>Moreo, A. and Sebastiani, F. (2021). Tweet sentiment quantification: An experimental re-evaluation. PLOS ONE 17 (9), 1-23.</em></a></p></li>
</ul>
<p>the last two methods (SVMAE and SVMRAE) have been implemented in
<p>the last two methods (SVM(AE) and SVM(RAE)) have been implemented in
QuaPy in order to make available ELM variants for what nowadays
are considered the most well-behaved evaluation metrics in quantification.</p>
<p>In order to make these models work, you would need to run the script
@ -327,11 +344,15 @@ currently supports only binary classification.
ELM variants (any binary quantifier in general) can be extended
to operate in single-label scenarios trivially by adopting a
“one-vs-all” strategy (as, e.g., in
<em>Gao, W. and Sebastiani, F. (2016). From classification to quantification in tweet sentiment
analysis. Social Network Analysis and Mining, 6(19):122</em>).
In QuaPy this is possible by using the <em>OneVsAll</em> class:</p>
<a class="reference external" href="https://link.springer.com/article/10.1007/s13278-016-0327-z"><em>Gao, W. and Sebastiani, F. (2016). From classification to quantification in tweet sentiment
analysis. Social Network Analysis and Mining, 6(19):122</em></a>).
In QuaPy this is possible by using the <em>OneVsAll</em> class.</p>
<p>There are two ways for instantiating this class, <em>OneVsAllGeneric</em> that works for
any quantifier, and <em>OneVsAllAggregative</em> that is optimized for aggregative quantifiers.
In general, you can simply use the <em>getOneVsAll</em> function and QuaPy will choose
the more convenient of the two.</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">quapy</span> <span class="k">as</span> <span class="nn">qp</span>
<span class="kn">from</span> <span class="nn">quapy.method.aggregative</span> <span class="kn">import</span> <span class="n">SVMQ</span><span class="p">,</span> <span class="n">OneVsAll</span>
<span class="kn">from</span> <span class="nn">quapy.method.aggregative</span> <span class="kn">import</span> <span class="n">SVMQ</span>
<span class="c1"># load a single-label dataset (this one contains 3 classes)</span>
<span class="n">dataset</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">datasets</span><span class="o">.</span><span class="n">fetch_twitter</span><span class="p">(</span><span class="s1">&#39;hcr&#39;</span><span class="p">,</span> <span class="n">pickle</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
@ -339,30 +360,32 @@ In QuaPy this is possible by using the <em>OneVsAll</em> class:</p>
<span class="c1"># let qp know where svmperf is</span>
<span class="n">qp</span><span class="o">.</span><span class="n">environ</span><span class="p">[</span><span class="s1">&#39;SVMPERF_HOME&#39;</span><span class="p">]</span> <span class="o">=</span> <span class="s1">&#39;../svm_perf_quantification&#39;</span>
<span class="n">model</span> <span class="o">=</span> <span class="n">OneVsAll</span><span class="p">(</span><span class="n">SVMQ</span><span class="p">(),</span> <span class="n">n_jobs</span><span class="o">=-</span><span class="mi">1</span><span class="p">)</span> <span class="c1"># run them on parallel</span>
<span class="n">model</span> <span class="o">=</span> <span class="n">getOneVsAll</span><span class="p">(</span><span class="n">SVMQ</span><span class="p">(),</span> <span class="n">n_jobs</span><span class="o">=-</span><span class="mi">1</span><span class="p">)</span> <span class="c1"># run them on parallel</span>
<span class="n">model</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">dataset</span><span class="o">.</span><span class="n">training</span><span class="p">)</span>
<span class="n">estim_prevalence</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">quantify</span><span class="p">(</span><span class="n">dataset</span><span class="o">.</span><span class="n">test</span><span class="o">.</span><span class="n">instances</span><span class="p">)</span>
</pre></div>
</div>
</div>
</div>
<div class="section" id="meta-models">
<h2>Meta Models<a class="headerlink" href="#meta-models" title="Permalink to this headline"></a></h2>
<p>Check the examples <em><span class="xref myst">explicit_loss_minimization.py</span></em>
and <span class="xref myst">one_vs_all.py</span> for more details.</p>
</section>
</section>
<section id="meta-models">
<h2>Meta Models<a class="headerlink" href="#meta-models" title="Permalink to this heading"></a></h2>
<p>By <em>meta</em> models we mean quantification methods that are defined on top of other
quantification methods, and that thus do not squarely belong to the aggregative nor
the non-aggregative group (indeed, <em>meta</em> models could use quantifiers from any of those
groups).
<em>Meta</em> models are implemented in the <em>qp.method.meta</em> module.</p>
<div class="section" id="ensembles">
<h3>Ensembles<a class="headerlink" href="#ensembles" title="Permalink to this headline"></a></h3>
<section id="ensembles">
<h3>Ensembles<a class="headerlink" href="#ensembles" title="Permalink to this heading"></a></h3>
<p>QuaPy implements (some of) the variants proposed in:</p>
<ul class="simple">
<li><p><em>Pérez-Gállego, P., Quevedo, J. R., &amp; del Coz, J. J. (2017).
<li><p><a class="reference external" href="https://www.sciencedirect.com/science/article/pii/S1566253516300628"><em>Pérez-Gállego, P., Quevedo, J. R., &amp; del Coz, J. J. (2017).
Using ensembles for problems with characterizable changes in data distribution: A case study on quantification.
Information Fusion, 34, 87-100.</em></p></li>
<li><p><em>Pérez-Gállego, P., Castano, A., Quevedo, J. R., &amp; del Coz, J. J. (2019).
Information Fusion, 34, 87-100.</em></a></p></li>
<li><p><a class="reference external" href="https://www.sciencedirect.com/science/article/pii/S1566253517303652"><em>Pérez-Gállego, P., Castano, A., Quevedo, J. R., &amp; del Coz, J. J. (2019).
Dynamic ensemble selection for quantification tasks.
Information Fusion, 45, 1-15.</em></p></li>
Information Fusion, 45, 1-15.</em></a></p></li>
</ul>
<p>The following code shows how to instantiate an Ensemble of 30 <em>Adjusted Classify &amp; Count</em> (ACC)
quantifiers operating with a <em>Logistic Regressor</em> (LR) as the base classifier, and using the
@ -391,14 +414,14 @@ the performance estimated for each member of the ensemble in terms of that evalu
informs of the number of members to retain.</p>
<p>Please, check the <a class="reference external" href="https://github.com/HLT-ISTI/QuaPy/wiki/Model-Selection">model selection</a>
wiki if you want to optimize the hyperparameters of ensemble for classification or quantification.</p>
</div>
<div class="section" id="the-quanet-neural-network">
<h3>The QuaNet neural network<a class="headerlink" href="#the-quanet-neural-network" title="Permalink to this headline"></a></h3>
</section>
<section id="the-quanet-neural-network">
<h3>The QuaNet neural network<a class="headerlink" href="#the-quanet-neural-network" title="Permalink to this heading"></a></h3>
<p>QuaPy offers an implementation of QuaNet, a deep learning model presented in:</p>
<p><em>Esuli, A., Moreo, A., &amp; Sebastiani, F. (2018, October).
<p><a class="reference external" href="https://dl.acm.org/doi/abs/10.1145/3269206.3269287"><em>Esuli, A., Moreo, A., &amp; Sebastiani, F. (2018, October).
A recurrent neural network for sentiment quantification.
In Proceedings of the 27th ACM International Conference on
Information and Knowledge Management (pp. 1775-1778).</em></p>
Information and Knowledge Management (pp. 1775-1778).</em></a></p>
<p>This model requires <em>torch</em> to be installed.
QuaNet also requires a classifier that can provide embedded representations
of the inputs.
@ -420,14 +443,14 @@ In the following example, we show an instantiation of QuaNet that instead uses C
<span class="n">learner</span> <span class="o">=</span> <span class="n">NeuralClassifierTrainer</span><span class="p">(</span><span class="n">cnn</span><span class="p">,</span> <span class="n">device</span><span class="o">=</span><span class="s1">&#39;cuda&#39;</span><span class="p">)</span>
<span class="c1"># train QuaNet</span>
<span class="n">model</span> <span class="o">=</span> <span class="n">QuaNet</span><span class="p">(</span><span class="n">learner</span><span class="p">,</span> <span class="n">qp</span><span class="o">.</span><span class="n">environ</span><span class="p">[</span><span class="s1">&#39;SAMPLE_SIZE&#39;</span><span class="p">],</span> <span class="n">device</span><span class="o">=</span><span class="s1">&#39;cuda&#39;</span><span class="p">)</span>
<span class="n">model</span> <span class="o">=</span> <span class="n">QuaNet</span><span class="p">(</span><span class="n">learner</span><span class="p">,</span> <span class="n">device</span><span class="o">=</span><span class="s1">&#39;cuda&#39;</span><span class="p">)</span>
<span class="n">model</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">dataset</span><span class="o">.</span><span class="n">training</span><span class="p">)</span>
<span class="n">estim_prevalence</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">quantify</span><span class="p">(</span><span class="n">dataset</span><span class="o">.</span><span class="n">test</span><span class="o">.</span><span class="n">instances</span><span class="p">)</span>
</pre></div>
</div>
</div>
</div>
</div>
</section>
</section>
</section>
<div class="clearer"></div>
@ -436,13 +459,15 @@ In the following example, we show an instantiation of QuaNet that instead uses C
</div>
<div class="sphinxsidebar" role="navigation" aria-label="main navigation">
<div class="sphinxsidebarwrapper">
<h3><a href="index.html">Table of Contents</a></h3>
<ul>
<div>
<h3><a href="index.html">Table of Contents</a></h3>
<ul>
<li><a class="reference internal" href="#">Quantification Methods</a><ul>
<li><a class="reference internal" href="#aggregative-methods">Aggregative Methods</a><ul>
<li><a class="reference internal" href="#the-classify-count-variants">The Classify &amp; Count variants</a></li>
<li><a class="reference internal" href="#expectation-maximization-emq">Expectation Maximization (EMQ)</a></li>
<li><a class="reference internal" href="#hellinger-distance-y-hdy">Hellinger Distance y (HDy)</a></li>
<li><a class="reference internal" href="#threshold-optimization-methods">Threshold Optimization methods</a></li>
<li><a class="reference internal" href="#explicit-loss-minimization">Explicit Loss Minimization</a></li>
</ul>
</li>
@ -455,12 +480,17 @@ In the following example, we show an instantiation of QuaNet that instead uses C
</li>
</ul>
<h4>Previous topic</h4>
<p class="topless"><a href="Evaluation.html"
title="previous chapter">Evaluation</a></p>
<h4>Next topic</h4>
<p class="topless"><a href="Plotting.html"
title="next chapter">Plotting</a></p>
</div>
<div>
<h4>Previous topic</h4>
<p class="topless"><a href="Protocols.html"
title="previous chapter">Protocols</a></p>
</div>
<div>
<h4>Next topic</h4>
<p class="topless"><a href="Model-Selection.html"
title="next chapter">Model Selection</a></p>
</div>
<div role="note" aria-label="source link">
<h3>This Page</h3>
<ul class="this-page-menu">
@ -477,7 +507,7 @@ In the following example, we show an instantiation of QuaNet that instead uses C
</form>
</div>
</div>
<script>$('#searchbox').show(0);</script>
<script>document.getElementById('searchbox').style.display = "block"</script>
</div>
</div>
<div class="clearer"></div>
@ -492,18 +522,18 @@ In the following example, we show an instantiation of QuaNet that instead uses C
<a href="py-modindex.html" title="Python Module Index"
>modules</a> |</li>
<li class="right" >
<a href="Plotting.html" title="Plotting"
<a href="Model-Selection.html" title="Model Selection"
>next</a> |</li>
<li class="right" >
<a href="Evaluation.html" title="Evaluation"
<a href="Protocols.html" title="Protocols"
>previous</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.6 documentation</a> &#187;</li>
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.7 documentation</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">Quantification Methods</a></li>
</ul>
</div>
<div class="footer" role="contentinfo">
&#169; Copyright 2021, Alejandro Moreo.
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 4.2.0.
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 5.3.0.
</div>
</body>
</html>

View File

@ -2,21 +2,26 @@
<!doctype html>
<html>
<html lang="en">
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Model Selection &#8212; QuaPy 0.1.6 documentation</title>
<meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="generator" content="Docutils 0.19: https://docutils.sourceforge.io/" />
<title>Model Selection &#8212; QuaPy 0.1.7 documentation</title>
<link rel="stylesheet" type="text/css" href="_static/pygments.css" />
<link rel="stylesheet" type="text/css" href="_static/bizstyle.css" />
<script data-url_root="./" id="documentation_options" src="_static/documentation_options.js"></script>
<script src="_static/jquery.js"></script>
<script src="_static/underscore.js"></script>
<script src="_static/_sphinx_javascript_frameworks_compat.js"></script>
<script src="_static/doctools.js"></script>
<script src="_static/sphinx_highlight.js"></script>
<script src="_static/bizstyle.js"></script>
<link rel="index" title="Index" href="genindex.html" />
<link rel="search" title="Search" href="search.html" />
<link rel="next" title="Plotting" href="Plotting.html" />
<link rel="prev" title="Quantification Methods" href="Methods.html" />
<meta name="viewport" content="width=device-width,initial-scale=1.0" />
<!--[if lt IE 9]>
<script src="_static/css3-mediaqueries.js"></script>
@ -31,7 +36,13 @@
<li class="right" >
<a href="py-modindex.html" title="Python Module Index"
>modules</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.6 documentation</a> &#187;</li>
<li class="right" >
<a href="Plotting.html" title="Plotting"
accesskey="N">next</a> |</li>
<li class="right" >
<a href="Methods.html" title="Quantification Methods"
accesskey="P">previous</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.7 documentation</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">Model Selection</a></li>
</ul>
</div>
@ -41,8 +52,8 @@
<div class="bodywrapper">
<div class="body" role="main">
<div class="tex2jax_ignore mathjax_ignore section" id="model-selection">
<h1>Model Selection<a class="headerlink" href="#model-selection" title="Permalink to this headline"></a></h1>
<section id="model-selection">
<h1>Model Selection<a class="headerlink" href="#model-selection" title="Permalink to this heading"></a></h1>
<p>As a supervised machine learning task, quantification methods
can strongly depend on a good choice of model hyper-parameters.
The process whereby those hyper-parameters are chosen is
@ -50,8 +61,8 @@ typically known as <em>Model Selection</em>, and typically consists of
testing different settings and picking the one that performed
best in a held-out validation set in terms of any given
evaluation measure.</p>
<div class="section" id="targeting-a-quantification-oriented-loss">
<h2>Targeting a Quantification-oriented loss<a class="headerlink" href="#targeting-a-quantification-oriented-loss" title="Permalink to this headline"></a></h2>
<section id="targeting-a-quantification-oriented-loss">
<h2>Targeting a Quantification-oriented loss<a class="headerlink" href="#targeting-a-quantification-oriented-loss" title="Permalink to this heading"></a></h2>
<p>The task being optimized determines the evaluation protocol,
i.e., the criteria according to which the performance of
any given method for solving is to be assessed.
@ -63,81 +74,91 @@ specifically designed for the task of quantification.</p>
classification, and thus the model selection strategies
customarily adopted in classification have simply been
applied to quantification (see the next section).
It has been argued in <em>Moreo, Alejandro, and Fabrizio Sebastiani.
Re-Assessing the Classify and Count” Quantification Method.
arXiv preprint arXiv:2011.02552 (2020).</em>
It has been argued in <a class="reference external" href="https://link.springer.com/chapter/10.1007/978-3-030-72240-1_6">Moreo, Alejandro, and Fabrizio Sebastiani.
Re-Assessing the Classify and Count” Quantification Method.
ECIR 2021: Advances in Information Retrieval pp 7591.</a>
that specific model selection strategies should
be adopted for quantification. That is, model selection
strategies for quantification should target
quantification-oriented losses and be tested in a variety
of scenarios exhibiting different degrees of prior
probability shift.</p>
<p>The class
<em>qp.model_selection.GridSearchQ</em>
implements a grid-search exploration over the space of
hyper-parameter combinations that evaluates each<br />
combination of hyper-parameters
by means of a given quantification-oriented
<p>The class <em>qp.model_selection.GridSearchQ</em> implements a grid-search exploration over the space of
hyper-parameter combinations that <a class="reference external" href="https://github.com/HLT-ISTI/QuaPy/wiki/Evaluation">evaluates</a>
each combination of hyper-parameters by means of a given quantification-oriented
error metric (e.g., any of the error functions implemented
in <em>qp.error</em>) and according to the
<a class="reference external" href="https://github.com/HLT-ISTI/QuaPy/wiki/Evaluation"><em>artificial sampling protocol</em></a>.</p>
<p>The following is an example of model selection for quantification:</p>
in <em>qp.error</em>) and according to a
<a class="reference external" href="https://github.com/HLT-ISTI/QuaPy/wiki/Protocols">sampling generation protocol</a>.</p>
<p>The following is an example (also included in the examples folder) of model selection for quantification:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">quapy</span> <span class="k">as</span> <span class="nn">qp</span>
<span class="kn">from</span> <span class="nn">quapy.method.aggregative</span> <span class="kn">import</span> <span class="n">PCC</span>
<span class="kn">from</span> <span class="nn">quapy.protocol</span> <span class="kn">import</span> <span class="n">APP</span>
<span class="kn">from</span> <span class="nn">quapy.method.aggregative</span> <span class="kn">import</span> <span class="n">DistributionMatching</span>
<span class="kn">from</span> <span class="nn">sklearn.linear_model</span> <span class="kn">import</span> <span class="n">LogisticRegression</span>
<span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span>
<span class="c1"># set a seed to replicate runs</span>
<span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">seed</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
<span class="n">qp</span><span class="o">.</span><span class="n">environ</span><span class="p">[</span><span class="s1">&#39;SAMPLE_SIZE&#39;</span><span class="p">]</span> <span class="o">=</span> <span class="mi">500</span>
<span class="sd">&quot;&quot;&quot;</span>
<span class="sd">In this example, we show how to perform model selection on a DistributionMatching quantifier.</span>
<span class="sd">&quot;&quot;&quot;</span>
<span class="n">dataset</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">datasets</span><span class="o">.</span><span class="n">fetch_reviews</span><span class="p">(</span><span class="s1">&#39;hp&#39;</span><span class="p">,</span> <span class="n">tfidf</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">min_df</span><span class="o">=</span><span class="mi">5</span><span class="p">)</span>
<span class="n">model</span> <span class="o">=</span> <span class="n">DistributionMatching</span><span class="p">(</span><span class="n">LogisticRegression</span><span class="p">())</span>
<span class="n">qp</span><span class="o">.</span><span class="n">environ</span><span class="p">[</span><span class="s1">&#39;SAMPLE_SIZE&#39;</span><span class="p">]</span> <span class="o">=</span> <span class="mi">100</span>
<span class="n">qp</span><span class="o">.</span><span class="n">environ</span><span class="p">[</span><span class="s1">&#39;N_JOBS&#39;</span><span class="p">]</span> <span class="o">=</span> <span class="o">-</span><span class="mi">1</span> <span class="c1"># explore hyper-parameters in parallel</span>
<span class="n">training</span><span class="p">,</span> <span class="n">test</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">datasets</span><span class="o">.</span><span class="n">fetch_reviews</span><span class="p">(</span><span class="s1">&#39;imdb&#39;</span><span class="p">,</span> <span class="n">tfidf</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">min_df</span><span class="o">=</span><span class="mi">5</span><span class="p">)</span><span class="o">.</span><span class="n">train_test</span>
<span class="c1"># The model will be returned by the fit method of GridSearchQ.</span>
<span class="c1"># Model selection will be performed with a fixed budget of 1000 evaluations</span>
<span class="c1"># for each hyper-parameter combination. The error to optimize is the MAE for</span>
<span class="c1"># quantification, as evaluated on artificially drawn samples at prevalences </span>
<span class="c1"># covering the entire spectrum on a held-out portion (40%) of the training set.</span>
<span class="c1"># Every combination of hyper-parameters will be evaluated by confronting the</span>
<span class="c1"># quantifier thus configured against a series of samples generated by means</span>
<span class="c1"># of a sample generation protocol. For this example, we will use the</span>
<span class="c1"># artificial-prevalence protocol (APP), that generates samples with prevalence</span>
<span class="c1"># values in the entire range of values from a grid (e.g., [0, 0.1, 0.2, ..., 1]).</span>
<span class="c1"># We devote 30% of the dataset for this exploration.</span>
<span class="n">training</span><span class="p">,</span> <span class="n">validation</span> <span class="o">=</span> <span class="n">training</span><span class="o">.</span><span class="n">split_stratified</span><span class="p">(</span><span class="n">train_prop</span><span class="o">=</span><span class="mf">0.7</span><span class="p">)</span>
<span class="n">protocol</span> <span class="o">=</span> <span class="n">APP</span><span class="p">(</span><span class="n">validation</span><span class="p">)</span>
<span class="c1"># We will explore a classification-dependent hyper-parameter (e.g., the &#39;C&#39;</span>
<span class="c1"># hyper-parameter of LogisticRegression) and a quantification-dependent hyper-parameter</span>
<span class="c1"># (e.g., the number of bins in a DistributionMatching quantifier.</span>
<span class="c1"># Classifier-dependent hyper-parameters have to be marked with a prefix &quot;classifier__&quot;</span>
<span class="c1"># in order to let the quantifier know this hyper-parameter belongs to its underlying</span>
<span class="c1"># classifier.</span>
<span class="n">param_grid</span> <span class="o">=</span> <span class="p">{</span>
<span class="s1">&#39;classifier__C&#39;</span><span class="p">:</span> <span class="n">np</span><span class="o">.</span><span class="n">logspace</span><span class="p">(</span><span class="o">-</span><span class="mi">3</span><span class="p">,</span><span class="mi">3</span><span class="p">,</span><span class="mi">7</span><span class="p">),</span>
<span class="s1">&#39;nbins&#39;</span><span class="p">:</span> <span class="p">[</span><span class="mi">8</span><span class="p">,</span> <span class="mi">16</span><span class="p">,</span> <span class="mi">32</span><span class="p">,</span> <span class="mi">64</span><span class="p">],</span>
<span class="p">}</span>
<span class="n">model</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">model_selection</span><span class="o">.</span><span class="n">GridSearchQ</span><span class="p">(</span>
<span class="n">model</span><span class="o">=</span><span class="n">PCC</span><span class="p">(</span><span class="n">LogisticRegression</span><span class="p">()),</span>
<span class="n">param_grid</span><span class="o">=</span><span class="p">{</span><span class="s1">&#39;C&#39;</span><span class="p">:</span> <span class="n">np</span><span class="o">.</span><span class="n">logspace</span><span class="p">(</span><span class="o">-</span><span class="mi">4</span><span class="p">,</span><span class="mi">5</span><span class="p">,</span><span class="mi">10</span><span class="p">),</span> <span class="s1">&#39;class_weight&#39;</span><span class="p">:</span> <span class="p">[</span><span class="s1">&#39;balanced&#39;</span><span class="p">,</span> <span class="kc">None</span><span class="p">]},</span>
<span class="n">sample_size</span><span class="o">=</span><span class="n">qp</span><span class="o">.</span><span class="n">environ</span><span class="p">[</span><span class="s1">&#39;SAMPLE_SIZE&#39;</span><span class="p">],</span>
<span class="n">eval_budget</span><span class="o">=</span><span class="mi">1000</span><span class="p">,</span>
<span class="n">error</span><span class="o">=</span><span class="s1">&#39;mae&#39;</span><span class="p">,</span>
<span class="n">refit</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="c1"># retrain on the whole labelled set</span>
<span class="n">val_split</span><span class="o">=</span><span class="mf">0.4</span><span class="p">,</span>
<span class="n">model</span><span class="o">=</span><span class="n">model</span><span class="p">,</span>
<span class="n">param_grid</span><span class="o">=</span><span class="n">param_grid</span><span class="p">,</span>
<span class="n">protocol</span><span class="o">=</span><span class="n">protocol</span><span class="p">,</span>
<span class="n">error</span><span class="o">=</span><span class="s1">&#39;mae&#39;</span><span class="p">,</span> <span class="c1"># the error to optimize is the MAE (a quantification-oriented loss)</span>
<span class="n">refit</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="c1"># retrain on the whole labelled set once done</span>
<span class="n">verbose</span><span class="o">=</span><span class="kc">True</span> <span class="c1"># show information as the process goes on</span>
<span class="p">)</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">dataset</span><span class="o">.</span><span class="n">training</span><span class="p">)</span>
<span class="p">)</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">training</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s1">&#39;model selection ended: best hyper-parameters=</span><span class="si">{</span><span class="n">model</span><span class="o">.</span><span class="n">best_params_</span><span class="si">}</span><span class="s1">&#39;</span><span class="p">)</span>
<span class="n">model</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">best_model_</span>
<span class="c1"># evaluation in terms of MAE</span>
<span class="n">results</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">evaluation</span><span class="o">.</span><span class="n">artificial_sampling_eval</span><span class="p">(</span>
<span class="n">model</span><span class="p">,</span>
<span class="n">dataset</span><span class="o">.</span><span class="n">test</span><span class="p">,</span>
<span class="n">sample_size</span><span class="o">=</span><span class="n">qp</span><span class="o">.</span><span class="n">environ</span><span class="p">[</span><span class="s1">&#39;SAMPLE_SIZE&#39;</span><span class="p">],</span>
<span class="n">n_prevpoints</span><span class="o">=</span><span class="mi">101</span><span class="p">,</span>
<span class="n">n_repetitions</span><span class="o">=</span><span class="mi">10</span><span class="p">,</span>
<span class="n">error_metric</span><span class="o">=</span><span class="s1">&#39;mae&#39;</span>
<span class="p">)</span>
<span class="c1"># we use the same evaluation protocol (APP) on the test set</span>
<span class="n">mae_score</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">evaluation</span><span class="o">.</span><span class="n">evaluate</span><span class="p">(</span><span class="n">model</span><span class="p">,</span> <span class="n">protocol</span><span class="o">=</span><span class="n">APP</span><span class="p">(</span><span class="n">test</span><span class="p">),</span> <span class="n">error_metric</span><span class="o">=</span><span class="s1">&#39;mae&#39;</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s1">&#39;MAE=</span><span class="si">{</span><span class="n">results</span><span class="si">:</span><span class="s1">.5f</span><span class="si">}</span><span class="s1">&#39;</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s1">&#39;MAE=</span><span class="si">{</span><span class="n">mae_score</span><span class="si">:</span><span class="s1">.5f</span><span class="si">}</span><span class="s1">&#39;</span><span class="p">)</span>
</pre></div>
</div>
<p>In this example, the system outputs:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>[GridSearchQ]: starting optimization with n_jobs=1
[GridSearchQ]: checking hyperparams={&#39;C&#39;: 0.0001, &#39;class_weight&#39;: &#39;balanced&#39;} got mae score 0.24987
[GridSearchQ]: checking hyperparams={&#39;C&#39;: 0.0001, &#39;class_weight&#39;: None} got mae score 0.48135
[GridSearchQ]: checking hyperparams={&#39;C&#39;: 0.001, &#39;class_weight&#39;: &#39;balanced&#39;} got mae score 0.24866
[...]
[GridSearchQ]: checking hyperparams={&#39;C&#39;: 100000.0, &#39;class_weight&#39;: None} got mae score 0.43676
[GridSearchQ]: optimization finished: best params {&#39;C&#39;: 0.1, &#39;class_weight&#39;: &#39;balanced&#39;} (score=0.19982)
[GridSearchQ]: refitting on the whole development set
model selection ended: best hyper-parameters={&#39;C&#39;: 0.1, &#39;class_weight&#39;: &#39;balanced&#39;}
1010 evaluations will be performed for each combination of hyper-parameters
[artificial sampling protocol] generating predictions: 100%|██████████| 1010/1010 [00:00&lt;00:00, 5005.54it/s]
MAE=0.20342
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="p">[</span><span class="n">GridSearchQ</span><span class="p">]:</span> <span class="n">starting</span> <span class="n">model</span> <span class="n">selection</span> <span class="k">with</span> <span class="bp">self</span><span class="o">.</span><span class="n">n_jobs</span> <span class="o">=-</span><span class="mi">1</span>
<span class="p">[</span><span class="n">GridSearchQ</span><span class="p">]:</span> <span class="n">hyperparams</span><span class="o">=</span><span class="p">{</span><span class="s1">&#39;classifier__C&#39;</span><span class="p">:</span> <span class="mf">0.01</span><span class="p">,</span> <span class="s1">&#39;nbins&#39;</span><span class="p">:</span> <span class="mi">64</span><span class="p">}</span> <span class="n">got</span> <span class="n">mae</span> <span class="n">score</span> <span class="mf">0.04021</span> <span class="p">[</span><span class="n">took</span> <span class="mf">1.1356</span><span class="n">s</span><span class="p">]</span>
<span class="p">[</span><span class="n">GridSearchQ</span><span class="p">]:</span> <span class="n">hyperparams</span><span class="o">=</span><span class="p">{</span><span class="s1">&#39;classifier__C&#39;</span><span class="p">:</span> <span class="mf">0.01</span><span class="p">,</span> <span class="s1">&#39;nbins&#39;</span><span class="p">:</span> <span class="mi">32</span><span class="p">}</span> <span class="n">got</span> <span class="n">mae</span> <span class="n">score</span> <span class="mf">0.04286</span> <span class="p">[</span><span class="n">took</span> <span class="mf">1.2139</span><span class="n">s</span><span class="p">]</span>
<span class="p">[</span><span class="n">GridSearchQ</span><span class="p">]:</span> <span class="n">hyperparams</span><span class="o">=</span><span class="p">{</span><span class="s1">&#39;classifier__C&#39;</span><span class="p">:</span> <span class="mf">0.01</span><span class="p">,</span> <span class="s1">&#39;nbins&#39;</span><span class="p">:</span> <span class="mi">16</span><span class="p">}</span> <span class="n">got</span> <span class="n">mae</span> <span class="n">score</span> <span class="mf">0.04888</span> <span class="p">[</span><span class="n">took</span> <span class="mf">1.2491</span><span class="n">s</span><span class="p">]</span>
<span class="p">[</span><span class="n">GridSearchQ</span><span class="p">]:</span> <span class="n">hyperparams</span><span class="o">=</span><span class="p">{</span><span class="s1">&#39;classifier__C&#39;</span><span class="p">:</span> <span class="mf">0.001</span><span class="p">,</span> <span class="s1">&#39;nbins&#39;</span><span class="p">:</span> <span class="mi">8</span><span class="p">}</span> <span class="n">got</span> <span class="n">mae</span> <span class="n">score</span> <span class="mf">0.05163</span> <span class="p">[</span><span class="n">took</span> <span class="mf">1.5372</span><span class="n">s</span><span class="p">]</span>
<span class="p">[</span><span class="o">...</span><span class="p">]</span>
<span class="p">[</span><span class="n">GridSearchQ</span><span class="p">]:</span> <span class="n">hyperparams</span><span class="o">=</span><span class="p">{</span><span class="s1">&#39;classifier__C&#39;</span><span class="p">:</span> <span class="mf">1000.0</span><span class="p">,</span> <span class="s1">&#39;nbins&#39;</span><span class="p">:</span> <span class="mi">32</span><span class="p">}</span> <span class="n">got</span> <span class="n">mae</span> <span class="n">score</span> <span class="mf">0.02445</span> <span class="p">[</span><span class="n">took</span> <span class="mf">2.9056</span><span class="n">s</span><span class="p">]</span>
<span class="p">[</span><span class="n">GridSearchQ</span><span class="p">]:</span> <span class="n">optimization</span> <span class="n">finished</span><span class="p">:</span> <span class="n">best</span> <span class="n">params</span> <span class="p">{</span><span class="s1">&#39;classifier__C&#39;</span><span class="p">:</span> <span class="mf">100.0</span><span class="p">,</span> <span class="s1">&#39;nbins&#39;</span><span class="p">:</span> <span class="mi">32</span><span class="p">}</span> <span class="p">(</span><span class="n">score</span><span class="o">=</span><span class="mf">0.02234</span><span class="p">)</span> <span class="p">[</span><span class="n">took</span> <span class="mf">7.3114</span><span class="n">s</span><span class="p">]</span>
<span class="p">[</span><span class="n">GridSearchQ</span><span class="p">]:</span> <span class="n">refitting</span> <span class="n">on</span> <span class="n">the</span> <span class="n">whole</span> <span class="n">development</span> <span class="nb">set</span>
<span class="n">model</span> <span class="n">selection</span> <span class="n">ended</span><span class="p">:</span> <span class="n">best</span> <span class="n">hyper</span><span class="o">-</span><span class="n">parameters</span><span class="o">=</span><span class="p">{</span><span class="s1">&#39;classifier__C&#39;</span><span class="p">:</span> <span class="mf">100.0</span><span class="p">,</span> <span class="s1">&#39;nbins&#39;</span><span class="p">:</span> <span class="mi">32</span><span class="p">}</span>
<span class="n">MAE</span><span class="o">=</span><span class="mf">0.03102</span>
</pre></div>
</div>
<p>The parameter <em>val_split</em> can alternatively be used to indicate
@ -145,9 +166,9 @@ a validation set (i.e., an instance of <em>LabelledCollection</em>) instead
of a proportion. This could be useful if one wants to have control
on the specific data split to be used across different model selection
experiments.</p>
</div>
<div class="section" id="targeting-a-classification-oriented-loss">
<h2>Targeting a Classification-oriented loss<a class="headerlink" href="#targeting-a-classification-oriented-loss" title="Permalink to this headline"></a></h2>
</section>
<section id="targeting-a-classification-oriented-loss">
<h2>Targeting a Classification-oriented loss<a class="headerlink" href="#targeting-a-classification-oriented-loss" title="Permalink to this heading"></a></h2>
<p>Optimizing a model for quantification could rather be
computationally costly.
In aggregative methods, one could alternatively try to optimize
@ -161,32 +182,15 @@ The following code illustrates how to do that:</p>
<span class="n">LogisticRegression</span><span class="p">(),</span>
<span class="n">param_grid</span><span class="o">=</span><span class="p">{</span><span class="s1">&#39;C&#39;</span><span class="p">:</span> <span class="n">np</span><span class="o">.</span><span class="n">logspace</span><span class="p">(</span><span class="o">-</span><span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">10</span><span class="p">),</span> <span class="s1">&#39;class_weight&#39;</span><span class="p">:</span> <span class="p">[</span><span class="s1">&#39;balanced&#39;</span><span class="p">,</span> <span class="kc">None</span><span class="p">]},</span>
<span class="n">cv</span><span class="o">=</span><span class="mi">5</span><span class="p">)</span>
<span class="n">model</span> <span class="o">=</span> <span class="n">PCC</span><span class="p">(</span><span class="n">learner</span><span class="p">)</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">dataset</span><span class="o">.</span><span class="n">training</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s1">&#39;model selection ended: best hyper-parameters=</span><span class="si">{</span><span class="n">model</span><span class="o">.</span><span class="n">learner</span><span class="o">.</span><span class="n">best_params_</span><span class="si">}</span><span class="s1">&#39;</span><span class="p">)</span>
<span class="n">model</span> <span class="o">=</span> <span class="n">DistributionMatching</span><span class="p">(</span><span class="n">learner</span><span class="p">)</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">dataset</span><span class="o">.</span><span class="n">training</span><span class="p">)</span>
</pre></div>
</div>
<p>In this example, the system outputs:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>model selection ended: best hyper-parameters={&#39;C&#39;: 10000.0, &#39;class_weight&#39;: None}
1010 evaluations will be performed for each combination of hyper-parameters
[artificial sampling protocol] generating predictions: 100%|██████████| 1010/1010 [00:00&lt;00:00, 5379.55it/s]
MAE=0.41734
</pre></div>
</div>
<p>Note that the MAE is worse than the one we obtained when optimizing
for quantification and, indeed, the hyper-parameters found optimal
largely differ between the two selection modalities. The
hyper-parameters C=10000 and class_weight=None have been found
to work well for the specific training prevalence of the HP dataset,
but these hyper-parameters turned out to be suboptimal when the
class prevalences of the test set differs (as is indeed tested
in scenarios of quantification).</p>
<p>This is, however, not always the case, and one could, in practice,
find examples
in which optimizing for classification ends up resulting in a better
quantifier than when optimizing for quantification.
Nonetheless, this is theoretically unlikely to happen.</p>
</div>
</div>
<p>However, this is conceptually flawed, since the model should be
optimized for the task at hand (quantification), and not for a surrogate task (classification),
i.e., the model should be requested to deliver low quantification errors, rather
than low classification errors.</p>
</section>
</section>
<div class="clearer"></div>
@ -195,8 +199,9 @@ Nonetheless, this is theoretically unlikely to happen.</p>
</div>
<div class="sphinxsidebar" role="navigation" aria-label="main navigation">
<div class="sphinxsidebarwrapper">
<h3><a href="index.html">Table of Contents</a></h3>
<ul>
<div>
<h3><a href="index.html">Table of Contents</a></h3>
<ul>
<li><a class="reference internal" href="#">Model Selection</a><ul>
<li><a class="reference internal" href="#targeting-a-quantification-oriented-loss">Targeting a Quantification-oriented loss</a></li>
<li><a class="reference internal" href="#targeting-a-classification-oriented-loss">Targeting a Classification-oriented loss</a></li>
@ -204,6 +209,17 @@ Nonetheless, this is theoretically unlikely to happen.</p>
</li>
</ul>
</div>
<div>
<h4>Previous topic</h4>
<p class="topless"><a href="Methods.html"
title="previous chapter">Quantification Methods</a></p>
</div>
<div>
<h4>Next topic</h4>
<p class="topless"><a href="Plotting.html"
title="next chapter">Plotting</a></p>
</div>
<div role="note" aria-label="source link">
<h3>This Page</h3>
<ul class="this-page-menu">
@ -220,7 +236,7 @@ Nonetheless, this is theoretically unlikely to happen.</p>
</form>
</div>
</div>
<script>$('#searchbox').show(0);</script>
<script>document.getElementById('searchbox').style.display = "block"</script>
</div>
</div>
<div class="clearer"></div>
@ -234,13 +250,19 @@ Nonetheless, this is theoretically unlikely to happen.</p>
<li class="right" >
<a href="py-modindex.html" title="Python Module Index"
>modules</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.6 documentation</a> &#187;</li>
<li class="right" >
<a href="Plotting.html" title="Plotting"
>next</a> |</li>
<li class="right" >
<a href="Methods.html" title="Quantification Methods"
>previous</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.7 documentation</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">Model Selection</a></li>
</ul>
</div>
<div class="footer" role="contentinfo">
&#169; Copyright 2021, Alejandro Moreo.
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 4.2.0.
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 5.3.0.
</div>
</body>
</html>

View File

@ -2,23 +2,26 @@
<!doctype html>
<html>
<html lang="en">
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Plotting &#8212; QuaPy 0.1.6 documentation</title>
<meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="generator" content="Docutils 0.19: https://docutils.sourceforge.io/" />
<title>Plotting &#8212; QuaPy 0.1.7 documentation</title>
<link rel="stylesheet" type="text/css" href="_static/pygments.css" />
<link rel="stylesheet" type="text/css" href="_static/bizstyle.css" />
<script data-url_root="./" id="documentation_options" src="_static/documentation_options.js"></script>
<script src="_static/jquery.js"></script>
<script src="_static/underscore.js"></script>
<script src="_static/_sphinx_javascript_frameworks_compat.js"></script>
<script src="_static/doctools.js"></script>
<script src="_static/sphinx_highlight.js"></script>
<script src="_static/bizstyle.js"></script>
<link rel="index" title="Index" href="genindex.html" />
<link rel="search" title="Search" href="search.html" />
<link rel="next" title="quapy" href="modules.html" />
<link rel="prev" title="Quantification Methods" href="Methods.html" />
<link rel="prev" title="Model Selection" href="Model-Selection.html" />
<meta name="viewport" content="width=device-width,initial-scale=1.0" />
<!--[if lt IE 9]>
<script src="_static/css3-mediaqueries.js"></script>
@ -37,9 +40,9 @@
<a href="modules.html" title="quapy"
accesskey="N">next</a> |</li>
<li class="right" >
<a href="Methods.html" title="Quantification Methods"
<a href="Model-Selection.html" title="Model Selection"
accesskey="P">previous</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.6 documentation</a> &#187;</li>
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.7 documentation</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">Plotting</a></li>
</ul>
</div>
@ -49,8 +52,8 @@
<div class="bodywrapper">
<div class="body" role="main">
<div class="tex2jax_ignore mathjax_ignore section" id="plotting">
<h1>Plotting<a class="headerlink" href="#plotting" title="Permalink to this headline"></a></h1>
<section id="plotting">
<h1>Plotting<a class="headerlink" href="#plotting" title="Permalink to this heading"></a></h1>
<p>The module <em>qp.plot</em> implements some basic plotting functions
that can help analyse the performance of a quantification method.</p>
<p>All plotting functions receive as inputs the outcomes of
@ -91,7 +94,7 @@ quantification methods across different scenarios showcasing
the accuracy of the quantifier in predicting class prevalences
for a wide range of prior distributions. This can easily be
achieved by means of the
<a class="reference external" href="https://github.com/HLT-ISTI/QuaPy/wiki/Evaluation">artificial sampling protocol</a>
<a class="reference external" href="https://github.com/HLT-ISTI/QuaPy/wiki/Protocols">artificial sampling protocol</a>
that is implemented in QuaPy.</p>
<p>The following code shows how to perform one simple experiment
in which the 4 <em>CC-variants</em>, all equipped with a linear SVM, are
@ -100,6 +103,7 @@ tested across the entire spectrum of class priors (taking 21 splits
of the interval [0,1], i.e., using prevalence steps of 0.05, and
generating 100 random samples at each prevalence).</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">quapy</span> <span class="k">as</span> <span class="nn">qp</span>
<span class="kn">from</span> <span class="nn">protocol</span> <span class="kn">import</span> <span class="n">APP</span>
<span class="kn">from</span> <span class="nn">quapy.method.aggregative</span> <span class="kn">import</span> <span class="n">CC</span><span class="p">,</span> <span class="n">ACC</span><span class="p">,</span> <span class="n">PCC</span><span class="p">,</span> <span class="n">PACC</span>
<span class="kn">from</span> <span class="nn">sklearn.svm</span> <span class="kn">import</span> <span class="n">LinearSVC</span>
@ -108,28 +112,26 @@ generating 100 random samples at each prevalence).</p>
<span class="k">def</span> <span class="nf">gen_data</span><span class="p">():</span>
<span class="k">def</span> <span class="nf">base_classifier</span><span class="p">():</span>
<span class="k">return</span> <span class="n">LinearSVC</span><span class="p">()</span>
<span class="k">return</span> <span class="n">LinearSVC</span><span class="p">(</span><span class="n">class_weight</span><span class="o">=</span><span class="s1">&#39;balanced&#39;</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">models</span><span class="p">():</span>
<span class="k">yield</span> <span class="n">CC</span><span class="p">(</span><span class="n">base_classifier</span><span class="p">())</span>
<span class="k">yield</span> <span class="n">ACC</span><span class="p">(</span><span class="n">base_classifier</span><span class="p">())</span>
<span class="k">yield</span> <span class="n">PCC</span><span class="p">(</span><span class="n">base_classifier</span><span class="p">())</span>
<span class="k">yield</span> <span class="n">PACC</span><span class="p">(</span><span class="n">base_classifier</span><span class="p">())</span>
<span class="k">yield</span> <span class="s1">&#39;CC&#39;</span><span class="p">,</span> <span class="n">CC</span><span class="p">(</span><span class="n">base_classifier</span><span class="p">())</span>
<span class="k">yield</span> <span class="s1">&#39;ACC&#39;</span><span class="p">,</span> <span class="n">ACC</span><span class="p">(</span><span class="n">base_classifier</span><span class="p">())</span>
<span class="k">yield</span> <span class="s1">&#39;PCC&#39;</span><span class="p">,</span> <span class="n">PCC</span><span class="p">(</span><span class="n">base_classifier</span><span class="p">())</span>
<span class="k">yield</span> <span class="s1">&#39;PACC&#39;</span><span class="p">,</span> <span class="n">PACC</span><span class="p">(</span><span class="n">base_classifier</span><span class="p">())</span>
<span class="n">data</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">datasets</span><span class="o">.</span><span class="n">fetch_reviews</span><span class="p">(</span><span class="s1">&#39;kindle&#39;</span><span class="p">,</span> <span class="n">tfidf</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">min_df</span><span class="o">=</span><span class="mi">5</span><span class="p">)</span>
<span class="n">train</span><span class="p">,</span> <span class="n">test</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">datasets</span><span class="o">.</span><span class="n">fetch_reviews</span><span class="p">(</span><span class="s1">&#39;kindle&#39;</span><span class="p">,</span> <span class="n">tfidf</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">min_df</span><span class="o">=</span><span class="mi">5</span><span class="p">)</span><span class="o">.</span><span class="n">train_test</span>
<span class="n">method_names</span><span class="p">,</span> <span class="n">true_prevs</span><span class="p">,</span> <span class="n">estim_prevs</span><span class="p">,</span> <span class="n">tr_prevs</span> <span class="o">=</span> <span class="p">[],</span> <span class="p">[],</span> <span class="p">[],</span> <span class="p">[]</span>
<span class="k">for</span> <span class="n">model</span> <span class="ow">in</span> <span class="n">models</span><span class="p">():</span>
<span class="n">model</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">data</span><span class="o">.</span><span class="n">training</span><span class="p">)</span>
<span class="n">true_prev</span><span class="p">,</span> <span class="n">estim_prev</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">evaluation</span><span class="o">.</span><span class="n">artificial_sampling_prediction</span><span class="p">(</span>
<span class="n">model</span><span class="p">,</span> <span class="n">data</span><span class="o">.</span><span class="n">test</span><span class="p">,</span> <span class="n">qp</span><span class="o">.</span><span class="n">environ</span><span class="p">[</span><span class="s1">&#39;SAMPLE_SIZE&#39;</span><span class="p">],</span> <span class="n">n_repetitions</span><span class="o">=</span><span class="mi">100</span><span class="p">,</span> <span class="n">n_prevpoints</span><span class="o">=</span><span class="mi">21</span>
<span class="p">)</span>
<span class="k">for</span> <span class="n">method_name</span><span class="p">,</span> <span class="n">model</span> <span class="ow">in</span> <span class="n">models</span><span class="p">():</span>
<span class="n">model</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">train</span><span class="p">)</span>
<span class="n">true_prev</span><span class="p">,</span> <span class="n">estim_prev</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">evaluation</span><span class="o">.</span><span class="n">prediction</span><span class="p">(</span><span class="n">model</span><span class="p">,</span> <span class="n">APP</span><span class="p">(</span><span class="n">test</span><span class="p">,</span> <span class="n">repeats</span><span class="o">=</span><span class="mi">100</span><span class="p">,</span> <span class="n">random_state</span><span class="o">=</span><span class="mi">0</span><span class="p">))</span>
<span class="n">method_names</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">model</span><span class="o">.</span><span class="vm">__class__</span><span class="o">.</span><span class="vm">__name__</span><span class="p">)</span>
<span class="n">method_names</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">method_name</span><span class="p">)</span>
<span class="n">true_prevs</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">true_prev</span><span class="p">)</span>
<span class="n">estim_prevs</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">estim_prev</span><span class="p">)</span>
<span class="n">tr_prevs</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">data</span><span class="o">.</span><span class="n">training</span><span class="o">.</span><span class="n">prevalence</span><span class="p">())</span>
<span class="n">tr_prevs</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">train</span><span class="o">.</span><span class="n">prevalence</span><span class="p">())</span>
<span class="k">return</span> <span class="n">method_names</span><span class="p">,</span> <span class="n">true_prevs</span><span class="p">,</span> <span class="n">estim_prevs</span><span class="p">,</span> <span class="n">tr_prevs</span>
@ -137,8 +139,8 @@ generating 100 random samples at each prevalence).</p>
</pre></div>
</div>
<p>the plots that can be generated are explained below.</p>
<div class="section" id="diagonal-plot">
<h2>Diagonal Plot<a class="headerlink" href="#diagonal-plot" title="Permalink to this headline"></a></h2>
<section id="diagonal-plot">
<h2>Diagonal Plot<a class="headerlink" href="#diagonal-plot" title="Permalink to this heading"></a></h2>
<p>The <em>diagonal</em> plot shows a very insightful view of the
quantifiers performance. It plots the predicted class
prevalence (in the y-axis) against the true class prevalence
@ -164,9 +166,9 @@ the complete list of arguments in the documentation).</p>
<p>Finally, note how most quantifiers, and specially the “unadjusted”
variants CC and PCC, are strongly biased towards the
prevalence seen during training.</p>
</div>
<div class="section" id="quantification-bias">
<h2>Quantification bias<a class="headerlink" href="#quantification-bias" title="Permalink to this headline"></a></h2>
</section>
<section id="quantification-bias">
<h2>Quantification bias<a class="headerlink" href="#quantification-bias" title="Permalink to this heading"></a></h2>
<p>This plot aims at evincing the bias that any quantifier
displays with respect to the training prevalences by
means of <a class="reference external" href="https://en.wikipedia.org/wiki/Box_plot">box plots</a>.
@ -196,21 +198,19 @@ IMDb dataset, and generate the bias plot again.
This example can be run by rewritting the <em>gen_data()</em> function
like this:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">gen_data</span><span class="p">():</span>
<span class="n">data</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">datasets</span><span class="o">.</span><span class="n">fetch_reviews</span><span class="p">(</span><span class="s1">&#39;imdb&#39;</span><span class="p">,</span> <span class="n">tfidf</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">min_df</span><span class="o">=</span><span class="mi">5</span><span class="p">)</span>
<span class="n">train</span><span class="p">,</span> <span class="n">test</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">datasets</span><span class="o">.</span><span class="n">fetch_reviews</span><span class="p">(</span><span class="s1">&#39;imdb&#39;</span><span class="p">,</span> <span class="n">tfidf</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">min_df</span><span class="o">=</span><span class="mi">5</span><span class="p">)</span><span class="o">.</span><span class="n">train_test</span>
<span class="n">model</span> <span class="o">=</span> <span class="n">CC</span><span class="p">(</span><span class="n">LinearSVC</span><span class="p">())</span>
<span class="n">method_data</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">for</span> <span class="n">training_prevalence</span> <span class="ow">in</span> <span class="n">np</span><span class="o">.</span><span class="n">linspace</span><span class="p">(</span><span class="mf">0.1</span><span class="p">,</span> <span class="mf">0.9</span><span class="p">,</span> <span class="mi">9</span><span class="p">):</span>
<span class="n">training_size</span> <span class="o">=</span> <span class="mi">5000</span>
<span class="c1"># since the problem is binary, it suffices to specify the negative prevalence (the positive is constrained)</span>
<span class="n">training</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">training</span><span class="o">.</span><span class="n">sampling</span><span class="p">(</span><span class="n">training_size</span><span class="p">,</span> <span class="mi">1</span> <span class="o">-</span> <span class="n">training_prevalence</span><span class="p">)</span>
<span class="n">model</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">training</span><span class="p">)</span>
<span class="n">true_prev</span><span class="p">,</span> <span class="n">estim_prev</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">evaluation</span><span class="o">.</span><span class="n">artificial_sampling_prediction</span><span class="p">(</span>
<span class="n">model</span><span class="p">,</span> <span class="n">data</span><span class="o">.</span><span class="n">sample</span><span class="p">,</span> <span class="n">qp</span><span class="o">.</span><span class="n">environ</span><span class="p">[</span><span class="s1">&#39;SAMPLE_SIZE&#39;</span><span class="p">],</span> <span class="n">n_repetitions</span><span class="o">=</span><span class="mi">100</span><span class="p">,</span> <span class="n">n_prevpoints</span><span class="o">=</span><span class="mi">21</span>
<span class="p">)</span>
<span class="c1"># method names can contain Latex syntax</span>
<span class="n">method_name</span> <span class="o">=</span> <span class="s1">&#39;CC$_{&#39;</span> <span class="o">+</span> <span class="sa">f</span><span class="s1">&#39;</span><span class="si">{</span><span class="nb">int</span><span class="p">(</span><span class="mi">100</span> <span class="o">*</span> <span class="n">training_prevalence</span><span class="p">)</span><span class="si">}</span><span class="s1">&#39;</span> <span class="o">+</span> <span class="s1">&#39;\%}$&#39;</span>
<span class="n">method_data</span><span class="o">.</span><span class="n">append</span><span class="p">((</span><span class="n">method_name</span><span class="p">,</span> <span class="n">true_prev</span><span class="p">,</span> <span class="n">estim_prev</span><span class="p">,</span> <span class="n">training</span><span class="o">.</span><span class="n">prevalence</span><span class="p">()))</span>
<span class="c1"># since the problem is binary, it suffices to specify the negative prevalence, since the positive is constrained</span>
<span class="n">train_sample</span> <span class="o">=</span> <span class="n">train</span><span class="o">.</span><span class="n">sampling</span><span class="p">(</span><span class="n">training_size</span><span class="p">,</span> <span class="mi">1</span><span class="o">-</span><span class="n">training_prevalence</span><span class="p">)</span>
<span class="n">model</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">train_sample</span><span class="p">)</span>
<span class="n">true_prev</span><span class="p">,</span> <span class="n">estim_prev</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">evaluation</span><span class="o">.</span><span class="n">prediction</span><span class="p">(</span><span class="n">model</span><span class="p">,</span> <span class="n">APP</span><span class="p">(</span><span class="n">test</span><span class="p">,</span> <span class="n">repeats</span><span class="o">=</span><span class="mi">100</span><span class="p">,</span> <span class="n">random_state</span><span class="o">=</span><span class="mi">0</span><span class="p">))</span>
<span class="n">method_name</span> <span class="o">=</span> <span class="s1">&#39;CC$_{&#39;</span><span class="o">+</span><span class="sa">f</span><span class="s1">&#39;</span><span class="si">{</span><span class="nb">int</span><span class="p">(</span><span class="mi">100</span><span class="o">*</span><span class="n">training_prevalence</span><span class="p">)</span><span class="si">}</span><span class="s1">&#39;</span> <span class="o">+</span> <span class="s1">&#39;\%}$&#39;</span>
<span class="n">method_data</span><span class="o">.</span><span class="n">append</span><span class="p">((</span><span class="n">method_name</span><span class="p">,</span> <span class="n">true_prev</span><span class="p">,</span> <span class="n">estim_prev</span><span class="p">,</span> <span class="n">train_sample</span><span class="o">.</span><span class="n">prevalence</span><span class="p">()))</span>
<span class="k">return</span> <span class="nb">zip</span><span class="p">(</span><span class="o">*</span><span class="n">method_data</span><span class="p">)</span>
</pre></div>
@ -237,9 +237,9 @@ and a negative bias (or a tendency to underestimate) in cases of high prevalence
<p><img alt="diag plot on IMDb" src="_images/bin_diag_cc.png" /></p>
<p>showing pretty clearly the dependency of CC on the prior probabilities
of the labeled set it was trained on.</p>
</div>
<div class="section" id="error-by-drift">
<h2>Error by Drift<a class="headerlink" href="#error-by-drift" title="Permalink to this headline"></a></h2>
</section>
<section id="error-by-drift">
<h2>Error by Drift<a class="headerlink" href="#error-by-drift" title="Permalink to this heading"></a></h2>
<p>Above discussed plots are useful for analyzing and comparing
the performance of different quantification methods, but are
limited to the binary case. The “error by drift” is a plot
@ -270,8 +270,8 @@ In those cases, however, it is likely that the variances of each
method get higher, to the detriment of the visualization.
We recommend to set <em>show_std=False</em> in those cases
in order to hide the color bands.</p>
</div>
</div>
</section>
</section>
<div class="clearer"></div>
@ -280,8 +280,9 @@ in order to hide the color bands.</p>
</div>
<div class="sphinxsidebar" role="navigation" aria-label="main navigation">
<div class="sphinxsidebarwrapper">
<h3><a href="index.html">Table of Contents</a></h3>
<ul>
<div>
<h3><a href="index.html">Table of Contents</a></h3>
<ul>
<li><a class="reference internal" href="#">Plotting</a><ul>
<li><a class="reference internal" href="#diagonal-plot">Diagonal Plot</a></li>
<li><a class="reference internal" href="#quantification-bias">Quantification bias</a></li>
@ -290,12 +291,17 @@ in order to hide the color bands.</p>
</li>
</ul>
<h4>Previous topic</h4>
<p class="topless"><a href="Methods.html"
title="previous chapter">Quantification Methods</a></p>
<h4>Next topic</h4>
<p class="topless"><a href="modules.html"
title="next chapter">quapy</a></p>
</div>
<div>
<h4>Previous topic</h4>
<p class="topless"><a href="Model-Selection.html"
title="previous chapter">Model Selection</a></p>
</div>
<div>
<h4>Next topic</h4>
<p class="topless"><a href="modules.html"
title="next chapter">quapy</a></p>
</div>
<div role="note" aria-label="source link">
<h3>This Page</h3>
<ul class="this-page-menu">
@ -312,7 +318,7 @@ in order to hide the color bands.</p>
</form>
</div>
</div>
<script>$('#searchbox').show(0);</script>
<script>document.getElementById('searchbox').style.display = "block"</script>
</div>
</div>
<div class="clearer"></div>
@ -330,15 +336,15 @@ in order to hide the color bands.</p>
<a href="modules.html" title="quapy"
>next</a> |</li>
<li class="right" >
<a href="Methods.html" title="Quantification Methods"
<a href="Model-Selection.html" title="Model Selection"
>previous</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.6 documentation</a> &#187;</li>
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.7 documentation</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">Plotting</a></li>
</ul>
</div>
<div class="footer" role="contentinfo">
&#169; Copyright 2021, Alejandro Moreo.
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 4.2.0.
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 5.3.0.
</div>
</body>
</html>

View File

@ -30,7 +30,8 @@ Output the class prevalences (showing 2 digit precision):
[0.17, 0.50, 0.33]
```
One can easily produce new samples at desired class prevalences:
One can easily produce new samples at desired class prevalence values:
```python
sample_size = 10
prev = [0.4, 0.1, 0.5]
@ -63,32 +64,10 @@ for method in methods:
...
```
QuaPy also implements the artificial sampling protocol that produces (via a
Python's generator) a series of _LabelledCollection_ objects with equidistant
prevalences ranging across the entire prevalence spectrum in the simplex space, e.g.:
```python
for sample in data.artificial_sampling_generator(sample_size=100, n_prevalences=5):
print(F.strprev(sample.prevalence(), prec=2))
```
produces one sampling for each (valid) combination of prevalences originating from
splitting the range [0,1] into n_prevalences=5 points (i.e., [0, 0.25, 0.5, 0.75, 1]),
that is:
```
[0.00, 0.00, 1.00]
[0.00, 0.25, 0.75]
[0.00, 0.50, 0.50]
[0.00, 0.75, 0.25]
[0.00, 1.00, 0.00]
[0.25, 0.00, 0.75]
...
[1.00, 0.00, 0.00]
```
See the [Evaluation wiki](https://github.com/HLT-ISTI/QuaPy/wiki/Evaluation) for
further details on how to use the artificial sampling protocol to properly
evaluate a quantification method.
However, generating samples for evaluation purposes is tackled in QuaPy
by means of the evaluation protocols (see the dedicated entries in the Wiki
for [evaluation](https://github.com/HLT-ISTI/QuaPy/wiki/Evaluation) and
[protocols](https://github.com/HLT-ISTI/QuaPy/wiki/Protocols)).
## Reviews Datasets
@ -177,6 +156,8 @@ Some details can be found below:
| sst | 3 | 2971 | 1271 | 376132 | [0.261, 0.452, 0.288] | [0.207, 0.481, 0.312] | sparse |
| wa | 3 | 2184 | 936 | 248563 | [0.305, 0.414, 0.281] | [0.282, 0.446, 0.272] | sparse |
| wb | 3 | 4259 | 1823 | 404333 | [0.270, 0.392, 0.337] | [0.274, 0.392, 0.335] | sparse |
## UCI Machine Learning
A set of 32 datasets from the [UCI Machine Learning repository](https://archive.ics.uci.edu/ml/datasets.php)
@ -272,6 +253,46 @@ standard Pythons packages like gzip or zip. This file would need to be uncompres
OS-dependent software manually. Information on how to do it will be printed the first
time the dataset is invoked.
## LeQua Datasets
QuaPy also provides the datasets used for the LeQua competition.
In brief, there are 4 tasks (T1A, T1B, T2A, T2B) having to do with text quantification
problems. Tasks T1A and T1B provide documents in vector form, while T2A and T2B provide
raw documents instead.
Tasks T1A and T2A are binary sentiment quantification problems, while T2A and T2B
are multiclass quantification problems consisting of estimating the class prevalence
values of 28 different merchandise products.
Every task consists of a training set, a set of validation samples (for model selection)
and a set of test samples (for evaluation). QuaPy returns this data as a LabelledCollection
(training) and two generation protocols (for validation and test samples), as follows:
```python
training, val_generator, test_generator = fetch_lequa2022(task=task)
```
See the `lequa2022_experiments.py` in the examples folder for further details on how to
carry out experiments using these datasets.
The datasets are downloaded only once, and stored for fast reuse.
Some statistics are summarized below:
| Dataset | classes | train size | validation samples | test samples | docs by sample | type |
|---------|:-------:|:----------:|:------------------:|:------------:|:----------------:|:--------:|
| T1A | 2 | 5000 | 1000 | 5000 | 250 | vector |
| T1B | 28 | 20000 | 1000 | 5000 | 1000 | vector |
| T2A | 2 | 5000 | 1000 | 5000 | 250 | text |
| T2B | 28 | 20000 | 1000 | 5000 | 1000 | text |
For further details on the datasets, we refer to the original
[paper](https://ceur-ws.org/Vol-3180/paper-146.pdf):
```
Esuli, A., Moreo, A., Sebastiani, F., & Sperduti, G. (2022).
A Detailed Overview of LeQua@ CLEF 2022: Learning to Quantify.
```
## Adding Custom Datasets
QuaPy provides data loaders for simple formats dealing with
@ -312,12 +333,15 @@ e.g.:
```python
import quapy as qp
train_path = '../my_data/train.dat'
test_path = '../my_data/test.dat'
def my_custom_loader(path):
with open(path, 'rb') as fin:
...
return instances, labels
data = qp.data.Dataset.load(train_path, test_path, my_custom_loader)
```

View File

@ -50,7 +50,7 @@ indicating the value for the smoothing parameter epsilon.
Traditionally, this value is set to 1/(2T) in past literature,
with T the sampling size. One could either pass this value
to the function each time, or to set a QuaPy's environment
variable _SAMPLE_SIZE_ once, and ommit this argument
variable _SAMPLE_SIZE_ once, and omit this argument
thereafter (recommended);
e.g.:
@ -58,7 +58,7 @@ e.g.:
qp.environ['SAMPLE_SIZE'] = 100 # once for all
true_prev = np.asarray([0.5, 0.3, 0.2]) # let's assume 3 classes
estim_prev = np.asarray([0.1, 0.3, 0.6])
error = qp.ae_.mrae(true_prev, estim_prev)
error = qp.error.mrae(true_prev, estim_prev)
print(f'mrae({true_prev}, {estim_prev}) = {error:.3f}')
```
@ -71,162 +71,99 @@ Finally, it is possible to instantiate QuaPy's quantification
error functions from strings using, e.g.:
```python
error_function = qp.ae_.from_name('mse')
error_function = qp.error.from_name('mse')
error = error_function(true_prev, estim_prev)
```
## Evaluation Protocols
QuaPy implements the so-called "artificial sampling protocol",
according to which a test set is used to generate samplings at
desired prevalences of fixed size and covering the full spectrum
of prevalences. This protocol is called "artificial" in contrast
to the "natural prevalence sampling" protocol that,
despite introducing some variability during sampling, approximately
preserves the training class prevalence.
In the artificial sampling procol, the user specifies the number
of (equally distant) points to be generated from the interval [0,1].
For example, if n_prevpoints=11 then, for each class, the prevalences
[0., 0.1, 0.2, ..., 1.] will be used. This means that, for two classes,
the number of different prevalences will be 11 (since, once the prevalence
of one class is determined, the other one is constrained). For 3 classes,
the number of valid combinations can be obtained as 11 + 10 + ... + 1 = 66.
In general, the number of valid combinations that will be produced for a given
value of n_prevpoints can be consulted by invoking
quapy.functional.num_prevalence_combinations, e.g.:
An _evaluation protocol_ is an evaluation procedure that uses
one specific _sample generation procotol_ to genereate many
samples, typically characterized by widely varying amounts of
_shift_ with respect to the original distribution, that are then
used to evaluate the performance of a (trained) quantifier.
These protocols are explained in more detail in a dedicated [entry
in the wiki](Protocols.md). For the moment being, let us assume we already have
chosen and instantiated one specific such protocol, that we here
simply call _prot_. Let also assume our model is called
_quantifier_ and that our evaluatio measure of choice is
_mae_. The evaluation comes down to:
```python
import quapy.functional as F
n_prevpoints = 21
n_classes = 4
n = F.num_prevalence_combinations(n_prevpoints, n_classes, n_repeats=1)
mae = qp.evaluation.evaluate(quantifier, protocol=prot, error_metric='mae')
print(f'MAE = {mae:.4f}')
```
in this example, n=1771. Note the last argument, n_repeats, that
informs of the number of examples that will be generated for any
valid combination (typical values are, e.g., 1 for a single sample,
or 10 or higher for computing standard deviations of performing statistical
significance tests).
One can instead work the other way around, i.e., one could set a
maximum budged of evaluations and get the number of prevalence points that
will generate a number of evaluations close, but not higher, than
the fixed budget. This can be achieved with the function
quapy.functional.get_nprevpoints_approximation, e.g.:
It is often desirable to evaluate our system using more than one
single evaluatio measure. In this case, it is convenient to generate
a _report_. A report in QuaPy is a dataframe accounting for all the
true prevalence values with their corresponding prevalence values
as estimated by the quantifier, along with the error each has given
rise.
```python
budget = 5000
n_prevpoints = F.get_nprevpoints_approximation(budget, n_classes, n_repeats=1)
n = F.num_prevalence_combinations(n_prevpoints, n_classes, n_repeats=1)
print(f'by setting n_prevpoints={n_prevpoints} the number of evaluations for {n_classes} classes will be {n}')
```
that will print:
```
by setting n_prevpoints=30 the number of evaluations for 4 classes will be 4960
report = qp.evaluation.evaluation_report(quantifier, protocol=prot, error_metrics=['mae', 'mrae', 'mkld'])
```
The cost of evaluation will depend on the values of _n_prevpoints_, _n_classes_,
and _n_repeats_. Since it might sometimes be cumbersome to control the overall
cost of an experiment having to do with the number of combinations that
will be generated for a particular setting of these arguments (particularly
when _n_classes>2_), evaluation functions
typically allow the user to rather specify an _evaluation budget_, i.e., a maximum
number of samplings to generate. By specifying this argument, one could avoid
specifying _n_prevpoints_, and the value for it that would lead to a closer
number of evaluation budget, without surpassing it, will be automatically set.
The following script shows a full example in which a PACC model relying
on a Logistic Regressor classifier is
tested on the _kindle_ dataset by means of the artificial prevalence
sampling protocol on samples of size 500, in terms of various
evaluation metrics.
````python
import quapy as qp
import quapy.functional as F
from sklearn.linear_model import LogisticRegression
qp.environ['SAMPLE_SIZE'] = 500
dataset = qp.datasets.fetch_reviews('kindle')
qp.data.preprocessing.text2tfidf(dataset, min_df=5, inplace=True)
training = dataset.training
test = dataset.test
lr = LogisticRegression()
pacc = qp.method.aggregative.PACC(lr)
pacc.fit(training)
df = qp.evaluation.artificial_sampling_report(
pacc, # the quantification method
test, # the test set on which the method will be evaluated
sample_size=qp.environ['SAMPLE_SIZE'], #indicates the size of samples to be drawn
n_prevpoints=11, # how many prevalence points will be extracted from the interval [0, 1] for each category
n_repetitions=1, # number of times each prevalence will be used to generate a test sample
n_jobs=-1, # indicates the number of parallel workers (-1 indicates, as in sklearn, all CPUs)
random_seed=42, # setting a random seed allows to replicate the test samples across runs
error_metrics=['mae', 'mrae', 'mkld'], # specify the evaluation metrics
verbose=True # set to True to show some standard-line outputs
)
````
The resulting report is a pandas' dataframe that can be directly printed.
Here, we set some display options from pandas just to make the output clearer;
note also that the estimated prevalences are shown as strings using the
function strprev function that simply converts a prevalence into a
string representing it, with a fixed decimal precision (default 3):
From a pandas' dataframe, it is straightforward to visualize all the results,
and compute the averaged values, e.g.:
```python
import pandas as pd
pd.set_option('display.expand_frame_repr', False)
pd.set_option("precision", 3)
df['estim-prev'] = df['estim-prev'].map(F.strprev)
print(df)
report['estim-prev'] = report['estim-prev'].map(F.strprev)
print(report)
print('Averaged values:')
print(report.mean())
```
The output should look like:
This will produce an output like:
```
true-prev estim-prev mae mrae mkld
0 [0.0, 1.0] [0.000, 1.000] 0.000 0.000 0.000e+00
1 [0.1, 0.9] [0.091, 0.909] 0.009 0.048 4.426e-04
2 [0.2, 0.8] [0.163, 0.837] 0.037 0.114 4.633e-03
3 [0.3, 0.7] [0.283, 0.717] 0.017 0.041 7.383e-04
4 [0.4, 0.6] [0.366, 0.634] 0.034 0.070 2.412e-03
5 [0.5, 0.5] [0.459, 0.541] 0.041 0.082 3.387e-03
6 [0.6, 0.4] [0.565, 0.435] 0.035 0.073 2.535e-03
7 [0.7, 0.3] [0.654, 0.346] 0.046 0.108 4.701e-03
8 [0.8, 0.2] [0.725, 0.275] 0.075 0.235 1.515e-02
9 [0.9, 0.1] [0.858, 0.142] 0.042 0.229 7.740e-03
10 [1.0, 0.0] [0.945, 0.055] 0.055 27.357 5.219e-02
true-prev estim-prev mae mrae mkld
0 [0.308, 0.692] [0.314, 0.686] 0.005649 0.013182 0.000074
1 [0.896, 0.104] [0.909, 0.091] 0.013145 0.069323 0.000985
2 [0.848, 0.152] [0.809, 0.191] 0.039063 0.149806 0.005175
3 [0.016, 0.984] [0.033, 0.967] 0.017236 0.487529 0.005298
4 [0.728, 0.272] [0.751, 0.249] 0.022769 0.057146 0.001350
... ... ... ... ... ...
4995 [0.72, 0.28] [0.698, 0.302] 0.021752 0.053631 0.001133
4996 [0.868, 0.132] [0.888, 0.112] 0.020490 0.088230 0.001985
4997 [0.292, 0.708] [0.298, 0.702] 0.006149 0.014788 0.000090
4998 [0.24, 0.76] [0.220, 0.780] 0.019950 0.054309 0.001127
4999 [0.948, 0.052] [0.965, 0.035] 0.016941 0.165776 0.003538
[5000 rows x 5 columns]
Averaged values:
mae 0.023588
mrae 0.108779
mkld 0.003631
dtype: float64
Process finished with exit code 0
```
One can get the averaged scores using standard pandas'
functions, i.e.:
Alternatively, we can simply generate all the predictions by:
```python
print(df.mean())
true_prevs, estim_prevs = qp.evaluation.prediction(quantifier, protocol=prot)
```
will produce the following output:
```
true-prev 0.500
mae 0.035
mrae 2.578
mkld 0.009
dtype: float64
```
Other evaluation functions include:
* _artificial_sampling_eval_: that computes the evaluation for a
given evaluation metric, returning the average instead of a dataframe.
* _artificial_sampling_prediction_: that returns two np.arrays containing the
true prevalences and the estimated prevalences.
See the documentation for further details.
All the evaluation functions implement specific optimizations for speeding-up
the evaluation of aggregative quantifiers (i.e., of instances of _AggregativeQuantifier_).
The optimization comes down to generating classification predictions (either crisp or soft)
only once for the entire test set, and then applying the sampling procedure to the
predictions, instead of generating samples of instances and then computing the
classification predictions every time. This is only possible when the protocol
is an instance of _OnLabelledCollectionProtocol_. The optimization is only
carried out when the number of classification predictions thus generated would be
smaller than the number of predictions required for the entire protocol; e.g.,
if the original dataset contains 1M instances, but the protocol is such that it would
at most generate 20 samples of 100 instances, then it would be preferable to postpone the
classification for each sample. This behaviour is indicated by setting
_aggr_speedup="auto"_. Conversely, when indicating _aggr_speedup="force"_ QuaPy will
precompute all the predictions irrespectively of the number of instances and number of samples.
Finally, this can be deactivated by setting _aggr_speedup=False_. Note that this optimization
is not only applied for the final evaluation, but also for the internal evaluations carried
out during _model selection_. Since these are typically many, the heuristic can help reduce the
execution time a lot.

View File

@ -16,12 +16,6 @@ and implement some abstract methods:
@abstractmethod
def quantify(self, instances): ...
@abstractmethod
def set_params(self, **parameters): ...
@abstractmethod
def get_params(self, deep=True): ...
```
The meaning of those functions should be familiar to those
used to work with scikit-learn since the class structure of QuaPy
@ -32,10 +26,10 @@ scikit-learn' structure has not been adopted _as is_ in QuaPy responds to
the fact that scikit-learn's _predict_ function is expected to return
one output for each input element --e.g., a predicted label for each
instance in a sample-- while in quantification the output for a sample
is one single array of class prevalences), while functions _set_params_
and _get_params_ allow a
[model selector](https://github.com/HLT-ISTI/QuaPy/wiki/Model-Selection)
to automate the process of hyperparameter search.
is one single array of class prevalences).
Quantifiers also extend from scikit-learn's `BaseEstimator`, in order
to simplify the use of _set_params_ and _get_params_ used in
[model selector](https://github.com/HLT-ISTI/QuaPy/wiki/Model-Selection).
## Aggregative Methods
@ -58,11 +52,11 @@ of _BaseQuantifier.quantify_ is already provided, which looks like:
```python
def quantify(self, instances):
classif_predictions = self.preclassify(instances)
classif_predictions = self.classify(instances)
return self.aggregate(classif_predictions)
```
Aggregative quantifiers are expected to maintain a classifier (which is
accessed through the _@property_ _learner_). This classifier is
accessed through the _@property_ _classifier_). This classifier is
given as input to the quantifier, and can be already fit
on external data (in which case, the _fit_learner_ argument should
be set to False), or be fit by the quantifier's fit (default).
@ -73,13 +67,8 @@ _AggregativeProbabilisticQuantifier(AggregativeQuantifier)_.
The particularity of _probabilistic_ aggregative methods (w.r.t.
non-probabilistic ones), is that the default quantifier is defined
in terms of the posterior probabilities returned by a probabilistic
classifier, and not by the crisp decisions of a hard classifier; i.e.:
```python
def quantify(self, instances):
classif_posteriors = self.posterior_probabilities(instances)
return self.aggregate(classif_posteriors)
```
classifier, and not by the crisp decisions of a hard classifier.
In any case, the interface _classify(instances)_ remains unchanged.
One advantage of _aggregative_ methods (either probabilistic or not)
is that the evaluation according to any sampling procedure (e.g.,
@ -110,9 +99,7 @@ import quapy as qp
import quapy.functional as F
from sklearn.svm import LinearSVC
dataset = qp.datasets.fetch_twitter('hcr', pickle=True)
training = dataset.training
test = dataset.test
training, test = qp.datasets.fetch_twitter('hcr', pickle=True).train_test
# instantiate a classifier learner, in this case a SVM
svm = LinearSVC()
@ -156,11 +143,12 @@ model.fit(training, val_split=5)
```
The following code illustrates the case in which PCC is used:
```python
model = qp.method.aggregative.PCC(svm)
model.fit(training)
estim_prevalence = model.quantify(test.instances)
print('classifier:', model.learner)
print('classifier:', model.classifier)
```
In this case, QuaPy will print:
```
@ -211,14 +199,22 @@ model.fit(dataset.training)
estim_prevalence = model.quantify(dataset.test.instances)
```
_New in v0.1.7_: EMQ now accepts two new parameters in the construction method, namely
_exact_train_prev_ which allows to use the true training prevalence as the departing
prevalence estimation (default behaviour), or instead an approximation of it as
suggested by [Alexandari et al. (2020)](http://proceedings.mlr.press/v119/alexandari20a.html)
(by setting _exact_train_prev=False_).
The other parameter is _recalib_ which allows to indicate a calibration method, among those
proposed by [Alexandari et al. (2020)](http://proceedings.mlr.press/v119/alexandari20a.html),
including the Bias-Corrected Temperature Scaling, Vector Scaling, etc.
See the API documentation for further details.
### Hellinger Distance y (HDy)
The method HDy is described in:
_Implementation of the method based on the Hellinger Distance y (HDy) proposed by
González-Castro, V., Alaiz-Rodrı́guez, R., and Alegre, E. (2013). Class distribution
estimation based on the Hellinger distance. Information Sciences, 218:146164._
Implementation of the method based on the Hellinger Distance y (HDy) proposed by
[González-Castro, V., Alaiz-Rodrı́guez, R., and Alegre, E. (2013). Class distribution
estimation based on the Hellinger distance. Information Sciences, 218:146164.](https://www.sciencedirect.com/science/article/pii/S0020025512004069)
It is implemented in _qp.method.aggregative.HDy_ (also accessible
through the allias _qp.method.aggregative.HellingerDistanceY_).
@ -249,30 +245,51 @@ model.fit(dataset.training)
estim_prevalence = model.quantify(dataset.test.instances)
```
_New in v0.1.7:_ QuaPy now provides an implementation of the generalized
"Distribution Matching" approaches for multiclass, inspired by the framework
of [Firat (2016)](https://arxiv.org/abs/1606.00868). One can instantiate
a variant of HDy for multiclass quantification as follows:
```python
mutliclassHDy = qp.method.aggregative.DistributionMatching(classifier=LogisticRegression(), divergence='HD', cdf=False)
```
_New in v0.1.7:_ QuaPy now provides an implementation of the "DyS"
framework proposed by [Maletzke et al (2020)](https://ojs.aaai.org/index.php/AAAI/article/view/4376)
and the "SMM" method proposed by [Hassan et al (2019)](https://ieeexplore.ieee.org/document/9260028)
(thanks to _Pablo González_ for the contributions!)
### Threshold Optimization methods
_New in v0.1.7:_ QuaPy now implements Forman's threshold optimization methods;
see, e.g., [(Forman 2006)](https://dl.acm.org/doi/abs/10.1145/1150402.1150423)
and [(Forman 2008)](https://link.springer.com/article/10.1007/s10618-008-0097-y).
These include: T50, MAX, X, Median Sweep (MS), and its variant MS2.
### Explicit Loss Minimization
The Explicit Loss Minimization (ELM) represent a family of methods
based on structured output learning, i.e., quantifiers relying on
classifiers that have been optimized targeting a
quantification-oriented evaluation measure.
The original methods are implemented in QuaPy as classify & count (CC)
quantifiers that use Joachim's [SVMperf](https://www.cs.cornell.edu/people/tj/svm_light/svm_perf.html)
as the underlying classifier, properly set to optimize for the desired loss.
In QuaPy, the following methods, all relying on Joachim's
[SVMperf](https://www.cs.cornell.edu/people/tj/svm_light/svm_perf.html)
implementation, are available in _qp.method.aggregative_:
In QuaPy, this can be more achieved by calling the functions:
* SVMQ (SVM-Q) is a quantification method optimizing the metric _Q_ defined
in _Barranquero, J., Díez, J., and del Coz, J. J. (2015). Quantification-oriented learning based
on reliable classifiers. Pattern Recognition, 48(2):591604._
* SVMKLD (SVM for Kullback-Leibler Divergence) proposed in _Esuli, A. and Sebastiani, F. (2015).
* _newSVMQ_: returns the quantification method called SVM(Q) that optimizes for the metric _Q_ defined
in [_Barranquero, J., Díez, J., and del Coz, J. J. (2015). Quantification-oriented learning based
on reliable classifiers. Pattern Recognition, 48(2):591604._](https://www.sciencedirect.com/science/article/pii/S003132031400291X)
* _newSVMKLD_ and _newSVMNKLD_: returns the quantification method called SVM(KLD) and SVM(nKLD), standing for
Kullback-Leibler Divergence and Normalized Kullback-Leibler Divergence, as proposed in [_Esuli, A. and Sebastiani, F. (2015).
Optimizing text quantifiers for multivariate loss functions.
ACM Transactions on Knowledge Discovery and Data, 9(4):Article 27._
* SVMNKLD (SVM for Normalized Kullback-Leibler Divergence) proposed in _Esuli, A. and Sebastiani, F. (2015).
Optimizing text quantifiers for multivariate loss functions.
ACM Transactions on Knowledge Discovery and Data, 9(4):Article 27._
* SVMAE (SVM for Mean Absolute Error)
* SVMRAE (SVM for Mean Relative Absolute Error)
ACM Transactions on Knowledge Discovery and Data, 9(4):Article 27._](https://dl.acm.org/doi/abs/10.1145/2700406)
* _newSVMAE_ and _newSVMRAE_: returns a quantification method called SVM(AE) and SVM(RAE) that optimizes for the (Mean) Absolute Error and for the
(Mean) Relative Absolute Error, as first used by
[_Moreo, A. and Sebastiani, F. (2021). Tweet sentiment quantification: An experimental re-evaluation. PLOS ONE 17 (9), 1-23._](https://arxiv.org/abs/2011.02552)
the last two methods (SVMAE and SVMRAE) have been implemented in
the last two methods (SVM(AE) and SVM(RAE)) have been implemented in
QuaPy in order to make available ELM variants for what nowadays
are considered the most well-behaved evaluation metrics in quantification.
@ -306,13 +323,18 @@ currently supports only binary classification.
ELM variants (any binary quantifier in general) can be extended
to operate in single-label scenarios trivially by adopting a
"one-vs-all" strategy (as, e.g., in
_Gao, W. and Sebastiani, F. (2016). From classification to quantification in tweet sentiment
analysis. Social Network Analysis and Mining, 6(19):122_).
In QuaPy this is possible by using the _OneVsAll_ class:
[_Gao, W. and Sebastiani, F. (2016). From classification to quantification in tweet sentiment
analysis. Social Network Analysis and Mining, 6(19):122_](https://link.springer.com/article/10.1007/s13278-016-0327-z)).
In QuaPy this is possible by using the _OneVsAll_ class.
There are two ways for instantiating this class, _OneVsAllGeneric_ that works for
any quantifier, and _OneVsAllAggregative_ that is optimized for aggregative quantifiers.
In general, you can simply use the _getOneVsAll_ function and QuaPy will choose
the more convenient of the two.
```python
import quapy as qp
from quapy.method.aggregative import SVMQ, OneVsAll
from quapy.method.aggregative import SVMQ
# load a single-label dataset (this one contains 3 classes)
dataset = qp.datasets.fetch_twitter('hcr', pickle=True)
@ -320,11 +342,14 @@ dataset = qp.datasets.fetch_twitter('hcr', pickle=True)
# let qp know where svmperf is
qp.environ['SVMPERF_HOME'] = '../svm_perf_quantification'
model = OneVsAll(SVMQ(), n_jobs=-1) # run them on parallel
model = getOneVsAll(SVMQ(), n_jobs=-1) # run them on parallel
model.fit(dataset.training)
estim_prevalence = model.quantify(dataset.test.instances)
```
Check the examples _[explicit_loss_minimization.py](..%2Fexamples%2Fexplicit_loss_minimization.py)_
and [one_vs_all.py](..%2Fexamples%2Fone_vs_all.py) for more details.
## Meta Models
By _meta_ models we mean quantification methods that are defined on top of other
@ -337,12 +362,12 @@ _Meta_ models are implemented in the _qp.method.meta_ module.
QuaPy implements (some of) the variants proposed in:
* _Pérez-Gállego, P., Quevedo, J. R., & del Coz, J. J. (2017).
* [_Pérez-Gállego, P., Quevedo, J. R., & del Coz, J. J. (2017).
Using ensembles for problems with characterizable changes in data distribution: A case study on quantification.
Information Fusion, 34, 87-100._
* _Pérez-Gállego, P., Castano, A., Quevedo, J. R., & del Coz, J. J. (2019).
Information Fusion, 34, 87-100._](https://www.sciencedirect.com/science/article/pii/S1566253516300628)
* [_Pérez-Gállego, P., Castano, A., Quevedo, J. R., & del Coz, J. J. (2019).
Dynamic ensemble selection for quantification tasks.
Information Fusion, 45, 1-15._
Information Fusion, 45, 1-15._](https://www.sciencedirect.com/science/article/pii/S1566253517303652)
The following code shows how to instantiate an Ensemble of 30 _Adjusted Classify & Count_ (ACC)
quantifiers operating with a _Logistic Regressor_ (LR) as the base classifier, and using the
@ -378,10 +403,10 @@ wiki if you want to optimize the hyperparameters of ensemble for classification
QuaPy offers an implementation of QuaNet, a deep learning model presented in:
_Esuli, A., Moreo, A., & Sebastiani, F. (2018, October).
[_Esuli, A., Moreo, A., & Sebastiani, F. (2018, October).
A recurrent neural network for sentiment quantification.
In Proceedings of the 27th ACM International Conference on
Information and Knowledge Management (pp. 1775-1778)._
Information and Knowledge Management (pp. 1775-1778)._](https://dl.acm.org/doi/abs/10.1145/3269206.3269287)
This model requires _torch_ to be installed.
QuaNet also requires a classifier that can provide embedded representations
@ -406,7 +431,8 @@ cnn = CNNnet(dataset.vocabulary_size, dataset.n_classes)
learner = NeuralClassifierTrainer(cnn, device='cuda')
# train QuaNet
model = QuaNet(learner, qp.environ['SAMPLE_SIZE'], device='cuda')
model = QuaNet(learner, device='cuda')
model.fit(dataset.training)
estim_prevalence = model.quantify(dataset.test.instances)
```

View File

@ -22,9 +22,9 @@ Quantification has long been regarded as an add-on of
classification, and thus the model selection strategies
customarily adopted in classification have simply been
applied to quantification (see the next section).
It has been argued in _Moreo, Alejandro, and Fabrizio Sebastiani.
"Re-Assessing the" Classify and Count" Quantification Method."
arXiv preprint arXiv:2011.02552 (2020)._
It has been argued in [Moreo, Alejandro, and Fabrizio Sebastiani.
Re-Assessing the "Classify and Count" Quantification Method.
ECIR 2021: Advances in Information Retrieval pp 7591.](https://link.springer.com/chapter/10.1007/978-3-030-72240-1_6)
that specific model selection strategies should
be adopted for quantification. That is, model selection
strategies for quantification should target
@ -32,76 +32,86 @@ quantification-oriented losses and be tested in a variety
of scenarios exhibiting different degrees of prior
probability shift.
The class
_qp.model_selection.GridSearchQ_
implements a grid-search exploration over the space of
hyper-parameter combinations that evaluates each
combination of hyper-parameters
by means of a given quantification-oriented
The class _qp.model_selection.GridSearchQ_ implements a grid-search exploration over the space of
hyper-parameter combinations that [evaluates](https://github.com/HLT-ISTI/QuaPy/wiki/Evaluation)
each combination of hyper-parameters by means of a given quantification-oriented
error metric (e.g., any of the error functions implemented
in _qp.error_) and according to the
[_artificial sampling protocol_](https://github.com/HLT-ISTI/QuaPy/wiki/Evaluation).
in _qp.error_) and according to a
[sampling generation protocol](https://github.com/HLT-ISTI/QuaPy/wiki/Protocols).
The following is an example of model selection for quantification:
The following is an example (also included in the examples folder) of model selection for quantification:
```python
import quapy as qp
from quapy.method.aggregative import PCC
from quapy.protocol import APP
from quapy.method.aggregative import DistributionMatching
from sklearn.linear_model import LogisticRegression
import numpy as np
# set a seed to replicate runs
np.random.seed(0)
qp.environ['SAMPLE_SIZE'] = 500
"""
In this example, we show how to perform model selection on a DistributionMatching quantifier.
"""
dataset = qp.datasets.fetch_reviews('hp', tfidf=True, min_df=5)
model = DistributionMatching(LogisticRegression())
qp.environ['SAMPLE_SIZE'] = 100
qp.environ['N_JOBS'] = -1 # explore hyper-parameters in parallel
training, test = qp.datasets.fetch_reviews('imdb', tfidf=True, min_df=5).train_test
# The model will be returned by the fit method of GridSearchQ.
# Model selection will be performed with a fixed budget of 1000 evaluations
# for each hyper-parameter combination. The error to optimize is the MAE for
# quantification, as evaluated on artificially drawn samples at prevalences
# covering the entire spectrum on a held-out portion (40%) of the training set.
# Every combination of hyper-parameters will be evaluated by confronting the
# quantifier thus configured against a series of samples generated by means
# of a sample generation protocol. For this example, we will use the
# artificial-prevalence protocol (APP), that generates samples with prevalence
# values in the entire range of values from a grid (e.g., [0, 0.1, 0.2, ..., 1]).
# We devote 30% of the dataset for this exploration.
training, validation = training.split_stratified(train_prop=0.7)
protocol = APP(validation)
# We will explore a classification-dependent hyper-parameter (e.g., the 'C'
# hyper-parameter of LogisticRegression) and a quantification-dependent hyper-parameter
# (e.g., the number of bins in a DistributionMatching quantifier.
# Classifier-dependent hyper-parameters have to be marked with a prefix "classifier__"
# in order to let the quantifier know this hyper-parameter belongs to its underlying
# classifier.
param_grid = {
'classifier__C': np.logspace(-3,3,7),
'nbins': [8, 16, 32, 64],
}
model = qp.model_selection.GridSearchQ(
model=PCC(LogisticRegression()),
param_grid={'C': np.logspace(-4,5,10), 'class_weight': ['balanced', None]},
sample_size=qp.environ['SAMPLE_SIZE'],
eval_budget=1000,
error='mae',
refit=True, # retrain on the whole labelled set
val_split=0.4,
model=model,
param_grid=param_grid,
protocol=protocol,
error='mae', # the error to optimize is the MAE (a quantification-oriented loss)
refit=True, # retrain on the whole labelled set once done
verbose=True # show information as the process goes on
).fit(dataset.training)
).fit(training)
print(f'model selection ended: best hyper-parameters={model.best_params_}')
model = model.best_model_
# evaluation in terms of MAE
results = qp.evaluation.artificial_sampling_eval(
model,
dataset.test,
sample_size=qp.environ['SAMPLE_SIZE'],
n_prevpoints=101,
n_repetitions=10,
error_metric='mae'
)
# we use the same evaluation protocol (APP) on the test set
mae_score = qp.evaluation.evaluate(model, protocol=APP(test), error_metric='mae')
print(f'MAE={results:.5f}')
print(f'MAE={mae_score:.5f}')
```
In this example, the system outputs:
```
[GridSearchQ]: starting optimization with n_jobs=1
[GridSearchQ]: checking hyperparams={'C': 0.0001, 'class_weight': 'balanced'} got mae score 0.24987
[GridSearchQ]: checking hyperparams={'C': 0.0001, 'class_weight': None} got mae score 0.48135
[GridSearchQ]: checking hyperparams={'C': 0.001, 'class_weight': 'balanced'} got mae score 0.24866
[GridSearchQ]: starting model selection with self.n_jobs =-1
[GridSearchQ]: hyperparams={'classifier__C': 0.01, 'nbins': 64} got mae score 0.04021 [took 1.1356s]
[GridSearchQ]: hyperparams={'classifier__C': 0.01, 'nbins': 32} got mae score 0.04286 [took 1.2139s]
[GridSearchQ]: hyperparams={'classifier__C': 0.01, 'nbins': 16} got mae score 0.04888 [took 1.2491s]
[GridSearchQ]: hyperparams={'classifier__C': 0.001, 'nbins': 8} got mae score 0.05163 [took 1.5372s]
[...]
[GridSearchQ]: checking hyperparams={'C': 100000.0, 'class_weight': None} got mae score 0.43676
[GridSearchQ]: optimization finished: best params {'C': 0.1, 'class_weight': 'balanced'} (score=0.19982)
[GridSearchQ]: hyperparams={'classifier__C': 1000.0, 'nbins': 32} got mae score 0.02445 [took 2.9056s]
[GridSearchQ]: optimization finished: best params {'classifier__C': 100.0, 'nbins': 32} (score=0.02234) [took 7.3114s]
[GridSearchQ]: refitting on the whole development set
model selection ended: best hyper-parameters={'C': 0.1, 'class_weight': 'balanced'}
1010 evaluations will be performed for each combination of hyper-parameters
[artificial sampling protocol] generating predictions: 100%|██████████| 1010/1010 [00:00<00:00, 5005.54it/s]
MAE=0.20342
model selection ended: best hyper-parameters={'classifier__C': 100.0, 'nbins': 32}
MAE=0.03102
```
The parameter _val_split_ can alternatively be used to indicate
@ -121,39 +131,20 @@ quantification literature have opted for this strategy.
In QuaPy, this is achieved by simply instantiating the
classifier learner as a GridSearchCV from scikit-learn.
The following code illustrates how to do that:
The following code illustrates how to do that:
```python
learner = GridSearchCV(
LogisticRegression(),
param_grid={'C': np.logspace(-4, 5, 10), 'class_weight': ['balanced', None]},
cv=5)
model = PCC(learner).fit(dataset.training)
print(f'model selection ended: best hyper-parameters={model.learner.best_params_}')
model = DistributionMatching(learner).fit(dataset.training)
```
In this example, the system outputs:
```
model selection ended: best hyper-parameters={'C': 10000.0, 'class_weight': None}
1010 evaluations will be performed for each combination of hyper-parameters
[artificial sampling protocol] generating predictions: 100%|██████████| 1010/1010 [00:00<00:00, 5379.55it/s]
MAE=0.41734
```
Note that the MAE is worse than the one we obtained when optimizing
for quantification and, indeed, the hyper-parameters found optimal
largely differ between the two selection modalities. The
hyper-parameters C=10000 and class_weight=None have been found
to work well for the specific training prevalence of the HP dataset,
but these hyper-parameters turned out to be suboptimal when the
class prevalences of the test set differs (as is indeed tested
in scenarios of quantification).
This is, however, not always the case, and one could, in practice,
find examples
in which optimizing for classification ends up resulting in a better
quantifier than when optimizing for quantification.
Nonetheless, this is theoretically unlikely to happen.
However, this is conceptually flawed, since the model should be
optimized for the task at hand (quantification), and not for a surrogate task (classification),
i.e., the model should be requested to deliver low quantification errors, rather
than low classification errors.

View File

@ -43,7 +43,7 @@ quantification methods across different scenarios showcasing
the accuracy of the quantifier in predicting class prevalences
for a wide range of prior distributions. This can easily be
achieved by means of the
[artificial sampling protocol](https://github.com/HLT-ISTI/QuaPy/wiki/Evaluation)
[artificial sampling protocol](https://github.com/HLT-ISTI/QuaPy/wiki/Protocols)
that is implemented in QuaPy.
The following code shows how to perform one simple experiment
@ -55,6 +55,7 @@ generating 100 random samples at each prevalence).
```python
import quapy as qp
from protocol import APP
from quapy.method.aggregative import CC, ACC, PCC, PACC
from sklearn.svm import LinearSVC
@ -63,28 +64,26 @@ qp.environ['SAMPLE_SIZE'] = 500
def gen_data():
def base_classifier():
return LinearSVC()
return LinearSVC(class_weight='balanced')
def models():
yield CC(base_classifier())
yield ACC(base_classifier())
yield PCC(base_classifier())
yield PACC(base_classifier())
yield 'CC', CC(base_classifier())
yield 'ACC', ACC(base_classifier())
yield 'PCC', PCC(base_classifier())
yield 'PACC', PACC(base_classifier())
data = qp.datasets.fetch_reviews('kindle', tfidf=True, min_df=5)
train, test = qp.datasets.fetch_reviews('kindle', tfidf=True, min_df=5).train_test
method_names, true_prevs, estim_prevs, tr_prevs = [], [], [], []
for model in models():
model.fit(data.training)
true_prev, estim_prev = qp.evaluation.artificial_sampling_prediction(
model, data.test, qp.environ['SAMPLE_SIZE'], n_repetitions=100, n_prevpoints=21
)
for method_name, model in models():
model.fit(train)
true_prev, estim_prev = qp.evaluation.prediction(model, APP(test, repeats=100, random_state=0))
method_names.append(model.__class__.__name__)
method_names.append(method_name)
true_prevs.append(true_prev)
estim_prevs.append(estim_prev)
tr_prevs.append(data.training.prevalence())
tr_prevs.append(train.prevalence())
return method_names, true_prevs, estim_prevs, tr_prevs
@ -163,21 +162,19 @@ like this:
```python
def gen_data():
data = qp.datasets.fetch_reviews('imdb', tfidf=True, min_df=5)
train, test = qp.datasets.fetch_reviews('imdb', tfidf=True, min_df=5).train_test
model = CC(LinearSVC())
method_data = []
for training_prevalence in np.linspace(0.1, 0.9, 9):
training_size = 5000
# since the problem is binary, it suffices to specify the negative prevalence (the positive is constrained)
training = data.training.sampling(training_size, 1 - training_prevalence)
model.fit(training)
true_prev, estim_prev = qp.evaluation.artificial_sampling_prediction(
model, data.sample, qp.environ['SAMPLE_SIZE'], n_repetitions=100, n_prevpoints=21
)
# method names can contain Latex syntax
method_name = 'CC$_{' + f'{int(100 * training_prevalence)}' + '\%}$'
method_data.append((method_name, true_prev, estim_prev, training.prevalence()))
# since the problem is binary, it suffices to specify the negative prevalence, since the positive is constrained
train_sample = train.sampling(training_size, 1-training_prevalence)
model.fit(train_sample)
true_prev, estim_prev = qp.evaluation.prediction(model, APP(test, repeats=100, random_state=0))
method_name = 'CC$_{'+f'{int(100*training_prevalence)}' + '\%}$'
method_data.append((method_name, true_prev, estim_prev, train_sample.prevalence()))
return zip(*method_data)
```

View File

@ -64,6 +64,7 @@ Features
* 32 UCI Machine Learning datasets.
* 11 Twitter Sentiment datasets.
* 3 Reviews Sentiment datasets.
* 4 tasks from LeQua competition (_new in v0.1.7!_)
* Native supports for binary and single-label scenarios of quantification.
* Model selection functionality targeting quantification-oriented losses.
* Visualization tools for analysing results.
@ -75,8 +76,9 @@ Features
Installation
Datasets
Evaluation
Protocols
Methods
Model Selection
Model-Selection
Plotting
API Developers documentation<modules>

View File

@ -1,27 +1,38 @@
:tocdepth: 2
quapy.classification package
============================
Submodules
----------
quapy.classification.methods module
-----------------------------------
quapy.classification.calibration
--------------------------------
.. versionadded:: 0.1.7
.. automodule:: quapy.classification.calibration
:members:
:undoc-members:
:show-inheritance:
quapy.classification.methods
----------------------------
.. automodule:: quapy.classification.methods
:members:
:undoc-members:
:show-inheritance:
quapy.classification.neural module
----------------------------------
quapy.classification.neural
---------------------------
.. automodule:: quapy.classification.neural
:members:
:undoc-members:
:show-inheritance:
quapy.classification.svmperf module
-----------------------------------
quapy.classification.svmperf
----------------------------
.. automodule:: quapy.classification.svmperf
:members:

View File

@ -1,35 +1,37 @@
:tocdepth: 2
quapy.data package
==================
Submodules
----------
quapy.data.base module
----------------------
quapy.data.base
---------------
.. automodule:: quapy.data.base
:members:
:undoc-members:
:show-inheritance:
quapy.data.datasets module
--------------------------
quapy.data.datasets
-------------------
.. automodule:: quapy.data.datasets
:members:
:undoc-members:
:show-inheritance:
quapy.data.preprocessing module
-------------------------------
quapy.data.preprocessing
------------------------
.. automodule:: quapy.data.preprocessing
:members:
:undoc-members:
:show-inheritance:
quapy.data.reader module
------------------------
quapy.data.reader
-----------------
.. automodule:: quapy.data.reader
:members:

View File

@ -1,43 +1,45 @@
:tocdepth: 2
quapy.method package
====================
Submodules
----------
quapy.method.aggregative module
-------------------------------
quapy.method.aggregative
------------------------
.. automodule:: quapy.method.aggregative
:members:
:undoc-members:
:show-inheritance:
quapy.method.base module
------------------------
quapy.method.base
-----------------
.. automodule:: quapy.method.base
:members:
:undoc-members:
:show-inheritance:
quapy.method.meta module
------------------------
quapy.method.meta
-----------------
.. automodule:: quapy.method.meta
:members:
:undoc-members:
:show-inheritance:
quapy.method.neural module
--------------------------
quapy.method.neural
-------------------
.. automodule:: quapy.method.neural
:members:
:undoc-members:
:show-inheritance:
quapy.method.non\_aggregative module
------------------------------------
quapy.method.non\_aggregative
-----------------------------
.. automodule:: quapy.method.non_aggregative
:members:

View File

@ -1,68 +1,79 @@
:tocdepth: 2
quapy package
=============
Subpackages
-----------
.. toctree::
:maxdepth: 4
quapy.classification
quapy.data
quapy.method
quapy.tests
Submodules
----------
quapy.error module
------------------
quapy.error
-----------
.. automodule:: quapy.error
:members:
:undoc-members:
:show-inheritance:
quapy.evaluation module
-----------------------
quapy.evaluation
----------------
.. automodule:: quapy.evaluation
:members:
:undoc-members:
:show-inheritance:
quapy.functional module
-----------------------
quapy.protocol
--------------
.. versionadded:: 0.1.7
.. automodule:: quapy.protocol
:members:
:undoc-members:
:show-inheritance:
quapy.functional
----------------
.. automodule:: quapy.functional
:members:
:undoc-members:
:show-inheritance:
quapy.model\_selection module
-----------------------------
quapy.model\_selection
----------------------
.. automodule:: quapy.model_selection
:members:
:undoc-members:
:show-inheritance:
quapy.plot module
-----------------
quapy.plot
----------
.. automodule:: quapy.plot
:members:
:undoc-members:
:show-inheritance:
quapy.util module
-----------------
quapy.util
----------
.. automodule:: quapy.util
:members:
:undoc-members:
:show-inheritance:
Subpackages
-----------
.. toctree::
:maxdepth: 3
quapy.classification
quapy.data
quapy.method
Module contents
---------------
@ -70,3 +81,4 @@ Module contents
:members:
:undoc-members:
:show-inheritance:

View File

@ -1,37 +0,0 @@
quapy.tests package
===================
Submodules
----------
quapy.tests.test\_base module
-----------------------------
.. automodule:: quapy.tests.test_base
:members:
:undoc-members:
:show-inheritance:
quapy.tests.test\_datasets module
---------------------------------
.. automodule:: quapy.tests.test_datasets
:members:
:undoc-members:
:show-inheritance:
quapy.tests.test\_methods module
--------------------------------
.. automodule:: quapy.tests.test_methods
:members:
:undoc-members:
:show-inheritance:
Module contents
---------------
.. automodule:: quapy.tests
:members:
:undoc-members:
:show-inheritance:

View File

@ -1,7 +0,0 @@
Getting Started
===============
QuaPy is an open source framework for Quantification (a.k.a. Supervised Prevalence Estimation) written in Python.
Installation
------------
>>> pip install quapy

View File

@ -1 +0,0 @@
.. include:: ../../README.md

View File

@ -1,701 +0,0 @@
@import url("basic.css");
/* -- page layout ----------------------------------------------------------- */
body {
font-family: Georgia, serif;
font-size: 17px;
background-color: #fff;
color: #000;
margin: 0;
padding: 0;
}
div.document {
width: 940px;
margin: 30px auto 0 auto;
}
div.documentwrapper {
float: left;
width: 100%;
}
div.bodywrapper {
margin: 0 0 0 220px;
}
div.sphinxsidebar {
width: 220px;
font-size: 14px;
line-height: 1.5;
}
hr {
border: 1px solid #B1B4B6;
}
div.body {
background-color: #fff;
color: #3E4349;
padding: 0 30px 0 30px;
}
div.body > .section {
text-align: left;
}
div.footer {
width: 940px;
margin: 20px auto 30px auto;
font-size: 14px;
color: #888;
text-align: right;
}
div.footer a {
color: #888;
}
p.caption {
font-family: inherit;
font-size: inherit;
}
div.relations {
display: none;
}
div.sphinxsidebar a {
color: #444;
text-decoration: none;
border-bottom: 1px dotted #999;
}
div.sphinxsidebar a:hover {
border-bottom: 1px solid #999;
}
div.sphinxsidebarwrapper {
padding: 18px 10px;
}
div.sphinxsidebarwrapper p.logo {
padding: 0;
margin: -10px 0 0 0px;
text-align: center;
}
div.sphinxsidebarwrapper h1.logo {
margin-top: -10px;
text-align: center;
margin-bottom: 5px;
text-align: left;
}
div.sphinxsidebarwrapper h1.logo-name {
margin-top: 0px;
}
div.sphinxsidebarwrapper p.blurb {
margin-top: 0;
font-style: normal;
}
div.sphinxsidebar h3,
div.sphinxsidebar h4 {
font-family: Georgia, serif;
color: #444;
font-size: 24px;
font-weight: normal;
margin: 0 0 5px 0;
padding: 0;
}
div.sphinxsidebar h4 {
font-size: 20px;
}
div.sphinxsidebar h3 a {
color: #444;
}
div.sphinxsidebar p.logo a,
div.sphinxsidebar h3 a,
div.sphinxsidebar p.logo a:hover,
div.sphinxsidebar h3 a:hover {
border: none;
}
div.sphinxsidebar p {
color: #555;
margin: 10px 0;
}
div.sphinxsidebar ul {
margin: 10px 0;
padding: 0;
color: #000;
}
div.sphinxsidebar ul li.toctree-l1 > a {
font-size: 120%;
}
div.sphinxsidebar ul li.toctree-l2 > a {
font-size: 110%;
}
div.sphinxsidebar input {
border: 1px solid #CCC;
font-family: Georgia, serif;
font-size: 1em;
}
div.sphinxsidebar hr {
border: none;
height: 1px;
color: #AAA;
background: #AAA;
text-align: left;
margin-left: 0;
width: 50%;
}
div.sphinxsidebar .badge {
border-bottom: none;
}
div.sphinxsidebar .badge:hover {
border-bottom: none;
}
/* To address an issue with donation coming after search */
div.sphinxsidebar h3.donation {
margin-top: 10px;
}
/* -- body styles ----------------------------------------------------------- */
a {
color: #004B6B;
text-decoration: underline;
}
a:hover {
color: #6D4100;
text-decoration: underline;
}
div.body h1,
div.body h2,
div.body h3,
div.body h4,
div.body h5,
div.body h6 {
font-family: Georgia, serif;
font-weight: normal;
margin: 30px 0px 10px 0px;
padding: 0;
}
div.body h1 { margin-top: 0; padding-top: 0; font-size: 240%; }
div.body h2 { font-size: 180%; }
div.body h3 { font-size: 150%; }
div.body h4 { font-size: 130%; }
div.body h5 { font-size: 100%; }
div.body h6 { font-size: 100%; }
a.headerlink {
color: #DDD;
padding: 0 4px;
text-decoration: none;
}
a.headerlink:hover {
color: #444;
background: #EAEAEA;
}
div.body p, div.body dd, div.body li {
line-height: 1.4em;
}
div.admonition {
margin: 20px 0px;
padding: 10px 30px;
background-color: #EEE;
border: 1px solid #CCC;
}
div.admonition tt.xref, div.admonition code.xref, div.admonition a tt {
background-color: #FBFBFB;
border-bottom: 1px solid #fafafa;
}
div.admonition p.admonition-title {
font-family: Georgia, serif;
font-weight: normal;
font-size: 24px;
margin: 0 0 10px 0;
padding: 0;
line-height: 1;
}
div.admonition p.last {
margin-bottom: 0;
}
div.highlight {
background-color: #fff;
}
dt:target, .highlight {
background: #FAF3E8;
}
div.warning {
background-color: #FCC;
border: 1px solid #FAA;
}
div.danger {
background-color: #FCC;
border: 1px solid #FAA;
-moz-box-shadow: 2px 2px 4px #D52C2C;
-webkit-box-shadow: 2px 2px 4px #D52C2C;
box-shadow: 2px 2px 4px #D52C2C;
}
div.error {
background-color: #FCC;
border: 1px solid #FAA;
-moz-box-shadow: 2px 2px 4px #D52C2C;
-webkit-box-shadow: 2px 2px 4px #D52C2C;
box-shadow: 2px 2px 4px #D52C2C;
}
div.caution {
background-color: #FCC;
border: 1px solid #FAA;
}
div.attention {
background-color: #FCC;
border: 1px solid #FAA;
}
div.important {
background-color: #EEE;
border: 1px solid #CCC;
}
div.note {
background-color: #EEE;
border: 1px solid #CCC;
}
div.tip {
background-color: #EEE;
border: 1px solid #CCC;
}
div.hint {
background-color: #EEE;
border: 1px solid #CCC;
}
div.seealso {
background-color: #EEE;
border: 1px solid #CCC;
}
div.topic {
background-color: #EEE;
}
p.admonition-title {
display: inline;
}
p.admonition-title:after {
content: ":";
}
pre, tt, code {
font-family: 'Consolas', 'Menlo', 'DejaVu Sans Mono', 'Bitstream Vera Sans Mono', monospace;
font-size: 0.9em;
}
.hll {
background-color: #FFC;
margin: 0 -12px;
padding: 0 12px;
display: block;
}
img.screenshot {
}
tt.descname, tt.descclassname, code.descname, code.descclassname {
font-size: 0.95em;
}
tt.descname, code.descname {
padding-right: 0.08em;
}
img.screenshot {
-moz-box-shadow: 2px 2px 4px #EEE;
-webkit-box-shadow: 2px 2px 4px #EEE;
box-shadow: 2px 2px 4px #EEE;
}
table.docutils {
border: 1px solid #888;
-moz-box-shadow: 2px 2px 4px #EEE;
-webkit-box-shadow: 2px 2px 4px #EEE;
box-shadow: 2px 2px 4px #EEE;
}
table.docutils td, table.docutils th {
border: 1px solid #888;
padding: 0.25em 0.7em;
}
table.field-list, table.footnote {
border: none;
-moz-box-shadow: none;
-webkit-box-shadow: none;
box-shadow: none;
}
table.footnote {
margin: 15px 0;
width: 100%;
border: 1px solid #EEE;
background: #FDFDFD;
font-size: 0.9em;
}
table.footnote + table.footnote {
margin-top: -15px;
border-top: none;
}
table.field-list th {
padding: 0 0.8em 0 0;
}
table.field-list td {
padding: 0;
}
table.field-list p {
margin-bottom: 0.8em;
}
/* Cloned from
* https://github.com/sphinx-doc/sphinx/commit/ef60dbfce09286b20b7385333d63a60321784e68
*/
.field-name {
-moz-hyphens: manual;
-ms-hyphens: manual;
-webkit-hyphens: manual;
hyphens: manual;
}
table.footnote td.label {
width: .1px;
padding: 0.3em 0 0.3em 0.5em;
}
table.footnote td {
padding: 0.3em 0.5em;
}
dl {
margin: 0;
padding: 0;
}
dl dd {
margin-left: 30px;
}
blockquote {
margin: 0 0 0 30px;
padding: 0;
}
ul, ol {
/* Matches the 30px from the narrow-screen "li > ul" selector below */
margin: 10px 0 10px 30px;
padding: 0;
}
pre {
background: #EEE;
padding: 7px 30px;
margin: 15px 0px;
line-height: 1.3em;
}
div.viewcode-block:target {
background: #ffd;
}
dl pre, blockquote pre, li pre {
margin-left: 0;
padding-left: 30px;
}
tt, code {
background-color: #ecf0f3;
color: #222;
/* padding: 1px 2px; */
}
tt.xref, code.xref, a tt {
background-color: #FBFBFB;
border-bottom: 1px solid #fff;
}
a.reference {
text-decoration: none;
border-bottom: 1px dotted #004B6B;
}
/* Don't put an underline on images */
a.image-reference, a.image-reference:hover {
border-bottom: none;
}
a.reference:hover {
border-bottom: 1px solid #6D4100;
}
a.footnote-reference {
text-decoration: none;
font-size: 0.7em;
vertical-align: top;
border-bottom: 1px dotted #004B6B;
}
a.footnote-reference:hover {
border-bottom: 1px solid #6D4100;
}
a:hover tt, a:hover code {
background: #EEE;
}
@media screen and (max-width: 870px) {
div.sphinxsidebar {
display: none;
}
div.document {
width: 100%;
}
div.documentwrapper {
margin-left: 0;
margin-top: 0;
margin-right: 0;
margin-bottom: 0;
}
div.bodywrapper {
margin-top: 0;
margin-right: 0;
margin-bottom: 0;
margin-left: 0;
}
ul {
margin-left: 0;
}
li > ul {
/* Matches the 30px from the "ul, ol" selector above */
margin-left: 30px;
}
.document {
width: auto;
}
.footer {
width: auto;
}
.bodywrapper {
margin: 0;
}
.footer {
width: auto;
}
.github {
display: none;
}
}
@media screen and (max-width: 875px) {
body {
margin: 0;
padding: 20px 30px;
}
div.documentwrapper {
float: none;
background: #fff;
}
div.sphinxsidebar {
display: block;
float: none;
width: 102.5%;
margin: 50px -30px -20px -30px;
padding: 10px 20px;
background: #333;
color: #FFF;
}
div.sphinxsidebar h3, div.sphinxsidebar h4, div.sphinxsidebar p,
div.sphinxsidebar h3 a {
color: #fff;
}
div.sphinxsidebar a {
color: #AAA;
}
div.sphinxsidebar p.logo {
display: none;
}
div.document {
width: 100%;
margin: 0;
}
div.footer {
display: none;
}
div.bodywrapper {
margin: 0;
}
div.body {
min-height: 0;
padding: 0;
}
.rtd_doc_footer {
display: none;
}
.document {
width: auto;
}
.footer {
width: auto;
}
.footer {
width: auto;
}
.github {
display: none;
}
}
/* misc. */
.revsys-inline {
display: none!important;
}
/* Make nested-list/multi-paragraph items look better in Releases changelog
* pages. Without this, docutils' magical list fuckery causes inconsistent
* formatting between different release sub-lists.
*/
div#changelog > div.section > ul > li > p:only-child {
margin-bottom: 0;
}
/* Hide fugly table cell borders in ..bibliography:: directive output */
table.docutils.citation, table.docutils.citation td, table.docutils.citation th {
border: none;
/* Below needed in some edge cases; if not applied, bottom shadows appear */
-moz-box-shadow: none;
-webkit-box-shadow: none;
box-shadow: none;
}
/* relbar */
.related {
line-height: 30px;
width: 100%;
font-size: 0.9rem;
}
.related.top {
border-bottom: 1px solid #EEE;
margin-bottom: 20px;
}
.related.bottom {
border-top: 1px solid #EEE;
}
.related ul {
padding: 0;
margin: 0;
list-style: none;
}
.related li {
display: inline;
}
nav#rellinks {
float: right;
}
nav#rellinks li+li:before {
content: "|";
}
nav#breadcrumbs li+li:before {
content: "\00BB";
}
/* Hide certain items when printing */
@media print {
div.related {
display: none;
}
}

View File

@ -4,7 +4,7 @@
*
* Sphinx stylesheet -- basic theme.
*
* :copyright: Copyright 2007-2021 by the Sphinx team, see AUTHORS.
* :copyright: Copyright 2007-2022 by the Sphinx team, see AUTHORS.
* :license: BSD, see LICENSE for details.
*
*/
@ -222,7 +222,7 @@ table.modindextable td {
/* -- general body styles --------------------------------------------------- */
div.body {
min-width: 450px;
min-width: 360px;
max-width: 800px;
}
@ -237,16 +237,6 @@ a.headerlink {
visibility: hidden;
}
a.brackets:before,
span.brackets > a:before{
content: "[";
}
a.brackets:after,
span.brackets > a:after {
content: "]";
}
h1:hover > a.headerlink,
h2:hover > a.headerlink,
h3:hover > a.headerlink,
@ -334,13 +324,15 @@ aside.sidebar {
p.sidebar-title {
font-weight: bold;
}
nav.contents,
aside.topic,
div.admonition, div.topic, blockquote {
clear: left;
}
/* -- topics ---------------------------------------------------------------- */
nav.contents,
aside.topic,
div.topic {
border: 1px solid #ccc;
padding: 7px;
@ -379,6 +371,8 @@ div.body p.centered {
div.sidebar > :last-child,
aside.sidebar > :last-child,
nav.contents > :last-child,
aside.topic > :last-child,
div.topic > :last-child,
div.admonition > :last-child {
margin-bottom: 0;
@ -386,6 +380,8 @@ div.admonition > :last-child {
div.sidebar::after,
aside.sidebar::after,
nav.contents::after,
aside.topic::after,
div.topic::after,
div.admonition::after,
blockquote::after {
@ -428,10 +424,6 @@ table.docutils td, table.docutils th {
border-bottom: 1px solid #aaa;
}
table.footnote td, table.footnote th {
border: 0 !important;
}
th {
text-align: left;
padding-right: 5px;
@ -614,20 +606,26 @@ ol.simple p,
ul.simple p {
margin-bottom: 0;
}
dl.footnote > dt,
dl.citation > dt {
aside.footnote > span,
div.citation > span {
float: left;
margin-right: 0.5em;
}
dl.footnote > dd,
dl.citation > dd {
aside.footnote > span:last-of-type,
div.citation > span:last-of-type {
padding-right: 0.5em;
}
aside.footnote > p {
margin-left: 2em;
}
div.citation > p {
margin-left: 4em;
}
aside.footnote > p:last-of-type,
div.citation > p:last-of-type {
margin-bottom: 0em;
}
dl.footnote > dd:after,
dl.citation > dd:after {
aside.footnote > p:last-of-type:after,
div.citation > p:last-of-type:after {
content: "";
clear: both;
}
@ -644,10 +642,6 @@ dl.field-list > dt {
padding-right: 5px;
}
dl.field-list > dt:after {
content: ":";
}
dl.field-list > dd {
padding-left: 0.5em;
margin-top: 0em;
@ -731,8 +725,9 @@ dl.glossary dt {
.classifier:before {
font-style: normal;
margin: 0.5em;
margin: 0 0.5em;
content: ":";
display: inline-block;
}
abbr, acronym {
@ -756,6 +751,7 @@ span.pre {
-ms-hyphens: none;
-webkit-hyphens: none;
hyphens: none;
white-space: nowrap;
}
div[class*="highlight-"] {

View File

@ -294,6 +294,8 @@ div.quotebar {
padding: 2px 7px;
border: 1px solid #ccc;
}
nav.contents,
aside.topic,
div.topic {
background-color: #f8f8f8;

View File

@ -9,33 +9,22 @@
// :copyright: Copyright 2012-2014 by Sphinx team, see AUTHORS.
// :license: BSD, see LICENSE for details.
//
$(document).ready(function(){
if (navigator.userAgent.indexOf('iPhone') > 0 ||
navigator.userAgent.indexOf('Android') > 0) {
$("li.nav-item-0 a").text("Top");
const initialiseBizStyle = () => {
if (navigator.userAgent.indexOf("iPhone") > 0 || navigator.userAgent.indexOf("Android") > 0) {
document.querySelector("li.nav-item-0 a").innerText = "Top"
}
const truncator = item => {if (item.textContent.length > 20) {
item.title = item.innerText
item.innerText = item.innerText.substr(0, 17) + "..."
}
}
document.querySelectorAll("div.related:first ul li:not(.right) a").slice(1).forEach(truncator);
document.querySelectorAll("div.related:last ul li:not(.right) a").slice(1).forEach(truncator);
}
$("div.related:first ul li:not(.right) a").slice(1).each(function(i, item){
if (item.text.length > 20) {
var tmpstr = item.text
$(item).attr("title", tmpstr);
$(item).text(tmpstr.substr(0, 17) + "...");
}
});
$("div.related:last ul li:not(.right) a").slice(1).each(function(i, item){
if (item.text.length > 20) {
var tmpstr = item.text
$(item).attr("title", tmpstr);
$(item).text(tmpstr.substr(0, 17) + "...");
}
});
});
window.addEventListener("resize",
() => (document.querySelector("li.nav-item-0 a").innerText = (window.innerWidth <= 776) ? "Top" : "QuaPy 0.1.7 documentation")
)
$(window).resize(function(){
if ($(window).width() <= 776) {
$("li.nav-item-0 a").text("Top");
}
else {
$("li.nav-item-0 a").text("QuaPy 0.1.6 documentation");
}
});
if (document.readyState !== "loading") initialiseBizStyle()
else document.addEventListener("DOMContentLoaded", initialiseBizStyle)

View File

@ -1 +0,0 @@
/* This file intentionally left blank. */

View File

@ -2,322 +2,155 @@
* doctools.js
* ~~~~~~~~~~~
*
* Sphinx JavaScript utilities for all documentation.
* Base JavaScript utilities for all Sphinx HTML documentation.
*
* :copyright: Copyright 2007-2021 by the Sphinx team, see AUTHORS.
* :copyright: Copyright 2007-2022 by the Sphinx team, see AUTHORS.
* :license: BSD, see LICENSE for details.
*
*/
"use strict";
/**
* select a different prefix for underscore
*/
$u = _.noConflict();
const BLACKLISTED_KEY_CONTROL_ELEMENTS = new Set([
"TEXTAREA",
"INPUT",
"SELECT",
"BUTTON",
]);
/**
* make the code below compatible with browsers without
* an installed firebug like debugger
if (!window.console || !console.firebug) {
var names = ["log", "debug", "info", "warn", "error", "assert", "dir",
"dirxml", "group", "groupEnd", "time", "timeEnd", "count", "trace",
"profile", "profileEnd"];
window.console = {};
for (var i = 0; i < names.length; ++i)
window.console[names[i]] = function() {};
}
*/
/**
* small helper function to urldecode strings
*
* See https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/decodeURIComponent#Decoding_query_parameters_from_a_URL
*/
jQuery.urldecode = function(x) {
if (!x) {
return x
const _ready = (callback) => {
if (document.readyState !== "loading") {
callback();
} else {
document.addEventListener("DOMContentLoaded", callback);
}
return decodeURIComponent(x.replace(/\+/g, ' '));
};
/**
* small helper function to urlencode strings
*/
jQuery.urlencode = encodeURIComponent;
/**
* This function returns the parsed url parameters of the
* current request. Multiple values per key are supported,
* it will always return arrays of strings for the value parts.
*/
jQuery.getQueryParameters = function(s) {
if (typeof s === 'undefined')
s = document.location.search;
var parts = s.substr(s.indexOf('?') + 1).split('&');
var result = {};
for (var i = 0; i < parts.length; i++) {
var tmp = parts[i].split('=', 2);
var key = jQuery.urldecode(tmp[0]);
var value = jQuery.urldecode(tmp[1]);
if (key in result)
result[key].push(value);
else
result[key] = [value];
}
return result;
};
/**
* highlight a given string on a jquery object by wrapping it in
* span elements with the given class name.
*/
jQuery.fn.highlightText = function(text, className) {
function highlight(node, addItems) {
if (node.nodeType === 3) {
var val = node.nodeValue;
var pos = val.toLowerCase().indexOf(text);
if (pos >= 0 &&
!jQuery(node.parentNode).hasClass(className) &&
!jQuery(node.parentNode).hasClass("nohighlight")) {
var span;
var isInSVG = jQuery(node).closest("body, svg, foreignObject").is("svg");
if (isInSVG) {
span = document.createElementNS("http://www.w3.org/2000/svg", "tspan");
} else {
span = document.createElement("span");
span.className = className;
}
span.appendChild(document.createTextNode(val.substr(pos, text.length)));
node.parentNode.insertBefore(span, node.parentNode.insertBefore(
document.createTextNode(val.substr(pos + text.length)),
node.nextSibling));
node.nodeValue = val.substr(0, pos);
if (isInSVG) {
var rect = document.createElementNS("http://www.w3.org/2000/svg", "rect");
var bbox = node.parentElement.getBBox();
rect.x.baseVal.value = bbox.x;
rect.y.baseVal.value = bbox.y;
rect.width.baseVal.value = bbox.width;
rect.height.baseVal.value = bbox.height;
rect.setAttribute('class', className);
addItems.push({
"parent": node.parentNode,
"target": rect});
}
}
}
else if (!jQuery(node).is("button, select, textarea")) {
jQuery.each(node.childNodes, function() {
highlight(this, addItems);
});
}
}
var addItems = [];
var result = this.each(function() {
highlight(this, addItems);
});
for (var i = 0; i < addItems.length; ++i) {
jQuery(addItems[i].parent).before(addItems[i].target);
}
return result;
};
/*
* backward compatibility for jQuery.browser
* This will be supported until firefox bug is fixed.
*/
if (!jQuery.browser) {
jQuery.uaMatch = function(ua) {
ua = ua.toLowerCase();
var match = /(chrome)[ \/]([\w.]+)/.exec(ua) ||
/(webkit)[ \/]([\w.]+)/.exec(ua) ||
/(opera)(?:.*version|)[ \/]([\w.]+)/.exec(ua) ||
/(msie) ([\w.]+)/.exec(ua) ||
ua.indexOf("compatible") < 0 && /(mozilla)(?:.*? rv:([\w.]+)|)/.exec(ua) ||
[];
return {
browser: match[ 1 ] || "",
version: match[ 2 ] || "0"
};
};
jQuery.browser = {};
jQuery.browser[jQuery.uaMatch(navigator.userAgent).browser] = true;
}
/**
* Small JavaScript module for the documentation.
*/
var Documentation = {
init : function() {
this.fixFirefoxAnchorBug();
this.highlightSearchWords();
this.initIndexTable();
if (DOCUMENTATION_OPTIONS.NAVIGATION_WITH_KEYS) {
this.initOnKeyListeners();
}
const Documentation = {
init: () => {
Documentation.initDomainIndexTable();
Documentation.initOnKeyListeners();
},
/**
* i18n support
*/
TRANSLATIONS : {},
PLURAL_EXPR : function(n) { return n === 1 ? 0 : 1; },
LOCALE : 'unknown',
TRANSLATIONS: {},
PLURAL_EXPR: (n) => (n === 1 ? 0 : 1),
LOCALE: "unknown",
// gettext and ngettext don't access this so that the functions
// can safely bound to a different name (_ = Documentation.gettext)
gettext : function(string) {
var translated = Documentation.TRANSLATIONS[string];
if (typeof translated === 'undefined')
return string;
return (typeof translated === 'string') ? translated : translated[0];
gettext: (string) => {
const translated = Documentation.TRANSLATIONS[string];
switch (typeof translated) {
case "undefined":
return string; // no translation
case "string":
return translated; // translation exists
default:
return translated[0]; // (singular, plural) translation tuple exists
}
},
ngettext : function(singular, plural, n) {
var translated = Documentation.TRANSLATIONS[singular];
if (typeof translated === 'undefined')
return (n == 1) ? singular : plural;
return translated[Documentation.PLURALEXPR(n)];
ngettext: (singular, plural, n) => {
const translated = Documentation.TRANSLATIONS[singular];
if (typeof translated !== "undefined")
return translated[Documentation.PLURAL_EXPR(n)];
return n === 1 ? singular : plural;
},
addTranslations : function(catalog) {
for (var key in catalog.messages)
this.TRANSLATIONS[key] = catalog.messages[key];
this.PLURAL_EXPR = new Function('n', 'return +(' + catalog.plural_expr + ')');
this.LOCALE = catalog.locale;
addTranslations: (catalog) => {
Object.assign(Documentation.TRANSLATIONS, catalog.messages);
Documentation.PLURAL_EXPR = new Function(
"n",
`return (${catalog.plural_expr})`
);
Documentation.LOCALE = catalog.locale;
},
/**
* add context elements like header anchor links
* helper function to focus on search bar
*/
addContextElements : function() {
$('div[id] > :header:first').each(function() {
$('<a class="headerlink">\u00B6</a>').
attr('href', '#' + this.id).
attr('title', _('Permalink to this headline')).
appendTo(this);
});
$('dt[id]').each(function() {
$('<a class="headerlink">\u00B6</a>').
attr('href', '#' + this.id).
attr('title', _('Permalink to this definition')).
appendTo(this);
});
focusSearchBar: () => {
document.querySelectorAll("input[name=q]")[0]?.focus();
},
/**
* workaround a firefox stupidity
* see: https://bugzilla.mozilla.org/show_bug.cgi?id=645075
* Initialise the domain index toggle buttons
*/
fixFirefoxAnchorBug : function() {
if (document.location.hash && $.browser.mozilla)
window.setTimeout(function() {
document.location.href += '';
}, 10);
},
/**
* highlight the search words provided in the url in the text
*/
highlightSearchWords : function() {
var params = $.getQueryParameters();
var terms = (params.highlight) ? params.highlight[0].split(/\s+/) : [];
if (terms.length) {
var body = $('div.body');
if (!body.length) {
body = $('body');
initDomainIndexTable: () => {
const toggler = (el) => {
const idNumber = el.id.substr(7);
const toggledRows = document.querySelectorAll(`tr.cg-${idNumber}`);
if (el.src.substr(-9) === "minus.png") {
el.src = `${el.src.substr(0, el.src.length - 9)}plus.png`;
toggledRows.forEach((el) => (el.style.display = "none"));
} else {
el.src = `${el.src.substr(0, el.src.length - 8)}minus.png`;
toggledRows.forEach((el) => (el.style.display = ""));
}
window.setTimeout(function() {
$.each(terms, function() {
body.highlightText(this.toLowerCase(), 'highlighted');
});
}, 10);
$('<p class="highlight-link"><a href="javascript:Documentation.' +
'hideSearchWords()">' + _('Hide Search Matches') + '</a></p>')
.appendTo($('#searchbox'));
}
};
const togglerElements = document.querySelectorAll("img.toggler");
togglerElements.forEach((el) =>
el.addEventListener("click", (event) => toggler(event.currentTarget))
);
togglerElements.forEach((el) => (el.style.display = ""));
if (DOCUMENTATION_OPTIONS.COLLAPSE_INDEX) togglerElements.forEach(toggler);
},
/**
* init the domain index toggle buttons
*/
initIndexTable : function() {
var togglers = $('img.toggler').click(function() {
var src = $(this).attr('src');
var idnum = $(this).attr('id').substr(7);
$('tr.cg-' + idnum).toggle();
if (src.substr(-9) === 'minus.png')
$(this).attr('src', src.substr(0, src.length-9) + 'plus.png');
else
$(this).attr('src', src.substr(0, src.length-8) + 'minus.png');
}).css('display', '');
if (DOCUMENTATION_OPTIONS.COLLAPSE_INDEX) {
togglers.click();
}
},
initOnKeyListeners: () => {
// only install a listener if it is really needed
if (
!DOCUMENTATION_OPTIONS.NAVIGATION_WITH_KEYS &&
!DOCUMENTATION_OPTIONS.ENABLE_SEARCH_SHORTCUTS
)
return;
/**
* helper function to hide the search marks again
*/
hideSearchWords : function() {
$('#searchbox .highlight-link').fadeOut(300);
$('span.highlighted').removeClass('highlighted');
},
document.addEventListener("keydown", (event) => {
// bail for input elements
if (BLACKLISTED_KEY_CONTROL_ELEMENTS.has(document.activeElement.tagName)) return;
// bail with special keys
if (event.altKey || event.ctrlKey || event.metaKey) return;
/**
* make the url absolute
*/
makeURL : function(relativeURL) {
return DOCUMENTATION_OPTIONS.URL_ROOT + '/' + relativeURL;
},
if (!event.shiftKey) {
switch (event.key) {
case "ArrowLeft":
if (!DOCUMENTATION_OPTIONS.NAVIGATION_WITH_KEYS) break;
/**
* get the current relative url
*/
getCurrentURL : function() {
var path = document.location.pathname;
var parts = path.split(/\//);
$.each(DOCUMENTATION_OPTIONS.URL_ROOT.split(/\//), function() {
if (this === '..')
parts.pop();
});
var url = parts.join('/');
return path.substring(url.lastIndexOf('/') + 1, path.length - 1);
},
initOnKeyListeners: function() {
$(document).keydown(function(event) {
var activeElementType = document.activeElement.tagName;
// don't navigate when in search box, textarea, dropdown or button
if (activeElementType !== 'TEXTAREA' && activeElementType !== 'INPUT' && activeElementType !== 'SELECT'
&& activeElementType !== 'BUTTON' && !event.altKey && !event.ctrlKey && !event.metaKey
&& !event.shiftKey) {
switch (event.keyCode) {
case 37: // left
var prevHref = $('link[rel="prev"]').prop('href');
if (prevHref) {
window.location.href = prevHref;
return false;
const prevLink = document.querySelector('link[rel="prev"]');
if (prevLink && prevLink.href) {
window.location.href = prevLink.href;
event.preventDefault();
}
break;
case 39: // right
var nextHref = $('link[rel="next"]').prop('href');
if (nextHref) {
window.location.href = nextHref;
return false;
case "ArrowRight":
if (!DOCUMENTATION_OPTIONS.NAVIGATION_WITH_KEYS) break;
const nextLink = document.querySelector('link[rel="next"]');
if (nextLink && nextLink.href) {
window.location.href = nextLink.href;
event.preventDefault();
}
break;
}
}
// some keyboard layouts may need Shift to get /
switch (event.key) {
case "/":
if (!DOCUMENTATION_OPTIONS.ENABLE_SEARCH_SHORTCUTS) break;
Documentation.focusSearchBar();
event.preventDefault();
}
});
}
},
};
// quick alias for translations
_ = Documentation.gettext;
const _ = Documentation.gettext;
$(document).ready(function() {
Documentation.init();
});
_ready(Documentation.init);

View File

@ -1,12 +1,14 @@
var DOCUMENTATION_OPTIONS = {
URL_ROOT: document.getElementById("documentation_options").getAttribute('data-url_root'),
VERSION: '0.1.6',
LANGUAGE: 'None',
VERSION: '0.1.7',
LANGUAGE: 'en',
COLLAPSE_INDEX: false,
BUILDER: 'html',
FILE_SUFFIX: '.html',
LINK_SUFFIX: '.html',
HAS_SOURCE: true,
SOURCELINK_SUFFIX: '.txt',
NAVIGATION_WITH_KEYS: false
NAVIGATION_WITH_KEYS: false,
SHOW_SEARCH_SUMMARY: true,
ENABLE_SEARCH_SHORTCUTS: true,
};

File diff suppressed because it is too large Load Diff

File diff suppressed because one or more lines are too long

View File

@ -5,12 +5,12 @@
* This script contains the language-specific data used by searchtools.js,
* namely the list of stopwords, stemmer, scorer and splitter.
*
* :copyright: Copyright 2007-2021 by the Sphinx team, see AUTHORS.
* :copyright: Copyright 2007-2022 by the Sphinx team, see AUTHORS.
* :license: BSD, see LICENSE for details.
*
*/
var stopwords = ["a","and","are","as","at","be","but","by","for","if","in","into","is","it","near","no","not","of","on","or","such","that","the","their","then","there","these","they","this","to","was","will","with"];
var stopwords = ["a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "near", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"];
/* Non-minified version is copied as a separate JS file, is available */
@ -197,101 +197,3 @@ var Stemmer = function() {
}
}
var splitChars = (function() {
var result = {};
var singles = [96, 180, 187, 191, 215, 247, 749, 885, 903, 907, 909, 930, 1014, 1648,
1748, 1809, 2416, 2473, 2481, 2526, 2601, 2609, 2612, 2615, 2653, 2702,
2706, 2729, 2737, 2740, 2857, 2865, 2868, 2910, 2928, 2948, 2961, 2971,
2973, 3085, 3089, 3113, 3124, 3213, 3217, 3241, 3252, 3295, 3341, 3345,
3369, 3506, 3516, 3633, 3715, 3721, 3736, 3744, 3748, 3750, 3756, 3761,
3781, 3912, 4239, 4347, 4681, 4695, 4697, 4745, 4785, 4799, 4801, 4823,
4881, 5760, 5901, 5997, 6313, 7405, 8024, 8026, 8028, 8030, 8117, 8125,
8133, 8181, 8468, 8485, 8487, 8489, 8494, 8527, 11311, 11359, 11687, 11695,
11703, 11711, 11719, 11727, 11735, 12448, 12539, 43010, 43014, 43019, 43587,
43696, 43713, 64286, 64297, 64311, 64317, 64319, 64322, 64325, 65141];
var i, j, start, end;
for (i = 0; i < singles.length; i++) {
result[singles[i]] = true;
}
var ranges = [[0, 47], [58, 64], [91, 94], [123, 169], [171, 177], [182, 184], [706, 709],
[722, 735], [741, 747], [751, 879], [888, 889], [894, 901], [1154, 1161],
[1318, 1328], [1367, 1368], [1370, 1376], [1416, 1487], [1515, 1519], [1523, 1568],
[1611, 1631], [1642, 1645], [1750, 1764], [1767, 1773], [1789, 1790], [1792, 1807],
[1840, 1868], [1958, 1968], [1970, 1983], [2027, 2035], [2038, 2041], [2043, 2047],
[2070, 2073], [2075, 2083], [2085, 2087], [2089, 2307], [2362, 2364], [2366, 2383],
[2385, 2391], [2402, 2405], [2419, 2424], [2432, 2436], [2445, 2446], [2449, 2450],
[2483, 2485], [2490, 2492], [2494, 2509], [2511, 2523], [2530, 2533], [2546, 2547],
[2554, 2564], [2571, 2574], [2577, 2578], [2618, 2648], [2655, 2661], [2672, 2673],
[2677, 2692], [2746, 2748], [2750, 2767], [2769, 2783], [2786, 2789], [2800, 2820],
[2829, 2830], [2833, 2834], [2874, 2876], [2878, 2907], [2914, 2917], [2930, 2946],
[2955, 2957], [2966, 2968], [2976, 2978], [2981, 2983], [2987, 2989], [3002, 3023],
[3025, 3045], [3059, 3076], [3130, 3132], [3134, 3159], [3162, 3167], [3170, 3173],
[3184, 3191], [3199, 3204], [3258, 3260], [3262, 3293], [3298, 3301], [3312, 3332],
[3386, 3388], [3390, 3423], [3426, 3429], [3446, 3449], [3456, 3460], [3479, 3481],
[3518, 3519], [3527, 3584], [3636, 3647], [3655, 3663], [3674, 3712], [3717, 3718],
[3723, 3724], [3726, 3731], [3752, 3753], [3764, 3772], [3774, 3775], [3783, 3791],
[3802, 3803], [3806, 3839], [3841, 3871], [3892, 3903], [3949, 3975], [3980, 4095],
[4139, 4158], [4170, 4175], [4182, 4185], [4190, 4192], [4194, 4196], [4199, 4205],
[4209, 4212], [4226, 4237], [4250, 4255], [4294, 4303], [4349, 4351], [4686, 4687],
[4702, 4703], [4750, 4751], [4790, 4791], [4806, 4807], [4886, 4887], [4955, 4968],
[4989, 4991], [5008, 5023], [5109, 5120], [5741, 5742], [5787, 5791], [5867, 5869],
[5873, 5887], [5906, 5919], [5938, 5951], [5970, 5983], [6001, 6015], [6068, 6102],
[6104, 6107], [6109, 6111], [6122, 6127], [6138, 6159], [6170, 6175], [6264, 6271],
[6315, 6319], [6390, 6399], [6429, 6469], [6510, 6511], [6517, 6527], [6572, 6592],
[6600, 6607], [6619, 6655], [6679, 6687], [6741, 6783], [6794, 6799], [6810, 6822],
[6824, 6916], [6964, 6980], [6988, 6991], [7002, 7042], [7073, 7085], [7098, 7167],
[7204, 7231], [7242, 7244], [7294, 7400], [7410, 7423], [7616, 7679], [7958, 7959],
[7966, 7967], [8006, 8007], [8014, 8015], [8062, 8063], [8127, 8129], [8141, 8143],
[8148, 8149], [8156, 8159], [8173, 8177], [8189, 8303], [8306, 8307], [8314, 8318],
[8330, 8335], [8341, 8449], [8451, 8454], [8456, 8457], [8470, 8472], [8478, 8483],
[8506, 8507], [8512, 8516], [8522, 8525], [8586, 9311], [9372, 9449], [9472, 10101],
[10132, 11263], [11493, 11498], [11503, 11516], [11518, 11519], [11558, 11567],
[11622, 11630], [11632, 11647], [11671, 11679], [11743, 11822], [11824, 12292],
[12296, 12320], [12330, 12336], [12342, 12343], [12349, 12352], [12439, 12444],
[12544, 12548], [12590, 12592], [12687, 12689], [12694, 12703], [12728, 12783],
[12800, 12831], [12842, 12880], [12896, 12927], [12938, 12976], [12992, 13311],
[19894, 19967], [40908, 40959], [42125, 42191], [42238, 42239], [42509, 42511],
[42540, 42559], [42592, 42593], [42607, 42622], [42648, 42655], [42736, 42774],
[42784, 42785], [42889, 42890], [42893, 43002], [43043, 43055], [43062, 43071],
[43124, 43137], [43188, 43215], [43226, 43249], [43256, 43258], [43260, 43263],
[43302, 43311], [43335, 43359], [43389, 43395], [43443, 43470], [43482, 43519],
[43561, 43583], [43596, 43599], [43610, 43615], [43639, 43641], [43643, 43647],
[43698, 43700], [43703, 43704], [43710, 43711], [43715, 43738], [43742, 43967],
[44003, 44015], [44026, 44031], [55204, 55215], [55239, 55242], [55292, 55295],
[57344, 63743], [64046, 64047], [64110, 64111], [64218, 64255], [64263, 64274],
[64280, 64284], [64434, 64466], [64830, 64847], [64912, 64913], [64968, 65007],
[65020, 65135], [65277, 65295], [65306, 65312], [65339, 65344], [65371, 65381],
[65471, 65473], [65480, 65481], [65488, 65489], [65496, 65497]];
for (i = 0; i < ranges.length; i++) {
start = ranges[i][0];
end = ranges[i][1];
for (j = start; j <= end; j++) {
result[j] = true;
}
}
return result;
})();
function splitQuery(query) {
var result = [];
var start = -1;
for (var i = 0; i < query.length; i++) {
if (splitChars[query.charCodeAt(i)]) {
if (start !== -1) {
result.push(query.slice(start, i));
start = -1;
}
} else if (start === -1) {
start = i;
}
}
if (start !== -1) {
result.push(query.slice(start));
}
return result;
}

View File

@ -4,22 +4,24 @@
*
* Sphinx JavaScript utilities for the full-text search.
*
* :copyright: Copyright 2007-2021 by the Sphinx team, see AUTHORS.
* :copyright: Copyright 2007-2022 by the Sphinx team, see AUTHORS.
* :license: BSD, see LICENSE for details.
*
*/
"use strict";
if (!Scorer) {
/**
* Simple result scoring code.
*/
/**
* Simple result scoring code.
*/
if (typeof Scorer === "undefined") {
var Scorer = {
// Implement the following function to further tweak the score for each result
// The function takes a result array [filename, title, anchor, descr, score]
// The function takes a result array [docname, title, anchor, descr, score, filename]
// and returns the new score.
/*
score: function(result) {
return result[4];
score: result => {
const [docname, title, anchor, descr, score, filename] = result
return score
},
*/
@ -28,9 +30,11 @@ if (!Scorer) {
// or matches in the last dotted part of the object name
objPartialMatch: 6,
// Additive scores depending on the priority of the object
objPrio: {0: 15, // used to be importantResults
1: 5, // used to be objectResults
2: -5}, // used to be unimportantResults
objPrio: {
0: 15, // used to be importantResults
1: 5, // used to be objectResults
2: -5, // used to be unimportantResults
},
// Used when the priority is not in the mapping.
objPrioDefault: 0,
@ -39,455 +43,495 @@ if (!Scorer) {
partialTitle: 7,
// query found in terms
term: 5,
partialTerm: 2
partialTerm: 2,
};
}
if (!splitQuery) {
function splitQuery(query) {
return query.split(/\s+/);
const _removeChildren = (element) => {
while (element && element.lastChild) element.removeChild(element.lastChild);
};
/**
* See https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions#escaping
*/
const _escapeRegExp = (string) =>
string.replace(/[.*+\-?^${}()|[\]\\]/g, "\\$&"); // $& means the whole matched string
const _displayItem = (item, searchTerms) => {
const docBuilder = DOCUMENTATION_OPTIONS.BUILDER;
const docUrlRoot = DOCUMENTATION_OPTIONS.URL_ROOT;
const docFileSuffix = DOCUMENTATION_OPTIONS.FILE_SUFFIX;
const docLinkSuffix = DOCUMENTATION_OPTIONS.LINK_SUFFIX;
const showSearchSummary = DOCUMENTATION_OPTIONS.SHOW_SEARCH_SUMMARY;
const [docName, title, anchor, descr, score, _filename] = item;
let listItem = document.createElement("li");
let requestUrl;
let linkUrl;
if (docBuilder === "dirhtml") {
// dirhtml builder
let dirname = docName + "/";
if (dirname.match(/\/index\/$/))
dirname = dirname.substring(0, dirname.length - 6);
else if (dirname === "index/") dirname = "";
requestUrl = docUrlRoot + dirname;
linkUrl = requestUrl;
} else {
// normal html builders
requestUrl = docUrlRoot + docName + docFileSuffix;
linkUrl = docName + docLinkSuffix;
}
let linkEl = listItem.appendChild(document.createElement("a"));
linkEl.href = linkUrl + anchor;
linkEl.dataset.score = score;
linkEl.innerHTML = title;
if (descr)
listItem.appendChild(document.createElement("span")).innerHTML =
" (" + descr + ")";
else if (showSearchSummary)
fetch(requestUrl)
.then((responseData) => responseData.text())
.then((data) => {
if (data)
listItem.appendChild(
Search.makeSearchSummary(data, searchTerms)
);
});
Search.output.appendChild(listItem);
};
const _finishSearch = (resultCount) => {
Search.stopPulse();
Search.title.innerText = _("Search Results");
if (!resultCount)
Search.status.innerText = Documentation.gettext(
"Your search did not match any documents. Please make sure that all words are spelled correctly and that you've selected enough categories."
);
else
Search.status.innerText = _(
`Search finished, found ${resultCount} page(s) matching the search query.`
);
};
const _displayNextItem = (
results,
resultCount,
searchTerms
) => {
// results left, load the summary and display it
// this is intended to be dynamic (don't sub resultsCount)
if (results.length) {
_displayItem(results.pop(), searchTerms);
setTimeout(
() => _displayNextItem(results, resultCount, searchTerms),
5
);
}
// search finished, update title and status message
else _finishSearch(resultCount);
};
/**
* Default splitQuery function. Can be overridden in ``sphinx.search`` with a
* custom function per language.
*
* The regular expression works by splitting the string on consecutive characters
* that are not Unicode letters, numbers, underscores, or emoji characters.
* This is the same as ``\W+`` in Python, preserving the surrogate pair area.
*/
if (typeof splitQuery === "undefined") {
var splitQuery = (query) => query
.split(/[^\p{Letter}\p{Number}_\p{Emoji_Presentation}]+/gu)
.filter(term => term) // remove remaining empty strings
}
/**
* Search Module
*/
var Search = {
const Search = {
_index: null,
_queued_query: null,
_pulse_status: -1,
_index : null,
_queued_query : null,
_pulse_status : -1,
htmlToText : function(htmlString) {
var virtualDocument = document.implementation.createHTMLDocument('virtual');
var htmlElement = $(htmlString, virtualDocument);
htmlElement.find('.headerlink').remove();
docContent = htmlElement.find('[role=main]')[0];
if(docContent === undefined) {
console.warn("Content block not found. Sphinx search tries to obtain it " +
"via '[role=main]'. Could you check your theme or template.");
return "";
}
return docContent.textContent || docContent.innerText;
htmlToText: (htmlString) => {
const htmlElement = new DOMParser().parseFromString(htmlString, 'text/html');
htmlElement.querySelectorAll(".headerlink").forEach((el) => { el.remove() });
const docContent = htmlElement.querySelector('[role="main"]');
if (docContent !== undefined) return docContent.textContent;
console.warn(
"Content block not found. Sphinx search tries to obtain it via '[role=main]'. Could you check your theme or template."
);
return "";
},
init : function() {
var params = $.getQueryParameters();
if (params.q) {
var query = params.q[0];
$('input[name="q"]')[0].value = query;
this.performSearch(query);
}
init: () => {
const query = new URLSearchParams(window.location.search).get("q");
document
.querySelectorAll('input[name="q"]')
.forEach((el) => (el.value = query));
if (query) Search.performSearch(query);
},
loadIndex : function(url) {
$.ajax({type: "GET", url: url, data: null,
dataType: "script", cache: true,
complete: function(jqxhr, textstatus) {
if (textstatus != "success") {
document.getElementById("searchindexloader").src = url;
}
}});
},
loadIndex: (url) =>
(document.body.appendChild(document.createElement("script")).src = url),
setIndex : function(index) {
var q;
this._index = index;
if ((q = this._queued_query) !== null) {
this._queued_query = null;
Search.query(q);
setIndex: (index) => {
Search._index = index;
if (Search._queued_query !== null) {
const query = Search._queued_query;
Search._queued_query = null;
Search.query(query);
}
},
hasIndex : function() {
return this._index !== null;
},
hasIndex: () => Search._index !== null,
deferQuery : function(query) {
this._queued_query = query;
},
deferQuery: (query) => (Search._queued_query = query),
stopPulse : function() {
this._pulse_status = 0;
},
stopPulse: () => (Search._pulse_status = -1),
startPulse : function() {
if (this._pulse_status >= 0)
return;
function pulse() {
var i;
startPulse: () => {
if (Search._pulse_status >= 0) return;
const pulse = () => {
Search._pulse_status = (Search._pulse_status + 1) % 4;
var dotString = '';
for (i = 0; i < Search._pulse_status; i++)
dotString += '.';
Search.dots.text(dotString);
if (Search._pulse_status > -1)
window.setTimeout(pulse, 500);
}
Search.dots.innerText = ".".repeat(Search._pulse_status);
if (Search._pulse_status >= 0) window.setTimeout(pulse, 500);
};
pulse();
},
/**
* perform a search for something (or wait until index is loaded)
*/
performSearch : function(query) {
performSearch: (query) => {
// create the required interface elements
this.out = $('#search-results');
this.title = $('<h2>' + _('Searching') + '</h2>').appendTo(this.out);
this.dots = $('<span></span>').appendTo(this.title);
this.status = $('<p class="search-summary">&nbsp;</p>').appendTo(this.out);
this.output = $('<ul class="search"/>').appendTo(this.out);
const searchText = document.createElement("h2");
searchText.textContent = _("Searching");
const searchSummary = document.createElement("p");
searchSummary.classList.add("search-summary");
searchSummary.innerText = "";
const searchList = document.createElement("ul");
searchList.classList.add("search");
$('#search-progress').text(_('Preparing search...'));
this.startPulse();
const out = document.getElementById("search-results");
Search.title = out.appendChild(searchText);
Search.dots = Search.title.appendChild(document.createElement("span"));
Search.status = out.appendChild(searchSummary);
Search.output = out.appendChild(searchList);
const searchProgress = document.getElementById("search-progress");
// Some themes don't use the search progress node
if (searchProgress) {
searchProgress.innerText = _("Preparing search...");
}
Search.startPulse();
// index already loaded, the browser was quick!
if (this.hasIndex())
this.query(query);
else
this.deferQuery(query);
if (Search.hasIndex()) Search.query(query);
else Search.deferQuery(query);
},
/**
* execute search (requires search index to be loaded)
*/
query : function(query) {
var i;
query: (query) => {
const filenames = Search._index.filenames;
const docNames = Search._index.docnames;
const titles = Search._index.titles;
const allTitles = Search._index.alltitles;
const indexEntries = Search._index.indexentries;
// stem the searchterms and add them to the correct list
var stemmer = new Stemmer();
var searchterms = [];
var excluded = [];
var hlterms = [];
var tmp = splitQuery(query);
var objectterms = [];
for (i = 0; i < tmp.length; i++) {
if (tmp[i] !== "") {
objectterms.push(tmp[i].toLowerCase());
}
// stem the search terms and add them to the correct list
const stemmer = new Stemmer();
const searchTerms = new Set();
const excludedTerms = new Set();
const highlightTerms = new Set();
const objectTerms = new Set(splitQuery(query.toLowerCase().trim()));
splitQuery(query.trim()).forEach((queryTerm) => {
const queryTermLower = queryTerm.toLowerCase();
// maybe skip this "word"
// stopwords array is from language_data.js
if (
stopwords.indexOf(queryTermLower) !== -1 ||
queryTerm.match(/^\d+$/)
)
return;
if ($u.indexOf(stopwords, tmp[i].toLowerCase()) != -1 || tmp[i] === "") {
// skip this "word"
continue;
}
// stem the word
var word = stemmer.stemWord(tmp[i].toLowerCase());
// prevent stemmer from cutting word smaller than two chars
if(word.length < 3 && tmp[i].length >= 3) {
word = tmp[i];
}
var toAppend;
let word = stemmer.stemWord(queryTermLower);
// select the correct list
if (word[0] == '-') {
toAppend = excluded;
word = word.substr(1);
}
if (word[0] === "-") excludedTerms.add(word.substr(1));
else {
toAppend = searchterms;
hlterms.push(tmp[i].toLowerCase());
searchTerms.add(word);
highlightTerms.add(queryTermLower);
}
// only add if not already in the list
if (!$u.contains(toAppend, word))
toAppend.push(word);
});
if (SPHINX_HIGHLIGHT_ENABLED) { // set in sphinx_highlight.js
localStorage.setItem("sphinx_highlight_terms", [...highlightTerms].join(" "))
}
var highlightstring = '?highlight=' + $.urlencode(hlterms.join(" "));
// console.debug('SEARCH: searching for:');
// console.info('required: ', searchterms);
// console.info('excluded: ', excluded);
// console.debug("SEARCH: searching for:");
// console.info("required: ", [...searchTerms]);
// console.info("excluded: ", [...excludedTerms]);
// prepare search
var terms = this._index.terms;
var titleterms = this._index.titleterms;
// array of [docname, title, anchor, descr, score, filename]
let results = [];
_removeChildren(document.getElementById("search-progress"));
// array of [filename, title, anchor, descr, score]
var results = [];
$('#search-progress').empty();
const queryLower = query.toLowerCase();
for (const [title, foundTitles] of Object.entries(allTitles)) {
if (title.toLowerCase().includes(queryLower) && (queryLower.length >= title.length/2)) {
for (const [file, id] of foundTitles) {
let score = Math.round(100 * queryLower.length / title.length)
results.push([
docNames[file],
titles[file] !== title ? `${titles[file]} > ${title}` : title,
id !== null ? "#" + id : "",
null,
score,
filenames[file],
]);
}
}
}
// search for explicit entries in index directives
for (const [entry, foundEntries] of Object.entries(indexEntries)) {
if (entry.includes(queryLower) && (queryLower.length >= entry.length/2)) {
for (const [file, id] of foundEntries) {
let score = Math.round(100 * queryLower.length / entry.length)
results.push([
docNames[file],
titles[file],
id ? "#" + id : "",
null,
score,
filenames[file],
]);
}
}
}
// lookup as object
for (i = 0; i < objectterms.length; i++) {
var others = [].concat(objectterms.slice(0, i),
objectterms.slice(i+1, objectterms.length));
results = results.concat(this.performObjectSearch(objectterms[i], others));
}
objectTerms.forEach((term) =>
results.push(...Search.performObjectSearch(term, objectTerms))
);
// lookup as search terms in fulltext
results = results.concat(this.performTermsSearch(searchterms, excluded, terms, titleterms));
results.push(...Search.performTermsSearch(searchTerms, excludedTerms));
// let the scorer override scores with a custom scoring function
if (Scorer.score) {
for (i = 0; i < results.length; i++)
results[i][4] = Scorer.score(results[i]);
}
if (Scorer.score) results.forEach((item) => (item[4] = Scorer.score(item)));
// now sort the results by score (in opposite order of appearance, since the
// display function below uses pop() to retrieve items) and then
// alphabetically
results.sort(function(a, b) {
var left = a[4];
var right = b[4];
if (left > right) {
return 1;
<