forked from moreo/QuaPy
merging from protocols (aka v0.1.7)
This commit is contained in:
commit e0718b6e1b

README.md (31 changes)
@@ -13,6 +13,7 @@ for facilitating the analysis and interpretation of the experimental results.
 
 ### Last updates:
 
+* Version 0.1.7 is released! major changes can be consulted [here](quapy/FCHANGE_LOG.txt).
 * A detailed documentation is now available [here](https://hlt-isti.github.io/QuaPy/)
 * The developer API documentation is available [here](https://hlt-isti.github.io/QuaPy/build/html/modules.html)
 
@@ -73,13 +74,14 @@ See the [Wiki](https://github.com/HLT-ISTI/QuaPy/wiki) for detailed examples.
 ## Features
 
 * Implementation of many popular quantification methods (Classify-&-Count and its variants, Expectation Maximization,
-quantification methods based on structured output learning, HDy, QuaNet, and quantification ensembles).
-* Versatile functionality for performing evaluation based on artificial sampling protocols.
+quantification methods based on structured output learning, HDy, QuaNet, quantification ensembles, among others).
+* Versatile functionality for performing evaluation based on sampling generation protocols (e.g., APP, NPP, etc.).
 * Implementation of most commonly used evaluation metrics (e.g., AE, RAE, SE, KLD, NKLD, etc.).
 * Datasets frequently used in quantification (textual and numeric), including:
     * 32 UCI Machine Learning datasets.
     * 11 Twitter quantification-by-sentiment datasets.
     * 3 product reviews quantification-by-sentiment datasets.
+    * 4 tasks from LeQua competition (_new in v0.1.7!_)
 * Native support for binary and single-label multiclass quantification scenarios.
 * Model selection functionality that minimizes quantification-oriented loss functions.
 * Visualization tools for analysing the experimental results.
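The evaluation metrics listed above have simple definitions over prevalence vectors. Below is a minimal plain-Python sketch of two of them, AE and RAE, assuming the standard definitions from the quantification literature; QuaPy's own implementation may differ in details such as the smoothing applied to RAE.

```python
def absolute_error(true_prev, estim_prev):
    """Absolute error (AE): mean absolute difference between the true and
    the estimated class prevalence values."""
    return sum(abs(t - e) for t, e in zip(true_prev, estim_prev)) / len(true_prev)


def relative_absolute_error(true_prev, estim_prev, eps=1e-12):
    """Relative absolute error (RAE): as AE, but each term is divided by the
    true prevalence (eps guards against division by zero; QuaPy smooths
    differently)."""
    return sum(abs(t - e) / max(t, eps)
               for t, e in zip(true_prev, estim_prev)) / len(true_prev)


print(absolute_error([0.5, 0.3, 0.2], [0.4, 0.4, 0.2]))  # ~0.067
```

Both metrics are zero iff the estimated prevalence vector coincides with the true one, which is why they are natural targets for quantification-oriented model selection.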
@@ -94,29 +96,6 @@ quantification methods based on structured output learning, HDy, QuaNet, and qua
 * pandas, xlrd
 * matplotlib
 
-## SVM-perf with quantification-oriented losses
-
-In order to run experiments involving SVM(Q), SVM(KLD), SVM(NKLD),
-SVM(AE), or SVM(RAE), you have to first download the
-[svmperf](http://www.cs.cornell.edu/people/tj/svm_light/svm_perf.html)
-package, apply the patch
-[svm-perf-quantification-ext.patch](./svm-perf-quantification-ext.patch), and compile the sources.
-The script [prepare_svmperf.sh](prepare_svmperf.sh) does all the job. Simply run:
-
-```
-./prepare_svmperf.sh
-```
-
-The resulting directory [svm_perf_quantification](./svm_perf_quantification) contains the
-patched version of _svmperf_ with quantification-oriented losses.
-
-The [svm-perf-quantification-ext.patch](./svm-perf-quantification-ext.patch) is an extension of the patch made available by
-[Esuli et al. 2015](https://dl.acm.org/doi/abs/10.1145/2700406?casa_token=8D2fHsGCVn0AAAAA:ZfThYOvrzWxMGfZYlQW_y8Cagg-o_l6X_PcF09mdETQ4Tu7jK98mxFbGSXp9ZSO14JkUIYuDGFG0)
-that allows SVMperf to optimize for
-the _Q_ measure as proposed by [Barranquero et al. 2015](https://www.sciencedirect.com/science/article/abs/pii/S003132031400291X)
-and for the _KLD_ and _NKLD_ measures as proposed by [Esuli et al. 2015](https://dl.acm.org/doi/abs/10.1145/2700406?casa_token=8D2fHsGCVn0AAAAA:ZfThYOvrzWxMGfZYlQW_y8Cagg-o_l6X_PcF09mdETQ4Tu7jK98mxFbGSXp9ZSO14JkUIYuDGFG0).
-This patch extends the above one by also allowing SVMperf to optimize for
-_AE_ and _RAE_.
-
 ## Documentation
@@ -127,6 +106,8 @@ are provided:
 
 * [Datasets](https://github.com/HLT-ISTI/QuaPy/wiki/Datasets)
 * [Evaluation](https://github.com/HLT-ISTI/QuaPy/wiki/Evaluation)
+* [Protocols](https://github.com/HLT-ISTI/QuaPy/wiki/Protocols)
 * [Methods](https://github.com/HLT-ISTI/QuaPy/wiki/Methods)
+* [SVMperf](https://github.com/HLT-ISTI/QuaPy/wiki/ExplicitLossMinimization)
 * [Model Selection](https://github.com/HLT-ISTI/QuaPy/wiki/Model-Selection)
 * [Plotting](https://github.com/HLT-ISTI/QuaPy/wiki/Plotting)
TODO.txt (15 changes)

@@ -1,7 +1,20 @@
sample_size should not be mandatory when qp.environ['SAMPLE_SIZE'] has been specified
clean all the cumbersome methods that have to be implemented for new quantifiers (e.g., n_classes_ prop, etc.)
make GridSearchQ truly parallel
make more examples in the "examples" directory
merge with master, because I had to fix some problems with QuaNet due to an issue notified via GitHub!
added cross_val_predict in qp.model_selection (i.e., a cross_val_predict for quantification) --would be nice to have it parallelized

check the OneVsAll module(s)

check the set_params of neural.py, because the separation of estimator__<param> is not implemented; see also __check_params_colision

HDy can be customized so that the number of bins is specified, instead of explored within the fit method

Packaging:
==========================================
Documentation with sphinx
Document methods with paper references
unit-tests
clean wiki_examples!
@@ -2,23 +2,26 @@
 
 <!doctype html>
 
-<html>
+<html lang="en">
 <head>
 <meta charset="utf-8" />
-<meta name="viewport" content="width=device-width, initial-scale=1.0" />
-<title>Datasets — QuaPy 0.1.6 documentation</title>
+<meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="generator" content="Docutils 0.19: https://docutils.sourceforge.io/" />
+
+<title>Datasets — QuaPy 0.1.7 documentation</title>
 <link rel="stylesheet" type="text/css" href="_static/pygments.css" />
 <link rel="stylesheet" type="text/css" href="_static/bizstyle.css" />
 
 <script data-url_root="./" id="documentation_options" src="_static/documentation_options.js"></script>
 <script src="_static/jquery.js"></script>
 <script src="_static/underscore.js"></script>
+<script src="_static/_sphinx_javascript_frameworks_compat.js"></script>
 <script src="_static/doctools.js"></script>
+<script src="_static/sphinx_highlight.js"></script>
 <script src="_static/bizstyle.js"></script>
 <link rel="index" title="Index" href="genindex.html" />
 <link rel="search" title="Search" href="search.html" />
-<link rel="next" title="quapy" href="modules.html" />
-<link rel="prev" title="Getting Started" href="readme.html" />
+<link rel="next" title="Evaluation" href="Evaluation.html" />
+<link rel="prev" title="Installation" href="Installation.html" />
 <meta name="viewport" content="width=device-width,initial-scale=1.0" />
 <!--[if lt IE 9]>
 <script src="_static/css3-mediaqueries.js"></script>
@@ -34,12 +37,12 @@
 <a href="py-modindex.html" title="Python Module Index"
 >modules</a> |</li>
 <li class="right" >
-<a href="modules.html" title="quapy"
+<a href="Evaluation.html" title="Evaluation"
 accesskey="N">next</a> |</li>
 <li class="right" >
-<a href="readme.html" title="Getting Started"
+<a href="Installation.html" title="Installation"
 accesskey="P">previous</a> |</li>
-<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.6 documentation</a> »</li>
+<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.7 documentation</a> »</li>
 <li class="nav-item nav-item-this"><a href="">Datasets</a></li>
 </ul>
 </div>
@@ -49,8 +52,8 @@
 <div class="bodywrapper">
 <div class="body" role="main">
 
-<div class="tex2jax_ignore mathjax_ignore section" id="datasets">
-<h1>Datasets<a class="headerlink" href="#datasets" title="Permalink to this headline">¶</a></h1>
+<section id="datasets">
+<h1>Datasets<a class="headerlink" href="#datasets" title="Permalink to this heading">¶</a></h1>
 <p>QuaPy makes available several datasets that have been used in
 quantification literature, as well as an interface to allow
 anyone import their custom datasets.</p>
@@ -77,7 +80,7 @@ Take a look at the following code:</p>
 <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="p">[</span><span class="mf">0.17</span><span class="p">,</span> <span class="mf">0.50</span><span class="p">,</span> <span class="mf">0.33</span><span class="p">]</span>
 </pre></div>
 </div>
-<p>One can easily produce new samples at desired class prevalences:</p>
+<p>One can easily produce new samples at desired class prevalence values:</p>
 <div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">sample_size</span> <span class="o">=</span> <span class="mi">10</span>
 <span class="n">prev</span> <span class="o">=</span> <span class="p">[</span><span class="mf">0.4</span><span class="p">,</span> <span class="mf">0.1</span><span class="p">,</span> <span class="mf">0.5</span><span class="p">]</span>
 <span class="n">sample</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">sampling</span><span class="p">(</span><span class="n">sample_size</span><span class="p">,</span> <span class="o">*</span><span class="n">prev</span><span class="p">)</span>
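The `data.sampling(sample_size, *prev)` call shown in the hunk above draws a sample of a given size at the requested class prevalence. A plain-Python sketch of the idea is given below; `sample_at_prevalence` is a hypothetical helper written for illustration, not part of QuaPy's API.

```python
import random


def sample_at_prevalence(labels, sample_size, prevalence, seed=0):
    """Draw indexes for a sample of `sample_size` instances whose class
    proportions approximate `prevalence` (one value per class). Hypothetical
    helper mimicking, conceptually, a call like data.sampling(size, *prev)."""
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    sample = []
    for cls, p in enumerate(prevalence):
        take = round(sample_size * p)                 # instances for this class
        sample += rng.choices(by_class[cls], k=take)  # drawn with replacement
    rng.shuffle(sample)
    return sample


labels = [0] * 50 + [1] * 30 + [2] * 20
idxs = sample_at_prevalence(labels, sample_size=10, prevalence=[0.4, 0.1, 0.5])
print(len(idxs))  # 10
```

Note that rounding per class may not always sum exactly to `sample_size`; the sketch only approximates the target prevalence, as any finite sample must.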
@@ -106,31 +109,12 @@ the indexes, that can then be used to generate the sample:</p>
 <span class="o">...</span>
 </pre></div>
 </div>
-<p>QuaPy also implements the artificial sampling protocol that produces (via a
-Python’s generator) a series of <em>LabelledCollection</em> objects with equidistant
-prevalences ranging across the entire prevalence spectrum in the simplex space, e.g.:</p>
-<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="k">for</span> <span class="n">sample</span> <span class="ow">in</span> <span class="n">data</span><span class="o">.</span><span class="n">artificial_sampling_generator</span><span class="p">(</span><span class="n">sample_size</span><span class="o">=</span><span class="mi">100</span><span class="p">,</span> <span class="n">n_prevalences</span><span class="o">=</span><span class="mi">5</span><span class="p">):</span>
-    <span class="nb">print</span><span class="p">(</span><span class="n">F</span><span class="o">.</span><span class="n">strprev</span><span class="p">(</span><span class="n">sample</span><span class="o">.</span><span class="n">prevalence</span><span class="p">(),</span> <span class="n">prec</span><span class="o">=</span><span class="mi">2</span><span class="p">))</span>
-</pre></div>
-</div>
-<p>produces one sampling for each (valid) combination of prevalences originating from
-splitting the range [0,1] into n_prevalences=5 points (i.e., [0, 0.25, 0.5, 0.75, 1]),
-that is:</p>
-<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="p">[</span><span class="mf">0.00</span><span class="p">,</span> <span class="mf">0.00</span><span class="p">,</span> <span class="mf">1.00</span><span class="p">]</span>
-<span class="p">[</span><span class="mf">0.00</span><span class="p">,</span> <span class="mf">0.25</span><span class="p">,</span> <span class="mf">0.75</span><span class="p">]</span>
-<span class="p">[</span><span class="mf">0.00</span><span class="p">,</span> <span class="mf">0.50</span><span class="p">,</span> <span class="mf">0.50</span><span class="p">]</span>
-<span class="p">[</span><span class="mf">0.00</span><span class="p">,</span> <span class="mf">0.75</span><span class="p">,</span> <span class="mf">0.25</span><span class="p">]</span>
-<span class="p">[</span><span class="mf">0.00</span><span class="p">,</span> <span class="mf">1.00</span><span class="p">,</span> <span class="mf">0.00</span><span class="p">]</span>
-<span class="p">[</span><span class="mf">0.25</span><span class="p">,</span> <span class="mf">0.00</span><span class="p">,</span> <span class="mf">0.75</span><span class="p">]</span>
-<span class="o">...</span>
-<span class="p">[</span><span class="mf">1.00</span><span class="p">,</span> <span class="mf">0.00</span><span class="p">,</span> <span class="mf">0.00</span><span class="p">]</span>
-</pre></div>
-</div>
-<p>See the <a class="reference external" href="https://github.com/HLT-ISTI/QuaPy/wiki/Evaluation">Evaluation wiki</a> for
-further details on how to use the artificial sampling protocol to properly
-evaluate a quantification method.</p>
-<div class="section" id="reviews-datasets">
-<h2>Reviews Datasets<a class="headerlink" href="#reviews-datasets" title="Permalink to this headline">¶</a></h2>
+<p>However, generating samples for evaluation purposes is tackled in QuaPy
+by means of the evaluation protocols (see the dedicated entries in the Wiki
+for <a class="reference external" href="https://github.com/HLT-ISTI/QuaPy/wiki/Evaluation">evaluation</a> and
+<a class="reference external" href="https://github.com/HLT-ISTI/QuaPy/wiki/Protocols">protocols</a>).</p>
+<section id="reviews-datasets">
+<h2>Reviews Datasets<a class="headerlink" href="#reviews-datasets" title="Permalink to this heading">¶</a></h2>
 <p>Three datasets of reviews about Kindle devices, Harry Potter’s series, and
 the well-known IMDb movie reviews can be fetched using a unified interface.
 For example:</p>
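The artificial sampling protocol described in the removed passage enumerates all equidistant prevalence combinations that lie on the simplex. That grid can be sketched in a few lines of plain Python; `prevalence_grid` below is a hypothetical helper, not QuaPy's implementation.

```python
from itertools import product


def prevalence_grid(n_classes, n_prevalences=5):
    """Enumerate every valid prevalence vector obtained by splitting [0, 1]
    into n_prevalences equidistant points (0, 0.25, 0.5, 0.75, 1 here) and
    keeping only the combinations that lie on the simplex (sum to 1)."""
    points = [i / (n_prevalences - 1) for i in range(n_prevalences)]
    for combo in product(points, repeat=n_classes):
        if abs(sum(combo) - 1.0) < 1e-9:
            yield combo


grid = list(prevalence_grid(n_classes=3, n_prevalences=5))
print(grid[0])    # (0.0, 0.0, 1.0)
print(grid[-1])   # (1.0, 0.0, 0.0)
print(len(grid))  # 15 valid combinations for 3 classes
```

This reproduces the listing shown in the removed documentation, from [0.00, 0.00, 1.00] up to [1.00, 0.00, 0.00].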
@@ -150,47 +134,47 @@ For example:</p>
 </pre></div>
 </div>
 <p>Some statistics of the available datasets are summarized below:</p>
 <table class="docutils align-default">
 <thead>
 <tr class="row-odd"><th class="head"><p>Dataset</p></th>
 <th class="head text-center"><p>classes</p></th>
 <th class="head text-center"><p>train size</p></th>
 <th class="head text-center"><p>test size</p></th>
 <th class="head text-center"><p>train prev</p></th>
 <th class="head text-center"><p>test prev</p></th>
 <th class="head"><p>type</p></th>
 </tr>
 </thead>
 <tbody>
 <tr class="row-even"><td><p>hp</p></td>
 <td class="text-center"><p>2</p></td>
 <td class="text-center"><p>9533</p></td>
 <td class="text-center"><p>18399</p></td>
 <td class="text-center"><p>[0.018, 0.982]</p></td>
 <td class="text-center"><p>[0.065, 0.935]</p></td>
 <td><p>text</p></td>
 </tr>
 <tr class="row-odd"><td><p>kindle</p></td>
 <td class="text-center"><p>2</p></td>
 <td class="text-center"><p>3821</p></td>
 <td class="text-center"><p>21591</p></td>
 <td class="text-center"><p>[0.081, 0.919]</p></td>
 <td class="text-center"><p>[0.063, 0.937]</p></td>
 <td><p>text</p></td>
 </tr>
 <tr class="row-even"><td><p>imdb</p></td>
 <td class="text-center"><p>2</p></td>
 <td class="text-center"><p>25000</p></td>
 <td class="text-center"><p>25000</p></td>
 <td class="text-center"><p>[0.500, 0.500]</p></td>
 <td class="text-center"><p>[0.500, 0.500]</p></td>
 <td><p>text</p></td>
 </tr>
 </tbody>
 </table>
-</div>
-<div class="section" id="twitter-sentiment-datasets">
-<h2>Twitter Sentiment Datasets<a class="headerlink" href="#twitter-sentiment-datasets" title="Permalink to this headline">¶</a></h2>
+</section>
+<section id="twitter-sentiment-datasets">
+<h2>Twitter Sentiment Datasets<a class="headerlink" href="#twitter-sentiment-datasets" title="Permalink to this heading">¶</a></h2>
 <p>11 Twitter datasets for sentiment analysis.
 Text is not accessible, and the documents were made available
 in tf-idf format. Each dataset presents two splits: a train/val
@@ -221,123 +205,123 @@ The lists of the Twitter dataset’s ids can be consulted in:</p>
 </pre></div>
 </div>
 <p>Some details can be found below:</p>
 <table class="docutils align-default">
 <thead>
 <tr class="row-odd"><th class="head"><p>Dataset</p></th>
 <th class="head text-center"><p>classes</p></th>
 <th class="head text-center"><p>train size</p></th>
 <th class="head text-center"><p>test size</p></th>
 <th class="head text-center"><p>features</p></th>
 <th class="head text-center"><p>train prev</p></th>
 <th class="head text-center"><p>test prev</p></th>
 <th class="head"><p>type</p></th>
 </tr>
 </thead>
 <tbody>
 <tr class="row-even"><td><p>gasp</p></td>
 <td class="text-center"><p>3</p></td>
 <td class="text-center"><p>8788</p></td>
 <td class="text-center"><p>3765</p></td>
 <td class="text-center"><p>694582</p></td>
 <td class="text-center"><p>[0.421, 0.496, 0.082]</p></td>
 <td class="text-center"><p>[0.407, 0.507, 0.086]</p></td>
 <td><p>sparse</p></td>
 </tr>
 <tr class="row-odd"><td><p>hcr</p></td>
 <td class="text-center"><p>3</p></td>
 <td class="text-center"><p>1594</p></td>
 <td class="text-center"><p>798</p></td>
 <td class="text-center"><p>222046</p></td>
 <td class="text-center"><p>[0.546, 0.211, 0.243]</p></td>
 <td class="text-center"><p>[0.640, 0.167, 0.193]</p></td>
 <td><p>sparse</p></td>
 </tr>
 <tr class="row-even"><td><p>omd</p></td>
 <td class="text-center"><p>3</p></td>
 <td class="text-center"><p>1839</p></td>
 <td class="text-center"><p>787</p></td>
 <td class="text-center"><p>199151</p></td>
 <td class="text-center"><p>[0.463, 0.271, 0.266]</p></td>
 <td class="text-center"><p>[0.437, 0.283, 0.280]</p></td>
 <td><p>sparse</p></td>
 </tr>
 <tr class="row-odd"><td><p>sanders</p></td>
 <td class="text-center"><p>3</p></td>
 <td class="text-center"><p>2155</p></td>
 <td class="text-center"><p>923</p></td>
 <td class="text-center"><p>229399</p></td>
 <td class="text-center"><p>[0.161, 0.691, 0.148]</p></td>
 <td class="text-center"><p>[0.164, 0.688, 0.148]</p></td>
 <td><p>sparse</p></td>
 </tr>
 <tr class="row-even"><td><p>semeval13</p></td>
 <td class="text-center"><p>3</p></td>
 <td class="text-center"><p>11338</p></td>
 <td class="text-center"><p>3813</p></td>
 <td class="text-center"><p>1215742</p></td>
 <td class="text-center"><p>[0.159, 0.470, 0.372]</p></td>
 <td class="text-center"><p>[0.158, 0.430, 0.412]</p></td>
 <td><p>sparse</p></td>
 </tr>
 <tr class="row-odd"><td><p>semeval14</p></td>
 <td class="text-center"><p>3</p></td>
 <td class="text-center"><p>11338</p></td>
 <td class="text-center"><p>1853</p></td>
 <td class="text-center"><p>1215742</p></td>
 <td class="text-center"><p>[0.159, 0.470, 0.372]</p></td>
 <td class="text-center"><p>[0.109, 0.361, 0.530]</p></td>
 <td><p>sparse</p></td>
 </tr>
 <tr class="row-even"><td><p>semeval15</p></td>
 <td class="text-center"><p>3</p></td>
 <td class="text-center"><p>11338</p></td>
 <td class="text-center"><p>2390</p></td>
 <td class="text-center"><p>1215742</p></td>
 <td class="text-center"><p>[0.159, 0.470, 0.372]</p></td>
 <td class="text-center"><p>[0.153, 0.413, 0.434]</p></td>
 <td><p>sparse</p></td>
 </tr>
 <tr class="row-odd"><td><p>semeval16</p></td>
 <td class="text-center"><p>3</p></td>
 <td class="text-center"><p>8000</p></td>
 <td class="text-center"><p>2000</p></td>
 <td class="text-center"><p>889504</p></td>
 <td class="text-center"><p>[0.157, 0.351, 0.492]</p></td>
 <td class="text-center"><p>[0.163, 0.341, 0.497]</p></td>
 <td><p>sparse</p></td>
 </tr>
 <tr class="row-even"><td><p>sst</p></td>
 <td class="text-center"><p>3</p></td>
 <td class="text-center"><p>2971</p></td>
 <td class="text-center"><p>1271</p></td>
 <td class="text-center"><p>376132</p></td>
 <td class="text-center"><p>[0.261, 0.452, 0.288]</p></td>
 <td class="text-center"><p>[0.207, 0.481, 0.312]</p></td>
 <td><p>sparse</p></td>
 </tr>
 <tr class="row-odd"><td><p>wa</p></td>
 <td class="text-center"><p>3</p></td>
 <td class="text-center"><p>2184</p></td>
 <td class="text-center"><p>936</p></td>
 <td class="text-center"><p>248563</p></td>
 <td class="text-center"><p>[0.305, 0.414, 0.281]</p></td>
 <td class="text-center"><p>[0.282, 0.446, 0.272]</p></td>
 <td><p>sparse</p></td>
 </tr>
 <tr class="row-even"><td><p>wb</p></td>
 <td class="text-center"><p>3</p></td>
 <td class="text-center"><p>4259</p></td>
 <td class="text-center"><p>1823</p></td>
 <td class="text-center"><p>404333</p></td>
 <td class="text-center"><p>[0.270, 0.392, 0.337]</p></td>
 <td class="text-center"><p>[0.274, 0.392, 0.335]</p></td>
 <td><p>sparse</p></td>
 </tr>
 </tbody>
 </table>
-</div>
-<div class="section" id="uci-machine-learning">
-<h2>UCI Machine Learning<a class="headerlink" href="#uci-machine-learning" title="Permalink to this headline">¶</a></h2>
+</section>
+<section id="uci-machine-learning">
+<h2>UCI Machine Learning<a class="headerlink" href="#uci-machine-learning" title="Permalink to this heading">¶</a></h2>
 <p>A set of 32 datasets from the <a class="reference external" href="https://archive.ics.uci.edu/ml/datasets.php">UCI Machine Learning repository</a>
 used in:</p>
 <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">Pérez</span><span class="o">-</span><span class="n">Gállego</span><span class="p">,</span> <span class="n">P</span><span class="o">.</span><span class="p">,</span> <span class="n">Quevedo</span><span class="p">,</span> <span class="n">J</span><span class="o">.</span> <span class="n">R</span><span class="o">.</span><span class="p">,</span> <span class="o">&</span> <span class="k">del</span> <span class="n">Coz</span><span class="p">,</span> <span class="n">J</span><span class="o">.</span> <span class="n">J</span><span class="o">.</span> <span class="p">(</span><span class="mi">2017</span><span class="p">)</span><span class="o">.</span>
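These UCI datasets are evaluated one training+test split at a time, following a kFCV protocol (the page mentions a 2x5FCV setting). A plain-Python sketch of generating repeated k-fold indexes is shown below; `kfcv_indexes` is a hypothetical helper written for illustration, not QuaPy's actual API.

```python
import random


def kfcv_indexes(n, k=5, repeats=2, seed=0):
    """Yield (train, test) index lists for a repeated k-fold cross-validation
    protocol; repeats=2 with k=5 corresponds to the 2x5FCV setting."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)                      # new shuffle for each repetition
        folds = [idx[i::k] for i in range(k)]  # k roughly equal-sized folds
        for f in range(k):
            test = folds[f]
            train = [i for j, fold in enumerate(folds) if j != f for i in fold]
            yield train, test


pairs = list(kfcv_indexes(20, k=5, repeats=2))
print(len(pairs))  # 10 (train, test) pairs in total
```

Each repetition reshuffles the data, so the two rounds of 5-fold splits differ, which is the point of the repeated protocol.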
@@ -371,252 +355,252 @@ training+test dataset at a time, following a kFCV protocol:</p>
 <p>The above code allows conducting a 2x5FCV evaluation on the “yeast” dataset.</p>
 <p>All datasets come in numerical form (dense matrices); some statistics
 are summarized below.</p>
 <table class="docutils align-default">
 <thead>
 <tr class="row-odd"><th class="head"><p>Dataset</p></th>
 <th class="head text-center"><p>classes</p></th>
 <th class="head text-center"><p>instances</p></th>
 <th class="head text-center"><p>features</p></th>
 <th class="head text-center"><p>prev</p></th>
 <th class="head"><p>type</p></th>
 </tr>
 </thead>
 <tbody>
 <tr class="row-even"><td><p>acute.a</p></td>
 <td class="text-center"><p>2</p></td>
 <td class="text-center"><p>120</p></td>
 <td class="text-center"><p>6</p></td>
 <td class="text-center"><p>[0.508, 0.492]</p></td>
 <td><p>dense</p></td>
 </tr>
 <tr class="row-odd"><td><p>acute.b</p></td>
 <td class="text-center"><p>2</p></td>
 <td class="text-center"><p>120</p></td>
 <td class="text-center"><p>6</p></td>
 <td class="text-center"><p>[0.583, 0.417]</p></td>
 <td><p>dense</p></td>
 </tr>
 <tr class="row-even"><td><p>balance.1</p></td>
 <td class="text-center"><p>2</p></td>
 <td class="text-center"><p>625</p></td>
 <td class="text-center"><p>4</p></td>
 <td class="text-center"><p>[0.539, 0.461]</p></td>
 <td><p>dense</p></td>
 </tr>
 <tr class="row-odd"><td><p>balance.2</p></td>
 <td class="text-center"><p>2</p></td>
 <td class="text-center"><p>625</p></td>
 <td class="text-center"><p>4</p></td>
 <td class="text-center"><p>[0.922, 0.078]</p></td>
 <td><p>dense</p></td>
 </tr>
 <tr class="row-even"><td><p>balance.3</p></td>
 <td class="text-center"><p>2</p></td>
 <td class="text-center"><p>625</p></td>
 <td class="text-center"><p>4</p></td>
 <td class="text-center"><p>[0.539, 0.461]</p></td>
 <td><p>dense</p></td>
 </tr>
 <tr class="row-odd"><td><p>breast-cancer</p></td>
 <td class="text-center"><p>2</p></td>
 <td class="text-center"><p>683</p></td>
 <td class="text-center"><p>9</p></td>
 <td class="text-center"><p>[0.350, 0.650]</p></td>
 <td><p>dense</p></td>
 </tr>
 <tr class="row-even"><td><p>cmc.1</p></td>
 <td class="text-center"><p>2</p></td>
 <td class="text-center"><p>1473</p></td>
 <td class="text-center"><p>9</p></td>
 <td class="text-center"><p>[0.573, 0.427]</p></td>
 <td><p>dense</p></td>
 </tr>
 <tr class="row-odd"><td><p>cmc.2</p></td>
 <td class="text-center"><p>2</p></td>
 <td class="text-center"><p>1473</p></td>
 <td class="text-center"><p>9</p></td>
 <td class="text-center"><p>[0.774, 0.226]</p></td>
 <td><p>dense</p></td>
 </tr>
 <tr class="row-even"><td><p>cmc.3</p></td>
 <td class="text-center"><p>2</p></td>
 <td class="text-center"><p>1473</p></td>
 <td class="text-center"><p>9</p></td>
 <td class="text-center"><p>[0.653, 0.347]</p></td>
|
||||
<td class="text-center"><p>9</p></td>
|
||||
<td class="text-center"><p>[0.653, 0.347]</p></td>
|
||||
<td><p>dense</p></td>
|
||||
</tr>
|
||||
<tr class="row-odd"><td><p>ctg.1</p></td>
|
||||
<td class="text-align:center"><p>2</p></td>
|
||||
<td class="text-align:center"><p>2126</p></td>
|
||||
<td class="text-align:center"><p>22</p></td>
|
||||
<td class="text-align:center"><p>[0.222, 0.778]</p></td>
|
||||
<td class="text-center"><p>2</p></td>
|
||||
<td class="text-center"><p>2126</p></td>
|
||||
<td class="text-center"><p>22</p></td>
|
||||
<td class="text-center"><p>[0.222, 0.778]</p></td>
|
||||
<td><p>dense</p></td>
|
||||
</tr>
|
||||
<tr class="row-even"><td><p>ctg.2</p></td>
|
||||
<td class="text-align:center"><p>2</p></td>
|
||||
<td class="text-align:center"><p>2126</p></td>
|
||||
<td class="text-align:center"><p>22</p></td>
|
||||
<td class="text-align:center"><p>[0.861, 0.139]</p></td>
|
||||
<td class="text-center"><p>2</p></td>
|
||||
<td class="text-center"><p>2126</p></td>
|
||||
<td class="text-center"><p>22</p></td>
|
||||
<td class="text-center"><p>[0.861, 0.139]</p></td>
|
||||
<td><p>dense</p></td>
|
||||
</tr>
|
||||
<tr class="row-odd"><td><p>ctg.3</p></td>
|
||||
<td class="text-align:center"><p>2</p></td>
|
||||
<td class="text-align:center"><p>2126</p></td>
|
||||
<td class="text-align:center"><p>22</p></td>
|
||||
<td class="text-align:center"><p>[0.917, 0.083]</p></td>
|
||||
<td class="text-center"><p>2</p></td>
|
||||
<td class="text-center"><p>2126</p></td>
|
||||
<td class="text-center"><p>22</p></td>
|
||||
<td class="text-center"><p>[0.917, 0.083]</p></td>
|
||||
<td><p>dense</p></td>
|
||||
</tr>
|
||||
<tr class="row-even"><td><p>german</p></td>
|
||||
<td class="text-align:center"><p>2</p></td>
|
||||
<td class="text-align:center"><p>1000</p></td>
|
||||
<td class="text-align:center"><p>24</p></td>
|
||||
<td class="text-align:center"><p>[0.300, 0.700]</p></td>
|
||||
<td class="text-center"><p>2</p></td>
|
||||
<td class="text-center"><p>1000</p></td>
|
||||
<td class="text-center"><p>24</p></td>
|
||||
<td class="text-center"><p>[0.300, 0.700]</p></td>
|
||||
<td><p>dense</p></td>
|
||||
</tr>
|
||||
<tr class="row-odd"><td><p>haberman</p></td>
|
||||
<td class="text-align:center"><p>2</p></td>
|
||||
<td class="text-align:center"><p>306</p></td>
|
||||
<td class="text-align:center"><p>3</p></td>
|
||||
<td class="text-align:center"><p>[0.735, 0.265]</p></td>
|
||||
<td class="text-center"><p>2</p></td>
|
||||
<td class="text-center"><p>306</p></td>
|
||||
<td class="text-center"><p>3</p></td>
|
||||
<td class="text-center"><p>[0.735, 0.265]</p></td>
|
||||
<td><p>dense</p></td>
|
||||
</tr>
|
||||
<tr class="row-even"><td><p>ionosphere</p></td>
|
||||
<td class="text-align:center"><p>2</p></td>
|
||||
<td class="text-align:center"><p>351</p></td>
|
||||
<td class="text-align:center"><p>34</p></td>
|
||||
<td class="text-align:center"><p>[0.641, 0.359]</p></td>
|
||||
<td class="text-center"><p>2</p></td>
|
||||
<td class="text-center"><p>351</p></td>
|
||||
<td class="text-center"><p>34</p></td>
|
||||
<td class="text-center"><p>[0.641, 0.359]</p></td>
|
||||
<td><p>dense</p></td>
|
||||
</tr>
|
||||
<tr class="row-odd"><td><p>iris.1</p></td>
|
||||
<td class="text-align:center"><p>2</p></td>
|
||||
<td class="text-align:center"><p>150</p></td>
|
||||
<td class="text-align:center"><p>4</p></td>
|
||||
<td class="text-align:center"><p>[0.667, 0.333]</p></td>
|
||||
<td class="text-center"><p>2</p></td>
|
||||
<td class="text-center"><p>150</p></td>
|
||||
<td class="text-center"><p>4</p></td>
|
||||
<td class="text-center"><p>[0.667, 0.333]</p></td>
|
||||
<td><p>dense</p></td>
|
||||
</tr>
|
||||
<tr class="row-even"><td><p>iris.2</p></td>
|
||||
<td class="text-align:center"><p>2</p></td>
|
||||
<td class="text-align:center"><p>150</p></td>
|
||||
<td class="text-align:center"><p>4</p></td>
|
||||
<td class="text-align:center"><p>[0.667, 0.333]</p></td>
|
||||
<td class="text-center"><p>2</p></td>
|
||||
<td class="text-center"><p>150</p></td>
|
||||
<td class="text-center"><p>4</p></td>
|
||||
<td class="text-center"><p>[0.667, 0.333]</p></td>
|
||||
<td><p>dense</p></td>
|
||||
</tr>
|
||||
<tr class="row-odd"><td><p>iris.3</p></td>
|
||||
<td class="text-align:center"><p>2</p></td>
|
||||
<td class="text-align:center"><p>150</p></td>
|
||||
<td class="text-align:center"><p>4</p></td>
|
||||
<td class="text-align:center"><p>[0.667, 0.333]</p></td>
|
||||
<td class="text-center"><p>2</p></td>
|
||||
<td class="text-center"><p>150</p></td>
|
||||
<td class="text-center"><p>4</p></td>
|
||||
<td class="text-center"><p>[0.667, 0.333]</p></td>
|
||||
<td><p>dense</p></td>
|
||||
</tr>
|
||||
<tr class="row-even"><td><p>mammographic</p></td>
|
||||
<td class="text-align:center"><p>2</p></td>
|
||||
<td class="text-align:center"><p>830</p></td>
|
||||
<td class="text-align:center"><p>5</p></td>
|
||||
<td class="text-align:center"><p>[0.514, 0.486]</p></td>
|
||||
<td class="text-center"><p>2</p></td>
|
||||
<td class="text-center"><p>830</p></td>
|
||||
<td class="text-center"><p>5</p></td>
|
||||
<td class="text-center"><p>[0.514, 0.486]</p></td>
|
||||
<td><p>dense</p></td>
|
||||
</tr>
|
||||
<tr class="row-odd"><td><p>pageblocks.5</p></td>
|
||||
<td class="text-align:center"><p>2</p></td>
|
||||
<td class="text-align:center"><p>5473</p></td>
|
||||
<td class="text-align:center"><p>10</p></td>
|
||||
<td class="text-align:center"><p>[0.979, 0.021]</p></td>
|
||||
<td class="text-center"><p>2</p></td>
|
||||
<td class="text-center"><p>5473</p></td>
|
||||
<td class="text-center"><p>10</p></td>
|
||||
<td class="text-center"><p>[0.979, 0.021]</p></td>
|
||||
<td><p>dense</p></td>
|
||||
</tr>
|
||||
<tr class="row-even"><td><p>semeion</p></td>
|
||||
<td class="text-align:center"><p>2</p></td>
|
||||
<td class="text-align:center"><p>1593</p></td>
|
||||
<td class="text-align:center"><p>256</p></td>
|
||||
<td class="text-align:center"><p>[0.901, 0.099]</p></td>
|
||||
<td class="text-center"><p>2</p></td>
|
||||
<td class="text-center"><p>1593</p></td>
|
||||
<td class="text-center"><p>256</p></td>
|
||||
<td class="text-center"><p>[0.901, 0.099]</p></td>
|
||||
<td><p>dense</p></td>
|
||||
</tr>
|
||||
<tr class="row-odd"><td><p>sonar</p></td>
|
||||
<td class="text-align:center"><p>2</p></td>
|
||||
<td class="text-align:center"><p>208</p></td>
|
||||
<td class="text-align:center"><p>60</p></td>
|
||||
<td class="text-align:center"><p>[0.534, 0.466]</p></td>
|
||||
<td class="text-center"><p>2</p></td>
|
||||
<td class="text-center"><p>208</p></td>
|
||||
<td class="text-center"><p>60</p></td>
|
||||
<td class="text-center"><p>[0.534, 0.466]</p></td>
|
||||
<td><p>dense</p></td>
|
||||
</tr>
|
||||
<tr class="row-even"><td><p>spambase</p></td>
|
||||
<td class="text-align:center"><p>2</p></td>
|
||||
<td class="text-align:center"><p>4601</p></td>
|
||||
<td class="text-align:center"><p>57</p></td>
|
||||
<td class="text-align:center"><p>[0.606, 0.394]</p></td>
|
||||
<td class="text-center"><p>2</p></td>
|
||||
<td class="text-center"><p>4601</p></td>
|
||||
<td class="text-center"><p>57</p></td>
|
||||
<td class="text-center"><p>[0.606, 0.394]</p></td>
|
||||
<td><p>dense</p></td>
|
||||
</tr>
|
||||
<tr class="row-odd"><td><p>spectf</p></td>
|
||||
<td class="text-align:center"><p>2</p></td>
|
||||
<td class="text-align:center"><p>267</p></td>
|
||||
<td class="text-align:center"><p>44</p></td>
|
||||
<td class="text-align:center"><p>[0.794, 0.206]</p></td>
|
||||
<td class="text-center"><p>2</p></td>
|
||||
<td class="text-center"><p>267</p></td>
|
||||
<td class="text-center"><p>44</p></td>
|
||||
<td class="text-center"><p>[0.794, 0.206]</p></td>
|
||||
<td><p>dense</p></td>
|
||||
</tr>
|
||||
<tr class="row-even"><td><p>tictactoe</p></td>
|
||||
<td class="text-align:center"><p>2</p></td>
|
||||
<td class="text-align:center"><p>958</p></td>
|
||||
<td class="text-align:center"><p>9</p></td>
|
||||
<td class="text-align:center"><p>[0.653, 0.347]</p></td>
|
||||
<td class="text-center"><p>2</p></td>
|
||||
<td class="text-center"><p>958</p></td>
|
||||
<td class="text-center"><p>9</p></td>
|
||||
<td class="text-center"><p>[0.653, 0.347]</p></td>
|
||||
<td><p>dense</p></td>
|
||||
</tr>
|
||||
<tr class="row-odd"><td><p>transfusion</p></td>
|
||||
<td class="text-align:center"><p>2</p></td>
|
||||
<td class="text-align:center"><p>748</p></td>
|
||||
<td class="text-align:center"><p>4</p></td>
|
||||
<td class="text-align:center"><p>[0.762, 0.238]</p></td>
|
||||
<td class="text-center"><p>2</p></td>
|
||||
<td class="text-center"><p>748</p></td>
|
||||
<td class="text-center"><p>4</p></td>
|
||||
<td class="text-center"><p>[0.762, 0.238]</p></td>
|
||||
<td><p>dense</p></td>
|
||||
</tr>
|
||||
<tr class="row-even"><td><p>wdbc</p></td>
|
||||
<td class="text-align:center"><p>2</p></td>
|
||||
<td class="text-align:center"><p>569</p></td>
|
||||
<td class="text-align:center"><p>30</p></td>
|
||||
<td class="text-align:center"><p>[0.627, 0.373]</p></td>
|
||||
<td class="text-center"><p>2</p></td>
|
||||
<td class="text-center"><p>569</p></td>
|
||||
<td class="text-center"><p>30</p></td>
|
||||
<td class="text-center"><p>[0.627, 0.373]</p></td>
|
||||
<td><p>dense</p></td>
|
||||
</tr>
|
||||
<tr class="row-odd"><td><p>wine.1</p></td>
|
||||
<td class="text-align:center"><p>2</p></td>
|
||||
<td class="text-align:center"><p>178</p></td>
|
||||
<td class="text-align:center"><p>13</p></td>
|
||||
<td class="text-align:center"><p>[0.669, 0.331]</p></td>
|
||||
<td class="text-center"><p>2</p></td>
|
||||
<td class="text-center"><p>178</p></td>
|
||||
<td class="text-center"><p>13</p></td>
|
||||
<td class="text-center"><p>[0.669, 0.331]</p></td>
|
||||
<td><p>dense</p></td>
|
||||
</tr>
|
||||
<tr class="row-even"><td><p>wine.2</p></td>
|
||||
<td class="text-align:center"><p>2</p></td>
|
||||
<td class="text-align:center"><p>178</p></td>
|
||||
<td class="text-align:center"><p>13</p></td>
|
||||
<td class="text-align:center"><p>[0.601, 0.399]</p></td>
|
||||
<td class="text-center"><p>2</p></td>
|
||||
<td class="text-center"><p>178</p></td>
|
||||
<td class="text-center"><p>13</p></td>
|
||||
<td class="text-center"><p>[0.601, 0.399]</p></td>
|
||||
<td><p>dense</p></td>
|
||||
</tr>
|
||||
<tr class="row-odd"><td><p>wine.3</p></td>
|
||||
<td class="text-align:center"><p>2</p></td>
|
||||
<td class="text-align:center"><p>178</p></td>
|
||||
<td class="text-align:center"><p>13</p></td>
|
||||
<td class="text-align:center"><p>[0.730, 0.270]</p></td>
|
||||
<td class="text-center"><p>2</p></td>
|
||||
<td class="text-center"><p>178</p></td>
|
||||
<td class="text-center"><p>13</p></td>
|
||||
<td class="text-center"><p>[0.730, 0.270]</p></td>
|
||||
<td><p>dense</p></td>
|
||||
</tr>
|
||||
<tr class="row-even"><td><p>wine-q-red</p></td>
|
||||
<td class="text-align:center"><p>2</p></td>
|
||||
<td class="text-align:center"><p>1599</p></td>
|
||||
<td class="text-align:center"><p>11</p></td>
|
||||
<td class="text-align:center"><p>[0.465, 0.535]</p></td>
|
||||
<td class="text-center"><p>2</p></td>
|
||||
<td class="text-center"><p>1599</p></td>
|
||||
<td class="text-center"><p>11</p></td>
|
||||
<td class="text-center"><p>[0.465, 0.535]</p></td>
|
||||
<td><p>dense</p></td>
|
||||
</tr>
|
||||
<tr class="row-odd"><td><p>wine-q-white</p></td>
|
||||
<td class="text-align:center"><p>2</p></td>
|
||||
<td class="text-align:center"><p>4898</p></td>
|
||||
<td class="text-align:center"><p>11</p></td>
|
||||
<td class="text-align:center"><p>[0.335, 0.665]</p></td>
|
||||
<td class="text-center"><p>2</p></td>
|
||||
<td class="text-center"><p>4898</p></td>
|
||||
<td class="text-center"><p>11</p></td>
|
||||
<td class="text-center"><p>[0.335, 0.665]</p></td>
|
||||
<td><p>dense</p></td>
|
||||
</tr>
|
||||
<tr class="row-even"><td><p>yeast</p></td>
|
||||
<td class="text-align:center"><p>2</p></td>
|
||||
<td class="text-align:center"><p>1484</p></td>
|
||||
<td class="text-align:center"><p>8</p></td>
|
||||
<td class="text-align:center"><p>[0.711, 0.289]</p></td>
|
||||
<td class="text-center"><p>2</p></td>
|
||||
<td class="text-center"><p>1484</p></td>
|
||||
<td class="text-center"><p>8</p></td>
|
||||
<td class="text-center"><p>[0.711, 0.289]</p></td>
|
||||
<td><p>dense</p></td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
<section id="issues">
<h3>Issues:<a class="headerlink" href="#issues" title="Permalink to this heading">¶</a></h3>
<p>All datasets will be downloaded automatically the first time they are requested, and
stored in the <em>quapy_data</em> folder for faster further reuse.
However, some datasets require special actions that at the moment are not fully
automated:</p>
<ul class="simple">
<li><p>Some datasets are distributed in compressed formats that cannot be handled with
standard Python packages like gzip or zip; these files need to be uncompressed using
OS-dependent software manually. Information on how to do it will be printed the first
time the dataset is invoked.</p></li>
</ul>
</section>
</section>
<section id="lequa-datasets">
<h2>LeQua Datasets<a class="headerlink" href="#lequa-datasets" title="Permalink to this heading">¶</a></h2>
<p>QuaPy also provides the datasets used for the LeQua competition.
In brief, there are 4 tasks (T1A, T1B, T2A, T2B) having to do with text quantification
problems. Tasks T1A and T1B provide documents in vector form, while T2A and T2B provide
raw documents instead.
Tasks T1A and T2A are binary sentiment quantification problems, while T1B and T2B
are multiclass quantification problems consisting of estimating the class prevalence
values of 28 different merchandise products.</p>
<p>Every task consists of a training set, a set of validation samples (for model selection)
and a set of test samples (for evaluation). QuaPy returns this data as a LabelledCollection
(training) and two generation protocols (for validation and test samples), as follows:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">training</span><span class="p">,</span> <span class="n">val_generator</span><span class="p">,</span> <span class="n">test_generator</span> <span class="o">=</span> <span class="n">fetch_lequa2022</span><span class="p">(</span><span class="n">task</span><span class="o">=</span><span class="n">task</span><span class="p">)</span>
</pre></div>
</div>
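<p>Conceptually, each generation protocol behaves like an iterator over (sample, prevalence) pairs drawn at controlled prevalence values. The following plain-numpy mock sketches that interface for the binary case; it is an illustration only, not QuaPy's actual protocol classes, and all names in it are hypothetical:</p>

```python
import numpy as np

def mock_app_protocol(X, y, sample_size=250, repeats=3, seed=0):
    # Illustrative artificial-prevalence protocol (hypothetical, NOT QuaPy code):
    # draw samples at predefined positive-class prevalence values and yield
    # (sample, prevalence_vector) pairs, mimicking how QuaPy's generation
    # protocols are consumed.
    rng = np.random.default_rng(seed)
    pos = np.where(y == 1)[0]
    neg = np.where(y == 0)[0]
    for prev in np.linspace(0.1, 0.9, repeats):
        n_pos = int(round(sample_size * prev))
        idx = np.concatenate([rng.choice(pos, n_pos, replace=True),
                              rng.choice(neg, sample_size - n_pos, replace=True)])
        yield X[idx], np.array([1 - prev, prev])

# toy data: 1000 instances with alternating labels
X = np.arange(1000).reshape(-1, 1)
y = np.arange(1000) % 2
samples = list(mock_app_protocol(X, y))
```

<p>A quantifier is then evaluated by comparing its estimate on each yielded sample against the yielded true prevalence vector.</p>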
<p>See the <code class="docutils literal notranslate"><span class="pre">lequa2022_experiments.py</span></code> script in the examples folder for further details on how to
carry out experiments using these datasets.</p>
<p>The datasets are downloaded only once, and stored for fast reuse.</p>
<p>Some statistics are summarized below:</p>
<table class="docutils align-default">
<thead>
<tr class="row-odd"><th class="head"><p>Dataset</p></th>
<th class="head text-center"><p>classes</p></th>
<th class="head text-center"><p>train size</p></th>
<th class="head text-center"><p>validation samples</p></th>
<th class="head text-center"><p>test samples</p></th>
<th class="head text-center"><p>docs by sample</p></th>
<th class="head text-center"><p>type</p></th>
</tr>
</thead>
<tbody>
<tr class="row-even"><td><p>T1A</p></td>
<td class="text-center"><p>2</p></td>
<td class="text-center"><p>5000</p></td>
<td class="text-center"><p>1000</p></td>
<td class="text-center"><p>5000</p></td>
<td class="text-center"><p>250</p></td>
<td class="text-center"><p>vector</p></td>
</tr>
<tr class="row-odd"><td><p>T1B</p></td>
<td class="text-center"><p>28</p></td>
<td class="text-center"><p>20000</p></td>
<td class="text-center"><p>1000</p></td>
<td class="text-center"><p>5000</p></td>
<td class="text-center"><p>1000</p></td>
<td class="text-center"><p>vector</p></td>
</tr>
<tr class="row-even"><td><p>T2A</p></td>
<td class="text-center"><p>2</p></td>
<td class="text-center"><p>5000</p></td>
<td class="text-center"><p>1000</p></td>
<td class="text-center"><p>5000</p></td>
<td class="text-center"><p>250</p></td>
<td class="text-center"><p>text</p></td>
</tr>
<tr class="row-odd"><td><p>T2B</p></td>
<td class="text-center"><p>28</p></td>
<td class="text-center"><p>20000</p></td>
<td class="text-center"><p>1000</p></td>
<td class="text-center"><p>5000</p></td>
<td class="text-center"><p>1000</p></td>
<td class="text-center"><p>text</p></td>
</tr>
</tbody>
</table>
<p>For further details on the datasets, we refer to the original
<a class="reference external" href="https://ceur-ws.org/Vol-3180/paper-146.pdf">paper</a>:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">Esuli</span><span class="p">,</span> <span class="n">A</span><span class="o">.</span><span class="p">,</span> <span class="n">Moreo</span><span class="p">,</span> <span class="n">A</span><span class="o">.</span><span class="p">,</span> <span class="n">Sebastiani</span><span class="p">,</span> <span class="n">F</span><span class="o">.</span><span class="p">,</span> <span class="o">&</span> <span class="n">Sperduti</span><span class="p">,</span> <span class="n">G</span><span class="o">.</span> <span class="p">(</span><span class="mi">2022</span><span class="p">)</span><span class="o">.</span>
<span class="n">A</span> <span class="n">Detailed</span> <span class="n">Overview</span> <span class="n">of</span> <span class="n">LeQua</span><span class="o">@</span> <span class="n">CLEF</span> <span class="mi">2022</span><span class="p">:</span> <span class="n">Learning</span> <span class="n">to</span> <span class="n">Quantify</span><span class="o">.</span>
</pre></div>
</div>
</section>
<section id="adding-custom-datasets">
<h2>Adding Custom Datasets<a class="headerlink" href="#adding-custom-datasets" title="Permalink to this heading">¶</a></h2>
<p>QuaPy provides data loaders for simple formats dealing with
text, following the format:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">class</span><span class="o">-</span><span class="nb">id</span> \<span class="n">t</span> <span class="n">first</span> <span class="n">document</span><span class="s1">'s pre-processed text </span><span class="se">\n</span>
</pre></div>
</div>
<p>One can also define a custom loader function and pass it, together with the data
paths, in order to create a training and test pair of <em>LabelledCollection</em>,
e.g.:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">quapy</span> <span class="k">as</span> <span class="nn">qp</span>

<span class="n">train_path</span> <span class="o">=</span> <span class="s1">'../my_data/train.dat'</span>
<span class="n">test_path</span> <span class="o">=</span> <span class="s1">'../my_data/test.dat'</span>

<span class="k">def</span> <span class="nf">my_custom_loader</span><span class="p">(</span><span class="n">path</span><span class="p">):</span>
    <span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="n">path</span><span class="p">,</span> <span class="s1">'rb'</span><span class="p">)</span> <span class="k">as</span> <span class="n">fin</span><span class="p">:</span>
        <span class="o">...</span>
    <span class="k">return</span> <span class="n">instances</span><span class="p">,</span> <span class="n">labels</span>

<span class="n">data</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">data</span><span class="o">.</span><span class="n">Dataset</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="n">train_path</span><span class="p">,</span> <span class="n">test_path</span><span class="p">,</span> <span class="n">my_custom_loader</span><span class="p">)</span>
</pre></div>
</div>
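<p>As a concrete, hypothetical instance of the loader sketched above, the following function parses the "class-id, tab, document text" line format described earlier; the parsing details are assumptions about that format, not QuaPy code:</p>

```python
def parse_dataset(lines):
    # parse lines in "<class-id>\t<document text>" format into
    # (instances, labels); a sketch of what a custom loader might return
    instances, labels = [], []
    for line in lines:
        line = line.rstrip('\n')
        if not line:
            continue  # skip blank lines
        label, text = line.split('\t', maxsplit=1)
        instances.append(text)
        labels.append(int(label))
    return instances, labels

def my_custom_loader(path):
    # read a file and delegate to the parser above
    with open(path, 'rt', encoding='utf-8') as fin:
        return parse_dataset(fin)

docs, y = parse_dataset(["1\tgreat product, would buy again\n",
                         "0\tarrived broken\n"])
```

<p>The resulting (instances, labels) pair is exactly what the loader passed to <em>Dataset.load</em> is expected to return.</p>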
<section id="data-processing">
<h3>Data Processing<a class="headerlink" href="#data-processing" title="Permalink to this heading">¶</a></h3>
<p>QuaPy implements a number of preprocessing functions in the package <em>qp.data.preprocessing</em>, including:</p>
<ul class="simple">
<li><p><em>text2tfidf</em>: tfidf vectorization</p></li>
<li><p><em>standardize</em>: standardizes the column values (i.e., transforms the data so
that the column values have zero mean and unit variance).</p></li>
<li><p><em>index</em>: transforms textual tokens into lists of numeric ids</p></li>
</ul>
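<p>For intuition on what <em>text2tfidf</em> computes, here is a toy re-implementation of the tf-idf weighting scheme; it is for illustration only, and is not QuaPy's implementation (which operates on its Dataset objects):</p>

```python
import math
from collections import Counter

def toy_tfidf(corpus):
    # toy tf-idf weighting: tf(term, doc) * log(N / df(term));
    # illustrates the kind of transformation performed by text2tfidf
    docs = [doc.lower().split() for doc in corpus]
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vocab = sorted(df)
    idf = {t: math.log(n / df[t]) for t in vocab}
    return vocab, [[Counter(doc)[t] * idf[t] for t in vocab] for doc in docs]

vocab, M = toy_tfidf(["good movie", "bad movie"])
```

<p>Note that the term "movie", which appears in every document, receives weight zero, while the discriminative terms keep a positive weight.</p>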
</section>
</section>
</section>
<div class="clearer"></div>
</div>
<div class="sphinxsidebar" role="navigation" aria-label="main navigation">
<div class="sphinxsidebarwrapper">
<div>
<h3><a href="index.html">Table of Contents</a></h3>
<ul>
<li><a class="reference internal" href="#">Datasets</a><ul>
<li><a class="reference internal" href="#reviews-datasets">Reviews Datasets</a></li>
<li><a class="reference internal" href="#twitter-sentiment-datasets">Twitter Sentiment Datasets</a></li>
<li><a class="reference internal" href="#issues">Issues:</a></li>
</ul>
</li>
<li><a class="reference internal" href="#lequa-datasets">LeQua Datasets</a></li>
<li><a class="reference internal" href="#adding-custom-datasets">Adding Custom Datasets</a><ul>
<li><a class="reference internal" href="#data-processing">Data Processing</a></li>
</ul>
</li>
</ul>
</div>
<div>
<h4>Previous topic</h4>
<p class="topless"><a href="Installation.html"
title="previous chapter">Installation</a></p>
</div>
<div>
<h4>Next topic</h4>
<p class="topless"><a href="Evaluation.html"
title="next chapter">Evaluation</a></p>
</div>
<div role="note" aria-label="source link">
<h3>This Page</h3>
<ul class="this-page-menu">
</form>
</div>
</div>
<script>document.getElementById('searchbox').style.display = "block"</script>
</div>
</div>
<div class="clearer"></div>
<a href="py-modindex.html" title="Python Module Index"
>modules</a> |</li>
<li class="right" >
<a href="Evaluation.html" title="Evaluation"
>next</a> |</li>
<li class="right" >
<a href="Installation.html" title="Installation"
>previous</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.7 documentation</a> »</li>
<li class="nav-item nav-item-this"><a href="">Datasets</a></li>
</ul>
</div>
<div class="footer" role="contentinfo">
© Copyright 2021, Alejandro Moreo.
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 5.3.0.
</div>
</body>
</html>
<!doctype html>

<html lang="en">
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="generator" content="Docutils 0.19: https://docutils.sourceforge.io/" />

<title>Evaluation — QuaPy 0.1.7 documentation</title>
<link rel="stylesheet" type="text/css" href="_static/pygments.css" />
<link rel="stylesheet" type="text/css" href="_static/bizstyle.css" />

<script data-url_root="./" id="documentation_options" src="_static/documentation_options.js"></script>
<script src="_static/jquery.js"></script>
<script src="_static/underscore.js"></script>
<script src="_static/_sphinx_javascript_frameworks_compat.js"></script>
<script src="_static/doctools.js"></script>
<script src="_static/sphinx_highlight.js"></script>
<script src="_static/bizstyle.js"></script>
<link rel="index" title="Index" href="genindex.html" />
<link rel="search" title="Search" href="search.html" />
<link rel="next" title="Protocols" href="Protocols.html" />
<link rel="prev" title="Datasets" href="Datasets.html" />
<meta name="viewport" content="width=device-width,initial-scale=1.0" />
<!--[if lt IE 9]>
<a href="py-modindex.html" title="Python Module Index"
>modules</a> |</li>
<li class="right" >
<a href="Protocols.html" title="Protocols"
accesskey="N">next</a> |</li>
<li class="right" >
<a href="Datasets.html" title="Datasets"
accesskey="P">previous</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.7 documentation</a> »</li>
<li class="nav-item nav-item-this"><a href="">Evaluation</a></li>
</ul>
</div>
<div class="bodywrapper">
<div class="body" role="main">

<section id="evaluation">
<h1>Evaluation<a class="headerlink" href="#evaluation" title="Permalink to this heading">¶</a></h1>
<p>Quantification is an appealing tool in scenarios of dataset shift,
and particularly in scenarios of prior-probability shift.
That is, the interest in estimating the class prevalences arises
to be unlikely (as is the case in general scenarios of
machine learning governed by the iid assumption).
In brief, quantification requires dedicated evaluation protocols,
which are implemented in QuaPy and explained here.</p>
<section id="error-measures">
<h2>Error Measures<a class="headerlink" href="#error-measures" title="Permalink to this heading">¶</a></h2>
<p>The module quapy.error implements the following error measures for quantification:</p>
<ul class="simple">
<li><p><em>mae</em>: mean absolute error</p></li>
<p>Traditionally, this value is set to 1/(2T) in past literature,
with T the sampling size. One could either pass this value
to the function each time, or set QuaPy’s environment
variable <em>SAMPLE_SIZE</em> once, and omit this argument
thereafter (recommended);
e.g.:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">qp</span><span class="o">.</span><span class="n">environ</span><span class="p">[</span><span class="s1">'SAMPLE_SIZE'</span><span class="p">]</span> <span class="o">=</span> <span class="mi">100</span> <span class="c1"># once for all</span>
<span class="n">true_prev</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">asarray</span><span class="p">([</span><span class="mf">0.5</span><span class="p">,</span> <span class="mf">0.3</span><span class="p">,</span> <span class="mf">0.2</span><span class="p">])</span> <span class="c1"># let's assume 3 classes</span>
<span class="n">estim_prev</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">asarray</span><span class="p">([</span><span class="mf">0.1</span><span class="p">,</span> <span class="mf">0.3</span><span class="p">,</span> <span class="mf">0.6</span><span class="p">])</span>
<span class="n">error</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">error</span><span class="o">.</span><span class="n">mrae</span><span class="p">(</span><span class="n">true_prev</span><span class="p">,</span> <span class="n">estim_prev</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s1">'mrae(</span><span class="si">{</span><span class="n">true_prev</span><span class="si">}</span><span class="s1">, </span><span class="si">{</span><span class="n">estim_prev</span><span class="si">}</span><span class="s1">) = </span><span class="si">{</span><span class="n">error</span><span class="si">:</span><span class="s1">.3f</span><span class="si">}</span><span class="s1">'</span><span class="p">)</span>
</pre></div>
</div>
|
||||
|
|
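<p>As a cross-check of what <em>mrae</em> computes, the following sketch reimplements the measure from its textbook definition, including the additive smoothing term eps = 1/(2T) discussed above. This is an illustrative reimplementation under those assumptions, not QuaPy's actual code:</p>

```python
import numpy as np

def smoothed_mrae(true_prev, estim_prev, sample_size=100):
    # Additive smoothing with eps = 1/(2T), T being the sample size,
    # to avoid divisions by zero when a true prevalence is 0.
    eps = 1.0 / (2 * sample_size)
    n_classes = true_prev.shape[-1]
    t = (true_prev + eps) / (1 + eps * n_classes)
    p = (estim_prev + eps) / (1 + eps * n_classes)
    # Relative absolute error, averaged over classes.
    return float(np.mean(np.abs(p - t) / t))

true_prev = np.asarray([0.5, 0.3, 0.2])
estim_prev = np.asarray([0.1, 0.3, 0.6])
print(f'{smoothed_mrae(true_prev, estim_prev):.3f}')
```

<p>Note how the smoothing barely changes well-behaved prevalences but makes the ratio well defined when a class has zero true prevalence.</p>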
@ -112,150 +115,95 @@ e.g.:</p>
</div>
<p>Finally, it is possible to instantiate QuaPy’s quantification
error functions from strings using, e.g.:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">error_function</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">ae_</span><span class="o">.</span><span class="n">from_name</span><span class="p">(</span><span class="s1">'mse'</span><span class="p">)</span>
|
||||
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">error_function</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">error</span><span class="o">.</span><span class="n">from_name</span><span class="p">(</span><span class="s1">'mse'</span><span class="p">)</span>
|
||||
<span class="n">error</span> <span class="o">=</span> <span class="n">error_function</span><span class="p">(</span><span class="n">true_prev</span><span class="p">,</span> <span class="n">estim_prev</span><span class="p">)</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
</div>
|
||||
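<p>The string-based instantiation above can be pictured as a simple name-to-function registry. The following is a minimal sketch of that idea; the registry and the two measures below are illustrative stand-ins, not QuaPy's internals:</p>

```python
import numpy as np

def mae(true_prev, estim_prev):
    # Mean absolute error between two prevalence vectors.
    return float(np.mean(np.abs(estim_prev - true_prev)))

def mse(true_prev, estim_prev):
    # Mean squared error between two prevalence vectors.
    return float(np.mean((estim_prev - true_prev) ** 2))

# Minimal stand-in for from_name: a lookup table mapping measure
# names to the functions implementing them.
ERROR_BY_NAME = {'mae': mae, 'mse': mse}

def from_name(name):
    return ERROR_BY_NAME[name]

error_function = from_name('mse')
error = error_function(np.asarray([0.5, 0.3, 0.2]), np.asarray([0.1, 0.3, 0.6]))
```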
<div class="section" id="evaluation-protocols">
<h2>Evaluation Protocols<a class="headerlink" href="#evaluation-protocols" title="Permalink to this headline">¶</a></h2>
<p>QuaPy implements the so-called “artificial sampling protocol”,
according to which a test set is used to generate samples of
fixed size, at desired prevalences covering the full spectrum
of prevalences. This protocol is called “artificial” in contrast
to the “natural prevalence sampling” protocol that,
despite introducing some variability during sampling, approximately
preserves the training class prevalence.</p>
<p>In the artificial sampling protocol, the user specifies the number
of (equally spaced) points to be generated from the interval [0,1].</p>
<p>For example, if n_prevpoints=11 then, for each class, the prevalences
[0., 0.1, 0.2, …, 1.] will be used. This means that, for two classes,
the number of different prevalences will be 11 (since, once the prevalence
of one class is determined, the other one is constrained). For 3 classes,
the number of valid combinations can be obtained as 11 + 10 + … + 1 = 66.
In general, the number of valid combinations that will be produced for a given
value of n_prevpoints can be consulted by invoking
quapy.functional.num_prevalence_combinations, e.g.:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">quapy.functional</span> <span class="k">as</span> <span class="nn">F</span>
|
||||
<span class="n">n_prevpoints</span> <span class="o">=</span> <span class="mi">21</span>
|
||||
<span class="n">n_classes</span> <span class="o">=</span> <span class="mi">4</span>
|
||||
<span class="n">n</span> <span class="o">=</span> <span class="n">F</span><span class="o">.</span><span class="n">num_prevalence_combinations</span><span class="p">(</span><span class="n">n_prevpoints</span><span class="p">,</span> <span class="n">n_classes</span><span class="p">,</span> <span class="n">n_repeats</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
</section>
<section id="evaluation-protocols">
<h2>Evaluation Protocols<a class="headerlink" href="#evaluation-protocols" title="Permalink to this heading">¶</a></h2>
<p>An <em>evaluation protocol</em> is an evaluation procedure that uses
one specific <em>sample generation protocol</em> to generate many
samples, typically characterized by widely varying amounts of
<em>shift</em> with respect to the original distribution, which are then
used to evaluate the performance of a (trained) quantifier.
These protocols are explained in more detail in a dedicated <a class="reference internal" href="Protocols.html"><span class="doc std std-doc">entry
in the wiki</span></a>. For the time being, let us assume we have already
chosen and instantiated one specific such protocol, which we here
simply call <em>prot</em>. Let us also assume our model is called
<em>quantifier</em> and that our evaluation measure of choice is
<em>mae</em>. The evaluation comes down to:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">mae</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">evaluation</span><span class="o">.</span><span class="n">evaluate</span><span class="p">(</span><span class="n">quantifier</span><span class="p">,</span> <span class="n">protocol</span><span class="o">=</span><span class="n">prot</span><span class="p">,</span> <span class="n">error_metric</span><span class="o">=</span><span class="s1">'mae'</span><span class="p">)</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s1">'MAE = </span><span class="si">{</span><span class="n">mae</span><span class="si">:</span><span class="s1">.4f</span><span class="si">}</span><span class="s1">'</span><span class="p">)</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
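<p>Conceptually, the evaluate call above boils down to drawing every sample the protocol generates, quantifying it, and averaging the per-sample error. A toy sketch of that loop follows; the protocol and quantifier below are illustrative stand-ins, not QuaPy objects:</p>

```python
import numpy as np

def mae(true_prev, estim_prev):
    return float(np.mean(np.abs(np.asarray(estim_prev) - np.asarray(true_prev))))

def evaluate(quantifier, protocol, error_metric=mae):
    # Quantify each (sample, true_prevalence) pair yielded by the
    # protocol, then average the chosen error measure over samples.
    errors = [error_metric(true_prev, quantifier(sample))
              for sample, true_prev in protocol]
    return float(np.mean(errors))

# Toy protocol: two samples with known prevalences; a constant quantifier.
toy_protocol = [(None, [0.5, 0.5]), (None, [0.8, 0.2])]
constant_quantifier = lambda sample: [0.6, 0.4]
result = evaluate(constant_quantifier, toy_protocol)
```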
<p>in this example, n=1771. Note the last argument, n_repeats, which
specifies the number of samples that will be generated for any
valid combination (typical values are, e.g., 1 for a single sample,
or 10 or higher for computing standard deviations or performing statistical
significance tests).</p>
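<p>The value n=1771 follows from a stars-and-bars count: the number of prevalence vectors on a grid of n_prevpoints values per class that sum to 1 is C(n_prevpoints + n_classes - 2, n_classes - 1). A short sketch reproducing the numbers above (an illustrative reimplementation of the helper, not QuaPy's code):</p>

```python
from math import comb

def num_prevalence_combinations(n_prevpoints, n_classes, n_repeats=1):
    # Stars-and-bars: count the vectors of n_classes non-negative grid
    # steps summing to n_prevpoints - 1, times the repeats per vector.
    return comb(n_prevpoints + n_classes - 2, n_classes - 1) * n_repeats

print(num_prevalence_combinations(21, 4))  # the n=1771 case above
```

<p>The binary case recovers the intuition from the text: with n_prevpoints=11 and 2 classes there are exactly 11 combinations, and with 3 classes there are 66.</p>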
<p>One can instead work the other way around, i.e., one could set a
maximum budget of evaluations and get the number of prevalence points that
will generate a number of evaluations close to, but not higher than,
the fixed budget. This can be achieved with the function
quapy.functional.get_nprevpoints_approximation, e.g.:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">budget</span> <span class="o">=</span> <span class="mi">5000</span>
|
||||
<span class="n">n_prevpoints</span> <span class="o">=</span> <span class="n">F</span><span class="o">.</span><span class="n">get_nprevpoints_approximation</span><span class="p">(</span><span class="n">budget</span><span class="p">,</span> <span class="n">n_classes</span><span class="p">,</span> <span class="n">n_repeats</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
|
||||
<span class="n">n</span> <span class="o">=</span> <span class="n">F</span><span class="o">.</span><span class="n">num_prevalence_combinations</span><span class="p">(</span><span class="n">n_prevpoints</span><span class="p">,</span> <span class="n">n_classes</span><span class="p">,</span> <span class="n">n_repeats</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s1">'by setting n_prevpoints=</span><span class="si">{</span><span class="n">n_prevpoints</span><span class="si">}</span><span class="s1"> the number of evaluations for </span><span class="si">{</span><span class="n">n_classes</span><span class="si">}</span><span class="s1"> classes will be </span><span class="si">{</span><span class="n">n</span><span class="si">}</span><span class="s1">'</span><span class="p">)</span>
|
||||
<p>It is often desirable to evaluate our system using more than one
evaluation measure. In this case, it is convenient to generate
a <em>report</em>. A report in QuaPy is a dataframe accounting for all the
true prevalence values together with the prevalence values
estimated by the quantifier, along with the error each has given
rise to:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">report</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">evaluation</span><span class="o">.</span><span class="n">evaluation_report</span><span class="p">(</span><span class="n">quantifier</span><span class="p">,</span> <span class="n">protocol</span><span class="o">=</span><span class="n">prot</span><span class="p">,</span> <span class="n">error_metrics</span><span class="o">=</span><span class="p">[</span><span class="s1">'mae'</span><span class="p">,</span> <span class="s1">'mrae'</span><span class="p">,</span> <span class="s1">'mkld'</span><span class="p">])</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
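<p>A report of this kind can be pictured as one dataframe row per test sample, with one column per error measure. The following is a minimal sketch of the idea; the helper name and its signature are hypothetical, not QuaPy's implementation:</p>

```python
import numpy as np
import pandas as pd

def toy_evaluation_report(true_prevs, estim_prevs, error_metrics):
    # One row per sample: true and estimated prevalences, plus one
    # column per requested error measure.
    rows = []
    for t, p in zip(true_prevs, estim_prevs):
        t, p = np.asarray(t), np.asarray(p)
        row = {'true-prev': t, 'estim-prev': p}
        for name, fn in error_metrics.items():
            row[name] = fn(t, p)
        rows.append(row)
    return pd.DataFrame(rows)

report = toy_evaluation_report(
    [[0.5, 0.5], [0.8, 0.2]],
    [[0.6, 0.4], [0.7, 0.3]],
    {'mae': lambda t, p: float(np.mean(np.abs(p - t)))})
```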
<p>that will print:</p>
|
||||
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">by</span> <span class="n">setting</span> <span class="n">n_prevpoints</span><span class="o">=</span><span class="mi">30</span> <span class="n">the</span> <span class="n">number</span> <span class="n">of</span> <span class="n">evaluations</span> <span class="k">for</span> <span class="mi">4</span> <span class="n">classes</span> <span class="n">will</span> <span class="n">be</span> <span class="mi">4960</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
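<p>The figure above (n_prevpoints=30 yielding 4960 evaluations for a budget of 5000) can be reproduced by a short search over the closed-form combination count. This is an illustrative reimplementation of the helper, not QuaPy's code:</p>

```python
from math import comb

def num_combinations(n_prevpoints, n_classes, n_repeats=1):
    # Stars-and-bars count of grid prevalence vectors summing to 1.
    return comb(n_prevpoints + n_classes - 2, n_classes - 1) * n_repeats

def get_nprevpoints_approximation(budget, n_classes, n_repeats=1):
    # Largest n_prevpoints whose evaluation count stays within budget.
    n_prevpoints = 1
    while num_combinations(n_prevpoints + 1, n_classes, n_repeats) <= budget:
        n_prevpoints += 1
    return n_prevpoints

n_prevpoints = get_nprevpoints_approximation(5000, n_classes=4)
```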
<p>The cost of evaluation will depend on the values of <em>n_prevpoints</em>, <em>n_classes</em>,
and <em>n_repeats</em>. Since it might sometimes be cumbersome to control the overall
cost of an experiment via the number of combinations that
will be generated for a particular setting of these arguments (particularly
when <em>n_classes>2</em>), evaluation functions
typically allow the user to instead specify an <em>evaluation budget</em>, i.e., a maximum
number of samples to generate. By specifying this argument, one can avoid
specifying <em>n_prevpoints</em>: the value leading to a number of
evaluations as close as possible to the budget, without surpassing it, will be set automatically.</p>
<p>The following script shows a full example in which a PACC model relying
on a Logistic Regression classifier is
tested on the <em>kindle</em> dataset by means of the artificial prevalence
sampling protocol on samples of size 500, in terms of various
evaluation metrics.</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">quapy</span> <span class="k">as</span> <span class="nn">qp</span>
|
||||
<span class="kn">import</span> <span class="nn">quapy.functional</span> <span class="k">as</span> <span class="nn">F</span>
|
||||
<span class="kn">from</span> <span class="nn">sklearn.linear_model</span> <span class="kn">import</span> <span class="n">LogisticRegression</span>
|
||||
<p>From a pandas dataframe, it is straightforward to visualize all the results,
and compute the averaged values, e.g.:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">pd</span><span class="o">.</span><span class="n">set_option</span><span class="p">(</span><span class="s1">'display.expand_frame_repr'</span><span class="p">,</span> <span class="kc">False</span><span class="p">)</span>
|
||||
<span class="n">report</span><span class="p">[</span><span class="s1">'estim-prev'</span><span class="p">]</span> <span class="o">=</span> <span class="n">report</span><span class="p">[</span><span class="s1">'estim-prev'</span><span class="p">]</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="n">F</span><span class="o">.</span><span class="n">strprev</span><span class="p">)</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="n">report</span><span class="p">)</span>
|
||||
|
||||
<span class="n">qp</span><span class="o">.</span><span class="n">environ</span><span class="p">[</span><span class="s1">'SAMPLE_SIZE'</span><span class="p">]</span> <span class="o">=</span> <span class="mi">500</span>
|
||||
|
||||
<span class="n">dataset</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">datasets</span><span class="o">.</span><span class="n">fetch_reviews</span><span class="p">(</span><span class="s1">'kindle'</span><span class="p">)</span>
|
||||
<span class="n">qp</span><span class="o">.</span><span class="n">data</span><span class="o">.</span><span class="n">preprocessing</span><span class="o">.</span><span class="n">text2tfidf</span><span class="p">(</span><span class="n">dataset</span><span class="p">,</span> <span class="n">min_df</span><span class="o">=</span><span class="mi">5</span><span class="p">,</span> <span class="n">inplace</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
|
||||
|
||||
<span class="n">training</span> <span class="o">=</span> <span class="n">dataset</span><span class="o">.</span><span class="n">training</span>
|
||||
<span class="n">test</span> <span class="o">=</span> <span class="n">dataset</span><span class="o">.</span><span class="n">test</span>
|
||||
|
||||
<span class="n">lr</span> <span class="o">=</span> <span class="n">LogisticRegression</span><span class="p">()</span>
|
||||
<span class="n">pacc</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">method</span><span class="o">.</span><span class="n">aggregative</span><span class="o">.</span><span class="n">PACC</span><span class="p">(</span><span class="n">lr</span><span class="p">)</span>
|
||||
|
||||
<span class="n">pacc</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">training</span><span class="p">)</span>
|
||||
|
||||
<span class="n">df</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">evaluation</span><span class="o">.</span><span class="n">artificial_sampling_report</span><span class="p">(</span>
|
||||
<span class="n">pacc</span><span class="p">,</span> <span class="c1"># the quantification method</span>
|
||||
<span class="n">test</span><span class="p">,</span> <span class="c1"># the test set on which the method will be evaluated</span>
|
||||
<span class="n">sample_size</span><span class="o">=</span><span class="n">qp</span><span class="o">.</span><span class="n">environ</span><span class="p">[</span><span class="s1">'SAMPLE_SIZE'</span><span class="p">],</span> <span class="c1">#indicates the size of samples to be drawn</span>
|
||||
<span class="n">n_prevpoints</span><span class="o">=</span><span class="mi">11</span><span class="p">,</span> <span class="c1"># how many prevalence points will be extracted from the interval [0, 1] for each category</span>
|
||||
<span class="n">n_repetitions</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="c1"># number of times each prevalence will be used to generate a test sample</span>
|
||||
<span class="n">n_jobs</span><span class="o">=-</span><span class="mi">1</span><span class="p">,</span> <span class="c1"># indicates the number of parallel workers (-1 indicates, as in sklearn, all CPUs)</span>
|
||||
<span class="n">random_seed</span><span class="o">=</span><span class="mi">42</span><span class="p">,</span> <span class="c1"># setting a random seed allows to replicate the test samples across runs</span>
|
||||
<span class="n">error_metrics</span><span class="o">=</span><span class="p">[</span><span class="s1">'mae'</span><span class="p">,</span> <span class="s1">'mrae'</span><span class="p">,</span> <span class="s1">'mkld'</span><span class="p">],</span> <span class="c1"># specify the evaluation metrics</span>
|
||||
<span class="n">verbose</span><span class="o">=</span><span class="kc">True</span> <span class="c1"># set to True to show some standard-line outputs</span>
|
||||
<span class="p">)</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="s1">'Averaged values:'</span><span class="p">)</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="n">report</span><span class="o">.</span><span class="n">mean</span><span class="p">())</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>The resulting report is a pandas dataframe that can be directly printed.
Here, we set some display options from pandas just to make the output clearer;
note also that the estimated prevalences are shown as strings using the
<em>strprev</em> function, which simply converts a prevalence into a
string representing it, with a fixed decimal precision (default 3):</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">pandas</span> <span class="k">as</span> <span class="nn">pd</span>
|
||||
<span class="n">pd</span><span class="o">.</span><span class="n">set_option</span><span class="p">(</span><span class="s1">'display.expand_frame_repr'</span><span class="p">,</span> <span class="kc">False</span><span class="p">)</span>
|
||||
<span class="n">pd</span><span class="o">.</span><span class="n">set_option</span><span class="p">(</span><span class="s2">"precision"</span><span class="p">,</span> <span class="mi">3</span><span class="p">)</span>
|
||||
<span class="n">df</span><span class="p">[</span><span class="s1">'estim-prev'</span><span class="p">]</span> <span class="o">=</span> <span class="n">df</span><span class="p">[</span><span class="s1">'estim-prev'</span><span class="p">]</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="n">F</span><span class="o">.</span><span class="n">strprev</span><span class="p">)</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="n">df</span><span class="p">)</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>The output should look like:</p>
|
||||
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span> <span class="n">true</span><span class="o">-</span><span class="n">prev</span> <span class="n">estim</span><span class="o">-</span><span class="n">prev</span> <span class="n">mae</span> <span class="n">mrae</span> <span class="n">mkld</span>
|
||||
<span class="mi">0</span> <span class="p">[</span><span class="mf">0.0</span><span class="p">,</span> <span class="mf">1.0</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.000</span><span class="p">,</span> <span class="mf">1.000</span><span class="p">]</span> <span class="mf">0.000</span> <span class="mf">0.000</span> <span class="mf">0.000e+00</span>
|
||||
<span class="mi">1</span> <span class="p">[</span><span class="mf">0.1</span><span class="p">,</span> <span class="mf">0.9</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.091</span><span class="p">,</span> <span class="mf">0.909</span><span class="p">]</span> <span class="mf">0.009</span> <span class="mf">0.048</span> <span class="mf">4.426e-04</span>
|
||||
<span class="mi">2</span> <span class="p">[</span><span class="mf">0.2</span><span class="p">,</span> <span class="mf">0.8</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.163</span><span class="p">,</span> <span class="mf">0.837</span><span class="p">]</span> <span class="mf">0.037</span> <span class="mf">0.114</span> <span class="mf">4.633e-03</span>
|
||||
<span class="mi">3</span> <span class="p">[</span><span class="mf">0.3</span><span class="p">,</span> <span class="mf">0.7</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.283</span><span class="p">,</span> <span class="mf">0.717</span><span class="p">]</span> <span class="mf">0.017</span> <span class="mf">0.041</span> <span class="mf">7.383e-04</span>
|
||||
<span class="mi">4</span> <span class="p">[</span><span class="mf">0.4</span><span class="p">,</span> <span class="mf">0.6</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.366</span><span class="p">,</span> <span class="mf">0.634</span><span class="p">]</span> <span class="mf">0.034</span> <span class="mf">0.070</span> <span class="mf">2.412e-03</span>
|
||||
<span class="mi">5</span> <span class="p">[</span><span class="mf">0.5</span><span class="p">,</span> <span class="mf">0.5</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.459</span><span class="p">,</span> <span class="mf">0.541</span><span class="p">]</span> <span class="mf">0.041</span> <span class="mf">0.082</span> <span class="mf">3.387e-03</span>
|
||||
<span class="mi">6</span> <span class="p">[</span><span class="mf">0.6</span><span class="p">,</span> <span class="mf">0.4</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.565</span><span class="p">,</span> <span class="mf">0.435</span><span class="p">]</span> <span class="mf">0.035</span> <span class="mf">0.073</span> <span class="mf">2.535e-03</span>
|
||||
<span class="mi">7</span> <span class="p">[</span><span class="mf">0.7</span><span class="p">,</span> <span class="mf">0.3</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.654</span><span class="p">,</span> <span class="mf">0.346</span><span class="p">]</span> <span class="mf">0.046</span> <span class="mf">0.108</span> <span class="mf">4.701e-03</span>
|
||||
<span class="mi">8</span> <span class="p">[</span><span class="mf">0.8</span><span class="p">,</span> <span class="mf">0.2</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.725</span><span class="p">,</span> <span class="mf">0.275</span><span class="p">]</span> <span class="mf">0.075</span> <span class="mf">0.235</span> <span class="mf">1.515e-02</span>
|
||||
<span class="mi">9</span> <span class="p">[</span><span class="mf">0.9</span><span class="p">,</span> <span class="mf">0.1</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.858</span><span class="p">,</span> <span class="mf">0.142</span><span class="p">]</span> <span class="mf">0.042</span> <span class="mf">0.229</span> <span class="mf">7.740e-03</span>
|
||||
<span class="mi">10</span> <span class="p">[</span><span class="mf">1.0</span><span class="p">,</span> <span class="mf">0.0</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.945</span><span class="p">,</span> <span class="mf">0.055</span><span class="p">]</span> <span class="mf">0.055</span> <span class="mf">27.357</span> <span class="mf">5.219e-02</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
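<p>The bracketed strings in the estim-prev column above are what strprev produces. In essence, it amounts to fixed-precision formatting of each component (an illustrative sketch, not the actual implementation):</p>

```python
def strprev(prevalence, prec=3):
    # Render a prevalence vector as '[p1, p2, ...]' with a fixed
    # number of decimals per component (default 3).
    return '[' + ', '.join(f'{p:.{prec}f}' for p in prevalence) + ']'

print(strprev([0.0, 1.0]))  # -> [0.000, 1.000]
```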
<p>One can get the averaged scores using standard pandas’
functions, i.e.:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="nb">print</span><span class="p">(</span><span class="n">df</span><span class="o">.</span><span class="n">mean</span><span class="p">())</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>will produce the following output:</p>
|
||||
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">true</span><span class="o">-</span><span class="n">prev</span> <span class="mf">0.500</span>
|
||||
<span class="n">mae</span> <span class="mf">0.035</span>
|
||||
<span class="n">mrae</span> <span class="mf">2.578</span>
|
||||
<span class="n">mkld</span> <span class="mf">0.009</span>
|
||||
<p>This will produce an output like:</p>
|
||||
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span> <span class="n">true</span><span class="o">-</span><span class="n">prev</span> <span class="n">estim</span><span class="o">-</span><span class="n">prev</span> <span class="n">mae</span> <span class="n">mrae</span> <span class="n">mkld</span>
|
||||
<span class="mi">0</span> <span class="p">[</span><span class="mf">0.308</span><span class="p">,</span> <span class="mf">0.692</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.314</span><span class="p">,</span> <span class="mf">0.686</span><span class="p">]</span> <span class="mf">0.005649</span> <span class="mf">0.013182</span> <span class="mf">0.000074</span>
|
||||
<span class="mi">1</span> <span class="p">[</span><span class="mf">0.896</span><span class="p">,</span> <span class="mf">0.104</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.909</span><span class="p">,</span> <span class="mf">0.091</span><span class="p">]</span> <span class="mf">0.013145</span> <span class="mf">0.069323</span> <span class="mf">0.000985</span>
|
||||
<span class="mi">2</span> <span class="p">[</span><span class="mf">0.848</span><span class="p">,</span> <span class="mf">0.152</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.809</span><span class="p">,</span> <span class="mf">0.191</span><span class="p">]</span> <span class="mf">0.039063</span> <span class="mf">0.149806</span> <span class="mf">0.005175</span>
|
||||
<span class="mi">3</span> <span class="p">[</span><span class="mf">0.016</span><span class="p">,</span> <span class="mf">0.984</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.033</span><span class="p">,</span> <span class="mf">0.967</span><span class="p">]</span> <span class="mf">0.017236</span> <span class="mf">0.487529</span> <span class="mf">0.005298</span>
|
||||
<span class="mi">4</span> <span class="p">[</span><span class="mf">0.728</span><span class="p">,</span> <span class="mf">0.272</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.751</span><span class="p">,</span> <span class="mf">0.249</span><span class="p">]</span> <span class="mf">0.022769</span> <span class="mf">0.057146</span> <span class="mf">0.001350</span>
|
||||
<span class="o">...</span> <span class="o">...</span> <span class="o">...</span> <span class="o">...</span> <span class="o">...</span> <span class="o">...</span>
|
||||
<span class="mi">4995</span> <span class="p">[</span><span class="mf">0.72</span><span class="p">,</span> <span class="mf">0.28</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.698</span><span class="p">,</span> <span class="mf">0.302</span><span class="p">]</span> <span class="mf">0.021752</span> <span class="mf">0.053631</span> <span class="mf">0.001133</span>
|
||||
<span class="mi">4996</span> <span class="p">[</span><span class="mf">0.868</span><span class="p">,</span> <span class="mf">0.132</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.888</span><span class="p">,</span> <span class="mf">0.112</span><span class="p">]</span> <span class="mf">0.020490</span> <span class="mf">0.088230</span> <span class="mf">0.001985</span>
|
||||
<span class="mi">4997</span> <span class="p">[</span><span class="mf">0.292</span><span class="p">,</span> <span class="mf">0.708</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.298</span><span class="p">,</span> <span class="mf">0.702</span><span class="p">]</span> <span class="mf">0.006149</span> <span class="mf">0.014788</span> <span class="mf">0.000090</span>
|
||||
<span class="mi">4998</span> <span class="p">[</span><span class="mf">0.24</span><span class="p">,</span> <span class="mf">0.76</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.220</span><span class="p">,</span> <span class="mf">0.780</span><span class="p">]</span> <span class="mf">0.019950</span> <span class="mf">0.054309</span> <span class="mf">0.001127</span>
|
||||
<span class="mi">4999</span> <span class="p">[</span><span class="mf">0.948</span><span class="p">,</span> <span class="mf">0.052</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.965</span><span class="p">,</span> <span class="mf">0.035</span><span class="p">]</span> <span class="mf">0.016941</span> <span class="mf">0.165776</span> <span class="mf">0.003538</span>
|
||||
|
||||
<span class="p">[</span><span class="mi">5000</span> <span class="n">rows</span> <span class="n">x</span> <span class="mi">5</span> <span class="n">columns</span><span class="p">]</span>
|
||||
<span class="n">Averaged</span> <span class="n">values</span><span class="p">:</span>
|
||||
<span class="n">mae</span> <span class="mf">0.023588</span>
|
||||
<span class="n">mrae</span> <span class="mf">0.108779</span>
|
||||
<span class="n">mkld</span> <span class="mf">0.003631</span>
|
||||
<span class="n">dtype</span><span class="p">:</span> <span class="n">float64</span>
|
||||
|
||||
<span class="n">Process</span> <span class="n">finished</span> <span class="k">with</span> <span class="n">exit</span> <span class="n">code</span> <span class="mi">0</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>Other evaluation functions include:</p>
<ul class="simple">
<li><p><em>artificial_sampling_eval</em>: computes the evaluation for a
given evaluation metric, returning the average instead of a dataframe.</p></li>
<li><p><em>artificial_sampling_prediction</em>: returns two np.arrays containing the
true prevalences and the estimated prevalences.</p></li>
</ul>
<p>See the documentation for further details.</p>
</div>
|
||||
<p>Alternatively, we can simply generate all the predictions by:</p>
|
||||
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">true_prevs</span><span class="p">,</span> <span class="n">estim_prevs</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">evaluation</span><span class="o">.</span><span class="n">prediction</span><span class="p">(</span><span class="n">quantifier</span><span class="p">,</span> <span class="n">protocol</span><span class="o">=</span><span class="n">prot</span><span class="p">)</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>All the evaluation functions implement specific optimizations for speeding-up
|
||||
the evaluation of aggregative quantifiers (i.e., of instances of <em>AggregativeQuantifier</em>).
|
||||
The optimization comes down to generating classification predictions (either crisp or soft)
|
||||
only once for the entire test set, and then applying the sampling procedure to the
|
||||
predictions, instead of generating samples of instances and then computing the
|
||||
classification predictions every time. This is only possible when the protocol
|
||||
is an instance of <em>OnLabelledCollectionProtocol</em>. The optimization is only
|
||||
carried out when the number of classification predictions thus generated would be
|
||||
smaller than the number of predictions required for the entire protocol; e.g.,
|
||||
if the original dataset contains 1M instances, but the protocol is such that it would
|
||||
at most generate 20 samples of 100 instances, then it would be preferable to postpone the
|
||||
classification for each sample. This behaviour is indicated by setting
|
||||
<em>aggr_speedup=”auto”</em>. Conversely, when indicating <em>aggr_speedup=”force”</em> QuaPy will
|
||||
precompute all the predictions irrespective of the number of instances and number of samples.
|
||||
Finally, this can be deactivated by setting <em>aggr_speedup=False</em>. Note that this optimization
|
||||
is not only applied for the final evaluation, but also for the internal evaluations carried
|
||||
out during <em>model selection</em>. Since these are typically many, the heuristic can help reduce the
|
||||
execution time substantially.</p>
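<p>The heuristic can be sketched as follows (an illustrative reconstruction of the decision rule described above, not QuaPy’s actual code; the function name is hypothetical):</p>

```python
def should_precompute(n_instances, n_samples, sample_size, aggr_speedup="auto"):
    # decide whether to classify the entire test set once upfront ("speedup")
    # or to classify each sample drawn by the protocol on the fly
    if aggr_speedup is False:
        return False
    if aggr_speedup == "force":
        return True
    # "auto": precompute only when it implies fewer classifier predictions
    # than classifying every sample independently
    return n_instances <= n_samples * sample_size
```

<p>E.g., with 1M instances and a protocol generating at most 20 samples of 100 instances, the “auto” mode classifies per sample, since 1M &gt; 2000.</p>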
|
||||
</section>
|
||||
</section>
|
||||
|
||||
|
||||
<div class="clearer"></div>
|
||||
|
|
@ -264,8 +212,9 @@ true prevalences and the estimated prevalences.</p></li>
|
|||
</div>
|
||||
<div class="sphinxsidebar" role="navigation" aria-label="main navigation">
|
||||
<div class="sphinxsidebarwrapper">
|
||||
<h3><a href="index.html">Table of Contents</a></h3>
|
||||
<ul>
|
||||
<div>
|
||||
<h3><a href="index.html">Table of Contents</a></h3>
|
||||
<ul>
|
||||
<li><a class="reference internal" href="#">Evaluation</a><ul>
|
||||
<li><a class="reference internal" href="#error-measures">Error Measures</a></li>
|
||||
<li><a class="reference internal" href="#evaluation-protocols">Evaluation Protocols</a></li>
|
||||
|
|
@ -273,12 +222,17 @@ true prevalences and the estimated prevalences.</p></li>
|
|||
</li>
|
||||
</ul>
|
||||
|
||||
<h4>Previous topic</h4>
|
||||
<p class="topless"><a href="Datasets.html"
|
||||
title="previous chapter">Datasets</a></p>
|
||||
<h4>Next topic</h4>
|
||||
<p class="topless"><a href="Methods.html"
|
||||
title="next chapter">Quantification Methods</a></p>
|
||||
</div>
|
||||
<div>
|
||||
<h4>Previous topic</h4>
|
||||
<p class="topless"><a href="Datasets.html"
|
||||
title="previous chapter">Datasets</a></p>
|
||||
</div>
|
||||
<div>
|
||||
<h4>Next topic</h4>
|
||||
<p class="topless"><a href="Protocols.html"
|
||||
title="next chapter">Protocols</a></p>
|
||||
</div>
|
||||
<div role="note" aria-label="source link">
|
||||
<h3>This Page</h3>
|
||||
<ul class="this-page-menu">
|
||||
|
|
@ -295,7 +249,7 @@ true prevalences and the estimated prevalences.</p></li>
|
|||
</form>
|
||||
</div>
|
||||
</div>
|
||||
<script>$('#searchbox').show(0);</script>
|
||||
<script>document.getElementById('searchbox').style.display = "block"</script>
|
||||
</div>
|
||||
</div>
|
||||
<div class="clearer"></div>
|
||||
|
|
@ -310,18 +264,18 @@ true prevalences and the estimated prevalences.</p></li>
|
|||
<a href="py-modindex.html" title="Python Module Index"
|
||||
>modules</a> |</li>
|
||||
<li class="right" >
|
||||
<a href="Methods.html" title="Quantification Methods"
|
||||
<a href="Protocols.html" title="Protocols"
|
||||
>next</a> |</li>
|
||||
<li class="right" >
|
||||
<a href="Datasets.html" title="Datasets"
|
||||
>previous</a> |</li>
|
||||
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.6 documentation</a> »</li>
|
||||
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.7 documentation</a> »</li>
|
||||
<li class="nav-item nav-item-this"><a href="">Evaluation</a></li>
|
||||
</ul>
|
||||
</div>
|
||||
<div class="footer" role="contentinfo">
|
||||
© Copyright 2021, Alejandro Moreo.
|
||||
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 4.2.0.
|
||||
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 5.3.0.
|
||||
</div>
|
||||
</body>
|
||||
</html>
|
||||
|
|
@ -2,18 +2,21 @@
|
|||
|
||||
<!doctype html>
|
||||
|
||||
<html>
|
||||
<html lang="en">
|
||||
<head>
|
||||
<meta charset="utf-8" />
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
|
||||
<title>Installation — QuaPy 0.1.6 documentation</title>
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="generator" content="Docutils 0.19: https://docutils.sourceforge.io/" />
|
||||
|
||||
<title>Installation — QuaPy 0.1.7 documentation</title>
|
||||
<link rel="stylesheet" type="text/css" href="_static/pygments.css" />
|
||||
<link rel="stylesheet" type="text/css" href="_static/bizstyle.css" />
|
||||
|
||||
<script data-url_root="./" id="documentation_options" src="_static/documentation_options.js"></script>
|
||||
<script src="_static/jquery.js"></script>
|
||||
<script src="_static/underscore.js"></script>
|
||||
<script src="_static/_sphinx_javascript_frameworks_compat.js"></script>
|
||||
<script src="_static/doctools.js"></script>
|
||||
<script src="_static/sphinx_highlight.js"></script>
|
||||
<script src="_static/bizstyle.js"></script>
|
||||
<link rel="index" title="Index" href="genindex.html" />
|
||||
<link rel="search" title="Search" href="search.html" />
|
||||
|
|
@ -39,7 +42,7 @@
|
|||
<li class="right" >
|
||||
<a href="index.html" title="Welcome to QuaPy’s documentation!"
|
||||
accesskey="P">previous</a> |</li>
|
||||
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.6 documentation</a> »</li>
|
||||
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.7 documentation</a> »</li>
|
||||
<li class="nav-item nav-item-this"><a href="">Installation</a></li>
|
||||
</ul>
|
||||
</div>
|
||||
|
|
@ -49,15 +52,15 @@
|
|||
<div class="bodywrapper">
|
||||
<div class="body" role="main">
|
||||
|
||||
<div class="section" id="installation">
|
||||
<h1>Installation<a class="headerlink" href="#installation" title="Permalink to this headline">¶</a></h1>
|
||||
<section id="installation">
|
||||
<h1>Installation<a class="headerlink" href="#installation" title="Permalink to this heading">¶</a></h1>
|
||||
<p>QuaPy can be easily installed via <cite>pip</cite></p>
|
||||
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">pip</span> <span class="n">install</span> <span class="n">quapy</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>See <a class="reference external" href="https://pypi.org/project/QuaPy/">pip page</a> for older versions.</p>
|
||||
<div class="section" id="requirements">
|
||||
<h2>Requirements<a class="headerlink" href="#requirements" title="Permalink to this headline">¶</a></h2>
|
||||
<section id="requirements">
|
||||
<h2>Requirements<a class="headerlink" href="#requirements" title="Permalink to this heading">¶</a></h2>
|
||||
<ul class="simple">
|
||||
<li><p>scikit-learn, numpy, scipy</p></li>
|
||||
<li><p>pytorch (for QuaNet)</p></li>
|
||||
|
|
@ -67,9 +70,9 @@
|
|||
<li><p>pandas, xlrd</p></li>
|
||||
<li><p>matplotlib</p></li>
|
||||
</ul>
|
||||
</div>
|
||||
<div class="section" id="svm-perf-with-quantification-oriented-losses">
|
||||
<h2>SVM-perf with quantification-oriented losses<a class="headerlink" href="#svm-perf-with-quantification-oriented-losses" title="Permalink to this headline">¶</a></h2>
|
||||
</section>
|
||||
<section id="svm-perf-with-quantification-oriented-losses">
|
||||
<h2>SVM-perf with quantification-oriented losses<a class="headerlink" href="#svm-perf-with-quantification-oriented-losses" title="Permalink to this heading">¶</a></h2>
|
||||
<p>In order to run experiments involving SVM(Q), SVM(KLD), SVM(NKLD),
|
||||
SVM(AE), or SVM(RAE), you have to first download the
|
||||
<a class="reference external" href="http://www.cs.cornell.edu/people/tj/svm_light/svm_perf.html">svmperf</a>
|
||||
|
|
@ -96,8 +99,8 @@ and for the <cite>KLD</cite> and <cite>NKLD</cite> as proposed by
|
|||
for quantification.
|
||||
This patch extends the former by also allowing SVMperf to optimize for
|
||||
<cite>AE</cite> and <cite>RAE</cite>.</p>
|
||||
</div>
|
||||
</div>
|
||||
</section>
|
||||
</section>
|
||||
|
||||
|
||||
<div class="clearer"></div>
|
||||
|
|
@ -106,8 +109,9 @@ This patch extends the former by also allowing SVMperf to optimize for
|
|||
</div>
|
||||
<div class="sphinxsidebar" role="navigation" aria-label="main navigation">
|
||||
<div class="sphinxsidebarwrapper">
|
||||
<h3><a href="index.html">Table of Contents</a></h3>
|
||||
<ul>
|
||||
<div>
|
||||
<h3><a href="index.html">Table of Contents</a></h3>
|
||||
<ul>
|
||||
<li><a class="reference internal" href="#">Installation</a><ul>
|
||||
<li><a class="reference internal" href="#requirements">Requirements</a></li>
|
||||
<li><a class="reference internal" href="#svm-perf-with-quantification-oriented-losses">SVM-perf with quantification-oriented losses</a></li>
|
||||
|
|
@ -115,12 +119,17 @@ This patch extends the former by also allowing SVMperf to optimize for
|
|||
</li>
|
||||
</ul>
|
||||
|
||||
<h4>Previous topic</h4>
|
||||
<p class="topless"><a href="index.html"
|
||||
title="previous chapter">Welcome to QuaPy’s documentation!</a></p>
|
||||
<h4>Next topic</h4>
|
||||
<p class="topless"><a href="Datasets.html"
|
||||
title="next chapter">Datasets</a></p>
|
||||
</div>
|
||||
<div>
|
||||
<h4>Previous topic</h4>
|
||||
<p class="topless"><a href="index.html"
|
||||
title="previous chapter">Welcome to QuaPy’s documentation!</a></p>
|
||||
</div>
|
||||
<div>
|
||||
<h4>Next topic</h4>
|
||||
<p class="topless"><a href="Datasets.html"
|
||||
title="next chapter">Datasets</a></p>
|
||||
</div>
|
||||
<div role="note" aria-label="source link">
|
||||
<h3>This Page</h3>
|
||||
<ul class="this-page-menu">
|
||||
|
|
@ -137,7 +146,7 @@ This patch extends the former by also allowing SVMperf to optimize for
|
|||
</form>
|
||||
</div>
|
||||
</div>
|
||||
<script>$('#searchbox').show(0);</script>
|
||||
<script>document.getElementById('searchbox').style.display = "block"</script>
|
||||
</div>
|
||||
</div>
|
||||
<div class="clearer"></div>
|
||||
|
|
@ -157,13 +166,13 @@ This patch extends the former by also allowing SVMperf to optimize for
|
|||
<li class="right" >
|
||||
<a href="index.html" title="Welcome to QuaPy’s documentation!"
|
||||
>previous</a> |</li>
|
||||
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.6 documentation</a> »</li>
|
||||
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.7 documentation</a> »</li>
|
||||
<li class="nav-item nav-item-this"><a href="">Installation</a></li>
|
||||
</ul>
|
||||
</div>
|
||||
<div class="footer" role="contentinfo">
|
||||
© Copyright 2021, Alejandro Moreo.
|
||||
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 4.2.0.
|
||||
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 5.3.0.
|
||||
</div>
|
||||
</body>
|
||||
</html>
|
||||
|
|
@ -2,23 +2,26 @@
|
|||
|
||||
<!doctype html>
|
||||
|
||||
<html>
|
||||
<html lang="en">
|
||||
<head>
|
||||
<meta charset="utf-8" />
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
|
||||
<title>Quantification Methods — QuaPy 0.1.6 documentation</title>
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="generator" content="Docutils 0.19: https://docutils.sourceforge.io/" />
|
||||
|
||||
<title>Quantification Methods — QuaPy 0.1.7 documentation</title>
|
||||
<link rel="stylesheet" type="text/css" href="_static/pygments.css" />
|
||||
<link rel="stylesheet" type="text/css" href="_static/bizstyle.css" />
|
||||
|
||||
<script data-url_root="./" id="documentation_options" src="_static/documentation_options.js"></script>
|
||||
<script src="_static/jquery.js"></script>
|
||||
<script src="_static/underscore.js"></script>
|
||||
<script src="_static/_sphinx_javascript_frameworks_compat.js"></script>
|
||||
<script src="_static/doctools.js"></script>
|
||||
<script src="_static/sphinx_highlight.js"></script>
|
||||
<script src="_static/bizstyle.js"></script>
|
||||
<link rel="index" title="Index" href="genindex.html" />
|
||||
<link rel="search" title="Search" href="search.html" />
|
||||
<link rel="next" title="Plotting" href="Plotting.html" />
|
||||
<link rel="prev" title="Evaluation" href="Evaluation.html" />
|
||||
<link rel="next" title="Model Selection" href="Model-Selection.html" />
|
||||
<link rel="prev" title="Protocols" href="Protocols.html" />
|
||||
<meta name="viewport" content="width=device-width,initial-scale=1.0" />
|
||||
<!--[if lt IE 9]>
|
||||
<script src="_static/css3-mediaqueries.js"></script>
|
||||
|
|
@ -34,12 +37,12 @@
|
|||
<a href="py-modindex.html" title="Python Module Index"
|
||||
>modules</a> |</li>
|
||||
<li class="right" >
|
||||
<a href="Plotting.html" title="Plotting"
|
||||
<a href="Model-Selection.html" title="Model Selection"
|
||||
accesskey="N">next</a> |</li>
|
||||
<li class="right" >
|
||||
<a href="Evaluation.html" title="Evaluation"
|
||||
<a href="Protocols.html" title="Protocols"
|
||||
accesskey="P">previous</a> |</li>
|
||||
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.6 documentation</a> »</li>
|
||||
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.7 documentation</a> »</li>
|
||||
<li class="nav-item nav-item-this"><a href="">Quantification Methods</a></li>
|
||||
</ul>
|
||||
</div>
|
||||
|
|
@ -49,8 +52,8 @@
|
|||
<div class="bodywrapper">
|
||||
<div class="body" role="main">
|
||||
|
||||
<div class="tex2jax_ignore mathjax_ignore section" id="quantification-methods">
|
||||
<h1>Quantification Methods<a class="headerlink" href="#quantification-methods" title="Permalink to this headline">¶</a></h1>
|
||||
<section id="quantification-methods">
|
||||
<h1>Quantification Methods<a class="headerlink" href="#quantification-methods" title="Permalink to this heading">¶</a></h1>
|
||||
<p>Quantification methods can be categorized as belonging to
|
||||
<em>aggregative</em> and <em>non-aggregative</em> groups.
|
||||
Most methods included in QuaPy at the moment are of type <em>aggregative</em>
|
||||
|
|
@ -65,12 +68,6 @@ and implement some abstract methods:</p>
|
|||
|
||||
<span class="nd">@abstractmethod</span>
|
||||
<span class="k">def</span> <span class="nf">quantify</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">instances</span><span class="p">):</span> <span class="o">...</span>
|
||||
|
||||
<span class="nd">@abstractmethod</span>
|
||||
<span class="k">def</span> <span class="nf">set_params</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="o">**</span><span class="n">parameters</span><span class="p">):</span> <span class="o">...</span>
|
||||
|
||||
<span class="nd">@abstractmethod</span>
|
||||
<span class="k">def</span> <span class="nf">get_params</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">deep</span><span class="o">=</span><span class="kc">True</span><span class="p">):</span> <span class="o">...</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>The meaning of those functions should be familiar to those
|
||||
|
|
@ -82,12 +79,12 @@ scikit-learn’ structure has not been adopted <em>as is</em> in QuaPy responds
|
|||
the fact that scikit-learn’s <em>predict</em> function is expected to return
|
||||
one output for each input element –e.g., a predicted label for each
|
||||
instance in a sample– while in quantification the output for a sample
|
||||
is one single array of class prevalences), while functions <em>set_params</em>
|
||||
and <em>get_params</em> allow a
|
||||
<a class="reference external" href="https://github.com/HLT-ISTI/QuaPy/wiki/Model-Selection">model selector</a>
|
||||
to automate the process of hyperparameter search.</p>
|
||||
<div class="section" id="aggregative-methods">
|
||||
<h2>Aggregative Methods<a class="headerlink" href="#aggregative-methods" title="Permalink to this headline">¶</a></h2>
|
||||
is one single array of class prevalences).
|
||||
Quantifiers also extend from scikit-learn’s <code class="docutils literal notranslate"><span class="pre">BaseEstimator</span></code>, in order
|
||||
to simplify the use of <em>set_params</em> and <em>get_params</em> by the
|
||||
<a class="reference external" href="https://github.com/HLT-ISTI/QuaPy/wiki/Model-Selection">model selector</a>.</p>
|
||||
<section id="aggregative-methods">
|
||||
<h2>Aggregative Methods<a class="headerlink" href="#aggregative-methods" title="Permalink to this heading">¶</a></h2>
|
||||
<p>All quantification methods are implemented as part of the
|
||||
<em>qp.method</em> package. In particular, <em>aggregative</em> methods are defined in
|
||||
<em>qp.method.aggregative</em>, and extend <em>AggregativeQuantifier(BaseQuantifier)</em>.
|
||||
|
|
@ -103,12 +100,12 @@ The methods that any <em>aggregative</em> quantifier must implement are:</p>
|
|||
individual predictions of a classifier. Indeed, a default implementation
|
||||
of <em>BaseQuantifier.quantify</em> is already provided, which looks like:</p>
|
||||
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span> <span class="k">def</span> <span class="nf">quantify</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">instances</span><span class="p">):</span>
|
||||
<span class="n">classif_predictions</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">preclassify</span><span class="p">(</span><span class="n">instances</span><span class="p">)</span>
|
||||
<span class="n">classif_predictions</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">classify</span><span class="p">(</span><span class="n">instances</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">aggregate</span><span class="p">(</span><span class="n">classif_predictions</span><span class="p">)</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>Aggregative quantifiers are expected to maintain a classifier (which is
|
||||
accessed through the <em>@property</em> <em>learner</em>). This classifier is
|
||||
accessed through the <em>@property</em> <em>classifier</em>). This classifier is
|
||||
given as input to the quantifier, and can be already fit
|
||||
on external data (in which case, the <em>fit_learner</em> argument should
|
||||
be set to False), or be fit by the quantifier’s fit (default).</p>
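<p>The contract described above can be illustrated with a minimal, hypothetical aggregative quantifier implementing Classify &amp; Count; this is a sketch of the interface, not QuaPy’s actual <em>CC</em> class:</p>

```python
import numpy as np

class MiniCC:
    # minimal Classify & Count quantifier; hypothetical sketch, not QuaPy's CC
    def __init__(self, classifier, fit_learner=True):
        self.classifier = classifier
        self.fit_learner = fit_learner

    def fit(self, X, y):
        if self.fit_learner:  # set fit_learner=False if the classifier is already trained
            self.classifier.fit(X, y)
        self.classes_ = np.unique(y)
        return self

    def classify(self, instances):
        return self.classifier.predict(instances)

    def aggregate(self, classif_predictions):
        # prevalence = fraction of instances assigned to each class
        counts = np.array([(classif_predictions == c).sum() for c in self.classes_])
        return counts / counts.sum()

    def quantify(self, instances):
        return self.aggregate(self.classify(instances))
```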
|
||||
|
|
@ -118,12 +115,8 @@ aggregative methods, that should inherit from the abstract class
|
|||
The particularity of <em>probabilistic</em> aggregative methods (w.r.t.
|
||||
non-probabilistic ones), is that the default quantifier is defined
|
||||
in terms of the posterior probabilities returned by a probabilistic
|
||||
classifier, and not by the crisp decisions of a hard classifier; i.e.:</p>
|
||||
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span> <span class="k">def</span> <span class="nf">quantify</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">instances</span><span class="p">):</span>
|
||||
<span class="n">classif_posteriors</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">posterior_probabilities</span><span class="p">(</span><span class="n">instances</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">aggregate</span><span class="p">(</span><span class="n">classif_posteriors</span><span class="p">)</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
classifier, and not by the crisp decisions of a hard classifier.
|
||||
In any case, the interface <em>classify(instances)</em> remains unchanged.</p>
|
||||
<p>One advantage of <em>aggregative</em> methods (either probabilistic or not)
|
||||
is that the evaluation according to any sampling procedure (e.g.,
|
||||
the <a class="reference external" href="https://github.com/HLT-ISTI/QuaPy/wiki/Evaluation">artificial sampling protocol</a>)
|
||||
|
|
@ -133,8 +126,8 @@ reuse these predictions, without requiring to classify each element every time.
|
|||
QuaPy leverages this property to speed-up any procedure having to do with
|
||||
quantification over samples, as is customarily done in model selection or
|
||||
in evaluation.</p>
|
||||
<div class="section" id="the-classify-count-variants">
|
||||
<h3>The Classify & Count variants<a class="headerlink" href="#the-classify-count-variants" title="Permalink to this headline">¶</a></h3>
|
||||
<section id="the-classify-count-variants">
|
||||
<h3>The Classify & Count variants<a class="headerlink" href="#the-classify-count-variants" title="Permalink to this heading">¶</a></h3>
|
||||
<p>QuaPy implements the four CC variants, i.e.:</p>
|
||||
<ul class="simple">
|
||||
<li><p><em>CC</em> (Classify & Count), the simplest aggregative quantifier; one that
|
||||
|
|
@ -150,9 +143,7 @@ with a SVM as the classifier:</p>
|
|||
<span class="kn">import</span> <span class="nn">quapy.functional</span> <span class="k">as</span> <span class="nn">F</span>
|
||||
<span class="kn">from</span> <span class="nn">sklearn.svm</span> <span class="kn">import</span> <span class="n">LinearSVC</span>
|
||||
|
||||
<span class="n">dataset</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">datasets</span><span class="o">.</span><span class="n">fetch_twitter</span><span class="p">(</span><span class="s1">'hcr'</span><span class="p">,</span> <span class="n">pickle</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
|
||||
<span class="n">training</span> <span class="o">=</span> <span class="n">dataset</span><span class="o">.</span><span class="n">training</span>
|
||||
<span class="n">test</span> <span class="o">=</span> <span class="n">dataset</span><span class="o">.</span><span class="n">test</span>
|
||||
<span class="n">training</span><span class="p">,</span> <span class="n">test</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">datasets</span><span class="o">.</span><span class="n">fetch_twitter</span><span class="p">(</span><span class="s1">'hcr'</span><span class="p">,</span> <span class="n">pickle</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span><span class="o">.</span><span class="n">train_test</span>
|
||||
|
||||
<span class="c1"># instantiate a classifier learner, in this case a SVM</span>
|
||||
<span class="n">svm</span> <span class="o">=</span> <span class="n">LinearSVC</span><span class="p">()</span>
|
||||
|
|
@ -196,7 +187,7 @@ e.g.:</p>
|
|||
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">model</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">method</span><span class="o">.</span><span class="n">aggregative</span><span class="o">.</span><span class="n">PCC</span><span class="p">(</span><span class="n">svm</span><span class="p">)</span>
|
||||
<span class="n">model</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">training</span><span class="p">)</span>
|
||||
<span class="n">estim_prevalence</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">quantify</span><span class="p">(</span><span class="n">test</span><span class="o">.</span><span class="n">instances</span><span class="p">)</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="s1">'classifier:'</span><span class="p">,</span> <span class="n">model</span><span class="o">.</span><span class="n">learner</span><span class="p">)</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="s1">'classifier:'</span><span class="p">,</span> <span class="n">model</span><span class="o">.</span><span class="n">classifier</span><span class="p">)</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>In this case, QuaPy will print:</p>
|
||||
|
|
@ -214,9 +205,9 @@ be applied to hard classifiers when <em>fit_learner=True</em>; an exception
|
|||
will be raised otherwise.</p>
|
||||
<p>Lastly, everything we said about ACC and PCC
|
||||
applies to PACC as well.</p>
|
||||
</div>
|
||||
<div class="section" id="expectation-maximization-emq">
|
||||
<h3>Expectation Maximization (EMQ)<a class="headerlink" href="#expectation-maximization-emq" title="Permalink to this headline">¶</a></h3>
|
||||
</section>
|
||||
<section id="expectation-maximization-emq">
|
||||
<h3>Expectation Maximization (EMQ)<a class="headerlink" href="#expectation-maximization-emq" title="Permalink to this heading">¶</a></h3>
|
||||
<p>The Expectation Maximization Quantifier (EMQ), also known as
|
||||
the SLD, is available at <em>qp.method.aggregative.EMQ</em> or via the
|
||||
alias <em>qp.method.aggregative.ExpectationMaximizationQuantifier</em>.
|
||||
|
|
@ -241,13 +232,21 @@ experiments we have carried out.</p>
|
|||
<span class="n">estim_prevalence</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">quantify</span><span class="p">(</span><span class="n">dataset</span><span class="o">.</span><span class="n">test</span><span class="o">.</span><span class="n">instances</span><span class="p">)</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
</div>
|
||||
<div class="section" id="hellinger-distance-y-hdy">
|
||||
<h3>Hellinger Distance y (HDy)<a class="headerlink" href="#hellinger-distance-y-hdy" title="Permalink to this headline">¶</a></h3>
|
||||
<p>The method HDy is described in:</p>
|
||||
<p><em>Implementation of the method based on the Hellinger Distance y (HDy) proposed by
|
||||
González-Castro, V., Alaiz-Rodrı́guez, R., and Alegre, E. (2013). Class distribution
|
||||
estimation based on the Hellinger distance. Information Sciences, 218:146–164.</em></p>
|
||||
<p><em>New in v0.1.7</em>: EMQ now accepts two new parameters in its constructor, namely
|
||||
<em>exact_train_prev</em>, which allows using the true training prevalence as the starting
|
||||
prevalence estimation (default behaviour), or instead an approximation of it as
|
||||
suggested by <a class="reference external" href="http://proceedings.mlr.press/v119/alexandari20a.html">Alexandari et al. (2020)</a>
|
||||
(by setting <em>exact_train_prev=False</em>).
|
||||
The other parameter is <em>recalib</em>, which allows specifying a calibration method, among those
|
||||
proposed by <a class="reference external" href="http://proceedings.mlr.press/v119/alexandari20a.html">Alexandari et al. (2020)</a>,
|
||||
including the Bias-Corrected Temperature Scaling, Vector Scaling, etc.
|
||||
See the API documentation for further details.</p>
|
||||
</section>
|
||||
<section id="hellinger-distance-y-hdy">
|
||||
<h3>Hellinger Distance y (HDy)<a class="headerlink" href="#hellinger-distance-y-hdy" title="Permalink to this heading">¶</a></h3>
|
||||
<p>Implementation of the method based on the Hellinger Distance y (HDy) proposed by
|
||||
<a class="reference external" href="https://www.sciencedirect.com/science/article/pii/S0020025512004069">González-Castro, V., Alaiz-Rodrı́guez, R., and Alegre, E. (2013). Class distribution
|
||||
estimation based on the Hellinger distance. Information Sciences, 218:146–164.</a></p>
|
||||
<p>It is implemented in <em>qp.method.aggregative.HDy</em> (also accessible
|
||||
through the alias <em>qp.method.aggregative.HellingerDistanceY</em>).
|
||||
This method works with a probabilistic classifier (hard classifiers
|
||||
|
|
@ -274,30 +273,48 @@ provided in QuaPy accepts only binary datasets.</p>
|
|||
<span class="n">estim_prevalence</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">quantify</span><span class="p">(</span><span class="n">dataset</span><span class="o">.</span><span class="n">test</span><span class="o">.</span><span class="n">instances</span><span class="p">)</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
<p><em>New in v0.1.7:</em> QuaPy now provides an implementation of the generalized
|
||||
“Distribution Matching” approaches for multiclass, inspired by the framework
|
||||
of <a class="reference external" href="https://arxiv.org/abs/1606.00868">Firat (2016)</a>. One can instantiate
|
||||
a variant of HDy for multiclass quantification as follows:</p>
|
||||
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">multiclassHDy</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">method</span><span class="o">.</span><span class="n">aggregative</span><span class="o">.</span><span class="n">DistributionMatching</span><span class="p">(</span><span class="n">classifier</span><span class="o">=</span><span class="n">LogisticRegression</span><span class="p">(),</span> <span class="n">divergence</span><span class="o">=</span><span class="s1">'HD'</span><span class="p">,</span> <span class="n">cdf</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span>
|
||||
</pre></div>
|
||||
</div>
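<p>At the heart of HDy lies the Hellinger distance between two discrete distributions (e.g., histograms of posterior probabilities), which can be sketched as follows (illustrative, not QuaPy’s internal code):</p>

```python
import numpy as np

def hellinger_distance(p, q):
    # Hellinger distance between two discrete distributions;
    # ranges from 0 (identical) to 1 (disjoint supports)
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return np.sqrt(np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)) / np.sqrt(2)
```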
|
||||
<div class="section" id="explicit-loss-minimization">
|
||||
<h3>Explicit Loss Minimization<a class="headerlink" href="#explicit-loss-minimization" title="Permalink to this headline">¶</a></h3>
|
||||
<p><em>New in v0.1.7:</em> QuaPy now provides an implementation of the “DyS”
|
||||
framework proposed by <a class="reference external" href="https://ojs.aaai.org/index.php/AAAI/article/view/4376">Maletzke et al. (2020)</a>
|
||||
and the “SMM” method proposed by <a class="reference external" href="https://ieeexplore.ieee.org/document/9260028">Hassan et al. (2019)</a>
|
||||
(thanks to <em>Pablo González</em> for the contributions!)</p>
|
||||
</section>
|
||||
<section id="threshold-optimization-methods">
|
||||
<h3>Threshold Optimization methods<a class="headerlink" href="#threshold-optimization-methods" title="Permalink to this heading">¶</a></h3>
|
||||
<p><em>New in v0.1.7:</em> QuaPy now implements Forman’s threshold optimization methods;
|
||||
see, e.g., <a class="reference external" href="https://dl.acm.org/doi/abs/10.1145/1150402.1150423">(Forman 2006)</a>
|
||||
and <a class="reference external" href="https://link.springer.com/article/10.1007/s10618-008-0097-y">(Forman 2008)</a>.
|
||||
These include: T50, MAX, X, Median Sweep (MS), and its variant MS2.</p>
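<p>All these threshold methods build on the adjusted-count correction, which compensates the raw Classify &amp; Count estimate using the classifier’s true and false positive rates at a chosen decision threshold; a minimal sketch (hypothetical helper, not QuaPy’s API):</p>

```python
def adjusted_count(cc_prev, tpr, fpr):
    # correct the raw CC prevalence estimate; the threshold methods (T50, MAX,
    # X, MS, MS2) differ in how they pick the threshold at which tpr/fpr are read
    prev = (cc_prev - fpr) / (tpr - fpr)
    return min(1.0, max(0.0, prev))  # clip to a valid prevalence
```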
|
||||
</section>
|
||||
<section id="explicit-loss-minimization">
|
||||
<h3>Explicit Loss Minimization<a class="headerlink" href="#explicit-loss-minimization" title="Permalink to this heading">¶</a></h3>
|
||||
<p>Explicit Loss Minimization (ELM) represents a family of methods
|
||||
based on structured output learning, i.e., quantifiers relying on
|
||||
classifiers that have been optimized targeting a
|
||||
quantification-oriented evaluation measure.</p>
|
||||
<p>In QuaPy, the following methods, all relying on Joachim’s
|
||||
<a class="reference external" href="https://www.cs.cornell.edu/people/tj/svm_light/svm_perf.html">SVMperf</a>
|
||||
implementation, are available in <em>qp.method.aggregative</em>:</p>
|
||||
quantification-oriented evaluation measure.
|
||||
The original methods are implemented in QuaPy as classify & count (CC)
|
||||
quantifiers that use Joachims’ <a class="reference external" href="https://www.cs.cornell.edu/people/tj/svm_light/svm_perf.html">SVMperf</a>
|
||||
as the underlying classifier, properly set to optimize for the desired loss.</p>
|
||||
<p>In QuaPy, this can be achieved by calling the functions:</p>
|
||||
<ul class="simple">
<li><p><em>newSVMQ</em>: returns the quantification method called SVM(Q), which optimizes for the metric <em>Q</em> defined
in <a class="reference external" href="https://www.sciencedirect.com/science/article/pii/S003132031400291X"><em>Barranquero, J., Díez, J., and del Coz, J. J. (2015). Quantification-oriented learning based
on reliable classifiers. Pattern Recognition, 48(2):591–604.</em></a></p></li>
<li><p><em>newSVMKLD</em> and <em>newSVMNKLD</em>: return the quantification methods called SVM(KLD) and SVM(nKLD), standing for
Kullback-Leibler Divergence and Normalized Kullback-Leibler Divergence, as proposed in <a class="reference external" href="https://dl.acm.org/doi/abs/10.1145/2700406"><em>Esuli, A. and Sebastiani, F. (2015).
Optimizing text quantifiers for multivariate loss functions.
ACM Transactions on Knowledge Discovery and Data, 9(4):Article 27.</em></a></p></li>
<li><p><em>newSVMAE</em> and <em>newSVMRAE</em>: return the quantification methods called SVM(AE) and SVM(RAE), which optimize for the (Mean) Absolute Error and for the
(Mean) Relative Absolute Error, respectively, as first used by
<a class="reference external" href="https://arxiv.org/abs/2011.02552"><em>Moreo, A. and Sebastiani, F. (2021). Tweet sentiment quantification: An experimental re-evaluation. PLOS ONE 17 (9), 1-23.</em></a></p></li>
</ul>
<p>The last two methods (SVM(AE) and SVM(RAE)) have been implemented in
QuaPy in order to make available ELM variants for what nowadays
are considered the most well-behaved evaluation metrics in quantification.</p>
<p>In order to make these models work, you would need to run the script
ELM variants (any binary quantifier in general) can be extended
to operate in single-label scenarios trivially by adopting a
“one-vs-all” strategy (as, e.g., in
<a class="reference external" href="https://link.springer.com/article/10.1007/s13278-016-0327-z"><em>Gao, W. and Sebastiani, F. (2016). From classification to quantification in tweet sentiment
analysis. Social Network Analysis and Mining, 6(19):1–22</em></a>).
In QuaPy this is possible by using the <em>OneVsAll</em> class.</p>
<p>There are two ways of instantiating this class: <em>OneVsAllGeneric</em>, which works for
any quantifier, and <em>OneVsAllAggregative</em>, which is optimized for aggregative quantifiers.
In general, you can simply use the <em>getOneVsAll</em> function and QuaPy will choose
the more convenient of the two.</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">quapy</span> <span class="k">as</span> <span class="nn">qp</span>
<span class="kn">from</span> <span class="nn">quapy.method.aggregative</span> <span class="kn">import</span> <span class="n">SVMQ</span><span class="p">,</span> <span class="n">getOneVsAll</span>

<span class="c1"># load a single-label dataset (this one contains 3 classes)</span>
<span class="n">dataset</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">datasets</span><span class="o">.</span><span class="n">fetch_twitter</span><span class="p">(</span><span class="s1">'hcr'</span><span class="p">,</span> <span class="n">pickle</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>

<span class="c1"># let qp know where svmperf is</span>
<span class="n">qp</span><span class="o">.</span><span class="n">environ</span><span class="p">[</span><span class="s1">'SVMPERF_HOME'</span><span class="p">]</span> <span class="o">=</span> <span class="s1">'../svm_perf_quantification'</span>

<span class="n">model</span> <span class="o">=</span> <span class="n">getOneVsAll</span><span class="p">(</span><span class="n">SVMQ</span><span class="p">(),</span> <span class="n">n_jobs</span><span class="o">=-</span><span class="mi">1</span><span class="p">)</span> <span class="c1"># run the binary quantifiers in parallel</span>
<span class="n">model</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">dataset</span><span class="o">.</span><span class="n">training</span><span class="p">)</span>
<span class="n">estim_prevalence</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">quantify</span><span class="p">(</span><span class="n">dataset</span><span class="o">.</span><span class="n">test</span><span class="o">.</span><span class="n">instances</span><span class="p">)</span>
</pre></div>
</div>
<p>Check the examples <em><span class="xref myst">explicit_loss_minimization.py</span></em>
and <em><span class="xref myst">one_vs_all.py</span></em> for more details.</p>
</section>
</section>
<section id="meta-models">
<h2>Meta Models<a class="headerlink" href="#meta-models" title="Permalink to this heading">¶</a></h2>
<p>By <em>meta</em> models we mean quantification methods that are defined on top of other
quantification methods, and that thus do not squarely belong to either the aggregative or
the non-aggregative group (indeed, <em>meta</em> models could use quantifiers from any of those
groups).
<em>Meta</em> models are implemented in the <em>qp.method.meta</em> module.</p>
<section id="ensembles">
<h3>Ensembles<a class="headerlink" href="#ensembles" title="Permalink to this heading">¶</a></h3>
<p>QuaPy implements (some of) the variants proposed in:</p>
<ul class="simple">
<li><p><a class="reference external" href="https://www.sciencedirect.com/science/article/pii/S1566253516300628"><em>Pérez-Gállego, P., Quevedo, J. R., & del Coz, J. J. (2017).
Using ensembles for problems with characterizable changes in data distribution: A case study on quantification.
Information Fusion, 34, 87-100.</em></a></p></li>
<li><p><a class="reference external" href="https://www.sciencedirect.com/science/article/pii/S1566253517303652"><em>Pérez-Gállego, P., Castano, A., Quevedo, J. R., & del Coz, J. J. (2019).
Dynamic ensemble selection for quantification tasks.
Information Fusion, 45, 1-15.</em></a></p></li>
</ul>
<p>The following code shows how to instantiate an Ensemble of 30 <em>Adjusted Classify & Count</em> (ACC)
quantifiers operating with a <em>Logistic Regressor</em> (LR) as the base classifier, and using the
informs of the number of members to retain.</p>
<p>Please check the <a class="reference external" href="https://github.com/HLT-ISTI/QuaPy/wiki/Model-Selection">model selection</a>
wiki if you want to optimize the hyper-parameters of the ensemble for classification or quantification.</p>
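<p>The member-selection policy just described can be sketched in isolation (illustrative code with hypothetical names, not QuaPy's <em>Ensemble</em> API): rank the members by their estimated error, keep the best-performing fraction, and average the prevalence estimates of the survivors.</p>

```python
# Illustrative sketch of the "retain the best members" policy behind
# quantification ensembles; hypothetical names, not QuaPy's Ensemble API.

def select_members(estimated_errors, p_retain=0.5):
    """Indices of the members with lowest estimated quantification error."""
    k = max(1, int(len(estimated_errors) * p_retain))
    ranked = sorted(range(len(estimated_errors)), key=lambda i: estimated_errors[i])
    return ranked[:k]

def aggregate_prevalences(member_prevalences, kept):
    """Average the class-prevalence vectors of the retained members."""
    n_classes = len(member_prevalences[0])
    return [sum(member_prevalences[i][c] for i in kept) / len(kept)
            for c in range(n_classes)]
```

<p>The dynamic variants of the papers above differ mainly in how the per-member errors are estimated (e.g., on samples exhibiting a prevalence similar to the one being predicted).</p>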
</section>
<section id="the-quanet-neural-network">
<h3>The QuaNet neural network<a class="headerlink" href="#the-quanet-neural-network" title="Permalink to this heading">¶</a></h3>
<p>QuaPy offers an implementation of QuaNet, a deep learning model presented in:</p>
<p><a class="reference external" href="https://dl.acm.org/doi/abs/10.1145/3269206.3269287"><em>Esuli, A., Moreo, A., & Sebastiani, F. (2018, October).
A recurrent neural network for sentiment quantification.
In Proceedings of the 27th ACM International Conference on
Information and Knowledge Management (pp. 1775-1778).</em></a></p>
<p>This model requires <em>torch</em> to be installed.
QuaNet also requires a classifier that can provide embedded representations
of the inputs.
<span class="n">learner</span> <span class="o">=</span> <span class="n">NeuralClassifierTrainer</span><span class="p">(</span><span class="n">cnn</span><span class="p">,</span> <span class="n">device</span><span class="o">=</span><span class="s1">'cuda'</span><span class="p">)</span>

<span class="c1"># train QuaNet</span>
<span class="n">model</span> <span class="o">=</span> <span class="n">QuaNet</span><span class="p">(</span><span class="n">learner</span><span class="p">,</span> <span class="n">device</span><span class="o">=</span><span class="s1">'cuda'</span><span class="p">)</span>
<span class="n">model</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">dataset</span><span class="o">.</span><span class="n">training</span><span class="p">)</span>
<span class="n">estim_prevalence</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">quantify</span><span class="p">(</span><span class="n">dataset</span><span class="o">.</span><span class="n">test</span><span class="o">.</span><span class="n">instances</span><span class="p">)</span>
</pre></div>
</div>
</section>
</section>
</section>
<div class="clearer"></div>
</div>
<div class="sphinxsidebar" role="navigation" aria-label="main navigation">
<div class="sphinxsidebarwrapper">
<div>
<h3><a href="index.html">Table of Contents</a></h3>
<ul>
<li><a class="reference internal" href="#">Quantification Methods</a><ul>
<li><a class="reference internal" href="#aggregative-methods">Aggregative Methods</a><ul>
<li><a class="reference internal" href="#the-classify-count-variants">The Classify & Count variants</a></li>
<li><a class="reference internal" href="#expectation-maximization-emq">Expectation Maximization (EMQ)</a></li>
<li><a class="reference internal" href="#hellinger-distance-y-hdy">Hellinger Distance y (HDy)</a></li>
<li><a class="reference internal" href="#threshold-optimization-methods">Threshold Optimization methods</a></li>
<li><a class="reference internal" href="#explicit-loss-minimization">Explicit Loss Minimization</a></li>
</ul>
</li>
</li>
</ul>

</div>
<div>
<h4>Previous topic</h4>
<p class="topless"><a href="Protocols.html"
title="previous chapter">Protocols</a></p>
</div>
<div>
<h4>Next topic</h4>
<p class="topless"><a href="Model-Selection.html"
title="next chapter">Model Selection</a></p>
</div>
<div role="note" aria-label="source link">
<h3>This Page</h3>
<ul class="this-page-menu">
</form>
</div>
</div>
<script>document.getElementById('searchbox').style.display = "block"</script>
</div>
</div>
<div class="clearer"></div>
<a href="py-modindex.html" title="Python Module Index"
>modules</a> |</li>
<li class="right" >
<a href="Model-Selection.html" title="Model Selection"
>next</a> |</li>
<li class="right" >
<a href="Protocols.html" title="Protocols"
>previous</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.7 documentation</a> »</li>
<li class="nav-item nav-item-this"><a href="">Quantification Methods</a></li>
</ul>
</div>
<div class="footer" role="contentinfo">
© Copyright 2021, Alejandro Moreo.
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 5.3.0.
</div>
</body>
</html>

<!doctype html>

<html lang="en">
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="generator" content="Docutils 0.19: https://docutils.sourceforge.io/" />

<title>Model Selection — QuaPy 0.1.7 documentation</title>
<link rel="stylesheet" type="text/css" href="_static/pygments.css" />
<link rel="stylesheet" type="text/css" href="_static/bizstyle.css" />

<script data-url_root="./" id="documentation_options" src="_static/documentation_options.js"></script>
<script src="_static/jquery.js"></script>
<script src="_static/underscore.js"></script>
<script src="_static/_sphinx_javascript_frameworks_compat.js"></script>
<script src="_static/doctools.js"></script>
<script src="_static/sphinx_highlight.js"></script>
<script src="_static/bizstyle.js"></script>
<link rel="index" title="Index" href="genindex.html" />
<link rel="search" title="Search" href="search.html" />
<link rel="next" title="Plotting" href="Plotting.html" />
<link rel="prev" title="Quantification Methods" href="Methods.html" />
<meta name="viewport" content="width=device-width,initial-scale=1.0" />
<!--[if lt IE 9]>
<script src="_static/css3-mediaqueries.js"></script>
<li class="right" >
<a href="py-modindex.html" title="Python Module Index"
>modules</a> |</li>
<li class="right" >
<a href="Plotting.html" title="Plotting"
accesskey="N">next</a> |</li>
<li class="right" >
<a href="Methods.html" title="Quantification Methods"
accesskey="P">previous</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.7 documentation</a> »</li>
<li class="nav-item nav-item-this"><a href="">Model Selection</a></li>
</ul>
</div>
<div class="bodywrapper">
<div class="body" role="main">

<section id="model-selection">
<h1>Model Selection<a class="headerlink" href="#model-selection" title="Permalink to this heading">¶</a></h1>
<p>As a supervised machine learning task, quantification methods
can strongly depend on a good choice of model hyper-parameters.
The process whereby those hyper-parameters are chosen is
typically known as <em>Model Selection</em>, and typically consists of
testing different settings and picking the one that performed
best in a held-out validation set in terms of any given
evaluation measure.</p>
<section id="targeting-a-quantification-oriented-loss">
<h2>Targeting a Quantification-oriented loss<a class="headerlink" href="#targeting-a-quantification-oriented-loss" title="Permalink to this heading">¶</a></h2>
<p>The task being optimized determines the evaluation protocol,
i.e., the criteria according to which the performance of
any given method for solving it is to be assessed.
classification, and thus the model selection strategies
customarily adopted in classification have simply been
applied to quantification (see the next section).
It has been argued in <a class="reference external" href="https://link.springer.com/chapter/10.1007/978-3-030-72240-1_6">Moreo, Alejandro, and Fabrizio Sebastiani.
Re-Assessing the “Classify and Count” Quantification Method.
ECIR 2021: Advances in Information Retrieval, pp. 75–91.</a>
that specific model selection strategies should
be adopted for quantification. That is, model selection
strategies for quantification should target
quantification-oriented losses and be tested in a variety
of scenarios exhibiting different degrees of prior
probability shift.</p>
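<p>For concreteness, the kind of quantification-oriented loss such strategies target can be sketched in plain Python (a simplified analogue of the MAE in <em>qp.error</em>; the actual QuaPy signatures may differ):</p>

```python
# Plain-Python sketch of (mean) absolute error between prevalence vectors,
# a simplified analogue of qp.error's MAE; not QuaPy's implementation.

def absolute_error(true_prev, estim_prev):
    """Mean absolute difference between two class-prevalence vectors."""
    return sum(abs(t - e) for t, e in zip(true_prev, estim_prev)) / len(true_prev)

def mean_absolute_error(true_prevs, estim_prevs):
    """Average the absolute error across a collection of evaluation samples."""
    return sum(absolute_error(t, e)
               for t, e in zip(true_prevs, estim_prevs)) / len(true_prevs)
```

<p>Averaging this error over samples drawn at many different prevalence values is precisely what exposes a configuration's robustness to prior probability shift.</p>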
<p>The class <em>qp.model_selection.GridSearchQ</em> implements a grid-search exploration over the space of
hyper-parameter combinations that <a class="reference external" href="https://github.com/HLT-ISTI/QuaPy/wiki/Evaluation">evaluates</a>
each combination of hyper-parameters by means of a given quantification-oriented
error metric (e.g., any of the error functions implemented
in <em>qp.error</em>) and according to a
<a class="reference external" href="https://github.com/HLT-ISTI/QuaPy/wiki/Protocols">sampling generation protocol</a>.</p>
<p>The following is an example (also included in the examples folder) of model selection for quantification:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">quapy</span> <span class="k">as</span> <span class="nn">qp</span>
|
||||
<span class="kn">from</span> <span class="nn">quapy.method.aggregative</span> <span class="kn">import</span> <span class="n">PCC</span>
|
||||
<span class="kn">from</span> <span class="nn">quapy.protocol</span> <span class="kn">import</span> <span class="n">APP</span>
|
||||
<span class="kn">from</span> <span class="nn">quapy.method.aggregative</span> <span class="kn">import</span> <span class="n">DistributionMatching</span>
|
||||
<span class="kn">from</span> <span class="nn">sklearn.linear_model</span> <span class="kn">import</span> <span class="n">LogisticRegression</span>
|
||||
<span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span>
|
||||
|
||||
<span class="c1"># set a seed to replicate runs</span>
|
||||
<span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">seed</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
|
||||
<span class="n">qp</span><span class="o">.</span><span class="n">environ</span><span class="p">[</span><span class="s1">'SAMPLE_SIZE'</span><span class="p">]</span> <span class="o">=</span> <span class="mi">500</span>
|
||||
<span class="sd">"""</span>
|
||||
<span class="sd">In this example, we show how to perform model selection on a DistributionMatching quantifier.</span>
|
||||
<span class="sd">"""</span>
|
||||
|
||||
<span class="n">dataset</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">datasets</span><span class="o">.</span><span class="n">fetch_reviews</span><span class="p">(</span><span class="s1">'hp'</span><span class="p">,</span> <span class="n">tfidf</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">min_df</span><span class="o">=</span><span class="mi">5</span><span class="p">)</span>
|
||||
<span class="n">model</span> <span class="o">=</span> <span class="n">DistributionMatching</span><span class="p">(</span><span class="n">LogisticRegression</span><span class="p">())</span>
|
||||
|
||||
<span class="n">qp</span><span class="o">.</span><span class="n">environ</span><span class="p">[</span><span class="s1">'SAMPLE_SIZE'</span><span class="p">]</span> <span class="o">=</span> <span class="mi">100</span>
|
||||
<span class="n">qp</span><span class="o">.</span><span class="n">environ</span><span class="p">[</span><span class="s1">'N_JOBS'</span><span class="p">]</span> <span class="o">=</span> <span class="o">-</span><span class="mi">1</span> <span class="c1"># explore hyper-parameters in parallel</span>
|
||||
|
||||
<span class="n">training</span><span class="p">,</span> <span class="n">test</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">datasets</span><span class="o">.</span><span class="n">fetch_reviews</span><span class="p">(</span><span class="s1">'imdb'</span><span class="p">,</span> <span class="n">tfidf</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">min_df</span><span class="o">=</span><span class="mi">5</span><span class="p">)</span><span class="o">.</span><span class="n">train_test</span>
|
||||
|
||||
<span class="c1"># The model will be returned by the fit method of GridSearchQ.</span>
|
||||
<span class="c1"># Model selection will be performed with a fixed budget of 1000 evaluations</span>
|
||||
<span class="c1"># for each hyper-parameter combination. The error to optimize is the MAE for</span>
|
||||
<span class="c1"># quantification, as evaluated on artificially drawn samples at prevalences </span>
|
||||
<span class="c1"># covering the entire spectrum on a held-out portion (40%) of the training set.</span>
|
||||
<span class="c1"># Every combination of hyper-parameters will be evaluated by confronting the</span>
|
||||
<span class="c1"># quantifier thus configured against a series of samples generated by means</span>
|
||||
<span class="c1"># of a sample generation protocol. For this example, we will use the</span>
|
||||
<span class="c1"># artificial-prevalence protocol (APP), that generates samples with prevalence</span>
|
||||
<span class="c1"># values in the entire range of values from a grid (e.g., [0, 0.1, 0.2, ..., 1]).</span>
|
||||
<span class="c1"># We devote 30% of the dataset for this exploration.</span>
|
||||
<span class="n">training</span><span class="p">,</span> <span class="n">validation</span> <span class="o">=</span> <span class="n">training</span><span class="o">.</span><span class="n">split_stratified</span><span class="p">(</span><span class="n">train_prop</span><span class="o">=</span><span class="mf">0.7</span><span class="p">)</span>
|
||||
<span class="n">protocol</span> <span class="o">=</span> <span class="n">APP</span><span class="p">(</span><span class="n">validation</span><span class="p">)</span>
|
||||
|
||||
<span class="c1"># We will explore a classification-dependent hyper-parameter (e.g., the 'C'</span>
|
||||
<span class="c1"># hyper-parameter of LogisticRegression) and a quantification-dependent hyper-parameter</span>
|
||||
<span class="c1"># (e.g., the number of bins in a DistributionMatching quantifier.</span>
|
||||
<span class="c1"># Classifier-dependent hyper-parameters have to be marked with a prefix "classifier__"</span>
|
||||
<span class="c1"># in order to let the quantifier know this hyper-parameter belongs to its underlying</span>
|
||||
<span class="c1"># classifier.</span>
|
||||
<span class="n">param_grid</span> <span class="o">=</span> <span class="p">{</span>
|
||||
<span class="s1">'classifier__C'</span><span class="p">:</span> <span class="n">np</span><span class="o">.</span><span class="n">logspace</span><span class="p">(</span><span class="o">-</span><span class="mi">3</span><span class="p">,</span><span class="mi">3</span><span class="p">,</span><span class="mi">7</span><span class="p">),</span>
|
||||
<span class="s1">'nbins'</span><span class="p">:</span> <span class="p">[</span><span class="mi">8</span><span class="p">,</span> <span class="mi">16</span><span class="p">,</span> <span class="mi">32</span><span class="p">,</span> <span class="mi">64</span><span class="p">],</span>
|
||||
<span class="p">}</span>
|
||||
|
||||
<span class="n">model</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">model_selection</span><span class="o">.</span><span class="n">GridSearchQ</span><span class="p">(</span>
|
||||
<span class="n">model</span><span class="o">=</span><span class="n">PCC</span><span class="p">(</span><span class="n">LogisticRegression</span><span class="p">()),</span>
|
||||
<span class="n">param_grid</span><span class="o">=</span><span class="p">{</span><span class="s1">'C'</span><span class="p">:</span> <span class="n">np</span><span class="o">.</span><span class="n">logspace</span><span class="p">(</span><span class="o">-</span><span class="mi">4</span><span class="p">,</span><span class="mi">5</span><span class="p">,</span><span class="mi">10</span><span class="p">),</span> <span class="s1">'class_weight'</span><span class="p">:</span> <span class="p">[</span><span class="s1">'balanced'</span><span class="p">,</span> <span class="kc">None</span><span class="p">]},</span>
|
||||
<span class="n">sample_size</span><span class="o">=</span><span class="n">qp</span><span class="o">.</span><span class="n">environ</span><span class="p">[</span><span class="s1">'SAMPLE_SIZE'</span><span class="p">],</span>
|
||||
<span class="n">eval_budget</span><span class="o">=</span><span class="mi">1000</span><span class="p">,</span>
|
||||
<span class="n">error</span><span class="o">=</span><span class="s1">'mae'</span><span class="p">,</span>
|
||||
<span class="n">refit</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="c1"># retrain on the whole labelled set</span>
|
||||
<span class="n">val_split</span><span class="o">=</span><span class="mf">0.4</span><span class="p">,</span>
|
||||
<span class="n">model</span><span class="o">=</span><span class="n">model</span><span class="p">,</span>
|
||||
<span class="n">param_grid</span><span class="o">=</span><span class="n">param_grid</span><span class="p">,</span>
|
||||
<span class="n">protocol</span><span class="o">=</span><span class="n">protocol</span><span class="p">,</span>
|
||||
<span class="n">error</span><span class="o">=</span><span class="s1">'mae'</span><span class="p">,</span> <span class="c1"># the error to optimize is the MAE (a quantification-oriented loss)</span>
|
||||
<span class="n">refit</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="c1"># retrain on the whole labelled set once done</span>
|
||||
<span class="n">verbose</span><span class="o">=</span><span class="kc">True</span> <span class="c1"># show information as the process goes on</span>
|
||||
<span class="p">)</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">dataset</span><span class="o">.</span><span class="n">training</span><span class="p">)</span>
|
||||
<span class="p">)</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">training</span><span class="p">)</span>
|
||||
|
||||
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s1">'model selection ended: best hyper-parameters=</span><span class="si">{</span><span class="n">model</span><span class="o">.</span><span class="n">best_params_</span><span class="si">}</span><span class="s1">'</span><span class="p">)</span>
|
||||
<span class="n">model</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">best_model_</span>
|
||||
|
||||
<span class="c1"># evaluation in terms of MAE</span>
|
||||
<span class="n">results</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">evaluation</span><span class="o">.</span><span class="n">artificial_sampling_eval</span><span class="p">(</span>
<span class="n">model</span><span class="p">,</span>
<span class="n">dataset</span><span class="o">.</span><span class="n">test</span><span class="p">,</span>
<span class="n">sample_size</span><span class="o">=</span><span class="n">qp</span><span class="o">.</span><span class="n">environ</span><span class="p">[</span><span class="s1">'SAMPLE_SIZE'</span><span class="p">],</span>
<span class="n">n_prevpoints</span><span class="o">=</span><span class="mi">101</span><span class="p">,</span>
<span class="n">n_repetitions</span><span class="o">=</span><span class="mi">10</span><span class="p">,</span>
<span class="n">error_metric</span><span class="o">=</span><span class="s1">'mae'</span>
<span class="p">)</span>
<span class="c1"># we use the same evaluation protocol (APP) on the test set</span>
<span class="n">mae_score</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">evaluation</span><span class="o">.</span><span class="n">evaluate</span><span class="p">(</span><span class="n">model</span><span class="p">,</span> <span class="n">protocol</span><span class="o">=</span><span class="n">APP</span><span class="p">(</span><span class="n">test</span><span class="p">),</span> <span class="n">error_metric</span><span class="o">=</span><span class="s1">'mae'</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s1">'MAE=</span><span class="si">{</span><span class="n">results</span><span class="si">:</span><span class="s1">.5f</span><span class="si">}</span><span class="s1">'</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s1">'MAE=</span><span class="si">{</span><span class="n">mae_score</span><span class="si">:</span><span class="s1">.5f</span><span class="si">}</span><span class="s1">'</span><span class="p">)</span>
</pre></div>
</div>
<p>In this example, the system outputs:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>[GridSearchQ]: starting optimization with n_jobs=1
[GridSearchQ]: checking hyperparams={'C': 0.0001, 'class_weight': 'balanced'} got mae score 0.24987
[GridSearchQ]: checking hyperparams={'C': 0.0001, 'class_weight': None} got mae score 0.48135
[GridSearchQ]: checking hyperparams={'C': 0.001, 'class_weight': 'balanced'} got mae score 0.24866
[...]
[GridSearchQ]: checking hyperparams={'C': 100000.0, 'class_weight': None} got mae score 0.43676
[GridSearchQ]: optimization finished: best params {'C': 0.1, 'class_weight': 'balanced'} (score=0.19982)
[GridSearchQ]: refitting on the whole development set
model selection ended: best hyper-parameters={'C': 0.1, 'class_weight': 'balanced'}
1010 evaluations will be performed for each combination of hyper-parameters
[artificial sampling protocol] generating predictions: 100%|██████████| 1010/1010 [00:00<00:00, 5005.54it/s]
MAE=0.20342
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="p">[</span><span class="n">GridSearchQ</span><span class="p">]:</span> <span class="n">starting</span> <span class="n">model</span> <span class="n">selection</span> <span class="k">with</span> <span class="bp">self</span><span class="o">.</span><span class="n">n_jobs</span> <span class="o">=-</span><span class="mi">1</span>
<span class="p">[</span><span class="n">GridSearchQ</span><span class="p">]:</span> <span class="n">hyperparams</span><span class="o">=</span><span class="p">{</span><span class="s1">'classifier__C'</span><span class="p">:</span> <span class="mf">0.01</span><span class="p">,</span> <span class="s1">'nbins'</span><span class="p">:</span> <span class="mi">64</span><span class="p">}</span> <span class="n">got</span> <span class="n">mae</span> <span class="n">score</span> <span class="mf">0.04021</span> <span class="p">[</span><span class="n">took</span> <span class="mf">1.1356</span><span class="n">s</span><span class="p">]</span>
<span class="p">[</span><span class="n">GridSearchQ</span><span class="p">]:</span> <span class="n">hyperparams</span><span class="o">=</span><span class="p">{</span><span class="s1">'classifier__C'</span><span class="p">:</span> <span class="mf">0.01</span><span class="p">,</span> <span class="s1">'nbins'</span><span class="p">:</span> <span class="mi">32</span><span class="p">}</span> <span class="n">got</span> <span class="n">mae</span> <span class="n">score</span> <span class="mf">0.04286</span> <span class="p">[</span><span class="n">took</span> <span class="mf">1.2139</span><span class="n">s</span><span class="p">]</span>
<span class="p">[</span><span class="n">GridSearchQ</span><span class="p">]:</span> <span class="n">hyperparams</span><span class="o">=</span><span class="p">{</span><span class="s1">'classifier__C'</span><span class="p">:</span> <span class="mf">0.01</span><span class="p">,</span> <span class="s1">'nbins'</span><span class="p">:</span> <span class="mi">16</span><span class="p">}</span> <span class="n">got</span> <span class="n">mae</span> <span class="n">score</span> <span class="mf">0.04888</span> <span class="p">[</span><span class="n">took</span> <span class="mf">1.2491</span><span class="n">s</span><span class="p">]</span>
<span class="p">[</span><span class="n">GridSearchQ</span><span class="p">]:</span> <span class="n">hyperparams</span><span class="o">=</span><span class="p">{</span><span class="s1">'classifier__C'</span><span class="p">:</span> <span class="mf">0.001</span><span class="p">,</span> <span class="s1">'nbins'</span><span class="p">:</span> <span class="mi">8</span><span class="p">}</span> <span class="n">got</span> <span class="n">mae</span> <span class="n">score</span> <span class="mf">0.05163</span> <span class="p">[</span><span class="n">took</span> <span class="mf">1.5372</span><span class="n">s</span><span class="p">]</span>
<span class="p">[</span><span class="o">...</span><span class="p">]</span>
<span class="p">[</span><span class="n">GridSearchQ</span><span class="p">]:</span> <span class="n">hyperparams</span><span class="o">=</span><span class="p">{</span><span class="s1">'classifier__C'</span><span class="p">:</span> <span class="mf">1000.0</span><span class="p">,</span> <span class="s1">'nbins'</span><span class="p">:</span> <span class="mi">32</span><span class="p">}</span> <span class="n">got</span> <span class="n">mae</span> <span class="n">score</span> <span class="mf">0.02445</span> <span class="p">[</span><span class="n">took</span> <span class="mf">2.9056</span><span class="n">s</span><span class="p">]</span>
<span class="p">[</span><span class="n">GridSearchQ</span><span class="p">]:</span> <span class="n">optimization</span> <span class="n">finished</span><span class="p">:</span> <span class="n">best</span> <span class="n">params</span> <span class="p">{</span><span class="s1">'classifier__C'</span><span class="p">:</span> <span class="mf">100.0</span><span class="p">,</span> <span class="s1">'nbins'</span><span class="p">:</span> <span class="mi">32</span><span class="p">}</span> <span class="p">(</span><span class="n">score</span><span class="o">=</span><span class="mf">0.02234</span><span class="p">)</span> <span class="p">[</span><span class="n">took</span> <span class="mf">7.3114</span><span class="n">s</span><span class="p">]</span>
<span class="p">[</span><span class="n">GridSearchQ</span><span class="p">]:</span> <span class="n">refitting</span> <span class="n">on</span> <span class="n">the</span> <span class="n">whole</span> <span class="n">development</span> <span class="nb">set</span>
<span class="n">model</span> <span class="n">selection</span> <span class="n">ended</span><span class="p">:</span> <span class="n">best</span> <span class="n">hyper</span><span class="o">-</span><span class="n">parameters</span><span class="o">=</span><span class="p">{</span><span class="s1">'classifier__C'</span><span class="p">:</span> <span class="mf">100.0</span><span class="p">,</span> <span class="s1">'nbins'</span><span class="p">:</span> <span class="mi">32</span><span class="p">}</span>
<span class="n">MAE</span><span class="o">=</span><span class="mf">0.03102</span>
</pre></div>
</div>
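The log line "1010 evaluations will be performed" follows directly from the protocol settings used above (101 prevalence points, 10 repetitions, binary problem). As a back-of-the-envelope sketch, assuming the usual stars-and-bars count of valid prevalence combinations (the helper name `num_app_samples` is illustrative, not part of QuaPy):

```python
import math

def num_app_samples(n_prevpoints, repeats, n_classes=2):
    # number of prevalence vectors on a grid of n_prevpoints values per class
    # whose entries sum to 1 (stars-and-bars), times the repeats per vector
    combinations = math.comb(n_prevpoints + n_classes - 2, n_classes - 1)
    return combinations * repeats

# binary problem, n_prevpoints=101, n_repetitions=10, as in the run above
print(num_app_samples(101, 10))  # 1010
```

For a binary problem the formula reduces to `n_prevpoints * repeats`, which is why the log reports 1010 evaluations.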
<p>The parameter <em>val_split</em> can alternatively be used to indicate
@ -145,9 +166,9 @@ a validation set (i.e., an instance of <em>LabelledCollection</em>) instead
of a proportion. This could be useful if one wants to have control
over the specific data split to be used across different model selection
experiments.</p>
</div>
<div class="section" id="targeting-a-classification-oriented-loss">
<h2>Targeting a Classification-oriented loss<a class="headerlink" href="#targeting-a-classification-oriented-loss" title="Permalink to this headline">¶</a></h2>
</section>
<section id="targeting-a-classification-oriented-loss">
<h2>Targeting a Classification-oriented loss<a class="headerlink" href="#targeting-a-classification-oriented-loss" title="Permalink to this heading">¶</a></h2>
<p>Optimizing a model for quantification can be
computationally costly.
In aggregative methods, one could alternatively try to optimize
@ -161,32 +182,15 @@ The following code illustrates how to do that:</p>
<span class="n">LogisticRegression</span><span class="p">(),</span>
<span class="n">param_grid</span><span class="o">=</span><span class="p">{</span><span class="s1">'C'</span><span class="p">:</span> <span class="n">np</span><span class="o">.</span><span class="n">logspace</span><span class="p">(</span><span class="o">-</span><span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">10</span><span class="p">),</span> <span class="s1">'class_weight'</span><span class="p">:</span> <span class="p">[</span><span class="s1">'balanced'</span><span class="p">,</span> <span class="kc">None</span><span class="p">]},</span>
<span class="n">cv</span><span class="o">=</span><span class="mi">5</span><span class="p">)</span>
<span class="n">model</span> <span class="o">=</span> <span class="n">PCC</span><span class="p">(</span><span class="n">learner</span><span class="p">)</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">dataset</span><span class="o">.</span><span class="n">training</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s1">'model selection ended: best hyper-parameters=</span><span class="si">{</span><span class="n">model</span><span class="o">.</span><span class="n">learner</span><span class="o">.</span><span class="n">best_params_</span><span class="si">}</span><span class="s1">'</span><span class="p">)</span>
<span class="n">model</span> <span class="o">=</span> <span class="n">DistributionMatching</span><span class="p">(</span><span class="n">learner</span><span class="p">)</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">dataset</span><span class="o">.</span><span class="n">training</span><span class="p">)</span>
</pre></div>
</div>
<p>In this example, the system outputs:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>model selection ended: best hyper-parameters={'C': 10000.0, 'class_weight': None}
1010 evaluations will be performed for each combination of hyper-parameters
[artificial sampling protocol] generating predictions: 100%|██████████| 1010/1010 [00:00<00:00, 5379.55it/s]
MAE=0.41734
</pre></div>
</div>
<p>Note that the MAE is worse than the one we obtained when optimizing
for quantification and, indeed, the hyper-parameters found optimal
largely differ between the two selection modalities. The
hyper-parameters C=10000 and class_weight=None have been found
to work well for the specific training prevalence of the HP dataset,
but these hyper-parameters turned out to be suboptimal when the
class prevalences of the test set differ (as is indeed tested
in scenarios of quantification).</p>
<p>This is, however, not always the case, and one could, in practice,
find examples
in which optimizing for classification ends up resulting in a better
quantifier than when optimizing for quantification.
Nonetheless, this is theoretically unlikely to happen.</p>
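The comparison above hinges on the MAE scores obtained under the two selection modalities. As a generic illustration (a minimal sketch, not QuaPy's own implementation), MAE simply averages, over all samples drawn by the protocol, the absolute difference between the true and the estimated prevalence vectors:

```python
def absolute_error(true_prev, estim_prev):
    # mean absolute difference between two class-prevalence vectors
    return sum(abs(t - e) for t, e in zip(true_prev, estim_prev)) / len(true_prev)

def mean_absolute_error(true_prevs, estim_prevs):
    # MAE: absolute error averaged across all samples generated by the protocol
    errors = [absolute_error(t, e) for t, e in zip(true_prevs, estim_prevs)]
    return sum(errors) / len(errors)

# two binary samples: exact on the first, off by 0.1 on the second
print(round(mean_absolute_error([(0.5, 0.5), (0.3, 0.7)],
                                [(0.5, 0.5), (0.2, 0.8)]), 4))  # 0.05
```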
</div>
</div>
<p>However, this is conceptually flawed, since the model should be
optimized for the task at hand (quantification), and not for a surrogate task (classification),
i.e., the model should be requested to deliver low quantification errors, rather
than low classification errors.</p>
</section>
</section>
<div class="clearer"></div>
@ -195,8 +199,9 @@ Nonetheless, this is theoretically unlikely to happen.</p>
</div>
<div class="sphinxsidebar" role="navigation" aria-label="main navigation">
<div class="sphinxsidebarwrapper">
<h3><a href="index.html">Table of Contents</a></h3>
<ul>
<div>
<h3><a href="index.html">Table of Contents</a></h3>
<ul>
<li><a class="reference internal" href="#">Model Selection</a><ul>
<li><a class="reference internal" href="#targeting-a-quantification-oriented-loss">Targeting a Quantification-oriented loss</a></li>
<li><a class="reference internal" href="#targeting-a-classification-oriented-loss">Targeting a Classification-oriented loss</a></li>
@ -204,6 +209,17 @@ Nonetheless, this is theoretically unlikely to happen.</p>
</li>
</ul>
</div>
<div>
<h4>Previous topic</h4>
<p class="topless"><a href="Methods.html"
title="previous chapter">Quantification Methods</a></p>
</div>
<div>
<h4>Next topic</h4>
<p class="topless"><a href="Plotting.html"
title="next chapter">Plotting</a></p>
</div>
<div role="note" aria-label="source link">
<h3>This Page</h3>
<ul class="this-page-menu">
@ -220,7 +236,7 @@ Nonetheless, this is theoretically unlikely to happen.</p>
</form>
</div>
</div>
<script>$('#searchbox').show(0);</script>
<script>document.getElementById('searchbox').style.display = "block"</script>
</div>
</div>
<div class="clearer"></div>
@ -234,13 +250,19 @@ Nonetheless, this is theoretically unlikely to happen.</p>
<li class="right" >
<a href="py-modindex.html" title="Python Module Index"
>modules</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.6 documentation</a> »</li>
<li class="right" >
<a href="Plotting.html" title="Plotting"
>next</a> |</li>
<li class="right" >
<a href="Methods.html" title="Quantification Methods"
>previous</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.7 documentation</a> »</li>
<li class="nav-item nav-item-this"><a href="">Model Selection</a></li>
</ul>
</div>
<div class="footer" role="contentinfo">
© Copyright 2021, Alejandro Moreo.
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 4.2.0.
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 5.3.0.
</div>
</body>
</html>
@ -2,23 +2,26 @@
<!doctype html>
<html>
<html lang="en">
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Plotting — QuaPy 0.1.6 documentation</title>
<meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="generator" content="Docutils 0.19: https://docutils.sourceforge.io/" />
<title>Plotting — QuaPy 0.1.7 documentation</title>
<link rel="stylesheet" type="text/css" href="_static/pygments.css" />
<link rel="stylesheet" type="text/css" href="_static/bizstyle.css" />
<script data-url_root="./" id="documentation_options" src="_static/documentation_options.js"></script>
<script src="_static/jquery.js"></script>
<script src="_static/underscore.js"></script>
<script src="_static/_sphinx_javascript_frameworks_compat.js"></script>
<script src="_static/doctools.js"></script>
<script src="_static/sphinx_highlight.js"></script>
<script src="_static/bizstyle.js"></script>
<link rel="index" title="Index" href="genindex.html" />
<link rel="search" title="Search" href="search.html" />
<link rel="next" title="quapy" href="modules.html" />
<link rel="prev" title="Quantification Methods" href="Methods.html" />
<link rel="prev" title="Model Selection" href="Model-Selection.html" />
<meta name="viewport" content="width=device-width,initial-scale=1.0" />
<!--[if lt IE 9]>
<script src="_static/css3-mediaqueries.js"></script>
@ -37,9 +40,9 @@
<a href="modules.html" title="quapy"
accesskey="N">next</a> |</li>
<li class="right" >
<a href="Methods.html" title="Quantification Methods"
<a href="Model-Selection.html" title="Model Selection"
accesskey="P">previous</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.6 documentation</a> »</li>
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.7 documentation</a> »</li>
<li class="nav-item nav-item-this"><a href="">Plotting</a></li>
</ul>
</div>
@ -49,8 +52,8 @@
<div class="bodywrapper">
<div class="body" role="main">
<div class="tex2jax_ignore mathjax_ignore section" id="plotting">
<h1>Plotting<a class="headerlink" href="#plotting" title="Permalink to this headline">¶</a></h1>
<section id="plotting">
<h1>Plotting<a class="headerlink" href="#plotting" title="Permalink to this heading">¶</a></h1>
<p>The module <em>qp.plot</em> implements some basic plotting functions
that can help analyse the performance of a quantification method.</p>
<p>All plotting functions receive as inputs the outcomes of
@ -91,7 +94,7 @@ quantification methods across different scenarios showcasing
the accuracy of the quantifier in predicting class prevalences
for a wide range of prior distributions. This can easily be
achieved by means of the
<a class="reference external" href="https://github.com/HLT-ISTI/QuaPy/wiki/Evaluation">artificial sampling protocol</a>
<a class="reference external" href="https://github.com/HLT-ISTI/QuaPy/wiki/Protocols">artificial sampling protocol</a>
that is implemented in QuaPy.</p>
<p>The following code shows how to perform one simple experiment
in which the 4 <em>CC-variants</em>, all equipped with a linear SVM, are
@ -100,6 +103,7 @@ tested across the entire spectrum of class priors (taking 21 splits
of the interval [0,1], i.e., using prevalence steps of 0.05, and
generating 100 random samples at each prevalence).</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">quapy</span> <span class="k">as</span> <span class="nn">qp</span>
<span class="kn">from</span> <span class="nn">protocol</span> <span class="kn">import</span> <span class="n">APP</span>
<span class="kn">from</span> <span class="nn">quapy.method.aggregative</span> <span class="kn">import</span> <span class="n">CC</span><span class="p">,</span> <span class="n">ACC</span><span class="p">,</span> <span class="n">PCC</span><span class="p">,</span> <span class="n">PACC</span>
<span class="kn">from</span> <span class="nn">sklearn.svm</span> <span class="kn">import</span> <span class="n">LinearSVC</span>
@ -108,28 +112,26 @@ generating 100 random samples at each prevalence).</p>
<span class="k">def</span> <span class="nf">gen_data</span><span class="p">():</span>
<span class="k">def</span> <span class="nf">base_classifier</span><span class="p">():</span>
<span class="k">return</span> <span class="n">LinearSVC</span><span class="p">()</span>
<span class="k">return</span> <span class="n">LinearSVC</span><span class="p">(</span><span class="n">class_weight</span><span class="o">=</span><span class="s1">'balanced'</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">models</span><span class="p">():</span>
<span class="k">yield</span> <span class="n">CC</span><span class="p">(</span><span class="n">base_classifier</span><span class="p">())</span>
<span class="k">yield</span> <span class="n">ACC</span><span class="p">(</span><span class="n">base_classifier</span><span class="p">())</span>
<span class="k">yield</span> <span class="n">PCC</span><span class="p">(</span><span class="n">base_classifier</span><span class="p">())</span>
<span class="k">yield</span> <span class="n">PACC</span><span class="p">(</span><span class="n">base_classifier</span><span class="p">())</span>
<span class="k">yield</span> <span class="s1">'CC'</span><span class="p">,</span> <span class="n">CC</span><span class="p">(</span><span class="n">base_classifier</span><span class="p">())</span>
<span class="k">yield</span> <span class="s1">'ACC'</span><span class="p">,</span> <span class="n">ACC</span><span class="p">(</span><span class="n">base_classifier</span><span class="p">())</span>
<span class="k">yield</span> <span class="s1">'PCC'</span><span class="p">,</span> <span class="n">PCC</span><span class="p">(</span><span class="n">base_classifier</span><span class="p">())</span>
<span class="k">yield</span> <span class="s1">'PACC'</span><span class="p">,</span> <span class="n">PACC</span><span class="p">(</span><span class="n">base_classifier</span><span class="p">())</span>
<span class="n">data</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">datasets</span><span class="o">.</span><span class="n">fetch_reviews</span><span class="p">(</span><span class="s1">'kindle'</span><span class="p">,</span> <span class="n">tfidf</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">min_df</span><span class="o">=</span><span class="mi">5</span><span class="p">)</span>
<span class="n">train</span><span class="p">,</span> <span class="n">test</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">datasets</span><span class="o">.</span><span class="n">fetch_reviews</span><span class="p">(</span><span class="s1">'kindle'</span><span class="p">,</span> <span class="n">tfidf</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">min_df</span><span class="o">=</span><span class="mi">5</span><span class="p">)</span><span class="o">.</span><span class="n">train_test</span>
<span class="n">method_names</span><span class="p">,</span> <span class="n">true_prevs</span><span class="p">,</span> <span class="n">estim_prevs</span><span class="p">,</span> <span class="n">tr_prevs</span> <span class="o">=</span> <span class="p">[],</span> <span class="p">[],</span> <span class="p">[],</span> <span class="p">[]</span>
<span class="k">for</span> <span class="n">model</span> <span class="ow">in</span> <span class="n">models</span><span class="p">():</span>
<span class="n">model</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">data</span><span class="o">.</span><span class="n">training</span><span class="p">)</span>
<span class="n">true_prev</span><span class="p">,</span> <span class="n">estim_prev</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">evaluation</span><span class="o">.</span><span class="n">artificial_sampling_prediction</span><span class="p">(</span>
<span class="n">model</span><span class="p">,</span> <span class="n">data</span><span class="o">.</span><span class="n">test</span><span class="p">,</span> <span class="n">qp</span><span class="o">.</span><span class="n">environ</span><span class="p">[</span><span class="s1">'SAMPLE_SIZE'</span><span class="p">],</span> <span class="n">n_repetitions</span><span class="o">=</span><span class="mi">100</span><span class="p">,</span> <span class="n">n_prevpoints</span><span class="o">=</span><span class="mi">21</span>
<span class="p">)</span>
<span class="k">for</span> <span class="n">method_name</span><span class="p">,</span> <span class="n">model</span> <span class="ow">in</span> <span class="n">models</span><span class="p">():</span>
<span class="n">model</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">train</span><span class="p">)</span>
<span class="n">true_prev</span><span class="p">,</span> <span class="n">estim_prev</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">evaluation</span><span class="o">.</span><span class="n">prediction</span><span class="p">(</span><span class="n">model</span><span class="p">,</span> <span class="n">APP</span><span class="p">(</span><span class="n">test</span><span class="p">,</span> <span class="n">repeats</span><span class="o">=</span><span class="mi">100</span><span class="p">,</span> <span class="n">random_state</span><span class="o">=</span><span class="mi">0</span><span class="p">))</span>
<span class="n">method_names</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">model</span><span class="o">.</span><span class="vm">__class__</span><span class="o">.</span><span class="vm">__name__</span><span class="p">)</span>
<span class="n">method_names</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">method_name</span><span class="p">)</span>
<span class="n">true_prevs</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">true_prev</span><span class="p">)</span>
<span class="n">estim_prevs</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">estim_prev</span><span class="p">)</span>
<span class="n">tr_prevs</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">data</span><span class="o">.</span><span class="n">training</span><span class="o">.</span><span class="n">prevalence</span><span class="p">())</span>
<span class="n">tr_prevs</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">train</span><span class="o">.</span><span class="n">prevalence</span><span class="p">())</span>
<span class="k">return</span> <span class="n">method_names</span><span class="p">,</span> <span class="n">true_prevs</span><span class="p">,</span> <span class="n">estim_prevs</span><span class="p">,</span> <span class="n">tr_prevs</span>
@ -137,8 +139,8 @@ generating 100 random samples at each prevalence).</p>
</pre></div>
</div>
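The experiment above takes 21 splits of the interval [0,1], i.e., prevalence steps of 0.05. That grid can be sketched as follows (`prevalence_grid` is an illustrative helper, not a QuaPy function):

```python
def prevalence_grid(n_prevpoints=21):
    # n_prevpoints equally spaced binary prevalences in [0, 1]; with 21 points
    # the step is 0.05, so APP with repeats=100 draws 21 * 100 = 2100 samples
    step = 1.0 / (n_prevpoints - 1)
    return [round(i * step, 10) for i in range(n_prevpoints)]

grid = prevalence_grid()
print(len(grid), grid[0], grid[1], grid[-1])  # 21 0.0 0.05 1.0
```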
<p>the plots that can be generated are explained below.</p>
<div class="section" id="diagonal-plot">
<h2>Diagonal Plot<a class="headerlink" href="#diagonal-plot" title="Permalink to this headline">¶</a></h2>
<section id="diagonal-plot">
<h2>Diagonal Plot<a class="headerlink" href="#diagonal-plot" title="Permalink to this heading">¶</a></h2>
<p>The <em>diagonal</em> plot shows a very insightful view of the
quantifier’s performance. It plots the predicted class
prevalence (in the y-axis) against the true class prevalence
@ -164,9 +166,9 @@ the complete list of arguments in the documentation).</p>
<p>Finally, note how most quantifiers, and especially the “unadjusted”
variants CC and PCC, are strongly biased towards the
prevalence seen during training.</p>
</div>
<div class="section" id="quantification-bias">
<h2>Quantification bias<a class="headerlink" href="#quantification-bias" title="Permalink to this headline">¶</a></h2>
</section>
<section id="quantification-bias">
<h2>Quantification bias<a class="headerlink" href="#quantification-bias" title="Permalink to this heading">¶</a></h2>
<p>This plot aims at evincing the bias that any quantifier
displays with respect to the training prevalences by
means of <a class="reference external" href="https://en.wikipedia.org/wiki/Box_plot">box plots</a>.
@ -196,21 +198,19 @@ IMDb dataset, and generate the bias plot again.
This example can be run by rewriting the <em>gen_data()</em> function
like this:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">gen_data</span><span class="p">():</span>
<span class="n">data</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">datasets</span><span class="o">.</span><span class="n">fetch_reviews</span><span class="p">(</span><span class="s1">'imdb'</span><span class="p">,</span> <span class="n">tfidf</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">min_df</span><span class="o">=</span><span class="mi">5</span><span class="p">)</span>
<span class="n">train</span><span class="p">,</span> <span class="n">test</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">datasets</span><span class="o">.</span><span class="n">fetch_reviews</span><span class="p">(</span><span class="s1">'imdb'</span><span class="p">,</span> <span class="n">tfidf</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">min_df</span><span class="o">=</span><span class="mi">5</span><span class="p">)</span><span class="o">.</span><span class="n">train_test</span>
<span class="n">model</span> <span class="o">=</span> <span class="n">CC</span><span class="p">(</span><span class="n">LinearSVC</span><span class="p">())</span>
<span class="n">method_data</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">for</span> <span class="n">training_prevalence</span> <span class="ow">in</span> <span class="n">np</span><span class="o">.</span><span class="n">linspace</span><span class="p">(</span><span class="mf">0.1</span><span class="p">,</span> <span class="mf">0.9</span><span class="p">,</span> <span class="mi">9</span><span class="p">):</span>
<span class="n">training_size</span> <span class="o">=</span> <span class="mi">5000</span>
<span class="c1"># since the problem is binary, it suffices to specify the negative prevalence (the positive is constrained)</span>
<span class="n">training</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">training</span><span class="o">.</span><span class="n">sampling</span><span class="p">(</span><span class="n">training_size</span><span class="p">,</span> <span class="mi">1</span> <span class="o">-</span> <span class="n">training_prevalence</span><span class="p">)</span>
|
||||
<span class="n">model</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">training</span><span class="p">)</span>
|
||||
<span class="n">true_prev</span><span class="p">,</span> <span class="n">estim_prev</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">evaluation</span><span class="o">.</span><span class="n">artificial_sampling_prediction</span><span class="p">(</span>
|
||||
<span class="n">model</span><span class="p">,</span> <span class="n">data</span><span class="o">.</span><span class="n">sample</span><span class="p">,</span> <span class="n">qp</span><span class="o">.</span><span class="n">environ</span><span class="p">[</span><span class="s1">'SAMPLE_SIZE'</span><span class="p">],</span> <span class="n">n_repetitions</span><span class="o">=</span><span class="mi">100</span><span class="p">,</span> <span class="n">n_prevpoints</span><span class="o">=</span><span class="mi">21</span>
|
||||
<span class="p">)</span>
|
||||
<span class="c1"># method names can contain Latex syntax</span>
|
||||
<span class="n">method_name</span> <span class="o">=</span> <span class="s1">'CC$_{'</span> <span class="o">+</span> <span class="sa">f</span><span class="s1">'</span><span class="si">{</span><span class="nb">int</span><span class="p">(</span><span class="mi">100</span> <span class="o">*</span> <span class="n">training_prevalence</span><span class="p">)</span><span class="si">}</span><span class="s1">'</span> <span class="o">+</span> <span class="s1">'\%}$'</span>
|
||||
<span class="n">method_data</span><span class="o">.</span><span class="n">append</span><span class="p">((</span><span class="n">method_name</span><span class="p">,</span> <span class="n">true_prev</span><span class="p">,</span> <span class="n">estim_prev</span><span class="p">,</span> <span class="n">training</span><span class="o">.</span><span class="n">prevalence</span><span class="p">()))</span>
|
||||
<span class="c1"># since the problem is binary, it suffices to specify the negative prevalence, since the positive is constrained</span>
|
||||
<span class="n">train_sample</span> <span class="o">=</span> <span class="n">train</span><span class="o">.</span><span class="n">sampling</span><span class="p">(</span><span class="n">training_size</span><span class="p">,</span> <span class="mi">1</span><span class="o">-</span><span class="n">training_prevalence</span><span class="p">)</span>
|
||||
<span class="n">model</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">train_sample</span><span class="p">)</span>
|
||||
<span class="n">true_prev</span><span class="p">,</span> <span class="n">estim_prev</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">evaluation</span><span class="o">.</span><span class="n">prediction</span><span class="p">(</span><span class="n">model</span><span class="p">,</span> <span class="n">APP</span><span class="p">(</span><span class="n">test</span><span class="p">,</span> <span class="n">repeats</span><span class="o">=</span><span class="mi">100</span><span class="p">,</span> <span class="n">random_state</span><span class="o">=</span><span class="mi">0</span><span class="p">))</span>
|
||||
<span class="n">method_name</span> <span class="o">=</span> <span class="s1">'CC$_{'</span><span class="o">+</span><span class="sa">f</span><span class="s1">'</span><span class="si">{</span><span class="nb">int</span><span class="p">(</span><span class="mi">100</span><span class="o">*</span><span class="n">training_prevalence</span><span class="p">)</span><span class="si">}</span><span class="s1">'</span> <span class="o">+</span> <span class="s1">'\%}$'</span>
|
||||
<span class="n">method_data</span><span class="o">.</span><span class="n">append</span><span class="p">((</span><span class="n">method_name</span><span class="p">,</span> <span class="n">true_prev</span><span class="p">,</span> <span class="n">estim_prev</span><span class="p">,</span> <span class="n">train_sample</span><span class="o">.</span><span class="n">prevalence</span><span class="p">()))</span>
|
||||
|
||||
<span class="k">return</span> <span class="nb">zip</span><span class="p">(</span><span class="o">*</span><span class="n">method_data</span><span class="p">)</span>
|
||||
</pre></div>
@ -237,9 +237,9 @@ and a negative bias (or a tendency to underestimate) in cases of high prevalence
<p><img alt="diag plot on IMDb" src="_images/bin_diag_cc.png" /></p>
<p>showing pretty clearly the dependency of CC on the prior probabilities
of the labeled set it was trained on.</p>
</div>
<div class="section" id="error-by-drift">
<h2>Error by Drift<a class="headerlink" href="#error-by-drift" title="Permalink to this headline">¶</a></h2>
</section>
<section id="error-by-drift">
<h2>Error by Drift<a class="headerlink" href="#error-by-drift" title="Permalink to this heading">¶</a></h2>
<p>The plots discussed above are useful for analyzing and comparing
the performance of different quantification methods, but are
limited to the binary case. The “error by drift” is a plot
@ -270,8 +270,8 @@ In those cases, however, it is likely that the variances of each
method get higher, to the detriment of the visualization.
We recommend setting <em>show_std=False</em> in those cases
in order to hide the color bands.</p>
</div>
</div>
</section>
</section>


<div class="clearer"></div>
@ -280,8 +280,9 @@ in order to hide the color bands.</p>
</div>
<div class="sphinxsidebar" role="navigation" aria-label="main navigation">
<div class="sphinxsidebarwrapper">
<h3><a href="index.html">Table of Contents</a></h3>
<ul>
<div>
<h3><a href="index.html">Table of Contents</a></h3>
<ul>
<li><a class="reference internal" href="#">Plotting</a><ul>
<li><a class="reference internal" href="#diagonal-plot">Diagonal Plot</a></li>
<li><a class="reference internal" href="#quantification-bias">Quantification bias</a></li>
@ -290,12 +291,17 @@ in order to hide the color bands.</p>
</li>
</ul>

<h4>Previous topic</h4>
<p class="topless"><a href="Methods.html"
title="previous chapter">Quantification Methods</a></p>
<h4>Next topic</h4>
<p class="topless"><a href="modules.html"
title="next chapter">quapy</a></p>
</div>
<div>
<h4>Previous topic</h4>
<p class="topless"><a href="Model-Selection.html"
title="previous chapter">Model Selection</a></p>
</div>
<div>
<h4>Next topic</h4>
<p class="topless"><a href="modules.html"
title="next chapter">quapy</a></p>
</div>
<div role="note" aria-label="source link">
<h3>This Page</h3>
<ul class="this-page-menu">
@ -312,7 +318,7 @@ in order to hide the color bands.</p>
</form>
</div>
</div>
<script>$('#searchbox').show(0);</script>
<script>document.getElementById('searchbox').style.display = "block"</script>
</div>
</div>
<div class="clearer"></div>
@ -330,15 +336,15 @@ in order to hide the color bands.</p>
<a href="modules.html" title="quapy"
>next</a> |</li>
<li class="right" >
<a href="Methods.html" title="Quantification Methods"
<a href="Model-Selection.html" title="Model Selection"
>previous</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.6 documentation</a> »</li>
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.7 documentation</a> »</li>
<li class="nav-item nav-item-this"><a href="">Plotting</a></li>
</ul>
</div>
<div class="footer" role="contentinfo">
© Copyright 2021, Alejandro Moreo.
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 4.2.0.
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 5.3.0.
</div>
</body>
</html>
@ -30,7 +30,8 @@ Output the class prevalences (showing 2 digit precision):
[0.17, 0.50, 0.33]
```

One can easily produce new samples at desired class prevalences:
One can easily produce new samples at desired class prevalence values:

```python
sample_size = 10
prev = [0.4, 0.1, 0.5]
@ -63,32 +64,10 @@ for method in methods:
    ...
```

QuaPy also implements the artificial sampling protocol that produces (via a
Python generator) a series of _LabelledCollection_ objects with equidistant
prevalences ranging across the entire prevalence spectrum in the simplex space, e.g.:

```python
for sample in data.artificial_sampling_generator(sample_size=100, n_prevalences=5):
    print(F.strprev(sample.prevalence(), prec=2))
```

produces one sampling for each (valid) combination of prevalences originating from
splitting the range [0,1] into n_prevalences=5 points (i.e., [0, 0.25, 0.5, 0.75, 1]),
that is:
```
[0.00, 0.00, 1.00]
[0.00, 0.25, 0.75]
[0.00, 0.50, 0.50]
[0.00, 0.75, 0.25]
[0.00, 1.00, 0.00]
[0.25, 0.00, 0.75]
...
[1.00, 0.00, 0.00]
```

See the [Evaluation wiki](https://github.com/HLT-ISTI/QuaPy/wiki/Evaluation) for
further details on how to use the artificial sampling protocol to properly
evaluate a quantification method.
However, generating samples for evaluation purposes is tackled in QuaPy
by means of the evaluation protocols (see the dedicated entries in the Wiki
for [evaluation](https://github.com/HLT-ISTI/QuaPy/wiki/Evaluation) and
[protocols](https://github.com/HLT-ISTI/QuaPy/wiki/Protocols)).
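The grid-enumeration idea behind the equidistant-prevalence protocol illustrated above can be sketched with the standard library alone; this is an illustrative reconstruction (not QuaPy's implementation) of how the valid prevalence combinations arise:

```python
from itertools import product

def prevalence_grid(n_prevalences, n_classes):
    # all prevalence vectors on the grid {0, 1/(n-1), ..., 1} that sum to 1
    steps = n_prevalences - 1
    return [tuple(c / steps for c in combo)
            for combo in product(range(n_prevalences), repeat=n_classes)
            if sum(combo) == steps]

grid = prevalence_grid(n_prevalences=5, n_classes=3)
print(len(grid))  # 15 valid combinations
print(grid[0])    # (0.0, 0.0, 1.0)
```

The enumeration order matches the listing above, from [0.00, 0.00, 1.00] up to [1.00, 0.00, 0.00].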

## Reviews Datasets
@ -177,6 +156,8 @@ Some details can be found below:
| sst | 3 | 2971 | 1271 | 376132 | [0.261, 0.452, 0.288] | [0.207, 0.481, 0.312] | sparse |
| wa | 3 | 2184 | 936 | 248563 | [0.305, 0.414, 0.281] | [0.282, 0.446, 0.272] | sparse |
| wb | 3 | 4259 | 1823 | 404333 | [0.270, 0.392, 0.337] | [0.274, 0.392, 0.335] | sparse |


## UCI Machine Learning

A set of 32 datasets from the [UCI Machine Learning repository](https://archive.ics.uci.edu/ml/datasets.php)
@ -272,6 +253,46 @@ standard Pythons packages like gzip or zip. This file would need to be uncompres
OS-dependent software manually. Information on how to do it will be printed the first
time the dataset is invoked.

## LeQua Datasets

QuaPy also provides the datasets used for the LeQua competition.
In brief, there are 4 tasks (T1A, T1B, T2A, T2B) having to do with text quantification
problems. Tasks T1A and T1B provide documents in vector form, while T2A and T2B provide
raw documents instead.
Tasks T1A and T2A are binary sentiment quantification problems, while T1B and T2B
are multiclass quantification problems consisting of estimating the class prevalence
values of 28 different merchandise products.

Every task consists of a training set, a set of validation samples (for model selection)
and a set of test samples (for evaluation). QuaPy returns this data as a LabelledCollection
(training) and two generation protocols (for validation and test samples), as follows:

```python
training, val_generator, test_generator = fetch_lequa2022(task=task)
```

See `lequa2022_experiments.py` in the examples folder for further details on how to
carry out experiments using these datasets.

The datasets are downloaded only once, and stored for fast reuse.

Some statistics are summarized below:

| Dataset | classes | train size | validation samples | test samples | docs by sample | type |
|---------|:-------:|:----------:|:------------------:|:------------:|:----------------:|:--------:|
| T1A | 2 | 5000 | 1000 | 5000 | 250 | vector |
| T1B | 28 | 20000 | 1000 | 5000 | 1000 | vector |
| T2A | 2 | 5000 | 1000 | 5000 | 250 | text |
| T2B | 28 | 20000 | 1000 | 5000 | 1000 | text |

For further details on the datasets, we refer to the original
[paper](https://ceur-ws.org/Vol-3180/paper-146.pdf):

```
Esuli, A., Moreo, A., Sebastiani, F., & Sperduti, G. (2022).
A Detailed Overview of LeQua@CLEF 2022: Learning to Quantify.
```

## Adding Custom Datasets

QuaPy provides data loaders for simple formats dealing with
@ -312,12 +333,15 @@ e.g.:

```python
import quapy as qp

train_path = '../my_data/train.dat'
test_path = '../my_data/test.dat'

def my_custom_loader(path):
    with open(path, 'rb') as fin:
        ...
    return instances, labels

data = qp.data.Dataset.load(train_path, test_path, my_custom_loader)
```
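As a concrete illustration, a loader for a hypothetical `label<TAB>text` format might look as follows; the file format and the function name are assumptions for the example, not part of QuaPy's API:

```python
def tab_separated_loader(path):
    # hypothetical loader: one "<label>\t<text>" pair per line
    instances, labels = [], []
    with open(path, 'rt', encoding='utf-8') as fin:
        for line in fin:
            label, text = line.rstrip('\n').split('\t', 1)
            labels.append(label)
            instances.append(text)
    return instances, labels
```

Any function with this signature (path in, `(instances, labels)` out) can be plugged in as the loader above.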
@ -50,7 +50,7 @@ indicating the value for the smoothing parameter epsilon.
Traditionally, this value is set to 1/(2T) in past literature,
with T the sampling size. One could either pass this value
to the function each time, or set QuaPy's environment
variable _SAMPLE_SIZE_ once, and ommit this argument
variable _SAMPLE_SIZE_ once, and omit this argument
thereafter (recommended);
e.g.:
@ -58,7 +58,7 @@ e.g.:
qp.environ['SAMPLE_SIZE'] = 100 # once for all
true_prev = np.asarray([0.5, 0.3, 0.2]) # let's assume 3 classes
estim_prev = np.asarray([0.1, 0.3, 0.6])
error = qp.ae_.mrae(true_prev, estim_prev)
error = qp.error.mrae(true_prev, estim_prev)
print(f'mrae({true_prev}, {estim_prev}) = {error:.3f}')
```
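For intuition, the smoothed MRAE computation can be reproduced with the standard library alone. This is a sketch of the additive-smoothing scheme described above (with eps = 1/(2T)), not QuaPy's actual implementation:

```python
def smooth(prevs, eps):
    # additive smoothing: (p + eps) / (eps * n_classes + 1)
    denom = eps * len(prevs) + 1
    return [(p + eps) / denom for p in prevs]

def mrae(true_prev, estim_prev, eps):
    # mean relative absolute error over (smoothed) class prevalences
    t, e = smooth(true_prev, eps), smooth(estim_prev, eps)
    return sum(abs(eh - th) / th for th, eh in zip(t, e)) / len(t)

eps = 1 / (2 * 100)  # SAMPLE_SIZE = 100, so eps = 1/(2T) = 0.005
print(round(mrae([0.5, 0.3, 0.2], [0.1, 0.3, 0.6], eps), 3))  # 0.914
```

Without smoothing, a true prevalence of 0 would make the relative error blow up; the 1/(2T) term keeps every denominator strictly positive.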
@ -71,162 +71,99 @@ Finally, it is possible to instantiate QuaPy's quantification
error functions from strings using, e.g.:

```python
error_function = qp.ae_.from_name('mse')
error_function = qp.error.from_name('mse')
error = error_function(true_prev, estim_prev)
```

## Evaluation Protocols

QuaPy implements the so-called "artificial sampling protocol",
according to which a test set is used to generate samplings at
desired prevalences of fixed size and covering the full spectrum
of prevalences. This protocol is called "artificial" in contrast
to the "natural prevalence sampling" protocol that,
despite introducing some variability during sampling, approximately
preserves the training class prevalence.

In the artificial sampling protocol, the user specifies the number
of (equally distant) points to be generated from the interval [0,1].

For example, if n_prevpoints=11 then, for each class, the prevalences
[0., 0.1, 0.2, ..., 1.] will be used. This means that, for two classes,
the number of different prevalences will be 11 (since, once the prevalence
of one class is determined, the other one is constrained). For 3 classes,
the number of valid combinations can be obtained as 11 + 10 + ... + 1 = 66.
In general, the number of valid combinations that will be produced for a given
value of n_prevpoints can be consulted by invoking
quapy.functional.num_prevalence_combinations, e.g.:
An _evaluation protocol_ is an evaluation procedure that uses
one specific _sample generation protocol_ to generate many
samples, typically characterized by widely varying amounts of
_shift_ with respect to the original distribution, that are then
used to evaluate the performance of a (trained) quantifier.
These protocols are explained in more detail in a dedicated [entry
in the wiki](Protocols.md). For the time being, let us assume we have already
chosen and instantiated one specific such protocol, which we here
simply call _prot_. Let us also assume our model is called
_quantifier_ and that our evaluation measure of choice is
_mae_. The evaluation comes down to:

```python
import quapy.functional as F
n_prevpoints = 21
n_classes = 4
n = F.num_prevalence_combinations(n_prevpoints, n_classes, n_repeats=1)
mae = qp.evaluation.evaluate(quantifier, protocol=prot, error_metric='mae')
print(f'MAE = {mae:.4f}')
```

in this example, n=1771. Note the last argument, n_repeats, that
informs of the number of examples that will be generated for any
valid combination (typical values are, e.g., 1 for a single sample,
or 10 or higher for computing standard deviations or performing statistical
significance tests).
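The counts quoted above follow a stars-and-bars argument: for _n_prevpoints_ grid points and _C_ classes there are comb(n_prevpoints + C - 2, C - 1) prevalence vectors summing to 1, times _n_repeats_. A quick stdlib check (mirroring, as an assumption, what the library function computes) reproduces the numbers:

```python
from math import comb

def num_prevalence_combinations(n_prevpoints, n_classes, n_repeats=1):
    # stars and bars: weak compositions of (n_prevpoints - 1) into n_classes parts
    return comb(n_prevpoints + n_classes - 2, n_classes - 1) * n_repeats

print(num_prevalence_combinations(11, 2))  # 11
print(num_prevalence_combinations(11, 3))  # 66
print(num_prevalence_combinations(21, 4))  # 1771
print(num_prevalence_combinations(30, 4))  # 4960
```

The last value explains the budget example: 30 prevalence points is the largest setting whose evaluation count for 4 classes stays below 5000.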

One can instead work the other way around, i.e., one could set a
maximum budget of evaluations and get the number of prevalence points that
will generate a number of evaluations close to, but not higher than,
the fixed budget. This can be achieved with the function
quapy.functional.get_nprevpoints_approximation, e.g.:
It is often desirable to evaluate our system using more than one
single evaluation measure. In this case, it is convenient to generate
a _report_. A report in QuaPy is a dataframe accounting for all the
true prevalence values with their corresponding prevalence values
as estimated by the quantifier, along with the error each has given
rise to.

```python
budget = 5000
n_prevpoints = F.get_nprevpoints_approximation(budget, n_classes, n_repeats=1)
n = F.num_prevalence_combinations(n_prevpoints, n_classes, n_repeats=1)
print(f'by setting n_prevpoints={n_prevpoints} the number of evaluations for {n_classes} classes will be {n}')
```
that will print:
```
by setting n_prevpoints=30 the number of evaluations for 4 classes will be 4960
report = qp.evaluation.evaluation_report(quantifier, protocol=prot, error_metrics=['mae', 'mrae', 'mkld'])
```

The cost of evaluation will depend on the values of _n_prevpoints_, _n_classes_,
and _n_repeats_. Since it might sometimes be cumbersome to control the overall
cost of an experiment having to do with the number of combinations that
will be generated for a particular setting of these arguments (particularly
when _n_classes>2_), evaluation functions
typically allow the user to rather specify an _evaluation budget_, i.e., a maximum
number of samplings to generate. By specifying this argument, one can avoid
specifying _n_prevpoints_; the value that yields a number of evaluations
closest to, without surpassing, the budget will be set automatically.

The following script shows a full example in which a PACC model relying
on a Logistic Regression classifier is
tested on the _kindle_ dataset by means of the artificial prevalence
sampling protocol on samples of size 500, in terms of various
evaluation metrics.

````python
import quapy as qp
import quapy.functional as F
from sklearn.linear_model import LogisticRegression

qp.environ['SAMPLE_SIZE'] = 500

dataset = qp.datasets.fetch_reviews('kindle')
qp.data.preprocessing.text2tfidf(dataset, min_df=5, inplace=True)

training = dataset.training
test = dataset.test

lr = LogisticRegression()
pacc = qp.method.aggregative.PACC(lr)

pacc.fit(training)

df = qp.evaluation.artificial_sampling_report(
    pacc,  # the quantification method
    test,  # the test set on which the method will be evaluated
    sample_size=qp.environ['SAMPLE_SIZE'],  # indicates the size of samples to be drawn
    n_prevpoints=11,  # how many prevalence points will be extracted from the interval [0, 1] for each category
    n_repetitions=1,  # number of times each prevalence will be used to generate a test sample
    n_jobs=-1,  # indicates the number of parallel workers (-1 indicates, as in sklearn, all CPUs)
    random_seed=42,  # setting a random seed allows replicating the test samples across runs
    error_metrics=['mae', 'mrae', 'mkld'],  # specify the evaluation metrics
    verbose=True  # set to True to show some standard-line outputs
)
````

The resulting report is a pandas' dataframe that can be directly printed.
Here, we set some display options from pandas just to make the output clearer;
note also that the estimated prevalences are shown as strings using the
_strprev_ function, which simply converts a prevalence into a
string representing it, with a fixed decimal precision (default 3):
From a pandas' dataframe, it is straightforward to visualize all the results,
and compute the averaged values, e.g.:

```python
import pandas as pd
pd.set_option('display.expand_frame_repr', False)
pd.set_option("precision", 3)
df['estim-prev'] = df['estim-prev'].map(F.strprev)
print(df)
report['estim-prev'] = report['estim-prev'].map(F.strprev)
print(report)

print('Averaged values:')
print(report.mean())
```

The output should look like:
This will produce an output like:

```
true-prev estim-prev mae mrae mkld
0 [0.0, 1.0] [0.000, 1.000] 0.000 0.000 0.000e+00
1 [0.1, 0.9] [0.091, 0.909] 0.009 0.048 4.426e-04
2 [0.2, 0.8] [0.163, 0.837] 0.037 0.114 4.633e-03
3 [0.3, 0.7] [0.283, 0.717] 0.017 0.041 7.383e-04
4 [0.4, 0.6] [0.366, 0.634] 0.034 0.070 2.412e-03
5 [0.5, 0.5] [0.459, 0.541] 0.041 0.082 3.387e-03
6 [0.6, 0.4] [0.565, 0.435] 0.035 0.073 2.535e-03
7 [0.7, 0.3] [0.654, 0.346] 0.046 0.108 4.701e-03
8 [0.8, 0.2] [0.725, 0.275] 0.075 0.235 1.515e-02
9 [0.9, 0.1] [0.858, 0.142] 0.042 0.229 7.740e-03
10 [1.0, 0.0] [0.945, 0.055] 0.055 27.357 5.219e-02
true-prev estim-prev mae mrae mkld
0 [0.308, 0.692] [0.314, 0.686] 0.005649 0.013182 0.000074
1 [0.896, 0.104] [0.909, 0.091] 0.013145 0.069323 0.000985
2 [0.848, 0.152] [0.809, 0.191] 0.039063 0.149806 0.005175
3 [0.016, 0.984] [0.033, 0.967] 0.017236 0.487529 0.005298
4 [0.728, 0.272] [0.751, 0.249] 0.022769 0.057146 0.001350
... ... ... ... ... ...
4995 [0.72, 0.28] [0.698, 0.302] 0.021752 0.053631 0.001133
4996 [0.868, 0.132] [0.888, 0.112] 0.020490 0.088230 0.001985
4997 [0.292, 0.708] [0.298, 0.702] 0.006149 0.014788 0.000090
4998 [0.24, 0.76] [0.220, 0.780] 0.019950 0.054309 0.001127
4999 [0.948, 0.052] [0.965, 0.035] 0.016941 0.165776 0.003538

[5000 rows x 5 columns]
Averaged values:
mae 0.023588
mrae 0.108779
mkld 0.003631
dtype: float64

Process finished with exit code 0
```

One can get the averaged scores using standard pandas'
functions, i.e.:
Alternatively, we can simply generate all the predictions by:

```python
print(df.mean())
true_prevs, estim_prevs = qp.evaluation.prediction(quantifier, protocol=prot)
```

will produce the following output:

```
true-prev 0.500
mae 0.035
mrae 2.578
mkld 0.009
dtype: float64
```

Other evaluation functions include:

* _artificial_sampling_eval_: that computes the evaluation for a
given evaluation metric, returning the average instead of a dataframe.
* _artificial_sampling_prediction_: that returns two np.arrays containing the
true prevalences and the estimated prevalences.

See the documentation for further details.
All the evaluation functions implement specific optimizations for speeding up
the evaluation of aggregative quantifiers (i.e., of instances of _AggregativeQuantifier_).
The optimization comes down to generating classification predictions (either crisp or soft)
only once for the entire test set, and then applying the sampling procedure to the
predictions, instead of generating samples of instances and then computing the
classification predictions every time. This is only possible when the protocol
is an instance of _OnLabelledCollectionProtocol_. The optimization is only
carried out when the number of classification predictions thus generated would be
smaller than the number of predictions required for the entire protocol; e.g.,
if the original dataset contains 1M instances, but the protocol is such that it would
at most generate 20 samples of 100 instances, then it would be preferable to postpone the
classification for each sample. This behaviour is indicated by setting
_aggr_speedup="auto"_. Conversely, when indicating _aggr_speedup="force"_ QuaPy will
precompute all the predictions irrespective of the number of instances and number of samples.
Finally, this can be deactivated by setting _aggr_speedup=False_. Note that this optimization
is not only applied for the final evaluation, but also for the internal evaluations carried
out during _model selection_. Since these are typically many, the heuristic can substantially
reduce the overall execution time.
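The decision heuristic just described can be sketched as follows; this is an illustrative reconstruction of the rule stated above, not QuaPy's exact code:

```python
def should_precompute(n_instances, n_samples, sample_size, aggr_speedup="auto"):
    # precompute classifier predictions over the whole test collection only
    # when that costs fewer predictions than classifying every sampled instance
    if aggr_speedup == "force":
        return True
    if aggr_speedup == "auto":
        return n_instances < n_samples * sample_size
    return False  # aggr_speedup=False deactivates the optimization

# 1M instances but only 20 samples of 100 instances each:
# cheaper to classify each sample on demand, so do not precompute
print(should_precompute(1_000_000, 20, 100))  # False
```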
@ -16,12 +16,6 @@ and implement some abstract methods:

    @abstractmethod
    def quantify(self, instances): ...

    @abstractmethod
    def set_params(self, **parameters): ...

    @abstractmethod
    def get_params(self, deep=True): ...
```
The meaning of those functions should be familiar to those
used to working with scikit-learn, since the class structure of QuaPy
@ -32,10 +26,10 @@ scikit-learn' structure has not been adopted _as is_ in QuaPy responds to
the fact that scikit-learn's _predict_ function is expected to return
one output for each input element --e.g., a predicted label for each
instance in a sample-- while in quantification the output for a sample
is one single array of class prevalences), while functions _set_params_
and _get_params_ allow a
[model selector](https://github.com/HLT-ISTI/QuaPy/wiki/Model-Selection)
to automate the process of hyperparameter search.
is one single array of class prevalences).
Quantifiers also extend from scikit-learn's `BaseEstimator`, in order
to simplify the use of _set_params_ and _get_params_ as used in
[model selection](https://github.com/HLT-ISTI/QuaPy/wiki/Model-Selection).

## Aggregative Methods

@ -58,11 +52,11 @@ of _BaseQuantifier.quantify_ is already provided, which looks like:

```python
def quantify(self, instances):
    classif_predictions = self.preclassify(instances)
    classif_predictions = self.classify(instances)
    return self.aggregate(classif_predictions)
```
Aggregative quantifiers are expected to maintain a classifier (which is
accessed through the _@property_ _learner_). This classifier is
accessed through the _@property_ _classifier_). This classifier is
given as input to the quantifier, and can be already fit
on external data (in which case, the _fit_learner_ argument should
be set to False), or be fit by the quantifier's _fit_ method (default).
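Putting the pieces together, the classify-then-aggregate pattern can be illustrated with a toy, non-QuaPy classify & count quantifier; `DummyClassifier` below is a hypothetical stand-in for any trained classifier:

```python
from collections import Counter

class ToyCC:
    # minimal classify & count: quantify = aggregate(classify(instances))
    def __init__(self, classifier, classes):
        self.classifier = classifier
        self.classes = classes

    def classify(self, instances):
        return [self.classifier.predict(x) for x in instances]

    def aggregate(self, predictions):
        counts = Counter(predictions)
        n = len(predictions)
        return [counts[c] / n for c in self.classes]

    def quantify(self, instances):
        return self.aggregate(self.classify(instances))

class DummyClassifier:
    def predict(self, x):
        return 'pos' if x >= 0 else 'neg'

cc = ToyCC(DummyClassifier(), classes=['neg', 'pos'])
print(cc.quantify([-1.0, 0.5, 2.0, -0.3]))  # [0.5, 0.5]
```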

@ -73,13 +67,8 @@ _AggregativeProbabilisticQuantifier(AggregativeQuantifier)_.
The particularity of _probabilistic_ aggregative methods (w.r.t.
non-probabilistic ones) is that the default quantifier is defined
in terms of the posterior probabilities returned by a probabilistic
classifier, and not by the crisp decisions of a hard classifier; i.e.:

```python
def quantify(self, instances):
    classif_posteriors = self.posterior_probabilities(instances)
    return self.aggregate(classif_posteriors)
```
classifier, and not by the crisp decisions of a hard classifier.
In any case, the interface _classify(instances)_ remains unchanged.

One advantage of _aggregative_ methods (either probabilistic or not)
is that the evaluation according to any sampling procedure (e.g.,
@ -110,9 +99,7 @@ import quapy as qp
import quapy.functional as F
from sklearn.svm import LinearSVC

dataset = qp.datasets.fetch_twitter('hcr', pickle=True)
training = dataset.training
test = dataset.test
training, test = qp.datasets.fetch_twitter('hcr', pickle=True).train_test

# instantiate a classifier learner, in this case an SVM
svm = LinearSVC()
@ -156,11 +143,12 @@ model.fit(training, val_split=5)
```

The following code illustrates the case in which PCC is used:

```python
model = qp.method.aggregative.PCC(svm)
model.fit(training)
estim_prevalence = model.quantify(test.instances)
print('classifier:', model.learner)
print('classifier:', model.classifier)
```
In this case, QuaPy will print:
```
@ -211,14 +199,22 @@ model.fit(dataset.training)
estim_prevalence = model.quantify(dataset.test.instances)
```

_New in v0.1.7_: EMQ now accepts two new parameters in the construction method, namely
_exact_train_prev_, which makes it possible to use the true training prevalence as the departing
prevalence estimate (default behaviour), or instead an approximation of it, as
suggested by [Alexandari et al. (2020)](http://proceedings.mlr.press/v119/alexandari20a.html)
(by setting _exact_train_prev=False_).
The other parameter is _recalib_, which makes it possible to indicate a calibration method among those
proposed by [Alexandari et al. (2020)](http://proceedings.mlr.press/v119/alexandari20a.html),
including Bias-Corrected Temperature Scaling, Vector Scaling, etc.
See the API documentation for further details.
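For intuition, the core expectation-maximization loop behind EMQ can be reproduced in plain Python. This sketch takes precomputed posterior probabilities plus the training prevalence, and is an illustration of the EM idea, not QuaPy's implementation:

```python
def emq_aggregate(posteriors, train_prev, n_iter=1000):
    # EM: reweight each posterior by the ratio between the current prevalence
    # estimate and the training prevalence, renormalize, then re-average
    prev = list(train_prev)
    for _ in range(n_iter):
        weighted = []
        for post in posteriors:
            w = [p * q / tp for p, q, tp in zip(post, prev, train_prev)]
            s = sum(w)
            weighted.append([wi / s for wi in w])
        prev = [sum(col) / len(weighted) for col in zip(*weighted)]
    return prev

# every test instance looks 80% positive to a classifier trained at 50/50,
# so EM pushes the prevalence estimate towards the positive class
prev = emq_aggregate([[0.8, 0.2]] * 10, train_prev=[0.5, 0.5])
print(prev[0] > 0.99)  # True
```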


### Hellinger Distance y (HDy)

The method HDy is described in:

_Implementation of the method based on the Hellinger Distance y (HDy) proposed by
González-Castro, V., Alaiz-Rodrı́guez, R., and Alegre, E. (2013). Class distribution
estimation based on the Hellinger distance. Information Sciences, 218:146–164._
Implementation of the method based on the Hellinger Distance y (HDy) proposed by
[González-Castro, V., Alaiz-Rodrı́guez, R., and Alegre, E. (2013). Class distribution
estimation based on the Hellinger distance. Information Sciences, 218:146–164.](https://www.sciencedirect.com/science/article/pii/S0020025512004069)

It is implemented in _qp.method.aggregative.HDy_ (also accessible
through the alias _qp.method.aggregative.HellingerDistanceY_).
@ -249,30 +245,51 @@ model.fit(dataset.training)
|
|||
estim_prevalence = model.quantify(dataset.test.instances)
|
||||
```
|
||||
|
||||
_New in v0.1.7:_ QuaPy now provides an implementation of the generalized
|
||||
"Distribution Matching" approaches for multiclass, inspired by the framework
|
||||
of [Firat (2016)](https://arxiv.org/abs/1606.00868). One can instantiate
|
||||
a variant of HDy for multiclass quantification as follows:
|
||||
|
||||
```python
|
||||
multiclassHDy = qp.method.aggregative.DistributionMatching(classifier=LogisticRegression(), divergence='HD', cdf=False)
|
||||
```
|
||||
|
||||
_New in v0.1.7:_ QuaPy now provides an implementation of the "DyS"
|
||||
framework proposed by [Maletzke et al (2020)](https://ojs.aaai.org/index.php/AAAI/article/view/4376)
|
||||
and the "SMM" method proposed by [Hassan et al (2019)](https://ieeexplore.ieee.org/document/9260028)
|
||||
(thanks to _Pablo González_ for the contributions!)
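In essence, SMM summarizes each score distribution by its sample mean and solves the resulting mixture equation for the prevalence. A self-contained sketch (toy scores and a helper of our own, not QuaPy's implementation):

```python
def smm_estimate(pos_scores, neg_scores, test_scores):
    """Sample Mean Matching sketch: model the mean of the test scores as a
    mixture of the class-conditional score means, and solve for the
    positive-class prevalence."""
    mean = lambda xs: sum(xs) / len(xs)
    alpha = (mean(test_scores) - mean(neg_scores)) / (mean(pos_scores) - mean(neg_scores))
    return min(1.0, max(0.0, alpha))  # clip to a valid prevalence value

pos = [0.9, 0.8, 0.7]   # scores of positive validation examples (made up)
neg = [0.1, 0.2, 0.3]   # scores of negative validation examples (made up)
test = [0.8, 0.2, 0.9, 0.1]
print(smm_estimate(pos, neg, test))  # close to 0.5
```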
|
||||
|
||||
### Threshold Optimization methods
|
||||
|
||||
_New in v0.1.7:_ QuaPy now implements Forman's threshold optimization methods;
|
||||
see, e.g., [(Forman 2006)](https://dl.acm.org/doi/abs/10.1145/1150402.1150423)
|
||||
and [(Forman 2008)](https://link.springer.com/article/10.1007/s10618-008-0097-y).
|
||||
These include: T50, MAX, X, Median Sweep (MS), and its variant MS2.
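These methods all share Forman's adjusted-count correction and differ only in how the decision threshold is chosen (e.g., MAX picks the threshold maximizing tpr-fpr, while MS takes the median of the estimates over many thresholds). A minimal sketch of the shared correction (a helper of our own, not QuaPy's API):

```python
def adjusted_count(cc_prev, tpr, fpr):
    """Forman's adjusted count: correct the raw classify-and-count estimate
    cc_prev using the classifier's true and false positive rates."""
    if tpr == fpr:  # degenerate threshold: no information to correct with
        return cc_prev
    adjusted = (cc_prev - fpr) / (tpr - fpr)
    return min(1.0, max(0.0, adjusted))  # clip to a valid prevalence value

# a classifier predicting positive 40% of the time, with tpr=0.8 and fpr=0.2
print(adjusted_count(0.4, 0.8, 0.2))
```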
|
||||
|
||||
### Explicit Loss Minimization
|
||||
|
||||
Explicit Loss Minimization (ELM) represents a family of methods
|
||||
based on structured output learning, i.e., quantifiers relying on
|
||||
classifiers that have been optimized targeting a
|
||||
quantification-oriented evaluation measure.
|
||||
The original methods are implemented in QuaPy as classify & count (CC)
|
||||
quantifiers that use Joachim's [SVMperf](https://www.cs.cornell.edu/people/tj/svm_light/svm_perf.html)
|
||||
as the underlying classifier, properly set to optimize for the desired loss.
|
||||
|
||||
In QuaPy, the following methods, all relying on Joachim's
|
||||
[SVMperf](https://www.cs.cornell.edu/people/tj/svm_light/svm_perf.html)
|
||||
implementation, are available in _qp.method.aggregative_:
|
||||
In QuaPy, this can be achieved by calling the following functions:
|
||||
|
||||
* SVMQ (SVM-Q) is a quantification method optimizing the metric _Q_ defined
|
||||
in _Barranquero, J., Díez, J., and del Coz, J. J. (2015). Quantification-oriented learning based
|
||||
on reliable classifiers. Pattern Recognition, 48(2):591–604._
|
||||
* SVMKLD (SVM for Kullback-Leibler Divergence) proposed in _Esuli, A. and Sebastiani, F. (2015).
|
||||
* _newSVMQ_: returns the quantification method called SVM(Q) that optimizes for the metric _Q_ defined
|
||||
in [_Barranquero, J., Díez, J., and del Coz, J. J. (2015). Quantification-oriented learning based
|
||||
on reliable classifiers. Pattern Recognition, 48(2):591–604._](https://www.sciencedirect.com/science/article/pii/S003132031400291X)
|
||||
* _newSVMKLD_ and _newSVMNKLD_: return the quantification methods called SVM(KLD) and SVM(nKLD), which optimize for the
Kullback-Leibler Divergence and the Normalized Kullback-Leibler Divergence, respectively, as proposed in [_Esuli, A. and Sebastiani, F. (2015).
|
||||
Optimizing text quantifiers for multivariate loss functions.
|
||||
ACM Transactions on Knowledge Discovery and Data, 9(4):Article 27._
|
||||
* SVMNKLD (SVM for Normalized Kullback-Leibler Divergence) proposed in _Esuli, A. and Sebastiani, F. (2015).
|
||||
Optimizing text quantifiers for multivariate loss functions.
|
||||
ACM Transactions on Knowledge Discovery and Data, 9(4):Article 27._
|
||||
* SVMAE (SVM for Mean Absolute Error)
|
||||
* SVMRAE (SVM for Mean Relative Absolute Error)
|
||||
ACM Transactions on Knowledge Discovery and Data, 9(4):Article 27._](https://dl.acm.org/doi/abs/10.1145/2700406)
|
||||
* _newSVMAE_ and _newSVMRAE_: return the quantification methods called SVM(AE) and SVM(RAE), which optimize for the (Mean) Absolute Error and the
(Mean) Relative Absolute Error, respectively, as first used by
|
||||
[_Moreo, A. and Sebastiani, F. (2021). Tweet sentiment quantification: An experimental re-evaluation. PLOS ONE 17 (9), 1-23._](https://arxiv.org/abs/2011.02552)
|
||||
|
||||
the last two methods (SVMAE and SVMRAE) have been implemented in
|
||||
The last two methods (SVM(AE) and SVM(RAE)) have been implemented in
|
||||
QuaPy in order to make available ELM variants for what nowadays
|
||||
are considered the most well-behaved evaluation metrics in quantification.
|
||||
|
||||
|
|
@ -306,13 +323,18 @@ currently supports only binary classification.
|
|||
ELM variants (any binary quantifier in general) can be extended
|
||||
to operate in single-label scenarios trivially by adopting a
|
||||
"one-vs-all" strategy (as, e.g., in
|
||||
_Gao, W. and Sebastiani, F. (2016). From classification to quantification in tweet sentiment
|
||||
analysis. Social Network Analysis and Mining, 6(19):1–22_).
|
||||
In QuaPy this is possible by using the _OneVsAll_ class:
|
||||
[_Gao, W. and Sebastiani, F. (2016). From classification to quantification in tweet sentiment
|
||||
analysis. Social Network Analysis and Mining, 6(19):1–22_](https://link.springer.com/article/10.1007/s13278-016-0327-z)).
|
||||
In QuaPy this is possible by using the _OneVsAll_ class.
|
||||
|
||||
There are two ways of instantiating this class: _OneVsAllGeneric_, which works for
any quantifier, and _OneVsAllAggregative_, which is optimized for aggregative quantifiers.
|
||||
In general, you can simply use the _getOneVsAll_ function and QuaPy will choose
|
||||
the more convenient of the two.
|
||||
|
||||
```python
|
||||
import quapy as qp
|
||||
from quapy.method.aggregative import SVMQ, OneVsAll
|
||||
from quapy.method.aggregative import SVMQ
|
||||
|
||||
# load a single-label dataset (this one contains 3 classes)
|
||||
dataset = qp.datasets.fetch_twitter('hcr', pickle=True)
|
||||
|
|
@ -320,11 +342,14 @@ dataset = qp.datasets.fetch_twitter('hcr', pickle=True)
|
|||
# let qp know where svmperf is
|
||||
qp.environ['SVMPERF_HOME'] = '../svm_perf_quantification'
|
||||
|
||||
model = OneVsAll(SVMQ(), n_jobs=-1) # run them on parallel
|
||||
model = getOneVsAll(SVMQ(), n_jobs=-1)  # run them in parallel
|
||||
model.fit(dataset.training)
|
||||
estim_prevalence = model.quantify(dataset.test.instances)
|
||||
```
|
||||
|
||||
Check the examples _[explicit_loss_minimization.py](..%2Fexamples%2Fexplicit_loss_minimization.py)_
|
||||
and [one_vs_all.py](..%2Fexamples%2Fone_vs_all.py) for more details.
|
||||
|
||||
## Meta Models
|
||||
|
||||
By _meta_ models we mean quantification methods that are defined on top of other
|
||||
|
|
@ -337,12 +362,12 @@ _Meta_ models are implemented in the _qp.method.meta_ module.
|
|||
|
||||
QuaPy implements (some of) the variants proposed in:
|
||||
|
||||
* _Pérez-Gállego, P., Quevedo, J. R., & del Coz, J. J. (2017).
|
||||
* [_Pérez-Gállego, P., Quevedo, J. R., & del Coz, J. J. (2017).
|
||||
Using ensembles for problems with characterizable changes in data distribution: A case study on quantification.
|
||||
Information Fusion, 34, 87-100._
|
||||
* _Pérez-Gállego, P., Castano, A., Quevedo, J. R., & del Coz, J. J. (2019).
|
||||
Information Fusion, 34, 87-100._](https://www.sciencedirect.com/science/article/pii/S1566253516300628)
|
||||
* [_Pérez-Gállego, P., Castano, A., Quevedo, J. R., & del Coz, J. J. (2019).
|
||||
Dynamic ensemble selection for quantification tasks.
|
||||
Information Fusion, 45, 1-15._
|
||||
Information Fusion, 45, 1-15._](https://www.sciencedirect.com/science/article/pii/S1566253517303652)
|
||||
|
||||
The following code shows how to instantiate an Ensemble of 30 _Adjusted Classify & Count_ (ACC)
|
||||
quantifiers operating with a _Logistic Regressor_ (LR) as the base classifier, and using the
|
||||
|
|
@ -378,10 +403,10 @@ wiki if you want to optimize the hyperparameters of ensemble for classification
|
|||
|
||||
QuaPy offers an implementation of QuaNet, a deep learning model presented in:
|
||||
|
||||
_Esuli, A., Moreo, A., & Sebastiani, F. (2018, October).
|
||||
[_Esuli, A., Moreo, A., & Sebastiani, F. (2018, October).
|
||||
A recurrent neural network for sentiment quantification.
|
||||
In Proceedings of the 27th ACM International Conference on
|
||||
Information and Knowledge Management (pp. 1775-1778)._
|
||||
Information and Knowledge Management (pp. 1775-1778)._](https://dl.acm.org/doi/abs/10.1145/3269206.3269287)
|
||||
|
||||
This model requires _torch_ to be installed.
|
||||
QuaNet also requires a classifier that can provide embedded representations
|
||||
|
|
@ -406,7 +431,8 @@ cnn = CNNnet(dataset.vocabulary_size, dataset.n_classes)
|
|||
learner = NeuralClassifierTrainer(cnn, device='cuda')
|
||||
|
||||
# train QuaNet
|
||||
model = QuaNet(learner, qp.environ['SAMPLE_SIZE'], device='cuda')
|
||||
model = QuaNet(learner, device='cuda')
|
||||
model.fit(dataset.training)
|
||||
estim_prevalence = model.quantify(dataset.test.instances)
|
||||
```
|
||||
|
||||
|
|
|
|||
|
|
@ -22,9 +22,9 @@ Quantification has long been regarded as an add-on of
|
|||
classification, and thus the model selection strategies
|
||||
customarily adopted in classification have simply been
|
||||
applied to quantification (see the next section).
|
||||
It has been argued in _Moreo, Alejandro, and Fabrizio Sebastiani.
|
||||
"Re-Assessing the" Classify and Count" Quantification Method."
|
||||
arXiv preprint arXiv:2011.02552 (2020)._
|
||||
It has been argued in [Moreo, Alejandro, and Fabrizio Sebastiani.
|
||||
Re-Assessing the "Classify and Count" Quantification Method.
|
||||
ECIR 2021: Advances in Information Retrieval pp 75–91.](https://link.springer.com/chapter/10.1007/978-3-030-72240-1_6)
|
||||
that specific model selection strategies should
|
||||
be adopted for quantification. That is, model selection
|
||||
strategies for quantification should target
|
||||
|
|
@ -32,76 +32,86 @@ quantification-oriented losses and be tested in a variety
|
|||
of scenarios exhibiting different degrees of prior
|
||||
probability shift.
|
||||
|
||||
The class
|
||||
_qp.model_selection.GridSearchQ_
|
||||
implements a grid-search exploration over the space of
|
||||
hyper-parameter combinations that evaluates each
|
||||
combination of hyper-parameters
|
||||
by means of a given quantification-oriented
|
||||
The class _qp.model_selection.GridSearchQ_ implements a grid-search exploration over the space of
|
||||
hyper-parameter combinations that [evaluates](https://github.com/HLT-ISTI/QuaPy/wiki/Evaluation)
|
||||
each combination of hyper-parameters by means of a given quantification-oriented
|
||||
error metric (e.g., any of the error functions implemented
|
||||
in _qp.error_) and according to the
|
||||
[_artificial sampling protocol_](https://github.com/HLT-ISTI/QuaPy/wiki/Evaluation).
|
||||
in _qp.error_) and according to a
|
||||
[sampling generation protocol](https://github.com/HLT-ISTI/QuaPy/wiki/Protocols).
|
||||
|
||||
The following is an example of model selection for quantification:
|
||||
The following is an example (also included in the examples folder) of model selection for quantification:
|
||||
|
||||
```python
|
||||
import quapy as qp
|
||||
from quapy.method.aggregative import PCC
|
||||
from quapy.protocol import APP
|
||||
from quapy.method.aggregative import DistributionMatching
|
||||
from sklearn.linear_model import LogisticRegression
|
||||
import numpy as np
|
||||
|
||||
# set a seed to replicate runs
|
||||
np.random.seed(0)
|
||||
qp.environ['SAMPLE_SIZE'] = 500
|
||||
"""
|
||||
In this example, we show how to perform model selection on a DistributionMatching quantifier.
|
||||
"""
|
||||
|
||||
dataset = qp.datasets.fetch_reviews('hp', tfidf=True, min_df=5)
|
||||
model = DistributionMatching(LogisticRegression())
|
||||
|
||||
qp.environ['SAMPLE_SIZE'] = 100
|
||||
qp.environ['N_JOBS'] = -1 # explore hyper-parameters in parallel
|
||||
|
||||
training, test = qp.datasets.fetch_reviews('imdb', tfidf=True, min_df=5).train_test
|
||||
|
||||
# The model will be returned by the fit method of GridSearchQ.
|
||||
# Model selection will be performed with a fixed budget of 1000 evaluations
|
||||
# for each hyper-parameter combination. The error to optimize is the MAE for
|
||||
# quantification, as evaluated on artificially drawn samples at prevalences
|
||||
# covering the entire spectrum on a held-out portion (40%) of the training set.
|
||||
# Every combination of hyper-parameters will be evaluated by confronting the
|
||||
# quantifier thus configured against a series of samples generated by means
|
||||
# of a sample generation protocol. For this example, we will use the
|
||||
# artificial-prevalence protocol (APP), which generates samples with prevalence
# values taken from a grid (e.g., [0, 0.1, 0.2, ..., 1]).
# We devote 30% of the dataset to this exploration.
|
||||
training, validation = training.split_stratified(train_prop=0.7)
|
||||
protocol = APP(validation)
|
||||
|
||||
# We will explore a classification-dependent hyper-parameter (e.g., the 'C'
|
||||
# hyper-parameter of LogisticRegression) and a quantification-dependent hyper-parameter
|
||||
# (e.g., the number of bins in a DistributionMatching quantifier).
|
||||
# Classifier-dependent hyper-parameters have to be marked with a prefix "classifier__"
|
||||
# in order to let the quantifier know this hyper-parameter belongs to its underlying
|
||||
# classifier.
|
||||
param_grid = {
|
||||
'classifier__C': np.logspace(-3,3,7),
|
||||
'nbins': [8, 16, 32, 64],
|
||||
}
|
||||
|
||||
model = qp.model_selection.GridSearchQ(
|
||||
model=PCC(LogisticRegression()),
|
||||
param_grid={'C': np.logspace(-4,5,10), 'class_weight': ['balanced', None]},
|
||||
sample_size=qp.environ['SAMPLE_SIZE'],
|
||||
eval_budget=1000,
|
||||
error='mae',
|
||||
refit=True, # retrain on the whole labelled set
|
||||
val_split=0.4,
|
||||
model=model,
|
||||
param_grid=param_grid,
|
||||
protocol=protocol,
|
||||
error='mae', # the error to optimize is the MAE (a quantification-oriented loss)
|
||||
refit=True, # retrain on the whole labelled set once done
|
||||
verbose=True # show information as the process goes on
|
||||
).fit(dataset.training)
|
||||
).fit(training)
|
||||
|
||||
print(f'model selection ended: best hyper-parameters={model.best_params_}')
|
||||
model = model.best_model_
|
||||
|
||||
# evaluation in terms of MAE
|
||||
results = qp.evaluation.artificial_sampling_eval(
|
||||
model,
|
||||
dataset.test,
|
||||
sample_size=qp.environ['SAMPLE_SIZE'],
|
||||
n_prevpoints=101,
|
||||
n_repetitions=10,
|
||||
error_metric='mae'
|
||||
)
|
||||
# we use the same evaluation protocol (APP) on the test set
|
||||
mae_score = qp.evaluation.evaluate(model, protocol=APP(test), error_metric='mae')
|
||||
|
||||
print(f'MAE={results:.5f}')
|
||||
print(f'MAE={mae_score:.5f}')
|
||||
```
|
||||
|
||||
In this example, the system outputs:
|
||||
```
|
||||
[GridSearchQ]: starting optimization with n_jobs=1
|
||||
[GridSearchQ]: checking hyperparams={'C': 0.0001, 'class_weight': 'balanced'} got mae score 0.24987
|
||||
[GridSearchQ]: checking hyperparams={'C': 0.0001, 'class_weight': None} got mae score 0.48135
|
||||
[GridSearchQ]: checking hyperparams={'C': 0.001, 'class_weight': 'balanced'} got mae score 0.24866
|
||||
[GridSearchQ]: starting model selection with self.n_jobs =-1
|
||||
[GridSearchQ]: hyperparams={'classifier__C': 0.01, 'nbins': 64} got mae score 0.04021 [took 1.1356s]
|
||||
[GridSearchQ]: hyperparams={'classifier__C': 0.01, 'nbins': 32} got mae score 0.04286 [took 1.2139s]
|
||||
[GridSearchQ]: hyperparams={'classifier__C': 0.01, 'nbins': 16} got mae score 0.04888 [took 1.2491s]
|
||||
[GridSearchQ]: hyperparams={'classifier__C': 0.001, 'nbins': 8} got mae score 0.05163 [took 1.5372s]
|
||||
[...]
|
||||
[GridSearchQ]: checking hyperparams={'C': 100000.0, 'class_weight': None} got mae score 0.43676
|
||||
[GridSearchQ]: optimization finished: best params {'C': 0.1, 'class_weight': 'balanced'} (score=0.19982)
|
||||
[GridSearchQ]: hyperparams={'classifier__C': 1000.0, 'nbins': 32} got mae score 0.02445 [took 2.9056s]
|
||||
[GridSearchQ]: optimization finished: best params {'classifier__C': 100.0, 'nbins': 32} (score=0.02234) [took 7.3114s]
|
||||
[GridSearchQ]: refitting on the whole development set
|
||||
model selection ended: best hyper-parameters={'C': 0.1, 'class_weight': 'balanced'}
|
||||
1010 evaluations will be performed for each combination of hyper-parameters
|
||||
[artificial sampling protocol] generating predictions: 100%|██████████| 1010/1010 [00:00<00:00, 5005.54it/s]
|
||||
MAE=0.20342
|
||||
model selection ended: best hyper-parameters={'classifier__C': 100.0, 'nbins': 32}
|
||||
MAE=0.03102
|
||||
```
|
||||
|
||||
The parameter _val_split_ can alternatively be used to indicate
|
||||
|
|
@ -121,39 +131,20 @@ quantification literature have opted for this strategy.
|
|||
|
||||
In QuaPy, this is achieved by simply instantiating the
|
||||
classifier learner as a GridSearchCV from scikit-learn.
|
||||
The following code illustrates how to do that:
|
||||
|
||||
|
||||
```python
|
||||
learner = GridSearchCV(
|
||||
LogisticRegression(),
|
||||
param_grid={'C': np.logspace(-4, 5, 10), 'class_weight': ['balanced', None]},
|
||||
cv=5)
|
||||
model = PCC(learner).fit(dataset.training)
|
||||
print(f'model selection ended: best hyper-parameters={model.learner.best_params_}')
|
||||
model = DistributionMatching(learner).fit(dataset.training)
|
||||
```
|
||||
|
||||
In this example, the system outputs:
|
||||
```
|
||||
model selection ended: best hyper-parameters={'C': 10000.0, 'class_weight': None}
|
||||
1010 evaluations will be performed for each combination of hyper-parameters
|
||||
[artificial sampling protocol] generating predictions: 100%|██████████| 1010/1010 [00:00<00:00, 5379.55it/s]
|
||||
MAE=0.41734
|
||||
```
|
||||
|
||||
Note that the MAE is worse than the one we obtained when optimizing
|
||||
for quantification and, indeed, the hyper-parameters found optimal
|
||||
largely differ between the two selection modalities. The
|
||||
hyper-parameters C=10000 and class_weight=None have been found
|
||||
to work well for the specific training prevalence of the HP dataset,
|
||||
but these hyper-parameters turned out to be suboptimal when the
|
||||
class prevalences of the test set differ (as is indeed tested
|
||||
in scenarios of quantification).
|
||||
|
||||
This is, however, not always the case, and one could, in practice,
|
||||
find examples
|
||||
in which optimizing for classification ends up resulting in a better
|
||||
quantifier than when optimizing for quantification.
|
||||
Nonetheless, this is theoretically unlikely to happen.
|
||||
However, this is conceptually flawed, since the model should be
|
||||
optimized for the task at hand (quantification), and not for a surrogate task (classification),
|
||||
i.e., the model should be requested to deliver low quantification errors, rather
|
||||
than low classification errors.
|
||||
|
||||
|
||||
|
||||
|
|
|
|||
|
|
@ -43,7 +43,7 @@ quantification methods across different scenarios showcasing
|
|||
the accuracy of the quantifier in predicting class prevalences
|
||||
for a wide range of prior distributions. This can easily be
|
||||
achieved by means of the
|
||||
[artificial sampling protocol](https://github.com/HLT-ISTI/QuaPy/wiki/Evaluation)
|
||||
[artificial sampling protocol](https://github.com/HLT-ISTI/QuaPy/wiki/Protocols)
|
||||
that is implemented in QuaPy.
|
||||
|
||||
The following code shows how to perform one simple experiment
|
||||
|
|
@ -55,6 +55,7 @@ generating 100 random samples at each prevalence).
|
|||
|
||||
```python
|
||||
import quapy as qp
|
||||
from quapy.protocol import APP
|
||||
from quapy.method.aggregative import CC, ACC, PCC, PACC
|
||||
from sklearn.svm import LinearSVC
|
||||
|
||||
|
|
@ -63,28 +64,26 @@ qp.environ['SAMPLE_SIZE'] = 500
|
|||
def gen_data():
|
||||
|
||||
def base_classifier():
|
||||
return LinearSVC()
|
||||
return LinearSVC(class_weight='balanced')
|
||||
|
||||
def models():
|
||||
yield CC(base_classifier())
|
||||
yield ACC(base_classifier())
|
||||
yield PCC(base_classifier())
|
||||
yield PACC(base_classifier())
|
||||
yield 'CC', CC(base_classifier())
|
||||
yield 'ACC', ACC(base_classifier())
|
||||
yield 'PCC', PCC(base_classifier())
|
||||
yield 'PACC', PACC(base_classifier())
|
||||
|
||||
data = qp.datasets.fetch_reviews('kindle', tfidf=True, min_df=5)
|
||||
train, test = qp.datasets.fetch_reviews('kindle', tfidf=True, min_df=5).train_test
|
||||
|
||||
method_names, true_prevs, estim_prevs, tr_prevs = [], [], [], []
|
||||
|
||||
for model in models():
|
||||
model.fit(data.training)
|
||||
true_prev, estim_prev = qp.evaluation.artificial_sampling_prediction(
|
||||
model, data.test, qp.environ['SAMPLE_SIZE'], n_repetitions=100, n_prevpoints=21
|
||||
)
|
||||
for method_name, model in models():
|
||||
model.fit(train)
|
||||
true_prev, estim_prev = qp.evaluation.prediction(model, APP(test, repeats=100, random_state=0))
|
||||
|
||||
method_names.append(model.__class__.__name__)
|
||||
method_names.append(method_name)
|
||||
true_prevs.append(true_prev)
|
||||
estim_prevs.append(estim_prev)
|
||||
tr_prevs.append(data.training.prevalence())
|
||||
tr_prevs.append(train.prevalence())
|
||||
|
||||
return method_names, true_prevs, estim_prevs, tr_prevs
|
||||
|
||||
|
|
@ -163,21 +162,19 @@ like this:
|
|||
|
||||
```python
|
||||
def gen_data():
|
||||
data = qp.datasets.fetch_reviews('imdb', tfidf=True, min_df=5)
|
||||
|
||||
train, test = qp.datasets.fetch_reviews('imdb', tfidf=True, min_df=5).train_test
|
||||
model = CC(LinearSVC())
|
||||
|
||||
method_data = []
|
||||
for training_prevalence in np.linspace(0.1, 0.9, 9):
|
||||
training_size = 5000
|
||||
# since the problem is binary, it suffices to specify the negative prevalence (the positive is constrained)
|
||||
training = data.training.sampling(training_size, 1 - training_prevalence)
|
||||
model.fit(training)
|
||||
true_prev, estim_prev = qp.evaluation.artificial_sampling_prediction(
|
||||
model, data.sample, qp.environ['SAMPLE_SIZE'], n_repetitions=100, n_prevpoints=21
|
||||
)
|
||||
# method names can contain Latex syntax
|
||||
method_name = 'CC$_{' + f'{int(100 * training_prevalence)}' + '\%}$'
|
||||
method_data.append((method_name, true_prev, estim_prev, training.prevalence()))
|
||||
# since the problem is binary, it suffices to specify the negative prevalence, since the positive is constrained
|
||||
train_sample = train.sampling(training_size, 1-training_prevalence)
|
||||
model.fit(train_sample)
|
||||
true_prev, estim_prev = qp.evaluation.prediction(model, APP(test, repeats=100, random_state=0))
|
||||
method_name = 'CC$_{' + f'{int(100*training_prevalence)}' + r'\%}$'
|
||||
method_data.append((method_name, true_prev, estim_prev, train_sample.prevalence()))
|
||||
|
||||
return zip(*method_data)
|
||||
```
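The _sampling_ call used above draws a fixed-size sample at the requested prevalence; conceptually, it behaves like the following numpy sketch (our own illustration, not QuaPy's actual implementation):

```python
import numpy as np

def sample_at_prevalence(labels, size, neg_prev, seed=0):
    """Draw a random sample of the given size in which the negative
    class (label 0) appears with prevalence neg_prev."""
    rng = np.random.default_rng(seed)
    neg_idx = np.flatnonzero(labels == 0)
    pos_idx = np.flatnonzero(labels == 1)
    n_neg = int(round(size * neg_prev))
    chosen = np.concatenate([
        rng.choice(neg_idx, n_neg, replace=True),
        rng.choice(pos_idx, size - n_neg, replace=True),
    ])
    rng.shuffle(chosen)
    return chosen

labels = np.asarray([0] * 50 + [1] * 50)
idx = sample_at_prevalence(labels, size=20, neg_prev=0.25)
print(labels[idx].mean())  # positive prevalence of the sample: 0.75
```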
|
||||
|
|
|
|||
|
|
@ -64,6 +64,7 @@ Features
|
|||
* 32 UCI Machine Learning datasets.
|
||||
* 11 Twitter Sentiment datasets.
|
||||
* 3 Reviews Sentiment datasets.
|
||||
* 4 tasks from LeQua competition (_new in v0.1.7!_)
|
||||
* Native support for binary and single-label scenarios of quantification.
|
||||
* Model selection functionality targeting quantification-oriented losses.
|
||||
* Visualization tools for analysing results.
|
||||
|
|
@ -75,8 +76,9 @@ Features
|
|||
Installation
|
||||
Datasets
|
||||
Evaluation
|
||||
Protocols
|
||||
Methods
|
||||
Model Selection
|
||||
Model-Selection
|
||||
Plotting
|
||||
API Developers documentation <modules>
|
||||
|
||||
|
|
|
|||
|
|
@ -1,27 +1,38 @@
|
|||
:tocdepth: 2
|
||||
|
||||
quapy.classification package
|
||||
============================
|
||||
|
||||
Submodules
|
||||
----------
|
||||
|
||||
quapy.classification.methods module
|
||||
-----------------------------------
|
||||
quapy.classification.calibration
|
||||
--------------------------------
|
||||
|
||||
.. versionadded:: 0.1.7
|
||||
.. automodule:: quapy.classification.calibration
|
||||
:members:
|
||||
:undoc-members:
|
||||
:show-inheritance:
|
||||
|
||||
quapy.classification.methods
|
||||
----------------------------
|
||||
|
||||
.. automodule:: quapy.classification.methods
|
||||
:members:
|
||||
:undoc-members:
|
||||
:show-inheritance:
|
||||
|
||||
quapy.classification.neural module
|
||||
----------------------------------
|
||||
quapy.classification.neural
|
||||
---------------------------
|
||||
|
||||
.. automodule:: quapy.classification.neural
|
||||
:members:
|
||||
:undoc-members:
|
||||
:show-inheritance:
|
||||
|
||||
quapy.classification.svmperf module
|
||||
-----------------------------------
|
||||
quapy.classification.svmperf
|
||||
----------------------------
|
||||
|
||||
.. automodule:: quapy.classification.svmperf
|
||||
:members:
|
||||
|
|
|
|||
|
|
@ -1,35 +1,37 @@
|
|||
:tocdepth: 2
|
||||
|
||||
quapy.data package
|
||||
==================
|
||||
|
||||
Submodules
|
||||
----------
|
||||
|
||||
quapy.data.base module
|
||||
----------------------
|
||||
quapy.data.base
|
||||
---------------
|
||||
|
||||
.. automodule:: quapy.data.base
|
||||
:members:
|
||||
:undoc-members:
|
||||
:show-inheritance:
|
||||
|
||||
quapy.data.datasets module
|
||||
--------------------------
|
||||
quapy.data.datasets
|
||||
-------------------
|
||||
|
||||
.. automodule:: quapy.data.datasets
|
||||
:members:
|
||||
:undoc-members:
|
||||
:show-inheritance:
|
||||
|
||||
quapy.data.preprocessing module
|
||||
-------------------------------
|
||||
quapy.data.preprocessing
|
||||
------------------------
|
||||
|
||||
.. automodule:: quapy.data.preprocessing
|
||||
:members:
|
||||
:undoc-members:
|
||||
:show-inheritance:
|
||||
|
||||
quapy.data.reader module
|
||||
------------------------
|
||||
quapy.data.reader
|
||||
-----------------
|
||||
|
||||
.. automodule:: quapy.data.reader
|
||||
:members:
|
||||
|
|
|
|||
|
|
@ -1,43 +1,45 @@
|
|||
:tocdepth: 2
|
||||
|
||||
quapy.method package
|
||||
====================
|
||||
|
||||
Submodules
|
||||
----------
|
||||
|
||||
quapy.method.aggregative module
|
||||
-------------------------------
|
||||
quapy.method.aggregative
|
||||
------------------------
|
||||
|
||||
.. automodule:: quapy.method.aggregative
|
||||
:members:
|
||||
:undoc-members:
|
||||
:show-inheritance:
|
||||
|
||||
quapy.method.base module
|
||||
------------------------
|
||||
quapy.method.base
|
||||
-----------------
|
||||
|
||||
.. automodule:: quapy.method.base
|
||||
:members:
|
||||
:undoc-members:
|
||||
:show-inheritance:
|
||||
|
||||
quapy.method.meta module
|
||||
------------------------
|
||||
quapy.method.meta
|
||||
-----------------
|
||||
|
||||
.. automodule:: quapy.method.meta
|
||||
:members:
|
||||
:undoc-members:
|
||||
:show-inheritance:
|
||||
|
||||
quapy.method.neural module
|
||||
--------------------------
|
||||
quapy.method.neural
|
||||
-------------------
|
||||
|
||||
.. automodule:: quapy.method.neural
|
||||
:members:
|
||||
:undoc-members:
|
||||
:show-inheritance:
|
||||
|
||||
quapy.method.non\_aggregative module
|
||||
------------------------------------
|
||||
quapy.method.non\_aggregative
|
||||
-----------------------------
|
||||
|
||||
.. automodule:: quapy.method.non_aggregative
|
||||
:members:
|
||||
|
|
|
|||
|
|
@ -1,68 +1,79 @@
|
|||
:tocdepth: 2
|
||||
|
||||
quapy package
|
||||
=============
|
||||
|
||||
Subpackages
|
||||
-----------
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 4
|
||||
|
||||
quapy.classification
|
||||
quapy.data
|
||||
quapy.method
|
||||
quapy.tests
|
||||
|
||||
Submodules
|
||||
----------
|
||||
|
||||
quapy.error module
|
||||
------------------
|
||||
quapy.error
|
||||
-----------
|
||||
|
||||
.. automodule:: quapy.error
|
||||
:members:
|
||||
:undoc-members:
|
||||
:show-inheritance:
|
||||
|
||||
quapy.evaluation module
|
||||
-----------------------
|
||||
quapy.evaluation
|
||||
----------------
|
||||
|
||||
.. automodule:: quapy.evaluation
|
||||
:members:
|
||||
:undoc-members:
|
||||
:show-inheritance:
|
||||
|
||||
quapy.functional module
|
||||
-----------------------
|
||||
quapy.protocol
|
||||
--------------
|
||||
|
||||
.. versionadded:: 0.1.7
|
||||
.. automodule:: quapy.protocol
|
||||
:members:
|
||||
:undoc-members:
|
||||
:show-inheritance:
|
||||
|
||||
quapy.functional
|
||||
----------------
|
||||
|
||||
.. automodule:: quapy.functional
|
||||
:members:
|
||||
:undoc-members:
|
||||
:show-inheritance:
|
||||
|
||||
quapy.model\_selection module
|
||||
-----------------------------
|
||||
quapy.model\_selection
|
||||
----------------------
|
||||
|
||||
.. automodule:: quapy.model_selection
|
||||
:members:
|
||||
:undoc-members:
|
||||
:show-inheritance:
|
||||
|
||||
quapy.plot module
|
||||
-----------------
|
||||
quapy.plot
|
||||
----------
|
||||
|
||||
.. automodule:: quapy.plot
|
||||
:members:
|
||||
:undoc-members:
|
||||
:show-inheritance:
|
||||
|
||||
quapy.util module
|
||||
-----------------
|
||||
quapy.util
|
||||
----------
|
||||
|
||||
.. automodule:: quapy.util
|
||||
:members:
|
||||
:undoc-members:
|
||||
:show-inheritance:
|
||||
|
||||
Subpackages
|
||||
-----------
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 3
|
||||
|
||||
quapy.classification
|
||||
quapy.data
|
||||
quapy.method
|
||||
|
||||
|
||||
Module contents
|
||||
---------------
|
||||
|
||||
|
|
@ -70,3 +81,4 @@ Module contents
|
|||
:members:
|
||||
:undoc-members:
|
||||
:show-inheritance:
|
||||
|
||||
|
|
|
|||
|
|
@ -1,37 +0,0 @@
|
|||
quapy.tests package
|
||||
===================
|
||||
|
||||
Submodules
|
||||
----------
|
||||
|
||||
quapy.tests.test\_base module
|
||||
-----------------------------
|
||||
|
||||
.. automodule:: quapy.tests.test_base
|
||||
:members:
|
||||
:undoc-members:
|
||||
:show-inheritance:
|
||||
|
||||
quapy.tests.test\_datasets module
|
||||
---------------------------------
|
||||
|
||||
.. automodule:: quapy.tests.test_datasets
|
||||
:members:
|
||||
:undoc-members:
|
||||
:show-inheritance:
|
||||
|
||||
quapy.tests.test\_methods module
|
||||
--------------------------------
|
||||
|
||||
.. automodule:: quapy.tests.test_methods
|
||||
:members:
|
||||
:undoc-members:
|
||||
:show-inheritance:
|
||||
|
||||
Module contents
|
||||
---------------
|
||||
|
||||
.. automodule:: quapy.tests
|
||||
:members:
|
||||
:undoc-members:
|
||||
:show-inheritance:
|
||||
|
|
@ -1,7 +0,0 @@
|
|||
Getting Started
|
||||
===============
|
||||
QuaPy is an open source framework for Quantification (a.k.a. Supervised Prevalence Estimation) written in Python.
|
||||
|
||||
Installation
|
||||
------------
|
||||
>>> pip install quapy
|
||||
|
|
@ -1 +0,0 @@
|
|||
.. include:: ../../README.md
|
||||
|
|
@@ -1,701 +0,0 @@
@import url("basic.css");

/* -- page layout ----------------------------------------------------------- */

body {
    font-family: Georgia, serif;
    font-size: 17px;
    background-color: #fff;
    color: #000;
    margin: 0;
    padding: 0;
}

div.document {
    width: 940px;
    margin: 30px auto 0 auto;
}

div.documentwrapper {
    float: left;
    width: 100%;
}

div.bodywrapper {
    margin: 0 0 0 220px;
}

div.sphinxsidebar {
    width: 220px;
    font-size: 14px;
    line-height: 1.5;
}

hr {
    border: 1px solid #B1B4B6;
}

div.body {
    background-color: #fff;
    color: #3E4349;
    padding: 0 30px 0 30px;
}

div.body > .section {
    text-align: left;
}

div.footer {
    width: 940px;
    margin: 20px auto 30px auto;
    font-size: 14px;
    color: #888;
    text-align: right;
}

div.footer a {
    color: #888;
}

p.caption {
    font-family: inherit;
    font-size: inherit;
}

div.relations {
    display: none;
}

div.sphinxsidebar a {
    color: #444;
    text-decoration: none;
    border-bottom: 1px dotted #999;
}

div.sphinxsidebar a:hover {
    border-bottom: 1px solid #999;
}

div.sphinxsidebarwrapper {
    padding: 18px 10px;
}

div.sphinxsidebarwrapper p.logo {
    padding: 0;
    margin: -10px 0 0 0px;
    text-align: center;
}

div.sphinxsidebarwrapper h1.logo {
    margin-top: -10px;
    text-align: center;
    margin-bottom: 5px;
    text-align: left;
}

div.sphinxsidebarwrapper h1.logo-name {
    margin-top: 0px;
}

div.sphinxsidebarwrapper p.blurb {
    margin-top: 0;
    font-style: normal;
}

div.sphinxsidebar h3,
div.sphinxsidebar h4 {
    font-family: Georgia, serif;
    color: #444;
    font-size: 24px;
    font-weight: normal;
    margin: 0 0 5px 0;
    padding: 0;
}

div.sphinxsidebar h4 {
    font-size: 20px;
}

div.sphinxsidebar h3 a {
    color: #444;
}

div.sphinxsidebar p.logo a,
div.sphinxsidebar h3 a,
div.sphinxsidebar p.logo a:hover,
div.sphinxsidebar h3 a:hover {
    border: none;
}

div.sphinxsidebar p {
    color: #555;
    margin: 10px 0;
}

div.sphinxsidebar ul {
    margin: 10px 0;
    padding: 0;
    color: #000;
}

div.sphinxsidebar ul li.toctree-l1 > a {
    font-size: 120%;
}

div.sphinxsidebar ul li.toctree-l2 > a {
    font-size: 110%;
}

div.sphinxsidebar input {
    border: 1px solid #CCC;
    font-family: Georgia, serif;
    font-size: 1em;
}

div.sphinxsidebar hr {
    border: none;
    height: 1px;
    color: #AAA;
    background: #AAA;

    text-align: left;
    margin-left: 0;
    width: 50%;
}

div.sphinxsidebar .badge {
    border-bottom: none;
}

div.sphinxsidebar .badge:hover {
    border-bottom: none;
}

/* To address an issue with donation coming after search */
div.sphinxsidebar h3.donation {
    margin-top: 10px;
}

/* -- body styles ----------------------------------------------------------- */

a {
    color: #004B6B;
    text-decoration: underline;
}

a:hover {
    color: #6D4100;
    text-decoration: underline;
}

div.body h1,
div.body h2,
div.body h3,
div.body h4,
div.body h5,
div.body h6 {
    font-family: Georgia, serif;
    font-weight: normal;
    margin: 30px 0px 10px 0px;
    padding: 0;
}

div.body h1 { margin-top: 0; padding-top: 0; font-size: 240%; }
div.body h2 { font-size: 180%; }
div.body h3 { font-size: 150%; }
div.body h4 { font-size: 130%; }
div.body h5 { font-size: 100%; }
div.body h6 { font-size: 100%; }

a.headerlink {
    color: #DDD;
    padding: 0 4px;
    text-decoration: none;
}

a.headerlink:hover {
    color: #444;
    background: #EAEAEA;
}

div.body p, div.body dd, div.body li {
    line-height: 1.4em;
}

div.admonition {
    margin: 20px 0px;
    padding: 10px 30px;
    background-color: #EEE;
    border: 1px solid #CCC;
}

div.admonition tt.xref, div.admonition code.xref, div.admonition a tt {
    background-color: #FBFBFB;
    border-bottom: 1px solid #fafafa;
}

div.admonition p.admonition-title {
    font-family: Georgia, serif;
    font-weight: normal;
    font-size: 24px;
    margin: 0 0 10px 0;
    padding: 0;
    line-height: 1;
}

div.admonition p.last {
    margin-bottom: 0;
}

div.highlight {
    background-color: #fff;
}

dt:target, .highlight {
    background: #FAF3E8;
}

div.warning {
    background-color: #FCC;
    border: 1px solid #FAA;
}

div.danger {
    background-color: #FCC;
    border: 1px solid #FAA;
    -moz-box-shadow: 2px 2px 4px #D52C2C;
    -webkit-box-shadow: 2px 2px 4px #D52C2C;
    box-shadow: 2px 2px 4px #D52C2C;
}

div.error {
    background-color: #FCC;
    border: 1px solid #FAA;
    -moz-box-shadow: 2px 2px 4px #D52C2C;
    -webkit-box-shadow: 2px 2px 4px #D52C2C;
    box-shadow: 2px 2px 4px #D52C2C;
}

div.caution {
    background-color: #FCC;
    border: 1px solid #FAA;
}

div.attention {
    background-color: #FCC;
    border: 1px solid #FAA;
}

div.important {
    background-color: #EEE;
    border: 1px solid #CCC;
}

div.note {
    background-color: #EEE;
    border: 1px solid #CCC;
}

div.tip {
    background-color: #EEE;
    border: 1px solid #CCC;
}

div.hint {
    background-color: #EEE;
    border: 1px solid #CCC;
}

div.seealso {
    background-color: #EEE;
    border: 1px solid #CCC;
}

div.topic {
    background-color: #EEE;
}

p.admonition-title {
    display: inline;
}

p.admonition-title:after {
    content: ":";
}

pre, tt, code {
    font-family: 'Consolas', 'Menlo', 'DejaVu Sans Mono', 'Bitstream Vera Sans Mono', monospace;
    font-size: 0.9em;
}

.hll {
    background-color: #FFC;
    margin: 0 -12px;
    padding: 0 12px;
    display: block;
}

img.screenshot {
}

tt.descname, tt.descclassname, code.descname, code.descclassname {
    font-size: 0.95em;
}

tt.descname, code.descname {
    padding-right: 0.08em;
}

img.screenshot {
    -moz-box-shadow: 2px 2px 4px #EEE;
    -webkit-box-shadow: 2px 2px 4px #EEE;
    box-shadow: 2px 2px 4px #EEE;
}

table.docutils {
    border: 1px solid #888;
    -moz-box-shadow: 2px 2px 4px #EEE;
    -webkit-box-shadow: 2px 2px 4px #EEE;
    box-shadow: 2px 2px 4px #EEE;
}

table.docutils td, table.docutils th {
    border: 1px solid #888;
    padding: 0.25em 0.7em;
}

table.field-list, table.footnote {
    border: none;
    -moz-box-shadow: none;
    -webkit-box-shadow: none;
    box-shadow: none;
}

table.footnote {
    margin: 15px 0;
    width: 100%;
    border: 1px solid #EEE;
    background: #FDFDFD;
    font-size: 0.9em;
}

table.footnote + table.footnote {
    margin-top: -15px;
    border-top: none;
}

table.field-list th {
    padding: 0 0.8em 0 0;
}

table.field-list td {
    padding: 0;
}

table.field-list p {
    margin-bottom: 0.8em;
}

/* Cloned from
 * https://github.com/sphinx-doc/sphinx/commit/ef60dbfce09286b20b7385333d63a60321784e68
 */
.field-name {
    -moz-hyphens: manual;
    -ms-hyphens: manual;
    -webkit-hyphens: manual;
    hyphens: manual;
}

table.footnote td.label {
    width: .1px;
    padding: 0.3em 0 0.3em 0.5em;
}

table.footnote td {
    padding: 0.3em 0.5em;
}

dl {
    margin: 0;
    padding: 0;
}

dl dd {
    margin-left: 30px;
}

blockquote {
    margin: 0 0 0 30px;
    padding: 0;
}

ul, ol {
    /* Matches the 30px from the narrow-screen "li > ul" selector below */
    margin: 10px 0 10px 30px;
    padding: 0;
}

pre {
    background: #EEE;
    padding: 7px 30px;
    margin: 15px 0px;
    line-height: 1.3em;
}

div.viewcode-block:target {
    background: #ffd;
}

dl pre, blockquote pre, li pre {
    margin-left: 0;
    padding-left: 30px;
}

tt, code {
    background-color: #ecf0f3;
    color: #222;
    /* padding: 1px 2px; */
}

tt.xref, code.xref, a tt {
    background-color: #FBFBFB;
    border-bottom: 1px solid #fff;
}

a.reference {
    text-decoration: none;
    border-bottom: 1px dotted #004B6B;
}

/* Don't put an underline on images */
a.image-reference, a.image-reference:hover {
    border-bottom: none;
}

a.reference:hover {
    border-bottom: 1px solid #6D4100;
}

a.footnote-reference {
    text-decoration: none;
    font-size: 0.7em;
    vertical-align: top;
    border-bottom: 1px dotted #004B6B;
}

a.footnote-reference:hover {
    border-bottom: 1px solid #6D4100;
}

a:hover tt, a:hover code {
    background: #EEE;
}

@media screen and (max-width: 870px) {

    div.sphinxsidebar {
        display: none;
    }

    div.document {
        width: 100%;
    }

    div.documentwrapper {
        margin-left: 0;
        margin-top: 0;
        margin-right: 0;
        margin-bottom: 0;
    }

    div.bodywrapper {
        margin-top: 0;
        margin-right: 0;
        margin-bottom: 0;
        margin-left: 0;
    }

    ul {
        margin-left: 0;
    }

    li > ul {
        /* Matches the 30px from the "ul, ol" selector above */
        margin-left: 30px;
    }

    .document {
        width: auto;
    }

    .footer {
        width: auto;
    }

    .bodywrapper {
        margin: 0;
    }

    .footer {
        width: auto;
    }

    .github {
        display: none;
    }

}

@media screen and (max-width: 875px) {

    body {
        margin: 0;
        padding: 20px 30px;
    }

    div.documentwrapper {
        float: none;
        background: #fff;
    }

    div.sphinxsidebar {
        display: block;
        float: none;
        width: 102.5%;
        margin: 50px -30px -20px -30px;
        padding: 10px 20px;
        background: #333;
        color: #FFF;
    }

    div.sphinxsidebar h3, div.sphinxsidebar h4, div.sphinxsidebar p,
    div.sphinxsidebar h3 a {
        color: #fff;
    }

    div.sphinxsidebar a {
        color: #AAA;
    }

    div.sphinxsidebar p.logo {
        display: none;
    }

    div.document {
        width: 100%;
        margin: 0;
    }

    div.footer {
        display: none;
    }

    div.bodywrapper {
        margin: 0;
    }

    div.body {
        min-height: 0;
        padding: 0;
    }

    .rtd_doc_footer {
        display: none;
    }

    .document {
        width: auto;
    }

    .footer {
        width: auto;
    }

    .footer {
        width: auto;
    }

    .github {
        display: none;
    }
}

/* misc. */

.revsys-inline {
    display: none!important;
}

/* Make nested-list/multi-paragraph items look better in Releases changelog
 * pages. Without this, docutils' magical list fuckery causes inconsistent
 * formatting between different release sub-lists.
 */
div#changelog > div.section > ul > li > p:only-child {
    margin-bottom: 0;
}

/* Hide fugly table cell borders in ..bibliography:: directive output */
table.docutils.citation, table.docutils.citation td, table.docutils.citation th {
    border: none;
    /* Below needed in some edge cases; if not applied, bottom shadows appear */
    -moz-box-shadow: none;
    -webkit-box-shadow: none;
    box-shadow: none;
}

/* relbar */

.related {
    line-height: 30px;
    width: 100%;
    font-size: 0.9rem;
}

.related.top {
    border-bottom: 1px solid #EEE;
    margin-bottom: 20px;
}

.related.bottom {
    border-top: 1px solid #EEE;
}

.related ul {
    padding: 0;
    margin: 0;
    list-style: none;
}

.related li {
    display: inline;
}

nav#rellinks {
    float: right;
}

nav#rellinks li+li:before {
    content: "|";
}

nav#breadcrumbs li+li:before {
    content: "\00BB";
}

/* Hide certain items when printing */
@media print {
    div.related {
        display: none;
    }
}
@@ -4,7 +4,7 @@
 *
 * Sphinx stylesheet -- basic theme.
 *
 * :copyright: Copyright 2007-2021 by the Sphinx team, see AUTHORS.
 * :copyright: Copyright 2007-2022 by the Sphinx team, see AUTHORS.
 * :license: BSD, see LICENSE for details.
 *
 */
@@ -222,7 +222,7 @@ table.modindextable td {
/* -- general body styles --------------------------------------------------- */

div.body {
    min-width: 450px;
    min-width: 360px;
    max-width: 800px;
}
@@ -237,16 +237,6 @@ a.headerlink {
    visibility: hidden;
}

a.brackets:before,
span.brackets > a:before{
    content: "[";
}

a.brackets:after,
span.brackets > a:after {
    content: "]";
}

h1:hover > a.headerlink,
h2:hover > a.headerlink,
h3:hover > a.headerlink,
@@ -334,13 +324,15 @@ aside.sidebar {
p.sidebar-title {
    font-weight: bold;
}

nav.contents,
aside.topic,
div.admonition, div.topic, blockquote {
    clear: left;
}

/* -- topics ---------------------------------------------------------------- */

nav.contents,
aside.topic,
div.topic {
    border: 1px solid #ccc;
    padding: 7px;
@@ -379,6 +371,8 @@ div.body p.centered {

div.sidebar > :last-child,
aside.sidebar > :last-child,
nav.contents > :last-child,
aside.topic > :last-child,
div.topic > :last-child,
div.admonition > :last-child {
    margin-bottom: 0;
@@ -386,6 +380,8 @@ div.admonition > :last-child {

div.sidebar::after,
aside.sidebar::after,
nav.contents::after,
aside.topic::after,
div.topic::after,
div.admonition::after,
blockquote::after {
@@ -428,10 +424,6 @@ table.docutils td, table.docutils th {
    border-bottom: 1px solid #aaa;
}

table.footnote td, table.footnote th {
    border: 0 !important;
}

th {
    text-align: left;
    padding-right: 5px;
@@ -614,20 +606,26 @@ ol.simple p,
ul.simple p {
    margin-bottom: 0;
}

dl.footnote > dt,
dl.citation > dt {
aside.footnote > span,
div.citation > span {
    float: left;
    margin-right: 0.5em;
}

dl.footnote > dd,
dl.citation > dd {
aside.footnote > span:last-of-type,
div.citation > span:last-of-type {
    padding-right: 0.5em;
}
aside.footnote > p {
    margin-left: 2em;
}
div.citation > p {
    margin-left: 4em;
}
aside.footnote > p:last-of-type,
div.citation > p:last-of-type {
    margin-bottom: 0em;
}

dl.footnote > dd:after,
dl.citation > dd:after {
aside.footnote > p:last-of-type:after,
div.citation > p:last-of-type:after {
    content: "";
    clear: both;
}
@@ -644,10 +642,6 @@ dl.field-list > dt {
    padding-right: 5px;
}

dl.field-list > dt:after {
    content: ":";
}

dl.field-list > dd {
    padding-left: 0.5em;
    margin-top: 0em;
@@ -731,8 +725,9 @@ dl.glossary dt {

.classifier:before {
    font-style: normal;
    margin: 0.5em;
    margin: 0 0.5em;
    content: ":";
    display: inline-block;
}

abbr, acronym {
@@ -756,6 +751,7 @@ span.pre {
    -ms-hyphens: none;
    -webkit-hyphens: none;
    hyphens: none;
    white-space: nowrap;
}

div[class*="highlight-"] {
@@ -294,6 +294,8 @@ div.quotebar {
    padding: 2px 7px;
    border: 1px solid #ccc;
}
nav.contents,
aside.topic,

div.topic {
    background-color: #f8f8f8;
@@ -9,33 +9,22 @@
// :copyright: Copyright 2012-2014 by Sphinx team, see AUTHORS.
// :license: BSD, see LICENSE for details.
//
$(document).ready(function(){
  if (navigator.userAgent.indexOf('iPhone') > 0 ||
      navigator.userAgent.indexOf('Android') > 0) {
    $("li.nav-item-0 a").text("Top");
const initialiseBizStyle = () => {
  if (navigator.userAgent.indexOf("iPhone") > 0 || navigator.userAgent.indexOf("Android") > 0) {
    document.querySelector("li.nav-item-0 a").innerText = "Top"
  }
  const truncator = item => {if (item.textContent.length > 20) {
      item.title = item.innerText
      item.innerText = item.innerText.substr(0, 17) + "..."
    }
  }
  document.querySelectorAll("div.related:first ul li:not(.right) a").slice(1).forEach(truncator);
  document.querySelectorAll("div.related:last ul li:not(.right) a").slice(1).forEach(truncator);
}

  $("div.related:first ul li:not(.right) a").slice(1).each(function(i, item){
    if (item.text.length > 20) {
      var tmpstr = item.text
      $(item).attr("title", tmpstr);
      $(item).text(tmpstr.substr(0, 17) + "...");
    }
  });
  $("div.related:last ul li:not(.right) a").slice(1).each(function(i, item){
    if (item.text.length > 20) {
      var tmpstr = item.text
      $(item).attr("title", tmpstr);
      $(item).text(tmpstr.substr(0, 17) + "...");
    }
  });
});
window.addEventListener("resize",
  () => (document.querySelector("li.nav-item-0 a").innerText = (window.innerWidth <= 776) ? "Top" : "QuaPy 0.1.7 documentation")
)

$(window).resize(function(){
  if ($(window).width() <= 776) {
    $("li.nav-item-0 a").text("Top");
  }
  else {
    $("li.nav-item-0 a").text("QuaPy 0.1.6 documentation");
  }
});
if (document.readyState !== "loading") initialiseBizStyle()
else document.addEventListener("DOMContentLoaded", initialiseBizStyle)
@@ -1 +0,0 @@
/* This file intentionally left blank. */
@@ -2,322 +2,155 @@
 * doctools.js
 * ~~~~~~~~~~~
 *
 * Sphinx JavaScript utilities for all documentation.
 * Base JavaScript utilities for all Sphinx HTML documentation.
 *
 * :copyright: Copyright 2007-2021 by the Sphinx team, see AUTHORS.
 * :copyright: Copyright 2007-2022 by the Sphinx team, see AUTHORS.
 * :license: BSD, see LICENSE for details.
 *
 */
"use strict";

/**
 * select a different prefix for underscore
 */
$u = _.noConflict();
const BLACKLISTED_KEY_CONTROL_ELEMENTS = new Set([
  "TEXTAREA",
  "INPUT",
  "SELECT",
  "BUTTON",
]);

/**
 * make the code below compatible with browsers without
 * an installed firebug like debugger
if (!window.console || !console.firebug) {
  var names = ["log", "debug", "info", "warn", "error", "assert", "dir",
    "dirxml", "group", "groupEnd", "time", "timeEnd", "count", "trace",
    "profile", "profileEnd"];
  window.console = {};
  for (var i = 0; i < names.length; ++i)
    window.console[names[i]] = function() {};
}
 */

/**
 * small helper function to urldecode strings
 *
 * See https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/decodeURIComponent#Decoding_query_parameters_from_a_URL
 */
jQuery.urldecode = function(x) {
  if (!x) {
    return x
const _ready = (callback) => {
  if (document.readyState !== "loading") {
    callback();
  } else {
    document.addEventListener("DOMContentLoaded", callback);
  }
  return decodeURIComponent(x.replace(/\+/g, ' '));
};

/**
 * small helper function to urlencode strings
 */
jQuery.urlencode = encodeURIComponent;

/**
 * This function returns the parsed url parameters of the
 * current request. Multiple values per key are supported,
 * it will always return arrays of strings for the value parts.
 */
jQuery.getQueryParameters = function(s) {
  if (typeof s === 'undefined')
    s = document.location.search;
  var parts = s.substr(s.indexOf('?') + 1).split('&');
  var result = {};
  for (var i = 0; i < parts.length; i++) {
    var tmp = parts[i].split('=', 2);
    var key = jQuery.urldecode(tmp[0]);
    var value = jQuery.urldecode(tmp[1]);
    if (key in result)
      result[key].push(value);
    else
      result[key] = [value];
  }
  return result;
};

/**
 * highlight a given string on a jquery object by wrapping it in
 * span elements with the given class name.
 */
jQuery.fn.highlightText = function(text, className) {
  function highlight(node, addItems) {
    if (node.nodeType === 3) {
      var val = node.nodeValue;
      var pos = val.toLowerCase().indexOf(text);
      if (pos >= 0 &&
          !jQuery(node.parentNode).hasClass(className) &&
          !jQuery(node.parentNode).hasClass("nohighlight")) {
        var span;
        var isInSVG = jQuery(node).closest("body, svg, foreignObject").is("svg");
        if (isInSVG) {
          span = document.createElementNS("http://www.w3.org/2000/svg", "tspan");
        } else {
          span = document.createElement("span");
          span.className = className;
        }
        span.appendChild(document.createTextNode(val.substr(pos, text.length)));
        node.parentNode.insertBefore(span, node.parentNode.insertBefore(
          document.createTextNode(val.substr(pos + text.length)),
          node.nextSibling));
        node.nodeValue = val.substr(0, pos);
        if (isInSVG) {
          var rect = document.createElementNS("http://www.w3.org/2000/svg", "rect");
          var bbox = node.parentElement.getBBox();
          rect.x.baseVal.value = bbox.x;
          rect.y.baseVal.value = bbox.y;
          rect.width.baseVal.value = bbox.width;
          rect.height.baseVal.value = bbox.height;
          rect.setAttribute('class', className);
          addItems.push({
            "parent": node.parentNode,
            "target": rect});
        }
      }
    }
    else if (!jQuery(node).is("button, select, textarea")) {
      jQuery.each(node.childNodes, function() {
        highlight(this, addItems);
      });
    }
  }
  var addItems = [];
  var result = this.each(function() {
    highlight(this, addItems);
  });
  for (var i = 0; i < addItems.length; ++i) {
    jQuery(addItems[i].parent).before(addItems[i].target);
  }
  return result;
};

/*
 * backward compatibility for jQuery.browser
 * This will be supported until firefox bug is fixed.
 */
if (!jQuery.browser) {
  jQuery.uaMatch = function(ua) {
    ua = ua.toLowerCase();

    var match = /(chrome)[ \/]([\w.]+)/.exec(ua) ||
      /(webkit)[ \/]([\w.]+)/.exec(ua) ||
      /(opera)(?:.*version|)[ \/]([\w.]+)/.exec(ua) ||
      /(msie) ([\w.]+)/.exec(ua) ||
      ua.indexOf("compatible") < 0 && /(mozilla)(?:.*? rv:([\w.]+)|)/.exec(ua) ||
      [];

    return {
      browser: match[ 1 ] || "",
      version: match[ 2 ] || "0"
    };
  };
  jQuery.browser = {};
  jQuery.browser[jQuery.uaMatch(navigator.userAgent).browser] = true;
}

/**
 * Small JavaScript module for the documentation.
 */
var Documentation = {

  init : function() {
    this.fixFirefoxAnchorBug();
    this.highlightSearchWords();
    this.initIndexTable();
    if (DOCUMENTATION_OPTIONS.NAVIGATION_WITH_KEYS) {
      this.initOnKeyListeners();
    }
const Documentation = {
  init: () => {
    Documentation.initDomainIndexTable();
    Documentation.initOnKeyListeners();
  },

  /**
   * i18n support
   */
  TRANSLATIONS : {},
  PLURAL_EXPR : function(n) { return n === 1 ? 0 : 1; },
  LOCALE : 'unknown',
  TRANSLATIONS: {},
  PLURAL_EXPR: (n) => (n === 1 ? 0 : 1),
  LOCALE: "unknown",

  // gettext and ngettext don't access this so that the functions
  // can safely bound to a different name (_ = Documentation.gettext)
  gettext : function(string) {
    var translated = Documentation.TRANSLATIONS[string];
    if (typeof translated === 'undefined')
      return string;
    return (typeof translated === 'string') ? translated : translated[0];
  gettext: (string) => {
    const translated = Documentation.TRANSLATIONS[string];
    switch (typeof translated) {
      case "undefined":
        return string; // no translation
      case "string":
        return translated; // translation exists
      default:
        return translated[0]; // (singular, plural) translation tuple exists
    }
  },

  ngettext : function(singular, plural, n) {
    var translated = Documentation.TRANSLATIONS[singular];
    if (typeof translated === 'undefined')
      return (n == 1) ? singular : plural;
    return translated[Documentation.PLURALEXPR(n)];
  ngettext: (singular, plural, n) => {
    const translated = Documentation.TRANSLATIONS[singular];
    if (typeof translated !== "undefined")
      return translated[Documentation.PLURAL_EXPR(n)];
    return n === 1 ? singular : plural;
  },

  addTranslations : function(catalog) {
    for (var key in catalog.messages)
      this.TRANSLATIONS[key] = catalog.messages[key];
    this.PLURAL_EXPR = new Function('n', 'return +(' + catalog.plural_expr + ')');
    this.LOCALE = catalog.locale;
  addTranslations: (catalog) => {
    Object.assign(Documentation.TRANSLATIONS, catalog.messages);
    Documentation.PLURAL_EXPR = new Function(
      "n",
      `return (${catalog.plural_expr})`
    );
    Documentation.LOCALE = catalog.locale;
  },

  /**
   * add context elements like header anchor links
   * helper function to focus on search bar
   */
  addContextElements : function() {
    $('div[id] > :header:first').each(function() {
      $('<a class="headerlink">\u00B6</a>').
      attr('href', '#' + this.id).
      attr('title', _('Permalink to this headline')).
      appendTo(this);
    });
    $('dt[id]').each(function() {
      $('<a class="headerlink">\u00B6</a>').
      attr('href', '#' + this.id).
      attr('title', _('Permalink to this definition')).
      appendTo(this);
    });
  focusSearchBar: () => {
    document.querySelectorAll("input[name=q]")[0]?.focus();
  },

  /**
   * workaround a firefox stupidity
   * see: https://bugzilla.mozilla.org/show_bug.cgi?id=645075
   * Initialise the domain index toggle buttons
   */
  fixFirefoxAnchorBug : function() {
    if (document.location.hash && $.browser.mozilla)
      window.setTimeout(function() {
        document.location.href += '';
      }, 10);
  },

  /**
   * highlight the search words provided in the url in the text
   */
  highlightSearchWords : function() {
    var params = $.getQueryParameters();
    var terms = (params.highlight) ? params.highlight[0].split(/\s+/) : [];
    if (terms.length) {
      var body = $('div.body');
      if (!body.length) {
        body = $('body');
  initDomainIndexTable: () => {
    const toggler = (el) => {
      const idNumber = el.id.substr(7);
      const toggledRows = document.querySelectorAll(`tr.cg-${idNumber}`);
      if (el.src.substr(-9) === "minus.png") {
        el.src = `${el.src.substr(0, el.src.length - 9)}plus.png`;
        toggledRows.forEach((el) => (el.style.display = "none"));
      } else {
        el.src = `${el.src.substr(0, el.src.length - 8)}minus.png`;
        toggledRows.forEach((el) => (el.style.display = ""));
      }
      window.setTimeout(function() {
        $.each(terms, function() {
          body.highlightText(this.toLowerCase(), 'highlighted');
        });
      }, 10);
      $('<p class="highlight-link"><a href="javascript:Documentation.' +
        'hideSearchWords()">' + _('Hide Search Matches') + '</a></p>')
        .appendTo($('#searchbox'));
    }
    };

    const togglerElements = document.querySelectorAll("img.toggler");
    togglerElements.forEach((el) =>
      el.addEventListener("click", (event) => toggler(event.currentTarget))
    );
    togglerElements.forEach((el) => (el.style.display = ""));
    if (DOCUMENTATION_OPTIONS.COLLAPSE_INDEX) togglerElements.forEach(toggler);
  },

  /**
   * init the domain index toggle buttons
   */
  initIndexTable : function() {
    var togglers = $('img.toggler').click(function() {
      var src = $(this).attr('src');
      var idnum = $(this).attr('id').substr(7);
      $('tr.cg-' + idnum).toggle();
      if (src.substr(-9) === 'minus.png')
        $(this).attr('src', src.substr(0, src.length-9) + 'plus.png');
      else
        $(this).attr('src', src.substr(0, src.length-8) + 'minus.png');
    }).css('display', '');
    if (DOCUMENTATION_OPTIONS.COLLAPSE_INDEX) {
      togglers.click();
    }
  },
  initOnKeyListeners: () => {
    // only install a listener if it is really needed
    if (
      !DOCUMENTATION_OPTIONS.NAVIGATION_WITH_KEYS &&
      !DOCUMENTATION_OPTIONS.ENABLE_SEARCH_SHORTCUTS
    )
      return;

  /**
   * helper function to hide the search marks again
   */
  hideSearchWords : function() {
    $('#searchbox .highlight-link').fadeOut(300);
    $('span.highlighted').removeClass('highlighted');
  },
    document.addEventListener("keydown", (event) => {
      // bail for input elements
      if (BLACKLISTED_KEY_CONTROL_ELEMENTS.has(document.activeElement.tagName)) return;
      // bail with special keys
      if (event.altKey || event.ctrlKey || event.metaKey) return;

  /**
   * make the url absolute
   */
  makeURL : function(relativeURL) {
    return DOCUMENTATION_OPTIONS.URL_ROOT + '/' + relativeURL;
  },
      if (!event.shiftKey) {
        switch (event.key) {
          case "ArrowLeft":
            if (!DOCUMENTATION_OPTIONS.NAVIGATION_WITH_KEYS) break;

  /**
   * get the current relative url
   */
  getCurrentURL : function() {
    var path = document.location.pathname;
    var parts = path.split(/\//);
    $.each(DOCUMENTATION_OPTIONS.URL_ROOT.split(/\//), function() {
      if (this === '..')
        parts.pop();
    });
    var url = parts.join('/');
    return path.substring(url.lastIndexOf('/') + 1, path.length - 1);
  },

  initOnKeyListeners: function() {
    $(document).keydown(function(event) {
      var activeElementType = document.activeElement.tagName;
      // don't navigate when in search box, textarea, dropdown or button
      if (activeElementType !== 'TEXTAREA' && activeElementType !== 'INPUT' && activeElementType !== 'SELECT'
          && activeElementType !== 'BUTTON' && !event.altKey && !event.ctrlKey && !event.metaKey
          && !event.shiftKey) {
        switch (event.keyCode) {
          case 37: // left
            var prevHref = $('link[rel="prev"]').prop('href');
            if (prevHref) {
              window.location.href = prevHref;
              return false;
            const prevLink = document.querySelector('link[rel="prev"]');
            if (prevLink && prevLink.href) {
              window.location.href = prevLink.href;
              event.preventDefault();
            }
            break;
          case 39: // right
            var nextHref = $('link[rel="next"]').prop('href');
            if (nextHref) {
              window.location.href = nextHref;
              return false;
          case "ArrowRight":
            if (!DOCUMENTATION_OPTIONS.NAVIGATION_WITH_KEYS) break;

            const nextLink = document.querySelector('link[rel="next"]');
            if (nextLink && nextLink.href) {
              window.location.href = nextLink.href;
              event.preventDefault();
            }
            break;
        }
      }

      // some keyboard layouts may need Shift to get /
      switch (event.key) {
        case "/":
          if (!DOCUMENTATION_OPTIONS.ENABLE_SEARCH_SHORTCUTS) break;
          Documentation.focusSearchBar();
          event.preventDefault();
      }
    });
  }
  },
};

// quick alias for translations
_ = Documentation.gettext;
const _ = Documentation.gettext;
|
||||
|
||||
$(document).ready(function() {
|
||||
Documentation.init();
|
||||
});
|
||||
_ready(Documentation.init);
|
||||
|
|
|
|||
|
|
@@ -1,12 +1,14 @@
var DOCUMENTATION_OPTIONS = {
URL_ROOT: document.getElementById("documentation_options").getAttribute('data-url_root'),
VERSION: '0.1.6',
LANGUAGE: 'None',
VERSION: '0.1.7',
LANGUAGE: 'en',
COLLAPSE_INDEX: false,
BUILDER: 'html',
FILE_SUFFIX: '.html',
LINK_SUFFIX: '.html',
HAS_SOURCE: true,
SOURCELINK_SUFFIX: '.txt',
NAVIGATION_WITH_KEYS: false
NAVIGATION_WITH_KEYS: false,
SHOW_SEARCH_SUMMARY: true,
ENABLE_SEARCH_SHORTCUTS: true,
};
File diff suppressed because it is too large
File diff suppressed because one or more lines are too long
@@ -5,12 +5,12 @@
* This script contains the language-specific data used by searchtools.js,
* namely the list of stopwords, stemmer, scorer and splitter.
*
* :copyright: Copyright 2007-2021 by the Sphinx team, see AUTHORS.
* :copyright: Copyright 2007-2022 by the Sphinx team, see AUTHORS.
* :license: BSD, see LICENSE for details.
*
*/

var stopwords = ["a","and","are","as","at","be","but","by","for","if","in","into","is","it","near","no","not","of","on","or","such","that","the","their","then","there","these","they","this","to","was","will","with"];
var stopwords = ["a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "near", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"];


/* Non-minified version is copied as a separate JS file, is available */
@@ -197,101 +197,3 @@ var Stemmer = function() {
}
}




var splitChars = (function() {
var result = {};
var singles = [96, 180, 187, 191, 215, 247, 749, 885, 903, 907, 909, 930, 1014, 1648,
1748, 1809, 2416, 2473, 2481, 2526, 2601, 2609, 2612, 2615, 2653, 2702,
2706, 2729, 2737, 2740, 2857, 2865, 2868, 2910, 2928, 2948, 2961, 2971,
2973, 3085, 3089, 3113, 3124, 3213, 3217, 3241, 3252, 3295, 3341, 3345,
3369, 3506, 3516, 3633, 3715, 3721, 3736, 3744, 3748, 3750, 3756, 3761,
3781, 3912, 4239, 4347, 4681, 4695, 4697, 4745, 4785, 4799, 4801, 4823,
4881, 5760, 5901, 5997, 6313, 7405, 8024, 8026, 8028, 8030, 8117, 8125,
8133, 8181, 8468, 8485, 8487, 8489, 8494, 8527, 11311, 11359, 11687, 11695,
11703, 11711, 11719, 11727, 11735, 12448, 12539, 43010, 43014, 43019, 43587,
43696, 43713, 64286, 64297, 64311, 64317, 64319, 64322, 64325, 65141];
var i, j, start, end;
for (i = 0; i < singles.length; i++) {
result[singles[i]] = true;
}
var ranges = [[0, 47], [58, 64], [91, 94], [123, 169], [171, 177], [182, 184], [706, 709],
[722, 735], [741, 747], [751, 879], [888, 889], [894, 901], [1154, 1161],
[1318, 1328], [1367, 1368], [1370, 1376], [1416, 1487], [1515, 1519], [1523, 1568],
[1611, 1631], [1642, 1645], [1750, 1764], [1767, 1773], [1789, 1790], [1792, 1807],
[1840, 1868], [1958, 1968], [1970, 1983], [2027, 2035], [2038, 2041], [2043, 2047],
[2070, 2073], [2075, 2083], [2085, 2087], [2089, 2307], [2362, 2364], [2366, 2383],
[2385, 2391], [2402, 2405], [2419, 2424], [2432, 2436], [2445, 2446], [2449, 2450],
[2483, 2485], [2490, 2492], [2494, 2509], [2511, 2523], [2530, 2533], [2546, 2547],
[2554, 2564], [2571, 2574], [2577, 2578], [2618, 2648], [2655, 2661], [2672, 2673],
[2677, 2692], [2746, 2748], [2750, 2767], [2769, 2783], [2786, 2789], [2800, 2820],
[2829, 2830], [2833, 2834], [2874, 2876], [2878, 2907], [2914, 2917], [2930, 2946],
[2955, 2957], [2966, 2968], [2976, 2978], [2981, 2983], [2987, 2989], [3002, 3023],
[3025, 3045], [3059, 3076], [3130, 3132], [3134, 3159], [3162, 3167], [3170, 3173],
[3184, 3191], [3199, 3204], [3258, 3260], [3262, 3293], [3298, 3301], [3312, 3332],
[3386, 3388], [3390, 3423], [3426, 3429], [3446, 3449], [3456, 3460], [3479, 3481],
[3518, 3519], [3527, 3584], [3636, 3647], [3655, 3663], [3674, 3712], [3717, 3718],
[3723, 3724], [3726, 3731], [3752, 3753], [3764, 3772], [3774, 3775], [3783, 3791],
[3802, 3803], [3806, 3839], [3841, 3871], [3892, 3903], [3949, 3975], [3980, 4095],
[4139, 4158], [4170, 4175], [4182, 4185], [4190, 4192], [4194, 4196], [4199, 4205],
[4209, 4212], [4226, 4237], [4250, 4255], [4294, 4303], [4349, 4351], [4686, 4687],
[4702, 4703], [4750, 4751], [4790, 4791], [4806, 4807], [4886, 4887], [4955, 4968],
[4989, 4991], [5008, 5023], [5109, 5120], [5741, 5742], [5787, 5791], [5867, 5869],
[5873, 5887], [5906, 5919], [5938, 5951], [5970, 5983], [6001, 6015], [6068, 6102],
[6104, 6107], [6109, 6111], [6122, 6127], [6138, 6159], [6170, 6175], [6264, 6271],
[6315, 6319], [6390, 6399], [6429, 6469], [6510, 6511], [6517, 6527], [6572, 6592],
[6600, 6607], [6619, 6655], [6679, 6687], [6741, 6783], [6794, 6799], [6810, 6822],
[6824, 6916], [6964, 6980], [6988, 6991], [7002, 7042], [7073, 7085], [7098, 7167],
[7204, 7231], [7242, 7244], [7294, 7400], [7410, 7423], [7616, 7679], [7958, 7959],
[7966, 7967], [8006, 8007], [8014, 8015], [8062, 8063], [8127, 8129], [8141, 8143],
[8148, 8149], [8156, 8159], [8173, 8177], [8189, 8303], [8306, 8307], [8314, 8318],
[8330, 8335], [8341, 8449], [8451, 8454], [8456, 8457], [8470, 8472], [8478, 8483],
[8506, 8507], [8512, 8516], [8522, 8525], [8586, 9311], [9372, 9449], [9472, 10101],
[10132, 11263], [11493, 11498], [11503, 11516], [11518, 11519], [11558, 11567],
[11622, 11630], [11632, 11647], [11671, 11679], [11743, 11822], [11824, 12292],
[12296, 12320], [12330, 12336], [12342, 12343], [12349, 12352], [12439, 12444],
[12544, 12548], [12590, 12592], [12687, 12689], [12694, 12703], [12728, 12783],
[12800, 12831], [12842, 12880], [12896, 12927], [12938, 12976], [12992, 13311],
[19894, 19967], [40908, 40959], [42125, 42191], [42238, 42239], [42509, 42511],
[42540, 42559], [42592, 42593], [42607, 42622], [42648, 42655], [42736, 42774],
[42784, 42785], [42889, 42890], [42893, 43002], [43043, 43055], [43062, 43071],
[43124, 43137], [43188, 43215], [43226, 43249], [43256, 43258], [43260, 43263],
[43302, 43311], [43335, 43359], [43389, 43395], [43443, 43470], [43482, 43519],
[43561, 43583], [43596, 43599], [43610, 43615], [43639, 43641], [43643, 43647],
[43698, 43700], [43703, 43704], [43710, 43711], [43715, 43738], [43742, 43967],
[44003, 44015], [44026, 44031], [55204, 55215], [55239, 55242], [55292, 55295],
[57344, 63743], [64046, 64047], [64110, 64111], [64218, 64255], [64263, 64274],
[64280, 64284], [64434, 64466], [64830, 64847], [64912, 64913], [64968, 65007],
[65020, 65135], [65277, 65295], [65306, 65312], [65339, 65344], [65371, 65381],
[65471, 65473], [65480, 65481], [65488, 65489], [65496, 65497]];
for (i = 0; i < ranges.length; i++) {
start = ranges[i][0];
end = ranges[i][1];
for (j = start; j <= end; j++) {
result[j] = true;
}
}
return result;
})();

function splitQuery(query) {
var result = [];
var start = -1;
for (var i = 0; i < query.length; i++) {
if (splitChars[query.charCodeAt(i)]) {
if (start !== -1) {
result.push(query.slice(start, i));
start = -1;
}
} else if (start === -1) {
start = i;
}
}
if (start !== -1) {
result.push(query.slice(start));
}
return result;
}
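As a side note (not part of the diff itself): the hunk above deletes the old table-driven `splitQuery`, which the new `searchtools.js` replaces with a Unicode-property regex. A minimal sketch of that replacement splitter, under the assumption that the name `splitQueryNew` is only used here for illustration:

```javascript
// Sketch of the splitter that replaces the splitChars table above:
// split on runs of characters that are not Unicode letters, numbers,
// underscores, or emoji (same regex as the new searchtools.js default).
const splitQueryNew = (query) =>
  query
    .split(/[^\p{Letter}\p{Number}_\p{Emoji_Presentation}]+/gu)
    .filter((term) => term); // drop empty strings at the edges

console.log(splitQueryNew("v0.1.7")); // → ["v0", "1", "7"]
console.log(splitQueryNew("quapy: quantification, ¡now with LeQua!"));
// → ["quapy", "quantification", "now", "with", "LeQua"]
```

Unlike the old code-point table, this handles the full Unicode range (including astral-plane characters) without a precomputed lookup.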
@@ -4,22 +4,24 @@
*
* Sphinx JavaScript utilities for the full-text search.
*
* :copyright: Copyright 2007-2021 by the Sphinx team, see AUTHORS.
* :copyright: Copyright 2007-2022 by the Sphinx team, see AUTHORS.
* :license: BSD, see LICENSE for details.
*
*/
"use strict";

if (!Scorer) {
/**
* Simple result scoring code.
*/
/**
* Simple result scoring code.
*/
if (typeof Scorer === "undefined") {
var Scorer = {
// Implement the following function to further tweak the score for each result
// The function takes a result array [filename, title, anchor, descr, score]
// The function takes a result array [docname, title, anchor, descr, score, filename]
// and returns the new score.
/*
score: function(result) {
return result[4];
score: result => {
const [docname, title, anchor, descr, score, filename] = result
return score
},
*/

@@ -28,9 +30,11 @@ if (!Scorer) {
// or matches in the last dotted part of the object name
objPartialMatch: 6,
// Additive scores depending on the priority of the object
objPrio: {0: 15, // used to be importantResults
1: 5, // used to be objectResults
2: -5}, // used to be unimportantResults
objPrio: {
0: 15, // used to be importantResults
1: 5, // used to be objectResults
2: -5, // used to be unimportantResults
},
// Used when the priority is not in the mapping.
objPrioDefault: 0,

@@ -39,455 +43,495 @@ if (!Scorer) {
partialTitle: 7,
// query found in terms
term: 5,
partialTerm: 2
partialTerm: 2,
};
}

if (!splitQuery) {
function splitQuery(query) {
return query.split(/\s+/);
const _removeChildren = (element) => {
while (element && element.lastChild) element.removeChild(element.lastChild);
};

/**
* See https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions#escaping
*/
const _escapeRegExp = (string) =>
string.replace(/[.*+\-?^${}()|[\]\\]/g, "\\$&"); // $& means the whole matched string

const _displayItem = (item, searchTerms) => {
const docBuilder = DOCUMENTATION_OPTIONS.BUILDER;
const docUrlRoot = DOCUMENTATION_OPTIONS.URL_ROOT;
const docFileSuffix = DOCUMENTATION_OPTIONS.FILE_SUFFIX;
const docLinkSuffix = DOCUMENTATION_OPTIONS.LINK_SUFFIX;
const showSearchSummary = DOCUMENTATION_OPTIONS.SHOW_SEARCH_SUMMARY;

const [docName, title, anchor, descr, score, _filename] = item;

let listItem = document.createElement("li");
let requestUrl;
let linkUrl;
if (docBuilder === "dirhtml") {
// dirhtml builder
let dirname = docName + "/";
if (dirname.match(/\/index\/$/))
dirname = dirname.substring(0, dirname.length - 6);
else if (dirname === "index/") dirname = "";
requestUrl = docUrlRoot + dirname;
linkUrl = requestUrl;
} else {
// normal html builders
requestUrl = docUrlRoot + docName + docFileSuffix;
linkUrl = docName + docLinkSuffix;
}
let linkEl = listItem.appendChild(document.createElement("a"));
linkEl.href = linkUrl + anchor;
linkEl.dataset.score = score;
linkEl.innerHTML = title;
if (descr)
listItem.appendChild(document.createElement("span")).innerHTML =
" (" + descr + ")";
else if (showSearchSummary)
fetch(requestUrl)
.then((responseData) => responseData.text())
.then((data) => {
if (data)
listItem.appendChild(
Search.makeSearchSummary(data, searchTerms)
);
});
Search.output.appendChild(listItem);
};
const _finishSearch = (resultCount) => {
Search.stopPulse();
Search.title.innerText = _("Search Results");
if (!resultCount)
Search.status.innerText = Documentation.gettext(
"Your search did not match any documents. Please make sure that all words are spelled correctly and that you've selected enough categories."
);
else
Search.status.innerText = _(
`Search finished, found ${resultCount} page(s) matching the search query.`
);
};
const _displayNextItem = (
results,
resultCount,
searchTerms
) => {
// results left, load the summary and display it
// this is intended to be dynamic (don't sub resultsCount)
if (results.length) {
_displayItem(results.pop(), searchTerms);
setTimeout(
() => _displayNextItem(results, resultCount, searchTerms),
5
);
}
// search finished, update title and status message
else _finishSearch(resultCount);
};

/**
* Default splitQuery function. Can be overridden in ``sphinx.search`` with a
* custom function per language.
*
* The regular expression works by splitting the string on consecutive characters
* that are not Unicode letters, numbers, underscores, or emoji characters.
* This is the same as ``\W+`` in Python, preserving the surrogate pair area.
*/
if (typeof splitQuery === "undefined") {
var splitQuery = (query) => query
.split(/[^\p{Letter}\p{Number}_\p{Emoji_Presentation}]+/gu)
.filter(term => term) // remove remaining empty strings
}

/**
* Search Module
*/
var Search = {
const Search = {
_index: null,
_queued_query: null,
_pulse_status: -1,

_index : null,
_queued_query : null,
_pulse_status : -1,

htmlToText : function(htmlString) {
var virtualDocument = document.implementation.createHTMLDocument('virtual');
var htmlElement = $(htmlString, virtualDocument);
htmlElement.find('.headerlink').remove();
docContent = htmlElement.find('[role=main]')[0];
if(docContent === undefined) {
console.warn("Content block not found. Sphinx search tries to obtain it " +
"via '[role=main]'. Could you check your theme or template.");
return "";
}
return docContent.textContent || docContent.innerText;
htmlToText: (htmlString) => {
const htmlElement = new DOMParser().parseFromString(htmlString, 'text/html');
htmlElement.querySelectorAll(".headerlink").forEach((el) => { el.remove() });
const docContent = htmlElement.querySelector('[role="main"]');
if (docContent !== undefined) return docContent.textContent;
console.warn(
"Content block not found. Sphinx search tries to obtain it via '[role=main]'. Could you check your theme or template."
);
return "";
},

init : function() {
var params = $.getQueryParameters();
if (params.q) {
var query = params.q[0];
$('input[name="q"]')[0].value = query;
this.performSearch(query);
}
init: () => {
const query = new URLSearchParams(window.location.search).get("q");
document
.querySelectorAll('input[name="q"]')
.forEach((el) => (el.value = query));
if (query) Search.performSearch(query);
},

loadIndex : function(url) {
$.ajax({type: "GET", url: url, data: null,
dataType: "script", cache: true,
complete: function(jqxhr, textstatus) {
if (textstatus != "success") {
document.getElementById("searchindexloader").src = url;
}
}});
},
loadIndex: (url) =>
(document.body.appendChild(document.createElement("script")).src = url),

setIndex : function(index) {
var q;
this._index = index;
if ((q = this._queued_query) !== null) {
this._queued_query = null;
Search.query(q);
setIndex: (index) => {
Search._index = index;
if (Search._queued_query !== null) {
const query = Search._queued_query;
Search._queued_query = null;
Search.query(query);
}
},

hasIndex : function() {
return this._index !== null;
},
hasIndex: () => Search._index !== null,

deferQuery : function(query) {
this._queued_query = query;
},
deferQuery: (query) => (Search._queued_query = query),

stopPulse : function() {
this._pulse_status = 0;
},
stopPulse: () => (Search._pulse_status = -1),

startPulse : function() {
if (this._pulse_status >= 0)
return;
function pulse() {
var i;
startPulse: () => {
if (Search._pulse_status >= 0) return;

const pulse = () => {
Search._pulse_status = (Search._pulse_status + 1) % 4;
var dotString = '';
for (i = 0; i < Search._pulse_status; i++)
dotString += '.';
Search.dots.text(dotString);
if (Search._pulse_status > -1)
window.setTimeout(pulse, 500);
}
Search.dots.innerText = ".".repeat(Search._pulse_status);
if (Search._pulse_status >= 0) window.setTimeout(pulse, 500);
};
pulse();
},

/**
* perform a search for something (or wait until index is loaded)
*/
performSearch : function(query) {
performSearch: (query) => {
// create the required interface elements
this.out = $('#search-results');
this.title = $('<h2>' + _('Searching') + '</h2>').appendTo(this.out);
this.dots = $('<span></span>').appendTo(this.title);
this.status = $('<p class="search-summary"> </p>').appendTo(this.out);
this.output = $('<ul class="search"/>').appendTo(this.out);
const searchText = document.createElement("h2");
searchText.textContent = _("Searching");
const searchSummary = document.createElement("p");
searchSummary.classList.add("search-summary");
searchSummary.innerText = "";
const searchList = document.createElement("ul");
searchList.classList.add("search");

$('#search-progress').text(_('Preparing search...'));
this.startPulse();
const out = document.getElementById("search-results");
Search.title = out.appendChild(searchText);
Search.dots = Search.title.appendChild(document.createElement("span"));
Search.status = out.appendChild(searchSummary);
Search.output = out.appendChild(searchList);

const searchProgress = document.getElementById("search-progress");
// Some themes don't use the search progress node
if (searchProgress) {
searchProgress.innerText = _("Preparing search...");
}
Search.startPulse();

// index already loaded, the browser was quick!
if (this.hasIndex())
this.query(query);
else
this.deferQuery(query);
if (Search.hasIndex()) Search.query(query);
else Search.deferQuery(query);
},

/**
* execute search (requires search index to be loaded)
*/
query : function(query) {
var i;
query: (query) => {
const filenames = Search._index.filenames;
const docNames = Search._index.docnames;
const titles = Search._index.titles;
const allTitles = Search._index.alltitles;
const indexEntries = Search._index.indexentries;

// stem the searchterms and add them to the correct list
var stemmer = new Stemmer();
var searchterms = [];
var excluded = [];
var hlterms = [];
var tmp = splitQuery(query);
var objectterms = [];
for (i = 0; i < tmp.length; i++) {
if (tmp[i] !== "") {
objectterms.push(tmp[i].toLowerCase());
}
// stem the search terms and add them to the correct list
const stemmer = new Stemmer();
const searchTerms = new Set();
const excludedTerms = new Set();
const highlightTerms = new Set();
const objectTerms = new Set(splitQuery(query.toLowerCase().trim()));
splitQuery(query.trim()).forEach((queryTerm) => {
const queryTermLower = queryTerm.toLowerCase();

// maybe skip this "word"
// stopwords array is from language_data.js
if (
stopwords.indexOf(queryTermLower) !== -1 ||
queryTerm.match(/^\d+$/)
)
return;

if ($u.indexOf(stopwords, tmp[i].toLowerCase()) != -1 || tmp[i] === "") {
// skip this "word"
continue;
}
// stem the word
var word = stemmer.stemWord(tmp[i].toLowerCase());
// prevent stemmer from cutting word smaller than two chars
if(word.length < 3 && tmp[i].length >= 3) {
word = tmp[i];
}
var toAppend;
let word = stemmer.stemWord(queryTermLower);
// select the correct list
if (word[0] == '-') {
toAppend = excluded;
word = word.substr(1);
}
if (word[0] === "-") excludedTerms.add(word.substr(1));
else {
toAppend = searchterms;
hlterms.push(tmp[i].toLowerCase());
searchTerms.add(word);
highlightTerms.add(queryTermLower);
}
// only add if not already in the list
if (!$u.contains(toAppend, word))
toAppend.push(word);
});

if (SPHINX_HIGHLIGHT_ENABLED) { // set in sphinx_highlight.js
localStorage.setItem("sphinx_highlight_terms", [...highlightTerms].join(" "))
}
var highlightstring = '?highlight=' + $.urlencode(hlterms.join(" "));

// console.debug('SEARCH: searching for:');
// console.info('required: ', searchterms);
// console.info('excluded: ', excluded);
// console.debug("SEARCH: searching for:");
// console.info("required: ", [...searchTerms]);
// console.info("excluded: ", [...excludedTerms]);

// prepare search
var terms = this._index.terms;
var titleterms = this._index.titleterms;
// array of [docname, title, anchor, descr, score, filename]
let results = [];
_removeChildren(document.getElementById("search-progress"));

// array of [filename, title, anchor, descr, score]
var results = [];
$('#search-progress').empty();
const queryLower = query.toLowerCase();
for (const [title, foundTitles] of Object.entries(allTitles)) {
if (title.toLowerCase().includes(queryLower) && (queryLower.length >= title.length/2)) {
for (const [file, id] of foundTitles) {
let score = Math.round(100 * queryLower.length / title.length)
results.push([
docNames[file],
titles[file] !== title ? `${titles[file]} > ${title}` : title,
id !== null ? "#" + id : "",
null,
score,
filenames[file],
]);
}
}
}

// search for explicit entries in index directives
for (const [entry, foundEntries] of Object.entries(indexEntries)) {
if (entry.includes(queryLower) && (queryLower.length >= entry.length/2)) {
for (const [file, id] of foundEntries) {
let score = Math.round(100 * queryLower.length / entry.length)
results.push([
docNames[file],
titles[file],
id ? "#" + id : "",
null,
score,
filenames[file],
]);
}
}
}

// lookup as object
for (i = 0; i < objectterms.length; i++) {
var others = [].concat(objectterms.slice(0, i),
objectterms.slice(i+1, objectterms.length));
results = results.concat(this.performObjectSearch(objectterms[i], others));
}
objectTerms.forEach((term) =>
results.push(...Search.performObjectSearch(term, objectTerms))
);

// lookup as search terms in fulltext
results = results.concat(this.performTermsSearch(searchterms, excluded, terms, titleterms));
results.push(...Search.performTermsSearch(searchTerms, excludedTerms));

// let the scorer override scores with a custom scoring function
if (Scorer.score) {
for (i = 0; i < results.length; i++)
results[i][4] = Scorer.score(results[i]);
}
if (Scorer.score) results.forEach((item) => (item[4] = Scorer.score(item)));

// now sort the results by score (in opposite order of appearance, since the
// display function below uses pop() to retrieve items) and then
// alphabetically
results.sort(function(a, b) {
var left = a[4];
var right = b[4];
if (left > right) {
return 1;