merging from protocols (aka v0.1.7)
commit
e0718b6e1b
31
README.md
|
@@ -13,6 +13,7 @@ for facilitating the analysis and interpretation of the experimental results.
|
|||
|
||||
### Last updates:
|
||||
|
||||
* Version 0.1.7 is released! Major changes can be consulted [here](quapy/FCHANGE_LOG.txt).
|
||||
* A detailed documentation is now available [here](https://hlt-isti.github.io/QuaPy/)
|
||||
* The developer API documentation is available [here](https://hlt-isti.github.io/QuaPy/build/html/modules.html)
|
||||
|
||||
|
@@ -73,13 +74,14 @@ See the [Wiki](https://github.com/HLT-ISTI/QuaPy/wiki) for detailed examples.
|
|||
## Features
|
||||
|
||||
* Implementation of many popular quantification methods (Classify-&-Count and its variants, Expectation Maximization,
|
||||
quantification methods based on structured output learning, HDy, QuaNet, and quantification ensembles).
|
||||
* Versatile functionality for performing evaluation based on artificial sampling protocols.
|
||||
quantification methods based on structured output learning, HDy, QuaNet, quantification ensembles, among others).
|
||||
* Versatile functionality for performing evaluation based on sampling generation protocols (e.g., APP, NPP, etc.).
|
||||
* Implementation of most commonly used evaluation metrics (e.g., AE, RAE, SE, KLD, NKLD, etc.).
|
||||
* Datasets frequently used in quantification (textual and numeric), including:
|
||||
* 32 UCI Machine Learning datasets.
|
||||
* 11 Twitter quantification-by-sentiment datasets.
|
||||
* 3 product reviews quantification-by-sentiment datasets.
|
||||
* 4 tasks from the LeQua competition (_new in v0.1.7!_)
|
||||
* Native support for binary and single-label multiclass quantification scenarios.
|
||||
* Model selection functionality that minimizes quantification-oriented loss functions.
|
||||
* Visualization tools for analysing the experimental results.
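
To make the metric acronyms in the feature list concrete, here is a minimal sketch of two of them, absolute error (AE) and relative absolute error (RAE), computed between a true and a predicted prevalence vector. The function names and the additive smoothing used to avoid division by zero are assumptions made for illustration; this is not QuaPy's own code, and the library's implementation may differ in its details.

```python
def absolute_error(true_prev, pred_prev):
    """AE: mean absolute difference between two prevalence vectors."""
    return sum(abs(p - q) for p, q in zip(true_prev, pred_prev)) / len(true_prev)


def smooth(prev, eps):
    """Additive smoothing (an assumption of this sketch); the result still sums to 1."""
    n_classes = len(prev)
    return [(p + eps) / (1 + eps * n_classes) for p in prev]


def relative_absolute_error(true_prev, pred_prev, eps=0.0):
    """RAE: absolute error of each class divided by its true prevalence, then averaged."""
    p = smooth(true_prev, eps)
    q = smooth(pred_prev, eps)
    return sum(abs(pi - qi) / pi for pi, qi in zip(p, q)) / len(p)
```

With `eps=0.0` the RAE is undefined whenever a true prevalence is zero, which is why a small positive `eps` is usually passed in practice.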
|
||||
|
@@ -94,29 +96,6 @@ quantification methods based on structured output learning, HDy, QuaNet, and qua
|
|||
* pandas, xlrd
|
||||
* matplotlib
|
||||
|
||||
## SVM-perf with quantification-oriented losses
|
||||
In order to run experiments involving SVM(Q), SVM(KLD), SVM(NKLD),
|
||||
SVM(AE), or SVM(RAE), you have to first download the
|
||||
[svmperf](http://www.cs.cornell.edu/people/tj/svm_light/svm_perf.html)
|
||||
package, apply the patch
|
||||
[svm-perf-quantification-ext.patch](./svm-perf-quantification-ext.patch), and compile the sources.
|
||||
The script [prepare_svmperf.sh](prepare_svmperf.sh) does the whole job. Simply run:
|
||||
|
||||
```
|
||||
./prepare_svmperf.sh
|
||||
```
|
||||
|
||||
The resulting directory [svm_perf_quantification](./svm_perf_quantification) contains the
|
||||
patched version of _svmperf_ with quantification-oriented losses.
|
||||
|
||||
The [svm-perf-quantification-ext.patch](./svm-perf-quantification-ext.patch) is an extension of the patch made available by
|
||||
[Esuli et al. 2015](https://dl.acm.org/doi/abs/10.1145/2700406?casa_token=8D2fHsGCVn0AAAAA:ZfThYOvrzWxMGfZYlQW_y8Cagg-o_l6X_PcF09mdETQ4Tu7jK98mxFbGSXp9ZSO14JkUIYuDGFG0)
|
||||
that allows SVMperf to optimize for
|
||||
the _Q_ measure as proposed by [Barranquero et al. 2015](https://www.sciencedirect.com/science/article/abs/pii/S003132031400291X)
|
||||
and for the _KLD_ and _NKLD_ measures as proposed by [Esuli et al. 2015](https://dl.acm.org/doi/abs/10.1145/2700406?casa_token=8D2fHsGCVn0AAAAA:ZfThYOvrzWxMGfZYlQW_y8Cagg-o_l6X_PcF09mdETQ4Tu7jK98mxFbGSXp9ZSO14JkUIYuDGFG0).
|
||||
This patch extends the above one by also allowing SVMperf to optimize for
|
||||
_AE_ and _RAE_.
|
||||
|
||||
|
||||
## Documentation
|
||||
|
||||
|
@@ -127,6 +106,8 @@ are provided:
|
|||
|
||||
* [Datasets](https://github.com/HLT-ISTI/QuaPy/wiki/Datasets)
|
||||
* [Evaluation](https://github.com/HLT-ISTI/QuaPy/wiki/Evaluation)
|
||||
* [Protocols](https://github.com/HLT-ISTI/QuaPy/wiki/Protocols)
|
||||
* [Methods](https://github.com/HLT-ISTI/QuaPy/wiki/Methods)
|
||||
* [SVMperf](https://github.com/HLT-ISTI/QuaPy/wiki/ExplicitLossMinimization)
|
||||
* [Model Selection](https://github.com/HLT-ISTI/QuaPy/wiki/Model-Selection)
|
||||
* [Plotting](https://github.com/HLT-ISTI/QuaPy/wiki/Plotting)
|
||||
|
|
15
TODO.txt
|
@@ -1,7 +1,20 @@
|
|||
sample_size should not be mandatory when qp.environ['SAMPLE_SIZE'] has been specified
|
||||
clean all the cumbersome methods that have to be implemented for new quantifiers (e.g., n_classes_ prop, etc.)
|
||||
make GridSearchQ truly parallel
|
||||
make more examples in the "examples" directory
|
||||
merge with master, because I had to fix some problems with QuaNet due to an issue notified via GitHub!
|
||||
added cross_val_predict in qp.model_selection (i.e., a cross_val_predict for quantification) --would be nice to have
|
||||
it parallelized
|
||||
|
||||
check the OneVsAll module(s)
|
||||
|
||||
check set_params in neural.py, because the separation of estimator__<param> is not implemented; see also
|
||||
__check_params_colision
|
||||
|
||||
HDy can be customized so that the number of bins is specified, instead of explored within the fit method
|
||||
|
||||
Packaging:
|
||||
==========================================
|
||||
Documentation with sphinx
|
||||
Document methods with paper references
|
||||
unit-tests
|
||||
clean wiki_examples!
|
||||
|
|
|
@@ -2,23 +2,26 @@
|
|||
|
||||
<!doctype html>
|
||||
|
||||
<html>
|
||||
<html lang="en">
|
||||
<head>
|
||||
<meta charset="utf-8" />
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
|
||||
<title>Datasets — QuaPy 0.1.6 documentation</title>
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="generator" content="Docutils 0.19: https://docutils.sourceforge.io/" />
|
||||
|
||||
<title>Datasets — QuaPy 0.1.7 documentation</title>
|
||||
<link rel="stylesheet" type="text/css" href="_static/pygments.css" />
|
||||
<link rel="stylesheet" type="text/css" href="_static/bizstyle.css" />
|
||||
|
||||
<script data-url_root="./" id="documentation_options" src="_static/documentation_options.js"></script>
|
||||
<script src="_static/jquery.js"></script>
|
||||
<script src="_static/underscore.js"></script>
|
||||
<script src="_static/_sphinx_javascript_frameworks_compat.js"></script>
|
||||
<script src="_static/doctools.js"></script>
|
||||
<script src="_static/sphinx_highlight.js"></script>
|
||||
<script src="_static/bizstyle.js"></script>
|
||||
<link rel="index" title="Index" href="genindex.html" />
|
||||
<link rel="search" title="Search" href="search.html" />
|
||||
<link rel="next" title="quapy" href="modules.html" />
|
||||
<link rel="prev" title="Getting Started" href="readme.html" />
|
||||
<link rel="next" title="Evaluation" href="Evaluation.html" />
|
||||
<link rel="prev" title="Installation" href="Installation.html" />
|
||||
<meta name="viewport" content="width=device-width,initial-scale=1.0" />
|
||||
<!--[if lt IE 9]>
|
||||
<script src="_static/css3-mediaqueries.js"></script>
|
||||
|
@@ -34,12 +37,12 @@
|
|||
<a href="py-modindex.html" title="Python Module Index"
|
||||
>modules</a> |</li>
|
||||
<li class="right" >
|
||||
<a href="modules.html" title="quapy"
|
||||
<a href="Evaluation.html" title="Evaluation"
|
||||
accesskey="N">next</a> |</li>
|
||||
<li class="right" >
|
||||
<a href="readme.html" title="Getting Started"
|
||||
<a href="Installation.html" title="Installation"
|
||||
accesskey="P">previous</a> |</li>
|
||||
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.6 documentation</a> »</li>
|
||||
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.7 documentation</a> »</li>
|
||||
<li class="nav-item nav-item-this"><a href="">Datasets</a></li>
|
||||
</ul>
|
||||
</div>
|
||||
|
@@ -49,8 +52,8 @@
|
|||
<div class="bodywrapper">
|
||||
<div class="body" role="main">
|
||||
|
||||
<div class="tex2jax_ignore mathjax_ignore section" id="datasets">
|
||||
<h1>Datasets<a class="headerlink" href="#datasets" title="Permalink to this headline">¶</a></h1>
|
||||
<section id="datasets">
|
||||
<h1>Datasets<a class="headerlink" href="#datasets" title="Permalink to this heading">¶</a></h1>
|
||||
<p>QuaPy makes available several datasets that have been used in
|
||||
the quantification literature, as well as an interface to allow
|
||||
anyone to import their custom datasets.</p>
|
||||
|
@@ -77,7 +80,7 @@ Take a look at the following code:</p>
|
|||
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="p">[</span><span class="mf">0.17</span><span class="p">,</span> <span class="mf">0.50</span><span class="p">,</span> <span class="mf">0.33</span><span class="p">]</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>One can easily produce new samples at desired class prevalences:</p>
|
||||
<p>One can easily produce new samples at desired class prevalence values:</p>
|
||||
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">sample_size</span> <span class="o">=</span> <span class="mi">10</span>
|
||||
<span class="n">prev</span> <span class="o">=</span> <span class="p">[</span><span class="mf">0.4</span><span class="p">,</span> <span class="mf">0.1</span><span class="p">,</span> <span class="mf">0.5</span><span class="p">]</span>
|
||||
<span class="n">sample</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">sampling</span><span class="p">(</span><span class="n">sample_size</span><span class="p">,</span> <span class="o">*</span><span class="n">prev</span><span class="p">)</span>
|
||||
|
@@ -106,31 +109,12 @@ the indexes, that can then be used to generate the sample:</p>
|
|||
<span class="o">...</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>QuaPy also implements the artificial sampling protocol that produces (via a
|
||||
Python’s generator) a series of <em>LabelledCollection</em> objects with equidistant
|
||||
prevalences ranging across the entire prevalence spectrum in the simplex space, e.g.:</p>
|
||||
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="k">for</span> <span class="n">sample</span> <span class="ow">in</span> <span class="n">data</span><span class="o">.</span><span class="n">artificial_sampling_generator</span><span class="p">(</span><span class="n">sample_size</span><span class="o">=</span><span class="mi">100</span><span class="p">,</span> <span class="n">n_prevalences</span><span class="o">=</span><span class="mi">5</span><span class="p">):</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="n">F</span><span class="o">.</span><span class="n">strprev</span><span class="p">(</span><span class="n">sample</span><span class="o">.</span><span class="n">prevalence</span><span class="p">(),</span> <span class="n">prec</span><span class="o">=</span><span class="mi">2</span><span class="p">))</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>produces one sampling for each (valid) combination of prevalences originating from
|
||||
splitting the range [0,1] into n_prevalences=5 points (i.e., [0, 0.25, 0.5, 0.75, 1]),
|
||||
that is:</p>
|
||||
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="p">[</span><span class="mf">0.00</span><span class="p">,</span> <span class="mf">0.00</span><span class="p">,</span> <span class="mf">1.00</span><span class="p">]</span>
|
||||
<span class="p">[</span><span class="mf">0.00</span><span class="p">,</span> <span class="mf">0.25</span><span class="p">,</span> <span class="mf">0.75</span><span class="p">]</span>
|
||||
<span class="p">[</span><span class="mf">0.00</span><span class="p">,</span> <span class="mf">0.50</span><span class="p">,</span> <span class="mf">0.50</span><span class="p">]</span>
|
||||
<span class="p">[</span><span class="mf">0.00</span><span class="p">,</span> <span class="mf">0.75</span><span class="p">,</span> <span class="mf">0.25</span><span class="p">]</span>
|
||||
<span class="p">[</span><span class="mf">0.00</span><span class="p">,</span> <span class="mf">1.00</span><span class="p">,</span> <span class="mf">0.00</span><span class="p">]</span>
|
||||
<span class="p">[</span><span class="mf">0.25</span><span class="p">,</span> <span class="mf">0.00</span><span class="p">,</span> <span class="mf">0.75</span><span class="p">]</span>
|
||||
<span class="o">...</span>
|
||||
<span class="p">[</span><span class="mf">1.00</span><span class="p">,</span> <span class="mf">0.00</span><span class="p">,</span> <span class="mf">0.00</span><span class="p">]</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>See the <a class="reference external" href="https://github.com/HLT-ISTI/QuaPy/wiki/Evaluation">Evaluation wiki</a> for
|
||||
further details on how to use the artificial sampling protocol to properly
|
||||
evaluate a quantification method.</p>
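
The prevalence grid described above (every valid combination obtained by splitting [0,1] into n_prevalences equidistant points) can be enumerated with a few lines of plain Python. This is only a sketch of the combinatorics, assuming the last class takes whatever mass the other classes leave over; `prevalence_grid` is a hypothetical helper, not part of QuaPy.

```python
from itertools import product


def prevalence_grid(n_classes, n_prevalences):
    """Yield every prevalence vector on an equidistant grid that sums to 1."""
    steps = n_prevalences - 1  # e.g. 5 grid points -> increments of 1/4
    # Work with integer counts of grid steps to avoid floating-point drift.
    for head in product(range(steps + 1), repeat=n_classes - 1):
        if sum(head) <= steps:  # the remaining mass goes to the last class
            tail = steps - sum(head)
            yield [k / steps for k in head] + [tail / steps]


if __name__ == "__main__":
    for prev in prevalence_grid(n_classes=3, n_prevalences=5):
        print("[" + ", ".join(f"{p:.2f}" for p in prev) + "]")
```

For three classes and five grid points this enumerates 15 vectors, starting at [0.00, 0.00, 1.00] and ending at [1.00, 0.00, 0.00], in the same order as the listing above.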
|
||||
<div class="section" id="reviews-datasets">
|
||||
<h2>Reviews Datasets<a class="headerlink" href="#reviews-datasets" title="Permalink to this headline">¶</a></h2>
|
||||
<p>However, generating samples for evaluation purposes is tackled in QuaPy
|
||||
by means of the evaluation protocols (see the dedicated entries in the Wiki
|
||||
for <a class="reference external" href="https://github.com/HLT-ISTI/QuaPy/wiki/Evaluation">evaluation</a> and
|
||||
<a class="reference external" href="https://github.com/HLT-ISTI/QuaPy/wiki/Protocols">protocols</a>).</p>
|
||||
<section id="reviews-datasets">
|
||||
<h2>Reviews Datasets<a class="headerlink" href="#reviews-datasets" title="Permalink to this heading">¶</a></h2>
|
||||
<p>Three datasets of reviews about Kindle devices, the Harry Potter series, and
|
||||
the well-known IMDb movie reviews can be fetched using a unified interface.
|
||||
For example:</p>
|
||||
|
@@ -150,47 +134,47 @@ For example:</p>
|
|||
</pre></div>
|
||||
</div>
|
||||
<p>Some statistics of the available datasets are summarized below:</p>
|
||||
<table class="colwidths-auto docutils align-default">
|
||||
<table class="docutils align-default">
|
||||
<thead>
|
||||
<tr class="row-odd"><th class="head"><p>Dataset</p></th>
|
||||
<th class="text-align:center head"><p>classes</p></th>
|
||||
<th class="text-align:center head"><p>train size</p></th>
|
||||
<th class="text-align:center head"><p>test size</p></th>
|
||||
<th class="text-align:center head"><p>train prev</p></th>
|
||||
<th class="text-align:center head"><p>test prev</p></th>
|
||||
<th class="head text-center"><p>classes</p></th>
|
||||
<th class="head text-center"><p>train size</p></th>
|
||||
<th class="head text-center"><p>test size</p></th>
|
||||
<th class="head text-center"><p>train prev</p></th>
|
||||
<th class="head text-center"><p>test prev</p></th>
|
||||
<th class="head"><p>type</p></th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody>
|
||||
<tr class="row-even"><td><p>hp</p></td>
|
||||
<td class="text-align:center"><p>2</p></td>
|
||||
<td class="text-align:center"><p>9533</p></td>
|
||||
<td class="text-align:center"><p>18399</p></td>
|
||||
<td class="text-align:center"><p>[0.018, 0.982]</p></td>
|
||||
<td class="text-align:center"><p>[0.065, 0.935]</p></td>
|
||||
<td class="text-center"><p>2</p></td>
|
||||
<td class="text-center"><p>9533</p></td>
|
||||
<td class="text-center"><p>18399</p></td>
|
||||
<td class="text-center"><p>[0.018, 0.982]</p></td>
|
||||
<td class="text-center"><p>[0.065, 0.935]</p></td>
|
||||
<td><p>text</p></td>
|
||||
</tr>
|
||||
<tr class="row-odd"><td><p>kindle</p></td>
|
||||
<td class="text-align:center"><p>2</p></td>
|
||||
<td class="text-align:center"><p>3821</p></td>
|
||||
<td class="text-align:center"><p>21591</p></td>
|
||||
<td class="text-align:center"><p>[0.081, 0.919]</p></td>
|
||||
<td class="text-align:center"><p>[0.063, 0.937]</p></td>
|
||||
<td class="text-center"><p>2</p></td>
|
||||
<td class="text-center"><p>3821</p></td>
|
||||
<td class="text-center"><p>21591</p></td>
|
||||
<td class="text-center"><p>[0.081, 0.919]</p></td>
|
||||
<td class="text-center"><p>[0.063, 0.937]</p></td>
|
||||
<td><p>text</p></td>
|
||||
</tr>
|
||||
<tr class="row-even"><td><p>imdb</p></td>
|
||||
<td class="text-align:center"><p>2</p></td>
|
||||
<td class="text-align:center"><p>25000</p></td>
|
||||
<td class="text-align:center"><p>25000</p></td>
|
||||
<td class="text-align:center"><p>[0.500, 0.500]</p></td>
|
||||
<td class="text-align:center"><p>[0.500, 0.500]</p></td>
|
||||
<td class="text-center"><p>2</p></td>
|
||||
<td class="text-center"><p>25000</p></td>
|
||||
<td class="text-center"><p>25000</p></td>
|
||||
<td class="text-center"><p>[0.500, 0.500]</p></td>
|
||||
<td class="text-center"><p>[0.500, 0.500]</p></td>
|
||||
<td><p>text</p></td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
</div>
|
||||
<div class="section" id="twitter-sentiment-datasets">
|
||||
<h2>Twitter Sentiment Datasets<a class="headerlink" href="#twitter-sentiment-datasets" title="Permalink to this headline">¶</a></h2>
|
||||
</section>
|
||||
<section id="twitter-sentiment-datasets">
|
||||
<h2>Twitter Sentiment Datasets<a class="headerlink" href="#twitter-sentiment-datasets" title="Permalink to this heading">¶</a></h2>
|
||||
<p>11 Twitter datasets for sentiment analysis.
|
||||
Text is not accessible, and the documents were made available
|
||||
in tf-idf format. Each dataset presents two splits: a train/val
|
||||
|
@@ -221,123 +205,123 @@ The lists of the Twitter dataset’s ids can be consulted in:</p>
|
|||
</pre></div>
|
||||
</div>
|
||||
<p>Some details can be found below:</p>
|
||||
<table class="colwidths-auto docutils align-default">
|
||||
<table class="docutils align-default">
|
||||
<thead>
|
||||
<tr class="row-odd"><th class="head"><p>Dataset</p></th>
|
||||
<th class="text-align:center head"><p>classes</p></th>
|
||||
<th class="text-align:center head"><p>train size</p></th>
|
||||
<th class="text-align:center head"><p>test size</p></th>
|
||||
<th class="text-align:center head"><p>features</p></th>
|
||||
<th class="text-align:center head"><p>train prev</p></th>
|
||||
<th class="text-align:center head"><p>test prev</p></th>
|
||||
<th class="head text-center"><p>classes</p></th>
|
||||
<th class="head text-center"><p>train size</p></th>
|
||||
<th class="head text-center"><p>test size</p></th>
|
||||
<th class="head text-center"><p>features</p></th>
|
||||
<th class="head text-center"><p>train prev</p></th>
|
||||
<th class="head text-center"><p>test prev</p></th>
|
||||
<th class="head"><p>type</p></th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody>
|
||||
<tr class="row-even"><td><p>gasp</p></td>
|
||||
<td class="text-align:center"><p>3</p></td>
|
||||
<td class="text-align:center"><p>8788</p></td>
|
||||
<td class="text-align:center"><p>3765</p></td>
|
||||
<td class="text-align:center"><p>694582</p></td>
|
||||
<td class="text-align:center"><p>[0.421, 0.496, 0.082]</p></td>
|
||||
<td class="text-align:center"><p>[0.407, 0.507, 0.086]</p></td>
|
||||
<td class="text-center"><p>3</p></td>
|
||||
<td class="text-center"><p>8788</p></td>
|
||||
<td class="text-center"><p>3765</p></td>
|
||||
<td class="text-center"><p>694582</p></td>
|
||||
<td class="text-center"><p>[0.421, 0.496, 0.082]</p></td>
|
||||
<td class="text-center"><p>[0.407, 0.507, 0.086]</p></td>
|
||||
<td><p>sparse</p></td>
|
||||
</tr>
|
||||
<tr class="row-odd"><td><p>hcr</p></td>
|
||||
<td class="text-align:center"><p>3</p></td>
|
||||
<td class="text-align:center"><p>1594</p></td>
|
||||
<td class="text-align:center"><p>798</p></td>
|
||||
<td class="text-align:center"><p>222046</p></td>
|
||||
<td class="text-align:center"><p>[0.546, 0.211, 0.243]</p></td>
|
||||
<td class="text-align:center"><p>[0.640, 0.167, 0.193]</p></td>
|
||||
<td class="text-center"><p>3</p></td>
|
||||
<td class="text-center"><p>1594</p></td>
|
||||
<td class="text-center"><p>798</p></td>
|
||||
<td class="text-center"><p>222046</p></td>
|
||||
<td class="text-center"><p>[0.546, 0.211, 0.243]</p></td>
|
||||
<td class="text-center"><p>[0.640, 0.167, 0.193]</p></td>
|
||||
<td><p>sparse</p></td>
|
||||
</tr>
|
||||
<tr class="row-even"><td><p>omd</p></td>
|
||||
<td class="text-align:center"><p>3</p></td>
|
||||
<td class="text-align:center"><p>1839</p></td>
|
||||
<td class="text-align:center"><p>787</p></td>
|
||||
<td class="text-align:center"><p>199151</p></td>
|
||||
<td class="text-align:center"><p>[0.463, 0.271, 0.266]</p></td>
|
||||
<td class="text-align:center"><p>[0.437, 0.283, 0.280]</p></td>
|
||||
<td class="text-center"><p>3</p></td>
|
||||
<td class="text-center"><p>1839</p></td>
|
||||
<td class="text-center"><p>787</p></td>
|
||||
<td class="text-center"><p>199151</p></td>
|
||||
<td class="text-center"><p>[0.463, 0.271, 0.266]</p></td>
|
||||
<td class="text-center"><p>[0.437, 0.283, 0.280]</p></td>
|
||||
<td><p>sparse</p></td>
|
||||
</tr>
|
||||
<tr class="row-odd"><td><p>sanders</p></td>
|
||||
<td class="text-align:center"><p>3</p></td>
|
||||
<td class="text-align:center"><p>2155</p></td>
|
||||
<td class="text-align:center"><p>923</p></td>
|
||||
<td class="text-align:center"><p>229399</p></td>
|
||||
<td class="text-align:center"><p>[0.161, 0.691, 0.148]</p></td>
|
||||
<td class="text-align:center"><p>[0.164, 0.688, 0.148]</p></td>
|
||||
<td class="text-center"><p>3</p></td>
|
||||
<td class="text-center"><p>2155</p></td>
|
||||
<td class="text-center"><p>923</p></td>
|
||||
<td class="text-center"><p>229399</p></td>
|
||||
<td class="text-center"><p>[0.161, 0.691, 0.148]</p></td>
|
||||
<td class="text-center"><p>[0.164, 0.688, 0.148]</p></td>
|
||||
<td><p>sparse</p></td>
|
||||
</tr>
|
||||
<tr class="row-even"><td><p>semeval13</p></td>
|
||||
<td class="text-align:center"><p>3</p></td>
|
||||
<td class="text-align:center"><p>11338</p></td>
|
||||
<td class="text-align:center"><p>3813</p></td>
|
||||
<td class="text-align:center"><p>1215742</p></td>
|
||||
<td class="text-align:center"><p>[0.159, 0.470, 0.372]</p></td>
|
||||
<td class="text-align:center"><p>[0.158, 0.430, 0.412]</p></td>
|
||||
<td class="text-center"><p>3</p></td>
|
||||
<td class="text-center"><p>11338</p></td>
|
||||
<td class="text-center"><p>3813</p></td>
|
||||
<td class="text-center"><p>1215742</p></td>
|
||||
<td class="text-center"><p>[0.159, 0.470, 0.372]</p></td>
|
||||
<td class="text-center"><p>[0.158, 0.430, 0.412]</p></td>
|
||||
<td><p>sparse</p></td>
|
||||
</tr>
|
||||
<tr class="row-odd"><td><p>semeval14</p></td>
|
||||
<td class="text-align:center"><p>3</p></td>
|
||||
<td class="text-align:center"><p>11338</p></td>
|
||||
<td class="text-align:center"><p>1853</p></td>
|
||||
<td class="text-align:center"><p>1215742</p></td>
|
||||
<td class="text-align:center"><p>[0.159, 0.470, 0.372]</p></td>
|
||||
<td class="text-align:center"><p>[0.109, 0.361, 0.530]</p></td>
|
||||
<td class="text-center"><p>3</p></td>
|
||||
<td class="text-center"><p>11338</p></td>
|
||||
<td class="text-center"><p>1853</p></td>
|
||||
<td class="text-center"><p>1215742</p></td>
|
||||
<td class="text-center"><p>[0.159, 0.470, 0.372]</p></td>
|
||||
<td class="text-center"><p>[0.109, 0.361, 0.530]</p></td>
|
||||
<td><p>sparse</p></td>
|
||||
</tr>
|
||||
<tr class="row-even"><td><p>semeval15</p></td>
|
||||
<td class="text-align:center"><p>3</p></td>
|
||||
<td class="text-align:center"><p>11338</p></td>
|
||||
<td class="text-align:center"><p>2390</p></td>
|
||||
<td class="text-align:center"><p>1215742</p></td>
|
||||
<td class="text-align:center"><p>[0.159, 0.470, 0.372]</p></td>
|
||||
<td class="text-align:center"><p>[0.153, 0.413, 0.434]</p></td>
|
||||
<td class="text-center"><p>3</p></td>
|
||||
<td class="text-center"><p>11338</p></td>
|
||||
<td class="text-center"><p>2390</p></td>
|
||||
<td class="text-center"><p>1215742</p></td>
|
||||
<td class="text-center"><p>[0.159, 0.470, 0.372]</p></td>
|
||||
<td class="text-center"><p>[0.153, 0.413, 0.434]</p></td>
|
||||
<td><p>sparse</p></td>
|
||||
</tr>
|
||||
<tr class="row-odd"><td><p>semeval16</p></td>
|
||||
<td class="text-align:center"><p>3</p></td>
|
||||
<td class="text-align:center"><p>8000</p></td>
|
||||
<td class="text-align:center"><p>2000</p></td>
|
||||
<td class="text-align:center"><p>889504</p></td>
|
||||
<td class="text-align:center"><p>[0.157, 0.351, 0.492]</p></td>
|
||||
<td class="text-align:center"><p>[0.163, 0.341, 0.497]</p></td>
|
||||
<td class="text-center"><p>3</p></td>
|
||||
<td class="text-center"><p>8000</p></td>
|
||||
<td class="text-center"><p>2000</p></td>
|
||||
<td class="text-center"><p>889504</p></td>
|
||||
<td class="text-center"><p>[0.157, 0.351, 0.492]</p></td>
|
||||
<td class="text-center"><p>[0.163, 0.341, 0.497]</p></td>
|
||||
<td><p>sparse</p></td>
|
||||
</tr>
|
||||
<tr class="row-even"><td><p>sst</p></td>
|
||||
<td class="text-align:center"><p>3</p></td>
|
||||
<td class="text-align:center"><p>2971</p></td>
|
||||
<td class="text-align:center"><p>1271</p></td>
|
||||
<td class="text-align:center"><p>376132</p></td>
|
||||
<td class="text-align:center"><p>[0.261, 0.452, 0.288]</p></td>
|
||||
<td class="text-align:center"><p>[0.207, 0.481, 0.312]</p></td>
|
||||
<td class="text-center"><p>3</p></td>
|
||||
<td class="text-center"><p>2971</p></td>
|
||||
<td class="text-center"><p>1271</p></td>
|
||||
<td class="text-center"><p>376132</p></td>
|
||||
<td class="text-center"><p>[0.261, 0.452, 0.288]</p></td>
|
||||
<td class="text-center"><p>[0.207, 0.481, 0.312]</p></td>
|
||||
<td><p>sparse</p></td>
|
||||
</tr>
|
||||
<tr class="row-odd"><td><p>wa</p></td>
|
||||
<td class="text-align:center"><p>3</p></td>
|
||||
<td class="text-align:center"><p>2184</p></td>
|
||||
<td class="text-align:center"><p>936</p></td>
|
||||
<td class="text-align:center"><p>248563</p></td>
|
||||
<td class="text-align:center"><p>[0.305, 0.414, 0.281]</p></td>
|
||||
<td class="text-align:center"><p>[0.282, 0.446, 0.272]</p></td>
|
||||
<td class="text-center"><p>3</p></td>
|
||||
<td class="text-center"><p>2184</p></td>
|
||||
<td class="text-center"><p>936</p></td>
|
||||
<td class="text-center"><p>248563</p></td>
|
||||
<td class="text-center"><p>[0.305, 0.414, 0.281]</p></td>
|
||||
<td class="text-center"><p>[0.282, 0.446, 0.272]</p></td>
|
||||
<td><p>sparse</p></td>
|
||||
</tr>
|
||||
<tr class="row-even"><td><p>wb</p></td>
|
||||
<td class="text-align:center"><p>3</p></td>
|
||||
<td class="text-align:center"><p>4259</p></td>
|
||||
<td class="text-align:center"><p>1823</p></td>
|
||||
<td class="text-align:center"><p>404333</p></td>
|
||||
<td class="text-align:center"><p>[0.270, 0.392, 0.337]</p></td>
|
||||
<td class="text-align:center"><p>[0.274, 0.392, 0.335]</p></td>
|
||||
<td class="text-center"><p>3</p></td>
|
||||
<td class="text-center"><p>4259</p></td>
|
||||
<td class="text-center"><p>1823</p></td>
|
||||
<td class="text-center"><p>404333</p></td>
|
||||
<td class="text-center"><p>[0.270, 0.392, 0.337]</p></td>
|
||||
<td class="text-center"><p>[0.274, 0.392, 0.335]</p></td>
|
||||
<td><p>sparse</p></td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
</div>
|
||||
<div class="section" id="uci-machine-learning">
|
||||
<h2>UCI Machine Learning<a class="headerlink" href="#uci-machine-learning" title="Permalink to this headline">¶</a></h2>
|
||||
</section>
|
||||
<section id="uci-machine-learning">
|
||||
<h2>UCI Machine Learning<a class="headerlink" href="#uci-machine-learning" title="Permalink to this heading">¶</a></h2>
|
||||
<p>A set of 32 datasets from the <a class="reference external" href="https://archive.ics.uci.edu/ml/datasets.php">UCI Machine Learning repository</a>
|
||||
used in:</p>
|
||||
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">Pérez</span><span class="o">-</span><span class="n">Gállego</span><span class="p">,</span> <span class="n">P</span><span class="o">.</span><span class="p">,</span> <span class="n">Quevedo</span><span class="p">,</span> <span class="n">J</span><span class="o">.</span> <span class="n">R</span><span class="o">.</span><span class="p">,</span> <span class="o">&</span> <span class="k">del</span> <span class="n">Coz</span><span class="p">,</span> <span class="n">J</span><span class="o">.</span> <span class="n">J</span><span class="o">.</span> <span class="p">(</span><span class="mi">2017</span><span class="p">)</span><span class="o">.</span>
|
||||
|
@@ -371,252 +355,252 @@ training+test dataset at a time, following a kFCV protocol:</p>
|
|||
<p>The code above allows conducting a 2x5FCV evaluation on the “yeast” dataset.</p>
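
For readers unfamiliar with the notation, 2x5FCV presumably denotes two repetitions of 5-fold cross-validation, i.e., ten train/test splits in total. The following is a minimal, library-independent sketch of how such splits can be generated over item indexes; `repeated_kfold` is a hypothetical helper, the folds are not stratified by class, and QuaPy's own routine may behave differently.

```python
import random


def repeated_kfold(n_items, n_folds=5, n_repeats=2, seed=0):
    """Yield (train_indexes, test_indexes) pairs for repeated k-fold CV."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        order = list(range(n_items))
        rng.shuffle(order)  # a fresh random partition for each repetition
        for fold in range(n_folds):
            test = order[fold::n_folds]  # every n_folds-th item forms one fold
            held_out = set(test)
            train = [i for i in order if i not in held_out]
            yield train, test
```

Each repetition partitions the items into five disjoint test folds, so every item is used for testing exactly once per repetition.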
|
||||
<p>All datasets come in numerical form (dense matrices); some statistics
|
||||
are summarized below.</p>
|
||||
<table class="colwidths-auto docutils align-default">
|
||||
<table class="docutils align-default">
|
||||
<thead>
|
||||
<tr class="row-odd"><th class="head"><p>Dataset</p></th>
|
||||
<th class="text-align:center head"><p>classes</p></th>
|
||||
<th class="text-align:center head"><p>instances</p></th>
|
||||
<th class="text-align:center head"><p>features</p></th>
|
||||
<th class="text-align:center head"><p>prev</p></th>
|
||||
<th class="head text-center"><p>classes</p></th>
|
||||
<th class="head text-center"><p>instances</p></th>
|
||||
<th class="head text-center"><p>features</p></th>
|
||||
<th class="head text-center"><p>prev</p></th>
|
||||
<th class="head"><p>type</p></th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody>
|
||||
<tr class="row-even"><td><p>acute.a</p></td>
|
||||
<td class="text-align:center"><p>2</p></td>
|
||||
<td class="text-align:center"><p>120</p></td>
|
||||
<td class="text-align:center"><p>6</p></td>
|
||||
<td class="text-align:center"><p>[0.508, 0.492]</p></td>
|
||||
<td class="text-center"><p>2</p></td>
|
||||
<td class="text-center"><p>120</p></td>
|
||||
<td class="text-center"><p>6</p></td>
|
||||
<td class="text-center"><p>[0.508, 0.492]</p></td>
|
||||
<td><p>dense</p></td>
|
||||
</tr>
|
||||
<tr class="row-odd"><td><p>acute.b</p></td>
|
||||
<td class="text-align:center"><p>2</p></td>
|
||||
<td class="text-align:center"><p>120</p></td>
|
||||
<td class="text-align:center"><p>6</p></td>
|
||||
<td class="text-align:center"><p>[0.583, 0.417]</p></td>
|
||||
<td class="text-center"><p>2</p></td>
|
||||
<td class="text-center"><p>120</p></td>
|
||||
<td class="text-center"><p>6</p></td>
|
||||
<td class="text-center"><p>[0.583, 0.417]</p></td>
|
||||
<td><p>dense</p></td>
|
||||
</tr>
|
||||
<tr class="row-even"><td><p>balance.1</p></td>
|
||||
<td class="text-align:center"><p>2</p></td>
|
||||
<td class="text-align:center"><p>625</p></td>
|
||||
<td class="text-align:center"><p>4</p></td>
|
||||
<td class="text-align:center"><p>[0.539, 0.461]</p></td>
|
||||
<td class="text-center"><p>2</p></td>
|
||||
<td class="text-center"><p>625</p></td>
|
||||
<td class="text-center"><p>4</p></td>
|
||||
<td class="text-center"><p>[0.539, 0.461]</p></td>
|
||||
<td><p>dense</p></td>
|
||||
</tr>
|
||||
<tr class="row-odd"><td><p>balance.2</p></td>
|
||||
<td class="text-align:center"><p>2</p></td>
|
||||
<td class="text-align:center"><p>625</p></td>
|
||||
<td class="text-align:center"><p>4</p></td>
|
||||
<td class="text-align:center"><p>[0.922, 0.078]</p></td>
|
||||
<td class="text-center"><p>2</p></td>
|
||||
<td class="text-center"><p>625</p></td>
|
||||
<td class="text-center"><p>4</p></td>
|
||||
<td class="text-center"><p>[0.922, 0.078]</p></td>
|
||||
<td><p>dense</p></td>
|
||||
</tr>
|
||||
<tr class="row-even"><td><p>balance.3</p></td>
|
||||
<td class="text-align:center"><p>2</p></td>
|
||||
<td class="text-align:center"><p>625</p></td>
|
||||
<td class="text-align:center"><p>4</p></td>
|
||||
<td class="text-align:center"><p>[0.539, 0.461]</p></td>
|
||||
<td class="text-center"><p>2</p></td>
|
||||
<td class="text-center"><p>625</p></td>
|
||||
<td class="text-center"><p>4</p></td>
|
||||
<td class="text-center"><p>[0.539, 0.461]</p></td>
|
||||
<td><p>dense</p></td>
|
||||
</tr>
|
||||
<tr class="row-odd"><td><p>breast-cancer</p></td>
|
||||
<td class="text-align:center"><p>2</p></td>
|
||||
<td class="text-align:center"><p>683</p></td>
|
||||
<td class="text-align:center"><p>9</p></td>
|
||||
<td class="text-align:center"><p>[0.350, 0.650]</p></td>
|
||||
<td class="text-center"><p>2</p></td>
|
||||
<td class="text-center"><p>683</p></td>
|
||||
<td class="text-center"><p>9</p></td>
|
||||
<td class="text-center"><p>[0.350, 0.650]</p></td>
|
||||
<td><p>dense</p></td>
|
||||
</tr>
|
||||
<tr class="row-even"><td><p>cmc.1</p></td>
|
||||
<td class="text-align:center"><p>2</p></td>
|
||||
<td class="text-align:center"><p>1473</p></td>
|
||||
<td class="text-align:center"><p>9</p></td>
|
||||
<td class="text-align:center"><p>[0.573, 0.427]</p></td>
|
||||
<td class="text-center"><p>2</p></td>
|
||||
<td class="text-center"><p>1473</p></td>
|
||||
<td class="text-center"><p>9</p></td>
|
||||
<td class="text-center"><p>[0.573, 0.427]</p></td>
|
||||
<td><p>dense</p></td>
|
||||
</tr>
|
||||
<tr class="row-odd"><td><p>cmc.2</p></td>
|
||||
<td class="text-align:center"><p>2</p></td>
|
||||
<td class="text-align:center"><p>1473</p></td>
|
||||
<td class="text-align:center"><p>9</p></td>
|
||||
<td class="text-align:center"><p>[0.774, 0.226]</p></td>
|
||||
<td class="text-center"><p>2</p></td>
|
||||
<td class="text-center"><p>1473</p></td>
|
||||
<td class="text-center"><p>9</p></td>
|
||||
<td class="text-center"><p>[0.774, 0.226]</p></td>
|
||||
<td><p>dense</p></td>
|
||||
</tr>
|
||||
<tr class="row-even"><td><p>cmc.3</p></td>
|
||||
<td class="text-align:center"><p>2</p></td>
|
||||
<td class="text-align:center"><p>1473</p></td>
|
||||
<td class="text-align:center"><p>9</p></td>
|
||||
<td class="text-align:center"><p>[0.653, 0.347]</p></td>
|
||||
<td class="text-center"><p>2</p></td>
|
||||
<td class="text-center"><p>1473</p></td>
|
||||
<td class="text-center"><p>9</p></td>
|
||||
<td class="text-center"><p>[0.653, 0.347]</p></td>
|
||||
<td><p>dense</p></td>
|
||||
</tr>
|
||||
<tr class="row-odd"><td><p>ctg.1</p></td>
|
||||
<td class="text-align:center"><p>2</p></td>
|
||||
<td class="text-align:center"><p>2126</p></td>
|
||||
<td class="text-align:center"><p>22</p></td>
|
||||
<td class="text-align:center"><p>[0.222, 0.778]</p></td>
|
||||
<td class="text-center"><p>2</p></td>
|
||||
<td class="text-center"><p>2126</p></td>
|
||||
<td class="text-center"><p>22</p></td>
|
||||
<td class="text-center"><p>[0.222, 0.778]</p></td>
|
||||
<td><p>dense</p></td>
|
||||
</tr>
|
||||
<tr class="row-even"><td><p>ctg.2</p></td>
|
||||
<td class="text-align:center"><p>2</p></td>
|
||||
<td class="text-align:center"><p>2126</p></td>
|
||||
<td class="text-align:center"><p>22</p></td>
|
||||
<td class="text-align:center"><p>[0.861, 0.139]</p></td>
|
||||
<td class="text-center"><p>2</p></td>
|
||||
<td class="text-center"><p>2126</p></td>
|
||||
<td class="text-center"><p>22</p></td>
|
||||
<td class="text-center"><p>[0.861, 0.139]</p></td>
|
||||
<td><p>dense</p></td>
|
||||
</tr>
|
||||
<tr class="row-odd"><td><p>ctg.3</p></td>
|
||||
<td class="text-align:center"><p>2</p></td>
|
||||
<td class="text-align:center"><p>2126</p></td>
|
||||
<td class="text-align:center"><p>22</p></td>
|
||||
<td class="text-align:center"><p>[0.917, 0.083]</p></td>
|
||||
<td class="text-center"><p>2</p></td>
|
||||
<td class="text-center"><p>2126</p></td>
|
||||
<td class="text-center"><p>22</p></td>
|
||||
<td class="text-center"><p>[0.917, 0.083]</p></td>
|
||||
<td><p>dense</p></td>
|
||||
</tr>
|
||||
<tr class="row-even"><td><p>german</p></td>
|
||||
<td class="text-align:center"><p>2</p></td>
|
||||
<td class="text-align:center"><p>1000</p></td>
|
||||
<td class="text-align:center"><p>24</p></td>
|
||||
<td class="text-align:center"><p>[0.300, 0.700]</p></td>
|
||||
<td class="text-center"><p>2</p></td>
|
||||
<td class="text-center"><p>1000</p></td>
|
||||
<td class="text-center"><p>24</p></td>
|
||||
<td class="text-center"><p>[0.300, 0.700]</p></td>
|
||||
<td><p>dense</p></td>
|
||||
</tr>
|
||||
<tr class="row-odd"><td><p>haberman</p></td>
|
||||
<td class="text-align:center"><p>2</p></td>
|
||||
<td class="text-align:center"><p>306</p></td>
|
||||
<td class="text-align:center"><p>3</p></td>
|
||||
<td class="text-align:center"><p>[0.735, 0.265]</p></td>
|
||||
<td class="text-center"><p>2</p></td>
|
||||
<td class="text-center"><p>306</p></td>
|
||||
<td class="text-center"><p>3</p></td>
|
||||
<td class="text-center"><p>[0.735, 0.265]</p></td>
|
||||
<td><p>dense</p></td>
|
||||
</tr>
|
||||
<tr class="row-even"><td><p>ionosphere</p></td>
|
||||
<td class="text-align:center"><p>2</p></td>
|
||||
<td class="text-align:center"><p>351</p></td>
|
||||
<td class="text-align:center"><p>34</p></td>
|
||||
<td class="text-align:center"><p>[0.641, 0.359]</p></td>
|
||||
<td class="text-center"><p>2</p></td>
|
||||
<td class="text-center"><p>351</p></td>
|
||||
<td class="text-center"><p>34</p></td>
|
||||
<td class="text-center"><p>[0.641, 0.359]</p></td>
|
||||
<td><p>dense</p></td>
|
||||
</tr>
|
||||
<tr class="row-odd"><td><p>iris.1</p></td>
|
||||
<td class="text-align:center"><p>2</p></td>
|
||||
<td class="text-align:center"><p>150</p></td>
|
||||
<td class="text-align:center"><p>4</p></td>
|
||||
<td class="text-align:center"><p>[0.667, 0.333]</p></td>
|
||||
<td class="text-center"><p>2</p></td>
|
||||
<td class="text-center"><p>150</p></td>
|
||||
<td class="text-center"><p>4</p></td>
|
||||
<td class="text-center"><p>[0.667, 0.333]</p></td>
|
||||
<td><p>dense</p></td>
|
||||
</tr>
|
||||
<tr class="row-even"><td><p>iris.2</p></td>
|
||||
<td class="text-align:center"><p>2</p></td>
|
||||
<td class="text-align:center"><p>150</p></td>
|
||||
<td class="text-align:center"><p>4</p></td>
|
||||
<td class="text-align:center"><p>[0.667, 0.333]</p></td>
|
||||
<td class="text-center"><p>2</p></td>
|
||||
<td class="text-center"><p>150</p></td>
|
||||
<td class="text-center"><p>4</p></td>
|
||||
<td class="text-center"><p>[0.667, 0.333]</p></td>
|
||||
<td><p>dense</p></td>
|
||||
</tr>
|
||||
<tr class="row-odd"><td><p>iris.3</p></td>
|
||||
<td class="text-align:center"><p>2</p></td>
|
||||
<td class="text-align:center"><p>150</p></td>
|
||||
<td class="text-align:center"><p>4</p></td>
|
||||
<td class="text-align:center"><p>[0.667, 0.333]</p></td>
|
||||
<td class="text-center"><p>2</p></td>
|
||||
<td class="text-center"><p>150</p></td>
|
||||
<td class="text-center"><p>4</p></td>
|
||||
<td class="text-center"><p>[0.667, 0.333]</p></td>
|
||||
<td><p>dense</p></td>
|
||||
</tr>
|
||||
<tr class="row-even"><td><p>mammographic</p></td>
|
||||
<td class="text-align:center"><p>2</p></td>
|
||||
<td class="text-align:center"><p>830</p></td>
|
||||
<td class="text-align:center"><p>5</p></td>
|
||||
<td class="text-align:center"><p>[0.514, 0.486]</p></td>
|
||||
<td class="text-center"><p>2</p></td>
|
||||
<td class="text-center"><p>830</p></td>
|
||||
<td class="text-center"><p>5</p></td>
|
||||
<td class="text-center"><p>[0.514, 0.486]</p></td>
|
||||
<td><p>dense</p></td>
|
||||
</tr>
|
||||
<tr class="row-odd"><td><p>pageblocks.5</p></td>
|
||||
<td class="text-align:center"><p>2</p></td>
|
||||
<td class="text-align:center"><p>5473</p></td>
|
||||
<td class="text-align:center"><p>10</p></td>
|
||||
<td class="text-align:center"><p>[0.979, 0.021]</p></td>
|
||||
<td class="text-center"><p>2</p></td>
|
||||
<td class="text-center"><p>5473</p></td>
|
||||
<td class="text-center"><p>10</p></td>
|
||||
<td class="text-center"><p>[0.979, 0.021]</p></td>
|
||||
<td><p>dense</p></td>
|
||||
</tr>
|
||||
<tr class="row-even"><td><p>semeion</p></td>
|
||||
<td class="text-align:center"><p>2</p></td>
|
||||
<td class="text-align:center"><p>1593</p></td>
|
||||
<td class="text-align:center"><p>256</p></td>
|
||||
<td class="text-align:center"><p>[0.901, 0.099]</p></td>
|
||||
<td class="text-center"><p>2</p></td>
|
||||
<td class="text-center"><p>1593</p></td>
|
||||
<td class="text-center"><p>256</p></td>
|
||||
<td class="text-center"><p>[0.901, 0.099]</p></td>
|
||||
<td><p>dense</p></td>
|
||||
</tr>
|
||||
<tr class="row-odd"><td><p>sonar</p></td>
|
||||
<td class="text-align:center"><p>2</p></td>
|
||||
<td class="text-align:center"><p>208</p></td>
|
||||
<td class="text-align:center"><p>60</p></td>
|
||||
<td class="text-align:center"><p>[0.534, 0.466]</p></td>
|
||||
<td class="text-center"><p>2</p></td>
|
||||
<td class="text-center"><p>208</p></td>
|
||||
<td class="text-center"><p>60</p></td>
|
||||
<td class="text-center"><p>[0.534, 0.466]</p></td>
|
||||
<td><p>dense</p></td>
|
||||
</tr>
|
||||
<tr class="row-even"><td><p>spambase</p></td>
|
||||
<td class="text-align:center"><p>2</p></td>
|
||||
<td class="text-align:center"><p>4601</p></td>
|
||||
<td class="text-align:center"><p>57</p></td>
|
||||
<td class="text-align:center"><p>[0.606, 0.394]</p></td>
|
||||
<td class="text-center"><p>2</p></td>
|
||||
<td class="text-center"><p>4601</p></td>
|
||||
<td class="text-center"><p>57</p></td>
|
||||
<td class="text-center"><p>[0.606, 0.394]</p></td>
|
||||
<td><p>dense</p></td>
|
||||
</tr>
|
||||
<tr class="row-odd"><td><p>spectf</p></td>
|
||||
<td class="text-align:center"><p>2</p></td>
|
||||
<td class="text-align:center"><p>267</p></td>
|
||||
<td class="text-align:center"><p>44</p></td>
|
||||
<td class="text-align:center"><p>[0.794, 0.206]</p></td>
|
||||
<td class="text-center"><p>2</p></td>
|
||||
<td class="text-center"><p>267</p></td>
|
||||
<td class="text-center"><p>44</p></td>
|
||||
<td class="text-center"><p>[0.794, 0.206]</p></td>
|
||||
<td><p>dense</p></td>
|
||||
</tr>
|
||||
<tr class="row-even"><td><p>tictactoe</p></td>
|
||||
<td class="text-align:center"><p>2</p></td>
|
||||
<td class="text-align:center"><p>958</p></td>
|
||||
<td class="text-align:center"><p>9</p></td>
|
||||
<td class="text-align:center"><p>[0.653, 0.347]</p></td>
|
||||
<td class="text-center"><p>2</p></td>
|
||||
<td class="text-center"><p>958</p></td>
|
||||
<td class="text-center"><p>9</p></td>
|
||||
<td class="text-center"><p>[0.653, 0.347]</p></td>
|
||||
<td><p>dense</p></td>
|
||||
</tr>
|
||||
<tr class="row-odd"><td><p>transfusion</p></td>
|
||||
<td class="text-align:center"><p>2</p></td>
|
||||
<td class="text-align:center"><p>748</p></td>
|
||||
<td class="text-align:center"><p>4</p></td>
|
||||
<td class="text-align:center"><p>[0.762, 0.238]</p></td>
|
||||
<td class="text-center"><p>2</p></td>
|
||||
<td class="text-center"><p>748</p></td>
|
||||
<td class="text-center"><p>4</p></td>
|
||||
<td class="text-center"><p>[0.762, 0.238]</p></td>
|
||||
<td><p>dense</p></td>
|
||||
</tr>
|
||||
<tr class="row-even"><td><p>wdbc</p></td>
|
||||
<td class="text-align:center"><p>2</p></td>
|
||||
<td class="text-align:center"><p>569</p></td>
|
||||
<td class="text-align:center"><p>30</p></td>
|
||||
<td class="text-align:center"><p>[0.627, 0.373]</p></td>
|
||||
<td class="text-center"><p>2</p></td>
|
||||
<td class="text-center"><p>569</p></td>
|
||||
<td class="text-center"><p>30</p></td>
|
||||
<td class="text-center"><p>[0.627, 0.373]</p></td>
|
||||
<td><p>dense</p></td>
|
||||
</tr>
|
||||
<tr class="row-odd"><td><p>wine.1</p></td>
|
||||
<td class="text-align:center"><p>2</p></td>
|
||||
<td class="text-align:center"><p>178</p></td>
|
||||
<td class="text-align:center"><p>13</p></td>
|
||||
<td class="text-align:center"><p>[0.669, 0.331]</p></td>
|
||||
<td class="text-center"><p>2</p></td>
|
||||
<td class="text-center"><p>178</p></td>
|
||||
<td class="text-center"><p>13</p></td>
|
||||
<td class="text-center"><p>[0.669, 0.331]</p></td>
|
||||
<td><p>dense</p></td>
|
||||
</tr>
|
||||
<tr class="row-even"><td><p>wine.2</p></td>
|
||||
<td class="text-align:center"><p>2</p></td>
|
||||
<td class="text-align:center"><p>178</p></td>
|
||||
<td class="text-align:center"><p>13</p></td>
|
||||
<td class="text-align:center"><p>[0.601, 0.399]</p></td>
|
||||
<td class="text-center"><p>2</p></td>
|
||||
<td class="text-center"><p>178</p></td>
|
||||
<td class="text-center"><p>13</p></td>
|
||||
<td class="text-center"><p>[0.601, 0.399]</p></td>
|
||||
<td><p>dense</p></td>
|
||||
</tr>
|
||||
<tr class="row-odd"><td><p>wine.3</p></td>
|
||||
<td class="text-align:center"><p>2</p></td>
|
||||
<td class="text-align:center"><p>178</p></td>
|
||||
<td class="text-align:center"><p>13</p></td>
|
||||
<td class="text-align:center"><p>[0.730, 0.270]</p></td>
|
||||
<td class="text-center"><p>2</p></td>
|
||||
<td class="text-center"><p>178</p></td>
|
||||
<td class="text-center"><p>13</p></td>
|
||||
<td class="text-center"><p>[0.730, 0.270]</p></td>
|
||||
<td><p>dense</p></td>
|
||||
</tr>
|
||||
<tr class="row-even"><td><p>wine-q-red</p></td>
|
||||
<td class="text-align:center"><p>2</p></td>
|
||||
<td class="text-align:center"><p>1599</p></td>
|
||||
<td class="text-align:center"><p>11</p></td>
|
||||
<td class="text-align:center"><p>[0.465, 0.535]</p></td>
|
||||
<td class="text-center"><p>2</p></td>
|
||||
<td class="text-center"><p>1599</p></td>
|
||||
<td class="text-center"><p>11</p></td>
|
||||
<td class="text-center"><p>[0.465, 0.535]</p></td>
|
||||
<td><p>dense</p></td>
|
||||
</tr>
|
||||
<tr class="row-odd"><td><p>wine-q-white</p></td>
|
||||
<td class="text-align:center"><p>2</p></td>
|
||||
<td class="text-align:center"><p>4898</p></td>
|
||||
<td class="text-align:center"><p>11</p></td>
|
||||
<td class="text-align:center"><p>[0.335, 0.665]</p></td>
|
||||
<td class="text-center"><p>2</p></td>
|
||||
<td class="text-center"><p>4898</p></td>
|
||||
<td class="text-center"><p>11</p></td>
|
||||
<td class="text-center"><p>[0.335, 0.665]</p></td>
|
||||
<td><p>dense</p></td>
|
||||
</tr>
|
||||
<tr class="row-even"><td><p>yeast</p></td>
|
||||
<td class="text-align:center"><p>2</p></td>
|
||||
<td class="text-align:center"><p>1484</p></td>
|
||||
<td class="text-align:center"><p>8</p></td>
|
||||
<td class="text-align:center"><p>[0.711, 0.289]</p></td>
|
||||
<td class="text-center"><p>2</p></td>
|
||||
<td class="text-center"><p>1484</p></td>
|
||||
<td class="text-center"><p>8</p></td>
|
||||
<td class="text-center"><p>[0.711, 0.289]</p></td>
|
||||
<td><p>dense</p></td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
<div class="section" id="issues">
|
||||
<h3>Issues:<a class="headerlink" href="#issues" title="Permalink to this headline">¶</a></h3>
|
||||
<section id="issues">
|
||||
<h3>Issues:<a class="headerlink" href="#issues" title="Permalink to this heading">¶</a></h3>
|
||||
<p>All datasets will be downloaded automatically the first time they are requested, and
|
||||
stored in the <em>quapy_data</em> folder for faster further reuse.
|
||||
However, some datasets require special actions that at the moment are not fully
|
||||
|
@ -631,10 +615,82 @@ standard Pythons packages like gzip or zip. This file would need to be uncompres
|
|||
OS-dependent software manually. Information on how to do it will be printed the first
|
||||
time the dataset is invoked.</p></li>
|
||||
</ul>
|
||||
</section>
|
||||
</section>
|
||||
<section id="lequa-datasets">
|
||||
<h2>LeQua Datasets<a class="headerlink" href="#lequa-datasets" title="Permalink to this heading">¶</a></h2>
|
||||
<p>QuaPy also provides the datasets used for the LeQua competition.
|
||||
In brief, there are 4 tasks (T1A, T1B, T2A, T2B) having to do with text quantification
|
||||
problems. Tasks T1A and T1B provide documents in vector form, while T2A and T2B provide
|
||||
raw documents instead.
|
||||
Tasks T1A and T2A are binary sentiment quantification problems, while T1B and T2B
|
||||
are multiclass quantification problems consisting of estimating the class prevalence
|
||||
values of 28 different merchandise products.</p>
|
||||
<p>Every task consists of a training set, a set of validation samples (for model selection)
|
||||
and a set of test samples (for evaluation). QuaPy returns this data as a LabelledCollection
|
||||
(training) and two generation protocols (for validation and test samples), as follows:</p>
|
||||
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">training</span><span class="p">,</span> <span class="n">val_generator</span><span class="p">,</span> <span class="n">test_generator</span> <span class="o">=</span> <span class="n">fetch_lequa2022</span><span class="p">(</span><span class="n">task</span><span class="o">=</span><span class="n">task</span><span class="p">)</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>See the <code class="docutils literal notranslate"><span class="pre">lequa2022_experiments.py</span></code> in the examples folder for further details on how to
|
||||
carry out experiments using these datasets.</p>
|
||||
<p>The datasets are downloaded only once, and stored for fast reuse.</p>
|
||||
<p>Some statistics are summarized below:</p>
|
||||
<table class="docutils align-default">
|
||||
<thead>
|
||||
<tr class="row-odd"><th class="head"><p>Dataset</p></th>
|
||||
<th class="head text-center"><p>classes</p></th>
|
||||
<th class="head text-center"><p>train size</p></th>
|
||||
<th class="head text-center"><p>validation samples</p></th>
|
||||
<th class="head text-center"><p>test samples</p></th>
|
||||
<th class="head text-center"><p>docs by sample</p></th>
|
||||
<th class="head text-center"><p>type</p></th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody>
|
||||
<tr class="row-even"><td><p>T1A</p></td>
|
||||
<td class="text-center"><p>2</p></td>
|
||||
<td class="text-center"><p>5000</p></td>
|
||||
<td class="text-center"><p>1000</p></td>
|
||||
<td class="text-center"><p>5000</p></td>
|
||||
<td class="text-center"><p>250</p></td>
|
||||
<td class="text-center"><p>vector</p></td>
|
||||
</tr>
|
||||
<tr class="row-odd"><td><p>T1B</p></td>
|
||||
<td class="text-center"><p>28</p></td>
|
||||
<td class="text-center"><p>20000</p></td>
|
||||
<td class="text-center"><p>1000</p></td>
|
||||
<td class="text-center"><p>5000</p></td>
|
||||
<td class="text-center"><p>1000</p></td>
|
||||
<td class="text-center"><p>vector</p></td>
|
||||
</tr>
|
||||
<tr class="row-even"><td><p>T2A</p></td>
|
||||
<td class="text-center"><p>2</p></td>
|
||||
<td class="text-center"><p>5000</p></td>
|
||||
<td class="text-center"><p>1000</p></td>
|
||||
<td class="text-center"><p>5000</p></td>
|
||||
<td class="text-center"><p>250</p></td>
|
||||
<td class="text-center"><p>text</p></td>
|
||||
</tr>
|
||||
<tr class="row-odd"><td><p>T2B</p></td>
|
||||
<td class="text-center"><p>28</p></td>
|
||||
<td class="text-center"><p>20000</p></td>
|
||||
<td class="text-center"><p>1000</p></td>
|
||||
<td class="text-center"><p>5000</p></td>
|
||||
<td class="text-center"><p>1000</p></td>
|
||||
<td class="text-center"><p>text</p></td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
<p>For further details on the datasets, we refer to the original
|
||||
<a class="reference external" href="https://ceur-ws.org/Vol-3180/paper-146.pdf">paper</a>:</p>
|
||||
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">Esuli</span><span class="p">,</span> <span class="n">A</span><span class="o">.</span><span class="p">,</span> <span class="n">Moreo</span><span class="p">,</span> <span class="n">A</span><span class="o">.</span><span class="p">,</span> <span class="n">Sebastiani</span><span class="p">,</span> <span class="n">F</span><span class="o">.</span><span class="p">,</span> <span class="o">&</span> <span class="n">Sperduti</span><span class="p">,</span> <span class="n">G</span><span class="o">.</span> <span class="p">(</span><span class="mi">2022</span><span class="p">)</span><span class="o">.</span>
|
||||
<span class="n">A</span> <span class="n">Detailed</span> <span class="n">Overview</span> <span class="n">of</span> <span class="n">LeQua</span><span class="o">@</span> <span class="n">CLEF</span> <span class="mi">2022</span><span class="p">:</span> <span class="n">Learning</span> <span class="n">to</span> <span class="n">Quantify</span><span class="o">.</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
<div class="section" id="adding-custom-datasets">
|
||||
<h2>Adding Custom Datasets<a class="headerlink" href="#adding-custom-datasets" title="Permalink to this headline">¶</a></h2>
|
||||
</section>
|
||||
<section id="adding-custom-datasets">
|
||||
<h2>Adding Custom Datasets<a class="headerlink" href="#adding-custom-datasets" title="Permalink to this heading">¶</a></h2>
|
||||
<p>QuaPy provides data loaders for simple formats dealing with
|
||||
text, following the format:</p>
|
||||
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">class</span><span class="o">-</span><span class="nb">id</span> \<span class="n">t</span> <span class="n">first</span> <span class="n">document</span><span class="s1">'s pre-processed text </span><span class="se">\n</span>
|
||||
|
@ -664,17 +720,20 @@ all classes to be present in the collection).</p>
|
|||
paths, in order to create a training and test pair of <em>LabelledCollection</em>,
|
||||
e.g.:</p>
|
||||
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">quapy</span> <span class="k">as</span> <span class="nn">qp</span>
|
||||
|
||||
<span class="n">train_path</span> <span class="o">=</span> <span class="s1">'../my_data/train.dat'</span>
|
||||
<span class="n">test_path</span> <span class="o">=</span> <span class="s1">'../my_data/test.dat'</span>
|
||||
|
||||
<span class="k">def</span> <span class="nf">my_custom_loader</span><span class="p">(</span><span class="n">path</span><span class="p">):</span>
|
||||
<span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="n">path</span><span class="p">,</span> <span class="s1">'rb'</span><span class="p">)</span> <span class="k">as</span> <span class="n">fin</span><span class="p">:</span>
|
||||
<span class="o">...</span>
|
||||
<span class="k">return</span> <span class="n">instances</span><span class="p">,</span> <span class="n">labels</span>
|
||||
|
||||
<span class="n">data</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">data</span><span class="o">.</span><span class="n">Dataset</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="n">train_path</span><span class="p">,</span> <span class="n">test_path</span><span class="p">,</span> <span class="n">my_custom_loader</span><span class="p">)</span>
|
||||
</pre></div>
|
||||
</div>
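<p>As a minimal sketch of what a custom loader could look like, the function below parses a file with one <code>label&lt;TAB&gt;text</code> pair per line and returns the <code>(instances, labels)</code> pair that the example above expects. The name <code>load_tab_separated</code>, the choice to keep labels as strings, and the decision to skip blank lines are assumptions made for illustration; they are not part of QuaPy.</p>

```python
import os
import tempfile


def load_tab_separated(path, encoding="utf-8"):
    """Hypothetical loader: reads `label<TAB>text` lines into two parallel lists."""
    instances, labels = [], []
    with open(path, "rt", encoding=encoding) as fin:
        for line_number, line in enumerate(fin, start=1):
            line = line.rstrip("\n")
            if not line.strip():
                continue  # assumption: blank lines carry no document
            label, sep, text = line.partition("\t")
            if not sep:
                raise ValueError(f"line {line_number}: expected a tab separator")
            labels.append(label.strip())  # labels are kept as strings here
            instances.append(text.strip())
    return instances, labels


if __name__ == "__main__":
    # Write a tiny two-document file and load it back.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "train.dat")
        with open(path, "wt", encoding="utf-8") as fout:
            fout.write("1\tgood film\n0\tboring plot\n")
        print(load_tab_separated(path))
```

<p>A function with this shape could presumably be passed as the third argument of the call shown above, provided the labels are in whatever form the rest of the pipeline expects.</p>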
|
||||
<div class="section" id="data-processing">
|
||||
<h3>Data Processing<a class="headerlink" href="#data-processing" title="Permalink to this headline">¶</a></h3>
|
||||
<section id="data-processing">
|
||||
<h3>Data Processing<a class="headerlink" href="#data-processing" title="Permalink to this heading">¶</a></h3>
|
||||
<p>QuaPy implements a number of preprocessing functions in the package <em>qp.data.preprocessing</em>, including:</p>
|
||||
<ul class="simple">
|
||||
<li><p><em>text2tfidf</em>: tfidf vectorization</p></li>
|
||||
|
@ -683,9 +742,9 @@ e.g.:</p>
|
|||
that the column values have zero mean and unit variance).</p></li>
|
||||
<li><p><em>index</em>: transforms textual tokens into lists of numeric ids</p></li>
|
||||
</ul>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
</section>
|
||||
</section>
|
||||
</section>
|
||||
|
||||
|
||||
<div class="clearer"></div>
|
||||
|
@ -694,8 +753,9 @@ that the column values have zero mean and unit variance).</p></li>
|
|||
</div>
|
||||
<div class="sphinxsidebar" role="navigation" aria-label="main navigation">
|
||||
<div class="sphinxsidebarwrapper">
|
||||
<h3><a href="index.html">Table of Contents</a></h3>
|
||||
<ul>
|
||||
<div>
|
||||
<h3><a href="index.html">Table of Contents</a></h3>
|
||||
<ul>
|
||||
<li><a class="reference internal" href="#">Datasets</a><ul>
|
||||
<li><a class="reference internal" href="#reviews-datasets">Reviews Datasets</a></li>
|
||||
<li><a class="reference internal" href="#twitter-sentiment-datasets">Twitter Sentiment Datasets</a></li>
|
||||
|
@ -703,6 +763,7 @@ that the column values have zero mean and unit variance).</p></li>
|
|||
<li><a class="reference internal" href="#issues">Issues:</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
<li><a class="reference internal" href="#lequa-datasets">LeQua Datasets</a></li>
|
||||
<li><a class="reference internal" href="#adding-custom-datasets">Adding Custom Datasets</a><ul>
|
||||
<li><a class="reference internal" href="#data-processing">Data Processing</a></li>
|
||||
</ul>
|
||||
|
@ -711,12 +772,17 @@ that the column values have zero mean and unit variance).</p></li>
|
|||
</li>
|
||||
</ul>
|
||||
|
||||
<h4>Previous topic</h4>
|
||||
<p class="topless"><a href="readme.html"
|
||||
title="previous chapter">Getting Started</a></p>
|
||||
<h4>Next topic</h4>
|
||||
<p class="topless"><a href="modules.html"
|
||||
title="next chapter">quapy</a></p>
|
||||
</div>
|
||||
<div>
|
||||
<h4>Previous topic</h4>
|
||||
<p class="topless"><a href="Installation.html"
|
||||
title="previous chapter">Installation</a></p>
|
||||
</div>
|
||||
<div>
|
||||
<h4>Next topic</h4>
|
||||
<p class="topless"><a href="Evaluation.html"
|
||||
title="next chapter">Evaluation</a></p>
|
||||
</div>
|
||||
<div role="note" aria-label="source link">
|
||||
<h3>This Page</h3>
|
||||
<ul class="this-page-menu">
|
||||
|
@ -733,7 +799,7 @@ that the column values have zero mean and unit variance).</p></li>
|
|||
</form>
|
||||
</div>
|
||||
</div>
|
||||
<script>$('#searchbox').show(0);</script>
|
||||
<script>document.getElementById('searchbox').style.display = "block"</script>
|
||||
</div>
|
||||
</div>
|
||||
<div class="clearer"></div>
|
||||
|
@ -748,18 +814,18 @@ that the column values have zero mean and unit variance).</p></li>
|
|||
<a href="py-modindex.html" title="Python Module Index"
|
||||
>modules</a> |</li>
|
||||
<li class="right" >
|
||||
<a href="modules.html" title="quapy"
|
||||
<a href="Evaluation.html" title="Evaluation"
|
||||
>next</a> |</li>
|
||||
<li class="right" >
|
||||
<a href="readme.html" title="Getting Started"
|
||||
<a href="Installation.html" title="Installation"
|
||||
>previous</a> |</li>
|
||||
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.6 documentation</a> »</li>
|
||||
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.7 documentation</a> »</li>
|
||||
<li class="nav-item nav-item-this"><a href="">Datasets</a></li>
|
||||
</ul>
|
||||
</div>
|
||||
<div class="footer" role="contentinfo">
|
||||
© Copyright 2021, Alejandro Moreo.
|
||||
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 4.2.0.
|
||||
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 5.3.0.
|
||||
</div>
|
||||
</body>
|
||||
</html>
|
|
@ -2,22 +2,25 @@
|
|||
|
||||
<!doctype html>
|
||||
|
||||
<html>
|
||||
<html lang="en">
|
||||
<head>
|
||||
<meta charset="utf-8" />
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
|
||||
<title>Evaluation — QuaPy 0.1.6 documentation</title>
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="generator" content="Docutils 0.19: https://docutils.sourceforge.io/" />
|
||||
|
||||
<title>Evaluation — QuaPy 0.1.7 documentation</title>
|
||||
<link rel="stylesheet" type="text/css" href="_static/pygments.css" />
|
||||
<link rel="stylesheet" type="text/css" href="_static/bizstyle.css" />
|
||||
|
||||
<script data-url_root="./" id="documentation_options" src="_static/documentation_options.js"></script>
|
||||
<script src="_static/jquery.js"></script>
|
||||
<script src="_static/underscore.js"></script>
|
||||
<script src="_static/_sphinx_javascript_frameworks_compat.js"></script>
|
||||
<script src="_static/doctools.js"></script>
|
||||
<script src="_static/sphinx_highlight.js"></script>
|
||||
<script src="_static/bizstyle.js"></script>
|
||||
<link rel="index" title="Index" href="genindex.html" />
|
||||
<link rel="search" title="Search" href="search.html" />
|
||||
<link rel="next" title="Quantification Methods" href="Methods.html" />
|
||||
<link rel="next" title="Protocols" href="Protocols.html" />
|
||||
<link rel="prev" title="Datasets" href="Datasets.html" />
|
||||
<meta name="viewport" content="width=device-width,initial-scale=1.0" />
|
||||
<!--[if lt IE 9]>
|
||||
|
@ -34,12 +37,12 @@
|
|||
<a href="py-modindex.html" title="Python Module Index"
|
||||
>modules</a> |</li>
|
||||
<li class="right" >
|
||||
<a href="Methods.html" title="Quantification Methods"
|
||||
<a href="Protocols.html" title="Protocols"
|
||||
accesskey="N">next</a> |</li>
|
||||
<li class="right" >
|
||||
<a href="Datasets.html" title="Datasets"
|
||||
accesskey="P">previous</a> |</li>
|
||||
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.6 documentation</a> »</li>
|
||||
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.7 documentation</a> »</li>
|
||||
<li class="nav-item nav-item-this"><a href="">Evaluation</a></li>
|
||||
</ul>
|
||||
</div>
|
||||
|
@ -49,8 +52,8 @@
|
|||
<div class="bodywrapper">
|
||||
<div class="body" role="main">
|
||||
|
||||
<div class="tex2jax_ignore mathjax_ignore section" id="evaluation">
|
||||
<h1>Evaluation<a class="headerlink" href="#evaluation" title="Permalink to this headline">¶</a></h1>
|
||||
<section id="evaluation">
|
||||
<h1>Evaluation<a class="headerlink" href="#evaluation" title="Permalink to this heading">¶</a></h1>
|
||||
<p>Quantification is an appealing tool in scenarios of dataset shift,
|
||||
and particularly in scenarios of prior-probability shift.
|
||||
That is, the interest in estimating the class prevalences arises
|
||||
|
@ -62,8 +65,8 @@ to be unlikely (as is the case in general scenarios of
|
|||
machine learning governed by the iid assumption).
|
||||
In brief, quantification requires dedicated evaluation protocols,
|
||||
which are implemented in QuaPy and explained here.</p>
|
||||
<div class="section" id="error-measures">
|
||||
<h2>Error Measures<a class="headerlink" href="#error-measures" title="Permalink to this headline">¶</a></h2>
|
||||
<section id="error-measures">
|
||||
<h2>Error Measures<a class="headerlink" href="#error-measures" title="Permalink to this heading">¶</a></h2>
|
||||
<p>The module quapy.error implements the following error measures for quantification:</p>
|
||||
<ul class="simple">
|
||||
<li><p><em>mae</em>: mean absolute error</p></li>
|
||||
|
@ -96,13 +99,13 @@ third argument, e.g.:</p>
|
|||
Traditionally, this value is set to 1/(2T) in past literature,
|
||||
with T the sampling size. One could either pass this value
|
||||
to the function each time, or set QuaPy’s environment
|
||||
variable <em>SAMPLE_SIZE</em> once, and ommit this argument
|
||||
variable <em>SAMPLE_SIZE</em> once, and omit this argument
|
||||
thereafter (recommended);
|
||||
e.g.:</p>
|
||||
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">qp</span><span class="o">.</span><span class="n">environ</span><span class="p">[</span><span class="s1">'SAMPLE_SIZE'</span><span class="p">]</span> <span class="o">=</span> <span class="mi">100</span> <span class="c1"># once for all</span>
|
||||
<span class="n">true_prev</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">asarray</span><span class="p">([</span><span class="mf">0.5</span><span class="p">,</span> <span class="mf">0.3</span><span class="p">,</span> <span class="mf">0.2</span><span class="p">])</span> <span class="c1"># let's assume 3 classes</span>
|
||||
<span class="n">estim_prev</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">asarray</span><span class="p">([</span><span class="mf">0.1</span><span class="p">,</span> <span class="mf">0.3</span><span class="p">,</span> <span class="mf">0.6</span><span class="p">])</span>
|
||||
<span class="n">error</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">ae_</span><span class="o">.</span><span class="n">mrae</span><span class="p">(</span><span class="n">true_prev</span><span class="p">,</span> <span class="n">estim_prev</span><span class="p">)</span>
|
||||
<span class="n">error</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">error</span><span class="o">.</span><span class="n">mrae</span><span class="p">(</span><span class="n">true_prev</span><span class="p">,</span> <span class="n">estim_prev</span><span class="p">)</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s1">'mrae(</span><span class="si">{</span><span class="n">true_prev</span><span class="si">}</span><span class="s1">, </span><span class="si">{</span><span class="n">estim_prev</span><span class="si">}</span><span class="s1">) = </span><span class="si">{</span><span class="n">error</span><span class="si">:</span><span class="s1">.3f</span><span class="si">}</span><span class="s1">'</span><span class="p">)</span>
|
||||
</pre></div>
|
||||
</div>
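<p>For intuition, the smoothing described above can be sketched in plain Python. This is a minimal illustration rather than QuaPy's own code: the helper names <em>smooth</em> and <em>rae</em>, and the additive form of the smoothing, are assumptions made here, with the smoothing factor set to 1/(2T) for T=100.</p>

```python
def smooth(prev, eps):
    """Additively smooth a prevalence vector so that no entry is zero (assumed form)."""
    n_classes = len(prev)
    return [(p + eps) / (1.0 + eps * n_classes) for p in prev]


def rae(true_prev, estim_prev, eps):
    """Relative absolute error between two prevalence vectors, after smoothing both."""
    true_s = smooth(true_prev, eps)
    estim_s = smooth(estim_prev, eps)
    return sum(abs(e - t) / t for t, e in zip(true_s, estim_s)) / len(true_s)


sample_size = 100                  # T, the sampling size
eps = 1.0 / (2 * sample_size)      # the traditional 1/(2T) smoothing factor

true_prev = [0.5, 0.3, 0.2]        # three classes, as in the example above
estim_prev = [0.1, 0.3, 0.6]
error = rae(true_prev, estim_prev, eps)
print(f'rae({true_prev}, {estim_prev}) = {error:.3f}')
```

<p>Without smoothing, a true prevalence of exactly zero would cause a division by zero in the relative error, which is why a small eps tied to the sample size is used.</p>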
|
||||
|
@ -112,150 +115,95 @@ e.g.:</p>
|
|||
</div>
|
||||
<p>Finally, it is possible to instantiate QuaPy’s quantification
|
||||
error functions from strings using, e.g.:</p>
|
||||
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">error_function</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">ae_</span><span class="o">.</span><span class="n">from_name</span><span class="p">(</span><span class="s1">'mse'</span><span class="p">)</span>
|
||||
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">error_function</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">error</span><span class="o">.</span><span class="n">from_name</span><span class="p">(</span><span class="s1">'mse'</span><span class="p">)</span>
|
||||
<span class="n">error</span> <span class="o">=</span> <span class="n">error_function</span><span class="p">(</span><span class="n">true_prev</span><span class="p">,</span> <span class="n">estim_prev</span><span class="p">)</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
</div>
|
||||
<div class="section" id="evaluation-protocols">
|
||||
<h2>Evaluation Protocols<a class="headerlink" href="#evaluation-protocols" title="Permalink to this headline">¶</a></h2>
|
||||
<p>QuaPy implements the so-called “artificial sampling protocol”,
|
||||
according to which a test set is used to generate samplings at
|
||||
desired prevalences of fixed size and covering the full spectrum
|
||||
of prevalences. This protocol is called “artificial” in contrast
|
||||
to the “natural prevalence sampling” protocol that,
|
||||
despite introducing some variability during sampling, approximately
|
||||
preserves the training class prevalence.</p>
|
||||
<p>In the artificial sampling protocol, the user specifies the number
|
||||
of (equally distant) points to be generated from the interval [0,1].</p>
|
||||
<p>For example, if n_prevpoints=11 then, for each class, the prevalences
|
||||
[0., 0.1, 0.2, …, 1.] will be used. This means that, for two classes,
|
||||
the number of different prevalences will be 11 (since, once the prevalence
|
||||
of one class is determined, the other one is constrained). For 3 classes,
|
||||
the number of valid combinations can be obtained as 11 + 10 + … + 1 = 66.
|
||||
In general, the number of valid combinations that will be produced for a given
|
||||
value of n_prevpoints can be consulted by invoking
|
||||
quapy.functional.num_prevalence_combinations, e.g.:</p>
|
||||
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">quapy.functional</span> <span class="k">as</span> <span class="nn">F</span>
|
||||
<span class="n">n_prevpoints</span> <span class="o">=</span> <span class="mi">21</span>
|
||||
<span class="n">n_classes</span> <span class="o">=</span> <span class="mi">4</span>
|
||||
<span class="n">n</span> <span class="o">=</span> <span class="n">F</span><span class="o">.</span><span class="n">num_prevalence_combinations</span><span class="p">(</span><span class="n">n_prevpoints</span><span class="p">,</span> <span class="n">n_classes</span><span class="p">,</span> <span class="n">n_repeats</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
|
||||
</section>
|
||||
<section id="evaluation-protocols">
|
||||
<h2>Evaluation Protocols<a class="headerlink" href="#evaluation-protocols" title="Permalink to this heading">¶</a></h2>
|
||||
<p>An <em>evaluation protocol</em> is an evaluation procedure that uses
|
||||
one specific <em>sample generation protocol</em> to generate many
|
||||
samples, typically characterized by widely varying amounts of
|
||||
<em>shift</em> with respect to the original distribution, that are then
|
||||
used to evaluate the performance of a (trained) quantifier.
|
||||
These protocols are explained in more detail in a dedicated <a class="reference internal" href="Protocols.html"><span class="doc std std-doc">entry
|
||||
in the wiki</span></a>. For the time being, let us assume we have already
|
||||
chosen and instantiated one specific such protocol, that we here
|
||||
simply call <em>prot</em>. Let us also assume our model is called
|
||||
<em>quantifier</em> and that our evaluation measure of choice is
|
||||
<em>mae</em>. The evaluation comes down to:</p>
|
||||
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">mae</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">evaluation</span><span class="o">.</span><span class="n">evaluate</span><span class="p">(</span><span class="n">quantifier</span><span class="p">,</span> <span class="n">protocol</span><span class="o">=</span><span class="n">prot</span><span class="p">,</span> <span class="n">error_metric</span><span class="o">=</span><span class="s1">'mae'</span><span class="p">)</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s1">'MAE = </span><span class="si">{</span><span class="n">mae</span><span class="si">:</span><span class="s1">.4f</span><span class="si">}</span><span class="s1">'</span><span class="p">)</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>in this example, n=1771. Note the last argument, n_repeats, that
|
||||
informs of the number of examples that will be generated for any
|
||||
valid combination (typical values are, e.g., 1 for a single sample,
|
||||
or 10 or higher for computing standard deviations or performing statistical
|
||||
significance tests).</p>
|
||||
<p>One can instead work the other way around, i.e., one could set a
|
||||
maximum budget of evaluations and get the number of prevalence points that
|
||||
will generate a number of evaluations close, but not higher, than
|
||||
the fixed budget. This can be achieved with the function
|
||||
quapy.functional.get_nprevpoints_approximation, e.g.:</p>
|
||||
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">budget</span> <span class="o">=</span> <span class="mi">5000</span>
|
||||
<span class="n">n_prevpoints</span> <span class="o">=</span> <span class="n">F</span><span class="o">.</span><span class="n">get_nprevpoints_approximation</span><span class="p">(</span><span class="n">budget</span><span class="p">,</span> <span class="n">n_classes</span><span class="p">,</span> <span class="n">n_repeats</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
|
||||
<span class="n">n</span> <span class="o">=</span> <span class="n">F</span><span class="o">.</span><span class="n">num_prevalence_combinations</span><span class="p">(</span><span class="n">n_prevpoints</span><span class="p">,</span> <span class="n">n_classes</span><span class="p">,</span> <span class="n">n_repeats</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s1">'by setting n_prevpoints=</span><span class="si">{</span><span class="n">n_prevpoints</span><span class="si">}</span><span class="s1"> the number of evaluations for </span><span class="si">{</span><span class="n">n_classes</span><span class="si">}</span><span class="s1"> classes will be </span><span class="si">{</span><span class="n">n</span><span class="si">}</span><span class="s1">'</span><span class="p">)</span>
|
||||
<p>It is often desirable to evaluate our system using more than one
|
||||
single evaluation measure. In this case, it is convenient to generate
|
||||
a <em>report</em>. A report in QuaPy is a dataframe accounting for all the
|
||||
true prevalence values with their corresponding prevalence values
|
||||
as estimated by the quantifier, along with the error each has given
|
||||
rise to.</p>
|
||||
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">report</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">evaluation</span><span class="o">.</span><span class="n">evaluation_report</span><span class="p">(</span><span class="n">quantifier</span><span class="p">,</span> <span class="n">protocol</span><span class="o">=</span><span class="n">prot</span><span class="p">,</span> <span class="n">error_metrics</span><span class="o">=</span><span class="p">[</span><span class="s1">'mae'</span><span class="p">,</span> <span class="s1">'mrae'</span><span class="p">,</span> <span class="s1">'mkld'</span><span class="p">])</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>that will print:</p>
|
||||
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">by</span> <span class="n">setting</span> <span class="n">n_prevpoints</span><span class="o">=</span><span class="mi">30</span> <span class="n">the</span> <span class="n">number</span> <span class="n">of</span> <span class="n">evaluations</span> <span class="k">for</span> <span class="mi">4</span> <span class="n">classes</span> <span class="n">will</span> <span class="n">be</span> <span class="mi">4960</span>
|
||||
</pre></div>
|
||||
</div>
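<p>The counts quoted above (66, 1771 and 4960) are consistent with a stars-and-bars argument: with n_prevpoints equally spaced values, each prevalence is a multiple of 1/(n_prevpoints-1), and the valid combinations are the ways of distributing n_prevpoints-1 steps among the classes. The following sketch reproduces those numbers using only the standard library; the function names are hypothetical stand-ins chosen here, not the quapy.functional implementations.</p>

```python
from math import comb


def count_prevalence_combinations(n_prevpoints, n_classes, n_repeats=1):
    """Count the prevalence vectors on a grid of n_prevpoints values whose entries sum to 1."""
    n_steps = n_prevpoints - 1
    # stars and bars: non-negative integer solutions of k_1 + ... + k_c = n_steps
    return comb(n_steps + n_classes - 1, n_classes - 1) * n_repeats


def largest_nprevpoints_within(budget, n_classes, n_repeats=1):
    """Largest n_prevpoints whose number of combinations does not exceed the budget.

    Assumes budget >= n_repeats, so that at least one prevalence point fits.
    """
    n_prevpoints = 1
    while count_prevalence_combinations(n_prevpoints + 1, n_classes, n_repeats) <= budget:
        n_prevpoints += 1
    return n_prevpoints


print(count_prevalence_combinations(11, 3))    # the 3-class case: 11 + 10 + ... + 1
print(count_prevalence_combinations(21, 4))    # the 4-class example
print(largest_nprevpoints_within(5000, 4))     # the budget example
```

<p>The linear search is deliberately simple; since the count grows polynomially with n_prevpoints, it terminates quickly for realistic budgets.</p>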
|
||||
<p>The cost of evaluation will depend on the values of <em>n_prevpoints</em>, <em>n_classes</em>,
|
||||
and <em>n_repeats</em>. Since it might sometimes be cumbersome to control the overall
|
||||
cost of an experiment having to do with the number of combinations that
|
||||
will be generated for a particular setting of these arguments (particularly
|
||||
when <em>n_classes>2</em>), evaluation functions
|
||||
typically allow the user to rather specify an <em>evaluation budget</em>, i.e., a maximum
|
||||
number of samplings to generate. By specifying this argument, one could avoid
|
||||
specifying <em>n_prevpoints</em>; the value that yields the number of evaluations closest to
|
||||
the budget, without surpassing it, will be set automatically.</p>
|
||||
<p>The following script shows a full example in which a PACC model relying
|
||||
on a Logistic Regressor classifier is
|
||||
tested on the <em>kindle</em> dataset by means of the artificial prevalence
|
||||
sampling protocol on samples of size 500, in terms of various
|
||||
evaluation metrics.</p>
|
||||
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">quapy</span> <span class="k">as</span> <span class="nn">qp</span>
|
||||
<span class="kn">import</span> <span class="nn">quapy.functional</span> <span class="k">as</span> <span class="nn">F</span>
|
||||
<span class="kn">from</span> <span class="nn">sklearn.linear_model</span> <span class="kn">import</span> <span class="n">LogisticRegression</span>
|
||||
<p>From a pandas’ dataframe, it is straightforward to visualize all the results,
|
||||
and compute the averaged values, e.g.:</p>
|
||||
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">pd</span><span class="o">.</span><span class="n">set_option</span><span class="p">(</span><span class="s1">'display.expand_frame_repr'</span><span class="p">,</span> <span class="kc">False</span><span class="p">)</span>
|
||||
<span class="n">report</span><span class="p">[</span><span class="s1">'estim-prev'</span><span class="p">]</span> <span class="o">=</span> <span class="n">report</span><span class="p">[</span><span class="s1">'estim-prev'</span><span class="p">]</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="n">F</span><span class="o">.</span><span class="n">strprev</span><span class="p">)</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="n">report</span><span class="p">)</span>
|
||||
|
||||
<span class="n">qp</span><span class="o">.</span><span class="n">environ</span><span class="p">[</span><span class="s1">'SAMPLE_SIZE'</span><span class="p">]</span> <span class="o">=</span> <span class="mi">500</span>
|
||||
|
||||
<span class="n">dataset</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">datasets</span><span class="o">.</span><span class="n">fetch_reviews</span><span class="p">(</span><span class="s1">'kindle'</span><span class="p">)</span>
|
||||
<span class="n">qp</span><span class="o">.</span><span class="n">data</span><span class="o">.</span><span class="n">preprocessing</span><span class="o">.</span><span class="n">text2tfidf</span><span class="p">(</span><span class="n">dataset</span><span class="p">,</span> <span class="n">min_df</span><span class="o">=</span><span class="mi">5</span><span class="p">,</span> <span class="n">inplace</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
|
||||
|
||||
<span class="n">training</span> <span class="o">=</span> <span class="n">dataset</span><span class="o">.</span><span class="n">training</span>
|
||||
<span class="n">test</span> <span class="o">=</span> <span class="n">dataset</span><span class="o">.</span><span class="n">test</span>
|
||||
|
||||
<span class="n">lr</span> <span class="o">=</span> <span class="n">LogisticRegression</span><span class="p">()</span>
|
||||
<span class="n">pacc</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">method</span><span class="o">.</span><span class="n">aggregative</span><span class="o">.</span><span class="n">PACC</span><span class="p">(</span><span class="n">lr</span><span class="p">)</span>
|
||||
|
||||
<span class="n">pacc</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">training</span><span class="p">)</span>
|
||||
|
||||
<span class="n">df</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">evaluation</span><span class="o">.</span><span class="n">artificial_sampling_report</span><span class="p">(</span>
|
||||
<span class="n">pacc</span><span class="p">,</span> <span class="c1"># the quantification method</span>
|
||||
<span class="n">test</span><span class="p">,</span> <span class="c1"># the test set on which the method will be evaluated</span>
|
||||
<span class="n">sample_size</span><span class="o">=</span><span class="n">qp</span><span class="o">.</span><span class="n">environ</span><span class="p">[</span><span class="s1">'SAMPLE_SIZE'</span><span class="p">],</span> <span class="c1">#indicates the size of samples to be drawn</span>
|
||||
<span class="n">n_prevpoints</span><span class="o">=</span><span class="mi">11</span><span class="p">,</span> <span class="c1"># how many prevalence points will be extracted from the interval [0, 1] for each category</span>
|
||||
<span class="n">n_repetitions</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="c1"># number of times each prevalence will be used to generate a test sample</span>
|
||||
<span class="n">n_jobs</span><span class="o">=-</span><span class="mi">1</span><span class="p">,</span> <span class="c1"># indicates the number of parallel workers (-1 indicates, as in sklearn, all CPUs)</span>
|
||||
<span class="n">random_seed</span><span class="o">=</span><span class="mi">42</span><span class="p">,</span> <span class="c1"># setting a random seed allows to replicate the test samples across runs</span>
|
||||
<span class="n">error_metrics</span><span class="o">=</span><span class="p">[</span><span class="s1">'mae'</span><span class="p">,</span> <span class="s1">'mrae'</span><span class="p">,</span> <span class="s1">'mkld'</span><span class="p">],</span> <span class="c1"># specify the evaluation metrics</span>
|
||||
<span class="n">verbose</span><span class="o">=</span><span class="kc">True</span> <span class="c1"># set to True to show some standard-line outputs</span>
|
||||
<span class="p">)</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="s1">'Averaged values:'</span><span class="p">)</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="n">report</span><span class="o">.</span><span class="n">mean</span><span class="p">())</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>The resulting report is a pandas’ dataframe that can be directly printed.
|
||||
Here, we set some display options from pandas just to make the output clearer;
|
||||
note also that the estimated prevalences are shown as strings using the
|
||||
strprev function, which simply converts a prevalence into a
|
||||
string representing it, with a fixed decimal precision (default 3):</p>
|
||||
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">pandas</span> <span class="k">as</span> <span class="nn">pd</span>
|
||||
<span class="n">pd</span><span class="o">.</span><span class="n">set_option</span><span class="p">(</span><span class="s1">'display.expand_frame_repr'</span><span class="p">,</span> <span class="kc">False</span><span class="p">)</span>
|
||||
<span class="n">pd</span><span class="o">.</span><span class="n">set_option</span><span class="p">(</span><span class="s2">"precision"</span><span class="p">,</span> <span class="mi">3</span><span class="p">)</span>
|
||||
<span class="n">df</span><span class="p">[</span><span class="s1">'estim-prev'</span><span class="p">]</span> <span class="o">=</span> <span class="n">df</span><span class="p">[</span><span class="s1">'estim-prev'</span><span class="p">]</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="n">F</span><span class="o">.</span><span class="n">strprev</span><span class="p">)</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="n">df</span><span class="p">)</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>The output should look like:</p>
|
||||
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span> <span class="n">true</span><span class="o">-</span><span class="n">prev</span> <span class="n">estim</span><span class="o">-</span><span class="n">prev</span> <span class="n">mae</span> <span class="n">mrae</span> <span class="n">mkld</span>
|
||||
<span class="mi">0</span> <span class="p">[</span><span class="mf">0.0</span><span class="p">,</span> <span class="mf">1.0</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.000</span><span class="p">,</span> <span class="mf">1.000</span><span class="p">]</span> <span class="mf">0.000</span> <span class="mf">0.000</span> <span class="mf">0.000e+00</span>
|
||||
<span class="mi">1</span> <span class="p">[</span><span class="mf">0.1</span><span class="p">,</span> <span class="mf">0.9</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.091</span><span class="p">,</span> <span class="mf">0.909</span><span class="p">]</span> <span class="mf">0.009</span> <span class="mf">0.048</span> <span class="mf">4.426e-04</span>
|
||||
<span class="mi">2</span> <span class="p">[</span><span class="mf">0.2</span><span class="p">,</span> <span class="mf">0.8</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.163</span><span class="p">,</span> <span class="mf">0.837</span><span class="p">]</span> <span class="mf">0.037</span> <span class="mf">0.114</span> <span class="mf">4.633e-03</span>
|
||||
<span class="mi">3</span> <span class="p">[</span><span class="mf">0.3</span><span class="p">,</span> <span class="mf">0.7</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.283</span><span class="p">,</span> <span class="mf">0.717</span><span class="p">]</span> <span class="mf">0.017</span> <span class="mf">0.041</span> <span class="mf">7.383e-04</span>
|
||||
<span class="mi">4</span> <span class="p">[</span><span class="mf">0.4</span><span class="p">,</span> <span class="mf">0.6</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.366</span><span class="p">,</span> <span class="mf">0.634</span><span class="p">]</span> <span class="mf">0.034</span> <span class="mf">0.070</span> <span class="mf">2.412e-03</span>
|
||||
<span class="mi">5</span> <span class="p">[</span><span class="mf">0.5</span><span class="p">,</span> <span class="mf">0.5</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.459</span><span class="p">,</span> <span class="mf">0.541</span><span class="p">]</span> <span class="mf">0.041</span> <span class="mf">0.082</span> <span class="mf">3.387e-03</span>
|
||||
<span class="mi">6</span> <span class="p">[</span><span class="mf">0.6</span><span class="p">,</span> <span class="mf">0.4</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.565</span><span class="p">,</span> <span class="mf">0.435</span><span class="p">]</span> <span class="mf">0.035</span> <span class="mf">0.073</span> <span class="mf">2.535e-03</span>
|
||||
<span class="mi">7</span> <span class="p">[</span><span class="mf">0.7</span><span class="p">,</span> <span class="mf">0.3</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.654</span><span class="p">,</span> <span class="mf">0.346</span><span class="p">]</span> <span class="mf">0.046</span> <span class="mf">0.108</span> <span class="mf">4.701e-03</span>
|
||||
<span class="mi">8</span> <span class="p">[</span><span class="mf">0.8</span><span class="p">,</span> <span class="mf">0.2</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.725</span><span class="p">,</span> <span class="mf">0.275</span><span class="p">]</span> <span class="mf">0.075</span> <span class="mf">0.235</span> <span class="mf">1.515e-02</span>
|
||||
<span class="mi">9</span> <span class="p">[</span><span class="mf">0.9</span><span class="p">,</span> <span class="mf">0.1</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.858</span><span class="p">,</span> <span class="mf">0.142</span><span class="p">]</span> <span class="mf">0.042</span> <span class="mf">0.229</span> <span class="mf">7.740e-03</span>
|
||||
<span class="mi">10</span> <span class="p">[</span><span class="mf">1.0</span><span class="p">,</span> <span class="mf">0.0</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.945</span><span class="p">,</span> <span class="mf">0.055</span><span class="p">]</span> <span class="mf">0.055</span> <span class="mf">27.357</span> <span class="mf">5.219e-02</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>One can get the averaged scores using standard pandas’
|
||||
functions, i.e.:</p>
|
||||
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="nb">print</span><span class="p">(</span><span class="n">df</span><span class="o">.</span><span class="n">mean</span><span class="p">())</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>will produce the following output:</p>
|
||||
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">true</span><span class="o">-</span><span class="n">prev</span> <span class="mf">0.500</span>
|
||||
<span class="n">mae</span> <span class="mf">0.035</span>
|
||||
<span class="n">mrae</span> <span class="mf">2.578</span>
|
||||
<span class="n">mkld</span> <span class="mf">0.009</span>
|
||||
<p>This will produce an output like:</p>
|
||||
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span> <span class="n">true</span><span class="o">-</span><span class="n">prev</span> <span class="n">estim</span><span class="o">-</span><span class="n">prev</span> <span class="n">mae</span> <span class="n">mrae</span> <span class="n">mkld</span>
|
||||
<span class="mi">0</span> <span class="p">[</span><span class="mf">0.308</span><span class="p">,</span> <span class="mf">0.692</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.314</span><span class="p">,</span> <span class="mf">0.686</span><span class="p">]</span> <span class="mf">0.005649</span> <span class="mf">0.013182</span> <span class="mf">0.000074</span>
|
||||
<span class="mi">1</span> <span class="p">[</span><span class="mf">0.896</span><span class="p">,</span> <span class="mf">0.104</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.909</span><span class="p">,</span> <span class="mf">0.091</span><span class="p">]</span> <span class="mf">0.013145</span> <span class="mf">0.069323</span> <span class="mf">0.000985</span>
|
||||
<span class="mi">2</span> <span class="p">[</span><span class="mf">0.848</span><span class="p">,</span> <span class="mf">0.152</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.809</span><span class="p">,</span> <span class="mf">0.191</span><span class="p">]</span> <span class="mf">0.039063</span> <span class="mf">0.149806</span> <span class="mf">0.005175</span>
|
||||
<span class="mi">3</span> <span class="p">[</span><span class="mf">0.016</span><span class="p">,</span> <span class="mf">0.984</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.033</span><span class="p">,</span> <span class="mf">0.967</span><span class="p">]</span> <span class="mf">0.017236</span> <span class="mf">0.487529</span> <span class="mf">0.005298</span>
|
||||
<span class="mi">4</span> <span class="p">[</span><span class="mf">0.728</span><span class="p">,</span> <span class="mf">0.272</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.751</span><span class="p">,</span> <span class="mf">0.249</span><span class="p">]</span> <span class="mf">0.022769</span> <span class="mf">0.057146</span> <span class="mf">0.001350</span>
|
||||
<span class="o">...</span> <span class="o">...</span> <span class="o">...</span> <span class="o">...</span> <span class="o">...</span> <span class="o">...</span>
|
||||
<span class="mi">4995</span> <span class="p">[</span><span class="mf">0.72</span><span class="p">,</span> <span class="mf">0.28</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.698</span><span class="p">,</span> <span class="mf">0.302</span><span class="p">]</span> <span class="mf">0.021752</span> <span class="mf">0.053631</span> <span class="mf">0.001133</span>
|
||||
<span class="mi">4996</span> <span class="p">[</span><span class="mf">0.868</span><span class="p">,</span> <span class="mf">0.132</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.888</span><span class="p">,</span> <span class="mf">0.112</span><span class="p">]</span> <span class="mf">0.020490</span> <span class="mf">0.088230</span> <span class="mf">0.001985</span>
|
||||
<span class="mi">4997</span> <span class="p">[</span><span class="mf">0.292</span><span class="p">,</span> <span class="mf">0.708</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.298</span><span class="p">,</span> <span class="mf">0.702</span><span class="p">]</span> <span class="mf">0.006149</span> <span class="mf">0.014788</span> <span class="mf">0.000090</span>
|
||||
<span class="mi">4998</span> <span class="p">[</span><span class="mf">0.24</span><span class="p">,</span> <span class="mf">0.76</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.220</span><span class="p">,</span> <span class="mf">0.780</span><span class="p">]</span> <span class="mf">0.019950</span> <span class="mf">0.054309</span> <span class="mf">0.001127</span>
|
||||
<span class="mi">4999</span> <span class="p">[</span><span class="mf">0.948</span><span class="p">,</span> <span class="mf">0.052</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.965</span><span class="p">,</span> <span class="mf">0.035</span><span class="p">]</span> <span class="mf">0.016941</span> <span class="mf">0.165776</span> <span class="mf">0.003538</span>
|
||||
|
||||
<span class="p">[</span><span class="mi">5000</span> <span class="n">rows</span> <span class="n">x</span> <span class="mi">5</span> <span class="n">columns</span><span class="p">]</span>
|
||||
<span class="n">Averaged</span> <span class="n">values</span><span class="p">:</span>
|
||||
<span class="n">mae</span> <span class="mf">0.023588</span>
|
||||
<span class="n">mrae</span> <span class="mf">0.108779</span>
|
||||
<span class="n">mkld</span> <span class="mf">0.003631</span>
|
||||
<span class="n">dtype</span><span class="p">:</span> <span class="n">float64</span>
|
||||
|
||||
<span class="n">Process</span> <span class="n">finished</span> <span class="k">with</span> <span class="n">exit</span> <span class="n">code</span> <span class="mi">0</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>Other evaluation functions include:</p>
|
||||
<ul class="simple">
|
||||
<li><p><em>artificial_sampling_eval</em>: that computes the evaluation for a
|
||||
given evaluation metric, returning the average instead of a dataframe.</p></li>
|
||||
<li><p><em>artificial_sampling_prediction</em>: that returns two np.arrays containing the
|
||||
true prevalences and the estimated prevalences.</p></li>
|
||||
</ul>
|
||||
<p>See the documentation for further details.</p>
|
||||
</div>
|
||||
<p>Alternatively, we can simply generate all the predictions by:</p>
|
||||
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">true_prevs</span><span class="p">,</span> <span class="n">estim_prevs</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">evaluation</span><span class="o">.</span><span class="n">prediction</span><span class="p">(</span><span class="n">quantifier</span><span class="p">,</span> <span class="n">protocol</span><span class="o">=</span><span class="n">prot</span><span class="p">)</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>All the evaluation functions implement specific optimizations for speeding up
|
||||
the evaluation of aggregative quantifiers (i.e., of instances of <em>AggregativeQuantifier</em>).
|
||||
The optimization comes down to generating classification predictions (either crisp or soft)
|
||||
only once for the entire test set, and then applying the sampling procedure to the
|
||||
predictions, instead of generating samples of instances and then computing the
|
||||
classification predictions every time. This is only possible when the protocol
|
||||
is an instance of <em>OnLabelledCollectionProtocol</em>. The optimization is only
|
||||
carried out when the number of classification predictions thus generated would be
|
||||
smaller than the number of predictions required for the entire protocol; e.g.,
|
||||
if the original dataset contains 1M instances, but the protocol is such that it would
|
||||
at most generate 20 samples of 100 instances, then it would be preferable to postpone the
|
||||
classification for each sample. This behaviour is indicated by setting
|
||||
<em>aggr_speedup="auto"</em>. Conversely, when <em>aggr_speedup="force"</em> is set, QuaPy will
|
||||
precompute all the predictions irrespective of the number of instances and the number of samples.
|
||||
Finally, this can be deactivated by setting <em>aggr_speedup=False</em>. Note that this optimization
|
||||
is not only applied for the final evaluation, but also for the internal evaluations carried
|
||||
out during <em>model selection</em>. Since these are typically many, the heuristic can help reduce the
|
||||
execution time considerably.</p>
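<p>The decision rule described in this section can be sketched in a few lines of plain Python. This is an illustrative reading of the text, not QuaPy's actual code: the function <em>should_precompute</em> and its arguments are hypothetical names introduced here, and only the three values of <em>aggr_speedup</em> come from the documentation.</p>

```python
def should_precompute(n_instances, n_samples, sample_size, aggr_speedup="auto"):
    """Hypothetical helper: decide whether to classify the whole test set once
    (True) or to classify each generated sample on demand (False)."""
    if aggr_speedup is False:
        # optimization deactivated: always classify sample by sample
        return False
    if aggr_speedup == "force":
        # always precompute, whatever the sizes involved
        return True
    if aggr_speedup == "auto":
        # precompute only if classifying everything once requires fewer
        # predictions than classifying every sample of the protocol
        return n_instances < n_samples * sample_size
    raise ValueError(f"unknown aggr_speedup value: {aggr_speedup!r}")


# The example from the text: 1M instances, at most 20 samples of 100 instances.
# Precomputing would cost 1,000,000 predictions against 2,000, so it is skipped.
print(should_precompute(1_000_000, n_samples=20, sample_size=100))
print(should_precompute(1_000_000, n_samples=20, sample_size=100, aggr_speedup="force"))
```

<p>With the numbers used in the text, the first call reports that classification is postponed to each sample, while forcing the speed-up precomputes everything regardless.</p>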
|
||||
</section>
|
||||
</section>
|
||||
|
||||
|
||||
<div class="clearer"></div>
|
||||
|
@ -264,8 +212,9 @@ true prevalences and the estimated prevalences.</p></li>
|
|||
</div>
|
||||
<div class="sphinxsidebar" role="navigation" aria-label="main navigation">
|
||||
<div class="sphinxsidebarwrapper">
|
||||
<h3><a href="index.html">Table of Contents</a></h3>
|
||||
<ul>
|
||||
<div>
|
||||
<h3><a href="index.html">Table of Contents</a></h3>
|
||||
<ul>
|
||||
<li><a class="reference internal" href="#">Evaluation</a><ul>
|
||||
<li><a class="reference internal" href="#error-measures">Error Measures</a></li>
|
||||
<li><a class="reference internal" href="#evaluation-protocols">Evaluation Protocols</a></li>
|
||||
|
@ -273,12 +222,17 @@ true prevalences and the estimated prevalences.</p></li>
|
|||
</li>
|
||||
</ul>
|
||||
|
||||
<h4>Previous topic</h4>
|
||||
<p class="topless"><a href="Datasets.html"
|
||||
title="previous chapter">Datasets</a></p>
|
||||
<h4>Next topic</h4>
|
||||
<p class="topless"><a href="Methods.html"
|
||||
title="next chapter">Quantification Methods</a></p>
|
||||
</div>
|
||||
<div>
|
||||
<h4>Previous topic</h4>
|
||||
<p class="topless"><a href="Datasets.html"
|
||||
title="previous chapter">Datasets</a></p>
|
||||
</div>
|
||||
<div>
|
||||
<h4>Next topic</h4>
|
||||
<p class="topless"><a href="Protocols.html"
|
||||
title="next chapter">Protocols</a></p>
|
||||
</div>
|
||||
<div role="note" aria-label="source link">
|
||||
<h3>This Page</h3>
|
||||
<ul class="this-page-menu">
|
||||
|
@ -295,7 +249,7 @@ true prevalences and the estimated prevalences.</p></li>
|
|||
</form>
|
||||
</div>
|
||||
</div>
|
||||
<script>$('#searchbox').show(0);</script>
|
||||
<script>document.getElementById('searchbox').style.display = "block"</script>
|
||||
</div>
|
||||
</div>
|
||||
<div class="clearer"></div>
|
||||
|
@ -310,18 +264,18 @@ true prevalences and the estimated prevalences.</p></li>
|
|||
<a href="py-modindex.html" title="Python Module Index"
|
||||
>modules</a> |</li>
|
||||
<li class="right" >
|
||||
<a href="Methods.html" title="Quantification Methods"
|
||||
<a href="Protocols.html" title="Protocols"
|
||||
>next</a> |</li>
|
||||
<li class="right" >
|
||||
<a href="Datasets.html" title="Datasets"
|
||||
>previous</a> |</li>
|
||||
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.6 documentation</a> »</li>
|
||||
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.7 documentation</a> »</li>
|
||||
<li class="nav-item nav-item-this"><a href="">Evaluation</a></li>
|
||||
</ul>
|
||||
</div>
|
||||
<div class="footer" role="contentinfo">
|
||||
© Copyright 2021, Alejandro Moreo.
|
||||
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 4.2.0.
|
||||
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 5.3.0.
|
||||
</div>
|
||||
</body>
|
||||
</html>
|
|
@ -2,18 +2,21 @@
|
|||
|
||||
<!doctype html>
|
||||
|
||||
<html>
|
||||
<html lang="en">
|
||||
<head>
|
||||
<meta charset="utf-8" />
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
|
||||
<title>Installation — QuaPy 0.1.6 documentation</title>
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="generator" content="Docutils 0.19: https://docutils.sourceforge.io/" />
|
||||
|
||||
<title>Installation — QuaPy 0.1.7 documentation</title>
|
||||
<link rel="stylesheet" type="text/css" href="_static/pygments.css" />
|
||||
<link rel="stylesheet" type="text/css" href="_static/bizstyle.css" />
|
||||
|
||||
<script data-url_root="./" id="documentation_options" src="_static/documentation_options.js"></script>
|
||||
<script src="_static/jquery.js"></script>
|
||||
<script src="_static/underscore.js"></script>
|
||||
<script src="_static/_sphinx_javascript_frameworks_compat.js"></script>
|
||||
<script src="_static/doctools.js"></script>
|
||||
<script src="_static/sphinx_highlight.js"></script>
|
||||
<script src="_static/bizstyle.js"></script>
|
||||
<link rel="index" title="Index" href="genindex.html" />
|
||||
<link rel="search" title="Search" href="search.html" />
|
||||
|
@ -39,7 +42,7 @@
|
|||
<li class="right" >
|
||||
<a href="index.html" title="Welcome to QuaPy’s documentation!"
|
||||
accesskey="P">previous</a> |</li>
|
||||
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.6 documentation</a> »</li>
|
||||
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.7 documentation</a> »</li>
|
||||
<li class="nav-item nav-item-this"><a href="">Installation</a></li>
|
||||
</ul>
|
||||
</div>
|
||||
|
@ -49,15 +52,15 @@
|
|||
<div class="bodywrapper">
|
||||
<div class="body" role="main">
|
||||
|
||||
<div class="section" id="installation">
|
||||
<h1>Installation<a class="headerlink" href="#installation" title="Permalink to this headline">¶</a></h1>
|
||||
<section id="installation">
|
||||
<h1>Installation<a class="headerlink" href="#installation" title="Permalink to this heading">¶</a></h1>
|
||||
<p>QuaPy can be easily installed via <cite>pip</cite></p>
|
||||
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">pip</span> <span class="n">install</span> <span class="n">quapy</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>See <a class="reference external" href="https://pypi.org/project/QuaPy/">pip page</a> for older versions.</p>
|
||||
<div class="section" id="requirements">
|
||||
<h2>Requirements<a class="headerlink" href="#requirements" title="Permalink to this headline">¶</a></h2>
|
||||
<section id="requirements">
|
||||
<h2>Requirements<a class="headerlink" href="#requirements" title="Permalink to this heading">¶</a></h2>
|
||||
<ul class="simple">
|
||||
<li><p>scikit-learn, numpy, scipy</p></li>
|
||||
<li><p>pytorch (for QuaNet)</p></li>
|
||||
|
@ -67,9 +70,9 @@
|
|||
<li><p>pandas, xlrd</p></li>
|
||||
<li><p>matplotlib</p></li>
|
||||
</ul>
|
||||
</div>
|
||||
<div class="section" id="svm-perf-with-quantification-oriented-losses">
|
||||
<h2>SVM-perf with quantification-oriented losses<a class="headerlink" href="#svm-perf-with-quantification-oriented-losses" title="Permalink to this headline">¶</a></h2>
|
||||
</section>
|
||||
<section id="svm-perf-with-quantification-oriented-losses">
|
||||
<h2>SVM-perf with quantification-oriented losses<a class="headerlink" href="#svm-perf-with-quantification-oriented-losses" title="Permalink to this heading">¶</a></h2>
|
||||
<p>In order to run experiments involving SVM(Q), SVM(KLD), SVM(NKLD),
|
||||
SVM(AE), or SVM(RAE), you have to first download the
|
||||
<a class="reference external" href="http://www.cs.cornell.edu/people/tj/svm_light/svm_perf.html">svmperf</a>
|
||||
|
@ -96,8 +99,8 @@ and for the <cite>KLD</cite> and <cite>NKLD</cite> as proposed by
|
|||
for quantification.
|
||||
This patch extends the former by also allowing SVMperf to optimize for
|
||||
<cite>AE</cite> and <cite>RAE</cite>.</p>
|
||||
</div>
|
||||
</div>
|
||||
</section>
|
||||
</section>
|
||||
|
||||
|
||||
<div class="clearer"></div>
|
||||
|
@ -106,8 +109,9 @@ This patch extends the former by also allowing SVMperf to optimize for
|
|||
</div>
|
||||
<div class="sphinxsidebar" role="navigation" aria-label="main navigation">
|
||||
<div class="sphinxsidebarwrapper">
|
||||
<h3><a href="index.html">Table of Contents</a></h3>
|
||||
<ul>
|
||||
<div>
|
||||
<h3><a href="index.html">Table of Contents</a></h3>
|
||||
<ul>
|
||||
<li><a class="reference internal" href="#">Installation</a><ul>
|
||||
<li><a class="reference internal" href="#requirements">Requirements</a></li>
|
||||
<li><a class="reference internal" href="#svm-perf-with-quantification-oriented-losses">SVM-perf with quantification-oriented losses</a></li>
|
||||
|
@ -115,12 +119,17 @@ This patch extends the former by also allowing SVMperf to optimize for
|
|||
</li>
|
||||
</ul>
|
||||
|
||||
<h4>Previous topic</h4>
|
||||
<p class="topless"><a href="index.html"
|
||||
title="previous chapter">Welcome to QuaPy’s documentation!</a></p>
|
||||
<h4>Next topic</h4>
|
||||
<p class="topless"><a href="Datasets.html"
|
||||
title="next chapter">Datasets</a></p>
|
||||
</div>
|
||||
<div>
|
||||
<h4>Previous topic</h4>
|
||||
<p class="topless"><a href="index.html"
|
||||
title="previous chapter">Welcome to QuaPy’s documentation!</a></p>
|
||||
</div>
|
||||
<div>
|
||||
<h4>Next topic</h4>
|
||||
<p class="topless"><a href="Datasets.html"
|
||||
title="next chapter">Datasets</a></p>
|
||||
</div>
|
||||
<div role="note" aria-label="source link">
|
||||
<h3>This Page</h3>
|
||||
<ul class="this-page-menu">
|
||||
|
@ -137,7 +146,7 @@ This patch extends the former by also allowing SVMperf to optimize for
|
|||
</form>
|
||||
</div>
|
||||
</div>
|
||||
<script>$('#searchbox').show(0);</script>
|
||||
<script>document.getElementById('searchbox').style.display = "block"</script>
|
||||
</div>
|
||||
</div>
|
||||
<div class="clearer"></div>
|
||||
|
@ -157,13 +166,13 @@ This patch extends the former by also allowing SVMperf to optimize for
|
|||
<li class="right" >
|
||||
<a href="index.html" title="Welcome to QuaPy’s documentation!"
|
||||
>previous</a> |</li>
|
||||
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.6 documentation</a> »</li>
|
||||
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.7 documentation</a> »</li>
|
||||
<li class="nav-item nav-item-this"><a href="">Installation</a></li>
|
||||
</ul>
|
||||
</div>
|
||||
<div class="footer" role="contentinfo">
|
||||
© Copyright 2021, Alejandro Moreo.
|
||||
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 4.2.0.
|
||||
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 5.3.0.
|
||||
</div>
|
||||
</body>
|
||||
</html>
|
|
@ -2,23 +2,26 @@
|
|||
|
||||
<!doctype html>
|
||||
|
||||
<html>
|
||||
<html lang="en">
|
||||
<head>
|
||||
<meta charset="utf-8" />
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
|
||||
<title>Quantification Methods — QuaPy 0.1.6 documentation</title>
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="generator" content="Docutils 0.19: https://docutils.sourceforge.io/" />
|
||||
|
||||
<title>Quantification Methods — QuaPy 0.1.7 documentation</title>
|
||||
<link rel="stylesheet" type="text/css" href="_static/pygments.css" />
|
||||
<link rel="stylesheet" type="text/css" href="_static/bizstyle.css" />
|
||||
|
||||
<script data-url_root="./" id="documentation_options" src="_static/documentation_options.js"></script>
|
||||
<script src="_static/jquery.js"></script>
|
||||
<script src="_static/underscore.js"></script>
|
||||
<script src="_static/_sphinx_javascript_frameworks_compat.js"></script>
|
||||
<script src="_static/doctools.js"></script>
|
||||
<script src="_static/sphinx_highlight.js"></script>
|
||||
<script src="_static/bizstyle.js"></script>
|
||||
<link rel="index" title="Index" href="genindex.html" />
|
||||
<link rel="search" title="Search" href="search.html" />
|
||||
<link rel="next" title="Plotting" href="Plotting.html" />
|
||||
<link rel="prev" title="Evaluation" href="Evaluation.html" />
|
||||
<link rel="next" title="Model Selection" href="Model-Selection.html" />
|
||||
<link rel="prev" title="Protocols" href="Protocols.html" />
|
||||
<meta name="viewport" content="width=device-width,initial-scale=1.0" />
|
||||
<!--[if lt IE 9]>
|
||||
<script src="_static/css3-mediaqueries.js"></script>
|
||||
|
@ -34,12 +37,12 @@
|
|||
<a href="py-modindex.html" title="Python Module Index"
|
||||
>modules</a> |</li>
|
||||
<li class="right" >
|
||||
<a href="Plotting.html" title="Plotting"
|
||||
<a href="Model-Selection.html" title="Model Selection"
|
||||
accesskey="N">next</a> |</li>
|
||||
<li class="right" >
|
||||
<a href="Evaluation.html" title="Evaluation"
|
||||
<a href="Protocols.html" title="Protocols"
|
||||
accesskey="P">previous</a> |</li>
|
||||
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.6 documentation</a> »</li>
|
||||
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.7 documentation</a> »</li>
|
||||
<li class="nav-item nav-item-this"><a href="">Quantification Methods</a></li>
|
||||
</ul>
|
||||
</div>
|
||||
|
@ -49,8 +52,8 @@
|
|||
<div class="bodywrapper">
|
||||
<div class="body" role="main">
|
||||
|
||||
<div class="tex2jax_ignore mathjax_ignore section" id="quantification-methods">
|
||||
<h1>Quantification Methods<a class="headerlink" href="#quantification-methods" title="Permalink to this headline">¶</a></h1>
|
||||
<section id="quantification-methods">
|
||||
<h1>Quantification Methods<a class="headerlink" href="#quantification-methods" title="Permalink to this heading">¶</a></h1>
|
||||
<p>Quantification methods can be categorized as belonging to
|
||||
<em>aggregative</em> and <em>non-aggregative</em> groups.
|
||||
Most methods included in QuaPy at the moment are of type <em>aggregative</em>
|
||||
|
@ -65,12 +68,6 @@ and implement some abstract methods:</p>
|
|||
|
||||
<span class="nd">@abstractmethod</span>
|
||||
<span class="k">def</span> <span class="nf">quantify</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">instances</span><span class="p">):</span> <span class="o">...</span>
|
||||
|
||||
<span class="nd">@abstractmethod</span>
|
||||
<span class="k">def</span> <span class="nf">set_params</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="o">**</span><span class="n">parameters</span><span class="p">):</span> <span class="o">...</span>
|
||||
|
||||
<span class="nd">@abstractmethod</span>
|
||||
<span class="k">def</span> <span class="nf">get_params</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">deep</span><span class="o">=</span><span class="kc">True</span><span class="p">):</span> <span class="o">...</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>The meaning of those functions should be familiar to those
|
||||
|
@ -82,12 +79,12 @@ scikit-learn’ structure has not been adopted <em>as is</em> in QuaPy responds
|
|||
the fact that scikit-learn’s <em>predict</em> function is expected to return
|
||||
one output for each input element –e.g., a predicted label for each
|
||||
instance in a sample– while in quantification the output for a sample
|
||||
is one single array of class prevalences), while functions <em>set_params</em>
|
||||
and <em>get_params</em> allow a
|
||||
<a class="reference external" href="https://github.com/HLT-ISTI/QuaPy/wiki/Model-Selection">model selector</a>
|
||||
to automate the process of hyperparameter search.</p>
|
||||
<div class="section" id="aggregative-methods">
|
||||
<h2>Aggregative Methods<a class="headerlink" href="#aggregative-methods" title="Permalink to this headline">¶</a></h2>
|
||||
is one single array of class prevalences).
|
||||
Quantifiers also extend from scikit-learn’s <code class="docutils literal notranslate"><span class="pre">BaseEstimator</span></code>, in order
|
||||
to simplify the use of <em>set_params</em> and <em>get_params</em> used in
|
||||
<a class="reference external" href="https://github.com/HLT-ISTI/QuaPy/wiki/Model-Selection">model selector</a>.</p>
|
||||
<section id="aggregative-methods">
|
||||
<h2>Aggregative Methods<a class="headerlink" href="#aggregative-methods" title="Permalink to this heading">¶</a></h2>
|
||||
<p>All quantification methods are implemented as part of the
|
||||
<em>qp.method</em> package. In particular, <em>aggregative</em> methods are defined in
|
||||
<em>qp.method.aggregative</em>, and extend <em>AggregativeQuantifier(BaseQuantifier)</em>.
|
||||
|
@ -103,12 +100,12 @@ The methods that any <em>aggregative</em> quantifier must implement are:</p>
|
|||
individual predictions of a classifier. Indeed, a default implementation
|
||||
of <em>BaseQuantifier.quantify</em> is already provided, which looks like:</p>
|
||||
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span> <span class="k">def</span> <span class="nf">quantify</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">instances</span><span class="p">):</span>
|
||||
<span class="n">classif_predictions</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">preclassify</span><span class="p">(</span><span class="n">instances</span><span class="p">)</span>
|
||||
<span class="n">classif_predictions</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">classify</span><span class="p">(</span><span class="n">instances</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">aggregate</span><span class="p">(</span><span class="n">classif_predictions</span><span class="p">)</span>
|
||||
</pre></div>
|
||||
</div>
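<p>To make the two-step structure concrete, the following toy sketch (plain Python, no QuaPy imports) mimics a classify-and-count quantifier. The class <em>ToyClassifyAndCount</em> and the rule-based classifier <em>is_positive</em> are hypothetical and only illustrate how <em>quantify</em> chains <em>classify</em> and <em>aggregate</em>; they are not part of the library.</p>

```python
from collections import Counter


class ToyClassifyAndCount:
    """Toy quantifier: label every instance, then report class frequencies."""

    def __init__(self, classifier, classes):
        self.classifier = classifier  # any callable mapping one instance to a label
        self.classes = list(classes)

    def classify(self, instances):
        # one crisp prediction per instance
        return [self.classifier(x) for x in instances]

    def aggregate(self, classif_predictions):
        # turn individual predictions into a vector of class prevalences
        total = len(classif_predictions)
        if total == 0:
            raise ValueError("cannot estimate prevalences from an empty sample")
        counts = Counter(classif_predictions)
        return [counts[c] / total for c in self.classes]

    def quantify(self, instances):
        return self.aggregate(self.classify(instances))


def is_positive(x):
    # hypothetical hard classifier: label 1 for positive numbers, 0 otherwise
    return 1 if x > 0 else 0


quantifier = ToyClassifyAndCount(is_positive, classes=[0, 1])
print(quantifier.quantify([-2, -1, 3, 4, 5, -7, 8, 9]))  # [0.375, 0.625]
```

<p>Because <em>aggregate</em> only needs predictions, labels computed once can be reused for any subsample drawn from the same set, without classifying the instances again.</p>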
|
||||
<p>Aggregative quantifiers are expected to maintain a classifier (which is
|
||||
accessed through the <em>@property</em> <em>learner</em>). This classifier is
|
||||
accessed through the <em>@property</em> <em>classifier</em>). This classifier is
|
||||
given as input to the quantifier, and can be already fit
|
||||
on external data (in which case, the <em>fit_learner</em> argument should
|
||||
be set to False), or be fit by the quantifier’s fit (default).</p>
|
||||
|
@ -118,12 +115,8 @@ aggregative methods, that should inherit from the abstract class
|
|||
The particularity of <em>probabilistic</em> aggregative methods (w.r.t.
|
||||
non-probabilistic ones), is that the default quantifier is defined
|
||||
in terms of the posterior probabilities returned by a probabilistic
|
||||
classifier, and not by the crisp decisions of a hard classifier; i.e.:</p>
|
||||
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span> <span class="k">def</span> <span class="nf">quantify</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">instances</span><span class="p">):</span>
|
||||
<span class="n">classif_posteriors</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">posterior_probabilities</span><span class="p">(</span><span class="n">instances</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">aggregate</span><span class="p">(</span><span class="n">classif_posteriors</span><span class="p">)</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
classifier, and not by the crisp decisions of a hard classifier.
|
||||
In any case, the interface <em>classify(instances)</em> remains unchanged.</p>
|
||||
<p>One advantage of <em>aggregative</em> methods (either probabilistic or not)
|
||||
is that the evaluation according to any sampling procedure (e.g.,
|
||||
the <a class="reference external" href="https://github.com/HLT-ISTI/QuaPy/wiki/Evaluation">artificial sampling protocol</a>)
|
||||
|
@ -133,8 +126,8 @@ reuse these predictions, without requiring to classify each element every time.
|
|||
QuaPy leverages this property to speed-up any procedure having to do with
|
||||
quantification over samples, as is customarily done in model selection or
|
||||
in evaluation.</p>
|
||||
<div class="section" id="the-classify-count-variants">
|
||||
<h3>The Classify & Count variants<a class="headerlink" href="#the-classify-count-variants" title="Permalink to this headline">¶</a></h3>
|
||||
<section id="the-classify-count-variants">
|
||||
<h3>The Classify & Count variants<a class="headerlink" href="#the-classify-count-variants" title="Permalink to this heading">¶</a></h3>
|
||||
<p>QuaPy implements the four CC variants, i.e.:</p>
|
||||
<ul class="simple">
|
||||
<li><p><em>CC</em> (Classify & Count), the simplest aggregative quantifier; one that
|
||||
|
@ -150,9 +143,7 @@ with a SVM as the classifier:</p>
|
|||
<span class="kn">import</span> <span class="nn">quapy.functional</span> <span class="k">as</span> <span class="nn">F</span>
|
||||
<span class="kn">from</span> <span class="nn">sklearn.svm</span> <span class="kn">import</span> <span class="n">LinearSVC</span>
|
||||
|
||||
<span class="n">dataset</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">datasets</span><span class="o">.</span><span class="n">fetch_twitter</span><span class="p">(</span><span class="s1">'hcr'</span><span class="p">,</span> <span class="n">pickle</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
|
||||
<span class="n">training</span> <span class="o">=</span> <span class="n">dataset</span><span class="o">.</span><span class="n">training</span>
|
||||
<span class="n">test</span> <span class="o">=</span> <span class="n">dataset</span><span class="o">.</span><span class="n">test</span>
|
||||
<span class="n">training</span><span class="p">,</span> <span class="n">test</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">datasets</span><span class="o">.</span><span class="n">fetch_twitter</span><span class="p">(</span><span class="s1">'hcr'</span><span class="p">,</span> <span class="n">pickle</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span><span class="o">.</span><span class="n">train_test</span>
|
||||
|
||||
<span class="c1"># instantiate a classifier learner, in this case a SVM</span>
|
||||
<span class="n">svm</span> <span class="o">=</span> <span class="n">LinearSVC</span><span class="p">()</span>
|
||||
|
@ -196,7 +187,7 @@ e.g.:</p>
|
|||
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">model</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">method</span><span class="o">.</span><span class="n">aggregative</span><span class="o">.</span><span class="n">PCC</span><span class="p">(</span><span class="n">svm</span><span class="p">)</span>
|
||||
<span class="n">model</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">training</span><span class="p">)</span>
|
||||
<span class="n">estim_prevalence</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">quantify</span><span class="p">(</span><span class="n">test</span><span class="o">.</span><span class="n">instances</span><span class="p">)</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="s1">'classifier:'</span><span class="p">,</span> <span class="n">model</span><span class="o">.</span><span class="n">learner</span><span class="p">)</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="s1">'classifier:'</span><span class="p">,</span> <span class="n">model</span><span class="o">.</span><span class="n">classifier</span><span class="p">)</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>In this case, QuaPy will print:</p>
|
||||
|
@ -214,9 +205,9 @@ be applied to hard classifiers when <em>fit_learner=True</em>; an exception
|
|||
will be raised otherwise.</p>
|
||||
<p>Lastly, everything we said about ACC and PCC
|
||||
applies to PACC as well.</p>
|
||||
</div>
|
||||
<div class="section" id="expectation-maximization-emq">
|
||||
<h3>Expectation Maximization (EMQ)<a class="headerlink" href="#expectation-maximization-emq" title="Permalink to this headline">¶</a></h3>
|
||||
</section>
|
||||
<section id="expectation-maximization-emq">
|
||||
<h3>Expectation Maximization (EMQ)<a class="headerlink" href="#expectation-maximization-emq" title="Permalink to this heading">¶</a></h3>
|
||||
<p>The Expectation Maximization Quantifier (EMQ), also known as
|
||||
the SLD, is available at <em>qp.method.aggregative.EMQ</em> or via the
|
||||
alias <em>qp.method.aggregative.ExpectationMaximizationQuantifier</em>.
|
||||
|
@ -241,13 +232,21 @@ experiments we have carried out.</p>
|
|||
<span class="n">estim_prevalence</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">quantify</span><span class="p">(</span><span class="n">dataset</span><span class="o">.</span><span class="n">test</span><span class="o">.</span><span class="n">instances</span><span class="p">)</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
</div>
|
||||
<div class="section" id="hellinger-distance-y-hdy">
|
||||
<h3>Hellinger Distance y (HDy)<a class="headerlink" href="#hellinger-distance-y-hdy" title="Permalink to this headline">¶</a></h3>
|
||||
<p>The method HDy is described in:</p>
|
||||
<p><em>Implementation of the method based on the Hellinger Distance y (HDy) proposed by
|
||||
González-Castro, V., Alaiz-Rodrı́guez, R., and Alegre, E. (2013). Class distribution
|
||||
estimation based on the Hellinger distance. Information Sciences, 218:146–164.</em></p>
|
||||
<p><em>New in v0.1.7</em>: EMQ now accepts two new parameters in its constructor, namely
|
||||
<em>exact_train_prev</em>, which allows using the true training prevalence as the initial
|
||||
prevalence estimation (default behaviour), or instead an approximation of it as
|
||||
suggested by <a class="reference external" href="http://proceedings.mlr.press/v119/alexandari20a.html">Alexandari et al. (2020)</a>
|
||||
(by setting <em>exact_train_prev=False</em>).
|
||||
The other parameter is <em>recalib</em>, which selects a calibration method among those
|
||||
proposed by <a class="reference external" href="http://proceedings.mlr.press/v119/alexandari20a.html">Alexandari et al. (2020)</a>,
|
||||
including the Bias-Corrected Temperature Scaling, Vector Scaling, etc.
|
||||
See the API documentation for further details.</p>
|
||||
</section>
|
||||
<section id="hellinger-distance-y-hdy">
|
||||
<h3>Hellinger Distance y (HDy)<a class="headerlink" href="#hellinger-distance-y-hdy" title="Permalink to this heading">¶</a></h3>
|
||||
<p>Implementation of the method based on the Hellinger Distance y (HDy) proposed by
|
||||
<a class="reference external" href="https://www.sciencedirect.com/science/article/pii/S0020025512004069">González-Castro, V., Alaiz-Rodrı́guez, R., and Alegre, E. (2013). Class distribution
|
||||
estimation based on the Hellinger distance. Information Sciences, 218:146–164.</a></p>
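<p>The core idea can be illustrated with a simplified sketch, assuming binary data and a single histogram of posterior probabilities per class. The functions <em>hellinger</em> and <em>hdy_search</em> are hypothetical names written for this example and do not reproduce QuaPy's implementation; the Hellinger distance is shown without the optional 1/sqrt(2) normalisation factor.</p>

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete distributions (unnormalised form)."""
    return math.sqrt(sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))


def hdy_search(pos_hist, neg_hist, test_hist, steps=100):
    """Grid search for the positive prevalence whose mixture of the two
    training histograms lies closest (in Hellinger distance) to the test histogram."""
    best_prev, best_dist = 0.0, float("inf")
    for i in range(steps + 1):
        prev = i / steps
        mixture = [prev * p + (1 - prev) * n for p, n in zip(pos_hist, neg_hist)]
        dist = hellinger(mixture, test_hist)
        if dist < best_dist:
            best_prev, best_dist = prev, dist
    return best_prev


# Made-up histograms of posterior probabilities for positives and negatives
pos_hist = [0.1, 0.2, 0.7]
neg_hist = [0.6, 0.3, 0.1]
# A test histogram built as a 30% / 70% mixture of the two
test_hist = [0.3 * p + 0.7 * n for p, n in zip(pos_hist, neg_hist)]
print(hdy_search(pos_hist, neg_hist, test_hist))
```

<p>In this toy setting the search recovers a prevalence of about 0.3 for the positive class, since the test histogram was constructed as exactly that mixture.</p>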
|
||||
<p>It is implemented in <em>qp.method.aggregative.HDy</em> (also accessible
|
||||
through the alias <em>qp.method.aggregative.HellingerDistanceY</em>).
|
||||
This method works with a probabilistic classifier (hard classifiers
|
||||
|
@ -274,30 +273,48 @@ provided in QuaPy accepts only binary datasets.</p>
|
|||
<span class="n">estim_prevalence</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">quantify</span><span class="p">(</span><span class="n">dataset</span><span class="o">.</span><span class="n">test</span><span class="o">.</span><span class="n">instances</span><span class="p">)</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
<p><em>New in v0.1.7:</em> QuaPy now provides an implementation of the generalized
|
||||
“Distribution Matching” approaches for multiclass, inspired by the framework
|
||||
of <a class="reference external" href="https://arxiv.org/abs/1606.00868">Firat (2016)</a>. One can instantiate
|
||||
a variant of HDy for multiclass quantification as follows:</p>
|
||||
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">multiclassHDy</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">method</span><span class="o">.</span><span class="n">aggregative</span><span class="o">.</span><span class="n">DistributionMatching</span><span class="p">(</span><span class="n">classifier</span><span class="o">=</span><span class="n">LogisticRegression</span><span class="p">(),</span> <span class="n">divergence</span><span class="o">=</span><span class="s1">'HD'</span><span class="p">,</span> <span class="n">cdf</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
<div class="section" id="explicit-loss-minimization">
|
||||
<h3>Explicit Loss Minimization<a class="headerlink" href="#explicit-loss-minimization" title="Permalink to this headline">¶</a></h3>
|
||||
<p><em>New in v0.1.7:</em> QuaPy now provides an implementation of the “DyS”
|
||||
framework proposed by <a class="reference external" href="https://ojs.aaai.org/index.php/AAAI/article/view/4376">Maletzke et al (2020)</a>
|
||||
and the “SMM” method proposed by <a class="reference external" href="https://ieeexplore.ieee.org/document/9260028">Hassan et al (2019)</a>
|
||||
(thanks to <em>Pablo González</em> for the contributions!)</p>
|
||||
</section>
|
||||
<section id="threshold-optimization-methods">
|
||||
<h3>Threshold Optimization methods<a class="headerlink" href="#threshold-optimization-methods" title="Permalink to this heading">¶</a></h3>
|
||||
<p><em>New in v0.1.7:</em> QuaPy now implements Forman’s threshold optimization methods;
|
||||
see, e.g., <a class="reference external" href="https://dl.acm.org/doi/abs/10.1145/1150402.1150423">(Forman 2006)</a>
|
||||
and <a class="reference external" href="https://link.springer.com/article/10.1007/s10618-008-0097-y">(Forman 2008)</a>.
|
||||
These include: T50, MAX, X, Median Sweep (MS), and its variant MS2.</p>
|
||||
</section>
|
||||
<section id="explicit-loss-minimization">
|
||||
<h3>Explicit Loss Minimization<a class="headerlink" href="#explicit-loss-minimization" title="Permalink to this heading">¶</a></h3>
|
||||
<p>Explicit Loss Minimization (ELM) represents a family of methods
|
||||
based on structured output learning, i.e., quantifiers relying on
|
||||
classifiers that have been optimized targeting a
|
||||
quantification-oriented evaluation measure.</p>
|
||||
<p>In QuaPy, the following methods, all relying on Joachim’s
|
||||
<a class="reference external" href="https://www.cs.cornell.edu/people/tj/svm_light/svm_perf.html">SVMperf</a>
|
||||
implementation, are available in <em>qp.method.aggregative</em>:</p>
|
||||
quantification-oriented evaluation measure.
|
||||
The original methods are implemented in QuaPy as classify & count (CC)
|
||||
quantifiers that use Joachims’ <a class="reference external" href="https://www.cs.cornell.edu/people/tj/svm_light/svm_perf.html">SVMperf</a>
|
||||
as the underlying classifier, properly set to optimize for the desired loss.</p>
|
||||
<p>In QuaPy, this can be achieved by calling the following functions:</p>
|
||||
<ul class="simple">
|
||||
<li><p>SVMQ (SVM-Q) is a quantification method optimizing the metric <em>Q</em> defined
|
||||
in <em>Barranquero, J., Díez, J., and del Coz, J. J. (2015). Quantification-oriented learning based
|
||||
on reliable classifiers. Pattern Recognition, 48(2):591–604.</em></p></li>
|
||||
<li><p>SVMKLD (SVM for Kullback-Leibler Divergence) proposed in <em>Esuli, A. and Sebastiani, F. (2015).
|
||||
<li><p><em>newSVMQ</em>: returns the quantification method called SVM(Q) that optimizes for the metric <em>Q</em> defined
|
||||
in <a class="reference external" href="https://www.sciencedirect.com/science/article/pii/S003132031400291X"><em>Barranquero, J., Díez, J., and del Coz, J. J. (2015). Quantification-oriented learning based
|
||||
on reliable classifiers. Pattern Recognition, 48(2):591–604.</em></a></p></li>
|
||||
<li><p><em>newSVMKLD</em> and <em>newSVMNKLD</em>: return the quantification methods called SVM(KLD) and SVM(nKLD), standing for
|
||||
Kullback-Leibler Divergence and Normalized Kullback-Leibler Divergence, as proposed in <a class="reference external" href="https://dl.acm.org/doi/abs/10.1145/2700406"><em>Esuli, A. and Sebastiani, F. (2015).
|
||||
Optimizing text quantifiers for multivariate loss functions.
|
||||
ACM Transactions on Knowledge Discovery and Data, 9(4):Article 27.</em></p></li>
|
||||
<li><p>SVMNKLD (SVM for Normalized Kullback-Leibler Divergence) proposed in <em>Esuli, A. and Sebastiani, F. (2015).
|
||||
Optimizing text quantifiers for multivariate loss functions.
|
||||
ACM Transactions on Knowledge Discovery and Data, 9(4):Article 27.</em></p></li>
|
||||
<li><p>SVMAE (SVM for Mean Absolute Error)</p></li>
|
||||
<li><p>SVMRAE (SVM for Mean Relative Absolute Error)</p></li>
|
||||
ACM Transactions on Knowledge Discovery and Data, 9(4):Article 27.</em></a></p></li>
|
||||
<li><p><em>newSVMAE</em> and <em>newSVMRAE</em>: return the quantification methods called SVM(AE) and SVM(RAE), which optimize for the (Mean) Absolute Error and for the
|
||||
(Mean) Relative Absolute Error, as first used by
|
||||
<a class="reference external" href="https://arxiv.org/abs/2011.02552"><em>Moreo, A. and Sebastiani, F. (2021). Tweet sentiment quantification: An experimental re-evaluation. PLOS ONE 17 (9), 1-23.</em></a></p></li>
|
||||
</ul>
|
||||
<p>The last two methods (SVMAE and SVMRAE) have been implemented in
|
||||
<p>The last two methods (SVM(AE) and SVM(RAE)) have been implemented in
|
||||
QuaPy in order to make available ELM variants for what nowadays
|
||||
are considered the most well-behaved evaluation metrics in quantification.</p>
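<p>To make the two measures concrete, the following is a minimal sketch of how AE and RAE could be computed for a single sample. It is not QuaPy's own code (the library provides its error functions in <em>qp.error</em>); the function names and the smoothing constant <em>eps</em> are assumptions made for illustration. The "mean" variants simply average these values across many samples.</p>

```python
def absolute_error(true_prev, estim_prev):
    """Mean absolute difference between true and estimated class prevalences."""
    return sum(abs(p - q) for p, q in zip(true_prev, estim_prev)) / len(true_prev)


def smooth(prev, eps):
    """Additive smoothing, so that no prevalence value is exactly zero."""
    n = len(prev)
    return [(p + eps) / (1.0 + eps * n) for p in prev]


def relative_absolute_error(true_prev, estim_prev, eps=1e-3):
    """Mean of |p - q| / p, computed on smoothed prevalence vectors.

    eps is a hypothetical default chosen for this sketch.
    """
    p_s, q_s = smooth(true_prev, eps), smooth(estim_prev, eps)
    return sum(abs(p - q) / p for p, q in zip(p_s, q_s)) / len(p_s)


true_prev = [0.2, 0.5, 0.3]
estim_prev = [0.3, 0.4, 0.3]
ae = absolute_error(true_prev, estim_prev)
rae = relative_absolute_error(true_prev, estim_prev)
```

<p>RAE weighs an error of a given size more heavily when the true prevalence of the class is small, which is why it needs smoothing to remain defined when a class is absent from the sample.</p>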
|
||||
<p>In order to make these models work, you would need to run the script
|
||||
|
@ -327,11 +344,15 @@ currently supports only binary classification.
|
|||
ELM variants (any binary quantifier in general) can be extended
|
||||
to operate in single-label scenarios trivially by adopting a
|
||||
“one-vs-all” strategy (as, e.g., in
|
||||
<em>Gao, W. and Sebastiani, F. (2016). From classification to quantification in tweet sentiment
|
||||
analysis. Social Network Analysis and Mining, 6(19):1–22</em>).
|
||||
In QuaPy this is possible by using the <em>OneVsAll</em> class:</p>
|
||||
<a class="reference external" href="https://link.springer.com/article/10.1007/s13278-016-0327-z"><em>Gao, W. and Sebastiani, F. (2016). From classification to quantification in tweet sentiment
|
||||
analysis. Social Network Analysis and Mining, 6(19):1–22</em></a>).
|
||||
In QuaPy this is possible by using the <em>OneVsAll</em> class.</p>
|
||||
<p>There are two ways for instantiating this class, <em>OneVsAllGeneric</em> that works for
|
||||
any quantifier, and <em>OneVsAllAggregative</em> that is optimized for aggregative quantifiers.
|
||||
In general, you can simply use the <em>getOneVsAll</em> function and QuaPy will choose
|
||||
the more convenient of the two.</p>
|
||||
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">quapy</span> <span class="k">as</span> <span class="nn">qp</span>
|
||||
<span class="kn">from</span> <span class="nn">quapy.method.aggregative</span> <span class="kn">import</span> <span class="n">SVMQ</span><span class="p">,</span> <span class="n">OneVsAll</span>
|
||||
<span class="kn">from</span> <span class="nn">quapy.method.aggregative</span> <span class="kn">import</span> <span class="n">SVMQ</span>
|
||||
|
||||
<span class="c1"># load a single-label dataset (this one contains 3 classes)</span>
|
||||
<span class="n">dataset</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">datasets</span><span class="o">.</span><span class="n">fetch_twitter</span><span class="p">(</span><span class="s1">'hcr'</span><span class="p">,</span> <span class="n">pickle</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
|
||||
|
@ -339,30 +360,32 @@ In QuaPy this is possible by using the <em>OneVsAll</em> class:</p>
|
|||
<span class="c1"># let qp know where svmperf is</span>
|
||||
<span class="n">qp</span><span class="o">.</span><span class="n">environ</span><span class="p">[</span><span class="s1">'SVMPERF_HOME'</span><span class="p">]</span> <span class="o">=</span> <span class="s1">'../svm_perf_quantification'</span>
|
||||
|
||||
<span class="n">model</span> <span class="o">=</span> <span class="n">OneVsAll</span><span class="p">(</span><span class="n">SVMQ</span><span class="p">(),</span> <span class="n">n_jobs</span><span class="o">=-</span><span class="mi">1</span><span class="p">)</span> <span class="c1"># run them in parallel</span>
|
||||
<span class="n">model</span> <span class="o">=</span> <span class="n">getOneVsAll</span><span class="p">(</span><span class="n">SVMQ</span><span class="p">(),</span> <span class="n">n_jobs</span><span class="o">=-</span><span class="mi">1</span><span class="p">)</span> <span class="c1"># run them in parallel</span>
|
||||
<span class="n">model</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">dataset</span><span class="o">.</span><span class="n">training</span><span class="p">)</span>
|
||||
<span class="n">estim_prevalence</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">quantify</span><span class="p">(</span><span class="n">dataset</span><span class="o">.</span><span class="n">test</span><span class="o">.</span><span class="n">instances</span><span class="p">)</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
<div class="section" id="meta-models">
|
||||
<h2>Meta Models<a class="headerlink" href="#meta-models" title="Permalink to this headline">¶</a></h2>
|
||||
<p>Check the examples <em><span class="xref myst">explicit_loss_minimization.py</span></em>
|
||||
and <span class="xref myst">one_vs_all.py</span> for more details.</p>
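<p>The idea behind the "one-vs-all" strategy can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the code of the <em>OneVsAll</em> class: it assumes that one binary quantifier has been trained per class, that each returns the prevalence of its positive class, and that the estimates are then rescaled to sum to one. The function name and the uniform fallback are hypothetical.</p>

```python
def one_vs_all_prevalence(binary_estimates):
    """Combine per-class positive-prevalence estimates into one distribution.

    binary_estimates[i] is the prevalence that the i-th binary quantifier
    (class i vs. the rest) assigns to its positive class.
    """
    total = sum(binary_estimates)
    if total == 0:
        # no binary quantifier found any positives: fall back to uniform
        return [1.0 / len(binary_estimates)] * len(binary_estimates)
    return [p / total for p in binary_estimates]


# hypothetical outputs of three binary quantifiers (one per class)
binary_estimates = [0.30, 0.45, 0.35]
prevalence = one_vs_all_prevalence(binary_estimates)
```

<p>Since the binary quantifiers are trained independently, their raw estimates need not sum to one, which is why the rescaling step is needed.</p>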
|
||||
</section>
|
||||
</section>
|
||||
<section id="meta-models">
|
||||
<h2>Meta Models<a class="headerlink" href="#meta-models" title="Permalink to this heading">¶</a></h2>
|
||||
<p>By <em>meta</em> models we mean quantification methods that are defined on top of other
|
||||
quantification methods, and that thus do not squarely belong to either the aggregative or
|
||||
the non-aggregative group (indeed, <em>meta</em> models could use quantifiers from any of those
|
||||
groups).
|
||||
<em>Meta</em> models are implemented in the <em>qp.method.meta</em> module.</p>
|
||||
<div class="section" id="ensembles">
|
||||
<h3>Ensembles<a class="headerlink" href="#ensembles" title="Permalink to this headline">¶</a></h3>
|
||||
<section id="ensembles">
|
||||
<h3>Ensembles<a class="headerlink" href="#ensembles" title="Permalink to this heading">¶</a></h3>
|
||||
<p>QuaPy implements (some of) the variants proposed in:</p>
|
||||
<ul class="simple">
|
||||
<li><p><em>Pérez-Gállego, P., Quevedo, J. R., & del Coz, J. J. (2017).
|
||||
<li><p><a class="reference external" href="https://www.sciencedirect.com/science/article/pii/S1566253516300628"><em>Pérez-Gállego, P., Quevedo, J. R., & del Coz, J. J. (2017).
|
||||
Using ensembles for problems with characterizable changes in data distribution: A case study on quantification.
|
||||
Information Fusion, 34, 87-100.</em></p></li>
|
||||
<li><p><em>Pérez-Gállego, P., Castano, A., Quevedo, J. R., & del Coz, J. J. (2019).
|
||||
Information Fusion, 34, 87-100.</em></a></p></li>
|
||||
<li><p><a class="reference external" href="https://www.sciencedirect.com/science/article/pii/S1566253517303652"><em>Pérez-Gállego, P., Castano, A., Quevedo, J. R., & del Coz, J. J. (2019).
|
||||
Dynamic ensemble selection for quantification tasks.
|
||||
Information Fusion, 45, 1-15.</em></p></li>
|
||||
Information Fusion, 45, 1-15.</em></a></p></li>
|
||||
</ul>
|
||||
<p>The following code shows how to instantiate an Ensemble of 30 <em>Adjusted Classify & Count</em> (ACC)
|
||||
quantifiers operating with a <em>Logistic Regressor</em> (LR) as the base classifier, and using the
|
||||
|
@ -391,14 +414,14 @@ the performance estimated for each member of the ensemble in terms of that evalu
|
|||
informs of the number of members to retain.</p>
|
||||
<p>Please, check the <a class="reference external" href="https://github.com/HLT-ISTI/QuaPy/wiki/Model-Selection">model selection</a>
|
||||
wiki if you want to optimize the hyperparameters of the ensemble for classification or quantification.</p>
|
||||
</div>
|
||||
<div class="section" id="the-quanet-neural-network">
|
||||
<h3>The QuaNet neural network<a class="headerlink" href="#the-quanet-neural-network" title="Permalink to this headline">¶</a></h3>
|
||||
</section>
|
||||
<section id="the-quanet-neural-network">
|
||||
<h3>The QuaNet neural network<a class="headerlink" href="#the-quanet-neural-network" title="Permalink to this heading">¶</a></h3>
|
||||
<p>QuaPy offers an implementation of QuaNet, a deep learning model presented in:</p>
|
||||
<p><em>Esuli, A., Moreo, A., & Sebastiani, F. (2018, October).
|
||||
<p><a class="reference external" href="https://dl.acm.org/doi/abs/10.1145/3269206.3269287"><em>Esuli, A., Moreo, A., & Sebastiani, F. (2018, October).
|
||||
A recurrent neural network for sentiment quantification.
|
||||
In Proceedings of the 27th ACM International Conference on
|
||||
Information and Knowledge Management (pp. 1775-1778).</em></p>
|
||||
Information and Knowledge Management (pp. 1775-1778).</em></a></p>
|
||||
<p>This model requires <em>torch</em> to be installed.
|
||||
QuaNet also requires a classifier that can provide embedded representations
|
||||
of the inputs.
|
||||
|
@ -420,14 +443,14 @@ In the following example, we show an instantiation of QuaNet that instead uses C
|
|||
<span class="n">learner</span> <span class="o">=</span> <span class="n">NeuralClassifierTrainer</span><span class="p">(</span><span class="n">cnn</span><span class="p">,</span> <span class="n">device</span><span class="o">=</span><span class="s1">'cuda'</span><span class="p">)</span>
|
||||
|
||||
<span class="c1"># train QuaNet</span>
|
||||
<span class="n">model</span> <span class="o">=</span> <span class="n">QuaNet</span><span class="p">(</span><span class="n">learner</span><span class="p">,</span> <span class="n">qp</span><span class="o">.</span><span class="n">environ</span><span class="p">[</span><span class="s1">'SAMPLE_SIZE'</span><span class="p">],</span> <span class="n">device</span><span class="o">=</span><span class="s1">'cuda'</span><span class="p">)</span>
|
||||
<span class="n">model</span> <span class="o">=</span> <span class="n">QuaNet</span><span class="p">(</span><span class="n">learner</span><span class="p">,</span> <span class="n">device</span><span class="o">=</span><span class="s1">'cuda'</span><span class="p">)</span>
|
||||
<span class="n">model</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">dataset</span><span class="o">.</span><span class="n">training</span><span class="p">)</span>
|
||||
<span class="n">estim_prevalence</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">quantify</span><span class="p">(</span><span class="n">dataset</span><span class="o">.</span><span class="n">test</span><span class="o">.</span><span class="n">instances</span><span class="p">)</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
</section>
|
||||
</section>
|
||||
</section>
|
||||
|
||||
|
||||
<div class="clearer"></div>
|
||||
|
@ -436,13 +459,15 @@ In the following example, we show an instantiation of QuaNet that instead uses C
|
|||
</div>
|
||||
<div class="sphinxsidebar" role="navigation" aria-label="main navigation">
|
||||
<div class="sphinxsidebarwrapper">
|
||||
<h3><a href="index.html">Table of Contents</a></h3>
|
||||
<ul>
|
||||
<div>
|
||||
<h3><a href="index.html">Table of Contents</a></h3>
|
||||
<ul>
|
||||
<li><a class="reference internal" href="#">Quantification Methods</a><ul>
|
||||
<li><a class="reference internal" href="#aggregative-methods">Aggregative Methods</a><ul>
|
||||
<li><a class="reference internal" href="#the-classify-count-variants">The Classify & Count variants</a></li>
|
||||
<li><a class="reference internal" href="#expectation-maximization-emq">Expectation Maximization (EMQ)</a></li>
|
||||
<li><a class="reference internal" href="#hellinger-distance-y-hdy">Hellinger Distance y (HDy)</a></li>
|
||||
<li><a class="reference internal" href="#threshold-optimization-methods">Threshold Optimization methods</a></li>
|
||||
<li><a class="reference internal" href="#explicit-loss-minimization">Explicit Loss Minimization</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
|
@ -455,12 +480,17 @@ In the following example, we show an instantiation of QuaNet that instead uses C
|
|||
</li>
|
||||
</ul>
|
||||
|
||||
<h4>Previous topic</h4>
|
||||
<p class="topless"><a href="Evaluation.html"
|
||||
title="previous chapter">Evaluation</a></p>
|
||||
<h4>Next topic</h4>
|
||||
<p class="topless"><a href="Plotting.html"
|
||||
title="next chapter">Plotting</a></p>
|
||||
</div>
|
||||
<div>
|
||||
<h4>Previous topic</h4>
|
||||
<p class="topless"><a href="Protocols.html"
|
||||
title="previous chapter">Protocols</a></p>
|
||||
</div>
|
||||
<div>
|
||||
<h4>Next topic</h4>
|
||||
<p class="topless"><a href="Model-Selection.html"
|
||||
title="next chapter">Model Selection</a></p>
|
||||
</div>
|
||||
<div role="note" aria-label="source link">
|
||||
<h3>This Page</h3>
|
||||
<ul class="this-page-menu">
|
||||
|
@ -477,7 +507,7 @@ In the following example, we show an instantiation of QuaNet that instead uses C
|
|||
</form>
|
||||
</div>
|
||||
</div>
|
||||
<script>$('#searchbox').show(0);</script>
|
||||
<script>document.getElementById('searchbox').style.display = "block"</script>
|
||||
</div>
|
||||
</div>
|
||||
<div class="clearer"></div>
|
||||
|
@ -492,18 +522,18 @@ In the following example, we show an instantiation of QuaNet that instead uses C
|
|||
<a href="py-modindex.html" title="Python Module Index"
|
||||
>modules</a> |</li>
|
||||
<li class="right" >
|
||||
<a href="Plotting.html" title="Plotting"
|
||||
<a href="Model-Selection.html" title="Model Selection"
|
||||
>next</a> |</li>
|
||||
<li class="right" >
|
||||
<a href="Evaluation.html" title="Evaluation"
|
||||
<a href="Protocols.html" title="Protocols"
|
||||
>previous</a> |</li>
|
||||
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.6 documentation</a> »</li>
|
||||
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.7 documentation</a> »</li>
|
||||
<li class="nav-item nav-item-this"><a href="">Quantification Methods</a></li>
|
||||
</ul>
|
||||
</div>
|
||||
<div class="footer" role="contentinfo">
|
||||
© Copyright 2021, Alejandro Moreo.
|
||||
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 4.2.0.
|
||||
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 5.3.0.
|
||||
</div>
|
||||
</body>
|
||||
</html>
|
|
@ -2,21 +2,26 @@
|
|||
|
||||
<!doctype html>
|
||||
|
||||
<html>
|
||||
<html lang="en">
|
||||
<head>
|
||||
<meta charset="utf-8" />
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
|
||||
<title>Model Selection — QuaPy 0.1.6 documentation</title>
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="generator" content="Docutils 0.19: https://docutils.sourceforge.io/" />
|
||||
|
||||
<title>Model Selection — QuaPy 0.1.7 documentation</title>
|
||||
<link rel="stylesheet" type="text/css" href="_static/pygments.css" />
|
||||
<link rel="stylesheet" type="text/css" href="_static/bizstyle.css" />
|
||||
|
||||
<script data-url_root="./" id="documentation_options" src="_static/documentation_options.js"></script>
|
||||
<script src="_static/jquery.js"></script>
|
||||
<script src="_static/underscore.js"></script>
|
||||
<script src="_static/_sphinx_javascript_frameworks_compat.js"></script>
|
||||
<script src="_static/doctools.js"></script>
|
||||
<script src="_static/sphinx_highlight.js"></script>
|
||||
<script src="_static/bizstyle.js"></script>
|
||||
<link rel="index" title="Index" href="genindex.html" />
|
||||
<link rel="search" title="Search" href="search.html" />
|
||||
<link rel="next" title="Plotting" href="Plotting.html" />
|
||||
<link rel="prev" title="Quantification Methods" href="Methods.html" />
|
||||
<meta name="viewport" content="width=device-width,initial-scale=1.0" />
|
||||
<!--[if lt IE 9]>
|
||||
<script src="_static/css3-mediaqueries.js"></script>
|
||||
|
@ -31,7 +36,13 @@
|
|||
<li class="right" >
|
||||
<a href="py-modindex.html" title="Python Module Index"
|
||||
>modules</a> |</li>
|
||||
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.6 documentation</a> »</li>
|
||||
<li class="right" >
|
||||
<a href="Plotting.html" title="Plotting"
|
||||
accesskey="N">next</a> |</li>
|
||||
<li class="right" >
|
||||
<a href="Methods.html" title="Quantification Methods"
|
||||
accesskey="P">previous</a> |</li>
|
||||
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.7 documentation</a> »</li>
|
||||
<li class="nav-item nav-item-this"><a href="">Model Selection</a></li>
|
||||
</ul>
|
||||
</div>
|
||||
|
@ -41,8 +52,8 @@
|
|||
<div class="bodywrapper">
|
||||
<div class="body" role="main">
|
||||
|
||||
<div class="tex2jax_ignore mathjax_ignore section" id="model-selection">
|
||||
<h1>Model Selection<a class="headerlink" href="#model-selection" title="Permalink to this headline">¶</a></h1>
|
||||
<section id="model-selection">
|
||||
<h1>Model Selection<a class="headerlink" href="#model-selection" title="Permalink to this heading">¶</a></h1>
|
||||
<p>As a supervised machine learning task, quantification methods
|
||||
can strongly depend on a good choice of model hyper-parameters.
|
||||
The process whereby those hyper-parameters are chosen is
|
||||
|
@ -50,8 +61,8 @@ typically known as <em>Model Selection</em>, and typically consists of
|
|||
testing different settings and picking the one that performed
|
||||
best in a held-out validation set in terms of any given
|
||||
evaluation measure.</p>
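<p>The generic procedure described above can be sketched as follows. This is a minimal illustration, not the implementation of <em>GridSearchQ</em>; the function <em>grid_search</em> and the toy error function are hypothetical names, and the latter stands in for "train with these settings, then measure the error on held-out data".</p>

```python
from itertools import product


def grid_search(param_grid, evaluate):
    """Return the configuration with the lowest validation error.

    param_grid maps each hyper-parameter name to its candidate values;
    evaluate(params) returns the error measured on held-out data.
    """
    names = sorted(param_grid)
    best_params, best_error = None, float('inf')
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        error = evaluate(params)
        if error < best_error:
            best_params, best_error = params, error
    return best_params, best_error


# toy stand-in for training a model and scoring it on validation data
def toy_validation_error(params):
    return abs(params['C'] - 1.0) + abs(params['nbins'] - 16) / 100


param_grid = {'C': [0.01, 0.1, 1.0, 10.0], 'nbins': [8, 16, 32]}
best_params, best_error = grid_search(param_grid, toy_validation_error)
```

<p>What changes between classification and quantification is not this loop, but what the evaluation function measures and on which samples.</p>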
|
||||
<div class="section" id="targeting-a-quantification-oriented-loss">
|
||||
<h2>Targeting a Quantification-oriented loss<a class="headerlink" href="#targeting-a-quantification-oriented-loss" title="Permalink to this headline">¶</a></h2>
|
||||
<section id="targeting-a-quantification-oriented-loss">
|
||||
<h2>Targeting a Quantification-oriented loss<a class="headerlink" href="#targeting-a-quantification-oriented-loss" title="Permalink to this heading">¶</a></h2>
|
||||
<p>The task being optimized determines the evaluation protocol,
|
||||
i.e., the criteria according to which the performance of
|
||||
any given method for solving it is to be assessed.
|
||||
|
@ -63,81 +74,91 @@ specifically designed for the task of quantification.</p>
|
|||
classification, and thus the model selection strategies
|
||||
customarily adopted in classification have simply been
|
||||
applied to quantification (see the next section).
|
||||
It has been argued in <em>Moreo, Alejandro, and Fabrizio Sebastiani.
|
||||
“Re-Assessing the” Classify and Count” Quantification Method.”
|
||||
arXiv preprint arXiv:2011.02552 (2020).</em>
|
||||
It has been argued in <a class="reference external" href="https://link.springer.com/chapter/10.1007/978-3-030-72240-1_6">Moreo, Alejandro, and Fabrizio Sebastiani.
|
||||
Re-Assessing the “Classify and Count” Quantification Method.
|
||||
ECIR 2021: Advances in Information Retrieval pp 75–91.</a>
|
||||
that specific model selection strategies should
|
||||
be adopted for quantification. That is, model selection
|
||||
strategies for quantification should target
|
||||
quantification-oriented losses and be tested in a variety
|
||||
of scenarios exhibiting different degrees of prior
|
||||
probability shift.</p>
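<p>One way to obtain scenarios with different degrees of prior probability shift is to draw validation samples at prevalence values fixed in advance. The following is a minimal sketch of that idea for the binary case, using only the standard library. It is not QuaPy's protocol implementation; the function names, the toy label pool, and the choice of sampling with replacement are all assumptions made for illustration.</p>

```python
import random


def prevalence_grid(n_points):
    """Evenly spaced positive-class prevalences in [0, 1], e.g. 0, 0.1, ..., 1."""
    return [i / (n_points - 1) for i in range(n_points)]


def draw_sample(labels, pos_prevalence, sample_size, rng):
    """Draw indices of a binary sample whose positive prevalence is fixed in advance."""
    positives = [i for i, y in enumerate(labels) if y == 1]
    negatives = [i for i, y in enumerate(labels) if y == 0]
    n_pos = round(pos_prevalence * sample_size)
    # sampling with replacement, so that a small pool can serve any prevalence
    chosen = rng.choices(positives, k=n_pos) + rng.choices(negatives, k=sample_size - n_pos)
    rng.shuffle(chosen)
    return chosen


rng = random.Random(0)
labels = [1] * 50 + [0] * 150   # a toy validation pool with 25% positives
samples = {p: draw_sample(labels, p, 100, rng) for p in prevalence_grid(11)}
```

<p>Each candidate configuration would then be scored by averaging a quantification-oriented error over all such samples, rather than on a single validation set that shares the training prevalence.</p>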
|
||||
<p>The class
|
||||
<em>qp.model_selection.GridSearchQ</em>
|
||||
implements a grid-search exploration over the space of
|
||||
hyper-parameter combinations that evaluates each<br />
|
||||
combination of hyper-parameters
|
||||
by means of a given quantification-oriented
|
||||
<p>The class <em>qp.model_selection.GridSearchQ</em> implements a grid-search exploration over the space of
|
||||
hyper-parameter combinations that <a class="reference external" href="https://github.com/HLT-ISTI/QuaPy/wiki/Evaluation">evaluates</a>
|
||||
each combination of hyper-parameters by means of a given quantification-oriented
|
||||
error metric (e.g., any of the error functions implemented
|
||||
in <em>qp.error</em>) and according to the
|
||||
<a class="reference external" href="https://github.com/HLT-ISTI/QuaPy/wiki/Evaluation"><em>artificial sampling protocol</em></a>.</p>
|
||||
<p>The following is an example of model selection for quantification:</p>
|
||||
in <em>qp.error</em>) and according to a
|
||||
<a class="reference external" href="https://github.com/HLT-ISTI/QuaPy/wiki/Protocols">sampling generation protocol</a>.</p>
|
||||
<p>The following is an example (also included in the examples folder) of model selection for quantification:</p>
|
||||
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">quapy</span> <span class="k">as</span> <span class="nn">qp</span>
|
||||
<span class="kn">from</span> <span class="nn">quapy.method.aggregative</span> <span class="kn">import</span> <span class="n">PCC</span>
|
||||
<span class="kn">from</span> <span class="nn">quapy.protocol</span> <span class="kn">import</span> <span class="n">APP</span>
|
||||
<span class="kn">from</span> <span class="nn">quapy.method.aggregative</span> <span class="kn">import</span> <span class="n">DistributionMatching</span>
|
||||
<span class="kn">from</span> <span class="nn">sklearn.linear_model</span> <span class="kn">import</span> <span class="n">LogisticRegression</span>
|
||||
<span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span>
|
||||
|
||||
<span class="c1"># set a seed to replicate runs</span>
|
||||
<span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">seed</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
|
||||
<span class="n">qp</span><span class="o">.</span><span class="n">environ</span><span class="p">[</span><span class="s1">'SAMPLE_SIZE'</span><span class="p">]</span> <span class="o">=</span> <span class="mi">500</span>
|
||||
<span class="sd">"""</span>
|
||||
<span class="sd">In this example, we show how to perform model selection on a DistributionMatching quantifier.</span>
|
||||
<span class="sd">"""</span>
|
||||
|
||||
<span class="n">dataset</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">datasets</span><span class="o">.</span><span class="n">fetch_reviews</span><span class="p">(</span><span class="s1">'hp'</span><span class="p">,</span> <span class="n">tfidf</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">min_df</span><span class="o">=</span><span class="mi">5</span><span class="p">)</span>
|
||||
<span class="n">model</span> <span class="o">=</span> <span class="n">DistributionMatching</span><span class="p">(</span><span class="n">LogisticRegression</span><span class="p">())</span>
|
||||
|
||||
<span class="n">qp</span><span class="o">.</span><span class="n">environ</span><span class="p">[</span><span class="s1">'SAMPLE_SIZE'</span><span class="p">]</span> <span class="o">=</span> <span class="mi">100</span>
|
||||
<span class="n">qp</span><span class="o">.</span><span class="n">environ</span><span class="p">[</span><span class="s1">'N_JOBS'</span><span class="p">]</span> <span class="o">=</span> <span class="o">-</span><span class="mi">1</span> <span class="c1"># explore hyper-parameters in parallel</span>
|
||||
|
||||
<span class="n">training</span><span class="p">,</span> <span class="n">test</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">datasets</span><span class="o">.</span><span class="n">fetch_reviews</span><span class="p">(</span><span class="s1">'imdb'</span><span class="p">,</span> <span class="n">tfidf</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">min_df</span><span class="o">=</span><span class="mi">5</span><span class="p">)</span><span class="o">.</span><span class="n">train_test</span>
|
||||
|
||||
<span class="c1"># The model will be returned by the fit method of GridSearchQ.</span>
|
||||
<span class="c1"># Model selection will be performed with a fixed budget of 1000 evaluations</span>
|
||||
<span class="c1"># for each hyper-parameter combination. The error to optimize is the MAE for</span>
|
||||
<span class="c1"># quantification, as evaluated on artificially drawn samples at prevalences </span>
|
||||
<span class="c1"># covering the entire spectrum on a held-out portion (40%) of the training set.</span>
|
||||
<span class="c1"># Every combination of hyper-parameters will be evaluated by confronting the</span>
|
||||
<span class="c1"># quantifier thus configured against a series of samples generated by means</span>
|
||||
<span class="c1"># of a sample generation protocol. For this example, we will use the</span>
|
||||
<span class="c1"># artificial-prevalence protocol (APP), that generates samples with prevalence</span>
|
||||
<span class="c1"># values in the entire range of values from a grid (e.g., [0, 0.1, 0.2, ..., 1]).</span>
|
||||
<span class="c1"># We devote 30% of the training set to this exploration.</span>
|
||||
<span class="n">training</span><span class="p">,</span> <span class="n">validation</span> <span class="o">=</span> <span class="n">training</span><span class="o">.</span><span class="n">split_stratified</span><span class="p">(</span><span class="n">train_prop</span><span class="o">=</span><span class="mf">0.7</span><span class="p">)</span>
|
||||
<span class="n">protocol</span> <span class="o">=</span> <span class="n">APP</span><span class="p">(</span><span class="n">validation</span><span class="p">)</span>
|
||||
|
||||
<span class="c1"># We will explore a classification-dependent hyper-parameter (e.g., the 'C'</span>
|
||||
<span class="c1"># hyper-parameter of LogisticRegression) and a quantification-dependent hyper-parameter</span>
|
||||
<span class="c1"># (e.g., the number of bins in a DistributionMatching quantifier).</span>
|
||||
<span class="c1"># Classifier-dependent hyper-parameters have to be marked with a prefix "classifier__"</span>
|
||||
<span class="c1"># in order to let the quantifier know this hyper-parameter belongs to its underlying</span>
|
||||
<span class="c1"># classifier.</span>
|
||||
<span class="n">param_grid</span> <span class="o">=</span> <span class="p">{</span>
|
||||
<span class="s1">'classifier__C'</span><span class="p">:</span> <span class="n">np</span><span class="o">.</span><span class="n">logspace</span><span class="p">(</span><span class="o">-</span><span class="mi">3</span><span class="p">,</span><span class="mi">3</span><span class="p">,</span><span class="mi">7</span><span class="p">),</span>
|
||||
<span class="s1">'nbins'</span><span class="p">:</span> <span class="p">[</span><span class="mi">8</span><span class="p">,</span> <span class="mi">16</span><span class="p">,</span> <span class="mi">32</span><span class="p">,</span> <span class="mi">64</span><span class="p">],</span>
|
||||
<span class="p">}</span>
|
||||
|
||||
<span class="n">model</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">model_selection</span><span class="o">.</span><span class="n">GridSearchQ</span><span class="p">(</span>
|
||||
<span class="n">model</span><span class="o">=</span><span class="n">PCC</span><span class="p">(</span><span class="n">LogisticRegression</span><span class="p">()),</span>
|
||||
<span class="n">param_grid</span><span class="o">=</span><span class="p">{</span><span class="s1">'C'</span><span class="p">:</span> <span class="n">np</span><span class="o">.</span><span class="n">logspace</span><span class="p">(</span><span class="o">-</span><span class="mi">4</span><span class="p">,</span><span class="mi">5</span><span class="p">,</span><span class="mi">10</span><span class="p">),</span> <span class="s1">'class_weight'</span><span class="p">:</span> <span class="p">[</span><span class="s1">'balanced'</span><span class="p">,</span> <span class="kc">None</span><span class="p">]},</span>
|
||||
<span class="n">sample_size</span><span class="o">=</span><span class="n">qp</span><span class="o">.</span><span class="n">environ</span><span class="p">[</span><span class="s1">'SAMPLE_SIZE'</span><span class="p">],</span>
|
||||
<span class="n">eval_budget</span><span class="o">=</span><span class="mi">1000</span><span class="p">,</span>
|
||||
<span class="n">error</span><span class="o">=</span><span class="s1">'mae'</span><span class="p">,</span>
|
||||
<span class="n">refit</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="c1"># retrain on the whole labelled set</span>
|
||||
<span class="n">val_split</span><span class="o">=</span><span class="mf">0.4</span><span class="p">,</span>
|
||||
<span class="n">model</span><span class="o">=</span><span class="n">model</span><span class="p">,</span>
|
||||
<span class="n">param_grid</span><span class="o">=</span><span class="n">param_grid</span><span class="p">,</span>
|
||||
<span class="n">protocol</span><span class="o">=</span><span class="n">protocol</span><span class="p">,</span>
|
||||
<span class="n">error</span><span class="o">=</span><span class="s1">'mae'</span><span class="p">,</span> <span class="c1"># the error to optimize is the MAE (a quantification-oriented loss)</span>
|
||||
<span class="n">refit</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="c1"># retrain on the whole labelled set once done</span>
|
||||
<span class="n">verbose</span><span class="o">=</span><span class="kc">True</span> <span class="c1"># show information as the process goes on</span>
|
||||
<span class="p">)</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">dataset</span><span class="o">.</span><span class="n">training</span><span class="p">)</span>
|
||||
<span class="p">)</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">training</span><span class="p">)</span>
|
||||
|
||||
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s1">'model selection ended: best hyper-parameters=</span><span class="si">{</span><span class="n">model</span><span class="o">.</span><span class="n">best_params_</span><span class="si">}</span><span class="s1">'</span><span class="p">)</span>
|
||||
<span class="n">model</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">best_model_</span>
|
||||
|
||||
<span class="c1"># evaluation in terms of MAE</span>
|
||||
<span class="n">results</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">evaluation</span><span class="o">.</span><span class="n">artificial_sampling_eval</span><span class="p">(</span>
|
||||
<span class="n">model</span><span class="p">,</span>
|
||||
<span class="n">dataset</span><span class="o">.</span><span class="n">test</span><span class="p">,</span>
|
||||
<span class="n">sample_size</span><span class="o">=</span><span class="n">qp</span><span class="o">.</span><span class="n">environ</span><span class="p">[</span><span class="s1">'SAMPLE_SIZE'</span><span class="p">],</span>
|
||||
<span class="n">n_prevpoints</span><span class="o">=</span><span class="mi">101</span><span class="p">,</span>
|
||||
<span class="n">n_repetitions</span><span class="o">=</span><span class="mi">10</span><span class="p">,</span>
|
||||
<span class="n">error_metric</span><span class="o">=</span><span class="s1">'mae'</span>
|
||||
<span class="p">)</span>
|
||||
<span class="c1"># we use the same evaluation protocol (APP) on the test set</span>
|
||||
<span class="n">mae_score</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">evaluation</span><span class="o">.</span><span class="n">evaluate</span><span class="p">(</span><span class="n">model</span><span class="p">,</span> <span class="n">protocol</span><span class="o">=</span><span class="n">APP</span><span class="p">(</span><span class="n">test</span><span class="p">),</span> <span class="n">error_metric</span><span class="o">=</span><span class="s1">'mae'</span><span class="p">)</span>
|
||||
|
||||
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s1">'MAE=</span><span class="si">{</span><span class="n">results</span><span class="si">:</span><span class="s1">.5f</span><span class="si">}</span><span class="s1">'</span><span class="p">)</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s1">'MAE=</span><span class="si">{</span><span class="n">mae_score</span><span class="si">:</span><span class="s1">.5f</span><span class="si">}</span><span class="s1">'</span><span class="p">)</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>In this example, the system outputs:</p>
|
||||
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>[GridSearchQ]: starting optimization with n_jobs=1
|
||||
[GridSearchQ]: checking hyperparams={'C': 0.0001, 'class_weight': 'balanced'} got mae score 0.24987
|
||||
[GridSearchQ]: checking hyperparams={'C': 0.0001, 'class_weight': None} got mae score 0.48135
|
||||
[GridSearchQ]: checking hyperparams={'C': 0.001, 'class_weight': 'balanced'} got mae score 0.24866
|
||||
[...]
|
||||
[GridSearchQ]: checking hyperparams={'C': 100000.0, 'class_weight': None} got mae score 0.43676
|
||||
[GridSearchQ]: optimization finished: best params {'C': 0.1, 'class_weight': 'balanced'} (score=0.19982)
|
||||
[GridSearchQ]: refitting on the whole development set
|
||||
model selection ended: best hyper-parameters={'C': 0.1, 'class_weight': 'balanced'}
|
||||
1010 evaluations will be performed for each combination of hyper-parameters
|
||||
[artificial sampling protocol] generating predictions: 100%|██████████| 1010/1010 [00:00<00:00, 5005.54it/s]
|
||||
MAE=0.20342
|
||||
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="p">[</span><span class="n">GridSearchQ</span><span class="p">]:</span> <span class="n">starting</span> <span class="n">model</span> <span class="n">selection</span> <span class="k">with</span> <span class="bp">self</span><span class="o">.</span><span class="n">n_jobs</span> <span class="o">=-</span><span class="mi">1</span>
|
||||
<span class="p">[</span><span class="n">GridSearchQ</span><span class="p">]:</span> <span class="n">hyperparams</span><span class="o">=</span><span class="p">{</span><span class="s1">'classifier__C'</span><span class="p">:</span> <span class="mf">0.01</span><span class="p">,</span> <span class="s1">'nbins'</span><span class="p">:</span> <span class="mi">64</span><span class="p">}</span> <span class="n">got</span> <span class="n">mae</span> <span class="n">score</span> <span class="mf">0.04021</span> <span class="p">[</span><span class="n">took</span> <span class="mf">1.1356</span><span class="n">s</span><span class="p">]</span>
|
||||
<span class="p">[</span><span class="n">GridSearchQ</span><span class="p">]:</span> <span class="n">hyperparams</span><span class="o">=</span><span class="p">{</span><span class="s1">'classifier__C'</span><span class="p">:</span> <span class="mf">0.01</span><span class="p">,</span> <span class="s1">'nbins'</span><span class="p">:</span> <span class="mi">32</span><span class="p">}</span> <span class="n">got</span> <span class="n">mae</span> <span class="n">score</span> <span class="mf">0.04286</span> <span class="p">[</span><span class="n">took</span> <span class="mf">1.2139</span><span class="n">s</span><span class="p">]</span>
|
||||
<span class="p">[</span><span class="n">GridSearchQ</span><span class="p">]:</span> <span class="n">hyperparams</span><span class="o">=</span><span class="p">{</span><span class="s1">'classifier__C'</span><span class="p">:</span> <span class="mf">0.01</span><span class="p">,</span> <span class="s1">'nbins'</span><span class="p">:</span> <span class="mi">16</span><span class="p">}</span> <span class="n">got</span> <span class="n">mae</span> <span class="n">score</span> <span class="mf">0.04888</span> <span class="p">[</span><span class="n">took</span> <span class="mf">1.2491</span><span class="n">s</span><span class="p">]</span>
|
||||
<span class="p">[</span><span class="n">GridSearchQ</span><span class="p">]:</span> <span class="n">hyperparams</span><span class="o">=</span><span class="p">{</span><span class="s1">'classifier__C'</span><span class="p">:</span> <span class="mf">0.001</span><span class="p">,</span> <span class="s1">'nbins'</span><span class="p">:</span> <span class="mi">8</span><span class="p">}</span> <span class="n">got</span> <span class="n">mae</span> <span class="n">score</span> <span class="mf">0.05163</span> <span class="p">[</span><span class="n">took</span> <span class="mf">1.5372</span><span class="n">s</span><span class="p">]</span>
|
||||
<span class="p">[</span><span class="o">...</span><span class="p">]</span>
|
||||
<span class="p">[</span><span class="n">GridSearchQ</span><span class="p">]:</span> <span class="n">hyperparams</span><span class="o">=</span><span class="p">{</span><span class="s1">'classifier__C'</span><span class="p">:</span> <span class="mf">1000.0</span><span class="p">,</span> <span class="s1">'nbins'</span><span class="p">:</span> <span class="mi">32</span><span class="p">}</span> <span class="n">got</span> <span class="n">mae</span> <span class="n">score</span> <span class="mf">0.02445</span> <span class="p">[</span><span class="n">took</span> <span class="mf">2.9056</span><span class="n">s</span><span class="p">]</span>
|
||||
<span class="p">[</span><span class="n">GridSearchQ</span><span class="p">]:</span> <span class="n">optimization</span> <span class="n">finished</span><span class="p">:</span> <span class="n">best</span> <span class="n">params</span> <span class="p">{</span><span class="s1">'classifier__C'</span><span class="p">:</span> <span class="mf">100.0</span><span class="p">,</span> <span class="s1">'nbins'</span><span class="p">:</span> <span class="mi">32</span><span class="p">}</span> <span class="p">(</span><span class="n">score</span><span class="o">=</span><span class="mf">0.02234</span><span class="p">)</span> <span class="p">[</span><span class="n">took</span> <span class="mf">7.3114</span><span class="n">s</span><span class="p">]</span>
|
||||
<span class="p">[</span><span class="n">GridSearchQ</span><span class="p">]:</span> <span class="n">refitting</span> <span class="n">on</span> <span class="n">the</span> <span class="n">whole</span> <span class="n">development</span> <span class="nb">set</span>
|
||||
<span class="n">model</span> <span class="n">selection</span> <span class="n">ended</span><span class="p">:</span> <span class="n">best</span> <span class="n">hyper</span><span class="o">-</span><span class="n">parameters</span><span class="o">=</span><span class="p">{</span><span class="s1">'classifier__C'</span><span class="p">:</span> <span class="mf">100.0</span><span class="p">,</span> <span class="s1">'nbins'</span><span class="p">:</span> <span class="mi">32</span><span class="p">}</span>
|
||||
<span class="n">MAE</span><span class="o">=</span><span class="mf">0.03102</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>The parameter <em>val_split</em> can alternatively be used to indicate
|
||||
|
@ -145,9 +166,9 @@ a validation set (i.e., an instance of <em>LabelledCollection</em>) instead
|
|||
of a proportion. This could be useful if one wants to have control
|
||||
on the specific data split to be used across different model selection
|
||||
experiments.</p>
|
||||
</div>
|
||||
<div class="section" id="targeting-a-classification-oriented-loss">
|
||||
<h2>Targeting a Classification-oriented loss<a class="headerlink" href="#targeting-a-classification-oriented-loss" title="Permalink to this headline">¶</a></h2>
|
||||
</section>
|
||||
<section id="targeting-a-classification-oriented-loss">
|
||||
<h2>Targeting a Classification-oriented loss<a class="headerlink" href="#targeting-a-classification-oriented-loss" title="Permalink to this heading">¶</a></h2>
|
||||
<p>Optimizing a model for quantification can be rather
|
||||
computationally costly.
|
||||
In aggregative methods, one could alternatively try to optimize
|
||||
|
@ -161,32 +182,15 @@ The following code illustrates how to do that:</p>
|
|||
<span class="n">LogisticRegression</span><span class="p">(),</span>
|
||||
<span class="n">param_grid</span><span class="o">=</span><span class="p">{</span><span class="s1">'C'</span><span class="p">:</span> <span class="n">np</span><span class="o">.</span><span class="n">logspace</span><span class="p">(</span><span class="o">-</span><span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">10</span><span class="p">),</span> <span class="s1">'class_weight'</span><span class="p">:</span> <span class="p">[</span><span class="s1">'balanced'</span><span class="p">,</span> <span class="kc">None</span><span class="p">]},</span>
|
||||
<span class="n">cv</span><span class="o">=</span><span class="mi">5</span><span class="p">)</span>
|
||||
<span class="n">model</span> <span class="o">=</span> <span class="n">PCC</span><span class="p">(</span><span class="n">learner</span><span class="p">)</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">dataset</span><span class="o">.</span><span class="n">training</span><span class="p">)</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s1">'model selection ended: best hyper-parameters=</span><span class="si">{</span><span class="n">model</span><span class="o">.</span><span class="n">learner</span><span class="o">.</span><span class="n">best_params_</span><span class="si">}</span><span class="s1">'</span><span class="p">)</span>
|
||||
<span class="n">model</span> <span class="o">=</span> <span class="n">DistributionMatching</span><span class="p">(</span><span class="n">learner</span><span class="p">)</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">dataset</span><span class="o">.</span><span class="n">training</span><span class="p">)</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>In this example, the system outputs:</p>
|
||||
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>model selection ended: best hyper-parameters={'C': 10000.0, 'class_weight': None}
|
||||
1010 evaluations will be performed for each combination of hyper-parameters
|
||||
[artificial sampling protocol] generating predictions: 100%|██████████| 1010/1010 [00:00<00:00, 5379.55it/s]
|
||||
MAE=0.41734
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>Note that the MAE is worse than the one we obtained when optimizing
|
||||
for quantification and, indeed, the hyper-parameters found optimal
|
||||
largely differ between the two selection modalities. The
|
||||
hyper-parameters C=10000 and class_weight=None have been found
|
||||
to work well for the specific training prevalence of the HP dataset,
|
||||
but these hyper-parameters turned out to be suboptimal when the
|
||||
class prevalences of the test set differ (as is indeed tested
|
||||
in scenarios of quantification).</p>
|
||||
<p>This is, however, not always the case, and one could, in practice,
|
||||
find examples
|
||||
in which optimizing for classification ends up resulting in a better
|
||||
quantifier than when optimizing for quantification.
|
||||
Nonetheless, this is theoretically unlikely to happen.</p>
|
||||
</div>
|
||||
</div>
|
||||
<p>However, this is conceptually flawed, since the model should be
|
||||
optimized for the task at hand (quantification), and not for a surrogate task (classification),
|
||||
i.e., the model should be required to deliver low quantification errors, rather
|
||||
than low classification errors.</p>
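<p>As a rough illustration of what a quantification-oriented error measures, the following is a minimal pure-Python sketch of the mean absolute error (MAE) between true and estimated prevalence vectors. The helper names <em>absolute_error</em> and <em>mean_absolute_error</em> are hypothetical and are not part of the QuaPy API, and the prevalence values are made up for the example.</p>

```python
def absolute_error(true_prev, estim_prev):
    """Absolute error of one sample: mean over classes of |p - p_hat|."""
    if len(true_prev) != len(estim_prev):
        raise ValueError("prevalence vectors must have the same number of classes")
    return sum(abs(p - q) for p, q in zip(true_prev, estim_prev)) / len(true_prev)


def mean_absolute_error(true_prevs, estim_prevs):
    """MAE: the absolute error averaged over all evaluated samples."""
    errors = [absolute_error(p, q) for p, q in zip(true_prevs, estim_prevs)]
    return sum(errors) / len(errors)


# illustrative (made-up) prevalences for three binary test samples
true_prevs = [[0.2, 0.8], [0.5, 0.5], [0.9, 0.1]]
estim_prevs = [[0.3, 0.7], [0.5, 0.5], [0.7, 0.3]]
print(f'MAE={mean_absolute_error(true_prevs, estim_prevs):.5f}')
```

<p>Note that the error is computed on class prevalences only: individual predicted labels never enter the computation, which is why a classifier tuned for accuracy need not minimize it.</p>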
|
||||
</section>
|
||||
</section>
|
||||
|
||||
|
||||
<div class="clearer"></div>
|
||||
|
@ -195,8 +199,9 @@ Nonetheless, this is theoretically unlikely to happen.</p>
|
|||
</div>
|
||||
<div class="sphinxsidebar" role="navigation" aria-label="main navigation">
|
||||
<div class="sphinxsidebarwrapper">
|
||||
<h3><a href="index.html">Table of Contents</a></h3>
|
||||
<ul>
|
||||
<div>
|
||||
<h3><a href="index.html">Table of Contents</a></h3>
|
||||
<ul>
|
||||
<li><a class="reference internal" href="#">Model Selection</a><ul>
|
||||
<li><a class="reference internal" href="#targeting-a-quantification-oriented-loss">Targeting a Quantification-oriented loss</a></li>
|
||||
<li><a class="reference internal" href="#targeting-a-classification-oriented-loss">Targeting a Classification-oriented loss</a></li>
|
||||
|
@ -204,6 +209,17 @@ Nonetheless, this is theoretically unlikely to happen.</p>
|
|||
</li>
|
||||
</ul>
|
||||
|
||||
</div>
|
||||
<div>
|
||||
<h4>Previous topic</h4>
|
||||
<p class="topless"><a href="Methods.html"
|
||||
title="previous chapter">Quantification Methods</a></p>
|
||||
</div>
|
||||
<div>
|
||||
<h4>Next topic</h4>
|
||||
<p class="topless"><a href="Plotting.html"
|
||||
title="next chapter">Plotting</a></p>
|
||||
</div>
|
||||
<div role="note" aria-label="source link">
|
||||
<h3>This Page</h3>
|
||||
<ul class="this-page-menu">
|
||||
|
@ -220,7 +236,7 @@ Nonetheless, this is theoretically unlikely to happen.</p>
|
|||
</form>
|
||||
</div>
|
||||
</div>
|
||||
<script>$('#searchbox').show(0);</script>
|
||||
<script>document.getElementById('searchbox').style.display = "block"</script>
|
||||
</div>
|
||||
</div>
|
||||
<div class="clearer"></div>
|
||||
|
@ -234,13 +250,19 @@ Nonetheless, this is theoretically unlikely to happen.</p>
|
|||
<li class="right" >
|
||||
<a href="py-modindex.html" title="Python Module Index"
|
||||
>modules</a> |</li>
|
||||
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.6 documentation</a> »</li>
|
||||
<li class="right" >
|
||||
<a href="Plotting.html" title="Plotting"
|
||||
>next</a> |</li>
|
||||
<li class="right" >
|
||||
<a href="Methods.html" title="Quantification Methods"
|
||||
>previous</a> |</li>
|
||||
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.7 documentation</a> »</li>
|
||||
<li class="nav-item nav-item-this"><a href="">Model Selection</a></li>
|
||||
</ul>
|
||||
</div>
|
||||
<div class="footer" role="contentinfo">
|
||||
© Copyright 2021, Alejandro Moreo.
|
||||
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 4.2.0.
|
||||
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 5.3.0.
|
||||
</div>
|
||||
</body>
|
||||
</html>
|
|
@ -2,23 +2,26 @@
|
|||
|
||||
<!doctype html>
|
||||
|
||||
<html>
|
||||
<html lang="en">
|
||||
<head>
|
||||
<meta charset="utf-8" />
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
|
||||
<title>Plotting — QuaPy 0.1.6 documentation</title>
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="generator" content="Docutils 0.19: https://docutils.sourceforge.io/" />
|
||||
|
||||
<title>Plotting — QuaPy 0.1.7 documentation</title>
|
||||
<link rel="stylesheet" type="text/css" href="_static/pygments.css" />
|
||||
<link rel="stylesheet" type="text/css" href="_static/bizstyle.css" />
|
||||
|
||||
<script data-url_root="./" id="documentation_options" src="_static/documentation_options.js"></script>
|
||||
<script src="_static/jquery.js"></script>
|
||||
<script src="_static/underscore.js"></script>
|
||||
<script src="_static/_sphinx_javascript_frameworks_compat.js"></script>
|
||||
<script src="_static/doctools.js"></script>
|
||||
<script src="_static/sphinx_highlight.js"></script>
|
||||
<script src="_static/bizstyle.js"></script>
|
||||
<link rel="index" title="Index" href="genindex.html" />
|
||||
<link rel="search" title="Search" href="search.html" />
|
||||
<link rel="next" title="quapy" href="modules.html" />
|
||||
<link rel="prev" title="Quantification Methods" href="Methods.html" />
|
||||
<link rel="prev" title="Model Selection" href="Model-Selection.html" />
|
||||
<meta name="viewport" content="width=device-width,initial-scale=1.0" />
|
||||
<!--[if lt IE 9]>
|
||||
<script src="_static/css3-mediaqueries.js"></script>
|
||||
|
@ -37,9 +40,9 @@
|
|||
<a href="modules.html" title="quapy"
|
||||
accesskey="N">next</a> |</li>
|
||||
<li class="right" >
|
||||
<a href="Methods.html" title="Quantification Methods"
|
||||
<a href="Model-Selection.html" title="Model Selection"
|
||||
accesskey="P">previous</a> |</li>
|
||||
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.6 documentation</a> »</li>
|
||||
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.7 documentation</a> »</li>
|
||||
<li class="nav-item nav-item-this"><a href="">Plotting</a></li>
|
||||
</ul>
|
||||
</div>
|
||||
|
@ -49,8 +52,8 @@
|
|||
<div class="bodywrapper">
|
||||
<div class="body" role="main">
|
||||
|
||||
<div class="tex2jax_ignore mathjax_ignore section" id="plotting">
|
||||
<h1>Plotting<a class="headerlink" href="#plotting" title="Permalink to this headline">¶</a></h1>
|
||||
<section id="plotting">
|
||||
<h1>Plotting<a class="headerlink" href="#plotting" title="Permalink to this heading">¶</a></h1>
|
||||
<p>The module <em>qp.plot</em> implements some basic plotting functions
|
||||
that can help analyse the performance of a quantification method.</p>
|
||||
<p>All plotting functions receive as inputs the outcomes of
|
||||
|
@ -91,7 +94,7 @@ quantification methods across different scenarios showcasing
|
|||
the accuracy of the quantifier in predicting class prevalences
|
||||
for a wide range of prior distributions. This can easily be
|
||||
achieved by means of the
|
||||
<a class="reference external" href="https://github.com/HLT-ISTI/QuaPy/wiki/Evaluation">artificial sampling protocol</a>
|
||||
<a class="reference external" href="https://github.com/HLT-ISTI/QuaPy/wiki/Protocols">artificial sampling protocol</a>
|
||||
that is implemented in QuaPy.</p>
|
||||
<p>The following code shows how to perform one simple experiment
|
||||
in which the 4 <em>CC-variants</em>, all equipped with a linear SVM, are
|
||||
|
@ -100,6 +103,7 @@ tested across the entire spectrum of class priors (taking 21 splits
|
|||
of the interval [0,1], i.e., using prevalence steps of 0.05, and
|
||||
generating 100 random samples at each prevalence).</p>
|
||||
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">quapy</span> <span class="k">as</span> <span class="nn">qp</span>
|
||||
<span class="kn">from</span> <span class="nn">quapy.protocol</span> <span class="kn">import</span> <span class="n">APP</span>
|
||||
<span class="kn">from</span> <span class="nn">quapy.method.aggregative</span> <span class="kn">import</span> <span class="n">CC</span><span class="p">,</span> <span class="n">ACC</span><span class="p">,</span> <span class="n">PCC</span><span class="p">,</span> <span class="n">PACC</span>
|
||||
<span class="kn">from</span> <span class="nn">sklearn.svm</span> <span class="kn">import</span> <span class="n">LinearSVC</span>
|
||||
|
||||
|
@ -108,28 +112,26 @@ generating 100 random samples at each prevalence).</p>
|
|||
<span class="k">def</span> <span class="nf">gen_data</span><span class="p">():</span>
|
||||
|
||||
<span class="k">def</span> <span class="nf">base_classifier</span><span class="p">():</span>
|
||||
<span class="k">return</span> <span class="n">LinearSVC</span><span class="p">()</span>
|
||||
<span class="k">return</span> <span class="n">LinearSVC</span><span class="p">(</span><span class="n">class_weight</span><span class="o">=</span><span class="s1">'balanced'</span><span class="p">)</span>
|
||||
|
||||
<span class="k">def</span> <span class="nf">models</span><span class="p">():</span>
|
||||
<span class="k">yield</span> <span class="n">CC</span><span class="p">(</span><span class="n">base_classifier</span><span class="p">())</span>
|
||||
<span class="k">yield</span> <span class="n">ACC</span><span class="p">(</span><span class="n">base_classifier</span><span class="p">())</span>
|
||||
<span class="k">yield</span> <span class="n">PCC</span><span class="p">(</span><span class="n">base_classifier</span><span class="p">())</span>
|
||||
<span class="k">yield</span> <span class="n">PACC</span><span class="p">(</span><span class="n">base_classifier</span><span class="p">())</span>
|
||||
<span class="k">yield</span> <span class="s1">'CC'</span><span class="p">,</span> <span class="n">CC</span><span class="p">(</span><span class="n">base_classifier</span><span class="p">())</span>
|
||||
<span class="k">yield</span> <span class="s1">'ACC'</span><span class="p">,</span> <span class="n">ACC</span><span class="p">(</span><span class="n">base_classifier</span><span class="p">())</span>
|
||||
<span class="k">yield</span> <span class="s1">'PCC'</span><span class="p">,</span> <span class="n">PCC</span><span class="p">(</span><span class="n">base_classifier</span><span class="p">())</span>
|
||||
<span class="k">yield</span> <span class="s1">'PACC'</span><span class="p">,</span> <span class="n">PACC</span><span class="p">(</span><span class="n">base_classifier</span><span class="p">())</span>
|
||||
|
||||
<span class="n">data</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">datasets</span><span class="o">.</span><span class="n">fetch_reviews</span><span class="p">(</span><span class="s1">'kindle'</span><span class="p">,</span> <span class="n">tfidf</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">min_df</span><span class="o">=</span><span class="mi">5</span><span class="p">)</span>
|
||||
<span class="n">train</span><span class="p">,</span> <span class="n">test</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">datasets</span><span class="o">.</span><span class="n">fetch_reviews</span><span class="p">(</span><span class="s1">'kindle'</span><span class="p">,</span> <span class="n">tfidf</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">min_df</span><span class="o">=</span><span class="mi">5</span><span class="p">)</span><span class="o">.</span><span class="n">train_test</span>
|
||||
|
||||
<span class="n">method_names</span><span class="p">,</span> <span class="n">true_prevs</span><span class="p">,</span> <span class="n">estim_prevs</span><span class="p">,</span> <span class="n">tr_prevs</span> <span class="o">=</span> <span class="p">[],</span> <span class="p">[],</span> <span class="p">[],</span> <span class="p">[]</span>
|
||||
|
||||
<span class="k">for</span> <span class="n">model</span> <span class="ow">in</span> <span class="n">models</span><span class="p">():</span>
|
||||
<span class="n">model</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">data</span><span class="o">.</span><span class="n">training</span><span class="p">)</span>
|
||||
<span class="n">true_prev</span><span class="p">,</span> <span class="n">estim_prev</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">evaluation</span><span class="o">.</span><span class="n">artificial_sampling_prediction</span><span class="p">(</span>
|
||||
<span class="n">model</span><span class="p">,</span> <span class="n">data</span><span class="o">.</span><span class="n">test</span><span class="p">,</span> <span class="n">qp</span><span class="o">.</span><span class="n">environ</span><span class="p">[</span><span class="s1">'SAMPLE_SIZE'</span><span class="p">],</span> <span class="n">n_repetitions</span><span class="o">=</span><span class="mi">100</span><span class="p">,</span> <span class="n">n_prevpoints</span><span class="o">=</span><span class="mi">21</span>
|
||||
<span class="p">)</span>
|
||||
<span class="k">for</span> <span class="n">method_name</span><span class="p">,</span> <span class="n">model</span> <span class="ow">in</span> <span class="n">models</span><span class="p">():</span>
|
||||
<span class="n">model</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">train</span><span class="p">)</span>
|
||||
<span class="n">true_prev</span><span class="p">,</span> <span class="n">estim_prev</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">evaluation</span><span class="o">.</span><span class="n">prediction</span><span class="p">(</span><span class="n">model</span><span class="p">,</span> <span class="n">APP</span><span class="p">(</span><span class="n">test</span><span class="p">,</span> <span class="n">repeats</span><span class="o">=</span><span class="mi">100</span><span class="p">,</span> <span class="n">random_state</span><span class="o">=</span><span class="mi">0</span><span class="p">))</span>
|
||||
|
||||
<span class="n">method_names</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">model</span><span class="o">.</span><span class="vm">__class__</span><span class="o">.</span><span class="vm">__name__</span><span class="p">)</span>
|
||||
<span class="n">method_names</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">method_name</span><span class="p">)</span>
|
||||
<span class="n">true_prevs</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">true_prev</span><span class="p">)</span>
|
||||
<span class="n">estim_prevs</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">estim_prev</span><span class="p">)</span>
|
||||
<span class="n">tr_prevs</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">data</span><span class="o">.</span><span class="n">training</span><span class="o">.</span><span class="n">prevalence</span><span class="p">())</span>
|
||||
<span class="n">tr_prevs</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">train</span><span class="o">.</span><span class="n">prevalence</span><span class="p">())</span>
|
||||
|
||||
<span class="k">return</span> <span class="n">method_names</span><span class="p">,</span> <span class="n">true_prevs</span><span class="p">,</span> <span class="n">estim_prevs</span><span class="p">,</span> <span class="n">tr_prevs</span>
|
||||
|
||||
|
@ -137,8 +139,8 @@ generating 100 random samples at each prevalence).</p>
|
|||
</pre></div>
|
||||
</div>
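<p>The evaluation grid used in this experiment (21 equally spaced prevalence values, with 100 samples drawn at each) can be sketched in plain Python as follows. This is only an illustration of the arithmetic; <em>prevalence_grid</em> is a hypothetical helper, not a QuaPy function.</p>

```python
def prevalence_grid(n_prevalences=21):
    """Equally spaced prevalence values in [0, 1], endpoints included."""
    step = 1 / (n_prevalences - 1)
    # rounding only hides floating-point noise such as 0.15000000000000002
    return [round(i * step, 10) for i in range(n_prevalences)]


grid = prevalence_grid(21)
repeats = 100  # samples drawn at each prevalence value

# in the binary case each grid point fixes the whole prevalence vector (1 - p, p)
binary_prevalences = [(round(1 - p, 10), p) for p in grid]
total_samples = len(binary_prevalences) * repeats

print(grid[:3], '...', grid[-1])
print(f'{total_samples} samples in total')
```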
|
||||
<p>The plots that can be generated are explained below.</p>
|
||||
<div class="section" id="diagonal-plot">
|
||||
<h2>Diagonal Plot<a class="headerlink" href="#diagonal-plot" title="Permalink to this headline">¶</a></h2>
|
||||
<section id="diagonal-plot">
|
||||
<h2>Diagonal Plot<a class="headerlink" href="#diagonal-plot" title="Permalink to this heading">¶</a></h2>
|
||||
<p>The <em>diagonal</em> plot offers an insightful view of the
|
||||
quantifier’s performance. It plots the predicted class
|
||||
prevalence (in the y-axis) against the true class prevalence
|
||||
|
@ -164,9 +166,9 @@ the complete list of arguments in the documentation).</p>
|
|||
<p>Finally, note how most quantifiers, and especially the “unadjusted”
|
||||
variants CC and PCC, are strongly biased towards the
|
||||
prevalence seen during training.</p>
|
||||
</div>
|
||||
<div class="section" id="quantification-bias">
|
||||
<h2>Quantification bias<a class="headerlink" href="#quantification-bias" title="Permalink to this headline">¶</a></h2>
|
||||
</section>
|
||||
<section id="quantification-bias">
|
||||
<h2>Quantification bias<a class="headerlink" href="#quantification-bias" title="Permalink to this heading">¶</a></h2>
|
||||
<p>This plot aims at revealing the bias that any quantifier
|
||||
displays with respect to the training prevalences by
|
||||
means of <a class="reference external" href="https://en.wikipedia.org/wiki/Box_plot">box plots</a>.
|
||||
|
@ -196,21 +198,19 @@ IMDb dataset, and generate the bias plot again.
|
|||
This example can be run by rewriting the <em>gen_data()</em> function
|
||||
like this:</p>
|
||||
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">gen_data</span><span class="p">():</span>
|
||||
<span class="n">data</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">datasets</span><span class="o">.</span><span class="n">fetch_reviews</span><span class="p">(</span><span class="s1">'imdb'</span><span class="p">,</span> <span class="n">tfidf</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">min_df</span><span class="o">=</span><span class="mi">5</span><span class="p">)</span>
|
||||
|
||||
<span class="n">train</span><span class="p">,</span> <span class="n">test</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">datasets</span><span class="o">.</span><span class="n">fetch_reviews</span><span class="p">(</span><span class="s1">'imdb'</span><span class="p">,</span> <span class="n">tfidf</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">min_df</span><span class="o">=</span><span class="mi">5</span><span class="p">)</span><span class="o">.</span><span class="n">train_test</span>
|
||||
<span class="n">model</span> <span class="o">=</span> <span class="n">CC</span><span class="p">(</span><span class="n">LinearSVC</span><span class="p">())</span>
|
||||
|
||||
<span class="n">method_data</span> <span class="o">=</span> <span class="p">[]</span>
|
||||
<span class="k">for</span> <span class="n">training_prevalence</span> <span class="ow">in</span> <span class="n">np</span><span class="o">.</span><span class="n">linspace</span><span class="p">(</span><span class="mf">0.1</span><span class="p">,</span> <span class="mf">0.9</span><span class="p">,</span> <span class="mi">9</span><span class="p">):</span>
|
||||
<span class="n">training_size</span> <span class="o">=</span> <span class="mi">5000</span>
|
||||
<span class="c1"># since the problem is binary, it suffices to specify the negative prevalence (the positive is constrained)</span>
|
||||
<span class="n">training</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">training</span><span class="o">.</span><span class="n">sampling</span><span class="p">(</span><span class="n">training_size</span><span class="p">,</span> <span class="mi">1</span> <span class="o">-</span> <span class="n">training_prevalence</span><span class="p">)</span>
|
||||
<span class="n">model</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">training</span><span class="p">)</span>
|
||||
<span class="n">true_prev</span><span class="p">,</span> <span class="n">estim_prev</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">evaluation</span><span class="o">.</span><span class="n">artificial_sampling_prediction</span><span class="p">(</span>
|
||||
<span class="n">model</span><span class="p">,</span> <span class="n">data</span><span class="o">.</span><span class="n">sample</span><span class="p">,</span> <span class="n">qp</span><span class="o">.</span><span class="n">environ</span><span class="p">[</span><span class="s1">'SAMPLE_SIZE'</span><span class="p">],</span> <span class="n">n_repetitions</span><span class="o">=</span><span class="mi">100</span><span class="p">,</span> <span class="n">n_prevpoints</span><span class="o">=</span><span class="mi">21</span>
|
||||
<span class="p">)</span>
|
||||
<span class="c1"># method names can contain Latex syntax</span>
|
||||
<span class="n">method_name</span> <span class="o">=</span> <span class="s1">'CC$_{'</span> <span class="o">+</span> <span class="sa">f</span><span class="s1">'</span><span class="si">{</span><span class="nb">int</span><span class="p">(</span><span class="mi">100</span> <span class="o">*</span> <span class="n">training_prevalence</span><span class="p">)</span><span class="si">}</span><span class="s1">'</span> <span class="o">+</span> <span class="s1">'\%}$'</span>
|
||||
<span class="n">method_data</span><span class="o">.</span><span class="n">append</span><span class="p">((</span><span class="n">method_name</span><span class="p">,</span> <span class="n">true_prev</span><span class="p">,</span> <span class="n">estim_prev</span><span class="p">,</span> <span class="n">training</span><span class="o">.</span><span class="n">prevalence</span><span class="p">()))</span>
|
||||
<span class="c1"># since the problem is binary, it suffices to specify the negative prevalence, since the positive is constrained</span>
|
||||
<span class="n">train_sample</span> <span class="o">=</span> <span class="n">train</span><span class="o">.</span><span class="n">sampling</span><span class="p">(</span><span class="n">training_size</span><span class="p">,</span> <span class="mi">1</span><span class="o">-</span><span class="n">training_prevalence</span><span class="p">)</span>
|
||||
<span class="n">model</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">train_sample</span><span class="p">)</span>
|
||||
<span class="n">true_prev</span><span class="p">,</span> <span class="n">estim_prev</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">evaluation</span><span class="o">.</span><span class="n">prediction</span><span class="p">(</span><span class="n">model</span><span class="p">,</span> <span class="n">APP</span><span class="p">(</span><span class="n">test</span><span class="p">,</span> <span class="n">repeats</span><span class="o">=</span><span class="mi">100</span><span class="p">,</span> <span class="n">random_state</span><span class="o">=</span><span class="mi">0</span><span class="p">))</span>
|
||||
<span class="n">method_name</span> <span class="o">=</span> <span class="s1">'CC$_{'</span><span class="o">+</span><span class="sa">f</span><span class="s1">'</span><span class="si">{</span><span class="nb">int</span><span class="p">(</span><span class="mi">100</span><span class="o">*</span><span class="n">training_prevalence</span><span class="p">)</span><span class="si">}</span><span class="s1">'</span> <span class="o">+</span> <span class="s1">'\%}$'</span>
|
||||
<span class="n">method_data</span><span class="o">.</span><span class="n">append</span><span class="p">((</span><span class="n">method_name</span><span class="p">,</span> <span class="n">true_prev</span><span class="p">,</span> <span class="n">estim_prev</span><span class="p">,</span> <span class="n">train_sample</span><span class="o">.</span><span class="n">prevalence</span><span class="p">()))</span>
|
||||
|
||||
<span class="k">return</span> <span class="nb">zip</span><span class="p">(</span><span class="o">*</span><span class="n">method_data</span><span class="p">)</span>
|
||||
</pre></div>
|
||||
|
@ -237,9 +237,9 @@ and a negative bias (or a tendency to underestimate) in cases of high prevalence
|
|||
<p><img alt="diag plot on IMDb" src="_images/bin_diag_cc.png" /></p>
|
||||
<p>showing pretty clearly the dependency of CC on the prior probabilities
|
||||
of the labeled set it was trained on.</p>
|
||||
</div>
|
||||
<div class="section" id="error-by-drift">
|
||||
<h2>Error by Drift<a class="headerlink" href="#error-by-drift" title="Permalink to this headline">¶</a></h2>
|
||||
</section>
|
||||
<section id="error-by-drift">
|
||||
<h2>Error by Drift<a class="headerlink" href="#error-by-drift" title="Permalink to this heading">¶</a></h2>
|
||||
<p>The plots discussed above are useful for analyzing and comparing
|
||||
the performance of different quantification methods, but are
|
||||
limited to the binary case. The “error by drift” is a plot
|
||||
|
@ -270,8 +270,8 @@ In those cases, however, it is likely that the variances of each
|
|||
method get higher, to the detriment of the visualization.
|
||||
We recommend setting <em>show_std=False</em> in those cases
|
||||
in order to hide the color bands.</p>
|
||||
</div>
|
||||
</div>
|
||||
</section>
|
||||
</section>
|
||||
|
||||
|
||||
<div class="clearer"></div>
|
||||
|
@ -280,8 +280,9 @@ in order to hide the color bands.</p>
|
|||
</div>
|
||||
<div class="sphinxsidebar" role="navigation" aria-label="main navigation">
|
||||
<div class="sphinxsidebarwrapper">
|
||||
<h3><a href="index.html">Table of Contents</a></h3>
|
||||
<ul>
|
||||
<div>
|
||||
<h3><a href="index.html">Table of Contents</a></h3>
|
||||
<ul>
|
||||
<li><a class="reference internal" href="#">Plotting</a><ul>
|
||||
<li><a class="reference internal" href="#diagonal-plot">Diagonal Plot</a></li>
|
||||
<li><a class="reference internal" href="#quantification-bias">Quantification bias</a></li>
|
||||
|
@ -290,12 +291,17 @@ in order to hide the color bands.</p>
|
|||
</li>
|
||||
</ul>
|
||||
|
||||
<h4>Previous topic</h4>
|
||||
<p class="topless"><a href="Methods.html"
|
||||
title="previous chapter">Quantification Methods</a></p>
|
||||
<h4>Next topic</h4>
|
||||
<p class="topless"><a href="modules.html"
|
||||
title="next chapter">quapy</a></p>
|
||||
</div>
|
||||
<div>
|
||||
<h4>Previous topic</h4>
|
||||
<p class="topless"><a href="Model-Selection.html"
|
||||
title="previous chapter">Model Selection</a></p>
|
||||
</div>
|
||||
<div>
|
||||
<h4>Next topic</h4>
|
||||
<p class="topless"><a href="modules.html"
|
||||
title="next chapter">quapy</a></p>
|
||||
</div>
|
||||
<div role="note" aria-label="source link">
|
||||
<h3>This Page</h3>
|
||||
<ul class="this-page-menu">
|
||||
|
@ -312,7 +318,7 @@ in order to hide the color bands.</p>
|
|||
</form>
|
||||
</div>
|
||||
</div>
|
||||
<script>$('#searchbox').show(0);</script>
|
||||
<script>document.getElementById('searchbox').style.display = "block"</script>
|
||||
</div>
|
||||
</div>
|
||||
<div class="clearer"></div>
|
||||
|
@ -330,15 +336,15 @@ in order to hide the color bands.</p>
|
|||
<a href="modules.html" title="quapy"
|
||||
>next</a> |</li>
|
||||
<li class="right" >
|
||||
<a href="Methods.html" title="Quantification Methods"
|
||||
<a href="Model-Selection.html" title="Model Selection"
|
||||
>previous</a> |</li>
|
||||
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.6 documentation</a> »</li>
|
||||
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.7 documentation</a> »</li>
|
||||
<li class="nav-item nav-item-this"><a href="">Plotting</a></li>
|
||||
</ul>
|
||||
</div>
|
||||
<div class="footer" role="contentinfo">
|
||||
© Copyright 2021, Alejandro Moreo.
|
||||
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 4.2.0.
|
||||
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 5.3.0.
|
||||
</div>
|
||||
</body>
|
||||
</html>
|
|
@ -30,7 +30,8 @@ Output the class prevalences (showing 2 digit precision):
|
|||
[0.17, 0.50, 0.33]
|
||||
```
|
||||
|
||||
One can easily produce new samples at desired class prevalences:
|
||||
One can easily produce new samples at desired class prevalence values:
|
||||
|
||||
```python
|
||||
sample_size = 10
|
||||
prev = [0.4, 0.1, 0.5]
|
||||
|
@ -63,32 +64,10 @@ for method in methods:
|
|||
...
|
||||
```
|
||||
|
||||
QuaPy also implements the artificial sampling protocol that produces (via a
|
||||
Python's generator) a series of _LabelledCollection_ objects with equidistant
|
||||
prevalences ranging across the entire prevalence spectrum in the simplex space, e.g.:
|
||||
|
||||
```python
|
||||
for sample in data.artificial_sampling_generator(sample_size=100, n_prevalences=5):
|
||||
print(F.strprev(sample.prevalence(), prec=2))
|
||||
```
|
||||
|
||||
produces one sampling for each (valid) combination of prevalences originating from
|
||||
splitting the range [0,1] into n_prevalences=5 points (i.e., [0, 0.25, 0.5, 0.75, 1]),
|
||||
that is:
|
||||
```
|
||||
[0.00, 0.00, 1.00]
|
||||
[0.00, 0.25, 0.75]
|
||||
[0.00, 0.50, 0.50]
|
||||
[0.00, 0.75, 0.25]
|
||||
[0.00, 1.00, 0.00]
|
||||
[0.25, 0.00, 0.75]
|
||||
...
|
||||
[1.00, 0.00, 0.00]
|
||||
```
|
||||
|
||||
See the [Evaluation wiki](https://github.com/HLT-ISTI/QuaPy/wiki/Evaluation) for
|
||||
further details on how to use the artificial sampling protocol to properly
|
||||
evaluate a quantification method.
|
||||
However, generating samples for evaluation purposes is tackled in QuaPy
|
||||
by means of the evaluation protocols (see the dedicated entries in the Wiki
|
||||
for [evaluation](https://github.com/HLT-ISTI/QuaPy/wiki/Evaluation) and
|
||||
[protocols](https://github.com/HLT-ISTI/QuaPy/wiki/Protocols)).
|
||||
|
||||
|
||||
## Reviews Datasets
|
||||
|
@ -177,6 +156,8 @@ Some details can be found below:
|
|||
| sst | 3 | 2971 | 1271 | 376132 | [0.261, 0.452, 0.288] | [0.207, 0.481, 0.312] | sparse |
|
||||
| wa | 3 | 2184 | 936 | 248563 | [0.305, 0.414, 0.281] | [0.282, 0.446, 0.272] | sparse |
|
||||
| wb | 3 | 4259 | 1823 | 404333 | [0.270, 0.392, 0.337] | [0.274, 0.392, 0.335] | sparse |
|
||||
|
||||
|
||||
## UCI Machine Learning
|
||||
|
||||
A set of 32 datasets from the [UCI Machine Learning repository](https://archive.ics.uci.edu/ml/datasets.php)
|
||||
|
@ -272,6 +253,46 @@ standard Pythons packages like gzip or zip. This file would need to be uncompres
|
|||
OS-dependent software manually. Information on how to do it will be printed the first
|
||||
time the dataset is invoked.
|
||||
|
||||
## LeQua Datasets
|
||||
|
||||
QuaPy also provides the datasets used for the LeQua competition.
|
||||
In brief, there are 4 tasks (T1A, T1B, T2A, T2B) having to do with text quantification
|
||||
problems. Tasks T1A and T1B provide documents in vector form, while T2A and T2B provide
|
||||
raw documents instead.
|
||||
Tasks T1A and T2A are binary sentiment quantification problems, while T1B and T2B
|
||||
are multiclass quantification problems consisting of estimating the class prevalence
|
||||
values of 28 different merchandise products.
|
||||
|
||||
Every task consists of a training set, a set of validation samples (for model selection)
|
||||
and a set of test samples (for evaluation). QuaPy returns this data as a LabelledCollection
|
||||
(training) and two generation protocols (for validation and test samples), as follows:
|
||||
|
||||
```python
|
||||
training, val_generator, test_generator = fetch_lequa2022(task=task)
|
||||
```
|
||||
|
||||
See the `lequa2022_experiments.py` in the examples folder for further details on how to
|
||||
carry out experiments using these datasets.
|
||||
|
||||
The datasets are downloaded only once, and stored for fast reuse.
|
||||
|
||||
Some statistics are summarized below:
|
||||
|
||||
| Dataset | classes | train size | validation samples | test samples | docs by sample | type |
|
||||
|---------|:-------:|:----------:|:------------------:|:------------:|:----------------:|:--------:|
|
||||
| T1A | 2 | 5000 | 1000 | 5000 | 250 | vector |
|
||||
| T1B | 28 | 20000 | 1000 | 5000 | 1000 | vector |
|
||||
| T2A | 2 | 5000 | 1000 | 5000 | 250 | text |
|
||||
| T2B | 28 | 20000 | 1000 | 5000 | 1000 | text |
|
||||
|
||||
For further details on the datasets, we refer to the original
|
||||
[paper](https://ceur-ws.org/Vol-3180/paper-146.pdf):
|
||||
|
||||
```
|
||||
Esuli, A., Moreo, A., Sebastiani, F., & Sperduti, G. (2022).
|
||||
A Detailed Overview of LeQua@ CLEF 2022: Learning to Quantify.
|
||||
```
|
||||
|
||||
## Adding Custom Datasets
|
||||
|
||||
QuaPy provides data loaders for simple formats dealing with
|
||||
|
@ -312,12 +333,15 @@ e.g.:
|
|||
|
||||
```python
|
||||
import quapy as qp
|
||||
|
||||
train_path = '../my_data/train.dat'
|
||||
test_path = '../my_data/test.dat'
|
||||
|
||||
def my_custom_loader(path):
|
||||
with open(path, 'rb') as fin:
|
||||
...
|
||||
return instances, labels
|
||||
|
||||
data = qp.data.Dataset.load(train_path, test_path, my_custom_loader)
|
||||
```
|
||||
|
||||
|
|
|
@ -50,7 +50,7 @@ indicating the value for the smoothing parameter epsilon.
|
|||
Traditionally, this value is set to 1/(2T) in past literature,
|
||||
with T the sampling size. One could either pass this value
|
||||
to the function each time, or set QuaPy's environment
|
||||
variable _SAMPLE_SIZE_ once, and ommit this argument
|
||||
variable _SAMPLE_SIZE_ once, and omit this argument
|
||||
thereafter (recommended);
|
||||
e.g.:
|
||||
|
||||
|
@ -58,7 +58,7 @@ e.g.:
|
|||
qp.environ['SAMPLE_SIZE'] = 100 # once for all
|
||||
true_prev = np.asarray([0.5, 0.3, 0.2]) # let's assume 3 classes
|
||||
estim_prev = np.asarray([0.1, 0.3, 0.6])
|
||||
error = qp.ae_.mrae(true_prev, estim_prev)
|
||||
error = qp.error.mrae(true_prev, estim_prev)
|
||||
print(f'mrae({true_prev}, {estim_prev}) = {error:.3f}')
|
||||
```
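To make the role of the smoothing parameter concrete, the following is a minimal NumPy sketch of a smoothed relative absolute error. It assumes additive smoothing of the form `(p + eps) / (eps * n_classes + 1)` with `eps = 1/(2T)`; the helper names `smooth` and `relative_absolute_error` are illustrative and are not claimed to match QuaPy's internals.

```python
import numpy as np


def smooth(prevs, eps):
    # Additive smoothing (assumed form): keeps every prevalence strictly positive
    # and renormalizes so that the vector still sums to 1.
    n_classes = prevs.shape[-1]
    return (prevs + eps) / (eps * n_classes + 1)


def relative_absolute_error(true_prev, estim_prev, sample_size):
    # eps = 1/(2T), with T the sample size, as mentioned in the text above.
    eps = 1.0 / (2.0 * sample_size)
    p = smooth(np.asarray(true_prev, dtype=float), eps)
    p_hat = smooth(np.asarray(estim_prev, dtype=float), eps)
    # Class-wise absolute error relative to the (smoothed) true prevalence.
    return float(np.mean(np.abs(p_hat - p) / p))


true_prev = np.asarray([0.5, 0.3, 0.2])
estim_prev = np.asarray([0.1, 0.3, 0.6])
error = relative_absolute_error(true_prev, estim_prev, sample_size=100)
print(f'rae({true_prev}, {estim_prev}) = {error:.3f}')
```

Without smoothing, a true prevalence of exactly 0 would cause a division by zero; with it, the error stays finite even for vectors such as `[1.0, 0.0]`.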
|
||||
|
||||
|
@ -71,162 +71,99 @@ Finally, it is possible to instantiate QuaPy's quantification
|
|||
error functions from strings using, e.g.:
|
||||
|
||||
```python
|
||||
error_function = qp.ae_.from_name('mse')
|
||||
error_function = qp.error.from_name('mse')
|
||||
error = error_function(true_prev, estim_prev)
|
||||
```
|
||||
|
||||
## Evaluation Protocols
|
||||
|
||||
QuaPy implements the so-called "artificial sampling protocol",
|
||||
according to which a test set is used to generate samplings at
|
||||
desired prevalences of fixed size and covering the full spectrum
|
||||
of prevalences. This protocol is called "artificial" in contrast
|
||||
to the "natural prevalence sampling" protocol that,
|
||||
despite introducing some variability during sampling, approximately
|
||||
preserves the training class prevalence.
|
||||
|
||||
In the artificial sampling protocol, the user specifies the number
|
||||
of (equally distant) points to be generated from the interval [0,1].
|
||||
|
||||
For example, if n_prevpoints=11 then, for each class, the prevalences
|
||||
[0., 0.1, 0.2, ..., 1.] will be used. This means that, for two classes,
|
||||
the number of different prevalences will be 11 (since, once the prevalence
|
||||
of one class is determined, the other one is constrained). For 3 classes,
|
||||
the number of valid combinations can be obtained as 11 + 10 + ... + 1 = 66.
|
||||
In general, the number of valid combinations that will be produced for a given
|
||||
value of n_prevpoints can be consulted by invoking
|
||||
quapy.functional.num_prevalence_combinations, e.g.:
|
||||
An _evaluation protocol_ is an evaluation procedure that uses
|
||||
one specific _sample generation protocol_ to generate many
|
||||
samples, typically characterized by widely varying amounts of
|
||||
_shift_ with respect to the original distribution, that are then
|
||||
used to evaluate the performance of a (trained) quantifier.
|
||||
These protocols are explained in more detail in a dedicated [entry
|
||||
in the wiki](Protocols.md). For the time being, let us assume we already have
|
||||
chosen and instantiated one specific such protocol, that we here
|
||||
simply call _prot_. Let us also assume our model is called
|
||||
_quantifier_ and that our evaluation measure of choice is
|
||||
_mae_. The evaluation comes down to:
|
||||
|
||||
```python
|
||||
import quapy.functional as F
|
||||
n_prevpoints = 21
|
||||
n_classes = 4
|
||||
n = F.num_prevalence_combinations(n_prevpoints, n_classes, n_repeats=1)
|
||||
mae = qp.evaluation.evaluate(quantifier, protocol=prot, error_metric='mae')
|
||||
print(f'MAE = {mae:.4f}')
|
||||
```
|
||||
|
||||
in this example, n=1771. Note the last argument, n_repeats, that
|
||||
informs of the number of examples that will be generated for any
|
||||
valid combination (typical values are, e.g., 1 for a single sample,
|
||||
or 10 or higher for computing standard deviations or performing statistical
|
||||
significance tests).
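The counts quoted above (11 for two classes, 66 for three classes, 1771 for four classes with 21 points) can be reproduced with a "stars and bars" argument: choosing prevalences on a grid of `n_prevpoints` values that sum to 1 is the same as distributing `n_prevpoints - 1` grid steps among `n_classes` classes. The sketch below uses only the standard library; `count_prevalence_combinations` is a hypothetical helper written for illustration and is not QuaPy's own function.

```python
from math import comb


def count_prevalence_combinations(n_prevpoints, n_classes, n_repeats=1):
    # Number of ways to split (n_prevpoints - 1) grid steps among n_classes,
    # multiplied by how many samples are drawn per valid combination.
    return comb(n_prevpoints + n_classes - 2, n_classes - 1) * n_repeats


for n_prevpoints, n_classes in [(11, 2), (11, 3), (21, 4)]:
    n = count_prevalence_combinations(n_prevpoints, n_classes)
    print(f'n_prevpoints={n_prevpoints}, n_classes={n_classes}: {n} combinations')
```

The count grows quickly with the number of classes, which is why controlling the overall cost of an experiment becomes harder in multiclass settings.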
|
||||
|
||||
One can instead work the other way around, i.e., one could set a
|
||||
maximum budget of evaluations and get the number of prevalence points that
|
||||
will generate a number of evaluations close to, but not higher than,
|
||||
the fixed budget. This can be achieved with the function
|
||||
quapy.functional.get_nprevpoints_approximation, e.g.:
|
||||
It is often desirable to evaluate our system using more than one
|
||||
evaluation measure. In this case, it is convenient to generate
|
||||
a _report_. A report in QuaPy is a dataframe accounting for all the
|
||||
true prevalence values with their corresponding prevalence values
|
||||
as estimated by the quantifier, along with the error each has given
|
||||
rise to.
|
||||
|
||||
```python
|
||||
budget = 5000
|
||||
n_prevpoints = F.get_nprevpoints_approximation(budget, n_classes, n_repeats=1)
|
||||
n = F.num_prevalence_combinations(n_prevpoints, n_classes, n_repeats=1)
|
||||
print(f'by setting n_prevpoints={n_prevpoints} the number of evaluations for {n_classes} classes will be {n}')
|
||||
```
|
||||
that will print:
|
||||
```
|
||||
by setting n_prevpoints=30 the number of evaluations for 4 classes will be 4960
|
||||
report = qp.evaluation.evaluation_report(quantifier, protocol=prot, error_metrics=['mae', 'mrae', 'mkld'])
|
||||
```
|
||||
|
||||
The cost of evaluation will depend on the values of _n_prevpoints_, _n_classes_,
|
||||
and _n_repeats_. Since it might sometimes be cumbersome to control the overall
|
||||
cost of an experiment having to do with the number of combinations that
|
||||
will be generated for a particular setting of these arguments (particularly
|
||||
when _n_classes>2_), evaluation functions
|
||||
typically allow the user to rather specify an _evaluation budget_, i.e., a maximum
|
||||
number of samplings to generate. By specifying this argument, one could avoid
|
||||
specifying _n_prevpoints_, and the value for it that would lead to a closer
|
||||
number of evaluation budget, without surpassing it, will be automatically set.
|
||||
|
||||
The following script shows a full example in which a PACC model relying
|
||||
on a Logistic Regressor classifier is
|
||||
tested on the _kindle_ dataset by means of the artificial prevalence
|
||||
sampling protocol on samples of size 500, in terms of various
|
||||
evaluation metrics.
|
||||
|
||||
````python
|
||||
import quapy as qp
|
||||
import quapy.functional as F
|
||||
from sklearn.linear_model import LogisticRegression
|
||||
|
||||
qp.environ['SAMPLE_SIZE'] = 500
|
||||
|
||||
dataset = qp.datasets.fetch_reviews('kindle')
|
||||
qp.data.preprocessing.text2tfidf(dataset, min_df=5, inplace=True)
|
||||
|
||||
training = dataset.training
|
||||
test = dataset.test
|
||||
|
||||
lr = LogisticRegression()
|
||||
pacc = qp.method.aggregative.PACC(lr)
|
||||
|
||||
pacc.fit(training)
|
||||
|
||||
df = qp.evaluation.artificial_sampling_report(
|
||||
pacc, # the quantification method
|
||||
test, # the test set on which the method will be evaluated
|
||||
sample_size=qp.environ['SAMPLE_SIZE'], #indicates the size of samples to be drawn
|
||||
n_prevpoints=11, # how many prevalence points will be extracted from the interval [0, 1] for each category
|
||||
n_repetitions=1, # number of times each prevalence will be used to generate a test sample
|
||||
n_jobs=-1, # indicates the number of parallel workers (-1 indicates, as in sklearn, all CPUs)
|
||||
random_seed=42, # setting a random seed allows to replicate the test samples across runs
|
||||
error_metrics=['mae', 'mrae', 'mkld'], # specify the evaluation metrics
|
||||
verbose=True # set to True to show some standard-line outputs
|
||||
)
|
||||
````
|
||||
|
||||
The resulting report is a pandas' dataframe that can be directly printed.
|
||||
Here, we set some display options from pandas just to make the output clearer;
|
||||
note also that the estimated prevalences are shown as strings using the
|
||||
_strprev_ function, which simply converts a prevalence into a
|
||||
string representing it, with a fixed decimal precision (default 3):
|
||||
From a pandas' dataframe, it is straightforward to visualize all the results,
|
||||
and compute the averaged values, e.g.:
|
||||
|
||||
```python
|
||||
import pandas as pd
|
||||
pd.set_option('display.expand_frame_repr', False)
|
||||
pd.set_option("precision", 3)
|
||||
df['estim-prev'] = df['estim-prev'].map(F.strprev)
|
||||
print(df)
|
||||
report['estim-prev'] = report['estim-prev'].map(F.strprev)
|
||||
print(report)
|
||||
|
||||
print('Averaged values:')
|
||||
print(report.mean())
|
||||
```
|
||||
|
||||
The output should look like:
|
||||
This will produce an output like:
|
||||
|
||||
```
|
||||
true-prev estim-prev mae mrae mkld
|
||||
0 [0.0, 1.0] [0.000, 1.000] 0.000 0.000 0.000e+00
|
||||
1 [0.1, 0.9] [0.091, 0.909] 0.009 0.048 4.426e-04
|
||||
2 [0.2, 0.8] [0.163, 0.837] 0.037 0.114 4.633e-03
|
||||
3 [0.3, 0.7] [0.283, 0.717] 0.017 0.041 7.383e-04
|
||||
4 [0.4, 0.6] [0.366, 0.634] 0.034 0.070 2.412e-03
|
||||
5 [0.5, 0.5] [0.459, 0.541] 0.041 0.082 3.387e-03
|
||||
6 [0.6, 0.4] [0.565, 0.435] 0.035 0.073 2.535e-03
|
||||
7 [0.7, 0.3] [0.654, 0.346] 0.046 0.108 4.701e-03
|
||||
8 [0.8, 0.2] [0.725, 0.275] 0.075 0.235 1.515e-02
|
||||
9 [0.9, 0.1] [0.858, 0.142] 0.042 0.229 7.740e-03
|
||||
10 [1.0, 0.0] [0.945, 0.055] 0.055 27.357 5.219e-02
|
||||
true-prev estim-prev mae mrae mkld
|
||||
0 [0.308, 0.692] [0.314, 0.686] 0.005649 0.013182 0.000074
|
||||
1 [0.896, 0.104] [0.909, 0.091] 0.013145 0.069323 0.000985
|
||||
2 [0.848, 0.152] [0.809, 0.191] 0.039063 0.149806 0.005175
|
||||
3 [0.016, 0.984] [0.033, 0.967] 0.017236 0.487529 0.005298
|
||||
4 [0.728, 0.272] [0.751, 0.249] 0.022769 0.057146 0.001350
|
||||
... ... ... ... ... ...
|
||||
4995 [0.72, 0.28] [0.698, 0.302] 0.021752 0.053631 0.001133
|
||||
4996 [0.868, 0.132] [0.888, 0.112] 0.020490 0.088230 0.001985
|
||||
4997 [0.292, 0.708] [0.298, 0.702] 0.006149 0.014788 0.000090
|
||||
4998 [0.24, 0.76] [0.220, 0.780] 0.019950 0.054309 0.001127
|
||||
4999 [0.948, 0.052] [0.965, 0.035] 0.016941 0.165776 0.003538
|
||||
|
||||
[5000 rows x 5 columns]
|
||||
Averaged values:
|
||||
mae 0.023588
|
||||
mrae 0.108779
|
||||
mkld 0.003631
|
||||
dtype: float64
|
||||
|
||||
Process finished with exit code 0
|
||||
```
|
||||
|
||||
One can get the averaged scores using standard pandas'
|
||||
functions, i.e.:
|
||||
Alternatively, we can simply generate all the predictions by:
|
||||
|
||||
```python
|
||||
print(df.mean())
|
||||
true_prevs, estim_prevs = qp.evaluation.prediction(quantifier, protocol=prot)
|
||||
```
|
||||
|
||||
will produce the following output:
|
||||
|
||||
```
|
||||
true-prev 0.500
|
||||
mae 0.035
|
||||
mrae 2.578
|
||||
mkld 0.009
|
||||
dtype: float64
|
||||
```
|
||||
|
||||
Other evaluation functions include:
|
||||
|
||||
* _artificial_sampling_eval_: that computes the evaluation for a
|
||||
given evaluation metric, returning the average instead of a dataframe.
|
||||
* _artificial_sampling_prediction_: that returns two np.arrays containing the
|
||||
true prevalences and the estimated prevalences.
|
||||
|
||||
See the documentation for further details.
|
||||
All the evaluation functions implement specific optimizations for speeding-up
|
||||
the evaluation of aggregative quantifiers (i.e., of instances of _AggregativeQuantifier_).
|
||||
The optimization comes down to generating classification predictions (either crisp or soft)
|
||||
only once for the entire test set, and then applying the sampling procedure to the
|
||||
predictions, instead of generating samples of instances and then computing the
|
||||
classification predictions every time. This is only possible when the protocol
|
||||
is an instance of _OnLabelledCollectionProtocol_. The optimization is only
|
||||
carried out when the number of classification predictions thus generated would be
|
||||
smaller than the number of predictions required for the entire protocol; e.g.,
|
||||
if the original dataset contains 1M instances, but the protocol is such that it would
|
||||
at most generate 20 samples of 100 instances, then it would be preferable to postpone the
|
||||
classification for each sample. This behaviour is indicated by setting
|
||||
_aggr_speedup="auto"_. Conversely, when indicating _aggr_speedup="force"_ QuaPy will
|
||||
precompute all the predictions irrespective of the number of instances and number of samples.
|
||||
Finally, this can be deactivated by setting _aggr_speedup=False_. Note that this optimization
|
||||
is not only applied for the final evaluation, but also for the internal evaluations carried
|
||||
out during _model selection_. Since these are typically many, the heuristic can help reduce the
|
||||
execution time a lot.
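The decision rule described above can be sketched as a small function. This is only an illustration of the heuristic as stated in the text, under the assumption that the cost of each strategy is measured by the number of classifier predictions it requires; `should_precompute_predictions` is a hypothetical name, not part of QuaPy.

```python
def should_precompute_predictions(n_instances, n_samples, sample_size, aggr_speedup='auto'):
    """Decide whether to classify the whole test set once (True)
    or to classify each generated sample separately (False)."""
    if aggr_speedup is False:
        # Optimization explicitly deactivated.
        return False
    if aggr_speedup == 'force':
        # Always precompute, regardless of the sizes involved.
        return True
    if aggr_speedup == 'auto':
        # Precompute only if classifying every instance once is cheaper
        # than classifying all the samples the protocol would generate.
        return n_instances < n_samples * sample_size
    raise ValueError(f'unexpected value for aggr_speedup: {aggr_speedup!r}')


# The example from the text: 1M instances, at most 20 samples of 100 instances.
print(should_precompute_predictions(1_000_000, n_samples=20, sample_size=100))
# A small test set explored by many samples.
print(should_precompute_predictions(5_000, n_samples=5_000, sample_size=500))
```

In the first call, only 2,000 predictions are needed by the protocol, so postponing classification is preferable; in the second, the 5,000 test instances would otherwise be classified many times over.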
|
|
@ -16,12 +16,6 @@ and implement some abstract methods:
|
|||
|
||||
@abstractmethod
|
||||
def quantify(self, instances): ...
|
||||
|
||||
@abstractmethod
|
||||
def set_params(self, **parameters): ...
|
||||
|
||||
@abstractmethod
|
||||
def get_params(self, deep=True): ...
|
||||
```
|
||||
The meaning of those functions should be familiar to those
|
||||
used to work with scikit-learn since the class structure of QuaPy
|
||||
|
@ -32,10 +26,10 @@ scikit-learn' structure has not been adopted _as is_ in QuaPy responds to
|
|||
the fact that scikit-learn's _predict_ function is expected to return
|
||||
one output for each input element --e.g., a predicted label for each
|
||||
instance in a sample-- while in quantification the output for a sample
|
||||
is one single array of class prevalences), while functions _set_params_
|
||||
and _get_params_ allow a
|
||||
[model selector](https://github.com/HLT-ISTI/QuaPy/wiki/Model-Selection)
|
||||
to automate the process of hyperparameter search.
|
||||
is one single array of class prevalences).
|
||||
Quantifiers also extend from scikit-learn's `BaseEstimator`, in order
|
||||
to simplify the use of _set_params_ and _get_params_ used in
|
||||
[model selector](https://github.com/HLT-ISTI/QuaPy/wiki/Model-Selection).
|
||||
|
||||
## Aggregative Methods
|
||||
|
||||
|
@ -58,11 +52,11 @@ of _BaseQuantifier.quantify_ is already provided, which looks like:
|
|||
|
||||
```python
|
||||
def quantify(self, instances):
|
||||
classif_predictions = self.preclassify(instances)
|
||||
classif_predictions = self.classify(instances)
|
||||
return self.aggregate(classif_predictions)
|
||||
```
|
||||
Aggregative quantifiers are expected to maintain a classifier (which is
|
||||
accessed through the _@property_ _learner_). This classifier is
|
||||
accessed through the _@property_ _classifier_). This classifier is
|
||||
given as input to the quantifier, and can be already fit
|
||||
on external data (in which case, the _fit_learner_ argument should
|
||||
be set to False), or be fit by the quantifier's fit (default).
|
||||
|
@ -73,13 +67,8 @@ _AggregativeProbabilisticQuantifier(AggregativeQuantifier)_.
|
|||
The particularity of _probabilistic_ aggregative methods (w.r.t.
|
||||
non-probabilistic ones), is that the default quantifier is defined
|
||||
in terms of the posterior probabilities returned by a probabilistic
|
||||
classifier, and not by the crisp decisions of a hard classifier; i.e.:
|
||||
|
||||
```python
|
||||
def quantify(self, instances):
|
||||
classif_posteriors = self.posterior_probabilities(instances)
|
||||
return self.aggregate(classif_posteriors)
|
||||
```
|
||||
classifier, and not by the crisp decisions of a hard classifier.
|
||||
In any case, the interface _classify(instances)_ remains unchanged.
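The difference between aggregating crisp decisions and aggregating posterior probabilities can be illustrated with a few lines of NumPy. The functions `aggregate_crisp` and `aggregate_soft` below are hypothetical stand-ins for the simplest possible aggregation functions (counting labels versus averaging posteriors); they are a sketch and not QuaPy's implementation.

```python
import numpy as np


def aggregate_crisp(labels, n_classes):
    # Count how many instances were assigned to each class and normalize.
    counts = np.bincount(labels, minlength=n_classes)
    return counts / counts.sum()


def aggregate_soft(posteriors):
    # Average the posterior probabilities over all instances.
    return posteriors.mean(axis=0)


# Toy posteriors for 4 instances and 2 classes (made-up values).
posteriors = np.array([
    [0.90, 0.10],
    [0.60, 0.40],
    [0.55, 0.45],
    [0.20, 0.80],
])
crisp_labels = posteriors.argmax(axis=1)

print('crisp aggregation:', aggregate_crisp(crisp_labels, n_classes=2))
print('soft aggregation: ', aggregate_soft(posteriors))
```

Both functions consume the output of the same classification step; only the kind of prediction (labels or posteriors) and the way it is aggregated differ.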
|
||||
|
||||
One advantage of _aggregative_ methods (either probabilistic or not)
|
||||
is that the evaluation according to any sampling procedure (e.g.,
|
||||
|
@ -110,9 +99,7 @@ import quapy as qp
|
|||
import quapy.functional as F
|
||||
from sklearn.svm import LinearSVC
|
||||
|
||||
dataset = qp.datasets.fetch_twitter('hcr', pickle=True)
|
||||
training = dataset.training
|
||||
test = dataset.test
|
||||
training, test = qp.datasets.fetch_twitter('hcr', pickle=True).train_test
|
||||
|
||||
# instantiate a classifier learner, in this case a SVM
|
||||
svm = LinearSVC()
|
||||
|
@ -156,11 +143,12 @@ model.fit(training, val_split=5)
|
|||
```
|
||||
|
||||
The following code illustrates the case in which PCC is used:
|
||||
|
||||
```python
|
||||
model = qp.method.aggregative.PCC(svm)
|
||||
model.fit(training)
|
||||
estim_prevalence = model.quantify(test.instances)
|
||||
print('classifier:', model.learner)
|
||||
print('classifier:', model.classifier)
|
||||
```
|
||||
In this case, QuaPy will print:
|
||||
```
|
||||
|
@ -211,14 +199,22 @@ model.fit(dataset.training)
|
|||
estim_prevalence = model.quantify(dataset.test.instances)
|
||||
```
|
||||
|
||||
_New in v0.1.7_: EMQ now accepts two new parameters in the construction method, namely
|
||||
_exact_train_prev_, which makes it possible to use the true training prevalence as the initial
|
||||
prevalence estimation (default behaviour), or instead an approximation of it as
|
||||
suggested by [Alexandari et al. (2020)](http://proceedings.mlr.press/v119/alexandari20a.html)
|
||||
(by setting _exact_train_prev=False_).
|
||||
The other parameter is _recalib_, which specifies a calibration method among those
|
||||
proposed by [Alexandari et al. (2020)](http://proceedings.mlr.press/v119/alexandari20a.html),
|
||||
including the Bias-Corrected Temperature Scaling, Vector Scaling, etc.
|
||||
See the API documentation for further details.
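To show where the starting prevalence enters the computation, here is a minimal sketch of the expectation-maximization prior-adjustment iteration on which EMQ-style methods are usually based. It is written under stated assumptions: the posteriors are taken as given, no recalibration is applied, and `em_prevalence` with its parameters `max_iter` and `tol` is a hypothetical helper rather than QuaPy's API.

```python
import numpy as np


def em_prevalence(posteriors, train_prev, max_iter=1000, tol=1e-6):
    train_prev = np.asarray(train_prev, dtype=float)
    prev = train_prev.copy()  # the iteration starts from the training prevalence
    for _ in range(max_iter):
        # E-step: rescale posteriors by the ratio between the current
        # prevalence estimate and the training prevalence, then renormalize.
        scaled = posteriors * (prev / train_prev)
        scaled /= scaled.sum(axis=1, keepdims=True)
        # M-step: the new prevalence estimate is the mean rescaled posterior.
        new_prev = scaled.mean(axis=0)
        converged = np.abs(new_prev - prev).mean() < tol
        prev = new_prev
        if converged:
            break
    return prev


# Toy example: a classifier trained on balanced data that consistently
# leans towards the first class on the test sample (made-up posteriors).
posteriors = np.tile([0.9, 0.1], (50, 1))
estimate = em_prevalence(posteriors, train_prev=[0.5, 0.5])
print(estimate)
```

In this sketch, using an approximation of the training prevalence instead of its true value would simply change the `train_prev` argument.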
|
||||
|
||||
|
||||
### Hellinger Distance y (HDy)
|
||||
|
||||
The method HDy is described in:
|
||||
|
||||
_Implementation of the method based on the Hellinger Distance y (HDy) proposed by
|
||||
González-Castro, V., Alaiz-Rodríguez, R., and Alegre, E. (2013). Class distribution
|
||||
estimation based on the Hellinger distance. Information Sciences, 218:146–164._
|
||||
Implementation of the method based on the Hellinger Distance y (HDy) proposed by
|
||||
[González-Castro, V., Alaiz-Rodríguez, R., and Alegre, E. (2013). Class distribution
|
||||
estimation based on the Hellinger distance. Information Sciences, 218:146–164.](https://www.sciencedirect.com/science/article/pii/S0020025512004069)
|
||||
|
||||
It is implemented in _qp.method.aggregative.HDy_ (also accessible
|
||||
through the alias _qp.method.aggregative.HellingerDistanceY_).
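The core idea behind Hellinger-distance-based quantification can be sketched in a few lines: the test distribution of classifier scores is modelled as a mixture of the positive and negative validation distributions, and the mixture weight that minimizes the Hellinger distance is taken as the positive prevalence. The code below is a simplified illustration with made-up histograms and a single binning; `hellinger_distance` and `hdy_style_estimate` are hypothetical helpers, and refinements such as combining several bin counts are omitted.

```python
import numpy as np


def hellinger_distance(p, q):
    # Unnormalized Hellinger distance between two discrete distributions.
    return np.sqrt(np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))


def hdy_style_estimate(pos_hist, neg_hist, test_hist, n_candidates=101):
    # Grid search over candidate positive prevalences in [0, 1].
    candidates = np.linspace(0.0, 1.0, n_candidates)
    distances = [
        hellinger_distance(a * pos_hist + (1 - a) * neg_hist, test_hist)
        for a in candidates
    ]
    return float(candidates[int(np.argmin(distances))])


# Normalized histograms of classifier scores (illustrative values only).
pos_hist = np.array([0.05, 0.15, 0.30, 0.50])
neg_hist = np.array([0.50, 0.30, 0.15, 0.05])
# A test sample whose true positive prevalence is 0.3.
test_hist = 0.3 * pos_hist + 0.7 * neg_hist

print(hdy_style_estimate(pos_hist, neg_hist, test_hist))
```

Because the test histogram here is an exact mixture, the estimate recovers a value very close to 0.3; with real data the match is only approximate.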
|
||||
|
@ -249,30 +245,51 @@ model.fit(dataset.training)
|
|||
estim_prevalence = model.quantify(dataset.test.instances)
|
||||
```
|
||||
|
||||
_New in v0.1.7:_ QuaPy now provides an implementation of the generalized
|
||||
"Distribution Matching" approaches for multiclass, inspired by the framework
|
||||
of [Firat (2016)](https://arxiv.org/abs/1606.00868). One can instantiate
|
||||
a variant of HDy for multiclass quantification as follows:
|
||||
|
||||
```python
|
||||
multiclassHDy = qp.method.aggregative.DistributionMatching(classifier=LogisticRegression(), divergence='HD', cdf=False)
|
||||
```
|
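The idea behind HDy and the distribution-matching family can be illustrated with a short, self-contained sketch for the binary case: the histogram of test scores is approximated by a mixture of the positive and negative validation histograms, and the mixture weight that minimizes the Hellinger distance is returned as the prevalence estimate. The helper names (`histogram`, `hellinger`, `match_prevalence`) are hypothetical and the grid search is a simplification. This is not the code used in QuaPy.

```python
import math


def histogram(scores, nbins):
    """Normalized histogram of scores assumed to lie in [0, 1]."""
    counts = [0] * nbins
    for s in scores:
        counts[min(int(s * nbins), nbins - 1)] += 1
    return [c / len(scores) for c in counts]


def hellinger(p, q):
    """Hellinger distance (without the 1/sqrt(2) factor) between two histograms."""
    return math.sqrt(sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))


def match_prevalence(pos_scores, neg_scores, test_scores, nbins=8, steps=100):
    """Positive prevalence whose mixture of class histograms best matches the test histogram."""
    pos_h = histogram(pos_scores, nbins)
    neg_h = histogram(neg_scores, nbins)
    test_h = histogram(test_scores, nbins)
    best_alpha, best_dist = 0.0, float('inf')
    for i in range(steps + 1):
        alpha = i / steps
        mixture = [alpha * p + (1 - alpha) * n for p, n in zip(pos_h, neg_h)]
        dist = hellinger(mixture, test_h)
        if dist < best_dist:
            best_alpha, best_dist = alpha, dist
    return best_alpha


# toy classifier scores (illustrative values only)
pos_scores = [0.9, 0.8, 0.85, 0.95]
neg_scores = [0.1, 0.2, 0.15, 0.05]
test_scores = pos_scores * 3 + neg_scores  # three positives for every negative
print(match_prevalence(pos_scores, neg_scores, test_scores))
```

The multiclass generalization replaces the single weight by a vector of class prevalences, which is why a plain grid search is no longer practical there.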
||||
|
||||
_New in v0.1.7:_ QuaPy now provides an implementation of the "DyS"
|
||||
framework proposed by [Maletzke et al. (2020)](https://ojs.aaai.org/index.php/AAAI/article/view/4376)
|
||||
and the "SMM" method proposed by [Hassan et al. (2019)](https://ieeexplore.ieee.org/document/9260028)
|
||||
(thanks to _Pablo González_ for the contributions!)
|
||||
|
||||
### Threshold Optimization methods
|
||||
|
||||
_New in v0.1.7:_ QuaPy now implements Forman's threshold optimization methods;
|
||||
see, e.g., [(Forman 2006)](https://dl.acm.org/doi/abs/10.1145/1150402.1150423)
|
||||
and [(Forman 2008)](https://link.springer.com/article/10.1007/s10618-008-0097-y).
|
||||
These include: T50, MAX, X, Median Sweep (MS), and its variant MS2.
|
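As an illustration of how these methods operate, the sketch below implements the T50 rule in plain Python: it selects the decision threshold whose true-positive rate on held-out data is closest to 0.5, and then applies the usual adjusted-count correction with the rates measured at that threshold. The function names and the toy data are hypothetical, and QuaPy's own classes may differ in details such as how candidate thresholds are generated.

```python
def rates(scores, labels, threshold):
    """True-positive and false-positive rates of the rule `score >= threshold`."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    tpr = sum(s >= threshold for s in pos) / len(pos)
    fpr = sum(s >= threshold for s in neg) / len(neg)
    return tpr, fpr


def t50_quantify(val_scores, val_labels, test_scores):
    """Estimate the positive prevalence of test_scores with the T50 rule."""
    # candidate thresholds: every distinct validation score
    candidates = sorted(set(val_scores))
    # T50: keep the threshold whose tpr is closest to 0.5
    threshold = min(candidates, key=lambda t: abs(rates(val_scores, val_labels, t)[0] - 0.5))
    tpr, fpr = rates(val_scores, val_labels, threshold)
    observed = sum(s >= threshold for s in test_scores) / len(test_scores)
    if tpr == fpr:
        return observed  # the correction is undefined; fall back to the raw count
    adjusted = (observed - fpr) / (tpr - fpr)
    return min(1.0, max(0.0, adjusted))  # clip to a valid prevalence


# toy validation and test scores (illustrative values only)
val_scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]
val_labels = [1, 1, 1, 1, 0, 0, 0, 0]
test_scores = [0.95, 0.85, 0.75, 0.65, 0.5, 0.3, 0.2, 0.1]
print(t50_quantify(val_scores, val_labels, test_scores))
```

The other variants differ mainly in the criterion used to pick the threshold, while the correction step stays the same.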
||||
|
||||
### Explicit Loss Minimization
|
||||
|
||||
Explicit Loss Minimization (ELM) represents a family of methods
|
||||
based on structured output learning, i.e., quantifiers relying on
|
||||
classifiers that have been optimized targeting a
|
||||
quantification-oriented evaluation measure.
|
||||
The original methods are implemented in QuaPy as classify & count (CC)
|
||||
quantifiers that use Joachims' [SVMperf](https://www.cs.cornell.edu/people/tj/svm_light/svm_perf.html)
|
||||
as the underlying classifier, properly set to optimize for the desired loss.
|
||||
|
||||
In QuaPy, the following methods, all relying on Joachims'
|
||||
[SVMperf](https://www.cs.cornell.edu/people/tj/svm_light/svm_perf.html)
|
||||
implementation, are available in _qp.method.aggregative_:
|
||||
In QuaPy, this can be achieved by calling the following functions:
|
||||
|
||||
* SVMQ (SVM-Q) is a quantification method optimizing the metric _Q_ defined
|
||||
in _Barranquero, J., Díez, J., and del Coz, J. J. (2015). Quantification-oriented learning based
|
||||
on reliable classifiers. Pattern Recognition, 48(2):591–604._
|
||||
* SVMKLD (SVM for Kullback-Leibler Divergence) proposed in _Esuli, A. and Sebastiani, F. (2015).
|
||||
* _newSVMQ_: returns the quantification method called SVM(Q) that optimizes for the metric _Q_ defined
|
||||
in [_Barranquero, J., Díez, J., and del Coz, J. J. (2015). Quantification-oriented learning based
|
||||
on reliable classifiers. Pattern Recognition, 48(2):591–604._](https://www.sciencedirect.com/science/article/pii/S003132031400291X)
|
||||
* _newSVMKLD_ and _newSVMNKLD_: return the quantification methods called SVM(KLD) and SVM(nKLD), standing for
|
||||
Kullback-Leibler Divergence and Normalized Kullback-Leibler Divergence, as proposed in [_Esuli, A. and Sebastiani, F. (2015).
|
||||
Optimizing text quantifiers for multivariate loss functions.
|
||||
ACM Transactions on Knowledge Discovery and Data, 9(4):Article 27._
|
||||
* SVMNKLD (SVM for Normalized Kullback-Leibler Divergence) proposed in _Esuli, A. and Sebastiani, F. (2015).
|
||||
Optimizing text quantifiers for multivariate loss functions.
|
||||
ACM Transactions on Knowledge Discovery and Data, 9(4):Article 27._
|
||||
* SVMAE (SVM for Mean Absolute Error)
|
||||
* SVMRAE (SVM for Mean Relative Absolute Error)
|
||||
ACM Transactions on Knowledge Discovery and Data, 9(4):Article 27._](https://dl.acm.org/doi/abs/10.1145/2700406)
|
||||
* _newSVMAE_ and _newSVMRAE_: return the quantification methods called SVM(AE) and SVM(RAE), which optimize for the (Mean) Absolute Error and for the
|
||||
(Mean) Relative Absolute Error, as first used by
|
||||
[_Moreo, A. and Sebastiani, F. (2021). Tweet sentiment quantification: An experimental re-evaluation. PLOS ONE 17 (9), 1-23._](https://arxiv.org/abs/2011.02552)
|
||||
|
||||
The last two methods (SVMAE and SVMRAE) have been implemented in
|
||||
The last two methods (SVM(AE) and SVM(RAE)) have been implemented in
|
||||
QuaPy in order to make available ELM variants for what nowadays
|
||||
are considered the most well-behaved evaluation metrics in quantification.
|
||||
|
||||
|
@ -306,13 +323,18 @@ currently supports only binary classification.
|
|||
ELM variants (any binary quantifier in general) can be extended
|
||||
to operate in single-label scenarios trivially by adopting a
|
||||
"one-vs-all" strategy (as, e.g., in
|
||||
_Gao, W. and Sebastiani, F. (2016). From classification to quantification in tweet sentiment
|
||||
analysis. Social Network Analysis and Mining, 6(19):1–22_).
|
||||
In QuaPy this is possible by using the _OneVsAll_ class:
|
||||
[_Gao, W. and Sebastiani, F. (2016). From classification to quantification in tweet sentiment
|
||||
analysis. Social Network Analysis and Mining, 6(19):1–22_](https://link.springer.com/article/10.1007/s13278-016-0327-z)).
|
||||
In QuaPy this is possible by using the _OneVsAll_ class.
|
||||
|
||||
There are two ways of instantiating this class: _OneVsAllGeneric_, which works for
|
||||
any quantifier, and _OneVsAllAggregative_, which is optimized for aggregative quantifiers.
|
||||
In general, you can simply use the _getOneVsAll_ function and QuaPy will choose
|
||||
the more convenient of the two.
|
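Conceptually, the one-vs-all strategy trains one binary quantifier per class (that class against the rest) and then renormalizes the resulting positive-prevalence estimates so that they sum to one. The following pure-Python sketch shows only these two bookkeeping steps. The helper names `binarize` and `combine` are hypothetical, and the L1 normalization is an assumption about how the estimates are merged, not a description of QuaPy's internals.

```python
def binarize(labels, positive_class):
    """Relabel a multiclass problem as `positive_class` (1) versus the rest (0)."""
    return [1 if y == positive_class else 0 for y in labels]


def combine(binary_estimates):
    """Turn independent per-class prevalence estimates into a single distribution."""
    total = sum(binary_estimates)
    n = len(binary_estimates)
    if total == 0:
        return [1.0 / n] * n  # no evidence for any class: fall back to uniform
    return [p / total for p in binary_estimates]


labels = ['neg', 'neu', 'pos', 'pos', 'neg', 'pos']
print(binarize(labels, 'pos'))

# pretend three binary quantifiers returned these positive prevalences
print(combine([0.5, 0.3, 0.4]))
```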
||||
|
||||
```python
|
||||
import quapy as qp
|
||||
from quapy.method.aggregative import SVMQ, OneVsAll
|
||||
from quapy.method.aggregative import SVMQ
|
||||
|
||||
# load a single-label dataset (this one contains 3 classes)
|
||||
dataset = qp.datasets.fetch_twitter('hcr', pickle=True)
|
||||
|
@ -320,11 +342,14 @@ dataset = qp.datasets.fetch_twitter('hcr', pickle=True)
|
|||
# let qp know where svmperf is
|
||||
qp.environ['SVMPERF_HOME'] = '../svm_perf_quantification'
|
||||
|
||||
model = OneVsAll(SVMQ(), n_jobs=-1) # run them in parallel
|
||||
model = getOneVsAll(SVMQ(), n_jobs=-1) # run them in parallel
|
||||
model.fit(dataset.training)
|
||||
estim_prevalence = model.quantify(dataset.test.instances)
|
||||
```
|
||||
|
||||
Check the examples _[explicit_loss_minimization.py](..%2Fexamples%2Fexplicit_loss_minimization.py)_
|
||||
and [one_vs_all.py](..%2Fexamples%2Fone_vs_all.py) for more details.
|
||||
|
||||
## Meta Models
|
||||
|
||||
By _meta_ models we mean quantification methods that are defined on top of other
|
||||
|
@ -337,12 +362,12 @@ _Meta_ models are implemented in the _qp.method.meta_ module.
|
|||
|
||||
QuaPy implements (some of) the variants proposed in:
|
||||
|
||||
* _Pérez-Gállego, P., Quevedo, J. R., & del Coz, J. J. (2017).
|
||||
* [_Pérez-Gállego, P., Quevedo, J. R., & del Coz, J. J. (2017).
|
||||
Using ensembles for problems with characterizable changes in data distribution: A case study on quantification.
|
||||
Information Fusion, 34, 87-100._
|
||||
* _Pérez-Gállego, P., Castano, A., Quevedo, J. R., & del Coz, J. J. (2019).
|
||||
Information Fusion, 34, 87-100._](https://www.sciencedirect.com/science/article/pii/S1566253516300628)
|
||||
* [_Pérez-Gállego, P., Castano, A., Quevedo, J. R., & del Coz, J. J. (2019).
|
||||
Dynamic ensemble selection for quantification tasks.
|
||||
Information Fusion, 45, 1-15._
|
||||
Information Fusion, 45, 1-15._](https://www.sciencedirect.com/science/article/pii/S1566253517303652)
|
||||
|
||||
The following code shows how to instantiate an Ensemble of 30 _Adjusted Classify & Count_ (ACC)
|
||||
quantifiers operating with a _Logistic Regressor_ (LR) as the base classifier, and using the
|
||||
|
@ -378,10 +403,10 @@ wiki if you want to optimize the hyperparameters of ensemble for classification
|
|||
|
||||
QuaPy offers an implementation of QuaNet, a deep learning model presented in:
|
||||
|
||||
_Esuli, A., Moreo, A., & Sebastiani, F. (2018, October).
|
||||
[_Esuli, A., Moreo, A., & Sebastiani, F. (2018, October).
|
||||
A recurrent neural network for sentiment quantification.
|
||||
In Proceedings of the 27th ACM International Conference on
|
||||
Information and Knowledge Management (pp. 1775-1778)._
|
||||
Information and Knowledge Management (pp. 1775-1778)._](https://dl.acm.org/doi/abs/10.1145/3269206.3269287)
|
||||
|
||||
This model requires _torch_ to be installed.
|
||||
QuaNet also requires a classifier that can provide embedded representations
|
||||
|
@ -406,7 +431,8 @@ cnn = CNNnet(dataset.vocabulary_size, dataset.n_classes)
|
|||
learner = NeuralClassifierTrainer(cnn, device='cuda')
|
||||
|
||||
# train QuaNet
|
||||
model = QuaNet(learner, qp.environ['SAMPLE_SIZE'], device='cuda')
|
||||
model = QuaNet(learner, device='cuda')
|
||||
model.fit(dataset.training)
|
||||
estim_prevalence = model.quantify(dataset.test.instances)
|
||||
```
|
||||
|
||||
|
|
|
@ -22,9 +22,9 @@ Quantification has long been regarded as an add-on of
|
|||
classification, and thus the model selection strategies
|
||||
customarily adopted in classification have simply been
|
||||
applied to quantification (see the next section).
|
||||
It has been argued in _Moreo, Alejandro, and Fabrizio Sebastiani.
|
||||
"Re-Assessing the 'Classify and Count' Quantification Method."
|
||||
arXiv preprint arXiv:2011.02552 (2020)._
|
||||
It has been argued in [Moreo, Alejandro, and Fabrizio Sebastiani.
|
||||
Re-Assessing the "Classify and Count" Quantification Method.
|
||||
ECIR 2021: Advances in Information Retrieval pp 75–91.](https://link.springer.com/chapter/10.1007/978-3-030-72240-1_6)
|
||||
that specific model selection strategies should
|
||||
be adopted for quantification. That is, model selection
|
||||
strategies for quantification should target
|
||||
|
@ -32,76 +32,86 @@ quantification-oriented losses and be tested in a variety
|
|||
of scenarios exhibiting different degrees of prior
|
||||
probability shift.
|
||||
|
||||
The class
|
||||
_qp.model_selection.GridSearchQ_
|
||||
implements a grid-search exploration over the space of
|
||||
hyper-parameter combinations that evaluates each
|
||||
combination of hyper-parameters
|
||||
by means of a given quantification-oriented
|
||||
The class _qp.model_selection.GridSearchQ_ implements a grid-search exploration over the space of
|
||||
hyper-parameter combinations that [evaluates](https://github.com/HLT-ISTI/QuaPy/wiki/Evaluation)
|
||||
each combination of hyper-parameters by means of a given quantification-oriented
|
||||
error metric (e.g., any of the error functions implemented
|
||||
in _qp.error_) and according to the
|
||||
[_artificial sampling protocol_](https://github.com/HLT-ISTI/QuaPy/wiki/Evaluation).
|
||||
in _qp.error_) and according to a
|
||||
[sampling generation protocol](https://github.com/HLT-ISTI/QuaPy/wiki/Protocols).
|
||||
|
||||
The following is an example of model selection for quantification:
|
||||
The following is an example (also included in the examples folder) of model selection for quantification:
|
||||
|
||||
```python
|
||||
import quapy as qp
|
||||
from quapy.method.aggregative import PCC
|
||||
from quapy.protocol import APP
|
||||
from quapy.method.aggregative import DistributionMatching
|
||||
from sklearn.linear_model import LogisticRegression
|
||||
import numpy as np
|
||||
|
||||
# set a seed to replicate runs
|
||||
np.random.seed(0)
|
||||
qp.environ['SAMPLE_SIZE'] = 500
|
||||
"""
|
||||
In this example, we show how to perform model selection on a DistributionMatching quantifier.
|
||||
"""
|
||||
|
||||
dataset = qp.datasets.fetch_reviews('hp', tfidf=True, min_df=5)
|
||||
model = DistributionMatching(LogisticRegression())
|
||||
|
||||
qp.environ['SAMPLE_SIZE'] = 100
|
||||
qp.environ['N_JOBS'] = -1 # explore hyper-parameters in parallel
|
||||
|
||||
training, test = qp.datasets.fetch_reviews('imdb', tfidf=True, min_df=5).train_test
|
||||
|
||||
# The model will be returned by the fit method of GridSearchQ.
|
||||
# Model selection will be performed with a fixed budget of 1000 evaluations
|
||||
# for each hyper-parameter combination. The error to optimize is the MAE for
|
||||
# quantification, as evaluated on artificially drawn samples at prevalences
|
||||
# covering the entire spectrum on a held-out portion (40%) of the training set.
|
||||
# Every combination of hyper-parameters will be evaluated by confronting the
|
||||
# quantifier thus configured against a series of samples generated by means
|
||||
# of a sample generation protocol. For this example, we will use the
|
||||
# artificial-prevalence protocol (APP), which generates samples with prevalence
|
||||
# values in the entire range of values from a grid (e.g., [0, 0.1, 0.2, ..., 1]).
|
||||
# We devote 30% of the training set to this exploration.
|
||||
training, validation = training.split_stratified(train_prop=0.7)
|
||||
protocol = APP(validation)
|
||||
|
||||
# We will explore a classification-dependent hyper-parameter (e.g., the 'C'
|
||||
# hyper-parameter of LogisticRegression) and a quantification-dependent hyper-parameter
|
||||
# (e.g., the number of bins in a DistributionMatching quantifier).
|
||||
# Classifier-dependent hyper-parameters have to be marked with a prefix "classifier__"
|
||||
# in order to let the quantifier know this hyper-parameter belongs to its underlying
|
||||
# classifier.
|
||||
param_grid = {
|
||||
'classifier__C': np.logspace(-3,3,7),
|
||||
'nbins': [8, 16, 32, 64],
|
||||
}
|
||||
|
||||
model = qp.model_selection.GridSearchQ(
|
||||
model=PCC(LogisticRegression()),
|
||||
param_grid={'C': np.logspace(-4,5,10), 'class_weight': ['balanced', None]},
|
||||
sample_size=qp.environ['SAMPLE_SIZE'],
|
||||
eval_budget=1000,
|
||||
error='mae',
|
||||
refit=True, # retrain on the whole labelled set
|
||||
val_split=0.4,
|
||||
model=model,
|
||||
param_grid=param_grid,
|
||||
protocol=protocol,
|
||||
error='mae', # the error to optimize is the MAE (a quantification-oriented loss)
|
||||
refit=True, # retrain on the whole labelled set once done
|
||||
verbose=True # show information as the process goes on
|
||||
).fit(dataset.training)
|
||||
).fit(training)
|
||||
|
||||
print(f'model selection ended: best hyper-parameters={model.best_params_}')
|
||||
model = model.best_model_
|
||||
|
||||
# evaluation in terms of MAE
|
||||
results = qp.evaluation.artificial_sampling_eval(
|
||||
model,
|
||||
dataset.test,
|
||||
sample_size=qp.environ['SAMPLE_SIZE'],
|
||||
n_prevpoints=101,
|
||||
n_repetitions=10,
|
||||
error_metric='mae'
|
||||
)
|
||||
# we use the same evaluation protocol (APP) on the test set
|
||||
mae_score = qp.evaluation.evaluate(model, protocol=APP(test), error_metric='mae')
|
||||
|
||||
print(f'MAE={results:.5f}')
|
||||
print(f'MAE={mae_score:.5f}')
|
||||
```
|
||||
|
||||
In this example, the system outputs:
|
||||
```
|
||||
[GridSearchQ]: starting optimization with n_jobs=1
|
||||
[GridSearchQ]: checking hyperparams={'C': 0.0001, 'class_weight': 'balanced'} got mae score 0.24987
|
||||
[GridSearchQ]: checking hyperparams={'C': 0.0001, 'class_weight': None} got mae score 0.48135
|
||||
[GridSearchQ]: checking hyperparams={'C': 0.001, 'class_weight': 'balanced'} got mae score 0.24866
|
||||
[GridSearchQ]: starting model selection with self.n_jobs =-1
|
||||
[GridSearchQ]: hyperparams={'classifier__C': 0.01, 'nbins': 64} got mae score 0.04021 [took 1.1356s]
|
||||
[GridSearchQ]: hyperparams={'classifier__C': 0.01, 'nbins': 32} got mae score 0.04286 [took 1.2139s]
|
||||
[GridSearchQ]: hyperparams={'classifier__C': 0.01, 'nbins': 16} got mae score 0.04888 [took 1.2491s]
|
||||
[GridSearchQ]: hyperparams={'classifier__C': 0.001, 'nbins': 8} got mae score 0.05163 [took 1.5372s]
|
||||
[...]
|
||||
[GridSearchQ]: checking hyperparams={'C': 100000.0, 'class_weight': None} got mae score 0.43676
|
||||
[GridSearchQ]: optimization finished: best params {'C': 0.1, 'class_weight': 'balanced'} (score=0.19982)
|
||||
[GridSearchQ]: hyperparams={'classifier__C': 1000.0, 'nbins': 32} got mae score 0.02445 [took 2.9056s]
|
||||
[GridSearchQ]: optimization finished: best params {'classifier__C': 100.0, 'nbins': 32} (score=0.02234) [took 7.3114s]
|
||||
[GridSearchQ]: refitting on the whole development set
|
||||
model selection ended: best hyper-parameters={'C': 0.1, 'class_weight': 'balanced'}
|
||||
1010 evaluations will be performed for each combination of hyper-parameters
|
||||
[artificial sampling protocol] generating predictions: 100%|██████████| 1010/1010 [00:00<00:00, 5005.54it/s]
|
||||
MAE=0.20342
|
||||
model selection ended: best hyper-parameters={'classifier__C': 100.0, 'nbins': 32}
|
||||
MAE=0.03102
|
||||
```
|
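The exploration above enumerates every combination in the parameter grid and uses the `classifier__` prefix to decide which object each hyper-parameter belongs to. A minimal sketch of that bookkeeping, using only the standard library, could look as follows. The helpers `expand_grid` and `split_params` are hypothetical names introduced for illustration and are not part of QuaPy's API.

```python
from itertools import product


def expand_grid(param_grid):
    """Yield one dict per combination of the values listed in param_grid."""
    names = list(param_grid)
    for values in product(*(param_grid[name] for name in names)):
        yield dict(zip(names, values))


def split_params(params, prefix='classifier__'):
    """Separate classifier hyper-parameters (prefixed) from quantifier ones."""
    classifier_params = {k[len(prefix):]: v for k, v in params.items() if k.startswith(prefix)}
    quantifier_params = {k: v for k, v in params.items() if not k.startswith(prefix)}
    return classifier_params, quantifier_params


grid = {'classifier__C': [0.1, 1, 10], 'nbins': [8, 16]}
for combination in expand_grid(grid):
    print(split_params(combination))
```

Each combination would then be scored with the chosen error metric on the samples produced by the protocol, and the combination with the lowest score kept.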
||||
|
||||
The parameter _val_split_ can alternatively be used to indicate
|
||||
|
@ -128,32 +138,13 @@ learner = GridSearchCV(
|
|||
LogisticRegression(),
|
||||
param_grid={'C': np.logspace(-4, 5, 10), 'class_weight': ['balanced', None]},
|
||||
cv=5)
|
||||
model = PCC(learner).fit(dataset.training)
|
||||
print(f'model selection ended: best hyper-parameters={model.learner.best_params_}')
|
||||
model = DistributionMatching(learner).fit(dataset.training)
|
||||
```
|
||||
|
||||
In this example, the system outputs:
|
||||
```
|
||||
model selection ended: best hyper-parameters={'C': 10000.0, 'class_weight': None}
|
||||
1010 evaluations will be performed for each combination of hyper-parameters
|
||||
[artificial sampling protocol] generating predictions: 100%|██████████| 1010/1010 [00:00<00:00, 5379.55it/s]
|
||||
MAE=0.41734
|
||||
```
|
||||
|
||||
Note that the MAE is worse than the one we obtained when optimizing
|
||||
for quantification and, indeed, the hyper-parameters found optimal
|
||||
largely differ between the two selection modalities. The
|
||||
hyper-parameters C=10000 and class_weight=None have been found
|
||||
to work well for the specific training prevalence of the HP dataset,
|
||||
but these hyper-parameters turned out to be suboptimal when the
|
||||
class prevalences of the test set differ (as is indeed tested
|
||||
in scenarios of quantification).
|
||||
|
||||
This is, however, not always the case, and one could, in practice,
|
||||
find examples
|
||||
in which optimizing for classification ends up resulting in a better
|
||||
quantifier than when optimizing for quantification.
|
||||
Nonetheless, this is theoretically unlikely to happen.
|
||||
However, this is conceptually flawed, since the model should be
|
||||
optimized for the task at hand (quantification), and not for a surrogate task (classification),
|
||||
i.e., the model should be requested to deliver low quantification errors, rather
|
||||
than low classification errors.
|
||||
|
||||
|
||||
|
||||
|
|
|
@ -43,7 +43,7 @@ quantification methods across different scenarios showcasing
|
|||
the accuracy of the quantifier in predicting class prevalences
|
||||
for a wide range of prior distributions. This can easily be
|
||||
achieved by means of the
|
||||
[artificial sampling protocol](https://github.com/HLT-ISTI/QuaPy/wiki/Evaluation)
|
||||
[artificial sampling protocol](https://github.com/HLT-ISTI/QuaPy/wiki/Protocols)
|
||||
that is implemented in QuaPy.
|
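For intuition, the sketch below shows in plain Python what an artificial-prevalence style protocol does: it builds a grid of prevalence vectors covering the whole simplex and draws, for each of them, a sample whose class proportions follow that vector. The names `prevalence_grid` and `draw_sample` are hypothetical, sampling is done with replacement for simplicity, and the actual `APP` class in QuaPy may differ in its details.

```python
import random
from itertools import product


def prevalence_grid(n_classes, n_points):
    """All prevalence vectors on a regular grid with n_points values per class."""
    steps = n_points - 1
    grid = []
    for combo in product(range(steps + 1), repeat=n_classes - 1):
        if sum(combo) <= steps:
            last = steps - sum(combo)  # the last class takes the remaining mass
            grid.append([c / steps for c in combo] + [last / steps])
    return grid


def draw_sample(labels, prevalence, size, rng):
    """Indices of `size` items (with replacement) following `prevalence`."""
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    classes = sorted(by_class)
    # floor the per-class counts; the last class absorbs the rounding remainder
    counts = [int(p * size + 1e-9) for p in prevalence[:-1]]
    counts.append(size - sum(counts))
    sample = []
    for cls, count in zip(classes, counts):
        sample.extend(rng.choices(by_class[cls], k=count))
    return sample


labels = [0] * 50 + [1] * 50
rng = random.Random(0)
for prevalence in prevalence_grid(n_classes=2, n_points=5):
    sample = draw_sample(labels, prevalence, size=10, rng=rng)
    print(prevalence, sum(labels[i] for i in sample))
```

A quantifier evaluated on all these samples is therefore tested under prior distributions that can be very different from the training one.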
||||
|
||||
The following code shows how to perform one simple experiment
|
||||
|
@ -55,6 +55,7 @@ generating 100 random samples at each prevalence).
|
|||
|
||||
```python
|
||||
import quapy as qp
|
||||
from quapy.protocol import APP
|
||||
from quapy.method.aggregative import CC, ACC, PCC, PACC
|
||||
from sklearn.svm import LinearSVC
|
||||
|
||||
|
@ -63,28 +64,26 @@ qp.environ['SAMPLE_SIZE'] = 500
|
|||
def gen_data():
|
||||
|
||||
def base_classifier():
|
||||
return LinearSVC()
|
||||
return LinearSVC(class_weight='balanced')
|
||||
|
||||
def models():
|
||||
yield CC(base_classifier())
|
||||
yield ACC(base_classifier())
|
||||
yield PCC(base_classifier())
|
||||
yield PACC(base_classifier())
|
||||
yield 'CC', CC(base_classifier())
|
||||
yield 'ACC', ACC(base_classifier())
|
||||
yield 'PCC', PCC(base_classifier())
|
||||
yield 'PACC', PACC(base_classifier())
|
||||
|
||||
data = qp.datasets.fetch_reviews('kindle', tfidf=True, min_df=5)
|
||||
train, test = qp.datasets.fetch_reviews('kindle', tfidf=True, min_df=5).train_test
|
||||
|
||||
method_names, true_prevs, estim_prevs, tr_prevs = [], [], [], []
|
||||
|
||||
for model in models():
|
||||
model.fit(data.training)
|
||||
true_prev, estim_prev = qp.evaluation.artificial_sampling_prediction(
|
||||
model, data.test, qp.environ['SAMPLE_SIZE'], n_repetitions=100, n_prevpoints=21
|
||||
)
|
||||
for method_name, model in models():
|
||||
model.fit(train)
|
||||
true_prev, estim_prev = qp.evaluation.prediction(model, APP(test, repeats=100, random_state=0))
|
||||
|
||||
method_names.append(model.__class__.__name__)
|
||||
method_names.append(method_name)
|
||||
true_prevs.append(true_prev)
|
||||
estim_prevs.append(estim_prev)
|
||||
tr_prevs.append(data.training.prevalence())
|
||||
tr_prevs.append(train.prevalence())
|
||||
|
||||
return method_names, true_prevs, estim_prevs, tr_prevs
|
||||
|
||||
|
@ -163,21 +162,19 @@ like this:
|
|||
|
||||
```python
|
||||
def gen_data():
|
||||
data = qp.datasets.fetch_reviews('imdb', tfidf=True, min_df=5)
|
||||
|
||||
train, test = qp.datasets.fetch_reviews('imdb', tfidf=True, min_df=5).train_test
|
||||
model = CC(LinearSVC())
|
||||
|
||||
method_data = []
|
||||
for training_prevalence in np.linspace(0.1, 0.9, 9):
|
||||
training_size = 5000
|
||||
# since the problem is binary, it suffices to specify the negative prevalence (the positive is constrained)
|
||||
training = data.training.sampling(training_size, 1 - training_prevalence)
|
||||
model.fit(training)
|
||||
true_prev, estim_prev = qp.evaluation.artificial_sampling_prediction(
|
||||
model, data.sample, qp.environ['SAMPLE_SIZE'], n_repetitions=100, n_prevpoints=21
|
||||
)
|
||||
# method names can contain Latex syntax
|
||||
method_name = 'CC$_{' + f'{int(100 * training_prevalence)}' + r'\%}$'
|
||||
method_data.append((method_name, true_prev, estim_prev, training.prevalence()))
|
||||
# since the problem is binary, it suffices to specify the negative prevalence, since the positive is constrained
|
||||
train_sample = train.sampling(training_size, 1-training_prevalence)
|
||||
model.fit(train_sample)
|
||||
true_prev, estim_prev = qp.evaluation.prediction(model, APP(test, repeats=100, random_state=0))
|
||||
method_name = 'CC$_{' + f'{int(100 * training_prevalence)}' + r'\%}$'  # raw string keeps the LaTeX backslash
|
||||
method_data.append((method_name, true_prev, estim_prev, train_sample.prevalence()))
|
||||
|
||||
return zip(*method_data)
|
||||
```
|
||||
|
|
|
@ -64,6 +64,7 @@ Features
|
|||
* 32 UCI Machine Learning datasets.
|
||||
* 11 Twitter Sentiment datasets.
|
||||
* 3 Reviews Sentiment datasets.
|
||||
* 4 tasks from LeQua competition (_new in v0.1.7!_)
|
||||
* Native support for binary and single-label scenarios of quantification.
|
||||
* Model selection functionality targeting quantification-oriented losses.
|
||||
* Visualization tools for analysing results.
|
||||
|
@ -75,8 +76,9 @@ Features
|
|||
Installation
|
||||
Datasets
|
||||
Evaluation
|
||||
Protocols
|
||||
Methods
|
||||
Model Selection
|
||||
Model-Selection
|
||||
Plotting
|
||||
API Developers documentation<modules>
|
||||
|
||||
|
|
|
@ -1,27 +1,38 @@
|
|||
:tocdepth: 2
|
||||
|
||||
quapy.classification package
|
||||
============================
|
||||
|
||||
Submodules
|
||||
----------
|
||||
|
||||
quapy.classification.methods module
|
||||
-----------------------------------
|
||||
quapy.classification.calibration
|
||||
--------------------------------
|
||||
|
||||
.. versionadded:: 0.1.7
|
||||
.. automodule:: quapy.classification.calibration
|
||||
:members:
|
||||
:undoc-members:
|
||||
:show-inheritance:
|
||||
|
||||
quapy.classification.methods
|
||||
----------------------------
|
||||
|
||||
.. automodule:: quapy.classification.methods
|
||||
:members:
|
||||
:undoc-members:
|
||||
:show-inheritance:
|
||||
|
||||
quapy.classification.neural module
|
||||
----------------------------------
|
||||
quapy.classification.neural
|
||||
---------------------------
|
||||
|
||||
.. automodule:: quapy.classification.neural
|
||||
:members:
|
||||
:undoc-members:
|
||||
:show-inheritance:
|
||||
|
||||
quapy.classification.svmperf module
|
||||
-----------------------------------
|
||||
quapy.classification.svmperf
|
||||
----------------------------
|
||||
|
||||
.. automodule:: quapy.classification.svmperf
|
||||
:members:
|
||||
|
|
|
@ -1,35 +1,37 @@
|
|||
:tocdepth: 2
|
||||
|
||||
quapy.data package
|
||||
==================
|
||||
|
||||
Submodules
|
||||
----------
|
||||
|
||||
quapy.data.base module
|
||||
----------------------
|
||||
quapy.data.base
|
||||
---------------
|
||||
|
||||
.. automodule:: quapy.data.base
|
||||
:members:
|
||||
:undoc-members:
|
||||
:show-inheritance:
|
||||
|
||||
quapy.data.datasets module
|
||||
--------------------------
|
||||
quapy.data.datasets
|
||||
-------------------
|
||||
|
||||
.. automodule:: quapy.data.datasets
|
||||
:members:
|
||||
:undoc-members:
|
||||
:show-inheritance:
|
||||
|
||||
quapy.data.preprocessing module
|
||||
-------------------------------
|
||||
quapy.data.preprocessing
|
||||
------------------------
|
||||
|
||||
.. automodule:: quapy.data.preprocessing
|
||||
:members:
|
||||
:undoc-members:
|
||||
:show-inheritance:
|
||||
|
||||
quapy.data.reader module
|
||||
------------------------
|
||||
quapy.data.reader
|
||||
-----------------
|
||||
|
||||
.. automodule:: quapy.data.reader
|
||||
:members:
|
||||
|
|
|
@ -1,43 +1,45 @@
|
|||
:tocdepth: 2
|
||||
|
||||
quapy.method package
|
||||
====================
|
||||
|
||||
Submodules
|
||||
----------
|
||||
|
||||
quapy.method.aggregative module
|
||||
-------------------------------
|
||||
quapy.method.aggregative
|
||||
------------------------
|
||||
|
||||
.. automodule:: quapy.method.aggregative
|
||||
:members:
|
||||
:undoc-members:
|
||||
:show-inheritance:
|
||||
|
||||
quapy.method.base module
|
||||
------------------------
|
||||
quapy.method.base
|
||||
-----------------
|
||||
|
||||
.. automodule:: quapy.method.base
|
||||
:members:
|
||||
:undoc-members:
|
||||
:show-inheritance:
|
||||
|
||||
quapy.method.meta module
|
||||
------------------------
|
||||
quapy.method.meta
|
||||
-----------------
|
||||
|
||||
.. automodule:: quapy.method.meta
|
||||
:members:
|
||||
:undoc-members:
|
||||
:show-inheritance:
|
||||
|
||||
quapy.method.neural module
|
||||
--------------------------
|
||||
quapy.method.neural
|
||||
-------------------
|
||||
|
||||
.. automodule:: quapy.method.neural
|
||||
:members:
|
||||
:undoc-members:
|
||||
:show-inheritance:
|
||||
|
||||
quapy.method.non\_aggregative module
|
||||
------------------------------------
|
||||
quapy.method.non\_aggregative
|
||||
-----------------------------
|
||||
|
||||
.. automodule:: quapy.method.non_aggregative
|
||||
:members:
|
||||
|
|
|
@ -1,68 +1,79 @@
|
|||
:tocdepth: 2
|
||||
|
||||
quapy package
|
||||
=============
|
||||
|
||||
Subpackages
|
||||
-----------
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 4
|
||||
|
||||
quapy.classification
|
||||
quapy.data
|
||||
quapy.method
|
||||
quapy.tests
|
||||
|
||||
Submodules
|
||||
----------
|
||||
|
||||
quapy.error module
|
||||
------------------
|
||||
quapy.error
|
||||
-----------
|
||||
|
||||
.. automodule:: quapy.error
|
||||
:members:
|
||||
:undoc-members:
|
||||
:show-inheritance:
|
||||
|
||||
quapy.evaluation module
|
||||
-----------------------
|
||||
quapy.evaluation
|
||||
----------------
|
||||
|
||||
.. automodule:: quapy.evaluation
|
||||
:members:
|
||||
:undoc-members:
|
||||
:show-inheritance:
|
||||
|
||||
quapy.functional module
|
||||
-----------------------
|
||||
quapy.protocol
|
||||
--------------
|
||||
|
||||
.. versionadded:: 0.1.7
|
||||
.. automodule:: quapy.protocol
|
||||
:members:
|
||||
:undoc-members:
|
||||
:show-inheritance:
|
||||
|
||||
quapy.functional
|
||||
----------------
|
||||
|
||||
.. automodule:: quapy.functional
|
||||
:members:
|
||||
:undoc-members:
|
||||
:show-inheritance:
|
||||
|
||||
quapy.model\_selection module
|
||||
-----------------------------
|
||||
quapy.model\_selection
|
||||
----------------------
|
||||
|
||||
.. automodule:: quapy.model_selection
|
||||
:members:
|
||||
:undoc-members:
|
||||
:show-inheritance:
|
||||
|
||||
quapy.plot module
|
||||
-----------------
|
||||
quapy.plot
|
||||
----------
|
||||
|
||||
.. automodule:: quapy.plot
|
||||
:members:
|
||||
:undoc-members:
|
||||
:show-inheritance:
|
||||
|
||||
quapy.util module
|
||||
-----------------
|
||||
quapy.util
|
||||
----------
|
||||
|
||||
.. automodule:: quapy.util
|
||||
:members:
|
||||
:undoc-members:
|
||||
:show-inheritance:
|
||||
|
||||
Subpackages
|
||||
-----------
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 3
|
||||
|
||||
quapy.classification
|
||||
quapy.data
|
||||
quapy.method
|
||||
|
||||
|
||||
Module contents
|
||||
---------------
|
||||
|
||||
|
@ -70,3 +81,4 @@ Module contents
|
|||
:members:
|
||||
:undoc-members:
|
||||
:show-inheritance:
|
||||
|
||||
|
|
|
@ -1,37 +0,0 @@
|
|||
quapy.tests package
|
||||
===================
|
||||
|
||||
Submodules
|
||||
----------
|
||||
|
||||
quapy.tests.test\_base module
|
||||
-----------------------------
|
||||
|
||||
.. automodule:: quapy.tests.test_base
|
||||
:members:
|
||||
:undoc-members:
|
||||
:show-inheritance:
|
||||
|
||||
quapy.tests.test\_datasets module
|
||||
---------------------------------
|
||||
|
||||
.. automodule:: quapy.tests.test_datasets
|
||||
:members:
|
||||
:undoc-members:
|
||||
:show-inheritance:
|
||||
|
||||
quapy.tests.test\_methods module
|
||||
--------------------------------
|
||||
|
||||
.. automodule:: quapy.tests.test_methods
|
||||
:members:
|
||||
:undoc-members:
|
||||
:show-inheritance:
|
||||
|
||||
Module contents
|
||||
---------------
|
||||
|
||||
.. automodule:: quapy.tests
|
||||
:members:
|
||||
:undoc-members:
|
||||
:show-inheritance:
|
|
@ -1,7 +0,0 @@
|
|||
Getting Started
|
||||
===============
|
||||
QuaPy is an open source framework for Quantification (a.k.a. Supervised Prevalence Estimation) written in Python.
|
||||
|
||||
Installation
|
||||
------------
|
||||
>>> pip install quapy
|
|
@ -1 +0,0 @@
|
|||
.. include:: ../../README.md
|
|
@ -1,701 +0,0 @@
|
|||
@import url("basic.css");
|
||||
|
||||
/* -- page layout ----------------------------------------------------------- */
|
||||
|
||||
body {
|
||||
font-family: Georgia, serif;
|
||||
font-size: 17px;
|
||||
background-color: #fff;
|
||||
color: #000;
|
||||
margin: 0;
|
||||
padding: 0;
|
||||
}
|
||||
|
||||
|
||||
div.document {
|
||||
width: 940px;
|
||||
margin: 30px auto 0 auto;
|
||||
}
|
||||
|
||||
div.documentwrapper {
|
||||
float: left;
|
||||
width: 100%;
|
||||
}
|
||||
|
||||
div.bodywrapper {
|
||||
margin: 0 0 0 220px;
|
||||
}
|
||||
|
||||
div.sphinxsidebar {
|
||||
width: 220px;
|
||||
font-size: 14px;
|
||||
line-height: 1.5;
|
||||
}
|
||||
|
||||
hr {
|
||||
border: 1px solid #B1B4B6;
|
||||
}
|
||||
|
||||
div.body {
|
||||
background-color: #fff;
|
||||
color: #3E4349;
|
||||
padding: 0 30px 0 30px;
|
||||
}
|
||||
|
||||
div.body > .section {
|
||||
text-align: left;
|
||||
}
|
||||
|
||||
div.footer {
|
||||
width: 940px;
|
||||
margin: 20px auto 30px auto;
|
||||
font-size: 14px;
|
||||
color: #888;
|
||||
text-align: right;
|
||||
}
|
||||
|
||||
div.footer a {
|
||||
color: #888;
|
||||
}
|
||||
|
||||
p.caption {
|
||||
font-family: inherit;
|
||||
font-size: inherit;
|
||||
}
|
||||
|
||||
|
||||
div.relations {
|
||||
display: none;
|
||||
}
|
||||
|
||||
|
||||
div.sphinxsidebar a {
|
||||
color: #444;
|
||||
text-decoration: none;
|
||||
border-bottom: 1px dotted #999;
|
||||
}
|
||||
|
||||
div.sphinxsidebar a:hover {
|
||||
border-bottom: 1px solid #999;
|
||||
}
|
||||
|
||||
div.sphinxsidebarwrapper {
|
||||
padding: 18px 10px;
|
||||
}
|
||||
|
||||
div.sphinxsidebarwrapper p.logo {
|
||||
padding: 0;
|
||||
margin: -10px 0 0 0px;
|
||||
text-align: center;
|
||||
}
|
||||
|
||||
div.sphinxsidebarwrapper h1.logo {
|
||||
margin-top: -10px;
|
||||
text-align: center;
|
||||
margin-bottom: 5px;
|
||||
text-align: left;
|
||||
}
|
||||
|
||||
div.sphinxsidebarwrapper h1.logo-name {
|
||||
margin-top: 0px;
|
||||
}
|
||||
|
||||
div.sphinxsidebarwrapper p.blurb {
|
||||
margin-top: 0;
|
||||
font-style: normal;
|
||||
}
|
||||
|
||||
div.sphinxsidebar h3,
|
||||
div.sphinxsidebar h4 {
|
||||
font-family: Georgia, serif;
|
||||
color: #444;
|
||||
font-size: 24px;
|
||||
font-weight: normal;
|
||||
margin: 0 0 5px 0;
|
||||
padding: 0;
|
||||
}
|
||||
|
||||
div.sphinxsidebar h4 {
|
||||
font-size: 20px;
|
||||
}
|
||||
|
||||
div.sphinxsidebar h3 a {
|
||||
color: #444;
|
||||
}
|
||||
|
||||
div.sphinxsidebar p.logo a,
|
||||
div.sphinxsidebar h3 a,
|
||||
div.sphinxsidebar p.logo a:hover,
|
||||
div.sphinxsidebar h3 a:hover {
|
||||
border: none;
|
||||
}
|
||||
|
||||
div.sphinxsidebar p {
|
||||
color: #555;
|
||||
margin: 10px 0;
|
||||
}
|
||||
|
||||
div.sphinxsidebar ul {
|
||||
margin: 10px 0;
|
||||
padding: 0;
|
||||
color: #000;
|
||||
}
|
||||
|
||||
div.sphinxsidebar ul li.toctree-l1 > a {
|
||||
font-size: 120%;
|
||||
}
|
||||
|
||||
div.sphinxsidebar ul li.toctree-l2 > a {
|
||||
font-size: 110%;
|
||||
}
|
||||
|
||||
div.sphinxsidebar input {
|
||||
border: 1px solid #CCC;
|
||||
font-family: Georgia, serif;
|
||||
font-size: 1em;
|
||||
}
|
||||
|
||||
div.sphinxsidebar hr {
|
||||
border: none;
|
||||
height: 1px;
|
||||
color: #AAA;
|
||||
background: #AAA;
|
||||
|
||||
text-align: left;
|
||||
margin-left: 0;
|
||||
width: 50%;
|
||||
}
|
||||
|
||||
div.sphinxsidebar .badge {
|
||||
border-bottom: none;
|
||||
}
|
||||
|
||||
div.sphinxsidebar .badge:hover {
|
||||
border-bottom: none;
|
||||
}
|
||||
|
||||
/* To address an issue with donation coming after search */
|
||||
div.sphinxsidebar h3.donation {
|
||||
margin-top: 10px;
|
||||
}
|
||||
|
||||
/* -- body styles ----------------------------------------------------------- */
|
||||
|
||||
a {
|
||||
color: #004B6B;
|
||||
text-decoration: underline;
|
||||
}
|
||||
|
||||
a:hover {
|
||||
color: #6D4100;
|
||||
text-decoration: underline;
|
||||
}
|
||||
|
||||
div.body h1,
|
||||
div.body h2,
|
||||
div.body h3,
|
||||
div.body h4,
|
||||
div.body h5,
|
||||
div.body h6 {
|
||||
font-family: Georgia, serif;
|
||||
font-weight: normal;
|
||||
margin: 30px 0px 10px 0px;
|
||||
padding: 0;
|
||||
}
|
||||
|
||||
div.body h1 { margin-top: 0; padding-top: 0; font-size: 240%; }
|
||||
div.body h2 { font-size: 180%; }
|
||||
div.body h3 { font-size: 150%; }
|
||||
div.body h4 { font-size: 130%; }
|
||||
div.body h5 { font-size: 100%; }
|
||||
div.body h6 { font-size: 100%; }
|
||||
|
||||
a.headerlink {
|
||||
color: #DDD;
|
||||
padding: 0 4px;
|
||||
text-decoration: none;
|
||||
}
|
||||
|
||||
a.headerlink:hover {
|
||||
color: #444;
|
||||
background: #EAEAEA;
|
||||
}
|
||||
|
||||
div.body p, div.body dd, div.body li {
|
||||
line-height: 1.4em;
|
||||
}
|
||||
|
||||
div.admonition {
|
||||
margin: 20px 0px;
|
||||
padding: 10px 30px;
|
||||
background-color: #EEE;
|
||||
border: 1px solid #CCC;
|
||||
}
|
||||
|
||||
div.admonition tt.xref, div.admonition code.xref, div.admonition a tt {
|
||||
background-color: #FBFBFB;
|
||||
border-bottom: 1px solid #fafafa;
|
||||
}
|
||||
|
||||
div.admonition p.admonition-title {
|
||||
font-family: Georgia, serif;
|
||||
font-weight: normal;
|
||||
font-size: 24px;
|
||||
margin: 0 0 10px 0;
|
||||
padding: 0;
|
||||
line-height: 1;
|
||||
}
|
||||
|
||||
div.admonition p.last {
|
||||
margin-bottom: 0;
|
||||
}
|
||||
|
||||
div.highlight {
|
||||
background-color: #fff;
|
||||
}
|
||||
|
||||
dt:target, .highlight {
|
||||
background: #FAF3E8;
|
||||
}
|
||||
|
||||
div.warning {
|
||||
background-color: #FCC;
|
||||
border: 1px solid #FAA;
|
||||
}
|
||||
|
||||
div.danger {
|
||||
background-color: #FCC;
|
||||
border: 1px solid #FAA;
|
||||
-moz-box-shadow: 2px 2px 4px #D52C2C;
|
||||
-webkit-box-shadow: 2px 2px 4px #D52C2C;
|
||||
box-shadow: 2px 2px 4px #D52C2C;
|
||||
}
|
||||
|
||||
div.error {
|
||||
background-color: #FCC;
|
||||
border: 1px solid #FAA;
|
||||
-moz-box-shadow: 2px 2px 4px #D52C2C;
|
||||
-webkit-box-shadow: 2px 2px 4px #D52C2C;
|
||||
box-shadow: 2px 2px 4px #D52C2C;
|
||||
}
|
||||
|
||||
div.caution {
|
||||
background-color: #FCC;
|
||||
border: 1px solid #FAA;
|
||||
}
|
||||
|
||||
div.attention {
|
||||
background-color: #FCC;
|
||||
border: 1px solid #FAA;
|
||||
}
|
||||
|
||||
div.important {
|
||||
background-color: #EEE;
|
||||
border: 1px solid #CCC;
|
||||
}
|
||||
|
||||
div.note {
|
||||
background-color: #EEE;
|
||||
border: 1px solid #CCC;
|
||||
}
|
||||
|
||||
div.tip {
|
||||
background-color: #EEE;
|
||||
border: 1px solid #CCC;
|
||||
}
|
||||
|
||||
div.hint {
|
||||
background-color: #EEE;
|
||||
border: 1px solid #CCC;
|
||||
}
|
||||
|
||||
div.seealso {
|
||||
background-color: #EEE;
|
||||
border: 1px solid #CCC;
|
||||
}
|
||||
|
||||
div.topic {
|
||||
background-color: #EEE;
|
||||
}
|
||||
|
||||
p.admonition-title {
|
||||
display: inline;
|
||||
}
|
||||
|
||||
p.admonition-title:after {
|
||||
content: ":";
|
||||
}
|
||||
|
||||
pre, tt, code {
|
||||
font-family: 'Consolas', 'Menlo', 'DejaVu Sans Mono', 'Bitstream Vera Sans Mono', monospace;
|
||||
font-size: 0.9em;
|
||||
}
|
||||
|
||||
.hll {
|
||||
background-color: #FFC;
|
||||
margin: 0 -12px;
|
||||
padding: 0 12px;
|
||||
display: block;
|
||||
}
|
||||
|
||||
img.screenshot {
|
||||
}
|
||||
|
||||
tt.descname, tt.descclassname, code.descname, code.descclassname {
|
||||
font-size: 0.95em;
|
||||
}
|
||||
|
||||
tt.descname, code.descname {
|
||||
padding-right: 0.08em;
|
||||
}
|
||||
|
||||
img.screenshot {
|
||||
-moz-box-shadow: 2px 2px 4px #EEE;
|
||||
-webkit-box-shadow: 2px 2px 4px #EEE;
|
||||
box-shadow: 2px 2px 4px #EEE;
|
||||
}
|
||||
|
||||
table.docutils {
|
||||
border: 1px solid #888;
|
||||
-moz-box-shadow: 2px 2px 4px #EEE;
|
||||
-webkit-box-shadow: 2px 2px 4px #EEE;
|
||||
box-shadow: 2px 2px 4px #EEE;
|
||||
}
|
||||
|
||||
table.docutils td, table.docutils th {
|
||||
border: 1px solid #888;
|
||||
padding: 0.25em 0.7em;
|
||||
}
|
||||
|
||||
table.field-list, table.footnote {
|
||||
border: none;
|
||||
-moz-box-shadow: none;
|
||||
-webkit-box-shadow: none;
|
||||
box-shadow: none;
|
||||
}
|
||||
|
||||
table.footnote {
|
||||
margin: 15px 0;
|
||||
width: 100%;
|
||||
border: 1px solid #EEE;
|
||||
background: #FDFDFD;
|
||||
font-size: 0.9em;
|
||||
}
|
||||
|
||||
table.footnote + table.footnote {
|
||||
margin-top: -15px;
|
||||
border-top: none;
|
||||
}
|
||||
|
||||
table.field-list th {
|
||||
padding: 0 0.8em 0 0;
|
||||
}
|
||||
|
||||
table.field-list td {
|
||||
padding: 0;
|
||||
}
|
||||
|
||||
table.field-list p {
|
||||
margin-bottom: 0.8em;
|
||||
}
|
||||
|
||||
/* Cloned from
|
||||
* https://github.com/sphinx-doc/sphinx/commit/ef60dbfce09286b20b7385333d63a60321784e68
|
||||
*/
|
||||
.field-name {
|
||||
-moz-hyphens: manual;
|
||||
-ms-hyphens: manual;
|
||||
-webkit-hyphens: manual;
|
||||
hyphens: manual;
|
||||
}
|
||||
|
||||
table.footnote td.label {
|
||||
width: .1px;
|
||||
padding: 0.3em 0 0.3em 0.5em;
|
||||
}
|
||||
|
||||
table.footnote td {
|
||||
padding: 0.3em 0.5em;
|
||||
}
|
||||
|
||||
dl {
|
||||
margin: 0;
|
||||
padding: 0;
|
||||
}
|
||||
|
||||
dl dd {
|
||||
margin-left: 30px;
|
||||
}
|
||||
|
||||
blockquote {
|
||||
margin: 0 0 0 30px;
|
||||
padding: 0;
|
||||
}
|
||||
|
||||
ul, ol {
|
||||
/* Matches the 30px from the narrow-screen "li > ul" selector below */
|
||||
margin: 10px 0 10px 30px;
|
||||
padding: 0;
|
||||
}
|
||||
|
||||
pre {
|
||||
background: #EEE;
|
||||
padding: 7px 30px;
|
||||
margin: 15px 0px;
|
||||
line-height: 1.3em;
|
||||
}
|
||||
|
||||
div.viewcode-block:target {
|
||||
background: #ffd;
|
||||
}
|
||||
|
||||
dl pre, blockquote pre, li pre {
|
||||
margin-left: 0;
|
||||
padding-left: 30px;
|
||||
}
|
||||
|
||||
tt, code {
|
||||
background-color: #ecf0f3;
|
||||
color: #222;
|
||||
/* padding: 1px 2px; */
|
||||
}
|
||||
|
||||
tt.xref, code.xref, a tt {
|
||||
background-color: #FBFBFB;
|
||||
border-bottom: 1px solid #fff;
|
||||
}
|
||||
|
||||
a.reference {
|
||||
text-decoration: none;
|
||||
border-bottom: 1px dotted #004B6B;
|
||||
}
|
||||
|
||||
/* Don't put an underline on images */
|
||||
a.image-reference, a.image-reference:hover {
|
||||
border-bottom: none;
|
||||
}
|
||||
|
||||
a.reference:hover {
|
||||
border-bottom: 1px solid #6D4100;
|
||||
}
|
||||
|
||||
a.footnote-reference {
|
||||
text-decoration: none;
|
||||
font-size: 0.7em;
|
||||
vertical-align: top;
|
||||
border-bottom: 1px dotted #004B6B;
|
||||
}
|
||||
|
||||
a.footnote-reference:hover {
|
||||
border-bottom: 1px solid #6D4100;
|
||||
}
|
||||
|
||||
a:hover tt, a:hover code {
|
||||
background: #EEE;
|
||||
}
|
||||
|
||||
|
||||
@media screen and (max-width: 870px) {
|
||||
|
||||
div.sphinxsidebar {
|
||||
display: none;
|
||||
}
|
||||
|
||||
div.document {
|
||||
width: 100%;
|
||||
|
||||
}
|
||||
|
||||
div.documentwrapper {
|
||||
margin-left: 0;
|
||||
margin-top: 0;
|
||||
margin-right: 0;
|
||||
margin-bottom: 0;
|
||||
}
|
||||
|
||||
div.bodywrapper {
|
||||
margin-top: 0;
|
||||
margin-right: 0;
|
||||
margin-bottom: 0;
|
||||
margin-left: 0;
|
||||
}
|
||||
|
||||
ul {
|
||||
margin-left: 0;
|
||||
}
|
||||
|
||||
li > ul {
|
||||
/* Matches the 30px from the "ul, ol" selector above */
|
||||
margin-left: 30px;
|
||||
}
|
||||
|
||||
.document {
|
||||
width: auto;
|
||||
}
|
||||
|
||||
.footer {
|
||||
width: auto;
|
||||
}
|
||||
|
||||
.bodywrapper {
|
||||
margin: 0;
|
||||
}
|
||||
|
||||
.footer {
|
||||
width: auto;
|
||||
}
|
||||
|
||||
.github {
|
||||
display: none;
|
||||
}
|
||||
|
||||
|
||||
|
||||
}
|
||||
|
||||
|
||||
|
||||
@media screen and (max-width: 875px) {
|
||||
|
||||
body {
|
||||
margin: 0;
|
||||
padding: 20px 30px;
|
||||
}
|
||||
|
||||
div.documentwrapper {
|
||||
float: none;
|
||||
background: #fff;
|
||||
}
|
||||
|
||||
div.sphinxsidebar {
|
||||
display: block;
|
||||
float: none;
|
||||
width: 102.5%;
|
||||
margin: 50px -30px -20px -30px;
|
||||
padding: 10px 20px;
|
||||
background: #333;
|
||||
color: #FFF;
|
||||
}
|
||||
|
||||
div.sphinxsidebar h3, div.sphinxsidebar h4, div.sphinxsidebar p,
|
||||
div.sphinxsidebar h3 a {
|
||||
color: #fff;
|
||||
}
|
||||
|
||||
div.sphinxsidebar a {
|
||||
color: #AAA;
|
||||
}
|
||||
|
||||
div.sphinxsidebar p.logo {
|
||||
display: none;
|
||||
}
|
||||
|
||||
div.document {
|
||||
width: 100%;
|
||||
margin: 0;
|
||||
}
|
||||
|
||||
div.footer {
|
||||
display: none;
|
||||
}
|
||||
|
||||
div.bodywrapper {
|
||||
margin: 0;
|
||||
}
|
||||
|
||||
div.body {
|
||||
min-height: 0;
|
||||
padding: 0;
|
||||
}
|
||||
|
||||
.rtd_doc_footer {
|
||||
display: none;
|
||||
}
|
||||
|
||||
.document {
|
||||
width: auto;
|
||||
}
|
||||
|
||||
.footer {
|
||||
width: auto;
|
||||
}
|
||||
|
||||
.footer {
|
||||
width: auto;
|
||||
}
|
||||
|
||||
.github {
|
||||
display: none;
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
/* misc. */
|
||||
|
||||
.revsys-inline {
|
||||
display: none!important;
|
||||
}
|
||||
|
||||
/* Make nested-list/multi-paragraph items look better in Releases changelog
|
||||
* pages. Without this, docutils' magical list fuckery causes inconsistent
|
||||
* formatting between different release sub-lists.
|
||||
*/
|
||||
div#changelog > div.section > ul > li > p:only-child {
|
||||
margin-bottom: 0;
|
||||
}
|
||||
|
||||
/* Hide fugly table cell borders in ..bibliography:: directive output */
|
||||
table.docutils.citation, table.docutils.citation td, table.docutils.citation th {
|
||||
border: none;
|
||||
/* Below needed in some edge cases; if not applied, bottom shadows appear */
|
||||
-moz-box-shadow: none;
|
||||
-webkit-box-shadow: none;
|
||||
box-shadow: none;
|
||||
}
|
||||
|
||||
|
||||
/* relbar */
|
||||
|
||||
.related {
|
||||
line-height: 30px;
|
||||
width: 100%;
|
||||
font-size: 0.9rem;
|
||||
}
|
||||
|
||||
.related.top {
|
||||
border-bottom: 1px solid #EEE;
|
||||
margin-bottom: 20px;
|
||||
}
|
||||
|
||||
.related.bottom {
|
||||
border-top: 1px solid #EEE;
|
||||
}
|
||||
|
||||
.related ul {
|
||||
padding: 0;
|
||||
margin: 0;
|
||||
list-style: none;
|
||||
}
|
||||
|
||||
.related li {
|
||||
display: inline;
|
||||
}
|
||||
|
||||
nav#rellinks {
|
||||
float: right;
|
||||
}
|
||||
|
||||
nav#rellinks li+li:before {
|
||||
content: "|";
|
||||
}
|
||||
|
||||
nav#breadcrumbs li+li:before {
|
||||
content: "\00BB";
|
||||
}
|
||||
|
||||
/* Hide certain items when printing */
|
||||
@media print {
|
||||
div.related {
|
||||
display: none;
|
||||
}
|
||||
}
|
|
@ -4,7 +4,7 @@
|
|||
*
|
||||
* Sphinx stylesheet -- basic theme.
|
||||
*
|
||||
* :copyright: Copyright 2007-2021 by the Sphinx team, see AUTHORS.
|
||||
* :copyright: Copyright 2007-2022 by the Sphinx team, see AUTHORS.
|
||||
* :license: BSD, see LICENSE for details.
|
||||
*
|
||||
*/
|
||||
|
@ -222,7 +222,7 @@ table.modindextable td {
|
|||
/* -- general body styles --------------------------------------------------- */
|
||||
|
||||
div.body {
|
||||
min-width: 450px;
|
||||
min-width: 360px;
|
||||
max-width: 800px;
|
||||
}
|
||||
|
||||
|
@ -237,16 +237,6 @@ a.headerlink {
|
|||
visibility: hidden;
|
||||
}
|
||||
|
||||
a.brackets:before,
|
||||
span.brackets > a:before{
|
||||
content: "[";
|
||||
}
|
||||
|
||||
a.brackets:after,
|
||||
span.brackets > a:after {
|
||||
content: "]";
|
||||
}
|
||||
|
||||
h1:hover > a.headerlink,
|
||||
h2:hover > a.headerlink,
|
||||
h3:hover > a.headerlink,
|
||||
|
@ -334,13 +324,15 @@ aside.sidebar {
|
|||
p.sidebar-title {
|
||||
font-weight: bold;
|
||||
}
|
||||
|
||||
nav.contents,
|
||||
aside.topic,
|
||||
div.admonition, div.topic, blockquote {
|
||||
clear: left;
|
||||
}
|
||||
|
||||
/* -- topics ---------------------------------------------------------------- */
|
||||
|
||||
nav.contents,
|
||||
aside.topic,
|
||||
div.topic {
|
||||
border: 1px solid #ccc;
|
||||
padding: 7px;
|
||||
|
@ -379,6 +371,8 @@ div.body p.centered {
|
|||
|
||||
div.sidebar > :last-child,
|
||||
aside.sidebar > :last-child,
|
||||
nav.contents > :last-child,
|
||||
aside.topic > :last-child,
|
||||
div.topic > :last-child,
|
||||
div.admonition > :last-child {
|
||||
margin-bottom: 0;
|
||||
|
@ -386,6 +380,8 @@ div.admonition > :last-child {
|
|||
|
||||
div.sidebar::after,
|
||||
aside.sidebar::after,
|
||||
nav.contents::after,
|
||||
aside.topic::after,
|
||||
div.topic::after,
|
||||
div.admonition::after,
|
||||
blockquote::after {
|
||||
|
@ -428,10 +424,6 @@ table.docutils td, table.docutils th {
|
|||
border-bottom: 1px solid #aaa;
|
||||
}
|
||||
|
||||
table.footnote td, table.footnote th {
|
||||
border: 0 !important;
|
||||
}
|
||||
|
||||
th {
|
||||
text-align: left;
|
||||
padding-right: 5px;
|
||||
|
@ -614,20 +606,26 @@ ol.simple p,
|
|||
ul.simple p {
|
||||
margin-bottom: 0;
|
||||
}
|
||||
|
||||
dl.footnote > dt,
|
||||
dl.citation > dt {
|
||||
aside.footnote > span,
|
||||
div.citation > span {
|
||||
float: left;
|
||||
margin-right: 0.5em;
|
||||
}
|
||||
|
||||
dl.footnote > dd,
|
||||
dl.citation > dd {
|
||||
aside.footnote > span:last-of-type,
|
||||
div.citation > span:last-of-type {
|
||||
padding-right: 0.5em;
|
||||
}
|
||||
aside.footnote > p {
|
||||
margin-left: 2em;
|
||||
}
|
||||
div.citation > p {
|
||||
margin-left: 4em;
|
||||
}
|
||||
aside.footnote > p:last-of-type,
|
||||
div.citation > p:last-of-type {
|
||||
margin-bottom: 0em;
|
||||
}
|
||||
|
||||
dl.footnote > dd:after,
|
||||
dl.citation > dd:after {
|
||||
aside.footnote > p:last-of-type:after,
|
||||
div.citation > p:last-of-type:after {
|
||||
content: "";
|
||||
clear: both;
|
||||
}
|
||||
|
@ -644,10 +642,6 @@ dl.field-list > dt {
|
|||
padding-right: 5px;
|
||||
}
|
||||
|
||||
dl.field-list > dt:after {
|
||||
content: ":";
|
||||
}
|
||||
|
||||
dl.field-list > dd {
|
||||
padding-left: 0.5em;
|
||||
margin-top: 0em;
|
||||
|
@ -731,8 +725,9 @@ dl.glossary dt {
|
|||
|
||||
.classifier:before {
|
||||
font-style: normal;
|
||||
margin: 0.5em;
|
||||
margin: 0 0.5em;
|
||||
content: ":";
|
||||
display: inline-block;
|
||||
}
|
||||
|
||||
abbr, acronym {
|
||||
|
@ -756,6 +751,7 @@ span.pre {
|
|||
-ms-hyphens: none;
|
||||
-webkit-hyphens: none;
|
||||
hyphens: none;
|
||||
white-space: nowrap;
|
||||
}
|
||||
|
||||
div[class*="highlight-"] {
|
||||
|
|
|
@ -294,6 +294,8 @@ div.quotebar {
|
|||
padding: 2px 7px;
|
||||
border: 1px solid #ccc;
|
||||
}
|
||||
nav.contents,
|
||||
aside.topic,
|
||||
|
||||
div.topic {
|
||||
background-color: #f8f8f8;
|
||||
|
|
|
@ -9,33 +9,22 @@
|
|||
// :copyright: Copyright 2012-2014 by Sphinx team, see AUTHORS.
|
||||
// :license: BSD, see LICENSE for details.
|
||||
//
|
||||
$(document).ready(function(){
|
||||
if (navigator.userAgent.indexOf('iPhone') > 0 ||
|
||||
navigator.userAgent.indexOf('Android') > 0) {
|
||||
$("li.nav-item-0 a").text("Top");
|
||||
const initialiseBizStyle = () => {
|
||||
if (navigator.userAgent.indexOf("iPhone") > 0 || navigator.userAgent.indexOf("Android") > 0) {
|
||||
document.querySelector("li.nav-item-0 a").innerText = "Top"
|
||||
}
|
||||
const truncator = item => {if (item.textContent.length > 20) {
|
||||
item.title = item.innerText
|
||||
item.innerText = item.innerText.substr(0, 17) + "..."
|
||||
}
|
||||
}
|
||||
document.querySelectorAll("div.related:first ul li:not(.right) a").slice(1).forEach(truncator);
|
||||
document.querySelectorAll("div.related:last ul li:not(.right) a").slice(1).forEach(truncator);
|
||||
}
|
||||
|
||||
$("div.related:first ul li:not(.right) a").slice(1).each(function(i, item){
|
||||
if (item.text.length > 20) {
|
||||
var tmpstr = item.text
|
||||
$(item).attr("title", tmpstr);
|
||||
$(item).text(tmpstr.substr(0, 17) + "...");
|
||||
}
|
||||
});
|
||||
$("div.related:last ul li:not(.right) a").slice(1).each(function(i, item){
|
||||
if (item.text.length > 20) {
|
||||
var tmpstr = item.text
|
||||
$(item).attr("title", tmpstr);
|
||||
$(item).text(tmpstr.substr(0, 17) + "...");
|
||||
}
|
||||
});
|
||||
});
|
||||
window.addEventListener("resize",
|
||||
() => (document.querySelector("li.nav-item-0 a").innerText = (window.innerWidth <= 776) ? "Top" : "QuaPy 0.1.7 documentation")
|
||||
)
|
||||
|
||||
$(window).resize(function(){
|
||||
if ($(window).width() <= 776) {
|
||||
$("li.nav-item-0 a").text("Top");
|
||||
}
|
||||
else {
|
||||
$("li.nav-item-0 a").text("QuaPy 0.1.6 documentation");
|
||||
}
|
||||
});
|
||||
if (document.readyState !== "loading") initialiseBizStyle()
|
||||
else document.addEventListener("DOMContentLoaded", initialiseBizStyle)
|
|
@ -1 +0,0 @@
|
|||
/* This file intentionally left blank. */
|
|
@ -2,322 +2,155 @@
|
|||
* doctools.js
|
||||
* ~~~~~~~~~~~
|
||||
*
|
||||
* Sphinx JavaScript utilities for all documentation.
|
||||
* Base JavaScript utilities for all Sphinx HTML documentation.
|
||||
*
|
||||
* :copyright: Copyright 2007-2021 by the Sphinx team, see AUTHORS.
|
||||
* :copyright: Copyright 2007-2022 by the Sphinx team, see AUTHORS.
|
||||
* :license: BSD, see LICENSE for details.
|
||||
*
|
||||
*/
|
||||
"use strict";
|
||||
|
||||
/**
|
||||
* select a different prefix for underscore
|
||||
*/
|
||||
$u = _.noConflict();
|
||||
const BLACKLISTED_KEY_CONTROL_ELEMENTS = new Set([
|
||||
"TEXTAREA",
|
||||
"INPUT",
|
||||
"SELECT",
|
||||
"BUTTON",
|
||||
]);
|
||||
|
||||
/**
|
||||
* make the code below compatible with browsers without
|
||||
* an installed firebug like debugger
|
||||
if (!window.console || !console.firebug) {
|
||||
var names = ["log", "debug", "info", "warn", "error", "assert", "dir",
|
||||
"dirxml", "group", "groupEnd", "time", "timeEnd", "count", "trace",
|
||||
"profile", "profileEnd"];
|
||||
window.console = {};
|
||||
for (var i = 0; i < names.length; ++i)
|
||||
window.console[names[i]] = function() {};
|
||||
}
|
||||
*/
|
||||
|
||||
/**
|
||||
* small helper function to urldecode strings
|
||||
*
|
||||
* See https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/decodeURIComponent#Decoding_query_parameters_from_a_URL
|
||||
*/
|
||||
jQuery.urldecode = function(x) {
|
||||
if (!x) {
|
||||
return x
|
||||
const _ready = (callback) => {
|
||||
if (document.readyState !== "loading") {
|
||||
callback();
|
||||
} else {
|
||||
document.addEventListener("DOMContentLoaded", callback);
|
||||
}
|
||||
return decodeURIComponent(x.replace(/\+/g, ' '));
|
||||
};
|
||||
|
||||
/**
|
||||
* small helper function to urlencode strings
|
||||
*/
|
||||
jQuery.urlencode = encodeURIComponent;
|
||||
|
||||
/**
|
||||
* This function returns the parsed url parameters of the
|
||||
* current request. Multiple values per key are supported,
|
||||
* it will always return arrays of strings for the value parts.
|
||||
*/
|
||||
jQuery.getQueryParameters = function(s) {
|
||||
if (typeof s === 'undefined')
|
||||
s = document.location.search;
|
||||
var parts = s.substr(s.indexOf('?') + 1).split('&');
|
||||
var result = {};
|
||||
for (var i = 0; i < parts.length; i++) {
|
||||
var tmp = parts[i].split('=', 2);
|
||||
var key = jQuery.urldecode(tmp[0]);
|
||||
var value = jQuery.urldecode(tmp[1]);
|
||||
if (key in result)
|
||||
result[key].push(value);
|
||||
else
|
||||
result[key] = [value];
|
||||
}
|
||||
return result;
|
||||
};
|
||||
|
||||
/**
|
||||
* highlight a given string on a jquery object by wrapping it in
|
||||
* span elements with the given class name.
|
||||
*/
|
||||
jQuery.fn.highlightText = function(text, className) {
|
||||
function highlight(node, addItems) {
|
||||
if (node.nodeType === 3) {
|
||||
var val = node.nodeValue;
|
||||
var pos = val.toLowerCase().indexOf(text);
|
||||
if (pos >= 0 &&
|
||||
!jQuery(node.parentNode).hasClass(className) &&
|
||||
!jQuery(node.parentNode).hasClass("nohighlight")) {
|
||||
var span;
|
||||
var isInSVG = jQuery(node).closest("body, svg, foreignObject").is("svg");
|
||||
if (isInSVG) {
|
||||
span = document.createElementNS("http://www.w3.org/2000/svg", "tspan");
|
||||
} else {
|
||||
span = document.createElement("span");
|
||||
span.className = className;
|
||||
}
|
||||
span.appendChild(document.createTextNode(val.substr(pos, text.length)));
|
||||
node.parentNode.insertBefore(span, node.parentNode.insertBefore(
|
||||
document.createTextNode(val.substr(pos + text.length)),
|
||||
node.nextSibling));
|
||||
node.nodeValue = val.substr(0, pos);
|
||||
if (isInSVG) {
|
||||
var rect = document.createElementNS("http://www.w3.org/2000/svg", "rect");
|
||||
var bbox = node.parentElement.getBBox();
|
||||
rect.x.baseVal.value = bbox.x;
|
||||
rect.y.baseVal.value = bbox.y;
|
||||
rect.width.baseVal.value = bbox.width;
|
||||
rect.height.baseVal.value = bbox.height;
|
||||
rect.setAttribute('class', className);
|
||||
addItems.push({
|
||||
"parent": node.parentNode,
|
||||
"target": rect});
|
||||
}
|
||||
}
|
||||
}
|
||||
else if (!jQuery(node).is("button, select, textarea")) {
|
||||
jQuery.each(node.childNodes, function() {
|
||||
highlight(this, addItems);
|
||||
});
|
||||
}
|
||||
}
|
||||
var addItems = [];
|
||||
var result = this.each(function() {
|
||||
highlight(this, addItems);
|
||||
});
|
||||
for (var i = 0; i < addItems.length; ++i) {
|
||||
jQuery(addItems[i].parent).before(addItems[i].target);
|
||||
}
|
||||
return result;
|
||||
};
|
||||
|
||||
/*
|
||||
* backward compatibility for jQuery.browser
|
||||
* This will be supported until firefox bug is fixed.
|
||||
*/
|
||||
if (!jQuery.browser) {
|
||||
jQuery.uaMatch = function(ua) {
|
||||
ua = ua.toLowerCase();
|
||||
|
||||
var match = /(chrome)[ \/]([\w.]+)/.exec(ua) ||
|
||||
/(webkit)[ \/]([\w.]+)/.exec(ua) ||
|
||||
/(opera)(?:.*version|)[ \/]([\w.]+)/.exec(ua) ||
|
||||
/(msie) ([\w.]+)/.exec(ua) ||
|
||||
ua.indexOf("compatible") < 0 && /(mozilla)(?:.*? rv:([\w.]+)|)/.exec(ua) ||
|
||||
[];
|
||||
|
||||
return {
|
||||
browser: match[ 1 ] || "",
|
||||
version: match[ 2 ] || "0"
|
||||
};
|
||||
};
|
||||
jQuery.browser = {};
|
||||
jQuery.browser[jQuery.uaMatch(navigator.userAgent).browser] = true;
|
||||
}
|
||||
|
||||
/**
|
||||
* Small JavaScript module for the documentation.
|
||||
*/
|
||||
var Documentation = {
|
||||
|
||||
init : function() {
|
||||
this.fixFirefoxAnchorBug();
|
||||
this.highlightSearchWords();
|
||||
this.initIndexTable();
|
||||
if (DOCUMENTATION_OPTIONS.NAVIGATION_WITH_KEYS) {
|
||||
this.initOnKeyListeners();
|
||||
}
|
||||
const Documentation = {
|
||||
init: () => {
|
||||
Documentation.initDomainIndexTable();
|
||||
Documentation.initOnKeyListeners();
|
||||
},
|
||||
|
||||
/**
|
||||
* i18n support
|
||||
*/
|
||||
TRANSLATIONS : {},
|
||||
PLURAL_EXPR : function(n) { return n === 1 ? 0 : 1; },
|
||||
LOCALE : 'unknown',
|
||||
TRANSLATIONS: {},
|
||||
PLURAL_EXPR: (n) => (n === 1 ? 0 : 1),
|
||||
LOCALE: "unknown",
|
||||
|
||||
// gettext and ngettext don't access this so that the functions
|
||||
// can safely be bound to a different name (_ = Documentation.gettext)
|
||||
gettext : function(string) {
|
||||
var translated = Documentation.TRANSLATIONS[string];
|
||||
if (typeof translated === 'undefined')
|
||||
return string;
|
||||
return (typeof translated === 'string') ? translated : translated[0];
|
||||
gettext: (string) => {
|
||||
const translated = Documentation.TRANSLATIONS[string];
|
||||
switch (typeof translated) {
|
||||
case "undefined":
|
||||
return string; // no translation
|
||||
case "string":
|
||||
return translated; // translation exists
|
||||
default:
|
||||
return translated[0]; // (singular, plural) translation tuple exists
|
||||
}
|
||||
},
|
||||
|
||||
ngettext : function(singular, plural, n) {
|
||||
var translated = Documentation.TRANSLATIONS[singular];
|
||||
if (typeof translated === 'undefined')
|
||||
return (n == 1) ? singular : plural;
|
||||
return translated[Documentation.PLURALEXPR(n)];
|
||||
ngettext: (singular, plural, n) => {
|
||||
const translated = Documentation.TRANSLATIONS[singular];
|
||||
if (typeof translated !== "undefined")
|
||||
return translated[Documentation.PLURAL_EXPR(n)];
|
||||
return n === 1 ? singular : plural;
|
||||
},
|
||||
|
||||
addTranslations : function(catalog) {
|
||||
for (var key in catalog.messages)
|
||||
this.TRANSLATIONS[key] = catalog.messages[key];
|
||||
this.PLURAL_EXPR = new Function('n', 'return +(' + catalog.plural_expr + ')');
|
||||
this.LOCALE = catalog.locale;
|
||||
addTranslations: (catalog) => {
|
||||
Object.assign(Documentation.TRANSLATIONS, catalog.messages);
|
||||
Documentation.PLURAL_EXPR = new Function(
|
||||
"n",
|
||||
`return (${catalog.plural_expr})`
|
||||
);
|
||||
Documentation.LOCALE = catalog.locale;
|
||||
},
|
||||
|
||||
/**
|
||||
* add context elements like header anchor links
|
||||
* helper function to focus on search bar
|
||||
*/
|
||||
addContextElements : function() {
|
||||
$('div[id] > :header:first').each(function() {
|
||||
$('<a class="headerlink">\u00B6</a>').
|
||||
attr('href', '#' + this.id).
|
||||
attr('title', _('Permalink to this headline')).
|
||||
appendTo(this);
|
||||
});
|
||||
$('dt[id]').each(function() {
|
||||
$('<a class="headerlink">\u00B6</a>').
|
||||
attr('href', '#' + this.id).
|
||||
attr('title', _('Permalink to this definition')).
|
||||
appendTo(this);
|
||||
});
|
||||
focusSearchBar: () => {
|
||||
document.querySelectorAll("input[name=q]")[0]?.focus();
|
||||
},
|
||||
|
||||
/**
|
||||
* workaround a firefox stupidity
|
||||
* see: https://bugzilla.mozilla.org/show_bug.cgi?id=645075
|
||||
* Initialise the domain index toggle buttons
|
||||
*/
|
||||
fixFirefoxAnchorBug : function() {
|
||||
if (document.location.hash && $.browser.mozilla)
|
||||
window.setTimeout(function() {
|
||||
document.location.href += '';
|
||||
}, 10);
|
||||
},
|
||||
|
||||
/**
|
||||
* highlight the search words provided in the url in the text
|
||||
*/
|
||||
highlightSearchWords : function() {
|
||||
var params = $.getQueryParameters();
|
||||
var terms = (params.highlight) ? params.highlight[0].split(/\s+/) : [];
|
||||
if (terms.length) {
|
||||
var body = $('div.body');
|
||||
if (!body.length) {
|
||||
body = $('body');
|
||||
initDomainIndexTable: () => {
|
||||
const toggler = (el) => {
|
||||
const idNumber = el.id.substr(7);
|
||||
const toggledRows = document.querySelectorAll(`tr.cg-${idNumber}`);
|
||||
if (el.src.substr(-9) === "minus.png") {
|
||||
el.src = `${el.src.substr(0, el.src.length - 9)}plus.png`;
|
||||
toggledRows.forEach((el) => (el.style.display = "none"));
|
||||
} else {
|
||||
el.src = `${el.src.substr(0, el.src.length - 8)}minus.png`;
|
||||
toggledRows.forEach((el) => (el.style.display = ""));
|
||||
}
|
||||
window.setTimeout(function() {
|
||||
$.each(terms, function() {
|
||||
body.highlightText(this.toLowerCase(), 'highlighted');
|
||||
});
|
||||
}, 10);
|
||||
$('<p class="highlight-link"><a href="javascript:Documentation.' +
|
||||
'hideSearchWords()">' + _('Hide Search Matches') + '</a></p>')
|
||||
.appendTo($('#searchbox'));
|
||||
}
|
||||
};
|
||||
|
||||
const togglerElements = document.querySelectorAll("img.toggler");
|
||||
togglerElements.forEach((el) =>
|
||||
el.addEventListener("click", (event) => toggler(event.currentTarget))
|
||||
);
|
||||
togglerElements.forEach((el) => (el.style.display = ""));
|
||||
if (DOCUMENTATION_OPTIONS.COLLAPSE_INDEX) togglerElements.forEach(toggler);
|
||||
},
|
||||
|
||||
/**
|
||||
* init the domain index toggle buttons
|
||||
*/
|
||||
initIndexTable : function() {
|
||||
var togglers = $('img.toggler').click(function() {
|
||||
var src = $(this).attr('src');
|
||||
var idnum = $(this).attr('id').substr(7);
|
||||
$('tr.cg-' + idnum).toggle();
|
||||
if (src.substr(-9) === 'minus.png')
|
||||
$(this).attr('src', src.substr(0, src.length-9) + 'plus.png');
|
||||
else
|
||||
$(this).attr('src', src.substr(0, src.length-8) + 'minus.png');
|
||||
}).css('display', '');
|
||||
if (DOCUMENTATION_OPTIONS.COLLAPSE_INDEX) {
|
||||
togglers.click();
|
||||
}
|
||||
},
|
||||
initOnKeyListeners: () => {
|
||||
// only install a listener if it is really needed
|
||||
if (
|
||||
!DOCUMENTATION_OPTIONS.NAVIGATION_WITH_KEYS &&
|
||||
!DOCUMENTATION_OPTIONS.ENABLE_SEARCH_SHORTCUTS
|
||||
)
|
||||
return;
|
||||
|
||||
/**
|
||||
* helper function to hide the search marks again
|
||||
*/
|
||||
hideSearchWords : function() {
|
||||
$('#searchbox .highlight-link').fadeOut(300);
|
||||
$('span.highlighted').removeClass('highlighted');
|
||||
},
|
||||
document.addEventListener("keydown", (event) => {
|
||||
// bail for input elements
|
||||
if (BLACKLISTED_KEY_CONTROL_ELEMENTS.has(document.activeElement.tagName)) return;
|
||||
// bail with special keys
|
||||
if (event.altKey || event.ctrlKey || event.metaKey) return;
|
||||
|
||||
/**
|
||||
* make the url absolute
|
||||
*/
|
||||
makeURL : function(relativeURL) {
|
||||
return DOCUMENTATION_OPTIONS.URL_ROOT + '/' + relativeURL;
|
||||
},
|
||||
if (!event.shiftKey) {
|
||||
switch (event.key) {
|
||||
case "ArrowLeft":
|
||||
if (!DOCUMENTATION_OPTIONS.NAVIGATION_WITH_KEYS) break;
|
||||
|
||||
/**
|
||||
* get the current relative url
|
||||
*/
|
||||
getCurrentURL : function() {
|
||||
var path = document.location.pathname;
|
||||
var parts = path.split(/\//);
|
||||
$.each(DOCUMENTATION_OPTIONS.URL_ROOT.split(/\//), function() {
|
||||
if (this === '..')
|
||||
parts.pop();
|
||||
});
|
||||
var url = parts.join('/');
|
||||
return path.substring(url.lastIndexOf('/') + 1, path.length - 1);
|
||||
},
|
||||
|
||||
initOnKeyListeners: function() {
|
||||
$(document).keydown(function(event) {
|
||||
var activeElementType = document.activeElement.tagName;
|
||||
// don't navigate when in search box, textarea, dropdown or button
|
||||
if (activeElementType !== 'TEXTAREA' && activeElementType !== 'INPUT' && activeElementType !== 'SELECT'
|
||||
&& activeElementType !== 'BUTTON' && !event.altKey && !event.ctrlKey && !event.metaKey
|
||||
&& !event.shiftKey) {
|
||||
switch (event.keyCode) {
|
||||
case 37: // left
|
||||
var prevHref = $('link[rel="prev"]').prop('href');
|
||||
if (prevHref) {
|
||||
window.location.href = prevHref;
|
||||
return false;
|
||||
const prevLink = document.querySelector('link[rel="prev"]');
|
||||
if (prevLink && prevLink.href) {
|
||||
window.location.href = prevLink.href;
|
||||
event.preventDefault();
|
||||
}
|
||||
break;
|
||||
case 39: // right
|
||||
var nextHref = $('link[rel="next"]').prop('href');
|
||||
if (nextHref) {
|
||||
window.location.href = nextHref;
|
||||
return false;
|
||||
case "ArrowRight":
|
||||
if (!DOCUMENTATION_OPTIONS.NAVIGATION_WITH_KEYS) break;
|
||||
|
||||
const nextLink = document.querySelector('link[rel="next"]');
|
||||
if (nextLink && nextLink.href) {
|
||||
window.location.href = nextLink.href;
|
||||
event.preventDefault();
|
||||
}
|
||||
break;
|
||||
}
|
||||
}
|
||||
|
||||
// some keyboard layouts may need Shift to get /
|
||||
switch (event.key) {
|
||||
case "/":
|
||||
if (!DOCUMENTATION_OPTIONS.ENABLE_SEARCH_SHORTCUTS) break;
|
||||
Documentation.focusSearchBar();
|
||||
event.preventDefault();
|
||||
}
|
||||
});
|
||||
}
|
||||
},
|
||||
};
|
||||
|
||||
// quick alias for translations
|
||||
_ = Documentation.gettext;
|
||||
const _ = Documentation.gettext;
|
||||
|
||||
$(document).ready(function() {
|
||||
Documentation.init();
|
||||
});
|
||||
_ready(Documentation.init);
|
||||
|
|
|
@ -1,12 +1,14 @@
|
|||
var DOCUMENTATION_OPTIONS = {
|
||||
URL_ROOT: document.getElementById("documentation_options").getAttribute('data-url_root'),
|
||||
VERSION: '0.1.6',
|
||||
LANGUAGE: 'None',
|
||||
VERSION: '0.1.7',
|
||||
LANGUAGE: 'en',
|
||||
COLLAPSE_INDEX: false,
|
||||
BUILDER: 'html',
|
||||
FILE_SUFFIX: '.html',
|
||||
LINK_SUFFIX: '.html',
|
||||
HAS_SOURCE: true,
|
||||
SOURCELINK_SUFFIX: '.txt',
|
||||
NAVIGATION_WITH_KEYS: false
|
||||
NAVIGATION_WITH_KEYS: false,
|
||||
SHOW_SEARCH_SUMMARY: true,
|
||||
ENABLE_SEARCH_SHORTCUTS: true,
|
||||
};
|
File diff suppressed because it is too large
File diff suppressed because one or more lines are too long
|
@ -5,12 +5,12 @@
|
|||
* This script contains the language-specific data used by searchtools.js,
|
||||
* namely the list of stopwords, stemmer, scorer and splitter.
|
||||
*
|
||||
* :copyright: Copyright 2007-2021 by the Sphinx team, see AUTHORS.
|
||||
* :copyright: Copyright 2007-2022 by the Sphinx team, see AUTHORS.
|
||||
* :license: BSD, see LICENSE for details.
|
||||
*
|
||||
*/
|
||||
|
||||
var stopwords = ["a","and","are","as","at","be","but","by","for","if","in","into","is","it","near","no","not","of","on","or","such","that","the","their","then","there","these","they","this","to","was","will","with"];
|
||||
var stopwords = ["a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "near", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"];
|
||||
|
||||
|
||||
/* Non-minified version is copied as a separate JS file, if available */
|
||||
|
@ -197,101 +197,3 @@ var Stemmer = function() {
|
|||
}
|
||||
}
|
||||
|
||||
|
||||
|
||||
|
||||
var splitChars = (function() {
|
||||
var result = {};
|
||||
var singles = [96, 180, 187, 191, 215, 247, 749, 885, 903, 907, 909, 930, 1014, 1648,
|
||||
1748, 1809, 2416, 2473, 2481, 2526, 2601, 2609, 2612, 2615, 2653, 2702,
|
||||
2706, 2729, 2737, 2740, 2857, 2865, 2868, 2910, 2928, 2948, 2961, 2971,
|
||||
2973, 3085, 3089, 3113, 3124, 3213, 3217, 3241, 3252, 3295, 3341, 3345,
|
||||
3369, 3506, 3516, 3633, 3715, 3721, 3736, 3744, 3748, 3750, 3756, 3761,
|
||||
3781, 3912, 4239, 4347, 4681, 4695, 4697, 4745, 4785, 4799, 4801, 4823,
|
||||
4881, 5760, 5901, 5997, 6313, 7405, 8024, 8026, 8028, 8030, 8117, 8125,
|
||||
8133, 8181, 8468, 8485, 8487, 8489, 8494, 8527, 11311, 11359, 11687, 11695,
|
||||
11703, 11711, 11719, 11727, 11735, 12448, 12539, 43010, 43014, 43019, 43587,
|
||||
43696, 43713, 64286, 64297, 64311, 64317, 64319, 64322, 64325, 65141];
|
||||
var i, j, start, end;
|
||||
for (i = 0; i < singles.length; i++) {
|
||||
result[singles[i]] = true;
|
||||
}
|
||||
var ranges = [[0, 47], [58, 64], [91, 94], [123, 169], [171, 177], [182, 184], [706, 709],
|
||||
[722, 735], [741, 747], [751, 879], [888, 889], [894, 901], [1154, 1161],
|
||||
[1318, 1328], [1367, 1368], [1370, 1376], [1416, 1487], [1515, 1519], [1523, 1568],
|
||||
[1611, 1631], [1642, 1645], [1750, 1764], [1767, 1773], [1789, 1790], [1792, 1807],
|
||||
[1840, 1868], [1958, 1968], [1970, 1983], [2027, 2035], [2038, 2041], [2043, 2047],
|
||||
[2070, 2073], [2075, 2083], [2085, 2087], [2089, 2307], [2362, 2364], [2366, 2383],
|
||||
[2385, 2391], [2402, 2405], [2419, 2424], [2432, 2436], [2445, 2446], [2449, 2450],
|
||||
[2483, 2485], [2490, 2492], [2494, 2509], [2511, 2523], [2530, 2533], [2546, 2547],
|
||||
[2554, 2564], [2571, 2574], [2577, 2578], [2618, 2648], [2655, 2661], [2672, 2673],
|
||||
[2677, 2692], [2746, 2748], [2750, 2767], [2769, 2783], [2786, 2789], [2800, 2820],
|
||||
[2829, 2830], [2833, 2834], [2874, 2876], [2878, 2907], [2914, 2917], [2930, 2946],
|
||||
[2955, 2957], [2966, 2968], [2976, 2978], [2981, 2983], [2987, 2989], [3002, 3023],
|
||||
[3025, 3045], [3059, 3076], [3130, 3132], [3134, 3159], [3162, 3167], [3170, 3173],
|
||||
[3184, 3191], [3199, 3204], [3258, 3260], [3262, 3293], [3298, 3301], [3312, 3332],
|
||||
[3386, 3388], [3390, 3423], [3426, 3429], [3446, 3449], [3456, 3460], [3479, 3481],
|
||||
[3518, 3519], [3527, 3584], [3636, 3647], [3655, 3663], [3674, 3712], [3717, 3718],
|
||||
[3723, 3724], [3726, 3731], [3752, 3753], [3764, 3772], [3774, 3775], [3783, 3791],
|
||||
[3802, 3803], [3806, 3839], [3841, 3871], [3892, 3903], [3949, 3975], [3980, 4095],
|
||||
[4139, 4158], [4170, 4175], [4182, 4185], [4190, 4192], [4194, 4196], [4199, 4205],
|
||||
[4209, 4212], [4226, 4237], [4250, 4255], [4294, 4303], [4349, 4351], [4686, 4687],
|
||||
[4702, 4703], [4750, 4751], [4790, 4791], [4806, 4807], [4886, 4887], [4955, 4968],
|
||||
[4989, 4991], [5008, 5023], [5109, 5120], [5741, 5742], [5787, 5791], [5867, 5869],
|
||||
[5873, 5887], [5906, 5919], [5938, 5951], [5970, 5983], [6001, 6015], [6068, 6102],
|
||||
[6104, 6107], [6109, 6111], [6122, 6127], [6138, 6159], [6170, 6175], [6264, 6271],
|
||||
[6315, 6319], [6390, 6399], [6429, 6469], [6510, 6511], [6517, 6527], [6572, 6592],
|
||||
[6600, 6607], [6619, 6655], [6679, 6687], [6741, 6783], [6794, 6799], [6810, 6822],
|
||||
[6824, 6916], [6964, 6980], [6988, 6991], [7002, 7042], [7073, 7085], [7098, 7167],
|
||||
[7204, 7231], [7242, 7244], [7294, 7400], [7410, 7423], [7616, 7679], [7958, 7959],
|
||||
[7966, 7967], [8006, 8007], [8014, 8015], [8062, 8063], [8127, 8129], [8141, 8143],
|
||||
[8148, 8149], [8156, 8159], [8173, 8177], [8189, 8303], [8306, 8307], [8314, 8318],
|
||||
[8330, 8335], [8341, 8449], [8451, 8454], [8456, 8457], [8470, 8472], [8478, 8483],
|
||||
[8506, 8507], [8512, 8516], [8522, 8525], [8586, 9311], [9372, 9449], [9472, 10101],
|
||||
[10132, 11263], [11493, 11498], [11503, 11516], [11518, 11519], [11558, 11567],
|
||||
[11622, 11630], [11632, 11647], [11671, 11679], [11743, 11822], [11824, 12292],
|
||||
[12296, 12320], [12330, 12336], [12342, 12343], [12349, 12352], [12439, 12444],
|
||||
[12544, 12548], [12590, 12592], [12687, 12689], [12694, 12703], [12728, 12783],
|
||||
[12800, 12831], [12842, 12880], [12896, 12927], [12938, 12976], [12992, 13311],
|
||||
[19894, 19967], [40908, 40959], [42125, 42191], [42238, 42239], [42509, 42511],
|
||||
[42540, 42559], [42592, 42593], [42607, 42622], [42648, 42655], [42736, 42774],
|
||||
[42784, 42785], [42889, 42890], [42893, 43002], [43043, 43055], [43062, 43071],
|
||||
[43124, 43137], [43188, 43215], [43226, 43249], [43256, 43258], [43260, 43263],
|
||||
[43302, 43311], [43335, 43359], [43389, 43395], [43443, 43470], [43482, 43519],
|
||||
[43561, 43583], [43596, 43599], [43610, 43615], [43639, 43641], [43643, 43647],
|
||||
[43698, 43700], [43703, 43704], [43710, 43711], [43715, 43738], [43742, 43967],
|
||||
[44003, 44015], [44026, 44031], [55204, 55215], [55239, 55242], [55292, 55295],
|
||||
[57344, 63743], [64046, 64047], [64110, 64111], [64218, 64255], [64263, 64274],
|
||||
[64280, 64284], [64434, 64466], [64830, 64847], [64912, 64913], [64968, 65007],
|
||||
[65020, 65135], [65277, 65295], [65306, 65312], [65339, 65344], [65371, 65381],
|
||||
[65471, 65473], [65480, 65481], [65488, 65489], [65496, 65497]];
|
||||
for (i = 0; i < ranges.length; i++) {
|
||||
start = ranges[i][0];
|
||||
end = ranges[i][1];
|
||||
for (j = start; j <= end; j++) {
|
||||
result[j] = true;
|
||||
}
|
||||
}
|
||||
return result;
|
||||
})();
|
||||
|
||||
function splitQuery(query) {
|
||||
var result = [];
|
||||
var start = -1;
|
||||
for (var i = 0; i < query.length; i++) {
|
||||
if (splitChars[query.charCodeAt(i)]) {
|
||||
if (start !== -1) {
|
||||
result.push(query.slice(start, i));
|
||||
start = -1;
|
||||
}
|
||||
} else if (start === -1) {
|
||||
start = i;
|
||||
}
|
||||
}
|
||||
if (start !== -1) {
|
||||
result.push(query.slice(start));
|
||||
}
|
||||
return result;
|
||||
}
|
||||
|
||||
|
||||
|
|
|
@ -4,22 +4,24 @@
|
|||
*
|
||||
* Sphinx JavaScript utilities for the full-text search.
|
||||
*
|
||||
* :copyright: Copyright 2007-2021 by the Sphinx team, see AUTHORS.
|
||||
* :copyright: Copyright 2007-2022 by the Sphinx team, see AUTHORS.
|
||||
* :license: BSD, see LICENSE for details.
|
||||
*
|
||||
*/
|
||||
"use strict";
|
||||
|
||||
if (!Scorer) {
|
||||
/**
|
||||
* Simple result scoring code.
|
||||
*/
|
||||
/**
|
||||
* Simple result scoring code.
|
||||
*/
|
||||
if (typeof Scorer === "undefined") {
|
||||
var Scorer = {
|
||||
// Implement the following function to further tweak the score for each result
|
||||
// The function takes a result array [filename, title, anchor, descr, score]
|
||||
// The function takes a result array [docname, title, anchor, descr, score, filename]
|
||||
// and returns the new score.
|
||||
/*
|
||||
score: function(result) {
|
||||
return result[4];
|
||||
score: result => {
|
||||
const [docname, title, anchor, descr, score, filename] = result
|
||||
return score
|
||||
},
|
||||
*/
|
||||
|
||||
|
@ -28,9 +30,11 @@ if (!Scorer) {
|
|||
// or matches in the last dotted part of the object name
|
||||
objPartialMatch: 6,
|
||||
// Additive scores depending on the priority of the object
|
||||
objPrio: {0: 15, // used to be importantResults
|
||||
1: 5, // used to be objectResults
|
||||
2: -5}, // used to be unimportantResults
|
||||
objPrio: {
|
||||
0: 15, // used to be importantResults
|
||||
1: 5, // used to be objectResults
|
||||
2: -5, // used to be unimportantResults
|
||||
},
|
||||
// Used when the priority is not in the mapping.
|
||||
objPrioDefault: 0,
|
||||
|
||||
|
@ -39,455 +43,495 @@ if (!Scorer) {
|
|||
partialTitle: 7,
|
||||
// query found in terms
|
||||
term: 5,
|
||||
partialTerm: 2
|
||||
partialTerm: 2,
|
||||
};
|
||||
}
|
||||
|
||||
if (!splitQuery) {
|
||||
function splitQuery(query) {
|
||||
return query.split(/\s+/);
|
||||
const _removeChildren = (element) => {
|
||||
while (element && element.lastChild) element.removeChild(element.lastChild);
|
||||
};
|
||||
|
||||
/**
|
||||
* See https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions#escaping
|
||||
*/
|
||||
const _escapeRegExp = (string) =>
|
||||
string.replace(/[.*+\-?^${}()|[\]\\]/g, "\\$&"); // $& means the whole matched string
|
||||
|
||||
const _displayItem = (item, searchTerms) => {
|
||||
const docBuilder = DOCUMENTATION_OPTIONS.BUILDER;
|
||||
const docUrlRoot = DOCUMENTATION_OPTIONS.URL_ROOT;
|
||||
const docFileSuffix = DOCUMENTATION_OPTIONS.FILE_SUFFIX;
|
||||
const docLinkSuffix = DOCUMENTATION_OPTIONS.LINK_SUFFIX;
|
||||
const showSearchSummary = DOCUMENTATION_OPTIONS.SHOW_SEARCH_SUMMARY;
|
||||
|
||||
const [docName, title, anchor, descr, score, _filename] = item;
|
||||
|
||||
let listItem = document.createElement("li");
|
||||
let requestUrl;
|
||||
let linkUrl;
|
||||
if (docBuilder === "dirhtml") {
|
||||
// dirhtml builder
|
||||
let dirname = docName + "/";
|
||||
if (dirname.match(/\/index\/$/))
|
||||
dirname = dirname.substring(0, dirname.length - 6);
|
||||
else if (dirname === "index/") dirname = "";
|
||||
requestUrl = docUrlRoot + dirname;
|
||||
linkUrl = requestUrl;
|
||||
} else {
|
||||
// normal html builders
|
||||
requestUrl = docUrlRoot + docName + docFileSuffix;
|
||||
linkUrl = docName + docLinkSuffix;
|
||||
}
|
||||
let linkEl = listItem.appendChild(document.createElement("a"));
|
||||
linkEl.href = linkUrl + anchor;
|
||||
linkEl.dataset.score = score;
|
||||
linkEl.innerHTML = title;
|
||||
if (descr)
|
||||
listItem.appendChild(document.createElement("span")).innerHTML =
|
||||
" (" + descr + ")";
|
||||
else if (showSearchSummary)
|
||||
fetch(requestUrl)
|
||||
.then((responseData) => responseData.text())
|
||||
.then((data) => {
|
||||
if (data)
|
||||
listItem.appendChild(
|
||||
Search.makeSearchSummary(data, searchTerms)
|
||||
);
|
||||
});
|
||||
Search.output.appendChild(listItem);
|
||||
};
|
||||
const _finishSearch = (resultCount) => {
|
||||
Search.stopPulse();
|
||||
Search.title.innerText = _("Search Results");
|
||||
if (!resultCount)
|
||||
Search.status.innerText = Documentation.gettext(
|
||||
"Your search did not match any documents. Please make sure that all words are spelled correctly and that you've selected enough categories."
|
||||
);
|
||||
else
|
||||
Search.status.innerText = _(
|
||||
`Search finished, found ${resultCount} page(s) matching the search query.`
|
||||
);
|
||||
};
|
||||
const _displayNextItem = (
|
||||
results,
|
||||
resultCount,
|
||||
searchTerms
|
||||
) => {
|
||||
// results left, load the summary and display it
|
||||
// this is intended to be dynamic (don't sub resultsCount)
|
||||
if (results.length) {
|
||||
_displayItem(results.pop(), searchTerms);
|
||||
setTimeout(
|
||||
() => _displayNextItem(results, resultCount, searchTerms),
|
||||
5
|
||||
);
|
||||
}
|
||||
// search finished, update title and status message
|
||||
else _finishSearch(resultCount);
|
||||
};
|
||||
|
||||
/**
|
||||
* Default splitQuery function. Can be overridden in ``sphinx.search`` with a
|
||||
* custom function per language.
|
||||
*
|
||||
* The regular expression works by splitting the string on consecutive characters
|
||||
* that are not Unicode letters, numbers, underscores, or emoji characters.
|
||||
* This is the same as ``\W+`` in Python, preserving the surrogate pair area.
|
||||
*/
|
||||
if (typeof splitQuery === "undefined") {
|
||||
var splitQuery = (query) => query
|
||||
.split(/[^\p{Letter}\p{Number}_\p{Emoji_Presentation}]+/gu)
|
||||
.filter(term => term) // remove remaining empty strings
|
||||
}
|
||||
|
||||
/**
|
||||
* Search Module
|
||||
*/
|
||||
var Search = {
|
||||
const Search = {
|
||||
_index: null,
|
||||
_queued_query: null,
|
||||
_pulse_status: -1,
|
||||
|
||||
_index : null,
|
||||
_queued_query : null,
|
||||
_pulse_status : -1,
|
||||
|
||||
htmlToText : function(htmlString) {
|
||||
var virtualDocument = document.implementation.createHTMLDocument('virtual');
|
||||
var htmlElement = $(htmlString, virtualDocument);
|
||||
htmlElement.find('.headerlink').remove();
|
||||
docContent = htmlElement.find('[role=main]')[0];
|
||||
if(docContent === undefined) {
|
||||
console.warn("Content block not found. Sphinx search tries to obtain it " +
|
||||
"via '[role=main]'. Could you check your theme or template.");
|
||||
return "";
|
||||
}
|
||||
return docContent.textContent || docContent.innerText;
|
||||
htmlToText: (htmlString) => {
|
||||
const htmlElement = new DOMParser().parseFromString(htmlString, 'text/html');
|
||||
htmlElement.querySelectorAll(".headerlink").forEach((el) => { el.remove() });
|
||||
const docContent = htmlElement.querySelector('[role="main"]');
|
||||
if (docContent) return docContent.textContent; // querySelector returns null, not undefined, when nothing matches
|
||||
console.warn(
|
||||
"Content block not found. Sphinx search tries to obtain it via '[role=main]'. Could you check your theme or template."
|
||||
);
|
||||
return "";
|
||||
},
|
||||
|
||||
init : function() {
|
||||
var params = $.getQueryParameters();
|
||||
if (params.q) {
|
||||
var query = params.q[0];
|
||||
$('input[name="q"]')[0].value = query;
|
||||
this.performSearch(query);
|
||||
}
|
||||
init: () => {
|
||||
const query = new URLSearchParams(window.location.search).get("q");
|
||||
document
|
||||
.querySelectorAll('input[name="q"]')
|
||||
.forEach((el) => (el.value = query));
|
||||
if (query) Search.performSearch(query);
|
||||
},
|
||||
|
||||
loadIndex : function(url) {
|
||||
$.ajax({type: "GET", url: url, data: null,
|
||||
dataType: "script", cache: true,
|
||||
complete: function(jqxhr, textstatus) {
|
||||
if (textstatus != "success") {
|
||||
document.getElementById("searchindexloader").src = url;
|
||||
}
|
||||
}});
|
||||
},
|
||||
loadIndex: (url) =>
|
||||
(document.body.appendChild(document.createElement("script")).src = url),
|
||||
|
||||
setIndex : function(index) {
|
||||
var q;
|
||||
this._index = index;
|
||||
if ((q = this._queued_query) !== null) {
|
||||
this._queued_query = null;
|
||||
Search.query(q);
|
||||
setIndex: (index) => {
|
||||
Search._index = index;
|
||||
if (Search._queued_query !== null) {
|
||||
const query = Search._queued_query;
|
||||
Search._queued_query = null;
|
||||
Search.query(query);
|
||||
}
|
||||
},
|
||||
|
||||
hasIndex : function() {
|
||||
return this._index !== null;
|
||||
},
|
||||
hasIndex: () => Search._index !== null,
|
||||
|
||||
deferQuery : function(query) {
|
||||
this._queued_query = query;
|
||||
},
|
||||
deferQuery: (query) => (Search._queued_query = query),
|
||||
|
||||
stopPulse : function() {
|
||||
this._pulse_status = 0;
|
||||
},
|
||||
stopPulse: () => (Search._pulse_status = -1),
|
||||
|
||||
startPulse : function() {
|
||||
if (this._pulse_status >= 0)
|
||||
return;
|
||||
function pulse() {
|
||||
var i;
|
||||
startPulse: () => {
|
||||
if (Search._pulse_status >= 0) return;
|
||||
|
||||
const pulse = () => {
|
||||
Search._pulse_status = (Search._pulse_status + 1) % 4;
|
||||
var dotString = '';
|
||||
for (i = 0; i < Search._pulse_status; i++)
|
||||
dotString += '.';
|
||||
Search.dots.text(dotString);
|
||||
if (Search._pulse_status > -1)
|
||||
window.setTimeout(pulse, 500);
|
||||
}
|
||||
Search.dots.innerText = ".".repeat(Search._pulse_status);
|
||||
if (Search._pulse_status >= 0) window.setTimeout(pulse, 500);
|
||||
};
|
||||
pulse();
|
||||
},
|
||||
|
||||
/**
|
||||
* perform a search for something (or wait until index is loaded)
|
||||
*/
|
||||
performSearch : function(query) {
|
||||
performSearch: (query) => {
|
||||
// create the required interface elements
|
||||
this.out = $('#search-results');
|
||||
this.title = $('<h2>' + _('Searching') + '</h2>').appendTo(this.out);
|
||||
this.dots = $('<span></span>').appendTo(this.title);
|
||||
this.status = $('<p class="search-summary"> </p>').appendTo(this.out);
|
||||
this.output = $('<ul class="search"/>').appendTo(this.out);
|
||||
const searchText = document.createElement("h2");
|
||||
searchText.textContent = _("Searching");
|
||||
const searchSummary = document.createElement("p");
|
||||
searchSummary.classList.add("search-summary");
|
||||
searchSummary.innerText = "";
|
||||
const searchList = document.createElement("ul");
|
||||
searchList.classList.add("search");
|
||||
|
||||
$('#search-progress').text(_('Preparing search...'));
|
||||
this.startPulse();
|
||||
const out = document.getElementById("search-results");
|
||||
Search.title = out.appendChild(searchText);
|
||||
Search.dots = Search.title.appendChild(document.createElement("span"));
|
||||
Search.status = out.appendChild(searchSummary);
|
||||
Search.output = out.appendChild(searchList);
|
||||
|
||||
const searchProgress = document.getElementById("search-progress");
|
||||
// Some themes don't use the search progress node
|
||||
if (searchProgress) {
|
||||
searchProgress.innerText = _("Preparing search...");
|
||||
}
|
||||
Search.startPulse();
|
||||
|
||||
// index already loaded, the browser was quick!
|
||||
if (this.hasIndex())
|
||||
this.query(query);
|
||||
else
|
||||
this.deferQuery(query);
|
||||
if (Search.hasIndex()) Search.query(query);
|
||||
else Search.deferQuery(query);
|
||||
},
|
||||
|
||||
/**
|
||||
* execute search (requires search index to be loaded)
|
||||
*/
|
||||
query : function(query) {
|
||||
var i;
|
||||
query: (query) => {
|
||||
const filenames = Search._index.filenames;
|
||||
const docNames = Search._index.docnames;
|
||||
const titles = Search._index.titles;
|
||||
const allTitles = Search._index.alltitles;
|
||||
const indexEntries = Search._index.indexentries;
|
||||
|
||||
// stem the searchterms and add them to the correct list
|
||||
var stemmer = new Stemmer();
|
||||
var searchterms = [];
|
||||
var excluded = [];
|
||||
var hlterms = [];
|
||||
var tmp = splitQuery(query);
|
||||
var objectterms = [];
|
||||
for (i = 0; i < tmp.length; i++) {
|
||||
if (tmp[i] !== "") {
|
||||
objectterms.push(tmp[i].toLowerCase());
|
||||
}
|
||||
// stem the search terms and add them to the correct list
|
||||
const stemmer = new Stemmer();
|
||||
const searchTerms = new Set();
|
||||
const excludedTerms = new Set();
|
||||
const highlightTerms = new Set();
|
||||
const objectTerms = new Set(splitQuery(query.toLowerCase().trim()));
|
||||
splitQuery(query.trim()).forEach((queryTerm) => {
|
||||
const queryTermLower = queryTerm.toLowerCase();
|
||||
|
||||
// maybe skip this "word"
|
||||
// stopwords array is from language_data.js
|
||||
if (
|
||||
stopwords.indexOf(queryTermLower) !== -1 ||
|
||||
queryTerm.match(/^\d+$/)
|
||||
)
|
||||
return;
|
||||
|
||||
if ($u.indexOf(stopwords, tmp[i].toLowerCase()) != -1 || tmp[i] === "") {
|
||||
// skip this "word"
|
||||
continue;
|
||||
}
|
||||
// stem the word
|
||||
var word = stemmer.stemWord(tmp[i].toLowerCase());
|
||||
// prevent stemmer from cutting word smaller than two chars
|
||||
if(word.length < 3 && tmp[i].length >= 3) {
|
||||
word = tmp[i];
|
||||
}
|
||||
var toAppend;
|
||||
let word = stemmer.stemWord(queryTermLower);
|
||||
// select the correct list
|
||||
if (word[0] == '-') {
|
||||
toAppend = excluded;
|
||||
word = word.substr(1);
|
||||
}
|
||||
if (word[0] === "-") excludedTerms.add(word.substr(1));
|
||||
else {
|
||||
toAppend = searchterms;
|
||||
hlterms.push(tmp[i].toLowerCase());
|
||||
searchTerms.add(word);
|
||||
highlightTerms.add(queryTermLower);
|
||||
}
|
||||
// only add if not already in the list
|
||||
if (!$u.contains(toAppend, word))
|
||||
toAppend.push(word);
|
||||
});
|
||||
|
||||
if (SPHINX_HIGHLIGHT_ENABLED) { // set in sphinx_highlight.js
|
||||
localStorage.setItem("sphinx_highlight_terms", [...highlightTerms].join(" "))
|
||||
}
|
||||
var highlightstring = '?highlight=' + $.urlencode(hlterms.join(" "));
|
||||
|
||||
// console.debug('SEARCH: searching for:');
|
||||
// console.info('required: ', searchterms);
|
||||
// console.info('excluded: ', excluded);
|
||||
// console.debug("SEARCH: searching for:");
|
||||
// console.info("required: ", [...searchTerms]);
|
||||
// console.info("excluded: ", [...excludedTerms]);
|
||||
|
||||
// prepare search
|
||||
var terms = this._index.terms;
|
||||
var titleterms = this._index.titleterms;
|
||||
// array of [docname, title, anchor, descr, score, filename]
|
||||
let results = [];
|
||||
_removeChildren(document.getElementById("search-progress"));
|
||||
|
||||
// array of [filename, title, anchor, descr, score]
|
||||
var results = [];
|
||||
$('#search-progress').empty();
|
||||
const queryLower = query.toLowerCase();
|
||||
for (const [title, foundTitles] of Object.entries(allTitles)) {
|
||||
if (title.toLowerCase().includes(queryLower) && (queryLower.length >= title.length/2)) {
|
||||
for (const [file, id] of foundTitles) {
|
||||
let score = Math.round(100 * queryLower.length / title.length)
|
||||
results.push([
|
||||
docNames[file],
|
||||
titles[file] !== title ? `${titles[file]} > ${title}` : title,
|
||||
id !== null ? "#" + id : "",
|
||||
null,
|
||||
score,
|
||||
filenames[file],
|
||||
]);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// search for explicit entries in index directives
|
||||
for (const [entry, foundEntries] of Object.entries(indexEntries)) {
|
||||
if (entry.includes(queryLower) && (queryLower.length >= entry.length/2)) {
|
||||
for (const [file, id] of foundEntries) {
|
||||
let score = Math.round(100 * queryLower.length / entry.length)
|
||||
results.push([
|
||||
docNames[file],
|
||||
titles[file],
|
||||
id ? "#" + id : "",
|
||||
null,
|
||||
score,
|
||||
filenames[file],
|
||||
]);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// lookup as object
|
||||
for (i = 0; i < objectterms.length; i++) {
|
||||
var others = [].concat(objectterms.slice(0, i),
|
||||
objectterms.slice(i+1, objectterms.length));
|
||||
results = results.concat(this.performObjectSearch(objectterms[i], others));
|
||||
}
|
||||
objectTerms.forEach((term) =>
|
||||
results.push(...Search.performObjectSearch(term, objectTerms))
|
||||
);
|
||||
|
||||
// lookup as search terms in fulltext
|
||||
results = results.concat(this.performTermsSearch(searchterms, excluded, terms, titleterms));
|
||||
results.push(...Search.performTermsSearch(searchTerms, excludedTerms));
|
||||
|
||||
// let the scorer override scores with a custom scoring function
|
||||
if (Scorer.score) {
|
||||
for (i = 0; i < results.length; i++)
|
||||
results[i][4] = Scorer.score(results[i]);
|
||||
}
|
||||
if (Scorer.score) results.forEach((item) => (item[4] = Scorer.score(item)));
|
||||
|
||||
// now sort the results by score (in opposite order of appearance, since the
|
||||
// display function below uses pop() to retrieve items) and then
|
||||
// alphabetically
|
||||
results.sort(function(a, b) {
|
||||
var left = a[4];
|
||||
var right = b[4];
|
||||
if (left > right) {
|
||||
return 1;
|
||||
} else if (left < right) {
|
||||
return -1;
|
||||
} else {
|
||||
results.sort((a, b) => {
|
||||
const leftScore = a[4];
|
||||
const rightScore = b[4];
|
||||
if (leftScore === rightScore) {
|
||||
// same score: sort alphabetically
|
||||
left = a[1].toLowerCase();
|
||||
right = b[1].toLowerCase();
|
||||
return (left > right) ? -1 : ((left < right) ? 1 : 0);
|
||||
const leftTitle = a[1].toLowerCase();
|
||||
const rightTitle = b[1].toLowerCase();
|
||||
if (leftTitle === rightTitle) return 0;
|
||||
return leftTitle > rightTitle ? -1 : 1; // inverted is intentional
|
||||
}
|
||||
return leftScore > rightScore ? 1 : -1;
|
||||
});
|
||||
|
||||
// remove duplicate search results
|
||||
// note the reversing of results, so that in the case of duplicates, the highest-scoring entry is kept
|
||||
let seen = new Set();
|
||||
results = results.reverse().reduce((acc, result) => {
|
||||
let resultStr = result.slice(0, 4).concat([result[5]]).map(v => String(v)).join(',');
|
||||
if (!seen.has(resultStr)) {
|
||||
acc.push(result);
|
||||
seen.add(resultStr);
|
||||
}
|
||||
return acc;
|
||||
}, []);
|
||||
|
||||
results = results.reverse();
|
||||
|
||||
// for debugging
|
||||
//Search.lastresults = results.slice(); // a copy
|
||||
//console.info('search results:', Search.lastresults);
|
||||
// console.info("search results:", Search.lastresults);
|
||||
|
||||
// print the results
|
||||
var resultCount = results.length;
|
||||
function displayNextItem() {
|
||||
// results left, load the summary and display it
|
||||
if (results.length) {
|
||||
var item = results.pop();
|
||||
var listItem = $('<li></li>');
|
||||
var requestUrl = "";
|
||||
var linkUrl = "";
|
||||
if (DOCUMENTATION_OPTIONS.BUILDER === 'dirhtml') {
|
||||
// dirhtml builder
|
||||
var dirname = item[0] + '/';
|
||||
if (dirname.match(/\/index\/$/)) {
|
||||
dirname = dirname.substring(0, dirname.length-6);
|
||||
} else if (dirname == 'index/') {
|
||||
dirname = '';
|
||||
}
|
||||
requestUrl = DOCUMENTATION_OPTIONS.URL_ROOT + dirname;
|
||||
linkUrl = requestUrl;
|
||||
|
||||
} else {
|
||||
// normal html builders
|
||||
requestUrl = DOCUMENTATION_OPTIONS.URL_ROOT + item[0] + DOCUMENTATION_OPTIONS.FILE_SUFFIX;
|
||||
linkUrl = item[0] + DOCUMENTATION_OPTIONS.LINK_SUFFIX;
|
||||
}
|
||||
listItem.append($('<a/>').attr('href',
|
||||
linkUrl +
|
||||
highlightstring + item[2]).html(item[1]));
|
||||
if (item[3]) {
|
||||
listItem.append($('<span> (' + item[3] + ')</span>'));
|
||||
Search.output.append(listItem);
|
||||
setTimeout(function() {
|
||||
displayNextItem();
|
||||
}, 5);
|
||||
} else if (DOCUMENTATION_OPTIONS.HAS_SOURCE) {
|
||||
$.ajax({url: requestUrl,
|
||||
dataType: "text",
|
||||
complete: function(jqxhr, textstatus) {
|
||||
var data = jqxhr.responseText;
|
||||
if (data !== '' && data !== undefined) {
|
||||
var summary = Search.makeSearchSummary(data, searchterms, hlterms);
|
||||
if (summary) {
|
||||
listItem.append(summary);
|
||||
}
|
||||
}
|
||||
Search.output.append(listItem);
|
||||
setTimeout(function() {
|
||||
displayNextItem();
|
||||
}, 5);
|
||||
}});
|
||||
} else {
|
||||
// no source available, just display title
|
||||
Search.output.append(listItem);
|
||||
setTimeout(function() {
|
||||
displayNextItem();
|
||||
}, 5);
|
||||
}
|
||||
}
|
||||
// search finished, update title and status message
|
||||
else {
|
||||
Search.stopPulse();
|
||||
Search.title.text(_('Search Results'));
|
||||
if (!resultCount)
|
||||
Search.status.text(_('Your search did not match any documents. Please make sure that all words are spelled correctly and that you\'ve selected enough categories.'));
|
||||
else
|
||||
Search.status.text(_('Search finished, found %s page(s) matching the search query.').replace('%s', resultCount));
|
||||
Search.status.fadeIn(500);
|
||||
}
|
||||
}
|
||||
displayNextItem();
|
||||
_displayNextItem(results, results.length, searchTerms);
|
||||
},
|
||||
|
||||
/**
|
||||
* search for object names
|
||||
*/
|
||||
performObjectSearch : function(object, otherterms) {
|
||||
var filenames = this._index.filenames;
|
||||
var docnames = this._index.docnames;
|
||||
var objects = this._index.objects;
|
||||
var objnames = this._index.objnames;
|
||||
var titles = this._index.titles;
|
||||
performObjectSearch: (object, objectTerms) => {
|
||||
const filenames = Search._index.filenames;
|
||||
const docNames = Search._index.docnames;
|
||||
const objects = Search._index.objects;
|
||||
const objNames = Search._index.objnames;
|
||||
const titles = Search._index.titles;
|
||||
|
||||
var i;
|
||||
var results = [];
|
||||
const results = [];
|
||||
|
||||
for (var prefix in objects) {
|
||||
for (var name in objects[prefix]) {
|
||||
var fullname = (prefix ? prefix + '.' : '') + name;
|
||||
var fullnameLower = fullname.toLowerCase()
|
||||
if (fullnameLower.indexOf(object) > -1) {
|
||||
var score = 0;
|
||||
var parts = fullnameLower.split('.');
|
||||
// check for different match types: exact matches of full name or
|
||||
// "last name" (i.e. last dotted part)
|
||||
if (fullnameLower == object || parts[parts.length - 1] == object) {
|
||||
score += Scorer.objNameMatch;
|
||||
// matches in last name
|
||||
} else if (parts[parts.length - 1].indexOf(object) > -1) {
|
||||
score += Scorer.objPartialMatch;
|
||||
}
|
||||
var match = objects[prefix][name];
|
||||
var objname = objnames[match[1]][2];
|
||||
var title = titles[match[0]];
|
||||
// If more than one term searched for, we require other words to be
|
||||
// found in the name/title/description
|
||||
if (otherterms.length > 0) {
|
||||
var haystack = (prefix + ' ' + name + ' ' +
|
||||
objname + ' ' + title).toLowerCase();
|
||||
var allfound = true;
|
||||
for (i = 0; i < otherterms.length; i++) {
|
||||
if (haystack.indexOf(otherterms[i]) == -1) {
|
||||
allfound = false;
|
||||
break;
|
||||
}
|
||||
}
|
||||
if (!allfound) {
|
||||
continue;
|
||||
}
|
||||
}
|
||||
var descr = objname + _(', in ') + title;
|
||||
const objectSearchCallback = (prefix, match) => {
|
||||
const name = match[4]
|
||||
const fullname = (prefix ? prefix + "." : "") + name;
|
||||
const fullnameLower = fullname.toLowerCase();
|
||||
if (fullnameLower.indexOf(object) < 0) return;
|
||||
|
||||
var anchor = match[3];
|
||||
if (anchor === '')
|
||||
anchor = fullname;
|
||||
else if (anchor == '-')
|
||||
anchor = objnames[match[1]][1] + '-' + fullname;
|
||||
// add custom score for some objects according to scorer
|
||||
if (Scorer.objPrio.hasOwnProperty(match[2])) {
|
||||
score += Scorer.objPrio[match[2]];
|
||||
} else {
|
||||
score += Scorer.objPrioDefault;
|
||||
}
|
||||
results.push([docnames[match[0]], fullname, '#'+anchor, descr, score, filenames[match[0]]]);
|
||||
}
|
||||
let score = 0;
|
||||
const parts = fullnameLower.split(".");
|
||||
|
||||
// check for different match types: exact matches of full name or
|
||||
// "last name" (i.e. last dotted part)
|
||||
if (fullnameLower === object || parts.slice(-1)[0] === object)
|
||||
score += Scorer.objNameMatch;
|
||||
else if (parts.slice(-1)[0].indexOf(object) > -1)
|
||||
score += Scorer.objPartialMatch; // matches in last name
|
||||
|
||||
const objName = objNames[match[1]][2];
|
||||
const title = titles[match[0]];
|
||||
|
||||
// If more than one term searched for, we require other words to be
|
||||
// found in the name/title/description
|
||||
const otherTerms = new Set(objectTerms);
|
||||
otherTerms.delete(object);
|
||||
if (otherTerms.size > 0) {
|
||||
const haystack = `${prefix} ${name} ${objName} ${title}`.toLowerCase();
|
||||
if (
|
||||
[...otherTerms].some((otherTerm) => haystack.indexOf(otherTerm) < 0)
|
||||
)
|
||||
return;
|
||||
}
|
||||
}
|
||||
|
||||
let anchor = match[3];
|
||||
if (anchor === "") anchor = fullname;
|
||||
else if (anchor === "-") anchor = objNames[match[1]][1] + "-" + fullname;
|
||||
|
||||
const descr = objName + _(", in ") + title;
|
||||
|
||||
// add custom score for some objects according to scorer
|
||||
if (Scorer.objPrio.hasOwnProperty(match[2]))
|
||||
score += Scorer.objPrio[match[2]];
|
||||
else score += Scorer.objPrioDefault;
|
||||
|
||||
results.push([
|
||||
docNames[match[0]],
|
||||
fullname,
|
||||
"#" + anchor,
|
||||
descr,
|
||||
score,
|
||||
filenames[match[0]],
|
||||
]);
|
||||
};
|
||||
Object.keys(objects).forEach((prefix) =>
|
||||
objects[prefix].forEach((array) =>
|
||||
objectSearchCallback(prefix, array)
|
||||
)
|
||||
);
|
||||
return results;
|
||||
},
|
||||
|
||||
/**
|
||||
* See https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions
|
||||
*/
|
||||
escapeRegExp : function(string) {
|
||||
return string.replace(/[.*+\-?^${}()|[\]\\]/g, '\\$&'); // $& means the whole matched string
|
||||
},
|
||||
|
||||
/**
|
||||
* search for full-text terms in the index
|
||||
*/
|
||||
performTermsSearch : function(searchterms, excluded, terms, titleterms) {
|
||||
var docnames = this._index.docnames;
|
||||
var filenames = this._index.filenames;
|
||||
var titles = this._index.titles;
|
||||
performTermsSearch: (searchTerms, excludedTerms) => {
|
||||
// prepare search
|
||||
const terms = Search._index.terms;
|
||||
const titleTerms = Search._index.titleterms;
|
||||
const filenames = Search._index.filenames;
|
||||
const docNames = Search._index.docnames;
|
||||
const titles = Search._index.titles;
|
||||
|
||||
var i, j, file;
|
||||
var fileMap = {};
|
||||
var scoreMap = {};
|
||||
var results = [];
|
||||
const scoreMap = new Map();
|
||||
const fileMap = new Map();
|
||||
|
||||
// perform the search on the required terms
|
||||
for (i = 0; i < searchterms.length; i++) {
|
||||
var word = searchterms[i];
|
||||
var files = [];
|
||||
var _o = [
|
||||
{files: terms[word], score: Scorer.term},
|
||||
{files: titleterms[word], score: Scorer.title}
|
||||
searchTerms.forEach((word) => {
|
||||
const files = [];
|
||||
const arr = [
|
||||
{ files: terms[word], score: Scorer.term },
|
||||
{ files: titleTerms[word], score: Scorer.title },
|
||||
];
|
||||
// add support for partial matches
|
||||
if (word.length > 2) {
|
||||
var word_regex = this.escapeRegExp(word);
|
||||
for (var w in terms) {
|
||||
if (w.match(word_regex) && !terms[word]) {
|
||||
_o.push({files: terms[w], score: Scorer.partialTerm})
|
||||
}
|
||||
}
|
||||
for (var w in titleterms) {
|
||||
if (w.match(word_regex) && !titleterms[word]) {
|
||||
_o.push({files: titleterms[w], score: Scorer.partialTitle})
|
||||
}
|
||||
}
|
||||
const escapedWord = _escapeRegExp(word);
|
||||
Object.keys(terms).forEach((term) => {
|
||||
if (term.match(escapedWord) && !terms[word])
|
||||
arr.push({ files: terms[term], score: Scorer.partialTerm });
|
||||
});
|
||||
Object.keys(titleTerms).forEach((term) => {
|
||||
if (term.match(escapedWord) && !titleTerms[word])
|
||||
arr.push({ files: titleTerms[term], score: Scorer.partialTitle });
|
||||
});
|
||||
}
|
||||
|
||||
// no match but word was a required one
|
||||
if ($u.every(_o, function(o){return o.files === undefined;})) {
|
||||
break;
|
||||
}
|
||||
if (arr.every((record) => record.files === undefined)) return;
|
||||
|
||||
// found search word in contents
|
||||
$u.each(_o, function(o) {
|
||||
var _files = o.files;
|
||||
if (_files === undefined)
|
||||
return
|
||||
arr.forEach((record) => {
|
||||
if (record.files === undefined) return;
|
||||
|
||||
if (_files.length === undefined)
|
||||
_files = [_files];
|
||||
files = files.concat(_files);
|
||||
let recordFiles = record.files;
|
||||
if (recordFiles.length === undefined) recordFiles = [recordFiles];
|
||||
files.push(...recordFiles);
|
||||
|
||||
// set score for the word in each file to Scorer.term
|
||||
for (j = 0; j < _files.length; j++) {
|
||||
file = _files[j];
|
||||
if (!(file in scoreMap))
|
||||
scoreMap[file] = {};
|
||||
scoreMap[file][word] = o.score;
|
||||
}
|
||||
// set score for the word in each file
|
||||
recordFiles.forEach((file) => {
|
||||
if (!scoreMap.has(file)) scoreMap.set(file, {});
|
||||
scoreMap.get(file)[word] = record.score;
|
||||
});
|
||||
});
|
||||
|
||||
// create the mapping
|
||||
for (j = 0; j < files.length; j++) {
|
||||
file = files[j];
|
||||
if (file in fileMap && fileMap[file].indexOf(word) === -1)
|
||||
fileMap[file].push(word);
|
||||
else
|
||||
fileMap[file] = [word];
|
||||
}
|
||||
}
|
||||
files.forEach((file) => {
|
||||
if (fileMap.has(file) && fileMap.get(file).indexOf(word) === -1)
|
||||
fileMap.get(file).push(word);
|
||||
else fileMap.set(file, [word]);
|
||||
});
|
||||
});
|
||||
|
||||
// now check if the files don't contain excluded terms
|
||||
for (file in fileMap) {
|
||||
var valid = true;
|
||||
|
||||
const results = [];
|
||||
for (const [file, wordList] of fileMap) {
|
||||
// check if all requirements are matched
|
||||
var filteredTermCount = // as search terms with length < 3 are discarded: ignore
|
||||
searchterms.filter(function(term){return term.length > 2}).length
|
||||
|
||||
// as search terms with length < 3 are discarded
|
||||
const filteredTermCount = [...searchTerms].filter(
|
||||
(term) => term.length > 2
|
||||
).length;
|
||||
if (
|
||||
fileMap[file].length != searchterms.length &&
|
||||
fileMap[file].length != filteredTermCount
|
||||
) continue;
|
||||
wordList.length !== searchTerms.size &&
|
||||
wordList.length !== filteredTermCount
|
||||
)
|
||||
continue;
|
||||
|
||||
// ensure that none of the excluded terms is in the search result
|
||||
for (i = 0; i < excluded.length; i++) {
|
||||
if (terms[excluded[i]] == file ||
|
||||
titleterms[excluded[i]] == file ||
|
||||
$u.contains(terms[excluded[i]] || [], file) ||
|
||||
$u.contains(titleterms[excluded[i]] || [], file)) {
|
||||
valid = false;
|
||||
break;
|
||||
}
|
||||
}
|
||||
if (
|
||||
[...excludedTerms].some(
|
||||
(term) =>
|
||||
terms[term] === file ||
|
||||
titleTerms[term] === file ||
|
||||
(terms[term] || []).includes(file) ||
|
||||
(titleTerms[term] || []).includes(file)
|
||||
)
|
||||
)
|
||||
continue;
|
||||
|
||||
// if we still have a valid result we can add it to the result list
|
||||
if (valid) {
|
||||
// select one (max) score for the file.
|
||||
// for better ranking, we should calculate ranking by using words statistics like basic tf-idf...
|
||||
var score = $u.max($u.map(fileMap[file], function(w){return scoreMap[file][w]}));
|
||||
results.push([docnames[file], titles[file], '', null, score, filenames[file]]);
|
||||
}
|
||||
// select one (max) score for the file.
|
||||
const score = Math.max(...wordList.map((w) => scoreMap.get(file)[w]));
|
||||
// add result to the result list
|
||||
results.push([
|
||||
docNames[file],
|
||||
titles[file],
|
||||
"",
|
||||
null,
|
||||
score,
|
||||
filenames[file],
|
||||
]);
|
||||
}
|
||||
return results;
|
||||
},
|
||||
|
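The term-search hunk above collects, for each file, the search words found in it and a score per word, then keeps only files that contain every required word and none of the excluded ones. A minimal standalone sketch of that filtering follows, run against a toy index. The names `toyIndex`, `SCORES` and `searchTerms` are hypothetical and not part of `searchtools.js`, partial matching is left out, and the sketch keeps the best score per word, whereas the hunk lets a later record overwrite an earlier one.

```javascript
// Hypothetical toy index: a term maps to a bare file id or to a list of ids,
// the same shape the hunk reads from Search._index.terms / titleterms.
const toyIndex = {
  terms: { quantify: [0, 2], protocol: [2], sample: 1 },
  titleTerms: { protocol: 2 },
};

// Hypothetical constants standing in for Scorer.term and Scorer.title.
const SCORES = { term: 5, title: 15 };

// Normalise "one id or many ids" to an array (undefined becomes []).
const asList = (entry) =>
  entry === undefined ? [] : Array.isArray(entry) ? entry : [entry];

function searchTerms(index, required, excluded) {
  const fileMap = new Map(); // file id -> required words found in that file
  const scoreMap = new Map(); // file id -> { word: best score }

  for (const word of required) {
    const records = [
      { files: asList(index.terms[word]), score: SCORES.term },
      { files: asList(index.titleTerms[word]), score: SCORES.title },
    ];
    for (const record of records) {
      for (const file of record.files) {
        if (!scoreMap.has(file)) scoreMap.set(file, {});
        const scores = scoreMap.get(file);
        scores[word] = Math.max(scores[word] || 0, record.score);
        if (!fileMap.has(file)) fileMap.set(file, []);
        if (!fileMap.get(file).includes(word)) fileMap.get(file).push(word);
      }
    }
  }

  const results = [];
  for (const [file, words] of fileMap) {
    // Every required word must have been found in this file.
    if (words.length !== required.length) continue;
    // Skip only this file if it contains an excluded term.
    const isExcluded = excluded.some(
      (term) =>
        asList(index.terms[term]).includes(file) ||
        asList(index.titleTerms[term]).includes(file)
    );
    if (isExcluded) continue;
    const score = Math.max(...words.map((w) => scoreMap.get(file)[w]));
    results.push({ file, score });
  }
  return results;
}
```

In this sketch an excluded term drops only the file that contains it, and files later in the map are still considered.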
@ -495,34 +539,28 @@ var Search = {
|
|||
/**
|
||||
* helper function to return a node containing the
|
||||
* search summary for a given text. keywords is a list
|
||||
* of stemmed words, hlwords is the list of normal, unstemmed
|
||||
* words. the first one is used to find the occurrence, the
|
||||
* latter for highlighting it.
|
||||
* of stemmed words.
|
||||
*/
|
||||
makeSearchSummary : function(htmlText, keywords, hlwords) {
|
||||
var text = Search.htmlToText(htmlText);
|
||||
if (text == "") {
|
||||
return null;
|
||||
}
|
||||
var textLower = text.toLowerCase();
|
||||
var start = 0;
|
||||
$.each(keywords, function() {
|
||||
var i = textLower.indexOf(this.toLowerCase());
|
||||
if (i > -1)
|
||||
start = i;
|
||||
});
|
||||
start = Math.max(start - 120, 0);
|
||||
var excerpt = ((start > 0) ? '...' : '') +
|
||||
$.trim(text.substr(start, 240)) +
|
||||
((start + 240 - text.length) ? '...' : '');
|
||||
var rv = $('<p class="context"></p>').text(excerpt);
|
||||
$.each(hlwords, function() {
|
||||
rv = rv.highlightText(this, 'highlighted');
|
||||
});
|
||||
return rv;
|
||||
}
|
||||
makeSearchSummary: (htmlText, keywords) => {
|
||||
const text = Search.htmlToText(htmlText);
|
||||
if (text === "") return null;
|
||||
|
||||
const textLower = text.toLowerCase();
|
||||
const actualStartPosition = [...keywords]
|
||||
.map((k) => textLower.indexOf(k.toLowerCase()))
|
||||
.filter((i) => i > -1)
|
||||
.slice(-1)[0];
|
||||
const startWithContext = Math.max((actualStartPosition || 0) - 120, 0);
|
||||
|
||||
const top = startWithContext === 0 ? "" : "...";
|
||||
const tail = startWithContext + 240 < text.length ? "..." : "";
|
||||
|
||||
let summary = document.createElement("p");
|
||||
summary.classList.add("context");
|
||||
summary.textContent = top + text.substr(startWithContext, 240).trim() + tail;
|
||||
|
||||
return summary;
|
||||
},
|
||||
};
|
||||
|
||||
$(document).ready(function() {
|
||||
Search.init();
|
||||
});
|
||||
_ready(Search.init);
|
||||
|
|
|
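The `makeSearchSummary` hunk above takes the position of the last keyword that occurs in the text, backs up by up to 120 characters, and shows a 240-character window with ellipses where text was cut. A minimal sketch of that window computation without the DOM follows. The helper name `excerpt` and its `context` and `width` parameters are hypothetical, and the fallback to position 0 when no keyword is found is an assumption about the intended behaviour.

```javascript
// Hypothetical helper, not part of searchtools.js: returns the summary
// string that the hunk would place inside the <p class="context"> node.
function excerpt(text, keywords, context = 120, width = 240) {
  const lower = text.toLowerCase();
  // Positions of the keywords that actually occur in the text.
  const hits = keywords
    .map((k) => lower.indexOf(k.toLowerCase()))
    .filter((i) => i > -1);
  // Assumed fallback: start at the beginning when nothing matches.
  const hit = hits.length > 0 ? hits[hits.length - 1] : 0;
  const start = Math.max(hit - context, 0);
  const head = start === 0 ? "" : "...";
  const tail = start + width < text.length ? "..." : "";
  return head + text.slice(start, start + width).trim() + tail;
}
```

With a short text the whole string comes back unchanged. With a long text the result is wrapped in ellipses on whichever sides were truncated.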
@ -2,18 +2,20 @@
|
|||
|
||||
<!doctype html>
|
||||
|
||||
<html>
|
||||
<html lang="en">
|
||||
<head>
|
||||
<meta charset="utf-8" />
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
|
||||
<title>Index — QuaPy 0.1.6 documentation</title>
|
||||
<title>Index — QuaPy 0.1.7 documentation</title>
|
||||
<link rel="stylesheet" type="text/css" href="_static/pygments.css" />
|
||||
<link rel="stylesheet" type="text/css" href="_static/bizstyle.css" />
|
||||
|
||||
<script data-url_root="./" id="documentation_options" src="_static/documentation_options.js"></script>
|
||||
<script src="_static/jquery.js"></script>
|
||||
<script src="_static/underscore.js"></script>
|
||||
<script src="_static/_sphinx_javascript_frameworks_compat.js"></script>
|
||||
<script src="_static/doctools.js"></script>
|
||||
<script src="_static/sphinx_highlight.js"></script>
|
||||
<script src="_static/bizstyle.js"></script>
|
||||
<link rel="index" title="Index" href="#" />
|
||||
<link rel="search" title="Search" href="search.html" />
|
||||
|
@ -31,7 +33,7 @@
|
|||
<li class="right" >
|
||||
<a href="py-modindex.html" title="Python Module Index"
|
||||
>modules</a> |</li>
|
||||
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.6 documentation</a> »</li>
|
||||
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.7 documentation</a> »</li>
|
||||
<li class="nav-item nav-item-this"><a href="">Index</a></li>
|
||||
</ul>
|
||||
</div>
|
||||
|
@ -54,6 +56,7 @@
|
|||
| <a href="#G"><strong>G</strong></a>
|
||||
| <a href="#H"><strong>H</strong></a>
|
||||
| <a href="#I"><strong>I</strong></a>
|
||||
| <a href="#J"><strong>J</strong></a>
|
||||
| <a href="#K"><strong>K</strong></a>
|
||||
| <a href="#L"><strong>L</strong></a>
|
||||
| <a href="#M"><strong>M</strong></a>
|
||||
|
@ -68,12 +71,17 @@
|
|||
| <a href="#V"><strong>V</strong></a>
|
||||
| <a href="#W"><strong>W</strong></a>
|
||||
| <a href="#X"><strong>X</strong></a>
|
||||
| <a href="#Y"><strong>Y</strong></a>
|
||||
|
||||
</div>
|
||||
<h2 id="A">A</h2>
|
||||
<table style="width: 100%" class="indextable genindextable"><tr>
|
||||
<td style="width: 33%; vertical-align: top;"><ul>
|
||||
<li><a href="quapy.html#quapy.error.absolute_error">absolute_error() (in module quapy.error)</a>
|
||||
</li>
|
||||
<li><a href="quapy.html#quapy.protocol.AbstractProtocol">AbstractProtocol (class in quapy.protocol)</a>
|
||||
</li>
|
||||
<li><a href="quapy.html#quapy.protocol.AbstractStochasticSeededProtocol">AbstractStochasticSeededProtocol (class in quapy.protocol)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.aggregative.ACC">ACC (class in quapy.method.aggregative)</a>
|
||||
</li>
|
||||
|
@ -96,46 +104,36 @@
|
|||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.aggregative.CC.aggregate">(quapy.method.aggregative.CC method)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.aggregative.ELM.aggregate">(quapy.method.aggregative.ELM method)</a>
|
||||
<li><a href="quapy.method.html#quapy.method.aggregative.DistributionMatching.aggregate">(quapy.method.aggregative.DistributionMatching method)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.aggregative.DyS.aggregate">(quapy.method.aggregative.DyS method)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.aggregative.EMQ.aggregate">(quapy.method.aggregative.EMQ method)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.aggregative.HDy.aggregate">(quapy.method.aggregative.HDy method)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.aggregative.OneVsAll.aggregate">(quapy.method.aggregative.OneVsAll method)</a>
|
||||
<li><a href="quapy.method.html#quapy.method.aggregative.OneVsAllAggregative.aggregate">(quapy.method.aggregative.OneVsAllAggregative method)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.aggregative.PACC.aggregate">(quapy.method.aggregative.PACC method)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.aggregative.PCC.aggregate">(quapy.method.aggregative.PCC method)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.aggregative.SMM.aggregate">(quapy.method.aggregative.SMM method)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.aggregative.ThresholdOptimization.aggregate">(quapy.method.aggregative.ThresholdOptimization method)</a>
|
||||
</li>
|
||||
</ul></li>
|
||||
</ul></td>
|
||||
<td style="width: 33%; vertical-align: top;"><ul>
|
||||
<li><a href="quapy.method.html#quapy.method.aggregative.AggregativeQuantifier.aggregative">aggregative (quapy.method.aggregative.AggregativeQuantifier property)</a>
|
||||
|
||||
<ul>
|
||||
<li><a href="quapy.method.html#quapy.method.base.BaseQuantifier.aggregative">(quapy.method.base.BaseQuantifier property)</a>
|
||||
<li><a href="quapy.method.html#quapy.method.meta.Ensemble.aggregative">aggregative (quapy.method.meta.Ensemble property)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.meta.Ensemble.aggregative">(quapy.method.meta.Ensemble property)</a>
|
||||
</li>
|
||||
</ul></li>
|
||||
<li><a href="quapy.method.html#quapy.method.aggregative.AggregativeProbabilisticQuantifier">AggregativeProbabilisticQuantifier (class in quapy.method.aggregative)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.aggregative.AggregativeQuantifier">AggregativeQuantifier (class in quapy.method.aggregative)</a>
|
||||
</li>
|
||||
<li><a href="quapy.html#quapy.evaluation.artificial_prevalence_prediction">artificial_prevalence_prediction() (in module quapy.evaluation)</a>
|
||||
<li><a href="quapy.html#quapy.protocol.APP">APP (class in quapy.protocol)</a>
|
||||
</li>
|
||||
<li><a href="quapy.html#quapy.evaluation.artificial_prevalence_protocol">artificial_prevalence_protocol() (in module quapy.evaluation)</a>
|
||||
</li>
|
||||
<li><a href="quapy.html#quapy.evaluation.artificial_prevalence_report">artificial_prevalence_report() (in module quapy.evaluation)</a>
|
||||
</li>
|
||||
<li><a href="quapy.html#quapy.functional.artificial_prevalence_sampling">artificial_prevalence_sampling() (in module quapy.functional)</a>
|
||||
</li>
|
||||
<li><a href="quapy.data.html#quapy.data.base.LabelledCollection.artificial_sampling_generator">artificial_sampling_generator() (quapy.data.base.LabelledCollection method)</a>
|
||||
</li>
|
||||
<li><a href="quapy.data.html#quapy.data.base.LabelledCollection.artificial_sampling_index_generator">artificial_sampling_index_generator() (quapy.data.base.LabelledCollection method)</a>
|
||||
<li><a href="quapy.html#quapy.protocol.ArtificialPrevalenceProtocol">ArtificialPrevalenceProtocol (in module quapy.protocol)</a>
|
||||
</li>
|
||||
<li><a href="quapy.classification.html#quapy.classification.neural.TorchDataset.asDataloader">asDataloader() (quapy.classification.neural.TorchDataset method)</a>
|
||||
</li>
|
||||
|
@ -146,6 +144,8 @@
|
|||
<table style="width: 100%" class="indextable genindextable"><tr>
|
||||
<td style="width: 33%; vertical-align: top;"><ul>
|
||||
<li><a href="quapy.method.html#quapy.method.base.BaseQuantifier">BaseQuantifier (class in quapy.method.base)</a>
|
||||
</li>
|
||||
<li><a href="quapy.classification.html#quapy.classification.calibration.BCTSCalibration">BCTSCalibration (class in quapy.classification.calibration)</a>
|
||||
</li>
|
||||
<li><a href="quapy.html#quapy.model_selection.GridSearchQ.best_model">best_model() (quapy.model_selection.GridSearchQ method)</a>
|
||||
</li>
|
||||
|
@ -155,14 +155,6 @@
|
|||
|
||||
<ul>
|
||||
<li><a href="quapy.data.html#quapy.data.base.LabelledCollection.binary">(quapy.data.base.LabelledCollection property)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.aggregative.OneVsAll.binary">(quapy.method.aggregative.OneVsAll property)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.base.BaseQuantifier.binary">(quapy.method.base.BaseQuantifier property)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.base.BinaryQuantifier.binary">(quapy.method.base.BinaryQuantifier property)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.meta.Ensemble.binary">(quapy.method.meta.Ensemble property)</a>
|
||||
</li>
|
||||
</ul></li>
|
||||
</ul></td>
|
||||
|
@ -185,32 +177,30 @@
|
|||
<td style="width: 33%; vertical-align: top;"><ul>
|
||||
<li><a href="quapy.method.html#quapy.method.aggregative.CC">CC (class in quapy.method.aggregative)</a>
|
||||
</li>
|
||||
<li><a href="quapy.data.html#quapy.data.base.Dataset.classes_">classes_ (quapy.data.base.Dataset property)</a>
|
||||
<li><a href="quapy.html#quapy.functional.check_prevalence_vector">check_prevalence_vector() (in module quapy.functional)</a>
|
||||
</li>
|
||||
<li><a href="quapy.classification.html#quapy.classification.calibration.RecalibratedProbabilisticClassifierBase.classes_">classes_ (quapy.classification.calibration.RecalibratedProbabilisticClassifierBase property)</a>
|
||||
|
||||
<ul>
|
||||
<li><a href="quapy.data.html#quapy.data.base.Dataset.classes_">(quapy.data.base.Dataset property)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.aggregative.AggregativeQuantifier.classes_">(quapy.method.aggregative.AggregativeQuantifier property)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.aggregative.OneVsAll.classes_">(quapy.method.aggregative.OneVsAll property)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.base.BaseQuantifier.classes_">(quapy.method.base.BaseQuantifier property)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.meta.Ensemble.classes_">(quapy.method.meta.Ensemble property)</a>
|
||||
<li><a href="quapy.method.html#quapy.method.base.OneVsAllGeneric.classes_">(quapy.method.base.OneVsAllGeneric property)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.neural.QuaNetTrainer.classes_">(quapy.method.neural.QuaNetTrainer property)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.non_aggregative.MaximumLikelihoodPrevalenceEstimation.classes_">(quapy.method.non_aggregative.MaximumLikelihoodPrevalenceEstimation property)</a>
|
||||
</li>
|
||||
<li><a href="quapy.html#quapy.model_selection.GridSearchQ.classes_">(quapy.model_selection.GridSearchQ property)</a>
|
||||
</li>
|
||||
</ul></li>
|
||||
<li><a href="quapy.method.html#quapy.method.aggregative.AggregativeQuantifier.classifier">classifier (quapy.method.aggregative.AggregativeQuantifier property)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.aggregative.ACC.classify">classify() (quapy.method.aggregative.ACC method)</a>
|
||||
|
||||
<ul>
|
||||
<li><a href="quapy.method.html#quapy.method.aggregative.AggregativeProbabilisticQuantifier.classify">(quapy.method.aggregative.AggregativeProbabilisticQuantifier method)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.aggregative.AggregativeQuantifier.classify">(quapy.method.aggregative.AggregativeQuantifier method)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.aggregative.ELM.classify">(quapy.method.aggregative.ELM method)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.aggregative.OneVsAll.classify">(quapy.method.aggregative.OneVsAll method)</a>
|
||||
<li><a href="quapy.method.html#quapy.method.aggregative.OneVsAllAggregative.classify">(quapy.method.aggregative.OneVsAllAggregative method)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.aggregative.PACC.classify">(quapy.method.aggregative.PACC method)</a>
|
||||
</li>
|
||||
|
@ -224,12 +214,20 @@
|
|||
<li><a href="quapy.method.html#quapy.method.neural.QuaNetTrainer.clean_checkpoint_dir">clean_checkpoint_dir() (quapy.method.neural.QuaNetTrainer method)</a>
|
||||
</li>
|
||||
<li><a href="quapy.classification.html#quapy.classification.neural.CNNnet">CNNnet (class in quapy.classification.neural)</a>
|
||||
</li>
|
||||
<li><a href="quapy.html#quapy.protocol.AbstractStochasticSeededProtocol.collator">collator() (quapy.protocol.AbstractStochasticSeededProtocol method)</a>
|
||||
</li>
|
||||
<li><a href="quapy.data.html#quapy.data.base.LabelledCollection.counts">counts() (quapy.data.base.LabelledCollection method)</a>
|
||||
</li>
|
||||
<li><a href="quapy.html#quapy.util.create_if_not_exist">create_if_not_exist() (in module quapy.util)</a>
|
||||
</li>
|
||||
<li><a href="quapy.html#quapy.util.create_parent_dir">create_parent_dir() (in module quapy.util)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.aggregative.cross_generate_predictions">cross_generate_predictions() (in module quapy.method.aggregative)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.aggregative.cross_generate_predictions_depr">cross_generate_predictions_depr() (in module quapy.method.aggregative)</a>
|
||||
</li>
|
||||
<li><a href="quapy.html#quapy.model_selection.cross_val_predict">cross_val_predict() (in module quapy.model_selection)</a>
|
||||
</li>
|
||||
</ul></td>
|
||||
</tr></table>
|
||||
|
@ -248,6 +246,8 @@
|
|||
</li>
|
||||
</ul></li>
|
||||
<li><a href="quapy.classification.html#quapy.classification.neural.TextClassifierNet.dimensions">dimensions() (quapy.classification.neural.TextClassifierNet method)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.aggregative.DistributionMatching">DistributionMatching (class in quapy.method.aggregative)</a>
|
||||
</li>
|
||||
</ul></td>
|
||||
<td style="width: 33%; vertical-align: top;"><ul>
|
||||
|
@ -259,9 +259,13 @@
|
|||
<li><a href="quapy.classification.html#quapy.classification.neural.TextClassifierNet.document_embedding">(quapy.classification.neural.TextClassifierNet method)</a>
|
||||
</li>
|
||||
</ul></li>
|
||||
<li><a href="quapy.html#quapy.protocol.DomainMixer">DomainMixer (class in quapy.protocol)</a>
|
||||
</li>
|
||||
<li><a href="quapy.html#quapy.util.download_file">download_file() (in module quapy.util)</a>
|
||||
</li>
|
||||
<li><a href="quapy.html#quapy.util.download_file_if_not_exists">download_file_if_not_exists() (in module quapy.util)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.aggregative.DyS">DyS (class in quapy.method.aggregative)</a>
|
||||
</li>
|
||||
</ul></td>
|
||||
</tr></table>
|
||||
|
@ -278,17 +282,15 @@
|
|||
<li><a href="quapy.method.html#quapy.method.meta.EEMQ">EEMQ() (in module quapy.method.meta)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.meta.EHDy">EHDy() (in module quapy.method.meta)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.aggregative.ELM">ELM (class in quapy.method.aggregative)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.aggregative.EMQ.EM">EM() (quapy.method.aggregative.EMQ class method)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.aggregative.EMQ">EMQ (class in quapy.method.aggregative)</a>
|
||||
</li>
|
||||
</ul></td>
|
||||
<td style="width: 33%; vertical-align: top;"><ul>
|
||||
<li><a href="quapy.method.html#quapy.method.meta.Ensemble">Ensemble (class in quapy.method.meta)</a>
|
||||
</li>
|
||||
</ul></td>
|
||||
<td style="width: 33%; vertical-align: top;"><ul>
|
||||
<li><a href="quapy.method.html#quapy.method.meta.ensembleFactory">ensembleFactory() (in module quapy.method.meta)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.meta.EPACC">EPACC() (in module quapy.method.meta)</a>
|
||||
|
@ -299,9 +301,11 @@
|
|||
</li>
|
||||
<li><a href="quapy.html#quapy.evaluation.evaluate">evaluate() (in module quapy.evaluation)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.aggregative.ExpectationMaximizationQuantifier">ExpectationMaximizationQuantifier (in module quapy.method.aggregative)</a>
|
||||
<li><a href="quapy.html#quapy.evaluation.evaluate_on_samples">evaluate_on_samples() (in module quapy.evaluation)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.aggregative.ExplicitLossMinimisation">ExplicitLossMinimisation (in module quapy.method.aggregative)</a>
|
||||
<li><a href="quapy.html#quapy.evaluation.evaluation_report">evaluation_report() (in module quapy.evaluation)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.aggregative.ExpectationMaximizationQuantifier">ExpectationMaximizationQuantifier (in module quapy.method.aggregative)</a>
|
||||
</li>
|
||||
</ul></td>
|
||||
</tr></table>
|
||||
|
@ -312,6 +316,8 @@
|
|||
<li><a href="quapy.html#quapy.error.f1_error">f1_error() (in module quapy.error)</a>
|
||||
</li>
|
||||
<li><a href="quapy.html#quapy.error.f1e">f1e() (in module quapy.error)</a>
|
||||
</li>
|
||||
<li><a href="quapy.data.html#quapy.data.datasets.fetch_lequa2022">fetch_lequa2022() (in module quapy.data.datasets)</a>
|
||||
</li>
|
||||
<li><a href="quapy.data.html#quapy.data.datasets.fetch_reviews">fetch_reviews() (in module quapy.data.datasets)</a>
|
||||
</li>
|
||||
|
@ -321,9 +327,11 @@
|
|||
</li>
|
||||
<li><a href="quapy.data.html#quapy.data.datasets.fetch_UCILabelledCollection">fetch_UCILabelledCollection() (in module quapy.data.datasets)</a>
|
||||
</li>
|
||||
<li><a href="quapy.classification.html#quapy.classification.methods.LowRankLogisticRegression.fit">fit() (quapy.classification.methods.LowRankLogisticRegression method)</a>
|
||||
<li><a href="quapy.classification.html#quapy.classification.calibration.RecalibratedProbabilisticClassifierBase.fit">fit() (quapy.classification.calibration.RecalibratedProbabilisticClassifierBase method)</a>
|
||||
|
||||
<ul>
|
||||
<li><a href="quapy.classification.html#quapy.classification.methods.LowRankLogisticRegression.fit">(quapy.classification.methods.LowRankLogisticRegression method)</a>
|
||||
</li>
|
||||
<li><a href="quapy.classification.html#quapy.classification.neural.NeuralClassifierTrainer.fit">(quapy.classification.neural.NeuralClassifierTrainer method)</a>
|
||||
</li>
|
||||
<li><a href="quapy.classification.html#quapy.classification.svmperf.SVMperf.fit">(quapy.classification.svmperf.SVMperf method)</a>
|
||||
|
@ -336,21 +344,25 @@
|
|||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.aggregative.CC.fit">(quapy.method.aggregative.CC method)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.aggregative.ELM.fit">(quapy.method.aggregative.ELM method)</a>
|
||||
<li><a href="quapy.method.html#quapy.method.aggregative.DistributionMatching.fit">(quapy.method.aggregative.DistributionMatching method)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.aggregative.DyS.fit">(quapy.method.aggregative.DyS method)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.aggregative.EMQ.fit">(quapy.method.aggregative.EMQ method)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.aggregative.HDy.fit">(quapy.method.aggregative.HDy method)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.aggregative.OneVsAll.fit">(quapy.method.aggregative.OneVsAll method)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.aggregative.PACC.fit">(quapy.method.aggregative.PACC method)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.aggregative.PCC.fit">(quapy.method.aggregative.PCC method)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.aggregative.SMM.fit">(quapy.method.aggregative.SMM method)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.aggregative.ThresholdOptimization.fit">(quapy.method.aggregative.ThresholdOptimization method)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.base.BaseQuantifier.fit">(quapy.method.base.BaseQuantifier method)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.base.OneVsAllGeneric.fit">(quapy.method.base.OneVsAllGeneric method)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.meta.Ensemble.fit">(quapy.method.meta.Ensemble method)</a>
|
||||
</li>
|
||||
|
@ -363,6 +375,10 @@
|
|||
</ul></li>
|
||||
</ul></td>
|
||||
<td style="width: 33%; vertical-align: top;"><ul>
|
||||
<li><a href="quapy.classification.html#quapy.classification.calibration.RecalibratedProbabilisticClassifierBase.fit_cv">fit_cv() (quapy.classification.calibration.RecalibratedProbabilisticClassifierBase method)</a>
|
||||
</li>
|
||||
<li><a href="quapy.classification.html#quapy.classification.calibration.RecalibratedProbabilisticClassifierBase.fit_tr_val">fit_tr_val() (quapy.classification.calibration.RecalibratedProbabilisticClassifierBase method)</a>
|
||||
</li>
|
||||
<li><a href="quapy.data.html#quapy.data.preprocessing.IndexTransformer.fit_transform">fit_transform() (quapy.data.preprocessing.IndexTransformer method)</a>
|
||||
</li>
|
||||
<li><a href="quapy.classification.html#quapy.classification.neural.TextClassifierNet.forward">forward() (quapy.classification.neural.TextClassifierNet method)</a>
|
||||
|
@ -385,9 +401,9 @@
|
|||
<h2 id="G">G</h2>
|
||||
<table style="width: 100%" class="indextable genindextable"><tr>
|
||||
<td style="width: 33%; vertical-align: top;"><ul>
|
||||
<li><a href="quapy.html#quapy.evaluation.gen_prevalence_prediction">gen_prevalence_prediction() (in module quapy.evaluation)</a>
|
||||
<li><a href="quapy.html#quapy.protocol.OnLabelledCollectionProtocol.get_collator">get_collator() (quapy.protocol.OnLabelledCollectionProtocol class method)</a>
|
||||
</li>
|
||||
<li><a href="quapy.html#quapy.evaluation.gen_prevalence_report">gen_prevalence_report() (in module quapy.evaluation)</a>
|
||||
<li><a href="quapy.html#quapy.protocol.OnLabelledCollectionProtocol.get_labelled_collection">get_labelled_collection() (quapy.protocol.OnLabelledCollectionProtocol method)</a>
|
||||
</li>
|
||||
<li><a href="quapy.html#quapy.functional.get_nprevpoints_approximation">get_nprevpoints_approximation() (in module quapy.functional)</a>
|
||||
</li>
|
||||
|
@ -401,18 +417,10 @@
|
|||
<li><a href="quapy.classification.html#quapy.classification.neural.NeuralClassifierTrainer.get_params">(quapy.classification.neural.NeuralClassifierTrainer method)</a>
|
||||
</li>
|
||||
<li><a href="quapy.classification.html#quapy.classification.neural.TextClassifierNet.get_params">(quapy.classification.neural.TextClassifierNet method)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.aggregative.AggregativeQuantifier.get_params">(quapy.method.aggregative.AggregativeQuantifier method)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.aggregative.OneVsAll.get_params">(quapy.method.aggregative.OneVsAll method)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.base.BaseQuantifier.get_params">(quapy.method.base.BaseQuantifier method)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.meta.Ensemble.get_params">(quapy.method.meta.Ensemble method)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.neural.QuaNetTrainer.get_params">(quapy.method.neural.QuaNetTrainer method)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.non_aggregative.MaximumLikelihoodPrevalenceEstimation.get_params">(quapy.method.non_aggregative.MaximumLikelihoodPrevalenceEstimation method)</a>
|
||||
</li>
|
||||
<li><a href="quapy.html#quapy.model_selection.GridSearchQ.get_params">(quapy.model_selection.GridSearchQ method)</a>
|
||||
</li>
|
||||
|
@ -423,6 +431,12 @@
|
|||
</li>
|
||||
<li><a href="quapy.html#quapy.util.get_quapy_home">get_quapy_home() (in module quapy.util)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.aggregative.ACC.getPteCondEstim">getPteCondEstim() (quapy.method.aggregative.ACC class method)</a>
|
||||
|
||||
<ul>
|
||||
<li><a href="quapy.method.html#quapy.method.aggregative.PACC.getPteCondEstim">(quapy.method.aggregative.PACC class method)</a>
|
||||
</li>
|
||||
</ul></li>
|
||||
<li><a href="quapy.html#quapy.model_selection.GridSearchQ">GridSearchQ (class in quapy.model_selection)</a>
|
||||
</li>
|
||||
</ul></td>
|
||||
|
@ -446,22 +460,20 @@
|
|||
<table style="width: 100%" class="indextable genindextable"><tr>
|
||||
<td style="width: 33%; vertical-align: top;"><ul>
|
||||
<li><a href="quapy.data.html#quapy.data.preprocessing.index">index() (in module quapy.data.preprocessing)</a>
|
||||
</li>
|
||||
<li><a href="quapy.data.html#quapy.data.preprocessing.IndexTransformer">IndexTransformer (class in quapy.data.preprocessing)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.base.isaggregative">isaggregative() (in module quapy.method.base)</a>
|
||||
</li>
|
||||
</ul></td>
|
||||
<td style="width: 33%; vertical-align: top;"><ul>
|
||||
<li><a href="quapy.html#quapy.isbinary">isbinary() (in module quapy)</a>
|
||||
<li><a href="quapy.data.html#quapy.data.preprocessing.IndexTransformer">IndexTransformer (class in quapy.data.preprocessing)</a>
|
||||
</li>
|
||||
<li><a href="quapy.html#quapy.protocol.IterateProtocol">IterateProtocol (class in quapy.protocol)</a>
|
||||
</li>
|
||||
</ul></td>
|
||||
</tr></table>
|
||||
|
||||
<ul>
|
||||
<li><a href="quapy.data.html#quapy.data.base.isbinary">(in module quapy.data.base)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.base.isbinary">(in module quapy.method.base)</a>
|
||||
</li>
|
||||
</ul></li>
|
||||
<li><a href="quapy.method.html#quapy.method.base.isprobabilistic">isprobabilistic() (in module quapy.method.base)</a>
|
||||
<h2 id="J">J</h2>
|
||||
<table style="width: 100%" class="indextable genindextable"><tr>
|
||||
<td style="width: 33%; vertical-align: top;"><ul>
|
||||
<li><a href="quapy.data.html#quapy.data.base.LabelledCollection.join">join() (quapy.data.base.LabelledCollection class method)</a>
|
||||
</li>
|
||||
</ul></td>
|
||||
</tr></table>
|
||||
|
@ -486,8 +498,6 @@
|
|||
<table style="width: 100%" class="indextable genindextable"><tr>
|
||||
<td style="width: 33%; vertical-align: top;"><ul>
|
||||
<li><a href="quapy.data.html#quapy.data.base.LabelledCollection">LabelledCollection (class in quapy.data.base)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.aggregative.AggregativeQuantifier.learner">learner (quapy.method.aggregative.AggregativeQuantifier property)</a>
|
||||
</li>
|
||||
<li><a href="quapy.data.html#quapy.data.base.Dataset.load">load() (quapy.data.base.Dataset class method)</a>
|
||||
|
||||
|
@ -538,6 +548,8 @@
|
|||
<li><a href="quapy.html#module-quapy">quapy</a>
|
||||
</li>
|
||||
<li><a href="quapy.classification.html#module-quapy.classification">quapy.classification</a>
|
||||
</li>
|
||||
<li><a href="quapy.classification.html#module-quapy.classification.calibration">quapy.classification.calibration</a>
|
||||
</li>
|
||||
<li><a href="quapy.classification.html#module-quapy.classification.methods">quapy.classification.methods</a>
|
||||
</li>
|
||||
|
@ -576,6 +588,8 @@
|
|||
<li><a href="quapy.html#module-quapy.model_selection">quapy.model_selection</a>
|
||||
</li>
|
||||
<li><a href="quapy.html#module-quapy.plot">quapy.plot</a>
|
||||
</li>
|
||||
<li><a href="quapy.html#module-quapy.protocol">quapy.protocol</a>
|
||||
</li>
|
||||
<li><a href="quapy.html#module-quapy.util">quapy.util</a>
|
||||
</li>
|
||||
|
@ -600,27 +614,33 @@
|
|||
|
||||
<ul>
|
||||
<li><a href="quapy.data.html#quapy.data.base.LabelledCollection.n_classes">(quapy.data.base.LabelledCollection property)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.base.BaseQuantifier.n_classes">(quapy.method.base.BaseQuantifier property)</a>
|
||||
</li>
|
||||
</ul></li>
|
||||
<li><a href="quapy.html#quapy.evaluation.natural_prevalence_prediction">natural_prevalence_prediction() (in module quapy.evaluation)</a>
|
||||
<li><a href="quapy.html#quapy.protocol.NaturalPrevalenceProtocol">NaturalPrevalenceProtocol (in module quapy.protocol)</a>
|
||||
</li>
|
||||
<li><a href="quapy.html#quapy.evaluation.natural_prevalence_protocol">natural_prevalence_protocol() (in module quapy.evaluation)</a>
|
||||
<li><a href="quapy.classification.html#quapy.classification.calibration.NBVSCalibration">NBVSCalibration (class in quapy.classification.calibration)</a>
|
||||
</li>
|
||||
<li><a href="quapy.html#quapy.evaluation.natural_prevalence_report">natural_prevalence_report() (in module quapy.evaluation)</a>
|
||||
<li><a href="quapy.classification.html#quapy.classification.neural.NeuralClassifierTrainer">NeuralClassifierTrainer (class in quapy.classification.neural)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.aggregative.newELM">newELM() (in module quapy.method.aggregative)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.base.newOneVsAll">newOneVsAll() (in module quapy.method.base)</a>
|
||||
</li>
|
||||
</ul></td>
|
||||
<td style="width: 33%; vertical-align: top;"><ul>
|
||||
<li><a href="quapy.data.html#quapy.data.base.LabelledCollection.natural_sampling_generator">natural_sampling_generator() (quapy.data.base.LabelledCollection method)</a>
|
||||
<li><a href="quapy.method.html#quapy.method.aggregative.newSVMAE">newSVMAE() (in module quapy.method.aggregative)</a>
|
||||
</li>
|
||||
<li><a href="quapy.data.html#quapy.data.base.LabelledCollection.natural_sampling_index_generator">natural_sampling_index_generator() (quapy.data.base.LabelledCollection method)</a>
|
||||
<li><a href="quapy.method.html#quapy.method.aggregative.newSVMKLD">newSVMKLD() (in module quapy.method.aggregative)</a>
|
||||
</li>
|
||||
<li><a href="quapy.classification.html#quapy.classification.neural.NeuralClassifierTrainer">NeuralClassifierTrainer (class in quapy.classification.neural)</a>
|
||||
<li><a href="quapy.method.html#quapy.method.aggregative.newSVMQ">newSVMQ() (in module quapy.method.aggregative)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.aggregative.newSVMRAE">newSVMRAE() (in module quapy.method.aggregative)</a>
|
||||
</li>
|
||||
<li><a href="quapy.html#quapy.error.nkld">nkld() (in module quapy.error)</a>
|
||||
</li>
|
||||
<li><a href="quapy.html#quapy.functional.normalize_prevalence">normalize_prevalence() (in module quapy.functional)</a>
|
||||
</li>
|
||||
<li><a href="quapy.html#quapy.protocol.NPP">NPP (class in quapy.protocol)</a>
|
||||
</li>
|
||||
<li><a href="quapy.html#quapy.functional.num_prevalence_combinations">num_prevalence_combinations() (in module quapy.functional)</a>
|
||||
</li>
|
||||
|
@ -630,7 +650,17 @@
|
|||
<h2 id="O">O</h2>
|
||||
<table style="width: 100%" class="indextable genindextable"><tr>
|
||||
<td style="width: 33%; vertical-align: top;"><ul>
|
||||
<li><a href="quapy.method.html#quapy.method.aggregative.OneVsAll">OneVsAll (class in quapy.method.aggregative)</a>
|
||||
<li><a href="quapy.html#quapy.protocol.OnLabelledCollectionProtocol.on_preclassified_instances">on_preclassified_instances() (quapy.protocol.OnLabelledCollectionProtocol method)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.base.OneVsAll">OneVsAll (class in quapy.method.base)</a>
|
||||
</li>
|
||||
</ul></td>
|
||||
<td style="width: 33%; vertical-align: top;"><ul>
|
||||
<li><a href="quapy.method.html#quapy.method.aggregative.OneVsAllAggregative">OneVsAllAggregative (class in quapy.method.aggregative)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.base.OneVsAllGeneric">OneVsAllGeneric (class in quapy.method.base)</a>
|
||||
</li>
|
||||
<li><a href="quapy.html#quapy.protocol.OnLabelledCollectionProtocol">OnLabelledCollectionProtocol (class in quapy.protocol)</a>
|
||||
</li>
|
||||
</ul></td>
|
||||
</tr></table>
|
||||
|
@ -638,6 +668,8 @@
|
|||
<h2 id="P">P</h2>
|
||||
<table style="width: 100%" class="indextable genindextable"><tr>
|
||||
<td style="width: 33%; vertical-align: top;"><ul>
|
||||
<li><a href="quapy.data.html#quapy.data.base.LabelledCollection.p">p (quapy.data.base.LabelledCollection property)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.aggregative.PACC">PACC (class in quapy.method.aggregative)</a>
|
||||
</li>
|
||||
<li><a href="quapy.html#quapy.util.parallel">parallel() (in module quapy.util)</a>
|
||||
|
@ -646,52 +678,44 @@
|
|||
</li>
|
||||
<li><a href="quapy.html#quapy.util.pickled_resource">pickled_resource() (in module quapy.util)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.aggregative.AggregativeProbabilisticQuantifier.posterior_probabilities">posterior_probabilities() (quapy.method.aggregative.AggregativeProbabilisticQuantifier method)</a>
|
||||
<li><a href="quapy.classification.html#quapy.classification.calibration.RecalibratedProbabilisticClassifierBase.predict">predict() (quapy.classification.calibration.RecalibratedProbabilisticClassifierBase method)</a>
|
||||
|
||||
<ul>
|
||||
<li><a href="quapy.method.html#quapy.method.aggregative.OneVsAll.posterior_probabilities">(quapy.method.aggregative.OneVsAll method)</a>
|
||||
<li><a href="quapy.classification.html#quapy.classification.methods.LowRankLogisticRegression.predict">(quapy.classification.methods.LowRankLogisticRegression method)</a>
|
||||
</li>
|
||||
</ul></li>
|
||||
<li><a href="quapy.classification.html#quapy.classification.methods.LowRankLogisticRegression.predict">predict() (quapy.classification.methods.LowRankLogisticRegression method)</a>
|
||||
|
||||
<ul>
|
||||
<li><a href="quapy.classification.html#quapy.classification.neural.NeuralClassifierTrainer.predict">(quapy.classification.neural.NeuralClassifierTrainer method)</a>
|
||||
</li>
|
||||
<li><a href="quapy.classification.html#quapy.classification.svmperf.SVMperf.predict">(quapy.classification.svmperf.SVMperf method)</a>
|
||||
</li>
|
||||
</ul></li>
|
||||
<li><a href="quapy.classification.html#quapy.classification.methods.LowRankLogisticRegression.predict_proba">predict_proba() (quapy.classification.methods.LowRankLogisticRegression method)</a>
|
||||
<li><a href="quapy.classification.html#quapy.classification.calibration.RecalibratedProbabilisticClassifierBase.predict_proba">predict_proba() (quapy.classification.calibration.RecalibratedProbabilisticClassifierBase method)</a>
|
||||
|
||||
<ul>
|
||||
<li><a href="quapy.classification.html#quapy.classification.methods.LowRankLogisticRegression.predict_proba">(quapy.classification.methods.LowRankLogisticRegression method)</a>
|
||||
</li>
|
||||
<li><a href="quapy.classification.html#quapy.classification.neural.NeuralClassifierTrainer.predict_proba">(quapy.classification.neural.NeuralClassifierTrainer method)</a>
|
||||
</li>
|
||||
<li><a href="quapy.classification.html#quapy.classification.neural.TextClassifierNet.predict_proba">(quapy.classification.neural.TextClassifierNet method)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.aggregative.AggregativeProbabilisticQuantifier.predict_proba">(quapy.method.aggregative.AggregativeProbabilisticQuantifier method)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.aggregative.EMQ.predict_proba">(quapy.method.aggregative.EMQ method)</a>
|
||||
</li>
|
||||
</ul></li>
|
||||
</ul></td>
|
||||
<td style="width: 33%; vertical-align: top;"><ul>
|
||||
<li><a href="quapy.html#quapy.evaluation.prediction">prediction() (in module quapy.evaluation)</a>
|
||||
</li>
|
||||
<li><a href="quapy.data.html#quapy.data.base.LabelledCollection.prevalence">prevalence() (quapy.data.base.LabelledCollection method)</a>
|
||||
</li>
|
||||
<li><a href="quapy.html#quapy.functional.prevalence_from_labels">prevalence_from_labels() (in module quapy.functional)</a>
|
||||
</li>
|
||||
<li><a href="quapy.html#quapy.functional.prevalence_from_probabilities">prevalence_from_probabilities() (in module quapy.functional)</a>
|
||||
</li>
|
||||
<li><a href="quapy.html#quapy.protocol.APP.prevalence_grid">prevalence_grid() (quapy.protocol.APP method)</a>
|
||||
</li>
|
||||
<li><a href="quapy.html#quapy.functional.prevalence_linspace">prevalence_linspace() (in module quapy.functional)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.aggregative.AggregativeProbabilisticQuantifier.probabilistic">probabilistic (quapy.method.aggregative.AggregativeProbabilisticQuantifier property)</a>
|
||||
|
||||
<ul>
|
||||
<li><a href="quapy.method.html#quapy.method.aggregative.OneVsAll.probabilistic">(quapy.method.aggregative.OneVsAll property)</a>
|
||||
<li><a href="quapy.method.html#quapy.method.meta.Ensemble.probabilistic">probabilistic (quapy.method.meta.Ensemble property)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.base.BaseQuantifier.probabilistic">(quapy.method.base.BaseQuantifier property)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.meta.Ensemble.probabilistic">(quapy.method.meta.Ensemble property)</a>
|
||||
</li>
|
||||
</ul></li>
|
||||
<li><a href="quapy.method.html#quapy.method.aggregative.ProbabilisticAdjustedClassifyAndCount">ProbabilisticAdjustedClassifyAndCount (in module quapy.method.aggregative)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.aggregative.ProbabilisticClassifyAndCount">ProbabilisticClassifyAndCount (in module quapy.method.aggregative)</a>
|
||||
|
@ -706,14 +730,12 @@
|
|||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.neural.QuaNetTrainer">QuaNetTrainer (class in quapy.method.neural)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.aggregative.AggregativeProbabilisticQuantifier.quantify">quantify() (quapy.method.aggregative.AggregativeProbabilisticQuantifier method)</a>
|
||||
<li><a href="quapy.method.html#quapy.method.aggregative.AggregativeQuantifier.quantify">quantify() (quapy.method.aggregative.AggregativeQuantifier method)</a>
|
||||
|
||||
<ul>
|
||||
<li><a href="quapy.method.html#quapy.method.aggregative.AggregativeQuantifier.quantify">(quapy.method.aggregative.AggregativeQuantifier method)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.aggregative.OneVsAll.quantify">(quapy.method.aggregative.OneVsAll method)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.base.BaseQuantifier.quantify">(quapy.method.base.BaseQuantifier method)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.base.OneVsAllGeneric.quantify">(quapy.method.base.OneVsAllGeneric method)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.meta.Ensemble.quantify">(quapy.method.meta.Ensemble method)</a>
|
||||
</li>
|
||||
|
@ -736,6 +758,13 @@
|
|||
|
||||
<ul>
|
||||
<li><a href="quapy.classification.html#module-quapy.classification">module</a>
|
||||
</li>
|
||||
</ul></li>
|
||||
<li>
|
||||
quapy.classification.calibration
|
||||
|
||||
<ul>
|
||||
<li><a href="quapy.classification.html#module-quapy.classification.calibration">module</a>
|
||||
</li>
|
||||
</ul></li>
|
||||
<li>
|
||||
|
@ -871,6 +900,13 @@
|
|||
|
||||
<ul>
|
||||
<li><a href="quapy.html#module-quapy.plot">module</a>
|
||||
</li>
|
||||
</ul></li>
|
||||
<li>
|
||||
quapy.protocol
|
||||
|
||||
<ul>
|
||||
<li><a href="quapy.html#module-quapy.protocol">module</a>
|
||||
</li>
|
||||
</ul></li>
|
||||
<li>
|
||||
|
@ -888,15 +924,25 @@
|
|||
<td style="width: 33%; vertical-align: top;"><ul>
|
||||
<li><a href="quapy.html#quapy.error.rae">rae() (in module quapy.error)</a>
|
||||
</li>
|
||||
<li><a href="quapy.data.html#quapy.data.preprocessing.reduce_columns">reduce_columns() (in module quapy.data.preprocessing)</a>
|
||||
<li><a href="quapy.html#quapy.protocol.AbstractStochasticSeededProtocol.random_state">random_state (quapy.protocol.AbstractStochasticSeededProtocol property)</a>
|
||||
</li>
|
||||
<li><a href="quapy.classification.html#quapy.classification.calibration.RecalibratedProbabilisticClassifier">RecalibratedProbabilisticClassifier (class in quapy.classification.calibration)</a>
|
||||
</li>
|
||||
<li><a href="quapy.classification.html#quapy.classification.calibration.RecalibratedProbabilisticClassifierBase">RecalibratedProbabilisticClassifierBase (class in quapy.classification.calibration)</a>
|
||||
</li>
|
||||
<li><a href="quapy.data.html#quapy.data.base.Dataset.reduce">reduce() (quapy.data.base.Dataset method)</a>
|
||||
</li>
|
||||
</ul></td>
|
||||
<td style="width: 33%; vertical-align: top;"><ul>
|
||||
<li><a href="quapy.data.html#quapy.data.preprocessing.reduce_columns">reduce_columns() (in module quapy.data.preprocessing)</a>
|
||||
</li>
|
||||
<li><a href="quapy.data.html#quapy.data.reader.reindex_labels">reindex_labels() (in module quapy.data.reader)</a>
|
||||
</li>
|
||||
<li><a href="quapy.html#quapy.error.relative_absolute_error">relative_absolute_error() (in module quapy.error)</a>
|
||||
</li>
|
||||
<li><a href="quapy.classification.html#quapy.classification.neural.NeuralClassifierTrainer.reset_net_params">reset_net_params() (quapy.classification.neural.NeuralClassifierTrainer method)</a>
|
||||
</li>
|
||||
<li><a href="quapy.html#quapy.protocol.OnLabelledCollectionProtocol.RETURN_TYPES">RETURN_TYPES (quapy.protocol.OnLabelledCollectionProtocol attribute)</a>
|
||||
</li>
|
||||
</ul></td>
|
||||
</tr></table>
|
||||
|
@ -904,6 +950,30 @@
|
|||
<h2 id="S">S</h2>
|
||||
<table style="width: 100%" class="indextable genindextable"><tr>
|
||||
<td style="width: 33%; vertical-align: top;"><ul>
|
||||
<li><a href="quapy.html#quapy.protocol.AbstractStochasticSeededProtocol.sample">sample() (quapy.protocol.AbstractStochasticSeededProtocol method)</a>
|
||||
|
||||
<ul>
|
||||
<li><a href="quapy.html#quapy.protocol.APP.sample">(quapy.protocol.APP method)</a>
|
||||
</li>
|
||||
<li><a href="quapy.html#quapy.protocol.DomainMixer.sample">(quapy.protocol.DomainMixer method)</a>
|
||||
</li>
|
||||
<li><a href="quapy.html#quapy.protocol.NPP.sample">(quapy.protocol.NPP method)</a>
|
||||
</li>
|
||||
<li><a href="quapy.html#quapy.protocol.UPP.sample">(quapy.protocol.UPP method)</a>
|
||||
</li>
|
||||
</ul></li>
|
||||
<li><a href="quapy.html#quapy.protocol.AbstractStochasticSeededProtocol.samples_parameters">samples_parameters() (quapy.protocol.AbstractStochasticSeededProtocol method)</a>
|
||||
|
||||
<ul>
|
||||
<li><a href="quapy.html#quapy.protocol.APP.samples_parameters">(quapy.protocol.APP method)</a>
|
||||
</li>
|
||||
<li><a href="quapy.html#quapy.protocol.DomainMixer.samples_parameters">(quapy.protocol.DomainMixer method)</a>
|
||||
</li>
|
||||
<li><a href="quapy.html#quapy.protocol.NPP.samples_parameters">(quapy.protocol.NPP method)</a>
|
||||
</li>
|
||||
<li><a href="quapy.html#quapy.protocol.UPP.samples_parameters">(quapy.protocol.UPP method)</a>
|
||||
</li>
|
||||
</ul></li>
|
||||
<li><a href="quapy.data.html#quapy.data.base.LabelledCollection.sampling">sampling() (quapy.data.base.LabelledCollection method)</a>
|
||||
</li>
|
||||
<li><a href="quapy.data.html#quapy.data.base.LabelledCollection.sampling_from_index">sampling_from_index() (quapy.data.base.LabelledCollection method)</a>
|
||||
|
@ -918,22 +988,10 @@
|
|||
|
||||
<ul>
|
||||
<li><a href="quapy.classification.html#quapy.classification.neural.NeuralClassifierTrainer.set_params">(quapy.classification.neural.NeuralClassifierTrainer method)</a>
|
||||
</li>
|
||||
<li><a href="quapy.classification.html#quapy.classification.svmperf.SVMperf.set_params">(quapy.classification.svmperf.SVMperf method)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.aggregative.AggregativeProbabilisticQuantifier.set_params">(quapy.method.aggregative.AggregativeProbabilisticQuantifier method)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.aggregative.AggregativeQuantifier.set_params">(quapy.method.aggregative.AggregativeQuantifier method)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.aggregative.OneVsAll.set_params">(quapy.method.aggregative.OneVsAll method)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.base.BaseQuantifier.set_params">(quapy.method.base.BaseQuantifier method)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.meta.Ensemble.set_params">(quapy.method.meta.Ensemble method)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.neural.QuaNetTrainer.set_params">(quapy.method.neural.QuaNetTrainer method)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.non_aggregative.MaximumLikelihoodPrevalenceEstimation.set_params">(quapy.method.non_aggregative.MaximumLikelihoodPrevalenceEstimation method)</a>
|
||||
</li>
|
||||
<li><a href="quapy.html#quapy.model_selection.GridSearchQ.set_params">(quapy.model_selection.GridSearchQ method)</a>
|
||||
</li>
|
||||
|
@ -941,10 +999,14 @@
|
|||
</ul></td>
|
||||
<td style="width: 33%; vertical-align: top;"><ul>
|
||||
<li><a href="quapy.method.html#quapy.method.aggregative.SLD">SLD (in module quapy.method.aggregative)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.aggregative.SMM">SMM (class in quapy.method.aggregative)</a>
|
||||
</li>
|
||||
<li><a href="quapy.html#quapy.error.smooth">smooth() (in module quapy.error)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.aggregative.ACC.solve_adjustment">solve_adjustment() (quapy.method.aggregative.ACC class method)</a>
|
||||
</li>
|
||||
<li><a href="quapy.data.html#quapy.data.base.LabelledCollection.split_random">split_random() (quapy.data.base.LabelledCollection method)</a>
|
||||
</li>
|
||||
<li><a href="quapy.data.html#quapy.data.base.LabelledCollection.split_stratified">split_stratified() (quapy.data.base.LabelledCollection method)</a>
|
||||
</li>
|
||||
|
@ -959,18 +1021,8 @@
|
|||
</li>
|
||||
</ul></li>
|
||||
<li><a href="quapy.html#quapy.functional.strprev">strprev() (in module quapy.functional)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.aggregative.SVMAE">SVMAE (class in quapy.method.aggregative)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.aggregative.SVMKLD">SVMKLD (class in quapy.method.aggregative)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.aggregative.SVMNKLD">SVMNKLD (class in quapy.method.aggregative)</a>
|
||||
</li>
|
||||
<li><a href="quapy.classification.html#quapy.classification.svmperf.SVMperf">SVMperf (class in quapy.classification.svmperf)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.aggregative.SVMQ">SVMQ (class in quapy.method.aggregative)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.aggregative.SVMRAE">SVMRAE (class in quapy.method.aggregative)</a>
|
||||
</li>
|
||||
</ul></td>
|
||||
</tr></table>
|
||||
|
@ -986,12 +1038,40 @@
|
|||
</li>
|
||||
<li><a href="quapy.classification.html#quapy.classification.neural.TextClassifierNet">TextClassifierNet (class in quapy.classification.neural)</a>
|
||||
</li>
|
||||
</ul></td>
|
||||
<td style="width: 33%; vertical-align: top;"><ul>
|
||||
<li><a href="quapy.method.html#quapy.method.aggregative.ThresholdOptimization">ThresholdOptimization (class in quapy.method.aggregative)</a>
|
||||
</li>
|
||||
<li><a href="quapy.html#quapy.functional.TopsoeDistance">TopsoeDistance() (in module quapy.functional)</a>
|
||||
</li>
|
||||
<li><a href="quapy.classification.html#quapy.classification.neural.TorchDataset">TorchDataset (class in quapy.classification.neural)</a>
|
||||
</li>
|
||||
<li><a href="quapy.html#quapy.protocol.AbstractProtocol.total">total() (quapy.protocol.AbstractProtocol method)</a>
|
||||
|
||||
<ul>
|
||||
<li><a href="quapy.html#quapy.protocol.APP.total">(quapy.protocol.APP method)</a>
|
||||
</li>
|
||||
<li><a href="quapy.html#quapy.protocol.DomainMixer.total">(quapy.protocol.DomainMixer method)</a>
|
||||
</li>
|
||||
<li><a href="quapy.html#quapy.protocol.IterateProtocol.total">(quapy.protocol.IterateProtocol method)</a>
|
||||
</li>
|
||||
<li><a href="quapy.html#quapy.protocol.NPP.total">(quapy.protocol.NPP method)</a>
|
||||
</li>
|
||||
<li><a href="quapy.html#quapy.protocol.UPP.total">(quapy.protocol.UPP method)</a>
|
||||
</li>
|
||||
</ul></li>
|
||||
</ul></td>
|
||||
<td style="width: 33%; vertical-align: top;"><ul>
|
||||
<li><a href="quapy.data.html#quapy.data.base.Dataset.train_test">train_test (quapy.data.base.Dataset property)</a>
|
||||
</li>
|
||||
<li><a href="quapy.classification.html#quapy.classification.neural.CNNnet.training">training (quapy.classification.neural.CNNnet attribute)</a>
|
||||
|
||||
<ul>
|
||||
<li><a href="quapy.classification.html#quapy.classification.neural.LSTMnet.training">(quapy.classification.neural.LSTMnet attribute)</a>
|
||||
</li>
|
||||
<li><a href="quapy.classification.html#quapy.classification.neural.TextClassifierNet.training">(quapy.classification.neural.TextClassifierNet attribute)</a>
|
||||
</li>
|
||||
<li><a href="quapy.method.html#quapy.method.neural.QuaNetModule.training">(quapy.method.neural.QuaNetModule attribute)</a>
|
||||
</li>
|
||||
</ul></li>
|
||||
<li><a href="quapy.classification.html#quapy.classification.methods.LowRankLogisticRegression.transform">transform() (quapy.classification.methods.LowRankLogisticRegression method)</a>
|
||||
|
||||
<ul>
|
||||
|
@ -1000,6 +1080,8 @@
|
|||
<li><a href="quapy.data.html#quapy.data.preprocessing.IndexTransformer.transform">(quapy.data.preprocessing.IndexTransformer method)</a>
|
||||
</li>
|
||||
</ul></li>
|
||||
<li><a href="quapy.classification.html#quapy.classification.calibration.TSCalibration">TSCalibration (class in quapy.classification.calibration)</a>
|
||||
</li>
|
||||
</ul></td>
|
||||
</tr></table>
|
||||
|
||||
|
@ -1010,11 +1092,15 @@
|
|||
</li>
|
||||
<li><a href="quapy.data.html#quapy.data.base.LabelledCollection.uniform_sampling">uniform_sampling() (quapy.data.base.LabelledCollection method)</a>
|
||||
</li>
|
||||
</ul></td>
|
||||
<td style="width: 33%; vertical-align: top;"><ul>
|
||||
<li><a href="quapy.data.html#quapy.data.base.LabelledCollection.uniform_sampling_index">uniform_sampling_index() (quapy.data.base.LabelledCollection method)</a>
|
||||
</li>
|
||||
</ul></td>
|
||||
<td style="width: 33%; vertical-align: top;"><ul>
|
||||
<li><a href="quapy.html#quapy.functional.uniform_simplex_sampling">uniform_simplex_sampling() (in module quapy.functional)</a>
|
||||
</li>
|
||||
<li><a href="quapy.html#quapy.protocol.UniformPrevalenceProtocol">UniformPrevalenceProtocol (in module quapy.protocol)</a>
|
||||
</li>
|
||||
<li><a href="quapy.html#quapy.protocol.UPP">UPP (class in quapy.protocol)</a>
|
||||
</li>
|
||||
</ul></td>
|
||||
</tr></table>
|
||||
|
@ -1039,6 +1125,8 @@
|
|||
</ul></td>
|
||||
<td style="width: 33%; vertical-align: top;"><ul>
|
||||
<li><a href="quapy.data.html#quapy.data.preprocessing.IndexTransformer.vocabulary_size">vocabulary_size() (quapy.data.preprocessing.IndexTransformer method)</a>
|
||||
</li>
|
||||
<li><a href="quapy.classification.html#quapy.classification.calibration.VSCalibration">VSCalibration (class in quapy.classification.calibration)</a>
|
||||
</li>
|
||||
</ul></td>
|
||||
</tr></table>
|
||||
|
@ -1055,16 +1143,30 @@
|
|||
<table style="width: 100%" class="indextable genindextable"><tr>
|
||||
<td style="width: 33%; vertical-align: top;"><ul>
|
||||
<li><a href="quapy.method.html#quapy.method.aggregative.X">X (class in quapy.method.aggregative)</a>
|
||||
|
||||
<ul>
|
||||
<li><a href="quapy.data.html#quapy.data.base.LabelledCollection.X">(quapy.data.base.LabelledCollection property)</a>
|
||||
</li>
|
||||
</ul></li>
|
||||
</ul></td>
|
||||
<td style="width: 33%; vertical-align: top;"><ul>
|
||||
<li><a href="quapy.classification.html#quapy.classification.neural.TextClassifierNet.xavier_uniform">xavier_uniform() (quapy.classification.neural.TextClassifierNet method)</a>
|
||||
</li>
|
||||
<li><a href="quapy.data.html#quapy.data.base.LabelledCollection.Xp">Xp (quapy.data.base.LabelledCollection property)</a>
|
||||
</li>
|
||||
<li><a href="quapy.data.html#quapy.data.base.LabelledCollection.Xy">Xy (quapy.data.base.LabelledCollection property)</a>
|
||||
</li>
|
||||
</ul></td>
|
||||
</tr></table>
|
||||
|
||||
<h2 id="Y">Y</h2>
|
||||
<table style="width: 100%" class="indextable genindextable"><tr>
|
||||
<td style="width: 33%; vertical-align: top;"><ul>
|
||||
<li><a href="quapy.data.html#quapy.data.base.LabelledCollection.y">y (quapy.data.base.LabelledCollection property)</a>
|
||||
</li>
|
||||
</ul></td>
|
||||
</tr></table>
|
||||
|
||||
|
||||
|
||||
<div class="clearer"></div>
|
||||
|
@ -1082,7 +1184,7 @@
|
|||
</form>
|
||||
</div>
|
||||
</div>
|
||||
<script>$('#searchbox').show(0);</script>
|
||||
<script>document.getElementById('searchbox').style.display = "block"</script>
|
||||
</div>
|
||||
</div>
|
||||
<div class="clearer"></div>
|
||||
|
@ -1096,13 +1198,13 @@
|
|||
<li class="right" >
|
||||
<a href="py-modindex.html" title="Python Module Index"
|
||||
>modules</a> |</li>
|
||||
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.6 documentation</a> »</li>
|
||||
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.7 documentation</a> »</li>
|
||||
<li class="nav-item nav-item-this"><a href="">Index</a></li>
|
||||
</ul>
|
||||
</div>
|
||||
<div class="footer" role="contentinfo">
|
||||
© Copyright 2021, Alejandro Moreo.
|
||||
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 4.2.0.
|
||||
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 5.3.0.
|
||||
</div>
|
||||
</body>
|
||||
</html>
|
|
@ -2,19 +2,21 @@
|
|||
|
||||
<!doctype html>
|
||||
|
||||
<html>
|
||||
<html lang="en">
|
||||
<head>
|
||||
<meta charset="utf-8" />
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="generator" content="Docutils 0.17.1: http://docutils.sourceforge.net/" />
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="generator" content="Docutils 0.19: https://docutils.sourceforge.io/" />
|
||||
|
||||
<title>Welcome to QuaPy’s documentation! — QuaPy 0.1.6 documentation</title>
|
||||
<title>Welcome to QuaPy’s documentation! — QuaPy 0.1.7 documentation</title>
|
||||
<link rel="stylesheet" type="text/css" href="_static/pygments.css" />
|
||||
<link rel="stylesheet" type="text/css" href="_static/bizstyle.css" />
|
||||
|
||||
<script data-url_root="./" id="documentation_options" src="_static/documentation_options.js"></script>
|
||||
<script src="_static/jquery.js"></script>
|
||||
<script src="_static/underscore.js"></script>
|
||||
<script src="_static/_sphinx_javascript_frameworks_compat.js"></script>
|
||||
<script src="_static/doctools.js"></script>
|
||||
<script src="_static/sphinx_highlight.js"></script>
|
||||
<script src="_static/bizstyle.js"></script>
|
||||
<link rel="index" title="Index" href="genindex.html" />
|
||||
<link rel="search" title="Search" href="search.html" />
|
||||
|
@ -36,7 +38,7 @@
|
|||
<li class="right" >
|
||||
<a href="Installation.html" title="Installation"
|
||||
accesskey="N">next</a> |</li>
|
||||
<li class="nav-item nav-item-0"><a href="#">QuaPy 0.1.6 documentation</a> »</li>
|
||||
<li class="nav-item nav-item-0"><a href="#">QuaPy 0.1.7 documentation</a> »</li>
|
||||
<li class="nav-item nav-item-this"><a href="">Welcome to QuaPy’s documentation!</a></li>
|
||||
</ul>
|
||||
</div>
|
||||
|
@ -47,11 +49,11 @@
|
|||
<div class="body" role="main">
|
||||
|
||||
<section id="welcome-to-quapy-s-documentation">
|
||||
<h1>Welcome to QuaPy’s documentation!<a class="headerlink" href="#welcome-to-quapy-s-documentation" title="Permalink to this headline">¶</a></h1>
|
||||
<h1>Welcome to QuaPy’s documentation!<a class="headerlink" href="#welcome-to-quapy-s-documentation" title="Permalink to this heading">¶</a></h1>
|
||||
<p>QuaPy is an open source framework for Quantification (a.k.a. Supervised Prevalence Estimation)
|
||||
written in Python.</p>
|
||||
<section id="introduction">
|
||||
<h2>Introduction<a class="headerlink" href="#introduction" title="Permalink to this headline">¶</a></h2>
|
||||
<h2>Introduction<a class="headerlink" href="#introduction" title="Permalink to this heading">¶</a></h2>
|
||||
<p>QuaPy is built around the concept of a data sample and provides implementations of the most important concepts
|
||||
in quantification literature, such as the most important quantification baselines, many advanced
|
||||
quantification methods, quantification-oriented model selection, many evaluation measures and protocols
|
||||
|
@ -59,7 +61,7 @@ used for evaluating quantification methods.
|
|||
QuaPy also integrates commonly used datasets and offers visualization tools for facilitating the analysis and
|
||||
interpretation of results.</p>
|
||||
<section id="a-quick-example">
|
||||
<h3>A quick example:<a class="headerlink" href="#a-quick-example" title="Permalink to this headline">¶</a></h3>
|
||||
<h3>A quick example:<a class="headerlink" href="#a-quick-example" title="Permalink to this heading">¶</a></h3>
|
||||
<p>The following script fetches a Twitter dataset, then trains and evaluates an
|
||||
<cite>Adjusted Classify & Count</cite> model in terms of the <cite>Mean Absolute Error</cite> (MAE)
|
||||
between the class prevalences estimated for the test set and the true prevalences
|
||||
|
@ -90,7 +92,7 @@ QuaPy implements sampling procedures and evaluation protocols that automates thi
|
|||
See the <a class="reference internal" href="Evaluation.html"><span class="doc">Evaluation</span></a> for detailed examples.</p>
|
||||
</section>
|
||||
<section id="features">
|
||||
<h3>Features<a class="headerlink" href="#features" title="Permalink to this headline">¶</a></h3>
|
||||
<h3>Features<a class="headerlink" href="#features" title="Permalink to this heading">¶</a></h3>
|
||||
<ul class="simple">
|
||||
<li><p>Implementation of most popular quantification methods (Classify-&-Count variants, Expectation-Maximization, SVM-based variants for quantification, HDy, QuaNet, and Ensembles).</p></li>
|
||||
<li><p>Versatile functionality for performing evaluation based on artificial sampling protocols.</p></li>
|
||||
|
@ -100,6 +102,7 @@ See the <a class="reference internal" href="Evaluation.html"><span class="doc">E
|
|||
<li><p>32 UCI Machine Learning datasets.</p></li>
|
||||
<li><p>11 Twitter Sentiment datasets.</p></li>
|
||||
<li><p>3 Reviews Sentiment datasets.</p></li>
|
||||
<li><p>4 tasks from LeQua competition (<em>new in v0.1.7!</em>)</p></li>
|
||||
</ul>
|
||||
</dd>
|
||||
</dl>
|
||||
|
@ -120,6 +123,7 @@ See the <a class="reference internal" href="Evaluation.html"><span class="doc">E
|
|||
<li class="toctree-l2"><a class="reference internal" href="Datasets.html#reviews-datasets">Reviews Datasets</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="Datasets.html#twitter-sentiment-datasets">Twitter Sentiment Datasets</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="Datasets.html#uci-machine-learning">UCI Machine Learning</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="Datasets.html#lequa-datasets">LeQua Datasets</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="Datasets.html#adding-custom-datasets">Adding Custom Datasets</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
|
@ -128,11 +132,23 @@ See the <a class="reference internal" href="Evaluation.html"><span class="doc">E
|
|||
<li class="toctree-l2"><a class="reference internal" href="Evaluation.html#evaluation-protocols">Evaluation Protocols</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="Protocols.html">Protocols</a><ul>
|
||||
<li class="toctree-l2"><a class="reference internal" href="Protocols.html#artificial-prevalence-protocol">Artificial-Prevalence Protocol</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="Protocols.html#sampling-from-the-unit-simplex-the-uniform-prevalence-protocol-upp">Sampling from the unit-simplex, the Uniform-Prevalence Protocol (UPP)</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="Protocols.html#natural-prevalence-protocol">Natural-Prevalence Protocol</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="Protocols.html#other-protocols">Other protocols</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="Methods.html">Quantification Methods</a><ul>
|
||||
<li class="toctree-l2"><a class="reference internal" href="Methods.html#aggregative-methods">Aggregative Methods</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="Methods.html#meta-models">Meta Models</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="Model-Selection.html">Model Selection</a><ul>
|
||||
<li class="toctree-l2"><a class="reference internal" href="Model-Selection.html#targeting-a-quantification-oriented-loss">Targeting a Quantification-oriented loss</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="Model-Selection.html#targeting-a-classification-oriented-loss">Targeting a Classification-oriented loss</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="Plotting.html">Plotting</a><ul>
|
||||
<li class="toctree-l2"><a class="reference internal" href="Plotting.html#diagonal-plot">Diagonal Plot</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="Plotting.html#quantification-bias">Quantification bias</a></li>
|
||||
|
@ -149,7 +165,7 @@ See the <a class="reference internal" href="Evaluation.html"><span class="doc">E
|
|||
</section>
|
||||
</section>
|
||||
<section id="indices-and-tables">
|
||||
<h1>Indices and tables<a class="headerlink" href="#indices-and-tables" title="Permalink to this headline">¶</a></h1>
|
||||
<h1>Indices and tables<a class="headerlink" href="#indices-and-tables" title="Permalink to this heading">¶</a></h1>
|
||||
<ul class="simple">
|
||||
<li><p><a class="reference internal" href="genindex.html"><span class="std std-ref">Index</span></a></p></li>
|
||||
<li><p><a class="reference internal" href="py-modindex.html"><span class="std std-ref">Module Index</span></a></p></li>
|
||||
|
@ -164,8 +180,9 @@ See the <a class="reference internal" href="Evaluation.html"><span class="doc">E
|
|||
</div>
|
||||
<div class="sphinxsidebar" role="navigation" aria-label="main navigation">
|
||||
<div class="sphinxsidebarwrapper">
|
||||
<h3><a href="#">Table of Contents</a></h3>
|
||||
<ul>
|
||||
<div>
|
||||
<h3><a href="#">Table of Contents</a></h3>
|
||||
<ul>
|
||||
<li><a class="reference internal" href="#">Welcome to QuaPy’s documentation!</a><ul>
|
||||
<li><a class="reference internal" href="#introduction">Introduction</a><ul>
|
||||
<li><a class="reference internal" href="#a-quick-example">A quick example:</a></li>
|
||||
|
@ -177,9 +194,12 @@ See the <a class="reference internal" href="Evaluation.html"><span class="doc">E
|
|||
<li><a class="reference internal" href="#indices-and-tables">Indices and tables</a></li>
|
||||
</ul>
|
||||
|
||||
<h4>Next topic</h4>
|
||||
<p class="topless"><a href="Installation.html"
|
||||
title="next chapter">Installation</a></p>
|
||||
</div>
|
||||
<div>
|
||||
<h4>Next topic</h4>
|
||||
<p class="topless"><a href="Installation.html"
|
||||
title="next chapter">Installation</a></p>
|
||||
</div>
|
||||
<div role="note" aria-label="source link">
|
||||
<h3>This Page</h3>
|
||||
<ul class="this-page-menu">
|
||||
|
@ -196,7 +216,7 @@ See the <a class="reference internal" href="Evaluation.html"><span class="doc">E
|
|||
</form>
|
||||
</div>
|
||||
</div>
|
||||
<script>$('#searchbox').show(0);</script>
|
||||
<script>document.getElementById('searchbox').style.display = "block"</script>
|
||||
</div>
|
||||
</div>
|
||||
<div class="clearer"></div>
|
||||
|
@ -213,13 +233,13 @@ See the <a class="reference internal" href="Evaluation.html"><span class="doc">E
|
|||
<li class="right" >
|
||||
<a href="Installation.html" title="Installation"
|
||||
>next</a> |</li>
|
||||
<li class="nav-item nav-item-0"><a href="#">QuaPy 0.1.6 documentation</a> »</li>
|
||||
<li class="nav-item nav-item-0"><a href="#">QuaPy 0.1.7 documentation</a> »</li>
|
||||
<li class="nav-item nav-item-this"><a href="">Welcome to QuaPy’s documentation!</a></li>
|
||||
</ul>
|
||||
</div>
|
||||
<div class="footer" role="contentinfo">
|
||||
© Copyright 2021, Alejandro Moreo.
|
||||
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 4.2.0.
|
||||
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 5.3.0.
|
||||
</div>
|
||||
</body>
|
||||
</html>
|
|
@ -2,19 +2,21 @@
|
|||
|
||||
<!doctype html>
|
||||
|
||||
<html>
|
||||
<html lang="en">
|
||||
<head>
|
||||
<meta charset="utf-8" />
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="generator" content="Docutils 0.17.1: http://docutils.sourceforge.net/" />
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="generator" content="Docutils 0.19: https://docutils.sourceforge.io/" />
|
||||
|
||||
<title>quapy — QuaPy 0.1.6 documentation</title>
|
||||
<title>quapy — QuaPy 0.1.7 documentation</title>
|
||||
<link rel="stylesheet" type="text/css" href="_static/pygments.css" />
|
||||
<link rel="stylesheet" type="text/css" href="_static/bizstyle.css" />
|
||||
|
||||
<script data-url_root="./" id="documentation_options" src="_static/documentation_options.js"></script>
|
||||
<script src="_static/jquery.js"></script>
|
||||
<script src="_static/underscore.js"></script>
|
||||
<script src="_static/_sphinx_javascript_frameworks_compat.js"></script>
|
||||
<script src="_static/doctools.js"></script>
|
||||
<script src="_static/sphinx_highlight.js"></script>
|
||||
<script src="_static/bizstyle.js"></script>
|
||||
<link rel="index" title="Index" href="genindex.html" />
|
||||
<link rel="search" title="Search" href="search.html" />
|
||||
|
@ -40,7 +42,7 @@
|
|||
<li class="right" >
|
||||
<a href="Plotting.html" title="Plotting"
|
||||
accesskey="P">previous</a> |</li>
|
||||
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.6 documentation</a> »</li>
|
||||
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.7 documentation</a> »</li>
|
||||
<li class="nav-item nav-item-this"><a href="">quapy</a></li>
|
||||
</ul>
|
||||
</div>
|
||||
|
@ -51,47 +53,19 @@
|
|||
<div class="body" role="main">
|
||||
|
||||
<section id="quapy">
|
||||
<h1>quapy<a class="headerlink" href="#quapy" title="Permalink to this headline">¶</a></h1>
|
||||
<h1>quapy<a class="headerlink" href="#quapy" title="Permalink to this heading">¶</a></h1>
|
||||
<div class="toctree-wrapper compound">
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="quapy.html">quapy package</a><ul>
|
||||
<li class="toctree-l2"><a class="reference internal" href="quapy.html#subpackages">Subpackages</a><ul>
|
||||
<li class="toctree-l3"><a class="reference internal" href="quapy.classification.html">quapy.classification package</a><ul>
|
||||
<li class="toctree-l4"><a class="reference internal" href="quapy.classification.html#submodules">Submodules</a></li>
|
||||
<li class="toctree-l4"><a class="reference internal" href="quapy.classification.html#module-quapy.classification.methods">quapy.classification.methods module</a></li>
|
||||
<li class="toctree-l4"><a class="reference internal" href="quapy.classification.html#module-quapy.classification.neural">quapy.classification.neural module</a></li>
|
||||
<li class="toctree-l4"><a class="reference internal" href="quapy.classification.html#module-quapy.classification.svmperf">quapy.classification.svmperf module</a></li>
|
||||
<li class="toctree-l4"><a class="reference internal" href="quapy.classification.html#module-quapy.classification">Module contents</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
<li class="toctree-l3"><a class="reference internal" href="quapy.data.html">quapy.data package</a><ul>
|
||||
<li class="toctree-l4"><a class="reference internal" href="quapy.data.html#submodules">Submodules</a></li>
|
||||
<li class="toctree-l4"><a class="reference internal" href="quapy.data.html#module-quapy.data.base">quapy.data.base module</a></li>
|
||||
<li class="toctree-l4"><a class="reference internal" href="quapy.data.html#module-quapy.data.datasets">quapy.data.datasets module</a></li>
|
||||
<li class="toctree-l4"><a class="reference internal" href="quapy.data.html#module-quapy.data.preprocessing">quapy.data.preprocessing module</a></li>
|
||||
<li class="toctree-l4"><a class="reference internal" href="quapy.data.html#module-quapy.data.reader">quapy.data.reader module</a></li>
|
||||
<li class="toctree-l4"><a class="reference internal" href="quapy.data.html#module-quapy.data">Module contents</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
<li class="toctree-l3"><a class="reference internal" href="quapy.method.html">quapy.method package</a><ul>
|
||||
<li class="toctree-l4"><a class="reference internal" href="quapy.method.html#submodules">Submodules</a></li>
|
||||
<li class="toctree-l4"><a class="reference internal" href="quapy.method.html#module-quapy.method.aggregative">quapy.method.aggregative module</a></li>
|
||||
<li class="toctree-l4"><a class="reference internal" href="quapy.method.html#module-quapy.method.base">quapy.method.base module</a></li>
|
||||
<li class="toctree-l4"><a class="reference internal" href="quapy.method.html#module-quapy.method.meta">quapy.method.meta module</a></li>
|
||||
<li class="toctree-l4"><a class="reference internal" href="quapy.method.html#module-quapy.method.neural">quapy.method.neural module</a></li>
|
||||
<li class="toctree-l4"><a class="reference internal" href="quapy.method.html#module-quapy.method.non_aggregative">quapy.method.non_aggregative module</a></li>
|
||||
<li class="toctree-l4"><a class="reference internal" href="quapy.method.html#module-quapy.method">Module contents</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="quapy.html#submodules">Submodules</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="quapy.html#module-quapy.error">quapy.error module</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="quapy.html#module-quapy.evaluation">quapy.evaluation module</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="quapy.html#module-quapy.functional">quapy.functional module</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="quapy.html#module-quapy.model_selection">quapy.model_selection module</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="quapy.html#module-quapy.plot">quapy.plot module</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="quapy.html#module-quapy.util">quapy.util module</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="quapy.html#module-quapy.error">quapy.error</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="quapy.html#module-quapy.evaluation">quapy.evaluation</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="quapy.html#quapy-protocol">quapy.protocol</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="quapy.html#module-quapy.functional">quapy.functional</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="quapy.html#module-quapy.model_selection">quapy.model_selection</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="quapy.html#module-quapy.plot">quapy.plot</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="quapy.html#module-quapy.util">quapy.util</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="quapy.html#subpackages">Subpackages</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="quapy.html#module-quapy">Module contents</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
|
@ -106,12 +80,16 @@
|
|||
</div>
|
||||
<div class="sphinxsidebar" role="navigation" aria-label="main navigation">
|
||||
<div class="sphinxsidebarwrapper">
|
||||
<h4>Previous topic</h4>
|
||||
<p class="topless"><a href="Plotting.html"
|
||||
title="previous chapter">Plotting</a></p>
|
||||
<h4>Next topic</h4>
|
||||
<p class="topless"><a href="quapy.html"
|
||||
title="next chapter">quapy package</a></p>
|
||||
<div>
|
||||
<h4>Previous topic</h4>
|
||||
<p class="topless"><a href="Plotting.html"
|
||||
title="previous chapter">Plotting</a></p>
|
||||
</div>
|
||||
<div>
|
||||
<h4>Next topic</h4>
|
||||
<p class="topless"><a href="quapy.html"
|
||||
title="next chapter">quapy package</a></p>
|
||||
</div>
|
||||
<div role="note" aria-label="source link">
|
||||
<h3>This Page</h3>
|
||||
<ul class="this-page-menu">
|
||||
|
@ -128,7 +106,7 @@
|
|||
</form>
|
||||
</div>
|
||||
</div>
|
||||
<script>$('#searchbox').show(0);</script>
|
||||
<script>document.getElementById('searchbox').style.display = "block"</script>
|
||||
</div>
|
||||
</div>
|
||||
<div class="clearer"></div>
|
||||
|
@ -148,13 +126,13 @@
|
|||
<li class="right" >
|
||||
<a href="Plotting.html" title="Plotting"
|
||||
>previous</a> |</li>
|
||||
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.6 documentation</a> »</li>
|
||||
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.7 documentation</a> »</li>
|
||||
<li class="nav-item nav-item-this"><a href="">quapy</a></li>
|
||||
</ul>
|
||||
</div>
|
||||
<div class="footer" role="contentinfo">
|
||||
© Copyright 2021, Alejandro Moreo.
|
||||
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 4.2.0.
|
||||
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 5.3.0.
|
||||
</div>
|
||||
</body>
|
||||
</html>
|
Binary file not shown.
|
@ -2,18 +2,20 @@
|
|||
|
||||
<!doctype html>
|
||||
|
||||
<html>
|
||||
<html lang="en">
|
||||
<head>
|
||||
<meta charset="utf-8" />
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
|
||||
<title>Python Module Index — QuaPy 0.1.6 documentation</title>
|
||||
<title>Python Module Index — QuaPy 0.1.7 documentation</title>
|
||||
<link rel="stylesheet" type="text/css" href="_static/pygments.css" />
|
||||
<link rel="stylesheet" type="text/css" href="_static/bizstyle.css" />
|
||||
|
||||
<script data-url_root="./" id="documentation_options" src="_static/documentation_options.js"></script>
|
||||
<script src="_static/jquery.js"></script>
|
||||
<script src="_static/underscore.js"></script>
|
||||
<script src="_static/_sphinx_javascript_frameworks_compat.js"></script>
|
||||
<script src="_static/doctools.js"></script>
|
||||
<script src="_static/sphinx_highlight.js"></script>
|
||||
<script src="_static/bizstyle.js"></script>
|
||||
<link rel="index" title="Index" href="genindex.html" />
|
||||
<link rel="search" title="Search" href="search.html" />
|
||||
|
@ -34,7 +36,7 @@
|
|||
<li class="right" >
|
||||
<a href="#" title="Python Module Index"
|
||||
>modules</a> |</li>
|
||||
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.6 documentation</a> »</li>
|
||||
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.7 documentation</a> »</li>
|
||||
<li class="nav-item nav-item-this"><a href="">Python Module Index</a></li>
|
||||
</ul>
|
||||
</div>
|
||||
|
@ -66,6 +68,11 @@
|
|||
<td>   
|
||||
<a href="quapy.classification.html#module-quapy.classification"><code class="xref">quapy.classification</code></a></td><td>
|
||||
<em></em></td></tr>
|
||||
<tr class="cg-1">
|
||||
<td></td>
|
||||
<td>   
|
||||
<a href="quapy.classification.html#module-quapy.classification.calibration"><code class="xref">quapy.classification.calibration</code></a></td><td>
|
||||
<em></em></td></tr>
|
||||
<tr class="cg-1">
|
||||
<td></td>
|
||||
<td>   
|
||||
|
@ -161,6 +168,11 @@
|
|||
<td>   
|
||||
<a href="quapy.html#module-quapy.plot"><code class="xref">quapy.plot</code></a></td><td>
|
||||
<em></em></td></tr>
|
||||
<tr class="cg-1">
|
||||
<td></td>
|
||||
<td>   
|
||||
<a href="quapy.html#module-quapy.protocol"><code class="xref">quapy.protocol</code></a></td><td>
|
||||
<em></em></td></tr>
|
||||
<tr class="cg-1">
|
||||
<td></td>
|
||||
<td>   
|
||||
|
@ -184,7 +196,7 @@
|
|||
</form>
|
||||
</div>
|
||||
</div>
|
||||
<script>$('#searchbox').show(0);</script>
|
||||
<script>document.getElementById('searchbox').style.display = "block"</script>
|
||||
</div>
|
||||
</div>
|
||||
<div class="clearer"></div>
|
||||
|
@ -198,13 +210,13 @@
|
|||
<li class="right" >
|
||||
<a href="#" title="Python Module Index"
|
||||
>modules</a> |</li>
|
||||
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.6 documentation</a> »</li>
|
||||
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.7 documentation</a> »</li>
|
||||
<li class="nav-item nav-item-this"><a href="">Python Module Index</a></li>
|
||||
</ul>
|
||||
</div>
|
||||
<div class="footer" role="contentinfo">
|
||||
© Copyright 2021, Alejandro Moreo.
|
||||
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 4.2.0.
|
||||
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 5.3.0.
|
||||
</div>
|
||||
</body>
|
||||
</html>
|
|
@ -2,19 +2,21 @@
|
|||
|
||||
<!doctype html>
|
||||
|
||||
<html>
|
||||
<html lang="en">
|
||||
<head>
|
||||
<meta charset="utf-8" />
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="generator" content="Docutils 0.17.1: http://docutils.sourceforge.net/" />
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="generator" content="Docutils 0.19: https://docutils.sourceforge.io/" />
|
||||
|
||||
<title>quapy.classification package — QuaPy 0.1.6 documentation</title>
|
||||
<title>quapy.classification package — QuaPy 0.1.7 documentation</title>
|
||||
<link rel="stylesheet" type="text/css" href="_static/pygments.css" />
|
||||
<link rel="stylesheet" type="text/css" href="_static/bizstyle.css" />
|
||||
|
||||
<script data-url_root="./" id="documentation_options" src="_static/documentation_options.js"></script>
|
||||
<script src="_static/jquery.js"></script>
|
||||
<script src="_static/underscore.js"></script>
|
||||
<script src="_static/_sphinx_javascript_frameworks_compat.js"></script>
|
||||
<script src="_static/doctools.js"></script>
|
||||
<script src="_static/sphinx_highlight.js"></script>
|
||||
<script src="_static/bizstyle.js"></script>
|
||||
<link rel="index" title="Index" href="genindex.html" />
|
||||
<link rel="search" title="Search" href="search.html" />
|
||||
|
@ -40,7 +42,7 @@
|
|||
<li class="right" >
|
||||
<a href="quapy.html" title="quapy package"
|
||||
accesskey="P">previous</a> |</li>
|
||||
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.6 documentation</a> »</li>
|
||||
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.7 documentation</a> »</li>
|
||||
<li class="nav-item nav-item-1"><a href="modules.html" >quapy</a> »</li>
|
||||
<li class="nav-item nav-item-2"><a href="quapy.html" accesskey="U">quapy package</a> »</li>
|
||||
<li class="nav-item nav-item-this"><a href="">quapy.classification package</a></li>
|
||||
|
@ -53,16 +55,232 @@
|
|||
<div class="body" role="main">
|
||||
|
||||
<section id="quapy-classification-package">
|
||||
<h1>quapy.classification package<a class="headerlink" href="#quapy-classification-package" title="Permalink to this headline">¶</a></h1>
|
||||
<h1>quapy.classification package<a class="headerlink" href="#quapy-classification-package" title="Permalink to this heading">¶</a></h1>
|
||||
<section id="submodules">
|
||||
<h2>Submodules<a class="headerlink" href="#submodules" title="Permalink to this headline">¶</a></h2>
|
||||
<h2>Submodules<a class="headerlink" href="#submodules" title="Permalink to this heading">¶</a></h2>
|
||||
</section>
|
||||
<section id="quapy-classification-calibration">
|
||||
<h2>quapy.classification.calibration<a class="headerlink" href="#quapy-classification-calibration" title="Permalink to this heading">¶</a></h2>
|
||||
<div class="versionadded">
|
||||
<p><span class="versionmodified added">New in version 0.1.7.</span></p>
|
||||
</div>
|
||||
<span class="target" id="module-quapy.classification.calibration"></span><dl class="py class">
|
||||
<dt class="sig sig-object py" id="quapy.classification.calibration.BCTSCalibration">
|
||||
<em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">quapy.classification.calibration.</span></span><span class="sig-name descname"><span class="pre">BCTSCalibration</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">classifier</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">val_split</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">5</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">n_jobs</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">verbose</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">False</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#quapy.classification.calibration.BCTSCalibration" title="Permalink to this definition">¶</a></dt>
|
||||
<dd><p>Bases: <a class="reference internal" href="#quapy.classification.calibration.RecalibratedProbabilisticClassifierBase" title="quapy.classification.calibration.RecalibratedProbabilisticClassifierBase"><code class="xref py py-class docutils literal notranslate"><span class="pre">RecalibratedProbabilisticClassifierBase</span></code></a></p>
|
||||
<p>Applies the Bias-Corrected Temperature Scaling (BCTS) calibration method from <cite>abstention.calibration</cite>, as defined in
|
||||
<a class="reference external" href="http://proceedings.mlr.press/v119/alexandari20a.html">the paper by Alexandari et al.</a></p>
|
||||
<dl class="field-list simple">
|
||||
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><ul class="simple">
|
||||
<li><p><strong>classifier</strong> – a scikit-learn probabilistic classifier</p></li>
|
||||
<li><p><strong>val_split</strong> – indicate an integer k for performing k-fold cross-validation (kFCV) to obtain the posterior probabilities, or a float p
|
||||
in (0,1) to indicate that the posteriors are obtained in a stratified validation split containing a fraction p of the
|
||||
training instances (the rest is used for training). In either case, the classifier is retrained on the whole
|
||||
training set afterwards. Default value is 5.</p></li>
|
||||
<li><p><strong>n_jobs</strong> – indicate the number of parallel workers (only when val_split is an integer)</p></li>
|
||||
<li><p><strong>verbose</strong> – whether or not to display information in the standard output</p></li>
|
||||
</ul>
|
||||
</dd>
|
||||
</dl>
|
||||
</dd></dl>
|
||||
|
||||
<dl class="py class">
|
||||
<dt class="sig sig-object py" id="quapy.classification.calibration.NBVSCalibration">
|
||||
<em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">quapy.classification.calibration.</span></span><span class="sig-name descname"><span class="pre">NBVSCalibration</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">classifier</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">val_split</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">5</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">n_jobs</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">verbose</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">False</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#quapy.classification.calibration.NBVSCalibration" title="Permalink to this definition">¶</a></dt>
|
||||
<dd><p>Bases: <a class="reference internal" href="#quapy.classification.calibration.RecalibratedProbabilisticClassifierBase" title="quapy.classification.calibration.RecalibratedProbabilisticClassifierBase"><code class="xref py py-class docutils literal notranslate"><span class="pre">RecalibratedProbabilisticClassifierBase</span></code></a></p>
|
||||
<p>Applies the No-Bias Vector Scaling (NBVS) calibration method from <cite>abstention.calibration</cite>, as defined in
|
||||
<a class="reference external" href="http://proceedings.mlr.press/v119/alexandari20a.html">the paper by Alexandari et al.</a></p>
|
||||
<dl class="field-list simple">
|
||||
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><ul class="simple">
|
||||
<li><p><strong>classifier</strong> – a scikit-learn probabilistic classifier</p></li>
|
||||
<li><p><strong>val_split</strong> – indicate an integer k for performing k-fold cross-validation (kFCV) to obtain the posterior probabilities, or a float p
|
||||
in (0,1) to indicate that the posteriors are obtained in a stratified validation split containing a proportion p of the
|
||||
training instances (the rest is used for training). When kFCV is used, the classifier is retrained on the whole
|
||||
training set afterwards. Default value is 5.</p></li>
|
||||
<li><p><strong>n_jobs</strong> – indicate the number of parallel workers (only when val_split is an integer)</p></li>
|
||||
<li><p><strong>verbose</strong> – whether or not to display information in the standard output</p></li>
|
||||
</ul>
|
||||
</dd>
|
||||
</dl>
|
||||
</dd></dl>
|
||||
|
||||
<dl class="py class">
|
||||
<dt class="sig sig-object py" id="quapy.classification.calibration.RecalibratedProbabilisticClassifier">
|
||||
<em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">quapy.classification.calibration.</span></span><span class="sig-name descname"><span class="pre">RecalibratedProbabilisticClassifier</span></span><a class="headerlink" href="#quapy.classification.calibration.RecalibratedProbabilisticClassifier" title="Permalink to this definition">¶</a></dt>
|
||||
<dd><p>Bases: <code class="xref py py-class docutils literal notranslate"><span class="pre">object</span></code></p>
|
||||
<p>Abstract class for (re)calibration methods from <cite>abstention.calibration</cite>, as defined in
|
||||
<a class="reference external" href="http://proceedings.mlr.press/v119/alexandari20a.html">Alexandari, A., Kundaje, A., & Shrikumar, A. (2020, November). Maximum likelihood with bias-corrected calibration
|
||||
is hard-to-beat at label shift adaptation. In International Conference on Machine Learning (pp. 222-232). PMLR.</a></p>
|
||||
</dd></dl>
|
||||
|
||||
<dl class="py class">
|
||||
<dt class="sig sig-object py" id="quapy.classification.calibration.RecalibratedProbabilisticClassifierBase">
|
||||
<em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">quapy.classification.calibration.</span></span><span class="sig-name descname"><span class="pre">RecalibratedProbabilisticClassifierBase</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">classifier</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">calibrator</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">val_split</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">5</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">n_jobs</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">verbose</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">False</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#quapy.classification.calibration.RecalibratedProbabilisticClassifierBase" title="Permalink to this definition">¶</a></dt>
|
||||
<dd><p>Bases: <code class="xref py py-class docutils literal notranslate"><span class="pre">BaseEstimator</span></code>, <a class="reference internal" href="#quapy.classification.calibration.RecalibratedProbabilisticClassifier" title="quapy.classification.calibration.RecalibratedProbabilisticClassifier"><code class="xref py py-class docutils literal notranslate"><span class="pre">RecalibratedProbabilisticClassifier</span></code></a></p>
|
||||
<p>Applies a (re)calibration method from <cite>abstention.calibration</cite>, as defined in
|
||||
<a class="reference external" href="http://proceedings.mlr.press/v119/alexandari20a.html">Alexandari et al. paper</a>:</p>
|
||||
<dl class="field-list simple">
|
||||
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><ul class="simple">
|
||||
<li><p><strong>classifier</strong> – a scikit-learn probabilistic classifier</p></li>
|
||||
<li><p><strong>calibrator</strong> – the calibration object (an instance of abstention.calibration.CalibratorFactory)</p></li>
|
||||
<li><p><strong>val_split</strong> – indicate an integer k for performing k-fold cross-validation (kFCV) to obtain the posterior probabilities, or a float p
|
||||
in (0,1) to indicate that the posteriors are obtained in a stratified validation split containing a proportion p of the
|
||||
training instances (the rest is used for training). When kFCV is used, the classifier is retrained on the whole
|
||||
training set afterwards. Default value is 5.</p></li>
|
||||
<li><p><strong>n_jobs</strong> – indicate the number of parallel workers (only when val_split is an integer); default=None</p></li>
|
||||
<li><p><strong>verbose</strong> – whether or not to display information in the standard output</p></li>
|
||||
</ul>
|
||||
</dd>
|
||||
</dl>
|
||||
<dl class="py property">
|
||||
<dt class="sig sig-object py" id="quapy.classification.calibration.RecalibratedProbabilisticClassifierBase.classes_">
|
||||
<em class="property"><span class="pre">property</span><span class="w"> </span></em><span class="sig-name descname"><span class="pre">classes_</span></span><a class="headerlink" href="#quapy.classification.calibration.RecalibratedProbabilisticClassifierBase.classes_" title="Permalink to this definition">¶</a></dt>
|
||||
<dd><p>Returns the classes on which the classifier has been trained</p>
|
||||
<dl class="field-list simple">
|
||||
<dt class="field-odd">Returns<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><p>array-like of shape <cite>(n_classes,)</cite></p>
|
||||
</dd>
|
||||
</dl>
|
||||
</dd></dl>
|
||||
|
||||
<dl class="py method">
|
||||
<dt class="sig sig-object py" id="quapy.classification.calibration.RecalibratedProbabilisticClassifierBase.fit">
|
||||
<span class="sig-name descname"><span class="pre">fit</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">X</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">y</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#quapy.classification.calibration.RecalibratedProbabilisticClassifierBase.fit" title="Permalink to this definition">¶</a></dt>
|
||||
<dd><p>Fits the calibration for the probabilistic classifier.</p>
|
||||
<dl class="field-list simple">
|
||||
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><ul class="simple">
|
||||
<li><p><strong>X</strong> – array-like of shape <cite>(n_samples, n_features)</cite> with the data instances</p></li>
|
||||
<li><p><strong>y</strong> – array-like of shape <cite>(n_samples,)</cite> with the class labels</p></li>
|
||||
</ul>
|
||||
</dd>
|
||||
<dt class="field-even">Returns<span class="colon">:</span></dt>
|
||||
<dd class="field-even"><p>self</p>
|
||||
</dd>
|
||||
</dl>
|
||||
</dd></dl>
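<p>The <cite>val_split</cite> description above suggests that fitting takes one of two routes depending on the type of that value. The sketch below illustrates such a dispatch; the function name <cite>choose_fit_strategy</cite> and the exact validation rules are assumptions made for illustration, not necessarily QuaPy's actual code.</p>

```python
def choose_fit_strategy(val_split):
    """Return the name of the fitting routine implied by `val_split`.

    Hypothetical helper: an integer k >= 2 selects k-fold cross-validation,
    a float in (0, 1) selects a single train/validation split.
    """
    # bool is a subclass of int in Python, so reject it explicitly
    if isinstance(val_split, bool):
        raise ValueError("val_split must be an int or a float, not a bool")
    if isinstance(val_split, int):
        if val_split < 2:
            raise ValueError("the number of folds must be at least 2")
        return "fit_cv"
    if isinstance(val_split, float):
        if not 0.0 < val_split < 1.0:
            raise ValueError("the validation proportion must lie in (0, 1)")
        return "fit_tr_val"
    raise ValueError("val_split must be an int or a float")


print(choose_fit_strategy(5))    # integer: cross-validation route
print(choose_fit_strategy(0.4))  # float: train/validation route
```

<p>Rejecting out-of-range values early keeps a mistyped <cite>val_split</cite> (for example <cite>1.5</cite>) from silently selecting the wrong routine.</p>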
|
||||
|
||||
<dl class="py method">
|
||||
<dt class="sig sig-object py" id="quapy.classification.calibration.RecalibratedProbabilisticClassifierBase.fit_cv">
|
||||
<span class="sig-name descname"><span class="pre">fit_cv</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">X</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">y</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#quapy.classification.calibration.RecalibratedProbabilisticClassifierBase.fit_cv" title="Permalink to this definition">¶</a></dt>
|
||||
<dd><p>Fits the calibration in a cross-validation manner, i.e., it generates posterior probabilities for all
|
||||
training instances via cross-validation, and then retrains the classifier on all training instances.
|
||||
The posterior probabilities thus generated are used for calibrating the outputs of the classifier.</p>
|
||||
<dl class="field-list simple">
|
||||
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><ul class="simple">
|
||||
<li><p><strong>X</strong> – array-like of shape <cite>(n_samples, n_features)</cite> with the data instances</p></li>
|
||||
<li><p><strong>y</strong> – array-like of shape <cite>(n_samples,)</cite> with the class labels</p></li>
|
||||
</ul>
|
||||
</dd>
|
||||
<dt class="field-even">Returns<span class="colon">:</span></dt>
|
||||
<dd class="field-even"><p>self</p>
|
||||
</dd>
|
||||
</dl>
|
||||
</dd></dl>
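<p>The cross-validation procedure described for <cite>fit_cv</cite> can be sketched in plain Python as follows. Everything here is illustrative: <cite>PriorClassifier</cite>, <cite>kfold_indices</cite> and <cite>out_of_fold_posteriors</cite> are hypothetical names, the folds are contiguous rather than stratified, and the calibration step itself is not shown.</p>

```python
from collections import Counter


def kfold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of nearly equal size."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


class PriorClassifier:
    """Toy stand-in for a probabilistic classifier: predicts class frequencies."""

    def fit(self, X, y):
        counts = Counter(y)
        self.classes_ = sorted(counts)
        self.priors_ = [counts[c] / len(y) for c in self.classes_]
        return self

    def predict_proba(self, X):
        return [list(self.priors_) for _ in X]


def out_of_fold_posteriors(make_classifier, X, y, k):
    """Return held-out posteriors for every instance plus a final classifier."""
    n = len(X)
    posteriors = [None] * n
    for held_out in kfold_indices(n, k):
        held = set(held_out)
        train = [i for i in range(n) if i not in held]
        clf = make_classifier().fit([X[i] for i in train], [y[i] for i in train])
        preds = clf.predict_proba([X[i] for i in held_out])
        for i, p in zip(held_out, preds):
            posteriors[i] = p  # each instance is predicted exactly once
    # after collecting the posteriors, retrain on all training instances
    final = make_classifier().fit(X, y)
    return posteriors, final


X = [[0.0], [1.0], [0.1], [0.9], [0.2], [0.8]]
y = [0, 1, 0, 1, 0, 1]
posteriors, final = out_of_fold_posteriors(PriorClassifier, X, y, k=3)
print(len(posteriors), final.classes_)
```

<p>The held-out posteriors, paired with the true labels, are what a calibrator would then be fitted on.</p>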
|
||||
|
||||
<dl class="py method">
|
||||
<dt class="sig sig-object py" id="quapy.classification.calibration.RecalibratedProbabilisticClassifierBase.fit_tr_val">
|
||||
<span class="sig-name descname"><span class="pre">fit_tr_val</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">X</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">y</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#quapy.classification.calibration.RecalibratedProbabilisticClassifierBase.fit_tr_val" title="Permalink to this definition">¶</a></dt>
|
||||
<dd><p>Fits the calibration in a train/val-split manner, i.e., it partitions the training instances into a
|
||||
training and a validation set, and then uses the training samples to learn a classifier, which is then used
|
||||
to generate posterior probabilities for the held-out validation data. These posteriors are used to calibrate
|
||||
the classifier. The classifier is not retrained on the whole dataset.</p>
|
||||
<dl class="field-list simple">
|
||||
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><ul class="simple">
|
||||
<li><p><strong>X</strong> – array-like of shape <cite>(n_samples, n_features)</cite> with the data instances</p></li>
|
||||
<li><p><strong>y</strong> – array-like of shape <cite>(n_samples,)</cite> with the class labels</p></li>
|
||||
</ul>
|
||||
</dd>
|
||||
<dt class="field-even">Returns<span class="colon">:</span></dt>
|
||||
<dd class="field-even"><p>self</p>
|
||||
</dd>
|
||||
</dl>
|
||||
</dd></dl>
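<p>The partition that <cite>fit_tr_val</cite> relies on is a stratified one, so each class keeps roughly the same share in both parts. A minimal sketch of such a split follows; <cite>stratified_split</cite> and its <cite>seed</cite> argument are hypothetical, and the library may well delegate this step to an existing utility instead.</p>

```python
import random
from collections import defaultdict


def stratified_split(y, val_proportion, seed=0):
    """Return (train_idx, val_idx) with about `val_proportion` of each class held out."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    train_idx, val_idx = [], []
    for label in sorted(by_class):
        idx = list(by_class[label])
        rng.shuffle(idx)
        # hold out at least one instance per class for validation
        n_val = max(1, round(len(idx) * val_proportion))
        val_idx.extend(idx[:n_val])
        train_idx.extend(idx[n_val:])
    return sorted(train_idx), sorted(val_idx)


y = [0] * 10 + [1] * 10
train_idx, val_idx = stratified_split(y, val_proportion=0.4)
print(len(train_idx), len(val_idx))
```

<p>In this mode the classifier would be trained on <cite>train_idx</cite> only, and its posteriors on <cite>val_idx</cite> would be used for calibration, with no retraining on the full set afterwards.</p>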
|
||||
|
||||
<dl class="py method">
|
||||
<dt class="sig sig-object py" id="quapy.classification.calibration.RecalibratedProbabilisticClassifierBase.predict">
|
||||
<span class="sig-name descname"><span class="pre">predict</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">X</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#quapy.classification.calibration.RecalibratedProbabilisticClassifierBase.predict" title="Permalink to this definition">¶</a></dt>
|
||||
<dd><p>Predicts class labels for the data instances in <cite>X</cite></p>
|
||||
<dl class="field-list simple">
|
||||
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><p><strong>X</strong> – array-like of shape <cite>(n_samples, n_features)</cite> with the data instances</p>
|
||||
</dd>
|
||||
<dt class="field-even">Returns<span class="colon">:</span></dt>
|
||||
<dd class="field-even"><p>array-like of shape <cite>(n_samples,)</cite> with the class label predictions</p>
|
||||
</dd>
|
||||
</dl>
|
||||
</dd></dl>
|
||||
|
||||
<dl class="py method">
|
||||
<dt class="sig sig-object py" id="quapy.classification.calibration.RecalibratedProbabilisticClassifierBase.predict_proba">
|
||||
<span class="sig-name descname"><span class="pre">predict_proba</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">X</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#quapy.classification.calibration.RecalibratedProbabilisticClassifierBase.predict_proba" title="Permalink to this definition">¶</a></dt>
|
||||
<dd><p>Generates posterior probabilities for the data instances in <cite>X</cite></p>
|
||||
<dl class="field-list simple">
|
||||
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><p><strong>X</strong> – array-like of shape <cite>(n_samples, n_features)</cite> with the data instances</p>
|
||||
</dd>
|
||||
<dt class="field-even">Returns<span class="colon">:</span></dt>
|
||||
<dd class="field-even"><p>array-like of shape <cite>(n_samples, n_classes)</cite> with posterior probabilities</p>
|
||||
</dd>
|
||||
</dl>
|
||||
</dd></dl>
|
||||
|
||||
</dd></dl>
|
||||
|
||||
<dl class="py class">
|
||||
<dt class="sig sig-object py" id="quapy.classification.calibration.TSCalibration">
|
||||
<em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">quapy.classification.calibration.</span></span><span class="sig-name descname"><span class="pre">TSCalibration</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">classifier</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">val_split</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">5</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">n_jobs</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">verbose</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">False</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#quapy.classification.calibration.TSCalibration" title="Permalink to this definition">¶</a></dt>
|
||||
<dd><p>Bases: <a class="reference internal" href="#quapy.classification.calibration.RecalibratedProbabilisticClassifierBase" title="quapy.classification.calibration.RecalibratedProbabilisticClassifierBase"><code class="xref py py-class docutils literal notranslate"><span class="pre">RecalibratedProbabilisticClassifierBase</span></code></a></p>
|
||||
<p>Applies the Temperature Scaling (TS) calibration method from <cite>abstention.calibration</cite>, as defined in
|
||||
<a class="reference external" href="http://proceedings.mlr.press/v119/alexandari20a.html">Alexandari et al. paper</a>:</p>
|
||||
<dl class="field-list simple">
|
||||
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><ul class="simple">
|
||||
<li><p><strong>classifier</strong> – a scikit-learn probabilistic classifier</p></li>
|
||||
<li><p><strong>val_split</strong> – indicate an integer k for performing k-fold cross-validation (kFCV) to obtain the posterior probabilities, or a float p
|
||||
in (0,1) to indicate that the posteriors are obtained in a stratified validation split containing a proportion p of the
|
||||
training instances (the rest is used for training). When kFCV is used, the classifier is retrained on the whole
|
||||
training set afterwards. Default value is 5.</p></li>
|
||||
<li><p><strong>n_jobs</strong> – indicate the number of parallel workers (only when val_split is an integer)</p></li>
|
||||
<li><p><strong>verbose</strong> – whether or not to display information in the standard output</p></li>
|
||||
</ul>
|
||||
</dd>
|
||||
</dl>
|
||||
</dd></dl>
|
||||
|
||||
<dl class="py class">
|
||||
<dt class="sig sig-object py" id="quapy.classification.calibration.VSCalibration">
|
||||
<em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">quapy.classification.calibration.</span></span><span class="sig-name descname"><span class="pre">VSCalibration</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">classifier</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">val_split</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">5</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">n_jobs</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">verbose</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">False</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#quapy.classification.calibration.VSCalibration" title="Permalink to this definition">¶</a></dt>
|
||||
<dd><p>Bases: <a class="reference internal" href="#quapy.classification.calibration.RecalibratedProbabilisticClassifierBase" title="quapy.classification.calibration.RecalibratedProbabilisticClassifierBase"><code class="xref py py-class docutils literal notranslate"><span class="pre">RecalibratedProbabilisticClassifierBase</span></code></a></p>
|
||||
<p>Applies the Vector Scaling (VS) calibration method from <cite>abstention.calibration</cite>, as defined in
|
||||
<a class="reference external" href="http://proceedings.mlr.press/v119/alexandari20a.html">Alexandari et al. paper</a>:</p>
|
||||
<dl class="field-list simple">
|
||||
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><ul class="simple">
|
||||
<li><p><strong>classifier</strong> – a scikit-learn probabilistic classifier</p></li>
|
||||
<li><p><strong>val_split</strong> – indicate an integer k for performing k-fold cross-validation (kFCV) to obtain the posterior probabilities, or a float p
|
||||
in (0,1) to indicate that the posteriors are obtained in a stratified validation split containing a proportion p of the
|
||||
training instances (the rest is used for training). When kFCV is used, the classifier is retrained on the whole
|
||||
training set afterwards. Default value is 5.</p></li>
|
||||
<li><p><strong>n_jobs</strong> – indicate the number of parallel workers (only when val_split is an integer)</p></li>
|
||||
<li><p><strong>verbose</strong> – whether or not to display information in the standard output</p></li>
|
||||
</ul>
|
||||
</dd>
|
||||
</dl>
|
||||
</dd></dl>
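<p>The scaling-based calibrators documented above differ only in how they transform the classifier's scores before the softmax. The sketch below shows the transformations as they are commonly defined (temperature scaling, no-bias vector scaling, vector scaling); the function names are illustrative, the parameter values are made up, and the fitting of those parameters on held-out posteriors, which <cite>abstention.calibration</cite> takes care of, is not shown.</p>

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temperature_scaling(logits, temperature):
    """TS: divide every logit by one shared temperature."""
    return softmax([z / temperature for z in logits])


def no_bias_vector_scaling(logits, weights):
    """NBVS: one multiplicative weight per class, no bias term."""
    return softmax([w * z for w, z in zip(weights, logits)])


def vector_scaling(logits, weights, biases):
    """VS: one weight and one bias per class."""
    return softmax([w * z + b for w, z, b in zip(weights, logits, biases)])


logits = [2.0, 1.0, 0.1]
print(temperature_scaling(logits, temperature=2.0))
print(no_bias_vector_scaling(logits, weights=[0.5, 1.0, 1.5]))
print(vector_scaling(logits, weights=[0.5, 1.0, 1.5], biases=[0.0, 0.2, -0.2]))
```

<p>Seen this way, NBVS with all weights equal to <cite>1/T</cite> reduces to TS with temperature <cite>T</cite>, and VS with zero biases reduces to NBVS.</p>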
|
||||
|
||||
</section>
|
||||
<section id="module-quapy.classification.methods">
|
||||
<span id="quapy-classification-methods-module"></span><h2>quapy.classification.methods module<a class="headerlink" href="#module-quapy.classification.methods" title="Permalink to this headline">¶</a></h2>
|
||||
<span id="quapy-classification-methods"></span><h2>quapy.classification.methods<a class="headerlink" href="#module-quapy.classification.methods" title="Permalink to this heading">¶</a></h2>
|
||||
<dl class="py class">
|
||||
<dt class="sig sig-object py" id="quapy.classification.methods.LowRankLogisticRegression">
|
||||
<em class="property"><span class="pre">class</span> </em><span class="sig-prename descclassname"><span class="pre">quapy.classification.methods.</span></span><span class="sig-name descname"><span class="pre">LowRankLogisticRegression</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">n_components</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">100</span></span></em>, <em class="sig-param"><span class="o"><span class="pre">**</span></span><span class="n"><span class="pre">kwargs</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#quapy.classification.methods.LowRankLogisticRegression" title="Permalink to this definition">¶</a></dt>
|
||||
<dd><p>Bases: <code class="xref py py-class docutils literal notranslate"><span class="pre">sklearn.base.BaseEstimator</span></code></p>
|
||||
<em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">quapy.classification.methods.</span></span><span class="sig-name descname"><span class="pre">LowRankLogisticRegression</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">n_components</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">100</span></span></em>, <em class="sig-param"><span class="o"><span class="pre">**</span></span><span class="n"><span class="pre">kwargs</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#quapy.classification.methods.LowRankLogisticRegression" title="Permalink to this definition">¶</a></dt>
|
||||
<dd><p>Bases: <code class="xref py py-class docutils literal notranslate"><span class="pre">BaseEstimator</span></code></p>
|
||||
<p>An example of a classification method (i.e., an object that implements <cite>fit</cite>, <cite>predict</cite>, and <cite>predict_proba</cite>)
|
||||
that also generates embedded inputs (i.e., that implements <cite>transform</cite>), as those required for
|
||||
<code class="xref py py-class docutils literal notranslate"><span class="pre">quapy.method.neural.QuaNet</span></code>. This is a mock method to allow for easily instantiating
|
||||
|
@ -70,7 +288,7 @@ that also generates embedded inputs (i.e., that implements <cite>transform</cite
|
|||
The transformation consists of applying <code class="xref py py-class docutils literal notranslate"><span class="pre">sklearn.decomposition.TruncatedSVD</span></code>
|
||||
while classification is performed using <code class="xref py py-class docutils literal notranslate"><span class="pre">sklearn.linear_model.LogisticRegression</span></code> on the low-rank space.</p>
|
||||
<dl class="field-list simple">
|
||||
<dt class="field-odd">Parameters</dt>
|
||||
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><ul class="simple">
|
||||
<li><p><strong>n_components</strong> – the number of principal components to retain</p></li>
|
||||
<li><p><strong>kwargs</strong> – parameters for the
|
||||
|
@ -84,13 +302,13 @@ while classification is performed using <code class="xref py py-class docutils l
|
|||
<dd><p>Fit the model according to the given training data. The fit consists of
|
||||
fitting <cite>TruncatedSVD</cite> and then <cite>LogisticRegression</cite> on the low-rank representation.</p>
|
||||
<dl class="field-list simple">
|
||||
<dt class="field-odd">Parameters</dt>
|
||||
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><ul class="simple">
|
||||
<li><p><strong>X</strong> – array-like of shape <cite>(n_samples, n_features)</cite> with the instances</p></li>
|
||||
<li><p><strong>y</strong> – array-like of shape <cite>(n_samples,)</cite> with the class labels</p></li>
|
||||
</ul>
|
||||
</dd>
|
||||
<dt class="field-even">Returns</dt>
|
||||
<dt class="field-even">Returns<span class="colon">:</span></dt>
|
||||
<dd class="field-even"><p><cite>self</cite></p>
|
||||
</dd>
|
||||
</dl>
|
||||
|
@ -101,7 +319,7 @@ fitting <cite>TruncatedSVD</cite> and then <cite>LogisticRegression</cite> on th
|
|||
<span class="sig-name descname"><span class="pre">get_params</span></span><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="headerlink" href="#quapy.classification.methods.LowRankLogisticRegression.get_params" title="Permalink to this definition">¶</a></dt>
|
||||
<dd><p>Get hyper-parameters for this estimator.</p>
|
||||
<dl class="field-list simple">
|
||||
<dt class="field-odd">Returns</dt>
|
||||
<dt class="field-odd">Returns<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><p>a dictionary with parameter names mapped to their values</p>
|
||||
</dd>
|
||||
</dl>
|
||||
|
@ -112,10 +330,10 @@ fitting <cite>TruncatedSVD</cite> and then <cite>LogisticRegression</cite> on th
|
|||
<span class="sig-name descname"><span class="pre">predict</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">X</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#quapy.classification.methods.LowRankLogisticRegression.predict" title="Permalink to this definition">¶</a></dt>
|
||||
<dd><p>Predicts labels for the instances <cite>X</cite> embedded into the low-rank space.</p>
|
||||
<dl class="field-list simple">
|
||||
<dt class="field-odd">Parameters</dt>
|
||||
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><p><strong>X</strong> – array-like of shape <cite>(n_samples, n_features)</cite> instances to classify</p>
|
||||
</dd>
|
||||
<dt class="field-even">Returns</dt>
|
||||
<dt class="field-even">Returns<span class="colon">:</span></dt>
|
||||
<dd class="field-even"><p>a <cite>numpy</cite> array of length <cite>n</cite> containing the label predictions, where <cite>n</cite> is the number of
|
||||
instances in <cite>X</cite></p>
|
||||
</dd>
|
||||
|
@ -127,10 +345,10 @@ instances in <cite>X</cite></p>
|
|||
<span class="sig-name descname"><span class="pre">predict_proba</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">X</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#quapy.classification.methods.LowRankLogisticRegression.predict_proba" title="Permalink to this definition">¶</a></dt>
|
||||
<dd><p>Predicts posterior probabilities for the instances <cite>X</cite> embedded into the low-rank space.</p>
|
||||
<dl class="field-list simple">
|
||||
<dt class="field-odd">Parameters</dt>
|
||||
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><p><strong>X</strong> – array-like of shape <cite>(n_samples, n_features)</cite> instances to classify</p>
|
||||
</dd>
|
||||
<dt class="field-even">Returns</dt>
|
||||
<dt class="field-even">Returns<span class="colon">:</span></dt>
|
||||
<dd class="field-even"><p>array-like of shape <cite>(n_samples, n_classes)</cite> with the posterior probabilities</p>
|
||||
</dd>
|
||||
</dl>
|
||||
|
@ -141,7 +359,7 @@ instances in <cite>X</cite></p>
|
|||
<span class="sig-name descname"><span class="pre">set_params</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="o"><span class="pre">**</span></span><span class="n"><span class="pre">params</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#quapy.classification.methods.LowRankLogisticRegression.set_params" title="Permalink to this definition">¶</a></dt>
|
||||
<dd><p>Set the parameters of this estimator.</p>
|
||||
<dl class="field-list simple">
|
||||
<dt class="field-odd">Parameters</dt>
|
||||
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><p><strong>parameters</strong> – a <cite>**kwargs</cite> dictionary with the estimator parameters for
|
||||
<a class="reference external" href="https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html">Logistic Regression</a>
|
||||
and, optionally, <cite>n_components</cite> for <cite>TruncatedSVD</cite></p>
|
||||
|
@ -155,10 +373,10 @@ and eventually also <cite>n_components</cite> for <cite>TruncatedSVD</cite></p>
|
|||
<dd><p>Returns the low-rank approximation of <cite>X</cite> with <cite>n_components</cite> dimensions, or <cite>X</cite> unaltered if
|
||||
<cite>n_components</cite> >= <cite>X.shape[1]</cite>.</p>
|
||||
<dl class="field-list simple">
|
||||
<dt class="field-odd">Parameters</dt>
|
||||
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><p><strong>X</strong> – array-like of shape <cite>(n_samples, n_features)</cite> instances to embed</p>
|
||||
</dd>
|
||||
<dt class="field-even">Returns</dt>
|
||||
<dt class="field-even">Returns<span class="colon">:</span></dt>
|
||||
<dd class="field-even"><p>array-like of shape <cite>(n_samples, n_components)</cite> with the embedded instances</p>
|
||||
</dd>
|
||||
</dl>
|
||||
|
@ -168,15 +386,15 @@ and eventually also <cite>n_components</cite> for <cite>TruncatedSVD</cite></p>
|
|||
|
||||
</section>
|
||||
<section id="module-quapy.classification.neural">
|
||||
<span id="quapy-classification-neural-module"></span><h2>quapy.classification.neural module<a class="headerlink" href="#module-quapy.classification.neural" title="Permalink to this headline">¶</a></h2>
|
||||
<span id="quapy-classification-neural"></span><h2>quapy.classification.neural<a class="headerlink" href="#module-quapy.classification.neural" title="Permalink to this heading">¶</a></h2>
|
||||
<dl class="py class">
|
||||
<dt class="sig sig-object py" id="quapy.classification.neural.CNNnet">
|
||||
<em class="property"><span class="pre">class</span> </em><span class="sig-prename descclassname"><span class="pre">quapy.classification.neural.</span></span><span class="sig-name descname"><span class="pre">CNNnet</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">vocabulary_size</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">n_classes</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">embedding_size</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">100</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">hidden_size</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">256</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">repr_size</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">100</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">kernel_heights</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">[3,</span> <span class="pre">5,</span> <span class="pre">7]</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">stride</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">1</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">padding</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">0</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">drop_p</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">0.5</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#quapy.classification.neural.CNNnet" title="Permalink to this definition">¶</a></dt>
|
||||
<dd><p>Bases: <a class="reference internal" href="#quapy.classification.neural.TextClassifierNet" title="quapy.classification.neural.TextClassifierNet"><code class="xref py py-class docutils literal notranslate"><span class="pre">quapy.classification.neural.TextClassifierNet</span></code></a></p>
|
||||
<em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">quapy.classification.neural.</span></span><span class="sig-name descname"><span class="pre">CNNnet</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">vocabulary_size</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">n_classes</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">embedding_size</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">100</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">hidden_size</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">256</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">repr_size</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">100</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">kernel_heights</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">[3,</span> <span class="pre">5,</span> <span class="pre">7]</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">stride</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">1</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">padding</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">0</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">drop_p</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">0.5</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#quapy.classification.neural.CNNnet" title="Permalink to this definition">¶</a></dt>
|
||||
<dd><p>Bases: <a class="reference internal" href="#quapy.classification.neural.TextClassifierNet" title="quapy.classification.neural.TextClassifierNet"><code class="xref py py-class docutils literal notranslate"><span class="pre">TextClassifierNet</span></code></a></p>
|
||||
<p>An implementation of <a class="reference internal" href="#quapy.classification.neural.TextClassifierNet" title="quapy.classification.neural.TextClassifierNet"><code class="xref py py-class docutils literal notranslate"><span class="pre">quapy.classification.neural.TextClassifierNet</span></code></a> based on
|
||||
Convolutional Neural Networks.</p>
|
||||
<dl class="field-list simple">
|
||||
<dt class="field-odd">Parameters</dt>
|
||||
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><ul class="simple">
|
||||
<li><p><strong>vocabulary_size</strong> – the size of the vocabulary</p></li>
|
||||
<li><p><strong>n_classes</strong> – number of target classes</p></li>
|
||||
|
@ -197,11 +415,11 @@ consecutive tokens that each kernel covers</p></li>
|
|||
<dd><p>Embeds documents (i.e., performs the forward pass up to the
|
||||
next-to-last layer).</p>
|
||||
<dl class="field-list simple">
|
||||
<dt class="field-odd">Parameters</dt>
|
||||
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><p><strong>input</strong> – a batch of instances, typically generated by a torch <cite>DataLoader</cite>
|
||||
instance (see <a class="reference internal" href="#quapy.classification.neural.TorchDataset" title="quapy.classification.neural.TorchDataset"><code class="xref py py-class docutils literal notranslate"><span class="pre">quapy.classification.neural.TorchDataset</span></code></a>)</p>
|
||||
</dd>
|
||||
<dt class="field-even">Returns</dt>
|
||||
<dt class="field-even">Returns<span class="colon">:</span></dt>
|
||||
<dd class="field-even"><p>a torch tensor of shape <cite>(n_samples, n_dimensions)</cite>, where
|
||||
<cite>n_samples</cite> is the number of documents, and <cite>n_dimensions</cite> is the
|
||||
dimensionality of the embedding</p>
|
||||
|
@ -214,18 +432,23 @@ dimensionality of the embedding</p>
|
|||
<span class="sig-name descname"><span class="pre">get_params</span></span><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="headerlink" href="#quapy.classification.neural.CNNnet.get_params" title="Permalink to this definition">¶</a></dt>
|
||||
<dd><p>Get hyper-parameters for this estimator</p>
|
||||
<dl class="field-list simple">
|
||||
<dt class="field-odd">Returns</dt>
|
||||
<dt class="field-odd">Returns<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><p>a dictionary with parameter names mapped to their values</p>
|
||||
</dd>
|
||||
</dl>
|
||||
</dd></dl>
|
||||
|
||||
<dl class="py attribute">
|
||||
<dt class="sig sig-object py" id="quapy.classification.neural.CNNnet.training">
|
||||
<span class="sig-name descname"><span class="pre">training</span></span><em class="property"><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="pre">bool</span></em><a class="headerlink" href="#quapy.classification.neural.CNNnet.training" title="Permalink to this definition">¶</a></dt>
|
||||
<dd></dd></dl>
|
||||
|
||||
<dl class="py property">
|
||||
<dt class="sig sig-object py" id="quapy.classification.neural.CNNnet.vocabulary_size">
|
||||
<em class="property"><span class="pre">property</span> </em><span class="sig-name descname"><span class="pre">vocabulary_size</span></span><a class="headerlink" href="#quapy.classification.neural.CNNnet.vocabulary_size" title="Permalink to this definition">¶</a></dt>
|
||||
<em class="property"><span class="pre">property</span><span class="w"> </span></em><span class="sig-name descname"><span class="pre">vocabulary_size</span></span><a class="headerlink" href="#quapy.classification.neural.CNNnet.vocabulary_size" title="Permalink to this definition">¶</a></dt>
|
||||
<dd><p>Return the size of the vocabulary</p>
|
||||
<dl class="field-list simple">
|
||||
<dt class="field-odd">Returns</dt>
|
||||
<dt class="field-odd">Returns<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><p>integer</p>
|
||||
</dd>
|
||||
</dl>
|
||||
|
@ -235,12 +458,12 @@ dimensionality of the embedding</p>
|
|||
|
||||
<dl class="py class">
|
||||
<dt class="sig sig-object py" id="quapy.classification.neural.LSTMnet">
|
||||
<em class="property"><span class="pre">class</span> </em><span class="sig-prename descclassname"><span class="pre">quapy.classification.neural.</span></span><span class="sig-name descname"><span class="pre">LSTMnet</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">vocabulary_size</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">n_classes</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">embedding_size</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">100</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">hidden_size</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">256</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">repr_size</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">100</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">lstm_class_nlayers</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">1</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">drop_p</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">0.5</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#quapy.classification.neural.LSTMnet" title="Permalink to this definition">¶</a></dt>
|
||||
<dd><p>Bases: <a class="reference internal" href="#quapy.classification.neural.TextClassifierNet" title="quapy.classification.neural.TextClassifierNet"><code class="xref py py-class docutils literal notranslate"><span class="pre">quapy.classification.neural.TextClassifierNet</span></code></a></p>
|
||||
<em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">quapy.classification.neural.</span></span><span class="sig-name descname"><span class="pre">LSTMnet</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">vocabulary_size</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">n_classes</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">embedding_size</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">100</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">hidden_size</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">256</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">repr_size</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">100</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">lstm_class_nlayers</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">1</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">drop_p</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">0.5</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#quapy.classification.neural.LSTMnet" title="Permalink to this definition">¶</a></dt>
|
||||
<dd><p>Bases: <a class="reference internal" href="#quapy.classification.neural.TextClassifierNet" title="quapy.classification.neural.TextClassifierNet"><code class="xref py py-class docutils literal notranslate"><span class="pre">TextClassifierNet</span></code></a></p>
|
||||
<p>An implementation of <a class="reference internal" href="#quapy.classification.neural.TextClassifierNet" title="quapy.classification.neural.TextClassifierNet"><code class="xref py py-class docutils literal notranslate"><span class="pre">quapy.classification.neural.TextClassifierNet</span></code></a> based on
|
||||
Long Short Term Memory networks.</p>
|
||||
<dl class="field-list simple">
|
||||
<dt class="field-odd">Parameters</dt>
|
||||
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><ul class="simple">
|
||||
<li><p><strong>vocabulary_size</strong> – the size of the vocabulary</p></li>
|
||||
<li><p><strong>n_classes</strong> – number of target classes</p></li>
|
||||
|
@ -258,11 +481,11 @@ Long Short Term Memory networks.</p>
|
|||
<dd><p>Embeds documents (i.e., performs the forward pass up to the
|
||||
next-to-last layer).</p>
|
||||
<dl class="field-list simple">
|
||||
<dt class="field-odd">Parameters</dt>
|
||||
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><p><strong>x</strong> – a batch of instances, typically generated by a torch <cite>DataLoader</cite>
|
||||
instance (see <a class="reference internal" href="#quapy.classification.neural.TorchDataset" title="quapy.classification.neural.TorchDataset"><code class="xref py py-class docutils literal notranslate"><span class="pre">quapy.classification.neural.TorchDataset</span></code></a>)</p>
|
||||
</dd>
|
||||
<dt class="field-even">Returns</dt>
|
||||
<dt class="field-even">Returns<span class="colon">:</span></dt>
|
||||
<dd class="field-even"><p>a torch tensor of shape <cite>(n_samples, n_dimensions)</cite>, where
|
||||
<cite>n_samples</cite> is the number of documents, and <cite>n_dimensions</cite> is the
|
||||
dimensionality of the embedding</p>
|
||||
|
@ -275,18 +498,23 @@ dimensionality of the embedding</p>
|
|||
<span class="sig-name descname"><span class="pre">get_params</span></span><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="headerlink" href="#quapy.classification.neural.LSTMnet.get_params" title="Permalink to this definition">¶</a></dt>
|
||||
<dd><p>Get hyper-parameters for this estimator</p>
|
||||
<dl class="field-list simple">
|
||||
<dt class="field-odd">Returns</dt>
|
||||
<dt class="field-odd">Returns<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><p>a dictionary with parameter names mapped to their values</p>
|
||||
</dd>
|
||||
</dl>
|
||||
</dd></dl>
|
||||
|
||||
<dl class="py attribute">
|
||||
<dt class="sig sig-object py" id="quapy.classification.neural.LSTMnet.training">
|
||||
<span class="sig-name descname"><span class="pre">training</span></span><em class="property"><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="pre">bool</span></em><a class="headerlink" href="#quapy.classification.neural.LSTMnet.training" title="Permalink to this definition">¶</a></dt>
|
||||
<dd></dd></dl>
|
||||
|
||||
<dl class="py property">
|
||||
<dt class="sig sig-object py" id="quapy.classification.neural.LSTMnet.vocabulary_size">
|
||||
<em class="property"><span class="pre">property</span> </em><span class="sig-name descname"><span class="pre">vocabulary_size</span></span><a class="headerlink" href="#quapy.classification.neural.LSTMnet.vocabulary_size" title="Permalink to this definition">¶</a></dt>
|
||||
<em class="property"><span class="pre">property</span><span class="w"> </span></em><span class="sig-name descname"><span class="pre">vocabulary_size</span></span><a class="headerlink" href="#quapy.classification.neural.LSTMnet.vocabulary_size" title="Permalink to this definition">¶</a></dt>
|
||||
<dd><p>Return the size of the vocabulary</p>
|
||||
<dl class="field-list simple">
|
||||
<dt class="field-odd">Returns</dt>
|
||||
<dt class="field-odd">Returns<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><p>integer</p>
|
||||
</dd>
|
||||
</dl>
|
||||
|
@ -296,11 +524,11 @@ dimensionality of the embedding</p>
|
|||
|
||||
<dl class="py class">
|
||||
<dt class="sig sig-object py" id="quapy.classification.neural.NeuralClassifierTrainer">
|
||||
<em class="property"><span class="pre">class</span> </em><span class="sig-prename descclassname"><span class="pre">quapy.classification.neural.</span></span><span class="sig-name descname"><span class="pre">NeuralClassifierTrainer</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">net</span></span><span class="p"><span class="pre">:</span></span> <span class="n"><a class="reference internal" href="#quapy.classification.neural.TextClassifierNet" title="quapy.classification.neural.TextClassifierNet"><span class="pre">quapy.classification.neural.TextClassifierNet</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">lr</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">0.001</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">weight_decay</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">0</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">patience</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">10</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">epochs</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">200</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">batch_size</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">64</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">batch_size_test</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">512</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">padding_length</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span 
class="pre">300</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">device</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">'cpu'</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">checkpointpath</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">'../checkpoint/classifier_net.dat'</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#quapy.classification.neural.NeuralClassifierTrainer" title="Permalink to this definition">¶</a></dt>
|
||||
<em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">quapy.classification.neural.</span></span><span class="sig-name descname"><span class="pre">NeuralClassifierTrainer</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">net</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="#quapy.classification.neural.TextClassifierNet" title="quapy.classification.neural.TextClassifierNet"><span class="pre">TextClassifierNet</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">lr</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">0.001</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">weight_decay</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">0</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">patience</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">10</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">epochs</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">200</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">batch_size</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">64</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">batch_size_test</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">512</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">padding_length</span></span><span class="o"><span class="pre">=</span></span><span 
class="default_value"><span class="pre">300</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">device</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">'cpu'</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">checkpointpath</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">'../checkpoint/classifier_net.dat'</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#quapy.classification.neural.NeuralClassifierTrainer" title="Permalink to this definition">¶</a></dt>
|
||||
<dd><p>Bases: <code class="xref py py-class docutils literal notranslate"><span class="pre">object</span></code></p>
|
||||
<p>Trains a neural network for text classification.</p>
|
||||
<dl class="field-list simple">
|
||||
<dt class="field-odd">Parameters</dt>
|
||||
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><ul class="simple">
|
||||
<li><p><strong>net</strong> – an instance of <cite>TextClassifierNet</cite> implementing the forward pass</p></li>
|
||||
<li><p><strong>lr</strong> – learning rate (default 1e-3)</p></li>
|
||||
|
@ -319,10 +547,10 @@ according to the evaluation in the held-out validation split (default ‘../chec
|
|||
</dl>
|
||||
<dl class="py property">
|
||||
<dt class="sig sig-object py" id="quapy.classification.neural.NeuralClassifierTrainer.device">
|
||||
<em class="property"><span class="pre">property</span> </em><span class="sig-name descname"><span class="pre">device</span></span><a class="headerlink" href="#quapy.classification.neural.NeuralClassifierTrainer.device" title="Permalink to this definition">¶</a></dt>
|
||||
<em class="property"><span class="pre">property</span><span class="w"> </span></em><span class="sig-name descname"><span class="pre">device</span></span><a class="headerlink" href="#quapy.classification.neural.NeuralClassifierTrainer.device" title="Permalink to this definition">¶</a></dt>
|
||||
<dd><p>Gets the device on which the network is allocated</p>
|
||||
<dl class="field-list simple">
|
||||
<dt class="field-odd">Returns</dt>
|
||||
<dt class="field-odd">Returns<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><p>device</p>
|
||||
</dd>
|
||||
</dl>
|
||||
|
@ -333,14 +561,14 @@ according to the evaluation in the held-out validation split (default ‘../chec
|
|||
<span class="sig-name descname"><span class="pre">fit</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">instances</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">labels</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">val_split</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">0.3</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#quapy.classification.neural.NeuralClassifierTrainer.fit" title="Permalink to this definition">¶</a></dt>
|
||||
<dd><p>Fits the model according to the given training data.</p>
|
||||
<dl class="field-list simple">
|
||||
<dt class="field-odd">Parameters</dt>
|
||||
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><ul class="simple">
|
||||
<li><p><strong>instances</strong> – list of lists of indexed tokens</p></li>
|
||||
<li><p><strong>labels</strong> – array-like of shape <cite>(n_samples,)</cite> with the class labels</p></li>
|
||||
<li><p><strong>val_split</strong> – proportion of training documents to be taken as the validation set (default 0.3)</p></li>
|
||||
</ul>
|
||||
</dd>
|
||||
<dt class="field-even">Returns</dt>
|
||||
<dt class="field-even">Returns<span class="colon">:</span></dt>
|
||||
<dd class="field-even"><p></p>
|
||||
</dd>
|
||||
</dl>
|
||||
|
@ -351,7 +579,7 @@ according to the evaluation in the held-out validation split (default ‘../chec
|
|||
<span class="sig-name descname"><span class="pre">get_params</span></span><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="headerlink" href="#quapy.classification.neural.NeuralClassifierTrainer.get_params" title="Permalink to this definition">¶</a></dt>
|
||||
<dd><p>Get hyper-parameters for this estimator</p>
|
||||
<dl class="field-list simple">
|
||||
<dt class="field-odd">Returns</dt>
|
||||
<dt class="field-odd">Returns<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><p>a dictionary with parameter names mapped to their values</p>
|
||||
</dd>
|
||||
</dl>
|
||||
|
@ -362,10 +590,10 @@ according to the evaluation in the held-out validation split (default ‘../chec
|
|||
<span class="sig-name descname"><span class="pre">predict</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">instances</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#quapy.classification.neural.NeuralClassifierTrainer.predict" title="Permalink to this definition">¶</a></dt>
|
||||
<dd><p>Predicts labels for the instances</p>
|
||||
<dl class="field-list simple">
|
||||
<dt class="field-odd">Parameters</dt>
|
||||
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><p><strong>instances</strong> – list of lists of indexed tokens</p>
|
||||
</dd>
|
||||
<dt class="field-even">Returns</dt>
|
||||
<dt class="field-even">Returns<span class="colon">:</span></dt>
|
||||
<dd class="field-even"><p>a <cite>numpy</cite> array of length <cite>n</cite> containing the label predictions, where <cite>n</cite> is the number of
|
||||
input instances</p>
|
||||
</dd>
|
||||
|
@ -377,10 +605,10 @@ instances in <cite>X</cite></p>
|
|||
<span class="sig-name descname"><span class="pre">predict_proba</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">instances</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#quapy.classification.neural.NeuralClassifierTrainer.predict_proba" title="Permalink to this definition">¶</a></dt>
|
||||
<dd><p>Predicts posterior probabilities for the instances</p>
|
||||
<dl class="field-list simple">
|
||||
<dt class="field-odd">Parameters</dt>
|
||||
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><p><strong>instances</strong> – list of lists of indexed tokens</p>
|
||||
</dd>
|
||||
<dt class="field-even">Returns</dt>
|
||||
<dt class="field-even">Returns<span class="colon">:</span></dt>
|
||||
<dd class="field-even"><p>array-like of shape <cite>(n_samples, n_classes)</cite> with the posterior probabilities</p>
|
||||
</dd>
|
||||
</dl>
|
||||
|
@ -391,7 +619,7 @@ instances in <cite>X</cite></p>
|
|||
<span class="sig-name descname"><span class="pre">reset_net_params</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">vocab_size</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">n_classes</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#quapy.classification.neural.NeuralClassifierTrainer.reset_net_params" title="Permalink to this definition">¶</a></dt>
|
||||
<dd><p>Reinitialize the network parameters</p>
|
||||
<dl class="field-list simple">
|
||||
<dt class="field-odd">Parameters</dt>
|
||||
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><ul class="simple">
|
||||
<li><p><strong>vocab_size</strong> – the size of the vocabulary</p></li>
|
||||
<li><p><strong>n_classes</strong> – the number of target classes</p></li>
|
||||
|
@ -407,7 +635,7 @@ instances in <cite>X</cite></p>
|
|||
In the current version, parameter names for the trainer and learner should
|
||||
be disjoint.</p>
|
||||
<dl class="field-list simple">
|
||||
<dt class="field-odd">Parameters</dt>
|
||||
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><p><strong>params</strong> – a <cite>**kwargs</cite> dictionary with the parameters</p>
|
||||
</dd>
|
||||
</dl>
|
||||
|
@ -418,10 +646,10 @@ be disjoint.</p>
|
|||
<span class="sig-name descname"><span class="pre">transform</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">instances</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#quapy.classification.neural.NeuralClassifierTrainer.transform" title="Permalink to this definition">¶</a></dt>
|
||||
<dd><p>Returns the embeddings of the instances</p>
|
||||
<dl class="field-list simple">
|
||||
<dt class="field-odd">Parameters</dt>
|
||||
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><p><strong>instances</strong> – list of lists of indexed tokens</p>
|
||||
</dd>
|
||||
<dt class="field-even">Returns</dt>
|
||||
<dt class="field-even">Returns<span class="colon">:</span></dt>
|
||||
<dd class="field-even"><p>array-like of shape <cite>(n_samples, embed_size)</cite> with the embedded instances,
|
||||
where <cite>embed_size</cite> is defined by the classification network</p>
|
||||
</dd>
|
||||
|
@ -432,15 +660,15 @@ where <cite>embed_size</cite> is defined by the classification network</p>
|
|||
|
||||
<dl class="py class">
|
||||
<dt class="sig sig-object py" id="quapy.classification.neural.TextClassifierNet">
|
||||
<em class="property"><span class="pre">class</span> </em><span class="sig-prename descclassname"><span class="pre">quapy.classification.neural.</span></span><span class="sig-name descname"><span class="pre">TextClassifierNet</span></span><a class="headerlink" href="#quapy.classification.neural.TextClassifierNet" title="Permalink to this definition">¶</a></dt>
|
||||
<dd><p>Bases: <code class="xref py py-class docutils literal notranslate"><span class="pre">torch.nn.modules.module.Module</span></code></p>
|
||||
<em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">quapy.classification.neural.</span></span><span class="sig-name descname"><span class="pre">TextClassifierNet</span></span><a class="headerlink" href="#quapy.classification.neural.TextClassifierNet" title="Permalink to this definition">¶</a></dt>
|
||||
<dd><p>Bases: <code class="xref py py-class docutils literal notranslate"><span class="pre">Module</span></code></p>
|
||||
<p>Abstract text classifier (a <cite>torch.nn.Module</cite>)</p>
|
||||
<dl class="py method">
|
||||
<dt class="sig sig-object py" id="quapy.classification.neural.TextClassifierNet.dimensions">
|
||||
<span class="sig-name descname"><span class="pre">dimensions</span></span><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="headerlink" href="#quapy.classification.neural.TextClassifierNet.dimensions" title="Permalink to this definition">¶</a></dt>
|
||||
<dd><p>Gets the number of dimensions of the embedding space</p>
|
||||
<dl class="field-list simple">
|
||||
<dt class="field-odd">Returns</dt>
|
||||
<dt class="field-odd">Returns<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><p>integer</p>
|
||||
</dd>
|
||||
</dl>
|
||||
|
@ -448,15 +676,15 @@ where <cite>embed_size</cite> is defined by the classification network</p>
|
|||
|
||||
<dl class="py method">
|
||||
<dt class="sig sig-object py" id="quapy.classification.neural.TextClassifierNet.document_embedding">
|
||||
<em class="property"><span class="pre">abstract</span> </em><span class="sig-name descname"><span class="pre">document_embedding</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">x</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#quapy.classification.neural.TextClassifierNet.document_embedding" title="Permalink to this definition">¶</a></dt>
|
||||
<em class="property"><span class="pre">abstract</span><span class="w"> </span></em><span class="sig-name descname"><span class="pre">document_embedding</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">x</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#quapy.classification.neural.TextClassifierNet.document_embedding" title="Permalink to this definition">¶</a></dt>
|
||||
<dd><p>Embeds documents (i.e., performs the forward pass up to the
|
||||
next-to-last layer).</p>
|
||||
<dl class="field-list simple">
|
||||
<dt class="field-odd">Parameters</dt>
|
||||
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><p><strong>x</strong> – a batch of instances, typically generated by a torch <cite>DataLoader</cite>
|
||||
instance (see <a class="reference internal" href="#quapy.classification.neural.TorchDataset" title="quapy.classification.neural.TorchDataset"><code class="xref py py-class docutils literal notranslate"><span class="pre">quapy.classification.neural.TorchDataset</span></code></a>)</p>
|
||||
</dd>
|
||||
<dt class="field-even">Returns</dt>
|
||||
<dt class="field-even">Returns<span class="colon">:</span></dt>
|
||||
<dd class="field-even"><p>a torch tensor of shape <cite>(n_samples, n_dimensions)</cite>, where
|
||||
<cite>n_samples</cite> is the number of documents, and <cite>n_dimensions</cite> is the
|
||||
dimensionality of the embedding</p>
|
||||
|
@ -469,11 +697,11 @@ dimensionality of the embedding</p>
|
|||
<span class="sig-name descname"><span class="pre">forward</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">x</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#quapy.classification.neural.TextClassifierNet.forward" title="Permalink to this definition">¶</a></dt>
|
||||
<dd><p>Performs the forward pass.</p>
|
||||
<dl class="field-list simple">
|
||||
<dt class="field-odd">Parameters</dt>
|
||||
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><p><strong>x</strong> – a batch of instances, typically generated by a torch <cite>DataLoader</cite>
|
||||
instance (see <a class="reference internal" href="#quapy.classification.neural.TorchDataset" title="quapy.classification.neural.TorchDataset"><code class="xref py py-class docutils literal notranslate"><span class="pre">quapy.classification.neural.TorchDataset</span></code></a>)</p>
|
||||
</dd>
|
||||
<dt class="field-even">Returns</dt>
|
||||
<dt class="field-even">Returns<span class="colon">:</span></dt>
|
||||
<dd class="field-even"><p>a tensor of shape <cite>(n_instances, n_classes)</cite> with the decision scores
|
||||
for each of the instances and classes</p>
|
||||
</dd>
|
||||
|
@ -482,10 +710,10 @@ for each of the instances and classes</p>
|
|||
|
||||
<dl class="py method">
|
||||
<dt class="sig sig-object py" id="quapy.classification.neural.TextClassifierNet.get_params">
|
||||
<em class="property"><span class="pre">abstract</span> </em><span class="sig-name descname"><span class="pre">get_params</span></span><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="headerlink" href="#quapy.classification.neural.TextClassifierNet.get_params" title="Permalink to this definition">¶</a></dt>
|
||||
<em class="property"><span class="pre">abstract</span><span class="w"> </span></em><span class="sig-name descname"><span class="pre">get_params</span></span><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="headerlink" href="#quapy.classification.neural.TextClassifierNet.get_params" title="Permalink to this definition">¶</a></dt>
|
||||
<dd><p>Get hyper-parameters for this estimator</p>
|
||||
<dl class="field-list simple">
|
||||
<dt class="field-odd">Returns</dt>
|
||||
<dt class="field-odd">Returns<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><p>a dictionary with parameter names mapped to their values</p>
|
||||
</dd>
|
||||
</dl>
|
||||
|
@ -496,23 +724,28 @@ for each of the instances and classes</p>
|
|||
<span class="sig-name descname"><span class="pre">predict_proba</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">x</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#quapy.classification.neural.TextClassifierNet.predict_proba" title="Permalink to this definition">¶</a></dt>
|
||||
<dd><p>Predicts posterior probabilities for the instances in <cite>x</cite></p>
|
||||
<dl class="field-list simple">
|
||||
<dt class="field-odd">Parameters</dt>
|
||||
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><p><strong>x</strong> – a torch tensor of indexed tokens with shape <cite>(n_instances, pad_length)</cite>
|
||||
where <cite>n_instances</cite> is the number of instances in the batch, and <cite>pad_length</cite>
|
||||
is the length of the pad in the batch</p>
|
||||
</dd>
|
||||
<dt class="field-even">Returns</dt>
|
||||
<dt class="field-even">Returns<span class="colon">:</span></dt>
|
||||
<dd class="field-even"><p>array-like of shape <cite>(n_samples, n_classes)</cite> with the posterior probabilities</p>
|
||||
</dd>
|
||||
</dl>
|
||||
</dd></dl>
|
||||
|
||||
<dl class="py attribute">
|
||||
<dt class="sig sig-object py" id="quapy.classification.neural.TextClassifierNet.training">
|
||||
<span class="sig-name descname"><span class="pre">training</span></span><em class="property"><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="pre">bool</span></em><a class="headerlink" href="#quapy.classification.neural.TextClassifierNet.training" title="Permalink to this definition">¶</a></dt>
|
||||
<dd></dd></dl>
|
||||
|
||||
<dl class="py property">
|
||||
<dt class="sig sig-object py" id="quapy.classification.neural.TextClassifierNet.vocabulary_size">
|
||||
<em class="property"><span class="pre">property</span> </em><span class="sig-name descname"><span class="pre">vocabulary_size</span></span><a class="headerlink" href="#quapy.classification.neural.TextClassifierNet.vocabulary_size" title="Permalink to this definition">¶</a></dt>
|
||||
<em class="property"><span class="pre">property</span><span class="w"> </span></em><span class="sig-name descname"><span class="pre">vocabulary_size</span></span><a class="headerlink" href="#quapy.classification.neural.TextClassifierNet.vocabulary_size" title="Permalink to this definition">¶</a></dt>
|
||||
<dd><p>Return the size of the vocabulary</p>
|
||||
<dl class="field-list simple">
|
||||
<dt class="field-odd">Returns</dt>
|
||||
<dt class="field-odd">Returns<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><p>integer</p>
|
||||
</dd>
|
||||
</dl>
|
||||
|
@ -528,11 +761,11 @@ is length of the pad in the batch</p>
|
|||
|
||||
<dl class="py class">
|
||||
<dt class="sig sig-object py" id="quapy.classification.neural.TorchDataset">
|
||||
<em class="property"><span class="pre">class</span> </em><span class="sig-prename descclassname"><span class="pre">quapy.classification.neural.</span></span><span class="sig-name descname"><span class="pre">TorchDataset</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">instances</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">labels</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#quapy.classification.neural.TorchDataset" title="Permalink to this definition">¶</a></dt>
|
||||
<dd><p>Bases: <code class="xref py py-class docutils literal notranslate"><span class="pre">torch.utils.data.dataset.Dataset</span></code></p>
|
||||
<em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">quapy.classification.neural.</span></span><span class="sig-name descname"><span class="pre">TorchDataset</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">instances</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">labels</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#quapy.classification.neural.TorchDataset" title="Permalink to this definition">¶</a></dt>
|
||||
<dd><p>Bases: <code class="xref py py-class docutils literal notranslate"><span class="pre">Dataset</span></code></p>
|
||||
<p>Transforms labelled instances into a Torch <code class="xref py py-class docutils literal notranslate"><span class="pre">torch.utils.data.DataLoader</span></code> object</p>
|
||||
<dl class="field-list simple">
|
||||
<dt class="field-odd">Parameters</dt>
|
||||
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><ul class="simple">
|
||||
<li><p><strong>instances</strong> – list of lists of indexed tokens</p></li>
|
||||
<li><p><strong>labels</strong> – array-like of shape <cite>(n_samples, n_classes)</cite> with the class labels</p></li>
|
||||
|
@ -545,7 +778,7 @@ is length of the pad in the batch</p>
|
|||
<dd><p>Converts the labelled collection into a Torch DataLoader with dynamic padding for
|
||||
the batch</p>
|
||||
<dl class="field-list simple">
|
||||
<dt class="field-odd">Parameters</dt>
|
||||
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><ul class="simple">
|
||||
<li><p><strong>batch_size</strong> – batch size</p></li>
|
||||
<li><p><strong>shuffle</strong> – whether or not to shuffle instances</p></li>
|
||||
|
@ -555,7 +788,7 @@ applied, meaning that if the longest document in the batch is shorter than
|
|||
<li><p><strong>device</strong> – whether to allocate tensors in cpu or in cuda</p></li>
|
||||
</ul>
|
||||
</dd>
|
||||
<dt class="field-even">Returns</dt>
|
||||
<dt class="field-even">Returns<span class="colon">:</span></dt>
|
||||
<dd class="field-even"><p>a <code class="xref py py-class docutils literal notranslate"><span class="pre">torch.utils.data.DataLoader</span></code> object</p>
|
||||
</dd>
|
||||
</dl>
|
||||
|
@ -565,11 +798,11 @@ applied, meaning that if the longest document in the batch is shorter than
|
|||
|
||||
</section>
|
||||
<section id="module-quapy.classification.svmperf">
|
||||
<span id="quapy-classification-svmperf-module"></span><h2>quapy.classification.svmperf module<a class="headerlink" href="#module-quapy.classification.svmperf" title="Permalink to this headline">¶</a></h2>
|
||||
<span id="quapy-classification-svmperf"></span><h2>quapy.classification.svmperf<a class="headerlink" href="#module-quapy.classification.svmperf" title="Permalink to this heading">¶</a></h2>
|
||||
<dl class="py class">
|
||||
<dt class="sig sig-object py" id="quapy.classification.svmperf.SVMperf">
|
||||
<em class="property"><span class="pre">class</span> </em><span class="sig-prename descclassname"><span class="pre">quapy.classification.svmperf.</span></span><span class="sig-name descname"><span class="pre">SVMperf</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">svmperf_base</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">C</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">0.01</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">verbose</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">False</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">loss</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">'01'</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#quapy.classification.svmperf.SVMperf" title="Permalink to this definition">¶</a></dt>
|
||||
<dd><p>Bases: <code class="xref py py-class docutils literal notranslate"><span class="pre">sklearn.base.BaseEstimator</span></code>, <code class="xref py py-class docutils literal notranslate"><span class="pre">sklearn.base.ClassifierMixin</span></code></p>
|
||||
<em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">quapy.classification.svmperf.</span></span><span class="sig-name descname"><span class="pre">SVMperf</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">svmperf_base</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">C</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">0.01</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">verbose</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">False</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">loss</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">'01'</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">host_folder</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#quapy.classification.svmperf.SVMperf" title="Permalink to this definition">¶</a></dt>
|
||||
<dd><p>Bases: <code class="xref py py-class docutils literal notranslate"><span class="pre">BaseEstimator</span></code>, <code class="xref py py-class docutils literal notranslate"><span class="pre">ClassifierMixin</span></code></p>
|
||||
<p>A wrapper for the <a class="reference external" href="https://www.cs.cornell.edu/people/tj/svm_light/svm_perf.html">SVM-perf package</a> by Thorsten Joachims.
|
||||
When using losses for quantification, the source code has to be patched. See
|
||||
the <a class="reference external" href="https://hlt-isti.github.io/QuaPy/build/html/Installation.html#svm-perf-with-quantification-oriented-losses">installation documentation</a>
|
||||
|
@ -582,12 +815,14 @@ for further details.</p>
|
|||
</ul>
|
||||
</div></blockquote>
|
||||
<dl class="field-list simple">
|
||||
<dt class="field-odd">Parameters</dt>
|
||||
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><ul class="simple">
|
||||
<li><p><strong>svmperf_base</strong> – path to directory containing the binary files <cite>svm_perf_learn</cite> and <cite>svm_perf_classify</cite></p></li>
|
||||
<li><p><strong>C</strong> – trade-off between training error and margin (default 0.01)</p></li>
|
||||
<li><p><strong>verbose</strong> – set to True to print the standard output of svm-perf</p></li>
|
||||
<li><p><strong>loss</strong> – the loss to optimize for. Available losses are “01”, “f1”, “kld”, “nkld”, “q”, “qacc”, “qf1”, “qgm”, “mae”, “mrae”.</p></li>
|
||||
<li><p><strong>host_folder</strong> – directory in which to store the trained model; set to None (default) to use a temporary directory
|
||||
(temporary directories are deleted automatically)</p></li>
|
||||
</ul>
|
||||
</dd>
|
||||
</dl>
|
||||
|
@ -596,13 +831,13 @@ for further details.</p>
|
|||
<span class="sig-name descname"><span class="pre">decision_function</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">X</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">y</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#quapy.classification.svmperf.SVMperf.decision_function" title="Permalink to this definition">¶</a></dt>
|
||||
<dd><p>Evaluate the decision function for the samples in <cite>X</cite>.</p>
|
||||
<dl class="field-list simple">
|
||||
<dt class="field-odd">Parameters</dt>
|
||||
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><ul class="simple">
|
||||
<li><p><strong>X</strong> – array-like of shape <cite>(n_samples, n_features)</cite> containing the instances to classify</p></li>
|
||||
<li><p><strong>y</strong> – unused</p></li>
|
||||
</ul>
|
||||
</dd>
|
||||
<dt class="field-even">Returns</dt>
|
||||
<dt class="field-even">Returns<span class="colon">:</span></dt>
|
||||
<dd class="field-even"><p>array-like of shape <cite>(n_samples,)</cite> containing the decision scores of the instances</p>
|
||||
</dd>
|
||||
</dl>
|
||||
|
@ -613,13 +848,13 @@ for further details.</p>
|
|||
<span class="sig-name descname"><span class="pre">fit</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">X</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">y</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#quapy.classification.svmperf.SVMperf.fit" title="Permalink to this definition">¶</a></dt>
|
||||
<dd><p>Trains the SVM for the multivariate performance loss</p>
|
||||
<dl class="field-list simple">
|
||||
<dt class="field-odd">Parameters</dt>
|
||||
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><ul class="simple">
|
||||
<li><p><strong>X</strong> – training instances</p></li>
|
||||
<li><p><strong>y</strong> – a binary vector of labels</p></li>
|
||||
</ul>
|
||||
</dd>
|
||||
<dt class="field-even">Returns</dt>
|
||||
<dt class="field-even">Returns<span class="colon">:</span></dt>
|
||||
<dd class="field-even"><p><cite>self</cite></p>
|
||||
</dd>
|
||||
</dl>
|
||||
|
@ -628,35 +863,28 @@ for further details.</p>
|
|||
<dl class="py method">
|
||||
<dt class="sig sig-object py" id="quapy.classification.svmperf.SVMperf.predict">
|
||||
<span class="sig-name descname"><span class="pre">predict</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">X</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#quapy.classification.svmperf.SVMperf.predict" title="Permalink to this definition">¶</a></dt>
|
||||
<dd><p>Predicts labels for the instances <cite>X</cite>
|
||||
:param X: array-like of shape <cite>(n_samples, n_features)</cite> instances to classify
|
||||
:return: a <cite>numpy</cite> array of length <cite>n</cite> containing the label predictions, where <cite>n</cite> is the number of</p>
|
||||
<blockquote>
|
||||
<div><p>instances in <cite>X</cite></p>
|
||||
</div></blockquote>
|
||||
</dd></dl>
|
||||
|
||||
<dl class="py method">
|
||||
<dt class="sig sig-object py" id="quapy.classification.svmperf.SVMperf.set_params">
|
||||
<span class="sig-name descname"><span class="pre">set_params</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="o"><span class="pre">**</span></span><span class="n"><span class="pre">parameters</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#quapy.classification.svmperf.SVMperf.set_params" title="Permalink to this definition">¶</a></dt>
|
||||
<dd><p>Set the hyper-parameters for svm-perf. Currently, only the <cite>C</cite> parameter is supported</p>
|
||||
<dd><p>Predicts labels for the instances <cite>X</cite></p>
|
||||
<dl class="field-list simple">
|
||||
<dt class="field-odd">Parameters</dt>
|
||||
<dd class="field-odd"><p><strong>parameters</strong> – a <cite>**kwargs</cite> dictionary <cite>{‘C’: <float>}</cite></p>
|
||||
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><p><strong>X</strong> – array-like of shape <cite>(n_samples, n_features)</cite> instances to classify</p>
|
||||
</dd>
|
||||
<dt class="field-even">Returns<span class="colon">:</span></dt>
|
||||
<dd class="field-even"><p>a <cite>numpy</cite> array of length <cite>n</cite> containing the label predictions, where <cite>n</cite> is the number of
|
||||
instances in <cite>X</cite></p>
|
||||
</dd>
|
||||
</dl>
|
||||
</dd></dl>
|
||||
|
||||
<dl class="py attribute">
|
||||
<dt class="sig sig-object py" id="quapy.classification.svmperf.SVMperf.valid_losses">
|
||||
<span class="sig-name descname"><span class="pre">valid_losses</span></span><em class="property"> <span class="pre">=</span> <span class="pre">{'01':</span> <span class="pre">0,</span> <span class="pre">'f1':</span> <span class="pre">1,</span> <span class="pre">'kld':</span> <span class="pre">12,</span> <span class="pre">'mae':</span> <span class="pre">26,</span> <span class="pre">'mrae':</span> <span class="pre">27,</span> <span class="pre">'nkld':</span> <span class="pre">13,</span> <span class="pre">'q':</span> <span class="pre">22,</span> <span class="pre">'qacc':</span> <span class="pre">23,</span> <span class="pre">'qf1':</span> <span class="pre">24,</span> <span class="pre">'qgm':</span> <span class="pre">25}</span></em><a class="headerlink" href="#quapy.classification.svmperf.SVMperf.valid_losses" title="Permalink to this definition">¶</a></dt>
|
||||
<span class="sig-name descname"><span class="pre">valid_losses</span></span><em class="property"><span class="w"> </span><span class="p"><span class="pre">=</span></span><span class="w"> </span><span class="pre">{'01':</span> <span class="pre">0,</span> <span class="pre">'f1':</span> <span class="pre">1,</span> <span class="pre">'kld':</span> <span class="pre">12,</span> <span class="pre">'mae':</span> <span class="pre">26,</span> <span class="pre">'mrae':</span> <span class="pre">27,</span> <span class="pre">'nkld':</span> <span class="pre">13,</span> <span class="pre">'q':</span> <span class="pre">22,</span> <span class="pre">'qacc':</span> <span class="pre">23,</span> <span class="pre">'qf1':</span> <span class="pre">24,</span> <span class="pre">'qgm':</span> <span class="pre">25}</span></em><a class="headerlink" href="#quapy.classification.svmperf.SVMperf.valid_losses" title="Permalink to this definition">¶</a></dt>
|
||||
<dd></dd></dl>
|
||||
|
||||
</dd></dl>
|
||||
|
||||
</section>
|
||||
<section id="module-quapy.classification">
|
||||
<span id="module-contents"></span><h2>Module contents<a class="headerlink" href="#module-quapy.classification" title="Permalink to this headline">¶</a></h2>
|
||||
<span id="module-contents"></span><h2>Module contents<a class="headerlink" href="#module-quapy.classification" title="Permalink to this heading">¶</a></h2>
|
||||
</section>
|
||||
</section>
|
||||
|
||||
|
@ -667,24 +895,31 @@ for further details.</p>
|
|||
</div>
|
||||
<div class="sphinxsidebar" role="navigation" aria-label="main navigation">
|
||||
<div class="sphinxsidebarwrapper">
|
||||
<h3><a href="index.html">Table of Contents</a></h3>
|
||||
<ul>
|
||||
<div>
|
||||
<h3><a href="index.html">Table of Contents</a></h3>
|
||||
<ul>
|
||||
<li><a class="reference internal" href="#">quapy.classification package</a><ul>
|
||||
<li><a class="reference internal" href="#submodules">Submodules</a></li>
|
||||
<li><a class="reference internal" href="#module-quapy.classification.methods">quapy.classification.methods module</a></li>
|
||||
<li><a class="reference internal" href="#module-quapy.classification.neural">quapy.classification.neural module</a></li>
|
||||
<li><a class="reference internal" href="#module-quapy.classification.svmperf">quapy.classification.svmperf module</a></li>
|
||||
<li><a class="reference internal" href="#quapy-classification-calibration">quapy.classification.calibration</a></li>
|
||||
<li><a class="reference internal" href="#module-quapy.classification.methods">quapy.classification.methods</a></li>
|
||||
<li><a class="reference internal" href="#module-quapy.classification.neural">quapy.classification.neural</a></li>
|
||||
<li><a class="reference internal" href="#module-quapy.classification.svmperf">quapy.classification.svmperf</a></li>
|
||||
<li><a class="reference internal" href="#module-quapy.classification">Module contents</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
|
||||
<h4>Previous topic</h4>
|
||||
<p class="topless"><a href="quapy.html"
|
||||
title="previous chapter">quapy package</a></p>
|
||||
<h4>Next topic</h4>
|
||||
<p class="topless"><a href="quapy.data.html"
|
||||
title="next chapter">quapy.data package</a></p>
|
||||
</div>
|
||||
<div>
|
||||
<h4>Previous topic</h4>
|
||||
<p class="topless"><a href="quapy.html"
|
||||
title="previous chapter">quapy package</a></p>
|
||||
</div>
|
||||
<div>
|
||||
<h4>Next topic</h4>
|
||||
<p class="topless"><a href="quapy.data.html"
|
||||
title="next chapter">quapy.data package</a></p>
|
||||
</div>
|
||||
<div role="note" aria-label="source link">
|
||||
<h3>This Page</h3>
|
||||
<ul class="this-page-menu">
|
||||
|
@ -701,7 +936,7 @@ for further details.</p>
|
|||
</form>
|
||||
</div>
|
||||
</div>
|
||||
<script>$('#searchbox').show(0);</script>
|
||||
<script>document.getElementById('searchbox').style.display = "block"</script>
|
||||
</div>
|
||||
</div>
|
||||
<div class="clearer"></div>
|
||||
|
@ -721,7 +956,7 @@ for further details.</p>
|
|||
<li class="right" >
|
||||
<a href="quapy.html" title="quapy package"
|
||||
>previous</a> |</li>
|
||||
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.6 documentation</a> »</li>
|
||||
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.7 documentation</a> »</li>
|
||||
<li class="nav-item nav-item-1"><a href="modules.html" >quapy</a> »</li>
|
||||
<li class="nav-item nav-item-2"><a href="quapy.html" >quapy package</a> »</li>
|
||||
<li class="nav-item nav-item-this"><a href="">quapy.classification package</a></li>
|
||||
|
@ -729,7 +964,7 @@ for further details.</p>
|
|||
</div>
|
||||
<div class="footer" role="contentinfo">
|
||||
© Copyright 2021, Alejandro Moreo.
|
||||
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 4.2.0.
|
||||
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 5.3.0.
|
||||
</div>
|
||||
</body>
|
||||
</html>
|
File diff suppressed because it is too large
File diff suppressed because it is too large
File diff suppressed because it is too large
|
@ -1,135 +0,0 @@
|
|||
|
||||
|
||||
<!doctype html>
|
||||
|
||||
<html>
|
||||
<head>
|
||||
<meta charset="utf-8" />
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
|
||||
<title>quapy.tests package — QuaPy 0.1.6 documentation</title>
|
||||
<link rel="stylesheet" type="text/css" href="_static/pygments.css" />
|
||||
<link rel="stylesheet" type="text/css" href="_static/bizstyle.css" />
|
||||
|
||||
<script data-url_root="./" id="documentation_options" src="_static/documentation_options.js"></script>
|
||||
<script src="_static/jquery.js"></script>
|
||||
<script src="_static/underscore.js"></script>
|
||||
<script src="_static/doctools.js"></script>
|
||||
<script src="_static/bizstyle.js"></script>
|
||||
<link rel="index" title="Index" href="genindex.html" />
|
||||
<link rel="search" title="Search" href="search.html" />
|
||||
<link rel="prev" title="quapy.method package" href="quapy.method.html" />
|
||||
<meta name="viewport" content="width=device-width,initial-scale=1.0" />
|
||||
<!--[if lt IE 9]>
|
||||
<script src="_static/css3-mediaqueries.js"></script>
|
||||
<![endif]-->
|
||||
</head><body>
|
||||
<div class="related" role="navigation" aria-label="related navigation">
|
||||
<h3>Navigation</h3>
|
||||
<ul>
|
||||
<li class="right" style="margin-right: 10px">
|
||||
<a href="genindex.html" title="General Index"
|
||||
accesskey="I">index</a></li>
|
||||
<li class="right" >
|
||||
<a href="py-modindex.html" title="Python Module Index"
|
||||
>modules</a> |</li>
|
||||
<li class="right" >
|
||||
<a href="quapy.method.html" title="quapy.method package"
|
||||
accesskey="P">previous</a> |</li>
|
||||
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.6 documentation</a> »</li>
|
||||
<li class="nav-item nav-item-1"><a href="modules.html" >quapy</a> »</li>
|
||||
<li class="nav-item nav-item-2"><a href="quapy.html" accesskey="U">quapy package</a> »</li>
|
||||
<li class="nav-item nav-item-this"><a href="">quapy.tests package</a></li>
|
||||
</ul>
|
||||
</div>
|
||||
|
||||
<div class="document">
|
||||
<div class="documentwrapper">
|
||||
<div class="bodywrapper">
|
||||
<div class="body" role="main">
|
||||
|
||||
<div class="section" id="quapy-tests-package">
|
||||
<h1>quapy.tests package<a class="headerlink" href="#quapy-tests-package" title="Permalink to this headline">¶</a></h1>
|
||||
<div class="section" id="submodules">
|
||||
<h2>Submodules<a class="headerlink" href="#submodules" title="Permalink to this headline">¶</a></h2>
|
||||
</div>
|
||||
<div class="section" id="quapy-tests-test-base-module">
|
||||
<h2>quapy.tests.test_base module<a class="headerlink" href="#quapy-tests-test-base-module" title="Permalink to this headline">¶</a></h2>
|
||||
</div>
|
||||
<div class="section" id="quapy-tests-test-datasets-module">
|
||||
<h2>quapy.tests.test_datasets module<a class="headerlink" href="#quapy-tests-test-datasets-module" title="Permalink to this headline">¶</a></h2>
|
||||
</div>
|
||||
<div class="section" id="quapy-tests-test-methods-module">
|
||||
<h2>quapy.tests.test_methods module<a class="headerlink" href="#quapy-tests-test-methods-module" title="Permalink to this headline">¶</a></h2>
|
||||
</div>
|
||||
<div class="section" id="module-quapy.tests">
|
||||
<span id="module-contents"></span><h2>Module contents<a class="headerlink" href="#module-quapy.tests" title="Permalink to this headline">¶</a></h2>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
|
||||
<div class="clearer"></div>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
<div class="sphinxsidebar" role="navigation" aria-label="main navigation">
|
||||
<div class="sphinxsidebarwrapper">
|
||||
<h3><a href="index.html">Table of Contents</a></h3>
|
||||
<ul>
|
||||
<li><a class="reference internal" href="#">quapy.tests package</a><ul>
|
||||
<li><a class="reference internal" href="#submodules">Submodules</a></li>
|
||||
<li><a class="reference internal" href="#quapy-tests-test-base-module">quapy.tests.test_base module</a></li>
|
||||
<li><a class="reference internal" href="#quapy-tests-test-datasets-module">quapy.tests.test_datasets module</a></li>
|
||||
<li><a class="reference internal" href="#quapy-tests-test-methods-module">quapy.tests.test_methods module</a></li>
|
||||
<li><a class="reference internal" href="#module-quapy.tests">Module contents</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
|
||||
<h4>Previous topic</h4>
|
||||
<p class="topless"><a href="quapy.method.html"
|
||||
title="previous chapter">quapy.method package</a></p>
|
||||
<div role="note" aria-label="source link">
|
||||
<h3>This Page</h3>
|
||||
<ul class="this-page-menu">
|
||||
<li><a href="_sources/quapy.tests.rst.txt"
|
||||
rel="nofollow">Show Source</a></li>
|
||||
</ul>
|
||||
</div>
|
||||
<div id="searchbox" style="display: none" role="search">
|
||||
<h3 id="searchlabel">Quick search</h3>
|
||||
<div class="searchformwrapper">
|
||||
<form class="search" action="search.html" method="get">
|
||||
<input type="text" name="q" aria-labelledby="searchlabel" autocomplete="off" autocorrect="off" autocapitalize="off" spellcheck="false"/>
|
||||
<input type="submit" value="Go" />
|
||||
</form>
|
||||
</div>
|
||||
</div>
|
||||
<script>$('#searchbox').show(0);</script>
|
||||
</div>
|
||||
</div>
|
||||
<div class="clearer"></div>
|
||||
</div>
|
||||
<div class="related" role="navigation" aria-label="related navigation">
|
||||
<h3>Navigation</h3>
|
||||
<ul>
|
||||
<li class="right" style="margin-right: 10px">
|
||||
<a href="genindex.html" title="General Index"
|
||||
>index</a></li>
|
||||
<li class="right" >
|
||||
<a href="py-modindex.html" title="Python Module Index"
|
||||
>modules</a> |</li>
|
||||
<li class="right" >
|
||||
<a href="quapy.method.html" title="quapy.method package"
|
||||
>previous</a> |</li>
|
||||
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.6 documentation</a> »</li>
|
||||
<li class="nav-item nav-item-1"><a href="modules.html" >quapy</a> »</li>
|
||||
<li class="nav-item nav-item-2"><a href="quapy.html" >quapy package</a> »</li>
|
||||
<li class="nav-item nav-item-this"><a href="">quapy.tests package</a></li>
|
||||
</ul>
|
||||
</div>
|
||||
<div class="footer" role="contentinfo">
|
||||
© Copyright 2021, Alejandro Moreo.
|
||||
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 4.2.0.
|
||||
</div>
|
||||
</body>
|
||||
</html>
|
|
@ -1,129 +0,0 @@
|
|||
|
||||
|
||||
<!doctype html>
|
||||
|
||||
<html>
|
||||
<head>
|
||||
<meta charset="utf-8" />
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
|
||||
<title>Getting Started — QuaPy 0.1.6 documentation</title>
|
||||
<link rel="stylesheet" type="text/css" href="_static/pygments.css" />
|
||||
<link rel="stylesheet" type="text/css" href="_static/bizstyle.css" />
|
||||
|
||||
<script data-url_root="./" id="documentation_options" src="_static/documentation_options.js"></script>
|
||||
<script src="_static/jquery.js"></script>
|
||||
<script src="_static/underscore.js"></script>
|
||||
<script src="_static/doctools.js"></script>
|
||||
<script src="_static/bizstyle.js"></script>
|
||||
<link rel="index" title="Index" href="genindex.html" />
|
||||
<link rel="search" title="Search" href="search.html" />
|
||||
<link rel="next" title="quapy" href="modules.html" />
|
||||
<link rel="prev" title="Welcome to QuaPy’s documentation!" href="index.html" />
|
||||
<meta name="viewport" content="width=device-width,initial-scale=1.0" />
|
||||
<!--[if lt IE 9]>
|
||||
<script src="_static/css3-mediaqueries.js"></script>
|
||||
<![endif]-->
|
||||
</head><body>
|
||||
<div class="related" role="navigation" aria-label="related navigation">
|
||||
<h3>Navigation</h3>
|
||||
<ul>
|
||||
<li class="right" style="margin-right: 10px">
|
||||
<a href="genindex.html" title="General Index"
|
||||
accesskey="I">index</a></li>
|
||||
<li class="right" >
|
||||
<a href="py-modindex.html" title="Python Module Index"
|
||||
>modules</a> |</li>
|
||||
<li class="right" >
|
||||
<a href="modules.html" title="quapy"
|
||||
accesskey="N">next</a> |</li>
|
||||
<li class="right" >
|
||||
<a href="index.html" title="Welcome to QuaPy’s documentation!"
|
||||
accesskey="P">previous</a> |</li>
|
||||
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.6 documentation</a> »</li>
|
||||
<li class="nav-item nav-item-this"><a href="">Getting Started</a></li>
|
||||
</ul>
|
||||
</div>
|
||||
|
||||
<div class="document">
|
||||
<div class="documentwrapper">
|
||||
<div class="bodywrapper">
|
||||
<div class="body" role="main">
|
||||
|
||||
<div class="section" id="getting-started">
|
||||
<h1>Getting Started<a class="headerlink" href="#getting-started" title="Permalink to this headline">¶</a></h1>
|
||||
<p>QuaPy is an open source framework for Quantification (a.k.a. Supervised Prevalence Estimation) written in Python.</p>
|
||||
<div class="section" id="installation">
|
||||
<h2>Installation<a class="headerlink" href="#installation" title="Permalink to this headline">¶</a></h2>
|
||||
<div class="doctest highlight-default notranslate"><div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">pip</span> <span class="n">install</span> <span class="n">quapy</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
|
||||
<div class="clearer"></div>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
<div class="sphinxsidebar" role="navigation" aria-label="main navigation">
|
||||
<div class="sphinxsidebarwrapper">
|
||||
<h3><a href="index.html">Table of Contents</a></h3>
|
||||
<ul>
|
||||
<li><a class="reference internal" href="#">Getting Started</a><ul>
|
||||
<li><a class="reference internal" href="#installation">Installation</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
|
||||
<h4>Previous topic</h4>
|
||||
<p class="topless"><a href="index.html"
|
||||
title="previous chapter">Welcome to QuaPy’s documentation!</a></p>
|
||||
<h4>Next topic</h4>
|
||||
<p class="topless"><a href="modules.html"
|
||||
title="next chapter">quapy</a></p>
|
||||
<div role="note" aria-label="source link">
|
||||
<h3>This Page</h3>
|
||||
<ul class="this-page-menu">
|
||||
<li><a href="_sources/readme.rst.txt"
|
||||
rel="nofollow">Show Source</a></li>
|
||||
</ul>
|
||||
</div>
|
||||
<div id="searchbox" style="display: none" role="search">
|
||||
<h3 id="searchlabel">Quick search</h3>
|
||||
<div class="searchformwrapper">
|
||||
<form class="search" action="search.html" method="get">
|
||||
<input type="text" name="q" aria-labelledby="searchlabel" autocomplete="off" autocorrect="off" autocapitalize="off" spellcheck="false"/>
|
||||
<input type="submit" value="Go" />
|
||||
</form>
|
||||
</div>
|
||||
</div>
|
||||
<script>$('#searchbox').show(0);</script>
|
||||
</div>
|
||||
</div>
|
||||
<div class="clearer"></div>
|
||||
</div>
|
||||
<div class="related" role="navigation" aria-label="related navigation">
|
||||
<h3>Navigation</h3>
|
||||
<ul>
|
||||
<li class="right" style="margin-right: 10px">
|
||||
<a href="genindex.html" title="General Index"
|
||||
>index</a></li>
|
||||
<li class="right" >
|
||||
<a href="py-modindex.html" title="Python Module Index"
|
||||
>modules</a> |</li>
|
||||
<li class="right" >
|
||||
<a href="modules.html" title="quapy"
|
||||
>next</a> |</li>
|
||||
<li class="right" >
|
||||
<a href="index.html" title="Welcome to QuaPy’s documentation!"
|
||||
>previous</a> |</li>
|
||||
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.6 documentation</a> »</li>
|
||||
<li class="nav-item nav-item-this"><a href="">Getting Started</a></li>
|
||||
</ul>
|
||||
</div>
|
||||
<div class="footer" role="contentinfo">
|
||||
© Copyright 2021, Alejandro Moreo.
|
||||
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 4.2.0.
|
||||
</div>
|
||||
</body>
|
||||
</html>
|
|
@ -1,92 +0,0 @@
|
|||
|
||||
|
||||
<!doctype html>
|
||||
|
||||
<html>
|
||||
<head>
|
||||
<meta charset="utf-8" />
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
|
||||
<title><no title> — QuaPy 0.1.6 documentation</title>
|
||||
<link rel="stylesheet" type="text/css" href="_static/pygments.css" />
|
||||
<link rel="stylesheet" type="text/css" href="_static/bizstyle.css" />
|
||||
|
||||
<script data-url_root="./" id="documentation_options" src="_static/documentation_options.js"></script>
|
||||
<script src="_static/jquery.js"></script>
|
||||
<script src="_static/underscore.js"></script>
|
||||
<script src="_static/doctools.js"></script>
|
||||
<script src="_static/bizstyle.js"></script>
|
||||
<link rel="index" title="Index" href="genindex.html" />
|
||||
<link rel="search" title="Search" href="search.html" />
|
||||
<meta name="viewport" content="width=device-width,initial-scale=1.0" />
|
||||
<!--[if lt IE 9]>
|
||||
<script src="_static/css3-mediaqueries.js"></script>
|
||||
<![endif]-->
|
||||
</head><body>
|
||||
<div class="related" role="navigation" aria-label="related navigation">
|
||||
<h3>Navigation</h3>
|
||||
<ul>
|
||||
<li class="right" style="margin-right: 10px">
|
||||
<a href="genindex.html" title="General Index"
|
||||
accesskey="I">index</a></li>
|
||||
<li class="right" >
|
||||
<a href="py-modindex.html" title="Python Module Index"
|
||||
>modules</a> |</li>
|
||||
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.6 documentation</a> »</li>
|
||||
<li class="nav-item nav-item-this"><a href=""><no title></a></li>
|
||||
</ul>
|
||||
</div>
|
||||
|
||||
<div class="document">
|
||||
<div class="documentwrapper">
|
||||
<div class="bodywrapper">
|
||||
<div class="body" role="main">
|
||||
|
||||
<p>.. include:: ../../README.md</p>
|
||||
|
||||
|
||||
<div class="clearer"></div>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
<div class="sphinxsidebar" role="navigation" aria-label="main navigation">
|
||||
<div class="sphinxsidebarwrapper">
|
||||
<div role="note" aria-label="source link">
|
||||
<h3>This Page</h3>
|
||||
<ul class="this-page-menu">
|
||||
<li><a href="_sources/readme2.md.txt"
|
||||
rel="nofollow">Show Source</a></li>
|
||||
</ul>
|
||||
</div>
|
||||
<div id="searchbox" style="display: none" role="search">
|
||||
<h3 id="searchlabel">Quick search</h3>
|
||||
<div class="searchformwrapper">
|
||||
<form class="search" action="search.html" method="get">
|
||||
<input type="text" name="q" aria-labelledby="searchlabel" autocomplete="off" autocorrect="off" autocapitalize="off" spellcheck="false"/>
|
||||
<input type="submit" value="Go" />
|
||||
</form>
|
||||
</div>
|
||||
</div>
|
||||
<script>$('#searchbox').show(0);</script>
|
||||
</div>
|
||||
</div>
|
||||
<div class="clearer"></div>
|
||||
</div>
|
||||
<div class="related" role="navigation" aria-label="related navigation">
|
||||
<h3>Navigation</h3>
|
||||
<ul>
|
||||
<li class="right" style="margin-right: 10px">
|
||||
<a href="genindex.html" title="General Index"
|
||||
>index</a></li>
|
||||
<li class="right" >
|
||||
<a href="py-modindex.html" title="Python Module Index"
|
||||
>modules</a> |</li>
|
||||
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.6 documentation</a> »</li>
|
||||
<li class="nav-item nav-item-this"><a href=""><no title></a></li>
|
||||
</ul>
|
||||
</div>
|
||||
<div class="footer" role="contentinfo">
|
||||
© Copyright 2021, Alejandro Moreo.
|
||||
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 4.2.0.
|
||||
</div>
|
||||
</body>
|
||||
</html>
|
|
@ -2,11 +2,11 @@
|
|||
|
||||
<!doctype html>
|
||||
|
||||
<html>
|
||||
<html lang="en">
|
||||
<head>
|
||||
<meta charset="utf-8" />
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
|
||||
<title>Search — QuaPy 0.1.6 documentation</title>
|
||||
<title>Search — QuaPy 0.1.7 documentation</title>
|
||||
<link rel="stylesheet" type="text/css" href="_static/pygments.css" />
|
||||
<link rel="stylesheet" type="text/css" href="_static/bizstyle.css" />
|
||||
|
||||
|
@ -14,7 +14,9 @@
|
|||
<script data-url_root="./" id="documentation_options" src="_static/documentation_options.js"></script>
|
||||
<script src="_static/jquery.js"></script>
|
||||
<script src="_static/underscore.js"></script>
|
||||
<script src="_static/_sphinx_javascript_frameworks_compat.js"></script>
|
||||
<script src="_static/doctools.js"></script>
|
||||
<script src="_static/sphinx_highlight.js"></script>
|
||||
<script src="_static/bizstyle.js"></script>
|
||||
<script src="_static/searchtools.js"></script>
|
||||
<script src="_static/language_data.js"></script>
|
||||
|
@ -37,7 +39,7 @@
|
|||
<li class="right" >
|
||||
<a href="py-modindex.html" title="Python Module Index"
|
||||
>modules</a> |</li>
|
||||
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.6 documentation</a> »</li>
|
||||
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.7 documentation</a> »</li>
|
||||
<li class="nav-item nav-item-this"><a href="">Search</a></li>
|
||||
</ul>
|
||||
</div>
|
||||
|
@ -97,13 +99,13 @@
|
|||
<li class="right" >
|
||||
<a href="py-modindex.html" title="Python Module Index"
|
||||
>modules</a> |</li>
|
||||
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.6 documentation</a> »</li>
|
||||
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.7 documentation</a> »</li>
|
||||
<li class="nav-item nav-item-this"><a href="">Search</a></li>
|
||||
</ul>
|
||||
</div>
|
||||
<div class="footer" role="contentinfo">
|
||||
© Copyright 2021, Alejandro Moreo.
|
||||
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 4.2.0.
|
||||
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 5.3.0.
|
||||
</div>
|
||||
</body>
|
||||
</html>
|
File diff suppressed because one or more lines are too long
|
@ -0,0 +1,69 @@
|
|||
import quapy as qp
|
||||
from quapy.data import LabelledCollection
|
||||
from quapy.method.base import BaseQuantifier, BinaryQuantifier
|
||||
from quapy.model_selection import GridSearchQ
|
||||
from quapy.method.aggregative import PACC, AggregativeProbabilisticQuantifier
|
||||
from quapy.protocol import APP
|
||||
import numpy as np
|
||||
from sklearn.linear_model import LogisticRegression
|
||||
|
||||
|
||||
# Define a custom quantifier: for this example, we will consider a new quantification algorithm that uses a
|
||||
# logistic regressor for generating posterior probabilities, and then applies a custom threshold value to the
|
||||
# posteriors. Since the quantifier internally uses a classifier, it is an aggregative quantifier; and since it
|
||||
# relies on posterior probabilities, it is a probabilistic-aggregative quantifier. Note also it has an
|
||||
# internal hyperparameter (let's say, alpha) which is the decision threshold. Let's also assume the quantifier
|
||||
# is binary, for simplicity.
|
||||
|
||||
class MyQuantifier(AggregativeProbabilisticQuantifier, BinaryQuantifier):
|
||||
def __init__(self, classifier, alpha=0.5):
|
||||
self.alpha = alpha
|
||||
# aggregative quantifiers have an internal self.classifier attribute
|
||||
self.classifier = classifier
|
||||
|
||||
def fit(self, data: LabelledCollection, fit_classifier=True):
|
||||
assert fit_classifier, 'this quantifier needs to fit the classifier!'
|
||||
self.classifier.fit(*data.Xy)
|
||||
return self
|
||||
|
||||
# in general, we would need to implement the method quantify(self, instances) but, since this method is of
|
||||
# type aggregative, we can simply implement the method aggregate, which has the following interface
|
||||
def aggregate(self, classif_predictions: np.ndarray):
|
||||
# the posterior probabilities have already been generated by the quantify method; we only need to
|
||||
# specify what to do with them
|
||||
positive_probabilities = classif_predictions[:, 1]
|
||||
crisp_decisions = positive_probabilities > self.alpha
|
||||
pos_prev = crisp_decisions.mean()
|
||||
neg_prev = 1-pos_prev
|
||||
return np.asarray([neg_prev, pos_prev])
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
|
||||
qp.environ['SAMPLE_SIZE'] = 100
|
||||
|
||||
# define an instance of our custom quantifier
|
||||
quantifier = MyQuantifier(LogisticRegression(), alpha=0.5)
|
||||
|
||||
# load the IMDb dataset
|
||||
train, test = qp.datasets.fetch_reviews('imdb', tfidf=True, min_df=5).train_test
|
||||
|
||||
# model selection
|
||||
# let us assume we want to explore our hyperparameter alpha along with one hyperparameter of the classifier
|
||||
train, val = train.split_stratified(train_prop=0.75)
|
||||
param_grid = {
|
||||
'alpha': np.linspace(0, 1, 11), # quantifier-dependent hyperparameter
|
||||
'classifier__C': np.logspace(-2, 2, 5) # classifier-dependent hyperparameter
|
||||
}
|
||||
quantifier = GridSearchQ(quantifier, param_grid, protocol=APP(val), n_jobs=-1, verbose=True).fit(train)
|
||||
|
||||
# evaluation
|
||||
mae = qp.evaluation.evaluate(quantifier, protocol=APP(test), error_metric='mae')
|
||||
|
||||
print(f'MAE = {mae:.4f}')
|
||||
|
||||
# final remarks: this method is only for demonstration purposes and makes little sense in general. The method relies
|
||||
# on a hyperparameter alpha for binarizing the posterior probabilities. A much better way for fulfilling this
|
||||
# goal would be to calibrate the classifier (LogisticRegression is already reasonably well calibrated) and then
|
||||
# simply cut at 0.5.
|
||||
|
|
@ -0,0 +1,72 @@
|
|||
import quapy as qp
|
||||
from quapy.method.aggregative import newELM
|
||||
from quapy.method.base import newOneVsAll
|
||||
from quapy.model_selection import GridSearchQ
|
||||
from quapy.protocol import UPP
|
||||
|
||||
"""
|
||||
In this example, we will show how to define a quantifier based on explicit loss minimization (ELM).
|
||||
ELM is a family of quantification methods relying on structured output learning. In particular, we will
|
||||
showcase how to instantiate SVM(Q) as proposed by `Barranquero et al. 2015
|
||||
<https://www.sciencedirect.com/science/article/pii/S003132031400291X>`_, and SVM(KLD) and SVM(nKLD) as proposed by
|
||||
`Esuli et al. 2015 <https://dl.acm.org/doi/abs/10.1145/2700406>`_.
|
||||
|
||||
All ELM quantifiers rely on SVMperf for optimizing a structured loss function (Q, KLD, or nKLD). Since these are
|
||||
not part of the original SVMperf package by Joachims, you have to first download the SVMperf package, apply the
|
||||
patch svm-perf-quantification-ext.patch (provided with QuaPy library), and compile the sources.
|
||||
The script prepare_svmperf.sh takes care of all these steps. Simply run:
|
||||
|
||||
>>> ./prepare_svmperf.sh
|
||||
|
||||
Note that ELM quantifiers are nothing but a classify and count (CC) model instantiated with SVMperf as the
|
||||
underlying classifier. E.g., SVM(Q) comes down to:
|
||||
|
||||
>>> CC(SVMperf(svmperf_base, loss='q'))
|
||||
|
||||
This means that ELM quantifiers are aggregative quantifiers (since CC is an aggregative quantifier). QuaPy provides some helper
|
||||
functions to simplify this; for example:
|
||||
|
||||
>>> newSVMQ(svmperf_base)
|
||||
|
||||
returns an instance of SVM(Q) (i.e., an instance of CC properly set to work with SVMperf optimizing for Q).
|
||||
|
||||
Since we want to explore the losses, we will instead use newELM. For this example we will create a quantifier for tweet
|
||||
sentiment analysis considering three classes: negative, neutral, and positive. Since SVMperf is a binary classifier,
|
||||
our quantifier will be binary as well. We will use a one-vs-all approach to work in multiclass mode.
|
||||
For more details about how one-vs-all works, we refer to the example "one_vs_all.py" and to the API documentation.
|
||||
"""
|
||||
|
||||
qp.environ['SAMPLE_SIZE'] = 100
|
||||
qp.environ['N_JOBS'] = -1
|
||||
qp.environ['SVMPERF_HOME'] = '../svm_perf_quantification'
|
||||
|
||||
quantifier = newOneVsAll(newELM())
|
||||
print(f'the quantifier is an instance of {quantifier.__class__.__name__}')
|
||||
|
||||
# load a ternary dataset
|
||||
train_modsel, val = qp.datasets.fetch_twitter('hcr', for_model_selection=True, pickle=True).train_test
|
||||
|
||||
"""
|
||||
model selection:
|
||||
We explore the classifier's loss and the classifier's C hyperparameters.
|
||||
Since our model is actually an instance of OneVsAllAggregative, we need to add the prefix "binary_quantifier", and
|
||||
since our binary quantifier is an instance of CC, we need to add the prefix "classifier".
|
||||
"""
|
||||
param_grid = {
|
||||
'binary_quantifier__classifier__loss': ['q', 'kld', 'mae'], # classifier-dependent hyperparameter
|
||||
'binary_quantifier__classifier__C': [0.01, 1, 100], # classifier-dependent hyperparameter
|
||||
}
|
||||
print('starting model selection')
|
||||
model_selection = GridSearchQ(quantifier, param_grid, protocol=UPP(val), verbose=True, refit=False)
|
||||
quantifier = model_selection.fit(train_modsel).best_model()
|
||||
|
||||
print('training on the whole training set')
|
||||
train, test = qp.datasets.fetch_twitter('hcr', for_model_selection=False, pickle=True).train_test
|
||||
quantifier.fit(train)
|
||||
|
||||
# evaluation
|
||||
mae = qp.evaluation.evaluate(quantifier, protocol=UPP(test), error_metric='mae')
|
||||
|
||||
print(f'MAE = {mae:.4f}')
|
||||
|
||||
|
|
@ -0,0 +1,54 @@
|
|||
import numpy as np
|
||||
from sklearn.calibration import CalibratedClassifierCV
|
||||
from sklearn.linear_model import LogisticRegression
|
||||
import quapy as qp
|
||||
import quapy.functional as F
|
||||
from quapy.data.datasets import LEQUA2022_SAMPLE_SIZE, fetch_lequa2022
|
||||
from quapy.evaluation import evaluation_report
|
||||
from quapy.method.aggregative import EMQ
|
||||
from quapy.model_selection import GridSearchQ
|
||||
import pandas as pd
|
||||
|
||||
"""
|
||||
This example shows how to use the LeQua datasets (new in v0.1.7). For more information about the datasets, and the
|
||||
LeQua competition itself, check:
|
||||
https://lequa2022.github.io/index (the site of the competition)
|
||||
https://ceur-ws.org/Vol-3180/paper-146.pdf (the overview paper)
|
||||
"""
|
||||
|
||||
# there are 4 tasks (T1A, T1B, T2A, T2B)
|
||||
task = 'T1A'
|
||||
|
||||
# set the sample size in the environment. The sample size is task-dependent and can be consulted by doing:
|
||||
qp.environ['SAMPLE_SIZE'] = LEQUA2022_SAMPLE_SIZE[task]
|
||||
qp.environ['N_JOBS'] = -1
|
||||
|
||||
# the fetch method returns a training set (an instance of LabelledCollection) and two generators: one for the
|
||||
# validation set and another for the test sets. These generators are both instances of classes that extend
|
||||
# AbstractProtocol (i.e., classes that implement sampling generation procedures) and, in particular, are instances
|
||||
# of SamplesFromDir, a protocol that simply iterates over pre-generated samples (those provided for the competition)
|
||||
# stored in a directory.
|
||||
training, val_generator, test_generator = fetch_lequa2022(task=task)
|
||||
|
||||
# define the quantifier
|
||||
quantifier = EMQ(classifier=LogisticRegression())
|
||||
|
||||
# model selection
|
||||
param_grid = {
|
||||
'classifier__C': np.logspace(-3, 3, 7), # classifier-dependent: inverse of regularization strength
|
||||
'classifier__class_weight': ['balanced', None], # classifier-dependent: weights of each class
|
||||
'recalib': ['bcts', 'platt', None] # quantifier-dependent: recalibration method (new in v0.1.7)
|
||||
}
|
||||
model_selection = GridSearchQ(quantifier, param_grid, protocol=val_generator, error='mrae', refit=False, verbose=True)
|
||||
quantifier = model_selection.fit(training)
|
||||
|
||||
# evaluation
|
||||
report = evaluation_report(quantifier, protocol=test_generator, error_metrics=['mae', 'mrae', 'mkld'], verbose=True)
|
||||
|
||||
# printing results
|
||||
pd.set_option('display.expand_frame_repr', False)
|
||||
report['estim-prev'] = report['estim-prev'].map(F.strprev)
|
||||
print(report)
|
||||
|
||||
print('Averaged values:')
|
||||
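print(report.mean(numeric_only=True))  # average only the numeric (error) columns; the prevalence columns are not numeric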
|
|
@ -0,0 +1,63 @@
|
|||
import numpy as np
|
||||
from abstention.calibration import NoBiasVectorScaling, VectorScaling, TempScaling
|
||||
from sklearn.calibration import CalibratedClassifierCV
|
||||
from sklearn.linear_model import LogisticRegression
|
||||
import quapy as qp
|
||||
import quapy.functional as F
|
||||
from quapy.classification.calibration import RecalibratedProbabilisticClassifierBase, NBVSCalibration, \
|
||||
BCTSCalibration
|
||||
from quapy.data.datasets import LEQUA2022_SAMPLE_SIZE, fetch_lequa2022
|
||||
from quapy.evaluation import evaluation_report
|
||||
from quapy.method.aggregative import EMQ
|
||||
from quapy.model_selection import GridSearchQ
|
||||
import pandas as pd
|
||||
|
||||
for task in ['T1A', 'T1B']:
|
||||
|
||||
# calibration = TempScaling(verbose=False, bias_positions='all')
|
||||
|
||||
qp.environ['SAMPLE_SIZE'] = LEQUA2022_SAMPLE_SIZE[task]
|
||||
training, val_generator, test_generator = fetch_lequa2022(task=task)
|
||||
|
||||
# define the quantifier
|
||||
# learner = BCTSCalibration(LogisticRegression(), n_jobs=-1)
|
||||
# learner = CalibratedClassifierCV(LogisticRegression())
|
||||
learner = LogisticRegression()
|
||||
quantifier = EMQ(classifier=learner)
|
||||
|
||||
# model selection
|
||||
param_grid = {
|
||||
'classifier__C': np.logspace(-3, 3, 7),
|
||||
'classifier__class_weight': ['balanced', None],
|
||||
'recalib': ['platt', 'ts', 'vs', 'nbvs', 'bcts', None],
|
||||
'exact_train_prev': [False, True]
|
||||
}
|
||||
model_selection = GridSearchQ(quantifier, param_grid, protocol=val_generator, error='mrae', n_jobs=-1, refit=False, verbose=True)
|
||||
quantifier = model_selection.fit(training)
|
||||
|
||||
# evaluation
|
||||
report = evaluation_report(quantifier, protocol=test_generator, error_metrics=['mae', 'mrae', 'mkld'], verbose=True)
|
||||
|
||||
# import os
|
||||
# os.makedirs(f'./out', exist_ok=True)
|
||||
# with open(f'./out/EMQ_{calib}_{task}.txt', 'wt') as foo:
|
||||
# estim_prev = report['estim-prev'].values
|
||||
# nclasses = len(estim_prev[0])
|
||||
# foo.write(f'id,'+','.join([str(x) for x in range(nclasses)])+'\n')
|
||||
# for id, prev in enumerate(estim_prev):
|
||||
# foo.write(f'{id},'+','.join([f'{p:.5f}' for p in prev])+'\n')
|
||||
#
|
||||
# #os.makedirs(f'./errors/{task}', exist_ok=True)
|
||||
# with open(f'./out/EMQ_{calib}_{task}_errors.txt', 'wt') as foo:
|
||||
# maes, mraes = report['mae'].values, report['mrae'].values
|
||||
# foo.write(f'id,AE,RAE\n')
|
||||
# for id, (ae_i, rae_i) in enumerate(zip(maes, mraes)):
|
||||
# foo.write(f'{id},{ae_i:.5f},{rae_i:.5f}\n')
|
||||
|
||||
# printing results
|
||||
pd.set_option('display.expand_frame_repr', False)
|
||||
report['estim-prev'] = report['estim-prev'].map(F.strprev)
|
||||
print(report)
|
||||
|
||||
print('Averaged values:')
|
||||
print(report.mean(numeric_only=True))
|
|
@ -0,0 +1,57 @@
|
|||
import quapy as qp
|
||||
from quapy.protocol import APP
|
||||
from quapy.method.aggregative import DistributionMatching
|
||||
from sklearn.linear_model import LogisticRegression
|
||||
import numpy as np
|
||||
|
||||
"""
|
||||
In this example, we show how to perform model selection on a DistributionMatching quantifier.
|
||||
"""
|
||||
|
||||
model = DistributionMatching(LogisticRegression())
|
||||
|
||||
qp.environ['SAMPLE_SIZE'] = 100
|
||||
qp.environ['N_JOBS'] = -1
|
||||
|
||||
training, test = qp.datasets.fetch_reviews('imdb', tfidf=True, min_df=5).train_test
|
||||
|
||||
# The model will be returned by the fit method of GridSearchQ.
|
||||
# Every combination of hyper-parameters will be evaluated by confronting the
|
||||
# quantifier thus configured against a series of samples generated by means
|
||||
# of a sample generation protocol. For this example, we will use the
|
||||
# artificial-prevalence protocol (APP), that generates samples with prevalence
|
||||
# values in the entire range of values from a grid (e.g., [0, 0.1, 0.2, ..., 1]).
|
||||
# We devote 30% of the dataset for this exploration.
|
||||
training, validation = training.split_stratified(train_prop=0.7)
|
||||
protocol = APP(validation)
|
||||
|
||||
# We will explore a classification-dependent hyper-parameter (e.g., the 'C'
|
||||
# hyper-parameter of LogisticRegression) and a quantification-dependent hyper-parameter
|
||||
# (e.g., the number of bins in a DistributionMatching quantifier).
|
||||
# Classifier-dependent hyper-parameters have to be marked with a prefix "classifier__"
|
||||
# in order to let the quantifier know this hyper-parameter belongs to its underlying
|
||||
# classifier.
|
||||
param_grid = {
|
||||
'classifier__C': np.logspace(-3,3,7),
|
||||
'nbins': [8, 16, 32, 64],
|
||||
}
|
||||
|
||||
model = qp.model_selection.GridSearchQ(
|
||||
model=model,
|
||||
param_grid=param_grid,
|
||||
protocol=protocol,
|
||||
error='mae', # the error to optimize is the MAE (a quantification-oriented loss)
|
||||
refit=True, # retrain on the whole labelled set once done
|
||||
verbose=True # show information as the process goes on
|
||||
).fit(training)
|
||||
|
||||
print(f'model selection ended: best hyper-parameters={model.best_params_}')
|
||||
model = model.best_model_
|
||||
|
||||
# evaluation in terms of MAE
|
||||
# we use the same evaluation protocol (APP) on the test set
|
||||
mae_score = qp.evaluation.evaluate(model, protocol=APP(test), error_metric='mae')
|
||||
|
||||
print(f'MAE={mae_score:.5f}')
|
||||
|
||||
|
|
@ -0,0 +1,54 @@
|
|||
import quapy as qp
|
||||
from quapy.method.aggregative import MS2
|
||||
from quapy.method.base import newOneVsAll
|
||||
from quapy.model_selection import GridSearchQ
|
||||
from quapy.protocol import UPP
|
||||
from sklearn.linear_model import LogisticRegression
|
||||
import numpy as np
|
||||
|
||||
"""
|
||||
In this example, we will create a quantifier for tweet sentiment analysis considering three classes: negative, neutral,
|
||||
and positive. We will use a one-vs-all approach using a binary quantifier for demonstration purposes.
|
||||
"""
|
||||
|
||||
qp.environ['SAMPLE_SIZE'] = 100
|
||||
qp.environ['N_JOBS'] = -1
|
||||
|
||||
"""
|
||||
Any binary quantifier can be turned into a single-label quantifier by means of the newOneVsAll function.
|
||||
This function returns an instance of a OneVsAll quantifier. More precisely, it returns the subclass OneVsAllGeneric
|
||||
when the quantifier is an instance of BaseQuantifier, and it returns OneVsAllAggregative when the quantifier is
|
||||
an instance of AggregativeQuantifier. Although OneVsAllGeneric works in all cases, using OneVsAllAggregative has
|
||||
some additional advantages (namely, all the advantages that AggregativeQuantifiers enjoy, i.e., faster predictions
|
||||
during evaluation).
|
||||
"""
|
||||
quantifier = newOneVsAll(MS2(LogisticRegression()))
|
||||
print(f'the quantifier is an instance of {quantifier.__class__.__name__}')
|
||||
|
||||
# load a ternary dataset
|
||||
train_modsel, val = qp.datasets.fetch_twitter('hcr', for_model_selection=True, pickle=True).train_test
|
||||
|
||||
"""
|
||||
model selection: for this example, we are relying on the UPP protocol, i.e., a variant of the
|
||||
artificial-prevalence protocol that generates random samples (100 in this case) for randomly picked priors
|
||||
from the unit simplex. The priors are sampled using the Kraemer algorithm. Note this is in contrast to the
|
||||
standard APP protocol, which instead explores a predefined grid of prevalence values.
|
||||
"""
|
||||
param_grid = {
|
||||
'binary_quantifier__classifier__C': np.logspace(-2,2,5), # classifier-dependent hyperparameter
|
||||
'binary_quantifier__classifier__class_weight': ['balanced', None] # classifier-dependent hyperparameter
|
||||
}
|
||||
print('starting model selection')
|
||||
model_selection = GridSearchQ(quantifier, param_grid, protocol=UPP(val), verbose=True, refit=False)
|
||||
quantifier = model_selection.fit(train_modsel).best_model()
|
||||
|
||||
print('training on the whole training set')
|
||||
train, test = qp.datasets.fetch_twitter('hcr', for_model_selection=False, pickle=True).train_test
|
||||
quantifier.fit(train)
|
||||
|
||||
# evaluation
|
||||
mae = qp.evaluation.evaluate(quantifier, protocol=UPP(test), error_metric='mae')
|
||||
|
||||
print(f'MAE = {mae:.4f}')
|
||||
|
||||
|
|
@ -0,0 +1,35 @@
|
|||
import quapy as qp
|
||||
from quapy.classification.neural import CNNnet
|
||||
from quapy.classification.neural import NeuralClassifierTrainer
|
||||
from quapy.method.meta import QuaNet
|
||||
import quapy.functional as F
|
||||
|
||||
"""
|
||||
This example shows how to train QuaNet. The internal classifier is a word-based CNN.
|
||||
"""
|
||||
|
||||
# set the sample size in the environment
|
||||
qp.environ["SAMPLE_SIZE"] = 100
|
||||
|
||||
# the dataset is textual (Kindle reviews from Amazon), so we need to index terms, i.e.,
|
||||
# we need to convert distinct terms into numerical ids
|
||||
dataset = qp.datasets.fetch_reviews('kindle', pickle=True)
|
||||
qp.data.preprocessing.index(dataset, min_df=5, inplace=True)
|
||||
train, test = dataset.train_test
|
||||
|
||||
# train the text classifier:
|
||||
cnn_module = CNNnet(dataset.vocabulary_size, dataset.training.n_classes)
|
||||
cnn_classifier = NeuralClassifierTrainer(cnn_module, device='cuda')
|
||||
cnn_classifier.fit(*dataset.training.Xy)
|
||||
|
||||
# train QuaNet (alternatively, we can set fit_classifier=True and let QuaNet train the classifier)
|
||||
quantifier = QuaNet(cnn_classifier, device='cuda')
|
||||
quantifier.fit(train, fit_classifier=False)
|
||||
|
||||
# prediction and evaluation
|
||||
estim_prevalence = quantifier.quantify(test.instances)
|
||||
mae = qp.error.mae(test.prevalence(), estim_prevalence)
|
||||
|
||||
print(f'true prevalence: {F.strprev(test.prevalence())}')
|
||||
print(f'estim prevalence: {F.strprev(estim_prevalence)}')
|
||||
print(f'MAE = {mae:.4f}')
|
|
@ -0,0 +1,85 @@
|
|||
Change Log 0.1.7
|
||||
----------------
|
||||
|
||||
- Protocols are now abstracted as instances of AbstractProtocol. There is a new class extending AbstractProtocol called
|
||||
AbstractStochasticSeededProtocol, which implements a seeding policy that allows the series of samplings to be replicated.
|
||||
Some examples of protocols are APP, NPP, UPP, and DomainMixer (experimental).
|
||||
The idea is to start the sample generation by simply calling the __call__ method.
|
||||
This change has a great impact on the framework, since many functions in qp.evaluation, qp.model_selection,
|
||||
and sampling functions in LabelledCollection relied on the old functions. E.g., the functionality of
|
||||
qp.evaluation.artificial_prevalence_report or qp.evaluation.natural_prevalence_report is now obtained by means of
|
||||
qp.evaluation.evaluation_report, which takes a protocol as an argument. I have not maintained compatibility with the old
|
||||
interfaces because I did not really like them. Check the wiki guide and the examples for more details.
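
The seeding idea described above can be sketched as follows. This is only an illustration under stated assumptions: the class name `SeededSamplingSketch` and its parameters are hypothetical and are not QuaPy's actual classes; the sketch only shows how re-creating the random generator inside `__call__` makes every run replay the same series of samples.

```python
import numpy as np


class SeededSamplingSketch:
    """Hypothetical stand-in for a seeded sampling protocol (not QuaPy's API)."""

    def __init__(self, n_items, sample_size, repeats, random_state=0):
        self.n_items = n_items            # size of the collection to sample from
        self.sample_size = sample_size    # number of instances per sample
        self.repeats = repeats            # number of samples to generate
        self.random_state = random_state  # seed used on every call

    def __call__(self):
        # A fresh generator is created on every call, so each call
        # replays exactly the same series of sample indexes.
        rng = np.random.default_rng(self.random_state)
        for _ in range(self.repeats):
            yield rng.choice(self.n_items, size=self.sample_size, replace=False)


protocol = SeededSamplingSketch(n_items=1000, sample_size=100, repeats=5)
first_run = [indexes.tolist() for indexes in protocol()]
second_run = [indexes.tolist() for indexes in protocol()]
print(first_run == second_run)
```

Calling the object twice yields the same index series, which is the property that lets model selection and evaluation be compared on identical samples.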
|
||||
|
||||
- Exploration of hyperparameters in Model selection can now be run in parallel (there was a n_jobs argument in
|
||||
QuaPy 0.1.6 but only the evaluation part for one specific hyperparameter was run in parallel).
|
||||
|
||||
- The prediction function has been refactored, so it applies the optimization for aggregative quantifiers (that
|
||||
consists in pre-classifying all instances, and then only invoking aggregate on the samples) only in cases in
|
||||
which the total number of classifications would be smaller than the number of classifications with the standard
|
||||
procedure. The user can now specify "force", "auto", True or False, in order to actively decide whether to apply it
|
||||
or not.
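
The decision rule described above can be sketched as a simple comparison of classification counts. The function name `should_preclassify` and its arguments are hypothetical, and treating True as equivalent to "auto" is an assumption of this sketch rather than something stated in this change log.

```python
def should_preclassify(n_instances, n_samples, sample_size, mode="auto"):
    """Decide whether to classify the whole collection once and then only
    aggregate per sample (hypothetical helper, not QuaPy's actual function)."""
    if mode is False:
        return False   # optimization explicitly disabled
    if mode == "force":
        return True    # optimization applied regardless of the cost
    if mode is True or mode == "auto":
        # Pre-classifying costs n_instances classifications, whereas the
        # standard procedure costs n_samples * sample_size classifications.
        return n_instances < n_samples * sample_size
    raise ValueError(f"unknown mode: {mode!r}")


# 1,000 instances vs. 100 samples of 100 instances each: pre-classify.
print(should_preclassify(1_000, n_samples=100, sample_size=100))
# 100,000 instances vs. 10 samples of 100 instances each: do not.
print(should_preclassify(100_000, n_samples=10, sample_size=100))
```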
|
||||
|
||||
- examples directory created!
|
||||
|
||||
- DyS, Topsoe distance and binary search (thanks to Pablo González)
|
||||
|
||||
- Multi-thread reproducibility via seeding (thanks to Pablo González)
|
||||
|
||||
- n_jobs is now taken from the environment if set to None
|
||||
|
||||
- ACC, PACC, Forman's threshold variants have been parallelized.
|
||||
|
||||
- cross_val_predict (for quantification) added to model_selection: would be nice to allow the user to specify a
|
||||
test protocol maybe, or None for bypassing it?
|
||||
|
||||
- Bugfix: adding two labelled collections (with +) now checks for consistency in the classes
|
||||
|
||||
- newer versions of numpy raise a warning when accessing types (e.g., np.float). I have replaced all such instances
|
||||
with the plain python type (e.g., float).
|
||||
|
||||
- new dependency "abstention" (to add to the project requirements and setup). Calibration methods from
|
||||
https://github.com/kundajelab/abstention added.
|
||||
|
||||
- the internal classifier of aggregative methods is now called "classifier" instead of "learner"
|
||||
|
||||
- when optimizing the hyperparameters of an aggregative quantifier, the classifier's specific hyperparameters
|
||||
should be marked with a "classifier__" prefix (just like in scikit-learn with estimators), while the quantifier's
|
||||
specific hyperparameters are named directly. For example, PCC(LogisticRegression()) quantifier has hyperparameters
|
||||
"classifier__C", "classifier__class_weight", etc., instead of "C" and "class_weight" as in v0.1.6.
|
||||
|
||||
- hyperparameters leading to inconsistent runs raise a ValueError exception, while hyperparameter combinations
|
||||
leading to internal errors of surrogate functions are reported and skipped, without stopping the grid search.
|
||||
|
||||
- DistributionMatching methods added. This is a general framework for distribution matching methods that caters for
|
||||
multiclass quantification. That is to say, one could get a multiclass variant of the (originally binary) HDy
|
||||
method aligned with Firat's formulation.
|
||||
|
||||
- internal method properties "binary", "aggregative", and "probabilistic" have been removed; these conditions are
|
||||
checked via isinstance
|
||||
|
||||
- quantifiers (i.e., classes that inherit from BaseQuantifier) are not forced to implement classes_ or n_classes;
|
||||
these can be used anyway internally, but the framework will not suppose (nor impose) that a quantifier implements
|
||||
them
|
||||
|
||||
- qp.evaluation.prediction has been optimized so that, if a quantifier is of type aggregative, and if the evaluation
|
||||
protocol is of type OnLabelledCollection, then the computation is faster. In this specific case, the predictions
|
||||
are issued only once and for all, and not for each sample. An exception to this (which is also implemented) is
|
||||
when the number of instances across all samples is anyway smaller than the number of instances in the original
|
||||
labelled collection; in this case the heuristic is of no help, and is therefore not applied.
|
||||
|
||||
- the distinction between "classify" and "posterior_probabilities" has been removed in Aggregative quantifiers,
|
||||
so that probabilistic classifiers return posterior probabilities, while non-probabilistic classifiers
|
||||
return crisp decisions.
|
||||
|
||||
- OneVsAll fixed. There are now two classes: a generic one OneVsAllGeneric that works with any quantifier (e.g.,
|
||||
any instance of BaseQuantifier), and a subclass of it called OneVsAllAggregative which implements the
|
||||
classify / aggregate interface. Both are instances of OneVsAll. There is a method getOneVsAll that returns the
|
||||
best instance based on the type of quantifier.
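The pre-classification speed-up described in the changelog entries above can be illustrated with a minimal sketch. It does not use QuaPy: `should_preclassify` and its arguments are hypothetical names, and the changelog does not say how True differs from "auto", so this sketch assumes they behave alike.

```python
def should_preclassify(collection_size, sample_sizes, mode="auto"):
    """Decide whether to classify the whole collection once, up front.

    Hypothetical helper (not part of QuaPy). `collection_size` is the number
    of instances in the original labelled collection, and `sample_sizes`
    lists the sizes of the samples the protocol will generate.
    """
    if mode is False:
        return False                      # never apply the speed-up
    if mode == "force":
        return True                       # always apply it
    if mode is True or mode == "auto":
        # Assumption: True and "auto" both let the heuristic decide.
        # Pre-classifying costs `collection_size` classifications, while the
        # standard procedure costs one classification per sampled instance.
        return collection_size < sum(sample_sizes)
    raise ValueError(f"unknown mode: {mode!r}")
```

With 50 samples of 100 instances drawn from a collection of 1000 instances, the standard procedure would classify 5000 instances, so pre-classifying the 1000 original instances is cheaper. With only 5 such samples, it is not.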
|
||||
|
||||
Things to fix:
|
||||
--------------
|
||||
- update unit tests
|
||||
- improve plots
|
||||
- svmperf clean temp dirs; check also ELM when instantiated using SVMperf directly
|
||||
|
|
@ -2,15 +2,15 @@ from . import error
|
|||
from . import data
|
||||
from quapy.data import datasets
|
||||
from . import functional
|
||||
from . import method
|
||||
# from . import method
|
||||
from . import evaluation
|
||||
from . import protocol
|
||||
from . import plot
|
||||
from . import util
|
||||
from . import model_selection
|
||||
from . import classification
|
||||
from quapy.method.base import isprobabilistic, isaggregative
|
||||
|
||||
__version__ = '0.1.6'
|
||||
__version__ = '0.1.7'
|
||||
|
||||
environ = {
|
||||
'SAMPLE_SIZE': None,
|
||||
|
@ -18,8 +18,33 @@ environ = {
|
|||
'UNK_INDEX': 0,
|
||||
'PAD_TOKEN': '[PAD]',
|
||||
'PAD_INDEX': 1,
|
||||
'SVMPERF_HOME': './svm_perf_quantification'
|
||||
'SVMPERF_HOME': './svm_perf_quantification',
|
||||
'N_JOBS': 1
|
||||
}
|
||||
|
||||
def isbinary(x):
|
||||
return x.binary
|
||||
|
||||
def _get_njobs(n_jobs):
|
||||
"""
|
||||
If `n_jobs` is None, then it returns `environ['N_JOBS']`; otherwise, it returns `n_jobs`.
|
||||
|
||||
:param n_jobs: the number of `n_jobs` or None if not specified
|
||||
:return: int
|
||||
"""
|
||||
return environ['N_JOBS'] if n_jobs is None else n_jobs
|
||||
|
||||
|
||||
def _get_sample_size(sample_size):
|
||||
"""
|
||||
If `sample_size` is None, then it returns `environ['SAMPLE_SIZE']`; otherwise, it returns `sample_size`.
|
||||
If none of these are set, then a ValueError exception is raised.
|
||||
|
||||
:param sample_size: integer or None
|
||||
:return: int
|
||||
"""
|
||||
sample_size = environ['SAMPLE_SIZE'] if sample_size is None else sample_size
|
||||
if sample_size is None:
|
||||
raise ValueError('neither sample_size nor qp.environ["SAMPLE_SIZE"] have been specified')
|
||||
return sample_size
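The fallback pattern used by the two helpers above can be tried in isolation. The following is a standalone sketch that does not import QuaPy: the local `environ` dict stands in for `qp.environ`, and the function names drop the leading underscore only for illustration.

```python
# Stand-in for qp.environ (assumption: only the two keys used here)
environ = {"SAMPLE_SIZE": None, "N_JOBS": 1}


def get_njobs(n_jobs=None):
    # An explicit argument wins; None falls back to the environment value
    return environ["N_JOBS"] if n_jobs is None else n_jobs


def get_sample_size(sample_size=None):
    # Same fallback, but there is no usable default: fail loudly instead
    sample_size = environ["SAMPLE_SIZE"] if sample_size is None else sample_size
    if sample_size is None:
        raise ValueError("neither sample_size nor environ['SAMPLE_SIZE'] has been specified")
    return sample_size
```

Setting `environ["SAMPLE_SIZE"] = 100` once makes every later call to `get_sample_size()` return 100, while `get_sample_size(250)` still honours the explicit argument.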
|
||||
|
||||
|
||||
|
||||
|
|
|
@ -0,0 +1,215 @@
|
|||
from copy import deepcopy
|
||||
|
||||
from abstention.calibration import NoBiasVectorScaling, TempScaling, VectorScaling
|
||||
from sklearn.base import BaseEstimator, clone
|
||||
from sklearn.model_selection import cross_val_predict, train_test_split
|
||||
import numpy as np
|
||||
|
||||
|
||||
# Wrappers of calibration defined by Alexandari et al. in paper <http://proceedings.mlr.press/v119/alexandari20a.html>
|
||||
# requires "pip install abstention"
|
||||
# see https://github.com/kundajelab/abstention
|
||||
|
||||
|
||||
class RecalibratedProbabilisticClassifier:
|
||||
"""
|
||||
Abstract class for (re)calibration method from `abstention.calibration`, as defined in
|
||||
`Alexandari, A., Kundaje, A., & Shrikumar, A. (2020, November). Maximum likelihood with bias-corrected calibration
|
||||
is hard-to-beat at label shift adaptation. In International Conference on Machine Learning (pp. 222-232). PMLR.
|
||||
<http://proceedings.mlr.press/v119/alexandari20a.html>`_:
|
||||
"""
|
||||
pass
|
||||
|
||||
|
||||
class RecalibratedProbabilisticClassifierBase(BaseEstimator, RecalibratedProbabilisticClassifier):
|
||||
"""
|
||||
Applies a (re)calibration method from `abstention.calibration`, as defined in
|
||||
`Alexandari et al. paper <http://proceedings.mlr.press/v119/alexandari20a.html>`_:
|
||||
|
||||
:param classifier: a scikit-learn probabilistic classifier
|
||||
:param calibrator: the calibration object (an instance of abstention.calibration.CalibratorFactory)
|
||||
:param val_split: indicate an integer k for performing kFCV to obtain the posterior probabilities, or a float p
|
||||
in (0,1) to indicate that the posteriors are obtained in a stratified validation split containing p% of the
|
||||
training instances (the rest is used for training). In any case, the classifier is retrained in the whole
|
||||
training set afterwards. Default value is 5.
|
||||
:param n_jobs: indicate the number of parallel workers (only when val_split is an integer); default=None
|
||||
:param verbose: whether or not to display information in the standard output
|
||||
"""
|
||||
|
||||
def __init__(self, classifier, calibrator, val_split=5, n_jobs=None, verbose=False):
|
||||
self.classifier = classifier
|
||||
self.calibrator = calibrator
|
||||
self.val_split = val_split
|
||||
self.n_jobs = n_jobs
|
||||
self.verbose = verbose
|
||||
|
||||
def fit(self, X, y):
|
||||
"""
|
||||
Fits the calibration for the probabilistic classifier.
|
||||
|
||||
:param X: array-like of shape `(n_samples, n_features)` with the data instances
|
||||
:param y: array-like of shape `(n_samples,)` with the class labels
|
||||
:return: self
|
||||
"""
|
||||
k = self.val_split
|
||||
if isinstance(k, int):
|
||||
if k < 2:
|
||||
raise ValueError('wrong value for val_split: the number of folds must be >= 2')
|
||||
return self.fit_cv(X, y)
|
||||
elif isinstance(k, float):
|
||||
if not (0 < k < 1):
|
||||
raise ValueError('wrong value for val_split: the proportion of validation documents must be in (0,1)')
|
||||
return self.fit_tr_val(X, y)
|
||||
|
||||
def fit_cv(self, X, y):
|
||||
"""
|
||||
Fits the calibration in a cross-validation manner, i.e., it generates posterior probabilities for all
|
||||
training instances via cross-validation, and then retrains the classifier on all training instances.
|
||||
The posterior probabilities thus generated are used for calibrating the outputs of the classifier.
|
||||
|
||||
:param X: array-like of shape `(n_samples, n_features)` with the data instances
|
||||
:param y: array-like of shape `(n_samples,)` with the class labels
|
||||
:return: self
|
||||
"""
|
||||
posteriors = cross_val_predict(
|
||||
self.classifier, X, y, cv=self.val_split, n_jobs=self.n_jobs, verbose=self.verbose, method='predict_proba'
|
||||
)
|
||||
self.classifier.fit(X, y)
|
||||
nclasses = len(np.unique(y))
|
||||
self.calibration_function = self.calibrator(posteriors, np.eye(nclasses)[y], posterior_supplied=True)
|
||||
return self
|
||||
|
||||
def fit_tr_val(self, X, y):
|
||||
"""
|
||||
Fits the calibration in a train/val-split manner, i.e., it partitions the training instances into a
|
||||
training and a validation set, and then uses the training samples to learn a classifier which is then used
|
||||
to generate posterior probabilities for the held-out validation data. These posteriors are used to calibrate
|
||||
the classifier. The classifier is not retrained on the whole dataset.
|
||||
|
||||
:param X: array-like of shape `(n_samples, n_features)` with the data instances
|
||||
:param y: array-like of shape `(n_samples,)` with the class labels
|
||||
:return: self
|
||||
"""
|
||||
Xtr, Xva, ytr, yva = train_test_split(X, y, test_size=self.val_split, stratify=y)
|
||||
self.classifier.fit(Xtr, ytr)
|
||||
posteriors = self.classifier.predict_proba(Xva)
|
||||
nclasses = len(np.unique(yva))
|
||||
self.calibration_function = self.calibrator(posteriors, np.eye(nclasses)[yva], posterior_supplied=True)
|
||||
return self
|
||||
|
||||
def predict(self, X):
|
||||
"""
|
||||
Predicts class labels for the data instances in `X`
|
||||
|
||||
:param X: array-like of shape `(n_samples, n_features)` with the data instances
|
||||
:return: array-like of shape `(n_samples,)` with the class label predictions
|
||||
"""
|
||||
return self.classifier.predict(X)
|
||||
|
||||
def predict_proba(self, X):
|
||||
"""
|
||||
Generates posterior probabilities for the data instances in `X`
|
||||
|
||||
:param X: array-like of shape `(n_samples, n_features)` with the data instances
|
||||
:return: array-like of shape `(n_samples, n_classes)` with posterior probabilities
|
||||
"""
|
||||
posteriors = self.classifier.predict_proba(X)
|
||||
return self.calibration_function(posteriors)
|
||||
|
||||
@property
|
||||
def classes_(self):
|
||||
"""
|
||||
Returns the classes on which the classifier has been trained
|
||||
|
||||
:return: array-like of shape `(n_classes)`
|
||||
"""
|
||||
return self.classifier.classes_
|
||||
|
||||
|
||||
class NBVSCalibration(RecalibratedProbabilisticClassifierBase):
|
||||
"""
|
||||
Applies the No-Bias Vector Scaling (NBVS) calibration method from `abstention.calibration`, as defined in
|
||||
`Alexandari et al. paper <http://proceedings.mlr.press/v119/alexandari20a.html>`_:
|
||||
|
||||
:param classifier: a scikit-learn probabilistic classifier
|
||||
:param val_split: indicate an integer k for performing kFCV to obtain the posterior probabilities, or a float p
|
||||
in (0,1) to indicate that the posteriors are obtained in a stratified validation split containing p% of the
|
||||
training instances (the rest is used for training). In any case, the classifier is retrained in the whole
|
||||
training set afterwards. Default value is 5.
|
||||
:param n_jobs: indicate the number of parallel workers (only when val_split is an integer)
|
||||
:param verbose: whether or not to display information in the standard output
|
||||
"""
|
||||
|
||||
def __init__(self, classifier, val_split=5, n_jobs=None, verbose=False):
|
||||
self.classifier = classifier
|
||||
self.calibrator = NoBiasVectorScaling(verbose=verbose)
|
||||
self.val_split = val_split
|
||||
self.n_jobs = n_jobs
|
||||
self.verbose = verbose
|
||||
|
||||
|
||||
class BCTSCalibration(RecalibratedProbabilisticClassifierBase):
|
||||
"""
|
||||
Applies the Bias-Corrected Temperature Scaling (BCTS) calibration method from `abstention.calibration`, as defined in
|
||||
`Alexandari et al. paper <http://proceedings.mlr.press/v119/alexandari20a.html>`_:
|
||||
|
||||
:param classifier: a scikit-learn probabilistic classifier
|
||||
:param val_split: indicate an integer k for performing kFCV to obtain the posterior probabilities, or a float p
|
||||
in (0,1) to indicate that the posteriors are obtained in a stratified validation split containing p% of the
|
||||
training instances (the rest is used for training). In any case, the classifier is retrained in the whole
|
||||
training set afterwards. Default value is 5.
|
||||
:param n_jobs: indicate the number of parallel workers (only when val_split is an integer)
|
||||
:param verbose: whether or not to display information in the standard output
|
||||
"""
|
||||
|
||||
def __init__(self, classifier, val_split=5, n_jobs=None, verbose=False):
|
||||
self.classifier = classifier
|
||||
self.calibrator = TempScaling(verbose=verbose, bias_positions='all')
|
||||
self.val_split = val_split
|
||||
self.n_jobs = n_jobs
|
||||
self.verbose = verbose
|
||||
|
||||
|
||||
class TSCalibration(RecalibratedProbabilisticClassifierBase):
|
||||
"""
|
||||
Applies the Temperature Scaling (TS) calibration method from `abstention.calibration`, as defined in
|
||||
`Alexandari et al. paper <http://proceedings.mlr.press/v119/alexandari20a.html>`_:
|
||||
|
||||
:param classifier: a scikit-learn probabilistic classifier
|
||||
:param val_split: indicate an integer k for performing kFCV to obtain the posterior probabilities, or a float p
|
||||
in (0,1) to indicate that the posteriors are obtained in a stratified validation split containing p% of the
|
||||
training instances (the rest is used for training). In any case, the classifier is retrained in the whole
|
||||
training set afterwards. Default value is 5.
|
||||
:param n_jobs: indicate the number of parallel workers (only when val_split is an integer)
|
||||
:param verbose: whether or not to display information in the standard output
|
||||
"""
|
||||
|
||||
def __init__(self, classifier, val_split=5, n_jobs=None, verbose=False):
|
||||
self.classifier = classifier
|
||||
self.calibrator = TempScaling(verbose=verbose)
|
||||
self.val_split = val_split
|
||||
self.n_jobs = n_jobs
|
||||
self.verbose = verbose
|
||||
|
||||
|
||||
class VSCalibration(RecalibratedProbabilisticClassifierBase):
|
||||
"""
|
||||
Applies the Vector Scaling (VS) calibration method from `abstention.calibration`, as defined in
|
||||
`Alexandari et al. paper <http://proceedings.mlr.press/v119/alexandari20a.html>`_:
|
||||
|
||||
:param classifier: a scikit-learn probabilistic classifier
|
||||
:param val_split: indicate an integer k for performing kFCV to obtain the posterior probabilities, or a float p
|
||||
in (0,1) to indicate that the posteriors are obtained in a stratified validation split containing p% of the
|
||||
training instances (the rest is used for training). In any case, the classifier is retrained in the whole
|
||||
training set afterwards. Default value is 5.
|
||||
:param n_jobs: indicate the number of parallel workers (only when val_split is an integer)
|
||||
:param verbose: whether or not to display information in the standard output
|
||||
"""
|
||||
|
||||
def __init__(self, classifier, val_split=5, n_jobs=None, verbose=False):
|
||||
self.classifier = classifier
|
||||
self.calibrator = VectorScaling(verbose=verbose)
|
||||
self.val_split = val_split
|
||||
self.n_jobs = n_jobs
|
||||
self.verbose = verbose
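To give an intuition of what the TS wrapper above delegates to `abstention`, here is a minimal, standard-library sketch of plain temperature scaling. It is not the `abstention` implementation: the function names are hypothetical, the temperature is chosen by a simple grid search rather than by a numerical optimizer, and the inputs are assumed to be logits (the wrappers above pass posteriors instead).

```python
import math


def softmax(logits, temperature=1.0):
    # Dividing logits by T > 1 flattens the distribution; T < 1 sharpens it
    scaled = [z / temperature for z in logits]
    top = max(scaled)                       # subtract the max for stability
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def nll(logit_rows, labels, temperature):
    # Mean negative log-likelihood of the true labels on held-out data
    losses = [-math.log(softmax(row, temperature)[y])
              for row, y in zip(logit_rows, labels)]
    return sum(losses) / len(losses)


def fit_temperature(logit_rows, labels, grid=None):
    # Assumption: a coarse grid (0.25, 0.50, ..., 10.0) is good enough here
    grid = grid or [0.25 * k for k in range(1, 41)]
    return min(grid, key=lambda t: nll(logit_rows, labels, t))
```

For an overconfident classifier (for example, one that assigns a logit margin of 4 to a class that is correct only three times out of four on validation data), the fitted temperature comes out above 1, which softens the calibrated posteriors.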
|
||||
|
|
@ -229,11 +229,11 @@ class NeuralClassifierTrainer:
|
|||
self.net.eval()
|
||||
opt = self.trainer_hyperparams
|
||||
with torch.no_grad():
|
||||
positive_probs = []
|
||||
posteriors = []
|
||||
for xi in TorchDataset(instances).asDataloader(
|
||||
opt['batch_size_test'], shuffle=False, pad_length=opt['padding_length'], device=opt['device']):
|
||||
positive_probs.append(self.net.predict_proba(xi))
|
||||
return np.concatenate(positive_probs)
|
||||
posteriors.append(self.net.predict_proba(xi))
|
||||
return np.concatenate(posteriors)
|
||||
|
||||
def transform(self, instances):
|
||||
"""
|
||||
|
|
|
@ -1,5 +1,7 @@
|
|||
import random
|
||||
import shutil
|
||||
import subprocess
|
||||
import tempfile
|
||||
from os import remove, makedirs
|
||||
from os.path import join, exists
|
||||
from subprocess import PIPE, STDOUT
|
||||
|
@ -23,26 +25,34 @@ class SVMperf(BaseEstimator, ClassifierMixin):
|
|||
:param C: trade-off between training error and margin (default 0.01)
|
||||
:param verbose: set to True to print svm-perf std outputs
|
||||
:param loss: the loss to optimize for. Available losses are "01", "f1", "kld", "nkld", "q", "qacc", "qf1", "qgm", "mae", "mrae".
|
||||
:param host_folder: directory where to store the trained model; set to None (default) for using a tmp directory
|
||||
(temporary directories are automatically deleted)
|
||||
"""
|
||||
|
||||
# losses with their respective codes in svm_perf implementation
|
||||
valid_losses = {'01':0, 'f1':1, 'kld':12, 'nkld':13, 'q':22, 'qacc':23, 'qf1':24, 'qgm':25, 'mae':26, 'mrae':27}
|
||||
|
||||
def __init__(self, svmperf_base, C=0.01, verbose=False, loss='01'):
|
||||
def __init__(self, svmperf_base, C=0.01, verbose=False, loss='01', host_folder=None):
|
||||
assert exists(svmperf_base), f'path {svmperf_base} does not seem to point to a valid path'
|
||||
self.svmperf_base = svmperf_base
|
||||
self.C = C
|
||||
self.verbose = verbose
|
||||
self.loss = loss
|
||||
self.host_folder = host_folder
|
||||
|
||||
def set_params(self, **parameters):
|
||||
"""
|
||||
Set the hyper-parameters for svm-perf. Currently, only the `C` parameter is supported
|
||||
|
||||
:param parameters: a `**kwargs` dictionary `{'C': <float>}`
|
||||
"""
|
||||
assert list(parameters.keys()) == ['C'], 'currently, only the C parameter is supported'
|
||||
self.C = parameters['C']
|
||||
# def set_params(self, **parameters):
|
||||
# """
|
||||
# Set the hyper-parameters for svm-perf. Currently, only the `C` and `loss` parameters are supported
|
||||
#
|
||||
# :param parameters: a `**kwargs` dictionary `{'C': <float>}`
|
||||
# """
|
||||
# assert sorted(list(parameters.keys())) == ['C', 'loss'], \
|
||||
# 'currently, only the C and loss parameters are supported'
|
||||
# self.C = parameters.get('C', self.C)
|
||||
# self.loss = parameters.get('loss', self.loss)
|
||||
#
|
||||
# def get_params(self, deep=True):
|
||||
# return {'C': self.C, 'loss': self.loss}
|
||||
|
||||
def fit(self, X, y):
|
||||
"""
|
||||
|
@ -65,14 +75,14 @@ class SVMperf(BaseEstimator, ClassifierMixin):
|
|||
|
||||
local_random = random.Random()
|
||||
# this would allow to run parallel instances of predict
|
||||
random_code = '-'.join(str(local_random.randint(0,1000000)) for _ in range(5))
|
||||
# self.tmpdir = tempfile.TemporaryDirectory(suffix=random_code)
|
||||
# tmp dir are removed after the fit terminates in multiprocessing... moving to regular directories + __del__
|
||||
self.tmpdir = '.svmperf-' + random_code
|
||||
random_code = 'svmperfprocess'+'-'.join(str(local_random.randint(0, 1000000)) for _ in range(5))
|
||||
if self.host_folder is None:
|
||||
# tmp dir are removed after the fit terminates in multiprocessing...
|
||||
self.tmpdir = tempfile.TemporaryDirectory(suffix=random_code).name
|
||||
else:
|
||||
self.tmpdir = join(self.host_folder, '.' + random_code)
|
||||
makedirs(self.tmpdir, exist_ok=True)
|
||||
|
||||
# self.model = join(self.tmpdir.name, 'model-'+random_code)
|
||||
# traindat = join(self.tmpdir.name, f'train-{random_code}.dat')
|
||||
self.model = join(self.tmpdir, 'model-'+random_code)
|
||||
traindat = join(self.tmpdir, f'train-{random_code}.dat')
|
||||
|
||||
|
@ -94,6 +104,7 @@ class SVMperf(BaseEstimator, ClassifierMixin):
|
|||
def predict(self, X):
|
||||
"""
|
||||
Predicts labels for the instances `X`
|
||||
|
||||
:param X: array-like of shape `(n_samples, n_features)` instances to classify
|
||||
:return: a `numpy` array of length `n` containing the label predictions, where `n` is the number of
|
||||
instances in `X`
|
||||
|
@ -119,8 +130,6 @@ class SVMperf(BaseEstimator, ClassifierMixin):
|
|||
# in order to allow for parallel runs of predict, a random code is assigned
|
||||
local_random = random.Random()
|
||||
random_code = '-'.join(str(local_random.randint(0, 1000000)) for _ in range(5))
|
||||
# predictions_path = join(self.tmpdir.name, 'predictions'+random_code+'.dat')
|
||||
# testdat = join(self.tmpdir.name, 'test'+random_code+'.dat')
|
||||
predictions_path = join(self.tmpdir, 'predictions' + random_code + '.dat')
|
||||
testdat = join(self.tmpdir, 'test' + random_code + '.dat')
|
||||
dump_svmlight_file(X, y, testdat, zero_based=False)
|
||||
|
@ -141,5 +150,5 @@ class SVMperf(BaseEstimator, ClassifierMixin):
|
|||
|
||||
def __del__(self):
|
||||
if hasattr(self, 'tmpdir'):
|
||||
pass # shutil.rmtree(self.tmpdir, ignore_errors=True)
|
||||
shutil.rmtree(self.tmpdir, ignore_errors=True)
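The hunk above reads `.name` from a `tempfile.TemporaryDirectory` and then recreates the path with `makedirs`, since the `TemporaryDirectory` object itself removes its directory once it is finalized. A possible alternative, shown here only as a sketch and not as what this commit does, is `tempfile.mkdtemp`, which creates a directory that lives until it is removed explicitly. The class name `ScratchDirOwner` and the `svmperf-` prefixes are hypothetical.

```python
import os
import shutil
import tempfile


class ScratchDirOwner:
    """Hypothetical holder of a scratch directory (not the SVMperf class)."""

    def __init__(self, host_folder=None):
        if host_folder is None:
            # System temp location; unique name chosen by mkdtemp
            self.tmpdir = tempfile.mkdtemp(prefix="svmperf-")
        else:
            # Hidden, uniquely named sub-directory inside the host folder
            os.makedirs(host_folder, exist_ok=True)
            self.tmpdir = tempfile.mkdtemp(prefix=".svmperf-", dir=host_folder)

    def cleanup(self):
        # ignore_errors=True makes repeated clean-ups harmless
        shutil.rmtree(self.tmpdir, ignore_errors=True)

    def __del__(self):
        # Guard against a partially constructed object
        if hasattr(self, "tmpdir"):
            self.cleanup()
```

Because `mkdtemp` already guarantees a unique name, no random code is needed to keep parallel instances apart in this variant.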
|
||||
|
||||
|
|
|
@ -0,0 +1,169 @@
|
|||
from typing import Tuple, Union
|
||||
import pandas as pd
|
||||
import numpy as np
|
||||
import os
|
||||
|
||||
from quapy.protocol import AbstractProtocol
|
||||
|
||||
DEV_SAMPLES = 1000
|
||||
TEST_SAMPLES = 5000
|
||||
|
||||
ERROR_TOL = 1E-3
|
||||
|
||||
|
||||
def load_category_map(path):
|
||||
cat2code = {}
|
||||
with open(path, 'rt') as fin:
|
||||
for line in fin:
|
||||
category, code = line.split()
|
||||
cat2code[category] = int(code)
|
||||
code2cat = [cat for cat, code in sorted(cat2code.items(), key=lambda x: x[1])]
|
||||
return cat2code, code2cat
|
||||
|
||||
|
||||
def load_raw_documents(path):
|
||||
df = pd.read_csv(path)
|
||||
documents = list(df["text"].values)
|
||||
labels = None
|
||||
if "label" in df.columns:
|
||||
labels = df["label"].values.astype(int)
|
||||
return documents, labels
|
||||
|
||||
|
||||
def load_vector_documents(path):
|
||||
D = pd.read_csv(path).to_numpy(dtype=float)
|
||||
labelled = D.shape[1] == 301
|
||||
if labelled:
|
||||
X, y = D[:, 1:], D[:, 0].astype(int).flatten()
|
||||
else:
|
||||
X, y = D, None
|
||||
return X, y
|
||||
|
||||
|
||||
class SamplesFromDir(AbstractProtocol):
|
||||
|
||||
def __init__(self, path_dir:str, ground_truth_path:str, load_fn):
|
||||
self.path_dir = path_dir
|
||||
self.load_fn = load_fn
|
||||
self.true_prevs = ResultSubmission.load(ground_truth_path)
|
||||
|
||||
def __call__(self):
|
||||
for id, prevalence in self.true_prevs.iterrows():
|
||||
sample, _ = self.load_fn(os.path.join(self.path_dir, f'{id}.txt'))
|
||||
yield sample, prevalence
|
||||
|
||||
|
||||
class ResultSubmission:
|
||||
|
||||
def __init__(self):
|
||||
self.df = None
|
||||
|
||||
def __init_df(self, categories: int):
|
||||
if not isinstance(categories, int) or categories < 2:
|
||||
raise TypeError('wrong format for categories: an int (>=2) was expected')
|
||||
df = pd.DataFrame(columns=list(range(categories)))
|
||||
df.index.set_names('id', inplace=True)
|
||||
self.df = df
|
||||
|
||||
@property
|
||||
def n_categories(self):
|
||||
return len(self.df.columns.values)
|
||||
|
||||
def add(self, sample_id: int, prevalence_values: np.ndarray):
|
||||
if not isinstance(sample_id, int):
|
||||
raise TypeError(f'error: expected int for sample_id, found {type(sample_id)}')
|
||||
if not isinstance(prevalence_values, np.ndarray):
|
||||
raise TypeError(f'error: expected np.ndarray for prevalence_values, found {type(prevalence_values)}')
|
||||
if self.df is None:
|
||||
self.__init_df(categories=len(prevalence_values))
|
||||
if sample_id in self.df.index.values:
|
||||
raise ValueError(f'error: prevalence values for "{sample_id}" already added')
|
||||
if prevalence_values.ndim != 1 or prevalence_values.size != self.n_categories:
|
||||
raise ValueError(f'error: wrong shape found for prevalence vector {prevalence_values}')
|
||||
if (prevalence_values < 0).any() or (prevalence_values > 1).any():
|
||||
raise ValueError(f'error: prevalence values out of range [0,1] for "{sample_id}"')
|
||||
if np.abs(prevalence_values.sum() - 1) > ERROR_TOL:
|
||||
raise ValueError(f'error: prevalence values do not sum up to one for "{sample_id}"'
|
||||
f'(error tolerance {ERROR_TOL})')
|
||||
|
||||
self.df.loc[sample_id] = prevalence_values
|
||||
|
||||
def __len__(self):
|
||||
return len(self.df)
|
||||
|
||||
@classmethod
|
||||
def load(cls, path: str) -> 'ResultSubmission':
|
||||
df = ResultSubmission.check_file_format(path)
|
||||
r = ResultSubmission()
|
||||
r.df = df
|
||||
return r
|
||||
|
||||
def dump(self, path: str):
|
||||
ResultSubmission.check_dataframe_format(self.df)
|
||||
self.df.to_csv(path)
|
||||
|
||||
def prevalence(self, sample_id: int):
|
||||
sel = self.df.loc[sample_id]
|
||||
if sel.empty:
|
||||
return None
|
||||
else:
|
||||
return sel.values.flatten()
|
||||
|
||||
def iterrows(self):
|
||||
for index, row in self.df.iterrows():
|
||||
prevalence = row.values.flatten()
|
||||
yield index, prevalence
|
||||
|
||||
@classmethod
|
||||
def check_file_format(cls, path) -> Union[pd.DataFrame, Tuple[pd.DataFrame, str]]:
|
||||
try:
|
||||
df = pd.read_csv(path, index_col=0)
|
||||
except Exception as e:
|
||||
print(f'the file {path} does not seem to be a valid csv file. ')
|
||||
print(e)
|
||||
return ResultSubmission.check_dataframe_format(df, path=path)
|
||||
|
||||
@classmethod
|
||||
def check_dataframe_format(cls, df, path=None) -> Union[pd.DataFrame, Tuple[pd.DataFrame, str]]:
|
||||
hint_path = '' # if given, show the data path in the error message
|
||||
if path is not None:
|
||||
hint_path = f' in {path}'
|
||||
|
||||
if df.index.name != 'id' or len(df.columns) < 2:
|
||||
raise ValueError(f'wrong header{hint_path}, '
|
||||
f'the format of the header should be "id,0,...,n-1", '
|
||||
f'where n is the number of categories')
|
||||
if [int(ci) for ci in df.columns.values] != list(range(len(df.columns))):
|
||||
raise ValueError(f'wrong header{hint_path}, category ids should be 0,1,2,...,n-1, '
|
||||
f'where n is the number of categories')
|
||||
if df.empty:
|
||||
raise ValueError(f'error{hint_path}: results file is empty')
|
||||
elif len(df) != DEV_SAMPLES and len(df) != TEST_SAMPLES:
|
||||
raise ValueError(f'wrong number of prevalence values found{hint_path}; '
|
||||
f'expected {DEV_SAMPLES} for development sets and '
|
||||
f'{TEST_SAMPLES} for test sets; found {len(df)}')
|
||||
|
||||
ids = set(df.index.values)
|
||||
expected_ids = set(range(len(df)))
|
||||
if ids != expected_ids:
|
||||
missing = expected_ids - ids
|
||||
if missing:
|
||||
raise ValueError(f'there are {len(missing)} missing ids{hint_path}: {sorted(missing)}')
|
||||
unexpected = ids - expected_ids
|
||||
if unexpected:
|
||||
raise ValueError(f'there are {len(unexpected)} unexpected ids{hint_path}: {sorted(unexpected)}')
|
||||
|
||||
for category_id in df.columns:
|
||||
if (df[category_id] < 0).any() or (df[category_id] > 1).any():
|
||||
raise ValueError(f'error{hint_path}: column "{category_id}" contains values out of range [0,1]')
|
||||
|
||||
prevs = df.values
|
||||
round_errors = np.abs(prevs.sum(axis=-1) - 1.) > ERROR_TOL
|
||||
if round_errors.any():
|
||||
raise ValueError(f'warning: prevalence values in rows with id {np.where(round_errors)[0].tolist()} '
|
||||
f'do not sum up to 1 (error tolerance {ERROR_TOL}), '
|
||||
f'probably due to some rounding errors.')
|
||||
|
||||
return df
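The file layout that `check_dataframe_format` expects (a header `id,0,...,n-1` followed by one prevalence vector per sample) can be illustrated without pandas. The sketch below is a simplified, hypothetical re-implementation using only the standard library: `check_results_csv` is not part of QuaPy, and it deliberately omits the checks on the number of samples and on the set of ids.

```python
import csv
import io

ERROR_TOL = 1e-3  # same tolerance value as in the module above


def check_results_csv(text, n_categories):
    """Parse CSV text and return {sample_id: [prevalences]} if it is valid."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    expected = ["id"] + [str(i) for i in range(n_categories)]
    if header != expected:
        raise ValueError(f"wrong header {header}, expected {expected}")
    rows = {}
    for record in reader:
        sample_id = int(record[0])
        prevs = [float(v) for v in record[1:]]
        if len(prevs) != n_categories:
            raise ValueError(f"wrong number of values for id {sample_id}")
        if any(p < 0 or p > 1 for p in prevs):
            raise ValueError(f"values out of range [0,1] for id {sample_id}")
        if abs(sum(prevs) - 1) > ERROR_TOL:
            raise ValueError(f"values do not sum up to 1 for id {sample_id}")
        rows[sample_id] = prevs
    return rows
```

A three-class file with two samples would then look like `id,0,1,2` on the first line, followed by `0,0.2,0.3,0.5` and `1,0.1,0.1,0.8`.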
|
||||
|
||||
|
|
@ -1,24 +1,29 @@
|
|||
import itertools
|
||||
from functools import cached_property
|
||||
from typing import Iterable
|
||||
|
||||
import numpy as np
|
||||
from scipy.sparse import issparse
|
||||
from scipy.sparse import vstack
|
||||
from sklearn.model_selection import train_test_split, RepeatedStratifiedKFold
|
||||
|
||||
from quapy.functional import artificial_prevalence_sampling, strprev
|
||||
from numpy.random import RandomState
|
||||
from quapy.functional import strprev
|
||||
from quapy.util import temp_seed
|
||||
|
||||
|
||||
class LabelledCollection:
|
||||
"""
|
||||
A LabelledCollection is a set of objects each with a label associated to it. This class implements many sampling
|
||||
routines.
|
||||
A LabelledCollection is a set of objects, each with a label attached to it.
|
||||
This class implements several sampling routines and other utilities.
|
||||
|
||||
:param instances: array-like (np.ndarray, list, or csr_matrix are supported)
|
||||
:param labels: array-like with the same length of instances
|
||||
:param classes_: optional, list of classes from which labels are taken. If not specified, the classes are inferred
|
||||
:param classes: optional, list of classes from which labels are taken. If not specified, the classes are inferred
|
||||
from the labels. The classes must be indicated in cases in which some of the labels might have no examples
|
||||
(i.e., a prevalence of 0)
|
||||
"""
|
||||
|
||||
def __init__(self, instances, labels, classes_=None):
|
||||
def __init__(self, instances, labels, classes=None):
|
||||
if issparse(instances):
|
||||
self.instances = instances
|
||||
elif isinstance(instances, list) and len(instances) > 0 and isinstance(instances[0], str):
|
||||
|
@ -28,14 +33,14 @@ class LabelledCollection:
|
|||
self.instances = np.asarray(instances)
|
||||
self.labels = np.asarray(labels)
|
||||
n_docs = len(self)
|
||||
if classes_ is None:
|
||||
if classes is None:
|
||||
self.classes_ = np.unique(self.labels)
|
||||
self.classes_.sort()
|
||||
else:
|
||||
self.classes_ = np.unique(np.asarray(classes_))
|
||||
self.classes_ = np.unique(np.asarray(classes))
|
||||
self.classes_.sort()
|
||||
if len(set(self.labels).difference(set(classes_))) > 0:
|
||||
raise ValueError(f'labels ({set(self.labels)}) contain values not included in classes_ ({set(classes_)})')
|
||||
if len(set(self.labels).difference(set(classes))) > 0:
|
||||
raise ValueError(f'labels ({set(self.labels)}) contain values not included in classes_ ({set(classes)})')
|
||||
self.index = {class_: np.arange(n_docs)[self.labels == class_] for class_ in self.classes_}
|
||||
|
||||
@classmethod
|
||||
|
@ -65,7 +70,7 @@ class LabelledCollection:
|
|||
|
||||
def prevalence(self):
|
||||
"""
|
||||
Returns the prevalence, or relative frequency, of the classes of interest.
|
||||
Returns the prevalence, or relative frequency, of the classes in the codeframe.
|
||||
|
||||
:return: a np.ndarray of shape `(n_classes)` with the relative frequencies of each class, in the same order
|
||||
as listed by `self.classes_`
|
||||
|
@ -74,7 +79,7 @@ class LabelledCollection:
|
|||
|
||||
def counts(self):
|
||||
"""
|
||||
Returns the number of instances for each of the classes of interest.
|
||||
Returns the number of instances for each of the classes in the codeframe.
|
||||
|
||||
:return: a np.ndarray of shape `(n_classes)` with the number of instances of each class, in the same order
|
||||
as listed by `self.classes_`
|
||||
|
@ -99,7 +104,7 @@ class LabelledCollection:
|
|||
"""
|
||||
return self.n_classes == 2
|
||||
|
||||
def sampling_index(self, size, *prevs, shuffle=True):
|
||||
def sampling_index(self, size, *prevs, shuffle=True, random_state=None):
|
||||
"""
|
||||
Returns an index to be used to extract a random sample of desired size and desired prevalence values. If the
|
||||
prevalence values are not specified, then returns the index of a uniform sampling.
|
||||
|
@ -111,50 +116,72 @@ class LabelledCollection:
|
|||
it is constrained. E.g., for binary collections, only the prevalence `p` for the first class (as listed in
|
||||
`self.classes_` can be specified, while the other class takes prevalence value `1-p`
|
||||
:param shuffle: if set to True (default), shuffles the index before returning it
|
||||
:param random_state: seed for reproducing sampling
|
||||
:return: a np.ndarray of shape `(size)` with the indexes
|
||||
"""
|
||||
if len(prevs) == 0: # no prevalence was indicated; returns an index for uniform sampling
|
||||
return self.uniform_sampling_index(size)
|
||||
return self.uniform_sampling_index(size, random_state=random_state)
|
||||
if len(prevs) == self.n_classes - 1:
|
||||
prevs = prevs + (1 - sum(prevs),)
|
||||
assert len(prevs) == self.n_classes, 'unexpected number of prevalences'
|
||||
assert sum(prevs) == 1, f'prevalences ({prevs}) wrong range (sum={sum(prevs)})'
|
||||
|
||||
taken = 0
|
||||
indexes_sample = []
|
||||
for i, class_ in enumerate(self.classes_):
|
||||
if i == self.n_classes - 1:
|
||||
n_requested = size - taken
|
||||
else:
|
||||
n_requested = int(size * prevs[i])
|
||||
# Decide how many instances should be taken for each class in order to satisfy the requested prevalence
|
||||
# accurately, and the number of instances in the sample (exactly). If int(size * prevs[i]) (which is
|
||||
# <= size * prevs[i]) examples are drawn from class i, there could be a remainder number of instances to take
|
||||
# to satisfy the size constraint. The remainder is distributed among the classes with probability = prevs.
|
||||
# (This avoids placing the remainder in a class whose requested prevalence is 0.)
|
||||
n_requests = {class_: round(size * prevs[i]) for i, class_ in enumerate(self.classes_)}
|
||||
remainder = size - sum(n_requests.values())
|
||||
with temp_seed(random_state):
|
||||
# due to rounding, the remainder can be 0, >0, or <0
|
||||
if remainder > 0:
|
||||
# when the remainder is >0 we randomly add 1 to the requests for each class;
|
||||
# more prevalent classes are more likely to be chosen, which minimizes the impact on the final prevalence
|
||||
for rand_class in np.random.choice(self.classes_, size=remainder, p=prevs):
|
||||
n_requests[rand_class] += 1
|
||||
elif remainder < 0:
|
||||
# when the remainder is <0 we randomly remove 1 from the requests, unless the request is 0 for a chosen
|
||||
# class; we repeat until remainder==0
|
||||
while remainder!=0:
|
||||
rand_class = np.random.choice(self.classes_, p=prevs)
|
||||
if n_requests[rand_class] > 0:
|
||||
n_requests[rand_class] -= 1
|
||||
remainder += 1
|
||||
|
||||
n_candidates = len(self.index[class_])
|
||||
index_sample = self.index[class_][
|
||||
np.random.choice(n_candidates, size=n_requested, replace=(n_requested > n_candidates))
|
||||
] if n_requested > 0 else []
|
||||
indexes_sample = []
|
||||
for class_, n_requested in n_requests.items():
|
||||
n_candidates = len(self.index[class_])
|
||||
index_sample = self.index[class_][
|
||||
np.random.choice(n_candidates, size=n_requested, replace=(n_requested > n_candidates))
|
||||
] if n_requested > 0 else []
|
||||
|
||||
indexes_sample.append(index_sample)
|
||||
taken += n_requested
|
||||
indexes_sample.append(index_sample)
|
||||
|
||||
indexes_sample = np.concatenate(indexes_sample).astype(int)
|
||||
indexes_sample = np.concatenate(indexes_sample).astype(int)
|
||||
|
||||
if shuffle:
|
||||
indexes_sample = np.random.permutation(indexes_sample)
|
||||
if shuffle:
|
||||
indexes_sample = np.random.permutation(indexes_sample)
|
||||
|
||||
return indexes_sample
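Review note: the rounding-plus-remainder allocation introduced in `sampling_index` can be sketched in isolation as follows. This is a minimal sketch and not the code of the commit: `allocate_requests` is a hypothetical helper, classes are represented by their positions, and the standard-library `random.Random` stands in for the `np.random.choice` calls made under `temp_seed`.

```python
import random


def allocate_requests(size, prevs, seed=None):
    """Hypothetical helper: split `size` into per-class counts that follow `prevs`."""
    rng = random.Random(seed)
    classes = list(range(len(prevs)))
    # Round each class's share independently; the total may be off by a few units.
    requests = [round(size * p) for p in prevs]
    remainder = size - sum(requests)
    # Correct the total one unit at a time. Classes are drawn with probability
    # `prevs`, so a class with requested prevalence 0 is never touched.
    while remainder != 0:
        c = rng.choices(classes, weights=prevs)[0]
        if remainder > 0:
            requests[c] += 1
            remainder -= 1
        elif requests[c] > 0:
            requests[c] -= 1
            remainder += 1
    return requests
```

The counts always add up to `size`, and only classes with non-zero requested prevalence absorb the correction, which is the property the comment block in the hunk describes.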
|
||||
|
||||
def uniform_sampling_index(self, size):
|
||||
def uniform_sampling_index(self, size, random_state=None):
|
||||
"""
|
||||
Returns an index to be used to extract a uniform sample of desired size. The sampling is drawn
|
||||
with replacement if the requested size is greater than the number of instances, or without replacement
|
||||
otherwise.
|
||||
|
||||
:param size: integer, the size of the uniform sample
|
||||
:param random_state: if specified, guarantees reproducibility of the sampling.
|
||||
:return: a np.ndarray of shape `(size)` with the indexes
|
||||
"""
|
||||
return np.random.choice(len(self), size, replace=False)
|
||||
if random_state is not None:
|
||||
ng = RandomState(seed=random_state)
|
||||
else:
|
||||
ng = np.random
|
||||
return ng.choice(len(self), size, replace=size > len(self))
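Review note: the new behaviour of `uniform_sampling_index` (replacement only when the requested size exceeds the collection, and an optional seed) can be illustrated with the standard library alone. This is a sketch under assumptions: `uniform_index` is a hypothetical name, and `random.Random` replaces the `RandomState` used in the diff.

```python
import random


def uniform_index(n, size, seed=None):
    """Hypothetical helper: draw `size` positions out of `n`, uniformly at random."""
    # A private generator is used when a seed is given; otherwise the global one.
    rng = random.Random(seed) if seed is not None else random
    if size > n:
        # More positions requested than available: sample with replacement.
        return rng.choices(range(n), k=size)
    # Otherwise sample without replacement, so no position repeats.
    return rng.sample(range(n), k=size)
```

With a fixed seed, two calls return the same index, which is what makes the sampling reproducible.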
|
||||
|
||||
def sampling(self, size, *prevs, shuffle=True):
|
||||
def sampling(self, size, *prevs, shuffle=True, random_state=None):
|
||||
"""
|
||||
Return a random sample (an instance of :class:`LabelledCollection`) of desired size and desired prevalence
|
||||
values. For each class, the sampling is drawn without replacement if the requested prevalence is larger than
|
||||
|
@ -165,22 +192,24 @@ class LabelledCollection:
|
|||
it is constrained. E.g., for binary collections, only the prevalence `p` for the first class (as listed in
|
||||
`self.classes_` can be specified, while the other class takes prevalence value `1-p`
|
||||
:param shuffle: if set to True (default), shuffles the index before returning it
|
||||
:param random_state: seed for reproducing sampling
|
||||
:return: an instance of :class:`LabelledCollection` with length == `size` and prevalence close to `prevs` (or
|
||||
prevalence == `prevs` if the exact prevalence values can be met as proportions of instances)
|
||||
"""
|
||||
prev_index = self.sampling_index(size, *prevs, shuffle=shuffle)
|
||||
prev_index = self.sampling_index(size, *prevs, shuffle=shuffle, random_state=random_state)
|
||||
return self.sampling_from_index(prev_index)
|
||||
|
||||
def uniform_sampling(self, size):
|
||||
def uniform_sampling(self, size, random_state=None):
|
||||
"""
|
||||
Returns a uniform sample (an instance of :class:`LabelledCollection`) of desired size. The sampling is drawn
|
||||
with replacement if the requested size is greater than the number of instances, or without replacement
|
||||
otherwise.
|
||||
|
||||
:param size: integer, the requested size
|
||||
:param random_state: if specified, guarantees reproducibility of the sampling.
|
||||
:return: an instance of :class:`LabelledCollection` with length == `size`
|
||||
"""
|
||||
unif_index = self.uniform_sampling_index(size)
|
||||
unif_index = self.uniform_sampling_index(size, random_state=random_state)
|
||||
return self.sampling_from_index(unif_index)
|
||||
|
||||
def sampling_from_index(self, index):
|
||||
|
@ -193,7 +222,7 @@ class LabelledCollection:
|
|||
"""
|
||||
documents = self.instances[index]
|
||||
labels = self.labels[index]
|
||||
return LabelledCollection(documents, labels, classes_=self.classes_)
|
||||
return LabelledCollection(documents, labels, classes=self.classes_)
|
||||
|
||||
def split_stratified(self, train_prop=0.6, random_state=None):
|
||||
"""
|
||||
|
@ -207,92 +236,91 @@ class LabelledCollection:
|
|||
:return: two instances of :class:`LabelledCollection`, the first one with `train_prop` elements, and the
|
||||
second one with `1-train_prop` elements
|
||||
"""
|
||||
tr_docs, te_docs, tr_labels, te_labels = \
|
||||
train_test_split(self.instances, self.labels, train_size=train_prop, stratify=self.labels,
|
||||
random_state=random_state)
|
||||
return LabelledCollection(tr_docs, tr_labels), LabelledCollection(te_docs, te_labels)
|
||||
tr_docs, te_docs, tr_labels, te_labels = train_test_split(
|
||||
self.instances, self.labels, train_size=train_prop, stratify=self.labels, random_state=random_state
|
||||
)
|
||||
training = LabelledCollection(tr_docs, tr_labels, classes=self.classes_)
|
||||
test = LabelledCollection(te_docs, te_labels, classes=self.classes_)
|
||||
return training, test
|
||||
|
||||
def artificial_sampling_generator(self, sample_size, n_prevalences=101, repeats=1):
|
||||
def split_random(self, train_prop=0.6, random_state=None):
|
||||
"""
|
||||
A generator of samples that implements the artificial prevalence protocol (APP).
|
||||
The APP consists of exploring a grid of prevalence values containing `n_prevalences` points (e.g.,
|
||||
[0, 0.05, 0.1, 0.15, ..., 1], if `n_prevalences=21`), and generating all valid combinations of
|
||||
prevalence values for all classes (e.g., for 3 classes, samples with [0, 0, 1], [0, 0.05, 0.95], ...,
|
||||
[1, 0, 0] prevalence values of size `sample_size` will be yielded). The number of samples for each valid
|
||||
combination of prevalence values is indicated by `repeats`.
|
||||
Returns two instances of :class:`LabelledCollection` split randomly from this collection, at desired
|
||||
proportion.
|
||||
|
||||
:param sample_size: the number of instances in each sample
|
||||
:param n_prevalences: the number of prevalence points to be taken from the [0,1] interval (including the
|
||||
limits {0,1}). E.g., if `n_prevalences=11`, then the prevalence points to take are [0, 0.1, 0.2, ..., 1]
|
||||
:param repeats: the number of samples to generate for each valid combination of prevalence values (default 1)
|
||||
:return: yield samples generated at artificially controlled prevalence values
|
||||
:param train_prop: the proportion of elements to include in the left-most returned collection (typically used
|
||||
as the training collection). The rest of elements are included in the right-most returned collection
|
||||
(typically used as a test collection).
|
||||
:param random_state: if specified, guarantees reproducibility of the split.
|
||||
:return: two instances of :class:`LabelledCollection`, the first one with `train_prop` elements, and the
|
||||
second one with `1-train_prop` elements
|
||||
"""
|
||||
dimensions = self.n_classes
|
||||
for prevs in artificial_prevalence_sampling(dimensions, n_prevalences, repeats):
|
||||
yield self.sampling(sample_size, *prevs)
|
||||
|
||||
def artificial_sampling_index_generator(self, sample_size, n_prevalences=101, repeats=1):
|
||||
"""
|
||||
A generator of sample indexes implementing the artificial prevalence protocol (APP).
|
||||
The APP consists of exploring
|
||||
a grid of prevalence values (e.g., [0, 0.05, 0.1, 0.15, ..., 1]), and generating all valid combinations of
|
||||
prevalence values for all classes (e.g., for 3 classes, samples with [0, 0, 1], [0, 0.05, 0.95], ...,
|
||||
[1, 0, 0] prevalence values of size `sample_size` will be yielded). The number of sample indexes for each valid
|
||||
combination of prevalence values is indicated by `repeats`
|
||||
|
||||
:param sample_size: the number of instances in each sample (i.e., length of each index)
|
||||
:param n_prevalences: the number of prevalence points to be taken from the [0,1] interval (including the
|
||||
limits {0,1}). E.g., if `n_prevalences=11`, then the prevalence points to take are [0, 0.1, 0.2, ..., 1]
|
||||
:param repeats: the number of samples to generate for each valid combination of prevalence values (default 1)
|
||||
:return: yield the indexes that generate the samples according to APP
|
||||
"""
|
||||
dimensions = self.n_classes
|
||||
for prevs in artificial_prevalence_sampling(dimensions, n_prevalences, repeats):
|
||||
yield self.sampling_index(sample_size, *prevs)
|
||||
|
||||
def natural_sampling_generator(self, sample_size, repeats=100):
|
||||
"""
|
||||
A generator of samples that implements the natural prevalence protocol (NPP). The NPP consists of drawing
|
||||
samples uniformly at random, therefore approximately preserving the natural prevalence of the collection.
|
||||
|
||||
:param sample_size: integer, the number of instances in each sample
|
||||
:param repeats: the number of samples to generate
|
||||
:return: yield instances of :class:`LabelledCollection`
|
||||
"""
|
||||
for _ in range(repeats):
|
||||
yield self.uniform_sampling(sample_size)
|
||||
|
||||
def natural_sampling_index_generator(self, sample_size, repeats=100):
|
||||
"""
|
||||
A generator of sample indexes according to the natural prevalence protocol (NPP). The NPP consists of drawing
|
||||
samples uniformly at random, therefore approximately preserving the natural prevalence of the collection.
|
||||
|
||||
:param sample_size: integer, the number of instances in each sample (i.e., the length of each index)
|
||||
:param repeats: the number of indexes to generate
|
||||
:return: yield `repeats` instances of np.ndarray with shape `(sample_size,)`
|
||||
"""
|
||||
for _ in range(repeats):
|
||||
yield self.uniform_sampling_index(sample_size)
|
||||
indexes = np.random.RandomState(seed=random_state).permutation(len(self))
|
||||
if isinstance(train_prop, int):
|
||||
assert train_prop < len(self), \
|
||||
'argument train_prop cannot be greater than the number of elements in the collection'
|
||||
splitpoint = train_prop
|
||||
elif isinstance(train_prop, float):
|
||||
assert 0 < train_prop < 1, \
|
||||
'argument train_prop out of range (0,1)'
|
||||
splitpoint = int(np.round(len(self)*train_prop))
|
||||
left, right = indexes[:splitpoint], indexes[splitpoint:]
|
||||
training = self.sampling_from_index(left)
|
||||
test = self.sampling_from_index(right)
|
||||
return training, test
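Review note: `split_random` accepts `train_prop` either as an absolute count (int) or as a proportion (float). A minimal sketch of that logic on plain index lists follows; `split_indexes` is a hypothetical helper, the shuffle uses `random.Random` instead of `np.random.RandomState`, and the explicit `TypeError` for other types is an addition of this sketch that the diff does not contain.

```python
import random


def split_indexes(n, train_prop=0.6, seed=None):
    """Hypothetical helper mirroring the int/float handling of `train_prop`."""
    indexes = list(range(n))
    random.Random(seed).shuffle(indexes)
    if isinstance(train_prop, int):
        # An integer is read as an absolute number of training elements.
        assert train_prop < n, 'train_prop must be smaller than the collection size'
        splitpoint = train_prop
    elif isinstance(train_prop, float):
        # A float is read as a proportion in the open interval (0, 1).
        assert 0 < train_prop < 1, 'train_prop out of range (0,1)'
        splitpoint = int(round(n * train_prop))
    else:
        raise TypeError('train_prop must be an int or a float')
    return indexes[:splitpoint], indexes[splitpoint:]
```

Unlike `split_stratified`, nothing here looks at the labels, so the class prevalence of the two parts is only approximately preserved.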
|
||||
|
||||
def __add__(self, other):
|
||||
"""
|
||||
Returns a new :class:`LabelledCollection` as the union of this collection with another collection
|
||||
Returns a new :class:`LabelledCollection` as the union of this collection with another collection.
|
||||
Both labelled collections must have the same classes.
|
||||
|
||||
:param other: another :class:`LabelledCollection`
|
||||
:return: a :class:`LabelledCollection` representing the union of both collections
|
||||
"""
|
||||
if other is None:
|
||||
return self
|
||||
elif issparse(self.instances) and issparse(other.instances):
|
||||
join_instances = vstack([self.instances, other.instances])
|
||||
elif isinstance(self.instances, list) and isinstance(other.instances, list):
|
||||
join_instances = self.instances + other.instances
|
||||
elif isinstance(self.instances, np.ndarray) and isinstance(other.instances, np.ndarray):
|
||||
join_instances = np.concatenate([self.instances, other.instances])
|
||||
if not all(np.sort(self.classes_)==np.sort(other.classes_)):
|
||||
raise NotImplementedError(f'unsupported operation for collections on different classes; '
|
||||
f'expected {self.classes_}, found {other.classes_}')
|
||||
return LabelledCollection.join(self, other)
|
||||
|
||||
@classmethod
|
||||
def join(cls, *args: Iterable['LabelledCollection']):
|
||||
"""
|
||||
Returns a new :class:`LabelledCollection` as the union of the collections given in input.
|
||||
|
||||
:param args: instances of :class:`LabelledCollection`
|
||||
:return: a :class:`LabelledCollection` representing the union of all the given collections
|
||||
"""
|
||||
|
||||
args = [lc for lc in args if lc is not None]
|
||||
assert len(args) > 0, 'empty list is not allowed for join'
|
||||
|
||||
assert all([isinstance(lc, LabelledCollection) for lc in args]), \
|
||||
'only instances of LabelledCollection allowed'
|
||||
|
||||
first_instances = args[0].instances
|
||||
first_type = type(first_instances)
|
||||
assert all([type(lc.instances)==first_type for lc in args[1:]]), \
|
||||
'not all the collections are of instances of the same type'
|
||||
|
||||
if issparse(first_instances) or isinstance(first_instances, np.ndarray):
|
||||
first_ndim = first_instances.ndim
|
||||
assert all([lc.instances.ndim == first_ndim for lc in args[1:]]), \
|
||||
'not all the ndarrays are of the same dimension'
|
||||
if first_ndim > 1:
|
||||
first_shape = first_instances.shape[1:]
|
||||
assert all([lc.instances.shape[1:] == first_shape for lc in args[1:]]), \
|
||||
'not all the ndarrays are of the same shape'
|
||||
if issparse(first_instances):
|
||||
instances = vstack([lc.instances for lc in args])
|
||||
else:
|
||||
instances = np.concatenate([lc.instances for lc in args])
|
||||
elif isinstance(first_instances, list):
|
||||
instances = list(itertools.chain.from_iterable(lc.instances for lc in args))
|
||||
else:
|
||||
raise NotImplementedError('unsupported operation for collection types')
|
||||
labels = np.concatenate([self.labels, other.labels])
|
||||
return LabelledCollection(join_instances, labels)
|
||||
labels = np.concatenate([lc.labels for lc in args])
|
||||
classes = np.unique(labels)  # np.unique already returns a sorted array; ndarray.sort() sorts in place and returns None
|
||||
return LabelledCollection(instances, labels, classes=classes)
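Review note: for list-typed instances, `join` has to flatten a sequence of lists into one list. The short example below, which uses made-up data, shows why `itertools.chain.from_iterable` is the call that does this, whereas `itertools.chain` applied to a single generator returns the inner lists unchanged.

```python
import itertools

# Three collections whose instances are plain Python lists (illustrative data).
parts = [['a', 'b'], ['c'], ['d', 'e']]

# chain() receives ONE iterable (the generator), so it yields its items: the lists.
nested = list(itertools.chain(part for part in parts))

# chain.from_iterable() iterates over each yielded list and flattens one level.
flat = list(itertools.chain.from_iterable(part for part in parts))
```

Here `nested` has 3 elements and `flat` has 5, so only `flat` stays aligned with a concatenated label array of length 5.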
|
||||
|
||||
@property
|
||||
def Xy(self):
|
||||
|
@ -305,6 +333,44 @@ class LabelledCollection:
|
|||
"""
|
||||
return self.instances, self.labels
|
||||
|
||||
@property
|
||||
def Xp(self):
|
||||
"""
|
||||
Gets the instances and the true prevalence. This is useful when implementing evaluation protocols from
|
||||
a :class:`LabelledCollection` object.
|
||||
|
||||
:return: a tuple `(instances, prevalence)` from this collection
|
||||
"""
|
||||
return self.instances, self.prevalence()
|
||||
|
||||
@property
|
||||
def X(self):
|
||||
"""
|
||||
An alias to self.instances
|
||||
|
||||
:return: self.instances
|
||||
"""
|
||||
return self.instances
|
||||
|
||||
@property
|
||||
def y(self):
|
||||
"""
|
||||
An alias to self.labels
|
||||
|
||||
:return: self.labels
|
||||
"""
|
||||
return self.labels
|
||||
|
||||
@property
|
||||
def p(self):
|
||||
"""
|
||||
An alias to self.prevalence()
|
||||
|
||||
:return: self.prevalence()
|
||||
"""
|
||||
return self.prevalence()
|
||||
|
||||
|
||||
def stats(self, show=True):
|
||||
"""
|
||||
Returns (and optionally prints) a dictionary with some stats of this collection. E.g.,:
|
||||
|
@ -337,7 +403,7 @@ class LabelledCollection:
|
|||
f'#classes={stats_["classes"]}, prevs={stats_["prevs"]}')
|
||||
return stats_
|
||||
|
||||
def kFCV(self, nfolds=5, nrepeats=1, random_state=0):
|
||||
def kFCV(self, nfolds=5, nrepeats=1, random_state=None):
|
||||
"""
|
||||
Generator of stratified folds to be used in k-fold cross validation.
|
||||
|
||||
|
@ -439,7 +505,17 @@ class Dataset:
|
|||
"""
|
||||
return len(self.vocabulary)
|
||||
|
||||
def stats(self, show):
|
||||
@property
|
||||
def train_test(self):
|
||||
"""
|
||||
Alias to `self.training` and `self.test`
|
||||
|
||||
:return: the training and test collections
|
||||
"""
|
||||
return self.training, self.test
|
||||
|
||||
def stats(self, show=True):
|
||||
"""
|
||||
Returns (and optionally prints) a dictionary with some stats of this dataset. E.g.,:
|
||||
|
||||
|
@ -477,13 +553,14 @@ class Dataset:
|
|||
yield Dataset(train, test, name=f'fold {(i % nfolds) + 1}/{nfolds} (round={(i // nfolds) + 1})')
|
||||
|
||||
|
||||
def isbinary(data):
|
||||
"""
|
||||
Returns True if `data` is either a binary :class:`Dataset` or a binary :class:`LabelledCollection`
|
||||
def reduce(self, n_train=100, n_test=100):
|
||||
"""
|
||||
Reduce the number of instances in place for quick experiments. Preserves the prevalence of each set.
|
||||
|
||||
:param data: a :class:`Dataset` or a :class:`LabelledCollection` object
|
||||
:return: True if labelled according to two classes
|
||||
"""
|
||||
if isinstance(data, Dataset) or isinstance(data, LabelledCollection):
|
||||
return data.binary
|
||||
return False
|
||||
:param n_train: number of training documents to keep (default 100)
|
||||
:param n_test: number of test documents to keep (default 100)
|
||||
:return: self
|
||||
"""
|
||||
self.training = self.training.sampling(n_train, *self.training.prevalence())
|
||||
self.test = self.test.sampling(n_test, *self.test.prevalence())
|
||||
return self
|
|
@ -6,12 +6,14 @@ import os
|
|||
import zipfile
|
||||
from os.path import join
|
||||
import pandas as pd
|
||||
import scipy
|
||||
|
||||
from quapy.data.base import Dataset, LabelledCollection
|
||||
from quapy.data.preprocessing import text2tfidf, reduce_columns
|
||||
from quapy.data.reader import *
|
||||
from quapy.util import download_file_if_not_exists, download_file, get_quapy_home, pickled_resource
|
||||
|
||||
|
||||
REVIEWS_SENTIMENT_DATASETS = ['hp', 'kindle', 'imdb']
|
||||
TWITTER_SENTIMENT_DATASETS_TEST = ['gasp', 'hcr', 'omd', 'sanders',
|
||||
'semeval13', 'semeval14', 'semeval15', 'semeval16',
|
||||
|
@ -43,6 +45,22 @@ UCI_DATASETS = ['acute.a', 'acute.b',
|
|||
'wine-q-red', 'wine-q-white',
|
||||
'yeast']
|
||||
|
||||
LEQUA2022_TASKS = ['T1A', 'T1B', 'T2A', 'T2B']
|
||||
|
||||
_TXA_SAMPLE_SIZE = 250
|
||||
_TXB_SAMPLE_SIZE = 1000
|
||||
|
||||
LEQUA2022_SAMPLE_SIZE = {
|
||||
'TXA': _TXA_SAMPLE_SIZE,
|
||||
'TXB': _TXB_SAMPLE_SIZE,
|
||||
'T1A': _TXA_SAMPLE_SIZE,
|
||||
'T1B': _TXB_SAMPLE_SIZE,
|
||||
'T2A': _TXA_SAMPLE_SIZE,
|
||||
'T2B': _TXB_SAMPLE_SIZE,
|
||||
'binary': _TXA_SAMPLE_SIZE,
|
||||
'multiclass': _TXB_SAMPLE_SIZE
|
||||
}
|
||||
|
||||
|
||||
def fetch_reviews(dataset_name, tfidf=False, min_df=None, data_home=None, pickle=False) -> Dataset:
|
||||
"""
|
||||
|
@ -533,3 +551,76 @@ def fetch_UCILabelledCollection(dataset_name, data_home=None, verbose=False) ->
|
|||
|
||||
def _df_replace(df, col, repl={'yes': 1, 'no':0}, astype=float):
|
||||
df[col] = df[col].apply(lambda x:repl[x]).astype(astype, copy=False)
|
||||
|
||||
|
||||
def fetch_lequa2022(task, data_home=None):
|
||||
"""
|
||||
Loads the official datasets provided for the `LeQua <https://lequa2022.github.io/index>`_ competition.
|
||||
In brief, there are 4 tasks (T1A, T1B, T2A, T2B) having to do with text quantification
|
||||
problems. Tasks T1A and T1B provide documents in vector form, while T2A and T2B provide raw documents instead.
|
||||
Tasks T1A and T2A are binary sentiment quantification problems, while T1B and T2B are multiclass quantification
|
||||
problems consisting of estimating the class prevalence values of 28 different merchandise products.
|
||||
We refer to the `Esuli, A., Moreo, A., Sebastiani, F., & Sperduti, G. (2022).
|
||||
A Detailed Overview of LeQua@ CLEF 2022: Learning to Quantify.
|
||||
<https://ceur-ws.org/Vol-3180/paper-146.pdf>`_ for a detailed description
|
||||
on the tasks and datasets.
|
||||
|
||||
The datasets are downloaded only once, and stored for fast reuse.
|
||||
|
||||
See `lequa2022_experiments.py` provided in the example folder, which can serve as a guide on how to use these
|
||||
datasets.
|
||||
|
||||
|
||||
:param task: a string representing the task name; valid ones are T1A, T1B, T2A, and T2B
|
||||
:param data_home: specify the quapy home directory where collections will be dumped (leave empty to use the default
|
||||
~/quapy_data/ directory)
|
||||
:return: a tuple `(train, val_gen, test_gen)` where `train` is an instance of
|
||||
:class:`quapy.data.base.LabelledCollection`, `val_gen` and `test_gen` are instances of
|
||||
:class:`quapy.protocol.SamplesFromDir`, i.e., are sampling protocols that return a series of samples
|
||||
labelled by prevalence.
|
||||
"""
|
||||
|
||||
from quapy.data._lequa2022 import load_raw_documents, load_vector_documents, SamplesFromDir
|
||||
|
||||
assert task in LEQUA2022_TASKS, \
|
||||
f'Unknown task {task}. Valid ones are {LEQUA2022_TASKS}'
|
||||
if data_home is None:
|
||||
data_home = get_quapy_home()
|
||||
|
||||
URL_TRAINDEV=f'https://zenodo.org/record/6546188/files/{task}.train_dev.zip'
|
||||
URL_TEST=f'https://zenodo.org/record/6546188/files/{task}.test.zip'
|
||||
URL_TEST_PREV=f'https://zenodo.org/record/6546188/files/{task}.test_prevalences.zip'
|
||||
|
||||
lequa_dir = join(data_home, 'lequa2022')
|
||||
os.makedirs(lequa_dir, exist_ok=True)
|
||||
|
||||
def download_unzip_and_remove(unzipped_path, url):
|
||||
tmp_path = join(lequa_dir, task + '_tmp.zip')
|
||||
download_file_if_not_exists(url, tmp_path)
|
||||
with zipfile.ZipFile(tmp_path) as file:
|
||||
file.extractall(unzipped_path)
|
||||
os.remove(tmp_path)
|
||||
|
||||
if not os.path.exists(join(lequa_dir, task)):
|
||||
download_unzip_and_remove(lequa_dir, URL_TRAINDEV)
|
||||
download_unzip_and_remove(lequa_dir, URL_TEST)
|
||||
download_unzip_and_remove(lequa_dir, URL_TEST_PREV)
|
||||
|
||||
if task in ['T1A', 'T1B']:
|
||||
load_fn = load_vector_documents
|
||||
elif task in ['T2A', 'T2B']:
|
||||
load_fn = load_raw_documents
|
||||
|
||||
tr_path = join(lequa_dir, task, 'public', 'training_data.txt')
|
||||
train = LabelledCollection.load(tr_path, loader_func=load_fn)
|
||||
|
||||
val_samples_path = join(lequa_dir, task, 'public', 'dev_samples')
|
||||
val_true_prev_path = join(lequa_dir, task, 'public', 'dev_prevalences.txt')
|
||||
val_gen = SamplesFromDir(val_samples_path, val_true_prev_path, load_fn=load_fn)
|
||||
|
||||
test_samples_path = join(lequa_dir, task, 'public', 'test_samples')
|
||||
test_true_prev_path = join(lequa_dir, task, 'public', 'test_prevalences.txt')
|
||||
test_gen = SamplesFromDir(test_samples_path, test_true_prev_path, load_fn=load_fn)
|
||||
|
||||
return train, val_gen, test_gen
|
||||
|
||||
|
|
|
@ -88,7 +88,7 @@ def standardize(dataset: Dataset, inplace=False):
|
|||
:param dataset: a :class:`quapy.data.base.Dataset` object
|
||||
:param inplace: set to True if the transformation is to be applied inplace, or to False (default) if a new
|
||||
:class:`quapy.data.base.Dataset` is to be returned
|
||||
:return:
|
||||
:return: an instance of :class:`quapy.data.base.Dataset`
|
||||
"""
|
||||
s = StandardScaler(copy=not inplace)
|
||||
training = s.fit_transform(dataset.training.instances)
|
||||
|
@ -110,7 +110,7 @@ def index(dataset: Dataset, min_df=5, inplace=False, **kwargs):
|
|||
:param min_df: minimum number of occurrences below which the term is replaced by a `UNK` index
|
||||
:param inplace: whether or not to apply the transformation inplace (True), or to a new copy (False, default)
|
||||
:param kwargs: the rest of parameters of the transformation (as for sklearn's
|
||||
`CountVectorizer <https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.CountVectorizer.html>_`)
|
||||
`CountVectorizer <https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.CountVectorizer.html>_`)
|
||||
:return: a new :class:`quapy.data.base.Dataset` (if inplace=False) or a reference to the current
|
||||
:class:`quapy.data.base.Dataset` (inplace=True) consisting of lists of integer values representing indices.
|
||||
"""
|
||||
|
@ -121,6 +121,9 @@ def index(dataset: Dataset, min_df=5, inplace=False, **kwargs):
|
|||
training_index = indexer.fit_transform(dataset.training.instances)
|
||||
test_index = indexer.transform(dataset.test.instances)
|
||||
|
||||
training_index = np.asarray(training_index, dtype=object)
|
||||
test_index = np.asarray(test_index, dtype=object)
|
||||
|
||||
if inplace:
|
||||
dataset.training = LabelledCollection(training_index, dataset.training.labels, dataset.classes_)
|
||||
dataset.test = LabelledCollection(test_index, dataset.test.labels, dataset.classes_)
|
||||
|
@ -147,7 +150,8 @@ class IndexTransformer:
|
|||
contains, and that would be generated by sklearn's
|
||||
`CountVectorizer <https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.CountVectorizer.html>`_
|
||||
|
||||
:param kwargs: keyworded arguments from `CountVectorizer <https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.CountVectorizer.html>`_
|
||||
:param kwargs: keyworded arguments from
|
||||
`CountVectorizer <https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.CountVectorizer.html>`_
|
||||
"""
|
||||
|
||||
def __init__(self, **kwargs):
|
||||
|
@ -169,7 +173,7 @@ class IndexTransformer:
|
|||
self.pad = self.add_word(qp.environ['PAD_TOKEN'], qp.environ['PAD_INDEX'])
|
||||
return self
|
||||
|
||||
def transform(self, X, n_jobs=-1):
|
||||
def transform(self, X, n_jobs=None):
|
||||
"""
|
||||
Transforms the strings in `X` as lists of numerical ids
|
||||
|
||||
|
@ -179,14 +183,15 @@ class IndexTransformer:
|
|||
"""
|
||||
# given the number of tasks and the number of jobs, generates the slices for the parallel processes
|
||||
assert self.unk != -1, 'transform called before fit'
|
||||
indexed = map_parallel(func=self._index, args=X, n_jobs=n_jobs)
|
||||
return np.asarray(indexed)
|
||||
n_jobs = qp._get_njobs(n_jobs)
|
||||
return map_parallel(func=self._index, args=X, n_jobs=n_jobs)
|
||||
|
||||
|
||||
def _index(self, documents):
|
||||
vocab = self.vocabulary_.copy()
|
||||
return [[vocab.get(word, self.unk) for word in self.analyzer(doc)] for doc in tqdm(documents, 'indexing')]
|
||||
|
||||
def fit_transform(self, X, n_jobs=-1):
|
||||
def fit_transform(self, X, n_jobs=None):
|
||||
"""
|
||||
Fits the transform on `X` and transforms it.
|
||||
|
||||
|
|
|
@ -102,7 +102,7 @@ def reindex_labels(y):
|
|||
y = np.asarray(y)
|
||||
classnames = np.asarray(sorted(np.unique(y)))
|
||||
label2index = {label: index for index, label in enumerate(classnames)}
|
||||
indexed = np.empty(y.shape, dtype=np.int)
|
||||
indexed = np.empty(y.shape, dtype=int)
|
||||
for label in classnames:
|
||||
indexed[y==label] = label2index[label]
|
||||
return indexed, classnames
|
||||
|
@ -121,7 +121,7 @@ def binarize(y, pos_class):
|
|||
0 otherwise
|
||||
"""
|
||||
y = np.asarray(y)
|
||||
ybin = np.zeros(y.shape, dtype=np.int)
|
||||
ybin = np.zeros(y.shape, dtype=int)
|
||||
ybin[y == pos_class] = 1
|
||||
return ybin
|
||||
|
||||
|
|
|
@ -11,11 +11,6 @@ def from_name(err_name):
|
|||
"""
|
||||
assert err_name in ERROR_NAMES, f'unknown error {err_name}'
|
||||
callable_error = globals()[err_name]
|
||||
if err_name in QUANTIFICATION_ERROR_SMOOTH_NAMES:
|
||||
eps = __check_eps()
|
||||
def bound_callable_error(y_true, y_pred):
|
||||
return callable_error(y_true, y_pred, eps)
|
||||
return bound_callable_error
|
||||
return callable_error
|
||||
|
||||
|
||||
|
@ -215,12 +210,14 @@ def __check_eps(eps=None):
|
|||
|
||||
|
||||
CLASSIFICATION_ERROR = {f1e, acce}
|
||||
QUANTIFICATION_ERROR = {mae, mrae, mse, mkld, mnkld, ae, rae, se, kld, nkld}
|
||||
QUANTIFICATION_ERROR = {mae, mrae, mse, mkld, mnkld}
|
||||
QUANTIFICATION_ERROR_SINGLE = {ae, rae, se, kld, nkld}
|
||||
QUANTIFICATION_ERROR_SMOOTH = {kld, nkld, rae, mkld, mnkld, mrae}
|
||||
CLASSIFICATION_ERROR_NAMES = {func.__name__ for func in CLASSIFICATION_ERROR}
|
||||
QUANTIFICATION_ERROR_NAMES = {func.__name__ for func in QUANTIFICATION_ERROR}
|
||||
QUANTIFICATION_ERROR_SINGLE_NAMES = {func.__name__ for func in QUANTIFICATION_ERROR_SINGLE}
|
||||
QUANTIFICATION_ERROR_SMOOTH_NAMES = {func.__name__ for func in QUANTIFICATION_ERROR_SMOOTH}
|
||||
ERROR_NAMES = CLASSIFICATION_ERROR_NAMES | QUANTIFICATION_ERROR_NAMES
|
||||
ERROR_NAMES = CLASSIFICATION_ERROR_NAMES | QUANTIFICATION_ERROR_NAMES | QUANTIFICATION_ERROR_SINGLE_NAMES
|
||||
|
||||
f1_error = f1e
|
||||
acc_error = acce
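Review note: this hunk separates the sample-level errors (`ae`, `rae`, ...) from the aggregated ones and derives the valid names from the functions' `__name__`. The pattern can be sketched as follows. The two error functions are simplified stand-ins written for this note (they are assumed to average over classes), and the lookup uses an explicit dictionary instead of the `globals()` access seen in `from_name`.

```python
def ae(p, p_hat):
    """Stand-in absolute error between two prevalence vectors, averaged over classes."""
    return sum(abs(a - b) for a, b in zip(p, p_hat)) / len(p)


def se(p, p_hat):
    """Stand-in squared error between two prevalence vectors, averaged over classes."""
    return sum((a - b) ** 2 for a, b in zip(p, p_hat)) / len(p)


# The set of functions is the single source of truth; names are derived from it.
SINGLE_ERRORS = {ae, se}
SINGLE_ERROR_NAMES = {func.__name__ for func in SINGLE_ERRORS}
_BY_NAME = {func.__name__: func for func in SINGLE_ERRORS}


def from_name(err_name):
    """Return the error function registered under `err_name`."""
    assert err_name in SINGLE_ERROR_NAMES, f'unknown error {err_name}'
    return _BY_NAME[err_name]
```

Deriving the name sets from the functions keeps both in sync when a new measure is added, which appears to be the intent of the `*_NAMES` sets in the diff.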
|
||||
|
|
|
@ -1,296 +1,122 @@
|
|||
from typing import Union, Callable, Iterable
|
||||
import numpy as np
|
||||
from tqdm import tqdm
|
||||
import inspect
|
||||
|
||||
import quapy as qp
|
||||
from quapy.data import LabelledCollection
|
||||
from quapy.protocol import AbstractProtocol, OnLabelledCollectionProtocol, IterateProtocol
|
||||
from quapy.method.base import BaseQuantifier
|
||||
from quapy.util import temp_seed, _check_sample_size
|
||||
import quapy.functional as F
|
||||
import pandas as pd
|
||||
|
||||
|
||||
def artificial_prevalence_prediction(
|
||||
def prediction(
|
||||
model: BaseQuantifier,
|
||||
test: LabelledCollection,
|
||||
sample_size=None,
|
||||
n_prevpoints=101,
|
||||
n_repetitions=1,
|
||||
eval_budget: int = None,
|
||||
n_jobs=1,
|
||||
random_seed=42,
|
||||
protocol: AbstractProtocol,
|
||||
aggr_speedup: Union[str, bool] = 'auto',
|
||||
verbose=False):
|
||||
"""
|
||||
Performs the predictions for all samples generated according to the Artificial Prevalence Protocol (APP).
|
||||
The APP consists of exploring a grid of prevalence values containing `n_prevalences` points (e.g.,
|
||||
[0, 0.05, 0.1, 0.15, ..., 1], if `n_prevalences=21`), and generating all valid combinations of
|
||||
prevalence values for all classes (e.g., for 3 classes, samples with [0, 0, 1], [0, 0.05, 0.95], ...,
|
||||
[1, 0, 0] prevalence values of size `sample_size` will be considered). The number of samples for each valid
|
||||
combination of prevalence values is indicated by `repeats`.
|
||||
Uses a quantification model to generate predictions for the samples generated via a specific protocol.
|
||||
This function is central to all evaluation processes, and is endowed with an optimization to speed-up the
|
||||
prediction of protocols that generate samples from a large collection. The optimization applies to aggregative
|
||||
quantifiers only, and to OnLabelledCollectionProtocol protocols, and comes down to generating the classification
|
||||
predictions once and for all, and then generating samples over the classification predictions (instead of over
|
||||
the raw instances), so that the classifier prediction is never called again. This behaviour is obtained by
|
||||
setting `aggr_speedup` to 'auto' or True, and is only carried out if the overall process is convenient in terms
|
||||
of computations (e.g., if the number of classification predictions needed for the original collection exceeds the
|
||||
number of classification predictions needed for all samples, then the optimization is not undertaken).
|
||||
|
||||
:param model: the model in charge of generating the class prevalence estimations
|
||||
:param test: the test set on which to perform APP
|
||||
:param sample_size: integer, the size of the samples; if None, then the sample size is
|
||||
taken from qp.environ['SAMPLE_SIZE']
|
||||
:param n_prevpoints: integer, the number of different prevalences to sample (or set to None if eval_budget
|
||||
is specified; default 101, i.e., steps of 1%)
|
||||
:param n_repetitions: integer, the number of repetitions for each prevalence (default 1)
|
||||
:param eval_budget: integer, if specified, sets a ceiling on the number of evaluations to perform. For example, if
|
||||
there are 3 classes, `repeats=1`, and `eval_budget=20`, then `n_prevpoints` will be set to 5, since this
|
||||
will generate 15 different prevalence vectors ([0, 0, 1], [0, 0.25, 0.75], [0, 0.5, 0.5] ... [1, 0, 0]) and
|
||||
since setting `n_prevpoints=6` would produce more than 20 evaluations.
|
||||
:param n_jobs: integer, number of jobs to be run in parallel (default 1)
|
||||
:param random_seed: integer, allows to replicate the samplings. The seed is local to the method and does not affect
|
||||
any other random process (default 42)
|
||||
:param verbose: if True, shows a progress bar
|
||||
:return: a tuple containing two `np.ndarrays` of shape `(m,n,)` with `m` the number of samples
|
||||
`(n_prevpoints*repeats)` and `n` the number of classes. The first one contains the true prevalence values
|
||||
for the samples generated while the second one contains the prevalence estimations
|
||||
:param model: a quantifier, instance of :class:`quapy.method.base.BaseQuantifier`
|
||||
:param protocol: :class:`quapy.protocol.AbstractProtocol`; if this object is also instance of
|
||||
:class:`quapy.protocol.OnLabelledCollectionProtocol`, then the aggregation speed-up can be run. This is the protocol
|
||||
in charge of generating the samples for which the model has to issue class prevalence predictions.
|
||||
:param aggr_speedup: whether or not to apply the speed-up. Set to "force" for applying it even if the number of
|
||||
instances in the original collection on which the protocol acts is larger than the number of instances
|
||||
in the samples to be generated. Set to True or "auto" (default) for letting QuaPy decide whether it is
|
||||
convenient or not. Set to False to deactivate.
|
||||
:param verbose: boolean, show or not information in stdout
|
||||
:return: a tuple `(true_prevs, estim_prevs)` in which each element in the tuple is an array of shape
|
||||
`(n_samples, n_classes)` containing the true, or predicted, prevalence values for each sample
|
||||
"""
|
||||
assert aggr_speedup in [False, True, 'auto', 'force'], 'invalid value for aggr_speedup'
|
||||
|
||||
sample_size = _check_sample_size(sample_size)
|
||||
n_prevpoints, _ = qp.evaluation._check_num_evals(test.n_classes, n_prevpoints, eval_budget, n_repetitions, verbose)
|
||||
sout = lambda x: print(x) if verbose else None
|
||||
|
||||
with temp_seed(random_seed):
|
||||
indexes = list(test.artificial_sampling_index_generator(sample_size, n_prevpoints, n_repetitions))
|
||||
apply_optimization = False
|
||||
|
||||
return _predict_from_indexes(indexes, model, test, n_jobs, verbose)
|
||||
if aggr_speedup in [True, 'auto', 'force']:
|
||||
# checks whether the prediction can be made more efficiently; this check consists in verifying if the model is
|
||||
# of type aggregative, if the protocol is based on LabelledCollection, and if the total number of documents to
|
||||
# classify using the protocol would exceed the number of test documents in the original collection
|
||||
from quapy.method.aggregative import AggregativeQuantifier
|
||||
if isinstance(model, AggregativeQuantifier) and isinstance(protocol, OnLabelledCollectionProtocol):
|
||||
if aggr_speedup == 'force':
|
||||
apply_optimization = True
|
||||
sout(f'forcing aggregative speedup')
|
||||
elif hasattr(protocol, 'sample_size'):
|
||||
nD = len(protocol.get_labelled_collection())
|
||||
samplesD = protocol.total() * protocol.sample_size
|
||||
if nD < samplesD:
|
||||
apply_optimization = True
|
||||
sout(f'speeding up the prediction for the aggregative quantifier, '
|
||||
f'total classifications {nD} instead of {samplesD}')
|
||||
|
||||
|
||||
def natural_prevalence_prediction(
|
||||
model: BaseQuantifier,
|
||||
test: LabelledCollection,
|
||||
sample_size=None,
|
||||
repeats=100,
|
||||
n_jobs=1,
|
||||
random_seed=42,
|
||||
verbose=False):
|
||||
"""
|
||||
Performs the predictions for all samples generated according to the Natural Prevalence Protocol (NPP).
|
||||
The NPP consists of drawing samples uniformly at random, therefore approximately preserving the natural
|
||||
prevalence of the collection.
|
||||
|
||||
:param model: the model in charge of generating the class prevalence estimations
|
||||
:param test: the test set on which to perform NPP
|
||||
:param sample_size: integer, the size of the samples; if None, then the sample size is
|
||||
taken from qp.environ['SAMPLE_SIZE']
|
||||
:param repeats: integer, the number of samples to generate (default 100)
|
||||
:param n_jobs: integer, number of jobs to be run in parallel (default 1)
|
||||
:param random_seed: allows to replicate the samplings. The seed is local to the method and does not affect
|
||||
any other random process (default 42)
|
||||
:param verbose: if True, shows a progress bar
|
||||
:return: a tuple containing two `np.ndarrays` of shape `(m,n,)` with `m` the number of samples
|
||||
`(repeats)` and `n` the number of classes. The first one contains the true prevalence values
|
||||
for the samples generated while the second one contains the prevalence estimations
|
||||
"""
|
||||
|
||||
sample_size = _check_sample_size(sample_size)
|
||||
with temp_seed(random_seed):
|
||||
indexes = list(test.natural_sampling_index_generator(sample_size, repeats))
|
||||
|
||||
return _predict_from_indexes(indexes, model, test, n_jobs, verbose)
|
||||
|
||||
|
||||
def gen_prevalence_prediction(model: BaseQuantifier, gen_fn: Callable, eval_budget=None):
|
||||
"""
|
||||
Generates prevalence predictions for a custom protocol defined as a generator function that yields
|
||||
samples at each iteration. The sequence of samples is processed exhaustively if `eval_budget=None`
|
||||
or up to the `eval_budget` iterations if specified.
|
||||
|
||||
:param model: the model in charge of generating the class prevalence estimations
|
||||
:param gen_fn: a generator function yielding one sample at each iteration
|
||||
:param eval_budget: a maximum number of evaluations to run. Set to None (default) for exploring the
|
||||
entire sequence
|
||||
:return: a tuple containing two `np.ndarrays` of shape `(m,n,)` with `m` the number of samples
|
||||
generated and `n` the number of classes. The first one contains the true prevalence values
|
||||
for the samples generated while the second one contains the prevalence estimations
|
||||
"""
|
||||
if not inspect.isgenerator(gen_fn()):
|
||||
raise ValueError('param "gen_fun" is not a callable returning a generator')
|
||||
|
||||
if not isinstance(eval_budget, int):
|
||||
eval_budget = -1
|
||||
|
||||
true_prevalences, estim_prevalences = [], []
|
||||
for sample_instances, true_prev in gen_fn():
|
||||
true_prevalences.append(true_prev)
|
||||
estim_prevalences.append(model.quantify(sample_instances))
|
||||
eval_budget -= 1
|
||||
if eval_budget == 0:
|
||||
break
|
||||
|
||||
true_prevalences = np.asarray(true_prevalences)
|
||||
estim_prevalences = np.asarray(estim_prevalences)
|
||||
|
||||
return true_prevalences, estim_prevalences
|
||||
|
||||
|
||||
def _predict_from_indexes(
|
||||
indexes,
|
||||
model: BaseQuantifier,
|
||||
test: LabelledCollection,
|
||||
n_jobs=1,
|
||||
verbose=False):
|
||||
|
||||
if model.aggregative: #isinstance(model, qp.method.aggregative.AggregativeQuantifier):
|
||||
# print('\tinstance of aggregative-quantifier')
|
||||
quantification_func = model.aggregate
|
||||
if model.probabilistic: # isinstance(model, qp.method.aggregative.AggregativeProbabilisticQuantifier):
|
||||
# print('\t\tinstance of probabilistic-aggregative-quantifier')
|
||||
preclassified_instances = model.posterior_probabilities(test.instances)
|
||||
else:
|
||||
# print('\t\tinstance of hard-aggregative-quantifier')
|
||||
preclassified_instances = model.classify(test.instances)
|
||||
test = LabelledCollection(preclassified_instances, test.labels)
|
||||
if apply_optimization:
|
||||
pre_classified = model.classify(protocol.get_labelled_collection().instances)
|
||||
protocol_with_predictions = protocol.on_preclassified_instances(pre_classified)
|
||||
return __prediction_helper(model.aggregate, protocol_with_predictions, verbose)
|
||||
else:
|
||||
# print('\t\tinstance of base-quantifier')
|
||||
quantification_func = model.quantify
|
||||
|
||||
def _predict_prevalences(index):
|
||||
sample = test.sampling_from_index(index)
|
||||
true_prevalence = sample.prevalence()
|
||||
estim_prevalence = quantification_func(sample.instances)
|
||||
return true_prevalence, estim_prevalence
|
||||
|
||||
pbar = tqdm(indexes, desc='[artificial sampling protocol] generating predictions') if verbose else indexes
|
||||
results = qp.util.parallel(_predict_prevalences, pbar, n_jobs=n_jobs)
|
||||
|
||||
true_prevalences, estim_prevalences = zip(*results)
|
||||
true_prevalences = np.asarray(true_prevalences)
|
||||
estim_prevalences = np.asarray(estim_prevalences)
|
||||
|
||||
return true_prevalences, estim_prevalences
|
||||
return __prediction_helper(model.quantify, protocol, verbose)
|
||||
|
||||
|
||||
def artificial_prevalence_report(
|
||||
model: BaseQuantifier,
|
||||
test: LabelledCollection,
|
||||
sample_size=None,
|
||||
n_prevpoints=101,
|
||||
n_repetitions=1,
|
||||
eval_budget: int = None,
|
||||
n_jobs=1,
|
||||
random_seed=42,
|
||||
error_metrics:Iterable[Union[str,Callable]]='mae',
|
||||
verbose=False):
|
||||
def __prediction_helper(quantification_fn, protocol: AbstractProtocol, verbose=False):
|
||||
true_prevs, estim_prevs = [], []
|
||||
for sample_instances, sample_prev in tqdm(protocol(), total=protocol.total(), desc='predicting') if verbose else protocol():
|
||||
estim_prevs.append(quantification_fn(sample_instances))
|
||||
true_prevs.append(sample_prev)
|
||||
|
||||
true_prevs = np.asarray(true_prevs)
|
||||
estim_prevs = np.asarray(estim_prevs)
|
||||
|
||||
return true_prevs, estim_prevs
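The decision rule behind `aggr_speedup` can be summarised in a small stand-alone sketch. It assumes the model is aggregative and the protocol draws its samples from a labelled collection (the two `isinstance` checks in `prediction`); the function name `should_preclassify` and its parameters are hypothetical and are not part of QuaPy.

```python
def should_preclassify(n_collection: int, n_samples: int, sample_size: int,
                       aggr_speedup='auto') -> bool:
    """Decide whether to classify the whole collection once, up front.

    Hypothetical helper: n_collection plays the role of nD, and
    n_samples * sample_size plays the role of samplesD in `prediction`.
    """
    if aggr_speedup not in (False, True, 'auto', 'force'):
        raise ValueError('invalid value for aggr_speedup')
    if aggr_speedup is False:
        return False  # speed-up explicitly deactivated
    if aggr_speedup == 'force':
        return True  # applied regardless of the cost comparison
    # True or 'auto': pre-classify only if it needs fewer classifier calls
    return n_collection < n_samples * sample_size
```

For instance, a collection of 1,000 instances evaluated on 100 samples of size 50 would need 5,000 classifications sample by sample, so pre-classifying the 1,000 instances once is the cheaper option.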
|
||||
|
||||
|
||||
def evaluation_report(model: BaseQuantifier,
|
||||
protocol: AbstractProtocol,
|
||||
error_metrics: Iterable[Union[str,Callable]] = 'mae',
|
||||
aggr_speedup: Union[str, bool] = 'auto',
|
||||
verbose=False):
|
||||
"""
|
||||
Generates an evaluation report for all samples generated according to the Artificial Prevalence Protocol (APP).
|
||||
The APP consists of exploring a grid of prevalence values containing `n_prevalences` points (e.g.,
|
||||
[0, 0.05, 0.1, 0.15, ..., 1], if `n_prevalences=21`), and generating all valid combinations of
|
||||
prevalence values for all classes (e.g., for 3 classes, samples with [0, 0, 1], [0, 0.05, 0.95], ...,
|
||||
[1, 0, 0] prevalence values of size `sample_size` will be considered). The number of samples for each valid
|
||||
combination of prevalence values is indicated by `repeats`.
|
||||
The report takes the form of a
|
||||
pandas' `dataframe <https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html>`_
|
||||
in which the rows correspond to different samples, and the columns inform of the true prevalence values,
|
||||
the estimated prevalence values, and the score obtained by each of the evaluation measures indicated.
|
||||
Generates a report (a pandas' DataFrame) containing information on the evaluation of the model according
|
||||
to a specific protocol and in terms of one or more evaluation metrics (errors).
|
||||
|
||||
:param model: the model in charge of generating the class prevalence estimations
|
||||
:param test: the test set on which to perform APP
|
||||
:param sample_size: integer, the size of the samples; if None, then the sample size is
|
||||
taken from qp.environ['SAMPLE_SIZE']
|
||||
:param n_prevpoints: integer, the number of different prevalences to sample (or set to None if eval_budget
|
||||
is specified; default 101, i.e., steps of 1%)
|
||||
:param n_repetitions: integer, the number of repetitions for each prevalence (default 1)
|
||||
:param eval_budget: integer, if specified, sets a ceiling on the number of evaluations to perform. For example, if
|
||||
there are 3 classes, `repeats=1`, and `eval_budget=20`, then `n_prevpoints` will be set to 5, since this
|
||||
will generate 15 different prevalence vectors ([0, 0, 1], [0, 0.25, 0.75], [0, 0.5, 0.5] ... [1, 0, 0]) and
|
||||
since setting `n_prevpoints=6` would produce more than 20 evaluations.
|
||||
:param n_jobs: integer, number of jobs to be run in parallel (default 1)
|
||||
:param random_seed: integer, allows to replicate the samplings. The seed is local to the method and does not affect
|
||||
any other random process (default 42)
|
||||
:param error_metrics: a string indicating the name of the error (as defined in :mod:`quapy.error`) or a
|
||||
callable error function; optionally, a list of strings or callables can be indicated, if the results
|
||||
are to be evaluated with more than one error metric. Default is "mae"
|
||||
:param verbose: if True, shows a progress bar
|
||||
:return: pandas' dataframe with rows corresponding to different samples, and with columns informing of the
|
||||
true prevalence values, the estimated prevalence values, and the score obtained by each of the evaluation
|
||||
measures indicated.
|
||||
|
||||
:param model: a quantifier, instance of :class:`quapy.method.base.BaseQuantifier`
|
||||
:param protocol: :class:`quapy.protocol.AbstractProtocol`; if this object is also instance of
|
||||
:class:`quapy.protocol.OnLabelledCollectionProtocol`, then the aggregation speed-up can be run. This is the protocol
|
||||
in charge of generating the samples in which the model is evaluated.
|
||||
:param error_metrics: a string, or list of strings, representing the name(s) of an error function in `qp.error`
|
||||
(e.g., 'mae', the default value), or a callable function, or a list of callable functions, implementing
|
||||
the error function itself.
|
||||
:param aggr_speedup: whether or not to apply the speed-up. Set to "force" for applying it even if the number of
|
||||
instances in the original collection on which the protocol acts is larger than the number of instances
|
||||
in the samples to be generated. Set to True or "auto" (default) for letting QuaPy decide whether it is
|
||||
convenient or not. Set to False to deactivate.
|
||||
:param verbose: boolean, show or not information in stdout
|
||||
:return: a pandas' DataFrame containing the columns 'true-prev' (the true prevalence of each sample),
|
||||
'estim-prev' (the prevalence estimated by the model for each sample), and as many columns as error metrics
|
||||
have been indicated, each displaying the score in terms of that metric for every sample.
|
||||
"""
|
||||
|
||||
true_prevs, estim_prevs = artificial_prevalence_prediction(
|
||||
model, test, sample_size, n_prevpoints, n_repetitions, eval_budget, n_jobs, random_seed, verbose
|
||||
)
|
||||
true_prevs, estim_prevs = prediction(model, protocol, aggr_speedup=aggr_speedup, verbose=verbose)
|
||||
return _prevalence_report(true_prevs, estim_prevs, error_metrics)
|
||||
|
||||
|
||||
def natural_prevalence_report(
|
||||
model: BaseQuantifier,
|
||||
test: LabelledCollection,
|
||||
sample_size=None,
|
||||
repeats=100,
|
||||
n_jobs=1,
|
||||
random_seed=42,
|
||||
error_metrics:Iterable[Union[str,Callable]]='mae',
|
||||
verbose=False):
|
||||
"""
|
||||
Generates an evaluation report for all samples generated according to the Natural Prevalence Protocol (NPP).
|
||||
The NPP consists of drawing samples uniformly at random, therefore approximately preserving the natural
|
||||
prevalence of the collection.
|
||||
The report takes the form of a
|
||||
pandas' `dataframe <https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html>`_
|
||||
in which the rows correspond to different samples, and the columns inform of the true prevalence values,
|
||||
the estimated prevalence values, and the score obtained by each of the evaluation measures indicated.
|
||||
|
||||
:param model: the model in charge of generating the class prevalence estimations
|
||||
:param test: the test set on which to perform NPP
|
||||
:param sample_size: integer, the size of the samples; if None, then the sample size is
|
||||
taken from qp.environ['SAMPLE_SIZE']
|
||||
:param repeats: integer, the number of samples to generate (default 100)
|
||||
:param n_jobs: integer, number of jobs to be run in parallel (default 1)
|
||||
:param random_seed: allows to replicate the samplings. The seed is local to the method and does not affect
|
||||
any other random process (default 42)
|
||||
:param error_metrics: a string indicating the name of the error (as defined in :mod:`quapy.error`) or a
|
||||
callable error function; optionally, a list of strings or callables can be indicated, if the results
|
||||
are to be evaluated with more than one error metric. Default is "mae"
|
||||
:param verbose: if True, shows a progress bar
|
||||
:return: a tuple containing two `np.ndarrays` of shape `(m,n,)` with `m` the number of samples
|
||||
`(repeats)` and `n` the number of classes. The first one contains the true prevalence values
|
||||
for the samples generated while the second one contains the prevalence estimations
|
||||
|
||||
"""
|
||||
sample_size = _check_sample_size(sample_size)
|
||||
true_prevs, estim_prevs = natural_prevalence_prediction(
|
||||
model, test, sample_size, repeats, n_jobs, random_seed, verbose
|
||||
)
|
||||
return _prevalence_report(true_prevs, estim_prevs, error_metrics)
|
||||
|
||||
|
||||
def gen_prevalence_report(model: BaseQuantifier, gen_fn: Callable, eval_budget=None,
|
||||
error_metrics:Iterable[Union[str,Callable]]='mae'):
|
||||
"""
|
||||
Generates an evaluation report for a custom protocol defined as a generator function that yields
|
||||
samples at each iteration. The sequence of samples is processed exhaustively if `eval_budget=None`
|
||||
or up to the `eval_budget` iterations if specified.
|
||||
The report takes the form of a
|
||||
pandas' `dataframe <https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html>`_
|
||||
in which the rows correspond to different samples, and the columns inform of the true prevalence values,
|
||||
the estimated prevalence values, and the score obtained by each of the evaluation measures indicated.
|
||||
|
||||
:param model: the model in charge of generating the class prevalence estimations
|
||||
:param gen_fn: a generator function yielding one sample at each iteration
|
||||
:param eval_budget: a maximum number of evaluations to run. Set to None (default) for exploring the
|
||||
entire sequence
|
||||
:return: a tuple containing two `np.ndarrays` of shape `(m,n,)` with `m` the number of samples
|
||||
generated. The first one contains the true prevalence values
|
||||
for the samples generated while the second one contains the prevalence estimations
|
||||
"""
|
||||
true_prevs, estim_prevs = gen_prevalence_prediction(model, gen_fn, eval_budget)
|
||||
return _prevalence_report(true_prevs, estim_prevs, error_metrics)
|
||||
|
||||
|
||||
def _prevalence_report(
|
||||
true_prevs,
|
||||
estim_prevs,
|
||||
error_metrics: Iterable[Union[str, Callable]] = 'mae'):
|
||||
def _prevalence_report(true_prevs, estim_prevs, error_metrics: Iterable[Union[str, Callable]] = 'mae'):
|
||||
|
||||
if isinstance(error_metrics, str):
|
||||
error_metrics = [error_metrics]
|
||||
|
||||
error_names = [e if isinstance(e, str) else e.__name__ for e in error_metrics]
|
||||
error_funcs = [qp.error.from_name(e) if isinstance(e, str) else e for e in error_metrics]
|
||||
assert all(hasattr(e, '__call__') for e in error_funcs), 'invalid error functions'
|
||||
error_names = [e.__name__ for e in error_funcs]
|
||||
|
||||
df = pd.DataFrame(columns=['true-prev', 'estim-prev'] + error_names)
|
||||
for true_prev, estim_prev in zip(true_prevs, estim_prevs):
|
||||
|
@ -303,145 +129,59 @@ def _prevalence_report(
|
|||
return df
|
||||
|
||||
|
||||
def artificial_prevalence_protocol(
|
||||
def evaluate(
|
||||
model: BaseQuantifier,
|
||||
test: LabelledCollection,
|
||||
sample_size=None,
|
||||
n_prevpoints=101,
|
||||
repeats=1,
|
||||
eval_budget: int = None,
|
||||
n_jobs=1,
|
||||
random_seed=42,
|
||||
error_metric:Union[str,Callable]='mae',
|
||||
protocol: AbstractProtocol,
|
||||
error_metric: Union[str, Callable],
|
||||
aggr_speedup: Union[str, bool] = 'auto',
|
||||
verbose=False):
|
||||
"""
|
||||
Generates samples according to the Artificial Prevalence Protocol (APP).
|
||||
The APP consists of exploring a grid of prevalence values containing `n_prevalences` points (e.g.,
|
||||
[0, 0.05, 0.1, 0.15, ..., 1], if `n_prevalences=21`), and generating all valid combinations of
|
||||
prevalence values for all classes (e.g., for 3 classes, samples with [0, 0, 1], [0, 0.05, 0.95], ...,
|
||||
[1, 0, 0] prevalence values of size `sample_size` will be considered). The number of samples for each valid
|
||||
combination of prevalence values is indicated by `repeats`.
|
||||
Evaluates a quantification model according to a specific sample generation protocol and in terms of one
|
||||
evaluation metric (error).
|
||||
|
||||
:param model: the model in charge of generating the class prevalence estimations
|
||||
:param test: the test set on which to perform APP
|
||||
:param sample_size: integer, the size of the samples; if None, then the sample size is
|
||||
taken from qp.environ['SAMPLE_SIZE']
|
||||
:param n_prevpoints: integer, the number of different prevalences to sample (or set to None if eval_budget
|
||||
is specified; default 101, i.e., steps of 1%)
|
||||
:param repeats: integer, the number of repetitions for each prevalence (default 1)
|
||||
:param eval_budget: integer, if specified, sets a ceiling on the number of evaluations to perform. For example, if
|
||||
there are 3 classes, `repeats=1`, and `eval_budget=20`, then `n_prevpoints` will be set to 5, since this
|
||||
will generate 15 different prevalence vectors ([0, 0, 1], [0, 0.25, 0.75], [0, 0.5, 0.5] ... [1, 0, 0]) and
|
||||
since setting `n_prevpoints=6` would produce more than 20 evaluations.
|
||||
:param n_jobs: integer, number of jobs to be run in parallel (default 1)
|
||||
:param random_seed: integer, allows to replicate the samplings. The seed is local to the method and does not affect
|
||||
any other random process (default 42)
|
||||
:param error_metric: a string indicating the name of the error (as defined in :mod:`quapy.error`) or a
|
||||
callable error function
|
||||
:param verbose: set to True (default False) for displaying some information on standard output
|
||||
:return: yields one sample at a time
|
||||
:param model: a quantifier, instance of :class:`quapy.method.base.BaseQuantifier`
|
||||
:param protocol: :class:`quapy.protocol.AbstractProtocol`; if this object is also instance of
|
||||
:class:`quapy.protocol.OnLabelledCollectionProtocol`, then the aggregation speed-up can be run. This is the
|
||||
protocol in charge of generating the samples in which the model is evaluated.
|
||||
:param error_metric: a string representing the name(s) of an error function in `qp.error`
|
||||
(e.g., 'mae'), or a callable function implementing the error function itself.
|
||||
:param aggr_speedup: whether or not to apply the speed-up. Set to "force" for applying it even if the number of
|
||||
instances in the original collection on which the protocol acts is larger than the number of instances
|
||||
in the samples to be generated. Set to True or "auto" (default) for letting QuaPy decide whether it is
|
||||
convenient or not. Set to False to deactivate.
|
||||
:param verbose: boolean, show or not information in stdout
|
||||
:return: if the error metric is not averaged (e.g., 'ae', 'rae'), returns an array of shape `(n_samples,)` with
|
||||
the error scores for each sample; if the error metric is averaged (e.g., 'mae', 'mrae') then returns
|
||||
a single float
|
||||
"""
|
||||
|
||||
if isinstance(error_metric, str):
|
||||
error_metric = qp.error.from_name(error_metric)
|
||||
|
||||
assert hasattr(error_metric, '__call__'), 'invalid error function'
|
||||
|
||||
true_prevs, estim_prevs = artificial_prevalence_prediction(
|
||||
model, test, sample_size, n_prevpoints, repeats, eval_budget, n_jobs, random_seed, verbose
|
||||
)
|
||||
|
||||
true_prevs, estim_prevs = prediction(model, protocol, aggr_speedup=aggr_speedup, verbose=verbose)
|
||||
return error_metric(true_prevs, estim_prevs)
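The difference between a per-sample error and its averaged counterpart, which determines what `evaluate` returns, can be illustrated with a dependency-free sketch. The functions below are simplified stand-ins written for illustration only; they assume absolute error is the mean absolute difference across classes and are not the implementations in `qp.error`.

```python
def ae(true_prev, estim_prev):
    """Absolute error for one sample: mean absolute difference across classes."""
    return sum(abs(t - e) for t, e in zip(true_prev, estim_prev)) / len(true_prev)


def per_sample_ae(true_prevs, estim_prevs):
    """Non-averaged metric: one score per sample, i.e. shape (n_samples,)."""
    return [ae(t, e) for t, e in zip(true_prevs, estim_prevs)]


def mae(true_prevs, estim_prevs):
    """Averaged metric: a single float summarising all samples."""
    scores = per_sample_ae(true_prevs, estim_prevs)
    return sum(scores) / len(scores)
```

With two samples whose true prevalences are `[0.5, 0.5]` and `[0.2, 0.8]`, and estimates `[0.4, 0.6]` and `[0.2, 0.8]`, the per-sample variant yields two scores while the averaged variant collapses them into one number.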
|
||||
|
||||
|
||||
def natural_prevalence_protocol(
|
||||
def evaluate_on_samples(
|
||||
model: BaseQuantifier,
|
||||
test: LabelledCollection,
|
||||
sample_size=None,
|
||||
repeats=100,
|
||||
n_jobs=1,
|
||||
random_seed=42,
|
||||
error_metric:Union[str,Callable]='mae',
|
||||
samples: Iterable[qp.data.LabelledCollection],
|
||||
error_metric: Union[str, Callable],
|
||||
verbose=False):
|
||||
"""
|
||||
Generates samples according to the Natural Prevalence Protocol (NPP).
|
||||
The NPP consists of drawing samples uniformly at random, therefore approximately preserving the natural
|
||||
prevalence of the collection.
|
||||
Evaluates a quantification model on a given set of samples and in terms of one evaluation metric (error).
|
||||
|
||||
:param model: the model in charge of generating the class prevalence estimations
|
||||
:param test: the test set on which to perform NPP
|
||||
:param sample_size: integer, the size of the samples; if None, then the sample size is
|
||||
taken from qp.environ['SAMPLE_SIZE']
|
||||
:param repeats: integer, the number of samples to generate
|
||||
:param n_jobs: integer, number of jobs to be run in parallel (default 1)
|
||||
:param random_seed: allows to replicate the samplings. The seed is local to the method and does not affect
|
||||
any other random process (default 42)
|
||||
:param error_metric: a string indicating the name of the error (as defined in :mod:`quapy.error`) or a
|
||||
callable error function
|
||||
:param verbose: if True, shows a progress bar
|
||||
:return: yields one sample at a time
|
||||
:param model: a quantifier, instance of :class:`quapy.method.base.BaseQuantifier`
|
||||
:param samples: a list of samples on which the quantifier is to be evaluated
|
||||
:param error_metric: a string representing the name(s) of an error function in `qp.error`
|
||||
(e.g., 'mae'), or a callable function implementing the error function itself.
|
||||
:param verbose: boolean, show or not information in stdout
|
||||
:return: if the error metric is not averaged (e.g., 'ae', 'rae'), returns an array of shape `(n_samples,)` with
|
||||
the error scores for each sample; if the error metric is averaged (e.g., 'mae', 'mrae') then returns
|
||||
a single float
|
||||
"""
|
||||
|
||||
if isinstance(error_metric, str):
|
||||
error_metric = qp.error.from_name(error_metric)
|
||||
|
||||
assert hasattr(error_metric, '__call__'), 'invalid error function'
|
||||
|
||||
true_prevs, estim_prevs = natural_prevalence_prediction(
|
||||
model, test, sample_size, repeats, n_jobs, random_seed, verbose
|
||||
)
|
||||
|
||||
return error_metric(true_prevs, estim_prevs)
|
||||
return evaluate(model, IterateProtocol(samples), error_metric, aggr_speedup=False, verbose=verbose)
|
||||
|
||||
|
||||
def evaluate(model: BaseQuantifier, test_samples:Iterable[LabelledCollection], error_metric:Union[str, Callable], n_jobs:int=-1):
|
||||
"""
|
||||
Evaluates a model on a sequence of test samples in terms of a given error metric.
|
||||
|
||||
:param model: the model in charge of generating the class prevalence estimations
|
||||
:param test_samples: an iterable yielding one sample at a time
|
||||
:param error_metric: a string indicating the name of the error (as defined in :mod:`quapy.error`) or a
|
||||
callable error function
|
||||
:param n_jobs: integer, number of jobs to be run in parallel (default -1)
|
||||
:return: the score obtained using `error_metric`
|
||||
"""
|
||||
if isinstance(error_metric, str):
|
||||
error_metric = qp.error.from_name(error_metric)
|
||||
scores = qp.util.parallel(_delayed_eval, ((model, Ti, error_metric) for Ti in test_samples), n_jobs=n_jobs)
|
||||
return np.mean(scores)
|
||||
|
||||
|
||||
def _delayed_eval(args):
|
||||
model, test, error = args
|
||||
prev_estim = model.quantify(test.instances)
|
||||
prev_true = test.prevalence()
|
||||
return error(prev_true, prev_estim)
|
||||
|
||||
|
||||
def _check_num_evals(n_classes, n_prevpoints=None, eval_budget=None, repeats=1, verbose=False):
|
||||
if n_prevpoints is None and eval_budget is None:
|
||||
raise ValueError('either n_prevpoints or eval_budget has to be specified')
|
||||
elif n_prevpoints is None:
|
||||
assert eval_budget > 0, 'eval_budget must be a positive integer'
|
||||
n_prevpoints = F.get_nprevpoints_approximation(eval_budget, n_classes, repeats)
|
||||
eval_computations = F.num_prevalence_combinations(n_prevpoints, n_classes, repeats)
|
||||
if verbose:
|
||||
print(f'setting n_prevpoints={n_prevpoints} so that the number of '
|
||||
f'evaluations ({eval_computations}) does not exceed the evaluation '
|
||||
f'budget ({eval_budget})')
|
||||
elif eval_budget is None:
|
||||
eval_computations = F.num_prevalence_combinations(n_prevpoints, n_classes, repeats)
|
||||
if verbose:
|
||||
print(f'{eval_computations} evaluations will be performed for each '
|
||||
f'combination of hyper-parameters')
|
||||
else:
|
||||
eval_computations = F.num_prevalence_combinations(n_prevpoints, n_classes, repeats)
|
||||
if eval_computations > eval_budget:
|
||||
n_prevpoints = F.get_nprevpoints_approximation(eval_budget, n_classes, repeats)
|
||||
new_eval_computations = F.num_prevalence_combinations(n_prevpoints, n_classes, repeats)
|
||||
if verbose:
|
||||
print(f'the budget of evaluations would be exceeded with '
|
||||
f'n_prevpoints={n_prevpoints}. Changing to n_prevpoints={n_prevpoints}. This will produce '
|
||||
f'{new_eval_computations} evaluation computations for each hyper-parameter combination.')
|
||||
return n_prevpoints, eval_computations
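The budget arithmetic quoted in the `eval_budget` documentation (3 classes, budget 20, hence 5 grid points and 15 evaluations) can be checked with a stand-alone sketch. It assumes the number of valid prevalence vectors on the grid is the stars-and-bars count C(n_prevpoints + n_classes - 2, n_classes - 1) multiplied by `repeats`; the helper names are hypothetical and are not the `quapy.functional` implementations.

```python
from math import comb


def num_combinations(n_prevpoints: int, n_classes: int, repeats: int = 1) -> int:
    """Count grid prevalence vectors that sum to 1 (stars and bars), times repeats."""
    return comb(n_prevpoints + n_classes - 2, n_classes - 1) * repeats


def largest_n_prevpoints(eval_budget: int, n_classes: int, repeats: int = 1) -> int:
    """Largest n_prevpoints whose number of evaluations stays within the budget."""
    if n_classes < 2:
        raise ValueError('at least two classes are required')
    if num_combinations(1, n_classes, repeats) > eval_budget:
        raise ValueError('eval_budget is too small for even one grid point')
    n_prevpoints = 1
    # grow the grid while the next size still fits in the budget
    while num_combinations(n_prevpoints + 1, n_classes, repeats) <= eval_budget:
        n_prevpoints += 1
    return n_prevpoints
```

Under these assumptions, 5 grid points over 3 classes give 15 combinations and 6 grid points give 21, which is why a budget of 20 settles on 5.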
|
||||
|
||||
|
|
|
@ -4,37 +4,6 @@ import scipy
|
|||
import numpy as np
|
||||
|
||||
|
||||
def artificial_prevalence_sampling(dimensions, n_prevalences=21, repeat=1, return_constrained_dim=False):
|
||||
"""
|
||||
Generates vectors of prevalence values artificially drawn from an exhaustive grid of prevalence values. The
|
||||
number of prevalence values explored for each dimension depends on `n_prevalences`, so that, if, for example,
|
||||
`n_prevalences=11` then the prevalence values of the grid are taken from [0, 0.1, 0.2, ..., 0.9, 1]. Only
|
||||
valid prevalence distributions are returned, i.e., vectors of prevalence values that sum up to 1. For each
|
||||
valid vector of prevalence values, `repeat` copies are returned. The vector of prevalence values can be
|
||||
implicit (by setting `return_constrained_dim=False`), meaning that the last dimension (which is constrained
|
||||
to 1 - sum of the rest) is not returned (note that, quite obviously, in this case the vector does not sum up to 1).
|
||||
|
||||
:param dimensions: the number of classes
|
||||
:param n_prevalences: the number of equidistant prevalence points to extract from the [0,1] interval for the grid
|
||||
(default is 21)
|
||||
:param repeat: number of copies for each valid prevalence vector (default is 1)
|
||||
:param return_constrained_dim: set to True to return all dimensions, or to False (default) for omitting the
|
||||
constrained dimension
|
||||
:return: a `np.ndarray` of shape `(n, dimensions)` if `return_constrained_dim=True` or of shape `(n, dimensions-1)`
|
||||
if `return_constrained_dim=False`, where `n` is the number of valid combinations found in the grid multiplied
|
||||
by `repeat`
|
||||
"""
|
||||
s = np.linspace(0., 1., n_prevalences, endpoint=True)
|
||||
s = [s] * (dimensions - 1)
|
||||
prevs = [p for p in itertools.product(*s, repeat=1) if sum(p)<=1]
|
||||
if return_constrained_dim:
|
||||
prevs = [p+(1-sum(p),) for p in prevs]
|
||||
prevs = np.asarray(prevs).reshape(len(prevs), -1)
|
||||
if repeat>1:
|
||||
prevs = np.repeat(prevs, repeat, axis=0)
|
||||
return prevs
|
||||
|
||||
|
||||
def prevalence_linspace(n_prevalences=21, repeats=1, smooth_limits_epsilon=0.01):
|
||||
"""
|
||||
Produces an array of uniformly separated values of prevalence.
|
||||
|
@ -70,7 +39,7 @@ def prevalence_from_labels(labels, classes):
|
|||
raise ValueError(f'param labels does not seem to be a ndarray of label predictions')
|
||||
unique, counts = np.unique(labels, return_counts=True)
|
||||
by_class = defaultdict(lambda:0, dict(zip(unique, counts)))
|
||||
prevalences = np.asarray([by_class[class_] for class_ in classes], dtype=np.float)
|
||||
prevalences = np.asarray([by_class[class_] for class_ in classes], dtype=float)
|
||||
prevalences /= prevalences.sum()
|
||||
return prevalences
|
||||
|
||||
|
@ -101,7 +70,7 @@ def HellingerDistance(P, Q):
|
|||
The HD for two discrete distributions of `k` bins is defined as:
|
||||
|
||||
.. math::
|
||||
HD(P,Q) = \\frac{ 1 }{ \\sqrt{ 2 } } \\sqrt{ \sum_{i=1}^k ( \\sqrt{p_i} - \\sqrt{q_i} )^2 }
|
||||
HD(P,Q) = \\frac{ 1 }{ \\sqrt{ 2 } } \\sqrt{ \\sum_{i=1}^k ( \\sqrt{p_i} - \\sqrt{q_i} )^2 }
|
||||
|
||||
:param P: real-valued array-like of shape `(k,)` representing a discrete distribution
|
||||
:param Q: real-valued array-like of shape `(k,)` representing a discrete distribution
|
||||
|
@ -110,6 +79,22 @@ def HellingerDistance(P, Q):
|
|||
return np.sqrt(np.sum((np.sqrt(P) - np.sqrt(Q))**2))
|
||||
|
||||
|
||||
def TopsoeDistance(P, Q, epsilon=1e-20):
|
||||
"""
|
||||
Topsoe distance between two (discretized) distributions `P` and `Q`.
|
||||
The Topsoe distance for two discrete distributions of `k` bins is defined as:
|
||||
|
||||
.. math::
|
||||
Topsoe(P,Q) = \\sum_{i=1}^k \\left( p_i \\log\\left(\\frac{ 2 p_i + \\epsilon }{ p_i+q_i+\\epsilon }\\right) +
|
||||
q_i \\log\\left(\\frac{ 2 q_i + \\epsilon }{ p_i+q_i+\\epsilon }\\right) \\right)
|
||||
|
||||
:param P: real-valued array-like of shape `(k,)` representing a discrete distribution
|
||||
:param Q: real-valued array-like of shape `(k,)` representing a discrete distribution
|
||||
:return: float
|
||||
"""
|
||||
return np.sum(P*np.log((2*P+epsilon)/(P+Q+epsilon)) + Q*np.log((2*Q+epsilon)/(P+Q+epsilon)))
|
||||
|
||||
|
||||
def uniform_prevalence_sampling(n_classes, size=1):
|
||||
"""
|
||||
Implements the `Kraemer algorithm <http://www.cs.cmu.edu/~nasmith/papers/smith+tromble.tr04.pdf>`_
|
||||
|
@ -161,7 +146,6 @@ def adjusted_quantification(prevalence_estim, tpr, fpr, clip=True):
|
|||
.. math::
|
||||
ACC(p) = \\frac{ p - fpr }{ tpr - fpr }
|
||||
|
||||
|
||||
:param prevalence_estim: float, the estimated value for the positive class
|
||||
:param tpr: float, the true positive rate of the classifier
|
||||
:param fpr: float, the false positive rate of the classifier
|
||||
|
@ -209,7 +193,7 @@ def __num_prevalence_combinations_depr(n_prevpoints:int, n_classes:int, n_repeat
|
|||
:param n_prevpoints: integer, number of prevalence points.
|
||||
:param n_repeats: integer, number of repetitions for each prevalence combination
|
||||
:return: The number of possible combinations. For example, if n_classes=2, n_prevpoints=5, n_repeats=1, then the
|
||||
number of possible combinations are 5, i.e.: [0,1], [0.25,0.75], [0.50,0.50], [0.75,0.25], and [1.0,0.0]
|
||||
number of possible combinations are 5, i.e.: [0,1], [0.25,0.75], [0.50,0.50], [0.75,0.25], and [1.0,0.0]
|
||||
"""
|
||||
__cache={}
|
||||
def __f(nc,np):
|
||||
|
@ -241,7 +225,7 @@ def num_prevalence_combinations(n_prevpoints:int, n_classes:int, n_repeats:int=1
|
|||
:param n_prevpoints: integer, number of prevalence points.
|
||||
:param n_repeats: integer, number of repetitions for each prevalence combination
|
||||
:return: The number of possible combinations. For example, if n_classes=2, n_prevpoints=5, n_repeats=1, then the
|
||||
number of possible combinations are 5, i.e.: [0,1], [0.25,0.75], [0.50,0.50], [0.75,0.25], and [1.0,0.0]
|
||||
number of possible combinations are 5, i.e.: [0,1], [0.25,0.75], [0.50,0.50], [0.75,0.25], and [1.0,0.0]
|
||||
"""
|
||||
N = n_prevpoints-1
|
||||
C = n_classes
|
||||
|
@ -255,7 +239,7 @@ def get_nprevpoints_approximation(combinations_budget:int, n_classes:int, n_repe
|
|||
that the number of valid prevalence values generated as combinations of prevalence points (points in a
|
||||
`n_classes`-dimensional simplex) do not exceed combinations_budget.
|
||||
|
||||
:param combinations_budget: integer, maximum number of combinatios allowed
|
||||
:param combinations_budget: integer, maximum number of combinations allowed
|
||||
:param n_classes: integer, number of classes
|
||||
:param n_repeats: integer, number of repetitions for each prevalence combination
|
||||
:return: the largest number of prevalence points that generates at most combinations_budget valid prevalences
|
||||
|
@ -269,3 +253,26 @@ def get_nprevpoints_approximation(combinations_budget:int, n_classes:int, n_repe
|
|||
else:
|
||||
n_prevpoints += 1
|
||||
|
||||
|
||||
def check_prevalence_vector(p, raise_exception=False, toleranze=1e-08):
|
||||
"""
|
||||
Checks that p is a valid prevalence vector, i.e., that it contains values in [0,1] and that the values sum up to 1.
|
||||
|
||||
:param p: the prevalence vector to check
|
||||
:return: True if `p` is valid, False otherwise
|
||||
"""
|
||||
p = np.asarray(p)
|
||||
if not all(p>=0):
|
||||
if raise_exception:
|
||||
raise ValueError('the prevalence vector contains negative numbers')
|
||||
return False
|
||||
if not all(p<=1):
|
||||
if raise_exception:
|
||||
raise ValueError('the prevalence vector contains values >1')
|
||||
return False
|
||||
if not np.isclose(p.sum(), 1, atol=toleranze):
|
||||
if raise_exception:
|
||||
raise ValueError('the prevalence vector does not sum up to 1')
|
||||
return False
|
||||
return True
|
||||
|
||||
|
|
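The grid-based sampling and the validity check that appear in this file can be illustrated together. The following is a minimal sketch rather than the library's code: it enumerates the grid in integer steps (an assumption made here to avoid floating-point comparisons against 1) and then verifies each vector. The names `grid_prevalences` and `is_valid_prevalence` are hypothetical.

```python
import itertools

import numpy as np


def grid_prevalences(n_classes: int, n_prevpoints: int = 21) -> np.ndarray:
    """All prevalence vectors on an equidistant grid, as an array of shape (n, n_classes)."""
    if n_prevpoints < 2:
        raise ValueError('n_prevpoints must be at least 2')
    n_steps = n_prevpoints - 1
    rows = []
    # enumerate the first n_classes-1 dimensions in integer grid units
    for head in itertools.product(range(n_steps + 1), repeat=n_classes - 1):
        if sum(head) <= n_steps:
            # the last dimension is constrained to whatever is left
            rows.append(head + (n_steps - sum(head),))
    return np.asarray(rows, dtype=float) / n_steps


def is_valid_prevalence(p, tolerance: float = 1e-8) -> bool:
    """True if all values lie in [0, 1] and they sum up to 1."""
    p = np.asarray(p, dtype=float)
    in_range = np.all(p >= 0) and np.all(p <= 1)
    return bool(in_range and np.isclose(p.sum(), 1, atol=tolerance))
```

Calling `grid_prevalences(3, 5)` produces 15 rows, matching the stars-and-bars count for 3 classes and 5 prevalence points.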
|
@ -3,15 +3,6 @@ from . import base
|
|||
from . import meta
|
||||
from . import non_aggregative
|
||||
|
||||
EXPLICIT_LOSS_MINIMIZATION_METHODS = {
|
||||
aggregative.ELM,
|
||||
aggregative.SVMQ,
|
||||
aggregative.SVMAE,
|
||||
aggregative.SVMKLD,
|
||||
aggregative.SVMRAE,
|
||||
aggregative.SVMNKLD
|
||||
}
|
||||
|
||||
AGGREGATIVE_METHODS = {
|
||||
aggregative.CC,
|
||||
aggregative.ACC,
|
||||
|
@ -19,12 +10,14 @@ AGGREGATIVE_METHODS = {
|
|||
aggregative.PACC,
|
||||
aggregative.EMQ,
|
||||
aggregative.HDy,
|
||||
aggregative.DyS,
|
||||
aggregative.SMM,
|
||||
aggregative.X,
|
||||
aggregative.T50,
|
||||
aggregative.MAX,
|
||||
aggregative.MS,
|
||||
aggregative.MS2,
|
||||
} | EXPLICIT_LOSS_MINIMIZATION_METHODS
|
||||
}
|
||||
|
||||
|
||||
NON_AGGREGATIVE_METHODS = {
|
||||
|
|
File diff suppressed because it is too large
|
@ -1,11 +1,17 @@
|
|||
from abc import ABCMeta, abstractmethod
|
||||
from copy import deepcopy
|
||||
|
||||
from joblib import Parallel, delayed
|
||||
from sklearn.base import BaseEstimator
|
||||
|
||||
import quapy as qp
|
||||
from quapy.data import LabelledCollection
|
||||
import numpy as np
|
||||
|
||||
|
||||
# Base Quantifier abstract class
|
||||
# ------------------------------------
|
||||
class BaseQuantifier(metaclass=ABCMeta):
|
||||
class BaseQuantifier(BaseEstimator):
|
||||
"""
|
||||
Abstract Quantifier. A quantifier is defined as an object of a class that implements the method :meth:`fit` on
|
||||
:class:`quapy.data.base.LabelledCollection`, the method :meth:`quantify`, and the :meth:`set_params` and
|
||||
|
@ -28,79 +34,10 @@ class BaseQuantifier(metaclass=ABCMeta):
|
|||
Generate class prevalence estimates for the sample's instances
|
||||
|
||||
:param instances: array-like
|
||||
:return: `np.ndarray` of shape `(self.n_classes_,)` with class prevalence estimates.
|
||||
:return: `np.ndarray` of shape `(n_classes,)` with class prevalence estimates.
|
||||
"""
|
||||
...
|
||||
|
||||
@abstractmethod
|
||||
def set_params(self, **parameters):
|
||||
"""
|
||||
Set the parameters of the quantifier.
|
||||
|
||||
:param parameters: dictionary of param-value pairs
|
||||
"""
|
||||
...
|
||||
|
||||
@abstractmethod
|
||||
def get_params(self, deep=True):
|
||||
"""
|
||||
Return the current parameters of the quantifier.
|
||||
|
||||
:param deep: for compatibility with sklearn
|
||||
:return: a dictionary of param-value pairs
|
||||
"""
|
||||
...
|
||||
|
||||
@property
|
||||
@abstractmethod
|
||||
def classes_(self):
|
||||
"""
|
||||
Class labels, in the same order in which class prevalence values are to be computed.
|
||||
|
||||
:return: array-like
|
||||
"""
|
||||
...
|
||||
|
||||
@property
|
||||
def n_classes(self):
|
||||
"""
|
||||
Returns the number of classes
|
||||
|
||||
:return: integer
|
||||
"""
|
||||
return len(self.classes_)
|
||||
|
||||
# these methods allow meta-learners to reimplement the decision based on their constituents, and not
|
||||
# based on class structure
|
||||
@property
|
||||
def binary(self):
|
||||
"""
|
||||
Indicates whether the quantifier is binary or not.
|
||||
|
||||
:return: False (to be overridden)
|
||||
"""
|
||||
return False
|
||||
|
||||
@property
|
||||
def aggregative(self):
|
||||
"""
|
||||
Indicates whether the quantifier is of type aggregative or not
|
||||
|
||||
:return: False (to be overridden)
|
||||
"""
|
||||
|
||||
return False
|
||||
|
||||
@property
|
||||
def probabilistic(self):
|
||||
"""
|
||||
Indicates whether the quantifier is of type probabilistic or not
|
||||
|
||||
:return: False (to be overridden)
|
||||
"""
|
||||
|
||||
return False
|
||||
|
||||
|
||||
class BinaryQuantifier(BaseQuantifier):
|
||||
"""
|
||||
|
@ -112,90 +49,61 @@ class BinaryQuantifier(BaseQuantifier):
|
|||
assert data.binary, f'{quantifier_name} works only on problems of binary classification. ' \
|
||||
f'Use the class OneVsAll to enable {quantifier_name} to work on single-label data.'
|
||||
|
||||
|
||||
class OneVsAll:
|
||||
pass
|
||||
|
||||
|
||||
def newOneVsAll(binary_quantifier, n_jobs=None):
|
||||
assert isinstance(binary_quantifier, BaseQuantifier), \
|
||||
f'{binary_quantifier} does not seem to be a Quantifier'
|
||||
if isinstance(binary_quantifier, qp.method.aggregative.AggregativeQuantifier):
|
||||
return qp.method.aggregative.OneVsAllAggregative(binary_quantifier, n_jobs)
|
||||
else:
|
||||
return OneVsAllGeneric(binary_quantifier, n_jobs)
|
||||
|
||||
|
||||
class OneVsAllGeneric(OneVsAll,BaseQuantifier):
|
||||
"""
|
||||
Allows any binary quantifier to perform quantification on single-label datasets. The method maintains one binary
|
||||
quantifier for each class, and then l1-normalizes the outputs so that the class prevalence values sum up to 1.
|
||||
"""
|
||||
|
||||
def __init__(self, binary_quantifier, n_jobs=None):
|
||||
assert isinstance(binary_quantifier, BaseQuantifier), \
|
||||
f'{binary_quantifier} does not seem to be a Quantifier'
|
||||
if isinstance(binary_quantifier, qp.method.aggregative.AggregativeQuantifier):
|
||||
print('[warning] the quantifier seems to be an instance of qp.method.aggregative.AggregativeQuantifier; '
|
||||
f'you might prefer instantiating {qp.method.aggregative.OneVsAllAggregative.__name__}')
|
||||
self.binary_quantifier = binary_quantifier
|
||||
self.n_jobs = qp._get_njobs(n_jobs)
|
||||
|
||||
def fit(self, data: LabelledCollection, fit_classifier=True):
|
||||
assert not data.binary, f'{self.__class__.__name__} expects non-binary data'
|
||||
assert fit_classifier == True, 'fit_classifier must be True'
|
||||
|
||||
self.dict_binary_quantifiers = {c: deepcopy(self.binary_quantifier) for c in data.classes_}
|
||||
self._parallel(self._delayed_binary_fit, data)
|
||||
return self
|
||||
|
||||
def _parallel(self, func, *args, **kwargs):
|
||||
return np.asarray(
|
||||
Parallel(n_jobs=self.n_jobs, backend='threading')(
|
||||
delayed(func)(c, *args, **kwargs) for c in self.classes_
|
||||
)
|
||||
)
|
||||
|
||||
def quantify(self, instances):
|
||||
prevalences = self._parallel(self._delayed_binary_predict, instances)
|
||||
return qp.functional.normalize_prevalence(prevalences)
|
||||
|
||||
@property
|
||||
def binary(self):
|
||||
"""
|
||||
Informs that the quantifier is binary
|
||||
|
||||
:return: True
|
||||
"""
|
||||
return True
|
||||
|
||||
|
||||
def isbinary(model:BaseQuantifier):
|
||||
"""
|
||||
Alias for property `binary`
|
||||
|
||||
:param model: the model
|
||||
:return: True if the model is binary, False otherwise
|
||||
"""
|
||||
return model.binary
|
||||
|
||||
|
||||
def isaggregative(model:BaseQuantifier):
|
||||
"""
|
||||
Alias for property `aggregative`
|
||||
|
||||
:param model: the model
|
||||
:return: True if the model is aggregative, False otherwise
|
||||
"""
|
||||
|
||||
return model.aggregative
|
||||
|
||||
|
||||
def isprobabilistic(model:BaseQuantifier):
|
||||
"""
|
||||
Alias for property `probabilistic`
|
||||
|
||||
:param model: the model
|
||||
:return: True if the model is probabilistic, False otherwise
|
||||
"""
|
||||
|
||||
return model.probabilistic
|
||||
|
||||
|
||||
# class OneVsAll:
|
||||
# """
|
||||
# Allows any binary quantifier to perform quantification on single-label datasets. The method maintains one binary
|
||||
# quantifier for each class, and then l1-normalizes the outputs so that the class prevelences sum up to 1.
|
||||
# """
|
||||
#
|
||||
# def __init__(self, binary_method, n_jobs=-1):
|
||||
# self.binary_method = binary_method
|
||||
# self.n_jobs = n_jobs
|
||||
#
|
||||
# def fit(self, data: LabelledCollection, **kwargs):
|
||||
# assert not data.binary, f'{self.__class__.__name__} expect non-binary data'
|
||||
# assert isinstance(self.binary_method, BaseQuantifier), f'{self.binary_method} does not seem to be a Quantifier'
|
||||
# self.class_method = {c: deepcopy(self.binary_method) for c in data.classes_}
|
||||
# Parallel(n_jobs=self.n_jobs, backend='threading')(
|
||||
# delayed(self._delayed_binary_fit)(c, self.class_method, data, **kwargs) for c in data.classes_
|
||||
# )
|
||||
# return self
|
||||
#
|
||||
# def quantify(self, X, *args):
|
||||
# prevalences = np.asarray(
|
||||
# Parallel(n_jobs=self.n_jobs, backend='threading')(
|
||||
# delayed(self._delayed_binary_predict)(c, self.class_method, X) for c in self.classes
|
||||
# )
|
||||
# )
|
||||
# return F.normalize_prevalence(prevalences)
|
||||
#
|
||||
# @property
|
||||
# def classes(self):
|
||||
# return sorted(self.class_method.keys())
|
||||
#
|
||||
# def set_params(self, **parameters):
|
||||
# self.binary_method.set_params(**parameters)
|
||||
#
|
||||
# def get_params(self, deep=True):
|
||||
# return self.binary_method.get_params()
|
||||
#
|
||||
# def _delayed_binary_predict(self, c, learners, X):
|
||||
# return learners[c].quantify(X)[:,1] # the mean is the estimation for the positive class prevalence
|
||||
#
|
||||
# def _delayed_binary_fit(self, c, learners, data, **kwargs):
|
||||
# bindata = LabelledCollection(data.instances, data.labels == c, n_classes=2)
|
||||
# learners[c].fit(bindata, **kwargs)
|
||||
def classes_(self):
|
||||
return sorted(self.dict_binary_quantifiers.keys())
|
||||
|
||||
def _delayed_binary_predict(self, c, X):
|
||||
return self.dict_binary_quantifiers[c].quantify(X)[1]
|
||||
|
||||
def _delayed_binary_fit(self, c, data):
|
||||
bindata = LabelledCollection(data.instances, data.labels == c, classes=[False, True])
|
||||
self.dict_binary_quantifiers[c].fit(bindata)
|
||||
|
|
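The one-vs-all aggregation performed by `OneVsAllGeneric` can be sketched independently of QuaPy. In this illustration, which is an assumption-laden sketch and not the library's implementation, each binary quantifier is modelled as a plain callable returning `[negative, positive]` prevalence values; the function name `one_vs_all_quantify` and the uniform fallback for an all-zero output are hypothetical choices.

```python
import numpy as np


def one_vs_all_quantify(binary_quantifiers: dict, instances) -> np.ndarray:
    """Combine per-class binary estimates into one prevalence vector that sums to 1."""
    classes = sorted(binary_quantifiers.keys())  # same ordering idea as classes_
    # keep only the positive-class estimate of each binary quantifier
    positives = np.asarray(
        [binary_quantifiers[c](instances)[1] for c in classes], dtype=float
    )
    total = positives.sum()
    if total == 0:
        # assumption: fall back to the uniform distribution if every estimate is zero
        return np.full(len(classes), 1.0 / len(classes))
    return positives / total  # l1-normalization
```

Because the binary quantifiers are fitted independently, their positive estimates need not sum to 1, which is why the final normalization step is required.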
|
@ -7,9 +7,9 @@ from sklearn.model_selection import GridSearchCV, cross_val_predict
|
|||
from tqdm import tqdm
|
||||
|
||||
import quapy as qp
|
||||
from quapy.evaluation import evaluate_on_samples
|
||||
from quapy import functional as F
|
||||
from quapy.data import LabelledCollection
|
||||
from quapy.evaluation import evaluate
|
||||
from quapy.model_selection import GridSearchQ
|
||||
|
||||
try:
|
||||
|
@ -73,7 +73,7 @@ class Ensemble(BaseQuantifier):
|
|||
policy='ave',
|
||||
max_sample_size=None,
|
||||
val_split:Union[qp.data.LabelledCollection, float]=None,
|
||||
n_jobs=1,
|
||||
n_jobs=None,
|
||||
verbose=False):
|
||||
assert policy in Ensemble.VALID_POLICIES, \
|
||||
f'unknown policy={policy}; valid are {Ensemble.VALID_POLICIES}'
|
||||
|
@ -85,7 +85,7 @@ class Ensemble(BaseQuantifier):
|
|||
self.red_size = red_size
|
||||
self.policy = policy
|
||||
self.val_split = val_split
|
||||
self.n_jobs = n_jobs
|
||||
self.n_jobs = qp._get_njobs(n_jobs)
|
||||
self.post_proba_fn = None
|
||||
self.verbose = verbose
|
||||
self.max_sample_size = max_sample_size
|
||||
|
@ -147,15 +147,15 @@ class Ensemble(BaseQuantifier):
|
|||
This function should not be used within :class:`quapy.model_selection.GridSearchQ` (it is here for compatibility
|
||||
with the abstract class).
|
||||
Instead, use `Ensemble(GridSearchQ(q),...)`, with `q` a Quantifier (recommended), or
|
||||
`Ensemble(Q(GridSearchCV(l)))` with `Q` a quantifier class that has a learner `l` optimized for
|
||||
classification (not recommended).
|
||||
`Ensemble(Q(GridSearchCV(l)))` with `Q` a quantifier class that has a classifier `l` optimized for
|
||||
classification (not recommended).
|
||||
|
||||
:param parameters: dictionary
|
||||
:return: raises an Exception
|
||||
"""
|
||||
raise NotImplementedError(f'{self.__class__.__name__} should not be used within GridSearchQ; '
|
||||
f'instead, use Ensemble(GridSearchQ(q),...), with q a Quantifier (recommended), '
|
||||
f'or Ensemble(Q(GridSearchCV(l))) with Q a quantifier class that has a learner '
|
||||
f'or Ensemble(Q(GridSearchCV(l))) with Q a quantifier class that has a classifier '
|
||||
f'l optimized for classification (not recommended).')
|
||||
|
||||
def get_params(self, deep=True):
|
||||
|
@ -163,11 +163,13 @@ class Ensemble(BaseQuantifier):
|
|||
This function should not be used within :class:`quapy.model_selection.GridSearchQ` (it is here for compatibility
|
||||
with the abstract class).
|
||||
Instead, use `Ensemble(GridSearchQ(q),...)`, with `q` a Quantifier (recommended), or
|
||||
`Ensemble(Q(GridSearchCV(l)))` with `Q` a quantifier class that has a learner `l` optimized for
|
||||
classification (not recommended).
|
||||
`Ensemble(Q(GridSearchCV(l)))` with `Q` a quantifier class that has a classifier `l` optimized for
|
||||
classification (not recommended).
|
||||
|
||||
:param deep: for compatibility with scikit-learn
|
||||
:return: raises an Exception
|
||||
"""
|
||||
|
||||
raise NotImplementedError()
|
||||
|
||||
def _accuracy_policy(self, error_name):
|
||||
|
@ -176,11 +178,12 @@ class Ensemble(BaseQuantifier):
|
|||
For each model in the ensemble, the performance is measured in terms of _error_name_ on the quantification of
|
||||
the samples used for training the rest of the models in the ensemble.
|
||||
"""
|
||||
from quapy.evaluation import evaluate
|
||||
error = qp.error.from_name(error_name)
|
||||
tests = [m[3] for m in self.ensemble]
|
||||
scores = []
|
||||
for i, model in enumerate(self.ensemble):
|
||||
scores.append(evaluate(model[0], tests[:i] + tests[i + 1:], error, self.n_jobs))
|
||||
scores.append(evaluate_on_samples(model[0], tests[:i] + tests[i + 1:], error))
|
||||
order = np.argsort(scores)
|
||||
|
||||
self.ensemble = _select_k(self.ensemble, order, k=self.red_size)
|
||||
|
@ -234,19 +237,6 @@ class Ensemble(BaseQuantifier):
|
|||
order = np.argsort(dist)
|
||||
return _select_k(predictions, order, k=self.red_size)
|
||||
|
||||
@property
|
||||
def classes_(self):
|
||||
return self.base_quantifier.classes_
|
||||
|
||||
@property
|
||||
def binary(self):
|
||||
"""
|
||||
Returns a boolean indicating whether the base quantifiers are binary or not
|
||||
|
||||
:return: boolean
|
||||
"""
|
||||
return self.base_quantifier.binary
|
||||
|
||||
@property
|
||||
def aggregative(self):
|
||||
"""
|
||||
|
@ -339,18 +329,18 @@ def _draw_simplex(ndim, min_val, max_trials=100):
|
|||
f'>= {min_val} is unlikely (it failed after {max_trials} trials)')
|
||||
|
||||
|
||||
def _instantiate_ensemble(learner, base_quantifier_class, param_grid, optim, param_model_sel, **kwargs):
|
||||
def _instantiate_ensemble(classifier, base_quantifier_class, param_grid, optim, param_model_sel, **kwargs):
|
||||
if optim is None:
|
||||
base_quantifier = base_quantifier_class(learner)
|
||||
base_quantifier = base_quantifier_class(classifier)
|
||||
elif optim in qp.error.CLASSIFICATION_ERROR:
|
||||
if optim == qp.error.f1e:
|
||||
scoring = make_scorer(f1_score)
|
||||
elif optim == qp.error.acce:
|
||||
scoring = make_scorer(accuracy_score)
|
||||
learner = GridSearchCV(learner, param_grid, scoring=scoring)
|
||||
base_quantifier = base_quantifier_class(learner)
|
||||
classifier = GridSearchCV(classifier, param_grid, scoring=scoring)
|
||||
base_quantifier = base_quantifier_class(classifier)
|
||||
else:
|
||||
base_quantifier = GridSearchQ(base_quantifier_class(learner),
|
||||
base_quantifier = GridSearchQ(base_quantifier_class(classifier),
|
||||
param_grid=param_grid,
|
||||
**param_model_sel,
|
||||
error=optim)
|
||||
|
@ -370,7 +360,7 @@ def _check_error(error):
|
|||
f'the name of an error function in {qp.error.ERROR_NAMES}')
|
||||
|
||||
|
||||
def ensembleFactory(learner, base_quantifier_class, param_grid=None, optim=None, param_model_sel: dict = None,
|
||||
def ensembleFactory(classifier, base_quantifier_class, param_grid=None, optim=None, param_model_sel: dict = None,
|
||||
**kwargs):
|
||||
"""
|
||||
Ensemble factory. Provides a unified interface for instantiating ensembles that can be optimized (via model
|
||||
|
@ -403,7 +393,7 @@ def ensembleFactory(learner, base_quantifier_class, param_grid=None, optim=None,
|
|||
>>>
|
||||
>>> ensembleFactory(LogisticRegression(), PACC, optim='mae', policy='mae', **common)
|
||||
|
||||
:param learner: sklearn's Estimator that generates a classifier
|
||||
:param classifier: sklearn's Estimator that generates a classifier
|
||||
:param base_quantifier_class: a class of quantifiers
|
||||
:param param_grid: a dictionary with the grid of parameters to optimize for
|
||||
:param optim: a valid quantification or classification error, or a string name of it
|
||||
|
@ -418,21 +408,21 @@ def ensembleFactory(learner, base_quantifier_class, param_grid=None, optim=None,
|
|||
if param_model_sel is None:
|
||||
raise ValueError(f'param_model_sel is None but optim was requested.')
|
||||
error = _check_error(optim)
|
||||
return _instantiate_ensemble(learner, base_quantifier_class, param_grid, error, param_model_sel, **kwargs)
|
||||
return _instantiate_ensemble(classifier, base_quantifier_class, param_grid, error, param_model_sel, **kwargs)
|
||||
|
||||
|
||||
def ECC(learner, param_grid=None, optim=None, param_mod_sel=None, **kwargs):
|
||||
def ECC(classifier, param_grid=None, optim=None, param_mod_sel=None, **kwargs):
|
||||
"""
|
||||
Implements an ensemble of :class:`quapy.method.aggregative.CC` quantifiers, as used by
|
||||
`Pérez-Gállego et al., 2019 <https://www.sciencedirect.com/science/article/pii/S1566253517303652>`_.
|
||||
|
||||
Equivalent to:
|
||||
|
||||
>>> ensembleFactory(learner, CC, param_grid, optim, param_mod_sel, **kwargs)
|
||||
>>> ensembleFactory(classifier, CC, param_grid, optim, param_mod_sel, **kwargs)
|
||||
|
||||
See :meth:`ensembleFactory` for further details.
|
||||
|
||||
:param learner: sklearn's Estimator that generates a classifier
|
||||
:param classifier: sklearn's Estimator that generates a classifier
|
||||
:param param_grid: a dictionary with the grid of parameters to optimize for
|
||||
:param optim: a valid quantification or classification error, or a string name of it
|
||||
:param param_model_sel: a dictionary containing any keyworded argument to pass to
|
||||
|
@ -441,21 +431,21 @@ def ECC(learner, param_grid=None, optim=None, param_mod_sel=None, **kwargs):
|
|||
:return: an instance of :class:`Ensemble`
|
||||
"""
|
||||
|
||||
return ensembleFactory(learner, CC, param_grid, optim, param_mod_sel, **kwargs)
|
||||
return ensembleFactory(classifier, CC, param_grid, optim, param_mod_sel, **kwargs)
|
||||
|
||||
|
||||
def EACC(learner, param_grid=None, optim=None, param_mod_sel=None, **kwargs):
|
||||
def EACC(classifier, param_grid=None, optim=None, param_mod_sel=None, **kwargs):
|
||||
"""
|
||||
Implements an ensemble of :class:`quapy.method.aggregative.ACC` quantifiers, as used by
|
||||
`Pérez-Gállego et al., 2019 <https://www.sciencedirect.com/science/article/pii/S1566253517303652>`_.
|
||||
|
||||
Equivalent to:
|
||||
|
||||
>>> ensembleFactory(learner, ACC, param_grid, optim, param_mod_sel, **kwargs)
|
||||
>>> ensembleFactory(classifier, ACC, param_grid, optim, param_mod_sel, **kwargs)
|
||||
|
||||
See :meth:`ensembleFactory` for further details.
|
||||
|
||||
:param learner: sklearn's Estimator that generates a classifier
|
||||
:param classifier: sklearn's Estimator that generates a classifier
|
||||
:param param_grid: a dictionary with the grid of parameters to optimize for
|
||||
:param optim: a valid quantification or classification error, or a string name of it
|
||||
:param param_model_sel: a dictionary containing any keyworded argument to pass to
|
||||
|
@ -464,20 +454,20 @@ def EACC(learner, param_grid=None, optim=None, param_mod_sel=None, **kwargs):
|
|||
:return: an instance of :class:`Ensemble`
|
||||
"""
|
||||
|
||||
return ensembleFactory(learner, ACC, param_grid, optim, param_mod_sel, **kwargs)
|
||||
return ensembleFactory(classifier, ACC, param_grid, optim, param_mod_sel, **kwargs)
|
||||
|
||||
|
||||
def EPACC(learner, param_grid=None, optim=None, param_mod_sel=None, **kwargs):
|
||||
def EPACC(classifier, param_grid=None, optim=None, param_mod_sel=None, **kwargs):
|
||||
"""
|
||||
Implements an ensemble of :class:`quapy.method.aggregative.PACC` quantifiers.
|
||||
|
||||
Equivalent to:
|
||||
|
||||
>>> ensembleFactory(learner, PACC, param_grid, optim, param_mod_sel, **kwargs)
|
||||
>>> ensembleFactory(classifier, PACC, param_grid, optim, param_mod_sel, **kwargs)
|
||||
|
||||
See :meth:`ensembleFactory` for further details.
|
||||
|
||||
:param learner: sklearn's Estimator that generates a classifier
|
||||
:param classifier: sklearn's Estimator that generates a classifier
|
||||
:param param_grid: a dictionary with the grid of parameters to optimize for
|
||||
:param optim: a valid quantification or classification error, or a string name of it
|
||||
:param param_model_sel: a dictionary containing any keyworded argument to pass to
|
||||
|
@ -486,21 +476,21 @@ def EPACC(learner, param_grid=None, optim=None, param_mod_sel=None, **kwargs):
|
|||
:return: an instance of :class:`Ensemble`
|
||||
"""
|
||||
|
||||
return ensembleFactory(learner, PACC, param_grid, optim, param_mod_sel, **kwargs)
|
||||
return ensembleFactory(classifier, PACC, param_grid, optim, param_mod_sel, **kwargs)
|
||||
|
||||
|
||||
def EHDy(learner, param_grid=None, optim=None, param_mod_sel=None, **kwargs):
|
||||
def EHDy(classifier, param_grid=None, optim=None, param_mod_sel=None, **kwargs):
|
||||
"""
|
||||
Implements an ensemble of :class:`quapy.method.aggregative.HDy` quantifiers, as used by
|
||||
`Pérez-Gállego et al., 2019 <https://www.sciencedirect.com/science/article/pii/S1566253517303652>`_.
|
||||
|
||||
Equivalent to:
|
||||
|
||||
>>> ensembleFactory(learner, HDy, param_grid, optim, param_mod_sel, **kwargs)
|
||||
>>> ensembleFactory(classifier, HDy, param_grid, optim, param_mod_sel, **kwargs)
|
||||
|
||||
See :meth:`ensembleFactory` for further details.
|
||||
|
||||
:param learner: sklearn's Estimator that generates a classifier
|
||||
:param classifier: sklearn's Estimator that generates a classifier
|
||||
:param param_grid: a dictionary with the grid of parameters to optimize for
|
||||
:param optim: a valid quantification or classification error, or a string name of it
|
||||
:param param_model_sel: a dictionary containing any keyworded argument to pass to
|
||||
|
@ -509,20 +499,20 @@ def EHDy(learner, param_grid=None, optim=None, param_mod_sel=None, **kwargs):
|
|||
:return: an instance of :class:`Ensemble`
|
||||
"""
|
||||
|
||||
return ensembleFactory(learner, HDy, param_grid, optim, param_mod_sel, **kwargs)
|
||||
return ensembleFactory(classifier, HDy, param_grid, optim, param_mod_sel, **kwargs)
|
||||
|
||||
|
||||
def EEMQ(learner, param_grid=None, optim=None, param_mod_sel=None, **kwargs):
|
||||
def EEMQ(classifier, param_grid=None, optim=None, param_mod_sel=None, **kwargs):
|
||||
"""
|
||||
Implements an ensemble of :class:`quapy.method.aggregative.EMQ` quantifiers.
|
||||
|
||||
Equivalent to:
|
||||
|
||||
>>> ensembleFactory(learner, EMQ, param_grid, optim, param_mod_sel, **kwargs)
|
||||
>>> ensembleFactory(classifier, EMQ, param_grid, optim, param_mod_sel, **kwargs)
|
||||
|
||||
See :meth:`ensembleFactory` for further details.
|
||||
|
||||
:param learner: sklearn's Estimator that generates a classifier
|
||||
:param classifier: sklearn's Estimator that generates a classifier
|
||||
:param param_grid: a dictionary with the grid of parameters to optimize for
|
||||
:param optim: a valid quantification or classification error, or a string name of it
|
||||
:param param_model_sel: a dictionary containing any keyworded argument to pass to
|
||||
|
@ -531,4 +521,4 @@ def EEMQ(learner, param_grid=None, optim=None, param_mod_sel=None, **kwargs):
|
|||
:return: an instance of :class:`Ensemble`
|
||||
"""
|
||||
|
||||
return ensembleFactory(learner, EMQ, param_grid, optim, param_mod_sel, **kwargs)
|
||||
return ensembleFactory(classifier, EMQ, param_grid, optim, param_mod_sel, **kwargs)
|
||||
|
|
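The accuracy-based selection policy of `Ensemble` scores each member on the samples used to train the other members and keeps the best `red_size` of them. A minimal sketch of that idea follows, assuming the errors have already been computed and stored in a square matrix where entry `(i, j)` is the error of model `i` on the training sample of model `j`; the function name `select_by_accuracy` and this matrix layout are hypothetical and do not mirror the library's internals.

```python
import numpy as np


def select_by_accuracy(errors, red_size: int) -> np.ndarray:
    """Indices of the red_size models with the lowest mean error on the other members' samples."""
    errors = np.asarray(errors, dtype=float)
    n_models = errors.shape[0]
    if n_models < 2:
        raise ValueError('at least two models are needed')
    # mask out the diagonal: a model is never scored on its own training sample
    off_diagonal = ~np.eye(n_models, dtype=bool)
    scores = np.where(off_diagonal, errors, 0.0).sum(axis=1) / (n_models - 1)
    order = np.argsort(scores)  # lower error first
    return order[:red_size]
```

Excluding the diagonal avoids rewarding a model for fitting the sample it was trained on.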
|
@ -6,6 +6,7 @@ import torch
|
|||
from torch.nn import MSELoss
|
||||
from torch.nn.functional import relu
|
||||
|
||||
from quapy.protocol import UPP
|
||||
from quapy.method.aggregative import *
|
||||
from quapy.util import EarlyStop
|
||||
|
||||
|
@ -31,17 +32,18 @@ class QuaNetTrainer(BaseQuantifier):
|
|||
>>>
|
||||
>>> # the text classifier is a CNN trained by NeuralClassifierTrainer
|
||||
>>> cnn = CNNnet(dataset.vocabulary_size, dataset.n_classes)
|
||||
>>> learner = NeuralClassifierTrainer(cnn, device='cuda')
|
||||
>>> classifier = NeuralClassifierTrainer(cnn, device='cuda')
|
||||
>>>
|
||||
>>> # train QuaNet (QuaNet is an alias to QuaNetTrainer)
|
||||
>>> model = QuaNet(learner, qp.environ['SAMPLE_SIZE'], device='cuda')
|
||||
>>> model = QuaNet(classifier, qp.environ['SAMPLE_SIZE'], device='cuda')
|
||||
>>> model.fit(dataset.training)
|
||||
>>> estim_prevalence = model.quantify(dataset.test.instances)
|
||||
|
||||
:param learner: an object implementing `fit` (i.e., that can be trained on labelled data),
|
||||
:param classifier: an object implementing `fit` (i.e., that can be trained on labelled data),
|
||||
`predict_proba` (i.e., that can generate posterior probabilities of unlabelled examples) and
|
||||
`transform` (i.e., that can generate embedded representations of the unlabelled instances).
|
||||
:param sample_size: integer, the sample size
|
||||
:param sample_size: integer, the sample size; default is None, meaning that the sample size should be
|
||||
taken from qp.environ["SAMPLE_SIZE"]
|
||||
:param n_epochs: integer, maximum number of training epochs
|
||||
:param tr_iter_per_poch: integer, number of training iterations before considering an epoch complete
|
||||
:param va_iter_per_poch: integer, number of validation iterations to perform after each epoch
|
||||
|
@ -60,8 +62,8 @@ class QuaNetTrainer(BaseQuantifier):
|
|||
"""
|
||||
|
||||
def __init__(self,
|
||||
learner,
|
||||
sample_size,
|
||||
classifier,
|
||||
sample_size=None,
|
||||
n_epochs=100,
|
||||
tr_iter_per_poch=500,
|
||||
va_iter_per_poch=100,
|
||||
|
@ -76,15 +78,14 @@ class QuaNetTrainer(BaseQuantifier):
|
|||
checkpointname=None,
|
||||
device='cuda'):
|
||||
|
||||
assert hasattr(learner, 'transform'), \
|
||||
f'the learner {learner.__class__.__name__} does not seem to be able to produce document embeddings ' \
|
||||
assert hasattr(classifier, 'transform'), \
|
||||
f'the classifier {classifier.__class__.__name__} does not seem to be able to produce document embeddings ' \
|
||||
f'since it does not implement the method "transform"'
|
||||
assert hasattr(learner, 'predict_proba'), \
|
||||
f'the learner {learner.__class__.__name__} does not seem to be able to produce posterior probabilities ' \
|
||||
assert hasattr(classifier, 'predict_proba'), \
|
||||
f'the classifier {classifier.__class__.__name__} does not seem to be able to produce posterior probabilities ' \
|
||||
f'since it does not implement the method "predict_proba"'
|
||||
assert sample_size is not None, 'sample_size cannot be None'
|
||||
self.learner = learner
|
||||
self.sample_size = sample_size
|
||||
self.classifier = classifier
|
||||
self.sample_size = qp._get_sample_size(sample_size)
|
||||
self.n_epochs = n_epochs
|
||||
self.tr_iter = tr_iter_per_poch
|
||||
self.va_iter = va_iter_per_poch
|
||||
|
@ -106,26 +107,26 @@ class QuaNetTrainer(BaseQuantifier):
|
|||
self.checkpoint = os.path.join(checkpointdir, checkpointname)
|
||||
self.device = torch.device(device)
|
||||
|
||||
self.__check_params_colision(self.quanet_params, self.learner.get_params())
|
||||
self.__check_params_colision(self.quanet_params, self.classifier.get_params())
|
||||
self._classes_ = None
|
||||
|
||||
def fit(self, data: LabelledCollection, fit_learner=True):
|
||||
def fit(self, data: LabelledCollection, fit_classifier=True):
|
||||
"""
|
||||
Trains QuaNet.
|
||||
|
||||
:param data: the training data on which to train QuaNet. If `fit_learner=True`, the data will be split in
|
||||
:param data: the training data on which to train QuaNet. If `fit_classifier=True`, the data will be split in
|
||||
40/40/20 for training the classifier, training QuaNet, and validating QuaNet, respectively. If
|
||||
`fit_learner=False`, the data will be split in 66/34 for training QuaNet and validating it, respectively.
|
||||
:param fit_learner: if True, trains the classifier on a split containing 40% of the data
|
||||
`fit_classifier=False`, the data will be split in 66/34 for training QuaNet and validating it, respectively.
|
||||
:param fit_classifier: if True, trains the classifier on a split containing 40% of the data
|
||||
:return: self
|
||||
"""
|
||||
self._classes_ = data.classes_
|
||||
os.makedirs(self.checkpointdir, exist_ok=True)
|
||||
|
||||
if fit_learner:
|
||||
if fit_classifier:
|
||||
classifier_data, unused_data = data.split_stratified(0.4)
|
||||
train_data, valid_data = unused_data.split_stratified(0.66) # 0.66 split of 60% makes 40% and 20%
|
||||
self.learner.fit(*classifier_data.Xy)
|
||||
self.classifier.fit(*classifier_data.Xy)
|
||||
else:
|
||||
classifier_data = None
|
||||
train_data, valid_data = data.split_stratified(0.66)
|
||||
|
@ -134,21 +135,21 @@ class QuaNetTrainer(BaseQuantifier):
|
|||
self.tr_prev = data.prevalence()
|
||||
|
||||
# compute the posterior probabilities of the instances
|
||||
valid_posteriors = self.learner.predict_proba(valid_data.instances)
|
||||
train_posteriors = self.learner.predict_proba(train_data.instances)
|
||||
valid_posteriors = self.classifier.predict_proba(valid_data.instances)
|
||||
train_posteriors = self.classifier.predict_proba(train_data.instances)
|
||||
|
||||
# turn instances' original representations into embeddings
|
||||
valid_data_embed = LabelledCollection(self.learner.transform(valid_data.instances), valid_data.labels, self._classes_)
|
||||
train_data_embed = LabelledCollection(self.learner.transform(train_data.instances), train_data.labels, self._classes_)
|
||||
valid_data_embed = LabelledCollection(self.classifier.transform(valid_data.instances), valid_data.labels, self._classes_)
|
||||
train_data_embed = LabelledCollection(self.classifier.transform(train_data.instances), train_data.labels, self._classes_)
|
||||
|
||||
self.quantifiers = {
|
||||
'cc': CC(self.learner).fit(None, fit_learner=False),
|
||||
'acc': ACC(self.learner).fit(None, fit_learner=False, val_split=valid_data),
|
||||
'pcc': PCC(self.learner).fit(None, fit_learner=False),
|
||||
'pacc': PACC(self.learner).fit(None, fit_learner=False, val_split=valid_data),
|
||||
'cc': CC(self.classifier).fit(None, fit_classifier=False),
|
||||
'acc': ACC(self.classifier).fit(None, fit_classifier=False, val_split=valid_data),
|
||||
'pcc': PCC(self.classifier).fit(None, fit_classifier=False),
|
||||
'pacc': PACC(self.classifier).fit(None, fit_classifier=False, val_split=valid_data),
|
||||
}
|
||||
if classifier_data is not None:
|
||||
self.quantifiers['emq'] = EMQ(self.learner).fit(classifier_data, fit_learner=False)
|
||||
self.quantifiers['emq'] = EMQ(self.classifier).fit(classifier_data, fit_classifier=False)
|
||||
|
||||
self.status = {
|
||||
'tr-loss': -1,
|
||||
|
@ -192,7 +193,7 @@ class QuaNetTrainer(BaseQuantifier):
|
|||
label_predictions = np.argmax(posteriors, axis=-1)
|
||||
prevs_estim = []
|
||||
for quantifier in self.quantifiers.values():
|
||||
predictions = posteriors if quantifier.probabilistic else label_predictions
|
||||
predictions = posteriors if isinstance(quantifier, AggregativeProbabilisticQuantifier) else label_predictions
|
||||
prevs_estim.extend(quantifier.aggregate(predictions))
|
||||
|
||||
# there is no real need for adding static estims like the TPR or FPR from training since those are constant
|
||||
|
@ -200,8 +201,8 @@ class QuaNetTrainer(BaseQuantifier):
|
|||
return prevs_estim
|
||||
|
||||
def quantify(self, instances):
|
||||
posteriors = self.learner.predict_proba(instances)
|
||||
embeddings = self.learner.transform(instances)
|
||||
posteriors = self.classifier.predict_proba(instances)
|
||||
embeddings = self.classifier.transform(instances)
|
||||
quant_estims = self._get_aggregative_estims(posteriors)
|
||||
self.quanet.eval()
|
||||
with torch.no_grad():
|
||||
|
@ -217,16 +218,13 @@ class QuaNetTrainer(BaseQuantifier):
|
|||
self.quanet.train(mode=train)
|
||||
losses = []
|
||||
mae_errors = []
|
||||
if train==False:
|
||||
prevpoints = F.get_nprevpoints_approximation(iterations, self.quanet.n_classes)
|
||||
iterations = F.num_prevalence_combinations(prevpoints, self.quanet.n_classes)
|
||||
with qp.util.temp_seed(0):
|
||||
sampling_index_gen = data.artificial_sampling_index_generator(self.sample_size, prevpoints)
|
||||
else:
|
||||
sampling_index_gen = [data.sampling_index(self.sample_size, *prev) for prev in
|
||||
F.uniform_simplex_sampling(data.n_classes, iterations)]
|
||||
pbar = tqdm(sampling_index_gen, total=iterations) if train else sampling_index_gen
|
||||
|
||||
sampler = UPP(
|
||||
data,
|
||||
sample_size=self.sample_size,
|
||||
repeats=iterations,
|
||||
random_state=None if train else 0 # different samples during train, same samples during validation
|
||||
)
|
||||
pbar = tqdm(sampler.samples_parameters(), total=sampler.total())
|
||||
for it, index in enumerate(pbar):
|
||||
sample_data = data.sampling_from_index(index)
|
||||
sample_posteriors = posteriors[index]
|
||||
|
@ -265,7 +263,7 @@ class QuaNetTrainer(BaseQuantifier):
|
|||
f'patience={early_stop.patience}/{early_stop.PATIENCE_LIMIT}')
|
||||
|
||||
def get_params(self, deep=True):
|
||||
return {**self.learner.get_params(), **self.quanet_params}
|
||||
return {**self.classifier.get_params(), **self.quanet_params}
|
||||
|
||||
def set_params(self, **parameters):
|
||||
learner_params = {}
|
||||
|
@ -274,7 +272,7 @@ class QuaNetTrainer(BaseQuantifier):
|
|||
self.quanet_params[key] = val
|
||||
else:
|
||||
learner_params[key] = val
|
||||
self.learner.set_params(**learner_params)
|
||||
self.classifier.set_params(**learner_params)
|
||||
|
||||
def __check_params_colision(self, quanet_params, learner_params):
|
||||
quanet_keys = set(quanet_params.keys())
|
||||
|
@ -282,7 +280,7 @@ class QuaNetTrainer(BaseQuantifier):
|
|||
intersection = quanet_keys.intersection(learner_keys)
|
||||
if len(intersection) > 0:
|
||||
raise ValueError(f'the use of parameters {intersection} is ambiguous since those can refer to '
|
||||
f'the parameters of QuaNet or the learner {self.learner.__class__.__name__}')
|
||||
f'the parameters of QuaNet or the classifier {self.classifier.__class__.__name__}')
|
||||
|
||||
def clean_checkpoint(self):
|
||||
"""
|
||||
|
|
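The `fit` docstring above describes a 40/40/20 split when `fit_classifier=True`, obtained by chaining two stratified splits (0.4 first, then 0.66 of the remaining 60%). A minimal sketch of that arithmetic follows; the helper name `chained_split_proportions` is hypothetical and is not part of QuaPy.

```python
def chained_split_proportions(first=0.4, second=0.66):
    """Overall fractions obtained by splitting off `first` of the data and then
    splitting the remainder into `second` / (1 - second)."""
    classifier = first                       # share used to train the classifier
    remainder = 1.0 - first                  # share left for QuaNet
    quanet_train = remainder * second        # share used to train QuaNet
    quanet_valid = remainder * (1.0 - second)  # share used to validate QuaNet
    return classifier, quanet_train, quanet_valid


props = chained_split_proportions()
print([round(p, 3) for p in props])
```

The resulting shares are only approximately 40/40/20, since 0.66 of 60% is 39.6% and the rest is 20.4%.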
|
@ -21,7 +21,6 @@ class MaximumLikelihoodPrevalenceEstimation(BaseQuantifier):
|
|||
:param data: the training sample
|
||||
:return: self
|
||||
"""
|
||||
self._classes_ = data.classes_
|
||||
self.estimated_prevalence = data.prevalence()
|
||||
return self
|
||||
|
||||
|
@ -34,29 +33,3 @@ class MaximumLikelihoodPrevalenceEstimation(BaseQuantifier):
|
|||
"""
|
||||
return self.estimated_prevalence
|
||||
|
||||
@property
|
||||
def classes_(self):
|
||||
"""
|
||||
Number of classes
|
||||
|
||||
:return: integer
|
||||
"""
|
||||
|
||||
return self._classes_
|
||||
|
||||
def get_params(self, deep=True):
|
||||
"""
|
||||
Does nothing, since this learner has no parameters.
|
||||
|
||||
:param deep: for compatibility with sklearn
|
||||
:return: `None`
|
||||
"""
|
||||
return None
|
||||
|
||||
def set_params(self, **parameters):
|
||||
"""
|
||||
Does nothing, since this learner has no parameters.
|
||||
|
||||
:param parameters: dictionary of param-value pairs (ignored)
|
||||
"""
|
||||
pass
|
||||
|
|
|
@ -4,14 +4,14 @@ from copy import deepcopy
|
|||
from typing import Union, Callable
|
||||
|
||||
import numpy as np
|
||||
from sklearn import clone
|
||||
|
||||
import quapy as qp
|
||||
from quapy import evaluation
|
||||
from quapy.protocol import AbstractProtocol, OnLabelledCollectionProtocol
|
||||
from quapy.data.base import LabelledCollection
|
||||
from quapy.evaluation import artificial_prevalence_prediction, natural_prevalence_prediction, gen_prevalence_prediction
|
||||
from quapy.method.aggregative import BaseQuantifier
|
||||
import inspect
|
||||
|
||||
from quapy.util import _check_sample_size
|
||||
from time import time
|
||||
|
||||
|
||||
class GridSearchQ(BaseQuantifier):
|
||||
|
@ -23,33 +23,11 @@ class GridSearchQ(BaseQuantifier):
|
|||
:param model: the quantifier to optimize
|
||||
:type model: BaseQuantifier
|
||||
:param param_grid: a dictionary with keys the parameter names and values the list of values to explore
|
||||
:param sample_size: the size of the samples to extract from the validation set (ignored if protocol='gen')
|
||||
:param protocol: either 'app' for the artificial prevalence protocol, 'npp' for the natural prevalence
|
||||
protocol, or 'gen' for using a custom sampling generator function
|
||||
:param n_prevpoints: if specified, indicates the number of equally distant points to extract from the interval
|
||||
[0,1] in order to define the prevalences of the samples; e.g., if n_prevpoints=5, then the prevalences for
|
||||
each class will be explored in [0.00, 0.25, 0.50, 0.75, 1.00]. If not specified, then eval_budget is requested.
|
||||
Ignored if protocol!='app'.
|
||||
:param n_repetitions: the number of repetitions for each combination of prevalences. This parameter is ignored
|
||||
for the protocol='app' if eval_budget is set and is lower than the number of combinations that would be
|
||||
generated using the value assigned to n_prevpoints (for the current number of classes and n_repetitions).
|
||||
Ignored for protocol='npp' and protocol='gen' (use eval_budget for setting a maximum number of samples in
|
||||
those cases).
|
||||
:param eval_budget: if specified, sets a ceil on the number of evaluations to perform for each hyper-parameter
|
||||
combination. For example, if protocol='app', there are 3 classes, n_repetitions=1 and eval_budget=20, then
|
||||
n_prevpoints will be set to 5, since this will generate 15 different prevalences, i.e., [0, 0, 1],
|
||||
[0, 0.25, 0.75], [0, 0.5, 0.5] ... [1, 0, 0], and since setting it to 6 would generate more than
|
||||
20. When protocol='gen', indicates the maximum number of samples to generate, but less samples will be
|
||||
generated if the generator yields less samples.
|
||||
:param protocol: a sample generation protocol, an instance of :class:`quapy.protocol.AbstractProtocol`
|
||||
:param error: an error function (callable) or a string indicating the name of an error function (valid ones
|
||||
are those in qp.error.QUANTIFICATION_ERROR
|
||||
are those in :class:`quapy.error.QUANTIFICATION_ERROR`)
|
||||
:param refit: whether or not to refit the model on the whole labelled collection (training+validation) with
|
||||
the best chosen hyperparameter combination. Ignored if protocol='gen'
|
||||
:param val_split: either a LabelledCollection on which to test the performance of the different settings, or
|
||||
a float in [0,1] indicating the proportion of labelled data to extract from the training set, or a callable
|
||||
returning a generator function each time it is invoked (only for protocol='gen').
|
||||
:param n_jobs: number of parallel jobs
|
||||
:param random_seed: set the seed of the random generator to replicate experiments. Ignored if protocol='gen'.
|
||||
:param timeout: establishes a timer (in seconds) for each of the hyperparameters configurations being tested.
|
||||
Whenever a run takes longer than this timer, that configuration will be ignored. If all configurations end up
|
||||
being ignored, a TimeoutError exception is raised. If -1 (default) then no time bound is set.
|
||||
|
@ -59,65 +37,27 @@ class GridSearchQ(BaseQuantifier):
|
|||
def __init__(self,
|
||||
model: BaseQuantifier,
|
||||
param_grid: dict,
|
||||
sample_size: Union[int, None] = None,
|
||||
protocol='app',
|
||||
n_prevpoints: int = None,
|
||||
n_repetitions: int = 1,
|
||||
eval_budget: int = None,
|
||||
protocol: AbstractProtocol,
|
||||
error: Union[Callable, str] = qp.error.mae,
|
||||
refit=True,
|
||||
val_split=0.4,
|
||||
n_jobs=1,
|
||||
random_seed=42,
|
||||
timeout=-1,
|
||||
n_jobs=None,
|
||||
verbose=False):
|
||||
|
||||
self.model = model
|
||||
self.param_grid = param_grid
|
||||
self.sample_size = sample_size
|
||||
self.protocol = protocol.lower()
|
||||
self.n_prevpoints = n_prevpoints
|
||||
self.n_repetitions = n_repetitions
|
||||
self.eval_budget = eval_budget
|
||||
self.protocol = protocol
|
||||
self.refit = refit
|
||||
self.val_split = val_split
|
||||
self.n_jobs = n_jobs
|
||||
self.random_seed = random_seed
|
||||
self.timeout = timeout
|
||||
self.n_jobs = qp._get_njobs(n_jobs)
|
||||
self.verbose = verbose
|
||||
self.__check_error(error)
|
||||
assert self.protocol in {'app', 'npp', 'gen'}, \
|
||||
'unknown protocol: valid ones are "app" or "npp" for the "artificial" or the "natural" prevalence ' \
|
||||
'protocols. Use protocol="gen" when passing a generator function through val_split that yields a ' \
|
||||
'sample (instances) and their prevalence (ndarray) at each iteration.'
|
||||
assert self.eval_budget is None or isinstance(self.eval_budget, int)
|
||||
if self.protocol in ['npp', 'gen']:
|
||||
if self.protocol=='npp' and (self.eval_budget is None or self.eval_budget <= 0):
|
||||
raise ValueError(f'when protocol="npp" the parameter eval_budget should be '
|
||||
f'indicated (and should be >0).')
|
||||
if self.n_repetitions != 1:
|
||||
print('[warning] n_repetitions has been set and will be ignored for the selected protocol')
|
||||
assert isinstance(protocol, AbstractProtocol), 'unknown protocol'
|
||||
|
||||
def _sout(self, msg):
|
||||
if self.verbose:
|
||||
print(f'[{self.__class__.__name__}]: {msg}')
|
||||
|
||||
def __check_training_validation(self, training, validation):
|
||||
if isinstance(validation, LabelledCollection):
|
||||
return training, validation
|
||||
elif isinstance(validation, float):
|
||||
assert 0. < validation < 1., 'validation proportion should be in (0,1)'
|
||||
training, validation = training.split_stratified(train_prop=1 - validation, random_state=self.random_seed)
|
||||
return training, validation
|
||||
elif self.protocol=='gen' and inspect.isgenerator(validation()):
|
||||
return training, validation
|
||||
else:
|
||||
raise ValueError(f'"validation" must either be a LabelledCollection or a float in (0,1) indicating the'
|
||||
f'proportion of training documents to extract (type found: {type(validation)}). '
|
||||
f'Optionally, "validation" can be a callable function returning a generator that yields '
|
||||
f'the sample instances along with their true prevalence at each iteration by '
|
||||
f'setting protocol="gen".')
|
||||
|
||||
def __check_error(self, error):
|
||||
if error in qp.error.QUANTIFICATION_ERROR:
|
||||
self.error = error
|
||||
|
@ -129,95 +69,103 @@ class GridSearchQ(BaseQuantifier):
|
|||
raise ValueError(f'unexpected error type; must either be a callable function or a str representing\n'
|
||||
f'the name of an error function in {qp.error.QUANTIFICATION_ERROR_NAMES}')
|
||||
|
||||
def __generate_predictions(self, model, val_split):
|
||||
commons = {
|
||||
'n_repetitions': self.n_repetitions,
|
||||
'n_jobs': self.n_jobs,
|
||||
'random_seed': self.random_seed,
|
||||
'verbose': False
|
||||
}
|
||||
if self.protocol == 'app':
|
||||
return artificial_prevalence_prediction(
|
||||
model, val_split, self.sample_size,
|
||||
n_prevpoints=self.n_prevpoints,
|
||||
eval_budget=self.eval_budget,
|
||||
**commons
|
||||
)
|
||||
elif self.protocol == 'npp':
|
||||
return natural_prevalence_prediction(
|
||||
model, val_split, self.sample_size,
|
||||
**commons)
|
||||
elif self.protocol == 'gen':
|
||||
return gen_prevalence_prediction(model, gen_fn=val_split, eval_budget=self.eval_budget)
|
||||
else:
|
||||
raise ValueError('unknown protocol')
|
||||
|
||||
def fit(self, training: LabelledCollection, val_split: Union[LabelledCollection, float, Callable] = None):
|
||||
def fit(self, training: LabelledCollection):
|
||||
""" Learning routine. Fits methods with all combinations of hyperparameters and selects the one minimizing
|
||||
the error metric.
|
||||
|
||||
:param training: the training set on which to optimize the hyperparameters
|
||||
:param val_split: either a LabelledCollection on which to test the performance of the different settings, or
|
||||
a float in [0,1] indicating the proportion of labelled data to extract from the training set
|
||||
:return: self
|
||||
"""
|
||||
if val_split is None:
|
||||
val_split = self.val_split
|
||||
training, val_split = self.__check_training_validation(training, val_split)
|
||||
if self.protocol != 'gen':
|
||||
self.sample_size = _check_sample_size(self.sample_size)
|
||||
|
||||
params_keys = list(self.param_grid.keys())
|
||||
params_values = list(self.param_grid.values())
|
||||
|
||||
model = self.model
|
||||
protocol = self.protocol
|
||||
|
||||
self.param_scores_ = {}
|
||||
self.best_score_ = None
|
||||
|
||||
tinit = time()
|
||||
|
||||
hyper = [dict({k: val[i] for i, k in enumerate(params_keys)}) for val in itertools.product(*params_values)]
|
||||
self._sout(f'starting model selection with {self.n_jobs =}')
|
||||
# pass a seed to parallel so it is set in child processes
|
||||
scores = qp.util.parallel(
|
||||
self._delayed_eval,
|
||||
((params, training) for params in hyper),
|
||||
seed=qp.environ.get('_R_SEED', None),
|
||||
n_jobs=self.n_jobs
|
||||
)
|
||||
|
||||
for params, score, model in scores:
|
||||
if score is not None:
|
||||
if self.best_score_ is None or score < self.best_score_:
|
||||
self.best_score_ = score
|
||||
self.best_params_ = params
|
||||
self.best_model_ = model
|
||||
self.param_scores_[str(params)] = score
|
||||
else:
|
||||
self.param_scores_[str(params)] = 'timeout'
|
||||
|
||||
tend = time()-tinit
|
||||
|
||||
if self.best_score_ is None:
|
||||
raise TimeoutError('no combination of hyperparameters seems to work')
|
||||
|
||||
self._sout(f'optimization finished: best params {self.best_params_} (score={self.best_score_:.5f}) '
|
||||
f'[took {tend:.4f}s]')
|
||||
|
||||
if self.refit:
|
||||
if isinstance(protocol, OnLabelledCollectionProtocol):
|
||||
self._sout(f'refitting on the whole development set')
|
||||
self.best_model_.fit(training + protocol.get_labelled_collection())
|
||||
else:
|
||||
raise RuntimeWarning(f'"refit" was requested, but the protocol does not '
|
||||
f'implement the {OnLabelledCollectionProtocol.__name__} interface')
|
||||
|
||||
return self
|
||||
|
||||
def _delayed_eval(self, args):
|
||||
params, training = args
|
||||
|
||||
protocol = self.protocol
|
||||
error = self.error
|
||||
|
||||
if self.timeout > 0:
|
||||
def handler(signum, frame):
|
||||
self._sout('timeout reached')
|
||||
raise TimeoutError()
|
||||
|
||||
signal.signal(signal.SIGALRM, handler)
|
||||
|
||||
self.param_scores_ = {}
|
||||
self.best_score_ = None
|
||||
some_timeouts = False
|
||||
for values in itertools.product(*params_values):
|
||||
params = dict({k: values[i] for i, k in enumerate(params_keys)})
|
||||
tinit = time()
|
||||
|
||||
if self.timeout > 0:
|
||||
signal.alarm(self.timeout)
|
||||
|
||||
try:
|
||||
model = deepcopy(self.model)
|
||||
# overrides default parameters with the parameters being explored at this iteration
|
||||
model.set_params(**params)
|
||||
model.fit(training)
|
||||
score = evaluation.evaluate(model, protocol=protocol, error_metric=error)
|
||||
|
||||
ttime = time()-tinit
|
||||
self._sout(f'hyperparams={params}\t got {error.__name__} score {score:.5f} [took {ttime:.4f}s]')
|
||||
|
||||
if self.timeout > 0:
|
||||
signal.alarm(self.timeout)
|
||||
signal.alarm(0)
|
||||
except TimeoutError:
|
||||
self._sout(f'timeout ({self.timeout}s) reached for config {params}')
|
||||
score = None
|
||||
except ValueError as e:
|
||||
self._sout(f'the combination of hyperparameters {params} is invalid')
|
||||
raise e
|
||||
except Exception as e:
|
||||
self._sout(f'something went wrong for config {params}; skipping:')
|
||||
self._sout(f'\tException: {e}')
|
||||
score = None
|
||||
|
||||
try:
|
||||
# overrides default parameters with the parameters being explored at this iteration
|
||||
model.set_params(**params)
|
||||
model.fit(training)
|
||||
true_prevalences, estim_prevalences = self.__generate_predictions(model, val_split)
|
||||
score = self.error(true_prevalences, estim_prevalences)
|
||||
return params, score, model
|
||||
|
||||
self._sout(f'checking hyperparams={params} got {self.error.__name__} score {score:.5f}')
|
||||
if self.best_score_ is None or score < self.best_score_:
|
||||
self.best_score_ = score
|
||||
self.best_params_ = params
|
||||
self.best_model_ = deepcopy(model)
|
||||
self.param_scores_[str(params)] = score
|
||||
|
||||
if self.timeout > 0:
|
||||
signal.alarm(0)
|
||||
except TimeoutError:
|
||||
print(f'timeout reached for config {params}')
|
||||
some_timeouts = True
|
||||
|
||||
if self.best_score_ is None and some_timeouts:
|
||||
raise TimeoutError('all jobs took more than the timeout time to end')
|
||||
|
||||
self._sout(f'optimization finished: best params {self.best_params_} (score={self.best_score_:.5f})')
|
||||
|
||||
if self.refit:
|
||||
self._sout(f'refitting on the whole development set')
|
||||
self.best_model_.fit(training + val_split)
|
||||
|
||||
return self
|
||||
|
||||
def quantify(self, instances):
|
||||
"""Estimate class prevalence values using the best model found after calling the :meth:`fit` method.
|
||||
|
@ -229,14 +177,6 @@ class GridSearchQ(BaseQuantifier):
|
|||
assert hasattr(self, 'best_model_'), 'quantify called before fit'
|
||||
return self.best_model().quantify(instances)
|
||||
|
||||
@property
|
||||
def classes_(self):
|
||||
"""
|
||||
Classes on which the quantifier has been trained on.
|
||||
:return: a ndarray of shape `(n_classes)` with the class identifiers
|
||||
"""
|
||||
return self.best_model().classes_
|
||||
|
||||
def set_params(self, **parameters):
|
||||
"""Sets the hyper-parameters to explore.
|
||||
|
||||
|
@ -262,3 +202,30 @@ class GridSearchQ(BaseQuantifier):
|
|||
if hasattr(self, 'best_model_'):
|
||||
return self.best_model_
|
||||
raise ValueError('best_model called before fit')
|
||||
|
||||
|
||||
|
||||
|
||||
def cross_val_predict(quantifier: BaseQuantifier, data: LabelledCollection, nfolds=3, random_state=0):
|
||||
"""
|
||||
Akin to `scikit-learn's cross_val_predict <https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.cross_val_predict.html>`_
|
||||
but for quantification.
|
||||
|
||||
:param quantifier: a quantifier issuing class prevalence values
|
||||
:param data: a labelled collection
|
||||
:param nfolds: number of folds for k-fold cross validation generation
|
||||
:param random_state: random seed for reproducibility
|
||||
:return: a vector of class prevalence values
|
||||
"""
|
||||
|
||||
total_prev = np.zeros(shape=data.n_classes)
|
||||
|
||||
for train, test in data.kFCV(nfolds=nfolds, random_state=random_state):
|
||||
quantifier.fit(train)
|
||||
fold_prev = quantifier.quantify(test.X)
|
||||
rel_size = len(test.X)/len(data)
|
||||
total_prev += fold_prev*rel_size
|
||||
|
||||
return total_prev
|
||||
|
||||
|
||||
|
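The new `GridSearchQ.fit` expands `param_grid` into one dictionary per configuration and then keeps the configuration with the lowest score, skipping those whose score is `None` (timed out or failed). A minimal standalone sketch of that logic follows; `expand_grid` and `select_best` are hypothetical helper names, and the parameter names in the example grid are illustrative only.

```python
import itertools


def expand_grid(param_grid):
    """Turn {'a': [1, 2], 'b': [3]} into [{'a': 1, 'b': 3}, {'a': 2, 'b': 3}]."""
    keys = list(param_grid.keys())
    values = list(param_grid.values())
    return [dict(zip(keys, combo)) for combo in itertools.product(*values)]


def select_best(scored):
    """Pick the (params, score) pair with the lowest score.

    `scored` is an iterable of (params, score); a score of None marks a
    configuration that timed out or failed and is ignored.
    """
    best_params, best_score = None, None
    for params, score in scored:
        if score is None:
            continue
        if best_score is None or score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


# illustrative grid; these names are assumptions, not QuaPy parameters
grid = {'C': [0.1, 1, 10], 'class_weight': [None, 'balanced']}
configs = expand_grid(grid)
best = select_best([(configs[0], 0.3), (configs[1], None), (configs[2], 0.1)])
print(len(configs), best)
```

If every score is `None`, `select_best` returns `(None, None)`, which corresponds to the case where the commit raises `TimeoutError`.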
|
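`cross_val_predict` above combines the per-fold prevalence estimates by weighting each one by the relative size of its test fold. A minimal NumPy sketch of that aggregation, with made-up fold estimates and sizes, follows; `aggregate_fold_prevalences` is a hypothetical name used only for illustration.

```python
import numpy as np


def aggregate_fold_prevalences(fold_prevs, fold_sizes):
    """Weighted average of per-fold prevalence vectors, with weights
    proportional to the number of test instances in each fold."""
    fold_prevs = np.asarray(fold_prevs, dtype=float)   # shape (n_folds, n_classes)
    fold_sizes = np.asarray(fold_sizes, dtype=float)   # shape (n_folds,)
    weights = fold_sizes / fold_sizes.sum()
    return weights @ fold_prevs


# made-up values for illustration
fold_prevs = [[0.2, 0.8], [0.4, 0.6], [0.3, 0.7]]
fold_sizes = [100, 100, 200]
total_prev = aggregate_fold_prevalences(fold_prevs, fold_sizes)
print(total_prev)
```

Because the weights sum to one and each fold estimate is a valid prevalence vector, the aggregate also sums to one.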
|
@ -4,6 +4,8 @@ from matplotlib.cm import get_cmap
|
|||
import numpy as np
|
||||
from matplotlib import cm
|
||||
from scipy.stats import ttest_ind_from_stats
|
||||
from matplotlib.ticker import ScalarFormatter
|
||||
import math
|
||||
|
||||
import quapy as qp
|
||||
|
||||
|
@ -49,9 +51,10 @@ def binary_diagonal(method_names, true_prevs, estim_prevs, pos_class=1, title=No
|
|||
table = {method_name:[true_prev, estim_prev] for method_name, true_prev, estim_prev in order}
|
||||
order = [(method_name, *table[method_name]) for method_name in method_order]
|
||||
|
||||
cm = plt.get_cmap('tab10')
|
||||
NUM_COLORS = len(method_names)
|
||||
# ax.set_prop_cycle(color=[cm(1. * i / NUM_COLORS) for i in range(NUM_COLORS)])
|
||||
if NUM_COLORS>10:
|
||||
cm = plt.get_cmap('tab20')
|
||||
ax.set_prop_cycle(color=[cm(1. * i / NUM_COLORS) for i in range(NUM_COLORS)])
|
||||
for method, true_prev, estim_prev in order:
|
||||
true_prev = true_prev[:,pos_class]
|
||||
estim_prev = estim_prev[:,pos_class]
|
||||
|
@ -74,10 +77,9 @@ def binary_diagonal(method_names, true_prevs, estim_prevs, pos_class=1, title=No
|
|||
ax.set_xlim(0, 1)
|
||||
|
||||
if legend:
|
||||
box = ax.get_position()
|
||||
ax.set_position([box.x0, box.y0, box.width * 0.8, box.height])
|
||||
ax.legend(loc='center left', bbox_to_anchor=(1, 0.5))
|
||||
ax.set_position([box.x0, box.y0, box.width * 0.8, box.height])
|
||||
# box = ax.get_position()
|
||||
# ax.set_position([box.x0, box.y0, box.width * 0.8, box.height])
|
||||
# ax.legend(loc='lower center',
|
||||
# bbox_to_anchor=(1, -0.5),
|
||||
# ncol=(len(method_names)+1)//2)
|
||||
|
@ -212,6 +214,7 @@ def binary_bias_bins(method_names, true_prevs, estim_prevs, pos_class=1, title=N
|
|||
def error_by_drift(method_names, true_prevs, estim_prevs, tr_prevs,
|
||||
n_bins=20, error_name='ae', show_std=False,
|
||||
show_density=True,
|
||||
show_legend=True,
|
||||
logscale=False,
|
||||
title=f'Quantification error as a function of distribution shift',
|
||||
vlines=None,
|
||||
|
@ -234,6 +237,7 @@ def error_by_drift(method_names, true_prevs, estim_prevs, tr_prevs,
|
|||
:param error_name: a string representing the name of an error function (as defined in `quapy.error`, default is "ae")
|
||||
:param show_std: whether or not to show standard deviations as color bands (default is False)
|
||||
:param show_density: whether or not to display the distribution of experiments for each bin (default is True)
|
||||
:param show_legend: whether or not to display the legend of the chart (default is True)
|
||||
:param logscale: whether or not to log-scale the y-error measure (default is False)
|
||||
:param title: title of the plot (default is "Quantification error as a function of distribution shift")
|
||||
:param vlines: array-like list of values (default is None). If indicated, highlights some regions of the space
|
||||
|
@ -254,6 +258,9 @@ def error_by_drift(method_names, true_prevs, estim_prevs, tr_prevs,
|
|||
# x_error function) and 'y' is the estim-test shift (computed according to y_error)
|
||||
data = _join_data_by_drift(method_names, true_prevs, estim_prevs, tr_prevs, x_error, y_error, method_order)
|
||||
|
||||
if method_order is None:
|
||||
method_order = method_names
|
||||
|
||||
_set_colors(ax, n_methods=len(method_order))
|
||||
|
||||
bins = np.linspace(0, 1, n_bins+1)
|
||||
|
@ -264,7 +271,10 @@ def error_by_drift(method_names, true_prevs, estim_prevs, tr_prevs,
|
|||
tr_test_drifts = data[method]['x']
|
||||
method_drifts = data[method]['y']
|
||||
if logscale:
|
||||
method_drifts=np.log(1+method_drifts)
|
||||
ax.set_yscale("log")
|
||||
ax.yaxis.set_major_formatter(ScalarFormatter())
|
||||
ax.yaxis.get_major_formatter().set_scientific(False)
|
||||
ax.minorticks_off()
|
||||
|
||||
inds = np.digitize(tr_test_drifts, bins, right=True)
|
||||
|
||||
|
@ -295,8 +305,13 @@ def error_by_drift(method_names, true_prevs, estim_prevs, tr_prevs,
|
|||
ax.fill_between(xs, ys-ystds, ys+ystds, alpha=0.25)
|
||||
|
||||
if show_density:
|
||||
ax.bar([ind * binwidth-binwidth/2 for ind in range(len(bins))],
|
||||
max_y*npoints/np.max(npoints), alpha=0.15, color='g', width=binwidth, label='density')
|
||||
ax2 = ax.twinx()
|
||||
densities = npoints/np.sum(npoints)
|
||||
ax2.bar([ind * binwidth-binwidth/2 for ind in range(len(bins))],
|
||||
densities, alpha=0.15, color='g', width=binwidth, label='density')
|
||||
ax2.set_ylim(0,max(densities))
|
||||
ax2.spines['right'].set_color('g')
|
||||
ax2.tick_params(axis='y', colors='g')
|
||||
|
||||
ax.set(xlabel=f'Distribution shift between training set and test sample',
|
||||
ylabel=f'{error_name.upper()} (true distribution, predicted distribution)',
|
||||
|
@ -306,8 +321,17 @@ def error_by_drift(method_names, true_prevs, estim_prevs, tr_prevs,
|
|||
if vlines:
|
||||
for vline in vlines:
|
||||
ax.axvline(vline, 0, 1, linestyle='--', color='k')
|
||||
ax.set_xlim(0, max_x)
|
||||
ax.legend(loc='center left', bbox_to_anchor=(1, 0.5))
|
||||
|
||||
ax.set_xlim(min_x, max_x)
|
||||
if logscale:
|
||||
# nice scale for the logarithmic axis
|
||||
ax.set_ylim(0,10 ** math.ceil(math.log10(max_y)))
|
||||
|
||||
|
||||
if show_legend:
|
||||
fig.legend(loc='lower center',
|
||||
bbox_to_anchor=(1, 0.5),
|
||||
ncol=(len(method_names)+1)//2)
|
||||
|
||||
_save_or_show(savepath)
|
||||
|
||||
|
@@ -370,7 +394,7 @@ def brokenbar_supremacy_by_drift(method_names, true_prevs, estim_prevs, tr_prevs
|
|||
bins[-1] += 0.001
|
||||
|
||||
# we use this to keep track of how many datapoints contribute to each bin
|
||||
inds_histogram_global = np.zeros(n_bins, dtype=np.float)
|
||||
inds_histogram_global = np.zeros(n_bins, dtype=float)
|
||||
n_methods = len(method_order)
|
||||
buckets = np.zeros(shape=(n_methods, n_bins, 3))
|
||||
for i, method in enumerate(method_order):
|
||||
|
|
|
@@ -0,0 +1,490 @@
|
|||
from copy import deepcopy
|
||||
import quapy as qp
|
||||
import numpy as np
|
||||
import itertools
|
||||
from contextlib import ExitStack
|
||||
from abc import ABCMeta, abstractmethod
|
||||
from quapy.data import LabelledCollection
|
||||
import quapy.functional as F
|
||||
from os.path import exists
|
||||
from glob import glob
|
||||
|
||||
|
||||
class AbstractProtocol(metaclass=ABCMeta):
|
||||
"""
|
||||
Abstract parent class for sample generation protocols.
|
||||
"""
|
||||
|
||||
@abstractmethod
|
||||
def __call__(self):
|
||||
"""
|
||||
Implements the protocol. Yields one sample at a time along with its prevalence
|
||||
|
||||
:return: yields a tuple `(sample, prev)` at a time, where `sample` is a set of instances
|
||||
and in which `prev` is an `np.ndarray` with the class prevalence values
|
||||
"""
|
||||
...
|
||||
|
||||
def total(self):
|
||||
"""
|
||||
Indicates the total number of samples that the protocol generates.
|
||||
|
||||
:return: The number of samples to generate if known, or `None` otherwise.
|
||||
"""
|
||||
return None
|
||||
|
||||
|
||||
class IterateProtocol(AbstractProtocol):
|
||||
"""
|
||||
A simple protocol that iterates over a list of previously generated samples
|
||||
|
||||
:param samples: a list of :class:`quapy.data.base.LabelledCollection`
|
||||
"""
|
||||
def __init__(self, samples: [LabelledCollection]):
|
||||
self.samples = samples
|
||||
|
||||
def __call__(self):
|
||||
"""
|
||||
Yields one sample from the initial list at a time
|
||||
|
||||
:return: yields a tuple `(sample, prev)` at a time, where `sample` is a set of instances
|
||||
and in which `prev` is an `np.ndarray` with the class prevalence values
|
||||
"""
|
||||
for sample in self.samples:
|
||||
yield sample.Xp
|
||||
|
||||
def total(self):
|
||||
"""
|
||||
Returns the number of samples in this protocol
|
||||
|
||||
:return: int
|
||||
"""
|
||||
return len(self.samples)
|
||||
|
||||
|
||||
class AbstractStochasticSeededProtocol(AbstractProtocol):
|
||||
"""
|
||||
An `AbstractStochasticSeededProtocol` is a protocol that generates, via any random procedure (e.g.,
|
||||
via random sampling), sequences of :class:`quapy.data.base.LabelledCollection` samples.
|
||||
This abstraction requires the object to be instantiated with a seed, so that the sequence can be fully replicated.
|
||||
In order to make this functionality possible, the classes extending this abstraction need to
|
||||
implement only two functions, :meth:`samples_parameters` which generates all the parameters
|
||||
needed for extracting the samples, and :meth:`sample` that, given some parameters as input,
|
||||
deterministically generates a sample.
|
||||
|
||||
:param random_state: the seed that allows any sequence of samples to be replicated. Default is 0, meaning that
|
||||
the sequence will be consistent every time the protocol is called.
|
||||
"""
|
||||
|
||||
_random_state = -1 # means "not set"
|
||||
|
||||
def __init__(self, random_state=0):
|
||||
self.random_state = random_state
|
||||
|
||||
@property
|
||||
def random_state(self):
|
||||
return self._random_state
|
||||
|
||||
@random_state.setter
|
||||
def random_state(self, random_state):
|
||||
self._random_state = random_state
|
||||
|
||||
@abstractmethod
|
||||
def samples_parameters(self):
|
||||
"""
|
||||
This function has to return all the necessary parameters to replicate the samples
|
||||
|
||||
:return: a list of parameters, each of which serves to deterministically generate a sample
|
||||
"""
|
||||
...
|
||||
|
||||
@abstractmethod
|
||||
def sample(self, params):
|
||||
"""
|
||||
Extract one sample determined by the given parameters
|
||||
|
||||
:param params: all the necessary parameters to generate a sample
|
||||
:return: one sample (the same sample has to be generated for the same parameters)
|
||||
"""
|
||||
...
|
||||
|
||||
def __call__(self):
|
||||
"""
|
||||
Yields one sample at a time. The type of object returned depends on the `collator` function. The
|
||||
default behaviour returns tuples of the form `(sample, prevalence)`.
|
||||
|
||||
:return: a tuple `(sample, prevalence)` if return_type='sample_prev', or an instance of
|
||||
:class:`qp.data.LabelledCollection` if return_type='labelled_collection'
|
||||
"""
|
||||
with ExitStack() as stack:
|
||||
if self.random_state == -1:
|
||||
raise ValueError('The random seed has never been initialized. '
|
||||
'Set it to None not to impose replicability.')
|
||||
if self.random_state is not None:
|
||||
stack.enter_context(qp.util.temp_seed(self.random_state))
|
||||
for params in self.samples_parameters():
|
||||
yield self.collator(self.sample(params))
|
||||
|
||||
def collator(self, sample, *args):
|
||||
"""
|
||||
The collator prepares the sample to accommodate the desired output format before returning the output.
|
||||
This collator simply returns the sample as it is. Classes inheriting from this abstract class can
|
||||
implement their custom collators.
|
||||
|
||||
:param sample: the sample to be returned
|
||||
:param args: additional arguments
|
||||
:return: the sample adhering to a desired output format (in this case, the sample is returned as it is)
|
||||
"""
|
||||
return sample
|
||||
|
||||
|
||||
class OnLabelledCollectionProtocol:
|
||||
"""
|
||||
Protocols that generate samples from a :class:`qp.data.LabelledCollection` object.
|
||||
"""
|
||||
|
||||
RETURN_TYPES = ['sample_prev', 'labelled_collection', 'index']
|
||||
|
||||
def get_labelled_collection(self):
|
||||
"""
|
||||
Returns the labelled collection on which this protocol acts.
|
||||
|
||||
:return: an object of type :class:`qp.data.LabelledCollection`
|
||||
"""
|
||||
return self.data
|
||||
|
||||
def on_preclassified_instances(self, pre_classifications, in_place=False):
|
||||
"""
|
||||
Returns a copy of this protocol that acts on a modified version of the original
|
||||
:class:`qp.data.LabelledCollection` in which the original instances have been replaced
|
||||
with the outputs of a classifier for each instance. (This is convenient for speeding up
|
||||
the evaluation procedures for many samples, by pre-classifying the instances in advance.)
|
||||
|
||||
:param pre_classifications: the predictions issued by a classifier, typically an array-like
|
||||
with shape `(n_instances,)` when the classifier is a hard one, or with shape
|
||||
`(n_instances, n_classes)` when the classifier is a probabilistic one.
|
||||
:param in_place: whether to apply the modification in-place or on a new copy (default).
|
||||
:return: this protocol if `in_place=True`, or a modified copy of it otherwise
|
||||
"""
|
||||
assert len(pre_classifications) == len(self.data), \
|
||||
f'error: the pre-classified data has different shape ' \
|
||||
f'(expected {len(self.data)}, found {len(pre_classifications)})'
|
||||
if in_place:
|
||||
self.data.instances = pre_classifications
|
||||
return self
|
||||
else:
|
||||
new = deepcopy(self)
|
||||
return new.on_preclassified_instances(pre_classifications, in_place=True)
|
||||
|
||||
@classmethod
|
||||
def get_collator(cls, return_type='sample_prev'):
|
||||
"""
|
||||
Returns a collator function, i.e., a function that prepares the yielded data
|
||||
|
||||
:param return_type: either 'sample_prev' (default) if the collator is requested to yield tuples of
|
||||
`(sample, prevalence)`, or 'labelled_collection' when it is requested to yield instances of
|
||||
:class:`qp.data.LabelledCollection`
|
||||
:return: the collator function (a callable function that takes as input an instance of
|
||||
:class:`qp.data.LabelledCollection`)
|
||||
"""
|
||||
assert return_type in cls.RETURN_TYPES, \
|
||||
f'unknown return type passed as argument; valid ones are {cls.RETURN_TYPES}'
|
||||
if return_type=='sample_prev':
|
||||
return lambda lc:lc.Xp
|
||||
elif return_type=='labelled_collection':
|
||||
return lambda lc:lc
|
||||
|
||||
|
||||
class APP(AbstractStochasticSeededProtocol, OnLabelledCollectionProtocol):
|
||||
"""
|
||||
Implementation of the artificial prevalence protocol (APP).
|
||||
The APP consists of exploring a grid of prevalence values containing `n_prevalences` points (e.g.,
|
||||
[0, 0.05, 0.1, 0.15, ..., 1], if `n_prevalences=21`), and generating all valid combinations of
|
||||
prevalence values for all classes (e.g., for 3 classes, samples with [0, 0, 1], [0, 0.05, 0.95], ...,
|
||||
[1, 0, 0] prevalence values of size `sample_size` will be yielded). The number of samples for each valid
|
||||
combination of prevalence values is indicated by `repeats`.
|
||||
|
||||
:param data: a `LabelledCollection` from which the samples will be drawn
|
||||
:param sample_size: integer, number of instances in each sample; if None (default) then it is taken from
|
||||
qp.environ["SAMPLE_SIZE"]. If this is not set, a ValueError exception is raised.
|
||||
:param n_prevalences: the number of equidistant prevalence points to extract from the [0,1] interval for the
|
||||
grid (default is 21)
|
||||
:param repeats: number of copies for each valid prevalence vector (default is 10)
|
||||
:param smooth_limits_epsilon: the quantity to add to the lower limit 0 and to subtract from the upper limit 1
|
||||
:param random_state: allows replicating samples across runs (default 0, meaning that the sequence of samples
|
||||
will be the same every time the protocol is called)
|
||||
:param return_type: set to "sample_prev" (default) to get the pairs of (sample, prevalence) at each iteration, or
|
||||
to "labelled_collection" to get instead instances of LabelledCollection
|
||||
"""
|
||||
|
||||
def __init__(self, data:LabelledCollection, sample_size=None, n_prevalences=21, repeats=10,
|
||||
smooth_limits_epsilon=0, random_state=0, return_type='sample_prev'):
|
||||
super(APP, self).__init__(random_state)
|
||||
self.data = data
|
||||
self.sample_size = qp._get_sample_size(sample_size)
|
||||
self.n_prevalences = n_prevalences
|
||||
self.repeats = repeats
|
||||
self.smooth_limits_epsilon = smooth_limits_epsilon
|
||||
self.collator = OnLabelledCollectionProtocol.get_collator(return_type)
|
||||
|
||||
def prevalence_grid(self):
|
||||
"""
|
||||
Generates vectors of prevalence values from an exhaustive grid of prevalence values. The
|
||||
number of prevalence values explored for each dimension depends on `n_prevalences`, so that, if, for example,
|
||||
`n_prevalences=11` then the prevalence values of the grid are taken from [0, 0.1, 0.2, ..., 0.9, 1]. Only
|
||||
valid prevalence distributions are returned, i.e., vectors of prevalence values that sum up to 1. For each
|
||||
valid vector of prevalence values, `repeats` copies are returned. The vectors are returned in implicit form,
meaning that the last dimension (which is constrained to 1 - sum of the rest) is not included (note that, in
this form, the vectors do not necessarily sum up to 1). Note that this method is deterministic, i.e., there is
no random sampling anywhere.
|
||||
|
||||
:return: a `np.ndarray` of shape `(n, dimensions-1)`, where `n` is the number of valid combinations found
in the grid multiplied by `repeats`
|
||||
"""
|
||||
dimensions = self.data.n_classes
|
||||
s = F.prevalence_linspace(self.n_prevalences, repeats=1, smooth_limits_epsilon=self.smooth_limits_epsilon)
|
||||
s = [s] * (dimensions - 1)
|
||||
prevs = [p for p in itertools.product(*s, repeat=1) if (sum(p) <= 1.0)]
|
||||
prevs = np.asarray(prevs).reshape(len(prevs), -1)
|
||||
if self.repeats > 1:
|
||||
prevs = np.repeat(prevs, self.repeats, axis=0)
|
||||
return prevs
|
||||
|
||||
def samples_parameters(self):
|
||||
"""
|
||||
Return all the necessary parameters to replicate the samples according to the APP protocol.
|
||||
|
||||
:return: a list of indexes that realize the APP sampling
|
||||
"""
|
||||
indexes = []
|
||||
for prevs in self.prevalence_grid():
|
||||
index = self.data.sampling_index(self.sample_size, *prevs)
|
||||
indexes.append(index)
|
||||
return indexes
|
||||
|
||||
def sample(self, index):
|
||||
"""
|
||||
Realizes the sample given the index of the instances.
|
||||
|
||||
:param index: indexes of the instances to select
|
||||
:return: an instance of :class:`qp.data.LabelledCollection`
|
||||
"""
|
||||
return self.data.sampling_from_index(index)
|
||||
|
||||
def total(self):
|
||||
"""
|
||||
Returns the number of samples that will be generated
|
||||
|
||||
:return: int
|
||||
"""
|
||||
return F.num_prevalence_combinations(self.n_prevalences, self.data.n_classes, self.repeats)
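The grid construction described in `prevalence_grid` can be illustrated without QuaPy. The following is a minimal sketch, assuming only NumPy and `itertools`; `app_grid` is a hypothetical helper written for this illustration (it is not part of the commit), and the small floating-point tolerance in the filter is an addition of the sketch rather than something the diff does.

```python
import itertools
import numpy as np


def app_grid(n_prevalences, n_classes, repeats=1):
    """Hypothetical re-implementation of an APP-style grid (implicit form)."""
    # candidate prevalence values for each of the first n_classes-1 classes
    points = np.linspace(0, 1, n_prevalences)
    combos = itertools.product(points, repeat=n_classes - 1)
    # keep only combinations that leave a non-negative share for the last class;
    # the tolerance guards against floating-point sums such as 0.1 + 0.2 + 0.7
    valid = [p for p in combos if sum(p) <= 1.0 + 1e-9]
    grid = np.asarray(valid).reshape(len(valid), -1)
    # every valid vector is repeated `repeats` times, one row per sample to draw
    return np.repeat(grid, repeats, axis=0)


grid = app_grid(n_prevalences=11, n_classes=3)
# recover the explicit form by appending the constrained last dimension
full = np.hstack([grid, 1 - grid.sum(axis=1, keepdims=True)])
```

For 3 classes and 11 grid points this yields 66 prevalence vectors, which shows how quickly the grid grows with the number of classes and why a sampling-based alternative such as UPP becomes attractive.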
|
||||
|
||||
|
||||
class NPP(AbstractStochasticSeededProtocol, OnLabelledCollectionProtocol):
|
||||
"""
|
||||
A generator of samples that implements the natural prevalence protocol (NPP). The NPP consists of drawing
|
||||
samples uniformly at random, therefore approximately preserving the natural prevalence of the collection.
|
||||
|
||||
:param data: a `LabelledCollection` from which the samples will be drawn
|
||||
:param sample_size: integer, the number of instances in each sample; if None (default) then it is taken from
|
||||
qp.environ["SAMPLE_SIZE"]. If this is not set, a ValueError exception is raised.
|
||||
:param repeats: the number of samples to generate. Default is 100.
|
||||
:param random_state: allows replicating samples across runs (default 0, meaning that the sequence of samples
|
||||
will be the same every time the protocol is called)
|
||||
:param return_type: set to "sample_prev" (default) to get the pairs of (sample, prevalence) at each iteration, or
|
||||
to "labelled_collection" to get instead instances of LabelledCollection
|
||||
"""
|
||||
|
||||
def __init__(self, data:LabelledCollection, sample_size=None, repeats=100, random_state=0,
|
||||
return_type='sample_prev'):
|
||||
super(NPP, self).__init__(random_state)
|
||||
self.data = data
|
||||
self.sample_size = qp._get_sample_size(sample_size)
|
||||
self.repeats = repeats
|
||||
self.random_state = random_state
|
||||
self.collator = OnLabelledCollectionProtocol.get_collator(return_type)
|
||||
|
||||
def samples_parameters(self):
|
||||
"""
|
||||
Return all the necessary parameters to replicate the samples according to the NPP protocol.
|
||||
|
||||
:return: a list of indexes that realize the NPP sampling
|
||||
"""
|
||||
indexes = []
|
||||
for _ in range(self.repeats):
|
||||
index = self.data.uniform_sampling_index(self.sample_size)
|
||||
indexes.append(index)
|
||||
return indexes
|
||||
|
||||
def sample(self, index):
|
||||
"""
|
||||
Realizes the sample given the index of the instances.
|
||||
|
||||
:param index: indexes of the instances to select
|
||||
:return: an instance of :class:`qp.data.LabelledCollection`
|
||||
"""
|
||||
return self.data.sampling_from_index(index)
|
||||
|
||||
def total(self):
|
||||
"""
|
||||
Returns the number of samples that will be generated (equal to `repeats`)
|
||||
|
||||
:return: int
|
||||
"""
|
||||
return self.repeats
|
||||
|
||||
|
||||
class UPP(AbstractStochasticSeededProtocol, OnLabelledCollectionProtocol):
|
||||
"""
|
||||
A variant of :class:`APP` that, instead of using a grid of equidistant prevalence values,
|
||||
relies on the Kraemer algorithm for sampling the unit (k-1)-simplex uniformly at random, with
|
||||
k the number of classes. This protocol covers the entire range of prevalence values in a
|
||||
statistical sense, i.e., unlike APP there is no guarantee that it is covered precisely
|
||||
equally for all classes, but it is preferred in cases in which the number of possible
|
||||
combinations of the grid values of APP makes this endeavour intractable.
|
||||
|
||||
:param data: a `LabelledCollection` from which the samples will be drawn
|
||||
:param sample_size: integer, the number of instances in each sample; if None (default) then it is taken from
|
||||
qp.environ["SAMPLE_SIZE"]. If this is not set, a ValueError exception is raised.
|
||||
:param repeats: the number of samples to generate. Default is 100.
|
||||
:param random_state: allows replicating samples across runs (default 0, meaning that the sequence of samples
|
||||
will be the same every time the protocol is called)
|
||||
:param return_type: set to "sample_prev" (default) to get the pairs of (sample, prevalence) at each iteration, or
|
||||
to "labelled_collection" to get instead instances of LabelledCollection
|
||||
"""
|
||||
|
||||
def __init__(self, data: LabelledCollection, sample_size=None, repeats=100, random_state=0,
|
||||
return_type='sample_prev'):
|
||||
super(UPP, self).__init__(random_state)
|
||||
self.data = data
|
||||
self.sample_size = qp._get_sample_size(sample_size)
|
||||
self.repeats = repeats
|
||||
self.random_state = random_state
|
||||
self.collator = OnLabelledCollectionProtocol.get_collator(return_type)
|
||||
|
||||
def samples_parameters(self):
|
||||
"""
|
||||
Return all the necessary parameters to replicate the samples according to the UPP protocol.
|
||||
|
||||
:return: a list of indexes that realize the UPP sampling
|
||||
"""
|
||||
indexes = []
|
||||
for prevs in F.uniform_simplex_sampling(n_classes=self.data.n_classes, size=self.repeats):
|
||||
index = self.data.sampling_index(self.sample_size, *prevs)
|
||||
indexes.append(index)
|
||||
return indexes
|
||||
|
||||
def sample(self, index):
|
||||
"""
|
||||
Realizes the sample given the index of the instances.
|
||||
|
||||
:param index: indexes of the instances to select
|
||||
:return: an instance of :class:`qp.data.LabelledCollection`
|
||||
"""
|
||||
return self.data.sampling_from_index(index)
|
||||
|
||||
def total(self):
|
||||
"""
|
||||
Returns the number of samples that will be generated (equal to `repeats`)
|
||||
|
||||
:return: int
|
||||
"""
|
||||
return self.repeats
|
||||
|
||||
|
||||
class DomainMixer(AbstractStochasticSeededProtocol):
|
||||
"""
|
||||
Generates mixtures of two domains (A and B) at controlled rates, but preserving the original class prevalence.
|
||||
|
||||
:param domainA: one domain, an object of :class:`qp.data.LabelledCollection`
|
||||
:param domainB: another domain, an object of :class:`qp.data.LabelledCollection`
|
||||
:param sample_size: integer, the number of instances in each sample; if None (default) then it is taken from
|
||||
qp.environ["SAMPLE_SIZE"]. If this is not set, a ValueError exception is raised.
|
||||
:param repeats: int, number of samples to draw for every mixture rate
|
||||
:param prevalence: the prevalence to preserve along the mixtures. If specified, should be an array containing
|
||||
one prevalence value (positive float) for each class and summing up to one. If not specified, the prevalence
|
||||
will be taken from the domain A (default).
|
||||
:param mixture_points: an integer indicating the number of points to take from a linear scale (e.g., 21 will
|
||||
generate the mixture points [1, 0.95, 0.9, ..., 0]), or the array of mixture values itself.
|
||||
:param random_state: allows replicating samples across runs (default 0, meaning that the sequence of samples
|
||||
will be the same every time the protocol is called)
|
||||
"""
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
domainA: LabelledCollection,
|
||||
domainB: LabelledCollection,
|
||||
sample_size,
|
||||
repeats=1,
|
||||
prevalence=None,
|
||||
mixture_points=11,
|
||||
random_state=0,
|
||||
return_type='sample_prev'):
|
||||
super(DomainMixer, self).__init__(random_state)
|
||||
self.A = domainA
|
||||
self.B = domainB
|
||||
self.sample_size = qp._get_sample_size(sample_size)
|
||||
self.repeats = repeats
|
||||
if prevalence is None:
|
||||
self.prevalence = domainA.prevalence()
|
||||
else:
|
||||
self.prevalence = np.asarray(prevalence)
|
||||
assert len(self.prevalence) == domainA.n_classes, \
|
||||
f'wrong shape for the vector prevalence (expected {domainA.n_classes})'
|
||||
assert F.check_prevalence_vector(self.prevalence), \
|
||||
f'the prevalence vector is not valid (either it contains values outside [0,1] or does not sum up to 1)'
|
||||
if isinstance(mixture_points, int):
|
||||
self.mixture_points = np.linspace(0, 1, mixture_points)[::-1]
|
||||
else:
|
||||
self.mixture_points = np.asarray(mixture_points)
|
||||
assert all(np.logical_and(self.mixture_points >= 0, self.mixture_points<=1)), \
|
||||
'mixture_points datatype not understood (expected int or a sequence of real values in [0,1])'
|
||||
self.random_state = random_state
|
||||
self.collator = OnLabelledCollectionProtocol.get_collator(return_type)
|
||||
|
||||
def samples_parameters(self):
|
||||
"""
|
||||
Return all the necessary parameters to replicate the samples according to this protocol.
|
||||
|
||||
:return: a list of zipped indexes (from A and B) that realize the sampling
|
||||
"""
|
||||
indexesA, indexesB = [], []
|
||||
for propA in self.mixture_points:
|
||||
for _ in range(self.repeats):
|
||||
nA = int(np.round(self.sample_size * propA))
|
||||
nB = self.sample_size-nA
|
||||
sampleAidx = self.A.sampling_index(nA, *self.prevalence)
|
||||
sampleBidx = self.B.sampling_index(nB, *self.prevalence)
|
||||
indexesA.append(sampleAidx)
|
||||
indexesB.append(sampleBidx)
|
||||
return list(zip(indexesA, indexesB))
|
||||
|
||||
def sample(self, indexes):
|
||||
"""
|
||||
Realizes the sample given a pair of indexes of the instances from A and B.
|
||||
|
||||
:param indexes: indexes of the instances to select from A and B
|
||||
:return: an instance of :class:`qp.data.LabelledCollection`
|
||||
"""
|
||||
indexesA, indexesB = indexes
|
||||
sampleA = self.A.sampling_from_index(indexesA)
|
||||
sampleB = self.B.sampling_from_index(indexesB)
|
||||
return sampleA+sampleB
|
||||
|
||||
def total(self):
|
||||
"""
|
||||
Returns the number of samples that will be generated (equal to `repeats * len(mixture_points)`)
|
||||
|
||||
:return: int
|
||||
"""
|
||||
return self.repeats * len(self.mixture_points)
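The arithmetic that `DomainMixer.samples_parameters` uses to split each sample between the two domains can be traced in isolation. This is a sketch under stated assumptions: `mixture_sizes` is a hypothetical helper that mirrors only the size computation shown in the diff (the `np.linspace(0, 1, n)[::-1]` expansion and the `nA`/`nB` split) and draws no instances.

```python
import numpy as np


def mixture_sizes(sample_size, mixture_points):
    """Hypothetical helper: sizes (nA, nB) drawn from domains A and B per mixture point."""
    if isinstance(mixture_points, int):
        # an integer is expanded to a descending linear scale from 1 to 0
        props = np.linspace(0, 1, mixture_points)[::-1]
    else:
        props = np.asarray(mixture_points)
    sizes = []
    for prop_a in props:
        n_a = int(np.round(sample_size * prop_a))  # instances taken from A
        n_b = sample_size - n_a                    # the remainder comes from B
        sizes.append((n_a, n_b))
    return sizes


sizes = mixture_sizes(sample_size=100, mixture_points=5)
```

Computing `nB` as the remainder, rather than rounding it separately, guarantees that every mixed sample contains exactly `sample_size` instances.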
|
||||
|
||||
|
||||
# aliases
|
||||
|
||||
ArtificialPrevalenceProtocol = APP
|
||||
NaturalPrevalenceProtocol = NPP
|
||||
UniformPrevalenceProtocol = UPP
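The replicability contract of `AbstractStochasticSeededProtocol` (all randomness lives in `samples_parameters`, while `sample` is deterministic) can be demonstrated with a toy class. The sketch below does not import QuaPy: `temp_seed` is a stand-in written for this example under the assumption that `qp.util.temp_seed` temporarily fixes NumPy's global seed, and `ToySeededProtocol` is a hypothetical class, not one added by this commit.

```python
from contextlib import ExitStack, contextmanager

import numpy as np


@contextmanager
def temp_seed(seed):
    # stand-in for qp.util.temp_seed (assumed behaviour): fix the global
    # NumPy seed inside the block, then restore the previous random state
    state = np.random.get_state()
    np.random.seed(seed)
    try:
        yield
    finally:
        np.random.set_state(state)


class ToySeededProtocol:
    """Hypothetical protocol that draws index samples from a data array."""

    def __init__(self, data, sample_size, repeats, random_state=0):
        self.data = np.asarray(data)
        self.sample_size = sample_size
        self.repeats = repeats
        self.random_state = random_state

    def samples_parameters(self):
        # the only place where randomness is consumed
        n = len(self.data)
        return [np.random.choice(n, self.sample_size, replace=False)
                for _ in range(self.repeats)]

    def sample(self, index):
        # deterministic: the same index always yields the same sample
        return self.data[index]

    def __call__(self):
        with ExitStack() as stack:
            if self.random_state is not None:
                stack.enter_context(temp_seed(self.random_state))
            for params in self.samples_parameters():
                yield self.sample(params)


protocol = ToySeededProtocol(np.arange(50) * 10, sample_size=5, repeats=3)
first = [s.tolist() for s in protocol()]
second = [s.tolist() for s in protocol()]
```

Because the seed is re-entered on every call, `first` and `second` contain the same sequence of samples; passing `random_state=None` skips the seeding and gives up replicability.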
|
|
@@ -1,7 +1,8 @@
|
|||
import pytest
|
||||
|
||||
from quapy.data.datasets import REVIEWS_SENTIMENT_DATASETS, TWITTER_SENTIMENT_DATASETS_TEST, \
|
||||
TWITTER_SENTIMENT_DATASETS_TRAIN, UCI_DATASETS, fetch_reviews, fetch_twitter, fetch_UCIDataset
|
||||
TWITTER_SENTIMENT_DATASETS_TRAIN, UCI_DATASETS, LEQUA2022_TASKS, \
|
||||
fetch_reviews, fetch_twitter, fetch_UCIDataset, fetch_lequa2022
|
||||
|
||||
|
||||
@pytest.mark.parametrize('dataset_name', REVIEWS_SENTIMENT_DATASETS)
|
||||
|
@@ -41,3 +42,11 @@ def test_fetch_UCIDataset(dataset_name):
|
|||
print('Training set stats')
|
||||
dataset.training.stats()
|
||||
print('Test set stats')
|
||||
|
||||
|
||||
@pytest.mark.parametrize('dataset_name', LEQUA2022_TASKS)
|
||||
def test_fetch_lequa2022(dataset_name):
|
||||
train, gen_val, gen_test = fetch_lequa2022(dataset_name)
|
||||
print(train.stats())
|
||||
print('Val:', gen_val.total())
|
||||
print('Test:', gen_test.total())
|
||||
|
|
|
@@ -0,0 +1,84 @@
|
|||
import unittest
|
||||
|
||||
import numpy as np
|
||||
|
||||
import quapy as qp
|
||||
from sklearn.linear_model import LogisticRegression
|
||||
from time import time
|
||||
|
||||
from quapy.error import QUANTIFICATION_ERROR_SINGLE, QUANTIFICATION_ERROR, QUANTIFICATION_ERROR_NAMES, \
|
||||
QUANTIFICATION_ERROR_SINGLE_NAMES
|
||||
from quapy.method.aggregative import EMQ, PCC
|
||||
from quapy.method.base import BaseQuantifier
|
||||
|
||||
|
||||
class EvalTestCase(unittest.TestCase):
|
||||
def test_eval_speedup(self):
|
||||
|
||||
data = qp.datasets.fetch_reviews('hp', tfidf=True, min_df=10, pickle=True)
|
||||
train, test = data.training, data.test
|
||||
|
||||
protocol = qp.protocol.APP(test, sample_size=1000, n_prevalences=11, repeats=1, random_state=1)
|
||||
|
||||
class SlowLR(LogisticRegression):
|
||||
def predict_proba(self, X):
|
||||
import time
|
||||
time.sleep(1)
|
||||
return super().predict_proba(X)
|
||||
|
||||
emq = EMQ(SlowLR()).fit(train)
|
||||
|
||||
tinit = time()
|
||||
score = qp.evaluation.evaluate(emq, protocol, error_metric='mae', verbose=True, aggr_speedup='force')
|
||||
tend_optim = time()-tinit
|
||||
print(f'evaluation (with optimization) took {tend_optim}s [MAE={score:.4f}]')
|
||||
|
||||
class NonAggregativeEMQ(BaseQuantifier):
|
||||
|
||||
def __init__(self, cls):
|
||||
self.emq = EMQ(cls)
|
||||
|
||||
def quantify(self, instances):
|
||||
return self.emq.quantify(instances)
|
||||
|
||||
def fit(self, data):
|
||||
self.emq.fit(data)
|
||||
return self
|
||||
|
||||
emq = NonAggregativeEMQ(SlowLR()).fit(train)
|
||||
|
||||
tinit = time()
|
||||
score = qp.evaluation.evaluate(emq, protocol, error_metric='mae', verbose=True)
|
||||
tend_no_optim = time() - tinit
|
||||
print(f'evaluation (w/o optimization) took {tend_no_optim}s [MAE={score:.4f}]')
|
||||
|
||||
self.assertEqual(tend_no_optim>(tend_optim/2), True)
|
||||
|
||||
def test_evaluation_output(self):
|
||||
|
||||
data = qp.datasets.fetch_reviews('hp', tfidf=True, min_df=10, pickle=True)
|
||||
train, test = data.training, data.test
|
||||
|
||||
qp.environ['SAMPLE_SIZE']=100
|
||||
|
||||
protocol = qp.protocol.APP(test, random_state=0)
|
||||
|
||||
q = PCC(LogisticRegression()).fit(train)
|
||||
|
||||
single_errors = list(QUANTIFICATION_ERROR_SINGLE_NAMES)
|
||||
averaged_errors = ['m'+e for e in single_errors]
|
||||
single_errors = single_errors + [qp.error.from_name(e) for e in single_errors]
|
||||
averaged_errors = averaged_errors + [qp.error.from_name(e) for e in averaged_errors]
|
||||
for error_metric, averaged_error_metric in zip(single_errors, averaged_errors):
|
||||
score = qp.evaluation.evaluate(q, protocol, error_metric=averaged_error_metric)
|
||||
self.assertTrue(isinstance(score, float))
|
||||
|
||||
scores = qp.evaluation.evaluate(q, protocol, error_metric=error_metric)
|
||||
self.assertTrue(isinstance(scores, np.ndarray))
|
||||
|
||||
self.assertEqual(scores.mean(), score)
|
||||
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
unittest.main()
|
|
@@ -0,0 +1,31 @@
|
|||
import unittest
|
||||
|
||||
from sklearn.linear_model import LogisticRegression
|
||||
|
||||
import quapy as qp
|
||||
from quapy.method.aggregative import *
|
||||
|
||||
|
||||
|
||||
class HierarchyTestCase(unittest.TestCase):
|
||||
|
||||
def test_aggregative(self):
|
||||
lr = LogisticRegression()
|
||||
for m in [CC(lr), PCC(lr), ACC(lr), PACC(lr)]:
|
||||
self.assertEqual(isinstance(m, AggregativeQuantifier), True)
|
||||
|
||||
def test_binary(self):
|
||||
lr = LogisticRegression()
|
||||
for m in [HDy(lr)]:
|
||||
self.assertEqual(isinstance(m, BinaryQuantifier), True)
|
||||
|
||||
def test_probabilistic(self):
|
||||
lr = LogisticRegression()
|
||||
for m in [CC(lr), ACC(lr)]:
|
||||
self.assertEqual(isinstance(m, AggregativeProbabilisticQuantifier), False)
|
||||
for m in [PCC(lr), PACC(lr)]:
|
||||
self.assertEqual(isinstance(m, AggregativeProbabilisticQuantifier), True)
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
unittest.main()
|
|
@@ -0,0 +1,65 @@
|
|||
import unittest
|
||||
import numpy as np
|
||||
from scipy.sparse import csr_matrix
|
||||
|
||||
import quapy as qp
|
||||
|
||||
|
||||
class LabelCollectionTestCase(unittest.TestCase):
|
||||
def test_split(self):
|
||||
x = np.arange(100)
|
||||
y = np.random.randint(0,5,100)
|
||||
data = qp.data.LabelledCollection(x,y)
|
||||
tr, te = data.split_random(0.7)
|
||||
check_prev = tr.prevalence()*0.7 + te.prevalence()*0.3
|
||||
|
||||
self.assertEqual(len(tr), 70)
|
||||
self.assertEqual(len(te), 30)
|
||||
self.assertEqual(np.allclose(check_prev, data.prevalence()), True)
|
||||
self.assertEqual(len(tr+te), len(data))
|
||||
|
||||
def test_join(self):
|
||||
x = np.arange(50)
|
||||
y = np.random.randint(2, 5, 50)
|
||||
data1 = qp.data.LabelledCollection(x, y)
|
||||
|
||||
x = np.arange(200)
|
||||
y = np.random.randint(0, 3, 200)
|
||||
data2 = qp.data.LabelledCollection(x, y)
|
||||
|
||||
x = np.arange(100)
|
||||
y = np.random.randint(0, 6, 100)
|
||||
data3 = qp.data.LabelledCollection(x, y)
|
||||
|
||||
combined = qp.data.LabelledCollection.join(data1, data2, data3)
|
||||
self.assertEqual(len(combined), len(data1)+len(data2)+len(data3))
|
||||
self.assertEqual(all(combined.classes_ == np.arange(6)), True)
|
||||
|
||||
x = np.random.rand(10, 3)
|
||||
y = np.random.randint(0, 1, 10)
|
||||
data4 = qp.data.LabelledCollection(x, y)
|
||||
with self.assertRaises(Exception):
|
||||
combined = qp.data.LabelledCollection.join(data1, data2, data3, data4)
|
||||
|
||||
x = np.random.rand(20, 3)
|
||||
y = np.random.randint(0, 1, 20)
|
||||
data5 = qp.data.LabelledCollection(x, y)
|
||||
combined = qp.data.LabelledCollection.join(data4, data5)
|
||||
self.assertEqual(len(combined), len(data4)+len(data5))
|
||||
|
||||
x = np.random.rand(10, 4)
|
||||
y = np.random.randint(0, 1, 10)
|
||||
data6 = qp.data.LabelledCollection(x, y)
|
||||
with self.assertRaises(Exception):
|
||||
combined = qp.data.LabelledCollection.join(data4, data5, data6)
|
||||
|
||||
data4.instances = csr_matrix(data4.instances)
|
||||
with self.assertRaises(Exception):
|
||||
combined = qp.data.LabelledCollection.join(data4, data5)
|
||||
data5.instances = csr_matrix(data5.instances)
|
||||
combined = qp.data.LabelledCollection.join(data4, data5)
|
||||
self.assertEqual(len(combined), len(data4) + len(data5))
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
unittest.main()
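The check performed in `test_split` relies on a simple identity: the prevalence of the whole collection is the size-weighted average of the prevalences of its parts. The sketch below verifies this with plain NumPy; the `prevalence` helper is hypothetical and only mimics what `LabelledCollection.prevalence()` is assumed to compute (relative class frequencies).

```python
import numpy as np


def prevalence(labels, n_classes):
    # relative frequency of each class (hypothetical stand-in helper)
    return np.bincount(labels, minlength=n_classes) / len(labels)


rng = np.random.default_rng(0)
y = rng.integers(0, 5, 100)   # 100 labels over 5 classes
perm = rng.permutation(100)   # random 70/30 split of the indexes
tr, te = y[perm[:70]], y[perm[70:]]

# weighting each part by its share of the data recovers the global prevalence
check_prev = prevalence(tr, 5) * 0.7 + prevalence(te, 5) * 0.3
```

The identity holds for any split, stratified or not, because each term reduces to the class counts of that part divided by the total number of instances.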
|
|
@@ -4,24 +4,28 @@ from sklearn.linear_model import LogisticRegression
|
|||
from sklearn.svm import LinearSVC
|
||||
|
||||
import quapy as qp
|
||||
from quapy.method.base import BinaryQuantifier
|
||||
from quapy.data import Dataset, LabelledCollection
|
||||
from quapy.method import AGGREGATIVE_METHODS, NON_AGGREGATIVE_METHODS, EXPLICIT_LOSS_MINIMIZATION_METHODS
|
||||
from quapy.method import AGGREGATIVE_METHODS, NON_AGGREGATIVE_METHODS
|
||||
from quapy.method.aggregative import ACC, PACC, HDy
|
||||
from quapy.method.meta import Ensemble
|
||||
|
||||
datasets = [pytest.param(qp.datasets.fetch_twitter('hcr'), id='hcr'),
|
||||
datasets = [pytest.param(qp.datasets.fetch_twitter('hcr', pickle=True), id='hcr'),
|
||||
pytest.param(qp.datasets.fetch_UCIDataset('ionosphere'), id='ionosphere')]
|
||||
|
||||
tinydatasets = [pytest.param(qp.datasets.fetch_twitter('hcr', pickle=True).reduce(), id='tiny_hcr'),
|
||||
pytest.param(qp.datasets.fetch_UCIDataset('ionosphere').reduce(), id='tiny_ionosphere')]
|
||||
|
||||
learners = [LogisticRegression, LinearSVC]
|
||||
|
||||
|
||||
@pytest.mark.parametrize('dataset', datasets)
|
||||
@pytest.mark.parametrize('aggregative_method', AGGREGATIVE_METHODS.difference(EXPLICIT_LOSS_MINIMIZATION_METHODS))
|
||||
@pytest.mark.parametrize('aggregative_method', AGGREGATIVE_METHODS)
|
||||
@pytest.mark.parametrize('learner', learners)
|
||||
def test_aggregative_methods(dataset: Dataset, aggregative_method, learner):
|
||||
model = aggregative_method(learner())
|
||||
|
||||
if model.binary and not dataset.binary:
|
||||
if isinstance(model, BinaryQuantifier) and not dataset.binary:
|
||||
print(f'skipping the test of binary model {type(model)} on non-binary dataset {dataset}')
|
||||
return
|
||||
|
||||
|
@@ -35,36 +39,12 @@ def test_aggregative_methods(dataset: Dataset, aggregative_method, learner):
|
|||
assert type(error) == numpy.float64
|
||||
|
||||
|
||||
@pytest.mark.parametrize('dataset', datasets)
|
||||
@pytest.mark.parametrize('elm_method', EXPLICIT_LOSS_MINIMIZATION_METHODS)
|
||||
def test_elm_methods(dataset: Dataset, elm_method):
|
||||
try:
|
||||
model = elm_method()
|
||||
except AssertionError as ae:
|
||||
if ae.args[0].find('does not seem to point to a valid path') > 0:
|
||||
print('Missing SVMperf binary program, skipping test')
|
||||
return
|
||||
|
||||
if model.binary and not dataset.binary:
|
||||
print(f'skipping the test of binary model {model} on non-binary dataset {dataset}')
|
||||
return
|
||||
|
||||
model.fit(dataset.training)
|
||||
|
||||
estim_prevalences = model.quantify(dataset.test.instances)
|
||||
|
||||
true_prevalences = dataset.test.prevalence()
|
||||
error = qp.error.mae(true_prevalences, estim_prevalences)
|
||||
|
||||
assert type(error) == numpy.float64
|
||||
|
||||
|
||||
@pytest.mark.parametrize('dataset', datasets)
|
||||
@pytest.mark.parametrize('non_aggregative_method', NON_AGGREGATIVE_METHODS)
|
||||
def test_non_aggregative_methods(dataset: Dataset, non_aggregative_method):
|
||||
model = non_aggregative_method()
|
||||
|
||||
if model.binary and not dataset.binary:
|
||||
if isinstance(model, BinaryQuantifier) and not dataset.binary:
|
||||
print(f'skipping the test of binary model {model} on non-binary dataset {dataset}')
|
||||
return
|
||||
|
||||
|
@ -78,16 +58,20 @@ def test_non_aggregative_methods(dataset: Dataset, non_aggregative_method):
|
|||
assert type(error) == numpy.float64
|
||||
|
||||
|
||||
@pytest.mark.parametrize('base_method', AGGREGATIVE_METHODS.difference(EXPLICIT_LOSS_MINIMIZATION_METHODS))
|
||||
@pytest.mark.parametrize('learner', learners)
|
||||
@pytest.mark.parametrize('dataset', datasets)
|
||||
@pytest.mark.parametrize('base_method', AGGREGATIVE_METHODS)
|
||||
@pytest.mark.parametrize('learner', [LogisticRegression])
|
||||
@pytest.mark.parametrize('dataset', tinydatasets)
|
||||
@pytest.mark.parametrize('policy', Ensemble.VALID_POLICIES)
|
||||
def test_ensemble_method(base_method, learner, dataset: Dataset, policy):
|
||||
qp.environ['SAMPLE_SIZE'] = len(dataset.training)
|
||||
model = Ensemble(quantifier=base_method(learner()), size=5, policy=policy, n_jobs=-1)
|
||||
if model.binary and not dataset.binary:
|
||||
print(f'skipping the test of binary model {model} on non-binary dataset {dataset}')
|
||||
qp.environ['SAMPLE_SIZE'] = 20
|
||||
base_quantifier=base_method(learner())
|
||||
if isinstance(base_quantifier, BinaryQuantifier) and not dataset.binary:
|
||||
print(f'skipping the test of binary model {base_quantifier} on non-binary dataset {dataset}')
|
||||
return
|
||||
if not dataset.binary and policy=='ds':
|
||||
print(f'skipping the test of binary policy ds on non-binary dataset {dataset}')
|
||||
return
|
||||
model = Ensemble(quantifier=base_quantifier, size=5, policy=policy, n_jobs=-1)
|
||||
|
||||
model.fit(dataset.training)
|
||||
|
||||
|
@ -106,21 +90,25 @@ def test_quanet_method():
|
|||
print('skipping QuaNet test due to missing torch package')
|
||||
return
|
||||
|
||||
|
||||
qp.environ['SAMPLE_SIZE'] = 100
|
||||
|
||||
# load the kindle dataset as text, and convert words to numerical indexes
|
||||
dataset = qp.datasets.fetch_reviews('kindle', pickle=True)
|
||||
dataset = Dataset(dataset.training.sampling(100, *dataset.training.prevalence()),
|
||||
dataset.test.sampling(100, *dataset.test.prevalence()))
|
||||
dataset = Dataset(dataset.training.sampling(200, *dataset.training.prevalence()),
|
||||
dataset.test.sampling(200, *dataset.test.prevalence()))
|
||||
qp.data.preprocessing.index(dataset, min_df=5, inplace=True)
|
||||
|
||||
from quapy.classification.neural import CNNnet
|
||||
cnn = CNNnet(dataset.vocabulary_size, dataset.training.n_classes)
|
||||
cnn = CNNnet(dataset.vocabulary_size, dataset.n_classes)
|
||||
|
||||
from quapy.classification.neural import NeuralClassifierTrainer
|
||||
learner = NeuralClassifierTrainer(cnn, device='cuda')
|
||||
|
||||
from quapy.method.meta import QuaNet
|
||||
model = QuaNet(learner, sample_size=len(dataset.training), device='cuda')
|
||||
model = QuaNet(learner, device='cuda')
|
||||
|
||||
if model.binary and not dataset.binary:
|
||||
if isinstance(model, BinaryQuantifier) and not dataset.binary:
|
||||
print(f'skipping the test of binary model {model} on non-binary dataset {dataset}')
|
||||
return
|
||||
|
||||
|
@ -134,28 +122,15 @@ def test_quanet_method():
|
|||
assert type(error) == numpy.float64
|
||||
|
||||
|
||||
def models_to_test_for_str_label_names():
|
||||
models = list()
|
||||
learner = LogisticRegression
|
||||
for method in AGGREGATIVE_METHODS.difference(EXPLICIT_LOSS_MINIMIZATION_METHODS):
|
||||
models.append(method(learner()))
|
||||
for method in NON_AGGREGATIVE_METHODS:
|
||||
models.append(method())
|
||||
return models
|
||||
|
||||
|
||||
@pytest.mark.parametrize('model', models_to_test_for_str_label_names())
|
||||
def test_str_label_names(model):
|
||||
if type(model) in {ACC, PACC, HDy}:
|
||||
print(
|
||||
f'skipping the test of binary model {type(model)} because it currently does not support random seed control.')
|
||||
return
|
||||
def test_str_label_names():
|
||||
model = qp.method.aggregative.CC(LogisticRegression())
|
||||
|
||||
dataset = qp.datasets.fetch_reviews('imdb', pickle=True)
|
||||
dataset = Dataset(dataset.training.sampling(1000, *dataset.training.prevalence()),
|
||||
dataset.test.sampling(1000, *dataset.test.prevalence()))
|
||||
dataset.test.sampling(1000, 0.25, 0.75))
|
||||
qp.data.preprocessing.text2tfidf(dataset, min_df=5, inplace=True)
|
||||
|
||||
numpy.random.seed(0)
|
||||
model.fit(dataset.training)
|
||||
|
||||
int_estim_prevalences = model.quantify(dataset.test.instances)
|
||||
|
@ -168,7 +143,8 @@ def test_str_label_names(model):
|
|||
['one' if label == 1 else 'zero' for label in dataset.training.labels]),
|
||||
LabelledCollection(dataset.test.instances,
|
||||
['one' if label == 1 else 'zero' for label in dataset.test.labels]))
|
||||
|
||||
assert all(dataset_str.training.classes_ == dataset_str.test.classes_), 'wrong indexation'
|
||||
numpy.random.seed(0)
|
||||
model.fit(dataset_str.training)
|
||||
|
||||
str_estim_prevalences = model.quantify(dataset_str.test.instances)
|
||||
|
|
|
@ -0,0 +1,108 @@
|
|||
import unittest
|
||||
|
||||
import numpy as np
|
||||
from sklearn.linear_model import LogisticRegression
|
||||
from sklearn.svm import SVC
|
||||
|
||||
import quapy as qp
|
||||
from quapy.method.aggregative import PACC
|
||||
from quapy.model_selection import GridSearchQ
|
||||
from quapy.protocol import APP
|
||||
import time
|
||||
|
||||
|
||||
class ModselTestCase(unittest.TestCase):
|
||||
|
||||
def test_modsel(self):
|
||||
|
||||
q = PACC(LogisticRegression(random_state=1, max_iter=5000))
|
||||
|
||||
data = qp.datasets.fetch_reviews('imdb', tfidf=True, min_df=10)
|
||||
training, validation = data.training.split_stratified(0.7, random_state=1)
|
||||
|
||||
param_grid = {'classifier__C': np.logspace(-3,3,7)}
|
||||
app = APP(validation, sample_size=100, random_state=1)
|
||||
q = GridSearchQ(
|
||||
q, param_grid, protocol=app, error='mae', refit=True, timeout=-1, verbose=True
|
||||
).fit(training)
|
||||
print('best params', q.best_params_)
|
||||
print('best score', q.best_score_)
|
||||
|
||||
self.assertEqual(q.best_params_['classifier__C'], 10.0)
|
||||
self.assertEqual(q.best_model().get_params()['classifier__C'], 10.0)
|
||||
|
||||
def test_modsel_parallel(self):
|
||||
|
||||
q = PACC(LogisticRegression(random_state=1, max_iter=5000))
|
||||
|
||||
data = qp.datasets.fetch_reviews('imdb', tfidf=True, min_df=10)
|
||||
training, validation = data.training.split_stratified(0.7, random_state=1)
|
||||
# test = data.test
|
||||
|
||||
param_grid = {'classifier__C': np.logspace(-3,3,7)}
|
||||
app = APP(validation, sample_size=100, random_state=1)
|
||||
q = GridSearchQ(
|
||||
q, param_grid, protocol=app, error='mae', refit=True, timeout=-1, n_jobs=-1, verbose=True
|
||||
).fit(training)
|
||||
print('best params', q.best_params_)
|
||||
print('best score', q.best_score_)
|
||||
|
||||
self.assertEqual(q.best_params_['classifier__C'], 10.0)
|
||||
self.assertEqual(q.best_model().get_params()['classifier__C'], 10.0)
|
||||
|
||||
def test_modsel_parallel_speedup(self):
|
||||
class SlowLR(LogisticRegression):
|
||||
def fit(self, X, y, sample_weight=None):
|
||||
time.sleep(1)
|
||||
return super(SlowLR, self).fit(X, y, sample_weight)
|
||||
|
||||
q = PACC(SlowLR(random_state=1, max_iter=5000))
|
||||
|
||||
data = qp.datasets.fetch_reviews('imdb', tfidf=True, min_df=10)
|
||||
training, validation = data.training.split_stratified(0.7, random_state=1)
|
||||
|
||||
param_grid = {'classifier__C': np.logspace(-3, 3, 7)}
|
||||
app = APP(validation, sample_size=100, random_state=1)
|
||||
|
||||
tinit = time.time()
|
||||
GridSearchQ(
|
||||
q, param_grid, protocol=app, error='mae', refit=False, timeout=-1, n_jobs=1, verbose=True
|
||||
).fit(training)
|
||||
tend_nooptim = time.time()-tinit
|
||||
|
||||
tinit = time.time()
|
||||
GridSearchQ(
|
||||
q, param_grid, protocol=app, error='mae', refit=False, timeout=-1, n_jobs=-1, verbose=True
|
||||
).fit(training)
|
||||
tend_optim = time.time() - tinit
|
||||
|
||||
print(f'parallel training took {tend_optim:.4f}s')
|
||||
print(f'sequential training took {tend_nooptim:.4f}s')
|
||||
|
||||
self.assertLess(tend_optim, 0.5 * tend_nooptim)
|
||||
|
||||
def test_modsel_timeout(self):
|
||||
|
||||
class SlowLR(LogisticRegression):
|
||||
def fit(self, X, y, sample_weight=None):
|
||||
import time
|
||||
time.sleep(10)
|
||||
return super(SlowLR, self).fit(X, y, sample_weight)
|
||||
|
||||
q = PACC(SlowLR())
|
||||
|
||||
data = qp.datasets.fetch_reviews('imdb', tfidf=True, min_df=10)
|
||||
training, validation = data.training.split_stratified(0.7, random_state=1)
|
||||
# test = data.test
|
||||
|
||||
param_grid = {'classifier__C': np.logspace(-3,3,7)}
|
||||
app = APP(validation, sample_size=100, random_state=1)
|
||||
q = GridSearchQ(
|
||||
q, param_grid, protocol=app, error='mae', refit=True, timeout=3, n_jobs=-1, verbose=True
|
||||
)
|
||||
with self.assertRaises(TimeoutError):
|
||||
q.fit(training)
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
unittest.main()
|
|
@ -0,0 +1,179 @@
|
|||
import unittest
|
||||
import numpy as np
|
||||
from quapy.data import LabelledCollection
|
||||
from quapy.protocol import APP, NPP, UPP, DomainMixer, AbstractStochasticSeededProtocol
|
||||
|
||||
|
||||
def mock_labelled_collection(prefix=''):
|
||||
y = [0] * 250 + [1] * 250 + [2] * 250 + [3] * 250
|
||||
X = [prefix + str(i) + '-' + str(yi) for i, yi in enumerate(y)]
|
||||
return LabelledCollection(X, y, classes=sorted(np.unique(y)))
|
||||
|
||||
|
||||
def samples_to_str(protocol):
|
||||
samples_str = ""
|
||||
for instances, prev in protocol():
|
||||
samples_str += f'{instances}\t{prev}\n'
|
||||
return samples_str
|
||||
|
||||
|
||||
class TestProtocols(unittest.TestCase):
|
||||
|
||||
def test_app_replicate(self):
|
||||
data = mock_labelled_collection()
|
||||
p = APP(data, sample_size=5, n_prevalences=11, random_state=42)
|
||||
|
||||
samples1 = samples_to_str(p)
|
||||
samples2 = samples_to_str(p)
|
||||
|
||||
self.assertEqual(samples1, samples2)
|
||||
|
||||
p = APP(data, sample_size=5, n_prevalences=11) # <- random_state is by default set to 0
|
||||
|
||||
samples1 = samples_to_str(p)
|
||||
samples2 = samples_to_str(p)
|
||||
|
||||
self.assertEqual(samples1, samples2)
|
||||
|
||||
def test_app_not_replicate(self):
|
||||
data = mock_labelled_collection()
|
||||
p = APP(data, sample_size=5, n_prevalences=11, random_state=None)
|
||||
|
||||
samples1 = samples_to_str(p)
|
||||
samples2 = samples_to_str(p)
|
||||
|
||||
self.assertNotEqual(samples1, samples2)
|
||||
|
||||
p = APP(data, sample_size=5, n_prevalences=11, random_state=42)
|
||||
samples1 = samples_to_str(p)
|
||||
p = APP(data, sample_size=5, n_prevalences=11, random_state=0)
|
||||
samples2 = samples_to_str(p)
|
||||
|
||||
self.assertNotEqual(samples1, samples2)
|
||||
|
||||
def test_app_number(self):
|
||||
data = mock_labelled_collection()
|
||||
p = APP(data, sample_size=100, n_prevalences=10, repeats=1)
|
||||
|
||||
# Note: for some values of n_prevalences this test fails even though the logic is
|
||||
# correct. The cause is that APP.prevalence_grid() sometimes accumulates a
|
||||
# floating-point rounding error, so that the sum of a prevalence tuple
|
||||
# surpasses 1.0 by a very small amount (0.0000000000002 or the like),
|
||||
# and such tuples are mistakenly removed. Tolerance-based checks such as np.isclose
|
||||
# and other workarounds were tried, but they eventually produce a negative probability
|
||||
# in the sampling function.
|
||||
|
||||
count = 0
|
||||
for _ in p():
|
||||
count+=1
|
||||
|
||||
self.assertEqual(count, p.total())
|
||||
|
||||
def test_npp_replicate(self):
|
||||
data = mock_labelled_collection()
|
||||
p = NPP(data, sample_size=5, repeats=5, random_state=42)
|
||||
|
||||
samples1 = samples_to_str(p)
|
||||
samples2 = samples_to_str(p)
|
||||
|
||||
self.assertEqual(samples1, samples2)
|
||||
|
||||
p = NPP(data, sample_size=5, repeats=5) # <- random_state is by default set to 0
|
||||
|
||||
samples1 = samples_to_str(p)
|
||||
samples2 = samples_to_str(p)
|
||||
|
||||
self.assertEqual(samples1, samples2)
|
||||
|
||||
def test_npp_not_replicate(self):
|
||||
data = mock_labelled_collection()
|
||||
p = NPP(data, sample_size=5, repeats=5, random_state=None)
|
||||
|
||||
samples1 = samples_to_str(p)
|
||||
samples2 = samples_to_str(p)
|
||||
|
||||
self.assertNotEqual(samples1, samples2)
|
||||
|
||||
p = NPP(data, sample_size=5, repeats=5, random_state=42)
|
||||
samples1 = samples_to_str(p)
|
||||
p = NPP(data, sample_size=5, repeats=5, random_state=0)
|
||||
samples2 = samples_to_str(p)
|
||||
self.assertNotEqual(samples1, samples2)
|
||||
|
||||
def test_kraemer_replicate(self):
|
||||
data = mock_labelled_collection()
|
||||
p = UPP(data, sample_size=5, repeats=10, random_state=42)
|
||||
|
||||
samples1 = samples_to_str(p)
|
||||
samples2 = samples_to_str(p)
|
||||
|
||||
self.assertEqual(samples1, samples2)
|
||||
|
||||
p = UPP(data, sample_size=5, repeats=10) # <- random_state is by default set to 0
|
||||
|
||||
samples1 = samples_to_str(p)
|
||||
samples2 = samples_to_str(p)
|
||||
|
||||
self.assertEqual(samples1, samples2)
|
||||
|
||||
def test_kraemer_not_replicate(self):
|
||||
data = mock_labelled_collection()
|
||||
p = UPP(data, sample_size=5, repeats=10, random_state=None)
|
||||
|
||||
samples1 = samples_to_str(p)
|
||||
samples2 = samples_to_str(p)
|
||||
|
||||
self.assertNotEqual(samples1, samples2)
|
||||
|
||||
def test_covariate_shift_replicate(self):
|
||||
dataA = mock_labelled_collection('domA')
|
||||
dataB = mock_labelled_collection('domB')
|
||||
p = DomainMixer(dataA, dataB, sample_size=10, mixture_points=11, random_state=1)
|
||||
|
||||
samples1 = samples_to_str(p)
|
||||
samples2 = samples_to_str(p)
|
||||
|
||||
self.assertEqual(samples1, samples2)
|
||||
|
||||
p = DomainMixer(dataA, dataB, sample_size=10, mixture_points=11) # <- random_state is by default set to 0
|
||||
|
||||
samples1 = samples_to_str(p)
|
||||
samples2 = samples_to_str(p)
|
||||
|
||||
self.assertEqual(samples1, samples2)
|
||||
|
||||
def test_covariate_shift_not_replicate(self):
|
||||
dataA = mock_labelled_collection('domA')
|
||||
dataB = mock_labelled_collection('domB')
|
||||
p = DomainMixer(dataA, dataB, sample_size=10, mixture_points=11, random_state=None)
|
||||
|
||||
samples1 = samples_to_str(p)
|
||||
samples2 = samples_to_str(p)
|
||||
|
||||
self.assertNotEqual(samples1, samples2)
|
||||
|
||||
def test_no_seed_init(self):
|
||||
class NoSeedInit(AbstractStochasticSeededProtocol):
|
||||
def __init__(self):
|
||||
self.data = mock_labelled_collection()
|
||||
|
||||
def samples_parameters(self):
|
||||
# return a matrix containing sampling indexes in the rows
|
||||
return np.random.randint(0, len(self.data), 10*10).reshape(10, 10)
|
||||
|
||||
def sample(self, params):
|
||||
index = np.unique(params)
|
||||
return self.data.sampling_from_index(index)
|
||||
|
||||
p = NoSeedInit()
|
||||
|
||||
# this should raise a ValueError, since the class extends AbstractStochasticSeededProtocol but
|
||||
# random_seed was never passed to super(NoSeedInit, self).__init__(random_seed)
|
||||
with self.assertRaises(ValueError):
|
||||
for sample in p():
|
||||
pass
|
||||
print('done')
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
unittest.main()
|
|
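The replicability tests above check one property: a protocol built with a fixed `random_state` yields the same samples every time it is called, while `random_state=None` does not. The following is a minimal sketch of that behaviour using only the standard library. `SeededSampler` is a hypothetical stand-in, not QuaPy's `APP`/`NPP`/`UPP` implementation, and it assumes the generator is re-created on each call.

```python
import random


class SeededSampler:
    """Hypothetical stand-in for a seeded sampling protocol (not QuaPy's API)."""

    def __init__(self, data, sample_size, repeats, random_state=0):
        self.data = list(data)
        self.sample_size = sample_size
        self.repeats = repeats
        self.random_state = random_state

    def __call__(self):
        # A fresh generator is built on every call: with a fixed seed the
        # sequence restarts from the same point; with None the generator is
        # seeded from system entropy and so changes between calls.
        rng = random.Random(self.random_state)
        for _ in range(self.repeats):
            yield rng.sample(self.data, self.sample_size)


data = range(1000)
fixed = SeededSampler(data, sample_size=5, repeats=5, random_state=42)
other = SeededSampler(data, sample_size=5, repeats=5, random_state=0)

same_when_seeded = list(fixed()) == list(fixed())
same_across_seeds = list(fixed()) == list(other())
```

Re-seeding inside `__call__`, rather than once in `__init__`, is what makes two consecutive iterations over the same object comparable.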
@ -0,0 +1,78 @@
|
|||
import unittest
|
||||
import quapy as qp
|
||||
from quapy.data import LabelledCollection
|
||||
from quapy.functional import strprev
|
||||
from sklearn.linear_model import LogisticRegression
|
||||
|
||||
from quapy.method.aggregative import PACC
|
||||
|
||||
|
||||
class MyTestCase(unittest.TestCase):
|
||||
def test_prediction_replicability(self):
|
||||
|
||||
dataset = qp.datasets.fetch_UCIDataset('yeast')
|
||||
|
||||
with qp.util.temp_seed(0):
|
||||
lr = LogisticRegression(random_state=0, max_iter=10000)
|
||||
pacc = PACC(lr)
|
||||
prev = pacc.fit(dataset.training).quantify(dataset.test.X)
|
||||
str_prev1 = strprev(prev, prec=5)
|
||||
|
||||
with qp.util.temp_seed(0):
|
||||
lr = LogisticRegression(random_state=0, max_iter=10000)
|
||||
pacc = PACC(lr)
|
||||
prev2 = pacc.fit(dataset.training).quantify(dataset.test.X)
|
||||
str_prev2 = strprev(prev2, prec=5)
|
||||
|
||||
self.assertEqual(str_prev1, str_prev2)
|
||||
|
||||
def test_samping_replicability(self):
|
||||
import numpy as np
|
||||
|
||||
def equal_collections(c1, c2, value=True):
|
||||
self.assertEqual(np.all(c1.X == c2.X), value)
|
||||
self.assertEqual(np.all(c1.y == c2.y), value)
|
||||
if value:
|
||||
self.assertEqual(np.all(c1.classes_ == c2.classes_), value)
|
||||
|
||||
X = list(map(str, range(100)))
|
||||
y = np.random.randint(0, 2, 100)
|
||||
data = LabelledCollection(instances=X, labels=y)
|
||||
|
||||
sample1 = data.sampling(50)
|
||||
sample2 = data.sampling(50)
|
||||
equal_collections(sample1, sample2, False)
|
||||
|
||||
sample1 = data.sampling(50, random_state=0)
|
||||
sample2 = data.sampling(50, random_state=0)
|
||||
equal_collections(sample1, sample2, True)
|
||||
|
||||
sample1 = data.sampling(50, *[0.7, 0.3], random_state=0)
|
||||
sample2 = data.sampling(50, *[0.7, 0.3], random_state=0)
|
||||
equal_collections(sample1, sample2, True)
|
||||
|
||||
with qp.util.temp_seed(0):
|
||||
sample1 = data.sampling(50, *[0.7, 0.3])
|
||||
with qp.util.temp_seed(0):
|
||||
sample2 = data.sampling(50, *[0.7, 0.3])
|
||||
equal_collections(sample1, sample2, True)
|
||||
|
||||
sample1 = data.sampling(50, *[0.7, 0.3], random_state=0)
|
||||
sample2 = data.sampling(50, *[0.7, 0.3], random_state=0)
|
||||
equal_collections(sample1, sample2, True)
|
||||
|
||||
sample1_tr, sample1_te = data.split_stratified(train_prop=0.7, random_state=0)
|
||||
sample2_tr, sample2_te = data.split_stratified(train_prop=0.7, random_state=0)
|
||||
equal_collections(sample1_tr, sample2_tr, True)
|
||||
equal_collections(sample1_te, sample2_te, True)
|
||||
|
||||
with qp.util.temp_seed(0):
|
||||
sample1_tr, sample1_te = data.split_stratified(train_prop=0.7)
|
||||
with qp.util.temp_seed(0):
|
||||
sample2_tr, sample2_te = data.split_stratified(train_prop=0.7)
|
||||
equal_collections(sample1_tr, sample2_tr, True)
|
||||
equal_collections(sample1_te, sample2_te, True)
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
unittest.main()
|
|
@ -5,13 +5,14 @@ import os
|
|||
import pickle
|
||||
import urllib
|
||||
from pathlib import Path
|
||||
from contextlib import ExitStack
|
||||
import quapy as qp
|
||||
|
||||
import numpy as np
|
||||
from joblib import Parallel, delayed
|
||||
|
||||
|
||||
def _get_parallel_slices(n_tasks, n_jobs=-1):
|
||||
def _get_parallel_slices(n_tasks, n_jobs):
|
||||
if n_jobs == -1:
|
||||
n_jobs = multiprocessing.cpu_count()
|
||||
batch = int(n_tasks / n_jobs)
|
||||
|
@ -21,8 +22,9 @@ def _get_parallel_slices(n_tasks, n_jobs=-1):
|
|||
|
||||
def map_parallel(func, args, n_jobs):
|
||||
"""
|
||||
Applies func to n_jobs slices of args. E.g., if args is an array of 99 items and n_jobs=2, then
|
||||
func is applied in two parallel processes to args[0:50] and to args[50:99]
|
||||
Applies func to n_jobs slices of args. E.g., if args is an array of 99 items and `n_jobs`=2, then
|
||||
func is applied in two parallel processes to args[0:50] and to args[50:99]. func is a function
|
||||
that already works with a list of arguments.
|
||||
|
||||
:param func: function to be parallelized
|
||||
:param args: array-like of arguments to be passed to the function in different parallel calls
|
||||
|
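The `map_parallel` docstring describes splitting `args` into `n_jobs` contiguous slices and handing each whole slice to `func`. A minimal sequential sketch of that slicing is given below. `contiguous_slices` and `map_in_slices` are hypothetical names, and the way the remainder is distributed here (one extra task for each of the first jobs) is an assumption chosen to reproduce the docstring's 99-item example. The actual `_get_parallel_slices` helper may place the remainder differently.

```python
def contiguous_slices(n_tasks, n_jobs):
    """Split range(n_tasks) into n_jobs contiguous, near-equal slices."""
    if n_jobs < 1:
        raise ValueError("n_jobs must be a positive integer")
    base, extra = divmod(n_tasks, n_jobs)
    slices, start = [], 0
    for job in range(n_jobs):
        # the first `extra` jobs take one additional task each
        stop = start + base + (1 if job < extra else 0)
        slices.append(slice(start, stop))
        start = stop
    return slices


def map_in_slices(func, args, n_jobs):
    """Sequential stand-in for the parallel map: func receives a whole slice."""
    results = [func(args[s]) for s in contiguous_slices(len(args), n_jobs)]
    # flatten the per-slice results back into a single list
    return [item for chunk in results for item in chunk]


doubled = map_in_slices(lambda chunk: [2 * x for x in chunk], list(range(99)), n_jobs=2)
```

Because `func` operates on a list of arguments rather than on a single argument, the per-slice outputs have to be flattened afterwards, which is the role `itertools.chain.from_iterable` plays in the original function.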
@ -36,7 +38,7 @@ def map_parallel(func, args, n_jobs):
|
|||
return list(itertools.chain.from_iterable(results))
|
||||
|
||||
|
||||
def parallel(func, args, n_jobs):
|
||||
def parallel(func, args, n_jobs, seed=None):
|
||||
"""
|
||||
A wrapper of multiprocessing:
|
||||
|
||||
|
@ -44,32 +46,43 @@ def parallel(func, args, n_jobs):
|
|||
>>> delayed(func)(args_i) for args_i in args
|
||||
>>> )
|
||||
|
||||
that takes the `quapy.environ` variable as input silently
|
||||
that takes the `quapy.environ` variable as input silently.
|
||||
Seeds the child processes to ensure reproducibility when n_jobs>1
|
||||
"""
|
||||
def func_dec(environ, *args):
|
||||
qp.environ = environ
|
||||
return func(*args)
|
||||
def func_dec(environ, seed, *args):
|
||||
qp.environ = environ.copy()
|
||||
qp.environ['N_JOBS'] = 1
|
||||
# set a context with a temporary seed to ensure results are reproducible in parallel
|
||||
with ExitStack() as stack:
|
||||
if seed is not None:
|
||||
stack.enter_context(qp.util.temp_seed(seed))
|
||||
return func(*args)
|
||||
|
||||
return Parallel(n_jobs=n_jobs)(
|
||||
delayed(func_dec)(qp.environ, args_i) for args_i in args
|
||||
delayed(func_dec)(qp.environ, None if seed is None else seed+i, args_i) for i, args_i in enumerate(args)
|
||||
)
|
||||
|
||||
|
||||
@contextlib.contextmanager
|
||||
def temp_seed(seed):
|
||||
def temp_seed(random_state):
|
||||
"""
|
||||
Can be used in a "with" context to set a temporary seed without modifying numpy's outer random state. E.g.:
|
||||
|
||||
>>> with temp_seed(random_seed):
|
||||
>>> pass # do any computation depending on np.random functionality
|
||||
|
||||
:param seed: the seed to set within the "with" context
|
||||
:param random_state: the seed to set within the "with" context
|
||||
"""
|
||||
state = np.random.get_state()
|
||||
np.random.seed(seed)
|
||||
if random_state is not None:
|
||||
state = np.random.get_state()
|
||||
# save the seed in case it is needed (for instance, to seed child processes)
|
||||
qp.environ['_R_SEED'] = random_state
|
||||
np.random.seed(random_state)
|
||||
try:
|
||||
yield
|
||||
finally:
|
||||
np.random.set_state(state)
|
||||
if random_state is not None:
|
||||
np.random.set_state(state)
|
||||
|
||||
|
||||
def download_file(url, archive_filename):
|
||||
|
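The revised `temp_seed` saves the global random state, seeds the generator for the duration of the `with` block, and restores the saved state afterwards. When `random_state` is `None` it leaves the state untouched. The sketch below illustrates the same pattern with the standard library `random` module instead of `numpy.random`, so that it runs without third-party packages. `temp_seed_stdlib` is a hypothetical name, and the `qp.environ['_R_SEED']` bookkeeping of the original is deliberately omitted.

```python
import contextlib
import random


@contextlib.contextmanager
def temp_seed_stdlib(random_state):
    """Seed `random` inside a with-block, then restore the previous state."""
    if random_state is None:
        # nothing to seed and nothing to restore
        yield
        return
    saved = random.getstate()
    random.seed(random_state)
    try:
        yield
    finally:
        # restore the outer stream even if the block raised
        random.setstate(saved)


random.seed(123)
expected_next = random.random()   # first value of the outer stream

random.seed(123)
with temp_seed_stdlib(0):
    inner_a = random.random()
with temp_seed_stdlib(0):
    inner_b = random.random()
outer_next = random.random()      # outer stream is unaffected by the blocks
```

Restoring the state in a `finally` clause is what keeps the outer stream intact when the body of the block raises an exception.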
@ -117,6 +130,7 @@ def create_if_not_exist(path):
|
|||
def get_quapy_home():
|
||||
"""
|
||||
Gets the home directory of QuaPy, i.e., the directory where QuaPy saves permanent data, such as downloaded datasets.
|
||||
This directory is `~/quapy_data`
|
||||
|
||||
:return: a string representing the path
|
||||
"""
|
||||
|
@ -151,7 +165,7 @@ def save_text_file(path, text):
|
|||
|
||||
def pickled_resource(pickle_path:str, generation_func:callable, *args):
|
||||
"""
|
||||
Allows for fast reuse of resources that are generated only once by calling generation_func(*args). The next times
|
||||
Allows for fast reuse of resources that are generated only once by calling generation_func(\\*args). The next times
|
||||
this function is invoked, it loads the pickled resource. Example:
|
||||
|
||||
>>> def some_array(n): # a mock resource created with one parameter (`n`)
|
||||
|
@ -190,10 +204,6 @@ class EarlyStop:
|
|||
"""
|
||||
A class implementing the early-stopping condition typically used for training neural networks.
|
||||
|
||||
:param patience: the number of (consecutive) times that a monitored evaluation metric (typically obtained in a
|
||||
held-out validation split) can be found to be worse than the best one obtained so far, before flagging the
|
||||
stopping condition. An instance of this class is `callable`, and is to be used as follows:
|
||||
|
||||
>>> earlystop = EarlyStop(patience=2, lower_is_better=True)
|
||||
>>> earlystop(0.9, epoch=0)
|
||||
>>> earlystop(0.7, epoch=1)
|
||||
|
@ -205,14 +215,14 @@ class EarlyStop:
|
|||
>>> earlystop.best_epoch # is 1
|
||||
>>> earlystop.best_score # is 0.7
|
||||
|
||||
|
||||
:param patience: the number of (consecutive) times that a monitored evaluation metric (typically obtained in a
|
||||
held-out validation split) can be found to be worse than the best one obtained so far, before flagging the
|
||||
stopping condition. An instance of this class is `callable`, and is to be used as shown in the example above.
|
||||
:param lower_is_better: if True (default) the metric is to be minimized.
|
||||
|
||||
:ivar best_score: keeps track of the best value seen so far
|
||||
:ivar best_epoch: keeps track of the epoch in which the best score was set
|
||||
:ivar STOP: flag (boolean) indicating the stopping condition
|
||||
:ivar IMPROVED: flag (boolean) indicating whether there was an improvement in the last call
|
||||
|
||||
"""
|
||||
|
||||
def __init__(self, patience, lower_is_better=True):
|
||||
|
@ -243,3 +253,4 @@ class EarlyStop:
|
|||
self.patience -= 1
|
||||
if self.patience <= 0:
|
||||
self.STOP = True
|
||||
|
||||
|
|
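The `EarlyStop` docstring describes a patience counter: each call that fails to improve on the best score consumes one unit of patience, and the stop flag is raised once patience runs out. The sketch below is a simplified, hypothetical re-implementation written only to illustrate that logic. `PatienceStopper`, its lowercase attribute names, and the choice to reset the counter on every improvement are assumptions, not the class from this commit.

```python
class PatienceStopper:
    """Hypothetical, simplified early-stopping tracker (not QuaPy's EarlyStop)."""

    def __init__(self, patience, lower_is_better=True):
        self.patience_limit = patience
        self.patience = patience
        self.lower_is_better = lower_is_better
        self.best_score = None
        self.best_epoch = None
        self.stop = False
        self.improved = False

    def __call__(self, score, epoch):
        if self.best_score is None:
            better = True
        elif self.lower_is_better:
            better = score < self.best_score
        else:
            better = score > self.best_score
        self.improved = better
        if better:
            self.best_score, self.best_epoch = score, epoch
            self.patience = self.patience_limit  # assumed: reset on improvement
        else:
            self.patience -= 1
            if self.patience <= 0:
                self.stop = True
        return self.stop


stopper = PatienceStopper(patience=2)
scores = [0.9, 0.7, 0.8, 0.75]
history = [stopper(score, epoch) for epoch, score in enumerate(scores)]
```

With `patience=2`, the two consecutive non-improving scores after epoch 1 exhaust the counter, so the flag is raised on the fourth call while the best score and epoch remain those of epoch 1.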