adding documentation, adding brokenbar plots, merging plots from tweetsent with density

This commit is contained in:
Alejandro Moreo Fernandez 2021-11-22 18:10:48 +01:00
parent b78c8268fd
commit 1a3755eb58
10 changed files with 436 additions and 135 deletions

View File

@ -176,6 +176,8 @@
<li><a href="quapy.html#quapy.plot.binary_diagonal">binary_diagonal() (in module quapy.plot)</a>
</li>
<li><a href="quapy.method.html#quapy.method.base.BinaryQuantifier">BinaryQuantifier (class in quapy.method.base)</a>
</li>
<li><a href="quapy.html#quapy.plot.brokenbar_supremacy_by_drift">brokenbar_supremacy_by_drift() (in module quapy.plot)</a>
</li>
</ul></td>
</tr></table>

View File

@ -5,7 +5,8 @@
<html>
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="generator" content="Docutils 0.17.1: http://docutils.sourceforge.net/" />
<title>Welcome to QuaPys documentation! &#8212; QuaPy 0.1.6 documentation</title>
<link rel="stylesheet" type="text/css" href="_static/pygments.css" />
<link rel="stylesheet" type="text/css" href="_static/bizstyle.css" />
@ -45,11 +46,11 @@
<div class="bodywrapper">
<div class="body" role="main">
<div class="section" id="welcome-to-quapy-s-documentation">
<section id="welcome-to-quapy-s-documentation">
<h1>Welcome to QuaPys documentation!<a class="headerlink" href="#welcome-to-quapy-s-documentation" title="Permalink to this headline"></a></h1>
<p>QuaPy is an open source framework for Quantification (a.k.a. Supervised Prevalence Estimation)
written in Python.</p>
<div class="section" id="introduction">
<section id="introduction">
<h2>Introduction<a class="headerlink" href="#introduction" title="Permalink to this headline"></a></h2>
<p>QuaPy roots on the concept of data sample, and provides implementations of most important concepts
in quantification literature, such as the most important quantification baselines, many advanced
@ -57,7 +58,7 @@ quantification methods, quantification-oriented model selection, many evaluation
used for evaluating quantification methods.
QuaPy also integrates commonly used datasets and offers visualization tools for facilitating the analysis and
interpretation of results.</p>
<div class="section" id="a-quick-example">
<section id="a-quick-example">
<h3>A quick example:<a class="headerlink" href="#a-quick-example" title="Permalink to this headline"></a></h3>
<p>The following script fetchs a Twitter dataset, trains and evaluates an
<cite>Adjusted Classify &amp; Count</cite> model in terms of the <cite>Mean Absolute Error</cite> (MAE)
@ -87,8 +88,8 @@ class prevalence of the training set. For this reason, any Quantification model
should be tested across samples characterized by different class prevalences.
QuaPy implements sampling procedures and evaluation protocols that automates this endeavour.
See the <a class="reference internal" href="Evaluation.html"><span class="doc">Evaluation</span></a> for detailed examples.</p>
</div>
<div class="section" id="features">
</section>
<section id="features">
<h3>Features<a class="headerlink" href="#features" title="Permalink to this headline"></a></h3>
<ul class="simple">
<li><p>Implementation of most popular quantification methods (Classify-&amp;-Count variants, Expectation-Maximization, SVM-based variants for quantification, HDy, QuaNet, and Ensembles).</p></li>
@ -144,17 +145,17 @@ See the <a class="reference internal" href="Evaluation.html"><span class="doc">E
</li>
</ul>
</div>
</div>
</div>
</div>
<div class="section" id="indices-and-tables">
</section>
</section>
</section>
<section id="indices-and-tables">
<h1>Indices and tables<a class="headerlink" href="#indices-and-tables" title="Permalink to this headline"></a></h1>
<ul class="simple">
<li><p><a class="reference internal" href="genindex.html"><span class="std std-ref">Index</span></a></p></li>
<li><p><a class="reference internal" href="py-modindex.html"><span class="std std-ref">Module Index</span></a></p></li>
<li><p><a class="reference internal" href="search.html"><span class="std std-ref">Search Page</span></a></p></li>
</ul>
</div>
</section>
<div class="clearer"></div>

View File

@ -5,7 +5,8 @@
<html>
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="generator" content="Docutils 0.17.1: http://docutils.sourceforge.net/" />
<title>quapy &#8212; QuaPy 0.1.6 documentation</title>
<link rel="stylesheet" type="text/css" href="_static/pygments.css" />
<link rel="stylesheet" type="text/css" href="_static/bizstyle.css" />
@ -49,7 +50,7 @@
<div class="bodywrapper">
<div class="body" role="main">
<div class="section" id="quapy">
<section id="quapy">
<h1>quapy<a class="headerlink" href="#quapy" title="Permalink to this headline"></a></h1>
<div class="toctree-wrapper compound">
<ul>
@ -96,7 +97,7 @@
</li>
</ul>
</div>
</div>
</section>
<div class="clearer"></div>

Binary file not shown.

View File

@ -5,7 +5,8 @@
<html>
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="generator" content="Docutils 0.17.1: http://docutils.sourceforge.net/" />
<title>quapy.classification package &#8212; QuaPy 0.1.6 documentation</title>
<link rel="stylesheet" type="text/css" href="_static/pygments.css" />
<link rel="stylesheet" type="text/css" href="_static/bizstyle.css" />
@ -51,12 +52,12 @@
<div class="bodywrapper">
<div class="body" role="main">
<div class="section" id="quapy-classification-package">
<section id="quapy-classification-package">
<h1>quapy.classification package<a class="headerlink" href="#quapy-classification-package" title="Permalink to this headline"></a></h1>
<div class="section" id="submodules">
<section id="submodules">
<h2>Submodules<a class="headerlink" href="#submodules" title="Permalink to this headline"></a></h2>
</div>
<div class="section" id="module-quapy.classification.methods">
</section>
<section id="module-quapy.classification.methods">
<span id="quapy-classification-methods-module"></span><h2>quapy.classification.methods module<a class="headerlink" href="#module-quapy.classification.methods" title="Permalink to this headline"></a></h2>
<dl class="py class">
<dt class="sig sig-object py" id="quapy.classification.methods.LowRankLogisticRegression">
@ -165,8 +166,8 @@ and eventually also <cite>n_components</cite> for <cite>TruncatedSVD</cite></p>
</dd></dl>
</div>
<div class="section" id="module-quapy.classification.neural">
</section>
<section id="module-quapy.classification.neural">
<span id="quapy-classification-neural-module"></span><h2>quapy.classification.neural module<a class="headerlink" href="#module-quapy.classification.neural" title="Permalink to this headline"></a></h2>
<dl class="py class">
<dt class="sig sig-object py" id="quapy.classification.neural.CNNnet">
@ -562,8 +563,8 @@ applied, meaning that if the longest document in the batch is shorter than
</dd></dl>
</div>
<div class="section" id="module-quapy.classification.svmperf">
</section>
<section id="module-quapy.classification.svmperf">
<span id="quapy-classification-svmperf-module"></span><h2>quapy.classification.svmperf module<a class="headerlink" href="#module-quapy.classification.svmperf" title="Permalink to this headline"></a></h2>
<dl class="py class">
<dt class="sig sig-object py" id="quapy.classification.svmperf.SVMperf">
@ -653,11 +654,11 @@ for further details.</p>
</dd></dl>
</div>
<div class="section" id="module-quapy.classification">
</section>
<section id="module-quapy.classification">
<span id="module-contents"></span><h2>Module contents<a class="headerlink" href="#module-quapy.classification" title="Permalink to this headline"></a></h2>
</div>
</div>
</section>
</section>
<div class="clearer"></div>

View File

@ -5,7 +5,8 @@
<html>
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="generator" content="Docutils 0.17.1: http://docutils.sourceforge.net/" />
<title>quapy.data package &#8212; QuaPy 0.1.6 documentation</title>
<link rel="stylesheet" type="text/css" href="_static/pygments.css" />
<link rel="stylesheet" type="text/css" href="_static/bizstyle.css" />
@ -51,12 +52,12 @@
<div class="bodywrapper">
<div class="body" role="main">
<div class="section" id="quapy-data-package">
<section id="quapy-data-package">
<h1>quapy.data package<a class="headerlink" href="#quapy-data-package" title="Permalink to this headline"></a></h1>
<div class="section" id="submodules">
<section id="submodules">
<h2>Submodules<a class="headerlink" href="#submodules" title="Permalink to this headline"></a></h2>
</div>
<div class="section" id="module-quapy.data.base">
</section>
<section id="module-quapy.data.base">
<span id="quapy-data-base-module"></span><h2>quapy.data.base module<a class="headerlink" href="#module-quapy.data.base" title="Permalink to this headline"></a></h2>
<dl class="py class">
<dt class="sig sig-object py" id="quapy.data.base.Dataset">
@ -206,8 +207,8 @@
<span class="sig-prename descclassname"><span class="pre">quapy.data.base.</span></span><span class="sig-name descname"><span class="pre">isbinary</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">data</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#quapy.data.base.isbinary" title="Permalink to this definition"></a></dt>
<dd></dd></dl>
</div>
<div class="section" id="module-quapy.data.datasets">
</section>
<section id="module-quapy.data.datasets">
<span id="quapy-data-datasets-module"></span><h2>quapy.data.datasets module<a class="headerlink" href="#module-quapy.data.datasets" title="Permalink to this headline"></a></h2>
<dl class="py function">
<dt class="sig sig-object py" id="quapy.data.datasets.df_replace">
@ -270,8 +271,8 @@ faster subsequent invokations
<span class="sig-prename descclassname"><span class="pre">quapy.data.datasets.</span></span><span class="sig-name descname"><span class="pre">warn</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="o"><span class="pre">*</span></span><span class="n"><span class="pre">args</span></span></em>, <em class="sig-param"><span class="o"><span class="pre">**</span></span><span class="n"><span class="pre">kwargs</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#quapy.data.datasets.warn" title="Permalink to this definition"></a></dt>
<dd></dd></dl>
</div>
<div class="section" id="module-quapy.data.preprocessing">
</section>
<section id="module-quapy.data.preprocessing">
<span id="quapy-data-preprocessing-module"></span><h2>quapy.data.preprocessing module<a class="headerlink" href="#module-quapy.data.preprocessing" title="Permalink to this headline"></a></h2>
<dl class="py class">
<dt class="sig sig-object py" id="quapy.data.preprocessing.IndexTransformer">
@ -360,8 +361,8 @@ where the dimensions corresponding to infrequent instances have been removed</p>
where the instances are stored in a csr_matrix of real-valued tfidf scores</p>
</dd></dl>
</div>
<div class="section" id="module-quapy.data.reader">
</section>
<section id="module-quapy.data.reader">
<span id="quapy-data-reader-module"></span><h2>quapy.data.reader module<a class="headerlink" href="#module-quapy.data.reader" title="Permalink to this headline"></a></h2>
<dl class="py function">
<dt class="sig sig-object py" id="quapy.data.reader.binarize">
@ -422,11 +423,11 @@ E.g., y=[B, B, A, C] -&gt; [1,1,0,2], [A,B,
:return: a ndarray (int) of class indexes, and a ndarray of classnames corresponding to the indexes.</p>
</dd></dl>
</div>
<div class="section" id="module-quapy.data">
</section>
<section id="module-quapy.data">
<span id="module-contents"></span><h2>Module contents<a class="headerlink" href="#module-quapy.data" title="Permalink to this headline"></a></h2>
</div>
</div>
</section>
</section>
<div class="clearer"></div>

View File

@ -5,7 +5,8 @@
<html>
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="generator" content="Docutils 0.17.1: http://docutils.sourceforge.net/" />
<title>quapy package &#8212; QuaPy 0.1.6 documentation</title>
<link rel="stylesheet" type="text/css" href="_static/pygments.css" />
<link rel="stylesheet" type="text/css" href="_static/bizstyle.css" />
@ -52,9 +53,9 @@
<div class="bodywrapper">
<div class="body" role="main">
<div class="section" id="quapy-package">
<section id="quapy-package">
<h1>quapy package<a class="headerlink" href="#quapy-package" title="Permalink to this headline"></a></h1>
<div class="section" id="subpackages">
<section id="subpackages">
<h2>Subpackages<a class="headerlink" href="#subpackages" title="Permalink to this headline"></a></h2>
<div class="toctree-wrapper compound">
<ul>
@ -87,11 +88,11 @@
</li>
</ul>
</div>
</div>
<div class="section" id="submodules">
</section>
<section id="submodules">
<h2>Submodules<a class="headerlink" href="#submodules" title="Permalink to this headline"></a></h2>
</div>
<div class="section" id="module-quapy.error">
</section>
<section id="module-quapy.error">
<span id="quapy-error-module"></span><h2>quapy.error module<a class="headerlink" href="#module-quapy.error" title="Permalink to this headline"></a></h2>
<dl class="py function">
<dt class="sig sig-object py" id="quapy.error.absolute_error">
@ -508,8 +509,8 @@ will be taken from the environment variable <cite>SAMPLE_SIZE</cite> (which has
</dl>
</dd></dl>
</div>
<div class="section" id="module-quapy.evaluation">
</section>
<section id="module-quapy.evaluation">
<span id="quapy-evaluation-module"></span><h2>quapy.evaluation module<a class="headerlink" href="#module-quapy.evaluation" title="Permalink to this headline"></a></h2>
<dl class="py function">
<dt class="sig sig-object py" id="quapy.evaluation.artificial_prevalence_prediction">
@ -584,8 +585,8 @@ contains the the prevalence estimations</p>
<span class="sig-prename descclassname"><span class="pre">quapy.evaluation.</span></span><span class="sig-name descname"><span class="pre">natural_prevalence_report</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">model</span></span><span class="p"><span class="pre">:</span></span> <span class="n"><a class="reference internal" href="quapy.method.html#quapy.method.base.BaseQuantifier" title="quapy.method.base.BaseQuantifier"><span class="pre">quapy.method.base.BaseQuantifier</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">test</span></span><span class="p"><span class="pre">:</span></span> <span class="n"><a class="reference internal" href="quapy.data.html#quapy.data.base.LabelledCollection" title="quapy.data.base.LabelledCollection"><span class="pre">quapy.data.base.LabelledCollection</span></a></span></em>, <em class="sig-param"><span class="n"><span class="pre">sample_size</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">n_repetitions</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">1</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">n_jobs</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">1</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">random_seed</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">42</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">error_metrics</span></span><span class="p"><span class="pre">:</span></span> <span class="n"><span class="pre">Iterable</span><span class="p"><span class="pre">[</span></span><span class="pre">Union</span><span class="p"><span class="pre">[</span></span><span class="pre">str</span><span class="p"><span class="pre">,</span> </span><span class="pre">Callable</span><span class="p"><span class="pre">]</span></span><span class="p"><span class="pre">]</span></span></span> <span class="o"><span class="pre">=</span></span> <span class="default_value"><span class="pre">'mae'</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">verbose</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">False</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#quapy.evaluation.natural_prevalence_report" title="Permalink to this definition"></a></dt>
<dd></dd></dl>
</div>
<div class="section" id="module-quapy.functional">
</section>
<section id="module-quapy.functional">
<span id="quapy-functional-module"></span><h2>quapy.functional module<a class="headerlink" href="#module-quapy.functional" title="Permalink to this headline"></a></h2>
<dl class="py function">
<dt class="sig sig-object py" id="quapy.functional.HellingerDistance">
@ -668,8 +669,8 @@ and with the limits smoothed, i.e.:
<span class="sig-prename descclassname"><span class="pre">quapy.functional.</span></span><span class="sig-name descname"><span class="pre">uniform_simplex_sampling</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">n_classes</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">size</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">1</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#quapy.functional.uniform_simplex_sampling" title="Permalink to this definition"></a></dt>
<dd></dd></dl>
</div>
<div class="section" id="module-quapy.model_selection">
</section>
<section id="module-quapy.model_selection">
<span id="quapy-model-selection-module"></span><h2>quapy.model_selection module<a class="headerlink" href="#module-quapy.model_selection" title="Permalink to this headline"></a></h2>
<dl class="py class">
<dt class="sig sig-object py" id="quapy.model_selection.GridSearchQ">
@ -783,27 +784,92 @@ a float in [0,1] indicating the proportion of labelled data to extract from the
</dd></dl>
</div>
<div class="section" id="module-quapy.plot">
</section>
<section id="module-quapy.plot">
<span id="quapy-plot-module"></span><h2>quapy.plot module<a class="headerlink" href="#module-quapy.plot" title="Permalink to this headline"></a></h2>
<dl class="py function">
<dt class="sig sig-object py" id="quapy.plot.binary_bias_bins">
<span class="sig-prename descclassname"><span class="pre">quapy.plot.</span></span><span class="sig-name descname"><span class="pre">binary_bias_bins</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="pre">method_names</span></em>, <em class="sig-param"><span class="pre">true_prevs</span></em>, <em class="sig-param"><span class="pre">estim_prevs</span></em>, <em class="sig-param"><span class="pre">pos_class=1</span></em>, <em class="sig-param"><span class="pre">title=None</span></em>, <em class="sig-param"><span class="pre">nbins=5</span></em>, <em class="sig-param"><span class="pre">colormap=&lt;matplotlib.colors.ListedColormap</span> <span class="pre">object&gt;</span></em>, <em class="sig-param"><span class="pre">vertical_xticks=False</span></em>, <em class="sig-param"><span class="pre">legend=True</span></em>, <em class="sig-param"><span class="pre">savepath=None</span></em><span class="sig-paren">)</span><a class="headerlink" href="#quapy.plot.binary_bias_bins" title="Permalink to this definition"></a></dt>
<dd></dd></dl>
<dd><dl class="field-list simple">
<dt class="field-odd">Parameters</dt>
<dd class="field-odd"><ul class="simple">
<li><p><strong>method_names</strong> array-like with the method names for each experiment</p></li>
<li><p><strong>true_prevs</strong> array-like with the true prevalence values (each being a ndarray with n_classes components) for
each experiment</p></li>
<li><p><strong>estim_prevs</strong> array-like with the estimated prevalence values (each being a ndarray with n_classes components)
for each experiment</p></li>
<li><p><strong>pos_class</strong> index of the positive class</p></li>
<li><p><strong>title</strong> the title to be displayed in the plot</p></li>
<li><p><strong>nbins</strong> number of bins</p></li>
<li><p><strong>colormap</strong> the matplotlib colormap to use (default cm.tab10)</p></li>
<li><p><strong>vertical_xticks</strong> </p></li>
<li><p><strong>legend</strong> whether or not to display the legend (default is True)</p></li>
<li><p><strong>savepath</strong> path where to save the plot. If not indicated (as default), the plot is shown.</p></li>
</ul>
</dd>
</dl>
</dd></dl>
<dl class="py function">
<dt class="sig sig-object py" id="quapy.plot.binary_bias_global">
<span class="sig-prename descclassname"><span class="pre">quapy.plot.</span></span><span class="sig-name descname"><span class="pre">binary_bias_global</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">method_names</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">true_prevs</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">estim_prevs</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">pos_class</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">1</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">title</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">savepath</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#quapy.plot.binary_bias_global" title="Permalink to this definition"></a></dt>
<dd></dd></dl>
<dd><p>Box-plots displaying the global bias (i.e., signed error computed as the estimated value minus the true value)
for each quantification method with respect to a given positive class.</p>
<dl class="field-list simple">
<dt class="field-odd">Parameters</dt>
<dd class="field-odd"><ul class="simple">
<li><p><strong>method_names</strong> array-like with the method names for each experiment</p></li>
<li><p><strong>true_prevs</strong> array-like with the true prevalence values (each being a ndarray with n_classes components) for
each experiment</p></li>
<li><p><strong>estim_prevs</strong> array-like with the estimated prevalence values (each being a ndarray with n_classes components)
for each experiment</p></li>
<li><p><strong>pos_class</strong> index of the positive class</p></li>
<li><p><strong>title</strong> the title to be displayed in the plot</p></li>
<li><p><strong>savepath</strong> path where to save the plot. If not indicated (as default), the plot is shown.</p></li>
</ul>
</dd>
</dl>
</dd></dl>
<dl class="py function">
<dt class="sig sig-object py" id="quapy.plot.binary_diagonal">
<span class="sig-prename descclassname"><span class="pre">quapy.plot.</span></span><span class="sig-name descname"><span class="pre">binary_diagonal</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">method_names</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">true_prevs</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">estim_prevs</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">pos_class</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">1</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">title</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">show_std</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">True</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">legend</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">True</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">train_prev</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">savepath</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#quapy.plot.binary_diagonal" title="Permalink to this definition"></a></dt>
<span class="sig-prename descclassname"><span class="pre">quapy.plot.</span></span><span class="sig-name descname"><span class="pre">binary_diagonal</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">method_names</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">true_prevs</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">estim_prevs</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">pos_class</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">1</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">title</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">show_std</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">True</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">legend</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">True</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">train_prev</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">savepath</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">method_order</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#quapy.plot.binary_diagonal" title="Permalink to this definition"></a></dt>
<dd><p>The diagonal plot displays the predicted prevalence values (along the y-axis) as a function of the true prevalence
values (along the x-axis). The optimal quantifier is described by the diagonal (0,0)-(1,1) of the plot (hence the
name). It is convenient for binary quantification problems, though it can be used for multiclass problems by
indicating which class is to be taken as the positive class. (For multiclass quantification problems, other plots
like the <a class="reference internal" href="#quapy.plot.error_by_drift" title="quapy.plot.error_by_drift"><code class="xref py py-meth docutils literal notranslate"><span class="pre">error_by_drift()</span></code></a> might be preferable though).</p>
<dl class="field-list simple">
<dt class="field-odd">Parameters</dt>
<dd class="field-odd"><ul class="simple">
<li><p><strong>method_names</strong> array-like with the method names for each experiment</p></li>
<li><p><strong>true_prevs</strong> array-like with the true prevalence values (each being a ndarray with n_classes components) for
each experiment</p></li>
<li><p><strong>estim_prevs</strong> array-like with the estimated prevalence values (each being a ndarray with n_classes components)
for each experiment</p></li>
<li><p><strong>pos_class</strong> index of the positive class</p></li>
<li><p><strong>title</strong> the title to be displayed in the plot</p></li>
<li><p><strong>show_std</strong> whether or not to show standard deviations (represented by color bands). This might be inconvenient
for cases in which many methods are compared, or when the standard deviations are high default True)</p></li>
<li><p><strong>legend</strong> whether or not to display the leyend (default True)</p></li>
<li><p><strong>train_prev</strong> if indicated (default is None), the training prevalence (for the positive class) is hightlighted
in the plot. This is convenient when all the experiments have been conducted in the same dataset.</p></li>
<li><p><strong>savepath</strong> path where to save the plot. If not indicated (as default), the plot is shown.</p></li>
<li><p><strong>method_order</strong> if indicated (default is None), imposes the order in which the methods are processed (i.e.,
listed in the legend and associated with matplotlib colors).</p></li>
</ul>
</dd>
</dl>
</dd></dl>
<dl class="py function">
<dt class="sig sig-object py" id="quapy.plot.brokenbar_supremacy_by_drift">
<span class="sig-prename descclassname"><span class="pre">quapy.plot.</span></span><span class="sig-name descname"><span class="pre">brokenbar_supremacy_by_drift</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">method_names</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">true_prevs</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">estim_prevs</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">tr_prevs</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">n_bins</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">20</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">binning</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">'isomerous'</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">x_error</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">'ae'</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">y_error</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">'ae'</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">ttest_alpha</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">0.005</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">tail_density_threshold</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">0.005</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">method_order</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">savepath</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#quapy.plot.brokenbar_supremacy_by_drift" title="Permalink to this definition"></a></dt>
<dd></dd></dl>
<dl class="py function">
<dt class="sig sig-object py" id="quapy.plot.error_by_drift">
<span class="sig-prename descclassname"><span class="pre">quapy.plot.</span></span><span class="sig-name descname"><span class="pre">error_by_drift</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">method_names</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">true_prevs</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">estim_prevs</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">tr_prevs</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">n_bins</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">20</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">error_name</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">'ae'</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">show_std</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">True</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">logscale</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">False</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">title</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">'Quantification</span> <span class="pre">error</span> <span class="pre">as</span> <span class="pre">a</span> <span class="pre">function</span> <span class="pre">of</span> <span class="pre">distribution</span> <span class="pre">shift'</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">savepath</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#quapy.plot.error_by_drift" title="Permalink to this definition"></a></dt>
<span class="sig-prename descclassname"><span class="pre">quapy.plot.</span></span><span class="sig-name descname"><span class="pre">error_by_drift</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">method_names</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">true_prevs</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">estim_prevs</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">tr_prevs</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">n_bins</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">20</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">error_name</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">'ae'</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">show_std</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">False</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">show_density</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">True</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">logscale</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">False</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">title</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">'Quantification</span> <span class="pre">error</span> <span class="pre">as</span> <span class="pre">a</span> <span class="pre">function</span> <span class="pre">of</span> <span class="pre">distribution</span> <span class="pre">shift'</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">savepath</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">vlines</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">method_order</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#quapy.plot.error_by_drift" title="Permalink to this definition"></a></dt>
<dd></dd></dl>
<dl class="py function">
@ -811,8 +877,8 @@ a float in [0,1] indicating the proportion of labelled data to extract from the
<span class="sig-prename descclassname"><span class="pre">quapy.plot.</span></span><span class="sig-name descname"><span class="pre">save_or_show</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">savepath</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#quapy.plot.save_or_show" title="Permalink to this definition"></a></dt>
<dd></dd></dl>
</div>
<div class="section" id="module-quapy.util">
</section>
<section id="module-quapy.util">
<span id="quapy-util-module"></span><h2>quapy.util module<a class="headerlink" href="#module-quapy.util" title="Permalink to this headline"></a></h2>
<dl class="py class">
<dt class="sig sig-object py" id="quapy.util.EarlyStop">
@ -901,16 +967,16 @@ with temp_seed(random_seed):</p>
</dl>
</dd></dl>
</div>
<div class="section" id="module-quapy">
</section>
<section id="module-quapy">
<span id="module-contents"></span><h2>Module contents<a class="headerlink" href="#module-quapy" title="Permalink to this headline"></a></h2>
<dl class="py function">
<dt class="sig sig-object py" id="quapy.isbinary">
<span class="sig-prename descclassname"><span class="pre">quapy.</span></span><span class="sig-name descname"><span class="pre">isbinary</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">x</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#quapy.isbinary" title="Permalink to this definition"></a></dt>
<dd></dd></dl>
</div>
</div>
</section>
</section>
<div class="clearer"></div>

View File

@ -5,7 +5,8 @@
<html>
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="generator" content="Docutils 0.17.1: http://docutils.sourceforge.net/" />
<title>quapy.method package &#8212; QuaPy 0.1.6 documentation</title>
<link rel="stylesheet" type="text/css" href="_static/pygments.css" />
<link rel="stylesheet" type="text/css" href="_static/bizstyle.css" />
@ -47,12 +48,12 @@
<div class="bodywrapper">
<div class="body" role="main">
<div class="section" id="quapy-method-package">
<section id="quapy-method-package">
<h1>quapy.method package<a class="headerlink" href="#quapy-method-package" title="Permalink to this headline"></a></h1>
<div class="section" id="submodules">
<section id="submodules">
<h2>Submodules<a class="headerlink" href="#submodules" title="Permalink to this headline"></a></h2>
</div>
<div class="section" id="module-quapy.method.aggregative">
</section>
<section id="module-quapy.method.aggregative">
<span id="quapy-method-aggregative-module"></span><h2>quapy.method.aggregative module<a class="headerlink" href="#module-quapy.method.aggregative" title="Permalink to this headline"></a></h2>
<dl class="py class">
<dt class="sig sig-object py" id="quapy.method.aggregative.ACC">
@ -599,8 +600,8 @@ LabelledCollection, represents the validation split itself
or None otherwise) to be used as a validation set for any subsequent parameter fitting</p>
</dd></dl>
</div>
<div class="section" id="module-quapy.method.base">
</section>
<section id="module-quapy.method.base">
<span id="quapy-method-base-module"></span><h2>quapy.method.base module<a class="headerlink" href="#module-quapy.method.base" title="Permalink to this headline"></a></h2>
<dl class="py class">
<dt class="sig sig-object py" id="quapy.method.base.BaseQuantifier">
@ -674,8 +675,8 @@ or None otherwise) to be used as a validation set for any subsequent parameter f
<span class="sig-prename descclassname"><span class="pre">quapy.method.base.</span></span><span class="sig-name descname"><span class="pre">isprobabilistic</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">model</span></span><span class="p"><span class="pre">:</span></span> <span class="n"><a class="reference internal" href="#quapy.method.base.BaseQuantifier" title="quapy.method.base.BaseQuantifier"><span class="pre">quapy.method.base.BaseQuantifier</span></a></span></em><span class="sig-paren">)</span><a class="headerlink" href="#quapy.method.base.isprobabilistic" title="Permalink to this definition"></a></dt>
<dd></dd></dl>
</div>
<div class="section" id="module-quapy.method.meta">
</section>
<section id="module-quapy.method.meta">
<span id="quapy-method-meta-module"></span><h2>quapy.method.meta module<a class="headerlink" href="#module-quapy.method.meta" title="Permalink to this headline"></a></h2>
<dl class="py function">
<dt class="sig sig-object py" id="quapy.method.meta.EACC">
@ -814,8 +815,8 @@ to a first approximation of the test prevalence as made by all models in the ens
<span class="sig-prename descclassname"><span class="pre">quapy.method.meta.</span></span><span class="sig-name descname"><span class="pre">get_probability_distribution</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">posterior_probabilities</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">bins</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">8</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#quapy.method.meta.get_probability_distribution" title="Permalink to this definition"></a></dt>
<dd></dd></dl>
</div>
<div class="section" id="module-quapy.method.neural">
</section>
<section id="module-quapy.method.neural">
<span id="quapy-method-neural-module"></span><h2>quapy.method.neural module<a class="headerlink" href="#module-quapy.method.neural" title="Permalink to this headline"></a></h2>
<dl class="py class">
<dt class="sig sig-object py" id="quapy.method.neural.QuaNetModule">
@ -912,8 +913,8 @@ fit_learner=False, the data will be split in 66/34 for training QuaNet and valid
<span class="sig-prename descclassname"><span class="pre">quapy.method.neural.</span></span><span class="sig-name descname"><span class="pre">mae_loss</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">output</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">target</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#quapy.method.neural.mae_loss" title="Permalink to this definition"></a></dt>
<dd></dd></dl>
</div>
<div class="section" id="module-quapy.method.non_aggregative">
</section>
<section id="module-quapy.method.non_aggregative">
<span id="quapy-method-non-aggregative-module"></span><h2>quapy.method.non_aggregative module<a class="headerlink" href="#module-quapy.method.non_aggregative" title="Permalink to this headline"></a></h2>
<dl class="py class">
<dt class="sig sig-object py" id="quapy.method.non_aggregative.MaximumLikelihoodPrevalenceEstimation">
@ -946,11 +947,11 @@ fit_learner=False, the data will be split in 66/34 for training QuaNet and valid
</dd></dl>
</div>
<div class="section" id="module-quapy.method">
</section>
<section id="module-quapy.method">
<span id="module-contents"></span><h2>Module contents<a class="headerlink" href="#module-quapy.method" title="Permalink to this headline"></a></h2>
</div>
</div>
</section>
</section>
<div class="clearer"></div>

File diff suppressed because one or more lines are too long

View File

@ -1,34 +1,58 @@
from collections import defaultdict
import matplotlib.pyplot as plt
from matplotlib.cm import get_cmap
import numpy as np
from matplotlib import cm
from scipy.stats import ttest_ind_from_stats
import quapy as qp
from matplotlib.font_manager import FontProperties
plt.rcParams['figure.figsize'] = [12, 8]
plt.rcParams['figure.dpi'] = 200
plt.rcParams['font.size'] = 16
def _set_colors(ax, n_methods):
NUM_COLORS = n_methods
cm = plt.get_cmap('tab20')
ax.set_prop_cycle(color=[cm(1. * i / NUM_COLORS) for i in range(NUM_COLORS)])
def binary_diagonal(method_names, true_prevs, estim_prevs, pos_class=1, title=None, show_std=True, legend=True,
train_prev=None, savepath=None):
train_prev=None, savepath=None, method_order=None):
"""
The diagonal plot displays the predicted prevalence values (along the y-axis) as a function of the true prevalence
values (along the x-axis). The optimal quantifier is described by the diagonal (0,0)-(1,1) of the plot (hence the
name). It is convenient for binary quantification problems, though it can be used for multiclass problems by
indicating which class is to be taken as the positive class. (For multiclass quantification problems, other plots
like the :meth:`error_by_drift` might be preferable though).
:param method_names: array-like with the method names for each experiment
:param true_prevs: array-like with the true prevalence values (each being a ndarray with n_classes components) for
each experiment
:param estim_prevs: array-like with the estimated prevalence values (each being a ndarray with n_classes components)
for each experiment
:param pos_class: index of the positive class
:param title: the title to be displayed in the plot
:param show_std: whether or not to show standard deviations (represented by color bands). This might be inconvenient
for cases in which many methods are compared, or when the standard deviations are high -- default True)
:param legend: whether or not to display the leyend (default True)
:param train_prev: if indicated (default is None), the training prevalence (for the positive class) is hightlighted
in the plot. This is convenient when all the experiments have been conducted in the same dataset.
:param savepath: path where to save the plot. If not indicated (as default), the plot is shown.
:param method_order: if indicated (default is None), imposes the order in which the methods are processed (i.e.,
listed in the legend and associated with matplotlib colors).
"""
fig, ax = plt.subplots()
ax.set_aspect('equal')
ax.grid()
ax.plot([0, 1], [0, 1], '--k', label='ideal', zorder=1)
method_names, true_prevs, estim_prevs = _merge(method_names, true_prevs, estim_prevs)
_set_colors(ax, n_methods=len(method_names))
for method, true_prev, estim_prev in zip(method_names, true_prevs, estim_prevs):
order = list(zip(method_names, true_prevs, estim_prevs))
if method_order is not None:
table = {method_name:[true_prev, estim_prev] for method_name, true_prev, estim_prev in order}
order = [(method_name, *table[method_name]) for method_name in method_order]
cm = plt.get_cmap('tab20')
NUM_COLORS = len(method_names)
ax.set_prop_cycle(color=[cm(1. * i / NUM_COLORS) for i in range(NUM_COLORS)])
for method, true_prev, estim_prev in order:
true_prev = true_prev[:,pos_class]
estim_prev = estim_prev[:,pos_class]
@ -50,14 +74,32 @@ def binary_diagonal(method_names, true_prevs, estim_prevs, pos_class=1, title=No
ax.set_xlim(0, 1)
if legend:
box = ax.get_position()
ax.set_position([box.x0, box.y0, box.width * 0.8, box.height])
ax.legend(loc='center left', bbox_to_anchor=(1, 0.5))
# box = ax.get_position()
# ax.set_position([box.x0, box.y0, box.width * 0.8, box.height])
# ax.legend(loc='center left', bbox_to_anchor=(1, 0.5))
# ax.set_position([box.x0, box.y0, box.width * 0.8, box.height])
ax.legend(loc='lower center',
bbox_to_anchor=(1, -0.5),
ncol=(len(method_names)+1)//2)
save_or_show(savepath)
def binary_bias_global(method_names, true_prevs, estim_prevs, pos_class=1, title=None, savepath=None):
"""
Box-plots displaying the global bias (i.e., signed error computed as the estimated value minus the true value)
for each quantification method with respect to a given positive class.
:param method_names: array-like with the method names for each experiment
:param true_prevs: array-like with the true prevalence values (each being a ndarray with n_classes components) for
each experiment
:param estim_prevs: array-like with the estimated prevalence values (each being a ndarray with n_classes components)
for each experiment
:param pos_class: index of the positive class
:param title: the title to be displayed in the plot
:param savepath: path where to save the plot. If not indicated (as default), the plot is shown.
"""
method_names, true_prevs, estim_prevs = _merge(method_names, true_prevs, estim_prevs)
fig, ax = plt.subplots()
@ -79,33 +121,47 @@ def binary_bias_global(method_names, true_prevs, estim_prevs, pos_class=1, title
def binary_bias_bins(method_names, true_prevs, estim_prevs, pos_class=1, title=None, nbins=5, colormap=cm.tab10,
vertical_xticks=False, legend=True, savepath=None):
"""
:param method_names: array-like with the method names for each experiment
:param true_prevs: array-like with the true prevalence values (each being a ndarray with n_classes components) for
each experiment
:param estim_prevs: array-like with the estimated prevalence values (each being a ndarray with n_classes components)
for each experiment
:param pos_class: index of the positive class
:param title: the title to be displayed in the plot
:param nbins: number of bins
:param colormap: the matplotlib colormap to use (default cm.tab10)
:param vertical_xticks:
:param legend: whether or not to display the legend (default is True)
:param savepath: path where to save the plot. If not indicated (as default), the plot is shown.
"""
from pylab import boxplot, plot, setp
fig, ax = plt.subplots()
ax.grid()
method_names, true_prevs, estim_prevs = _merge(method_names, true_prevs, estim_prevs)
_set_colors(ax, n_methods=len(method_names))
bins = np.linspace(0, 1, nbins+1)
binwidth = 1/nbins
data = {}
for method, true_prev, estim_prev in zip(method_names, true_prevs, estim_prevs):
true_prev = true_prev[:, pos_class]
estim_prev = estim_prev[:, pos_class]
true_prev = true_prev[:,pos_class]
estim_prev = estim_prev[:,pos_class]
data[method] = []
inds = np.digitize(true_prev, bins[1:], right=True)
inds = np.digitize(true_prev, bins, right=True)
for ind in range(len(bins)):
selected = inds==ind
data[method].append(estim_prev[selected] - true_prev[selected])
nmethods = len(method_names)
boxwidth = binwidth/(nmethods+4)
for i,bin in enumerate(bins):
for i,bin in enumerate(bins[:-1]):
boxdata = [data[method][i] for method in method_names]
positions = [bin+(i*boxwidth)+2*boxwidth for i,_ in enumerate(method_names)]
box = boxplot(boxdata, showmeans=False, positions=positions, widths=boxwidth, sym='+', patch_artist=True)
box = boxplot(boxdata, showmeans=False, positions=positions, widths = boxwidth, sym='+', patch_artist=True)
for boxid in range(len(method_names)):
c = colormap.colors[boxid%len(colormap.colors)]
setp(box['fliers'][boxid], color=c, marker='+', markersize=3., markeredgecolor=c)
@ -118,7 +174,7 @@ def binary_bias_bins(method_names, true_prevs, estim_prevs, pos_class=1, title=N
major_xticks_positions.append(b)
minor_xticks_positions.append(b + binwidth / 2)
major_xticks_labels.append('')
minor_xticks_labels.append(f'[{bins[i]:.2f}-{bins[i + 1]:.2f}' + (')' if i < len(bins)-2 else ']'))
minor_xticks_labels.append(f'[{bins[i]:.2f}-{bins[i + 1]:.2f})')
ax.set_xticks(major_xticks_positions)
ax.set_xticks(minor_xticks_positions, minor=True)
ax.set_xticklabels(major_xticks_labels)
@ -166,10 +222,19 @@ def _merge(method_names, true_prevs, estim_prevs):
return method_order, true_prevs_, estim_prevs_
def error_by_drift(method_names, true_prevs, estim_prevs, tr_prevs, n_bins=20, error_name='ae', show_std=True,
def _set_colors(ax, n_methods):
NUM_COLORS = n_methods
cm = plt.get_cmap('tab20')
ax.set_prop_cycle(color=[cm(1. * i / NUM_COLORS) for i in range(NUM_COLORS)])
def error_by_drift(method_names, true_prevs, estim_prevs, tr_prevs, n_bins=20, error_name='ae', show_std=False,
show_density=True,
logscale=False,
title=f'Quantification error as a function of distribution shift',
savepath=None):
savepath=None,
vlines=None,
method_order=None):
fig, ax = plt.subplots()
ax.grid()
@ -177,28 +242,17 @@ def error_by_drift(method_names, true_prevs, estim_prevs, tr_prevs, n_bins=20, e
x_error = qp.error.ae
y_error = getattr(qp.error, error_name)
# join all data, and keep the order in which the methods appeared for the first time
data = defaultdict(lambda:{'x':np.empty(shape=(0)), 'y':np.empty(shape=(0))})
method_order = []
for method, test_prevs_i, estim_prevs_i, tr_prev_i in zip(method_names, true_prevs, estim_prevs, tr_prevs):
tr_prev_i = np.repeat(tr_prev_i.reshape(1,-1), repeats=test_prevs_i.shape[0], axis=0)
tr_test_drifts = x_error(test_prevs_i, tr_prev_i)
data[method]['x'] = np.concatenate([data[method]['x'], tr_test_drifts])
method_drifts = y_error(test_prevs_i, estim_prevs_i)
data[method]['y'] = np.concatenate([data[method]['y'], method_drifts])
if method not in method_order:
method_order.append(method)
# get all data as a dictionary {'m':{'x':ndarray, 'y':ndarray}} where 'm' is a method name (in the same
# order as in method_order (if specified), and where 'x' are the train-test shifts (computed as according to
# x_error function) and 'y' is the estim-test shift (computed as according to y_error)
data = __join_data_by_drift(method_names, true_prevs, estim_prevs, tr_prevs, x_error, y_error, method_order)
_set_colors(ax, n_methods=len(method_order))
bins = np.linspace(0, 1, n_bins+1)
inds_histogram_global = np.zeros(n_bins, dtype=np.float) # we use this to keep track of how many datapoits contribute to each bin
binwidth = 1 / n_bins
min_x, max_x = None, None
min_x, max_x, min_y, max_y = None, None, None, None
npoints = np.zeros(len(bins), dtype=float)
for method in method_order:
tr_test_drifts = data[method]['x']
method_drifts = data[method]['y']
@ -206,40 +260,194 @@ def error_by_drift(method_names, true_prevs, estim_prevs, tr_prevs, n_bins=20, e
method_drifts=np.log(1+method_drifts)
inds = np.digitize(tr_test_drifts, bins, right=True)
inds_histogram_global += np.histogram(tr_test_drifts, density=True, bins=bins)[0]
xs, ys, ystds = [], [], []
for ind in range(len(bins)):
for p,ind in enumerate(range(len(bins))):
selected = inds==ind
if selected.sum() > 0:
xs.append(ind*binwidth)
xs.append(ind*binwidth-binwidth/2)
ys.append(np.mean(method_drifts[selected]))
ystds.append(np.std(method_drifts[selected]))
npoints[p] += len(method_drifts[selected])
xs = np.asarray(xs)
ys = np.asarray(ys)
ystds = np.asarray(ystds)
min_x_method, max_x_method = xs.min(), xs.max()
min_x_method, max_x_method, min_y_method, max_y_method = xs.min(), xs.max(), ys.min(), ys.max()
min_x = min_x_method if min_x is None or min_x_method < min_x else min_x
max_x = max_x_method if max_x is None or max_x_method > max_x else max_x
max_y = max_y_method if max_y is None or max_y_method > max_y else max_y
min_y = min_y_method if min_y is None or min_y_method < min_y else min_y
max_y = max_y_method if max_y is None or max_y_method > max_y else max_y
ax.errorbar(xs, ys, fmt='-', marker='o', color='w', markersize=8, linewidth=4, zorder=1)
ax.errorbar(xs, ys, fmt='-', marker='o', label=method, markersize=6, linewidth=2, zorder=2)
ax.errorbar(xs, ys, fmt='-', marker='o', label=method, markersize=3, zorder=2)
if show_std:
ax.fill_between(xs, ys-ystds, ys+ystds, alpha=0.25)
# xs = bins[:-1]
# ys = inds_histogram_global
# print(xs.shape, ys.shape)
# ax.errorbar(xs, ys, label='density')
if show_density:
ax.bar([ind * binwidth-binwidth/2 for ind in range(len(bins))],
max_y*npoints/np.max(npoints), alpha=0.15, color='g', width=binwidth, label='density')
ax.set(xlabel=f'Distribution shift between training set and test sample',
ylabel=f'{error_name.upper()} (true distribution, predicted distribution)',
title=title)
box = ax.get_position()
ax.set_position([box.x0, box.y0, box.width * 0.8, box.height])
if vlines:
for vline in vlines:
ax.axvline(vline, 0, 1, linestyle='--', color='k')
ax.set_xlim(0, max_x)
ax.legend(loc='center left', bbox_to_anchor=(1, 0.5))
ax.set_xlim(min_x, max_x)
save_or_show(savepath)
def brokenbar_supremacy_by_drift(method_names, true_prevs, estim_prevs, tr_prevs, n_bins=20, binning='isomerous',
x_error='ae', y_error='ae', ttest_alpha=0.005, tail_density_threshold=0.005,
method_order=None,
savepath=None):
assert binning in ['isomerous', 'isometric'], 'unknown binning type; valid types are "isomerous" and "isometric"'
x_error = getattr(qp.error, x_error)
y_error = getattr(qp.error, y_error)
# get all data as a dictionary {'m':{'x':ndarray, 'y':ndarray}} where 'm' is a method name (in the same
# order as in method_order (if specified), and where 'x' are the train-test shifts (computed as according to
# x_error function) and 'y' is the estim-test shift (computed as according to y_error)
data = __join_data_by_drift(method_names, true_prevs, estim_prevs, tr_prevs, x_error, y_error, method_order)
if binning == 'isomerous':
# take bins containing the same amount of examples
tr_test_drifts = np.concatenate([data[m]['x'] for m in method_order])
bins = np.quantile(tr_test_drifts, q=np.linspace(0, 1, n_bins+1)).flatten()
else:
# take equidistant bins
bins = np.linspace(0, 1, n_bins+1)
bins[0] = -0.001
bins[-1] += 0.001
# we use this to keep track of how many datapoits contribute to each bin
inds_histogram_global = np.zeros(n_bins, dtype=np.float)
n_methods = len(method_order)
buckets = np.zeros(shape=(n_methods, n_bins, 3))
for i, method in enumerate(method_order):
tr_test_drifts = data[method]['x']
method_drifts = data[method]['y']
inds = np.digitize(tr_test_drifts, bins, right=False)
inds_histogram_global += np.histogram(tr_test_drifts, density=False, bins=bins)[0]
for j in range(len(bins)):
selected = inds == j
if selected.sum() > 0:
buckets[i, j-1, 0] = np.mean(method_drifts[selected])
buckets[i, j-1, 1] = np.std(method_drifts[selected])
buckets[i, j-1, 2] = selected.sum()
# cancel last buckets with low density
histogram = inds_histogram_global / inds_histogram_global.sum()
for tail in reversed(range(len(histogram))):
if histogram[tail] < tail_density_threshold:
buckets[:,tail,2] = 0
else:
break
salient_methods = set()
best_methods = []
for bucket in range(buckets.shape[1]):
nc = buckets[:, bucket, 2].sum()
if nc == 0:
best_methods.append([])
continue
order = np.argsort(buckets[:, bucket, 0])
rank1 = order[0]
best_bucket_methods = [method_order[rank1]]
best_mean, best_std, best_nc = buckets[rank1, bucket, :]
for method_index in order[1:]:
method_mean, method_std, method_nc = buckets[method_index, bucket, :]
_, pval = ttest_ind_from_stats(best_mean, best_std, best_nc, method_mean, method_std, method_nc)
if pval > ttest_alpha:
best_bucket_methods.append(method_order[method_index])
best_methods.append(best_bucket_methods)
salient_methods.update(best_bucket_methods)
print(best_bucket_methods)
if binning=='isomerous':
fig, axes = plt.subplots(2, 1, gridspec_kw={'height_ratios': [0.2, 1]}, figsize=(20, len(salient_methods)))
else:
fig, axes = plt.subplots(2, 1, gridspec_kw={'height_ratios': [1, 1]}, figsize=(20, len(salient_methods)))
ax = axes[1]
high_from = 0
yticks, yticks_method_names = [], []
color = get_cmap('Accent').colors
vlines = []
bar_high = 1
for method in [m for m in method_order if m in salient_methods]:
broken_paths = []
path_start, path_end = None, None
for i, best_bucket_methods in enumerate(best_methods):
if method in best_bucket_methods:
if path_start is None:
path_start = bins[i]
path_end = bins[i+1]-path_start
else:
path_end += bins[i+1]-bins[i]
else:
if path_start is not None:
broken_paths.append(tuple((path_start, path_end)))
path_start, path_end = None, None
if path_start is not None:
broken_paths.append(tuple((path_start, path_end)))
ax.broken_barh(broken_paths, (high_from, bar_high), facecolors=color[len(yticks_method_names)])
yticks.append(high_from+bar_high/2)
high_from += bar_high
yticks_method_names.append(method)
for path_start, path_end in broken_paths:
vlines.extend([path_start, path_start+path_end])
vlines = np.unique(vlines)
vlines = sorted(vlines)
for v in vlines[1:-1]:
ax.axvline(x=v, color='k', linestyle='--')
ax.set_ylim(0, high_from)
ax.set_xlim(vlines[0], vlines[-1])
ax.set_xlabel('Distribution shift between training set and sample')
ax.set_yticks(yticks)
ax.set_yticklabels(yticks_method_names)
# upper plot (explaining distribution)
ax = axes[0]
if binning == 'isometric':
# show the density for each region
bins[0]=0
y_pos = [b+(bins[i+1]-b)/2 for i,b in enumerate(bins[:-1]) if histogram[i]>0]
bar_width = [bins[i+1]-bins[i] for i in range(len(bins[:-1])) if histogram[i]>0]
ax.bar(y_pos, [n for n in histogram if n>0], bar_width, align='center', alpha=0.5, color='silver')
ax.set_ylabel('shift\ndistribution', rotation=0, ha='right', va='center')
ax.set_xlim(vlines[0], vlines[-1])
ax.get_xaxis().set_visible(False)
plt.subplots_adjust(wspace=0, hspace=0.1)
else:
# show the percentiles of the distribution
cumsum = np.cumsum(histogram)
for i in range(len(bins[:-1])):
start, width = bins[i], bins[i+1]-bins[i]
ax.broken_barh([tuple((start, width))], (0, 1), facecolors='whitesmoke' if i%2==0 else 'silver')
if i < len(bins)-2:
ax.text(bins[i+1], 0.5, '$P_{'+f'{int(np.round(cumsum[i]*100))}'+'}$', ha='center')
ax.set_ylim(0, 1)
ax.set_xlim(vlines[0], vlines[-1])
ax.get_yaxis().set_visible(False)
ax.get_xaxis().set_visible(False)
plt.subplots_adjust(wspace=0, hspace=0)
save_or_show(savepath)
@ -253,3 +461,23 @@ def save_or_show(savepath):
else:
plt.show()
def __join_data_by_drift(method_names, true_prevs, estim_prevs, tr_prevs, x_error, y_error, method_order):
data = defaultdict(lambda: {'x': np.empty(shape=(0)), 'y': np.empty(shape=(0))})
if method_order is None:
method_order = []
for method, test_prevs_i, estim_prevs_i, tr_prev_i in zip(method_names, true_prevs, estim_prevs, tr_prevs):
tr_prev_i = np.repeat(tr_prev_i.reshape(1, -1), repeats=test_prevs_i.shape[0], axis=0)
tr_test_drifts = x_error(test_prevs_i, tr_prev_i)
data[method]['x'] = np.concatenate([data[method]['x'], tr_test_drifts])
method_drifts = y_error(test_prevs_i, estim_prevs_i)
data[method]['y'] = np.concatenate([data[method]['y'], method_drifts])
if method not in method_order:
method_order.append(method)
return data