Compare commits
50 Commits
|
|
@ -6,6 +6,8 @@ on:
|
|||
branches:
|
||||
- master
|
||||
- devel
|
||||
tags:
|
||||
- "[0-9]+.[0-9]+.[0-9]+"
|
||||
|
||||
jobs:
|
||||
|
||||
|
|
@ -28,7 +30,7 @@ jobs:
|
|||
- name: Install dependencies
|
||||
run: |
|
||||
python -m pip install --upgrade pip setuptools wheel
|
||||
python -m pip install "qunfold @ git+https://github.com/mirkobunse/qunfold@v0.1.4"
|
||||
python -m pip install "qunfold @ git+https://github.com/mirkobunse/qunfold@main"
|
||||
python -m pip install -e .[bayes,tests]
|
||||
- name: Test with unittest
|
||||
run: python -m unittest
|
||||
|
|
@ -47,7 +49,7 @@ jobs:
|
|||
- name: Install dependencies
|
||||
run: |
|
||||
python -m pip install --upgrade pip setuptools wheel "jax[cpu]"
|
||||
python -m pip install "qunfold @ git+https://github.com/mirkobunse/qunfold@v0.1.4"
|
||||
python -m pip install "qunfold @ git+https://github.com/mirkobunse/qunfold@main"
|
||||
python -m pip install -e .[neural,docs]
|
||||
- name: Build documentation
|
||||
run: sphinx-build -M html docs/source docs/build
|
||||
|
|
@ -66,3 +68,41 @@ jobs:
|
|||
branch: gh-pages
|
||||
directory: __gh-pages/
|
||||
github_token: ${{ secrets.GITHUB_TOKEN }}
|
||||
|
||||
release:
|
||||
name: Build & Publish Release
|
||||
runs-on: ubuntu-latest
|
||||
if: startsWith(github.ref, 'refs/tags/')
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
- name: Set up Python
|
||||
uses: actions/setup-python@v5
|
||||
with:
|
||||
python-version: "3.11"
|
||||
- name: Install build dependencies
|
||||
run: |
|
||||
python -m pip install --upgrade pip build twine
|
||||
- name: Build package
|
||||
run: python -m build
|
||||
- name: Publish to TestPyPI
|
||||
uses: pypa/gh-action-pypi-publish@release/v1
|
||||
with:
|
||||
user: __token__
|
||||
# use these for TESTs!
|
||||
# password: ${{ secrets.TEST_PYPI_API_TOKEN }}
|
||||
# repository_url: https://test.pypi.org/legacy/
|
||||
password: ${{ secrets.PYPI_API_TOKEN }}
|
||||
repository_url: https://upload.pypi.org/legacy/
|
||||
- name: Create GitHub Release
|
||||
id: create_release
|
||||
uses: actions/create-release@v1
|
||||
with:
|
||||
tag_name: ${{ github.ref_name }}
|
||||
release_name: Release ${{ github.ref_name }}
|
||||
body: |
|
||||
Changes in this release:
|
||||
- see commit history for details
|
||||
draft: false
|
||||
prerelease: false
|
||||
env:
|
||||
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
|
||||
|
|
@ -73,7 +73,8 @@ instance/
|
|||
.vscode/
|
||||
|
||||
# Sphinx documentation
|
||||
docs/_build/
|
||||
docs/_build/doctest
|
||||
docs/_build/doctrees
|
||||
|
||||
# PyBuilder
|
||||
target/
|
||||
|
|
|
|||
|
|
@ -1,12 +1,10 @@
|
|||
Change Log 0.1.10
|
||||
Change Log 0.2.0
|
||||
-----------------
|
||||
|
||||
CLEAN TODO-FILE
|
||||
|
||||
- Base code Refactor:
|
||||
- Removing coupling between LabelledCollection and quantification methods; the fit interface changes:
|
||||
def fit(data:LabelledCollection): -> def fit(X, y):
|
||||
- Adding function "predict" (function "quantify" is still present as an alias)
|
||||
- Adding function "predict" (function "quantify" is still present as an alias, for the nostalgic)
|
||||
- Aggregative methods's behavior in terms of fit_classifier and how to treat the val_split is now
|
||||
indicated exclusively at construction time, and it is no longer possible to indicate it at fit time.
|
||||
This is because, in v<=0.1.9, one could create a method (e.g., ACC) and then indicate:
|
||||
|
|
@ -21,15 +19,16 @@ CLEAN TODO-FILE
|
|||
- A new parameter "on_calib_error" is passed to the constructor, which informs of the policy to follow
|
||||
in case the abstention's calibration functions failed (which happens sometimes). Options include:
|
||||
- 'raise': raises a RuntimeException (default)
|
||||
- 'backup': reruns avoiding calibration
|
||||
- 'backup': reruns by silently avoiding calibration
|
||||
- Parameter "recalib" has been renamed "calib"
|
||||
- Added aggregative bootstrap for deriving confidence regions (confidence intervals, ellipses in the simplex, or
|
||||
ellipses in the CLR space). This method is efficient as it leverages the two-phases of the aggregative quantifiers.
|
||||
This method applies resampling only to the aggregation phase, thus avoiding to train many quantifiers, or
|
||||
classify multiple times the instances of a sample. See:
|
||||
- quapy/method/confidence.py (new)
|
||||
- the new example no. 15.
|
||||
- BayesianCC moved to confidence.py, where methods having to do with confidence intervals live
|
||||
- the new example no. 16.confidence_regions.py
|
||||
- BayesianCC moved to confidence.py, where methods having to do with confidence intervals belong.
|
||||
- Improved documentation of qp.plot module.
|
||||
|
||||
|
||||
Change Log 0.1.9
|
||||
|
|
|
|||
38
README.md
38
README.md
|
|
@ -13,8 +13,9 @@ for facilitating the analysis and interpretation of the experimental results.
|
|||
|
||||
### Last updates:
|
||||
|
||||
* Version 0.1.9 is released! major changes can be consulted [here](CHANGE_LOG.txt).
|
||||
* The developer API documentation is available [here](https://hlt-isti.github.io/QuaPy/build/html/modules.html)
|
||||
* Version 0.2.0 is released! major changes can be consulted [here](CHANGE_LOG.txt).
|
||||
* The developer API documentation is available [here](https://hlt-isti.github.io/QuaPy/index.html)
|
||||
* Manuals are available [here](https://hlt-isti.github.io/QuaPy/manuals.html)
|
||||
|
||||
### Installation
|
||||
|
||||
|
|
@ -46,12 +47,12 @@ of the test set.
|
|||
```python
|
||||
import quapy as qp
|
||||
|
||||
dataset = qp.datasets.fetch_UCIBinaryDataset("yeast")
|
||||
training, test = dataset.train_test
|
||||
training, test = qp.datasets.fetch_UCIBinaryDataset("yeast").train_test
|
||||
|
||||
# create an "Adjusted Classify & Count" quantifier
|
||||
model = qp.method.aggregative.ACC()
|
||||
model.fit(training)
|
||||
Xtr, ytr = training.Xy
|
||||
model.fit(Xtr, ytr)
|
||||
|
||||
estim_prevalence = model.predict(test.X)
|
||||
true_prevalence = test.prevalence()
|
||||
|
|
@ -67,7 +68,7 @@ class prevalence of the training set. For this reason, any quantification model
|
|||
should be tested across many samples, even ones characterized by class prevalence
|
||||
values different or very different from those found in the training set.
|
||||
QuaPy implements sampling procedures and evaluation protocols that automate this workflow.
|
||||
See the [documentation](https://hlt-isti.github.io/QuaPy/build/html/) for detailed examples.
|
||||
See the [documentation](https://hlt-isti.github.io/QuaPy/manuals.html) for detailed examples.
|
||||
|
||||
## Features
|
||||
|
||||
|
|
@ -79,7 +80,8 @@ quantification methods based on structured output learning, HDy, QuaNet, quantif
|
|||
* 32 UCI Machine Learning datasets.
|
||||
* 11 Twitter quantification-by-sentiment datasets.
|
||||
* 3 product reviews quantification-by-sentiment datasets.
|
||||
* 4 tasks from LeQua competition (_new in v0.1.7!_)
|
||||
* 4 tasks from LeQua 2022 competition and 4 tasks from LeQua 2024 competition
|
||||
* IFCB for Plancton quantification
|
||||
* Native support for binary and single-label multiclass quantification scenarios.
|
||||
* Model selection functionality that minimizes quantification-oriented loss functions.
|
||||
* Visualization tools for analysing the experimental results.
|
||||
|
|
@ -100,19 +102,23 @@ In case you want to contribute improvements to quapy, please generate pull reque
|
|||
|
||||
## Documentation
|
||||
|
||||
The [developer API documentation](https://hlt-isti.github.io/QuaPy/build/html/modules.html) is available [here](https://hlt-isti.github.io/QuaPy/build/html/index.html).
|
||||
Check out the [developer API documentation here](https://hlt-isti.github.io/QuaPy/index.html).
|
||||
|
||||
Check out our [Wiki](https://github.com/HLT-ISTI/QuaPy/wiki), in which many examples
|
||||
Check out the [Manuals](https://hlt-isti.github.io/QuaPy/manuals.html), in which many code examples
|
||||
are provided:
|
||||
|
||||
* [Datasets](https://github.com/HLT-ISTI/QuaPy/wiki/Datasets)
|
||||
* [Evaluation](https://github.com/HLT-ISTI/QuaPy/wiki/Evaluation)
|
||||
* [Protocols](https://github.com/HLT-ISTI/QuaPy/wiki/Protocols)
|
||||
* [Methods](https://github.com/HLT-ISTI/QuaPy/wiki/Methods)
|
||||
* [SVMperf](https://github.com/HLT-ISTI/QuaPy/wiki/ExplicitLossMinimization)
|
||||
* [Model Selection](https://github.com/HLT-ISTI/QuaPy/wiki/Model-Selection)
|
||||
* [Plotting](https://github.com/HLT-ISTI/QuaPy/wiki/Plotting)
|
||||
* [Datasets](https://hlt-isti.github.io/QuaPy/manuals/datasets.html)
|
||||
* [Evaluation](https://hlt-isti.github.io/QuaPy/manuals/evaluation.html)
|
||||
* [Protocols](https://hlt-isti.github.io/QuaPy/manuals/protocols.html)
|
||||
* [Methods](https://hlt-isti.github.io/QuaPy/manuals/methods.html)
|
||||
* [SVMperf](https://hlt-isti.github.io/QuaPy/manuals/explicit-loss-minimization.html)
|
||||
* [Model Selection](https://hlt-isti.github.io/QuaPy/manuals/model-selection.html)
|
||||
* [Plotting](https://hlt-isti.github.io/QuaPy/manuals/plotting.html)
|
||||
|
||||
## Acknowledgments:
|
||||
|
||||
<img src="docs/source/SoBigData.png" alt="SoBigData++" width="250"/>
|
||||
|
||||
This work has been supported by the QuaDaSh project
|
||||
_"Finanziato dall’Unione europea---Next Generation EU,
|
||||
Missione 4 Componente 2 CUP B53D23026250001"_.
|
||||
|
|
|
|||
41
TODO.txt
41
TODO.txt
|
|
@ -1,51 +1,19 @@
|
|||
Adapt examples; remaining: example 4-onwards
|
||||
not working: 4, 4b, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16
|
||||
Solve the warnings issue; right now there is a warning ignore in method/__init__.py:
|
||||
|
||||
Add 'platt' to calib options in EMQ?
|
||||
|
||||
Allow n_prevpoints in APP to be specified by a user-defined grid?
|
||||
|
||||
Update READMEs, wiki, & examples for new fit-predict interface
|
||||
|
||||
Add the fix suggested by Alexander:
|
||||
|
||||
For a more general application, I would maybe first establish a per-class threshold value of plausible prevalence
|
||||
Add the fix suggested by Alexander?
|
||||
"For a more general application, I would maybe first establish a per-class threshold value of plausible prevalence
|
||||
based on the number of actual positives and the required sample size; e.g., for sample_size=100 and actual
|
||||
positives [10, 100, 500] -> [0.1, 1.0, 1.0], meaning that class 0 can be sampled at most at 0.1 prevalence, while
|
||||
the others can be sampled up to 1. prevalence. Then, when a prevalence value is requested, e.g., [0.33, 0.33, 0.33],
|
||||
we may either clip each value and normalize (as you suggest for the extreme case, e.g., [0.1, 0.33, 0.33]/sum) or
|
||||
scale each value by per-class thresholds, i.e., [0.33*0.1, 0.33*1, 0.33*1]/sum.
|
||||
scale each value by per-class thresholds, i.e., [0.33*0.1, 0.33*1, 0.33*1]/sum."
|
||||
- This affects LabelledCollection
|
||||
- This functionality should be accessible via sampling protocols and evaluation functions
|
||||
|
||||
Solve the pre-trained classifier issues. An example is the coptic-codes script I did, which needed a mock_lr to
|
||||
work for having access to classes_; think also the case in which the precomputed outputs are already generated
|
||||
as in the unifying problems code.
|
||||
|
||||
Para quitar el labelledcollection de los métodos:
|
||||
|
||||
- El follón viene por la semántica confusa de fit en agregativos, que recibe 3 parámetros:
|
||||
- data: LabelledCollection, que puede ser:
|
||||
- el training set si hay que entrenar el clasificador
|
||||
- None si no hay que entregar el clasificador
|
||||
- el validation, que entra en conflicto con val_split, si no hay que entrenar clasificador
|
||||
- fit_classifier: dice si hay que entrenar el clasificador o no, y estos cambia la semántica de los otros
|
||||
- val_split: que puede ser:
|
||||
- un número: el número de kfcv, lo cual implica fit_classifier=True y data=todo el training set
|
||||
- una fración en [0,1]: que indica la parte que usamos para validation; implica fit_classifier=True y data=train+val
|
||||
- un labelled collection: el conjunto de validación específico; no implica fit_classifier=True ni False
|
||||
- La forma de quitar la dependencia de los métodos con LabelledCollection debería ser así:
|
||||
- En el constructor se dice si el clasificador que se recibe por parámetro hay que entrenarlo o ya está entrenado;
|
||||
es decir, hay un fit_classifier=True o False.
|
||||
- fit_classifier=True:
|
||||
- data en fit es todo el training incluyendo el validation y todo
|
||||
- val_split:
|
||||
- int: número de folds en kfcv
|
||||
- proporción en [0,1]
|
||||
- fit_classifier=False:
|
||||
|
||||
|
||||
|
||||
- [TODO] document confidence in manuals
|
||||
- [TODO] Test the return_type="index" in protocols and finish the "distributing_samples.py" example
|
||||
- [TODO] Add EDy (an implementation is available at quantificationlib)
|
||||
|
|
@ -55,3 +23,4 @@ Para quitar el labelledcollection de los métodos:
|
|||
- [TODO] add Friedman's method and DeBias
|
||||
- [TODO] check ignore warning stuff
|
||||
check https://docs.python.org/3/library/warnings.html#temporarily-suppressing-warnings
|
||||
- [TODO] nmd and md are not selectable from qp.evaluation.evaluate as a string
|
||||
|
|
@ -0,0 +1,124 @@
|
|||
<!DOCTYPE html>
|
||||
<html class="writer-html5" lang="en">
|
||||
<head>
|
||||
<meta charset="utf-8" />
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
|
||||
<title>Overview: module code — QuaPy: A Python-based open-source framework for quantification 0.1.8 documentation</title>
|
||||
<link rel="stylesheet" type="text/css" href="../_static/pygments.css" />
|
||||
<link rel="stylesheet" type="text/css" href="../_static/css/theme.css" />
|
||||
|
||||
|
||||
<!--[if lt IE 9]>
|
||||
<script src="../_static/js/html5shiv.min.js"></script>
|
||||
<![endif]-->
|
||||
|
||||
<script data-url_root="../" id="documentation_options" src="../_static/documentation_options.js"></script>
|
||||
<script src="../_static/jquery.js"></script>
|
||||
<script src="../_static/underscore.js"></script>
|
||||
<script src="../_static/_sphinx_javascript_frameworks_compat.js"></script>
|
||||
<script src="../_static/doctools.js"></script>
|
||||
<script src="../_static/sphinx_highlight.js"></script>
|
||||
<script src="../_static/js/theme.js"></script>
|
||||
<link rel="index" title="Index" href="../genindex.html" />
|
||||
<link rel="search" title="Search" href="../search.html" />
|
||||
</head>
|
||||
|
||||
<body class="wy-body-for-nav">
|
||||
<div class="wy-grid-for-nav">
|
||||
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
|
||||
<div class="wy-side-scroll">
|
||||
<div class="wy-side-nav-search" >
|
||||
|
||||
|
||||
|
||||
<a href="../index.html" class="icon icon-home">
|
||||
QuaPy: A Python-based open-source framework for quantification
|
||||
</a>
|
||||
<div role="search">
|
||||
<form id="rtd-search-form" class="wy-form" action="../search.html" method="get">
|
||||
<input type="text" name="q" placeholder="Search docs" aria-label="Search docs" />
|
||||
<input type="hidden" name="check_keywords" value="yes" />
|
||||
<input type="hidden" name="area" value="default" />
|
||||
</form>
|
||||
</div>
|
||||
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../modules.html">quapy</a></li>
|
||||
</ul>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
</nav>
|
||||
|
||||
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" >
|
||||
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
|
||||
<a href="../index.html">QuaPy: A Python-based open-source framework for quantification</a>
|
||||
</nav>
|
||||
|
||||
<div class="wy-nav-content">
|
||||
<div class="rst-content">
|
||||
<div role="navigation" aria-label="Page navigation">
|
||||
<ul class="wy-breadcrumbs">
|
||||
<li><a href="../index.html" class="icon icon-home" aria-label="Home"></a></li>
|
||||
<li class="breadcrumb-item active">Overview: module code</li>
|
||||
<li class="wy-breadcrumbs-aside">
|
||||
</li>
|
||||
</ul>
|
||||
<hr/>
|
||||
</div>
|
||||
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
|
||||
<div itemprop="articleBody">
|
||||
|
||||
<h1>All modules for which code is available</h1>
|
||||
<ul><li><a href="quapy/classification/calibration.html">quapy.classification.calibration</a></li>
|
||||
<li><a href="quapy/classification/methods.html">quapy.classification.methods</a></li>
|
||||
<li><a href="quapy/classification/neural.html">quapy.classification.neural</a></li>
|
||||
<li><a href="quapy/classification/svmperf.html">quapy.classification.svmperf</a></li>
|
||||
<li><a href="quapy/data/base.html">quapy.data.base</a></li>
|
||||
<li><a href="quapy/data/datasets.html">quapy.data.datasets</a></li>
|
||||
<li><a href="quapy/data/preprocessing.html">quapy.data.preprocessing</a></li>
|
||||
<li><a href="quapy/data/reader.html">quapy.data.reader</a></li>
|
||||
<li><a href="quapy/error.html">quapy.error</a></li>
|
||||
<li><a href="quapy/evaluation.html">quapy.evaluation</a></li>
|
||||
<li><a href="quapy/functional.html">quapy.functional</a></li>
|
||||
<li><a href="quapy/method/_kdey.html">quapy.method._kdey</a></li>
|
||||
<li><a href="quapy/method/_neural.html">quapy.method._neural</a></li>
|
||||
<li><a href="quapy/method/_threshold_optim.html">quapy.method._threshold_optim</a></li>
|
||||
<li><a href="quapy/method/aggregative.html">quapy.method.aggregative</a></li>
|
||||
<li><a href="quapy/method/base.html">quapy.method.base</a></li>
|
||||
<li><a href="quapy/method/meta.html">quapy.method.meta</a></li>
|
||||
<li><a href="quapy/method/non_aggregative.html">quapy.method.non_aggregative</a></li>
|
||||
<li><a href="quapy/model_selection.html">quapy.model_selection</a></li>
|
||||
<li><a href="quapy/plot.html">quapy.plot</a></li>
|
||||
<li><a href="quapy/protocol.html">quapy.protocol</a></li>
|
||||
<li><a href="quapy/util.html">quapy.util</a></li>
|
||||
</ul>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
<footer>
|
||||
|
||||
<hr/>
|
||||
|
||||
<div role="contentinfo">
|
||||
<p>© Copyright 2024, Alejandro Moreo.</p>
|
||||
</div>
|
||||
|
||||
Built with <a href="https://www.sphinx-doc.org/">Sphinx</a> using a
|
||||
<a href="https://github.com/readthedocs/sphinx_rtd_theme">theme</a>
|
||||
provided by <a href="https://readthedocs.org">Read the Docs</a>.
|
||||
|
||||
|
||||
</footer>
|
||||
</div>
|
||||
</div>
|
||||
</section>
|
||||
</div>
|
||||
<script>
|
||||
jQuery(function () {
|
||||
SphinxRtdTheme.Navigation.enable(true);
|
||||
});
|
||||
</script>
|
||||
|
||||
</body>
|
||||
</html>
|
||||
|
|
@ -0,0 +1,319 @@
|
|||
<!DOCTYPE html>
|
||||
<html class="writer-html5" lang="en">
|
||||
<head>
|
||||
<meta charset="utf-8" />
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
|
||||
<title>quapy.classification.calibration — QuaPy: A Python-based open-source framework for quantification 0.1.8 documentation</title>
|
||||
<link rel="stylesheet" type="text/css" href="../../../_static/pygments.css" />
|
||||
<link rel="stylesheet" type="text/css" href="../../../_static/css/theme.css" />
|
||||
|
||||
|
||||
<!--[if lt IE 9]>
|
||||
<script src="../../../_static/js/html5shiv.min.js"></script>
|
||||
<![endif]-->
|
||||
|
||||
<script data-url_root="../../../" id="documentation_options" src="../../../_static/documentation_options.js"></script>
|
||||
<script src="../../../_static/jquery.js"></script>
|
||||
<script src="../../../_static/underscore.js"></script>
|
||||
<script src="../../../_static/_sphinx_javascript_frameworks_compat.js"></script>
|
||||
<script src="../../../_static/doctools.js"></script>
|
||||
<script src="../../../_static/sphinx_highlight.js"></script>
|
||||
<script src="../../../_static/js/theme.js"></script>
|
||||
<link rel="index" title="Index" href="../../../genindex.html" />
|
||||
<link rel="search" title="Search" href="../../../search.html" />
|
||||
</head>
|
||||
|
||||
<body class="wy-body-for-nav">
|
||||
<div class="wy-grid-for-nav">
|
||||
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
|
||||
<div class="wy-side-scroll">
|
||||
<div class="wy-side-nav-search" >
|
||||
|
||||
|
||||
|
||||
<a href="../../../index.html" class="icon icon-home">
|
||||
QuaPy: A Python-based open-source framework for quantification
|
||||
</a>
|
||||
<div role="search">
|
||||
<form id="rtd-search-form" class="wy-form" action="../../../search.html" method="get">
|
||||
<input type="text" name="q" placeholder="Search docs" aria-label="Search docs" />
|
||||
<input type="hidden" name="check_keywords" value="yes" />
|
||||
<input type="hidden" name="area" value="default" />
|
||||
</form>
|
||||
</div>
|
||||
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../../../modules.html">quapy</a></li>
|
||||
</ul>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
</nav>
|
||||
|
||||
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" >
|
||||
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
|
||||
<a href="../../../index.html">QuaPy: A Python-based open-source framework for quantification</a>
|
||||
</nav>
|
||||
|
||||
<div class="wy-nav-content">
|
||||
<div class="rst-content">
|
||||
<div role="navigation" aria-label="Page navigation">
|
||||
<ul class="wy-breadcrumbs">
|
||||
<li><a href="../../../index.html" class="icon icon-home" aria-label="Home"></a></li>
|
||||
<li class="breadcrumb-item"><a href="../../index.html">Module code</a></li>
|
||||
<li class="breadcrumb-item active">quapy.classification.calibration</li>
|
||||
<li class="wy-breadcrumbs-aside">
|
||||
</li>
|
||||
</ul>
|
||||
<hr/>
|
||||
</div>
|
||||
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
|
||||
<div itemprop="articleBody">
|
||||
|
||||
<h1>Source code for quapy.classification.calibration</h1><div class="highlight"><pre>
|
||||
<span></span><span class="kn">from</span> <span class="nn">copy</span> <span class="kn">import</span> <span class="n">deepcopy</span>
|
||||
|
||||
<span class="kn">from</span> <span class="nn">abstention.calibration</span> <span class="kn">import</span> <span class="n">NoBiasVectorScaling</span><span class="p">,</span> <span class="n">TempScaling</span><span class="p">,</span> <span class="n">VectorScaling</span>
|
||||
<span class="kn">from</span> <span class="nn">sklearn.base</span> <span class="kn">import</span> <span class="n">BaseEstimator</span><span class="p">,</span> <span class="n">clone</span>
|
||||
<span class="kn">from</span> <span class="nn">sklearn.model_selection</span> <span class="kn">import</span> <span class="n">cross_val_predict</span><span class="p">,</span> <span class="n">train_test_split</span>
|
||||
<span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span>
|
||||
|
||||
|
||||
<span class="c1"># Wrappers of calibration defined by Alexandari et al. in paper <http://proceedings.mlr.press/v119/alexandari20a.html></span>
|
||||
<span class="c1"># requires "pip install abstension"</span>
|
||||
<span class="c1"># see https://github.com/kundajelab/abstention</span>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="RecalibratedProbabilisticClassifier"><a class="viewcode-back" href="../../../quapy.classification.html#quapy.classification.calibration.RecalibratedProbabilisticClassifier">[docs]</a><span class="k">class</span> <span class="nc">RecalibratedProbabilisticClassifier</span><span class="p">:</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Abstract class for (re)calibration method from `abstention.calibration`, as defined in</span>
|
||||
<span class="sd"> `Alexandari, A., Kundaje, A., & Shrikumar, A. (2020, November). Maximum likelihood with bias-corrected calibration</span>
|
||||
<span class="sd"> is hard-to-beat at label shift adaptation. In International Conference on Machine Learning (pp. 222-232). PMLR.</span>
|
||||
<span class="sd"> <http://proceedings.mlr.press/v119/alexandari20a.html>`_:</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">pass</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="RecalibratedProbabilisticClassifierBase"><a class="viewcode-back" href="../../../quapy.classification.html#quapy.classification.calibration.RecalibratedProbabilisticClassifierBase">[docs]</a><span class="k">class</span> <span class="nc">RecalibratedProbabilisticClassifierBase</span><span class="p">(</span><span class="n">BaseEstimator</span><span class="p">,</span> <span class="n">RecalibratedProbabilisticClassifier</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Applies a (re)calibration method from `abstention.calibration`, as defined in</span>
|
||||
<span class="sd"> `Alexandari et al. paper <http://proceedings.mlr.press/v119/alexandari20a.html>`_.</span>
|
||||
|
||||
|
||||
<span class="sd"> :param classifier: a scikit-learn probabilistic classifier</span>
|
||||
<span class="sd"> :param calibrator: the calibration object (an instance of abstention.calibration.CalibratorFactory)</span>
|
||||
<span class="sd"> :param val_split: indicate an integer k for performing kFCV to obtain the posterior probabilities, or a float p</span>
|
||||
<span class="sd"> in (0,1) to indicate that the posteriors are obtained in a stratified validation split containing p% of the</span>
|
||||
<span class="sd"> training instances (the rest is used for training). In any case, the classifier is retrained in the whole</span>
|
||||
<span class="sd"> training set afterwards. Default value is 5.</span>
|
||||
<span class="sd"> :param n_jobs: indicate the number of parallel workers (only when val_split is an integer); default=None</span>
|
||||
<span class="sd"> :param verbose: whether or not to display information in the standard output</span>
|
||||
<span class="sd"> """</span>
|
||||
|
||||
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">classifier</span><span class="p">,</span> <span class="n">calibrator</span><span class="p">,</span> <span class="n">val_split</span><span class="o">=</span><span class="mi">5</span><span class="p">,</span> <span class="n">n_jobs</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">verbose</span><span class="o">=</span><span class="kc">False</span><span class="p">):</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">classifier</span> <span class="o">=</span> <span class="n">classifier</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">calibrator</span> <span class="o">=</span> <span class="n">calibrator</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">val_split</span> <span class="o">=</span> <span class="n">val_split</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">n_jobs</span> <span class="o">=</span> <span class="n">n_jobs</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">verbose</span> <span class="o">=</span> <span class="n">verbose</span>
|
||||
|
||||
<div class="viewcode-block" id="RecalibratedProbabilisticClassifierBase.fit"><a class="viewcode-back" href="../../../quapy.classification.html#quapy.classification.calibration.RecalibratedProbabilisticClassifierBase.fit">[docs]</a> <span class="k">def</span> <span class="nf">fit</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">X</span><span class="p">,</span> <span class="n">y</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Fits the calibration for the probabilistic classifier.</span>
|
||||
|
||||
<span class="sd"> :param X: array-like of shape `(n_samples, n_features)` with the data instances</span>
|
||||
<span class="sd"> :param y: array-like of shape `(n_samples,)` with the class labels</span>
|
||||
<span class="sd"> :return: self</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="n">k</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">val_split</span>
|
||||
<span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">k</span><span class="p">,</span> <span class="nb">int</span><span class="p">):</span>
|
||||
<span class="k">if</span> <span class="n">k</span> <span class="o"><</span> <span class="mi">2</span><span class="p">:</span>
|
||||
<span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span><span class="s1">'wrong value for val_split: the number of folds must be > 2'</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">fit_cv</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span>
|
||||
<span class="k">elif</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">k</span><span class="p">,</span> <span class="nb">float</span><span class="p">):</span>
|
||||
<span class="k">if</span> <span class="ow">not</span> <span class="p">(</span><span class="mi">0</span> <span class="o"><</span> <span class="n">k</span> <span class="o"><</span> <span class="mi">1</span><span class="p">):</span>
|
||||
<span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span><span class="s1">'wrong value for val_split: the proportion of validation documents must be in (0,1)'</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">fit_tr_val</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span></div>
|
||||
|
||||
<div class="viewcode-block" id="RecalibratedProbabilisticClassifierBase.fit_cv"><a class="viewcode-back" href="../../../quapy.classification.html#quapy.classification.calibration.RecalibratedProbabilisticClassifierBase.fit_cv">[docs]</a> <span class="k">def</span> <span class="nf">fit_cv</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">X</span><span class="p">,</span> <span class="n">y</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Fits the calibration in a cross-validation manner, i.e., it generates posterior probabilities for all</span>
|
||||
<span class="sd"> training instances via cross-validation, and then retrains the classifier on all training instances.</span>
|
||||
<span class="sd"> The posterior probabilities thus generated are used for calibrating the outputs of the classifier.</span>
|
||||
|
||||
<span class="sd"> :param X: array-like of shape `(n_samples, n_features)` with the data instances</span>
|
||||
<span class="sd"> :param y: array-like of shape `(n_samples,)` with the class labels</span>
|
||||
<span class="sd"> :return: self</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="n">posteriors</span> <span class="o">=</span> <span class="n">cross_val_predict</span><span class="p">(</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">classifier</span><span class="p">,</span> <span class="n">X</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="n">cv</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">val_split</span><span class="p">,</span> <span class="n">n_jobs</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">n_jobs</span><span class="p">,</span> <span class="n">verbose</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">verbose</span><span class="p">,</span> <span class="n">method</span><span class="o">=</span><span class="s1">'predict_proba'</span>
|
||||
<span class="p">)</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">classifier</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span>
|
||||
<span class="n">nclasses</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">unique</span><span class="p">(</span><span class="n">y</span><span class="p">))</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">calibration_function</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">calibrator</span><span class="p">(</span><span class="n">posteriors</span><span class="p">,</span> <span class="n">np</span><span class="o">.</span><span class="n">eye</span><span class="p">(</span><span class="n">nclasses</span><span class="p">)[</span><span class="n">y</span><span class="p">],</span> <span class="n">posterior_supplied</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="bp">self</span></div>
|
||||
|
||||
<div class="viewcode-block" id="RecalibratedProbabilisticClassifierBase.fit_tr_val"><a class="viewcode-back" href="../../../quapy.classification.html#quapy.classification.calibration.RecalibratedProbabilisticClassifierBase.fit_tr_val">[docs]</a> <span class="k">def</span> <span class="nf">fit_tr_val</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">X</span><span class="p">,</span> <span class="n">y</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Fits the calibration in a train/val-split manner, i.e.t, it partitions the training instances into a</span>
|
||||
<span class="sd"> training and a validation set, and then uses the training samples to learn classifier which is then used</span>
|
||||
<span class="sd"> to generate posterior probabilities for the held-out validation data. These posteriors are used to calibrate</span>
|
||||
<span class="sd"> the classifier. The classifier is not retrained on the whole dataset.</span>
|
||||
|
||||
<span class="sd"> :param X: array-like of shape `(n_samples, n_features)` with the data instances</span>
|
||||
<span class="sd"> :param y: array-like of shape `(n_samples,)` with the class labels</span>
|
||||
<span class="sd"> :return: self</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="n">Xtr</span><span class="p">,</span> <span class="n">Xva</span><span class="p">,</span> <span class="n">ytr</span><span class="p">,</span> <span class="n">yva</span> <span class="o">=</span> <span class="n">train_test_split</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="n">test_size</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">val_split</span><span class="p">,</span> <span class="n">stratify</span><span class="o">=</span><span class="n">y</span><span class="p">)</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">classifier</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">Xtr</span><span class="p">,</span> <span class="n">ytr</span><span class="p">)</span>
|
||||
<span class="n">posteriors</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">classifier</span><span class="o">.</span><span class="n">predict_proba</span><span class="p">(</span><span class="n">Xva</span><span class="p">)</span>
|
||||
<span class="n">nclasses</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">unique</span><span class="p">(</span><span class="n">yva</span><span class="p">))</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">calibration_function</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">calibrator</span><span class="p">(</span><span class="n">posteriors</span><span class="p">,</span> <span class="n">np</span><span class="o">.</span><span class="n">eye</span><span class="p">(</span><span class="n">nclasses</span><span class="p">)[</span><span class="n">yva</span><span class="p">],</span> <span class="n">posterior_supplied</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="bp">self</span></div>
|
||||
|
||||
<div class="viewcode-block" id="RecalibratedProbabilisticClassifierBase.predict"><a class="viewcode-back" href="../../../quapy.classification.html#quapy.classification.calibration.RecalibratedProbabilisticClassifierBase.predict">[docs]</a> <span class="k">def</span> <span class="nf">predict</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">X</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Predicts class labels for the data instances in `X`</span>
|
||||
|
||||
<span class="sd"> :param X: array-like of shape `(n_samples, n_features)` with the data instances</span>
|
||||
<span class="sd"> :return: array-like of shape `(n_samples,)` with the class label predictions</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">classifier</span><span class="o">.</span><span class="n">predict</span><span class="p">(</span><span class="n">X</span><span class="p">)</span></div>
|
||||
|
||||
<div class="viewcode-block" id="RecalibratedProbabilisticClassifierBase.predict_proba"><a class="viewcode-back" href="../../../quapy.classification.html#quapy.classification.calibration.RecalibratedProbabilisticClassifierBase.predict_proba">[docs]</a> <span class="k">def</span> <span class="nf">predict_proba</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">X</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Generates posterior probabilities for the data instances in `X`</span>
|
||||
|
||||
<span class="sd"> :param X: array-like of shape `(n_samples, n_features)` with the data instances</span>
|
||||
<span class="sd"> :return: array-like of shape `(n_samples, n_classes)` with posterior probabilities</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="n">posteriors</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">classifier</span><span class="o">.</span><span class="n">predict_proba</span><span class="p">(</span><span class="n">X</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">calibration_function</span><span class="p">(</span><span class="n">posteriors</span><span class="p">)</span></div>
|
||||
|
||||
<span class="nd">@property</span>
|
||||
<span class="k">def</span> <span class="nf">classes_</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Returns the classes on which the classifier has been trained on</span>
|
||||
|
||||
<span class="sd"> :return: array-like of shape `(n_classes)`</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">classifier</span><span class="o">.</span><span class="n">classes_</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="NBVSCalibration"><a class="viewcode-back" href="../../../quapy.classification.html#quapy.classification.calibration.NBVSCalibration">[docs]</a><span class="k">class</span> <span class="nc">NBVSCalibration</span><span class="p">(</span><span class="n">RecalibratedProbabilisticClassifierBase</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Applies the No-Bias Vector Scaling (NBVS) calibration method from `abstention.calibration`, as defined in</span>
|
||||
<span class="sd"> `Alexandari et al. paper <http://proceedings.mlr.press/v119/alexandari20a.html>`_:</span>
|
||||
|
||||
<span class="sd"> :param classifier: a scikit-learn probabilistic classifier</span>
|
||||
<span class="sd"> :param val_split: indicate an integer k for performing kFCV to obtain the posterior prevalences, or a float p</span>
|
||||
<span class="sd"> in (0,1) to indicate that the posteriors are obtained in a stratified validation split containing p% of the</span>
|
||||
<span class="sd"> training instances (the rest is used for training). In any case, the classifier is retrained in the whole</span>
|
||||
<span class="sd"> training set afterwards. Default value is 5.</span>
|
||||
<span class="sd"> :param n_jobs: indicate the number of parallel workers (only when val_split is an integer)</span>
|
||||
<span class="sd"> :param verbose: whether or not to display information in the standard output</span>
|
||||
<span class="sd"> """</span>
|
||||
|
||||
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">classifier</span><span class="p">,</span> <span class="n">val_split</span><span class="o">=</span><span class="mi">5</span><span class="p">,</span> <span class="n">n_jobs</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">verbose</span><span class="o">=</span><span class="kc">False</span><span class="p">):</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">classifier</span> <span class="o">=</span> <span class="n">classifier</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">calibrator</span> <span class="o">=</span> <span class="n">NoBiasVectorScaling</span><span class="p">(</span><span class="n">verbose</span><span class="o">=</span><span class="n">verbose</span><span class="p">)</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">val_split</span> <span class="o">=</span> <span class="n">val_split</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">n_jobs</span> <span class="o">=</span> <span class="n">n_jobs</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">verbose</span> <span class="o">=</span> <span class="n">verbose</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="BCTSCalibration"><a class="viewcode-back" href="../../../quapy.classification.html#quapy.classification.calibration.BCTSCalibration">[docs]</a><span class="k">class</span> <span class="nc">BCTSCalibration</span><span class="p">(</span><span class="n">RecalibratedProbabilisticClassifierBase</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Applies the Bias-Corrected Temperature Scaling (BCTS) calibration method from `abstention.calibration`, as defined in</span>
|
||||
<span class="sd"> `Alexandari et al. paper <http://proceedings.mlr.press/v119/alexandari20a.html>`_:</span>
|
||||
|
||||
<span class="sd"> :param classifier: a scikit-learn probabilistic classifier</span>
|
||||
<span class="sd"> :param val_split: indicate an integer k for performing kFCV to obtain the posterior prevalences, or a float p</span>
|
||||
<span class="sd"> in (0,1) to indicate that the posteriors are obtained in a stratified validation split containing p% of the</span>
|
||||
<span class="sd"> training instances (the rest is used for training). In any case, the classifier is retrained in the whole</span>
|
||||
<span class="sd"> training set afterwards. Default value is 5.</span>
|
||||
<span class="sd"> :param n_jobs: indicate the number of parallel workers (only when val_split is an integer)</span>
|
||||
<span class="sd"> :param verbose: whether or not to display information in the standard output</span>
|
||||
<span class="sd"> """</span>
|
||||
|
||||
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">classifier</span><span class="p">,</span> <span class="n">val_split</span><span class="o">=</span><span class="mi">5</span><span class="p">,</span> <span class="n">n_jobs</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">verbose</span><span class="o">=</span><span class="kc">False</span><span class="p">):</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">classifier</span> <span class="o">=</span> <span class="n">classifier</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">calibrator</span> <span class="o">=</span> <span class="n">TempScaling</span><span class="p">(</span><span class="n">verbose</span><span class="o">=</span><span class="n">verbose</span><span class="p">,</span> <span class="n">bias_positions</span><span class="o">=</span><span class="s1">'all'</span><span class="p">)</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">val_split</span> <span class="o">=</span> <span class="n">val_split</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">n_jobs</span> <span class="o">=</span> <span class="n">n_jobs</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">verbose</span> <span class="o">=</span> <span class="n">verbose</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="TSCalibration"><a class="viewcode-back" href="../../../quapy.classification.html#quapy.classification.calibration.TSCalibration">[docs]</a><span class="k">class</span> <span class="nc">TSCalibration</span><span class="p">(</span><span class="n">RecalibratedProbabilisticClassifierBase</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Applies the Temperature Scaling (TS) calibration method from `abstention.calibration`, as defined in</span>
|
||||
<span class="sd"> `Alexandari et al. paper <http://proceedings.mlr.press/v119/alexandari20a.html>`_:</span>
|
||||
|
||||
<span class="sd"> :param classifier: a scikit-learn probabilistic classifier</span>
|
||||
<span class="sd"> :param val_split: indicate an integer k for performing kFCV to obtain the posterior prevalences, or a float p</span>
|
||||
<span class="sd"> in (0,1) to indicate that the posteriors are obtained in a stratified validation split containing p% of the</span>
|
||||
<span class="sd"> training instances (the rest is used for training). In any case, the classifier is retrained in the whole</span>
|
||||
<span class="sd"> training set afterwards. Default value is 5.</span>
|
||||
<span class="sd"> :param n_jobs: indicate the number of parallel workers (only when val_split is an integer)</span>
|
||||
<span class="sd"> :param verbose: whether or not to display information in the standard output</span>
|
||||
<span class="sd"> """</span>
|
||||
|
||||
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">classifier</span><span class="p">,</span> <span class="n">val_split</span><span class="o">=</span><span class="mi">5</span><span class="p">,</span> <span class="n">n_jobs</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">verbose</span><span class="o">=</span><span class="kc">False</span><span class="p">):</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">classifier</span> <span class="o">=</span> <span class="n">classifier</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">calibrator</span> <span class="o">=</span> <span class="n">TempScaling</span><span class="p">(</span><span class="n">verbose</span><span class="o">=</span><span class="n">verbose</span><span class="p">)</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">val_split</span> <span class="o">=</span> <span class="n">val_split</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">n_jobs</span> <span class="o">=</span> <span class="n">n_jobs</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">verbose</span> <span class="o">=</span> <span class="n">verbose</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="VSCalibration"><a class="viewcode-back" href="../../../quapy.classification.html#quapy.classification.calibration.VSCalibration">[docs]</a><span class="k">class</span> <span class="nc">VSCalibration</span><span class="p">(</span><span class="n">RecalibratedProbabilisticClassifierBase</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Applies the Vector Scaling (VS) calibration method from `abstention.calibration`, as defined in</span>
|
||||
<span class="sd"> `Alexandari et al. paper <http://proceedings.mlr.press/v119/alexandari20a.html>`_:</span>
|
||||
|
||||
<span class="sd"> :param classifier: a scikit-learn probabilistic classifier</span>
|
||||
<span class="sd"> :param val_split: indicate an integer k for performing kFCV to obtain the posterior prevalences, or a float p</span>
|
||||
<span class="sd"> in (0,1) to indicate that the posteriors are obtained in a stratified validation split containing p% of the</span>
|
||||
<span class="sd"> training instances (the rest is used for training). In any case, the classifier is retrained in the whole</span>
|
||||
<span class="sd"> training set afterwards. Default value is 5.</span>
|
||||
<span class="sd"> :param n_jobs: indicate the number of parallel workers (only when val_split is an integer)</span>
|
||||
<span class="sd"> :param verbose: whether or not to display information in the standard output</span>
|
||||
<span class="sd"> """</span>
|
||||
|
||||
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">classifier</span><span class="p">,</span> <span class="n">val_split</span><span class="o">=</span><span class="mi">5</span><span class="p">,</span> <span class="n">n_jobs</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">verbose</span><span class="o">=</span><span class="kc">False</span><span class="p">):</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">classifier</span> <span class="o">=</span> <span class="n">classifier</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">calibrator</span> <span class="o">=</span> <span class="n">VectorScaling</span><span class="p">(</span><span class="n">verbose</span><span class="o">=</span><span class="n">verbose</span><span class="p">)</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">val_split</span> <span class="o">=</span> <span class="n">val_split</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">n_jobs</span> <span class="o">=</span> <span class="n">n_jobs</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">verbose</span> <span class="o">=</span> <span class="n">verbose</span></div>
|
||||
|
||||
</pre></div>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
<footer>
|
||||
|
||||
<hr/>
|
||||
|
||||
<div role="contentinfo">
|
||||
<p>© Copyright 2024, Alejandro Moreo.</p>
|
||||
</div>
|
||||
|
||||
Built with <a href="https://www.sphinx-doc.org/">Sphinx</a> using a
|
||||
<a href="https://github.com/readthedocs/sphinx_rtd_theme">theme</a>
|
||||
provided by <a href="https://readthedocs.org">Read the Docs</a>.
|
||||
|
||||
|
||||
</footer>
|
||||
</div>
|
||||
</div>
|
||||
</section>
|
||||
</div>
|
||||
<script>
|
||||
jQuery(function () {
|
||||
SphinxRtdTheme.Navigation.enable(true);
|
||||
});
|
||||
</script>
|
||||
|
||||
</body>
|
||||
</html>
|
||||
|
|
@ -0,0 +1,220 @@
|
|||
<!DOCTYPE html>
|
||||
<html class="writer-html5" lang="en" data-content_root="../../../">
|
||||
<head>
|
||||
<meta charset="utf-8" />
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
|
||||
<title>quapy.classification.methods — QuaPy: A Python-based open-source framework for quantification 0.1.8 documentation</title>
|
||||
<link rel="stylesheet" type="text/css" href="../../../_static/pygments.css?v=92fd9be5" />
|
||||
<link rel="stylesheet" type="text/css" href="../../../_static/css/theme.css?v=19f00094" />
|
||||
|
||||
|
||||
<!--[if lt IE 9]>
|
||||
<script src="../../../_static/js/html5shiv.min.js"></script>
|
||||
<![endif]-->
|
||||
|
||||
<script src="../../../_static/jquery.js?v=5d32c60e"></script>
|
||||
<script src="../../../_static/_sphinx_javascript_frameworks_compat.js?v=2cd50e6c"></script>
|
||||
<script src="../../../_static/documentation_options.js?v=22607128"></script>
|
||||
<script src="../../../_static/doctools.js?v=9a2dae69"></script>
|
||||
<script src="../../../_static/sphinx_highlight.js?v=dc90522c"></script>
|
||||
<script src="../../../_static/js/theme.js"></script>
|
||||
<link rel="index" title="Index" href="../../../genindex.html" />
|
||||
<link rel="search" title="Search" href="../../../search.html" />
|
||||
</head>
|
||||
|
||||
<body class="wy-body-for-nav">
|
||||
<div class="wy-grid-for-nav">
|
||||
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
|
||||
<div class="wy-side-scroll">
|
||||
<div class="wy-side-nav-search" >
|
||||
|
||||
|
||||
|
||||
<a href="../../../index.html" class="icon icon-home">
|
||||
QuaPy: A Python-based open-source framework for quantification
|
||||
</a>
|
||||
<div role="search">
|
||||
<form id="rtd-search-form" class="wy-form" action="../../../search.html" method="get">
|
||||
<input type="text" name="q" placeholder="Search docs" aria-label="Search docs" />
|
||||
<input type="hidden" name="check_keywords" value="yes" />
|
||||
<input type="hidden" name="area" value="default" />
|
||||
</form>
|
||||
</div>
|
||||
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../../../modules.html">quapy</a></li>
|
||||
</ul>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
</nav>
|
||||
|
||||
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" >
|
||||
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
|
||||
<a href="../../../index.html">QuaPy: A Python-based open-source framework for quantification</a>
|
||||
</nav>
|
||||
|
||||
<div class="wy-nav-content">
|
||||
<div class="rst-content">
|
||||
<div role="navigation" aria-label="Page navigation">
|
||||
<ul class="wy-breadcrumbs">
|
||||
<li><a href="../../../index.html" class="icon icon-home" aria-label="Home"></a></li>
|
||||
<li class="breadcrumb-item"><a href="../../index.html">Module code</a></li>
|
||||
<li class="breadcrumb-item active">quapy.classification.methods</li>
|
||||
<li class="wy-breadcrumbs-aside">
|
||||
</li>
|
||||
</ul>
|
||||
<hr/>
|
||||
</div>
|
||||
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
|
||||
<div itemprop="articleBody">
|
||||
|
||||
<h1>Source code for quapy.classification.methods</h1><div class="highlight"><pre>
|
||||
<span></span><span class="kn">from</span> <span class="nn">sklearn.base</span> <span class="kn">import</span> <span class="n">BaseEstimator</span>
|
||||
<span class="kn">from</span> <span class="nn">sklearn.decomposition</span> <span class="kn">import</span> <span class="n">TruncatedSVD</span>
|
||||
<span class="kn">from</span> <span class="nn">sklearn.linear_model</span> <span class="kn">import</span> <span class="n">LogisticRegression</span>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="LowRankLogisticRegression">
|
||||
<a class="viewcode-back" href="../../../quapy.classification.html#quapy.classification.methods.LowRankLogisticRegression">[docs]</a>
|
||||
<span class="k">class</span> <span class="nc">LowRankLogisticRegression</span><span class="p">(</span><span class="n">BaseEstimator</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> An example of a classification method (i.e., an object that implements `fit`, `predict`, and `predict_proba`)</span>
|
||||
<span class="sd"> that also generates embedded inputs (i.e., that implements `transform`), as those required for</span>
|
||||
<span class="sd"> :class:`quapy.method.neural.QuaNet`. This is a mock method to allow for easily instantiating</span>
|
||||
<span class="sd"> :class:`quapy.method.neural.QuaNet` on array-like real-valued instances.</span>
|
||||
<span class="sd"> The transformation consists of applying :class:`sklearn.decomposition.TruncatedSVD`</span>
|
||||
<span class="sd"> while classification is performed using :class:`sklearn.linear_model.LogisticRegression` on the low-rank space.</span>
|
||||
|
||||
<span class="sd"> :param n_components: the number of principal components to retain</span>
|
||||
<span class="sd"> :param kwargs: parameters for the</span>
|
||||
<span class="sd"> `Logistic Regression <https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html>`__ classifier</span>
|
||||
<span class="sd"> """</span>
|
||||
|
||||
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">n_components</span><span class="o">=</span><span class="mi">100</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">n_components</span> <span class="o">=</span> <span class="n">n_components</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">classifier</span> <span class="o">=</span> <span class="n">LogisticRegression</span><span class="p">(</span><span class="o">**</span><span class="n">kwargs</span><span class="p">)</span>
|
||||
|
||||
<div class="viewcode-block" id="LowRankLogisticRegression.get_params">
|
||||
<a class="viewcode-back" href="../../../quapy.classification.html#quapy.classification.methods.LowRankLogisticRegression.get_params">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">get_params</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Get hyper-parameters for this estimator.</span>
|
||||
|
||||
<span class="sd"> :return: a dictionary with parameter names mapped to their values</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="n">params</span> <span class="o">=</span> <span class="p">{</span><span class="s1">'n_components'</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">n_components</span><span class="p">}</span>
|
||||
<span class="n">params</span><span class="o">.</span><span class="n">update</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">classifier</span><span class="o">.</span><span class="n">get_params</span><span class="p">())</span>
|
||||
<span class="k">return</span> <span class="n">params</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="LowRankLogisticRegression.set_params">
|
||||
<a class="viewcode-back" href="../../../quapy.classification.html#quapy.classification.methods.LowRankLogisticRegression.set_params">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">set_params</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="o">**</span><span class="n">params</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Set the parameters of this estimator.</span>
|
||||
|
||||
<span class="sd"> :param parameters: a `**kwargs` dictionary with the estimator parameters for</span>
|
||||
<span class="sd"> `Logistic Regression <https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html>`__</span>
|
||||
<span class="sd"> and eventually also `n_components` for `TruncatedSVD`</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="n">params_</span> <span class="o">=</span> <span class="nb">dict</span><span class="p">(</span><span class="n">params</span><span class="p">)</span>
|
||||
<span class="k">if</span> <span class="s1">'n_components'</span> <span class="ow">in</span> <span class="n">params_</span><span class="p">:</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">n_components</span> <span class="o">=</span> <span class="n">params_</span><span class="p">[</span><span class="s1">'n_components'</span><span class="p">]</span>
|
||||
<span class="k">del</span> <span class="n">params_</span><span class="p">[</span><span class="s1">'n_components'</span><span class="p">]</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">classifier</span><span class="o">.</span><span class="n">set_params</span><span class="p">(</span><span class="o">**</span><span class="n">params_</span><span class="p">)</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="LowRankLogisticRegression.fit">
|
||||
<a class="viewcode-back" href="../../../quapy.classification.html#quapy.classification.methods.LowRankLogisticRegression.fit">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">fit</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">X</span><span class="p">,</span> <span class="n">y</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Fit the model according to the given training data. The fit consists of</span>
|
||||
<span class="sd"> fitting `TruncatedSVD` and then `LogisticRegression` on the low-rank representation.</span>
|
||||
|
||||
<span class="sd"> :param X: array-like of shape `(n_samples, n_features)` with the instances</span>
|
||||
<span class="sd"> :param y: array-like of shape `(n_samples, n_classes)` with the class labels</span>
|
||||
<span class="sd"> :return: `self`</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="n">nF</span> <span class="o">=</span> <span class="n">X</span><span class="o">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">pca</span> <span class="o">=</span> <span class="kc">None</span>
|
||||
<span class="k">if</span> <span class="n">nF</span> <span class="o">></span> <span class="bp">self</span><span class="o">.</span><span class="n">n_components</span><span class="p">:</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">pca</span> <span class="o">=</span> <span class="n">TruncatedSVD</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">n_components</span><span class="p">)</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">X</span><span class="p">)</span>
|
||||
<span class="n">X</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">transform</span><span class="p">(</span><span class="n">X</span><span class="p">)</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">classifier</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">classes_</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">classifier</span><span class="o">.</span><span class="n">classes_</span>
|
||||
<span class="k">return</span> <span class="bp">self</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="LowRankLogisticRegression.predict">
|
||||
<a class="viewcode-back" href="../../../quapy.classification.html#quapy.classification.methods.LowRankLogisticRegression.predict">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">predict</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">X</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Predicts labels for the instances `X` embedded into the low-rank space.</span>
|
||||
|
||||
<span class="sd"> :param X: array-like of shape `(n_samples, n_features)` instances to classify</span>
|
||||
<span class="sd"> :return: a `numpy` array of length `n` containing the label predictions, where `n` is the number of</span>
|
||||
<span class="sd"> instances in `X`</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="n">X</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">transform</span><span class="p">(</span><span class="n">X</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">classifier</span><span class="o">.</span><span class="n">predict</span><span class="p">(</span><span class="n">X</span><span class="p">)</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="LowRankLogisticRegression.predict_proba">
|
||||
<a class="viewcode-back" href="../../../quapy.classification.html#quapy.classification.methods.LowRankLogisticRegression.predict_proba">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">predict_proba</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">X</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Predicts posterior probabilities for the instances `X` embedded into the low-rank space.</span>
|
||||
|
||||
<span class="sd"> :param X: array-like of shape `(n_samples, n_features)` instances to classify</span>
|
||||
<span class="sd"> :return: array-like of shape `(n_samples, n_classes)` with the posterior probabilities</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="n">X</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">transform</span><span class="p">(</span><span class="n">X</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">classifier</span><span class="o">.</span><span class="n">predict_proba</span><span class="p">(</span><span class="n">X</span><span class="p">)</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="LowRankLogisticRegression.transform">
|
||||
<a class="viewcode-back" href="../../../quapy.classification.html#quapy.classification.methods.LowRankLogisticRegression.transform">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">transform</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">X</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Returns the low-rank approximation of `X` with `n_components` dimensions, or `X` unaltered if</span>
|
||||
<span class="sd"> `n_components` >= `X.shape[1]`.</span>
|
||||
<span class="sd"> </span>
|
||||
<span class="sd"> :param X: array-like of shape `(n_samples, n_features)` instances to embed</span>
|
||||
<span class="sd"> :return: array-like of shape `(n_samples, n_components)` with the embedded instances</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">pca</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">:</span>
|
||||
<span class="k">return</span> <span class="n">X</span>
|
||||
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">pca</span><span class="o">.</span><span class="n">transform</span><span class="p">(</span><span class="n">X</span><span class="p">)</span></div>
|
||||
</div>
|
||||
|
||||
</pre></div>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
<footer>
|
||||
|
||||
<hr/>
|
||||
|
||||
<div role="contentinfo">
|
||||
<p>© Copyright 2024, Alejandro Moreo.</p>
|
||||
</div>
|
||||
|
||||
Built with <a href="https://www.sphinx-doc.org/">Sphinx</a> using a
|
||||
<a href="https://github.com/readthedocs/sphinx_rtd_theme">theme</a>
|
||||
provided by <a href="https://readthedocs.org">Read the Docs</a>.
|
||||
|
||||
|
||||
</footer>
|
||||
</div>
|
||||
</div>
|
||||
</section>
|
||||
</div>
|
||||
<script>
|
||||
jQuery(function () {
|
||||
SphinxRtdTheme.Navigation.enable(true);
|
||||
});
|
||||
</script>
|
||||
|
||||
</body>
|
||||
</html>
|
||||
|
|
@ -0,0 +1,715 @@
|
|||
<!DOCTYPE html>
|
||||
<html class="writer-html5" lang="en" data-content_root="../../../">
|
||||
<head>
|
||||
<meta charset="utf-8" />
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
|
||||
<title>quapy.classification.neural — QuaPy: A Python-based open-source framework for quantification 0.1.8 documentation</title>
|
||||
<link rel="stylesheet" type="text/css" href="../../../_static/pygments.css?v=92fd9be5" />
|
||||
<link rel="stylesheet" type="text/css" href="../../../_static/css/theme.css?v=19f00094" />
|
||||
|
||||
|
||||
<!--[if lt IE 9]>
|
||||
<script src="../../../_static/js/html5shiv.min.js"></script>
|
||||
<![endif]-->
|
||||
|
||||
<script src="../../../_static/jquery.js?v=5d32c60e"></script>
|
||||
<script src="../../../_static/_sphinx_javascript_frameworks_compat.js?v=2cd50e6c"></script>
|
||||
<script src="../../../_static/documentation_options.js?v=22607128"></script>
|
||||
<script src="../../../_static/doctools.js?v=9a2dae69"></script>
|
||||
<script src="../../../_static/sphinx_highlight.js?v=dc90522c"></script>
|
||||
<script src="../../../_static/js/theme.js"></script>
|
||||
<link rel="index" title="Index" href="../../../genindex.html" />
|
||||
<link rel="search" title="Search" href="../../../search.html" />
|
||||
</head>
|
||||
|
||||
<body class="wy-body-for-nav">
|
||||
<div class="wy-grid-for-nav">
|
||||
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
|
||||
<div class="wy-side-scroll">
|
||||
<div class="wy-side-nav-search" >
|
||||
|
||||
|
||||
|
||||
<a href="../../../index.html" class="icon icon-home">
|
||||
QuaPy: A Python-based open-source framework for quantification
|
||||
</a>
|
||||
<div role="search">
|
||||
<form id="rtd-search-form" class="wy-form" action="../../../search.html" method="get">
|
||||
<input type="text" name="q" placeholder="Search docs" aria-label="Search docs" />
|
||||
<input type="hidden" name="check_keywords" value="yes" />
|
||||
<input type="hidden" name="area" value="default" />
|
||||
</form>
|
||||
</div>
|
||||
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../../../modules.html">quapy</a></li>
|
||||
</ul>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
</nav>
|
||||
|
||||
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" >
|
||||
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
|
||||
<a href="../../../index.html">QuaPy: A Python-based open-source framework for quantification</a>
|
||||
</nav>
|
||||
|
||||
<div class="wy-nav-content">
|
||||
<div class="rst-content">
|
||||
<div role="navigation" aria-label="Page navigation">
|
||||
<ul class="wy-breadcrumbs">
|
||||
<li><a href="../../../index.html" class="icon icon-home" aria-label="Home"></a></li>
|
||||
<li class="breadcrumb-item"><a href="../../index.html">Module code</a></li>
|
||||
<li class="breadcrumb-item active">quapy.classification.neural</li>
|
||||
<li class="wy-breadcrumbs-aside">
|
||||
</li>
|
||||
</ul>
|
||||
<hr/>
|
||||
</div>
|
||||
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
|
||||
<div itemprop="articleBody">
|
||||
|
||||
<h1>Source code for quapy.classification.neural</h1><div class="highlight"><pre>
|
||||
<span></span><span class="kn">import</span> <span class="nn">os</span>
|
||||
<span class="kn">from</span> <span class="nn">abc</span> <span class="kn">import</span> <span class="n">ABCMeta</span><span class="p">,</span> <span class="n">abstractmethod</span>
|
||||
<span class="kn">from</span> <span class="nn">pathlib</span> <span class="kn">import</span> <span class="n">Path</span>
|
||||
|
||||
<span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span>
|
||||
<span class="kn">import</span> <span class="nn">torch</span>
|
||||
<span class="kn">import</span> <span class="nn">torch.nn</span> <span class="k">as</span> <span class="nn">nn</span>
|
||||
<span class="kn">import</span> <span class="nn">torch.nn.functional</span> <span class="k">as</span> <span class="nn">F</span>
|
||||
<span class="kn">from</span> <span class="nn">sklearn.metrics</span> <span class="kn">import</span> <span class="n">accuracy_score</span><span class="p">,</span> <span class="n">f1_score</span>
|
||||
<span class="kn">from</span> <span class="nn">torch.nn.utils.rnn</span> <span class="kn">import</span> <span class="n">pad_sequence</span>
|
||||
<span class="kn">from</span> <span class="nn">tqdm</span> <span class="kn">import</span> <span class="n">tqdm</span>
|
||||
|
||||
<span class="kn">import</span> <span class="nn">quapy</span> <span class="k">as</span> <span class="nn">qp</span>
|
||||
<span class="kn">from</span> <span class="nn">quapy.data</span> <span class="kn">import</span> <span class="n">LabelledCollection</span>
|
||||
<span class="kn">from</span> <span class="nn">quapy.util</span> <span class="kn">import</span> <span class="n">EarlyStop</span>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="NeuralClassifierTrainer">
|
||||
<a class="viewcode-back" href="../../../quapy.classification.html#quapy.classification.neural.NeuralClassifierTrainer">[docs]</a>
|
||||
<span class="k">class</span> <span class="nc">NeuralClassifierTrainer</span><span class="p">:</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Trains a neural network for text classification.</span>
|
||||
|
||||
<span class="sd"> :param net: an instance of `TextClassifierNet` implementing the forward pass</span>
|
||||
<span class="sd"> :param lr: learning rate (default 1e-3)</span>
|
||||
<span class="sd"> :param weight_decay: weight decay (default 0)</span>
|
||||
<span class="sd"> :param patience: number of epochs that do not show any improvement in validation</span>
|
||||
<span class="sd"> to wait before applying early stop (default 10)</span>
|
||||
<span class="sd"> :param epochs: maximum number of training epochs (default 200)</span>
|
||||
<span class="sd"> :param batch_size: batch size for training (default 64)</span>
|
||||
<span class="sd"> :param batch_size_test: batch size for test (default 512)</span>
|
||||
<span class="sd"> :param padding_length: maximum number of tokens to consider in a document (default 300)</span>
|
||||
<span class="sd"> :param device: specify 'cpu' (default) or 'cuda' for enabling gpu</span>
|
||||
<span class="sd"> :param checkpointpath: where to store the parameters of the best model found so far</span>
|
||||
<span class="sd"> according to the evaluation in the held-out validation split (default '../checkpoint/classifier_net.dat')</span>
|
||||
<span class="sd"> """</span>
|
||||
|
||||
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span>
|
||||
<span class="n">net</span><span class="p">:</span> <span class="s1">'TextClassifierNet'</span><span class="p">,</span>
|
||||
<span class="n">lr</span><span class="o">=</span><span class="mf">1e-3</span><span class="p">,</span>
|
||||
<span class="n">weight_decay</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span>
|
||||
<span class="n">patience</span><span class="o">=</span><span class="mi">10</span><span class="p">,</span>
|
||||
<span class="n">epochs</span><span class="o">=</span><span class="mi">200</span><span class="p">,</span>
|
||||
<span class="n">batch_size</span><span class="o">=</span><span class="mi">64</span><span class="p">,</span>
|
||||
<span class="n">batch_size_test</span><span class="o">=</span><span class="mi">512</span><span class="p">,</span>
|
||||
<span class="n">padding_length</span><span class="o">=</span><span class="mi">300</span><span class="p">,</span>
|
||||
<span class="n">device</span><span class="o">=</span><span class="s1">'cuda'</span><span class="p">,</span>
|
||||
<span class="n">checkpointpath</span><span class="o">=</span><span class="s1">'../checkpoint/classifier_net.dat'</span><span class="p">):</span>
|
||||
|
||||
<span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="fm">__init__</span><span class="p">()</span>
|
||||
|
||||
<span class="k">assert</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">net</span><span class="p">,</span> <span class="n">TextClassifierNet</span><span class="p">),</span> <span class="sa">f</span><span class="s1">'net is not an instance of </span><span class="si">{</span><span class="n">TextClassifierNet</span><span class="o">.</span><span class="vm">__name__</span><span class="si">}</span><span class="s1">'</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">net</span> <span class="o">=</span> <span class="n">net</span><span class="o">.</span><span class="n">to</span><span class="p">(</span><span class="n">device</span><span class="p">)</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">vocab_size</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">net</span><span class="o">.</span><span class="n">vocabulary_size</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">trainer_hyperparams</span><span class="o">=</span><span class="p">{</span>
|
||||
<span class="s1">'lr'</span><span class="p">:</span> <span class="n">lr</span><span class="p">,</span>
|
||||
<span class="s1">'weight_decay'</span><span class="p">:</span> <span class="n">weight_decay</span><span class="p">,</span>
|
||||
<span class="s1">'patience'</span><span class="p">:</span> <span class="n">patience</span><span class="p">,</span>
|
||||
<span class="s1">'epochs'</span><span class="p">:</span> <span class="n">epochs</span><span class="p">,</span>
|
||||
<span class="s1">'batch_size'</span><span class="p">:</span> <span class="n">batch_size</span><span class="p">,</span>
|
||||
<span class="s1">'batch_size_test'</span><span class="p">:</span> <span class="n">batch_size_test</span><span class="p">,</span>
|
||||
<span class="s1">'padding_length'</span><span class="p">:</span> <span class="n">padding_length</span><span class="p">,</span>
|
||||
<span class="s1">'device'</span><span class="p">:</span> <span class="n">torch</span><span class="o">.</span><span class="n">device</span><span class="p">(</span><span class="n">device</span><span class="p">)</span>
|
||||
<span class="p">}</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">learner_hyperparams</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">net</span><span class="o">.</span><span class="n">get_params</span><span class="p">()</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">checkpointpath</span> <span class="o">=</span> <span class="n">checkpointpath</span>
|
||||
|
||||
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s1">'[NeuralNetwork running on </span><span class="si">{</span><span class="n">device</span><span class="si">}</span><span class="s1">]'</span><span class="p">)</span>
|
||||
<span class="n">os</span><span class="o">.</span><span class="n">makedirs</span><span class="p">(</span><span class="n">Path</span><span class="p">(</span><span class="n">checkpointpath</span><span class="p">)</span><span class="o">.</span><span class="n">parent</span><span class="p">,</span> <span class="n">exist_ok</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
|
||||
|
||||
<div class="viewcode-block" id="NeuralClassifierTrainer.reset_net_params">
|
||||
<a class="viewcode-back" href="../../../quapy.classification.html#quapy.classification.neural.NeuralClassifierTrainer.reset_net_params">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">reset_net_params</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">vocab_size</span><span class="p">,</span> <span class="n">n_classes</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""Reinitialize the network parameters</span>
|
||||
|
||||
<span class="sd"> :param vocab_size: the size of the vocabulary</span>
|
||||
<span class="sd"> :param n_classes: the number of target classes</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">net</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">net</span><span class="o">.</span><span class="vm">__class__</span><span class="p">(</span><span class="n">vocab_size</span><span class="p">,</span> <span class="n">n_classes</span><span class="p">,</span> <span class="o">**</span><span class="bp">self</span><span class="o">.</span><span class="n">learner_hyperparams</span><span class="p">)</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">net</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">net</span><span class="o">.</span><span class="n">to</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">trainer_hyperparams</span><span class="p">[</span><span class="s1">'device'</span><span class="p">])</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">net</span><span class="o">.</span><span class="n">xavier_uniform</span><span class="p">()</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="NeuralClassifierTrainer.get_params">
|
||||
<a class="viewcode-back" href="../../../quapy.classification.html#quapy.classification.neural.NeuralClassifierTrainer.get_params">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">get_params</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""Get hyper-parameters for this estimator</span>
|
||||
|
||||
<span class="sd"> :return: a dictionary with parameter names mapped to their values</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">return</span> <span class="p">{</span><span class="o">**</span><span class="bp">self</span><span class="o">.</span><span class="n">net</span><span class="o">.</span><span class="n">get_params</span><span class="p">(),</span> <span class="o">**</span><span class="bp">self</span><span class="o">.</span><span class="n">trainer_hyperparams</span><span class="p">}</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="NeuralClassifierTrainer.set_params">
|
||||
<a class="viewcode-back" href="../../../quapy.classification.html#quapy.classification.neural.NeuralClassifierTrainer.set_params">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">set_params</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="o">**</span><span class="n">params</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""Set the parameters of this trainer and the learner it is training.</span>
|
||||
<span class="sd"> In this current version, parameter names for the trainer and learner should</span>
|
||||
<span class="sd"> be disjoint.</span>
|
||||
|
||||
<span class="sd"> :param params: a `**kwargs` dictionary with the parameters</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="n">trainer_hyperparams</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">trainer_hyperparams</span>
|
||||
<span class="n">learner_hyperparams</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">net</span><span class="o">.</span><span class="n">get_params</span><span class="p">()</span>
|
||||
<span class="k">for</span> <span class="n">key</span><span class="p">,</span> <span class="n">val</span> <span class="ow">in</span> <span class="n">params</span><span class="o">.</span><span class="n">items</span><span class="p">():</span>
|
||||
<span class="k">if</span> <span class="n">key</span> <span class="ow">in</span> <span class="n">trainer_hyperparams</span> <span class="ow">and</span> <span class="n">key</span> <span class="ow">in</span> <span class="n">learner_hyperparams</span><span class="p">:</span>
|
||||
<span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span><span class="sa">f</span><span class="s1">'the use of parameter </span><span class="si">{</span><span class="n">key</span><span class="si">}</span><span class="s1"> is ambiguous since it can refer to '</span>
|
||||
<span class="sa">f</span><span class="s1">'a parameters of the Trainer or the learner </span><span class="si">{</span><span class="bp">self</span><span class="o">.</span><span class="n">net</span><span class="o">.</span><span class="vm">__name__</span><span class="si">}</span><span class="s1">'</span><span class="p">)</span>
|
||||
<span class="k">elif</span> <span class="n">key</span> <span class="ow">not</span> <span class="ow">in</span> <span class="n">trainer_hyperparams</span> <span class="ow">and</span> <span class="n">key</span> <span class="ow">not</span> <span class="ow">in</span> <span class="n">learner_hyperparams</span><span class="p">:</span>
|
||||
<span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span><span class="sa">f</span><span class="s1">'parameter </span><span class="si">{</span><span class="n">key</span><span class="si">}</span><span class="s1"> is not valid'</span><span class="p">)</span>
|
||||
|
||||
<span class="k">if</span> <span class="n">key</span> <span class="ow">in</span> <span class="n">trainer_hyperparams</span><span class="p">:</span>
|
||||
<span class="n">trainer_hyperparams</span><span class="p">[</span><span class="n">key</span><span class="p">]</span> <span class="o">=</span> <span class="n">val</span>
|
||||
<span class="k">else</span><span class="p">:</span>
|
||||
<span class="n">learner_hyperparams</span><span class="p">[</span><span class="n">key</span><span class="p">]</span> <span class="o">=</span> <span class="n">val</span>
|
||||
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">trainer_hyperparams</span> <span class="o">=</span> <span class="n">trainer_hyperparams</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">learner_hyperparams</span> <span class="o">=</span> <span class="n">learner_hyperparams</span> </div>
|
||||
|
||||
|
||||
<span class="nd">@property</span>
|
||||
<span class="k">def</span> <span class="nf">device</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">""" Gets the device in which the network is allocated</span>
|
||||
|
||||
<span class="sd"> :return: device</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">return</span> <span class="nb">next</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">net</span><span class="o">.</span><span class="n">parameters</span><span class="p">())</span><span class="o">.</span><span class="n">device</span>
|
||||
|
||||
<span class="k">def</span> <span class="nf">_train_epoch</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">data</span><span class="p">,</span> <span class="n">status</span><span class="p">,</span> <span class="n">pbar</span><span class="p">,</span> <span class="n">epoch</span><span class="p">):</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">net</span><span class="o">.</span><span class="n">train</span><span class="p">()</span>
|
||||
<span class="n">criterion</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">nn</span><span class="o">.</span><span class="n">CrossEntropyLoss</span><span class="p">()</span>
|
||||
<span class="n">losses</span><span class="p">,</span> <span class="n">predictions</span><span class="p">,</span> <span class="n">true_labels</span> <span class="o">=</span> <span class="p">[],</span> <span class="p">[],</span> <span class="p">[]</span>
|
||||
<span class="k">for</span> <span class="n">xi</span><span class="p">,</span> <span class="n">yi</span> <span class="ow">in</span> <span class="n">data</span><span class="p">:</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">optim</span><span class="o">.</span><span class="n">zero_grad</span><span class="p">()</span>
|
||||
<span class="n">logits</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">net</span><span class="o">.</span><span class="n">forward</span><span class="p">(</span><span class="n">xi</span><span class="p">)</span>
|
||||
<span class="n">loss</span> <span class="o">=</span> <span class="n">criterion</span><span class="p">(</span><span class="n">logits</span><span class="p">,</span> <span class="n">yi</span><span class="p">)</span>
|
||||
<span class="n">loss</span><span class="o">.</span><span class="n">backward</span><span class="p">()</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">optim</span><span class="o">.</span><span class="n">step</span><span class="p">()</span>
|
||||
<span class="n">losses</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">loss</span><span class="o">.</span><span class="n">item</span><span class="p">())</span>
|
||||
<span class="n">preds</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">softmax</span><span class="p">(</span><span class="n">logits</span><span class="p">,</span> <span class="n">dim</span><span class="o">=-</span><span class="mi">1</span><span class="p">)</span><span class="o">.</span><span class="n">detach</span><span class="p">()</span><span class="o">.</span><span class="n">cpu</span><span class="p">()</span><span class="o">.</span><span class="n">numpy</span><span class="p">()</span><span class="o">.</span><span class="n">argmax</span><span class="p">(</span><span class="n">axis</span><span class="o">=-</span><span class="mi">1</span><span class="p">)</span>
|
||||
|
||||
<span class="n">status</span><span class="p">[</span><span class="s2">"loss"</span><span class="p">]</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">mean</span><span class="p">(</span><span class="n">losses</span><span class="p">)</span>
|
||||
<span class="n">predictions</span><span class="o">.</span><span class="n">extend</span><span class="p">(</span><span class="n">preds</span><span class="o">.</span><span class="n">tolist</span><span class="p">())</span>
|
||||
<span class="n">true_labels</span><span class="o">.</span><span class="n">extend</span><span class="p">(</span><span class="n">yi</span><span class="o">.</span><span class="n">detach</span><span class="p">()</span><span class="o">.</span><span class="n">cpu</span><span class="p">()</span><span class="o">.</span><span class="n">numpy</span><span class="p">()</span><span class="o">.</span><span class="n">tolist</span><span class="p">())</span>
|
||||
<span class="n">status</span><span class="p">[</span><span class="s2">"acc"</span><span class="p">]</span> <span class="o">=</span> <span class="n">accuracy_score</span><span class="p">(</span><span class="n">true_labels</span><span class="p">,</span> <span class="n">predictions</span><span class="p">)</span>
|
||||
<span class="n">status</span><span class="p">[</span><span class="s2">"f1"</span><span class="p">]</span> <span class="o">=</span> <span class="n">f1_score</span><span class="p">(</span><span class="n">true_labels</span><span class="p">,</span> <span class="n">predictions</span><span class="p">,</span> <span class="n">average</span><span class="o">=</span><span class="s1">'macro'</span><span class="p">)</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">__update_progress_bar</span><span class="p">(</span><span class="n">pbar</span><span class="p">,</span> <span class="n">epoch</span><span class="p">)</span>
|
||||
|
||||
<span class="k">def</span> <span class="nf">_test_epoch</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">data</span><span class="p">,</span> <span class="n">status</span><span class="p">,</span> <span class="n">pbar</span><span class="p">,</span> <span class="n">epoch</span><span class="p">):</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">net</span><span class="o">.</span><span class="n">eval</span><span class="p">()</span>
|
||||
<span class="n">criterion</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">nn</span><span class="o">.</span><span class="n">CrossEntropyLoss</span><span class="p">()</span>
|
||||
<span class="n">losses</span><span class="p">,</span> <span class="n">predictions</span><span class="p">,</span> <span class="n">true_labels</span> <span class="o">=</span> <span class="p">[],</span> <span class="p">[],</span> <span class="p">[]</span>
|
||||
<span class="k">with</span> <span class="n">torch</span><span class="o">.</span><span class="n">no_grad</span><span class="p">():</span>
|
||||
<span class="k">for</span> <span class="n">xi</span><span class="p">,</span> <span class="n">yi</span> <span class="ow">in</span> <span class="n">data</span><span class="p">:</span>
|
||||
<span class="n">logits</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">net</span><span class="o">.</span><span class="n">forward</span><span class="p">(</span><span class="n">xi</span><span class="p">)</span>
|
||||
<span class="n">loss</span> <span class="o">=</span> <span class="n">criterion</span><span class="p">(</span><span class="n">logits</span><span class="p">,</span> <span class="n">yi</span><span class="p">)</span>
|
||||
<span class="n">losses</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">loss</span><span class="o">.</span><span class="n">item</span><span class="p">())</span>
|
||||
<span class="n">preds</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">softmax</span><span class="p">(</span><span class="n">logits</span><span class="p">,</span> <span class="n">dim</span><span class="o">=-</span><span class="mi">1</span><span class="p">)</span><span class="o">.</span><span class="n">detach</span><span class="p">()</span><span class="o">.</span><span class="n">cpu</span><span class="p">()</span><span class="o">.</span><span class="n">numpy</span><span class="p">()</span><span class="o">.</span><span class="n">argmax</span><span class="p">(</span><span class="n">axis</span><span class="o">=-</span><span class="mi">1</span><span class="p">)</span>
|
||||
<span class="n">predictions</span><span class="o">.</span><span class="n">extend</span><span class="p">(</span><span class="n">preds</span><span class="o">.</span><span class="n">tolist</span><span class="p">())</span>
|
||||
<span class="n">true_labels</span><span class="o">.</span><span class="n">extend</span><span class="p">(</span><span class="n">yi</span><span class="o">.</span><span class="n">detach</span><span class="p">()</span><span class="o">.</span><span class="n">cpu</span><span class="p">()</span><span class="o">.</span><span class="n">numpy</span><span class="p">()</span><span class="o">.</span><span class="n">tolist</span><span class="p">())</span>
|
||||
|
||||
<span class="n">status</span><span class="p">[</span><span class="s2">"loss"</span><span class="p">]</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">mean</span><span class="p">(</span><span class="n">losses</span><span class="p">)</span>
|
||||
<span class="n">status</span><span class="p">[</span><span class="s2">"acc"</span><span class="p">]</span> <span class="o">=</span> <span class="n">accuracy_score</span><span class="p">(</span><span class="n">true_labels</span><span class="p">,</span> <span class="n">predictions</span><span class="p">)</span>
|
||||
<span class="n">status</span><span class="p">[</span><span class="s2">"f1"</span><span class="p">]</span> <span class="o">=</span> <span class="n">f1_score</span><span class="p">(</span><span class="n">true_labels</span><span class="p">,</span> <span class="n">predictions</span><span class="p">,</span> <span class="n">average</span><span class="o">=</span><span class="s1">'macro'</span><span class="p">)</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">__update_progress_bar</span><span class="p">(</span><span class="n">pbar</span><span class="p">,</span> <span class="n">epoch</span><span class="p">)</span>
|
||||
|
||||
<span class="k">def</span> <span class="nf">__update_progress_bar</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">pbar</span><span class="p">,</span> <span class="n">epoch</span><span class="p">):</span>
|
||||
<span class="n">pbar</span><span class="o">.</span><span class="n">set_description</span><span class="p">(</span><span class="sa">f</span><span class="s1">'[</span><span class="si">{</span><span class="bp">self</span><span class="o">.</span><span class="n">net</span><span class="o">.</span><span class="vm">__class__</span><span class="o">.</span><span class="vm">__name__</span><span class="si">}</span><span class="s1">] training epoch=</span><span class="si">{</span><span class="n">epoch</span><span class="si">}</span><span class="s1"> '</span>
|
||||
<span class="sa">f</span><span class="s1">'tr-loss=</span><span class="si">{</span><span class="bp">self</span><span class="o">.</span><span class="n">status</span><span class="p">[</span><span class="s2">"tr"</span><span class="p">][</span><span class="s2">"loss"</span><span class="p">]</span><span class="si">:</span><span class="s1">.5f</span><span class="si">}</span><span class="s1"> '</span>
|
||||
<span class="sa">f</span><span class="s1">'tr-acc=</span><span class="si">{</span><span class="mi">100</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="bp">self</span><span class="o">.</span><span class="n">status</span><span class="p">[</span><span class="s2">"tr"</span><span class="p">][</span><span class="s2">"acc"</span><span class="p">]</span><span class="si">:</span><span class="s1">.2f</span><span class="si">}</span><span class="s1">% '</span>
|
||||
<span class="sa">f</span><span class="s1">'tr-macroF1=</span><span class="si">{</span><span class="mi">100</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="bp">self</span><span class="o">.</span><span class="n">status</span><span class="p">[</span><span class="s2">"tr"</span><span class="p">][</span><span class="s2">"f1"</span><span class="p">]</span><span class="si">:</span><span class="s1">.2f</span><span class="si">}</span><span class="s1">% '</span>
|
||||
<span class="sa">f</span><span class="s1">'patience=</span><span class="si">{</span><span class="bp">self</span><span class="o">.</span><span class="n">early_stop</span><span class="o">.</span><span class="n">patience</span><span class="si">}</span><span class="s1">/</span><span class="si">{</span><span class="bp">self</span><span class="o">.</span><span class="n">early_stop</span><span class="o">.</span><span class="n">PATIENCE_LIMIT</span><span class="si">}</span><span class="s1"> '</span>
|
||||
<span class="sa">f</span><span class="s1">'val-loss=</span><span class="si">{</span><span class="bp">self</span><span class="o">.</span><span class="n">status</span><span class="p">[</span><span class="s2">"va"</span><span class="p">][</span><span class="s2">"loss"</span><span class="p">]</span><span class="si">:</span><span class="s1">.5f</span><span class="si">}</span><span class="s1"> '</span>
|
||||
<span class="sa">f</span><span class="s1">'val-acc=</span><span class="si">{</span><span class="mi">100</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="bp">self</span><span class="o">.</span><span class="n">status</span><span class="p">[</span><span class="s2">"va"</span><span class="p">][</span><span class="s2">"acc"</span><span class="p">]</span><span class="si">:</span><span class="s1">.2f</span><span class="si">}</span><span class="s1">% '</span>
|
||||
<span class="sa">f</span><span class="s1">'macroF1=</span><span class="si">{</span><span class="mi">100</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="bp">self</span><span class="o">.</span><span class="n">status</span><span class="p">[</span><span class="s2">"va"</span><span class="p">][</span><span class="s2">"f1"</span><span class="p">]</span><span class="si">:</span><span class="s1">.2f</span><span class="si">}</span><span class="s1">%'</span><span class="p">)</span>
|
||||
|
||||
<div class="viewcode-block" id="NeuralClassifierTrainer.fit">
|
||||
<a class="viewcode-back" href="../../../quapy.classification.html#quapy.classification.neural.NeuralClassifierTrainer.fit">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">fit</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">instances</span><span class="p">,</span> <span class="n">labels</span><span class="p">,</span> <span class="n">val_split</span><span class="o">=</span><span class="mf">0.3</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Fits the model according to the given training data.</span>
|
||||
|
||||
<span class="sd"> :param instances: list of lists of indexed tokens</span>
|
||||
<span class="sd"> :param labels: array-like of shape `(n_samples, n_classes)` with the class labels</span>
|
||||
<span class="sd"> :param val_split: proportion of training documents to be taken as the validation set (default 0.3)</span>
|
||||
<span class="sd"> :return:</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="n">train</span><span class="p">,</span> <span class="n">val</span> <span class="o">=</span> <span class="n">LabelledCollection</span><span class="p">(</span><span class="n">instances</span><span class="p">,</span> <span class="n">labels</span><span class="p">)</span><span class="o">.</span><span class="n">split_stratified</span><span class="p">(</span><span class="mi">1</span><span class="o">-</span><span class="n">val_split</span><span class="p">)</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">classes_</span> <span class="o">=</span> <span class="n">train</span><span class="o">.</span><span class="n">classes_</span>
|
||||
<span class="n">opt</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">trainer_hyperparams</span>
|
||||
<span class="n">checkpoint</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">checkpointpath</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">reset_net_params</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">vocab_size</span><span class="p">,</span> <span class="n">train</span><span class="o">.</span><span class="n">n_classes</span><span class="p">)</span>
|
||||
|
||||
<span class="n">train_generator</span> <span class="o">=</span> <span class="n">TorchDataset</span><span class="p">(</span><span class="n">train</span><span class="o">.</span><span class="n">instances</span><span class="p">,</span> <span class="n">train</span><span class="o">.</span><span class="n">labels</span><span class="p">)</span><span class="o">.</span><span class="n">asDataloader</span><span class="p">(</span>
|
||||
<span class="n">opt</span><span class="p">[</span><span class="s1">'batch_size'</span><span class="p">],</span> <span class="n">shuffle</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">pad_length</span><span class="o">=</span><span class="n">opt</span><span class="p">[</span><span class="s1">'padding_length'</span><span class="p">],</span> <span class="n">device</span><span class="o">=</span><span class="n">opt</span><span class="p">[</span><span class="s1">'device'</span><span class="p">])</span>
|
||||
<span class="n">valid_generator</span> <span class="o">=</span> <span class="n">TorchDataset</span><span class="p">(</span><span class="n">val</span><span class="o">.</span><span class="n">instances</span><span class="p">,</span> <span class="n">val</span><span class="o">.</span><span class="n">labels</span><span class="p">)</span><span class="o">.</span><span class="n">asDataloader</span><span class="p">(</span>
|
||||
<span class="n">opt</span><span class="p">[</span><span class="s1">'batch_size_test'</span><span class="p">],</span> <span class="n">shuffle</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span> <span class="n">pad_length</span><span class="o">=</span><span class="n">opt</span><span class="p">[</span><span class="s1">'padding_length'</span><span class="p">],</span> <span class="n">device</span><span class="o">=</span><span class="n">opt</span><span class="p">[</span><span class="s1">'device'</span><span class="p">])</span>
|
||||
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">status</span> <span class="o">=</span> <span class="p">{</span><span class="s1">'tr'</span><span class="p">:</span> <span class="p">{</span><span class="s1">'loss'</span><span class="p">:</span> <span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="s1">'acc'</span><span class="p">:</span> <span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="s1">'f1'</span><span class="p">:</span> <span class="o">-</span><span class="mi">1</span><span class="p">},</span>
|
||||
<span class="s1">'va'</span><span class="p">:</span> <span class="p">{</span><span class="s1">'loss'</span><span class="p">:</span> <span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="s1">'acc'</span><span class="p">:</span> <span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="s1">'f1'</span><span class="p">:</span> <span class="o">-</span><span class="mi">1</span><span class="p">}}</span>
|
||||
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">optim</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">optim</span><span class="o">.</span><span class="n">Adam</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">net</span><span class="o">.</span><span class="n">parameters</span><span class="p">(),</span> <span class="n">lr</span><span class="o">=</span><span class="n">opt</span><span class="p">[</span><span class="s1">'lr'</span><span class="p">],</span> <span class="n">weight_decay</span><span class="o">=</span><span class="n">opt</span><span class="p">[</span><span class="s1">'weight_decay'</span><span class="p">])</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">early_stop</span> <span class="o">=</span> <span class="n">EarlyStop</span><span class="p">(</span><span class="n">opt</span><span class="p">[</span><span class="s1">'patience'</span><span class="p">],</span> <span class="n">lower_is_better</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span>
|
||||
|
||||
<span class="k">with</span> <span class="n">tqdm</span><span class="p">(</span><span class="nb">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="n">opt</span><span class="p">[</span><span class="s1">'epochs'</span><span class="p">]</span> <span class="o">+</span> <span class="mi">1</span><span class="p">))</span> <span class="k">as</span> <span class="n">pbar</span><span class="p">:</span>
|
||||
<span class="k">for</span> <span class="n">epoch</span> <span class="ow">in</span> <span class="n">pbar</span><span class="p">:</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">_train_epoch</span><span class="p">(</span><span class="n">train_generator</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">status</span><span class="p">[</span><span class="s1">'tr'</span><span class="p">],</span> <span class="n">pbar</span><span class="p">,</span> <span class="n">epoch</span><span class="p">)</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">_test_epoch</span><span class="p">(</span><span class="n">valid_generator</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">status</span><span class="p">[</span><span class="s1">'va'</span><span class="p">],</span> <span class="n">pbar</span><span class="p">,</span> <span class="n">epoch</span><span class="p">)</span>
|
||||
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">early_stop</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">status</span><span class="p">[</span><span class="s1">'va'</span><span class="p">][</span><span class="s1">'f1'</span><span class="p">],</span> <span class="n">epoch</span><span class="p">)</span>
|
||||
<span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">early_stop</span><span class="o">.</span><span class="n">IMPROVED</span><span class="p">:</span>
|
||||
<span class="n">torch</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">net</span><span class="o">.</span><span class="n">state_dict</span><span class="p">(),</span> <span class="n">checkpoint</span><span class="p">)</span>
|
||||
<span class="k">elif</span> <span class="bp">self</span><span class="o">.</span><span class="n">early_stop</span><span class="o">.</span><span class="n">STOP</span><span class="p">:</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s1">'training ended by patience exhasted; loading best model parameters in </span><span class="si">{</span><span class="n">checkpoint</span><span class="si">}</span><span class="s1"> '</span>
|
||||
<span class="sa">f</span><span class="s1">'for epoch </span><span class="si">{</span><span class="bp">self</span><span class="o">.</span><span class="n">early_stop</span><span class="o">.</span><span class="n">best_epoch</span><span class="si">}</span><span class="s1">'</span><span class="p">)</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">net</span><span class="o">.</span><span class="n">load_state_dict</span><span class="p">(</span><span class="n">torch</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="n">checkpoint</span><span class="p">))</span>
|
||||
<span class="k">break</span>
|
||||
|
||||
<span class="nb">print</span><span class="p">(</span><span class="s1">'performing one training pass over the validation set...'</span><span class="p">)</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">_train_epoch</span><span class="p">(</span><span class="n">valid_generator</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">status</span><span class="p">[</span><span class="s1">'tr'</span><span class="p">],</span> <span class="n">pbar</span><span class="p">,</span> <span class="n">epoch</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="s1">'[done]'</span><span class="p">)</span>
|
||||
|
||||
<span class="k">return</span> <span class="bp">self</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="NeuralClassifierTrainer.predict">
|
||||
<a class="viewcode-back" href="../../../quapy.classification.html#quapy.classification.neural.NeuralClassifierTrainer.predict">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">predict</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">instances</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Predicts labels for the instances</span>
|
||||
|
||||
<span class="sd"> :param instances: list of lists of indexed tokens</span>
|
||||
<span class="sd"> :return: a `numpy` array of length `n` containing the label predictions, where `n` is the number of</span>
|
||||
<span class="sd"> instances in `X`</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">return</span> <span class="n">np</span><span class="o">.</span><span class="n">argmax</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">predict_proba</span><span class="p">(</span><span class="n">instances</span><span class="p">),</span> <span class="n">axis</span><span class="o">=-</span><span class="mi">1</span><span class="p">)</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="NeuralClassifierTrainer.predict_proba">
|
||||
<a class="viewcode-back" href="../../../quapy.classification.html#quapy.classification.neural.NeuralClassifierTrainer.predict_proba">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">predict_proba</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">instances</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Predicts posterior probabilities for the instances</span>
|
||||
|
||||
<span class="sd"> :param X: array-like of shape `(n_samples, n_features)` instances to classify</span>
|
||||
<span class="sd"> :return: array-like of shape `(n_samples, n_classes)` with the posterior probabilities</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">net</span><span class="o">.</span><span class="n">eval</span><span class="p">()</span>
|
||||
<span class="n">opt</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">trainer_hyperparams</span>
|
||||
<span class="k">with</span> <span class="n">torch</span><span class="o">.</span><span class="n">no_grad</span><span class="p">():</span>
|
||||
<span class="n">posteriors</span> <span class="o">=</span> <span class="p">[]</span>
|
||||
<span class="k">for</span> <span class="n">xi</span> <span class="ow">in</span> <span class="n">TorchDataset</span><span class="p">(</span><span class="n">instances</span><span class="p">)</span><span class="o">.</span><span class="n">asDataloader</span><span class="p">(</span>
|
||||
<span class="n">opt</span><span class="p">[</span><span class="s1">'batch_size_test'</span><span class="p">],</span> <span class="n">shuffle</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span> <span class="n">pad_length</span><span class="o">=</span><span class="n">opt</span><span class="p">[</span><span class="s1">'padding_length'</span><span class="p">],</span> <span class="n">device</span><span class="o">=</span><span class="n">opt</span><span class="p">[</span><span class="s1">'device'</span><span class="p">]):</span>
|
||||
<span class="n">posteriors</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">net</span><span class="o">.</span><span class="n">predict_proba</span><span class="p">(</span><span class="n">xi</span><span class="p">))</span>
|
||||
<span class="k">return</span> <span class="n">np</span><span class="o">.</span><span class="n">concatenate</span><span class="p">(</span><span class="n">posteriors</span><span class="p">)</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="NeuralClassifierTrainer.transform">
|
||||
<a class="viewcode-back" href="../../../quapy.classification.html#quapy.classification.neural.NeuralClassifierTrainer.transform">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">transform</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">instances</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Returns the embeddings of the instances</span>
|
||||
|
||||
<span class="sd"> :param instances: list of lists of indexed tokens</span>
|
||||
<span class="sd"> :return: array-like of shape `(n_samples, embed_size)` with the embedded instances,</span>
|
||||
<span class="sd"> where `embed_size` is defined by the classification network</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">net</span><span class="o">.</span><span class="n">eval</span><span class="p">()</span>
|
||||
<span class="n">embeddings</span> <span class="o">=</span> <span class="p">[]</span>
|
||||
<span class="n">opt</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">trainer_hyperparams</span>
|
||||
<span class="k">with</span> <span class="n">torch</span><span class="o">.</span><span class="n">no_grad</span><span class="p">():</span>
|
||||
<span class="k">for</span> <span class="n">xi</span> <span class="ow">in</span> <span class="n">TorchDataset</span><span class="p">(</span><span class="n">instances</span><span class="p">)</span><span class="o">.</span><span class="n">asDataloader</span><span class="p">(</span>
|
||||
<span class="n">opt</span><span class="p">[</span><span class="s1">'batch_size_test'</span><span class="p">],</span> <span class="n">shuffle</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span> <span class="n">pad_length</span><span class="o">=</span><span class="n">opt</span><span class="p">[</span><span class="s1">'padding_length'</span><span class="p">],</span> <span class="n">device</span><span class="o">=</span><span class="n">opt</span><span class="p">[</span><span class="s1">'device'</span><span class="p">]):</span>
|
||||
<span class="n">embeddings</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">net</span><span class="o">.</span><span class="n">document_embedding</span><span class="p">(</span><span class="n">xi</span><span class="p">)</span><span class="o">.</span><span class="n">detach</span><span class="p">()</span><span class="o">.</span><span class="n">cpu</span><span class="p">()</span><span class="o">.</span><span class="n">numpy</span><span class="p">())</span>
|
||||
<span class="k">return</span> <span class="n">np</span><span class="o">.</span><span class="n">concatenate</span><span class="p">(</span><span class="n">embeddings</span><span class="p">)</span></div>
|
||||
</div>
|
||||
|
||||
|
||||
|
||||
<div class="viewcode-block" id="TorchDataset">
|
||||
<a class="viewcode-back" href="../../../quapy.classification.html#quapy.classification.neural.TorchDataset">[docs]</a>
|
||||
<span class="k">class</span> <span class="nc">TorchDataset</span><span class="p">(</span><span class="n">torch</span><span class="o">.</span><span class="n">utils</span><span class="o">.</span><span class="n">data</span><span class="o">.</span><span class="n">Dataset</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Transforms labelled instances into a Torch's :class:`torch.utils.data.DataLoader` object</span>
|
||||
|
||||
<span class="sd"> :param instances: list of lists of indexed tokens</span>
|
||||
<span class="sd"> :param labels: array-like of shape `(n_samples, n_classes)` with the class labels</span>
|
||||
<span class="sd"> """</span>
|
||||
|
||||
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">instances</span><span class="p">,</span> <span class="n">labels</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">instances</span> <span class="o">=</span> <span class="n">instances</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">labels</span> <span class="o">=</span> <span class="n">labels</span>
|
||||
|
||||
<span class="k">def</span> <span class="fm">__len__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="k">return</span> <span class="nb">len</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">instances</span><span class="p">)</span>
|
||||
|
||||
<span class="k">def</span> <span class="fm">__getitem__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">index</span><span class="p">):</span>
|
||||
<span class="k">return</span> <span class="p">{</span><span class="s1">'doc'</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">instances</span><span class="p">[</span><span class="n">index</span><span class="p">],</span> <span class="s1">'label'</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">labels</span><span class="p">[</span><span class="n">index</span><span class="p">]</span> <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">labels</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span> <span class="k">else</span> <span class="kc">None</span><span class="p">}</span>
|
||||
|
||||
<div class="viewcode-block" id="TorchDataset.asDataloader">
|
||||
<a class="viewcode-back" href="../../../quapy.classification.html#quapy.classification.neural.TorchDataset.asDataloader">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">asDataloader</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">batch_size</span><span class="p">,</span> <span class="n">shuffle</span><span class="p">,</span> <span class="n">pad_length</span><span class="p">,</span> <span class="n">device</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Converts the labelled collection into a Torch DataLoader with dynamic padding for</span>
|
||||
<span class="sd"> the batch</span>
|
||||
|
||||
<span class="sd"> :param batch_size: batch size</span>
|
||||
<span class="sd"> :param shuffle: whether or not to shuffle instances</span>
|
||||
<span class="sd"> :param pad_length: the maximum length for the list of tokens (dynamic padding is</span>
|
||||
<span class="sd"> applied, meaning that if the longest document in the batch is shorter than</span>
|
||||
<span class="sd"> `pad_length`, then the batch is padded up to its length, and not to `pad_length`.</span>
|
||||
<span class="sd"> :param device: whether to allocate tensors in cpu or in cuda</span>
|
||||
<span class="sd"> :return: a :class:`torch.utils.data.DataLoader` object</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">def</span> <span class="nf">collate</span><span class="p">(</span><span class="n">batch</span><span class="p">):</span>
|
||||
<span class="n">data</span> <span class="o">=</span> <span class="p">[</span><span class="n">torch</span><span class="o">.</span><span class="n">LongTensor</span><span class="p">(</span><span class="n">item</span><span class="p">[</span><span class="s1">'doc'</span><span class="p">][:</span><span class="n">pad_length</span><span class="p">])</span> <span class="k">for</span> <span class="n">item</span> <span class="ow">in</span> <span class="n">batch</span><span class="p">]</span>
|
||||
<span class="n">data</span> <span class="o">=</span> <span class="n">pad_sequence</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="n">batch_first</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">padding_value</span><span class="o">=</span><span class="n">qp</span><span class="o">.</span><span class="n">environ</span><span class="p">[</span><span class="s1">'PAD_INDEX'</span><span class="p">])</span><span class="o">.</span><span class="n">to</span><span class="p">(</span><span class="n">device</span><span class="p">)</span>
|
||||
<span class="n">targets</span> <span class="o">=</span> <span class="p">[</span><span class="n">item</span><span class="p">[</span><span class="s1">'label'</span><span class="p">]</span> <span class="k">for</span> <span class="n">item</span> <span class="ow">in</span> <span class="n">batch</span><span class="p">]</span>
|
||||
<span class="k">if</span> <span class="n">targets</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">:</span>
|
||||
<span class="k">return</span> <span class="n">data</span>
|
||||
<span class="k">else</span><span class="p">:</span>
|
||||
<span class="n">targets</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">as_tensor</span><span class="p">(</span><span class="n">targets</span><span class="p">,</span> <span class="n">dtype</span><span class="o">=</span><span class="n">torch</span><span class="o">.</span><span class="n">long</span><span class="p">)</span><span class="o">.</span><span class="n">to</span><span class="p">(</span><span class="n">device</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="p">[</span><span class="n">data</span><span class="p">,</span> <span class="n">targets</span><span class="p">]</span>
|
||||
|
||||
<span class="n">torchDataset</span> <span class="o">=</span> <span class="n">TorchDataset</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">instances</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">labels</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="n">torch</span><span class="o">.</span><span class="n">utils</span><span class="o">.</span><span class="n">data</span><span class="o">.</span><span class="n">DataLoader</span><span class="p">(</span><span class="n">torchDataset</span><span class="p">,</span> <span class="n">batch_size</span><span class="o">=</span><span class="n">batch_size</span><span class="p">,</span> <span class="n">shuffle</span><span class="o">=</span><span class="n">shuffle</span><span class="p">,</span> <span class="n">collate_fn</span><span class="o">=</span><span class="n">collate</span><span class="p">)</span></div>
|
||||
</div>
|
||||
|
||||
|
||||
|
||||
<div class="viewcode-block" id="TextClassifierNet">
|
||||
<a class="viewcode-back" href="../../../quapy.classification.html#quapy.classification.neural.TextClassifierNet">[docs]</a>
|
||||
<span class="k">class</span> <span class="nc">TextClassifierNet</span><span class="p">(</span><span class="n">torch</span><span class="o">.</span><span class="n">nn</span><span class="o">.</span><span class="n">Module</span><span class="p">,</span> <span class="n">metaclass</span><span class="o">=</span><span class="n">ABCMeta</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Abstract Text classifier (`torch.nn.Module`)</span>
|
||||
<span class="sd"> """</span>
|
||||
|
||||
<div class="viewcode-block" id="TextClassifierNet.document_embedding">
|
||||
<a class="viewcode-back" href="../../../quapy.classification.html#quapy.classification.neural.TextClassifierNet.document_embedding">[docs]</a>
|
||||
<span class="nd">@abstractmethod</span>
|
||||
<span class="k">def</span> <span class="nf">document_embedding</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">x</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""Embeds documents (i.e., performs the forward pass up to the</span>
|
||||
<span class="sd"> next-to-last layer).</span>
|
||||
|
||||
<span class="sd"> :param x: a batch of instances, typically generated by a torch's `DataLoader`</span>
|
||||
<span class="sd"> instance (see :class:`quapy.classification.neural.TorchDataset`)</span>
|
||||
<span class="sd"> :return: a torch tensor of shape `(n_samples, n_dimensions)`, where</span>
|
||||
<span class="sd"> `n_samples` is the number of documents, and `n_dimensions` is the</span>
|
||||
<span class="sd"> dimensionality of the embedding</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="o">...</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="TextClassifierNet.forward">
|
||||
<a class="viewcode-back" href="../../../quapy.classification.html#quapy.classification.neural.TextClassifierNet.forward">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">forward</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">x</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""Performs the forward pass.</span>
|
||||
|
||||
<span class="sd"> :param x: a batch of instances, typically generated by a torch's `DataLoader`</span>
|
||||
<span class="sd"> instance (see :class:`quapy.classification.neural.TorchDataset`)</span>
|
||||
<span class="sd"> :return: a tensor of shape `(n_instances, n_classes)` with the decision scores</span>
|
||||
<span class="sd"> for each of the instances and classes</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="n">doc_embedded</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">document_embedding</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">output</span><span class="p">(</span><span class="n">doc_embedded</span><span class="p">)</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="TextClassifierNet.dimensions">
|
||||
<a class="viewcode-back" href="../../../quapy.classification.html#quapy.classification.neural.TextClassifierNet.dimensions">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">dimensions</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""Gets the number of dimensions of the embedding space</span>
|
||||
|
||||
<span class="sd"> :return: integer</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">dim</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="TextClassifierNet.predict_proba">
|
||||
<a class="viewcode-back" href="../../../quapy.classification.html#quapy.classification.neural.TextClassifierNet.predict_proba">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">predict_proba</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">x</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Predicts posterior probabilities for the instances in `x`</span>
|
||||
|
||||
<span class="sd"> :param x: a torch tensor of indexed tokens with shape `(n_instances, pad_length)`</span>
|
||||
<span class="sd"> where `n_instances` is the number of instances in the batch, and `pad_length`</span>
|
||||
<span class="sd"> is length of the pad in the batch</span>
|
||||
<span class="sd"> :return: array-like of shape `(n_samples, n_classes)` with the posterior probabilities</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="n">logits</span> <span class="o">=</span> <span class="bp">self</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="n">torch</span><span class="o">.</span><span class="n">softmax</span><span class="p">(</span><span class="n">logits</span><span class="p">,</span> <span class="n">dim</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span><span class="o">.</span><span class="n">detach</span><span class="p">()</span><span class="o">.</span><span class="n">cpu</span><span class="p">()</span><span class="o">.</span><span class="n">numpy</span><span class="p">()</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="TextClassifierNet.xavier_uniform">
|
||||
<a class="viewcode-back" href="../../../quapy.classification.html#quapy.classification.neural.TextClassifierNet.xavier_uniform">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">xavier_uniform</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Performs Xavier initialization of the network parameters</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">for</span> <span class="n">p</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">parameters</span><span class="p">():</span>
|
||||
<span class="k">if</span> <span class="n">p</span><span class="o">.</span><span class="n">dim</span><span class="p">()</span> <span class="o">></span> <span class="mi">1</span> <span class="ow">and</span> <span class="n">p</span><span class="o">.</span><span class="n">requires_grad</span><span class="p">:</span>
|
||||
<span class="n">torch</span><span class="o">.</span><span class="n">nn</span><span class="o">.</span><span class="n">init</span><span class="o">.</span><span class="n">xavier_uniform_</span><span class="p">(</span><span class="n">p</span><span class="p">)</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="TextClassifierNet.get_params">
|
||||
<a class="viewcode-back" href="../../../quapy.classification.html#quapy.classification.neural.TextClassifierNet.get_params">[docs]</a>
|
||||
<span class="nd">@abstractmethod</span>
|
||||
<span class="k">def</span> <span class="nf">get_params</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Get hyper-parameters for this estimator</span>
|
||||
|
||||
<span class="sd"> :return: a dictionary with parameter names mapped to their values</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="o">...</span></div>
|
||||
|
||||
|
||||
<span class="nd">@property</span>
|
||||
<span class="k">def</span> <span class="nf">vocabulary_size</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Return the size of the vocabulary</span>
|
||||
|
||||
<span class="sd"> :return: integer</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="o">...</span></div>
|
||||
|
||||
|
||||
|
||||
<div class="viewcode-block" id="LSTMnet">
|
||||
<a class="viewcode-back" href="../../../quapy.classification.html#quapy.classification.neural.LSTMnet">[docs]</a>
|
||||
<span class="k">class</span> <span class="nc">LSTMnet</span><span class="p">(</span><span class="n">TextClassifierNet</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> An implementation of :class:`quapy.classification.neural.TextClassifierNet` based on</span>
|
||||
<span class="sd"> Long Short Term Memory networks.</span>
|
||||
|
||||
<span class="sd"> :param vocabulary_size: the size of the vocabulary</span>
|
||||
<span class="sd"> :param n_classes: number of target classes</span>
|
||||
<span class="sd"> :param embedding_size: the dimensionality of the word embeddings space (default 100)</span>
|
||||
<span class="sd"> :param hidden_size: the dimensionality of the hidden space (default 256)</span>
|
||||
<span class="sd"> :param repr_size: the dimensionality of the document embeddings space (default 100)</span>
|
||||
<span class="sd"> :param lstm_class_nlayers: number of LSTM layers (default 1)</span>
|
||||
<span class="sd"> :param drop_p: drop probability for dropout (default 0.5)</span>
|
||||
<span class="sd"> """</span>
|
||||
|
||||
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">vocabulary_size</span><span class="p">,</span> <span class="n">n_classes</span><span class="p">,</span> <span class="n">embedding_size</span><span class="o">=</span><span class="mi">100</span><span class="p">,</span> <span class="n">hidden_size</span><span class="o">=</span><span class="mi">256</span><span class="p">,</span> <span class="n">repr_size</span><span class="o">=</span><span class="mi">100</span><span class="p">,</span> <span class="n">lstm_class_nlayers</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span>
|
||||
<span class="n">drop_p</span><span class="o">=</span><span class="mf">0.5</span><span class="p">):</span>
|
||||
|
||||
<span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="fm">__init__</span><span class="p">()</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">vocabulary_size_</span> <span class="o">=</span> <span class="n">vocabulary_size</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">n_classes</span> <span class="o">=</span> <span class="n">n_classes</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">hyperparams</span><span class="o">=</span><span class="p">{</span>
|
||||
<span class="s1">'embedding_size'</span><span class="p">:</span> <span class="n">embedding_size</span><span class="p">,</span>
|
||||
<span class="s1">'hidden_size'</span><span class="p">:</span> <span class="n">hidden_size</span><span class="p">,</span>
|
||||
<span class="s1">'repr_size'</span><span class="p">:</span> <span class="n">repr_size</span><span class="p">,</span>
|
||||
<span class="s1">'lstm_class_nlayers'</span><span class="p">:</span> <span class="n">lstm_class_nlayers</span><span class="p">,</span>
|
||||
<span class="s1">'drop_p'</span><span class="p">:</span> <span class="n">drop_p</span>
|
||||
<span class="p">}</span>
|
||||
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">word_embedding</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">nn</span><span class="o">.</span><span class="n">Embedding</span><span class="p">(</span><span class="n">vocabulary_size</span><span class="p">,</span> <span class="n">embedding_size</span><span class="p">)</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">lstm</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">nn</span><span class="o">.</span><span class="n">LSTM</span><span class="p">(</span><span class="n">embedding_size</span><span class="p">,</span> <span class="n">hidden_size</span><span class="p">,</span> <span class="n">lstm_class_nlayers</span><span class="p">,</span> <span class="n">dropout</span><span class="o">=</span><span class="n">drop_p</span><span class="p">,</span> <span class="n">batch_first</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">dropout</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">nn</span><span class="o">.</span><span class="n">Dropout</span><span class="p">(</span><span class="n">drop_p</span><span class="p">)</span>
|
||||
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">dim</span> <span class="o">=</span> <span class="n">repr_size</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">doc_embedder</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">nn</span><span class="o">.</span><span class="n">Linear</span><span class="p">(</span><span class="n">hidden_size</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">dim</span><span class="p">)</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">output</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">nn</span><span class="o">.</span><span class="n">Linear</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">dim</span><span class="p">,</span> <span class="n">n_classes</span><span class="p">)</span>
|
||||
|
||||
<span class="k">def</span> <span class="nf">__init_hidden</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">set_size</span><span class="p">):</span>
|
||||
<span class="n">opt</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">hyperparams</span>
|
||||
<span class="n">var_hidden</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">zeros</span><span class="p">(</span><span class="n">opt</span><span class="p">[</span><span class="s1">'lstm_class_nlayers'</span><span class="p">],</span> <span class="n">set_size</span><span class="p">,</span> <span class="n">opt</span><span class="p">[</span><span class="s1">'hidden_size'</span><span class="p">])</span>
|
||||
<span class="n">var_cell</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">zeros</span><span class="p">(</span><span class="n">opt</span><span class="p">[</span><span class="s1">'lstm_class_nlayers'</span><span class="p">],</span> <span class="n">set_size</span><span class="p">,</span> <span class="n">opt</span><span class="p">[</span><span class="s1">'hidden_size'</span><span class="p">])</span>
|
||||
<span class="k">if</span> <span class="nb">next</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">lstm</span><span class="o">.</span><span class="n">parameters</span><span class="p">())</span><span class="o">.</span><span class="n">is_cuda</span><span class="p">:</span>
|
||||
<span class="n">var_hidden</span><span class="p">,</span> <span class="n">var_cell</span> <span class="o">=</span> <span class="n">var_hidden</span><span class="o">.</span><span class="n">cuda</span><span class="p">(),</span> <span class="n">var_cell</span><span class="o">.</span><span class="n">cuda</span><span class="p">()</span>
|
||||
<span class="k">return</span> <span class="n">var_hidden</span><span class="p">,</span> <span class="n">var_cell</span>
|
||||
|
||||
<div class="viewcode-block" id="LSTMnet.document_embedding">
|
||||
<a class="viewcode-back" href="../../../quapy.classification.html#quapy.classification.neural.LSTMnet.document_embedding">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">document_embedding</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">x</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""Embeds documents (i.e., performs the forward pass up to the</span>
|
||||
<span class="sd"> next-to-last layer).</span>
|
||||
|
||||
<span class="sd"> :param x: a batch of instances, typically generated by a torch's `DataLoader`</span>
|
||||
<span class="sd"> instance (see :class:`quapy.classification.neural.TorchDataset`)</span>
|
||||
<span class="sd"> :return: a torch tensor of shape `(n_samples, n_dimensions)`, where</span>
|
||||
<span class="sd"> `n_samples` is the number of documents, and `n_dimensions` is the</span>
|
||||
<span class="sd"> dimensionality of the embedding</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="n">embedded</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">word_embedding</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
|
||||
<span class="n">rnn_output</span><span class="p">,</span> <span class="n">rnn_hidden</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">lstm</span><span class="p">(</span><span class="n">embedded</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">__init_hidden</span><span class="p">(</span><span class="n">x</span><span class="o">.</span><span class="n">size</span><span class="p">()[</span><span class="mi">0</span><span class="p">]))</span>
|
||||
<span class="n">abstracted</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">dropout</span><span class="p">(</span><span class="n">F</span><span class="o">.</span><span class="n">relu</span><span class="p">(</span><span class="n">rnn_hidden</span><span class="p">[</span><span class="mi">0</span><span class="p">][</span><span class="o">-</span><span class="mi">1</span><span class="p">]))</span>
|
||||
<span class="n">abstracted</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">doc_embedder</span><span class="p">(</span><span class="n">abstracted</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="n">abstracted</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="LSTMnet.get_params">
|
||||
<a class="viewcode-back" href="../../../quapy.classification.html#quapy.classification.neural.LSTMnet.get_params">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">get_params</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Get hyper-parameters for this estimator</span>
|
||||
|
||||
<span class="sd"> :return: a dictionary with parameter names mapped to their values</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">hyperparams</span></div>
|
||||
|
||||
|
||||
<span class="nd">@property</span>
|
||||
<span class="k">def</span> <span class="nf">vocabulary_size</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Return the size of the vocabulary</span>
|
||||
|
||||
<span class="sd"> :return: integer</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">vocabulary_size_</span></div>
|
||||
|
||||
|
||||
|
||||
<div class="viewcode-block" id="CNNnet">
|
||||
<a class="viewcode-back" href="../../../quapy.classification.html#quapy.classification.neural.CNNnet">[docs]</a>
|
||||
<span class="k">class</span> <span class="nc">CNNnet</span><span class="p">(</span><span class="n">TextClassifierNet</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> An implementation of :class:`quapy.classification.neural.TextClassifierNet` based on</span>
|
||||
<span class="sd"> Convolutional Neural Networks.</span>
|
||||
|
||||
<span class="sd"> :param vocabulary_size: the size of the vocabulary</span>
|
||||
<span class="sd"> :param n_classes: number of target classes</span>
|
||||
<span class="sd"> :param embedding_size: the dimensionality of the word embeddings space (default 100)</span>
|
||||
<span class="sd"> :param hidden_size: the dimensionality of the hidden space (default 256)</span>
|
||||
<span class="sd"> :param repr_size: the dimensionality of the document embeddings space (default 100)</span>
|
||||
<span class="sd"> :param kernel_heights: list of kernel lengths (default [3,5,7]), i.e., the number of</span>
|
||||
<span class="sd"> consecutive tokens that each kernel covers</span>
|
||||
<span class="sd"> :param stride: convolutional stride (default 1)</span>
|
||||
<span class="sd"> :param stride: convolutional pad (default 0)</span>
|
||||
<span class="sd"> :param drop_p: drop probability for dropout (default 0.5)</span>
|
||||
<span class="sd"> """</span>
|
||||
|
||||
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">vocabulary_size</span><span class="p">,</span> <span class="n">n_classes</span><span class="p">,</span> <span class="n">embedding_size</span><span class="o">=</span><span class="mi">100</span><span class="p">,</span> <span class="n">hidden_size</span><span class="o">=</span><span class="mi">256</span><span class="p">,</span> <span class="n">repr_size</span><span class="o">=</span><span class="mi">100</span><span class="p">,</span>
|
||||
<span class="n">kernel_heights</span><span class="o">=</span><span class="p">[</span><span class="mi">3</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">7</span><span class="p">],</span> <span class="n">stride</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">padding</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">drop_p</span><span class="o">=</span><span class="mf">0.5</span><span class="p">):</span>
|
||||
<span class="nb">super</span><span class="p">(</span><span class="n">CNNnet</span><span class="p">,</span> <span class="bp">self</span><span class="p">)</span><span class="o">.</span><span class="fm">__init__</span><span class="p">()</span>
|
||||
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">vocabulary_size_</span> <span class="o">=</span> <span class="n">vocabulary_size</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">n_classes</span> <span class="o">=</span> <span class="n">n_classes</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">hyperparams</span><span class="o">=</span><span class="p">{</span>
|
||||
<span class="s1">'embedding_size'</span><span class="p">:</span> <span class="n">embedding_size</span><span class="p">,</span>
|
||||
<span class="s1">'hidden_size'</span><span class="p">:</span> <span class="n">hidden_size</span><span class="p">,</span>
|
||||
<span class="s1">'repr_size'</span><span class="p">:</span> <span class="n">repr_size</span><span class="p">,</span>
|
||||
<span class="s1">'kernel_heights'</span><span class="p">:</span><span class="n">kernel_heights</span><span class="p">,</span>
|
||||
<span class="s1">'stride'</span><span class="p">:</span> <span class="n">stride</span><span class="p">,</span>
|
||||
<span class="s1">'drop_p'</span><span class="p">:</span> <span class="n">drop_p</span>
|
||||
<span class="p">}</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">word_embedding</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">nn</span><span class="o">.</span><span class="n">Embedding</span><span class="p">(</span><span class="n">vocabulary_size</span><span class="p">,</span> <span class="n">embedding_size</span><span class="p">)</span>
|
||||
<span class="n">in_channels</span> <span class="o">=</span> <span class="mi">1</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">conv1</span> <span class="o">=</span> <span class="n">nn</span><span class="o">.</span><span class="n">Conv2d</span><span class="p">(</span><span class="n">in_channels</span><span class="p">,</span> <span class="n">hidden_size</span><span class="p">,</span> <span class="p">(</span><span class="n">kernel_heights</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">embedding_size</span><span class="p">),</span> <span class="n">stride</span><span class="p">,</span> <span class="n">padding</span><span class="p">)</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">conv2</span> <span class="o">=</span> <span class="n">nn</span><span class="o">.</span><span class="n">Conv2d</span><span class="p">(</span><span class="n">in_channels</span><span class="p">,</span> <span class="n">hidden_size</span><span class="p">,</span> <span class="p">(</span><span class="n">kernel_heights</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="n">embedding_size</span><span class="p">),</span> <span class="n">stride</span><span class="p">,</span> <span class="n">padding</span><span class="p">)</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">conv3</span> <span class="o">=</span> <span class="n">nn</span><span class="o">.</span><span class="n">Conv2d</span><span class="p">(</span><span class="n">in_channels</span><span class="p">,</span> <span class="n">hidden_size</span><span class="p">,</span> <span class="p">(</span><span class="n">kernel_heights</span><span class="p">[</span><span class="mi">2</span><span class="p">],</span> <span class="n">embedding_size</span><span class="p">),</span> <span class="n">stride</span><span class="p">,</span> <span class="n">padding</span><span class="p">)</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">dropout</span> <span class="o">=</span> <span class="n">nn</span><span class="o">.</span><span class="n">Dropout</span><span class="p">(</span><span class="n">drop_p</span><span class="p">)</span>
|
||||
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">dim</span> <span class="o">=</span> <span class="n">repr_size</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">doc_embedder</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">nn</span><span class="o">.</span><span class="n">Linear</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">kernel_heights</span><span class="p">)</span> <span class="o">*</span> <span class="n">hidden_size</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">dim</span><span class="p">)</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">output</span> <span class="o">=</span> <span class="n">nn</span><span class="o">.</span><span class="n">Linear</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">dim</span><span class="p">,</span> <span class="n">n_classes</span><span class="p">)</span>
|
||||
|
||||
<span class="k">def</span> <span class="nf">__conv_block</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="nb">input</span><span class="p">,</span> <span class="n">conv_layer</span><span class="p">):</span>
|
||||
<span class="n">conv_out</span> <span class="o">=</span> <span class="n">conv_layer</span><span class="p">(</span><span class="nb">input</span><span class="p">)</span> <span class="c1"># conv_out.size() = (batch_size, out_channels, dim, 1)</span>
|
||||
<span class="n">activation</span> <span class="o">=</span> <span class="n">F</span><span class="o">.</span><span class="n">relu</span><span class="p">(</span><span class="n">conv_out</span><span class="o">.</span><span class="n">squeeze</span><span class="p">(</span><span class="mi">3</span><span class="p">))</span> <span class="c1"># activation.size() = (batch_size, out_channels, dim1)</span>
|
||||
<span class="n">max_out</span> <span class="o">=</span> <span class="n">F</span><span class="o">.</span><span class="n">max_pool1d</span><span class="p">(</span><span class="n">activation</span><span class="p">,</span> <span class="n">activation</span><span class="o">.</span><span class="n">size</span><span class="p">()[</span><span class="mi">2</span><span class="p">])</span><span class="o">.</span><span class="n">squeeze</span><span class="p">(</span><span class="mi">2</span><span class="p">)</span> <span class="c1"># maxpool_out.size() = (batch_size, out_channels)</span>
|
||||
<span class="k">return</span> <span class="n">max_out</span>
|
||||
|
||||
<div class="viewcode-block" id="CNNnet.document_embedding">
|
||||
<a class="viewcode-back" href="../../../quapy.classification.html#quapy.classification.neural.CNNnet.document_embedding">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">document_embedding</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="nb">input</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""Embeds documents (i.e., performs the forward pass up to the</span>
|
||||
<span class="sd"> next-to-last layer).</span>
|
||||
|
||||
<span class="sd"> :param input: a batch of instances, typically generated by a torch's `DataLoader`</span>
|
||||
<span class="sd"> instance (see :class:`quapy.classification.neural.TorchDataset`)</span>
|
||||
<span class="sd"> :return: a torch tensor of shape `(n_samples, n_dimensions)`, where</span>
|
||||
<span class="sd"> `n_samples` is the number of documents, and `n_dimensions` is the</span>
|
||||
<span class="sd"> dimensionality of the embedding</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="nb">input</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">word_embedding</span><span class="p">(</span><span class="nb">input</span><span class="p">)</span>
|
||||
<span class="nb">input</span> <span class="o">=</span> <span class="nb">input</span><span class="o">.</span><span class="n">unsqueeze</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span> <span class="c1"># input.size() = (batch_size, 1, num_seq, embedding_length)</span>
|
||||
|
||||
<span class="n">max_out1</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">__conv_block</span><span class="p">(</span><span class="nb">input</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">conv1</span><span class="p">)</span>
|
||||
<span class="n">max_out2</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">__conv_block</span><span class="p">(</span><span class="nb">input</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">conv2</span><span class="p">)</span>
|
||||
<span class="n">max_out3</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">__conv_block</span><span class="p">(</span><span class="nb">input</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">conv3</span><span class="p">)</span>
|
||||
|
||||
<span class="n">all_out</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">cat</span><span class="p">((</span><span class="n">max_out1</span><span class="p">,</span> <span class="n">max_out2</span><span class="p">,</span> <span class="n">max_out3</span><span class="p">),</span> <span class="mi">1</span><span class="p">)</span> <span class="c1"># all_out.size() = (batch_size, num_kernels*out_channels)</span>
|
||||
<span class="n">abstracted</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">dropout</span><span class="p">(</span><span class="n">F</span><span class="o">.</span><span class="n">relu</span><span class="p">(</span><span class="n">all_out</span><span class="p">))</span> <span class="c1"># (batch_size, num_kernels*out_channels)</span>
|
||||
<span class="n">abstracted</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">doc_embedder</span><span class="p">(</span><span class="n">abstracted</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="n">abstracted</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="CNNnet.get_params">
|
||||
<a class="viewcode-back" href="../../../quapy.classification.html#quapy.classification.neural.CNNnet.get_params">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">get_params</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Get hyper-parameters for this estimator</span>
|
||||
|
||||
<span class="sd"> :return: a dictionary with parameter names mapped to their values</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">hyperparams</span></div>
|
||||
|
||||
|
||||
<span class="nd">@property</span>
|
||||
<span class="k">def</span> <span class="nf">vocabulary_size</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Return the size of the vocabulary</span>
|
||||
|
||||
<span class="sd"> :return: integer</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">vocabulary_size_</span></div>
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
</pre></div>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
<footer>
|
||||
|
||||
<hr/>
|
||||
|
||||
<div role="contentinfo">
|
||||
<p>© Copyright 2024, Alejandro Moreo.</p>
|
||||
</div>
|
||||
|
||||
Built with <a href="https://www.sphinx-doc.org/">Sphinx</a> using a
|
||||
<a href="https://github.com/readthedocs/sphinx_rtd_theme">theme</a>
|
||||
provided by <a href="https://readthedocs.org">Read the Docs</a>.
|
||||
|
||||
|
||||
</footer>
|
||||
</div>
|
||||
</div>
|
||||
</section>
|
||||
</div>
|
||||
<script>
|
||||
jQuery(function () {
|
||||
SphinxRtdTheme.Navigation.enable(true);
|
||||
});
|
||||
</script>
|
||||
|
||||
</body>
|
||||
</html>
|
||||
|
|
@ -0,0 +1,268 @@
|
|||
<!DOCTYPE html>
|
||||
<html class="writer-html5" lang="en" data-content_root="../../../">
|
||||
<head>
|
||||
<meta charset="utf-8" />
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
|
||||
<title>quapy.classification.svmperf — QuaPy: A Python-based open-source framework for quantification 0.1.8 documentation</title>
|
||||
<link rel="stylesheet" type="text/css" href="../../../_static/pygments.css?v=92fd9be5" />
|
||||
<link rel="stylesheet" type="text/css" href="../../../_static/css/theme.css?v=19f00094" />
|
||||
|
||||
|
||||
<!--[if lt IE 9]>
|
||||
<script src="../../../_static/js/html5shiv.min.js"></script>
|
||||
<![endif]-->
|
||||
|
||||
<script src="../../../_static/jquery.js?v=5d32c60e"></script>
|
||||
<script src="../../../_static/_sphinx_javascript_frameworks_compat.js?v=2cd50e6c"></script>
|
||||
<script src="../../../_static/documentation_options.js?v=22607128"></script>
|
||||
<script src="../../../_static/doctools.js?v=9a2dae69"></script>
|
||||
<script src="../../../_static/sphinx_highlight.js?v=dc90522c"></script>
|
||||
<script src="../../../_static/js/theme.js"></script>
|
||||
<link rel="index" title="Index" href="../../../genindex.html" />
|
||||
<link rel="search" title="Search" href="../../../search.html" />
|
||||
</head>
|
||||
|
||||
<body class="wy-body-for-nav">
|
||||
<div class="wy-grid-for-nav">
|
||||
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
|
||||
<div class="wy-side-scroll">
|
||||
<div class="wy-side-nav-search" >
|
||||
|
||||
|
||||
|
||||
<a href="../../../index.html" class="icon icon-home">
|
||||
QuaPy: A Python-based open-source framework for quantification
|
||||
</a>
|
||||
<div role="search">
|
||||
<form id="rtd-search-form" class="wy-form" action="../../../search.html" method="get">
|
||||
<input type="text" name="q" placeholder="Search docs" aria-label="Search docs" />
|
||||
<input type="hidden" name="check_keywords" value="yes" />
|
||||
<input type="hidden" name="area" value="default" />
|
||||
</form>
|
||||
</div>
|
||||
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../../../modules.html">quapy</a></li>
|
||||
</ul>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
</nav>
|
||||
|
||||
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" >
|
||||
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
|
||||
<a href="../../../index.html">QuaPy: A Python-based open-source framework for quantification</a>
|
||||
</nav>
|
||||
|
||||
<div class="wy-nav-content">
|
||||
<div class="rst-content">
|
||||
<div role="navigation" aria-label="Page navigation">
|
||||
<ul class="wy-breadcrumbs">
|
||||
<li><a href="../../../index.html" class="icon icon-home" aria-label="Home"></a></li>
|
||||
<li class="breadcrumb-item"><a href="../../index.html">Module code</a></li>
|
||||
<li class="breadcrumb-item active">quapy.classification.svmperf</li>
|
||||
<li class="wy-breadcrumbs-aside">
|
||||
</li>
|
||||
</ul>
|
||||
<hr/>
|
||||
</div>
|
||||
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
|
||||
<div itemprop="articleBody">
|
||||
|
||||
<h1>Source code for quapy.classification.svmperf</h1><div class="highlight"><pre>
|
||||
<span></span><span class="kn">import</span> <span class="nn">random</span>
|
||||
<span class="kn">import</span> <span class="nn">shutil</span>
|
||||
<span class="kn">import</span> <span class="nn">subprocess</span>
|
||||
<span class="kn">import</span> <span class="nn">tempfile</span>
|
||||
<span class="kn">from</span> <span class="nn">os</span> <span class="kn">import</span> <span class="n">remove</span><span class="p">,</span> <span class="n">makedirs</span>
|
||||
<span class="kn">from</span> <span class="nn">os.path</span> <span class="kn">import</span> <span class="n">join</span><span class="p">,</span> <span class="n">exists</span>
|
||||
<span class="kn">from</span> <span class="nn">subprocess</span> <span class="kn">import</span> <span class="n">PIPE</span><span class="p">,</span> <span class="n">STDOUT</span>
|
||||
<span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span>
|
||||
<span class="kn">from</span> <span class="nn">sklearn.base</span> <span class="kn">import</span> <span class="n">BaseEstimator</span><span class="p">,</span> <span class="n">ClassifierMixin</span>
|
||||
<span class="kn">from</span> <span class="nn">sklearn.datasets</span> <span class="kn">import</span> <span class="n">dump_svmlight_file</span>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="SVMperf">
|
||||
<a class="viewcode-back" href="../../../quapy.classification.html#quapy.classification.svmperf.SVMperf">[docs]</a>
|
||||
<span class="k">class</span> <span class="nc">SVMperf</span><span class="p">(</span><span class="n">BaseEstimator</span><span class="p">,</span> <span class="n">ClassifierMixin</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""A wrapper for the `SVM-perf package <https://www.cs.cornell.edu/people/tj/svm_light/svm_perf.html>`__ by Thorsten Joachims.</span>
|
||||
<span class="sd"> When using losses for quantification, the source code has to be patched. See</span>
|
||||
<span class="sd"> the `installation documentation <https://hlt-isti.github.io/QuaPy/build/html/Installation.html#svm-perf-with-quantification-oriented-losses>`__</span>
|
||||
<span class="sd"> for further details.</span>
|
||||
|
||||
<span class="sd"> References:</span>
|
||||
|
||||
<span class="sd"> * `Esuli et al.2015 <https://dl.acm.org/doi/abs/10.1145/2700406?casa_token=8D2fHsGCVn0AAAAA:ZfThYOvrzWxMGfZYlQW_y8Cagg-o_l6X_PcF09mdETQ4Tu7jK98mxFbGSXp9ZSO14JkUIYuDGFG0>`__</span>
|
||||
<span class="sd"> * `Barranquero et al.2015 <https://www.sciencedirect.com/science/article/abs/pii/S003132031400291X>`__</span>
|
||||
|
||||
<span class="sd"> :param svmperf_base: path to directory containing the binary files `svm_perf_learn` and `svm_perf_classify`</span>
|
||||
<span class="sd"> :param C: trade-off between training error and margin (default 0.01)</span>
|
||||
<span class="sd"> :param verbose: set to True to print svm-perf std outputs</span>
|
||||
<span class="sd"> :param loss: the loss to optimize for. Available losses are "01", "f1", "kld", "nkld", "q", "qacc", "qf1", "qgm", "mae", "mrae".</span>
|
||||
<span class="sd"> :param host_folder: directory where to store the trained model; set to None (default) for using a tmp directory</span>
|
||||
<span class="sd"> (temporal directories are automatically deleted)</span>
|
||||
<span class="sd"> """</span>
|
||||
|
||||
<span class="c1"># losses with their respective codes in svm_perf implementation</span>
|
||||
<span class="n">valid_losses</span> <span class="o">=</span> <span class="p">{</span><span class="s1">'01'</span><span class="p">:</span><span class="mi">0</span><span class="p">,</span> <span class="s1">'f1'</span><span class="p">:</span><span class="mi">1</span><span class="p">,</span> <span class="s1">'kld'</span><span class="p">:</span><span class="mi">12</span><span class="p">,</span> <span class="s1">'nkld'</span><span class="p">:</span><span class="mi">13</span><span class="p">,</span> <span class="s1">'q'</span><span class="p">:</span><span class="mi">22</span><span class="p">,</span> <span class="s1">'qacc'</span><span class="p">:</span><span class="mi">23</span><span class="p">,</span> <span class="s1">'qf1'</span><span class="p">:</span><span class="mi">24</span><span class="p">,</span> <span class="s1">'qgm'</span><span class="p">:</span><span class="mi">25</span><span class="p">,</span> <span class="s1">'mae'</span><span class="p">:</span><span class="mi">26</span><span class="p">,</span> <span class="s1">'mrae'</span><span class="p">:</span><span class="mi">27</span><span class="p">}</span>
|
||||
|
||||
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">svmperf_base</span><span class="p">,</span> <span class="n">C</span><span class="o">=</span><span class="mf">0.01</span><span class="p">,</span> <span class="n">verbose</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span> <span class="n">loss</span><span class="o">=</span><span class="s1">'01'</span><span class="p">,</span> <span class="n">host_folder</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
|
||||
<span class="k">assert</span> <span class="n">exists</span><span class="p">(</span><span class="n">svmperf_base</span><span class="p">),</span> <span class="sa">f</span><span class="s1">'path </span><span class="si">{</span><span class="n">svmperf_base</span><span class="si">}</span><span class="s1"> does not seem to point to a valid path'</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">svmperf_base</span> <span class="o">=</span> <span class="n">svmperf_base</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">C</span> <span class="o">=</span> <span class="n">C</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">verbose</span> <span class="o">=</span> <span class="n">verbose</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">loss</span> <span class="o">=</span> <span class="n">loss</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">host_folder</span> <span class="o">=</span> <span class="n">host_folder</span>
|
||||
|
||||
<span class="c1"># def set_params(self, **parameters):</span>
|
||||
<span class="c1"># """</span>
|
||||
<span class="c1"># Set the hyper-parameters for svm-perf. Currently, only the `C` and `loss` parameters are supported</span>
|
||||
<span class="c1">#</span>
|
||||
<span class="c1"># :param parameters: a `**kwargs` dictionary `{'C': <float>}`</span>
|
||||
<span class="c1"># """</span>
|
||||
<span class="c1"># assert sorted(list(parameters.keys())) == ['C', 'loss'], \</span>
|
||||
<span class="c1"># 'currently, only the C and loss parameters are supported'</span>
|
||||
<span class="c1"># self.C = parameters.get('C', self.C)</span>
|
||||
<span class="c1"># self.loss = parameters.get('loss', self.loss)</span>
|
||||
<span class="c1">#</span>
|
||||
<span class="c1"># def get_params(self, deep=True):</span>
|
||||
<span class="c1"># return {'C': self.C, 'loss': self.loss}</span>
|
||||
|
||||
<div class="viewcode-block" id="SVMperf.fit">
|
||||
<a class="viewcode-back" href="../../../quapy.classification.html#quapy.classification.svmperf.SVMperf.fit">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">fit</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">X</span><span class="p">,</span> <span class="n">y</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Trains the SVM for the multivariate performance loss</span>
|
||||
|
||||
<span class="sd"> :param X: training instances</span>
|
||||
<span class="sd"> :param y: a binary vector of labels</span>
|
||||
<span class="sd"> :return: `self`</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">assert</span> <span class="bp">self</span><span class="o">.</span><span class="n">loss</span> <span class="ow">in</span> <span class="n">SVMperf</span><span class="o">.</span><span class="n">valid_losses</span><span class="p">,</span> \
|
||||
<span class="sa">f</span><span class="s1">'unsupported loss </span><span class="si">{</span><span class="bp">self</span><span class="o">.</span><span class="n">loss</span><span class="si">}</span><span class="s1">, valid ones are </span><span class="si">{</span><span class="nb">list</span><span class="p">(</span><span class="n">SVMperf</span><span class="o">.</span><span class="n">valid_losses</span><span class="o">.</span><span class="n">keys</span><span class="p">())</span><span class="si">}</span><span class="s1">'</span>
|
||||
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">svmperf_learn</span> <span class="o">=</span> <span class="n">join</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">svmperf_base</span><span class="p">,</span> <span class="s1">'svm_perf_learn'</span><span class="p">)</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">svmperf_classify</span> <span class="o">=</span> <span class="n">join</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">svmperf_base</span><span class="p">,</span> <span class="s1">'svm_perf_classify'</span><span class="p">)</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">loss_cmd</span> <span class="o">=</span> <span class="s1">'-w 3 -l '</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">valid_losses</span><span class="p">[</span><span class="bp">self</span><span class="o">.</span><span class="n">loss</span><span class="p">])</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">c_cmd</span> <span class="o">=</span> <span class="s1">'-c '</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">C</span><span class="p">)</span>
|
||||
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">classes_</span> <span class="o">=</span> <span class="nb">sorted</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">unique</span><span class="p">(</span><span class="n">y</span><span class="p">))</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">n_classes_</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">classes_</span><span class="p">)</span>
|
||||
|
||||
<span class="n">local_random</span> <span class="o">=</span> <span class="n">random</span><span class="o">.</span><span class="n">Random</span><span class="p">()</span>
|
||||
<span class="c1"># this would allow to run parallel instances of predict</span>
|
||||
<span class="n">random_code</span> <span class="o">=</span> <span class="s1">'svmperfprocess'</span><span class="o">+</span><span class="s1">'-'</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="n">local_random</span><span class="o">.</span><span class="n">randint</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1000000</span><span class="p">))</span> <span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">5</span><span class="p">))</span>
|
||||
<span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">host_folder</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">:</span>
|
||||
<span class="c1"># tmp dir are removed after the fit terminates in multiprocessing...</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">tmpdir</span> <span class="o">=</span> <span class="n">tempfile</span><span class="o">.</span><span class="n">TemporaryDirectory</span><span class="p">(</span><span class="n">suffix</span><span class="o">=</span><span class="n">random_code</span><span class="p">)</span><span class="o">.</span><span class="n">name</span>
|
||||
<span class="k">else</span><span class="p">:</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">tmpdir</span> <span class="o">=</span> <span class="n">join</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">host_folder</span><span class="p">,</span> <span class="s1">'.'</span> <span class="o">+</span> <span class="n">random_code</span><span class="p">)</span>
|
||||
<span class="n">makedirs</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">tmpdir</span><span class="p">,</span> <span class="n">exist_ok</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
|
||||
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">model</span> <span class="o">=</span> <span class="n">join</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">tmpdir</span><span class="p">,</span> <span class="s1">'model-'</span><span class="o">+</span><span class="n">random_code</span><span class="p">)</span>
|
||||
<span class="n">traindat</span> <span class="o">=</span> <span class="n">join</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">tmpdir</span><span class="p">,</span> <span class="sa">f</span><span class="s1">'train-</span><span class="si">{</span><span class="n">random_code</span><span class="si">}</span><span class="s1">.dat'</span><span class="p">)</span>
|
||||
|
||||
<span class="n">dump_svmlight_file</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="n">traindat</span><span class="p">,</span> <span class="n">zero_based</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span>
|
||||
|
||||
<span class="n">cmd</span> <span class="o">=</span> <span class="s1">' '</span><span class="o">.</span><span class="n">join</span><span class="p">([</span><span class="bp">self</span><span class="o">.</span><span class="n">svmperf_learn</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">c_cmd</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">loss_cmd</span><span class="p">,</span> <span class="n">traindat</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">model</span><span class="p">])</span>
|
||||
<span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">verbose</span><span class="p">:</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="s1">'[Running]'</span><span class="p">,</span> <span class="n">cmd</span><span class="p">)</span>
|
||||
<span class="n">p</span> <span class="o">=</span> <span class="n">subprocess</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">cmd</span><span class="o">.</span><span class="n">split</span><span class="p">(),</span> <span class="n">stdout</span><span class="o">=</span><span class="n">PIPE</span><span class="p">,</span> <span class="n">stderr</span><span class="o">=</span><span class="n">STDOUT</span><span class="p">)</span>
|
||||
<span class="k">if</span> <span class="ow">not</span> <span class="n">exists</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">model</span><span class="p">):</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="n">p</span><span class="o">.</span><span class="n">stderr</span><span class="o">.</span><span class="n">decode</span><span class="p">(</span><span class="s1">'utf-8'</span><span class="p">))</span>
|
||||
<span class="n">remove</span><span class="p">(</span><span class="n">traindat</span><span class="p">)</span>
|
||||
|
||||
<span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">verbose</span><span class="p">:</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="n">p</span><span class="o">.</span><span class="n">stdout</span><span class="o">.</span><span class="n">decode</span><span class="p">(</span><span class="s1">'utf-8'</span><span class="p">))</span>
|
||||
|
||||
<span class="k">return</span> <span class="bp">self</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="SVMperf.predict">
|
||||
<a class="viewcode-back" href="../../../quapy.classification.html#quapy.classification.svmperf.SVMperf.predict">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">predict</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">X</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Predicts labels for the instances `X`</span>
|
||||
|
||||
<span class="sd"> :param X: array-like of shape `(n_samples, n_features)` instances to classify</span>
|
||||
<span class="sd"> :return: a `numpy` array of length `n` containing the label predictions, where `n` is the number of</span>
|
||||
<span class="sd"> instances in `X`</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="n">confidence_scores</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">decision_function</span><span class="p">(</span><span class="n">X</span><span class="p">)</span>
|
||||
<span class="n">predictions</span> <span class="o">=</span> <span class="p">(</span><span class="n">confidence_scores</span> <span class="o">></span> <span class="mi">0</span><span class="p">)</span> <span class="o">*</span> <span class="mi">1</span>
|
||||
<span class="k">return</span> <span class="n">predictions</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="SVMperf.decision_function">
|
||||
<a class="viewcode-back" href="../../../quapy.classification.html#quapy.classification.svmperf.SVMperf.decision_function">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">decision_function</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">X</span><span class="p">,</span> <span class="n">y</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Evaluate the decision function for the samples in `X`.</span>
|
||||
|
||||
<span class="sd"> :param X: array-like of shape `(n_samples, n_features)` containing the instances to classify</span>
|
||||
<span class="sd"> :param y: unused</span>
|
||||
<span class="sd"> :return: array-like of shape `(n_samples,)` containing the decision scores of the instances</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">assert</span> <span class="nb">hasattr</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="s1">'tmpdir'</span><span class="p">),</span> <span class="s1">'predict called before fit'</span>
|
||||
<span class="k">assert</span> <span class="bp">self</span><span class="o">.</span><span class="n">tmpdir</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span><span class="p">,</span> <span class="s1">'model directory corrupted'</span>
|
||||
<span class="k">assert</span> <span class="n">exists</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">model</span><span class="p">),</span> <span class="s1">'model not found'</span>
|
||||
<span class="k">if</span> <span class="n">y</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">:</span>
|
||||
<span class="n">y</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">zeros</span><span class="p">(</span><span class="n">X</span><span class="o">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span>
|
||||
|
||||
<span class="c1"># in order to allow for parallel runs of predict, a random code is assigned</span>
|
||||
<span class="n">local_random</span> <span class="o">=</span> <span class="n">random</span><span class="o">.</span><span class="n">Random</span><span class="p">()</span>
|
||||
<span class="n">random_code</span> <span class="o">=</span> <span class="s1">'-'</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="n">local_random</span><span class="o">.</span><span class="n">randint</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1000000</span><span class="p">))</span> <span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">5</span><span class="p">))</span>
|
||||
<span class="n">predictions_path</span> <span class="o">=</span> <span class="n">join</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">tmpdir</span><span class="p">,</span> <span class="s1">'predictions'</span> <span class="o">+</span> <span class="n">random_code</span> <span class="o">+</span> <span class="s1">'.dat'</span><span class="p">)</span>
|
||||
<span class="n">testdat</span> <span class="o">=</span> <span class="n">join</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">tmpdir</span><span class="p">,</span> <span class="s1">'test'</span> <span class="o">+</span> <span class="n">random_code</span> <span class="o">+</span> <span class="s1">'.dat'</span><span class="p">)</span>
|
||||
<span class="n">dump_svmlight_file</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="n">testdat</span><span class="p">,</span> <span class="n">zero_based</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span>
|
||||
|
||||
<span class="n">cmd</span> <span class="o">=</span> <span class="s1">' '</span><span class="o">.</span><span class="n">join</span><span class="p">([</span><span class="bp">self</span><span class="o">.</span><span class="n">svmperf_classify</span><span class="p">,</span> <span class="n">testdat</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">model</span><span class="p">,</span> <span class="n">predictions_path</span><span class="p">])</span>
|
||||
<span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">verbose</span><span class="p">:</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="s1">'[Running]'</span><span class="p">,</span> <span class="n">cmd</span><span class="p">)</span>
|
||||
<span class="n">p</span> <span class="o">=</span> <span class="n">subprocess</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">cmd</span><span class="o">.</span><span class="n">split</span><span class="p">(),</span> <span class="n">stdout</span><span class="o">=</span><span class="n">PIPE</span><span class="p">,</span> <span class="n">stderr</span><span class="o">=</span><span class="n">STDOUT</span><span class="p">)</span>
|
||||
|
||||
<span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">verbose</span><span class="p">:</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="n">p</span><span class="o">.</span><span class="n">stdout</span><span class="o">.</span><span class="n">decode</span><span class="p">(</span><span class="s1">'utf-8'</span><span class="p">))</span>
|
||||
|
||||
<span class="n">scores</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">loadtxt</span><span class="p">(</span><span class="n">predictions_path</span><span class="p">)</span>
|
||||
<span class="n">remove</span><span class="p">(</span><span class="n">testdat</span><span class="p">)</span>
|
||||
<span class="n">remove</span><span class="p">(</span><span class="n">predictions_path</span><span class="p">)</span>
|
||||
|
||||
<span class="k">return</span> <span class="n">scores</span></div>
|
||||
|
||||
|
||||
<span class="k">def</span> <span class="fm">__del__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="k">if</span> <span class="nb">hasattr</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="s1">'tmpdir'</span><span class="p">):</span>
|
||||
<span class="n">shutil</span><span class="o">.</span><span class="n">rmtree</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">tmpdir</span><span class="p">,</span> <span class="n">ignore_errors</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span></div>
|
||||
|
||||
|
||||
</pre></div>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
<footer>
|
||||
|
||||
<hr/>
|
||||
|
||||
<div role="contentinfo">
|
||||
<p>© Copyright 2024, Alejandro Moreo.</p>
|
||||
</div>
|
||||
|
||||
Built with <a href="https://www.sphinx-doc.org/">Sphinx</a> using a
|
||||
<a href="https://github.com/readthedocs/sphinx_rtd_theme">theme</a>
|
||||
provided by <a href="https://readthedocs.org">Read the Docs</a>.
|
||||
|
||||
|
||||
</footer>
|
||||
</div>
|
||||
</div>
|
||||
</section>
|
||||
</div>
|
||||
<script>
|
||||
jQuery(function () {
|
||||
SphinxRtdTheme.Navigation.enable(true);
|
||||
});
|
||||
</script>
|
||||
|
||||
</body>
|
||||
</html>
|
||||
|
|
@ -0,0 +1,165 @@
|
|||
<!DOCTYPE html>
|
||||
<html class="writer-html5" lang="en" data-content_root="../../../">
|
||||
<head>
|
||||
<meta charset="utf-8" />
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
|
||||
<title>quapy.data._ifcb — QuaPy: A Python-based open-source framework for quantification 0.1.8 documentation</title>
|
||||
<link rel="stylesheet" type="text/css" href="../../../_static/pygments.css?v=92fd9be5" />
|
||||
<link rel="stylesheet" type="text/css" href="../../../_static/css/theme.css?v=19f00094" />
|
||||
|
||||
|
||||
<!--[if lt IE 9]>
|
||||
<script src="../../../_static/js/html5shiv.min.js"></script>
|
||||
<![endif]-->
|
||||
|
||||
<script src="../../../_static/jquery.js?v=5d32c60e"></script>
|
||||
<script src="../../../_static/_sphinx_javascript_frameworks_compat.js?v=2cd50e6c"></script>
|
||||
<script src="../../../_static/documentation_options.js?v=22607128"></script>
|
||||
<script src="../../../_static/doctools.js?v=9a2dae69"></script>
|
||||
<script src="../../../_static/sphinx_highlight.js?v=dc90522c"></script>
|
||||
<script src="../../../_static/js/theme.js"></script>
|
||||
<link rel="index" title="Index" href="../../../genindex.html" />
|
||||
<link rel="search" title="Search" href="../../../search.html" />
|
||||
</head>
|
||||
|
||||
<body class="wy-body-for-nav">
|
||||
<div class="wy-grid-for-nav">
|
||||
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
|
||||
<div class="wy-side-scroll">
|
||||
<div class="wy-side-nav-search" >
|
||||
|
||||
|
||||
|
||||
<a href="../../../index.html" class="icon icon-home">
|
||||
QuaPy: A Python-based open-source framework for quantification
|
||||
</a>
|
||||
<div role="search">
|
||||
<form id="rtd-search-form" class="wy-form" action="../../../search.html" method="get">
|
||||
<input type="text" name="q" placeholder="Search docs" aria-label="Search docs" />
|
||||
<input type="hidden" name="check_keywords" value="yes" />
|
||||
<input type="hidden" name="area" value="default" />
|
||||
</form>
|
||||
</div>
|
||||
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../../../modules.html">quapy</a></li>
|
||||
</ul>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
</nav>
|
||||
|
||||
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" >
|
||||
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
|
||||
<a href="../../../index.html">QuaPy: A Python-based open-source framework for quantification</a>
|
||||
</nav>
|
||||
|
||||
<div class="wy-nav-content">
|
||||
<div class="rst-content">
|
||||
<div role="navigation" aria-label="Page navigation">
|
||||
<ul class="wy-breadcrumbs">
|
||||
<li><a href="../../../index.html" class="icon icon-home" aria-label="Home"></a></li>
|
||||
<li class="breadcrumb-item"><a href="../../index.html">Module code</a></li>
|
||||
<li class="breadcrumb-item active">quapy.data._ifcb</li>
|
||||
<li class="wy-breadcrumbs-aside">
|
||||
</li>
|
||||
</ul>
|
||||
<hr/>
|
||||
</div>
|
||||
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
|
||||
<div itemprop="articleBody">
|
||||
|
||||
<h1>Source code for quapy.data._ifcb</h1><div class="highlight"><pre>
|
||||
<span></span><span class="kn">import</span> <span class="nn">os</span>
|
||||
<span class="kn">import</span> <span class="nn">pandas</span> <span class="k">as</span> <span class="nn">pd</span>
|
||||
<span class="kn">from</span> <span class="nn">quapy.protocol</span> <span class="kn">import</span> <span class="n">AbstractProtocol</span>
|
||||
|
||||
<div class="viewcode-block" id="IFCBTrainSamplesFromDir">
|
||||
<a class="viewcode-back" href="../../../quapy.data.html#quapy.data._ifcb.IFCBTrainSamplesFromDir">[docs]</a>
|
||||
<span class="k">class</span> <span class="nc">IFCBTrainSamplesFromDir</span><span class="p">(</span><span class="n">AbstractProtocol</span><span class="p">):</span>
|
||||
|
||||
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">path_dir</span><span class="p">:</span><span class="nb">str</span><span class="p">,</span> <span class="n">classes</span><span class="p">:</span> <span class="nb">list</span><span class="p">):</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">path_dir</span> <span class="o">=</span> <span class="n">path_dir</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">classes</span> <span class="o">=</span> <span class="n">classes</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">samples</span> <span class="o">=</span> <span class="p">[]</span>
|
||||
<span class="k">for</span> <span class="n">filename</span> <span class="ow">in</span> <span class="n">os</span><span class="o">.</span><span class="n">listdir</span><span class="p">(</span><span class="n">path_dir</span><span class="p">):</span>
|
||||
<span class="k">if</span> <span class="n">filename</span><span class="o">.</span><span class="n">endswith</span><span class="p">(</span><span class="s1">'.csv'</span><span class="p">):</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">samples</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">filename</span><span class="p">)</span>
|
||||
|
||||
<span class="k">def</span> <span class="fm">__call__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="k">for</span> <span class="n">sample</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">samples</span><span class="p">:</span>
|
||||
<span class="n">s</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">path_dir</span><span class="p">,</span><span class="n">sample</span><span class="p">))</span>
|
||||
<span class="c1"># all columns but the first where we get the class</span>
|
||||
<span class="n">X</span> <span class="o">=</span> <span class="n">s</span><span class="o">.</span><span class="n">iloc</span><span class="p">[:,</span> <span class="mi">1</span><span class="p">:]</span><span class="o">.</span><span class="n">to_numpy</span><span class="p">()</span>
|
||||
<span class="n">y</span> <span class="o">=</span> <span class="n">s</span><span class="o">.</span><span class="n">iloc</span><span class="p">[:,</span> <span class="mi">0</span><span class="p">]</span><span class="o">.</span><span class="n">to_numpy</span><span class="p">()</span>
|
||||
<span class="k">yield</span> <span class="n">X</span><span class="p">,</span> <span class="n">y</span>
|
||||
|
||||
<div class="viewcode-block" id="IFCBTrainSamplesFromDir.total">
|
||||
<a class="viewcode-back" href="../../../quapy.data.html#quapy.data._ifcb.IFCBTrainSamplesFromDir.total">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">total</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Returns the total number of samples that the protocol generates.</span>
|
||||
|
||||
<span class="sd"> :return: The number of training samples to generate.</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">return</span> <span class="nb">len</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">samples</span><span class="p">)</span></div>
|
||||
</div>
|
||||
|
||||
|
||||
|
||||
<div class="viewcode-block" id="IFCBTestSamples">
|
||||
<a class="viewcode-back" href="../../../quapy.data.html#quapy.data._ifcb.IFCBTestSamples">[docs]</a>
|
||||
<span class="k">class</span> <span class="nc">IFCBTestSamples</span><span class="p">(</span><span class="n">AbstractProtocol</span><span class="p">):</span>
|
||||
|
||||
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">path_dir</span><span class="p">:</span><span class="nb">str</span><span class="p">,</span> <span class="n">test_prevalences_path</span><span class="p">:</span> <span class="nb">str</span><span class="p">):</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">path_dir</span> <span class="o">=</span> <span class="n">path_dir</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">test_prevalences</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">path_dir</span><span class="p">,</span> <span class="n">test_prevalences_path</span><span class="p">))</span>
|
||||
|
||||
<span class="k">def</span> <span class="fm">__call__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="k">for</span> <span class="n">_</span><span class="p">,</span> <span class="n">test_sample</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">test_prevalences</span><span class="o">.</span><span class="n">iterrows</span><span class="p">():</span>
|
||||
<span class="c1">#Load the sample from disk</span>
|
||||
<span class="n">X</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">path_dir</span><span class="p">,</span><span class="n">test_sample</span><span class="p">[</span><span class="s1">'sample'</span><span class="p">]</span><span class="o">+</span><span class="s1">'.csv'</span><span class="p">))</span><span class="o">.</span><span class="n">to_numpy</span><span class="p">()</span>
|
||||
<span class="n">prevalences</span> <span class="o">=</span> <span class="n">test_sample</span><span class="o">.</span><span class="n">iloc</span><span class="p">[</span><span class="mi">1</span><span class="p">:]</span><span class="o">.</span><span class="n">to_numpy</span><span class="p">()</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="nb">float</span><span class="p">)</span>
|
||||
<span class="k">yield</span> <span class="n">X</span><span class="p">,</span> <span class="n">prevalences</span>
|
||||
|
||||
<div class="viewcode-block" id="IFCBTestSamples.total">
|
||||
<a class="viewcode-back" href="../../../quapy.data.html#quapy.data._ifcb.IFCBTestSamples.total">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">total</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Returns the total number of samples that the protocol generates.</span>
|
||||
|
||||
<span class="sd"> :return: The number of test samples to generate.</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">return</span> <span class="nb">len</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">test_prevalences</span><span class="o">.</span><span class="n">index</span><span class="p">)</span></div>
|
||||
</div>
|
||||
|
||||
</pre></div>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
<footer>
|
||||
|
||||
<hr/>
|
||||
|
||||
<div role="contentinfo">
|
||||
<p>© Copyright 2024, Alejandro Moreo.</p>
|
||||
</div>
|
||||
|
||||
Built with <a href="https://www.sphinx-doc.org/">Sphinx</a> using a
|
||||
<a href="https://github.com/readthedocs/sphinx_rtd_theme">theme</a>
|
||||
provided by <a href="https://readthedocs.org">Read the Docs</a>.
|
||||
|
||||
|
||||
</footer>
|
||||
</div>
|
||||
</div>
|
||||
</section>
|
||||
</div>
|
||||
<script>
|
||||
jQuery(function () {
|
||||
SphinxRtdTheme.Navigation.enable(true);
|
||||
});
|
||||
</script>
|
||||
|
||||
</body>
|
||||
</html>
|
||||
|
|
@ -0,0 +1,307 @@
|
|||
<!DOCTYPE html>
|
||||
<html class="writer-html5" lang="en" data-content_root="../../../">
|
||||
<head>
|
||||
<meta charset="utf-8" />
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
|
||||
<title>quapy.data._lequa2022 — QuaPy: A Python-based open-source framework for quantification 0.1.8 documentation</title>
|
||||
<link rel="stylesheet" type="text/css" href="../../../_static/pygments.css?v=92fd9be5" />
|
||||
<link rel="stylesheet" type="text/css" href="../../../_static/css/theme.css?v=19f00094" />
|
||||
|
||||
|
||||
<!--[if lt IE 9]>
|
||||
<script src="../../../_static/js/html5shiv.min.js"></script>
|
||||
<![endif]-->
|
||||
|
||||
<script src="../../../_static/jquery.js?v=5d32c60e"></script>
|
||||
<script src="../../../_static/_sphinx_javascript_frameworks_compat.js?v=2cd50e6c"></script>
|
||||
<script src="../../../_static/documentation_options.js?v=22607128"></script>
|
||||
<script src="../../../_static/doctools.js?v=9a2dae69"></script>
|
||||
<script src="../../../_static/sphinx_highlight.js?v=dc90522c"></script>
|
||||
<script src="../../../_static/js/theme.js"></script>
|
||||
<link rel="index" title="Index" href="../../../genindex.html" />
|
||||
<link rel="search" title="Search" href="../../../search.html" />
|
||||
</head>
|
||||
|
||||
<body class="wy-body-for-nav">
|
||||
<div class="wy-grid-for-nav">
|
||||
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
|
||||
<div class="wy-side-scroll">
|
||||
<div class="wy-side-nav-search" >
|
||||
|
||||
|
||||
|
||||
<a href="../../../index.html" class="icon icon-home">
|
||||
QuaPy: A Python-based open-source framework for quantification
|
||||
</a>
|
||||
<div role="search">
|
||||
<form id="rtd-search-form" class="wy-form" action="../../../search.html" method="get">
|
||||
<input type="text" name="q" placeholder="Search docs" aria-label="Search docs" />
|
||||
<input type="hidden" name="check_keywords" value="yes" />
|
||||
<input type="hidden" name="area" value="default" />
|
||||
</form>
|
||||
</div>
|
||||
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../../../modules.html">quapy</a></li>
|
||||
</ul>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
</nav>
|
||||
|
||||
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" >
|
||||
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
|
||||
<a href="../../../index.html">QuaPy: A Python-based open-source framework for quantification</a>
|
||||
</nav>
|
||||
|
||||
<div class="wy-nav-content">
|
||||
<div class="rst-content">
|
||||
<div role="navigation" aria-label="Page navigation">
|
||||
<ul class="wy-breadcrumbs">
|
||||
<li><a href="../../../index.html" class="icon icon-home" aria-label="Home"></a></li>
|
||||
<li class="breadcrumb-item"><a href="../../index.html">Module code</a></li>
|
||||
<li class="breadcrumb-item active">quapy.data._lequa2022</li>
|
||||
<li class="wy-breadcrumbs-aside">
|
||||
</li>
|
||||
</ul>
|
||||
<hr/>
|
||||
</div>
|
||||
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
|
||||
<div itemprop="articleBody">
|
||||
|
||||
<h1>Source code for quapy.data._lequa2022</h1><div class="highlight"><pre>
|
||||
<span></span><span class="kn">from</span> <span class="nn">typing</span> <span class="kn">import</span> <span class="n">Tuple</span><span class="p">,</span> <span class="n">Union</span>
|
||||
<span class="kn">import</span> <span class="nn">pandas</span> <span class="k">as</span> <span class="nn">pd</span>
|
||||
<span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span>
|
||||
<span class="kn">import</span> <span class="nn">os</span>
|
||||
|
||||
<span class="kn">from</span> <span class="nn">quapy.protocol</span> <span class="kn">import</span> <span class="n">AbstractProtocol</span>
|
||||
|
||||
<span class="n">DEV_SAMPLES</span> <span class="o">=</span> <span class="mi">1000</span>
|
||||
<span class="n">TEST_SAMPLES</span> <span class="o">=</span> <span class="mi">5000</span>
|
||||
|
||||
<span class="n">ERROR_TOL</span> <span class="o">=</span> <span class="mf">1E-3</span>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="load_category_map">
|
||||
<a class="viewcode-back" href="../../../quapy.data.html#quapy.data._lequa2022.load_category_map">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">load_category_map</span><span class="p">(</span><span class="n">path</span><span class="p">):</span>
|
||||
<span class="n">cat2code</span> <span class="o">=</span> <span class="p">{}</span>
|
||||
<span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="n">path</span><span class="p">,</span> <span class="s1">'rt'</span><span class="p">)</span> <span class="k">as</span> <span class="n">fin</span><span class="p">:</span>
|
||||
<span class="k">for</span> <span class="n">line</span> <span class="ow">in</span> <span class="n">fin</span><span class="p">:</span>
|
||||
<span class="n">category</span><span class="p">,</span> <span class="n">code</span> <span class="o">=</span> <span class="n">line</span><span class="o">.</span><span class="n">split</span><span class="p">()</span>
|
||||
<span class="n">cat2code</span><span class="p">[</span><span class="n">category</span><span class="p">]</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="n">code</span><span class="p">)</span>
|
||||
<span class="n">code2cat</span> <span class="o">=</span> <span class="p">[</span><span class="n">cat</span> <span class="k">for</span> <span class="n">cat</span><span class="p">,</span> <span class="n">code</span> <span class="ow">in</span> <span class="nb">sorted</span><span class="p">(</span><span class="n">cat2code</span><span class="o">.</span><span class="n">items</span><span class="p">(),</span> <span class="n">key</span><span class="o">=</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">x</span><span class="p">[</span><span class="mi">1</span><span class="p">])]</span>
|
||||
<span class="k">return</span> <span class="n">cat2code</span><span class="p">,</span> <span class="n">code2cat</span></div>
|
||||
|
||||
|
||||
|
||||
<div class="viewcode-block" id="load_raw_documents">
|
||||
<a class="viewcode-back" href="../../../quapy.data.html#quapy.data._lequa2022.load_raw_documents">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">load_raw_documents</span><span class="p">(</span><span class="n">path</span><span class="p">):</span>
|
||||
<span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="n">path</span><span class="p">)</span>
|
||||
<span class="n">documents</span> <span class="o">=</span> <span class="nb">list</span><span class="p">(</span><span class="n">df</span><span class="p">[</span><span class="s2">"text"</span><span class="p">]</span><span class="o">.</span><span class="n">values</span><span class="p">)</span>
|
||||
<span class="n">labels</span> <span class="o">=</span> <span class="kc">None</span>
|
||||
<span class="k">if</span> <span class="s2">"label"</span> <span class="ow">in</span> <span class="n">df</span><span class="o">.</span><span class="n">columns</span><span class="p">:</span>
|
||||
<span class="n">labels</span> <span class="o">=</span> <span class="n">df</span><span class="p">[</span><span class="s2">"label"</span><span class="p">]</span><span class="o">.</span><span class="n">values</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="nb">int</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="n">documents</span><span class="p">,</span> <span class="n">labels</span></div>
|
||||
|
||||
|
||||
|
||||
<div class="viewcode-block" id="load_vector_documents">
|
||||
<a class="viewcode-back" href="../../../quapy.data.html#quapy.data._lequa2022.load_vector_documents">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">load_vector_documents</span><span class="p">(</span><span class="n">path</span><span class="p">):</span>
|
||||
<span class="n">D</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="n">path</span><span class="p">)</span><span class="o">.</span><span class="n">to_numpy</span><span class="p">(</span><span class="n">dtype</span><span class="o">=</span><span class="nb">float</span><span class="p">)</span>
|
||||
<span class="n">labelled</span> <span class="o">=</span> <span class="n">D</span><span class="o">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">==</span> <span class="mi">301</span>
|
||||
<span class="k">if</span> <span class="n">labelled</span><span class="p">:</span>
|
||||
<span class="n">X</span><span class="p">,</span> <span class="n">y</span> <span class="o">=</span> <span class="n">D</span><span class="p">[:,</span> <span class="mi">1</span><span class="p">:],</span> <span class="n">D</span><span class="p">[:,</span> <span class="mi">0</span><span class="p">]</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="nb">int</span><span class="p">)</span><span class="o">.</span><span class="n">flatten</span><span class="p">()</span>
|
||||
<span class="k">else</span><span class="p">:</span>
|
||||
<span class="n">X</span><span class="p">,</span> <span class="n">y</span> <span class="o">=</span> <span class="n">D</span><span class="p">,</span> <span class="kc">None</span>
|
||||
<span class="k">return</span> <span class="n">X</span><span class="p">,</span> <span class="n">y</span></div>
|
||||
|
||||
|
||||
|
||||
<div class="viewcode-block" id="SamplesFromDir">
|
||||
<a class="viewcode-back" href="../../../quapy.data.html#quapy.data._lequa2022.SamplesFromDir">[docs]</a>
|
||||
<span class="k">class</span> <span class="nc">SamplesFromDir</span><span class="p">(</span><span class="n">AbstractProtocol</span><span class="p">):</span>
|
||||
|
||||
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">path_dir</span><span class="p">:</span><span class="nb">str</span><span class="p">,</span> <span class="n">ground_truth_path</span><span class="p">:</span><span class="nb">str</span><span class="p">,</span> <span class="n">load_fn</span><span class="p">):</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">path_dir</span> <span class="o">=</span> <span class="n">path_dir</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">load_fn</span> <span class="o">=</span> <span class="n">load_fn</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">true_prevs</span> <span class="o">=</span> <span class="n">ResultSubmission</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="n">ground_truth_path</span><span class="p">)</span>
|
||||
|
||||
<span class="k">def</span> <span class="fm">__call__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="k">for</span> <span class="nb">id</span><span class="p">,</span> <span class="n">prevalence</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">true_prevs</span><span class="o">.</span><span class="n">iterrows</span><span class="p">():</span>
|
||||
<span class="n">sample</span><span class="p">,</span> <span class="n">_</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">load_fn</span><span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">path_dir</span><span class="p">,</span> <span class="sa">f</span><span class="s1">'</span><span class="si">{</span><span class="nb">id</span><span class="si">}</span><span class="s1">.txt'</span><span class="p">))</span>
|
||||
<span class="k">yield</span> <span class="n">sample</span><span class="p">,</span> <span class="n">prevalence</span></div>
|
||||
|
||||
|
||||
|
||||
<div class="viewcode-block" id="ResultSubmission">
|
||||
<a class="viewcode-back" href="../../../quapy.data.html#quapy.data._lequa2022.ResultSubmission">[docs]</a>
|
||||
<span class="k">class</span> <span class="nc">ResultSubmission</span><span class="p">:</span>
|
||||
|
||||
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">df</span> <span class="o">=</span> <span class="kc">None</span>
|
||||
|
||||
<span class="k">def</span> <span class="nf">__init_df</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">categories</span><span class="p">:</span> <span class="nb">int</span><span class="p">):</span>
|
||||
<span class="k">if</span> <span class="ow">not</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">categories</span><span class="p">,</span> <span class="nb">int</span><span class="p">)</span> <span class="ow">or</span> <span class="n">categories</span> <span class="o"><</span> <span class="mi">2</span><span class="p">:</span>
|
||||
<span class="k">raise</span> <span class="ne">TypeError</span><span class="p">(</span><span class="s1">'wrong format for categories: an int (>=2) was expected'</span><span class="p">)</span>
|
||||
<span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">DataFrame</span><span class="p">(</span><span class="n">columns</span><span class="o">=</span><span class="nb">list</span><span class="p">(</span><span class="nb">range</span><span class="p">(</span><span class="n">categories</span><span class="p">)))</span>
|
||||
<span class="n">df</span><span class="o">.</span><span class="n">index</span><span class="o">.</span><span class="n">set_names</span><span class="p">(</span><span class="s1">'id'</span><span class="p">,</span> <span class="n">inplace</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">df</span> <span class="o">=</span> <span class="n">df</span>
|
||||
|
||||
<span class="nd">@property</span>
|
||||
<span class="k">def</span> <span class="nf">n_categories</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="k">return</span> <span class="nb">len</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">df</span><span class="o">.</span><span class="n">columns</span><span class="o">.</span><span class="n">values</span><span class="p">)</span>
|
||||
|
||||
<div class="viewcode-block" id="ResultSubmission.add">
|
||||
<a class="viewcode-back" href="../../../quapy.data.html#quapy.data._lequa2022.ResultSubmission.add">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">add</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">sample_id</span><span class="p">:</span> <span class="nb">int</span><span class="p">,</span> <span class="n">prevalence_values</span><span class="p">:</span> <span class="n">np</span><span class="o">.</span><span class="n">ndarray</span><span class="p">):</span>
|
||||
<span class="k">if</span> <span class="ow">not</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">sample_id</span><span class="p">,</span> <span class="nb">int</span><span class="p">):</span>
|
||||
<span class="k">raise</span> <span class="ne">TypeError</span><span class="p">(</span><span class="sa">f</span><span class="s1">'error: expected int for sample_sample, found </span><span class="si">{</span><span class="nb">type</span><span class="p">(</span><span class="n">sample_id</span><span class="p">)</span><span class="si">}</span><span class="s1">'</span><span class="p">)</span>
|
||||
<span class="k">if</span> <span class="ow">not</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">prevalence_values</span><span class="p">,</span> <span class="n">np</span><span class="o">.</span><span class="n">ndarray</span><span class="p">):</span>
|
||||
<span class="k">raise</span> <span class="ne">TypeError</span><span class="p">(</span><span class="sa">f</span><span class="s1">'error: expected np.ndarray for prevalence_values, found </span><span class="si">{</span><span class="nb">type</span><span class="p">(</span><span class="n">prevalence_values</span><span class="p">)</span><span class="si">}</span><span class="s1">'</span><span class="p">)</span>
|
||||
<span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">df</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">:</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">__init_df</span><span class="p">(</span><span class="n">categories</span><span class="o">=</span><span class="nb">len</span><span class="p">(</span><span class="n">prevalence_values</span><span class="p">))</span>
|
||||
<span class="k">if</span> <span class="n">sample_id</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">df</span><span class="o">.</span><span class="n">index</span><span class="o">.</span><span class="n">values</span><span class="p">:</span>
|
||||
<span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span><span class="sa">f</span><span class="s1">'error: prevalence values for "</span><span class="si">{</span><span class="n">sample_id</span><span class="si">}</span><span class="s1">" already added'</span><span class="p">)</span>
|
||||
<span class="k">if</span> <span class="n">prevalence_values</span><span class="o">.</span><span class="n">ndim</span> <span class="o">!=</span> <span class="mi">1</span> <span class="ow">and</span> <span class="n">prevalence_values</span><span class="o">.</span><span class="n">size</span> <span class="o">!=</span> <span class="bp">self</span><span class="o">.</span><span class="n">n_categories</span><span class="p">:</span>
|
||||
<span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span><span class="sa">f</span><span class="s1">'error: wrong shape found for prevalence vector </span><span class="si">{</span><span class="n">prevalence_values</span><span class="si">}</span><span class="s1">'</span><span class="p">)</span>
|
||||
<span class="k">if</span> <span class="p">(</span><span class="n">prevalence_values</span> <span class="o"><</span> <span class="mi">0</span><span class="p">)</span><span class="o">.</span><span class="n">any</span><span class="p">()</span> <span class="ow">or</span> <span class="p">(</span><span class="n">prevalence_values</span> <span class="o">></span> <span class="mi">1</span><span class="p">)</span><span class="o">.</span><span class="n">any</span><span class="p">():</span>
|
||||
<span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span><span class="sa">f</span><span class="s1">'error: prevalence values out of range [0,1] for "</span><span class="si">{</span><span class="n">sample_id</span><span class="si">}</span><span class="s1">"'</span><span class="p">)</span>
|
||||
<span class="k">if</span> <span class="n">np</span><span class="o">.</span><span class="n">abs</span><span class="p">(</span><span class="n">prevalence_values</span><span class="o">.</span><span class="n">sum</span><span class="p">()</span> <span class="o">-</span> <span class="mi">1</span><span class="p">)</span> <span class="o">></span> <span class="n">ERROR_TOL</span><span class="p">:</span>
|
||||
<span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span><span class="sa">f</span><span class="s1">'error: prevalence values do not sum up to one for "</span><span class="si">{</span><span class="n">sample_id</span><span class="si">}</span><span class="s1">"'</span>
|
||||
<span class="sa">f</span><span class="s1">'(error tolerance </span><span class="si">{</span><span class="n">ERROR_TOL</span><span class="si">}</span><span class="s1">)'</span><span class="p">)</span>
|
||||
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">df</span><span class="o">.</span><span class="n">loc</span><span class="p">[</span><span class="n">sample_id</span><span class="p">]</span> <span class="o">=</span> <span class="n">prevalence_values</span></div>
|
||||
|
||||
|
||||
<span class="k">def</span> <span class="fm">__len__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="k">return</span> <span class="nb">len</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">df</span><span class="p">)</span>
|
||||
|
||||
<div class="viewcode-block" id="ResultSubmission.load">
|
||||
<a class="viewcode-back" href="../../../quapy.data.html#quapy.data._lequa2022.ResultSubmission.load">[docs]</a>
|
||||
<span class="nd">@classmethod</span>
|
||||
<span class="k">def</span> <span class="nf">load</span><span class="p">(</span><span class="bp">cls</span><span class="p">,</span> <span class="n">path</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-></span> <span class="s1">'ResultSubmission'</span><span class="p">:</span>
|
||||
<span class="n">df</span> <span class="o">=</span> <span class="n">ResultSubmission</span><span class="o">.</span><span class="n">check_file_format</span><span class="p">(</span><span class="n">path</span><span class="p">)</span>
|
||||
<span class="n">r</span> <span class="o">=</span> <span class="n">ResultSubmission</span><span class="p">()</span>
|
||||
<span class="n">r</span><span class="o">.</span><span class="n">df</span> <span class="o">=</span> <span class="n">df</span>
|
||||
<span class="k">return</span> <span class="n">r</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="ResultSubmission.dump">
|
||||
<a class="viewcode-back" href="../../../quapy.data.html#quapy.data._lequa2022.ResultSubmission.dump">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">dump</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">path</span><span class="p">:</span> <span class="nb">str</span><span class="p">):</span>
|
||||
<span class="n">ResultSubmission</span><span class="o">.</span><span class="n">check_dataframe_format</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">df</span><span class="p">)</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">df</span><span class="o">.</span><span class="n">to_csv</span><span class="p">(</span><span class="n">path</span><span class="p">)</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="ResultSubmission.prevalence">
|
||||
<a class="viewcode-back" href="../../../quapy.data.html#quapy.data._lequa2022.ResultSubmission.prevalence">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">prevalence</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">sample_id</span><span class="p">:</span> <span class="nb">int</span><span class="p">):</span>
|
||||
<span class="n">sel</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">df</span><span class="o">.</span><span class="n">loc</span><span class="p">[</span><span class="n">sample_id</span><span class="p">]</span>
|
||||
<span class="k">if</span> <span class="n">sel</span><span class="o">.</span><span class="n">empty</span><span class="p">:</span>
|
||||
<span class="k">return</span> <span class="kc">None</span>
|
||||
<span class="k">else</span><span class="p">:</span>
|
||||
<span class="k">return</span> <span class="n">sel</span><span class="o">.</span><span class="n">values</span><span class="o">.</span><span class="n">flatten</span><span class="p">()</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="ResultSubmission.iterrows">
|
||||
<a class="viewcode-back" href="../../../quapy.data.html#quapy.data._lequa2022.ResultSubmission.iterrows">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">iterrows</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="k">for</span> <span class="n">index</span><span class="p">,</span> <span class="n">row</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">df</span><span class="o">.</span><span class="n">iterrows</span><span class="p">():</span>
|
||||
<span class="n">prevalence</span> <span class="o">=</span> <span class="n">row</span><span class="o">.</span><span class="n">values</span><span class="o">.</span><span class="n">flatten</span><span class="p">()</span>
|
||||
<span class="k">yield</span> <span class="n">index</span><span class="p">,</span> <span class="n">prevalence</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="ResultSubmission.check_file_format">
|
||||
<a class="viewcode-back" href="../../../quapy.data.html#quapy.data._lequa2022.ResultSubmission.check_file_format">[docs]</a>
|
||||
<span class="nd">@classmethod</span>
|
||||
<span class="k">def</span> <span class="nf">check_file_format</span><span class="p">(</span><span class="bp">cls</span><span class="p">,</span> <span class="n">path</span><span class="p">)</span> <span class="o">-></span> <span class="n">Union</span><span class="p">[</span><span class="n">pd</span><span class="o">.</span><span class="n">DataFrame</span><span class="p">,</span> <span class="n">Tuple</span><span class="p">[</span><span class="n">pd</span><span class="o">.</span><span class="n">DataFrame</span><span class="p">,</span> <span class="nb">str</span><span class="p">]]:</span>
|
||||
<span class="k">try</span><span class="p">:</span>
|
||||
<span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="n">path</span><span class="p">,</span> <span class="n">index_col</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
|
||||
<span class="k">except</span> <span class="ne">Exception</span> <span class="k">as</span> <span class="n">e</span><span class="p">:</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s1">'the file </span><span class="si">{</span><span class="n">path</span><span class="si">}</span><span class="s1"> does not seem to be a valid csv file. '</span><span class="p">)</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="n">e</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="n">ResultSubmission</span><span class="o">.</span><span class="n">check_dataframe_format</span><span class="p">(</span><span class="n">df</span><span class="p">,</span> <span class="n">path</span><span class="o">=</span><span class="n">path</span><span class="p">)</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="ResultSubmission.check_dataframe_format">
|
||||
<a class="viewcode-back" href="../../../quapy.data.html#quapy.data._lequa2022.ResultSubmission.check_dataframe_format">[docs]</a>
|
||||
<span class="nd">@classmethod</span>
|
||||
<span class="k">def</span> <span class="nf">check_dataframe_format</span><span class="p">(</span><span class="bp">cls</span><span class="p">,</span> <span class="n">df</span><span class="p">,</span> <span class="n">path</span><span class="o">=</span><span class="kc">None</span><span class="p">)</span> <span class="o">-></span> <span class="n">Union</span><span class="p">[</span><span class="n">pd</span><span class="o">.</span><span class="n">DataFrame</span><span class="p">,</span> <span class="n">Tuple</span><span class="p">[</span><span class="n">pd</span><span class="o">.</span><span class="n">DataFrame</span><span class="p">,</span> <span class="nb">str</span><span class="p">]]:</span>
|
||||
<span class="n">hint_path</span> <span class="o">=</span> <span class="s1">''</span> <span class="c1"># if given, show the data path in the error message</span>
|
||||
<span class="k">if</span> <span class="n">path</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span><span class="p">:</span>
|
||||
<span class="n">hint_path</span> <span class="o">=</span> <span class="sa">f</span><span class="s1">' in </span><span class="si">{</span><span class="n">path</span><span class="si">}</span><span class="s1">'</span>
|
||||
|
||||
<span class="k">if</span> <span class="n">df</span><span class="o">.</span><span class="n">index</span><span class="o">.</span><span class="n">name</span> <span class="o">!=</span> <span class="s1">'id'</span> <span class="ow">or</span> <span class="nb">len</span><span class="p">(</span><span class="n">df</span><span class="o">.</span><span class="n">columns</span><span class="p">)</span> <span class="o"><</span> <span class="mi">2</span><span class="p">:</span>
|
||||
<span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span><span class="sa">f</span><span class="s1">'wrong header</span><span class="si">{</span><span class="n">hint_path</span><span class="si">}</span><span class="s1">, '</span>
|
||||
<span class="sa">f</span><span class="s1">'the format of the header should be "id,0,...,n-1", '</span>
|
||||
<span class="sa">f</span><span class="s1">'where n is the number of categories'</span><span class="p">)</span>
|
||||
<span class="k">if</span> <span class="p">[</span><span class="nb">int</span><span class="p">(</span><span class="n">ci</span><span class="p">)</span> <span class="k">for</span> <span class="n">ci</span> <span class="ow">in</span> <span class="n">df</span><span class="o">.</span><span class="n">columns</span><span class="o">.</span><span class="n">values</span><span class="p">]</span> <span class="o">!=</span> <span class="nb">list</span><span class="p">(</span><span class="nb">range</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">df</span><span class="o">.</span><span class="n">columns</span><span class="p">))):</span>
|
||||
<span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span><span class="sa">f</span><span class="s1">'wrong header</span><span class="si">{</span><span class="n">hint_path</span><span class="si">}</span><span class="s1">, category ids should be 0,1,2,...,n-1, '</span>
|
||||
<span class="sa">f</span><span class="s1">'where n is the number of categories'</span><span class="p">)</span>
|
||||
<span class="k">if</span> <span class="n">df</span><span class="o">.</span><span class="n">empty</span><span class="p">:</span>
|
||||
<span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span><span class="sa">f</span><span class="s1">'error</span><span class="si">{</span><span class="n">hint_path</span><span class="si">}</span><span class="s1">: results file is empty'</span><span class="p">)</span>
|
||||
<span class="k">elif</span> <span class="nb">len</span><span class="p">(</span><span class="n">df</span><span class="p">)</span> <span class="o">!=</span> <span class="n">DEV_SAMPLES</span> <span class="ow">and</span> <span class="nb">len</span><span class="p">(</span><span class="n">df</span><span class="p">)</span> <span class="o">!=</span> <span class="n">TEST_SAMPLES</span><span class="p">:</span>
|
||||
<span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span><span class="sa">f</span><span class="s1">'wrong number of prevalence values found</span><span class="si">{</span><span class="n">hint_path</span><span class="si">}</span><span class="s1">; '</span>
|
||||
<span class="sa">f</span><span class="s1">'expected </span><span class="si">{</span><span class="n">DEV_SAMPLES</span><span class="si">}</span><span class="s1"> for development sets and '</span>
|
||||
<span class="sa">f</span><span class="s1">'</span><span class="si">{</span><span class="n">TEST_SAMPLES</span><span class="si">}</span><span class="s1"> for test sets; found </span><span class="si">{</span><span class="nb">len</span><span class="p">(</span><span class="n">df</span><span class="p">)</span><span class="si">}</span><span class="s1">'</span><span class="p">)</span>
|
||||
|
||||
<span class="n">ids</span> <span class="o">=</span> <span class="nb">set</span><span class="p">(</span><span class="n">df</span><span class="o">.</span><span class="n">index</span><span class="o">.</span><span class="n">values</span><span class="p">)</span>
|
||||
<span class="n">expected_ids</span> <span class="o">=</span> <span class="nb">set</span><span class="p">(</span><span class="nb">range</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">df</span><span class="p">)))</span>
|
||||
<span class="k">if</span> <span class="n">ids</span> <span class="o">!=</span> <span class="n">expected_ids</span><span class="p">:</span>
|
||||
<span class="n">missing</span> <span class="o">=</span> <span class="n">expected_ids</span> <span class="o">-</span> <span class="n">ids</span>
|
||||
<span class="k">if</span> <span class="n">missing</span><span class="p">:</span>
|
||||
<span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span><span class="sa">f</span><span class="s1">'there are </span><span class="si">{</span><span class="nb">len</span><span class="p">(</span><span class="n">missing</span><span class="p">)</span><span class="si">}</span><span class="s1"> missing ids</span><span class="si">{</span><span class="n">hint_path</span><span class="si">}</span><span class="s1">: </span><span class="si">{</span><span class="nb">sorted</span><span class="p">(</span><span class="n">missing</span><span class="p">)</span><span class="si">}</span><span class="s1">'</span><span class="p">)</span>
|
||||
<span class="n">unexpected</span> <span class="o">=</span> <span class="n">ids</span> <span class="o">-</span> <span class="n">expected_ids</span>
|
||||
<span class="k">if</span> <span class="n">unexpected</span><span class="p">:</span>
|
||||
<span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span><span class="sa">f</span><span class="s1">'there are </span><span class="si">{</span><span class="nb">len</span><span class="p">(</span><span class="n">missing</span><span class="p">)</span><span class="si">}</span><span class="s1"> unexpected ids</span><span class="si">{</span><span class="n">hint_path</span><span class="si">}</span><span class="s1">: </span><span class="si">{</span><span class="nb">sorted</span><span class="p">(</span><span class="n">unexpected</span><span class="p">)</span><span class="si">}</span><span class="s1">'</span><span class="p">)</span>
|
||||
|
||||
<span class="k">for</span> <span class="n">category_id</span> <span class="ow">in</span> <span class="n">df</span><span class="o">.</span><span class="n">columns</span><span class="p">:</span>
|
||||
<span class="k">if</span> <span class="p">(</span><span class="n">df</span><span class="p">[</span><span class="n">category_id</span><span class="p">]</span> <span class="o"><</span> <span class="mi">0</span><span class="p">)</span><span class="o">.</span><span class="n">any</span><span class="p">()</span> <span class="ow">or</span> <span class="p">(</span><span class="n">df</span><span class="p">[</span><span class="n">category_id</span><span class="p">]</span> <span class="o">></span> <span class="mi">1</span><span class="p">)</span><span class="o">.</span><span class="n">any</span><span class="p">():</span>
|
||||
<span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span><span class="sa">f</span><span class="s1">'error</span><span class="si">{</span><span class="n">hint_path</span><span class="si">}</span><span class="s1"> column "</span><span class="si">{</span><span class="n">category_id</span><span class="si">}</span><span class="s1">" contains values out of range [0,1]'</span><span class="p">)</span>
|
||||
|
||||
<span class="n">prevs</span> <span class="o">=</span> <span class="n">df</span><span class="o">.</span><span class="n">values</span>
|
||||
<span class="n">round_errors</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">abs</span><span class="p">(</span><span class="n">prevs</span><span class="o">.</span><span class="n">sum</span><span class="p">(</span><span class="n">axis</span><span class="o">=-</span><span class="mi">1</span><span class="p">)</span> <span class="o">-</span> <span class="mf">1.</span><span class="p">)</span> <span class="o">></span> <span class="n">ERROR_TOL</span>
|
||||
<span class="k">if</span> <span class="n">round_errors</span><span class="o">.</span><span class="n">any</span><span class="p">():</span>
|
||||
<span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span><span class="sa">f</span><span class="s1">'warning: prevalence values in rows with id </span><span class="si">{</span><span class="n">np</span><span class="o">.</span><span class="n">where</span><span class="p">(</span><span class="n">round_errors</span><span class="p">)[</span><span class="mi">0</span><span class="p">]</span><span class="o">.</span><span class="n">tolist</span><span class="p">()</span><span class="si">}</span><span class="s1"> '</span>
|
||||
<span class="sa">f</span><span class="s1">'do not sum up to 1 (error tolerance </span><span class="si">{</span><span class="n">ERROR_TOL</span><span class="si">}</span><span class="s1">), '</span>
|
||||
<span class="sa">f</span><span class="s1">'probably due to some rounding errors.'</span><span class="p">)</span>
|
||||
|
||||
<span class="k">return</span> <span class="n">df</span></div>
|
||||
</div>
|
||||
|
||||
|
||||
|
||||
</pre></div>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
<footer>
|
||||
|
||||
<hr/>
|
||||
|
||||
<div role="contentinfo">
|
||||
<p>© Copyright 2024, Alejandro Moreo.</p>
|
||||
</div>
|
||||
|
||||
Built with <a href="https://www.sphinx-doc.org/">Sphinx</a> using a
|
||||
<a href="https://github.com/readthedocs/sphinx_rtd_theme">theme</a>
|
||||
provided by <a href="https://readthedocs.org">Read the Docs</a>.
|
||||
|
||||
|
||||
</footer>
|
||||
</div>
|
||||
</div>
|
||||
</section>
|
||||
</div>
|
||||
<script>
|
||||
jQuery(function () {
|
||||
SphinxRtdTheme.Navigation.enable(true);
|
||||
});
|
||||
</script>
|
||||
|
||||
</body>
|
||||
</html>
|
||||
|
|
@ -0,0 +1,728 @@
|
|||
<!DOCTYPE html>
|
||||
<html class="writer-html5" lang="en" data-content_root="../../../">
|
||||
<head>
|
||||
<meta charset="utf-8" />
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
|
||||
<title>quapy.data.base — QuaPy: A Python-based open-source framework for quantification 0.1.8 documentation</title>
|
||||
<link rel="stylesheet" type="text/css" href="../../../_static/pygments.css?v=92fd9be5" />
|
||||
<link rel="stylesheet" type="text/css" href="../../../_static/css/theme.css?v=19f00094" />
|
||||
|
||||
|
||||
<!--[if lt IE 9]>
|
||||
<script src="../../../_static/js/html5shiv.min.js"></script>
|
||||
<![endif]-->
|
||||
|
||||
<script src="../../../_static/jquery.js?v=5d32c60e"></script>
|
||||
<script src="../../../_static/_sphinx_javascript_frameworks_compat.js?v=2cd50e6c"></script>
|
||||
<script src="../../../_static/documentation_options.js?v=22607128"></script>
|
||||
<script src="../../../_static/doctools.js?v=9a2dae69"></script>
|
||||
<script src="../../../_static/sphinx_highlight.js?v=dc90522c"></script>
|
||||
<script src="../../../_static/js/theme.js"></script>
|
||||
<link rel="index" title="Index" href="../../../genindex.html" />
|
||||
<link rel="search" title="Search" href="../../../search.html" />
|
||||
</head>
|
||||
|
||||
<body class="wy-body-for-nav">
|
||||
<div class="wy-grid-for-nav">
|
||||
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
|
||||
<div class="wy-side-scroll">
|
||||
<div class="wy-side-nav-search" >
|
||||
|
||||
|
||||
|
||||
<a href="../../../index.html" class="icon icon-home">
|
||||
QuaPy: A Python-based open-source framework for quantification
|
||||
</a>
|
||||
<div role="search">
|
||||
<form id="rtd-search-form" class="wy-form" action="../../../search.html" method="get">
|
||||
<input type="text" name="q" placeholder="Search docs" aria-label="Search docs" />
|
||||
<input type="hidden" name="check_keywords" value="yes" />
|
||||
<input type="hidden" name="area" value="default" />
|
||||
</form>
|
||||
</div>
|
||||
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../../../modules.html">quapy</a></li>
|
||||
</ul>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
</nav>
|
||||
|
||||
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" >
|
||||
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
|
||||
<a href="../../../index.html">QuaPy: A Python-based open-source framework for quantification</a>
|
||||
</nav>
|
||||
|
||||
<div class="wy-nav-content">
|
||||
<div class="rst-content">
|
||||
<div role="navigation" aria-label="Page navigation">
|
||||
<ul class="wy-breadcrumbs">
|
||||
<li><a href="../../../index.html" class="icon icon-home" aria-label="Home"></a></li>
|
||||
<li class="breadcrumb-item"><a href="../../index.html">Module code</a></li>
|
||||
<li class="breadcrumb-item active">quapy.data.base</li>
|
||||
<li class="wy-breadcrumbs-aside">
|
||||
</li>
|
||||
</ul>
|
||||
<hr/>
|
||||
</div>
|
||||
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
|
||||
<div itemprop="articleBody">
|
||||
|
||||
<h1>Source code for quapy.data.base</h1><div class="highlight"><pre>
|
||||
<span></span><span class="kn">import</span> <span class="nn">itertools</span>
|
||||
<span class="kn">from</span> <span class="nn">functools</span> <span class="kn">import</span> <span class="n">cached_property</span>
|
||||
<span class="kn">from</span> <span class="nn">typing</span> <span class="kn">import</span> <span class="n">Iterable</span>
|
||||
|
||||
<span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span>
|
||||
<span class="kn">from</span> <span class="nn">scipy.sparse</span> <span class="kn">import</span> <span class="n">issparse</span>
|
||||
<span class="kn">from</span> <span class="nn">scipy.sparse</span> <span class="kn">import</span> <span class="n">vstack</span>
|
||||
<span class="kn">from</span> <span class="nn">sklearn.model_selection</span> <span class="kn">import</span> <span class="n">train_test_split</span><span class="p">,</span> <span class="n">RepeatedStratifiedKFold</span>
|
||||
<span class="kn">from</span> <span class="nn">numpy.random</span> <span class="kn">import</span> <span class="n">RandomState</span>
|
||||
<span class="kn">from</span> <span class="nn">quapy.functional</span> <span class="kn">import</span> <span class="n">strprev</span>
|
||||
<span class="kn">from</span> <span class="nn">quapy.util</span> <span class="kn">import</span> <span class="n">temp_seed</span>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="LabelledCollection">
|
||||
<a class="viewcode-back" href="../../../quapy.data.html#quapy.data.base.LabelledCollection">[docs]</a>
|
||||
<span class="k">class</span> <span class="nc">LabelledCollection</span><span class="p">:</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> A LabelledCollection is a set of objects each with a label attached to each of them. </span>
|
||||
<span class="sd"> This class implements several sampling routines and other utilities.</span>
|
||||
<span class="sd"> </span>
|
||||
<span class="sd"> :param instances: array-like (np.ndarray, list, or csr_matrix are supported)</span>
|
||||
<span class="sd"> :param labels: array-like with the same length of instances</span>
|
||||
<span class="sd"> :param classes: optional, list of classes from which labels are taken. If not specified, the classes are inferred</span>
|
||||
<span class="sd"> from the labels. The classes must be indicated in cases in which some of the labels might have no examples</span>
|
||||
<span class="sd"> (i.e., a prevalence of 0)</span>
|
||||
<span class="sd"> """</span>
|
||||
|
||||
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">instances</span><span class="p">,</span> <span class="n">labels</span><span class="p">,</span> <span class="n">classes</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
|
||||
<span class="k">if</span> <span class="n">issparse</span><span class="p">(</span><span class="n">instances</span><span class="p">):</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">instances</span> <span class="o">=</span> <span class="n">instances</span>
|
||||
<span class="k">elif</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">instances</span><span class="p">,</span> <span class="nb">list</span><span class="p">)</span> <span class="ow">and</span> <span class="nb">len</span><span class="p">(</span><span class="n">instances</span><span class="p">)</span> <span class="o">></span> <span class="mi">0</span> <span class="ow">and</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">instances</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="nb">str</span><span class="p">):</span>
|
||||
<span class="c1"># lists of strings occupy too much as ndarrays (although python-objects add a heavy overload)</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">instances</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">asarray</span><span class="p">(</span><span class="n">instances</span><span class="p">,</span> <span class="n">dtype</span><span class="o">=</span><span class="nb">object</span><span class="p">)</span>
|
||||
<span class="k">else</span><span class="p">:</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">instances</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">asarray</span><span class="p">(</span><span class="n">instances</span><span class="p">)</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">labels</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">asarray</span><span class="p">(</span><span class="n">labels</span><span class="p">)</span>
|
||||
<span class="n">n_docs</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="bp">self</span><span class="p">)</span>
|
||||
<span class="k">if</span> <span class="n">classes</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">:</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">classes_</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">unique</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">labels</span><span class="p">)</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">classes_</span><span class="o">.</span><span class="n">sort</span><span class="p">()</span>
|
||||
<span class="k">else</span><span class="p">:</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">classes_</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">unique</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">asarray</span><span class="p">(</span><span class="n">classes</span><span class="p">))</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">classes_</span><span class="o">.</span><span class="n">sort</span><span class="p">()</span>
|
||||
<span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="nb">set</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">labels</span><span class="p">)</span><span class="o">.</span><span class="n">difference</span><span class="p">(</span><span class="nb">set</span><span class="p">(</span><span class="n">classes</span><span class="p">)))</span> <span class="o">></span> <span class="mi">0</span><span class="p">:</span>
|
||||
<span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span><span class="sa">f</span><span class="s1">'labels (</span><span class="si">{</span><span class="nb">set</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">labels</span><span class="p">)</span><span class="si">}</span><span class="s1">) contain values not included in classes_ (</span><span class="si">{</span><span class="nb">set</span><span class="p">(</span><span class="n">classes</span><span class="p">)</span><span class="si">}</span><span class="s1">)'</span><span class="p">)</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">index</span> <span class="o">=</span> <span class="p">{</span><span class="n">class_</span><span class="p">:</span> <span class="n">np</span><span class="o">.</span><span class="n">arange</span><span class="p">(</span><span class="n">n_docs</span><span class="p">)[</span><span class="bp">self</span><span class="o">.</span><span class="n">labels</span> <span class="o">==</span> <span class="n">class_</span><span class="p">]</span> <span class="k">for</span> <span class="n">class_</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">classes_</span><span class="p">}</span>
|
||||
|
||||
<div class="viewcode-block" id="LabelledCollection.load">
|
||||
<a class="viewcode-back" href="../../../quapy.data.html#quapy.data.base.LabelledCollection.load">[docs]</a>
|
||||
<span class="nd">@classmethod</span>
|
||||
<span class="k">def</span> <span class="nf">load</span><span class="p">(</span><span class="bp">cls</span><span class="p">,</span> <span class="n">path</span><span class="p">:</span> <span class="nb">str</span><span class="p">,</span> <span class="n">loader_func</span><span class="p">:</span> <span class="n">callable</span><span class="p">,</span> <span class="n">classes</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="o">**</span><span class="n">loader_kwargs</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Loads a labelled set of data and convert it into a :class:`LabelledCollection` instance. The function in charge</span>
|
||||
<span class="sd"> of reading the instances must be specified. This function can be a custom one, or any of the reading functions</span>
|
||||
<span class="sd"> defined in :mod:`quapy.data.reader` module.</span>
|
||||
|
||||
<span class="sd"> :param path: string, the path to the file containing the labelled instances</span>
|
||||
<span class="sd"> :param loader_func: a custom function that implements the data loader and returns a tuple with instances and</span>
|
||||
<span class="sd"> labels</span>
|
||||
<span class="sd"> :param classes: array-like, the classes according to which the instances are labelled</span>
|
||||
<span class="sd"> :param loader_kwargs: any argument that the `loader_func` function needs in order to read the instances, i.e.,</span>
|
||||
<span class="sd"> these arguments are used to call `loader_func(path, **loader_kwargs)`</span>
|
||||
<span class="sd"> :return: a :class:`LabelledCollection` object</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">return</span> <span class="n">LabelledCollection</span><span class="p">(</span><span class="o">*</span><span class="n">loader_func</span><span class="p">(</span><span class="n">path</span><span class="p">,</span> <span class="o">**</span><span class="n">loader_kwargs</span><span class="p">),</span> <span class="n">classes</span><span class="p">)</span></div>
|
||||
|
||||
|
||||
<span class="k">def</span> <span class="fm">__len__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Returns the length of this collection (number of labelled instances)</span>
|
||||
|
||||
<span class="sd"> :return: integer</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">instances</span><span class="o">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
|
||||
|
||||
<div class="viewcode-block" id="LabelledCollection.prevalence">
|
||||
<a class="viewcode-back" href="../../../quapy.data.html#quapy.data.base.LabelledCollection.prevalence">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">prevalence</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Returns the prevalence, or relative frequency, of the classes in the codeframe.</span>
|
||||
|
||||
<span class="sd"> :return: a np.ndarray of shape `(n_classes)` with the relative frequencies of each class, in the same order</span>
|
||||
<span class="sd"> as listed by `self.classes_`</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">counts</span><span class="p">()</span> <span class="o">/</span> <span class="nb">len</span><span class="p">(</span><span class="bp">self</span><span class="p">)</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="LabelledCollection.counts">
|
||||
<a class="viewcode-back" href="../../../quapy.data.html#quapy.data.base.LabelledCollection.counts">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">counts</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Returns the number of instances for each of the classes in the codeframe.</span>
|
||||
|
||||
<span class="sd"> :return: a np.ndarray of shape `(n_classes)` with the number of instances of each class, in the same order</span>
|
||||
<span class="sd"> as listed by `self.classes_`</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">return</span> <span class="n">np</span><span class="o">.</span><span class="n">asarray</span><span class="p">([</span><span class="nb">len</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">index</span><span class="p">[</span><span class="n">class_</span><span class="p">])</span> <span class="k">for</span> <span class="n">class_</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">classes_</span><span class="p">])</span></div>
|
||||
|
||||
|
||||
<span class="nd">@property</span>
|
||||
<span class="k">def</span> <span class="nf">n_classes</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> The number of classes</span>
|
||||
|
||||
<span class="sd"> :return: integer</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">return</span> <span class="nb">len</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">classes_</span><span class="p">)</span>
|
||||
|
||||
<span class="nd">@property</span>
|
||||
<span class="k">def</span> <span class="nf">binary</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Returns True if the number of classes is 2</span>
|
||||
|
||||
<span class="sd"> :return: boolean</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">n_classes</span> <span class="o">==</span> <span class="mi">2</span>
|
||||
|
||||
<div class="viewcode-block" id="LabelledCollection.sampling_index">
|
||||
<a class="viewcode-back" href="../../../quapy.data.html#quapy.data.base.LabelledCollection.sampling_index">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">sampling_index</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">size</span><span class="p">,</span> <span class="o">*</span><span class="n">prevs</span><span class="p">,</span> <span class="n">shuffle</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">random_state</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Returns an index to be used to extract a random sample of desired size and desired prevalence values. If the</span>
|
||||
<span class="sd"> prevalence values are not specified, then returns the index of a uniform sampling.</span>
|
||||
<span class="sd"> For each class, the sampling is drawn with replacement if the requested prevalence is larger than</span>
|
||||
<span class="sd"> the actual prevalence of the class, or without replacement otherwise.</span>
|
||||
|
||||
<span class="sd"> :param size: integer, the requested size</span>
|
||||
<span class="sd"> :param prevs: the prevalence for each class; the prevalence value for the last class can be lead empty since</span>
|
||||
<span class="sd"> it is constrained. E.g., for binary collections, only the prevalence `p` for the first class (as listed in</span>
|
||||
<span class="sd"> `self.classes_` can be specified, while the other class takes prevalence value `1-p`</span>
|
||||
<span class="sd"> :param shuffle: if set to True (default), shuffles the index before returning it</span>
|
||||
<span class="sd"> :param random_state: seed for reproducing sampling</span>
|
||||
<span class="sd"> :return: a np.ndarray of shape `(size)` with the indexes</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">prevs</span><span class="p">)</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span> <span class="c1"># no prevalence was indicated; returns an index for uniform sampling</span>
|
||||
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">uniform_sampling_index</span><span class="p">(</span><span class="n">size</span><span class="p">,</span> <span class="n">random_state</span><span class="o">=</span><span class="n">random_state</span><span class="p">)</span>
|
||||
<span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">prevs</span><span class="p">)</span> <span class="o">==</span> <span class="bp">self</span><span class="o">.</span><span class="n">n_classes</span> <span class="o">-</span> <span class="mi">1</span><span class="p">:</span>
|
||||
<span class="n">prevs</span> <span class="o">=</span> <span class="n">prevs</span> <span class="o">+</span> <span class="p">(</span><span class="mi">1</span> <span class="o">-</span> <span class="nb">sum</span><span class="p">(</span><span class="n">prevs</span><span class="p">),)</span>
|
||||
<span class="k">assert</span> <span class="nb">len</span><span class="p">(</span><span class="n">prevs</span><span class="p">)</span> <span class="o">==</span> <span class="bp">self</span><span class="o">.</span><span class="n">n_classes</span><span class="p">,</span> <span class="s1">'unexpected number of prevalences'</span>
|
||||
<span class="k">assert</span> <span class="nb">sum</span><span class="p">(</span><span class="n">prevs</span><span class="p">)</span> <span class="o">==</span> <span class="mi">1</span><span class="p">,</span> <span class="sa">f</span><span class="s1">'prevalences (</span><span class="si">{</span><span class="n">prevs</span><span class="si">}</span><span class="s1">) wrong range (sum=</span><span class="si">{</span><span class="nb">sum</span><span class="p">(</span><span class="n">prevs</span><span class="p">)</span><span class="si">}</span><span class="s1">)'</span>
|
||||
|
||||
<span class="c1"># Decide how many instances should be taken for each class in order to satisfy the requested prevalence</span>
|
||||
<span class="c1"># accurately, and the number of instances in the sample (exactly). If int(size * prevs[i]) (which is</span>
|
||||
<span class="c1"># <= size * prevs[i]) examples are drawn from class i, there could be a remainder number of instances to take</span>
|
||||
<span class="c1"># to satisfy the size constrain. The remainder is distributed along the classes with probability = prevs.</span>
|
||||
<span class="c1"># (This aims at avoiding the remainder to be placed in a class for which the prevalence requested is 0.)</span>
|
||||
<span class="n">n_requests</span> <span class="o">=</span> <span class="p">{</span><span class="n">class_</span><span class="p">:</span> <span class="nb">round</span><span class="p">(</span><span class="n">size</span> <span class="o">*</span> <span class="n">prevs</span><span class="p">[</span><span class="n">i</span><span class="p">])</span> <span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">class_</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">classes_</span><span class="p">)}</span>
|
||||
<span class="n">remainder</span> <span class="o">=</span> <span class="n">size</span> <span class="o">-</span> <span class="nb">sum</span><span class="p">(</span><span class="n">n_requests</span><span class="o">.</span><span class="n">values</span><span class="p">())</span>
|
||||
<span class="k">with</span> <span class="n">temp_seed</span><span class="p">(</span><span class="n">random_state</span><span class="p">):</span>
|
||||
<span class="c1"># due to rounding, the remainder can be 0, >0, or <0</span>
|
||||
<span class="k">if</span> <span class="n">remainder</span> <span class="o">></span> <span class="mi">0</span><span class="p">:</span>
|
||||
<span class="c1"># when the remainder is >0 we randomly add 1 to the requests for each class;</span>
|
||||
<span class="c1"># more prevalent classes are more likely to be taken in order to minimize the impact in the final prevalence</span>
|
||||
<span class="k">for</span> <span class="n">rand_class</span> <span class="ow">in</span> <span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">choice</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">classes_</span><span class="p">,</span> <span class="n">size</span><span class="o">=</span><span class="n">remainder</span><span class="p">,</span> <span class="n">p</span><span class="o">=</span><span class="n">prevs</span><span class="p">):</span>
|
||||
<span class="n">n_requests</span><span class="p">[</span><span class="n">rand_class</span><span class="p">]</span> <span class="o">+=</span> <span class="mi">1</span>
|
||||
<span class="k">elif</span> <span class="n">remainder</span> <span class="o"><</span> <span class="mi">0</span><span class="p">:</span>
|
||||
<span class="c1"># when the remainder is <0 we randomly remove 1 from the requests, unless the request is 0 for a chosen</span>
|
||||
<span class="c1"># class; we repeat until remainder==0</span>
|
||||
<span class="k">while</span> <span class="n">remainder</span><span class="o">!=</span><span class="mi">0</span><span class="p">:</span>
|
||||
<span class="n">rand_class</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">choice</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">classes_</span><span class="p">,</span> <span class="n">p</span><span class="o">=</span><span class="n">prevs</span><span class="p">)</span>
|
||||
<span class="k">if</span> <span class="n">n_requests</span><span class="p">[</span><span class="n">rand_class</span><span class="p">]</span> <span class="o">></span> <span class="mi">0</span><span class="p">:</span>
|
||||
<span class="n">n_requests</span><span class="p">[</span><span class="n">rand_class</span><span class="p">]</span> <span class="o">-=</span> <span class="mi">1</span>
|
||||
<span class="n">remainder</span> <span class="o">+=</span> <span class="mi">1</span>
|
||||
|
||||
<span class="n">indexes_sample</span> <span class="o">=</span> <span class="p">[]</span>
|
||||
<span class="k">for</span> <span class="n">class_</span><span class="p">,</span> <span class="n">n_requested</span> <span class="ow">in</span> <span class="n">n_requests</span><span class="o">.</span><span class="n">items</span><span class="p">():</span>
|
||||
<span class="n">n_candidates</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">index</span><span class="p">[</span><span class="n">class_</span><span class="p">])</span>
|
||||
<span class="n">index_sample</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">index</span><span class="p">[</span><span class="n">class_</span><span class="p">][</span>
|
||||
<span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">choice</span><span class="p">(</span><span class="n">n_candidates</span><span class="p">,</span> <span class="n">size</span><span class="o">=</span><span class="n">n_requested</span><span class="p">,</span> <span class="n">replace</span><span class="o">=</span><span class="p">(</span><span class="n">n_requested</span> <span class="o">></span> <span class="n">n_candidates</span><span class="p">))</span>
|
||||
<span class="p">]</span> <span class="k">if</span> <span class="n">n_requested</span> <span class="o">></span> <span class="mi">0</span> <span class="k">else</span> <span class="p">[]</span>
|
||||
|
||||
<span class="n">indexes_sample</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">index_sample</span><span class="p">)</span>
|
||||
|
||||
<span class="n">indexes_sample</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">concatenate</span><span class="p">(</span><span class="n">indexes_sample</span><span class="p">)</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="nb">int</span><span class="p">)</span>
|
||||
|
||||
<span class="k">if</span> <span class="n">shuffle</span><span class="p">:</span>
|
||||
<span class="n">indexes_sample</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">permutation</span><span class="p">(</span><span class="n">indexes_sample</span><span class="p">)</span>
|
||||
|
||||
<span class="k">return</span> <span class="n">indexes_sample</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="LabelledCollection.uniform_sampling_index">
|
||||
<a class="viewcode-back" href="../../../quapy.data.html#quapy.data.base.LabelledCollection.uniform_sampling_index">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">uniform_sampling_index</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">size</span><span class="p">,</span> <span class="n">random_state</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Returns an index to be used to extract a uniform sample of desired size. The sampling is drawn</span>
|
||||
<span class="sd"> with replacement if the requested size is greater than the number of instances, or without replacement</span>
|
||||
<span class="sd"> otherwise.</span>
|
||||
|
||||
<span class="sd"> :param size: integer, the size of the uniform sample</span>
|
||||
<span class="sd"> :param random_state: if specified, guarantees reproducibility of the split.</span>
|
||||
<span class="sd"> :return: a np.ndarray of shape `(size)` with the indexes</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">if</span> <span class="n">random_state</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span><span class="p">:</span>
|
||||
<span class="n">ng</span> <span class="o">=</span> <span class="n">RandomState</span><span class="p">(</span><span class="n">seed</span><span class="o">=</span><span class="n">random_state</span><span class="p">)</span>
|
||||
<span class="k">else</span><span class="p">:</span>
|
||||
<span class="n">ng</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">random</span>
|
||||
<span class="k">return</span> <span class="n">ng</span><span class="o">.</span><span class="n">choice</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="bp">self</span><span class="p">),</span> <span class="n">size</span><span class="p">,</span> <span class="n">replace</span><span class="o">=</span><span class="n">size</span> <span class="o">></span> <span class="nb">len</span><span class="p">(</span><span class="bp">self</span><span class="p">))</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="LabelledCollection.sampling">
|
||||
<a class="viewcode-back" href="../../../quapy.data.html#quapy.data.base.LabelledCollection.sampling">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">sampling</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">size</span><span class="p">,</span> <span class="o">*</span><span class="n">prevs</span><span class="p">,</span> <span class="n">shuffle</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">random_state</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Return a random sample (an instance of :class:`LabelledCollection`) of desired size and desired prevalence</span>
|
||||
<span class="sd"> values. For each class, the sampling is drawn without replacement if the requested prevalence is larger than</span>
|
||||
<span class="sd"> the actual prevalence of the class, or with replacement otherwise.</span>
|
||||
|
||||
<span class="sd"> :param size: integer, the requested size</span>
|
||||
<span class="sd"> :param prevs: the prevalence for each class; the prevalence value for the last class can be lead empty since</span>
|
||||
<span class="sd"> it is constrained. E.g., for binary collections, only the prevalence `p` for the first class (as listed in</span>
|
||||
<span class="sd"> `self.classes_` can be specified, while the other class takes prevalence value `1-p`</span>
|
||||
<span class="sd"> :param shuffle: if set to True (default), shuffles the index before returning it</span>
|
||||
<span class="sd"> :param random_state: seed for reproducing sampling</span>
|
||||
<span class="sd"> :return: an instance of :class:`LabelledCollection` with length == `size` and prevalence close to `prevs` (or</span>
|
||||
<span class="sd"> prevalence == `prevs` if the exact prevalence values can be met as proportions of instances)</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="n">prev_index</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">sampling_index</span><span class="p">(</span><span class="n">size</span><span class="p">,</span> <span class="o">*</span><span class="n">prevs</span><span class="p">,</span> <span class="n">shuffle</span><span class="o">=</span><span class="n">shuffle</span><span class="p">,</span> <span class="n">random_state</span><span class="o">=</span><span class="n">random_state</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">sampling_from_index</span><span class="p">(</span><span class="n">prev_index</span><span class="p">)</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="LabelledCollection.uniform_sampling">
|
||||
<a class="viewcode-back" href="../../../quapy.data.html#quapy.data.base.LabelledCollection.uniform_sampling">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">uniform_sampling</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">size</span><span class="p">,</span> <span class="n">random_state</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Returns a uniform sample (an instance of :class:`LabelledCollection`) of desired size. The sampling is drawn</span>
|
||||
<span class="sd"> with replacement if the requested size is greater than the number of instances, or without replacement</span>
|
||||
<span class="sd"> otherwise.</span>
|
||||
|
||||
<span class="sd"> :param size: integer, the requested size</span>
|
||||
<span class="sd"> :param random_state: if specified, guarantees reproducibility of the split.</span>
|
||||
<span class="sd"> :return: an instance of :class:`LabelledCollection` with length == `size`</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="n">unif_index</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">uniform_sampling_index</span><span class="p">(</span><span class="n">size</span><span class="p">,</span> <span class="n">random_state</span><span class="o">=</span><span class="n">random_state</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">sampling_from_index</span><span class="p">(</span><span class="n">unif_index</span><span class="p">)</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="LabelledCollection.sampling_from_index">
|
||||
<a class="viewcode-back" href="../../../quapy.data.html#quapy.data.base.LabelledCollection.sampling_from_index">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">sampling_from_index</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">index</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Returns an instance of :class:`LabelledCollection` whose elements are sampled from this collection using the</span>
|
||||
<span class="sd"> index.</span>
|
||||
|
||||
<span class="sd"> :param index: np.ndarray</span>
|
||||
<span class="sd"> :return: an instance of :class:`LabelledCollection`</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="n">documents</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">instances</span><span class="p">[</span><span class="n">index</span><span class="p">]</span>
|
||||
<span class="n">labels</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">labels</span><span class="p">[</span><span class="n">index</span><span class="p">]</span>
|
||||
<span class="k">return</span> <span class="n">LabelledCollection</span><span class="p">(</span><span class="n">documents</span><span class="p">,</span> <span class="n">labels</span><span class="p">,</span> <span class="n">classes</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">classes_</span><span class="p">)</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="LabelledCollection.split_stratified">
|
||||
<a class="viewcode-back" href="../../../quapy.data.html#quapy.data.base.LabelledCollection.split_stratified">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">split_stratified</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">train_prop</span><span class="o">=</span><span class="mf">0.6</span><span class="p">,</span> <span class="n">random_state</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Returns two instances of :class:`LabelledCollection` split with stratification from this collection, at desired</span>
|
||||
<span class="sd"> proportion.</span>
|
||||
|
||||
<span class="sd"> :param train_prop: the proportion of elements to include in the left-most returned collection (typically used</span>
|
||||
<span class="sd"> as the training collection). The rest of elements are included in the right-most returned collection</span>
|
||||
<span class="sd"> (typically used as a test collection).</span>
|
||||
<span class="sd"> :param random_state: if specified, guarantees reproducibility of the split.</span>
|
||||
<span class="sd"> :return: two instances of :class:`LabelledCollection`, the first one with `train_prop` elements, and the</span>
|
||||
<span class="sd"> second one with `1-train_prop` elements</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="n">tr_docs</span><span class="p">,</span> <span class="n">te_docs</span><span class="p">,</span> <span class="n">tr_labels</span><span class="p">,</span> <span class="n">te_labels</span> <span class="o">=</span> <span class="n">train_test_split</span><span class="p">(</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">instances</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">labels</span><span class="p">,</span> <span class="n">train_size</span><span class="o">=</span><span class="n">train_prop</span><span class="p">,</span> <span class="n">stratify</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">labels</span><span class="p">,</span> <span class="n">random_state</span><span class="o">=</span><span class="n">random_state</span>
|
||||
<span class="p">)</span>
|
||||
<span class="n">training</span> <span class="o">=</span> <span class="n">LabelledCollection</span><span class="p">(</span><span class="n">tr_docs</span><span class="p">,</span> <span class="n">tr_labels</span><span class="p">,</span> <span class="n">classes</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">classes_</span><span class="p">)</span>
|
||||
<span class="n">test</span> <span class="o">=</span> <span class="n">LabelledCollection</span><span class="p">(</span><span class="n">te_docs</span><span class="p">,</span> <span class="n">te_labels</span><span class="p">,</span> <span class="n">classes</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">classes_</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="n">training</span><span class="p">,</span> <span class="n">test</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="LabelledCollection.split_random">
|
||||
<a class="viewcode-back" href="../../../quapy.data.html#quapy.data.base.LabelledCollection.split_random">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">split_random</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">train_prop</span><span class="o">=</span><span class="mf">0.6</span><span class="p">,</span> <span class="n">random_state</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Returns two instances of :class:`LabelledCollection` split randomly from this collection, at desired</span>
|
||||
<span class="sd"> proportion.</span>
|
||||
|
||||
<span class="sd"> :param train_prop: the proportion of elements to include in the left-most returned collection (typically used</span>
|
||||
<span class="sd"> as the training collection). The rest of elements are included in the right-most returned collection</span>
|
||||
<span class="sd"> (typically used as a test collection).</span>
|
||||
<span class="sd"> :param random_state: if specified, guarantees reproducibility of the split.</span>
|
||||
<span class="sd"> :return: two instances of :class:`LabelledCollection`, the first one with `train_prop` elements, and the</span>
|
||||
<span class="sd"> second one with `1-train_prop` elements</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="n">indexes</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">RandomState</span><span class="p">(</span><span class="n">seed</span><span class="o">=</span><span class="n">random_state</span><span class="p">)</span><span class="o">.</span><span class="n">permutation</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="bp">self</span><span class="p">))</span>
|
||||
<span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">train_prop</span><span class="p">,</span> <span class="nb">int</span><span class="p">):</span>
|
||||
<span class="k">assert</span> <span class="n">train_prop</span> <span class="o"><</span> <span class="nb">len</span><span class="p">(</span><span class="bp">self</span><span class="p">),</span> \
|
||||
<span class="s1">'argument train_prop cannot be greater than the number of elements in the collection'</span>
|
||||
<span class="n">splitpoint</span> <span class="o">=</span> <span class="n">train_prop</span>
|
||||
<span class="k">elif</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">train_prop</span><span class="p">,</span> <span class="nb">float</span><span class="p">):</span>
|
||||
<span class="k">assert</span> <span class="mi">0</span> <span class="o"><</span> <span class="n">train_prop</span> <span class="o"><</span> <span class="mi">1</span><span class="p">,</span> \
|
||||
<span class="s1">'argument train_prop out of range (0,1)'</span>
|
||||
<span class="n">splitpoint</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">round</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="bp">self</span><span class="p">)</span><span class="o">*</span><span class="n">train_prop</span><span class="p">))</span>
|
||||
<span class="n">left</span><span class="p">,</span> <span class="n">right</span> <span class="o">=</span> <span class="n">indexes</span><span class="p">[:</span><span class="n">splitpoint</span><span class="p">],</span> <span class="n">indexes</span><span class="p">[</span><span class="n">splitpoint</span><span class="p">:]</span>
|
||||
<span class="n">training</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">sampling_from_index</span><span class="p">(</span><span class="n">left</span><span class="p">)</span>
|
||||
<span class="n">test</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">sampling_from_index</span><span class="p">(</span><span class="n">right</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="n">training</span><span class="p">,</span> <span class="n">test</span></div>
|
||||
|
||||
|
||||
<span class="k">def</span> <span class="fm">__add__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">other</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Returns a new :class:`LabelledCollection` as the union of this collection with another collection.</span>
|
||||
<span class="sd"> Both labelled collections must have the same classes.</span>
|
||||
|
||||
<span class="sd"> :param other: another :class:`LabelledCollection`</span>
|
||||
<span class="sd"> :return: a :class:`LabelledCollection` representing the union of both collections</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">if</span> <span class="ow">not</span> <span class="nb">all</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">sort</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">classes_</span><span class="p">)</span><span class="o">==</span><span class="n">np</span><span class="o">.</span><span class="n">sort</span><span class="p">(</span><span class="n">other</span><span class="o">.</span><span class="n">classes_</span><span class="p">)):</span>
|
||||
<span class="k">raise</span> <span class="ne">NotImplementedError</span><span class="p">(</span><span class="sa">f</span><span class="s1">'unsupported operation for collections on different classes; '</span>
|
||||
<span class="sa">f</span><span class="s1">'expected </span><span class="si">{</span><span class="bp">self</span><span class="o">.</span><span class="n">classes_</span><span class="si">}</span><span class="s1">, found </span><span class="si">{</span><span class="n">other</span><span class="o">.</span><span class="n">classes_</span><span class="si">}</span><span class="s1">'</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="n">LabelledCollection</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">other</span><span class="p">)</span>
|
||||
|
||||
<div class="viewcode-block" id="LabelledCollection.join">
|
||||
<a class="viewcode-back" href="../../../quapy.data.html#quapy.data.base.LabelledCollection.join">[docs]</a>
|
||||
<span class="nd">@classmethod</span>
|
||||
<span class="k">def</span> <span class="nf">join</span><span class="p">(</span><span class="bp">cls</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">:</span> <span class="n">Iterable</span><span class="p">[</span><span class="s1">'LabelledCollection'</span><span class="p">]):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Returns a new :class:`LabelledCollection` as the union of the collections given in input.</span>
|
||||
|
||||
<span class="sd"> :param args: instances of :class:`LabelledCollection`</span>
|
||||
<span class="sd"> :return: a :class:`LabelledCollection` representing the union of both collections</span>
|
||||
<span class="sd"> """</span>
|
||||
|
||||
<span class="n">args</span> <span class="o">=</span> <span class="p">[</span><span class="n">lc</span> <span class="k">for</span> <span class="n">lc</span> <span class="ow">in</span> <span class="n">args</span> <span class="k">if</span> <span class="n">lc</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span><span class="p">]</span>
|
||||
<span class="k">assert</span> <span class="nb">len</span><span class="p">(</span><span class="n">args</span><span class="p">)</span> <span class="o">></span> <span class="mi">0</span><span class="p">,</span> <span class="s1">'empty list is not allowed for mix'</span>
|
||||
|
||||
<span class="k">assert</span> <span class="nb">all</span><span class="p">([</span><span class="nb">isinstance</span><span class="p">(</span><span class="n">lc</span><span class="p">,</span> <span class="n">LabelledCollection</span><span class="p">)</span> <span class="k">for</span> <span class="n">lc</span> <span class="ow">in</span> <span class="n">args</span><span class="p">]),</span> \
|
||||
<span class="s1">'only instances of LabelledCollection allowed'</span>
|
||||
|
||||
<span class="n">first_instances</span> <span class="o">=</span> <span class="n">args</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">.</span><span class="n">instances</span>
|
||||
<span class="n">first_type</span> <span class="o">=</span> <span class="nb">type</span><span class="p">(</span><span class="n">first_instances</span><span class="p">)</span>
|
||||
<span class="k">assert</span> <span class="nb">all</span><span class="p">([</span><span class="nb">type</span><span class="p">(</span><span class="n">lc</span><span class="o">.</span><span class="n">instances</span><span class="p">)</span><span class="o">==</span><span class="n">first_type</span> <span class="k">for</span> <span class="n">lc</span> <span class="ow">in</span> <span class="n">args</span><span class="p">[</span><span class="mi">1</span><span class="p">:]]),</span> \
|
||||
<span class="s1">'not all the collections are of instances of the same type'</span>
|
||||
|
||||
<span class="k">if</span> <span class="n">issparse</span><span class="p">(</span><span class="n">first_instances</span><span class="p">)</span> <span class="ow">or</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">first_instances</span><span class="p">,</span> <span class="n">np</span><span class="o">.</span><span class="n">ndarray</span><span class="p">):</span>
|
||||
<span class="n">first_ndim</span> <span class="o">=</span> <span class="n">first_instances</span><span class="o">.</span><span class="n">ndim</span>
|
||||
<span class="k">assert</span> <span class="nb">all</span><span class="p">([</span><span class="n">lc</span><span class="o">.</span><span class="n">instances</span><span class="o">.</span><span class="n">ndim</span> <span class="o">==</span> <span class="n">first_ndim</span> <span class="k">for</span> <span class="n">lc</span> <span class="ow">in</span> <span class="n">args</span><span class="p">[</span><span class="mi">1</span><span class="p">:]]),</span> \
|
||||
<span class="s1">'not all the ndarrays are of the same dimension'</span>
|
||||
<span class="k">if</span> <span class="n">first_ndim</span> <span class="o">></span> <span class="mi">1</span><span class="p">:</span>
|
||||
<span class="n">first_shape</span> <span class="o">=</span> <span class="n">first_instances</span><span class="o">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">:]</span>
|
||||
<span class="k">assert</span> <span class="nb">all</span><span class="p">([</span><span class="n">lc</span><span class="o">.</span><span class="n">instances</span><span class="o">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">:]</span> <span class="o">==</span> <span class="n">first_shape</span> <span class="k">for</span> <span class="n">lc</span> <span class="ow">in</span> <span class="n">args</span><span class="p">[</span><span class="mi">1</span><span class="p">:]]),</span> \
|
||||
<span class="s1">'not all the ndarrays are of the same shape'</span>
|
||||
<span class="k">if</span> <span class="n">issparse</span><span class="p">(</span><span class="n">first_instances</span><span class="p">):</span>
|
||||
<span class="n">instances</span> <span class="o">=</span> <span class="n">vstack</span><span class="p">([</span><span class="n">lc</span><span class="o">.</span><span class="n">instances</span> <span class="k">for</span> <span class="n">lc</span> <span class="ow">in</span> <span class="n">args</span><span class="p">])</span>
|
||||
<span class="k">else</span><span class="p">:</span>
|
||||
<span class="n">instances</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">concatenate</span><span class="p">([</span><span class="n">lc</span><span class="o">.</span><span class="n">instances</span> <span class="k">for</span> <span class="n">lc</span> <span class="ow">in</span> <span class="n">args</span><span class="p">])</span>
|
||||
<span class="k">elif</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">first_instances</span><span class="p">,</span> <span class="nb">list</span><span class="p">):</span>
|
||||
<span class="n">instances</span> <span class="o">=</span> <span class="nb">list</span><span class="p">(</span><span class="n">itertools</span><span class="o">.</span><span class="n">chain</span><span class="p">(</span><span class="n">lc</span><span class="o">.</span><span class="n">instances</span> <span class="k">for</span> <span class="n">lc</span> <span class="ow">in</span> <span class="n">args</span><span class="p">))</span>
|
||||
<span class="k">else</span><span class="p">:</span>
|
||||
<span class="k">raise</span> <span class="ne">NotImplementedError</span><span class="p">(</span><span class="s1">'unsupported operation for collection types'</span><span class="p">)</span>
|
||||
<span class="n">labels</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">concatenate</span><span class="p">([</span><span class="n">lc</span><span class="o">.</span><span class="n">labels</span> <span class="k">for</span> <span class="n">lc</span> <span class="ow">in</span> <span class="n">args</span><span class="p">])</span>
|
||||
<span class="n">classes</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">unique</span><span class="p">(</span><span class="n">labels</span><span class="p">)</span><span class="o">.</span><span class="n">sort</span><span class="p">()</span>
|
||||
<span class="k">return</span> <span class="n">LabelledCollection</span><span class="p">(</span><span class="n">instances</span><span class="p">,</span> <span class="n">labels</span><span class="p">,</span> <span class="n">classes</span><span class="o">=</span><span class="n">classes</span><span class="p">)</span></div>
|
||||
|
||||
|
||||
<span class="nd">@property</span>
|
||||
<span class="k">def</span> <span class="nf">Xy</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Gets the instances and labels. This is useful when working with `sklearn` estimators, e.g.:</span>
|
||||
|
||||
<span class="sd"> >>> svm = LinearSVC().fit(*my_collection.Xy)</span>
|
||||
|
||||
<span class="sd"> :return: a tuple `(instances, labels)` from this collection</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">instances</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">labels</span>
|
||||
|
||||
<span class="nd">@property</span>
|
||||
<span class="k">def</span> <span class="nf">Xp</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Gets the instances and the true prevalence. This is useful when implementing evaluation protocols from</span>
|
||||
<span class="sd"> a :class:`LabelledCollection` object.</span>
|
||||
|
||||
<span class="sd"> :return: a tuple `(instances, prevalence)` from this collection</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">instances</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">prevalence</span><span class="p">()</span>
|
||||
|
||||
<span class="nd">@property</span>
|
||||
<span class="k">def</span> <span class="nf">X</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> An alias to self.instances</span>
|
||||
|
||||
<span class="sd"> :return: self.instances</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">instances</span>
|
||||
|
||||
<span class="nd">@property</span>
|
||||
<span class="k">def</span> <span class="nf">y</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> An alias to self.labels</span>
|
||||
|
||||
<span class="sd"> :return: self.labels</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">labels</span>
|
||||
|
||||
<span class="nd">@property</span>
|
||||
<span class="k">def</span> <span class="nf">p</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> An alias to self.prevalence()</span>
|
||||
|
||||
<span class="sd"> :return: self.prevalence()</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">prevalence</span><span class="p">()</span>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="LabelledCollection.stats">
|
||||
<a class="viewcode-back" href="../../../quapy.data.html#quapy.data.base.LabelledCollection.stats">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">stats</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">show</span><span class="o">=</span><span class="kc">True</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Returns (and eventually prints) a dictionary with some stats of this collection. E.g.,:</span>
|
||||
|
||||
<span class="sd"> >>> data = qp.datasets.fetch_reviews('kindle', tfidf=True, min_df=5)</span>
|
||||
<span class="sd"> >>> data.training.stats()</span>
|
||||
<span class="sd"> >>> #instances=3821, type=<class 'scipy.sparse.csr.csr_matrix'>, #features=4403, #classes=[0 1], prevs=[0.081, 0.919]</span>
|
||||
|
||||
<span class="sd"> :param show: if set to True (default), prints the stats in standard output</span>
|
||||
<span class="sd"> :return: a dictionary containing some stats of this collection. Keys include `#instances` (the number of</span>
|
||||
<span class="sd"> instances), `type` (the type representing the instances), `#features` (the number of features, if the</span>
|
||||
<span class="sd"> instances are in array-like format), `#classes` (the classes of the collection), `prevs` (the prevalence</span>
|
||||
<span class="sd"> values for each class)</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="n">ninstances</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="bp">self</span><span class="p">)</span>
|
||||
<span class="n">instance_type</span> <span class="o">=</span> <span class="nb">type</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">instances</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span>
|
||||
<span class="k">if</span> <span class="n">instance_type</span> <span class="o">==</span> <span class="nb">list</span><span class="p">:</span>
|
||||
<span class="n">nfeats</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">instances</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span>
|
||||
<span class="k">elif</span> <span class="n">instance_type</span> <span class="o">==</span> <span class="n">np</span><span class="o">.</span><span class="n">ndarray</span> <span class="ow">or</span> <span class="n">issparse</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">instances</span><span class="p">):</span>
|
||||
<span class="n">nfeats</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">instances</span><span class="o">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span>
|
||||
<span class="k">else</span><span class="p">:</span>
|
||||
<span class="n">nfeats</span> <span class="o">=</span> <span class="s1">'?'</span>
|
||||
<span class="n">stats_</span> <span class="o">=</span> <span class="p">{</span><span class="s1">'instances'</span><span class="p">:</span> <span class="n">ninstances</span><span class="p">,</span>
|
||||
<span class="s1">'type'</span><span class="p">:</span> <span class="n">instance_type</span><span class="p">,</span>
|
||||
<span class="s1">'features'</span><span class="p">:</span> <span class="n">nfeats</span><span class="p">,</span>
|
||||
<span class="s1">'classes'</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">classes_</span><span class="p">,</span>
|
||||
<span class="s1">'prevs'</span><span class="p">:</span> <span class="n">strprev</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">prevalence</span><span class="p">())}</span>
|
||||
<span class="k">if</span> <span class="n">show</span><span class="p">:</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s1">'#instances=</span><span class="si">{</span><span class="n">stats_</span><span class="p">[</span><span class="s2">"instances"</span><span class="p">]</span><span class="si">}</span><span class="s1">, type=</span><span class="si">{</span><span class="n">stats_</span><span class="p">[</span><span class="s2">"type"</span><span class="p">]</span><span class="si">}</span><span class="s1">, #features=</span><span class="si">{</span><span class="n">stats_</span><span class="p">[</span><span class="s2">"features"</span><span class="p">]</span><span class="si">}</span><span class="s1">, '</span>
|
||||
<span class="sa">f</span><span class="s1">'#classes=</span><span class="si">{</span><span class="n">stats_</span><span class="p">[</span><span class="s2">"classes"</span><span class="p">]</span><span class="si">}</span><span class="s1">, prevs=</span><span class="si">{</span><span class="n">stats_</span><span class="p">[</span><span class="s2">"prevs"</span><span class="p">]</span><span class="si">}</span><span class="s1">'</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="n">stats_</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="LabelledCollection.kFCV">
|
||||
<a class="viewcode-back" href="../../../quapy.data.html#quapy.data.base.LabelledCollection.kFCV">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">kFCV</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">nfolds</span><span class="o">=</span><span class="mi">5</span><span class="p">,</span> <span class="n">nrepeats</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">random_state</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Generator of stratified folds to be used in k-fold cross validation.</span>
|
||||
|
||||
<span class="sd"> :param nfolds: integer (default 5), the number of folds to generate</span>
|
||||
<span class="sd"> :param nrepeats: integer (default 1), the number of rounds of k-fold cross validation to run</span>
|
||||
<span class="sd"> :param random_state: integer (default 0), guarantees that the folds generated are reproducible</span>
|
||||
<span class="sd"> :return: yields `nfolds * nrepeats` folds for k-fold cross validation</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="n">kf</span> <span class="o">=</span> <span class="n">RepeatedStratifiedKFold</span><span class="p">(</span><span class="n">n_splits</span><span class="o">=</span><span class="n">nfolds</span><span class="p">,</span> <span class="n">n_repeats</span><span class="o">=</span><span class="n">nrepeats</span><span class="p">,</span> <span class="n">random_state</span><span class="o">=</span><span class="n">random_state</span><span class="p">)</span>
|
||||
<span class="k">for</span> <span class="n">train_index</span><span class="p">,</span> <span class="n">test_index</span> <span class="ow">in</span> <span class="n">kf</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="o">*</span><span class="bp">self</span><span class="o">.</span><span class="n">Xy</span><span class="p">):</span>
|
||||
<span class="n">train</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">sampling_from_index</span><span class="p">(</span><span class="n">train_index</span><span class="p">)</span>
|
||||
<span class="n">test</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">sampling_from_index</span><span class="p">(</span><span class="n">test_index</span><span class="p">)</span>
|
||||
<span class="k">yield</span> <span class="n">train</span><span class="p">,</span> <span class="n">test</span></div>
|
||||
</div>
|
||||
|
||||
|
||||
|
||||
<div class="viewcode-block" id="Dataset">
|
||||
<a class="viewcode-back" href="../../../quapy.data.html#quapy.data.base.Dataset">[docs]</a>
|
||||
<span class="k">class</span> <span class="nc">Dataset</span><span class="p">:</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Abstraction of training and test :class:`LabelledCollection` objects.</span>
|
||||
|
||||
<span class="sd"> :param training: a :class:`LabelledCollection` instance</span>
|
||||
<span class="sd"> :param test: a :class:`LabelledCollection` instance</span>
|
||||
<span class="sd"> :param vocabulary: if indicated, is a dictionary of the terms used in this textual dataset</span>
|
||||
<span class="sd"> :param name: a string representing the name of the dataset</span>
|
||||
<span class="sd"> """</span>
|
||||
|
||||
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">training</span><span class="p">:</span> <span class="n">LabelledCollection</span><span class="p">,</span> <span class="n">test</span><span class="p">:</span> <span class="n">LabelledCollection</span><span class="p">,</span> <span class="n">vocabulary</span><span class="p">:</span> <span class="nb">dict</span> <span class="o">=</span> <span class="kc">None</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s1">''</span><span class="p">):</span>
|
||||
<span class="k">assert</span> <span class="nb">set</span><span class="p">(</span><span class="n">training</span><span class="o">.</span><span class="n">classes_</span><span class="p">)</span> <span class="o">==</span> <span class="nb">set</span><span class="p">(</span><span class="n">test</span><span class="o">.</span><span class="n">classes_</span><span class="p">),</span> <span class="s1">'incompatible labels in training and test collections'</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">training</span> <span class="o">=</span> <span class="n">training</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">test</span> <span class="o">=</span> <span class="n">test</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">vocabulary</span> <span class="o">=</span> <span class="n">vocabulary</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">name</span> <span class="o">=</span> <span class="n">name</span>
|
||||
|
||||
<div class="viewcode-block" id="Dataset.SplitStratified">
|
||||
<a class="viewcode-back" href="../../../quapy.data.html#quapy.data.base.Dataset.SplitStratified">[docs]</a>
|
||||
<span class="nd">@classmethod</span>
|
||||
<span class="k">def</span> <span class="nf">SplitStratified</span><span class="p">(</span><span class="bp">cls</span><span class="p">,</span> <span class="n">collection</span><span class="p">:</span> <span class="n">LabelledCollection</span><span class="p">,</span> <span class="n">train_size</span><span class="o">=</span><span class="mf">0.6</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Generates a :class:`Dataset` from a stratified split of a :class:`LabelledCollection` instance.</span>
|
||||
<span class="sd"> See :meth:`LabelledCollection.split_stratified`</span>
|
||||
|
||||
<span class="sd"> :param collection: :class:`LabelledCollection`</span>
|
||||
<span class="sd"> :param train_size: the proportion of training documents (the rest conforms the test split)</span>
|
||||
<span class="sd"> :return: an instance of :class:`Dataset`</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">return</span> <span class="n">Dataset</span><span class="p">(</span><span class="o">*</span><span class="n">collection</span><span class="o">.</span><span class="n">split_stratified</span><span class="p">(</span><span class="n">train_prop</span><span class="o">=</span><span class="n">train_size</span><span class="p">))</span></div>
|
||||
|
||||
|
||||
<span class="nd">@property</span>
|
||||
<span class="k">def</span> <span class="nf">classes_</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> The classes according to which the training collection is labelled</span>
|
||||
|
||||
<span class="sd"> :return: The classes according to which the training collection is labelled</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">training</span><span class="o">.</span><span class="n">classes_</span>
|
||||
|
||||
<span class="nd">@property</span>
|
||||
<span class="k">def</span> <span class="nf">n_classes</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> The number of classes according to which the training collection is labelled</span>
|
||||
|
||||
<span class="sd"> :return: integer</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">training</span><span class="o">.</span><span class="n">n_classes</span>
|
||||
|
||||
<span class="nd">@property</span>
|
||||
<span class="k">def</span> <span class="nf">binary</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Returns True if the training collection is labelled according to two classes</span>
|
||||
|
||||
<span class="sd"> :return: boolean</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">training</span><span class="o">.</span><span class="n">binary</span>
|
||||
|
||||
<div class="viewcode-block" id="Dataset.load">
|
||||
<a class="viewcode-back" href="../../../quapy.data.html#quapy.data.base.Dataset.load">[docs]</a>
|
||||
<span class="nd">@classmethod</span>
|
||||
<span class="k">def</span> <span class="nf">load</span><span class="p">(</span><span class="bp">cls</span><span class="p">,</span> <span class="n">train_path</span><span class="p">,</span> <span class="n">test_path</span><span class="p">,</span> <span class="n">loader_func</span><span class="p">:</span> <span class="n">callable</span><span class="p">,</span> <span class="n">classes</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="o">**</span><span class="n">loader_kwargs</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Loads a training and a test labelled set of data and convert it into a :class:`Dataset` instance.</span>
|
||||
<span class="sd"> The function in charge of reading the instances must be specified. This function can be a custom one, or any of</span>
|
||||
<span class="sd"> the reading functions defined in :mod:`quapy.data.reader` module.</span>
|
||||
|
||||
<span class="sd"> :param train_path: string, the path to the file containing the training instances</span>
|
||||
<span class="sd"> :param test_path: string, the path to the file containing the test instances</span>
|
||||
<span class="sd"> :param loader_func: a custom function that implements the data loader and returns a tuple with instances and</span>
|
||||
<span class="sd"> labels</span>
|
||||
<span class="sd"> :param classes: array-like, the classes according to which the instances are labelled</span>
|
||||
<span class="sd"> :param loader_kwargs: any argument that the `loader_func` function needs in order to read the instances.</span>
|
||||
<span class="sd"> See :meth:`LabelledCollection.load` for further details.</span>
|
||||
<span class="sd"> :return: a :class:`Dataset` object</span>
|
||||
<span class="sd"> """</span>
|
||||
|
||||
<span class="n">training</span> <span class="o">=</span> <span class="n">LabelledCollection</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="n">train_path</span><span class="p">,</span> <span class="n">loader_func</span><span class="p">,</span> <span class="n">classes</span><span class="p">,</span> <span class="o">**</span><span class="n">loader_kwargs</span><span class="p">)</span>
|
||||
<span class="n">test</span> <span class="o">=</span> <span class="n">LabelledCollection</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="n">test_path</span><span class="p">,</span> <span class="n">loader_func</span><span class="p">,</span> <span class="n">classes</span><span class="p">,</span> <span class="o">**</span><span class="n">loader_kwargs</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="n">Dataset</span><span class="p">(</span><span class="n">training</span><span class="p">,</span> <span class="n">test</span><span class="p">)</span></div>
|
||||
|
||||
|
||||
<span class="nd">@property</span>
|
||||
<span class="k">def</span> <span class="nf">vocabulary_size</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> If the dataset is textual, and the vocabulary was indicated, returns the size of the vocabulary</span>
|
||||
|
||||
<span class="sd"> :return: integer</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">return</span> <span class="nb">len</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">vocabulary</span><span class="p">)</span>
|
||||
|
||||
<span class="nd">@property</span>
|
||||
<span class="k">def</span> <span class="nf">train_test</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Alias to `self.training` and `self.test`</span>
|
||||
|
||||
<span class="sd"> :return: the training and test collections</span>
|
||||
<span class="sd"> :return: the training and test collections</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">training</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">test</span>
|
||||
|
||||
<div class="viewcode-block" id="Dataset.stats">
|
||||
<a class="viewcode-back" href="../../../quapy.data.html#quapy.data.base.Dataset.stats">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">stats</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">show</span><span class="o">=</span><span class="kc">True</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Returns (and eventually prints) a dictionary with some stats of this dataset. E.g.,:</span>
|
||||
|
||||
<span class="sd"> >>> data = qp.datasets.fetch_reviews('kindle', tfidf=True, min_df=5)</span>
|
||||
<span class="sd"> >>> data.stats()</span>
|
||||
<span class="sd"> >>> Dataset=kindle #tr-instances=3821, #te-instances=21591, type=<class 'scipy.sparse.csr.csr_matrix'>, #features=4403, #classes=[0 1], tr-prevs=[0.081, 0.919], te-prevs=[0.063, 0.937]</span>
|
||||
|
||||
<span class="sd"> :param show: if set to True (default), prints the stats in standard output</span>
|
||||
<span class="sd"> :return: a dictionary containing some stats of this collection for the training and test collections. The keys</span>
|
||||
<span class="sd"> are `train` and `test`, and point to dedicated dictionaries of stats, for each collection, with keys</span>
|
||||
<span class="sd"> `#instances` (the number of instances), `type` (the type representing the instances),</span>
|
||||
<span class="sd"> `#features` (the number of features, if the instances are in array-like format), `#classes` (the classes of</span>
|
||||
<span class="sd"> the collection), `prevs` (the prevalence values for each class)</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="n">tr_stats</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">training</span><span class="o">.</span><span class="n">stats</span><span class="p">(</span><span class="n">show</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span>
|
||||
<span class="n">te_stats</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">test</span><span class="o">.</span><span class="n">stats</span><span class="p">(</span><span class="n">show</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span>
|
||||
<span class="k">if</span> <span class="n">show</span><span class="p">:</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s1">'Dataset=</span><span class="si">{</span><span class="bp">self</span><span class="o">.</span><span class="n">name</span><span class="si">}</span><span class="s1"> #tr-instances=</span><span class="si">{</span><span class="n">tr_stats</span><span class="p">[</span><span class="s2">"instances"</span><span class="p">]</span><span class="si">}</span><span class="s1">, #te-instances=</span><span class="si">{</span><span class="n">te_stats</span><span class="p">[</span><span class="s2">"instances"</span><span class="p">]</span><span class="si">}</span><span class="s1">, '</span>
|
||||
<span class="sa">f</span><span class="s1">'type=</span><span class="si">{</span><span class="n">tr_stats</span><span class="p">[</span><span class="s2">"type"</span><span class="p">]</span><span class="si">}</span><span class="s1">, #features=</span><span class="si">{</span><span class="n">tr_stats</span><span class="p">[</span><span class="s2">"features"</span><span class="p">]</span><span class="si">}</span><span class="s1">, #classes=</span><span class="si">{</span><span class="n">tr_stats</span><span class="p">[</span><span class="s2">"classes"</span><span class="p">]</span><span class="si">}</span><span class="s1">, '</span>
|
||||
<span class="sa">f</span><span class="s1">'tr-prevs=</span><span class="si">{</span><span class="n">tr_stats</span><span class="p">[</span><span class="s2">"prevs"</span><span class="p">]</span><span class="si">}</span><span class="s1">, te-prevs=</span><span class="si">{</span><span class="n">te_stats</span><span class="p">[</span><span class="s2">"prevs"</span><span class="p">]</span><span class="si">}</span><span class="s1">'</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="p">{</span><span class="s1">'train'</span><span class="p">:</span> <span class="n">tr_stats</span><span class="p">,</span> <span class="s1">'test'</span><span class="p">:</span> <span class="n">te_stats</span><span class="p">}</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="Dataset.kFCV">
|
||||
<a class="viewcode-back" href="../../../quapy.data.html#quapy.data.base.Dataset.kFCV">[docs]</a>
|
||||
<span class="nd">@classmethod</span>
|
||||
<span class="k">def</span> <span class="nf">kFCV</span><span class="p">(</span><span class="bp">cls</span><span class="p">,</span> <span class="n">data</span><span class="p">:</span> <span class="n">LabelledCollection</span><span class="p">,</span> <span class="n">nfolds</span><span class="o">=</span><span class="mi">5</span><span class="p">,</span> <span class="n">nrepeats</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">random_state</span><span class="o">=</span><span class="mi">0</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Generator of stratified folds to be used in k-fold cross validation. This function is only a wrapper around</span>
|
||||
<span class="sd"> :meth:`LabelledCollection.kFCV` that returns :class:`Dataset` instances made of training and test folds.</span>
|
||||
|
||||
<span class="sd"> :param nfolds: integer (default 5), the number of folds to generate</span>
|
||||
<span class="sd"> :param nrepeats: integer (default 1), the number of rounds of k-fold cross validation to run</span>
|
||||
<span class="sd"> :param random_state: integer (default 0), guarantees that the folds generated are reproducible</span>
|
||||
<span class="sd"> :return: yields `nfolds * nrepeats` folds for k-fold cross validation as instances of :class:`Dataset`</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="p">(</span><span class="n">train</span><span class="p">,</span> <span class="n">test</span><span class="p">)</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">data</span><span class="o">.</span><span class="n">kFCV</span><span class="p">(</span><span class="n">nfolds</span><span class="o">=</span><span class="n">nfolds</span><span class="p">,</span> <span class="n">nrepeats</span><span class="o">=</span><span class="n">nrepeats</span><span class="p">,</span> <span class="n">random_state</span><span class="o">=</span><span class="n">random_state</span><span class="p">)):</span>
|
||||
<span class="k">yield</span> <span class="n">Dataset</span><span class="p">(</span><span class="n">train</span><span class="p">,</span> <span class="n">test</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="sa">f</span><span class="s1">'fold </span><span class="si">{</span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="o">%</span><span class="w"> </span><span class="n">nfolds</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="si">}</span><span class="s1">/</span><span class="si">{</span><span class="n">nfolds</span><span class="si">}</span><span class="s1"> (round=</span><span class="si">{</span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="o">//</span><span class="w"> </span><span class="n">nfolds</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="si">}</span><span class="s1">)'</span><span class="p">)</span></div>
|
||||
|
||||
|
||||
|
||||
<div class="viewcode-block" id="Dataset.reduce">
|
||||
<a class="viewcode-back" href="../../../quapy.data.html#quapy.data.base.Dataset.reduce">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">reduce</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">n_train</span><span class="o">=</span><span class="mi">100</span><span class="p">,</span> <span class="n">n_test</span><span class="o">=</span><span class="mi">100</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Reduce the number of instances in place for quick experiments. Preserves the prevalence of each set.</span>
|
||||
|
||||
<span class="sd"> :param n_train: number of training documents to keep (default 100)</span>
|
||||
<span class="sd"> :param n_test: number of test documents to keep (default 100)</span>
|
||||
<span class="sd"> :return: self</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">training</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">training</span><span class="o">.</span><span class="n">sampling</span><span class="p">(</span><span class="n">n_train</span><span class="p">,</span> <span class="o">*</span><span class="bp">self</span><span class="o">.</span><span class="n">training</span><span class="o">.</span><span class="n">prevalence</span><span class="p">())</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">test</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">test</span><span class="o">.</span><span class="n">sampling</span><span class="p">(</span><span class="n">n_test</span><span class="p">,</span> <span class="o">*</span><span class="bp">self</span><span class="o">.</span><span class="n">test</span><span class="o">.</span><span class="n">prevalence</span><span class="p">())</span>
|
||||
<span class="k">return</span> <span class="bp">self</span></div>
|
||||
</div>
|
||||
|
||||
</pre></div>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
<footer>
|
||||
|
||||
<hr/>
|
||||
|
||||
<div role="contentinfo">
|
||||
<p>© Copyright 2024, Alejandro Moreo.</p>
|
||||
</div>
|
||||
|
||||
Built with <a href="https://www.sphinx-doc.org/">Sphinx</a> using a
|
||||
<a href="https://github.com/readthedocs/sphinx_rtd_theme">theme</a>
|
||||
provided by <a href="https://readthedocs.org">Read the Docs</a>.
|
||||
|
||||
|
||||
</footer>
|
||||
</div>
|
||||
</div>
|
||||
</section>
|
||||
</div>
|
||||
<script>
|
||||
jQuery(function () {
|
||||
SphinxRtdTheme.Navigation.enable(true);
|
||||
});
|
||||
</script>
|
||||
|
||||
</body>
|
||||
</html>
|
||||
|
|
@ -0,0 +1,919 @@
|
|||
<!DOCTYPE html>
|
||||
<html class="writer-html5" lang="en">
|
||||
<head>
|
||||
<meta charset="utf-8" />
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
|
||||
<title>quapy.data.datasets — QuaPy: A Python-based open-source framework for quantification 0.1.8 documentation</title>
|
||||
<link rel="stylesheet" type="text/css" href="../../../_static/pygments.css" />
|
||||
<link rel="stylesheet" type="text/css" href="../../../_static/css/theme.css" />
|
||||
|
||||
|
||||
<!--[if lt IE 9]>
|
||||
<script src="../../../_static/js/html5shiv.min.js"></script>
|
||||
<![endif]-->
|
||||
|
||||
<script data-url_root="../../../" id="documentation_options" src="../../../_static/documentation_options.js"></script>
|
||||
<script src="../../../_static/jquery.js"></script>
|
||||
<script src="../../../_static/underscore.js"></script>
|
||||
<script src="../../../_static/_sphinx_javascript_frameworks_compat.js"></script>
|
||||
<script src="../../../_static/doctools.js"></script>
|
||||
<script src="../../../_static/sphinx_highlight.js"></script>
|
||||
<script src="../../../_static/js/theme.js"></script>
|
||||
<link rel="index" title="Index" href="../../../genindex.html" />
|
||||
<link rel="search" title="Search" href="../../../search.html" />
|
||||
</head>
|
||||
|
||||
<body class="wy-body-for-nav">
|
||||
<div class="wy-grid-for-nav">
|
||||
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
|
||||
<div class="wy-side-scroll">
|
||||
<div class="wy-side-nav-search" >
|
||||
|
||||
|
||||
|
||||
<a href="../../../index.html" class="icon icon-home">
|
||||
QuaPy: A Python-based open-source framework for quantification
|
||||
</a>
|
||||
<div role="search">
|
||||
<form id="rtd-search-form" class="wy-form" action="../../../search.html" method="get">
|
||||
<input type="text" name="q" placeholder="Search docs" aria-label="Search docs" />
|
||||
<input type="hidden" name="check_keywords" value="yes" />
|
||||
<input type="hidden" name="area" value="default" />
|
||||
</form>
|
||||
</div>
|
||||
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../../../modules.html">quapy</a></li>
|
||||
</ul>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
</nav>
|
||||
|
||||
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" >
|
||||
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
|
||||
<a href="../../../index.html">QuaPy: A Python-based open-source framework for quantification</a>
|
||||
</nav>
|
||||
|
||||
<div class="wy-nav-content">
|
||||
<div class="rst-content">
|
||||
<div role="navigation" aria-label="Page navigation">
|
||||
<ul class="wy-breadcrumbs">
|
||||
<li><a href="../../../index.html" class="icon icon-home" aria-label="Home"></a></li>
|
||||
<li class="breadcrumb-item"><a href="../../index.html">Module code</a></li>
|
||||
<li class="breadcrumb-item active">quapy.data.datasets</li>
|
||||
<li class="wy-breadcrumbs-aside">
|
||||
</li>
|
||||
</ul>
|
||||
<hr/>
|
||||
</div>
|
||||
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
|
||||
<div itemprop="articleBody">
|
||||
|
||||
<h1>Source code for quapy.data.datasets</h1><div class="highlight"><pre>
|
||||
<div class="viewcode-block" id="warn"><a class="viewcode-back" href="../../../quapy.data.html#quapy.data.datasets.warn">[docs]</a><span></span><span class="k">def</span> <span class="nf">warn</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
|
||||
<span class="k">pass</span></div>
|
||||
<span class="kn">import</span> <span class="nn">warnings</span>
|
||||
<span class="n">warnings</span><span class="o">.</span><span class="n">warn</span> <span class="o">=</span> <span class="n">warn</span>
|
||||
<span class="kn">import</span> <span class="nn">os</span>
|
||||
<span class="kn">import</span> <span class="nn">zipfile</span>
|
||||
<span class="kn">from</span> <span class="nn">os.path</span> <span class="kn">import</span> <span class="n">join</span>
|
||||
<span class="kn">import</span> <span class="nn">pandas</span> <span class="k">as</span> <span class="nn">pd</span>
|
||||
<span class="kn">from</span> <span class="nn">ucimlrepo</span> <span class="kn">import</span> <span class="n">fetch_ucirepo</span>
|
||||
<span class="kn">from</span> <span class="nn">quapy.data.base</span> <span class="kn">import</span> <span class="n">Dataset</span><span class="p">,</span> <span class="n">LabelledCollection</span>
|
||||
<span class="kn">from</span> <span class="nn">quapy.data.preprocessing</span> <span class="kn">import</span> <span class="n">text2tfidf</span><span class="p">,</span> <span class="n">reduce_columns</span>
|
||||
<span class="kn">from</span> <span class="nn">quapy.data.reader</span> <span class="kn">import</span> <span class="o">*</span>
|
||||
<span class="kn">from</span> <span class="nn">quapy.util</span> <span class="kn">import</span> <span class="n">download_file_if_not_exists</span><span class="p">,</span> <span class="n">download_file</span><span class="p">,</span> <span class="n">get_quapy_home</span><span class="p">,</span> <span class="n">pickled_resource</span>
|
||||
|
||||
|
||||
<span class="n">REVIEWS_SENTIMENT_DATASETS</span> <span class="o">=</span> <span class="p">[</span><span class="s1">'hp'</span><span class="p">,</span> <span class="s1">'kindle'</span><span class="p">,</span> <span class="s1">'imdb'</span><span class="p">]</span>
|
||||
<span class="n">TWITTER_SENTIMENT_DATASETS_TEST</span> <span class="o">=</span> <span class="p">[</span><span class="s1">'gasp'</span><span class="p">,</span> <span class="s1">'hcr'</span><span class="p">,</span> <span class="s1">'omd'</span><span class="p">,</span> <span class="s1">'sanders'</span><span class="p">,</span>
|
||||
<span class="s1">'semeval13'</span><span class="p">,</span> <span class="s1">'semeval14'</span><span class="p">,</span> <span class="s1">'semeval15'</span><span class="p">,</span> <span class="s1">'semeval16'</span><span class="p">,</span>
|
||||
<span class="s1">'sst'</span><span class="p">,</span> <span class="s1">'wa'</span><span class="p">,</span> <span class="s1">'wb'</span><span class="p">]</span>
|
||||
<span class="n">TWITTER_SENTIMENT_DATASETS_TRAIN</span> <span class="o">=</span> <span class="p">[</span><span class="s1">'gasp'</span><span class="p">,</span> <span class="s1">'hcr'</span><span class="p">,</span> <span class="s1">'omd'</span><span class="p">,</span> <span class="s1">'sanders'</span><span class="p">,</span>
|
||||
<span class="s1">'semeval'</span><span class="p">,</span> <span class="s1">'semeval16'</span><span class="p">,</span>
|
||||
<span class="s1">'sst'</span><span class="p">,</span> <span class="s1">'wa'</span><span class="p">,</span> <span class="s1">'wb'</span><span class="p">]</span>
|
||||
<span class="n">UCI_BINARY_DATASETS</span> <span class="o">=</span> <span class="p">[</span><span class="s1">'acute.a'</span><span class="p">,</span> <span class="s1">'acute.b'</span><span class="p">,</span>
|
||||
<span class="s1">'balance.1'</span><span class="p">,</span> <span class="s1">'balance.2'</span><span class="p">,</span> <span class="s1">'balance.3'</span><span class="p">,</span>
|
||||
<span class="s1">'breast-cancer'</span><span class="p">,</span>
|
||||
<span class="s1">'cmc.1'</span><span class="p">,</span> <span class="s1">'cmc.2'</span><span class="p">,</span> <span class="s1">'cmc.3'</span><span class="p">,</span>
|
||||
<span class="s1">'ctg.1'</span><span class="p">,</span> <span class="s1">'ctg.2'</span><span class="p">,</span> <span class="s1">'ctg.3'</span><span class="p">,</span>
|
||||
<span class="c1">#'diabetes', # <-- I haven't found this one...</span>
|
||||
<span class="s1">'german'</span><span class="p">,</span>
|
||||
<span class="s1">'haberman'</span><span class="p">,</span>
|
||||
<span class="s1">'ionosphere'</span><span class="p">,</span>
|
||||
<span class="s1">'iris.1'</span><span class="p">,</span> <span class="s1">'iris.2'</span><span class="p">,</span> <span class="s1">'iris.3'</span><span class="p">,</span>
|
||||
<span class="s1">'mammographic'</span><span class="p">,</span>
|
||||
<span class="s1">'pageblocks.5'</span><span class="p">,</span>
|
||||
<span class="c1">#'phoneme', # <-- I haven't found this one...</span>
|
||||
<span class="s1">'semeion'</span><span class="p">,</span>
|
||||
<span class="s1">'sonar'</span><span class="p">,</span>
|
||||
<span class="s1">'spambase'</span><span class="p">,</span>
|
||||
<span class="s1">'spectf'</span><span class="p">,</span>
|
||||
<span class="s1">'tictactoe'</span><span class="p">,</span>
|
||||
<span class="s1">'transfusion'</span><span class="p">,</span>
|
||||
<span class="s1">'wdbc'</span><span class="p">,</span>
|
||||
<span class="s1">'wine.1'</span><span class="p">,</span> <span class="s1">'wine.2'</span><span class="p">,</span> <span class="s1">'wine.3'</span><span class="p">,</span>
|
||||
<span class="s1">'wine-q-red'</span><span class="p">,</span> <span class="s1">'wine-q-white'</span><span class="p">,</span>
|
||||
<span class="s1">'yeast'</span><span class="p">]</span>
|
||||
|
||||
<span class="n">UCI_MULTICLASS_DATASETS</span> <span class="o">=</span> <span class="p">[</span><span class="s1">'dry-bean'</span><span class="p">,</span>
|
||||
<span class="s1">'wine-quality'</span><span class="p">,</span>
|
||||
<span class="s1">'academic-success'</span><span class="p">,</span>
|
||||
<span class="s1">'digits'</span><span class="p">,</span>
|
||||
<span class="s1">'letter'</span><span class="p">]</span>
|
||||
|
||||
<span class="n">LEQUA2022_TASKS</span> <span class="o">=</span> <span class="p">[</span><span class="s1">'T1A'</span><span class="p">,</span> <span class="s1">'T1B'</span><span class="p">,</span> <span class="s1">'T2A'</span><span class="p">,</span> <span class="s1">'T2B'</span><span class="p">]</span>
|
||||
|
||||
<span class="n">_TXA_SAMPLE_SIZE</span> <span class="o">=</span> <span class="mi">250</span>
|
||||
<span class="n">_TXB_SAMPLE_SIZE</span> <span class="o">=</span> <span class="mi">1000</span>
|
||||
|
||||
<span class="n">LEQUA2022_SAMPLE_SIZE</span> <span class="o">=</span> <span class="p">{</span>
|
||||
<span class="s1">'TXA'</span><span class="p">:</span> <span class="n">_TXA_SAMPLE_SIZE</span><span class="p">,</span>
|
||||
<span class="s1">'TXB'</span><span class="p">:</span> <span class="n">_TXB_SAMPLE_SIZE</span><span class="p">,</span>
|
||||
<span class="s1">'T1A'</span><span class="p">:</span> <span class="n">_TXA_SAMPLE_SIZE</span><span class="p">,</span>
|
||||
<span class="s1">'T1B'</span><span class="p">:</span> <span class="n">_TXB_SAMPLE_SIZE</span><span class="p">,</span>
|
||||
<span class="s1">'T2A'</span><span class="p">:</span> <span class="n">_TXA_SAMPLE_SIZE</span><span class="p">,</span>
|
||||
<span class="s1">'T2B'</span><span class="p">:</span> <span class="n">_TXB_SAMPLE_SIZE</span><span class="p">,</span>
|
||||
<span class="s1">'binary'</span><span class="p">:</span> <span class="n">_TXA_SAMPLE_SIZE</span><span class="p">,</span>
|
||||
<span class="s1">'multiclass'</span><span class="p">:</span> <span class="n">_TXB_SAMPLE_SIZE</span>
|
||||
<span class="p">}</span>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="fetch_reviews"><a class="viewcode-back" href="../../../quapy.data.html#quapy.data.datasets.fetch_reviews">[docs]</a><span class="k">def</span> <span class="nf">fetch_reviews</span><span class="p">(</span><span class="n">dataset_name</span><span class="p">,</span> <span class="n">tfidf</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span> <span class="n">min_df</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">data_home</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">pickle</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span> <span class="o">-></span> <span class="n">Dataset</span><span class="p">:</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Loads a Reviews dataset as a Dataset instance, as used in</span>
|
||||
<span class="sd"> `Esuli, A., Moreo, A., and Sebastiani, F. "A recurrent neural network for sentiment quantification."</span>
|
||||
<span class="sd"> Proceedings of the 27th ACM International Conference on Information and Knowledge Management. 2018. <https://dl.acm.org/doi/abs/10.1145/3269206.3269287>`_.</span>
|
||||
<span class="sd"> The list of valid dataset names can be accessed in `quapy.data.datasets.REVIEWS_SENTIMENT_DATASETS`</span>
|
||||
|
||||
<span class="sd"> :param dataset_name: the name of the dataset: valid ones are 'hp', 'kindle', 'imdb'</span>
|
||||
<span class="sd"> :param tfidf: set to True to transform the raw documents into tfidf weighted matrices</span>
|
||||
<span class="sd"> :param min_df: minimun number of documents that should contain a term in order for the term to be</span>
|
||||
<span class="sd"> kept (ignored if tfidf==False)</span>
|
||||
<span class="sd"> :param data_home: specify the quapy home directory where collections will be dumped (leave empty to use the default</span>
|
||||
<span class="sd"> ~/quay_data/ directory)</span>
|
||||
<span class="sd"> :param pickle: set to True to pickle the Dataset object the first time it is generated, in order to allow for</span>
|
||||
<span class="sd"> faster subsequent invokations</span>
|
||||
<span class="sd"> :return: a :class:`quapy.data.base.Dataset` instance</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">assert</span> <span class="n">dataset_name</span> <span class="ow">in</span> <span class="n">REVIEWS_SENTIMENT_DATASETS</span><span class="p">,</span> \
|
||||
<span class="sa">f</span><span class="s1">'Name </span><span class="si">{</span><span class="n">dataset_name</span><span class="si">}</span><span class="s1"> does not match any known dataset for sentiment reviews. '</span> \
|
||||
<span class="sa">f</span><span class="s1">'Valid ones are </span><span class="si">{</span><span class="n">REVIEWS_SENTIMENT_DATASETS</span><span class="si">}</span><span class="s1">'</span>
|
||||
<span class="k">if</span> <span class="n">data_home</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">:</span>
|
||||
<span class="n">data_home</span> <span class="o">=</span> <span class="n">get_quapy_home</span><span class="p">()</span>
|
||||
|
||||
<span class="n">URL_TRAIN</span> <span class="o">=</span> <span class="sa">f</span><span class="s1">'https://zenodo.org/record/4117827/files/</span><span class="si">{</span><span class="n">dataset_name</span><span class="si">}</span><span class="s1">_train.txt'</span>
|
||||
<span class="n">URL_TEST</span> <span class="o">=</span> <span class="sa">f</span><span class="s1">'https://zenodo.org/record/4117827/files/</span><span class="si">{</span><span class="n">dataset_name</span><span class="si">}</span><span class="s1">_test.txt'</span>
|
||||
<span class="n">os</span><span class="o">.</span><span class="n">makedirs</span><span class="p">(</span><span class="n">join</span><span class="p">(</span><span class="n">data_home</span><span class="p">,</span> <span class="s1">'reviews'</span><span class="p">),</span> <span class="n">exist_ok</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
|
||||
<span class="n">train_path</span> <span class="o">=</span> <span class="n">join</span><span class="p">(</span><span class="n">data_home</span><span class="p">,</span> <span class="s1">'reviews'</span><span class="p">,</span> <span class="n">dataset_name</span><span class="p">,</span> <span class="s1">'train.txt'</span><span class="p">)</span>
|
||||
<span class="n">test_path</span> <span class="o">=</span> <span class="n">join</span><span class="p">(</span><span class="n">data_home</span><span class="p">,</span> <span class="s1">'reviews'</span><span class="p">,</span> <span class="n">dataset_name</span><span class="p">,</span> <span class="s1">'test.txt'</span><span class="p">)</span>
|
||||
<span class="n">download_file_if_not_exists</span><span class="p">(</span><span class="n">URL_TRAIN</span><span class="p">,</span> <span class="n">train_path</span><span class="p">)</span>
|
||||
<span class="n">download_file_if_not_exists</span><span class="p">(</span><span class="n">URL_TEST</span><span class="p">,</span> <span class="n">test_path</span><span class="p">)</span>
|
||||
|
||||
<span class="n">pickle_path</span> <span class="o">=</span> <span class="kc">None</span>
|
||||
<span class="k">if</span> <span class="n">pickle</span><span class="p">:</span>
|
||||
<span class="n">pickle_path</span> <span class="o">=</span> <span class="n">join</span><span class="p">(</span><span class="n">data_home</span><span class="p">,</span> <span class="s1">'reviews'</span><span class="p">,</span> <span class="s1">'pickle'</span><span class="p">,</span> <span class="sa">f</span><span class="s1">'</span><span class="si">{</span><span class="n">dataset_name</span><span class="si">}</span><span class="s1">.pkl'</span><span class="p">)</span>
|
||||
<span class="n">data</span> <span class="o">=</span> <span class="n">pickled_resource</span><span class="p">(</span><span class="n">pickle_path</span><span class="p">,</span> <span class="n">Dataset</span><span class="o">.</span><span class="n">load</span><span class="p">,</span> <span class="n">train_path</span><span class="p">,</span> <span class="n">test_path</span><span class="p">,</span> <span class="n">from_text</span><span class="p">)</span>
|
||||
|
||||
<span class="k">if</span> <span class="n">tfidf</span><span class="p">:</span>
|
||||
<span class="n">text2tfidf</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="n">inplace</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
|
||||
<span class="k">if</span> <span class="n">min_df</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span><span class="p">:</span>
|
||||
<span class="n">reduce_columns</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="n">min_df</span><span class="o">=</span><span class="n">min_df</span><span class="p">,</span> <span class="n">inplace</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
|
||||
|
||||
<span class="n">data</span><span class="o">.</span><span class="n">name</span> <span class="o">=</span> <span class="n">dataset_name</span>
|
||||
|
||||
<span class="k">return</span> <span class="n">data</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="fetch_twitter"><a class="viewcode-back" href="../../../quapy.data.html#quapy.data.datasets.fetch_twitter">[docs]</a><span class="k">def</span> <span class="nf">fetch_twitter</span><span class="p">(</span><span class="n">dataset_name</span><span class="p">,</span> <span class="n">for_model_selection</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span> <span class="n">min_df</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">data_home</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">pickle</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span> <span class="o">-></span> <span class="n">Dataset</span><span class="p">:</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Loads a Twitter dataset as a :class:`quapy.data.base.Dataset` instance, as used in:</span>
|
||||
<span class="sd"> `Gao, W., Sebastiani, F.: From classification to quantification in tweet sentiment analysis.</span>
|
||||
<span class="sd"> Social Network Analysis and Mining6(19), 1–22 (2016) <https://link.springer.com/content/pdf/10.1007/s13278-016-0327-z.pdf>`_</span>
|
||||
<span class="sd"> Note that the datasets 'semeval13', 'semeval14', 'semeval15' share the same training set.</span>
|
||||
<span class="sd"> The list of valid dataset names corresponding to training sets can be accessed in</span>
|
||||
<span class="sd"> `quapy.data.datasets.TWITTER_SENTIMENT_DATASETS_TRAIN`, while the test sets can be accessed in</span>
|
||||
<span class="sd"> `quapy.data.datasets.TWITTER_SENTIMENT_DATASETS_TEST`</span>
|
||||
|
||||
<span class="sd"> :param dataset_name: the name of the dataset: valid ones are 'gasp', 'hcr', 'omd', 'sanders', 'semeval13',</span>
|
||||
<span class="sd"> 'semeval14', 'semeval15', 'semeval16', 'sst', 'wa', 'wb'</span>
|
||||
<span class="sd"> :param for_model_selection: if True, then returns the train split as the training set and the devel split</span>
|
||||
<span class="sd"> as the test set; if False, then returns the train+devel split as the training set and the test set as the</span>
|
||||
<span class="sd"> test set</span>
|
||||
<span class="sd"> :param min_df: minimun number of documents that should contain a term in order for the term to be kept</span>
|
||||
<span class="sd"> :param data_home: specify the quapy home directory where collections will be dumped (leave empty to use the default</span>
|
||||
<span class="sd"> ~/quay_data/ directory)</span>
|
||||
<span class="sd"> :param pickle: set to True to pickle the Dataset object the first time it is generated, in order to allow for</span>
|
||||
<span class="sd"> faster subsequent invokations</span>
|
||||
<span class="sd"> :return: a :class:`quapy.data.base.Dataset` instance</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">assert</span> <span class="n">dataset_name</span> <span class="ow">in</span> <span class="n">TWITTER_SENTIMENT_DATASETS_TRAIN</span> <span class="o">+</span> <span class="n">TWITTER_SENTIMENT_DATASETS_TEST</span><span class="p">,</span> \
|
||||
<span class="sa">f</span><span class="s1">'Name </span><span class="si">{</span><span class="n">dataset_name</span><span class="si">}</span><span class="s1"> does not match any known dataset for sentiment twitter. '</span> \
|
||||
<span class="sa">f</span><span class="s1">'Valid ones are </span><span class="si">{</span><span class="n">TWITTER_SENTIMENT_DATASETS_TRAIN</span><span class="si">}</span><span class="s1"> for model selection and '</span> \
|
||||
<span class="sa">f</span><span class="s1">'</span><span class="si">{</span><span class="n">TWITTER_SENTIMENT_DATASETS_TEST</span><span class="si">}</span><span class="s1"> for test (datasets "semeval14", "semeval15", "semeval16" share '</span> \
|
||||
<span class="sa">f</span><span class="s1">'a common training set "semeval")'</span>
|
||||
<span class="k">if</span> <span class="n">data_home</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">:</span>
|
||||
<span class="n">data_home</span> <span class="o">=</span> <span class="n">get_quapy_home</span><span class="p">()</span>
|
||||
|
||||
<span class="n">URL</span> <span class="o">=</span> <span class="s1">'https://zenodo.org/record/4255764/files/tweet_sentiment_quantification_snam.zip'</span>
|
||||
<span class="n">unzipped_path</span> <span class="o">=</span> <span class="n">join</span><span class="p">(</span><span class="n">data_home</span><span class="p">,</span> <span class="s1">'tweet_sentiment_quantification_snam'</span><span class="p">)</span>
|
||||
<span class="k">if</span> <span class="ow">not</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">exists</span><span class="p">(</span><span class="n">unzipped_path</span><span class="p">):</span>
|
||||
<span class="n">downloaded_path</span> <span class="o">=</span> <span class="n">join</span><span class="p">(</span><span class="n">data_home</span><span class="p">,</span> <span class="s1">'tweet_sentiment_quantification_snam.zip'</span><span class="p">)</span>
|
||||
<span class="n">download_file</span><span class="p">(</span><span class="n">URL</span><span class="p">,</span> <span class="n">downloaded_path</span><span class="p">)</span>
|
||||
<span class="k">with</span> <span class="n">zipfile</span><span class="o">.</span><span class="n">ZipFile</span><span class="p">(</span><span class="n">downloaded_path</span><span class="p">)</span> <span class="k">as</span> <span class="n">file</span><span class="p">:</span>
|
||||
<span class="n">file</span><span class="o">.</span><span class="n">extractall</span><span class="p">(</span><span class="n">data_home</span><span class="p">)</span>
|
||||
<span class="n">os</span><span class="o">.</span><span class="n">remove</span><span class="p">(</span><span class="n">downloaded_path</span><span class="p">)</span>
|
||||
|
||||
<span class="k">if</span> <span class="n">dataset_name</span> <span class="ow">in</span> <span class="p">{</span><span class="s1">'semeval13'</span><span class="p">,</span> <span class="s1">'semeval14'</span><span class="p">,</span> <span class="s1">'semeval15'</span><span class="p">}:</span>
|
||||
<span class="n">trainset_name</span> <span class="o">=</span> <span class="s1">'semeval'</span>
|
||||
<span class="n">testset_name</span> <span class="o">=</span> <span class="s1">'semeval'</span> <span class="k">if</span> <span class="n">for_model_selection</span> <span class="k">else</span> <span class="n">dataset_name</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">"the training and development sets for datasets 'semeval13', 'semeval14', 'semeval15' are common "</span>
|
||||
<span class="sa">f</span><span class="s2">"(called 'semeval'); returning trainin-set='</span><span class="si">{</span><span class="n">trainset_name</span><span class="si">}</span><span class="s2">' and test-set=</span><span class="si">{</span><span class="n">testset_name</span><span class="si">}</span><span class="s2">"</span><span class="p">)</span>
|
||||
<span class="k">else</span><span class="p">:</span>
|
||||
<span class="k">if</span> <span class="n">dataset_name</span> <span class="o">==</span> <span class="s1">'semeval'</span> <span class="ow">and</span> <span class="n">for_model_selection</span><span class="o">==</span><span class="kc">False</span><span class="p">:</span>
|
||||
<span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span><span class="s1">'dataset "semeval" can only be used for model selection. '</span>
|
||||
<span class="s1">'Use "semeval13", "semeval14", or "semeval15" for model evaluation.'</span><span class="p">)</span>
|
||||
<span class="n">trainset_name</span> <span class="o">=</span> <span class="n">testset_name</span> <span class="o">=</span> <span class="n">dataset_name</span>
|
||||
|
||||
<span class="k">if</span> <span class="n">for_model_selection</span><span class="p">:</span>
|
||||
<span class="n">train</span> <span class="o">=</span> <span class="n">join</span><span class="p">(</span><span class="n">unzipped_path</span><span class="p">,</span> <span class="s1">'train'</span><span class="p">,</span> <span class="sa">f</span><span class="s1">'</span><span class="si">{</span><span class="n">trainset_name</span><span class="si">}</span><span class="s1">.train.feature.txt'</span><span class="p">)</span>
|
||||
<span class="n">test</span> <span class="o">=</span> <span class="n">join</span><span class="p">(</span><span class="n">unzipped_path</span><span class="p">,</span> <span class="s1">'test'</span><span class="p">,</span> <span class="sa">f</span><span class="s1">'</span><span class="si">{</span><span class="n">testset_name</span><span class="si">}</span><span class="s1">.dev.feature.txt'</span><span class="p">)</span>
|
||||
<span class="k">else</span><span class="p">:</span>
|
||||
<span class="n">train</span> <span class="o">=</span> <span class="n">join</span><span class="p">(</span><span class="n">unzipped_path</span><span class="p">,</span> <span class="s1">'train'</span><span class="p">,</span> <span class="sa">f</span><span class="s1">'</span><span class="si">{</span><span class="n">trainset_name</span><span class="si">}</span><span class="s1">.train+dev.feature.txt'</span><span class="p">)</span>
|
||||
<span class="k">if</span> <span class="n">dataset_name</span> <span class="o">==</span> <span class="s1">'semeval16'</span><span class="p">:</span> <span class="c1"># there is a different test name in the case of semeval16 only</span>
|
||||
<span class="n">test</span> <span class="o">=</span> <span class="n">join</span><span class="p">(</span><span class="n">unzipped_path</span><span class="p">,</span> <span class="s1">'test'</span><span class="p">,</span> <span class="sa">f</span><span class="s1">'</span><span class="si">{</span><span class="n">testset_name</span><span class="si">}</span><span class="s1">.dev-test.feature.txt'</span><span class="p">)</span>
|
||||
<span class="k">else</span><span class="p">:</span>
|
||||
<span class="n">test</span> <span class="o">=</span> <span class="n">join</span><span class="p">(</span><span class="n">unzipped_path</span><span class="p">,</span> <span class="s1">'test'</span><span class="p">,</span> <span class="sa">f</span><span class="s1">'</span><span class="si">{</span><span class="n">testset_name</span><span class="si">}</span><span class="s1">.test.feature.txt'</span><span class="p">)</span>
|
||||
|
||||
<span class="n">pickle_path</span> <span class="o">=</span> <span class="kc">None</span>
|
||||
<span class="k">if</span> <span class="n">pickle</span><span class="p">:</span>
|
||||
<span class="n">mode</span> <span class="o">=</span> <span class="s2">"train-dev"</span> <span class="k">if</span> <span class="n">for_model_selection</span> <span class="k">else</span> <span class="s2">"train+dev-test"</span>
|
||||
<span class="n">pickle_path</span> <span class="o">=</span> <span class="n">join</span><span class="p">(</span><span class="n">unzipped_path</span><span class="p">,</span> <span class="s1">'pickle'</span><span class="p">,</span> <span class="sa">f</span><span class="s1">'</span><span class="si">{</span><span class="n">testset_name</span><span class="si">}</span><span class="s1">.</span><span class="si">{</span><span class="n">mode</span><span class="si">}</span><span class="s1">.pkl'</span><span class="p">)</span>
|
||||
<span class="n">data</span> <span class="o">=</span> <span class="n">pickled_resource</span><span class="p">(</span><span class="n">pickle_path</span><span class="p">,</span> <span class="n">Dataset</span><span class="o">.</span><span class="n">load</span><span class="p">,</span> <span class="n">train</span><span class="p">,</span> <span class="n">test</span><span class="p">,</span> <span class="n">from_sparse</span><span class="p">)</span>
|
||||
|
||||
<span class="k">if</span> <span class="n">min_df</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span><span class="p">:</span>
|
||||
<span class="n">reduce_columns</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="n">min_df</span><span class="o">=</span><span class="n">min_df</span><span class="p">,</span> <span class="n">inplace</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
|
||||
|
||||
<span class="n">data</span><span class="o">.</span><span class="n">name</span> <span class="o">=</span> <span class="n">dataset_name</span>
|
||||
|
||||
<span class="k">return</span> <span class="n">data</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="fetch_UCIBinaryDataset"><a class="viewcode-back" href="../../../quapy.data.html#quapy.data.datasets.fetch_UCIBinaryDataset">[docs]</a><span class="k">def</span> <span class="nf">fetch_UCIBinaryDataset</span><span class="p">(</span><span class="n">dataset_name</span><span class="p">,</span> <span class="n">data_home</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">test_split</span><span class="o">=</span><span class="mf">0.3</span><span class="p">,</span> <span class="n">verbose</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span> <span class="o">-></span> <span class="n">Dataset</span><span class="p">:</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Loads a UCI dataset as an instance of :class:`quapy.data.base.Dataset`, as used in</span>
|
||||
<span class="sd"> `Pérez-Gállego, P., Quevedo, J. R., & del Coz, J. J. (2017).</span>
|
||||
<span class="sd"> Using ensembles for problems with characterizable changes in data distribution: A case study on quantification.</span>
|
||||
<span class="sd"> Information Fusion, 34, 87-100. <https://www.sciencedirect.com/science/article/pii/S1566253516300628>`_</span>
|
||||
<span class="sd"> and</span>
|
||||
<span class="sd"> `Pérez-Gállego, P., Castano, A., Quevedo, J. R., & del Coz, J. J. (2019).</span>
|
||||
<span class="sd"> Dynamic ensemble selection for quantification tasks.</span>
|
||||
<span class="sd"> Information Fusion, 45, 1-15. <https://www.sciencedirect.com/science/article/pii/S1566253517303652>`_.</span>
|
||||
<span class="sd"> The datasets do not come with a predefined train-test split (see :meth:`fetch_UCILabelledCollection` for further</span>
|
||||
<span class="sd"> information on how to use these collections), and so a train-test split is generated at desired proportion.</span>
|
||||
<span class="sd"> The list of valid dataset names can be accessed in `quapy.data.datasets.UCI_DATASETS`</span>
|
||||
|
||||
<span class="sd"> :param dataset_name: a dataset name</span>
|
||||
<span class="sd"> :param data_home: specify the quapy home directory where collections will be dumped (leave empty to use the default</span>
|
||||
<span class="sd"> ~/quay_data/ directory)</span>
|
||||
<span class="sd"> :param test_split: proportion of documents to be included in the test set. The rest conforms the training set</span>
|
||||
<span class="sd"> :param verbose: set to True (default is False) to get information (from the UCI ML repository) about the datasets</span>
|
||||
<span class="sd"> :return: a :class:`quapy.data.base.Dataset` instance</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="n">data</span> <span class="o">=</span> <span class="n">fetch_UCIBinaryLabelledCollection</span><span class="p">(</span><span class="n">dataset_name</span><span class="p">,</span> <span class="n">data_home</span><span class="p">,</span> <span class="n">verbose</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="n">Dataset</span><span class="p">(</span><span class="o">*</span><span class="n">data</span><span class="o">.</span><span class="n">split_stratified</span><span class="p">(</span><span class="mi">1</span> <span class="o">-</span> <span class="n">test_split</span><span class="p">,</span> <span class="n">random_state</span><span class="o">=</span><span class="mi">0</span><span class="p">))</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="fetch_UCIBinaryLabelledCollection"><a class="viewcode-back" href="../../../quapy.data.html#quapy.data.datasets.fetch_UCIBinaryLabelledCollection">[docs]</a><span class="k">def</span> <span class="nf">fetch_UCIBinaryLabelledCollection</span><span class="p">(</span><span class="n">dataset_name</span><span class="p">,</span> <span class="n">data_home</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">verbose</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span> <span class="o">-></span> <span class="n">LabelledCollection</span><span class="p">:</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Loads a UCI collection as an instance of :class:`quapy.data.base.LabelledCollection`, as used in</span>
|
||||
<span class="sd"> `Pérez-Gállego, P., Quevedo, J. R., & del Coz, J. J. (2017).</span>
|
||||
<span class="sd"> Using ensembles for problems with characterizable changes in data distribution: A case study on quantification.</span>
|
||||
<span class="sd"> Information Fusion, 34, 87-100. <https://www.sciencedirect.com/science/article/pii/S1566253516300628>`_</span>
|
||||
<span class="sd"> and</span>
|
||||
<span class="sd"> `Pérez-Gállego, P., Castano, A., Quevedo, J. R., & del Coz, J. J. (2019).</span>
|
||||
<span class="sd"> Dynamic ensemble selection for quantification tasks.</span>
|
||||
<span class="sd"> Information Fusion, 45, 1-15. <https://www.sciencedirect.com/science/article/pii/S1566253517303652>`_.</span>
|
||||
<span class="sd"> The datasets do not come with a predefined train-test split, and so Pérez-Gállego et al. adopted a 5FCVx2 evaluation</span>
|
||||
<span class="sd"> protocol, meaning that each collection was used to generate two rounds (hence the x2) of 5 fold cross validation.</span>
|
||||
<span class="sd"> This can be reproduced by using :meth:`quapy.data.base.Dataset.kFCV`, e.g.:</span>
|
||||
|
||||
<span class="sd"> >>> import quapy as qp</span>
|
||||
<span class="sd"> >>> collection = qp.datasets.fetch_UCIBinaryLabelledCollection("yeast")</span>
|
||||
<span class="sd"> >>> for data in qp.train.Dataset.kFCV(collection, nfolds=5, nrepeats=2):</span>
|
||||
<span class="sd"> >>> ...</span>
|
||||
|
||||
<span class="sd"> The list of valid dataset names can be accessed in `quapy.data.datasets.UCI_DATASETS`</span>
|
||||
|
||||
<span class="sd"> :param dataset_name: a dataset name</span>
|
||||
<span class="sd"> :param data_home: specify the quapy home directory where collections will be dumped (leave empty to use the default</span>
|
||||
<span class="sd"> ~/quay_data/ directory)</span>
|
||||
<span class="sd"> :param test_split: proportion of documents to be included in the test set. The rest conforms the training set</span>
|
||||
<span class="sd"> :param verbose: set to True (default is False) to get information (from the UCI ML repository) about the datasets</span>
|
||||
<span class="sd"> :return: a :class:`quapy.data.base.LabelledCollection` instance</span>
|
||||
<span class="sd"> """</span>
|
||||
|
||||
<span class="k">assert</span> <span class="n">dataset_name</span> <span class="ow">in</span> <span class="n">UCI_BINARY_DATASETS</span><span class="p">,</span> \
|
||||
<span class="sa">f</span><span class="s1">'Name </span><span class="si">{</span><span class="n">dataset_name</span><span class="si">}</span><span class="s1"> does not match any known dataset from the UCI Machine Learning datasets repository. '</span> \
|
||||
<span class="sa">f</span><span class="s1">'Valid ones are </span><span class="si">{</span><span class="n">UCI_BINARY_DATASETS</span><span class="si">}</span><span class="s1">'</span>
|
||||
<span class="k">if</span> <span class="n">data_home</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">:</span>
|
||||
<span class="n">data_home</span> <span class="o">=</span> <span class="n">get_quapy_home</span><span class="p">()</span>
|
||||
|
||||
<span class="n">dataset_fullname</span> <span class="o">=</span> <span class="p">{</span>
|
||||
<span class="s1">'acute.a'</span><span class="p">:</span> <span class="s1">'Acute Inflammations (urinary bladder)'</span><span class="p">,</span>
|
||||
<span class="s1">'acute.b'</span><span class="p">:</span> <span class="s1">'Acute Inflammations (renal pelvis)'</span><span class="p">,</span>
|
||||
<span class="s1">'balance.1'</span><span class="p">:</span> <span class="s1">'Balance Scale Weight & Distance Database (left)'</span><span class="p">,</span>
|
||||
<span class="s1">'balance.2'</span><span class="p">:</span> <span class="s1">'Balance Scale Weight & Distance Database (balanced)'</span><span class="p">,</span>
|
||||
<span class="s1">'balance.3'</span><span class="p">:</span> <span class="s1">'Balance Scale Weight & Distance Database (right)'</span><span class="p">,</span>
|
||||
<span class="s1">'breast-cancer'</span><span class="p">:</span> <span class="s1">'Breast Cancer Wisconsin (Original)'</span><span class="p">,</span>
|
||||
<span class="s1">'cmc.1'</span><span class="p">:</span> <span class="s1">'Contraceptive Method Choice (no use)'</span><span class="p">,</span>
|
||||
<span class="s1">'cmc.2'</span><span class="p">:</span> <span class="s1">'Contraceptive Method Choice (long term)'</span><span class="p">,</span>
|
||||
<span class="s1">'cmc.3'</span><span class="p">:</span> <span class="s1">'Contraceptive Method Choice (short term)'</span><span class="p">,</span>
|
||||
<span class="s1">'ctg.1'</span><span class="p">:</span> <span class="s1">'Cardiotocography Data Set (normal)'</span><span class="p">,</span>
|
||||
<span class="s1">'ctg.2'</span><span class="p">:</span> <span class="s1">'Cardiotocography Data Set (suspect)'</span><span class="p">,</span>
|
||||
<span class="s1">'ctg.3'</span><span class="p">:</span> <span class="s1">'Cardiotocography Data Set (pathologic)'</span><span class="p">,</span>
|
||||
<span class="s1">'german'</span><span class="p">:</span> <span class="s1">'Statlog German Credit Data'</span><span class="p">,</span>
|
||||
<span class="s1">'haberman'</span><span class="p">:</span> <span class="s2">"Haberman's Survival Data"</span><span class="p">,</span>
|
||||
<span class="s1">'ionosphere'</span><span class="p">:</span> <span class="s1">'Johns Hopkins University Ionosphere DB'</span><span class="p">,</span>
|
||||
<span class="s1">'iris.1'</span><span class="p">:</span> <span class="s1">'Iris Plants Database(x)'</span><span class="p">,</span>
|
||||
<span class="s1">'iris.2'</span><span class="p">:</span> <span class="s1">'Iris Plants Database(versicolour)'</span><span class="p">,</span>
|
||||
<span class="s1">'iris.3'</span><span class="p">:</span> <span class="s1">'Iris Plants Database(virginica)'</span><span class="p">,</span>
|
||||
<span class="s1">'mammographic'</span><span class="p">:</span> <span class="s1">'Mammographic Mass'</span><span class="p">,</span>
|
||||
<span class="s1">'pageblocks.5'</span><span class="p">:</span> <span class="s1">'Page Blocks Classification (5)'</span><span class="p">,</span>
|
||||
<span class="s1">'semeion'</span><span class="p">:</span> <span class="s1">'Semeion Handwritten Digit (8)'</span><span class="p">,</span>
|
||||
<span class="s1">'sonar'</span><span class="p">:</span> <span class="s1">'Sonar, Mines vs. Rocks'</span><span class="p">,</span>
|
||||
<span class="s1">'spambase'</span><span class="p">:</span> <span class="s1">'Spambase Data Set'</span><span class="p">,</span>
|
||||
<span class="s1">'spectf'</span><span class="p">:</span> <span class="s1">'SPECTF Heart Data'</span><span class="p">,</span>
|
||||
<span class="s1">'tictactoe'</span><span class="p">:</span> <span class="s1">'Tic-Tac-Toe Endgame Database'</span><span class="p">,</span>
|
||||
<span class="s1">'transfusion'</span><span class="p">:</span> <span class="s1">'Blood Transfusion Service Center Data Set'</span><span class="p">,</span>
|
||||
<span class="s1">'wdbc'</span><span class="p">:</span> <span class="s1">'Wisconsin Diagnostic Breast Cancer'</span><span class="p">,</span>
|
||||
<span class="s1">'wine.1'</span><span class="p">:</span> <span class="s1">'Wine Recognition Data (1)'</span><span class="p">,</span>
|
||||
<span class="s1">'wine.2'</span><span class="p">:</span> <span class="s1">'Wine Recognition Data (2)'</span><span class="p">,</span>
|
||||
<span class="s1">'wine.3'</span><span class="p">:</span> <span class="s1">'Wine Recognition Data (3)'</span><span class="p">,</span>
|
||||
<span class="s1">'wine-q-red'</span><span class="p">:</span> <span class="s1">'Wine Quality Red (6-10)'</span><span class="p">,</span>
|
||||
<span class="s1">'wine-q-white'</span><span class="p">:</span> <span class="s1">'Wine Quality White (6-10)'</span><span class="p">,</span>
|
||||
<span class="s1">'yeast'</span><span class="p">:</span> <span class="s1">'Yeast'</span><span class="p">,</span>
|
||||
<span class="p">}</span>
|
||||
|
||||
<span class="c1"># the identifier is an alias for the dataset group, it's part of the url data-folder, and is the name we use</span>
|
||||
<span class="c1"># to download the raw dataset</span>
|
||||
<span class="n">identifier_map</span> <span class="o">=</span> <span class="p">{</span>
|
||||
<span class="s1">'acute.a'</span><span class="p">:</span> <span class="s1">'acute'</span><span class="p">,</span>
|
||||
<span class="s1">'acute.b'</span><span class="p">:</span> <span class="s1">'acute'</span><span class="p">,</span>
|
||||
<span class="s1">'balance.1'</span><span class="p">:</span> <span class="s1">'balance-scale'</span><span class="p">,</span>
|
||||
<span class="s1">'balance.2'</span><span class="p">:</span> <span class="s1">'balance-scale'</span><span class="p">,</span>
|
||||
<span class="s1">'balance.3'</span><span class="p">:</span> <span class="s1">'balance-scale'</span><span class="p">,</span>
|
||||
<span class="s1">'breast-cancer'</span><span class="p">:</span> <span class="s1">'breast-cancer-wisconsin'</span><span class="p">,</span>
|
||||
<span class="s1">'cmc.1'</span><span class="p">:</span> <span class="s1">'cmc'</span><span class="p">,</span>
|
||||
<span class="s1">'cmc.2'</span><span class="p">:</span> <span class="s1">'cmc'</span><span class="p">,</span>
|
||||
<span class="s1">'cmc.3'</span><span class="p">:</span> <span class="s1">'cmc'</span><span class="p">,</span>
|
||||
<span class="s1">'ctg.1'</span><span class="p">:</span> <span class="s1">'00193'</span><span class="p">,</span>
|
||||
<span class="s1">'ctg.2'</span><span class="p">:</span> <span class="s1">'00193'</span><span class="p">,</span>
|
||||
<span class="s1">'ctg.3'</span><span class="p">:</span> <span class="s1">'00193'</span><span class="p">,</span>
|
||||
<span class="s1">'german'</span><span class="p">:</span> <span class="s1">'statlog/german'</span><span class="p">,</span>
|
||||
<span class="s1">'haberman'</span><span class="p">:</span> <span class="s1">'haberman'</span><span class="p">,</span>
|
||||
<span class="s1">'ionosphere'</span><span class="p">:</span> <span class="s1">'ionosphere'</span><span class="p">,</span>
|
||||
<span class="s1">'iris.1'</span><span class="p">:</span> <span class="s1">'iris'</span><span class="p">,</span>
|
||||
<span class="s1">'iris.2'</span><span class="p">:</span> <span class="s1">'iris'</span><span class="p">,</span>
|
||||
<span class="s1">'iris.3'</span><span class="p">:</span> <span class="s1">'iris'</span><span class="p">,</span>
|
||||
<span class="s1">'mammographic'</span><span class="p">:</span> <span class="s1">'mammographic-masses'</span><span class="p">,</span>
|
||||
<span class="s1">'pageblocks.5'</span><span class="p">:</span> <span class="s1">'page-blocks'</span><span class="p">,</span>
|
||||
<span class="s1">'semeion'</span><span class="p">:</span> <span class="s1">'semeion'</span><span class="p">,</span>
|
||||
<span class="s1">'sonar'</span><span class="p">:</span> <span class="s1">'undocumented/connectionist-bench/sonar'</span><span class="p">,</span>
|
||||
<span class="s1">'spambase'</span><span class="p">:</span> <span class="s1">'spambase'</span><span class="p">,</span>
|
||||
<span class="s1">'spectf'</span><span class="p">:</span> <span class="s1">'spect'</span><span class="p">,</span>
|
||||
<span class="s1">'tictactoe'</span><span class="p">:</span> <span class="s1">'tic-tac-toe'</span><span class="p">,</span>
|
||||
<span class="s1">'transfusion'</span><span class="p">:</span> <span class="s1">'blood-transfusion'</span><span class="p">,</span>
|
||||
<span class="s1">'wdbc'</span><span class="p">:</span> <span class="s1">'breast-cancer-wisconsin'</span><span class="p">,</span>
|
||||
<span class="s1">'wine-q-red'</span><span class="p">:</span> <span class="s1">'wine-quality'</span><span class="p">,</span>
|
||||
<span class="s1">'wine-q-white'</span><span class="p">:</span> <span class="s1">'wine-quality'</span><span class="p">,</span>
|
||||
<span class="s1">'wine.1'</span><span class="p">:</span> <span class="s1">'wine'</span><span class="p">,</span>
|
||||
<span class="s1">'wine.2'</span><span class="p">:</span> <span class="s1">'wine'</span><span class="p">,</span>
|
||||
<span class="s1">'wine.3'</span><span class="p">:</span> <span class="s1">'wine'</span><span class="p">,</span>
|
||||
<span class="s1">'yeast'</span><span class="p">:</span> <span class="s1">'yeast'</span><span class="p">,</span>
|
||||
<span class="p">}</span>
|
||||
|
||||
<span class="c1"># the filename is the name of the file within the data_folder indexed by the identifier</span>
|
||||
<span class="n">file_name</span> <span class="o">=</span> <span class="p">{</span>
|
||||
<span class="s1">'acute'</span><span class="p">:</span> <span class="s1">'diagnosis.data'</span><span class="p">,</span>
|
||||
<span class="s1">'00193'</span><span class="p">:</span> <span class="s1">'CTG.xls'</span><span class="p">,</span>
|
||||
<span class="s1">'statlog/german'</span><span class="p">:</span> <span class="s1">'german.data-numeric'</span><span class="p">,</span>
|
||||
<span class="s1">'mammographic-masses'</span><span class="p">:</span> <span class="s1">'mammographic_masses.data'</span><span class="p">,</span>
|
||||
<span class="s1">'page-blocks'</span><span class="p">:</span> <span class="s1">'page-blocks.data.Z'</span><span class="p">,</span>
|
||||
<span class="s1">'undocumented/connectionist-bench/sonar'</span><span class="p">:</span> <span class="s1">'sonar.all-data'</span><span class="p">,</span>
|
||||
<span class="s1">'spect'</span><span class="p">:</span> <span class="p">[</span><span class="s1">'SPECTF.train'</span><span class="p">,</span> <span class="s1">'SPECTF.test'</span><span class="p">],</span>
|
||||
<span class="s1">'blood-transfusion'</span><span class="p">:</span> <span class="s1">'transfusion.data'</span><span class="p">,</span>
|
||||
<span class="s1">'wine-quality'</span><span class="p">:</span> <span class="p">[</span><span class="s1">'winequality-red.csv'</span><span class="p">,</span> <span class="s1">'winequality-white.csv'</span><span class="p">],</span>
|
||||
<span class="s1">'breast-cancer-wisconsin'</span><span class="p">:</span> <span class="s1">'breast-cancer-wisconsin.data'</span> <span class="k">if</span> <span class="n">dataset_name</span><span class="o">==</span><span class="s1">'breast-cancer'</span> <span class="k">else</span> <span class="s1">'wdbc.data'</span>
|
||||
<span class="p">}</span>
|
||||
|
||||
<span class="c1"># the filename containing the dataset description (if any)</span>
|
||||
<span class="n">desc_name</span> <span class="o">=</span> <span class="p">{</span>
|
||||
<span class="s1">'acute'</span><span class="p">:</span> <span class="s1">'diagnosis.names'</span><span class="p">,</span>
|
||||
<span class="s1">'00193'</span><span class="p">:</span> <span class="kc">None</span><span class="p">,</span>
|
||||
<span class="s1">'statlog/german'</span><span class="p">:</span> <span class="s1">'german.doc'</span><span class="p">,</span>
|
||||
<span class="s1">'mammographic-masses'</span><span class="p">:</span> <span class="s1">'mammographic_masses.names'</span><span class="p">,</span>
|
||||
<span class="s1">'undocumented/connectionist-bench/sonar'</span><span class="p">:</span> <span class="s1">'sonar.names'</span><span class="p">,</span>
|
||||
<span class="s1">'spect'</span><span class="p">:</span> <span class="s1">'SPECTF.names'</span><span class="p">,</span>
|
||||
<span class="s1">'blood-transfusion'</span><span class="p">:</span> <span class="s1">'transfusion.names'</span><span class="p">,</span>
|
||||
<span class="s1">'wine-quality'</span><span class="p">:</span> <span class="s1">'winequality.names'</span><span class="p">,</span>
|
||||
<span class="s1">'breast-cancer-wisconsin'</span><span class="p">:</span> <span class="s1">'breast-cancer-wisconsin.names'</span> <span class="k">if</span> <span class="n">dataset_name</span> <span class="o">==</span> <span class="s1">'breast-cancer'</span> <span class="k">else</span> <span class="s1">'wdbc.names'</span>
|
||||
<span class="p">}</span>
|
||||
|
||||
<span class="n">identifier</span> <span class="o">=</span> <span class="n">identifier_map</span><span class="p">[</span><span class="n">dataset_name</span><span class="p">]</span>
|
||||
<span class="n">filename</span> <span class="o">=</span> <span class="n">file_name</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">identifier</span><span class="p">,</span> <span class="sa">f</span><span class="s1">'</span><span class="si">{</span><span class="n">identifier</span><span class="si">}</span><span class="s1">.data'</span><span class="p">)</span>
|
||||
<span class="n">descfile</span> <span class="o">=</span> <span class="n">desc_name</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">identifier</span><span class="p">,</span> <span class="sa">f</span><span class="s1">'</span><span class="si">{</span><span class="n">identifier</span><span class="si">}</span><span class="s1">.names'</span><span class="p">)</span>
|
||||
<span class="n">fullname</span> <span class="o">=</span> <span class="n">dataset_fullname</span><span class="p">[</span><span class="n">dataset_name</span><span class="p">]</span>
|
||||
|
||||
<span class="n">URL</span> <span class="o">=</span> <span class="sa">f</span><span class="s1">'http://archive.ics.uci.edu/ml/machine-learning-databases/</span><span class="si">{</span><span class="n">identifier</span><span class="si">}</span><span class="s1">'</span>
|
||||
<span class="n">data_dir</span> <span class="o">=</span> <span class="n">join</span><span class="p">(</span><span class="n">data_home</span><span class="p">,</span> <span class="s1">'uci_datasets'</span><span class="p">,</span> <span class="n">identifier</span><span class="p">)</span>
|
||||
<span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">filename</span><span class="p">,</span> <span class="nb">str</span><span class="p">):</span> <span class="c1"># filename could be a list of files, in which case it will be processed later</span>
|
||||
<span class="n">data_path</span> <span class="o">=</span> <span class="n">join</span><span class="p">(</span><span class="n">data_dir</span><span class="p">,</span> <span class="n">filename</span><span class="p">)</span>
|
||||
<span class="n">download_file_if_not_exists</span><span class="p">(</span><span class="sa">f</span><span class="s1">'</span><span class="si">{</span><span class="n">URL</span><span class="si">}</span><span class="s1">/</span><span class="si">{</span><span class="n">filename</span><span class="si">}</span><span class="s1">'</span><span class="p">,</span> <span class="n">data_path</span><span class="p">)</span>
|
||||
|
||||
<span class="k">if</span> <span class="n">descfile</span><span class="p">:</span>
|
||||
<span class="k">try</span><span class="p">:</span>
|
||||
<span class="n">download_file_if_not_exists</span><span class="p">(</span><span class="sa">f</span><span class="s1">'</span><span class="si">{</span><span class="n">URL</span><span class="si">}</span><span class="s1">/</span><span class="si">{</span><span class="n">descfile</span><span class="si">}</span><span class="s1">'</span><span class="p">,</span> <span class="sa">f</span><span class="s1">'</span><span class="si">{</span><span class="n">data_dir</span><span class="si">}</span><span class="s1">/</span><span class="si">{</span><span class="n">descfile</span><span class="si">}</span><span class="s1">'</span><span class="p">)</span>
|
||||
<span class="k">if</span> <span class="n">verbose</span><span class="p">:</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="nb">open</span><span class="p">(</span><span class="sa">f</span><span class="s1">'</span><span class="si">{</span><span class="n">data_dir</span><span class="si">}</span><span class="s1">/</span><span class="si">{</span><span class="n">descfile</span><span class="si">}</span><span class="s1">'</span><span class="p">,</span> <span class="s1">'rt'</span><span class="p">)</span><span class="o">.</span><span class="n">read</span><span class="p">())</span>
|
||||
<span class="k">except</span> <span class="ne">Exception</span><span class="p">:</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="s1">'could not read the description file'</span><span class="p">)</span>
|
||||
<span class="k">elif</span> <span class="n">verbose</span><span class="p">:</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="s1">'no file description available'</span><span class="p">)</span>
|
||||
|
||||
<span class="k">if</span> <span class="n">verbose</span><span class="p">:</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s1">'Loading </span><span class="si">{</span><span class="n">dataset_name</span><span class="si">}</span><span class="s1"> (</span><span class="si">{</span><span class="n">fullname</span><span class="si">}</span><span class="s1">)'</span><span class="p">)</span>
|
||||
<span class="k">if</span> <span class="n">identifier</span> <span class="o">==</span> <span class="s1">'acute'</span><span class="p">:</span>
|
||||
<span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="n">data_path</span><span class="p">,</span> <span class="n">header</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">encoding</span><span class="o">=</span><span class="s1">'utf-16'</span><span class="p">,</span> <span class="n">sep</span><span class="o">=</span><span class="s1">'</span><span class="se">\t</span><span class="s1">'</span><span class="p">)</span>
|
||||
|
||||
<span class="n">df</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="n">df</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">.</span><span class="n">apply</span><span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="nb">float</span><span class="p">(</span><span class="n">x</span><span class="o">.</span><span class="n">replace</span><span class="p">(</span><span class="s1">','</span><span class="p">,</span> <span class="s1">'.'</span><span class="p">)))</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="nb">float</span><span class="p">,</span> <span class="n">copy</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span>
|
||||
<span class="p">[</span><span class="n">_df_replace</span><span class="p">(</span><span class="n">df</span><span class="p">,</span> <span class="n">col</span><span class="p">)</span> <span class="k">for</span> <span class="n">col</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">6</span><span class="p">)]</span>
|
||||
<span class="n">X</span> <span class="o">=</span> <span class="n">df</span><span class="o">.</span><span class="n">loc</span><span class="p">[:,</span> <span class="mi">0</span><span class="p">:</span><span class="mi">5</span><span class="p">]</span><span class="o">.</span><span class="n">values</span>
|
||||
<span class="k">if</span> <span class="n">dataset_name</span> <span class="o">==</span> <span class="s1">'acute.a'</span><span class="p">:</span>
|
||||
<span class="n">y</span> <span class="o">=</span> <span class="n">binarize</span><span class="p">(</span><span class="n">df</span><span class="p">[</span><span class="mi">6</span><span class="p">],</span> <span class="n">pos_class</span><span class="o">=</span><span class="s1">'yes'</span><span class="p">)</span>
|
||||
<span class="k">elif</span> <span class="n">dataset_name</span> <span class="o">==</span> <span class="s1">'acute.b'</span><span class="p">:</span>
|
||||
<span class="n">y</span> <span class="o">=</span> <span class="n">binarize</span><span class="p">(</span><span class="n">df</span><span class="p">[</span><span class="mi">7</span><span class="p">],</span> <span class="n">pos_class</span><span class="o">=</span><span class="s1">'yes'</span><span class="p">)</span>
|
||||
|
||||
<span class="k">if</span> <span class="n">identifier</span> <span class="o">==</span> <span class="s1">'balance-scale'</span><span class="p">:</span>
|
||||
<span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="n">data_path</span><span class="p">,</span> <span class="n">header</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">sep</span><span class="o">=</span><span class="s1">','</span><span class="p">)</span>
|
||||
<span class="k">if</span> <span class="n">dataset_name</span> <span class="o">==</span> <span class="s1">'balance.1'</span><span class="p">:</span>
|
||||
<span class="n">y</span> <span class="o">=</span> <span class="n">binarize</span><span class="p">(</span><span class="n">df</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">pos_class</span><span class="o">=</span><span class="s1">'L'</span><span class="p">)</span>
|
||||
<span class="k">elif</span> <span class="n">dataset_name</span> <span class="o">==</span> <span class="s1">'balance.2'</span><span class="p">:</span>
|
||||
<span class="n">y</span> <span class="o">=</span> <span class="n">binarize</span><span class="p">(</span><span class="n">df</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">pos_class</span><span class="o">=</span><span class="s1">'B'</span><span class="p">)</span>
|
||||
<span class="k">elif</span> <span class="n">dataset_name</span> <span class="o">==</span> <span class="s1">'balance.3'</span><span class="p">:</span>
|
||||
<span class="n">y</span> <span class="o">=</span> <span class="n">binarize</span><span class="p">(</span><span class="n">df</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">pos_class</span><span class="o">=</span><span class="s1">'R'</span><span class="p">)</span>
|
||||
<span class="n">X</span> <span class="o">=</span> <span class="n">df</span><span class="o">.</span><span class="n">loc</span><span class="p">[:,</span> <span class="mi">1</span><span class="p">:]</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="nb">float</span><span class="p">)</span><span class="o">.</span><span class="n">values</span>
|
||||
|
||||
<span class="k">if</span> <span class="n">identifier</span> <span class="o">==</span> <span class="s1">'breast-cancer-wisconsin'</span> <span class="ow">and</span> <span class="n">dataset_name</span><span class="o">==</span><span class="s1">'breast-cancer'</span><span class="p">:</span>
|
||||
<span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="n">data_path</span><span class="p">,</span> <span class="n">header</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">sep</span><span class="o">=</span><span class="s1">','</span><span class="p">)</span>
|
||||
<span class="n">Xy</span> <span class="o">=</span> <span class="n">df</span><span class="o">.</span><span class="n">loc</span><span class="p">[:,</span> <span class="mi">1</span><span class="p">:</span><span class="mi">10</span><span class="p">]</span>
|
||||
<span class="n">Xy</span><span class="p">[</span><span class="n">Xy</span><span class="o">==</span><span class="s1">'?'</span><span class="p">]</span><span class="o">=</span><span class="n">np</span><span class="o">.</span><span class="n">nan</span>
|
||||
<span class="n">Xy</span> <span class="o">=</span> <span class="n">Xy</span><span class="o">.</span><span class="n">dropna</span><span class="p">(</span><span class="n">axis</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
|
||||
<span class="n">X</span> <span class="o">=</span> <span class="n">Xy</span><span class="o">.</span><span class="n">loc</span><span class="p">[:,</span> <span class="mi">1</span><span class="p">:</span><span class="mi">9</span><span class="p">]</span>
|
||||
<span class="n">X</span> <span class="o">=</span> <span class="n">X</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="nb">float</span><span class="p">)</span><span class="o">.</span><span class="n">values</span>
|
||||
<span class="n">y</span> <span class="o">=</span> <span class="n">binarize</span><span class="p">(</span><span class="n">Xy</span><span class="p">[</span><span class="mi">10</span><span class="p">],</span> <span class="n">pos_class</span><span class="o">=</span><span class="mi">2</span><span class="p">)</span>
|
||||
|
||||
<span class="k">if</span> <span class="n">identifier</span> <span class="o">==</span> <span class="s1">'breast-cancer-wisconsin'</span> <span class="ow">and</span> <span class="n">dataset_name</span><span class="o">==</span><span class="s1">'wdbc'</span><span class="p">:</span>
|
||||
<span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="n">data_path</span><span class="p">,</span> <span class="n">header</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">sep</span><span class="o">=</span><span class="s1">','</span><span class="p">)</span>
|
||||
<span class="n">X</span> <span class="o">=</span> <span class="n">df</span><span class="o">.</span><span class="n">loc</span><span class="p">[:,</span> <span class="mi">2</span><span class="p">:</span><span class="mi">32</span><span class="p">]</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="nb">float</span><span class="p">)</span><span class="o">.</span><span class="n">values</span>
|
||||
<span class="n">y</span> <span class="o">=</span> <span class="n">df</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span><span class="o">.</span><span class="n">values</span>
|
||||
<span class="n">y</span> <span class="o">=</span> <span class="n">binarize</span><span class="p">(</span><span class="n">y</span><span class="p">,</span> <span class="n">pos_class</span><span class="o">=</span><span class="s1">'M'</span><span class="p">)</span>
|
||||
|
||||
<span class="k">if</span> <span class="n">identifier</span> <span class="o">==</span> <span class="s1">'cmc'</span><span class="p">:</span>
|
||||
<span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="n">data_path</span><span class="p">,</span> <span class="n">header</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">sep</span><span class="o">=</span><span class="s1">','</span><span class="p">)</span>
|
||||
<span class="n">X</span> <span class="o">=</span> <span class="n">df</span><span class="o">.</span><span class="n">loc</span><span class="p">[:,</span> <span class="mi">0</span><span class="p">:</span><span class="mi">8</span><span class="p">]</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="nb">float</span><span class="p">)</span><span class="o">.</span><span class="n">values</span>
|
||||
<span class="n">y</span> <span class="o">=</span> <span class="n">df</span><span class="p">[</span><span class="mi">9</span><span class="p">]</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="nb">int</span><span class="p">)</span><span class="o">.</span><span class="n">values</span>
|
||||
<span class="k">if</span> <span class="n">dataset_name</span> <span class="o">==</span> <span class="s1">'cmc.1'</span><span class="p">:</span>
|
||||
<span class="n">y</span> <span class="o">=</span> <span class="n">binarize</span><span class="p">(</span><span class="n">y</span><span class="p">,</span> <span class="n">pos_class</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
|
||||
<span class="k">elif</span> <span class="n">dataset_name</span> <span class="o">==</span> <span class="s1">'cmc.2'</span><span class="p">:</span>
|
||||
<span class="n">y</span> <span class="o">=</span> <span class="n">binarize</span><span class="p">(</span><span class="n">y</span><span class="p">,</span> <span class="n">pos_class</span><span class="o">=</span><span class="mi">2</span><span class="p">)</span>
|
||||
<span class="k">elif</span> <span class="n">dataset_name</span> <span class="o">==</span> <span class="s1">'cmc.3'</span><span class="p">:</span>
|
||||
<span class="n">y</span> <span class="o">=</span> <span class="n">binarize</span><span class="p">(</span><span class="n">y</span><span class="p">,</span> <span class="n">pos_class</span><span class="o">=</span><span class="mi">3</span><span class="p">)</span>
|
||||
|
||||
<span class="k">if</span> <span class="n">identifier</span> <span class="o">==</span> <span class="s1">'00193'</span><span class="p">:</span>
|
||||
<span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">read_excel</span><span class="p">(</span><span class="n">data_path</span><span class="p">,</span> <span class="n">sheet_name</span><span class="o">=</span><span class="s1">'Data'</span><span class="p">,</span> <span class="n">skipfooter</span><span class="o">=</span><span class="mi">3</span><span class="p">)</span>
|
||||
<span class="n">df</span> <span class="o">=</span> <span class="n">df</span><span class="p">[</span><span class="nb">list</span><span class="p">(</span><span class="nb">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="mi">24</span><span class="p">))]</span> <span class="c1"># select columns numbered (number 23 is the target label)</span>
|
||||
<span class="c1"># replaces the header with the first row</span>
|
||||
<span class="n">new_header</span> <span class="o">=</span> <span class="n">df</span><span class="o">.</span><span class="n">iloc</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="c1"># grab the first row for the header</span>
|
||||
<span class="n">df</span> <span class="o">=</span> <span class="n">df</span><span class="p">[</span><span class="mi">1</span><span class="p">:]</span> <span class="c1"># take the data less the header row</span>
|
||||
<span class="n">df</span><span class="o">.</span><span class="n">columns</span> <span class="o">=</span> <span class="n">new_header</span> <span class="c1"># set the header row as the df header</span>
|
||||
<span class="n">X</span> <span class="o">=</span> <span class="n">df</span><span class="o">.</span><span class="n">iloc</span><span class="p">[:,</span> <span class="mi">0</span><span class="p">:</span><span class="mi">22</span><span class="p">]</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="nb">float</span><span class="p">)</span><span class="o">.</span><span class="n">values</span>
|
||||
<span class="n">y</span> <span class="o">=</span> <span class="n">df</span><span class="p">[</span><span class="s1">'NSP'</span><span class="p">]</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="nb">int</span><span class="p">)</span><span class="o">.</span><span class="n">values</span>
|
||||
<span class="k">if</span> <span class="n">dataset_name</span> <span class="o">==</span> <span class="s1">'ctg.1'</span><span class="p">:</span>
|
||||
<span class="n">y</span> <span class="o">=</span> <span class="n">binarize</span><span class="p">(</span><span class="n">y</span><span class="p">,</span> <span class="n">pos_class</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span> <span class="c1"># 1==Normal</span>
|
||||
<span class="k">elif</span> <span class="n">dataset_name</span> <span class="o">==</span> <span class="s1">'ctg.2'</span><span class="p">:</span>
|
||||
<span class="n">y</span> <span class="o">=</span> <span class="n">binarize</span><span class="p">(</span><span class="n">y</span><span class="p">,</span> <span class="n">pos_class</span><span class="o">=</span><span class="mi">2</span><span class="p">)</span> <span class="c1"># 2==Suspect</span>
|
||||
<span class="k">elif</span> <span class="n">dataset_name</span> <span class="o">==</span> <span class="s1">'ctg.3'</span><span class="p">:</span>
|
||||
<span class="n">y</span> <span class="o">=</span> <span class="n">binarize</span><span class="p">(</span><span class="n">y</span><span class="p">,</span> <span class="n">pos_class</span><span class="o">=</span><span class="mi">3</span><span class="p">)</span> <span class="c1"># 3==Pathologic</span>
|
||||
|
||||
<span class="k">if</span> <span class="n">identifier</span> <span class="o">==</span> <span class="s1">'statlog/german'</span><span class="p">:</span>
|
||||
<span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="n">data_path</span><span class="p">,</span> <span class="n">header</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">delim_whitespace</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
|
||||
<span class="n">X</span> <span class="o">=</span> <span class="n">df</span><span class="o">.</span><span class="n">iloc</span><span class="p">[:,</span> <span class="mi">0</span><span class="p">:</span><span class="mi">24</span><span class="p">]</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="nb">float</span><span class="p">)</span><span class="o">.</span><span class="n">values</span>
|
||||
<span class="n">y</span> <span class="o">=</span> <span class="n">df</span><span class="p">[</span><span class="mi">24</span><span class="p">]</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="nb">int</span><span class="p">)</span><span class="o">.</span><span class="n">values</span>
|
||||
<span class="n">y</span> <span class="o">=</span> <span class="n">binarize</span><span class="p">(</span><span class="n">y</span><span class="p">,</span> <span class="n">pos_class</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
|
||||
|
||||
<span class="k">if</span> <span class="n">identifier</span> <span class="o">==</span> <span class="s1">'haberman'</span><span class="p">:</span>
|
||||
<span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="n">data_path</span><span class="p">,</span> <span class="n">header</span><span class="o">=</span><span class="kc">None</span><span class="p">)</span>
|
||||
<span class="n">X</span> <span class="o">=</span> <span class="n">df</span><span class="o">.</span><span class="n">iloc</span><span class="p">[:,</span> <span class="mi">0</span><span class="p">:</span><span class="mi">3</span><span class="p">]</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="nb">float</span><span class="p">)</span><span class="o">.</span><span class="n">values</span>
|
||||
<span class="n">y</span> <span class="o">=</span> <span class="n">df</span><span class="p">[</span><span class="mi">3</span><span class="p">]</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="nb">int</span><span class="p">)</span><span class="o">.</span><span class="n">values</span>
|
||||
<span class="n">y</span> <span class="o">=</span> <span class="n">binarize</span><span class="p">(</span><span class="n">y</span><span class="p">,</span> <span class="n">pos_class</span><span class="o">=</span><span class="mi">2</span><span class="p">)</span>
|
||||
|
||||
<span class="k">if</span> <span class="n">identifier</span> <span class="o">==</span> <span class="s1">'ionosphere'</span><span class="p">:</span>
|
||||
<span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="n">data_path</span><span class="p">,</span> <span class="n">header</span><span class="o">=</span><span class="kc">None</span><span class="p">)</span>
|
||||
<span class="n">X</span> <span class="o">=</span> <span class="n">df</span><span class="o">.</span><span class="n">iloc</span><span class="p">[:,</span> <span class="mi">0</span><span class="p">:</span><span class="mi">34</span><span class="p">]</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="nb">float</span><span class="p">)</span><span class="o">.</span><span class="n">values</span>
|
||||
<span class="n">y</span> <span class="o">=</span> <span class="n">df</span><span class="p">[</span><span class="mi">34</span><span class="p">]</span><span class="o">.</span><span class="n">values</span>
|
||||
<span class="n">y</span> <span class="o">=</span> <span class="n">binarize</span><span class="p">(</span><span class="n">y</span><span class="p">,</span> <span class="n">pos_class</span><span class="o">=</span><span class="s1">'b'</span><span class="p">)</span>
|
||||
|
||||
<span class="k">if</span> <span class="n">identifier</span> <span class="o">==</span> <span class="s1">'iris'</span><span class="p">:</span>
|
||||
<span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="n">data_path</span><span class="p">,</span> <span class="n">header</span><span class="o">=</span><span class="kc">None</span><span class="p">)</span>
|
||||
<span class="n">X</span> <span class="o">=</span> <span class="n">df</span><span class="o">.</span><span class="n">iloc</span><span class="p">[:,</span> <span class="mi">0</span><span class="p">:</span><span class="mi">4</span><span class="p">]</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="nb">float</span><span class="p">)</span><span class="o">.</span><span class="n">values</span>
|
||||
<span class="n">y</span> <span class="o">=</span> <span class="n">df</span><span class="p">[</span><span class="mi">4</span><span class="p">]</span><span class="o">.</span><span class="n">values</span>
|
||||
<span class="k">if</span> <span class="n">dataset_name</span> <span class="o">==</span> <span class="s1">'iris.1'</span><span class="p">:</span>
|
||||
<span class="n">y</span> <span class="o">=</span> <span class="n">binarize</span><span class="p">(</span><span class="n">y</span><span class="p">,</span> <span class="n">pos_class</span><span class="o">=</span><span class="s1">'Iris-setosa'</span><span class="p">)</span> <span class="c1"># 1==Setosa</span>
|
||||
<span class="k">elif</span> <span class="n">dataset_name</span> <span class="o">==</span> <span class="s1">'iris.2'</span><span class="p">:</span>
|
||||
<span class="n">y</span> <span class="o">=</span> <span class="n">binarize</span><span class="p">(</span><span class="n">y</span><span class="p">,</span> <span class="n">pos_class</span><span class="o">=</span><span class="s1">'Iris-versicolor'</span><span class="p">)</span> <span class="c1"># 2==Versicolor</span>
|
||||
<span class="k">elif</span> <span class="n">dataset_name</span> <span class="o">==</span> <span class="s1">'iris.3'</span><span class="p">:</span>
|
||||
<span class="n">y</span> <span class="o">=</span> <span class="n">binarize</span><span class="p">(</span><span class="n">y</span><span class="p">,</span> <span class="n">pos_class</span><span class="o">=</span><span class="s1">'Iris-virginica'</span><span class="p">)</span> <span class="c1"># 3==Virginica</span>
|
||||
|
||||
<span class="k">if</span> <span class="n">identifier</span> <span class="o">==</span> <span class="s1">'mammographic-masses'</span><span class="p">:</span>
|
||||
<span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="n">data_path</span><span class="p">,</span> <span class="n">header</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">sep</span><span class="o">=</span><span class="s1">','</span><span class="p">)</span>
|
||||
<span class="n">df</span><span class="p">[</span><span class="n">df</span> <span class="o">==</span> <span class="s1">'?'</span><span class="p">]</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">nan</span>
|
||||
<span class="n">Xy</span> <span class="o">=</span> <span class="n">df</span><span class="o">.</span><span class="n">dropna</span><span class="p">(</span><span class="n">axis</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
|
||||
<span class="n">X</span> <span class="o">=</span> <span class="n">Xy</span><span class="o">.</span><span class="n">iloc</span><span class="p">[:,</span> <span class="mi">0</span><span class="p">:</span><span class="mi">5</span><span class="p">]</span>
|
||||
<span class="n">X</span> <span class="o">=</span> <span class="n">X</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="nb">float</span><span class="p">)</span><span class="o">.</span><span class="n">values</span>
|
||||
<span class="n">y</span> <span class="o">=</span> <span class="n">binarize</span><span class="p">(</span><span class="n">Xy</span><span class="o">.</span><span class="n">iloc</span><span class="p">[:,</span><span class="mi">5</span><span class="p">],</span> <span class="n">pos_class</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
|
||||
|
||||
<span class="k">if</span> <span class="n">identifier</span> <span class="o">==</span> <span class="s1">'page-blocks'</span><span class="p">:</span>
|
||||
<span class="n">data_path_</span> <span class="o">=</span> <span class="n">data_path</span><span class="o">.</span><span class="n">replace</span><span class="p">(</span><span class="s1">'.Z'</span><span class="p">,</span> <span class="s1">''</span><span class="p">)</span>
|
||||
<span class="k">if</span> <span class="ow">not</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">exists</span><span class="p">(</span><span class="n">data_path_</span><span class="p">):</span>
|
||||
<span class="k">raise</span> <span class="ne">FileNotFoundError</span><span class="p">(</span><span class="sa">f</span><span class="s1">'Warning: file </span><span class="si">{</span><span class="n">data_path_</span><span class="si">}</span><span class="s1"> does not exist. If this is the first time you '</span>
|
||||
<span class="sa">f</span><span class="s1">'attempt to load this dataset, then you have to manually unzip the </span><span class="si">{</span><span class="n">data_path</span><span class="si">}</span><span class="s1"> '</span>
|
||||
<span class="sa">f</span><span class="s1">'and name the extracted file </span><span class="si">{</span><span class="n">data_path_</span><span class="si">}</span><span class="s1"> (unfortunately, neither zipfile, nor '</span>
|
||||
<span class="sa">f</span><span class="s1">'gzip can handle unix compressed files automatically -- there is a repo in GitHub '</span>
|
||||
<span class="sa">f</span><span class="s1">'https://github.com/umeat/unlzw where the problem seems to be solved anyway).'</span><span class="p">)</span>
|
||||
<span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="n">data_path_</span><span class="p">,</span> <span class="n">header</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">delim_whitespace</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
|
||||
<span class="n">X</span> <span class="o">=</span> <span class="n">df</span><span class="o">.</span><span class="n">iloc</span><span class="p">[:,</span> <span class="mi">0</span><span class="p">:</span><span class="mi">10</span><span class="p">]</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="nb">float</span><span class="p">)</span><span class="o">.</span><span class="n">values</span>
|
||||
<span class="n">y</span> <span class="o">=</span> <span class="n">df</span><span class="p">[</span><span class="mi">10</span><span class="p">]</span><span class="o">.</span><span class="n">values</span>
|
||||
<span class="n">y</span> <span class="o">=</span> <span class="n">binarize</span><span class="p">(</span><span class="n">y</span><span class="p">,</span> <span class="n">pos_class</span><span class="o">=</span><span class="mi">5</span><span class="p">)</span> <span class="c1"># 5==block "graphic"</span>
|
||||
|
||||
<span class="k">if</span> <span class="n">identifier</span> <span class="o">==</span> <span class="s1">'semeion'</span><span class="p">:</span>
|
||||
<span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="n">data_path</span><span class="p">,</span> <span class="n">header</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">delim_whitespace</span><span class="o">=</span><span class="kc">True</span> <span class="p">)</span>
|
||||
<span class="n">X</span> <span class="o">=</span> <span class="n">df</span><span class="o">.</span><span class="n">iloc</span><span class="p">[:,</span> <span class="mi">0</span><span class="p">:</span><span class="mi">256</span><span class="p">]</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="nb">float</span><span class="p">)</span><span class="o">.</span><span class="n">values</span>
|
||||
<span class="n">y</span> <span class="o">=</span> <span class="n">df</span><span class="p">[</span><span class="mi">263</span><span class="p">]</span><span class="o">.</span><span class="n">values</span> <span class="c1"># 263 stands for digit 8 (labels are one-hot vectors from col 256-266)</span>
|
||||
<span class="n">y</span> <span class="o">=</span> <span class="n">binarize</span><span class="p">(</span><span class="n">y</span><span class="p">,</span> <span class="n">pos_class</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
|
||||
|
||||
<span class="k">if</span> <span class="n">identifier</span> <span class="o">==</span> <span class="s1">'undocumented/connectionist-bench/sonar'</span><span class="p">:</span>
|
||||
<span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="n">data_path</span><span class="p">,</span> <span class="n">header</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">sep</span><span class="o">=</span><span class="s1">','</span><span class="p">)</span>
|
||||
<span class="n">X</span> <span class="o">=</span> <span class="n">df</span><span class="o">.</span><span class="n">iloc</span><span class="p">[:,</span> <span class="mi">0</span><span class="p">:</span><span class="mi">60</span><span class="p">]</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="nb">float</span><span class="p">)</span><span class="o">.</span><span class="n">values</span>
|
||||
<span class="n">y</span> <span class="o">=</span> <span class="n">df</span><span class="p">[</span><span class="mi">60</span><span class="p">]</span><span class="o">.</span><span class="n">values</span>
|
||||
<span class="n">y</span> <span class="o">=</span> <span class="n">binarize</span><span class="p">(</span><span class="n">y</span><span class="p">,</span> <span class="n">pos_class</span><span class="o">=</span><span class="s1">'R'</span><span class="p">)</span>
|
||||
|
||||
<span class="k">if</span> <span class="n">identifier</span> <span class="o">==</span> <span class="s1">'spambase'</span><span class="p">:</span>
|
||||
<span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="n">data_path</span><span class="p">,</span> <span class="n">header</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">sep</span><span class="o">=</span><span class="s1">','</span><span class="p">)</span>
|
||||
<span class="n">X</span> <span class="o">=</span> <span class="n">df</span><span class="o">.</span><span class="n">iloc</span><span class="p">[:,</span> <span class="mi">0</span><span class="p">:</span><span class="mi">57</span><span class="p">]</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="nb">float</span><span class="p">)</span><span class="o">.</span><span class="n">values</span>
|
||||
<span class="n">y</span> <span class="o">=</span> <span class="n">df</span><span class="p">[</span><span class="mi">57</span><span class="p">]</span><span class="o">.</span><span class="n">values</span>
|
||||
<span class="n">y</span> <span class="o">=</span> <span class="n">binarize</span><span class="p">(</span><span class="n">y</span><span class="p">,</span> <span class="n">pos_class</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
|
||||
|
||||
<span class="k">if</span> <span class="n">identifier</span> <span class="o">==</span> <span class="s1">'spect'</span><span class="p">:</span>
|
||||
<span class="n">dfs</span> <span class="o">=</span> <span class="p">[]</span>
|
||||
<span class="k">for</span> <span class="n">file</span> <span class="ow">in</span> <span class="n">filename</span><span class="p">:</span>
|
||||
<span class="n">data_path</span> <span class="o">=</span> <span class="n">join</span><span class="p">(</span><span class="n">data_dir</span><span class="p">,</span> <span class="n">file</span><span class="p">)</span>
|
||||
<span class="n">download_file_if_not_exists</span><span class="p">(</span><span class="sa">f</span><span class="s1">'</span><span class="si">{</span><span class="n">URL</span><span class="si">}</span><span class="s1">/</span><span class="si">{</span><span class="n">file</span><span class="si">}</span><span class="s1">'</span><span class="p">,</span> <span class="n">data_path</span><span class="p">)</span>
|
||||
<span class="n">dfs</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="n">data_path</span><span class="p">,</span> <span class="n">header</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">sep</span><span class="o">=</span><span class="s1">','</span><span class="p">))</span>
|
||||
<span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">concat</span><span class="p">(</span><span class="n">dfs</span><span class="p">)</span>
|
||||
<span class="n">X</span> <span class="o">=</span> <span class="n">df</span><span class="o">.</span><span class="n">iloc</span><span class="p">[:,</span> <span class="mi">1</span><span class="p">:</span><span class="mi">45</span><span class="p">]</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="nb">float</span><span class="p">)</span><span class="o">.</span><span class="n">values</span>
|
||||
<span class="n">y</span> <span class="o">=</span> <span class="n">df</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">.</span><span class="n">values</span>
|
||||
<span class="n">y</span> <span class="o">=</span> <span class="n">binarize</span><span class="p">(</span><span class="n">y</span><span class="p">,</span> <span class="n">pos_class</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
|
||||
|
||||
<span class="k">if</span> <span class="n">identifier</span> <span class="o">==</span> <span class="s1">'tic-tac-toe'</span><span class="p">:</span>
|
||||
<span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="n">data_path</span><span class="p">,</span> <span class="n">header</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">sep</span><span class="o">=</span><span class="s1">','</span><span class="p">)</span>
|
||||
<span class="n">X</span> <span class="o">=</span> <span class="n">df</span><span class="o">.</span><span class="n">iloc</span><span class="p">[:,</span> <span class="mi">0</span><span class="p">:</span><span class="mi">9</span><span class="p">]</span><span class="o">.</span><span class="n">replace</span><span class="p">(</span><span class="s1">'o'</span><span class="p">,</span><span class="mi">0</span><span class="p">)</span><span class="o">.</span><span class="n">replace</span><span class="p">(</span><span class="s1">'b'</span><span class="p">,</span><span class="mi">1</span><span class="p">)</span><span class="o">.</span><span class="n">replace</span><span class="p">(</span><span class="s1">'x'</span><span class="p">,</span><span class="mi">2</span><span class="p">)</span><span class="o">.</span><span class="n">values</span>
|
||||
<span class="n">y</span> <span class="o">=</span> <span class="n">df</span><span class="p">[</span><span class="mi">9</span><span class="p">]</span><span class="o">.</span><span class="n">values</span>
|
||||
<span class="n">y</span> <span class="o">=</span> <span class="n">binarize</span><span class="p">(</span><span class="n">y</span><span class="p">,</span> <span class="n">pos_class</span><span class="o">=</span><span class="s1">'negative'</span><span class="p">)</span>
|
||||
|
||||
<span class="k">if</span> <span class="n">identifier</span> <span class="o">==</span> <span class="s1">'blood-transfusion'</span><span class="p">:</span>
|
||||
<span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="n">data_path</span><span class="p">,</span> <span class="n">sep</span><span class="o">=</span><span class="s1">','</span><span class="p">)</span>
|
||||
<span class="n">X</span> <span class="o">=</span> <span class="n">df</span><span class="o">.</span><span class="n">iloc</span><span class="p">[:,</span> <span class="mi">0</span><span class="p">:</span><span class="mi">4</span><span class="p">]</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="nb">float</span><span class="p">)</span><span class="o">.</span><span class="n">values</span>
|
||||
<span class="n">y</span> <span class="o">=</span> <span class="n">df</span><span class="o">.</span><span class="n">iloc</span><span class="p">[:,</span> <span class="mi">4</span><span class="p">]</span><span class="o">.</span><span class="n">values</span>
|
||||
<span class="n">y</span> <span class="o">=</span> <span class="n">binarize</span><span class="p">(</span><span class="n">y</span><span class="p">,</span> <span class="n">pos_class</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
|
||||
|
||||
<span class="k">if</span> <span class="n">identifier</span> <span class="o">==</span> <span class="s1">'wine'</span><span class="p">:</span>
|
||||
<span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="n">data_path</span><span class="p">,</span> <span class="n">header</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">sep</span><span class="o">=</span><span class="s1">','</span><span class="p">)</span>
|
||||
<span class="n">X</span> <span class="o">=</span> <span class="n">df</span><span class="o">.</span><span class="n">iloc</span><span class="p">[:,</span> <span class="mi">1</span><span class="p">:</span><span class="mi">14</span><span class="p">]</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="nb">float</span><span class="p">)</span><span class="o">.</span><span class="n">values</span>
|
||||
<span class="n">y</span> <span class="o">=</span> <span class="n">df</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">.</span><span class="n">values</span>
|
||||
<span class="k">if</span> <span class="n">dataset_name</span> <span class="o">==</span> <span class="s1">'wine.1'</span><span class="p">:</span>
|
||||
<span class="n">y</span> <span class="o">=</span> <span class="n">binarize</span><span class="p">(</span><span class="n">y</span><span class="p">,</span> <span class="n">pos_class</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
|
||||
<span class="k">elif</span> <span class="n">dataset_name</span> <span class="o">==</span> <span class="s1">'wine.2'</span><span class="p">:</span>
|
||||
<span class="n">y</span> <span class="o">=</span> <span class="n">binarize</span><span class="p">(</span><span class="n">y</span><span class="p">,</span> <span class="n">pos_class</span><span class="o">=</span><span class="mi">2</span><span class="p">)</span>
|
||||
<span class="k">elif</span> <span class="n">dataset_name</span> <span class="o">==</span> <span class="s1">'wine.3'</span><span class="p">:</span>
|
||||
<span class="n">y</span> <span class="o">=</span> <span class="n">binarize</span><span class="p">(</span><span class="n">y</span><span class="p">,</span> <span class="n">pos_class</span><span class="o">=</span><span class="mi">3</span><span class="p">)</span>
|
||||
|
||||
<span class="k">if</span> <span class="n">identifier</span> <span class="o">==</span> <span class="s1">'wine-quality'</span><span class="p">:</span>
|
||||
<span class="n">filename</span> <span class="o">=</span> <span class="n">filename</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="k">if</span> <span class="n">dataset_name</span><span class="o">==</span><span class="s1">'wine-q-red'</span> <span class="k">else</span> <span class="n">filename</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span>
|
||||
<span class="n">data_path</span> <span class="o">=</span> <span class="n">join</span><span class="p">(</span><span class="n">data_dir</span><span class="p">,</span> <span class="n">filename</span><span class="p">)</span>
|
||||
<span class="n">download_file_if_not_exists</span><span class="p">(</span><span class="sa">f</span><span class="s1">'</span><span class="si">{</span><span class="n">URL</span><span class="si">}</span><span class="s1">/</span><span class="si">{</span><span class="n">filename</span><span class="si">}</span><span class="s1">'</span><span class="p">,</span> <span class="n">data_path</span><span class="p">)</span>
|
||||
<span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="n">data_path</span><span class="p">,</span> <span class="n">sep</span><span class="o">=</span><span class="s1">';'</span><span class="p">)</span>
|
||||
<span class="n">X</span> <span class="o">=</span> <span class="n">df</span><span class="o">.</span><span class="n">iloc</span><span class="p">[:,</span> <span class="mi">0</span><span class="p">:</span><span class="mi">11</span><span class="p">]</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="nb">float</span><span class="p">)</span><span class="o">.</span><span class="n">values</span>
|
||||
<span class="n">y</span> <span class="o">=</span> <span class="n">df</span><span class="o">.</span><span class="n">iloc</span><span class="p">[:,</span> <span class="mi">11</span><span class="p">]</span><span class="o">.</span><span class="n">values</span> <span class="o">></span> <span class="mi">5</span>
|
||||
|
||||
<span class="k">if</span> <span class="n">identifier</span> <span class="o">==</span> <span class="s1">'yeast'</span><span class="p">:</span>
|
||||
<span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="n">data_path</span><span class="p">,</span> <span class="n">header</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">delim_whitespace</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
|
||||
<span class="n">X</span> <span class="o">=</span> <span class="n">df</span><span class="o">.</span><span class="n">iloc</span><span class="p">[:,</span> <span class="mi">1</span><span class="p">:</span><span class="mi">9</span><span class="p">]</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="nb">float</span><span class="p">)</span><span class="o">.</span><span class="n">values</span>
|
||||
<span class="n">y</span> <span class="o">=</span> <span class="n">df</span><span class="o">.</span><span class="n">iloc</span><span class="p">[:,</span> <span class="mi">9</span><span class="p">]</span><span class="o">.</span><span class="n">values</span>
|
||||
<span class="n">y</span> <span class="o">=</span> <span class="n">binarize</span><span class="p">(</span><span class="n">y</span><span class="p">,</span> <span class="n">pos_class</span><span class="o">=</span><span class="s1">'NUC'</span><span class="p">)</span>
|
||||
|
||||
<span class="n">data</span> <span class="o">=</span> <span class="n">LabelledCollection</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span>
|
||||
<span class="k">if</span> <span class="n">verbose</span><span class="p">:</span>
|
||||
<span class="n">data</span><span class="o">.</span><span class="n">stats</span><span class="p">()</span>
|
||||
<span class="k">return</span> <span class="n">data</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="fetch_UCIMulticlassDataset"><a class="viewcode-back" href="../../../quapy.data.html#quapy.data.datasets.fetch_UCIMulticlassDataset">[docs]</a><span class="k">def</span> <span class="nf">fetch_UCIMulticlassDataset</span><span class="p">(</span><span class="n">dataset_name</span><span class="p">,</span> <span class="n">data_home</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">test_split</span><span class="o">=</span><span class="mf">0.3</span><span class="p">,</span> <span class="n">verbose</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span> <span class="o">-></span> <span class="n">Dataset</span><span class="p">:</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Loads a UCI multiclass dataset as an instance of :class:`quapy.data.base.Dataset`. </span>
|
||||
|
||||
<span class="sd"> The list of available datasets is taken from https://archive.ics.uci.edu/, following these criteria:</span>
|
||||
<span class="sd"> - It has more than 1000 instances</span>
|
||||
<span class="sd"> - It is suited for classification</span>
|
||||
<span class="sd"> - It has more than two classes</span>
|
||||
<span class="sd"> - It is available for Python import (requires ucimlrepo package)</span>
|
||||
|
||||
<span class="sd"> >>> import quapy as qp</span>
|
||||
<span class="sd"> >>> dataset = qp.datasets.fetch_UCIMulticlassDataset("dry-bean")</span>
|
||||
<span class="sd"> >>> train, test = dataset.train_test</span>
|
||||
<span class="sd"> >>> ...</span>
|
||||
|
||||
<span class="sd"> The list of valid dataset names can be accessed in `quapy.data.datasets.UCI_MULTICLASS_DATASETS`</span>
|
||||
|
||||
<span class="sd"> The datasets are downloaded only once and pickled into disk, saving time for consecutive calls.</span>
|
||||
|
||||
<span class="sd"> :param dataset_name: a dataset name</span>
|
||||
<span class="sd"> :param data_home: specify the quapy home directory where collections will be dumped (leave empty to use the default</span>
|
||||
<span class="sd"> ~/quay_data/ directory)</span>
|
||||
<span class="sd"> :param test_split: proportion of documents to be included in the test set. The rest conforms the training set</span>
|
||||
<span class="sd"> :param verbose: set to True (default is False) to get information (stats) about the dataset</span>
|
||||
<span class="sd"> :return: a :class:`quapy.data.base.Dataset` instance</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="n">data</span> <span class="o">=</span> <span class="n">fetch_UCIMulticlassLabelledCollection</span><span class="p">(</span><span class="n">dataset_name</span><span class="p">,</span> <span class="n">data_home</span><span class="p">,</span> <span class="n">verbose</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="n">Dataset</span><span class="p">(</span><span class="o">*</span><span class="n">data</span><span class="o">.</span><span class="n">split_stratified</span><span class="p">(</span><span class="mi">1</span> <span class="o">-</span> <span class="n">test_split</span><span class="p">,</span> <span class="n">random_state</span><span class="o">=</span><span class="mi">0</span><span class="p">))</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="fetch_UCIMulticlassLabelledCollection"><a class="viewcode-back" href="../../../quapy.data.html#quapy.data.datasets.fetch_UCIMulticlassLabelledCollection">[docs]</a><span class="k">def</span> <span class="nf">fetch_UCIMulticlassLabelledCollection</span><span class="p">(</span><span class="n">dataset_name</span><span class="p">,</span> <span class="n">data_home</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">verbose</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span> <span class="o">-></span> <span class="n">LabelledCollection</span><span class="p">:</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Loads a UCI multiclass collection as an instance of :class:`quapy.data.base.LabelledCollection`.</span>
|
||||
|
||||
<span class="sd"> The list of available datasets is taken from https://archive.ics.uci.edu/, following these criteria:</span>
|
||||
<span class="sd"> - It has more than 1000 instances</span>
|
||||
<span class="sd"> - It is suited for classification</span>
|
||||
<span class="sd"> - It has more than two classes</span>
|
||||
<span class="sd"> - It is available for Python import (requires ucimlrepo package)</span>
|
||||
<span class="sd"> </span>
|
||||
<span class="sd"> >>> import quapy as qp</span>
|
||||
<span class="sd"> >>> collection = qp.datasets.fetch_UCIMulticlassLabelledCollection("dry-bean")</span>
|
||||
<span class="sd"> >>> X, y = collection.Xy</span>
|
||||
<span class="sd"> >>> ...</span>
|
||||
|
||||
<span class="sd"> The list of valid dataset names can be accessed in `quapy.data.datasets.UCI_MULTICLASS_DATASETS`</span>
|
||||
|
||||
<span class="sd"> The datasets are downloaded only once and pickled into disk, saving time for consecutive calls.</span>
|
||||
|
||||
<span class="sd"> :param dataset_name: a dataset name</span>
|
||||
<span class="sd"> :param data_home: specify the quapy home directory where the dataset will be dumped (leave empty to use the default</span>
|
||||
<span class="sd"> ~/quay_data/ directory)</span>
|
||||
<span class="sd"> :param test_split: proportion of documents to be included in the test set. The rest conforms the training set</span>
|
||||
<span class="sd"> :param verbose: set to True (default is False) to get information (stats) about the dataset</span>
|
||||
<span class="sd"> :return: a :class:`quapy.data.base.LabelledCollection` instance</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">assert</span> <span class="n">dataset_name</span> <span class="ow">in</span> <span class="n">UCI_MULTICLASS_DATASETS</span><span class="p">,</span> \
|
||||
<span class="sa">f</span><span class="s1">'Name </span><span class="si">{</span><span class="n">dataset_name</span><span class="si">}</span><span class="s1"> does not match any known dataset from the '</span> \
|
||||
<span class="sa">f</span><span class="s1">'UCI Machine Learning datasets repository (multiclass). '</span> \
|
||||
<span class="sa">f</span><span class="s1">'Valid ones are </span><span class="si">{</span><span class="n">UCI_MULTICLASS_DATASETS</span><span class="si">}</span><span class="s1">'</span>
|
||||
|
||||
<span class="k">if</span> <span class="n">data_home</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">:</span>
|
||||
<span class="n">data_home</span> <span class="o">=</span> <span class="n">get_quapy_home</span><span class="p">()</span>
|
||||
|
||||
<span class="n">identifiers</span> <span class="o">=</span> <span class="p">{</span>
|
||||
<span class="s2">"dry-bean"</span><span class="p">:</span> <span class="mi">602</span><span class="p">,</span>
|
||||
<span class="s2">"wine-quality"</span><span class="p">:</span> <span class="mi">186</span><span class="p">,</span>
|
||||
<span class="s2">"academic-success"</span><span class="p">:</span> <span class="mi">697</span><span class="p">,</span>
|
||||
<span class="s2">"digits"</span><span class="p">:</span> <span class="mi">80</span><span class="p">,</span>
|
||||
<span class="s2">"letter"</span><span class="p">:</span> <span class="mi">59</span>
|
||||
<span class="p">}</span>
|
||||
|
||||
<span class="n">full_names</span> <span class="o">=</span> <span class="p">{</span>
|
||||
<span class="s2">"dry-bean"</span><span class="p">:</span> <span class="s2">"Dry Bean Dataset"</span><span class="p">,</span>
|
||||
<span class="s2">"wine-quality"</span><span class="p">:</span> <span class="s2">"Wine Quality"</span><span class="p">,</span>
|
||||
<span class="s2">"academic-success"</span><span class="p">:</span> <span class="s2">"Predict students' dropout and academic success"</span><span class="p">,</span>
|
||||
<span class="s2">"digits"</span><span class="p">:</span> <span class="s2">"Optical Recognition of Handwritten Digits"</span><span class="p">,</span>
|
||||
<span class="s2">"letter"</span><span class="p">:</span> <span class="s2">"Letter Recognition"</span>
|
||||
<span class="p">}</span>
|
||||
|
||||
<span class="n">identifier</span> <span class="o">=</span> <span class="n">identifiers</span><span class="p">[</span><span class="n">dataset_name</span><span class="p">]</span>
|
||||
<span class="n">fullname</span> <span class="o">=</span> <span class="n">full_names</span><span class="p">[</span><span class="n">dataset_name</span><span class="p">]</span>
|
||||
|
||||
<span class="k">if</span> <span class="n">verbose</span><span class="p">:</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s1">'Loading UCI Muticlass </span><span class="si">{</span><span class="n">dataset_name</span><span class="si">}</span><span class="s1"> (</span><span class="si">{</span><span class="n">fullname</span><span class="si">}</span><span class="s1">)'</span><span class="p">)</span>
|
||||
|
||||
<span class="n">file</span> <span class="o">=</span> <span class="n">join</span><span class="p">(</span><span class="n">data_home</span><span class="p">,</span> <span class="s1">'uci_multiclass'</span><span class="p">,</span> <span class="n">dataset_name</span><span class="o">+</span><span class="s1">'.pkl'</span><span class="p">)</span>
|
||||
|
||||
<span class="k">def</span> <span class="nf">download</span><span class="p">(</span><span class="nb">id</span><span class="p">):</span>
|
||||
<span class="n">data</span> <span class="o">=</span> <span class="n">fetch_ucirepo</span><span class="p">(</span><span class="nb">id</span><span class="o">=</span><span class="nb">id</span><span class="p">)</span>
|
||||
<span class="n">X</span><span class="p">,</span> <span class="n">y</span> <span class="o">=</span> <span class="n">data</span><span class="p">[</span><span class="s1">'data'</span><span class="p">][</span><span class="s1">'features'</span><span class="p">]</span><span class="o">.</span><span class="n">to_numpy</span><span class="p">(),</span> <span class="n">data</span><span class="p">[</span><span class="s1">'data'</span><span class="p">][</span><span class="s1">'targets'</span><span class="p">]</span><span class="o">.</span><span class="n">to_numpy</span><span class="p">()</span><span class="o">.</span><span class="n">squeeze</span><span class="p">()</span>
|
||||
<span class="n">classes</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">sort</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">unique</span><span class="p">(</span><span class="n">y</span><span class="p">))</span>
|
||||
<span class="n">y</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">searchsorted</span><span class="p">(</span><span class="n">classes</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="n">LabelledCollection</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span>
|
||||
|
||||
<span class="n">data</span> <span class="o">=</span> <span class="n">pickled_resource</span><span class="p">(</span><span class="n">file</span><span class="p">,</span> <span class="n">download</span><span class="p">,</span> <span class="n">identifier</span><span class="p">)</span>
|
||||
|
||||
<span class="k">if</span> <span class="n">verbose</span><span class="p">:</span>
|
||||
<span class="n">data</span><span class="o">.</span><span class="n">stats</span><span class="p">()</span>
|
||||
|
||||
<span class="k">return</span> <span class="n">data</span></div>
|
||||
|
||||
|
||||
<span class="k">def</span> <span class="nf">_df_replace</span><span class="p">(</span><span class="n">df</span><span class="p">,</span> <span class="n">col</span><span class="p">,</span> <span class="n">repl</span><span class="o">=</span><span class="p">{</span><span class="s1">'yes'</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span> <span class="s1">'no'</span><span class="p">:</span><span class="mi">0</span><span class="p">},</span> <span class="n">astype</span><span class="o">=</span><span class="nb">float</span><span class="p">):</span>
|
||||
<span class="n">df</span><span class="p">[</span><span class="n">col</span><span class="p">]</span> <span class="o">=</span> <span class="n">df</span><span class="p">[</span><span class="n">col</span><span class="p">]</span><span class="o">.</span><span class="n">apply</span><span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span><span class="n">repl</span><span class="p">[</span><span class="n">x</span><span class="p">])</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="n">astype</span><span class="p">,</span> <span class="n">copy</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="fetch_lequa2022"><a class="viewcode-back" href="../../../quapy.data.html#quapy.data.datasets.fetch_lequa2022">[docs]</a><span class="k">def</span> <span class="nf">fetch_lequa2022</span><span class="p">(</span><span class="n">task</span><span class="p">,</span> <span class="n">data_home</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Loads the official datasets provided for the `LeQua <https://lequa2022.github.io/index>`_ competition.</span>
|
||||
<span class="sd"> In brief, there are 4 tasks (T1A, T1B, T2A, T2B) having to do with text quantification</span>
|
||||
<span class="sd"> problems. Tasks T1A and T1B provide documents in vector form, while T2A and T2B provide raw documents instead.</span>
|
||||
<span class="sd"> Tasks T1A and T2A are binary sentiment quantification problems, while T2A and T2B are multiclass quantification</span>
|
||||
<span class="sd"> problems consisting of estimating the class prevalence values of 28 different merchandise products.</span>
|
||||
<span class="sd"> We refer to the `Esuli, A., Moreo, A., Sebastiani, F., & Sperduti, G. (2022).</span>
|
||||
<span class="sd"> A Detailed Overview of LeQua@ CLEF 2022: Learning to Quantify.</span>
|
||||
<span class="sd"> <https://ceur-ws.org/Vol-3180/paper-146.pdf>`_ for a detailed description</span>
|
||||
<span class="sd"> on the tasks and datasets.</span>
|
||||
|
||||
<span class="sd"> The datasets are downloaded only once, and stored for fast reuse.</span>
|
||||
|
||||
<span class="sd"> See `lequa2022_experiments.py` provided in the example folder, that can serve as a guide on how to use these</span>
|
||||
<span class="sd"> datasets.</span>
|
||||
|
||||
|
||||
<span class="sd"> :param task: a string representing the task name; valid ones are T1A, T1B, T2A, and T2B</span>
|
||||
<span class="sd"> :param data_home: specify the quapy home directory where collections will be dumped (leave empty to use the default</span>
|
||||
<span class="sd"> ~/quay_data/ directory)</span>
|
||||
<span class="sd"> :return: a tuple `(train, val_gen, test_gen)` where `train` is an instance of</span>
|
||||
<span class="sd"> :class:`quapy.data.base.LabelledCollection`, `val_gen` and `test_gen` are instances of</span>
|
||||
<span class="sd"> :class:`quapy.data._lequa2022.SamplesFromDir`, a subclass of :class:`quapy.protocol.AbstractProtocol`,</span>
|
||||
<span class="sd"> that return a series of samples stored in a directory which are labelled by prevalence.</span>
|
||||
<span class="sd"> """</span>
|
||||
|
||||
<span class="kn">from</span> <span class="nn">quapy.data._lequa2022</span> <span class="kn">import</span> <span class="n">load_raw_documents</span><span class="p">,</span> <span class="n">load_vector_documents</span><span class="p">,</span> <span class="n">SamplesFromDir</span>
|
||||
|
||||
<span class="k">assert</span> <span class="n">task</span> <span class="ow">in</span> <span class="n">LEQUA2022_TASKS</span><span class="p">,</span> \
|
||||
<span class="sa">f</span><span class="s1">'Unknown task </span><span class="si">{</span><span class="n">task</span><span class="si">}</span><span class="s1">. Valid ones are </span><span class="si">{</span><span class="n">LEQUA2022_TASKS</span><span class="si">}</span><span class="s1">'</span>
|
||||
<span class="k">if</span> <span class="n">data_home</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">:</span>
|
||||
<span class="n">data_home</span> <span class="o">=</span> <span class="n">get_quapy_home</span><span class="p">()</span>
|
||||
|
||||
<span class="n">URL_TRAINDEV</span><span class="o">=</span><span class="sa">f</span><span class="s1">'https://zenodo.org/record/6546188/files/</span><span class="si">{</span><span class="n">task</span><span class="si">}</span><span class="s1">.train_dev.zip'</span>
|
||||
<span class="n">URL_TEST</span><span class="o">=</span><span class="sa">f</span><span class="s1">'https://zenodo.org/record/6546188/files/</span><span class="si">{</span><span class="n">task</span><span class="si">}</span><span class="s1">.test.zip'</span>
|
||||
<span class="n">URL_TEST_PREV</span><span class="o">=</span><span class="sa">f</span><span class="s1">'https://zenodo.org/record/6546188/files/</span><span class="si">{</span><span class="n">task</span><span class="si">}</span><span class="s1">.test_prevalences.zip'</span>
|
||||
|
||||
<span class="n">lequa_dir</span> <span class="o">=</span> <span class="n">join</span><span class="p">(</span><span class="n">data_home</span><span class="p">,</span> <span class="s1">'lequa2022'</span><span class="p">)</span>
|
||||
<span class="n">os</span><span class="o">.</span><span class="n">makedirs</span><span class="p">(</span><span class="n">lequa_dir</span><span class="p">,</span> <span class="n">exist_ok</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
|
||||
|
||||
<span class="k">def</span> <span class="nf">download_unzip_and_remove</span><span class="p">(</span><span class="n">unzipped_path</span><span class="p">,</span> <span class="n">url</span><span class="p">):</span>
|
||||
<span class="n">tmp_path</span> <span class="o">=</span> <span class="n">join</span><span class="p">(</span><span class="n">lequa_dir</span><span class="p">,</span> <span class="n">task</span> <span class="o">+</span> <span class="s1">'_tmp.zip'</span><span class="p">)</span>
|
||||
<span class="n">download_file_if_not_exists</span><span class="p">(</span><span class="n">url</span><span class="p">,</span> <span class="n">tmp_path</span><span class="p">)</span>
|
||||
<span class="k">with</span> <span class="n">zipfile</span><span class="o">.</span><span class="n">ZipFile</span><span class="p">(</span><span class="n">tmp_path</span><span class="p">)</span> <span class="k">as</span> <span class="n">file</span><span class="p">:</span>
|
||||
<span class="n">file</span><span class="o">.</span><span class="n">extractall</span><span class="p">(</span><span class="n">unzipped_path</span><span class="p">)</span>
|
||||
<span class="n">os</span><span class="o">.</span><span class="n">remove</span><span class="p">(</span><span class="n">tmp_path</span><span class="p">)</span>
|
||||
|
||||
<span class="k">if</span> <span class="ow">not</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">exists</span><span class="p">(</span><span class="n">join</span><span class="p">(</span><span class="n">lequa_dir</span><span class="p">,</span> <span class="n">task</span><span class="p">)):</span>
|
||||
<span class="n">download_unzip_and_remove</span><span class="p">(</span><span class="n">lequa_dir</span><span class="p">,</span> <span class="n">URL_TRAINDEV</span><span class="p">)</span>
|
||||
<span class="n">download_unzip_and_remove</span><span class="p">(</span><span class="n">lequa_dir</span><span class="p">,</span> <span class="n">URL_TEST</span><span class="p">)</span>
|
||||
<span class="n">download_unzip_and_remove</span><span class="p">(</span><span class="n">lequa_dir</span><span class="p">,</span> <span class="n">URL_TEST_PREV</span><span class="p">)</span>
|
||||
|
||||
<span class="k">if</span> <span class="n">task</span> <span class="ow">in</span> <span class="p">[</span><span class="s1">'T1A'</span><span class="p">,</span> <span class="s1">'T1B'</span><span class="p">]:</span>
|
||||
<span class="n">load_fn</span> <span class="o">=</span> <span class="n">load_vector_documents</span>
|
||||
<span class="k">elif</span> <span class="n">task</span> <span class="ow">in</span> <span class="p">[</span><span class="s1">'T2A'</span><span class="p">,</span> <span class="s1">'T2B'</span><span class="p">]:</span>
|
||||
<span class="n">load_fn</span> <span class="o">=</span> <span class="n">load_raw_documents</span>
|
||||
|
||||
<span class="n">tr_path</span> <span class="o">=</span> <span class="n">join</span><span class="p">(</span><span class="n">lequa_dir</span><span class="p">,</span> <span class="n">task</span><span class="p">,</span> <span class="s1">'public'</span><span class="p">,</span> <span class="s1">'training_data.txt'</span><span class="p">)</span>
|
||||
<span class="n">train</span> <span class="o">=</span> <span class="n">LabelledCollection</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="n">tr_path</span><span class="p">,</span> <span class="n">loader_func</span><span class="o">=</span><span class="n">load_fn</span><span class="p">)</span>
|
||||
|
||||
<span class="n">val_samples_path</span> <span class="o">=</span> <span class="n">join</span><span class="p">(</span><span class="n">lequa_dir</span><span class="p">,</span> <span class="n">task</span><span class="p">,</span> <span class="s1">'public'</span><span class="p">,</span> <span class="s1">'dev_samples'</span><span class="p">)</span>
|
||||
<span class="n">val_true_prev_path</span> <span class="o">=</span> <span class="n">join</span><span class="p">(</span><span class="n">lequa_dir</span><span class="p">,</span> <span class="n">task</span><span class="p">,</span> <span class="s1">'public'</span><span class="p">,</span> <span class="s1">'dev_prevalences.txt'</span><span class="p">)</span>
|
||||
<span class="n">val_gen</span> <span class="o">=</span> <span class="n">SamplesFromDir</span><span class="p">(</span><span class="n">val_samples_path</span><span class="p">,</span> <span class="n">val_true_prev_path</span><span class="p">,</span> <span class="n">load_fn</span><span class="o">=</span><span class="n">load_fn</span><span class="p">)</span>
|
||||
|
||||
<span class="n">test_samples_path</span> <span class="o">=</span> <span class="n">join</span><span class="p">(</span><span class="n">lequa_dir</span><span class="p">,</span> <span class="n">task</span><span class="p">,</span> <span class="s1">'public'</span><span class="p">,</span> <span class="s1">'test_samples'</span><span class="p">)</span>
|
||||
<span class="n">test_true_prev_path</span> <span class="o">=</span> <span class="n">join</span><span class="p">(</span><span class="n">lequa_dir</span><span class="p">,</span> <span class="n">task</span><span class="p">,</span> <span class="s1">'public'</span><span class="p">,</span> <span class="s1">'test_prevalences.txt'</span><span class="p">)</span>
|
||||
<span class="n">test_gen</span> <span class="o">=</span> <span class="n">SamplesFromDir</span><span class="p">(</span><span class="n">test_samples_path</span><span class="p">,</span> <span class="n">test_true_prev_path</span><span class="p">,</span> <span class="n">load_fn</span><span class="o">=</span><span class="n">load_fn</span><span class="p">)</span>
|
||||
|
||||
<span class="k">return</span> <span class="n">train</span><span class="p">,</span> <span class="n">val_gen</span><span class="p">,</span> <span class="n">test_gen</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="fetch_IFCB"><a class="viewcode-back" href="../../../quapy.data.html#quapy.data.datasets.fetch_IFCB">[docs]</a><span class="k">def</span> <span class="nf">fetch_IFCB</span><span class="p">(</span><span class="n">single_sample_train</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">for_model_selection</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span> <span class="n">data_home</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Loads the IFCB dataset for quantification from `Zenodo <https://zenodo.org/records/10036244>`_ (for more</span>
|
||||
<span class="sd"> information on this dataset, please follow the zenodo link).</span>
|
||||
<span class="sd"> This dataset is based on the data available publicly at</span>
|
||||
<span class="sd"> `WHOI-Plankton repo <https://github.com/hsosik/WHOI-Plankton>`_.</span>
|
||||
<span class="sd"> The scripts for the processing are available at `P. González's repo <https://github.com/pglez82/IFCB_Zenodo>`_.</span>
|
||||
<span class="sd"> Basically, this is the IFCB dataset with precomputed features for testing quantification algorithms.</span>
|
||||
|
||||
<span class="sd"> The datasets are downloaded only once, and stored for fast reuse.</span>
|
||||
|
||||
<span class="sd"> :param single_sample_train: a boolean. If true, it will return the train dataset as a</span>
|
||||
<span class="sd"> :class:`quapy.data.base.LabelledCollection` (all examples together).</span>
|
||||
<span class="sd"> If false, a generator of training samples will be returned. Each example in the training set has an individual label.</span>
|
||||
<span class="sd"> :param for_model_selection: if True, then returns a split 30% of the training set (86 out of 286 samples) to be used for model selection; </span>
|
||||
<span class="sd"> if False, then returns the full training set as training set and the test set as the test set</span>
|
||||
<span class="sd"> :param data_home: specify the quapy home directory where collections will be dumped (leave empty to use the default</span>
|
||||
<span class="sd"> ~/quay_data/ directory)</span>
|
||||
<span class="sd"> :return: a tuple `(train, test_gen)` where `train` is an instance of</span>
|
||||
<span class="sd"> :class:`quapy.data.base.LabelledCollection`, if `single_sample_train` is true or</span>
|
||||
<span class="sd"> :class:`quapy.data._ifcb.IFCBTrainSamplesFromDir`, i.e. a sampling protocol that returns a series of samples</span>
|
||||
<span class="sd"> labelled example by example. test_gen will be a :class:`quapy.data._ifcb.IFCBTestSamples`, </span>
|
||||
<span class="sd"> i.e., a sampling protocol that returns a series of samples labelled by prevalence.</span>
|
||||
<span class="sd"> """</span>
|
||||
|
||||
<span class="kn">from</span> <span class="nn">quapy.data._ifcb</span> <span class="kn">import</span> <span class="n">IFCBTrainSamplesFromDir</span><span class="p">,</span> <span class="n">IFCBTestSamples</span><span class="p">,</span> <span class="n">get_sample_list</span><span class="p">,</span> <span class="n">generate_modelselection_split</span>
|
||||
|
||||
<span class="k">if</span> <span class="n">data_home</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">:</span>
|
||||
<span class="n">data_home</span> <span class="o">=</span> <span class="n">get_quapy_home</span><span class="p">()</span>
|
||||
|
||||
<span class="n">URL_TRAIN</span><span class="o">=</span><span class="sa">f</span><span class="s1">'https://zenodo.org/records/10036244/files/IFCB.train.zip'</span>
|
||||
<span class="n">URL_TEST</span><span class="o">=</span><span class="sa">f</span><span class="s1">'https://zenodo.org/records/10036244/files/IFCB.test.zip'</span>
|
||||
<span class="n">URL_TEST_PREV</span><span class="o">=</span><span class="sa">f</span><span class="s1">'https://zenodo.org/records/10036244/files/IFCB.test_prevalences.zip'</span>
|
||||
|
||||
<span class="n">ifcb_dir</span> <span class="o">=</span> <span class="n">join</span><span class="p">(</span><span class="n">data_home</span><span class="p">,</span> <span class="s1">'ifcb'</span><span class="p">)</span>
|
||||
<span class="n">os</span><span class="o">.</span><span class="n">makedirs</span><span class="p">(</span><span class="n">ifcb_dir</span><span class="p">,</span> <span class="n">exist_ok</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
|
||||
|
||||
<span class="k">def</span> <span class="nf">download_unzip_and_remove</span><span class="p">(</span><span class="n">unzipped_path</span><span class="p">,</span> <span class="n">url</span><span class="p">):</span>
|
||||
<span class="n">tmp_path</span> <span class="o">=</span> <span class="n">join</span><span class="p">(</span><span class="n">ifcb_dir</span><span class="p">,</span> <span class="s1">'ifcb_tmp.zip'</span><span class="p">)</span>
|
||||
<span class="n">download_file_if_not_exists</span><span class="p">(</span><span class="n">url</span><span class="p">,</span> <span class="n">tmp_path</span><span class="p">)</span>
|
||||
<span class="k">with</span> <span class="n">zipfile</span><span class="o">.</span><span class="n">ZipFile</span><span class="p">(</span><span class="n">tmp_path</span><span class="p">)</span> <span class="k">as</span> <span class="n">file</span><span class="p">:</span>
|
||||
<span class="n">file</span><span class="o">.</span><span class="n">extractall</span><span class="p">(</span><span class="n">unzipped_path</span><span class="p">)</span>
|
||||
<span class="n">os</span><span class="o">.</span><span class="n">remove</span><span class="p">(</span><span class="n">tmp_path</span><span class="p">)</span>
|
||||
|
||||
<span class="k">if</span> <span class="ow">not</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">exists</span><span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">ifcb_dir</span><span class="p">,</span><span class="s1">'train'</span><span class="p">)):</span>
|
||||
<span class="n">download_unzip_and_remove</span><span class="p">(</span><span class="n">ifcb_dir</span><span class="p">,</span> <span class="n">URL_TRAIN</span><span class="p">)</span>
|
||||
<span class="k">if</span> <span class="ow">not</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">exists</span><span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">ifcb_dir</span><span class="p">,</span><span class="s1">'test'</span><span class="p">)):</span>
|
||||
<span class="n">download_unzip_and_remove</span><span class="p">(</span><span class="n">ifcb_dir</span><span class="p">,</span> <span class="n">URL_TEST</span><span class="p">)</span>
|
||||
<span class="k">if</span> <span class="ow">not</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">exists</span><span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">ifcb_dir</span><span class="p">,</span><span class="s1">'test_prevalences.csv'</span><span class="p">)):</span>
|
||||
<span class="n">download_unzip_and_remove</span><span class="p">(</span><span class="n">ifcb_dir</span><span class="p">,</span> <span class="n">URL_TEST_PREV</span><span class="p">)</span>
|
||||
|
||||
<span class="c1"># Load test prevalences and classes</span>
|
||||
<span class="n">test_true_prev_path</span> <span class="o">=</span> <span class="n">join</span><span class="p">(</span><span class="n">ifcb_dir</span><span class="p">,</span> <span class="s1">'test_prevalences.csv'</span><span class="p">)</span>
|
||||
<span class="n">test_true_prev</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="n">test_true_prev_path</span><span class="p">)</span>
|
||||
<span class="n">classes</span> <span class="o">=</span> <span class="n">test_true_prev</span><span class="o">.</span><span class="n">columns</span><span class="p">[</span><span class="mi">1</span><span class="p">:]</span>
|
||||
|
||||
<span class="c1">#Load train and test samples</span>
|
||||
<span class="n">train_samples_path</span> <span class="o">=</span> <span class="n">join</span><span class="p">(</span><span class="n">ifcb_dir</span><span class="p">,</span><span class="s1">'train'</span><span class="p">)</span>
|
||||
<span class="n">test_samples_path</span> <span class="o">=</span> <span class="n">join</span><span class="p">(</span><span class="n">ifcb_dir</span><span class="p">,</span><span class="s1">'test'</span><span class="p">)</span>
|
||||
|
||||
<span class="k">if</span> <span class="n">for_model_selection</span><span class="p">:</span>
|
||||
<span class="c1"># In this case, return 70% of training data as the training set and 30% as the test set</span>
|
||||
<span class="n">samples</span> <span class="o">=</span> <span class="n">get_sample_list</span><span class="p">(</span><span class="n">train_samples_path</span><span class="p">)</span>
|
||||
<span class="n">train</span><span class="p">,</span> <span class="n">test</span> <span class="o">=</span> <span class="n">generate_modelselection_split</span><span class="p">(</span><span class="n">samples</span><span class="p">,</span> <span class="n">split</span><span class="o">=</span><span class="mf">0.3</span><span class="p">)</span>
|
||||
<span class="n">train_gen</span> <span class="o">=</span> <span class="n">IFCBTrainSamplesFromDir</span><span class="p">(</span><span class="n">path_dir</span><span class="o">=</span><span class="n">train_samples_path</span><span class="p">,</span> <span class="n">classes</span><span class="o">=</span><span class="n">classes</span><span class="p">,</span> <span class="n">samples</span><span class="o">=</span><span class="n">train</span><span class="p">)</span>
|
||||
|
||||
<span class="c1"># Test prevalence is computed from class labels</span>
|
||||
<span class="n">test_gen</span> <span class="o">=</span> <span class="n">IFCBTestSamples</span><span class="p">(</span><span class="n">path_dir</span><span class="o">=</span><span class="n">train_samples_path</span><span class="p">,</span> <span class="n">test_prevalences</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">samples</span><span class="o">=</span><span class="n">test</span><span class="p">,</span> <span class="n">classes</span><span class="o">=</span><span class="n">classes</span><span class="p">)</span>
|
||||
<span class="k">else</span><span class="p">:</span>
|
||||
<span class="c1"># In this case, we use all training samples as the training set and the test samples as the test set</span>
|
||||
<span class="n">train_gen</span> <span class="o">=</span> <span class="n">IFCBTrainSamplesFromDir</span><span class="p">(</span><span class="n">path_dir</span><span class="o">=</span><span class="n">train_samples_path</span><span class="p">,</span> <span class="n">classes</span><span class="o">=</span><span class="n">classes</span><span class="p">)</span>
|
||||
<span class="n">test_gen</span> <span class="o">=</span> <span class="n">IFCBTestSamples</span><span class="p">(</span><span class="n">path_dir</span><span class="o">=</span><span class="n">test_samples_path</span><span class="p">,</span> <span class="n">test_prevalences</span><span class="o">=</span><span class="n">test_true_prev</span><span class="p">)</span>
|
||||
|
||||
<span class="c1"># In the case the user wants it, join all the train samples in one LabelledCollection</span>
|
||||
<span class="k">if</span> <span class="n">single_sample_train</span><span class="p">:</span>
|
||||
<span class="n">train</span> <span class="o">=</span> <span class="n">LabelledCollection</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="o">*</span><span class="p">[</span><span class="n">lc</span> <span class="k">for</span> <span class="n">lc</span> <span class="ow">in</span> <span class="n">train_gen</span><span class="p">()])</span>
|
||||
<span class="k">return</span> <span class="n">train</span><span class="p">,</span> <span class="n">test_gen</span>
|
||||
<span class="k">else</span><span class="p">:</span>
|
||||
<span class="k">return</span> <span class="n">train_gen</span><span class="p">,</span> <span class="n">test_gen</span></div>
|
||||
</pre></div>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
<footer>
|
||||
|
||||
<hr/>
|
||||
|
||||
<div role="contentinfo">
|
||||
<p>© Copyright 2024, Alejandro Moreo.</p>
|
||||
</div>
|
||||
|
||||
Built with <a href="https://www.sphinx-doc.org/">Sphinx</a> using a
|
||||
<a href="https://github.com/readthedocs/sphinx_rtd_theme">theme</a>
|
||||
provided by <a href="https://readthedocs.org">Read the Docs</a>.
|
||||
|
||||
|
||||
</footer>
|
||||
</div>
|
||||
</div>
|
||||
</section>
|
||||
</div>
|
||||
<script>
|
||||
jQuery(function () {
|
||||
SphinxRtdTheme.Navigation.enable(true);
|
||||
});
|
||||
</script>
|
||||
|
||||
</body>
|
||||
</html>
|
||||
|
|
@ -0,0 +1,373 @@
|
|||
<!DOCTYPE html>
|
||||
<html class="writer-html5" lang="en" data-content_root="../../../">
|
||||
<head>
|
||||
<meta charset="utf-8" />
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
|
||||
<title>quapy.data.preprocessing — QuaPy: A Python-based open-source framework for quantification 0.1.8 documentation</title>
|
||||
<link rel="stylesheet" type="text/css" href="../../../_static/pygments.css?v=92fd9be5" />
|
||||
<link rel="stylesheet" type="text/css" href="../../../_static/css/theme.css?v=19f00094" />
|
||||
|
||||
|
||||
<!--[if lt IE 9]>
|
||||
<script src="../../../_static/js/html5shiv.min.js"></script>
|
||||
<![endif]-->
|
||||
|
||||
<script src="../../../_static/jquery.js?v=5d32c60e"></script>
|
||||
<script src="../../../_static/_sphinx_javascript_frameworks_compat.js?v=2cd50e6c"></script>
|
||||
<script src="../../../_static/documentation_options.js?v=22607128"></script>
|
||||
<script src="../../../_static/doctools.js?v=9a2dae69"></script>
|
||||
<script src="../../../_static/sphinx_highlight.js?v=dc90522c"></script>
|
||||
<script src="../../../_static/js/theme.js"></script>
|
||||
<link rel="index" title="Index" href="../../../genindex.html" />
|
||||
<link rel="search" title="Search" href="../../../search.html" />
|
||||
</head>
|
||||
|
||||
<body class="wy-body-for-nav">
|
||||
<div class="wy-grid-for-nav">
|
||||
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
|
||||
<div class="wy-side-scroll">
|
||||
<div class="wy-side-nav-search" >
|
||||
|
||||
|
||||
|
||||
<a href="../../../index.html" class="icon icon-home">
|
||||
QuaPy: A Python-based open-source framework for quantification
|
||||
</a>
|
||||
<div role="search">
|
||||
<form id="rtd-search-form" class="wy-form" action="../../../search.html" method="get">
|
||||
<input type="text" name="q" placeholder="Search docs" aria-label="Search docs" />
|
||||
<input type="hidden" name="check_keywords" value="yes" />
|
||||
<input type="hidden" name="area" value="default" />
|
||||
</form>
|
||||
</div>
|
||||
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../../../modules.html">quapy</a></li>
|
||||
</ul>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
</nav>
|
||||
|
||||
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" >
|
||||
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
|
||||
<a href="../../../index.html">QuaPy: A Python-based open-source framework for quantification</a>
|
||||
</nav>
|
||||
|
||||
<div class="wy-nav-content">
|
||||
<div class="rst-content">
|
||||
<div role="navigation" aria-label="Page navigation">
|
||||
<ul class="wy-breadcrumbs">
|
||||
<li><a href="../../../index.html" class="icon icon-home" aria-label="Home"></a></li>
|
||||
<li class="breadcrumb-item"><a href="../../index.html">Module code</a></li>
|
||||
<li class="breadcrumb-item active">quapy.data.preprocessing</li>
|
||||
<li class="wy-breadcrumbs-aside">
|
||||
</li>
|
||||
</ul>
|
||||
<hr/>
|
||||
</div>
|
||||
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
|
||||
<div itemprop="articleBody">
|
||||
|
||||
<h1>Source code for quapy.data.preprocessing</h1><div class="highlight"><pre>
|
||||
<span></span><span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span>
|
||||
<span class="kn">from</span> <span class="nn">scipy.sparse</span> <span class="kn">import</span> <span class="n">spmatrix</span>
|
||||
<span class="kn">from</span> <span class="nn">sklearn.feature_extraction.text</span> <span class="kn">import</span> <span class="n">TfidfVectorizer</span><span class="p">,</span> <span class="n">CountVectorizer</span>
|
||||
<span class="kn">from</span> <span class="nn">sklearn.preprocessing</span> <span class="kn">import</span> <span class="n">StandardScaler</span>
|
||||
<span class="kn">from</span> <span class="nn">tqdm</span> <span class="kn">import</span> <span class="n">tqdm</span>
|
||||
|
||||
<span class="kn">import</span> <span class="nn">quapy</span> <span class="k">as</span> <span class="nn">qp</span>
|
||||
<span class="kn">from</span> <span class="nn">quapy.data.base</span> <span class="kn">import</span> <span class="n">Dataset</span>
|
||||
<span class="kn">from</span> <span class="nn">quapy.util</span> <span class="kn">import</span> <span class="n">map_parallel</span>
|
||||
<span class="kn">from</span> <span class="nn">.base</span> <span class="kn">import</span> <span class="n">LabelledCollection</span>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="text2tfidf">
|
||||
<a class="viewcode-back" href="../../../quapy.data.html#quapy.data.preprocessing.text2tfidf">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">text2tfidf</span><span class="p">(</span><span class="n">dataset</span><span class="p">:</span><span class="n">Dataset</span><span class="p">,</span> <span class="n">min_df</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span> <span class="n">sublinear_tf</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">inplace</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Transforms a :class:`quapy.data.base.Dataset` of textual instances into a :class:`quapy.data.base.Dataset` of</span>
|
||||
<span class="sd"> tfidf weighted sparse vectors</span>
|
||||
|
||||
<span class="sd"> :param dataset: a :class:`quapy.data.base.Dataset` where the instances of training and test collections are</span>
|
||||
<span class="sd"> lists of str</span>
|
||||
<span class="sd"> :param min_df: minimum number of occurrences for a word to be considered as part of the vocabulary (default 3)</span>
|
||||
<span class="sd"> :param sublinear_tf: whether or not to apply the log scalling to the tf counters (default True)</span>
|
||||
<span class="sd"> :param inplace: whether or not to apply the transformation inplace (True), or to a new copy (False, default)</span>
|
||||
<span class="sd"> :param kwargs: the rest of parameters of the transformation (as for sklearn's</span>
|
||||
<span class="sd"> `TfidfVectorizer <https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html>`_)</span>
|
||||
<span class="sd"> :return: a new :class:`quapy.data.base.Dataset` in `csr_matrix` format (if inplace=False) or a reference to the</span>
|
||||
<span class="sd"> current Dataset (if inplace=True) where the instances are stored in a `csr_matrix` of real-valued tfidf scores</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="n">__check_type</span><span class="p">(</span><span class="n">dataset</span><span class="o">.</span><span class="n">training</span><span class="o">.</span><span class="n">instances</span><span class="p">,</span> <span class="n">np</span><span class="o">.</span><span class="n">ndarray</span><span class="p">,</span> <span class="nb">str</span><span class="p">)</span>
|
||||
<span class="n">__check_type</span><span class="p">(</span><span class="n">dataset</span><span class="o">.</span><span class="n">test</span><span class="o">.</span><span class="n">instances</span><span class="p">,</span> <span class="n">np</span><span class="o">.</span><span class="n">ndarray</span><span class="p">,</span> <span class="nb">str</span><span class="p">)</span>
|
||||
|
||||
<span class="n">vectorizer</span> <span class="o">=</span> <span class="n">TfidfVectorizer</span><span class="p">(</span><span class="n">min_df</span><span class="o">=</span><span class="n">min_df</span><span class="p">,</span> <span class="n">sublinear_tf</span><span class="o">=</span><span class="n">sublinear_tf</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">)</span>
|
||||
<span class="n">training_documents</span> <span class="o">=</span> <span class="n">vectorizer</span><span class="o">.</span><span class="n">fit_transform</span><span class="p">(</span><span class="n">dataset</span><span class="o">.</span><span class="n">training</span><span class="o">.</span><span class="n">instances</span><span class="p">)</span>
|
||||
<span class="n">test_documents</span> <span class="o">=</span> <span class="n">vectorizer</span><span class="o">.</span><span class="n">transform</span><span class="p">(</span><span class="n">dataset</span><span class="o">.</span><span class="n">test</span><span class="o">.</span><span class="n">instances</span><span class="p">)</span>
|
||||
|
||||
<span class="k">if</span> <span class="n">inplace</span><span class="p">:</span>
|
||||
<span class="n">dataset</span><span class="o">.</span><span class="n">training</span> <span class="o">=</span> <span class="n">LabelledCollection</span><span class="p">(</span><span class="n">training_documents</span><span class="p">,</span> <span class="n">dataset</span><span class="o">.</span><span class="n">training</span><span class="o">.</span><span class="n">labels</span><span class="p">,</span> <span class="n">dataset</span><span class="o">.</span><span class="n">classes_</span><span class="p">)</span>
|
||||
<span class="n">dataset</span><span class="o">.</span><span class="n">test</span> <span class="o">=</span> <span class="n">LabelledCollection</span><span class="p">(</span><span class="n">test_documents</span><span class="p">,</span> <span class="n">dataset</span><span class="o">.</span><span class="n">test</span><span class="o">.</span><span class="n">labels</span><span class="p">,</span> <span class="n">dataset</span><span class="o">.</span><span class="n">classes_</span><span class="p">)</span>
|
||||
<span class="n">dataset</span><span class="o">.</span><span class="n">vocabulary</span> <span class="o">=</span> <span class="n">vectorizer</span><span class="o">.</span><span class="n">vocabulary_</span>
|
||||
<span class="k">return</span> <span class="n">dataset</span>
|
||||
<span class="k">else</span><span class="p">:</span>
|
||||
<span class="n">training</span> <span class="o">=</span> <span class="n">LabelledCollection</span><span class="p">(</span><span class="n">training_documents</span><span class="p">,</span> <span class="n">dataset</span><span class="o">.</span><span class="n">training</span><span class="o">.</span><span class="n">labels</span><span class="o">.</span><span class="n">copy</span><span class="p">(),</span> <span class="n">dataset</span><span class="o">.</span><span class="n">classes_</span><span class="p">)</span>
|
||||
<span class="n">test</span> <span class="o">=</span> <span class="n">LabelledCollection</span><span class="p">(</span><span class="n">test_documents</span><span class="p">,</span> <span class="n">dataset</span><span class="o">.</span><span class="n">test</span><span class="o">.</span><span class="n">labels</span><span class="o">.</span><span class="n">copy</span><span class="p">(),</span> <span class="n">dataset</span><span class="o">.</span><span class="n">classes_</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="n">Dataset</span><span class="p">(</span><span class="n">training</span><span class="p">,</span> <span class="n">test</span><span class="p">,</span> <span class="n">vectorizer</span><span class="o">.</span><span class="n">vocabulary_</span><span class="p">)</span></div>
|
||||
|
||||
|
||||
|
||||
<div class="viewcode-block" id="reduce_columns">
|
||||
<a class="viewcode-back" href="../../../quapy.data.html#quapy.data.preprocessing.reduce_columns">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">reduce_columns</span><span class="p">(</span><span class="n">dataset</span><span class="p">:</span> <span class="n">Dataset</span><span class="p">,</span> <span class="n">min_df</span><span class="o">=</span><span class="mi">5</span><span class="p">,</span> <span class="n">inplace</span><span class="o">=</span><span class="kc">False</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Reduces the dimensionality of the instances, represented as a `csr_matrix` (or any subtype of</span>
|
||||
<span class="sd"> `scipy.sparse.spmatrix`), of training and test documents by removing the columns of words which are not present</span>
|
||||
<span class="sd"> in at least `min_df` instances in the training set</span>
|
||||
|
||||
<span class="sd"> :param dataset: a :class:`quapy.data.base.Dataset` in which instances are represented in sparse format (any</span>
|
||||
<span class="sd"> subtype of scipy.sparse.spmatrix)</span>
|
||||
<span class="sd"> :param min_df: integer, minimum number of instances below which the columns are removed</span>
|
||||
<span class="sd"> :param inplace: whether or not to apply the transformation inplace (True), or to a new copy (False, default)</span>
|
||||
<span class="sd"> :return: a new :class:`quapy.data.base.Dataset` (if inplace=False) or a reference to the current</span>
|
||||
<span class="sd"> :class:`quapy.data.base.Dataset` (inplace=True) where the dimensions corresponding to infrequent terms</span>
|
||||
<span class="sd"> in the training set have been removed</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="n">__check_type</span><span class="p">(</span><span class="n">dataset</span><span class="o">.</span><span class="n">training</span><span class="o">.</span><span class="n">instances</span><span class="p">,</span> <span class="n">spmatrix</span><span class="p">)</span>
|
||||
<span class="n">__check_type</span><span class="p">(</span><span class="n">dataset</span><span class="o">.</span><span class="n">test</span><span class="o">.</span><span class="n">instances</span><span class="p">,</span> <span class="n">spmatrix</span><span class="p">)</span>
|
||||
<span class="k">assert</span> <span class="n">dataset</span><span class="o">.</span><span class="n">training</span><span class="o">.</span><span class="n">instances</span><span class="o">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">==</span> <span class="n">dataset</span><span class="o">.</span><span class="n">test</span><span class="o">.</span><span class="n">instances</span><span class="o">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="s1">'unaligned vector spaces'</span>
|
||||
|
||||
<span class="k">def</span> <span class="nf">filter_by_occurrences</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">W</span><span class="p">):</span>
|
||||
<span class="n">column_prevalence</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">asarray</span><span class="p">((</span><span class="n">X</span> <span class="o">></span> <span class="mi">0</span><span class="p">)</span><span class="o">.</span><span class="n">sum</span><span class="p">(</span><span class="n">axis</span><span class="o">=</span><span class="mi">0</span><span class="p">))</span><span class="o">.</span><span class="n">flatten</span><span class="p">()</span>
|
||||
<span class="n">take_columns</span> <span class="o">=</span> <span class="n">column_prevalence</span> <span class="o">>=</span> <span class="n">min_df</span>
|
||||
<span class="n">X</span> <span class="o">=</span> <span class="n">X</span><span class="p">[:,</span> <span class="n">take_columns</span><span class="p">]</span>
|
||||
<span class="n">W</span> <span class="o">=</span> <span class="n">W</span><span class="p">[:,</span> <span class="n">take_columns</span><span class="p">]</span>
|
||||
<span class="k">return</span> <span class="n">X</span><span class="p">,</span> <span class="n">W</span>
|
||||
|
||||
<span class="n">Xtr</span><span class="p">,</span> <span class="n">Xte</span> <span class="o">=</span> <span class="n">filter_by_occurrences</span><span class="p">(</span><span class="n">dataset</span><span class="o">.</span><span class="n">training</span><span class="o">.</span><span class="n">instances</span><span class="p">,</span> <span class="n">dataset</span><span class="o">.</span><span class="n">test</span><span class="o">.</span><span class="n">instances</span><span class="p">)</span>
|
||||
<span class="k">if</span> <span class="n">inplace</span><span class="p">:</span>
|
||||
<span class="n">dataset</span><span class="o">.</span><span class="n">training</span><span class="o">.</span><span class="n">instances</span> <span class="o">=</span> <span class="n">Xtr</span>
|
||||
<span class="n">dataset</span><span class="o">.</span><span class="n">test</span><span class="o">.</span><span class="n">instances</span> <span class="o">=</span> <span class="n">Xte</span>
|
||||
<span class="k">return</span> <span class="n">dataset</span>
|
||||
<span class="k">else</span><span class="p">:</span>
|
||||
<span class="n">training</span> <span class="o">=</span> <span class="n">LabelledCollection</span><span class="p">(</span><span class="n">Xtr</span><span class="p">,</span> <span class="n">dataset</span><span class="o">.</span><span class="n">training</span><span class="o">.</span><span class="n">labels</span><span class="o">.</span><span class="n">copy</span><span class="p">(),</span> <span class="n">dataset</span><span class="o">.</span><span class="n">classes_</span><span class="p">)</span>
|
||||
<span class="n">test</span> <span class="o">=</span> <span class="n">LabelledCollection</span><span class="p">(</span><span class="n">Xte</span><span class="p">,</span> <span class="n">dataset</span><span class="o">.</span><span class="n">test</span><span class="o">.</span><span class="n">labels</span><span class="o">.</span><span class="n">copy</span><span class="p">(),</span> <span class="n">dataset</span><span class="o">.</span><span class="n">classes_</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="n">Dataset</span><span class="p">(</span><span class="n">training</span><span class="p">,</span> <span class="n">test</span><span class="p">)</span></div>
|
||||
|
||||
|
||||
|
||||
<div class="viewcode-block" id="standardize">
|
||||
<a class="viewcode-back" href="../../../quapy.data.html#quapy.data.preprocessing.standardize">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">standardize</span><span class="p">(</span><span class="n">dataset</span><span class="p">:</span> <span class="n">Dataset</span><span class="p">,</span> <span class="n">inplace</span><span class="o">=</span><span class="kc">False</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Standardizes the real-valued columns of a :class:`quapy.data.base.Dataset`.</span>
|
||||
<span class="sd"> Standardization, aka z-scoring, of a variable `X` comes down to subtracting the average and normalizing by the</span>
|
||||
<span class="sd"> standard deviation.</span>
|
||||
|
||||
<span class="sd"> :param dataset: a :class:`quapy.data.base.Dataset` object</span>
|
||||
<span class="sd"> :param inplace: set to True if the transformation is to be applied inplace, or to False (default) if a new</span>
|
||||
<span class="sd"> :class:`quapy.data.base.Dataset` is to be returned</span>
|
||||
<span class="sd"> :return: an instance of :class:`quapy.data.base.Dataset`</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="n">s</span> <span class="o">=</span> <span class="n">StandardScaler</span><span class="p">(</span><span class="n">copy</span><span class="o">=</span><span class="ow">not</span> <span class="n">inplace</span><span class="p">)</span>
|
||||
<span class="n">training</span> <span class="o">=</span> <span class="n">s</span><span class="o">.</span><span class="n">fit_transform</span><span class="p">(</span><span class="n">dataset</span><span class="o">.</span><span class="n">training</span><span class="o">.</span><span class="n">instances</span><span class="p">)</span>
|
||||
<span class="n">test</span> <span class="o">=</span> <span class="n">s</span><span class="o">.</span><span class="n">transform</span><span class="p">(</span><span class="n">dataset</span><span class="o">.</span><span class="n">test</span><span class="o">.</span><span class="n">instances</span><span class="p">)</span>
|
||||
<span class="k">if</span> <span class="n">inplace</span><span class="p">:</span>
|
||||
<span class="k">return</span> <span class="n">dataset</span>
|
||||
<span class="k">else</span><span class="p">:</span>
|
||||
<span class="k">return</span> <span class="n">Dataset</span><span class="p">(</span><span class="n">training</span><span class="p">,</span> <span class="n">test</span><span class="p">,</span> <span class="n">dataset</span><span class="o">.</span><span class="n">vocabulary</span><span class="p">,</span> <span class="n">dataset</span><span class="o">.</span><span class="n">name</span><span class="p">)</span></div>
|
||||
|
||||
|
||||
|
||||
<div class="viewcode-block" id="index">
|
||||
<a class="viewcode-back" href="../../../quapy.data.html#quapy.data.preprocessing.index">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">index</span><span class="p">(</span><span class="n">dataset</span><span class="p">:</span> <span class="n">Dataset</span><span class="p">,</span> <span class="n">min_df</span><span class="o">=</span><span class="mi">5</span><span class="p">,</span> <span class="n">inplace</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Indexes the tokens of a textual :class:`quapy.data.base.Dataset` of string documents.</span>
|
||||
<span class="sd"> To index a document means to replace each different token by a unique numerical index.</span>
|
||||
<span class="sd"> Rare words (i.e., words occurring less than `min_df` times) are replaced by a special token `UNK`</span>
|
||||
|
||||
<span class="sd"> :param dataset: a :class:`quapy.data.base.Dataset` object where the instances of training and test documents</span>
|
||||
<span class="sd"> are lists of str</span>
|
||||
<span class="sd"> :param min_df: minimum number of occurrences below which the term is replaced by a `UNK` index</span>
|
||||
<span class="sd"> :param inplace: whether or not to apply the transformation inplace (True), or to a new copy (False, default)</span>
|
||||
<span class="sd"> :param kwargs: the rest of parameters of the transformation (as for sklearn's</span>
|
||||
<span class="sd"> `CountVectorizer <https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.CountVectorizer.html>_`)</span>
|
||||
<span class="sd"> :return: a new :class:`quapy.data.base.Dataset` (if inplace=False) or a reference to the current</span>
|
||||
<span class="sd"> :class:`quapy.data.base.Dataset` (inplace=True) consisting of lists of integer values representing indices.</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="n">__check_type</span><span class="p">(</span><span class="n">dataset</span><span class="o">.</span><span class="n">training</span><span class="o">.</span><span class="n">instances</span><span class="p">,</span> <span class="n">np</span><span class="o">.</span><span class="n">ndarray</span><span class="p">,</span> <span class="nb">str</span><span class="p">)</span>
|
||||
<span class="n">__check_type</span><span class="p">(</span><span class="n">dataset</span><span class="o">.</span><span class="n">test</span><span class="o">.</span><span class="n">instances</span><span class="p">,</span> <span class="n">np</span><span class="o">.</span><span class="n">ndarray</span><span class="p">,</span> <span class="nb">str</span><span class="p">)</span>
|
||||
|
||||
<span class="n">indexer</span> <span class="o">=</span> <span class="n">IndexTransformer</span><span class="p">(</span><span class="n">min_df</span><span class="o">=</span><span class="n">min_df</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">)</span>
|
||||
<span class="n">training_index</span> <span class="o">=</span> <span class="n">indexer</span><span class="o">.</span><span class="n">fit_transform</span><span class="p">(</span><span class="n">dataset</span><span class="o">.</span><span class="n">training</span><span class="o">.</span><span class="n">instances</span><span class="p">)</span>
|
||||
<span class="n">test_index</span> <span class="o">=</span> <span class="n">indexer</span><span class="o">.</span><span class="n">transform</span><span class="p">(</span><span class="n">dataset</span><span class="o">.</span><span class="n">test</span><span class="o">.</span><span class="n">instances</span><span class="p">)</span>
|
||||
|
||||
<span class="n">training_index</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">asarray</span><span class="p">(</span><span class="n">training_index</span><span class="p">,</span> <span class="n">dtype</span><span class="o">=</span><span class="nb">object</span><span class="p">)</span>
|
||||
<span class="n">test_index</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">asarray</span><span class="p">(</span><span class="n">test_index</span><span class="p">,</span> <span class="n">dtype</span><span class="o">=</span><span class="nb">object</span><span class="p">)</span>
|
||||
|
||||
<span class="k">if</span> <span class="n">inplace</span><span class="p">:</span>
|
||||
<span class="n">dataset</span><span class="o">.</span><span class="n">training</span> <span class="o">=</span> <span class="n">LabelledCollection</span><span class="p">(</span><span class="n">training_index</span><span class="p">,</span> <span class="n">dataset</span><span class="o">.</span><span class="n">training</span><span class="o">.</span><span class="n">labels</span><span class="p">,</span> <span class="n">dataset</span><span class="o">.</span><span class="n">classes_</span><span class="p">)</span>
|
||||
<span class="n">dataset</span><span class="o">.</span><span class="n">test</span> <span class="o">=</span> <span class="n">LabelledCollection</span><span class="p">(</span><span class="n">test_index</span><span class="p">,</span> <span class="n">dataset</span><span class="o">.</span><span class="n">test</span><span class="o">.</span><span class="n">labels</span><span class="p">,</span> <span class="n">dataset</span><span class="o">.</span><span class="n">classes_</span><span class="p">)</span>
|
||||
<span class="n">dataset</span><span class="o">.</span><span class="n">vocabulary</span> <span class="o">=</span> <span class="n">indexer</span><span class="o">.</span><span class="n">vocabulary_</span>
|
||||
<span class="k">return</span> <span class="n">dataset</span>
|
||||
<span class="k">else</span><span class="p">:</span>
|
||||
<span class="n">training</span> <span class="o">=</span> <span class="n">LabelledCollection</span><span class="p">(</span><span class="n">training_index</span><span class="p">,</span> <span class="n">dataset</span><span class="o">.</span><span class="n">training</span><span class="o">.</span><span class="n">labels</span><span class="o">.</span><span class="n">copy</span><span class="p">(),</span> <span class="n">dataset</span><span class="o">.</span><span class="n">classes_</span><span class="p">)</span>
|
||||
<span class="n">test</span> <span class="o">=</span> <span class="n">LabelledCollection</span><span class="p">(</span><span class="n">test_index</span><span class="p">,</span> <span class="n">dataset</span><span class="o">.</span><span class="n">test</span><span class="o">.</span><span class="n">labels</span><span class="o">.</span><span class="n">copy</span><span class="p">(),</span> <span class="n">dataset</span><span class="o">.</span><span class="n">classes_</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="n">Dataset</span><span class="p">(</span><span class="n">training</span><span class="p">,</span> <span class="n">test</span><span class="p">,</span> <span class="n">indexer</span><span class="o">.</span><span class="n">vocabulary_</span><span class="p">)</span></div>
|
||||
|
||||
|
||||
|
||||
<span class="k">def</span> <span class="nf">__check_type</span><span class="p">(</span><span class="n">container</span><span class="p">,</span> <span class="n">container_type</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">element_type</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
|
||||
<span class="k">if</span> <span class="n">container_type</span><span class="p">:</span>
|
||||
<span class="k">assert</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">container</span><span class="p">,</span> <span class="n">container_type</span><span class="p">),</span> \
|
||||
<span class="sa">f</span><span class="s1">'unexpected type of container (expected </span><span class="si">{</span><span class="n">container_type</span><span class="si">}</span><span class="s1">, found </span><span class="si">{</span><span class="nb">type</span><span class="p">(</span><span class="n">container</span><span class="p">)</span><span class="si">}</span><span class="s1">)'</span>
|
||||
<span class="k">if</span> <span class="n">element_type</span><span class="p">:</span>
|
||||
<span class="k">assert</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">container</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">element_type</span><span class="p">),</span> \
|
||||
<span class="sa">f</span><span class="s1">'unexpected type of element (expected </span><span class="si">{</span><span class="n">container_type</span><span class="si">}</span><span class="s1">, found </span><span class="si">{</span><span class="nb">type</span><span class="p">(</span><span class="n">container</span><span class="p">)</span><span class="si">}</span><span class="s1">)'</span>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="IndexTransformer">
|
||||
<a class="viewcode-back" href="../../../quapy.data.html#quapy.data.preprocessing.IndexTransformer">[docs]</a>
|
||||
<span class="k">class</span> <span class="nc">IndexTransformer</span><span class="p">:</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> This class implements a sklearn's-style transformer that indexes text as numerical ids for the tokens it</span>
|
||||
<span class="sd"> contains, and that would be generated by sklearn's</span>
|
||||
<span class="sd"> `CountVectorizer <https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.CountVectorizer.html>`_</span>
|
||||
|
||||
<span class="sd"> :param kwargs: keyworded arguments from</span>
|
||||
<span class="sd"> `CountVectorizer <https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.CountVectorizer.html>`_</span>
|
||||
<span class="sd"> """</span>
|
||||
|
||||
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">vect</span> <span class="o">=</span> <span class="n">CountVectorizer</span><span class="p">(</span><span class="o">**</span><span class="n">kwargs</span><span class="p">)</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">unk</span> <span class="o">=</span> <span class="o">-</span><span class="mi">1</span> <span class="c1"># a valid index is assigned after fit</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">pad</span> <span class="o">=</span> <span class="o">-</span><span class="mi">2</span> <span class="c1"># a valid index is assigned after fit</span>
|
||||
|
||||
<div class="viewcode-block" id="IndexTransformer.fit">
|
||||
<a class="viewcode-back" href="../../../quapy.data.html#quapy.data.preprocessing.IndexTransformer.fit">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">fit</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">X</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Fits the transformer, i.e., decides on the vocabulary, given a list of strings.</span>
|
||||
|
||||
<span class="sd"> :param X: a list of strings</span>
|
||||
<span class="sd"> :return: self</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">vect</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">X</span><span class="p">)</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">analyzer</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">vect</span><span class="o">.</span><span class="n">build_analyzer</span><span class="p">()</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">vocabulary_</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">vect</span><span class="o">.</span><span class="n">vocabulary_</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">unk</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">add_word</span><span class="p">(</span><span class="n">qp</span><span class="o">.</span><span class="n">environ</span><span class="p">[</span><span class="s1">'UNK_TOKEN'</span><span class="p">],</span> <span class="n">qp</span><span class="o">.</span><span class="n">environ</span><span class="p">[</span><span class="s1">'UNK_INDEX'</span><span class="p">])</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">pad</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">add_word</span><span class="p">(</span><span class="n">qp</span><span class="o">.</span><span class="n">environ</span><span class="p">[</span><span class="s1">'PAD_TOKEN'</span><span class="p">],</span> <span class="n">qp</span><span class="o">.</span><span class="n">environ</span><span class="p">[</span><span class="s1">'PAD_INDEX'</span><span class="p">])</span>
|
||||
<span class="k">return</span> <span class="bp">self</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="IndexTransformer.transform">
|
||||
<a class="viewcode-back" href="../../../quapy.data.html#quapy.data.preprocessing.IndexTransformer.transform">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">transform</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">X</span><span class="p">,</span> <span class="n">n_jobs</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Transforms the strings in `X` as lists of numerical ids</span>
|
||||
|
||||
<span class="sd"> :param X: a list of strings</span>
|
||||
<span class="sd"> :param n_jobs: the number of parallel workers to carry out this task</span>
|
||||
<span class="sd"> :return: a `np.ndarray` of numerical ids</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="c1"># given the number of tasks and the number of jobs, generates the slices for the parallel processes</span>
|
||||
<span class="k">assert</span> <span class="bp">self</span><span class="o">.</span><span class="n">unk</span> <span class="o">!=</span> <span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="s1">'transform called before fit'</span>
|
||||
<span class="n">n_jobs</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">_get_njobs</span><span class="p">(</span><span class="n">n_jobs</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="n">map_parallel</span><span class="p">(</span><span class="n">func</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">_index</span><span class="p">,</span> <span class="n">args</span><span class="o">=</span><span class="n">X</span><span class="p">,</span> <span class="n">n_jobs</span><span class="o">=</span><span class="n">n_jobs</span><span class="p">)</span></div>
|
||||
|
||||
|
||||
|
||||
<span class="k">def</span> <span class="nf">_index</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">documents</span><span class="p">):</span>
|
||||
<span class="n">vocab</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">vocabulary_</span><span class="o">.</span><span class="n">copy</span><span class="p">()</span>
|
||||
<span class="k">return</span> <span class="p">[[</span><span class="n">vocab</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">word</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">unk</span><span class="p">)</span> <span class="k">for</span> <span class="n">word</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">analyzer</span><span class="p">(</span><span class="n">doc</span><span class="p">)]</span> <span class="k">for</span> <span class="n">doc</span> <span class="ow">in</span> <span class="n">tqdm</span><span class="p">(</span><span class="n">documents</span><span class="p">,</span> <span class="s1">'indexing'</span><span class="p">)]</span>
|
||||
|
||||
<div class="viewcode-block" id="IndexTransformer.fit_transform">
|
||||
<a class="viewcode-back" href="../../../quapy.data.html#quapy.data.preprocessing.IndexTransformer.fit_transform">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">fit_transform</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">X</span><span class="p">,</span> <span class="n">n_jobs</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Fits the transform on `X` and transforms it.</span>
|
||||
|
||||
<span class="sd"> :param X: a list of strings</span>
|
||||
<span class="sd"> :param n_jobs: the number of parallel workers to carry out this task</span>
|
||||
<span class="sd"> :return: a `np.ndarray` of numerical ids</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">X</span><span class="p">)</span><span class="o">.</span><span class="n">transform</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">n_jobs</span><span class="o">=</span><span class="n">n_jobs</span><span class="p">)</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="IndexTransformer.vocabulary_size">
|
||||
<a class="viewcode-back" href="../../../quapy.data.html#quapy.data.preprocessing.IndexTransformer.vocabulary_size">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">vocabulary_size</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Gets the length of the vocabulary according to which the document tokens have been indexed</span>
|
||||
|
||||
<span class="sd"> :return: integer</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">return</span> <span class="nb">len</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">vocabulary_</span><span class="p">)</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="IndexTransformer.add_word">
|
||||
<a class="viewcode-back" href="../../../quapy.data.html#quapy.data.preprocessing.IndexTransformer.add_word">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">add_word</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">word</span><span class="p">,</span> <span class="nb">id</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">nogaps</span><span class="o">=</span><span class="kc">True</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Adds a new token (regardless of whether it has been found in the text or not), with dedicated id.</span>
|
||||
<span class="sd"> Useful to define special tokens for codifying unknown words, or padding tokens.</span>
|
||||
|
||||
<span class="sd"> :param word: string, surface form of the token</span>
|
||||
<span class="sd"> :param id: integer, numerical value to assign to the token (leave as None for indicating the next valid id,</span>
|
||||
<span class="sd"> default)</span>
|
||||
<span class="sd"> :param nogaps: if set to True (default) asserts that the id indicated leads to no numerical gaps with</span>
|
||||
<span class="sd"> precedent ids stored so far</span>
|
||||
<span class="sd"> :return: integer, the numerical id for the new token</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">if</span> <span class="n">word</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">vocabulary_</span><span class="p">:</span>
|
||||
<span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span><span class="sa">f</span><span class="s1">'word </span><span class="si">{</span><span class="n">word</span><span class="si">}</span><span class="s1"> already in dictionary'</span><span class="p">)</span>
|
||||
<span class="k">if</span> <span class="nb">id</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">:</span>
|
||||
<span class="c1"># add the word with the next id</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">vocabulary_</span><span class="p">[</span><span class="n">word</span><span class="p">]</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">vocabulary_</span><span class="p">)</span>
|
||||
<span class="k">else</span><span class="p">:</span>
|
||||
<span class="n">id2word</span> <span class="o">=</span> <span class="p">{</span><span class="n">id_</span><span class="p">:</span><span class="n">word_</span> <span class="k">for</span> <span class="n">word_</span><span class="p">,</span> <span class="n">id_</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">vocabulary_</span><span class="o">.</span><span class="n">items</span><span class="p">()}</span>
|
||||
<span class="k">if</span> <span class="nb">id</span> <span class="ow">in</span> <span class="n">id2word</span><span class="p">:</span>
|
||||
<span class="n">old_word</span> <span class="o">=</span> <span class="n">id2word</span><span class="p">[</span><span class="nb">id</span><span class="p">]</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">vocabulary_</span><span class="p">[</span><span class="n">word</span><span class="p">]</span> <span class="o">=</span> <span class="nb">id</span>
|
||||
<span class="k">del</span> <span class="bp">self</span><span class="o">.</span><span class="n">vocabulary_</span><span class="p">[</span><span class="n">old_word</span><span class="p">]</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">add_word</span><span class="p">(</span><span class="n">old_word</span><span class="p">)</span>
|
||||
<span class="k">elif</span> <span class="n">nogaps</span><span class="p">:</span>
|
||||
<span class="k">if</span> <span class="nb">id</span> <span class="o">></span> <span class="bp">self</span><span class="o">.</span><span class="n">vocabulary_size</span><span class="p">()</span><span class="o">+</span><span class="mi">1</span><span class="p">:</span>
|
||||
<span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span><span class="sa">f</span><span class="s1">'word </span><span class="si">{</span><span class="n">word</span><span class="si">}</span><span class="s1"> added with id </span><span class="si">{</span><span class="nb">id</span><span class="si">}</span><span class="s1">, while the current vocabulary size '</span>
|
||||
<span class="sa">f</span><span class="s1">'is of </span><span class="si">{</span><span class="bp">self</span><span class="o">.</span><span class="n">vocabulary_size</span><span class="p">()</span><span class="si">}</span><span class="s1">, and id gaps are not allowed'</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">vocabulary_</span><span class="p">[</span><span class="n">word</span><span class="p">]</span></div>
|
||||
</div>
|
||||
|
||||
|
||||
</pre></div>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
<footer>
|
||||
|
||||
<hr/>
|
||||
|
||||
<div role="contentinfo">
|
||||
<p>© Copyright 2024, Alejandro Moreo.</p>
|
||||
</div>
|
||||
|
||||
Built with <a href="https://www.sphinx-doc.org/">Sphinx</a> using a
|
||||
<a href="https://github.com/readthedocs/sphinx_rtd_theme">theme</a>
|
||||
provided by <a href="https://readthedocs.org">Read the Docs</a>.
|
||||
|
||||
|
||||
</footer>
|
||||
</div>
|
||||
</div>
|
||||
</section>
|
||||
</div>
|
||||
<script>
|
||||
jQuery(function () {
|
||||
SphinxRtdTheme.Navigation.enable(true);
|
||||
});
|
||||
</script>
|
||||
|
||||
</body>
|
||||
</html>
|
||||
|
|
@ -0,0 +1,244 @@
|
|||
<!DOCTYPE html>
|
||||
<html class="writer-html5" lang="en" data-content_root="../../../">
|
||||
<head>
|
||||
<meta charset="utf-8" />
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
|
||||
<title>quapy.data.reader — QuaPy: A Python-based open-source framework for quantification 0.1.8 documentation</title>
|
||||
<link rel="stylesheet" type="text/css" href="../../../_static/pygments.css?v=92fd9be5" />
|
||||
<link rel="stylesheet" type="text/css" href="../../../_static/css/theme.css?v=19f00094" />
|
||||
|
||||
|
||||
<!--[if lt IE 9]>
|
||||
<script src="../../../_static/js/html5shiv.min.js"></script>
|
||||
<![endif]-->
|
||||
|
||||
<script src="../../../_static/jquery.js?v=5d32c60e"></script>
|
||||
<script src="../../../_static/_sphinx_javascript_frameworks_compat.js?v=2cd50e6c"></script>
|
||||
<script src="../../../_static/documentation_options.js?v=22607128"></script>
|
||||
<script src="../../../_static/doctools.js?v=9a2dae69"></script>
|
||||
<script src="../../../_static/sphinx_highlight.js?v=dc90522c"></script>
|
||||
<script src="../../../_static/js/theme.js"></script>
|
||||
<link rel="index" title="Index" href="../../../genindex.html" />
|
||||
<link rel="search" title="Search" href="../../../search.html" />
|
||||
</head>
|
||||
|
||||
<body class="wy-body-for-nav">
|
||||
<div class="wy-grid-for-nav">
|
||||
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
|
||||
<div class="wy-side-scroll">
|
||||
<div class="wy-side-nav-search" >
|
||||
|
||||
|
||||
|
||||
<a href="../../../index.html" class="icon icon-home">
|
||||
QuaPy: A Python-based open-source framework for quantification
|
||||
</a>
|
||||
<div role="search">
|
||||
<form id="rtd-search-form" class="wy-form" action="../../../search.html" method="get">
|
||||
<input type="text" name="q" placeholder="Search docs" aria-label="Search docs" />
|
||||
<input type="hidden" name="check_keywords" value="yes" />
|
||||
<input type="hidden" name="area" value="default" />
|
||||
</form>
|
||||
</div>
|
||||
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../../../modules.html">quapy</a></li>
|
||||
</ul>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
</nav>
|
||||
|
||||
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" >
|
||||
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
|
||||
<a href="../../../index.html">QuaPy: A Python-based open-source framework for quantification</a>
|
||||
</nav>
|
||||
|
||||
<div class="wy-nav-content">
|
||||
<div class="rst-content">
|
||||
<div role="navigation" aria-label="Page navigation">
|
||||
<ul class="wy-breadcrumbs">
|
||||
<li><a href="../../../index.html" class="icon icon-home" aria-label="Home"></a></li>
|
||||
<li class="breadcrumb-item"><a href="../../index.html">Module code</a></li>
|
||||
<li class="breadcrumb-item active">quapy.data.reader</li>
|
||||
<li class="wy-breadcrumbs-aside">
|
||||
</li>
|
||||
</ul>
|
||||
<hr/>
|
||||
</div>
|
||||
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
|
||||
<div itemprop="articleBody">
|
||||
|
||||
<h1>Source code for quapy.data.reader</h1><div class="highlight"><pre>
|
||||
<span></span><span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span>
|
||||
<span class="kn">from</span> <span class="nn">scipy.sparse</span> <span class="kn">import</span> <span class="n">dok_matrix</span>
|
||||
<span class="kn">from</span> <span class="nn">tqdm</span> <span class="kn">import</span> <span class="n">tqdm</span>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="from_text">
|
||||
<a class="viewcode-back" href="../../../quapy.data.html#quapy.data.reader.from_text">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">from_text</span><span class="p">(</span><span class="n">path</span><span class="p">,</span> <span class="n">encoding</span><span class="o">=</span><span class="s1">'utf-8'</span><span class="p">,</span> <span class="n">verbose</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">class2int</span><span class="o">=</span><span class="kc">True</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Reads a labelled colletion of documents.</span>
|
||||
<span class="sd"> File fomart <0 or 1>\t<document>\n</span>
|
||||
|
||||
<span class="sd"> :param path: path to the labelled collection</span>
|
||||
<span class="sd"> :param encoding: the text encoding used to open the file</span>
|
||||
<span class="sd"> :param verbose: if >0 (default) shows some progress information in standard output</span>
|
||||
<span class="sd"> :return: a list of sentences, and a list of labels</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="n">all_sentences</span><span class="p">,</span> <span class="n">all_labels</span> <span class="o">=</span> <span class="p">[],</span> <span class="p">[]</span>
|
||||
<span class="k">if</span> <span class="n">verbose</span><span class="o">></span><span class="mi">0</span><span class="p">:</span>
|
||||
<span class="n">file</span> <span class="o">=</span> <span class="n">tqdm</span><span class="p">(</span><span class="nb">open</span><span class="p">(</span><span class="n">path</span><span class="p">,</span> <span class="s1">'rt'</span><span class="p">,</span> <span class="n">encoding</span><span class="o">=</span><span class="n">encoding</span><span class="p">)</span><span class="o">.</span><span class="n">readlines</span><span class="p">(),</span> <span class="sa">f</span><span class="s1">'loading </span><span class="si">{</span><span class="n">path</span><span class="si">}</span><span class="s1">'</span><span class="p">)</span>
|
||||
<span class="k">else</span><span class="p">:</span>
|
||||
<span class="n">file</span> <span class="o">=</span> <span class="nb">open</span><span class="p">(</span><span class="n">path</span><span class="p">,</span> <span class="s1">'rt'</span><span class="p">,</span> <span class="n">encoding</span><span class="o">=</span><span class="n">encoding</span><span class="p">)</span><span class="o">.</span><span class="n">readlines</span><span class="p">()</span>
|
||||
<span class="k">for</span> <span class="n">line</span> <span class="ow">in</span> <span class="n">file</span><span class="p">:</span>
|
||||
<span class="n">line</span> <span class="o">=</span> <span class="n">line</span><span class="o">.</span><span class="n">strip</span><span class="p">()</span>
|
||||
<span class="k">if</span> <span class="n">line</span><span class="p">:</span>
|
||||
<span class="k">try</span><span class="p">:</span>
|
||||
<span class="n">label</span><span class="p">,</span> <span class="n">sentence</span> <span class="o">=</span> <span class="n">line</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s1">'</span><span class="se">\t</span><span class="s1">'</span><span class="p">)</span>
|
||||
<span class="n">sentence</span> <span class="o">=</span> <span class="n">sentence</span><span class="o">.</span><span class="n">strip</span><span class="p">()</span>
|
||||
<span class="k">if</span> <span class="n">class2int</span><span class="p">:</span>
|
||||
<span class="n">label</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="n">label</span><span class="p">)</span>
|
||||
<span class="k">if</span> <span class="n">sentence</span><span class="p">:</span>
|
||||
<span class="n">all_sentences</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">sentence</span><span class="p">)</span>
|
||||
<span class="n">all_labels</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">label</span><span class="p">)</span>
|
||||
<span class="k">except</span> <span class="ne">ValueError</span><span class="p">:</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s1">'format error in </span><span class="si">{</span><span class="n">line</span><span class="si">}</span><span class="s1">'</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="n">all_sentences</span><span class="p">,</span> <span class="n">all_labels</span></div>
|
||||
|
||||
|
||||
|
||||
<div class="viewcode-block" id="from_sparse">
|
||||
<a class="viewcode-back" href="../../../quapy.data.html#quapy.data.reader.from_sparse">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">from_sparse</span><span class="p">(</span><span class="n">path</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Reads a labelled collection of real-valued instances expressed in sparse format</span>
|
||||
<span class="sd"> File format <-1 or 0 or 1>[\s col(int):val(float)]\n</span>
|
||||
|
||||
<span class="sd"> :param path: path to the labelled collection</span>
|
||||
<span class="sd"> :return: a `csr_matrix` containing the instances (rows), and a ndarray containing the labels</span>
|
||||
<span class="sd"> """</span>
|
||||
|
||||
<span class="k">def</span> <span class="nf">split_col_val</span><span class="p">(</span><span class="n">col_val</span><span class="p">):</span>
|
||||
<span class="n">col</span><span class="p">,</span> <span class="n">val</span> <span class="o">=</span> <span class="n">col_val</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s1">':'</span><span class="p">)</span>
|
||||
<span class="n">col</span><span class="p">,</span> <span class="n">val</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="n">col</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span><span class="p">,</span> <span class="nb">float</span><span class="p">(</span><span class="n">val</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="n">col</span><span class="p">,</span> <span class="n">val</span>
|
||||
|
||||
<span class="n">all_documents</span><span class="p">,</span> <span class="n">all_labels</span> <span class="o">=</span> <span class="p">[],</span> <span class="p">[]</span>
|
||||
<span class="n">max_col</span> <span class="o">=</span> <span class="mi">0</span>
|
||||
<span class="k">for</span> <span class="n">line</span> <span class="ow">in</span> <span class="n">tqdm</span><span class="p">(</span><span class="nb">open</span><span class="p">(</span><span class="n">path</span><span class="p">,</span> <span class="s1">'rt'</span><span class="p">)</span><span class="o">.</span><span class="n">readlines</span><span class="p">(),</span> <span class="sa">f</span><span class="s1">'loading </span><span class="si">{</span><span class="n">path</span><span class="si">}</span><span class="s1">'</span><span class="p">):</span>
|
||||
<span class="n">parts</span> <span class="o">=</span> <span class="n">line</span><span class="o">.</span><span class="n">strip</span><span class="p">()</span><span class="o">.</span><span class="n">split</span><span class="p">()</span>
|
||||
<span class="k">if</span> <span class="n">parts</span><span class="p">:</span>
|
||||
<span class="n">all_labels</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="nb">int</span><span class="p">(</span><span class="n">parts</span><span class="p">[</span><span class="mi">0</span><span class="p">]))</span>
|
||||
<span class="n">cols</span><span class="p">,</span> <span class="n">vals</span> <span class="o">=</span> <span class="nb">zip</span><span class="p">(</span><span class="o">*</span><span class="p">[</span><span class="n">split_col_val</span><span class="p">(</span><span class="n">col_val</span><span class="p">)</span> <span class="k">for</span> <span class="n">col_val</span> <span class="ow">in</span> <span class="n">parts</span><span class="p">[</span><span class="mi">1</span><span class="p">:]])</span>
|
||||
<span class="n">cols</span><span class="p">,</span> <span class="n">vals</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">asarray</span><span class="p">(</span><span class="n">cols</span><span class="p">),</span> <span class="n">np</span><span class="o">.</span><span class="n">asarray</span><span class="p">(</span><span class="n">vals</span><span class="p">)</span>
|
||||
<span class="n">max_col</span> <span class="o">=</span> <span class="nb">max</span><span class="p">(</span><span class="n">max_col</span><span class="p">,</span> <span class="n">cols</span><span class="o">.</span><span class="n">max</span><span class="p">())</span>
|
||||
<span class="n">all_documents</span><span class="o">.</span><span class="n">append</span><span class="p">((</span><span class="n">cols</span><span class="p">,</span> <span class="n">vals</span><span class="p">))</span>
|
||||
<span class="n">n_docs</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">all_labels</span><span class="p">)</span>
|
||||
<span class="n">X</span> <span class="o">=</span> <span class="n">dok_matrix</span><span class="p">((</span><span class="n">n_docs</span><span class="p">,</span> <span class="n">max_col</span> <span class="o">+</span> <span class="mi">1</span><span class="p">),</span> <span class="n">dtype</span><span class="o">=</span><span class="nb">float</span><span class="p">)</span>
|
||||
<span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="p">(</span><span class="n">cols</span><span class="p">,</span> <span class="n">vals</span><span class="p">)</span> <span class="ow">in</span> <span class="n">tqdm</span><span class="p">(</span><span class="nb">enumerate</span><span class="p">(</span><span class="n">all_documents</span><span class="p">),</span> <span class="n">total</span><span class="o">=</span><span class="nb">len</span><span class="p">(</span><span class="n">all_documents</span><span class="p">),</span>
|
||||
<span class="n">desc</span><span class="o">=</span><span class="sa">f</span><span class="s1">'\-- filling matrix of shape </span><span class="si">{</span><span class="n">X</span><span class="o">.</span><span class="n">shape</span><span class="si">}</span><span class="s1">'</span><span class="p">):</span>
|
||||
<span class="n">X</span><span class="p">[</span><span class="n">i</span><span class="p">,</span> <span class="n">cols</span><span class="p">]</span> <span class="o">=</span> <span class="n">vals</span>
|
||||
<span class="n">X</span> <span class="o">=</span> <span class="n">X</span><span class="o">.</span><span class="n">tocsr</span><span class="p">()</span>
|
||||
<span class="n">y</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">asarray</span><span class="p">(</span><span class="n">all_labels</span><span class="p">)</span> <span class="o">+</span> <span class="mi">1</span>
|
||||
<span class="k">return</span> <span class="n">X</span><span class="p">,</span> <span class="n">y</span></div>
|
||||
|
||||
|
||||
|
||||
<div class="viewcode-block" id="from_csv">
|
||||
<a class="viewcode-back" href="../../../quapy.data.html#quapy.data.reader.from_csv">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">from_csv</span><span class="p">(</span><span class="n">path</span><span class="p">,</span> <span class="n">encoding</span><span class="o">=</span><span class="s1">'utf-8'</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Reads a csv file in which columns are separated by ','.</span>
|
||||
<span class="sd"> File format <label>,<feat1>,<feat2>,...,<featn>\n</span>
|
||||
|
||||
<span class="sd"> :param path: path to the csv file</span>
|
||||
<span class="sd"> :param encoding: the text encoding used to open the file</span>
|
||||
<span class="sd"> :return: a np.ndarray for the labels and a ndarray (float) for the covariates</span>
|
||||
<span class="sd"> """</span>
|
||||
|
||||
<span class="n">X</span><span class="p">,</span> <span class="n">y</span> <span class="o">=</span> <span class="p">[],</span> <span class="p">[]</span>
|
||||
<span class="k">for</span> <span class="n">instance</span> <span class="ow">in</span> <span class="n">tqdm</span><span class="p">(</span><span class="nb">open</span><span class="p">(</span><span class="n">path</span><span class="p">,</span> <span class="s1">'rt'</span><span class="p">,</span> <span class="n">encoding</span><span class="o">=</span><span class="n">encoding</span><span class="p">)</span><span class="o">.</span><span class="n">readlines</span><span class="p">(),</span> <span class="n">desc</span><span class="o">=</span><span class="sa">f</span><span class="s1">'reading </span><span class="si">{</span><span class="n">path</span><span class="si">}</span><span class="s1">'</span><span class="p">):</span>
|
||||
<span class="n">yi</span><span class="p">,</span> <span class="o">*</span><span class="n">xi</span> <span class="o">=</span> <span class="n">instance</span><span class="o">.</span><span class="n">strip</span><span class="p">()</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s1">','</span><span class="p">)</span>
|
||||
<span class="n">X</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="nb">list</span><span class="p">(</span><span class="nb">map</span><span class="p">(</span><span class="nb">float</span><span class="p">,</span><span class="n">xi</span><span class="p">)))</span>
|
||||
<span class="n">y</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">yi</span><span class="p">)</span>
|
||||
<span class="n">X</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">asarray</span><span class="p">(</span><span class="n">X</span><span class="p">)</span>
|
||||
<span class="n">y</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">asarray</span><span class="p">(</span><span class="n">y</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="n">X</span><span class="p">,</span> <span class="n">y</span></div>
|
||||
|
||||
|
||||
|
||||
<div class="viewcode-block" id="reindex_labels">
|
||||
<a class="viewcode-back" href="../../../quapy.data.html#quapy.data.reader.reindex_labels">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">reindex_labels</span><span class="p">(</span><span class="n">y</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Re-indexes a list of labels as a list of indexes, and returns the classnames corresponding to the indexes.</span>
|
||||
<span class="sd"> E.g.:</span>
|
||||
|
||||
<span class="sd"> >>> reindex_labels(['B', 'B', 'A', 'C'])</span>
|
||||
<span class="sd"> >>> (array([1, 1, 0, 2]), array(['A', 'B', 'C'], dtype='<U1'))</span>
|
||||
|
||||
<span class="sd"> :param y: the list or array of original labels</span>
|
||||
<span class="sd"> :return: a ndarray (int) of class indexes, and a ndarray of classnames corresponding to the indexes.</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="n">y</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">asarray</span><span class="p">(</span><span class="n">y</span><span class="p">)</span>
|
||||
<span class="n">classnames</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">asarray</span><span class="p">(</span><span class="nb">sorted</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">unique</span><span class="p">(</span><span class="n">y</span><span class="p">)))</span>
|
||||
<span class="n">label2index</span> <span class="o">=</span> <span class="p">{</span><span class="n">label</span><span class="p">:</span> <span class="n">index</span> <span class="k">for</span> <span class="n">index</span><span class="p">,</span> <span class="n">label</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">classnames</span><span class="p">)}</span>
|
||||
<span class="n">indexed</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">empty</span><span class="p">(</span><span class="n">y</span><span class="o">.</span><span class="n">shape</span><span class="p">,</span> <span class="n">dtype</span><span class="o">=</span><span class="nb">int</span><span class="p">)</span>
|
||||
<span class="k">for</span> <span class="n">label</span> <span class="ow">in</span> <span class="n">classnames</span><span class="p">:</span>
|
||||
<span class="n">indexed</span><span class="p">[</span><span class="n">y</span><span class="o">==</span><span class="n">label</span><span class="p">]</span> <span class="o">=</span> <span class="n">label2index</span><span class="p">[</span><span class="n">label</span><span class="p">]</span>
|
||||
<span class="k">return</span> <span class="n">indexed</span><span class="p">,</span> <span class="n">classnames</span></div>
|
||||
|
||||
|
||||
|
||||
<div class="viewcode-block" id="binarize">
|
||||
<a class="viewcode-back" href="../../../quapy.data.html#quapy.data.reader.binarize">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">binarize</span><span class="p">(</span><span class="n">y</span><span class="p">,</span> <span class="n">pos_class</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Binarizes a categorical array-like collection of labels towards the positive class `pos_class`. E.g.,:</span>
|
||||
|
||||
<span class="sd"> >>> binarize([1, 2, 3, 1, 1, 0], pos_class=2)</span>
|
||||
<span class="sd"> >>> array([0, 1, 0, 0, 0, 0])</span>
|
||||
|
||||
<span class="sd"> :param y: array-like of labels</span>
|
||||
<span class="sd"> :param pos_class: integer, the positive class</span>
|
||||
<span class="sd"> :return: a binary np.ndarray, in which values 1 corresponds to positions in whcih `y` had `pos_class` labels, and</span>
|
||||
<span class="sd"> 0 otherwise</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="n">y</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">asarray</span><span class="p">(</span><span class="n">y</span><span class="p">)</span>
|
||||
<span class="n">ybin</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">zeros</span><span class="p">(</span><span class="n">y</span><span class="o">.</span><span class="n">shape</span><span class="p">,</span> <span class="n">dtype</span><span class="o">=</span><span class="nb">int</span><span class="p">)</span>
|
||||
<span class="n">ybin</span><span class="p">[</span><span class="n">y</span> <span class="o">==</span> <span class="n">pos_class</span><span class="p">]</span> <span class="o">=</span> <span class="mi">1</span>
|
||||
<span class="k">return</span> <span class="n">ybin</span></div>
|
||||
|
||||
|
||||
</pre></div>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
<footer>
|
||||
|
||||
<hr/>
|
||||
|
||||
<div role="contentinfo">
|
||||
<p>© Copyright 2024, Alejandro Moreo.</p>
|
||||
</div>
|
||||
|
||||
Built with <a href="https://www.sphinx-doc.org/">Sphinx</a> using a
|
||||
<a href="https://github.com/readthedocs/sphinx_rtd_theme">theme</a>
|
||||
provided by <a href="https://readthedocs.org">Read the Docs</a>.
|
||||
|
||||
|
||||
</footer>
|
||||
</div>
|
||||
</div>
|
||||
</section>
|
||||
</div>
|
||||
<script>
|
||||
jQuery(function () {
|
||||
SphinxRtdTheme.Navigation.enable(true);
|
||||
});
|
||||
</script>
|
||||
|
||||
</body>
|
||||
</html>
|
||||
|
|
@ -0,0 +1,433 @@
|
|||
<!DOCTYPE html>
|
||||
<html class="writer-html5" lang="en">
|
||||
<head>
|
||||
<meta charset="utf-8" />
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
|
||||
<title>quapy.error — QuaPy: A Python-based open-source framework for quantification 0.1.8 documentation</title>
|
||||
<link rel="stylesheet" type="text/css" href="../../_static/pygments.css" />
|
||||
<link rel="stylesheet" type="text/css" href="../../_static/css/theme.css" />
|
||||
|
||||
|
||||
<!--[if lt IE 9]>
|
||||
<script src="../../_static/js/html5shiv.min.js"></script>
|
||||
<![endif]-->
|
||||
|
||||
<script data-url_root="../../" id="documentation_options" src="../../_static/documentation_options.js"></script>
|
||||
<script src="../../_static/jquery.js"></script>
|
||||
<script src="../../_static/underscore.js"></script>
|
||||
<script src="../../_static/_sphinx_javascript_frameworks_compat.js"></script>
|
||||
<script src="../../_static/doctools.js"></script>
|
||||
<script src="../../_static/sphinx_highlight.js"></script>
|
||||
<script src="../../_static/js/theme.js"></script>
|
||||
<link rel="index" title="Index" href="../../genindex.html" />
|
||||
<link rel="search" title="Search" href="../../search.html" />
|
||||
</head>
|
||||
|
||||
<body class="wy-body-for-nav">
|
||||
<div class="wy-grid-for-nav">
|
||||
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
|
||||
<div class="wy-side-scroll">
|
||||
<div class="wy-side-nav-search" >
|
||||
|
||||
|
||||
|
||||
<a href="../../index.html" class="icon icon-home">
|
||||
QuaPy: A Python-based open-source framework for quantification
|
||||
</a>
|
||||
<div role="search">
|
||||
<form id="rtd-search-form" class="wy-form" action="../../search.html" method="get">
|
||||
<input type="text" name="q" placeholder="Search docs" aria-label="Search docs" />
|
||||
<input type="hidden" name="check_keywords" value="yes" />
|
||||
<input type="hidden" name="area" value="default" />
|
||||
</form>
|
||||
</div>
|
||||
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../../modules.html">quapy</a></li>
|
||||
</ul>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
</nav>
|
||||
|
||||
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" >
|
||||
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
|
||||
<a href="../../index.html">QuaPy: A Python-based open-source framework for quantification</a>
|
||||
</nav>
|
||||
|
||||
<div class="wy-nav-content">
|
||||
<div class="rst-content">
|
||||
<div role="navigation" aria-label="Page navigation">
|
||||
<ul class="wy-breadcrumbs">
|
||||
<li><a href="../../index.html" class="icon icon-home" aria-label="Home"></a></li>
|
||||
<li class="breadcrumb-item"><a href="../index.html">Module code</a></li>
|
||||
<li class="breadcrumb-item active">quapy.error</li>
|
||||
<li class="wy-breadcrumbs-aside">
|
||||
</li>
|
||||
</ul>
|
||||
<hr/>
|
||||
</div>
|
||||
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
|
||||
<div itemprop="articleBody">
|
||||
|
||||
<h1>Source code for quapy.error</h1><div class="highlight"><pre>
|
||||
<span></span><span class="sd">"""Implementation of error measures used for quantification"""</span>
|
||||
|
||||
<span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span>
|
||||
<span class="kn">from</span> <span class="nn">sklearn.metrics</span> <span class="kn">import</span> <span class="n">f1_score</span>
|
||||
<span class="kn">import</span> <span class="nn">quapy</span> <span class="k">as</span> <span class="nn">qp</span>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="from_name"><a class="viewcode-back" href="../../quapy.html#quapy.error.from_name">[docs]</a><span class="k">def</span> <span class="nf">from_name</span><span class="p">(</span><span class="n">err_name</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""Gets an error function from its name. E.g., `from_name("mae")`</span>
|
||||
<span class="sd"> will return function :meth:`quapy.error.mae`</span>
|
||||
|
||||
<span class="sd"> :param err_name: string, the error name</span>
|
||||
<span class="sd"> :return: a callable implementing the requested error</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">assert</span> <span class="n">err_name</span> <span class="ow">in</span> <span class="n">ERROR_NAMES</span><span class="p">,</span> <span class="sa">f</span><span class="s1">'unknown error </span><span class="si">{</span><span class="n">err_name</span><span class="si">}</span><span class="s1">'</span>
|
||||
<span class="n">callable_error</span> <span class="o">=</span> <span class="nb">globals</span><span class="p">()[</span><span class="n">err_name</span><span class="p">]</span>
|
||||
<span class="k">return</span> <span class="n">callable_error</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="f1e"><a class="viewcode-back" href="../../quapy.html#quapy.error.f1e">[docs]</a><span class="k">def</span> <span class="nf">f1e</span><span class="p">(</span><span class="n">y_true</span><span class="p">,</span> <span class="n">y_pred</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""F1 error: simply computes the error in terms of macro :math:`F_1`, i.e.,</span>
|
||||
<span class="sd"> :math:`1-F_1^M`, where :math:`F_1` is the harmonic mean of precision and recall,</span>
|
||||
<span class="sd"> defined as :math:`\\frac{2tp}{2tp+fp+fn}`, with `tp`, `fp`, and `fn` standing</span>
|
||||
<span class="sd"> for true positives, false positives, and false negatives, respectively.</span>
|
||||
<span class="sd"> `Macro` averaging means the :math:`F_1` is computed for each category independently,</span>
|
||||
<span class="sd"> and then averaged.</span>
|
||||
|
||||
<span class="sd"> :param y_true: array-like of true labels</span>
|
||||
<span class="sd"> :param y_pred: array-like of predicted labels</span>
|
||||
<span class="sd"> :return: :math:`1-F_1^M`</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">return</span> <span class="mf">1.</span> <span class="o">-</span> <span class="n">f1_score</span><span class="p">(</span><span class="n">y_true</span><span class="p">,</span> <span class="n">y_pred</span><span class="p">,</span> <span class="n">average</span><span class="o">=</span><span class="s1">'macro'</span><span class="p">)</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="acce"><a class="viewcode-back" href="../../quapy.html#quapy.error.acce">[docs]</a><span class="k">def</span> <span class="nf">acce</span><span class="p">(</span><span class="n">y_true</span><span class="p">,</span> <span class="n">y_pred</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""Computes the error in terms of 1-accuracy. The accuracy is computed as</span>
|
||||
<span class="sd"> :math:`\\frac{tp+tn}{tp+fp+fn+tn}`, with `tp`, `fp`, `fn`, and `tn` standing</span>
|
||||
<span class="sd"> for true positives, false positives, false negatives, and true negatives,</span>
|
||||
<span class="sd"> respectively</span>
|
||||
|
||||
<span class="sd"> :param y_true: array-like of true labels</span>
|
||||
<span class="sd"> :param y_pred: array-like of predicted labels</span>
|
||||
<span class="sd"> :return: 1-accuracy</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">return</span> <span class="mf">1.</span> <span class="o">-</span> <span class="p">(</span><span class="n">y_true</span> <span class="o">==</span> <span class="n">y_pred</span><span class="p">)</span><span class="o">.</span><span class="n">mean</span><span class="p">()</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="mae"><a class="viewcode-back" href="../../quapy.html#quapy.error.mae">[docs]</a><span class="k">def</span> <span class="nf">mae</span><span class="p">(</span><span class="n">prevs</span><span class="p">,</span> <span class="n">prevs_hat</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""Computes the mean absolute error (see :meth:`quapy.error.ae`) across the sample pairs.</span>
|
||||
|
||||
<span class="sd"> :param prevs: array-like of shape `(n_samples, n_classes,)` with the true prevalence values</span>
|
||||
<span class="sd"> :param prevs_hat: array-like of shape `(n_samples, n_classes,)` with the predicted</span>
|
||||
<span class="sd"> prevalence values</span>
|
||||
<span class="sd"> :return: mean absolute error</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">return</span> <span class="n">ae</span><span class="p">(</span><span class="n">prevs</span><span class="p">,</span> <span class="n">prevs_hat</span><span class="p">)</span><span class="o">.</span><span class="n">mean</span><span class="p">()</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="ae"><a class="viewcode-back" href="../../quapy.html#quapy.error.ae">[docs]</a><span class="k">def</span> <span class="nf">ae</span><span class="p">(</span><span class="n">prevs</span><span class="p">,</span> <span class="n">prevs_hat</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""Computes the absolute error between the two prevalence vectors.</span>
|
||||
<span class="sd"> Absolute error between two prevalence vectors :math:`p` and :math:`\\hat{p}` is computed as</span>
|
||||
<span class="sd"> :math:`AE(p,\\hat{p})=\\frac{1}{|\\mathcal{Y}|}\\sum_{y\\in \\mathcal{Y}}|\\hat{p}(y)-p(y)|`,</span>
|
||||
<span class="sd"> where :math:`\\mathcal{Y}` are the classes of interest.</span>
|
||||
|
||||
<span class="sd"> :param prevs: array-like of shape `(n_classes,)` with the true prevalence values</span>
|
||||
<span class="sd"> :param prevs_hat: array-like of shape `(n_classes,)` with the predicted prevalence values</span>
|
||||
<span class="sd"> :return: absolute error</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">assert</span> <span class="n">prevs</span><span class="o">.</span><span class="n">shape</span> <span class="o">==</span> <span class="n">prevs_hat</span><span class="o">.</span><span class="n">shape</span><span class="p">,</span> <span class="sa">f</span><span class="s1">'wrong shape </span><span class="si">{</span><span class="n">prevs</span><span class="o">.</span><span class="n">shape</span><span class="si">}</span><span class="s1"> vs. </span><span class="si">{</span><span class="n">prevs_hat</span><span class="o">.</span><span class="n">shape</span><span class="si">}</span><span class="s1">'</span>
|
||||
<span class="k">return</span> <span class="nb">abs</span><span class="p">(</span><span class="n">prevs_hat</span> <span class="o">-</span> <span class="n">prevs</span><span class="p">)</span><span class="o">.</span><span class="n">mean</span><span class="p">(</span><span class="n">axis</span><span class="o">=-</span><span class="mi">1</span><span class="p">)</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="nae"><a class="viewcode-back" href="../../quapy.html#quapy.error.nae">[docs]</a><span class="k">def</span> <span class="nf">nae</span><span class="p">(</span><span class="n">prevs</span><span class="p">,</span> <span class="n">prevs_hat</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""Computes the normalized absolute error between the two prevalence vectors.</span>
|
||||
<span class="sd"> Normalized absolute error between two prevalence vectors :math:`p` and :math:`\\hat{p}` is computed as</span>
|
||||
<span class="sd"> :math:`NAE(p,\\hat{p})=\\frac{AE(p,\\hat{p})}{z_{AE}}`,</span>
|
||||
<span class="sd"> where :math:`z_{AE}=\\frac{2(1-\\min_{y\\in \\mathcal{Y}} p(y))}{|\\mathcal{Y}|}`, and :math:`\\mathcal{Y}`</span>
|
||||
<span class="sd"> are the classes of interest.</span>
|
||||
|
||||
<span class="sd"> :param prevs: array-like of shape `(n_classes,)` with the true prevalence values</span>
|
||||
<span class="sd"> :param prevs_hat: array-like of shape `(n_classes,)` with the predicted prevalence values</span>
|
||||
<span class="sd"> :return: normalized absolute error</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">assert</span> <span class="n">prevs</span><span class="o">.</span><span class="n">shape</span> <span class="o">==</span> <span class="n">prevs_hat</span><span class="o">.</span><span class="n">shape</span><span class="p">,</span> <span class="sa">f</span><span class="s1">'wrong shape </span><span class="si">{</span><span class="n">prevs</span><span class="o">.</span><span class="n">shape</span><span class="si">}</span><span class="s1"> vs. </span><span class="si">{</span><span class="n">prevs_hat</span><span class="o">.</span><span class="n">shape</span><span class="si">}</span><span class="s1">'</span>
|
||||
<span class="k">return</span> <span class="nb">abs</span><span class="p">(</span><span class="n">prevs_hat</span> <span class="o">-</span> <span class="n">prevs</span><span class="p">)</span><span class="o">.</span><span class="n">sum</span><span class="p">(</span><span class="n">axis</span><span class="o">=-</span><span class="mi">1</span><span class="p">)</span><span class="o">/</span><span class="p">(</span><span class="mi">2</span><span class="o">*</span><span class="p">(</span><span class="mi">1</span><span class="o">-</span><span class="n">prevs</span><span class="o">.</span><span class="n">min</span><span class="p">(</span><span class="n">axis</span><span class="o">=-</span><span class="mi">1</span><span class="p">)))</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="mnae"><a class="viewcode-back" href="../../quapy.html#quapy.error.mnae">[docs]</a><span class="k">def</span> <span class="nf">mnae</span><span class="p">(</span><span class="n">prevs</span><span class="p">,</span> <span class="n">prevs_hat</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""Computes the mean normalized absolute error (see :meth:`quapy.error.nae`) across the sample pairs.</span>
|
||||
|
||||
<span class="sd"> :param prevs: array-like of shape `(n_samples, n_classes,)` with the true prevalence values</span>
|
||||
<span class="sd"> :param prevs_hat: array-like of shape `(n_samples, n_classes,)` with the predicted</span>
|
||||
<span class="sd"> prevalence values</span>
|
||||
<span class="sd"> :return: mean normalized absolute error</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">return</span> <span class="n">nae</span><span class="p">(</span><span class="n">prevs</span><span class="p">,</span> <span class="n">prevs_hat</span><span class="p">)</span><span class="o">.</span><span class="n">mean</span><span class="p">()</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="mse"><a class="viewcode-back" href="../../quapy.html#quapy.error.mse">[docs]</a><span class="k">def</span> <span class="nf">mse</span><span class="p">(</span><span class="n">prevs</span><span class="p">,</span> <span class="n">prevs_hat</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""Computes the mean squared error (see :meth:`quapy.error.se`) across the sample pairs.</span>
|
||||
|
||||
<span class="sd"> :param prevs: array-like of shape `(n_samples, n_classes,)` with the</span>
|
||||
<span class="sd"> true prevalence values</span>
|
||||
<span class="sd"> :param prevs_hat: array-like of shape `(n_samples, n_classes,)` with the</span>
|
||||
<span class="sd"> predicted prevalence values</span>
|
||||
<span class="sd"> :return: mean squared error</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">return</span> <span class="n">se</span><span class="p">(</span><span class="n">prevs</span><span class="p">,</span> <span class="n">prevs_hat</span><span class="p">)</span><span class="o">.</span><span class="n">mean</span><span class="p">()</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="se"><a class="viewcode-back" href="../../quapy.html#quapy.error.se">[docs]</a><span class="k">def</span> <span class="nf">se</span><span class="p">(</span><span class="n">prevs</span><span class="p">,</span> <span class="n">prevs_hat</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""Computes the squared error between the two prevalence vectors.</span>
|
||||
<span class="sd"> Squared error between two prevalence vectors :math:`p` and :math:`\\hat{p}` is computed as</span>
|
||||
<span class="sd"> :math:`SE(p,\\hat{p})=\\frac{1}{|\\mathcal{Y}|}\\sum_{y\\in \\mathcal{Y}}(\\hat{p}(y)-p(y))^2`,</span>
|
||||
<span class="sd"> where</span>
|
||||
<span class="sd"> :math:`\\mathcal{Y}` are the classes of interest.</span>
|
||||
|
||||
<span class="sd"> :param prevs: array-like of shape `(n_classes,)` with the true prevalence values</span>
|
||||
<span class="sd"> :param prevs_hat: array-like of shape `(n_classes,)` with the predicted prevalence values</span>
|
||||
<span class="sd"> :return: absolute error</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">return</span> <span class="p">((</span><span class="n">prevs_hat</span> <span class="o">-</span> <span class="n">prevs</span><span class="p">)</span> <span class="o">**</span> <span class="mi">2</span><span class="p">)</span><span class="o">.</span><span class="n">mean</span><span class="p">(</span><span class="n">axis</span><span class="o">=-</span><span class="mi">1</span><span class="p">)</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="mkld"><a class="viewcode-back" href="../../quapy.html#quapy.error.mkld">[docs]</a><span class="k">def</span> <span class="nf">mkld</span><span class="p">(</span><span class="n">prevs</span><span class="p">,</span> <span class="n">prevs_hat</span><span class="p">,</span> <span class="n">eps</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""Computes the mean Kullback-Leibler divergence (see :meth:`quapy.error.kld`) across the</span>
|
||||
<span class="sd"> sample pairs. The distributions are smoothed using the `eps` factor</span>
|
||||
<span class="sd"> (see :meth:`quapy.error.smooth`).</span>
|
||||
|
||||
<span class="sd"> :param prevs: array-like of shape `(n_samples, n_classes,)` with the true</span>
|
||||
<span class="sd"> prevalence values</span>
|
||||
<span class="sd"> :param prevs_hat: array-like of shape `(n_samples, n_classes,)` with the predicted</span>
|
||||
<span class="sd"> prevalence values</span>
|
||||
<span class="sd"> :param eps: smoothing factor. KLD is not defined in cases in which the distributions contain</span>
|
||||
<span class="sd"> zeros; `eps` is typically set to be :math:`\\frac{1}{2T}`, with :math:`T` the sample size.</span>
|
||||
<span class="sd"> If `eps=None`, the sample size will be taken from the environment variable `SAMPLE_SIZE`</span>
|
||||
<span class="sd"> (which has thus to be set beforehand).</span>
|
||||
<span class="sd"> :return: mean Kullback-Leibler distribution</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">return</span> <span class="n">kld</span><span class="p">(</span><span class="n">prevs</span><span class="p">,</span> <span class="n">prevs_hat</span><span class="p">,</span> <span class="n">eps</span><span class="p">)</span><span class="o">.</span><span class="n">mean</span><span class="p">()</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="kld"><a class="viewcode-back" href="../../quapy.html#quapy.error.kld">[docs]</a><span class="k">def</span> <span class="nf">kld</span><span class="p">(</span><span class="n">prevs</span><span class="p">,</span> <span class="n">prevs_hat</span><span class="p">,</span> <span class="n">eps</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""Computes the Kullback-Leibler divergence between the two prevalence distributions.</span>
|
||||
<span class="sd"> Kullback-Leibler divergence between two prevalence distributions :math:`p` and :math:`\\hat{p}`</span>
|
||||
<span class="sd"> is computed as</span>
|
||||
<span class="sd"> :math:`KLD(p,\\hat{p})=D_{KL}(p||\\hat{p})=</span>
|
||||
<span class="sd"> \\sum_{y\\in \\mathcal{Y}} p(y)\\log\\frac{p(y)}{\\hat{p}(y)}`,</span>
|
||||
<span class="sd"> where :math:`\\mathcal{Y}` are the classes of interest.</span>
|
||||
<span class="sd"> The distributions are smoothed using the `eps` factor (see :meth:`quapy.error.smooth`).</span>
|
||||
|
||||
<span class="sd"> :param prevs: array-like of shape `(n_classes,)` with the true prevalence values</span>
|
||||
<span class="sd"> :param prevs_hat: array-like of shape `(n_classes,)` with the predicted prevalence values</span>
|
||||
<span class="sd"> :param eps: smoothing factor. KLD is not defined in cases in which the distributions contain</span>
|
||||
<span class="sd"> zeros; `eps` is typically set to be :math:`\\frac{1}{2T}`, with :math:`T` the sample size.</span>
|
||||
<span class="sd"> If `eps=None`, the sample size will be taken from the environment variable `SAMPLE_SIZE`</span>
|
||||
<span class="sd"> (which has thus to be set beforehand).</span>
|
||||
<span class="sd"> :return: Kullback-Leibler divergence between the two distributions</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="n">eps</span> <span class="o">=</span> <span class="n">__check_eps</span><span class="p">(</span><span class="n">eps</span><span class="p">)</span>
|
||||
<span class="n">smooth_prevs</span> <span class="o">=</span> <span class="n">prevs</span> <span class="o">+</span> <span class="n">eps</span>
|
||||
<span class="n">smooth_prevs_hat</span> <span class="o">=</span> <span class="n">prevs_hat</span> <span class="o">+</span> <span class="n">eps</span>
|
||||
<span class="k">return</span> <span class="p">(</span><span class="n">smooth_prevs</span><span class="o">*</span><span class="n">np</span><span class="o">.</span><span class="n">log</span><span class="p">(</span><span class="n">smooth_prevs</span><span class="o">/</span><span class="n">smooth_prevs_hat</span><span class="p">))</span><span class="o">.</span><span class="n">sum</span><span class="p">(</span><span class="n">axis</span><span class="o">=-</span><span class="mi">1</span><span class="p">)</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="mnkld"><a class="viewcode-back" href="../../quapy.html#quapy.error.mnkld">[docs]</a><span class="k">def</span> <span class="nf">mnkld</span><span class="p">(</span><span class="n">prevs</span><span class="p">,</span> <span class="n">prevs_hat</span><span class="p">,</span> <span class="n">eps</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""Computes the mean Normalized Kullback-Leibler divergence (see :meth:`quapy.error.nkld`)</span>
|
||||
<span class="sd"> across the sample pairs. The distributions are smoothed using the `eps` factor</span>
|
||||
<span class="sd"> (see :meth:`quapy.error.smooth`).</span>
|
||||
|
||||
<span class="sd"> :param prevs: array-like of shape `(n_samples, n_classes,)` with the true prevalence values</span>
|
||||
<span class="sd"> :param prevs_hat: array-like of shape `(n_samples, n_classes,)` with the predicted</span>
|
||||
<span class="sd"> prevalence values</span>
|
||||
<span class="sd"> :param eps: smoothing factor. NKLD is not defined in cases in which the distributions contain</span>
|
||||
<span class="sd"> zeros; `eps` is typically set to be :math:`\\frac{1}{2T}`, with :math:`T` the sample size.</span>
|
||||
<span class="sd"> If `eps=None`, the sample size will be taken from the environment variable `SAMPLE_SIZE`</span>
|
||||
<span class="sd"> (which has thus to be set beforehand).</span>
|
||||
<span class="sd"> :return: mean Normalized Kullback-Leibler distribution</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">return</span> <span class="n">nkld</span><span class="p">(</span><span class="n">prevs</span><span class="p">,</span> <span class="n">prevs_hat</span><span class="p">,</span> <span class="n">eps</span><span class="p">)</span><span class="o">.</span><span class="n">mean</span><span class="p">()</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="nkld"><a class="viewcode-back" href="../../quapy.html#quapy.error.nkld">[docs]</a><span class="k">def</span> <span class="nf">nkld</span><span class="p">(</span><span class="n">prevs</span><span class="p">,</span> <span class="n">prevs_hat</span><span class="p">,</span> <span class="n">eps</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""Computes the Normalized Kullback-Leibler divergence between the two prevalence distributions.</span>
|
||||
<span class="sd"> Normalized Kullback-Leibler divergence between two prevalence distributions :math:`p` and</span>
|
||||
<span class="sd"> :math:`\\hat{p}` is computed as</span>
|
||||
<span class="sd"> math:`NKLD(p,\\hat{p}) = 2\\frac{e^{KLD(p,\\hat{p})}}{e^{KLD(p,\\hat{p})}+1}-1`,</span>
|
||||
<span class="sd"> where</span>
|
||||
<span class="sd"> :math:`\\mathcal{Y}` are the classes of interest.</span>
|
||||
<span class="sd"> The distributions are smoothed using the `eps` factor (see :meth:`quapy.error.smooth`).</span>
|
||||
|
||||
<span class="sd"> :param prevs: array-like of shape `(n_classes,)` with the true prevalence values</span>
|
||||
<span class="sd"> :param prevs_hat: array-like of shape `(n_classes,)` with the predicted prevalence values</span>
|
||||
<span class="sd"> :param eps: smoothing factor. NKLD is not defined in cases in which the distributions</span>
|
||||
<span class="sd"> contain zeros; `eps` is typically set to be :math:`\\frac{1}{2T}`, with :math:`T` the sample</span>
|
||||
<span class="sd"> size. If `eps=None`, the sample size will be taken from the environment variable</span>
|
||||
<span class="sd"> `SAMPLE_SIZE` (which has thus to be set beforehand).</span>
|
||||
<span class="sd"> :return: Normalized Kullback-Leibler divergence between the two distributions</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="n">ekld</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">exp</span><span class="p">(</span><span class="n">kld</span><span class="p">(</span><span class="n">prevs</span><span class="p">,</span> <span class="n">prevs_hat</span><span class="p">,</span> <span class="n">eps</span><span class="p">))</span>
|
||||
<span class="k">return</span> <span class="mf">2.</span> <span class="o">*</span> <span class="n">ekld</span> <span class="o">/</span> <span class="p">(</span><span class="mi">1</span> <span class="o">+</span> <span class="n">ekld</span><span class="p">)</span> <span class="o">-</span> <span class="mf">1.</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="mrae"><a class="viewcode-back" href="../../quapy.html#quapy.error.mrae">[docs]</a><span class="k">def</span> <span class="nf">mrae</span><span class="p">(</span><span class="n">prevs</span><span class="p">,</span> <span class="n">prevs_hat</span><span class="p">,</span> <span class="n">eps</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""Computes the mean relative absolute error (see :meth:`quapy.error.rae`) across</span>
|
||||
<span class="sd"> the sample pairs. The distributions are smoothed using the `eps` factor (see</span>
|
||||
<span class="sd"> :meth:`quapy.error.smooth`).</span>
|
||||
|
||||
<span class="sd"> :param prevs: array-like of shape `(n_samples, n_classes,)` with the true</span>
|
||||
<span class="sd"> prevalence values</span>
|
||||
<span class="sd"> :param prevs_hat: array-like of shape `(n_samples, n_classes,)` with the predicted</span>
|
||||
<span class="sd"> prevalence values</span>
|
||||
<span class="sd"> :param eps: smoothing factor. `mrae` is not defined in cases in which the true</span>
|
||||
<span class="sd"> distribution contains zeros; `eps` is typically set to be :math:`\\frac{1}{2T}`,</span>
|
||||
<span class="sd"> with :math:`T` the sample size. If `eps=None`, the sample size will be taken from</span>
|
||||
<span class="sd"> the environment variable `SAMPLE_SIZE` (which has thus to be set beforehand).</span>
|
||||
<span class="sd"> :return: mean relative absolute error</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">return</span> <span class="n">rae</span><span class="p">(</span><span class="n">prevs</span><span class="p">,</span> <span class="n">prevs_hat</span><span class="p">,</span> <span class="n">eps</span><span class="p">)</span><span class="o">.</span><span class="n">mean</span><span class="p">()</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="rae"><a class="viewcode-back" href="../../quapy.html#quapy.error.rae">[docs]</a><span class="k">def</span> <span class="nf">rae</span><span class="p">(</span><span class="n">prevs</span><span class="p">,</span> <span class="n">prevs_hat</span><span class="p">,</span> <span class="n">eps</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""Computes the absolute relative error between the two prevalence vectors.</span>
|
||||
<span class="sd"> Relative absolute error between two prevalence vectors :math:`p` and :math:`\\hat{p}`</span>
|
||||
<span class="sd"> is computed as</span>
|
||||
<span class="sd"> :math:`RAE(p,\\hat{p})=</span>
|
||||
<span class="sd"> \\frac{1}{|\\mathcal{Y}|}\\sum_{y\\in \\mathcal{Y}}\\frac{|\\hat{p}(y)-p(y)|}{p(y)}`,</span>
|
||||
<span class="sd"> where :math:`\\mathcal{Y}` are the classes of interest.</span>
|
||||
<span class="sd"> The distributions are smoothed using the `eps` factor (see :meth:`quapy.error.smooth`).</span>
|
||||
|
||||
<span class="sd"> :param prevs: array-like of shape `(n_classes,)` with the true prevalence values</span>
|
||||
<span class="sd"> :param prevs_hat: array-like of shape `(n_classes,)` with the predicted prevalence values</span>
|
||||
<span class="sd"> :param eps: smoothing factor. `rae` is not defined in cases in which the true distribution</span>
|
||||
<span class="sd"> contains zeros; `eps` is typically set to be :math:`\\frac{1}{2T}`, with :math:`T` the</span>
|
||||
<span class="sd"> sample size. If `eps=None`, the sample size will be taken from the environment variable</span>
|
||||
<span class="sd"> `SAMPLE_SIZE` (which has thus to be set beforehand).</span>
|
||||
<span class="sd"> :return: relative absolute error</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="n">eps</span> <span class="o">=</span> <span class="n">__check_eps</span><span class="p">(</span><span class="n">eps</span><span class="p">)</span>
|
||||
<span class="n">prevs</span> <span class="o">=</span> <span class="n">smooth</span><span class="p">(</span><span class="n">prevs</span><span class="p">,</span> <span class="n">eps</span><span class="p">)</span>
|
||||
<span class="n">prevs_hat</span> <span class="o">=</span> <span class="n">smooth</span><span class="p">(</span><span class="n">prevs_hat</span><span class="p">,</span> <span class="n">eps</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="p">(</span><span class="nb">abs</span><span class="p">(</span><span class="n">prevs</span> <span class="o">-</span> <span class="n">prevs_hat</span><span class="p">)</span> <span class="o">/</span> <span class="n">prevs</span><span class="p">)</span><span class="o">.</span><span class="n">mean</span><span class="p">(</span><span class="n">axis</span><span class="o">=-</span><span class="mi">1</span><span class="p">)</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="nrae"><a class="viewcode-back" href="../../quapy.html#quapy.error.nrae">[docs]</a><span class="k">def</span> <span class="nf">nrae</span><span class="p">(</span><span class="n">prevs</span><span class="p">,</span> <span class="n">prevs_hat</span><span class="p">,</span> <span class="n">eps</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""Computes the normalized absolute relative error between the two prevalence vectors.</span>
|
||||
<span class="sd"> Relative absolute error between two prevalence vectors :math:`p` and :math:`\\hat{p}`</span>
|
||||
<span class="sd"> is computed as</span>
|
||||
<span class="sd"> :math:`NRAE(p,\\hat{p})= \\frac{RAE(p,\\hat{p})}{z_{RAE}}`,</span>
|
||||
<span class="sd"> where</span>
|
||||
<span class="sd"> :math:`z_{RAE} = \\frac{|\\mathcal{Y}|-1+\\frac{1-\\min_{y\\in \\mathcal{Y}} p(y)}{\\min_{y\\in \\mathcal{Y}} p(y)}}{|\\mathcal{Y}|}`</span>
|
||||
<span class="sd"> and :math:`\\mathcal{Y}` are the classes of interest.</span>
|
||||
<span class="sd"> The distributions are smoothed using the `eps` factor (see :meth:`quapy.error.smooth`).</span>
|
||||
|
||||
<span class="sd"> :param prevs: array-like of shape `(n_classes,)` with the true prevalence values</span>
|
||||
<span class="sd"> :param prevs_hat: array-like of shape `(n_classes,)` with the predicted prevalence values</span>
|
||||
<span class="sd"> :param eps: smoothing factor. `nrae` is not defined in cases in which the true distribution</span>
|
||||
<span class="sd"> contains zeros; `eps` is typically set to be :math:`\\frac{1}{2T}`, with :math:`T` the</span>
|
||||
<span class="sd"> sample size. If `eps=None`, the sample size will be taken from the environment variable</span>
|
||||
<span class="sd"> `SAMPLE_SIZE` (which has thus to be set beforehand).</span>
|
||||
<span class="sd"> :return: normalized relative absolute error</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="n">eps</span> <span class="o">=</span> <span class="n">__check_eps</span><span class="p">(</span><span class="n">eps</span><span class="p">)</span>
|
||||
<span class="n">prevs</span> <span class="o">=</span> <span class="n">smooth</span><span class="p">(</span><span class="n">prevs</span><span class="p">,</span> <span class="n">eps</span><span class="p">)</span>
|
||||
<span class="n">prevs_hat</span> <span class="o">=</span> <span class="n">smooth</span><span class="p">(</span><span class="n">prevs_hat</span><span class="p">,</span> <span class="n">eps</span><span class="p">)</span>
|
||||
<span class="n">min_p</span> <span class="o">=</span> <span class="n">prevs</span><span class="o">.</span><span class="n">min</span><span class="p">(</span><span class="n">axis</span><span class="o">=-</span><span class="mi">1</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="p">(</span><span class="nb">abs</span><span class="p">(</span><span class="n">prevs</span> <span class="o">-</span> <span class="n">prevs_hat</span><span class="p">)</span> <span class="o">/</span> <span class="n">prevs</span><span class="p">)</span><span class="o">.</span><span class="n">sum</span><span class="p">(</span><span class="n">axis</span><span class="o">=-</span><span class="mi">1</span><span class="p">)</span><span class="o">/</span><span class="p">(</span><span class="n">prevs</span><span class="o">.</span><span class="n">shape</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span><span class="o">-</span><span class="mi">1</span><span class="o">+</span><span class="p">(</span><span class="mi">1</span><span class="o">-</span><span class="n">min_p</span><span class="p">)</span><span class="o">/</span><span class="n">min_p</span><span class="p">)</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="mnrae"><a class="viewcode-back" href="../../quapy.html#quapy.error.mnrae">[docs]</a><span class="k">def</span> <span class="nf">mnrae</span><span class="p">(</span><span class="n">prevs</span><span class="p">,</span> <span class="n">prevs_hat</span><span class="p">,</span> <span class="n">eps</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""Computes the mean normalized relative absolute error (see :meth:`quapy.error.nrae`) across</span>
|
||||
<span class="sd"> the sample pairs. The distributions are smoothed using the `eps` factor (see</span>
|
||||
<span class="sd"> :meth:`quapy.error.smooth`).</span>
|
||||
|
||||
<span class="sd"> :param prevs: array-like of shape `(n_samples, n_classes,)` with the true</span>
|
||||
<span class="sd"> prevalence values</span>
|
||||
<span class="sd"> :param prevs_hat: array-like of shape `(n_samples, n_classes,)` with the predicted</span>
|
||||
<span class="sd"> prevalence values</span>
|
||||
<span class="sd"> :param eps: smoothing factor. `mnrae` is not defined in cases in which the true</span>
|
||||
<span class="sd"> distribution contains zeros; `eps` is typically set to be :math:`\\frac{1}{2T}`,</span>
|
||||
<span class="sd"> with :math:`T` the sample size. If `eps=None`, the sample size will be taken from</span>
|
||||
<span class="sd"> the environment variable `SAMPLE_SIZE` (which has thus to be set beforehand).</span>
|
||||
<span class="sd"> :return: mean normalized relative absolute error</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">return</span> <span class="n">nrae</span><span class="p">(</span><span class="n">prevs</span><span class="p">,</span> <span class="n">prevs_hat</span><span class="p">,</span> <span class="n">eps</span><span class="p">)</span><span class="o">.</span><span class="n">mean</span><span class="p">()</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="smooth"><a class="viewcode-back" href="../../quapy.html#quapy.error.smooth">[docs]</a><span class="k">def</span> <span class="nf">smooth</span><span class="p">(</span><span class="n">prevs</span><span class="p">,</span> <span class="n">eps</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">""" Smooths a prevalence distribution with :math:`\\epsilon` (`eps`) as:</span>
|
||||
<span class="sd"> :math:`\\underline{p}(y)=\\frac{\\epsilon+p(y)}{\\epsilon|\\mathcal{Y}|+</span>
|
||||
<span class="sd"> \\displaystyle\\sum_{y\\in \\mathcal{Y}}p(y)}`</span>
|
||||
|
||||
<span class="sd"> :param prevs: array-like of shape `(n_classes,)` with the true prevalence values</span>
|
||||
<span class="sd"> :param eps: smoothing factor</span>
|
||||
<span class="sd"> :return: array-like of shape `(n_classes,)` with the smoothed distribution</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="n">n_classes</span> <span class="o">=</span> <span class="n">prevs</span><span class="o">.</span><span class="n">shape</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span>
|
||||
<span class="k">return</span> <span class="p">(</span><span class="n">prevs</span> <span class="o">+</span> <span class="n">eps</span><span class="p">)</span> <span class="o">/</span> <span class="p">(</span><span class="n">eps</span> <span class="o">*</span> <span class="n">n_classes</span> <span class="o">+</span> <span class="mi">1</span><span class="p">)</span></div>
|
||||
|
||||
|
||||
<span class="k">def</span> <span class="nf">__check_eps</span><span class="p">(</span><span class="n">eps</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
|
||||
<span class="k">if</span> <span class="n">eps</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">:</span>
|
||||
<span class="n">sample_size</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">environ</span><span class="p">[</span><span class="s1">'SAMPLE_SIZE'</span><span class="p">]</span>
|
||||
<span class="k">if</span> <span class="n">sample_size</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">:</span>
|
||||
<span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span><span class="s1">'eps was not defined, and qp.environ["SAMPLE_SIZE"] was not set'</span><span class="p">)</span>
|
||||
<span class="n">eps</span> <span class="o">=</span> <span class="mf">1.</span> <span class="o">/</span> <span class="p">(</span><span class="mf">2.</span> <span class="o">*</span> <span class="n">sample_size</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="n">eps</span>
|
||||
|
||||
|
||||
<span class="n">CLASSIFICATION_ERROR</span> <span class="o">=</span> <span class="p">{</span><span class="n">f1e</span><span class="p">,</span> <span class="n">acce</span><span class="p">}</span>
|
||||
<span class="n">QUANTIFICATION_ERROR</span> <span class="o">=</span> <span class="p">{</span><span class="n">mae</span><span class="p">,</span> <span class="n">mnae</span><span class="p">,</span> <span class="n">mrae</span><span class="p">,</span> <span class="n">mnrae</span><span class="p">,</span> <span class="n">mse</span><span class="p">,</span> <span class="n">mkld</span><span class="p">,</span> <span class="n">mnkld</span><span class="p">}</span>
|
||||
<span class="n">QUANTIFICATION_ERROR_SINGLE</span> <span class="o">=</span> <span class="p">{</span><span class="n">ae</span><span class="p">,</span> <span class="n">nae</span><span class="p">,</span> <span class="n">rae</span><span class="p">,</span> <span class="n">nrae</span><span class="p">,</span> <span class="n">se</span><span class="p">,</span> <span class="n">kld</span><span class="p">,</span> <span class="n">nkld</span><span class="p">}</span>
|
||||
<span class="n">QUANTIFICATION_ERROR_SMOOTH</span> <span class="o">=</span> <span class="p">{</span><span class="n">kld</span><span class="p">,</span> <span class="n">nkld</span><span class="p">,</span> <span class="n">rae</span><span class="p">,</span> <span class="n">nrae</span><span class="p">,</span> <span class="n">mkld</span><span class="p">,</span> <span class="n">mnkld</span><span class="p">,</span> <span class="n">mrae</span><span class="p">}</span>
|
||||
<span class="n">CLASSIFICATION_ERROR_NAMES</span> <span class="o">=</span> <span class="p">{</span><span class="n">func</span><span class="o">.</span><span class="vm">__name__</span> <span class="k">for</span> <span class="n">func</span> <span class="ow">in</span> <span class="n">CLASSIFICATION_ERROR</span><span class="p">}</span>
|
||||
<span class="n">QUANTIFICATION_ERROR_NAMES</span> <span class="o">=</span> <span class="p">{</span><span class="n">func</span><span class="o">.</span><span class="vm">__name__</span> <span class="k">for</span> <span class="n">func</span> <span class="ow">in</span> <span class="n">QUANTIFICATION_ERROR</span><span class="p">}</span>
|
||||
<span class="n">QUANTIFICATION_ERROR_SINGLE_NAMES</span> <span class="o">=</span> <span class="p">{</span><span class="n">func</span><span class="o">.</span><span class="vm">__name__</span> <span class="k">for</span> <span class="n">func</span> <span class="ow">in</span> <span class="n">QUANTIFICATION_ERROR_SINGLE</span><span class="p">}</span>
|
||||
<span class="n">QUANTIFICATION_ERROR_SMOOTH_NAMES</span> <span class="o">=</span> <span class="p">{</span><span class="n">func</span><span class="o">.</span><span class="vm">__name__</span> <span class="k">for</span> <span class="n">func</span> <span class="ow">in</span> <span class="n">QUANTIFICATION_ERROR_SMOOTH</span><span class="p">}</span>
|
||||
<span class="n">ERROR_NAMES</span> <span class="o">=</span> \
|
||||
<span class="n">CLASSIFICATION_ERROR_NAMES</span> <span class="o">|</span> <span class="n">QUANTIFICATION_ERROR_NAMES</span> <span class="o">|</span> <span class="n">QUANTIFICATION_ERROR_SINGLE_NAMES</span>
|
||||
|
||||
<span class="n">f1_error</span> <span class="o">=</span> <span class="n">f1e</span>
|
||||
<span class="n">acc_error</span> <span class="o">=</span> <span class="n">acce</span>
|
||||
<span class="n">mean_absolute_error</span> <span class="o">=</span> <span class="n">mae</span>
|
||||
<span class="n">absolute_error</span> <span class="o">=</span> <span class="n">ae</span>
|
||||
<span class="n">mean_relative_absolute_error</span> <span class="o">=</span> <span class="n">mrae</span>
|
||||
<span class="n">relative_absolute_error</span> <span class="o">=</span> <span class="n">rae</span>
|
||||
<span class="n">normalized_absolute_error</span> <span class="o">=</span> <span class="n">nae</span>
|
||||
<span class="n">normalized_relative_absolute_error</span> <span class="o">=</span> <span class="n">nrae</span>
|
||||
<span class="n">mean_normalized_absolute_error</span> <span class="o">=</span> <span class="n">mnae</span>
|
||||
<span class="n">mean_normalized_relative_absolute_error</span> <span class="o">=</span> <span class="n">mnrae</span>
|
||||
</pre></div>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
<footer>
|
||||
|
||||
<hr/>
|
||||
|
||||
<div role="contentinfo">
|
||||
<p>© Copyright 2024, Alejandro Moreo.</p>
|
||||
</div>
|
||||
|
||||
Built with <a href="https://www.sphinx-doc.org/">Sphinx</a> using a
|
||||
<a href="https://github.com/readthedocs/sphinx_rtd_theme">theme</a>
|
||||
provided by <a href="https://readthedocs.org">Read the Docs</a>.
|
||||
|
||||
|
||||
</footer>
|
||||
</div>
|
||||
</div>
|
||||
</section>
|
||||
</div>
|
||||
<script>
|
||||
jQuery(function () {
|
||||
SphinxRtdTheme.Navigation.enable(true);
|
||||
});
|
||||
</script>
|
||||
|
||||
</body>
|
||||
</html>
|
||||
|
|
@ -0,0 +1,291 @@
|
|||
<!DOCTYPE html>
|
||||
<html class="writer-html5" lang="en">
|
||||
<head>
|
||||
<meta charset="utf-8" />
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
|
||||
<title>quapy.evaluation — QuaPy: A Python-based open-source framework for quantification 0.1.8 documentation</title>
|
||||
<link rel="stylesheet" type="text/css" href="../../_static/pygments.css" />
|
||||
<link rel="stylesheet" type="text/css" href="../../_static/css/theme.css" />
|
||||
|
||||
|
||||
<!--[if lt IE 9]>
|
||||
<script src="../../_static/js/html5shiv.min.js"></script>
|
||||
<![endif]-->
|
||||
|
||||
<script data-url_root="../../" id="documentation_options" src="../../_static/documentation_options.js"></script>
|
||||
<script src="../../_static/jquery.js"></script>
|
||||
<script src="../../_static/underscore.js"></script>
|
||||
<script src="../../_static/_sphinx_javascript_frameworks_compat.js"></script>
|
||||
<script src="../../_static/doctools.js"></script>
|
||||
<script src="../../_static/sphinx_highlight.js"></script>
|
||||
<script src="../../_static/js/theme.js"></script>
|
||||
<link rel="index" title="Index" href="../../genindex.html" />
|
||||
<link rel="search" title="Search" href="../../search.html" />
|
||||
</head>
|
||||
|
||||
<body class="wy-body-for-nav">
|
||||
<div class="wy-grid-for-nav">
|
||||
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
|
||||
<div class="wy-side-scroll">
|
||||
<div class="wy-side-nav-search" >
|
||||
|
||||
|
||||
|
||||
<a href="../../index.html" class="icon icon-home">
|
||||
QuaPy: A Python-based open-source framework for quantification
|
||||
</a>
|
||||
<div role="search">
|
||||
<form id="rtd-search-form" class="wy-form" action="../../search.html" method="get">
|
||||
<input type="text" name="q" placeholder="Search docs" aria-label="Search docs" />
|
||||
<input type="hidden" name="check_keywords" value="yes" />
|
||||
<input type="hidden" name="area" value="default" />
|
||||
</form>
|
||||
</div>
|
||||
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../../modules.html">quapy</a></li>
|
||||
</ul>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
</nav>
|
||||
|
||||
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" >
|
||||
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
|
||||
<a href="../../index.html">QuaPy: A Python-based open-source framework for quantification</a>
|
||||
</nav>
|
||||
|
||||
<div class="wy-nav-content">
|
||||
<div class="rst-content">
|
||||
<div role="navigation" aria-label="Page navigation">
|
||||
<ul class="wy-breadcrumbs">
|
||||
<li><a href="../../index.html" class="icon icon-home" aria-label="Home"></a></li>
|
||||
<li class="breadcrumb-item"><a href="../index.html">Module code</a></li>
|
||||
<li class="breadcrumb-item active">quapy.evaluation</li>
|
||||
<li class="wy-breadcrumbs-aside">
|
||||
</li>
|
||||
</ul>
|
||||
<hr/>
|
||||
</div>
|
||||
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
|
||||
<div itemprop="articleBody">
|
||||
|
||||
<h1>Source code for quapy.evaluation</h1><div class="highlight"><pre>
|
||||
<span></span><span class="kn">from</span> <span class="nn">typing</span> <span class="kn">import</span> <span class="n">Union</span><span class="p">,</span> <span class="n">Callable</span><span class="p">,</span> <span class="n">Iterable</span>
|
||||
<span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span>
|
||||
<span class="kn">from</span> <span class="nn">tqdm</span> <span class="kn">import</span> <span class="n">tqdm</span>
|
||||
<span class="kn">import</span> <span class="nn">quapy</span> <span class="k">as</span> <span class="nn">qp</span>
|
||||
<span class="kn">from</span> <span class="nn">quapy.protocol</span> <span class="kn">import</span> <span class="n">AbstractProtocol</span><span class="p">,</span> <span class="n">OnLabelledCollectionProtocol</span><span class="p">,</span> <span class="n">IterateProtocol</span>
|
||||
<span class="kn">from</span> <span class="nn">quapy.method.base</span> <span class="kn">import</span> <span class="n">BaseQuantifier</span>
|
||||
<span class="kn">import</span> <span class="nn">pandas</span> <span class="k">as</span> <span class="nn">pd</span>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="prediction"><a class="viewcode-back" href="../../quapy.html#quapy.evaluation.prediction">[docs]</a><span class="k">def</span> <span class="nf">prediction</span><span class="p">(</span>
|
||||
<span class="n">model</span><span class="p">:</span> <span class="n">BaseQuantifier</span><span class="p">,</span>
|
||||
<span class="n">protocol</span><span class="p">:</span> <span class="n">AbstractProtocol</span><span class="p">,</span>
|
||||
<span class="n">aggr_speedup</span><span class="p">:</span> <span class="n">Union</span><span class="p">[</span><span class="nb">str</span><span class="p">,</span> <span class="nb">bool</span><span class="p">]</span> <span class="o">=</span> <span class="s1">'auto'</span><span class="p">,</span>
|
||||
<span class="n">verbose</span><span class="o">=</span><span class="kc">False</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Uses a quantification model to generate predictions for the samples generated via a specific protocol.</span>
|
||||
<span class="sd"> This function is central to all evaluation processes, and is endowed with an optimization to speed-up the</span>
|
||||
<span class="sd"> prediction of protocols that generate samples from a large collection. The optimization applies to aggregative</span>
|
||||
<span class="sd"> quantifiers only, and to OnLabelledCollectionProtocol protocols, and comes down to generating the classification</span>
|
||||
<span class="sd"> predictions once and for all, and then generating samples over the classification predictions (instead of over</span>
|
||||
<span class="sd"> the raw instances), so that the classifier prediction is never called again. This behaviour is obtained by</span>
|
||||
<span class="sd"> setting `aggr_speedup` to 'auto' or True, and is only carried out if the overall process is convenient in terms</span>
|
||||
<span class="sd"> of computations (e.g., if the number of classification predictions needed for the original collection exceed the</span>
|
||||
<span class="sd"> number of classification predictions needed for all samples, then the optimization is not undertaken).</span>
|
||||
|
||||
<span class="sd"> :param model: a quantifier, instance of :class:`quapy.method.base.BaseQuantifier`</span>
|
||||
<span class="sd"> :param protocol: :class:`quapy.protocol.AbstractProtocol`; if this object is also instance of</span>
|
||||
<span class="sd"> :class:`quapy.protocol.OnLabelledCollectionProtocol`, then the aggregation speed-up can be run. This is the protocol</span>
|
||||
<span class="sd"> in charge of generating the samples for which the model has to issue class prevalence predictions.</span>
|
||||
<span class="sd"> :param aggr_speedup: whether or not to apply the speed-up. Set to "force" for applying it even if the number of</span>
|
||||
<span class="sd"> instances in the original collection on which the protocol acts is larger than the number of instances</span>
|
||||
<span class="sd"> in the samples to be generated. Set to True or "auto" (default) for letting QuaPy decide whether it is</span>
|
||||
<span class="sd"> convenient or not. Set to False to deactivate.</span>
|
||||
<span class="sd"> :param verbose: boolean, show or not information in stdout</span>
|
||||
<span class="sd"> :return: a tuple `(true_prevs, estim_prevs)` in which each element in the tuple is an array of shape</span>
|
||||
<span class="sd"> `(n_samples, n_classes)` containing the true, or predicted, prevalence values for each sample</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">assert</span> <span class="n">aggr_speedup</span> <span class="ow">in</span> <span class="p">[</span><span class="kc">False</span><span class="p">,</span> <span class="kc">True</span><span class="p">,</span> <span class="s1">'auto'</span><span class="p">,</span> <span class="s1">'force'</span><span class="p">],</span> <span class="s1">'invalid value for aggr_speedup'</span>
|
||||
|
||||
<span class="n">sout</span> <span class="o">=</span> <span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="nb">print</span><span class="p">(</span><span class="n">x</span><span class="p">)</span> <span class="k">if</span> <span class="n">verbose</span> <span class="k">else</span> <span class="kc">None</span>
|
||||
|
||||
<span class="n">apply_optimization</span> <span class="o">=</span> <span class="kc">False</span>
|
||||
|
||||
<span class="k">if</span> <span class="n">aggr_speedup</span> <span class="ow">in</span> <span class="p">[</span><span class="kc">True</span><span class="p">,</span> <span class="s1">'auto'</span><span class="p">,</span> <span class="s1">'force'</span><span class="p">]:</span>
|
||||
<span class="c1"># checks whether the prediction can be made more efficiently; this check consists in verifying if the model is</span>
|
||||
<span class="c1"># of type aggregative, if the protocol is based on LabelledCollection, and if the total number of documents to</span>
|
||||
<span class="c1"># classify using the protocol would exceed the number of test documents in the original collection</span>
|
||||
<span class="kn">from</span> <span class="nn">quapy.method.aggregative</span> <span class="kn">import</span> <span class="n">AggregativeQuantifier</span>
|
||||
<span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">model</span><span class="p">,</span> <span class="n">AggregativeQuantifier</span><span class="p">)</span> <span class="ow">and</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">protocol</span><span class="p">,</span> <span class="n">OnLabelledCollectionProtocol</span><span class="p">):</span>
|
||||
<span class="k">if</span> <span class="n">aggr_speedup</span> <span class="o">==</span> <span class="s1">'force'</span><span class="p">:</span>
|
||||
<span class="n">apply_optimization</span> <span class="o">=</span> <span class="kc">True</span>
|
||||
<span class="n">sout</span><span class="p">(</span><span class="sa">f</span><span class="s1">'forcing aggregative speedup'</span><span class="p">)</span>
|
||||
<span class="k">elif</span> <span class="nb">hasattr</span><span class="p">(</span><span class="n">protocol</span><span class="p">,</span> <span class="s1">'sample_size'</span><span class="p">):</span>
|
||||
<span class="n">nD</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">protocol</span><span class="o">.</span><span class="n">get_labelled_collection</span><span class="p">())</span>
|
||||
<span class="n">samplesD</span> <span class="o">=</span> <span class="n">protocol</span><span class="o">.</span><span class="n">total</span><span class="p">()</span> <span class="o">*</span> <span class="n">protocol</span><span class="o">.</span><span class="n">sample_size</span>
|
||||
<span class="k">if</span> <span class="n">nD</span> <span class="o"><</span> <span class="n">samplesD</span><span class="p">:</span>
|
||||
<span class="n">apply_optimization</span> <span class="o">=</span> <span class="kc">True</span>
|
||||
<span class="n">sout</span><span class="p">(</span><span class="sa">f</span><span class="s1">'speeding up the prediction for the aggregative quantifier, '</span>
|
||||
<span class="sa">f</span><span class="s1">'total classifications </span><span class="si">{</span><span class="n">nD</span><span class="si">}</span><span class="s1"> instead of </span><span class="si">{</span><span class="n">samplesD</span><span class="si">}</span><span class="s1">'</span><span class="p">)</span>
|
||||
|
||||
<span class="k">if</span> <span class="n">apply_optimization</span><span class="p">:</span>
|
||||
<span class="n">pre_classified</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">classify</span><span class="p">(</span><span class="n">protocol</span><span class="o">.</span><span class="n">get_labelled_collection</span><span class="p">()</span><span class="o">.</span><span class="n">instances</span><span class="p">)</span>
|
||||
<span class="n">protocol_with_predictions</span> <span class="o">=</span> <span class="n">protocol</span><span class="o">.</span><span class="n">on_preclassified_instances</span><span class="p">(</span><span class="n">pre_classified</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="n">__prediction_helper</span><span class="p">(</span><span class="n">model</span><span class="o">.</span><span class="n">aggregate</span><span class="p">,</span> <span class="n">protocol_with_predictions</span><span class="p">,</span> <span class="n">verbose</span><span class="p">)</span>
|
||||
<span class="k">else</span><span class="p">:</span>
|
||||
<span class="k">return</span> <span class="n">__prediction_helper</span><span class="p">(</span><span class="n">model</span><span class="o">.</span><span class="n">quantify</span><span class="p">,</span> <span class="n">protocol</span><span class="p">,</span> <span class="n">verbose</span><span class="p">)</span></div>
|
||||
|
||||
|
||||
<span class="k">def</span> <span class="nf">__prediction_helper</span><span class="p">(</span><span class="n">quantification_fn</span><span class="p">,</span> <span class="n">protocol</span><span class="p">:</span> <span class="n">AbstractProtocol</span><span class="p">,</span> <span class="n">verbose</span><span class="o">=</span><span class="kc">False</span><span class="p">):</span>
|
||||
<span class="n">true_prevs</span><span class="p">,</span> <span class="n">estim_prevs</span> <span class="o">=</span> <span class="p">[],</span> <span class="p">[]</span>
|
||||
<span class="k">for</span> <span class="n">sample_instances</span><span class="p">,</span> <span class="n">sample_prev</span> <span class="ow">in</span> <span class="n">tqdm</span><span class="p">(</span><span class="n">protocol</span><span class="p">(),</span> <span class="n">total</span><span class="o">=</span><span class="n">protocol</span><span class="o">.</span><span class="n">total</span><span class="p">(),</span> <span class="n">desc</span><span class="o">=</span><span class="s1">'predicting'</span><span class="p">)</span> <span class="k">if</span> <span class="n">verbose</span> <span class="k">else</span> <span class="n">protocol</span><span class="p">():</span>
|
||||
<span class="n">estim_prevs</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">quantification_fn</span><span class="p">(</span><span class="n">sample_instances</span><span class="p">))</span>
|
||||
<span class="n">true_prevs</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">sample_prev</span><span class="p">)</span>
|
||||
|
||||
<span class="n">true_prevs</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">asarray</span><span class="p">(</span><span class="n">true_prevs</span><span class="p">)</span>
|
||||
<span class="n">estim_prevs</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">asarray</span><span class="p">(</span><span class="n">estim_prevs</span><span class="p">)</span>
|
||||
|
||||
<span class="k">return</span> <span class="n">true_prevs</span><span class="p">,</span> <span class="n">estim_prevs</span>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="evaluation_report"><a class="viewcode-back" href="../../quapy.html#quapy.evaluation.evaluation_report">[docs]</a><span class="k">def</span> <span class="nf">evaluation_report</span><span class="p">(</span><span class="n">model</span><span class="p">:</span> <span class="n">BaseQuantifier</span><span class="p">,</span>
|
||||
<span class="n">protocol</span><span class="p">:</span> <span class="n">AbstractProtocol</span><span class="p">,</span>
|
||||
<span class="n">error_metrics</span><span class="p">:</span> <span class="n">Iterable</span><span class="p">[</span><span class="n">Union</span><span class="p">[</span><span class="nb">str</span><span class="p">,</span><span class="n">Callable</span><span class="p">]]</span> <span class="o">=</span> <span class="s1">'mae'</span><span class="p">,</span>
|
||||
<span class="n">aggr_speedup</span><span class="p">:</span> <span class="n">Union</span><span class="p">[</span><span class="nb">str</span><span class="p">,</span> <span class="nb">bool</span><span class="p">]</span> <span class="o">=</span> <span class="s1">'auto'</span><span class="p">,</span>
|
||||
<span class="n">verbose</span><span class="o">=</span><span class="kc">False</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Generates a report (a pandas' DataFrame) containing information of the evaluation of the model as according</span>
|
||||
<span class="sd"> to a specific protocol and in terms of one or more evaluation metrics (errors).</span>
|
||||
|
||||
|
||||
<span class="sd"> :param model: a quantifier, instance of :class:`quapy.method.base.BaseQuantifier`</span>
|
||||
<span class="sd"> :param protocol: :class:`quapy.protocol.AbstractProtocol`; if this object is also instance of</span>
|
||||
<span class="sd"> :class:`quapy.protocol.OnLabelledCollectionProtocol`, then the aggregation speed-up can be run. This is the protocol</span>
|
||||
<span class="sd"> in charge of generating the samples in which the model is evaluated.</span>
|
||||
<span class="sd"> :param error_metrics: a string, or list of strings, representing the name(s) of an error function in `qp.error`</span>
|
||||
<span class="sd"> (e.g., 'mae', the default value), or a callable function, or a list of callable functions, implementing</span>
|
||||
<span class="sd"> the error function itself.</span>
|
||||
<span class="sd"> :param aggr_speedup: whether or not to apply the speed-up. Set to "force" for applying it even if the number of</span>
|
||||
<span class="sd"> instances in the original collection on which the protocol acts is larger than the number of instances</span>
|
||||
<span class="sd"> in the samples to be generated. Set to True or "auto" (default) for letting QuaPy decide whether it is</span>
|
||||
<span class="sd"> convenient or not. Set to False to deactivate.</span>
|
||||
<span class="sd"> :param verbose: boolean, show or not information in stdout</span>
|
||||
<span class="sd"> :return: a pandas' DataFrame containing the columns 'true-prev' (the true prevalence of each sample),</span>
|
||||
<span class="sd"> 'estim-prev' (the prevalence estimated by the model for each sample), and as many columns as error metrics</span>
|
||||
<span class="sd"> have been indicated, each displaying the score in terms of that metric for every sample.</span>
|
||||
<span class="sd"> """</span>
|
||||
|
||||
<span class="n">true_prevs</span><span class="p">,</span> <span class="n">estim_prevs</span> <span class="o">=</span> <span class="n">prediction</span><span class="p">(</span><span class="n">model</span><span class="p">,</span> <span class="n">protocol</span><span class="p">,</span> <span class="n">aggr_speedup</span><span class="o">=</span><span class="n">aggr_speedup</span><span class="p">,</span> <span class="n">verbose</span><span class="o">=</span><span class="n">verbose</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="n">_prevalence_report</span><span class="p">(</span><span class="n">true_prevs</span><span class="p">,</span> <span class="n">estim_prevs</span><span class="p">,</span> <span class="n">error_metrics</span><span class="p">)</span></div>
|
||||
|
||||
|
||||
<span class="k">def</span> <span class="nf">_prevalence_report</span><span class="p">(</span><span class="n">true_prevs</span><span class="p">,</span> <span class="n">estim_prevs</span><span class="p">,</span> <span class="n">error_metrics</span><span class="p">:</span> <span class="n">Iterable</span><span class="p">[</span><span class="n">Union</span><span class="p">[</span><span class="nb">str</span><span class="p">,</span> <span class="n">Callable</span><span class="p">]]</span> <span class="o">=</span> <span class="s1">'mae'</span><span class="p">):</span>
|
||||
|
||||
<span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">error_metrics</span><span class="p">,</span> <span class="nb">str</span><span class="p">):</span>
|
||||
<span class="n">error_metrics</span> <span class="o">=</span> <span class="p">[</span><span class="n">error_metrics</span><span class="p">]</span>
|
||||
|
||||
<span class="n">error_funcs</span> <span class="o">=</span> <span class="p">[</span><span class="n">qp</span><span class="o">.</span><span class="n">error</span><span class="o">.</span><span class="n">from_name</span><span class="p">(</span><span class="n">e</span><span class="p">)</span> <span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">e</span><span class="p">,</span> <span class="nb">str</span><span class="p">)</span> <span class="k">else</span> <span class="n">e</span> <span class="k">for</span> <span class="n">e</span> <span class="ow">in</span> <span class="n">error_metrics</span><span class="p">]</span>
|
||||
<span class="k">assert</span> <span class="nb">all</span><span class="p">(</span><span class="nb">hasattr</span><span class="p">(</span><span class="n">e</span><span class="p">,</span> <span class="s1">'__call__'</span><span class="p">)</span> <span class="k">for</span> <span class="n">e</span> <span class="ow">in</span> <span class="n">error_funcs</span><span class="p">),</span> <span class="s1">'invalid error functions'</span>
|
||||
<span class="n">error_names</span> <span class="o">=</span> <span class="p">[</span><span class="n">e</span><span class="o">.</span><span class="vm">__name__</span> <span class="k">for</span> <span class="n">e</span> <span class="ow">in</span> <span class="n">error_funcs</span><span class="p">]</span>
|
||||
|
||||
<span class="n">row_entries</span> <span class="o">=</span> <span class="p">[]</span>
|
||||
<span class="k">for</span> <span class="n">true_prev</span><span class="p">,</span> <span class="n">estim_prev</span> <span class="ow">in</span> <span class="nb">zip</span><span class="p">(</span><span class="n">true_prevs</span><span class="p">,</span> <span class="n">estim_prevs</span><span class="p">):</span>
|
||||
<span class="n">series</span> <span class="o">=</span> <span class="p">{</span><span class="s1">'true-prev'</span><span class="p">:</span> <span class="n">true_prev</span><span class="p">,</span> <span class="s1">'estim-prev'</span><span class="p">:</span> <span class="n">estim_prev</span><span class="p">}</span>
|
||||
<span class="k">for</span> <span class="n">error_name</span><span class="p">,</span> <span class="n">error_metric</span> <span class="ow">in</span> <span class="nb">zip</span><span class="p">(</span><span class="n">error_names</span><span class="p">,</span> <span class="n">error_funcs</span><span class="p">):</span>
|
||||
<span class="n">score</span> <span class="o">=</span> <span class="n">error_metric</span><span class="p">(</span><span class="n">true_prev</span><span class="p">,</span> <span class="n">estim_prev</span><span class="p">)</span>
|
||||
<span class="n">series</span><span class="p">[</span><span class="n">error_name</span><span class="p">]</span> <span class="o">=</span> <span class="n">score</span>
|
||||
<span class="n">row_entries</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">series</span><span class="p">)</span>
|
||||
|
||||
<span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">DataFrame</span><span class="o">.</span><span class="n">from_records</span><span class="p">(</span><span class="n">row_entries</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="n">df</span>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="evaluate"><a class="viewcode-back" href="../../quapy.html#quapy.evaluation.evaluate">[docs]</a><span class="k">def</span> <span class="nf">evaluate</span><span class="p">(</span>
|
||||
<span class="n">model</span><span class="p">:</span> <span class="n">BaseQuantifier</span><span class="p">,</span>
|
||||
<span class="n">protocol</span><span class="p">:</span> <span class="n">AbstractProtocol</span><span class="p">,</span>
|
||||
<span class="n">error_metric</span><span class="p">:</span> <span class="n">Union</span><span class="p">[</span><span class="nb">str</span><span class="p">,</span> <span class="n">Callable</span><span class="p">],</span>
|
||||
<span class="n">aggr_speedup</span><span class="p">:</span> <span class="n">Union</span><span class="p">[</span><span class="nb">str</span><span class="p">,</span> <span class="nb">bool</span><span class="p">]</span> <span class="o">=</span> <span class="s1">'auto'</span><span class="p">,</span>
|
||||
<span class="n">verbose</span><span class="o">=</span><span class="kc">False</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Evaluates a quantification model according to a specific sample generation protocol and in terms of one</span>
|
||||
<span class="sd"> evaluation metric (error).</span>
|
||||
|
||||
<span class="sd"> :param model: a quantifier, instance of :class:`quapy.method.base.BaseQuantifier`</span>
|
||||
<span class="sd"> :param protocol: :class:`quapy.protocol.AbstractProtocol`; if this object is also instance of</span>
|
||||
<span class="sd"> :class:`quapy.protocol.OnLabelledCollectionProtocol`, then the aggregation speed-up can be run. This is the</span>
|
||||
<span class="sd"> protocol in charge of generating the samples in which the model is evaluated.</span>
|
||||
<span class="sd"> :param error_metric: a string representing the name(s) of an error function in `qp.error`</span>
|
||||
<span class="sd"> (e.g., 'mae'), or a callable function implementing the error function itself.</span>
|
||||
<span class="sd"> :param aggr_speedup: whether or not to apply the speed-up. Set to "force" for applying it even if the number of</span>
|
||||
<span class="sd"> instances in the original collection on which the protocol acts is larger than the number of instances</span>
|
||||
<span class="sd"> in the samples to be generated. Set to True or "auto" (default) for letting QuaPy decide whether it is</span>
|
||||
<span class="sd"> convenient or not. Set to False to deactivate.</span>
|
||||
<span class="sd"> :param verbose: boolean, show or not information in stdout</span>
|
||||
<span class="sd"> :return: if the error metric is not averaged (e.g., 'ae', 'rae'), returns an array of shape `(n_samples,)` with</span>
|
||||
<span class="sd"> the error scores for each sample; if the error metric is averaged (e.g., 'mae', 'mrae') then returns</span>
|
||||
<span class="sd"> a single float</span>
|
||||
<span class="sd"> """</span>
|
||||
|
||||
<span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">error_metric</span><span class="p">,</span> <span class="nb">str</span><span class="p">):</span>
|
||||
<span class="n">error_metric</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">error</span><span class="o">.</span><span class="n">from_name</span><span class="p">(</span><span class="n">error_metric</span><span class="p">)</span>
|
||||
<span class="n">true_prevs</span><span class="p">,</span> <span class="n">estim_prevs</span> <span class="o">=</span> <span class="n">prediction</span><span class="p">(</span><span class="n">model</span><span class="p">,</span> <span class="n">protocol</span><span class="p">,</span> <span class="n">aggr_speedup</span><span class="o">=</span><span class="n">aggr_speedup</span><span class="p">,</span> <span class="n">verbose</span><span class="o">=</span><span class="n">verbose</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="n">error_metric</span><span class="p">(</span><span class="n">true_prevs</span><span class="p">,</span> <span class="n">estim_prevs</span><span class="p">)</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="evaluate_on_samples"><a class="viewcode-back" href="../../quapy.html#quapy.evaluation.evaluate_on_samples">[docs]</a><span class="k">def</span> <span class="nf">evaluate_on_samples</span><span class="p">(</span>
|
||||
<span class="n">model</span><span class="p">:</span> <span class="n">BaseQuantifier</span><span class="p">,</span>
|
||||
<span class="n">samples</span><span class="p">:</span> <span class="n">Iterable</span><span class="p">[</span><span class="n">qp</span><span class="o">.</span><span class="n">data</span><span class="o">.</span><span class="n">LabelledCollection</span><span class="p">],</span>
|
||||
<span class="n">error_metric</span><span class="p">:</span> <span class="n">Union</span><span class="p">[</span><span class="nb">str</span><span class="p">,</span> <span class="n">Callable</span><span class="p">],</span>
|
||||
<span class="n">verbose</span><span class="o">=</span><span class="kc">False</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Evaluates a quantification model on a given set of samples and in terms of one evaluation metric (error).</span>
|
||||
|
||||
<span class="sd"> :param model: a quantifier, instance of :class:`quapy.method.base.BaseQuantifier`</span>
|
||||
<span class="sd"> :param samples: a list of samples on which the quantifier is to be evaluated</span>
|
||||
<span class="sd"> :param error_metric: a string representing the name(s) of an error function in `qp.error`</span>
|
||||
<span class="sd"> (e.g., 'mae'), or a callable function implementing the error function itself.</span>
|
||||
<span class="sd"> :param verbose: boolean, show or not information in stdout</span>
|
||||
<span class="sd"> :return: if the error metric is not averaged (e.g., 'ae', 'rae'), returns an array of shape `(n_samples,)` with</span>
|
||||
<span class="sd"> the error scores for each sample; if the error metric is averaged (e.g., 'mae', 'mrae') then returns</span>
|
||||
<span class="sd"> a single float</span>
|
||||
<span class="sd"> """</span>
|
||||
|
||||
<span class="k">return</span> <span class="n">evaluate</span><span class="p">(</span><span class="n">model</span><span class="p">,</span> <span class="n">IterateProtocol</span><span class="p">(</span><span class="n">samples</span><span class="p">),</span> <span class="n">error_metric</span><span class="p">,</span> <span class="n">aggr_speedup</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span> <span class="n">verbose</span><span class="o">=</span><span class="n">verbose</span><span class="p">)</span></div>
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
</pre></div>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
<footer>
|
||||
|
||||
<hr/>
|
||||
|
||||
<div role="contentinfo">
|
||||
<p>© Copyright 2024, Alejandro Moreo.</p>
|
||||
</div>
|
||||
|
||||
Built with <a href="https://www.sphinx-doc.org/">Sphinx</a> using a
|
||||
<a href="https://github.com/readthedocs/sphinx_rtd_theme">theme</a>
|
||||
provided by <a href="https://readthedocs.org">Read the Docs</a>.
|
||||
|
||||
|
||||
</footer>
|
||||
</div>
|
||||
</div>
|
||||
</section>
|
||||
</div>
|
||||
<script>
|
||||
jQuery(function () {
|
||||
SphinxRtdTheme.Navigation.enable(true);
|
||||
});
|
||||
</script>
|
||||
|
||||
</body>
|
||||
</html>
|
||||
|
|
@ -0,0 +1,468 @@
|
|||
<!DOCTYPE html>
|
||||
<html class="writer-html5" lang="en">
|
||||
<head>
|
||||
<meta charset="utf-8" />
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
|
||||
<title>quapy.functional — QuaPy: A Python-based open-source framework for quantification 0.1.8 documentation</title>
|
||||
<link rel="stylesheet" type="text/css" href="../../_static/pygments.css" />
|
||||
<link rel="stylesheet" type="text/css" href="../../_static/css/theme.css" />
|
||||
|
||||
|
||||
<!--[if lt IE 9]>
|
||||
<script src="../../_static/js/html5shiv.min.js"></script>
|
||||
<![endif]-->
|
||||
|
||||
<script data-url_root="../../" id="documentation_options" src="../../_static/documentation_options.js"></script>
|
||||
<script src="../../_static/jquery.js"></script>
|
||||
<script src="../../_static/underscore.js"></script>
|
||||
<script src="../../_static/_sphinx_javascript_frameworks_compat.js"></script>
|
||||
<script src="../../_static/doctools.js"></script>
|
||||
<script src="../../_static/sphinx_highlight.js"></script>
|
||||
<script src="../../_static/js/theme.js"></script>
|
||||
<link rel="index" title="Index" href="../../genindex.html" />
|
||||
<link rel="search" title="Search" href="../../search.html" />
|
||||
</head>
|
||||
|
||||
<body class="wy-body-for-nav">
|
||||
<div class="wy-grid-for-nav">
|
||||
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
|
||||
<div class="wy-side-scroll">
|
||||
<div class="wy-side-nav-search" >
|
||||
|
||||
|
||||
|
||||
<a href="../../index.html" class="icon icon-home">
|
||||
QuaPy: A Python-based open-source framework for quantification
|
||||
</a>
|
||||
<div role="search">
|
||||
<form id="rtd-search-form" class="wy-form" action="../../search.html" method="get">
|
||||
<input type="text" name="q" placeholder="Search docs" aria-label="Search docs" />
|
||||
<input type="hidden" name="check_keywords" value="yes" />
|
||||
<input type="hidden" name="area" value="default" />
|
||||
</form>
|
||||
</div>
|
||||
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../../modules.html">quapy</a></li>
|
||||
</ul>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
</nav>
|
||||
|
||||
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" >
|
||||
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
|
||||
<a href="../../index.html">QuaPy: A Python-based open-source framework for quantification</a>
|
||||
</nav>
|
||||
|
||||
<div class="wy-nav-content">
|
||||
<div class="rst-content">
|
||||
<div role="navigation" aria-label="Page navigation">
|
||||
<ul class="wy-breadcrumbs">
|
||||
<li><a href="../../index.html" class="icon icon-home" aria-label="Home"></a></li>
|
||||
<li class="breadcrumb-item"><a href="../index.html">Module code</a></li>
|
||||
<li class="breadcrumb-item active">quapy.functional</li>
|
||||
<li class="wy-breadcrumbs-aside">
|
||||
</li>
|
||||
</ul>
|
||||
<hr/>
|
||||
</div>
|
||||
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
|
||||
<div itemprop="articleBody">
|
||||
|
||||
<h1>Source code for quapy.functional</h1><div class="highlight"><pre>
|
||||
<span></span><span class="kn">import</span> <span class="nn">itertools</span>
|
||||
<span class="kn">from</span> <span class="nn">collections</span> <span class="kn">import</span> <span class="n">defaultdict</span>
|
||||
<span class="kn">from</span> <span class="nn">typing</span> <span class="kn">import</span> <span class="n">Union</span><span class="p">,</span> <span class="n">Callable</span>
|
||||
|
||||
<span class="kn">import</span> <span class="nn">scipy</span>
|
||||
<span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="prevalence_linspace"><a class="viewcode-back" href="../../quapy.html#quapy.functional.prevalence_linspace">[docs]</a><span class="k">def</span> <span class="nf">prevalence_linspace</span><span class="p">(</span><span class="n">n_prevalences</span><span class="o">=</span><span class="mi">21</span><span class="p">,</span> <span class="n">repeats</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">smooth_limits_epsilon</span><span class="o">=</span><span class="mf">0.01</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Produces an array of uniformly separated values of prevalence.</span>
|
||||
<span class="sd"> By default, produces an array of 21 prevalence values, with</span>
|
||||
<span class="sd"> step 0.05 and with the limits smoothed, i.e.:</span>
|
||||
<span class="sd"> [0.01, 0.05, 0.10, 0.15, ..., 0.90, 0.95, 0.99]</span>
|
||||
|
||||
<span class="sd"> :param n_prevalences: the number of prevalence values to sample from the [0,1] interval (default 21)</span>
|
||||
<span class="sd"> :param repeats: number of times each prevalence is to be repeated (defaults to 1)</span>
|
||||
<span class="sd"> :param smooth_limits_epsilon: the quantity to add and subtract to the limits 0 and 1</span>
|
||||
<span class="sd"> :return: an array of uniformly separated prevalence values</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="n">p</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">linspace</span><span class="p">(</span><span class="mf">0.</span><span class="p">,</span> <span class="mf">1.</span><span class="p">,</span> <span class="n">num</span><span class="o">=</span><span class="n">n_prevalences</span><span class="p">,</span> <span class="n">endpoint</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
|
||||
<span class="n">p</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">+=</span> <span class="n">smooth_limits_epsilon</span>
|
||||
<span class="n">p</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span> <span class="o">-=</span> <span class="n">smooth_limits_epsilon</span>
|
||||
<span class="k">if</span> <span class="n">p</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">></span> <span class="n">p</span><span class="p">[</span><span class="mi">1</span><span class="p">]:</span>
|
||||
<span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span><span class="sa">f</span><span class="s1">'the smoothing in the limits is greater than the prevalence step'</span><span class="p">)</span>
|
||||
<span class="k">if</span> <span class="n">repeats</span> <span class="o">></span> <span class="mi">1</span><span class="p">:</span>
|
||||
<span class="n">p</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">repeat</span><span class="p">(</span><span class="n">p</span><span class="p">,</span> <span class="n">repeats</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="n">p</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="prevalence_from_labels"><a class="viewcode-back" href="../../quapy.html#quapy.functional.prevalence_from_labels">[docs]</a><span class="k">def</span> <span class="nf">prevalence_from_labels</span><span class="p">(</span><span class="n">labels</span><span class="p">,</span> <span class="n">classes</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Computed the prevalence values from a vector of labels.</span>
|
||||
|
||||
<span class="sd"> :param labels: array-like of shape `(n_instances)` with the label for each instance</span>
|
||||
<span class="sd"> :param classes: the class labels. This is needed in order to correctly compute the prevalence vector even when</span>
|
||||
<span class="sd"> some classes have no examples.</span>
|
||||
<span class="sd"> :return: an ndarray of shape `(len(classes))` with the class prevalence values</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">if</span> <span class="n">labels</span><span class="o">.</span><span class="n">ndim</span> <span class="o">!=</span> <span class="mi">1</span><span class="p">:</span>
|
||||
<span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span><span class="sa">f</span><span class="s1">'param labels does not seem to be a ndarray of label predictions'</span><span class="p">)</span>
|
||||
<span class="n">unique</span><span class="p">,</span> <span class="n">counts</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">unique</span><span class="p">(</span><span class="n">labels</span><span class="p">,</span> <span class="n">return_counts</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
|
||||
<span class="n">by_class</span> <span class="o">=</span> <span class="n">defaultdict</span><span class="p">(</span><span class="k">lambda</span><span class="p">:</span><span class="mi">0</span><span class="p">,</span> <span class="nb">dict</span><span class="p">(</span><span class="nb">zip</span><span class="p">(</span><span class="n">unique</span><span class="p">,</span> <span class="n">counts</span><span class="p">)))</span>
|
||||
<span class="n">prevalences</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">asarray</span><span class="p">([</span><span class="n">by_class</span><span class="p">[</span><span class="n">class_</span><span class="p">]</span> <span class="k">for</span> <span class="n">class_</span> <span class="ow">in</span> <span class="n">classes</span><span class="p">],</span> <span class="n">dtype</span><span class="o">=</span><span class="nb">float</span><span class="p">)</span>
|
||||
<span class="n">prevalences</span> <span class="o">/=</span> <span class="n">prevalences</span><span class="o">.</span><span class="n">sum</span><span class="p">()</span>
|
||||
<span class="k">return</span> <span class="n">prevalences</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="prevalence_from_probabilities"><a class="viewcode-back" href="../../quapy.html#quapy.functional.prevalence_from_probabilities">[docs]</a><span class="k">def</span> <span class="nf">prevalence_from_probabilities</span><span class="p">(</span><span class="n">posteriors</span><span class="p">,</span> <span class="n">binarize</span><span class="p">:</span> <span class="nb">bool</span> <span class="o">=</span> <span class="kc">False</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Returns a vector of prevalence values from a matrix of posterior probabilities.</span>
|
||||
|
||||
<span class="sd"> :param posteriors: array-like of shape `(n_instances, n_classes,)` with posterior probabilities for each class</span>
|
||||
<span class="sd"> :param binarize: set to True (default is False) for computing the prevalence values on crisp decisions (i.e.,</span>
|
||||
<span class="sd"> converting the vectors of posterior probabilities into class indices, by taking the argmax).</span>
|
||||
<span class="sd"> :return: array of shape `(n_classes,)` containing the prevalence values</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">if</span> <span class="n">posteriors</span><span class="o">.</span><span class="n">ndim</span> <span class="o">!=</span> <span class="mi">2</span><span class="p">:</span>
|
||||
<span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span><span class="sa">f</span><span class="s1">'param posteriors does not seem to be a ndarray of posteior probabilities'</span><span class="p">)</span>
|
||||
<span class="k">if</span> <span class="n">binarize</span><span class="p">:</span>
|
||||
<span class="n">predictions</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">argmax</span><span class="p">(</span><span class="n">posteriors</span><span class="p">,</span> <span class="n">axis</span><span class="o">=-</span><span class="mi">1</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="n">prevalence_from_labels</span><span class="p">(</span><span class="n">predictions</span><span class="p">,</span> <span class="n">np</span><span class="o">.</span><span class="n">arange</span><span class="p">(</span><span class="n">posteriors</span><span class="o">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">]))</span>
|
||||
<span class="k">else</span><span class="p">:</span>
|
||||
<span class="n">prevalences</span> <span class="o">=</span> <span class="n">posteriors</span><span class="o">.</span><span class="n">mean</span><span class="p">(</span><span class="n">axis</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
|
||||
<span class="n">prevalences</span> <span class="o">/=</span> <span class="n">prevalences</span><span class="o">.</span><span class="n">sum</span><span class="p">()</span>
|
||||
<span class="k">return</span> <span class="n">prevalences</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="as_binary_prevalence"><a class="viewcode-back" href="../../quapy.html#quapy.functional.as_binary_prevalence">[docs]</a><span class="k">def</span> <span class="nf">as_binary_prevalence</span><span class="p">(</span><span class="n">positive_prevalence</span><span class="p">:</span> <span class="n">Union</span><span class="p">[</span><span class="nb">float</span><span class="p">,</span> <span class="n">np</span><span class="o">.</span><span class="n">ndarray</span><span class="p">],</span> <span class="n">clip_if_necessary</span><span class="o">=</span><span class="kc">False</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Helper that, given a float representing the prevalence for the positive class, returns a np.ndarray of two</span>
|
||||
<span class="sd"> values representing a binary distribution.</span>
|
||||
|
||||
<span class="sd"> :param positive_prevalence: prevalence for the positive class</span>
|
||||
<span class="sd"> :param clip_if_necessary: if True, clips the value in [0,1] in order to guarantee the resulting distribution</span>
|
||||
<span class="sd"> is valid. If False, it then checks that the value is in the valid range, and raises an error if not.</span>
|
||||
<span class="sd"> :return: np.ndarray of shape `(2,)`</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">if</span> <span class="n">clip_if_necessary</span><span class="p">:</span>
|
||||
<span class="n">positive_prevalence</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">clip</span><span class="p">(</span><span class="n">positive_prevalence</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
|
||||
<span class="k">else</span><span class="p">:</span>
|
||||
<span class="k">assert</span> <span class="mi">0</span> <span class="o"><=</span> <span class="n">positive_prevalence</span> <span class="o"><=</span> <span class="mi">1</span><span class="p">,</span> <span class="s1">'the value provided is not a valid prevalence for the positive class'</span>
|
||||
<span class="k">return</span> <span class="n">np</span><span class="o">.</span><span class="n">asarray</span><span class="p">([</span><span class="mi">1</span><span class="o">-</span><span class="n">positive_prevalence</span><span class="p">,</span> <span class="n">positive_prevalence</span><span class="p">])</span><span class="o">.</span><span class="n">T</span></div>
|
||||
|
||||
|
||||
|
||||
<div class="viewcode-block" id="HellingerDistance"><a class="viewcode-back" href="../../quapy.html#quapy.functional.HellingerDistance">[docs]</a><span class="k">def</span> <span class="nf">HellingerDistance</span><span class="p">(</span><span class="n">P</span><span class="p">,</span> <span class="n">Q</span><span class="p">)</span> <span class="o">-></span> <span class="nb">float</span><span class="p">:</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Computes the Hellingher Distance (HD) between (discretized) distributions `P` and `Q`.</span>
|
||||
<span class="sd"> The HD for two discrete distributions of `k` bins is defined as:</span>
|
||||
|
||||
<span class="sd"> .. math::</span>
|
||||
<span class="sd"> HD(P,Q) = \\frac{ 1 }{ \\sqrt{ 2 } } \\sqrt{ \\sum_{i=1}^k ( \\sqrt{p_i} - \\sqrt{q_i} )^2 }</span>
|
||||
|
||||
<span class="sd"> :param P: real-valued array-like of shape `(k,)` representing a discrete distribution</span>
|
||||
<span class="sd"> :param Q: real-valued array-like of shape `(k,)` representing a discrete distribution</span>
|
||||
<span class="sd"> :return: float</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">return</span> <span class="n">np</span><span class="o">.</span><span class="n">sqrt</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">sum</span><span class="p">((</span><span class="n">np</span><span class="o">.</span><span class="n">sqrt</span><span class="p">(</span><span class="n">P</span><span class="p">)</span> <span class="o">-</span> <span class="n">np</span><span class="o">.</span><span class="n">sqrt</span><span class="p">(</span><span class="n">Q</span><span class="p">))</span><span class="o">**</span><span class="mi">2</span><span class="p">))</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="TopsoeDistance"><a class="viewcode-back" href="../../quapy.html#quapy.functional.TopsoeDistance">[docs]</a><span class="k">def</span> <span class="nf">TopsoeDistance</span><span class="p">(</span><span class="n">P</span><span class="p">,</span> <span class="n">Q</span><span class="p">,</span> <span class="n">epsilon</span><span class="o">=</span><span class="mf">1e-20</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Topsoe distance between two (discretized) distributions `P` and `Q`.</span>
|
||||
<span class="sd"> The Topsoe distance for two discrete distributions of `k` bins is defined as:</span>
|
||||
|
||||
<span class="sd"> .. math::</span>
|
||||
<span class="sd"> Topsoe(P,Q) = \\sum_{i=1}^k \\left( p_i \\log\\left(\\frac{ 2 p_i + \\epsilon }{ p_i+q_i+\\epsilon }\\right) +</span>
|
||||
<span class="sd"> q_i \\log\\left(\\frac{ 2 q_i + \\epsilon }{ p_i+q_i+\\epsilon }\\right) \\right)</span>
|
||||
|
||||
<span class="sd"> :param P: real-valued array-like of shape `(k,)` representing a discrete distribution</span>
|
||||
<span class="sd"> :param Q: real-valued array-like of shape `(k,)` representing a discrete distribution</span>
|
||||
<span class="sd"> :return: float</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">return</span> <span class="n">np</span><span class="o">.</span><span class="n">sum</span><span class="p">(</span><span class="n">P</span><span class="o">*</span><span class="n">np</span><span class="o">.</span><span class="n">log</span><span class="p">((</span><span class="mi">2</span><span class="o">*</span><span class="n">P</span><span class="o">+</span><span class="n">epsilon</span><span class="p">)</span><span class="o">/</span><span class="p">(</span><span class="n">P</span><span class="o">+</span><span class="n">Q</span><span class="o">+</span><span class="n">epsilon</span><span class="p">))</span> <span class="o">+</span> <span class="n">Q</span><span class="o">*</span><span class="n">np</span><span class="o">.</span><span class="n">log</span><span class="p">((</span><span class="mi">2</span><span class="o">*</span><span class="n">Q</span><span class="o">+</span><span class="n">epsilon</span><span class="p">)</span><span class="o">/</span><span class="p">(</span><span class="n">P</span><span class="o">+</span><span class="n">Q</span><span class="o">+</span><span class="n">epsilon</span><span class="p">)))</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="uniform_prevalence_sampling"><a class="viewcode-back" href="../../quapy.html#quapy.functional.uniform_prevalence_sampling">[docs]</a><span class="k">def</span> <span class="nf">uniform_prevalence_sampling</span><span class="p">(</span><span class="n">n_classes</span><span class="p">,</span> <span class="n">size</span><span class="o">=</span><span class="mi">1</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Implements the `Kraemer algorithm <http://www.cs.cmu.edu/~nasmith/papers/smith+tromble.tr04.pdf>`_</span>
|
||||
<span class="sd"> for sampling uniformly at random from the unit simplex. This implementation is adapted from this</span>
|
||||
<span class="sd"> `post <https://cs.stackexchange.com/questions/3227/uniform-sampling-from-a-simplex>_`.</span>
|
||||
|
||||
<span class="sd"> :param n_classes: integer, number of classes (dimensionality of the simplex)</span>
|
||||
<span class="sd"> :param size: number of samples to return</span>
|
||||
<span class="sd"> :return: `np.ndarray` of shape `(size, n_classes,)` if `size>1`, or of shape `(n_classes,)` otherwise</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">if</span> <span class="n">n_classes</span> <span class="o">==</span> <span class="mi">2</span><span class="p">:</span>
|
||||
<span class="n">u</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">rand</span><span class="p">(</span><span class="n">size</span><span class="p">)</span>
|
||||
<span class="n">u</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">vstack</span><span class="p">([</span><span class="mi">1</span><span class="o">-</span><span class="n">u</span><span class="p">,</span> <span class="n">u</span><span class="p">])</span><span class="o">.</span><span class="n">T</span>
|
||||
<span class="k">else</span><span class="p">:</span>
|
||||
<span class="n">u</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">rand</span><span class="p">(</span><span class="n">size</span><span class="p">,</span> <span class="n">n_classes</span><span class="o">-</span><span class="mi">1</span><span class="p">)</span>
|
||||
<span class="n">u</span><span class="o">.</span><span class="n">sort</span><span class="p">(</span><span class="n">axis</span><span class="o">=-</span><span class="mi">1</span><span class="p">)</span>
|
||||
<span class="n">_0s</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">zeros</span><span class="p">(</span><span class="n">shape</span><span class="o">=</span><span class="p">(</span><span class="n">size</span><span class="p">,</span> <span class="mi">1</span><span class="p">))</span>
|
||||
<span class="n">_1s</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">ones</span><span class="p">(</span><span class="n">shape</span><span class="o">=</span><span class="p">(</span><span class="n">size</span><span class="p">,</span> <span class="mi">1</span><span class="p">))</span>
|
||||
<span class="n">a</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">hstack</span><span class="p">([</span><span class="n">_0s</span><span class="p">,</span> <span class="n">u</span><span class="p">])</span>
|
||||
<span class="n">b</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">hstack</span><span class="p">([</span><span class="n">u</span><span class="p">,</span> <span class="n">_1s</span><span class="p">])</span>
|
||||
<span class="n">u</span> <span class="o">=</span> <span class="n">b</span><span class="o">-</span><span class="n">a</span>
|
||||
<span class="k">if</span> <span class="n">size</span> <span class="o">==</span> <span class="mi">1</span><span class="p">:</span>
|
||||
<span class="n">u</span> <span class="o">=</span> <span class="n">u</span><span class="o">.</span><span class="n">flatten</span><span class="p">()</span>
|
||||
<span class="k">return</span> <span class="n">u</span></div>
|
||||
|
||||
|
||||
<span class="n">uniform_simplex_sampling</span> <span class="o">=</span> <span class="n">uniform_prevalence_sampling</span>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="strprev"><a class="viewcode-back" href="../../quapy.html#quapy.functional.strprev">[docs]</a><span class="k">def</span> <span class="nf">strprev</span><span class="p">(</span><span class="n">prevalences</span><span class="p">,</span> <span class="n">prec</span><span class="o">=</span><span class="mi">3</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Returns a string representation for a prevalence vector. E.g.,</span>
|
||||
|
||||
<span class="sd"> >>> strprev([1/3, 2/3], prec=2)</span>
|
||||
<span class="sd"> >>> '[0.33, 0.67]'</span>
|
||||
|
||||
<span class="sd"> :param prevalences: a vector of prevalence values</span>
|
||||
<span class="sd"> :param prec: float precision</span>
|
||||
<span class="sd"> :return: string</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">return</span> <span class="s1">'['</span><span class="o">+</span> <span class="s1">', '</span><span class="o">.</span><span class="n">join</span><span class="p">([</span><span class="sa">f</span><span class="s1">'</span><span class="si">{</span><span class="n">p</span><span class="si">:</span><span class="s1">.</span><span class="si">{</span><span class="n">prec</span><span class="si">}</span><span class="s1">f</span><span class="si">}</span><span class="s1">'</span> <span class="k">for</span> <span class="n">p</span> <span class="ow">in</span> <span class="n">prevalences</span><span class="p">])</span> <span class="o">+</span> <span class="s1">']'</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="adjusted_quantification"><a class="viewcode-back" href="../../quapy.html#quapy.functional.adjusted_quantification">[docs]</a><span class="k">def</span> <span class="nf">adjusted_quantification</span><span class="p">(</span><span class="n">prevalence_estim</span><span class="p">,</span> <span class="n">tpr</span><span class="p">,</span> <span class="n">fpr</span><span class="p">,</span> <span class="n">clip</span><span class="o">=</span><span class="kc">True</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Implements the adjustment of ACC and PACC for the binary case. The adjustment for a prevalence estimate of the</span>
|
||||
<span class="sd"> positive class `p` comes down to computing:</span>
|
||||
|
||||
<span class="sd"> .. math::</span>
|
||||
<span class="sd"> ACC(p) = \\frac{ p - fpr }{ tpr - fpr }</span>
|
||||
|
||||
<span class="sd"> :param prevalence_estim: float, the estimated value for the positive class</span>
|
||||
<span class="sd"> :param tpr: float, the true positive rate of the classifier</span>
|
||||
<span class="sd"> :param fpr: float, the false positive rate of the classifier</span>
|
||||
<span class="sd"> :param clip: set to True (default) to clip values that might exceed the range [0,1]</span>
|
||||
<span class="sd"> :return: float, the adjusted count</span>
|
||||
<span class="sd"> """</span>
|
||||
|
||||
<span class="n">den</span> <span class="o">=</span> <span class="n">tpr</span> <span class="o">-</span> <span class="n">fpr</span>
|
||||
<span class="k">if</span> <span class="n">den</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>
|
||||
<span class="n">den</span> <span class="o">+=</span> <span class="mf">1e-8</span>
|
||||
<span class="n">adjusted</span> <span class="o">=</span> <span class="p">(</span><span class="n">prevalence_estim</span> <span class="o">-</span> <span class="n">fpr</span><span class="p">)</span> <span class="o">/</span> <span class="n">den</span>
|
||||
<span class="k">if</span> <span class="n">clip</span><span class="p">:</span>
|
||||
<span class="n">adjusted</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">clip</span><span class="p">(</span><span class="n">adjusted</span><span class="p">,</span> <span class="mf">0.</span><span class="p">,</span> <span class="mf">1.</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="n">adjusted</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="normalize_prevalence"><a class="viewcode-back" href="../../quapy.html#quapy.functional.normalize_prevalence">[docs]</a><span class="k">def</span> <span class="nf">normalize_prevalence</span><span class="p">(</span><span class="n">prevalences</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Normalize a vector or matrix of prevalence values. The normalization consists of applying a L1 normalization in</span>
|
||||
<span class="sd"> cases in which the prevalence values are not all-zeros, and to convert the prevalence values into `1/n_classes` in</span>
|
||||
<span class="sd"> cases in which all values are zero.</span>
|
||||
|
||||
<span class="sd"> :param prevalences: array-like of shape `(n_classes,)` or of shape `(n_samples, n_classes,)` with prevalence values</span>
|
||||
<span class="sd"> :return: a normalized vector or matrix of prevalence values</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="n">prevalences</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">asarray</span><span class="p">(</span><span class="n">prevalences</span><span class="p">)</span>
|
||||
<span class="n">n_classes</span> <span class="o">=</span> <span class="n">prevalences</span><span class="o">.</span><span class="n">shape</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span>
|
||||
<span class="n">accum</span> <span class="o">=</span> <span class="n">prevalences</span><span class="o">.</span><span class="n">sum</span><span class="p">(</span><span class="n">axis</span><span class="o">=-</span><span class="mi">1</span><span class="p">,</span> <span class="n">keepdims</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
|
||||
<span class="n">prevalences</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">true_divide</span><span class="p">(</span><span class="n">prevalences</span><span class="p">,</span> <span class="n">accum</span><span class="p">,</span> <span class="n">where</span><span class="o">=</span><span class="n">accum</span><span class="o">></span><span class="mi">0</span><span class="p">)</span>
|
||||
<span class="n">allzeros</span> <span class="o">=</span> <span class="n">accum</span><span class="o">.</span><span class="n">flatten</span><span class="p">()</span><span class="o">==</span><span class="mi">0</span>
|
||||
<span class="k">if</span> <span class="nb">any</span><span class="p">(</span><span class="n">allzeros</span><span class="p">):</span>
|
||||
<span class="k">if</span> <span class="n">prevalences</span><span class="o">.</span><span class="n">ndim</span> <span class="o">==</span> <span class="mi">1</span><span class="p">:</span>
|
||||
<span class="n">prevalences</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">full</span><span class="p">(</span><span class="n">shape</span><span class="o">=</span><span class="n">n_classes</span><span class="p">,</span> <span class="n">fill_value</span><span class="o">=</span><span class="mf">1.</span><span class="o">/</span><span class="n">n_classes</span><span class="p">)</span>
|
||||
<span class="k">else</span><span class="p">:</span>
|
||||
<span class="n">prevalences</span><span class="p">[</span><span class="n">accum</span><span class="o">.</span><span class="n">flatten</span><span class="p">()</span><span class="o">==</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">full</span><span class="p">(</span><span class="n">shape</span><span class="o">=</span><span class="n">n_classes</span><span class="p">,</span> <span class="n">fill_value</span><span class="o">=</span><span class="mf">1.</span><span class="o">/</span><span class="n">n_classes</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="n">prevalences</span></div>
|
||||
|
||||
|
||||
<span class="k">def</span> <span class="nf">__num_prevalence_combinations_depr</span><span class="p">(</span><span class="n">n_prevpoints</span><span class="p">:</span><span class="nb">int</span><span class="p">,</span> <span class="n">n_classes</span><span class="p">:</span><span class="nb">int</span><span class="p">,</span> <span class="n">n_repeats</span><span class="p">:</span><span class="nb">int</span><span class="o">=</span><span class="mi">1</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Computes the number of prevalence combinations in the n_classes-dimensional simplex if `nprevpoints` equally distant</span>
|
||||
<span class="sd"> prevalence values are generated and `n_repeats` repetitions are requested.</span>
|
||||
|
||||
<span class="sd"> :param n_classes: integer, number of classes</span>
|
||||
<span class="sd"> :param n_prevpoints: integer, number of prevalence points.</span>
|
||||
<span class="sd"> :param n_repeats: integer, number of repetitions for each prevalence combination</span>
|
||||
<span class="sd"> :return: The number of possible combinations. For example, if n_classes=2, n_prevpoints=5, n_repeats=1, then the</span>
|
||||
<span class="sd"> number of possible combinations are 5, i.e.: [0,1], [0.25,0.75], [0.50,0.50], [0.75,0.25], and [1.0,0.0]</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="n">__cache</span><span class="o">=</span><span class="p">{}</span>
|
||||
<span class="k">def</span> <span class="nf">__f</span><span class="p">(</span><span class="n">nc</span><span class="p">,</span><span class="n">np</span><span class="p">):</span>
|
||||
<span class="k">if</span> <span class="p">(</span><span class="n">nc</span><span class="p">,</span><span class="n">np</span><span class="p">)</span> <span class="ow">in</span> <span class="n">__cache</span><span class="p">:</span> <span class="c1"># cached result</span>
|
||||
<span class="k">return</span> <span class="n">__cache</span><span class="p">[(</span><span class="n">nc</span><span class="p">,</span><span class="n">np</span><span class="p">)]</span>
|
||||
<span class="k">if</span> <span class="n">nc</span><span class="o">==</span><span class="mi">1</span><span class="p">:</span> <span class="c1"># stop condition</span>
|
||||
<span class="k">return</span> <span class="mi">1</span>
|
||||
<span class="k">else</span><span class="p">:</span> <span class="c1"># recursive call</span>
|
||||
<span class="n">x</span> <span class="o">=</span> <span class="nb">sum</span><span class="p">([</span><span class="n">__f</span><span class="p">(</span><span class="n">nc</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="n">np</span><span class="o">-</span><span class="n">i</span><span class="p">)</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">np</span><span class="p">)])</span>
|
||||
<span class="n">__cache</span><span class="p">[(</span><span class="n">nc</span><span class="p">,</span><span class="n">np</span><span class="p">)]</span> <span class="o">=</span> <span class="n">x</span>
|
||||
<span class="k">return</span> <span class="n">x</span>
|
||||
<span class="k">return</span> <span class="n">__f</span><span class="p">(</span><span class="n">n_classes</span><span class="p">,</span> <span class="n">n_prevpoints</span><span class="p">)</span> <span class="o">*</span> <span class="n">n_repeats</span>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="num_prevalence_combinations"><a class="viewcode-back" href="../../quapy.html#quapy.functional.num_prevalence_combinations">[docs]</a><span class="k">def</span> <span class="nf">num_prevalence_combinations</span><span class="p">(</span><span class="n">n_prevpoints</span><span class="p">:</span><span class="nb">int</span><span class="p">,</span> <span class="n">n_classes</span><span class="p">:</span><span class="nb">int</span><span class="p">,</span> <span class="n">n_repeats</span><span class="p">:</span><span class="nb">int</span><span class="o">=</span><span class="mi">1</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Computes the number of valid prevalence combinations in the n_classes-dimensional simplex if `n_prevpoints` equally</span>
|
||||
<span class="sd"> distant prevalence values are generated and `n_repeats` repetitions are requested.</span>
|
||||
<span class="sd"> The computation comes down to calculating:</span>
|
||||
|
||||
<span class="sd"> .. math::</span>
|
||||
<span class="sd"> \\binom{N+C-1}{C-1} \\times r</span>
|
||||
|
||||
<span class="sd"> where `N` is `n_prevpoints-1`, i.e., the number of probability mass blocks to allocate, `C` is the number of</span>
|
||||
<span class="sd"> classes, and `r` is `n_repeats`. This solution comes from the</span>
|
||||
<span class="sd"> `Stars and Bars <https://brilliant.org/wiki/integer-equations-star-and-bars/>`_ problem.</span>
|
||||
|
||||
<span class="sd"> :param n_classes: integer, number of classes</span>
|
||||
<span class="sd"> :param n_prevpoints: integer, number of prevalence points.</span>
|
||||
<span class="sd"> :param n_repeats: integer, number of repetitions for each prevalence combination</span>
|
||||
<span class="sd"> :return: The number of possible combinations. For example, if n_classes=2, n_prevpoints=5, n_repeats=1, then the</span>
|
||||
<span class="sd"> number of possible combinations are 5, i.e.: [0,1], [0.25,0.75], [0.50,0.50], [0.75,0.25], and [1.0,0.0]</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="n">N</span> <span class="o">=</span> <span class="n">n_prevpoints</span><span class="o">-</span><span class="mi">1</span>
|
||||
<span class="n">C</span> <span class="o">=</span> <span class="n">n_classes</span>
|
||||
<span class="n">r</span> <span class="o">=</span> <span class="n">n_repeats</span>
|
||||
<span class="k">return</span> <span class="nb">int</span><span class="p">(</span><span class="n">scipy</span><span class="o">.</span><span class="n">special</span><span class="o">.</span><span class="n">binom</span><span class="p">(</span><span class="n">N</span> <span class="o">+</span> <span class="n">C</span> <span class="o">-</span> <span class="mi">1</span><span class="p">,</span> <span class="n">C</span> <span class="o">-</span> <span class="mi">1</span><span class="p">)</span> <span class="o">*</span> <span class="n">r</span><span class="p">)</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="get_nprevpoints_approximation"><a class="viewcode-back" href="../../quapy.html#quapy.functional.get_nprevpoints_approximation">[docs]</a><span class="k">def</span> <span class="nf">get_nprevpoints_approximation</span><span class="p">(</span><span class="n">combinations_budget</span><span class="p">:</span><span class="nb">int</span><span class="p">,</span> <span class="n">n_classes</span><span class="p">:</span><span class="nb">int</span><span class="p">,</span> <span class="n">n_repeats</span><span class="p">:</span><span class="nb">int</span><span class="o">=</span><span class="mi">1</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Searches for the largest number of (equidistant) prevalence points to define for each of the `n_classes` classes so</span>
|
||||
<span class="sd"> that the number of valid prevalence values generated as combinations of prevalence points (points in a</span>
|
||||
<span class="sd"> `n_classes`-dimensional simplex) do not exceed combinations_budget.</span>
|
||||
|
||||
<span class="sd"> :param combinations_budget: integer, maximum number of combinations allowed</span>
|
||||
<span class="sd"> :param n_classes: integer, number of classes</span>
|
||||
<span class="sd"> :param n_repeats: integer, number of repetitions for each prevalence combination</span>
|
||||
<span class="sd"> :return: the largest number of prevalence points that generate less than combinations_budget valid prevalences</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">assert</span> <span class="n">n_classes</span> <span class="o">></span> <span class="mi">0</span> <span class="ow">and</span> <span class="n">n_repeats</span> <span class="o">></span> <span class="mi">0</span> <span class="ow">and</span> <span class="n">combinations_budget</span> <span class="o">></span> <span class="mi">0</span><span class="p">,</span> <span class="s1">'parameters must be positive integers'</span>
|
||||
<span class="n">n_prevpoints</span> <span class="o">=</span> <span class="mi">1</span>
|
||||
<span class="k">while</span> <span class="kc">True</span><span class="p">:</span>
|
||||
<span class="n">combinations</span> <span class="o">=</span> <span class="n">num_prevalence_combinations</span><span class="p">(</span><span class="n">n_prevpoints</span><span class="p">,</span> <span class="n">n_classes</span><span class="p">,</span> <span class="n">n_repeats</span><span class="p">)</span>
|
||||
<span class="k">if</span> <span class="n">combinations</span> <span class="o">></span> <span class="n">combinations_budget</span><span class="p">:</span>
|
||||
<span class="k">return</span> <span class="n">n_prevpoints</span><span class="o">-</span><span class="mi">1</span>
|
||||
<span class="k">else</span><span class="p">:</span>
|
||||
<span class="n">n_prevpoints</span> <span class="o">+=</span> <span class="mi">1</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="check_prevalence_vector"><a class="viewcode-back" href="../../quapy.html#quapy.functional.check_prevalence_vector">[docs]</a><span class="k">def</span> <span class="nf">check_prevalence_vector</span><span class="p">(</span><span class="n">p</span><span class="p">,</span> <span class="n">raise_exception</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span> <span class="n">toleranze</span><span class="o">=</span><span class="mf">1e-08</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Checks that p is a valid prevalence vector, i.e., that it contains values in [0,1] and that the values sum up to 1.</span>
|
||||
|
||||
<span class="sd"> :param p: the prevalence vector to check</span>
|
||||
<span class="sd"> :return: True if `p` is valid, False otherwise</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="n">p</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">asarray</span><span class="p">(</span><span class="n">p</span><span class="p">)</span>
|
||||
<span class="k">if</span> <span class="ow">not</span> <span class="nb">all</span><span class="p">(</span><span class="n">p</span><span class="o">>=</span><span class="mi">0</span><span class="p">):</span>
|
||||
<span class="k">if</span> <span class="n">raise_exception</span><span class="p">:</span>
|
||||
<span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span><span class="s1">'the prevalence vector contains negative numbers'</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="kc">False</span>
|
||||
<span class="k">if</span> <span class="ow">not</span> <span class="nb">all</span><span class="p">(</span><span class="n">p</span><span class="o"><=</span><span class="mi">1</span><span class="p">):</span>
|
||||
<span class="k">if</span> <span class="n">raise_exception</span><span class="p">:</span>
|
||||
<span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span><span class="s1">'the prevalence vector contains values >1'</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="kc">False</span>
|
||||
<span class="k">if</span> <span class="ow">not</span> <span class="n">np</span><span class="o">.</span><span class="n">isclose</span><span class="p">(</span><span class="n">p</span><span class="o">.</span><span class="n">sum</span><span class="p">(),</span> <span class="mi">1</span><span class="p">,</span> <span class="n">atol</span><span class="o">=</span><span class="n">toleranze</span><span class="p">):</span>
|
||||
<span class="k">if</span> <span class="n">raise_exception</span><span class="p">:</span>
|
||||
<span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span><span class="s1">'the prevalence vector does not sum up to 1'</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="kc">False</span>
|
||||
<span class="k">return</span> <span class="kc">True</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="get_divergence"><a class="viewcode-back" href="../../quapy.html#quapy.functional.get_divergence">[docs]</a><span class="k">def</span> <span class="nf">get_divergence</span><span class="p">(</span><span class="n">divergence</span><span class="p">:</span> <span class="n">Union</span><span class="p">[</span><span class="nb">str</span><span class="p">,</span> <span class="n">Callable</span><span class="p">]):</span>
|
||||
<span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">divergence</span><span class="p">,</span> <span class="nb">str</span><span class="p">):</span>
|
||||
<span class="k">if</span> <span class="n">divergence</span><span class="o">==</span><span class="s1">'HD'</span><span class="p">:</span>
|
||||
<span class="k">return</span> <span class="n">HellingerDistance</span>
|
||||
<span class="k">elif</span> <span class="n">divergence</span><span class="o">==</span><span class="s1">'topsoe'</span><span class="p">:</span>
|
||||
<span class="k">return</span> <span class="n">TopsoeDistance</span>
|
||||
<span class="k">else</span><span class="p">:</span>
|
||||
<span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span><span class="sa">f</span><span class="s1">'unknown divergence </span><span class="si">{</span><span class="n">divergence</span><span class="si">}</span><span class="s1">'</span><span class="p">)</span>
|
||||
<span class="k">elif</span> <span class="n">callable</span><span class="p">(</span><span class="n">divergence</span><span class="p">):</span>
|
||||
<span class="k">return</span> <span class="n">divergence</span>
|
||||
<span class="k">else</span><span class="p">:</span>
|
||||
<span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span><span class="sa">f</span><span class="s1">'argument "divergence" not understood; use a str or a callable function'</span><span class="p">)</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="argmin_prevalence"><a class="viewcode-back" href="../../quapy.html#quapy.functional.argmin_prevalence">[docs]</a><span class="k">def</span> <span class="nf">argmin_prevalence</span><span class="p">(</span><span class="n">loss</span><span class="p">,</span> <span class="n">n_classes</span><span class="p">,</span> <span class="n">method</span><span class="o">=</span><span class="s1">'optim_minimize'</span><span class="p">):</span>
|
||||
<span class="k">if</span> <span class="n">method</span> <span class="o">==</span> <span class="s1">'optim_minimize'</span><span class="p">:</span>
|
||||
<span class="k">return</span> <span class="n">optim_minimize</span><span class="p">(</span><span class="n">loss</span><span class="p">,</span> <span class="n">n_classes</span><span class="p">)</span>
|
||||
<span class="k">elif</span> <span class="n">method</span> <span class="o">==</span> <span class="s1">'linear_search'</span><span class="p">:</span>
|
||||
<span class="k">return</span> <span class="n">linear_search</span><span class="p">(</span><span class="n">loss</span><span class="p">,</span> <span class="n">n_classes</span><span class="p">)</span>
|
||||
<span class="k">elif</span> <span class="n">method</span> <span class="o">==</span> <span class="s1">'ternary_search'</span><span class="p">:</span>
|
||||
<span class="k">raise</span> <span class="ne">NotImplementedError</span><span class="p">()</span>
|
||||
<span class="k">else</span><span class="p">:</span>
|
||||
<span class="k">raise</span> <span class="ne">NotImplementedError</span><span class="p">()</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="optim_minimize"><a class="viewcode-back" href="../../quapy.html#quapy.functional.optim_minimize">[docs]</a><span class="k">def</span> <span class="nf">optim_minimize</span><span class="p">(</span><span class="n">loss</span><span class="p">,</span> <span class="n">n_classes</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Searches for the optimal prevalence values, i.e., an `n_classes`-dimensional vector of the (`n_classes`-1)-simplex</span>
|
||||
<span class="sd"> that yields the smallest lost. This optimization is carried out by means of a constrained search using scipy's</span>
|
||||
<span class="sd"> SLSQP routine.</span>
|
||||
|
||||
<span class="sd"> :param loss: (callable) the function to minimize</span>
|
||||
<span class="sd"> :param n_classes: (int) the number of classes, i.e., the dimensionality of the prevalence vector</span>
|
||||
<span class="sd"> :return: (ndarray) the best prevalence vector found</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="kn">from</span> <span class="nn">scipy</span> <span class="kn">import</span> <span class="n">optimize</span>
|
||||
|
||||
<span class="c1"># the initial point is set as the uniform distribution</span>
|
||||
<span class="n">uniform_distribution</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">full</span><span class="p">(</span><span class="n">fill_value</span><span class="o">=</span><span class="mi">1</span> <span class="o">/</span> <span class="n">n_classes</span><span class="p">,</span> <span class="n">shape</span><span class="o">=</span><span class="p">(</span><span class="n">n_classes</span><span class="p">,))</span>
|
||||
|
||||
<span class="c1"># solutions are bounded to those contained in the unit-simplex</span>
|
||||
<span class="n">bounds</span> <span class="o">=</span> <span class="nb">tuple</span><span class="p">((</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span> <span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">n_classes</span><span class="p">))</span> <span class="c1"># values in [0,1]</span>
|
||||
<span class="n">constraints</span> <span class="o">=</span> <span class="p">({</span><span class="s1">'type'</span><span class="p">:</span> <span class="s1">'eq'</span><span class="p">,</span> <span class="s1">'fun'</span><span class="p">:</span> <span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="mi">1</span> <span class="o">-</span> <span class="nb">sum</span><span class="p">(</span><span class="n">x</span><span class="p">)})</span> <span class="c1"># values summing up to 1</span>
|
||||
<span class="n">r</span> <span class="o">=</span> <span class="n">optimize</span><span class="o">.</span><span class="n">minimize</span><span class="p">(</span><span class="n">loss</span><span class="p">,</span> <span class="n">x0</span><span class="o">=</span><span class="n">uniform_distribution</span><span class="p">,</span> <span class="n">method</span><span class="o">=</span><span class="s1">'SLSQP'</span><span class="p">,</span> <span class="n">bounds</span><span class="o">=</span><span class="n">bounds</span><span class="p">,</span> <span class="n">constraints</span><span class="o">=</span><span class="n">constraints</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="n">r</span><span class="o">.</span><span class="n">x</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="linear_search"><a class="viewcode-back" href="../../quapy.html#quapy.functional.linear_search">[docs]</a><span class="k">def</span> <span class="nf">linear_search</span><span class="p">(</span><span class="n">loss</span><span class="p">,</span> <span class="n">n_classes</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Performs a linear search for the best prevalence value in binary problems. The search is carried out by exploring</span>
|
||||
<span class="sd"> the range [0,1] stepping by 0.01. This search is inefficient, and is added only for completeness (some of the</span>
|
||||
<span class="sd"> early methods in quantification literature used it, e.g., HDy). A most powerful alternative is `optim_minimize`.</span>
|
||||
|
||||
<span class="sd"> :param loss: (callable) the function to minimize</span>
|
||||
<span class="sd"> :param n_classes: (int) the number of classes, i.e., the dimensionality of the prevalence vector</span>
|
||||
<span class="sd"> :return: (ndarray) the best prevalence vector found</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">assert</span> <span class="n">n_classes</span><span class="o">==</span><span class="mi">2</span><span class="p">,</span> <span class="s1">'linear search is only available for binary problems'</span>
|
||||
|
||||
<span class="n">prev_selected</span><span class="p">,</span> <span class="n">min_score</span> <span class="o">=</span> <span class="kc">None</span><span class="p">,</span> <span class="kc">None</span>
|
||||
<span class="k">for</span> <span class="n">prev</span> <span class="ow">in</span> <span class="n">prevalence_linspace</span><span class="p">(</span><span class="n">n_prevalences</span><span class="o">=</span><span class="mi">100</span><span class="p">,</span> <span class="n">repeats</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">smooth_limits_epsilon</span><span class="o">=</span><span class="mf">0.0</span><span class="p">):</span>
|
||||
<span class="n">score</span> <span class="o">=</span> <span class="n">loss</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">asarray</span><span class="p">([</span><span class="mi">1</span> <span class="o">-</span> <span class="n">prev</span><span class="p">,</span> <span class="n">prev</span><span class="p">]))</span>
|
||||
<span class="k">if</span> <span class="n">min_score</span> <span class="ow">is</span> <span class="kc">None</span> <span class="ow">or</span> <span class="n">score</span> <span class="o"><</span> <span class="n">min_score</span><span class="p">:</span>
|
||||
<span class="n">prev_selected</span><span class="p">,</span> <span class="n">min_score</span> <span class="o">=</span> <span class="n">prev</span><span class="p">,</span> <span class="n">score</span>
|
||||
|
||||
<span class="k">return</span> <span class="n">np</span><span class="o">.</span><span class="n">asarray</span><span class="p">([</span><span class="mi">1</span> <span class="o">-</span> <span class="n">prev_selected</span><span class="p">,</span> <span class="n">prev_selected</span><span class="p">])</span></div>
|
||||
</pre></div>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
<footer>
|
||||
|
||||
<hr/>
|
||||
|
||||
<div role="contentinfo">
|
||||
<p>© Copyright 2024, Alejandro Moreo.</p>
|
||||
</div>
|
||||
|
||||
Built with <a href="https://www.sphinx-doc.org/">Sphinx</a> using a
|
||||
<a href="https://github.com/readthedocs/sphinx_rtd_theme">theme</a>
|
||||
provided by <a href="https://readthedocs.org">Read the Docs</a>.
|
||||
|
||||
|
||||
</footer>
|
||||
</div>
|
||||
</div>
|
||||
</section>
|
||||
</div>
|
||||
<script>
|
||||
jQuery(function () {
|
||||
SphinxRtdTheme.Navigation.enable(true);
|
||||
});
|
||||
</script>
|
||||
|
||||
</body>
|
||||
</html>
|
||||
|
|
@ -0,0 +1,462 @@
|
|||
<!DOCTYPE html>
|
||||
<html class="writer-html5" lang="en">
|
||||
<head>
|
||||
<meta charset="utf-8" />
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
|
||||
<title>quapy.method._kdey — QuaPy: A Python-based open-source framework for quantification 0.1.8 documentation</title>
|
||||
<link rel="stylesheet" type="text/css" href="../../../_static/pygments.css" />
|
||||
<link rel="stylesheet" type="text/css" href="../../../_static/css/theme.css" />
|
||||
|
||||
|
||||
<!--[if lt IE 9]>
|
||||
<script src="../../../_static/js/html5shiv.min.js"></script>
|
||||
<![endif]-->
|
||||
|
||||
<script data-url_root="../../../" id="documentation_options" src="../../../_static/documentation_options.js"></script>
|
||||
<script src="../../../_static/jquery.js"></script>
|
||||
<script src="../../../_static/underscore.js"></script>
|
||||
<script src="../../../_static/_sphinx_javascript_frameworks_compat.js"></script>
|
||||
<script src="../../../_static/doctools.js"></script>
|
||||
<script src="../../../_static/sphinx_highlight.js"></script>
|
||||
<script src="../../../_static/js/theme.js"></script>
|
||||
<link rel="index" title="Index" href="../../../genindex.html" />
|
||||
<link rel="search" title="Search" href="../../../search.html" />
|
||||
</head>
|
||||
|
||||
<body class="wy-body-for-nav">
|
||||
<div class="wy-grid-for-nav">
|
||||
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
|
||||
<div class="wy-side-scroll">
|
||||
<div class="wy-side-nav-search" >
|
||||
|
||||
|
||||
|
||||
<a href="../../../index.html" class="icon icon-home">
|
||||
QuaPy: A Python-based open-source framework for quantification
|
||||
</a>
|
||||
<div role="search">
|
||||
<form id="rtd-search-form" class="wy-form" action="../../../search.html" method="get">
|
||||
<input type="text" name="q" placeholder="Search docs" aria-label="Search docs" />
|
||||
<input type="hidden" name="check_keywords" value="yes" />
|
||||
<input type="hidden" name="area" value="default" />
|
||||
</form>
|
||||
</div>
|
||||
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../../../modules.html">quapy</a></li>
|
||||
</ul>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
</nav>
|
||||
|
||||
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" >
|
||||
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
|
||||
<a href="../../../index.html">QuaPy: A Python-based open-source framework for quantification</a>
|
||||
</nav>
|
||||
|
||||
<div class="wy-nav-content">
|
||||
<div class="rst-content">
|
||||
<div role="navigation" aria-label="Page navigation">
|
||||
<ul class="wy-breadcrumbs">
|
||||
<li><a href="../../../index.html" class="icon icon-home" aria-label="Home"></a></li>
|
||||
<li class="breadcrumb-item"><a href="../../index.html">Module code</a></li>
|
||||
<li class="breadcrumb-item active">quapy.method._kdey</li>
|
||||
<li class="wy-breadcrumbs-aside">
|
||||
</li>
|
||||
</ul>
|
||||
<hr/>
|
||||
</div>
|
||||
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
|
||||
<div itemprop="articleBody">
|
||||
|
||||
<h1>Source code for quapy.method._kdey</h1><div class="highlight"><pre>
|
||||
<span></span><span class="kn">from</span> <span class="nn">typing</span> <span class="kn">import</span> <span class="n">Union</span>
|
||||
<span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span>
|
||||
<span class="kn">from</span> <span class="nn">sklearn.base</span> <span class="kn">import</span> <span class="n">BaseEstimator</span>
|
||||
<span class="kn">from</span> <span class="nn">sklearn.neighbors</span> <span class="kn">import</span> <span class="n">KernelDensity</span>
|
||||
|
||||
<span class="kn">import</span> <span class="nn">quapy</span> <span class="k">as</span> <span class="nn">qp</span>
|
||||
<span class="kn">from</span> <span class="nn">quapy.data</span> <span class="kn">import</span> <span class="n">LabelledCollection</span>
|
||||
<span class="kn">from</span> <span class="nn">quapy.method.aggregative</span> <span class="kn">import</span> <span class="n">AggregativeSoftQuantifier</span>
|
||||
<span class="kn">import</span> <span class="nn">quapy.functional</span> <span class="k">as</span> <span class="nn">F</span>
|
||||
|
||||
<span class="kn">from</span> <span class="nn">sklearn.metrics.pairwise</span> <span class="kn">import</span> <span class="n">rbf_kernel</span>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="KDEBase"><a class="viewcode-back" href="../../../quapy.method.html#quapy.method._kdey.KDEBase">[docs]</a><span class="k">class</span> <span class="nc">KDEBase</span><span class="p">:</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Common ancestor for KDE-based methods. Implements some common routines.</span>
|
||||
<span class="sd"> """</span>
|
||||
|
||||
<span class="n">BANDWIDTH_METHOD</span> <span class="o">=</span> <span class="p">[</span><span class="s1">'scott'</span><span class="p">,</span> <span class="s1">'silverman'</span><span class="p">]</span>
|
||||
|
||||
<span class="nd">@classmethod</span>
|
||||
<span class="k">def</span> <span class="nf">_check_bandwidth</span><span class="p">(</span><span class="bp">cls</span><span class="p">,</span> <span class="n">bandwidth</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Checks that the bandwidth parameter is correct</span>
|
||||
|
||||
<span class="sd"> :param bandwidth: either a string (see BANDWIDTH_METHOD) or a float</span>
|
||||
<span class="sd"> :return: nothing, but raises an exception for invalid values</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">assert</span> <span class="n">bandwidth</span> <span class="ow">in</span> <span class="n">KDEBase</span><span class="o">.</span><span class="n">BANDWIDTH_METHOD</span> <span class="ow">or</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">bandwidth</span><span class="p">,</span> <span class="nb">float</span><span class="p">),</span> \
|
||||
<span class="sa">f</span><span class="s1">'invalid bandwidth, valid ones are </span><span class="si">{</span><span class="n">KDEBase</span><span class="o">.</span><span class="n">BANDWIDTH_METHOD</span><span class="si">}</span><span class="s1"> or float values'</span>
|
||||
<span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">bandwidth</span><span class="p">,</span> <span class="nb">float</span><span class="p">):</span>
|
||||
<span class="k">assert</span> <span class="mi">0</span> <span class="o"><</span> <span class="n">bandwidth</span> <span class="o"><</span> <span class="mi">1</span><span class="p">,</span> <span class="s2">"the bandwith for KDEy should be in (0,1), since this method models the unit simplex"</span>
|
||||
|
||||
<div class="viewcode-block" id="KDEBase.get_kde_function"><a class="viewcode-back" href="../../../quapy.method.html#quapy.method._kdey.KDEBase.get_kde_function">[docs]</a> <span class="k">def</span> <span class="nf">get_kde_function</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">X</span><span class="p">,</span> <span class="n">bandwidth</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Wraps the KDE function from scikit-learn.</span>
|
||||
|
||||
<span class="sd"> :param X: data for which the density function is to be estimated</span>
|
||||
<span class="sd"> :param bandwidth: the bandwidth of the kernel</span>
|
||||
<span class="sd"> :return: a scikit-learn's KernelDensity object</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">return</span> <span class="n">KernelDensity</span><span class="p">(</span><span class="n">bandwidth</span><span class="o">=</span><span class="n">bandwidth</span><span class="p">)</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">X</span><span class="p">)</span></div>
|
||||
|
||||
<div class="viewcode-block" id="KDEBase.pdf"><a class="viewcode-back" href="../../../quapy.method.html#quapy.method._kdey.KDEBase.pdf">[docs]</a> <span class="k">def</span> <span class="nf">pdf</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">kde</span><span class="p">,</span> <span class="n">X</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Wraps the density evalution of scikit-learn's KDE. Scikit-learn returns log-scores (s), so this</span>
|
||||
<span class="sd"> function returns :math:`e^{s}`</span>
|
||||
|
||||
<span class="sd"> :param kde: a previously fit KDE function</span>
|
||||
<span class="sd"> :param X: the data for which the density is to be estimated</span>
|
||||
<span class="sd"> :return: np.ndarray with the densities</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">return</span> <span class="n">np</span><span class="o">.</span><span class="n">exp</span><span class="p">(</span><span class="n">kde</span><span class="o">.</span><span class="n">score_samples</span><span class="p">(</span><span class="n">X</span><span class="p">))</span></div>
|
||||
|
||||
<div class="viewcode-block" id="KDEBase.get_mixture_components"><a class="viewcode-back" href="../../../quapy.method.html#quapy.method._kdey.KDEBase.get_mixture_components">[docs]</a> <span class="k">def</span> <span class="nf">get_mixture_components</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">X</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="n">n_classes</span><span class="p">,</span> <span class="n">bandwidth</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Returns an array containing the mixture components, i.e., the KDE functions for each class.</span>
|
||||
|
||||
<span class="sd"> :param X: the data containing the covariates</span>
|
||||
<span class="sd"> :param y: the class labels</span>
|
||||
<span class="sd"> :param n_classes: integer, the number of classes</span>
|
||||
<span class="sd"> :param bandwidth: float, the bandwidth of the kernel</span>
|
||||
<span class="sd"> :return: a list of KernelDensity objects, each fitted with the corresponding class-specific covariates</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">return</span> <span class="p">[</span><span class="bp">self</span><span class="o">.</span><span class="n">get_kde_function</span><span class="p">(</span><span class="n">X</span><span class="p">[</span><span class="n">y</span> <span class="o">==</span> <span class="n">cat</span><span class="p">],</span> <span class="n">bandwidth</span><span class="p">)</span> <span class="k">for</span> <span class="n">cat</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">n_classes</span><span class="p">)]</span></div></div>
|
||||
|
||||
|
||||
|
||||
<div class="viewcode-block" id="KDEyML"><a class="viewcode-back" href="../../../quapy.method.html#quapy.method._kdey.KDEyML">[docs]</a><span class="k">class</span> <span class="nc">KDEyML</span><span class="p">(</span><span class="n">AggregativeSoftQuantifier</span><span class="p">,</span> <span class="n">KDEBase</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Kernel Density Estimation model for quantification (KDEy) relying on the Kullback-Leibler divergence (KLD) as</span>
|
||||
<span class="sd"> the divergence measure to be minimized. This method was first proposed in the paper</span>
|
||||
<span class="sd"> `Kernel Density Estimation for Multiclass Quantification <https://arxiv.org/abs/2401.00490>`_, in which</span>
|
||||
<span class="sd"> the authors show that minimizing the distribution mathing criterion for KLD is akin to performing</span>
|
||||
<span class="sd"> maximum likelihood (ML).</span>
|
||||
|
||||
<span class="sd"> The distribution matching optimization problem comes down to solving:</span>
|
||||
|
||||
<span class="sd"> :math:`\\hat{\\alpha} = \\arg\\min_{\\alpha\\in\\Delta^{n-1}} \\mathcal{D}(\\boldsymbol{p}_{\\alpha}||q_{\\widetilde{U}})`</span>
|
||||
|
||||
<span class="sd"> where :math:`p_{\\alpha}` is the mixture of class-specific KDEs with mixture parameter (hence class prevalence)</span>
|
||||
<span class="sd"> :math:`\\alpha` defined by</span>
|
||||
|
||||
<span class="sd"> :math:`\\boldsymbol{p}_{\\alpha}(\\widetilde{x}) = \\sum_{i=1}^n \\alpha_i p_{\\widetilde{L}_i}(\\widetilde{x})`</span>
|
||||
|
||||
<span class="sd"> where :math:`p_X(\\boldsymbol{x}) = \\frac{1}{|X|} \\sum_{x_i\\in X} K\\left(\\frac{x-x_i}{h}\\right)` is the</span>
|
||||
<span class="sd"> KDE function that uses the datapoints in X as the kernel centers.</span>
|
||||
|
||||
<span class="sd"> In KDEy-ML, the divergence is taken to be the Kullback-Leibler Divergence. This is equivalent to solving:</span>
|
||||
<span class="sd"> :math:`\\hat{\\alpha} = \\arg\\min_{\\alpha\\in\\Delta^{n-1}} -</span>
|
||||
<span class="sd"> \\mathbb{E}_{q_{\\widetilde{U}}} \\left[ \\log \\boldsymbol{p}_{\\alpha}(\\widetilde{x}) \\right]`</span>
|
||||
|
||||
<span class="sd"> which corresponds to the maximum likelihood estimate.</span>
|
||||
|
||||
<span class="sd"> :param classifier: a sklearn's Estimator that generates a binary classifier.</span>
|
||||
<span class="sd"> :param val_split: specifies the data used for generating classifier predictions. This specification</span>
|
||||
<span class="sd"> can be made as float in (0, 1) indicating the proportion of stratified held-out validation set to</span>
|
||||
<span class="sd"> be extracted from the training set; or as an integer (default 5), indicating that the predictions</span>
|
||||
<span class="sd"> are to be generated in a `k`-fold cross-validation manner (with this integer indicating the value</span>
|
||||
<span class="sd"> for `k`); or as a collection defining the specific set of data to use for validation.</span>
|
||||
<span class="sd"> Alternatively, this set can be specified at fit time by indicating the exact set of data</span>
|
||||
<span class="sd"> on which the predictions are to be generated.</span>
|
||||
<span class="sd"> :param bandwidth: float, the bandwidth of the Kernel</span>
|
||||
<span class="sd"> :param n_jobs: number of parallel workers</span>
|
||||
<span class="sd"> :param random_state: a seed to be set before fitting any base quantifier (default None)</span>
|
||||
<span class="sd"> """</span>
|
||||
|
||||
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">classifier</span><span class="p">:</span> <span class="n">BaseEstimator</span><span class="p">,</span> <span class="n">val_split</span><span class="o">=</span><span class="mi">10</span><span class="p">,</span> <span class="n">bandwidth</span><span class="o">=</span><span class="mf">0.1</span><span class="p">,</span> <span class="n">n_jobs</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">random_state</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">_check_bandwidth</span><span class="p">(</span><span class="n">bandwidth</span><span class="p">)</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">classifier</span> <span class="o">=</span> <span class="n">classifier</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">val_split</span> <span class="o">=</span> <span class="n">val_split</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">bandwidth</span> <span class="o">=</span> <span class="n">bandwidth</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">n_jobs</span> <span class="o">=</span> <span class="n">n_jobs</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">random_state</span><span class="o">=</span><span class="n">random_state</span>
|
||||
|
||||
<div class="viewcode-block" id="KDEyML.aggregation_fit"><a class="viewcode-back" href="../../../quapy.method.html#quapy.method._kdey.KDEyML.aggregation_fit">[docs]</a> <span class="k">def</span> <span class="nf">aggregation_fit</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">classif_predictions</span><span class="p">:</span> <span class="n">LabelledCollection</span><span class="p">,</span> <span class="n">data</span><span class="p">:</span> <span class="n">LabelledCollection</span><span class="p">):</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">mix_densities</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">get_mixture_components</span><span class="p">(</span><span class="o">*</span><span class="n">classif_predictions</span><span class="o">.</span><span class="n">Xy</span><span class="p">,</span> <span class="n">data</span><span class="o">.</span><span class="n">n_classes</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">bandwidth</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="bp">self</span></div>
|
||||
|
||||
<div class="viewcode-block" id="KDEyML.aggregate"><a class="viewcode-back" href="../../../quapy.method.html#quapy.method._kdey.KDEyML.aggregate">[docs]</a> <span class="k">def</span> <span class="nf">aggregate</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">posteriors</span><span class="p">:</span> <span class="n">np</span><span class="o">.</span><span class="n">ndarray</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Searches for the mixture model parameter (the sought prevalence values) that maximizes the likelihood</span>
|
||||
<span class="sd"> of the data (i.e., that minimizes the negative log-likelihood)</span>
|
||||
|
||||
<span class="sd"> :param posteriors: instances in the sample converted into posterior probabilities</span>
|
||||
<span class="sd"> :return: a vector of class prevalence estimates</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">RandomState</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">random_state</span><span class="p">)</span>
|
||||
<span class="n">epsilon</span> <span class="o">=</span> <span class="mf">1e-10</span>
|
||||
<span class="n">n_classes</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">mix_densities</span><span class="p">)</span>
|
||||
<span class="n">test_densities</span> <span class="o">=</span> <span class="p">[</span><span class="bp">self</span><span class="o">.</span><span class="n">pdf</span><span class="p">(</span><span class="n">kde_i</span><span class="p">,</span> <span class="n">posteriors</span><span class="p">)</span> <span class="k">for</span> <span class="n">kde_i</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">mix_densities</span><span class="p">]</span>
|
||||
|
||||
<span class="k">def</span> <span class="nf">neg_loglikelihood</span><span class="p">(</span><span class="n">prev</span><span class="p">):</span>
|
||||
<span class="n">test_mixture_likelihood</span> <span class="o">=</span> <span class="nb">sum</span><span class="p">(</span><span class="n">prev_i</span> <span class="o">*</span> <span class="n">dens_i</span> <span class="k">for</span> <span class="n">prev_i</span><span class="p">,</span> <span class="n">dens_i</span> <span class="ow">in</span> <span class="nb">zip</span> <span class="p">(</span><span class="n">prev</span><span class="p">,</span> <span class="n">test_densities</span><span class="p">))</span>
|
||||
<span class="n">test_loglikelihood</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">log</span><span class="p">(</span><span class="n">test_mixture_likelihood</span> <span class="o">+</span> <span class="n">epsilon</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="o">-</span><span class="n">np</span><span class="o">.</span><span class="n">sum</span><span class="p">(</span><span class="n">test_loglikelihood</span><span class="p">)</span>
|
||||
|
||||
<span class="k">return</span> <span class="n">F</span><span class="o">.</span><span class="n">optim_minimize</span><span class="p">(</span><span class="n">neg_loglikelihood</span><span class="p">,</span> <span class="n">n_classes</span><span class="p">)</span></div></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="KDEyHD"><a class="viewcode-back" href="../../../quapy.method.html#quapy.method._kdey.KDEyHD">[docs]</a><span class="k">class</span> <span class="nc">KDEyHD</span><span class="p">(</span><span class="n">AggregativeSoftQuantifier</span><span class="p">,</span> <span class="n">KDEBase</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Kernel Density Estimation model for quantification (KDEy) relying on the squared Hellinger Disntace (HD) as</span>
|
||||
<span class="sd"> the divergence measure to be minimized. This method was first proposed in the paper</span>
|
||||
<span class="sd"> `Kernel Density Estimation for Multiclass Quantification <https://arxiv.org/abs/2401.00490>`_, in which</span>
|
||||
<span class="sd"> the authors proposed a Monte Carlo approach for minimizing the divergence.</span>
|
||||
|
||||
<span class="sd"> The distribution matching optimization problem comes down to solving:</span>
|
||||
|
||||
<span class="sd"> :math:`\\hat{\\alpha} = \\arg\\min_{\\alpha\\in\\Delta^{n-1}} \\mathcal{D}(\\boldsymbol{p}_{\\alpha}||q_{\\widetilde{U}})`</span>
|
||||
|
||||
<span class="sd"> where :math:`p_{\\alpha}` is the mixture of class-specific KDEs with mixture parameter (hence class prevalence)</span>
|
||||
<span class="sd"> :math:`\\alpha` defined by</span>
|
||||
|
||||
<span class="sd"> :math:`\\boldsymbol{p}_{\\alpha}(\\widetilde{x}) = \\sum_{i=1}^n \\alpha_i p_{\\widetilde{L}_i}(\\widetilde{x})`</span>
|
||||
|
||||
<span class="sd"> where :math:`p_X(\\boldsymbol{x}) = \\frac{1}{|X|} \\sum_{x_i\\in X} K\\left(\\frac{x-x_i}{h}\\right)` is the</span>
|
||||
<span class="sd"> KDE function that uses the datapoints in X as the kernel centers.</span>
|
||||
|
||||
<span class="sd"> In KDEy-HD, the divergence is taken to be the squared Hellinger Distance, an f-divergence with corresponding</span>
|
||||
<span class="sd"> f-generator function given by:</span>
|
||||
|
||||
<span class="sd"> :math:`f(u)=(\\sqrt{u}-1)^2`</span>
|
||||
|
||||
<span class="sd"> The authors proposed a Monte Carlo solution that relies on importance sampling:</span>
|
||||
|
||||
<span class="sd"> :math:`\\hat{D}_f(p||q)= \\frac{1}{t} \\sum_{i=1}^t f\\left(\\frac{p(x_i)}{q(x_i)}\\right) \\frac{q(x_i)}{r(x_i)}`</span>
|
||||
|
||||
<span class="sd"> where the datapoints (trials) :math:`x_1,\\ldots,x_t\\sim_{\\mathrm{iid}} r` with :math:`r` the</span>
|
||||
<span class="sd"> uniform distribution.</span>
|
||||
|
||||
<span class="sd"> :param classifier: a sklearn's Estimator that generates a binary classifier.</span>
|
||||
<span class="sd"> :param val_split: specifies the data used for generating classifier predictions. This specification</span>
|
||||
<span class="sd"> can be made as float in (0, 1) indicating the proportion of stratified held-out validation set to</span>
|
||||
<span class="sd"> be extracted from the training set; or as an integer (default 5), indicating that the predictions</span>
|
||||
<span class="sd"> are to be generated in a `k`-fold cross-validation manner (with this integer indicating the value</span>
|
||||
<span class="sd"> for `k`); or as a collection defining the specific set of data to use for validation.</span>
|
||||
<span class="sd"> Alternatively, this set can be specified at fit time by indicating the exact set of data</span>
|
||||
<span class="sd"> on which the predictions are to be generated.</span>
|
||||
<span class="sd"> :param bandwidth: float, the bandwidth of the Kernel</span>
|
||||
<span class="sd"> :param n_jobs: number of parallel workers</span>
|
||||
<span class="sd"> :param random_state: a seed to be set before fitting any base quantifier (default None)</span>
|
||||
<span class="sd"> :param montecarlo_trials: number of Monte Carlo trials (default 10000)</span>
|
||||
<span class="sd"> """</span>
|
||||
|
||||
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">classifier</span><span class="p">:</span> <span class="n">BaseEstimator</span><span class="p">,</span> <span class="n">val_split</span><span class="o">=</span><span class="mi">10</span><span class="p">,</span> <span class="n">divergence</span><span class="p">:</span> <span class="nb">str</span><span class="o">=</span><span class="s1">'HD'</span><span class="p">,</span>
|
||||
<span class="n">bandwidth</span><span class="o">=</span><span class="mf">0.1</span><span class="p">,</span> <span class="n">n_jobs</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">random_state</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">montecarlo_trials</span><span class="o">=</span><span class="mi">10000</span><span class="p">):</span>
|
||||
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">_check_bandwidth</span><span class="p">(</span><span class="n">bandwidth</span><span class="p">)</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">classifier</span> <span class="o">=</span> <span class="n">classifier</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">val_split</span> <span class="o">=</span> <span class="n">val_split</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">divergence</span> <span class="o">=</span> <span class="n">divergence</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">bandwidth</span> <span class="o">=</span> <span class="n">bandwidth</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">n_jobs</span> <span class="o">=</span> <span class="n">n_jobs</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">random_state</span><span class="o">=</span><span class="n">random_state</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">montecarlo_trials</span> <span class="o">=</span> <span class="n">montecarlo_trials</span>
|
||||
|
||||
<div class="viewcode-block" id="KDEyHD.aggregation_fit"><a class="viewcode-back" href="../../../quapy.method.html#quapy.method._kdey.KDEyHD.aggregation_fit">[docs]</a> <span class="k">def</span> <span class="nf">aggregation_fit</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">classif_predictions</span><span class="p">:</span> <span class="n">LabelledCollection</span><span class="p">,</span> <span class="n">data</span><span class="p">:</span> <span class="n">LabelledCollection</span><span class="p">):</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">mix_densities</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">get_mixture_components</span><span class="p">(</span><span class="o">*</span><span class="n">classif_predictions</span><span class="o">.</span><span class="n">Xy</span><span class="p">,</span> <span class="n">data</span><span class="o">.</span><span class="n">n_classes</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">bandwidth</span><span class="p">)</span>
|
||||
|
||||
<span class="n">N</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">montecarlo_trials</span>
|
||||
<span class="n">rs</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">random_state</span>
|
||||
<span class="n">n</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">n_classes</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">reference_samples</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">vstack</span><span class="p">([</span><span class="n">kde_i</span><span class="o">.</span><span class="n">sample</span><span class="p">(</span><span class="n">N</span><span class="o">//</span><span class="n">n</span><span class="p">,</span> <span class="n">random_state</span><span class="o">=</span><span class="n">rs</span><span class="p">)</span> <span class="k">for</span> <span class="n">kde_i</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">mix_densities</span><span class="p">])</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">reference_classwise_densities</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">asarray</span><span class="p">([</span><span class="bp">self</span><span class="o">.</span><span class="n">pdf</span><span class="p">(</span><span class="n">kde_j</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">reference_samples</span><span class="p">)</span> <span class="k">for</span> <span class="n">kde_j</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">mix_densities</span><span class="p">])</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">reference_density</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">mean</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">reference_classwise_densities</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span> <span class="c1"># equiv. to (uniform @ self.reference_classwise_densities)</span>
|
||||
|
||||
<span class="k">return</span> <span class="bp">self</span></div>
|
||||
|
||||
<div class="viewcode-block" id="KDEyHD.aggregate"><a class="viewcode-back" href="../../../quapy.method.html#quapy.method._kdey.KDEyHD.aggregate">[docs]</a> <span class="k">def</span> <span class="nf">aggregate</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">posteriors</span><span class="p">:</span> <span class="n">np</span><span class="o">.</span><span class="n">ndarray</span><span class="p">):</span>
|
||||
<span class="c1"># we retain all n*N examples (sampled from a mixture with uniform parameter), and then</span>
|
||||
<span class="c1"># apply importance sampling (IS). In this version we compute D(p_alpha||q) with IS</span>
|
||||
<span class="n">n_classes</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">mix_densities</span><span class="p">)</span>
|
||||
|
||||
<span class="n">test_kde</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">get_kde_function</span><span class="p">(</span><span class="n">posteriors</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">bandwidth</span><span class="p">)</span>
|
||||
<span class="n">test_densities</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">pdf</span><span class="p">(</span><span class="n">test_kde</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">reference_samples</span><span class="p">)</span>
|
||||
|
||||
<span class="k">def</span> <span class="nf">f_squared_hellinger</span><span class="p">(</span><span class="n">u</span><span class="p">):</span>
|
||||
<span class="k">return</span> <span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">sqrt</span><span class="p">(</span><span class="n">u</span><span class="p">)</span><span class="o">-</span><span class="mi">1</span><span class="p">)</span><span class="o">**</span><span class="mi">2</span>
|
||||
|
||||
<span class="c1"># todo: this will fail when self.divergence is a callable, and is not the right place to do it anyway</span>
|
||||
<span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">divergence</span><span class="o">.</span><span class="n">lower</span><span class="p">()</span> <span class="o">==</span> <span class="s1">'hd'</span><span class="p">:</span>
|
||||
<span class="n">f</span> <span class="o">=</span> <span class="n">f_squared_hellinger</span>
|
||||
<span class="k">else</span><span class="p">:</span>
|
||||
<span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span><span class="s1">'only squared HD is currently implemented'</span><span class="p">)</span>
|
||||
|
||||
<span class="n">epsilon</span> <span class="o">=</span> <span class="mf">1e-10</span>
|
||||
<span class="n">qs</span> <span class="o">=</span> <span class="n">test_densities</span> <span class="o">+</span> <span class="n">epsilon</span>
|
||||
<span class="n">rs</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">reference_density</span> <span class="o">+</span> <span class="n">epsilon</span>
|
||||
<span class="n">iw</span> <span class="o">=</span> <span class="n">qs</span><span class="o">/</span><span class="n">rs</span> <span class="c1">#importance weights</span>
|
||||
<span class="n">p_class</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">reference_classwise_densities</span> <span class="o">+</span> <span class="n">epsilon</span>
|
||||
<span class="n">fracs</span> <span class="o">=</span> <span class="n">p_class</span><span class="o">/</span><span class="n">qs</span>
|
||||
|
||||
<span class="k">def</span> <span class="nf">divergence</span><span class="p">(</span><span class="n">prev</span><span class="p">):</span>
|
||||
<span class="c1"># ps / qs = (prev @ p_class) / qs = prev @ (p_class / qs) = prev @ fracs</span>
|
||||
<span class="n">ps_div_qs</span> <span class="o">=</span> <span class="n">prev</span> <span class="o">@</span> <span class="n">fracs</span>
|
||||
<span class="k">return</span> <span class="n">np</span><span class="o">.</span><span class="n">mean</span><span class="p">(</span> <span class="n">f</span><span class="p">(</span><span class="n">ps_div_qs</span><span class="p">)</span> <span class="o">*</span> <span class="n">iw</span> <span class="p">)</span>
|
||||
|
||||
<span class="k">return</span> <span class="n">F</span><span class="o">.</span><span class="n">optim_minimize</span><span class="p">(</span><span class="n">divergence</span><span class="p">,</span> <span class="n">n_classes</span><span class="p">)</span></div></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="KDEyCS"><a class="viewcode-back" href="../../../quapy.method.html#quapy.method._kdey.KDEyCS">[docs]</a><span class="k">class</span> <span class="nc">KDEyCS</span><span class="p">(</span><span class="n">AggregativeSoftQuantifier</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Kernel Density Estimation model for quantification (KDEy) relying on the Cauchy-Schwarz divergence (CS) as</span>
|
||||
<span class="sd"> the divergence measure to be minimized. This method was first proposed in the paper</span>
|
||||
<span class="sd"> `Kernel Density Estimation for Multiclass Quantification <https://arxiv.org/abs/2401.00490>`_, in which</span>
|
||||
<span class="sd"> the authors proposed a Monte Carlo approach for minimizing the divergence.</span>
|
||||
|
||||
<span class="sd"> The distribution matching optimization problem comes down to solving:</span>
|
||||
|
||||
<span class="sd"> :math:`\\hat{\\alpha} = \\arg\\min_{\\alpha\\in\\Delta^{n-1}} \\mathcal{D}(\\boldsymbol{p}_{\\alpha}||q_{\\widetilde{U}})`</span>
|
||||
|
||||
<span class="sd"> where :math:`p_{\\alpha}` is the mixture of class-specific KDEs with mixture parameter (hence class prevalence)</span>
|
||||
<span class="sd"> :math:`\\alpha` defined by</span>
|
||||
|
||||
<span class="sd"> :math:`\\boldsymbol{p}_{\\alpha}(\\widetilde{x}) = \\sum_{i=1}^n \\alpha_i p_{\\widetilde{L}_i}(\\widetilde{x})`</span>
|
||||
|
||||
<span class="sd"> where :math:`p_X(\\boldsymbol{x}) = \\frac{1}{|X|} \\sum_{x_i\\in X} K\\left(\\frac{x-x_i}{h}\\right)` is the</span>
|
||||
<span class="sd"> KDE function that uses the datapoints in X as the kernel centers.</span>
|
||||
|
||||
<span class="sd"> In KDEy-CS, the divergence is taken to be the Cauchy-Schwarz divergence given by:</span>
|
||||
|
||||
<span class="sd"> :math:`\\mathcal{D}_{\\mathrm{CS}}(p||q)=-\\log\\left(\\frac{\\int p(x)q(x)dx}{\\sqrt{\\int p(x)^2dx \\int q(x)^2dx}}\\right)`</span>
|
||||
|
||||
<span class="sd"> The authors showed that this distribution matching admits a closed-form solution</span>
|
||||
|
||||
<span class="sd"> :param classifier: a sklearn's Estimator that generates a binary classifier.</span>
|
||||
<span class="sd"> :param val_split: specifies the data used for generating classifier predictions. This specification</span>
|
||||
<span class="sd"> can be made as float in (0, 1) indicating the proportion of stratified held-out validation set to</span>
|
||||
<span class="sd"> be extracted from the training set; or as an integer (default 5), indicating that the predictions</span>
|
||||
<span class="sd"> are to be generated in a `k`-fold cross-validation manner (with this integer indicating the value</span>
|
||||
<span class="sd"> for `k`); or as a collection defining the specific set of data to use for validation.</span>
|
||||
<span class="sd"> Alternatively, this set can be specified at fit time by indicating the exact set of data</span>
|
||||
<span class="sd"> on which the predictions are to be generated.</span>
|
||||
<span class="sd"> :param bandwidth: float, the bandwidth of the Kernel</span>
|
||||
<span class="sd"> :param n_jobs: number of parallel workers</span>
|
||||
<span class="sd"> """</span>
|
||||
|
||||
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">classifier</span><span class="p">:</span> <span class="n">BaseEstimator</span><span class="p">,</span> <span class="n">val_split</span><span class="o">=</span><span class="mi">10</span><span class="p">,</span> <span class="n">bandwidth</span><span class="o">=</span><span class="mf">0.1</span><span class="p">,</span> <span class="n">n_jobs</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
|
||||
<span class="n">KDEBase</span><span class="o">.</span><span class="n">_check_bandwidth</span><span class="p">(</span><span class="n">bandwidth</span><span class="p">)</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">classifier</span> <span class="o">=</span> <span class="n">classifier</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">val_split</span> <span class="o">=</span> <span class="n">val_split</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">bandwidth</span> <span class="o">=</span> <span class="n">bandwidth</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">n_jobs</span> <span class="o">=</span> <span class="n">n_jobs</span>
|
||||
|
||||
<div class="viewcode-block" id="KDEyCS.gram_matrix_mix_sum"><a class="viewcode-back" href="../../../quapy.method.html#quapy.method._kdey.KDEyCS.gram_matrix_mix_sum">[docs]</a> <span class="k">def</span> <span class="nf">gram_matrix_mix_sum</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">X</span><span class="p">,</span> <span class="n">Y</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
|
||||
<span class="c1"># this adapts the output of the rbf_kernel function (pairwise evaluations of Gaussian kernels k(x,y))</span>
|
||||
<span class="c1"># to contain pairwise evaluations of N(x|mu,Sigma1+Sigma2) with mu=y and Sigma1 and Sigma2 are </span>
|
||||
<span class="c1"># two "scalar matrices" (h^2)*I each, so Sigma1+Sigma2 has scalar 2(h^2) (h is the bandwidth)</span>
|
||||
<span class="n">h</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">bandwidth</span>
|
||||
<span class="n">variance</span> <span class="o">=</span> <span class="mi">2</span> <span class="o">*</span> <span class="p">(</span><span class="n">h</span><span class="o">**</span><span class="mi">2</span><span class="p">)</span>
|
||||
<span class="n">nD</span> <span class="o">=</span> <span class="n">X</span><span class="o">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span>
|
||||
<span class="n">gamma</span> <span class="o">=</span> <span class="mi">1</span><span class="o">/</span><span class="p">(</span><span class="mi">2</span><span class="o">*</span><span class="n">variance</span><span class="p">)</span>
|
||||
<span class="n">norm_factor</span> <span class="o">=</span> <span class="mi">1</span><span class="o">/</span><span class="n">np</span><span class="o">.</span><span class="n">sqrt</span><span class="p">(((</span><span class="mi">2</span><span class="o">*</span><span class="n">np</span><span class="o">.</span><span class="n">pi</span><span class="p">)</span><span class="o">**</span><span class="n">nD</span><span class="p">)</span> <span class="o">*</span> <span class="p">(</span><span class="n">variance</span><span class="o">**</span><span class="p">(</span><span class="n">nD</span><span class="p">)))</span>
|
||||
<span class="n">gram</span> <span class="o">=</span> <span class="n">norm_factor</span> <span class="o">*</span> <span class="n">rbf_kernel</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">Y</span><span class="p">,</span> <span class="n">gamma</span><span class="o">=</span><span class="n">gamma</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="n">gram</span><span class="o">.</span><span class="n">sum</span><span class="p">()</span></div>
|
||||
|
||||
<div class="viewcode-block" id="KDEyCS.aggregation_fit"><a class="viewcode-back" href="../../../quapy.method.html#quapy.method._kdey.KDEyCS.aggregation_fit">[docs]</a> <span class="k">def</span> <span class="nf">aggregation_fit</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">classif_predictions</span><span class="p">:</span> <span class="n">LabelledCollection</span><span class="p">,</span> <span class="n">data</span><span class="p">:</span> <span class="n">LabelledCollection</span><span class="p">):</span>
|
||||
|
||||
<span class="n">P</span><span class="p">,</span> <span class="n">y</span> <span class="o">=</span> <span class="n">classif_predictions</span><span class="o">.</span><span class="n">Xy</span>
|
||||
<span class="n">n</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">n_classes</span>
|
||||
|
||||
<span class="k">assert</span> <span class="nb">all</span><span class="p">(</span><span class="nb">sorted</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">unique</span><span class="p">(</span><span class="n">y</span><span class="p">))</span> <span class="o">==</span> <span class="n">np</span><span class="o">.</span><span class="n">arange</span><span class="p">(</span><span class="n">n</span><span class="p">)),</span> \
|
||||
<span class="s1">'label name gaps not allowed in current implementation'</span>
|
||||
|
||||
<span class="c1"># counts_inv keeps track of the relative weight of each datapoint within its class</span>
|
||||
<span class="c1"># (i.e., the weight in its KDE model)</span>
|
||||
<span class="n">counts_inv</span> <span class="o">=</span> <span class="mi">1</span> <span class="o">/</span> <span class="p">(</span><span class="n">data</span><span class="o">.</span><span class="n">counts</span><span class="p">())</span>
|
||||
|
||||
<span class="c1"># tr_tr_sums corresponds to symbol \overline{B} in the paper</span>
|
||||
<span class="n">tr_tr_sums</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">zeros</span><span class="p">(</span><span class="n">shape</span><span class="o">=</span><span class="p">(</span><span class="n">n</span><span class="p">,</span><span class="n">n</span><span class="p">),</span> <span class="n">dtype</span><span class="o">=</span><span class="nb">float</span><span class="p">)</span>
|
||||
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">n</span><span class="p">):</span>
|
||||
<span class="k">for</span> <span class="n">j</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">n</span><span class="p">):</span>
|
||||
<span class="k">if</span> <span class="n">i</span> <span class="o">></span> <span class="n">j</span><span class="p">:</span>
|
||||
<span class="n">tr_tr_sums</span><span class="p">[</span><span class="n">i</span><span class="p">,</span><span class="n">j</span><span class="p">]</span> <span class="o">=</span> <span class="n">tr_tr_sums</span><span class="p">[</span><span class="n">j</span><span class="p">,</span><span class="n">i</span><span class="p">]</span>
|
||||
<span class="k">else</span><span class="p">:</span>
|
||||
<span class="n">block</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">gram_matrix_mix_sum</span><span class="p">(</span><span class="n">P</span><span class="p">[</span><span class="n">y</span> <span class="o">==</span> <span class="n">i</span><span class="p">],</span> <span class="n">P</span><span class="p">[</span><span class="n">y</span> <span class="o">==</span> <span class="n">j</span><span class="p">]</span> <span class="k">if</span> <span class="n">i</span><span class="o">!=</span><span class="n">j</span> <span class="k">else</span> <span class="kc">None</span><span class="p">)</span>
|
||||
<span class="n">tr_tr_sums</span><span class="p">[</span><span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="p">]</span> <span class="o">=</span> <span class="n">block</span>
|
||||
|
||||
<span class="c1"># keep track of these data structures for the test phase</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">Ptr</span> <span class="o">=</span> <span class="n">P</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">ytr</span> <span class="o">=</span> <span class="n">y</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">tr_tr_sums</span> <span class="o">=</span> <span class="n">tr_tr_sums</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">counts_inv</span> <span class="o">=</span> <span class="n">counts_inv</span>
|
||||
|
||||
<span class="k">return</span> <span class="bp">self</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="KDEyCS.aggregate"><a class="viewcode-back" href="../../../quapy.method.html#quapy.method._kdey.KDEyCS.aggregate">[docs]</a> <span class="k">def</span> <span class="nf">aggregate</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">posteriors</span><span class="p">:</span> <span class="n">np</span><span class="o">.</span><span class="n">ndarray</span><span class="p">):</span>
|
||||
<span class="n">Ptr</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">Ptr</span>
|
||||
<span class="n">Pte</span> <span class="o">=</span> <span class="n">posteriors</span>
|
||||
<span class="n">y</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">ytr</span>
|
||||
<span class="n">tr_tr_sums</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">tr_tr_sums</span>
|
||||
|
||||
<span class="n">M</span><span class="p">,</span> <span class="n">nD</span> <span class="o">=</span> <span class="n">Pte</span><span class="o">.</span><span class="n">shape</span>
|
||||
<span class="n">Minv</span> <span class="o">=</span> <span class="p">(</span><span class="mi">1</span><span class="o">/</span><span class="n">M</span><span class="p">)</span> <span class="c1"># t in the paper</span>
|
||||
<span class="n">n</span> <span class="o">=</span> <span class="n">Ptr</span><span class="o">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span>
|
||||
|
||||
<span class="c1"># becomes a constant that does not affect the optimization, no need to compute it</span>
|
||||
<span class="c1"># partC = 0.5*np.log(self.gram_matrix_mix_sum(Pte) * Kinv * Kinv)</span>
|
||||
|
||||
<span class="c1"># tr_te_sums corresponds to \overline{a}*(1/Li)*(1/M) in the paper (note the constants</span>
|
||||
<span class="c1"># are already aggregated to tr_te_sums, so these multiplications are not carried out</span>
|
||||
<span class="c1"># at each iteration of the optimization phase)</span>
|
||||
<span class="n">tr_te_sums</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">zeros</span><span class="p">(</span><span class="n">shape</span><span class="o">=</span><span class="n">n</span><span class="p">,</span> <span class="n">dtype</span><span class="o">=</span><span class="nb">float</span><span class="p">)</span>
|
||||
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">n</span><span class="p">):</span>
|
||||
<span class="n">tr_te_sums</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">gram_matrix_mix_sum</span><span class="p">(</span><span class="n">Ptr</span><span class="p">[</span><span class="n">y</span><span class="o">==</span><span class="n">i</span><span class="p">],</span> <span class="n">Pte</span><span class="p">)</span>
|
||||
|
||||
<span class="k">def</span> <span class="nf">divergence</span><span class="p">(</span><span class="n">alpha</span><span class="p">):</span>
|
||||
<span class="c1"># called \overline{r} in the paper</span>
|
||||
<span class="n">alpha_ratio</span> <span class="o">=</span> <span class="n">alpha</span> <span class="o">*</span> <span class="bp">self</span><span class="o">.</span><span class="n">counts_inv</span>
|
||||
|
||||
<span class="c1"># recal that tr_te_sums already accounts for the constant terms (1/Li)*(1/M)</span>
|
||||
<span class="n">partA</span> <span class="o">=</span> <span class="o">-</span><span class="n">np</span><span class="o">.</span><span class="n">log</span><span class="p">((</span><span class="n">alpha_ratio</span> <span class="o">@</span> <span class="n">tr_te_sums</span><span class="p">)</span> <span class="o">*</span> <span class="n">Minv</span><span class="p">)</span>
|
||||
<span class="n">partB</span> <span class="o">=</span> <span class="mf">0.5</span> <span class="o">*</span> <span class="n">np</span><span class="o">.</span><span class="n">log</span><span class="p">(</span><span class="n">alpha_ratio</span> <span class="o">@</span> <span class="n">tr_tr_sums</span> <span class="o">@</span> <span class="n">alpha_ratio</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="n">partA</span> <span class="o">+</span> <span class="n">partB</span> <span class="c1">#+ partC</span>
|
||||
|
||||
<span class="k">return</span> <span class="n">F</span><span class="o">.</span><span class="n">optim_minimize</span><span class="p">(</span><span class="n">divergence</span><span class="p">,</span> <span class="n">n</span><span class="p">)</span></div></div>
|
||||
|
||||
</pre></div>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
<footer>
|
||||
|
||||
<hr/>
|
||||
|
||||
<div role="contentinfo">
|
||||
<p>© Copyright 2024, Alejandro Moreo.</p>
|
||||
</div>
|
||||
|
||||
Built with <a href="https://www.sphinx-doc.org/">Sphinx</a> using a
|
||||
<a href="https://github.com/readthedocs/sphinx_rtd_theme">theme</a>
|
||||
provided by <a href="https://readthedocs.org">Read the Docs</a>.
|
||||
|
||||
|
||||
</footer>
|
||||
</div>
|
||||
</div>
|
||||
</section>
|
||||
</div>
|
||||
<script>
|
||||
jQuery(function () {
|
||||
SphinxRtdTheme.Navigation.enable(true);
|
||||
});
|
||||
</script>
|
||||
|
||||
</body>
|
||||
</html>
|
||||
|
|
@ -0,0 +1,520 @@
|
|||
<!DOCTYPE html>
|
||||
<html class="writer-html5" lang="en">
|
||||
<head>
|
||||
<meta charset="utf-8" />
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
|
||||
<title>quapy.method._neural — QuaPy: A Python-based open-source framework for quantification 0.1.8 documentation</title>
|
||||
<link rel="stylesheet" type="text/css" href="../../../_static/pygments.css" />
|
||||
<link rel="stylesheet" type="text/css" href="../../../_static/css/theme.css" />
|
||||
|
||||
|
||||
<!--[if lt IE 9]>
|
||||
<script src="../../../_static/js/html5shiv.min.js"></script>
|
||||
<![endif]-->
|
||||
|
||||
<script data-url_root="../../../" id="documentation_options" src="../../../_static/documentation_options.js"></script>
|
||||
<script src="../../../_static/jquery.js"></script>
|
||||
<script src="../../../_static/underscore.js"></script>
|
||||
<script src="../../../_static/_sphinx_javascript_frameworks_compat.js"></script>
|
||||
<script src="../../../_static/doctools.js"></script>
|
||||
<script src="../../../_static/sphinx_highlight.js"></script>
|
||||
<script src="../../../_static/js/theme.js"></script>
|
||||
<link rel="index" title="Index" href="../../../genindex.html" />
|
||||
<link rel="search" title="Search" href="../../../search.html" />
|
||||
</head>
|
||||
|
||||
<body class="wy-body-for-nav">
|
||||
<div class="wy-grid-for-nav">
|
||||
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
|
||||
<div class="wy-side-scroll">
|
||||
<div class="wy-side-nav-search" >
|
||||
|
||||
|
||||
|
||||
<a href="../../../index.html" class="icon icon-home">
|
||||
QuaPy: A Python-based open-source framework for quantification
|
||||
</a>
|
||||
<div role="search">
|
||||
<form id="rtd-search-form" class="wy-form" action="../../../search.html" method="get">
|
||||
<input type="text" name="q" placeholder="Search docs" aria-label="Search docs" />
|
||||
<input type="hidden" name="check_keywords" value="yes" />
|
||||
<input type="hidden" name="area" value="default" />
|
||||
</form>
|
||||
</div>
|
||||
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../../../modules.html">quapy</a></li>
|
||||
</ul>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
</nav>
|
||||
|
||||
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" >
|
||||
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
|
||||
<a href="../../../index.html">QuaPy: A Python-based open-source framework for quantification</a>
|
||||
</nav>
|
||||
|
||||
<div class="wy-nav-content">
|
||||
<div class="rst-content">
|
||||
<div role="navigation" aria-label="Page navigation">
|
||||
<ul class="wy-breadcrumbs">
|
||||
<li><a href="../../../index.html" class="icon icon-home" aria-label="Home"></a></li>
|
||||
<li class="breadcrumb-item"><a href="../../index.html">Module code</a></li>
|
||||
<li class="breadcrumb-item active">quapy.method._neural</li>
|
||||
<li class="wy-breadcrumbs-aside">
|
||||
</li>
|
||||
</ul>
|
||||
<hr/>
|
||||
</div>
|
||||
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
|
||||
<div itemprop="articleBody">
|
||||
|
||||
<h1>Source code for quapy.method._neural</h1><div class="highlight"><pre>
|
||||
<span></span><span class="kn">import</span> <span class="nn">os</span>
|
||||
<span class="kn">from</span> <span class="nn">pathlib</span> <span class="kn">import</span> <span class="n">Path</span>
|
||||
<span class="kn">import</span> <span class="nn">random</span>
|
||||
|
||||
<span class="kn">import</span> <span class="nn">torch</span>
|
||||
<span class="kn">from</span> <span class="nn">torch.nn</span> <span class="kn">import</span> <span class="n">MSELoss</span>
|
||||
<span class="kn">from</span> <span class="nn">torch.nn.functional</span> <span class="kn">import</span> <span class="n">relu</span>
|
||||
|
||||
<span class="kn">from</span> <span class="nn">quapy.protocol</span> <span class="kn">import</span> <span class="n">UPP</span>
|
||||
<span class="kn">from</span> <span class="nn">quapy.method.aggregative</span> <span class="kn">import</span> <span class="o">*</span>
|
||||
<span class="kn">from</span> <span class="nn">quapy.util</span> <span class="kn">import</span> <span class="n">EarlyStop</span>
|
||||
<span class="kn">from</span> <span class="nn">tqdm</span> <span class="kn">import</span> <span class="n">tqdm</span>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="QuaNetTrainer"><a class="viewcode-back" href="../../../quapy.method.html#quapy.method._neural.QuaNetTrainer">[docs]</a><span class="k">class</span> <span class="nc">QuaNetTrainer</span><span class="p">(</span><span class="n">BaseQuantifier</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Implementation of `QuaNet <https://dl.acm.org/doi/abs/10.1145/3269206.3269287>`_, a neural network for</span>
|
||||
<span class="sd"> quantification. This implementation uses `PyTorch <https://pytorch.org/>`_ and can take advantage of GPU</span>
|
||||
<span class="sd"> for speeding-up the training phase.</span>
|
||||
|
||||
<span class="sd"> Example:</span>
|
||||
|
||||
<span class="sd"> >>> import quapy as qp</span>
|
||||
<span class="sd"> >>> from quapy.method.meta import QuaNet</span>
|
||||
<span class="sd"> >>> from quapy.classification.neural import NeuralClassifierTrainer, CNNnet</span>
|
||||
<span class="sd"> >>></span>
|
||||
<span class="sd"> >>> # use samples of 100 elements</span>
|
||||
<span class="sd"> >>> qp.environ['SAMPLE_SIZE'] = 100</span>
|
||||
<span class="sd"> >>></span>
|
||||
<span class="sd"> >>> # load the kindle dataset as text, and convert words to numerical indexes</span>
|
||||
<span class="sd"> >>> dataset = qp.datasets.fetch_reviews('kindle', pickle=True)</span>
|
||||
<span class="sd"> >>> qp.train.preprocessing.index(dataset, min_df=5, inplace=True)</span>
|
||||
<span class="sd"> >>></span>
|
||||
<span class="sd"> >>> # the text classifier is a CNN trained by NeuralClassifierTrainer</span>
|
||||
<span class="sd"> >>> cnn = CNNnet(dataset.vocabulary_size, dataset.n_classes)</span>
|
||||
<span class="sd"> >>> classifier = NeuralClassifierTrainer(cnn, device='cuda')</span>
|
||||
<span class="sd"> >>></span>
|
||||
<span class="sd"> >>> # train QuaNet (QuaNet is an alias to QuaNetTrainer)</span>
|
||||
<span class="sd"> >>> model = QuaNet(classifier, qp.environ['SAMPLE_SIZE'], device='cuda')</span>
|
||||
<span class="sd"> >>> model.fit(dataset.training)</span>
|
||||
<span class="sd"> >>> estim_prevalence = model.quantify(dataset.test.instances)</span>
|
||||
|
||||
<span class="sd"> :param classifier: an object implementing `fit` (i.e., that can be trained on labelled data),</span>
|
||||
<span class="sd"> `predict_proba` (i.e., that can generate posterior probabilities of unlabelled examples) and</span>
|
||||
<span class="sd"> `transform` (i.e., that can generate embedded representations of the unlabelled instances).</span>
|
||||
<span class="sd"> :param sample_size: integer, the sample size; default is None, meaning that the sample size should be</span>
|
||||
<span class="sd"> taken from qp.environ["SAMPLE_SIZE"]</span>
|
||||
<span class="sd"> :param n_epochs: integer, maximum number of training epochs</span>
|
||||
<span class="sd"> :param tr_iter_per_poch: integer, number of training iterations before considering an epoch complete</span>
|
||||
<span class="sd"> :param va_iter_per_poch: integer, number of validation iterations to perform after each epoch</span>
|
||||
<span class="sd"> :param lr: float, the learning rate</span>
|
||||
<span class="sd"> :param lstm_hidden_size: integer, hidden dimensionality of the LSTM cells</span>
|
||||
<span class="sd"> :param lstm_nlayers: integer, number of LSTM layers</span>
|
||||
<span class="sd"> :param ff_layers: list of integers, dimensions of the densely-connected FF layers on top of the</span>
|
||||
<span class="sd"> quantification embedding</span>
|
||||
<span class="sd"> :param bidirectional: boolean, indicates whether the LSTM is bidirectional or not</span>
|
||||
<span class="sd"> :param qdrop_p: float, dropout probability</span>
|
||||
<span class="sd"> :param patience: integer, number of epochs showing no improvement in the validation set before stopping the</span>
|
||||
<span class="sd"> training phase (early stopping)</span>
|
||||
<span class="sd"> :param checkpointdir: string, a path where to store models' checkpoints</span>
|
||||
<span class="sd"> :param checkpointname: string (optional), the name of the model's checkpoint</span>
|
||||
<span class="sd"> :param device: string, indicate "cpu" or "cuda"</span>
|
||||
<span class="sd"> """</span>
|
||||
|
||||
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span>
|
||||
<span class="n">classifier</span><span class="p">,</span>
|
||||
<span class="n">sample_size</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span>
|
||||
<span class="n">n_epochs</span><span class="o">=</span><span class="mi">100</span><span class="p">,</span>
|
||||
<span class="n">tr_iter_per_poch</span><span class="o">=</span><span class="mi">500</span><span class="p">,</span>
|
||||
<span class="n">va_iter_per_poch</span><span class="o">=</span><span class="mi">100</span><span class="p">,</span>
|
||||
<span class="n">lr</span><span class="o">=</span><span class="mf">1e-3</span><span class="p">,</span>
|
||||
<span class="n">lstm_hidden_size</span><span class="o">=</span><span class="mi">64</span><span class="p">,</span>
|
||||
<span class="n">lstm_nlayers</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span>
|
||||
<span class="n">ff_layers</span><span class="o">=</span><span class="p">[</span><span class="mi">1024</span><span class="p">,</span> <span class="mi">512</span><span class="p">],</span>
|
||||
<span class="n">bidirectional</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span>
|
||||
<span class="n">qdrop_p</span><span class="o">=</span><span class="mf">0.5</span><span class="p">,</span>
|
||||
<span class="n">patience</span><span class="o">=</span><span class="mi">10</span><span class="p">,</span>
|
||||
<span class="n">checkpointdir</span><span class="o">=</span><span class="s1">'../checkpoint'</span><span class="p">,</span>
|
||||
<span class="n">checkpointname</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span>
|
||||
<span class="n">device</span><span class="o">=</span><span class="s1">'cuda'</span><span class="p">):</span>
|
||||
|
||||
<span class="k">assert</span> <span class="nb">hasattr</span><span class="p">(</span><span class="n">classifier</span><span class="p">,</span> <span class="s1">'transform'</span><span class="p">),</span> \
|
||||
<span class="sa">f</span><span class="s1">'the classifier </span><span class="si">{</span><span class="n">classifier</span><span class="o">.</span><span class="vm">__class__</span><span class="o">.</span><span class="vm">__name__</span><span class="si">}</span><span class="s1"> does not seem to be able to produce document embeddings '</span> \
|
||||
<span class="sa">f</span><span class="s1">'since it does not implement the method "transform"'</span>
|
||||
<span class="k">assert</span> <span class="nb">hasattr</span><span class="p">(</span><span class="n">classifier</span><span class="p">,</span> <span class="s1">'predict_proba'</span><span class="p">),</span> \
|
||||
<span class="sa">f</span><span class="s1">'the classifier </span><span class="si">{</span><span class="n">classifier</span><span class="o">.</span><span class="vm">__class__</span><span class="o">.</span><span class="vm">__name__</span><span class="si">}</span><span class="s1"> does not seem to be able to produce posterior probabilities '</span> \
|
||||
<span class="sa">f</span><span class="s1">'since it does not implement the method "predict_proba"'</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">classifier</span> <span class="o">=</span> <span class="n">classifier</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">sample_size</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">_get_sample_size</span><span class="p">(</span><span class="n">sample_size</span><span class="p">)</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">n_epochs</span> <span class="o">=</span> <span class="n">n_epochs</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">tr_iter</span> <span class="o">=</span> <span class="n">tr_iter_per_poch</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">va_iter</span> <span class="o">=</span> <span class="n">va_iter_per_poch</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">lr</span> <span class="o">=</span> <span class="n">lr</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">quanet_params</span> <span class="o">=</span> <span class="p">{</span>
|
||||
<span class="s1">'lstm_hidden_size'</span><span class="p">:</span> <span class="n">lstm_hidden_size</span><span class="p">,</span>
|
||||
<span class="s1">'lstm_nlayers'</span><span class="p">:</span> <span class="n">lstm_nlayers</span><span class="p">,</span>
|
||||
<span class="s1">'ff_layers'</span><span class="p">:</span> <span class="n">ff_layers</span><span class="p">,</span>
|
||||
<span class="s1">'bidirectional'</span><span class="p">:</span> <span class="n">bidirectional</span><span class="p">,</span>
|
||||
<span class="s1">'qdrop_p'</span><span class="p">:</span> <span class="n">qdrop_p</span>
|
||||
<span class="p">}</span>
|
||||
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">patience</span> <span class="o">=</span> <span class="n">patience</span>
|
||||
<span class="k">if</span> <span class="n">checkpointname</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">:</span>
|
||||
<span class="n">local_random</span> <span class="o">=</span> <span class="n">random</span><span class="o">.</span><span class="n">Random</span><span class="p">()</span>
|
||||
<span class="n">random_code</span> <span class="o">=</span> <span class="s1">'-'</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="n">local_random</span><span class="o">.</span><span class="n">randint</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1000000</span><span class="p">))</span> <span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">5</span><span class="p">))</span>
|
||||
<span class="n">checkpointname</span> <span class="o">=</span> <span class="s1">'QuaNet-'</span><span class="o">+</span><span class="n">random_code</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">checkpointdir</span> <span class="o">=</span> <span class="n">checkpointdir</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">checkpoint</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">checkpointdir</span><span class="p">,</span> <span class="n">checkpointname</span><span class="p">)</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">device</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">device</span><span class="p">(</span><span class="n">device</span><span class="p">)</span>
|
||||
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">__check_params_colision</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">quanet_params</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">classifier</span><span class="o">.</span><span class="n">get_params</span><span class="p">())</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">_classes_</span> <span class="o">=</span> <span class="kc">None</span>
|
||||
|
||||
<div class="viewcode-block" id="QuaNetTrainer.fit"><a class="viewcode-back" href="../../../quapy.method.html#quapy.method._neural.QuaNetTrainer.fit">[docs]</a> <span class="k">def</span> <span class="nf">fit</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">data</span><span class="p">:</span> <span class="n">LabelledCollection</span><span class="p">,</span> <span class="n">fit_classifier</span><span class="o">=</span><span class="kc">True</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Trains QuaNet.</span>
|
||||
|
||||
<span class="sd"> :param data: the training data on which to train QuaNet. If `fit_classifier=True`, the data will be split in</span>
|
||||
<span class="sd"> 40/40/20 for training the classifier, training QuaNet, and validating QuaNet, respectively. If</span>
|
||||
<span class="sd"> `fit_classifier=False`, the data will be split in 66/34 for training QuaNet and validating it, respectively.</span>
|
||||
<span class="sd"> :param fit_classifier: if True, trains the classifier on a split containing 40% of the data</span>
|
||||
<span class="sd"> :return: self</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">_classes_</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">classes_</span>
|
||||
<span class="n">os</span><span class="o">.</span><span class="n">makedirs</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">checkpointdir</span><span class="p">,</span> <span class="n">exist_ok</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
|
||||
|
||||
<span class="k">if</span> <span class="n">fit_classifier</span><span class="p">:</span>
|
||||
<span class="n">classifier_data</span><span class="p">,</span> <span class="n">unused_data</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">split_stratified</span><span class="p">(</span><span class="mf">0.4</span><span class="p">)</span>
|
||||
<span class="n">train_data</span><span class="p">,</span> <span class="n">valid_data</span> <span class="o">=</span> <span class="n">unused_data</span><span class="o">.</span><span class="n">split_stratified</span><span class="p">(</span><span class="mf">0.66</span><span class="p">)</span> <span class="c1"># 0.66 split of 60% makes 40% and 20%</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">classifier</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="o">*</span><span class="n">classifier_data</span><span class="o">.</span><span class="n">Xy</span><span class="p">)</span>
|
||||
<span class="k">else</span><span class="p">:</span>
|
||||
<span class="n">classifier_data</span> <span class="o">=</span> <span class="kc">None</span>
|
||||
<span class="n">train_data</span><span class="p">,</span> <span class="n">valid_data</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">split_stratified</span><span class="p">(</span><span class="mf">0.66</span><span class="p">)</span>
|
||||
|
||||
<span class="c1"># estimate the hard and soft stats tpr and fpr of the classifier</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">tr_prev</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">prevalence</span><span class="p">()</span>
|
||||
|
||||
<span class="c1"># compute the posterior probabilities of the instances</span>
|
||||
<span class="n">valid_posteriors</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">classifier</span><span class="o">.</span><span class="n">predict_proba</span><span class="p">(</span><span class="n">valid_data</span><span class="o">.</span><span class="n">instances</span><span class="p">)</span>
|
||||
<span class="n">train_posteriors</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">classifier</span><span class="o">.</span><span class="n">predict_proba</span><span class="p">(</span><span class="n">train_data</span><span class="o">.</span><span class="n">instances</span><span class="p">)</span>
|
||||
|
||||
<span class="c1"># turn instances' original representations into embeddings</span>
|
||||
<span class="n">valid_data_embed</span> <span class="o">=</span> <span class="n">LabelledCollection</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">classifier</span><span class="o">.</span><span class="n">transform</span><span class="p">(</span><span class="n">valid_data</span><span class="o">.</span><span class="n">instances</span><span class="p">),</span> <span class="n">valid_data</span><span class="o">.</span><span class="n">labels</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">_classes_</span><span class="p">)</span>
|
||||
<span class="n">train_data_embed</span> <span class="o">=</span> <span class="n">LabelledCollection</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">classifier</span><span class="o">.</span><span class="n">transform</span><span class="p">(</span><span class="n">train_data</span><span class="o">.</span><span class="n">instances</span><span class="p">),</span> <span class="n">train_data</span><span class="o">.</span><span class="n">labels</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">_classes_</span><span class="p">)</span>
|
||||
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">quantifiers</span> <span class="o">=</span> <span class="p">{</span>
|
||||
<span class="s1">'cc'</span><span class="p">:</span> <span class="n">CC</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">classifier</span><span class="p">)</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="kc">None</span><span class="p">,</span> <span class="n">fit_classifier</span><span class="o">=</span><span class="kc">False</span><span class="p">),</span>
|
||||
<span class="s1">'acc'</span><span class="p">:</span> <span class="n">ACC</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">classifier</span><span class="p">)</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="kc">None</span><span class="p">,</span> <span class="n">fit_classifier</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span> <span class="n">val_split</span><span class="o">=</span><span class="n">valid_data</span><span class="p">),</span>
|
||||
<span class="s1">'pcc'</span><span class="p">:</span> <span class="n">PCC</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">classifier</span><span class="p">)</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="kc">None</span><span class="p">,</span> <span class="n">fit_classifier</span><span class="o">=</span><span class="kc">False</span><span class="p">),</span>
|
||||
<span class="s1">'pacc'</span><span class="p">:</span> <span class="n">PACC</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">classifier</span><span class="p">)</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="kc">None</span><span class="p">,</span> <span class="n">fit_classifier</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span> <span class="n">val_split</span><span class="o">=</span><span class="n">valid_data</span><span class="p">),</span>
|
||||
<span class="p">}</span>
|
||||
<span class="k">if</span> <span class="n">classifier_data</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span><span class="p">:</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">quantifiers</span><span class="p">[</span><span class="s1">'emq'</span><span class="p">]</span> <span class="o">=</span> <span class="n">EMQ</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">classifier</span><span class="p">)</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">classifier_data</span><span class="p">,</span> <span class="n">fit_classifier</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span>
|
||||
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">status</span> <span class="o">=</span> <span class="p">{</span>
|
||||
<span class="s1">'tr-loss'</span><span class="p">:</span> <span class="o">-</span><span class="mi">1</span><span class="p">,</span>
|
||||
<span class="s1">'va-loss'</span><span class="p">:</span> <span class="o">-</span><span class="mi">1</span><span class="p">,</span>
|
||||
<span class="s1">'tr-mae'</span><span class="p">:</span> <span class="o">-</span><span class="mi">1</span><span class="p">,</span>
|
||||
<span class="s1">'va-mae'</span><span class="p">:</span> <span class="o">-</span><span class="mi">1</span><span class="p">,</span>
|
||||
<span class="p">}</span>
|
||||
|
||||
<span class="n">nQ</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">quantifiers</span><span class="p">)</span>
|
||||
<span class="n">nC</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">n_classes</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">quanet</span> <span class="o">=</span> <span class="n">QuaNetModule</span><span class="p">(</span>
|
||||
<span class="n">doc_embedding_size</span><span class="o">=</span><span class="n">train_data_embed</span><span class="o">.</span><span class="n">instances</span><span class="o">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span>
|
||||
<span class="n">n_classes</span><span class="o">=</span><span class="n">data</span><span class="o">.</span><span class="n">n_classes</span><span class="p">,</span>
|
||||
<span class="n">stats_size</span><span class="o">=</span><span class="n">nQ</span><span class="o">*</span><span class="n">nC</span><span class="p">,</span>
|
||||
<span class="n">order_by</span><span class="o">=</span><span class="mi">0</span> <span class="k">if</span> <span class="n">data</span><span class="o">.</span><span class="n">binary</span> <span class="k">else</span> <span class="kc">None</span><span class="p">,</span>
|
||||
<span class="o">**</span><span class="bp">self</span><span class="o">.</span><span class="n">quanet_params</span>
|
||||
<span class="p">)</span><span class="o">.</span><span class="n">to</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">device</span><span class="p">)</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">quanet</span><span class="p">)</span>
|
||||
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">optim</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">optim</span><span class="o">.</span><span class="n">Adam</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">quanet</span><span class="o">.</span><span class="n">parameters</span><span class="p">(),</span> <span class="n">lr</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">lr</span><span class="p">)</span>
|
||||
<span class="n">early_stop</span> <span class="o">=</span> <span class="n">EarlyStop</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">patience</span><span class="p">,</span> <span class="n">lower_is_better</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
|
||||
|
||||
<span class="n">checkpoint</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">checkpoint</span>
|
||||
|
||||
<span class="k">for</span> <span class="n">epoch_i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">n_epochs</span><span class="p">):</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">_epoch</span><span class="p">(</span><span class="n">train_data_embed</span><span class="p">,</span> <span class="n">train_posteriors</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">tr_iter</span><span class="p">,</span> <span class="n">epoch_i</span><span class="p">,</span> <span class="n">early_stop</span><span class="p">,</span> <span class="n">train</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">_epoch</span><span class="p">(</span><span class="n">valid_data_embed</span><span class="p">,</span> <span class="n">valid_posteriors</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">va_iter</span><span class="p">,</span> <span class="n">epoch_i</span><span class="p">,</span> <span class="n">early_stop</span><span class="p">,</span> <span class="n">train</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span>
|
||||
|
||||
<span class="n">early_stop</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">status</span><span class="p">[</span><span class="s1">'va-loss'</span><span class="p">],</span> <span class="n">epoch_i</span><span class="p">)</span>
|
||||
<span class="k">if</span> <span class="n">early_stop</span><span class="o">.</span><span class="n">IMPROVED</span><span class="p">:</span>
|
||||
<span class="n">torch</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">quanet</span><span class="o">.</span><span class="n">state_dict</span><span class="p">(),</span> <span class="n">checkpoint</span><span class="p">)</span>
|
||||
<span class="k">elif</span> <span class="n">early_stop</span><span class="o">.</span><span class="n">STOP</span><span class="p">:</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s1">'training ended by patience exhausted; loading best model parameters in </span><span class="si">{</span><span class="n">checkpoint</span><span class="si">}</span><span class="s1"> '</span>
|
||||
<span class="sa">f</span><span class="s1">'for epoch </span><span class="si">{</span><span class="n">early_stop</span><span class="o">.</span><span class="n">best_epoch</span><span class="si">}</span><span class="s1">'</span><span class="p">)</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">quanet</span><span class="o">.</span><span class="n">load_state_dict</span><span class="p">(</span><span class="n">torch</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="n">checkpoint</span><span class="p">))</span>
|
||||
<span class="k">break</span>
|
||||
|
||||
<span class="k">return</span> <span class="bp">self</span></div>
|
||||
|
||||
<span class="k">def</span> <span class="nf">_get_aggregative_estims</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">posteriors</span><span class="p">):</span>
|
||||
<span class="n">label_predictions</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">argmax</span><span class="p">(</span><span class="n">posteriors</span><span class="p">,</span> <span class="n">axis</span><span class="o">=-</span><span class="mi">1</span><span class="p">)</span>
|
||||
<span class="n">prevs_estim</span> <span class="o">=</span> <span class="p">[]</span>
|
||||
<span class="k">for</span> <span class="n">quantifier</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">quantifiers</span><span class="o">.</span><span class="n">values</span><span class="p">():</span>
|
||||
<span class="n">predictions</span> <span class="o">=</span> <span class="n">posteriors</span> <span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">quantifier</span><span class="p">,</span> <span class="n">AggregativeSoftQuantifier</span><span class="p">)</span> <span class="k">else</span> <span class="n">label_predictions</span>
|
||||
<span class="n">prevs_estim</span><span class="o">.</span><span class="n">extend</span><span class="p">(</span><span class="n">quantifier</span><span class="o">.</span><span class="n">aggregate</span><span class="p">(</span><span class="n">predictions</span><span class="p">))</span>
|
||||
|
||||
<span class="c1"># there is no real need for adding static estims like the TPR or FPR from training since those are constant</span>
|
||||
|
||||
<span class="k">return</span> <span class="n">prevs_estim</span>
|
||||
|
||||
<div class="viewcode-block" id="QuaNetTrainer.quantify"><a class="viewcode-back" href="../../../quapy.method.html#quapy.method._neural.QuaNetTrainer.quantify">[docs]</a> <span class="k">def</span> <span class="nf">quantify</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">instances</span><span class="p">):</span>
|
||||
<span class="n">posteriors</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">classifier</span><span class="o">.</span><span class="n">predict_proba</span><span class="p">(</span><span class="n">instances</span><span class="p">)</span>
|
||||
<span class="n">embeddings</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">classifier</span><span class="o">.</span><span class="n">transform</span><span class="p">(</span><span class="n">instances</span><span class="p">)</span>
|
||||
<span class="n">quant_estims</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_get_aggregative_estims</span><span class="p">(</span><span class="n">posteriors</span><span class="p">)</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">quanet</span><span class="o">.</span><span class="n">eval</span><span class="p">()</span>
|
||||
<span class="k">with</span> <span class="n">torch</span><span class="o">.</span><span class="n">no_grad</span><span class="p">():</span>
|
||||
<span class="n">prevalence</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">quanet</span><span class="o">.</span><span class="n">forward</span><span class="p">(</span><span class="n">embeddings</span><span class="p">,</span> <span class="n">posteriors</span><span class="p">,</span> <span class="n">quant_estims</span><span class="p">)</span>
|
||||
<span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">device</span> <span class="o">==</span> <span class="n">torch</span><span class="o">.</span><span class="n">device</span><span class="p">(</span><span class="s1">'cuda'</span><span class="p">):</span>
|
||||
<span class="n">prevalence</span> <span class="o">=</span> <span class="n">prevalence</span><span class="o">.</span><span class="n">cpu</span><span class="p">()</span>
|
||||
<span class="n">prevalence</span> <span class="o">=</span> <span class="n">prevalence</span><span class="o">.</span><span class="n">numpy</span><span class="p">()</span><span class="o">.</span><span class="n">flatten</span><span class="p">()</span>
|
||||
<span class="k">return</span> <span class="n">prevalence</span></div>
|
||||
|
||||
<span class="k">def</span> <span class="nf">_epoch</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">data</span><span class="p">:</span> <span class="n">LabelledCollection</span><span class="p">,</span> <span class="n">posteriors</span><span class="p">,</span> <span class="n">iterations</span><span class="p">,</span> <span class="n">epoch</span><span class="p">,</span> <span class="n">early_stop</span><span class="p">,</span> <span class="n">train</span><span class="p">):</span>
|
||||
<span class="n">mse_loss</span> <span class="o">=</span> <span class="n">MSELoss</span><span class="p">()</span>
|
||||
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">quanet</span><span class="o">.</span><span class="n">train</span><span class="p">(</span><span class="n">mode</span><span class="o">=</span><span class="n">train</span><span class="p">)</span>
|
||||
<span class="n">losses</span> <span class="o">=</span> <span class="p">[]</span>
|
||||
<span class="n">mae_errors</span> <span class="o">=</span> <span class="p">[]</span>
|
||||
<span class="n">sampler</span> <span class="o">=</span> <span class="n">UPP</span><span class="p">(</span>
|
||||
<span class="n">data</span><span class="p">,</span>
|
||||
<span class="n">sample_size</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">sample_size</span><span class="p">,</span>
|
||||
<span class="n">repeats</span><span class="o">=</span><span class="n">iterations</span><span class="p">,</span>
|
||||
<span class="n">random_state</span><span class="o">=</span><span class="kc">None</span> <span class="k">if</span> <span class="n">train</span> <span class="k">else</span> <span class="mi">0</span> <span class="c1"># different samples during train, same samples during validation</span>
|
||||
<span class="p">)</span>
|
||||
<span class="n">pbar</span> <span class="o">=</span> <span class="n">tqdm</span><span class="p">(</span><span class="n">sampler</span><span class="o">.</span><span class="n">samples_parameters</span><span class="p">(),</span> <span class="n">total</span><span class="o">=</span><span class="n">sampler</span><span class="o">.</span><span class="n">total</span><span class="p">())</span>
|
||||
<span class="k">for</span> <span class="n">it</span><span class="p">,</span> <span class="n">index</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">pbar</span><span class="p">):</span>
|
||||
<span class="n">sample_data</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">sampling_from_index</span><span class="p">(</span><span class="n">index</span><span class="p">)</span>
|
||||
<span class="n">sample_posteriors</span> <span class="o">=</span> <span class="n">posteriors</span><span class="p">[</span><span class="n">index</span><span class="p">]</span>
|
||||
<span class="n">quant_estims</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_get_aggregative_estims</span><span class="p">(</span><span class="n">sample_posteriors</span><span class="p">)</span>
|
||||
<span class="n">ptrue</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">as_tensor</span><span class="p">([</span><span class="n">sample_data</span><span class="o">.</span><span class="n">prevalence</span><span class="p">()],</span> <span class="n">dtype</span><span class="o">=</span><span class="n">torch</span><span class="o">.</span><span class="n">float</span><span class="p">,</span> <span class="n">device</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">device</span><span class="p">)</span>
|
||||
<span class="k">if</span> <span class="n">train</span><span class="p">:</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">optim</span><span class="o">.</span><span class="n">zero_grad</span><span class="p">()</span>
|
||||
<span class="n">phat</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">quanet</span><span class="o">.</span><span class="n">forward</span><span class="p">(</span><span class="n">sample_data</span><span class="o">.</span><span class="n">instances</span><span class="p">,</span> <span class="n">sample_posteriors</span><span class="p">,</span> <span class="n">quant_estims</span><span class="p">)</span>
|
||||
<span class="n">loss</span> <span class="o">=</span> <span class="n">mse_loss</span><span class="p">(</span><span class="n">phat</span><span class="p">,</span> <span class="n">ptrue</span><span class="p">)</span>
|
||||
<span class="n">mae</span> <span class="o">=</span> <span class="n">mae_loss</span><span class="p">(</span><span class="n">phat</span><span class="p">,</span> <span class="n">ptrue</span><span class="p">)</span>
|
||||
<span class="n">loss</span><span class="o">.</span><span class="n">backward</span><span class="p">()</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">optim</span><span class="o">.</span><span class="n">step</span><span class="p">()</span>
|
||||
<span class="k">else</span><span class="p">:</span>
|
||||
<span class="k">with</span> <span class="n">torch</span><span class="o">.</span><span class="n">no_grad</span><span class="p">():</span>
|
||||
<span class="n">phat</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">quanet</span><span class="o">.</span><span class="n">forward</span><span class="p">(</span><span class="n">sample_data</span><span class="o">.</span><span class="n">instances</span><span class="p">,</span> <span class="n">sample_posteriors</span><span class="p">,</span> <span class="n">quant_estims</span><span class="p">)</span>
|
||||
<span class="n">loss</span> <span class="o">=</span> <span class="n">mse_loss</span><span class="p">(</span><span class="n">phat</span><span class="p">,</span> <span class="n">ptrue</span><span class="p">)</span>
|
||||
<span class="n">mae</span> <span class="o">=</span> <span class="n">mae_loss</span><span class="p">(</span><span class="n">phat</span><span class="p">,</span> <span class="n">ptrue</span><span class="p">)</span>
|
||||
|
||||
<span class="n">losses</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">loss</span><span class="o">.</span><span class="n">item</span><span class="p">())</span>
|
||||
<span class="n">mae_errors</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">mae</span><span class="o">.</span><span class="n">item</span><span class="p">())</span>
|
||||
|
||||
<span class="n">mse</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">mean</span><span class="p">(</span><span class="n">losses</span><span class="p">)</span>
|
||||
<span class="n">mae</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">mean</span><span class="p">(</span><span class="n">mae_errors</span><span class="p">)</span>
|
||||
<span class="k">if</span> <span class="n">train</span><span class="p">:</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">status</span><span class="p">[</span><span class="s1">'tr-loss'</span><span class="p">]</span> <span class="o">=</span> <span class="n">mse</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">status</span><span class="p">[</span><span class="s1">'tr-mae'</span><span class="p">]</span> <span class="o">=</span> <span class="n">mae</span>
|
||||
<span class="k">else</span><span class="p">:</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">status</span><span class="p">[</span><span class="s1">'va-loss'</span><span class="p">]</span> <span class="o">=</span> <span class="n">mse</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">status</span><span class="p">[</span><span class="s1">'va-mae'</span><span class="p">]</span> <span class="o">=</span> <span class="n">mae</span>
|
||||
|
||||
<span class="k">if</span> <span class="n">train</span><span class="p">:</span>
|
||||
<span class="n">pbar</span><span class="o">.</span><span class="n">set_description</span><span class="p">(</span><span class="sa">f</span><span class="s1">'[QuaNet] '</span>
|
||||
<span class="sa">f</span><span class="s1">'epoch=</span><span class="si">{</span><span class="n">epoch</span><span class="si">}</span><span class="s1"> [it=</span><span class="si">{</span><span class="n">it</span><span class="si">}</span><span class="s1">/</span><span class="si">{</span><span class="n">iterations</span><span class="si">}</span><span class="s1">]</span><span class="se">\t</span><span class="s1">'</span>
|
||||
<span class="sa">f</span><span class="s1">'tr-mseloss=</span><span class="si">{</span><span class="bp">self</span><span class="o">.</span><span class="n">status</span><span class="p">[</span><span class="s2">"tr-loss"</span><span class="p">]</span><span class="si">:</span><span class="s1">.5f</span><span class="si">}</span><span class="s1"> tr-maeloss=</span><span class="si">{</span><span class="bp">self</span><span class="o">.</span><span class="n">status</span><span class="p">[</span><span class="s2">"tr-mae"</span><span class="p">]</span><span class="si">:</span><span class="s1">.5f</span><span class="si">}</span><span class="se">\t</span><span class="s1">'</span>
|
||||
<span class="sa">f</span><span class="s1">'val-mseloss=</span><span class="si">{</span><span class="bp">self</span><span class="o">.</span><span class="n">status</span><span class="p">[</span><span class="s2">"va-loss"</span><span class="p">]</span><span class="si">:</span><span class="s1">.5f</span><span class="si">}</span><span class="s1"> val-maeloss=</span><span class="si">{</span><span class="bp">self</span><span class="o">.</span><span class="n">status</span><span class="p">[</span><span class="s2">"va-mae"</span><span class="p">]</span><span class="si">:</span><span class="s1">.5f</span><span class="si">}</span><span class="s1"> '</span>
|
||||
<span class="sa">f</span><span class="s1">'patience=</span><span class="si">{</span><span class="n">early_stop</span><span class="o">.</span><span class="n">patience</span><span class="si">}</span><span class="s1">/</span><span class="si">{</span><span class="n">early_stop</span><span class="o">.</span><span class="n">PATIENCE_LIMIT</span><span class="si">}</span><span class="s1">'</span><span class="p">)</span>
|
||||
|
||||
<div class="viewcode-block" id="QuaNetTrainer.get_params"><a class="viewcode-back" href="../../../quapy.method.html#quapy.method._neural.QuaNetTrainer.get_params">[docs]</a> <span class="k">def</span> <span class="nf">get_params</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">deep</span><span class="o">=</span><span class="kc">True</span><span class="p">):</span>
|
||||
<span class="n">classifier_params</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">classifier</span><span class="o">.</span><span class="n">get_params</span><span class="p">()</span>
|
||||
<span class="n">classifier_params</span> <span class="o">=</span> <span class="p">{</span><span class="s1">'classifier__'</span><span class="o">+</span><span class="n">k</span><span class="p">:</span><span class="n">v</span> <span class="k">for</span> <span class="n">k</span><span class="p">,</span><span class="n">v</span> <span class="ow">in</span> <span class="n">classifier_params</span><span class="o">.</span><span class="n">items</span><span class="p">()}</span>
|
||||
<span class="k">return</span> <span class="p">{</span><span class="o">**</span><span class="n">classifier_params</span><span class="p">,</span> <span class="o">**</span><span class="bp">self</span><span class="o">.</span><span class="n">quanet_params</span><span class="p">}</span></div>
|
||||
|
||||
<div class="viewcode-block" id="QuaNetTrainer.set_params"><a class="viewcode-back" href="../../../quapy.method.html#quapy.method._neural.QuaNetTrainer.set_params">[docs]</a> <span class="k">def</span> <span class="nf">set_params</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="o">**</span><span class="n">parameters</span><span class="p">):</span>
|
||||
<span class="n">learner_params</span> <span class="o">=</span> <span class="p">{}</span>
|
||||
<span class="k">for</span> <span class="n">key</span><span class="p">,</span> <span class="n">val</span> <span class="ow">in</span> <span class="n">parameters</span><span class="o">.</span><span class="n">items</span><span class="p">():</span>
|
||||
<span class="k">if</span> <span class="n">key</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">quanet_params</span><span class="p">:</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">quanet_params</span><span class="p">[</span><span class="n">key</span><span class="p">]</span> <span class="o">=</span> <span class="n">val</span>
|
||||
<span class="k">elif</span> <span class="n">key</span><span class="o">.</span><span class="n">startswith</span><span class="p">(</span><span class="s1">'classifier__'</span><span class="p">):</span>
|
||||
<span class="n">learner_params</span><span class="p">[</span><span class="n">key</span><span class="o">.</span><span class="n">replace</span><span class="p">(</span><span class="s1">'classifier__'</span><span class="p">,</span> <span class="s1">''</span><span class="p">)]</span> <span class="o">=</span> <span class="n">val</span>
|
||||
<span class="k">else</span><span class="p">:</span>
|
||||
<span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span><span class="s1">'unknown parameter '</span><span class="p">,</span> <span class="n">key</span><span class="p">)</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">classifier</span><span class="o">.</span><span class="n">set_params</span><span class="p">(</span><span class="o">**</span><span class="n">learner_params</span><span class="p">)</span></div>
|
||||
|
||||
<span class="k">def</span> <span class="nf">__check_params_colision</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">quanet_params</span><span class="p">,</span> <span class="n">learner_params</span><span class="p">):</span>
|
||||
<span class="n">quanet_keys</span> <span class="o">=</span> <span class="nb">set</span><span class="p">(</span><span class="n">quanet_params</span><span class="o">.</span><span class="n">keys</span><span class="p">())</span>
|
||||
<span class="n">learner_keys</span> <span class="o">=</span> <span class="nb">set</span><span class="p">(</span><span class="n">learner_params</span><span class="o">.</span><span class="n">keys</span><span class="p">())</span>
|
||||
<span class="n">intersection</span> <span class="o">=</span> <span class="n">quanet_keys</span><span class="o">.</span><span class="n">intersection</span><span class="p">(</span><span class="n">learner_keys</span><span class="p">)</span>
|
||||
<span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">intersection</span><span class="p">)</span> <span class="o">></span> <span class="mi">0</span><span class="p">:</span>
|
||||
<span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span><span class="sa">f</span><span class="s1">'the use of parameters </span><span class="si">{</span><span class="n">intersection</span><span class="si">}</span><span class="s1"> is ambiguous sine those can refer to '</span>
|
||||
<span class="sa">f</span><span class="s1">'the parameters of QuaNet or the learner </span><span class="si">{</span><span class="bp">self</span><span class="o">.</span><span class="n">classifier</span><span class="o">.</span><span class="vm">__class__</span><span class="o">.</span><span class="vm">__name__</span><span class="si">}</span><span class="s1">'</span><span class="p">)</span>
|
||||
|
||||
<div class="viewcode-block" id="QuaNetTrainer.clean_checkpoint"><a class="viewcode-back" href="../../../quapy.method.html#quapy.method._neural.QuaNetTrainer.clean_checkpoint">[docs]</a> <span class="k">def</span> <span class="nf">clean_checkpoint</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Removes the checkpoint</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="n">os</span><span class="o">.</span><span class="n">remove</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">checkpoint</span><span class="p">)</span></div>
|
||||
|
||||
<div class="viewcode-block" id="QuaNetTrainer.clean_checkpoint_dir"><a class="viewcode-back" href="../../../quapy.method.html#quapy.method._neural.QuaNetTrainer.clean_checkpoint_dir">[docs]</a> <span class="k">def</span> <span class="nf">clean_checkpoint_dir</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Removes anything contained in the checkpoint directory</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="kn">import</span> <span class="nn">shutil</span>
|
||||
<span class="n">shutil</span><span class="o">.</span><span class="n">rmtree</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">checkpointdir</span><span class="p">,</span> <span class="n">ignore_errors</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span></div>
|
||||
|
||||
<span class="nd">@property</span>
|
||||
<span class="k">def</span> <span class="nf">classes_</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">_classes_</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="mae_loss"><a class="viewcode-back" href="../../../quapy.method.html#quapy.method._neural.mae_loss">[docs]</a><span class="k">def</span> <span class="nf">mae_loss</span><span class="p">(</span><span class="n">output</span><span class="p">,</span> <span class="n">target</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Torch-like wrapper for the Mean Absolute Error</span>
|
||||
|
||||
<span class="sd"> :param output: predictions</span>
|
||||
<span class="sd"> :param target: ground truth values</span>
|
||||
<span class="sd"> :return: mean absolute error loss</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">return</span> <span class="n">torch</span><span class="o">.</span><span class="n">mean</span><span class="p">(</span><span class="n">torch</span><span class="o">.</span><span class="n">abs</span><span class="p">(</span><span class="n">output</span> <span class="o">-</span> <span class="n">target</span><span class="p">))</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="QuaNetModule"><a class="viewcode-back" href="../../../quapy.method.html#quapy.method._neural.QuaNetModule">[docs]</a><span class="k">class</span> <span class="nc">QuaNetModule</span><span class="p">(</span><span class="n">torch</span><span class="o">.</span><span class="n">nn</span><span class="o">.</span><span class="n">Module</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Implements the `QuaNet <https://dl.acm.org/doi/abs/10.1145/3269206.3269287>`_ forward pass.</span>
|
||||
<span class="sd"> See :class:`QuaNetTrainer` for training QuaNet.</span>
|
||||
|
||||
<span class="sd"> :param doc_embedding_size: integer, the dimensionality of the document embeddings</span>
|
||||
<span class="sd"> :param n_classes: integer, number of classes</span>
|
||||
<span class="sd"> :param stats_size: integer, number of statistics estimated by simple quantification methods</span>
|
||||
<span class="sd"> :param lstm_hidden_size: integer, hidden dimensionality of the LSTM cell</span>
|
||||
<span class="sd"> :param lstm_nlayers: integer, number of LSTM layers</span>
|
||||
<span class="sd"> :param ff_layers: list of integers, dimensions of the densely-connected FF layers on top of the</span>
|
||||
<span class="sd"> quantification embedding</span>
|
||||
<span class="sd"> :param bidirectional: boolean, whether or not to use bidirectional LSTM</span>
|
||||
<span class="sd"> :param qdrop_p: float, dropout probability</span>
|
||||
<span class="sd"> :param order_by: integer, class for which the document embeddings are to be sorted</span>
|
||||
<span class="sd"> """</span>
|
||||
|
||||
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span>
|
||||
<span class="n">doc_embedding_size</span><span class="p">,</span>
|
||||
<span class="n">n_classes</span><span class="p">,</span>
|
||||
<span class="n">stats_size</span><span class="p">,</span>
|
||||
<span class="n">lstm_hidden_size</span><span class="o">=</span><span class="mi">64</span><span class="p">,</span>
|
||||
<span class="n">lstm_nlayers</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span>
|
||||
<span class="n">ff_layers</span><span class="o">=</span><span class="p">[</span><span class="mi">1024</span><span class="p">,</span> <span class="mi">512</span><span class="p">],</span>
|
||||
<span class="n">bidirectional</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span>
|
||||
<span class="n">qdrop_p</span><span class="o">=</span><span class="mf">0.5</span><span class="p">,</span>
|
||||
<span class="n">order_by</span><span class="o">=</span><span class="mi">0</span><span class="p">):</span>
|
||||
|
||||
<span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="fm">__init__</span><span class="p">()</span>
|
||||
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">n_classes</span> <span class="o">=</span> <span class="n">n_classes</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">order_by</span> <span class="o">=</span> <span class="n">order_by</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">hidden_size</span> <span class="o">=</span> <span class="n">lstm_hidden_size</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">nlayers</span> <span class="o">=</span> <span class="n">lstm_nlayers</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">bidirectional</span> <span class="o">=</span> <span class="n">bidirectional</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">ndirections</span> <span class="o">=</span> <span class="mi">2</span> <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">bidirectional</span> <span class="k">else</span> <span class="mi">1</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">qdrop_p</span> <span class="o">=</span> <span class="n">qdrop_p</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">lstm</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">nn</span><span class="o">.</span><span class="n">LSTM</span><span class="p">(</span><span class="n">doc_embedding_size</span> <span class="o">+</span> <span class="n">n_classes</span><span class="p">,</span> <span class="c1"># +n_classes stands for the posterior probs. (concatenated)</span>
|
||||
<span class="n">lstm_hidden_size</span><span class="p">,</span> <span class="n">lstm_nlayers</span><span class="p">,</span> <span class="n">bidirectional</span><span class="o">=</span><span class="n">bidirectional</span><span class="p">,</span>
|
||||
<span class="n">dropout</span><span class="o">=</span><span class="n">qdrop_p</span><span class="p">,</span> <span class="n">batch_first</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">dropout</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">nn</span><span class="o">.</span><span class="n">Dropout</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">qdrop_p</span><span class="p">)</span>
|
||||
|
||||
<span class="n">lstm_output_size</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">hidden_size</span> <span class="o">*</span> <span class="bp">self</span><span class="o">.</span><span class="n">ndirections</span>
|
||||
<span class="n">ff_input_size</span> <span class="o">=</span> <span class="n">lstm_output_size</span> <span class="o">+</span> <span class="n">stats_size</span>
|
||||
<span class="n">prev_size</span> <span class="o">=</span> <span class="n">ff_input_size</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">ff_layers</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">nn</span><span class="o">.</span><span class="n">ModuleList</span><span class="p">()</span>
|
||||
<span class="k">for</span> <span class="n">lin_size</span> <span class="ow">in</span> <span class="n">ff_layers</span><span class="p">:</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">ff_layers</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">torch</span><span class="o">.</span><span class="n">nn</span><span class="o">.</span><span class="n">Linear</span><span class="p">(</span><span class="n">prev_size</span><span class="p">,</span> <span class="n">lin_size</span><span class="p">))</span>
|
||||
<span class="n">prev_size</span> <span class="o">=</span> <span class="n">lin_size</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">output</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">nn</span><span class="o">.</span><span class="n">Linear</span><span class="p">(</span><span class="n">prev_size</span><span class="p">,</span> <span class="n">n_classes</span><span class="p">)</span>
|
||||
|
||||
<span class="nd">@property</span>
|
||||
<span class="k">def</span> <span class="nf">device</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="k">return</span> <span class="n">torch</span><span class="o">.</span><span class="n">device</span><span class="p">(</span><span class="s1">'cuda'</span><span class="p">)</span> <span class="k">if</span> <span class="nb">next</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">parameters</span><span class="p">())</span><span class="o">.</span><span class="n">is_cuda</span> <span class="k">else</span> <span class="n">torch</span><span class="o">.</span><span class="n">device</span><span class="p">(</span><span class="s1">'cpu'</span><span class="p">)</span>
|
||||
|
||||
<span class="k">def</span> <span class="nf">_init_hidden</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="n">directions</span> <span class="o">=</span> <span class="mi">2</span> <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">bidirectional</span> <span class="k">else</span> <span class="mi">1</span>
|
||||
<span class="n">var_hidden</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">zeros</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">nlayers</span> <span class="o">*</span> <span class="n">directions</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">hidden_size</span><span class="p">)</span>
|
||||
<span class="n">var_cell</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">zeros</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">nlayers</span> <span class="o">*</span> <span class="n">directions</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">hidden_size</span><span class="p">)</span>
|
||||
<span class="k">if</span> <span class="nb">next</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">lstm</span><span class="o">.</span><span class="n">parameters</span><span class="p">())</span><span class="o">.</span><span class="n">is_cuda</span><span class="p">:</span>
|
||||
<span class="n">var_hidden</span><span class="p">,</span> <span class="n">var_cell</span> <span class="o">=</span> <span class="n">var_hidden</span><span class="o">.</span><span class="n">cuda</span><span class="p">(),</span> <span class="n">var_cell</span><span class="o">.</span><span class="n">cuda</span><span class="p">()</span>
|
||||
<span class="k">return</span> <span class="n">var_hidden</span><span class="p">,</span> <span class="n">var_cell</span>
|
||||
|
||||
<div class="viewcode-block" id="QuaNetModule.forward"><a class="viewcode-back" href="../../../quapy.method.html#quapy.method._neural.QuaNetModule.forward">[docs]</a> <span class="k">def</span> <span class="nf">forward</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">doc_embeddings</span><span class="p">,</span> <span class="n">doc_posteriors</span><span class="p">,</span> <span class="n">statistics</span><span class="p">):</span>
|
||||
<span class="n">device</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">device</span>
|
||||
<span class="n">doc_embeddings</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">as_tensor</span><span class="p">(</span><span class="n">doc_embeddings</span><span class="p">,</span> <span class="n">dtype</span><span class="o">=</span><span class="n">torch</span><span class="o">.</span><span class="n">float</span><span class="p">,</span> <span class="n">device</span><span class="o">=</span><span class="n">device</span><span class="p">)</span>
|
||||
<span class="n">doc_posteriors</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">as_tensor</span><span class="p">(</span><span class="n">doc_posteriors</span><span class="p">,</span> <span class="n">dtype</span><span class="o">=</span><span class="n">torch</span><span class="o">.</span><span class="n">float</span><span class="p">,</span> <span class="n">device</span><span class="o">=</span><span class="n">device</span><span class="p">)</span>
|
||||
<span class="n">statistics</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">as_tensor</span><span class="p">(</span><span class="n">statistics</span><span class="p">,</span> <span class="n">dtype</span><span class="o">=</span><span class="n">torch</span><span class="o">.</span><span class="n">float</span><span class="p">,</span> <span class="n">device</span><span class="o">=</span><span class="n">device</span><span class="p">)</span>
|
||||
|
||||
<span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">order_by</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span><span class="p">:</span>
|
||||
<span class="n">order</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">argsort</span><span class="p">(</span><span class="n">doc_posteriors</span><span class="p">[:,</span> <span class="bp">self</span><span class="o">.</span><span class="n">order_by</span><span class="p">])</span>
|
||||
<span class="n">doc_embeddings</span> <span class="o">=</span> <span class="n">doc_embeddings</span><span class="p">[</span><span class="n">order</span><span class="p">]</span>
|
||||
<span class="n">doc_posteriors</span> <span class="o">=</span> <span class="n">doc_posteriors</span><span class="p">[</span><span class="n">order</span><span class="p">]</span>
|
||||
|
||||
<span class="n">embeded_posteriors</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">cat</span><span class="p">((</span><span class="n">doc_embeddings</span><span class="p">,</span> <span class="n">doc_posteriors</span><span class="p">),</span> <span class="n">dim</span><span class="o">=-</span><span class="mi">1</span><span class="p">)</span>
|
||||
|
||||
<span class="c1"># the entire set represents only one instance in quapy contexts, and so the batch_size=1</span>
|
||||
<span class="c1"># the shape should be (1, number-of-instances, embedding-size + n_classes)</span>
|
||||
<span class="n">embeded_posteriors</span> <span class="o">=</span> <span class="n">embeded_posteriors</span><span class="o">.</span><span class="n">unsqueeze</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
|
||||
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">lstm</span><span class="o">.</span><span class="n">flatten_parameters</span><span class="p">()</span>
|
||||
<span class="n">_</span><span class="p">,</span> <span class="p">(</span><span class="n">rnn_hidden</span><span class="p">,</span><span class="n">_</span><span class="p">)</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">lstm</span><span class="p">(</span><span class="n">embeded_posteriors</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">_init_hidden</span><span class="p">())</span>
|
||||
<span class="n">rnn_hidden</span> <span class="o">=</span> <span class="n">rnn_hidden</span><span class="o">.</span><span class="n">view</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">nlayers</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">ndirections</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">hidden_size</span><span class="p">)</span>
|
||||
<span class="n">quant_embedding</span> <span class="o">=</span> <span class="n">rnn_hidden</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">.</span><span class="n">view</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">)</span>
|
||||
<span class="n">quant_embedding</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">cat</span><span class="p">((</span><span class="n">quant_embedding</span><span class="p">,</span> <span class="n">statistics</span><span class="p">))</span>
|
||||
|
||||
<span class="n">abstracted</span> <span class="o">=</span> <span class="n">quant_embedding</span><span class="o">.</span><span class="n">unsqueeze</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
|
||||
<span class="k">for</span> <span class="n">linear</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">ff_layers</span><span class="p">:</span>
|
||||
<span class="n">abstracted</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">dropout</span><span class="p">(</span><span class="n">relu</span><span class="p">(</span><span class="n">linear</span><span class="p">(</span><span class="n">abstracted</span><span class="p">)))</span>
|
||||
|
||||
<span class="n">logits</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">output</span><span class="p">(</span><span class="n">abstracted</span><span class="p">)</span><span class="o">.</span><span class="n">view</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span>
|
||||
<span class="n">prevalence</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">softmax</span><span class="p">(</span><span class="n">logits</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span>
|
||||
|
||||
<span class="k">return</span> <span class="n">prevalence</span></div></div>
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
</pre></div>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
<footer>
|
||||
|
||||
<hr/>
|
||||
|
||||
<div role="contentinfo">
|
||||
<p>© Copyright 2024, Alejandro Moreo.</p>
|
||||
</div>
|
||||
|
||||
Built with <a href="https://www.sphinx-doc.org/">Sphinx</a> using a
|
||||
<a href="https://github.com/readthedocs/sphinx_rtd_theme">theme</a>
|
||||
provided by <a href="https://readthedocs.org">Read the Docs</a>.
|
||||
|
||||
|
||||
</footer>
|
||||
</div>
|
||||
</div>
|
||||
</section>
|
||||
</div>
|
||||
<script>
|
||||
jQuery(function () {
|
||||
SphinxRtdTheme.Navigation.enable(true);
|
||||
});
|
||||
</script>
|
||||
|
||||
</body>
|
||||
</html>
|
||||
|
|
@ -0,0 +1,364 @@
|
|||
<!DOCTYPE html>
|
||||
<html class="writer-html5" lang="en">
|
||||
<head>
|
||||
<meta charset="utf-8" />
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
|
||||
<title>quapy.method._threshold_optim — QuaPy: A Python-based open-source framework for quantification 0.1.8 documentation</title>
|
||||
<link rel="stylesheet" type="text/css" href="../../../_static/pygments.css" />
|
||||
<link rel="stylesheet" type="text/css" href="../../../_static/css/theme.css" />
|
||||
|
||||
|
||||
<!--[if lt IE 9]>
|
||||
<script src="../../../_static/js/html5shiv.min.js"></script>
|
||||
<![endif]-->
|
||||
|
||||
<script data-url_root="../../../" id="documentation_options" src="../../../_static/documentation_options.js"></script>
|
||||
<script src="../../../_static/jquery.js"></script>
|
||||
<script src="../../../_static/underscore.js"></script>
|
||||
<script src="../../../_static/_sphinx_javascript_frameworks_compat.js"></script>
|
||||
<script src="../../../_static/doctools.js"></script>
|
||||
<script src="../../../_static/sphinx_highlight.js"></script>
|
||||
<script src="../../../_static/js/theme.js"></script>
|
||||
<link rel="index" title="Index" href="../../../genindex.html" />
|
||||
<link rel="search" title="Search" href="../../../search.html" />
|
||||
</head>
|
||||
|
||||
<body class="wy-body-for-nav">
|
||||
<div class="wy-grid-for-nav">
|
||||
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
|
||||
<div class="wy-side-scroll">
|
||||
<div class="wy-side-nav-search" >
|
||||
|
||||
|
||||
|
||||
<a href="../../../index.html" class="icon icon-home">
|
||||
QuaPy: A Python-based open-source framework for quantification
|
||||
</a>
|
||||
<div role="search">
|
||||
<form id="rtd-search-form" class="wy-form" action="../../../search.html" method="get">
|
||||
<input type="text" name="q" placeholder="Search docs" aria-label="Search docs" />
|
||||
<input type="hidden" name="check_keywords" value="yes" />
|
||||
<input type="hidden" name="area" value="default" />
|
||||
</form>
|
||||
</div>
|
||||
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../../../modules.html">quapy</a></li>
|
||||
</ul>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
</nav>
|
||||
|
||||
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" >
|
||||
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
|
||||
<a href="../../../index.html">QuaPy: A Python-based open-source framework for quantification</a>
|
||||
</nav>
|
||||
|
||||
<div class="wy-nav-content">
|
||||
<div class="rst-content">
|
||||
<div role="navigation" aria-label="Page navigation">
|
||||
<ul class="wy-breadcrumbs">
|
||||
<li><a href="../../../index.html" class="icon icon-home" aria-label="Home"></a></li>
|
||||
<li class="breadcrumb-item"><a href="../../index.html">Module code</a></li>
|
||||
<li class="breadcrumb-item active">quapy.method._threshold_optim</li>
|
||||
<li class="wy-breadcrumbs-aside">
|
||||
</li>
|
||||
</ul>
|
||||
<hr/>
|
||||
</div>
|
||||
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
|
||||
<div itemprop="articleBody">
|
||||
|
||||
<h1>Source code for quapy.method._threshold_optim</h1><div class="highlight"><pre>
|
||||
<span></span><span class="kn">from</span> <span class="nn">abc</span> <span class="kn">import</span> <span class="n">abstractmethod</span>
|
||||
|
||||
<span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span>
|
||||
<span class="kn">from</span> <span class="nn">sklearn.base</span> <span class="kn">import</span> <span class="n">BaseEstimator</span>
|
||||
<span class="kn">import</span> <span class="nn">quapy</span> <span class="k">as</span> <span class="nn">qp</span>
|
||||
<span class="kn">import</span> <span class="nn">quapy.functional</span> <span class="k">as</span> <span class="nn">F</span>
|
||||
<span class="kn">from</span> <span class="nn">quapy.data</span> <span class="kn">import</span> <span class="n">LabelledCollection</span>
|
||||
<span class="kn">from</span> <span class="nn">quapy.method.aggregative</span> <span class="kn">import</span> <span class="n">BinaryAggregativeQuantifier</span>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="ThresholdOptimization"><a class="viewcode-back" href="../../../quapy.method.html#quapy.method._threshold_optim.ThresholdOptimization">[docs]</a><span class="k">class</span> <span class="nc">ThresholdOptimization</span><span class="p">(</span><span class="n">BinaryAggregativeQuantifier</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Abstract class of Threshold Optimization variants for :class:`ACC` as proposed by</span>
|
||||
<span class="sd"> `Forman 2006 <https://dl.acm.org/doi/abs/10.1145/1150402.1150423>`_ and</span>
|
||||
<span class="sd"> `Forman 2008 <https://link.springer.com/article/10.1007/s10618-008-0097-y>`_.</span>
|
||||
<span class="sd"> The goal is to bring improved stability to the denominator of the adjustment.</span>
|
||||
<span class="sd"> The different variants are based on different heuristics for choosing a decision threshold</span>
|
||||
<span class="sd"> that would allow for more true positives and many more false positives, on the grounds this</span>
|
||||
<span class="sd"> would deliver larger denominators.</span>
|
||||
|
||||
<span class="sd"> :param classifier: a sklearn's Estimator that generates a classifier</span>
|
||||
<span class="sd"> :param val_split: indicates the proportion of data to be used as a stratified held-out validation set in which the</span>
|
||||
<span class="sd"> misclassification rates are to be estimated.</span>
|
||||
<span class="sd"> This parameter can be indicated as a real value (between 0 and 1), representing a proportion of</span>
|
||||
<span class="sd"> validation data, or as an integer, indicating that the misclassification rates should be estimated via</span>
|
||||
<span class="sd"> `k`-fold cross validation (this integer stands for the number of folds `k`, defaults 5), or as a</span>
|
||||
<span class="sd"> :class:`quapy.data.base.LabelledCollection` (the split itself).</span>
|
||||
<span class="sd"> """</span>
|
||||
|
||||
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">classifier</span><span class="p">:</span> <span class="n">BaseEstimator</span><span class="p">,</span> <span class="n">val_split</span><span class="o">=</span><span class="mi">5</span><span class="p">,</span> <span class="n">n_jobs</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">classifier</span> <span class="o">=</span> <span class="n">classifier</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">val_split</span> <span class="o">=</span> <span class="n">val_split</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">n_jobs</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">_get_njobs</span><span class="p">(</span><span class="n">n_jobs</span><span class="p">)</span>
|
||||
|
||||
<div class="viewcode-block" id="ThresholdOptimization.condition"><a class="viewcode-back" href="../../../quapy.method.html#quapy.method._threshold_optim.ThresholdOptimization.condition">[docs]</a> <span class="nd">@abstractmethod</span>
|
||||
<span class="k">def</span> <span class="nf">condition</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">tpr</span><span class="p">,</span> <span class="n">fpr</span><span class="p">)</span> <span class="o">-></span> <span class="nb">float</span><span class="p">:</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Implements the criterion according to which the threshold should be selected.</span>
|
||||
<span class="sd"> This function should return the (float) score to be minimized.</span>
|
||||
|
||||
<span class="sd"> :param tpr: float, true positive rate</span>
|
||||
<span class="sd"> :param fpr: float, false positive rate</span>
|
||||
<span class="sd"> :return: float, a score for the given `tpr` and `fpr`</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="o">...</span></div>
|
||||
|
||||
<div class="viewcode-block" id="ThresholdOptimization.discard"><a class="viewcode-back" href="../../../quapy.method.html#quapy.method._threshold_optim.ThresholdOptimization.discard">[docs]</a> <span class="k">def</span> <span class="nf">discard</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">tpr</span><span class="p">,</span> <span class="n">fpr</span><span class="p">)</span> <span class="o">-></span> <span class="nb">bool</span><span class="p">:</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Indicates whether a combination of tpr and fpr should be discarded</span>
|
||||
|
||||
<span class="sd"> :param tpr: float, true positive rate</span>
|
||||
<span class="sd"> :param fpr: float, false positive rate</span>
|
||||
<span class="sd"> :return: true if the combination is to be discarded, false otherwise</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">return</span> <span class="p">(</span><span class="n">tpr</span> <span class="o">-</span> <span class="n">fpr</span><span class="p">)</span> <span class="o">==</span> <span class="mi">0</span></div>
|
||||
|
||||
|
||||
<span class="k">def</span> <span class="nf">_eval_candidate_thresholds</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">decision_scores</span><span class="p">,</span> <span class="n">y</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Seeks for the best `tpr` and `fpr` according to the score obtained at different</span>
|
||||
<span class="sd"> decision thresholds. The scoring function is implemented in function `_condition`.</span>
|
||||
|
||||
<span class="sd"> :param decision_scores: array-like with the classification scores</span>
|
||||
<span class="sd"> :param y: predicted labels for the validation set (or for the training set via `k`-fold cross validation)</span>
|
||||
<span class="sd"> :return: best `tpr` and `fpr` and `threshold` according to `_condition`</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="n">candidate_thresholds</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">unique</span><span class="p">(</span><span class="n">decision_scores</span><span class="p">)</span>
|
||||
|
||||
<span class="n">candidates</span> <span class="o">=</span> <span class="p">[]</span>
|
||||
<span class="n">scores</span> <span class="o">=</span> <span class="p">[]</span>
|
||||
<span class="k">for</span> <span class="n">candidate_threshold</span> <span class="ow">in</span> <span class="n">candidate_thresholds</span><span class="p">:</span>
|
||||
<span class="n">y_</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">classes_</span><span class="p">[</span><span class="mi">1</span> <span class="o">*</span> <span class="p">(</span><span class="n">decision_scores</span> <span class="o">>=</span> <span class="n">candidate_threshold</span><span class="p">)]</span>
|
||||
<span class="n">TP</span><span class="p">,</span> <span class="n">FP</span><span class="p">,</span> <span class="n">FN</span><span class="p">,</span> <span class="n">TN</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_compute_table</span><span class="p">(</span><span class="n">y</span><span class="p">,</span> <span class="n">y_</span><span class="p">)</span>
|
||||
<span class="n">tpr</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_compute_tpr</span><span class="p">(</span><span class="n">TP</span><span class="p">,</span> <span class="n">FN</span><span class="p">)</span>
|
||||
<span class="n">fpr</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_compute_fpr</span><span class="p">(</span><span class="n">FP</span><span class="p">,</span> <span class="n">TN</span><span class="p">)</span>
|
||||
<span class="k">if</span> <span class="ow">not</span> <span class="bp">self</span><span class="o">.</span><span class="n">discard</span><span class="p">(</span><span class="n">tpr</span><span class="p">,</span> <span class="n">fpr</span><span class="p">):</span>
|
||||
<span class="n">candidate_score</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">condition</span><span class="p">(</span><span class="n">tpr</span><span class="p">,</span> <span class="n">fpr</span><span class="p">)</span>
|
||||
<span class="n">candidates</span><span class="o">.</span><span class="n">append</span><span class="p">([</span><span class="n">tpr</span><span class="p">,</span> <span class="n">fpr</span><span class="p">,</span> <span class="n">candidate_threshold</span><span class="p">])</span>
|
||||
<span class="n">scores</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">candidate_score</span><span class="p">)</span>
|
||||
|
||||
<span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">candidates</span><span class="p">)</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>
|
||||
<span class="c1"># if no candidate gives rise to a valid combination of tpr and fpr, this method defaults to the standard</span>
|
||||
<span class="c1"># classify & count; this is akin to assign tpr=1, fpr=0, threshold=0</span>
|
||||
<span class="n">tpr</span><span class="p">,</span> <span class="n">fpr</span><span class="p">,</span> <span class="n">threshold</span> <span class="o">=</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span>
|
||||
<span class="n">candidates</span><span class="o">.</span><span class="n">append</span><span class="p">([</span><span class="n">tpr</span><span class="p">,</span> <span class="n">fpr</span><span class="p">,</span> <span class="n">threshold</span><span class="p">])</span>
|
||||
<span class="n">scores</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
|
||||
|
||||
<span class="n">candidates</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">asarray</span><span class="p">(</span><span class="n">candidates</span><span class="p">)</span>
|
||||
<span class="n">candidates</span> <span class="o">=</span> <span class="n">candidates</span><span class="p">[</span><span class="n">np</span><span class="o">.</span><span class="n">argsort</span><span class="p">(</span><span class="n">scores</span><span class="p">)]</span> <span class="c1"># sort candidates by candidate_score</span>
|
||||
|
||||
<span class="k">return</span> <span class="n">candidates</span>
|
||||
|
||||
<div class="viewcode-block" id="ThresholdOptimization.aggregate_with_threshold"><a class="viewcode-back" href="../../../quapy.method.html#quapy.method._threshold_optim.ThresholdOptimization.aggregate_with_threshold">[docs]</a> <span class="k">def</span> <span class="nf">aggregate_with_threshold</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">classif_predictions</span><span class="p">,</span> <span class="n">tprs</span><span class="p">,</span> <span class="n">fprs</span><span class="p">,</span> <span class="n">thresholds</span><span class="p">):</span>
|
||||
<span class="c1"># This function performs the adjusted count for given tpr, fpr, and threshold.</span>
|
||||
<span class="c1"># Note that, due to broadcasting, tprs, fprs, and thresholds could be arrays of length > 1</span>
|
||||
<span class="n">prevs_estims</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">mean</span><span class="p">(</span><span class="n">classif_predictions</span><span class="p">[:,</span> <span class="kc">None</span><span class="p">]</span> <span class="o">>=</span> <span class="n">thresholds</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
|
||||
<span class="n">prevs_estims</span> <span class="o">=</span> <span class="p">(</span><span class="n">prevs_estims</span> <span class="o">-</span> <span class="n">fprs</span><span class="p">)</span> <span class="o">/</span> <span class="p">(</span><span class="n">tprs</span> <span class="o">-</span> <span class="n">fprs</span><span class="p">)</span>
|
||||
<span class="n">prevs_estims</span> <span class="o">=</span> <span class="n">F</span><span class="o">.</span><span class="n">as_binary_prevalence</span><span class="p">(</span><span class="n">prevs_estims</span><span class="p">,</span> <span class="n">clip_if_necessary</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="n">prevs_estims</span><span class="o">.</span><span class="n">squeeze</span><span class="p">()</span></div>
|
||||
|
||||
<span class="k">def</span> <span class="nf">_compute_table</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="n">y_</span><span class="p">):</span>
|
||||
<span class="n">TP</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">logical_and</span><span class="p">(</span><span class="n">y</span> <span class="o">==</span> <span class="n">y_</span><span class="p">,</span> <span class="n">y</span> <span class="o">==</span> <span class="bp">self</span><span class="o">.</span><span class="n">pos_label</span><span class="p">)</span><span class="o">.</span><span class="n">sum</span><span class="p">()</span>
|
||||
<span class="n">FP</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">logical_and</span><span class="p">(</span><span class="n">y</span> <span class="o">!=</span> <span class="n">y_</span><span class="p">,</span> <span class="n">y</span> <span class="o">==</span> <span class="bp">self</span><span class="o">.</span><span class="n">neg_label</span><span class="p">)</span><span class="o">.</span><span class="n">sum</span><span class="p">()</span>
|
||||
<span class="n">FN</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">logical_and</span><span class="p">(</span><span class="n">y</span> <span class="o">!=</span> <span class="n">y_</span><span class="p">,</span> <span class="n">y</span> <span class="o">==</span> <span class="bp">self</span><span class="o">.</span><span class="n">pos_label</span><span class="p">)</span><span class="o">.</span><span class="n">sum</span><span class="p">()</span>
|
||||
<span class="n">TN</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">logical_and</span><span class="p">(</span><span class="n">y</span> <span class="o">==</span> <span class="n">y_</span><span class="p">,</span> <span class="n">y</span> <span class="o">==</span> <span class="bp">self</span><span class="o">.</span><span class="n">neg_label</span><span class="p">)</span><span class="o">.</span><span class="n">sum</span><span class="p">()</span>
|
||||
<span class="k">return</span> <span class="n">TP</span><span class="p">,</span> <span class="n">FP</span><span class="p">,</span> <span class="n">FN</span><span class="p">,</span> <span class="n">TN</span>
|
||||
|
||||
<span class="k">def</span> <span class="nf">_compute_tpr</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">TP</span><span class="p">,</span> <span class="n">FP</span><span class="p">):</span>
|
||||
<span class="k">if</span> <span class="n">TP</span> <span class="o">+</span> <span class="n">FP</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>
|
||||
<span class="k">return</span> <span class="mi">1</span>
|
||||
<span class="k">return</span> <span class="n">TP</span> <span class="o">/</span> <span class="p">(</span><span class="n">TP</span> <span class="o">+</span> <span class="n">FP</span><span class="p">)</span>
|
||||
|
||||
<span class="k">def</span> <span class="nf">_compute_fpr</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">FP</span><span class="p">,</span> <span class="n">TN</span><span class="p">):</span>
|
||||
<span class="k">if</span> <span class="n">FP</span> <span class="o">+</span> <span class="n">TN</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>
|
||||
<span class="k">return</span> <span class="mi">0</span>
|
||||
<span class="k">return</span> <span class="n">FP</span> <span class="o">/</span> <span class="p">(</span><span class="n">FP</span> <span class="o">+</span> <span class="n">TN</span><span class="p">)</span>
|
||||
|
||||
<div class="viewcode-block" id="ThresholdOptimization.aggregation_fit"><a class="viewcode-back" href="../../../quapy.method.html#quapy.method._threshold_optim.ThresholdOptimization.aggregation_fit">[docs]</a> <span class="k">def</span> <span class="nf">aggregation_fit</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">classif_predictions</span><span class="p">:</span> <span class="n">LabelledCollection</span><span class="p">,</span> <span class="n">data</span><span class="p">:</span> <span class="n">LabelledCollection</span><span class="p">):</span>
|
||||
<span class="n">decision_scores</span><span class="p">,</span> <span class="n">y</span> <span class="o">=</span> <span class="n">classif_predictions</span><span class="o">.</span><span class="n">Xy</span>
|
||||
<span class="c1"># the standard behavior is to keep the best threshold only</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">tpr</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">fpr</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">threshold</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_eval_candidate_thresholds</span><span class="p">(</span><span class="n">decision_scores</span><span class="p">,</span> <span class="n">y</span><span class="p">)[</span><span class="mi">0</span><span class="p">]</span>
|
||||
<span class="k">return</span> <span class="bp">self</span></div>
|
||||
|
||||
<div class="viewcode-block" id="ThresholdOptimization.aggregate"><a class="viewcode-back" href="../../../quapy.method.html#quapy.method._threshold_optim.ThresholdOptimization.aggregate">[docs]</a> <span class="k">def</span> <span class="nf">aggregate</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">classif_predictions</span><span class="p">:</span> <span class="n">np</span><span class="o">.</span><span class="n">ndarray</span><span class="p">):</span>
|
||||
<span class="c1"># the standard behavior is to compute the adjusted count using the best threshold found</span>
|
||||
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">aggregate_with_threshold</span><span class="p">(</span><span class="n">classif_predictions</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">tpr</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">fpr</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">threshold</span><span class="p">)</span></div></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="T50"><a class="viewcode-back" href="../../../quapy.method.html#quapy.method._threshold_optim.T50">[docs]</a><span class="k">class</span> <span class="nc">T50</span><span class="p">(</span><span class="n">ThresholdOptimization</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Threshold Optimization variant for :class:`ACC` as proposed by</span>
|
||||
<span class="sd"> `Forman 2006 <https://dl.acm.org/doi/abs/10.1145/1150402.1150423>`_ and</span>
|
||||
<span class="sd"> `Forman 2008 <https://link.springer.com/article/10.1007/s10618-008-0097-y>`_ that looks</span>
|
||||
<span class="sd"> for the threshold that makes `tpr` closest to 0.5.</span>
|
||||
<span class="sd"> The goal is to bring improved stability to the denominator of the adjustment.</span>
|
||||
|
||||
<span class="sd"> :param classifier: a sklearn's Estimator that generates a classifier</span>
|
||||
<span class="sd"> :param val_split: indicates the proportion of data to be used as a stratified held-out validation set in which the</span>
|
||||
<span class="sd"> misclassification rates are to be estimated.</span>
|
||||
<span class="sd"> This parameter can be indicated as a real value (between 0 and 1), representing a proportion of</span>
|
||||
<span class="sd"> validation data, or as an integer, indicating that the misclassification rates should be estimated via</span>
|
||||
<span class="sd"> `k`-fold cross validation (this integer stands for the number of folds `k`, defaults 5), or as a</span>
|
||||
<span class="sd"> :class:`quapy.data.base.LabelledCollection` (the split itself).</span>
|
||||
<span class="sd"> """</span>
|
||||
|
||||
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">classifier</span><span class="p">:</span> <span class="n">BaseEstimator</span><span class="p">,</span> <span class="n">val_split</span><span class="o">=</span><span class="mi">5</span><span class="p">):</span>
|
||||
<span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="fm">__init__</span><span class="p">(</span><span class="n">classifier</span><span class="p">,</span> <span class="n">val_split</span><span class="p">)</span>
|
||||
|
||||
<div class="viewcode-block" id="T50.condition"><a class="viewcode-back" href="../../../quapy.method.html#quapy.method._threshold_optim.T50.condition">[docs]</a> <span class="k">def</span> <span class="nf">condition</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">tpr</span><span class="p">,</span> <span class="n">fpr</span><span class="p">)</span> <span class="o">-></span> <span class="nb">float</span><span class="p">:</span>
|
||||
<span class="k">return</span> <span class="nb">abs</span><span class="p">(</span><span class="n">tpr</span> <span class="o">-</span> <span class="mf">0.5</span><span class="p">)</span></div></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="MAX"><a class="viewcode-back" href="../../../quapy.method.html#quapy.method._threshold_optim.MAX">[docs]</a><span class="k">class</span> <span class="nc">MAX</span><span class="p">(</span><span class="n">ThresholdOptimization</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Threshold Optimization variant for :class:`ACC` as proposed by</span>
|
||||
<span class="sd"> `Forman 2006 <https://dl.acm.org/doi/abs/10.1145/1150402.1150423>`_ and</span>
|
||||
<span class="sd"> `Forman 2008 <https://link.springer.com/article/10.1007/s10618-008-0097-y>`_ that looks</span>
|
||||
<span class="sd"> for the threshold that maximizes `tpr-fpr`.</span>
|
||||
<span class="sd"> The goal is to bring improved stability to the denominator of the adjustment.</span>
|
||||
|
||||
<span class="sd"> :param classifier: a sklearn's Estimator that generates a classifier</span>
|
||||
<span class="sd"> :param val_split: indicates the proportion of data to be used as a stratified held-out validation set in which the</span>
|
||||
<span class="sd"> misclassification rates are to be estimated.</span>
|
||||
<span class="sd"> This parameter can be indicated as a real value (between 0 and 1), representing a proportion of</span>
|
||||
<span class="sd"> validation data, or as an integer, indicating that the misclassification rates should be estimated via</span>
|
||||
<span class="sd"> `k`-fold cross validation (this integer stands for the number of folds `k`, defaults 5), or as a</span>
|
||||
<span class="sd"> :class:`quapy.data.base.LabelledCollection` (the split itself).</span>
|
||||
<span class="sd"> """</span>
|
||||
|
||||
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">classifier</span><span class="p">:</span> <span class="n">BaseEstimator</span><span class="p">,</span> <span class="n">val_split</span><span class="o">=</span><span class="mi">5</span><span class="p">):</span>
|
||||
<span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="fm">__init__</span><span class="p">(</span><span class="n">classifier</span><span class="p">,</span> <span class="n">val_split</span><span class="p">)</span>
|
||||
|
||||
<div class="viewcode-block" id="MAX.condition"><a class="viewcode-back" href="../../../quapy.method.html#quapy.method._threshold_optim.MAX.condition">[docs]</a> <span class="k">def</span> <span class="nf">condition</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">tpr</span><span class="p">,</span> <span class="n">fpr</span><span class="p">)</span> <span class="o">-></span> <span class="nb">float</span><span class="p">:</span>
|
||||
<span class="c1"># MAX strives to maximize (tpr - fpr), which is equivalent to minimize (fpr - tpr)</span>
|
||||
<span class="k">return</span> <span class="p">(</span><span class="n">fpr</span> <span class="o">-</span> <span class="n">tpr</span><span class="p">)</span></div></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="X"><a class="viewcode-back" href="../../../quapy.method.html#quapy.method._threshold_optim.X">[docs]</a><span class="k">class</span> <span class="nc">X</span><span class="p">(</span><span class="n">ThresholdOptimization</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Threshold Optimization variant for :class:`ACC` as proposed by</span>
|
||||
<span class="sd"> `Forman 2006 <https://dl.acm.org/doi/abs/10.1145/1150402.1150423>`_ and</span>
|
||||
<span class="sd"> `Forman 2008 <https://link.springer.com/article/10.1007/s10618-008-0097-y>`_ that looks</span>
|
||||
<span class="sd"> for the threshold that yields `tpr=1-fpr`.</span>
|
||||
<span class="sd"> The goal is to bring improved stability to the denominator of the adjustment.</span>
|
||||
|
||||
<span class="sd"> :param classifier: a sklearn's Estimator that generates a classifier</span>
|
||||
<span class="sd"> :param val_split: indicates the proportion of data to be used as a stratified held-out validation set in which the</span>
|
||||
<span class="sd"> misclassification rates are to be estimated.</span>
|
||||
<span class="sd"> This parameter can be indicated as a real value (between 0 and 1), representing a proportion of</span>
|
||||
<span class="sd"> validation data, or as an integer, indicating that the misclassification rates should be estimated via</span>
|
||||
<span class="sd"> `k`-fold cross validation (this integer stands for the number of folds `k`, defaults 5), or as a</span>
|
||||
<span class="sd"> :class:`quapy.data.base.LabelledCollection` (the split itself).</span>
|
||||
<span class="sd"> """</span>
|
||||
|
||||
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">classifier</span><span class="p">:</span> <span class="n">BaseEstimator</span><span class="p">,</span> <span class="n">val_split</span><span class="o">=</span><span class="mi">5</span><span class="p">):</span>
|
||||
<span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="fm">__init__</span><span class="p">(</span><span class="n">classifier</span><span class="p">,</span> <span class="n">val_split</span><span class="p">)</span>
|
||||
|
||||
<div class="viewcode-block" id="X.condition"><a class="viewcode-back" href="../../../quapy.method.html#quapy.method._threshold_optim.X.condition">[docs]</a> <span class="k">def</span> <span class="nf">condition</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">tpr</span><span class="p">,</span> <span class="n">fpr</span><span class="p">)</span> <span class="o">-></span> <span class="nb">float</span><span class="p">:</span>
|
||||
<span class="k">return</span> <span class="nb">abs</span><span class="p">(</span><span class="mi">1</span> <span class="o">-</span> <span class="p">(</span><span class="n">tpr</span> <span class="o">+</span> <span class="n">fpr</span><span class="p">))</span></div></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="MS"><a class="viewcode-back" href="../../../quapy.method.html#quapy.method._threshold_optim.MS">[docs]</a><span class="k">class</span> <span class="nc">MS</span><span class="p">(</span><span class="n">ThresholdOptimization</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Median Sweep. Threshold Optimization variant for :class:`ACC` as proposed by</span>
|
||||
<span class="sd"> `Forman 2006 <https://dl.acm.org/doi/abs/10.1145/1150402.1150423>`_ and</span>
|
||||
<span class="sd"> `Forman 2008 <https://link.springer.com/article/10.1007/s10618-008-0097-y>`_ that generates</span>
|
||||
<span class="sd"> class prevalence estimates for all decision thresholds and returns the median of them all.</span>
|
||||
<span class="sd"> The goal is to bring improved stability to the denominator of the adjustment.</span>
|
||||
|
||||
<span class="sd"> :param classifier: a sklearn's Estimator that generates a classifier</span>
|
||||
<span class="sd"> :param val_split: indicates the proportion of data to be used as a stratified held-out validation set in which the</span>
|
||||
<span class="sd"> misclassification rates are to be estimated.</span>
|
||||
<span class="sd"> This parameter can be indicated as a real value (between 0 and 1), representing a proportion of</span>
|
||||
<span class="sd"> validation data, or as an integer, indicating that the misclassification rates should be estimated via</span>
|
||||
<span class="sd"> `k`-fold cross validation (this integer stands for the number of folds `k`, defaults 5), or as a</span>
|
||||
<span class="sd"> :class:`quapy.data.base.LabelledCollection` (the split itself).</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">classifier</span><span class="p">:</span> <span class="n">BaseEstimator</span><span class="p">,</span> <span class="n">val_split</span><span class="o">=</span><span class="mi">5</span><span class="p">):</span>
|
||||
<span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="fm">__init__</span><span class="p">(</span><span class="n">classifier</span><span class="p">,</span> <span class="n">val_split</span><span class="p">)</span>
|
||||
|
||||
<div class="viewcode-block" id="MS.condition"><a class="viewcode-back" href="../../../quapy.method.html#quapy.method._threshold_optim.MS.condition">[docs]</a> <span class="k">def</span> <span class="nf">condition</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">tpr</span><span class="p">,</span> <span class="n">fpr</span><span class="p">)</span> <span class="o">-></span> <span class="nb">float</span><span class="p">:</span>
|
||||
<span class="k">return</span> <span class="mi">1</span></div>
|
||||
|
||||
<div class="viewcode-block" id="MS.aggregation_fit"><a class="viewcode-back" href="../../../quapy.method.html#quapy.method._threshold_optim.MS.aggregation_fit">[docs]</a> <span class="k">def</span> <span class="nf">aggregation_fit</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">classif_predictions</span><span class="p">:</span> <span class="n">LabelledCollection</span><span class="p">,</span> <span class="n">data</span><span class="p">:</span> <span class="n">LabelledCollection</span><span class="p">):</span>
|
||||
<span class="n">decision_scores</span><span class="p">,</span> <span class="n">y</span> <span class="o">=</span> <span class="n">classif_predictions</span><span class="o">.</span><span class="n">Xy</span>
|
||||
<span class="c1"># keeps all candidates</span>
|
||||
<span class="n">tprs_fprs_thresholds</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_eval_candidate_thresholds</span><span class="p">(</span><span class="n">decision_scores</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">tprs</span> <span class="o">=</span> <span class="n">tprs_fprs_thresholds</span><span class="p">[:,</span> <span class="mi">0</span><span class="p">]</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">fprs</span> <span class="o">=</span> <span class="n">tprs_fprs_thresholds</span><span class="p">[:,</span> <span class="mi">1</span><span class="p">]</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">thresholds</span> <span class="o">=</span> <span class="n">tprs_fprs_thresholds</span><span class="p">[:,</span> <span class="mi">2</span><span class="p">]</span>
|
||||
<span class="k">return</span> <span class="bp">self</span></div>
|
||||
|
||||
<div class="viewcode-block" id="MS.aggregate"><a class="viewcode-back" href="../../../quapy.method.html#quapy.method._threshold_optim.MS.aggregate">[docs]</a> <span class="k">def</span> <span class="nf">aggregate</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">classif_predictions</span><span class="p">:</span> <span class="n">np</span><span class="o">.</span><span class="n">ndarray</span><span class="p">):</span>
|
||||
<span class="n">prevalences</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">aggregate_with_threshold</span><span class="p">(</span><span class="n">classif_predictions</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">tprs</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">fprs</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">thresholds</span><span class="p">)</span>
|
||||
<span class="k">if</span> <span class="n">prevalences</span><span class="o">.</span><span class="n">ndim</span><span class="o">==</span><span class="mi">2</span><span class="p">:</span>
|
||||
<span class="n">prevalences</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">median</span><span class="p">(</span><span class="n">prevalences</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="n">prevalences</span></div></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="MS2"><a class="viewcode-back" href="../../../quapy.method.html#quapy.method._threshold_optim.MS2">[docs]</a><span class="k">class</span> <span class="nc">MS2</span><span class="p">(</span><span class="n">MS</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Median Sweep 2. Threshold Optimization variant for :class:`ACC` as proposed by</span>
|
||||
<span class="sd"> `Forman 2006 <https://dl.acm.org/doi/abs/10.1145/1150402.1150423>`_ and</span>
|
||||
<span class="sd"> `Forman 2008 <https://link.springer.com/article/10.1007/s10618-008-0097-y>`_ that generates</span>
|
||||
<span class="sd"> class prevalence estimates for all decision thresholds and returns the median of for cases in</span>
|
||||
<span class="sd"> which `tpr-fpr>0.25`</span>
|
||||
<span class="sd"> The goal is to bring improved stability to the denominator of the adjustment.</span>
|
||||
|
||||
<span class="sd"> :param classifier: a sklearn's Estimator that generates a classifier</span>
|
||||
<span class="sd"> :param val_split: indicates the proportion of data to be used as a stratified held-out validation set in which the</span>
|
||||
<span class="sd"> misclassification rates are to be estimated.</span>
|
||||
<span class="sd"> This parameter can be indicated as a real value (between 0 and 1), representing a proportion of</span>
|
||||
<span class="sd"> validation data, or as an integer, indicating that the misclassification rates should be estimated via</span>
|
||||
<span class="sd"> `k`-fold cross validation (this integer stands for the number of folds `k`, defaults 5), or as a</span>
|
||||
<span class="sd"> :class:`quapy.data.base.LabelledCollection` (the split itself).</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">classifier</span><span class="p">:</span> <span class="n">BaseEstimator</span><span class="p">,</span> <span class="n">val_split</span><span class="o">=</span><span class="mi">5</span><span class="p">):</span>
|
||||
<span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="fm">__init__</span><span class="p">(</span><span class="n">classifier</span><span class="p">,</span> <span class="n">val_split</span><span class="p">)</span>
|
||||
|
||||
<div class="viewcode-block" id="MS2.discard"><a class="viewcode-back" href="../../../quapy.method.html#quapy.method._threshold_optim.MS2.discard">[docs]</a> <span class="k">def</span> <span class="nf">discard</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">tpr</span><span class="p">,</span> <span class="n">fpr</span><span class="p">)</span> <span class="o">-></span> <span class="nb">bool</span><span class="p">:</span>
|
||||
<span class="k">return</span> <span class="p">(</span><span class="n">tpr</span><span class="o">-</span><span class="n">fpr</span><span class="p">)</span> <span class="o"><=</span> <span class="mf">0.25</span></div></div>
|
||||
</pre></div>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
<footer>
|
||||
|
||||
<hr/>
|
||||
|
||||
<div role="contentinfo">
|
||||
<p>© Copyright 2024, Alejandro Moreo.</p>
|
||||
</div>
|
||||
|
||||
Built with <a href="https://www.sphinx-doc.org/">Sphinx</a> using a
|
||||
<a href="https://github.com/readthedocs/sphinx_rtd_theme">theme</a>
|
||||
provided by <a href="https://readthedocs.org">Read the Docs</a>.
|
||||
|
||||
|
||||
</footer>
|
||||
</div>
|
||||
</div>
|
||||
</section>
|
||||
</div>
|
||||
<script>
|
||||
jQuery(function () {
|
||||
SphinxRtdTheme.Navigation.enable(true);
|
||||
});
|
||||
</script>
|
||||
|
||||
</body>
|
||||
</html>
|
||||
File diff suppressed because it is too large
Load Diff
|
|
@ -0,0 +1,212 @@
|
|||
<!DOCTYPE html>
|
||||
<html class="writer-html5" lang="en">
|
||||
<head>
|
||||
<meta charset="utf-8" />
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
|
||||
<title>quapy.method.base — QuaPy: A Python-based open-source framework for quantification 0.1.8 documentation</title>
|
||||
<link rel="stylesheet" type="text/css" href="../../../_static/pygments.css" />
|
||||
<link rel="stylesheet" type="text/css" href="../../../_static/css/theme.css" />
|
||||
|
||||
|
||||
<!--[if lt IE 9]>
|
||||
<script src="../../../_static/js/html5shiv.min.js"></script>
|
||||
<![endif]-->
|
||||
|
||||
<script data-url_root="../../../" id="documentation_options" src="../../../_static/documentation_options.js"></script>
|
||||
<script src="../../../_static/jquery.js"></script>
|
||||
<script src="../../../_static/underscore.js"></script>
|
||||
<script src="../../../_static/_sphinx_javascript_frameworks_compat.js"></script>
|
||||
<script src="../../../_static/doctools.js"></script>
|
||||
<script src="../../../_static/sphinx_highlight.js"></script>
|
||||
<script src="../../../_static/js/theme.js"></script>
|
||||
<link rel="index" title="Index" href="../../../genindex.html" />
|
||||
<link rel="search" title="Search" href="../../../search.html" />
|
||||
</head>
|
||||
|
||||
<body class="wy-body-for-nav">
|
||||
<div class="wy-grid-for-nav">
|
||||
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
|
||||
<div class="wy-side-scroll">
|
||||
<div class="wy-side-nav-search" >
|
||||
|
||||
|
||||
|
||||
<a href="../../../index.html" class="icon icon-home">
|
||||
QuaPy: A Python-based open-source framework for quantification
|
||||
</a>
|
||||
<div role="search">
|
||||
<form id="rtd-search-form" class="wy-form" action="../../../search.html" method="get">
|
||||
<input type="text" name="q" placeholder="Search docs" aria-label="Search docs" />
|
||||
<input type="hidden" name="check_keywords" value="yes" />
|
||||
<input type="hidden" name="area" value="default" />
|
||||
</form>
|
||||
</div>
|
||||
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../../../modules.html">quapy</a></li>
|
||||
</ul>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
</nav>
|
||||
|
||||
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" >
|
||||
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
|
||||
<a href="../../../index.html">QuaPy: A Python-based open-source framework for quantification</a>
|
||||
</nav>
|
||||
|
||||
<div class="wy-nav-content">
|
||||
<div class="rst-content">
|
||||
<div role="navigation" aria-label="Page navigation">
|
||||
<ul class="wy-breadcrumbs">
|
||||
<li><a href="../../../index.html" class="icon icon-home" aria-label="Home"></a></li>
|
||||
<li class="breadcrumb-item"><a href="../../index.html">Module code</a></li>
|
||||
<li class="breadcrumb-item active">quapy.method.base</li>
|
||||
<li class="wy-breadcrumbs-aside">
|
||||
</li>
|
||||
</ul>
|
||||
<hr/>
|
||||
</div>
|
||||
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
|
||||
<div itemprop="articleBody">
|
||||
|
||||
<h1>Source code for quapy.method.base</h1><div class="highlight"><pre>
|
||||
<span></span><span class="kn">from</span> <span class="nn">abc</span> <span class="kn">import</span> <span class="n">ABCMeta</span><span class="p">,</span> <span class="n">abstractmethod</span>
|
||||
<span class="kn">from</span> <span class="nn">copy</span> <span class="kn">import</span> <span class="n">deepcopy</span>
|
||||
|
||||
<span class="kn">from</span> <span class="nn">joblib</span> <span class="kn">import</span> <span class="n">Parallel</span><span class="p">,</span> <span class="n">delayed</span>
|
||||
<span class="kn">from</span> <span class="nn">sklearn.base</span> <span class="kn">import</span> <span class="n">BaseEstimator</span>
|
||||
|
||||
<span class="kn">import</span> <span class="nn">quapy</span> <span class="k">as</span> <span class="nn">qp</span>
|
||||
<span class="kn">from</span> <span class="nn">quapy.data</span> <span class="kn">import</span> <span class="n">LabelledCollection</span>
|
||||
<span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span>
|
||||
|
||||
|
||||
<span class="c1"># Base Quantifier abstract class</span>
|
||||
<span class="c1"># ------------------------------------</span>
|
||||
<div class="viewcode-block" id="BaseQuantifier"><a class="viewcode-back" href="../../../quapy.method.html#quapy.method.base.BaseQuantifier">[docs]</a><span class="k">class</span> <span class="nc">BaseQuantifier</span><span class="p">(</span><span class="n">BaseEstimator</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Abstract Quantifier. A quantifier is defined as an object of a class that implements the method :meth:`fit` on</span>
|
||||
<span class="sd"> :class:`quapy.data.base.LabelledCollection`, the method :meth:`quantify`, and the :meth:`set_params` and</span>
|
||||
<span class="sd"> :meth:`get_params` for model selection (see :meth:`quapy.model_selection.GridSearchQ`)</span>
|
||||
<span class="sd"> """</span>
|
||||
|
||||
<div class="viewcode-block" id="BaseQuantifier.fit"><a class="viewcode-back" href="../../../quapy.method.html#quapy.method.base.BaseQuantifier.fit">[docs]</a> <span class="nd">@abstractmethod</span>
|
||||
<span class="k">def</span> <span class="nf">fit</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">data</span><span class="p">:</span> <span class="n">LabelledCollection</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Trains a quantifier.</span>
|
||||
|
||||
<span class="sd"> :param data: a :class:`quapy.data.base.LabelledCollection` consisting of the training data</span>
|
||||
<span class="sd"> :return: self</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="o">...</span></div>
|
||||
|
||||
<div class="viewcode-block" id="BaseQuantifier.quantify"><a class="viewcode-back" href="../../../quapy.method.html#quapy.method.base.BaseQuantifier.quantify">[docs]</a> <span class="nd">@abstractmethod</span>
|
||||
<span class="k">def</span> <span class="nf">quantify</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">instances</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Generate class prevalence estimates for the sample's instances</span>
|
||||
|
||||
<span class="sd"> :param instances: array-like</span>
|
||||
<span class="sd"> :return: `np.ndarray` of shape `(n_classes,)` with class prevalence estimates.</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="o">...</span></div></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="BinaryQuantifier"><a class="viewcode-back" href="../../../quapy.method.html#quapy.method.base.BinaryQuantifier">[docs]</a><span class="k">class</span> <span class="nc">BinaryQuantifier</span><span class="p">(</span><span class="n">BaseQuantifier</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Abstract class of binary quantifiers, i.e., quantifiers estimating class prevalence values for only two classes</span>
|
||||
<span class="sd"> (typically, to be interpreted as one class and its complement).</span>
|
||||
<span class="sd"> """</span>
|
||||
|
||||
<span class="k">def</span> <span class="nf">_check_binary</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">data</span><span class="p">:</span> <span class="n">LabelledCollection</span><span class="p">,</span> <span class="n">quantifier_name</span><span class="p">):</span>
|
||||
<span class="k">assert</span> <span class="n">data</span><span class="o">.</span><span class="n">binary</span><span class="p">,</span> <span class="sa">f</span><span class="s1">'</span><span class="si">{</span><span class="n">quantifier_name</span><span class="si">}</span><span class="s1"> works only on problems of binary classification. '</span> \
|
||||
<span class="sa">f</span><span class="s1">'Use the class OneVsAll to enable </span><span class="si">{</span><span class="n">quantifier_name</span><span class="si">}</span><span class="s1"> work on single-label data.'</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="OneVsAll"><a class="viewcode-back" href="../../../quapy.method.html#quapy.method.base.OneVsAll">[docs]</a><span class="k">class</span> <span class="nc">OneVsAll</span><span class="p">:</span>
|
||||
<span class="k">pass</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="newOneVsAll"><a class="viewcode-back" href="../../../quapy.method.html#quapy.method.base.newOneVsAll">[docs]</a><span class="k">def</span> <span class="nf">newOneVsAll</span><span class="p">(</span><span class="n">binary_quantifier</span><span class="p">,</span> <span class="n">n_jobs</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
|
||||
<span class="k">assert</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">binary_quantifier</span><span class="p">,</span> <span class="n">BaseQuantifier</span><span class="p">),</span> \
|
||||
<span class="sa">f</span><span class="s1">'</span><span class="si">{</span><span class="n">binary_quantifier</span><span class="si">}</span><span class="s1"> does not seem to be a Quantifier'</span>
|
||||
<span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">binary_quantifier</span><span class="p">,</span> <span class="n">qp</span><span class="o">.</span><span class="n">method</span><span class="o">.</span><span class="n">aggregative</span><span class="o">.</span><span class="n">AggregativeQuantifier</span><span class="p">):</span>
|
||||
<span class="k">return</span> <span class="n">qp</span><span class="o">.</span><span class="n">method</span><span class="o">.</span><span class="n">aggregative</span><span class="o">.</span><span class="n">OneVsAllAggregative</span><span class="p">(</span><span class="n">binary_quantifier</span><span class="p">,</span> <span class="n">n_jobs</span><span class="p">)</span>
|
||||
<span class="k">else</span><span class="p">:</span>
|
||||
<span class="k">return</span> <span class="n">OneVsAllGeneric</span><span class="p">(</span><span class="n">binary_quantifier</span><span class="p">,</span> <span class="n">n_jobs</span><span class="p">)</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="OneVsAllGeneric"><a class="viewcode-back" href="../../../quapy.method.html#quapy.method.base.OneVsAllGeneric">[docs]</a><span class="k">class</span> <span class="nc">OneVsAllGeneric</span><span class="p">(</span><span class="n">OneVsAll</span><span class="p">,</span> <span class="n">BaseQuantifier</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Allows any binary quantifier to perform quantification on single-label datasets. The method maintains one binary</span>
|
||||
<span class="sd"> quantifier for each class, and then l1-normalizes the outputs so that the class prevelence values sum up to 1.</span>
|
||||
<span class="sd"> """</span>
|
||||
|
||||
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">binary_quantifier</span><span class="p">,</span> <span class="n">n_jobs</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
|
||||
<span class="k">assert</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">binary_quantifier</span><span class="p">,</span> <span class="n">BaseQuantifier</span><span class="p">),</span> \
|
||||
<span class="sa">f</span><span class="s1">'</span><span class="si">{</span><span class="n">binary_quantifier</span><span class="si">}</span><span class="s1"> does not seem to be a Quantifier'</span>
|
||||
<span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">binary_quantifier</span><span class="p">,</span> <span class="n">qp</span><span class="o">.</span><span class="n">method</span><span class="o">.</span><span class="n">aggregative</span><span class="o">.</span><span class="n">AggregativeQuantifier</span><span class="p">):</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="s1">'[warning] the quantifier seems to be an instance of qp.method.aggregative.AggregativeQuantifier; '</span>
|
||||
<span class="sa">f</span><span class="s1">'you might prefer instantiating </span><span class="si">{</span><span class="n">qp</span><span class="o">.</span><span class="n">method</span><span class="o">.</span><span class="n">aggregative</span><span class="o">.</span><span class="n">OneVsAllAggregative</span><span class="o">.</span><span class="vm">__name__</span><span class="si">}</span><span class="s1">'</span><span class="p">)</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">binary_quantifier</span> <span class="o">=</span> <span class="n">binary_quantifier</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">n_jobs</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">_get_njobs</span><span class="p">(</span><span class="n">n_jobs</span><span class="p">)</span>
|
||||
|
||||
<div class="viewcode-block" id="OneVsAllGeneric.fit"><a class="viewcode-back" href="../../../quapy.method.html#quapy.method.base.OneVsAllGeneric.fit">[docs]</a> <span class="k">def</span> <span class="nf">fit</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">data</span><span class="p">:</span> <span class="n">LabelledCollection</span><span class="p">,</span> <span class="n">fit_classifier</span><span class="o">=</span><span class="kc">True</span><span class="p">):</span>
|
||||
<span class="k">assert</span> <span class="ow">not</span> <span class="n">data</span><span class="o">.</span><span class="n">binary</span><span class="p">,</span> <span class="sa">f</span><span class="s1">'</span><span class="si">{</span><span class="bp">self</span><span class="o">.</span><span class="vm">__class__</span><span class="o">.</span><span class="vm">__name__</span><span class="si">}</span><span class="s1"> expect non-binary data'</span>
|
||||
<span class="k">assert</span> <span class="n">fit_classifier</span> <span class="o">==</span> <span class="kc">True</span><span class="p">,</span> <span class="s1">'fit_classifier must be True'</span>
|
||||
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">dict_binary_quantifiers</span> <span class="o">=</span> <span class="p">{</span><span class="n">c</span><span class="p">:</span> <span class="n">deepcopy</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">binary_quantifier</span><span class="p">)</span> <span class="k">for</span> <span class="n">c</span> <span class="ow">in</span> <span class="n">data</span><span class="o">.</span><span class="n">classes_</span><span class="p">}</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">_parallel</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">_delayed_binary_fit</span><span class="p">,</span> <span class="n">data</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="bp">self</span></div>
|
||||
|
||||
<span class="k">def</span> <span class="nf">_parallel</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">func</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
|
||||
<span class="k">return</span> <span class="n">np</span><span class="o">.</span><span class="n">asarray</span><span class="p">(</span>
|
||||
<span class="n">Parallel</span><span class="p">(</span><span class="n">n_jobs</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">n_jobs</span><span class="p">,</span> <span class="n">backend</span><span class="o">=</span><span class="s1">'threading'</span><span class="p">)(</span>
|
||||
<span class="n">delayed</span><span class="p">(</span><span class="n">func</span><span class="p">)(</span><span class="n">c</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">)</span> <span class="k">for</span> <span class="n">c</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">classes_</span>
|
||||
<span class="p">)</span>
|
||||
<span class="p">)</span>
|
||||
|
||||
<div class="viewcode-block" id="OneVsAllGeneric.quantify"><a class="viewcode-back" href="../../../quapy.method.html#quapy.method.base.OneVsAllGeneric.quantify">[docs]</a> <span class="k">def</span> <span class="nf">quantify</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">instances</span><span class="p">):</span>
|
||||
<span class="n">prevalences</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_parallel</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">_delayed_binary_predict</span><span class="p">,</span> <span class="n">instances</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="n">qp</span><span class="o">.</span><span class="n">functional</span><span class="o">.</span><span class="n">normalize_prevalence</span><span class="p">(</span><span class="n">prevalences</span><span class="p">)</span></div>
|
||||
|
||||
<span class="nd">@property</span>
|
||||
<span class="k">def</span> <span class="nf">classes_</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="k">return</span> <span class="nb">sorted</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">dict_binary_quantifiers</span><span class="o">.</span><span class="n">keys</span><span class="p">())</span>
|
||||
|
||||
<span class="k">def</span> <span class="nf">_delayed_binary_predict</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">c</span><span class="p">,</span> <span class="n">X</span><span class="p">):</span>
|
||||
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">dict_binary_quantifiers</span><span class="p">[</span><span class="n">c</span><span class="p">]</span><span class="o">.</span><span class="n">quantify</span><span class="p">(</span><span class="n">X</span><span class="p">)[</span><span class="mi">1</span><span class="p">]</span>
|
||||
|
||||
<span class="k">def</span> <span class="nf">_delayed_binary_fit</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">c</span><span class="p">,</span> <span class="n">data</span><span class="p">):</span>
|
||||
<span class="n">bindata</span> <span class="o">=</span> <span class="n">LabelledCollection</span><span class="p">(</span><span class="n">data</span><span class="o">.</span><span class="n">instances</span><span class="p">,</span> <span class="n">data</span><span class="o">.</span><span class="n">labels</span> <span class="o">==</span> <span class="n">c</span><span class="p">,</span> <span class="n">classes</span><span class="o">=</span><span class="p">[</span><span class="kc">False</span><span class="p">,</span> <span class="kc">True</span><span class="p">])</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">dict_binary_quantifiers</span><span class="p">[</span><span class="n">c</span><span class="p">]</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">bindata</span><span class="p">)</span></div>
|
||||
</pre></div>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
<footer>
|
||||
|
||||
<hr/>
|
||||
|
||||
<div role="contentinfo">
|
||||
<p>© Copyright 2024, Alejandro Moreo.</p>
|
||||
</div>
|
||||
|
||||
Built with <a href="https://www.sphinx-doc.org/">Sphinx</a> using a
|
||||
<a href="https://github.com/readthedocs/sphinx_rtd_theme">theme</a>
|
||||
provided by <a href="https://readthedocs.org">Read the Docs</a>.
|
||||
|
||||
|
||||
</footer>
|
||||
</div>
|
||||
</div>
|
||||
</section>
|
||||
</div>
|
||||
<script>
|
||||
jQuery(function () {
|
||||
SphinxRtdTheme.Navigation.enable(true);
|
||||
});
|
||||
</script>
|
||||
|
||||
</body>
|
||||
</html>
|
||||
|
|
@ -0,0 +1,796 @@
|
|||
<!DOCTYPE html>
|
||||
<html class="writer-html5" lang="en">
|
||||
<head>
|
||||
<meta charset="utf-8" />
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
|
||||
<title>quapy.method.meta — QuaPy: A Python-based open-source framework for quantification 0.1.8 documentation</title>
|
||||
<link rel="stylesheet" type="text/css" href="../../../_static/pygments.css" />
|
||||
<link rel="stylesheet" type="text/css" href="../../../_static/css/theme.css" />
|
||||
|
||||
|
||||
<!--[if lt IE 9]>
|
||||
<script src="../../../_static/js/html5shiv.min.js"></script>
|
||||
<![endif]-->
|
||||
|
||||
<script data-url_root="../../../" id="documentation_options" src="../../../_static/documentation_options.js"></script>
|
||||
<script src="../../../_static/jquery.js"></script>
|
||||
<script src="../../../_static/underscore.js"></script>
|
||||
<script src="../../../_static/_sphinx_javascript_frameworks_compat.js"></script>
|
||||
<script src="../../../_static/doctools.js"></script>
|
||||
<script src="../../../_static/sphinx_highlight.js"></script>
|
||||
<script src="../../../_static/js/theme.js"></script>
|
||||
<link rel="index" title="Index" href="../../../genindex.html" />
|
||||
<link rel="search" title="Search" href="../../../search.html" />
|
||||
</head>
|
||||
|
||||
<body class="wy-body-for-nav">
|
||||
<div class="wy-grid-for-nav">
|
||||
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
|
||||
<div class="wy-side-scroll">
|
||||
<div class="wy-side-nav-search" >
|
||||
|
||||
|
||||
|
||||
<a href="../../../index.html" class="icon icon-home">
|
||||
QuaPy: A Python-based open-source framework for quantification
|
||||
</a>
|
||||
<div role="search">
|
||||
<form id="rtd-search-form" class="wy-form" action="../../../search.html" method="get">
|
||||
<input type="text" name="q" placeholder="Search docs" aria-label="Search docs" />
|
||||
<input type="hidden" name="check_keywords" value="yes" />
|
||||
<input type="hidden" name="area" value="default" />
|
||||
</form>
|
||||
</div>
|
||||
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../../../modules.html">quapy</a></li>
|
||||
</ul>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
</nav>
|
||||
|
||||
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" >
|
||||
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
|
||||
<a href="../../../index.html">QuaPy: A Python-based open-source framework for quantification</a>
|
||||
</nav>
|
||||
|
||||
<div class="wy-nav-content">
|
||||
<div class="rst-content">
|
||||
<div role="navigation" aria-label="Page navigation">
|
||||
<ul class="wy-breadcrumbs">
|
||||
<li><a href="../../../index.html" class="icon icon-home" aria-label="Home"></a></li>
|
||||
<li class="breadcrumb-item"><a href="../../index.html">Module code</a></li>
|
||||
<li class="breadcrumb-item active">quapy.method.meta</li>
|
||||
<li class="wy-breadcrumbs-aside">
|
||||
</li>
|
||||
</ul>
|
||||
<hr/>
|
||||
</div>
|
||||
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
|
||||
<div itemprop="articleBody">
|
||||
|
||||
<h1>Source code for quapy.method.meta</h1><div class="highlight"><pre>
|
||||
<span></span><span class="kn">import</span> <span class="nn">itertools</span>
|
||||
<span class="kn">from</span> <span class="nn">copy</span> <span class="kn">import</span> <span class="n">deepcopy</span>
|
||||
<span class="kn">from</span> <span class="nn">typing</span> <span class="kn">import</span> <span class="n">Union</span>
|
||||
<span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span>
|
||||
<span class="kn">from</span> <span class="nn">sklearn.linear_model</span> <span class="kn">import</span> <span class="n">LogisticRegression</span>
|
||||
<span class="kn">from</span> <span class="nn">sklearn.metrics</span> <span class="kn">import</span> <span class="n">f1_score</span><span class="p">,</span> <span class="n">make_scorer</span><span class="p">,</span> <span class="n">accuracy_score</span>
|
||||
<span class="kn">from</span> <span class="nn">sklearn.model_selection</span> <span class="kn">import</span> <span class="n">GridSearchCV</span><span class="p">,</span> <span class="n">cross_val_predict</span>
|
||||
<span class="kn">from</span> <span class="nn">tqdm</span> <span class="kn">import</span> <span class="n">tqdm</span>
|
||||
|
||||
<span class="kn">import</span> <span class="nn">quapy</span> <span class="k">as</span> <span class="nn">qp</span>
|
||||
<span class="kn">from</span> <span class="nn">quapy</span> <span class="kn">import</span> <span class="n">functional</span> <span class="k">as</span> <span class="n">F</span>
|
||||
<span class="kn">from</span> <span class="nn">quapy.data</span> <span class="kn">import</span> <span class="n">LabelledCollection</span>
|
||||
<span class="kn">from</span> <span class="nn">quapy.model_selection</span> <span class="kn">import</span> <span class="n">GridSearchQ</span>
|
||||
<span class="kn">from</span> <span class="nn">quapy.method.base</span> <span class="kn">import</span> <span class="n">BaseQuantifier</span><span class="p">,</span> <span class="n">BinaryQuantifier</span>
|
||||
<span class="kn">from</span> <span class="nn">quapy.method.aggregative</span> <span class="kn">import</span> <span class="n">CC</span><span class="p">,</span> <span class="n">ACC</span><span class="p">,</span> <span class="n">PACC</span><span class="p">,</span> <span class="n">HDy</span><span class="p">,</span> <span class="n">EMQ</span><span class="p">,</span> <span class="n">AggregativeQuantifier</span>
|
||||
|
||||
<span class="k">try</span><span class="p">:</span>
|
||||
<span class="kn">from</span> <span class="nn">.</span> <span class="kn">import</span> <span class="n">_neural</span>
|
||||
<span class="k">except</span> <span class="ne">ModuleNotFoundError</span><span class="p">:</span>
|
||||
<span class="n">_neural</span> <span class="o">=</span> <span class="kc">None</span>
|
||||
|
||||
|
||||
<span class="k">if</span> <span class="n">_neural</span><span class="p">:</span>
|
||||
<span class="n">QuaNet</span> <span class="o">=</span> <span class="n">_neural</span><span class="o">.</span><span class="n">QuaNetTrainer</span>
|
||||
<span class="k">else</span><span class="p">:</span>
|
||||
<span class="n">QuaNet</span> <span class="o">=</span> <span class="s2">"QuaNet is not available due to missing torch package"</span>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="MedianEstimator2"><a class="viewcode-back" href="../../../quapy.method.html#quapy.method.meta.MedianEstimator2">[docs]</a><span class="k">class</span> <span class="nc">MedianEstimator2</span><span class="p">(</span><span class="n">BinaryQuantifier</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> This method is a meta-quantifier that returns, as the estimated class prevalence values, the median of the</span>
|
||||
<span class="sd"> estimation returned by differently (hyper)parameterized base quantifiers.</span>
|
||||
<span class="sd"> The median of unit-vectors is only guaranteed to be a unit-vector for n=2 dimensions,</span>
|
||||
<span class="sd"> i.e., in cases of binary quantification.</span>
|
||||
|
||||
<span class="sd"> :param base_quantifier: the base, binary quantifier</span>
|
||||
<span class="sd"> :param random_state: a seed to be set before fitting any base quantifier (default None)</span>
|
||||
<span class="sd"> :param param_grid: the grid or parameters towards which the median will be computed</span>
|
||||
<span class="sd"> :param n_jobs: number of parllel workes</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">base_quantifier</span><span class="p">:</span> <span class="n">BinaryQuantifier</span><span class="p">,</span> <span class="n">param_grid</span><span class="p">:</span> <span class="nb">dict</span><span class="p">,</span> <span class="n">random_state</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">n_jobs</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">base_quantifier</span> <span class="o">=</span> <span class="n">base_quantifier</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">param_grid</span> <span class="o">=</span> <span class="n">param_grid</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">random_state</span> <span class="o">=</span> <span class="n">random_state</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">n_jobs</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">_get_njobs</span><span class="p">(</span><span class="n">n_jobs</span><span class="p">)</span>
|
||||
|
||||
<div class="viewcode-block" id="MedianEstimator2.get_params"><a class="viewcode-back" href="../../../quapy.method.html#quapy.method.meta.MedianEstimator2.get_params">[docs]</a> <span class="k">def</span> <span class="nf">get_params</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">deep</span><span class="o">=</span><span class="kc">True</span><span class="p">):</span>
|
||||
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">base_quantifier</span><span class="o">.</span><span class="n">get_params</span><span class="p">(</span><span class="n">deep</span><span class="p">)</span></div>
|
||||
|
||||
<div class="viewcode-block" id="MedianEstimator2.set_params"><a class="viewcode-back" href="../../../quapy.method.html#quapy.method.meta.MedianEstimator2.set_params">[docs]</a> <span class="k">def</span> <span class="nf">set_params</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="o">**</span><span class="n">params</span><span class="p">):</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">base_quantifier</span><span class="o">.</span><span class="n">set_params</span><span class="p">(</span><span class="o">**</span><span class="n">params</span><span class="p">)</span></div>
|
||||
|
||||
<span class="k">def</span> <span class="nf">_delayed_fit</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">args</span><span class="p">):</span>
|
||||
<span class="k">with</span> <span class="n">qp</span><span class="o">.</span><span class="n">util</span><span class="o">.</span><span class="n">temp_seed</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">random_state</span><span class="p">):</span>
|
||||
<span class="n">params</span><span class="p">,</span> <span class="n">training</span> <span class="o">=</span> <span class="n">args</span>
|
||||
<span class="n">model</span> <span class="o">=</span> <span class="n">deepcopy</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">base_quantifier</span><span class="p">)</span>
|
||||
<span class="n">model</span><span class="o">.</span><span class="n">set_params</span><span class="p">(</span><span class="o">**</span><span class="n">params</span><span class="p">)</span>
|
||||
<span class="n">model</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">training</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="n">model</span>
|
||||
|
||||
<div class="viewcode-block" id="MedianEstimator2.fit"><a class="viewcode-back" href="../../../quapy.method.html#quapy.method.meta.MedianEstimator2.fit">[docs]</a> <span class="k">def</span> <span class="nf">fit</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">training</span><span class="p">:</span> <span class="n">LabelledCollection</span><span class="p">):</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">_check_binary</span><span class="p">(</span><span class="n">training</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="vm">__class__</span><span class="o">.</span><span class="vm">__name__</span><span class="p">)</span>
|
||||
|
||||
<span class="n">configs</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">model_selection</span><span class="o">.</span><span class="n">expand_grid</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">param_grid</span><span class="p">)</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">models</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">util</span><span class="o">.</span><span class="n">parallel</span><span class="p">(</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">_delayed_fit</span><span class="p">,</span>
|
||||
<span class="p">((</span><span class="n">params</span><span class="p">,</span> <span class="n">training</span><span class="p">)</span> <span class="k">for</span> <span class="n">params</span> <span class="ow">in</span> <span class="n">configs</span><span class="p">),</span>
|
||||
<span class="n">seed</span><span class="o">=</span><span class="n">qp</span><span class="o">.</span><span class="n">environ</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">'_R_SEED'</span><span class="p">,</span> <span class="kc">None</span><span class="p">),</span>
|
||||
<span class="n">n_jobs</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">n_jobs</span>
|
||||
<span class="p">)</span>
|
||||
<span class="k">return</span> <span class="bp">self</span></div>
|
||||
|
||||
<span class="k">def</span> <span class="nf">_delayed_predict</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">args</span><span class="p">):</span>
|
||||
<span class="n">model</span><span class="p">,</span> <span class="n">instances</span> <span class="o">=</span> <span class="n">args</span>
|
||||
<span class="k">return</span> <span class="n">model</span><span class="o">.</span><span class="n">quantify</span><span class="p">(</span><span class="n">instances</span><span class="p">)</span>
|
||||
|
||||
<div class="viewcode-block" id="MedianEstimator2.quantify"><a class="viewcode-back" href="../../../quapy.method.html#quapy.method.meta.MedianEstimator2.quantify">[docs]</a> <span class="k">def</span> <span class="nf">quantify</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">instances</span><span class="p">):</span>
|
||||
<span class="n">prev_preds</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">util</span><span class="o">.</span><span class="n">parallel</span><span class="p">(</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">_delayed_predict</span><span class="p">,</span>
|
||||
<span class="p">((</span><span class="n">model</span><span class="p">,</span> <span class="n">instances</span><span class="p">)</span> <span class="k">for</span> <span class="n">model</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">models</span><span class="p">),</span>
|
||||
<span class="n">seed</span><span class="o">=</span><span class="n">qp</span><span class="o">.</span><span class="n">environ</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">'_R_SEED'</span><span class="p">,</span> <span class="kc">None</span><span class="p">),</span>
|
||||
<span class="n">n_jobs</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">n_jobs</span>
|
||||
<span class="p">)</span>
|
||||
<span class="n">prev_preds</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">asarray</span><span class="p">(</span><span class="n">prev_preds</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="n">np</span><span class="o">.</span><span class="n">median</span><span class="p">(</span><span class="n">prev_preds</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span></div></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="MedianEstimator"><a class="viewcode-back" href="../../../quapy.method.html#quapy.method.meta.MedianEstimator">[docs]</a><span class="k">class</span> <span class="nc">MedianEstimator</span><span class="p">(</span><span class="n">BinaryQuantifier</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> This method is a meta-quantifier that returns, as the estimated class prevalence values, the median of the</span>
|
||||
<span class="sd"> estimation returned by differently (hyper)parameterized base quantifiers.</span>
|
||||
<span class="sd"> The median of unit-vectors is only guaranteed to be a unit-vector for n=2 dimensions,</span>
|
||||
<span class="sd"> i.e., in cases of binary quantification.</span>
|
||||
|
||||
<span class="sd"> :param base_quantifier: the base, binary quantifier</span>
|
||||
<span class="sd"> :param random_state: a seed to be set before fitting any base quantifier (default None)</span>
|
||||
<span class="sd"> :param param_grid: the grid or parameters towards which the median will be computed</span>
|
||||
<span class="sd"> :param n_jobs: number of parllel workes</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">base_quantifier</span><span class="p">:</span> <span class="n">BinaryQuantifier</span><span class="p">,</span> <span class="n">param_grid</span><span class="p">:</span> <span class="nb">dict</span><span class="p">,</span> <span class="n">random_state</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">n_jobs</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">base_quantifier</span> <span class="o">=</span> <span class="n">base_quantifier</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">param_grid</span> <span class="o">=</span> <span class="n">param_grid</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">random_state</span> <span class="o">=</span> <span class="n">random_state</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">n_jobs</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">_get_njobs</span><span class="p">(</span><span class="n">n_jobs</span><span class="p">)</span>
|
||||
|
||||
<div class="viewcode-block" id="MedianEstimator.get_params"><a class="viewcode-back" href="../../../quapy.method.html#quapy.method.meta.MedianEstimator.get_params">[docs]</a> <span class="k">def</span> <span class="nf">get_params</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">deep</span><span class="o">=</span><span class="kc">True</span><span class="p">):</span>
|
||||
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">base_quantifier</span><span class="o">.</span><span class="n">get_params</span><span class="p">(</span><span class="n">deep</span><span class="p">)</span></div>
|
||||
|
||||
<div class="viewcode-block" id="MedianEstimator.set_params"><a class="viewcode-back" href="../../../quapy.method.html#quapy.method.meta.MedianEstimator.set_params">[docs]</a> <span class="k">def</span> <span class="nf">set_params</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="o">**</span><span class="n">params</span><span class="p">):</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">base_quantifier</span><span class="o">.</span><span class="n">set_params</span><span class="p">(</span><span class="o">**</span><span class="n">params</span><span class="p">)</span></div>
|
||||
|
||||
<span class="k">def</span> <span class="nf">_delayed_fit</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">args</span><span class="p">):</span>
|
||||
<span class="k">with</span> <span class="n">qp</span><span class="o">.</span><span class="n">util</span><span class="o">.</span><span class="n">temp_seed</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">random_state</span><span class="p">):</span>
|
||||
<span class="n">params</span><span class="p">,</span> <span class="n">training</span> <span class="o">=</span> <span class="n">args</span>
|
||||
<span class="n">model</span> <span class="o">=</span> <span class="n">deepcopy</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">base_quantifier</span><span class="p">)</span>
|
||||
<span class="n">model</span><span class="o">.</span><span class="n">set_params</span><span class="p">(</span><span class="o">**</span><span class="n">params</span><span class="p">)</span>
|
||||
<span class="n">model</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">training</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="n">model</span>
|
||||
|
||||
<span class="k">def</span> <span class="nf">_delayed_fit_classifier</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">args</span><span class="p">):</span>
|
||||
<span class="k">with</span> <span class="n">qp</span><span class="o">.</span><span class="n">util</span><span class="o">.</span><span class="n">temp_seed</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">random_state</span><span class="p">):</span>
|
||||
<span class="n">cls_params</span><span class="p">,</span> <span class="n">training</span> <span class="o">=</span> <span class="n">args</span>
|
||||
<span class="n">model</span> <span class="o">=</span> <span class="n">deepcopy</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">base_quantifier</span><span class="p">)</span>
|
||||
<span class="n">model</span><span class="o">.</span><span class="n">set_params</span><span class="p">(</span><span class="o">**</span><span class="n">cls_params</span><span class="p">)</span>
|
||||
<span class="n">predictions</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">classifier_fit_predict</span><span class="p">(</span><span class="n">training</span><span class="p">,</span> <span class="n">predict_on</span><span class="o">=</span><span class="n">model</span><span class="o">.</span><span class="n">val_split</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="p">(</span><span class="n">model</span><span class="p">,</span> <span class="n">predictions</span><span class="p">)</span>
|
||||
|
||||
<span class="k">def</span> <span class="nf">_delayed_fit_aggregation</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">args</span><span class="p">):</span>
|
||||
<span class="k">with</span> <span class="n">qp</span><span class="o">.</span><span class="n">util</span><span class="o">.</span><span class="n">temp_seed</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">random_state</span><span class="p">):</span>
|
||||
<span class="p">((</span><span class="n">model</span><span class="p">,</span> <span class="n">predictions</span><span class="p">),</span> <span class="n">q_params</span><span class="p">),</span> <span class="n">training</span> <span class="o">=</span> <span class="n">args</span>
|
||||
<span class="n">model</span> <span class="o">=</span> <span class="n">deepcopy</span><span class="p">(</span><span class="n">model</span><span class="p">)</span>
|
||||
<span class="n">model</span><span class="o">.</span><span class="n">set_params</span><span class="p">(</span><span class="o">**</span><span class="n">q_params</span><span class="p">)</span>
|
||||
<span class="n">model</span><span class="o">.</span><span class="n">aggregation_fit</span><span class="p">(</span><span class="n">predictions</span><span class="p">,</span> <span class="n">training</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="n">model</span>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="MedianEstimator.fit"><a class="viewcode-back" href="../../../quapy.method.html#quapy.method.meta.MedianEstimator.fit">[docs]</a> <span class="k">def</span> <span class="nf">fit</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">training</span><span class="p">:</span> <span class="n">LabelledCollection</span><span class="p">):</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">_check_binary</span><span class="p">(</span><span class="n">training</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="vm">__class__</span><span class="o">.</span><span class="vm">__name__</span><span class="p">)</span>
|
||||
|
||||
<span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">base_quantifier</span><span class="p">,</span> <span class="n">AggregativeQuantifier</span><span class="p">):</span>
|
||||
<span class="n">cls_configs</span><span class="p">,</span> <span class="n">q_configs</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">model_selection</span><span class="o">.</span><span class="n">group_params</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">param_grid</span><span class="p">)</span>
|
||||
|
||||
<span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">cls_configs</span><span class="p">)</span> <span class="o">></span> <span class="mi">1</span><span class="p">:</span>
|
||||
<span class="n">models_preds</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">util</span><span class="o">.</span><span class="n">parallel</span><span class="p">(</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">_delayed_fit_classifier</span><span class="p">,</span>
|
||||
<span class="p">((</span><span class="n">params</span><span class="p">,</span> <span class="n">training</span><span class="p">)</span> <span class="k">for</span> <span class="n">params</span> <span class="ow">in</span> <span class="n">cls_configs</span><span class="p">),</span>
|
||||
<span class="n">seed</span><span class="o">=</span><span class="n">qp</span><span class="o">.</span><span class="n">environ</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">'_R_SEED'</span><span class="p">,</span> <span class="kc">None</span><span class="p">),</span>
|
||||
<span class="n">n_jobs</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">n_jobs</span><span class="p">,</span>
|
||||
<span class="n">asarray</span><span class="o">=</span><span class="kc">False</span>
|
||||
<span class="p">)</span>
|
||||
<span class="k">else</span><span class="p">:</span>
|
||||
<span class="n">model</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">base_quantifier</span>
|
||||
<span class="n">model</span><span class="o">.</span><span class="n">set_params</span><span class="p">(</span><span class="o">**</span><span class="n">cls_configs</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span>
|
||||
<span class="n">predictions</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">classifier_fit_predict</span><span class="p">(</span><span class="n">training</span><span class="p">,</span> <span class="n">predict_on</span><span class="o">=</span><span class="n">model</span><span class="o">.</span><span class="n">val_split</span><span class="p">)</span>
|
||||
<span class="n">models_preds</span> <span class="o">=</span> <span class="p">[(</span><span class="n">model</span><span class="p">,</span> <span class="n">predictions</span><span class="p">)]</span>
|
||||
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">models</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">util</span><span class="o">.</span><span class="n">parallel</span><span class="p">(</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">_delayed_fit_aggregation</span><span class="p">,</span>
|
||||
<span class="p">((</span><span class="n">setup</span><span class="p">,</span> <span class="n">training</span><span class="p">)</span> <span class="k">for</span> <span class="n">setup</span> <span class="ow">in</span> <span class="n">itertools</span><span class="o">.</span><span class="n">product</span><span class="p">(</span><span class="n">models_preds</span><span class="p">,</span> <span class="n">q_configs</span><span class="p">)),</span>
|
||||
<span class="n">seed</span><span class="o">=</span><span class="n">qp</span><span class="o">.</span><span class="n">environ</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">'_R_SEED'</span><span class="p">,</span> <span class="kc">None</span><span class="p">),</span>
|
||||
<span class="n">n_jobs</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">n_jobs</span><span class="p">,</span>
|
||||
<span class="n">asarray</span><span class="o">=</span><span class="kc">False</span>
|
||||
<span class="p">)</span>
|
||||
<span class="k">else</span><span class="p">:</span>
|
||||
<span class="n">configs</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">model_selection</span><span class="o">.</span><span class="n">expand_grid</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">param_grid</span><span class="p">)</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">models</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">util</span><span class="o">.</span><span class="n">parallel</span><span class="p">(</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">_delayed_fit</span><span class="p">,</span>
|
||||
<span class="p">((</span><span class="n">params</span><span class="p">,</span> <span class="n">training</span><span class="p">)</span> <span class="k">for</span> <span class="n">params</span> <span class="ow">in</span> <span class="n">configs</span><span class="p">),</span>
|
||||
<span class="n">seed</span><span class="o">=</span><span class="n">qp</span><span class="o">.</span><span class="n">environ</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">'_R_SEED'</span><span class="p">,</span> <span class="kc">None</span><span class="p">),</span>
|
||||
<span class="n">n_jobs</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">n_jobs</span><span class="p">,</span>
|
||||
<span class="n">asarray</span><span class="o">=</span><span class="kc">False</span>
|
||||
<span class="p">)</span>
|
||||
<span class="k">return</span> <span class="bp">self</span></div>
|
||||
|
||||
<span class="k">def</span> <span class="nf">_delayed_predict</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">args</span><span class="p">):</span>
|
||||
<span class="n">model</span><span class="p">,</span> <span class="n">instances</span> <span class="o">=</span> <span class="n">args</span>
|
||||
<span class="k">return</span> <span class="n">model</span><span class="o">.</span><span class="n">quantify</span><span class="p">(</span><span class="n">instances</span><span class="p">)</span>
|
||||
|
||||
<div class="viewcode-block" id="MedianEstimator.quantify"><a class="viewcode-back" href="../../../quapy.method.html#quapy.method.meta.MedianEstimator.quantify">[docs]</a> <span class="k">def</span> <span class="nf">quantify</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">instances</span><span class="p">):</span>
|
||||
<span class="n">prev_preds</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">util</span><span class="o">.</span><span class="n">parallel</span><span class="p">(</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">_delayed_predict</span><span class="p">,</span>
|
||||
<span class="p">((</span><span class="n">model</span><span class="p">,</span> <span class="n">instances</span><span class="p">)</span> <span class="k">for</span> <span class="n">model</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">models</span><span class="p">),</span>
|
||||
<span class="n">seed</span><span class="o">=</span><span class="n">qp</span><span class="o">.</span><span class="n">environ</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">'_R_SEED'</span><span class="p">,</span> <span class="kc">None</span><span class="p">),</span>
|
||||
<span class="n">n_jobs</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">n_jobs</span><span class="p">,</span>
|
||||
<span class="n">asarray</span><span class="o">=</span><span class="kc">False</span>
|
||||
<span class="p">)</span>
|
||||
<span class="n">prev_preds</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">asarray</span><span class="p">(</span><span class="n">prev_preds</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="n">np</span><span class="o">.</span><span class="n">median</span><span class="p">(</span><span class="n">prev_preds</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span></div></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="Ensemble"><a class="viewcode-back" href="../../../quapy.method.html#quapy.method.meta.Ensemble">[docs]</a><span class="k">class</span> <span class="nc">Ensemble</span><span class="p">(</span><span class="n">BaseQuantifier</span><span class="p">):</span>
|
||||
<span class="n">VALID_POLICIES</span> <span class="o">=</span> <span class="p">{</span><span class="s1">'ave'</span><span class="p">,</span> <span class="s1">'ptr'</span><span class="p">,</span> <span class="s1">'ds'</span><span class="p">}</span> <span class="o">|</span> <span class="n">qp</span><span class="o">.</span><span class="n">error</span><span class="o">.</span><span class="n">QUANTIFICATION_ERROR_NAMES</span>
|
||||
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Implementation of the Ensemble methods for quantification described by </span>
|
||||
<span class="sd"> `Pérez-Gállego et al., 2017 <https://www.sciencedirect.com/science/article/pii/S1566253516300628>`_</span>
|
||||
<span class="sd"> and</span>
|
||||
<span class="sd"> `Pérez-Gállego et al., 2019 <https://www.sciencedirect.com/science/article/pii/S1566253517303652>`_.</span>
|
||||
<span class="sd"> The policies implemented include:</span>
|
||||
<span class="sd"> </span>
|
||||
<span class="sd"> - Average (`policy='ave'`): computes class prevalence estimates as the average of the estimates </span>
|
||||
<span class="sd"> returned by the base quantifiers.</span>
|
||||
<span class="sd"> - Training Prevalence (`policy='ptr'`): applies a dynamic selection to the ensemble’s members by retaining only </span>
|
||||
<span class="sd"> those members such that the class prevalence values in the samples they use as training set are closest to </span>
|
||||
<span class="sd"> preliminary class prevalence estimates computed as the average of the estimates of all the members. The final </span>
|
||||
<span class="sd"> estimate is recomputed by considering only the selected members.</span>
|
||||
<span class="sd"> - Distribution Similarity (`policy='ds'`): performs a dynamic selection of base members by retaining</span>
|
||||
<span class="sd"> the members trained on samples whose distribution of posterior probabilities is closest, in terms of the</span>
|
||||
<span class="sd"> Hellinger Distance, to the distribution of posterior probabilities in the test sample</span>
|
||||
<span class="sd"> - Accuracy (`policy='<valid error name>'`): performs a static selection of the ensemble members by</span>
|
||||
<span class="sd"> retaining those that minimize a quantification error measure, which is passed as an argument.</span>
|
||||
<span class="sd"> </span>
|
||||
<span class="sd"> Example:</span>
|
||||
<span class="sd"> </span>
|
||||
<span class="sd"> >>> model = Ensemble(quantifier=ACC(LogisticRegression()), size=30, policy='ave', n_jobs=-1)</span>
|
||||
<span class="sd"> </span>
|
||||
<span class="sd"> :param quantifier: base quantification member of the ensemble </span>
|
||||
<span class="sd"> :param size: number of members</span>
|
||||
<span class="sd"> :param red_size: number of members to retain after selection (depending on the policy)</span>
|
||||
<span class="sd"> :param min_pos: minimum number of positive instances to consider a sample as valid </span>
|
||||
<span class="sd"> :param policy: the selection policy; available policies include: `ave` (default), `ptr`, `ds`, and accuracy </span>
|
||||
<span class="sd"> (which is instantiated via a valid error name, e.g., `mae`)</span>
|
||||
<span class="sd"> :param max_sample_size: maximum number of instances to consider in the samples (set to None </span>
|
||||
<span class="sd"> to indicate no limit, default)</span>
|
||||
<span class="sd"> :param val_split: a float in range (0,1) indicating the proportion of data to be used as a stratified held-out</span>
|
||||
<span class="sd"> validation split, or a :class:`quapy.data.base.LabelledCollection` (the split itself).</span>
|
||||
<span class="sd"> :param n_jobs: number of parallel workers (default 1)</span>
|
||||
<span class="sd"> :param verbose: set to True (default is False) to get some information in standard output</span>
|
||||
<span class="sd"> """</span>
|
||||
|
||||
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span>
|
||||
<span class="n">quantifier</span><span class="p">:</span> <span class="n">BaseQuantifier</span><span class="p">,</span>
|
||||
<span class="n">size</span><span class="o">=</span><span class="mi">50</span><span class="p">,</span>
|
||||
<span class="n">red_size</span><span class="o">=</span><span class="mi">25</span><span class="p">,</span>
|
||||
<span class="n">min_pos</span><span class="o">=</span><span class="mi">5</span><span class="p">,</span>
|
||||
<span class="n">policy</span><span class="o">=</span><span class="s1">'ave'</span><span class="p">,</span>
|
||||
<span class="n">max_sample_size</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span>
|
||||
<span class="n">val_split</span><span class="p">:</span><span class="n">Union</span><span class="p">[</span><span class="n">qp</span><span class="o">.</span><span class="n">data</span><span class="o">.</span><span class="n">LabelledCollection</span><span class="p">,</span> <span class="nb">float</span><span class="p">]</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span>
|
||||
<span class="n">n_jobs</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span>
|
||||
<span class="n">verbose</span><span class="o">=</span><span class="kc">False</span><span class="p">):</span>
|
||||
<span class="k">assert</span> <span class="n">policy</span> <span class="ow">in</span> <span class="n">Ensemble</span><span class="o">.</span><span class="n">VALID_POLICIES</span><span class="p">,</span> \
|
||||
<span class="sa">f</span><span class="s1">'unknown policy=</span><span class="si">{</span><span class="n">policy</span><span class="si">}</span><span class="s1">; valid are </span><span class="si">{</span><span class="n">Ensemble</span><span class="o">.</span><span class="n">VALID_POLICIES</span><span class="si">}</span><span class="s1">'</span>
|
||||
<span class="k">assert</span> <span class="n">max_sample_size</span> <span class="ow">is</span> <span class="kc">None</span> <span class="ow">or</span> <span class="n">max_sample_size</span> <span class="o">></span> <span class="mi">0</span><span class="p">,</span> \
|
||||
<span class="s1">'wrong value for max_sample_size; set it to a positive number or None'</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">base_quantifier</span> <span class="o">=</span> <span class="n">quantifier</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">size</span> <span class="o">=</span> <span class="n">size</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">min_pos</span> <span class="o">=</span> <span class="n">min_pos</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">red_size</span> <span class="o">=</span> <span class="n">red_size</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">policy</span> <span class="o">=</span> <span class="n">policy</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">val_split</span> <span class="o">=</span> <span class="n">val_split</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">n_jobs</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">_get_njobs</span><span class="p">(</span><span class="n">n_jobs</span><span class="p">)</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">post_proba_fn</span> <span class="o">=</span> <span class="kc">None</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">verbose</span> <span class="o">=</span> <span class="n">verbose</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">max_sample_size</span> <span class="o">=</span> <span class="n">max_sample_size</span>
|
||||
|
||||
<span class="k">def</span> <span class="nf">_sout</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">msg</span><span class="p">):</span>
|
||||
<span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">verbose</span><span class="p">:</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="s1">'[Ensemble]'</span> <span class="o">+</span> <span class="n">msg</span><span class="p">)</span>
|
||||
|
||||
<div class="viewcode-block" id="Ensemble.fit"><a class="viewcode-back" href="../../../quapy.method.html#quapy.method.meta.Ensemble.fit">[docs]</a> <span class="k">def</span> <span class="nf">fit</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">data</span><span class="p">:</span> <span class="n">qp</span><span class="o">.</span><span class="n">data</span><span class="o">.</span><span class="n">LabelledCollection</span><span class="p">,</span> <span class="n">val_split</span><span class="p">:</span> <span class="n">Union</span><span class="p">[</span><span class="n">qp</span><span class="o">.</span><span class="n">data</span><span class="o">.</span><span class="n">LabelledCollection</span><span class="p">,</span> <span class="nb">float</span><span class="p">]</span> <span class="o">=</span> <span class="kc">None</span><span class="p">):</span>
|
||||
|
||||
<span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">policy</span> <span class="o">==</span> <span class="s1">'ds'</span> <span class="ow">and</span> <span class="ow">not</span> <span class="n">data</span><span class="o">.</span><span class="n">binary</span><span class="p">:</span>
|
||||
<span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span><span class="sa">f</span><span class="s1">'ds policy is only defined for binary quantification, but this dataset is not binary'</span><span class="p">)</span>
|
||||
|
||||
<span class="k">if</span> <span class="n">val_split</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">:</span>
|
||||
<span class="n">val_split</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">val_split</span>
|
||||
|
||||
<span class="c1"># randomly chooses the prevalences for each member of the ensemble (preventing classes with less than</span>
|
||||
<span class="c1"># min_pos positive examples)</span>
|
||||
<span class="n">sample_size</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">data</span><span class="p">)</span> <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">max_sample_size</span> <span class="ow">is</span> <span class="kc">None</span> <span class="k">else</span> <span class="nb">min</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">max_sample_size</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="n">data</span><span class="p">))</span>
|
||||
<span class="n">prevs</span> <span class="o">=</span> <span class="p">[</span><span class="n">_draw_simplex</span><span class="p">(</span><span class="n">ndim</span><span class="o">=</span><span class="n">data</span><span class="o">.</span><span class="n">n_classes</span><span class="p">,</span> <span class="n">min_val</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">min_pos</span> <span class="o">/</span> <span class="n">sample_size</span><span class="p">)</span> <span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">size</span><span class="p">)]</span>
|
||||
|
||||
<span class="n">posteriors</span> <span class="o">=</span> <span class="kc">None</span>
|
||||
<span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">policy</span> <span class="o">==</span> <span class="s1">'ds'</span><span class="p">:</span>
|
||||
<span class="c1"># precompute the training posterior probabilities</span>
|
||||
<span class="n">posteriors</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">post_proba_fn</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_ds_policy_get_posteriors</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
|
||||
|
||||
<span class="n">is_static_policy</span> <span class="o">=</span> <span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">policy</span> <span class="ow">in</span> <span class="n">qp</span><span class="o">.</span><span class="n">error</span><span class="o">.</span><span class="n">QUANTIFICATION_ERROR_NAMES</span><span class="p">)</span>
|
||||
|
||||
<span class="n">args</span> <span class="o">=</span> <span class="p">(</span>
|
||||
<span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">base_quantifier</span><span class="p">,</span> <span class="n">data</span><span class="p">,</span> <span class="n">val_split</span><span class="p">,</span> <span class="n">prev</span><span class="p">,</span> <span class="n">posteriors</span><span class="p">,</span> <span class="n">is_static_policy</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">verbose</span><span class="p">,</span> <span class="n">sample_size</span><span class="p">)</span>
|
||||
<span class="k">for</span> <span class="n">prev</span> <span class="ow">in</span> <span class="n">prevs</span>
|
||||
<span class="p">)</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">ensemble</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">util</span><span class="o">.</span><span class="n">parallel</span><span class="p">(</span>
|
||||
<span class="n">_delayed_new_instance</span><span class="p">,</span>
|
||||
<span class="n">tqdm</span><span class="p">(</span><span class="n">args</span><span class="p">,</span> <span class="n">desc</span><span class="o">=</span><span class="s1">'fitting ensamble'</span><span class="p">,</span> <span class="n">total</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">size</span><span class="p">)</span> <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">verbose</span> <span class="k">else</span> <span class="n">args</span><span class="p">,</span>
|
||||
<span class="n">asarray</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span>
|
||||
<span class="n">n_jobs</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">n_jobs</span><span class="p">)</span>
|
||||
|
||||
<span class="c1"># static selection policy (the name of a quantification-oriented error function to minimize)</span>
|
||||
<span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">policy</span> <span class="ow">in</span> <span class="n">qp</span><span class="o">.</span><span class="n">error</span><span class="o">.</span><span class="n">QUANTIFICATION_ERROR_NAMES</span><span class="p">:</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">_accuracy_policy</span><span class="p">(</span><span class="n">error_name</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">policy</span><span class="p">)</span>
|
||||
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">_sout</span><span class="p">(</span><span class="s1">'Fit [Done]'</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="bp">self</span></div>
|
||||
|
||||
<div class="viewcode-block" id="Ensemble.quantify"><a class="viewcode-back" href="../../../quapy.method.html#quapy.method.meta.Ensemble.quantify">[docs]</a> <span class="k">def</span> <span class="nf">quantify</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">instances</span><span class="p">):</span>
|
||||
<span class="n">predictions</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">asarray</span><span class="p">(</span>
|
||||
<span class="n">qp</span><span class="o">.</span><span class="n">util</span><span class="o">.</span><span class="n">parallel</span><span class="p">(</span><span class="n">_delayed_quantify</span><span class="p">,</span> <span class="p">((</span><span class="n">Qi</span><span class="p">,</span> <span class="n">instances</span><span class="p">)</span> <span class="k">for</span> <span class="n">Qi</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">ensemble</span><span class="p">),</span> <span class="n">n_jobs</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">n_jobs</span><span class="p">)</span>
|
||||
<span class="p">)</span>
|
||||
|
||||
<span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">policy</span> <span class="o">==</span> <span class="s1">'ptr'</span><span class="p">:</span>
|
||||
<span class="n">predictions</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_ptr_policy</span><span class="p">(</span><span class="n">predictions</span><span class="p">)</span>
|
||||
<span class="k">elif</span> <span class="bp">self</span><span class="o">.</span><span class="n">policy</span> <span class="o">==</span> <span class="s1">'ds'</span><span class="p">:</span>
|
||||
<span class="n">predictions</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_ds_policy</span><span class="p">(</span><span class="n">predictions</span><span class="p">,</span> <span class="n">instances</span><span class="p">)</span>
|
||||
|
||||
<span class="n">predictions</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">mean</span><span class="p">(</span><span class="n">predictions</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="n">F</span><span class="o">.</span><span class="n">normalize_prevalence</span><span class="p">(</span><span class="n">predictions</span><span class="p">)</span></div>
|
||||
|
||||
<div class="viewcode-block" id="Ensemble.set_params"><a class="viewcode-back" href="../../../quapy.method.html#quapy.method.meta.Ensemble.set_params">[docs]</a> <span class="k">def</span> <span class="nf">set_params</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="o">**</span><span class="n">parameters</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> This function should not be used within :class:`quapy.model_selection.GridSearchQ` (is here for compatibility</span>
|
||||
<span class="sd"> with the abstract class).</span>
|
||||
<span class="sd"> Instead, use `Ensemble(GridSearchQ(q),...)`, with `q` a Quantifier (recommended), or</span>
|
||||
<span class="sd"> `Ensemble(Q(GridSearchCV(l)))` with `Q` a quantifier class that has a classifier `l` optimized for</span>
|
||||
<span class="sd"> classification (not recommended).</span>
|
||||
|
||||
<span class="sd"> :param parameters: dictionary</span>
|
||||
<span class="sd"> :return: raises an Exception</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">raise</span> <span class="ne">NotImplementedError</span><span class="p">(</span><span class="sa">f</span><span class="s1">'</span><span class="si">{</span><span class="bp">self</span><span class="o">.</span><span class="vm">__class__</span><span class="o">.</span><span class="vm">__name__</span><span class="si">}</span><span class="s1"> should not be used within GridSearchQ; '</span>
|
||||
<span class="sa">f</span><span class="s1">'instead, use Ensemble(GridSearchQ(q),...), with q a Quantifier (recommended), '</span>
|
||||
<span class="sa">f</span><span class="s1">'or Ensemble(Q(GridSearchCV(l))) with Q a quantifier class that has a classifier '</span>
|
||||
<span class="sa">f</span><span class="s1">'l optimized for classification (not recommended).'</span><span class="p">)</span></div>
|
||||
|
||||
<div class="viewcode-block" id="Ensemble.get_params"><a class="viewcode-back" href="../../../quapy.method.html#quapy.method.meta.Ensemble.get_params">[docs]</a> <span class="k">def</span> <span class="nf">get_params</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">deep</span><span class="o">=</span><span class="kc">True</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> This function should not be used within :class:`quapy.model_selection.GridSearchQ` (is here for compatibility</span>
|
||||
<span class="sd"> with the abstract class).</span>
|
||||
<span class="sd"> Instead, use `Ensemble(GridSearchQ(q),...)`, with `q` a Quantifier (recommended), or</span>
|
||||
<span class="sd"> `Ensemble(Q(GridSearchCV(l)))` with `Q` a quantifier class that has a classifier `l` optimized for</span>
|
||||
<span class="sd"> classification (not recommended).</span>
|
||||
|
||||
<span class="sd"> :param deep: for compatibility with scikit-learn</span>
|
||||
<span class="sd"> :return: raises an Exception</span>
|
||||
<span class="sd"> """</span>
|
||||
|
||||
<span class="k">raise</span> <span class="ne">NotImplementedError</span><span class="p">()</span></div>
|
||||
|
||||
<span class="k">def</span> <span class="nf">_accuracy_policy</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">error_name</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Selects the red_size best performant quantifiers in a static way (i.e., dropping all non-selected instances).</span>
|
||||
<span class="sd"> For each model in the ensemble, the performance is measured in terms of _error_name_ on the quantification of</span>
|
||||
<span class="sd"> the samples used for training the rest of the models in the ensemble.</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="kn">from</span> <span class="nn">quapy.evaluation</span> <span class="kn">import</span> <span class="n">evaluate_on_samples</span>
|
||||
<span class="n">error</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">error</span><span class="o">.</span><span class="n">from_name</span><span class="p">(</span><span class="n">error_name</span><span class="p">)</span>
|
||||
<span class="n">tests</span> <span class="o">=</span> <span class="p">[</span><span class="n">m</span><span class="p">[</span><span class="mi">3</span><span class="p">]</span> <span class="k">for</span> <span class="n">m</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">ensemble</span><span class="p">]</span>
|
||||
<span class="n">scores</span> <span class="o">=</span> <span class="p">[]</span>
|
||||
<span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">model</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">ensemble</span><span class="p">):</span>
|
||||
<span class="n">scores</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">evaluate_on_samples</span><span class="p">(</span><span class="n">model</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">tests</span><span class="p">[:</span><span class="n">i</span><span class="p">]</span> <span class="o">+</span> <span class="n">tests</span><span class="p">[</span><span class="n">i</span> <span class="o">+</span> <span class="mi">1</span><span class="p">:],</span> <span class="n">error</span><span class="p">))</span>
|
||||
<span class="n">order</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">argsort</span><span class="p">(</span><span class="n">scores</span><span class="p">)</span>
|
||||
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">ensemble</span> <span class="o">=</span> <span class="n">_select_k</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">ensemble</span><span class="p">,</span> <span class="n">order</span><span class="p">,</span> <span class="n">k</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">red_size</span><span class="p">)</span>
|
||||
|
||||
<span class="k">def</span> <span class="nf">_ptr_policy</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">predictions</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Selects the predictions made by models that have been trained on samples with a prevalence that is most similar</span>
|
||||
<span class="sd"> to a first approximation of the test prevalence as made by all models in the ensemble.</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="n">test_prev_estim</span> <span class="o">=</span> <span class="n">predictions</span><span class="o">.</span><span class="n">mean</span><span class="p">(</span><span class="n">axis</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
|
||||
<span class="n">tr_prevs</span> <span class="o">=</span> <span class="p">[</span><span class="n">m</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="k">for</span> <span class="n">m</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">ensemble</span><span class="p">]</span>
|
||||
<span class="n">ptr_differences</span> <span class="o">=</span> <span class="p">[</span><span class="n">qp</span><span class="o">.</span><span class="n">error</span><span class="o">.</span><span class="n">mse</span><span class="p">(</span><span class="n">ptr_i</span><span class="p">,</span> <span class="n">test_prev_estim</span><span class="p">)</span> <span class="k">for</span> <span class="n">ptr_i</span> <span class="ow">in</span> <span class="n">tr_prevs</span><span class="p">]</span>
|
||||
<span class="n">order</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">argsort</span><span class="p">(</span><span class="n">ptr_differences</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="n">_select_k</span><span class="p">(</span><span class="n">predictions</span><span class="p">,</span> <span class="n">order</span><span class="p">,</span> <span class="n">k</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">red_size</span><span class="p">)</span>
|
||||
|
||||
<span class="k">def</span> <span class="nf">_ds_policy_get_posteriors</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">data</span><span class="p">:</span> <span class="n">LabelledCollection</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> In the original article, there are some aspects regarding this method that are not mentioned. The paper says</span>
|
||||
<span class="sd"> that the distribution of posterior probabilities from training and test examples is compared by means of the</span>
|
||||
<span class="sd"> Hellinger Distance. However, how these posterior probabilities are generated is not specified. In the article,</span>
|
||||
<span class="sd"> a Logistic Regressor (LR) is used as the classifier device and that could be used for this purpose. However, in</span>
|
||||
<span class="sd"> general, a Quantifier is not necessarily an instance of Aggreggative Probabilistic Quantifiers, and so, that the</span>
|
||||
<span class="sd"> quantifier builds on top of a probabilistic classifier cannot be given for granted. Additionally, it would not</span>
|
||||
<span class="sd"> be correct to generate the posterior probabilities for training instances that have concurred in training the</span>
|
||||
<span class="sd"> classifier that generates them.</span>
|
||||
|
||||
<span class="sd"> This function thus generates the posterior probabilities for all training documents in a cross-validation way,</span>
|
||||
<span class="sd"> using LR with hyperparameters that have previously been optimized via grid search in 5FCV.</span>
|
||||
|
||||
<span class="sd"> :param data: a LabelledCollection</span>
|
||||
<span class="sd"> :return: (P,f,) where P is an ndarray containing the posterior probabilities of the training data, generated via</span>
|
||||
<span class="sd"> cross-validation and using an optimized LR, and the function to be used in order to generate posterior</span>
|
||||
<span class="sd"> probabilities for test instances.</span>
|
||||
<span class="sd"> """</span>
|
||||
|
||||
<span class="n">X</span><span class="p">,</span> <span class="n">y</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">Xy</span>
|
||||
<span class="n">lr_base</span> <span class="o">=</span> <span class="n">LogisticRegression</span><span class="p">(</span><span class="n">class_weight</span><span class="o">=</span><span class="s1">'balanced'</span><span class="p">,</span> <span class="n">max_iter</span><span class="o">=</span><span class="mi">1000</span><span class="p">)</span>
|
||||
|
||||
<span class="n">param_grid</span> <span class="o">=</span> <span class="p">{</span><span class="s1">'C'</span><span class="p">:</span> <span class="n">np</span><span class="o">.</span><span class="n">logspace</span><span class="p">(</span><span class="o">-</span><span class="mi">4</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">9</span><span class="p">)}</span>
|
||||
<span class="n">optim</span> <span class="o">=</span> <span class="n">GridSearchCV</span><span class="p">(</span><span class="n">lr_base</span><span class="p">,</span> <span class="n">param_grid</span><span class="o">=</span><span class="n">param_grid</span><span class="p">,</span> <span class="n">cv</span><span class="o">=</span><span class="mi">5</span><span class="p">,</span> <span class="n">n_jobs</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">n_jobs</span><span class="p">,</span> <span class="n">refit</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span>
|
||||
|
||||
<span class="n">posteriors</span> <span class="o">=</span> <span class="n">cross_val_predict</span><span class="p">(</span><span class="n">optim</span><span class="o">.</span><span class="n">best_estimator_</span><span class="p">,</span> <span class="n">X</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="n">cv</span><span class="o">=</span><span class="mi">5</span><span class="p">,</span> <span class="n">n_jobs</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">n_jobs</span><span class="p">,</span> <span class="n">method</span><span class="o">=</span><span class="s1">'predict_proba'</span><span class="p">)</span>
|
||||
<span class="n">posteriors_generator</span> <span class="o">=</span> <span class="n">optim</span><span class="o">.</span><span class="n">best_estimator_</span><span class="o">.</span><span class="n">predict_proba</span>
|
||||
|
||||
<span class="k">return</span> <span class="n">posteriors</span><span class="p">,</span> <span class="n">posteriors_generator</span>
|
||||
|
||||
<span class="k">def</span> <span class="nf">_ds_policy</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">predictions</span><span class="p">,</span> <span class="n">test</span><span class="p">):</span>
|
||||
<span class="n">test_posteriors</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">post_proba_fn</span><span class="p">(</span><span class="n">test</span><span class="p">)</span>
|
||||
<span class="n">test_distribution</span> <span class="o">=</span> <span class="n">get_probability_distribution</span><span class="p">(</span><span class="n">test_posteriors</span><span class="p">)</span>
|
||||
<span class="n">tr_distributions</span> <span class="o">=</span> <span class="p">[</span><span class="n">m</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span> <span class="k">for</span> <span class="n">m</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">ensemble</span><span class="p">]</span>
|
||||
<span class="n">dist</span> <span class="o">=</span> <span class="p">[</span><span class="n">F</span><span class="o">.</span><span class="n">HellingerDistance</span><span class="p">(</span><span class="n">tr_dist_i</span><span class="p">,</span> <span class="n">test_distribution</span><span class="p">)</span> <span class="k">for</span> <span class="n">tr_dist_i</span> <span class="ow">in</span> <span class="n">tr_distributions</span><span class="p">]</span>
|
||||
<span class="n">order</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">argsort</span><span class="p">(</span><span class="n">dist</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="n">_select_k</span><span class="p">(</span><span class="n">predictions</span><span class="p">,</span> <span class="n">order</span><span class="p">,</span> <span class="n">k</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">red_size</span><span class="p">)</span>
|
||||
|
||||
<span class="nd">@property</span>
|
||||
<span class="k">def</span> <span class="nf">aggregative</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Indicates that the quantifier is not aggregative.</span>
|
||||
|
||||
<span class="sd"> :return: False</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">return</span> <span class="kc">False</span>
|
||||
|
||||
<span class="nd">@property</span>
|
||||
<span class="k">def</span> <span class="nf">probabilistic</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Indicates that the quantifier is not probabilistic.</span>
|
||||
|
||||
<span class="sd"> :return: False</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">return</span> <span class="kc">False</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="get_probability_distribution"><a class="viewcode-back" href="../../../quapy.method.html#quapy.method.meta.get_probability_distribution">[docs]</a><span class="k">def</span> <span class="nf">get_probability_distribution</span><span class="p">(</span><span class="n">posterior_probabilities</span><span class="p">,</span> <span class="n">bins</span><span class="o">=</span><span class="mi">8</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Gets a histogram out of the posterior probabilities (only for the binary case).</span>
|
||||
|
||||
<span class="sd"> :param posterior_probabilities: array-like of shape `(n_instances, 2,)`</span>
|
||||
<span class="sd"> :param bins: integer</span>
|
||||
<span class="sd"> :return: `np.ndarray` with the relative frequencies for each bin (for the positive class only)</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">assert</span> <span class="n">posterior_probabilities</span><span class="o">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">==</span> <span class="mi">2</span><span class="p">,</span> <span class="s1">'the posterior probabilities do not seem to be for a binary problem'</span>
|
||||
<span class="n">posterior_probabilities</span> <span class="o">=</span> <span class="n">posterior_probabilities</span><span class="p">[:,</span> <span class="mi">1</span><span class="p">]</span> <span class="c1"># take the positive posteriors only</span>
|
||||
<span class="n">distribution</span><span class="p">,</span> <span class="n">_</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">histogram</span><span class="p">(</span><span class="n">posterior_probabilities</span><span class="p">,</span> <span class="n">bins</span><span class="o">=</span><span class="n">bins</span><span class="p">,</span> <span class="nb">range</span><span class="o">=</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">),</span> <span class="n">density</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="n">distribution</span></div>
|
||||
|
||||
|
||||
<span class="k">def</span> <span class="nf">_select_k</span><span class="p">(</span><span class="n">elements</span><span class="p">,</span> <span class="n">order</span><span class="p">,</span> <span class="n">k</span><span class="p">):</span>
|
||||
<span class="k">return</span> <span class="p">[</span><span class="n">elements</span><span class="p">[</span><span class="n">idx</span><span class="p">]</span> <span class="k">for</span> <span class="n">idx</span> <span class="ow">in</span> <span class="n">order</span><span class="p">[:</span><span class="n">k</span><span class="p">]]</span>
|
||||
|
||||
|
||||
<span class="k">def</span> <span class="nf">_delayed_new_instance</span><span class="p">(</span><span class="n">args</span><span class="p">):</span>
|
||||
<span class="n">base_quantifier</span><span class="p">,</span> <span class="n">data</span><span class="p">,</span> <span class="n">val_split</span><span class="p">,</span> <span class="n">prev</span><span class="p">,</span> <span class="n">posteriors</span><span class="p">,</span> <span class="n">keep_samples</span><span class="p">,</span> <span class="n">verbose</span><span class="p">,</span> <span class="n">sample_size</span> <span class="o">=</span> <span class="n">args</span>
|
||||
<span class="k">if</span> <span class="n">verbose</span><span class="p">:</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s1">'</span><span class="se">\t</span><span class="s1">fit-start for prev </span><span class="si">{</span><span class="n">F</span><span class="o">.</span><span class="n">strprev</span><span class="p">(</span><span class="n">prev</span><span class="p">)</span><span class="si">}</span><span class="s1">, sample_size=</span><span class="si">{</span><span class="n">sample_size</span><span class="si">}</span><span class="s1">'</span><span class="p">)</span>
|
||||
<span class="n">model</span> <span class="o">=</span> <span class="n">deepcopy</span><span class="p">(</span><span class="n">base_quantifier</span><span class="p">)</span>
|
||||
|
||||
<span class="k">if</span> <span class="n">val_split</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span><span class="p">:</span>
|
||||
<span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">val_split</span><span class="p">,</span> <span class="nb">float</span><span class="p">):</span>
|
||||
<span class="k">assert</span> <span class="mi">0</span> <span class="o"><</span> <span class="n">val_split</span> <span class="o"><</span> <span class="mi">1</span><span class="p">,</span> <span class="s1">'val_split should be in (0,1)'</span>
|
||||
<span class="n">data</span><span class="p">,</span> <span class="n">val_split</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">split_stratified</span><span class="p">(</span><span class="n">train_prop</span><span class="o">=</span><span class="mi">1</span> <span class="o">-</span> <span class="n">val_split</span><span class="p">)</span>
|
||||
|
||||
<span class="n">sample_index</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">sampling_index</span><span class="p">(</span><span class="n">sample_size</span><span class="p">,</span> <span class="o">*</span><span class="n">prev</span><span class="p">)</span>
|
||||
<span class="n">sample</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">sampling_from_index</span><span class="p">(</span><span class="n">sample_index</span><span class="p">)</span>
|
||||
|
||||
<span class="k">if</span> <span class="n">val_split</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span><span class="p">:</span>
|
||||
<span class="n">model</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">sample</span><span class="p">,</span> <span class="n">val_split</span><span class="o">=</span><span class="n">val_split</span><span class="p">)</span>
|
||||
<span class="k">else</span><span class="p">:</span>
|
||||
<span class="n">model</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">sample</span><span class="p">)</span>
|
||||
|
||||
<span class="n">tr_prevalence</span> <span class="o">=</span> <span class="n">sample</span><span class="o">.</span><span class="n">prevalence</span><span class="p">()</span>
|
||||
<span class="n">tr_distribution</span> <span class="o">=</span> <span class="n">get_probability_distribution</span><span class="p">(</span><span class="n">posteriors</span><span class="p">[</span><span class="n">sample_index</span><span class="p">])</span> <span class="k">if</span> <span class="p">(</span><span class="n">posteriors</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span><span class="p">)</span> <span class="k">else</span> <span class="kc">None</span>
|
||||
|
||||
<span class="k">if</span> <span class="n">verbose</span><span class="p">:</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s1">'</span><span class="se">\t</span><span class="s1">\--fit-ended for prev </span><span class="si">{</span><span class="n">F</span><span class="o">.</span><span class="n">strprev</span><span class="p">(</span><span class="n">prev</span><span class="p">)</span><span class="si">}</span><span class="s1">'</span><span class="p">)</span>
|
||||
|
||||
<span class="k">return</span> <span class="p">(</span><span class="n">model</span><span class="p">,</span> <span class="n">tr_prevalence</span><span class="p">,</span> <span class="n">tr_distribution</span><span class="p">,</span> <span class="n">sample</span> <span class="k">if</span> <span class="n">keep_samples</span> <span class="k">else</span> <span class="kc">None</span><span class="p">)</span>
|
||||
|
||||
|
||||
<span class="k">def</span> <span class="nf">_delayed_quantify</span><span class="p">(</span><span class="n">args</span><span class="p">):</span>
|
||||
<span class="n">quantifier</span><span class="p">,</span> <span class="n">instances</span> <span class="o">=</span> <span class="n">args</span>
|
||||
<span class="k">return</span> <span class="n">quantifier</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">.</span><span class="n">quantify</span><span class="p">(</span><span class="n">instances</span><span class="p">)</span>
|
||||
|
||||
|
||||
<span class="k">def</span> <span class="nf">_draw_simplex</span><span class="p">(</span><span class="n">ndim</span><span class="p">,</span> <span class="n">min_val</span><span class="p">,</span> <span class="n">max_trials</span><span class="o">=</span><span class="mi">100</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Returns a uniform sampling from the ndim-dimensional simplex but guarantees that all dimensions</span>
|
||||
<span class="sd"> are >= min_class_prev (for min_val>0, this makes the sampling not truly uniform)</span>
|
||||
|
||||
<span class="sd"> :param ndim: number of dimensions of the simplex</span>
|
||||
<span class="sd"> :param min_val: minimum class prevalence allowed. If less than 1/ndim a ValueError will be throw since</span>
|
||||
<span class="sd"> there is no possible solution.</span>
|
||||
<span class="sd"> :return: a sample from the ndim-dimensional simplex that is uniform in S(ndim)-R where S(ndim) is the simplex</span>
|
||||
<span class="sd"> and R is the simplex subset containing dimensions lower than min_val</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">if</span> <span class="n">min_val</span> <span class="o">>=</span> <span class="mi">1</span> <span class="o">/</span> <span class="n">ndim</span><span class="p">:</span>
|
||||
<span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span><span class="sa">f</span><span class="s1">'no sample can be draw from the </span><span class="si">{</span><span class="n">ndim</span><span class="si">}</span><span class="s1">-dimensional simplex so that '</span>
|
||||
<span class="sa">f</span><span class="s1">'all its values are >=</span><span class="si">{</span><span class="n">min_val</span><span class="si">}</span><span class="s1"> (try with a larger value for min_pos)'</span><span class="p">)</span>
|
||||
<span class="n">trials</span> <span class="o">=</span> <span class="mi">0</span>
|
||||
<span class="k">while</span> <span class="kc">True</span><span class="p">:</span>
|
||||
<span class="n">u</span> <span class="o">=</span> <span class="n">F</span><span class="o">.</span><span class="n">uniform_simplex_sampling</span><span class="p">(</span><span class="n">ndim</span><span class="p">)</span>
|
||||
<span class="k">if</span> <span class="nb">all</span><span class="p">(</span><span class="n">u</span> <span class="o">>=</span> <span class="n">min_val</span><span class="p">):</span>
|
||||
<span class="k">return</span> <span class="n">u</span>
|
||||
<span class="n">trials</span> <span class="o">+=</span> <span class="mi">1</span>
|
||||
<span class="k">if</span> <span class="n">trials</span> <span class="o">>=</span> <span class="n">max_trials</span><span class="p">:</span>
|
||||
<span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span><span class="sa">f</span><span class="s1">'it looks like finding a random simplex with all its dimensions being'</span>
|
||||
<span class="sa">f</span><span class="s1">'>= </span><span class="si">{</span><span class="n">min_val</span><span class="si">}</span><span class="s1"> is unlikely (it failed after </span><span class="si">{</span><span class="n">max_trials</span><span class="si">}</span><span class="s1"> trials)'</span><span class="p">)</span>
|
||||
|
||||
|
||||
<span class="k">def</span> <span class="nf">_instantiate_ensemble</span><span class="p">(</span><span class="n">classifier</span><span class="p">,</span> <span class="n">base_quantifier_class</span><span class="p">,</span> <span class="n">param_grid</span><span class="p">,</span> <span class="n">optim</span><span class="p">,</span> <span class="n">param_model_sel</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
|
||||
<span class="k">if</span> <span class="n">optim</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">:</span>
|
||||
<span class="n">base_quantifier</span> <span class="o">=</span> <span class="n">base_quantifier_class</span><span class="p">(</span><span class="n">classifier</span><span class="p">)</span>
|
||||
<span class="k">elif</span> <span class="n">optim</span> <span class="ow">in</span> <span class="n">qp</span><span class="o">.</span><span class="n">error</span><span class="o">.</span><span class="n">CLASSIFICATION_ERROR</span><span class="p">:</span>
|
||||
<span class="k">if</span> <span class="n">optim</span> <span class="o">==</span> <span class="n">qp</span><span class="o">.</span><span class="n">error</span><span class="o">.</span><span class="n">f1e</span><span class="p">:</span>
|
||||
<span class="n">scoring</span> <span class="o">=</span> <span class="n">make_scorer</span><span class="p">(</span><span class="n">f1_score</span><span class="p">)</span>
|
||||
<span class="k">elif</span> <span class="n">optim</span> <span class="o">==</span> <span class="n">qp</span><span class="o">.</span><span class="n">error</span><span class="o">.</span><span class="n">acce</span><span class="p">:</span>
|
||||
<span class="n">scoring</span> <span class="o">=</span> <span class="n">make_scorer</span><span class="p">(</span><span class="n">accuracy_score</span><span class="p">)</span>
|
||||
<span class="n">classifier</span> <span class="o">=</span> <span class="n">GridSearchCV</span><span class="p">(</span><span class="n">classifier</span><span class="p">,</span> <span class="n">param_grid</span><span class="p">,</span> <span class="n">scoring</span><span class="o">=</span><span class="n">scoring</span><span class="p">)</span>
|
||||
<span class="n">base_quantifier</span> <span class="o">=</span> <span class="n">base_quantifier_class</span><span class="p">(</span><span class="n">classifier</span><span class="p">)</span>
|
||||
<span class="k">else</span><span class="p">:</span>
|
||||
<span class="n">base_quantifier</span> <span class="o">=</span> <span class="n">GridSearchQ</span><span class="p">(</span><span class="n">base_quantifier_class</span><span class="p">(</span><span class="n">classifier</span><span class="p">),</span>
|
||||
<span class="n">param_grid</span><span class="o">=</span><span class="n">param_grid</span><span class="p">,</span>
|
||||
<span class="o">**</span><span class="n">param_model_sel</span><span class="p">,</span>
|
||||
<span class="n">error</span><span class="o">=</span><span class="n">optim</span><span class="p">)</span>
|
||||
|
||||
<span class="k">return</span> <span class="n">Ensemble</span><span class="p">(</span><span class="n">base_quantifier</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">)</span>
|
||||
|
||||
|
||||
<span class="k">def</span> <span class="nf">_check_error</span><span class="p">(</span><span class="n">error</span><span class="p">):</span>
|
||||
<span class="k">if</span> <span class="n">error</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">:</span>
|
||||
<span class="k">return</span> <span class="kc">None</span>
|
||||
<span class="k">if</span> <span class="n">error</span> <span class="ow">in</span> <span class="n">qp</span><span class="o">.</span><span class="n">error</span><span class="o">.</span><span class="n">QUANTIFICATION_ERROR</span> <span class="ow">or</span> <span class="n">error</span> <span class="ow">in</span> <span class="n">qp</span><span class="o">.</span><span class="n">error</span><span class="o">.</span><span class="n">CLASSIFICATION_ERROR</span><span class="p">:</span>
|
||||
<span class="k">return</span> <span class="n">error</span>
|
||||
<span class="k">elif</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">error</span><span class="p">,</span> <span class="nb">str</span><span class="p">):</span>
|
||||
<span class="k">return</span> <span class="n">qp</span><span class="o">.</span><span class="n">error</span><span class="o">.</span><span class="n">from_name</span><span class="p">(</span><span class="n">error</span><span class="p">)</span>
|
||||
<span class="k">else</span><span class="p">:</span>
|
||||
<span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span><span class="sa">f</span><span class="s1">'unexpected error type; must either be a callable function or a str representing</span><span class="se">\n</span><span class="s1">'</span>
|
||||
<span class="sa">f</span><span class="s1">'the name of an error function in </span><span class="si">{</span><span class="n">qp</span><span class="o">.</span><span class="n">error</span><span class="o">.</span><span class="n">ERROR_NAMES</span><span class="si">}</span><span class="s1">'</span><span class="p">)</span>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="ensembleFactory"><a class="viewcode-back" href="../../../quapy.method.html#quapy.method.meta.ensembleFactory">[docs]</a><span class="k">def</span> <span class="nf">ensembleFactory</span><span class="p">(</span><span class="n">classifier</span><span class="p">,</span> <span class="n">base_quantifier_class</span><span class="p">,</span> <span class="n">param_grid</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">optim</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">param_model_sel</span><span class="p">:</span> <span class="nb">dict</span> <span class="o">=</span> <span class="kc">None</span><span class="p">,</span>
|
||||
<span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Ensemble factory. Provides a unified interface for instantiating ensembles that can be optimized (via model</span>
|
||||
<span class="sd"> selection for quantification) for a given evaluation metric using :class:`quapy.model_selection.GridSearchQ`.</span>
|
||||
<span class="sd"> If the evaluation metric is classification-oriented</span>
|
||||
<span class="sd"> (instead of quantification-oriented), then the optimization will be carried out via sklearn's</span>
|
||||
<span class="sd"> `GridSearchCV <https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html>`_.</span>
|
||||
|
||||
<span class="sd"> Example to instantiate an :class:`Ensemble` based on :class:`quapy.method.aggregative.PACC`</span>
|
||||
<span class="sd"> in which the base members are optimized for :meth:`quapy.error.mae` via</span>
|
||||
<span class="sd"> :class:`quapy.model_selection.GridSearchQ`. The ensemble follows the policy `Accuracy` based</span>
|
||||
<span class="sd"> on :meth:`quapy.error.mae` (the same measure being optimized),</span>
|
||||
<span class="sd"> meaning that a static selection of members of the ensemble is made based on their performance</span>
|
||||
<span class="sd"> in terms of this error.</span>
|
||||
|
||||
<span class="sd"> >>> param_grid = {</span>
|
||||
<span class="sd"> >>> 'C': np.logspace(-3,3,7),</span>
|
||||
<span class="sd"> >>> 'class_weight': ['balanced', None]</span>
|
||||
<span class="sd"> >>> }</span>
|
||||
<span class="sd"> >>> param_mod_sel = {</span>
|
||||
<span class="sd"> >>> 'sample_size': 500,</span>
|
||||
<span class="sd"> >>> 'protocol': 'app'</span>
|
||||
<span class="sd"> >>> }</span>
|
||||
<span class="sd"> >>> common={</span>
|
||||
<span class="sd"> >>> 'max_sample_size': 1000,</span>
|
||||
<span class="sd"> >>> 'n_jobs': -1,</span>
|
||||
<span class="sd"> >>> 'param_grid': param_grid,</span>
|
||||
<span class="sd"> >>> 'param_mod_sel': param_mod_sel,</span>
|
||||
<span class="sd"> >>> }</span>
|
||||
<span class="sd"> >>></span>
|
||||
<span class="sd"> >>> ensembleFactory(LogisticRegression(), PACC, optim='mae', policy='mae', **common)</span>
|
||||
|
||||
<span class="sd"> :param classifier: sklearn's Estimator that generates a classifier</span>
|
||||
<span class="sd"> :param base_quantifier_class: a class of quantifiers</span>
|
||||
<span class="sd"> :param param_grid: a dictionary with the grid of parameters to optimize for</span>
|
||||
<span class="sd"> :param optim: a valid quantification or classification error, or a string name of it</span>
|
||||
<span class="sd"> :param param_model_sel: a dictionary containing any keyworded argument to pass to</span>
|
||||
<span class="sd"> :class:`quapy.model_selection.GridSearchQ`</span>
|
||||
<span class="sd"> :param kwargs: kwargs for the class :class:`Ensemble`</span>
|
||||
<span class="sd"> :return: an instance of :class:`Ensemble`</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">if</span> <span class="n">optim</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span><span class="p">:</span>
|
||||
<span class="k">if</span> <span class="n">param_grid</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">:</span>
|
||||
<span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span><span class="sa">f</span><span class="s1">'param_grid is None but optim was requested.'</span><span class="p">)</span>
|
||||
<span class="k">if</span> <span class="n">param_model_sel</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">:</span>
|
||||
<span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span><span class="sa">f</span><span class="s1">'param_model_sel is None but optim was requested.'</span><span class="p">)</span>
|
||||
<span class="n">error</span> <span class="o">=</span> <span class="n">_check_error</span><span class="p">(</span><span class="n">optim</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="n">_instantiate_ensemble</span><span class="p">(</span><span class="n">classifier</span><span class="p">,</span> <span class="n">base_quantifier_class</span><span class="p">,</span> <span class="n">param_grid</span><span class="p">,</span> <span class="n">error</span><span class="p">,</span> <span class="n">param_model_sel</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">)</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="ECC"><a class="viewcode-back" href="../../../quapy.method.html#quapy.method.meta.ECC">[docs]</a><span class="k">def</span> <span class="nf">ECC</span><span class="p">(</span><span class="n">classifier</span><span class="p">,</span> <span class="n">param_grid</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">optim</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">param_mod_sel</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Implements an ensemble of :class:`quapy.method.aggregative.CC` quantifiers, as used by</span>
|
||||
<span class="sd"> `Pérez-Gállego et al., 2019 <https://www.sciencedirect.com/science/article/pii/S1566253517303652>`_.</span>
|
||||
|
||||
<span class="sd"> Equivalent to:</span>
|
||||
|
||||
<span class="sd"> >>> ensembleFactory(classifier, CC, param_grid, optim, param_mod_sel, **kwargs)</span>
|
||||
|
||||
<span class="sd"> See :meth:`ensembleFactory` for further details.</span>
|
||||
|
||||
<span class="sd"> :param classifier: sklearn's Estimator that generates a classifier</span>
|
||||
<span class="sd"> :param param_grid: a dictionary with the grid of parameters to optimize for</span>
|
||||
<span class="sd"> :param optim: a valid quantification or classification error, or a string name of it</span>
|
||||
<span class="sd"> :param param_model_sel: a dictionary containing any keyworded argument to pass to</span>
|
||||
<span class="sd"> :class:`quapy.model_selection.GridSearchQ`</span>
|
||||
<span class="sd"> :param kwargs: kwargs for the class :class:`Ensemble`</span>
|
||||
<span class="sd"> :return: an instance of :class:`Ensemble`</span>
|
||||
<span class="sd"> """</span>
|
||||
|
||||
<span class="k">return</span> <span class="n">ensembleFactory</span><span class="p">(</span><span class="n">classifier</span><span class="p">,</span> <span class="n">CC</span><span class="p">,</span> <span class="n">param_grid</span><span class="p">,</span> <span class="n">optim</span><span class="p">,</span> <span class="n">param_mod_sel</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">)</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="EACC"><a class="viewcode-back" href="../../../quapy.method.html#quapy.method.meta.EACC">[docs]</a><span class="k">def</span> <span class="nf">EACC</span><span class="p">(</span><span class="n">classifier</span><span class="p">,</span> <span class="n">param_grid</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">optim</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">param_mod_sel</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Implements an ensemble of :class:`quapy.method.aggregative.ACC` quantifiers, as used by</span>
|
||||
<span class="sd"> `Pérez-Gállego et al., 2019 <https://www.sciencedirect.com/science/article/pii/S1566253517303652>`_.</span>
|
||||
|
||||
<span class="sd"> Equivalent to:</span>
|
||||
|
||||
<span class="sd"> >>> ensembleFactory(classifier, ACC, param_grid, optim, param_mod_sel, **kwargs)</span>
|
||||
|
||||
<span class="sd"> See :meth:`ensembleFactory` for further details.</span>
|
||||
|
||||
<span class="sd"> :param classifier: sklearn's Estimator that generates a classifier</span>
|
||||
<span class="sd"> :param param_grid: a dictionary with the grid of parameters to optimize for</span>
|
||||
<span class="sd"> :param optim: a valid quantification or classification error, or a string name of it</span>
|
||||
<span class="sd"> :param param_model_sel: a dictionary containing any keyworded argument to pass to</span>
|
||||
<span class="sd"> :class:`quapy.model_selection.GridSearchQ`</span>
|
||||
<span class="sd"> :param kwargs: kwargs for the class :class:`Ensemble`</span>
|
||||
<span class="sd"> :return: an instance of :class:`Ensemble`</span>
|
||||
<span class="sd"> """</span>
|
||||
|
||||
<span class="k">return</span> <span class="n">ensembleFactory</span><span class="p">(</span><span class="n">classifier</span><span class="p">,</span> <span class="n">ACC</span><span class="p">,</span> <span class="n">param_grid</span><span class="p">,</span> <span class="n">optim</span><span class="p">,</span> <span class="n">param_mod_sel</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">)</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="EPACC"><a class="viewcode-back" href="../../../quapy.method.html#quapy.method.meta.EPACC">[docs]</a><span class="k">def</span> <span class="nf">EPACC</span><span class="p">(</span><span class="n">classifier</span><span class="p">,</span> <span class="n">param_grid</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">optim</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">param_mod_sel</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Implements an ensemble of :class:`quapy.method.aggregative.PACC` quantifiers.</span>
|
||||
|
||||
<span class="sd"> Equivalent to:</span>
|
||||
|
||||
<span class="sd"> >>> ensembleFactory(classifier, PACC, param_grid, optim, param_mod_sel, **kwargs)</span>
|
||||
|
||||
<span class="sd"> See :meth:`ensembleFactory` for further details.</span>
|
||||
|
||||
<span class="sd"> :param classifier: sklearn's Estimator that generates a classifier</span>
|
||||
<span class="sd"> :param param_grid: a dictionary with the grid of parameters to optimize for</span>
|
||||
<span class="sd"> :param optim: a valid quantification or classification error, or a string name of it</span>
|
||||
<span class="sd"> :param param_model_sel: a dictionary containing any keyworded argument to pass to</span>
|
||||
<span class="sd"> :class:`quapy.model_selection.GridSearchQ`</span>
|
||||
<span class="sd"> :param kwargs: kwargs for the class :class:`Ensemble`</span>
|
||||
<span class="sd"> :return: an instance of :class:`Ensemble`</span>
|
||||
<span class="sd"> """</span>
|
||||
|
||||
<span class="k">return</span> <span class="n">ensembleFactory</span><span class="p">(</span><span class="n">classifier</span><span class="p">,</span> <span class="n">PACC</span><span class="p">,</span> <span class="n">param_grid</span><span class="p">,</span> <span class="n">optim</span><span class="p">,</span> <span class="n">param_mod_sel</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">)</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="EHDy"><a class="viewcode-back" href="../../../quapy.method.html#quapy.method.meta.EHDy">[docs]</a><span class="k">def</span> <span class="nf">EHDy</span><span class="p">(</span><span class="n">classifier</span><span class="p">,</span> <span class="n">param_grid</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">optim</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">param_mod_sel</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Implements an ensemble of :class:`quapy.method.aggregative.HDy` quantifiers, as used by</span>
|
||||
<span class="sd"> `Pérez-Gállego et al., 2019 <https://www.sciencedirect.com/science/article/pii/S1566253517303652>`_.</span>
|
||||
|
||||
<span class="sd"> Equivalent to:</span>
|
||||
|
||||
<span class="sd"> >>> ensembleFactory(classifier, HDy, param_grid, optim, param_mod_sel, **kwargs)</span>
|
||||
|
||||
<span class="sd"> See :meth:`ensembleFactory` for further details.</span>
|
||||
|
||||
<span class="sd"> :param classifier: sklearn's Estimator that generates a classifier</span>
|
||||
<span class="sd"> :param param_grid: a dictionary with the grid of parameters to optimize for</span>
|
||||
<span class="sd"> :param optim: a valid quantification or classification error, or a string name of it</span>
|
||||
<span class="sd"> :param param_model_sel: a dictionary containing any keyworded argument to pass to</span>
|
||||
<span class="sd"> :class:`quapy.model_selection.GridSearchQ`</span>
|
||||
<span class="sd"> :param kwargs: kwargs for the class :class:`Ensemble`</span>
|
||||
<span class="sd"> :return: an instance of :class:`Ensemble`</span>
|
||||
<span class="sd"> """</span>
|
||||
|
||||
<span class="k">return</span> <span class="n">ensembleFactory</span><span class="p">(</span><span class="n">classifier</span><span class="p">,</span> <span class="n">HDy</span><span class="p">,</span> <span class="n">param_grid</span><span class="p">,</span> <span class="n">optim</span><span class="p">,</span> <span class="n">param_mod_sel</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">)</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="EEMQ"><a class="viewcode-back" href="../../../quapy.method.html#quapy.method.meta.EEMQ">[docs]</a><span class="k">def</span> <span class="nf">EEMQ</span><span class="p">(</span><span class="n">classifier</span><span class="p">,</span> <span class="n">param_grid</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">optim</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">param_mod_sel</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Implements an ensemble of :class:`quapy.method.aggregative.EMQ` quantifiers.</span>
|
||||
|
||||
<span class="sd"> Equivalent to:</span>
|
||||
|
||||
<span class="sd"> >>> ensembleFactory(classifier, EMQ, param_grid, optim, param_mod_sel, **kwargs)</span>
|
||||
|
||||
<span class="sd"> See :meth:`ensembleFactory` for further details.</span>
|
||||
|
||||
<span class="sd"> :param classifier: sklearn's Estimator that generates a classifier</span>
|
||||
<span class="sd"> :param param_grid: a dictionary with the grid of parameters to optimize for</span>
|
||||
<span class="sd"> :param optim: a valid quantification or classification error, or a string name of it</span>
|
||||
<span class="sd"> :param param_model_sel: a dictionary containing any keyworded argument to pass to</span>
|
||||
<span class="sd"> :class:`quapy.model_selection.GridSearchQ`</span>
|
||||
<span class="sd"> :param kwargs: kwargs for the class :class:`Ensemble`</span>
|
||||
<span class="sd"> :return: an instance of :class:`Ensemble`</span>
|
||||
<span class="sd"> """</span>
|
||||
|
||||
<span class="k">return</span> <span class="n">ensembleFactory</span><span class="p">(</span><span class="n">classifier</span><span class="p">,</span> <span class="n">EMQ</span><span class="p">,</span> <span class="n">param_grid</span><span class="p">,</span> <span class="n">optim</span><span class="p">,</span> <span class="n">param_mod_sel</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">)</span></div>
|
||||
</pre></div>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
<footer>
|
||||
|
||||
<hr/>
|
||||
|
||||
<div role="contentinfo">
|
||||
<p>© Copyright 2024, Alejandro Moreo.</p>
|
||||
</div>
|
||||
|
||||
Built with <a href="https://www.sphinx-doc.org/">Sphinx</a> using a
|
||||
<a href="https://github.com/readthedocs/sphinx_rtd_theme">theme</a>
|
||||
provided by <a href="https://readthedocs.org">Read the Docs</a>.
|
||||
|
||||
|
||||
</footer>
|
||||
</div>
|
||||
</div>
|
||||
</section>
|
||||
</div>
|
||||
<script>
|
||||
jQuery(function () {
|
||||
SphinxRtdTheme.Navigation.enable(true);
|
||||
});
|
||||
</script>
|
||||
|
||||
</body>
|
||||
</html>
|
||||
|
|
@ -0,0 +1,266 @@
|
|||
<!DOCTYPE html>
|
||||
<html class="writer-html5" lang="en">
|
||||
<head>
|
||||
<meta charset="utf-8" />
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
|
||||
<title>quapy.method.non_aggregative — QuaPy: A Python-based open-source framework for quantification 0.1.8 documentation</title>
|
||||
<link rel="stylesheet" type="text/css" href="../../../_static/pygments.css" />
|
||||
<link rel="stylesheet" type="text/css" href="../../../_static/css/theme.css" />
|
||||
|
||||
|
||||
<!--[if lt IE 9]>
|
||||
<script src="../../../_static/js/html5shiv.min.js"></script>
|
||||
<![endif]-->
|
||||
|
||||
<script data-url_root="../../../" id="documentation_options" src="../../../_static/documentation_options.js"></script>
|
||||
<script src="../../../_static/jquery.js"></script>
|
||||
<script src="../../../_static/underscore.js"></script>
|
||||
<script src="../../../_static/_sphinx_javascript_frameworks_compat.js"></script>
|
||||
<script src="../../../_static/doctools.js"></script>
|
||||
<script src="../../../_static/sphinx_highlight.js"></script>
|
||||
<script src="../../../_static/js/theme.js"></script>
|
||||
<link rel="index" title="Index" href="../../../genindex.html" />
|
||||
<link rel="search" title="Search" href="../../../search.html" />
|
||||
</head>
|
||||
|
||||
<body class="wy-body-for-nav">
|
||||
<div class="wy-grid-for-nav">
|
||||
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
|
||||
<div class="wy-side-scroll">
|
||||
<div class="wy-side-nav-search" >
|
||||
|
||||
|
||||
|
||||
<a href="../../../index.html" class="icon icon-home">
|
||||
QuaPy: A Python-based open-source framework for quantification
|
||||
</a>
|
||||
<div role="search">
|
||||
<form id="rtd-search-form" class="wy-form" action="../../../search.html" method="get">
|
||||
<input type="text" name="q" placeholder="Search docs" aria-label="Search docs" />
|
||||
<input type="hidden" name="check_keywords" value="yes" />
|
||||
<input type="hidden" name="area" value="default" />
|
||||
</form>
|
||||
</div>
|
||||
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../../../modules.html">quapy</a></li>
|
||||
</ul>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
</nav>
|
||||
|
||||
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" >
|
||||
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
|
||||
<a href="../../../index.html">QuaPy: A Python-based open-source framework for quantification</a>
|
||||
</nav>
|
||||
|
||||
<div class="wy-nav-content">
|
||||
<div class="rst-content">
|
||||
<div role="navigation" aria-label="Page navigation">
|
||||
<ul class="wy-breadcrumbs">
|
||||
<li><a href="../../../index.html" class="icon icon-home" aria-label="Home"></a></li>
|
||||
<li class="breadcrumb-item"><a href="../../index.html">Module code</a></li>
|
||||
<li class="breadcrumb-item active">quapy.method.non_aggregative</li>
|
||||
<li class="wy-breadcrumbs-aside">
|
||||
</li>
|
||||
</ul>
|
||||
<hr/>
|
||||
</div>
|
||||
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
|
||||
<div itemprop="articleBody">
|
||||
|
||||
<h1>Source code for quapy.method.non_aggregative</h1><div class="highlight"><pre>
|
||||
<span></span><span class="kn">from</span> <span class="nn">typing</span> <span class="kn">import</span> <span class="n">Union</span><span class="p">,</span> <span class="n">Callable</span>
|
||||
<span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span>
|
||||
|
||||
<span class="kn">from</span> <span class="nn">quapy.functional</span> <span class="kn">import</span> <span class="n">get_divergence</span>
|
||||
<span class="kn">from</span> <span class="nn">quapy.data</span> <span class="kn">import</span> <span class="n">LabelledCollection</span>
|
||||
<span class="kn">from</span> <span class="nn">quapy.method.base</span> <span class="kn">import</span> <span class="n">BaseQuantifier</span><span class="p">,</span> <span class="n">BinaryQuantifier</span>
|
||||
<span class="kn">import</span> <span class="nn">quapy.functional</span> <span class="k">as</span> <span class="nn">F</span>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="MaximumLikelihoodPrevalenceEstimation"><a class="viewcode-back" href="../../../quapy.method.html#quapy.method.non_aggregative.MaximumLikelihoodPrevalenceEstimation">[docs]</a><span class="k">class</span> <span class="nc">MaximumLikelihoodPrevalenceEstimation</span><span class="p">(</span><span class="n">BaseQuantifier</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> The `Maximum Likelihood Prevalence Estimation` (MLPE) method is a lazy method that assumes there is no prior</span>
|
||||
<span class="sd"> probability shift between training and test instances (put it other way, that the i.i.d. assumpion holds).</span>
|
||||
<span class="sd"> The estimation of class prevalence values for any test sample is always (i.e., irrespective of the test sample</span>
|
||||
<span class="sd"> itself) the class prevalence seen during training. This method is considered to be a lower-bound quantifier that</span>
|
||||
<span class="sd"> any quantification method should beat.</span>
|
||||
<span class="sd"> """</span>
|
||||
|
||||
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">_classes_</span> <span class="o">=</span> <span class="kc">None</span>
|
||||
|
||||
<div class="viewcode-block" id="MaximumLikelihoodPrevalenceEstimation.fit"><a class="viewcode-back" href="../../../quapy.method.html#quapy.method.non_aggregative.MaximumLikelihoodPrevalenceEstimation.fit">[docs]</a> <span class="k">def</span> <span class="nf">fit</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">data</span><span class="p">:</span> <span class="n">LabelledCollection</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Computes the training prevalence and stores it.</span>
|
||||
|
||||
<span class="sd"> :param data: the training sample</span>
|
||||
<span class="sd"> :return: self</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">estimated_prevalence</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">prevalence</span><span class="p">()</span>
|
||||
<span class="k">return</span> <span class="bp">self</span></div>
|
||||
|
||||
<div class="viewcode-block" id="MaximumLikelihoodPrevalenceEstimation.quantify"><a class="viewcode-back" href="../../../quapy.method.html#quapy.method.non_aggregative.MaximumLikelihoodPrevalenceEstimation.quantify">[docs]</a> <span class="k">def</span> <span class="nf">quantify</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">instances</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Ignores the input instances and returns, as the class prevalence estimantes, the training prevalence.</span>
|
||||
|
||||
<span class="sd"> :param instances: array-like (ignored)</span>
|
||||
<span class="sd"> :return: the class prevalence seen during training</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">estimated_prevalence</span></div></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="DMx"><a class="viewcode-back" href="../../../quapy.method.html#quapy.method.non_aggregative.DMx">[docs]</a><span class="k">class</span> <span class="nc">DMx</span><span class="p">(</span><span class="n">BaseQuantifier</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Generic Distribution Matching quantifier for binary or multiclass quantification based on the space of covariates.</span>
|
||||
<span class="sd"> This implementation takes the number of bins, the divergence, and the possibility to work on CDF as hyperparameters.</span>
|
||||
|
||||
<span class="sd"> :param nbins: number of bins used to discretize the distributions (default 8)</span>
|
||||
<span class="sd"> :param divergence: a string representing a divergence measure (currently, "HD" and "topsoe" are implemented)</span>
|
||||
<span class="sd"> or a callable function taking two ndarrays of the same dimension as input (default "HD", meaning Hellinger</span>
|
||||
<span class="sd"> Distance)</span>
|
||||
<span class="sd"> :param cdf: whether to use CDF instead of PDF (default False)</span>
|
||||
<span class="sd"> :param n_jobs: number of parallel workers (default None)</span>
|
||||
<span class="sd"> """</span>
|
||||
|
||||
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">nbins</span><span class="o">=</span><span class="mi">8</span><span class="p">,</span> <span class="n">divergence</span><span class="p">:</span> <span class="n">Union</span><span class="p">[</span><span class="nb">str</span><span class="p">,</span> <span class="n">Callable</span><span class="p">]</span><span class="o">=</span><span class="s1">'HD'</span><span class="p">,</span> <span class="n">cdf</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span> <span class="n">search</span><span class="o">=</span><span class="s1">'optim_minimize'</span><span class="p">,</span> <span class="n">n_jobs</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">nbins</span> <span class="o">=</span> <span class="n">nbins</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">divergence</span> <span class="o">=</span> <span class="n">divergence</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">cdf</span> <span class="o">=</span> <span class="n">cdf</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">search</span> <span class="o">=</span> <span class="n">search</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">n_jobs</span> <span class="o">=</span> <span class="n">n_jobs</span>
|
||||
|
||||
<div class="viewcode-block" id="DMx.HDx"><a class="viewcode-back" href="../../../quapy.method.html#quapy.method.non_aggregative.DMx.HDx">[docs]</a> <span class="nd">@classmethod</span>
|
||||
<span class="k">def</span> <span class="nf">HDx</span><span class="p">(</span><span class="bp">cls</span><span class="p">,</span> <span class="n">n_jobs</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> `Hellinger Distance x <https://www.sciencedirect.com/science/article/pii/S0020025512004069>`_ (HDx).</span>
|
||||
<span class="sd"> HDx is a method for training binary quantifiers, that models quantification as the problem of</span>
|
||||
<span class="sd"> minimizing the average divergence (in terms of the Hellinger Distance) across the feature-specific normalized</span>
|
||||
<span class="sd"> histograms of two representations, one for the unlabelled examples, and another generated from the training</span>
|
||||
<span class="sd"> examples as a mixture model of the class-specific representations. The parameters of the mixture thus represent</span>
|
||||
<span class="sd"> the estimates of the class prevalence values.</span>
|
||||
|
||||
<span class="sd"> The method computes all matchings for nbins in [10, 20, ..., 110] and reports the mean of the median.</span>
|
||||
<span class="sd"> The best prevalence is searched via linear search, from 0 to 1 stepping by 0.01.</span>
|
||||
|
||||
<span class="sd"> :param n_jobs: number of parallel workers</span>
|
||||
<span class="sd"> :return: an instance of this class setup to mimick the performance of the HDx as originally proposed by</span>
|
||||
<span class="sd"> González-Castro, Alaiz-Rodríguez, Alegre (2013)</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="kn">from</span> <span class="nn">quapy.method.meta</span> <span class="kn">import</span> <span class="n">MedianEstimator</span>
|
||||
|
||||
<span class="n">dmx</span> <span class="o">=</span> <span class="n">DMx</span><span class="p">(</span><span class="n">divergence</span><span class="o">=</span><span class="s1">'HD'</span><span class="p">,</span> <span class="n">cdf</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span> <span class="n">search</span><span class="o">=</span><span class="s1">'linear_search'</span><span class="p">)</span>
|
||||
<span class="n">nbins</span> <span class="o">=</span> <span class="p">{</span><span class="s1">'nbins'</span><span class="p">:</span> <span class="n">np</span><span class="o">.</span><span class="n">linspace</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span> <span class="mi">110</span><span class="p">,</span> <span class="mi">11</span><span class="p">,</span> <span class="n">dtype</span><span class="o">=</span><span class="nb">int</span><span class="p">)}</span>
|
||||
<span class="n">hdx</span> <span class="o">=</span> <span class="n">MedianEstimator</span><span class="p">(</span><span class="n">base_quantifier</span><span class="o">=</span><span class="n">dmx</span><span class="p">,</span> <span class="n">param_grid</span><span class="o">=</span><span class="n">nbins</span><span class="p">,</span> <span class="n">n_jobs</span><span class="o">=</span><span class="n">n_jobs</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="n">hdx</span></div>
|
||||
|
||||
<span class="k">def</span> <span class="nf">__get_distributions</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">X</span><span class="p">):</span>
|
||||
|
||||
<span class="n">histograms</span> <span class="o">=</span> <span class="p">[]</span>
|
||||
<span class="k">for</span> <span class="n">feat_idx</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">nfeats</span><span class="p">):</span>
|
||||
<span class="n">feature</span> <span class="o">=</span> <span class="n">X</span><span class="p">[:,</span> <span class="n">feat_idx</span><span class="p">]</span>
|
||||
<span class="n">feat_range</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">feat_ranges</span><span class="p">[</span><span class="n">feat_idx</span><span class="p">]</span>
|
||||
<span class="n">hist</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">histogram</span><span class="p">(</span><span class="n">feature</span><span class="p">,</span> <span class="n">bins</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">nbins</span><span class="p">,</span> <span class="nb">range</span><span class="o">=</span><span class="n">feat_range</span><span class="p">)[</span><span class="mi">0</span><span class="p">]</span>
|
||||
<span class="n">norm_hist</span> <span class="o">=</span> <span class="n">hist</span> <span class="o">/</span> <span class="n">hist</span><span class="o">.</span><span class="n">sum</span><span class="p">()</span>
|
||||
<span class="n">histograms</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">norm_hist</span><span class="p">)</span>
|
||||
<span class="n">distributions</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">vstack</span><span class="p">(</span><span class="n">histograms</span><span class="p">)</span>
|
||||
|
||||
<span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">cdf</span><span class="p">:</span>
|
||||
<span class="n">distributions</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">cumsum</span><span class="p">(</span><span class="n">distributions</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
|
||||
|
||||
<span class="k">return</span> <span class="n">distributions</span>
|
||||
|
||||
<div class="viewcode-block" id="DMx.fit"><a class="viewcode-back" href="../../../quapy.method.html#quapy.method.non_aggregative.DMx.fit">[docs]</a> <span class="k">def</span> <span class="nf">fit</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">data</span><span class="p">:</span> <span class="n">LabelledCollection</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Generates the validation distributions out of the training data (covariates).</span>
|
||||
<span class="sd"> The validation distributions have shape `(n, nfeats, nbins)`, with `n` the number of classes, `nfeats`</span>
|
||||
<span class="sd"> the number of features, and `nbins` the number of bins.</span>
|
||||
<span class="sd"> In particular, let `V` be the validation distributions; then `di=V[i]` are the distributions obtained from</span>
|
||||
<span class="sd"> training data labelled with class `i`; while `dij = di[j]` is the discrete distribution for feature j in</span>
|
||||
<span class="sd"> training data labelled with class `i`, and `dij[k]` is the fraction of instances with a value in the `k`-th bin.</span>
|
||||
|
||||
<span class="sd"> :param data: the training set</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="n">X</span><span class="p">,</span> <span class="n">y</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">Xy</span>
|
||||
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">nfeats</span> <span class="o">=</span> <span class="n">X</span><span class="o">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">feat_ranges</span> <span class="o">=</span> <span class="n">_get_features_range</span><span class="p">(</span><span class="n">X</span><span class="p">)</span>
|
||||
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">validation_distribution</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">asarray</span><span class="p">(</span>
|
||||
<span class="p">[</span><span class="bp">self</span><span class="o">.</span><span class="n">__get_distributions</span><span class="p">(</span><span class="n">X</span><span class="p">[</span><span class="n">y</span><span class="o">==</span><span class="n">cat</span><span class="p">])</span> <span class="k">for</span> <span class="n">cat</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">data</span><span class="o">.</span><span class="n">n_classes</span><span class="p">)]</span>
|
||||
<span class="p">)</span>
|
||||
|
||||
<span class="k">return</span> <span class="bp">self</span></div>
|
||||
|
||||
<div class="viewcode-block" id="DMx.quantify"><a class="viewcode-back" href="../../../quapy.method.html#quapy.method.non_aggregative.DMx.quantify">[docs]</a> <span class="k">def</span> <span class="nf">quantify</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">instances</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Searches for the mixture model parameter (the sought prevalence values) that yields a validation distribution</span>
|
||||
<span class="sd"> (the mixture) that best matches the test distribution, in terms of the divergence measure of choice.</span>
|
||||
<span class="sd"> The matching is computed as the average dissimilarity (in terms of the dissimilarity measure of choice)</span>
|
||||
<span class="sd"> between all feature-specific discrete distributions.</span>
|
||||
|
||||
<span class="sd"> :param instances: instances in the sample</span>
|
||||
<span class="sd"> :return: a vector of class prevalence estimates</span>
|
||||
<span class="sd"> """</span>
|
||||
|
||||
<span class="k">assert</span> <span class="n">instances</span><span class="o">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">==</span> <span class="bp">self</span><span class="o">.</span><span class="n">nfeats</span><span class="p">,</span> <span class="sa">f</span><span class="s1">'wrong shape; expected </span><span class="si">{</span><span class="bp">self</span><span class="o">.</span><span class="n">nfeats</span><span class="si">}</span><span class="s1">, found </span><span class="si">{</span><span class="n">instances</span><span class="o">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span><span class="si">}</span><span class="s1">'</span>
|
||||
|
||||
<span class="n">test_distribution</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">__get_distributions</span><span class="p">(</span><span class="n">instances</span><span class="p">)</span>
|
||||
<span class="n">divergence</span> <span class="o">=</span> <span class="n">get_divergence</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">divergence</span><span class="p">)</span>
|
||||
<span class="n">n_classes</span><span class="p">,</span> <span class="n">n_feats</span><span class="p">,</span> <span class="n">nbins</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">validation_distribution</span><span class="o">.</span><span class="n">shape</span>
|
||||
<span class="k">def</span> <span class="nf">loss</span><span class="p">(</span><span class="n">prev</span><span class="p">):</span>
|
||||
<span class="n">prev</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">expand_dims</span><span class="p">(</span><span class="n">prev</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
|
||||
<span class="n">mixture_distribution</span> <span class="o">=</span> <span class="p">(</span><span class="n">prev</span> <span class="o">@</span> <span class="bp">self</span><span class="o">.</span><span class="n">validation_distribution</span><span class="o">.</span><span class="n">reshape</span><span class="p">(</span><span class="n">n_classes</span><span class="p">,</span><span class="o">-</span><span class="mi">1</span><span class="p">))</span><span class="o">.</span><span class="n">reshape</span><span class="p">(</span><span class="n">n_feats</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span>
|
||||
<span class="n">divs</span> <span class="o">=</span> <span class="p">[</span><span class="n">divergence</span><span class="p">(</span><span class="n">test_distribution</span><span class="p">[</span><span class="n">feat</span><span class="p">],</span> <span class="n">mixture_distribution</span><span class="p">[</span><span class="n">feat</span><span class="p">])</span> <span class="k">for</span> <span class="n">feat</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">n_feats</span><span class="p">)]</span>
|
||||
<span class="k">return</span> <span class="n">np</span><span class="o">.</span><span class="n">mean</span><span class="p">(</span><span class="n">divs</span><span class="p">)</span>
|
||||
|
||||
<span class="k">return</span> <span class="n">F</span><span class="o">.</span><span class="n">argmin_prevalence</span><span class="p">(</span><span class="n">loss</span><span class="p">,</span> <span class="n">n_classes</span><span class="p">,</span> <span class="n">method</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">search</span><span class="p">)</span></div></div>
|
||||
|
||||
|
||||
|
||||
<span class="k">def</span> <span class="nf">_get_features_range</span><span class="p">(</span><span class="n">X</span><span class="p">):</span>
|
||||
<span class="n">feat_ranges</span> <span class="o">=</span> <span class="p">[]</span>
|
||||
<span class="n">ncols</span> <span class="o">=</span> <span class="n">X</span><span class="o">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span>
|
||||
<span class="k">for</span> <span class="n">col_idx</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">ncols</span><span class="p">):</span>
|
||||
<span class="n">feature</span> <span class="o">=</span> <span class="n">X</span><span class="p">[:,</span><span class="n">col_idx</span><span class="p">]</span>
|
||||
<span class="n">feat_ranges</span><span class="o">.</span><span class="n">append</span><span class="p">((</span><span class="n">np</span><span class="o">.</span><span class="n">min</span><span class="p">(</span><span class="n">feature</span><span class="p">),</span> <span class="n">np</span><span class="o">.</span><span class="n">max</span><span class="p">(</span><span class="n">feature</span><span class="p">)))</span>
|
||||
<span class="k">return</span> <span class="n">feat_ranges</span>
|
||||
|
||||
|
||||
<span class="c1">#---------------------------------------------------------------</span>
|
||||
<span class="c1"># aliases</span>
|
||||
<span class="c1">#---------------------------------------------------------------</span>
|
||||
|
||||
<span class="n">DistributionMatchingX</span> <span class="o">=</span> <span class="n">DMx</span>
|
||||
</pre></div>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
<footer>
|
||||
|
||||
<hr/>
|
||||
|
||||
<div role="contentinfo">
|
||||
<p>© Copyright 2024, Alejandro Moreo.</p>
|
||||
</div>
|
||||
|
||||
Built with <a href="https://www.sphinx-doc.org/">Sphinx</a> using a
|
||||
<a href="https://github.com/readthedocs/sphinx_rtd_theme">theme</a>
|
||||
provided by <a href="https://readthedocs.org">Read the Docs</a>.
|
||||
|
||||
|
||||
</footer>
|
||||
</div>
|
||||
</div>
|
||||
</section>
|
||||
</div>
|
||||
<script>
|
||||
jQuery(function () {
|
||||
SphinxRtdTheme.Navigation.enable(true);
|
||||
});
|
||||
</script>
|
||||
|
||||
</body>
|
||||
</html>
|
||||
|
|
@ -0,0 +1,516 @@
|
|||
<!DOCTYPE html>
|
||||
<html class="writer-html5" lang="en">
|
||||
<head>
|
||||
<meta charset="utf-8" />
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
|
||||
<title>quapy.model_selection — QuaPy: A Python-based open-source framework for quantification 0.1.8 documentation</title>
|
||||
<link rel="stylesheet" type="text/css" href="../../_static/pygments.css" />
|
||||
<link rel="stylesheet" type="text/css" href="../../_static/css/theme.css" />
|
||||
|
||||
|
||||
<!--[if lt IE 9]>
|
||||
<script src="../../_static/js/html5shiv.min.js"></script>
|
||||
<![endif]-->
|
||||
|
||||
<script data-url_root="../../" id="documentation_options" src="../../_static/documentation_options.js"></script>
|
||||
<script src="../../_static/jquery.js"></script>
|
||||
<script src="../../_static/underscore.js"></script>
|
||||
<script src="../../_static/_sphinx_javascript_frameworks_compat.js"></script>
|
||||
<script src="../../_static/doctools.js"></script>
|
||||
<script src="../../_static/sphinx_highlight.js"></script>
|
||||
<script src="../../_static/js/theme.js"></script>
|
||||
<link rel="index" title="Index" href="../../genindex.html" />
|
||||
<link rel="search" title="Search" href="../../search.html" />
|
||||
</head>
|
||||
|
||||
<body class="wy-body-for-nav">
|
||||
<div class="wy-grid-for-nav">
|
||||
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
|
||||
<div class="wy-side-scroll">
|
||||
<div class="wy-side-nav-search" >
|
||||
|
||||
|
||||
|
||||
<a href="../../index.html" class="icon icon-home">
|
||||
QuaPy: A Python-based open-source framework for quantification
|
||||
</a>
|
||||
<div role="search">
|
||||
<form id="rtd-search-form" class="wy-form" action="../../search.html" method="get">
|
||||
<input type="text" name="q" placeholder="Search docs" aria-label="Search docs" />
|
||||
<input type="hidden" name="check_keywords" value="yes" />
|
||||
<input type="hidden" name="area" value="default" />
|
||||
</form>
|
||||
</div>
|
||||
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../../modules.html">quapy</a></li>
|
||||
</ul>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
</nav>
|
||||
|
||||
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" >
|
||||
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
|
||||
<a href="../../index.html">QuaPy: A Python-based open-source framework for quantification</a>
|
||||
</nav>
|
||||
|
||||
<div class="wy-nav-content">
|
||||
<div class="rst-content">
|
||||
<div role="navigation" aria-label="Page navigation">
|
||||
<ul class="wy-breadcrumbs">
|
||||
<li><a href="../../index.html" class="icon icon-home" aria-label="Home"></a></li>
|
||||
<li class="breadcrumb-item"><a href="../index.html">Module code</a></li>
|
||||
<li class="breadcrumb-item active">quapy.model_selection</li>
|
||||
<li class="wy-breadcrumbs-aside">
|
||||
</li>
|
||||
</ul>
|
||||
<hr/>
|
||||
</div>
|
||||
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
|
||||
<div itemprop="articleBody">
|
||||
|
||||
<h1>Source code for quapy.model_selection</h1><div class="highlight"><pre>
|
||||
<span></span><span class="kn">import</span> <span class="nn">itertools</span>
|
||||
<span class="kn">import</span> <span class="nn">signal</span>
|
||||
<span class="kn">from</span> <span class="nn">copy</span> <span class="kn">import</span> <span class="n">deepcopy</span>
|
||||
<span class="kn">from</span> <span class="nn">enum</span> <span class="kn">import</span> <span class="n">Enum</span>
|
||||
<span class="kn">from</span> <span class="nn">typing</span> <span class="kn">import</span> <span class="n">Union</span><span class="p">,</span> <span class="n">Callable</span>
|
||||
<span class="kn">from</span> <span class="nn">functools</span> <span class="kn">import</span> <span class="n">wraps</span>
|
||||
|
||||
<span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span>
|
||||
<span class="kn">from</span> <span class="nn">sklearn</span> <span class="kn">import</span> <span class="n">clone</span>
|
||||
|
||||
<span class="kn">import</span> <span class="nn">quapy</span> <span class="k">as</span> <span class="nn">qp</span>
|
||||
<span class="kn">from</span> <span class="nn">quapy</span> <span class="kn">import</span> <span class="n">evaluation</span>
|
||||
<span class="kn">from</span> <span class="nn">quapy.protocol</span> <span class="kn">import</span> <span class="n">AbstractProtocol</span><span class="p">,</span> <span class="n">OnLabelledCollectionProtocol</span>
|
||||
<span class="kn">from</span> <span class="nn">quapy.data.base</span> <span class="kn">import</span> <span class="n">LabelledCollection</span>
|
||||
<span class="kn">from</span> <span class="nn">quapy.method.aggregative</span> <span class="kn">import</span> <span class="n">BaseQuantifier</span><span class="p">,</span> <span class="n">AggregativeQuantifier</span>
|
||||
<span class="kn">from</span> <span class="nn">quapy.util</span> <span class="kn">import</span> <span class="n">timeout</span>
|
||||
<span class="kn">from</span> <span class="nn">time</span> <span class="kn">import</span> <span class="n">time</span>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="Status"><a class="viewcode-back" href="../../quapy.html#quapy.model_selection.Status">[docs]</a><span class="k">class</span> <span class="nc">Status</span><span class="p">(</span><span class="n">Enum</span><span class="p">):</span>
|
||||
<span class="n">SUCCESS</span> <span class="o">=</span> <span class="mi">1</span>
|
||||
<span class="n">TIMEOUT</span> <span class="o">=</span> <span class="mi">2</span>
|
||||
<span class="n">INVALID</span> <span class="o">=</span> <span class="mi">3</span>
|
||||
<span class="n">ERROR</span> <span class="o">=</span> <span class="mi">4</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="ConfigStatus"><a class="viewcode-back" href="../../quapy.html#quapy.model_selection.ConfigStatus">[docs]</a><span class="k">class</span> <span class="nc">ConfigStatus</span><span class="p">:</span>
|
||||
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">params</span><span class="p">,</span> <span class="n">status</span><span class="p">,</span> <span class="n">msg</span><span class="o">=</span><span class="s1">''</span><span class="p">):</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">params</span> <span class="o">=</span> <span class="n">params</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">status</span> <span class="o">=</span> <span class="n">status</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">msg</span> <span class="o">=</span> <span class="n">msg</span>
|
||||
|
||||
<span class="k">def</span> <span class="fm">__str__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="k">return</span> <span class="sa">f</span><span class="s1">':params:</span><span class="si">{</span><span class="bp">self</span><span class="o">.</span><span class="n">params</span><span class="si">}</span><span class="s1"> :status:</span><span class="si">{</span><span class="bp">self</span><span class="o">.</span><span class="n">status</span><span class="si">}</span><span class="s1"> '</span> <span class="o">+</span> <span class="bp">self</span><span class="o">.</span><span class="n">msg</span>
|
||||
|
||||
<span class="k">def</span> <span class="fm">__repr__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="k">return</span> <span class="nb">str</span><span class="p">(</span><span class="bp">self</span><span class="p">)</span>
|
||||
|
||||
<div class="viewcode-block" id="ConfigStatus.success"><a class="viewcode-back" href="../../quapy.html#quapy.model_selection.ConfigStatus.success">[docs]</a> <span class="k">def</span> <span class="nf">success</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">status</span> <span class="o">==</span> <span class="n">Status</span><span class="o">.</span><span class="n">SUCCESS</span></div>
|
||||
|
||||
<div class="viewcode-block" id="ConfigStatus.failed"><a class="viewcode-back" href="../../quapy.html#quapy.model_selection.ConfigStatus.failed">[docs]</a> <span class="k">def</span> <span class="nf">failed</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">status</span> <span class="o">!=</span> <span class="n">Status</span><span class="o">.</span><span class="n">SUCCESS</span></div></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="GridSearchQ"><a class="viewcode-back" href="../../quapy.html#quapy.model_selection.GridSearchQ">[docs]</a><span class="k">class</span> <span class="nc">GridSearchQ</span><span class="p">(</span><span class="n">BaseQuantifier</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""Grid Search optimization targeting a quantification-oriented metric.</span>
|
||||
|
||||
<span class="sd"> Optimizes the hyperparameters of a quantification method, based on an evaluation method and on an evaluation</span>
|
||||
<span class="sd"> protocol for quantification.</span>
|
||||
|
||||
<span class="sd"> :param model: the quantifier to optimize</span>
|
||||
<span class="sd"> :type model: BaseQuantifier</span>
|
||||
<span class="sd"> :param param_grid: a dictionary with keys the parameter names and values the list of values to explore</span>
|
||||
<span class="sd"> :param protocol: a sample generation protocol, an instance of :class:`quapy.protocol.AbstractProtocol`</span>
|
||||
<span class="sd"> :param error: an error function (callable) or a string indicating the name of an error function (valid ones</span>
|
||||
<span class="sd"> are those in :class:`quapy.error.QUANTIFICATION_ERROR`</span>
|
||||
<span class="sd"> :param refit: whether to refit the model on the whole labelled collection (training+validation) with</span>
|
||||
<span class="sd"> the best chosen hyperparameter combination. Ignored if protocol='gen'</span>
|
||||
<span class="sd"> :param timeout: establishes a timer (in seconds) for each of the hyperparameters configurations being tested.</span>
|
||||
<span class="sd"> Whenever a run takes longer than this timer, that configuration will be ignored. If all configurations end up</span>
|
||||
<span class="sd"> being ignored, a TimeoutError exception is raised. If -1 (default) then no time bound is set.</span>
|
||||
<span class="sd"> :param raise_errors: boolean, if True then raises an exception when a param combination yields any error, if</span>
|
||||
<span class="sd"> otherwise is False (default), then the combination is marked with an error status, but the process goes on.</span>
|
||||
<span class="sd"> However, if no configuration yields a valid model, then a ValueError exception will be raised.</span>
|
||||
<span class="sd"> :param verbose: set to True to get information through the stdout</span>
|
||||
<span class="sd"> """</span>
|
||||
|
||||
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span>
|
||||
<span class="n">model</span><span class="p">:</span> <span class="n">BaseQuantifier</span><span class="p">,</span>
|
||||
<span class="n">param_grid</span><span class="p">:</span> <span class="nb">dict</span><span class="p">,</span>
|
||||
<span class="n">protocol</span><span class="p">:</span> <span class="n">AbstractProtocol</span><span class="p">,</span>
|
||||
<span class="n">error</span><span class="p">:</span> <span class="n">Union</span><span class="p">[</span><span class="n">Callable</span><span class="p">,</span> <span class="nb">str</span><span class="p">]</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">error</span><span class="o">.</span><span class="n">mae</span><span class="p">,</span>
|
||||
<span class="n">refit</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span>
|
||||
<span class="n">timeout</span><span class="o">=-</span><span class="mi">1</span><span class="p">,</span>
|
||||
<span class="n">n_jobs</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span>
|
||||
<span class="n">raise_errors</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span>
|
||||
<span class="n">verbose</span><span class="o">=</span><span class="kc">False</span><span class="p">):</span>
|
||||
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">model</span> <span class="o">=</span> <span class="n">model</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">param_grid</span> <span class="o">=</span> <span class="n">param_grid</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">protocol</span> <span class="o">=</span> <span class="n">protocol</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">refit</span> <span class="o">=</span> <span class="n">refit</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">timeout</span> <span class="o">=</span> <span class="n">timeout</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">n_jobs</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">_get_njobs</span><span class="p">(</span><span class="n">n_jobs</span><span class="p">)</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">raise_errors</span> <span class="o">=</span> <span class="n">raise_errors</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">verbose</span> <span class="o">=</span> <span class="n">verbose</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">__check_error</span><span class="p">(</span><span class="n">error</span><span class="p">)</span>
|
||||
<span class="k">assert</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">protocol</span><span class="p">,</span> <span class="n">AbstractProtocol</span><span class="p">),</span> <span class="s1">'unknown protocol'</span>
|
||||
|
||||
<span class="k">def</span> <span class="nf">_sout</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">msg</span><span class="p">):</span>
|
||||
<span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">verbose</span><span class="p">:</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s1">'[</span><span class="si">{</span><span class="bp">self</span><span class="o">.</span><span class="vm">__class__</span><span class="o">.</span><span class="vm">__name__</span><span class="si">}</span><span class="s1">:</span><span class="si">{</span><span class="bp">self</span><span class="o">.</span><span class="n">model</span><span class="o">.</span><span class="vm">__class__</span><span class="o">.</span><span class="vm">__name__</span><span class="si">}</span><span class="s1">]: </span><span class="si">{</span><span class="n">msg</span><span class="si">}</span><span class="s1">'</span><span class="p">)</span>
|
||||
|
||||
<span class="k">def</span> <span class="nf">__check_error</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">error</span><span class="p">):</span>
|
||||
<span class="k">if</span> <span class="n">error</span> <span class="ow">in</span> <span class="n">qp</span><span class="o">.</span><span class="n">error</span><span class="o">.</span><span class="n">QUANTIFICATION_ERROR</span><span class="p">:</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">error</span> <span class="o">=</span> <span class="n">error</span>
|
||||
<span class="k">elif</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">error</span><span class="p">,</span> <span class="nb">str</span><span class="p">):</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">error</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">error</span><span class="o">.</span><span class="n">from_name</span><span class="p">(</span><span class="n">error</span><span class="p">)</span>
|
||||
<span class="k">elif</span> <span class="nb">hasattr</span><span class="p">(</span><span class="n">error</span><span class="p">,</span> <span class="s1">'__call__'</span><span class="p">):</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">error</span> <span class="o">=</span> <span class="n">error</span>
|
||||
<span class="k">else</span><span class="p">:</span>
|
||||
<span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span><span class="sa">f</span><span class="s1">'unexpected error type; must either be a callable function or a str representing</span><span class="se">\n</span><span class="s1">'</span>
|
||||
<span class="sa">f</span><span class="s1">'the name of an error function in </span><span class="si">{</span><span class="n">qp</span><span class="o">.</span><span class="n">error</span><span class="o">.</span><span class="n">QUANTIFICATION_ERROR_NAMES</span><span class="si">}</span><span class="s1">'</span><span class="p">)</span>
|
||||
|
||||
<span class="k">def</span> <span class="nf">_prepare_classifier</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">cls_params</span><span class="p">):</span>
|
||||
<span class="n">model</span> <span class="o">=</span> <span class="n">deepcopy</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">model</span><span class="p">)</span>
|
||||
|
||||
<span class="k">def</span> <span class="nf">job</span><span class="p">(</span><span class="n">cls_params</span><span class="p">):</span>
|
||||
<span class="n">model</span><span class="o">.</span><span class="n">set_params</span><span class="p">(</span><span class="o">**</span><span class="n">cls_params</span><span class="p">)</span>
|
||||
<span class="n">predictions</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">classifier_fit_predict</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">_training</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="n">predictions</span>
|
||||
|
||||
<span class="n">predictions</span><span class="p">,</span> <span class="n">status</span><span class="p">,</span> <span class="n">took</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_error_handler</span><span class="p">(</span><span class="n">job</span><span class="p">,</span> <span class="n">cls_params</span><span class="p">)</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">_sout</span><span class="p">(</span><span class="sa">f</span><span class="s1">'[classifier fit] hyperparams=</span><span class="si">{</span><span class="n">cls_params</span><span class="si">}</span><span class="s1"> [took </span><span class="si">{</span><span class="n">took</span><span class="si">:</span><span class="s1">.3f</span><span class="si">}</span><span class="s1">s]'</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="n">model</span><span class="p">,</span> <span class="n">predictions</span><span class="p">,</span> <span class="n">status</span><span class="p">,</span> <span class="n">took</span>
|
||||
|
||||
<span class="k">def</span> <span class="nf">_prepare_aggregation</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">args</span><span class="p">):</span>
|
||||
<span class="n">model</span><span class="p">,</span> <span class="n">predictions</span><span class="p">,</span> <span class="n">cls_took</span><span class="p">,</span> <span class="n">cls_params</span><span class="p">,</span> <span class="n">q_params</span> <span class="o">=</span> <span class="n">args</span>
|
||||
<span class="n">model</span> <span class="o">=</span> <span class="n">deepcopy</span><span class="p">(</span><span class="n">model</span><span class="p">)</span>
|
||||
<span class="n">params</span> <span class="o">=</span> <span class="p">{</span><span class="o">**</span><span class="n">cls_params</span><span class="p">,</span> <span class="o">**</span><span class="n">q_params</span><span class="p">}</span>
|
||||
|
||||
<span class="k">def</span> <span class="nf">job</span><span class="p">(</span><span class="n">q_params</span><span class="p">):</span>
|
||||
<span class="n">model</span><span class="o">.</span><span class="n">set_params</span><span class="p">(</span><span class="o">**</span><span class="n">q_params</span><span class="p">)</span>
|
||||
<span class="n">model</span><span class="o">.</span><span class="n">aggregation_fit</span><span class="p">(</span><span class="n">predictions</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">_training</span><span class="p">)</span>
|
||||
<span class="n">score</span> <span class="o">=</span> <span class="n">evaluation</span><span class="o">.</span><span class="n">evaluate</span><span class="p">(</span><span class="n">model</span><span class="p">,</span> <span class="n">protocol</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">protocol</span><span class="p">,</span> <span class="n">error_metric</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">error</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="n">score</span>
|
||||
|
||||
<span class="n">score</span><span class="p">,</span> <span class="n">status</span><span class="p">,</span> <span class="n">aggr_took</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_error_handler</span><span class="p">(</span><span class="n">job</span><span class="p">,</span> <span class="n">q_params</span><span class="p">)</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">_print_status</span><span class="p">(</span><span class="n">params</span><span class="p">,</span> <span class="n">score</span><span class="p">,</span> <span class="n">status</span><span class="p">,</span> <span class="n">aggr_took</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="n">model</span><span class="p">,</span> <span class="n">params</span><span class="p">,</span> <span class="n">score</span><span class="p">,</span> <span class="n">status</span><span class="p">,</span> <span class="p">(</span><span class="n">cls_took</span><span class="o">+</span><span class="n">aggr_took</span><span class="p">)</span>
|
||||
|
||||
<span class="k">def</span> <span class="nf">_prepare_nonaggr_model</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">params</span><span class="p">):</span>
|
||||
<span class="n">model</span> <span class="o">=</span> <span class="n">deepcopy</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">model</span><span class="p">)</span>
|
||||
|
||||
<span class="k">def</span> <span class="nf">job</span><span class="p">(</span><span class="n">params</span><span class="p">):</span>
|
||||
<span class="n">model</span><span class="o">.</span><span class="n">set_params</span><span class="p">(</span><span class="o">**</span><span class="n">params</span><span class="p">)</span>
|
||||
<span class="n">model</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">_training</span><span class="p">)</span>
|
||||
<span class="n">score</span> <span class="o">=</span> <span class="n">evaluation</span><span class="o">.</span><span class="n">evaluate</span><span class="p">(</span><span class="n">model</span><span class="p">,</span> <span class="n">protocol</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">protocol</span><span class="p">,</span> <span class="n">error_metric</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">error</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="n">score</span>
|
||||
|
||||
<span class="n">score</span><span class="p">,</span> <span class="n">status</span><span class="p">,</span> <span class="n">took</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_error_handler</span><span class="p">(</span><span class="n">job</span><span class="p">,</span> <span class="n">params</span><span class="p">)</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">_print_status</span><span class="p">(</span><span class="n">params</span><span class="p">,</span> <span class="n">score</span><span class="p">,</span> <span class="n">status</span><span class="p">,</span> <span class="n">took</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="n">model</span><span class="p">,</span> <span class="n">params</span><span class="p">,</span> <span class="n">score</span><span class="p">,</span> <span class="n">status</span><span class="p">,</span> <span class="n">took</span>
|
||||
|
||||
<span class="k">def</span> <span class="nf">_break_down_fit</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Decides whether to break down the fit phase in two (classifier-fit followed by aggregation-fit).</span>
|
||||
<span class="sd"> In order to do so, some conditions should be met: a) the quantifier is of type aggregative,</span>
|
||||
<span class="sd"> b) the set of hyperparameters can be split into two disjoint non-empty groups.</span>
|
||||
|
||||
<span class="sd"> :return: True if the conditions are met, False otherwise</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">if</span> <span class="ow">not</span> <span class="nb">isinstance</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">model</span><span class="p">,</span> <span class="n">AggregativeQuantifier</span><span class="p">):</span>
|
||||
<span class="k">return</span> <span class="kc">False</span>
|
||||
<span class="n">cls_configs</span><span class="p">,</span> <span class="n">q_configs</span> <span class="o">=</span> <span class="n">group_params</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">param_grid</span><span class="p">)</span>
|
||||
<span class="k">if</span> <span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">cls_configs</span><span class="p">)</span> <span class="o">==</span> <span class="mi">1</span><span class="p">)</span> <span class="ow">or</span> <span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">q_configs</span><span class="p">)</span><span class="o">==</span><span class="mi">1</span><span class="p">):</span>
|
||||
<span class="k">return</span> <span class="kc">False</span>
|
||||
<span class="k">return</span> <span class="kc">True</span>
|
||||
|
||||
<span class="k">def</span> <span class="nf">_compute_scores_aggregative</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">training</span><span class="p">):</span>
|
||||
<span class="c1"># break down the set of hyperparameters into two: classifier-specific, quantifier-specific</span>
|
||||
<span class="n">cls_configs</span><span class="p">,</span> <span class="n">q_configs</span> <span class="o">=</span> <span class="n">group_params</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">param_grid</span><span class="p">)</span>
|
||||
|
||||
<span class="c1"># train all classifiers and get the predictions</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">_training</span> <span class="o">=</span> <span class="n">training</span>
|
||||
<span class="n">cls_outs</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">util</span><span class="o">.</span><span class="n">parallel</span><span class="p">(</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">_prepare_classifier</span><span class="p">,</span>
|
||||
<span class="n">cls_configs</span><span class="p">,</span>
|
||||
<span class="n">seed</span><span class="o">=</span><span class="n">qp</span><span class="o">.</span><span class="n">environ</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">'_R_SEED'</span><span class="p">,</span> <span class="kc">None</span><span class="p">),</span>
|
||||
<span class="n">n_jobs</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">n_jobs</span>
|
||||
<span class="p">)</span>
|
||||
|
||||
<span class="c1"># filter out classifier configurations that yielded any error</span>
|
||||
<span class="n">success_outs</span> <span class="o">=</span> <span class="p">[]</span>
|
||||
<span class="k">for</span> <span class="p">(</span><span class="n">model</span><span class="p">,</span> <span class="n">predictions</span><span class="p">,</span> <span class="n">status</span><span class="p">,</span> <span class="n">took</span><span class="p">),</span> <span class="n">cls_config</span> <span class="ow">in</span> <span class="nb">zip</span><span class="p">(</span><span class="n">cls_outs</span><span class="p">,</span> <span class="n">cls_configs</span><span class="p">):</span>
|
||||
<span class="k">if</span> <span class="n">status</span><span class="o">.</span><span class="n">success</span><span class="p">():</span>
|
||||
<span class="n">success_outs</span><span class="o">.</span><span class="n">append</span><span class="p">((</span><span class="n">model</span><span class="p">,</span> <span class="n">predictions</span><span class="p">,</span> <span class="n">took</span><span class="p">,</span> <span class="n">cls_config</span><span class="p">))</span>
|
||||
<span class="k">else</span><span class="p">:</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">error_collector</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">status</span><span class="p">)</span>
|
||||
|
||||
<span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">success_outs</span><span class="p">)</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>
|
||||
<span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span><span class="s1">'No valid configuration found for the classifier!'</span><span class="p">)</span>
|
||||
|
||||
<span class="c1"># explore the quantifier-specific hyperparameters for each valid training configuration</span>
|
||||
<span class="n">aggr_configs</span> <span class="o">=</span> <span class="p">[(</span><span class="o">*</span><span class="n">out</span><span class="p">,</span> <span class="n">q_config</span><span class="p">)</span> <span class="k">for</span> <span class="n">out</span><span class="p">,</span> <span class="n">q_config</span> <span class="ow">in</span> <span class="n">itertools</span><span class="o">.</span><span class="n">product</span><span class="p">(</span><span class="n">success_outs</span><span class="p">,</span> <span class="n">q_configs</span><span class="p">)]</span>
|
||||
<span class="n">aggr_outs</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">util</span><span class="o">.</span><span class="n">parallel</span><span class="p">(</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">_prepare_aggregation</span><span class="p">,</span>
|
||||
<span class="n">aggr_configs</span><span class="p">,</span>
|
||||
<span class="n">seed</span><span class="o">=</span><span class="n">qp</span><span class="o">.</span><span class="n">environ</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">'_R_SEED'</span><span class="p">,</span> <span class="kc">None</span><span class="p">),</span>
|
||||
<span class="n">n_jobs</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">n_jobs</span>
|
||||
<span class="p">)</span>
|
||||
|
||||
<span class="k">return</span> <span class="n">aggr_outs</span>
|
||||
|
||||
<span class="k">def</span> <span class="nf">_compute_scores_nonaggregative</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">training</span><span class="p">):</span>
|
||||
<span class="n">configs</span> <span class="o">=</span> <span class="n">expand_grid</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">param_grid</span><span class="p">)</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">_training</span> <span class="o">=</span> <span class="n">training</span>
|
||||
<span class="n">scores</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">util</span><span class="o">.</span><span class="n">parallel</span><span class="p">(</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">_prepare_nonaggr_model</span><span class="p">,</span>
|
||||
<span class="n">configs</span><span class="p">,</span>
|
||||
<span class="n">seed</span><span class="o">=</span><span class="n">qp</span><span class="o">.</span><span class="n">environ</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">'_R_SEED'</span><span class="p">,</span> <span class="kc">None</span><span class="p">),</span>
|
||||
<span class="n">n_jobs</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">n_jobs</span>
|
||||
<span class="p">)</span>
|
||||
<span class="k">return</span> <span class="n">scores</span>
|
||||
|
||||
<span class="k">def</span> <span class="nf">_print_status</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">params</span><span class="p">,</span> <span class="n">score</span><span class="p">,</span> <span class="n">status</span><span class="p">,</span> <span class="n">took</span><span class="p">):</span>
|
||||
<span class="k">if</span> <span class="n">status</span><span class="o">.</span><span class="n">success</span><span class="p">():</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">_sout</span><span class="p">(</span><span class="sa">f</span><span class="s1">'hyperparams=[</span><span class="si">{</span><span class="n">params</span><span class="si">}</span><span class="s1">]</span><span class="se">\t</span><span class="s1"> got </span><span class="si">{</span><span class="bp">self</span><span class="o">.</span><span class="n">error</span><span class="o">.</span><span class="vm">__name__</span><span class="si">}</span><span class="s1"> = </span><span class="si">{</span><span class="n">score</span><span class="si">:</span><span class="s1">.5f</span><span class="si">}</span><span class="s1"> [took </span><span class="si">{</span><span class="n">took</span><span class="si">:</span><span class="s1">.3f</span><span class="si">}</span><span class="s1">s]'</span><span class="p">)</span>
|
||||
<span class="k">else</span><span class="p">:</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">_sout</span><span class="p">(</span><span class="sa">f</span><span class="s1">'error=</span><span class="si">{</span><span class="n">status</span><span class="si">}</span><span class="s1">'</span><span class="p">)</span>
|
||||
|
||||
<div class="viewcode-block" id="GridSearchQ.fit"><a class="viewcode-back" href="../../quapy.html#quapy.model_selection.GridSearchQ.fit">[docs]</a> <span class="k">def</span> <span class="nf">fit</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">training</span><span class="p">:</span> <span class="n">LabelledCollection</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">""" Learning routine. Fits methods with all combinations of hyperparameters and selects the one minimizing</span>
|
||||
<span class="sd"> the error metric.</span>
|
||||
|
||||
<span class="sd"> :param training: the training set on which to optimize the hyperparameters</span>
|
||||
<span class="sd"> :return: self</span>
|
||||
<span class="sd"> """</span>
|
||||
|
||||
<span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">refit</span> <span class="ow">and</span> <span class="ow">not</span> <span class="nb">isinstance</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">protocol</span><span class="p">,</span> <span class="n">OnLabelledCollectionProtocol</span><span class="p">):</span>
|
||||
<span class="k">raise</span> <span class="ne">RuntimeWarning</span><span class="p">(</span>
|
||||
<span class="sa">f</span><span class="s1">'"refit" was requested, but the protocol does not implement '</span>
|
||||
<span class="sa">f</span><span class="s1">'the </span><span class="si">{</span><span class="n">OnLabelledCollectionProtocol</span><span class="o">.</span><span class="vm">__name__</span><span class="si">}</span><span class="s1"> interface'</span>
|
||||
<span class="p">)</span>
|
||||
|
||||
<span class="n">tinit</span> <span class="o">=</span> <span class="n">time</span><span class="p">()</span>
|
||||
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">error_collector</span> <span class="o">=</span> <span class="p">[]</span>
|
||||
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">_sout</span><span class="p">(</span><span class="sa">f</span><span class="s1">'starting model selection with n_jobs=</span><span class="si">{</span><span class="bp">self</span><span class="o">.</span><span class="n">n_jobs</span><span class="si">}</span><span class="s1">'</span><span class="p">)</span>
|
||||
<span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">_break_down_fit</span><span class="p">():</span>
|
||||
<span class="n">results</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_compute_scores_aggregative</span><span class="p">(</span><span class="n">training</span><span class="p">)</span>
|
||||
<span class="k">else</span><span class="p">:</span>
|
||||
<span class="n">results</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_compute_scores_nonaggregative</span><span class="p">(</span><span class="n">training</span><span class="p">)</span>
|
||||
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">param_scores_</span> <span class="o">=</span> <span class="p">{}</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">best_score_</span> <span class="o">=</span> <span class="kc">None</span>
|
||||
<span class="k">for</span> <span class="n">model</span><span class="p">,</span> <span class="n">params</span><span class="p">,</span> <span class="n">score</span><span class="p">,</span> <span class="n">status</span><span class="p">,</span> <span class="n">took</span> <span class="ow">in</span> <span class="n">results</span><span class="p">:</span>
|
||||
<span class="k">if</span> <span class="n">status</span><span class="o">.</span><span class="n">success</span><span class="p">():</span>
|
||||
<span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">best_score_</span> <span class="ow">is</span> <span class="kc">None</span> <span class="ow">or</span> <span class="n">score</span> <span class="o"><</span> <span class="bp">self</span><span class="o">.</span><span class="n">best_score_</span><span class="p">:</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">best_score_</span> <span class="o">=</span> <span class="n">score</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">best_params_</span> <span class="o">=</span> <span class="n">params</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">best_model_</span> <span class="o">=</span> <span class="n">model</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">param_scores_</span><span class="p">[</span><span class="nb">str</span><span class="p">(</span><span class="n">params</span><span class="p">)]</span> <span class="o">=</span> <span class="n">score</span>
|
||||
<span class="k">else</span><span class="p">:</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">param_scores_</span><span class="p">[</span><span class="nb">str</span><span class="p">(</span><span class="n">params</span><span class="p">)]</span> <span class="o">=</span> <span class="n">status</span><span class="o">.</span><span class="n">status</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">error_collector</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">status</span><span class="p">)</span>
|
||||
|
||||
<span class="n">tend</span> <span class="o">=</span> <span class="n">time</span><span class="p">()</span><span class="o">-</span><span class="n">tinit</span>
|
||||
|
||||
<span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">best_score_</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">:</span>
|
||||
<span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span><span class="s1">'no combination of hyperparameters seemed to work'</span><span class="p">)</span>
|
||||
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">_sout</span><span class="p">(</span><span class="sa">f</span><span class="s1">'optimization finished: best params </span><span class="si">{</span><span class="bp">self</span><span class="o">.</span><span class="n">best_params_</span><span class="si">}</span><span class="s1"> (score=</span><span class="si">{</span><span class="bp">self</span><span class="o">.</span><span class="n">best_score_</span><span class="si">:</span><span class="s1">.5f</span><span class="si">}</span><span class="s1">) '</span>
|
||||
<span class="sa">f</span><span class="s1">'[took </span><span class="si">{</span><span class="n">tend</span><span class="si">:</span><span class="s1">.4f</span><span class="si">}</span><span class="s1">s]'</span><span class="p">)</span>
|
||||
|
||||
<span class="n">no_errors</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">error_collector</span><span class="p">)</span>
|
||||
<span class="k">if</span> <span class="n">no_errors</span><span class="o">></span><span class="mi">0</span><span class="p">:</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">_sout</span><span class="p">(</span><span class="sa">f</span><span class="s1">'warning: </span><span class="si">{</span><span class="n">no_errors</span><span class="si">}</span><span class="s1"> errors found'</span><span class="p">)</span>
|
||||
<span class="k">for</span> <span class="n">err</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">error_collector</span><span class="p">:</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">_sout</span><span class="p">(</span><span class="sa">f</span><span class="s1">'</span><span class="se">\t</span><span class="si">{</span><span class="nb">str</span><span class="p">(</span><span class="n">err</span><span class="p">)</span><span class="si">}</span><span class="s1">'</span><span class="p">)</span>
|
||||
|
||||
<span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">refit</span><span class="p">:</span>
|
||||
<span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">protocol</span><span class="p">,</span> <span class="n">OnLabelledCollectionProtocol</span><span class="p">):</span>
|
||||
<span class="n">tinit</span> <span class="o">=</span> <span class="n">time</span><span class="p">()</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">_sout</span><span class="p">(</span><span class="sa">f</span><span class="s1">'refitting on the whole development set'</span><span class="p">)</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">best_model_</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">training</span> <span class="o">+</span> <span class="bp">self</span><span class="o">.</span><span class="n">protocol</span><span class="o">.</span><span class="n">get_labelled_collection</span><span class="p">())</span>
|
||||
<span class="n">tend</span> <span class="o">=</span> <span class="n">time</span><span class="p">()</span> <span class="o">-</span> <span class="n">tinit</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">refit_time_</span> <span class="o">=</span> <span class="n">tend</span>
|
||||
<span class="k">else</span><span class="p">:</span>
|
||||
<span class="c1"># already checked</span>
|
||||
<span class="k">raise</span> <span class="ne">RuntimeWarning</span><span class="p">(</span><span class="sa">f</span><span class="s1">'the model cannot be refit on the whole dataset'</span><span class="p">)</span>
|
||||
|
||||
<span class="k">return</span> <span class="bp">self</span></div>
|
||||
|
||||
<div class="viewcode-block" id="GridSearchQ.quantify"><a class="viewcode-back" href="../../quapy.html#quapy.model_selection.GridSearchQ.quantify">[docs]</a> <span class="k">def</span> <span class="nf">quantify</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">instances</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""Estimate class prevalence values using the best model found after calling the :meth:`fit` method.</span>
|
||||
|
||||
<span class="sd"> :param instances: sample contanining the instances</span>
|
||||
<span class="sd"> :return: a ndarray of shape `(n_classes)` with class prevalence estimates as according to the best model found</span>
|
||||
<span class="sd"> by the model selection process.</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">assert</span> <span class="nb">hasattr</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="s1">'best_model_'</span><span class="p">),</span> <span class="s1">'quantify called before fit'</span>
|
||||
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">best_model</span><span class="p">()</span><span class="o">.</span><span class="n">quantify</span><span class="p">(</span><span class="n">instances</span><span class="p">)</span></div>
|
||||
|
||||
<div class="viewcode-block" id="GridSearchQ.set_params"><a class="viewcode-back" href="../../quapy.html#quapy.model_selection.GridSearchQ.set_params">[docs]</a> <span class="k">def</span> <span class="nf">set_params</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="o">**</span><span class="n">parameters</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""Sets the hyper-parameters to explore.</span>
|
||||
|
||||
<span class="sd"> :param parameters: a dictionary with keys the parameter names and values the list of values to explore</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">param_grid</span> <span class="o">=</span> <span class="n">parameters</span></div>
|
||||
|
||||
<div class="viewcode-block" id="GridSearchQ.get_params"><a class="viewcode-back" href="../../quapy.html#quapy.model_selection.GridSearchQ.get_params">[docs]</a> <span class="k">def</span> <span class="nf">get_params</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">deep</span><span class="o">=</span><span class="kc">True</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""Returns the dictionary of hyper-parameters to explore (`param_grid`)</span>
|
||||
|
||||
<span class="sd"> :param deep: Unused</span>
|
||||
<span class="sd"> :return: the dictionary `param_grid`</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">param_grid</span></div>
|
||||
|
||||
<div class="viewcode-block" id="GridSearchQ.best_model"><a class="viewcode-back" href="../../quapy.html#quapy.model_selection.GridSearchQ.best_model">[docs]</a> <span class="k">def</span> <span class="nf">best_model</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Returns the best model found after calling the :meth:`fit` method, i.e., the one trained on the combination</span>
|
||||
<span class="sd"> of hyper-parameters that minimized the error function.</span>
|
||||
|
||||
<span class="sd"> :return: a trained quantifier</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">if</span> <span class="nb">hasattr</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="s1">'best_model_'</span><span class="p">):</span>
|
||||
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">best_model_</span>
|
||||
<span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span><span class="s1">'best_model called before fit'</span><span class="p">)</span></div>
|
||||
|
||||
<span class="k">def</span> <span class="nf">_error_handler</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">func</span><span class="p">,</span> <span class="n">params</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Endorses one job with two returned values: the status, and the time of execution</span>
|
||||
|
||||
<span class="sd"> :param func: the function to be called</span>
|
||||
<span class="sd"> :param params: parameters of the function</span>
|
||||
<span class="sd"> :return: `tuple(out, status, time)` where `out` is the function output,</span>
|
||||
<span class="sd"> `status` is an enum value from `Status`, and `time` is the time it</span>
|
||||
<span class="sd"> took to complete the call</span>
|
||||
<span class="sd"> """</span>
|
||||
|
||||
<span class="n">output</span> <span class="o">=</span> <span class="kc">None</span>
|
||||
|
||||
<span class="k">def</span> <span class="nf">_handle</span><span class="p">(</span><span class="n">status</span><span class="p">,</span> <span class="n">exception</span><span class="p">):</span>
|
||||
<span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">raise_errors</span><span class="p">:</span>
|
||||
<span class="k">raise</span> <span class="n">exception</span>
|
||||
<span class="k">else</span><span class="p">:</span>
|
||||
<span class="k">return</span> <span class="n">ConfigStatus</span><span class="p">(</span><span class="n">params</span><span class="p">,</span> <span class="n">status</span><span class="p">)</span>
|
||||
|
||||
<span class="k">try</span><span class="p">:</span>
|
||||
<span class="k">with</span> <span class="n">timeout</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">timeout</span><span class="p">):</span>
|
||||
<span class="n">tinit</span> <span class="o">=</span> <span class="n">time</span><span class="p">()</span>
|
||||
<span class="n">output</span> <span class="o">=</span> <span class="n">func</span><span class="p">(</span><span class="n">params</span><span class="p">)</span>
|
||||
<span class="n">status</span> <span class="o">=</span> <span class="n">ConfigStatus</span><span class="p">(</span><span class="n">params</span><span class="p">,</span> <span class="n">Status</span><span class="o">.</span><span class="n">SUCCESS</span><span class="p">)</span>
|
||||
|
||||
<span class="k">except</span> <span class="ne">TimeoutError</span> <span class="k">as</span> <span class="n">e</span><span class="p">:</span>
|
||||
<span class="n">status</span> <span class="o">=</span> <span class="n">_handle</span><span class="p">(</span><span class="n">Status</span><span class="o">.</span><span class="n">TIMEOUT</span><span class="p">,</span> <span class="n">e</span><span class="p">)</span>
|
||||
|
||||
<span class="k">except</span> <span class="ne">ValueError</span> <span class="k">as</span> <span class="n">e</span><span class="p">:</span>
|
||||
<span class="n">status</span> <span class="o">=</span> <span class="n">_handle</span><span class="p">(</span><span class="n">Status</span><span class="o">.</span><span class="n">INVALID</span><span class="p">,</span> <span class="n">e</span><span class="p">)</span>
|
||||
|
||||
<span class="k">except</span> <span class="ne">Exception</span> <span class="k">as</span> <span class="n">e</span><span class="p">:</span>
|
||||
<span class="n">status</span> <span class="o">=</span> <span class="n">_handle</span><span class="p">(</span><span class="n">Status</span><span class="o">.</span><span class="n">ERROR</span><span class="p">,</span> <span class="n">e</span><span class="p">)</span>
|
||||
|
||||
<span class="n">took</span> <span class="o">=</span> <span class="n">time</span><span class="p">()</span> <span class="o">-</span> <span class="n">tinit</span>
|
||||
<span class="k">return</span> <span class="n">output</span><span class="p">,</span> <span class="n">status</span><span class="p">,</span> <span class="n">took</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="cross_val_predict"><a class="viewcode-back" href="../../quapy.html#quapy.model_selection.cross_val_predict">[docs]</a><span class="k">def</span> <span class="nf">cross_val_predict</span><span class="p">(</span><span class="n">quantifier</span><span class="p">:</span> <span class="n">BaseQuantifier</span><span class="p">,</span> <span class="n">data</span><span class="p">:</span> <span class="n">LabelledCollection</span><span class="p">,</span> <span class="n">nfolds</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span> <span class="n">random_state</span><span class="o">=</span><span class="mi">0</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Akin to `scikit-learn's cross_val_predict <https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.cross_val_predict.html>`_</span>
|
||||
<span class="sd"> but for quantification.</span>
|
||||
|
||||
<span class="sd"> :param quantifier: a quantifier issuing class prevalence values</span>
|
||||
<span class="sd"> :param data: a labelled collection</span>
|
||||
<span class="sd"> :param nfolds: number of folds for k-fold cross validation generation</span>
|
||||
<span class="sd"> :param random_state: random seed for reproducibility</span>
|
||||
<span class="sd"> :return: a vector of class prevalence values</span>
|
||||
<span class="sd"> """</span>
|
||||
|
||||
<span class="n">total_prev</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">zeros</span><span class="p">(</span><span class="n">shape</span><span class="o">=</span><span class="n">data</span><span class="o">.</span><span class="n">n_classes</span><span class="p">)</span>
|
||||
|
||||
<span class="k">for</span> <span class="n">train</span><span class="p">,</span> <span class="n">test</span> <span class="ow">in</span> <span class="n">data</span><span class="o">.</span><span class="n">kFCV</span><span class="p">(</span><span class="n">nfolds</span><span class="o">=</span><span class="n">nfolds</span><span class="p">,</span> <span class="n">random_state</span><span class="o">=</span><span class="n">random_state</span><span class="p">):</span>
|
||||
<span class="n">quantifier</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">train</span><span class="p">)</span>
|
||||
<span class="n">fold_prev</span> <span class="o">=</span> <span class="n">quantifier</span><span class="o">.</span><span class="n">quantify</span><span class="p">(</span><span class="n">test</span><span class="o">.</span><span class="n">X</span><span class="p">)</span>
|
||||
<span class="n">rel_size</span> <span class="o">=</span> <span class="mf">1.</span> <span class="o">*</span> <span class="nb">len</span><span class="p">(</span><span class="n">test</span><span class="p">)</span> <span class="o">/</span> <span class="nb">len</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
|
||||
<span class="n">total_prev</span> <span class="o">+=</span> <span class="n">fold_prev</span><span class="o">*</span><span class="n">rel_size</span>
|
||||
|
||||
<span class="k">return</span> <span class="n">total_prev</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="expand_grid"><a class="viewcode-back" href="../../quapy.html#quapy.model_selection.expand_grid">[docs]</a><span class="k">def</span> <span class="nf">expand_grid</span><span class="p">(</span><span class="n">param_grid</span><span class="p">:</span> <span class="nb">dict</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Expands a param_grid dictionary as a list of configurations.</span>
|
||||
<span class="sd"> Example:</span>
|
||||
|
||||
<span class="sd"> >>> combinations = expand_grid({'A': [1, 10, 100], 'B': [True, False]})</span>
|
||||
<span class="sd"> >>> print(combinations)</span>
|
||||
<span class="sd"> >>> [{'A': 1, 'B': True}, {'A': 1, 'B': False}, {'A': 10, 'B': True}, {'A': 10, 'B': False}, {'A': 100, 'B': True}, {'A': 100, 'B': False}]</span>
|
||||
|
||||
<span class="sd"> :param param_grid: dictionary with keys representing hyper-parameter names, and values representing the range</span>
|
||||
<span class="sd"> to explore for that hyper-parameter</span>
|
||||
<span class="sd"> :return: a list of configurations, i.e., combinations of hyper-parameter assignments in the grid.</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="n">params_keys</span> <span class="o">=</span> <span class="nb">list</span><span class="p">(</span><span class="n">param_grid</span><span class="o">.</span><span class="n">keys</span><span class="p">())</span>
|
||||
<span class="n">params_values</span> <span class="o">=</span> <span class="nb">list</span><span class="p">(</span><span class="n">param_grid</span><span class="o">.</span><span class="n">values</span><span class="p">())</span>
|
||||
<span class="n">configs</span> <span class="o">=</span> <span class="p">[{</span><span class="n">k</span><span class="p">:</span> <span class="n">combs</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">k</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">params_keys</span><span class="p">)}</span> <span class="k">for</span> <span class="n">combs</span> <span class="ow">in</span> <span class="n">itertools</span><span class="o">.</span><span class="n">product</span><span class="p">(</span><span class="o">*</span><span class="n">params_values</span><span class="p">)]</span>
|
||||
<span class="k">return</span> <span class="n">configs</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="group_params"><a class="viewcode-back" href="../../quapy.html#quapy.model_selection.group_params">[docs]</a><span class="k">def</span> <span class="nf">group_params</span><span class="p">(</span><span class="n">param_grid</span><span class="p">:</span> <span class="nb">dict</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Partitions a param_grid dictionary as two lists of configurations, one for the classifier-specific</span>
|
||||
<span class="sd"> hyper-parameters, and another for que quantifier-specific hyper-parameters</span>
|
||||
|
||||
<span class="sd"> :param param_grid: dictionary with keys representing hyper-parameter names, and values representing the range</span>
|
||||
<span class="sd"> to explore for that hyper-parameter</span>
|
||||
<span class="sd"> :return: two expanded grids of configurations, one for the classifier, another for the quantifier</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="n">classifier_params</span><span class="p">,</span> <span class="n">quantifier_params</span> <span class="o">=</span> <span class="p">{},</span> <span class="p">{}</span>
|
||||
<span class="k">for</span> <span class="n">key</span><span class="p">,</span> <span class="n">values</span> <span class="ow">in</span> <span class="n">param_grid</span><span class="o">.</span><span class="n">items</span><span class="p">():</span>
|
||||
<span class="k">if</span> <span class="n">key</span><span class="o">.</span><span class="n">startswith</span><span class="p">(</span><span class="s1">'classifier__'</span><span class="p">)</span> <span class="ow">or</span> <span class="n">key</span> <span class="o">==</span> <span class="s1">'val_split'</span><span class="p">:</span>
|
||||
<span class="n">classifier_params</span><span class="p">[</span><span class="n">key</span><span class="p">]</span> <span class="o">=</span> <span class="n">values</span>
|
||||
<span class="k">else</span><span class="p">:</span>
|
||||
<span class="n">quantifier_params</span><span class="p">[</span><span class="n">key</span><span class="p">]</span> <span class="o">=</span> <span class="n">values</span>
|
||||
|
||||
<span class="n">classifier_configs</span> <span class="o">=</span> <span class="n">expand_grid</span><span class="p">(</span><span class="n">classifier_params</span><span class="p">)</span>
|
||||
<span class="n">quantifier_configs</span> <span class="o">=</span> <span class="n">expand_grid</span><span class="p">(</span><span class="n">quantifier_params</span><span class="p">)</span>
|
||||
|
||||
<span class="k">return</span> <span class="n">classifier_configs</span><span class="p">,</span> <span class="n">quantifier_configs</span></div>
|
||||
|
||||
</pre></div>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
<footer>
|
||||
|
||||
<hr/>
|
||||
|
||||
<div role="contentinfo">
|
||||
<p>© Copyright 2024, Alejandro Moreo.</p>
|
||||
</div>
|
||||
|
||||
Built with <a href="https://www.sphinx-doc.org/">Sphinx</a> using a
|
||||
<a href="https://github.com/readthedocs/sphinx_rtd_theme">theme</a>
|
||||
provided by <a href="https://readthedocs.org">Read the Docs</a>.
|
||||
|
||||
|
||||
</footer>
|
||||
</div>
|
||||
</div>
|
||||
</section>
|
||||
</div>
|
||||
<script>
|
||||
jQuery(function () {
|
||||
SphinxRtdTheme.Navigation.enable(true);
|
||||
});
|
||||
</script>
|
||||
|
||||
</body>
|
||||
</html>
|
||||
|
|
@ -0,0 +1,687 @@
|
|||
<!DOCTYPE html>
|
||||
<html class="writer-html5" lang="en" data-content_root="../../">
|
||||
<head>
|
||||
<meta charset="utf-8" />
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
|
||||
<title>quapy.plot — QuaPy: A Python-based open-source framework for quantification 0.1.8 documentation</title>
|
||||
<link rel="stylesheet" type="text/css" href="../../_static/pygments.css?v=92fd9be5" />
|
||||
<link rel="stylesheet" type="text/css" href="../../_static/css/theme.css?v=19f00094" />
|
||||
|
||||
|
||||
<!--[if lt IE 9]>
|
||||
<script src="../../_static/js/html5shiv.min.js"></script>
|
||||
<![endif]-->
|
||||
|
||||
<script src="../../_static/jquery.js?v=5d32c60e"></script>
|
||||
<script src="../../_static/_sphinx_javascript_frameworks_compat.js?v=2cd50e6c"></script>
|
||||
<script src="../../_static/documentation_options.js?v=22607128"></script>
|
||||
<script src="../../_static/doctools.js?v=9a2dae69"></script>
|
||||
<script src="../../_static/sphinx_highlight.js?v=dc90522c"></script>
|
||||
<script src="../../_static/js/theme.js"></script>
|
||||
<link rel="index" title="Index" href="../../genindex.html" />
|
||||
<link rel="search" title="Search" href="../../search.html" />
|
||||
</head>
|
||||
|
||||
<body class="wy-body-for-nav">
|
||||
<div class="wy-grid-for-nav">
|
||||
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
|
||||
<div class="wy-side-scroll">
|
||||
<div class="wy-side-nav-search" >
|
||||
|
||||
|
||||
|
||||
<a href="../../index.html" class="icon icon-home">
|
||||
QuaPy: A Python-based open-source framework for quantification
|
||||
</a>
|
||||
<div role="search">
|
||||
<form id="rtd-search-form" class="wy-form" action="../../search.html" method="get">
|
||||
<input type="text" name="q" placeholder="Search docs" aria-label="Search docs" />
|
||||
<input type="hidden" name="check_keywords" value="yes" />
|
||||
<input type="hidden" name="area" value="default" />
|
||||
</form>
|
||||
</div>
|
||||
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../../modules.html">quapy</a></li>
|
||||
</ul>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
</nav>
|
||||
|
||||
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" >
|
||||
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
|
||||
<a href="../../index.html">QuaPy: A Python-based open-source framework for quantification</a>
|
||||
</nav>
|
||||
|
||||
<div class="wy-nav-content">
|
||||
<div class="rst-content">
|
||||
<div role="navigation" aria-label="Page navigation">
|
||||
<ul class="wy-breadcrumbs">
|
||||
<li><a href="../../index.html" class="icon icon-home" aria-label="Home"></a></li>
|
||||
<li class="breadcrumb-item"><a href="../index.html">Module code</a></li>
|
||||
<li class="breadcrumb-item active">quapy.plot</li>
|
||||
<li class="wy-breadcrumbs-aside">
|
||||
</li>
|
||||
</ul>
|
||||
<hr/>
|
||||
</div>
|
||||
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
|
||||
<div itemprop="articleBody">
|
||||
|
||||
<h1>Source code for quapy.plot</h1><div class="highlight"><pre>
|
||||
<span></span><span class="kn">from</span> <span class="nn">collections</span> <span class="kn">import</span> <span class="n">defaultdict</span>
|
||||
<span class="kn">import</span> <span class="nn">matplotlib.pyplot</span> <span class="k">as</span> <span class="nn">plt</span>
|
||||
<span class="kn">from</span> <span class="nn">matplotlib.cm</span> <span class="kn">import</span> <span class="n">get_cmap</span>
|
||||
<span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span>
|
||||
<span class="kn">from</span> <span class="nn">matplotlib</span> <span class="kn">import</span> <span class="n">cm</span>
|
||||
<span class="kn">from</span> <span class="nn">scipy.stats</span> <span class="kn">import</span> <span class="n">ttest_ind_from_stats</span>
|
||||
<span class="kn">from</span> <span class="nn">matplotlib.ticker</span> <span class="kn">import</span> <span class="n">ScalarFormatter</span>
|
||||
<span class="kn">import</span> <span class="nn">math</span>
|
||||
|
||||
<span class="kn">import</span> <span class="nn">quapy</span> <span class="k">as</span> <span class="nn">qp</span>
|
||||
|
||||
<span class="n">plt</span><span class="o">.</span><span class="n">rcParams</span><span class="p">[</span><span class="s1">'figure.figsize'</span><span class="p">]</span> <span class="o">=</span> <span class="p">[</span><span class="mi">10</span><span class="p">,</span> <span class="mi">6</span><span class="p">]</span>
|
||||
<span class="n">plt</span><span class="o">.</span><span class="n">rcParams</span><span class="p">[</span><span class="s1">'figure.dpi'</span><span class="p">]</span> <span class="o">=</span> <span class="mi">200</span>
|
||||
<span class="n">plt</span><span class="o">.</span><span class="n">rcParams</span><span class="p">[</span><span class="s1">'font.size'</span><span class="p">]</span> <span class="o">=</span> <span class="mi">18</span>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="binary_diagonal">
|
||||
<a class="viewcode-back" href="../../quapy.html#quapy.plot.binary_diagonal">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">binary_diagonal</span><span class="p">(</span><span class="n">method_names</span><span class="p">,</span> <span class="n">true_prevs</span><span class="p">,</span> <span class="n">estim_prevs</span><span class="p">,</span> <span class="n">pos_class</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">title</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">show_std</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">legend</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span>
|
||||
<span class="n">train_prev</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">savepath</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">method_order</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> The diagonal plot displays the predicted prevalence values (along the y-axis) as a function of the true prevalence</span>
|
||||
<span class="sd"> values (along the x-axis). The optimal quantifier is described by the diagonal (0,0)-(1,1) of the plot (hence the</span>
|
||||
<span class="sd"> name). It is convenient for binary quantification problems, though it can be used for multiclass problems by</span>
|
||||
<span class="sd"> indicating which class is to be taken as the positive class. (For multiclass quantification problems, other plots</span>
|
||||
<span class="sd"> like the :meth:`error_by_drift` might be preferable though).</span>
|
||||
|
||||
<span class="sd"> :param method_names: array-like with the method names for each experiment</span>
|
||||
<span class="sd"> :param true_prevs: array-like with the true prevalence values (each being a ndarray with n_classes components) for</span>
|
||||
<span class="sd"> each experiment</span>
|
||||
<span class="sd"> :param estim_prevs: array-like with the estimated prevalence values (each being a ndarray with n_classes components)</span>
|
||||
<span class="sd"> for each experiment</span>
|
||||
<span class="sd"> :param pos_class: index of the positive class</span>
|
||||
<span class="sd"> :param title: the title to be displayed in the plot</span>
|
||||
<span class="sd"> :param show_std: whether or not to show standard deviations (represented by color bands). This might be inconvenient</span>
|
||||
<span class="sd"> for cases in which many methods are compared, or when the standard deviations are high -- default True)</span>
|
||||
<span class="sd"> :param legend: whether or not to display the leyend (default True)</span>
|
||||
<span class="sd"> :param train_prev: if indicated (default is None), the training prevalence (for the positive class) is hightlighted</span>
|
||||
<span class="sd"> in the plot. This is convenient when all the experiments have been conducted in the same dataset.</span>
|
||||
<span class="sd"> :param savepath: path where to save the plot. If not indicated (as default), the plot is shown.</span>
|
||||
<span class="sd"> :param method_order: if indicated (default is None), imposes the order in which the methods are processed (i.e.,</span>
|
||||
<span class="sd"> listed in the legend and associated with matplotlib colors).</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="n">fig</span><span class="p">,</span> <span class="n">ax</span> <span class="o">=</span> <span class="n">plt</span><span class="o">.</span><span class="n">subplots</span><span class="p">()</span>
|
||||
<span class="n">ax</span><span class="o">.</span><span class="n">set_aspect</span><span class="p">(</span><span class="s1">'equal'</span><span class="p">)</span>
|
||||
<span class="n">ax</span><span class="o">.</span><span class="n">grid</span><span class="p">()</span>
|
||||
<span class="n">ax</span><span class="o">.</span><span class="n">plot</span><span class="p">([</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">],</span> <span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">],</span> <span class="s1">'--k'</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="s1">'ideal'</span><span class="p">,</span> <span class="n">zorder</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
|
||||
|
||||
<span class="n">method_names</span><span class="p">,</span> <span class="n">true_prevs</span><span class="p">,</span> <span class="n">estim_prevs</span> <span class="o">=</span> <span class="n">_merge</span><span class="p">(</span><span class="n">method_names</span><span class="p">,</span> <span class="n">true_prevs</span><span class="p">,</span> <span class="n">estim_prevs</span><span class="p">)</span>
|
||||
|
||||
<span class="n">order</span> <span class="o">=</span> <span class="nb">list</span><span class="p">(</span><span class="nb">zip</span><span class="p">(</span><span class="n">method_names</span><span class="p">,</span> <span class="n">true_prevs</span><span class="p">,</span> <span class="n">estim_prevs</span><span class="p">))</span>
|
||||
<span class="k">if</span> <span class="n">method_order</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span><span class="p">:</span>
|
||||
<span class="n">table</span> <span class="o">=</span> <span class="p">{</span><span class="n">method_name</span><span class="p">:[</span><span class="n">true_prev</span><span class="p">,</span> <span class="n">estim_prev</span><span class="p">]</span> <span class="k">for</span> <span class="n">method_name</span><span class="p">,</span> <span class="n">true_prev</span><span class="p">,</span> <span class="n">estim_prev</span> <span class="ow">in</span> <span class="n">order</span><span class="p">}</span>
|
||||
<span class="n">order</span> <span class="o">=</span> <span class="p">[(</span><span class="n">method_name</span><span class="p">,</span> <span class="o">*</span><span class="n">table</span><span class="p">[</span><span class="n">method_name</span><span class="p">])</span> <span class="k">for</span> <span class="n">method_name</span> <span class="ow">in</span> <span class="n">method_order</span><span class="p">]</span>
|
||||
|
||||
<span class="n">NUM_COLORS</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">method_names</span><span class="p">)</span>
|
||||
<span class="k">if</span> <span class="n">NUM_COLORS</span><span class="o">></span><span class="mi">10</span><span class="p">:</span>
|
||||
<span class="n">cm</span> <span class="o">=</span> <span class="n">plt</span><span class="o">.</span><span class="n">get_cmap</span><span class="p">(</span><span class="s1">'tab20'</span><span class="p">)</span>
|
||||
<span class="n">ax</span><span class="o">.</span><span class="n">set_prop_cycle</span><span class="p">(</span><span class="n">color</span><span class="o">=</span><span class="p">[</span><span class="n">cm</span><span class="p">(</span><span class="mf">1.</span> <span class="o">*</span> <span class="n">i</span> <span class="o">/</span> <span class="n">NUM_COLORS</span><span class="p">)</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">NUM_COLORS</span><span class="p">)])</span>
|
||||
<span class="k">for</span> <span class="n">method</span><span class="p">,</span> <span class="n">true_prev</span><span class="p">,</span> <span class="n">estim_prev</span> <span class="ow">in</span> <span class="n">order</span><span class="p">:</span>
|
||||
<span class="n">true_prev</span> <span class="o">=</span> <span class="n">true_prev</span><span class="p">[:,</span><span class="n">pos_class</span><span class="p">]</span>
|
||||
<span class="n">estim_prev</span> <span class="o">=</span> <span class="n">estim_prev</span><span class="p">[:,</span><span class="n">pos_class</span><span class="p">]</span>
|
||||
|
||||
<span class="n">x_ticks</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">unique</span><span class="p">(</span><span class="n">true_prev</span><span class="p">)</span>
|
||||
<span class="n">x_ticks</span><span class="o">.</span><span class="n">sort</span><span class="p">()</span>
|
||||
<span class="n">y_ave</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">asarray</span><span class="p">([</span><span class="n">estim_prev</span><span class="p">[</span><span class="n">true_prev</span> <span class="o">==</span> <span class="n">x</span><span class="p">]</span><span class="o">.</span><span class="n">mean</span><span class="p">()</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">x_ticks</span><span class="p">])</span>
|
||||
<span class="n">y_std</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">asarray</span><span class="p">([</span><span class="n">estim_prev</span><span class="p">[</span><span class="n">true_prev</span> <span class="o">==</span> <span class="n">x</span><span class="p">]</span><span class="o">.</span><span class="n">std</span><span class="p">()</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">x_ticks</span><span class="p">])</span>
|
||||
|
||||
<span class="n">ax</span><span class="o">.</span><span class="n">errorbar</span><span class="p">(</span><span class="n">x_ticks</span><span class="p">,</span> <span class="n">y_ave</span><span class="p">,</span> <span class="n">fmt</span><span class="o">=</span><span class="s1">'-'</span><span class="p">,</span> <span class="n">marker</span><span class="o">=</span><span class="s1">'o'</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="n">method</span><span class="p">,</span> <span class="n">markersize</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span> <span class="n">zorder</span><span class="o">=</span><span class="mi">2</span><span class="p">)</span>
|
||||
<span class="k">if</span> <span class="n">show_std</span><span class="p">:</span>
|
||||
<span class="n">ax</span><span class="o">.</span><span class="n">fill_between</span><span class="p">(</span><span class="n">x_ticks</span><span class="p">,</span> <span class="n">y_ave</span> <span class="o">-</span> <span class="n">y_std</span><span class="p">,</span> <span class="n">y_ave</span> <span class="o">+</span> <span class="n">y_std</span><span class="p">,</span> <span class="n">alpha</span><span class="o">=</span><span class="mf">0.25</span><span class="p">)</span>
|
||||
|
||||
<span class="k">if</span> <span class="n">train_prev</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span><span class="p">:</span>
|
||||
<span class="n">train_prev</span> <span class="o">=</span> <span class="n">train_prev</span><span class="p">[</span><span class="n">pos_class</span><span class="p">]</span>
|
||||
<span class="n">ax</span><span class="o">.</span><span class="n">scatter</span><span class="p">(</span><span class="n">train_prev</span><span class="p">,</span> <span class="n">train_prev</span><span class="p">,</span> <span class="n">c</span><span class="o">=</span><span class="s1">'c'</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="s1">'tr-prev'</span><span class="p">,</span> <span class="n">linewidth</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">edgecolor</span><span class="o">=</span><span class="s1">'k'</span><span class="p">,</span> <span class="n">s</span><span class="o">=</span><span class="mi">100</span><span class="p">,</span> <span class="n">zorder</span><span class="o">=</span><span class="mi">3</span><span class="p">)</span>
|
||||
|
||||
<span class="n">ax</span><span class="o">.</span><span class="n">set</span><span class="p">(</span><span class="n">xlabel</span><span class="o">=</span><span class="s1">'true prevalence'</span><span class="p">,</span> <span class="n">ylabel</span><span class="o">=</span><span class="s1">'estimated prevalence'</span><span class="p">,</span> <span class="n">title</span><span class="o">=</span><span class="n">title</span><span class="p">)</span>
|
||||
<span class="n">ax</span><span class="o">.</span><span class="n">set_ylim</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
|
||||
<span class="n">ax</span><span class="o">.</span><span class="n">set_xlim</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
|
||||
|
||||
<span class="k">if</span> <span class="n">legend</span><span class="p">:</span>
|
||||
<span class="n">ax</span><span class="o">.</span><span class="n">legend</span><span class="p">(</span><span class="n">loc</span><span class="o">=</span><span class="s1">'center left'</span><span class="p">,</span> <span class="n">bbox_to_anchor</span><span class="o">=</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mf">0.5</span><span class="p">))</span>
|
||||
<span class="c1"># box = ax.get_position()</span>
|
||||
<span class="c1"># ax.set_position([box.x0, box.y0, box.width * 0.8, box.height])</span>
|
||||
<span class="c1"># ax.legend(loc='lower center',</span>
|
||||
<span class="c1"># bbox_to_anchor=(1, -0.5),</span>
|
||||
<span class="c1"># ncol=(len(method_names)+1)//2)</span>
|
||||
|
||||
<span class="n">_save_or_show</span><span class="p">(</span><span class="n">savepath</span><span class="p">)</span></div>
|
||||
|
||||
|
||||
|
||||
<div class="viewcode-block" id="binary_bias_global">
|
||||
<a class="viewcode-back" href="../../quapy.html#quapy.plot.binary_bias_global">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">binary_bias_global</span><span class="p">(</span><span class="n">method_names</span><span class="p">,</span> <span class="n">true_prevs</span><span class="p">,</span> <span class="n">estim_prevs</span><span class="p">,</span> <span class="n">pos_class</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">title</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">savepath</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Box-plots displaying the global bias (i.e., signed error computed as the estimated value minus the true value)</span>
|
||||
<span class="sd"> for each quantification method with respect to a given positive class.</span>
|
||||
|
||||
<span class="sd"> :param method_names: array-like with the method names for each experiment</span>
|
||||
<span class="sd"> :param true_prevs: array-like with the true prevalence values (each being a ndarray with n_classes components) for</span>
|
||||
<span class="sd"> each experiment</span>
|
||||
<span class="sd"> :param estim_prevs: array-like with the estimated prevalence values (each being a ndarray with n_classes components)</span>
|
||||
<span class="sd"> for each experiment</span>
|
||||
<span class="sd"> :param pos_class: index of the positive class</span>
|
||||
<span class="sd"> :param title: the title to be displayed in the plot</span>
|
||||
<span class="sd"> :param savepath: path where to save the plot. If not indicated (as default), the plot is shown.</span>
|
||||
<span class="sd"> """</span>
|
||||
|
||||
<span class="n">method_names</span><span class="p">,</span> <span class="n">true_prevs</span><span class="p">,</span> <span class="n">estim_prevs</span> <span class="o">=</span> <span class="n">_merge</span><span class="p">(</span><span class="n">method_names</span><span class="p">,</span> <span class="n">true_prevs</span><span class="p">,</span> <span class="n">estim_prevs</span><span class="p">)</span>
|
||||
|
||||
<span class="n">fig</span><span class="p">,</span> <span class="n">ax</span> <span class="o">=</span> <span class="n">plt</span><span class="o">.</span><span class="n">subplots</span><span class="p">()</span>
|
||||
<span class="n">ax</span><span class="o">.</span><span class="n">grid</span><span class="p">()</span>
|
||||
|
||||
<span class="n">data</span><span class="p">,</span> <span class="n">labels</span> <span class="o">=</span> <span class="p">[],</span> <span class="p">[]</span>
|
||||
<span class="k">for</span> <span class="n">method</span><span class="p">,</span> <span class="n">true_prev</span><span class="p">,</span> <span class="n">estim_prev</span> <span class="ow">in</span> <span class="nb">zip</span><span class="p">(</span><span class="n">method_names</span><span class="p">,</span> <span class="n">true_prevs</span><span class="p">,</span> <span class="n">estim_prevs</span><span class="p">):</span>
|
||||
<span class="n">true_prev</span> <span class="o">=</span> <span class="n">true_prev</span><span class="p">[:,</span><span class="n">pos_class</span><span class="p">]</span>
|
||||
<span class="n">estim_prev</span> <span class="o">=</span> <span class="n">estim_prev</span><span class="p">[:,</span><span class="n">pos_class</span><span class="p">]</span>
|
||||
<span class="n">data</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">estim_prev</span><span class="o">-</span><span class="n">true_prev</span><span class="p">)</span>
|
||||
<span class="n">labels</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">method</span><span class="p">)</span>
|
||||
|
||||
<span class="n">ax</span><span class="o">.</span><span class="n">boxplot</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="n">labels</span><span class="o">=</span><span class="n">labels</span><span class="p">,</span> <span class="n">patch_artist</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span> <span class="n">showmeans</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
|
||||
<span class="n">plt</span><span class="o">.</span><span class="n">xticks</span><span class="p">(</span><span class="n">rotation</span><span class="o">=</span><span class="mi">45</span><span class="p">)</span>
|
||||
<span class="n">ax</span><span class="o">.</span><span class="n">set</span><span class="p">(</span><span class="n">ylabel</span><span class="o">=</span><span class="s1">'error bias'</span><span class="p">,</span> <span class="n">title</span><span class="o">=</span><span class="n">title</span><span class="p">)</span>
|
||||
|
||||
<span class="n">_save_or_show</span><span class="p">(</span><span class="n">savepath</span><span class="p">)</span></div>
|
||||
|
||||
|
||||
|
||||
<div class="viewcode-block" id="binary_bias_bins">
|
||||
<a class="viewcode-back" href="../../quapy.html#quapy.plot.binary_bias_bins">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">binary_bias_bins</span><span class="p">(</span><span class="n">method_names</span><span class="p">,</span> <span class="n">true_prevs</span><span class="p">,</span> <span class="n">estim_prevs</span><span class="p">,</span> <span class="n">pos_class</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">title</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">nbins</span><span class="o">=</span><span class="mi">5</span><span class="p">,</span> <span class="n">colormap</span><span class="o">=</span><span class="n">cm</span><span class="o">.</span><span class="n">tab10</span><span class="p">,</span>
|
||||
<span class="n">vertical_xticks</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span> <span class="n">legend</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">savepath</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Box-plots displaying the local bias (i.e., signed error computed as the estimated value minus the true value)</span>
|
||||
<span class="sd"> for different bins of (true) prevalence of the positive classs, for each quantification method.</span>
|
||||
|
||||
<span class="sd"> :param method_names: array-like with the method names for each experiment</span>
|
||||
<span class="sd"> :param true_prevs: array-like with the true prevalence values (each being a ndarray with n_classes components) for</span>
|
||||
<span class="sd"> each experiment</span>
|
||||
<span class="sd"> :param estim_prevs: array-like with the estimated prevalence values (each being a ndarray with n_classes components)</span>
|
||||
<span class="sd"> for each experiment</span>
|
||||
<span class="sd"> :param pos_class: index of the positive class</span>
|
||||
<span class="sd"> :param title: the title to be displayed in the plot</span>
|
||||
<span class="sd"> :param nbins: number of bins</span>
|
||||
<span class="sd"> :param colormap: the matplotlib colormap to use (default cm.tab10)</span>
|
||||
<span class="sd"> :param vertical_xticks: whether or not to add secondary grid (default is False)</span>
|
||||
<span class="sd"> :param legend: whether or not to display the legend (default is True)</span>
|
||||
<span class="sd"> :param savepath: path where to save the plot. If not indicated (as default), the plot is shown.</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="kn">from</span> <span class="nn">pylab</span> <span class="kn">import</span> <span class="n">boxplot</span><span class="p">,</span> <span class="n">plot</span><span class="p">,</span> <span class="n">setp</span>
|
||||
|
||||
<span class="n">fig</span><span class="p">,</span> <span class="n">ax</span> <span class="o">=</span> <span class="n">plt</span><span class="o">.</span><span class="n">subplots</span><span class="p">()</span>
|
||||
<span class="n">ax</span><span class="o">.</span><span class="n">grid</span><span class="p">()</span>
|
||||
|
||||
<span class="n">method_names</span><span class="p">,</span> <span class="n">true_prevs</span><span class="p">,</span> <span class="n">estim_prevs</span> <span class="o">=</span> <span class="n">_merge</span><span class="p">(</span><span class="n">method_names</span><span class="p">,</span> <span class="n">true_prevs</span><span class="p">,</span> <span class="n">estim_prevs</span><span class="p">)</span>
|
||||
|
||||
<span class="n">bins</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">linspace</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="n">nbins</span><span class="o">+</span><span class="mi">1</span><span class="p">)</span>
|
||||
<span class="n">binwidth</span> <span class="o">=</span> <span class="mi">1</span><span class="o">/</span><span class="n">nbins</span>
|
||||
<span class="n">data</span> <span class="o">=</span> <span class="p">{}</span>
|
||||
<span class="k">for</span> <span class="n">method</span><span class="p">,</span> <span class="n">true_prev</span><span class="p">,</span> <span class="n">estim_prev</span> <span class="ow">in</span> <span class="nb">zip</span><span class="p">(</span><span class="n">method_names</span><span class="p">,</span> <span class="n">true_prevs</span><span class="p">,</span> <span class="n">estim_prevs</span><span class="p">):</span>
|
||||
<span class="n">true_prev</span> <span class="o">=</span> <span class="n">true_prev</span><span class="p">[:,</span><span class="n">pos_class</span><span class="p">]</span>
|
||||
<span class="n">estim_prev</span> <span class="o">=</span> <span class="n">estim_prev</span><span class="p">[:,</span><span class="n">pos_class</span><span class="p">]</span>
|
||||
|
||||
<span class="n">data</span><span class="p">[</span><span class="n">method</span><span class="p">]</span> <span class="o">=</span> <span class="p">[]</span>
|
||||
<span class="n">inds</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">digitize</span><span class="p">(</span><span class="n">true_prev</span><span class="p">,</span> <span class="n">bins</span><span class="p">,</span> <span class="n">right</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
|
||||
<span class="k">for</span> <span class="n">ind</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">bins</span><span class="p">)):</span>
|
||||
<span class="n">selected</span> <span class="o">=</span> <span class="n">inds</span><span class="o">==</span><span class="n">ind</span>
|
||||
<span class="n">data</span><span class="p">[</span><span class="n">method</span><span class="p">]</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">estim_prev</span><span class="p">[</span><span class="n">selected</span><span class="p">]</span> <span class="o">-</span> <span class="n">true_prev</span><span class="p">[</span><span class="n">selected</span><span class="p">])</span>
|
||||
|
||||
<span class="n">nmethods</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">method_names</span><span class="p">)</span>
|
||||
<span class="n">boxwidth</span> <span class="o">=</span> <span class="n">binwidth</span><span class="o">/</span><span class="p">(</span><span class="n">nmethods</span><span class="o">+</span><span class="mi">4</span><span class="p">)</span>
|
||||
<span class="k">for</span> <span class="n">i</span><span class="p">,</span><span class="nb">bin</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">bins</span><span class="p">[:</span><span class="o">-</span><span class="mi">1</span><span class="p">]):</span>
|
||||
<span class="n">boxdata</span> <span class="o">=</span> <span class="p">[</span><span class="n">data</span><span class="p">[</span><span class="n">method</span><span class="p">][</span><span class="n">i</span><span class="p">]</span> <span class="k">for</span> <span class="n">method</span> <span class="ow">in</span> <span class="n">method_names</span><span class="p">]</span>
|
||||
<span class="n">positions</span> <span class="o">=</span> <span class="p">[</span><span class="nb">bin</span><span class="o">+</span><span class="p">(</span><span class="n">i</span><span class="o">*</span><span class="n">boxwidth</span><span class="p">)</span><span class="o">+</span><span class="mi">2</span><span class="o">*</span><span class="n">boxwidth</span> <span class="k">for</span> <span class="n">i</span><span class="p">,</span><span class="n">_</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">method_names</span><span class="p">)]</span>
|
||||
<span class="n">box</span> <span class="o">=</span> <span class="n">boxplot</span><span class="p">(</span><span class="n">boxdata</span><span class="p">,</span> <span class="n">showmeans</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span> <span class="n">positions</span><span class="o">=</span><span class="n">positions</span><span class="p">,</span> <span class="n">widths</span> <span class="o">=</span> <span class="n">boxwidth</span><span class="p">,</span> <span class="n">sym</span><span class="o">=</span><span class="s1">'+'</span><span class="p">,</span> <span class="n">patch_artist</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
|
||||
<span class="k">for</span> <span class="n">boxid</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">method_names</span><span class="p">)):</span>
|
||||
<span class="n">c</span> <span class="o">=</span> <span class="n">colormap</span><span class="o">.</span><span class="n">colors</span><span class="p">[</span><span class="n">boxid</span><span class="o">%</span><span class="nb">len</span><span class="p">(</span><span class="n">colormap</span><span class="o">.</span><span class="n">colors</span><span class="p">)]</span>
|
||||
<span class="n">setp</span><span class="p">(</span><span class="n">box</span><span class="p">[</span><span class="s1">'fliers'</span><span class="p">][</span><span class="n">boxid</span><span class="p">],</span> <span class="n">color</span><span class="o">=</span><span class="n">c</span><span class="p">,</span> <span class="n">marker</span><span class="o">=</span><span class="s1">'+'</span><span class="p">,</span> <span class="n">markersize</span><span class="o">=</span><span class="mf">3.</span><span class="p">,</span> <span class="n">markeredgecolor</span><span class="o">=</span><span class="n">c</span><span class="p">)</span>
|
||||
<span class="n">setp</span><span class="p">(</span><span class="n">box</span><span class="p">[</span><span class="s1">'boxes'</span><span class="p">][</span><span class="n">boxid</span><span class="p">],</span> <span class="n">color</span><span class="o">=</span><span class="n">c</span><span class="p">)</span>
|
||||
<span class="n">setp</span><span class="p">(</span><span class="n">box</span><span class="p">[</span><span class="s1">'medians'</span><span class="p">][</span><span class="n">boxid</span><span class="p">],</span> <span class="n">color</span><span class="o">=</span><span class="s1">'k'</span><span class="p">)</span>
|
||||
|
||||
<span class="n">major_xticks_positions</span><span class="p">,</span> <span class="n">minor_xticks_positions</span> <span class="o">=</span> <span class="p">[],</span> <span class="p">[]</span>
|
||||
<span class="n">major_xticks_labels</span><span class="p">,</span> <span class="n">minor_xticks_labels</span> <span class="o">=</span> <span class="p">[],</span> <span class="p">[]</span>
|
||||
<span class="k">for</span> <span class="n">i</span><span class="p">,</span><span class="n">b</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">bins</span><span class="p">[:</span><span class="o">-</span><span class="mi">1</span><span class="p">]):</span>
|
||||
<span class="n">major_xticks_positions</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">b</span><span class="p">)</span>
|
||||
<span class="n">minor_xticks_positions</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">b</span> <span class="o">+</span> <span class="n">binwidth</span> <span class="o">/</span> <span class="mi">2</span><span class="p">)</span>
|
||||
<span class="n">major_xticks_labels</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="s1">''</span><span class="p">)</span>
|
||||
<span class="n">minor_xticks_labels</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="sa">f</span><span class="s1">'[</span><span class="si">{</span><span class="n">bins</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="si">:</span><span class="s1">.2f</span><span class="si">}</span><span class="s1">-</span><span class="si">{</span><span class="n">bins</span><span class="p">[</span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">]</span><span class="si">:</span><span class="s1">.2f</span><span class="si">}</span><span class="s1">)'</span><span class="p">)</span>
|
||||
<span class="n">ax</span><span class="o">.</span><span class="n">set_xticks</span><span class="p">(</span><span class="n">major_xticks_positions</span><span class="p">)</span>
|
||||
<span class="n">ax</span><span class="o">.</span><span class="n">set_xticks</span><span class="p">(</span><span class="n">minor_xticks_positions</span><span class="p">,</span> <span class="n">minor</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
|
||||
<span class="n">ax</span><span class="o">.</span><span class="n">set_xticklabels</span><span class="p">(</span><span class="n">major_xticks_labels</span><span class="p">)</span>
|
||||
<span class="n">ax</span><span class="o">.</span><span class="n">set_xticklabels</span><span class="p">(</span><span class="n">minor_xticks_labels</span><span class="p">,</span> <span class="n">minor</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">rotation</span><span class="o">=</span><span class="s1">'vertical'</span> <span class="k">if</span> <span class="n">vertical_xticks</span> <span class="k">else</span> <span class="s1">'horizontal'</span><span class="p">)</span>
|
||||
|
||||
<span class="k">if</span> <span class="n">vertical_xticks</span><span class="p">:</span>
|
||||
<span class="c1"># Pad margins so that markers don't get clipped by the axes</span>
|
||||
<span class="n">plt</span><span class="o">.</span><span class="n">margins</span><span class="p">(</span><span class="mf">0.2</span><span class="p">)</span>
|
||||
<span class="c1"># Tweak spacing to prevent clipping of tick-labels</span>
|
||||
<span class="n">plt</span><span class="o">.</span><span class="n">subplots_adjust</span><span class="p">(</span><span class="n">bottom</span><span class="o">=</span><span class="mf">0.15</span><span class="p">)</span>
|
||||
|
||||
<span class="k">if</span> <span class="n">legend</span><span class="p">:</span>
|
||||
<span class="c1"># adds the legend to the list hs, initialized with the "ideal" quantifier (one that has 0 bias across all bins. i.e.</span>
|
||||
<span class="c1"># a line from (0,0) to (1,0). The other elements are simply labelled dot-plots that are to be removed (setting</span>
|
||||
<span class="c1"># set_visible to False for all but the first element) after the legend has been placed</span>
|
||||
<span class="n">hs</span><span class="o">=</span><span class="p">[</span><span class="n">ax</span><span class="o">.</span><span class="n">plot</span><span class="p">([</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">],</span> <span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">],</span> <span class="s1">'-k'</span><span class="p">,</span> <span class="n">zorder</span><span class="o">=</span><span class="mi">2</span><span class="p">)[</span><span class="mi">0</span><span class="p">]]</span>
|
||||
<span class="k">for</span> <span class="n">colorid</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">method_names</span><span class="p">)):</span>
|
||||
<span class="n">color</span><span class="o">=</span><span class="n">colormap</span><span class="o">.</span><span class="n">colors</span><span class="p">[</span><span class="n">colorid</span> <span class="o">%</span> <span class="nb">len</span><span class="p">(</span><span class="n">colormap</span><span class="o">.</span><span class="n">colors</span><span class="p">)]</span>
|
||||
<span class="n">h</span><span class="p">,</span> <span class="o">=</span> <span class="n">plot</span><span class="p">([</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">],</span> <span class="s1">'-s'</span><span class="p">,</span> <span class="n">markerfacecolor</span><span class="o">=</span><span class="n">color</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s1">'k'</span><span class="p">,</span><span class="n">mec</span><span class="o">=</span><span class="n">color</span><span class="p">,</span> <span class="n">linewidth</span><span class="o">=</span><span class="mf">1.</span><span class="p">)</span>
|
||||
<span class="n">hs</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">h</span><span class="p">)</span>
|
||||
<span class="n">box</span> <span class="o">=</span> <span class="n">ax</span><span class="o">.</span><span class="n">get_position</span><span class="p">()</span>
|
||||
<span class="n">ax</span><span class="o">.</span><span class="n">set_position</span><span class="p">([</span><span class="n">box</span><span class="o">.</span><span class="n">x0</span><span class="p">,</span> <span class="n">box</span><span class="o">.</span><span class="n">y0</span><span class="p">,</span> <span class="n">box</span><span class="o">.</span><span class="n">width</span> <span class="o">*</span> <span class="mf">0.8</span><span class="p">,</span> <span class="n">box</span><span class="o">.</span><span class="n">height</span><span class="p">])</span>
|
||||
<span class="n">ax</span><span class="o">.</span><span class="n">legend</span><span class="p">(</span><span class="n">hs</span><span class="p">,</span> <span class="p">[</span><span class="s1">'ideal'</span><span class="p">]</span><span class="o">+</span><span class="n">method_names</span><span class="p">,</span> <span class="n">loc</span><span class="o">=</span><span class="s1">'center left'</span><span class="p">,</span> <span class="n">bbox_to_anchor</span><span class="o">=</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mf">0.5</span><span class="p">))</span>
|
||||
<span class="p">[</span><span class="n">h</span><span class="o">.</span><span class="n">set_visible</span><span class="p">(</span><span class="kc">False</span><span class="p">)</span> <span class="k">for</span> <span class="n">h</span> <span class="ow">in</span> <span class="n">hs</span><span class="p">[</span><span class="mi">1</span><span class="p">:]]</span>
|
||||
|
||||
<span class="c1"># x-axis and y-axis labels and limits</span>
|
||||
<span class="n">ax</span><span class="o">.</span><span class="n">set</span><span class="p">(</span><span class="n">xlabel</span><span class="o">=</span><span class="s1">'prevalence'</span><span class="p">,</span> <span class="n">ylabel</span><span class="o">=</span><span class="s1">'error bias'</span><span class="p">,</span> <span class="n">title</span><span class="o">=</span><span class="n">title</span><span class="p">)</span>
|
||||
<span class="n">ax</span><span class="o">.</span><span class="n">set_xlim</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
|
||||
|
||||
<span class="n">_save_or_show</span><span class="p">(</span><span class="n">savepath</span><span class="p">)</span></div>
|
||||
|
||||
|
||||
|
||||
<div class="viewcode-block" id="error_by_drift">
|
||||
<a class="viewcode-back" href="../../quapy.html#quapy.plot.error_by_drift">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">error_by_drift</span><span class="p">(</span><span class="n">method_names</span><span class="p">,</span> <span class="n">true_prevs</span><span class="p">,</span> <span class="n">estim_prevs</span><span class="p">,</span> <span class="n">tr_prevs</span><span class="p">,</span>
|
||||
<span class="n">n_bins</span><span class="o">=</span><span class="mi">20</span><span class="p">,</span> <span class="n">error_name</span><span class="o">=</span><span class="s1">'ae'</span><span class="p">,</span> <span class="n">show_std</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span>
|
||||
<span class="n">show_density</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span>
|
||||
<span class="n">show_legend</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span>
|
||||
<span class="n">logscale</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span>
|
||||
<span class="n">title</span><span class="o">=</span><span class="sa">f</span><span class="s1">'Quantification error as a function of distribution shift'</span><span class="p">,</span>
|
||||
<span class="n">vlines</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span>
|
||||
<span class="n">method_order</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span>
|
||||
<span class="n">savepath</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Plots the error (along the x-axis, as measured in terms of `error_name`) as a function of the train-test shift</span>
|
||||
<span class="sd"> (along the y-axis, as measured in terms of :meth:`quapy.error.ae`). This plot is useful especially for multiclass</span>
|
||||
<span class="sd"> problems, in which "diagonal plots" may be cumbersone, and in order to gain understanding about how methods</span>
|
||||
<span class="sd"> fare in different regions of the prior probability shift spectrum (e.g., in the low-shift regime vs. in the</span>
|
||||
<span class="sd"> high-shift regime).</span>
|
||||
|
||||
<span class="sd"> :param method_names: array-like with the method names for each experiment</span>
|
||||
<span class="sd"> :param true_prevs: array-like with the true prevalence values (each being a ndarray with n_classes components) for</span>
|
||||
<span class="sd"> each experiment</span>
|
||||
<span class="sd"> :param estim_prevs: array-like with the estimated prevalence values (each being a ndarray with n_classes components)</span>
|
||||
<span class="sd"> for each experiment</span>
|
||||
<span class="sd"> :param tr_prevs: training prevalence of each experiment</span>
|
||||
<span class="sd"> :param n_bins: number of bins in which the y-axis is to be divided (default is 20)</span>
|
||||
<span class="sd"> :param error_name: a string representing the name of an error function (as defined in `quapy.error`, default is "ae")</span>
|
||||
<span class="sd"> :param show_std: whether or not to show standard deviations as color bands (default is False)</span>
|
||||
<span class="sd"> :param show_density: whether or not to display the distribution of experiments for each bin (default is True)</span>
|
||||
<span class="sd"> :param show_density: whether or not to display the legend of the chart (default is True)</span>
|
||||
<span class="sd"> :param logscale: whether or not to log-scale the y-error measure (default is False)</span>
|
||||
<span class="sd"> :param title: title of the plot (default is "Quantification error as a function of distribution shift")</span>
|
||||
<span class="sd"> :param vlines: array-like list of values (default is None). If indicated, highlights some regions of the space</span>
|
||||
<span class="sd"> using vertical dotted lines.</span>
|
||||
<span class="sd"> :param method_order: if indicated (default is None), imposes the order in which the methods are processed (i.e.,</span>
|
||||
<span class="sd"> listed in the legend and associated with matplotlib colors).</span>
|
||||
<span class="sd"> :param savepath: path where to save the plot. If not indicated (as default), the plot is shown.</span>
|
||||
<span class="sd"> """</span>
|
||||
|
||||
<span class="n">fig</span><span class="p">,</span> <span class="n">ax</span> <span class="o">=</span> <span class="n">plt</span><span class="o">.</span><span class="n">subplots</span><span class="p">()</span>
|
||||
<span class="n">ax</span><span class="o">.</span><span class="n">grid</span><span class="p">()</span>
|
||||
|
||||
<span class="n">x_error</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">error</span><span class="o">.</span><span class="n">ae</span>
|
||||
<span class="n">y_error</span> <span class="o">=</span> <span class="nb">getattr</span><span class="p">(</span><span class="n">qp</span><span class="o">.</span><span class="n">error</span><span class="p">,</span> <span class="n">error_name</span><span class="p">)</span>
|
||||
|
||||
<span class="c1"># get all data as a dictionary {'m':{'x':ndarray, 'y':ndarray}} where 'm' is a method name (in the same</span>
|
||||
<span class="c1"># order as in method_order (if specified), and where 'x' are the train-test shifts (computed as according to</span>
|
||||
<span class="c1"># x_error function) and 'y' is the estim-test shift (computed as according to y_error)</span>
|
||||
<span class="n">data</span> <span class="o">=</span> <span class="n">_join_data_by_drift</span><span class="p">(</span><span class="n">method_names</span><span class="p">,</span> <span class="n">true_prevs</span><span class="p">,</span> <span class="n">estim_prevs</span><span class="p">,</span> <span class="n">tr_prevs</span><span class="p">,</span> <span class="n">x_error</span><span class="p">,</span> <span class="n">y_error</span><span class="p">,</span> <span class="n">method_order</span><span class="p">)</span>
|
||||
|
||||
<span class="k">if</span> <span class="n">method_order</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">:</span>
|
||||
<span class="n">method_order</span> <span class="o">=</span> <span class="n">method_names</span>
|
||||
|
||||
<span class="n">_set_colors</span><span class="p">(</span><span class="n">ax</span><span class="p">,</span> <span class="n">n_methods</span><span class="o">=</span><span class="nb">len</span><span class="p">(</span><span class="n">method_order</span><span class="p">))</span>
|
||||
|
||||
<span class="n">bins</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">linspace</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="n">n_bins</span><span class="o">+</span><span class="mi">1</span><span class="p">)</span>
|
||||
<span class="n">binwidth</span> <span class="o">=</span> <span class="mi">1</span> <span class="o">/</span> <span class="n">n_bins</span>
|
||||
<span class="n">min_x</span><span class="p">,</span> <span class="n">max_x</span><span class="p">,</span> <span class="n">min_y</span><span class="p">,</span> <span class="n">max_y</span> <span class="o">=</span> <span class="kc">None</span><span class="p">,</span> <span class="kc">None</span><span class="p">,</span> <span class="kc">None</span><span class="p">,</span> <span class="kc">None</span>
|
||||
<span class="n">npoints</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">zeros</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">bins</span><span class="p">),</span> <span class="n">dtype</span><span class="o">=</span><span class="nb">float</span><span class="p">)</span>
|
||||
<span class="k">for</span> <span class="n">method</span> <span class="ow">in</span> <span class="n">method_order</span><span class="p">:</span>
|
||||
<span class="n">tr_test_drifts</span> <span class="o">=</span> <span class="n">data</span><span class="p">[</span><span class="n">method</span><span class="p">][</span><span class="s1">'x'</span><span class="p">]</span>
|
||||
<span class="n">method_drifts</span> <span class="o">=</span> <span class="n">data</span><span class="p">[</span><span class="n">method</span><span class="p">][</span><span class="s1">'y'</span><span class="p">]</span>
|
||||
<span class="k">if</span> <span class="n">logscale</span><span class="p">:</span>
|
||||
<span class="n">ax</span><span class="o">.</span><span class="n">set_yscale</span><span class="p">(</span><span class="s2">"log"</span><span class="p">)</span>
|
||||
<span class="n">ax</span><span class="o">.</span><span class="n">yaxis</span><span class="o">.</span><span class="n">set_major_formatter</span><span class="p">(</span><span class="n">ScalarFormatter</span><span class="p">())</span>
|
||||
<span class="n">ax</span><span class="o">.</span><span class="n">yaxis</span><span class="o">.</span><span class="n">get_major_formatter</span><span class="p">()</span><span class="o">.</span><span class="n">set_scientific</span><span class="p">(</span><span class="kc">False</span><span class="p">)</span>
|
||||
<span class="n">ax</span><span class="o">.</span><span class="n">minorticks_off</span><span class="p">()</span>
|
||||
|
||||
<span class="n">inds</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">digitize</span><span class="p">(</span><span class="n">tr_test_drifts</span><span class="p">,</span> <span class="n">bins</span><span class="p">,</span> <span class="n">right</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
|
||||
|
||||
<span class="n">xs</span><span class="p">,</span> <span class="n">ys</span><span class="p">,</span> <span class="n">ystds</span> <span class="o">=</span> <span class="p">[],</span> <span class="p">[],</span> <span class="p">[]</span>
|
||||
<span class="k">for</span> <span class="n">p</span><span class="p">,</span><span class="n">ind</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="nb">range</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">bins</span><span class="p">))):</span>
|
||||
<span class="n">selected</span> <span class="o">=</span> <span class="n">inds</span><span class="o">==</span><span class="n">ind</span>
|
||||
<span class="k">if</span> <span class="n">selected</span><span class="o">.</span><span class="n">sum</span><span class="p">()</span> <span class="o">></span> <span class="mi">0</span><span class="p">:</span>
|
||||
<span class="n">xs</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">ind</span><span class="o">*</span><span class="n">binwidth</span><span class="o">-</span><span class="n">binwidth</span><span class="o">/</span><span class="mi">2</span><span class="p">)</span>
|
||||
<span class="n">ys</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">mean</span><span class="p">(</span><span class="n">method_drifts</span><span class="p">[</span><span class="n">selected</span><span class="p">]))</span>
|
||||
<span class="n">ystds</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">std</span><span class="p">(</span><span class="n">method_drifts</span><span class="p">[</span><span class="n">selected</span><span class="p">]))</span>
|
||||
<span class="n">npoints</span><span class="p">[</span><span class="n">p</span><span class="p">]</span> <span class="o">+=</span> <span class="nb">len</span><span class="p">(</span><span class="n">method_drifts</span><span class="p">[</span><span class="n">selected</span><span class="p">])</span>
|
||||
|
||||
<span class="n">xs</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">asarray</span><span class="p">(</span><span class="n">xs</span><span class="p">)</span>
|
||||
<span class="n">ys</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">asarray</span><span class="p">(</span><span class="n">ys</span><span class="p">)</span>
|
||||
<span class="n">ystds</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">asarray</span><span class="p">(</span><span class="n">ystds</span><span class="p">)</span>
|
||||
|
||||
<span class="n">min_x_method</span><span class="p">,</span> <span class="n">max_x_method</span><span class="p">,</span> <span class="n">min_y_method</span><span class="p">,</span> <span class="n">max_y_method</span> <span class="o">=</span> <span class="n">xs</span><span class="o">.</span><span class="n">min</span><span class="p">(),</span> <span class="n">xs</span><span class="o">.</span><span class="n">max</span><span class="p">(),</span> <span class="n">ys</span><span class="o">.</span><span class="n">min</span><span class="p">(),</span> <span class="n">ys</span><span class="o">.</span><span class="n">max</span><span class="p">()</span>
|
||||
<span class="n">min_x</span> <span class="o">=</span> <span class="n">min_x_method</span> <span class="k">if</span> <span class="n">min_x</span> <span class="ow">is</span> <span class="kc">None</span> <span class="ow">or</span> <span class="n">min_x_method</span> <span class="o"><</span> <span class="n">min_x</span> <span class="k">else</span> <span class="n">min_x</span>
|
||||
<span class="n">max_x</span> <span class="o">=</span> <span class="n">max_x_method</span> <span class="k">if</span> <span class="n">max_x</span> <span class="ow">is</span> <span class="kc">None</span> <span class="ow">or</span> <span class="n">max_x_method</span> <span class="o">></span> <span class="n">max_x</span> <span class="k">else</span> <span class="n">max_x</span>
|
||||
<span class="n">max_y</span> <span class="o">=</span> <span class="n">max_y_method</span> <span class="k">if</span> <span class="n">max_y</span> <span class="ow">is</span> <span class="kc">None</span> <span class="ow">or</span> <span class="n">max_y_method</span> <span class="o">></span> <span class="n">max_y</span> <span class="k">else</span> <span class="n">max_y</span>
|
||||
<span class="n">min_y</span> <span class="o">=</span> <span class="n">min_y_method</span> <span class="k">if</span> <span class="n">min_y</span> <span class="ow">is</span> <span class="kc">None</span> <span class="ow">or</span> <span class="n">min_y_method</span> <span class="o"><</span> <span class="n">min_y</span> <span class="k">else</span> <span class="n">min_y</span>
|
||||
<span class="n">max_y</span> <span class="o">=</span> <span class="n">max_y_method</span> <span class="k">if</span> <span class="n">max_y</span> <span class="ow">is</span> <span class="kc">None</span> <span class="ow">or</span> <span class="n">max_y_method</span> <span class="o">></span> <span class="n">max_y</span> <span class="k">else</span> <span class="n">max_y</span>
|
||||
|
||||
<span class="n">ax</span><span class="o">.</span><span class="n">errorbar</span><span class="p">(</span><span class="n">xs</span><span class="p">,</span> <span class="n">ys</span><span class="p">,</span> <span class="n">fmt</span><span class="o">=</span><span class="s1">'-'</span><span class="p">,</span> <span class="n">marker</span><span class="o">=</span><span class="s1">'o'</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s1">'w'</span><span class="p">,</span> <span class="n">markersize</span><span class="o">=</span><span class="mi">8</span><span class="p">,</span> <span class="n">linewidth</span><span class="o">=</span><span class="mi">4</span><span class="p">,</span> <span class="n">zorder</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
|
||||
<span class="n">ax</span><span class="o">.</span><span class="n">errorbar</span><span class="p">(</span><span class="n">xs</span><span class="p">,</span> <span class="n">ys</span><span class="p">,</span> <span class="n">fmt</span><span class="o">=</span><span class="s1">'-'</span><span class="p">,</span> <span class="n">marker</span><span class="o">=</span><span class="s1">'o'</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="n">method</span><span class="p">,</span> <span class="n">markersize</span><span class="o">=</span><span class="mi">6</span><span class="p">,</span> <span class="n">linewidth</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">zorder</span><span class="o">=</span><span class="mi">2</span><span class="p">)</span>
|
||||
|
||||
<span class="k">if</span> <span class="n">show_std</span><span class="p">:</span>
|
||||
<span class="n">ax</span><span class="o">.</span><span class="n">fill_between</span><span class="p">(</span><span class="n">xs</span><span class="p">,</span> <span class="n">ys</span><span class="o">-</span><span class="n">ystds</span><span class="p">,</span> <span class="n">ys</span><span class="o">+</span><span class="n">ystds</span><span class="p">,</span> <span class="n">alpha</span><span class="o">=</span><span class="mf">0.25</span><span class="p">)</span>
|
||||
|
||||
<span class="k">if</span> <span class="n">show_density</span><span class="p">:</span>
|
||||
<span class="n">ax2</span> <span class="o">=</span> <span class="n">ax</span><span class="o">.</span><span class="n">twinx</span><span class="p">()</span>
|
||||
<span class="n">densities</span> <span class="o">=</span> <span class="n">npoints</span><span class="o">/</span><span class="n">np</span><span class="o">.</span><span class="n">sum</span><span class="p">(</span><span class="n">npoints</span><span class="p">)</span>
|
||||
<span class="n">ax2</span><span class="o">.</span><span class="n">bar</span><span class="p">([</span><span class="n">ind</span> <span class="o">*</span> <span class="n">binwidth</span><span class="o">-</span><span class="n">binwidth</span><span class="o">/</span><span class="mi">2</span> <span class="k">for</span> <span class="n">ind</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">bins</span><span class="p">))],</span>
|
||||
<span class="n">densities</span><span class="p">,</span> <span class="n">alpha</span><span class="o">=</span><span class="mf">0.15</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s1">'g'</span><span class="p">,</span> <span class="n">width</span><span class="o">=</span><span class="n">binwidth</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="s1">'density'</span><span class="p">)</span>
|
||||
<span class="n">ax2</span><span class="o">.</span><span class="n">set_ylim</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span><span class="nb">max</span><span class="p">(</span><span class="n">densities</span><span class="p">))</span>
|
||||
<span class="n">ax2</span><span class="o">.</span><span class="n">spines</span><span class="p">[</span><span class="s1">'right'</span><span class="p">]</span><span class="o">.</span><span class="n">set_color</span><span class="p">(</span><span class="s1">'g'</span><span class="p">)</span>
|
||||
<span class="n">ax2</span><span class="o">.</span><span class="n">tick_params</span><span class="p">(</span><span class="n">axis</span><span class="o">=</span><span class="s1">'y'</span><span class="p">,</span> <span class="n">colors</span><span class="o">=</span><span class="s1">'g'</span><span class="p">)</span>
|
||||
|
||||
<span class="n">ax</span><span class="o">.</span><span class="n">set</span><span class="p">(</span><span class="n">xlabel</span><span class="o">=</span><span class="sa">f</span><span class="s1">'Distribution shift between training set and test sample'</span><span class="p">,</span>
|
||||
<span class="n">ylabel</span><span class="o">=</span><span class="sa">f</span><span class="s1">'</span><span class="si">{</span><span class="n">error_name</span><span class="o">.</span><span class="n">upper</span><span class="p">()</span><span class="si">}</span><span class="s1"> (true distribution, predicted distribution)'</span><span class="p">,</span>
|
||||
<span class="n">title</span><span class="o">=</span><span class="n">title</span><span class="p">)</span>
|
||||
<span class="n">box</span> <span class="o">=</span> <span class="n">ax</span><span class="o">.</span><span class="n">get_position</span><span class="p">()</span>
|
||||
<span class="n">ax</span><span class="o">.</span><span class="n">set_position</span><span class="p">([</span><span class="n">box</span><span class="o">.</span><span class="n">x0</span><span class="p">,</span> <span class="n">box</span><span class="o">.</span><span class="n">y0</span><span class="p">,</span> <span class="n">box</span><span class="o">.</span><span class="n">width</span> <span class="o">*</span> <span class="mf">0.8</span><span class="p">,</span> <span class="n">box</span><span class="o">.</span><span class="n">height</span><span class="p">])</span>
|
||||
<span class="k">if</span> <span class="n">vlines</span><span class="p">:</span>
|
||||
<span class="k">for</span> <span class="n">vline</span> <span class="ow">in</span> <span class="n">vlines</span><span class="p">:</span>
|
||||
<span class="n">ax</span><span class="o">.</span><span class="n">axvline</span><span class="p">(</span><span class="n">vline</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="n">linestyle</span><span class="o">=</span><span class="s1">'--'</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s1">'k'</span><span class="p">)</span>
|
||||
|
||||
<span class="n">ax</span><span class="o">.</span><span class="n">set_xlim</span><span class="p">(</span><span class="n">min_x</span><span class="p">,</span> <span class="n">max_x</span><span class="p">)</span>
|
||||
<span class="k">if</span> <span class="n">logscale</span><span class="p">:</span>
|
||||
<span class="c1">#nice scale for the logaritmic axis</span>
|
||||
<span class="n">ax</span><span class="o">.</span><span class="n">set_ylim</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span><span class="mi">10</span> <span class="o">**</span> <span class="n">math</span><span class="o">.</span><span class="n">ceil</span><span class="p">(</span><span class="n">math</span><span class="o">.</span><span class="n">log10</span><span class="p">(</span><span class="n">max_y</span><span class="p">)))</span>
|
||||
|
||||
|
||||
<span class="k">if</span> <span class="n">show_legend</span><span class="p">:</span>
|
||||
<span class="n">fig</span><span class="o">.</span><span class="n">legend</span><span class="p">(</span><span class="n">loc</span><span class="o">=</span><span class="s1">'lower center'</span><span class="p">,</span>
|
||||
<span class="n">bbox_to_anchor</span><span class="o">=</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mf">0.5</span><span class="p">),</span>
|
||||
<span class="n">ncol</span><span class="o">=</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">method_names</span><span class="p">)</span><span class="o">+</span><span class="mi">1</span><span class="p">)</span><span class="o">//</span><span class="mi">2</span><span class="p">)</span>
|
||||
|
||||
<span class="n">_save_or_show</span><span class="p">(</span><span class="n">savepath</span><span class="p">)</span></div>
|
||||
|
||||
|
||||
|
||||
<div class="viewcode-block" id="brokenbar_supremacy_by_drift">
|
||||
<a class="viewcode-back" href="../../quapy.html#quapy.plot.brokenbar_supremacy_by_drift">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">brokenbar_supremacy_by_drift</span><span class="p">(</span><span class="n">method_names</span><span class="p">,</span> <span class="n">true_prevs</span><span class="p">,</span> <span class="n">estim_prevs</span><span class="p">,</span> <span class="n">tr_prevs</span><span class="p">,</span>
|
||||
<span class="n">n_bins</span><span class="o">=</span><span class="mi">20</span><span class="p">,</span> <span class="n">binning</span><span class="o">=</span><span class="s1">'isomerous'</span><span class="p">,</span>
|
||||
<span class="n">x_error</span><span class="o">=</span><span class="s1">'ae'</span><span class="p">,</span> <span class="n">y_error</span><span class="o">=</span><span class="s1">'ae'</span><span class="p">,</span> <span class="n">ttest_alpha</span><span class="o">=</span><span class="mf">0.005</span><span class="p">,</span> <span class="n">tail_density_threshold</span><span class="o">=</span><span class="mf">0.005</span><span class="p">,</span>
|
||||
<span class="n">method_order</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span>
|
||||
<span class="n">savepath</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Displays (only) the top performing methods for different regions of the train-test shift in form of a broken</span>
|
||||
<span class="sd"> bar chart, in which each method has bars only for those regions in which either one of the following conditions</span>
|
||||
<span class="sd"> hold: (i) it is the best method (in average) for the bin, or (ii) it is not statistically significantly different</span>
|
||||
<span class="sd"> (in average) as according to a two-sided t-test on independent samples at confidence `ttest_alpha`.</span>
|
||||
<span class="sd"> The binning can be made "isometric" (same size), or "isomerous" (same number of experiments -- default). A second</span>
|
||||
<span class="sd"> plot is displayed on top, that displays the distribution of experiments for each bin (when binning="isometric") or</span>
|
||||
<span class="sd"> the percentiles points of the distribution (when binning="isomerous").</span>
|
||||
|
||||
<span class="sd"> :param method_names: array-like with the method names for each experiment</span>
|
||||
<span class="sd"> :param true_prevs: array-like with the true prevalence values (each being a ndarray with n_classes components) for</span>
|
||||
<span class="sd"> each experiment</span>
|
||||
<span class="sd"> :param estim_prevs: array-like with the estimated prevalence values (each being a ndarray with n_classes components)</span>
|
||||
<span class="sd"> for each experiment</span>
|
||||
<span class="sd"> :param tr_prevs: training prevalence of each experiment</span>
|
||||
<span class="sd"> :param n_bins: number of bins in which the y-axis is to be divided (default is 20)</span>
|
||||
<span class="sd"> :param binning: type of binning, either "isomerous" (default) or "isometric"</span>
|
||||
<span class="sd"> :param x_error: a string representing the name of an error function (as defined in `quapy.error`) to be used for</span>
|
||||
<span class="sd"> measuring the amount of train-test shift (default is "ae")</span>
|
||||
<span class="sd"> :param y_error: a string representing the name of an error function (as defined in `quapy.error`) to be used for</span>
|
||||
<span class="sd"> measuring the amount of error in the prevalence estimations (default is "ae")</span>
|
||||
<span class="sd"> :param ttest_alpha: the confidence interval above which a p-value (two-sided t-test on independent samples) is</span>
|
||||
<span class="sd"> to be considered as an indicator that the two means are not statistically significantly different. Default is</span>
|
||||
<span class="sd"> 0.005, meaning that a `p-value > 0.005` indicates the two methods involved are to be considered similar</span>
|
||||
<span class="sd"> :param tail_density_threshold: sets a threshold on the density of experiments (over the total number of experiments)</span>
|
||||
<span class="sd"> below which a bin in the tail (i.e., the right-most ones) will be discarded. This is in order to avoid some</span>
|
||||
<span class="sd"> bins to be shown for train-test outliers.</span>
|
||||
<span class="sd"> :param method_order: if indicated (default is None), imposes the order in which the methods are processed (i.e.,</span>
|
||||
<span class="sd"> listed in the legend and associated with matplotlib colors).</span>
|
||||
<span class="sd"> :param savepath: path where to save the plot. If not indicated (as default), the plot is shown.</span>
|
||||
<span class="sd"> :return:</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">assert</span> <span class="n">binning</span> <span class="ow">in</span> <span class="p">[</span><span class="s1">'isomerous'</span><span class="p">,</span> <span class="s1">'isometric'</span><span class="p">],</span> <span class="s1">'unknown binning type; valid types are "isomerous" and "isometric"'</span>
|
||||
|
||||
<span class="n">x_error</span> <span class="o">=</span> <span class="nb">getattr</span><span class="p">(</span><span class="n">qp</span><span class="o">.</span><span class="n">error</span><span class="p">,</span> <span class="n">x_error</span><span class="p">)</span>
|
||||
<span class="n">y_error</span> <span class="o">=</span> <span class="nb">getattr</span><span class="p">(</span><span class="n">qp</span><span class="o">.</span><span class="n">error</span><span class="p">,</span> <span class="n">y_error</span><span class="p">)</span>
|
||||
|
||||
<span class="c1"># get all data as a dictionary {'m':{'x':ndarray, 'y':ndarray}} where 'm' is a method name (in the same</span>
|
||||
<span class="c1"># order as in method_order (if specified), and where 'x' are the train-test shifts (computed as according to</span>
|
||||
<span class="c1"># x_error function) and 'y' is the estim-test shift (computed as according to y_error)</span>
|
||||
<span class="n">data</span> <span class="o">=</span> <span class="n">_join_data_by_drift</span><span class="p">(</span><span class="n">method_names</span><span class="p">,</span> <span class="n">true_prevs</span><span class="p">,</span> <span class="n">estim_prevs</span><span class="p">,</span> <span class="n">tr_prevs</span><span class="p">,</span> <span class="n">x_error</span><span class="p">,</span> <span class="n">y_error</span><span class="p">,</span> <span class="n">method_order</span><span class="p">)</span>
|
||||
|
||||
<span class="k">if</span> <span class="n">method_order</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">:</span>
|
||||
<span class="n">method_order</span> <span class="o">=</span> <span class="n">method_names</span>
|
||||
|
||||
<span class="k">if</span> <span class="n">binning</span> <span class="o">==</span> <span class="s1">'isomerous'</span><span class="p">:</span>
|
||||
<span class="c1"># take bins containing the same amount of examples</span>
|
||||
<span class="n">tr_test_drifts</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">concatenate</span><span class="p">([</span><span class="n">data</span><span class="p">[</span><span class="n">m</span><span class="p">][</span><span class="s1">'x'</span><span class="p">]</span> <span class="k">for</span> <span class="n">m</span> <span class="ow">in</span> <span class="n">method_order</span><span class="p">])</span>
|
||||
<span class="n">bins</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">quantile</span><span class="p">(</span><span class="n">tr_test_drifts</span><span class="p">,</span> <span class="n">q</span><span class="o">=</span><span class="n">np</span><span class="o">.</span><span class="n">linspace</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="n">n_bins</span><span class="o">+</span><span class="mi">1</span><span class="p">))</span><span class="o">.</span><span class="n">flatten</span><span class="p">()</span>
|
||||
<span class="k">else</span><span class="p">:</span>
|
||||
<span class="c1"># take equidistant bins</span>
|
||||
<span class="n">bins</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">linspace</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="n">n_bins</span><span class="o">+</span><span class="mi">1</span><span class="p">)</span>
|
||||
<span class="n">bins</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="o">-</span><span class="mf">0.001</span>
|
||||
<span class="n">bins</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span> <span class="o">+=</span> <span class="mf">0.001</span>
|
||||
|
||||
<span class="c1"># we use this to keep track of how many datapoits contribute to each bin</span>
|
||||
<span class="n">inds_histogram_global</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">zeros</span><span class="p">(</span><span class="n">n_bins</span><span class="p">,</span> <span class="n">dtype</span><span class="o">=</span><span class="nb">float</span><span class="p">)</span>
|
||||
<span class="n">n_methods</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">method_order</span><span class="p">)</span>
|
||||
<span class="n">buckets</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">zeros</span><span class="p">(</span><span class="n">shape</span><span class="o">=</span><span class="p">(</span><span class="n">n_methods</span><span class="p">,</span> <span class="n">n_bins</span><span class="p">,</span> <span class="mi">3</span><span class="p">))</span>
|
||||
<span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">method</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">method_order</span><span class="p">):</span>
|
||||
<span class="n">tr_test_drifts</span> <span class="o">=</span> <span class="n">data</span><span class="p">[</span><span class="n">method</span><span class="p">][</span><span class="s1">'x'</span><span class="p">]</span>
|
||||
<span class="n">method_drifts</span> <span class="o">=</span> <span class="n">data</span><span class="p">[</span><span class="n">method</span><span class="p">][</span><span class="s1">'y'</span><span class="p">]</span>
|
||||
|
||||
<span class="n">inds</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">digitize</span><span class="p">(</span><span class="n">tr_test_drifts</span><span class="p">,</span> <span class="n">bins</span><span class="p">,</span> <span class="n">right</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span>
|
||||
<span class="n">inds_histogram_global</span> <span class="o">+=</span> <span class="n">np</span><span class="o">.</span><span class="n">histogram</span><span class="p">(</span><span class="n">tr_test_drifts</span><span class="p">,</span> <span class="n">density</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span> <span class="n">bins</span><span class="o">=</span><span class="n">bins</span><span class="p">)[</span><span class="mi">0</span><span class="p">]</span>
|
||||
|
||||
<span class="k">for</span> <span class="n">j</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">bins</span><span class="p">)):</span>
|
||||
<span class="n">selected</span> <span class="o">=</span> <span class="n">inds</span> <span class="o">==</span> <span class="n">j</span>
|
||||
<span class="k">if</span> <span class="n">selected</span><span class="o">.</span><span class="n">sum</span><span class="p">()</span> <span class="o">></span> <span class="mi">0</span><span class="p">:</span>
|
||||
<span class="n">buckets</span><span class="p">[</span><span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">mean</span><span class="p">(</span><span class="n">method_drifts</span><span class="p">[</span><span class="n">selected</span><span class="p">])</span>
|
||||
<span class="n">buckets</span><span class="p">[</span><span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">]</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">std</span><span class="p">(</span><span class="n">method_drifts</span><span class="p">[</span><span class="n">selected</span><span class="p">])</span>
|
||||
<span class="n">buckets</span><span class="p">[</span><span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">]</span> <span class="o">=</span> <span class="n">selected</span><span class="o">.</span><span class="n">sum</span><span class="p">()</span>
|
||||
|
||||
<span class="c1"># cancel last buckets with low density</span>
|
||||
<span class="n">histogram</span> <span class="o">=</span> <span class="n">inds_histogram_global</span> <span class="o">/</span> <span class="n">inds_histogram_global</span><span class="o">.</span><span class="n">sum</span><span class="p">()</span>
|
||||
<span class="k">for</span> <span class="n">tail</span> <span class="ow">in</span> <span class="nb">reversed</span><span class="p">(</span><span class="nb">range</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">histogram</span><span class="p">))):</span>
|
||||
<span class="k">if</span> <span class="n">histogram</span><span class="p">[</span><span class="n">tail</span><span class="p">]</span> <span class="o"><</span> <span class="n">tail_density_threshold</span><span class="p">:</span>
|
||||
<span class="n">buckets</span><span class="p">[:,</span><span class="n">tail</span><span class="p">,</span><span class="mi">2</span><span class="p">]</span> <span class="o">=</span> <span class="mi">0</span>
|
||||
<span class="k">else</span><span class="p">:</span>
|
||||
<span class="k">break</span>
|
||||
|
||||
<span class="n">salient_methods</span> <span class="o">=</span> <span class="nb">set</span><span class="p">()</span>
|
||||
<span class="n">best_methods</span> <span class="o">=</span> <span class="p">[]</span>
|
||||
<span class="k">for</span> <span class="n">bucket</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">buckets</span><span class="o">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">]):</span>
|
||||
<span class="n">nc</span> <span class="o">=</span> <span class="n">buckets</span><span class="p">[:,</span> <span class="n">bucket</span><span class="p">,</span> <span class="mi">2</span><span class="p">]</span><span class="o">.</span><span class="n">sum</span><span class="p">()</span>
|
||||
<span class="k">if</span> <span class="n">nc</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>
|
||||
<span class="n">best_methods</span><span class="o">.</span><span class="n">append</span><span class="p">([])</span>
|
||||
<span class="k">continue</span>
|
||||
|
||||
<span class="n">order</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">argsort</span><span class="p">(</span><span class="n">buckets</span><span class="p">[:,</span> <span class="n">bucket</span><span class="p">,</span> <span class="mi">0</span><span class="p">])</span>
|
||||
<span class="n">rank1</span> <span class="o">=</span> <span class="n">order</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
|
||||
<span class="n">best_bucket_methods</span> <span class="o">=</span> <span class="p">[</span><span class="n">method_order</span><span class="p">[</span><span class="n">rank1</span><span class="p">]]</span>
|
||||
<span class="n">best_mean</span><span class="p">,</span> <span class="n">best_std</span><span class="p">,</span> <span class="n">best_nc</span> <span class="o">=</span> <span class="n">buckets</span><span class="p">[</span><span class="n">rank1</span><span class="p">,</span> <span class="n">bucket</span><span class="p">,</span> <span class="p">:]</span>
|
||||
<span class="k">for</span> <span class="n">method_index</span> <span class="ow">in</span> <span class="n">order</span><span class="p">[</span><span class="mi">1</span><span class="p">:]:</span>
|
||||
<span class="n">method_mean</span><span class="p">,</span> <span class="n">method_std</span><span class="p">,</span> <span class="n">method_nc</span> <span class="o">=</span> <span class="n">buckets</span><span class="p">[</span><span class="n">method_index</span><span class="p">,</span> <span class="n">bucket</span><span class="p">,</span> <span class="p">:]</span>
|
||||
<span class="n">_</span><span class="p">,</span> <span class="n">pval</span> <span class="o">=</span> <span class="n">ttest_ind_from_stats</span><span class="p">(</span><span class="n">best_mean</span><span class="p">,</span> <span class="n">best_std</span><span class="p">,</span> <span class="n">best_nc</span><span class="p">,</span> <span class="n">method_mean</span><span class="p">,</span> <span class="n">method_std</span><span class="p">,</span> <span class="n">method_nc</span><span class="p">)</span>
|
||||
<span class="k">if</span> <span class="n">pval</span> <span class="o">></span> <span class="n">ttest_alpha</span><span class="p">:</span>
|
||||
<span class="n">best_bucket_methods</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">method_order</span><span class="p">[</span><span class="n">method_index</span><span class="p">])</span>
|
||||
<span class="n">best_methods</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">best_bucket_methods</span><span class="p">)</span>
|
||||
<span class="n">salient_methods</span><span class="o">.</span><span class="n">update</span><span class="p">(</span><span class="n">best_bucket_methods</span><span class="p">)</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="n">best_bucket_methods</span><span class="p">)</span>
|
||||
|
||||
<span class="k">if</span> <span class="n">binning</span><span class="o">==</span><span class="s1">'isomerous'</span><span class="p">:</span>
|
||||
<span class="n">fig</span><span class="p">,</span> <span class="n">axes</span> <span class="o">=</span> <span class="n">plt</span><span class="o">.</span><span class="n">subplots</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="n">gridspec_kw</span><span class="o">=</span><span class="p">{</span><span class="s1">'height_ratios'</span><span class="p">:</span> <span class="p">[</span><span class="mf">0.2</span><span class="p">,</span> <span class="mi">1</span><span class="p">]},</span> <span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">20</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="n">salient_methods</span><span class="p">)))</span>
|
||||
<span class="k">else</span><span class="p">:</span>
|
||||
<span class="n">fig</span><span class="p">,</span> <span class="n">axes</span> <span class="o">=</span> <span class="n">plt</span><span class="o">.</span><span class="n">subplots</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="n">gridspec_kw</span><span class="o">=</span><span class="p">{</span><span class="s1">'height_ratios'</span><span class="p">:</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">]},</span> <span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">20</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="n">salient_methods</span><span class="p">)))</span>
|
||||
|
||||
<span class="n">ax</span> <span class="o">=</span> <span class="n">axes</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span>
|
||||
<span class="n">high_from</span> <span class="o">=</span> <span class="mi">0</span>
|
||||
<span class="n">yticks</span><span class="p">,</span> <span class="n">yticks_method_names</span> <span class="o">=</span> <span class="p">[],</span> <span class="p">[]</span>
|
||||
<span class="n">color</span> <span class="o">=</span> <span class="n">get_cmap</span><span class="p">(</span><span class="s1">'Accent'</span><span class="p">)</span><span class="o">.</span><span class="n">colors</span>
|
||||
<span class="n">vlines</span> <span class="o">=</span> <span class="p">[]</span>
|
||||
<span class="n">bar_high</span> <span class="o">=</span> <span class="mi">1</span>
|
||||
<span class="k">for</span> <span class="n">method</span> <span class="ow">in</span> <span class="p">[</span><span class="n">m</span> <span class="k">for</span> <span class="n">m</span> <span class="ow">in</span> <span class="n">method_order</span> <span class="k">if</span> <span class="n">m</span> <span class="ow">in</span> <span class="n">salient_methods</span><span class="p">]:</span>
|
||||
<span class="n">broken_paths</span> <span class="o">=</span> <span class="p">[]</span>
|
||||
<span class="n">path_start</span><span class="p">,</span> <span class="n">path_end</span> <span class="o">=</span> <span class="kc">None</span><span class="p">,</span> <span class="kc">None</span>
|
||||
<span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">best_bucket_methods</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">best_methods</span><span class="p">):</span>
|
||||
<span class="k">if</span> <span class="n">method</span> <span class="ow">in</span> <span class="n">best_bucket_methods</span><span class="p">:</span>
|
||||
<span class="k">if</span> <span class="n">path_start</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">:</span>
|
||||
<span class="n">path_start</span> <span class="o">=</span> <span class="n">bins</span><span class="p">[</span><span class="n">i</span><span class="p">]</span>
|
||||
<span class="n">path_end</span> <span class="o">=</span> <span class="n">bins</span><span class="p">[</span><span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="p">]</span><span class="o">-</span><span class="n">path_start</span>
|
||||
<span class="k">else</span><span class="p">:</span>
|
||||
<span class="n">path_end</span> <span class="o">+=</span> <span class="n">bins</span><span class="p">[</span><span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="p">]</span><span class="o">-</span><span class="n">bins</span><span class="p">[</span><span class="n">i</span><span class="p">]</span>
|
||||
<span class="k">else</span><span class="p">:</span>
|
||||
<span class="k">if</span> <span class="n">path_start</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span><span class="p">:</span>
|
||||
<span class="n">broken_paths</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="nb">tuple</span><span class="p">((</span><span class="n">path_start</span><span class="p">,</span> <span class="n">path_end</span><span class="p">)))</span>
|
||||
<span class="n">path_start</span><span class="p">,</span> <span class="n">path_end</span> <span class="o">=</span> <span class="kc">None</span><span class="p">,</span> <span class="kc">None</span>
|
||||
<span class="k">if</span> <span class="n">path_start</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span><span class="p">:</span>
|
||||
<span class="n">broken_paths</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="nb">tuple</span><span class="p">((</span><span class="n">path_start</span><span class="p">,</span> <span class="n">path_end</span><span class="p">)))</span>
|
||||
|
||||
<span class="n">ax</span><span class="o">.</span><span class="n">broken_barh</span><span class="p">(</span><span class="n">broken_paths</span><span class="p">,</span> <span class="p">(</span><span class="n">high_from</span><span class="p">,</span> <span class="n">bar_high</span><span class="p">),</span> <span class="n">facecolors</span><span class="o">=</span><span class="n">color</span><span class="p">[</span><span class="nb">len</span><span class="p">(</span><span class="n">yticks_method_names</span><span class="p">)])</span>
|
||||
<span class="n">yticks</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">high_from</span><span class="o">+</span><span class="n">bar_high</span><span class="o">/</span><span class="mi">2</span><span class="p">)</span>
|
||||
<span class="n">high_from</span> <span class="o">+=</span> <span class="n">bar_high</span>
|
||||
<span class="n">yticks_method_names</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">method</span><span class="p">)</span>
|
||||
<span class="k">for</span> <span class="n">path_start</span><span class="p">,</span> <span class="n">path_end</span> <span class="ow">in</span> <span class="n">broken_paths</span><span class="p">:</span>
|
||||
<span class="n">vlines</span><span class="o">.</span><span class="n">extend</span><span class="p">([</span><span class="n">path_start</span><span class="p">,</span> <span class="n">path_start</span><span class="o">+</span><span class="n">path_end</span><span class="p">])</span>
|
||||
|
||||
<span class="n">vlines</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">unique</span><span class="p">(</span><span class="n">vlines</span><span class="p">)</span>
|
||||
<span class="n">vlines</span> <span class="o">=</span> <span class="nb">sorted</span><span class="p">(</span><span class="n">vlines</span><span class="p">)</span>
|
||||
<span class="k">for</span> <span class="n">v</span> <span class="ow">in</span> <span class="n">vlines</span><span class="p">[</span><span class="mi">1</span><span class="p">:</span><span class="o">-</span><span class="mi">1</span><span class="p">]:</span>
|
||||
<span class="n">ax</span><span class="o">.</span><span class="n">axvline</span><span class="p">(</span><span class="n">x</span><span class="o">=</span><span class="n">v</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s1">'k'</span><span class="p">,</span> <span class="n">linestyle</span><span class="o">=</span><span class="s1">'--'</span><span class="p">)</span>
|
||||
|
||||
<span class="n">ax</span><span class="o">.</span><span class="n">set_ylim</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">high_from</span><span class="p">)</span>
|
||||
<span class="n">ax</span><span class="o">.</span><span class="n">set_xlim</span><span class="p">(</span><span class="n">vlines</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">vlines</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">])</span>
|
||||
<span class="n">ax</span><span class="o">.</span><span class="n">set_xlabel</span><span class="p">(</span><span class="s1">'Distribution shift between training set and sample'</span><span class="p">)</span>
|
||||
|
||||
<span class="n">ax</span><span class="o">.</span><span class="n">set_yticks</span><span class="p">(</span><span class="n">yticks</span><span class="p">)</span>
|
||||
<span class="n">ax</span><span class="o">.</span><span class="n">set_yticklabels</span><span class="p">(</span><span class="n">yticks_method_names</span><span class="p">)</span>
|
||||
|
||||
<span class="c1"># upper plot (explaining distribution)</span>
|
||||
<span class="n">ax</span> <span class="o">=</span> <span class="n">axes</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
|
||||
<span class="k">if</span> <span class="n">binning</span> <span class="o">==</span> <span class="s1">'isometric'</span><span class="p">:</span>
|
||||
<span class="c1"># show the density for each region</span>
|
||||
<span class="n">bins</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">=</span><span class="mi">0</span>
|
||||
<span class="n">y_pos</span> <span class="o">=</span> <span class="p">[</span><span class="n">b</span><span class="o">+</span><span class="p">(</span><span class="n">bins</span><span class="p">[</span><span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="p">]</span><span class="o">-</span><span class="n">b</span><span class="p">)</span><span class="o">/</span><span class="mi">2</span> <span class="k">for</span> <span class="n">i</span><span class="p">,</span><span class="n">b</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">bins</span><span class="p">[:</span><span class="o">-</span><span class="mi">1</span><span class="p">])</span> <span class="k">if</span> <span class="n">histogram</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="o">></span><span class="mi">0</span><span class="p">]</span>
|
||||
<span class="n">bar_width</span> <span class="o">=</span> <span class="p">[</span><span class="n">bins</span><span class="p">[</span><span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="p">]</span><span class="o">-</span><span class="n">bins</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">bins</span><span class="p">[:</span><span class="o">-</span><span class="mi">1</span><span class="p">]))</span> <span class="k">if</span> <span class="n">histogram</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="o">></span><span class="mi">0</span><span class="p">]</span>
|
||||
<span class="n">ax</span><span class="o">.</span><span class="n">bar</span><span class="p">(</span><span class="n">y_pos</span><span class="p">,</span> <span class="p">[</span><span class="n">n</span> <span class="k">for</span> <span class="n">n</span> <span class="ow">in</span> <span class="n">histogram</span> <span class="k">if</span> <span class="n">n</span><span class="o">></span><span class="mi">0</span><span class="p">],</span> <span class="n">bar_width</span><span class="p">,</span> <span class="n">align</span><span class="o">=</span><span class="s1">'center'</span><span class="p">,</span> <span class="n">alpha</span><span class="o">=</span><span class="mf">0.5</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s1">'silver'</span><span class="p">)</span>
|
||||
<span class="n">ax</span><span class="o">.</span><span class="n">set_ylabel</span><span class="p">(</span><span class="s1">'shift</span><span class="se">\n</span><span class="s1">distribution'</span><span class="p">,</span> <span class="n">rotation</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">ha</span><span class="o">=</span><span class="s1">'right'</span><span class="p">,</span> <span class="n">va</span><span class="o">=</span><span class="s1">'center'</span><span class="p">)</span>
|
||||
<span class="n">ax</span><span class="o">.</span><span class="n">set_xlim</span><span class="p">(</span><span class="n">vlines</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">vlines</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">])</span>
|
||||
<span class="n">ax</span><span class="o">.</span><span class="n">get_xaxis</span><span class="p">()</span><span class="o">.</span><span class="n">set_visible</span><span class="p">(</span><span class="kc">False</span><span class="p">)</span>
|
||||
<span class="n">plt</span><span class="o">.</span><span class="n">subplots_adjust</span><span class="p">(</span><span class="n">wspace</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">hspace</span><span class="o">=</span><span class="mf">0.1</span><span class="p">)</span>
|
||||
<span class="k">else</span><span class="p">:</span>
|
||||
<span class="c1"># show the percentiles of the distribution</span>
|
||||
<span class="n">cumsum</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">cumsum</span><span class="p">(</span><span class="n">histogram</span><span class="p">)</span>
|
||||
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">bins</span><span class="p">[:</span><span class="o">-</span><span class="mi">1</span><span class="p">])):</span>
|
||||
<span class="n">start</span><span class="p">,</span> <span class="n">width</span> <span class="o">=</span> <span class="n">bins</span><span class="p">[</span><span class="n">i</span><span class="p">],</span> <span class="n">bins</span><span class="p">[</span><span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="p">]</span><span class="o">-</span><span class="n">bins</span><span class="p">[</span><span class="n">i</span><span class="p">]</span>
|
||||
<span class="n">ax</span><span class="o">.</span><span class="n">broken_barh</span><span class="p">([</span><span class="nb">tuple</span><span class="p">((</span><span class="n">start</span><span class="p">,</span> <span class="n">width</span><span class="p">))],</span> <span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">),</span> <span class="n">facecolors</span><span class="o">=</span><span class="s1">'whitesmoke'</span> <span class="k">if</span> <span class="n">i</span><span class="o">%</span><span class="mi">2</span><span class="o">==</span><span class="mi">0</span> <span class="k">else</span> <span class="s1">'silver'</span><span class="p">)</span>
|
||||
<span class="k">if</span> <span class="n">i</span> <span class="o"><</span> <span class="nb">len</span><span class="p">(</span><span class="n">bins</span><span class="p">)</span><span class="o">-</span><span class="mi">2</span><span class="p">:</span>
|
||||
<span class="n">ax</span><span class="o">.</span><span class="n">text</span><span class="p">(</span><span class="n">bins</span><span class="p">[</span><span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="p">],</span> <span class="mf">0.5</span><span class="p">,</span> <span class="s1">'$P_{'</span><span class="o">+</span><span class="sa">f</span><span class="s1">'</span><span class="si">{</span><span class="nb">int</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">round</span><span class="p">(</span><span class="n">cumsum</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="o">*</span><span class="mi">100</span><span class="p">))</span><span class="si">}</span><span class="s1">'</span><span class="o">+</span><span class="s1">'}$'</span><span class="p">,</span> <span class="n">ha</span><span class="o">=</span><span class="s1">'center'</span><span class="p">)</span>
|
||||
<span class="n">ax</span><span class="o">.</span><span class="n">set_ylim</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
|
||||
<span class="n">ax</span><span class="o">.</span><span class="n">set_xlim</span><span class="p">(</span><span class="n">vlines</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">vlines</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">])</span>
|
||||
<span class="n">ax</span><span class="o">.</span><span class="n">get_yaxis</span><span class="p">()</span><span class="o">.</span><span class="n">set_visible</span><span class="p">(</span><span class="kc">False</span><span class="p">)</span>
|
||||
<span class="n">ax</span><span class="o">.</span><span class="n">get_xaxis</span><span class="p">()</span><span class="o">.</span><span class="n">set_visible</span><span class="p">(</span><span class="kc">False</span><span class="p">)</span>
|
||||
<span class="n">plt</span><span class="o">.</span><span class="n">subplots_adjust</span><span class="p">(</span><span class="n">wspace</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">hspace</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
|
||||
|
||||
<span class="n">_save_or_show</span><span class="p">(</span><span class="n">savepath</span><span class="p">)</span></div>
|
||||
|
||||
|
||||
|
||||
<span class="k">def</span> <span class="nf">_merge</span><span class="p">(</span><span class="n">method_names</span><span class="p">,</span> <span class="n">true_prevs</span><span class="p">,</span> <span class="n">estim_prevs</span><span class="p">):</span>
|
||||
<span class="n">ndims</span> <span class="o">=</span> <span class="n">true_prevs</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span>
|
||||
<span class="n">data</span> <span class="o">=</span> <span class="n">defaultdict</span><span class="p">(</span><span class="k">lambda</span><span class="p">:</span> <span class="p">{</span><span class="s1">'true'</span><span class="p">:</span> <span class="n">np</span><span class="o">.</span><span class="n">empty</span><span class="p">(</span><span class="n">shape</span><span class="o">=</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">ndims</span><span class="p">)),</span> <span class="s1">'estim'</span><span class="p">:</span> <span class="n">np</span><span class="o">.</span><span class="n">empty</span><span class="p">(</span><span class="n">shape</span><span class="o">=</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">ndims</span><span class="p">))})</span>
|
||||
<span class="n">method_order</span><span class="o">=</span><span class="p">[]</span>
|
||||
<span class="k">for</span> <span class="n">method</span><span class="p">,</span> <span class="n">true_prev</span><span class="p">,</span> <span class="n">estim_prev</span> <span class="ow">in</span> <span class="nb">zip</span><span class="p">(</span><span class="n">method_names</span><span class="p">,</span> <span class="n">true_prevs</span><span class="p">,</span> <span class="n">estim_prevs</span><span class="p">):</span>
|
||||
<span class="n">data</span><span class="p">[</span><span class="n">method</span><span class="p">][</span><span class="s1">'true'</span><span class="p">]</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">concatenate</span><span class="p">([</span><span class="n">data</span><span class="p">[</span><span class="n">method</span><span class="p">][</span><span class="s1">'true'</span><span class="p">],</span> <span class="n">true_prev</span><span class="p">])</span>
|
||||
<span class="n">data</span><span class="p">[</span><span class="n">method</span><span class="p">][</span><span class="s1">'estim'</span><span class="p">]</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">concatenate</span><span class="p">([</span><span class="n">data</span><span class="p">[</span><span class="n">method</span><span class="p">][</span><span class="s1">'estim'</span><span class="p">],</span> <span class="n">estim_prev</span><span class="p">])</span>
|
||||
<span class="k">if</span> <span class="n">method</span> <span class="ow">not</span> <span class="ow">in</span> <span class="n">method_order</span><span class="p">:</span>
|
||||
<span class="n">method_order</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">method</span><span class="p">)</span>
|
||||
<span class="n">true_prevs_</span> <span class="o">=</span> <span class="p">[</span><span class="n">data</span><span class="p">[</span><span class="n">m</span><span class="p">][</span><span class="s1">'true'</span><span class="p">]</span> <span class="k">for</span> <span class="n">m</span> <span class="ow">in</span> <span class="n">method_order</span><span class="p">]</span>
|
||||
<span class="n">estim_prevs_</span> <span class="o">=</span> <span class="p">[</span><span class="n">data</span><span class="p">[</span><span class="n">m</span><span class="p">][</span><span class="s1">'estim'</span><span class="p">]</span> <span class="k">for</span> <span class="n">m</span> <span class="ow">in</span> <span class="n">method_order</span><span class="p">]</span>
|
||||
<span class="k">return</span> <span class="n">method_order</span><span class="p">,</span> <span class="n">true_prevs_</span><span class="p">,</span> <span class="n">estim_prevs_</span>
|
||||
|
||||
|
||||
<span class="k">def</span> <span class="nf">_set_colors</span><span class="p">(</span><span class="n">ax</span><span class="p">,</span> <span class="n">n_methods</span><span class="p">):</span>
|
||||
<span class="n">NUM_COLORS</span> <span class="o">=</span> <span class="n">n_methods</span>
|
||||
<span class="n">cm</span> <span class="o">=</span> <span class="n">plt</span><span class="o">.</span><span class="n">get_cmap</span><span class="p">(</span><span class="s1">'tab20'</span><span class="p">)</span>
|
||||
<span class="n">ax</span><span class="o">.</span><span class="n">set_prop_cycle</span><span class="p">(</span><span class="n">color</span><span class="o">=</span><span class="p">[</span><span class="n">cm</span><span class="p">(</span><span class="mf">1.</span> <span class="o">*</span> <span class="n">i</span> <span class="o">/</span> <span class="n">NUM_COLORS</span><span class="p">)</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">NUM_COLORS</span><span class="p">)])</span>
|
||||
|
||||
|
||||
<span class="k">def</span> <span class="nf">_save_or_show</span><span class="p">(</span><span class="n">savepath</span><span class="p">):</span>
|
||||
<span class="c1"># if savepath is specified, then saves the plot in that path; otherwise the plot is shown</span>
|
||||
<span class="k">if</span> <span class="n">savepath</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span><span class="p">:</span>
|
||||
<span class="n">qp</span><span class="o">.</span><span class="n">util</span><span class="o">.</span><span class="n">create_parent_dir</span><span class="p">(</span><span class="n">savepath</span><span class="p">)</span>
|
||||
<span class="c1"># plt.tight_layout()</span>
|
||||
<span class="n">plt</span><span class="o">.</span><span class="n">savefig</span><span class="p">(</span><span class="n">savepath</span><span class="p">,</span> <span class="n">bbox_inches</span><span class="o">=</span><span class="s1">'tight'</span><span class="p">)</span>
|
||||
<span class="k">else</span><span class="p">:</span>
|
||||
<span class="n">plt</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
|
||||
|
||||
|
||||
<span class="k">def</span> <span class="nf">_join_data_by_drift</span><span class="p">(</span><span class="n">method_names</span><span class="p">,</span> <span class="n">true_prevs</span><span class="p">,</span> <span class="n">estim_prevs</span><span class="p">,</span> <span class="n">tr_prevs</span><span class="p">,</span> <span class="n">x_error</span><span class="p">,</span> <span class="n">y_error</span><span class="p">,</span> <span class="n">method_order</span><span class="p">):</span>
|
||||
<span class="n">data</span> <span class="o">=</span> <span class="n">defaultdict</span><span class="p">(</span><span class="k">lambda</span><span class="p">:</span> <span class="p">{</span><span class="s1">'x'</span><span class="p">:</span> <span class="n">np</span><span class="o">.</span><span class="n">empty</span><span class="p">(</span><span class="n">shape</span><span class="o">=</span><span class="p">(</span><span class="mi">0</span><span class="p">)),</span> <span class="s1">'y'</span><span class="p">:</span> <span class="n">np</span><span class="o">.</span><span class="n">empty</span><span class="p">(</span><span class="n">shape</span><span class="o">=</span><span class="p">(</span><span class="mi">0</span><span class="p">))})</span>
|
||||
|
||||
<span class="k">if</span> <span class="n">method_order</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">:</span>
|
||||
<span class="n">method_order</span> <span class="o">=</span> <span class="p">[]</span>
|
||||
|
||||
<span class="k">for</span> <span class="n">method</span><span class="p">,</span> <span class="n">test_prevs_i</span><span class="p">,</span> <span class="n">estim_prevs_i</span><span class="p">,</span> <span class="n">tr_prev_i</span> <span class="ow">in</span> <span class="nb">zip</span><span class="p">(</span><span class="n">method_names</span><span class="p">,</span> <span class="n">true_prevs</span><span class="p">,</span> <span class="n">estim_prevs</span><span class="p">,</span> <span class="n">tr_prevs</span><span class="p">):</span>
|
||||
<span class="n">tr_prev_i</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">repeat</span><span class="p">(</span><span class="n">tr_prev_i</span><span class="o">.</span><span class="n">reshape</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">),</span> <span class="n">repeats</span><span class="o">=</span><span class="n">test_prevs_i</span><span class="o">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">axis</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
|
||||
|
||||
<span class="n">tr_test_drifts</span> <span class="o">=</span> <span class="n">x_error</span><span class="p">(</span><span class="n">test_prevs_i</span><span class="p">,</span> <span class="n">tr_prev_i</span><span class="p">)</span>
|
||||
<span class="n">data</span><span class="p">[</span><span class="n">method</span><span class="p">][</span><span class="s1">'x'</span><span class="p">]</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">concatenate</span><span class="p">([</span><span class="n">data</span><span class="p">[</span><span class="n">method</span><span class="p">][</span><span class="s1">'x'</span><span class="p">],</span> <span class="n">tr_test_drifts</span><span class="p">])</span>
|
||||
|
||||
<span class="n">method_drifts</span> <span class="o">=</span> <span class="n">y_error</span><span class="p">(</span><span class="n">test_prevs_i</span><span class="p">,</span> <span class="n">estim_prevs_i</span><span class="p">)</span>
|
||||
<span class="n">data</span><span class="p">[</span><span class="n">method</span><span class="p">][</span><span class="s1">'y'</span><span class="p">]</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">concatenate</span><span class="p">([</span><span class="n">data</span><span class="p">[</span><span class="n">method</span><span class="p">][</span><span class="s1">'y'</span><span class="p">],</span> <span class="n">method_drifts</span><span class="p">])</span>
|
||||
|
||||
<span class="k">if</span> <span class="n">method</span> <span class="ow">not</span> <span class="ow">in</span> <span class="n">method_order</span><span class="p">:</span>
|
||||
<span class="n">method_order</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">method</span><span class="p">)</span>
|
||||
|
||||
<span class="k">return</span> <span class="n">data</span>
|
||||
</pre></div>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
<footer>
|
||||
|
||||
<hr/>
|
||||
|
||||
<div role="contentinfo">
|
||||
<p>© Copyright 2024, Alejandro Moreo.</p>
|
||||
</div>
|
||||
|
||||
Built with <a href="https://www.sphinx-doc.org/">Sphinx</a> using a
|
||||
<a href="https://github.com/readthedocs/sphinx_rtd_theme">theme</a>
|
||||
provided by <a href="https://readthedocs.org">Read the Docs</a>.
|
||||
|
||||
|
||||
</footer>
|
||||
</div>
|
||||
</div>
|
||||
</section>
|
||||
</div>
|
||||
<script>
|
||||
jQuery(function () {
|
||||
SphinxRtdTheme.Navigation.enable(true);
|
||||
});
|
||||
</script>
|
||||
|
||||
</body>
|
||||
</html>
|
||||
|
|
@ -0,0 +1,606 @@
|
|||
<!DOCTYPE html>
|
||||
<html class="writer-html5" lang="en">
|
||||
<head>
|
||||
<meta charset="utf-8" />
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
|
||||
<title>quapy.protocol — QuaPy: A Python-based open-source framework for quantification 0.1.8 documentation</title>
|
||||
<link rel="stylesheet" type="text/css" href="../../_static/pygments.css" />
|
||||
<link rel="stylesheet" type="text/css" href="../../_static/css/theme.css" />
|
||||
|
||||
|
||||
<!--[if lt IE 9]>
|
||||
<script src="../../_static/js/html5shiv.min.js"></script>
|
||||
<![endif]-->
|
||||
|
||||
<script data-url_root="../../" id="documentation_options" src="../../_static/documentation_options.js"></script>
|
||||
<script src="../../_static/jquery.js"></script>
|
||||
<script src="../../_static/underscore.js"></script>
|
||||
<script src="../../_static/_sphinx_javascript_frameworks_compat.js"></script>
|
||||
<script src="../../_static/doctools.js"></script>
|
||||
<script src="../../_static/sphinx_highlight.js"></script>
|
||||
<script src="../../_static/js/theme.js"></script>
|
||||
<link rel="index" title="Index" href="../../genindex.html" />
|
||||
<link rel="search" title="Search" href="../../search.html" />
|
||||
</head>
|
||||
|
||||
<body class="wy-body-for-nav">
|
||||
<div class="wy-grid-for-nav">
|
||||
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
|
||||
<div class="wy-side-scroll">
|
||||
<div class="wy-side-nav-search" >
|
||||
|
||||
|
||||
|
||||
<a href="../../index.html" class="icon icon-home">
|
||||
QuaPy: A Python-based open-source framework for quantification
|
||||
</a>
|
||||
<div role="search">
|
||||
<form id="rtd-search-form" class="wy-form" action="../../search.html" method="get">
|
||||
<input type="text" name="q" placeholder="Search docs" aria-label="Search docs" />
|
||||
<input type="hidden" name="check_keywords" value="yes" />
|
||||
<input type="hidden" name="area" value="default" />
|
||||
</form>
|
||||
</div>
|
||||
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../../modules.html">quapy</a></li>
|
||||
</ul>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
</nav>
|
||||
|
||||
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" >
|
||||
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
|
||||
<a href="../../index.html">QuaPy: A Python-based open-source framework for quantification</a>
|
||||
</nav>
|
||||
|
||||
<div class="wy-nav-content">
|
||||
<div class="rst-content">
|
||||
<div role="navigation" aria-label="Page navigation">
|
||||
<ul class="wy-breadcrumbs">
|
||||
<li><a href="../../index.html" class="icon icon-home" aria-label="Home"></a></li>
|
||||
<li class="breadcrumb-item"><a href="../index.html">Module code</a></li>
|
||||
<li class="breadcrumb-item active">quapy.protocol</li>
|
||||
<li class="wy-breadcrumbs-aside">
|
||||
</li>
|
||||
</ul>
|
||||
<hr/>
|
||||
</div>
|
||||
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
|
||||
<div itemprop="articleBody">
|
||||
|
||||
<h1>Source code for quapy.protocol</h1><div class="highlight"><pre>
|
||||
<span></span><span class="kn">from</span> <span class="nn">copy</span> <span class="kn">import</span> <span class="n">deepcopy</span>
|
||||
<span class="kn">import</span> <span class="nn">quapy</span> <span class="k">as</span> <span class="nn">qp</span>
|
||||
<span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span>
|
||||
<span class="kn">import</span> <span class="nn">itertools</span>
|
||||
<span class="kn">from</span> <span class="nn">contextlib</span> <span class="kn">import</span> <span class="n">ExitStack</span>
|
||||
<span class="kn">from</span> <span class="nn">abc</span> <span class="kn">import</span> <span class="n">ABCMeta</span><span class="p">,</span> <span class="n">abstractmethod</span>
|
||||
<span class="kn">from</span> <span class="nn">quapy.data</span> <span class="kn">import</span> <span class="n">LabelledCollection</span>
|
||||
<span class="kn">import</span> <span class="nn">quapy.functional</span> <span class="k">as</span> <span class="nn">F</span>
|
||||
<span class="kn">from</span> <span class="nn">os.path</span> <span class="kn">import</span> <span class="n">exists</span>
|
||||
<span class="kn">from</span> <span class="nn">glob</span> <span class="kn">import</span> <span class="n">glob</span>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="AbstractProtocol"><a class="viewcode-back" href="../../quapy.html#quapy.protocol.AbstractProtocol">[docs]</a><span class="k">class</span> <span class="nc">AbstractProtocol</span><span class="p">(</span><span class="n">metaclass</span><span class="o">=</span><span class="n">ABCMeta</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Abstract parent class for sample generation protocols.</span>
|
||||
<span class="sd"> """</span>
|
||||
|
||||
<span class="nd">@abstractmethod</span>
|
||||
<span class="k">def</span> <span class="fm">__call__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Implements the protocol. Yields one sample at a time along with its prevalence</span>
|
||||
|
||||
<span class="sd"> :return: yields a tuple `(sample, prev) at a time, where `sample` is a set of instances</span>
|
||||
<span class="sd"> and in which `prev` is an `nd.array` with the class prevalence values</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="o">...</span>
|
||||
|
||||
<div class="viewcode-block" id="AbstractProtocol.total"><a class="viewcode-back" href="../../quapy.html#quapy.protocol.AbstractProtocol.total">[docs]</a> <span class="k">def</span> <span class="nf">total</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Indicates the total number of samples that the protocol generates.</span>
|
||||
|
||||
<span class="sd"> :return: The number of samples to generate if known, or `None` otherwise.</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">return</span> <span class="kc">None</span></div></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="IterateProtocol"><a class="viewcode-back" href="../../quapy.html#quapy.protocol.IterateProtocol">[docs]</a><span class="k">class</span> <span class="nc">IterateProtocol</span><span class="p">(</span><span class="n">AbstractProtocol</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> A very simple protocol which simply iterates over a list of previously generated samples</span>
|
||||
|
||||
<span class="sd"> :param samples: a list of :class:`quapy.data.base.LabelledCollection`</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">samples</span><span class="p">:</span> <span class="p">[</span><span class="n">LabelledCollection</span><span class="p">]):</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">samples</span> <span class="o">=</span> <span class="n">samples</span>
|
||||
|
||||
<span class="k">def</span> <span class="fm">__call__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Yields one sample from the initial list at a time</span>
|
||||
|
||||
<span class="sd"> :return: yields a tuple `(sample, prev) at a time, where `sample` is a set of instances</span>
|
||||
<span class="sd"> and in which `prev` is an `nd.array` with the class prevalence values</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">for</span> <span class="n">sample</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">samples</span><span class="p">:</span>
|
||||
<span class="k">yield</span> <span class="n">sample</span><span class="o">.</span><span class="n">Xp</span>
|
||||
|
||||
<div class="viewcode-block" id="IterateProtocol.total"><a class="viewcode-back" href="../../quapy.html#quapy.protocol.IterateProtocol.total">[docs]</a> <span class="k">def</span> <span class="nf">total</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Returns the number of samples in this protocol</span>
|
||||
|
||||
<span class="sd"> :return: int</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">return</span> <span class="nb">len</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">samples</span><span class="p">)</span></div></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="AbstractStochasticSeededProtocol"><a class="viewcode-back" href="../../quapy.html#quapy.protocol.AbstractStochasticSeededProtocol">[docs]</a><span class="k">class</span> <span class="nc">AbstractStochasticSeededProtocol</span><span class="p">(</span><span class="n">AbstractProtocol</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> An `AbstractStochasticSeededProtocol` is a protocol that generates, via any random procedure (e.g.,</span>
|
||||
<span class="sd"> via random sampling), sequences of :class:`quapy.data.base.LabelledCollection` samples.</span>
|
||||
<span class="sd"> The protocol abstraction enforces</span>
|
||||
<span class="sd"> the object to be instantiated using a seed, so that the sequence can be fully replicated.</span>
|
||||
<span class="sd"> In order to make this functionality possible, the classes extending this abstraction need to</span>
|
||||
<span class="sd"> implement only two functions, :meth:`samples_parameters` which generates all the parameters</span>
|
||||
<span class="sd"> needed for extracting the samples, and :meth:`sample` that, given some parameters as input,</span>
|
||||
<span class="sd"> deterministically generates a sample.</span>
|
||||
|
||||
<span class="sd"> :param random_state: the seed for allowing to replicate any sequence of samples. Default is 0, meaning that</span>
|
||||
<span class="sd"> the sequence will be consistent every time the protocol is called.</span>
|
||||
<span class="sd"> """</span>
|
||||
|
||||
<span class="n">_random_state</span> <span class="o">=</span> <span class="o">-</span><span class="mi">1</span> <span class="c1"># means "not set"</span>
|
||||
|
||||
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">random_state</span><span class="o">=</span><span class="mi">0</span><span class="p">):</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">random_state</span> <span class="o">=</span> <span class="n">random_state</span>
|
||||
|
||||
<span class="nd">@property</span>
|
||||
<span class="k">def</span> <span class="nf">random_state</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">_random_state</span>
|
||||
|
||||
<span class="nd">@random_state</span><span class="o">.</span><span class="n">setter</span>
|
||||
<span class="k">def</span> <span class="nf">random_state</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">random_state</span><span class="p">):</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">_random_state</span> <span class="o">=</span> <span class="n">random_state</span>
|
||||
|
||||
<div class="viewcode-block" id="AbstractStochasticSeededProtocol.samples_parameters"><a class="viewcode-back" href="../../quapy.html#quapy.protocol.AbstractStochasticSeededProtocol.samples_parameters">[docs]</a> <span class="nd">@abstractmethod</span>
|
||||
<span class="k">def</span> <span class="nf">samples_parameters</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> This function has to return all the necessary parameters to replicate the samples</span>
|
||||
|
||||
<span class="sd"> :return: a list of parameters, each of which serves to deterministically generate a sample</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="o">...</span></div>
|
||||
|
||||
<div class="viewcode-block" id="AbstractStochasticSeededProtocol.sample"><a class="viewcode-back" href="../../quapy.html#quapy.protocol.AbstractStochasticSeededProtocol.sample">[docs]</a> <span class="nd">@abstractmethod</span>
|
||||
<span class="k">def</span> <span class="nf">sample</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">params</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Extract one sample determined by the given parameters</span>
|
||||
|
||||
<span class="sd"> :param params: all the necessary parameters to generate a sample</span>
|
||||
<span class="sd"> :return: one sample (the same sample has to be generated for the same parameters)</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="o">...</span></div>
|
||||
|
||||
<span class="k">def</span> <span class="fm">__call__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Yields one sample at a time. The type of object returned depends on the `collator` function. The</span>
|
||||
<span class="sd"> default behaviour returns tuples of the form `(sample, prevalence)`.</span>
|
||||
|
||||
<span class="sd"> :return: a tuple `(sample, prevalence)` if return_type='sample_prev', or an instance of</span>
|
||||
<span class="sd"> :class:`qp.data.LabelledCollection` if return_type='labelled_collection'</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">with</span> <span class="n">ExitStack</span><span class="p">()</span> <span class="k">as</span> <span class="n">stack</span><span class="p">:</span>
|
||||
<span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">random_state</span> <span class="o">==</span> <span class="o">-</span><span class="mi">1</span><span class="p">:</span>
|
||||
<span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span><span class="s1">'The random seed has never been initialized. '</span>
|
||||
<span class="s1">'Set it to None not to impose replicability.'</span><span class="p">)</span>
|
||||
<span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">random_state</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span><span class="p">:</span>
|
||||
<span class="n">stack</span><span class="o">.</span><span class="n">enter_context</span><span class="p">(</span><span class="n">qp</span><span class="o">.</span><span class="n">util</span><span class="o">.</span><span class="n">temp_seed</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">random_state</span><span class="p">))</span>
|
||||
<span class="k">for</span> <span class="n">params</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">samples_parameters</span><span class="p">():</span>
|
||||
<span class="k">yield</span> <span class="bp">self</span><span class="o">.</span><span class="n">collator</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">sample</span><span class="p">(</span><span class="n">params</span><span class="p">))</span>
|
||||
|
||||
<div class="viewcode-block" id="AbstractStochasticSeededProtocol.collator"><a class="viewcode-back" href="../../quapy.html#quapy.protocol.AbstractStochasticSeededProtocol.collator">[docs]</a> <span class="k">def</span> <span class="nf">collator</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">sample</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> The collator prepares the sample to accommodate the desired output format before returning the output.</span>
|
||||
<span class="sd"> This collator simply returns the sample as it is. Classes inheriting from this abstract class can</span>
|
||||
<span class="sd"> implement their custom collators.</span>
|
||||
|
||||
<span class="sd"> :param sample: the sample to be returned</span>
|
||||
<span class="sd"> :param args: additional arguments</span>
|
||||
<span class="sd"> :return: the sample adhering to a desired output format (in this case, the sample is returned as it is)</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">return</span> <span class="n">sample</span></div></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="OnLabelledCollectionProtocol"><a class="viewcode-back" href="../../quapy.html#quapy.protocol.OnLabelledCollectionProtocol">[docs]</a><span class="k">class</span> <span class="nc">OnLabelledCollectionProtocol</span><span class="p">:</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Protocols that generate samples from a :class:`qp.data.LabelledCollection` object.</span>
|
||||
<span class="sd"> """</span>
|
||||
|
||||
<span class="n">RETURN_TYPES</span> <span class="o">=</span> <span class="p">[</span><span class="s1">'sample_prev'</span><span class="p">,</span> <span class="s1">'labelled_collection'</span><span class="p">,</span> <span class="s1">'index'</span><span class="p">]</span>
|
||||
|
||||
<div class="viewcode-block" id="OnLabelledCollectionProtocol.get_labelled_collection"><a class="viewcode-back" href="../../quapy.html#quapy.protocol.OnLabelledCollectionProtocol.get_labelled_collection">[docs]</a> <span class="k">def</span> <span class="nf">get_labelled_collection</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Returns the labelled collection on which this protocol acts.</span>
|
||||
|
||||
<span class="sd"> :return: an object of type :class:`qp.data.LabelledCollection`</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">data</span></div>
|
||||
|
||||
<div class="viewcode-block" id="OnLabelledCollectionProtocol.on_preclassified_instances"><a class="viewcode-back" href="../../quapy.html#quapy.protocol.OnLabelledCollectionProtocol.on_preclassified_instances">[docs]</a> <span class="k">def</span> <span class="nf">on_preclassified_instances</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">pre_classifications</span><span class="p">,</span> <span class="n">in_place</span><span class="o">=</span><span class="kc">False</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Returns a copy of this protocol that acts on a modified version of the original</span>
|
||||
<span class="sd"> :class:`qp.data.LabelledCollection` in which the original instances have been replaced</span>
|
||||
<span class="sd"> with the outputs of a classifier for each instance. (This is convenient for speeding-up</span>
|
||||
<span class="sd"> the evaluation procedures for many samples, by pre-classifying the instances in advance.)</span>
|
||||
|
||||
<span class="sd"> :param pre_classifications: the predictions issued by a classifier, typically an array-like</span>
|
||||
<span class="sd"> with shape `(n_instances,)` when the classifier is a hard one, or with shape</span>
|
||||
<span class="sd"> `(n_instances, n_classes)` when the classifier is a probabilistic one.</span>
|
||||
<span class="sd"> :param in_place: whether or not to apply the modification in-place or in a new copy (default).</span>
|
||||
<span class="sd"> :return: a copy of this protocol</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">assert</span> <span class="nb">len</span><span class="p">(</span><span class="n">pre_classifications</span><span class="p">)</span> <span class="o">==</span> <span class="nb">len</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">data</span><span class="p">),</span> \
|
||||
<span class="sa">f</span><span class="s1">'error: the pre-classified data has different shape '</span> \
|
||||
<span class="sa">f</span><span class="s1">'(expected </span><span class="si">{</span><span class="nb">len</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">data</span><span class="p">)</span><span class="si">}</span><span class="s1">, found </span><span class="si">{</span><span class="nb">len</span><span class="p">(</span><span class="n">pre_classifications</span><span class="p">)</span><span class="si">}</span><span class="s1">)'</span>
|
||||
<span class="k">if</span> <span class="n">in_place</span><span class="p">:</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">data</span><span class="o">.</span><span class="n">instances</span> <span class="o">=</span> <span class="n">pre_classifications</span>
|
||||
<span class="k">return</span> <span class="bp">self</span>
|
||||
<span class="k">else</span><span class="p">:</span>
|
||||
<span class="n">new</span> <span class="o">=</span> <span class="n">deepcopy</span><span class="p">(</span><span class="bp">self</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="n">new</span><span class="o">.</span><span class="n">on_preclassified_instances</span><span class="p">(</span><span class="n">pre_classifications</span><span class="p">,</span> <span class="n">in_place</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span></div>
|
||||
|
||||
<div class="viewcode-block" id="OnLabelledCollectionProtocol.get_collator"><a class="viewcode-back" href="../../quapy.html#quapy.protocol.OnLabelledCollectionProtocol.get_collator">[docs]</a> <span class="nd">@classmethod</span>
|
||||
<span class="k">def</span> <span class="nf">get_collator</span><span class="p">(</span><span class="bp">cls</span><span class="p">,</span> <span class="n">return_type</span><span class="o">=</span><span class="s1">'sample_prev'</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Returns a collator function, i.e., a function that prepares the yielded data</span>
|
||||
|
||||
<span class="sd"> :param return_type: either 'sample_prev' (default) if the collator is requested to yield tuples of</span>
|
||||
<span class="sd"> `(sample, prevalence)`, or 'labelled_collection' when it is requested to yield instances of</span>
|
||||
<span class="sd"> :class:`qp.data.LabelledCollection`</span>
|
||||
<span class="sd"> :return: the collator function (a callable function that takes as input an instance of</span>
|
||||
<span class="sd"> :class:`qp.data.LabelledCollection`)</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">assert</span> <span class="n">return_type</span> <span class="ow">in</span> <span class="bp">cls</span><span class="o">.</span><span class="n">RETURN_TYPES</span><span class="p">,</span> \
|
||||
<span class="sa">f</span><span class="s1">'unknown return type passed as argument; valid ones are </span><span class="si">{</span><span class="bp">cls</span><span class="o">.</span><span class="n">RETURN_TYPES</span><span class="si">}</span><span class="s1">'</span>
|
||||
<span class="k">if</span> <span class="n">return_type</span><span class="o">==</span><span class="s1">'sample_prev'</span><span class="p">:</span>
|
||||
<span class="k">return</span> <span class="k">lambda</span> <span class="n">lc</span><span class="p">:</span><span class="n">lc</span><span class="o">.</span><span class="n">Xp</span>
|
||||
<span class="k">elif</span> <span class="n">return_type</span><span class="o">==</span><span class="s1">'labelled_collection'</span><span class="p">:</span>
|
||||
<span class="k">return</span> <span class="k">lambda</span> <span class="n">lc</span><span class="p">:</span><span class="n">lc</span></div></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="APP"><a class="viewcode-back" href="../../quapy.html#quapy.protocol.APP">[docs]</a><span class="k">class</span> <span class="nc">APP</span><span class="p">(</span><span class="n">AbstractStochasticSeededProtocol</span><span class="p">,</span> <span class="n">OnLabelledCollectionProtocol</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Implementation of the artificial prevalence protocol (APP).</span>
|
||||
<span class="sd"> The APP consists of exploring a grid of prevalence values containing `n_prevalences` points (e.g.,</span>
|
||||
<span class="sd"> [0, 0.05, 0.1, 0.15, ..., 1], if `n_prevalences=21`), and generating all valid combinations of</span>
|
||||
<span class="sd"> prevalence values for all classes (e.g., for 3 classes, samples with [0, 0, 1], [0, 0.05, 0.95], ...,</span>
|
||||
<span class="sd"> [1, 0, 0] prevalence values of size `sample_size` will be yielded). The number of samples for each valid</span>
|
||||
<span class="sd"> combination of prevalence values is indicated by `repeats`.</span>
|
||||
|
||||
<span class="sd"> :param data: a `LabelledCollection` from which the samples will be drawn</span>
|
||||
<span class="sd"> :param sample_size: integer, number of instances in each sample; if None (default) then it is taken from</span>
|
||||
<span class="sd"> qp.environ["SAMPLE_SIZE"]. If this is not set, a ValueError exception is raised.</span>
|
||||
<span class="sd"> :param n_prevalences: the number of equidistant prevalence points to extract from the [0,1] interval for the</span>
|
||||
<span class="sd"> grid (default is 21)</span>
|
||||
<span class="sd"> :param repeats: number of copies for each valid prevalence vector (default is 10)</span>
|
||||
<span class="sd"> :param smooth_limits_epsilon: the quantity to add and subtract to the limits 0 and 1</span>
|
||||
<span class="sd"> :param random_state: allows replicating samples across runs (default 0, meaning that the sequence of samples</span>
|
||||
<span class="sd"> will be the same every time the protocol is called)</span>
|
||||
<span class="sd"> :param sanity_check: int, raises an exception warning the user that the number of examples to be generated exceed</span>
|
||||
<span class="sd"> this number; set to None for skipping this check</span>
|
||||
<span class="sd"> :param return_type: set to "sample_prev" (default) to get the pairs of (sample, prevalence) at each iteration, or</span>
|
||||
<span class="sd"> to "labelled_collection" to get instead instances of LabelledCollection</span>
|
||||
<span class="sd"> """</span>
|
||||
|
||||
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">data</span><span class="p">:</span> <span class="n">LabelledCollection</span><span class="p">,</span> <span class="n">sample_size</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">n_prevalences</span><span class="o">=</span><span class="mi">21</span><span class="p">,</span> <span class="n">repeats</span><span class="o">=</span><span class="mi">10</span><span class="p">,</span>
|
||||
<span class="n">smooth_limits_epsilon</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">random_state</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">sanity_check</span><span class="o">=</span><span class="mi">10000</span><span class="p">,</span> <span class="n">return_type</span><span class="o">=</span><span class="s1">'sample_prev'</span><span class="p">):</span>
|
||||
<span class="nb">super</span><span class="p">(</span><span class="n">APP</span><span class="p">,</span> <span class="bp">self</span><span class="p">)</span><span class="o">.</span><span class="fm">__init__</span><span class="p">(</span><span class="n">random_state</span><span class="p">)</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">data</span> <span class="o">=</span> <span class="n">data</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">sample_size</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">_get_sample_size</span><span class="p">(</span><span class="n">sample_size</span><span class="p">)</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">n_prevalences</span> <span class="o">=</span> <span class="n">n_prevalences</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">repeats</span> <span class="o">=</span> <span class="n">repeats</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">smooth_limits_epsilon</span> <span class="o">=</span> <span class="n">smooth_limits_epsilon</span>
|
||||
<span class="k">if</span> <span class="ow">not</span> <span class="p">((</span><span class="nb">isinstance</span><span class="p">(</span><span class="n">sanity_check</span><span class="p">,</span> <span class="nb">int</span><span class="p">)</span> <span class="ow">and</span> <span class="n">sanity_check</span><span class="o">></span><span class="mi">0</span><span class="p">)</span> <span class="ow">or</span> <span class="n">sanity_check</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">):</span>
|
||||
<span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span><span class="s1">'param "sanity_check" must either be None or a positive integer'</span><span class="p">)</span>
|
||||
<span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">sanity_check</span><span class="p">,</span> <span class="nb">int</span><span class="p">):</span>
|
||||
<span class="n">n</span> <span class="o">=</span> <span class="n">F</span><span class="o">.</span><span class="n">num_prevalence_combinations</span><span class="p">(</span><span class="n">n_prevpoints</span><span class="o">=</span><span class="n">n_prevalences</span><span class="p">,</span> <span class="n">n_classes</span><span class="o">=</span><span class="n">data</span><span class="o">.</span><span class="n">n_classes</span><span class="p">,</span> <span class="n">n_repeats</span><span class="o">=</span><span class="n">repeats</span><span class="p">)</span>
|
||||
<span class="k">if</span> <span class="n">n</span> <span class="o">></span> <span class="n">sanity_check</span><span class="p">:</span>
|
||||
<span class="k">raise</span> <span class="ne">RuntimeError</span><span class="p">(</span>
|
||||
<span class="sa">f</span><span class="s2">"Abort: the number of samples that will be generated by </span><span class="si">{</span><span class="bp">self</span><span class="o">.</span><span class="vm">__class__</span><span class="o">.</span><span class="vm">__name__</span><span class="si">}</span><span class="s2"> (</span><span class="si">{</span><span class="n">n</span><span class="si">}</span><span class="s2">) "</span>
|
||||
<span class="sa">f</span><span class="s2">"exceeds the maximum number of allowed samples (</span><span class="si">{</span><span class="n">sanity_check</span><span class="w"> </span><span class="si">= }</span><span class="s2">). Set 'sanity_check' to "</span>
|
||||
<span class="sa">f</span><span class="s2">"None, or to a higher number, for bypassing this check."</span><span class="p">)</span>
|
||||
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">collator</span> <span class="o">=</span> <span class="n">OnLabelledCollectionProtocol</span><span class="o">.</span><span class="n">get_collator</span><span class="p">(</span><span class="n">return_type</span><span class="p">)</span>
|
||||
|
||||
<div class="viewcode-block" id="APP.prevalence_grid"><a class="viewcode-back" href="../../quapy.html#quapy.protocol.APP.prevalence_grid">[docs]</a> <span class="k">def</span> <span class="nf">prevalence_grid</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Generates vectors of prevalence values from an exhaustive grid of prevalence values. The</span>
|
||||
<span class="sd"> number of prevalence values explored for each dimension depends on `n_prevalences`, so that, if, for example,</span>
|
||||
<span class="sd"> `n_prevalences=11` then the prevalence values of the grid are taken from [0, 0.1, 0.2, ..., 0.9, 1]. Only</span>
|
||||
<span class="sd"> valid prevalence distributions are returned, i.e., vectors of prevalence values that sum up to 1. For each</span>
|
||||
<span class="sd"> valid vector of prevalence values, `repeat` copies are returned. The vector of prevalence values can be</span>
|
||||
<span class="sd"> implicit (by setting `return_constrained_dim=False`), meaning that the last dimension (which is constrained</span>
|
||||
<span class="sd"> to 1 - sum of the rest) is not returned (note that, quite obviously, in this case the vector does not sum up to</span>
|
||||
<span class="sd"> 1). Note that this method is deterministic, i.e., there is no random sampling anywhere.</span>
|
||||
|
||||
<span class="sd"> :return: a `np.ndarray` of shape `(n, dimensions)` if `return_constrained_dim=True` or of shape</span>
|
||||
<span class="sd"> `(n, dimensions-1)` if `return_constrained_dim=False`, where `n` is the number of valid combinations found</span>
|
||||
<span class="sd"> in the grid multiplied by `repeat`</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="n">dimensions</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">data</span><span class="o">.</span><span class="n">n_classes</span>
|
||||
<span class="n">s</span> <span class="o">=</span> <span class="n">F</span><span class="o">.</span><span class="n">prevalence_linspace</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">n_prevalences</span><span class="p">,</span> <span class="n">repeats</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">smooth_limits_epsilon</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">smooth_limits_epsilon</span><span class="p">)</span>
|
||||
<span class="n">eps</span> <span class="o">=</span> <span class="p">(</span><span class="n">s</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span><span class="o">-</span><span class="n">s</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span><span class="o">/</span><span class="mi">2</span> <span class="c1"># handling floating rounding</span>
|
||||
<span class="n">s</span> <span class="o">=</span> <span class="p">[</span><span class="n">s</span><span class="p">]</span> <span class="o">*</span> <span class="p">(</span><span class="n">dimensions</span> <span class="o">-</span> <span class="mi">1</span><span class="p">)</span>
|
||||
<span class="n">prevs</span> <span class="o">=</span> <span class="p">[</span><span class="n">p</span> <span class="k">for</span> <span class="n">p</span> <span class="ow">in</span> <span class="n">itertools</span><span class="o">.</span><span class="n">product</span><span class="p">(</span><span class="o">*</span><span class="n">s</span><span class="p">,</span> <span class="n">repeat</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span> <span class="k">if</span> <span class="p">(</span><span class="nb">sum</span><span class="p">(</span><span class="n">p</span><span class="p">)</span> <span class="o"><</span> <span class="p">(</span><span class="mf">1.</span><span class="o">+</span><span class="n">eps</span><span class="p">))]</span>
|
||||
<span class="n">prevs</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">asarray</span><span class="p">(</span><span class="n">prevs</span><span class="p">)</span><span class="o">.</span><span class="n">reshape</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">prevs</span><span class="p">),</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span>
|
||||
<span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">repeats</span> <span class="o">></span> <span class="mi">1</span><span class="p">:</span>
|
||||
<span class="n">prevs</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">repeat</span><span class="p">(</span><span class="n">prevs</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">repeats</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="n">prevs</span></div>
|
||||
|
||||
<div class="viewcode-block" id="APP.samples_parameters"><a class="viewcode-back" href="../../quapy.html#quapy.protocol.APP.samples_parameters">[docs]</a> <span class="k">def</span> <span class="nf">samples_parameters</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Return all the necessary parameters to replicate the samples as according to the APP protocol.</span>
|
||||
|
||||
<span class="sd"> :return: a list of indexes that realize the APP sampling</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="n">indexes</span> <span class="o">=</span> <span class="p">[]</span>
|
||||
<span class="k">for</span> <span class="n">prevs</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">prevalence_grid</span><span class="p">():</span>
|
||||
<span class="n">index</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">data</span><span class="o">.</span><span class="n">sampling_index</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">sample_size</span><span class="p">,</span> <span class="o">*</span><span class="n">prevs</span><span class="p">)</span>
|
||||
<span class="n">indexes</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">index</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="n">indexes</span></div>
|
||||
|
||||
<div class="viewcode-block" id="APP.sample"><a class="viewcode-back" href="../../quapy.html#quapy.protocol.APP.sample">[docs]</a> <span class="k">def</span> <span class="nf">sample</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">index</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Realizes the sample given the index of the instances.</span>
|
||||
|
||||
<span class="sd"> :param index: indexes of the instances to select</span>
|
||||
<span class="sd"> :return: an instance of :class:`qp.data.LabelledCollection`</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">data</span><span class="o">.</span><span class="n">sampling_from_index</span><span class="p">(</span><span class="n">index</span><span class="p">)</span></div>
|
||||
|
||||
<div class="viewcode-block" id="APP.total"><a class="viewcode-back" href="../../quapy.html#quapy.protocol.APP.total">[docs]</a> <span class="k">def</span> <span class="nf">total</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Returns the number of samples that will be generated</span>
|
||||
|
||||
<span class="sd"> :return: int</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">return</span> <span class="n">F</span><span class="o">.</span><span class="n">num_prevalence_combinations</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">n_prevalences</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">data</span><span class="o">.</span><span class="n">n_classes</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">repeats</span><span class="p">)</span></div></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="NPP"><a class="viewcode-back" href="../../quapy.html#quapy.protocol.NPP">[docs]</a><span class="k">class</span> <span class="nc">NPP</span><span class="p">(</span><span class="n">AbstractStochasticSeededProtocol</span><span class="p">,</span> <span class="n">OnLabelledCollectionProtocol</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> A generator of samples that implements the natural prevalence protocol (NPP). The NPP consists of drawing</span>
|
||||
<span class="sd"> samples uniformly at random, therefore approximately preserving the natural prevalence of the collection.</span>
|
||||
|
||||
<span class="sd"> :param data: a `LabelledCollection` from which the samples will be drawn</span>
|
||||
<span class="sd"> :param sample_size: integer, the number of instances in each sample; if None (default) then it is taken from</span>
|
||||
<span class="sd"> qp.environ["SAMPLE_SIZE"]. If this is not set, a ValueError exception is raised.</span>
|
||||
<span class="sd"> :param repeats: the number of samples to generate. Default is 100.</span>
|
||||
<span class="sd"> :param random_state: allows replicating samples across runs (default 0, meaning that the sequence of samples</span>
|
||||
<span class="sd"> will be the same every time the protocol is called)</span>
|
||||
<span class="sd"> :param return_type: set to "sample_prev" (default) to get the pairs of (sample, prevalence) at each iteration, or</span>
|
||||
<span class="sd"> to "labelled_collection" to get instead instances of LabelledCollection</span>
|
||||
<span class="sd"> """</span>
|
||||
|
||||
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">data</span><span class="p">:</span><span class="n">LabelledCollection</span><span class="p">,</span> <span class="n">sample_size</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">repeats</span><span class="o">=</span><span class="mi">100</span><span class="p">,</span> <span class="n">random_state</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span>
|
||||
<span class="n">return_type</span><span class="o">=</span><span class="s1">'sample_prev'</span><span class="p">):</span>
|
||||
<span class="nb">super</span><span class="p">(</span><span class="n">NPP</span><span class="p">,</span> <span class="bp">self</span><span class="p">)</span><span class="o">.</span><span class="fm">__init__</span><span class="p">(</span><span class="n">random_state</span><span class="p">)</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">data</span> <span class="o">=</span> <span class="n">data</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">sample_size</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">_get_sample_size</span><span class="p">(</span><span class="n">sample_size</span><span class="p">)</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">repeats</span> <span class="o">=</span> <span class="n">repeats</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">random_state</span> <span class="o">=</span> <span class="n">random_state</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">collator</span> <span class="o">=</span> <span class="n">OnLabelledCollectionProtocol</span><span class="o">.</span><span class="n">get_collator</span><span class="p">(</span><span class="n">return_type</span><span class="p">)</span>
|
||||
|
||||
<div class="viewcode-block" id="NPP.samples_parameters"><a class="viewcode-back" href="../../quapy.html#quapy.protocol.NPP.samples_parameters">[docs]</a> <span class="k">def</span> <span class="nf">samples_parameters</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Return all the necessary parameters to replicate the samples as according to the NPP protocol.</span>
|
||||
|
||||
<span class="sd"> :return: a list of indexes that realize the NPP sampling</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="n">indexes</span> <span class="o">=</span> <span class="p">[]</span>
|
||||
<span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">repeats</span><span class="p">):</span>
|
||||
<span class="n">index</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">data</span><span class="o">.</span><span class="n">uniform_sampling_index</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">sample_size</span><span class="p">)</span>
|
||||
<span class="n">indexes</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">index</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="n">indexes</span></div>
|
||||
|
||||
<div class="viewcode-block" id="NPP.sample"><a class="viewcode-back" href="../../quapy.html#quapy.protocol.NPP.sample">[docs]</a> <span class="k">def</span> <span class="nf">sample</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">index</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Realizes the sample given the index of the instances.</span>
|
||||
|
||||
<span class="sd"> :param index: indexes of the instances to select</span>
|
||||
<span class="sd"> :return: an instance of :class:`qp.data.LabelledCollection`</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">data</span><span class="o">.</span><span class="n">sampling_from_index</span><span class="p">(</span><span class="n">index</span><span class="p">)</span></div>
|
||||
|
||||
<div class="viewcode-block" id="NPP.total"><a class="viewcode-back" href="../../quapy.html#quapy.protocol.NPP.total">[docs]</a> <span class="k">def</span> <span class="nf">total</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Returns the number of samples that will be generated (equals to "repeats")</span>
|
||||
|
||||
<span class="sd"> :return: int</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">repeats</span></div></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="UPP"><a class="viewcode-back" href="../../quapy.html#quapy.protocol.UPP">[docs]</a><span class="k">class</span> <span class="nc">UPP</span><span class="p">(</span><span class="n">AbstractStochasticSeededProtocol</span><span class="p">,</span> <span class="n">OnLabelledCollectionProtocol</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> A variant of :class:`APP` that, instead of using a grid of equidistant prevalence values,</span>
|
||||
<span class="sd"> relies on the Kraemer algorithm for sampling unit (k-1)-simplex uniformly at random, with</span>
|
||||
<span class="sd"> k the number of classes. This protocol covers the entire range of prevalence values in a</span>
|
||||
<span class="sd"> statistical sense, i.e., unlike APP there is no guarantee that it is covered precisely</span>
|
||||
<span class="sd"> equally for all classes, but it is preferred in cases in which the number of possible</span>
|
||||
<span class="sd"> combinations of the grid values of APP makes this endeavour intractable.</span>
|
||||
|
||||
<span class="sd"> :param data: a `LabelledCollection` from which the samples will be drawn</span>
|
||||
<span class="sd"> :param sample_size: integer, the number of instances in each sample; if None (default) then it is taken from</span>
|
||||
<span class="sd"> qp.environ["SAMPLE_SIZE"]. If this is not set, a ValueError exception is raised.</span>
|
||||
<span class="sd"> :param repeats: the number of samples to generate. Default is 100.</span>
|
||||
<span class="sd"> :param random_state: allows replicating samples across runs (default 0, meaning that the sequence of samples</span>
|
||||
<span class="sd"> will be the same every time the protocol is called)</span>
|
||||
<span class="sd"> :param return_type: set to "sample_prev" (default) to get the pairs of (sample, prevalence) at each iteration, or</span>
|
||||
<span class="sd"> to "labelled_collection" to get instead instances of LabelledCollection</span>
|
||||
<span class="sd"> """</span>
|
||||
|
||||
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">data</span><span class="p">:</span> <span class="n">LabelledCollection</span><span class="p">,</span> <span class="n">sample_size</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">repeats</span><span class="o">=</span><span class="mi">100</span><span class="p">,</span> <span class="n">random_state</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span>
|
||||
<span class="n">return_type</span><span class="o">=</span><span class="s1">'sample_prev'</span><span class="p">):</span>
|
||||
<span class="nb">super</span><span class="p">(</span><span class="n">UPP</span><span class="p">,</span> <span class="bp">self</span><span class="p">)</span><span class="o">.</span><span class="fm">__init__</span><span class="p">(</span><span class="n">random_state</span><span class="p">)</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">data</span> <span class="o">=</span> <span class="n">data</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">sample_size</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">_get_sample_size</span><span class="p">(</span><span class="n">sample_size</span><span class="p">)</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">repeats</span> <span class="o">=</span> <span class="n">repeats</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">random_state</span> <span class="o">=</span> <span class="n">random_state</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">collator</span> <span class="o">=</span> <span class="n">OnLabelledCollectionProtocol</span><span class="o">.</span><span class="n">get_collator</span><span class="p">(</span><span class="n">return_type</span><span class="p">)</span>
|
||||
|
||||
<div class="viewcode-block" id="UPP.samples_parameters"><a class="viewcode-back" href="../../quapy.html#quapy.protocol.UPP.samples_parameters">[docs]</a> <span class="k">def</span> <span class="nf">samples_parameters</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Return all the necessary parameters to replicate the samples as according to the UPP protocol.</span>
|
||||
|
||||
<span class="sd"> :return: a list of indexes that realize the UPP sampling</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="n">indexes</span> <span class="o">=</span> <span class="p">[]</span>
|
||||
<span class="k">for</span> <span class="n">prevs</span> <span class="ow">in</span> <span class="n">F</span><span class="o">.</span><span class="n">uniform_simplex_sampling</span><span class="p">(</span><span class="n">n_classes</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">data</span><span class="o">.</span><span class="n">n_classes</span><span class="p">,</span> <span class="n">size</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">repeats</span><span class="p">):</span>
|
||||
<span class="n">index</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">data</span><span class="o">.</span><span class="n">sampling_index</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">sample_size</span><span class="p">,</span> <span class="o">*</span><span class="n">prevs</span><span class="p">)</span>
|
||||
<span class="n">indexes</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">index</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="n">indexes</span></div>
|
||||
|
||||
<div class="viewcode-block" id="UPP.sample"><a class="viewcode-back" href="../../quapy.html#quapy.protocol.UPP.sample">[docs]</a> <span class="k">def</span> <span class="nf">sample</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">index</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Realizes the sample given the index of the instances.</span>
|
||||
|
||||
<span class="sd"> :param index: indexes of the instances to select</span>
|
||||
<span class="sd"> :return: an instance of :class:`qp.data.LabelledCollection`</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">data</span><span class="o">.</span><span class="n">sampling_from_index</span><span class="p">(</span><span class="n">index</span><span class="p">)</span></div>
|
||||
|
||||
<div class="viewcode-block" id="UPP.total"><a class="viewcode-back" href="../../quapy.html#quapy.protocol.UPP.total">[docs]</a> <span class="k">def</span> <span class="nf">total</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Returns the number of samples that will be generated (equals to "repeats")</span>
|
||||
|
||||
<span class="sd"> :return: int</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">repeats</span></div></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="DomainMixer"><a class="viewcode-back" href="../../quapy.html#quapy.protocol.DomainMixer">[docs]</a><span class="k">class</span> <span class="nc">DomainMixer</span><span class="p">(</span><span class="n">AbstractStochasticSeededProtocol</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Generates mixtures of two domains (A and B) at controlled rates, but preserving the original class prevalence.</span>
|
||||
|
||||
<span class="sd"> :param domainA: one domain, an object of :class:`qp.data.LabelledCollection`</span>
|
||||
<span class="sd"> :param domainB: another domain, an object of :class:`qp.data.LabelledCollection`</span>
|
||||
<span class="sd"> :param sample_size: integer, the number of instances in each sample; if None (default) then it is taken from</span>
|
||||
<span class="sd"> qp.environ["SAMPLE_SIZE"]. If this is not set, a ValueError exception is raised.</span>
|
||||
<span class="sd"> :param repeats: int, number of samples to draw for every mixture rate</span>
|
||||
<span class="sd"> :param prevalence: the prevalence to preserv along the mixtures. If specified, should be an array containing</span>
|
||||
<span class="sd"> one prevalence value (positive float) for each class and summing up to one. If not specified, the prevalence</span>
|
||||
<span class="sd"> will be taken from the domain A (default).</span>
|
||||
<span class="sd"> :param mixture_points: an integer indicating the number of points to take from a linear scale (e.g., 21 will</span>
|
||||
<span class="sd"> generate the mixture points [1, 0.95, 0.9, ..., 0]), or the array of mixture values itself.</span>
|
||||
<span class="sd"> the specific points</span>
|
||||
<span class="sd"> :param random_state: allows replicating samples across runs (default 0, meaning that the sequence of samples</span>
|
||||
<span class="sd"> will be the same every time the protocol is called)</span>
|
||||
<span class="sd"> """</span>
|
||||
|
||||
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span>
|
||||
<span class="bp">self</span><span class="p">,</span>
|
||||
<span class="n">domainA</span><span class="p">:</span> <span class="n">LabelledCollection</span><span class="p">,</span>
|
||||
<span class="n">domainB</span><span class="p">:</span> <span class="n">LabelledCollection</span><span class="p">,</span>
|
||||
<span class="n">sample_size</span><span class="p">,</span>
|
||||
<span class="n">repeats</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span>
|
||||
<span class="n">prevalence</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span>
|
||||
<span class="n">mixture_points</span><span class="o">=</span><span class="mi">11</span><span class="p">,</span>
|
||||
<span class="n">random_state</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span>
|
||||
<span class="n">return_type</span><span class="o">=</span><span class="s1">'sample_prev'</span><span class="p">):</span>
|
||||
<span class="nb">super</span><span class="p">(</span><span class="n">DomainMixer</span><span class="p">,</span> <span class="bp">self</span><span class="p">)</span><span class="o">.</span><span class="fm">__init__</span><span class="p">(</span><span class="n">random_state</span><span class="p">)</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">A</span> <span class="o">=</span> <span class="n">domainA</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">B</span> <span class="o">=</span> <span class="n">domainB</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">sample_size</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">_get_sample_size</span><span class="p">(</span><span class="n">sample_size</span><span class="p">)</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">repeats</span> <span class="o">=</span> <span class="n">repeats</span>
|
||||
<span class="k">if</span> <span class="n">prevalence</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">:</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">prevalence</span> <span class="o">=</span> <span class="n">domainA</span><span class="o">.</span><span class="n">prevalence</span><span class="p">()</span>
|
||||
<span class="k">else</span><span class="p">:</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">prevalence</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">asarray</span><span class="p">(</span><span class="n">prevalence</span><span class="p">)</span>
|
||||
<span class="k">assert</span> <span class="nb">len</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">prevalence</span><span class="p">)</span> <span class="o">==</span> <span class="n">domainA</span><span class="o">.</span><span class="n">n_classes</span><span class="p">,</span> \
|
||||
<span class="sa">f</span><span class="s1">'wrong shape for the vector prevalence (expected </span><span class="si">{</span><span class="n">domainA</span><span class="o">.</span><span class="n">n_classes</span><span class="si">}</span><span class="s1">)'</span>
|
||||
<span class="k">assert</span> <span class="n">F</span><span class="o">.</span><span class="n">check_prevalence_vector</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">prevalence</span><span class="p">),</span> \
|
||||
<span class="sa">f</span><span class="s1">'the prevalence vector is not valid (either it contains values outside [0,1] or does not sum up to 1)'</span>
|
||||
<span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">mixture_points</span><span class="p">,</span> <span class="nb">int</span><span class="p">):</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">mixture_points</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">linspace</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="n">mixture_points</span><span class="p">)[::</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span>
|
||||
<span class="k">else</span><span class="p">:</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">mixture_points</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">asarray</span><span class="p">(</span><span class="n">mixture_points</span><span class="p">)</span>
|
||||
<span class="k">assert</span> <span class="nb">all</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">logical_and</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">mixture_points</span> <span class="o">>=</span> <span class="mi">0</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">mixture_points</span><span class="o"><=</span><span class="mi">1</span><span class="p">)),</span> \
|
||||
<span class="s1">'mixture_model datatype not understood (expected int or a sequence of real values in [0,1])'</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">random_state</span> <span class="o">=</span> <span class="n">random_state</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">collator</span> <span class="o">=</span> <span class="n">OnLabelledCollectionProtocol</span><span class="o">.</span><span class="n">get_collator</span><span class="p">(</span><span class="n">return_type</span><span class="p">)</span>
|
||||
|
||||
<div class="viewcode-block" id="DomainMixer.samples_parameters"><a class="viewcode-back" href="../../quapy.html#quapy.protocol.DomainMixer.samples_parameters">[docs]</a> <span class="k">def</span> <span class="nf">samples_parameters</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Return all the necessary parameters to replicate the samples as according to the this protocol.</span>
|
||||
|
||||
<span class="sd"> :return: a list of zipped indexes (from A and B) that realize the sampling</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="n">indexesA</span><span class="p">,</span> <span class="n">indexesB</span> <span class="o">=</span> <span class="p">[],</span> <span class="p">[]</span>
|
||||
<span class="k">for</span> <span class="n">propA</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">mixture_points</span><span class="p">:</span>
|
||||
<span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">repeats</span><span class="p">):</span>
|
||||
<span class="n">nA</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">round</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">sample_size</span> <span class="o">*</span> <span class="n">propA</span><span class="p">))</span>
|
||||
<span class="n">nB</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">sample_size</span><span class="o">-</span><span class="n">nA</span>
|
||||
<span class="n">sampleAidx</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">A</span><span class="o">.</span><span class="n">sampling_index</span><span class="p">(</span><span class="n">nA</span><span class="p">,</span> <span class="o">*</span><span class="bp">self</span><span class="o">.</span><span class="n">prevalence</span><span class="p">)</span>
|
||||
<span class="n">sampleBidx</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">B</span><span class="o">.</span><span class="n">sampling_index</span><span class="p">(</span><span class="n">nB</span><span class="p">,</span> <span class="o">*</span><span class="bp">self</span><span class="o">.</span><span class="n">prevalence</span><span class="p">)</span>
|
||||
<span class="n">indexesA</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">sampleAidx</span><span class="p">)</span>
|
||||
<span class="n">indexesB</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">sampleBidx</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="nb">list</span><span class="p">(</span><span class="nb">zip</span><span class="p">(</span><span class="n">indexesA</span><span class="p">,</span> <span class="n">indexesB</span><span class="p">))</span></div>
|
||||
|
||||
<div class="viewcode-block" id="DomainMixer.sample"><a class="viewcode-back" href="../../quapy.html#quapy.protocol.DomainMixer.sample">[docs]</a> <span class="k">def</span> <span class="nf">sample</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">indexes</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Realizes the sample given a pair of indexes of the instances from A and B.</span>
|
||||
|
||||
<span class="sd"> :param indexes: indexes of the instances to select from A and B</span>
|
||||
<span class="sd"> :return: an instance of :class:`qp.data.LabelledCollection`</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="n">indexesA</span><span class="p">,</span> <span class="n">indexesB</span> <span class="o">=</span> <span class="n">indexes</span>
|
||||
<span class="n">sampleA</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">A</span><span class="o">.</span><span class="n">sampling_from_index</span><span class="p">(</span><span class="n">indexesA</span><span class="p">)</span>
|
||||
<span class="n">sampleB</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">B</span><span class="o">.</span><span class="n">sampling_from_index</span><span class="p">(</span><span class="n">indexesB</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="n">sampleA</span><span class="o">+</span><span class="n">sampleB</span></div>
|
||||
|
||||
<div class="viewcode-block" id="DomainMixer.total"><a class="viewcode-back" href="../../quapy.html#quapy.protocol.DomainMixer.total">[docs]</a> <span class="k">def</span> <span class="nf">total</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Returns the number of samples that will be generated (equals to "repeats * mixture_points")</span>
|
||||
|
||||
<span class="sd"> :return: int</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">repeats</span> <span class="o">*</span> <span class="nb">len</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">mixture_points</span><span class="p">)</span></div></div>
|
||||
|
||||
|
||||
<span class="c1"># aliases</span>
|
||||
|
||||
<span class="n">ArtificialPrevalenceProtocol</span> <span class="o">=</span> <span class="n">APP</span>
|
||||
<span class="n">NaturalPrevalenceProtocol</span> <span class="o">=</span> <span class="n">NPP</span>
|
||||
<span class="n">UniformPrevalenceProtocol</span> <span class="o">=</span> <span class="n">UPP</span>
|
||||
</pre></div>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
<footer>
|
||||
|
||||
<hr/>
|
||||
|
||||
<div role="contentinfo">
|
||||
<p>© Copyright 2024, Alejandro Moreo.</p>
|
||||
</div>
|
||||
|
||||
Built with <a href="https://www.sphinx-doc.org/">Sphinx</a> using a
|
||||
<a href="https://github.com/readthedocs/sphinx_rtd_theme">theme</a>
|
||||
provided by <a href="https://readthedocs.org">Read the Docs</a>.
|
||||
|
||||
|
||||
</footer>
|
||||
</div>
|
||||
</div>
|
||||
</section>
|
||||
</div>
|
||||
<script>
|
||||
jQuery(function () {
|
||||
SphinxRtdTheme.Navigation.enable(true);
|
||||
});
|
||||
</script>
|
||||
|
||||
</body>
|
||||
</html>
|
||||
|
|
@ -0,0 +1,110 @@
|
|||
<!DOCTYPE html>
|
||||
<html class="writer-html5" lang="en" data-content_root="../../../">
|
||||
<head>
|
||||
<meta charset="utf-8" />
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
|
||||
<title>quapy.tests.test_base — QuaPy: A Python-based open-source framework for quantification 0.1.8 documentation</title>
|
||||
<link rel="stylesheet" type="text/css" href="../../../_static/pygments.css?v=92fd9be5" />
|
||||
<link rel="stylesheet" type="text/css" href="../../../_static/css/theme.css?v=19f00094" />
|
||||
|
||||
|
||||
<!--[if lt IE 9]>
|
||||
<script src="../../../_static/js/html5shiv.min.js"></script>
|
||||
<![endif]-->
|
||||
|
||||
<script src="../../../_static/jquery.js?v=5d32c60e"></script>
|
||||
<script src="../../../_static/_sphinx_javascript_frameworks_compat.js?v=2cd50e6c"></script>
|
||||
<script src="../../../_static/documentation_options.js?v=22607128"></script>
|
||||
<script src="../../../_static/doctools.js?v=9a2dae69"></script>
|
||||
<script src="../../../_static/sphinx_highlight.js?v=dc90522c"></script>
|
||||
<script src="../../../_static/js/theme.js"></script>
|
||||
<link rel="index" title="Index" href="../../../genindex.html" />
|
||||
<link rel="search" title="Search" href="../../../search.html" />
|
||||
</head>
|
||||
|
||||
<body class="wy-body-for-nav">
|
||||
<div class="wy-grid-for-nav">
|
||||
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
|
||||
<div class="wy-side-scroll">
|
||||
<div class="wy-side-nav-search" >
|
||||
|
||||
|
||||
|
||||
<a href="../../../index.html" class="icon icon-home">
|
||||
QuaPy: A Python-based open-source framework for quantification
|
||||
</a>
|
||||
<div role="search">
|
||||
<form id="rtd-search-form" class="wy-form" action="../../../search.html" method="get">
|
||||
<input type="text" name="q" placeholder="Search docs" aria-label="Search docs" />
|
||||
<input type="hidden" name="check_keywords" value="yes" />
|
||||
<input type="hidden" name="area" value="default" />
|
||||
</form>
|
||||
</div>
|
||||
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../../../modules.html">quapy</a></li>
|
||||
</ul>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
</nav>
|
||||
|
||||
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" >
|
||||
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
|
||||
<a href="../../../index.html">QuaPy: A Python-based open-source framework for quantification</a>
|
||||
</nav>
|
||||
|
||||
<div class="wy-nav-content">
|
||||
<div class="rst-content">
|
||||
<div role="navigation" aria-label="Page navigation">
|
||||
<ul class="wy-breadcrumbs">
|
||||
<li><a href="../../../index.html" class="icon icon-home" aria-label="Home"></a></li>
|
||||
<li class="breadcrumb-item"><a href="../../index.html">Module code</a></li>
|
||||
<li class="breadcrumb-item active">quapy.tests.test_base</li>
|
||||
<li class="wy-breadcrumbs-aside">
|
||||
</li>
|
||||
</ul>
|
||||
<hr/>
|
||||
</div>
|
||||
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
|
||||
<div itemprop="articleBody">
|
||||
|
||||
<h1>Source code for quapy.tests.test_base</h1><div class="highlight"><pre>
|
||||
<span></span><span class="kn">import</span> <span class="nn">pytest</span>
|
||||
|
||||
<div class="viewcode-block" id="test_import">
|
||||
<a class="viewcode-back" href="../../../quapy.tests.html#quapy.tests.test_base.test_import">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">test_import</span><span class="p">():</span>
|
||||
<span class="kn">import</span> <span class="nn">quapy</span> <span class="k">as</span> <span class="nn">qp</span>
|
||||
<span class="k">assert</span> <span class="n">qp</span><span class="o">.</span><span class="n">__version__</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span></div>
|
||||
|
||||
</pre></div>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
<footer>
|
||||
|
||||
<hr/>
|
||||
|
||||
<div role="contentinfo">
|
||||
<p>© Copyright 2024, Alejandro Moreo.</p>
|
||||
</div>
|
||||
|
||||
Built with <a href="https://www.sphinx-doc.org/">Sphinx</a> using a
|
||||
<a href="https://github.com/readthedocs/sphinx_rtd_theme">theme</a>
|
||||
provided by <a href="https://readthedocs.org">Read the Docs</a>.
|
||||
|
||||
|
||||
</footer>
|
||||
</div>
|
||||
</div>
|
||||
</section>
|
||||
</div>
|
||||
<script>
|
||||
jQuery(function () {
|
||||
SphinxRtdTheme.Navigation.enable(true);
|
||||
});
|
||||
</script>
|
||||
|
||||
</body>
|
||||
</html>
|
||||
|
|
@ -0,0 +1,178 @@
|
|||
<!DOCTYPE html>
|
||||
<html class="writer-html5" lang="en" data-content_root="../../../">
|
||||
<head>
|
||||
<meta charset="utf-8" />
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
|
||||
<title>quapy.tests.test_datasets — QuaPy: A Python-based open-source framework for quantification 0.1.8 documentation</title>
|
||||
<link rel="stylesheet" type="text/css" href="../../../_static/pygments.css?v=92fd9be5" />
|
||||
<link rel="stylesheet" type="text/css" href="../../../_static/css/theme.css?v=19f00094" />
|
||||
|
||||
|
||||
<!--[if lt IE 9]>
|
||||
<script src="../../../_static/js/html5shiv.min.js"></script>
|
||||
<![endif]-->
|
||||
|
||||
<script src="../../../_static/jquery.js?v=5d32c60e"></script>
|
||||
<script src="../../../_static/_sphinx_javascript_frameworks_compat.js?v=2cd50e6c"></script>
|
||||
<script src="../../../_static/documentation_options.js?v=22607128"></script>
|
||||
<script src="../../../_static/doctools.js?v=9a2dae69"></script>
|
||||
<script src="../../../_static/sphinx_highlight.js?v=dc90522c"></script>
|
||||
<script src="../../../_static/js/theme.js"></script>
|
||||
<link rel="index" title="Index" href="../../../genindex.html" />
|
||||
<link rel="search" title="Search" href="../../../search.html" />
|
||||
</head>
|
||||
|
||||
<body class="wy-body-for-nav">
|
||||
<div class="wy-grid-for-nav">
|
||||
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
|
||||
<div class="wy-side-scroll">
|
||||
<div class="wy-side-nav-search" >
|
||||
|
||||
|
||||
|
||||
<a href="../../../index.html" class="icon icon-home">
|
||||
QuaPy: A Python-based open-source framework for quantification
|
||||
</a>
|
||||
<div role="search">
|
||||
<form id="rtd-search-form" class="wy-form" action="../../../search.html" method="get">
|
||||
<input type="text" name="q" placeholder="Search docs" aria-label="Search docs" />
|
||||
<input type="hidden" name="check_keywords" value="yes" />
|
||||
<input type="hidden" name="area" value="default" />
|
||||
</form>
|
||||
</div>
|
||||
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../../../modules.html">quapy</a></li>
|
||||
</ul>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
</nav>
|
||||
|
||||
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" >
|
||||
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
|
||||
<a href="../../../index.html">QuaPy: A Python-based open-source framework for quantification</a>
|
||||
</nav>
|
||||
|
||||
<div class="wy-nav-content">
|
||||
<div class="rst-content">
|
||||
<div role="navigation" aria-label="Page navigation">
|
||||
<ul class="wy-breadcrumbs">
|
||||
<li><a href="../../../index.html" class="icon icon-home" aria-label="Home"></a></li>
|
||||
<li class="breadcrumb-item"><a href="../../index.html">Module code</a></li>
|
||||
<li class="breadcrumb-item active">quapy.tests.test_datasets</li>
|
||||
<li class="wy-breadcrumbs-aside">
|
||||
</li>
|
||||
</ul>
|
||||
<hr/>
|
||||
</div>
|
||||
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
|
||||
<div itemprop="articleBody">
|
||||
|
||||
<h1>Source code for quapy.tests.test_datasets</h1><div class="highlight"><pre>
|
||||
<span></span><span class="kn">import</span> <span class="nn">pytest</span>
|
||||
|
||||
<span class="kn">from</span> <span class="nn">quapy.data.datasets</span> <span class="kn">import</span> <span class="n">REVIEWS_SENTIMENT_DATASETS</span><span class="p">,</span> <span class="n">TWITTER_SENTIMENT_DATASETS_TEST</span><span class="p">,</span> \
|
||||
<span class="n">TWITTER_SENTIMENT_DATASETS_TRAIN</span><span class="p">,</span> <span class="n">UCI_BINARY_DATASETS</span><span class="p">,</span> <span class="n">LEQUA2022_TASKS</span><span class="p">,</span> <span class="n">UCI_MULTICLASS_DATASETS</span><span class="p">,</span>\
|
||||
<span class="n">fetch_reviews</span><span class="p">,</span> <span class="n">fetch_twitter</span><span class="p">,</span> <span class="n">fetch_UCIBinaryDataset</span><span class="p">,</span> <span class="n">fetch_lequa2022</span><span class="p">,</span> <span class="n">fetch_UCIMulticlassLabelledCollection</span>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="test_fetch_reviews">
|
||||
<a class="viewcode-back" href="../../../quapy.tests.html#quapy.tests.test_datasets.test_fetch_reviews">[docs]</a>
|
||||
<span class="nd">@pytest</span><span class="o">.</span><span class="n">mark</span><span class="o">.</span><span class="n">parametrize</span><span class="p">(</span><span class="s1">'dataset_name'</span><span class="p">,</span> <span class="n">REVIEWS_SENTIMENT_DATASETS</span><span class="p">)</span>
|
||||
<span class="k">def</span> <span class="nf">test_fetch_reviews</span><span class="p">(</span><span class="n">dataset_name</span><span class="p">):</span>
|
||||
<span class="n">dataset</span> <span class="o">=</span> <span class="n">fetch_reviews</span><span class="p">(</span><span class="n">dataset_name</span><span class="p">)</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s1">'Dataset </span><span class="si">{</span><span class="n">dataset_name</span><span class="si">}</span><span class="s1">'</span><span class="p">)</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="s1">'Training set stats'</span><span class="p">)</span>
|
||||
<span class="n">dataset</span><span class="o">.</span><span class="n">training</span><span class="o">.</span><span class="n">stats</span><span class="p">()</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="s1">'Test set stats'</span><span class="p">)</span>
|
||||
<span class="n">dataset</span><span class="o">.</span><span class="n">test</span><span class="o">.</span><span class="n">stats</span><span class="p">()</span></div>
|
||||
|
||||
|
||||
|
||||
<div class="viewcode-block" id="test_fetch_twitter">
|
||||
<a class="viewcode-back" href="../../../quapy.tests.html#quapy.tests.test_datasets.test_fetch_twitter">[docs]</a>
|
||||
<span class="nd">@pytest</span><span class="o">.</span><span class="n">mark</span><span class="o">.</span><span class="n">parametrize</span><span class="p">(</span><span class="s1">'dataset_name'</span><span class="p">,</span> <span class="n">TWITTER_SENTIMENT_DATASETS_TEST</span> <span class="o">+</span> <span class="n">TWITTER_SENTIMENT_DATASETS_TRAIN</span><span class="p">)</span>
|
||||
<span class="k">def</span> <span class="nf">test_fetch_twitter</span><span class="p">(</span><span class="n">dataset_name</span><span class="p">):</span>
|
||||
<span class="k">try</span><span class="p">:</span>
|
||||
<span class="n">dataset</span> <span class="o">=</span> <span class="n">fetch_twitter</span><span class="p">(</span><span class="n">dataset_name</span><span class="p">)</span>
|
||||
<span class="k">except</span> <span class="ne">ValueError</span> <span class="k">as</span> <span class="n">ve</span><span class="p">:</span>
|
||||
<span class="k">if</span> <span class="n">dataset_name</span> <span class="o">==</span> <span class="s1">'semeval'</span> <span class="ow">and</span> <span class="n">ve</span><span class="o">.</span><span class="n">args</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">.</span><span class="n">startswith</span><span class="p">(</span>
|
||||
<span class="s1">'dataset "semeval" can only be used for model selection.'</span><span class="p">):</span>
|
||||
<span class="n">dataset</span> <span class="o">=</span> <span class="n">fetch_twitter</span><span class="p">(</span><span class="n">dataset_name</span><span class="p">,</span> <span class="n">for_model_selection</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s1">'Dataset </span><span class="si">{</span><span class="n">dataset_name</span><span class="si">}</span><span class="s1">'</span><span class="p">)</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="s1">'Training set stats'</span><span class="p">)</span>
|
||||
<span class="n">dataset</span><span class="o">.</span><span class="n">training</span><span class="o">.</span><span class="n">stats</span><span class="p">()</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="s1">'Test set stats'</span><span class="p">)</span></div>
|
||||
|
||||
|
||||
|
||||
<div class="viewcode-block" id="test_fetch_UCIDataset">
|
||||
<a class="viewcode-back" href="../../../quapy.tests.html#quapy.tests.test_datasets.test_fetch_UCIDataset">[docs]</a>
|
||||
<span class="nd">@pytest</span><span class="o">.</span><span class="n">mark</span><span class="o">.</span><span class="n">parametrize</span><span class="p">(</span><span class="s1">'dataset_name'</span><span class="p">,</span> <span class="n">UCI_BINARY_DATASETS</span><span class="p">)</span>
|
||||
<span class="k">def</span> <span class="nf">test_fetch_UCIDataset</span><span class="p">(</span><span class="n">dataset_name</span><span class="p">):</span>
|
||||
<span class="k">try</span><span class="p">:</span>
|
||||
<span class="n">dataset</span> <span class="o">=</span> <span class="n">fetch_UCIBinaryDataset</span><span class="p">(</span><span class="n">dataset_name</span><span class="p">)</span>
|
||||
<span class="k">except</span> <span class="ne">FileNotFoundError</span> <span class="k">as</span> <span class="n">fnfe</span><span class="p">:</span>
|
||||
<span class="k">if</span> <span class="n">dataset_name</span> <span class="o">==</span> <span class="s1">'pageblocks.5'</span> <span class="ow">and</span> <span class="n">fnfe</span><span class="o">.</span><span class="n">args</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">.</span><span class="n">find</span><span class="p">(</span>
|
||||
<span class="s1">'If this is the first time you attempt to load this dataset'</span><span class="p">)</span> <span class="o">></span> <span class="mi">0</span><span class="p">:</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="s1">'The pageblocks.5 dataset requires some hand processing to be usable, skipping this test.'</span><span class="p">)</span>
|
||||
<span class="k">return</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s1">'Dataset </span><span class="si">{</span><span class="n">dataset_name</span><span class="si">}</span><span class="s1">'</span><span class="p">)</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="s1">'Training set stats'</span><span class="p">)</span>
|
||||
<span class="n">dataset</span><span class="o">.</span><span class="n">training</span><span class="o">.</span><span class="n">stats</span><span class="p">()</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="s1">'Test set stats'</span><span class="p">)</span></div>
|
||||
|
||||
|
||||
|
||||
<div class="viewcode-block" id="test_fetch_UCIMultiDataset">
|
||||
<a class="viewcode-back" href="../../../quapy.tests.html#quapy.tests.test_datasets.test_fetch_UCIMultiDataset">[docs]</a>
|
||||
<span class="nd">@pytest</span><span class="o">.</span><span class="n">mark</span><span class="o">.</span><span class="n">parametrize</span><span class="p">(</span><span class="s1">'dataset_name'</span><span class="p">,</span> <span class="n">UCI_MULTICLASS_DATASETS</span><span class="p">)</span>
|
||||
<span class="k">def</span> <span class="nf">test_fetch_UCIMultiDataset</span><span class="p">(</span><span class="n">dataset_name</span><span class="p">):</span>
|
||||
<span class="n">dataset</span> <span class="o">=</span> <span class="n">fetch_UCIMulticlassLabelledCollection</span><span class="p">(</span><span class="n">dataset_name</span><span class="p">)</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s1">'Dataset </span><span class="si">{</span><span class="n">dataset_name</span><span class="si">}</span><span class="s1">'</span><span class="p">)</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="s1">'Training set stats'</span><span class="p">)</span>
|
||||
<span class="n">dataset</span><span class="o">.</span><span class="n">stats</span><span class="p">()</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="s1">'Test set stats'</span><span class="p">)</span></div>
|
||||
|
||||
|
||||
|
||||
<div class="viewcode-block" id="test_fetch_lequa2022">
|
||||
<a class="viewcode-back" href="../../../quapy.tests.html#quapy.tests.test_datasets.test_fetch_lequa2022">[docs]</a>
|
||||
<span class="nd">@pytest</span><span class="o">.</span><span class="n">mark</span><span class="o">.</span><span class="n">parametrize</span><span class="p">(</span><span class="s1">'dataset_name'</span><span class="p">,</span> <span class="n">LEQUA2022_TASKS</span><span class="p">)</span>
|
||||
<span class="k">def</span> <span class="nf">test_fetch_lequa2022</span><span class="p">(</span><span class="n">dataset_name</span><span class="p">):</span>
|
||||
<span class="n">train</span><span class="p">,</span> <span class="n">gen_val</span><span class="p">,</span> <span class="n">gen_test</span> <span class="o">=</span> <span class="n">fetch_lequa2022</span><span class="p">(</span><span class="n">dataset_name</span><span class="p">)</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="n">train</span><span class="o">.</span><span class="n">stats</span><span class="p">())</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="s1">'Val:'</span><span class="p">,</span> <span class="n">gen_val</span><span class="o">.</span><span class="n">total</span><span class="p">())</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="s1">'Test:'</span><span class="p">,</span> <span class="n">gen_test</span><span class="o">.</span><span class="n">total</span><span class="p">())</span></div>
|
||||
|
||||
</pre></div>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
<footer>
|
||||
|
||||
<hr/>
|
||||
|
||||
<div role="contentinfo">
|
||||
<p>© Copyright 2024, Alejandro Moreo.</p>
|
||||
</div>
|
||||
|
||||
Built with <a href="https://www.sphinx-doc.org/">Sphinx</a> using a
|
||||
<a href="https://github.com/readthedocs/sphinx_rtd_theme">theme</a>
|
||||
provided by <a href="https://readthedocs.org">Read the Docs</a>.
|
||||
|
||||
|
||||
</footer>
|
||||
</div>
|
||||
</div>
|
||||
</section>
|
||||
</div>
|
||||
<script>
|
||||
jQuery(function () {
|
||||
SphinxRtdTheme.Navigation.enable(true);
|
||||
});
|
||||
</script>
|
||||
|
||||
</body>
|
||||
</html>
|
||||
|
|
@ -0,0 +1,195 @@
|
|||
<!DOCTYPE html>
|
||||
<html class="writer-html5" lang="en" data-content_root="../../../">
|
||||
<head>
|
||||
<meta charset="utf-8" />
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
|
||||
<title>quapy.tests.test_evaluation — QuaPy: A Python-based open-source framework for quantification 0.1.8 documentation</title>
|
||||
<link rel="stylesheet" type="text/css" href="../../../_static/pygments.css?v=92fd9be5" />
|
||||
<link rel="stylesheet" type="text/css" href="../../../_static/css/theme.css?v=19f00094" />
|
||||
|
||||
|
||||
<!--[if lt IE 9]>
|
||||
<script src="../../../_static/js/html5shiv.min.js"></script>
|
||||
<![endif]-->
|
||||
|
||||
<script src="../../../_static/jquery.js?v=5d32c60e"></script>
|
||||
<script src="../../../_static/_sphinx_javascript_frameworks_compat.js?v=2cd50e6c"></script>
|
||||
<script src="../../../_static/documentation_options.js?v=22607128"></script>
|
||||
<script src="../../../_static/doctools.js?v=9a2dae69"></script>
|
||||
<script src="../../../_static/sphinx_highlight.js?v=dc90522c"></script>
|
||||
<script src="../../../_static/js/theme.js"></script>
|
||||
<link rel="index" title="Index" href="../../../genindex.html" />
|
||||
<link rel="search" title="Search" href="../../../search.html" />
|
||||
</head>
|
||||
|
||||
<body class="wy-body-for-nav">
|
||||
<div class="wy-grid-for-nav">
|
||||
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
|
||||
<div class="wy-side-scroll">
|
||||
<div class="wy-side-nav-search" >
|
||||
|
||||
|
||||
|
||||
<a href="../../../index.html" class="icon icon-home">
|
||||
QuaPy: A Python-based open-source framework for quantification
|
||||
</a>
|
||||
<div role="search">
|
||||
<form id="rtd-search-form" class="wy-form" action="../../../search.html" method="get">
|
||||
<input type="text" name="q" placeholder="Search docs" aria-label="Search docs" />
|
||||
<input type="hidden" name="check_keywords" value="yes" />
|
||||
<input type="hidden" name="area" value="default" />
|
||||
</form>
|
||||
</div>
|
||||
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../../../modules.html">quapy</a></li>
|
||||
</ul>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
</nav>
|
||||
|
||||
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" >
|
||||
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
|
||||
<a href="../../../index.html">QuaPy: A Python-based open-source framework for quantification</a>
|
||||
</nav>
|
||||
|
||||
<div class="wy-nav-content">
|
||||
<div class="rst-content">
|
||||
<div role="navigation" aria-label="Page navigation">
|
||||
<ul class="wy-breadcrumbs">
|
||||
<li><a href="../../../index.html" class="icon icon-home" aria-label="Home"></a></li>
|
||||
<li class="breadcrumb-item"><a href="../../index.html">Module code</a></li>
|
||||
<li class="breadcrumb-item active">quapy.tests.test_evaluation</li>
|
||||
<li class="wy-breadcrumbs-aside">
|
||||
</li>
|
||||
</ul>
|
||||
<hr/>
|
||||
</div>
|
||||
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
|
||||
<div itemprop="articleBody">
|
||||
|
||||
<h1>Source code for quapy.tests.test_evaluation</h1><div class="highlight"><pre>
|
||||
<span></span><span class="kn">import</span> <span class="nn">unittest</span>
|
||||
|
||||
<span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span>
|
||||
|
||||
<span class="kn">import</span> <span class="nn">quapy</span> <span class="k">as</span> <span class="nn">qp</span>
|
||||
<span class="kn">from</span> <span class="nn">sklearn.linear_model</span> <span class="kn">import</span> <span class="n">LogisticRegression</span>
|
||||
<span class="kn">from</span> <span class="nn">time</span> <span class="kn">import</span> <span class="n">time</span>
|
||||
|
||||
<span class="kn">from</span> <span class="nn">quapy.error</span> <span class="kn">import</span> <span class="n">QUANTIFICATION_ERROR_SINGLE</span><span class="p">,</span> <span class="n">QUANTIFICATION_ERROR</span><span class="p">,</span> <span class="n">QUANTIFICATION_ERROR_NAMES</span><span class="p">,</span> \
|
||||
<span class="n">QUANTIFICATION_ERROR_SINGLE_NAMES</span>
|
||||
<span class="kn">from</span> <span class="nn">quapy.method.aggregative</span> <span class="kn">import</span> <span class="n">EMQ</span><span class="p">,</span> <span class="n">PCC</span>
|
||||
<span class="kn">from</span> <span class="nn">quapy.method.base</span> <span class="kn">import</span> <span class="n">BaseQuantifier</span>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="EvalTestCase">
|
||||
<a class="viewcode-back" href="../../../quapy.tests.html#quapy.tests.test_evaluation.EvalTestCase">[docs]</a>
|
||||
<span class="k">class</span> <span class="nc">EvalTestCase</span><span class="p">(</span><span class="n">unittest</span><span class="o">.</span><span class="n">TestCase</span><span class="p">):</span>
|
||||
<div class="viewcode-block" id="EvalTestCase.test_eval_speedup">
|
||||
<a class="viewcode-back" href="../../../quapy.tests.html#quapy.tests.test_evaluation.EvalTestCase.test_eval_speedup">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">test_eval_speedup</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
|
||||
<span class="n">data</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">datasets</span><span class="o">.</span><span class="n">fetch_reviews</span><span class="p">(</span><span class="s1">'hp'</span><span class="p">,</span> <span class="n">tfidf</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">min_df</span><span class="o">=</span><span class="mi">10</span><span class="p">,</span> <span class="n">pickle</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
|
||||
<span class="n">train</span><span class="p">,</span> <span class="n">test</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">training</span><span class="p">,</span> <span class="n">data</span><span class="o">.</span><span class="n">test</span>
|
||||
|
||||
<span class="n">protocol</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">protocol</span><span class="o">.</span><span class="n">APP</span><span class="p">(</span><span class="n">test</span><span class="p">,</span> <span class="n">sample_size</span><span class="o">=</span><span class="mi">1000</span><span class="p">,</span> <span class="n">n_prevalences</span><span class="o">=</span><span class="mi">11</span><span class="p">,</span> <span class="n">repeats</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">random_state</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
|
||||
|
||||
<span class="k">class</span> <span class="nc">SlowLR</span><span class="p">(</span><span class="n">LogisticRegression</span><span class="p">):</span>
|
||||
<span class="k">def</span> <span class="nf">predict_proba</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">X</span><span class="p">):</span>
|
||||
<span class="kn">import</span> <span class="nn">time</span>
|
||||
<span class="n">time</span><span class="o">.</span><span class="n">sleep</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="n">predict_proba</span><span class="p">(</span><span class="n">X</span><span class="p">)</span>
|
||||
|
||||
<span class="n">emq</span> <span class="o">=</span> <span class="n">EMQ</span><span class="p">(</span><span class="n">SlowLR</span><span class="p">())</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">train</span><span class="p">)</span>
|
||||
|
||||
<span class="n">tinit</span> <span class="o">=</span> <span class="n">time</span><span class="p">()</span>
|
||||
<span class="n">score</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">evaluation</span><span class="o">.</span><span class="n">evaluate</span><span class="p">(</span><span class="n">emq</span><span class="p">,</span> <span class="n">protocol</span><span class="p">,</span> <span class="n">error_metric</span><span class="o">=</span><span class="s1">'mae'</span><span class="p">,</span> <span class="n">verbose</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">aggr_speedup</span><span class="o">=</span><span class="s1">'force'</span><span class="p">)</span>
|
||||
<span class="n">tend_optim</span> <span class="o">=</span> <span class="n">time</span><span class="p">()</span><span class="o">-</span><span class="n">tinit</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s1">'evaluation (with optimization) took </span><span class="si">{</span><span class="n">tend_optim</span><span class="si">}</span><span class="s1">s [MAE=</span><span class="si">{</span><span class="n">score</span><span class="si">:</span><span class="s1">.4f</span><span class="si">}</span><span class="s1">]'</span><span class="p">)</span>
|
||||
|
||||
<span class="k">class</span> <span class="nc">NonAggregativeEMQ</span><span class="p">(</span><span class="n">BaseQuantifier</span><span class="p">):</span>
|
||||
|
||||
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="bp">cls</span><span class="p">):</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">emq</span> <span class="o">=</span> <span class="n">EMQ</span><span class="p">(</span><span class="bp">cls</span><span class="p">)</span>
|
||||
|
||||
<span class="k">def</span> <span class="nf">quantify</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">instances</span><span class="p">):</span>
|
||||
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">emq</span><span class="o">.</span><span class="n">quantify</span><span class="p">(</span><span class="n">instances</span><span class="p">)</span>
|
||||
|
||||
<span class="k">def</span> <span class="nf">fit</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">data</span><span class="p">):</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">emq</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="bp">self</span>
|
||||
|
||||
<span class="n">emq</span> <span class="o">=</span> <span class="n">NonAggregativeEMQ</span><span class="p">(</span><span class="n">SlowLR</span><span class="p">())</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">train</span><span class="p">)</span>
|
||||
|
||||
<span class="n">tinit</span> <span class="o">=</span> <span class="n">time</span><span class="p">()</span>
|
||||
<span class="n">score</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">evaluation</span><span class="o">.</span><span class="n">evaluate</span><span class="p">(</span><span class="n">emq</span><span class="p">,</span> <span class="n">protocol</span><span class="p">,</span> <span class="n">error_metric</span><span class="o">=</span><span class="s1">'mae'</span><span class="p">,</span> <span class="n">verbose</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
|
||||
<span class="n">tend_no_optim</span> <span class="o">=</span> <span class="n">time</span><span class="p">()</span> <span class="o">-</span> <span class="n">tinit</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s1">'evaluation (w/o optimization) took </span><span class="si">{</span><span class="n">tend_no_optim</span><span class="si">}</span><span class="s1">s [MAE=</span><span class="si">{</span><span class="n">score</span><span class="si">:</span><span class="s1">.4f</span><span class="si">}</span><span class="s1">]'</span><span class="p">)</span>
|
||||
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">assertEqual</span><span class="p">(</span><span class="n">tend_no_optim</span><span class="o">></span><span class="p">(</span><span class="n">tend_optim</span><span class="o">/</span><span class="mi">2</span><span class="p">),</span> <span class="kc">True</span><span class="p">)</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="EvalTestCase.test_evaluation_output">
|
||||
<a class="viewcode-back" href="../../../quapy.tests.html#quapy.tests.test_evaluation.EvalTestCase.test_evaluation_output">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">test_evaluation_output</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
|
||||
<span class="n">data</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">datasets</span><span class="o">.</span><span class="n">fetch_reviews</span><span class="p">(</span><span class="s1">'hp'</span><span class="p">,</span> <span class="n">tfidf</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">min_df</span><span class="o">=</span><span class="mi">10</span><span class="p">,</span> <span class="n">pickle</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
|
||||
<span class="n">train</span><span class="p">,</span> <span class="n">test</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">training</span><span class="p">,</span> <span class="n">data</span><span class="o">.</span><span class="n">test</span>
|
||||
|
||||
<span class="n">qp</span><span class="o">.</span><span class="n">environ</span><span class="p">[</span><span class="s1">'SAMPLE_SIZE'</span><span class="p">]</span><span class="o">=</span><span class="mi">100</span>
|
||||
|
||||
<span class="n">protocol</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">protocol</span><span class="o">.</span><span class="n">APP</span><span class="p">(</span><span class="n">test</span><span class="p">,</span> <span class="n">random_state</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
|
||||
|
||||
<span class="n">q</span> <span class="o">=</span> <span class="n">PCC</span><span class="p">(</span><span class="n">LogisticRegression</span><span class="p">())</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">train</span><span class="p">)</span>
|
||||
|
||||
<span class="n">single_errors</span> <span class="o">=</span> <span class="nb">list</span><span class="p">(</span><span class="n">QUANTIFICATION_ERROR_SINGLE_NAMES</span><span class="p">)</span>
|
||||
<span class="n">averaged_errors</span> <span class="o">=</span> <span class="p">[</span><span class="s1">'m'</span><span class="o">+</span><span class="n">e</span> <span class="k">for</span> <span class="n">e</span> <span class="ow">in</span> <span class="n">single_errors</span><span class="p">]</span>
|
||||
<span class="n">single_errors</span> <span class="o">=</span> <span class="n">single_errors</span> <span class="o">+</span> <span class="p">[</span><span class="n">qp</span><span class="o">.</span><span class="n">error</span><span class="o">.</span><span class="n">from_name</span><span class="p">(</span><span class="n">e</span><span class="p">)</span> <span class="k">for</span> <span class="n">e</span> <span class="ow">in</span> <span class="n">single_errors</span><span class="p">]</span>
|
||||
<span class="n">averaged_errors</span> <span class="o">=</span> <span class="n">averaged_errors</span> <span class="o">+</span> <span class="p">[</span><span class="n">qp</span><span class="o">.</span><span class="n">error</span><span class="o">.</span><span class="n">from_name</span><span class="p">(</span><span class="n">e</span><span class="p">)</span> <span class="k">for</span> <span class="n">e</span> <span class="ow">in</span> <span class="n">averaged_errors</span><span class="p">]</span>
|
||||
<span class="k">for</span> <span class="n">error_metric</span><span class="p">,</span> <span class="n">averaged_error_metric</span> <span class="ow">in</span> <span class="nb">zip</span><span class="p">(</span><span class="n">single_errors</span><span class="p">,</span> <span class="n">averaged_errors</span><span class="p">):</span>
|
||||
<span class="n">score</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">evaluation</span><span class="o">.</span><span class="n">evaluate</span><span class="p">(</span><span class="n">q</span><span class="p">,</span> <span class="n">protocol</span><span class="p">,</span> <span class="n">error_metric</span><span class="o">=</span><span class="n">averaged_error_metric</span><span class="p">)</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">assertTrue</span><span class="p">(</span><span class="nb">isinstance</span><span class="p">(</span><span class="n">score</span><span class="p">,</span> <span class="nb">float</span><span class="p">))</span>
|
||||
|
||||
<span class="n">scores</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">evaluation</span><span class="o">.</span><span class="n">evaluate</span><span class="p">(</span><span class="n">q</span><span class="p">,</span> <span class="n">protocol</span><span class="p">,</span> <span class="n">error_metric</span><span class="o">=</span><span class="n">error_metric</span><span class="p">)</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">assertTrue</span><span class="p">(</span><span class="nb">isinstance</span><span class="p">(</span><span class="n">scores</span><span class="p">,</span> <span class="n">np</span><span class="o">.</span><span class="n">ndarray</span><span class="p">))</span>
|
||||
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">assertEqual</span><span class="p">(</span><span class="n">scores</span><span class="o">.</span><span class="n">mean</span><span class="p">(),</span> <span class="n">score</span><span class="p">)</span></div>
|
||||
</div>
|
||||
|
||||
|
||||
|
||||
|
||||
<span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s1">'__main__'</span><span class="p">:</span>
|
||||
<span class="n">unittest</span><span class="o">.</span><span class="n">main</span><span class="p">()</span>
|
||||
</pre></div>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
<footer>
|
||||
|
||||
<hr/>
|
||||
|
||||
<div role="contentinfo">
|
||||
<p>© Copyright 2024, Alejandro Moreo.</p>
|
||||
</div>
|
||||
|
||||
Built with <a href="https://www.sphinx-doc.org/">Sphinx</a> using a
|
||||
<a href="https://github.com/readthedocs/sphinx_rtd_theme">theme</a>
|
||||
provided by <a href="https://readthedocs.org">Read the Docs</a>.
|
||||
|
||||
|
||||
</footer>
|
||||
</div>
|
||||
</div>
|
||||
</section>
|
||||
</div>
|
||||
<script>
|
||||
jQuery(function () {
|
||||
SphinxRtdTheme.Navigation.enable(true);
|
||||
});
|
||||
</script>
|
||||
|
||||
</body>
|
||||
</html>
|
||||
|
|
@ -0,0 +1,143 @@
|
|||
<!DOCTYPE html>
|
||||
<html class="writer-html5" lang="en" data-content_root="../../../">
|
||||
<head>
|
||||
<meta charset="utf-8" />
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
|
||||
<title>quapy.tests.test_hierarchy — QuaPy: A Python-based open-source framework for quantification 0.1.8 documentation</title>
|
||||
<link rel="stylesheet" type="text/css" href="../../../_static/pygments.css?v=92fd9be5" />
|
||||
<link rel="stylesheet" type="text/css" href="../../../_static/css/theme.css?v=19f00094" />
|
||||
|
||||
|
||||
<!--[if lt IE 9]>
|
||||
<script src="../../../_static/js/html5shiv.min.js"></script>
|
||||
<![endif]-->
|
||||
|
||||
<script src="../../../_static/jquery.js?v=5d32c60e"></script>
|
||||
<script src="../../../_static/_sphinx_javascript_frameworks_compat.js?v=2cd50e6c"></script>
|
||||
<script src="../../../_static/documentation_options.js?v=22607128"></script>
|
||||
<script src="../../../_static/doctools.js?v=9a2dae69"></script>
|
||||
<script src="../../../_static/sphinx_highlight.js?v=dc90522c"></script>
|
||||
<script src="../../../_static/js/theme.js"></script>
|
||||
<link rel="index" title="Index" href="../../../genindex.html" />
|
||||
<link rel="search" title="Search" href="../../../search.html" />
|
||||
</head>
|
||||
|
||||
<body class="wy-body-for-nav">
|
||||
<div class="wy-grid-for-nav">
|
||||
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
|
||||
<div class="wy-side-scroll">
|
||||
<div class="wy-side-nav-search" >
|
||||
|
||||
|
||||
|
||||
<a href="../../../index.html" class="icon icon-home">
|
||||
QuaPy: A Python-based open-source framework for quantification
|
||||
</a>
|
||||
<div role="search">
|
||||
<form id="rtd-search-form" class="wy-form" action="../../../search.html" method="get">
|
||||
<input type="text" name="q" placeholder="Search docs" aria-label="Search docs" />
|
||||
<input type="hidden" name="check_keywords" value="yes" />
|
||||
<input type="hidden" name="area" value="default" />
|
||||
</form>
|
||||
</div>
|
||||
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../../../modules.html">quapy</a></li>
|
||||
</ul>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
</nav>
|
||||
|
||||
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" >
|
||||
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
|
||||
<a href="../../../index.html">QuaPy: A Python-based open-source framework for quantification</a>
|
||||
</nav>
|
||||
|
||||
<div class="wy-nav-content">
|
||||
<div class="rst-content">
|
||||
<div role="navigation" aria-label="Page navigation">
|
||||
<ul class="wy-breadcrumbs">
|
||||
<li><a href="../../../index.html" class="icon icon-home" aria-label="Home"></a></li>
|
||||
<li class="breadcrumb-item"><a href="../../index.html">Module code</a></li>
|
||||
<li class="breadcrumb-item active">quapy.tests.test_hierarchy</li>
|
||||
<li class="wy-breadcrumbs-aside">
|
||||
</li>
|
||||
</ul>
|
||||
<hr/>
|
||||
</div>
|
||||
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
|
||||
<div itemprop="articleBody">
|
||||
|
||||
<h1>Source code for quapy.tests.test_hierarchy</h1><div class="highlight"><pre>
|
||||
<span></span><span class="kn">import</span> <span class="nn">unittest</span>
|
||||
<span class="kn">from</span> <span class="nn">sklearn.linear_model</span> <span class="kn">import</span> <span class="n">LogisticRegression</span>
|
||||
<span class="kn">from</span> <span class="nn">quapy.method.aggregative</span> <span class="kn">import</span> <span class="o">*</span>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="HierarchyTestCase">
|
||||
<a class="viewcode-back" href="../../../quapy.tests.html#quapy.tests.test_hierarchy.HierarchyTestCase">[docs]</a>
|
||||
<span class="k">class</span> <span class="nc">HierarchyTestCase</span><span class="p">(</span><span class="n">unittest</span><span class="o">.</span><span class="n">TestCase</span><span class="p">):</span>
|
||||
|
||||
<div class="viewcode-block" id="HierarchyTestCase.test_aggregative">
|
||||
<a class="viewcode-back" href="../../../quapy.tests.html#quapy.tests.test_hierarchy.HierarchyTestCase.test_aggregative">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">test_aggregative</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="n">lr</span> <span class="o">=</span> <span class="n">LogisticRegression</span><span class="p">()</span>
|
||||
<span class="k">for</span> <span class="n">m</span> <span class="ow">in</span> <span class="p">[</span><span class="n">CC</span><span class="p">(</span><span class="n">lr</span><span class="p">),</span> <span class="n">PCC</span><span class="p">(</span><span class="n">lr</span><span class="p">),</span> <span class="n">ACC</span><span class="p">(</span><span class="n">lr</span><span class="p">),</span> <span class="n">PACC</span><span class="p">(</span><span class="n">lr</span><span class="p">)]:</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">assertEqual</span><span class="p">(</span><span class="nb">isinstance</span><span class="p">(</span><span class="n">m</span><span class="p">,</span> <span class="n">AggregativeQuantifier</span><span class="p">),</span> <span class="kc">True</span><span class="p">)</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="HierarchyTestCase.test_binary">
|
||||
<a class="viewcode-back" href="../../../quapy.tests.html#quapy.tests.test_hierarchy.HierarchyTestCase.test_binary">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">test_binary</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="n">lr</span> <span class="o">=</span> <span class="n">LogisticRegression</span><span class="p">()</span>
|
||||
<span class="k">for</span> <span class="n">m</span> <span class="ow">in</span> <span class="p">[</span><span class="n">HDy</span><span class="p">(</span><span class="n">lr</span><span class="p">)]:</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">assertEqual</span><span class="p">(</span><span class="nb">isinstance</span><span class="p">(</span><span class="n">m</span><span class="p">,</span> <span class="n">BinaryQuantifier</span><span class="p">),</span> <span class="kc">True</span><span class="p">)</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="HierarchyTestCase.test_probabilistic">
|
||||
<a class="viewcode-back" href="../../../quapy.tests.html#quapy.tests.test_hierarchy.HierarchyTestCase.test_probabilistic">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">test_probabilistic</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="n">lr</span> <span class="o">=</span> <span class="n">LogisticRegression</span><span class="p">()</span>
|
||||
<span class="k">for</span> <span class="n">m</span> <span class="ow">in</span> <span class="p">[</span><span class="n">CC</span><span class="p">(</span><span class="n">lr</span><span class="p">),</span> <span class="n">ACC</span><span class="p">(</span><span class="n">lr</span><span class="p">)]:</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">assertEqual</span><span class="p">(</span><span class="nb">isinstance</span><span class="p">(</span><span class="n">m</span><span class="p">,</span> <span class="n">AggregativeCrispQuantifier</span><span class="p">),</span> <span class="kc">True</span><span class="p">)</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">assertEqual</span><span class="p">(</span><span class="nb">isinstance</span><span class="p">(</span><span class="n">m</span><span class="p">,</span> <span class="n">AggregativeSoftQuantifier</span><span class="p">),</span> <span class="kc">False</span><span class="p">)</span>
|
||||
<span class="k">for</span> <span class="n">m</span> <span class="ow">in</span> <span class="p">[</span><span class="n">PCC</span><span class="p">(</span><span class="n">lr</span><span class="p">),</span> <span class="n">PACC</span><span class="p">(</span><span class="n">lr</span><span class="p">)]:</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">assertEqual</span><span class="p">(</span><span class="nb">isinstance</span><span class="p">(</span><span class="n">m</span><span class="p">,</span> <span class="n">AggregativeCrispQuantifier</span><span class="p">),</span> <span class="kc">False</span><span class="p">)</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">assertEqual</span><span class="p">(</span><span class="nb">isinstance</span><span class="p">(</span><span class="n">m</span><span class="p">,</span> <span class="n">AggregativeSoftQuantifier</span><span class="p">),</span> <span class="kc">True</span><span class="p">)</span></div>
|
||||
</div>
|
||||
|
||||
|
||||
|
||||
<span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s1">'__main__'</span><span class="p">:</span>
|
||||
<span class="n">unittest</span><span class="o">.</span><span class="n">main</span><span class="p">()</span>
|
||||
</pre></div>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
<footer>
|
||||
|
||||
<hr/>
|
||||
|
||||
<div role="contentinfo">
|
||||
<p>© Copyright 2024, Alejandro Moreo.</p>
|
||||
</div>
|
||||
|
||||
Built with <a href="https://www.sphinx-doc.org/">Sphinx</a> using a
|
||||
<a href="https://github.com/readthedocs/sphinx_rtd_theme">theme</a>
|
||||
provided by <a href="https://readthedocs.org">Read the Docs</a>.
|
||||
|
||||
|
||||
</footer>
|
||||
</div>
|
||||
</div>
|
||||
</section>
|
||||
</div>
|
||||
<script>
|
||||
jQuery(function () {
|
||||
SphinxRtdTheme.Navigation.enable(true);
|
||||
});
|
||||
</script>
|
||||
|
||||
</body>
|
||||
</html>
|
||||
|
|
@ -0,0 +1,176 @@
|
|||
<!DOCTYPE html>
|
||||
<html class="writer-html5" lang="en" data-content_root="../../../">
|
||||
<head>
|
||||
<meta charset="utf-8" />
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
|
||||
<title>quapy.tests.test_labelcollection — QuaPy: A Python-based open-source framework for quantification 0.1.8 documentation</title>
|
||||
<link rel="stylesheet" type="text/css" href="../../../_static/pygments.css?v=92fd9be5" />
|
||||
<link rel="stylesheet" type="text/css" href="../../../_static/css/theme.css?v=19f00094" />
|
||||
|
||||
|
||||
<!--[if lt IE 9]>
|
||||
<script src="../../../_static/js/html5shiv.min.js"></script>
|
||||
<![endif]-->
|
||||
|
||||
<script src="../../../_static/jquery.js?v=5d32c60e"></script>
|
||||
<script src="../../../_static/_sphinx_javascript_frameworks_compat.js?v=2cd50e6c"></script>
|
||||
<script src="../../../_static/documentation_options.js?v=22607128"></script>
|
||||
<script src="../../../_static/doctools.js?v=9a2dae69"></script>
|
||||
<script src="../../../_static/sphinx_highlight.js?v=dc90522c"></script>
|
||||
<script src="../../../_static/js/theme.js"></script>
|
||||
<link rel="index" title="Index" href="../../../genindex.html" />
|
||||
<link rel="search" title="Search" href="../../../search.html" />
|
||||
</head>
|
||||
|
||||
<body class="wy-body-for-nav">
|
||||
<div class="wy-grid-for-nav">
|
||||
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
|
||||
<div class="wy-side-scroll">
|
||||
<div class="wy-side-nav-search" >
|
||||
|
||||
|
||||
|
||||
<a href="../../../index.html" class="icon icon-home">
|
||||
QuaPy: A Python-based open-source framework for quantification
|
||||
</a>
|
||||
<div role="search">
|
||||
<form id="rtd-search-form" class="wy-form" action="../../../search.html" method="get">
|
||||
<input type="text" name="q" placeholder="Search docs" aria-label="Search docs" />
|
||||
<input type="hidden" name="check_keywords" value="yes" />
|
||||
<input type="hidden" name="area" value="default" />
|
||||
</form>
|
||||
</div>
|
||||
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../../../modules.html">quapy</a></li>
|
||||
</ul>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
</nav>
|
||||
|
||||
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" >
|
||||
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
|
||||
<a href="../../../index.html">QuaPy: A Python-based open-source framework for quantification</a>
|
||||
</nav>
|
||||
|
||||
<div class="wy-nav-content">
|
||||
<div class="rst-content">
|
||||
<div role="navigation" aria-label="Page navigation">
|
||||
<ul class="wy-breadcrumbs">
|
||||
<li><a href="../../../index.html" class="icon icon-home" aria-label="Home"></a></li>
|
||||
<li class="breadcrumb-item"><a href="../../index.html">Module code</a></li>
|
||||
<li class="breadcrumb-item active">quapy.tests.test_labelcollection</li>
|
||||
<li class="wy-breadcrumbs-aside">
|
||||
</li>
|
||||
</ul>
|
||||
<hr/>
|
||||
</div>
|
||||
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
|
||||
<div itemprop="articleBody">
|
||||
|
||||
<h1>Source code for quapy.tests.test_labelcollection</h1><div class="highlight"><pre>
|
||||
<span></span><span class="kn">import</span> <span class="nn">unittest</span>
|
||||
<span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span>
|
||||
<span class="kn">from</span> <span class="nn">scipy.sparse</span> <span class="kn">import</span> <span class="n">csr_matrix</span>
|
||||
|
||||
<span class="kn">import</span> <span class="nn">quapy</span> <span class="k">as</span> <span class="nn">qp</span>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="LabelCollectionTestCase">
|
||||
<a class="viewcode-back" href="../../../quapy.tests.html#quapy.tests.test_labelcollection.LabelCollectionTestCase">[docs]</a>
|
||||
<span class="k">class</span> <span class="nc">LabelCollectionTestCase</span><span class="p">(</span><span class="n">unittest</span><span class="o">.</span><span class="n">TestCase</span><span class="p">):</span>
|
||||
<div class="viewcode-block" id="LabelCollectionTestCase.test_split">
|
||||
<a class="viewcode-back" href="../../../quapy.tests.html#quapy.tests.test_labelcollection.LabelCollectionTestCase.test_split">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">test_split</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="n">x</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">arange</span><span class="p">(</span><span class="mi">100</span><span class="p">)</span>
|
||||
<span class="n">y</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">randint</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span><span class="mi">5</span><span class="p">,</span><span class="mi">100</span><span class="p">)</span>
|
||||
<span class="n">data</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">data</span><span class="o">.</span><span class="n">LabelledCollection</span><span class="p">(</span><span class="n">x</span><span class="p">,</span><span class="n">y</span><span class="p">)</span>
|
||||
<span class="n">tr</span><span class="p">,</span> <span class="n">te</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">split_random</span><span class="p">(</span><span class="mf">0.7</span><span class="p">)</span>
|
||||
<span class="n">check_prev</span> <span class="o">=</span> <span class="n">tr</span><span class="o">.</span><span class="n">prevalence</span><span class="p">()</span><span class="o">*</span><span class="mf">0.7</span> <span class="o">+</span> <span class="n">te</span><span class="o">.</span><span class="n">prevalence</span><span class="p">()</span><span class="o">*</span><span class="mf">0.3</span>
|
||||
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">assertEqual</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">tr</span><span class="p">),</span> <span class="mi">70</span><span class="p">)</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">assertEqual</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">te</span><span class="p">),</span> <span class="mi">30</span><span class="p">)</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">assertEqual</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">allclose</span><span class="p">(</span><span class="n">check_prev</span><span class="p">,</span> <span class="n">data</span><span class="o">.</span><span class="n">prevalence</span><span class="p">()),</span> <span class="kc">True</span><span class="p">)</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">assertEqual</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">tr</span><span class="o">+</span><span class="n">te</span><span class="p">),</span> <span class="nb">len</span><span class="p">(</span><span class="n">data</span><span class="p">))</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="LabelCollectionTestCase.test_join">
|
||||
<a class="viewcode-back" href="../../../quapy.tests.html#quapy.tests.test_labelcollection.LabelCollectionTestCase.test_join">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">test_join</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="n">x</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">arange</span><span class="p">(</span><span class="mi">50</span><span class="p">)</span>
|
||||
<span class="n">y</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">randint</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">50</span><span class="p">)</span>
|
||||
<span class="n">data1</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">data</span><span class="o">.</span><span class="n">LabelledCollection</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span>
|
||||
|
||||
<span class="n">x</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">arange</span><span class="p">(</span><span class="mi">200</span><span class="p">)</span>
|
||||
<span class="n">y</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">randint</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">200</span><span class="p">)</span>
|
||||
<span class="n">data2</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">data</span><span class="o">.</span><span class="n">LabelledCollection</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span>
|
||||
|
||||
<span class="n">x</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">arange</span><span class="p">(</span><span class="mi">100</span><span class="p">)</span>
|
||||
<span class="n">y</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">randint</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">6</span><span class="p">,</span> <span class="mi">100</span><span class="p">)</span>
|
||||
<span class="n">data3</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">data</span><span class="o">.</span><span class="n">LabelledCollection</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span>
|
||||
|
||||
<span class="n">combined</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">data</span><span class="o">.</span><span class="n">LabelledCollection</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">data1</span><span class="p">,</span> <span class="n">data2</span><span class="p">,</span> <span class="n">data3</span><span class="p">)</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">assertEqual</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">combined</span><span class="p">),</span> <span class="nb">len</span><span class="p">(</span><span class="n">data1</span><span class="p">)</span><span class="o">+</span><span class="nb">len</span><span class="p">(</span><span class="n">data2</span><span class="p">)</span><span class="o">+</span><span class="nb">len</span><span class="p">(</span><span class="n">data3</span><span class="p">))</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">assertEqual</span><span class="p">(</span><span class="nb">all</span><span class="p">(</span><span class="n">combined</span><span class="o">.</span><span class="n">classes_</span> <span class="o">==</span> <span class="n">np</span><span class="o">.</span><span class="n">arange</span><span class="p">(</span><span class="mi">6</span><span class="p">)),</span> <span class="kc">True</span><span class="p">)</span>
|
||||
|
||||
<span class="n">x</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">rand</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span> <span class="mi">3</span><span class="p">)</span>
|
||||
<span class="n">y</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">randint</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">10</span><span class="p">)</span>
|
||||
<span class="n">data4</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">data</span><span class="o">.</span><span class="n">LabelledCollection</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span>
|
||||
<span class="k">with</span> <span class="bp">self</span><span class="o">.</span><span class="n">assertRaises</span><span class="p">(</span><span class="ne">Exception</span><span class="p">):</span>
|
||||
<span class="n">combined</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">data</span><span class="o">.</span><span class="n">LabelledCollection</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">data1</span><span class="p">,</span> <span class="n">data2</span><span class="p">,</span> <span class="n">data3</span><span class="p">,</span> <span class="n">data4</span><span class="p">)</span>
|
||||
|
||||
<span class="n">x</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">rand</span><span class="p">(</span><span class="mi">20</span><span class="p">,</span> <span class="mi">3</span><span class="p">)</span>
|
||||
<span class="n">y</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">randint</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">20</span><span class="p">)</span>
|
||||
<span class="n">data5</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">data</span><span class="o">.</span><span class="n">LabelledCollection</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span>
|
||||
<span class="n">combined</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">data</span><span class="o">.</span><span class="n">LabelledCollection</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">data4</span><span class="p">,</span> <span class="n">data5</span><span class="p">)</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">assertEqual</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">combined</span><span class="p">),</span> <span class="nb">len</span><span class="p">(</span><span class="n">data4</span><span class="p">)</span><span class="o">+</span><span class="nb">len</span><span class="p">(</span><span class="n">data5</span><span class="p">))</span>
|
||||
|
||||
<span class="n">x</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">rand</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span> <span class="mi">4</span><span class="p">)</span>
|
||||
<span class="n">y</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">randint</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">10</span><span class="p">)</span>
|
||||
<span class="n">data6</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">data</span><span class="o">.</span><span class="n">LabelledCollection</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span>
|
||||
<span class="k">with</span> <span class="bp">self</span><span class="o">.</span><span class="n">assertRaises</span><span class="p">(</span><span class="ne">Exception</span><span class="p">):</span>
|
||||
<span class="n">combined</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">data</span><span class="o">.</span><span class="n">LabelledCollection</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">data4</span><span class="p">,</span> <span class="n">data5</span><span class="p">,</span> <span class="n">data6</span><span class="p">)</span>
|
||||
|
||||
<span class="n">data4</span><span class="o">.</span><span class="n">instances</span> <span class="o">=</span> <span class="n">csr_matrix</span><span class="p">(</span><span class="n">data4</span><span class="o">.</span><span class="n">instances</span><span class="p">)</span>
|
||||
<span class="k">with</span> <span class="bp">self</span><span class="o">.</span><span class="n">assertRaises</span><span class="p">(</span><span class="ne">Exception</span><span class="p">):</span>
|
||||
<span class="n">combined</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">data</span><span class="o">.</span><span class="n">LabelledCollection</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">data4</span><span class="p">,</span> <span class="n">data5</span><span class="p">)</span>
|
||||
<span class="n">data5</span><span class="o">.</span><span class="n">instances</span> <span class="o">=</span> <span class="n">csr_matrix</span><span class="p">(</span><span class="n">data5</span><span class="o">.</span><span class="n">instances</span><span class="p">)</span>
|
||||
<span class="n">combined</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">data</span><span class="o">.</span><span class="n">LabelledCollection</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">data4</span><span class="p">,</span> <span class="n">data5</span><span class="p">)</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">assertEqual</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">combined</span><span class="p">),</span> <span class="nb">len</span><span class="p">(</span><span class="n">data4</span><span class="p">)</span> <span class="o">+</span> <span class="nb">len</span><span class="p">(</span><span class="n">data5</span><span class="p">))</span></div>
|
||||
</div>
|
||||
|
||||
|
||||
|
||||
<span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s1">'__main__'</span><span class="p">:</span>
|
||||
<span class="n">unittest</span><span class="o">.</span><span class="n">main</span><span class="p">()</span>
|
||||
</pre></div>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
<footer>
|
||||
|
||||
<hr/>
|
||||
|
||||
<div role="contentinfo">
|
||||
<p>© Copyright 2024, Alejandro Moreo.</p>
|
||||
</div>
|
||||
|
||||
Built with <a href="https://www.sphinx-doc.org/">Sphinx</a> using a
|
||||
<a href="https://github.com/readthedocs/sphinx_rtd_theme">theme</a>
|
||||
provided by <a href="https://readthedocs.org">Read the Docs</a>.
|
||||
|
||||
|
||||
</footer>
|
||||
</div>
|
||||
</div>
|
||||
</section>
|
||||
</div>
|
||||
<script>
|
||||
jQuery(function () {
|
||||
SphinxRtdTheme.Navigation.enable(true);
|
||||
});
|
||||
</script>
|
||||
|
||||
</body>
|
||||
</html>
|
||||
|
|
@ -0,0 +1,357 @@
|
|||
<!DOCTYPE html>
|
||||
<html class="writer-html5" lang="en" data-content_root="../../../">
|
||||
<head>
|
||||
<meta charset="utf-8" />
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
|
||||
<title>quapy.tests.test_methods — QuaPy: A Python-based open-source framework for quantification 0.1.8 documentation</title>
|
||||
<link rel="stylesheet" type="text/css" href="../../../_static/pygments.css?v=92fd9be5" />
|
||||
<link rel="stylesheet" type="text/css" href="../../../_static/css/theme.css?v=19f00094" />
|
||||
|
||||
|
||||
<!--[if lt IE 9]>
|
||||
<script src="../../../_static/js/html5shiv.min.js"></script>
|
||||
<![endif]-->
|
||||
|
||||
<script src="../../../_static/jquery.js?v=5d32c60e"></script>
|
||||
<script src="../../../_static/_sphinx_javascript_frameworks_compat.js?v=2cd50e6c"></script>
|
||||
<script src="../../../_static/documentation_options.js?v=22607128"></script>
|
||||
<script src="../../../_static/doctools.js?v=9a2dae69"></script>
|
||||
<script src="../../../_static/sphinx_highlight.js?v=dc90522c"></script>
|
||||
<script src="../../../_static/js/theme.js"></script>
|
||||
<link rel="index" title="Index" href="../../../genindex.html" />
|
||||
<link rel="search" title="Search" href="../../../search.html" />
|
||||
</head>
|
||||
|
||||
<body class="wy-body-for-nav">
|
||||
<div class="wy-grid-for-nav">
|
||||
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
|
||||
<div class="wy-side-scroll">
|
||||
<div class="wy-side-nav-search" >
|
||||
|
||||
|
||||
|
||||
<a href="../../../index.html" class="icon icon-home">
|
||||
QuaPy: A Python-based open-source framework for quantification
|
||||
</a>
|
||||
<div role="search">
|
||||
<form id="rtd-search-form" class="wy-form" action="../../../search.html" method="get">
|
||||
<input type="text" name="q" placeholder="Search docs" aria-label="Search docs" />
|
||||
<input type="hidden" name="check_keywords" value="yes" />
|
||||
<input type="hidden" name="area" value="default" />
|
||||
</form>
|
||||
</div>
|
||||
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../../../modules.html">quapy</a></li>
|
||||
</ul>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
</nav>
|
||||
|
||||
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" >
|
||||
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
|
||||
<a href="../../../index.html">QuaPy: A Python-based open-source framework for quantification</a>
|
||||
</nav>
|
||||
|
||||
<div class="wy-nav-content">
|
||||
<div class="rst-content">
|
||||
<div role="navigation" aria-label="Page navigation">
|
||||
<ul class="wy-breadcrumbs">
|
||||
<li><a href="../../../index.html" class="icon icon-home" aria-label="Home"></a></li>
|
||||
<li class="breadcrumb-item"><a href="../../index.html">Module code</a></li>
|
||||
<li class="breadcrumb-item active">quapy.tests.test_methods</li>
|
||||
<li class="wy-breadcrumbs-aside">
|
||||
</li>
|
||||
</ul>
|
||||
<hr/>
|
||||
</div>
|
||||
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
|
||||
<div itemprop="articleBody">
|
||||
|
||||
<h1>Source code for quapy.tests.test_methods</h1><div class="highlight"><pre>
|
||||
<span></span><span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span>
|
||||
<span class="kn">import</span> <span class="nn">pytest</span>
|
||||
<span class="kn">from</span> <span class="nn">sklearn.linear_model</span> <span class="kn">import</span> <span class="n">LogisticRegression</span>
|
||||
<span class="kn">from</span> <span class="nn">sklearn.svm</span> <span class="kn">import</span> <span class="n">LinearSVC</span>
|
||||
|
||||
<span class="kn">import</span> <span class="nn">method.aggregative</span>
|
||||
<span class="kn">import</span> <span class="nn">quapy</span> <span class="k">as</span> <span class="nn">qp</span>
|
||||
<span class="kn">from</span> <span class="nn">quapy.model_selection</span> <span class="kn">import</span> <span class="n">GridSearchQ</span>
|
||||
<span class="kn">from</span> <span class="nn">quapy.method.base</span> <span class="kn">import</span> <span class="n">BinaryQuantifier</span>
|
||||
<span class="kn">from</span> <span class="nn">quapy.data</span> <span class="kn">import</span> <span class="n">Dataset</span><span class="p">,</span> <span class="n">LabelledCollection</span>
|
||||
<span class="kn">from</span> <span class="nn">quapy.method</span> <span class="kn">import</span> <span class="n">AGGREGATIVE_METHODS</span><span class="p">,</span> <span class="n">NON_AGGREGATIVE_METHODS</span>
|
||||
<span class="kn">from</span> <span class="nn">quapy.method.meta</span> <span class="kn">import</span> <span class="n">Ensemble</span>
|
||||
<span class="kn">from</span> <span class="nn">quapy.protocol</span> <span class="kn">import</span> <span class="n">APP</span>
|
||||
<span class="kn">from</span> <span class="nn">quapy.method.aggregative</span> <span class="kn">import</span> <span class="n">DMy</span>
|
||||
<span class="kn">from</span> <span class="nn">quapy.method.meta</span> <span class="kn">import</span> <span class="n">MedianEstimator</span>
|
||||
|
||||
<span class="c1"># datasets = [pytest.param(qp.datasets.fetch_twitter('hcr', pickle=True), id='hcr'),</span>
|
||||
<span class="c1"># pytest.param(qp.datasets.fetch_UCIDataset('ionosphere'), id='ionosphere')]</span>
|
||||
|
||||
<span class="n">tinydatasets</span> <span class="o">=</span> <span class="p">[</span><span class="n">pytest</span><span class="o">.</span><span class="n">param</span><span class="p">(</span><span class="n">qp</span><span class="o">.</span><span class="n">datasets</span><span class="o">.</span><span class="n">fetch_twitter</span><span class="p">(</span><span class="s1">'hcr'</span><span class="p">,</span> <span class="n">pickle</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span><span class="o">.</span><span class="n">reduce</span><span class="p">(),</span> <span class="nb">id</span><span class="o">=</span><span class="s1">'tiny_hcr'</span><span class="p">),</span>
|
||||
<span class="n">pytest</span><span class="o">.</span><span class="n">param</span><span class="p">(</span><span class="n">qp</span><span class="o">.</span><span class="n">datasets</span><span class="o">.</span><span class="n">fetch_UCIBinaryDataset</span><span class="p">(</span><span class="s1">'ionosphere'</span><span class="p">)</span><span class="o">.</span><span class="n">reduce</span><span class="p">(),</span> <span class="nb">id</span><span class="o">=</span><span class="s1">'tiny_ionosphere'</span><span class="p">)]</span>
|
||||
|
||||
<span class="n">learners</span> <span class="o">=</span> <span class="p">[</span><span class="n">LogisticRegression</span><span class="p">,</span> <span class="n">LinearSVC</span><span class="p">]</span>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="test_aggregative_methods">
|
||||
<a class="viewcode-back" href="../../../quapy.tests.html#quapy.tests.test_methods.test_aggregative_methods">[docs]</a>
|
||||
<span class="nd">@pytest</span><span class="o">.</span><span class="n">mark</span><span class="o">.</span><span class="n">parametrize</span><span class="p">(</span><span class="s1">'dataset'</span><span class="p">,</span> <span class="n">tinydatasets</span><span class="p">)</span>
|
||||
<span class="nd">@pytest</span><span class="o">.</span><span class="n">mark</span><span class="o">.</span><span class="n">parametrize</span><span class="p">(</span><span class="s1">'aggregative_method'</span><span class="p">,</span> <span class="n">AGGREGATIVE_METHODS</span><span class="p">)</span>
|
||||
<span class="nd">@pytest</span><span class="o">.</span><span class="n">mark</span><span class="o">.</span><span class="n">parametrize</span><span class="p">(</span><span class="s1">'learner'</span><span class="p">,</span> <span class="n">learners</span><span class="p">)</span>
|
||||
<span class="k">def</span> <span class="nf">test_aggregative_methods</span><span class="p">(</span><span class="n">dataset</span><span class="p">:</span> <span class="n">Dataset</span><span class="p">,</span> <span class="n">aggregative_method</span><span class="p">,</span> <span class="n">learner</span><span class="p">):</span>
|
||||
<span class="n">model</span> <span class="o">=</span> <span class="n">aggregative_method</span><span class="p">(</span><span class="n">learner</span><span class="p">())</span>
|
||||
|
||||
<span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">model</span><span class="p">,</span> <span class="n">BinaryQuantifier</span><span class="p">)</span> <span class="ow">and</span> <span class="ow">not</span> <span class="n">dataset</span><span class="o">.</span><span class="n">binary</span><span class="p">:</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s1">'skipping the test of binary model </span><span class="si">{</span><span class="nb">type</span><span class="p">(</span><span class="n">model</span><span class="p">)</span><span class="si">}</span><span class="s1"> on non-binary dataset </span><span class="si">{</span><span class="n">dataset</span><span class="si">}</span><span class="s1">'</span><span class="p">)</span>
|
||||
<span class="k">return</span>
|
||||
|
||||
<span class="n">model</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">dataset</span><span class="o">.</span><span class="n">training</span><span class="p">)</span>
|
||||
|
||||
<span class="n">estim_prevalences</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">quantify</span><span class="p">(</span><span class="n">dataset</span><span class="o">.</span><span class="n">test</span><span class="o">.</span><span class="n">instances</span><span class="p">)</span>
|
||||
|
||||
<span class="n">true_prevalences</span> <span class="o">=</span> <span class="n">dataset</span><span class="o">.</span><span class="n">test</span><span class="o">.</span><span class="n">prevalence</span><span class="p">()</span>
|
||||
<span class="n">error</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">error</span><span class="o">.</span><span class="n">mae</span><span class="p">(</span><span class="n">true_prevalences</span><span class="p">,</span> <span class="n">estim_prevalences</span><span class="p">)</span>
|
||||
|
||||
<span class="k">assert</span> <span class="nb">type</span><span class="p">(</span><span class="n">error</span><span class="p">)</span> <span class="o">==</span> <span class="n">np</span><span class="o">.</span><span class="n">float64</span></div>
|
||||
|
||||
|
||||
|
||||
<div class="viewcode-block" id="test_non_aggregative_methods">
|
||||
<a class="viewcode-back" href="../../../quapy.tests.html#quapy.tests.test_methods.test_non_aggregative_methods">[docs]</a>
|
||||
<span class="nd">@pytest</span><span class="o">.</span><span class="n">mark</span><span class="o">.</span><span class="n">parametrize</span><span class="p">(</span><span class="s1">'dataset'</span><span class="p">,</span> <span class="n">tinydatasets</span><span class="p">)</span>
|
||||
<span class="nd">@pytest</span><span class="o">.</span><span class="n">mark</span><span class="o">.</span><span class="n">parametrize</span><span class="p">(</span><span class="s1">'non_aggregative_method'</span><span class="p">,</span> <span class="n">NON_AGGREGATIVE_METHODS</span><span class="p">)</span>
|
||||
<span class="k">def</span> <span class="nf">test_non_aggregative_methods</span><span class="p">(</span><span class="n">dataset</span><span class="p">:</span> <span class="n">Dataset</span><span class="p">,</span> <span class="n">non_aggregative_method</span><span class="p">):</span>
|
||||
<span class="n">model</span> <span class="o">=</span> <span class="n">non_aggregative_method</span><span class="p">()</span>
|
||||
|
||||
<span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">model</span><span class="p">,</span> <span class="n">BinaryQuantifier</span><span class="p">)</span> <span class="ow">and</span> <span class="ow">not</span> <span class="n">dataset</span><span class="o">.</span><span class="n">binary</span><span class="p">:</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s1">'skipping the test of binary model </span><span class="si">{</span><span class="n">model</span><span class="si">}</span><span class="s1"> on non-binary dataset </span><span class="si">{</span><span class="n">dataset</span><span class="si">}</span><span class="s1">'</span><span class="p">)</span>
|
||||
<span class="k">return</span>
|
||||
|
||||
<span class="n">model</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">dataset</span><span class="o">.</span><span class="n">training</span><span class="p">)</span>
|
||||
|
||||
<span class="n">estim_prevalences</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">quantify</span><span class="p">(</span><span class="n">dataset</span><span class="o">.</span><span class="n">test</span><span class="o">.</span><span class="n">instances</span><span class="p">)</span>
|
||||
|
||||
<span class="n">true_prevalences</span> <span class="o">=</span> <span class="n">dataset</span><span class="o">.</span><span class="n">test</span><span class="o">.</span><span class="n">prevalence</span><span class="p">()</span>
|
||||
<span class="n">error</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">error</span><span class="o">.</span><span class="n">mae</span><span class="p">(</span><span class="n">true_prevalences</span><span class="p">,</span> <span class="n">estim_prevalences</span><span class="p">)</span>
|
||||
|
||||
<span class="k">assert</span> <span class="nb">type</span><span class="p">(</span><span class="n">error</span><span class="p">)</span> <span class="o">==</span> <span class="n">np</span><span class="o">.</span><span class="n">float64</span></div>
|
||||
|
||||
|
||||
|
||||
<div class="viewcode-block" id="test_ensemble_method">
|
||||
<a class="viewcode-back" href="../../../quapy.tests.html#quapy.tests.test_methods.test_ensemble_method">[docs]</a>
|
||||
<span class="nd">@pytest</span><span class="o">.</span><span class="n">mark</span><span class="o">.</span><span class="n">parametrize</span><span class="p">(</span><span class="s1">'base_method'</span><span class="p">,</span> <span class="p">[</span><span class="n">method</span><span class="o">.</span><span class="n">aggregative</span><span class="o">.</span><span class="n">ACC</span><span class="p">,</span> <span class="n">method</span><span class="o">.</span><span class="n">aggregative</span><span class="o">.</span><span class="n">PACC</span><span class="p">])</span>
|
||||
<span class="nd">@pytest</span><span class="o">.</span><span class="n">mark</span><span class="o">.</span><span class="n">parametrize</span><span class="p">(</span><span class="s1">'learner'</span><span class="p">,</span> <span class="p">[</span><span class="n">LogisticRegression</span><span class="p">])</span>
|
||||
<span class="nd">@pytest</span><span class="o">.</span><span class="n">mark</span><span class="o">.</span><span class="n">parametrize</span><span class="p">(</span><span class="s1">'dataset'</span><span class="p">,</span> <span class="n">tinydatasets</span><span class="p">)</span>
|
||||
<span class="nd">@pytest</span><span class="o">.</span><span class="n">mark</span><span class="o">.</span><span class="n">parametrize</span><span class="p">(</span><span class="s1">'policy'</span><span class="p">,</span> <span class="n">Ensemble</span><span class="o">.</span><span class="n">VALID_POLICIES</span><span class="p">)</span>
|
||||
<span class="k">def</span> <span class="nf">test_ensemble_method</span><span class="p">(</span><span class="n">base_method</span><span class="p">,</span> <span class="n">learner</span><span class="p">,</span> <span class="n">dataset</span><span class="p">:</span> <span class="n">Dataset</span><span class="p">,</span> <span class="n">policy</span><span class="p">):</span>
|
||||
|
||||
<span class="n">qp</span><span class="o">.</span><span class="n">environ</span><span class="p">[</span><span class="s1">'SAMPLE_SIZE'</span><span class="p">]</span> <span class="o">=</span> <span class="mi">20</span>
|
||||
|
||||
<span class="n">base_quantifier</span><span class="o">=</span><span class="n">base_method</span><span class="p">(</span><span class="n">learner</span><span class="p">())</span>
|
||||
|
||||
<span class="k">if</span> <span class="ow">not</span> <span class="n">dataset</span><span class="o">.</span><span class="n">binary</span> <span class="ow">and</span> <span class="n">policy</span><span class="o">==</span><span class="s1">'ds'</span><span class="p">:</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s1">'skipping the test of binary policy ds on non-binary dataset </span><span class="si">{</span><span class="n">dataset</span><span class="si">}</span><span class="s1">'</span><span class="p">)</span>
|
||||
<span class="k">return</span>
|
||||
|
||||
<span class="n">model</span> <span class="o">=</span> <span class="n">Ensemble</span><span class="p">(</span><span class="n">quantifier</span><span class="o">=</span><span class="n">base_quantifier</span><span class="p">,</span> <span class="n">size</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span> <span class="n">policy</span><span class="o">=</span><span class="n">policy</span><span class="p">,</span> <span class="n">n_jobs</span><span class="o">=-</span><span class="mi">1</span><span class="p">)</span>
|
||||
|
||||
<span class="n">model</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">dataset</span><span class="o">.</span><span class="n">training</span><span class="p">)</span>
|
||||
|
||||
<span class="n">estim_prevalences</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">quantify</span><span class="p">(</span><span class="n">dataset</span><span class="o">.</span><span class="n">test</span><span class="o">.</span><span class="n">instances</span><span class="p">)</span>
|
||||
|
||||
<span class="n">true_prevalences</span> <span class="o">=</span> <span class="n">dataset</span><span class="o">.</span><span class="n">test</span><span class="o">.</span><span class="n">prevalence</span><span class="p">()</span>
|
||||
<span class="n">error</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">error</span><span class="o">.</span><span class="n">mae</span><span class="p">(</span><span class="n">true_prevalences</span><span class="p">,</span> <span class="n">estim_prevalences</span><span class="p">)</span>
|
||||
|
||||
<span class="k">assert</span> <span class="nb">type</span><span class="p">(</span><span class="n">error</span><span class="p">)</span> <span class="o">==</span> <span class="n">np</span><span class="o">.</span><span class="n">float64</span></div>
|
||||
|
||||
|
||||
|
||||
<div class="viewcode-block" id="test_quanet_method">
|
||||
<a class="viewcode-back" href="../../../quapy.tests.html#quapy.tests.test_methods.test_quanet_method">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">test_quanet_method</span><span class="p">():</span>
|
||||
<span class="k">try</span><span class="p">:</span>
|
||||
<span class="kn">import</span> <span class="nn">quapy.classification.neural</span>
|
||||
<span class="k">except</span> <span class="ne">ModuleNotFoundError</span><span class="p">:</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="s1">'skipping QuaNet test due to missing torch package'</span><span class="p">)</span>
|
||||
<span class="k">return</span>
|
||||
|
||||
<span class="n">qp</span><span class="o">.</span><span class="n">environ</span><span class="p">[</span><span class="s1">'SAMPLE_SIZE'</span><span class="p">]</span> <span class="o">=</span> <span class="mi">100</span>
|
||||
|
||||
<span class="c1"># load the kindle dataset as text, and convert words to numerical indexes</span>
|
||||
<span class="n">dataset</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">datasets</span><span class="o">.</span><span class="n">fetch_reviews</span><span class="p">(</span><span class="s1">'kindle'</span><span class="p">,</span> <span class="n">pickle</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span><span class="o">.</span><span class="n">reduce</span><span class="p">(</span><span class="mi">200</span><span class="p">,</span> <span class="mi">200</span><span class="p">)</span>
|
||||
<span class="n">qp</span><span class="o">.</span><span class="n">data</span><span class="o">.</span><span class="n">preprocessing</span><span class="o">.</span><span class="n">index</span><span class="p">(</span><span class="n">dataset</span><span class="p">,</span> <span class="n">min_df</span><span class="o">=</span><span class="mi">5</span><span class="p">,</span> <span class="n">inplace</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
|
||||
|
||||
<span class="kn">from</span> <span class="nn">quapy.classification.neural</span> <span class="kn">import</span> <span class="n">CNNnet</span>
|
||||
<span class="n">cnn</span> <span class="o">=</span> <span class="n">CNNnet</span><span class="p">(</span><span class="n">dataset</span><span class="o">.</span><span class="n">vocabulary_size</span><span class="p">,</span> <span class="n">dataset</span><span class="o">.</span><span class="n">n_classes</span><span class="p">)</span>
|
||||
|
||||
<span class="kn">from</span> <span class="nn">quapy.classification.neural</span> <span class="kn">import</span> <span class="n">NeuralClassifierTrainer</span>
|
||||
<span class="n">learner</span> <span class="o">=</span> <span class="n">NeuralClassifierTrainer</span><span class="p">(</span><span class="n">cnn</span><span class="p">,</span> <span class="n">device</span><span class="o">=</span><span class="s1">'cuda'</span><span class="p">)</span>
|
||||
|
||||
<span class="kn">from</span> <span class="nn">quapy.method.meta</span> <span class="kn">import</span> <span class="n">QuaNet</span>
|
||||
<span class="n">model</span> <span class="o">=</span> <span class="n">QuaNet</span><span class="p">(</span><span class="n">learner</span><span class="p">,</span> <span class="n">device</span><span class="o">=</span><span class="s1">'cuda'</span><span class="p">)</span>
|
||||
|
||||
<span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">model</span><span class="p">,</span> <span class="n">BinaryQuantifier</span><span class="p">)</span> <span class="ow">and</span> <span class="ow">not</span> <span class="n">dataset</span><span class="o">.</span><span class="n">binary</span><span class="p">:</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s1">'skipping the test of binary model </span><span class="si">{</span><span class="n">model</span><span class="si">}</span><span class="s1"> on non-binary dataset </span><span class="si">{</span><span class="n">dataset</span><span class="si">}</span><span class="s1">'</span><span class="p">)</span>
|
||||
<span class="k">return</span>
|
||||
|
||||
<span class="n">model</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">dataset</span><span class="o">.</span><span class="n">training</span><span class="p">)</span>
|
||||
|
||||
<span class="n">estim_prevalences</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">quantify</span><span class="p">(</span><span class="n">dataset</span><span class="o">.</span><span class="n">test</span><span class="o">.</span><span class="n">instances</span><span class="p">)</span>
|
||||
|
||||
<span class="n">true_prevalences</span> <span class="o">=</span> <span class="n">dataset</span><span class="o">.</span><span class="n">test</span><span class="o">.</span><span class="n">prevalence</span><span class="p">()</span>
|
||||
<span class="n">error</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">error</span><span class="o">.</span><span class="n">mae</span><span class="p">(</span><span class="n">true_prevalences</span><span class="p">,</span> <span class="n">estim_prevalences</span><span class="p">)</span>
|
||||
|
||||
<span class="k">assert</span> <span class="nb">type</span><span class="p">(</span><span class="n">error</span><span class="p">)</span> <span class="o">==</span> <span class="n">np</span><span class="o">.</span><span class="n">float64</span></div>
|
||||
|
||||
|
||||
|
||||
<div class="viewcode-block" id="test_str_label_names">
|
||||
<a class="viewcode-back" href="../../../quapy.tests.html#quapy.tests.test_methods.test_str_label_names">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">test_str_label_names</span><span class="p">():</span>
|
||||
<span class="n">model</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">method</span><span class="o">.</span><span class="n">aggregative</span><span class="o">.</span><span class="n">CC</span><span class="p">(</span><span class="n">LogisticRegression</span><span class="p">())</span>
|
||||
|
||||
<span class="n">dataset</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">datasets</span><span class="o">.</span><span class="n">fetch_reviews</span><span class="p">(</span><span class="s1">'imdb'</span><span class="p">,</span> <span class="n">pickle</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
|
||||
<span class="n">dataset</span> <span class="o">=</span> <span class="n">Dataset</span><span class="p">(</span><span class="n">dataset</span><span class="o">.</span><span class="n">training</span><span class="o">.</span><span class="n">sampling</span><span class="p">(</span><span class="mi">1000</span><span class="p">,</span> <span class="o">*</span><span class="n">dataset</span><span class="o">.</span><span class="n">training</span><span class="o">.</span><span class="n">prevalence</span><span class="p">()),</span>
|
||||
<span class="n">dataset</span><span class="o">.</span><span class="n">test</span><span class="o">.</span><span class="n">sampling</span><span class="p">(</span><span class="mi">1000</span><span class="p">,</span> <span class="mf">0.25</span><span class="p">,</span> <span class="mf">0.75</span><span class="p">))</span>
|
||||
<span class="n">qp</span><span class="o">.</span><span class="n">data</span><span class="o">.</span><span class="n">preprocessing</span><span class="o">.</span><span class="n">text2tfidf</span><span class="p">(</span><span class="n">dataset</span><span class="p">,</span> <span class="n">min_df</span><span class="o">=</span><span class="mi">5</span><span class="p">,</span> <span class="n">inplace</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
|
||||
|
||||
<span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">seed</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
|
||||
<span class="n">model</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">dataset</span><span class="o">.</span><span class="n">training</span><span class="p">)</span>
|
||||
|
||||
<span class="n">int_estim_prevalences</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">quantify</span><span class="p">(</span><span class="n">dataset</span><span class="o">.</span><span class="n">test</span><span class="o">.</span><span class="n">instances</span><span class="p">)</span>
|
||||
<span class="n">true_prevalences</span> <span class="o">=</span> <span class="n">dataset</span><span class="o">.</span><span class="n">test</span><span class="o">.</span><span class="n">prevalence</span><span class="p">()</span>
|
||||
|
||||
<span class="n">error</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">error</span><span class="o">.</span><span class="n">mae</span><span class="p">(</span><span class="n">true_prevalences</span><span class="p">,</span> <span class="n">int_estim_prevalences</span><span class="p">)</span>
|
||||
<span class="k">assert</span> <span class="nb">type</span><span class="p">(</span><span class="n">error</span><span class="p">)</span> <span class="o">==</span> <span class="n">np</span><span class="o">.</span><span class="n">float64</span>
|
||||
|
||||
<span class="n">dataset_str</span> <span class="o">=</span> <span class="n">Dataset</span><span class="p">(</span><span class="n">LabelledCollection</span><span class="p">(</span><span class="n">dataset</span><span class="o">.</span><span class="n">training</span><span class="o">.</span><span class="n">instances</span><span class="p">,</span>
|
||||
<span class="p">[</span><span class="s1">'one'</span> <span class="k">if</span> <span class="n">label</span> <span class="o">==</span> <span class="mi">1</span> <span class="k">else</span> <span class="s1">'zero'</span> <span class="k">for</span> <span class="n">label</span> <span class="ow">in</span> <span class="n">dataset</span><span class="o">.</span><span class="n">training</span><span class="o">.</span><span class="n">labels</span><span class="p">]),</span>
|
||||
<span class="n">LabelledCollection</span><span class="p">(</span><span class="n">dataset</span><span class="o">.</span><span class="n">test</span><span class="o">.</span><span class="n">instances</span><span class="p">,</span>
|
||||
<span class="p">[</span><span class="s1">'one'</span> <span class="k">if</span> <span class="n">label</span> <span class="o">==</span> <span class="mi">1</span> <span class="k">else</span> <span class="s1">'zero'</span> <span class="k">for</span> <span class="n">label</span> <span class="ow">in</span> <span class="n">dataset</span><span class="o">.</span><span class="n">test</span><span class="o">.</span><span class="n">labels</span><span class="p">]))</span>
|
||||
<span class="k">assert</span> <span class="nb">all</span><span class="p">(</span><span class="n">dataset_str</span><span class="o">.</span><span class="n">training</span><span class="o">.</span><span class="n">classes_</span> <span class="o">==</span> <span class="n">dataset_str</span><span class="o">.</span><span class="n">test</span><span class="o">.</span><span class="n">classes_</span><span class="p">),</span> <span class="s1">'wrong indexation'</span>
|
||||
<span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">seed</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
|
||||
<span class="n">model</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">dataset_str</span><span class="o">.</span><span class="n">training</span><span class="p">)</span>
|
||||
|
||||
<span class="n">str_estim_prevalences</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">quantify</span><span class="p">(</span><span class="n">dataset_str</span><span class="o">.</span><span class="n">test</span><span class="o">.</span><span class="n">instances</span><span class="p">)</span>
|
||||
<span class="n">true_prevalences</span> <span class="o">=</span> <span class="n">dataset_str</span><span class="o">.</span><span class="n">test</span><span class="o">.</span><span class="n">prevalence</span><span class="p">()</span>
|
||||
|
||||
<span class="n">error</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">error</span><span class="o">.</span><span class="n">mae</span><span class="p">(</span><span class="n">true_prevalences</span><span class="p">,</span> <span class="n">str_estim_prevalences</span><span class="p">)</span>
|
||||
<span class="k">assert</span> <span class="nb">type</span><span class="p">(</span><span class="n">error</span><span class="p">)</span> <span class="o">==</span> <span class="n">np</span><span class="o">.</span><span class="n">float64</span>
|
||||
|
||||
<span class="nb">print</span><span class="p">(</span><span class="n">true_prevalences</span><span class="p">)</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="n">int_estim_prevalences</span><span class="p">)</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="n">str_estim_prevalences</span><span class="p">)</span>
|
||||
|
||||
<span class="n">np</span><span class="o">.</span><span class="n">testing</span><span class="o">.</span><span class="n">assert_almost_equal</span><span class="p">(</span><span class="n">int_estim_prevalences</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span>
|
||||
<span class="n">str_estim_prevalences</span><span class="p">[</span><span class="nb">list</span><span class="p">(</span><span class="n">model</span><span class="o">.</span><span class="n">classes_</span><span class="p">)</span><span class="o">.</span><span class="n">index</span><span class="p">(</span><span class="s1">'one'</span><span class="p">)])</span></div>
|
||||
|
||||
|
||||
<span class="c1"># helper</span>
|
||||
<span class="k">def</span> <span class="nf">__fit_test</span><span class="p">(</span><span class="n">quantifier</span><span class="p">,</span> <span class="n">train</span><span class="p">,</span> <span class="n">test</span><span class="p">):</span>
|
||||
<span class="n">quantifier</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">train</span><span class="p">)</span>
|
||||
<span class="n">test_samples</span> <span class="o">=</span> <span class="n">APP</span><span class="p">(</span><span class="n">test</span><span class="p">)</span>
|
||||
<span class="n">true_prevs</span><span class="p">,</span> <span class="n">estim_prevs</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">evaluation</span><span class="o">.</span><span class="n">prediction</span><span class="p">(</span><span class="n">quantifier</span><span class="p">,</span> <span class="n">test_samples</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="n">qp</span><span class="o">.</span><span class="n">error</span><span class="o">.</span><span class="n">mae</span><span class="p">(</span><span class="n">true_prevs</span><span class="p">,</span> <span class="n">estim_prevs</span><span class="p">),</span> <span class="n">estim_prevs</span>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="test_median_meta">
|
||||
<a class="viewcode-back" href="../../../quapy.tests.html#quapy.tests.test_methods.test_median_meta">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">test_median_meta</span><span class="p">():</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> This test compares the performance of the MedianQuantifier with respect to computing the median of the predictions</span>
|
||||
<span class="sd"> of a differently parameterized quantifier. We use the DistributionMatching base quantifier and the median is</span>
|
||||
<span class="sd"> computed across different values of nbins</span>
|
||||
<span class="sd"> """</span>
|
||||
|
||||
<span class="n">qp</span><span class="o">.</span><span class="n">environ</span><span class="p">[</span><span class="s1">'SAMPLE_SIZE'</span><span class="p">]</span> <span class="o">=</span> <span class="mi">100</span>
|
||||
|
||||
<span class="c1"># grid of values</span>
|
||||
<span class="n">nbins_grid</span> <span class="o">=</span> <span class="nb">list</span><span class="p">(</span><span class="nb">range</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="mi">11</span><span class="p">))</span>
|
||||
|
||||
<span class="n">dataset</span> <span class="o">=</span> <span class="s1">'kindle'</span>
|
||||
<span class="n">train</span><span class="p">,</span> <span class="n">test</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">datasets</span><span class="o">.</span><span class="n">fetch_reviews</span><span class="p">(</span><span class="n">dataset</span><span class="p">,</span> <span class="n">tfidf</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">min_df</span><span class="o">=</span><span class="mi">10</span><span class="p">)</span><span class="o">.</span><span class="n">train_test</span>
|
||||
<span class="n">prevs</span> <span class="o">=</span> <span class="p">[]</span>
|
||||
<span class="n">errors</span> <span class="o">=</span> <span class="p">[]</span>
|
||||
<span class="k">for</span> <span class="n">nbins</span> <span class="ow">in</span> <span class="n">nbins_grid</span><span class="p">:</span>
|
||||
<span class="k">with</span> <span class="n">qp</span><span class="o">.</span><span class="n">util</span><span class="o">.</span><span class="n">temp_seed</span><span class="p">(</span><span class="mi">0</span><span class="p">):</span>
|
||||
<span class="n">q</span> <span class="o">=</span> <span class="n">DMy</span><span class="p">(</span><span class="n">LogisticRegression</span><span class="p">(),</span> <span class="n">nbins</span><span class="o">=</span><span class="n">nbins</span><span class="p">)</span>
|
||||
<span class="n">mae</span><span class="p">,</span> <span class="n">estim_prevs</span> <span class="o">=</span> <span class="n">__fit_test</span><span class="p">(</span><span class="n">q</span><span class="p">,</span> <span class="n">train</span><span class="p">,</span> <span class="n">test</span><span class="p">)</span>
|
||||
<span class="n">prevs</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">estim_prevs</span><span class="p">)</span>
|
||||
<span class="n">errors</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">mae</span><span class="p">)</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s1">'</span><span class="si">{</span><span class="n">dataset</span><span class="si">}</span><span class="s1"> DistributionMatching(nbins=</span><span class="si">{</span><span class="n">nbins</span><span class="si">}</span><span class="s1">) got MAE </span><span class="si">{</span><span class="n">mae</span><span class="si">:</span><span class="s1">.4f</span><span class="si">}</span><span class="s1">'</span><span class="p">)</span>
|
||||
<span class="n">prevs</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">asarray</span><span class="p">(</span><span class="n">prevs</span><span class="p">)</span>
|
||||
<span class="n">mae</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">mean</span><span class="p">(</span><span class="n">errors</span><span class="p">)</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s1">'</span><span class="se">\t</span><span class="s1">MAE=</span><span class="si">{</span><span class="n">mae</span><span class="si">:</span><span class="s1">.4f</span><span class="si">}</span><span class="s1">'</span><span class="p">)</span>
|
||||
|
||||
<span class="n">q</span> <span class="o">=</span> <span class="n">DMy</span><span class="p">(</span><span class="n">LogisticRegression</span><span class="p">())</span>
|
||||
<span class="n">q</span> <span class="o">=</span> <span class="n">MedianEstimator</span><span class="p">(</span><span class="n">q</span><span class="p">,</span> <span class="n">param_grid</span><span class="o">=</span><span class="p">{</span><span class="s1">'nbins'</span><span class="p">:</span> <span class="n">nbins_grid</span><span class="p">},</span> <span class="n">random_state</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">n_jobs</span><span class="o">=-</span><span class="mi">1</span><span class="p">)</span>
|
||||
<span class="n">median_mae</span><span class="p">,</span> <span class="n">prev</span> <span class="o">=</span> <span class="n">__fit_test</span><span class="p">(</span><span class="n">q</span><span class="p">,</span> <span class="n">train</span><span class="p">,</span> <span class="n">test</span><span class="p">)</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s1">'</span><span class="se">\t</span><span class="s1">MAE=</span><span class="si">{</span><span class="n">median_mae</span><span class="si">:</span><span class="s1">.4f</span><span class="si">}</span><span class="s1">'</span><span class="p">)</span>
|
||||
|
||||
<span class="n">np</span><span class="o">.</span><span class="n">testing</span><span class="o">.</span><span class="n">assert_almost_equal</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">median</span><span class="p">(</span><span class="n">prevs</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="mi">0</span><span class="p">),</span> <span class="n">prev</span><span class="p">)</span>
|
||||
<span class="k">assert</span> <span class="n">median_mae</span> <span class="o"><</span> <span class="n">mae</span><span class="p">,</span> <span class="s1">'the median-based quantifier provided a higher error...'</span></div>
|
||||
|
||||
|
||||
|
||||
<div class="viewcode-block" id="test_median_meta_modsel">
|
||||
<a class="viewcode-back" href="../../../quapy.tests.html#quapy.tests.test_methods.test_median_meta_modsel">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">test_median_meta_modsel</span><span class="p">():</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> This test checks the median-meta quantifier with model selection</span>
|
||||
<span class="sd"> """</span>
|
||||
|
||||
<span class="n">qp</span><span class="o">.</span><span class="n">environ</span><span class="p">[</span><span class="s1">'SAMPLE_SIZE'</span><span class="p">]</span> <span class="o">=</span> <span class="mi">100</span>
|
||||
|
||||
<span class="n">dataset</span> <span class="o">=</span> <span class="s1">'kindle'</span>
|
||||
<span class="n">train</span><span class="p">,</span> <span class="n">test</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">datasets</span><span class="o">.</span><span class="n">fetch_reviews</span><span class="p">(</span><span class="n">dataset</span><span class="p">,</span> <span class="n">tfidf</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">min_df</span><span class="o">=</span><span class="mi">10</span><span class="p">)</span><span class="o">.</span><span class="n">train_test</span>
|
||||
<span class="n">train</span><span class="p">,</span> <span class="n">val</span> <span class="o">=</span> <span class="n">train</span><span class="o">.</span><span class="n">split_stratified</span><span class="p">(</span><span class="n">random_state</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
|
||||
|
||||
<span class="n">nbins_grid</span> <span class="o">=</span> <span class="p">[</span><span class="mi">2</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">10</span><span class="p">,</span> <span class="mi">15</span><span class="p">]</span>
|
||||
|
||||
<span class="n">q</span> <span class="o">=</span> <span class="n">DMy</span><span class="p">(</span><span class="n">LogisticRegression</span><span class="p">())</span>
|
||||
<span class="n">q</span> <span class="o">=</span> <span class="n">MedianEstimator</span><span class="p">(</span><span class="n">q</span><span class="p">,</span> <span class="n">param_grid</span><span class="o">=</span><span class="p">{</span><span class="s1">'nbins'</span><span class="p">:</span> <span class="n">nbins_grid</span><span class="p">},</span> <span class="n">random_state</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">n_jobs</span><span class="o">=-</span><span class="mi">1</span><span class="p">)</span>
|
||||
<span class="n">median_mae</span><span class="p">,</span> <span class="n">_</span> <span class="o">=</span> <span class="n">__fit_test</span><span class="p">(</span><span class="n">q</span><span class="p">,</span> <span class="n">train</span><span class="p">,</span> <span class="n">test</span><span class="p">)</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s1">'</span><span class="se">\t</span><span class="s1">MAE=</span><span class="si">{</span><span class="n">median_mae</span><span class="si">:</span><span class="s1">.4f</span><span class="si">}</span><span class="s1">'</span><span class="p">)</span>
|
||||
|
||||
<span class="n">q</span> <span class="o">=</span> <span class="n">DMy</span><span class="p">(</span><span class="n">LogisticRegression</span><span class="p">())</span>
|
||||
<span class="n">lr_params</span> <span class="o">=</span> <span class="p">{</span><span class="s1">'classifier__C'</span><span class="p">:</span> <span class="n">np</span><span class="o">.</span><span class="n">logspace</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">3</span><span class="p">)}</span>
|
||||
<span class="n">q</span> <span class="o">=</span> <span class="n">MedianEstimator</span><span class="p">(</span><span class="n">q</span><span class="p">,</span> <span class="n">param_grid</span><span class="o">=</span><span class="p">{</span><span class="s1">'nbins'</span><span class="p">:</span> <span class="n">nbins_grid</span><span class="p">},</span> <span class="n">random_state</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">n_jobs</span><span class="o">=-</span><span class="mi">1</span><span class="p">)</span>
|
||||
<span class="n">q</span> <span class="o">=</span> <span class="n">GridSearchQ</span><span class="p">(</span><span class="n">q</span><span class="p">,</span> <span class="n">param_grid</span><span class="o">=</span><span class="n">lr_params</span><span class="p">,</span> <span class="n">protocol</span><span class="o">=</span><span class="n">APP</span><span class="p">(</span><span class="n">val</span><span class="p">),</span> <span class="n">n_jobs</span><span class="o">=-</span><span class="mi">1</span><span class="p">)</span>
|
||||
<span class="n">optimized_median_ave</span><span class="p">,</span> <span class="n">_</span> <span class="o">=</span> <span class="n">__fit_test</span><span class="p">(</span><span class="n">q</span><span class="p">,</span> <span class="n">train</span><span class="p">,</span> <span class="n">test</span><span class="p">)</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s1">'</span><span class="se">\t</span><span class="s1">MAE=</span><span class="si">{</span><span class="n">optimized_median_ave</span><span class="si">:</span><span class="s1">.4f</span><span class="si">}</span><span class="s1">'</span><span class="p">)</span>
|
||||
|
||||
<span class="k">assert</span> <span class="n">optimized_median_ave</span> <span class="o"><</span> <span class="n">median_mae</span><span class="p">,</span> <span class="s2">"the optimized method yielded worse performance..."</span></div>
|
||||
|
||||
</pre></div>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
<footer>
|
||||
|
||||
<hr/>
|
||||
|
||||
<div role="contentinfo">
|
||||
<p>© Copyright 2024, Alejandro Moreo.</p>
|
||||
</div>
|
||||
|
||||
Built with <a href="https://www.sphinx-doc.org/">Sphinx</a> using a
|
||||
<a href="https://github.com/readthedocs/sphinx_rtd_theme">theme</a>
|
||||
provided by <a href="https://readthedocs.org">Read the Docs</a>.
|
||||
|
||||
|
||||
</footer>
|
||||
</div>
|
||||
</div>
|
||||
</section>
|
||||
</div>
|
||||
<script>
|
||||
jQuery(function () {
|
||||
SphinxRtdTheme.Navigation.enable(true);
|
||||
});
|
||||
</script>
|
||||
|
||||
</body>
|
||||
</html>
|
||||
|
|
@ -0,0 +1,225 @@
|
|||
<!DOCTYPE html>
|
||||
<html class="writer-html5" lang="en" data-content_root="../../../">
|
||||
<head>
|
||||
<meta charset="utf-8" />
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
|
||||
<title>quapy.tests.test_modsel — QuaPy: A Python-based open-source framework for quantification 0.1.8 documentation</title>
|
||||
<link rel="stylesheet" type="text/css" href="../../../_static/pygments.css?v=92fd9be5" />
|
||||
<link rel="stylesheet" type="text/css" href="../../../_static/css/theme.css?v=19f00094" />
|
||||
|
||||
|
||||
<!--[if lt IE 9]>
|
||||
<script src="../../../_static/js/html5shiv.min.js"></script>
|
||||
<![endif]-->
|
||||
|
||||
<script src="../../../_static/jquery.js?v=5d32c60e"></script>
|
||||
<script src="../../../_static/_sphinx_javascript_frameworks_compat.js?v=2cd50e6c"></script>
|
||||
<script src="../../../_static/documentation_options.js?v=22607128"></script>
|
||||
<script src="../../../_static/doctools.js?v=9a2dae69"></script>
|
||||
<script src="../../../_static/sphinx_highlight.js?v=dc90522c"></script>
|
||||
<script src="../../../_static/js/theme.js"></script>
|
||||
<link rel="index" title="Index" href="../../../genindex.html" />
|
||||
<link rel="search" title="Search" href="../../../search.html" />
|
||||
</head>
|
||||
|
||||
<body class="wy-body-for-nav">
|
||||
<div class="wy-grid-for-nav">
|
||||
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
|
||||
<div class="wy-side-scroll">
|
||||
<div class="wy-side-nav-search" >
|
||||
|
||||
|
||||
|
||||
<a href="../../../index.html" class="icon icon-home">
|
||||
QuaPy: A Python-based open-source framework for quantification
|
||||
</a>
|
||||
<div role="search">
|
||||
<form id="rtd-search-form" class="wy-form" action="../../../search.html" method="get">
|
||||
<input type="text" name="q" placeholder="Search docs" aria-label="Search docs" />
|
||||
<input type="hidden" name="check_keywords" value="yes" />
|
||||
<input type="hidden" name="area" value="default" />
|
||||
</form>
|
||||
</div>
|
||||
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../../../modules.html">quapy</a></li>
|
||||
</ul>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
</nav>
|
||||
|
||||
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" >
|
||||
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
|
||||
<a href="../../../index.html">QuaPy: A Python-based open-source framework for quantification</a>
|
||||
</nav>
|
||||
|
||||
<div class="wy-nav-content">
|
||||
<div class="rst-content">
|
||||
<div role="navigation" aria-label="Page navigation">
|
||||
<ul class="wy-breadcrumbs">
|
||||
<li><a href="../../../index.html" class="icon icon-home" aria-label="Home"></a></li>
|
||||
<li class="breadcrumb-item"><a href="../../index.html">Module code</a></li>
|
||||
<li class="breadcrumb-item active">quapy.tests.test_modsel</li>
|
||||
<li class="wy-breadcrumbs-aside">
|
||||
</li>
|
||||
</ul>
|
||||
<hr/>
|
||||
</div>
|
||||
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
|
||||
<div itemprop="articleBody">
|
||||
|
||||
<h1>Source code for quapy.tests.test_modsel</h1><div class="highlight"><pre>
|
||||
<span></span><span class="kn">import</span> <span class="nn">unittest</span>
|
||||
|
||||
<span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span>
|
||||
<span class="kn">from</span> <span class="nn">sklearn.linear_model</span> <span class="kn">import</span> <span class="n">LogisticRegression</span>
|
||||
<span class="kn">from</span> <span class="nn">sklearn.svm</span> <span class="kn">import</span> <span class="n">SVC</span>
|
||||
|
||||
<span class="kn">import</span> <span class="nn">quapy</span> <span class="k">as</span> <span class="nn">qp</span>
|
||||
<span class="kn">from</span> <span class="nn">quapy.method.aggregative</span> <span class="kn">import</span> <span class="n">PACC</span>
|
||||
<span class="kn">from</span> <span class="nn">quapy.model_selection</span> <span class="kn">import</span> <span class="n">GridSearchQ</span>
|
||||
<span class="kn">from</span> <span class="nn">quapy.protocol</span> <span class="kn">import</span> <span class="n">APP</span>
|
||||
<span class="kn">import</span> <span class="nn">time</span>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="ModselTestCase">
|
||||
<a class="viewcode-back" href="../../../quapy.tests.html#quapy.tests.test_modsel.ModselTestCase">[docs]</a>
|
||||
<span class="k">class</span> <span class="nc">ModselTestCase</span><span class="p">(</span><span class="n">unittest</span><span class="o">.</span><span class="n">TestCase</span><span class="p">):</span>
|
||||
|
||||
<div class="viewcode-block" id="ModselTestCase.test_modsel">
|
||||
<a class="viewcode-back" href="../../../quapy.tests.html#quapy.tests.test_modsel.ModselTestCase.test_modsel">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">test_modsel</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
|
||||
<span class="n">q</span> <span class="o">=</span> <span class="n">PACC</span><span class="p">(</span><span class="n">LogisticRegression</span><span class="p">(</span><span class="n">random_state</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">max_iter</span><span class="o">=</span><span class="mi">5000</span><span class="p">))</span>
|
||||
|
||||
<span class="n">data</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">datasets</span><span class="o">.</span><span class="n">fetch_reviews</span><span class="p">(</span><span class="s1">'imdb'</span><span class="p">,</span> <span class="n">tfidf</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">min_df</span><span class="o">=</span><span class="mi">10</span><span class="p">)</span>
|
||||
<span class="n">training</span><span class="p">,</span> <span class="n">validation</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">training</span><span class="o">.</span><span class="n">split_stratified</span><span class="p">(</span><span class="mf">0.7</span><span class="p">,</span> <span class="n">random_state</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
|
||||
|
||||
<span class="n">param_grid</span> <span class="o">=</span> <span class="p">{</span><span class="s1">'classifier__C'</span><span class="p">:</span> <span class="n">np</span><span class="o">.</span><span class="n">logspace</span><span class="p">(</span><span class="o">-</span><span class="mi">3</span><span class="p">,</span><span class="mi">3</span><span class="p">,</span><span class="mi">7</span><span class="p">)}</span>
|
||||
<span class="n">app</span> <span class="o">=</span> <span class="n">APP</span><span class="p">(</span><span class="n">validation</span><span class="p">,</span> <span class="n">sample_size</span><span class="o">=</span><span class="mi">100</span><span class="p">,</span> <span class="n">random_state</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
|
||||
<span class="n">q</span> <span class="o">=</span> <span class="n">GridSearchQ</span><span class="p">(</span>
|
||||
<span class="n">q</span><span class="p">,</span> <span class="n">param_grid</span><span class="p">,</span> <span class="n">protocol</span><span class="o">=</span><span class="n">app</span><span class="p">,</span> <span class="n">error</span><span class="o">=</span><span class="s1">'mae'</span><span class="p">,</span> <span class="n">refit</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">timeout</span><span class="o">=-</span><span class="mi">1</span><span class="p">,</span> <span class="n">verbose</span><span class="o">=</span><span class="kc">True</span>
|
||||
<span class="p">)</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">training</span><span class="p">)</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="s1">'best params'</span><span class="p">,</span> <span class="n">q</span><span class="o">.</span><span class="n">best_params_</span><span class="p">)</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="s1">'best score'</span><span class="p">,</span> <span class="n">q</span><span class="o">.</span><span class="n">best_score_</span><span class="p">)</span>
|
||||
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">assertEqual</span><span class="p">(</span><span class="n">q</span><span class="o">.</span><span class="n">best_params_</span><span class="p">[</span><span class="s1">'classifier__C'</span><span class="p">],</span> <span class="mf">10.0</span><span class="p">)</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">assertEqual</span><span class="p">(</span><span class="n">q</span><span class="o">.</span><span class="n">best_model</span><span class="p">()</span><span class="o">.</span><span class="n">get_params</span><span class="p">()[</span><span class="s1">'classifier__C'</span><span class="p">],</span> <span class="mf">10.0</span><span class="p">)</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="ModselTestCase.test_modsel_parallel">
|
||||
<a class="viewcode-back" href="../../../quapy.tests.html#quapy.tests.test_modsel.ModselTestCase.test_modsel_parallel">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">test_modsel_parallel</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
|
||||
<span class="n">q</span> <span class="o">=</span> <span class="n">PACC</span><span class="p">(</span><span class="n">LogisticRegression</span><span class="p">(</span><span class="n">random_state</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">max_iter</span><span class="o">=</span><span class="mi">5000</span><span class="p">))</span>
|
||||
|
||||
<span class="n">data</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">datasets</span><span class="o">.</span><span class="n">fetch_reviews</span><span class="p">(</span><span class="s1">'imdb'</span><span class="p">,</span> <span class="n">tfidf</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">min_df</span><span class="o">=</span><span class="mi">10</span><span class="p">)</span>
|
||||
<span class="n">training</span><span class="p">,</span> <span class="n">validation</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">training</span><span class="o">.</span><span class="n">split_stratified</span><span class="p">(</span><span class="mf">0.7</span><span class="p">,</span> <span class="n">random_state</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
|
||||
<span class="c1"># test = data.test</span>
|
||||
|
||||
<span class="n">param_grid</span> <span class="o">=</span> <span class="p">{</span><span class="s1">'classifier__C'</span><span class="p">:</span> <span class="n">np</span><span class="o">.</span><span class="n">logspace</span><span class="p">(</span><span class="o">-</span><span class="mi">3</span><span class="p">,</span><span class="mi">3</span><span class="p">,</span><span class="mi">7</span><span class="p">)}</span>
|
||||
<span class="n">app</span> <span class="o">=</span> <span class="n">APP</span><span class="p">(</span><span class="n">validation</span><span class="p">,</span> <span class="n">sample_size</span><span class="o">=</span><span class="mi">100</span><span class="p">,</span> <span class="n">random_state</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
|
||||
<span class="n">q</span> <span class="o">=</span> <span class="n">GridSearchQ</span><span class="p">(</span>
|
||||
<span class="n">q</span><span class="p">,</span> <span class="n">param_grid</span><span class="p">,</span> <span class="n">protocol</span><span class="o">=</span><span class="n">app</span><span class="p">,</span> <span class="n">error</span><span class="o">=</span><span class="s1">'mae'</span><span class="p">,</span> <span class="n">refit</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">timeout</span><span class="o">=-</span><span class="mi">1</span><span class="p">,</span> <span class="n">n_jobs</span><span class="o">=-</span><span class="mi">1</span><span class="p">,</span> <span class="n">verbose</span><span class="o">=</span><span class="kc">True</span>
|
||||
<span class="p">)</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">training</span><span class="p">)</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="s1">'best params'</span><span class="p">,</span> <span class="n">q</span><span class="o">.</span><span class="n">best_params_</span><span class="p">)</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="s1">'best score'</span><span class="p">,</span> <span class="n">q</span><span class="o">.</span><span class="n">best_score_</span><span class="p">)</span>
|
||||
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">assertEqual</span><span class="p">(</span><span class="n">q</span><span class="o">.</span><span class="n">best_params_</span><span class="p">[</span><span class="s1">'classifier__C'</span><span class="p">],</span> <span class="mf">10.0</span><span class="p">)</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">assertEqual</span><span class="p">(</span><span class="n">q</span><span class="o">.</span><span class="n">best_model</span><span class="p">()</span><span class="o">.</span><span class="n">get_params</span><span class="p">()[</span><span class="s1">'classifier__C'</span><span class="p">],</span> <span class="mf">10.0</span><span class="p">)</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="ModselTestCase.test_modsel_parallel_speedup">
|
||||
<a class="viewcode-back" href="../../../quapy.tests.html#quapy.tests.test_modsel.ModselTestCase.test_modsel_parallel_speedup">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">test_modsel_parallel_speedup</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="k">class</span> <span class="nc">SlowLR</span><span class="p">(</span><span class="n">LogisticRegression</span><span class="p">):</span>
|
||||
<span class="k">def</span> <span class="nf">fit</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">X</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="n">sample_weight</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
|
||||
<span class="n">time</span><span class="o">.</span><span class="n">sleep</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="nb">super</span><span class="p">(</span><span class="n">SlowLR</span><span class="p">,</span> <span class="bp">self</span><span class="p">)</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="n">sample_weight</span><span class="p">)</span>
|
||||
|
||||
<span class="n">q</span> <span class="o">=</span> <span class="n">PACC</span><span class="p">(</span><span class="n">SlowLR</span><span class="p">(</span><span class="n">random_state</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">max_iter</span><span class="o">=</span><span class="mi">5000</span><span class="p">))</span>
|
||||
|
||||
<span class="n">data</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">datasets</span><span class="o">.</span><span class="n">fetch_reviews</span><span class="p">(</span><span class="s1">'imdb'</span><span class="p">,</span> <span class="n">tfidf</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">min_df</span><span class="o">=</span><span class="mi">10</span><span class="p">)</span>
|
||||
<span class="n">training</span><span class="p">,</span> <span class="n">validation</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">training</span><span class="o">.</span><span class="n">split_stratified</span><span class="p">(</span><span class="mf">0.7</span><span class="p">,</span> <span class="n">random_state</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
|
||||
|
||||
<span class="n">param_grid</span> <span class="o">=</span> <span class="p">{</span><span class="s1">'classifier__C'</span><span class="p">:</span> <span class="n">np</span><span class="o">.</span><span class="n">logspace</span><span class="p">(</span><span class="o">-</span><span class="mi">3</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">7</span><span class="p">)}</span>
|
||||
<span class="n">app</span> <span class="o">=</span> <span class="n">APP</span><span class="p">(</span><span class="n">validation</span><span class="p">,</span> <span class="n">sample_size</span><span class="o">=</span><span class="mi">100</span><span class="p">,</span> <span class="n">random_state</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
|
||||
|
||||
<span class="n">tinit</span> <span class="o">=</span> <span class="n">time</span><span class="o">.</span><span class="n">time</span><span class="p">()</span>
|
||||
<span class="n">GridSearchQ</span><span class="p">(</span>
|
||||
<span class="n">q</span><span class="p">,</span> <span class="n">param_grid</span><span class="p">,</span> <span class="n">protocol</span><span class="o">=</span><span class="n">app</span><span class="p">,</span> <span class="n">error</span><span class="o">=</span><span class="s1">'mae'</span><span class="p">,</span> <span class="n">refit</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span> <span class="n">timeout</span><span class="o">=-</span><span class="mi">1</span><span class="p">,</span> <span class="n">n_jobs</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">verbose</span><span class="o">=</span><span class="kc">True</span>
|
||||
<span class="p">)</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">training</span><span class="p">)</span>
|
||||
<span class="n">tend_nooptim</span> <span class="o">=</span> <span class="n">time</span><span class="o">.</span><span class="n">time</span><span class="p">()</span><span class="o">-</span><span class="n">tinit</span>
|
||||
|
||||
<span class="n">tinit</span> <span class="o">=</span> <span class="n">time</span><span class="o">.</span><span class="n">time</span><span class="p">()</span>
|
||||
<span class="n">GridSearchQ</span><span class="p">(</span>
|
||||
<span class="n">q</span><span class="p">,</span> <span class="n">param_grid</span><span class="p">,</span> <span class="n">protocol</span><span class="o">=</span><span class="n">app</span><span class="p">,</span> <span class="n">error</span><span class="o">=</span><span class="s1">'mae'</span><span class="p">,</span> <span class="n">refit</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span> <span class="n">timeout</span><span class="o">=-</span><span class="mi">1</span><span class="p">,</span> <span class="n">n_jobs</span><span class="o">=-</span><span class="mi">1</span><span class="p">,</span> <span class="n">verbose</span><span class="o">=</span><span class="kc">True</span>
|
||||
<span class="p">)</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">training</span><span class="p">)</span>
|
||||
<span class="n">tend_optim</span> <span class="o">=</span> <span class="n">time</span><span class="o">.</span><span class="n">time</span><span class="p">()</span> <span class="o">-</span> <span class="n">tinit</span>
|
||||
|
||||
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s1">'parallel training took </span><span class="si">{</span><span class="n">tend_optim</span><span class="si">:</span><span class="s1">.4f</span><span class="si">}</span><span class="s1">s'</span><span class="p">)</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s1">'sequential training took </span><span class="si">{</span><span class="n">tend_nooptim</span><span class="si">:</span><span class="s1">.4f</span><span class="si">}</span><span class="s1">s'</span><span class="p">)</span>
|
||||
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">assertEqual</span><span class="p">(</span><span class="n">tend_optim</span> <span class="o"><</span> <span class="p">(</span><span class="mf">0.5</span><span class="o">*</span><span class="n">tend_nooptim</span><span class="p">),</span> <span class="kc">True</span><span class="p">)</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="ModselTestCase.test_modsel_timeout">
|
||||
<a class="viewcode-back" href="../../../quapy.tests.html#quapy.tests.test_modsel.ModselTestCase.test_modsel_timeout">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">test_modsel_timeout</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
|
||||
<span class="k">class</span> <span class="nc">SlowLR</span><span class="p">(</span><span class="n">LogisticRegression</span><span class="p">):</span>
|
||||
<span class="k">def</span> <span class="nf">fit</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">X</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="n">sample_weight</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
|
||||
<span class="kn">import</span> <span class="nn">time</span>
|
||||
<span class="n">time</span><span class="o">.</span><span class="n">sleep</span><span class="p">(</span><span class="mi">10</span><span class="p">)</span>
|
||||
<span class="nb">super</span><span class="p">(</span><span class="n">SlowLR</span><span class="p">,</span> <span class="bp">self</span><span class="p">)</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="n">sample_weight</span><span class="p">)</span>
|
||||
|
||||
<span class="n">q</span> <span class="o">=</span> <span class="n">PACC</span><span class="p">(</span><span class="n">SlowLR</span><span class="p">())</span>
|
||||
|
||||
<span class="n">data</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">datasets</span><span class="o">.</span><span class="n">fetch_reviews</span><span class="p">(</span><span class="s1">'imdb'</span><span class="p">,</span> <span class="n">tfidf</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">min_df</span><span class="o">=</span><span class="mi">10</span><span class="p">)</span>
|
||||
<span class="n">training</span><span class="p">,</span> <span class="n">validation</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">training</span><span class="o">.</span><span class="n">split_stratified</span><span class="p">(</span><span class="mf">0.7</span><span class="p">,</span> <span class="n">random_state</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
|
||||
<span class="c1"># test = data.test</span>
|
||||
|
||||
<span class="n">param_grid</span> <span class="o">=</span> <span class="p">{</span><span class="s1">'classifier__C'</span><span class="p">:</span> <span class="n">np</span><span class="o">.</span><span class="n">logspace</span><span class="p">(</span><span class="o">-</span><span class="mi">3</span><span class="p">,</span><span class="mi">3</span><span class="p">,</span><span class="mi">7</span><span class="p">)}</span>
|
||||
<span class="n">app</span> <span class="o">=</span> <span class="n">APP</span><span class="p">(</span><span class="n">validation</span><span class="p">,</span> <span class="n">sample_size</span><span class="o">=</span><span class="mi">100</span><span class="p">,</span> <span class="n">random_state</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
|
||||
<span class="n">q</span> <span class="o">=</span> <span class="n">GridSearchQ</span><span class="p">(</span>
|
||||
<span class="n">q</span><span class="p">,</span> <span class="n">param_grid</span><span class="p">,</span> <span class="n">protocol</span><span class="o">=</span><span class="n">app</span><span class="p">,</span> <span class="n">error</span><span class="o">=</span><span class="s1">'mae'</span><span class="p">,</span> <span class="n">refit</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">timeout</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span> <span class="n">n_jobs</span><span class="o">=-</span><span class="mi">1</span><span class="p">,</span> <span class="n">verbose</span><span class="o">=</span><span class="kc">True</span>
|
||||
<span class="p">)</span>
|
||||
<span class="k">with</span> <span class="bp">self</span><span class="o">.</span><span class="n">assertRaises</span><span class="p">(</span><span class="ne">TimeoutError</span><span class="p">):</span>
|
||||
<span class="n">q</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">training</span><span class="p">)</span></div>
|
||||
</div>
|
||||
|
||||
|
||||
|
||||
<span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s1">'__main__'</span><span class="p">:</span>
|
||||
<span class="n">unittest</span><span class="o">.</span><span class="n">main</span><span class="p">()</span>
|
||||
</pre></div>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
<footer>
|
||||
|
||||
<hr/>
|
||||
|
||||
<div role="contentinfo">
|
||||
<p>© Copyright 2024, Alejandro Moreo.</p>
|
||||
</div>
|
||||
|
||||
Built with <a href="https://www.sphinx-doc.org/">Sphinx</a> using a
|
||||
<a href="https://github.com/readthedocs/sphinx_rtd_theme">theme</a>
|
||||
provided by <a href="https://readthedocs.org">Read the Docs</a>.
|
||||
|
||||
|
||||
</footer>
|
||||
</div>
|
||||
</div>
|
||||
</section>
|
||||
</div>
|
||||
<script>
|
||||
jQuery(function () {
|
||||
SphinxRtdTheme.Navigation.enable(true);
|
||||
});
|
||||
</script>
|
||||
|
||||
</body>
|
||||
</html>
|
||||
|
|
@ -0,0 +1,336 @@
|
|||
<!DOCTYPE html>
|
||||
<html class="writer-html5" lang="en" data-content_root="../../../">
|
||||
<head>
|
||||
<meta charset="utf-8" />
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
|
||||
<title>quapy.tests.test_protocols — QuaPy: A Python-based open-source framework for quantification 0.1.8 documentation</title>
|
||||
<link rel="stylesheet" type="text/css" href="../../../_static/pygments.css?v=92fd9be5" />
|
||||
<link rel="stylesheet" type="text/css" href="../../../_static/css/theme.css?v=19f00094" />
|
||||
|
||||
|
||||
<!--[if lt IE 9]>
|
||||
<script src="../../../_static/js/html5shiv.min.js"></script>
|
||||
<![endif]-->
|
||||
|
||||
<script src="../../../_static/jquery.js?v=5d32c60e"></script>
|
||||
<script src="../../../_static/_sphinx_javascript_frameworks_compat.js?v=2cd50e6c"></script>
|
||||
<script src="../../../_static/documentation_options.js?v=22607128"></script>
|
||||
<script src="../../../_static/doctools.js?v=9a2dae69"></script>
|
||||
<script src="../../../_static/sphinx_highlight.js?v=dc90522c"></script>
|
||||
<script src="../../../_static/js/theme.js"></script>
|
||||
<link rel="index" title="Index" href="../../../genindex.html" />
|
||||
<link rel="search" title="Search" href="../../../search.html" />
|
||||
</head>
|
||||
|
||||
<body class="wy-body-for-nav">
|
||||
<div class="wy-grid-for-nav">
|
||||
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
|
||||
<div class="wy-side-scroll">
|
||||
<div class="wy-side-nav-search" >
|
||||
|
||||
|
||||
|
||||
<a href="../../../index.html" class="icon icon-home">
|
||||
QuaPy: A Python-based open-source framework for quantification
|
||||
</a>
|
||||
<div role="search">
|
||||
<form id="rtd-search-form" class="wy-form" action="../../../search.html" method="get">
|
||||
<input type="text" name="q" placeholder="Search docs" aria-label="Search docs" />
|
||||
<input type="hidden" name="check_keywords" value="yes" />
|
||||
<input type="hidden" name="area" value="default" />
|
||||
</form>
|
||||
</div>
|
||||
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../../../modules.html">quapy</a></li>
|
||||
</ul>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
</nav>
|
||||
|
||||
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" >
|
||||
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
|
||||
<a href="../../../index.html">QuaPy: A Python-based open-source framework for quantification</a>
|
||||
</nav>
|
||||
|
||||
<div class="wy-nav-content">
|
||||
<div class="rst-content">
|
||||
<div role="navigation" aria-label="Page navigation">
|
||||
<ul class="wy-breadcrumbs">
|
||||
<li><a href="../../../index.html" class="icon icon-home" aria-label="Home"></a></li>
|
||||
<li class="breadcrumb-item"><a href="../../index.html">Module code</a></li>
|
||||
<li class="breadcrumb-item active">quapy.tests.test_protocols</li>
|
||||
<li class="wy-breadcrumbs-aside">
|
||||
</li>
|
||||
</ul>
|
||||
<hr/>
|
||||
</div>
|
||||
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
|
||||
<div itemprop="articleBody">
|
||||
|
||||
<h1>Source code for quapy.tests.test_protocols</h1><div class="highlight"><pre>
|
||||
<span></span><span class="kn">import</span> <span class="nn">unittest</span>
|
||||
<span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span>
|
||||
|
||||
<span class="kn">import</span> <span class="nn">quapy.functional</span>
|
||||
<span class="kn">from</span> <span class="nn">quapy.data</span> <span class="kn">import</span> <span class="n">LabelledCollection</span>
|
||||
<span class="kn">from</span> <span class="nn">quapy.protocol</span> <span class="kn">import</span> <span class="n">APP</span><span class="p">,</span> <span class="n">NPP</span><span class="p">,</span> <span class="n">UPP</span><span class="p">,</span> <span class="n">DomainMixer</span><span class="p">,</span> <span class="n">AbstractStochasticSeededProtocol</span>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="mock_labelled_collection">
|
||||
<a class="viewcode-back" href="../../../quapy.tests.html#quapy.tests.test_protocols.mock_labelled_collection">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">mock_labelled_collection</span><span class="p">(</span><span class="n">prefix</span><span class="o">=</span><span class="s1">''</span><span class="p">):</span>
|
||||
<span class="n">y</span> <span class="o">=</span> <span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">*</span> <span class="mi">250</span> <span class="o">+</span> <span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">*</span> <span class="mi">250</span> <span class="o">+</span> <span class="p">[</span><span class="mi">2</span><span class="p">]</span> <span class="o">*</span> <span class="mi">250</span> <span class="o">+</span> <span class="p">[</span><span class="mi">3</span><span class="p">]</span> <span class="o">*</span> <span class="mi">250</span>
|
||||
<span class="n">X</span> <span class="o">=</span> <span class="p">[</span><span class="n">prefix</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">i</span><span class="p">)</span> <span class="o">+</span> <span class="s1">'-'</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">yi</span><span class="p">)</span> <span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">yi</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">y</span><span class="p">)]</span>
|
||||
<span class="k">return</span> <span class="n">LabelledCollection</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="n">classes</span><span class="o">=</span><span class="nb">sorted</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">unique</span><span class="p">(</span><span class="n">y</span><span class="p">)))</span></div>
|
||||
|
||||
|
||||
|
||||
<div class="viewcode-block" id="samples_to_str">
|
||||
<a class="viewcode-back" href="../../../quapy.tests.html#quapy.tests.test_protocols.samples_to_str">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">samples_to_str</span><span class="p">(</span><span class="n">protocol</span><span class="p">):</span>
|
||||
<span class="n">samples_str</span> <span class="o">=</span> <span class="s2">""</span>
|
||||
<span class="k">for</span> <span class="n">instances</span><span class="p">,</span> <span class="n">prev</span> <span class="ow">in</span> <span class="n">protocol</span><span class="p">():</span>
|
||||
<span class="n">samples_str</span> <span class="o">+=</span> <span class="sa">f</span><span class="s1">'</span><span class="si">{</span><span class="n">instances</span><span class="si">}</span><span class="se">\t</span><span class="si">{</span><span class="n">prev</span><span class="si">}</span><span class="se">\n</span><span class="s1">'</span>
|
||||
<span class="k">return</span> <span class="n">samples_str</span></div>
|
||||
|
||||
|
||||
|
||||
<div class="viewcode-block" id="TestProtocols">
|
||||
<a class="viewcode-back" href="../../../quapy.tests.html#quapy.tests.test_protocols.TestProtocols">[docs]</a>
|
||||
<span class="k">class</span> <span class="nc">TestProtocols</span><span class="p">(</span><span class="n">unittest</span><span class="o">.</span><span class="n">TestCase</span><span class="p">):</span>
|
||||
|
||||
<div class="viewcode-block" id="TestProtocols.test_app_sanity_check">
|
||||
<a class="viewcode-back" href="../../../quapy.tests.html#quapy.tests.test_protocols.TestProtocols.test_app_sanity_check">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">test_app_sanity_check</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="n">data</span> <span class="o">=</span> <span class="n">mock_labelled_collection</span><span class="p">()</span>
|
||||
<span class="n">n_prevpoints</span> <span class="o">=</span> <span class="mi">101</span>
|
||||
<span class="n">repeats</span> <span class="o">=</span> <span class="mi">10</span>
|
||||
<span class="k">with</span> <span class="bp">self</span><span class="o">.</span><span class="n">assertRaises</span><span class="p">(</span><span class="ne">RuntimeError</span><span class="p">):</span>
|
||||
<span class="n">p</span> <span class="o">=</span> <span class="n">APP</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="n">sample_size</span><span class="o">=</span><span class="mi">5</span><span class="p">,</span> <span class="n">n_prevalences</span><span class="o">=</span><span class="n">n_prevpoints</span><span class="p">,</span> <span class="n">repeats</span><span class="o">=</span><span class="n">repeats</span><span class="p">,</span> <span class="n">random_state</span><span class="o">=</span><span class="mi">42</span><span class="p">)</span>
|
||||
<span class="n">n_combinations</span> <span class="o">=</span> \
|
||||
<span class="n">quapy</span><span class="o">.</span><span class="n">functional</span><span class="o">.</span><span class="n">num_prevalence_combinations</span><span class="p">(</span><span class="n">n_prevpoints</span><span class="p">,</span> <span class="n">n_classes</span><span class="o">=</span><span class="n">data</span><span class="o">.</span><span class="n">n_classes</span><span class="p">,</span> <span class="n">n_repeats</span><span class="o">=</span><span class="n">repeats</span><span class="p">)</span>
|
||||
<span class="n">p</span> <span class="o">=</span> <span class="n">APP</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="n">sample_size</span><span class="o">=</span><span class="mi">5</span><span class="p">,</span> <span class="n">n_prevalences</span><span class="o">=</span><span class="n">n_prevpoints</span><span class="p">,</span> <span class="n">random_state</span><span class="o">=</span><span class="mi">42</span><span class="p">,</span> <span class="n">sanity_check</span><span class="o">=</span><span class="n">n_combinations</span><span class="p">)</span>
|
||||
<span class="n">p</span> <span class="o">=</span> <span class="n">APP</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="n">sample_size</span><span class="o">=</span><span class="mi">5</span><span class="p">,</span> <span class="n">n_prevalences</span><span class="o">=</span><span class="n">n_prevpoints</span><span class="p">,</span> <span class="n">random_state</span><span class="o">=</span><span class="mi">42</span><span class="p">,</span> <span class="n">sanity_check</span><span class="o">=</span><span class="kc">None</span><span class="p">)</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="TestProtocols.test_app_replicate">
|
||||
<a class="viewcode-back" href="../../../quapy.tests.html#quapy.tests.test_protocols.TestProtocols.test_app_replicate">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">test_app_replicate</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="n">data</span> <span class="o">=</span> <span class="n">mock_labelled_collection</span><span class="p">()</span>
|
||||
<span class="n">p</span> <span class="o">=</span> <span class="n">APP</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="n">sample_size</span><span class="o">=</span><span class="mi">5</span><span class="p">,</span> <span class="n">n_prevalences</span><span class="o">=</span><span class="mi">11</span><span class="p">,</span> <span class="n">random_state</span><span class="o">=</span><span class="mi">42</span><span class="p">)</span>
|
||||
|
||||
<span class="n">samples1</span> <span class="o">=</span> <span class="n">samples_to_str</span><span class="p">(</span><span class="n">p</span><span class="p">)</span>
|
||||
<span class="n">samples2</span> <span class="o">=</span> <span class="n">samples_to_str</span><span class="p">(</span><span class="n">p</span><span class="p">)</span>
|
||||
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">assertEqual</span><span class="p">(</span><span class="n">samples1</span><span class="p">,</span> <span class="n">samples2</span><span class="p">)</span>
|
||||
|
||||
<span class="n">p</span> <span class="o">=</span> <span class="n">APP</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="n">sample_size</span><span class="o">=</span><span class="mi">5</span><span class="p">,</span> <span class="n">n_prevalences</span><span class="o">=</span><span class="mi">11</span><span class="p">)</span> <span class="c1"># <- random_state is by default set to 0</span>
|
||||
|
||||
<span class="n">samples1</span> <span class="o">=</span> <span class="n">samples_to_str</span><span class="p">(</span><span class="n">p</span><span class="p">)</span>
|
||||
<span class="n">samples2</span> <span class="o">=</span> <span class="n">samples_to_str</span><span class="p">(</span><span class="n">p</span><span class="p">)</span>
|
||||
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">assertEqual</span><span class="p">(</span><span class="n">samples1</span><span class="p">,</span> <span class="n">samples2</span><span class="p">)</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="TestProtocols.test_app_not_replicate">
|
||||
<a class="viewcode-back" href="../../../quapy.tests.html#quapy.tests.test_protocols.TestProtocols.test_app_not_replicate">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">test_app_not_replicate</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="n">data</span> <span class="o">=</span> <span class="n">mock_labelled_collection</span><span class="p">()</span>
|
||||
<span class="n">p</span> <span class="o">=</span> <span class="n">APP</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="n">sample_size</span><span class="o">=</span><span class="mi">5</span><span class="p">,</span> <span class="n">n_prevalences</span><span class="o">=</span><span class="mi">11</span><span class="p">,</span> <span class="n">random_state</span><span class="o">=</span><span class="kc">None</span><span class="p">)</span>
|
||||
|
||||
<span class="n">samples1</span> <span class="o">=</span> <span class="n">samples_to_str</span><span class="p">(</span><span class="n">p</span><span class="p">)</span>
|
||||
<span class="n">samples2</span> <span class="o">=</span> <span class="n">samples_to_str</span><span class="p">(</span><span class="n">p</span><span class="p">)</span>
|
||||
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">assertNotEqual</span><span class="p">(</span><span class="n">samples1</span><span class="p">,</span> <span class="n">samples2</span><span class="p">)</span>
|
||||
|
||||
<span class="n">p</span> <span class="o">=</span> <span class="n">APP</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="n">sample_size</span><span class="o">=</span><span class="mi">5</span><span class="p">,</span> <span class="n">n_prevalences</span><span class="o">=</span><span class="mi">11</span><span class="p">,</span> <span class="n">random_state</span><span class="o">=</span><span class="mi">42</span><span class="p">)</span>
|
||||
<span class="n">samples1</span> <span class="o">=</span> <span class="n">samples_to_str</span><span class="p">(</span><span class="n">p</span><span class="p">)</span>
|
||||
<span class="n">p</span> <span class="o">=</span> <span class="n">APP</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="n">sample_size</span><span class="o">=</span><span class="mi">5</span><span class="p">,</span> <span class="n">n_prevalences</span><span class="o">=</span><span class="mi">11</span><span class="p">,</span> <span class="n">random_state</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
|
||||
<span class="n">samples2</span> <span class="o">=</span> <span class="n">samples_to_str</span><span class="p">(</span><span class="n">p</span><span class="p">)</span>
|
||||
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">assertNotEqual</span><span class="p">(</span><span class="n">samples1</span><span class="p">,</span> <span class="n">samples2</span><span class="p">)</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="TestProtocols.test_app_number">
|
||||
<a class="viewcode-back" href="../../../quapy.tests.html#quapy.tests.test_protocols.TestProtocols.test_app_number">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">test_app_number</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="n">data</span> <span class="o">=</span> <span class="n">mock_labelled_collection</span><span class="p">()</span>
|
||||
<span class="n">p</span> <span class="o">=</span> <span class="n">APP</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="n">sample_size</span><span class="o">=</span><span class="mi">100</span><span class="p">,</span> <span class="n">n_prevalences</span><span class="o">=</span><span class="mi">10</span><span class="p">,</span> <span class="n">repeats</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
|
||||
|
||||
<span class="c1"># surprisingly enough, for some n_prevalences the test fails, notwithstanding</span>
|
||||
<span class="c1"># everything is correct. The problem is that in function APP.prevalence_grid()</span>
|
||||
<span class="c1"># there is sometimes one rounding error that gets cumulated and</span>
|
||||
<span class="c1"># surpasses 1.0 (by a very small float value, 0.0000000000002 or sthe like)</span>
|
||||
<span class="c1"># so these tuples are mistakenly removed... I have tried with np.close, and</span>
|
||||
<span class="c1"># other workarounds, but eventually happens that there is some negative probability</span>
|
||||
<span class="c1"># in the sampling function...</span>
|
||||
|
||||
<span class="n">count</span> <span class="o">=</span> <span class="mi">0</span>
|
||||
<span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="n">p</span><span class="p">():</span>
|
||||
<span class="n">count</span><span class="o">+=</span><span class="mi">1</span>
|
||||
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">assertEqual</span><span class="p">(</span><span class="n">count</span><span class="p">,</span> <span class="n">p</span><span class="o">.</span><span class="n">total</span><span class="p">())</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="TestProtocols.test_npp_replicate">
|
||||
<a class="viewcode-back" href="../../../quapy.tests.html#quapy.tests.test_protocols.TestProtocols.test_npp_replicate">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">test_npp_replicate</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="n">data</span> <span class="o">=</span> <span class="n">mock_labelled_collection</span><span class="p">()</span>
|
||||
<span class="n">p</span> <span class="o">=</span> <span class="n">NPP</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="n">sample_size</span><span class="o">=</span><span class="mi">5</span><span class="p">,</span> <span class="n">repeats</span><span class="o">=</span><span class="mi">5</span><span class="p">,</span> <span class="n">random_state</span><span class="o">=</span><span class="mi">42</span><span class="p">)</span>
|
||||
|
||||
<span class="n">samples1</span> <span class="o">=</span> <span class="n">samples_to_str</span><span class="p">(</span><span class="n">p</span><span class="p">)</span>
|
||||
<span class="n">samples2</span> <span class="o">=</span> <span class="n">samples_to_str</span><span class="p">(</span><span class="n">p</span><span class="p">)</span>
|
||||
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">assertEqual</span><span class="p">(</span><span class="n">samples1</span><span class="p">,</span> <span class="n">samples2</span><span class="p">)</span>
|
||||
|
||||
<span class="n">p</span> <span class="o">=</span> <span class="n">NPP</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="n">sample_size</span><span class="o">=</span><span class="mi">5</span><span class="p">,</span> <span class="n">repeats</span><span class="o">=</span><span class="mi">5</span><span class="p">)</span> <span class="c1"># <- random_state is by default set to 0</span>
|
||||
|
||||
<span class="n">samples1</span> <span class="o">=</span> <span class="n">samples_to_str</span><span class="p">(</span><span class="n">p</span><span class="p">)</span>
|
||||
<span class="n">samples2</span> <span class="o">=</span> <span class="n">samples_to_str</span><span class="p">(</span><span class="n">p</span><span class="p">)</span>
|
||||
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">assertEqual</span><span class="p">(</span><span class="n">samples1</span><span class="p">,</span> <span class="n">samples2</span><span class="p">)</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="TestProtocols.test_npp_not_replicate">
|
||||
<a class="viewcode-back" href="../../../quapy.tests.html#quapy.tests.test_protocols.TestProtocols.test_npp_not_replicate">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">test_npp_not_replicate</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="n">data</span> <span class="o">=</span> <span class="n">mock_labelled_collection</span><span class="p">()</span>
|
||||
<span class="n">p</span> <span class="o">=</span> <span class="n">NPP</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="n">sample_size</span><span class="o">=</span><span class="mi">5</span><span class="p">,</span> <span class="n">repeats</span><span class="o">=</span><span class="mi">5</span><span class="p">,</span> <span class="n">random_state</span><span class="o">=</span><span class="kc">None</span><span class="p">)</span>
|
||||
|
||||
<span class="n">samples1</span> <span class="o">=</span> <span class="n">samples_to_str</span><span class="p">(</span><span class="n">p</span><span class="p">)</span>
|
||||
<span class="n">samples2</span> <span class="o">=</span> <span class="n">samples_to_str</span><span class="p">(</span><span class="n">p</span><span class="p">)</span>
|
||||
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">assertNotEqual</span><span class="p">(</span><span class="n">samples1</span><span class="p">,</span> <span class="n">samples2</span><span class="p">)</span>
|
||||
|
||||
<span class="n">p</span> <span class="o">=</span> <span class="n">NPP</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="n">sample_size</span><span class="o">=</span><span class="mi">5</span><span class="p">,</span> <span class="n">repeats</span><span class="o">=</span><span class="mi">5</span><span class="p">,</span> <span class="n">random_state</span><span class="o">=</span><span class="mi">42</span><span class="p">)</span>
|
||||
<span class="n">samples1</span> <span class="o">=</span> <span class="n">samples_to_str</span><span class="p">(</span><span class="n">p</span><span class="p">)</span>
|
||||
<span class="n">p</span> <span class="o">=</span> <span class="n">NPP</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="n">sample_size</span><span class="o">=</span><span class="mi">5</span><span class="p">,</span> <span class="n">repeats</span><span class="o">=</span><span class="mi">5</span><span class="p">,</span> <span class="n">random_state</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
|
||||
<span class="n">samples2</span> <span class="o">=</span> <span class="n">samples_to_str</span><span class="p">(</span><span class="n">p</span><span class="p">)</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">assertNotEqual</span><span class="p">(</span><span class="n">samples1</span><span class="p">,</span> <span class="n">samples2</span><span class="p">)</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="TestProtocols.test_kraemer_replicate">
|
||||
<a class="viewcode-back" href="../../../quapy.tests.html#quapy.tests.test_protocols.TestProtocols.test_kraemer_replicate">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">test_kraemer_replicate</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="n">data</span> <span class="o">=</span> <span class="n">mock_labelled_collection</span><span class="p">()</span>
|
||||
<span class="n">p</span> <span class="o">=</span> <span class="n">UPP</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="n">sample_size</span><span class="o">=</span><span class="mi">5</span><span class="p">,</span> <span class="n">repeats</span><span class="o">=</span><span class="mi">10</span><span class="p">,</span> <span class="n">random_state</span><span class="o">=</span><span class="mi">42</span><span class="p">)</span>
|
||||
|
||||
<span class="n">samples1</span> <span class="o">=</span> <span class="n">samples_to_str</span><span class="p">(</span><span class="n">p</span><span class="p">)</span>
|
||||
<span class="n">samples2</span> <span class="o">=</span> <span class="n">samples_to_str</span><span class="p">(</span><span class="n">p</span><span class="p">)</span>
|
||||
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">assertEqual</span><span class="p">(</span><span class="n">samples1</span><span class="p">,</span> <span class="n">samples2</span><span class="p">)</span>
|
||||
|
||||
<span class="n">p</span> <span class="o">=</span> <span class="n">UPP</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="n">sample_size</span><span class="o">=</span><span class="mi">5</span><span class="p">,</span> <span class="n">repeats</span><span class="o">=</span><span class="mi">10</span><span class="p">)</span> <span class="c1"># <- random_state is by default set to 0</span>
|
||||
|
||||
<span class="n">samples1</span> <span class="o">=</span> <span class="n">samples_to_str</span><span class="p">(</span><span class="n">p</span><span class="p">)</span>
|
||||
<span class="n">samples2</span> <span class="o">=</span> <span class="n">samples_to_str</span><span class="p">(</span><span class="n">p</span><span class="p">)</span>
|
||||
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">assertEqual</span><span class="p">(</span><span class="n">samples1</span><span class="p">,</span> <span class="n">samples2</span><span class="p">)</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="TestProtocols.test_kraemer_not_replicate">
|
||||
<a class="viewcode-back" href="../../../quapy.tests.html#quapy.tests.test_protocols.TestProtocols.test_kraemer_not_replicate">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">test_kraemer_not_replicate</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="n">data</span> <span class="o">=</span> <span class="n">mock_labelled_collection</span><span class="p">()</span>
|
||||
<span class="n">p</span> <span class="o">=</span> <span class="n">UPP</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="n">sample_size</span><span class="o">=</span><span class="mi">5</span><span class="p">,</span> <span class="n">repeats</span><span class="o">=</span><span class="mi">10</span><span class="p">,</span> <span class="n">random_state</span><span class="o">=</span><span class="kc">None</span><span class="p">)</span>
|
||||
|
||||
<span class="n">samples1</span> <span class="o">=</span> <span class="n">samples_to_str</span><span class="p">(</span><span class="n">p</span><span class="p">)</span>
|
||||
<span class="n">samples2</span> <span class="o">=</span> <span class="n">samples_to_str</span><span class="p">(</span><span class="n">p</span><span class="p">)</span>
|
||||
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">assertNotEqual</span><span class="p">(</span><span class="n">samples1</span><span class="p">,</span> <span class="n">samples2</span><span class="p">)</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="TestProtocols.test_covariate_shift_replicate">
|
||||
<a class="viewcode-back" href="../../../quapy.tests.html#quapy.tests.test_protocols.TestProtocols.test_covariate_shift_replicate">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">test_covariate_shift_replicate</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="n">dataA</span> <span class="o">=</span> <span class="n">mock_labelled_collection</span><span class="p">(</span><span class="s1">'domA'</span><span class="p">)</span>
|
||||
<span class="n">dataB</span> <span class="o">=</span> <span class="n">mock_labelled_collection</span><span class="p">(</span><span class="s1">'domB'</span><span class="p">)</span>
|
||||
<span class="n">p</span> <span class="o">=</span> <span class="n">DomainMixer</span><span class="p">(</span><span class="n">dataA</span><span class="p">,</span> <span class="n">dataB</span><span class="p">,</span> <span class="n">sample_size</span><span class="o">=</span><span class="mi">10</span><span class="p">,</span> <span class="n">mixture_points</span><span class="o">=</span><span class="mi">11</span><span class="p">,</span> <span class="n">random_state</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
|
||||
|
||||
<span class="n">samples1</span> <span class="o">=</span> <span class="n">samples_to_str</span><span class="p">(</span><span class="n">p</span><span class="p">)</span>
|
||||
<span class="n">samples2</span> <span class="o">=</span> <span class="n">samples_to_str</span><span class="p">(</span><span class="n">p</span><span class="p">)</span>
|
||||
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">assertEqual</span><span class="p">(</span><span class="n">samples1</span><span class="p">,</span> <span class="n">samples2</span><span class="p">)</span>
|
||||
|
||||
<span class="n">p</span> <span class="o">=</span> <span class="n">DomainMixer</span><span class="p">(</span><span class="n">dataA</span><span class="p">,</span> <span class="n">dataB</span><span class="p">,</span> <span class="n">sample_size</span><span class="o">=</span><span class="mi">10</span><span class="p">,</span> <span class="n">mixture_points</span><span class="o">=</span><span class="mi">11</span><span class="p">)</span> <span class="c1"># <- random_state is by default set to 0</span>
|
||||
|
||||
<span class="n">samples1</span> <span class="o">=</span> <span class="n">samples_to_str</span><span class="p">(</span><span class="n">p</span><span class="p">)</span>
|
||||
<span class="n">samples2</span> <span class="o">=</span> <span class="n">samples_to_str</span><span class="p">(</span><span class="n">p</span><span class="p">)</span>
|
||||
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">assertEqual</span><span class="p">(</span><span class="n">samples1</span><span class="p">,</span> <span class="n">samples2</span><span class="p">)</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="TestProtocols.test_covariate_shift_not_replicate">
|
||||
<a class="viewcode-back" href="../../../quapy.tests.html#quapy.tests.test_protocols.TestProtocols.test_covariate_shift_not_replicate">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">test_covariate_shift_not_replicate</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="n">dataA</span> <span class="o">=</span> <span class="n">mock_labelled_collection</span><span class="p">(</span><span class="s1">'domA'</span><span class="p">)</span>
|
||||
<span class="n">dataB</span> <span class="o">=</span> <span class="n">mock_labelled_collection</span><span class="p">(</span><span class="s1">'domB'</span><span class="p">)</span>
|
||||
<span class="n">p</span> <span class="o">=</span> <span class="n">DomainMixer</span><span class="p">(</span><span class="n">dataA</span><span class="p">,</span> <span class="n">dataB</span><span class="p">,</span> <span class="n">sample_size</span><span class="o">=</span><span class="mi">10</span><span class="p">,</span> <span class="n">mixture_points</span><span class="o">=</span><span class="mi">11</span><span class="p">,</span> <span class="n">random_state</span><span class="o">=</span><span class="kc">None</span><span class="p">)</span>
|
||||
|
||||
<span class="n">samples1</span> <span class="o">=</span> <span class="n">samples_to_str</span><span class="p">(</span><span class="n">p</span><span class="p">)</span>
|
||||
<span class="n">samples2</span> <span class="o">=</span> <span class="n">samples_to_str</span><span class="p">(</span><span class="n">p</span><span class="p">)</span>
|
||||
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">assertNotEqual</span><span class="p">(</span><span class="n">samples1</span><span class="p">,</span> <span class="n">samples2</span><span class="p">)</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="TestProtocols.test_no_seed_init">
|
||||
<a class="viewcode-back" href="../../../quapy.tests.html#quapy.tests.test_protocols.TestProtocols.test_no_seed_init">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">test_no_seed_init</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="k">class</span> <span class="nc">NoSeedInit</span><span class="p">(</span><span class="n">AbstractStochasticSeededProtocol</span><span class="p">):</span>
|
||||
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">data</span> <span class="o">=</span> <span class="n">mock_labelled_collection</span><span class="p">()</span>
|
||||
|
||||
<span class="k">def</span> <span class="nf">samples_parameters</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="c1"># return a matrix containing sampling indexes in the rows</span>
|
||||
<span class="k">return</span> <span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">randint</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">data</span><span class="p">),</span> <span class="mi">10</span><span class="o">*</span><span class="mi">10</span><span class="p">)</span><span class="o">.</span><span class="n">reshape</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span> <span class="mi">10</span><span class="p">)</span>
|
||||
|
||||
<span class="k">def</span> <span class="nf">sample</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">params</span><span class="p">):</span>
|
||||
<span class="n">index</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">unique</span><span class="p">(</span><span class="n">params</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">data</span><span class="o">.</span><span class="n">sampling_from_index</span><span class="p">(</span><span class="n">index</span><span class="p">)</span>
|
||||
|
||||
<span class="n">p</span> <span class="o">=</span> <span class="n">NoSeedInit</span><span class="p">()</span>
|
||||
|
||||
<span class="c1"># this should raise a ValueError, since the class is said to be AbstractStochasticSeededProtocol but the</span>
|
||||
<span class="c1"># random_seed has never been passed to super(NoSeedInit, self).__init__(random_seed)</span>
|
||||
<span class="k">with</span> <span class="bp">self</span><span class="o">.</span><span class="n">assertRaises</span><span class="p">(</span><span class="ne">ValueError</span><span class="p">):</span>
|
||||
<span class="k">for</span> <span class="n">sample</span> <span class="ow">in</span> <span class="n">p</span><span class="p">():</span>
|
||||
<span class="k">pass</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="s1">'done'</span><span class="p">)</span></div>
|
||||
</div>
|
||||
|
||||
|
||||
|
||||
<span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s1">'__main__'</span><span class="p">:</span>
|
||||
<span class="n">unittest</span><span class="o">.</span><span class="n">main</span><span class="p">()</span>
|
||||
</pre></div>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
<footer>
|
||||
|
||||
<hr/>
|
||||
|
||||
<div role="contentinfo">
|
||||
<p>© Copyright 2024, Alejandro Moreo.</p>
|
||||
</div>
|
||||
|
||||
Built with <a href="https://www.sphinx-doc.org/">Sphinx</a> using a
|
||||
<a href="https://github.com/readthedocs/sphinx_rtd_theme">theme</a>
|
||||
provided by <a href="https://readthedocs.org">Read the Docs</a>.
|
||||
|
||||
|
||||
</footer>
|
||||
</div>
|
||||
</div>
|
||||
</section>
|
||||
</div>
|
||||
<script>
|
||||
jQuery(function () {
|
||||
SphinxRtdTheme.Navigation.enable(true);
|
||||
});
|
||||
</script>
|
||||
|
||||
</body>
|
||||
</html>
|
||||
|
|
@ -0,0 +1,225 @@
|
|||
<!DOCTYPE html>
|
||||
<html class="writer-html5" lang="en" data-content_root="../../../">
|
||||
<head>
|
||||
<meta charset="utf-8" />
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
|
||||
<title>quapy.tests.test_replicability — QuaPy: A Python-based open-source framework for quantification 0.1.8 documentation</title>
|
||||
<link rel="stylesheet" type="text/css" href="../../../_static/pygments.css?v=92fd9be5" />
|
||||
<link rel="stylesheet" type="text/css" href="../../../_static/css/theme.css?v=19f00094" />
|
||||
|
||||
|
||||
<!--[if lt IE 9]>
|
||||
<script src="../../../_static/js/html5shiv.min.js"></script>
|
||||
<![endif]-->
|
||||
|
||||
<script src="../../../_static/jquery.js?v=5d32c60e"></script>
|
||||
<script src="../../../_static/_sphinx_javascript_frameworks_compat.js?v=2cd50e6c"></script>
|
||||
<script src="../../../_static/documentation_options.js?v=22607128"></script>
|
||||
<script src="../../../_static/doctools.js?v=9a2dae69"></script>
|
||||
<script src="../../../_static/sphinx_highlight.js?v=dc90522c"></script>
|
||||
<script src="../../../_static/js/theme.js"></script>
|
||||
<link rel="index" title="Index" href="../../../genindex.html" />
|
||||
<link rel="search" title="Search" href="../../../search.html" />
|
||||
</head>
|
||||
|
||||
<body class="wy-body-for-nav">
|
||||
<div class="wy-grid-for-nav">
|
||||
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
|
||||
<div class="wy-side-scroll">
|
||||
<div class="wy-side-nav-search" >
|
||||
|
||||
|
||||
|
||||
<a href="../../../index.html" class="icon icon-home">
|
||||
QuaPy: A Python-based open-source framework for quantification
|
||||
</a>
|
||||
<div role="search">
|
||||
<form id="rtd-search-form" class="wy-form" action="../../../search.html" method="get">
|
||||
<input type="text" name="q" placeholder="Search docs" aria-label="Search docs" />
|
||||
<input type="hidden" name="check_keywords" value="yes" />
|
||||
<input type="hidden" name="area" value="default" />
|
||||
</form>
|
||||
</div>
|
||||
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../../../modules.html">quapy</a></li>
|
||||
</ul>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
</nav>
|
||||
|
||||
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" >
|
||||
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
|
||||
<a href="../../../index.html">QuaPy: A Python-based open-source framework for quantification</a>
|
||||
</nav>
|
||||
|
||||
<div class="wy-nav-content">
|
||||
<div class="rst-content">
|
||||
<div role="navigation" aria-label="Page navigation">
|
||||
<ul class="wy-breadcrumbs">
|
||||
<li><a href="../../../index.html" class="icon icon-home" aria-label="Home"></a></li>
|
||||
<li class="breadcrumb-item"><a href="../../index.html">Module code</a></li>
|
||||
<li class="breadcrumb-item active">quapy.tests.test_replicability</li>
|
||||
<li class="wy-breadcrumbs-aside">
|
||||
</li>
|
||||
</ul>
|
||||
<hr/>
|
||||
</div>
|
||||
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
|
||||
<div itemprop="articleBody">
|
||||
|
||||
<h1>Source code for quapy.tests.test_replicability</h1><div class="highlight"><pre>
|
||||
<span></span><span class="kn">import</span> <span class="nn">unittest</span>
|
||||
<span class="kn">import</span> <span class="nn">quapy</span> <span class="k">as</span> <span class="nn">qp</span>
|
||||
<span class="kn">from</span> <span class="nn">quapy.data</span> <span class="kn">import</span> <span class="n">LabelledCollection</span>
|
||||
<span class="kn">from</span> <span class="nn">quapy.functional</span> <span class="kn">import</span> <span class="n">strprev</span>
|
||||
<span class="kn">from</span> <span class="nn">sklearn.linear_model</span> <span class="kn">import</span> <span class="n">LogisticRegression</span>
|
||||
<span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span>
|
||||
<span class="kn">from</span> <span class="nn">quapy.method.aggregative</span> <span class="kn">import</span> <span class="n">PACC</span>
|
||||
<span class="kn">import</span> <span class="nn">quapy.functional</span> <span class="k">as</span> <span class="nn">F</span>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="MyTestCase">
|
||||
<a class="viewcode-back" href="../../../quapy.tests.html#quapy.tests.test_replicability.MyTestCase">[docs]</a>
|
||||
<span class="k">class</span> <span class="nc">MyTestCase</span><span class="p">(</span><span class="n">unittest</span><span class="o">.</span><span class="n">TestCase</span><span class="p">):</span>
|
||||
|
||||
<div class="viewcode-block" id="MyTestCase.test_prediction_replicability">
|
||||
<a class="viewcode-back" href="../../../quapy.tests.html#quapy.tests.test_replicability.MyTestCase.test_prediction_replicability">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">test_prediction_replicability</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
|
||||
<span class="n">dataset</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">datasets</span><span class="o">.</span><span class="n">fetch_UCIBinaryDataset</span><span class="p">(</span><span class="s1">'yeast'</span><span class="p">)</span>
|
||||
|
||||
<span class="k">with</span> <span class="n">qp</span><span class="o">.</span><span class="n">util</span><span class="o">.</span><span class="n">temp_seed</span><span class="p">(</span><span class="mi">0</span><span class="p">):</span>
|
||||
<span class="n">lr</span> <span class="o">=</span> <span class="n">LogisticRegression</span><span class="p">(</span><span class="n">random_state</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">max_iter</span><span class="o">=</span><span class="mi">10000</span><span class="p">)</span>
|
||||
<span class="n">pacc</span> <span class="o">=</span> <span class="n">PACC</span><span class="p">(</span><span class="n">lr</span><span class="p">)</span>
|
||||
<span class="n">prev</span> <span class="o">=</span> <span class="n">pacc</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">dataset</span><span class="o">.</span><span class="n">training</span><span class="p">)</span><span class="o">.</span><span class="n">quantify</span><span class="p">(</span><span class="n">dataset</span><span class="o">.</span><span class="n">test</span><span class="o">.</span><span class="n">X</span><span class="p">)</span>
|
||||
<span class="n">str_prev1</span> <span class="o">=</span> <span class="n">strprev</span><span class="p">(</span><span class="n">prev</span><span class="p">,</span> <span class="n">prec</span><span class="o">=</span><span class="mi">5</span><span class="p">)</span>
|
||||
|
||||
<span class="k">with</span> <span class="n">qp</span><span class="o">.</span><span class="n">util</span><span class="o">.</span><span class="n">temp_seed</span><span class="p">(</span><span class="mi">0</span><span class="p">):</span>
|
||||
<span class="n">lr</span> <span class="o">=</span> <span class="n">LogisticRegression</span><span class="p">(</span><span class="n">random_state</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">max_iter</span><span class="o">=</span><span class="mi">10000</span><span class="p">)</span>
|
||||
<span class="n">pacc</span> <span class="o">=</span> <span class="n">PACC</span><span class="p">(</span><span class="n">lr</span><span class="p">)</span>
|
||||
<span class="n">prev2</span> <span class="o">=</span> <span class="n">pacc</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">dataset</span><span class="o">.</span><span class="n">training</span><span class="p">)</span><span class="o">.</span><span class="n">quantify</span><span class="p">(</span><span class="n">dataset</span><span class="o">.</span><span class="n">test</span><span class="o">.</span><span class="n">X</span><span class="p">)</span>
|
||||
<span class="n">str_prev2</span> <span class="o">=</span> <span class="n">strprev</span><span class="p">(</span><span class="n">prev2</span><span class="p">,</span> <span class="n">prec</span><span class="o">=</span><span class="mi">5</span><span class="p">)</span>
|
||||
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">assertEqual</span><span class="p">(</span><span class="n">str_prev1</span><span class="p">,</span> <span class="n">str_prev2</span><span class="p">)</span> <span class="c1"># add assertion here</span></div>
|
||||
|
||||
|
||||
|
||||
<div class="viewcode-block" id="MyTestCase.test_samping_replicability">
|
||||
<a class="viewcode-back" href="../../../quapy.tests.html#quapy.tests.test_replicability.MyTestCase.test_samping_replicability">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">test_samping_replicability</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
|
||||
<span class="k">def</span> <span class="nf">equal_collections</span><span class="p">(</span><span class="n">c1</span><span class="p">,</span> <span class="n">c2</span><span class="p">,</span> <span class="n">value</span><span class="o">=</span><span class="kc">True</span><span class="p">):</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">assertEqual</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">all</span><span class="p">(</span><span class="n">c1</span><span class="o">.</span><span class="n">Xtr</span> <span class="o">==</span> <span class="n">c2</span><span class="o">.</span><span class="n">Xtr</span><span class="p">),</span> <span class="n">value</span><span class="p">)</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">assertEqual</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">all</span><span class="p">(</span><span class="n">c1</span><span class="o">.</span><span class="n">ytr</span> <span class="o">==</span> <span class="n">c2</span><span class="o">.</span><span class="n">ytr</span><span class="p">),</span> <span class="n">value</span><span class="p">)</span>
|
||||
<span class="k">if</span> <span class="n">value</span><span class="p">:</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">assertEqual</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">all</span><span class="p">(</span><span class="n">c1</span><span class="o">.</span><span class="n">classes_</span> <span class="o">==</span> <span class="n">c2</span><span class="o">.</span><span class="n">classes_</span><span class="p">),</span> <span class="n">value</span><span class="p">)</span>
|
||||
|
||||
<span class="n">X</span> <span class="o">=</span> <span class="nb">list</span><span class="p">(</span><span class="nb">map</span><span class="p">(</span><span class="nb">str</span><span class="p">,</span> <span class="nb">range</span><span class="p">(</span><span class="mi">100</span><span class="p">)))</span>
|
||||
<span class="n">y</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">randint</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">100</span><span class="p">)</span>
|
||||
<span class="n">data</span> <span class="o">=</span> <span class="n">LabelledCollection</span><span class="p">(</span><span class="n">instances</span><span class="o">=</span><span class="n">X</span><span class="p">,</span> <span class="n">labels</span><span class="o">=</span><span class="n">y</span><span class="p">)</span>
|
||||
|
||||
<span class="n">sample1</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">sampling</span><span class="p">(</span><span class="mi">50</span><span class="p">)</span>
|
||||
<span class="n">sample2</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">sampling</span><span class="p">(</span><span class="mi">50</span><span class="p">)</span>
|
||||
<span class="n">equal_collections</span><span class="p">(</span><span class="n">sample1</span><span class="p">,</span> <span class="n">sample2</span><span class="p">,</span> <span class="kc">False</span><span class="p">)</span>
|
||||
|
||||
<span class="n">sample1</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">sampling</span><span class="p">(</span><span class="mi">50</span><span class="p">,</span> <span class="n">random_state</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
|
||||
<span class="n">sample2</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">sampling</span><span class="p">(</span><span class="mi">50</span><span class="p">,</span> <span class="n">random_state</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
|
||||
<span class="n">equal_collections</span><span class="p">(</span><span class="n">sample1</span><span class="p">,</span> <span class="n">sample2</span><span class="p">,</span> <span class="kc">True</span><span class="p">)</span>
|
||||
|
||||
<span class="n">sample1</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">sampling</span><span class="p">(</span><span class="mi">50</span><span class="p">,</span> <span class="o">*</span><span class="p">[</span><span class="mf">0.7</span><span class="p">,</span> <span class="mf">0.3</span><span class="p">],</span> <span class="n">random_state</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
|
||||
<span class="n">sample2</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">sampling</span><span class="p">(</span><span class="mi">50</span><span class="p">,</span> <span class="o">*</span><span class="p">[</span><span class="mf">0.7</span><span class="p">,</span> <span class="mf">0.3</span><span class="p">],</span> <span class="n">random_state</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
|
||||
<span class="n">equal_collections</span><span class="p">(</span><span class="n">sample1</span><span class="p">,</span> <span class="n">sample2</span><span class="p">,</span> <span class="kc">True</span><span class="p">)</span>
|
||||
|
||||
<span class="k">with</span> <span class="n">qp</span><span class="o">.</span><span class="n">util</span><span class="o">.</span><span class="n">temp_seed</span><span class="p">(</span><span class="mi">0</span><span class="p">):</span>
|
||||
<span class="n">sample1</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">sampling</span><span class="p">(</span><span class="mi">50</span><span class="p">,</span> <span class="o">*</span><span class="p">[</span><span class="mf">0.7</span><span class="p">,</span> <span class="mf">0.3</span><span class="p">])</span>
|
||||
<span class="k">with</span> <span class="n">qp</span><span class="o">.</span><span class="n">util</span><span class="o">.</span><span class="n">temp_seed</span><span class="p">(</span><span class="mi">0</span><span class="p">):</span>
|
||||
<span class="n">sample2</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">sampling</span><span class="p">(</span><span class="mi">50</span><span class="p">,</span> <span class="o">*</span><span class="p">[</span><span class="mf">0.7</span><span class="p">,</span> <span class="mf">0.3</span><span class="p">])</span>
|
||||
<span class="n">equal_collections</span><span class="p">(</span><span class="n">sample1</span><span class="p">,</span> <span class="n">sample2</span><span class="p">,</span> <span class="kc">True</span><span class="p">)</span>
|
||||
|
||||
<span class="n">sample1</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">sampling</span><span class="p">(</span><span class="mi">50</span><span class="p">,</span> <span class="o">*</span><span class="p">[</span><span class="mf">0.7</span><span class="p">,</span> <span class="mf">0.3</span><span class="p">],</span> <span class="n">random_state</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
|
||||
<span class="n">sample2</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">sampling</span><span class="p">(</span><span class="mi">50</span><span class="p">,</span> <span class="o">*</span><span class="p">[</span><span class="mf">0.7</span><span class="p">,</span> <span class="mf">0.3</span><span class="p">],</span> <span class="n">random_state</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
|
||||
<span class="n">equal_collections</span><span class="p">(</span><span class="n">sample1</span><span class="p">,</span> <span class="n">sample2</span><span class="p">,</span> <span class="kc">True</span><span class="p">)</span>
|
||||
|
||||
<span class="n">sample1_tr</span><span class="p">,</span> <span class="n">sample1_te</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">split_stratified</span><span class="p">(</span><span class="n">train_prop</span><span class="o">=</span><span class="mf">0.7</span><span class="p">,</span> <span class="n">random_state</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
|
||||
<span class="n">sample2_tr</span><span class="p">,</span> <span class="n">sample2_te</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">split_stratified</span><span class="p">(</span><span class="n">train_prop</span><span class="o">=</span><span class="mf">0.7</span><span class="p">,</span> <span class="n">random_state</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
|
||||
<span class="n">equal_collections</span><span class="p">(</span><span class="n">sample1_tr</span><span class="p">,</span> <span class="n">sample2_tr</span><span class="p">,</span> <span class="kc">True</span><span class="p">)</span>
|
||||
<span class="n">equal_collections</span><span class="p">(</span><span class="n">sample1_te</span><span class="p">,</span> <span class="n">sample2_te</span><span class="p">,</span> <span class="kc">True</span><span class="p">)</span>
|
||||
|
||||
<span class="k">with</span> <span class="n">qp</span><span class="o">.</span><span class="n">util</span><span class="o">.</span><span class="n">temp_seed</span><span class="p">(</span><span class="mi">0</span><span class="p">):</span>
|
||||
<span class="n">sample1_tr</span><span class="p">,</span> <span class="n">sample1_te</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">split_stratified</span><span class="p">(</span><span class="n">train_prop</span><span class="o">=</span><span class="mf">0.7</span><span class="p">)</span>
|
||||
<span class="k">with</span> <span class="n">qp</span><span class="o">.</span><span class="n">util</span><span class="o">.</span><span class="n">temp_seed</span><span class="p">(</span><span class="mi">0</span><span class="p">):</span>
|
||||
<span class="n">sample2_tr</span><span class="p">,</span> <span class="n">sample2_te</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">split_stratified</span><span class="p">(</span><span class="n">train_prop</span><span class="o">=</span><span class="mf">0.7</span><span class="p">)</span>
|
||||
<span class="n">equal_collections</span><span class="p">(</span><span class="n">sample1_tr</span><span class="p">,</span> <span class="n">sample2_tr</span><span class="p">,</span> <span class="kc">True</span><span class="p">)</span>
|
||||
<span class="n">equal_collections</span><span class="p">(</span><span class="n">sample1_te</span><span class="p">,</span> <span class="n">sample2_te</span><span class="p">,</span> <span class="kc">True</span><span class="p">)</span></div>
|
||||
|
||||
|
||||
|
||||
<div class="viewcode-block" id="MyTestCase.test_parallel_replicability">
|
||||
<a class="viewcode-back" href="../../../quapy.tests.html#quapy.tests.test_replicability.MyTestCase.test_parallel_replicability">[docs]</a>
|
||||
<span class="k">def</span> <span class="nf">test_parallel_replicability</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
|
||||
<span class="n">train</span><span class="p">,</span> <span class="n">test</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">datasets</span><span class="o">.</span><span class="n">fetch_UCIMulticlassDataset</span><span class="p">(</span><span class="s1">'dry-bean'</span><span class="p">)</span><span class="o">.</span><span class="n">train_test</span>
|
||||
|
||||
<span class="n">test</span> <span class="o">=</span> <span class="n">test</span><span class="o">.</span><span class="n">sampling</span><span class="p">(</span><span class="mi">500</span><span class="p">,</span> <span class="o">*</span><span class="p">[</span><span class="mf">0.1</span><span class="p">,</span> <span class="mf">0.0</span><span class="p">,</span> <span class="mf">0.1</span><span class="p">,</span> <span class="mf">0.1</span><span class="p">,</span> <span class="mf">0.2</span><span class="p">,</span> <span class="mf">0.5</span><span class="p">,</span> <span class="mf">0.0</span><span class="p">])</span>
|
||||
|
||||
<span class="k">with</span> <span class="n">qp</span><span class="o">.</span><span class="n">util</span><span class="o">.</span><span class="n">temp_seed</span><span class="p">(</span><span class="mi">10</span><span class="p">):</span>
|
||||
<span class="n">pacc</span> <span class="o">=</span> <span class="n">PACC</span><span class="p">(</span><span class="n">LogisticRegression</span><span class="p">(),</span> <span class="n">val_split</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">n_jobs</span><span class="o">=</span><span class="mi">2</span><span class="p">)</span>
|
||||
<span class="n">pacc</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">train</span><span class="p">,</span> <span class="n">val_split</span><span class="o">=</span><span class="mf">0.5</span><span class="p">)</span>
|
||||
<span class="n">prev1</span> <span class="o">=</span> <span class="n">F</span><span class="o">.</span><span class="n">strprev</span><span class="p">(</span><span class="n">pacc</span><span class="o">.</span><span class="n">quantify</span><span class="p">(</span><span class="n">test</span><span class="o">.</span><span class="n">instances</span><span class="p">))</span>
|
||||
|
||||
<span class="k">with</span> <span class="n">qp</span><span class="o">.</span><span class="n">util</span><span class="o">.</span><span class="n">temp_seed</span><span class="p">(</span><span class="mi">0</span><span class="p">):</span>
|
||||
<span class="n">pacc</span> <span class="o">=</span> <span class="n">PACC</span><span class="p">(</span><span class="n">LogisticRegression</span><span class="p">(),</span> <span class="n">val_split</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">n_jobs</span><span class="o">=</span><span class="mi">2</span><span class="p">)</span>
|
||||
<span class="n">pacc</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">train</span><span class="p">,</span> <span class="n">val_split</span><span class="o">=</span><span class="mf">0.5</span><span class="p">)</span>
|
||||
<span class="n">prev2</span> <span class="o">=</span> <span class="n">F</span><span class="o">.</span><span class="n">strprev</span><span class="p">(</span><span class="n">pacc</span><span class="o">.</span><span class="n">quantify</span><span class="p">(</span><span class="n">test</span><span class="o">.</span><span class="n">instances</span><span class="p">))</span>
|
||||
|
||||
<span class="k">with</span> <span class="n">qp</span><span class="o">.</span><span class="n">util</span><span class="o">.</span><span class="n">temp_seed</span><span class="p">(</span><span class="mi">0</span><span class="p">):</span>
|
||||
<span class="n">pacc</span> <span class="o">=</span> <span class="n">PACC</span><span class="p">(</span><span class="n">LogisticRegression</span><span class="p">(),</span> <span class="n">val_split</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">n_jobs</span><span class="o">=</span><span class="mi">2</span><span class="p">)</span>
|
||||
<span class="n">pacc</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">train</span><span class="p">,</span> <span class="n">val_split</span><span class="o">=</span><span class="mf">0.5</span><span class="p">)</span>
|
||||
<span class="n">prev3</span> <span class="o">=</span> <span class="n">F</span><span class="o">.</span><span class="n">strprev</span><span class="p">(</span><span class="n">pacc</span><span class="o">.</span><span class="n">quantify</span><span class="p">(</span><span class="n">test</span><span class="o">.</span><span class="n">instances</span><span class="p">))</span>
|
||||
|
||||
<span class="nb">print</span><span class="p">(</span><span class="n">prev1</span><span class="p">)</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="n">prev2</span><span class="p">)</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="n">prev3</span><span class="p">)</span>
|
||||
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">assertNotEqual</span><span class="p">(</span><span class="n">prev1</span><span class="p">,</span> <span class="n">prev2</span><span class="p">)</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">assertEqual</span><span class="p">(</span><span class="n">prev2</span><span class="p">,</span> <span class="n">prev3</span><span class="p">)</span></div>
|
||||
</div>
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
<span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s1">'__main__'</span><span class="p">:</span>
|
||||
<span class="n">unittest</span><span class="o">.</span><span class="n">main</span><span class="p">()</span>
|
||||
</pre></div>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
<footer>
|
||||
|
||||
<hr/>
|
||||
|
||||
<div role="contentinfo">
|
||||
<p>© Copyright 2024, Alejandro Moreo.</p>
|
||||
</div>
|
||||
|
||||
Built with <a href="https://www.sphinx-doc.org/">Sphinx</a> using a
|
||||
<a href="https://github.com/readthedocs/sphinx_rtd_theme">theme</a>
|
||||
provided by <a href="https://readthedocs.org">Read the Docs</a>.
|
||||
|
||||
|
||||
</footer>
|
||||
</div>
|
||||
</div>
|
||||
</section>
|
||||
</div>
|
||||
<script>
|
||||
jQuery(function () {
|
||||
SphinxRtdTheme.Navigation.enable(true);
|
||||
});
|
||||
</script>
|
||||
|
||||
</body>
|
||||
</html>
|
||||
|
|
@ -0,0 +1,402 @@
|
|||
<!DOCTYPE html>
|
||||
<html class="writer-html5" lang="en">
|
||||
<head>
|
||||
<meta charset="utf-8" />
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
|
||||
<title>quapy.util — QuaPy: A Python-based open-source framework for quantification 0.1.8 documentation</title>
|
||||
<link rel="stylesheet" type="text/css" href="../../_static/pygments.css" />
|
||||
<link rel="stylesheet" type="text/css" href="../../_static/css/theme.css" />
|
||||
|
||||
|
||||
<!--[if lt IE 9]>
|
||||
<script src="../../_static/js/html5shiv.min.js"></script>
|
||||
<![endif]-->
|
||||
|
||||
<script data-url_root="../../" id="documentation_options" src="../../_static/documentation_options.js"></script>
|
||||
<script src="../../_static/jquery.js"></script>
|
||||
<script src="../../_static/underscore.js"></script>
|
||||
<script src="../../_static/_sphinx_javascript_frameworks_compat.js"></script>
|
||||
<script src="../../_static/doctools.js"></script>
|
||||
<script src="../../_static/sphinx_highlight.js"></script>
|
||||
<script src="../../_static/js/theme.js"></script>
|
||||
<link rel="index" title="Index" href="../../genindex.html" />
|
||||
<link rel="search" title="Search" href="../../search.html" />
|
||||
</head>
|
||||
|
||||
<body class="wy-body-for-nav">
|
||||
<div class="wy-grid-for-nav">
|
||||
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
|
||||
<div class="wy-side-scroll">
|
||||
<div class="wy-side-nav-search" >
|
||||
|
||||
|
||||
|
||||
<a href="../../index.html" class="icon icon-home">
|
||||
QuaPy: A Python-based open-source framework for quantification
|
||||
</a>
|
||||
<div role="search">
|
||||
<form id="rtd-search-form" class="wy-form" action="../../search.html" method="get">
|
||||
<input type="text" name="q" placeholder="Search docs" aria-label="Search docs" />
|
||||
<input type="hidden" name="check_keywords" value="yes" />
|
||||
<input type="hidden" name="area" value="default" />
|
||||
</form>
|
||||
</div>
|
||||
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../../modules.html">quapy</a></li>
|
||||
</ul>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
</nav>
|
||||
|
||||
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" >
|
||||
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
|
||||
<a href="../../index.html">QuaPy: A Python-based open-source framework for quantification</a>
|
||||
</nav>
|
||||
|
||||
<div class="wy-nav-content">
|
||||
<div class="rst-content">
|
||||
<div role="navigation" aria-label="Page navigation">
|
||||
<ul class="wy-breadcrumbs">
|
||||
<li><a href="../../index.html" class="icon icon-home" aria-label="Home"></a></li>
|
||||
<li class="breadcrumb-item"><a href="../index.html">Module code</a></li>
|
||||
<li class="breadcrumb-item active">quapy.util</li>
|
||||
<li class="wy-breadcrumbs-aside">
|
||||
</li>
|
||||
</ul>
|
||||
<hr/>
|
||||
</div>
|
||||
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
|
||||
<div itemprop="articleBody">
|
||||
|
||||
<h1>Source code for quapy.util</h1><div class="highlight"><pre>
|
||||
<span></span><span class="kn">import</span> <span class="nn">contextlib</span>
|
||||
<span class="kn">import</span> <span class="nn">itertools</span>
|
||||
<span class="kn">import</span> <span class="nn">multiprocessing</span>
|
||||
<span class="kn">import</span> <span class="nn">os</span>
|
||||
<span class="kn">import</span> <span class="nn">pickle</span>
|
||||
<span class="kn">import</span> <span class="nn">urllib</span>
|
||||
<span class="kn">from</span> <span class="nn">pathlib</span> <span class="kn">import</span> <span class="n">Path</span>
|
||||
<span class="kn">from</span> <span class="nn">contextlib</span> <span class="kn">import</span> <span class="n">ExitStack</span>
|
||||
<span class="kn">import</span> <span class="nn">quapy</span> <span class="k">as</span> <span class="nn">qp</span>
|
||||
|
||||
<span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span>
|
||||
<span class="kn">from</span> <span class="nn">joblib</span> <span class="kn">import</span> <span class="n">Parallel</span><span class="p">,</span> <span class="n">delayed</span>
|
||||
<span class="kn">from</span> <span class="nn">time</span> <span class="kn">import</span> <span class="n">time</span>
|
||||
<span class="kn">import</span> <span class="nn">signal</span>
|
||||
|
||||
|
||||
<span class="k">def</span> <span class="nf">_get_parallel_slices</span><span class="p">(</span><span class="n">n_tasks</span><span class="p">,</span> <span class="n">n_jobs</span><span class="p">):</span>
|
||||
<span class="k">if</span> <span class="n">n_jobs</span> <span class="o">==</span> <span class="o">-</span><span class="mi">1</span><span class="p">:</span>
|
||||
<span class="n">n_jobs</span> <span class="o">=</span> <span class="n">multiprocessing</span><span class="o">.</span><span class="n">cpu_count</span><span class="p">()</span>
|
||||
<span class="n">batch</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="n">n_tasks</span> <span class="o">/</span> <span class="n">n_jobs</span><span class="p">)</span>
|
||||
<span class="n">remainder</span> <span class="o">=</span> <span class="n">n_tasks</span> <span class="o">%</span> <span class="n">n_jobs</span>
|
||||
<span class="k">return</span> <span class="p">[</span><span class="nb">slice</span><span class="p">(</span><span class="n">job</span> <span class="o">*</span> <span class="n">batch</span><span class="p">,</span> <span class="p">(</span><span class="n">job</span> <span class="o">+</span> <span class="mi">1</span><span class="p">)</span> <span class="o">*</span> <span class="n">batch</span> <span class="o">+</span> <span class="p">(</span><span class="n">remainder</span> <span class="k">if</span> <span class="n">job</span> <span class="o">==</span> <span class="n">n_jobs</span> <span class="o">-</span> <span class="mi">1</span> <span class="k">else</span> <span class="mi">0</span><span class="p">))</span> <span class="k">for</span> <span class="n">job</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">n_jobs</span><span class="p">)]</span>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="map_parallel"><a class="viewcode-back" href="../../quapy.html#quapy.util.map_parallel">[docs]</a><span class="k">def</span> <span class="nf">map_parallel</span><span class="p">(</span><span class="n">func</span><span class="p">,</span> <span class="n">args</span><span class="p">,</span> <span class="n">n_jobs</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Applies func to n_jobs slices of args. E.g., if args is an array of 99 items and n_jobs=2, then</span>
|
||||
<span class="sd"> func is applied in two parallel processes to args[0:50] and to args[50:99]. func is a function</span>
|
||||
<span class="sd"> that already works with a list of arguments.</span>
|
||||
|
||||
<span class="sd"> :param func: function to be parallelized</span>
|
||||
<span class="sd"> :param args: array-like of arguments to be passed to the function in different parallel calls</span>
|
||||
<span class="sd"> :param n_jobs: the number of workers</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="n">args</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">asarray</span><span class="p">(</span><span class="n">args</span><span class="p">)</span>
|
||||
<span class="n">slices</span> <span class="o">=</span> <span class="n">_get_parallel_slices</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">args</span><span class="p">),</span> <span class="n">n_jobs</span><span class="p">)</span>
|
||||
<span class="n">results</span> <span class="o">=</span> <span class="n">Parallel</span><span class="p">(</span><span class="n">n_jobs</span><span class="o">=</span><span class="n">n_jobs</span><span class="p">)(</span>
|
||||
<span class="n">delayed</span><span class="p">(</span><span class="n">func</span><span class="p">)(</span><span class="n">args</span><span class="p">[</span><span class="n">slice_i</span><span class="p">])</span> <span class="k">for</span> <span class="n">slice_i</span> <span class="ow">in</span> <span class="n">slices</span>
|
||||
<span class="p">)</span>
|
||||
<span class="k">return</span> <span class="nb">list</span><span class="p">(</span><span class="n">itertools</span><span class="o">.</span><span class="n">chain</span><span class="o">.</span><span class="n">from_iterable</span><span class="p">(</span><span class="n">results</span><span class="p">))</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="parallel"><a class="viewcode-back" href="../../quapy.html#quapy.util.parallel">[docs]</a><span class="k">def</span> <span class="nf">parallel</span><span class="p">(</span><span class="n">func</span><span class="p">,</span> <span class="n">args</span><span class="p">,</span> <span class="n">n_jobs</span><span class="p">,</span> <span class="n">seed</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">asarray</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">backend</span><span class="o">=</span><span class="s1">'loky'</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> A wrapper of multiprocessing:</span>
|
||||
|
||||
<span class="sd"> >>> Parallel(n_jobs=n_jobs)(</span>
|
||||
<span class="sd"> >>> delayed(func)(args_i) for args_i in args</span>
|
||||
<span class="sd"> >>> )</span>
|
||||
|
||||
<span class="sd"> that takes the `quapy.environ` variable as input silently.</span>
|
||||
<span class="sd"> Seeds the child processes to ensure reproducibility when n_jobs>1.</span>
|
||||
|
||||
<span class="sd"> :param func: callable</span>
|
||||
<span class="sd"> :param args: args of func</span>
|
||||
<span class="sd"> :param seed: the numeric seed</span>
|
||||
<span class="sd"> :param asarray: set to True to return a np.ndarray instead of a list</span>
|
||||
<span class="sd"> :param backend: indicates the backend used for handling parallel works</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">def</span> <span class="nf">func_dec</span><span class="p">(</span><span class="n">environ</span><span class="p">,</span> <span class="n">seed</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">):</span>
|
||||
<span class="n">qp</span><span class="o">.</span><span class="n">environ</span> <span class="o">=</span> <span class="n">environ</span><span class="o">.</span><span class="n">copy</span><span class="p">()</span>
|
||||
<span class="n">qp</span><span class="o">.</span><span class="n">environ</span><span class="p">[</span><span class="s1">'N_JOBS'</span><span class="p">]</span> <span class="o">=</span> <span class="mi">1</span>
|
||||
<span class="c1">#set a context with a temporal seed to ensure results are reproducibles in parallel</span>
|
||||
<span class="k">with</span> <span class="n">ExitStack</span><span class="p">()</span> <span class="k">as</span> <span class="n">stack</span><span class="p">:</span>
|
||||
<span class="k">if</span> <span class="n">seed</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span><span class="p">:</span>
|
||||
<span class="n">stack</span><span class="o">.</span><span class="n">enter_context</span><span class="p">(</span><span class="n">qp</span><span class="o">.</span><span class="n">util</span><span class="o">.</span><span class="n">temp_seed</span><span class="p">(</span><span class="n">seed</span><span class="p">))</span>
|
||||
<span class="k">return</span> <span class="n">func</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">)</span>
|
||||
|
||||
<span class="n">out</span> <span class="o">=</span> <span class="n">Parallel</span><span class="p">(</span><span class="n">n_jobs</span><span class="o">=</span><span class="n">n_jobs</span><span class="p">,</span> <span class="n">backend</span><span class="o">=</span><span class="n">backend</span><span class="p">)(</span>
|
||||
<span class="n">delayed</span><span class="p">(</span><span class="n">func_dec</span><span class="p">)(</span><span class="n">qp</span><span class="o">.</span><span class="n">environ</span><span class="p">,</span> <span class="kc">None</span> <span class="k">if</span> <span class="n">seed</span> <span class="ow">is</span> <span class="kc">None</span> <span class="k">else</span> <span class="n">seed</span><span class="o">+</span><span class="n">i</span><span class="p">,</span> <span class="n">args_i</span><span class="p">)</span> <span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">args_i</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">args</span><span class="p">)</span>
|
||||
<span class="p">)</span>
|
||||
<span class="k">if</span> <span class="n">asarray</span><span class="p">:</span>
|
||||
<span class="n">out</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">asarray</span><span class="p">(</span><span class="n">out</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="n">out</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="temp_seed"><a class="viewcode-back" href="../../quapy.html#quapy.util.temp_seed">[docs]</a><span class="nd">@contextlib</span><span class="o">.</span><span class="n">contextmanager</span>
|
||||
<span class="k">def</span> <span class="nf">temp_seed</span><span class="p">(</span><span class="n">random_state</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Can be used in a "with" context to set a temporal seed without modifying the outer numpy's current state. E.g.:</span>
|
||||
|
||||
<span class="sd"> >>> with temp_seed(random_seed):</span>
|
||||
<span class="sd"> >>> pass # do any computation depending on np.random functionality</span>
|
||||
|
||||
<span class="sd"> :param random_state: the seed to set within the "with" context</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">if</span> <span class="n">random_state</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span><span class="p">:</span>
|
||||
<span class="n">state</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">get_state</span><span class="p">()</span>
|
||||
<span class="c1">#save the seed just in case is needed (for instance for setting the seed to child processes)</span>
|
||||
<span class="n">qp</span><span class="o">.</span><span class="n">environ</span><span class="p">[</span><span class="s1">'_R_SEED'</span><span class="p">]</span> <span class="o">=</span> <span class="n">random_state</span>
|
||||
<span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">seed</span><span class="p">(</span><span class="n">random_state</span><span class="p">)</span>
|
||||
<span class="k">try</span><span class="p">:</span>
|
||||
<span class="k">yield</span>
|
||||
<span class="k">finally</span><span class="p">:</span>
|
||||
<span class="k">if</span> <span class="n">random_state</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span><span class="p">:</span>
|
||||
<span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">set_state</span><span class="p">(</span><span class="n">state</span><span class="p">)</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="download_file"><a class="viewcode-back" href="../../quapy.html#quapy.util.download_file">[docs]</a><span class="k">def</span> <span class="nf">download_file</span><span class="p">(</span><span class="n">url</span><span class="p">,</span> <span class="n">archive_filename</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Downloads a file from a url</span>
|
||||
|
||||
<span class="sd"> :param url: the url</span>
|
||||
<span class="sd"> :param archive_filename: destination filename</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">def</span> <span class="nf">progress</span><span class="p">(</span><span class="n">blocknum</span><span class="p">,</span> <span class="n">bs</span><span class="p">,</span> <span class="n">size</span><span class="p">):</span>
|
||||
<span class="n">total_sz_mb</span> <span class="o">=</span> <span class="s1">'</span><span class="si">%.2f</span><span class="s1"> MB'</span> <span class="o">%</span> <span class="p">(</span><span class="n">size</span> <span class="o">/</span> <span class="mf">1e6</span><span class="p">)</span>
|
||||
<span class="n">current_sz_mb</span> <span class="o">=</span> <span class="s1">'</span><span class="si">%.2f</span><span class="s1"> MB'</span> <span class="o">%</span> <span class="p">((</span><span class="n">blocknum</span> <span class="o">*</span> <span class="n">bs</span><span class="p">)</span> <span class="o">/</span> <span class="mf">1e6</span><span class="p">)</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="s1">'</span><span class="se">\r</span><span class="s1">downloaded </span><span class="si">%s</span><span class="s1"> / </span><span class="si">%s</span><span class="s1">'</span> <span class="o">%</span> <span class="p">(</span><span class="n">current_sz_mb</span><span class="p">,</span> <span class="n">total_sz_mb</span><span class="p">),</span> <span class="n">end</span><span class="o">=</span><span class="s1">''</span><span class="p">)</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="s2">"Downloading </span><span class="si">%s</span><span class="s2">"</span> <span class="o">%</span> <span class="n">url</span><span class="p">)</span>
|
||||
<span class="n">urllib</span><span class="o">.</span><span class="n">request</span><span class="o">.</span><span class="n">urlretrieve</span><span class="p">(</span><span class="n">url</span><span class="p">,</span> <span class="n">filename</span><span class="o">=</span><span class="n">archive_filename</span><span class="p">,</span> <span class="n">reporthook</span><span class="o">=</span><span class="n">progress</span><span class="p">)</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="s2">""</span><span class="p">)</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="download_file_if_not_exists"><a class="viewcode-back" href="../../quapy.html#quapy.util.download_file_if_not_exists">[docs]</a><span class="k">def</span> <span class="nf">download_file_if_not_exists</span><span class="p">(</span><span class="n">url</span><span class="p">,</span> <span class="n">archive_filename</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Dowloads a function (using :meth:`download_file`) if the file does not exist.</span>
|
||||
|
||||
<span class="sd"> :param url: the url</span>
|
||||
<span class="sd"> :param archive_filename: destination filename</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">if</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">exists</span><span class="p">(</span><span class="n">archive_filename</span><span class="p">):</span>
|
||||
<span class="k">return</span>
|
||||
<span class="n">create_if_not_exist</span><span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">dirname</span><span class="p">(</span><span class="n">archive_filename</span><span class="p">))</span>
|
||||
<span class="n">download_file</span><span class="p">(</span><span class="n">url</span><span class="p">,</span> <span class="n">archive_filename</span><span class="p">)</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="create_if_not_exist"><a class="viewcode-back" href="../../quapy.html#quapy.util.create_if_not_exist">[docs]</a><span class="k">def</span> <span class="nf">create_if_not_exist</span><span class="p">(</span><span class="n">path</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> An alias to `os.makedirs(path, exist_ok=True)` that also returns the path. This is useful in cases like, e.g.:</span>
|
||||
|
||||
<span class="sd"> >>> path = create_if_not_exist(os.path.join(dir, subdir, anotherdir))</span>
|
||||
|
||||
<span class="sd"> :param path: path to create</span>
|
||||
<span class="sd"> :return: the path itself</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="n">os</span><span class="o">.</span><span class="n">makedirs</span><span class="p">(</span><span class="n">path</span><span class="p">,</span> <span class="n">exist_ok</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="n">path</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="get_quapy_home"><a class="viewcode-back" href="../../quapy.html#quapy.util.get_quapy_home">[docs]</a><span class="k">def</span> <span class="nf">get_quapy_home</span><span class="p">():</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Gets the home directory of QuaPy, i.e., the directory where QuaPy saves permanent data, such as dowloaded datasets.</span>
|
||||
<span class="sd"> This directory is `~/quapy_data`</span>
|
||||
|
||||
<span class="sd"> :return: a string representing the path</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="n">home</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="n">Path</span><span class="o">.</span><span class="n">home</span><span class="p">()),</span> <span class="s1">'quapy_data'</span><span class="p">)</span>
|
||||
<span class="n">os</span><span class="o">.</span><span class="n">makedirs</span><span class="p">(</span><span class="n">home</span><span class="p">,</span> <span class="n">exist_ok</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="n">home</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="create_parent_dir"><a class="viewcode-back" href="../../quapy.html#quapy.util.create_parent_dir">[docs]</a><span class="k">def</span> <span class="nf">create_parent_dir</span><span class="p">(</span><span class="n">path</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Creates the parent dir (if any) of a given path, if not exists. E.g., for `./path/to/file.txt`, the path `./path/to`</span>
|
||||
<span class="sd"> is created.</span>
|
||||
|
||||
<span class="sd"> :param path: the path</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="n">parentdir</span> <span class="o">=</span> <span class="n">Path</span><span class="p">(</span><span class="n">path</span><span class="p">)</span><span class="o">.</span><span class="n">parent</span>
|
||||
<span class="k">if</span> <span class="n">parentdir</span><span class="p">:</span>
|
||||
<span class="n">os</span><span class="o">.</span><span class="n">makedirs</span><span class="p">(</span><span class="n">parentdir</span><span class="p">,</span> <span class="n">exist_ok</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="save_text_file"><a class="viewcode-back" href="../../quapy.html#quapy.util.save_text_file">[docs]</a><span class="k">def</span> <span class="nf">save_text_file</span><span class="p">(</span><span class="n">path</span><span class="p">,</span> <span class="n">text</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Saves a text file to disk, given its full path, and creates the parent directory if missing.</span>
|
||||
|
||||
<span class="sd"> :param path: path where to save the path.</span>
|
||||
<span class="sd"> :param text: text to save.</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="n">create_parent_dir</span><span class="p">(</span><span class="n">path</span><span class="p">)</span>
|
||||
<span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="n">text</span><span class="p">,</span> <span class="s1">'wt'</span><span class="p">)</span> <span class="k">as</span> <span class="n">fout</span><span class="p">:</span>
|
||||
<span class="n">fout</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">text</span><span class="p">)</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="pickled_resource"><a class="viewcode-back" href="../../quapy.html#quapy.util.pickled_resource">[docs]</a><span class="k">def</span> <span class="nf">pickled_resource</span><span class="p">(</span><span class="n">pickle_path</span><span class="p">:</span><span class="nb">str</span><span class="p">,</span> <span class="n">generation_func</span><span class="p">:</span><span class="n">callable</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Allows for fast reuse of resources that are generated only once by calling generation_func(\\*args). The next times</span>
|
||||
<span class="sd"> this function is invoked, it loads the pickled resource. Example:</span>
|
||||
|
||||
<span class="sd"> >>> def some_array(n): # a mock resource created with one parameter (`n`)</span>
|
||||
<span class="sd"> >>> return np.random.rand(n)</span>
|
||||
<span class="sd"> >>> pickled_resource('./my_array.pkl', some_array, 10) # the resource does not exist: it is created by calling some_array(10)</span>
|
||||
<span class="sd"> >>> pickled_resource('./my_array.pkl', some_array, 10) # the resource exists; it is loaded from './my_array.pkl'</span>
|
||||
|
||||
<span class="sd"> :param pickle_path: the path where to save (first time) and load (next times) the resource</span>
|
||||
<span class="sd"> :param generation_func: the function that generates the resource, in case it does not exist in pickle_path</span>
|
||||
<span class="sd"> :param args: any arg that generation_func uses for generating the resources</span>
|
||||
<span class="sd"> :return: the resource</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">if</span> <span class="n">pickle_path</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">:</span>
|
||||
<span class="k">return</span> <span class="n">generation_func</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">)</span>
|
||||
<span class="k">else</span><span class="p">:</span>
|
||||
<span class="k">if</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">exists</span><span class="p">(</span><span class="n">pickle_path</span><span class="p">):</span>
|
||||
<span class="k">return</span> <span class="n">pickle</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="nb">open</span><span class="p">(</span><span class="n">pickle_path</span><span class="p">,</span> <span class="s1">'rb'</span><span class="p">))</span>
|
||||
<span class="k">else</span><span class="p">:</span>
|
||||
<span class="n">instance</span> <span class="o">=</span> <span class="n">generation_func</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">)</span>
|
||||
<span class="n">os</span><span class="o">.</span><span class="n">makedirs</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="n">Path</span><span class="p">(</span><span class="n">pickle_path</span><span class="p">)</span><span class="o">.</span><span class="n">parent</span><span class="p">),</span> <span class="n">exist_ok</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
|
||||
<span class="n">pickle</span><span class="o">.</span><span class="n">dump</span><span class="p">(</span><span class="n">instance</span><span class="p">,</span> <span class="nb">open</span><span class="p">(</span><span class="n">pickle_path</span><span class="p">,</span> <span class="s1">'wb'</span><span class="p">),</span> <span class="n">pickle</span><span class="o">.</span><span class="n">HIGHEST_PROTOCOL</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="n">instance</span></div>
|
||||
|
||||
|
||||
<span class="k">def</span> <span class="nf">_check_sample_size</span><span class="p">(</span><span class="n">sample_size</span><span class="p">):</span>
|
||||
<span class="k">if</span> <span class="n">sample_size</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">:</span>
|
||||
<span class="k">assert</span> <span class="n">qp</span><span class="o">.</span><span class="n">environ</span><span class="p">[</span><span class="s1">'SAMPLE_SIZE'</span><span class="p">]</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span><span class="p">,</span> \
|
||||
<span class="s1">'error: sample_size set to None, and cannot be resolved from the environment'</span>
|
||||
<span class="n">sample_size</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">environ</span><span class="p">[</span><span class="s1">'SAMPLE_SIZE'</span><span class="p">]</span>
|
||||
<span class="k">assert</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">sample_size</span><span class="p">,</span> <span class="nb">int</span><span class="p">)</span> <span class="ow">and</span> <span class="n">sample_size</span> <span class="o">></span> <span class="mi">0</span><span class="p">,</span> \
|
||||
<span class="s1">'error: sample_size is not a positive integer'</span>
|
||||
<span class="k">return</span> <span class="n">sample_size</span>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="EarlyStop"><a class="viewcode-back" href="../../quapy.html#quapy.util.EarlyStop">[docs]</a><span class="k">class</span> <span class="nc">EarlyStop</span><span class="p">:</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> A class implementing the early-stopping condition typically used for training neural networks.</span>
|
||||
|
||||
<span class="sd"> >>> earlystop = EarlyStop(patience=2, lower_is_better=True)</span>
|
||||
<span class="sd"> >>> earlystop(0.9, epoch=0)</span>
|
||||
<span class="sd"> >>> earlystop(0.7, epoch=1)</span>
|
||||
<span class="sd"> >>> earlystop.IMPROVED # is True</span>
|
||||
<span class="sd"> >>> earlystop(1.0, epoch=2)</span>
|
||||
<span class="sd"> >>> earlystop.STOP # is False (patience=1)</span>
|
||||
<span class="sd"> >>> earlystop(1.0, epoch=3)</span>
|
||||
<span class="sd"> >>> earlystop.STOP # is True (patience=0)</span>
|
||||
<span class="sd"> >>> earlystop.best_epoch # is 1</span>
|
||||
<span class="sd"> >>> earlystop.best_score # is 0.7</span>
|
||||
|
||||
<span class="sd"> :param patience: the number of (consecutive) times that a monitored evaluation metric (typically obtaind in a</span>
|
||||
<span class="sd"> held-out validation split) can be found to be worse than the best one obtained so far, before flagging the</span>
|
||||
<span class="sd"> stopping condition. An instance of this class is `callable`, and is to be used as follows:</span>
|
||||
<span class="sd"> :param lower_is_better: if True (default) the metric is to be minimized.</span>
|
||||
<span class="sd"> :ivar best_score: keeps track of the best value seen so far</span>
|
||||
<span class="sd"> :ivar best_epoch: keeps track of the epoch in which the best score was set</span>
|
||||
<span class="sd"> :ivar STOP: flag (boolean) indicating the stopping condition</span>
|
||||
<span class="sd"> :ivar IMPROVED: flag (boolean) indicating whether there was an improvement in the last call</span>
|
||||
<span class="sd"> """</span>
|
||||
|
||||
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">patience</span><span class="p">,</span> <span class="n">lower_is_better</span><span class="o">=</span><span class="kc">True</span><span class="p">):</span>
|
||||
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">PATIENCE_LIMIT</span> <span class="o">=</span> <span class="n">patience</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">better</span> <span class="o">=</span> <span class="k">lambda</span> <span class="n">a</span><span class="p">,</span><span class="n">b</span><span class="p">:</span> <span class="n">a</span><span class="o"><</span><span class="n">b</span> <span class="k">if</span> <span class="n">lower_is_better</span> <span class="k">else</span> <span class="n">a</span><span class="o">></span><span class="n">b</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">patience</span> <span class="o">=</span> <span class="n">patience</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">best_score</span> <span class="o">=</span> <span class="kc">None</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">best_epoch</span> <span class="o">=</span> <span class="kc">None</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">STOP</span> <span class="o">=</span> <span class="kc">False</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">IMPROVED</span> <span class="o">=</span> <span class="kc">False</span>
|
||||
|
||||
<span class="k">def</span> <span class="fm">__call__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">watch_score</span><span class="p">,</span> <span class="n">epoch</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Commits the new score found in epoch `epoch`. If the score improves over the best score found so far, then</span>
|
||||
<span class="sd"> the patiente counter gets reset. If otherwise, the patience counter is decreased, and in case it reachs 0,</span>
|
||||
<span class="sd"> the flag STOP becomes True.</span>
|
||||
|
||||
<span class="sd"> :param watch_score: the new score</span>
|
||||
<span class="sd"> :param epoch: the current epoch</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">IMPROVED</span> <span class="o">=</span> <span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">best_score</span> <span class="ow">is</span> <span class="kc">None</span> <span class="ow">or</span> <span class="bp">self</span><span class="o">.</span><span class="n">better</span><span class="p">(</span><span class="n">watch_score</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">best_score</span><span class="p">))</span>
|
||||
<span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">IMPROVED</span><span class="p">:</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">best_score</span> <span class="o">=</span> <span class="n">watch_score</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">best_epoch</span> <span class="o">=</span> <span class="n">epoch</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">patience</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">PATIENCE_LIMIT</span>
|
||||
<span class="k">else</span><span class="p">:</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">patience</span> <span class="o">-=</span> <span class="mi">1</span>
|
||||
<span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">patience</span> <span class="o"><=</span> <span class="mi">0</span><span class="p">:</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">STOP</span> <span class="o">=</span> <span class="kc">True</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="timeout"><a class="viewcode-back" href="../../quapy.html#quapy.util.timeout">[docs]</a><span class="nd">@contextlib</span><span class="o">.</span><span class="n">contextmanager</span>
|
||||
<span class="k">def</span> <span class="nf">timeout</span><span class="p">(</span><span class="n">seconds</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Opens a context that will launch an exception if not closed after a given number of seconds</span>
|
||||
|
||||
<span class="sd"> >>> def func(start_msg, end_msg):</span>
|
||||
<span class="sd"> >>> print(start_msg)</span>
|
||||
<span class="sd"> >>> sleep(2)</span>
|
||||
<span class="sd"> >>> print(end_msg)</span>
|
||||
<span class="sd"> >>></span>
|
||||
<span class="sd"> >>> with timeout(1):</span>
|
||||
<span class="sd"> >>> func('begin function', 'end function')</span>
|
||||
<span class="sd"> >>> Out[]</span>
|
||||
<span class="sd"> >>> begin function</span>
|
||||
<span class="sd"> >>> TimeoutError</span>
|
||||
|
||||
|
||||
<span class="sd"> :param seconds: number of seconds, set to <=0 to ignore the timer</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">if</span> <span class="n">seconds</span> <span class="o">></span> <span class="mi">0</span><span class="p">:</span>
|
||||
<span class="k">def</span> <span class="nf">handler</span><span class="p">(</span><span class="n">signum</span><span class="p">,</span> <span class="n">frame</span><span class="p">):</span>
|
||||
<span class="k">raise</span> <span class="ne">TimeoutError</span><span class="p">()</span>
|
||||
|
||||
<span class="n">signal</span><span class="o">.</span><span class="n">signal</span><span class="p">(</span><span class="n">signal</span><span class="o">.</span><span class="n">SIGALRM</span><span class="p">,</span> <span class="n">handler</span><span class="p">)</span>
|
||||
<span class="n">signal</span><span class="o">.</span><span class="n">alarm</span><span class="p">(</span><span class="n">seconds</span><span class="p">)</span>
|
||||
|
||||
<span class="k">yield</span>
|
||||
|
||||
<span class="k">if</span> <span class="n">seconds</span> <span class="o">></span> <span class="mi">0</span><span class="p">:</span>
|
||||
<span class="n">signal</span><span class="o">.</span><span class="n">alarm</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span></div>
|
||||
|
||||
</pre></div>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
<footer>
|
||||
|
||||
<hr/>
|
||||
|
||||
<div role="contentinfo">
|
||||
<p>© Copyright 2024, Alejandro Moreo.</p>
|
||||
</div>
|
||||
|
||||
Built with <a href="https://www.sphinx-doc.org/">Sphinx</a> using a
|
||||
<a href="https://github.com/readthedocs/sphinx_rtd_theme">theme</a>
|
||||
provided by <a href="https://readthedocs.org">Read the Docs</a>.
|
||||
|
||||
|
||||
</footer>
|
||||
</div>
|
||||
</div>
|
||||
</section>
|
||||
</div>
|
||||
<script>
|
||||
jQuery(function () {
|
||||
SphinxRtdTheme.Navigation.enable(true);
|
||||
});
|
||||
</script>
|
||||
|
||||
</body>
|
||||
</html>
|
||||
|
|
@ -0,0 +1,134 @@
|
|||
/*
|
||||
* _sphinx_javascript_frameworks_compat.js
|
||||
* ~~~~~~~~~~
|
||||
*
|
||||
* Compatability shim for jQuery and underscores.js.
|
||||
*
|
||||
* WILL BE REMOVED IN Sphinx 6.0
|
||||
* xref RemovedInSphinx60Warning
|
||||
*
|
||||
*/
|
||||
|
||||
/**
|
||||
* select a different prefix for underscore
|
||||
*/
|
||||
$u = _.noConflict();
|
||||
|
||||
|
||||
/**
|
||||
* small helper function to urldecode strings
|
||||
*
|
||||
* See https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/decodeURIComponent#Decoding_query_parameters_from_a_URL
|
||||
*/
|
||||
jQuery.urldecode = function(x) {
|
||||
if (!x) {
|
||||
return x
|
||||
}
|
||||
return decodeURIComponent(x.replace(/\+/g, ' '));
|
||||
};
|
||||
|
||||
/**
|
||||
* small helper function to urlencode strings
|
||||
*/
|
||||
jQuery.urlencode = encodeURIComponent;
|
||||
|
||||
/**
|
||||
* This function returns the parsed url parameters of the
|
||||
* current request. Multiple values per key are supported,
|
||||
* it will always return arrays of strings for the value parts.
|
||||
*/
|
||||
jQuery.getQueryParameters = function(s) {
|
||||
if (typeof s === 'undefined')
|
||||
s = document.location.search;
|
||||
var parts = s.substr(s.indexOf('?') + 1).split('&');
|
||||
var result = {};
|
||||
for (var i = 0; i < parts.length; i++) {
|
||||
var tmp = parts[i].split('=', 2);
|
||||
var key = jQuery.urldecode(tmp[0]);
|
||||
var value = jQuery.urldecode(tmp[1]);
|
||||
if (key in result)
|
||||
result[key].push(value);
|
||||
else
|
||||
result[key] = [value];
|
||||
}
|
||||
return result;
|
||||
};
|
||||
|
||||
/**
|
||||
* highlight a given string on a jquery object by wrapping it in
|
||||
* span elements with the given class name.
|
||||
*/
|
||||
jQuery.fn.highlightText = function(text, className) {
|
||||
function highlight(node, addItems) {
|
||||
if (node.nodeType === 3) {
|
||||
var val = node.nodeValue;
|
||||
var pos = val.toLowerCase().indexOf(text);
|
||||
if (pos >= 0 &&
|
||||
!jQuery(node.parentNode).hasClass(className) &&
|
||||
!jQuery(node.parentNode).hasClass("nohighlight")) {
|
||||
var span;
|
||||
var isInSVG = jQuery(node).closest("body, svg, foreignObject").is("svg");
|
||||
if (isInSVG) {
|
||||
span = document.createElementNS("http://www.w3.org/2000/svg", "tspan");
|
||||
} else {
|
||||
span = document.createElement("span");
|
||||
span.className = className;
|
||||
}
|
||||
span.appendChild(document.createTextNode(val.substr(pos, text.length)));
|
||||
node.parentNode.insertBefore(span, node.parentNode.insertBefore(
|
||||
document.createTextNode(val.substr(pos + text.length)),
|
||||
node.nextSibling));
|
||||
node.nodeValue = val.substr(0, pos);
|
||||
if (isInSVG) {
|
||||
var rect = document.createElementNS("http://www.w3.org/2000/svg", "rect");
|
||||
var bbox = node.parentElement.getBBox();
|
||||
rect.x.baseVal.value = bbox.x;
|
||||
rect.y.baseVal.value = bbox.y;
|
||||
rect.width.baseVal.value = bbox.width;
|
||||
rect.height.baseVal.value = bbox.height;
|
||||
rect.setAttribute('class', className);
|
||||
addItems.push({
|
||||
"parent": node.parentNode,
|
||||
"target": rect});
|
||||
}
|
||||
}
|
||||
}
|
||||
else if (!jQuery(node).is("button, select, textarea")) {
|
||||
jQuery.each(node.childNodes, function() {
|
||||
highlight(this, addItems);
|
||||
});
|
||||
}
|
||||
}
|
||||
var addItems = [];
|
||||
var result = this.each(function() {
|
||||
highlight(this, addItems);
|
||||
});
|
||||
for (var i = 0; i < addItems.length; ++i) {
|
||||
jQuery(addItems[i].parent).before(addItems[i].target);
|
||||
}
|
||||
return result;
|
||||
};
|
||||
|
||||
/*
|
||||
* backward compatibility for jQuery.browser
|
||||
* This will be supported until firefox bug is fixed.
|
||||
*/
|
||||
if (!jQuery.browser) {
|
||||
jQuery.uaMatch = function(ua) {
|
||||
ua = ua.toLowerCase();
|
||||
|
||||
var match = /(chrome)[ \/]([\w.]+)/.exec(ua) ||
|
||||
/(webkit)[ \/]([\w.]+)/.exec(ua) ||
|
||||
/(opera)(?:.*version|)[ \/]([\w.]+)/.exec(ua) ||
|
||||
/(msie) ([\w.]+)/.exec(ua) ||
|
||||
ua.indexOf("compatible") < 0 && /(mozilla)(?:.*? rv:([\w.]+)|)/.exec(ua) ||
|
||||
[];
|
||||
|
||||
return {
|
||||
browser: match[ 1 ] || "",
|
||||
version: match[ 2 ] || "0"
|
||||
};
|
||||
};
|
||||
jQuery.browser = {};
|
||||
jQuery.browser[jQuery.uaMatch(navigator.userAgent).browser] = true;
|
||||
}
|
||||
Binary file not shown.
|
After Width: | Height: | Size: 107 B |
|
|
@ -0,0 +1 @@
|
|||
.clearfix{*zoom:1}.clearfix:after,.clearfix:before{display:table;content:""}.clearfix:after{clear:both}@font-face{font-family:FontAwesome;font-style:normal;font-weight:400;src:url(fonts/fontawesome-webfont.eot?674f50d287a8c48dc19ba404d20fe713?#iefix) format("embedded-opentype"),url(fonts/fontawesome-webfont.woff2?af7ae505a9eed503f8b8e6982036873e) format("woff2"),url(fonts/fontawesome-webfont.woff?fee66e712a8a08eef5805a46892932ad) format("woff"),url(fonts/fontawesome-webfont.ttf?b06871f281fee6b241d60582ae9369b9) format("truetype"),url(fonts/fontawesome-webfont.svg?912ec66d7572ff821749319396470bde#FontAwesome) format("svg")}.fa:before{font-family:FontAwesome;font-style:normal;font-weight:400;line-height:1}.fa:before,a .fa{text-decoration:inherit}.fa:before,a .fa,li .fa{display:inline-block}li .fa-large:before{width:1.875em}ul.fas{list-style-type:none;margin-left:2em;text-indent:-.8em}ul.fas li .fa{width:.8em}ul.fas li .fa-large:before{vertical-align:baseline}.fa-book:before,.icon-book:before{content:"\f02d"}.fa-caret-down:before,.icon-caret-down:before{content:"\f0d7"}.fa-caret-up:before,.icon-caret-up:before{content:"\f0d8"}.fa-caret-left:before,.icon-caret-left:before{content:"\f0d9"}.fa-caret-right:before,.icon-caret-right:before{content:"\f0da"}.rst-versions{position:fixed;bottom:0;left:0;width:300px;color:#fcfcfc;background:#1f1d1d;font-family:Lato,proxima-nova,Helvetica Neue,Arial,sans-serif;z-index:400}.rst-versions a{color:#2980b9;text-decoration:none}.rst-versions .rst-badge-small{display:none}.rst-versions .rst-current-version{padding:12px;background-color:#272525;display:block;text-align:right;font-size:90%;cursor:pointer;color:#27ae60}.rst-versions .rst-current-version:after{clear:both;content:"";display:block}.rst-versions .rst-current-version .fa{color:#fcfcfc}.rst-versions .rst-current-version .fa-book,.rst-versions .rst-current-version .icon-book{float:left}.rst-versions .rst-current-version.rst-out-of-date{background-color:#e74c3c;color:#fff}.rst-versions .rst-current-version.rst-active-old-version{background-color:#f1c40f;color:#000}.rst-versions.shift-up{height:auto;max-height:100%;overflow-y:scroll}.rst-versions.shift-up .rst-other-versions{display:block}.rst-versions .rst-other-versions{font-size:90%;padding:12px;color:grey;display:none}.rst-versions .rst-other-versions hr{display:block;height:1px;border:0;margin:20px 0;padding:0;border-top:1px solid #413d3d}.rst-versions .rst-other-versions dd{display:inline-block;margin:0}.rst-versions .rst-other-versions dd a{display:inline-block;padding:6px;color:#fcfcfc}.rst-versions.rst-badge{width:auto;bottom:20px;right:20px;left:auto;border:none;max-width:300px;max-height:90%}.rst-versions.rst-badge .fa-book,.rst-versions.rst-badge .icon-book{float:none;line-height:30px}.rst-versions.rst-badge.shift-up .rst-current-version{text-align:right}.rst-versions.rst-badge.shift-up .rst-current-version .fa-book,.rst-versions.rst-badge.shift-up .rst-current-version .icon-book{float:left}.rst-versions.rst-badge>.rst-current-version{width:auto;height:30px;line-height:30px;padding:0 6px;display:block;text-align:center}@media screen and (max-width:768px){.rst-versions{width:85%;display:none}.rst-versions.shift{display:block}}
|
||||
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
File diff suppressed because it is too large
Load Diff
|
After Width: | Height: | Size: 434 KiB |
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
File diff suppressed because one or more lines are too long
|
|
@ -0,0 +1 @@
|
|||
!function(e){var t={};function r(n){if(t[n])return t[n].exports;var o=t[n]={i:n,l:!1,exports:{}};return e[n].call(o.exports,o,o.exports,r),o.l=!0,o.exports}r.m=e,r.c=t,r.d=function(e,t,n){r.o(e,t)||Object.defineProperty(e,t,{enumerable:!0,get:n})},r.r=function(e){"undefined"!=typeof Symbol&&Symbol.toStringTag&&Object.defineProperty(e,Symbol.toStringTag,{value:"Module"}),Object.defineProperty(e,"__esModule",{value:!0})},r.t=function(e,t){if(1&t&&(e=r(e)),8&t)return e;if(4&t&&"object"==typeof e&&e&&e.__esModule)return e;var n=Object.create(null);if(r.r(n),Object.defineProperty(n,"default",{enumerable:!0,value:e}),2&t&&"string"!=typeof e)for(var o in e)r.d(n,o,function(t){return e[t]}.bind(null,o));return n},r.n=function(e){var t=e&&e.__esModule?function(){return e.default}:function(){return e};return r.d(t,"a",t),t},r.o=function(e,t){return Object.prototype.hasOwnProperty.call(e,t)},r.p="",r(r.s=4)}({4:function(e,t,r){}});
|
||||
|
|
@ -0,0 +1,4 @@
|
|||
/**
|
||||
* @preserve HTML5 Shiv 3.7.3-pre | @afarkas @jdalton @jon_neal @rem | MIT/GPL2 Licensed
|
||||
*/
|
||||
!function(a,b){function c(a,b){var c=a.createElement("p"),d=a.getElementsByTagName("head")[0]||a.documentElement;return c.innerHTML="x<style>"+b+"</style>",d.insertBefore(c.lastChild,d.firstChild)}function d(){var a=y.elements;return"string"==typeof a?a.split(" "):a}function e(a,b){var c=y.elements;"string"!=typeof c&&(c=c.join(" ")),"string"!=typeof a&&(a=a.join(" ")),y.elements=c+" "+a,j(b)}function f(a){var b=x[a[v]];return b||(b={},w++,a[v]=w,x[w]=b),b}function g(a,c,d){if(c||(c=b),q)return c.createElement(a);d||(d=f(c));var e;return e=d.cache[a]?d.cache[a].cloneNode():u.test(a)?(d.cache[a]=d.createElem(a)).cloneNode():d.createElem(a),!e.canHaveChildren||t.test(a)||e.tagUrn?e:d.frag.appendChild(e)}function h(a,c){if(a||(a=b),q)return a.createDocumentFragment();c=c||f(a);for(var e=c.frag.cloneNode(),g=0,h=d(),i=h.length;i>g;g++)e.createElement(h[g]);return e}function i(a,b){b.cache||(b.cache={},b.createElem=a.createElement,b.createFrag=a.createDocumentFragment,b.frag=b.createFrag()),a.createElement=function(c){return y.shivMethods?g(c,a,b):b.createElem(c)},a.createDocumentFragment=Function("h,f","return function(){var n=f.cloneNode(),c=n.createElement;h.shivMethods&&("+d().join().replace(/[\w\-:]+/g,function(a){return b.createElem(a),b.frag.createElement(a),'c("'+a+'")'})+");return n}")(y,b.frag)}function j(a){a||(a=b);var d=f(a);return!y.shivCSS||p||d.hasCSS||(d.hasCSS=!!c(a,"article,aside,dialog,figcaption,figure,footer,header,hgroup,main,nav,section{display:block}mark{background:#FF0;color:#000}template{display:none}")),q||i(a,d),a}function k(a){for(var b,c=a.getElementsByTagName("*"),e=c.length,f=RegExp("^(?:"+d().join("|")+")$","i"),g=[];e--;)b=c[e],f.test(b.nodeName)&&g.push(b.applyElement(l(b)));return g}function l(a){for(var b,c=a.attributes,d=c.length,e=a.ownerDocument.createElement(A+":"+a.nodeName);d--;)b=c[d],b.specified&&e.setAttribute(b.nodeName,b.nodeValue);return e.style.cssText=a.style.cssText,e}function m(a){for(var b,c=a.split("{"),e=c.length,f=RegExp("(^|[\\s,>+~])("+d().join("|")+")(?=[[\\s,>+~#.:]|$)","gi"),g="$1"+A+"\\:$2";e--;)b=c[e]=c[e].split("}"),b[b.length-1]=b[b.length-1].replace(f,g),c[e]=b.join("}");return c.join("{")}function n(a){for(var b=a.length;b--;)a[b].removeNode()}function o(a){function b(){clearTimeout(g._removeSheetTimer),d&&d.removeNode(!0),d=null}var d,e,g=f(a),h=a.namespaces,i=a.parentWindow;return!B||a.printShived?a:("undefined"==typeof h[A]&&h.add(A),i.attachEvent("onbeforeprint",function(){b();for(var f,g,h,i=a.styleSheets,j=[],l=i.length,n=Array(l);l--;)n[l]=i[l];for(;h=n.pop();)if(!h.disabled&&z.test(h.media)){try{f=h.imports,g=f.length}catch(o){g=0}for(l=0;g>l;l++)n.push(f[l]);try{j.push(h.cssText)}catch(o){}}j=m(j.reverse().join("")),e=k(a),d=c(a,j)}),i.attachEvent("onafterprint",function(){n(e),clearTimeout(g._removeSheetTimer),g._removeSheetTimer=setTimeout(b,500)}),a.printShived=!0,a)}var p,q,r="3.7.3",s=a.html5||{},t=/^<|^(?:button|map|select|textarea|object|iframe|option|optgroup)$/i,u=/^(?:a|b|code|div|fieldset|h1|h2|h3|h4|h5|h6|i|label|li|ol|p|q|span|strong|style|table|tbody|td|th|tr|ul)$/i,v="_html5shiv",w=0,x={};!function(){try{var a=b.createElement("a");a.innerHTML="<xyz></xyz>",p="hidden"in a,q=1==a.childNodes.length||function(){b.createElement("a");var a=b.createDocumentFragment();return"undefined"==typeof a.cloneNode||"undefined"==typeof a.createDocumentFragment||"undefined"==typeof a.createElement}()}catch(c){p=!0,q=!0}}();var y={elements:s.elements||"abbr article aside audio bdi canvas data datalist details dialog figcaption figure footer header hgroup main mark meter nav output picture progress section summary template time video",version:r,shivCSS:s.shivCSS!==!1,supportsUnknownElements:q,shivMethods:s.shivMethods!==!1,type:"default",shivDocument:j,createElement:g,createDocumentFragment:h,addElements:e};a.html5=y,j(b);var z=/^$|\b(?:all|print)\b/,A="html5shiv",B=!q&&function(){var c=b.documentElement;return!("undefined"==typeof b.namespaces||"undefined"==typeof b.parentWindow||"undefined"==typeof c.applyElement||"undefined"==typeof c.removeNode||"undefined"==typeof a.attachEvent)}();y.type+=" print",y.shivPrint=o,o(b),"object"==typeof module&&module.exports&&(module.exports=y)}("undefined"!=typeof window?window:this,document);
|
||||
|
|
@ -0,0 +1,4 @@
|
|||
/**
|
||||
* @preserve HTML5 Shiv 3.7.3 | @afarkas @jdalton @jon_neal @rem | MIT/GPL2 Licensed
|
||||
*/
|
||||
!function(a,b){function c(a,b){var c=a.createElement("p"),d=a.getElementsByTagName("head")[0]||a.documentElement;return c.innerHTML="x<style>"+b+"</style>",d.insertBefore(c.lastChild,d.firstChild)}function d(){var a=t.elements;return"string"==typeof a?a.split(" "):a}function e(a,b){var c=t.elements;"string"!=typeof c&&(c=c.join(" ")),"string"!=typeof a&&(a=a.join(" ")),t.elements=c+" "+a,j(b)}function f(a){var b=s[a[q]];return b||(b={},r++,a[q]=r,s[r]=b),b}function g(a,c,d){if(c||(c=b),l)return c.createElement(a);d||(d=f(c));var e;return e=d.cache[a]?d.cache[a].cloneNode():p.test(a)?(d.cache[a]=d.createElem(a)).cloneNode():d.createElem(a),!e.canHaveChildren||o.test(a)||e.tagUrn?e:d.frag.appendChild(e)}function h(a,c){if(a||(a=b),l)return a.createDocumentFragment();c=c||f(a);for(var e=c.frag.cloneNode(),g=0,h=d(),i=h.length;i>g;g++)e.createElement(h[g]);return e}function i(a,b){b.cache||(b.cache={},b.createElem=a.createElement,b.createFrag=a.createDocumentFragment,b.frag=b.createFrag()),a.createElement=function(c){return t.shivMethods?g(c,a,b):b.createElem(c)},a.createDocumentFragment=Function("h,f","return function(){var n=f.cloneNode(),c=n.createElement;h.shivMethods&&("+d().join().replace(/[\w\-:]+/g,function(a){return b.createElem(a),b.frag.createElement(a),'c("'+a+'")'})+");return n}")(t,b.frag)}function j(a){a||(a=b);var d=f(a);return!t.shivCSS||k||d.hasCSS||(d.hasCSS=!!c(a,"article,aside,dialog,figcaption,figure,footer,header,hgroup,main,nav,section{display:block}mark{background:#FF0;color:#000}template{display:none}")),l||i(a,d),a}var k,l,m="3.7.3-pre",n=a.html5||{},o=/^<|^(?:button|map|select|textarea|object|iframe|option|optgroup)$/i,p=/^(?:a|b|code|div|fieldset|h1|h2|h3|h4|h5|h6|i|label|li|ol|p|q|span|strong|style|table|tbody|td|th|tr|ul)$/i,q="_html5shiv",r=0,s={};!function(){try{var a=b.createElement("a");a.innerHTML="<xyz></xyz>",k="hidden"in a,l=1==a.childNodes.length||function(){b.createElement("a");var a=b.createDocumentFragment();return"undefined"==typeof a.cloneNode||"undefined"==typeof a.createDocumentFragment||"undefined"==typeof a.createElement}()}catch(c){k=!0,l=!0}}();var t={elements:n.elements||"abbr article aside audio bdi canvas data datalist details dialog figcaption figure footer header hgroup main mark meter nav output picture progress section summary template time video",version:m,shivCSS:n.shivCSS!==!1,supportsUnknownElements:l,shivMethods:n.shivMethods!==!1,type:"default",shivDocument:j,createElement:g,createDocumentFragment:h,addElements:e};a.html5=t,j(b),"object"==typeof module&&module.exports&&(module.exports=t)}("undefined"!=typeof window?window:this,document);
|
||||
File diff suppressed because one or more lines are too long
Binary file not shown.
|
After Width: | Height: | Size: 120 B |
|
|
@ -0,0 +1,144 @@
|
|||
/* Highlighting utilities for Sphinx HTML documentation. */
|
||||
"use strict";
|
||||
|
||||
const SPHINX_HIGHLIGHT_ENABLED = true
|
||||
|
||||
/**
|
||||
* highlight a given string on a node by wrapping it in
|
||||
* span elements with the given class name.
|
||||
*/
|
||||
const _highlight = (node, addItems, text, className) => {
|
||||
if (node.nodeType === Node.TEXT_NODE) {
|
||||
const val = node.nodeValue;
|
||||
const parent = node.parentNode;
|
||||
const pos = val.toLowerCase().indexOf(text);
|
||||
if (
|
||||
pos >= 0 &&
|
||||
!parent.classList.contains(className) &&
|
||||
!parent.classList.contains("nohighlight")
|
||||
) {
|
||||
let span;
|
||||
|
||||
const closestNode = parent.closest("body, svg, foreignObject");
|
||||
const isInSVG = closestNode && closestNode.matches("svg");
|
||||
if (isInSVG) {
|
||||
span = document.createElementNS("http://www.w3.org/2000/svg", "tspan");
|
||||
} else {
|
||||
span = document.createElement("span");
|
||||
span.classList.add(className);
|
||||
}
|
||||
|
||||
span.appendChild(document.createTextNode(val.substr(pos, text.length)));
|
||||
parent.insertBefore(
|
||||
span,
|
||||
parent.insertBefore(
|
||||
document.createTextNode(val.substr(pos + text.length)),
|
||||
node.nextSibling
|
||||
)
|
||||
);
|
||||
node.nodeValue = val.substr(0, pos);
|
||||
|
||||
if (isInSVG) {
|
||||
const rect = document.createElementNS(
|
||||
"http://www.w3.org/2000/svg",
|
||||
"rect"
|
||||
);
|
||||
const bbox = parent.getBBox();
|
||||
rect.x.baseVal.value = bbox.x;
|
||||
rect.y.baseVal.value = bbox.y;
|
||||
rect.width.baseVal.value = bbox.width;
|
||||
rect.height.baseVal.value = bbox.height;
|
||||
rect.setAttribute("class", className);
|
||||
addItems.push({ parent: parent, target: rect });
|
||||
}
|
||||
}
|
||||
} else if (node.matches && !node.matches("button, select, textarea")) {
|
||||
node.childNodes.forEach((el) => _highlight(el, addItems, text, className));
|
||||
}
|
||||
};
|
||||
const _highlightText = (thisNode, text, className) => {
|
||||
let addItems = [];
|
||||
_highlight(thisNode, addItems, text, className);
|
||||
addItems.forEach((obj) =>
|
||||
obj.parent.insertAdjacentElement("beforebegin", obj.target)
|
||||
);
|
||||
};
|
||||
|
||||
/**
|
||||
* Small JavaScript module for the documentation.
|
||||
*/
|
||||
const SphinxHighlight = {
|
||||
|
||||
/**
|
||||
* highlight the search words provided in localstorage in the text
|
||||
*/
|
||||
highlightSearchWords: () => {
|
||||
if (!SPHINX_HIGHLIGHT_ENABLED) return; // bail if no highlight
|
||||
|
||||
// get and clear terms from localstorage
|
||||
const url = new URL(window.location);
|
||||
const highlight =
|
||||
localStorage.getItem("sphinx_highlight_terms")
|
||||
|| url.searchParams.get("highlight")
|
||||
|| "";
|
||||
localStorage.removeItem("sphinx_highlight_terms")
|
||||
url.searchParams.delete("highlight");
|
||||
window.history.replaceState({}, "", url);
|
||||
|
||||
// get individual terms from highlight string
|
||||
const terms = highlight.toLowerCase().split(/\s+/).filter(x => x);
|
||||
if (terms.length === 0) return; // nothing to do
|
||||
|
||||
// There should never be more than one element matching "div.body"
|
||||
const divBody = document.querySelectorAll("div.body");
|
||||
const body = divBody.length ? divBody[0] : document.querySelector("body");
|
||||
window.setTimeout(() => {
|
||||
terms.forEach((term) => _highlightText(body, term, "highlighted"));
|
||||
}, 10);
|
||||
|
||||
const searchBox = document.getElementById("searchbox");
|
||||
if (searchBox === null) return;
|
||||
searchBox.appendChild(
|
||||
document
|
||||
.createRange()
|
||||
.createContextualFragment(
|
||||
'<p class="highlight-link">' +
|
||||
'<a href="javascript:SphinxHighlight.hideSearchWords()">' +
|
||||
_("Hide Search Matches") +
|
||||
"</a></p>"
|
||||
)
|
||||
);
|
||||
},
|
||||
|
||||
/**
|
||||
* helper function to hide the search marks again
|
||||
*/
|
||||
hideSearchWords: () => {
|
||||
document
|
||||
.querySelectorAll("#searchbox .highlight-link")
|
||||
.forEach((el) => el.remove());
|
||||
document
|
||||
.querySelectorAll("span.highlighted")
|
||||
.forEach((el) => el.classList.remove("highlighted"));
|
||||
localStorage.removeItem("sphinx_highlight_terms")
|
||||
},
|
||||
|
||||
initEscapeListener: () => {
|
||||
// only install a listener if it is really needed
|
||||
if (!DOCUMENTATION_OPTIONS.ENABLE_SEARCH_SHORTCUTS) return;
|
||||
|
||||
document.addEventListener("keydown", (event) => {
|
||||
// bail for input elements
|
||||
if (BLACKLISTED_KEY_CONTROL_ELEMENTS.has(document.activeElement.tagName)) return;
|
||||
// bail with special keys
|
||||
if (event.shiftKey || event.altKey || event.ctrlKey || event.metaKey) return;
|
||||
if (DOCUMENTATION_OPTIONS.ENABLE_SEARCH_SHORTCUTS && (event.key === "Escape")) {
|
||||
SphinxHighlight.hideSearchWords();
|
||||
event.preventDefault();
|
||||
}
|
||||
});
|
||||
},
|
||||
};
|
||||
|
||||
_ready(SphinxHighlight.highlightSearchWords);
|
||||
_ready(SphinxHighlight.initEscapeListener);
|
||||
|
|
@ -0,0 +1,354 @@
|
|||
/*
|
||||
* sphinxdoc.css_t
|
||||
* ~~~~~~~~~~~~~~~
|
||||
*
|
||||
* Sphinx stylesheet -- sphinxdoc theme. Originally created by
|
||||
* Armin Ronacher for Werkzeug.
|
||||
*
|
||||
* :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
|
||||
* :license: BSD, see LICENSE for details.
|
||||
*
|
||||
*/
|
||||
|
||||
@import url("basic.css");
|
||||
|
||||
/* -- page layout ----------------------------------------------------------- */
|
||||
|
||||
body {
|
||||
font-family: 'Lucida Grande', 'Lucida Sans Unicode', 'Geneva',
|
||||
'Verdana', sans-serif;
|
||||
font-size: 14px;
|
||||
letter-spacing: -0.01em;
|
||||
line-height: 150%;
|
||||
text-align: center;
|
||||
background-color: #BFD1D4;
|
||||
color: black;
|
||||
padding: 0;
|
||||
border: 1px solid #aaa;
|
||||
|
||||
margin: 0px 80px 0px 80px;
|
||||
min-width: 740px;
|
||||
}
|
||||
|
||||
div.document {
|
||||
background-color: white;
|
||||
text-align: left;
|
||||
background-image: url(contents.png);
|
||||
background-repeat: repeat-x;
|
||||
}
|
||||
|
||||
div.documentwrapper {
|
||||
float: left;
|
||||
width: 100%;
|
||||
}
|
||||
|
||||
div.bodywrapper {
|
||||
margin: 0 calc(230px + 10px) 0 0;
|
||||
border-right: 1px solid #ccc;
|
||||
}
|
||||
|
||||
div.body {
|
||||
margin: 0;
|
||||
padding: 0.5em 20px 20px 20px;
|
||||
}
|
||||
|
||||
div.related {
|
||||
font-size: 1em;
|
||||
}
|
||||
|
||||
div.related ul {
|
||||
background-image: url(navigation.png);
|
||||
height: 2em;
|
||||
border-top: 1px solid #ddd;
|
||||
border-bottom: 1px solid #ddd;
|
||||
}
|
||||
|
||||
div.related ul li {
|
||||
margin: 0;
|
||||
padding: 0;
|
||||
height: 2em;
|
||||
float: left;
|
||||
}
|
||||
|
||||
div.related ul li.right {
|
||||
float: right;
|
||||
margin-right: 5px;
|
||||
}
|
||||
|
||||
div.related ul li a {
|
||||
margin: 0;
|
||||
padding: 0 5px 0 5px;
|
||||
line-height: 1.75em;
|
||||
color: #EE9816;
|
||||
}
|
||||
|
||||
div.related ul li a:hover {
|
||||
color: #3CA8E7;
|
||||
}
|
||||
|
||||
div.sphinxsidebarwrapper {
|
||||
padding: 0;
|
||||
}
|
||||
|
||||
div.sphinxsidebar {
|
||||
padding: 0.5em 15px 15px 0;
|
||||
width: calc(230px - 20px);
|
||||
float: right;
|
||||
font-size: 1em;
|
||||
text-align: left;
|
||||
}
|
||||
|
||||
div.sphinxsidebar h3, div.sphinxsidebar h4 {
|
||||
margin: 1em 0 0.5em 0;
|
||||
font-size: 1em;
|
||||
padding: 0.1em 0 0.1em 0.5em;
|
||||
color: white;
|
||||
border: 1px solid #86989B;
|
||||
background-color: #AFC1C4;
|
||||
}
|
||||
|
||||
div.sphinxsidebar h3 a {
|
||||
color: white;
|
||||
}
|
||||
|
||||
div.sphinxsidebar ul {
|
||||
padding-left: 1.5em;
|
||||
margin-top: 7px;
|
||||
padding: 0;
|
||||
line-height: 130%;
|
||||
}
|
||||
|
||||
div.sphinxsidebar ul ul {
|
||||
margin-left: 20px;
|
||||
}
|
||||
|
||||
div.footer {
|
||||
background-color: #E3EFF1;
|
||||
color: #86989B;
|
||||
padding: 3px 8px 3px 0;
|
||||
clear: both;
|
||||
font-size: 0.8em;
|
||||
text-align: right;
|
||||
}
|
||||
|
||||
div.footer a {
|
||||
color: #86989B;
|
||||
text-decoration: underline;
|
||||
}
|
||||
|
||||
/* -- body styles ----------------------------------------------------------- */
|
||||
|
||||
p {
|
||||
margin: 0.8em 0 0.5em 0;
|
||||
}
|
||||
|
||||
a {
|
||||
color: #CA7900;
|
||||
text-decoration: none;
|
||||
}
|
||||
|
||||
a:hover {
|
||||
color: #2491CF;
|
||||
}
|
||||
|
||||
a:visited {
|
||||
color: #551A8B;
|
||||
}
|
||||
|
||||
div.body a {
|
||||
text-decoration: underline;
|
||||
}
|
||||
|
||||
h1 {
|
||||
margin: 0;
|
||||
padding: 0.7em 0 0.3em 0;
|
||||
font-size: 1.5em;
|
||||
color: #11557C;
|
||||
}
|
||||
|
||||
h2 {
|
||||
margin: 1.3em 0 0.2em 0;
|
||||
font-size: 1.35em;
|
||||
padding: 0;
|
||||
}
|
||||
|
||||
h3 {
|
||||
margin: 1em 0 -0.3em 0;
|
||||
font-size: 1.2em;
|
||||
}
|
||||
|
||||
div.body h1 a, div.body h2 a, div.body h3 a, div.body h4 a, div.body h5 a, div.body h6 a {
|
||||
color: black!important;
|
||||
}
|
||||
|
||||
h1 a.anchor, h2 a.anchor, h3 a.anchor, h4 a.anchor, h5 a.anchor, h6 a.anchor {
|
||||
display: none;
|
||||
margin: 0 0 0 0.3em;
|
||||
padding: 0 0.2em 0 0.2em;
|
||||
color: #aaa!important;
|
||||
}
|
||||
|
||||
h1:hover a.anchor, h2:hover a.anchor, h3:hover a.anchor, h4:hover a.anchor,
|
||||
h5:hover a.anchor, h6:hover a.anchor {
|
||||
display: inline;
|
||||
}
|
||||
|
||||
h1 a.anchor:hover, h2 a.anchor:hover, h3 a.anchor:hover, h4 a.anchor:hover,
|
||||
h5 a.anchor:hover, h6 a.anchor:hover {
|
||||
color: #777;
|
||||
background-color: #eee;
|
||||
}
|
||||
|
||||
a.headerlink {
|
||||
color: #c60f0f!important;
|
||||
font-size: 1em;
|
||||
margin-left: 6px;
|
||||
padding: 0 4px 0 4px;
|
||||
text-decoration: none!important;
|
||||
}
|
||||
|
||||
a.headerlink:hover {
|
||||
background-color: #ccc;
|
||||
color: white!important;
|
||||
}
|
||||
|
||||
cite, code, code {
|
||||
font-family: 'Consolas', 'Deja Vu Sans Mono',
|
||||
'Bitstream Vera Sans Mono', monospace;
|
||||
font-size: 0.95em;
|
||||
letter-spacing: 0.01em;
|
||||
}
|
||||
|
||||
code {
|
||||
background-color: #f2f2f2;
|
||||
border-bottom: 1px solid #ddd;
|
||||
color: #333;
|
||||
}
|
||||
|
||||
code.descname, code.descclassname, code.xref {
|
||||
border: 0;
|
||||
}
|
||||
|
||||
hr {
|
||||
border: 1px solid #abc;
|
||||
margin: 2em;
|
||||
}
|
||||
|
||||
a code {
|
||||
border: 0;
|
||||
color: #CA7900;
|
||||
}
|
||||
|
||||
a code:hover {
|
||||
color: #2491CF;
|
||||
}
|
||||
|
||||
pre {
|
||||
font-family: 'Consolas', 'Deja Vu Sans Mono',
|
||||
'Bitstream Vera Sans Mono', monospace;
|
||||
font-size: 0.95em;
|
||||
letter-spacing: 0.015em;
|
||||
line-height: 120%;
|
||||
padding: 0.5em;
|
||||
border: 1px solid #ccc;
|
||||
}
|
||||
|
||||
pre a {
|
||||
color: inherit;
|
||||
text-decoration: underline;
|
||||
}
|
||||
|
||||
td.linenos pre {
|
||||
padding: 0.5em 0;
|
||||
}
|
||||
|
||||
div.quotebar {
|
||||
background-color: #f8f8f8;
|
||||
max-width: 250px;
|
||||
float: right;
|
||||
padding: 2px 7px;
|
||||
border: 1px solid #ccc;
|
||||
}
|
||||
|
||||
nav.contents,
|
||||
aside.topic,
|
||||
div.topic {
|
||||
background-color: #f8f8f8;
|
||||
}
|
||||
|
||||
table {
|
||||
border-collapse: collapse;
|
||||
margin: 0 -0.5em 0 -0.5em;
|
||||
}
|
||||
|
||||
table td, table th {
|
||||
padding: 0.2em 0.5em 0.2em 0.5em;
|
||||
}
|
||||
|
||||
div.admonition, div.warning {
|
||||
font-size: 0.9em;
|
||||
margin: 1em 0 1em 0;
|
||||
border: 1px solid #86989B;
|
||||
background-color: #f7f7f7;
|
||||
padding: 0;
|
||||
}
|
||||
|
||||
div.admonition p, div.warning p {
|
||||
margin: 0.5em 1em 0.5em 1em;
|
||||
padding: 0;
|
||||
}
|
||||
|
||||
div.admonition pre, div.warning pre {
|
||||
margin: 0.4em 1em 0.4em 1em;
|
||||
}
|
||||
|
||||
div.admonition p.admonition-title,
|
||||
div.warning p.admonition-title {
|
||||
margin: 0;
|
||||
padding: 0.1em 0 0.1em 0.5em;
|
||||
color: white;
|
||||
border-bottom: 1px solid #86989B;
|
||||
font-weight: bold;
|
||||
background-color: #AFC1C4;
|
||||
}
|
||||
|
||||
div.warning {
|
||||
border: 1px solid #940000;
|
||||
}
|
||||
|
||||
div.warning p.admonition-title {
|
||||
background-color: #CF0000;
|
||||
border-bottom-color: #940000;
|
||||
}
|
||||
|
||||
div.admonition ul, div.admonition ol,
|
||||
div.warning ul, div.warning ol {
|
||||
margin: 0.1em 0.5em 0.5em 3em;
|
||||
padding: 0;
|
||||
}
|
||||
|
||||
div.versioninfo {
|
||||
margin: 1em 0 0 0;
|
||||
border: 1px solid #ccc;
|
||||
background-color: #DDEAF0;
|
||||
padding: 8px;
|
||||
line-height: 1.3em;
|
||||
font-size: 0.9em;
|
||||
}
|
||||
|
||||
.viewcode-back {
|
||||
font-family: 'Lucida Grande', 'Lucida Sans Unicode', 'Geneva',
|
||||
'Verdana', sans-serif;
|
||||
}
|
||||
|
||||
div.viewcode-block:target {
|
||||
background-color: #f4debf;
|
||||
border-top: 1px solid #ac9;
|
||||
border-bottom: 1px solid #ac9;
|
||||
}
|
||||
|
||||
div.code-block-caption {
|
||||
background-color: #ddd;
|
||||
color: #222;
|
||||
border: 1px solid #ccc;
|
||||
}
|
||||
|
|
@ -0,0 +1,113 @@
|
|||
<!DOCTYPE html>
|
||||
<html class="writer-html5" lang="en" data-content_root="./">
|
||||
<head>
|
||||
<meta charset="utf-8" /><meta name="generator" content="Docutils 0.19: https://docutils.sourceforge.io/" />
|
||||
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
|
||||
<title>API — QuaPy: A Python-based open-source framework for quantification 0.1.8 documentation</title>
|
||||
<link rel="stylesheet" type="text/css" href="_static/pygments.css?v=92fd9be5" />
|
||||
<link rel="stylesheet" type="text/css" href="_static/css/theme.css?v=19f00094" />
|
||||
|
||||
|
||||
<!--[if lt IE 9]>
|
||||
<script src="_static/js/html5shiv.min.js"></script>
|
||||
<![endif]-->
|
||||
|
||||
<script src="_static/jquery.js?v=5d32c60e"></script>
|
||||
<script src="_static/_sphinx_javascript_frameworks_compat.js?v=2cd50e6c"></script>
|
||||
<script src="_static/documentation_options.js?v=22607128"></script>
|
||||
<script src="_static/doctools.js?v=9a2dae69"></script>
|
||||
<script src="_static/sphinx_highlight.js?v=dc90522c"></script>
|
||||
<script src="_static/js/theme.js"></script>
|
||||
<link rel="index" title="Index" href="genindex.html" />
|
||||
<link rel="search" title="Search" href="search.html" />
|
||||
</head>
|
||||
|
||||
<body class="wy-body-for-nav">
|
||||
<div class="wy-grid-for-nav">
|
||||
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
|
||||
<div class="wy-side-scroll">
|
||||
<div class="wy-side-nav-search" >
|
||||
|
||||
|
||||
|
||||
<a href="index.html" class="icon icon-home">
|
||||
QuaPy: A Python-based open-source framework for quantification
|
||||
</a>
|
||||
<div role="search">
|
||||
<form id="rtd-search-form" class="wy-form" action="search.html" method="get">
|
||||
<input type="text" name="q" placeholder="Search docs" aria-label="Search docs" />
|
||||
<input type="hidden" name="check_keywords" value="yes" />
|
||||
<input type="hidden" name="area" value="default" />
|
||||
</form>
|
||||
</div>
|
||||
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
|
||||
<!-- Local TOC -->
|
||||
<div class="local-toc"><ul>
|
||||
<li><a class="reference internal" href="#">API</a></li>
|
||||
</ul>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
</nav>
|
||||
|
||||
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" >
|
||||
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
|
||||
<a href="index.html">QuaPy: A Python-based open-source framework for quantification</a>
|
||||
</nav>
|
||||
|
||||
<div class="wy-nav-content">
|
||||
<div class="rst-content">
|
||||
<div role="navigation" aria-label="Page navigation">
|
||||
<ul class="wy-breadcrumbs">
|
||||
<li><a href="index.html" class="icon icon-home" aria-label="Home"></a></li>
|
||||
<li class="breadcrumb-item active">API</li>
|
||||
<li class="wy-breadcrumbs-aside">
|
||||
<a href="_sources/api.rst.txt" rel="nofollow"> View page source</a>
|
||||
</li>
|
||||
</ul>
|
||||
<hr/>
|
||||
</div>
|
||||
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
|
||||
<div itemprop="articleBody">
|
||||
|
||||
<section id="api">
|
||||
<h1>API<a class="headerlink" href="#api" title="Link to this heading"></a></h1>
|
||||
<table class="autosummary longtable docutils align-default">
|
||||
<tbody>
|
||||
<tr class="row-odd"><td><p><a class="reference internal" href="generated/quapy.html#module-quapy" title="quapy"><code class="xref py py-obj docutils literal notranslate"><span class="pre">quapy</span></code></a></p></td>
|
||||
<td><p>QuaPy module for quantification</p></td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
</section>
|
||||
|
||||
|
||||
</div>
|
||||
</div>
|
||||
<footer>
|
||||
|
||||
<hr/>
|
||||
|
||||
<div role="contentinfo">
|
||||
<p>© Copyright 2024, Alejandro Moreo.</p>
|
||||
</div>
|
||||
|
||||
Built with <a href="https://www.sphinx-doc.org/">Sphinx</a> using a
|
||||
<a href="https://github.com/readthedocs/sphinx_rtd_theme">theme</a>
|
||||
provided by <a href="https://readthedocs.org">Read the Docs</a>.
|
||||
|
||||
|
||||
</footer>
|
||||
</div>
|
||||
</div>
|
||||
</section>
|
||||
</div>
|
||||
<script>
|
||||
jQuery(function () {
|
||||
SphinxRtdTheme.Navigation.enable(true);
|
||||
});
|
||||
</script>
|
||||
|
||||
</body>
|
||||
</html>
|
||||
|
|
@ -0,0 +1,106 @@
|
|||
<!DOCTYPE html>
|
||||
<html class="writer-html5" lang="en" data-content_root="../">
|
||||
<head>
|
||||
<meta charset="utf-8" /><meta name="generator" content="Docutils 0.19: https://docutils.sourceforge.io/" />
|
||||
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
|
||||
<title>quapy — QuaPy: A Python-based open-source framework for quantification 0.1.8 documentation</title>
|
||||
<link rel="stylesheet" type="text/css" href="../_static/pygments.css?v=92fd9be5" />
|
||||
<link rel="stylesheet" type="text/css" href="../_static/css/theme.css?v=19f00094" />
|
||||
|
||||
|
||||
<!--[if lt IE 9]>
|
||||
<script src="../_static/js/html5shiv.min.js"></script>
|
||||
<![endif]-->
|
||||
|
||||
<script src="../_static/jquery.js?v=5d32c60e"></script>
|
||||
<script src="../_static/_sphinx_javascript_frameworks_compat.js?v=2cd50e6c"></script>
|
||||
<script src="../_static/documentation_options.js?v=22607128"></script>
|
||||
<script src="../_static/doctools.js?v=9a2dae69"></script>
|
||||
<script src="../_static/sphinx_highlight.js?v=dc90522c"></script>
|
||||
<script src="../_static/js/theme.js"></script>
|
||||
<link rel="index" title="Index" href="../genindex.html" />
|
||||
<link rel="search" title="Search" href="../search.html" />
|
||||
</head>
|
||||
|
||||
<body class="wy-body-for-nav">
|
||||
<div class="wy-grid-for-nav">
|
||||
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
|
||||
<div class="wy-side-scroll">
|
||||
<div class="wy-side-nav-search" >
|
||||
|
||||
|
||||
|
||||
<a href="../index.html" class="icon icon-home">
|
||||
QuaPy: A Python-based open-source framework for quantification
|
||||
</a>
|
||||
<div role="search">
|
||||
<form id="rtd-search-form" class="wy-form" action="../search.html" method="get">
|
||||
<input type="text" name="q" placeholder="Search docs" aria-label="Search docs" />
|
||||
<input type="hidden" name="check_keywords" value="yes" />
|
||||
<input type="hidden" name="area" value="default" />
|
||||
</form>
|
||||
</div>
|
||||
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../modules.html">quapy</a></li>
|
||||
</ul>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
</nav>
|
||||
|
||||
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" >
|
||||
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
|
||||
<a href="../index.html">QuaPy: A Python-based open-source framework for quantification</a>
|
||||
</nav>
|
||||
|
||||
<div class="wy-nav-content">
|
||||
<div class="rst-content">
|
||||
<div role="navigation" aria-label="Page navigation">
|
||||
<ul class="wy-breadcrumbs">
|
||||
<li><a href="../index.html" class="icon icon-home" aria-label="Home"></a></li>
|
||||
<li class="breadcrumb-item active">quapy</li>
|
||||
<li class="wy-breadcrumbs-aside">
|
||||
<a href="../_sources/generated/quapy.rst.txt" rel="nofollow"> View page source</a>
|
||||
</li>
|
||||
</ul>
|
||||
<hr/>
|
||||
</div>
|
||||
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
|
||||
<div itemprop="articleBody">
|
||||
|
||||
<section id="module-quapy">
|
||||
<span id="quapy"></span><h1>quapy<a class="headerlink" href="#module-quapy" title="Link to this heading"></a></h1>
|
||||
<p>QuaPy module for quantification</p>
|
||||
</section>
|
||||
|
||||
|
||||
</div>
|
||||
</div>
|
||||
<footer>
|
||||
|
||||
<hr/>
|
||||
|
||||
<div role="contentinfo">
|
||||
<p>© Copyright 2024, Alejandro Moreo.</p>
|
||||
</div>
|
||||
|
||||
Built with <a href="https://www.sphinx-doc.org/">Sphinx</a> using a
|
||||
<a href="https://github.com/readthedocs/sphinx_rtd_theme">theme</a>
|
||||
provided by <a href="https://readthedocs.org">Read the Docs</a>.
|
||||
|
||||
|
||||
</footer>
|
||||
</div>
|
||||
</div>
|
||||
</section>
|
||||
</div>
|
||||
<script>
|
||||
jQuery(function () {
|
||||
SphinxRtdTheme.Navigation.enable(true);
|
||||
});
|
||||
</script>
|
||||
|
||||
</body>
|
||||
</html>
|
||||
|
|
@ -0,0 +1,119 @@
|
|||
<!DOCTYPE html>
|
||||
<html class="writer-html5" lang="en" data-content_root="./">
|
||||
<head>
|
||||
<meta charset="utf-8" /><meta name="generator" content="Docutils 0.19: https://docutils.sourceforge.io/" />
|
||||
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
|
||||
<title>quapy.benchmarking package — QuaPy: A Python-based open-source framework for quantification 0.1.8 documentation</title>
|
||||
<link rel="stylesheet" type="text/css" href="_static/pygments.css?v=92fd9be5" />
|
||||
<link rel="stylesheet" type="text/css" href="_static/css/theme.css?v=19f00094" />
|
||||
|
||||
|
||||
<!--[if lt IE 9]>
|
||||
<script src="_static/js/html5shiv.min.js"></script>
|
||||
<![endif]-->
|
||||
|
||||
<script src="_static/jquery.js?v=5d32c60e"></script>
|
||||
<script src="_static/_sphinx_javascript_frameworks_compat.js?v=2cd50e6c"></script>
|
||||
<script src="_static/documentation_options.js?v=22607128"></script>
|
||||
<script src="_static/doctools.js?v=9a2dae69"></script>
|
||||
<script src="_static/sphinx_highlight.js?v=dc90522c"></script>
|
||||
<script src="_static/js/theme.js"></script>
|
||||
<link rel="index" title="Index" href="genindex.html" />
|
||||
<link rel="search" title="Search" href="search.html" />
|
||||
</head>
|
||||
|
||||
<body class="wy-body-for-nav">
|
||||
<div class="wy-grid-for-nav">
|
||||
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
|
||||
<div class="wy-side-scroll">
|
||||
<div class="wy-side-nav-search" >
|
||||
|
||||
|
||||
|
||||
<a href="index.html" class="icon icon-home">
|
||||
QuaPy: A Python-based open-source framework for quantification
|
||||
</a>
|
||||
<div role="search">
|
||||
<form id="rtd-search-form" class="wy-form" action="search.html" method="get">
|
||||
<input type="text" name="q" placeholder="Search docs" aria-label="Search docs" />
|
||||
<input type="hidden" name="check_keywords" value="yes" />
|
||||
<input type="hidden" name="area" value="default" />
|
||||
</form>
|
||||
</div>
|
||||
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="modules.html">quapy</a></li>
|
||||
</ul>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
</nav>
|
||||
|
||||
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" >
|
||||
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
|
||||
<a href="index.html">QuaPy: A Python-based open-source framework for quantification</a>
|
||||
</nav>
|
||||
|
||||
<div class="wy-nav-content">
|
||||
<div class="rst-content">
|
||||
<div role="navigation" aria-label="Page navigation">
|
||||
<ul class="wy-breadcrumbs">
|
||||
<li><a href="index.html" class="icon icon-home" aria-label="Home"></a></li>
|
||||
<li class="breadcrumb-item active">quapy.benchmarking package</li>
|
||||
<li class="wy-breadcrumbs-aside">
|
||||
<a href="_sources/quapy.benchmarking.rst.txt" rel="nofollow"> View page source</a>
|
||||
</li>
|
||||
</ul>
|
||||
<hr/>
|
||||
</div>
|
||||
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
|
||||
<div itemprop="articleBody">
|
||||
|
||||
<section id="quapy-benchmarking-package">
|
||||
<h1>quapy.benchmarking package<a class="headerlink" href="#quapy-benchmarking-package" title="Link to this heading"></a></h1>
|
||||
<section id="submodules">
|
||||
<h2>Submodules<a class="headerlink" href="#submodules" title="Link to this heading"></a></h2>
|
||||
</section>
|
||||
<section id="module-quapy.benchmarking.typical">
|
||||
<span id="quapy-benchmarking-typical-module"></span><h2>quapy.benchmarking.typical module<a class="headerlink" href="#module-quapy.benchmarking.typical" title="Link to this heading"></a></h2>
|
||||
<dl class="py function">
|
||||
<dt class="sig sig-object py" id="quapy.benchmarking.typical.wrap_cls_params">
|
||||
<span class="sig-prename descclassname"><span class="pre">quapy.benchmarking.typical.</span></span><span class="sig-name descname"><span class="pre">wrap_cls_params</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">params</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#quapy.benchmarking.typical.wrap_cls_params" title="Link to this definition"></a></dt>
|
||||
<dd></dd></dl>
|
||||
|
||||
</section>
|
||||
<section id="module-quapy.benchmarking">
|
||||
<span id="module-contents"></span><h2>Module contents<a class="headerlink" href="#module-quapy.benchmarking" title="Link to this heading"></a></h2>
|
||||
</section>
|
||||
</section>
|
||||
|
||||
|
||||
</div>
|
||||
</div>
|
||||
<footer>
|
||||
|
||||
<hr/>
|
||||
|
||||
<div role="contentinfo">
|
||||
<p>© Copyright 2024, Alejandro Moreo.</p>
|
||||
</div>
|
||||
|
||||
Built with <a href="https://www.sphinx-doc.org/">Sphinx</a> using a
|
||||
<a href="https://github.com/readthedocs/sphinx_rtd_theme">theme</a>
|
||||
provided by <a href="https://readthedocs.org">Read the Docs</a>.
|
||||
|
||||
|
||||
</footer>
|
||||
</div>
|
||||
</div>
|
||||
</section>
|
||||
</div>
|
||||
<script>
|
||||
jQuery(function () {
|
||||
SphinxRtdTheme.Navigation.enable(true);
|
||||
});
|
||||
</script>
|
||||
|
||||
</body>
|
||||
</html>
|
||||
|
|
@ -0,0 +1,10 @@
|
|||
Para meter los módulos dentro de doc hay que hacer un
|
||||
|
||||
sphinx-apidoc -o docs/source/ quapy/ -P
|
||||
|
||||
Eso importa todo lo que haya en quapy/ (incluidos los ficheros _ gracias a -P) en source y crea un rst para cada uno.
|
||||
|
||||
Parece que lo del -P no funciona. Hay que meterlos a mano en quapy.method.rst
|
||||
|
||||
Luego, simplemente
|
||||
make html
|
||||
Binary file not shown.
|
After Width: | Height: | Size: 111 KiB |
|
|
@ -24,19 +24,18 @@ The following script fetches a dataset of tweets, trains, applies, and evaluates
|
|||
|
||||
```python
|
||||
import quapy as qp
|
||||
from sklearn.linear_model import LogisticRegression
|
||||
|
||||
dataset = qp.datasets.fetch_twitter('semeval16')
|
||||
training, test = qp.datasets.fetch_UCIBinaryDataset("yeast").train_test
|
||||
|
||||
# create an "Adjusted Classify & Count" quantifier
|
||||
model = qp.method.aggregative.ACC(LogisticRegression())
|
||||
model.fit(dataset.training)
|
||||
model = qp.method.aggregative.ACC()
|
||||
Xtr, ytr = training.Xy
|
||||
model.fit(Xtr, ytr)
|
||||
|
||||
estim_prevalence = model.predict(dataset.test.instances)
|
||||
true_prevalence = dataset.test.prevalence()
|
||||
estim_prevalence = model.predict(test.X)
|
||||
true_prevalence = test.prevalence()
|
||||
|
||||
error = qp.error.mae(true_prevalence, estim_prevalence)
|
||||
|
||||
print(f'Mean Absolute Error (MAE)={error:.3f}')
|
||||
```
|
||||
|
||||
|
|
@ -60,19 +59,19 @@ API <quapy>
|
|||
|
||||
## Features
|
||||
|
||||
- Implementation of many popular quantification methods (Classify-&-Count and its variants, Expectation Maximization, quantification methods based on structured output learning, HDy, QuaNet, quantification ensembles, among others).
|
||||
- Versatile functionality for performing evaluation based on sampling generation protocols (e.g., APP, NPP, etc.).
|
||||
- Implementation of most commonly used evaluation metrics (e.g., AE, RAE, NAE, NRAE, SE, KLD, NKLD, etc.).
|
||||
- Datasets frequently used in quantification (textual and numeric), including:
|
||||
- 32 UCI Machine Learning binary datasets.
|
||||
- 5 UCI Machine Learning multiclass datasets (new in v0.1.8!).
|
||||
- 11 Twitter quantification-by-sentiment datasets.
|
||||
- 3 product reviews quantification-by-sentiment datasets.
|
||||
- 4 tasks from LeQua competition (new in v0.1.7!)
|
||||
- IFCB dataset of plankton water samples (new in v0.1.8!).
|
||||
- Native support for binary and single-label multiclass quantification scenarios.
|
||||
- Model selection functionality that minimizes quantification-oriented loss functions.
|
||||
- Visualization tools for analysing the experimental results.
|
||||
* Implementation of many popular quantification methods (Classify-&-Count and its variants, Expectation Maximization,
|
||||
quantification methods based on structured output learning, HDy, QuaNet, quantification ensembles, among others).
|
||||
* Versatile functionality for performing evaluation based on sampling generation protocols (e.g., APP, NPP, etc.).
|
||||
* Implementation of most commonly used evaluation metrics (e.g., AE, RAE, NAE, NRAE, SE, KLD, NKLD, etc.).
|
||||
* Datasets frequently used in quantification (textual and numeric), including:
|
||||
* 32 UCI Machine Learning datasets.
|
||||
* 11 Twitter quantification-by-sentiment datasets.
|
||||
* 3 product reviews quantification-by-sentiment datasets.
|
||||
* 4 tasks from LeQua 2022 competition and 4 tasks from LeQua 2024 competition
|
||||
* IFCB for Plancton quantification
|
||||
* Native support for binary and single-label multiclass quantification scenarios.
|
||||
* Model selection functionality that minimizes quantification-oriented loss functions.
|
||||
* Visualization tools for analysing the experimental results.
|
||||
|
||||
## Citing QuaPy
|
||||
|
||||
|
|
@ -98,3 +97,7 @@ In case you want to contribute improvements to quapy, please generate pull reque
|
|||
:width: 250px
|
||||
:alt: SoBigData++
|
||||
```
|
||||
|
||||
This work has been supported by the QuaDaSh project
|
||||
_"Finanziato dall’Unione europea---Next Generation EU,
|
||||
Missione 4 Componente 2 CUP B53D23026250001"_.
|
||||
|
|
|
|||
|
|
@ -0,0 +1,41 @@
|
|||
.. QuaPy: A Python-based open-source framework for quantification documentation master file, created by
|
||||
sphinx-quickstart on Wed Feb 7 16:26:46 2024.
|
||||
You can adapt this file completely to your liking, but it should at least
|
||||
contain the root `toctree` directive.
|
||||
|
||||
Welcome to QuaPy's documentation!
|
||||
==========================================================================================
|
||||
|
||||
QuaPy is a Python-based open-source framework for quantification.
|
||||
|
||||
This document contains the API of the modules included in QuaPy.
|
||||
|
||||
Installation
|
||||
------------
|
||||
|
||||
`pip install quapy`
|
||||
|
||||
GitHub
|
||||
------------
|
||||
|
||||
QuaPy is hosted in GitHub at `https://github.com/HLT-ISTI/QuaPy <https://github.com/HLT-ISTI/QuaPy>`_
|
||||
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 2
|
||||
:caption: Contents:
|
||||
|
||||
Contents
|
||||
--------
|
||||
|
||||
.. toctree::
|
||||
|
||||
modules
|
||||
|
||||
|
||||
Indices and tables
|
||||
==================
|
||||
|
||||
* :ref:`genindex`
|
||||
* :ref:`modindex`
|
||||
* :ref:`search`
|
||||
|
|
@ -340,10 +340,10 @@ and a set of test samples (for evaluation). QuaPy returns this data as a Labelle
|
|||
(training) and two generation protocols (for validation and test samples), as follows:
|
||||
|
||||
```python
|
||||
training, val_generator, test_generator = fetch_lequa2022(task=task)
|
||||
training, val_generator, test_generator = qp.datasets.fetch_lequa2022(task=task)
|
||||
```
|
||||
|
||||
See the `lequa2022_experiments.py` in the examples folder for further details on how to
|
||||
See the `5a.lequa2022_experiments.py` in the examples folder for further details on how to
|
||||
carry out experiments using these datasets.
|
||||
|
||||
The datasets are downloaded only once, and stored for fast reuse.
|
||||
|
|
@ -365,6 +365,53 @@ Esuli, A., Moreo, A., Sebastiani, F., & Sperduti, G. (2022).
|
|||
A Detailed Overview of LeQua@ CLEF 2022: Learning to Quantify.
|
||||
```
|
||||
|
||||
## LeQua 2024 Datasets
|
||||
|
||||
QuaPy also provides the datasets used for the [LeQua 2024 competition](https://lequa2024.github.io/).
|
||||
In brief, there are 4 tasks:
|
||||
* T1: binary quantification (by sentiment)
|
||||
* T2: multiclass quantification (28 classes, merchandise products)
|
||||
* T3: ordinal quantification (5-stars sentiment ratings)
|
||||
* T4: binary sentiment quantification under a combination of covariate shift and prior shift
|
||||
|
||||
In all cases, the covariate space has 256 dimensions (extracted using the `ELECTRA-Small` model).
|
||||
|
||||
Every task consists of a training set, a set of validation samples (for model selection)
|
||||
and a set of test samples (for evaluation). QuaPy returns this data as a LabelledCollection
|
||||
(training bags) and sampling generation protocols (for validation and test bags).
|
||||
T3 also offers the possibility to obtain a series of training bags (in form of a
|
||||
sampling generation protocol) instead of one single training bag. Use it as follows:
|
||||
|
||||
```python
|
||||
training, val_generator, test_generator = qp.datasets.fetch_lequa2024(task=task)
|
||||
```
|
||||
|
||||
See the `5b.lequa2024_experiments.py` in the examples folder for further details on how to
|
||||
carry out experiments using these datasets.
|
||||
|
||||
The datasets are downloaded only once, and stored for fast reuse.
|
||||
|
||||
Some statistics are summarized below:
|
||||
|
||||
| Dataset | classes | train size | validation samples | test samples | docs by sample | type |
|
||||
|---------|:-------:|:-----------:|:------------------:|:------------:|:--------------:|:--------:|
|
||||
| T1 | 2 | 5000 | 1000 | 5000 | 250 | vector |
|
||||
| T2 | 28 | 20000 | 1000 | 5000 | 1000 | vector |
|
||||
| T3 | 5 | 100 samples | 1000 | 5000 | 200 | vector |
|
||||
| T4 | 2 | 5000 | 1000 | 5000 | 250 | vector |
|
||||
|
||||
For further details on the datasets or the competition, we refer to
|
||||
[the official site](https://lequa2024.github.io/data/) and
|
||||
[the overview paper](http://nmis.isti.cnr.it/sebastiani/Publications/LQ2024.pdf).
|
||||
|
||||
```
|
||||
Esuli, A., Moreo, A., Sebastiani, F., & Sperduti, G. (2022).
|
||||
An Overview of LeQua 2024, the 2nd International Data Challenge on Learning to Quantify,
|
||||
Proceedings of the 4th International Workshop on Learning to Quantify (LQ 2024),
|
||||
ECML-PKDD 2024, Vilnius, Lithuania.
|
||||
```
|
||||
|
||||
|
||||
## IFCB Plankton dataset
|
||||
|
||||
IFCB is a dataset of plankton species in water samples hosted in `Zenodo <https://zenodo.org/records/10036244>`_.
|
||||
|
|
@ -402,12 +449,20 @@ train, test_gen = qp.datasets.fetch_IFCB(for_model_selection=False, single_sampl
|
|||
# ... train and evaluation
|
||||
```
|
||||
|
||||
See also [Automatic plankton quantification using deep features
|
||||
P González, A Castaño, EE Peacock, J Díez, JJ Del Coz, HM Sosik
|
||||
Journal of Plankton Research 41 (4), 449-463](https://par.nsf.gov/servlets/purl/10172325).
|
||||
|
||||
|
||||
|
||||
## Adding Custom Datasets
|
||||
|
||||
It is straightforward to import your own datasets into QuaPy.
|
||||
I what follows, there are some code snippets for doing so; see also the example
|
||||
[3.custom_collection.py](https://github.com/HLT-ISTI/QuaPy/blob/master/examples/3.custom_collection.py).
|
||||
|
||||
QuaPy provides data loaders for simple formats dealing with
|
||||
text, following the format:
|
||||
text; for example, use `qp.data.reader.from_text` for the following the format:
|
||||
|
||||
```
|
||||
class-id \t first document's pre-processed text \n
|
||||
|
|
@ -415,13 +470,16 @@ class-id \t second document's pre-processed text \n
|
|||
...
|
||||
```
|
||||
|
||||
and sparse representations of the form:
|
||||
or `qp.data.reader.from_sparse` for sparse representations of the form:
|
||||
|
||||
```
|
||||
{-1, 0, or +1} col(int):val(float) col(int):val(float) ... \n
|
||||
...
|
||||
```
|
||||
|
||||
both functions return a tuple `X, y` containing a list of strings and the corresponding
|
||||
labels, respectively.
|
||||
|
||||
The code in charge in loading a LabelledCollection is:
|
||||
|
||||
```python
|
||||
|
|
@ -430,12 +488,13 @@ def load(cls, path:str, loader_func:callable):
|
|||
return LabelledCollection(*loader_func(path))
|
||||
```
|
||||
|
||||
indicating that any _loader_func_ (e.g., a user-defined one) which
|
||||
indicating that any `loader_func` (e.g., `from_text`, `from_sparse`, `from_csv`, or a user-defined one) which
|
||||
returns valid arguments for initializing a _LabelledCollection_ object will allow
|
||||
to load any collection. In particular, the _LabelledCollection_ receives as
|
||||
arguments the instances (as an iterable) and the labels (as an iterable) and,
|
||||
additionally, the number of classes can be specified (it would otherwise be
|
||||
inferred from the labels, but that requires at least one positive example for
|
||||
to load any collection. More specifically, the _LabelledCollection_ receives as
|
||||
arguments the _instances_ (iterable) and the _labels_ (iterable) and,
|
||||
optionally, the number of classes (it would be
|
||||
inferred from the labels if not indicated, but this requires at least one
|
||||
positive example for
|
||||
all classes to be present in the collection).
|
||||
|
||||
The same _loader_func_ can be passed to a Dataset, along with two
|
||||
|
|
@ -448,20 +507,23 @@ import quapy as qp
|
|||
train_path = '../my_data/train.dat'
|
||||
test_path = '../my_data/test.dat'
|
||||
|
||||
def my_custom_loader(path):
|
||||
def my_custom_loader(path, **custom_kwargs):
|
||||
with open(path, 'rb') as fin:
|
||||
...
|
||||
return instances, labels
|
||||
|
||||
data = qp.data.Dataset.load(train_path, test_path, my_custom_loader)
|
||||
data = qp.data.Dataset.load(train_path, test_path, my_custom_loader, **custom_kwargs)
|
||||
```
|
||||
|
||||
### Data Processing
|
||||
|
||||
QuaPy implements a number of preprocessing functions in the package _qp.data.preprocessing_, including:
|
||||
QuaPy implements a number of preprocessing functions in the package `qp.data.preprocessing`, including:
|
||||
|
||||
* _text2tfidf_: tfidf vectorization
|
||||
* _reduce_columns_: reducing the number of columns based on term frequency
|
||||
* _standardize_: transforms the column values into z-scores (i.e., subtract the mean and normalizes by the standard deviation, so
|
||||
that the column values have zero mean and unit variance).
|
||||
* _index_: transforms textual tokens into lists of numeric ids
|
||||
|
||||
These functions are applied to `Dataset` objects, and offer the possibility to apply the transformation
|
||||
inline (thus modifying the original dataset), or to return a modified copy.
|
||||
|
|
@ -46,18 +46,18 @@ e.g.:
|
|||
|
||||
```python
|
||||
qp.environ['SAMPLE_SIZE'] = 100 # once for all
|
||||
true_prev = np.asarray([0.5, 0.3, 0.2]) # let's assume 3 classes
|
||||
estim_prev = np.asarray([0.1, 0.3, 0.6])
|
||||
true_prev = [0.5, 0.3, 0.2] # let's assume 3 classes
|
||||
estim_prev = [0.1, 0.3, 0.6]
|
||||
error = qp.error.mrae(true_prev, estim_prev)
|
||||
print(f'mrae({true_prev}, {estim_prev}) = {error:.3f}')
|
||||
```
|
||||
|
||||
will print:
|
||||
```
|
||||
mrae([0.500, 0.300, 0.200], [0.100, 0.300, 0.600]) = 0.914
|
||||
mrae([0.5, 0.3, 0.2], [0.1, 0.3, 0.6]) = 0.914
|
||||
```
|
||||
|
||||
Finally, it is possible to instantiate QuaPy's quantification
|
||||
It is also possible to instantiate QuaPy's quantification
|
||||
error functions from strings using, e.g.:
|
||||
|
||||
```python
|
||||
|
|
@ -85,7 +85,7 @@ print(f'MAE = {mae:.4f}')
|
|||
```
|
||||
|
||||
It is often desirable to evaluate our system using more than one
|
||||
single evaluatio measure. In this case, it is convenient to generate
|
||||
single evaluation measure. In this case, it is convenient to generate
|
||||
a _report_. A report in QuaPy is a dataframe accounting for all the
|
||||
true prevalence values with their corresponding prevalence values
|
||||
as estimated by the quantifier, along with the error each has given
|
||||
|
|
@ -104,7 +104,7 @@ report['estim-prev'] = report['estim-prev'].map(F.strprev)
|
|||
print(report)
|
||||
|
||||
print('Averaged values:')
|
||||
print(report.mean())
|
||||
print(report.mean(numeric_only=True))
|
||||
```
|
||||
|
||||
This will produce an output like:
|
||||
|
|
@ -141,11 +141,14 @@ true_prevs, estim_prevs = qp.evaluation.prediction(quantifier, protocol=prot)
|
|||
|
||||
All the evaluation functions implement specific optimizations for speeding-up
|
||||
the evaluation of aggregative quantifiers (i.e., of instances of _AggregativeQuantifier_).
|
||||
|
||||
The optimization comes down to generating classification predictions (either crisp or soft)
|
||||
only once for the entire test set, and then applying the sampling procedure to the
|
||||
predictions, instead of generating samples of instances and then computing the
|
||||
classification predictions every time. This is only possible when the protocol
|
||||
is an instance of _OnLabelledCollectionProtocol_. The optimization is only
|
||||
is an instance of _OnLabelledCollectionProtocol_.
|
||||
|
||||
The optimization is only
|
||||
carried out when the number of classification predictions thus generated would be
|
||||
smaller than the number of predictions required for the entire protocol; e.g.,
|
||||
if the original dataset contains 1M instances, but the protocol is such that it would
|
||||
|
|
@ -156,4 +159,4 @@ precompute all the predictions irrespectively of the number of instances and num
|
|||
Finally, this can be deactivated by setting _aggr_speedup=False_. Note that this optimization
|
||||
is not only applied for the final evaluation, but also for the internal evaluations carried
|
||||
out during _model selection_. Since these are typically many, the heuristic can help reduce the
|
||||
execution time a lot.
|
||||
execution time significatively.
|
||||
|
|
@ -1,7 +1,7 @@
|
|||
# Quantification Methods
|
||||
|
||||
Quantification methods can be categorized as belonging to
|
||||
`aggregative` and `non-aggregative` groups.
|
||||
`aggregative`, `non-aggregative`, and `meta-learning` groups.
|
||||
Most methods included in QuaPy at the moment are of type `aggregative`
|
||||
(though we plan to add many more methods in the near future), i.e.,
|
||||
are methods characterized by the fact that
|
||||
|
|
@ -12,21 +12,17 @@ Any quantifier in QuaPy shoud extend the class `BaseQuantifier`,
|
|||
and implement some abstract methods:
|
||||
```python
|
||||
@abstractmethod
|
||||
def fit(self, data: LabelledCollection): ...
|
||||
def fit(self, X, y): ...
|
||||
|
||||
@abstractmethod
|
||||
def quantify(self, instances): ...
|
||||
def predict(self, X): ...
|
||||
```
|
||||
The meaning of those functions should be familiar to those
|
||||
used to work with scikit-learn since the class structure of QuaPy
|
||||
is directly inspired by scikit-learn's _Estimators_. Functions
|
||||
`fit` and `quantify` are used to train the model and to provide
|
||||
class estimations (the reason why
|
||||
scikit-learn' structure has not been adopted _as is_ in QuaPy responds to
|
||||
the fact that scikit-learn's `predict` function is expected to return
|
||||
one output for each input element --e.g., a predicted label for each
|
||||
instance in a sample-- while in quantification the output for a sample
|
||||
is one single array of class prevalences).
|
||||
`fit` and `predict` (for which there is an alias `quantify`)
|
||||
are used to train the model and to provide
|
||||
class estimations.
|
||||
Quantifiers also extend from scikit-learn's `BaseEstimator`, in order
|
||||
to simplify the use of `set_params` and `get_params` used in
|
||||
[model selection](./model-selection).
|
||||
|
|
@ -40,21 +36,26 @@ The methods that any `aggregative` quantifier must implement are:
|
|||
|
||||
```python
|
||||
@abstractmethod
|
||||
def aggregation_fit(self, classif_predictions: LabelledCollection, data: LabelledCollection):
|
||||
def aggregation_fit(self, classif_predictions, labels):
|
||||
|
||||
@abstractmethod
|
||||
def aggregate(self, classif_predictions:np.ndarray): ...
|
||||
def aggregate(self, classif_predictions): ...
|
||||
```
|
||||
|
||||
These two functions replace the `fit` and `quantify` methods, since those
|
||||
come with default implementations. The `fit` function is provided and amounts to:
|
||||
The argument `classif_predictions` is whatever the method `classify` returns.
|
||||
QuaPy comes with default implementations that cover most common cases, but you can
|
||||
override `classify` in case your method requires further or different information to work.
|
||||
|
||||
These two functions replace the `fit` and `predict` methods, which
|
||||
come with default implementations. For instance, the `fit` function is
|
||||
provided and amounts to:
|
||||
|
||||
```python
|
||||
def fit(self, data: LabelledCollection, fit_classifier=True, val_split=None):
|
||||
self._check_init_parameters()
|
||||
classif_predictions = self.classifier_fit_predict(data, fit_classifier, predict_on=val_split)
|
||||
self.aggregation_fit(classif_predictions, data)
|
||||
return self
|
||||
def fit(self, X, y):
|
||||
self._check_init_parameters()
|
||||
classif_predictions, labels = self.classifier_fit_predict(X, y)
|
||||
self.aggregation_fit(classif_predictions, labels)
|
||||
return self
|
||||
```
|
||||
|
||||
Note that this function fits the classifier, and generates the predictions. This is assumed
|
||||
|
|
@ -72,11 +73,11 @@ overriden (if needed) and allows the method to quickly raise any exception based
|
|||
found in the `__init__` arguments, thus avoiding to break after training the classifier and generating
|
||||
predictions.
|
||||
|
||||
Similarly, the function `quantify` is provided, and amounts to:
|
||||
Similarly, the function `predict` (alias `quantify`) is provided, and amounts to:
|
||||
|
||||
```python
|
||||
def quantify(self, instances):
|
||||
classif_predictions = self.classify(instances)
|
||||
def predict(self, X):
|
||||
classif_predictions = self.classify(X)
|
||||
return self.aggregate(classif_predictions)
|
||||
```
|
||||
|
||||
|
|
@ -84,12 +85,14 @@ in which only the function `aggregate` is required to be overriden in most cases
|
|||
|
||||
Aggregative quantifiers are expected to maintain a classifier (which is
|
||||
accessed through the `@property` `classifier`). This classifier is
|
||||
given as input to the quantifier, and can be already fit
|
||||
on external data (in which case, the `fit_learner` argument should
|
||||
be set to False), or be fit by the quantifier's fit (default).
|
||||
given as input to the quantifier, and will be trained by the quantifier's fit (default).
|
||||
Alternatively, the classifier can be already fit on external data; in this case, the `fit_learner`
|
||||
argument in the `__init__` should be set to False (see [4.using_pretrained_classifier.py](https://github.com/HLT-ISTI/QuaPy/blob/master/examples/4.using_pretrained_classifier.py)
|
||||
for a full code example).
|
||||
|
||||
The above patterns (in training: fit the classifier, then fit the aggregation;
|
||||
in test: classify, then aggregate) allows QuaPy to optimize many internal procedures.
|
||||
The above patterns (in training: (i) fit the classifier, then (ii) fit the aggregation;
|
||||
in test: (i) classify, then (ii) aggregate) allows QuaPy to optimize many internal procedures,
|
||||
on the grounds that steps (i) are slower than steps (ii).
|
||||
In particular, the model selection routing takes advantage of this two-step process
|
||||
and generates classifiers only for the valid combinations of hyperparameters of the
|
||||
classifier, and then _clones_ these classifiers and explores the combinations
|
||||
|
|
@ -124,6 +127,7 @@ import quapy.functional as F
|
|||
from sklearn.svm import LinearSVC
|
||||
|
||||
training, test = qp.datasets.fetch_twitter('hcr', pickle=True).train_test
|
||||
Xtr, ytr = training.Xy
|
||||
|
||||
# instantiate a classifier learner, in this case a SVM
|
||||
svm = LinearSVC()
|
||||
|
|
@ -131,7 +135,7 @@ svm = LinearSVC()
|
|||
# instantiate a Classify & Count with the SVM
|
||||
# (an alias is available in qp.method.aggregative.ClassifyAndCount)
|
||||
model = qp.method.aggregative.CC(svm)
|
||||
model.fit(training)
|
||||
model.fit(Xtr, ytr)
|
||||
estim_prevalence = model.predict(test.instances)
|
||||
```
|
||||
|
||||
|
|
@ -153,26 +157,14 @@ predictions. This parameters can also be set with an integer,
|
|||
indicating that the parameters should be estimated by means of
|
||||
_k_-fold cross-validation, for which the integer indicates the
|
||||
number _k_ of folds (the default value is 5). Finally, `val_split` can be set to a
|
||||
specific held-out validation set (i.e., an instance of `LabelledCollection`).
|
||||
|
||||
The specification of `val_split` can be
|
||||
postponed to the invokation of the fit method (if `val_split` was also
|
||||
set in the constructor, the one specified at fit time would prevail),
|
||||
e.g.:
|
||||
|
||||
```python
|
||||
model = qp.method.aggregative.ACC(svm)
|
||||
# perform 5-fold cross validation for estimating ACC's parameters
|
||||
# (overrides the default val_split=0.4 in the constructor)
|
||||
model.fit(training, val_split=5)
|
||||
```
|
||||
specific held-out validation set (i.e., an tuple `(X,y)`).
|
||||
|
||||
The following code illustrates the case in which PCC is used:
|
||||
|
||||
```python
|
||||
model = qp.method.aggregative.PCC(svm)
|
||||
model.fit(training)
|
||||
estim_prevalence = model.predict(test.instances)
|
||||
model.fit(Xtr, ytr)
|
||||
estim_prevalence = model.predict(Xte)
|
||||
print('classifier:', model.classifier)
|
||||
```
|
||||
In this case, QuaPy will print:
|
||||
|
|
@ -185,11 +177,11 @@ is not a probabilistic classifier (i.e., it does not implement the
|
|||
`predict_proba` method) and so, the classifier will be converted to
|
||||
a probabilistic one through [calibration](https://scikit-learn.org/stable/modules/calibration.html).
|
||||
As a result, the classifier that is printed in the second line points
|
||||
to a `CalibratedClassifier` instance. Note that calibration can only
|
||||
be applied to hard classifiers when `fit_learner=True`; an exception
|
||||
to a `CalibratedClassifierCV` instance. Note that calibration can only
|
||||
be applied to hard classifiers if `fit_learner=True`; an exception
|
||||
will be raised otherwise.
|
||||
|
||||
Lastly, everything we said aboud ACC and PCC
|
||||
Lastly, everything we said about ACC and PCC
|
||||
applies to PACC as well.
|
||||
|
||||
_New in v0.1.9_: quantifiers ACC and PACC now have three additional arguments: `method`, `solver` and `norm`:
|
||||
|
|
@ -221,7 +213,7 @@ Options are:
|
|||
* `"condsoftmax"` applies softmax normalization only if the prevalence vector lies outside of the probability simplex.
|
||||
|
||||
|
||||
#### BayesianCC (_New in v0.1.9_!)
|
||||
#### BayesianCC
|
||||
|
||||
The `BayesianCC` is a variant of ACC introduced in
|
||||
[Ziegler, A. and Czyż, P. "Bayesian quantification with black-box estimators", arXiv (2023)](https://arxiv.org/abs/2302.09159),
|
||||
|
|
@ -259,29 +251,35 @@ An example of use can be found below:
|
|||
import quapy as qp
|
||||
from sklearn.linear_model import LogisticRegression
|
||||
|
||||
dataset = qp.datasets.fetch_twitter('hcr', pickle=True)
|
||||
train, test = qp.datasets.fetch_twitter('hcr', pickle=True).train_test
|
||||
|
||||
model = qp.method.aggregative.EMQ(LogisticRegression())
|
||||
model.fit(dataset.training)
|
||||
estim_prevalence = model.predict(dataset.test.instances)
|
||||
model.fit(*train.Xy)
|
||||
estim_prevalence = model.predict(test.X)
|
||||
```
|
||||
|
||||
_New in v0.1.7_: EMQ now accepts two new parameters in the construction method, namely
|
||||
`exact_train_prev` which allows to use the true training prevalence as the departing
|
||||
prevalence estimation (default behaviour), or instead an approximation of it as
|
||||
EMQ accepts additional parameters in the construction method:
|
||||
* `exact_train_prev`: set to True for using the true training prevalence as the departing
|
||||
prevalence estimation (default behaviour), or to False for using an approximation of it as
|
||||
suggested by [Alexandari et al. (2020)](http://proceedings.mlr.press/v119/alexandari20a.html)
|
||||
(by setting `exact_train_prev=False`).
|
||||
The other parameter is `recalib` which allows to indicate a calibration method, among those
|
||||
* `calib`: allows to indicate a calibration method, among those
|
||||
proposed by [Alexandari et al. (2020)](http://proceedings.mlr.press/v119/alexandari20a.html),
|
||||
including the Bias-Corrected Temperature Scaling, Vector Scaling, etc.
|
||||
See the API documentation for further details.
|
||||
including the Bias-Corrected Temperature Scaling
|
||||
(`bcts`), Vector Scaling (`bcts`), No-Bias Temperature Scaling (`nbvs`),
|
||||
or Temperature Scaling (`ts`); default is `None` (no calibration).
|
||||
* `on_calib_error`: indicates the policy to follow in case the calibrator fails at runtime.
|
||||
Options include `raise` (default), in which case a RuntimeException is raised; and `backup`, in which
|
||||
case the calibrator is silently skipped.
|
||||
|
||||
You can use the class method `EMQ_BCTS` to effortlessly instantiate EMQ with the best performing
|
||||
heuristics found by [Alexandari et al. (2020)](http://proceedings.mlr.press/v119/alexandari20a.html). See the API documentation for further details.
|
||||
|
||||
|
||||
### Hellinger Distance y (HDy)
|
||||
|
||||
Implementation of the method based on the Hellinger Distance y (HDy) proposed by
|
||||
[González-Castro, V., Alaiz-Rodrı́guez, R., and Alegre, E. (2013). Class distribution
|
||||
estimation based on the Hellinger distance. Information Sciences, 218:146–164.](https://www.sciencedirect.com/science/article/pii/S0020025512004069)
|
||||
[González-Castro, V., Alaiz-Rodríguez, R., and Alegre, E. (2013). Class distribution
|
||||
estimation based on the Hellinger distance. Information Sciences, 218:146-164.](https://www.sciencedirect.com/science/article/pii/S0020025512004069)
|
||||
|
||||
It is implemented in `qp.method.aggregative.HDy` (also accessible
|
||||
through the allias `qp.method.aggregative.HellingerDistanceY`).
|
||||
|
|
@ -289,11 +287,10 @@ This method works with a probabilistic classifier (hard classifiers
|
|||
can be used as well and will be calibrated) and requires a validation
|
||||
set to estimate parameter for the mixture model. Just like
|
||||
ACC and PACC, this quantifier receives a `val_split` argument
|
||||
in the constructor (or in the fit method, in which case the previous
|
||||
value is overridden) that can either be a float indicating the proportion
|
||||
in the constructor that can either be a float indicating the proportion
|
||||
of training data to be taken as the validation set (in a random
|
||||
stratified split), or a validation set (i.e., an instance of
|
||||
`LabelledCollection`) itself.
|
||||
stratified split), or the validation set itself (i.e., an tuple
|
||||
`(X,y)`).
|
||||
|
||||
HDy was proposed as a binary classifier and the implementation
|
||||
provided in QuaPy accepts only binary datasets.
|
||||
|
|
@ -309,11 +306,11 @@ dataset = qp.datasets.fetch_reviews('hp', pickle=True)
|
|||
qp.data.preprocessing.text2tfidf(dataset, min_df=5, inplace=True)
|
||||
|
||||
model = qp.method.aggregative.HDy(LogisticRegression())
|
||||
model.fit(dataset.training)
|
||||
estim_prevalence = model.predict(dataset.test.instances)
|
||||
model.fit(*dataset.training.Xy)
|
||||
estim_prevalence = model.predict(dataset.test.X)
|
||||
```
|
||||
|
||||
_New in v0.1.7:_ QuaPy now provides an implementation of the generalized
|
||||
QuaPy also provides an implementation of the generalized
|
||||
"Distribution Matching" approaches for multiclass, inspired by the framework
|
||||
of [Firat (2016)](https://arxiv.org/abs/1606.00868). One can instantiate
|
||||
a variant of HDy for multiclass quantification as follows:
|
||||
|
|
@ -322,17 +319,22 @@ a variant of HDy for multiclass quantification as follows:
|
|||
mutliclassHDy = qp.method.aggregative.DMy(classifier=LogisticRegression(), divergence='HD', cdf=False)
|
||||
```
|
||||
|
||||
_New in v0.1.7:_ QuaPy now provides an implementation of the "DyS"
|
||||
QuaPy also provides an implementation of the "DyS"
|
||||
framework proposed by [Maletzke et al (2020)](https://ojs.aaai.org/index.php/AAAI/article/view/4376)
|
||||
and the "SMM" method proposed by [Hassan et al (2019)](https://ieeexplore.ieee.org/document/9260028)
|
||||
(thanks to _Pablo González_ for the contributions!)
|
||||
|
||||
### Threshold Optimization methods
|
||||
|
||||
_New in v0.1.7:_ QuaPy now implements Forman's threshold optimization methods;
|
||||
QuaPy implements Forman's threshold optimization methods;
|
||||
see, e.g., [(Forman 2006)](https://dl.acm.org/doi/abs/10.1145/1150402.1150423)
|
||||
and [(Forman 2008)](https://link.springer.com/article/10.1007/s10618-008-0097-y).
|
||||
These include: T50, MAX, X, Median Sweep (MS), and its variant MS2.
|
||||
These include: `T50`, `MAX`, `X`, Median Sweep (`MS`), and its variant `MS2`.
|
||||
|
||||
These methods are binary-only and implement different heuristics for
|
||||
improving the stability of the denominator of the ACC adjustment (`tpr-fpr`).
|
||||
The methods are called "threshold" since said heuristics have to do
|
||||
with different choices of the underlying classifier's threshold.
|
||||
|
||||
### Explicit Loss Minimization
|
||||
|
||||
|
|
@ -415,16 +417,18 @@ model.fit(dataset.training)
|
|||
estim_prevalence = model.predict(dataset.test.instances)
|
||||
```
|
||||
|
||||
Check the examples on [explicit_loss_minimization](https://github.com/HLT-ISTI/QuaPy/blob/devel/examples/5.explicit_loss_minimization.py)
|
||||
Check the examples on [explicit loss minimization](https://github.com/HLT-ISTI/QuaPy/blob/devel/examples/17.explicit_loss_minimization.py)
|
||||
and on [one versus all quantification](https://github.com/HLT-ISTI/QuaPy/blob/devel/examples/10.one_vs_all.py) for more details.
|
||||
**Note** that the _one versus all_ approach is considered inappropriate under prior probability shift, though.
|
||||
|
||||
### Kernel Density Estimation methods (KDEy)
|
||||
|
||||
_New in v0.1.8_: QuaPy now provides implementations for the three variants
|
||||
QuaPy provides implementations for the three variants
|
||||
of KDE-based methods proposed in
|
||||
_[Moreo, A., González, P. and del Coz, J.J., 2023.
|
||||
_[Moreo, A., González, P. and del Coz, J.J..
|
||||
Kernel Density Estimation for Multiclass Quantification.
|
||||
arXiv preprint arXiv:2401.00490.](https://arxiv.org/abs/2401.00490)_.
|
||||
Machine Learning. Vol 114 (92), 2025](https://link.springer.com/article/10.1007/s10994-024-06726-5)_
|
||||
(a [preprint](https://arxiv.org/abs/2401.00490) is available online).
|
||||
The variants differ in the divergence metric to be minimized:
|
||||
|
||||
- KDEy-HD: minimizes the (squared) Hellinger Distance and solves the problem via a Monte Carlo approach
|
||||
|
|
@ -435,30 +439,42 @@ These methods are specifically devised for multiclass problems (although they ca
|
|||
binary problems too).
|
||||
|
||||
All KDE-based methods depend on the hyperparameter `bandwidth` of the kernel. Typical values
|
||||
that can be explored in model selection range in [0.01, 0.25]. The methods' performance
|
||||
vary smoothing with smooth variations of this hyperparameter.
|
||||
that can be explored in model selection range in [0.01, 0.25]. Previous experiments reveal the methods' performance
|
||||
varies smoothly at small variations of this hyperparameter.
|
||||
|
||||
|
||||
## Composable Methods
|
||||
|
||||
The [](quapy.method.composable) module allows the composition of quantification methods from loss functions and feature transformations. Any composed method solves a linear system of equations by minimizing the loss after transforming the data. Methods of this kind include ACC, PACC, HDx, HDy, and many other well-known methods, as well as an unlimited number of re-combinations of their building blocks.
|
||||
The `quapy.method.composable` module integrates [qunfold](https://github.com/mirkobunse/qunfold) allows the composition
|
||||
of quantification methods from loss functions and feature transformations (thanks to Mirko Bunse for the integration!).
|
||||
|
||||
Any composed method solves a linear system of equations by minimizing the loss after transforming the data. Methods of this kind include ACC, PACC, HDx, HDy, and many other well-known methods, as well as an unlimited number of re-combinations of their building blocks.
|
||||
|
||||
### Installation
|
||||
|
||||
```sh
|
||||
pip install --upgrade pip setuptools wheel
|
||||
pip install "jax[cpu]"
|
||||
pip install "qunfold @ git+https://github.com/mirkobunse/qunfold@v0.1.4"
|
||||
pip install "qunfold @ git+https://github.com/mirkobunse/qunfold@v0.1.5"
|
||||
```
|
||||
|
||||
**Note:** since version 0.2.0, QuaPy is only compatible with qunfold >=0.1.5.
|
||||
|
||||
### Basics
|
||||
|
||||
The composition of a method is implemented through the [](quapy.method.composable.ComposableQuantifier) class. Its documentation also features an example to get you started in composing your own methods.
|
||||
|
||||
```python
|
||||
from quapy.method.composable import (
|
||||
ComposableQuantifier,
|
||||
TikhonovRegularized,
|
||||
LeastSquaresLoss,
|
||||
ClassRepresentation,
|
||||
)
|
||||
|
||||
ComposableQuantifier( # ordinal ACC, as proposed by Bunse et al., 2022
|
||||
TikhonovRegularized(LeastSquaresLoss(), 0.01),
|
||||
ClassTransformer(RandomForestClassifier(oob_score=True))
|
||||
TikhonovRegularized(LeastSquaresLoss(), 0.01),
|
||||
ClassRepresentation(RandomForestClassifier(oob_score=True))
|
||||
)
|
||||
```
|
||||
|
||||
|
|
@ -485,16 +501,16 @@ You can use the [](quapy.method.composable.CombinedLoss) to create arbitrary, we
|
|||
|
||||
### Feature transformations
|
||||
|
||||
- [](quapy.method.composable.ClassTransformer)
|
||||
- [](quapy.method.composable.DistanceTransformer)
|
||||
- [](quapy.method.composable.HistogramTransformer)
|
||||
- [](quapy.method.composable.EnergyKernelTransformer)
|
||||
- [](quapy.method.composable.GaussianKernelTransformer)
|
||||
- [](quapy.method.composable.LaplacianKernelTransformer)
|
||||
- [](quapy.method.composable.GaussianRFFKernelTransformer)
|
||||
- [](quapy.method.composable.ClassRepresentation)
|
||||
- [](quapy.method.composable.DistanceRepresentation)
|
||||
- [](quapy.method.composable.HistogramRepresentation)
|
||||
- [](quapy.method.composable.EnergyKernelRepresentation)
|
||||
- [](quapy.method.composable.GaussianKernelRepresentation)
|
||||
- [](quapy.method.composable.LaplacianKernelRepresentation)
|
||||
- [](quapy.method.composable.GaussianRFFKernelRepresentation)
|
||||
|
||||
```{hint}
|
||||
The [](quapy.method.composable.ClassTransformer) requires the classifier to have a property `oob_score==True` and to produce a property `oob_decision_function` during fitting. In [scikit-learn](https://scikit-learn.org/), this requirement is fulfilled by any bagging classifier, such as random forests. Any other classifier needs to be cross-validated through the [](quapy.method.composable.CVClassifier).
|
||||
The [](quapy.method.composable.ClassRepresentation) requires the classifier to have a property `oob_score==True` and to produce a property `oob_decision_function` during fitting. In [scikit-learn](https://scikit-learn.org/), this requirement is fulfilled by any bagging classifier, such as random forests. Any other classifier needs to be cross-validated through the [](quapy.method.composable.CVClassifier).
|
||||
```
|
||||
|
||||
|
||||
|
|
@ -529,10 +545,11 @@ from quapy.method.meta import Ensemble
|
|||
from sklearn.linear_model import LogisticRegression
|
||||
|
||||
dataset = qp.datasets.fetch_UCIBinaryDataset('haberman')
|
||||
train, test = dataset.train_test
|
||||
|
||||
model = Ensemble(quantifier=ACC(LogisticRegression()), size=30, policy='ave', n_jobs=-1)
|
||||
model.fit(dataset.training)
|
||||
estim_prevalence = model.predict(dataset.test.instances)
|
||||
model.fit(*train.Xy)
|
||||
estim_prevalence = model.predict(test.X)
|
||||
```
|
||||
|
||||
Other aggregation policies implemented in QuaPy include:
|
||||
|
|
@ -579,7 +596,29 @@ learner = NeuralClassifierTrainer(cnn, device='cuda')
|
|||
|
||||
# train QuaNet
|
||||
model = QuaNet(learner, device='cuda')
|
||||
model.fit(dataset.training)
|
||||
estim_prevalence = model.predict(dataset.test.instances)
|
||||
model.fit(*dataset.training.Xy)
|
||||
estim_prevalence = model.predict(dataset.test.X)
|
||||
```
|
||||
|
||||
## Confidence Regions for Class Prevalence Estimation
|
||||
|
||||
_(New in v0.2.0!)_ Some quantification methods go beyond providing a single point estimate of class prevalence values and also produce confidence regions, which characterize the uncertainty around the point estimate. In QuaPy, two such methods are currently implemented:
|
||||
|
||||
* Aggregative Bootstrap: The Aggregative Bootstrap method extends any aggregative quantifier by generating confidence regions for class prevalence estimates through bootstrapping. Key features of this method include:
|
||||
|
||||
* Optimized Computation: The bootstrap is applied to pre-classified instances, significantly speeding up training and inference.
|
||||
During training, bootstrap repetitions are performed only after training the classifier once. These repetitions are used to train multiple aggregation functions.
|
||||
During inference, bootstrap is applied over pre-classified test instances.
|
||||
* General Applicability: Aggregative Bootstrap can be applied to any aggregative quantifier.
|
||||
For further information, check the [example](https://github.com/HLT-ISTI/QuaPy/tree/master/examples/16.confidence_regions.py) provided.
|
||||
|
||||
* BayesianCC: is a Bayesian variant of the Adjusted Classify & Count (ACC) quantifier; see more details in the [example](https://github.com/HLT-ISTI/QuaPy/tree/master/examples/14.bayesian_quantification.py) provided.
|
||||
|
||||
Confidence regions are constructed around a point estimate, which is typically computed as the mean value of a set of samples.
|
||||
The confidence region can be instantiated in three ways:
|
||||
* Confidence intervals: are standard confidence intervals generated for each class independently (_method="intervals"_).
|
||||
* Confidence ellipse in the simplex: an ellipse constructed around the mean point; the ellipse lies on the simplex and takes
|
||||
into account possible inter-class dependencies in the data (_method="ellipse"_).
|
||||
* Confidence ellipse in the Centered-Log Ratio (CLR) space: the underlying assumption of the ellipse is that the components are
|
||||
normally distributed. However, we know elements from the simplex have an inner structure. A better approach is to first
|
||||
transform the components into an unconstrained space (the CLR), and then construct the ellipse in such space (_method="ellipse-clr"_).
|
||||
|
|
@ -87,7 +87,7 @@ model = qp.model_selection.GridSearchQ(
|
|||
error='mae', # the error to optimize is the MAE (a quantification-oriented loss)
|
||||
refit=True, # retrain on the whole labelled set once done
|
||||
verbose=True # show information as the process goes on
|
||||
).fit(training)
|
||||
).fit(*training.Xy)
|
||||
|
||||
print(f'model selection ended: best hyper-parameters={model.best_params_}')
|
||||
model = model.best_model_
|
||||
|
|
@ -133,7 +133,7 @@ learner = GridSearchCV(
|
|||
LogisticRegression(),
|
||||
param_grid={'C': np.logspace(-4, 5, 10), 'class_weight': ['balanced', None]},
|
||||
cv=5)
|
||||
model = DistributionMatching(learner).fit(dataset.train)
|
||||
model = DistributionMatching(learner).fit(*dataset.train.Xy)
|
||||
```
|
||||
|
||||
However, this is conceptually flawed, since the model should be
|
||||
|
|
|
|||
|
|
@ -2,6 +2,9 @@
|
|||
|
||||
The module _qp.plot_ implements some basic plotting functions
|
||||
that can help analyse the performance of a quantification method.
|
||||
See the provided
|
||||
[code example](https://github.com/HLT-ISTI/QuaPy/blob/master/examples/13.plotting.py)
|
||||
for a full example.
|
||||
|
||||
All plotting functions receive as inputs the outcomes of
|
||||
some experiments and include, for each experiment,
|
||||
|
|
@ -77,7 +80,7 @@ def gen_data():
|
|||
method_names, true_prevs, estim_prevs, tr_prevs = [], [], [], []
|
||||
|
||||
for method_name, model in models():
|
||||
model.fit(train)
|
||||
model.fit(*train.Xy)
|
||||
true_prev, estim_prev = qp.evaluation.prediction(model, APP(test, repeats=100, random_state=0))
|
||||
|
||||
method_names.append(method_name)
|
||||
|
|
@ -171,7 +174,7 @@ def gen_data():
|
|||
training_size = 5000
|
||||
# since the problem is binary, it suffices to specify the negative prevalence, since the positive is constrained
|
||||
train_sample = train.sampling(training_size, 1-training_prevalence)
|
||||
model.fit(train_sample)
|
||||
model.fit(*train_sample.Xy)
|
||||
true_prev, estim_prev = qp.evaluation.prediction(model, APP(test, repeats=100, random_state=0))
|
||||
method_name = 'CC$_{'+f'{int(100*training_prevalence)}' + '\%}$'
|
||||
method_data.append((method_name, true_prev, estim_prev, train_sample.prevalence()))
|
||||
|
|
|
|||
|
|
@ -1,7 +1,5 @@
|
|||
# Protocols
|
||||
|
||||
_New in v0.1.7!_
|
||||
|
||||
Quantification methods are expected to behave robustly in the presence of
|
||||
shift. For this reason, quantification methods need to be confronted with
|
||||
samples exhibiting widely varying amounts of shift.
|
||||
|
|
@ -106,15 +104,16 @@ train, test = qp.datasets.fetch_reviews('imdb', tfidf=True, min_df=5).train_test
|
|||
|
||||
# model selection
|
||||
train, val = train.split_stratified(train_prop=0.75)
|
||||
Xtr, ytr = train.Xy
|
||||
quantifier = qp.model_selection.GridSearchQ(
|
||||
quantifier,
|
||||
param_grid={'classifier__C': np.logspace(-2, 2, 5)},
|
||||
protocol=APP(val) # <- this is the protocol we use for generating validation samples
|
||||
).fit(train)
|
||||
).fit(Xtr, ytr)
|
||||
|
||||
# default values are n_prevalences=21, repeats=10, random_state=0; this is equialent to:
|
||||
# val_app = APP(val, n_prevalences=21, repeats=10, random_state=0)
|
||||
# quantifier = GridSearchQ(quantifier, param_grid, protocol=val_app).fit(train)
|
||||
# quantifier = GridSearchQ(quantifier, param_grid, protocol=val_app).fit(Xtr, ytr)
|
||||
|
||||
# evaluation with APP
|
||||
mae = qp.evaluation.evaluate(quantifier, protocol=APP(test), error_metric='mae')
|
||||
|
|
|
|||
|
|
@ -0,0 +1,7 @@
|
|||
quapy
|
||||
=====
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 4
|
||||
|
||||
quapy
|
||||
|
|
@ -9,6 +9,11 @@ import numpy as np
|
|||
"""
|
||||
In this example, we will create a quantifier for tweet sentiment analysis considering three classes: negative, neutral,
|
||||
and positive. We will use a one-vs-all approach using a binary quantifier for demonstration purposes.
|
||||
|
||||
Caveat: the one-vs-all approach is deemed inadequate under prior probability shift conditions. The reasons
|
||||
are discussed in:
|
||||
Donyavi, Z., Serapio, A., & Batista, G. (2023). MC-SQ: A highly accurate ensemble for multi-class quantifi-
|
||||
cation. In: Proceedings of the 2023 SIAM International Conference on Data Mining (SDM), SIAM, pp. 622–630
|
||||
"""
|
||||
|
||||
qp.environ['SAMPLE_SIZE'] = 100
|
||||
|
|
@ -40,11 +45,11 @@ param_grid = {
|
|||
}
|
||||
print('starting model selection')
|
||||
model_selection = GridSearchQ(quantifier, param_grid, protocol=UPP(val), verbose=True, refit=False)
|
||||
quantifier = model_selection.fit(train_modsel).best_model()
|
||||
quantifier = model_selection.fit(*train_modsel.Xy).best_model()
|
||||
|
||||
print('training on the whole training set')
|
||||
train, test = qp.datasets.fetch_twitter('hcr', for_model_selection=False, pickle=True).train_test
|
||||
quantifier.fit(train)
|
||||
quantifier.fit(*train.Xy)
|
||||
|
||||
# evaluation
|
||||
mae = qp.evaluation.evaluate(quantifier, protocol=UPP(test), error_metric='mae')
|
||||
|
|
|
|||
|
|
@ -23,8 +23,9 @@ qp.environ['SAMPLE_SIZE']=100
|
|||
|
||||
df = pd.DataFrame(columns=['method', 'dataset', 'MAE', 'MRAE', 'tr-time', 'te-time'])
|
||||
|
||||
datasets = qp.datasets.UCI_BINARY_DATASETS
|
||||
|
||||
for dataset_name in tqdm(qp.datasets.UCI_BINARY_DATASETS, total=len(qp.datasets.UCI_BINARY_DATASETS)):
|
||||
for dataset_name in tqdm(datasets, total=len(datasets), desc='datasets processed'):
|
||||
if dataset_name in ['acute.a', 'acute.b', 'balance.2', 'iris.1']:
|
||||
# these datasets tend to produce either too good or too bad results...
|
||||
continue
|
||||
|
|
@ -32,23 +33,25 @@ for dataset_name in tqdm(qp.datasets.UCI_BINARY_DATASETS, total=len(qp.datasets.
|
|||
collection = qp.datasets.fetch_UCIBinaryLabelledCollection(dataset_name, verbose=False)
|
||||
train, test = collection.split_stratified()
|
||||
|
||||
Xtr, ytr = train.Xy
|
||||
|
||||
# HDy............................................
|
||||
tinit = time()
|
||||
hdy = HDy(LogisticRegression()).fit(train)
|
||||
hdy = HDy(LogisticRegression()).fit(Xtr, ytr)
|
||||
t_hdy_train = time()-tinit
|
||||
|
||||
tinit = time()
|
||||
hdy_report = qp.evaluation.evaluation_report(hdy, APP(test), error_metrics=['mae', 'mrae']).mean()
|
||||
hdy_report = qp.evaluation.evaluation_report(hdy, APP(test), error_metrics=['mae', 'mrae']).mean(numeric_only=True)
|
||||
t_hdy_test = time() - tinit
|
||||
df.loc[len(df)] = ['HDy', dataset_name, hdy_report['mae'], hdy_report['mrae'], t_hdy_train, t_hdy_test]
|
||||
|
||||
# HDx............................................
|
||||
tinit = time()
|
||||
hdx = DMx.HDx(n_jobs=-1).fit(train)
|
||||
hdx = DMx.HDx(n_jobs=-1).fit(Xtr, ytr)
|
||||
t_hdx_train = time() - tinit
|
||||
|
||||
tinit = time()
|
||||
hdx_report = qp.evaluation.evaluation_report(hdx, APP(test), error_metrics=['mae', 'mrae']).mean()
|
||||
hdx_report = qp.evaluation.evaluation_report(hdx, APP(test), error_metrics=['mae', 'mrae']).mean(numeric_only=True)
|
||||
t_hdx_test = time() - tinit
|
||||
df.loc[len(df)] = ['HDx', dataset_name, hdx_report['mae'], hdx_report['mrae'], t_hdx_train, t_hdx_test]
|
||||
|
||||
|
|
|
|||
|
|
@ -3,14 +3,13 @@ from sklearn.linear_model import LogisticRegression
|
|||
|
||||
import quapy as qp
|
||||
from quapy.method.aggregative import PACC
|
||||
from quapy.data import LabelledCollection
|
||||
from quapy.protocol import AbstractStochasticSeededProtocol
|
||||
import quapy.functional as F
|
||||
|
||||
"""
|
||||
In this example, we create a custom protocol.
|
||||
The protocol generates samples of a Gaussian mixture model with random mixture parameter (the sample prevalence).
|
||||
Datapoints are univariate and we consider 2 classes only.
|
||||
The protocol generates synthetic samples of a Gaussian mixture model with random mixture parameter
|
||||
(the sample prevalence). Datapoints are univariate and we consider 2 classes only for simplicity.
|
||||
"""
|
||||
class GaussianMixProtocol(AbstractStochasticSeededProtocol):
|
||||
# We need to extend AbstractStochasticSeededProtocol if we want the samples to be replicable
|
||||
|
|
@ -81,10 +80,9 @@ with qp.util.temp_seed(0):
|
|||
Xpos = np.random.normal(loc=mu_2, scale=std_2, size=100)
|
||||
X = np.concatenate([Xneg, Xpos]).reshape(-1,1)
|
||||
y = [0]*100 + [1]*100
|
||||
training = LabelledCollection(X, y)
|
||||
|
||||
pacc = PACC(LogisticRegression())
|
||||
pacc.fit(training)
|
||||
pacc.fit(X, y)
|
||||
|
||||
|
||||
mae = qp.evaluation.evaluate(pacc, protocol=gm, error_metric='mae', verbose=True)
|
||||
|
|
|
|||
|
|
@ -122,7 +122,7 @@ def get_random_forest() -> RandomForestClassifier:
|
|||
def _get_estimate(estimator_class, training: LabelledCollection, test: np.ndarray) -> None:
|
||||
"""Auxiliary method for running ACC and PACC."""
|
||||
estimator = estimator_class(get_random_forest())
|
||||
estimator.fit(training)
|
||||
estimator.fit(*training.Xy)
|
||||
return estimator.predict(test)
|
||||
|
||||
|
||||
|
|
@ -130,7 +130,7 @@ def train_and_plot_bayesian_quantification(ax: plt.Axes, training: LabelledColle
|
|||
"""Fits Bayesian quantification and plots posterior mean as well as individual samples"""
|
||||
print('training model Bayesian CC...', end='')
|
||||
quantifier = BayesianCC(classifier=get_random_forest())
|
||||
quantifier.fit(training)
|
||||
quantifier.fit(*training.Xy)
|
||||
|
||||
# Obtain mean prediction
|
||||
mean_prediction = quantifier.predict(test.X)
|
||||
|
|
|
|||
|
|
@ -1,6 +1,6 @@
|
|||
"""
|
||||
This example illustrates the composition of quantification methods from
|
||||
arbitrary loss functions and feature transformations. It will extend the basic
|
||||
arbitrary loss functions and feature representations. It will extend the basic
|
||||
example on the usage of quapy with this composition.
|
||||
|
||||
This example requires the installation of qunfold, the back-end of QuaPy's
|
||||
|
|
@ -8,7 +8,7 @@ composition module:
|
|||
|
||||
pip install --upgrade pip setuptools wheel
|
||||
pip install "jax[cpu]"
|
||||
pip install "qunfold @ git+https://github.com/mirkobunse/qunfold@v0.1.4"
|
||||
pip install "qunfold @ git+https://github.com/mirkobunse/qunfold@v0.1.5"
|
||||
"""
|
||||
|
||||
import numpy as np
|
||||
|
|
@ -22,22 +22,23 @@ data = qp.data.preprocessing.text2tfidf(
|
|||
min_df = 5,
|
||||
)
|
||||
training, testing = data.train_test
|
||||
Xtr, ytr = training.Xy
|
||||
|
||||
# We start by recovering PACC from its building blocks, a LeastSquaresLoss and
|
||||
# a probabilistic ClassTransformer. A 5-fold cross-validation is implemented
|
||||
# a probabilistic ClassRepresentation. A 5-fold cross-validation is implemented
|
||||
# through a CVClassifier.
|
||||
|
||||
from quapy.method.composable import (
|
||||
ComposableQuantifier,
|
||||
LeastSquaresLoss,
|
||||
ClassTransformer,
|
||||
ClassRepresentation,
|
||||
CVClassifier,
|
||||
)
|
||||
from sklearn.linear_model import LogisticRegression
|
||||
|
||||
pacc = ComposableQuantifier(
|
||||
LeastSquaresLoss(),
|
||||
ClassTransformer(
|
||||
ClassRepresentation(
|
||||
CVClassifier(LogisticRegression(random_state=0), 5),
|
||||
is_probabilistic = True
|
||||
),
|
||||
|
|
@ -46,7 +47,7 @@ pacc = ComposableQuantifier(
|
|||
# Let's evaluate this quantifier.
|
||||
|
||||
print(f"Evaluating PACC: {pacc}")
|
||||
pacc.fit(training)
|
||||
pacc.fit(Xtr, ytr)
|
||||
app = qp.protocol.APP(testing, sample_size=100, n_prevalences=21, repeats=1)
|
||||
absolute_errors = qp.evaluation.evaluate(
|
||||
model = pacc,
|
||||
|
|
@ -63,14 +64,14 @@ from quapy.method.composable import HellingerSurrogateLoss
|
|||
|
||||
model = ComposableQuantifier(
|
||||
HellingerSurrogateLoss(), # the loss is different from before
|
||||
ClassTransformer( # we use the same transformer
|
||||
ClassRepresentation( # we use the same representation
|
||||
CVClassifier(LogisticRegression(random_state=0), 5),
|
||||
is_probabilistic = True
|
||||
),
|
||||
)
|
||||
|
||||
print(f"Evaluating {model}")
|
||||
model.fit(training)
|
||||
model.fit(Xtr, ytr)
|
||||
absolute_errors = qp.evaluation.evaluate(
|
||||
model = model,
|
||||
protocol = app, # use the same protocol for evaluation
|
||||
|
|
@ -79,7 +80,7 @@ absolute_errors = qp.evaluation.evaluate(
|
|||
print(f"MAE = {np.mean(absolute_errors):.4f}+-{np.std(absolute_errors):.4f}")
|
||||
|
||||
# In general, any composed method solves a linear system of equations by
|
||||
# minimizing the loss after transforming the data. Methods of this kind include
|
||||
# minimizing the loss after representing the data. Methods of this kind include
|
||||
# ACC, PACC, HDx, HDy, and many other well-known methods, as well as an
|
||||
# unlimited number of re-combinations of their building blocks.
|
||||
|
||||
|
|
@ -93,18 +94,18 @@ from quapy.method.composable import CombinedLoss
|
|||
|
||||
model = ComposableQuantifier(
|
||||
CombinedLoss(HellingerSurrogateLoss(), LeastSquaresLoss()),
|
||||
ClassTransformer(
|
||||
ClassRepresentation(
|
||||
CVClassifier(LogisticRegression(random_state=0), 5),
|
||||
is_probabilistic = True
|
||||
),
|
||||
)
|
||||
|
||||
from qunfold.quapy import QuaPyWrapper
|
||||
from qunfold import GenericMethod
|
||||
from quapy.method.composable import QUnfoldWrapper
|
||||
from qunfold import LinearMethod
|
||||
|
||||
model = QuaPyWrapper(GenericMethod(
|
||||
model = QUnfoldWrapper(LinearMethod(
|
||||
CombinedLoss(HellingerSurrogateLoss(), LeastSquaresLoss()),
|
||||
ClassTransformer(
|
||||
ClassRepresentation(
|
||||
CVClassifier(LogisticRegression(random_state=0), 5),
|
||||
is_probabilistic = True
|
||||
),
|
||||
|
|
@ -115,7 +116,7 @@ model = QuaPyWrapper(GenericMethod(
|
|||
|
||||
param_grid = {
|
||||
"loss__weights": [ (w, 1-w) for w in [.1, .5, .9] ],
|
||||
"transformer__classifier__estimator__C": [1e-1, 1e1],
|
||||
"representation__classifier__estimator__C": [1e-1, 1e1],
|
||||
}
|
||||
|
||||
grid_search = qp.model_selection.GridSearchQ(
|
||||
|
|
@ -125,7 +126,7 @@ grid_search = qp.model_selection.GridSearchQ(
|
|||
error = "mae",
|
||||
refit = False,
|
||||
verbose = True,
|
||||
).fit(training)
|
||||
).fit(Xtr, ytr)
|
||||
print(
|
||||
f"Best hyper-parameters = {grid_search.best_params_}",
|
||||
f"Best MAE = {grid_search.best_score_}",
|
||||
|
|
|
|||
|
|
@ -0,0 +1,83 @@
|
|||
import quapy as qp
|
||||
import numpy as np
|
||||
from quapy.protocol import UPP
|
||||
from quapy.method.aggregative import KDEyML
|
||||
import quapy.functional as F
|
||||
from time import time
|
||||
|
||||
"""
|
||||
Let see one example:
|
||||
"""
|
||||
|
||||
# load some data
|
||||
qp.environ['SAMPLE_SIZE'] = 100
|
||||
data = qp.datasets.fetch_UCIMulticlassDataset('molecular')
|
||||
training, test = data.train_test
|
||||
training, validation = training.split_stratified(train_prop=0.7, random_state=0)
|
||||
protocol = UPP(validation)
|
||||
|
||||
hyper_C = np.logspace(-3, 3, 7)
|
||||
|
||||
model = KDEyML()
|
||||
|
||||
with qp.util.temp_seed(0):
|
||||
|
||||
param_grid = {
|
||||
'classifier__C': hyper_C,
|
||||
'bandwidth': np.linspace(0.01, 0.20, 20) # [0.01, 0.02, 0.03, ..., 0.20]
|
||||
}
|
||||
|
||||
model = qp.model_selection.GridSearchQ(
|
||||
model=model,
|
||||
param_grid=param_grid,
|
||||
protocol=protocol,
|
||||
error='mae', # the error to optimize is the MAE (a quantification-oriented loss)
|
||||
refit=False, # retrain on the whole labelled set once done
|
||||
n_jobs=-1,
|
||||
verbose=True # show information as the process goes on
|
||||
).fit(training)
|
||||
|
||||
best_params = model.best_params_
|
||||
took = model.fit_time_
|
||||
model = model.best_model_
|
||||
print(f'model selection ended: best hyper-parameters={best_params}')
|
||||
|
||||
# evaluation in terms of MAE
|
||||
# we use the same evaluation protocol (APP) on the test set
|
||||
mae_score = qp.evaluation.evaluate(model, protocol=UPP(test), error_metric='mae')
|
||||
|
||||
print(f'MAE={mae_score:.5f}')
|
||||
print(f'model selection took {took:.1f}s')
|
||||
|
||||
|
||||
model = KDEyML(bandwidth='auto')
|
||||
|
||||
with qp.util.temp_seed(0):
|
||||
|
||||
param_grid = {
|
||||
'classifier__C': hyper_C,
|
||||
}
|
||||
|
||||
model = qp.model_selection.GridSearchQ(
|
||||
model=model,
|
||||
param_grid=param_grid,
|
||||
protocol=protocol,
|
||||
error='mae', # the error to optimize is the MAE (a quantification-oriented loss)
|
||||
refit=False, # retrain on the whole labelled set once done
|
||||
n_jobs=-1,
|
||||
verbose=True # show information as the process goes on
|
||||
).fit(training)
|
||||
|
||||
best_params = model.best_params_
|
||||
took = model.fit_time_
|
||||
model = model.best_model_
|
||||
bandwidth = model.bandwidth_val
|
||||
print(f'model selection ended: best hyper-parameters={best_params} ({bandwidth=})')
|
||||
|
||||
# evaluation in terms of MAE
|
||||
# we use the same evaluation protocol (APP) on the test set
|
||||
mae_score = qp.evaluation.evaluate(model, protocol=UPP(test), error_metric='mae')
|
||||
|
||||
print(f'MAE={mae_score:.5f}')
|
||||
print(f'model selection took {took:.1f}s')
|
||||
|
||||
|
|
@ -1,4 +1,3 @@
|
|||
from quapy.method.confidence import BayesianCC
|
||||
from quapy.method.confidence import AggregativeBootstrap
|
||||
from quapy.method.aggregative import PACC
|
||||
import quapy.functional as F
|
||||
|
|
@ -21,15 +20,15 @@ Let see one example:
|
|||
# load some data
|
||||
data = qp.datasets.fetch_UCIMulticlassDataset('molecular')
|
||||
train, test = data.train_test
|
||||
Xtr, ytr = train.Xy
|
||||
|
||||
# by simply wrapping an aggregative quantifier within the AggregativeBootstrap class, we can obtain confidence
|
||||
# intervals around the point estimate, in this case, at 95% of confidence
|
||||
pacc = AggregativeBootstrap(PACC(), n_test_samples=500, confidence_level=0.95)
|
||||
|
||||
|
||||
with qp.util.temp_seed(0):
|
||||
# we train the quantifier the usual way
|
||||
pacc.fit(train)
|
||||
pacc.fit(Xtr, ytr)
|
||||
|
||||
# let us simulate some shift in the test data
|
||||
random_prevalence = F.uniform_prevalence_sampling(n_classes=test.n_classes)
|
||||
|
|
@ -53,7 +52,7 @@ with qp.util.temp_seed(0):
|
|||
print(f'point-estimate: {F.strprev(pred_prev)}')
|
||||
print(f'absolute error: {error:.3f}')
|
||||
print(f'Is the true value in the confidence region?: {conf_intervals.coverage(true_prev)==1}')
|
||||
print(f'Proportion of simplex covered at {pacc.confidence_level*100:.1f}%: {conf_intervals.simplex_portion()*100:.2f}%')
|
||||
print(f'Proportion of simplex covered at confidence level {pacc.confidence_level*100:.1f}%: {conf_intervals.simplex_portion()*100:.2f}%')
|
||||
|
||||
"""
|
||||
Final remarks:
|
||||
|
|
|
|||
|
|
@ -0,0 +1,75 @@
|
|||
"""
|
||||
Aggregative quantifiers use an underlying classifier. Often, one has one pre-trained classifier available, and
|
||||
needs to use this classifier at the basis of a quantification system. In such cases, the classifier should not
|
||||
be retrained, but only used to issue classifier predictions for the quantifier.
|
||||
In this example, we show how to instantiate a quantifier with a pre-trained classifier.
|
||||
"""
|
||||
from typing import List, Dict
|
||||
|
||||
import quapy as qp
|
||||
from quapy.method.aggregative import PACC
|
||||
from sklearn.base import BaseEstimator, ClassifierMixin
|
||||
from transformers import pipeline
|
||||
import numpy as np
|
||||
import quapy.functional as F
|
||||
|
||||
|
||||
# A scikit-learn's style wrapper for a huggingface-based pre-trained transformer for binary sentiment classification
|
||||
class HFTextClassifier(BaseEstimator, ClassifierMixin):
|
||||
def __init__(self, model_name='distilbert-base-uncased-finetuned-sst-2-english'):
|
||||
self.pipe = pipeline("sentiment-analysis", model=model_name)
|
||||
self.classes_ = np.asarray([0,1])
|
||||
|
||||
def fit(self, X, y=None):
|
||||
return self
|
||||
|
||||
def _binary_decisions(self, transformer_output: List[Dict]):
|
||||
return np.array([(1 if p['label']=='POSITIVE' else 0) for p in transformer_output], dtype=int)
|
||||
|
||||
def predict(self, X):
|
||||
X = list(map(str, X))
|
||||
preds = self.pipe(X, truncation=True)
|
||||
return self._binary_decisions(preds)
|
||||
|
||||
def predict_proba(self, X):
|
||||
X = list(map(str, X))
|
||||
n_examples = len(X)
|
||||
preds = self.pipe(X, truncation=True)
|
||||
decisions = self._binary_decisions(preds)
|
||||
scores = np.array([p['score'] for p in preds], dtype=float)
|
||||
probas = np.zeros(shape=(len(X), 2), dtype=float)
|
||||
probas[np.arange(n_examples),decisions] = scores
|
||||
probas[np.arange(n_examples),~decisions] = 1-scores
|
||||
return probas
|
||||
|
||||
# load a sentiment dataset
|
||||
dataset = qp.datasets.fetch_reviews('imdb', tfidf=False) # raw text
|
||||
train, test = dataset.training, dataset.test
|
||||
|
||||
# instantiate a pre-trained classifier
|
||||
clf = HFTextClassifier()
|
||||
|
||||
# Let us fit a quantifier based on our pre-trained classifier.
|
||||
# Note that, since the classifier is already fit, we will use the entire training set for
|
||||
# learning the aggregation function of the quantifier.
|
||||
# To do so, we only need to indicate "fit_classifier"=False, as follows:
|
||||
quantifier = PACC(clf, fit_classifier=False) # Probabilistic Classify & Count using a pre-trained model
|
||||
|
||||
print('training PACC...')
|
||||
quantifier.fit(*train.Xy)
|
||||
|
||||
# let us simulate some shifted test data...
|
||||
new_prevalence = [0.75, 0.25]
|
||||
shifted_test = test.sampling(500, *new_prevalence, random_state=0)
|
||||
|
||||
# and do some evaluation
|
||||
print('predicting with PACC...')
|
||||
estim_prevalence = quantifier.predict(shifted_test.X)
|
||||
|
||||
print('Result:\n'+('='*20))
|
||||
print(f'training prevalence: {F.strprev(train.prevalence())}')
|
||||
print(f'(shifted) test prevalence: {F.strprev(shifted_test.prevalence())}')
|
||||
print(f'estimated prevalence: {F.strprev(estim_prevalence)}')
|
||||
|
||||
absolute_error = qp.error.ae(new_prevalence, estim_prevalence)
|
||||
print(f'absolute error={absolute_error:.4f}')
|
||||
|
|
@ -31,13 +31,13 @@ training, val_generator, test_generator = fetch_lequa2022(task=task)
|
|||
Xtr, ytr = training.Xy
|
||||
|
||||
# define the quantifier
|
||||
quantifier = EMQ(classifier=LogisticRegression())
|
||||
quantifier = EMQ(classifier=LogisticRegression(), val_split=5)
|
||||
|
||||
# model selection
|
||||
param_grid = {
|
||||
'classifier__C': np.logspace(-3, 3, 7), # classifier-dependent: inverse of regularization strength
|
||||
'classifier__class_weight': ['balanced', None], # classifier-dependent: weights of each class
|
||||
# 'calib': ['bcts', None] # quantifier-dependent: recalibration method (new in v0.1.7)
|
||||
'calib': ['bcts', None] # quantifier-dependent: recalibration method (new in v0.1.7)
|
||||
}
|
||||
model_selection = GridSearchQ(quantifier, param_grid, protocol=val_generator, error='mrae', refit=False, verbose=True)
|
||||
quantifier = model_selection.fit(Xtr, ytr)
|
||||
|
|
@ -1,6 +1,6 @@
|
|||
import quapy as qp
|
||||
import numpy as np
|
||||
from sklearn.linear_model import LogisticRegression
|
||||
import quapy as qp
|
||||
import quapy.functional as F
|
||||
from quapy.data.datasets import LEQUA2024_SAMPLE_SIZE, fetch_lequa2024
|
||||
from quapy.evaluation import evaluation_report
|
||||
|
|
@ -14,6 +14,7 @@ LeQua competition itself, check:
|
|||
https://lequa2024.github.io/index (the site of the competition)
|
||||
"""
|
||||
|
||||
|
||||
# there are 4 tasks: T1 (binary), T2 (multiclass), T3 (ordinal), T4 (binary - covariate & prior shift)
|
||||
task = 'T2'
|
||||
|
||||
|
|
@ -38,6 +39,7 @@ param_grid = {
|
|||
'classifier__class_weight': ['balanced', None], # classifier-dependent: weights of each class
|
||||
'bandwidth': np.linspace(0.01, 0.2, 20) # quantifier-dependent: bandwidth of the kernel
|
||||
}
|
||||
|
||||
model_selection = GridSearchQ(quantifier, param_grid, protocol=val_generator, error='mrae', refit=False, verbose=True)
|
||||
quantifier = model_selection.fit(Xtr, ytr)
|
||||
|
||||
|
|
@ -1,4 +1,7 @@
|
|||
from copy import deepcopy
|
||||
from pathlib import Path
|
||||
|
||||
import pandas as pd
|
||||
|
||||
import quapy as qp
|
||||
from sklearn.calibration import CalibratedClassifierCV
|
||||
|
|
@ -15,6 +18,18 @@ import itertools
|
|||
import argparse
|
||||
import torch
|
||||
import shutil
|
||||
from glob import glob
|
||||
|
||||
|
||||
"""
|
||||
This example shows how to generate experiments for the UCI ML repository binary datasets following the protocol
|
||||
proposed in "Pérez-Gállego , P., Quevedo , J. R., and del Coz, J. J. Using ensembles for problems with characteriz-
|
||||
able changes in data distribution: A case study on quantification. Information Fusion 34 (2017), 87–100."
|
||||
|
||||
This example covers most important steps in the experimentation pipeline, namely, the training and optimization
|
||||
of the hyperparameters of different quantifiers, and the evaluation of these quantifiers based on standard
|
||||
prevalence sampling protocols aimed at simulating different levels of prior probability shift.
|
||||
"""
|
||||
|
||||
|
||||
N_JOBS = -1
|
||||
|
|
@ -28,10 +43,6 @@ def newLR():
|
|||
return LogisticRegression(max_iter=1000, solver='lbfgs', n_jobs=-1)
|
||||
|
||||
|
||||
def calibratedLR():
|
||||
return CalibratedClassifierCV(newLR())
|
||||
|
||||
|
||||
__C_range = np.logspace(-3, 3, 7)
|
||||
lr_params = {
|
||||
'classifier__C': __C_range,
|
||||
|
|
@ -74,6 +85,13 @@ def result_path(path, dataset_name, model_name, run, optim_loss):
|
|||
return os.path.join(path, f'{dataset_name}-{model_name}-run{run}-{optim_loss}.pkl')
|
||||
|
||||
|
||||
def parse_result_path(path):
|
||||
*dataset, method, run, metric = Path(path).name.split('-')
|
||||
dataset = '-'.join(dataset)
|
||||
run = int(run.replace('run',''))
|
||||
return dataset, method, run, metric
|
||||
|
||||
|
||||
def is_already_computed(dataset_name, model_name, run, optim_loss):
|
||||
return os.path.exists(result_path(args.results, dataset_name, model_name, run, optim_loss))
|
||||
|
||||
|
|
@ -130,10 +148,28 @@ def run(experiment):
|
|||
best_params)
|
||||
|
||||
|
||||
def show_results(result_folder):
|
||||
result_data = []
|
||||
for file in glob(os.path.join(result_folder,'*.pkl')):
|
||||
true_prevalences, estim_prevalences, *_ = pickle.load(open(file, 'rb'))
|
||||
dataset, method, run, metric = parse_result_path(file)
|
||||
mae = qp.error.mae(true_prevalences, estim_prevalences)
|
||||
result_data.append({
|
||||
'dataset': dataset,
|
||||
'method': method,
|
||||
'run': run,
|
||||
metric: mae
|
||||
})
|
||||
df = pd.DataFrame(result_data)
|
||||
pd.set_option("display.max_columns", None)
|
||||
pd.set_option("display.expand_frame_repr", False)
|
||||
print(df.pivot_table(index='dataset', columns='method', values=metric))
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
parser = argparse.ArgumentParser(description='Run experiments for Tweeter Sentiment Quantification')
|
||||
parser.add_argument('--results', metavar='RESULT_PATH', type=str,
|
||||
help='path to the directory where to store the results', default='./uci_results')
|
||||
help='path to the directory where to store the results', default='./results/uci_binary')
|
||||
parser.add_argument('--svmperfpath', metavar='SVMPERF_PATH', type=str, default='../svm_perf_quantification',
|
||||
help='path to the directory with svmperf')
|
||||
parser.add_argument('--checkpointdir', metavar='PATH', type=str, default='./checkpoint',
|
||||
|
|
@ -155,3 +191,5 @@ if __name__ == '__main__':
|
|||
qp.util.parallel(run, itertools.product(optim_losses, datasets, models), n_jobs=CUDA_N_JOBS)
|
||||
|
||||
shutil.rmtree(args.checkpointdir, ignore_errors=True)
|
||||
|
||||
show_results(args.results)
|
||||
|
|
@ -1,4 +1,3 @@
|
|||
import pickle
|
||||
import os
|
||||
from time import time
|
||||
from collections import defaultdict
|
||||
|
|
@ -7,11 +6,16 @@ import numpy as np
|
|||
from sklearn.linear_model import LogisticRegression
|
||||
|
||||
import quapy as qp
|
||||
from quapy.method.aggregative import PACC, EMQ
|
||||
from quapy.method.aggregative import PACC, EMQ, KDEyML
|
||||
from quapy.model_selection import GridSearchQ
|
||||
from quapy.protocol import UPP
|
||||
from pathlib import Path
|
||||
|
||||
"""
|
||||
This example is the analogous counterpart of example 7 but involving multiclass quantification problems
|
||||
using datasets from the UCI ML repository.
|
||||
"""
|
||||
|
||||
|
||||
SEED = 1
|
||||
|
||||
|
|
@ -31,7 +35,7 @@ def wrap_hyper(classifier_hyper_grid:dict):
|
|||
METHODS = [
|
||||
('PACC', PACC(newLR()), wrap_hyper(logreg_grid)),
|
||||
('EMQ', EMQ(newLR()), wrap_hyper(logreg_grid)),
|
||||
# ('KDEy-ML', KDEyML(newLR()), {**wrap_hyper(logreg_grid), **{'bandwidth': np.linspace(0.01, 0.2, 20)}}),
|
||||
('KDEy-ML', KDEyML(newLR()), {**wrap_hyper(logreg_grid), **{'bandwidth': np.linspace(0.01, 0.2, 20)}}),
|
||||
]
|
||||
|
||||
|
||||
|
|
@ -43,6 +47,7 @@ def show_results(result_path):
|
|||
pv = df.pivot_table(index='Dataset', columns="Method", values=["MAE", "MRAE", "t_train"], margins=True)
|
||||
print(pv)
|
||||
|
||||
|
||||
def load_timings(result_path):
|
||||
import pandas as pd
|
||||
timings = defaultdict(lambda: {})
|
||||
|
|
@ -59,7 +64,7 @@ if __name__ == '__main__':
|
|||
qp.environ['N_JOBS'] = -1
|
||||
n_bags_val = 250
|
||||
n_bags_test = 1000
|
||||
result_dir = f'results/ucimulti'
|
||||
result_dir = f'results/uci_multiclass'
|
||||
|
||||
os.makedirs(result_dir, exist_ok=True)
|
||||
|
||||
|
|
@ -100,7 +105,7 @@ if __name__ == '__main__':
|
|||
|
||||
t_init = time()
|
||||
try:
|
||||
modsel.fit(train)
|
||||
modsel.fit(*train.Xy)
|
||||
|
||||
print(f'best params {modsel.best_params_}')
|
||||
print(f'best score {modsel.best_score_}')
|
||||
|
|
@ -108,7 +113,8 @@ if __name__ == '__main__':
|
|||
quantifier = modsel.best_model()
|
||||
except:
|
||||
print('something went wrong... trying to fit the default model')
|
||||
quantifier.fit(train)
|
||||
quantifier.fit(*train.Xy)
|
||||
|
||||
timings[method_name][dataset] = time() - t_init
|
||||
|
||||
|
||||
|
|
@ -6,6 +6,18 @@ from sklearn.linear_model import LogisticRegression
|
|||
from quapy.model_selection import GridSearchQ
|
||||
from quapy.evaluation import evaluation_report
|
||||
|
||||
"""
|
||||
This example shows a complete experiment using the IFCB Plankton dataset;
|
||||
see https://hlt-isti.github.io/QuaPy/manuals/datasets.html#ifcb-plankton-dataset
|
||||
|
||||
Note that this dataset can be downloaded in two modes: for model selection or for evaluation.
|
||||
|
||||
See also:
|
||||
Automatic plankton quantification using deep features
|
||||
P González, A Castaño, EE Peacock, J Díez, JJ Del Coz, HM Sosik
|
||||
Journal of Plankton Research 41 (4), 449-463
|
||||
"""
|
||||
|
||||
|
||||
print('Quantifying the IFCB dataset with PACC\n')
|
||||
|
||||
|
|
@ -30,7 +42,7 @@ mod_sel = GridSearchQ(
|
|||
n_jobs=-1,
|
||||
verbose=True,
|
||||
raise_errors=True
|
||||
).fit(train)
|
||||
).fit(*train.Xy)
|
||||
|
||||
print(f'model selection chose hyperparameters: {mod_sel.best_params_}')
|
||||
quantifier = mod_sel.best_model_
|
||||
|
|
@ -42,7 +54,7 @@ print(f'\ttraining size={len(train)}, features={train.X.shape[1]}, classes={trai
|
|||
print(f'\ttest samples={test_gen.total()}')
|
||||
|
||||
print('training on the whole dataset before test')
|
||||
quantifier.fit(train)
|
||||
quantifier.fit(*train.Xy)
|
||||
|
||||
print('testing...')
|
||||
report = evaluation_report(quantifier, protocol=test_gen, error_metrics=['mae'], verbose=True)
|
||||
|
|
|
|||
|
|
@ -1,10 +1,16 @@
|
|||
from sklearn.exceptions import ConvergenceWarning
|
||||
from sklearn.linear_model import LogisticRegression
|
||||
from sklearn.naive_bayes import MultinomialNB
|
||||
from sklearn.neighbors import KNeighborsClassifier
|
||||
from statsmodels.sandbox.distributions.genpareto import quant
|
||||
|
||||
import quapy as qp
|
||||
from quapy.protocol import UPP
|
||||
from quapy.method.aggregative import PACC, DMy, EMQ, KDEyML
|
||||
from quapy.method.meta import SCMQ
|
||||
from quapy.method.meta import SCMQ, MCMQ, MCSQ
|
||||
import warnings
|
||||
warnings.filterwarnings("ignore", category=DeprecationWarning)
|
||||
warnings.filterwarnings("ignore", category=ConvergenceWarning)
|
||||
|
||||
qp.environ["SAMPLE_SIZE"]=100
|
||||
|
||||
|
|
@ -32,5 +38,19 @@ scmq = SCMQ(classifier, quantifiers)
|
|||
|
||||
train_and_test_model(scmq, train, test)
|
||||
|
||||
for quantifier in quantifiers:
|
||||
train_and_test_model(quantifier, train, test)
|
||||
# for quantifier in quantifiers:
|
||||
# train_and_test_model(quantifier, train, test)
|
||||
|
||||
classifiers = [
|
||||
LogisticRegression(),
|
||||
KNeighborsClassifier(),
|
||||
# MultinomialNB()
|
||||
]
|
||||
|
||||
mcmq = MCMQ(classifiers, quantifiers)
|
||||
|
||||
train_and_test_model(mcmq, train, test)
|
||||
|
||||
mcsq = MCSQ(classifiers, PACC())
|
||||
|
||||
train_and_test_model(mcsq, train, test)
|
||||
Binary file not shown.
|
After Width: | Height: | Size: 48 KiB |
Binary file not shown.
|
After Width: | Height: | Size: 57 KiB |
|
|
@ -11,13 +11,5 @@ rm $FILE
|
|||
patch -s -p0 < svm-perf-quantification-ext.patch
|
||||
mv svm_perf svm_perf_quantification
|
||||
cd svm_perf_quantification
|
||||
make
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
make CFLAGS="-O3 -Wall -Wno-unused-result -fcommon"
|
||||
|
||||
|
|
|
|||
|
|
@ -1,5 +1,4 @@
|
|||
"""QuaPy module for quantification"""
|
||||
from sklearn.linear_model import LogisticRegression
|
||||
|
||||
from quapy.data import datasets
|
||||
from . import error
|
||||
|
|
@ -16,6 +15,12 @@ import os
|
|||
|
||||
__version__ = '0.2.0'
|
||||
|
||||
|
||||
def _default_cls():
|
||||
from sklearn.linear_model import LogisticRegression
|
||||
return LogisticRegression()
|
||||
|
||||
|
||||
environ = {
|
||||
'SAMPLE_SIZE': None,
|
||||
'UNK_TOKEN': '[UNK]',
|
||||
|
|
@ -24,7 +29,7 @@ environ = {
|
|||
'PAD_INDEX': 1,
|
||||
'SVMPERF_HOME': './svm_perf_quantification',
|
||||
'N_JOBS': int(os.getenv('N_JOBS', 1)),
|
||||
'DEFAULT_CLS': LogisticRegression()
|
||||
'DEFAULT_CLS': _default_cls()
|
||||
}
|
||||
|
||||
|
||||
|
|
@ -68,3 +73,5 @@ def _get_classifier(classifier):
|
|||
if classifier is None:
|
||||
raise ValueError('neither classifier nor qp.environ["DEFAULT_CLS"] have been specified')
|
||||
return classifier
|
||||
|
||||
|
||||
|
|
|
|||
Some files were not shown because too many files have changed in this diff Show More
Loading…
Reference in New Issue