QuaPy/quapy/CHANGE_LOG.txt

# main changes in 0.1.7

- Protocols are now abstracted as AbstractProtocol. There is a new class extending AbstractProtocol called
    AbstractStochasticSeededProtocol, which implements a seeding policy to allow replicate the series of samplings.
    There are some examples of protocols, APP, NPP, USimplexPP, CovariateShiftPP (experimental).
    The idea is to start the sampling by simply calling the __call__ method.
    This change has a great impact in the framework, since many functions in qp.evaluation, qp.model_selection,
    and sampling functions in LabelledCollection make use of the old functions.

- ACC, PACC, Forman's threshold variants have been parallelized.

- Exploration of hyperparameters in Model selection can now be run in parallel (there was a n_jobs argument in
    QuaPy 0.1.6 but only the evaluation part for one specific hyperparameter was run in parallel).

- The prediction function has been refactored, so it applies the optimization for aggregative quantifiers (that
    consists in pre-classifying all instances, and then only invoking aggregate on the samples) only in cases in
    which the total number of classifications would be smaller than the number of classifications with the standard
    procedure. The user can now specify "force", "auto", True of False, in order to actively decide for applying it
    or not.

- n_jobs is now taken from the environment if set to None

- examples directory created!

- cross_val_predict (for quantification) added to model_selection: would be nice to allow the user specifies a
    test protocol maybe, or None for bypassing it?

- DyS, Topsoe distance and binary search (thanks to Pablo González)

- Multi-thread reproducibility via seeding (thanks to Pablo González)

- Bugfix: adding two labelled collections (with +) now checks for consistency in the classes

- newer versions of numpy raise a warning when accessing types (e.g., np.float). I have replaced all such instances
    with the plain python type (e.g., float).

- new dependency "abstention" (to add to the project requirements and setup). Calibration methods from
    https://github.com/kundajelab/abstention added.

- the internal classifier of aggregative methods is now called "classifier" instead of "learner"

- when optimizing the hyperparameters of an aggregative quantifier, the classifier's specific hyperparameters
    should be marked with a "classifier__" prefix (just like in scikit-learn with estimators), while the quantifier's
    specific hyperparameters are named directly. For example, PCC(LogisticRegression()) quantifier has hyperparameters
    "classifier__C", "classifier__class_weight", etc., instead of "C" and "class_weight" as in v0.1.6.

- hyperparameters yielding to inconsistent runs raise a ValueError exception, while hyperparameter combinations
    yielding to internal errors of surrogate functions are reported and skipped, without stopping the grid search.

- DistributionMatching methods added. This is a general framework for distribution matching methods that catters for
    multiclass quantification. That is to say, one could get a multiclass variant of the (originally binary) HDy
    method aligned with the Firat's formulation.

Things to fix:
- calibration with recalibration methods has to be fixed for exact_train_prev in EMQ (conflicts with clone, deepcopy, etc.)
- clean functions like binary, aggregative, probabilistic, etc; those should be resolved via isinstance():
    this is not working; I don't know how to make the isinstance work. Looks like there is some problem with the
    path of the imported class wrt the path of the class that arrives from another module...
- clean classes_ and n_classes from methods (maybe not from aggregative ones, but those have to be used only
    internally and not imposed in any abstract class)
- optimize "qp.evaluation.prediction" for aggregative methods (pre-classification)
- update unit tests
- Policies should be able to set their output to "labelled_collection" or "instances_prevalence" or something similar.
- Policies should implement the "gen()" one, taking a reader function as an input, and a folder path maybe
- Review all documentation, redo the Sphinx doc, update Wikis...
- Resolve the OneVsAll thing (it is in base.py and in aggregative.py)
- Better handle the environment (e.g., with n_jobs)
- test cross_generate_predictions and cancel cross_generate_predictions_depr
- Add a proper log?
- test LoadSamplesFromDirectory (in protocols.py)
- improve plots?
- I have removed the distinction between "classify" and "posterior_probabilities" in the Aggregative quantifiers,
    so that probabilistic classifiers actually return posterior probabilities, while non-probabilistic quantifiers
    return instead crisp decisions. The idea was to unify the quantification function (i.e., now it is always
    classify & aggregate, irrespective of the class). However, this has caused a problem with OneVsAll. This has to
    be checked, since it is now innecessarily complicated (it also has old references to .probabilistic, and all this
    stuff).
- Check method  def __parallel(self, func, *args, **kwargs) in aggregative.OneVsAll

New features:
- Add LeQua2022 to datasets (everything automatic, and with proper protocols "gen")
- Add an "experimental room", with scripts to quickly test new ideas and see results.

# 0.1.7
# change the LabelledCollection API (removing protocol-related samplings)
# need to change the two references to the above in the wiki / doc, and code examples...
# removed artificial_prevalence_sampling from functional

# also: some parameters in the init could be used to indicate that the method should return a tuple with
# unlabelled instances and the vector of prevalence values (and not a LabelledCollection).
# Or: this can be done in a different function; i.e., we use one function (now __call__) to return
# LabelledCollections, and another new one for returning the other output, which is more general for
# evaluation purposes.

# the so-called "gen" function has to be implemented as a protocol. The problem here is that this function
# should be able to return only unlabelled instances plus a vector of prevalences (and not LabelledCollections).
# This was coded as different functions in 0.1.6