# main changes in 0.1.7
- Protocols are now abstracted as AbstractProtocol. There is a new class extending AbstractProtocol called
AbstractStochasticSeededProtocol, which implements a seeding policy that allows the series of samplings to be
replicated. Some example protocols are provided: APP, NPP, USimplexPP, CovariateShiftPP (experimental).
The idea is to start the sampling by simply calling the __call__ method (see the first sketch after this list).
This change has a great impact on the framework, since many functions in qp.evaluation and qp.model_selection,
as well as the sampling functions in LabelledCollection, made use of the old functions.
- ACC, PACC, Forman's threshold variants have been parallelized.
- Exploration of hyperparameters in model selection can now be run in parallel (there was an n_jobs argument in
QuaPy 0.1.6, but only the evaluation part for one specific hyperparameter combination was run in parallel).
- The prediction function has been refactored, so that the optimization for aggregative quantifiers (which
consists of pre-classifying all instances and then only invoking aggregate on the samples) is applied only in
cases in which the total number of classifications would be smaller than with the standard procedure. The user
can now specify "force", "auto", True, or False, in order to actively decide whether to apply it (see the
sketches after this list).
- n_jobs is now taken from the environment if set to None (see the last sketch after this list)
- examples directory created!
- cross_val_predict (for quantification) added to model_selection (sketched after this list); it would be nice
to allow the user to specify a test protocol, or None for bypassing it.
- I think Pablo added DyS, Topsoe distance and binary search.
- I think Pablo added multi-thread reproducibility.
- Bugfix: adding two labelled collections (with +) now checks for consistency in the classes
- newer versions of numpy raise a warning when accessing types (e.g., np.float). I have replaced all such instances
with the plain python type (e.g., float).
- new dependency "abstention" (to be added to the project requirements and setup). Calibration methods from
https://github.com/kundajelab/abstention have been added (see the sketch after this list).
- the internal classifier of aggregative methods is now called "classifier" instead of "learner"
- when optimizing the hyperparameters of an aggregative quantifier, the classifier's specific hyperparameters
should be marked with a "classifier__" prefix (just like in scikit-learn), while the quantifier's specific
hyperparameters are named directly. For example, a PCC(LogisticRegression()) quantifier has hyperparameters
like classifier__C and classifier__class_weight, rather than C and class_weight (see the last sketch after
this list).
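
Usage sketches for the above (indicative only; names and signatures should be double-checked against the code).

Sampling with the new protocols starts by invoking __call__, and random_state implements the seeding policy.
A minimal sketch, assuming the parameter names sample_size, n_prevalences, repeats and random_state:

    import quapy as qp
    from quapy.protocol import APP

    # a collection to sample from; fetch_reviews is one of QuaPy's built-in loaders
    data = qp.datasets.fetch_reviews('kindle', tfidf=True).test

    # APP spans a grid of prevalence values; random_state makes the series of
    # samplings replicable
    prot = APP(data, sample_size=100, n_prevalences=21, repeats=10, random_state=0)

    # sampling starts by simply invoking __call__
    for sample in prot():
        pass  # process each generated sample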
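
The classify-once optimization for aggregative quantifiers can then be toggled at evaluation time (the keyword
aggr_speedup is my assumption; check qp.evaluation for the actual name):

    import quapy as qp
    from quapy.method.aggregative import PCC
    from quapy.protocol import APP
    from sklearn.linear_model import LogisticRegression

    dataset = qp.datasets.fetch_reviews('kindle', tfidf=True)
    quantifier = PCC(LogisticRegression()).fit(dataset.training)

    # 'auto' applies the pre-classification speedup only when it would reduce the
    # total number of classifications; 'force', True, or False decide explicitly
    prev_estims = qp.evaluation.prediction(
        quantifier, protocol=APP(dataset.test, sample_size=100, random_state=0),
        aggr_speedup='auto')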
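
A hypothetical usage of cross_val_predict (the parameter name nfolds and the exact return value, presumably a
cross-validated prevalence estimate, are my guesses; consult the docstring):

    import quapy as qp
    from quapy.method.aggregative import PCC
    from sklearn.linear_model import LogisticRegression

    data = qp.datasets.fetch_reviews('kindle', tfidf=True).training

    # fits on k-1 folds and quantifies the remaining fold, akin to scikit-learn's
    # cross_val_predict but for quantification
    estim = qp.model_selection.cross_val_predict(PCC(LogisticRegression()), data, nfolds=3)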
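
The calibration methods from abstention presumably plug into EMQ via a recalibration argument; a sketch
assuming the parameter name recalib and the value 'bcts' (bias-corrected temperature scaling):

    from quapy.method.aggregative import EMQ
    from sklearn.linear_model import LogisticRegression

    # the classifier's posterior probabilities are recalibrated before running EM
    quantifier = EMQ(LogisticRegression(), recalib='bcts')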
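
Model selection combining the classifier__ prefix and the environment-level n_jobs (GridSearchQ argument
names may differ slightly):

    import quapy as qp
    from quapy.method.aggregative import PCC
    from quapy.protocol import APP
    from sklearn.linear_model import LogisticRegression

    # with n_jobs=None, methods now take the value from the environment
    qp.environ['N_JOBS'] = -1

    training = qp.datasets.fetch_reviews('kindle', tfidf=True).training
    train, val = training.split_stratified(train_prop=0.6)

    param_grid = {
        'classifier__C': [0.1, 1.0, 10.0],  # routed to the internal LogisticRegression
        # quantifier-specific hyperparameters would be named directly, without prefix
    }

    model = qp.model_selection.GridSearchQ(
        model=PCC(LogisticRegression()),
        param_grid=param_grid,
        protocol=APP(val, sample_size=100, random_state=0),
        error='mae',
    ).fit(train)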
Things to fix:
- calibration with recalibration methods has to be fixed for exact_train_prev in EMQ (conflicts with clone, deepcopy, etc.)
- clean up functions like binary, aggregative, probabilistic, etc.; these should be resolved via isinstance():
this is not working; I don't know how to make the isinstance work. It looks like there is some problem with the
path of the imported class with respect to the path of the class that arrives from another module...
- clean classes_ and n_classes from methods (maybe not from aggregative ones, but those have to be used only
internally and not imposed in any abstract class)
- optimize "qp.evaluation.prediction" for aggregative methods (pre-classification)
- update unit tests
- Policies should be able to set their output to "labelled_collection" or "instances_prevalence" or something similar.
- Policies should implement a "gen()" method, taking a reader function as input, and maybe a folder path
- Review all documentation, redo the Sphinx doc, update Wikis...
- Resolve the OneVsAll thing (it is in base.py and in aggregative.py)
- Better handle the environment (e.g., with n_jobs)
- test cross_generate_predictions and remove cross_generate_predictions_depr
- Add a proper log?
- test LoadSamplesFromDirectory (in protocols.py)
- improve plots?
- I have removed the distinction between "classify" and "posterior_probabilities" in the aggregative quantifiers,
so that probabilistic classifiers actually return posterior probabilities, while non-probabilistic ones
return crisp decisions instead. The idea was to unify the quantification function (i.e., it is now always
classify & aggregate, irrespective of the class). However, this has caused a problem with OneVsAll. This has to
be checked, since it is now unnecessarily complicated (it also has old references to .probabilistic, and all this
stuff).
- Check method def __parallel(self, func, *args, **kwargs) in aggregative.OneVsAll
New features:
- Add LeQua2022 to datasets (everything automatic, and with proper protocols "gen")
- Add an "experimental room", with scripts to quickly test new ideas and see results.
# 0.1.7
# change the LabelledCollection API (removing protocol-related samplings)
# need to change the two references to the above in the wiki / doc, and code examples...
# removed artificial_prevalence_sampling from functional
# also: some parameters in the init could be used to indicate that the method should return a tuple with
# unlabelled instances and the vector of prevalence values (and not a LabelledCollection).
# Or: this can be done in a different function; i.e., we use one function (now __call__) to return
# LabelledCollections, and another new one for returning the other output, which is more general for
# evaluation purposes (see the sketch below).
# the so-called "gen" function has to be implemented as a protocol. The problem here is that this function
# should be able to return only unlabelled instances plus a vector of prevalences (and not LabelledCollections).
# This was coded as different functions in 0.1.6
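
A sketch of the two output modes being discussed (the parameter name return_type and its values are only
illustrative):

    import quapy as qp
    from quapy.protocol import APP

    data = qp.datasets.fetch_reviews('kindle', tfidf=True).test

    # one mode returning LabelledCollections...
    prot = APP(data, sample_size=100, return_type='labelled_collection')
    for sample in prot():
        pass  # sample is a LabelledCollection

    # ...and another returning (unlabelled instances, prevalence vector) pairs,
    # the more general output for evaluation purposes
    prot = APP(data, sample_size=100, return_type='sample_prev')
    for instances, prev in prot():
        pass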