# main changes in 0.1.7

- Protocols are now abstracted as AbstractProtocol. There is a new class extending AbstractProtocol called
AbstractStochasticSeededProtocol, which implements a seeding policy that allows replicating the series of samplings.
There are some examples of protocols: APP, NPP, USimplexPP, CovariateShiftPP (experimental).
The idea is to start the sampling by simply calling the __call__ method (see the sketch after this list).
This change has a great impact on the framework, since many functions in qp.evaluation, qp.model_selection,
and sampling functions in LabelledCollection make use of the old functions.

- ACC, PACC, and Forman's threshold variants have been parallelized.

- Exploration of hyperparameters in model selection can now be run in parallel (there was an n_jobs argument in
QuaPy 0.1.6, but only the evaluation part for one specific hyperparameter was run in parallel).

- The prediction function has been refactored, so that it applies the optimization for aggregative quantifiers (which
consists of pre-classifying all instances and then only invoking aggregate on the samples) only in cases in
which the total number of classifications would be smaller than the number of classifications with the standard
procedure. The user can now specify "force", "auto", True, or False, in order to actively decide whether to apply it
or not (sketch below).

- n_jobs is now taken from the environment if set to None (sketch below).

- examples directory created!

- cross_val_predict (for quantification) added to model_selection (sketch below): it would be nice to allow the
user to specify a test protocol, maybe, or None for bypassing it?

- I think Pablo added DyS, Topsoe distance and binary search.

- I think Pablo added multi-thread reproducibility.

- Bugfix: adding two labelled collections (with +) now checks for consistency in the classes (sketch below).

- newer versions of numpy raise a warning when accessing aliased types (e.g., np.float). I have replaced all such
instances with the plain Python type (e.g., float).

- new dependency "abstention" (to add to the project requirements and setup). Calibration methods from
https://github.com/kundajelab/abstention added.
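
A minimal sketch of how I imagine the new protocols being used (the module path, the constructor arguments, and
what each iteration yields are assumptions at this point, not the settled API; see also the notes at the end of
this file):

    import quapy as qp
    from quapy.protocol import APP   # module name may end up being protocol.py or protocols.py
    from quapy.method.aggregative import PACC
    from sklearn.linear_model import LogisticRegression

    data = qp.datasets.fetch_reviews('hp', tfidf=True)
    model = PACC(LogisticRegression()).fit(data.training)

    # APP extends AbstractStochasticSeededProtocol; random_state is what makes the
    # series of samplings replicable
    prot = APP(data.test, sample_size=100, n_prevalences=21, repeats=10, random_state=0)

    # invoking the protocol (its __call__ method) starts the sampling; each iteration
    # is assumed here to yield one LabelledCollection sample
    for sample in prot():
        estim_prev = model.quantify(sample.instances)
        print(qp.error.ae(sample.prevalence(), estim_prev))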
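
For the refactored prediction function, the intended usage would be something like this (the parameter name
aggr_speedup and the returned values are my assumptions):

    import quapy as qp

    # 'model' is a trained aggregative quantifier and 'prot' a protocol, as in the sketch above.
    # "auto" applies the pre-classification optimization only when it actually saves classifications,
    # "force" always applies it, and True/False simply turn it on or off
    true_prevs, estim_prevs = qp.evaluation.prediction(model, prot, aggr_speedup='auto')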
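
Regarding n_jobs being taken from the environment, the idea is roughly the following (the environment key name
and the exact components that honour it are assumptions):

    import quapy as qp
    from quapy.method.aggregative import PACC
    from sklearn.linear_model import LogisticRegression

    # set the desired parallelism once, globally
    qp.environ['N_JOBS'] = 4

    # any component created with n_jobs=None should now pick up the global value
    model = PACC(LogisticRegression(), n_jobs=None)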
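
For cross_val_predict in model_selection, a usage sketch (the argument names and the way the per-fold results
are aggregated are assumptions):

    import quapy as qp
    from quapy.model_selection import cross_val_predict
    from quapy.method.aggregative import PACC
    from sklearn.linear_model import LogisticRegression

    data = qp.datasets.fetch_reviews('hp', tfidf=True)

    # quantify the training collection itself via k-fold cross-validation
    estim_prev = cross_val_predict(PACC(LogisticRegression()), data.training, nfolds=3)
    print(estim_prev)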
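
The bugfix on adding labelled collections can be illustrated with a small synthetic example (I assume the
consistency check raises an error when the classes differ):

    from quapy.data import LabelledCollection

    a = LabelledCollection(['neg doc', 'pos doc'], [0, 1])
    b = LabelledCollection(['another neg', 'another pos'], [0, 1])
    ab = a + b      # fine: both collections have the same classes

    c = LabelledCollection(['some doc'], [2])
    try:
        a + c       # the classes differ: the new check should complain
    except Exception as e:
        print(f'inconsistent classes detected: {e}')
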
Things to fix:
- calibration with recalibration methods has to be fixed for exact_train_prev in EMQ (conflicts with clone, deepcopy, etc.)
- clean functions like binary, aggregative, probabilistic, etc.; those should be resolved via isinstance():
this is not working; I don't know how to make the isinstance check work. It looks like there is some problem with the
path of the imported class wrt the path of the class that arrives from another module...
- clean classes_ and n_classes from methods (maybe not from aggregative ones, but those have to be used only
internally and not imposed in any abstract class)
- optimize "qp.evaluation.prediction" for aggregative methods (pre-classification)
- update unit tests
- Policies should be able to set their output to "labelled_collection" or "instances_prevalence" or something similar.
- Policies should implement the "gen()" one, taking a reader function as an input, and maybe a folder path.
- Review all documentation, redo the Sphinx doc, update Wikis...
- Resolve the OneVsAll thing (it is in base.py and in aggregative.py)
- Better handle the environment (e.g., with n_jobs)
- test cross_generate_predictions and cancel cross_generate_predictions_depr
- Add a proper log?
- test LoadSamplesFromDirectory (in protocols.py)
- improve plots?
- I have removed the distinction between "classify" and "posterior_probabilities" in the Aggregative quantifiers,
so that probabilistic classifiers actually return posterior probabilities, while non-probabilistic quantifiers
instead return crisp decisions. The idea was to unify the quantification function (i.e., now it is always
classify & aggregate, irrespective of the class). However, this has caused a problem with OneVsAll. This has to
be checked, since it is now unnecessarily complicated (it also has old references to .probabilistic, and all this
stuff).
- Check the method def __parallel(self, func, *args, **kwargs) in aggregative.OneVsAll

New features:
- Add LeQua2022 to datasets (everything automatic, and with proper "gen" protocols)
- Add an "experimental room", with scripts to quickly test new ideas and see results.
|
|
|
|
# 0.1.7
# change the LabelledCollection API (removing protocol-related samplings)
# need to change the two references to the above in the wiki / doc, and code examples...
# removed artificial_prevalence_sampling from functional

# also: some parameters in the init could be used to indicate that the method should return a tuple with
# unlabelled instances and the vector of prevalence values (and not a LabelledCollection).
# Or: this can be done in a different function; i.e., we use one function (now __call__) to return
# LabelledCollections, and another new one for returning the other output, which is more general for
# evaluation purposes.

# the so-called "gen" function has to be implemented as a protocol. The problem here is that this function
# should be able to return only unlabelled instances plus a vector of prevalences (and not LabelledCollections).
# This was coded as different functions in 0.1.6
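
One way to realize the above, just as a design sketch (none of this is implemented; the class and method names
are placeholders), would be to keep __call__ yielding LabelledCollections and add a second generator for the
evaluation-oriented output:

    # toy stand-in for a protocol exposing both output conventions
    class ToyProtocol:

        def __init__(self, samples):
            self.samples = samples      # a list of LabelledCollection objects

        def __call__(self):
            # option 1 (the current __call__): yield full LabelledCollections
            yield from self.samples

        def gen(self):
            # option 2 (the "gen"-style output): yield only the unlabelled instances
            # plus the vector of true prevalence values
            for sample in self.samples:
                yield sample.instances, sample.prevalence()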