Change Log 0.1.9
----------------

- Added Continuous Integration with GitHub Actions (thanks to Mirko Bunse!)

- Added Bayesian CC method (thanks to Pawel Czyz!). The method is described in detail in the paper:
  Ziegler, Albert, and Paweł Czyż. "Bayesian Quantification with Black-Box Estimators."
  arXiv preprint arXiv:2302.09159 (2023).

- Removed binary UCI datasets {acute.a, acute.b, balance.2} from the list qp.data.datasets.UCI_BINARY_DATASETS
  (the datasets are still loadable from the fetch_UCIBinaryLabelledCollection and fetch_UCIBinaryDataset
  functions, though). The reason is that these datasets tend to yield results (for all methods) that are one
  or two orders of magnitude greater than those for other datasets, which has a disproportionate impact on the
  averages across methods (I suspect there is something wrong with those datasets).

Change Log 0.1.8
----------------

- Added Kernel Density Estimation methods (KDEyML, KDEyCS, KDEyHD) as proposed in the paper:
  Moreo, A., González, P., & del Coz, J. J. Kernel Density Estimation for Multiclass Quantification.
  arXiv preprint arXiv:2401.00490, 2024.

- Substantial internal refactor: aggregative methods now inherit a pattern by which the fit method consists of:
  a) fitting the classifier and returning the representations of the training instances (typically the posterior
     probabilities, the label predictions, or the classifier scores, typically obtained through kFCV);
  b) fitting an aggregation function.
  The function implementing step a) is inherited from the super class; each new aggregative method now has to
  implement only the "aggregative_fit" of step b).
  This pattern was already implemented for prediction (thus allowing evaluation functions to run very quickly),
  and is now also available for training. The main benefit is that model selection can now nest the training of
  quantifiers at two levels: one for the classifier, and another for the aggregation function. As a result, a
  method with a param grid of 10 combinations for the classifier and 10 combinations for the quantifier now
  implies 10 trainings of the classifier + 10*10 trainings of the aggregation function (which is typically much
  faster than training the classifier), whereas in versions <0.1.8 this amounted to training 10*10
  (classifiers+aggregations).

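  A rough sketch of how the two-level search plays out in model selection (the dataset name, the val_split
  hyperparameter, and the exact signatures of GridSearchQ and APP below are best-effort assumptions rather
  than the documented API):

    import quapy as qp
    from quapy.method.aggregative import PACC
    from sklearn.linear_model import LogisticRegression

    training, test = qp.datasets.fetch_UCIBinaryDataset('yeast').train_test
    training, validation = training.split_stratified(0.6)

    grid = qp.model_selection.GridSearchQ(
        model=PACC(LogisticRegression()),
        param_grid={
            'classifier__C': [0.01, 0.1, 1, 10, 100],  # 5 classifier configurations
            'val_split': [3, 5],                       # 2 aggregation-function configurations
        },
        protocol=qp.protocol.APP(validation, sample_size=100),
    )
    grid.fit(training)
    # with the nested pattern, the classifier is fit 5 times and the (cheap) aggregation
    # function 5*2=10 times, instead of the 10 full trainings required in versions <0.1.8
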
- Added different solvers for the ACC and PACC quantifiers. In quapy < 0.1.8 these quantifiers try to solve the
  system of equations Ax=B exactly (by means of np.linalg.solve). As noted by Mirko Bunse (thanks!), such an
  exact solution sometimes does not exist. In those cases, quapy < 0.1.8 resorted to CC for providing a
  plausible solution. ACC and PACC now resort to an approximate solution in such cases (minimizing the L2-norm
  of the difference Ax-B), as proposed by Mirko Bunse. A quick experiment reveals that this heuristic greatly
  improves the results of ACC and PACC on T2A@LeQua.

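  Schematically, the two strategies amount to something like the following (a numpy-only sketch, not the actual
  QuaPy code; the real implementation additionally constrains the solution to be a valid prevalence vector):

    import numpy as np

    def adjust(A, b):
        # A: estimated misclassification-rate matrix; b: observed (P)CC prevalences
        try:
            x = np.linalg.solve(A, b)                  # exact solution of Ax = b, when it exists
        except np.linalg.LinAlgError:
            x, *_ = np.linalg.lstsq(A, b, rcond=None)  # otherwise minimize ||Ax - b||_2
        x = np.clip(x, 0, None)                        # keep only non-negative prevalence values
        return x / x.sum() if x.sum() > 0 else np.full_like(x, 1 / len(x))
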
- Fixed ThresholdOptimization methods (X, T50, MAX, MS and MS2). Thanks to Tobias Schumacher and colleagues for
  pointing this out in Appendix A of "Schumacher, T., Strohmaier, M., & Lemmerich, F. (2021). A comparative
  evaluation of quantification methods. arXiv:2103.03223v3 [cs.LG]".

- Added HDx and DistributionMatchingX to the non-aggregative quantifiers (see also the new example
  "comparing_HDy_HDx.py").

- New UCI multiclass datasets added (thanks to Pablo González). The 5 UCI multiclass datasets are those
  corresponding to the following criteria:
  - >1000 instances
  - >2 classes
  - classification datasets
  - Python API available

- New IFCB (plankton) dataset added (thanks to Pablo González). See qp.datasets.fetch_IFCB.

- Added new evaluation measures NAE, NRAE (thanks to Andrea Esuli).

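  For reference, NAE is the absolute error normalized by its maximum attainable value given the true prevalence
  vector, so that scores lie in [0, 1]; a minimal sketch of that normalization (NRAE normalizes RAE analogously):

    import numpy as np

    def normalized_absolute_error(p_true, p_hat):
        # absolute error divided by its maximum attainable value (reached when all the
        # estimated mass falls on the class with the smallest true prevalence)
        return np.abs(p_hat - p_true).sum() / (2 * (1 - p_true.min()))
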
- Added new meta method "MedianEstimator": an ensemble of binary base quantifiers that receives as input a
  dictionary of hyperparameters, which it explores exhaustively (fitting and generating predictions for each
  combination of hyperparameters), and that returns, as the prevalence estimates, the median across all
  predictions.

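  A hypothetical usage sketch (the module path and constructor arguments below are assumptions; check the API
  documentation for the actual signature):

    from quapy.method.meta import MedianEstimator
    from quapy.method.aggregative import HDy
    from sklearn.linear_model import LogisticRegression

    # one HDy quantifier is fit per combination in the grid below; the final
    # prevalence estimate is the median of all their predictions
    quantifier = MedianEstimator(
        base_quantifier=HDy(LogisticRegression()),
        param_grid={'classifier__C': [0.1, 1, 10]},
    )
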
- Added "custom_protocol.py" example.
|
|
|
|
|
|
|
|
- New API documentation template.
|
|
|
|
|
2023-10-23 11:39:47 +02:00
|
|
|
Change Log 0.1.7
----------------

- Protocols are now abstracted as instances of AbstractProtocol. There is a new class extending AbstractProtocol
  called AbstractStochasticSeededProtocol, which implements a seeding policy that allows replicating the series
  of samplings. There are some examples of protocols: APP, NPP, UPP, and DomainMixer (experimental).
  The idea is to start the sample generation by simply calling the __call__ method.
  This change has a great impact on the framework, since many functions in qp.evaluation, qp.model_selection,
  and sampling functions in LabelledCollection relied on the old functions. E.g., the functionality of
  qp.evaluation.artificial_prevalence_report or qp.evaluation.natural_prevalence_report is now obtained by means
  of qp.evaluation.report, which takes a protocol as an argument. I have not maintained compatibility with the
  old interfaces because I did not really like them. Check the wiki guide and the examples for more details.

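  For instance, evaluation now looks roughly like this (a sketch; the dataset name and the exact signatures of
  APP and qp.evaluation.report are best-effort assumptions):

    import quapy as qp
    from quapy.method.aggregative import PACC
    from sklearn.linear_model import LogisticRegression

    training, test = qp.datasets.fetch_UCIBinaryDataset('yeast').train_test
    quantifier = PACC(LogisticRegression()).fit(training)

    # a protocol generates the evaluation samples; iterating over it (its __call__)
    # yields the samples on which the quantifier is tested
    protocol = qp.protocol.APP(test, sample_size=100, repeats=10)
    report = qp.evaluation.report(quantifier, protocol, error_metrics=['mae', 'mrae'])
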
- The exploration of hyperparameters in model selection can now be run in parallel (there was an n_jobs argument
  in QuaPy 0.1.6, but only the evaluation part for one specific hyperparameter was run in parallel).

- The prediction function has been refactored, so that it applies the optimization for aggregative quantifiers
  (which consists in pre-classifying all instances and then only invoking aggregate on the samples) only in
  cases in which the total number of classifications would be smaller than the number of classifications
  required by the standard procedure. The user can now specify "force", "auto", True, or False, in order to
  actively decide whether to apply it or not.

- examples directory created!

- Added DyS, the Topsoe distance, and binary search (thanks to Pablo González).

- Multi-thread reproducibility via seeding (thanks to Pablo González).

- n_jobs is now taken from the environment if set to None.

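  For example (the 'N_JOBS' key name in the qp.environ settings dictionary is assumed here):

    import quapy as qp

    qp.environ['N_JOBS'] = 4   # objects created with n_jobs=None will now use 4 workers
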
- ACC, PACC, and Forman's threshold variants have been parallelized.

- cross_val_predict (for quantification) added to model_selection. (It might be nice to allow the user to
  specify a test protocol, or None to bypass it.)

- Bugfix: adding two labelled collections (with +) now checks for consistency in the classes.

- Newer versions of numpy raise a warning when accessing types (e.g., np.float). I have replaced all such
  instances with the plain Python type (e.g., float).

- New dependency "abstention" (to add to the project requirements and setup). Calibration methods from
  https://github.com/kundajelab/abstention added.

- The internal classifier of aggregative methods is now called "classifier" instead of "learner".

- When optimizing the hyperparameters of an aggregative quantifier, the classifier's specific hyperparameters
  should be marked with a "classifier__" prefix (just as with scikit-learn estimators), while the quantifier's
  specific hyperparameters are named directly. For example, the quantifier PCC(LogisticRegression()) has
  hyperparameters "classifier__C", "classifier__class_weight", etc., instead of "C" and "class_weight" as
  in v0.1.6.

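  For instance, a grid for PCC(LogisticRegression()) would look as follows (quantifier-specific hyperparameters,
  when present, would be named without the prefix):

    param_grid = {
        'classifier__C': [0.1, 1, 10],                    # routed to the LogisticRegression
        'classifier__class_weight': [None, 'balanced'],   # routed to the LogisticRegression
    }
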
- Hyperparameters yielding inconsistent runs raise a ValueError exception, while hyperparameter combinations
  yielding internal errors of surrogate functions are reported and skipped, without stopping the grid search.

- DistributionMatching methods added. This is a general framework for distribution-matching methods that caters
  for multiclass quantification. That is to say, one could get a multiclass variant of the (originally binary)
  HDy method aligned with Firat's formulation.

- Internal method properties "binary", "aggregative", and "probabilistic" have been removed; these conditions
  are now checked via isinstance.

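  For example (a sketch; the exact class names and module locations are assumptions):

    from quapy.method.base import BaseQuantifier, BinaryQuantifier
    from quapy.method.aggregative import AggregativeQuantifier

    def describe(quantifier: BaseQuantifier) -> str:
        # type checks replace the old "binary"/"aggregative" boolean properties
        kind = 'aggregative' if isinstance(quantifier, AggregativeQuantifier) else 'non-aggregative'
        arity = 'binary' if isinstance(quantifier, BinaryQuantifier) else 'multiclass'
        return f'{kind}, {arity}'
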
- Quantifiers (i.e., classes that inherit from BaseQuantifier) are not forced to implement classes_ or
  n_classes; these can still be used internally, but the framework will not suppose (nor impose) that a
  quantifier implements them.

- qp.evaluation.prediction has been optimized so that, if a quantifier is of type aggregative, and if the
  evaluation protocol is of type OnLabelledCollection, then the computation is faster. In this specific case,
  the predictions are issued only once, over the whole collection, and not separately for each sample. An
  exception to this (which is also implemented) is when the number of instances across all samples is smaller
  than the number of instances in the original labelled collection; in this case the heuristic is of no help,
  and is therefore not applied.

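  Schematically, the decision reduces to something like this (an illustrative sketch, not the actual QuaPy code):

    def preclassification_pays_off(n_instances_in_collection, sample_size, n_samples):
        # classify the whole collection once only if this implies fewer
        # classifications than classifying every generated sample separately
        return n_instances_in_collection < sample_size * n_samples
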
- The distinction between "classify" and "posterior_probabilities" has been removed in aggregative quantifiers,
  so that probabilistic classifiers return posterior probabilities, while non-probabilistic classifiers return
  crisp decisions.

- OneVsAll fixed. There are now two classes: a generic one, OneVsAllGeneric, that works with any quantifier
  (i.e., any instance of BaseQuantifier), and a subclass of it called OneVsAllAggregative, which implements the
  classify / aggregate interface. Both are instances of OneVsAll. There is a method getOneVsAll that returns
  the best instance based on the type of quantifier.

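  A possible usage sketch (the import location of getOneVsAll is an assumption):

    from sklearn.linear_model import LogisticRegression
    from quapy.method.aggregative import HDy, getOneVsAll   # location of getOneVsAll assumed

    # HDy is an inherently binary quantifier; getOneVsAll wraps it for multiclass data,
    # returning OneVsAllAggregative here because HDy implements classify/aggregate
    multiclass_hdy = getOneVsAll(HDy(LogisticRegression()))
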