175 lines
11 KiB
Plaintext
175 lines
11 KiB
Plaintext
Change Log 0.1.9
|
|
----------------
|
|
- [TODO] add LeQua2024 and normalized match distance to qp.error
|
|
- [TODO] add Friedman's method and DeBias
|
|
|
|
- Added a default classifier for aggregative quantifiers, which now can be instantiated without specifying
|
|
the classifier. The default classifier can be accessed in qp.environ['DEFAULT_CLS'] and is assigned to
|
|
sklearn.linear_model.LogisticRegression(max_iter=3000). If the classifier is not specified, then a clone
|
|
of said classifier is returned. E.g.:
|
|
> pacc = PACC()
|
|
is equivalent to:
|
|
> pacc = PACC(classifier=LogisticRegression(max_iter=3000))
|
|
|
|
- Improved error loging in model selection. In v0.1.8 only Status.INVALID was reported; in v0.1.9 it is
|
|
now accompanied by a textual description of the error
|
|
|
|
- The number of parallel workers can now be set via an environment variable by running, e.g.:
|
|
> N_JOBS=10 python3 your_script.py
|
|
which has the same effect as writing the following code at the beginning of your_script.py:
|
|
> import quapy as qp
|
|
> qp.environ["N_JOBS"] = 10
|
|
|
|
- Some examples have been added to the ./examples/ dir, which now contains numbered examples from basics (0)
|
|
to advanced topics (higher numbers)
|
|
|
|
- Moved the wiki documents to the ./docs/ folder so that they become editable via PR for the community
|
|
|
|
- Added Composable methods from Mirko Bunse's qunfold library! (thanks to Mirko Bunse!)
|
|
|
|
- Added Continuous Integration with GitHub Actions (thanks to Mirko Bunse!)
|
|
|
|
- Added Bayesian CC method (thanks to Pawel Czyz!). The method is described in detail in the paper
|
|
Ziegler, Albert, and Paweł Czyż. "Bayesian Quantification with Black-Box Estimators."
|
|
arXiv preprint arXiv:2302.09159 (2023).
|
|
|
|
- Removed binary UCI datasets {acute.a, acute.b, balance.2} from the list qp.data.datasets.UCI_BINARY_DATASETS
|
|
(the datasets are still loadable from the fetch_UCIBinaryLabelledCollection and fetch_UCIBinaryDataset
|
|
functions, though). The reason is that these datasets tend to yield results (for all methods) that are
|
|
one or two orders of magnitude greater than for other datasets, and this has a disproportionate impact in
|
|
methods average (I suspect there is something wrong in those datasets).
|
|
|
|
|
|
Change Log 0.1.8
|
|
----------------
|
|
|
|
- Added Kernel Density Estimation methods (KDEyML, KDEyCS, KDEyHD) as proposed in the paper:
|
|
Moreo, A., González, P., & del Coz, J. J. Kernel Density Estimation for Multiclass Quantification.
|
|
arXiv preprint arXiv:2401.00490, 2024
|
|
|
|
- Substantial internal refactor: aggregative methods now inherit a pattern by which the fit method consists of:
|
|
a) fitting the classifier and returning the representations of the training instances (typically the posterior
|
|
probabilities, the label predictions, or the classifier scores, and typically obtained through kFCV).
|
|
b) fitting an aggregation function
|
|
The function implemented in step a) is inherited from the super class. Each new aggregative method now has to
|
|
implement only the "aggregative_fit" of step b).
|
|
This pattern was already implemented for the prediction (thus allowing evaluation functions to be performed
|
|
very quicky), and is now available also for training. The main benefit is that model selection now can nestle
|
|
the training of quantifiers in two levels: one for the classifier, and another for the aggregation function.
|
|
As a result, a method with a param grid of 10 combinations for the classifier and 10 combinations for the
|
|
quantifier, now implies 10 trainings of the classifier + 10*10 trainings of the aggregation function (this is
|
|
typically much faster than the classifier training), whereas in versions <0.1.8 this amounted to training
|
|
10*10 (classifiers+aggregations).
|
|
|
|
- Added different solvers for ACC and PACC quantifiers. In quapy < 0.1.8 these quantifiers try to solve the system
|
|
of equations Ax=B exactly (by means of np.linalg.solve). As noted by Mirko Bunse (thanks!), such an exact solution
|
|
does sometimes not exist. In cases like this, quapy < 0.1.8 resorted to CC for providing a plausible solution.
|
|
ACC and PACC now resorts to an approximated solution in such cases (minimizing the L2-norm of the difference
|
|
between Ax-B) as proposed by Mirko Bunse. A quick experiment reveals this heuristic greatly improves the results
|
|
of ACC and PACC in T2A@LeQua.
|
|
|
|
- Fixed ThresholdOptimization methods (X, T50, MAX, MS and MS2). Thanks to Tobias Schumacher and colleagues for pointing
|
|
this out in Appendix A of "Schumacher, T., Strohmaier, M., & Lemmerich, F. (2021). A comparative evaluation of
|
|
quantification methods. arXiv:2103.03223v3 [cs.LG]"
|
|
|
|
- Added HDx and DistributionMatchingX to non-aggregative quantifiers (see also the new example "comparing_HDy_HDx.py")
|
|
|
|
- New UCI multiclass datasets added (thanks to Pablo González). The 5 UCI multiclass datasets are those corresponding
|
|
to the following criteria:
|
|
- >1000 instances
|
|
- >2 classes
|
|
- classification datasets
|
|
- Python API available
|
|
|
|
- New IFCB (plankton) dataset added (thanks to Pablo González). See qp.datasets.fetch_IFCB.
|
|
|
|
- Added new evaluation measures NAE, NRAE (thanks to Andrea Esuli)
|
|
|
|
- Added new meta method "MedianEstimator"; an ensemble of binary base quantifiers that receives as input a dictionary
|
|
of hyperparameters that will explore exhaustively, fitting and generating predictions for each combination of
|
|
hyperparameters, and that returns, as the prevalence estimates, the median across all predictions.
|
|
|
|
- Added "custom_protocol.py" example.
|
|
|
|
- New API documentation template.
|
|
|
|
Change Log 0.1.7
|
|
----------------
|
|
|
|
- Protocols are now abstracted as instances of AbstractProtocol. There is a new class extending AbstractProtocol called
|
|
AbstractStochasticSeededProtocol, which implements a seeding policy to allow replicate the series of samplings.
|
|
There are some examples of protocols, APP, NPP, UPP, DomainMixer (experimental).
|
|
The idea is to start the sample generation by simply calling the __call__ method.
|
|
This change has a great impact in the framework, since many functions in qp.evaluation, qp.model_selection,
|
|
and sampling functions in LabelledCollection relied of the old functions. E.g., the functionality of
|
|
qp.evaluation.artificial_prevalence_report or qp.evaluation.natural_prevalence_report is now obtained by means of
|
|
qp.evaluation.report which takes a protocol as an argument. I have not maintained compatibility with the old
|
|
interfaces because I did not really like them. Check the wiki guide and the examples for more details.
|
|
|
|
- Exploration of hyperparameters in Model selection can now be run in parallel (there was a n_jobs argument in
|
|
QuaPy 0.1.6 but only the evaluation part for one specific hyperparameter was run in parallel).
|
|
|
|
- The prediction function has been refactored, so it applies the optimization for aggregative quantifiers (that
|
|
consists in pre-classifying all instances, and then only invoking aggregate on the samples) only in cases in
|
|
which the total number of classifications would be smaller than the number of classifications with the standard
|
|
procedure. The user can now specify "force", "auto", True of False, in order to actively decide for applying it
|
|
or not.
|
|
|
|
- examples directory created!
|
|
|
|
- DyS, Topsoe distance and binary search (thanks to Pablo González)
|
|
|
|
- Multi-thread reproducibility via seeding (thanks to Pablo González)
|
|
|
|
- n_jobs is now taken from the environment if set to None
|
|
|
|
- ACC, PACC, Forman's threshold variants have been parallelized.
|
|
|
|
- cross_val_predict (for quantification) added to model_selection: would be nice to allow the user specifies a
|
|
test protocol maybe, or None for bypassing it?
|
|
|
|
- Bugfix: adding two labelled collections (with +) now checks for consistency in the classes
|
|
|
|
- newer versions of numpy raise a warning when accessing types (e.g., np.float). I have replaced all such instances
|
|
with the plain python type (e.g., float).
|
|
|
|
- new dependency "abstention" (to add to the project requirements and setup). Calibration methods from
|
|
https://github.com/kundajelab/abstention added.
|
|
|
|
- the internal classifier of aggregative methods is now called "classifier" instead of "learner"
|
|
|
|
- when optimizing the hyperparameters of an aggregative quantifier, the classifier's specific hyperparameters
|
|
should be marked with a "classifier__" prefix (just like in scikit-learn with estimators), while the quantifier's
|
|
specific hyperparameters are named directly. For example, PCC(LogisticRegression()) quantifier has hyperparameters
|
|
"classifier__C", "classifier__class_weight", etc., instead of "C" and "class_weight" as in v0.1.6.
|
|
|
|
- hyperparameters yielding to inconsistent runs raise a ValueError exception, while hyperparameter combinations
|
|
yielding to internal errors of surrogate functions are reported and skipped, without stopping the grid search.
|
|
|
|
- DistributionMatching methods added. This is a general framework for distribution matching methods that catters for
|
|
multiclass quantification. That is to say, one could get a multiclass variant of the (originally binary) HDy
|
|
method aligned with the Firat's formulation.
|
|
|
|
- internal method properties "binary", "aggregative", and "probabilistic" have been removed; these conditions are
|
|
checked via isinstance
|
|
|
|
- quantifiers (i.e., classes that inherit from BaseQuantifier) are not forced to implement classes_ or n_classes;
|
|
these can be used anyway internally, but the framework will not suppose (nor impose) that a quantifier implements
|
|
them
|
|
|
|
- qp.evaluation.prediction has been optimized so that, if a quantifier is of type aggregative, and if the evaluation
|
|
protocol is of type OnLabelledCollection, then the computation is faster. In this specific case, the predictions
|
|
are issued only once and for all, and not for each sample. An exception to this (which is implement also), is
|
|
when the number of instances across all samples is anyway smaller than the number of instances in the original
|
|
labelled collection; in this case the heuristic is of no help, and is therefore not applied.
|
|
|
|
- the distinction between "classify" and "posterior_probabilities" has been removed in Aggregative quantifiers,
|
|
so that probabilistic classifiers return posterior probabilities, while non-probabilistic quantifiers
|
|
return crisp decisions.
|
|
|
|
- OneVsAll fixed. There are now two classes: a generic one OneVsAllGeneric that works with any quantifier (e.g.,
|
|
any instance of BaseQuantifier), and a subclass of it called OneVsAllAggregative which implements the
|
|
classify / aggregate interface. Both are instances of OneVsAll. There is a method getOneVsAll that returns the
|
|
best instance based on the type of quantifier.
|
|
|