QuaPy/TODO.txt


Packaging:
==========================================
Documentation with sphinx
Document methods with paper references
unit-tests
clean wiki_examples!

Refactor:
==========================================
Unify ThresholdOptimization methods as an extension of PACC (and not ACC). The fit methods are almost identical and
    use a probabilistic classifier (take into account that PACC uses pcc internally, whereas the threshold methods use
    cc instead). The fit methods of ACC and PACC have a block for computing the validation estimates that should be
    unified as well...
Refactor protocols. APP- and NPP-related functionality is duplicated in functional, LabelledCollection, and evaluation


New features:
==========================================
Add NAE, NRAE
Add "measures for evaluating ordinal"?
Add datasets for topic.
Do we want to cover cross-lingual quantification natively in QuaPy, or does it make more sense as an application on top?
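A minimal sketch of NAE and NRAE, assuming the usual definitions (absolute error and relative absolute error divided by their maximum attainable value for the given true prevalence). The function names and the eps smoothing parameter are illustrative, not part of the current QuaPy API.

```python
import numpy as np


def _smooth(prevs, eps):
    # Additive smoothing of a prevalence vector; eps=0 leaves it unchanged.
    prevs = np.asarray(prevs, dtype=float)
    return (prevs + eps) / (eps * prevs.size + 1.0)


def nae(true_prevs, estim_prevs):
    # Normalized absolute error: sum |p_hat - p| divided by its maximum,
    # 2 * (1 - min p), reached when all mass goes to the rarest class.
    p = np.asarray(true_prevs, dtype=float)
    p_hat = np.asarray(estim_prevs, dtype=float)
    return np.abs(p_hat - p).sum() / (2.0 * (1.0 - p.min()))


def nrae(true_prevs, estim_prevs, eps=0.0):
    # Normalized relative absolute error: sum |p_hat - p| / p divided by its
    # maximum, n_classes - 1 + (1 - min p) / min p.
    # Use eps > 0 when some true prevalence may be zero.
    p = _smooth(true_prevs, eps)
    p_hat = _smooth(estim_prevs, eps)
    p_min = p.min()
    rae_sum = (np.abs(p_hat - p) / p).sum()
    return rae_sum / (p.size - 1 + (1.0 - p_min) / p_min)
```

Both measures range in [0, 1] under these definitions, with 0 for a perfect estimate and 1 for the worst possible one.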

Current issues:
==========================================
Revise the class structure of quantification methods and the methods they inherit... There is some confusion regarding
    methods isbinary, isprobabilistic, and the like. The attribute "learner_" in aggregative quantifiers is also
    confusing, since there is a getter and a setter.
Remove the "deep" in get_params. There is no real compatibility with scikit-learn as of now.
SVMperf-based learners do not remove temp files in __del__?
In binary quantification (hp, kindle, imdb) we used F1 in the minority class (which in kindle and hp happens to be the
negative class). This is not covered in this new implementation, in which the binary case is not treated as such, but as
an instance of single-label with 2 labels. Check
Add automatic reindex of class labels in LabelledCollection (currently, class indexes should be ordered and with no gaps)
OVR I believe is currently tied to aggregative methods. We should provide a general interface also for general quantifiers
Currently, being "binary" only adds one checker; we should figure out how to impose the check to be automatically performed
Add random seed management to support replicability (see temp_seed in util.py).
GridSearchQ is not truly parallelized. It only parallelizes on the predictions.
In the context of a quantifier (e.g., QuaNet or CC), the parameters of the learner should be prefixed with "estimator__".
    In QuaNet this is resolved with a __check_params_colision, but this should be improved. It might be cumbersome to
    impose the "estimator__" prefix for, e.g., quantifiers like CC though... This should be changed everywhere...
QuaNet needs refactoring. The base quantifiers ACC and PACC receive val_data with instances already transformed. This
    issue is due to a bad design.
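A minimal sketch of the automatic reindexing of class labels mentioned above, assuming labels arrive as a 1-D array-like of arbitrary (possibly gapped or unordered) values. The helper name reindex_labels is hypothetical and not part of LabelledCollection.

```python
import numpy as np


def reindex_labels(labels):
    # np.unique returns the sorted distinct labels and, with
    # return_inverse=True, the position of each label in that sorted array,
    # i.e. contiguous indexes 0..n_classes-1 with no gaps.
    labels = np.asarray(labels)
    classes, indexes = np.unique(labels, return_inverse=True)
    return classes, indexes


# Usage: the original labels can be recovered as classes[indexes].
classes, indexes = reindex_labels([3, 7, 3, 10])
```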

Improvements:
==========================================
Explore the hyperparameter "number of bins" in HDy
Rename EMQ to SLD ?
Parallelize the kFCV in ACC and PACC?
Parallelize model selection trainings
We might want to think of (improving and) adding the class Tabular (it is defined and used on branch tweetsent). A more
    recent version is in the project ql4facct. This class is meant to generate latex tables from results (highlighting
    best results, computing statistical tests, colouring cells, producing rankings, producing averages, etc.). Trying
    to generate tables is typically a bad idea, but in this specific case we do have pretty good control of what an
    experiment looks like. (Do we want to abstract experimental results? This could be useful not only for tables but
    also for plots).
Add a proper logging system. Currently we use print.
It might be good to reduce the number of methods that have to be implemented for any new Quantifier. At the moment,
    there are many functions like get_params, set_params, and, especially, @property classes_, which are cumbersome to
    implement for quick experiments. A possible solution is to impose get_params and set_params only in cases in which
    the model extends some "ModelSelectable" interface. The classes_ should have a default implementation.
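A rough sketch of the last item, under the assumption that classes_ can be recorded by the base class at fit time. All class names here (ModelSelectable, SimpleQuantifier, TrainingPrevalence) are hypothetical and only illustrate the idea; this is not the current QuaPy class hierarchy.

```python
from abc import ABC, abstractmethod

import numpy as np


class ModelSelectable(ABC):
    # Hypothetical opt-in interface: only quantifiers that want to take part
    # in model selection need to implement these two methods.
    @abstractmethod
    def get_params(self): ...

    @abstractmethod
    def set_params(self, **parameters): ...


class SimpleQuantifier(ABC):
    # Hypothetical base class with a default classes_ implementation.
    _classes = None

    def fit(self, X, y):
        y = np.asarray(y)
        self._classes = np.unique(y)  # recorded once, for every subclass
        self._fit(X, y)
        return self

    @abstractmethod
    def _fit(self, X, y): ...

    @abstractmethod
    def quantify(self, X): ...

    @property
    def classes_(self):
        if self._classes is None:
            raise AttributeError("classes_ is only available after fit")
        return self._classes


class TrainingPrevalence(SimpleQuantifier):
    # Toy quantifier for quick experiments: always returns the training
    # prevalence. It implements neither get_params nor set_params.
    def _fit(self, X, y):
        counts = np.array([(y == c).sum() for c in self.classes_])
        self._prev = counts / counts.sum()

    def quantify(self, X):
        return self._prev


quantifier = TrainingPrevalence().fit([[0], [1], [2], [3]], [0, 1, 1, 1])
```

With this layout a quick experiment only has to write _fit and quantify. A quantifier that should be tunable would additionally inherit from ModelSelectable.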

Checks:
==========================================
How many times is the system of equations for ACC and PACC not solved? How many times is the solution clipped? Do the
    estimates always sum up to one?
Re-check how hyperparameters from the quantifier and hyperparameters from the classifier (in aggregative quantifiers)
    are handled. In scikit-learn the hyperparameters from a wrapper method are indicated directly whereas the hyperparams
    from the internal learner are prefixed with "estimator__". In QuaPy, combinations having to do with the classifier
    can be computed at the beginning, and then in an internal loop the hyperparams of the quantifier can be explored,
    passing fit_learner=False.
Re-check Ensembles. As of now, they are strongly tied to aggregative quantifiers.
Re-think the environment variables. Maybe add new ones (like, for example, parameters for the plots)
Do we want to wrap prevalences (currently simple np.ndarray) as a class? This might be convenient for some interfaces
    (e.g., for specifying artificial prevalences in samplings, for printing them -- currently supported through
    F.strprev(), etc.). This might however add some overhead, and prevent or complicate post-processing with numpy.
Would be nice to get a better integration with sklearn.
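For the first check in this section (ACC and PACC), a minimal sketch of how the three outcomes could be instrumented. It assumes the adjustment solves M p = q, where M[i, j] estimates P(predicted=i | true=j) and q is the classify-and-count estimate. The function name and the fallback to q when the system is singular are illustrative assumptions, not necessarily what QuaPy does.

```python
import numpy as np


def solve_adjusted_count(cond_matrix, cc_prevs):
    # Returns (prevalence estimate, solved flag, clipped flag).
    M = np.asarray(cond_matrix, dtype=float)
    q = np.asarray(cc_prevs, dtype=float)
    try:
        p = np.linalg.solve(M, q)
        solved = True
    except np.linalg.LinAlgError:
        # Singular system: fall back to the unadjusted estimate (assumption).
        p = q.copy()
        solved = False
    # The raw solution may fall outside [0, 1]; record whether it did.
    clipped = bool(np.any((p < 0) | (p > 1)))
    p = np.clip(p, 0, 1)
    total = p.sum()
    if total > 0:
        p = p / total  # renormalize so the estimate sums to one
    return p, solved, clipped


M = [[0.9, 0.2], [0.1, 0.8]]
p_ok, solved_ok, clipped_ok = solve_adjusted_count(M, [0.41, 0.59])
p_clip, solved_clip, clipped_clip = solve_adjusted_count(M, [0.05, 0.95])
p_sing, solved_sing, clipped_sing = solve_adjusted_count([[0.5, 0.5], [0.5, 0.5]], [0.4, 0.6])
```

Counting the solved and clipped flags over an experiment would answer the first two questions. The renormalization step is what guarantees the sum to one in this sketch.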