andrea a4f74dcf41 fixed pl early stop --> patience was consumed if actual_monitor == best_monitor. Set policy to greater or equal. 2021-02-11 18:37:34 +01:00
src fixed pl early stop --> patience was consumed if actual_monitor == best_monitor. Set policy to greater or equal. 2021-02-11 18:37:34 +01:00
.gitignore gitignore 2020-07-27 12:20:59 +02:00
main.py fixed pl early stop --> patience was consumed if actual_monitor == best_monitor. Set policy to greater or equal. 2021-02-11 18:37:34 +01:00
readme.md Set arguments in order to reproduce 'master' performances with Neural setting 2021-01-29 18:20:29 +01:00
requirements.txt fixed imports 2021-01-26 15:52:09 +01:00
run.sh fixed view generators' transform method 2021-01-28 18:12:20 +01:00

readme.md

Generalized Funnelling (gFun)

Requirements

transformers==2.11.0
pandas==0.25.3
numpy==1.17.4
joblib==0.14.0
tqdm==4.50.2
pytorch_lightning==1.1.2
torch==1.3.1
nltk==3.4.5
scipy==1.3.3
rdflib==4.2.2
torchtext==0.4.0
scikit_learn==0.24.1

Usage

usage: main.py [-h] [-o CSV_DIR] [-x] [-w] [-m] [-b] [-g] [-c] [-n NEPOCHS]
               [-j N_JOBS] [--muse_dir MUSE_DIR] [--gru_wce]
               [--gru_dir GRU_DIR] [--bert_dir BERT_DIR] [--gpus GPUS]
               dataset

Run generalized funnelling, A. Moreo, A. Pedrotti and F. Sebastiani (2020).

positional arguments:
  dataset               Path to the dataset

optional arguments:
  -h, --help            show this help message and exit
  -o, --output          result file (default ../csv_logs/gfun/gfun_results.csv)
  -x, --post_embedder   deploy posterior probabilities embedder to compute document embeddings
  -w, --wce_embedder    deploy (supervised) Word-Class embedder to compute document embeddings
  -m, --muse_embedder   deploy (pretrained) MUSE embedder to compute document embeddings
  -b, --bert_embedder   deploy multilingual Bert to compute document embeddings
  -g, --gru_embedder    deploy a GRU in order to compute document embeddings
  -c, --c_optimize      optimize the C hyperparameter of the SVMs
  -j, --n_jobs          number of parallel jobs, default is -1 (i.e., all available cores)
  --nepochs_rnn         maximum number of epochs to train the Recurrent embedder (i.e., -g), default 150
  --nepochs_bert        maximum number of epochs to train the Bert model (i.e., -b), default 10
  --patience_rnn        set early-stopping patience for the RecurrentGen, default 25
  --patience_bert       set early-stopping patience for the BertGen, default 5
  --batch_rnn           set batch size for the RecurrentGen, default 64
  --batch_bert          set batch size for the BertGen, default 4
  --muse_dir            path to the MUSE polylingual word embeddings (default ../embeddings)
  --gru_wce             deploy WCE embedding as embedding layer of the GRU View Generator
  --rnn_dir             set the path to a pretrained RNN model (i.e., -g view generator)
  --bert_dir            set the path to a pretrained mBERT model (i.e., -b view generator)
  --gpus                specifies how many GPUs to use per node
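
To illustrate how the options above combine, here is a minimal sketch of an `argparse` parser that mirrors the option list. It is a hypothetical reconstruction written for this README, not the actual contents of `main.py`: the function name `build_parser`, the argument types, and the `None` defaults for `--rnn_dir`, `--bert_dir` and `--gpus` are assumptions. Where the usage synopsis and the option list disagree, the sketch follows the option list.

```python
import argparse


def build_parser():
    """Hypothetical reconstruction of the documented CLI (not the project's main.py)."""
    p = argparse.ArgumentParser(description="Run generalized funnelling (sketch).")
    p.add_argument("dataset", help="Path to the dataset")
    p.add_argument("-o", "--output",
                   default="../csv_logs/gfun/gfun_results.csv", help="result file")
    # One boolean switch per view generator; several can be enabled at once.
    for short, name in [("-x", "post_embedder"), ("-w", "wce_embedder"),
                        ("-m", "muse_embedder"), ("-b", "bert_embedder"),
                        ("-g", "gru_embedder")]:
        p.add_argument(short, "--" + name, action="store_true")
    p.add_argument("-c", "--c_optimize", action="store_true")
    p.add_argument("-j", "--n_jobs", type=int, default=-1)
    # Training budgets, with the defaults stated in the option list.
    p.add_argument("--nepochs_rnn", type=int, default=150)
    p.add_argument("--nepochs_bert", type=int, default=10)
    p.add_argument("--patience_rnn", type=int, default=25)
    p.add_argument("--patience_bert", type=int, default=5)
    p.add_argument("--batch_rnn", type=int, default=64)
    p.add_argument("--batch_bert", type=int, default=4)
    p.add_argument("--muse_dir", default="../embeddings")
    p.add_argument("--gru_wce", action="store_true")
    # Assumed defaults: the README does not state them.
    p.add_argument("--rnn_dir", default=None)
    p.add_argument("--bert_dir", default=None)
    p.add_argument("--gpus", default=None)
    return p


# Mirrors the command line:
#   python main.py path/to/dataset -x -m -g --gru_wce --patience_rnn 10
args = build_parser().parse_args(
    ["path/to/dataset", "-x", "-m", "-g", "--gru_wce", "--patience_rnn", "10"])
print(args.post_embedder, args.muse_embedder, args.gru_embedder, args.patience_rnn)
```

In this example the posterior-probabilities, MUSE and GRU view generators are enabled together, the GRU uses WCE as its embedding layer, and only the RNN patience is overridden; every other option keeps its documented default.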