# Generalized Funnelling (gFun)

## Requirements

```commandline
transformers==2.11.0
pandas==0.25.3
numpy==1.17.4
joblib==0.14.0
tqdm==4.50.2
pytorch_lightning==1.1.2
torch==1.3.1
nltk==3.4.5
scipy==1.3.3
rdflib==4.2.2
torchtext==0.4.0
scikit_learn==0.24.1
```
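
The pinned versions above are fairly old, so it can help to check an existing environment against them before running anything. The following is a minimal sketch, not part of gFun itself: the helper names `parse_pins` and `check_pins` are hypothetical, and the lookup relies on the standard-library `importlib.metadata` (Python 3.8+), whose handling of `_` versus `-` in distribution names may vary between Python versions.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied verbatim from the requirements list above.
PINNED = """\
transformers==2.11.0
pandas==0.25.3
numpy==1.17.4
joblib==0.14.0
tqdm==4.50.2
pytorch_lightning==1.1.2
torch==1.3.1
nltk==3.4.5
scipy==1.3.3
rdflib==4.2.2
torchtext==0.4.0
scikit_learn==0.24.1
"""


def parse_pins(text):
    """Turn 'name==version' lines into a {name: version} dict."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, pinned = line.partition("==")
        pins[name.strip()] = pinned.strip()
    return pins


def check_pins(pins):
    """Return (name, pinned, installed) for each package whose installed
    version differs from the pin; installed is None if it is missing."""
    mismatches = []
    for name, pinned in pins.items():
        try:
            # Names are looked up as written in the pin list (assumption:
            # the installed metadata resolves under that spelling).
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches.append((name, pinned, installed))
    return mismatches


if __name__ == "__main__":
    for name, pinned, installed in check_pins(parse_pins(PINNED)):
        print(f"{name}: pinned {pinned}, found {installed or 'not installed'}")
```

Running the script prints one line per package that is missing or installed at a different version, and prints nothing when the environment matches the pins.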

## Usage

```commandline
usage: main.py [-h] [-o CSV_DIR] [-x] [-w] [-m] [-b] [-g] [-c] [-n NEPOCHS]
               [-j N_JOBS] [--muse_dir MUSE_DIR] [--gru_wce]
               [--gru_dir GRU_DIR] [--bert_dir BERT_DIR] [--gpus GPUS]
               dataset

Run generalized funnelling, A. Moreo, A. Pedrotti and F. Sebastiani (2020).

positional arguments:
  dataset               Path to the dataset

optional arguments:
  -h, --help            show this help message and exit
  -o, --output          result file (default ../csv_logs/gfun/gfun_results.csv)
  -x, --post_embedder   deploy posterior probabilities embedder to compute document embeddings
  -w, --wce_embedder    deploy (supervised) Word-Class embedder to compute document embeddings
  -m, --muse_embedder   deploy (pretrained) MUSE embedder to compute document embeddings
  -b, --bert_embedder   deploy multilingual Bert to compute document embeddings
  -g, --gru_embedder    deploy a GRU in order to compute document embeddings
  -c, --c_optimize      optimize SVMs C hyperparameter
  -j, --n_jobs          number of parallel jobs, default is -1, i.e., all
  --nepochs_rnn         number of max epochs to train Recurrent embedder (i.e., -g), default 150
  --nepochs_bert        number of max epochs to train Bert model (i.e., -b), default 10
  --patience_rnn        set early stop patience for the RecurrentGen, default 25
  --patience_bert       set early stop patience for the BertGen, default 5
  --batch_rnn           set batchsize for the RecurrentGen, default 64
  --batch_bert          set batchsize for the BertGen, default 4
  --muse_dir            path to the MUSE polylingual word embeddings (default ../embeddings)
  --gru_wce             deploy WCE embedding as embedding layer of the GRU View Generator
  --rnn_dir             set the path to a pretrained RNN model (i.e., -g view generator)
  --bert_dir            set the path to a pretrained mBERT model (i.e., -b view generator)
  --gpus                specifies how many GPUs to use per node
```
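
As an illustration of how the flags above combine, the sketch below assembles one possible invocation that enables the posterior-probabilities and MUSE view generators and turns on C optimization. The dataset path is a placeholder (the expected dataset format is not described here), and the choice of flags and the value `4` for `-j` are only examples.

```python
import shlex

# Placeholder path (assumption): replace with the location of your dataset.
dataset = "path/to/dataset"

cmd = [
    "python", "main.py", dataset,
    "-x",                          # posterior probabilities embedder
    "-m",                          # pretrained MUSE embedder
    "--muse_dir", "../embeddings", # documented default location of MUSE vectors
    "-c",                          # optimize the SVM C hyperparameter
    "-j", "4",                     # number of parallel jobs (example value)
]

# Build the shell-ready command string and show it.
command_line = shlex.join(cmd)
print(command_line)

# To actually launch the run from Python, one could use:
# import subprocess; subprocess.run(cmd, check=True)
```

Adding `-w`, `-b`, or `-g` to the list would enable the corresponding view generators in the same run, as described in the help text.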