# Generalized Funnelling (gFun)
## Requirements
```commandline
transformers==2.11.0
pandas==0.25.3
numpy==1.17.4
joblib==0.14.0
tqdm==4.50.2
pytorch_lightning==1.1.2
torch==1.3.1
nltk==3.4.5
scipy==1.3.3
rdflib==4.2.2
torchtext==0.4.0
scikit_learn==0.24.1
```
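Because the pins above are exact, it can help to check the active environment against them before running anything. The following is a minimal sketch, not part of gFun: the `PINS` dictionary simply copies the list above, and `find_mismatches` is a hypothetical helper name. It relies on the standard-library `importlib.metadata` (Python 3.8+); whether names written with underscores (e.g. `scikit_learn`) resolve may depend on the Python version, so hyphenated names may be needed.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the Requirements list above.
PINS = {
    "transformers": "2.11.0",
    "pandas": "0.25.3",
    "numpy": "1.17.4",
    "joblib": "0.14.0",
    "tqdm": "4.50.2",
    "pytorch_lightning": "1.1.2",
    "torch": "1.3.1",
    "nltk": "3.4.5",
    "scipy": "1.3.3",
    "rdflib": "4.2.2",
    "torchtext": "0.4.0",
    "scikit_learn": "0.24.1",
}


def find_mismatches(pins, get_version=version):
    """Return {package: (pinned, installed_or_None)} for every pin not met."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            installed = get_version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, installed) in find_mismatches(PINS).items():
        print(f"{name}: pinned {wanted}, installed {installed or 'missing'}")
```

An empty output means every pinned version matches the installed one.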
## Usage
```commandline
usage: main.py [-h] [-o CSV_DIR] [-x] [-w] [-m] [-b] [-g] [-c] [-n NEPOCHS]
               [-j N_JOBS] [--muse_dir MUSE_DIR] [--gru_wce]
               [--gru_dir GRU_DIR] [--bert_dir BERT_DIR] [--gpus GPUS]
               dataset

Run generalized funnelling, A. Moreo, A. Pedrotti and F. Sebastiani (2020).

positional arguments:
  dataset               Path to the dataset

optional arguments:
  -h, --help            show this help message and exit
  -o, --output          result file (default ../csv_logs/gfun/gfun_results.csv)
  -x, --post_embedder   deploy posterior probabilities embedder to compute document embeddings
  -w, --wce_embedder    deploy (supervised) Word-Class embedder to compute document embeddings
  -m, --muse_embedder   deploy (pretrained) MUSE embedder to compute document embeddings
  -b, --bert_embedder   deploy multilingual BERT to compute document embeddings
  -g, --gru_embedder    deploy a GRU to compute document embeddings
  -c, --c_optimize      optimize the C hyperparameter of the SVMs
  -j, --n_jobs          number of parallel jobs (default -1, i.e., all)
  --nepochs_rnn         maximum number of epochs to train the Recurrent embedder (i.e., -g), default 150
  --nepochs_bert        maximum number of epochs to train the BERT model (i.e., -b), default 10
  --patience_rnn        set early stop patience for the RecurrentGen, default 25
  --patience_bert       set early stop patience for the BertGen, default 5
  --batch_rnn           set batch size for the RecurrentGen, default 64
  --batch_bert          set batch size for the BertGen, default 4
  --muse_dir            path to the MUSE polylingual word embeddings (default ../embeddings)
  --gru_wce             deploy WCE embedding as embedding layer of the GRU View Generator
  --rnn_dir             set the path to a pretrained RNN model (i.e., -g view generator)
  --bert_dir            set the path to a pretrained mBERT model (i.e., -b view generator)
  --gpus                specifies how many GPUs to use per node
```
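As an illustration of how the options above combine, here is a minimal sketch that assembles a command line for `main.py`. The helper `build_gfun_command` and the path `path/to/dataset` are hypothetical and only serve the example; the flags themselves (`-x`, `-m`, `-w`, `--muse_dir`, `-j`, `--gpus`) are taken from the help text above. The sketch assumes a `python` executable on the PATH and that it is run from the directory containing `main.py`.

```python
import shlex


def build_gfun_command(dataset, view_flags=("-x", "-m"), muse_dir=None,
                       n_jobs=None, gpus=None):
    """Assemble an argv list for main.py from the documented options."""
    # The dataset path is the only positional argument; view generator
    # switches such as -x, -w, -m, -b, -g are plain on/off flags.
    cmd = ["python", "main.py", dataset, *view_flags]
    if muse_dir is not None:
        cmd += ["--muse_dir", muse_dir]
    if n_jobs is not None:
        cmd += ["-j", str(n_jobs)]
    if gpus is not None:
        cmd += ["--gpus", str(gpus)]
    return cmd


if __name__ == "__main__":
    # Hypothetical run: posterior, MUSE and WCE view generators together.
    cmd = build_gfun_command(
        "path/to/dataset",
        view_flags=("-x", "-m", "-w"),
        muse_dir="../embeddings",
        n_jobs=4,
    )
    print(shlex.join(cmd))
```

The printed string can be pasted into a shell, or the list can be passed to `subprocess.run(cmd, check=True)` to launch the run from Python.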