Update README.md

Alejandro Moreo Fernandez 2020-06-15 18:50:39 +02:00 committed by GitHub
parent c77ac59331
commit f0e596c1ac
1 changed file with 8 additions and 10 deletions

@@ -11,10 +11,8 @@ The experiments have been run using the following packages (older versions might
* scikit-learn==0.22.2.post1
* scipy==1.4.1
-## Disclaimer:
-The dataset is not distributed in this version. We have asked the Editors of each document for permission to publish the corpus.
-We are waiting for some of these responses to arrive.
+## Dataset:
+The dataset can be downloaded from [http://hlt.isti.cnr.it/medlatin/](http://hlt.isti.cnr.it/medlatin/).
## Running the Experiments
The script in __./src/author_identification.py__ executes the experiments. This is the script syntax (--help):
@@ -42,7 +40,7 @@ optional arguments:
The following command line:
```
cd src
-python author_identification.py ../Corpora/CorpusI Dante --unknown ../Epistle/EpistolaXIII_1.txt
+python author_identification.py ../Corpora/MedLatin1 Dante --unknown ../Epistle/EpistolaXIII_1.txt
```
Will use all texts in ../Corpora/CorpusI as training documents to train a verificator for the
@@ -54,18 +52,18 @@ to the positive class.
Similarly, the command line:
```
cd src
-python author_identification.py ../Corpora/CorpusI ALL --loo
+python author_identification.py ../Corpora/MedLatin1 ALL --loo
```
will perform a cross-validation of the binary classifier for all authors using all training documents in a leave-one-out (LOO) fashion.
The script will report the results both in the standard output (more elaborated) and in a log file. For example, the last command will produce a log file containing:
```
-F1 for ClaraAssisiensis = 0.400
+F1 for ClaraAssisiensis = 0.571
F1 for Dante = 0.957
F1 for GiovanniBoccaccio = 1.000
-F1 for GuidoFaba = 0.974
+F1 for GuidoFaba = 0.980
F1 for PierDellaVigna = 0.993
-LOO Macro-F1 = 0.865
-LOO Micro-F1 = 0.981
+LOO Macro-F1 = 0.900
+LOO Micro-F1 = 0.985
```
(Note that small numerical variations with respect to the original papers might occur due to different software versions and as a result of any underlying stochastic process. In any case, those changes should not alter the conclusions derived from the published results.)
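As a sanity check on the log excerpt, the reported macro-F1 appears to be the unweighted mean of the per-author F1 scores, which is the usual definition of macro-averaging. The sketch below recomputes it from the most recent per-author values listed above (0.571 for ClaraAssisiensis, 0.980 for GuidoFaba). It is only an illustration: the `macro_f1` helper is hypothetical and is not taken from `author_identification.py`, and because the logged values are already rounded, a recomputed figure could in general differ in the last digit. The micro-F1 cannot be recovered this way, since it pools the individual decisions across authors rather than averaging per-author scores.

```python
# Per-author F1 values copied from the log excerpt (updated values).
f1_per_author = {
    "ClaraAssisiensis": 0.571,
    "Dante": 0.957,
    "GiovanniBoccaccio": 1.000,
    "GuidoFaba": 0.980,
    "PierDellaVigna": 0.993,
}


def macro_f1(scores):
    """Hypothetical helper: unweighted mean of the per-class F1 scores."""
    return sum(scores.values()) / len(scores)


# Prints "LOO Macro-F1 = 0.900", matching the value in the log.
print(f"LOO Macro-F1 = {macro_f1(f1_per_author):.3f}")
```

The same computation on the earlier values (0.400 and 0.974 in place of 0.571 and 0.980) gives 0.865, which is consistent with the earlier log.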