Merge branch 'master' of https://github.com/AlexMoreo/dante-verification
commit b4209a15a0
README.md | 22
@@ -1,8 +1,6 @@
 # Authorship Verification for Medieval Latin

-Code to reproduce the experiments reported in the papers
-["The Epistle to Cangrande Through the Lens of Computational Authorship Verification"](https://link.springer.com/chapter/10.1007/978-3-030-30754-7_15)
-and
+Code to reproduce the experiments reported in the paper
 ["L’Epistola a Cangrande al vaglio della Computational Authorship Verification: Risultati preliminari (con una postilla sulla cosiddetta XIV Epistola di Dante Alighieri)"](https://www.academia.edu/42297516/L_Epistola_a_Cangrande_al_vaglio_della_Computational_Authorship_Verification_risultati_preliminari_con_una_postilla_sulla_cosiddetta_XIV_Epistola_di_Dante_Alighieri_in_Nuove_inchieste_sull_Epistola_a_Cangrande_a_c._di_A._Casadei_Pisa_Pisa_University_Press_pp._153-192)

 ## Requirements:
@@ -13,10 +11,8 @@ The experiments have been run using the following packages (older versions might
 * scikit-learn==0.22.2.post1
 * scipy==1.4.1

-
-## Disclaimer:
-The dataset is not distributed in this version. We have asked the Editors for permission to publish the corpus.
-We are waiting for some of these responses to arrive.
+## Dataset:
+The dataset can be downloaded from [http://hlt.isti.cnr.it/medlatin/](http://hlt.isti.cnr.it/medlatin/).

 ## Running the Experiments
 The script in __./src/author_identification.py__ executes the experiments. This is the script syntax (--help):
@@ -44,7 +40,7 @@ optional arguments:
 The following command line:
 ```
 cd src
-python author_identification.py ../Corpora/CorpusI Dante --unknown ../Epistle/EpistolaXIII_1.txt
+python author_identification.py ../Corpora/MedLatin1 Dante --unknown ../Epistle/EpistolaXIII_1.txt
 ```

 Will use all texts in ../Corpora/CorpusI as training documents to train a verificator for the
@@ -56,18 +52,18 @@ to the positive class.
 Similarly, the command line:
 ```
 cd src
-python author_identification.py ../Corpora/CorpusI ALL --loo
+python author_identification.py ../Corpora/MedLatin1 ALL --loo
 ```
 will perform a cross-validation of the binary classifier for all authors using all training documents in a leave-one-out (LOO) fashion.

 The script will report the results both in the standard output (more elaborated) and in a log file. For example, the last command will produce a log file containing:
 ```
-F1 for ClaraAssisiensis = 0.400
+F1 for ClaraAssisiensis = 0.571
 F1 for Dante = 0.957
 F1 for GiovanniBoccaccio = 1.000
-F1 for GuidoFaba = 0.974
+F1 for GuidoFaba = 0.980
 F1 for PierDellaVigna = 0.993
-LOO Macro-F1 = 0.865
-LOO Micro-F1 = 0.981
+LOO Macro-F1 = 0.900
+LOO Micro-F1 = 0.985
 ```
 (Note that small numerical variations with respect to the original papers might occur due to different software versions and as a result from any stochastic underlying process. Those changes should anyway not alter the conclusions derived from the published results.)
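
As a quick sanity check on the log excerpt in the last hunk: both reported LOO Macro-F1 values agree with the unweighted mean of the five per-author F1 scores, before and after the change. The sketch below assumes the excerpt lists every author covered by the run; the variable and function names are illustrative and do not come from the repository. Micro-F1 cannot be recomputed this way, because it depends on pooled true-positive, false-positive, and false-negative counts that the log does not show.

```python
from statistics import mean

# Per-author F1 values copied from the log excerpt in the diff.
# "removed" holds the values on the deleted lines, "added" the values on the new lines.
f1_removed = {
    "ClaraAssisiensis": 0.400,
    "Dante": 0.957,
    "GiovanniBoccaccio": 1.000,
    "GuidoFaba": 0.974,
    "PierDellaVigna": 0.993,
}
f1_added = {
    "ClaraAssisiensis": 0.571,
    "Dante": 0.957,
    "GiovanniBoccaccio": 1.000,
    "GuidoFaba": 0.980,
    "PierDellaVigna": 0.993,
}


def macro_f1(per_author_f1):
    """Macro-F1 as the unweighted mean of the per-author F1 scores."""
    return mean(per_author_f1.values())


macro_removed = macro_f1(f1_removed)
macro_added = macro_f1(f1_added)

print(f"Macro-F1 (removed lines) = {macro_removed:.3f}")  # 0.865, as in the old log
print(f"Macro-F1 (added lines)   = {macro_added:.3f}")    # 0.900, as in the new log
```

Because every author counts equally in the mean, the rise from 0.400 to 0.571 for ClaraAssisiensis accounts for most of the Macro-F1 increase.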
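
For readers unfamiliar with the leave-one-out protocol that the README text attaches to `--loo`: each document is held out once, a binary "is this the target author?" classifier is trained on the remaining documents, and the held-out predictions are pooled to compute F1. What follows is a minimal, dependency-free sketch of that protocol only. The one-dimensional toy features, the nearest-centroid rule, and all names are assumptions made for illustration; they are not the features or the classifier used by `author_identification.py`.

```python
from statistics import mean


def nearest_centroid_predict(train, x):
    """Return 1 if x is closer to the positive-class mean than to the negative-class mean."""
    pos_centroid = mean(v for v, label in train if label == 1)
    neg_centroid = mean(v for v, label in train if label == 0)
    return 1 if abs(x - pos_centroid) < abs(x - neg_centroid) else 0


def loo_f1(docs):
    """Leave-one-out F1 for the positive class over (feature, label) pairs."""
    tp = fp = fn = 0
    for i, (x, label) in enumerate(docs):
        train = docs[:i] + docs[i + 1:]  # every document except the held-out one
        pred = nearest_centroid_predict(train, x)
        if pred == 1 and label == 1:
            tp += 1
        elif pred == 1 and label == 0:
            fp += 1
        elif pred == 0 and label == 1:
            fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical data: label 1 = written by the target author, 0 = by anyone else.
toy_docs = [
    (0.9, 1), (1.1, 1), (1.0, 1), (0.55, 1),
    (0.1, 0), (0.2, 0), (0.0, 0), (0.3, 0),
]

print(f"LOO F1 = {loo_f1(toy_docs):.3f}")
```

On this toy data the borderline positive document (0.55) is misclassified when held out, so the pooled counts are 3 true positives, 0 false positives, and 1 false negative.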