Alejandro Moreo Fernandez 2020-06-16 13:49:27 +02:00
commit b4209a15a0
1 changed files with 9 additions and 13 deletions

@@ -1,8 +1,6 @@
 # Authorship Verification for Medieval Latin
-Code to reproduce the experiments reported in the papers
-["The Epistle to Cangrande Through the Lens of Computational Authorship Verification"](https://link.springer.com/chapter/10.1007/978-3-030-30754-7_15)
-and
+Code to reproduce the experiments reported in the paper
 ["LEpistola a Cangrande al vaglio della Computational Authorship Verification: Risultati preliminari (con una postilla sulla cosiddetta XIV Epistola di Dante Alighieri)"](https://www.academia.edu/42297516/L_Epistola_a_Cangrande_al_vaglio_della_Computational_Authorship_Verification_risultati_preliminari_con_una_postilla_sulla_cosiddetta_XIV_Epistola_di_Dante_Alighieri_in_Nuove_inchieste_sull_Epistola_a_Cangrande_a_c._di_A._Casadei_Pisa_Pisa_University_Press_pp._153-192)
 ## Requirements:
@@ -13,10 +11,8 @@ The experiments have been run using the following packages (older versions might
 * scikit-learn==0.22.2.post1
 * scipy==1.4.1
-## Disclaimer:
-The dataset is not distributed in this version. We have asked the Editors for permission to publish the corpus.
-We are waiting for some of these responses to arrive.
+## Dataset:
+The dataset can be downloaded from [http://hlt.isti.cnr.it/medlatin/](http://hlt.isti.cnr.it/medlatin/).
 ## Running the Experiments
 The script in __./src/author_identification.py__ executes the experiments. This is the script syntax (--help):
@@ -44,7 +40,7 @@ optional arguments:
 The following command line:
 ```
 cd src
-python author_identification.py ../Corpora/CorpusI Dante --unknown ../Epistle/EpistolaXIII_1.txt
+python author_identification.py ../Corpora/MedLatin1 Dante --unknown ../Epistle/EpistolaXIII_1.txt
 ```
 Will use all texts in ../Corpora/CorpusI as training documents to train a verificator for the
@@ -56,18 +52,18 @@ to the positive class.
 Similarly, the command line:
 ```
 cd src
-python author_identification.py ../Corpora/CorpusI ALL --loo
+python author_identification.py ../Corpora/MedLatin1 ALL --loo
 ```
 will perform a cross-validation of the binary classifier for all authors using all training documents in a leave-one-out (LOO) fashion.
 The script will report the results both in the standard output (more elaborated) and in a log file. For example, the last command will produce a log file containing:
 ```
-F1 for ClaraAssisiensis = 0.400
+F1 for ClaraAssisiensis = 0.571
 F1 for Dante = 0.957
 F1 for GiovanniBoccaccio = 1.000
-F1 for GuidoFaba = 0.974
+F1 for GuidoFaba = 0.980
 F1 for PierDellaVigna = 0.993
-LOO Macro-F1 = 0.865
-LOO Micro-F1 = 0.981
+LOO Macro-F1 = 0.900
+LOO Micro-F1 = 0.985
 ```
 (Note that small numerical variations with respect to the original papers might occur due to different software versions and as a result from any stochastic underlying process. Those changes should anyway not alter the conclusions derived from the published results.)
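
As a quick sanity check on the log excerpt in this diff, the reported LOO Macro-F1 can be recomputed from the per-author F1 values. The sketch below assumes Macro-F1 is the unweighted mean of the per-author scores (the usual definition, and consistent with the logged numbers); the `macro_f1` helper is hypothetical and is not taken from `author_identification.py`. Micro-F1 cannot be recovered this way, because it is computed from pooled per-document counts that the log does not show.

```python
# Per-author F1 values copied from the log excerpt, before and after this commit.
f1_before = {
    "ClaraAssisiensis": 0.400,
    "Dante": 0.957,
    "GiovanniBoccaccio": 1.000,
    "GuidoFaba": 0.974,
    "PierDellaVigna": 0.993,
}
f1_after = {
    "ClaraAssisiensis": 0.571,
    "Dante": 0.957,
    "GiovanniBoccaccio": 1.000,
    "GuidoFaba": 0.980,
    "PierDellaVigna": 0.993,
}


def macro_f1(per_author_f1):
    """Unweighted mean of per-author F1 scores (hypothetical helper)."""
    return sum(per_author_f1.values()) / len(per_author_f1)


macro_before = macro_f1(f1_before)
macro_after = macro_f1(f1_after)

print(f"LOO Macro-F1 before = {macro_before:.3f}")  # 0.865
print(f"LOO Macro-F1 after  = {macro_after:.3f}")   # 0.900
```

Both recomputed values agree with the `LOO Macro-F1` lines on the old and new sides of the diff, which suggests the gain comes mainly from the ClaraAssisiensis score (0.400 to 0.571).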