From 0fc7b09282f3f96fc25e57aba2ba536c0f8bc461 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Alejandro=20Moreo=20Fern=C3=A1ndez?=
Date: Wed, 1 Apr 2020 17:52:29 +0200
Subject: [PATCH] Update README.md

first commit
---
 README.md | 47 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 47 insertions(+)

diff --git a/README.md b/README.md
index e69de29..3468212 100755
--- a/README.md
+++ b/README.md
@@ -0,0 +1,47 @@
+# Authorship Verification for Medieval Latin
+
+Code to reproduce the experiments reported in the papers
+["The Epistle to Cangrande Through the Lens of Computational Authorship Verification"](https://link.springer.com/chapter/10.1007/978-3-030-30754-7_15)
+and
+["L’Epistola a Cangrande al vaglio della Computational Authorship Verification: Risultati preliminari (con una postilla sulla cosiddetta XIV Epistola di Dante Alighieri)"](https://www.academia.edu/42297516/L_Epistola_a_Cangrande_al_vaglio_della_Computational_Authorship_Verification_risultati_preliminari_con_una_postilla_sulla_cosiddetta_XIV_Epistola_di_Dante_Alighieri_in_Nuove_inchieste_sull_Epistola_a_Cangrande_a_c._di_A._Casadei_Pisa_Pisa_University_Press_pp._153-192)
+
+## Disclaimer:
+The dataset is not distributed in this version. We have asked the Editors for permission to publish the corpus.
+We are still waiting for some of their responses.
+
+## Running the Experiments
+The script __./src/author_identification.py__ runs the experiments. Its syntax, as printed by `--help`, is:
+
+```
+Authorship verification for Epistola XIII
+
+positional arguments:
+  PATH            Path to the directory containing the corpus (documents
+                  must be named _.txt)
+  positive        Positive author for the hypothesis (default "Dante"); set
+                  to "ALL" to check every author
+
+optional arguments:
+  -h, --help      show this help message and exit
+  --loo           submit each binary classifier to leave-one-out validation
+  --unknown PATH  path to the file of unknown paternity (default None)
+```
+
+The following command line:
+```
+cd src
+python author_identification.py ../Corpora/CorpusI Dante --unknown ../Epistle/EpistolaXIII_1.txt
+```
+
+will use all texts in ../Corpora/CorpusI as training documents to train a verifier for the
+file ../Epistle/EpistolaXIII_1.txt, assuming Dante is the positive class (i.e., it will check whether, on the
+basis of the evidence found in Dante's other texts, the unknown text was written by Dante or not).
+The output is probabilistic: it reports how uncertain the classifier is in attributing the document
+to the positive class.
+
+Similarly, the command line:
+```
+cd src
+python author_identification.py ../Corpora/CorpusI Dante --loo
+```
+will perform a leave-one-out (LOO) cross-validation of the binary classifier for Dante using all training documents.
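
The `--help` output documented in this README can be reproduced with a small `argparse` parser. The sketch below is reconstructed only from that help text and is not taken from `author_identification.py`: the function name `build_parser` and the destination name `corpuspath` are assumptions made for illustration, and the real script may define its arguments differently.

```python
import argparse


def build_parser():
    # Hypothetical reconstruction of the CLI described in the README's --help text.
    parser = argparse.ArgumentParser(
        description="Authorship verification for Epistola XIII")
    # Required positional argument, shown as PATH in the help output.
    parser.add_argument(
        "corpuspath", metavar="PATH", type=str,
        help="Path to the directory containing the corpus")
    # Optional positional argument: falls back to "Dante" when omitted.
    parser.add_argument(
        "positive", type=str, nargs="?", default="Dante",
        help='Positive author for the hypothesis (default "Dante"); '
             'set to "ALL" to check every author')
    # Boolean flag: present means leave-one-out validation is requested.
    parser.add_argument(
        "--loo", action="store_true",
        help="submit each binary classifier to leave-one-out validation")
    # Optional path to the document whose authorship is to be verified.
    parser.add_argument(
        "--unknown", metavar="PATH", type=str, default=None,
        help="path to the file of unknown paternity (default None)")
    return parser


# Parse the second example command from the README explicitly,
# instead of reading sys.argv.
args = build_parser().parse_args(["../Corpora/CorpusI", "Dante", "--loo"])
print(args.corpuspath, args.positive, args.loo, args.unknown)
```

Declaring `positive` with `nargs="?"` is one way to obtain a positional argument that still has a default, which matches the `(default "Dante")` note in the help text.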