Code to reproduce the experiments of "The Epistle to Cangrande Through the Lens of Computational Authorship Verification"
Alejandro Moreo Fernandez a3893c77fe parallelization of feature extractor 2020-04-02 11:54:00 +02:00

Authorship Verification for Medieval Latin

Code to reproduce the experiments reported in the papers “The Epistle to Cangrande Through the Lens of Computational Authorship Verification” and “L'Epistola a Cangrande al vaglio della Computational Authorship Verification: Risultati preliminari (con una postilla sulla cosiddetta XIV Epistola di Dante Alighieri)”.

Disclaimer:

The dataset is not distributed in this version. We have asked the Editors for permission to publish the corpus and are still waiting for some of their responses.

Running the Experiments

The script ./src/author_identification.py runs the experiments. Its command-line syntax, as printed by --help, is:

Authorship verification for Epistola XIII

positional arguments:
  PATH               Path to the directory containing the corpus (documents
                     must be named <author>_<texname>.txt)
  positive           Positive author for the hypothesis (default "Dante"); set
                     to "ALL" to check every author

optional arguments:
  -h, --help         show this help message and exit
  --loo              submit each binary classifier to leave-one-out validation
  --unknown PATH     path to the file of unknown paternity (default None)
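
For readers who want to see how such an interface can be declared, the following is a minimal sketch using Python's standard argparse module. It is reconstructed from the help text above and is not the repository's actual code: the destination name `corpuspath` and the use of `nargs="?"` for the `positive` argument are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the help text (a reconstruction, not the original code)."""
    parser = argparse.ArgumentParser(
        description="Authorship verification for Epistola XIII"
    )
    # "corpuspath" is a hypothetical destination name; only the PATH metavar is documented.
    parser.add_argument(
        "corpuspath", metavar="PATH",
        help="Path to the directory containing the corpus "
             "(documents must be named <author>_<texname>.txt)",
    )
    # Assumption: nargs="?" is what allows a positional argument to have a default.
    parser.add_argument(
        "positive", nargs="?", default="Dante",
        help='Positive author for the hypothesis (default "Dante"); '
             'set to "ALL" to check every author',
    )
    parser.add_argument(
        "--loo", action="store_true",
        help="submit each binary classifier to leave-one-out validation",
    )
    parser.add_argument(
        "--unknown", metavar="PATH", default=None,
        help="path to the file of unknown paternity (default None)",
    )
    return parser


# Parse an explicit argument list instead of sys.argv, for illustration only.
args = build_parser().parse_args(["../Corpora/CorpusI", "Dante", "--loo"])
print(args.corpuspath, args.positive, args.loo, args.unknown)
```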

The following command line:

cd src
python author_identification.py ../Corpora/CorpusI Dante --unknown ../Epistle/EpistolaXIII_1.txt

will use all texts in ../Corpora/CorpusI as training documents to train a verifier for the file ../Epistle/EpistolaXIII_1.txt, assuming Dante is the positive class (i.e., it will check whether, on the basis of the evidence found in Dante's other texts, the unknown text was written by Dante). The output is probabilistic: it conveys the uncertainty the classifier has in attributing the document to the positive class.
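
To illustrate what "assuming Dante is the positive class" means in terms of training data, here is a small sketch that derives binary labels from file names following the `<author>_<texname>.txt` convention. The helper names `author_of` and `binary_labels` and the example file names are hypothetical, and treating everything before the first underscore as the author is an assumption; the repository may organize this step differently.

```python
from pathlib import Path
from typing import Iterable, List


def author_of(path: str) -> str:
    """Return the author prefix of a file named <author>_<texname>.txt."""
    stem = Path(path).stem  # file name without directory or extension
    # Assumption: the author is everything before the first underscore.
    author, sep, _text_name = stem.partition("_")
    if not sep:
        raise ValueError(f"file name does not follow <author>_<texname>.txt: {path}")
    return author


def binary_labels(paths: Iterable[str], positive: str = "Dante") -> List[int]:
    """Label each document 1 if written by the positive author, else 0."""
    return [1 if author_of(p) == positive else 0 for p in paths]


# Hypothetical file names, used only to show the labelling.
corpus = ["Dante_epistola1.txt", "Boccaccio_epistola2.txt", "Dante_epistola3.txt"]
print(binary_labels(corpus, positive="Dante"))
```

A binary classifier trained on such labels can then be asked for the probability that the unknown document belongs to the positive class.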

Similarly, the command line:

cd src
python author_identification.py ../Corpora/CorpusI Dante --loo 

will cross-validate the binary classifier for Dante on all training documents in a leave-one-out (LOO) fashion.
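
The leave-one-out procedure itself can be sketched in a few lines of plain Python: each document is held out once, the classifier is trained on the rest, and the held-out prediction is scored. This is an illustration of the general technique, not the script's implementation. The function names are hypothetical, and `majority_baseline` is a placeholder that ignores the text and stands in for the real classifier.

```python
from typing import Callable, Iterator, List, Sequence, Tuple


def leave_one_out(n: int) -> Iterator[Tuple[List[int], int]]:
    """Yield (train_indices, test_index) pairs, holding out one document at a time."""
    for held_out in range(n):
        yield [i for i in range(n) if i != held_out], held_out


def loo_accuracy(
    docs: Sequence[str],
    labels: Sequence[int],
    fit_predict: Callable[[List[str], List[int], str], int],
) -> float:
    """Fraction of held-out documents whose label is predicted correctly."""
    hits = 0
    for train, test in leave_one_out(len(docs)):
        prediction = fit_predict(
            [docs[i] for i in train], [labels[i] for i in train], docs[test]
        )
        hits += int(prediction == labels[test])
    return hits / len(docs)


def majority_baseline(train_docs: List[str], train_labels: List[int], test_doc: str) -> int:
    """Placeholder classifier: predict the most frequent training label."""
    return max(set(train_labels), key=train_labels.count)


# Toy data: one positive document among four (contents are irrelevant to the baseline).
docs = ["text a", "text b", "text c", "text d"]
labels = [1, 0, 0, 0]
print(loo_accuracy(docs, labels, majority_baseline))
```

With a single positive document, the baseline can never predict it correctly when it is held out, which is why LOO is a useful check on small, imbalanced corpora.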