Merge branch 'master' of https://github.com/AlexMoreo/dante-verification

2020-04-03 11:21:34 +02:00 · 2020-04-03 11:21:34 +02:00 · dc810272b2
parent 843cfbe8fe 6471250118
commit dc810272b2
1 changed files with 35 additions and 9 deletions
--- a/README.md
+++ b/README.md
@@ -5,6 +5,15 @@ Code to reproduce the experiments reported in the papers
 and 
 ["L’Epistola a Cangrande al vaglio della Computational Authorship Verification: Risultati preliminari (con una postilla sulla cosiddetta XIV Epistola di Dante Alighieri)"](https://www.academia.edu/42297516/L_Epistola_a_Cangrande_al_vaglio_della_Computational_Authorship_Verification_risultati_preliminari_con_una_postilla_sulla_cosiddetta_XIV_Epistola_di_Dante_Alighieri_in_Nuove_inchieste_sull_Epistola_a_Cangrande_a_c._di_A._Casadei_Pisa_Pisa_University_Press_pp._153-192)
 ## Requirements:
 The experiments have been run using the following packages (older versions might work as well):
 * joblib==0.11
 * nltk==3.4.5
 * numpy==1.18.2
 * scikit-learn==0.22.2.post1
 * scipy==1.4.1
 ## Disclaimer:
 The dataset is not distributed in this version. We have asked the Editors for permission to publish the corpus,
 and we are still waiting for some of their responses to arrive.
@@ -13,18 +22,23 @@ We are waiting for some of these responses to arrive.
 The script in __./src/author_identification.py__ executes the experiments. This is the script syntax (--help):
 ```
 usage: author_identification.py [-h] [--loo] [--unknown PATH] [--log PATH]
                                CORPUSPATH AUTHOR
 Authorship verification for Epistola XIII
 positional arguments:
-  PATH               Path to the directory containing the corpus (documents
+  CORPUSPATH      Path to the directory containing the corpus (documents must
-                     must be named <author>_<texname>.txt)
+                  be named <author>_<texname>.txt)
-  positive           Positive author for the hypothesis (default "Dante"); set
+  AUTHOR          Positive author for the hypothesis (default "Dante"); set to
-                     to "ALL" to check every author
+                  "ALL" to check every author
 optional arguments:
-  -h, --help         show this help message and exit
+  -h, --help      show this help message and exit
-  --loo              submit each binary classifier to leave-one-out validation
+  --loo           submit each binary classifier to leave-one-out validation
-  --unknown PATH     path to the file of unknown paternity (default None)
+  --unknown PATH  path to the file of unknown paternity (default None)
  --log PATH      path to the log file where to write the results (default
                  ./results.txt)
 ```
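The naming convention `<author>_<texname>.txt` mentioned in the help text can be illustrated with a minimal sketch. This is not the code used in `author_identification.py`: the helper names `split_author_and_text` and `is_positive` and the example file names are hypothetical, and the sketch assumes that author names contain no underscore.

```python
from pathlib import Path


def split_author_and_text(filename):
    """Split '<author>_<texname>.txt' into (author, text name).

    Hypothetical helper: assumes the author name contains no underscore,
    so everything before the first '_' is taken as the author.
    """
    stem = Path(filename).stem  # drops the directory and the '.txt' suffix
    author, sep, text_name = stem.partition("_")
    if not sep or not author or not text_name:
        raise ValueError(f"unexpected file name: {filename!r}")
    return author, text_name


def is_positive(filename, positive_author="Dante"):
    """True if the document belongs to the positive author of the hypothesis."""
    return split_author_and_text(filename)[0] == positive_author


# Hypothetical file names, for illustration only.
for name in ["Dante_SomeText.txt", "GuidoFaba_AnotherText.txt"]:
    print(name, split_author_and_text(name), is_positive(name))
```

Splitting at the first underscore keeps any further underscores inside the text name, which seems the safer choice when only the author label is needed.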
 The following command line:
@@ -42,6 +56,18 @@ to the positive class.
 Similarly, the command line:
 ```
 cd src
-python author_identification.py ../Corpora/CorpusI Dante --loo 
+python author_identification.py ../Corpora/CorpusI ALL --loo 
 ```
-will perform a cross-validation of the binary classifier for Dante using all training documents in a leave-one-out (LOO) fashion.
+will perform a cross-validation of the binary classifier for all authors using all training documents in a leave-one-out (LOO) fashion.
 The script reports the results both on the standard output (in more detail) and in a log file. For example, the last command will produce a log file containing:
 ```
 F1 for ClaraAssisiensis = 0.400
 F1 for Dante = 0.957
 F1 for GiovanniBoccaccio = 1.000
 F1 for GuidoFaba = 0.974
 F1 for PierDellaVigna = 0.993
 LOO Macro-F1 = 0.865
 LOO Micro-F1 = 0.981
 ```
 (Note that small numerical variations with respect to the original papers might occur due to different software versions or to any underlying stochastic process. Such variations should not, in any case, alter the conclusions derived from the published results.)
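
As a quick sanity check on the log shown above, the macro-averaged F1 is the unweighted mean of the per-author F1 scores. The sketch below is not part of the repository; it only copies the five values from the example log and recomputes the average.

```python
from statistics import mean

# Per-author F1 scores copied from the example log above.
f1_per_author = {
    "ClaraAssisiensis": 0.400,
    "Dante": 0.957,
    "GiovanniBoccaccio": 1.000,
    "GuidoFaba": 0.974,
    "PierDellaVigna": 0.993,
}

# Macro-F1 gives every author the same weight, whatever the number of documents.
macro_f1 = mean(f1_per_author.values())
print(f"LOO Macro-F1 = {macro_f1:.3f}")  # prints: LOO Macro-F1 = 0.865
```

The micro-averaged F1 cannot be recomputed in the same way, because it pools the true positives, false positives and false negatives of all authors before computing a single score, and those counts are not part of the log.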