Merge branch 'master' of https://github.com/AlexMoreo/dante-verification

Commit dc810272b2 (`README.md`)

Code to reproduce the experiments reported in the papers

…

and

["L’Epistola a Cangrande al vaglio della Computational Authorship Verification: Risultati preliminari (con una postilla sulla cosiddetta XIV Epistola di Dante Alighieri)"](https://www.academia.edu/42297516/L_Epistola_a_Cangrande_al_vaglio_della_Computational_Authorship_Verification_risultati_preliminari_con_una_postilla_sulla_cosiddetta_XIV_Epistola_di_Dante_Alighieri_in_Nuove_inchieste_sull_Epistola_a_Cangrande_a_c._di_A._Casadei_Pisa_Pisa_University_Press_pp._153-192)

## Requirements:

The experiments were run with the following package versions (older versions may work as well):

* joblib==0.11
* nltk==3.4.5
* numpy==1.18.2
* scikit-learn==0.22.2.post1
* scipy==1.4.1
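
You can compare a local environment against the versions listed above with a small check. This is a minimal sketch that assumes Python 3.8+ (for `importlib.metadata`). The `REQUIRED` dictionary only mirrors the list above, and the helper name `check_versions` is hypothetical. It is not part of the repository.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in the README (PyPI distribution names).
REQUIRED = {
    "joblib": "0.11",
    "nltk": "3.4.5",
    "numpy": "1.18.2",
    "scikit-learn": "0.22.2.post1",
    "scipy": "1.4.1",
}


def check_versions(required=REQUIRED):
    """Return {package: (listed, installed)} for every package whose installed
    version differs from the listed one; installed is None if it is missing."""
    mismatches = {}
    for pkg, wanted in required.items():
        try:
            installed = version(pkg)
        except PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[pkg] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for pkg, (wanted, installed) in check_versions().items():
        print(f"{pkg}: listed {wanted}, installed {installed or 'not installed'}")
```

Older versions may work, so a reported mismatch is informative rather than necessarily an error.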

## Disclaimer:

The dataset is not distributed in this version. We have asked the Editors for permission to publish the corpus and are still waiting for some of their responses.

…

The script __./src/author_identification.py__ runs the experiments. Its syntax (as printed by `--help`) is:

```
usage: author_identification.py [-h] [--loo] [--unknown PATH] [--log PATH]
                                CORPUSPATH AUTHOR

Authorship verification for Epistola XIII

positional arguments:
  CORPUSPATH      Path to the directory containing the corpus (documents must
                  be named <author>_<texname>.txt)
  AUTHOR          Positive author for the hypothesis (default "Dante"); set to
                  "ALL" to check every author

optional arguments:
  -h, --help      show this help message and exit
  --loo           submit each binary classifier to leave-one-out validation
  --unknown PATH  path to the file of unknown paternity (default None)
  --log PATH      path to the log file where to write the results (default
                  ./results.txt)
```
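
The corpus directory must contain files named `<author>_<texname>.txt`. The sketch below shows one way to group such a directory by author. It assumes the author name is everything before the first underscore, and the script's actual parsing may differ. `load_corpus` is a hypothetical helper.

```python
from collections import defaultdict
from pathlib import Path


def load_corpus(corpus_path):
    """Group the .txt files in corpus_path by author.

    Assumes names follow <author>_<texname>.txt and that the author name
    contains no underscore (the name is split at the first '_')."""
    corpus = defaultdict(list)
    for path in sorted(Path(corpus_path).glob("*.txt")):
        author, sep, texname = path.stem.partition("_")
        if not sep:
            continue  # skip files that do not follow the naming convention
        corpus[author].append((texname, path.read_text(encoding="utf-8")))
    return dict(corpus)
```

Splitting at the first underscore is a choice made in this sketch. Text names may then contain underscores but author names may not, which fits names such as `GiovanniBoccaccio` or `PierDellaVigna` in the sample log.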

The following command line:

…

… to the positive class.

Similarly, the command line:

```
cd src
python author_identification.py ../Corpora/CorpusI ALL --loo
```

will cross-validate the binary classifier of each author, using all training documents in a leave-one-out (LOO) fashion.
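
Leave-one-out validation trains on all documents but one, predicts the held-out document, and repeats this for every document. In this sketch the F1 score of the positive class is then computed on the pooled predictions. The example uses plain Python and a toy 1-nearest-neighbour classifier based on word-set (Jaccard) overlap. The classifier, the helper names (`loo_predict`, `f1_binary`, `nearest_neighbour`) and the toy data are all assumptions for illustration. They are not the features or classifier used in the papers.

```python
from typing import Callable, List, Sequence


def loo_predict(docs: Sequence[str], labels: Sequence[int],
                train_and_predict: Callable[[List[str], List[int], str], int]) -> List[int]:
    """Leave-one-out: predict each document with a model trained on all the others."""
    preds = []
    for i in range(len(docs)):
        train_docs = list(docs[:i]) + list(docs[i + 1:])
        train_labels = list(labels[:i]) + list(labels[i + 1:])
        preds.append(train_and_predict(train_docs, train_labels, docs[i]))
    return preds


def f1_binary(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 of the positive class (label 1): 2TP / (2TP + FP + FN); 0.0 if undefined."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def nearest_neighbour(train_docs: List[str], train_labels: List[int], doc: str) -> int:
    """Toy classifier: label of the training document with the highest Jaccard
    overlap of word sets (ties go to the earliest document)."""
    words = set(doc.split())

    def jaccard(other: str) -> float:
        other_words = set(other.split())
        union = words | other_words
        return len(words & other_words) / len(union) if union else 0.0

    best = max(range(len(train_docs)), key=lambda j: jaccard(train_docs[j]))
    return train_labels[best]


# Toy data: 1 = positive author, 0 = any other author.
docs = ["a b c", "a b d", "x y z", "x y w"]
labels = [1, 1, 0, 0]
preds = loo_predict(docs, labels, nearest_neighbour)
print(preds, f1_binary(labels, preds))  # [1, 1, 0, 0] 1.0
```

With `ALL`, this sketch would set up one such binary problem per author: that author's documents are labelled positive and all others negative.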

The script reports the results both on the standard output (in more detail) and in a log file. For example, the last command produces a log file containing:

```
F1 for ClaraAssisiensis = 0.400
F1 for Dante = 0.957
F1 for GiovanniBoccaccio = 1.000
F1 for GuidoFaba = 0.974
F1 for PierDellaVigna = 0.993
LOO Macro-F1 = 0.865
LOO Micro-F1 = 0.981
```
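
The macro-averaged F1 in this log matches the unweighted mean of the five per-author scores: (0.400 + 0.957 + 1.000 + 0.974 + 0.993) / 5 = 0.8648 ≈ 0.865. The sketch below parses lines of the `F1 for <author> = <score>` form and recomputes that mean. The helper names and the regular expression are assumptions based only on the log shown above. Micro-F1 cannot be recovered this way, because it pools true and false positives across authors and the log does not report those counts.

```python
import re

# Assumes the exact "F1 for <author> = <score>" layout of the sample log.
LOG_LINE = re.compile(r"^F1 for (\S+) = ([0-9.]+)$")


def per_author_f1(log_text):
    """Extract {author: F1} from lines like 'F1 for Dante = 0.957'."""
    scores = {}
    for line in log_text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match:
            scores[match.group(1)] = float(match.group(2))
    return scores


def macro_f1(scores):
    """Macro-F1: unweighted mean of the per-author F1 scores."""
    return sum(scores.values()) / len(scores)


log = """F1 for ClaraAssisiensis = 0.400
F1 for Dante = 0.957
F1 for GiovanniBoccaccio = 1.000
F1 for GuidoFaba = 0.974
F1 for PierDellaVigna = 0.993
LOO Macro-F1 = 0.865
LOO Micro-F1 = 0.981"""

scores = per_author_f1(log)
print(f"{macro_f1(scores):.3f}")  # 0.865
```

The per-author values in the log are already rounded. In general, a mean recomputed from them may therefore differ from the script's own macro-F1 in the last digit.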

(Note that small numerical variations with respect to the results reported in the original papers may occur, due to different software versions and to stochastic components of the underlying process. Such variations should not, however, alter the conclusions drawn from the published results.)