sshoc-skosmapping/sshoc_lessico_panlatino.ipynb

Mapping Pan-Latin Textile Fibres Vocabulary from spreadsheet to SKOS resources

This notebook implements a simple parser that transforms the Pan-Latin Textile Fibres Vocabulary, developed within the Realiter network and published as spreadsheets, into SKOS resources. The parser reads the spreadsheets and converts their content into SKOS data according to a set of mapping rules; the result for each vocabulary is stored as a Turtle file and an RDF/XML file.

In [1]:
import pandas as pd
import rdflib
import itertools
import yaml
import datetime

The file config-lessico.yaml contains the external information used by the parser, including the location of the spreadsheets. Set the correct values before running the notebook.

In [2]:
try:
    with open("config-lessico.yaml", 'r') as stream:
        try:
            conf = yaml.safe_load(stream)
        except yaml.YAMLError as exc:
            print(exc)
except FileNotFoundError:
    print('Warning: config-lessico.yaml not found! Please store it in the same directory as the notebook')
#print (conf)
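
Every later cell reads keys from `conf`, so a missing file or key surfaces only as a `NameError` or `KeyError` further down. A minimal sketch of an up-front check follows. The helper name `missing_config_keys` is hypothetical, and the key list is taken from the lookups in this notebook (only a subset of the `Texts` keys is shown).

```python
# Keys read by this notebook, grouped by top-level section of the YAML file.
REQUIRED_KEYS = {
    "Namespaces": ["TEXTILETERM"],
    "Source": ["LESSICOSOURCE", "LESSICOMANICHESOURCE", "LESSICOCOLLISOURCE"],
    "Texts": ["LANG", "LESSICOTITLE", "LESSICODESCRIPTION", "LESSICOID",
              "LESSICOCREATEDATE", "LESSICOVERSION"],
}

def missing_config_keys(conf, required=REQUIRED_KEYS):
    """Return 'Section.KEY' strings for every required entry absent from conf."""
    missing = []
    for section, keys in required.items():
        present = conf.get(section) or {}
        for key in keys:
            if key not in present:
                missing.append(f"{section}.{key}")
    return missing

# Hypothetical, incomplete configuration used only to exercise the helper.
sample = {
    "Namespaces": {"TEXTILETERM": "http://example.org/textile/"},
    "Source": {"LESSICOSOURCE": "lessico.csv"},
}
print(missing_config_keys(sample))
```

Running such a check right after loading the file would report all missing entries at once instead of failing one cell at a time.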

The following cell defines the namespaces used in the parsing.

In [3]:
from rdflib.namespace import DC, DCAT, DCTERMS, OWL, \
                            RDF, RDFS, SKOS,  \
                           XMLNS, XSD
from rdflib import Namespace
from rdflib import URIRef, BNode, Literal

pltextile = Namespace(conf['Namespaces']['TEXTILETERM'])
dc11 = Namespace("http://purl.org/dc/elements/1.1/")
dct = Namespace("http://purl.org/dc/terms/")
# The trailing slash matters: Namespace appends term names by plain concatenation.
iso369 = Namespace("http://id.loc.gov/vocabulary/iso639-3/")

Download the Lessico spreadsheet and display its first rows to check that it loaded correctly.

In [4]:
url=conf['Source']['LESSICOSOURCE']
df_data=pd.read_csv(url)
In [5]:
df_data.head()
Out[5]:
| | it | DEF | ca | es | es [ARG] | es [ARG/MEX] | es [MEX] | fr | fr [CA] | gl | pt | ro | en |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | abaca (s.m.)\nfibra di abaca (s.f.)\ncanapa di... | Fibra ottenuta dalle foglie della Musa textilis. | abacà (n.m.)\nfibra d’abacà (n.f.)\ncànem de M... | abacá (s.m.)\nfibra de abacá (s.f.)\ncáñamo de... | NaN | NaN | abacá de Manila (s.m.) | abaca (n.m.)\nchanvre de Manille (n.m.)\ntagal... | fibre d’abaca (n.f.)\nmanille (n.f.) | abacá (s.m.)\ncánabo de Manila (s.m.) | abacá (s.m.)\nmanila (s.f.)\ncânhamo-de-manila... | abaca (s.f.) | abaca\nabaca fibre\nManila hemp |
| 1 | acetato (s.m.)\nfibra di acetato (s.f.) | Fibra prodotta a partire dall’acetato di cellu... | raió (n.m.)\nfibra d’acetat (n.f.) | acetato (s.m.) \nrayón acetato (s.m.) | rayón (s.m.)\nviscosa (s.f.) | fibra de acetato (s.f.) | NaN | acétate (n.m.) \nfibre d’acétate (n.f.) | NaN | acetato (s.m.)\nfibra de acetato (s.f.) | acetato (s.m.)\nfibra de acetato (s.f.) \nraio... | acetat (s.m.) | acetate\nacetate fibre |
| 2 | acrilico (s.m.)\nfibra acrilica (s.f.)\nacrili... | Fibra costituita da macromolecole lineari cont... | acrílic, -a (adj.)\nfibra acrílica (n.f.) | acrílica (s.f.)\nfibra acrílica (s.f.) | NaN | acrílico (s.m.) | fibra de acrílico (s.f.) | acrylique (n.m.)\nfibre acrylique (n.f.) | NaN | acrílico (s.m.)\nfibra acrílica (s.f.) | acrílico (s.m.)\nfibra acrílica (s.f.) | acrilic (s.m.) | acrylic\nacrylic fibre |
| 3 | alfa (s.f.)\nfibra d’alfa (s.f.) | Fibra ricavata dalle foglie della Stipa tenaci... | espart (n.m.)\nfibra d’espart (n.f.) | esparto (s.m.) \nfibra de esparto (s.f.) | NaN | fibra alfa (s.f.) | alfa (s.m.) | alfa (n.m.) | sparte (n.m.)\nspart (n.m.) | alfa (s.f.)\nesparto (s.m.) | alfa (s.f.)\nfibra de alfa (s.f.) | alfa (s.m.)\nfibră alfa (s.f.) | alfa\nalfa fibre |
| 4 | alginica (s.f.)\nfibra alginica (s.f.) | Fibra prodotta a partire dai sali metallici de... | fibra d’alginat (n.f.) | fibra algínica (s.f.)\nfibra de alginato (s.f.) | NaN | NaN | alginato (s.m.) | fibre d’alginate (n.f.)\nalginate (n.m.) | NaN | alxinato (s.m.)\nfibra de alxinato (s.f.) | alginato (s.m.)\nfibra algínica (s.f.)\nfibra ... | alginat (s.n.)\nfibră alginică (s.f.) | alginate\nalginic fibre\nalginate fibre |
In [6]:
df_data.rename(columns = {'es [ARG]': 'es-arg', 'es [MEX]': 'es-mex', 'fr [CA]': 'fr-ca'}, inplace = True)
#df_data.head()
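
The renames above are written out by hand for each regional column. As an illustration only, the same headers could be turned into BCP 47 style language tags programmatically. The helper `header_to_langtag` and its region table are assumptions, covering just the regional markers that appear in these spreadsheets.

```python
import re

# Spreadsheet region markers mapped to ISO 3166-1 alpha-2 region subtags.
REGIONS = {"ARG": "ar", "MEX": "mx", "CA": "ca", "BR": "br"}

def header_to_langtag(header):
    """Convert 'es [ARG]' to 'es-ar'; return any other header unchanged."""
    match = re.fullmatch(r"(\w+)\s*\[(\w+)\]", header.strip())
    if match is None:
        return header.strip()
    lang, region = match.groups()
    return f"{lang}-{REGIONS[region]}"

for header in ["it", "es [ARG]", "es [MEX]", "fr [CA]", "pt [BR]"]:
    print(header, "->", header_to_langtag(header))
```

A combined header such as `es [ARG/MEX]` does not match the pattern and is returned unchanged, so it would still need a rule of its own.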
In [7]:
df_data.iloc[0].it.split('\n')[0].split(' ')[0]
Out[7]:
'abaca'
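
Each spreadsheet cell holds one term per line, with the first line treated as the preferred term and the remaining lines as alternatives. A minimal sketch of that convention as reusable functions follows. `split_terms` and `local_name` are hypothetical helpers, not part of the notebook, and unlike the mapping cells later on this sketch also strips whitespace from the alternative terms.

```python
def split_terms(cell):
    """Split a multi-line cell into (preferred term, list of alternative terms)."""
    terms = [line.strip() for line in cell.split("\n") if line.strip()]
    if not terms:
        return "", []
    return terms[0], terms[1:]

def local_name(cell):
    """Derive a URI local name: preferred term without its grammar tag, spaces as '_'."""
    preferred, _ = split_terms(cell)
    return preferred.split(" (")[0].strip().replace(" ", "_")

pref, alts = split_terms("acetato (s.m.) \nrayón acetato (s.m.)")
print(pref, alts)
print(local_name("fibra alfa (s.f.)"))
```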

Create a graph for the SKOS data and bind the namespaces to it

In [23]:
c1rdf = rdflib.Graph()
c1rdf.bind("pltextile", pltextile)
c1rdf.bind("dc11", dc11)
c1rdf.bind("dct", dct)
c1rdf.bind("iso639-3", iso369)
c1rdf.bind("skos", SKOS)
c1rdf.bind("dc", DC)
c1rdf.bind("rdf", RDF)
c1rdf.bind("owl", OWL)
c1rdf.bind("xsd", XSD)

Add the skos:ConceptScheme and its metadata to the graph.

In [24]:
now = datetime.datetime.today()
today_date=now.date()
title=Literal(conf['Texts']['LESSICOTITLE'], lang=conf['Texts']['LANG'])
description=Literal(conf['Texts']['LESSICODESCRIPTION'], lang=conf['Texts']['LANG'])
description_it=Literal(conf['Texts']['LESSICODESCRIPTION_IT'], lang='it')
identifier=Literal(conf['Texts']['LESSICOID'], lang=conf['Texts']['LANG'])
#identifier=URIRef(conf['Texts']['VOCABULARYID'])
createddate= Literal(conf['Texts']['LESSICOCREATEDATE'],datatype=XSD.date)
moddate= Literal(today_date,datatype=XSD.date)
version= Literal(conf['Texts']['LESSICOVERSION'],datatype=XSD.string)

c1rdf.add((pltextile[''], RDF.type, SKOS.ConceptScheme))
c1rdf.add((pltextile[''], DC.title, title))
c1rdf.add((pltextile[''], DC.identifier, identifier))
c1rdf.add((pltextile[''], DC.description, description))
c1rdf.add((pltextile[''], DC.description, description_it))
c1rdf.add((pltextile[''], dct.created, createddate))
c1rdf.add((pltextile[''], dct.modified, moddate))
c1rdf.add((pltextile[''], OWL.versionInfo, version))
# ISO 639-3 identifiers are three letters long.
c1rdf.add((pltextile[''], dct.language, iso369.eng))
c1rdf.add((pltextile[''], dct.language, iso369.spa))
c1rdf.add((pltextile[''], dct.language, iso369.fra))
c1rdf.add((pltextile[''], dct.language, iso369.glg))
c1rdf.add((pltextile[''], dct.language, iso369.ita))
c1rdf.add((pltextile[''], dct.language, iso369.ron))
c1rdf.add((pltextile[''], dct.language, iso369.por))
c1rdf.add((pltextile[''], dct.language, iso369.cat))
Out[24]:
<Graph identifier=N72688dca2b42426587f4eb0e0dac3bfe (<class 'rdflib.graph.Graph'>)>
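
The `dct:language` values point into an ISO 639-3 namespace, whose identifiers are three letters long, whereas the spreadsheet columns use two-letter codes. A small lookup table can keep the two in step. The dictionary and the helper `scheme_languages` below are illustrative and cover only the languages of this vocabulary.

```python
# Two-letter column codes mapped to their ISO 639-3 identifiers.
ISO639_3 = {
    "ca": "cat", "en": "eng", "es": "spa", "fr": "fra",
    "gl": "glg", "it": "ita", "pt": "por", "ro": "ron",
}

def scheme_languages(columns):
    """Return the sorted ISO 639-3 codes of the base languages found in the column names."""
    codes = set()
    for column in columns:
        # 'es-arg' and 'es [ARG/MEX]' both reduce to the base language 'es'.
        base = column.split("-")[0].split(" ")[0].lower()
        if base in ISO639_3:
            codes.add(ISO639_3[base])
    return sorted(codes)

columns = ["it", "DEF", "ca", "es", "es-arg", "es [ARG/MEX]", "es-mex",
           "fr", "fr-ca", "gl", "pt", "ro", "en"]
print(scheme_languages(columns))
```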
In [25]:
#c1rdf.serialize(destination='data/skostest.rdf', format="n3");#format="pretty-xml")
#comrdf.serialize(destination='data/parsed_rdf/prima_cantica_forme_com.rdf', format="n3");
df_data.fillna('', inplace=True)
df_data.head()
Out[25]:
| | it | DEF | ca | es | es-arg | es [ARG/MEX] | es-mex | fr | fr-ca | gl | pt | ro | en |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | abaca (s.m.)\nfibra di abaca (s.f.)\ncanapa di... | Fibra ottenuta dalle foglie della Musa textilis. | abacà (n.m.)\nfibra d’abacà (n.f.)\ncànem de M... | abacá (s.m.)\nfibra de abacá (s.f.)\ncáñamo de... | | | abacá de Manila (s.m.) | abaca (n.m.)\nchanvre de Manille (n.m.)\ntagal... | fibre d’abaca (n.f.)\nmanille (n.f.) | abacá (s.m.)\ncánabo de Manila (s.m.) | abacá (s.m.)\nmanila (s.f.)\ncânhamo-de-manila... | abaca (s.f.) | abaca\nabaca fibre\nManila hemp |
| 1 | acetato (s.m.)\nfibra di acetato (s.f.) | Fibra prodotta a partire dall’acetato di cellu... | raió (n.m.)\nfibra d’acetat (n.f.) | acetato (s.m.) \nrayón acetato (s.m.) | rayón (s.m.)\nviscosa (s.f.) | fibra de acetato (s.f.) | | acétate (n.m.) \nfibre d’acétate (n.f.) | | acetato (s.m.)\nfibra de acetato (s.f.) | acetato (s.m.)\nfibra de acetato (s.f.) \nraio... | acetat (s.m.) | acetate\nacetate fibre |
| 2 | acrilico (s.m.)\nfibra acrilica (s.f.)\nacrili... | Fibra costituita da macromolecole lineari cont... | acrílic, -a (adj.)\nfibra acrílica (n.f.) | acrílica (s.f.)\nfibra acrílica (s.f.) | | acrílico (s.m.) | fibra de acrílico (s.f.) | acrylique (n.m.)\nfibre acrylique (n.f.) | | acrílico (s.m.)\nfibra acrílica (s.f.) | acrílico (s.m.)\nfibra acrílica (s.f.) | acrilic (s.m.) | acrylic\nacrylic fibre |
| 3 | alfa (s.f.)\nfibra d’alfa (s.f.) | Fibra ricavata dalle foglie della Stipa tenaci... | espart (n.m.)\nfibra d’espart (n.f.) | esparto (s.m.) \nfibra de esparto (s.f.) | | fibra alfa (s.f.) | alfa (s.m.) | alfa (n.m.) | sparte (n.m.)\nspart (n.m.) | alfa (s.f.)\nesparto (s.m.) | alfa (s.f.)\nfibra de alfa (s.f.) | alfa (s.m.)\nfibră alfa (s.f.) | alfa\nalfa fibre |
| 4 | alginica (s.f.)\nfibra alginica (s.f.) | Fibra prodotta a partire dai sali metallici de... | fibra d’alginat (n.f.) | fibra algínica (s.f.)\nfibra de alginato (s.f.) | | | alginato (s.m.) | fibre d’alginate (n.f.)\nalginate (n.m.) | | alxinato (s.m.)\nfibra de alxinato (s.f.) | alginato (s.m.)\nfibra algínica (s.f.)\nfibra ... | alginat (s.n.)\nfibră alginică (s.f.) | alginate\nalginic fibre\nalginate fibre |

The following cell implements the mapping rules for creating SKOS resources.

In [26]:
#df_data.iloc[0].it.split('\n')[0].split(' ')[0]
for index, row in df_data.iterrows():
    
    strlabel=row.it.split('\n')[0].split(' (')[0].strip()
    label=strlabel.replace(" ", "_")
    #label=URIRef(row.it.split('\n')[0].split(' (')[0].strip())
    c1rdf.add((pltextile[''], SKOS.hasTopConcept, pltextile[label]))    
    frlabel=Literal(row["fr"].split('\n')[0].strip(), lang='fr')
    fraltlabels=row["fr"].split('\n')[1:]
    itlabel=Literal(row['it'].split('\n')[0].strip(), lang='it')
    italtlabels=row["it"].split('\n')[1:]                    
    calabel=Literal(row['ca'].split('\n')[0].strip(), lang='ca')
    caaltlabels=row["ca"].split('\n')[1:]
    eslabel=Literal(row['es'].split('\n')[0].strip(), lang='es')
    esaltlabels=row["es"].split('\n')[1:]
    gllabel=Literal(row['gl'].split('\n')[0].strip(), lang='gl')
    glaltlabels=row["gl"].split('\n')[1:]
    ptlabel=Literal(row['pt'].split('\n')[0].strip(), lang='pt')
    ptaltlabels=row["pt"].split('\n')[1:]
    rolabel=Literal(row['ro'].split('\n')[0].strip(), lang='ro')
    roaltlabels=row["ro"].split('\n')[1:]
    enlabel=Literal(row['en'].split('\n')[0].strip(), lang='en')
    enaltlabels=row["en"].split('\n')[1:]
    
    esarglabel=Literal(row['es-arg'].split('\n')[0].strip(), lang='es-ar')
    esargaltlabels=row["es-arg"].split('\n')[1:]
    #es-arg-mex
#     esargmexarglabel=Literal(row['es-arg-mex'].split('\n')[0].strip(), lang='es-ar')
#     esargmexmexlabel=Literal(row['es-arg-mex'].split('\n')[0].strip(), lang='es-mx')
#     esargmexaltlabels=row["es-arg-mex"].split('\n')[1:]
    
    esmexlabel=Literal(row['es-mex'].split('\n')[0].strip(), lang='es-mx')
    esmexaltlabels=row["es-mex"].split('\n')[1:]
    frcalabel=Literal(row['fr-ca'].split('\n')[0].strip(), lang='fr-ca')
    frcaaltlabels=row["fr-ca"].split('\n')[1:]
    
    #definition
    itdef=Literal(row["DEF"].strip(), lang='it')
    
    c1rdf.add((pltextile[label], RDF.type, SKOS.Concept))
    c1rdf.add((pltextile[label], SKOS.inScheme, pltextile['']))
    c1rdf.add((pltextile[label], SKOS.topConceptOf, pltextile['']))
    
    for alab in esargaltlabels:
        c1rdf.add((pltextile[label], SKOS.altLabel, Literal(alab, lang='es-ar')))
    
#     for alab in esargmexaltlabels:
#         c1rdf.add((pltextile[label], SKOS.altLabel, Literal(alab, lang='es-ar')))
#         c1rdf.add((pltextile[label], SKOS.altLabel, Literal(alab, lang='es-mx')))
     
    for alab in esmexaltlabels:
        c1rdf.add((pltextile[label], SKOS.altLabel, Literal(alab, lang='es-mx')))
        
    for alab in frcaaltlabels:
        c1rdf.add((pltextile[label], SKOS.altLabel, Literal(alab, lang='fr-ca')))
    
    for alab in esaltlabels:
        c1rdf.add((pltextile[label], SKOS.altLabel, Literal(alab, lang='es')))
    
    for alab in glaltlabels:
        c1rdf.add((pltextile[label], SKOS.altLabel, Literal(alab, lang='gl')))
    
    for alab in ptaltlabels:
        c1rdf.add((pltextile[label], SKOS.altLabel, Literal(alab, lang='pt')))
    
    for alab in roaltlabels:
        c1rdf.add((pltextile[label], SKOS.altLabel, Literal(alab, lang='ro')))
    
    for alab in enaltlabels:
        c1rdf.add((pltextile[label], SKOS.altLabel, Literal(alab, lang='en')))
        
    for alab in caaltlabels:
        c1rdf.add((pltextile[label], SKOS.altLabel, Literal(alab, lang='ca')))
        
    for alab in fraltlabels:
        #print ("tt "+alab)
        c1rdf.add((pltextile[label], SKOS.altLabel, Literal(alab, lang='fr')))
    for alab in italtlabels:
        c1rdf.add((pltextile[label], SKOS.altLabel, Literal(alab, lang='it')))
   
    
    if(frlabel):
        c1rdf.add((pltextile[label], SKOS.prefLabel, frlabel))
    if(itlabel):
        c1rdf.add((pltextile[label], SKOS.prefLabel, itlabel))
    if(gllabel):
        c1rdf.add((pltextile[label], SKOS.prefLabel, gllabel))
    
    if(ptlabel):
        c1rdf.add((pltextile[label], SKOS.prefLabel, ptlabel))
    if(rolabel):
        c1rdf.add((pltextile[label], SKOS.prefLabel, rolabel))
    if(enlabel):
        c1rdf.add((pltextile[label], SKOS.prefLabel, enlabel))
        
    if(calabel):    
        c1rdf.add((pltextile[label], SKOS.prefLabel, calabel))
    if(eslabel):  
        c1rdf.add((pltextile[label], SKOS.prefLabel, eslabel))
    if(esarglabel):
        c1rdf.add((pltextile[label], SKOS.prefLabel, esarglabel))
    
#     if(esargmexarglabel):
#         c1rdf.add((pltextile[label], SKOS.prefLabel, esargmexarglabel))
#         c1rdf.add((pltextile[label], SKOS.prefLabel, esargmexmexlabel))
        
    if(esmexlabel):
        c1rdf.add((pltextile[label], SKOS.prefLabel, esmexlabel))
    if(frcalabel):
        c1rdf.add((pltextile[label], SKOS.prefLabel, frcalabel))
    
    if (itdef):
        c1rdf.add((pltextile[label], SKOS.definition, itdef))

print(len(c1rdf))
1668
In [27]:
# for s, p, o in c1rdf.triples((None,  None, None)):
#    print("{}  {}".format(s, o.n3()))

Write the SKOS resources for the Pan-Latin Textile Fibres Vocabulary to the data directory, once as Turtle and once as RDF/XML.

In [28]:
c1rdf.serialize(destination='data/lexpanlatskos_11.ttl', format="turtle")
c1rdf.serialize(destination='data/lexpanlatskos_11.rdf', format="pretty-xml")

Pan-Latin lexicon of sleeves (Lessico panlatino delle Maniche)

In [29]:
urlma=conf['Source']['LESSICOMANICHESOURCE']
df_data_maniche=pd.read_csv(urlma)
df_data_maniche.rename(columns = {'es [ARG]': 'es-arg', 'es [MEX]': 'es-mex', 'pt [BR]': 'pt-br'}, inplace = True)
df_data_maniche.fillna('', inplace=True)
#df_data_maniche.info()
In [31]:
cl_manicherdf = rdflib.Graph()
cl_manicherdf.bind("pltextile", pltextile)
cl_manicherdf.bind("dc11", dc11)
cl_manicherdf.bind("dct", dct)
cl_manicherdf.bind("iso639-3", iso369)
cl_manicherdf.bind("skos", SKOS)
cl_manicherdf.bind("dc", DC)
cl_manicherdf.bind("rdf", RDF)
cl_manicherdf.bind("owl", OWL)
cl_manicherdf.bind("xsd", XSD)
now = datetime.datetime.today()
today_date=now.date()
title=Literal(conf['Texts']['LESSICOMANICHETITLE'], lang=conf['Texts']['LANG'])
description=Literal(conf['Texts']['LESSICOMANICHEDESCRIPTION'], lang=conf['Texts']['LANG'])
description_it=Literal(conf['Texts']['LESSICOMANICHEDESCRIPTION_IT'], lang='it')
identifier=Literal(conf['Texts']['LESSICOMANICHEID'], lang=conf['Texts']['LANG'])
#identifier=URIRef(conf['Texts']['VOCABULARYID'])
createddate= Literal(conf['Texts']['LESSICOCREATEDATE'],datatype=XSD.date)
moddate= Literal(today_date,datatype=XSD.date)
version= Literal(conf['Texts']['LESSICOVERSION'],datatype=XSD.string)

cl_manicherdf.add((pltextile[''], RDF.type, SKOS.ConceptScheme))
cl_manicherdf.add((pltextile[''], DC.title, title))
cl_manicherdf.add((pltextile[''], DC.identifier, identifier))
cl_manicherdf.add((pltextile[''], DC.description, description))
cl_manicherdf.add((pltextile[''], DC.description, description_it))
cl_manicherdf.add((pltextile[''], dct.created, createddate))
cl_manicherdf.add((pltextile[''], dct.modified, moddate))
cl_manicherdf.add((pltextile[''], OWL.versionInfo, version))
# ISO 639-3 identifiers are three letters long.
cl_manicherdf.add((pltextile[''], dct.language, iso369.eng))
cl_manicherdf.add((pltextile[''], dct.language, iso369.spa))
cl_manicherdf.add((pltextile[''], dct.language, iso369.fra))
cl_manicherdf.add((pltextile[''], dct.language, iso369.cat))
cl_manicherdf.add((pltextile[''], dct.language, iso369.ita))
cl_manicherdf.add((pltextile[''], dct.language, iso369.por))
Out[31]:
<Graph identifier=Nc80da8e5fa8e4ef5a36a57aeaed9673d (<class 'rdflib.graph.Graph'>)>
In [32]:
# Mapping
for index, row in df_data_maniche.iterrows():
    
    strlabel=row.it.split('\n')[0].split('(')[0].strip()
    label=strlabel.replace(" ", "_").replace("’","").replace("'","").strip()
    #label=URIRef(row.it.split('\n')[0].split(' (')[0].strip())
    cl_manicherdf.add((pltextile[''], SKOS.hasTopConcept, pltextile[label]))    
    frlabel=Literal(row["fr"].split('\n')[0].strip(), lang='fr')
    fraltlabels=row["fr"].split('\n')[1:]
    itlabel=Literal(row['it'].split('\n')[0].strip(), lang='it')
    italtlabels=row["it"].split('\n')[1:]                    
    calabel=Literal(row['ca'].split('\n')[0].strip(), lang='ca')
    caaltlabels=row["ca"].split('\n')[1:]
    eslabel=Literal(row['es'].split('\n')[0].strip(), lang='es')
    esaltlabels=row["es"].split('\n')[1:]
    #gllabel=Literal(row['gl'].split('\n')[0].strip(), lang='gl')
    #glaltlabels=row["gl"].split('\n')[1:]
    ptlabel=Literal(row['pt'].split('\n')[0].strip(), lang='pt')
    ptaltlabels=row["pt"].split('\n')[1:]
#     rolabel=Literal(row['ro'].split('\n')[0].strip(), lang='ro')
#     roaltlabels=row["ro"].split('\n')[1:]
    enlabel=Literal(row['en'].split('\n')[0].strip(), lang='en')
    enaltlabels=row["en"].split('\n')[1:]
    
    esarglabel=Literal(row['es-arg'].split('\n')[0].strip(), lang='es-ar')
    esargaltlabels=row["es-arg"].split('\n')[1:]
  

    esmexlabel=Literal(row['es-mex'].split('\n')[0].strip(), lang='es-mx')
    esmexaltlabels=row["es-mex"].split('\n')[1:]
    ptbrlabel=Literal(row['pt-br'].split('\n')[0].strip(), lang='pt-br')
    ptbraltlabels=row["pt-br"].split('\n')[1:]
    
    #definition
    itdef=Literal(row["DEF"].strip(), lang='it')
    
    cl_manicherdf.add((pltextile[label], RDF.type, SKOS.Concept))
    cl_manicherdf.add((pltextile[label], SKOS.inScheme, pltextile['']))
    cl_manicherdf.add((pltextile[label], SKOS.topConceptOf, pltextile['']))
    
    for alab in esargaltlabels:
        cl_manicherdf.add((pltextile[label], SKOS.altLabel, Literal(alab, lang='es-ar')))
    
     
    for alab in esmexaltlabels:
        cl_manicherdf.add((pltextile[label], SKOS.altLabel, Literal(alab, lang='es-mx')))
        
    for alab in ptbraltlabels:
        cl_manicherdf.add((pltextile[label], SKOS.altLabel, Literal(alab, lang='pt-br')))
    
    for alab in esaltlabels:
        cl_manicherdf.add((pltextile[label], SKOS.altLabel, Literal(alab, lang='es')))
    
#     for alab in glaltlabels:
#         cl_collirdf.add((pltextile[label], SKOS.altLabel, Literal(alab, lang='gl')))
    
    for alab in ptaltlabels:
        cl_manicherdf.add((pltextile[label], SKOS.altLabel, Literal(alab, lang='pt')))
    
#     for alab in roaltlabels:
#         cl_collirdf.add((pltextile[label], SKOS.altLabel, Literal(alab, lang='ro')))
    
    for alab in enaltlabels:
        cl_manicherdf.add((pltextile[label], SKOS.altLabel, Literal(alab, lang='en')))
        
    for alab in caaltlabels:
        cl_manicherdf.add((pltextile[label], SKOS.altLabel, Literal(alab, lang='ca')))
        
    for alab in fraltlabels:
        #print ("tt "+alab)
        cl_manicherdf.add((pltextile[label], SKOS.altLabel, Literal(alab, lang='fr')))
    for alab in italtlabels:
        cl_manicherdf.add((pltextile[label], SKOS.altLabel, Literal(alab, lang='it')))
   
    
    if(frlabel):
        cl_manicherdf.add((pltextile[label], SKOS.prefLabel, frlabel))
    if(itlabel):
        cl_manicherdf.add((pltextile[label], SKOS.prefLabel, itlabel))
#     if(gllabel):
#         cl_manicherdf.add((pltextile[label], SKOS.prefLabel, gllabel))
    
    if(ptlabel):
        cl_manicherdf.add((pltextile[label], SKOS.prefLabel, ptlabel))
#     if(rolabel):
#         cl_manicherdf.add((pltextile[label], SKOS.prefLabel, rolabel))
    if(enlabel):
        cl_manicherdf.add((pltextile[label], SKOS.prefLabel, enlabel))
        
    if(calabel):    
        cl_manicherdf.add((pltextile[label], SKOS.prefLabel, calabel))
    if(eslabel):  
        cl_manicherdf.add((pltextile[label], SKOS.prefLabel, eslabel))
    if(esarglabel):
        cl_manicherdf.add((pltextile[label], SKOS.prefLabel, esarglabel))
    

    if(esmexlabel):
        cl_manicherdf.add((pltextile[label], SKOS.prefLabel, esmexlabel))
    if(ptbrlabel):
        cl_manicherdf.add((pltextile[label], SKOS.prefLabel, ptbrlabel))
    
    if (itdef):
        cl_manicherdf.add((pltextile[label], SKOS.definition, itdef))

print(len(cl_manicherdf))
597
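
Because concept URIs are derived from the first Italian term, two rows that share that term would silently merge into one concept. A small check for such collisions could look like this. The helper names are hypothetical and the sample cells are made up for illustration, not taken from the spreadsheet.

```python
from collections import Counter

def local_name(cell):
    """First line of the cell, without the grammar tag, apostrophes removed, spaces as '_'."""
    first = cell.split("\n")[0].split("(")[0].strip()
    return first.replace(" ", "_").replace("’", "").replace("'", "")

def duplicate_local_names(cells):
    """Return the local names produced by more than one cell."""
    counts = Counter(local_name(cell) for cell in cells)
    return sorted(name for name, n in counts.items() if n > 1)

# Illustrative cells only.
cells = [
    "manica lunga (s.f.)",
    "manica corta (s.f.)",
    "manica lunga (s.f.)\nmanica intera (s.f.)",
]
print(duplicate_local_names(cells))
```

Applied to the `it` column before the mapping loop, an empty result would confirm that every row yields its own concept.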
In [33]:
cl_manicherdf.serialize(destination='data/lexpanlatmanicheskos_11.ttl', format="turtle")
cl_manicherdf.serialize(destination='data/lexpanlatmanicheskos_11.rdf', format="pretty-xml")

Pan-Latin lexicon of collars (Lessico panlatino dei Colli)

In [34]:
urlco=conf['Source']['LESSICOCOLLISOURCE']
df_data_colli=pd.read_csv(urlco)
In [35]:
df_data_colli.rename(columns = {'es [ARG]': 'es-arg', 'es [MEX]': 'es-mex', 'pt [BR]': 'pt-br'}, inplace = True)
df_data_colli.fillna('', inplace=True)
#df_data_colli.head()
In [36]:
cl_collirdf = rdflib.Graph()
cl_collirdf.bind("pltextile", pltextile)
cl_collirdf.bind("dc11", dc11)
cl_collirdf.bind("dct", dct)
cl_collirdf.bind("iso639-3", iso369)
cl_collirdf.bind("skos", SKOS)
cl_collirdf.bind("dc", DC)
cl_collirdf.bind("rdf", RDF)
cl_collirdf.bind("owl", OWL)
cl_collirdf.bind("xsd", XSD)

SKOS concept scheme

In [ ]:
now = datetime.datetime.today()
today_date=now.date()
title=Literal(conf['Texts']['LESSICOCOLLITITLE'], lang=conf['Texts']['LANG'])
description=Literal(conf['Texts']['LESSICOCOLLIDESCRIPTION'], lang=conf['Texts']['LANG'])
description_it=Literal(conf['Texts']['LESSICOCOLLIDESCRIPTION_IT'], lang='it')
identifier=Literal(conf['Texts']['LESSICOCOLLIID'], lang=conf['Texts']['LANG'])
#identifier=URIRef(conf['Texts']['VOCABULARYID'])
createddate= Literal(conf['Texts']['LESSICOCREATEDATE'],datatype=XSD.date)
moddate= Literal(today_date,datatype=XSD.date)
version= Literal(conf['Texts']['LESSICOVERSION'],datatype=XSD.string)

cl_collirdf.add((pltextile[''], RDF.type, SKOS.ConceptScheme))
cl_collirdf.add((pltextile[''], DC.title, title))
cl_collirdf.add((pltextile[''], DC.identifier, identifier))
cl_collirdf.add((pltextile[''], DC.description, description))

cl_collirdf.add((pltextile[''], dct.created, createddate))
cl_collirdf.add((pltextile[''], dct.modified, moddate))
cl_collirdf.add((pltextile[''], OWL.versionInfo, version))
# ISO 639-3 identifiers are three letters long.
cl_collirdf.add((pltextile[''], dct.language, iso369.eng))
cl_collirdf.add((pltextile[''], dct.language, iso369.spa))
cl_collirdf.add((pltextile[''], dct.language, iso369.fra))
cl_collirdf.add((pltextile[''], dct.language, iso369.ita))
cl_collirdf.add((pltextile[''], dct.language, iso369.por))
cl_collirdf.add((pltextile[''], dct.language, iso369.cat))
In [ ]:
# Mapping
for index, row in df_data_colli.iterrows():
    
    strlabel=row.it.split('\n')[0].split(' (')[0].strip()
    label=strlabel.replace(" ", "_").replace("’","")
    #label=URIRef(row.it.split('\n')[0].split(' (')[0].strip())
    cl_collirdf.add((pltextile[''], SKOS.hasTopConcept, pltextile[label]))    
    frlabel=Literal(row["fr"].split('\n')[0].strip(), lang='fr')
    fraltlabels=row["fr"].split('\n')[1:]
    itlabel=Literal(row['it'].split('\n')[0].strip(), lang='it')
    italtlabels=row["it"].split('\n')[1:]                    
    calabel=Literal(row['ca'].split('\n')[0].strip(), lang='ca')
    caaltlabels=row["ca"].split('\n')[1:]
    eslabel=Literal(row['es'].split('\n')[0].strip(), lang='es')
    esaltlabels=row["es"].split('\n')[1:]
    #gllabel=Literal(row['gl'].split('\n')[0].strip(), lang='gl')
    #glaltlabels=row["gl"].split('\n')[1:]
    ptlabel=Literal(row['pt'].split('\n')[0].strip(), lang='pt')
    ptaltlabels=row["pt"].split('\n')[1:]
#     rolabel=Literal(row['ro'].split('\n')[0].strip(), lang='ro')
#     roaltlabels=row["ro"].split('\n')[1:]
    enlabel=Literal(row['en'].split('\n')[0].strip(), lang='en')
    enaltlabels=row["en"].split('\n')[1:]
    
    esarglabel=Literal(row['es-arg'].split('\n')[0].strip(), lang='es-ar')
    esargaltlabels=row["es-arg"].split('\n')[1:]
  

    esmexlabel=Literal(row['es-mex'].split('\n')[0].strip(), lang='es-mx')
    esmexaltlabels=row["es-mex"].split('\n')[1:]
    ptbrlabel=Literal(row['pt-br'].split('\n')[0].strip(), lang='pt-br')
    ptbraltlabels=row["pt-br"].split('\n')[1:]
    
    #definition
    itdef=Literal(row["DEF"].strip(), lang='it')
    
    cl_collirdf.add((pltextile[label], RDF.type, SKOS.Concept))
    cl_collirdf.add((pltextile[label], SKOS.inScheme, pltextile['']))
    cl_collirdf.add((pltextile[label], SKOS.topConceptOf, pltextile['']))
    
    for alab in esargaltlabels:
        cl_collirdf.add((pltextile[label], SKOS.altLabel, Literal(alab, lang='es-ar')))
    
     
    for alab in esmexaltlabels:
        cl_collirdf.add((pltextile[label], SKOS.altLabel, Literal(alab, lang='es-mx')))
        
    for alab in ptbraltlabels:
        cl_collirdf.add((pltextile[label], SKOS.altLabel, Literal(alab, lang='pt-br')))
    
    for alab in esaltlabels:
        cl_collirdf.add((pltextile[label], SKOS.altLabel, Literal(alab, lang='es')))
    
#     for alab in glaltlabels:
#         cl_collirdf.add((pltextile[label], SKOS.altLabel, Literal(alab, lang='gl')))
    
    for alab in ptaltlabels:
        cl_collirdf.add((pltextile[label], SKOS.altLabel, Literal(alab, lang='pt')))
    
#     for alab in roaltlabels:
#         cl_collirdf.add((pltextile[label], SKOS.altLabel, Literal(alab, lang='ro')))
    
    for alab in enaltlabels:
        cl_collirdf.add((pltextile[label], SKOS.altLabel, Literal(alab, lang='en')))
        
    for alab in caaltlabels:
        cl_collirdf.add((pltextile[label], SKOS.altLabel, Literal(alab, lang='ca')))
        
    for alab in fraltlabels:
        #print ("tt "+alab)
        cl_collirdf.add((pltextile[label], SKOS.altLabel, Literal(alab, lang='fr')))
    for alab in italtlabels:
        cl_collirdf.add((pltextile[label], SKOS.altLabel, Literal(alab, lang='it')))
   
    
    if(frlabel):
        cl_collirdf.add((pltextile[label], SKOS.prefLabel, frlabel))
    if(itlabel):
        cl_collirdf.add((pltextile[label], SKOS.prefLabel, itlabel))
#     if(gllabel):
#         cl_collirdf.add((pltextile[label], SKOS.prefLabel, gllabel))
    
    if(ptlabel):
        cl_collirdf.add((pltextile[label], SKOS.prefLabel, ptlabel))
#     if(rolabel):
#         cl_collirdf.add((pltextile[label], SKOS.prefLabel, rolabel))
    if(enlabel):
        cl_collirdf.add((pltextile[label], SKOS.prefLabel, enlabel))
        
    if(calabel):    
        cl_collirdf.add((pltextile[label], SKOS.prefLabel, calabel))
    if(eslabel):  
        cl_collirdf.add((pltextile[label], SKOS.prefLabel, eslabel))
    if(esarglabel):
        cl_collirdf.add((pltextile[label], SKOS.prefLabel, esarglabel))
    

    if(esmexlabel):
        cl_collirdf.add((pltextile[label], SKOS.prefLabel, esmexlabel))
    if(ptbrlabel):
        cl_collirdf.add((pltextile[label], SKOS.prefLabel, ptbrlabel))
    
    if (itdef):
        cl_collirdf.add((pltextile[label], SKOS.definition, itdef))

print(len(cl_collirdf))
In [ ]:
cl_collirdf.serialize(destination='data/lexpanlatcolliskos_11.ttl', format="turtle")
cl_collirdf.serialize(destination='data/lexpanlatcolliskos_11.rdf', format="pretty-xml")