topconcept map. new descriptions, modify date

2021-12-17 15:36:14 +01:00 · 81c5db7c90
parent 9d1618630b
commit 81c5db7c90
2 changed files with 67 additions and 55 deletions
--- a/config.yaml
+++ b/config.yaml
@@ -2,18 +2,18 @@ Texts:
  LANG: en
  #Vocabulary
  VOCABULARYTITLE : SSHOC Multilingual Data Stewardship Terminology
-  VOCABULARYDESCRIPTION : SSHOC Multilingual Data Stewardship Terminology is a multilingual terminology that collects terms specific to the domain of Data Stewardship, as well as their definitions. Each term-definition pair is available in several languages English, French, Dutch, German, Greek, Italian, Slovenian.
-  VOCABULARYID : Identifier of the resource
+  VOCABULARYDESCRIPTION : The SSHOC Multilingual Data Stewardship Terminology is a multilingual terminology that collects terms specific to the domain of Data Stewardship, as well as their definitions. A list of domain-specific terms was automatically extracted from a corpus pertaining to the domain of Data Stewardship and Curation, validated by domain experts, assigned a definition, and linked to other existing terminologies (Loterre Open Science Thesaurus, terms4FAIRskills, Linked Open Vocabularies, ISO terms and definitions). Each term-definition pair was then automatically translated into multiple languages (Dutch, French, German, Greek, Italian, Slovenian) by employing Deep-L. The Multilingual Data Stewardship Terminology thus consists of 210 concepts available in Dutch, French, German, Greek, Italian, Slovenian. This resource was created within the frame of the SSHOC (Social Sciences and Humanities Open Cloud) project (H2020-INFRAEOSC-2018-2-823782). It is the result of the work of Task 3.1.2 "extraction of terminology from technical documentation about standards and interoperability", as described in D3.9, carried out jointly by ILC-CNR and CLARIN ERIC.
+  VOCABULARYID : http://hdl.handle.net/20.500.11752/ILC-567
  VOCABULARYCREATEDATE : 2021-11-29
  VOCABULARYMODDATE : 2021-11-29
-  VOCABULARYVERSION : 0.1
+  VOCABULARYVERSION : 1.0
  #Metadata
  METADATATITLE: SSHOC Multilingual Metadata 
-  METADATADESCRIPTION: SSHOC Multilingual Metadata is based on the metadata set of the CLARIN Concept Registry (CCR). The CCR 232 approved metadata concepts, as well as their definitions, are translated into several languages (Dutch, French, Greek, Italian)
-  METADATAID : PID for this resource
+  METADATADESCRIPTION: SSHOC Multilingual Metadata is based on the metadata set of the CLARIN Concept Registry (CCR). The CCR 232 approved metadata concepts, as well as their definitions, were automatically translated into several languages (Dutch, French, Greek, Italian) thanks to the support of Machine Translation tools, and eventually validated by native speakers who were also expert of the domain. This resource was created within the frame of the SSHOC (Social Sciences and Humanities Open Cloud) project (H2020-INFRAEOSC-2018-2-823782). It is the result of the work of Task 3.1.3 "creating Multilingual metadata and taxonomies for discovery", as described in D3.9, carried out jointly by ILC-CNR and CLARIN ERIC.
+  METADATAID : http://hdl.handle.net/20.500.11752/ILC-568
  METADATACREATEDATE: 2021-11-29
  METADATAMODDATE: 2021-11-29
-  METADATAVERSION : 0.1
+  METADATAVERSION : 1.0
  
 Source:
    METADATASOURCE : data/input/Metadata_final.csv #https://docs.google.com/spreadsheets/d/1nlet1v8f0BfskwNGwL2rTEcxoUvMqcKr4VHVClgwTKw/gviz/tq?tqx=out:csv&sheet=Sheet4
--- a/sshoc_31_skos.ipynb
+++ b/sshoc_31_skos.ipynb
@@ -2,7 +2,7 @@
 "cells": [
  {
   "cell_type": "markdown",
-   "id": "secret-front",
+   "id": "mechanical-johns",
   "metadata": {},
   "source": [
    "## Mapping *Data Stewardship terminology* and *Metadata* from spreadsheets to SKOS resources\n",
@@ -14,19 +14,20 @@
  {
   "cell_type": "code",
   "execution_count": 1,
-   "id": "frozen-workplace",
+   "id": "aware-comparison",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import rdflib\n",
    "import itertools\n",
-    "import yaml"
+    "import yaml\n",
+    "import datetime"
   ]
  },
  {
   "cell_type": "markdown",
-   "id": "operational-respect",
+   "id": "failing-shift",
   "metadata": {},
   "source": [
    "The file *config.yaml* contains the external information used in the parsing, including the position of the spreadsheets. Set the correct values before running the Notebook."
@@ -35,7 +36,7 @@
  {
   "cell_type": "code",
   "execution_count": 2,
-   "id": "streaming-wrestling",
+   "id": "cutting-triangle",
   "metadata": {},
   "outputs": [],
   "source": [
@@ -52,7 +53,7 @@
  },
  {
   "cell_type": "markdown",
-   "id": "polar-tenant",
+   "id": "cardiac-angel",
   "metadata": {},
   "source": [
    "The following cells defines the *Namespaces* used in the parsing"
@@ -61,7 +62,7 @@
  {
   "cell_type": "code",
   "execution_count": 3,
-   "id": "miniature-frontier",
+   "id": "neural-career",
   "metadata": {},
   "outputs": [],
   "source": [
@@ -80,7 +81,7 @@
  },
  {
   "cell_type": "markdown",
-   "id": "rubber-interval",
+   "id": "virtual-conducting",
   "metadata": {},
   "source": [
    "Download **Data Stewardship terminology** spreadsheet and show it to check if the operation has been executed correctly"
@@ -89,7 +90,7 @@
  {
   "cell_type": "code",
   "execution_count": 4,
-   "id": "stable-olympus",
+   "id": "systematic-bachelor",
   "metadata": {},
   "outputs": [],
   "source": [
@@ -100,7 +101,7 @@
  {
   "cell_type": "code",
   "execution_count": 5,
-   "id": "medical-investigator",
+   "id": "sealed-complexity",
   "metadata": {},
   "outputs": [],
   "source": [
@@ -117,7 +118,7 @@
  },
  {
   "cell_type": "markdown",
-   "id": "fuzzy-disney",
+   "id": "impossible-romantic",
   "metadata": {},
   "source": [
    "Create a graph for the SKOS data and binds the namespaces to it"
@@ -126,12 +127,13 @@
  {
   "cell_type": "code",
   "execution_count": 6,
-   "id": "fabulous-remains",
+   "id": "southeast-cholesterol",
   "metadata": {},
   "outputs": [],
   "source": [
    "c1rdf = rdflib.Graph()\n",
    "c1rdf.bind(\"sshocterm\", sshocterm)\n",
+    "c1rdf.bind(\"sshoccmd\", sshoccmd)\n",
    "c1rdf.bind(\"dc11\", dc11)\n",
    "c1rdf.bind(\"dct\", dct)\n",
    "c1rdf.bind(\"iso369-3\", iso369)\n",
@@ -144,7 +146,7 @@
  },
  {
   "cell_type": "markdown",
-   "id": "stainless-pennsylvania",
+   "id": "duplicate-oregon",
   "metadata": {},
   "source": [
    "Insert in the graph the *SKOS.ConceptScheme*"
@@ -153,13 +155,13 @@
  {
   "cell_type": "code",
   "execution_count": 7,
-   "id": "catholic-mortgage",
+   "id": "hydraulic-raising",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
-       "<Graph identifier=N354cb351c0e54b49974ca1568175540a (<class 'rdflib.graph.Graph'>)>"
+       "<Graph identifier=Ne5aaedc87c8748e6841c5acf367152ca (<class 'rdflib.graph.Graph'>)>"
      ]
     },
     "execution_count": 7,
@@ -168,11 +170,14 @@
    }
   ],
   "source": [
+    "now = datetime.datetime.today()\n",
+    "today_date=now.date()\n",
    "title=Literal(conf['Texts']['VOCABULARYTITLE'], lang=conf['Texts']['LANG'])\n",
    "description=Literal(conf['Texts']['VOCABULARYDESCRIPTION'], lang=conf['Texts']['LANG'])\n",
-    "identifier=Literal(conf['Texts']['VOCABULARYID'], lang=conf['Texts']['LANG'])\n",
+    "#identifier=Literal(conf['Texts']['VOCABULARYID'], lang=conf['Texts']['LANG'])\n",
+    "identifier=URIRef(conf['Texts']['VOCABULARYID'])\n",
    "createddate= Literal(conf['Texts']['VOCABULARYCREATEDATE'],datatype=XSD.date)\n",
-    "moddate= Literal(conf['Texts']['VOCABULARYMODDATE'],datatype=XSD.date)\n",
+    "moddate= Literal(today_date,datatype=XSD.date)\n",
    "version= Literal(conf['Texts']['VOCABULARYVERSION'],datatype=XSD.string)\n",
    "\n",
    "c1rdf.add((sshocterm[''], RDF.type, SKOS.ConceptScheme))\n",
@@ -180,7 +185,7 @@
    "c1rdf.add((sshocterm[''], DC.identifier, identifier))\n",
    "c1rdf.add((sshocterm[''], DC.description, description))\n",
    "c1rdf.add((sshocterm[''], dct.created, createddate))\n",
-    "c1rdf.add((sshocterm[''], dct.modified, createddate))\n",
+    "c1rdf.add((sshocterm[''], dct.modified, moddate))\n",
    "c1rdf.add((sshocterm[''], OWL.versionInfo, version))\n",
    "c1rdf.add((sshocterm[''], dct.language, iso369.eng))\n",
    "c1rdf.add((sshocterm[''], dct.language, iso369.ger))\n",
@@ -194,7 +199,7 @@
  {
   "cell_type": "code",
   "execution_count": 8,
-   "id": "encouraging-click",
+   "id": "level-score",
   "metadata": {},
   "outputs": [],
   "source": [
@@ -204,7 +209,7 @@
  },
  {
   "cell_type": "markdown",
-   "id": "regulation-vehicle",
+   "id": "detailed-official",
   "metadata": {},
   "source": [
    "The following cell implements the mapping rules for creating SKOS resources."
@@ -213,23 +218,25 @@
  {
   "cell_type": "code",
   "execution_count": 9,
-   "id": "amended-joshua",
+   "id": "failing-relative",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
-      "4366\n"
+      "4577\n"
     ]
    }
   ],
   "source": [
-    "\n",
    "for index, row in df_data.iterrows():\n",
    "    \n",
    "    if row.Subject.lower()==\"preflabel\":\n",
-    "        label=row[\"Concept ID\"].strip()\n",
+    "        label=URIRef(row[\"Concept ID\"].strip())\n",
+    "        \n",
+    "       \n",
+    "        c1rdf.add((sshocterm[''], SKOS.hasTopConcept, sshocterm[label]))\n",
    "        enlabel=Literal(row[\"Term\"].strip(), lang='en')\n",
    "        frlabel=Literal(row[\"French\"].strip(), lang='fr')\n",
    "        nllabel=Literal(row['Dutch'].strip(), lang='nl')\n",
@@ -237,7 +244,6 @@
    "        itlabel=Literal(row['Italian'].strip(), lang='it')\n",
    "        sllabel=Literal(row['Slovenian'].strip(), lang='sl')\n",
    "        ellabel=Literal(row['Greek'].strip(), lang='el')\n",
-    "        \n",
    "        c1rdf.add((sshocterm[label], RDF.type, SKOS.Concept))\n",
    "        c1rdf.add((sshocterm[label], SKOS.inScheme, sshocterm['']))\n",
    "        c1rdf.add((sshocterm[label], SKOS.topConceptOf, sshocterm['']))\n",
@@ -281,7 +287,7 @@
    "        c1rdf.add((sshocterm[label], SKOS.definition, sldef))\n",
    "        c1rdf.add((sshocterm[label], SKOS.definition, eldef))\n",
    "        if not pd.isna(row['Source of definition']):\n",
-    "            source=Literal(row['Source of definition'].strip())\n",
+    "            source=Literal(row['Source of definition'].strip(), datatype=XSD.string)\n",
    "            #print (f'{label}, {source}')\n",
    "            c1rdf.add((sshocterm[label], dct.source, source))\n",
    "    if not pd.isna(row['Loterre Open Science Thesaurus']):\n",
@@ -305,14 +311,14 @@
    "    if not pd.isna(row['Broader Concept']):\n",
    "        broc=URIRef(row['Broader Concept'])\n",
    "        c1rdf.add((sshocterm[label], SKOS.broadMatch, broc))\n",
-    "    \n",
+    "\n",
    "print(len(c1rdf))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
-   "id": "tutorial-zimbabwe",
+   "id": "earlier-slovak",
   "metadata": {},
   "outputs": [],
   "source": [
@@ -322,7 +328,7 @@
  },
  {
   "cell_type": "markdown",
-   "id": "violent-reproduction",
+   "id": "arabic-buyer",
   "metadata": {},
   "source": [
    "Create a *Turtle* file in the **/data** directory with the SKOS resources for **Data Stewardship terminology** "
@@ -331,7 +337,7 @@
  {
   "cell_type": "code",
   "execution_count": 11,
-   "id": "sweet-mixer",
+   "id": "treated-spotlight",
   "metadata": {},
   "outputs": [],
   "source": [
@@ -341,7 +347,7 @@
  },
  {
   "cell_type": "markdown",
-   "id": "quiet-rehabilitation",
+   "id": "ruled-america",
   "metadata": {},
   "source": [
    "Download **Metadata** spreadsheet and show it to check if the operation has been executed correctly"
@@ -350,7 +356,7 @@
  {
   "cell_type": "code",
   "execution_count": 12,
-   "id": "electrical-hydrogen",
+   "id": "olive-archive",
   "metadata": {},
   "outputs": [],
   "source": [
@@ -361,7 +367,7 @@
  {
   "cell_type": "code",
   "execution_count": 13,
-   "id": "multiple-berry",
+   "id": "square-michael",
   "metadata": {},
   "outputs": [],
   "source": [
@@ -375,7 +381,7 @@
  },
  {
   "cell_type": "markdown",
-   "id": "promising-gender",
+   "id": "explicit-routine",
   "metadata": {},
   "source": [
    "Create a graph for the SKOS data and binds the namespaces to it"
@@ -384,12 +390,13 @@
  {
   "cell_type": "code",
   "execution_count": 14,
-   "id": "positive-library",
+   "id": "patient-winner",
   "metadata": {},
   "outputs": [],
   "source": [
    "ccr = rdflib.Graph()\n",
    "ccr.bind(\"sshoccmd\", sshoccmd)\n",
+    "ccr.bind(\"sshocterm\", sshocterm)\n",
    "ccr.bind(\"dc11\", dc11)\n",
    "ccr.bind(\"dct\", dct)\n",
    "ccr.bind(\"iso369-3\", iso369)\n",
@@ -403,13 +410,13 @@
  {
   "cell_type": "code",
   "execution_count": 15,
-   "id": "outside-dressing",
+   "id": "least-waterproof",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
-       "<Graph identifier=N876b7db85e864943a1e0342d4e4dafdc (<class 'rdflib.graph.Graph'>)>"
+       "<Graph identifier=Nf1a2f3298f9d4816824432b5017e57c8 (<class 'rdflib.graph.Graph'>)>"
      ]
     },
     "execution_count": 15,
@@ -418,11 +425,14 @@
    }
   ],
   "source": [
+    "now = datetime.datetime.today()\n",
+    "today_date=now.date()\n",
    "title=Literal(conf['Texts']['METADATATITLE'], lang=conf['Texts']['LANG'])\n",
    "description=Literal(conf['Texts']['METADATADESCRIPTION'], lang=conf['Texts']['LANG'])\n",
-    "identifier=Literal(conf['Texts']['METADATAID'], lang=conf['Texts']['LANG'])\n",
+    "#identifier=Literal(conf['Texts']['METADATAID'], lang=conf['Texts']['LANG'])\n",
+    "identifier=URIRef(conf['Texts']['METADATAID'])\n",
    "createddate= Literal(conf['Texts']['METADATACREATEDATE'],datatype=XSD.date)\n",
-    "moddate= Literal(conf['Texts']['METADATAMODDATE'],datatype=XSD.date)\n",
+    "moddate= Literal(today_date,datatype=XSD.date)\n",
    "version= Literal(conf['Texts']['METADATAVERSION'],datatype=XSD.string)\n",
    "\n",
    "ccr.add((sshoccmd[''], RDF.type, SKOS.ConceptScheme))\n",
@@ -430,7 +440,7 @@
    "ccr.add((sshoccmd[''], DC.description, description))\n",
    "ccr.add((sshoccmd[''], DC.identifier, identifier))\n",
    "ccr.add((sshoccmd[''], dct.created, createddate))\n",
-    "ccr.add((sshoccmd[''], dct.modified, createddate))\n",
+    "ccr.add((sshoccmd[''], dct.modified, moddate))\n",
    "ccr.add((sshoccmd[''], OWL.versionInfo, version))\n",
    "ccr.add((sshoccmd[''], dct.language, iso369.eng))\n",
    "ccr.add((sshoccmd[''], dct.language, iso369.ger))\n",
@@ -443,7 +453,7 @@
  },
  {
   "cell_type": "markdown",
-   "id": "contemporary-familiar",
+   "id": "passing-onion",
   "metadata": {},
   "source": [
    "The following cell implements the mapping rules for creating SKOS resources."
@@ -452,25 +462,27 @@
  {
   "cell_type": "code",
   "execution_count": 16,
-   "id": "comparative-matthew",
+   "id": "confirmed-montana",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
-      "3030\n"
+      "3494\n"
     ]
    }
   ],
   "source": [
+    "topconcepts=[]\n",
    "for index, row in df_metadata.iterrows():\n",
    "    \n",
    "    label=row[\"URI\"]\n",
    "    urilabel=URIRef(label)\n",
    "    lastslash=label.rfind('/')\n",
-    "    label='sshoc_'+label[lastslash+1:]\n",
-    "    \n",
+    "    label=URIRef('sshoc_'+label[lastslash+1:])\n",
+    "    ccr.add((sshoccmd[''], SKOS.hasTopConcept, sshoccmd[label]))\n",
+    "    #topconcepts.append(Literal(sshoccmd[label]))\n",
    "    \n",
    "    strsource=row['source']\n",
    "    \n",
@@ -512,14 +524,14 @@
    "   \n",
    "    ccr.add((sshoccmd[label], dct.source, source))\n",
    "    ccr.add((sshoccmd[label], SKOS.exactMatch, urilabel))\n",
+    "    ccr.add((sshoccmd[label], SKOS.topConceptOf, sshoccmd['']))\n",
    "    \n",
-    "        \n",
    "print(len(ccr))"
   ]
  },
  {
   "cell_type": "markdown",
-   "id": "reflected-dealer",
+   "id": "connected-honey",
   "metadata": {},
   "source": [
    "Create a *Turtle* file in the **/data** directory with the SKOS resources for **Metadata** "
@@ -528,7 +540,7 @@
  {
   "cell_type": "code",
   "execution_count": 17,
-   "id": "located-metadata",
+   "id": "greater-thunder",
   "metadata": {},
   "outputs": [],
   "source": [
@@ -539,7 +551,7 @@
  {
   "cell_type": "code",
   "execution_count": null,
-   "id": "geographic-format",
+   "id": "elementary-graphics",
   "metadata": {},
   "outputs": [],
   "source": []