|
{
|
|||
|
"cells": [
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"# Review of data ingested from TAPoR (draft)\n",
|
|||
|
"\n",
"This document checks the TAPoR dataset using the Python library pandas.\n",
|
|||
|
"\n",
|
|||
|
"Reference to ticket: https://gitlab.gwdg.de/sshoc/data-ingestion/-/issues/7\n"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"# Preamble"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 79,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"import ast\n",
|
|||
|
"import sys\n",
|
|||
|
"import numpy as np\n",
|
|||
|
"import pandas as pd\n",
|
|||
|
"import matplotlib.pyplot as plt\n",
|
|||
|
"\n",
|
|||
|
"from bokeh.io import output_notebook, show\n",
|
|||
|
"from bokeh.plotting import figure\n",
|
|||
|
"\n",
|
|||
|
"from im_tutorials.data import *\n",
|
|||
|
"from im_tutorials.utilities import flatten_lists\n",
|
|||
|
"from im_tutorials.features.text_preprocessing import *\n",
|
|||
|
"from im_tutorials.features.document_vectors import document_vector\n",
|
|||
|
"from im_tutorials.features.dim_reduction import WrapTSNE, GaussianMixtureEval\n",
|
|||
|
"# for db\n",
|
|||
|
"import sqlalchemy as db\n",
|
|||
|
"from sqlalchemy import *"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 107,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"engine = create_engine(\n",
|
|||
|
" \"connection_string\")\n",
|
|||
|
"connection = engine.connect()\n",
|
|||
|
"metadata = db.MetaData()"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"# Import data"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"## Query the DB to get TAPoR data\n",
|
|||
|
"\n",
"The TAPoR dataset used in this document is the SQL dump published by the Education and Research Archive (ERA) of the University of Alberta:\n",
|
|||
|
"\n",
|
|||
|
"https://era.library.ualberta.ca/items/f2da0666-f523-44d4-a83c-fa06351a1e94 \n",
|
|||
|
"\n",
|
|||
|
"(creation date: 2020-01-01).\n",
"The table *tools* contains 1504 records, each one describing a tool.\n",
"Records have been filtered according to the value of the field *tools.is_approved*; there are 1363 *approved* records.\n",
|
|||
|
"In this document this dataset will be called the **TAPoR dataset**.\n",
|
|||
|
"\n",
"*Note that the TAPoR dataset reviewed here is not the same one that was used for the MP ingestion; this document will be updated when that dataset becomes available.*\n"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 108,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/plain": [
|
|||
|
"RangeIndex(start=0, stop=1363, step=1)"
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 108,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"df_db_tools=pd.read_sql_query('SELECT * FROM TaPOR.tools where is_approved=1 order by last_updated', connection)\n",
|
|||
|
"df_db_tools.index"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
"### An example of a TAPoR item\n",
"Let's take a look at one TAPoR dataset entry (row 500 of the query result).\n",
|
|||
|
"(The database schema of the TAPoR dataset is described here: https://era.library.ualberta.ca/items/f2da0666-f523-44d4-a83c-fa06351a1e94/download/8057eae2-3fae-4afa-bc8e-6dcc2a257b6f.)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 116,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/plain": [
|
|||
|
"id 254\n",
|
|||
|
"user_id NaN\n",
|
|||
|
"name TextQuest\n",
|
|||
|
"detail <p>TextQuest is a text analysis program availa...\n",
|
|||
|
"url http://www.textquest.de/pages/en/general-infor...\n",
|
|||
|
"is_approved 1\n",
|
|||
|
"creators_name Social Science Consulting\n",
|
|||
|
"creators_email info@textquest.de\n",
|
|||
|
"creators_url http://www.textquest.de/\n",
|
|||
|
"image_url images/tools/0/254.png\n",
|
|||
|
"star_average 0\n",
|
|||
|
"is_hidden 0\n",
|
|||
|
"last_updated 2013-05-13\n",
|
|||
|
"documentation_url http://www.textquest.de/pages/en/analysis-of-t...\n",
|
|||
|
"code None\n",
|
|||
|
"repository \n",
|
|||
|
"language NaN\n",
|
|||
|
"nature 0\n",
|
|||
|
"created_at 2013-05-13 18:57:27\n",
|
|||
|
"updated_at 2017-10-31 14:25:28\n",
|
|||
|
"recipes \n",
|
|||
|
"Name: 500, dtype: object"
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 116,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"#df_db_tools.dtypes\n",
|
|||
|
"df_db_tools.iloc[500]"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
"The following table shows the first 5 records of the TAPoR dataset, sorted by name."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 111,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/html": [
|
|||
|
"<div>\n",
|
|||
|
"<style scoped>\n",
|
|||
|
" .dataframe tbody tr th:only-of-type {\n",
|
|||
|
" vertical-align: middle;\n",
|
|||
|
" }\n",
|
|||
|
"\n",
|
|||
|
" .dataframe tbody tr th {\n",
|
|||
|
" vertical-align: top;\n",
|
|||
|
" }\n",
|
|||
|
"\n",
|
|||
|
" .dataframe thead th {\n",
|
|||
|
" text-align: right;\n",
|
|||
|
" }\n",
|
|||
|
"</style>\n",
|
|||
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|||
|
" <thead>\n",
|
|||
|
" <tr style=\"text-align: right;\">\n",
|
|||
|
" <th></th>\n",
|
|||
|
" <th>id</th>\n",
|
|||
|
" <th>user_id</th>\n",
|
|||
|
" <th>name</th>\n",
|
|||
|
" <th>detail</th>\n",
|
|||
|
" <th>url</th>\n",
|
|||
|
" <th>is_approved</th>\n",
|
|||
|
" <th>creators_name</th>\n",
|
|||
|
" <th>creators_email</th>\n",
|
|||
|
" <th>creators_url</th>\n",
|
|||
|
" <th>image_url</th>\n",
|
|||
|
" <th>...</th>\n",
|
|||
|
" <th>is_hidden</th>\n",
|
|||
|
" <th>last_updated</th>\n",
|
|||
|
" <th>documentation_url</th>\n",
|
|||
|
" <th>code</th>\n",
|
|||
|
" <th>repository</th>\n",
|
|||
|
" <th>language</th>\n",
|
|||
|
" <th>nature</th>\n",
|
|||
|
" <th>created_at</th>\n",
|
|||
|
" <th>updated_at</th>\n",
|
|||
|
" <th>recipes</th>\n",
|
|||
|
" </tr>\n",
|
|||
|
" </thead>\n",
|
|||
|
" <tbody>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>906</th>\n",
|
|||
|
" <td>937</td>\n",
|
|||
|
" <td>1.0</td>\n",
|
|||
|
" <td>140kit</td>\n",
|
|||
|
" <td><p>140kit provides a management layer for twee...</td>\n",
|
|||
|
" <td>https://github.com/WebEcologyProject/140kit</td>\n",
|
|||
|
" <td>1</td>\n",
|
|||
|
" <td>Ian Pearce, Devin Gaffney</td>\n",
|
|||
|
" <td>None</td>\n",
|
|||
|
" <td>None</td>\n",
|
|||
|
" <td>images/tools/1/937.png</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>2018-10-05</td>\n",
|
|||
|
" <td>None</td>\n",
|
|||
|
" <td>None</td>\n",
|
|||
|
" <td>None</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>2015-05-24 00:00:00</td>\n",
|
|||
|
" <td>2018-10-05 04:43:34</td>\n",
|
|||
|
" <td></td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>334</th>\n",
|
|||
|
" <td>1229</td>\n",
|
|||
|
" <td>1.0</td>\n",
|
|||
|
" <td>3DVIA Virtools</td>\n",
|
|||
|
" <td><p>A software tool for the creation of 3D inte...</td>\n",
|
|||
|
" <td>None</td>\n",
|
|||
|
" <td>1</td>\n",
|
|||
|
" <td>Dassault Systemes</td>\n",
|
|||
|
" <td>None</td>\n",
|
|||
|
" <td>None</td>\n",
|
|||
|
" <td>None</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>None</td>\n",
|
|||
|
" <td>None</td>\n",
|
|||
|
" <td>None</td>\n",
|
|||
|
" <td>None</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>2014-12-29 00:00:00</td>\n",
|
|||
|
" <td>2014-12-29 00:00:00</td>\n",
|
|||
|
" <td></td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>688</th>\n",
|
|||
|
" <td>783</td>\n",
|
|||
|
" <td>1.0</td>\n",
|
|||
|
" <td>4th Dimension</td>\n",
|
|||
|
" <td>4th Dimension is a graphic environment for dev...</td>\n",
|
|||
|
" <td>http://www.4d.com/products/4d2004/4dstandarded...</td>\n",
|
|||
|
" <td>1</td>\n",
|
|||
|
" <td>4D</td>\n",
|
|||
|
" <td>None</td>\n",
|
|||
|
" <td>http://www.4d.com/</td>\n",
|
|||
|
" <td>images/tools/1/783.png</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>2018-09-18</td>\n",
|
|||
|
" <td>None</td>\n",
|
|||
|
" <td>None</td>\n",
|
|||
|
" <td>None</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>2015-05-24 00:00:00</td>\n",
|
|||
|
" <td>2018-09-18 20:39:31</td>\n",
|
|||
|
" <td></td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>1156</th>\n",
|
|||
|
" <td>648</td>\n",
|
|||
|
" <td>937.0</td>\n",
|
|||
|
" <td>80legs</td>\n",
|
|||
|
" <td>80legs is a web crawling service. You need to ...</td>\n",
|
|||
|
" <td>http://80legs.com/</td>\n",
|
|||
|
" <td>1</td>\n",
|
|||
|
" <td>80legs</td>\n",
|
|||
|
" <td></td>\n",
|
|||
|
" <td></td>\n",
|
|||
|
" <td>images/tools/1/648.png</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>2018-10-30</td>\n",
|
|||
|
" <td>None</td>\n",
|
|||
|
" <td>None</td>\n",
|
|||
|
" <td></td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>2017-10-15 23:04:46</td>\n",
|
|||
|
" <td>2018-10-30 16:03:45</td>\n",
|
|||
|
" <td></td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>770</th>\n",
|
|||
|
" <td>1454</td>\n",
|
|||
|
" <td>1.0</td>\n",
|
|||
|
" <td>960 Grid System</td>\n",
|
|||
|
" <td><p>960 Grid System is a CSS template that come...</td>\n",
|
|||
|
" <td>https://960.gs/</td>\n",
|
|||
|
" <td>1</td>\n",
|
|||
|
" <td>Nathan Smith</td>\n",
|
|||
|
" <td>None</td>\n",
|
|||
|
" <td>http://sonspring.com/</td>\n",
|
|||
|
" <td>images/tools/2/1454.png</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>2018-09-27</td>\n",
|
|||
|
" <td>None</td>\n",
|
|||
|
" <td>None</td>\n",
|
|||
|
" <td>https://github.com/nathansmith/960-Grid-System</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>2014-12-29 00:00:00</td>\n",
|
|||
|
" <td>2018-09-27 22:29:43</td>\n",
|
|||
|
" <td></td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" </tbody>\n",
|
|||
|
"</table>\n",
|
|||
|
"<p>5 rows × 21 columns</p>\n",
|
|||
|
"</div>"
|
|||
|
],
|
|||
|
"text/plain": [
|
|||
|
" id user_id name \\\n",
|
|||
|
"906 937 1.0 140kit \n",
|
|||
|
"334 1229 1.0 3DVIA Virtools \n",
|
|||
|
"688 783 1.0 4th Dimension \n",
|
|||
|
"1156 648 937.0 80legs \n",
|
|||
|
"770 1454 1.0 960 Grid System \n",
|
|||
|
"\n",
|
|||
|
" detail \\\n",
|
|||
|
"906 <p>140kit provides a management layer for twee... \n",
|
|||
|
"334 <p>A software tool for the creation of 3D inte... \n",
|
|||
|
"688 4th Dimension is a graphic environment for dev... \n",
|
|||
|
"1156 80legs is a web crawling service. You need to ... \n",
|
|||
|
"770 <p>960 Grid System is a CSS template that come... \n",
|
|||
|
"\n",
|
|||
|
" url is_approved \\\n",
|
|||
|
"906 https://github.com/WebEcologyProject/140kit 1 \n",
|
|||
|
"334 None 1 \n",
|
|||
|
"688 http://www.4d.com/products/4d2004/4dstandarded... 1 \n",
|
|||
|
"1156 http://80legs.com/ 1 \n",
|
|||
|
"770 https://960.gs/ 1 \n",
|
|||
|
"\n",
|
|||
|
" creators_name creators_email creators_url \\\n",
|
|||
|
"906 Ian Pearce, Devin Gaffney None None \n",
|
|||
|
"334 Dassault Systemes None None \n",
|
|||
|
"688 4D None http://www.4d.com/ \n",
|
|||
|
"1156 80legs \n",
|
|||
|
"770 Nathan Smith None http://sonspring.com/ \n",
|
|||
|
"\n",
|
|||
|
" image_url ... is_hidden last_updated documentation_url \\\n",
|
|||
|
"906 images/tools/1/937.png ... 0 2018-10-05 None \n",
|
|||
|
"334 None ... 0 None None \n",
|
|||
|
"688 images/tools/1/783.png ... 0 2018-09-18 None \n",
|
|||
|
"1156 images/tools/1/648.png ... 0 2018-10-30 None \n",
|
|||
|
"770 images/tools/2/1454.png ... 0 2018-09-27 None \n",
|
|||
|
"\n",
|
|||
|
" code repository language nature \\\n",
|
|||
|
"906 None None NaN 0 \n",
|
|||
|
"334 None None NaN 0 \n",
|
|||
|
"688 None None NaN 0 \n",
|
|||
|
"1156 None NaN 0 \n",
|
|||
|
"770 None https://github.com/nathansmith/960-Grid-System NaN 0 \n",
|
|||
|
"\n",
|
|||
|
" created_at updated_at recipes \n",
|
|||
|
"906 2015-05-24 00:00:00 2018-10-05 04:43:34 \n",
|
|||
|
"334 2014-12-29 00:00:00 2014-12-29 00:00:00 \n",
|
|||
|
"688 2015-05-24 00:00:00 2018-09-18 20:39:31 \n",
|
|||
|
"1156 2017-10-15 23:04:46 2018-10-30 16:03:45 \n",
|
|||
|
"770 2014-12-29 00:00:00 2018-09-27 22:29:43 \n",
|
|||
|
"\n",
|
|||
|
"[5 rows x 21 columns]"
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 111,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"df_db_tools.sort_values('name').head(5)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"### Check for duplicates in TAPoR dataset\n",
"Considering the values of 'name' and 'url', the TAPoR dataset appears to contain 4 duplicated descriptions."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 117,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/html": [
|
|||
|
"<div>\n",
|
|||
|
"<style scoped>\n",
|
|||
|
" .dataframe tbody tr th:only-of-type {\n",
|
|||
|
" vertical-align: middle;\n",
|
|||
|
" }\n",
|
|||
|
"\n",
|
|||
|
" .dataframe tbody tr th {\n",
|
|||
|
" vertical-align: top;\n",
|
|||
|
" }\n",
|
|||
|
"\n",
|
|||
|
" .dataframe thead th {\n",
|
|||
|
" text-align: right;\n",
|
|||
|
" }\n",
|
|||
|
"</style>\n",
|
|||
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|||
|
" <thead>\n",
|
|||
|
" <tr style=\"text-align: right;\">\n",
|
|||
|
" <th></th>\n",
|
|||
|
" <th>id</th>\n",
|
|||
|
" <th>user_id</th>\n",
|
|||
|
" <th>name</th>\n",
|
|||
|
" <th>detail</th>\n",
|
|||
|
" <th>url</th>\n",
|
|||
|
" <th>is_approved</th>\n",
|
|||
|
" <th>creators_name</th>\n",
|
|||
|
" <th>creators_email</th>\n",
|
|||
|
" <th>creators_url</th>\n",
|
|||
|
" <th>image_url</th>\n",
|
|||
|
" <th>...</th>\n",
|
|||
|
" <th>is_hidden</th>\n",
|
|||
|
" <th>last_updated</th>\n",
|
|||
|
" <th>documentation_url</th>\n",
|
|||
|
" <th>code</th>\n",
|
|||
|
" <th>repository</th>\n",
|
|||
|
" <th>language</th>\n",
|
|||
|
" <th>nature</th>\n",
|
|||
|
" <th>created_at</th>\n",
|
|||
|
" <th>updated_at</th>\n",
|
|||
|
" <th>recipes</th>\n",
|
|||
|
" </tr>\n",
|
|||
|
" </thead>\n",
|
|||
|
" <tbody>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>1358</th>\n",
|
|||
|
" <td>148</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>AntConc</td>\n",
|
|||
|
" <td>AntConc is free concordance software. It is mu...</td>\n",
|
|||
|
" <td>http://www.laurenceanthony.net/software/antconc/</td>\n",
|
|||
|
" <td>1</td>\n",
|
|||
|
" <td>Laurence Anthony</td>\n",
|
|||
|
" <td>anthony@waseda.jp</td>\n",
|
|||
|
" <td>http://www.antlab.sci.waseda.ac.jp/index.html</td>\n",
|
|||
|
" <td>images/tools/0/148.png</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>2019-08-19</td>\n",
|
|||
|
" <td>http://www.laurenceanthony.net/software/antcon...</td>\n",
|
|||
|
" <td>None</td>\n",
|
|||
|
" <td></td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>2012-07-30 18:25:44</td>\n",
|
|||
|
" <td>2019-08-19 00:37:45</td>\n",
|
|||
|
" <td></td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>1362</th>\n",
|
|||
|
" <td>1565</td>\n",
|
|||
|
" <td>1201.0</td>\n",
|
|||
|
" <td>SentiStrength</td>\n",
|
|||
|
" <td>SentiStrength is a sentiment analysis (opinion...</td>\n",
|
|||
|
" <td>http://sentistrength.wlv.ac.uk/</td>\n",
|
|||
|
" <td>1</td>\n",
|
|||
|
" <td>Mike Thelwall</td>\n",
|
|||
|
" <td>m.thelwall@wlv.ac.uk</td>\n",
|
|||
|
" <td>http://sentistrength.wlv.ac.uk</td>\n",
|
|||
|
" <td>images/tools/3/1565.png</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>2019-09-27</td>\n",
|
|||
|
" <td>None</td>\n",
|
|||
|
" <td>None</td>\n",
|
|||
|
" <td></td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>2019-09-20 05:03:47</td>\n",
|
|||
|
" <td>2019-09-27 10:03:35</td>\n",
|
|||
|
" <td></td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>652</th>\n",
|
|||
|
" <td>580</td>\n",
|
|||
|
" <td>937.0</td>\n",
|
|||
|
" <td>Voyant 2.0: Knots</td>\n",
|
|||
|
" <td>Voyant Knots is a visualization where a line i...</td>\n",
|
|||
|
" <td>http://voyant-tools.org/?view=knots</td>\n",
|
|||
|
" <td>1</td>\n",
|
|||
|
" <td>Stéfan Sinclair and Geoffrey Rockwell</td>\n",
|
|||
|
" <td>stefan.sinclair@mcgill.ca</td>\n",
|
|||
|
" <td>http://stefansinclair.name/</td>\n",
|
|||
|
" <td>images/tools/1/580.png</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>1</td>\n",
|
|||
|
" <td>2016-04-29</td>\n",
|
|||
|
" <td>None</td>\n",
|
|||
|
" <td>None</td>\n",
|
|||
|
" <td></td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>2016-04-29 16:08:28</td>\n",
|
|||
|
" <td>2017-10-31 14:26:36</td>\n",
|
|||
|
" <td></td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>653</th>\n",
|
|||
|
" <td>581</td>\n",
|
|||
|
" <td>937.0</td>\n",
|
|||
|
" <td>Voyant 2.0: Knots</td>\n",
|
|||
|
" <td>Voyant Knots is a visualization where a line i...</td>\n",
|
|||
|
" <td>http://voyant-tools.org/?view=knots</td>\n",
|
|||
|
" <td>1</td>\n",
|
|||
|
" <td>Stéfan Sinclair and Geoffrey Rockwell</td>\n",
|
|||
|
" <td>stefan.sinclair@mcgill.ca</td>\n",
|
|||
|
" <td>http://stefansinclair.name/</td>\n",
|
|||
|
" <td>images/tools/1/581.png</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>2016-04-29</td>\n",
|
|||
|
" <td>None</td>\n",
|
|||
|
" <td>None</td>\n",
|
|||
|
" <td></td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>2016-04-29 16:11:55</td>\n",
|
|||
|
" <td>2017-10-31 14:26:36</td>\n",
|
|||
|
" <td></td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" </tbody>\n",
|
|||
|
"</table>\n",
|
|||
|
"<p>4 rows × 21 columns</p>\n",
|
|||
|
"</div>"
|
|||
|
],
|
|||
|
"text/plain": [
|
|||
|
" id user_id name \\\n",
|
|||
|
"1358 148 NaN AntConc \n",
|
|||
|
"1362 1565 1201.0 SentiStrength \n",
|
|||
|
"652 580 937.0 Voyant 2.0: Knots \n",
|
|||
|
"653 581 937.0 Voyant 2.0: Knots \n",
|
|||
|
"\n",
|
|||
|
" detail \\\n",
|
|||
|
"1358 AntConc is free concordance software. It is mu... \n",
|
|||
|
"1362 SentiStrength is a sentiment analysis (opinion... \n",
|
|||
|
"652 Voyant Knots is a visualization where a line i... \n",
|
|||
|
"653 Voyant Knots is a visualization where a line i... \n",
|
|||
|
"\n",
|
|||
|
" url is_approved \\\n",
|
|||
|
"1358 http://www.laurenceanthony.net/software/antconc/ 1 \n",
|
|||
|
"1362 http://sentistrength.wlv.ac.uk/ 1 \n",
|
|||
|
"652 http://voyant-tools.org/?view=knots 1 \n",
|
|||
|
"653 http://voyant-tools.org/?view=knots 1 \n",
|
|||
|
"\n",
|
|||
|
" creators_name creators_email \\\n",
|
|||
|
"1358 Laurence Anthony anthony@waseda.jp \n",
|
|||
|
"1362 Mike Thelwall m.thelwall@wlv.ac.uk \n",
|
|||
|
"652 Stéfan Sinclair and Geoffrey Rockwell stefan.sinclair@mcgill.ca \n",
|
|||
|
"653 Stéfan Sinclair and Geoffrey Rockwell stefan.sinclair@mcgill.ca \n",
|
|||
|
"\n",
|
|||
|
" creators_url image_url \\\n",
|
|||
|
"1358 http://www.antlab.sci.waseda.ac.jp/index.html images/tools/0/148.png \n",
|
|||
|
"1362 http://sentistrength.wlv.ac.uk images/tools/3/1565.png \n",
|
|||
|
"652 http://stefansinclair.name/ images/tools/1/580.png \n",
|
|||
|
"653 http://stefansinclair.name/ images/tools/1/581.png \n",
|
|||
|
"\n",
|
|||
|
" ... is_hidden last_updated \\\n",
|
|||
|
"1358 ... 0 2019-08-19 \n",
|
|||
|
"1362 ... 0 2019-09-27 \n",
|
|||
|
"652 ... 1 2016-04-29 \n",
|
|||
|
"653 ... 0 2016-04-29 \n",
|
|||
|
"\n",
|
|||
|
" documentation_url code repository \\\n",
|
|||
|
"1358 http://www.laurenceanthony.net/software/antcon... None \n",
|
|||
|
"1362 None None \n",
|
|||
|
"652 None None \n",
|
|||
|
"653 None None \n",
|
|||
|
"\n",
|
|||
|
" language nature created_at updated_at recipes \n",
|
|||
|
"1358 NaN 0 2012-07-30 18:25:44 2019-08-19 00:37:45 \n",
|
|||
|
"1362 NaN 0 2019-09-20 05:03:47 2019-09-27 10:03:35 \n",
|
|||
|
"652 NaN 0 2016-04-29 16:08:28 2017-10-31 14:26:36 \n",
|
|||
|
"653 NaN 0 2016-04-29 16:11:55 2017-10-31 14:26:36 \n",
|
|||
|
"\n",
|
|||
|
"[4 rows x 21 columns]"
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 117,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"duplicateRowsDF0 = df_db_tools[df_db_tools.duplicated(['name', 'url'])].sort_values('name')\n",
|
|||
|
"#print(\"The (possibly) duplicated items in TAPoR dataset:\")\n",
|
|||
|
"duplicateRowsDF0.head(15)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"## Get the ingested TAPoR data in the Market Place (using the API)\n",
|
|||
|
"\n",
"The SSHOC Market Place API endpoint:\n",
|
|||
|
"\n",
|
|||
|
" https://sshoc-marketplace-api.acdh-dev.oeaw.ac.at/api/tools\n",
|
|||
|
"\n",
"was used to extract the TAPoR descriptions imported into the SSHOC Market Place. In the rest of this document, this dataset is called the **MP dataset**."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 7,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/plain": [
|
|||
|
"RangeIndex(start=0, stop=1353, step=1)"
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 7,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
"# Fetch all 68 result pages from the tools endpoint, 20 items per page\n",
"base_url = 'https://sshoc-marketplace-api.acdh-dev.oeaw.ac.at/api/tools'\n",
"pages = [\n",
"    pd.read_json(f'{base_url}?page={page}&perpage=20', orient='columns')\n",
"    for page in range(1, 69)\n",
"]\n",
"# DataFrame.append was removed in pandas 2.0, so concatenate the pages instead\n",
"df_tool_all = pd.concat(pages, ignore_index=True)\n",
"df_tool_all.index"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
"There are 1353 tool descriptions in the MP dataset."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
"Let's take a look at row 500 of the MP dataset."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 113,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/plain": [
|
|||
|
"id 1388\n",
|
|||
|
"category tool\n",
|
|||
|
"label InTEXT\n",
|
|||
|
"version None\n",
|
|||
|
"description InTEXT is a legacy, commercial suite of progra...\n",
|
|||
|
"licenses []\n",
|
|||
|
"contributors [{'actor': {'id': 956, 'name': 'InTEXT Systems...\n",
|
|||
|
"properties [{'id': 14091, 'type': {'code': 'tadirah-metho...\n",
|
|||
|
"accessibleAt http://intext.com/\n",
|
|||
|
"sourceItemId 247\n",
|
|||
|
"relatedItems []\n",
|
|||
|
"informationContributors [{'id': 4, 'username': 'System importer', 'dis...\n",
|
|||
|
"lastInfoUpdate 2020-06-28T18:25:58+0000\n",
|
|||
|
"status ingested\n",
|
|||
|
"comments []\n",
|
|||
|
"olderVersions []\n",
|
|||
|
"newerVersions []\n",
|
|||
|
"repository None\n",
|
|||
|
"source.id 1\n",
|
|||
|
"source.label TAPoR\n",
|
|||
|
"source.url http://tapor.ca\n",
|
|||
|
"source.urlTemplate http://tapor.ca/tools/{source-item-id}\n",
|
|||
|
"source NaN\n",
|
|||
|
"Name: 500, dtype: object"
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 113,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
"# Descriptions are nested JSON; flatten them into a DataFrame\n",
|
|||
|
"df_tool_flat = pd.json_normalize(df_tool_all['tools'])\n",
|
|||
|
"df_tool_flat.iloc[500]\n",
|
|||
|
"#df_tool_flat.sort_values('label').head(10)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
"The flattened MP dataset also contains 1353 tool descriptions."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 9,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/plain": [
|
|||
|
"RangeIndex(start=0, stop=1353, step=1)"
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 9,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"df_tool_flat.index"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
"#### Considering the values of 'label' and 'accessibleAt', the MP dataset appears to contain 9 duplicated descriptions"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 94,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/html": [
|
|||
|
"<div>\n",
|
|||
|
"<style scoped>\n",
|
|||
|
" .dataframe tbody tr th:only-of-type {\n",
|
|||
|
" vertical-align: middle;\n",
|
|||
|
" }\n",
|
|||
|
"\n",
|
|||
|
" .dataframe tbody tr th {\n",
|
|||
|
" vertical-align: top;\n",
|
|||
|
" }\n",
|
|||
|
"\n",
|
|||
|
" .dataframe thead th {\n",
|
|||
|
" text-align: right;\n",
|
|||
|
" }\n",
|
|||
|
"</style>\n",
|
|||
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|||
|
" <thead>\n",
|
|||
|
" <tr style=\"text-align: right;\">\n",
|
|||
|
" <th></th>\n",
|
|||
|
" <th>id</th>\n",
|
|||
|
" <th>category</th>\n",
|
|||
|
" <th>label</th>\n",
|
|||
|
" <th>version</th>\n",
|
|||
|
" <th>description</th>\n",
|
|||
|
" <th>licenses</th>\n",
|
|||
|
" <th>contributors</th>\n",
|
|||
|
" <th>properties</th>\n",
|
|||
|
" <th>accessibleAt</th>\n",
|
|||
|
" <th>sourceItemId</th>\n",
|
|||
|
" <th>...</th>\n",
|
|||
|
" <th>status</th>\n",
|
|||
|
" <th>comments</th>\n",
|
|||
|
" <th>olderVersions</th>\n",
|
|||
|
" <th>newerVersions</th>\n",
|
|||
|
" <th>repository</th>\n",
|
|||
|
" <th>source.id</th>\n",
|
|||
|
" <th>source.label</th>\n",
|
|||
|
" <th>source.url</th>\n",
|
|||
|
" <th>source.urlTemplate</th>\n",
|
|||
|
" <th>source</th>\n",
|
|||
|
" </tr>\n",
|
|||
|
" </thead>\n",
|
|||
|
" <tbody>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>326</th>\n",
|
|||
|
" <td>335</td>\n",
|
|||
|
" <td>tool</td>\n",
|
|||
|
" <td>EVI-LINHD</td>\n",
|
|||
|
" <td>None</td>\n",
|
|||
|
" <td>EVI-LINHD is a free and open-source cloud plat...</td>\n",
|
|||
|
" <td>[]</td>\n",
|
|||
|
" <td>[{'actor': {'id': 275, 'name': 'Elena González...</td>\n",
|
|||
|
" <td>[{'id': 2702, 'type': {'code': 'thumbnail', 'l...</td>\n",
|
|||
|
" <td>http://www.evilinhd.com/</td>\n",
|
|||
|
" <td>594</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>ingested</td>\n",
|
|||
|
" <td>[]</td>\n",
|
|||
|
" <td>[]</td>\n",
|
|||
|
" <td>[]</td>\n",
|
|||
|
" <td>None</td>\n",
|
|||
|
" <td>1.0</td>\n",
|
|||
|
" <td>TAPoR</td>\n",
|
|||
|
" <td>http://tapor.ca</td>\n",
|
|||
|
" <td>http://tapor.ca/tools/{source-item-id}</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>532</th>\n",
|
|||
|
" <td>776</td>\n",
|
|||
|
" <td>tool</td>\n",
|
|||
|
" <td>JSAN</td>\n",
|
|||
|
" <td>None</td>\n",
|
|||
|
" <td>The Integrated JStylo and Anonymouth Package. ...</td>\n",
|
|||
|
" <td>[]</td>\n",
|
|||
|
" <td>[{'actor': {'id': 493, 'name': '18th Connect',...</td>\n",
|
|||
|
" <td>[{'id': 7310, 'type': {'code': 'thumbnail', 'l...</td>\n",
|
|||
|
" <td>https://github.com/psal/jstylo</td>\n",
|
|||
|
" <td>1559</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>ingested</td>\n",
|
|||
|
" <td>[]</td>\n",
|
|||
|
" <td>[]</td>\n",
|
|||
|
" <td>[]</td>\n",
|
|||
|
" <td>None</td>\n",
|
|||
|
" <td>1.0</td>\n",
|
|||
|
" <td>TAPoR</td>\n",
|
|||
|
" <td>http://tapor.ca</td>\n",
|
|||
|
" <td>http://tapor.ca/tools/{source-item-id}</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>533</th>\n",
|
|||
|
" <td>451</td>\n",
|
|||
|
" <td>tool</td>\n",
|
|||
|
" <td>JSAN</td>\n",
|
|||
|
" <td>None</td>\n",
|
|||
|
" <td>The Integrated JStylo and Anonymouth Package. ...</td>\n",
|
|||
|
" <td>[]</td>\n",
|
|||
|
" <td>[{'actor': {'id': 493, 'name': '18th Connect',...</td>\n",
|
|||
|
" <td>[{'id': 4037, 'type': {'code': 'keyword', 'lab...</td>\n",
|
|||
|
" <td>https://github.com/psal/jstylo</td>\n",
|
|||
|
" <td>1557</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>ingested</td>\n",
|
|||
|
" <td>[]</td>\n",
|
|||
|
" <td>[]</td>\n",
|
|||
|
" <td>[]</td>\n",
|
|||
|
" <td>None</td>\n",
|
|||
|
" <td>1.0</td>\n",
|
|||
|
" <td>TAPoR</td>\n",
|
|||
|
" <td>http://tapor.ca</td>\n",
|
|||
|
" <td>http://tapor.ca/tools/{source-item-id}</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>697</th>\n",
|
|||
|
" <td>1186</td>\n",
|
|||
|
" <td>tool</td>\n",
|
|||
|
" <td>NodeXL</td>\n",
|
|||
|
" <td>None</td>\n",
|
|||
|
" <td>NodeXL is a free, open source tool for generat...</td>\n",
|
|||
|
" <td>[]</td>\n",
|
|||
|
" <td>[{'actor': {'id': 832, 'name': 'M. Smith, N. M...</td>\n",
|
|||
|
" <td>[{'id': 11766, 'type': {'code': 'license-type'...</td>\n",
|
|||
|
" <td>http://nodexl.codeplex.com/</td>\n",
|
|||
|
" <td>482</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>ingested</td>\n",
|
|||
|
" <td>[]</td>\n",
|
|||
|
" <td>[]</td>\n",
|
|||
|
" <td>[]</td>\n",
|
|||
|
" <td>None</td>\n",
|
|||
|
" <td>1.0</td>\n",
|
|||
|
" <td>TAPoR</td>\n",
|
|||
|
" <td>http://tapor.ca</td>\n",
|
|||
|
" <td>http://tapor.ca/tools/{source-item-id}</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>854</th>\n",
|
|||
|
" <td>560</td>\n",
|
|||
|
" <td>tool</td>\n",
|
|||
|
" <td>Python Tools for Text-Analysis</td>\n",
|
|||
|
" <td>None</td>\n",
|
|||
|
" <td>This is a set of simple, free tools for analyz...</td>\n",
|
|||
|
" <td>[]</td>\n",
|
|||
|
" <td>[{'actor': {'id': 424, 'name': 'David L. Hoove...</td>\n",
|
|||
|
" <td>[{'id': 5060, 'type': {'code': 'thumbnail', 'l...</td>\n",
|
|||
|
" <td>https://wp.nyu.edu/exceltextanalysis/python_to...</td>\n",
|
|||
|
" <td>1507</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>ingested</td>\n",
|
|||
|
" <td>[]</td>\n",
|
|||
|
" <td>[]</td>\n",
|
|||
|
" <td>[]</td>\n",
|
|||
|
" <td>None</td>\n",
|
|||
|
" <td>1.0</td>\n",
|
|||
|
" <td>TAPoR</td>\n",
|
|||
|
" <td>http://tapor.ca</td>\n",
|
|||
|
" <td>http://tapor.ca/tools/{source-item-id}</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>947</th>\n",
|
|||
|
" <td>1136</td>\n",
|
|||
|
" <td>tool</td>\n",
|
|||
|
" <td>SentiStrength</td>\n",
|
|||
|
" <td>None</td>\n",
|
|||
|
" <td>SentiStrength is a tool for sentiment analysis...</td>\n",
|
|||
|
" <td>[]</td>\n",
|
|||
|
" <td>[{'actor': {'id': 799, 'name': 'Thelwall, M., ...</td>\n",
|
|||
|
" <td>[{'id': 11290, 'type': {'code': 'keyword', 'la...</td>\n",
|
|||
|
" <td>http://sentistrength.wlv.ac.uk/</td>\n",
|
|||
|
" <td>453</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>ingested</td>\n",
|
|||
|
" <td>[]</td>\n",
|
|||
|
" <td>[]</td>\n",
|
|||
|
" <td>[]</td>\n",
|
|||
|
" <td>None</td>\n",
|
|||
|
" <td>1.0</td>\n",
|
|||
|
" <td>TAPoR</td>\n",
|
|||
|
" <td>http://tapor.ca</td>\n",
|
|||
|
" <td>http://tapor.ca/tools/{source-item-id}</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>948</th>\n",
|
|||
|
" <td>378</td>\n",
|
|||
|
" <td>tool</td>\n",
|
|||
|
" <td>SentiStrength</td>\n",
|
|||
|
" <td>None</td>\n",
|
|||
|
" <td>It is a sentiment analysis program. Automatic ...</td>\n",
|
|||
|
" <td>[]</td>\n",
|
|||
|
" <td>[{'actor': {'id': 493, 'name': '18th Connect',...</td>\n",
|
|||
|
" <td>[{'id': 3210, 'type': {'code': 'thumbnail', 'l...</td>\n",
|
|||
|
" <td>http://sentistrength.wlv.ac.uk/</td>\n",
|
|||
|
" <td>1564</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>ingested</td>\n",
|
|||
|
" <td>[]</td>\n",
|
|||
|
" <td>[]</td>\n",
|
|||
|
" <td>[]</td>\n",
|
|||
|
" <td>None</td>\n",
|
|||
|
" <td>1.0</td>\n",
|
|||
|
" <td>TAPoR</td>\n",
|
|||
|
" <td>http://tapor.ca</td>\n",
|
|||
|
" <td>http://tapor.ca/tools/{source-item-id}</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>1187</th>\n",
|
|||
|
" <td>607</td>\n",
|
|||
|
" <td>tool</td>\n",
|
|||
|
" <td>UCINET</td>\n",
|
|||
|
" <td>None</td>\n",
|
|||
|
" <td>UCINET is a social media analysis set for soft...</td>\n",
|
|||
|
" <td>[]</td>\n",
|
|||
|
" <td>[{'actor': {'id': 459, 'name': 'Borgatti, S.P....</td>\n",
|
|||
|
" <td>[{'id': 5501, 'type': {'code': 'tadirah-method...</td>\n",
|
|||
|
" <td>https://sites.google.com/site/ucinetsoftware/home</td>\n",
|
|||
|
" <td>576</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>ingested</td>\n",
|
|||
|
" <td>[]</td>\n",
|
|||
|
" <td>[]</td>\n",
|
|||
|
" <td>[]</td>\n",
|
|||
|
" <td>None</td>\n",
|
|||
|
" <td>1.0</td>\n",
|
|||
|
" <td>TAPoR</td>\n",
|
|||
|
" <td>http://tapor.ca</td>\n",
|
|||
|
" <td>http://tapor.ca/tools/{source-item-id}</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>476</th>\n",
|
|||
|
" <td>165</td>\n",
|
|||
|
" <td>tool</td>\n",
|
|||
|
" <td>igraph</td>\n",
|
|||
|
" <td>None</td>\n",
|
|||
|
" <td>igraph is an open source collection of network...</td>\n",
|
|||
|
" <td>[]</td>\n",
|
|||
|
" <td>[{'actor': {'id': 147, 'name': 'Gábor Csárdi, ...</td>\n",
|
|||
|
" <td>[{'id': 771, 'type': {'code': 'tadirah-methods...</td>\n",
|
|||
|
" <td>http://igraph.org/</td>\n",
|
|||
|
" <td>623</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>ingested</td>\n",
|
|||
|
" <td>[]</td>\n",
|
|||
|
" <td>[]</td>\n",
|
|||
|
" <td>[]</td>\n",
|
|||
|
" <td>None</td>\n",
|
|||
|
" <td>1.0</td>\n",
|
|||
|
" <td>TAPoR</td>\n",
|
|||
|
" <td>http://tapor.ca</td>\n",
|
|||
|
" <td>http://tapor.ca/tools/{source-item-id}</td>\n",
|
|||
|
" <td>NaN</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" </tbody>\n",
|
|||
|
"</table>\n",
|
|||
|
"<p>9 rows × 23 columns</p>\n",
|
|||
|
"</div>"
|
|||
|
],
|
|||
|
"text/plain": [
|
|||
|
" id category label version \\\n",
|
|||
|
"326 335 tool EVI-LINHD None \n",
|
|||
|
"532 776 tool JSAN None \n",
|
|||
|
"533 451 tool JSAN None \n",
|
|||
|
"697 1186 tool NodeXL None \n",
|
|||
|
"854 560 tool Python Tools for Text-Analysis None \n",
|
|||
|
"947 1136 tool SentiStrength None \n",
|
|||
|
"948 378 tool SentiStrength None \n",
|
|||
|
"1187 607 tool UCINET None \n",
|
|||
|
"476 165 tool igraph None \n",
|
|||
|
"\n",
|
|||
|
" description licenses \\\n",
|
|||
|
"326 EVI-LINHD is a free and open-source cloud plat... [] \n",
|
|||
|
"532 The Integrated JStylo and Anonymouth Package. ... [] \n",
|
|||
|
"533 The Integrated JStylo and Anonymouth Package. ... [] \n",
|
|||
|
"697 NodeXL is a free, open source tool for generat... [] \n",
|
|||
|
"854 This is a set of simple, free tools for analyz... [] \n",
|
|||
|
"947 SentiStrength is a tool for sentiment analysis... [] \n",
|
|||
|
"948 It is a sentiment analysis program. Automatic ... [] \n",
|
|||
|
"1187 UCINET is a social media analysis set for soft... [] \n",
|
|||
|
"476 igraph is an open source collection of network... [] \n",
|
|||
|
"\n",
|
|||
|
" contributors \\\n",
|
|||
|
"326 [{'actor': {'id': 275, 'name': 'Elena González... \n",
|
|||
|
"532 [{'actor': {'id': 493, 'name': '18th Connect',... \n",
|
|||
|
"533 [{'actor': {'id': 493, 'name': '18th Connect',... \n",
|
|||
|
"697 [{'actor': {'id': 832, 'name': 'M. Smith, N. M... \n",
|
|||
|
"854 [{'actor': {'id': 424, 'name': 'David L. Hoove... \n",
|
|||
|
"947 [{'actor': {'id': 799, 'name': 'Thelwall, M., ... \n",
|
|||
|
"948 [{'actor': {'id': 493, 'name': '18th Connect',... \n",
|
|||
|
"1187 [{'actor': {'id': 459, 'name': 'Borgatti, S.P.... \n",
|
|||
|
"476 [{'actor': {'id': 147, 'name': 'Gábor Csárdi, ... \n",
|
|||
|
"\n",
|
|||
|
" properties \\\n",
|
|||
|
"326 [{'id': 2702, 'type': {'code': 'thumbnail', 'l... \n",
|
|||
|
"532 [{'id': 7310, 'type': {'code': 'thumbnail', 'l... \n",
|
|||
|
"533 [{'id': 4037, 'type': {'code': 'keyword', 'lab... \n",
|
|||
|
"697 [{'id': 11766, 'type': {'code': 'license-type'... \n",
|
|||
|
"854 [{'id': 5060, 'type': {'code': 'thumbnail', 'l... \n",
|
|||
|
"947 [{'id': 11290, 'type': {'code': 'keyword', 'la... \n",
|
|||
|
"948 [{'id': 3210, 'type': {'code': 'thumbnail', 'l... \n",
|
|||
|
"1187 [{'id': 5501, 'type': {'code': 'tadirah-method... \n",
|
|||
|
"476 [{'id': 771, 'type': {'code': 'tadirah-methods... \n",
|
|||
|
"\n",
|
|||
|
" accessibleAt sourceItemId ... \\\n",
|
|||
|
"326 http://www.evilinhd.com/ 594 ... \n",
|
|||
|
"532 https://github.com/psal/jstylo 1559 ... \n",
|
|||
|
"533 https://github.com/psal/jstylo 1557 ... \n",
|
|||
|
"697 http://nodexl.codeplex.com/ 482 ... \n",
|
|||
|
"854 https://wp.nyu.edu/exceltextanalysis/python_to... 1507 ... \n",
|
|||
|
"947 http://sentistrength.wlv.ac.uk/ 453 ... \n",
|
|||
|
"948 http://sentistrength.wlv.ac.uk/ 1564 ... \n",
|
|||
|
"1187 https://sites.google.com/site/ucinetsoftware/home 576 ... \n",
|
|||
|
"476 http://igraph.org/ 623 ... \n",
|
|||
|
"\n",
|
|||
|
" status comments olderVersions newerVersions repository source.id \\\n",
|
|||
|
"326 ingested [] [] [] None 1.0 \n",
|
|||
|
"532 ingested [] [] [] None 1.0 \n",
|
|||
|
"533 ingested [] [] [] None 1.0 \n",
|
|||
|
"697 ingested [] [] [] None 1.0 \n",
|
|||
|
"854 ingested [] [] [] None 1.0 \n",
|
|||
|
"947 ingested [] [] [] None 1.0 \n",
|
|||
|
"948 ingested [] [] [] None 1.0 \n",
|
|||
|
"1187 ingested [] [] [] None 1.0 \n",
|
|||
|
"476 ingested [] [] [] None 1.0 \n",
|
|||
|
"\n",
|
|||
|
" source.label source.url source.urlTemplate \\\n",
|
|||
|
"326 TAPoR http://tapor.ca http://tapor.ca/tools/{source-item-id} \n",
|
|||
|
"532 TAPoR http://tapor.ca http://tapor.ca/tools/{source-item-id} \n",
|
|||
|
"533 TAPoR http://tapor.ca http://tapor.ca/tools/{source-item-id} \n",
|
|||
|
"697 TAPoR http://tapor.ca http://tapor.ca/tools/{source-item-id} \n",
|
|||
|
"854 TAPoR http://tapor.ca http://tapor.ca/tools/{source-item-id} \n",
|
|||
|
"947 TAPoR http://tapor.ca http://tapor.ca/tools/{source-item-id} \n",
|
|||
|
"948 TAPoR http://tapor.ca http://tapor.ca/tools/{source-item-id} \n",
|
|||
|
"1187 TAPoR http://tapor.ca http://tapor.ca/tools/{source-item-id} \n",
|
|||
|
"476 TAPoR http://tapor.ca http://tapor.ca/tools/{source-item-id} \n",
|
|||
|
"\n",
|
|||
|
" source \n",
|
|||
|
"326 NaN \n",
|
|||
|
"532 NaN \n",
|
|||
|
"533 NaN \n",
|
|||
|
"697 NaN \n",
|
|||
|
"854 NaN \n",
|
|||
|
"947 NaN \n",
|
|||
|
"948 NaN \n",
|
|||
|
"1187 NaN \n",
|
|||
|
"476 NaN \n",
|
|||
|
"\n",
|
|||
|
"[9 rows x 23 columns]"
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 94,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"test_p_d=df_tool_flat[df_tool_flat.duplicated(['label', 'accessibleAt'])].sort_values('label')\n",
|
|||
|
"test_p_d"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 11,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"#df_tool_flat.dtypes "
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 12,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/plain": [
|
|||
|
"0 https://github.com/WebEcologyProject/140kit\n",
|
|||
|
"1 \n",
|
|||
|
"2 http://www.4d.com/products/4d2004/4dstandarded...\n",
|
|||
|
"3 http://80legs.com/\n",
|
|||
|
"4 https://960.gs/\n",
|
|||
|
" ... \n",
|
|||
|
"1348 \n",
|
|||
|
"1349 https://www.zotero.org/\n",
|
|||
|
"1350 http://zotfile.com/\n",
|
|||
|
"1351 https://wordpress.org/plugins/zotpress/\n",
|
|||
|
"1352 http://www.zubrag.com/tools/html-tags-stripper...\n",
|
|||
|
"Name: accessibleAt, Length: 1353, dtype: object"
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 12,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"df_tool_flat['accessibleAt'] = df_tool_flat['accessibleAt'].replace(np.nan, \"\")\n",
|
|||
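|
"# display only: the result below is not assigned back, so df_tool_flat is left unchanged\n",
|
|||
|
"df_tool_flat['accessibleAt'].replace(r'^\\s*$', \"\", regex=True)\n",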
|
|||
|
"#df_tool_flat['accessibleAt'].isnull()"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 13,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"#dataframe for MP properties\n",
|
|||
|
"df_prop_data = pd.json_normalize(data=df_tool_all['tools'], record_path='properties', meta=['label'])\n",
|
|||
|
"#df_prop_data.head(10)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 14,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"#dataframe for MP contributors\n",
|
|||
|
"df_contr_data = pd.json_normalize(data=df_tool_all['tools'], record_path='contributors', meta=['label'])\n",
|
|||
|
"#df_contr_data.head(10)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 15,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"df_mpdatasets=df_tool_flat.join(df_contr_data.set_index('label'), on='label')\n",
|
|||
|
"#df_mpdatasets.head()"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"### Comparing the TAPoR dataset and the MP dataset to find import issues"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 16,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/plain": [
|
|||
|
"906 https://github.com/WebEcologyProject/140kit\n",
|
|||
|
"334 \n",
|
|||
|
"688 http://www.4d.com/products/4d2004/4dstandarded...\n",
|
|||
|
"1156 http://80legs.com/\n",
|
|||
|
"770 https://960.gs/\n",
|
|||
|
" ... \n",
|
|||
|
"816 http://www.jasondavies.com/wordtree/\n",
|
|||
|
"520 http://code.google.com/p/word2vec/\n",
|
|||
|
"815 https://code.google.com/p/wordsimilarity/\n",
|
|||
|
"702 http://www.tei-c.org/Vault/MembersMeetings/200...\n",
|
|||
|
"45 \n",
|
|||
|
"Name: url, Length: 1359, dtype: object"
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 16,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"#create a dataframe with a subset of columns for the TAPoR dataset\n",
|
|||
|
"df_tapor_worksub=df_db_tools.sort_values('name')[['name', 'url']].drop_duplicates()\n",
|
|||
|
"df_tapor_worksub['url'] = df_tapor_worksub['url'].replace(np.nan, \"\")\n",
|
|||
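|
"# display only: the result below is not assigned back, so df_tapor_worksub is left unchanged\n",
|
|||
|
"df_tapor_worksub['url'].replace(r\"\\s+\", np.nan, regex=True)\n",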
|
|||
|
"#df_tapor_worksub['url'].isnull()\n",
|
|||
|
"#df_tapor_worksub.tail(30)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 54,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"# create a dataframe with a subset of columns for the MP dataset and rename the columns to match the TAPoR dataframe\n",
|
|||
|
"df_mp_taporsub= df_tool_flat[df_tool_flat['source.label'] == 'TAPoR']\n",
|
|||
|
"df_mp_worksub=df_mp_taporsub.sort_values('label')[['label','accessibleAt']].drop_duplicates()\n",
|
|||
|
"df_mp_worksub=df_mp_worksub.rename(columns={\"label\": \"name\", 'accessibleAt':'url'})\n",
|
|||
|
"#df_mp_worksub['url'].isnull()"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 55,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"# define a function that compares dataframes\n",
|
|||
|
"def dataframe_difference(df1, df2, which):\n",
|
|||
|
"    \"\"\"Outer-merge df1 and df2 and return rows filtered by `which` ('both', 'left_only', 'right_only'; None = all non-matching rows).\"\"\"\n",
|
|||
|
"    comparison_df = df1.merge(df2,\n",
|
|||
|
"                              indicator=True,\n",
|
|||
|
"                              how='outer')\n",
|
|||
|
"    if which is None:\n",
|
|||
|
"        diff_df = comparison_df[comparison_df['_merge'] != 'both']\n",
|
|||
|
"    else:\n",
|
|||
|
"        diff_df = comparison_df[comparison_df['_merge'] == which]\n",
|
|||
|
"    # side effect: every call overwrites data/diff.csv with the selected rows\n",
|
|||
|
"    diff_df.to_csv('data/diff.csv')\n",
|
|||
|
"    return diff_df"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"#### Comparing on 'name' and 'url', 1260 tool descriptions in the MP dataset are identical to descriptions in the TAPoR dataset"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 63,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/plain": [
|
|||
|
"Int64Index([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,\n",
|
|||
|
" ...\n",
|
|||
|
" 1333, 1334, 1335, 1336, 1337, 1338, 1339, 1340, 1341, 1342],\n",
|
|||
|
" dtype='int64', length=1260)"
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 63,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"df_both=dataframe_difference(df_mp_worksub, df_tapor_worksub, 'both')\n",
|
|||
|
"df_both.index"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 76,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/html": [
|
|||
|
"<div>\n",
|
|||
|
"<style scoped>\n",
|
|||
|
" .dataframe tbody tr th:only-of-type {\n",
|
|||
|
" vertical-align: middle;\n",
|
|||
|
" }\n",
|
|||
|
"\n",
|
|||
|
" .dataframe tbody tr th {\n",
|
|||
|
" vertical-align: top;\n",
|
|||
|
" }\n",
|
|||
|
"\n",
|
|||
|
" .dataframe thead th {\n",
|
|||
|
" text-align: right;\n",
|
|||
|
" }\n",
|
|||
|
"</style>\n",
|
|||
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|||
|
" <thead>\n",
|
|||
|
" <tr style=\"text-align: right;\">\n",
|
|||
|
" <th></th>\n",
|
|||
|
" <th>name</th>\n",
|
|||
|
" <th>url</th>\n",
|
|||
|
" <th>_merge</th>\n",
|
|||
|
" </tr>\n",
|
|||
|
" </thead>\n",
|
|||
|
" <tbody>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>0</th>\n",
|
|||
|
" <td>140kit</td>\n",
|
|||
|
" <td>https://github.com/WebEcologyProject/140kit</td>\n",
|
|||
|
" <td>both</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>1</th>\n",
|
|||
|
" <td>3DVIA Virtools</td>\n",
|
|||
|
" <td></td>\n",
|
|||
|
" <td>both</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>2</th>\n",
|
|||
|
" <td>4th Dimension</td>\n",
|
|||
|
" <td>http://www.4d.com/products/4d2004/4dstandarded...</td>\n",
|
|||
|
" <td>both</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>3</th>\n",
|
|||
|
" <td>80legs</td>\n",
|
|||
|
" <td>http://80legs.com/</td>\n",
|
|||
|
" <td>both</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>4</th>\n",
|
|||
|
" <td>960 Grid System</td>\n",
|
|||
|
" <td>https://960.gs/</td>\n",
|
|||
|
" <td>both</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" </tbody>\n",
|
|||
|
"</table>\n",
|
|||
|
"</div>"
|
|||
|
],
|
|||
|
"text/plain": [
|
|||
|
" name url _merge\n",
|
|||
|
"0 140kit https://github.com/WebEcologyProject/140kit both\n",
|
|||
|
"1 3DVIA Virtools both\n",
|
|||
|
"2 4th Dimension http://www.4d.com/products/4d2004/4dstandarded... both\n",
|
|||
|
"3 80legs http://80legs.com/ both\n",
|
|||
|
"4 960 Grid System https://960.gs/ both"
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 76,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"df_both.head()"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"#### Comparing on 'name' and 'url', 83 tool descriptions are in the MP dataset but not in the TAPoR dataset"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 77,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/html": [
|
|||
|
"<div>\n",
|
|||
|
"<style scoped>\n",
|
|||
|
" .dataframe tbody tr th:only-of-type {\n",
|
|||
|
" vertical-align: middle;\n",
|
|||
|
" }\n",
|
|||
|
"\n",
|
|||
|
" .dataframe tbody tr th {\n",
|
|||
|
" vertical-align: top;\n",
|
|||
|
" }\n",
|
|||
|
"\n",
|
|||
|
" .dataframe thead th {\n",
|
|||
|
" text-align: right;\n",
|
|||
|
" }\n",
|
|||
|
"</style>\n",
|
|||
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|||
|
" <thead>\n",
|
|||
|
" <tr style=\"text-align: right;\">\n",
|
|||
|
" <th></th>\n",
|
|||
|
" <th>name</th>\n",
|
|||
|
" <th>url</th>\n",
|
|||
|
" <th>_merge</th>\n",
|
|||
|
" </tr>\n",
|
|||
|
" </thead>\n",
|
|||
|
" <tbody>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>142</th>\n",
|
|||
|
" <td>CONDOR</td>\n",
|
|||
|
" <td>http://www.ickn.org/ckntools.html</td>\n",
|
|||
|
" <td>left_only</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>144</th>\n",
|
|||
|
" <td>CQPweb</td>\n",
|
|||
|
" <td>https://cqpweb.lancs.ac.uk/</td>\n",
|
|||
|
" <td>left_only</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>146</th>\n",
|
|||
|
" <td>CSV Sort</td>\n",
|
|||
|
" <td>https://bitbucket.org/richardpenman/csvsort</td>\n",
|
|||
|
" <td>left_only</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>156</th>\n",
|
|||
|
" <td>CasualConc</td>\n",
|
|||
|
" <td>https://sites.google.com/site/casualconc/</td>\n",
|
|||
|
" <td>left_only</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>161</th>\n",
|
|||
|
" <td>Chartle</td>\n",
|
|||
|
" <td></td>\n",
|
|||
|
" <td>left_only</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>163</th>\n",
|
|||
|
" <td>Chorus</td>\n",
|
|||
|
" <td>http://chorusanalytics.co.uk/</td>\n",
|
|||
|
" <td>left_only</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>165</th>\n",
|
|||
|
" <td>Chronos Timeline</td>\n",
|
|||
|
" <td>http://hyperstudio.mit.edu/software/chronos-ti...</td>\n",
|
|||
|
" <td>left_only</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>180</th>\n",
|
|||
|
" <td>Code Bubbles</td>\n",
|
|||
|
" <td>http://cs.brown.edu/~spr/codebubbles/</td>\n",
|
|||
|
" <td>left_only</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>184</th>\n",
|
|||
|
" <td>Colaboratory</td>\n",
|
|||
|
" <td>https://colab.research.google.com/notebooks/we...</td>\n",
|
|||
|
" <td>left_only</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>214</th>\n",
|
|||
|
" <td>ContaWords</td>\n",
|
|||
|
" <td>http://contawords.iula.upf.edu/</td>\n",
|
|||
|
" <td>left_only</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>215</th>\n",
|
|||
|
" <td>Contropedia</td>\n",
|
|||
|
" <td>http://contropedia.net/</td>\n",
|
|||
|
" <td>left_only</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>220</th>\n",
|
|||
|
" <td>Cowo</td>\n",
|
|||
|
" <td>https://github.com/seinecle/Cowo/blob/master/R...</td>\n",
|
|||
|
" <td>left_only</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>223</th>\n",
|
|||
|
" <td>Critic Markup</td>\n",
|
|||
|
" <td>http://criticmarkup.com/</td>\n",
|
|||
|
" <td>left_only</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>228</th>\n",
|
|||
|
" <td>Cytoscape</td>\n",
|
|||
|
" <td>http://www.cytoscape.org/</td>\n",
|
|||
|
" <td>left_only</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>254</th>\n",
|
|||
|
" <td>Density Design - Knot</td>\n",
|
|||
|
" <td>http://www.densitydesign.org/research/knot/</td>\n",
|
|||
|
" <td>left_only</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>255</th>\n",
|
|||
|
" <td>DfR Browser</td>\n",
|
|||
|
" <td>https://agoldst.github.io/dfr-browser/</td>\n",
|
|||
|
" <td>left_only</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>300</th>\n",
|
|||
|
" <td>EVI-LINHD</td>\n",
|
|||
|
" <td>http://www.evilinhd.com/</td>\n",
|
|||
|
" <td>left_only</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>307</th>\n",
|
|||
|
" <td>EgoWeb 2.0</td>\n",
|
|||
|
" <td>http://www.rand.org/methods/egoweb.html</td>\n",
|
|||
|
" <td>left_only</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>332</th>\n",
|
|||
|
" <td>Facepager</td>\n",
|
|||
|
" <td>https://github.com/strohne/Facepager</td>\n",
|
|||
|
" <td>left_only</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>342</th>\n",
|
|||
|
" <td>Find Locations from A Text (Named-Entity Recog...</td>\n",
|
|||
|
" <td></td>\n",
|
|||
|
" <td>left_only</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" </tbody>\n",
|
|||
|
"</table>\n",
|
|||
|
"</div>"
|
|||
|
],
|
|||
|
"text/plain": [
|
|||
|
" name \\\n",
|
|||
|
"142 CONDOR \n",
|
|||
|
"144 CQPweb \n",
|
|||
|
"146 CSV Sort \n",
|
|||
|
"156 CasualConc \n",
|
|||
|
"161 Chartle \n",
|
|||
|
"163 Chorus \n",
|
|||
|
"165 Chronos Timeline \n",
|
|||
|
"180 Code Bubbles \n",
|
|||
|
"184 Colaboratory \n",
|
|||
|
"214 ContaWords \n",
|
|||
|
"215 Contropedia \n",
|
|||
|
"220 Cowo \n",
|
|||
|
"223 Critic Markup \n",
|
|||
|
"228 Cytoscape \n",
|
|||
|
"254 Density Design - Knot \n",
|
|||
|
"255 DfR Browser \n",
|
|||
|
"300 EVI-LINHD \n",
|
|||
|
"307 EgoWeb 2.0 \n",
|
|||
|
"332 Facepager \n",
|
|||
|
"342 Find Locations from A Text (Named-Entity Recog... \n",
|
|||
|
"\n",
|
|||
|
" url _merge \n",
|
|||
|
"142 http://www.ickn.org/ckntools.html left_only \n",
|
|||
|
"144 https://cqpweb.lancs.ac.uk/ left_only \n",
|
|||
|
"146 https://bitbucket.org/richardpenman/csvsort left_only \n",
|
|||
|
"156 https://sites.google.com/site/casualconc/ left_only \n",
|
|||
|
"161 left_only \n",
|
|||
|
"163 http://chorusanalytics.co.uk/ left_only \n",
|
|||
|
"165 http://hyperstudio.mit.edu/software/chronos-ti... left_only \n",
|
|||
|
"180 http://cs.brown.edu/~spr/codebubbles/ left_only \n",
|
|||
|
"184 https://colab.research.google.com/notebooks/we... left_only \n",
|
|||
|
"214 http://contawords.iula.upf.edu/ left_only \n",
|
|||
|
"215 http://contropedia.net/ left_only \n",
|
|||
|
"220 https://github.com/seinecle/Cowo/blob/master/R... left_only \n",
|
|||
|
"223 http://criticmarkup.com/ left_only \n",
|
|||
|
"228 http://www.cytoscape.org/ left_only \n",
|
|||
|
"254 http://www.densitydesign.org/research/knot/ left_only \n",
|
|||
|
"255 https://agoldst.github.io/dfr-browser/ left_only \n",
|
|||
|
"300 http://www.evilinhd.com/ left_only \n",
|
|||
|
"307 http://www.rand.org/methods/egoweb.html left_only \n",
|
|||
|
"332 https://github.com/strohne/Facepager left_only \n",
|
|||
|
"342 left_only "
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 77,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"# tools in MP dataset but not in TAPoR\n",
|
|||
|
"df_lo=dataframe_difference(df_mp_worksub.sort_values('name'), df_tapor_worksub.sort_values('name'), 'left_only')\n",
|
|||
|
"# see 20 records in MP dataset but not in TAPoR\n",
|
|||
|
"df_lo.head(20)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"#### Comparing on 'name' and 'url', 99 tool descriptions are in the TAPoR dataset but not in the MP dataset"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 70,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/html": [
|
|||
|
"<div>\n",
|
|||
|
"<style scoped>\n",
|
|||
|
" .dataframe tbody tr th:only-of-type {\n",
|
|||
|
" vertical-align: middle;\n",
|
|||
|
" }\n",
|
|||
|
"\n",
|
|||
|
" .dataframe tbody tr th {\n",
|
|||
|
" vertical-align: top;\n",
|
|||
|
" }\n",
|
|||
|
"\n",
|
|||
|
" .dataframe thead th {\n",
|
|||
|
" text-align: right;\n",
|
|||
|
" }\n",
|
|||
|
"</style>\n",
|
|||
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|||
|
" <thead>\n",
|
|||
|
" <tr style=\"text-align: right;\">\n",
|
|||
|
" <th></th>\n",
|
|||
|
" <th>name</th>\n",
|
|||
|
" <th>url</th>\n",
|
|||
|
" <th>_merge</th>\n",
|
|||
|
" </tr>\n",
|
|||
|
" </thead>\n",
|
|||
|
" <tbody>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>1343</th>\n",
|
|||
|
" <td>ANNIS</td>\n",
|
|||
|
" <td></td>\n",
|
|||
|
" <td>right_only</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>1344</th>\n",
|
|||
|
" <td>Adobe Flash</td>\n",
|
|||
|
" <td></td>\n",
|
|||
|
" <td>right_only</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>1345</th>\n",
|
|||
|
" <td>Ainm.ie</td>\n",
|
|||
|
" <td></td>\n",
|
|||
|
" <td>right_only</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>1346</th>\n",
|
|||
|
" <td>Alpheios</td>\n",
|
|||
|
" <td></td>\n",
|
|||
|
" <td>right_only</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>1347</th>\n",
|
|||
|
" <td>Anastasia</td>\n",
|
|||
|
" <td></td>\n",
|
|||
|
" <td>right_only</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>1348</th>\n",
|
|||
|
" <td>ArcExplorer</td>\n",
|
|||
|
" <td></td>\n",
|
|||
|
" <td>right_only</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>1349</th>\n",
|
|||
|
" <td>AroniSmartIntelligence™</td>\n",
|
|||
|
" <td></td>\n",
|
|||
|
" <td>right_only</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>1350</th>\n",
|
|||
|
" <td>Aruspix</td>\n",
|
|||
|
" <td></td>\n",
|
|||
|
" <td>right_only</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>1351</th>\n",
|
|||
|
" <td>BASE</td>\n",
|
|||
|
" <td></td>\n",
|
|||
|
" <td>right_only</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>1352</th>\n",
|
|||
|
" <td>Basement Waterproofing: Tips and Instructions</td>\n",
|
|||
|
" <td></td>\n",
|
|||
|
" <td>right_only</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>1353</th>\n",
|
|||
|
" <td>Berkeley Parser</td>\n",
|
|||
|
" <td></td>\n",
|
|||
|
" <td>right_only</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>1354</th>\n",
|
|||
|
" <td>CATMA (Computer Aided Textual Markup and Analy...</td>\n",
|
|||
|
" <td>http://www.catma.de/</td>\n",
|
|||
|
" <td>right_only</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>1355</th>\n",
|
|||
|
" <td>Canva \"The Amazingly Simple Graphic Design Sof...</td>\n",
|
|||
|
" <td></td>\n",
|
|||
|
" <td>right_only</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>1356</th>\n",
|
|||
|
" <td>Chicken</td>\n",
|
|||
|
" <td></td>\n",
|
|||
|
" <td>right_only</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>1357</th>\n",
|
|||
|
" <td>CloudConvert</td>\n",
|
|||
|
" <td></td>\n",
|
|||
|
" <td>right_only</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>1358</th>\n",
|
|||
|
" <td>Collocate</td>\n",
|
|||
|
" <td>http://</td>\n",
|
|||
|
" <td>right_only</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>1359</th>\n",
|
|||
|
" <td>Commentpress</td>\n",
|
|||
|
" <td></td>\n",
|
|||
|
" <td>right_only</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>1360</th>\n",
|
|||
|
" <td>CoolTool NeuroLab</td>\n",
|
|||
|
" <td></td>\n",
|
|||
|
" <td>right_only</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>1361</th>\n",
|
|||
|
" <td>Datapress</td>\n",
|
|||
|
" <td></td>\n",
|
|||
|
" <td>right_only</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>1362</th>\n",
|
|||
|
" <td>Delicious</td>\n",
|
|||
|
" <td></td>\n",
|
|||
|
" <td>right_only</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" </tbody>\n",
|
|||
|
"</table>\n",
|
|||
|
"</div>"
|
|||
|
],
|
|||
|
"text/plain": [
|
|||
|
" name url \\\n",
|
|||
|
"1343 ANNIS \n",
|
|||
|
"1344 Adobe Flash \n",
|
|||
|
"1345 Ainm.ie \n",
|
|||
|
"1346 Alpheios \n",
|
|||
|
"1347 Anastasia \n",
|
|||
|
"1348 ArcExplorer \n",
|
|||
|
"1349 AroniSmartIntelligence™ \n",
|
|||
|
"1350 Aruspix \n",
|
|||
|
"1351 BASE \n",
|
|||
|
"1352 Basement Waterproofing: Tips and Instructions \n",
|
|||
|
"1353 Berkeley Parser \n",
|
|||
|
"1354 CATMA (Computer Aided Textual Markup and Analy... http://www.catma.de/ \n",
|
|||
|
"1355 Canva \"The Amazingly Simple Graphic Design Sof... \n",
|
|||
|
"1356 Chicken \n",
|
|||
|
"1357 CloudConvert \n",
|
|||
|
"1358 Collocate http:// \n",
|
|||
|
"1359 Commentpress \n",
|
|||
|
"1360 CoolTool NeuroLab \n",
|
|||
|
"1361 Datapress \n",
|
|||
|
"1362 Delicious \n",
|
|||
|
"\n",
|
|||
|
" _merge \n",
|
|||
|
"1343 right_only \n",
|
|||
|
"1344 right_only \n",
|
|||
|
"1345 right_only \n",
|
|||
|
"1346 right_only \n",
|
|||
|
"1347 right_only \n",
|
|||
|
"1348 right_only \n",
|
|||
|
"1349 right_only \n",
|
|||
|
"1350 right_only \n",
|
|||
|
"1351 right_only \n",
|
|||
|
"1352 right_only \n",
|
|||
|
"1353 right_only \n",
|
|||
|
"1354 right_only \n",
|
|||
|
"1355 right_only \n",
|
|||
|
"1356 right_only \n",
|
|||
|
"1357 right_only \n",
|
|||
|
"1358 right_only \n",
|
|||
|
"1359 right_only \n",
|
|||
|
"1360 right_only \n",
|
|||
|
"1361 right_only \n",
|
|||
|
"1362 right_only "
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 70,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"# tools in TAPoR dataset but not in MP dataset\n",
|
|||
|
"df_ro=dataframe_difference(df_mp_worksub.sort_values('name'), df_tapor_worksub.sort_values('name'), 'right_only')\n",
|
|||
|
"df_ro.head(20)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"## Distribution of items in TAPoR dataset by 'last_updated' value\n",
|
|||
|
"\n",
|
|||
|
"Check the content of the 'last_updated' field in the TAPoR dataset descriptions. This value *seems* to be the date on which a tool's description was last updated.\n"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 97,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/html": [
|
|||
|
"<div>\n",
|
|||
|
"<style scoped>\n",
|
|||
|
" .dataframe tbody tr th:only-of-type {\n",
|
|||
|
" vertical-align: middle;\n",
|
|||
|
" }\n",
|
|||
|
"\n",
|
|||
|
" .dataframe tbody tr th {\n",
|
|||
|
" vertical-align: top;\n",
|
|||
|
" }\n",
|
|||
|
"\n",
|
|||
|
" .dataframe thead th {\n",
|
|||
|
" text-align: right;\n",
|
|||
|
" }\n",
|
|||
|
"</style>\n",
|
|||
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|||
|
" <thead>\n",
|
|||
|
" <tr style=\"text-align: right;\">\n",
|
|||
|
" <th></th>\n",
|
|||
|
" <th>name</th>\n",
|
|||
|
" <th>url</th>\n",
|
|||
|
" <th>last_updated</th>\n",
|
|||
|
" </tr>\n",
|
|||
|
" </thead>\n",
|
|||
|
" <tbody>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>423</th>\n",
|
|||
|
" <td>List Words - HTML (TAPoRware)</td>\n",
|
|||
|
" <td>http://taporware.ualberta.ca/~taporware/htmlTo...</td>\n",
|
|||
|
" <td>2011-11-27</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>424</th>\n",
|
|||
|
" <td>List Words - XML (TAPoRware)</td>\n",
|
|||
|
" <td>http://taporware.ualberta.ca/~taporware/xmlToo...</td>\n",
|
|||
|
" <td>2011-11-27</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>425</th>\n",
|
|||
|
" <td>List Words - Plain Text (TAPoRware)</td>\n",
|
|||
|
" <td>http://taporware.ualberta.ca/~taporware/textTo...</td>\n",
|
|||
|
" <td>2011-11-28</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>426</th>\n",
|
|||
|
" <td>List Tags - HTML (TAPoRware)</td>\n",
|
|||
|
" <td>http://taporware.ualberta.ca/~taporware/htmlTo...</td>\n",
|
|||
|
" <td>2011-11-28</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>427</th>\n",
|
|||
|
" <td>List XML Elements (TAPoRware)</td>\n",
|
|||
|
" <td>http://taporware.ualberta.ca/~taporware/xmlToo...</td>\n",
|
|||
|
" <td>2011-11-28</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" </tbody>\n",
|
|||
|
"</table>\n",
|
|||
|
"</div>"
|
|||
|
],
|
|||
|
"text/plain": [
|
|||
|
" name \\\n",
|
|||
|
"423 List Words - HTML (TAPoRware) \n",
|
|||
|
"424 List Words - XML (TAPoRware) \n",
|
|||
|
"425 List Words - Plain Text (TAPoRware) \n",
|
|||
|
"426 List Tags - HTML (TAPoRware) \n",
|
|||
|
"427 List XML Elements (TAPoRware) \n",
|
|||
|
"\n",
|
|||
|
" url last_updated \n",
|
|||
|
"423 http://taporware.ualberta.ca/~taporware/htmlTo... 2011-11-27 \n",
|
|||
|
"424 http://taporware.ualberta.ca/~taporware/xmlToo... 2011-11-27 \n",
|
|||
|
"425 http://taporware.ualberta.ca/~taporware/textTo... 2011-11-28 \n",
|
|||
|
"426 http://taporware.ualberta.ca/~taporware/htmlTo... 2011-11-28 \n",
|
|||
|
"427 http://taporware.ualberta.ca/~taporware/xmlToo... 2011-11-28 "
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 97,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"df_db_tools['correctdata']=pd.to_datetime(df_db_tools['last_updated'])\n",
|
|||
|
"df_db_tools['justdata'] = df_db_tools['correctdata'].dt.year\n",
|
|||
|
"df_reg_tm_sorted=df_db_tools.sort_values('last_updated')\n",
|
|||
|
"df_reg_tools_sub=df_reg_tm_sorted[['name', 'url', 'last_updated']]\n",
|
|||
|
"df_reg_tools_sub.head()"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 23,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/plain": [
|
|||
|
"Text(0.5, 1.0, 'Number of tools by year their description has been updated')"
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 23,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
},
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"image/png": "iVBORw0KGgoAAAANSUhEUgAAA20AAAF3CAYAAAA2IKMeAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nOzdeXxb93nn++/DXRIlURQXUJu1WCuheJOV2E68yySdNE57Jx23aet0mrq9cZvm3pk2STtt0874XrdNl+n0ZqaeJI077cRN0mbiJhFoeZF3W5ZXAdr3jeAiihIlivvv/nEObQgGRZAEeQDi83698ALxw++c8+AsEh6c5/yOOecEAAAAAMhOBUEHAAAAAAAYHUkbAAAAAGQxkjYAAAAAyGIkbQAAAACQxUjaAAAAACCLkbQBAAAAQBYjaQOygJl91cycmTWneO/7ZrZ9GmO53Y8lPF3LHA8zW29mL5jZRT/O5Sn6lPjr9NopjOPbZrZzkvPYbmbfz1RMucDMavxtszypPWP7nZl91p9X+WTnlTDPaT0Ox2JmR83sa+OcJuVxYWbL/fX1icxGOWocgf4bk4/H3Wgmsl+b2Rp/P6oIMg4g35C0AdnlHjO7MeggstyfSaqQ9ElJN0lqSdGnRNIfSpqypA0TViNv2yyfwmX8WN6+0TOFywjaT0v663FOM9px0SJvfb2Ygbgw862Rtx9lLGkDMLaioAMA8J5OSScl/Z6kTwUcy5QxszLnXO8kZrFO0hPOuaczFRMyz8xmOecuBbFs51y7pPYr9QkyvskYids591am5umc65P0aqbmBwDIPM60AdnDSfp/JH3SzDaO1skvS+lI0e7M7DcSXh81s6+Z2ZfNrMXMzpnZn5vnXjOLmVm3mf1vM1uQYlGLzOxHfhnicTP79RTL/KiZPWdmPWZ2xsz+h5nNTXh/pExts1+SdEnSb1/hs11rZk/78ztrZv9oZrX+e8vNzElaJen/8ue7fZRZdfvPf+f3e6+M0syqzOwxP94eP65NSXEU+uv5uJn1+evq50eL25+mwsy+YWanzazXn/Z/XGmahGkf9LfXJTP7sZktTnjvdTP7uxTTPGZmb44yv3r/M9+W1F5uZhfM7AsJbWNtwzoz+5aZHfbj229m/9nMShL6jJTXfcbM/t7MuiT9a4q4lkva5b98dmTbJHWrMrPv+XEeNrPPp5hPuvtd+XjiS5h+qZn9xP+8R83sc6P0C/vbq9t/fM/MQgnvF5t3DI7sR6fN7AdJ6+4qM/uOmXX4n+fdkX3tSnFbUnmk+eW6ZvYpM9vr74MvmtmGhJBTHheWojwynWMgYZlb/Lgv+susH23dJrnitjazm8zsCX+9XTSzt83sM0l9puS4898vM7M/NbMT/jp4x8zuTTGfz/nrp8/MjpnZ72RiPSXvxwntydt+u3nlhWN9njH3azNbZ2aP+5+5x/9cXzSzAv/92/X+sXPEj+9owvTL/Ok7/embzWzteOMAkIJzjgcPHgE/JH1VUoe8H1L2Sno84b3vS9qe3DfFPJyk30h4fVTembt/kdQo7wyek/SXkt6Q9DOSPiPprKT/njDd7X6/E/KSyAZJf+u3fSKh3y2S+iT9k6R7Jf2ipFOSvp/Q57P+dIck/QdJd0i6bpR1UC2pS9Ir8s40/oIf/7vyyrpKJX1EXinXP/p/bxhlXnf4y/1Pfr+PSCr133tRUlzSL0v6KUnPy/sye3XC9A9LGpD0H/3P/6g/v59L6PNtSTsTXn/L33b/VtJtfvyPjrHdt/vrbJe/PX7eX++vJ/T5NUkXJJUntJX7bb95hXm/IunbSW2/7G+zqnFsw42SvuZvk9sk/arf528T+iz310+LpP9P0hZJd6aIqdT/jE7S50e2TdJ+d8Bf71v8deokbZ7gflc+nvj8vibpTUnH/Vh/xt8+p3T5cXi1pHOSnvbXzf8habek1yWZ3+cP/GU+IOlWST
/r7zez/PdrJJ2WdNCP+S5JvyXpS2PFLe/4/lrS/tgu6bC843ok7hOSyq50XCQsJ/H4TvcYaJP0trz9/pOS9kuKjayDUdZxutv6fklf8rfznZJ+X1J/UgxTctz5/X7kf77/U9I9kr4haVDStQl9fttfTw/7n+PL8vbP38jAevqsEvbjpH/bvzaez6P09+u7JP2RvH8bb5f0RXn7+Vf89+dJ+vd+XD/t70PX+e9V+vN/S96+/gl5/96e0Pv7fFpx8ODB44OPwAPgwYPH5YmY/x/1kKQ1/uvJJG0HJRUmtO3wv3SsSGj7U0mtCa9v9+f1aNL8t0l6NeH1C5KeTepzpz9tOOGzOEm/lcY6eERe0jYvoW2zPvhF8bIvLKPMq9yf7rNJ7Y1++20JbXPkfdn9W/91paSLkv4wadqfSNqX8Prbujxpi+oKSdQocW6X94XvqoS2W/wYG/3X8/x4fjmhz7+T98Vw4RXm/Tl9MNl7XpcnN2NuwxTzLfK/bPVKKvHblvvT/CCNzxz2+96e1D6y3/1xQluxv20emeB+l5y0pRPfvX7fDye0XSXvuEk8Dv+npH0j68BvWy3v2P24//pHkv78Csv6f/1tWzfK+6PGrdRJm5N0c4q4f32M42JkOZ+YwDEwKGl1Qtun/Hmtu8LnTmtbJ01j/r73t5KemYbj7i4l/VuRcAx9L+HYvJBiPf2xvB+GCie5ni7bj6+w7dP5PGnt16Os89+VdDih/RP+vJYn9f9Pks5IqkxoWyAv6XtoonHw4MHDe1AeCWSff5D3K+RXMjCv7c65oYTXByUddc4dSWqrTizZ8v0g6fW/SLrBL5uaLW/ggu+aWdHIQ96vqgOSbkia9sdpxLpZ0pPOufMjDc65HfK+oHw0jenTsVlSu3PuuYRlXJT35XpkGWFJsyV9L2naf5K0xsxqRpn325J+28w+b2ZrxhHTm865YwnxvCTvV/nN/uvz8hL3zyZM81l51/WducJ8H/efPy1JZrZK3mf8O/91WtvQPF80s93mlbcOyDvTWSppWdIy09nOY3ly5A/n3IC8szFLxhPzFaS7H7Y6515LiOOYvLPTie6Wd4wMJ8RxRN7+OlJu+7akz5rZ75jZh8zMkuZxp6SIcy7VYDrjjVuS2pxzL6eIe3Oa048YzzFw1Dl3IOH1bv95SRrLGXVbS5KZLTCzvzazY/K274CkB+UNhDFiSo47eds3LumlpH3tab2/fW+S96PP95L6PCOpVpevg8msp0x8nrT2a78k9I/M7KC8H4ZGziKu8D/bldwt78e98wnrottfxsg6S/f4ApCEpA3IMs65QXlnv37BzK6a5Oy6kl73j9Jm8koQE7WleF0kqUrer6eFkr6u979MDcj7T75Y0tKkaVvTiLVulH6t8n75z4R0llGX0JbcR/I+eyq/Iel/yyuJ22dmB8zs/jRiSl7PI211Ca+/KeljZrbKT74+Jq8sbFTOuQuSviuvJFLyEr24pEjC50hnG35R0p/LS1Duk/el6yH/vbKkxaaznceSav8cWc5497tk6cQX0ujbJFGVvNK9gaTHyoQ4/rO8ssbPS3pH0gkz+62EeSxU6tFPJxJ3qhhH2upStF/JeI6BVNtL+uC+kcqVtrXknaH6t/JGjL1H0o3y9vvEPlN13FXJ2xeSt+9X9f72rfKfY0l9nvXbE/fHyayndIz1edLdr/9EXin7o/LOit0obz+Wxo61St72Sl5nd+j9dZFuHACSMHokkJ2+Je9ajy+leK9XSQmWpR5IZLKSzyjVyCth6ZD3n7eT9wXmJymmPZ302qWxvJYUy5S8X6wz9SvslZbRmdBHfr8zSX2U0O8yzrkuSV+Q9AUz+5Ck35H0j2b2rnNud6ppEpaTqu29L/POuefN7IC8a6NM3vp9MsV0yb4h70zBakm/JOnvE868dim9bfhpeeVgvzfyhl0+uEWidLbzZKQb82jSiS+u0bdJ4miTnfIS2W+k6NshSc4bJfUPJP2Bvw1+Xd
Jfmdk+51xE3v6VTkKV7nodLe5YmtOPmNAxkElmVibp4/JKvv97QvtlPzZP4XHXKe86qyuN5DuyHj6h1In1vitMm46
|
|||
|
"text/plain": [
|
|||
|
"<Figure size 1080x432 with 1 Axes>"
|
|||
|
]
|
|||
|
},
|
|||
|
"metadata": {
|
|||
|
"needs_background": "light"
|
|||
|
},
|
|||
|
"output_type": "display_data"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"f, ax1 = plt.subplots(nrows=1, figsize=(15,6))\n",
|
|||
|
"df_reg_tm_sorted.justdata.value_counts().reindex(sorted(df_reg_tm_sorted.justdata.value_counts().index)).plot(ax=ax1)\n",
|
|||
|
"ax1.set_title('Number of tools by year their description has been updated', fontsize=15)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"## Check URL in TAPoR dataset\n",
|
|||
|
"In TAPoR dataset there are descriptions where the URL of a Tool is not provided"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 24,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/plain": [
|
|||
|
"name 136\n",
|
|||
|
"url 136\n",
|
|||
|
"last_updated 136\n",
|
|||
|
"dtype: int64"
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 24,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"df_reg_tools_sub_emurl=df_reg_tools_sub[df_reg_tools_sub['url'] == '']\n",
|
|||
|
"#print(\"number of record with missed URL in TAPoR dataset:\")\n",
|
|||
|
"df_reg_tools_sub_emurl.count()"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 25,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/plain": [
|
|||
|
"Int64Index([423, 446, 444, 443, 442, 441, 440, 439, 438, 437,\n",
|
|||
|
" ...\n",
|
|||
|
" 413, 414, 415, 416, 417, 418, 419, 420, 421, 422],\n",
|
|||
|
" dtype='int64', length=1227)"
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 25,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"df_reg_tools_sub_whurl=df_reg_tools_sub[df_reg_tools_sub['url'] != '']\n",
|
|||
|
"df_reg_tools_sub_whurl.index"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 26,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"#df_reg_tools_sub.head()\n",
|
|||
|
"#for column in df_reg_tools_sub[['name', 'url']]:\n",
|
|||
|
"# # Select column contents by column name using [] operator\n",
|
|||
|
"# columnSeriesObj = df_reg_tools_sub[column]\n",
|
|||
|
"# print('Colunm Name : ', column)\n",
|
|||
|
"# print('Column Contents : ', columnSeriesObj.values)\n",
|
|||
|
"df_urls=df_reg_tools_sub_whurl.url.values\n",
|
|||
|
"#df_urls"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 27,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/html": [
|
|||
|
"<div>\n",
|
|||
|
"<style scoped>\n",
|
|||
|
" .dataframe tbody tr th:only-of-type {\n",
|
|||
|
" vertical-align: middle;\n",
|
|||
|
" }\n",
|
|||
|
"\n",
|
|||
|
" .dataframe tbody tr th {\n",
|
|||
|
" vertical-align: top;\n",
|
|||
|
" }\n",
|
|||
|
"\n",
|
|||
|
" .dataframe thead th {\n",
|
|||
|
" text-align: right;\n",
|
|||
|
" }\n",
|
|||
|
"</style>\n",
|
|||
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|||
|
" <thead>\n",
|
|||
|
" <tr style=\"text-align: right;\">\n",
|
|||
|
" <th></th>\n",
|
|||
|
" <th>url</th>\n",
|
|||
|
" <th>status</th>\n",
|
|||
|
" </tr>\n",
|
|||
|
" </thead>\n",
|
|||
|
" <tbody>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>0</th>\n",
|
|||
|
" <td>test</td>\n",
|
|||
|
" <td>1.0</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>1</th>\n",
|
|||
|
" <td>http://taporware.ualberta.ca/~taporware/htmlTo...</td>\n",
|
|||
|
" <td>404.0</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>2</th>\n",
|
|||
|
" <td>http://taporware.ualberta.ca/~taporware/textTo...</td>\n",
|
|||
|
" <td>404.0</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>3</th>\n",
|
|||
|
" <td>http://taporware.ualberta.ca/~taporware/htmlTo...</td>\n",
|
|||
|
" <td>404.0</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>4</th>\n",
|
|||
|
" <td>http://taporware.ualberta.ca/~taporware/textTo...</td>\n",
|
|||
|
" <td>404.0</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" </tbody>\n",
|
|||
|
"</table>\n",
|
|||
|
"</div>"
|
|||
|
],
|
|||
|
"text/plain": [
|
|||
|
" url status\n",
|
|||
|
"0 test 1.0\n",
|
|||
|
"1 http://taporware.ualberta.ca/~taporware/htmlTo... 404.0\n",
|
|||
|
"2 http://taporware.ualberta.ca/~taporware/textTo... 404.0\n",
|
|||
|
"3 http://taporware.ualberta.ca/~taporware/htmlTo... 404.0\n",
|
|||
|
"4 http://taporware.ualberta.ca/~taporware/textTo... 404.0"
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 27,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"data = {'url': ['test'],'status': [1]}\n",
|
|||
|
"df_http_status = pd.DataFrame (data, columns = ['url','status'])\n",
|
|||
|
"import requests\n",
|
|||
|
"import re\n",
|
|||
|
"regex = re.compile(\n",
|
|||
|
" r'^(?:http|ftp)s?://' # http:// or https://\n",
|
|||
|
" r'(?:(?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\\.)+(?:[A-Z]{2,6}\\.?|[A-Z0-9-]{2,}\\.?)|' #domain...\n",
|
|||
|
" r'localhost|' #localhost...\n",
|
|||
|
" r'\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3})' # ...or ip\n",
|
|||
|
" r'(?::\\d+)?' # optional port\n",
|
|||
|
" r'(?:/?|[/?]\\S+)$', re.IGNORECASE)\n",
|
|||
|
"\n",
|
|||
|
"\n",
|
|||
|
"for var in df_urls:\n",
|
|||
|
" # print(var)\n",
|
|||
|
" if ( var != \"\" and var!=None and re.match(regex, var)):\n",
|
|||
|
" try:\n",
|
|||
|
" r =requests.get(var,timeout=8)\n",
|
|||
|
" #print(\"result: \"+var+ \" \",r.status_code)\n",
|
|||
|
" df_http_status = df_http_status.append({'url': var, 'status': int(r.status_code)}, ignore_index=True)\n",
|
|||
|
" except requests.exceptions.ConnectionError:\n",
|
|||
|
" # print(var)\n",
|
|||
|
" df_http_status = df_http_status.append({'url': var, 'status': int(503)}, ignore_index=True)\n",
|
|||
|
" except requests.exceptions.ConnectTimeout:\n",
|
|||
|
" # print(var)\n",
|
|||
|
" df_http_status = df_http_status.append({'url': var, 'status': int(408)}, ignore_index=True)\n",
|
|||
|
" except requests.exceptions.ReadTimeout:\n",
|
|||
|
" # print(var)\n",
|
|||
|
" df_http_status = df_http_status.append({'url': var, 'status': int(408)}, ignore_index=True)\n",
|
|||
|
" except requests.exceptions.RequestException:\n",
|
|||
|
" # print(var)\n",
|
|||
|
" df_http_status = df_http_status.append({'url': var, 'status': int(500)}, ignore_index=True)\n",
|
|||
|
" except TypeError:\n",
|
|||
|
" # print(var)\n",
|
|||
|
" df_http_status = df_http_status.append({'url': var, 'status': int(400)}, ignore_index=True)\n",
|
|||
|
" else:\n",
|
|||
|
" # print(var ,0)\n",
|
|||
|
" df_http_status = df_http_status.append({'url': var, 'status': int(400)}, ignore_index=True)\n",
|
|||
|
"df_http_status.head()"
|
|||
|
]
|
|||
|
},
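{
"cell_type": "markdown",
"metadata": {},
"source": [
"A possible alternative formulation of the URL check above, offered only as a sketch: it collects one row per URL in a plain list and builds the dataframe once at the end, which avoids repeated `DataFrame.append` calls (that method was removed in pandas 2.0). `classify_request` and `check_urls` are hypothetical helper names that do not appear in the original notebook, and the regular-expression syntax check is left out for brevity, so malformed URLs end up as 500 here rather than 400."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import requests\n",
"\n",
"def classify_request(url, timeout=8):\n",
"    # Hypothetical helper: HTTP status of url, or a synthetic code if the request fails\n",
"    if not url:\n",
"        return 400  # empty or missing URL\n",
"    try:\n",
"        return requests.get(url, timeout=timeout).status_code\n",
"    except requests.exceptions.Timeout:\n",
"        # ConnectTimeout and ReadTimeout; listed first because ConnectTimeout\n",
"        # is also a subclass of ConnectionError\n",
"        return 408\n",
"    except requests.exceptions.ConnectionError:\n",
"        return 503\n",
"    except requests.exceptions.RequestException:\n",
"        return 500\n",
"\n",
"def check_urls(urls, timeout=8):\n",
"    # Build all rows first, then create the dataframe in one step\n",
"    rows = [{'url': u, 'status': classify_request(u, timeout)} for u in urls]\n",
"    return pd.DataFrame(rows, columns=['url', 'status'])\n",
"\n",
"# Example (performs one request per URL, so it can take a while):\n",
"# df_http_status_sketch = check_urls(df_urls)"
]
},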
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"### The HTTP result status values for URL in TAPoR dataset descriptions\n",
|
|||
|
"\n",
|
|||
|
"The table below shows the HTTP Status code (https://en.wikipedia.org/wiki/List_of_HTTP_status_codes) obtained when 'clicking' on URL of tool descriptions of TAPoR dataset.\n",
|
|||
|
"\n",
|
|||
|
"There is a significant number of URLs that seems not correct (status 404, 503, 500, 508....)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 28,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/plain": [
|
|||
|
"200.0 652\n",
|
|||
|
"400.0 442\n",
|
|||
|
"404.0 83\n",
|
|||
|
"503.0 25\n",
|
|||
|
"403.0 11\n",
|
|||
|
"406.0 7\n",
|
|||
|
"408.0 3\n",
|
|||
|
"500.0 2\n",
|
|||
|
"502.0 1\n",
|
|||
|
"420.0 1\n",
|
|||
|
"Name: status, dtype: int64"
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 28,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"df_http_status_sub=df_http_status[df_http_status['status'] != 1]\n",
|
|||
|
"df_db_st = df_http_status_sub['status'].value_counts()\n",
|
|||
|
"df_db_st.head(10)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"## TAPoR dataset 'creators' \n",
|
|||
|
"There are 164 descriptions in TAPoR dataset that don't have values in *creators_name* field, and there are 924 different creators. \n"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 29,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/plain": [
|
|||
|
"Int64Index([649, 686, 697, 701, 706, 719, 733, 736, 746, 765,\n",
|
|||
|
" ...\n",
|
|||
|
" 405, 407, 408, 410, 412, 414, 416, 417, 420, 422],\n",
|
|||
|
" dtype='int64', length=164)"
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 29,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"df_db_tools_na=df_db_tools[df_db_tools['creators_name'] == ''].sort_values('last_updated')\n",
|
|||
|
"df_db_tools_na.index"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 30,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/plain": [
|
|||
|
"924"
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 30,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"#the number of creators\n",
|
|||
|
"len(df_db_tools['creators_name'].unique())-1"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 31,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"df_db_tools.loc[df_db_tools['creators_name']=='','creators_name']='n/a'\n",
|
|||
|
"df_db_tech_NoCoT = df_db_tools['creators_name'].value_counts()"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 32,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"image/png": "iVBORw0KGgoAAAANSUhEUgAAA4gAAAG5CAYAAADMCRrvAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nOzdebid0/n/8fdHDEkkoiTUfFqUxhRyYgylVdVqa4oGaSsd+Pp1QFtapV+C0rS+HQxFQwmaomaVlpgiCSI5mSOoKbSh5oQQEXH//lhr82Rn732GnDgnJ5/XdZ3rPHs9a7jX2keuc1vr2UcRgZmZmZmZmdkqbR2AmZmZmZmZtQ9OEM3MzMzMzAxwgmhmZmZmZmaZE0QzMzMzMzMDnCCamZmZmZlZ5gTRzMzMzMzMACeIZmZmKzVJwyX9so3GlqQrJL0uacJyHqtOUkhatZnthkj6y/KKy9qGpGskHdTWcTRG0hqSHpO0XlvHYisPJ4hmZmbtiKTZkl6UtGah7LuSRrdhWMtLf+DzwMYRsXP5TUmDJY376MNasUkaLem7bR1HeyVpe2AH4FZJp0ian7/ekbS48PqRVhpvdUk35P+2Q9LeZfcl6deSXs1fv5EkgIhYCFwO/Kw1YjFrCieIZmZm7c+qwPFtHURzSerUzCabAbMj4q3lEc+Kqrm7nK08tiR19N8P/wcYEck5EdEtIroBxwIPlV5HxDatOOY44OvAfyvcOwY4iJS0bg98OcdY8lfgKElrtGI8ZlV19H8AzMzMVkTnAidKWrv8RqWjksUdo7zr9oCk30uaK+lpSbvn8n9LeknSUWXd9pR0l6Q3Jd0vabNC31vne69JelzS1wr3hku6WNI/JL0F7FMh3g0l3ZbbPynp6Fz+HeAyYLe8W3NGWbtPA5cU7s/N5T0kXSXpZUnPSvpFKaGRtEp+/Wye51WSelRa4LweT+c5PyNpUI33o7Ok63LdyZJ2yH2cJOnGsn4vkPSHKmNuIummHPurki4sxFJ6z14DhuSjhf8n6bm8o3yJpC65/sck3Z77eT1fb5zvnQ3sCVyY1600xu6SJkqal7/vXohrtKSzJT0AvA18sqnro3QE9295rd+U9Iik+sL9kyU9le/NknRw2XvQ5J/VRtakZ16HuflnbayqJ7pfBO6vcq84t8bW7FeSJuT7t0pap1I/EfFuRPwhIsYBiytUOQr4bUT8JyLmAL8FBhfa/wd4Hdi1sZjNWoMTRDMzs/anARgNnNjC9rsA04F1SbsP1wL9gC1IuxgXSupWqD8IOAvoCUwFRgAoHXO9K/exHnAEcJGk4s7KkcDZQHfSLkm5a4D/ABsCA4BzJH0uIv7Mkjs2pxcbRcSjZfdLyfIFQA/gk8BngG8C38r3BuevffL9bsCF5QHleZ0PfDEiugO753lXcyBwPbBOXotbJK0G/AXYXzmRV0raBwJXVxizE3A78CxQB2xEel9KdgGeJq3z2cCvgU8BfUjv20bAabnuKsAVpB3YTYEFpXlGxKnAWOAHed1+kBOXkXnO6wK/A0ZKWrcw/jdIO1ndgZebuT5fzXNZG7iNJdf8KVLC2gM4A/iLpA3K5t3Un9Vaa/IT0s9ZL2B94BQgygPN7/0ngMdrzIcmrtk3gW+Tfrbfy3VbYhtgWuH1tFxW9Chph9FsuXOCaGZm1j6dBvxQUq8WtH0mIq6IiMXAdcAmwJkRsTAiRgHvkn7BLhkZEWPy806nknbtNiEddZud+3ovIiYDN5ISvZJbI+KBiHg/It4pBpH76A/8LCLeiYippF3Db7RgTqUkayDw84h4MyJmk3ZbSv0NAn4XEU9HxHzg58Dhqnxk831gW0ldIuKFiKj1vNmkiLghIhaREoXOwK4R8QIwBjgs19sfeCUiJlXoY2dSInFSRLyV16OYUD8fERdExHvAO8DRwI8i4rWIeBM4BzgcICJejYgbI+LtfO9sUrJczQHAEx
FxdX4frwEeA75SqDM8Ih7J47/XzPUZFxH/yD9vV1NIZCLi+oh4Pv98XAc8kdeipEk/q5JUa02ARcAGwGYRsSgixkbEUgkiKYkFeLPGfJq6ZldHxMx8RPp/ga+p+cesIf2PjHmF1/OAbnnOJW8WYjdbrpwgmpmZtUMRMZO043RyC5q/WLhekPsrLyvuIP67MO584DVSMrMZsEs+tjdX6ZjnIODjldpWsCFQ+mW+5FnSzk9L9ARWz31U6m/DCvdWJe0ofSD/Qj+QtEP5gqSRkrauMW5xfd7nwx1RgCtJO13k70vtHmabAM/mBKzmGKRdsK7ApMK635HLkdRV0p+UjtK+QUpS166RnJSvCyz9PhTn2Nz1KT5X9zbpSO6qOdZvSppamMe2pPexpKk/qzXXhHQs+0lgVD6qWu2/m7n5e/ca84Fmrlm+txpLzq2p5gNrFV6vBcwvS3C782HsZsuVE0QzM7P263TSrknxl9LSB7p0LZQVE7aW2KR0kY/zrQM8T/oF+P6IWLvw1S0i/l+hbaVdmpLngXUkFX8Z3xSY08S4yvt+hbRTtFmhrNjf8xXuvceSSUjqOOLOiPg8adfpMeDSGnEU12cVYOM8FsAtwPaStiXtuI6o0se/gU2r7GbCknN9hZQYbVNY9x75g1QgHafcCtglItYC9iqFV6EvWHpdYOn3YYk2zVyfipSeZb0U+AGwbj4mPLMQZ3PUXJO8o/yTiPgkaZfvx5I+V95JTn6fIh1VraUpa7ZJ2b1FOc7meoQlj4/ukMuKPs2Sx1DNlhsniGZmZu1URDxJOnZ3XKHsZdIvqV+X1EnSt4HNl3GoL0nqL2l10rOID0fEv0k7mJ+S9A1Jq+WvfkofINOU+P8NPAj8SlJnpT8v8B2qJ1HlXgQ2znGRjyH+DThbUvecgPyY9CwgpOcdfyTpEznRPQe4rnzXTtL6kr6an0dbSNrBqfThISV9JR2Sk7sTcpvxOaZ3gBtIz89NiIjnqvQxAXgBGCppzbwee1SqmHcpLwV+r/z37yRtJOkLuUp3UrI0Nz8rd3pZFy+SnsEs+QfpfTxS0qqSBgK9Se/vUlqwPtWsSUo8X879fou0g9hsja2JpC9LKh1FfSPHWy3mf1D7SG6pTmNr9nVJvSV1Bc4Ebsg/o0tR+oCdzvnl6vn9LyXKV5ES2o0kbUj6HwDDC203Iv1Pm/GNxGzWKpwgmpmZtW9nkn7RLjoaOAl4lfRhFg8u4xh/JSUZrwF9ScdIyUdD9yM95/U86Sjhr4HmfNz+EaQPZXkeuBk4PSLuamLbe0k7Kf+VVNqZ+SFpF/Vp0ofi/JX0d+LI368mHbl8hvQs3w8r9LsK6Zfw50lz/gzwvRpx3Eo6cvk66XnHQ/LziCVXAttR/XhpKbn9CunZz+dIx1QH1hjzZ6Qjk+PzMdK7SbuGAH8AupB2q8aTjloWnQcMUPqE0/Mj4lXS7uZPSD8zPwW+HBHVdruauz4VRcQs0jOiD5GS1u2AB5rbT0GtNdkyv56fx7soIkZX6WcYMKjsGb/y2JuyZleTErn/kp5LPY7qHicl9RsBd+br0g7ln4C/AzNIO6wjc1nJkcCV+Rlhs+VOlZ/fNTMzM7OmkLQp6RjmxyPijbaOxxon6a/A3yLilha2Hw38JSIua9XAlh5nDdLR0r0i4qXlOZZZSZv9IVYzMzOzFV1+JvHHwLVODlccEXFkW8fQFHnXsNYHBJm1OieIZmZmZi2Qn9F7kfQJlvu3cThmZq3CR0zNzMzMzMwM8IfUmJmZmZmZWeYjpmZmraBnz55RV1fX1mGYmZmZNWrSpEmvRESvSvecIJqZtYK6ujoaGhraOgwzMzOzRkl6tto9HzE1MzMzMzMzwAmimZmZmZmZZU4QzczMzMzMDPAziGZmrWLGnHnUnTyyWW1mDz1gOUVjZmZm1jLeQTQzMzMzMzPACaLZEiTNL3s9WNKFy9jnbEk98/WD+fvekm6v0WZNSa9K6lFWfoukrzU1/i
bEtkbuc4akKZI+2Uj9HSWFpC80Z5zctk7SzOa2a+YYiyVNlTRT0vWSukqql3T+8hzXzMzMrKNwgmjWiiTVPLYdEbs
|
|||
|
"text/plain": [
|
|||
|
"<Figure size 720x504 with 1 Axes>"
|
|||
|
]
|
|||
|
},
|
|||
|
"metadata": {
|
|||
|
"needs_background": "light"
|
|||
|
},
|
|||
|
"output_type": "display_data"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"fig, ax = plt.subplots()\n",
|
|||
|
"df_db_tech_NoCoT.head(20).plot.barh(figsize=(10,7), ax=ax)\n",
|
|||
|
"ax.set_title('Number of tools by creators names (Top 10)')\n",
|
|||
|
"ax.set_xlabel('N. of tools')\n",
|
|||
|
"ax.set_ylabel('Creators');"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"Number of tool descriptions in TAPoR dataset that don't have the related creator email"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 101,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/plain": [
|
|||
|
"382"
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 101,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"df_db_tools_naem=df_db_tools[df_db_tools['creators_email'] == ''].sort_values('last_updated')\n",
|
|||
|
"#df_db_tools_naem.index\n",
|
|||
|
"len(df_db_tools_naem)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"Number of tool description in TAPoR dataset that don't have the related creator URL"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 102,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/plain": [
|
|||
|
"171"
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 102,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"df_db_tools_nau=df_db_tools[df_db_tools['creators_url'] == ''].sort_values('last_updated')\n",
|
|||
|
"len(df_db_tools_nau)"
|
|||
|
]
|
|||
|
},
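{
"cell_type": "markdown",
"metadata": {},
"source": [
"The checks above each count the empty strings in one column. As a rough sketch, the same kind of count could be gathered for several columns in one pass; `count_missing` is a hypothetical helper that is not part of the original notebook. Unlike the cells above it also treats `None`/NaN and whitespace-only values as missing, so its numbers may be higher than those shown."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"def count_missing(df, columns):\n",
"    # A value is treated as missing when it is None/NaN or a blank string\n",
"    counts = {}\n",
"    for col in columns:\n",
"        blank = df[col].astype(str).str.strip() == ''\n",
"        counts[col] = int((df[col].isna() | blank).sum())\n",
"    return pd.Series(counts, name='missing')\n",
"\n",
"# Example, assuming df_db_tools as loaded earlier in this notebook:\n",
"# count_missing(df_db_tools, ['url', 'creators_email', 'creators_url'])"
]
},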
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"# ------ "
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 35,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"df_db_tech=pd.read_sql_query('select t.id, t.name, t.detail, t.creators_name, t.last_updated, at.name as \"attributetype\", av.name as\"attribute\", tags.text as \"tag\" from TaPOR.tools as t, TaPOR.attribute_values as av, TaPOR.tool_attributes as ta, TaPOR.attribute_types as at, TaPOR.tags as tags, TaPOR.tool_tags as tota where t.is_approved=1 and t.id=ta.tool_id and t.id=tota.tool_id and tags.id=tota.tag_id and ta.attribute_value_id=av.id and ta.attribute_type_id=at.id', connection)\n",
|
|||
|
"#df_db_tech=pd.read_sql_table('tools', connection)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 36,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"#df_db_tech.head(10)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 37,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/plain": [
|
|||
|
"RangeIndex(start=0, stop=43845, step=1)"
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 37,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"df_db_tech.index"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 38,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/plain": [
|
|||
|
"Index(['id', 'name', 'detail', 'creators_name', 'last_updated',\n",
|
|||
|
" 'attributetype', 'attribute', 'tag'],\n",
|
|||
|
" dtype='object')"
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 38,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"df_db_tech.columns"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 39,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"df_items=df_db_tech[['id', 'name', 'detail', 'creators_name', 'last_updated']].drop_duplicates()\n",
|
|||
|
"#df_items.head(10)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"## Attributes in TAPoR dataset items\n",
|
|||
|
"\n",
|
|||
|
"The following dataframe shows the list of attribute types defined in TaPOR dataset to charachterize tools"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 103,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/html": [
|
|||
|
"<div>\n",
|
|||
|
"<style scoped>\n",
|
|||
|
" .dataframe tbody tr th:only-of-type {\n",
|
|||
|
" vertical-align: middle;\n",
|
|||
|
" }\n",
|
|||
|
"\n",
|
|||
|
" .dataframe tbody tr th {\n",
|
|||
|
" vertical-align: top;\n",
|
|||
|
" }\n",
|
|||
|
"\n",
|
|||
|
" .dataframe thead th {\n",
|
|||
|
" text-align: right;\n",
|
|||
|
" }\n",
|
|||
|
"</style>\n",
|
|||
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|||
|
" <thead>\n",
|
|||
|
" <tr style=\"text-align: right;\">\n",
|
|||
|
" <th></th>\n",
|
|||
|
" <th>name</th>\n",
|
|||
|
" </tr>\n",
|
|||
|
" </thead>\n",
|
|||
|
" <tbody>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>0</th>\n",
|
|||
|
" <td>Type of analysis</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>1</th>\n",
|
|||
|
" <td>Type of license</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>2</th>\n",
|
|||
|
" <td>Background Processing</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>3</th>\n",
|
|||
|
" <td>Web Usable</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>4</th>\n",
|
|||
|
" <td>Ease of Use</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>5</th>\n",
|
|||
|
" <td>Warning</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>6</th>\n",
|
|||
|
" <td>Usage</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>7</th>\n",
|
|||
|
" <td>Tool Family</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>8</th>\n",
|
|||
|
" <td>Historic Tool (developed before 2005)</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>9</th>\n",
|
|||
|
" <td>Compute Canada</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>10</th>\n",
|
|||
|
" <td>Link to Recipe</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>11</th>\n",
|
|||
|
" <td>TaDiRAH Goals</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>12</th>\n",
|
|||
|
" <td>TaDiRAH Methods</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" </tbody>\n",
|
|||
|
"</table>\n",
|
|||
|
"</div>"
|
|||
|
],
|
|||
|
"text/plain": [
|
|||
|
" name\n",
|
|||
|
"0 Type of analysis\n",
|
|||
|
"1 Type of license\n",
|
|||
|
"2 Background Processing\n",
|
|||
|
"3 Web Usable\n",
|
|||
|
"4 Ease of Use\n",
|
|||
|
"5 Warning\n",
|
|||
|
"6 Usage\n",
|
|||
|
"7 Tool Family\n",
|
|||
|
"8 Historic Tool (developed before 2005)\n",
|
|||
|
"9 Compute Canada\n",
|
|||
|
"10 Link to Recipe\n",
|
|||
|
"11 TaDiRAH Goals\n",
|
|||
|
"12 TaDiRAH Methods"
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 103,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"df_db_tools_toa=pd.read_sql_query('SELECT distinct name FROM TaPOR.attribute_types', connection)\n",
|
|||
|
"df_db_tools_toa.head(20)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"### Tools with no attribute in TAPoR dataset\n",
|
|||
|
"\n",
|
|||
|
"The following dataframe shows the main fields of tool descriptions in TAPoR dataset that do not have attribute values"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 104,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/html": [
|
|||
|
"<div>\n",
|
|||
|
"<style scoped>\n",
|
|||
|
" .dataframe tbody tr th:only-of-type {\n",
|
|||
|
" vertical-align: middle;\n",
|
|||
|
" }\n",
|
|||
|
"\n",
|
|||
|
" .dataframe tbody tr th {\n",
|
|||
|
" vertical-align: top;\n",
|
|||
|
" }\n",
|
|||
|
"\n",
|
|||
|
" .dataframe thead th {\n",
|
|||
|
" text-align: right;\n",
|
|||
|
" }\n",
|
|||
|
"</style>\n",
|
|||
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|||
|
" <thead>\n",
|
|||
|
" <tr style=\"text-align: right;\">\n",
|
|||
|
" <th></th>\n",
|
|||
|
" <th>id</th>\n",
|
|||
|
" <th>name</th>\n",
|
|||
|
" <th>creators_name</th>\n",
|
|||
|
" <th>url</th>\n",
|
|||
|
" </tr>\n",
|
|||
|
" </thead>\n",
|
|||
|
" <tbody>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>0</th>\n",
|
|||
|
" <td>579</td>\n",
|
|||
|
" <td>Voyant 2.0: Knots</td>\n",
|
|||
|
" <td>Stéfan Sinclair and Geoffrey Rockwell</td>\n",
|
|||
|
" <td>http://voyant-tools.org/?view=knots</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>1</th>\n",
|
|||
|
" <td>591</td>\n",
|
|||
|
" <td>Warc Extractor</td>\n",
|
|||
|
" <td>Ryan Chartier & Internet Archive</td>\n",
|
|||
|
" <td>https://github.com/recrm/ArchiveTools/blob/mas...</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>2</th>\n",
|
|||
|
" <td>754</td>\n",
|
|||
|
" <td>TAGS https://t.co/T007ezdZoA</td>\n",
|
|||
|
" <td></td>\n",
|
|||
|
" <td>None</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>3</th>\n",
|
|||
|
" <td>755</td>\n",
|
|||
|
" <td>Multiple enhancements to DiRT Directory (tools...</td>\n",
|
|||
|
" <td></td>\n",
|
|||
|
" <td>None</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>4</th>\n",
|
|||
|
" <td>758</td>\n",
|
|||
|
" <td>RT : Today's \"dirt\": DiRT now uses TaDiRAH ter...</td>\n",
|
|||
|
" <td></td>\n",
|
|||
|
" <td>None</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>5</th>\n",
|
|||
|
" <td>823</td>\n",
|
|||
|
" <td>Basement Waterproofing: Tips and Instructions</td>\n",
|
|||
|
" <td></td>\n",
|
|||
|
" <td>None</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>6</th>\n",
|
|||
|
" <td>1017</td>\n",
|
|||
|
" <td>Datapress</td>\n",
|
|||
|
" <td>MIT CSAIL</td>\n",
|
|||
|
" <td>None</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>7</th>\n",
|
|||
|
" <td>1063</td>\n",
|
|||
|
" <td>WordVenture</td>\n",
|
|||
|
" <td>WordNet</td>\n",
|
|||
|
" <td>None</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>8</th>\n",
|
|||
|
" <td>1174</td>\n",
|
|||
|
" <td>VoiceThread</td>\n",
|
|||
|
" <td>VoiceThread LLC</td>\n",
|
|||
|
" <td>None</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>9</th>\n",
|
|||
|
" <td>1183</td>\n",
|
|||
|
" <td>Purdue OWL</td>\n",
|
|||
|
" <td>Purdue University Writing Lab, Purdue Universi...</td>\n",
|
|||
|
" <td>None</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>10</th>\n",
|
|||
|
" <td>1352</td>\n",
|
|||
|
" <td>Aruspix</td>\n",
|
|||
|
" <td></td>\n",
|
|||
|
" <td>None</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>11</th>\n",
|
|||
|
" <td>1369</td>\n",
|
|||
|
" <td>MMax2</td>\n",
|
|||
|
" <td></td>\n",
|
|||
|
" <td>None</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>12</th>\n",
|
|||
|
" <td>1377</td>\n",
|
|||
|
" <td>Lextek</td>\n",
|
|||
|
" <td></td>\n",
|
|||
|
" <td>None</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" </tbody>\n",
|
|||
|
"</table>\n",
|
|||
|
"</div>"
|
|||
|
],
|
|||
|
"text/plain": [
|
|||
|
" id name \\\n",
|
|||
|
"0 579 Voyant 2.0: Knots \n",
|
|||
|
"1 591 Warc Extractor \n",
|
|||
|
"2 754 TAGS https://t.co/T007ezdZoA \n",
|
|||
|
"3 755 Multiple enhancements to DiRT Directory (tools... \n",
|
|||
|
"4 758 RT : Today's \"dirt\": DiRT now uses TaDiRAH ter... \n",
|
|||
|
"5 823 Basement Waterproofing: Tips and Instructions \n",
|
|||
|
"6 1017 Datapress \n",
|
|||
|
"7 1063 WordVenture \n",
|
|||
|
"8 1174 VoiceThread \n",
|
|||
|
"9 1183 Purdue OWL \n",
|
|||
|
"10 1352 Aruspix \n",
|
|||
|
"11 1369 MMax2 \n",
|
|||
|
"12 1377 Lextek \n",
|
|||
|
"\n",
|
|||
|
" creators_name \\\n",
|
|||
|
"0 Stéfan Sinclair and Geoffrey Rockwell \n",
|
|||
|
"1 Ryan Chartier & Internet Archive \n",
|
|||
|
"2 \n",
|
|||
|
"3 \n",
|
|||
|
"4 \n",
|
|||
|
"5 \n",
|
|||
|
"6 MIT CSAIL \n",
|
|||
|
"7 WordNet \n",
|
|||
|
"8 VoiceThread LLC \n",
|
|||
|
"9 Purdue University Writing Lab, Purdue Universi... \n",
|
|||
|
"10 \n",
|
|||
|
"11 \n",
|
|||
|
"12 \n",
|
|||
|
"\n",
|
|||
|
" url \n",
|
|||
|
"0 http://voyant-tools.org/?view=knots \n",
|
|||
|
"1 https://github.com/recrm/ArchiveTools/blob/mas... \n",
|
|||
|
"2 None \n",
|
|||
|
"3 None \n",
|
|||
|
"4 None \n",
|
|||
|
"5 None \n",
|
|||
|
"6 None \n",
|
|||
|
"7 None \n",
|
|||
|
"8 None \n",
|
|||
|
"9 None \n",
|
|||
|
"10 None \n",
|
|||
|
"11 None \n",
|
|||
|
"12 None "
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 104,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"df_db_tools_noatt=pd.read_sql_query('select distinct tools.id, tools.name, tools.creators_name, tools.url from TaPOR.tools where tools.is_approved=1 and tools.id not in (select distinct TaPOR.tool_attributes.tool_id from TaPOR.tool_attributes)', connection)\n",
|
|||
|
"df_db_tools_noatt.head(19)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"## *Type of license* in TAPoR dataset items"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 42,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/plain": [
|
|||
|
"Int64Index([ 8, 40, 44, 108, 170, 306, 330, 344, 362,\n",
|
|||
|
" 380,\n",
|
|||
|
" ...\n",
|
|||
|
" 43679, 43728, 43730, 43751, 43762, 43779, 43814, 43816, 43821,\n",
|
|||
|
" 43824],\n",
|
|||
|
" dtype='int64', length=1024)"
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 42,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"df_db_sub=df_db_tech[['id', 'name', 'detail', 'creators_name', 'last_updated', 'attributetype', 'attribute']]\n",
|
|||
|
"df_to=df_db_sub[df_db_sub['attributetype'] == 'Type of license'].drop_duplicates()\n",
|
|||
|
"df_to.index"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 43,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/plain": [
|
|||
|
"Free 470\n",
|
|||
|
"Open Source 256\n",
|
|||
|
"Closed Source 195\n",
|
|||
|
"Commercial 79\n",
|
|||
|
"Creative Commons 22\n",
|
|||
|
"Shareware 2\n",
|
|||
|
"Name: attribute, dtype: int64"
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 43,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"df_db_lic = df_to['attribute'].value_counts()\n",
|
|||
|
"df_db_lic.head(10)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 44,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/plain": [
|
|||
|
"<Figure size 1080x432 with 1 Axes>"
|
|||
|
]
|
|||
|
},
|
|||
|
"metadata": {
|
|||
|
"needs_background": "light"
|
|||
|
},
|
|||
|
"output_type": "display_data"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"fig, ax = plt.subplots()\n",
|
|||
|
"df_db_lic.plot(kind='bar', figsize=(15,6), ax=ax)\n",
|
|||
|
"plt.grid(alpha=0.6)\n",
|
|||
|
"ax.yaxis.set_label_text(\"\")\n",
|
|||
|
"ax.set_title(\"Number of Tools by License\", fontsize=15)\n",
|
|||
|
"ax.set_xlabel('License', fontsize=14)\n",
|
|||
|
"ax.set_ylabel('Number of Tools', fontsize=14);\n",
|
|||
|
"plt.show()"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 45,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"#df_db_tech.loc[df_db_tech['country']=='', 'country']='N/A'"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"## *Type of analysis* in TAPoR dataset items\n",
|
|||
|
"\n",
|
|||
|
"A tool description can have more than one value for *Type of analysis* (i.e., a tool can perform one or more types of analysis), so the counts below can sum to more than the number of tools."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 119,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/html": [
|
|||
|
"<div>\n",
|
|||
|
"<style scoped>\n",
|
|||
|
" .dataframe tbody tr th:only-of-type {\n",
|
|||
|
" vertical-align: middle;\n",
|
|||
|
" }\n",
|
|||
|
"\n",
|
|||
|
" .dataframe tbody tr th {\n",
|
|||
|
" vertical-align: top;\n",
|
|||
|
" }\n",
|
|||
|
"\n",
|
|||
|
" .dataframe thead th {\n",
|
|||
|
" text-align: right;\n",
|
|||
|
" }\n",
|
|||
|
"</style>\n",
|
|||
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|||
|
" <thead>\n",
|
|||
|
" <tr style=\"text-align: right;\">\n",
|
|||
|
" <th></th>\n",
|
|||
|
" <th>id</th>\n",
|
|||
|
" <th>name</th>\n",
|
|||
|
" <th>detail</th>\n",
|
|||
|
" <th>creators_name</th>\n",
|
|||
|
" <th>last_updated</th>\n",
|
|||
|
" <th>attributetype</th>\n",
|
|||
|
" <th>attribute</th>\n",
|
|||
|
" </tr>\n",
|
|||
|
" </thead>\n",
|
|||
|
" <tbody>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>43724</th>\n",
|
|||
|
" <td>1499</td>\n",
|
|||
|
" <td>iPhoto</td>\n",
|
|||
|
" <td><p>iPhoto is a digital photograph manipulation...</td>\n",
|
|||
|
" <td>Apple</td>\n",
|
|||
|
" <td>2018-10-12</td>\n",
|
|||
|
" <td>Type of analysis</td>\n",
|
|||
|
" <td>Organizing</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>43726</th>\n",
|
|||
|
" <td>1499</td>\n",
|
|||
|
" <td>iPhoto</td>\n",
|
|||
|
" <td><p>iPhoto is a digital photograph manipulation...</td>\n",
|
|||
|
" <td>Apple</td>\n",
|
|||
|
" <td>2018-10-12</td>\n",
|
|||
|
" <td>Type of analysis</td>\n",
|
|||
|
" <td>Storage</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>43748</th>\n",
|
|||
|
" <td>1500</td>\n",
|
|||
|
" <td>Google 3D Warehouse</td>\n",
|
|||
|
" <td><p>A collection of free-to-download 3D models ...</td>\n",
|
|||
|
" <td>Google</td>\n",
|
|||
|
" <td>2018-11-06</td>\n",
|
|||
|
" <td>Type of analysis</td>\n",
|
|||
|
" <td>Collaboration</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>43749</th>\n",
|
|||
|
" <td>1500</td>\n",
|
|||
|
" <td>Google 3D Warehouse</td>\n",
|
|||
|
" <td><p>A collection of free-to-download 3D models ...</td>\n",
|
|||
|
" <td>Google</td>\n",
|
|||
|
" <td>2018-11-06</td>\n",
|
|||
|
" <td>Type of analysis</td>\n",
|
|||
|
" <td>Dissemination</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>43750</th>\n",
|
|||
|
" <td>1500</td>\n",
|
|||
|
" <td>Google 3D Warehouse</td>\n",
|
|||
|
" <td><p>A collection of free-to-download 3D models ...</td>\n",
|
|||
|
" <td>Google</td>\n",
|
|||
|
" <td>2018-11-06</td>\n",
|
|||
|
" <td>Type of analysis</td>\n",
|
|||
|
" <td>Modeling</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>43759</th>\n",
|
|||
|
" <td>1501</td>\n",
|
|||
|
" <td>SketchUp (Formerly Google SketchUp)</td>\n",
|
|||
|
" <td><p>Google SketchUp is easy-to-use free 3D mode...</td>\n",
|
|||
|
" <td>Google</td>\n",
|
|||
|
" <td>2018-10-26</td>\n",
|
|||
|
" <td>Type of analysis</td>\n",
|
|||
|
" <td>Creation</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>43760</th>\n",
|
|||
|
" <td>1501</td>\n",
|
|||
|
" <td>SketchUp (Formerly Google SketchUp)</td>\n",
|
|||
|
" <td><p>Google SketchUp is easy-to-use free 3D mode...</td>\n",
|
|||
|
" <td>Google</td>\n",
|
|||
|
" <td>2018-10-26</td>\n",
|
|||
|
" <td>Type of analysis</td>\n",
|
|||
|
" <td>Interpretation</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>43761</th>\n",
|
|||
|
" <td>1501</td>\n",
|
|||
|
" <td>SketchUp (Formerly Google SketchUp)</td>\n",
|
|||
|
" <td><p>Google SketchUp is easy-to-use free 3D mode...</td>\n",
|
|||
|
" <td>Google</td>\n",
|
|||
|
" <td>2018-10-26</td>\n",
|
|||
|
" <td>Type of analysis</td>\n",
|
|||
|
" <td>Modeling</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>43790</th>\n",
|
|||
|
" <td>1502</td>\n",
|
|||
|
" <td>GIMP (GNU Image Manipulation Program)</td>\n",
|
|||
|
" <td><p>GIMP is image editing software, much like P...</td>\n",
|
|||
|
" <td>GIMP Team</td>\n",
|
|||
|
" <td>None</td>\n",
|
|||
|
" <td>Type of analysis</td>\n",
|
|||
|
" <td>Creation</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>43818</th>\n",
|
|||
|
" <td>1556</td>\n",
|
|||
|
" <td>Reaper</td>\n",
|
|||
|
" <td>REAPER is a complete digital audio production ...</td>\n",
|
|||
|
" <td>Cockos</td>\n",
|
|||
|
" <td>2019-03-24</td>\n",
|
|||
|
" <td>Type of analysis</td>\n",
|
|||
|
" <td>Creation</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" </tbody>\n",
|
|||
|
"</table>\n",
|
|||
|
"</div>"
|
|||
|
],
|
|||
|
"text/plain": [
|
|||
|
" id name \\\n",
|
|||
|
"43724 1499 iPhoto \n",
|
|||
|
"43726 1499 iPhoto \n",
|
|||
|
"43748 1500 Google 3D Warehouse \n",
|
|||
|
"43749 1500 Google 3D Warehouse \n",
|
|||
|
"43750 1500 Google 3D Warehouse \n",
|
|||
|
"43759 1501 SketchUp (Formerly Google SketchUp) \n",
|
|||
|
"43760 1501 SketchUp (Formerly Google SketchUp) \n",
|
|||
|
"43761 1501 SketchUp (Formerly Google SketchUp) \n",
|
|||
|
"43790 1502 GIMP (GNU Image Manipulation Program) \n",
|
|||
|
"43818 1556 Reaper \n",
|
|||
|
"\n",
|
|||
|
" detail creators_name \\\n",
|
|||
|
"43724 <p>iPhoto is a digital photograph manipulation... Apple \n",
|
|||
|
"43726 <p>iPhoto is a digital photograph manipulation... Apple \n",
|
|||
|
"43748 <p>A collection of free-to-download 3D models ... Google \n",
|
|||
|
"43749 <p>A collection of free-to-download 3D models ... Google \n",
|
|||
|
"43750 <p>A collection of free-to-download 3D models ... Google \n",
|
|||
|
"43759 <p>Google SketchUp is easy-to-use free 3D mode... Google \n",
|
|||
|
"43760 <p>Google SketchUp is easy-to-use free 3D mode... Google \n",
|
|||
|
"43761 <p>Google SketchUp is easy-to-use free 3D mode... Google \n",
|
|||
|
"43790 <p>GIMP is image editing software, much like P... GIMP Team \n",
|
|||
|
"43818 REAPER is a complete digital audio production ... Cockos \n",
|
|||
|
"\n",
|
|||
|
" last_updated attributetype attribute \n",
|
|||
|
"43724 2018-10-12 Type of analysis Organizing \n",
|
|||
|
"43726 2018-10-12 Type of analysis Storage \n",
|
|||
|
"43748 2018-11-06 Type of analysis Collaboration \n",
|
|||
|
"43749 2018-11-06 Type of analysis Dissemination \n",
|
|||
|
"43750 2018-11-06 Type of analysis Modeling \n",
|
|||
|
"43759 2018-10-26 Type of analysis Creation \n",
|
|||
|
"43760 2018-10-26 Type of analysis Interpretation \n",
|
|||
|
"43761 2018-10-26 Type of analysis Modeling \n",
|
|||
|
"43790 None Type of analysis Creation \n",
|
|||
|
"43818 2019-03-24 Type of analysis Creation "
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 119,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"df_to_ta=df_db_sub[df_db_sub['attributetype'] == 'Type of analysis'].drop_duplicates()\n",
|
|||
|
"df_to_ta.tail(10)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 47,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/plain": [
|
|||
|
"Analysis 434\n",
|
|||
|
"Visualization 236\n",
|
|||
|
"Content Analysis 185\n",
|
|||
|
"Search 139\n",
|
|||
|
"Natural Language Processing 125\n",
|
|||
|
"Discovering 124\n",
|
|||
|
"Capture 113\n",
|
|||
|
"Gathering 97\n",
|
|||
|
"Publishing 92\n",
|
|||
|
"Dissemination 91\n",
|
|||
|
"Enrichment 90\n",
|
|||
|
"Annotating 83\n",
|
|||
|
"Collaboration 80\n",
|
|||
|
"Organizing 71\n",
|
|||
|
"Creation 52\n",
|
|||
|
"Uncategorized 49\n",
|
|||
|
"Storage 40\n",
|
|||
|
"Web development 39\n",
|
|||
|
"Modeling 25\n",
|
|||
|
"Programming 22\n",
|
|||
|
"Interpretation 18\n",
|
|||
|
"RDF 12\n",
|
|||
|
"Name: attribute, dtype: int64"
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 47,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"df_db_a = df_to_ta['attribute'].value_counts()\n",
|
|||
|
"df_db_a.head(25)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 48,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/plain": [
|
|||
|
"<Figure size 1080x432 with 1 Axes>"
|
|||
|
]
|
|||
|
},
|
|||
|
"metadata": {
|
|||
|
"needs_background": "light"
|
|||
|
},
|
|||
|
"output_type": "display_data"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"fig, ax = plt.subplots()\n",
|
|||
|
"df_db_a.plot(kind='bar', figsize=(15,6), ax=ax)\n",
|
|||
|
"plt.grid(alpha=0.6)\n",
|
|||
|
"ax.yaxis.set_label_text(\"\")\n",
|
|||
|
"ax.set_title(\"Number of Tools by Type of Analysis\", fontsize=15)\n",
|
|||
|
"ax.set_xlabel('Type of Analysis', fontsize=14)\n",
|
|||
|
"ax.set_ylabel('Number of Tools', fontsize=14);\n",
|
|||
|
"plt.show()"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"## *Tool families* in TAPoR dataset items"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 49,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/plain": [
|
|||
|
"TAPoRware 55\n",
|
|||
|
"Voyant 18\n",
|
|||
|
"Digital Methods Initiative 12\n",
|
|||
|
"Stanford NLP 11\n",
|
|||
|
"SEASR 8\n",
|
|||
|
"SIMILE Widgets 6\n",
|
|||
|
"EURAC 5\n",
|
|||
|
"CNRTL 5\n",
|
|||
|
"Visualizing Literature 5\n",
|
|||
|
"Book Genome Project 5\n",
|
|||
|
"CHNM 4\n",
|
|||
|
"Orlando 3\n",
|
|||
|
"Laurence Anthony 3\n",
|
|||
|
"Stanford HCI Group 2\n",
|
|||
|
"Stanford Vis Group 2\n",
|
|||
|
"Scholars' Lab 2\n",
|
|||
|
"Name: attribute, dtype: int64"
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 49,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"df_to_tf=df_db_sub[df_db_sub['attributetype'] == 'Tool Family'].drop_duplicates()\n",
|
|||
|
"df_to_tf = df_to_tf['attribute'].value_counts()\n",
|
|||
|
"df_to_tf.head(20)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 50,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/plain": [
|
|||
|
"<Figure size 1080x432 with 1 Axes>"
|
|||
|
]
|
|||
|
},
|
|||
|
"metadata": {
|
|||
|
"needs_background": "light"
|
|||
|
},
|
|||
|
"output_type": "display_data"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"fig, ax = plt.subplots()\n",
|
|||
|
"df_to_tf.plot(kind='bar', figsize=(15,6), ax=ax)\n",
|
|||
|
"plt.grid(alpha=0.6)\n",
|
|||
|
"ax.yaxis.set_label_text(\"\")\n",
|
|||
|
"ax.set_title(\"Number of Tools by Tool Families\", fontsize=15)\n",
|
|||
|
"ax.set_xlabel('Tool Family', fontsize=14)\n",
|
|||
|
"ax.set_ylabel('Number of Tools', fontsize=14);\n",
|
|||
|
"plt.show()"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"## *Web Usable* in TAPoR dataset items"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 51,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/html": [
|
|||
|
"<div>\n",
|
|||
|
"<style scoped>\n",
|
|||
|
" .dataframe tbody tr th:only-of-type {\n",
|
|||
|
" vertical-align: middle;\n",
|
|||
|
" }\n",
|
|||
|
"\n",
|
|||
|
" .dataframe tbody tr th {\n",
|
|||
|
" vertical-align: top;\n",
|
|||
|
" }\n",
|
|||
|
"\n",
|
|||
|
" .dataframe thead th {\n",
|
|||
|
" text-align: right;\n",
|
|||
|
" }\n",
|
|||
|
"</style>\n",
|
|||
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|||
|
" <thead>\n",
|
|||
|
" <tr style=\"text-align: right;\">\n",
|
|||
|
" <th></th>\n",
|
|||
|
" <th>id</th>\n",
|
|||
|
" <th>name</th>\n",
|
|||
|
" <th>detail</th>\n",
|
|||
|
" <th>creators_name</th>\n",
|
|||
|
" <th>last_updated</th>\n",
|
|||
|
" <th>attributetype</th>\n",
|
|||
|
" <th>attribute</th>\n",
|
|||
|
" </tr>\n",
|
|||
|
" </thead>\n",
|
|||
|
" <tbody>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>16</th>\n",
|
|||
|
" <td>1</td>\n",
|
|||
|
" <td>List Words - HTML (TAPoRware)</td>\n",
|
|||
|
" <td><p>This tool lists words in an HTML document, ...</td>\n",
|
|||
|
" <td>Geoffrey Rockwell et. al.</td>\n",
|
|||
|
" <td>2011-11-27</td>\n",
|
|||
|
" <td>Web Usable</td>\n",
|
|||
|
" <td>Run in Browser</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>52</th>\n",
|
|||
|
" <td>4</td>\n",
|
|||
|
" <td>Wordle</td>\n",
|
|||
|
" <td><p>Wordle is an online toy for generating <a h...</td>\n",
|
|||
|
" <td>Jonathan Feinberg</td>\n",
|
|||
|
" <td>2018-10-17</td>\n",
|
|||
|
" <td>Web Usable</td>\n",
|
|||
|
" <td>Run in Browser</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>82</th>\n",
|
|||
|
" <td>5</td>\n",
|
|||
|
" <td>OrlandoVision (OVis)</td>\n",
|
|||
|
" <td><p>An application for visualizing a specific c...</td>\n",
|
|||
|
" <td>The Orlando Project</td>\n",
|
|||
|
" <td>2018-11-01</td>\n",
|
|||
|
" <td>Web Usable</td>\n",
|
|||
|
" <td>Software you Download and Install</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>118</th>\n",
|
|||
|
" <td>8</td>\n",
|
|||
|
" <td>Voyant Cirrus</td>\n",
|
|||
|
" <td><p>Cirrus is a visualization tool that display...</td>\n",
|
|||
|
" <td>Stéfan Sinclair and Geoffrey Rockwell</td>\n",
|
|||
|
" <td>2018-10-05</td>\n",
|
|||
|
" <td>Web Usable</td>\n",
|
|||
|
" <td>Run in Browser</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>192</th>\n",
|
|||
|
" <td>9</td>\n",
|
|||
|
" <td>Voyant Links</td>\n",
|
|||
|
" <td><p>Links finds collocates for words and displa...</td>\n",
|
|||
|
" <td>Stéfan Sinclair and Geoffrey Rockwell</td>\n",
|
|||
|
" <td>2018-09-18</td>\n",
|
|||
|
" <td>Web Usable</td>\n",
|
|||
|
" <td>Run in Browser</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" </tbody>\n",
|
|||
|
"</table>\n",
|
|||
|
"</div>"
|
|||
|
],
|
|||
|
"text/plain": [
|
|||
|
" id name \\\n",
|
|||
|
"16 1 List Words - HTML (TAPoRware) \n",
|
|||
|
"52 4 Wordle \n",
|
|||
|
"82 5 OrlandoVision (OVis) \n",
|
|||
|
"118 8 Voyant Cirrus \n",
|
|||
|
"192 9 Voyant Links \n",
|
|||
|
"\n",
|
|||
|
" detail \\\n",
|
|||
|
"16 <p>This tool lists words in an HTML document, ... \n",
|
|||
|
"52 <p>Wordle is an online toy for generating <a h... \n",
|
|||
|
"82 <p>An application for visualizing a specific c... \n",
|
|||
|
"118 <p>Cirrus is a visualization tool that display... \n",
|
|||
|
"192 <p>Links finds collocates for words and displa... \n",
|
|||
|
"\n",
|
|||
|
" creators_name last_updated attributetype \\\n",
|
|||
|
"16 Geoffrey Rockwell et. al. 2011-11-27 Web Usable \n",
|
|||
|
"52 Jonathan Feinberg 2018-10-17 Web Usable \n",
|
|||
|
"82 The Orlando Project 2018-11-01 Web Usable \n",
|
|||
|
"118 Stéfan Sinclair and Geoffrey Rockwell 2018-10-05 Web Usable \n",
|
|||
|
"192 Stéfan Sinclair and Geoffrey Rockwell 2018-09-18 Web Usable \n",
|
|||
|
"\n",
|
|||
|
" attribute \n",
|
|||
|
"16 Run in Browser \n",
|
|||
|
"52 Run in Browser \n",
|
|||
|
"82 Software you Download and Install \n",
|
|||
|
"118 Run in Browser \n",
|
|||
|
"192 Run in Browser "
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 51,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"df_to_bp=df_db_sub[df_db_sub['attributetype'] == 'Web Usable'].drop_duplicates()\n",
|
|||
|
"df_to_bp.head()"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 52,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
{
"data": {
"text/plain": [
"Run in Browser 503\n",
"Other 400\n",
"Software you Download and Install 187\n",
"Web Application you Launch 8\n",
"Name: attribute, dtype: int64"
]
},
"execution_count": 52,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_to_bp = df_to_bp['attribute'].value_counts()\n",
"df_to_bp.head(10)"
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 1080x432 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"fig, ax = plt.subplots()\n",
"df_to_bp.plot(kind='bar', figsize=(15, 6), ax=ax)\n",
"plt.grid(alpha=0.6)\n",
"ax.yaxis.set_label_text(\"\")\n",
"ax.set_title(\"Number of Tools by Web usability\", fontsize=15)\n",
"ax.set_xlabel('Web usable', fontsize=14)\n",
"ax.set_ylabel('Number of tools', fontsize=14)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## ------"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.7"
}
},
"nbformat": 4,
"nbformat_minor": 2
}