Written by Benjamin Gross
30 October 2015
The Internet is a messy place. For the most part, it is a giant collection of text pages tenuously linked to other pages of text. Humans are adept at parsing this text, even when it is riddled with spelling mistakes, nuance, and ambiguity. Machines, on the other hand, struggle. Enter the semantic web, an extension of the web that uses controlled vocabularies and common data formats so that machines can interpret its content. Now, UNAVCO has staked a claim on the semantic web with a new resource, Connect UNAVCO.
Connect UNAVCO links together geodetic services and products (such as people, software, datasets, publications, and more) in efficient and discoverable ways. The website is built upon VIVO, open-source, community-supported software deployed at universities and research organizations worldwide. Each record in the VIVO database is linked to other records using a machine-friendly pattern of subject, predicate, and object, known as a triple. For example, the triple made up of http://connect.unavco.org/individual/org253530 (subject), http://www.w3.org/1999/02/22-rdf-syntax-ns#type (predicate), and http://vivoweb.org/ontology/core#Consortium (object) states, in plain English, that org253530 (UNAVCO) is a consortium. VIVO also provides a front-end interface that displays the semantic data in an attractive, human-friendly format.
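The example triple above can be written out in N-Triples, a plain-text RDF serialization with one statement per line. The Python sketch below is illustrative only: the `Triple` class and `is_a` helper are hypothetical names, not part of VIVO, and the sketch handles only IRIs (no literals or blank nodes).

```python
from typing import NamedTuple

# Full IRIs taken from the UNAVCO example triple.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
VIVO_CONSORTIUM = "http://vivoweb.org/ontology/core#Consortium"
UNAVCO = "http://connect.unavco.org/individual/org253530"


class Triple(NamedTuple):
    """One RDF statement; in this sketch every term is an IRI."""
    subject: str
    predicate: str
    obj: str

    def to_ntriples(self) -> str:
        # N-Triples wraps each IRI in angle brackets and ends the statement with " ."
        return f"<{self.subject}> <{self.predicate}> <{self.obj}> ."


def is_a(triples: list, subject: str, rdf_class: str) -> bool:
    """Return True if the list states that `subject` has rdf:type `rdf_class`."""
    return Triple(subject, RDF_TYPE, rdf_class) in triples


graph = [Triple(UNAVCO, RDF_TYPE, VIVO_CONSORTIUM)]
for statement in graph:
    print(statement.to_ntriples())
```

A real application would normally use an RDF library rather than string formatting, but spelling the statement out shows that a triple is nothing more than three identifiers in a fixed order.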
Connect UNAVCO is being developed as part of EarthCollab, an EarthCube building block, in partnership with the National Center for Atmospheric Research (NCAR) and Cornell. EarthCube aims to create a “framework for sharing data and knowledge in an open and inclusive manner to enable an integrated understanding of the Earth system.” Connect UNAVCO supports that mission by 1. extending the VIVO software, 2. compiling diverse information from an array of sources, 3. mapping the information to meaningful, controlled vocabularies (i.e., ontologies), 4. displaying the information in an accessible way, 5. providing the information in a machine-readable format, and 6. connecting people, places, and things where such connections have not been made or are not easily discoverable.
EarthCollab includes two use cases: a VIVO implementation at UNAVCO and another at NCAR’s Earth Observing Laboratory (EOL). Cornell, where VIVO was originally developed, is helping to enhance VIVO’s capabilities to connect external researchers who are part of either the UNAVCO or UCAR consortia to their related data, publications, grants and colleagues.
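Linked open data is a term coined by World Wide Web inventor Tim Berners-Lee. It is data that conforms to the following principles:
1. Use URIs as names for things.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL).
4. Include links to other URIs so that they can discover more things.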
The VIVO software is designed to serve machine-readable linked open data. Connect UNAVCO currently makes 1.25 million triples freely available. Clicking the green link icon near the top of any page provides a link to that page's data in RDF (Resource Description Framework) format.
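A program can request the same machine-readable view without clicking the icon. The sketch below assumes two conventions that VIVO sites commonly follow but that this article does not confirm for Connect UNAVCO: an RDF file at `<individual URI>/<local name>.rdf`, and HTTP content negotiation with an `Accept: application/rdf+xml` header. The helper names are hypothetical, and the code only builds the request; sending it requires network access.

```python
from urllib.request import Request

# Assumption: a VIVO-style site serves RDF for an individual either at
# "<individual URI>/<local name>.rdf" or via HTTP content negotiation.
RDF_XML = "application/rdf+xml"


def rdf_file_url(individual_uri: str) -> str:
    """Build the assumed '<uri>/<localname>.rdf' link for an individual."""
    base = individual_uri.rstrip("/")
    local_name = base.rsplit("/", 1)[-1]
    return f"{base}/{local_name}.rdf"


def rdf_request(individual_uri: str, media_type: str = RDF_XML) -> Request:
    """Prepare (but do not send) a GET request that asks for RDF instead of HTML."""
    return Request(individual_uri, headers={"Accept": media_type})


uri = "http://connect.unavco.org/individual/org253530"
print(rdf_file_url(uri))
req = rdf_request(uri)
print(req.full_url, req.get_header("Accept"))
# To fetch the data: urllib.request.urlopen(req).read()  (needs network access)
```

Content negotiation keeps a single URI for both people and machines, which matches the linked data principle of using HTTP URIs that return useful information when looked up.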
Unique identifiers make it much easier to connect linked open data. Most peer-reviewed science publications now have digital object identifiers (DOIs), which are used to track and connect them. UNAVCO now mints DOIs for its datasets for the same purpose. Connect UNAVCO holds DOI information for thousands of datasets and publications. Unique identifiers also allow other software to easily incorporate information stored in the VIVO database.
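One practical benefit of DOIs is that records from different sources can be matched on the identifier rather than on titles. The sketch below normalizes the common written forms of a DOI (bare, `doi:`, or a doi.org URL) so they compare equal. The `10.1234/...` identifiers are hypothetical, and the regular expression is a loose shape check rather than a full implementation of the DOI syntax.

```python
import re

# Heuristic shape check: "10." + numeric registrant code (optionally with
# dot-separated subcodes) + "/" + a non-empty suffix without whitespace.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}(?:\.\d+)*/\S+$")

# Common ways a DOI is written in citations and metadata.
DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(text: str) -> str:
    """Return the bare, lower-cased DOI (DOIs are case-insensitive) or raise ValueError."""
    doi = text.strip()
    for prefix in DOI_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    doi = doi.lower()
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a DOI: {text!r}")
    return doi


def doi_url(text: str) -> str:
    """Resolver URL for a DOI; suffixes with reserved URL characters would need encoding."""
    return "https://doi.org/" + normalize_doi(text)


# Hypothetical identifiers: the same dataset DOI written three different ways.
records = [
    "doi:10.1234/Example.Dataset",
    "https://doi.org/10.1234/example.dataset",
    "10.1234/EXAMPLE.DATASET",
]
print({normalize_doi(r) for r in records})  # → {'10.1234/example.dataset'}
```

Collapsing the three spellings to one key is what lets a database such as VIVO treat them as a single dataset instead of three near-duplicates.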
Name ambiguity presents a challenge when populating a semantic database such as VIVO, partly because unique identifiers are not commonly implemented for people. Publication records often include only partial names, which makes it difficult to connect authors who share a last name to their correct records in Connect UNAVCO. A publicly available, unique identifier for a person solves this problem, much as a DOI does for an information resource. ORCID is an open-source, non-profit effort to provide persistent identifiers for people. ORCID has been adopted by publishers such as AGU and AAAS as an optional part of manuscript submissions and is increasingly becoming an essential part of a researcher’s academic identity. UNAVCO is using ORCID in its disambiguation process and to facilitate updates to Connect UNAVCO. We are asking researchers to share their ORCID iDs, or to consider registering for one, so that we can disambiguate names and make clear and accurate attributions.
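ORCID iDs carry a built-in check character, computed with the ISO 7064 MOD 11-2 algorithm, so a mistyped iD can be rejected before it is used to match an author to a record. The sketch below validates an iD in either bare or `https://orcid.org/` form; it is not UNAVCO's disambiguation code, and the function names are hypothetical. The iD 0000-0002-1825-0097 is the example ORCID uses in its own documentation.

```python
import re

# Four groups of four characters; only the final character may be "X".
ORCID_PATTERN = re.compile(r"^(\d{4})-(\d{4})-(\d{4})-(\d{3}[\dX])$")


def orcid_check_digit(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an ORCID iD."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Accept 'xxxx-xxxx-xxxx-xxxx' or the https://orcid.org/ URI form."""
    orcid = orcid.strip()
    for prefix in ("https://orcid.org/", "http://orcid.org/"):
        if orcid.startswith(prefix):
            orcid = orcid[len(prefix):]
    match = ORCID_PATTERN.match(orcid.upper())  # allow a lower-case "x"
    if not match:
        return False
    digits = "".join(match.groups())
    return orcid_check_digit(digits[:15]) == digits[15]


for candidate in ("0000-0002-1825-0097", "https://orcid.org/0000-0002-1694-233X", "0000-0002-1825-0098"):
    print(candidate, is_valid_orcid(candidate))
```

The checksum only catches typos; it cannot tell whether an iD belongs to the right person, so a valid iD still has to be confirmed by the researcher.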
One of the primary goals of EarthCollab is to link together people, places, and things. The benefits include enhancing discovery and advancing research, reducing data redundancy, and sourcing quickly changing data from a canonical source. Connect UNAVCO includes publication, grant, and dataset information for a large community, but lacks researcher profile information such as position, institution, and contact details, because this information is maintained by the researcher’s institution. We plan to display researcher information dynamically by linking, via the researcher's ORCID iD or a ‘same as’ (owl:sameAs) assertion, to the profile maintained by their home institution. In that way, we can easily link the researcher to their UNAVCO-related materials, such as datasets, publications, and colleagues.
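The ‘same as’ assertion mentioned above most likely corresponds to OWL's `owl:sameAs` property, which states that two URIs name the same individual and is both symmetric and transitive; that mapping is an assumption here. The sketch below shows how an application could follow such links to collect every URI for a researcher before pulling profile details from the institutional record. All person URIs and the `same_as_closure` function are hypothetical.

```python
from collections import defaultdict, deque

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Hypothetical URIs: a Connect UNAVCO person record, an ORCID iD,
# and a profile hosted by the researcher's home institution.
triples = [
    ("http://connect.unavco.org/individual/per0001", OWL_SAME_AS,
     "https://orcid.org/0000-0002-1825-0097"),
    ("http://vivo.example.edu/individual/n42", OWL_SAME_AS,
     "https://orcid.org/0000-0002-1825-0097"),
]


def same_as_closure(triples: list, start: str) -> set:
    """Return every URI reachable from `start` through owl:sameAs links."""
    graph = defaultdict(set)
    for s, p, o in triples:
        if p == OWL_SAME_AS:
            graph[s].add(o)
            graph[o].add(s)  # owl:sameAs is symmetric, so link both ways
    seen = {start}
    queue = deque([start])
    while queue:  # breadth-first search gives the transitive closure
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


print(sorted(same_as_closure(triples, "http://connect.unavco.org/individual/per0001")))
```

Because the links are traversed in both directions, starting from the ORCID URI or from the institutional profile returns the same set, which is what allows a shared ORCID iD to tie a Connect UNAVCO record to an institution's profile even when neither points at the other directly.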
Last modified: Friday, 30-Oct-2015 19:36:26 UTC