Neurocommons
The Neurocommons is an open RDF database developed by Science Commons. It was compiled from major life sciences databases with a focus on neuroscience. It is accessible via a web-based front end using the SPARQL query language.
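As a rough illustration of what querying such a database over SPARQL involves, the sketch below builds a GET request URL that carries the query text in a `query` parameter, as the SPARQL protocol does. The endpoint address, the `ex:` namespace, and the `mentionsGene` property are placeholders chosen for this example; they are not the actual Neurocommons service or schema.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; substitute the address of a real SPARQL service.
ENDPOINT = "http://example.org/sparql"

# Hypothetical vocabulary: ex:mentionsGene is not a real Neurocommons property.
QUERY = """
PREFIX ex: <http://example.org/terms#>
SELECT ?record ?gene
WHERE { ?record ex:mentionsGene ?gene }
LIMIT 10
"""

def build_sparql_url(endpoint: str, query: str) -> str:
    """Return a GET URL that passes the query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query.strip()})

url = build_sparql_url(ENDPOINT, QUERY)
print(url)
```

The URL can then be fetched with any HTTP client; the sketch stops short of sending the request so that it runs without network access.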
The Neurocommons project is creating an Open Source knowledge management platform for biological research and has two distinct phases. The first phase is a project to apply text mining and natural language processing to open biomedical abstracts.
The second phase is the development of a data analysis software system.
A prerequisite for automated categorization of scientific information is that it be in a consistent format that can be processed meaningfully and accurately by software. Literature searches are the primary method by which scientists obtain up-to-date information about the subject matter in their particular field.
What is needed are links among literature, data records, real-world entities, and abstract concepts, with formal definitions of each link’s endpoints and type. Applications need to use common identifiers for endpoints so that mentions of shared entities can be matched. This discipline of links, definitions, and identification is exactly what the framework of the semantic web provides.
In collaboration with the W3C Semantic Web Health Care and Life Sciences Interest Group, the project has integrated information from a variety of standard sources to establish core RDF content that can be used as a basis for bioinformatics applications.
The Neurocommons project started with one of the primary repositories for biomedical literature. It randomly selected 874,727 PubMed/Medline abstracts and fed them into the Temis IDE equipped with the “biological entity recognizer” (BER). BER was able to perform some degree of processing on 368,688 of the abstracts.
BER categorizes terms and phrases in the input text in various ways (e.g. as a genetic population, chemical entity), but the only controlled vocabulary handled by BER is one for proteins and genes.
Each concept tree generated by BER was pruned to remove information not related to proteins/genes and their interactions, converted to a canonical format, and then rendered as RDF. Each leaf of the RDF concept tree is a protein/gene substance node, and internal nodes are called ‘associations.’
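To make the shape of that output more concrete, here is a minimal sketch that flattens a nested concept tree into subject/predicate/object triples, with leaves as protein/gene nodes and internal nodes as associations. The namespace, the property names (`type`, `label`, `hasParticipant`), and the sample tree are all invented for illustration; the actual BER output format and the Neurocommons RDF schema are not reproduced here.

```python
from itertools import count

# Hypothetical namespace and property names, for illustration only.
EX = "http://example.org/ber#"

def tree_to_triples(tree, ids=None, triples=None):
    """Flatten a nested concept tree into (subject, predicate, object) triples.

    A leaf is a string naming a protein/gene; an internal node is a
    (label, [children]) pair and becomes an 'association' node.
    Returns the identifier created for `tree` and the shared triple list.
    """
    ids = ids if ids is not None else count(1)
    triples = triples if triples is not None else []
    if isinstance(tree, str):                       # leaf: protein/gene node
        node = f"{EX}substance/{tree}"
        triples.append((node, EX + "type", EX + "ProteinOrGene"))
        return node, triples
    label, children = tree                          # internal node: association
    node = f"{EX}association/{next(ids)}"
    triples.append((node, EX + "type", EX + "Association"))
    triples.append((node, EX + "label", label))
    for child in children:
        child_node, _ = tree_to_triples(child, ids, triples)
        triples.append((node, EX + "hasParticipant", child_node))
    return node, triples

# Made-up tree: one association containing a leaf and a nested association.
example = ("binding", ["geneA", ("phosphorylation", ["proteinB", "proteinC"])])
root, triples = tree_to_triples(example)
for triple in triples:
    print(triple)
```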
The images below describe the resulting organization:
Beyond processes and other associations, the RDF captures additional annotations that relate the associations to the originating PubMed record and to other data sources. Protein/gene nodes are linked to the corresponding entries in public gene and protein databases.
A more popular application of the semantic web is the Friend of a Friend (FOAF) project, which describes relationships among people and other agents in terms of RDF. The project is creating a Web of machine-readable pages describing people, the links between them, the things they create, the places they visit, and so on.
FOAF is an RDF vocabulary. FOAF data is decentralized and within user control. An example application that uses these files might be a community directory where members maintain their own records. Many communities have evolved and grown on the Internet, from companies through professional organizations to solely social groups. The FOAF vocabulary gives basic identifiers for community membership by describing people through their basic properties.
Some of the benefits that may result from this project's efforts include:
Simple properties to characterize an individual
| Property | Value |
| nick | A string literal that gives a name used to identify a user on a chat or other computer system; for example an AIM screen name or UNIX login |
| homepage | The URL of the person's home page |
| workplaceHomepage | The URL of the home page of the place the person works |
| depiction | The URL of an image in which the person is depicted |
| phone | A telephone number for the person |
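The properties in the table can be combined into a small FOAF description of one person. The sketch below writes such a description in Turtle syntax using plain string formatting; the person IRI and all values are invented, and the split between resource-valued and literal-valued properties is a simplifying assumption made for this example.

```python
FOAF = "http://xmlns.com/foaf/0.1/"

# Assumption for this sketch: these properties take IRIs, all others take strings.
RESOURCE_PROPS = {"homepage", "workplaceHomepage", "depiction", "phone"}

def foaf_turtle(person_iri: str, props: dict) -> str:
    """Serialize one person's FOAF properties as a small Turtle document."""
    parts = ["a foaf:Person"]
    for name, value in props.items():
        if name in RESOURCE_PROPS:
            obj = f"<{value}>"
        else:
            # Escape backslashes and double quotes inside string literals.
            obj = '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'
        parts.append(f"foaf:{name} {obj}")
    body = " ;\n    ".join(parts)
    return f"@prefix foaf: <{FOAF}> .\n\n<{person_iri}>\n    {body} .\n"

# Made-up example data.
doc = foaf_turtle(
    "http://example.org/people/alice#me",
    {
        "nick": "alice",
        "homepage": "http://example.org/~alice",
        "workplaceHomepage": "http://example.org/",
        "depiction": "http://example.org/~alice/photo.jpg",
        "phone": "tel:+1-555-0100",
    },
)
print(doc)
```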
By aggregating and merging the FOAF files, you can achieve the same effect as operating a centralized directory service, without any of the issues of single points of failure or control.
This is a very attractive feature for many communities for which decentralized control is necessary.
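The merging step can be sketched as follows, assuming each FOAF file has already been parsed into a flat dictionary of property names and values. Records are treated as describing the same person when they share a `homepage` value; using the home page as the matching key is a choice made for this sketch, and the sample records are invented.

```python
from collections import defaultdict

def merge_foaf(records, key="homepage"):
    """Merge person records that share the same value for `key`.

    Each record maps a property name to a single string value and must
    contain `key`. The result maps each key value to a dict of
    property name -> set of all values seen for that person.
    """
    merged = {}
    for record in records:
        ident = record[key]
        person = merged.setdefault(ident, defaultdict(set))
        for prop, value in record.items():
            person[prop].add(value)
    return {ident: dict(props) for ident, props in merged.items()}

# Made-up records, as if read from three independently maintained files.
file_a = {"homepage": "http://example.org/~alice", "nick": "alice"}
file_b = {"homepage": "http://example.org/~alice", "nick": "al",
          "phone": "tel:+1-555-0100"}
file_c = {"homepage": "http://example.org/~bob", "nick": "bob"}

directory = merge_foaf([file_a, file_b, file_c])
for homepage, props in directory.items():
    print(homepage, props)
```

No single file has to be complete or authoritative: the combined directory is rebuilt from whatever files are reachable at the time.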
The SIOC initiative (Semantically-Interlinked Online Communities) aims to enable the integration of online community information. SIOC provides a Semantic Web ontology for representing data from the social web 2.0 in RDF. It has recently achieved significant adoption through its usage in a variety of commercial and open-source software applications, and is commonly used in combination with the FOAF vocabulary for expressing personal profile and social networking information.
SIOC enables usage scenarios for online community site data, and allows semantic applications to be built on top of existing social websites.
A list of SIOC data sources can be found on the SIOC “Enabled Sites” wiki page, or by downloading the export list from PingtheSemanticWeb.com.
PingtheSemanticWeb.com (PTSW) is a web service that archives the locations of recently created or updated RDF documents on the Web. When such a document is created or updated, its author can notify PTSW by pinging the service with the URL of the document.
PingtheSemanticWeb.com is used by crawlers and other types of software agents to learn when and where the most recently updated RDF documents can be found.
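The ping pattern described above can be modelled in a few lines. This is a toy in-memory stand-in, not the PingtheSemanticWeb.com API: the class name and its methods are hypothetical, and a real service would receive pings over HTTP and persist them.

```python
from datetime import datetime, timedelta, timezone

class PingRegistry:
    """Toy in-memory model of a ping service for RDF documents."""

    def __init__(self):
        self._last_ping = {}  # document URL -> time of the latest ping

    def ping(self, url, when=None):
        """Record that the document at `url` was created or updated."""
        self._last_ping[url] = when or datetime.now(timezone.utc)

    def updated_since(self, cutoff):
        """Return URLs pinged at or after `cutoff`, newest first."""
        recent = [(t, u) for u, t in self._last_ping.items() if t >= cutoff]
        return [u for t, u in sorted(recent, reverse=True)]

# Authors ping when they publish; a crawler later asks what has changed.
registry = PingRegistry()
start = datetime(2008, 1, 1, tzinfo=timezone.utc)
registry.ping("http://example.org/a.rdf", start)
registry.ping("http://example.org/b.rdf", start + timedelta(hours=2))
print(registry.updated_since(start + timedelta(hours=1)))
```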
SIMILE is a joint project conducted by the MIT Libraries and the MIT Computer Science and Artificial Intelligence Laboratory.
SIMILE seeks to enhance interoperability among digital assets, including schemas, ontologies, and metadata. A key challenge is that the collections which must interoperate are often distributed across individuals, communities, and organizations.
In addition, SIMILE wants to implement a digital asset dissemination architecture based upon web standards. The dissemination architecture will provide a mechanism to add useful views to digital objects, including metadata, schemas, and vocabularies.
Applications that have originated from the SIMILE project include:
Babel: Converts standard formats to Semantic Web formats
Fresnel: Fresnel is a vocabulary for displaying RDF. The prefix fresnel: is often used. The main goals are to help developers stop reinventing the wheel and to provide portable descriptions of resources that function similarly independent of the rendering browser, making it easy for users to visually reconcile what they see with what they already recognize, regardless of which software they use.
Longwell: Longwell mixes the flexibility of the RDF data model with the effectiveness of the faceted browsing UI paradigm and enables you to visualize and browse any arbitrarily complex RDF dataset, allowing you to build a user-friendly web site out of your data within minutes and without requiring any code at all.
PiggyBank: Piggy Bank is a Firefox extension that turns your browser into a mashup platform, by allowing you to extract data from different web sites and mix them together. Piggy Bank also allows you to store this extracted information locally for you to search later and to exchange at need the collected information with others.
RDFizers: Convert content into RDF format. Plug-ins are available for a variety of existing formats, including e-mail, bibliographies, and raw image files
Seek: Mozilla Firefox plug-in demo to view e-mail in RDF format
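The faceted browsing paradigm mentioned for Longwell can be sketched in a few lines: count how many items carry each value of a property (the facet), then narrow the collection by choosing values. This is a generic illustration, not Longwell's implementation; the items and property names are invented, and each dictionary stands in for the properties of one RDF resource.

```python
from collections import Counter

# Made-up items; each dict stands in for the properties of one resource.
ITEMS = [
    {"title": "Report 1", "type": "article", "year": "2006"},
    {"title": "Report 2", "type": "article", "year": "2007"},
    {"title": "Photo 1", "type": "image", "year": "2007"},
    {"title": "Untitled note", "type": "note"},
]

def facet_counts(items, facet):
    """Count the items carrying each value of `facet`, skipping items without it."""
    return Counter(item[facet] for item in items if facet in item)

def select(items, **constraints):
    """Keep only the items whose properties match every chosen facet value."""
    return [item for item in items
            if all(item.get(f) == v for f, v in constraints.items())]

print(facet_counts(ITEMS, "type"))
narrowed = select(ITEMS, year="2007")
print(facet_counts(narrowed, "type"))
```

After each selection the counts are recomputed on the narrowed set, which is what lets a user browse a dataset without knowing its schema in advance.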
Commercial Interests: Utilizing Existing Content
Freebase - Commercial program from Metaweb that is in the early stages of categorizing, tagging, and databasing existing content on the web (different from Google Base and Wikipedia in that it is compiling information from sources)
Leiki - Uses machine learning and algorithms to parse and categorize information for the purpose of personalization (goal: targeted advertising)
Twine: Utilizes a combination of RDF, OWL, and machine-learning techniques to gather information about an individual's interests
Powerset: Natural Language Processing (NLP) search engine for the web.