November | 2009 | Scimantica - Semantic Science

Semantic Web Tools and Applications for Life Sciences 2009 – A Personal Summary

November 24, 2009 2 Comments

So another SWAT4LS is behind us, this time wonderfully organised by Andrea Splendiani, Scott Marshall, Albert Burger, Adrian Paschke and Paolo Romano.

I have been back home in Cambridge for a couple of days now and have been asking myself whether there was an overall conclusion from the day – some overarching bottom line that one could take away and against which one could measure the talks at SWAT4LS2010 to see whether there has been progress or not. The programme consisted of a great mixture of longer keynotes, papers, “highlight posters” and highlight demonstrations, illustrating a wide range of activities at the interface of semantic web technology, computer science and biomedical research.

Topics at the workshop covered diverse areas, from the analysis of the relationship between HLA structure variation and disease, applications for maintaining patient records in clinical information systems and patient classification on the basis of semantic image annotations, to the use of semantics in chemo- and proteoinformatics and the prediction of drug-target interactions on the basis of sophisticated text mining, as well as games such as Onto-Frogger (though I must confess that I somehow missed the point of what that was all about).

So what were the take-home messages of the day? Here are a few points that stood out to me:

During his keynote, Alan Ruttenberg coined the dictum of “far too many smart people doing data integration”, which was subsequently taken up by a lot of the other speakers – an indication that most people seemed to agree with the notion that we still spend far too much time dealing with the “mechanics” of data – mashing it up and integrating it, rather than analysing and interpreting it.
During last year’s conference, it already became evident that a lot of scientific data is now coming online in a semantic form. The data avalanche has certainly continued and the sense of increased data availability, at least in the biosciences, has intensified. While chemistry has been lagging behind, data is becoming available here too. On the one hand, there are Egon’s sterling efforts with openmolecules.net and the data solubility project, on the other, there are big commercial entities like the RSC and ChemSpider. During the meeting, Barend Mons also announced that he had struck an agreement with the RSC/ChemSpider to integrate the content of ChemSpider into his Concept Wiki system. I will reserve judgement as to the usefulness and openness of this until it is further along. In any case, data is trickling out – even in chemistry.
Another thing that stood out to me – and I could be quite wrong in this interpretation, given that this was very much a research conference – was the fact that there were many proof-of-principle applications and demonstrators on show, but very few production systems that made use of semantic technologies at scale. A notable exception to this was the GoPubMed (and related) system demonstrated by Michael Schroeder, who showed how sophisticated text mining can be used not only to find links between seemingly unrelated concepts in the literature, but can also assist in ontology creation and the prediction of drug-target interactions.

Overall, many good ideas, but, as seems to be the case with all of the semantic web, no killer application as yet – and at every semweb conference I go to we seem to be scrabbling around for one of those. I wonder if there will be one and what it will be.

Thanks to everybody for a good day. It was nice to see some old friends again and make some new ones. Duncan Hull has also written up some notes on the day – so go and read his perspective. I, for one, am looking forward to SWAT4LS2010.

Filed under Uncategorized Tagged with data, GoPubMed, Knowledge Management, Knowledge Representation, semantic web, Technology, Text Mining

SWAT4LS2009 – Barend Mons: The meta-analysed semantic web, getting rid of ambiguity and redundancy

November 20, 2009 1 Comment

Introducing Concept Wiki – a semantic wiki and insulting his audience repeatedly.

Problems with getting the community to do annotation:

everybody wants structured data, but nobody wants to do structured data entry. Not working.
Everybody likes free text and cut and paste.

Now shows suggestion of ontology terms in authoring tools for introduction of structure in unstructured data.

Now talking about redundancy? Is it a problem? His point:

no reviewer would accept the exact same paper twice let alone several times
But same assertions are published over and over

Mentions deposit of ChemSpider Content into concept wiki.

Oh dear – hopeless confusion between names, people, identifiers etc…..they are all “concepts” according to Barend Mons.
The “essence of a nanopublication” is an annotated triple…i.e. an assertion together with metadata about it (provenance, time etc…)
Now points out that human language grammar is kind of similar to triples….subject predicate object…
An assertion should only be accepted if it has value and advances human knowledge. The mind boggles….who decides what is interesting when….
Triples vs “smart triples”: apparently “smart triples” are curated/observed/hypothetical.
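
To make the idea of an annotated triple a little more concrete, here is a minimal Python sketch. It is not Concept Wiki's data model: the class names, the field names, the `ex:` identifiers and the `group_by_assertion` helper are all assumptions made for illustration. It pairs an assertion with provenance metadata and shows how repeated publications of the same assertion could collapse into one statement backed by several sources.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Triple:
    """A bare assertion: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


@dataclass(frozen=True)
class Nanopublication:
    """An assertion plus metadata about it (hypothetical field names)."""
    assertion: Triple
    source: str               # provenance, e.g. a paper or database identifier
    asserted_on: date         # when the assertion was made
    status: str = "observed"  # curated / observed / hypothetical


def group_by_assertion(nanopubs):
    """Map each distinct assertion to the nanopublications that state it."""
    grouped = {}
    for nanopub in nanopubs:
        grouped.setdefault(nanopub.assertion, []).append(nanopub)
    return grouped


# Made-up example data: the same assertion reported by two sources.
claim = Triple("ex:GeneA", "ex:associatedWith", "ex:DiseaseB")
nanopubs = [
    Nanopublication(claim, "ex:paper1", date(2008, 5, 1)),
    Nanopublication(claim, "ex:paper2", date(2009, 3, 12), status="curated"),
]
grouped = group_by_assertion(nanopubs)
print(len(grouped), len(grouped[claim]))
```

Grouping by the frozen `Triple` means the redundancy shows up as a list of sources behind one assertion, rather than as duplicate statements.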

Now shows some screenshots of use cases.

Filed under Uncategorized

SWAT4LS2009 – A.L. Lamprecht: Semantics-Based Composition of EMBOSS Services with Bio-jETI

November 20, 2009 Leave a comment

Bio-jETI: framework for model-based graphical design, execution and management of bioinformatics processes

PROPHETS Plugin: visual semantic domain modeling, loose specification within the process model, non-formal specification of constraints using natural language templates, automatic generation of model checking formulae.

Filed under Uncategorized

SWAT4LS2009 – James Eales: Mining Semantic Networks of Bioinformatics eResources from Literature

November 20, 2009 Leave a comment

eResource Annotations could help with

making better choices: which resource is best?
which is available?
reduce curation
help with service discovery

Approach: link bioinformatics resources using semantic descriptors generated from text mining….head terms for services can be used to assign services to types..e.g. applications, data sources etc.

Filed under informatics, semantic web, Text Mining Tagged with bioinformatics, Data mining, Knowledge Management, Knowledge Representation

SWAT4LS2009 – Michael Schroeder: Prediction of Drug Target Interactions from Literature by Context Similarity

November 20, 2009 Leave a comment

Typical researcher spends 12.4 hours a week searching for information. Why not use Google? ‘Cause Google is not semantic.

GoPubMed – filters PubMed contents against all the terms in the Gene Ontology. If you use simple categorisation for information retrieval, you potentially increase the search burden due to compartmentalisation. However, it works the other way round too…useful filtering.

Showing some examples of faceted browsing of PubMed content and systematic drilldown into search results. Not easy to blog, but literature exploration in this way is always fascinating. Examples include the analysis of research trends, networks of collaborators etc. A new tool in GoPubMed also allows the discovery of indirect or inferred links.

Have developed a similar system for the web: GoWeb (works on the top Yahoo search results).

Remarks on Ontology Generation: have developed a plugin for OBO Edit…search for term and plugin makes suggestions for terms that might be included in new ontologies. Points out terms in existing ontologies. Also helps with the generation of definitions for terms…wow this is extremely useful in SO many ways….

Now let’s talk about drugs and targets….

Try and mine for gene mentions in text…find a gene term and then use context to decide what it is we are talking about. Once a gene has been found, look for statistically significant co-occurrences. The results have been made available in GoGene. Again can do bibliometric trend analysis – genes are ranked by community interest.
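
The talk did not say which statistical test is used, so the following is only a sketch of one common way to score a co-occurrence: a hypergeometric upper-tail probability. The function name and the counts are made up for illustration and are not taken from GoGene.

```python
from math import comb


def cooccurrence_p_value(n_docs, n_gene, n_term, n_both):
    """Probability of seeing at least n_both co-mentions by chance.

    Hypergeometric upper tail: n_docs documents in total, n_gene mention
    the gene, n_term mention the term, n_both mention both.
    """
    upper = min(n_gene, n_term)
    favourable = sum(
        comb(n_gene, k) * comb(n_docs - n_gene, n_term - k)
        for k in range(n_both, upper + 1)
    )
    return favourable / comb(n_docs, n_term)


# Made-up counts: 1000 abstracts, 40 mention the gene, 50 the term, 12 both.
p = cooccurrence_p_value(1000, 40, 50, 12)
print(f"p = {p:.3g}")
```

With these counts only about two co-mentions would be expected by chance, so twelve gives a very small p-value. A real pipeline would also need to correct for testing many gene/term pairs at once.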

From drugs to genes..what is the link between a gene and a drug using context profiles: what are the disease terms related to a given drug…then to genes.

Gotta stop blogging…enjoying this talk far too much…….

Filed under informatics, ontology Tagged with Gene Ontology, Information retrieval, PubMed, Research

SWAT4LS2009 – Linking Open Drug Data to Cheminformatics and Proteochemometrics

November 20, 2009 1 Comment

(Notes from the presentation as it happens)

Knowledge is not uni or bivariate, but we think of it as such: this leads to information loss.

Naming things: showing example of a trivial name, an IUPAC systematic name and an InChI and points out that these have different information content.

Points out scaling problem: drug discovery is multivariate and happens in a space of approx 10¹⁶ molecules (all molecules that are feasible and thought to be drug-like). Information loss occurs as you traverse this space backwards and forwards.

Now talks about molecular information in RDF: http://rdf.openmolecules.net for the provision of dereferenceable URIs for molecules….and plugging the Chemistry Development Kit (CDK) as a means for converting between multiple representations of a molecule. Now moves on to Bioclipse as an integrating tool that allows chemical data transformations and the tracking of why these transformations occur (version-controllable scripts to drive Bioclipse).

RDF extension to Bioclipse: local RDF storage, read/write RDF, run SPARQL queries and extract RDF from XHTML/RDFa.

Now shows an example of the expression of the CDK data model using ontologies but no details. Brief mention of his recent descriptor ontology.

Filed under chemistry, data, informatics, ontology, RDF, semantic web Tagged with Chemistry Development Kit, data, Metadata, Resource Description Framework

SWAT4LS2009 – Sonja Zillner: Towards the Ontology Based Classification of Lymphoma Patients using Semantic Image Annotation

November 20, 2009 1 Comment

(Again, these are notes as the talk happens)

This has to do with the Siemens Project Theseus Medico – Semantic Medical Image Understanding (towards flexible and scalable access to medical images)

Different images from many different sources: e.g. X-ray, MRI etc…use this and combine with treatment plans, patient data etc and integrate with external knowledge sources.

Example Clinical Query: “Show me the CT scans and records of patients with a lymph node enlargement in the neck area” – at the moment a query over several disjoint systems is required

Current Motivation: generic and flexible understanding of images is missing
Final Goal: Enhance medical image annotations by integrating clinical data with images
This talk: introduce a formal classification system for patients (ontological model)

Used Knowledge Sources:

Ann-Arbor Staging System – particularly suitable for lymphoma patients
RadLex
Foundational Model of Anatomy
Semantic Image Annotation

Requirements of the Ontological Model

Capture the rationale of the Ann Arbor Staging system
Integrate external ontologies
Ontology must describe the patient record

Now showing an example axiomatisation for the counting and location of lymphatic occurrences and discusses problems relating to extending existing ontologies….

Now talking about annotating patient records: typical problems are abbreviations, clinical codes, fragments of sentences etc…difficult for NLP people to deal with….

Now showing detailed patient example where application of their classification system led to reclassification of patient in terms of staging system.

Filed under informatics, ontology, Uncategorized Tagged with Ann Arbor Staging, Knowledge Management, Knowledge Representation, Magnetic resonance imaging, Medicine, Ontologies, ontology, X-ray

SWAT4LS: Demo Preview NeuroLex.org.

November 20, 2009 1 Comment

online wiki-based ontology for neuroscience
built on top of MediaWiki
domain scientists can make contributions and a curation process turns this into formal representations

Filed under Uncategorized

SWAT4LS2009 – Matthias Loebe: TIM A semantic web application for the specification of metadata items in clinical research

November 20, 2009 1 Comment

(Again, these are live notes as bullet points.)

Problems with the Specification of Clinical Trials

Development of trial protocol
Preparation of study centres
Registration of patients etc…..
Case report Forms capture data at different time points (lab results, therapy outcomes, treatment history etc..) relevant for answering clinical questions
Misconceptions and misinterpretations of data occur frequently either through underspecification or lack of metadata

Benefits of Detailing and Reusing Items

Efficiency
Data Quality
Metaanalysis

Requirements for an Item Data Model

Expressiveness – Items consist of subitems
Adaptable system of rules – validity and consistency checking
Supporting context
Providing views – facets. Different information requirements for different types of users.
Exploiting conceptual relations
Mapping to terminologies

Architecture of Trial Item Manager

Application behaviour is ontology driven…
working data stored as rdf in separate model
multiple rdf models (combined, raw, inferred)

Showing the specification of various trial items in rdf…..and examples of the ontology driving the app…

Advantages of the Semantic Approach

open linked data, referencability
extensible rules
user guidance using semantics but invisible to user
rapid response to change
import external item sets
personalisation on a per-user/user type basis
navigation
semantic search

Caveats:

Open world reasoning sometimes gives unexpected results…..;-)
No unique name assumption
Performance
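
The “no unique name assumption” caveat can be illustrated with a small sketch: two identifiers may refer to the same item unless stated otherwise, so a naive count of names overstates what is known. The function below is a hypothetical illustration (not TIM's code, and the identifiers are invented). It merges names connected by known same-as statements and returns an upper bound on the number of distinct items. It assumes every name in `same_as_pairs` also appears in `individuals`.

```python
def max_distinct(individuals, same_as_pairs):
    """Upper bound on the number of distinct things behind a set of names.

    Without a unique name assumption, two names may denote the same thing.
    Known same-as statements merge names; unmerged names may still coincide.
    """
    parent = {name: name for name in individuals}

    def find(name):
        # Follow parent links to the representative, compressing the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for left, right in same_as_pairs:
        parent[find(left)] = find(right)
    return len({find(name) for name in individuals})


# Made-up item identifiers.
items = ["ex:item1", "ex:item2", "ex:item3"]
print(max_distinct(items, [("ex:item1", "ex:item2")]))
```

An open-world reasoner will not conclude that the remaining names are different unless told so explicitly, which is one way cardinality checks can give unexpected results.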

Filed under Uncategorized Tagged with Clinical trial, data, Linked Data, Metadata, ontology

SWAT4LS2009 – Keynote Alan Ruttenberg: Semantic Web Technology to Support Studying the Relation of HLA Structure Variation to Disease

November 20, 2009 2 Comments

(These are live-blogging notes from Alan’s keynote…so don’t expect any coherent text….use them as bullet points to follow the gist of the argument.)

The Science Commons:

a project of the Creative Commons
6 people
Science Commons specializes CC to science
information discovery and re-use
establish legal clarity around data sharing and encourage automated attribution and provenance

Semantic Web for biologists because it maximizes the value of scientific work by removing repeat experimentation.

ImmPort Semantic Integration Feasibility Project

ImmPort is an immunology database and analysis portal
Goals: metaanalysis
Question: how can ontology help data integration for data from many sources

Using semantics to help integrate sequence features of HLA with disorders
Challenges:

Curation of sequence features
Linking to disorders
Associating allele sequences with peptide structures with nomenclature with secondary structure with human phenotype etc etc etc…

Talks about elements of representation

pdb structures translated into ontology-based representations
canonical MHC molecule instances constructed from IMGT
relate each residue in pdb to the canonical residue if exists
use existing ontologies
contact points between peptide and other chains computed using JMOL following IMGT. Represented as relation between residue instances.
Structural features have fiat parts

Connecting Allele Names to Disease Names

use papers as join factors: papers mention both disease and allele – noisy
use regex and rewrites applied to titles and abstracts to fish out links between diseases and alleles
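
As a rough idea of what “regex and rewrites” over titles and abstracts might look like, here is a small sketch. The pattern, the three-entry disease lexicon and the function names are assumptions for illustration, not the rules used in the project. The rewrite also conflates serological names with allele names, which real curation would have to keep apart.

```python
import re

# Simplified, hypothetical pattern for names such as "HLA-B27" or "HLA-DRB1*0401".
ALLELE_RE = re.compile(r"\bHLA[- ]?([A-Z]{1,3}\d?)\*?(\d{2,4})\b")

# Tiny made-up disease lexicon; a real one would come from an ontology.
DISEASES = ("ankylosing spondylitis", "rheumatoid arthritis", "coeliac disease")


def normalise_allele(match):
    """Rewrite spelling variants to one form, e.g. 'HLA B27' -> 'HLA-B*27'."""
    locus, digits = match.groups()
    return f"HLA-{locus}*{digits}"


def extract_links(text):
    """Return (allele, disease) pairs co-mentioned in one title or abstract."""
    alleles = {normalise_allele(m) for m in ALLELE_RE.finditer(text)}
    lowered = text.lower()
    diseases = {d for d in DISEASES if d in lowered}
    return {(allele, disease) for allele in alleles for disease in diseases}


print(extract_links("HLA B27 and ankylosing spondylitis: a review"))
```

Plain co-mention is exactly the “noisy” join described above: a title that names an allele and a disease need not assert any association between them.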

Correspondence of molecules with allele structures is difficult.

use BLAST to find closest allele match between pdb and allele sequence
every pdb and allele residue has URI
relate matching molecules
relate each allele residue to the canonical allele
annotate various residues with various coordinate systems

This creates massive map that can be navigated and queried. Example queries:

What autoimmune diseases can be indexed against a given allele?
What are the variant residues at a position?
Classification of amino acids
Show alleles perturbed at contacts of 1AGB

Summary of Progress to Date:
Elements of Approach in Place: Structure, Variation, transfer of annotation via alignment, information extraction from literature etc…

Nuts and Bolts:

Primary source
Local copy of source
Scripts transform to RDF
Exports RDF Bundles
Get selected RDF Bundles and load into triple store

Parsers generate in-memory structures (Python, Java)
Template files are instructions to format these into OWL
Modeling is iteratively refined by editing templates
RDF loaded into Neurocommons, some amount of reasoning
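
The parser-plus-template step might be sketched as below. This is not the Neurocommons tooling: the template text, the `example.org` URIs, the `render_triples` helper and the residue values are all invented for illustration, and the sketch emits plain N-Triples lines rather than OWL.

```python
from string import Template

# Hypothetical template; the example.org URIs are placeholders.
RESIDUE_TEMPLATE = Template(
    '<$base/pdb/$pdb_id/chain/$chain/residue/$position> '
    '<$base/vocab/hasAminoAcid> "$amino_acid" .'
)


def render_triples(records, base="http://example.org"):
    """Turn parsed in-memory records into N-Triples lines via the template."""
    return [RESIDUE_TEMPLATE.substitute(base=base, **record) for record in records]


# Made-up parser output; the residue values are illustrative only.
records = [
    {"pdb_id": "1AGB", "chain": "A", "position": 45, "amino_acid": "MET"},
]
for line in render_triples(records):
    print(line)
```

Keeping the modelling in the template rather than in the parser is what would make iterative refinement cheap: changing the representation means editing the template text, not the code.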

RDFHerd package management for data

neurocommons.org/bundles

Can we reduce the burden of data integration?

Too many people are doing data integration – wasting effort
Use web as platform
Too many ontologies…here’s the social pressure again

Challenges

have lawyers bless every bit of data integration
reasoning over triple stores
SPARQL over HTTP
Understand and exploit ontology and reasoning
Grow a software ecosystem like Firefox

Filed under data, informatics, ontology, OWL, semantic web, Uncategorized Tagged with Data integration, Knowledge Management, Knowledge Representation, Ontologies, ontology, Resource Description Framework, semantic web, SPARQL
