Visualisation of Ontologies and Large Scale Graphs

For a whole number of reasons, I am currently looking into the visualisation of large-scale graphs and ontologies and to that end, I have made some notes concerning tools and concepts which might be useful for others. Here they are:

Visualisation by Node-Link and Tree

jOWL: jQuery Plugin for the navigation and visualisation of OWL ontologies and RDFS documents. Visualisations mainly as trees, navigation bars.

OntoViz: Plugin into Protege…at the moment supports Protege 3.4 and doesn’t seem to work with Protege 4.

IsaViz: Much the same as OntoViz really. Last stable version 2004 and does not seem to see active development.

NeOn Toolkit: The Neon toolkit also has some visualisation capability, but not independent of the editor. Under active development with a growing user base.

OntoTrack: OntoTrack is a graphical OWL editor and as such has visualisation capabilities. Meager though and it does not seem to be supported or developed anymore either…the current version seems about 5 years old.

Cone Trees: Cone trees are three-dimensional extensions of 2D tree structures and have been designed to allow a greater amount of information to be visualised and navigated. I have not found any software for download at the moment, but the idea is so interesting that we should bear it in mind. Examples are here, here and the key reference is Robertson, George G., Mackinlay, Jock D. and Card, Stuart K., Cone Trees: animated 3D visualizations of hierarchical information, CHI ’91: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 1991, ISBN 0-89791-383-3, pp. 189-194. (DOI here)

PhyloWidget: PhyloWidget is software for the visualisation of phylogenetic trees, but should be repurposable for ontology trees. Javascript – so appropriate for websites. Student project as part of the Phyloinformatics Summer of Code 2007.

The JavaScript Information Visualization Toolkit: Extremely pretty JS toolkit for the visualisation of graphs etc…..Dynamic and interactive visualisations too…just pretty. Have spent some time hacking with it and I am becoming a fan.

Welkin: Standalone application for the visualisation of RDF graphs. Allows dynamic filtering, colour coding of resources etc…

Three-Dimensional Visualisation

Ontosphere3D: Visualisation of ontologies on 3D spheres. Does not seem to be supported anymore and requires Java 3D, which is just a bad nightmare in itself.

Cone Trees (see above) with their extension of Disc Trees (for an example of disc trees, see here).

3D Hyperbolic Tree as exemplified by the Walrus software. Originally developed for website visualisation, results in stunning images. Not under active development anymore, but source code available for download.

Cytoscape: The 1000 pound gorilla in the room of large-scale graph visualization. There are several plugins available for interaction with the Gene Ontology, such as BiNGO and ClueGO. Both tools treat the ontologies as annotation rather than as a knowledge base in their own right and can be used to identify GO terms that are overrepresented in a cluster/network. For visualisation of the ontologies themselves, there is the RDFScape plugin.
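Over-representation of a term in a cluster is commonly scored with a hypergeometric test. Here is a minimal sketch of that calculation in Python, using only the standard library. The function name and the numbers in the usage note are my own illustration and are not taken from BiNGO or ClueGO, which may use different statistics and corrections:

```python
from math import comb


def enrichment_p_value(population, annotated, cluster, hits):
    """Probability of seeing at least `hits` annotated genes in a cluster.

    population: number of genes in the background set
    annotated:  number of background genes carrying the term
    cluster:    number of genes in the cluster under study
    hits:       number of cluster genes carrying the term
    """
    upper = min(annotated, cluster)
    total = comb(population, cluster)
    # Sum the upper tail of the hypergeometric distribution.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, cluster - k)
        for k in range(hits, upper + 1)
    )
    return tail / total
```

For example, enrichment_p_value(10, 3, 3, 3) asks how likely it is that a cluster of 3 genes drawn from 10 contains all 3 annotated ones. A real analysis would also correct for testing many terms at once.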

Zoomable Visualisations

Jambalaya – Protege Plugin, but can also run as a browser applet. Uses Shrimp to visualise class hierarchies in ontologies and arrows between boxes to represent relationships.

CropCircles (link is to the paper describing it): CropCircles have been implemented in the SWOOP ontology editor which is not under active development anymore, but where the source code is available.

Information Landscapes – again, no software, just papers.

SWAT4LS2009 – Sonja Zillner: Towards the Ontology Based Classification of Lymphoma Patients using Semantic Image Annotation

(Again, these are notes as the talk happens)

This has to do with the Siemens Project Theseus Medico – Semantic Medical Image Understanding (towards flexible and scalable access to medical images)

Different images from many different sources: e.g. X-ray, MRI etc…use this and combine with treatment plans, patient data etc and integrate with external knowledge sources.

Example Clinical Query: “Show me the CT scans and records of patients with a lymph node enlargement in the neck area” – at the moment a query over several disjoint systems is required

Current Motivation: generic and flexible understanding of images is missing
Final Goal: Enhance medical image annotations by integrating clinical data with images
This talk: introduce a formal classification system for patients (ontological model)

Used Knowledge Sources:

Requirements of the Ontological Model

Now showing an example axiomatisation for the counting and location of lymphatic occurrences and discusses problems relating to extending existing ontologies….

Now talking about annotating patient records: typical problems are abbreviations, clinical codes, fragments of sentences etc…difficult for NLP people to deal with….

Now showing detailed patient example where application of their classification system led to reclassification of patient in terms of staging system.

SWAT4LS2009 – Keynote Alan Ruttenberg: Semantic Web Technology to Support Studying the Relation of HLA Structure Variation to Disease

(These are live-blogging notes from Alan’s keynote…so don’t expect any coherent text….use them as bullet points to follow the gist of the argument.)

The Science Commons:

  • a project of the Creative Commons
  • 6 people
  • SC specializes CC to science
  • information discovery and re-use
  • establish legal clarity around data sharing and encourage automated attribution and provenance

Semantic Web for biologists, because it maximizes the value of scientific work by removing repeat experimentation.

ImmPort Semantic Integration Feasibility Project

  • ImmPort is an immunology database and analysis portal
  • Goals: meta-analysis
  • Question: how can ontology help data integration for data from many sources?

Using semantics to help integrate sequence features of HLA with disorders
Challenges:

  • Curation of sequence features
  • Linking to disorders
  • Associating allele sequences with peptide structures with nomenclature with secondary structure with human phenotype etc etc etc…

Talks about elements of representation

  • pdb structures translated into ontology-based representations
  • canonical MHC molecule instances constructed from IMGT
  • relate each residue in pdb to the canonical residue if exists
  • use existing ontologies
  • contact points between peptide and other chains computed using JMOL following IMGT. Represented as relation between residue instances.
  • Structural features have fiat parts

Connecting Allele Names to Disease Names

  • use papers as join factors: papers mention both disease and allele – noisy
  • use regex and rewrites applied to titles and abstracts to fish out links between diseases and alleles
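To make the regex idea concrete, here is a rough sketch in Python of fishing allele/disease pairs out of a title or abstract. Everything in it is an assumption on my part: the allele pattern, the tiny disease vocabulary and the function name are illustrative and are not the expressions used in the project described in the talk.

```python
import re

# Hypothetical pattern for allele names such as "HLA-B*27" or "HLA-DRB1*04:01".
ALLELE_PATTERN = re.compile(r"\bHLA-[A-Z]+[0-9]*\*[0-9]{2}(?::[0-9]{2,3})*")

# Hypothetical, deliberately tiny disease vocabulary.
DISEASE_TERMS = (
    "ankylosing spondylitis",
    "rheumatoid arthritis",
    "type 1 diabetes",
)


def candidate_links(text):
    """Return (allele, disease) pairs that co-occur in one piece of text."""
    lowered = text.lower()
    alleles = sorted(set(ALLELE_PATTERN.findall(text)))
    diseases = [term for term in DISEASE_TERMS if term in lowered]
    # Co-occurrence only: this is exactly the noisy join described above.
    return [(allele, disease) for allele in alleles for disease in diseases]
```

Called on a made-up title such as "HLA-B*27 and susceptibility to ankylosing spondylitis", this pairs the allele with the disease. Co-occurrence says nothing about the direction or strength of the association, which is why the approach is noisy.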

Correspondence of molecules with allele structures is difficult.

  • use BLAST to find closest allele match between pdb and allele sequence
  • every pdb and allele residue has URI
  • relate matching molecules
  • relate each allele residue to the canonical allele
  • annotate various residues with various coordinate systems

This creates massive map that can be navigated and queried. Example queries:

  • What autoimmune diseases can be indexed against a given allele?
  • What are the variant residues at a position?
  • Classification of amino acids
  • Show alleles perturbed at contacts of 1AGB

Summary of Progress to Date:
Elements of Approach in Place: Structure, Variation, transfer of annotation via alignment, information extraction from literature etc…

Nuts and Bolts:

  • Primary source
  • Local copy of source
  • Scripts transform to RDF
  • Exports RDF Bundles
  • Get selected RDF Bundles and load into triple store
  • Parsers generate in memory structures (python, java)
  • Template files are instructions to format these into OWL
  • Modeling is iteratively refined by editing templates
  • RDF loaded into Neurocommons, some amount of reasoning
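The parser-plus-template step in the list above can be sketched roughly as follows. This is my own illustration in Python and not the project's code: the template text, the record fields and the example.org namespace are all hypothetical. The point is only that the modelling lives in the template, so refining the model means editing the template rather than the parser.

```python
from string import Template

PREFIXES = (
    "@prefix owl: <http://www.w3.org/2002/07/owl#> .\n"
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
)

# Hypothetical template: one OWL named individual per parsed record (Turtle syntax).
INDIVIDUAL_TEMPLATE = Template(
    "<${base}${identifier}> a owl:NamedIndividual , <${base}${cls}> ;\n"
    '    rdfs:label "${label}" .\n'
)


def escape_literal(text):
    """Escape backslashes and double quotes for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def render_bundle(records, base="http://example.org/hla/"):
    """Format in-memory records (dicts produced by a parser) as Turtle."""
    body = "".join(
        INDIVIDUAL_TEMPLATE.substitute(
            base=base,
            identifier=record["identifier"],
            cls=record["cls"],
            label=escape_literal(record["label"]),
        )
        for record in records
    )
    return PREFIXES + "\n" + body
```

A record such as {"identifier": "r1", "cls": "Residue", "label": "residue 1"} becomes one individual typed with the class named in the record. The resulting text could then be loaded into whatever triple store is in use.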

RDFHerd package management for data

neurocommons.org/bundles

Can we reduce the burden of data integration?

  • Too many people are doing data integration – wasting effort
  • Use web as platform
  • Too many ontologies…here’s the social pressure again

Challenges

  • have lawyers bless every bit of data integration
  • reasoning over triple stores
  • SPARQL over HTTP
  • Understand and exploit ontology and reasoning
  • Grow a software ecosystem like Firefox

Licences for Ontologies

One of the things that I have been grappling with for quite some time is the whole notion of licences for ontologies. Of course, neither I – nor anybody else for that matter, should have to worry about this. But the world is the way it is and so the question is: what would an appropriate licence for an ontology be? The answer to that question would mainly depend on what an ontology actually is. Is it a piece of software? Is it a database? A structured document (whatever that means in the context of licensing)?

I have spent quite some time talking to my colleagues about this and we haven’t been able to come up with a satisfactory answer. Even emailing the good folks at the Open Knowledge Foundation did not elicit a response. Now, it seems that the Science Commons have made an attempt to provide some answers on their website.

They state that whether an ontology is protected by copyright law will mainly depend on whether the ontology “contains a sufficient degree of creative expression” or whether it draws entirely on fact. In the latter case, it might not be protected. Now such a statement in itself is intriguing – in the communities in which I and many of the Science Commons people tend to spend most of our time, ontologies are usually understood to be representational artefacts, “whose representational units are intended to designate universals in reality and the relations between them.” Just how much “creative expression” that would allow is an interesting debate in itself, which is probably best had in the pub. But I digress.

Science Commons then goes on to quote some legal precedent in which US courts have upheld copyright in medical ontologies. So really, we don’t know. Science Commons then counsels “pre-emptive” licensing: if in doubt, slap a Creative Commons licence on your ontology (CC0 is explicitly recommended) – if it is later found that copyright cannot subsist in ontologies and that your licence is therefore invalid, you haven’t lost anything, but if it turns out that copyright does indeed subsist in an/your ontology, your bottom is covered. Small surprise, too, that the Science Commons would wish to promote the licences of their sister organisation the Creative Commons.

Again, I am not convinced that Creative Commons Licences are an appropriate form of licence for ontologies any more than I am convinced that the GPL licence attached to ChemAxiom is an entirely appropriate licence for an ontology. I would be interested in what the OKF experts have to say about this. The bottom line, for now at least, seems to be that we just won’t know until someone does a lot of deep thinking or it will be tested in court.

Any comments and opinions would be extremely welcome!

Hello from Hinxton

So in my last post I pretty much said good-bye to the Unilever Centre and the people there and now it is time for a hello – a hello to a new job. I have recently joined the Department of Genetics and the group of Prof Ashburner as a Research Associate. While I am formally employed by the university, I will, however, spend most of my time at the European Bioinformatics Institute in the group of Christoph Steinbeck.

My remit here will be to continue to develop chemical ontology and in particular to help, together with my colleagues and the ChEBI user community, to put the ChEBI ontology onto a “formal” footing and to align it with the upper ontology used by the OBO Foundry ontologies. I will blog more about this as the story develops – however, for now, I am very excited about this new opportunity. I have a great set of new colleagues (Duncan Hull has also just joined the ChEBI team and has blogged about it) both in the ChEBI group as well as in the wider EBI community and there is a community of people here that believe in the value of this type of work. So I am very much looking forward to helping create some exciting ontology and resources of value to the chemical and biological community.

As I was walking across the Genome campus this morning, I couldn’t help but be struck by its beauty – here are some pictures I shot with my mobile phone:

Hinxton High Street - On the way to the Genome Campus

Genome Campus - By Hinxton Hall

Just a quick note from the International Conference on Biomedical Ontology

I am normally pretty noisy these days when it comes to blogging or tweeting conferences….but haven’t produced anything for ICBO so far. This has mainly to do with the fact that I am far too busy learning, thinking and absorbing people’s ideas and yet again realising just how far ahead biology/biomedicine is in thinking how to deal with data properly. In any case, all I wanted to say was that ICBO has its own friendfeed group where Robert Hoehndorf and others are doing a sterling job documenting the conference and discussing what is said during the tutorial sessions that are currently going on. The friendfeed page is here:

http://friendfeed.com/icbo

So do read along if you want to follow what is going on from afar!

ChemAxiom: An Ontology for Chemistry 2. The Set-Up

Now that I have introduced at least some of the motivation behind ChemAxiom, let me outline some of the mechanics.

ChemAxiom is a collective term for a set of ontologies, all of which make a start at describing subdomains within chemistry. The ontology modules are independent and self-contained and can (largely) be developed separately and concurrently. Although they are independent, they are interoperable and integrated via a common upper ontology – in the case of ChemAxiom, we have chosen the Basic Formal Ontology (BFO). I will blog the reasons for this choice in the next post.

The ontologies are currently in various stages of axiomatisation depending on how long we have been working on them and how much we have had a chance to play – so if there are axioms that are missing and that you think should be there, or if you agree/disagree with some of our design decisions, please let us know. In any case, the discussion has already started with some helpful comments over on the Google Group. Let me describe the various modules in greater detail:

The Reasons for Modularity: When developing ontologies, it is always tempting to develop the ueber-McDaddy-ontology-of-everything, because, of course, ontology development is, by definition, never done: we always need more than we have – more terms, more axioms etc. Very quickly, this can result in monstrously large and virtually unmaintainable constructs. Modularisation has, from our perspective, the advantage of (a) smaller and more manageable ontologies, (b) ontologies which are easier to maintain, (c) ontologies which can be developed in parallel or orthogonally and subsequently integrated using either a common upper ontology or mapping/rules etc. Furthermore, if refactoring of ontologies is necessary during the development process, this is also facilitated by modularity: changes in one module are less likely to force changes in another module.

The General Use Case: One of the things we are particularly interested in here in Cambridge is the extraction of chemical entities and data from text, and Peter Corbett’s OSCAR is now fairly well established within the chemical informatics community. Our text sources vary widely, and can range from standard chemical papers to theses, blogs and Wikipedia pages. To give you an impression of the types of data we are talking about, here is an example, Wikipedia’s infobox for benzene (somewhat truncated):

 

[Image: Wikipedia infobox for benzene]

So we have to deal with names, identifiers of various types, physico-chemical property data as well as the corresponding metadata (e.g. measurement pressures, measurement temperatures etc.), and chemical structure (InChI, SMILES). Our ontologies should enable us to generate RDF that allows us to hold this data – the ontology here serves as a schema. While we are interested in reasoning/using reasoners for the purposes of (retrospective) typing (again, I will explain what I mean by that in subsequent blog posts), applying ontologies to the description of chemical data is our first use-case.
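To give a feel for what "holding this data" might look like, here is a small sketch in Python that writes a few benzene facts as N-Triples, with the boiling point modelled as its own measurement node so that the pressure can hang off it. Please treat all of it as an assumption: the example.org namespace and every property and class name are made up for illustration and are not ChemAxiom terms, and the values are typical textbook ones rather than a copy of the infobox.

```python
EX = "http://example.org/chem/"  # hypothetical namespace, not a ChemAxiom IRI


def iri(local_name):
    return f"<{EX}{local_name}>"


def literal(value):
    """Render a value as a plain string literal, escaping backslashes and quotes."""
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def describe_benzene():
    benzene = iri("benzene")
    measurement = iri("benzene_boiling_point_1")
    return [
        (benzene, iri("hasName"), literal("benzene")),
        (benzene, iri("hasSMILES"), literal("c1ccccc1")),
        (benzene, iri("hasInChI"), literal("InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H")),
        # The measurement is a node of its own, so metadata can be attached to it.
        (benzene, iri("hasMeasurement"), measurement),
        (measurement, iri("ofProperty"), iri("BoilingPoint")),
        (measurement, iri("hasValue"), literal(80.1)),
        (measurement, iri("hasUnit"), literal("degree Celsius")),
        (measurement, iri("measuredAtPressure"), literal("101.325 kPa")),
    ]


def to_ntriples(triples):
    return "".join(f"{s} {p} {o} .\n" for s, p, o in triples)
```

Printing to_ntriples(describe_benzene()) gives one statement per line. In a real setting the made-up property names would be replaced by terms from the ontology acting as the schema.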

With all of that said, let me provide a quick summary of the modules:

Chemistry Domain Ontology – ChemAxiomDomain. ChemAxiomDomain is the first module in the set. It is currently a small ontology, which clarifies some fundamental relationships in the chemistry domain. Key concepts in this ontology are “ChemicalElement”, “ChemicalSpecies” and “MolecularEntity” as well as “Role”. ChemAxiomDomain clarifies the relationships between these terms (see my previous blog post) and also deals with identifiers etc. Chemical roles too are important: while chemical entities may be or act as nucleophiles, acids, solvents etc. some of the time, they do not have these roles all of the time – roles are realisable entities and ChemAxiomDomain provides a mechanism for dealing with that. There are few other high-level domain concepts in there at the moment, though obviously we are looking to expand as and when the need arises and use-cases are provided. I will blog some details in a subsequent blog post.

Properties Ontology – ChemAxiomProp. ChemAxiomProp is an ontology of over 150 chemical and materials properties, together with a first set of definitions and symbols (where available and appropriate) and some axioms for typing of properties. Again, details will follow in a subsequent blog post.

Measurement Techniques – ChemAxiomMetrology. This is an ontology of over 200 measurement techniques and also contains a list of instrument parts and axioms for typing of measurement techniques. It does not currently include information about minimum information requirements for measurement techniques (e.g. the measurement of a boiling point also requires a measurement of pressure) and other metadata, but this will be added at a later stage. Again, a detailed blog-post will follow.

ChemAxiomPoly and ChemAxiomPolyClass – These two ontologies contain terms which are in common use across polymer science as well as a taxonomy of polymers based on the composition of their backbone (though the latter is not axiomatised yet). Details will follow in a further blog post.

ChemAxiomMeta – ChemAxiomMeta is a developing ontology that will allow the specification of provenance of data (e.g. data derived from wiki pages etc.) and will also define what a journal, journal article, thesis, thesis chapter etc. is and what the relationships between these entities are. We have not released this yet. Details will follow in a further blog post.

ChemAxiomContinuants – ChemAxiomContinuants represents an integration of all the above sub-ontologies into an ontological framework for chemical continuants (with some occurrents mixed in when we need to talk about measurement techniques). Details will follow in a further blog post.

We have also started to work on ontologies of chemical reactions, actions and, as mentioned above, minimum information requirements – however, these are at a relatively early stage of development and hence not released yet.

So much for a short overview of the mechanics of the ontologies. I am sure there are a thousand other things I should have said, but that will have to do for now. Comments and suggestions via the usual channels. Automatic links and tags, as always, by Zemanta.

Semantic Web Applications and Tools for Life Sciences – Morning Session

I am currently at a meeting in Edinburgh with the title “Semantic Web Applications and Tools for Life Sciences”. The title is programmatic and it promises to be a hugely exciting meeting. As far as I can tell, the British ontological aristocracy is here and a few more besides. The following are some notes I made during the meeting.

1. Keynote: Semantic Web Technology in Translational Cancer Research (M. Krauthammer, Yale Univ.)

How to integrate semantic web technologies with the Cancer Biomedical Informatics Grid (caBIG)?

Use case: melanoma…worked on at 5 NCI sites in US: Harvard, Penn, Yale, Anderson….can measure all kinases involved in disease pathways…use semantic technologies to share and integrate data from all sites and link to other data sources…e.g. drug screening results etc…..

MelaGrid consortium: data sharing, omics integration, workflow integration for clinical trials

Data sharing: create community wide resources – a federated repository of melanoma specimens

currently caBIG uses ISO/IEC 11179 metadata standards to register CDEs (common data element) and additional annotation via NCI thesaurus concepts: example of use: caTissue…tissue tracking software (multisite banking, form definition, temporal searches etc.)

omics integration: caBIG domain models are in essence ontologies…..translate into OWL models and integrate with other ontologies (e.g. sequence ontology etc.) to align data from various sources

using Sesame as a triple store, but have performance problems….use SPARQL as query language rather than caBIG’s own query language

2. Semantic Data Integration for Francisella tularensis novicida Proteomic and Genomic Data (Nadia Anwar et al.)

Why is data integration important in biology?

data informatics in bioinformatics is not a solved problem…there are no technologies which answer all the questions biologists are likely to ask, also issues with data access and permissions…..yet another problem is the heterogeneous nature of data: information discovery is not integrated…all technologies have strengths and weaknesses…data relates – but it doesn’t overlap

Solution: semantic data integration across omes data silos….

Case Study: Francisella tularensis (bacterium, infection through airways…infects immune system….francisella can bypass macrophages….forms phagosome, but can escape from it…bioterrorism fears…..”Hittite plague” been associated with Tularemia)

available datasources: genome data…from international database….convert to simple rdf data, kegg, ncbi, GO, Poson, transcriptomics data

used data from proteomics experiment to integrate with the constructed graphs….could show that it was easy to query the whole graph…..but issues with modeling of the data and the resulting rdf graph…so some careful data modeling is still necessary….some performance issues with datasets containing many reified statements…..memory problems…

Summary: In principle it’s easy – in practice it is still hard work

Use of shared lexical resources for efficient ontological engineering (Antonio Jimeno et al.)

Motivation: Health-e-Child Project (creation of an integrated (grid-based) healthcare platform for European Paediatrics)

Use Case: Juvenile Rheumatoid Arthritis Ontology construction
reuse existing ontologies – Galen, NCI but….problem with alignment because of missing information that could facilitate mapping, also many mapping tools based on statistics….thus trust

A common terminological resource for life sciences….generate a reference thesaurus that Galen, NCI, JRAO thesaurus to normalise term concepts

Def Thesaurus: Collection of entity names in domain with synonyms, taxonomy of more general and specific terms (DAG)…..no axiomatisation

Problems in thesaurus construction: ambiguity (retinoblastoma – gene or disease), inappropriate term labels, maintenance: thesaurus and ontologies need to be updated simultaneously now…

KASBi: Knowledge Bases Analysis in Systems Biology

Problem: Combining data from different data sources – use semweb rather than standard data integration systems for integration…in particular use reasoners….

In KASBi try and integrate reasoners/semweb with traditional database tech: use semtech to generate a “query plan” which specifies how queries need to be carried out across resources

goWeb – Semantic Search Engine for the Life Science Web (Heiko Dietze)

Typical question: “What is the diagnosis for the symptoms for multiple spinal tumors and skin tumors?”, “Which organisms is FGF8 studied in?”

goWeb combines simple key-word web searching, text mining and ontologies for question answering

Keyword search in goWeb is sent to Yahoo, which returns snippets. These are subsequently pushed through NLP to extract concepts and mark them up with ontology concepts…….use ontologies to further filter results…..
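The annotate-then-filter idea is easy to sketch. What follows is my own toy version in Python and not how goWeb is implemented: the mini ontology is invented, and naive string matching stands in for the NLP step.

```python
# Hypothetical mini ontology: concept -> parent concept (None for the root).
PARENT = {
    "disease": None,
    "tumor": "disease",
    "skin tumor": "tumor",
    "spinal tumor": "tumor",
}


def annotate(snippet):
    """Return the concepts whose label occurs in the snippet.

    Plain substring matching is a stand-in for real text mining.
    """
    lowered = snippet.lower()
    return {concept for concept in PARENT if concept in lowered}


def is_a(concept, ancestor):
    """True if `concept` equals `ancestor` or sits below it in the hierarchy."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = PARENT[concept]
    return False


def filter_snippets(snippets, concept):
    """Keep only snippets annotated with `concept` or one of its descendants."""
    return [
        snippet
        for snippet in snippets
        if any(is_a(found, concept) for found in annotate(snippet))
    ]
```

Filtering by "tumor" keeps snippets that mention skin or spinal tumors but drops ones that only mention disease in general. That is the benefit of filtering on the ontology rather than on the keyword itself.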

Path Explorer: Service Mining for Biological Pathways on the Web (George Zheng)

Two major biological data representation approaches: free text(discoverable but not invocable), computer models (constructed but made available in isolated environment – invocable but not discoverable)

Solution: model biological processes using web service operations (aim: to be invocable and discoverable)…pathways of service-oriented processes can be discovered and invoked

SOA: service providers publish services into a registry where they can be discovered by service consumers

DAMN – slides are much too small…can’t see anything….“entities are service providers and service consumers”
….ok…..he’s lost me now – I can’t see anything anymore…..

Close integration of ML and NLP tools in …
Scope: Fine grained semantic annotation: e.g. the GenE protein inhibits……mark up GenE protein as a protein, inhibits as a negative interaction etc…..

Availability of NLP Pipeline….Alvis/A3P, GATE, UIMA but domain specific NLP resources are rare

focus on target knowledge ensures learnability
rigorous manual annotation
high quality annotation and low volumes require proper normalisation of training corpora (syntactic dependencies vs shallow clues)
clarification of different annotation tasks and knowledge – consistency between NE type and semantics

Fine grained annotation is feasible and necessary for high quality services: i.e. in verticals and science….

Right – time for lunch and a break. I have only captured aspects of the presentations and stuff that resonated with me at the time….so please nobody shoot me if they think I haven’t grabbed the most fundamental points….Link to the slides from the event is here
