Visualisation of Ontologies and Large Scale Graphs

For a number of reasons, I am currently looking into the visualisation of large-scale graphs and ontologies. To that end, I have made some notes on tools and concepts which might be useful to others. Here they are:

Visualisation by Node-Link and Tree

jOWL: jQuery Plugin for the navigation and visualisation of OWL ontologies and RDFS documents. Visualisations mainly as trees, navigation bars.

OntoViz: Plugin for Protege…at the moment it supports Protege 3.4 and doesn’t seem to work with Protege 4.

IsaViz: Much the same as OntoViz really. Last stable version 2004 and does not seem to see active development.

NeOn Toolkit: The NeOn Toolkit also has some visualisation capability, but not independent of the editor. Under active development with a growing user base.

OntoTrack: OntoTrack is a graphical OWL editor and as such has visualisation capabilities. These are meagre, though, and it does not seem to be supported or developed anymore either…the current version seems to be about 5 years old.

Cone Trees: Cone trees are three-dimensional extensions of 2D tree structures and have been designed to allow a greater amount of information to be visualised and navigated. I have not found any software for download at the moment, but the idea is so interesting that we should bear it in mind. Examples are here, here and the key reference is Robertson, George G., Mackinlay, Jock D. and Card, Stuart K., Cone Trees: animated 3D visualizations of hierarchical information, CHI ’91: Proceedings of the SIGCHI conference on Human factors in computing systems, 1991, ISBN 0-89791-383-3, pp. 189-194. (DOI here)
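
To make the cone tree idea concrete, here is a minimal layout sketch in Python. It is not taken from the Robertson et al. implementation: the names `Node` and `cone_layout` and the parameters `radius`, `level_height` and `shrink` are assumptions chosen for illustration. Each node sits at the apex of a cone and its children are spread evenly around the circular base one level further down.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical tree node; names are assumed to be unique.
    name: str
    children: list = field(default_factory=list)


def cone_layout(root, radius=1.0, level_height=1.0, shrink=0.5):
    """Return {name: (x, y, z)} with children placed on a circle below their parent."""
    positions = {}

    def place(node, x, y, z, r):
        positions[node.name] = (x, y, z)
        count = len(node.children)
        for i, child in enumerate(node.children):
            # Spread the children evenly around the base of the cone.
            angle = 2 * math.pi * i / count
            place(
                child,
                x + r * math.cos(angle),
                y - level_height,
                z + r * math.sin(angle),
                r * shrink,  # deeper cones get narrower (an assumed heuristic)
            )

    place(root, 0.0, 0.0, 0.0, radius)
    return positions
```

A real cone tree would also size each cone according to the number of descendants and animate rotations during navigation; this sketch only covers the static placement.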

PhyloWidget: PhyloWidget is software for the visualisation of phylogenetic trees, but should be repurposable for ontology trees. Javascript – so appropriate for websites. Student project as part of the Phyloinformatics Summer of Code 2007.

The JavaScript Information Visualization Toolkit: Extremely pretty JS toolkit for the visualisation of graphs etc. Dynamic and interactive visualisations too…just pretty. I have spent some time hacking with it and I am becoming a fan.

Welkin: Standalone application for the visualisation of RDF graphs. Allows dynamic filtering, colour coding of resources etc…

Three-Dimensional Visualisation

Ontosphere3D: Visualisation of ontologies on 3D spheres. Does not seem to be supported anymore and requires Java 3D, which is a nightmare in itself.

Cone Trees (see above) with their extension of Disc Trees (for an example of disc trees, see here).

3D Hyperbolic Tree as exemplified by the Walrus software. Originally developed for website visualisation, it results in stunning images. Not under active development anymore, but the source code is available for download.

Cytoscape: The 1000 pound gorilla in the room of large-scale graph visualization. There are several plugins available for interaction with the Gene Ontology, such as BiNGO and ClueGO. Both tools treat the ontology as annotation rather than as a knowledge base in its own right and can be used for the identification of GO terms which are overrepresented in a cluster/network. In terms of visualisation of ontologies themselves, there is the RDFScape plugin, which can visualize ontologies.

Zoomable Visualisations

Jambalaya – Protege plugin, but can also run as a browser applet. Uses Shrimp to visualise class hierarchies in ontologies and arrows between boxes to represent relationships.

CropCircles (link is to the paper describing it): CropCircles have been implemented in the SWOOP ontology editor which is not under active development anymore, but where the source code is available.

Information Landscapes – again, no software, just papers.

SWAT4LS2009 – Michael Schroeder: Prediction of Drug Target Interactions from Literature by Context Similarity

Typical researcher spends 12.4 hours a week searching for information. Why not use Google? ‘Cause Google is not semantic.

Go PubMed – Filter PubMed contents against all the terms in the Gene Ontology. If you use simple categorisation for information retrieval, you potentially increase the search burden due to compartmentalisation. However, it works the other way round too…useful filtering.

Showing some examples of faceted browsing of PubMed content and systematic drilldown into search results. Not easy to blog, but literature exploration in this way is always fascinating. Examples include the analysis of research trends, networks of collaborators etc. A new tool in Go PubMed also allows the discovery of indirect or inferred links.
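
As a rough illustration of the faceting idea (and not of how Go PubMed is actually implemented), the sketch below files documents under every ontology term they mention and under all ancestors of that term, so that a drilldown from a general term to a specific one narrows the result set. The tiny `PARENT` hierarchy, the function names and the plain substring matching are all assumptions made for the example.

```python
# Hypothetical toy hierarchy (child term -> parent term); not real GO content.
PARENT = {
    "apoptosis": "cell death",
    "cell death": "biological process",
    "dna repair": "biological process",
}


def ancestors(term):
    """Walk up the hierarchy and return all ancestors of a term."""
    chain = []
    while term in PARENT:
        term = PARENT[term]
        chain.append(term)
    return chain


def build_facets(docs):
    """Map each term to the ids of documents mentioning it or any of its descendants."""
    terms = set(PARENT) | set(PARENT.values())
    facets = {term: set() for term in terms}
    for doc_id, text in docs.items():
        lowered = text.lower()
        for term in terms:
            if term in lowered:  # naive substring match, for illustration only
                facets[term].add(doc_id)
                for ancestor in ancestors(term):
                    facets[ancestor].add(doc_id)
    return facets
```

Calling `build_facets({1: "Apoptosis in neurons", 2: "DNA repair pathways"})` files both documents under "biological process" but only the first under "cell death", which is the drilldown behaviour described above.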

Have developed a similar system for the web: Go Web (works on the top Yahoo search results).

Remarks on Ontology Generation: have developed a plugin for OBO Edit…search for a term and the plugin makes suggestions for terms that might be included in new ontologies. Points out terms in existing ontologies. Also helps with the generation of definitions for terms…wow, this is extremely useful in SO many ways…

Now let’s talk about drugs and targets….

Try and mine for gene mentions in text…find a gene term and then use context to decide what it is we are talking about. Once a gene has been found, look for statistically significant co-occurrences. The results have been made available in GoGene. Again, can do bibliometric trend analysis – genes are ranked by community interest.
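
The talk did not say which statistical test is used, so the following is only one plausible way to score a co-occurrence: a one-sided hypergeometric test on document counts. The function names, the substring matching and the choice of test are all my assumptions.

```python
from math import comb


def count_cooccurrence(docs, gene, term):
    """Return (all docs, docs with gene, docs with term, docs with both)."""
    has_gene = [gene.lower() in doc.lower() for doc in docs]
    has_term = [term.lower() in doc.lower() for doc in docs]
    both = sum(g and t for g, t in zip(has_gene, has_term))
    return len(docs), sum(has_gene), sum(has_term), both


def cooccurrence_p_value(n_docs, n_gene, n_term, n_both):
    """Probability of seeing at least n_both co-mentions by chance alone."""
    upper = min(n_gene, n_term)
    total = comb(n_docs, n_term)
    tail = sum(
        comb(n_gene, i) * comb(n_docs - n_gene, n_term - i)
        for i in range(n_both, upper + 1)
    )
    return tail / total
```

A small p-value suggests that the gene and the term appear together more often than their individual frequencies would predict. Real text mining would of course need proper gene-name disambiguation first, which is the hard part the talk is about.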

From drugs to genes…what is the link between a gene and a drug using context profiles: what are the disease terms related to a given drug…then to genes.

Gotta stop blogging…enjoying this talk far too much…….

SWAT4LS2009 – Sonja Zillner: Towards the Ontology Based Classification of Lymphoma Patients using Semantic Image Annotation

(Again, these are notes as the talk happens)

This has to do with the Siemens Project Theseus Medico – Semantic Medical Image Understanding (towards flexible and scalable access to medical images)

Different images from many different sources: e.g. X-ray, MRI etc…use this and combine with treatment plans, patient data etc and integrate with external knowledge sources.

Example Clinical Query: “Show me the CT scans and records of patients with a lymph node enlargement in the neck area” – at the moment a query over several disjoint systems is required

Current Motivation: generic and flexible understanding of images is missing
Final Goal: Enhance medical image annotations by integrating clinical data with images
This talk: introduce a formal classification system for patients (ontological model)

Used Knowledge Sources:

Requirements of the Ontological Model

Now showing an example axiomatisation for the counting and location of lymphatic occurrences and discusses problems relating to extending existing ontologies…

Now talking about annotating patient records: typical problems are abbreviations, clinical codes, fragments of sentences etc…difficult for NLP people to deal with….

Now showing detailed patient example where application of their classification system led to reclassification of patient in terms of staging system.

SWAT4LS2009 – Keynote Alan Ruttenberg: Semantic Web Technology to Support Studying the Relation of HLA Structure Variation to Disease

(These are live-blogging notes from Alan’s keynote…so don’t expect any coherent text…use them as bullet points to follow the gist of the argument.)

The Science Commons:

  • a project of the Creative Commons
  • 6 people
  • SC specializes CC to science
  • information discovery and re-use
  • establish legal clarity around data sharing and encourage automated attribution and provenance

Semantic Web for biologists because it maximizes the value of scientific work by removing repeat experimentation.

ImmPort Semantic Integration Feasibility Project

  • ImmPort is an immunology database and analysis portal
  • Goals: meta-analysis
  • Question: how can ontology help data integration for data from many sources?

Using semantics to help integrate sequence features of HLA with disorders
Challenges:

  • Curation of sequence features
  • Linking to disorders
  • Associating allele sequences with peptide structures with nomenclature with secondary structure with human phenotype etc etc etc…

Talks about elements of representation

  • pdb structures translated into ontology-based representations
  • canonical MHC molecule instances constructed from IMGT
  • relate each residue in pdb to the canonical residue if it exists
  • use existing ontologies
  • contact points between peptide and other chains computed using JMOL following IMGT. Represented as relation between residue instances.
  • Structural features have fiat parts

Connecting Allele Names to Disease Names

  • use papers as join factors: papers mention both disease and allele – noisy
  • use regex and rewrites applied to titles and abstracts to fish out links between diseases and alleles
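
A minimal sketch of what "regex and rewrites" could look like in practice. The actual patterns used in the project were not shown, so the regular expressions, the short disease list and the function names below are assumptions. The rewrite step normalises older serological-style spellings such as "HLA-B27" to "HLA-B*27" before the allele pattern is applied.

```python
import re

# Assumed pattern for allele names such as HLA-B*27 or HLA-DRB1*04:01.
ALLELE_RE = re.compile(r"\bHLA-[A-Z]+\d*\*\d+(?::\d+)*")

# Assumed rewrite: "HLA-B27" or "HLA B27" -> "HLA-B*27" (single-letter loci only).
LEGACY_RE = re.compile(r"\bHLA[ -]([A-Z])(\d{1,2})\b")

# Hypothetical, hand-picked disease vocabulary for the example.
DISEASES = ["ankylosing spondylitis", "rheumatoid arthritis"]


def extract_links(text):
    """Return (allele, disease) pairs co-mentioned in a title or abstract."""
    normalised = LEGACY_RE.sub(r"HLA-\1*\2", text)
    alleles = ALLELE_RE.findall(normalised)
    lowered = normalised.lower()
    diseases = [disease for disease in DISEASES if disease in lowered]
    return [(allele, disease) for allele in alleles for disease in diseases]
```

As the notes say, this kind of join is noisy: a paper mentioning an allele and a disease in the same title does not necessarily assert a link between them.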

Correspondence of molecules with allele structures is difficult.

  • use blast to find closest allele match between pdb and allele sequence
  • every pdb and allele residue has URI
  • relate matching molecules
  • relate each allele residue to the canonical allele
  • annotate various residues with various coordinate systems

This creates massive map that can be navigated and queried. Example queries:

  • What autoimmune diseases can be indexed against a given allele?
  • What are the variant residues at a position?
  • Classification of amino acids
  • Show alleles perturbed at contacts of 1AGB

Summary of Progress to Date:
Elements of Approach in Place: Structure, Variation, transfer of annotation via alignment, information extraction from literature etc…

Nuts and Bolts:

  • Primary source
  • Local copy of source
  • Scripts transform to RDF
  • Exports RDF Bundles
  • Get selected RDF Bundles and load into triple store
  • Parsers generate in memory structures (python, java)
  • Template files are instructions to format these into OWL
  • Modeling is iteratively refined by editing templates
  • RDF loaded into Neurocommons, some amount of reasoning
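
The parser-plus-template step in the list above can be sketched roughly as follows. This is not the Neurocommons code: the template text, the `http://example.org` base URI, the record fields and the residue values are placeholders I made up to show why editing a template is a cheap way of refining the model.

```python
from string import Template

# Hypothetical template; changing the modelling only requires editing this text.
RESIDUE_TEMPLATE = Template(
    '<$base/residue/$pdb_id/$chain/$position> a <$base/Residue> ;\n'
    '    <$base/aminoAcid> "$amino_acid" .\n'
)


def render(records, base="http://example.org"):
    """Format parsed in-memory records as Turtle using the template."""
    return "".join(
        RESIDUE_TEMPLATE.substitute(base=base, **record) for record in records
    )


# Placeholder record standing in for parser output (values are not real data).
parsed = [{"pdb_id": "1AGB", "chain": "A", "position": 1, "amino_acid": "GLY"}]
turtle = render(parsed)
```

Because the parsers and the templates are separate, the same in-memory structures can be re-exported under a revised model without touching the parsing code.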

RDFHerd package management for data

neurocommons.org/bundles

Can we reduce the burden of data integration?

  • Too many people are doing data integration – wasting effort
  • Use web as platform
  • Too many ontologies…here’s the social pressure again

Challenges

  • have lawyers bless every bit of data integration
  • reasoning over triple stores
  • SPARQL over HTTP
  • Understand and exploit ontology and reasoning
  • Grow a software ecosystem like Firefox

Just a quick note from the International Conference on Biomedical Ontology

I am normally pretty noisy these days when it comes to blogging or tweeting conferences….but haven’t produced anything for ICBO so far. This has mainly to do with the fact that I am far too busy learning, thinking and absorbing people’s ideas and yet again realising just how far ahead biology/biomedicine is in thinking how to deal with data properly. In any case, all I wanted to say was that ICBO has its own friendfeed group where Robert Hoehndorf and others are doing a sterling job documenting the conference and discussing what is said during the tutorial sessions that are currently going on. The friendfeed page is here:

http://friendfeed.com/icbo

So do read along if you want to follow what is going on from afar!

Capturing process: In silico, in laboratorio and all the messy in-betweens – Cameron Neylon @ the Unilever Centre

I am not very good at live-blogging, but Cameron Neylon is at the Unilever Centre and giving a talk about capturing the scientific process. This is important stuff and so I shall give it a go.

He starts off by making the point that to capture the scientific process we need to capture the information about the objects we are investigating as well as the process by which we get there.

Journals not enough – the journal article is static but knowledge is dynamic. Can solutions come from software development? Yes to a certain extent….

e.g. source control/versioning systems – captures snapshots of development over time, date stamping etc.
Unit testing – continuous tests as part of the science/knowledge testing
Solid-replication…distributed version control

Branching and merging: data integration. However, commits are free text..unstructured knowledge…no relationships between objects – what Cameron really wants to say is NO ONTOLOGIES, NO LINKED DATA.

Need linked data, need ontologies: towards a linked web of data.

Data is nice and well…but how about the stuff that goes on in the lab? Objects, data spread over multiple silos – recording much harder: we need to worry about the lab notebook.

“Lab notebook is pretty much an episodic journal” – which is not too dissimilar to a blog. Similarities are striking: descriptions of stuff happening, date stamping, categorisation, tagging, accessibility…and not of much interest to most people…;-). But problem with blogs is still information retrieval – same as lab notebook…

Now showing a blog of one of his students recording lab work…software built by Jeremy Frey’s group…blog IS the primary record: blog is a production system…2GB of data. At first glance lab-log similar to conventional blog: dates, tags etc…BUT fundamental difference is that data is marked up and linked to other relevant resources…now showing video demo of capturing provenance, date, linking of resources, versioning, etc: data is linked to experiment/procedure, procedure is linked to sample, sample is linked to material…etc…

Proposes that his blog system is a system for capturing both objects and processes…a web of objects…now showing a visualisation of resources in the notebook and demonstrates that the visualisation of the connectedness of the resources can indicate problems in the science or recording of science etc…and says it is only the linking/networking effect that allows you to do this. BUT…no semantics in the system yet (tags yes…no PROPER semantics).
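
A toy version of the "connectedness reveals problems" point: if every resource in the notebook is a node and every link an edge, then resources nobody links to or from are candidates for something that was recorded badly. The function and the resource names are hypothetical and have nothing to do with the actual lab-blog software.

```python
def unlinked(resources, links):
    """Return the resources that take part in no link at all."""
    linked = {resource for pair in links for resource in pair}
    return sorted(set(resources) - linked)


# Hypothetical notebook content: data -> procedure -> sample -> material.
resources = ["data-1", "procedure-1", "sample-1", "material-1", "data-2"]
links = [
    ("data-1", "procedure-1"),
    ("procedure-1", "sample-1"),
    ("sample-1", "material-1"),
]
orphans = unlinked(resources, links)
```

Here `data-2` comes back as an orphan: a data file with no procedure attached, which is exactly the kind of gap a visualisation would make obvious.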

Initial labblog used hand-coded markup: scientists needed to know how to hand code markup…and hated it…..this led to a desire for templates….templates create posts and associate controlled vocab and specify the metadata that needs to be recorded for a given procedure….in effect they are metadata frameworks….templates can be preconfigured for procedures and experiments….metadata frameworks map onto ontologies quite well….
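
To illustrate what a template as "metadata framework" might amount to, here is a small sketch. The template structure, the `pcr` example, its field names and the `create_post` function are my own inventions and not the real system's format: a template names the metadata a procedure must record and the controlled vocabulary for selected fields, and a post is only created if both are satisfied.

```python
# Hypothetical template definitions, keyed by procedure name.
TEMPLATES = {
    "pcr": {
        "required": ["sample", "primer", "cycles"],
        "vocab": {"status": {"planned", "running", "done"}},
    }
}


def create_post(template_name, **metadata):
    """Create a notebook post, enforcing the template's metadata framework."""
    template = TEMPLATES[template_name]
    missing = [key for key in template["required"] if key not in metadata]
    if missing:
        raise ValueError(f"missing metadata: {', '.join(missing)}")
    for key, allowed in template["vocab"].items():
        if key in metadata and metadata[key] not in allowed:
            raise ValueError(f"{key!r} must be one of {sorted(allowed)}")
    return {"procedure": template_name, **metadata}
```

The required fields and the controlled vocabulary are the parts that would map onto ontology classes and properties, which is presumably why the mapping is said to work quite well.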

Bio-ontologies…sometimes convolute process and object….says there is no particularly good ontology of experiments….I think the OBI and EXPO people might disagree….

So how about the future?

  • Important thing is: capture at source IN CONTEXT
  • Capture as much as possible automatically. Try and take the human out of the equation as much as possible.
  • In the lab, capture each object as it is created, capture the plan and track the execution step by step
  • Data repositories as easy as Flickr – repos specific for a data type and then link artefacts together across repos, e.g. the Periodic Table of Videos on YouTube, embedding of chemical structures into pages from ChemSpider
  • More natural interfaces to interact with these records…better visualisation etc…
  • Trust and Provenance and cutting through the noise: which objects/people/literature will I trust and pay attention to? Managing people and reputation of people creating the objects: SEMANTIC SOCIAL WEB (now shows FriendFeed as an example: subscription as a measure of trust in people, but people discussing objects) “Data finds the data, then people find the people”…Social network with objects at the centre…
  • Connecting with people only works if the objects are OPEN
  • Connected research changes the playing field – again resources are key
  • OUCH controversy: communicate first, standardize second…but at least he acknowledges that it will be messy…
  • UPDATE: Cameron’s slides of the talk are here:

    ChemAxiom: An Ontology for Chemistry 4. ChemAxiomChemDomain

    Obligations to our funders and some publishers have delayed me for a few days in continuing this series of blog posts and in participating in the discussion on the Google Group, but I hope I can catch up on both now. In my previous blog post, I summarised all of the ChemAxiom modules briefly: now is the time to delve into some more detail. First up then: ChemAxiomChemDomain.

    ChemAxiomChemDomain is, at the moment, a rather small, but nevertheless important ontology, which clarifies some fundamental domain concepts in chemistry, namely the relationship between platonic molecules, platonic bulk substances, instances of either and roles.  

    First of all, let’s turn to some fundamental concepts. The classes “ChemicalElement”, “MolecularEntity”, and “ChemicalSpecies” are all subclasses of “snap:Object”. The class “Object” in the BFO is defined as a “material entity [snap:MaterialEntity] that is spatially extended, maximally self-connected and self-contained (the parts of a substance are not separated from each other by spatial gaps) and possesses an internal unity. The identity of substantial object [snap:Object] entities is independent of that of other entities and can be maintained through time.” Various disjointness axioms specify the fact that “MolecularEntities” are not the same as “ChemicalSpecies”, thus addressing some of the fundamental issues about the relationship between molecules and substances etc.

    Further axioms on these classes specify other necessary parthood relationships: “ChemicalSpecies” are composed of molecules or other ChemicalSpecies (thus giving recursion and allowing the modeling of formulations) or BulkChemicalElements:

    ChemistryOntology:ChemicalSpecies
          a       owl:Class ;
          rdfs:comment "An ensemble of chemically identical molecular entities that can explore the same set of molecular energy levels on the time scale of the experiment."@en ;
          rdfs:subClassOf snap:Object ;
          rdfs:subClassOf
                  [ a       owl:Class ;
                    owl:unionOf ([ a       owl:Restriction ;
                                owl:onProperty ChemistryOntology:hasPart ;
                                owl:someValuesFrom ChemistryOntology:MolecularEntity
                              ] [ a       owl:Restriction ;
                                owl:onProperty ChemistryOntology:hasPart ;
                                owl:someValuesFrom ChemistryOntology:ChemicalSpecies
                              ] [ a       owl:Restriction ;
                                owl:hasValue ChemistryOntology:BulkChemicalElement ;
                                owl:onProperty ChemistryOntology:hasPart
                              ])
                  ] ;
          rdfs:subClassOf
                  [ a       owl:Restriction ;
                    owl:onProperty ChemistryOntology:preseentInAmount ;
                    owl:someValuesFrom xsd:string
                  ] ;
          rdfs:subClassOf
                  [ a       owl:Restriction ;
                    owl:onProperty ChemAxiomProp:hasProperty ;
                    owl:someValuesFrom ChemAxiomProp:Property
                  ] ;
          owl:disjointWith ChemistryOntology:ChemicalElement , ChemistryOntology:MolecularEntity .

    When integrated with ChemAxiomProp (as has been done in ChemAxiomContinuants), ChemicalSpecies can be connected up to their properties and other statements which one might wish to make about chemical species.

    Another part of ChemAxiomChemDomain is the definition of roles: generic types of ChemicalSpecies, such as solvents, acids and catalysts, can be defined in terms of roles. No molecule is ever just a solvent, an acid or a catalyst. Rather, these categories are realisable entities: a molecular species or a chemical entity behaves as a catalyst, nucleophile or solvent under certain circumstances:

    ChemistryOntology:NucleophileMolecule
          a       owl:Class ;
          rdfs:subClassOf ChemistryOntology:MolecularEntity ;
          owl:disjointWith ChemistryOntology:ElectrophileMolecule ;
          owl:equivalentClass
                  [ a       owl:Class ;
                    owl:intersectionOf (ChemistryOntology:MolecularEntity [ a       owl:Restriction ;
                                owl:onProperty ChemistryOntology:hasRole ;
                                owl:someValuesFrom ChemistryOntology:NucleophileRole
                              ])
                  ] .
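
For readers less used to OWL syntax, the equivalence axiom above says roughly the following, rendered here as a plain Python check. This is only an analogy: the `Entity` class and the function are hypothetical, and a Python membership test is closed-world, whereas an OWL reasoner works under the open-world assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    # Hypothetical stand-in for an individual in the ontology.
    name: str
    types: set = field(default_factory=set)
    roles: set = field(default_factory=set)


def is_nucleophile_molecule(entity):
    """Mirror the axiom: a MolecularEntity that has some NucleophileRole."""
    return "MolecularEntity" in entity.types and "NucleophileRole" in entity.roles


ammonia = Entity("ammonia", {"MolecularEntity"}, {"NucleophileRole"})
bare = Entity("ammonia-without-role", {"MolecularEntity"})
```

The point of modelling it this way is that the same molecule stops counting as a nucleophile as soon as the role is absent: the classification follows from the role and is not built into the molecule itself.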

    Furthermore, roles in combination with MolecularEntity or ChemicalSpecies allow the definition of generic molecules or substances, such as acids (hydrochloric acid) and acids (proton donor), catalysts, solvents etc. At the moment, the number of axioms is small. However, as the body of axioms grows in the future, it can be expected that ChemAxiom will become more and more useful for the disambiguation of concepts: while it would make sense for a chemical species which is an acid to talk about a pH value, it would not make sense to speak of “molecular acids” in the same terms.

    Finally, OWL’s model of classes as collections of instances models the things we need to model really well: the classes “ChemicalSpecies” and “MolecularEntity” and their respective subclasses can be thought of as representing the platonic ideals of molecules or substances, whereas instances of these classes can be thought of as representing “real” samples of both molecules (e.g. a single molecule in, for example, matrix isolation) and substances (100 ml of HCl in a flask).

    So much for ChemAxiomChemDomain for now. It is the beginning of a domain model and very much driven by the use-case I outlined in a previous blog post. Obviously, we would like to expand the scope of this particular ontology to be more universally useful in the future. However, I believe that rather than doing this via random ontological engineering, this should be driven by use-cases. So if you have use-cases in mind, please get in touch and let’s discuss how we can collaborate.

    Tags and automatic links, as always, by Zemanta.

    ChemAxiom: An Ontology for Chemistry 2. The Set-Up

    Now that I have introduced at least some of the motivation behind ChemAxiom, let me outline some of the mechanics.

    ChemAxiom is a collective term for a set of ontologies, all of which make a start at describing subdomains within chemistry. The ontology modules are independent and self-contained and can (largely) be developed separately and concurrently. Although they are independent, they are interoperable and integrated via a common upper ontology – in the case of ChemAxiom, we have chosen the Basic Formal Ontology (BFO). I will blog the reasons for this choice in the next post.

    The ontologies are currently in various stages of axiomatisation depending on how long we have been working on them and how much we have had a chance to play. So if there are axioms that are missing and you think should be there, or if you agree/disagree with some of our design decisions, please let us know. In any case, the discussion has already started with some helpful comments over on the Google Group. Let me describe the various modules in greater detail:

    The Reasons for Modularity: When developing ontologies, it is always tempting to develop the ueber-McDaddy-ontology-of-everything, because, of course, ontology development is, by definition, never done: we always need more than we have – more terms, more axioms etc. Very quickly, this can result in monstrously large and virtually unmaintainable constructs. Modularisation has, from our perspective, the advantage of (a) smaller and more manageable ontologies, (b) ontologies which are easier to maintain, (c) ontologies which can be developed in parallel or orthogonally and subsequently integrated using either a common upper ontology or mapping/rules etc. Furthermore, if refactoring of ontologies is necessary during the development process, this is also facilitated by modularity: changes in one module have less chance of forcing changes in another module.

    The General Use Case: One of the things we are particularly interested in here in Cambridge is the extraction of chemical entities and data from text, and Peter Corbett’s OSCAR is now fairly well established within the chemical informatics community. Our text sources vary widely, and can range from standard chemical papers to theses, blogs and Wikipedia pages. To give you an impression of the types of data we are talking about, here is an example: Wikipedia’s infobox for benzene (somewhat truncated):

     

    [Image: Wikipedia infobox for benzene]

    So we have to deal with names, identifiers of various types, physico-chemical property data as well as the corresponding metadata (e.g. measurement pressures, measurement temperatures etc.), and chemical structure (InChI, SMILES). Our ontologies should enable us to generate RDF that allows us to hold this data – the ontology here serves as a schema. While we are interested in reasoning/using reasoners for the purposes of (retrospective) typing (again, I will explain what I mean by that in subsequent blog posts), applying ontologies to the description of chemical data is our first use-case.
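
To give a feel for the "ontology as schema" use-case, here is a rough sketch that turns one extracted property value plus its measurement metadata into Turtle. Note that the `ex:` namespace and the property names (`ex:hasProperty`, `ex:hasValue`, `ex:hasUnit`, `ex:measurementPressure`) are placeholders invented for this example and are not actual ChemAxiom terms; the function name is likewise hypothetical.

```python
# Placeholder namespace; the real ontology IRIs are not assumed here.
PREFIX = "@prefix ex: <http://example.org/chem#> .\n"


def measurement_to_turtle(substance, prop, value, unit, conditions=None):
    """Serialise one property measurement, with its conditions, as Turtle."""
    lines = [
        f"ex:{substance} ex:hasProperty [",
        f"    a ex:{prop} ;",
        f'    ex:hasValue "{value}" ;',
        f'    ex:hasUnit "{unit}"',
    ]
    for name, (cond_value, cond_unit) in (conditions or {}).items():
        lines[-1] += " ;"  # continue the predicate list of the blank node
        lines.append(f'    ex:{name} "{cond_value} {cond_unit}"')
    lines.append("] .")
    return PREFIX + "\n".join(lines) + "\n"


# Boiling point of benzene at atmospheric pressure.
turtle = measurement_to_turtle(
    "benzene", "BoilingPoint", 80.1, "degC",
    conditions={"measurementPressure": (101.325, "kPa")},
)
```

Keeping the value, the unit and the measurement conditions together in one node is what allows the metadata to travel with the data point, which is the requirement described above.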

    With all of that said, let me provide a quick summary of the modules:

    Chemistry Domain Ontology – ChemAxiomDomain. ChemAxiomDomain is the first module in the set. It is currently a small ontology, which clarifies some fundamental relationships in the chemistry domain. Key concepts in this ontology are “ChemicalElement”, “ChemicalSpecies” and “MolecularEntity” as well as “Role”. ChemAxiomDomain clarifies the relationships between these terms (see my previous blog post) and also deals with identifiers etc. Chemical roles too are important: while chemical entities may be or act as nucleophiles, acids, solvents etc. some of the time, they do not have these roles all of the time – roles are realisable entities and ChemAxiomDomain provides a mechanism for dealing with that. There are few other high-level domain concepts in there at the moment, though obviously we are looking to expand as and when the need arises and use-cases are provided. I will blog some details in a subsequent blog post.

    Properties Ontology – ChemAxiomProp. ChemAxiomProp is an ontology of over 150 chemical and materials properties, together with a first set of definitions and symbols (where available and appropriate) and some axioms for typing of properties. Again, details will follow in a subsequent blog post.

    Measurement Techniques – ChemAxiomMetrology. This is an ontology of over 200 measurement techniques and also contains a list of instrument parts and axioms for typing of measurement techniques. It does not currently include information about minimum information requirements for measurement techniques (e.g. the measurement of a boiling point also requires a measurement of pressure) and other metadata, but this will be added at a later stage. Again, a detailed blog-post will follow.

    ChemAxiomPoly and ChemAxiomPolyClass – These two ontologies contain terms which are in common use across polymer science as well as a taxonomy of polymers based on the composition of their backbone (though the latter is not axiomatised yet). Details will follow in a further blog post.

    ChemAxiomMeta – ChemAxiomMeta is a developing ontology that will allow the specification of provenance of data (e.g. data derived from wiki pages etc.) and will also define what a journal, journal article, thesis, thesis chapter etc. is and what the relationships between these entities are. We have not released this yet. Details will follow in a further blog post.

    ChemAxiomContinuants – ChemAxiomContinuants represents an integration of all the above sub-ontologies into an ontological framework for chemical continuants (with some occurrents mixed in when we need to talk about measurement techniques). Details will follow in a further blog post.

    We have also started to work on ontologies of chemical reactions, actions and, as mentioned above, minimum information requirements – however, these are at a relatively early stage of development and hence not released yet.

    So much for a short overview of the mechanics of the ontologies. I am sure there are a thousand other things I should have said, but that will have to do for now. Comments and suggestions via the usual channels. Automatic links and tags, as always, by Zemanta.
