ChemAxiom: An Ontology for Chemistry – 1. The Motivation

Some time ago I announced that we are working on ontologies in the polymer domain, though I realise that, so far, I have yet to produce the proof of that: the actual ontology or ontologies.

So today I am happy to announce that the time of vapourware is over and that we have released ChemAxiom – a modular set of ontologies which form the first ontological framework for chemistry (or at least so we believe). The development of these ontologies has taken us a while: I started this on a hunch and as a nice intellectual exercise, not entirely sure where to go with them or what to use them for, and therefore not working on them full time. As the work progressed, however, we understood just how inordinately useful they would be for what we are trying to accomplish in both polymer informatics and chemical informatics at large. I will introduce and discuss the ontologies in a succession of blog posts, of which this is the first.

So what, though maybe somewhat retrospectively, was the motivation for preparing the ontologies? In short: the breakdown of many common chemistry information systems when confronted with real chemical phenomena rather than small subsections of idealised abstractions. Let me explain.

Chemistry and chemical information systems positively thrive on the use of a connection table as a chemical identifier and determinant of uniqueness. The reasons for this are fairly clear: chemistry, for the past 100 years or so, has elevated the (potential) correlation between the chemical structure of a molecule and its physicochemical and biological properties to be its “central dogma.” The application of this dogma has served subsections of the community – notably organic/medicinal/biological chemists – incredibly well, while causing major headaches for other parts of the chemistry community and giving an outright migraine to information scientists and researchers. There are several reasons for the pain:

The use of a connection table as an identifier for chemical objects leads to significant ontological confusion. Often, chemists and their information systems do not realise that there is a fundamental distinction between (a) the platonic idea of a molecule, (b) the idea of a bulk substance and (c) an instance of that substance (“the real bulk substance”) in a flask or bottle on the researcher’s lab bench. An example of this is the association of a physicochemical property of a chemical entity with a structure representation of a molecule: while it would, for example, make sense to do this for a HOMO energy, it does NOT make sense to speak of a melting point or a boiling point in terms of a molecule. The point here is simply that many physicochemical properties are the mereological sums of the properties of many molecules in an ensemble. If this is true for simple properties of pure small molecules, it is even more true for properties of complex systems such as polymers, which are ensembles of many different molecules of many different architectures. A similar argument can also be made for identifiers: in most chemical information systems, it is often not clear whether an identifier (such as a CAS number) refers to a molecule or to a substance composed of these molecules.
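To make the three levels a little more concrete, here is a minimal sketch in Python. It is not how ChemAxiom models things; the class and attribute names (Molecule, Substance, Sample, composed_of, instance_of) are invented purely for illustration, and a SMILES string stands in for the connection table.

```python
from dataclasses import dataclass, field


@dataclass
class Molecule:
    """The idealised molecular entity, identified by a connection table."""
    name: str
    smiles: str  # stands in for a connection table (illustrative choice)
    molecular_properties: dict = field(default_factory=dict)


@dataclass
class Substance:
    """The idea of a bulk substance, composed of molecules of one kind."""
    name: str
    composed_of: Molecule
    bulk_properties: dict = field(default_factory=dict)


@dataclass
class Sample:
    """A real portion of a substance, e.g. a bottle on the lab bench."""
    label: str  # hypothetical label, not a real identifier scheme
    instance_of: Substance


benzene_molecule = Molecule("benzene molecule", "c1ccccc1")
benzene = Substance("benzene", composed_of=benzene_molecule)

# A melting point belongs to the bulk substance, not to the molecule.
benzene.bulk_properties["melting_point_celsius"] = 5.5

bottle = Sample("bottle on my bench", instance_of=benzene)
print(bottle.instance_of.bulk_properties)
```

The point of keeping the three classes apart is that a property or an identifier always has to say which level it is attached to; a HOMO energy would go into molecular_properties, a melting point never would.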

Many chemical objects have temporal characteristics. Often, these temporal characteristics influence and determine the connection table. A typical example of this is rapidly interconverting isomers: glucose, when dissolved in water, can be described by several rapidly interconverting structures. A single connection table is not enough to describe the concept “glucose in water”, and there exists a parthood relationship between the concept and several possible connection tables. Ontologies can help with specifying and defining these parthood relationships.
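As a rough sketch of what such a parthood relationship amounts to, the following Python fragment links one concept to several structural forms. The names (StructuralForm, ChemicalConcept, has_part) are assumptions made up for this example and are not taken from ChemAxiom; the forms are given by name only, and the list is not meant to be exhaustive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StructuralForm:
    """One structure, i.e. one thing a single connection table can describe."""
    name: str


@dataclass(frozen=True)
class ChemicalConcept:
    """A concept such as 'glucose in water' that has several forms as parts."""
    name: str
    has_part: frozenset

    def can_be_described_by(self, form: StructuralForm) -> bool:
        # True if the given form is one of the parts of this concept.
        return form in self.has_part


alpha = StructuralForm("alpha-D-glucopyranose")
beta = StructuralForm("beta-D-glucopyranose")
open_chain = StructuralForm("open-chain D-glucose")

glucose_in_water = ChemicalConcept(
    "glucose in water", frozenset({alpha, beta, open_chain})
)

print(glucose_in_water.can_be_described_by(alpha))
print(len(glucose_in_water.has_part))
```

An ontology expresses the same idea declaratively, so that a reasoner rather than hand-written code can work out which structures belong to which concept.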

There is another aspect to time dependence that we also need to consider. For many materials, their existence in time or, put another way, their history often holds more meaningful information about an observed physical property of that substance than the chemical structure of one of the components of the mixture. For an observable property of a polymer, such as the glass transition temperature, it matters a great deal whether the polymer was synthesized in the solid phase in a pressure autoclave or in solution at ambient pressure. Furthermore, it matters whether and how a polymer was processed: how was it extruded, grafted etc.? All of these processes have a significant influence on the observable physical properties of a bulk sample of this polymer, while leaving the chemical description of the material essentially unchanged (in current practice, polyethylene is often represented either by using the structure of the corresponding repeat unit (ethene, for example) or the structure of a repeat unit fragment (-CH2-CH2-)). Ontologies will help us to describe and define these histories. Ultimately, we envisage that this will result in a “semantic fingerprint” of a material, which, one might speculate, will be much more appropriate for the development of design rules for materials than the dumb structure representations in use today.

Many chemical objects are mixtures... and mixtures simply do not lend themselves to being described using the connection table of a single constituent entity of that mixture. If this is true for glucose in water, it is even truer for things such as polymers: polymers are mixtures of many different macromolecules, all of which have slightly different architectures etc. An observed physical property, and therefore a data object, is the mereological sum of the contributions made by all the constituent macromolecules, and therefore such a data object cannot simply be associated with a single connection table.
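A simple numerical illustration of a property that only exists for the ensemble is the number-average molar mass, M_n = sum(N_i * M_i) / sum(N_i). The sketch below computes it for a toy polyethylene sample. The chain-length distribution is invented for illustration, end groups are ignored, and the function name is my own.

```python
# Molar mass of one -CH2-CH2- repeat unit in g/mol (2 C + 4 H, rounded).
REPEAT_UNIT_MOLAR_MASS = 28.05


def number_average_molar_mass(chain_counts: dict) -> float:
    """Return M_n for a mapping {degree of polymerisation: number of chains}.

    End groups are ignored, so a chain of n repeat units is taken to
    weigh n * REPEAT_UNIT_MOLAR_MASS.
    """
    total_chains = sum(chain_counts.values())
    total_mass = sum(
        count * degree * REPEAT_UNIT_MOLAR_MASS
        for degree, count in chain_counts.items()
    )
    return total_mass / total_chains


# Hypothetical sample: two chains of 100 units and one chain of 200 units.
toy_sample = {100: 2, 200: 1}
print(number_average_molar_mass(toy_sample))
```

No single chain in the toy sample has the molar mass that the function returns, which is exactly why the value cannot sensibly be attached to the connection table of any one macromolecule.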

This, in my view, is a short summary of the case for ontology in chemistry. Please feel free to violently (dis-)agree and if you want to do so, I am looking forward to a discussion in the comments section.

There’s one more thing:

AN INVITATION

The ChemAxiom ontologies are far from perfect and far from finished. We hope that they show what an ontological framework for chemistry could look like. In developing these ontologies, we can contribute our particular point of view, but we would like to hear yours. Even more, we would like to invite the community to get involved in the development of these ontologies in order to make them a general and valuable resource. If you would like to become involved, then please send an email to chemaxiom at googlemail dot com or leave comments, questions etc. in the ChemAxiom Google Group.

In the next several blog posts, I will dive into some of the technical details of the ontologies.


7 Responses to ChemAxiom: An Ontology for Chemistry – 1. The Motivation

  1. Cheers! Nice to see this take off. Will you be around at the EBI around the CDK workshop?

    PS. your last link to the Google Group has a missing colon in the URL…

  2. Making the distinction between a substance and a molecule is indeed important and valuable from an ontological perspective, particularly when it comes to reasoning about the domain. The distinction is often blurred simply because it is more pragmatic not to consider them different. Indeed, most chemists might agree that while there is a conceptual distinction, they don’t want to navigate through a set of high-brow concepts to find such simple (indirect) relationships. Have you considered the impact of representing knowledge in this way with respect to usability?

    Another important issue is that of identity – you are making the argument that every different feature effectively warrants a different identifier. That if I want to make a statement about glucose in one form versus another, I need two different identifiers. This will lead to enormous numbers of different “concepts”, which may affect reasoning capability (especially in OWL!) and potentially also lead to sparsely populated knowledge bases. An alternative is to capture the semantics of non-structural features in relations to the main component. For instance, I could reuse the “Glucose” class by adding additional restrictions, in the context of some process or experimental result, e.g. in my experiment I found glucose to be in its chair form. Indeed, expressing the behaviour (or structural conformation in this case) with respect to the context lends itself to modular reuse and also improves our representation of knowledge.

    A couple of pointers that you might find interesting:
    1 – Contextual knowledge representation – http://dumontierlab.com/pdf/2008_OWLEDEU_MR.pdf
    2 – Biochemical Identifiers – http://www.slideshare.net/micheldumontier/accurate-biochemical-knowledge-starting-with-precise-structurebased-criteria-for-molecular-identity

    We’ve also worked on chemical ontologies in OWL – you can see them here:
    http://dumontierlab.com/index.php?page=ontologies

    it would be nice to merge the two and work towards a comprehensive OWL-based ontology for the chemistry domain. Let me know if that would interest you.

    Cheers,

    -=Michel=-

