Polymer Informatics and The Semantic Web – The Solution, Part 1: Adding Structure: Chemical Markup Language
April 10, 2007
In my last post on our polymer informatics work, I started to discuss how adding structure to documents in the form of metadata can make information retrieval more accurate. In particular, I introduced markup languages as a way of structuring information and used a bread recipe to illustrate some general features of XML. Having been through all of that, how can we hold chemical information in a marked-up way?
One of the assumptions most deeply ingrained in the thinking of chemists is that the structure of a molecule is related to its physical properties. The most important information a chemist might wish to hold in a marked-up way is therefore probably structural information about a molecule. Fortunately, over the past decade or so, Peter Murray-Rust, Henry Rzepa and others have worked on an XML dialect called CML (Chemical Markup Language). Let’s have a look at a small molecule, styrene in our case, and see what some basic CML looks like.
Here’s a representation of styrene (InChI=1/C8H8/c1-2-8-6-4-3-5-7-8/h2-7H,1H2) that every chemist will be familiar with:
and here’s how the same molecule would be represented in CML:
As was the case for our bread recipe, you can see that we have three containers here: “atomArray” and “bondArray”, both enclosed by the container “molecule”. The arrays are essentially lists: one of atoms (with attributes specifying the element, the ID of that particular atom and its 2D coordinates) and one of bonds (with attributes giving the IDs of the two atoms the bond connects, the ID of the bond itself and the bond order). Taken together, this is what computational chemists call a “connection table”.
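To make the idea of a connection table a little more concrete, here is a minimal sketch in Python of how a program might read one out of CML. The embedded CML fragment is hand-written for this illustration and is not the listing from this post: the atom and bond IDs, the Kekulé assignment of double bonds and the use of "hydrogenCount" for implicit hydrogens are my own choices, and the 2D coordinate attributes are left out to keep it short. The function names (read_connection_table, element_counts, neighbours) are likewise hypothetical helpers, not part of any CML toolkit.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

# Namespace used by CML documents; the "cml" prefix is only a local alias.
CML_NS = {"cml": "http://www.xml-cml.org/schema"}

# Hand-written, illustrative CML for styrene (hydrogens implicit, no coordinates).
STYRENE_CML = """\
<molecule xmlns="http://www.xml-cml.org/schema" id="styrene">
  <atomArray>
    <atom id="a1" elementType="C" hydrogenCount="2"/>
    <atom id="a2" elementType="C" hydrogenCount="1"/>
    <atom id="a3" elementType="C" hydrogenCount="0"/>
    <atom id="a4" elementType="C" hydrogenCount="1"/>
    <atom id="a5" elementType="C" hydrogenCount="1"/>
    <atom id="a6" elementType="C" hydrogenCount="1"/>
    <atom id="a7" elementType="C" hydrogenCount="1"/>
    <atom id="a8" elementType="C" hydrogenCount="1"/>
  </atomArray>
  <bondArray>
    <bond id="b1" atomRefs2="a1 a2" order="2"/>
    <bond id="b2" atomRefs2="a2 a3" order="1"/>
    <bond id="b3" atomRefs2="a3 a4" order="2"/>
    <bond id="b4" atomRefs2="a4 a5" order="1"/>
    <bond id="b5" atomRefs2="a5 a6" order="2"/>
    <bond id="b6" atomRefs2="a6 a7" order="1"/>
    <bond id="b7" atomRefs2="a7 a8" order="2"/>
    <bond id="b8" atomRefs2="a8 a3" order="1"/>
  </bondArray>
</molecule>
"""


def read_connection_table(cml_text):
    """Return (atoms, bonds) parsed from a CML molecule string."""
    root = ET.fromstring(cml_text)
    atoms = {}
    for atom in root.findall("cml:atomArray/cml:atom", CML_NS):
        atoms[atom.get("id")] = {
            "element": atom.get("elementType"),
            "hydrogens": int(atom.get("hydrogenCount", "0")),
        }
    bonds = []
    for bond in root.findall("cml:bondArray/cml:bond", CML_NS):
        first, second = bond.get("atomRefs2").split()
        bonds.append((first, second, int(bond.get("order"))))
    return atoms, bonds


def element_counts(atoms):
    """Count explicit atoms plus their implicit hydrogens."""
    counts = Counter()
    for atom in atoms.values():
        counts[atom["element"]] += 1
        counts["H"] += atom["hydrogens"]
    return dict(counts)


def neighbours(bonds):
    """Map each atom ID to the IDs of the atoms it is bonded to."""
    table = defaultdict(list)
    for first, second, _order in bonds:
        table[first].append(second)
        table[second].append(first)
    return dict(table)


if __name__ == "__main__":
    atoms, bonds = read_connection_table(STYRENE_CML)
    print(element_counts(atoms))
    print(neighbours(bonds)["a3"])
```

The point of the sketch is simply that, once the atoms and bonds are marked up, a few lines of generic XML handling are enough to recover both the composition and the connectivity of the molecule.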
Neither hard nor scary, is it? And it is a simple way of holding chemical information in a semantically rich format. In future posts I will delve somewhat deeper into the bowels of CML and show you what else it is capable of.