Polymer Informatics and The Semantic Web – The Solution, Part 1: Adding Structure: Chemical Markup Language

In my last post on our polymer informatics work, I started to discuss how one can add structure to documents in the form of metadata, in order to make information retrieval more accurate. In particular, I introduced markup languages as a way of structuring information and used a bread recipe as an example to discuss some general features of XML. So having been through all of that, how can we hold chemical information in a marked-up way?

As chemists, one of the assumptions most fundamentally ingrained in all of our thinking is that the structure of a molecule is related to the physical properties of that molecule. Therefore, the most important information a chemist might wish to hold in a marked-up way is probably structural information about a molecule. Fortunately, over the past decade or so, Peter Murray-Rust, Henry Rzepa and others have worked on an XML dialect called CML, the Chemical Markup Language. Let’s have a look at a small molecule, styrene in our case, and see what some basic CML looks like.

Here’s a representation of styrene (InChI=1/C8H8/c1/h2-7H,1H2) that every chemist will be familiar with:

[Image: structure diagram of styrene]

and here’s how the same molecule would be represented in CML:

[Image: styrene represented in CML]

As was the case for our bread recipe, you can see that we have three containers here, namely “atomArray” and “bondArray”, both enclosed by the container “molecule”. The two arrays are essentially lists: one of atoms (with attributes specifying which element we are talking about, what ID that particular atom has and what its 2D coordinates are) and one of bonds (with attributes telling us which atom IDs the bond connects, what the ID of the bond is and what the bond order is). All of this taken together is what computational chemists call a “connection table”.
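To make the idea of a connection table a little more concrete, here is a minimal sketch in Python that reads a small CML fragment for styrene and turns it into plain lists of atoms and bonds. Some assumptions to flag: the fragment is hand-written for illustration and is not the markup from the figure, the 2D coordinates are made up, hydrogens are left implicit, and the CML namespace declaration is omitted to keep the parsing short. The function names are my own and are not part of any CML library.

```python
import xml.etree.ElementTree as ET

# Hand-written CML fragment for styrene (heavy atoms only).
# Coordinates are illustrative, not taken from a real drawing.
STYRENE_CML = """
<molecule id="styrene">
  <atomArray>
    <atom id="a1" elementType="C" x2="1.21" y2="3.50"/>
    <atom id="a2" elementType="C" x2="0.00" y2="2.80"/>
    <atom id="a3" elementType="C" x2="0.00" y2="1.40"/>
    <atom id="a4" elementType="C" x2="1.21" y2="0.70"/>
    <atom id="a5" elementType="C" x2="1.21" y2="-0.70"/>
    <atom id="a6" elementType="C" x2="0.00" y2="-1.40"/>
    <atom id="a7" elementType="C" x2="-1.21" y2="-0.70"/>
    <atom id="a8" elementType="C" x2="-1.21" y2="0.70"/>
  </atomArray>
  <bondArray>
    <bond id="b1" atomRefs2="a1 a2" order="2"/>
    <bond id="b2" atomRefs2="a2 a3" order="1"/>
    <bond id="b3" atomRefs2="a3 a4" order="2"/>
    <bond id="b4" atomRefs2="a4 a5" order="1"/>
    <bond id="b5" atomRefs2="a5 a6" order="2"/>
    <bond id="b6" atomRefs2="a6 a7" order="1"/>
    <bond id="b7" atomRefs2="a7 a8" order="2"/>
    <bond id="b8" atomRefs2="a8 a3" order="1"/>
  </bondArray>
</molecule>
"""


def read_connection_table(cml_text):
    """Return (atoms, bonds) read from a namespace-free CML string."""
    root = ET.fromstring(cml_text)
    atoms = {}
    for atom in root.find("atomArray"):
        atoms[atom.get("id")] = {
            "element": atom.get("elementType"),
            "xy": (float(atom.get("x2")), float(atom.get("y2"))),
        }
    bonds = []
    for bond in root.find("bondArray"):
        first, second = bond.get("atomRefs2").split()
        bonds.append((bond.get("id"), first, second, int(bond.get("order"))))
    return atoms, bonds


def neighbours(atoms, bonds):
    """Map each atom ID to the sorted IDs of the atoms it is bonded to."""
    table = {atom_id: [] for atom_id in atoms}
    for _, first, second, _ in bonds:
        table[first].append(second)
        table[second].append(first)
    return {atom_id: sorted(ids) for atom_id, ids in table.items()}


def implicit_hydrogens(atoms, bonds):
    """Hydrogens per carbon, assuming a valence of 4 (carbon-only sketch)."""
    used = {atom_id: 0 for atom_id in atoms}
    for _, first, second, order in bonds:
        used[first] += order
        used[second] += order
    return {atom_id: 4 - total for atom_id, total in used.items()}


atoms, bonds = read_connection_table(STYRENE_CML)
adjacency = neighbours(atoms, bonds)
hydrogens = implicit_hydrogens(atoms, bonds)

for atom_id in atoms:
    print(atom_id, atoms[atom_id]["element"], adjacency[atom_id], hydrogens[atom_id])
```

The point of the sketch is only that the atom list and the bond list together are enough to recover which atom is attached to which, and from that the molecular formula C8H8. Nothing here depends on how the molecule is drawn.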

Neither hard nor scary, is it? It is also the simplest way of holding chemical information in a semantically rich format. In future posts I will delve somewhat deeper into the bowels of CML and show you what else it is capable of.
