Polymer Informatics and the Semantic Web – The Problem, Part II – A Common Understanding of Polymers.
March 28, 2007
In my previous post concerning the challenges facing the polymer information scientist, I talked about a general lack of freely available polymer data as well as insufficient curation of that data.
Another reason polymer informatics is still in its infancy is simply that we do not, at the moment, have a shared understanding of what a polymer is and how to represent it. Let me discuss this further.
As I mentioned in my previous blog post, take a simple polymer with the repeat unit -CH2-CH=CH-CH2- (1,4-polybutadiene).
Now, if you were the Chemical Abstracts Service (CAS), you would register this polymer as “1,3-butadiene, homopolymer”. If you were IUPAC, you would accept any of these four names: “polybutadiene”, “poly(but-1-ene-1,4-diyl)”, “1,4-polybutadiene” or “poly(buta-1,3-diene)”.
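To make the consequence concrete, here is a minimal Python sketch. It assumes a hypothetical set of records, each indexed under one of the five names above. Exact string matching finds only one of them; a hand-curated synonym table finds all five, at the price of someone maintaining that table. The record IDs and the canonical key `POLYBUTADIENE` are made up for illustration.

```python
# A minimal sketch: exact-string matching vs. a hand-curated synonym table.
# Record IDs and the canonical key "POLYBUTADIENE" are hypothetical.

# Records as they might arrive from different sources, each using one accepted name.
records = [
    ("rec-1", "1,3-butadiene, homopolymer"),  # CAS-style index name
    ("rec-2", "polybutadiene"),
    ("rec-3", "poly(but-1-ene-1,4-diyl)"),
    ("rec-4", "1,4-polybutadiene"),
    ("rec-5", "poly(buta-1,3-diene)"),
]

# Hand-curated synonym table: every known name maps to one canonical key.
SYNONYMS = {
    "1,3-butadiene, homopolymer": "POLYBUTADIENE",
    "polybutadiene": "POLYBUTADIENE",
    "poly(but-1-ene-1,4-diyl)": "POLYBUTADIENE",
    "1,4-polybutadiene": "POLYBUTADIENE",
    "poly(buta-1,3-diene)": "POLYBUTADIENE",
}


def naive_search(query):
    """Return IDs of records whose name is exactly the query string."""
    return [rid for rid, name in records if name == query]


def synonym_search(query):
    """Return IDs of records whose name maps to the same canonical key as the query."""
    key = SYNONYMS.get(query, query)  # fall back to the raw string if the name is unknown
    return [rid for rid, name in records if SYNONYMS.get(name, name) == key]


print(naive_search("polybutadiene"))    # ['rec-2']
print(synonym_search("polybutadiene"))  # ['rec-1', 'rec-2', 'rec-3', 'rec-4', 'rec-5']
```

The synonym table works, but only for names someone has already thought to enter; any new variant silently falls back to exact matching.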
Historical continuity of the indexing system is also an issue. Take the monomer methyl methacrylate, CH2=C(CH3)-COOCH3: it would be registered as “methacrylic acid, methyl ester” in the 8th Collective Index of the Chemical Abstracts Service, but as “2-propenoic acid, 2-methyl-, methyl ester” in the 9th Collective Index.
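A renaming like this breaks searches across editions unless someone maintains a crosswalk between old and new names. The sketch below assumes hypothetical records tagged with the edition they were indexed in, plus a hypothetical one-entry crosswalk; it expands a query to every linked name before matching.

```python
# A minimal sketch of query expansion across index editions.
# Record IDs, the edition tags and the crosswalk table are hypothetical.

# Each record: (ID, Collective Index edition, name it was registered under).
records = [
    ("rec-A", 8, "methacrylic acid, methyl ester"),
    ("rec-B", 9, "2-propenoic acid, 2-methyl-, methyl ester"),
]

# Crosswalk: 8th Collective Index name -> 9th Collective Index name.
CROSSWALK_8_TO_9 = {
    "methacrylic acid, methyl ester": "2-propenoic acid, 2-methyl-, methyl ester",
}


def equivalent_names(name):
    """Return the query name plus any names linked to it through the crosswalk."""
    names = {name}
    for old, new in CROSSWALK_8_TO_9.items():
        if name in (old, new):
            names.update((old, new))
    return names


def naive_search(name):
    """Match only the exact name, ignoring earlier or later editions."""
    return [rid for rid, _edition, indexed in records if indexed == name]


def search(name):
    """Match the name or any crosswalk equivalent from another edition."""
    names = equivalent_names(name)
    return [rid for rid, _edition, indexed in records if indexed in names]


query = "2-propenoic acid, 2-methyl-, methyl ester"
print(naive_search(query))  # ['rec-B']
print(search(query))        # ['rec-A', 'rec-B']
```

Every future change to the naming rules would need another crosswalk of this kind, which is the continuity burden in a nutshell.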
I can already hear you asking: why not use a chemistry-based representation instead? Fair enough. Traditionally, that means indexing a polymer by the structure of its repeat unit. But consider a polymer whose backbone reads ...-O-CHF-CH2-O-CHF-CH2-...
For a polymer like this, two perfectly good repeat units could be written, namely -O-CHF-CH2- or -O-CH2-CHF-. To a chemist these are identical, but to a machine they are two completely different strings. So with multiple possible repeat units, you are back in the business of rules: you have to fiddle with the alphabetical precedence of atoms, locants, etc. And if you ever change those rules, you create problems of historical continuity, which make searching for and retrieving information harder.
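One way a machine could treat -O-CHF-CH2- and -O-CH2-CHF- as the same thing is to enumerate every equivalent way of writing the repeat unit (each rotation, read in both directions) and keep a single canonical one. The sketch below assumes the backbone is given as a list of atom-group strings and uses plain lexicographic order as the rule; that rule is a stand-in chosen for simplicity, not the IUPAC or CAS seniority rules.

```python
# A minimal sketch: pick one canonical repeat unit from all equivalent writings.
# Assumption: the repeat unit is a list of backbone atom-group strings, and
# "canonical" simply means the lexicographically smallest tuple. This is NOT
# the IUPAC or CAS rule set, only an illustration of the idea.


def all_writings(unit):
    """Yield every rotation of the repeat unit, read in both chain directions."""
    n = len(unit)
    for seq in (list(unit), list(reversed(unit))):
        for i in range(n):
            yield tuple(seq[i:] + seq[:i])


def canonical(unit):
    """Return the lexicographically smallest writing of the repeat unit."""
    return min(all_writings(unit))


a = ["O", "CHF", "CH2"]  # -O-CHF-CH2-
b = ["O", "CH2", "CHF"]  # -O-CH2-CHF-

print(a == b)                        # False: different strings to a machine
print(canonical(a) == canonical(b))  # True: same chain once canonicalized
print("-".join(canonical(a)))        # CH2-CHF-O
```

Note that the canonical form this picks, CH2-CHF-O, is not the one a chemist would write, because the ordering rule is arbitrary. Choosing that rule, and never changing it, is exactly the business of rules described above: swap in a different ordering and every stored key changes.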
In summary, we have not yet satisfactorily solved the representation problem, and we frequently run into problems of historical continuity. Any modern polymer informatics system should aim to overcome both challenges.
I will discuss just how that could be accomplished in my next posts.