ChemAxiom: An Ontology for Chemistry 2. The Set-Up
April 8, 2009 4 Comments
Now that I have introduced at least some of the motivation behind ChemAxiom, let me outline some of the mechanics.
ChemAxiom is a collective term for a set of ontologies, all of which make a start at describing subdomains within chemistry. The ontology modules are independent and self-contained and can (largely) be developed seperately and concurrently. Although they are independent, they are interoperable and integrated via a common upper ontology – in the case of ChemAxiom, we have chosen the Basic Formal Ontology (BFO). I will blog the reasons for this choice in the next post.
The ontologies are currently in various stages of axiomatisation depending on how long we have been working on them and how much we have had a chance to play – so therefore, if there are axioms there that are not and you think there should be, or if you agree/disagree with some of our design decisions, please let us know. In any case, the discussion has already started with some helpful comments over on the Google Group. Let me describe the various modules in greater detail:
The Reasons for Modularity: When developing ontologies, it is always tempting to develop the ueber-McDaddy-ontology-of-everything, because, of course, ontology development is, by definition, never done: we alsways need more than we have – more terms, more axioms etc.. Very quickly, this can result in monstrously large and virtually unmaintainable constructs. Modularisation has, from out perspective, the advantage of (a) smaller and more handlable ontologies, (b) ontologies which are easier to maintain, (c) ontologies which can be developed in parallel or orthogonally and subsequently integrated using either a common upper ontology or mapping/rules etc…..Furthermore, if refactoring of ontologies is necessary during the development process, this is also facilitated by modularity: changes in one module have less chance of affecting changes in another module.
The General Use Case: One of the things we are particularly interested in here in Cambridge, is the extraction of chemical entities and data from text and Peter Corbett’s OSCAR is now fairly well established within the chemical informatics community. Our text sources vary widely, and can range from standard chemical papers to theses, blogs and Wikipedia pages. To give you an impression of the types of data we are talking about, there’s an example Wikipedia’s infobox for benzene (somewhat truncated):
So we have to deal with names, identifiers of various type, physico-chemical property data as well as the corresponding metadata (e.g. measurement pressures, measurement temperatures etc.), and chemical structure (InChI, SMILES). Our ontologies should enable us the generate RDF that allow us to hold this data – the ontology here serves as a schema. While we are interested in reasoning/using reasoners for the purposes of (retrospective) typing (again, I will explain what I mean by that in subsequent blog posts) applying ontologies to the description of chemical data is our first use-case.
With all of that said, let me provide a quick summary of the modules:
Chemistry Domain Ontology – ChemAxiomDomain ChemAxiomDomain is the first module in the set. It is currently a small ontology, which clarifies some fundamental relationships in the chemistry domain. Key concepts in this ontology are “ChemicalElement”, “ChemicalSpecies” and “MolecularEntity” as well as “Role”. ChemAxiomDomain clarifies the relationships between these terms (see my previous blog post) and also deals with identifiers etc. Chemical roles too are important: while chemical entities, may be or act as nucleophiles, acids, solvents etc.. some of the time, they do not have these roles all of the time – roles are realisable entities and and ChemAxiomDomain provides a mechanism for dealing with that. There are few other high-level domain concepts in there at the moment, though obviously we are looking to expand as and when the need arises and use-cases are provided.I will blog some details in a subsequent blog post.
Properties Ontology – ChemAxiomProp. ChemAxiomProp is an ontology of over 150 chemical and materials properties, together with a first set of definitions and symbols (where available and appropriate) and some axioms for typing of properties. Again, details will follow in a subsequent blog post.
Measurement Techniques – ChemAxiomMetrology. This is an ontology of over 200 measurement techniques and also contains a list of instrument parts and axioms for typing of measurement techniques. It does not currently include information about minimum information requirements for measurement techniques (e.g. the measurement of a boiling point also requires a measurement of pressure) and other metadata, but this will be added at a later stage. Again, a detailed blog-post will follow.
ChemAxiomPoly and ChemAxiomPolyClass – These two ontologies contain terms which are in common use across polymer science as well as a taxonomy of polymers based on the composition of their backbone (though the latter is not axiomatised yet). Details will follow in a further blog post.
ChemAxiomMeta – ChemAxiomMeta is a developing ontology, that will allow the specification of provenance of data (e.g. data derived from wiki pages etc.) and will also define what a journal, journal article, thesis, thesis chapter etc is and what the relationships between these entities are. We have not currently released this yet. Details will follow in a further blog post.
ChemAxiomComtinuants – ChemAxionContinuants represents an integration of all the above sub-ontologies into an ontological framework for chemical continuants (with some occurrents mixed in when we need to talk about measurement techniques). Details will follow in a further blog post.
We have also started to work on ontologies of chemical reactions, actions and, as mentioned above, minimum information requirements – however, these are at a relatively early stage of development and hence not released yet.
So much for a short overview over the mechanics of the ontologies. I am sure there are a thousand other things I should have said, but that will have to
do for now. Comments and suggestions via the usual channels. Automatic links and tags, as always, by Zemanta.