Semantic Science

Writing about Science, Informatics, Management and more…

Merry Christmas Everyone


German painting, 1457
Image via Wikipedia

Another year is coming to a close, and it has been nothing short of eventful. It saw the end of one direction of research, the beginning of my existence as a service provider at the EBI and several new strands of research, not to mention moving house and a number of other things.

I have learned a lot about people this year, sometimes more than I wanted to. In particular, I have learned that “trust” is the only thing that allows anyone to manage anything, both in business and in academia. Destroying trust between people, or between people and organisations, causes untold harm in the medium and long term, no matter how expedient it seems at the time.

However, it is Christmas now and the world rests for a few days: time to reflect on 2009 and to look forward to the new year with all its possibilities and challenges.

A very merry Christmas and a happy new year to you all! Thank you for reading the blog, and see you in 2010!


Written by Nico Adams

December 25, 2009 at 7:14 am

Posted in blogging


Almost Christmas….


Christmas is almost upon us and many are at home with their friends and family, looking forward to a few quiet days. Should you, however, not wish to forget about science altogether during this period, have a look at Prof Richard Wiseman’s (University of Hertfordshire) Christmas science experiments.

Written by Nico Adams

December 24, 2009 at 11:41 am

Posted in Uncategorized

Exploring Chemical Space with GDB – Jean-Louis Reymond (University of Bern)


Three molecules.
Image via Wikipedia

(These are live notes from a talk Prof Reymond gave at the EBI today)

The GDB Database

GDB = Generated Database (of Molecules)

The Chemical Universe Project – how many small molecules are possible?

GDB was put together by starting from graphs, in this case hydrocarbon skeletons, and using the GENG software to enumerate all possible graphs (after predefining which graphs are chemically reasonable, incorporating bonding information etc.). Atoms are then placed on these graphs and enumerated, which leads to a combinatorial explosion of compounds; filters are then applied to remove chemical impossibilities. Result: a couple of billion compounds.
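To make the enumerate-then-filter idea concrete, here is a toy sketch in Python. It is not GDB’s code and does not call GENG: it takes one hand-written skeleton, places atoms from an assumed element set (C, N and O, single bonds only), applies two illustrative filters (valence, plus a hypothetical “no heteroatom-heteroatom bond” rule standing in for the real chemical filters) and removes labellings that are identical by symmetry.

```python
from itertools import permutations, product

# Assumed element set with maximum valences; single bonds only (illustrative).
VALENCE = {"C": 4, "N": 3, "O": 2}


def automorphisms(n, edges):
    """All node permutations that map the skeleton's edge set onto itself."""
    edge_set = {frozenset(e) for e in edges}
    return [
        perm
        for perm in permutations(range(n))
        if {frozenset((perm[a], perm[b])) for a, b in edges} == edge_set
    ]


def enumerate_molecules(n, edges, elements=("C", "N", "O")):
    """Place atoms on one skeleton graph, filter, and drop symmetry duplicates."""
    degree = [0] * n
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    autos = automorphisms(n, edges)
    unique = set()
    for labels in product(elements, repeat=n):  # the combinatorial explosion
        # Filter 1: no atom may carry more bonds than its valence allows.
        if any(degree[i] > VALENCE[labels[i]] for i in range(n)):
            continue
        # Filter 2 (hypothetical stand-in for "problematic heteroatom
        # constellations"): reject direct heteroatom-heteroatom bonds, e.g. O-O.
        if any(labels[a] != "C" and labels[b] != "C" for a, b in edges):
            continue
        # Canonical form: smallest labelling over all symmetries of the skeleton.
        unique.add(min(tuple(labels[p[i]] for i in range(n)) for p in autos))
    return sorted(unique)


# Propane-like skeleton: three nodes in a chain.
print(enumerate_molecules(3, [(0, 1), (1, 2)]))
```

Even on a three-atom chain, 27 raw labellings shrink to a handful of distinct structures; with realistic skeletons, more elements and multiple bond orders, the raw count explodes, which is why the filters matter.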

 

Some choices restricting diversity: no allenes, no double bonds at bridgeheads etc., no problematic heteroatom constellations (peroxides, for example, were not considered) and no hydrolytically labile functional groups.

In general, the number of possible molecules increases exponentially with the number of nodes (atoms).

Showed that molecular diversity is greater for linear, open-chain carbon skeletons, since cyclic graphs have fewer substitution possibilities. Chiral compounds offer more diversity than achiral ones.

 

GDB Website

 

Now talking about GDB13:

Removed fluorine, introduced sulphur and filtered out molecules with “too many” heteroatoms, both because of synthetic difficulties and because such molecules may be of less interest to medicinal chemistry.

Now showing a statistical analysis of the molecular types in GDB. 95% of all marketed drugs violate at least two Lipinski rules. All molecules in GDB13 are Lipinski-compliant.
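Lipinski’s rule of five uses four cut-offs: molecular weight ≤ 500 Da, logP ≤ 5, at most 5 hydrogen-bond donors and at most 10 hydrogen-bond acceptors. Counting violations is trivial once descriptors are available; the sketch below assumes they have already been computed elsewhere (the `Descriptors` class is hypothetical, and the aspirin values are approximate and for illustration only).

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed descriptors; deriving them from a structure is out of scope."""
    mol_weight: float  # Da
    logp: float        # octanol/water partition coefficient
    h_donors: int      # e.g. count of OH and NH groups
    h_acceptors: int   # e.g. count of N and O atoms


def lipinski_violations(d: Descriptors) -> int:
    """Count how many of the four rule-of-five criteria are violated."""
    return sum([
        d.mol_weight > 500,
        d.logp > 5,
        d.h_donors > 5,
        d.h_acceptors > 10,
    ])


# Approximate values for aspirin, for illustration only.
aspirin = Descriptors(mol_weight=180.16, logp=1.2, h_donors=1, h_acceptors=4)
print(lipinski_violations(aspirin))
```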

Use case: take a known drug and find its isomers. For aspirin, there are approx. 180 compounds with a Tanimoto similarity > 0.7 to aspirin. Points out that these molecules may never have been imagined by chemists.
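The Tanimoto score mentioned here compares two binary fingerprints as |A ∩ B| / |A ∪ B|. A minimal sketch, assuming fingerprints are represented as sets of “on” bit positions (the talk did not say which fingerprint was used, and the molecule names and bits below are made up):

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def similar_to(query: set, library: dict, threshold: float = 0.7) -> list:
    """(name, score) pairs above the threshold, most similar first."""
    scored = ((name, tanimoto(query, fp)) for name, fp in library.items())
    hits = [(name, score) for name, score in scored if score > threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical fingerprints.
query = {1, 2, 3, 4, 5, 6, 7}
library = {"mol_a": {1, 2, 3, 4, 5, 6, 7, 8}, "mol_b": {1, 2, 3, 9}}
print(similar_to(query, library))
```

Running the same filter over an entire enumerated database is what turns up the “unimagined” neighbours of a known drug.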

 

GDB15 is just out: some bugs have been corrected, enol ethers eliminated (due to their quick hydrolysis) and CPU usage optimized. Approx. 26 billion molecules, 1.4 TB (counting them takes a day).

 

Applications of the Database – mainly GDB11

Use case: Glutamatergic Synapse Binding

Used a Bayesian classifier trained on known actives to retrieve about 11,000 molecules from GDB11. This was followed by high-throughput docking, from which 22 compounds were selected for lab testing. There was an enrichment of glycine-containing compounds. Now showing some activity data for selected compounds.
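The talk only said “Bayesian classifier”. One common choice for fingerprint data is a Laplace-smoothed Bernoulli naive Bayes model, sketched below as an assumption: it is trained on active and inactive fingerprints (sets of bits) and then used to rank database molecules by their log-odds of being active. The class, the candidate names and the toy fingerprints are hypothetical.

```python
import math


class BernoulliNaiveBayes:
    """Laplace-smoothed Bernoulli naive Bayes over binary fingerprint bits."""

    def fit(self, actives, inactives):
        # Bits never seen in training are ignored at scoring time.
        self.vocab = set().union(*actives, *inactives)
        self.log_prior = math.log(len(actives) / len(inactives))
        self.p_active = self._bit_probs(actives)
        self.p_inactive = self._bit_probs(inactives)
        return self

    def _bit_probs(self, fingerprints):
        n = len(fingerprints)
        return {f: (sum(f in fp for fp in fingerprints) + 1) / (n + 2)
                for f in self.vocab}

    def score(self, fp):
        """Log-odds that a fingerprint belongs to the active class."""
        s = self.log_prior
        for f in self.vocab:
            pa, pi = self.p_active[f], self.p_inactive[f]
            if f in fp:
                s += math.log(pa) - math.log(pi)
            else:
                s += math.log(1 - pa) - math.log(1 - pi)
        return s

    def top_k(self, database, k):
        """Names of the k database entries with the highest active log-odds."""
        return sorted(database, key=lambda name: self.score(database[name]),
                      reverse=True)[:k]


model = BernoulliNaiveBayes().fit(actives=[{1, 2}, {1, 3}],
                                  inactives=[{4, 5}, {4, 6}])
database = {"cand_1": {1, 2}, "cand_2": {4, 5}, "cand_3": {1, 4}}
print(model.top_k(database, k=2))
```

In the workflow described, the top-ranked subset would then go on to docking rather than straight to the lab.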

Use case: Glutamate Transporter. Applied certain structural selection criteria to the database molecules to obtain a subset of approx. 250k compounds, again followed by HT docking. Now showing syntheses of some selected candidate structures together with screening data.

 

“Molecular Quantum Numbers”

Classification system for large compound databases. Draws an analogy to the periodic table, which is a classification system for elements; we do not have anything like this for molecules. Define features for molecules: atom types, bond types, polarity, topology… 42 categories in total. Now examines the ZINC database against these features: can show that molecules occupying similar categories share common features. PCA: the first 2 PCs cover 70% of the diversity space, and the first PC includes molecular weight, so 2D representations are considered acceptable. The PCA also shows a nice grouping of molecules by number of cycles.
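The “first 2 PCs cover 70%” statement is an explained-variance ratio. As an illustration only (not the code used in the talk), here is a dependency-free PCA that treats each molecule as a row of descriptor counts, such as the 42 MQN values, finds the top components by power iteration with deflation, and reports the share of total variance each one explains. A real analysis would use a linear-algebra library.

```python
import random


def _matvec(m, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def _normalise(v):
    norm = sum(x * x for x in v) ** 0.5
    return [x / norm for x in v] if norm > 1e-12 else None


def pca(rows, n_components=2, iters=500, seed=0):
    """Top principal components via power iteration with deflation.

    Returns (components, explained_variance_ratios)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(d)]
           for i in range(d)]
    total = sum(cov[i][i] for i in range(d))  # total variance = trace
    rng = random.Random(seed)
    components, ratios = [], []
    for _ in range(n_components):
        v = _normalise([rng.random() + 0.1 for _ in range(d)])
        for _ in range(iters):
            w = _normalise(_matvec(cov, v))
            if w is None:  # no variance left to explain
                break
            v = w
        eigval = sum(a * b for a, b in zip(v, _matvec(cov, v)))  # Rayleigh quotient
        components.append(v)
        ratios.append(eigval / total if total else 0.0)
        # Deflate so the next pass finds the next-largest component.
        cov = [[cov[i][j] - eigval * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    return components, ratios


# Toy data: almost all variance lies along one direction.
rows = [[i, 2 * i, 0.1 if i % 2 == 0 else -0.1] for i in range(10)]
components, ratios = pca(rows)
print(ratios)
```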

Same analysis for GDB11: the first PCs now mainly account for molecular flexibility and polarity (GDB11 does not contain many rings because of its atom limit).

Analysis for PubChem – difficult to discover information at the moment.

Was on the cover of ChemMedChem this November.

Shows examples of fishing out structural analogues for given molecular motifs.
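“Fishing out” analogues in a descriptor space amounts to a nearest-neighbour search over the descriptor vectors. MQN-based searches are commonly done with the city-block (Manhattan) distance; treat that choice as an assumption here. The sketch uses 4-element toy vectors with made-up names, whereas real MQN vectors have 42 entries.

```python
def city_block(a, b):
    """City-block (Manhattan) distance between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def nearest_neighbours(query, library, k=3):
    """Names of the k library entries closest to the query."""
    return sorted(library, key=lambda name: city_block(query, library[name]))[:k]


# Hypothetical, truncated descriptor vectors.
query = [10, 1, 2, 0]
library = {"a": [10, 1, 2, 1], "b": [8, 0, 3, 0], "c": [20, 5, 0, 4]}
print(nearest_neighbours(query, library, k=2))
```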


Written by Nico Adams

December 4, 2009 at 2:03 pm

Semantic Web Tools and Applications for Life Sciences 2009 – A Personal Summary


A bicyclist in Amsterdam, the Netherlands.
Image via Wikipedia

So another SWAT4LS is behind us, this time wonderfully organised by Andrea Splendiani, Scott Marshall, Albert Burger, Adrian Paschke and Paolo Romano.

I have been back home in Cambridge for a couple of days now and have been asking myself whether there was an overall conclusion from the day: some overarching bottom line that one could take away and against which one could measure the talks at SWAT4LS2010 to see whether there has been progress or not. The programme consisted of a great mixture of longer keynotes, papers, “highlight posters” and highlight demonstrations, illustrating a wide range of activities at the interface of semantic web technology, computer science and biomedical research.

Topics at the workshop ranged from the analysis of the relationship between HLA structure variation and disease, applications for maintaining patient records in clinical information systems and patient classification on the basis of semantic image annotations, to the use of semantics in chemo- and proteoinformatics and the prediction of drug-target interactions on the basis of sophisticated text mining, as well as games such as Onto-Frogger (though I must confess that I somehow missed the point of what that was all about).

So what were the take-home messages of the day? Here are a few points that stood out to me:

  • During his keynote, Alan Ruttenberg coined the dictum of “far too many smart people doing data integration”, which was subsequently taken up by a lot of the other speakers – an indication that most people seemed to agree with the notion that we still spend far too much time dealing with the “mechanics” of data – mashing it up and integrating it, rather than analysing and interpreting it.
  • During last year’s conference, it already became evident that a lot of scientific data is now coming online in a semantic form. The data avalanche has certainly continued and the feeling of an increased amount of data availability, at least in the biosciences, has intensified. While chemistry has been lagging behind, data is becoming available here too. On the one hand, there are Egon’s sterling efforts with openmolecules.net and the data solubility project; on the other, there are big commercial entities like the RSC and ChemSpider. During the meeting, Barend Mons also announced that he had struck an agreement with the RSC/ChemSpider to integrate the content of ChemSpider into his Concept Wiki system. I will reserve judgement as to the usefulness and openness of this until it is further along. In any case, data is trickling out – even in chemistry.
  • Another thing that stood out to me – and I could be quite wrong in this interpretation, given that this was very much a research conference – was the fact that there were many proof-of-principle applications and demonstrators on show, but very few production systems that made use of semantic technologies at scale. A notable exception to this was the GoPubMed (and related) system demonstrated by Michael Schroeder, who showed how sophisticated text mining can be used not only to find links between seemingly unrelated concepts in the literature, but also to assist in ontology creation and the prediction of drug-target interactions.

Overall, many good ideas, but, as seems to be the case with all of the semantic web, no killer application as yet – and at every semweb conference I go to we seem to be scrabbling around for one of those. I wonder if there will be one and what it will be.

Thanks to everybody for a good day. It was nice to see some old friends again and make some new ones. Duncan Hull has also written up some notes on the day – so go and read his perspective. I, for one, am looking forward to SWAT4LS2010.


Written by Nico Adams

November 24, 2009 at 1:05 pm

SWAT4LS2009 – Barend Mons: The meta-analysed semantic web, getting rid of ambiguity and redundancy


Introducing Concept Wiki – a semantic wiki and insulting his audience repeatedly.

Problems with getting the community to do annotation:

  • Everybody wants structured data, but nobody wants to do structured data entry. Not working.
  • Everybody likes free text and cut and paste.

Now shows how authoring tools can suggest ontology terms, as a way of introducing structure into unstructured data.
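The talk did not go into how the suggestions are generated. One simple way to implement this kind of term suggestion is a greedy longest-match lookup of ontology labels in the text being written; the miniature ontology, its identifiers and the function below are all hypothetical.

```python
import re

# Hypothetical miniature ontology: lower-case label -> term identifier.
ONTOLOGY = {
    "malaria": "EX:0001",
    "plasmodium falciparum": "EX:0002",
    "chloroquine": "EX:0003",
    "drug resistance": "EX:0004",
}


def suggest_terms(text, ontology=ONTOLOGY, max_words=3):
    """Greedy longest-match lookup of ontology labels in free text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    suggestions, i = [], 0
    while i < len(words):
        # Try the longest phrase starting at word i first.
        for n in range(min(max_words, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in ontology:
                suggestions.append((phrase, ontology[phrase]))
                i += n
                break
        else:  # no label starts here; move on one word
            i += 1
    return suggestions


print(suggest_terms("Chloroquine resistance in Plasmodium falciparum causes malaria relapses"))
```

An authoring tool would show these matches as suggestions for the author to accept or reject, which keeps the free-text workflow people prefer while still capturing structure.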

Now talking about redundancy. Is it a problem? His point:

  • no reviewer would accept the exact same paper twice let alone several times
  • But same assertions are published over and over

    Mentions the deposit of ChemSpider content into Concept Wiki.

  • Oh dear – hopeless confusion between names, people, identifiers etc…..they are all “concepts” according to Barend Mons.
  • The “essence of a nanopublication” is an annotated triple, i.e. an assertion together with metadata about it (provenance, time etc.)
  • Now points out that human language grammar is similar to triples: subject, predicate, object…
  • An assertion should only be accepted if it has value and advances human knowledge. The mind boggles… who decides what is interesting, and when?
  • Triples vs. “smart triples”: apparently, “smart triples” are curated, observed or hypothetical.
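To make the notion concrete: a nanopublication as described here is a triple plus metadata (provenance, time and the curated/observed/hypothetical status of “smart triples”), and the redundancy point becomes visible as soon as identical assertions are grouped. A minimal sketch; the class and field names are hypothetical and do not follow any published nanopublication schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: str


@dataclass(frozen=True)
class Nanopublication:
    """An assertion (triple) plus metadata about it."""
    assertion: Triple
    source: str                # provenance, e.g. a paper identifier
    year: int
    status: str = "observed"   # curated / observed / hypothetical


def redundancy(nanopubs):
    """Map each triple asserted more than once to the sources asserting it."""
    groups = defaultdict(list)
    for pub in nanopubs:
        groups[pub.assertion].append(pub.source)
    return {triple: sources for triple, sources in groups.items() if len(sources) > 1}


t = Triple("geneX", "interacts_with", "proteinY")
pubs = [
    Nanopublication(t, "paper_1", 2005),
    Nanopublication(t, "paper_2", 2008),
    Nanopublication(Triple("drugZ", "treats", "diseaseW"), "paper_3", 2009, "hypothetical"),
]
print(redundancy(pubs))
```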

Now shows some screenshots of use cases.

Written by Nico Adams

November 20, 2009 at 8:05 pm

Posted in Uncategorized

SWAT4LS2009 – A.L. Lamprecht: Semantics-Based Composition of EMBOSS Services with Bio-jETI


Bio-jETI: framework for the model-based, graphical design, execution and management of bioinformatics processes

PROPHETS plugin: visual semantic domain modeling, loose specification within the process model, non-formal specification of constraints using natural-language templates, and automatic generation of model-checking formulae.
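The idea of turning natural-language constraint templates into model-checking formulae can be illustrated with a small lookup table. The template set and the LTL-style syntax below (F = eventually, G = globally) are assumptions for illustration, not PROPHETS’ actual templates; the EMBOSS program names are used purely as labels.

```python
# Hypothetical constraint templates and the temporal-logic formulas they expand to.
TEMPLATES = {
    "use {a}": "F({a})",
    "do not use {a}": "G(!{a})",
    "if {a} is used, {b} must be used afterwards": "G({a} -> F({b}))",
}


def to_formula(template, **services):
    """Instantiate a template with concrete service names."""
    if template not in TEMPLATES:
        raise KeyError(f"unknown constraint template: {template!r}")
    return TEMPLATES[template].format(**services)


def conjunction(formulas):
    """Combine several constraints into one formula for the model checker."""
    return " & ".join(formulas)


constraints = [
    to_formula("use {a}", a="transeq"),
    to_formula("if {a} is used, {b} must be used afterwards", a="seqret", b="transeq"),
]
print(conjunction(constraints))
```

The user only ever picks a template and fills in the blanks; the formal formula is generated behind the scenes.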

Written by Nico Adams

November 20, 2009 at 5:33 pm

Posted in Uncategorized

SWAT4LS2009 – James Eales: Mining Semantic Networks of Bioinformatics eResources from Literature


eResource annotations could help with:

  • making better choices: which resource is best?
  • finding out which resources are available
  • reducing curation effort
  • service discovery

Approach: link bioinformatics resources using semantic descriptors generated by text mining. The head terms of service names can be used to assign services to types, e.g. applications, data sources etc.
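A minimal sketch of head-term typing, assuming the head of an English noun phrase is approximated by its last word (a crude heuristic that fails on phrases like “database of X”) and using a hypothetical head-term-to-type mapping:

```python
# Hypothetical mapping from head terms to resource types.
HEAD_TERM_TYPES = {
    "database": "data source",
    "repository": "data source",
    "tool": "application",
    "server": "application",
    "software": "application",
    "service": "web service",
}


def head_term(phrase):
    """Approximate the head of a noun phrase by its last word."""
    return phrase.lower().split()[-1]


def resource_type(phrase):
    """Assign a resource type from the head term, or 'unknown'."""
    return HEAD_TERM_TYPES.get(head_term(phrase), "unknown")


for name in ["Protein structure database", "multiple sequence alignment tool", "BLAST"]:
    print(name, "->", resource_type(name))
```

A text-mining pipeline would replace the last-word heuristic with a proper parser, but the mapping step stays the same.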


Written by Nico Adams

November 20, 2009 at 5:06 pm

SWAT4LS2009 – Michael Schroeder: Prediction of Drug Target Interactions from Literature by Context Similarity


Typical researcher spends 12.4 hours a week searching for information. Why not use Google? ‘Cause Google is not semantic.

GoPubMed filters PubMed content against all the terms in the Gene Ontology. Simple categorisation for information retrieval can potentially increase the search burden because of compartmentalisation; however, it works the other way round too, allowing useful filtering.

Showing some examples of faceted browsing of PubMed content and systematic drill-down into search results. Not easy to blog, but literature exploration in this way is always fascinating. Examples include the analysis of research trends, networks of collaborators etc. A new tool in GoPubMed also allows the discovery of indirect or inferred links.
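Faceted drill-down of this kind can be sketched as an inverted index from ontology terms to documents: selecting a facet intersects document sets, and the remaining facets are re-counted over the surviving documents. This assumes the abstracts have already been annotated with terms; the document IDs and term annotations below are hypothetical.

```python
from collections import Counter, defaultdict


def build_facet_index(doc_terms):
    """Invert {doc_id: set of ontology terms} into {term: set of doc_ids}."""
    index = defaultdict(set)
    for doc, terms in doc_terms.items():
        for term in terms:
            index[term].add(doc)
    return index


def drill_down(index, all_docs, selected):
    """Documents annotated with every selected term, plus remaining facet counts."""
    docs = set(all_docs)
    for term in selected:
        docs &= index.get(term, set())
    counts = Counter(
        term
        for term, term_docs in index.items() if term not in selected
        for doc in term_docs if doc in docs
    )
    return docs, counts


doc_terms = {
    "pmid_1": {"apoptosis", "mitochondrion"},
    "pmid_2": {"apoptosis", "nucleus"},
    "pmid_3": {"cell cycle", "nucleus"},
}
index = build_facet_index(doc_terms)
docs, counts = drill_down(index, doc_terms, ["apoptosis"])
print(sorted(docs), dict(counts))
```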

Have developed a similar system for the web: GoWeb (works on the top Yahoo search results).

Remarks on ontology generation: have developed a plugin for OBO-Edit. Search for a term and the plugin makes suggestions for terms that might be included in new ontologies, and points out terms in existing ontologies. It also helps with the generation of definitions for terms… wow, this is extremely useful in SO many ways….

Now let’s talk about drugs and targets….

Try to mine for gene mentions in text: find a gene term and then use the context to decide what it is we are talking about. Once a gene has been found, look for statistically significant co-occurrences. The results have been made available in GoGene. Again, one can do bibliometric trend analysis: genes are ranked by community interest.
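The talk did not name the statistic behind “statistically significant co-occurrences”. A common choice, assumed here, is the one-sided hypergeometric test (equivalent to a one-sided Fisher’s exact test): how likely is it to see at least the observed number of documents mentioning both the gene and the term if the two were mentioned independently? `math.comb` needs Python 3.8 or later.

```python
from math import comb


def cooccurrence_p_value(n_docs, n_gene, n_term, n_both):
    """P(X >= n_both) under a hypergeometric null of independent mentions.

    n_docs: documents in the corpus; n_gene / n_term: documents mentioning
    the gene / the term; n_both: documents mentioning both."""
    upper = min(n_gene, n_term)
    total = comb(n_docs, n_term)
    return sum(
        comb(n_gene, k) * comb(n_docs - n_gene, n_term - k)
        for k in range(n_both, upper + 1)
    ) / total


# Hypothetical counts: 40 shared documents where about 1 would be expected by chance.
print(cooccurrence_p_value(1_000_000, 500, 2_000, 40))
```

With millions of abstracts and many gene/term pairs, a multiple-testing correction would also be needed before calling a pair significant.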

From drugs to genes: what is the link between a drug and a gene? This is established using context profiles, i.e. which disease terms are related to a given drug, and from there to genes.
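Linking a drug and a gene by “context similarity” can be sketched as comparing their term profiles. Below, each entity’s profile counts the disease terms found in the documents that mention it, and two profiles are compared with cosine similarity. The weighting and similarity measure actually used in the talk were not described, so both are assumptions, as are the example terms.

```python
import math
from collections import Counter


def context_profile(documents):
    """Term-frequency profile over all documents mentioning an entity."""
    profile = Counter()
    for terms in documents:
        profile.update(terms)
    return profile


def cosine(p, q):
    """Cosine similarity of two sparse term-frequency profiles."""
    dot = sum(w * q[t] for t, w in p.items() if t in q)
    norm = (math.sqrt(sum(w * w for w in p.values()))
            * math.sqrt(sum(w * w for w in q.values())))
    return dot / norm if norm else 0.0


# Hypothetical disease-term annotations of documents.
drug = context_profile([["hypertension", "heart failure"], ["hypertension"]])
gene = context_profile([["hypertension"], ["heart failure", "hypertension"]])
print(cosine(drug, gene))
```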

Gotta stop blogging…enjoying this talk far too much…….


Written by Nico Adams

November 20, 2009 at 4:42 pm

SWAT4LS2009 – Sonja Zillner: Towards the Ontology Based Classification of Lymphoma Patients using Semantic Image Annotation


(Again, these are notes as the talk happens)

This has to do with the Siemens Project Theseus Medico – Semantic Medical Image Understanding (towards flexible and scalable access to medical images)

Images come from many different sources (e.g. X-ray, MRI etc.); the aim is to combine them with treatment plans, patient data etc. and to integrate them with external knowledge sources.

Example clinical query: “Show me the CT scans and records of patients with a lymph node enlargement in the neck area.” At the moment, this requires querying several disjoint systems.

Current Motivation: generic and flexible understanding of images is missing
Final Goal: Enhance medical image annotations by integrating clinical data with images
This talk: introduce a formal classification system for patients (ontological model)

Used Knowledge Sources:

Requirements of the Ontological Model

Now showing an example axiomatisation for the counting and location of lymphatic occurrences, and discusses problems relating to extending existing ontologies….
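The talk did not spell out the staging system, but lymphoma staging is commonly done with the Ann Arbor system, whose core criteria depend on exactly this kind of counting and location: how many lymph node regions are involved, and whether they lie on one or both sides of the diaphragm. The sketch below is a deliberately simplified rule-based version of those criteria (it ignores the A/B, E and S modifiers). It illustrates the logic the axioms would have to capture and is not the speakers’ OWL axiomatisation; the region names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodalRegion:
    name: str
    above_diaphragm: bool


def ann_arbor_stage(regions, disseminated_extranodal=False):
    """Simplified Ann Arbor stage from a set of involved lymph node regions."""
    if disseminated_extranodal:
        return "IV"
    if not regions:
        raise ValueError("no involved lymph node regions given")
    if len(regions) == 1:
        return "I"
    sides = {r.above_diaphragm for r in regions}
    # Two or more regions: same side of the diaphragm -> II, both sides -> III.
    return "II" if len(sides) == 1 else "III"


neck = NodalRegion("cervical (left)", True)
axilla = NodalRegion("axillary (right)", True)
groin = NodalRegion("inguinal (left)", False)
print(ann_arbor_stage({neck}), ann_arbor_stage({neck, axilla}), ann_arbor_stage({neck, groin}))
```

Adding one more annotated finding (say, an inguinal node) can move a patient from stage II to III, which is the kind of reclassification the integrated annotations make possible.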

Now talking about annotating patient records: typical problems are abbreviations, clinical codes, sentence fragments etc., which are difficult for NLP people to deal with….

Now showing a detailed patient example in which applying their classification system led to the patient being reclassified in terms of the staging system.


Written by Nico Adams

November 20, 2009 at 1:59 pm