Project Management by Committee

I am just catching up on Seth Godin’s blog – as usual his posts are short and poignant. This one struck a particular chord:

“Hi, we’re here to take your project to places you didn’t imagine.

With us on board, your project will now take three times as long.

It will cost five times as much.

And we will compromise the art and the vision out of it, we will make it reasonable and safe and boring.”

Great work is never reasonable, safe or boring. Thanks anyway.

Support the Long-Term Future of KEGG

The KEGG database is an invaluable resource for biologists, bioinformaticians, clinical researchers, chemists and others, and it has also been invaluable in some of my own work. KEGG is developed in the laboratory of Minoru Kanehisa, who is now approaching mandatory retirement and is looking to put KEGG on a sustainable footing and give it a viable business model for the future. The following is a complete reproduction of a recent post on the KEGG website (no explicit licence for reuse is provided, so I claim fair use):

Plea to Support KEGG

Since 1995 the KEGG database has been developed in my laboratories (Kanehisa Laboratories) at Kyoto University and the University of Tokyo thanks to funding from the Japanese Ministry of Education and its agencies. Contrary to popular perception, KEGG has never been a public database, as there has never been an official long-term commitment from any government agency. Although I have managed over the years to obtain multiple and overlapping short-term research grants to support KEGG, this has become more difficult now that I am reaching the mandatory retirement age. Foreseeing this eventuality, together with my colleagues, I started a non-profit organization, NPO Bioinformatics Japan, as a vehicle to raise funds for the service that we have been delivering.

For the last ten years our major source of funding has come from the Institute for Bioinformatics Research and Development (BIRD) of the Japan Science and Technology Agency (JST). As of April 1, 2011 BIRD has been converted to the National Bioscience Database Center (NBDC) in JST. The newly established NBDC focuses on the integration of various databases, and does not support the development of individual databases as BIRD did. The good news is that I was awarded a three-year grant from NBDC for integration of KEGG MEDICUS with disease and drug information used in practice and in society. However, the bad news is that this grant is not sufficient to continue to hire my talented crew of KEGG curators and software developers.

KEGG is now one of the most widely used biological databases in the world as indicated by the web access statistics (150 to 200 thousand unique visitors per month) and the number of KEGG paper citations (one thousand per year). I intend to ensure that KEGG remains a freely available web resource. However, this will be possible only with your support. First, I would like to ask all of you who have benefited from KEGG to write, email, tweet, and blog about your support for KEGG. I hope, in the long run, your voices will increase our chances of getting more stable funding. Second, we will continue to ask commercial organizations to obtain a license to use KEGG from Pathway Solutions Inc. I am very grateful to all the companies who have so far supported KEGG by obtaining license agreements. This licensing revenue is fully reinvested to further the development of KEGG. Unfortunately though, this is still insufficient to maintain the high-quality service that we strive to accomplish. Consequently, I would like to introduce the following mechanism.

Starting on July 1, 2011 the KEGG FTP site for academic users will be transferred from GenomeNet at Kyoto University to NPO Bioinformatics Japan, and it will be available only to paid subscribers. The publicly funded portion, the medicus directory, will continue to be freely accessible at GenomeNet. The KEGG FTP site for commercial customers managed by Pathway Solutions will remain unchanged. The new FTP site is available for free trial until the end of June.

Please register to learn more about the KEGG FTP subscription.

Thank you!

Minoru Kanehisa

2011 – The International Year of Chemistry

In their editorial for the January issue (you will need a Nature subscription to access this; alternatively, see the Sceptical Chymist post here), the good folks at Nature Chemistry have reminded us that 2011 is the International Year of Chemistry:

“The United Nations has proclaimed 2011 to be the International Year of Chemistry. Under this banner, chemists should seize the opportunity to highlight the rich history and successes of our subject to a much broader audience — and explain how it can help to solve the global challenges we face today and in the future.”

The year even has a website. On the front page of the site, the UN also singles out two important areas of chemistry, neither of which has chemistry in its name: the development of advanced materials and molecular medicine. I am extremely happy to see this – materials, and polymers in particular, have been a long-standing interest of mine, and some of the immunology work I am currently doing has implications for molecular medicine too.

There are several ways to participate in the Year of Chemistry – one of them is through an essay and video competition: “A World Without Polymers”. Students are asked to make short videos or write essays, trying to imagine what the world would be like without polymers. Furthermore there are networking events, conferences and more all across the world. So go and check out the UN’s site, participate and contribute!

Almost Christmas….

Christmas is almost upon us and many are at home with their friends and family, looking forward to a few quiet days. Should you, however, not wish to forget about science altogether during this period, have a look at Prof Richard Wiseman’s (University of Hertfordshire) Christmas science experiments:

Semantic Web Tools and Applications for Life Sciences 2009 – A Personal Summary

So another SWAT4LS is behind us, this time wonderfully organised by Andrea Splendiani, Scott Marshall, Albert Burger, Adrian Paschke and Paolo Romano.

I have been back home in Cambridge for a couple of days now and have been asking myself whether there was an overall conclusion from the day – some overarching bottom line that one could take away and against which one could measure the talks at SWAT4LS2010 to see whether there has been progress or not. The programme consisted of a great mixture of longer keynotes, papers, “highlight posters” and highlight demonstrations illustrating a wide range of activities at the interface of semantic web technology, computer science and biomedical research.

Topics at the workshop ranged from the analysis of the relationship between HLA structure variation and disease, applications for maintaining patient records in clinical information systems and patient classification on the basis of semantic image annotations to the use of semantics in chemo- and proteoinformatics and the prediction of drug-target interactions on the basis of sophisticated text mining. There were also games such as Onto-Frogger (though I must confess that I somehow missed the point of what that was all about).

So what were the take-home messages of the day? Here are a few points that stood out to me:

  • During his keynote, Alan Ruttenberg coined the dictum of “far too many smart people doing data integration”, which was subsequently taken up by a lot of the other speakers – an indication that most people seemed to agree with the notion that we still spend far too much time dealing with the “mechanics” of data – mashing it up and integrating it, rather than analysing and interpreting it.
  • During last year’s conference, it already became evident that a lot of scientific data is now coming online in a semantic form. The data avalanche has certainly continued and the feeling of increased data availability, at least in the biosciences, has intensified. While chemistry has been lagging behind, data is becoming available here too. On the one hand, there are Egon’s sterling efforts with and the data solubility project; on the other, there are big commercial entities like the RSC and ChemSpider. During the meeting, Barend Mons also announced that he had struck an agreement with the RSC/ChemSpider to integrate the content of ChemSpider into his Concept Wiki system. I will reserve judgement as to the usefulness and openness of this until it is further along. In any case, data is trickling out – even in chemistry.
  • Another thing that stood out to me – and I could be quite wrong in this interpretation, given that this was very much a research conference – was the fact that there were many proof-of-principle applications and demonstrators on show, but very few production systems that made use of semantic technologies at scale. A notable exception to this was the GoPubMed (and related) system demonstrated by Michael Schroeder, who showed how sophisticated text mining can be used not only to find links between seemingly unrelated concepts in the literature, but also to assist in ontology creation and the prediction of drug-target interactions.

Overall, many good ideas, but, as seems to be the case with all of the semantic web, no killer application as yet – and at every semweb conference I go to we seem to be scrabbling around for one of those. I wonder if there will be one and what it will be.

Thanks to everybody for a good day. It was nice to see some old friends again and make some new ones. Duncan Hull has also written up some notes on the day – so go and read his perspective. I, for one, am looking forward to SWAT4LS2010.

SWAT4LS2009 – Barend Mons: The meta-analysed semantic web, getting rid of ambiguity and redundancy

Introducing Concept Wiki – a semantic wiki and insulting his audience repeatedly.

Problems with getting the community to do annotation:

  • Everybody wants structured data, but nobody wants to do structured data entry. Not working.
  • Everybody likes free text and cut and paste.

Now shows suggestion of ontology terms in authoring tools for introduction of structure in unstructured data.

Now talking about redundancy? Is it a problem? His point:

  • no reviewer would accept the exact same paper twice let alone several times
  • But same assertions are published over and over

    Mentions deposit of ChemSpider Content into concept wiki.

  • Oh dear – hopeless confusion between names, people, identifiers etc…..they are all “concepts” according to Barend Mons.
  • The “essence of a nanopublication” is an annotated triple…i.e. an assertion together with metadata about it (provenance, time etc…)
  • Now points out that human language grammar is kind of similar to triples….subject predicate object…
  • An assertion should only be accepted if it has value and advances human knowledge. The mind boggles….who decides what is interesting when….
  • Triples vs “smart triples”: apparently “smart triples” are curated/observed/hypothetical
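
To make the "annotated triple" idea a little more concrete, here is a minimal sketch of how I picture it; it is my own illustration and not anything shown in the talk or taken from Concept Wiki. All class names, field names and the example.org identifiers are hypothetical. It models an assertion as a subject/predicate/object triple, wraps it with provenance, a timestamp and a curated/observed/hypothetical status, and then groups records by assertion so that the redundancy problem (the same assertion published over and over) becomes visible.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Triple:
    """A bare assertion: subject, predicate, object (hypothetical model)."""
    subject: str
    predicate: str
    obj: str


@dataclass(frozen=True)
class Nanopublication:
    """An assertion plus metadata about it (field names are assumptions)."""
    assertion: Triple
    source: str                # provenance: where the assertion was made
    created: datetime          # when it was made
    status: str = "curated"    # one of: curated, observed, hypothetical


def group_by_assertion(pubs):
    """Collect nanopublications that make the same assertion.

    Triples are frozen dataclasses, so equal triples hash alike and can
    be used directly as dictionary keys.
    """
    grouped = {}
    for pub in pubs:
        grouped.setdefault(pub.assertion, []).append(pub)
    return grouped


if __name__ == "__main__":
    # Made-up identifiers, for illustration only.
    claim = Triple("http://example.org/drugA",
                   "http://example.org/inhibits",
                   "http://example.org/proteinB")
    now = datetime.now(timezone.utc)
    pubs = [
        Nanopublication(claim, "paper-1", now, "observed"),
        Nanopublication(claim, "paper-2", now, "curated"),
    ]
    for assertion, records in group_by_assertion(pubs).items():
        print(assertion.subject, assertion.predicate, assertion.obj)
        print("  asserted by:", [r.source for r in records])
```

Keeping the metadata outside the triple means that two papers making the same claim collapse to one assertion with two provenance records, which is roughly how I understood the point about redundancy.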

Now shows some screenshots of use cases.

SWAT4LS2009 – A.L. Lamprecht: Semantics-Based Composition of EMBOSS Services with Bio-jETI

Bio-jETI: framework for model-based graphical design, execution and management of bioinformatics processes

PROPHETS Plugin: visual semantic domain modeling, loose specification within the process model, non-formal specification of constraints using natural language templates, automatic generation of model checking formulae.