The Unilever Centre @ Semantic Technology 2009
April 7, 2009
Well, the programme for the conference is out now and looks even more mind-blowing (in a very good way) than last year. Jim and I will be speaking on Tuesday, 16th June at 14:00. Here are our talk abstracts:
PART I | Lensfield – The Working Scientist’s Linked Data Space Elevator (Jim Downing)
The vision of Open Linked Data in long-tail science (as opposed to Big Science, high energy physics, genomics etc.) is an attractive one, with the possibility of delivering abundant data without the need for massive centralization. In achieving that vision we face a number of practical challenges. The principal challenge is the steep learning curve that scientists face in dealing with URIs, web deployment, RDF, SPARQL etc. Additionally, most software that could generate Linked Data runs off-web, on workstations and internal systems. The result is that the desktop filesystem is likely to remain the arena for the production of data in the near to medium term. Lensfield is a data repository system that works with the filesystem model and abstracts semantic web complexities away from scientists who are unable to deal with them. Lensfield makes it easy for researchers to publish linked data without leaving their familiar working environment. The presentation of this system will include a demonstration of how we have extended Lensfield to produce a Linked Data publication system for small molecule data.
PART II | The Semantic Chemical World Wide Web (Nico Adams)
The development of modern new drugs, new materials and new personal care products requires the confluence of data and ideas from many different scientific disciplines, and enabling scientists to ask questions of heterogeneous data sources is crucial for future innovation and progress. The central science in much of this is chemistry, and therefore the development of a “semantic infrastructure” for this very important vertical is essential and of direct relevance to large industries such as the pharmaceuticals and life sciences, home and personal care and, of course, the classical chemical industry. Such an infrastructure should include a range of technological capabilities, from the representation of molecules and data in semantically rich form to the availability of chemistry domain ontologies and the ability to extract data from unstructured sources.
The talk will discuss the development of markup languages and ontologies for chemicals and materials (data). It will illustrate how ontologies can be used for indexing, faceted search and retrieval of chemical information and for the “axiomatisation” of chemical entities and materials beyond simple notions of chemical structure. The talk will discuss the use of linked data to generate new chemical insight and will provide a brief discussion of the use of entity extraction and natural language processing for the “semantification” of chemical information.
But that’s not all. Lezan has been accepted to present a poster and so she will be there too, showing off her great work on the extraction and semantification of chemical reaction data from the literature. Here is her abstract:
The domain of chemistry is central to a large number of significant industries such as the pharmaceuticals and life sciences industry, the home and personal care industry as well as the “classical” chemical industry. All of these are research-intensive and any innovation is crucially dependent on the ability to connect data from heterogeneous sources: in the pharmaceutical industry, for example, the ability to link data about chemical compounds, with toxicology data, genomic and proteomic data, pathway data etc. is crucial. The availability of a semantic infrastructure for chemistry will be a significant factor for the future success of this industry. Unfortunately, virtually all current chemical knowledge and data is generated in non-semantic form and in many silos, which makes such data integration immensely difficult.
In order to address these issues, the talk will discuss several distinct but related areas, namely chemical information extraction, information/data integration, ontology-aided information retrieval and information visualization. In particular, we demonstrate how chemical data can be retrieved from a range of unstructured sources such as reports, scientific theses and papers or patents. We will discuss how these sources can be processed using ontologies, natural language processing techniques and named-entity recognisers to produce chemical data and knowledge expressed in RDF. We will furthermore show how this information can be searched and indexed. Particular attention will also be paid to data representation and visualisation using topic/topology maps and information lenses. At the end of the talk, attendees should have a detailed awareness of how chemical entities and data can be extracted from unstructured sources and visualised for rapid information discovery and knowledge generation.
It promises to be a great conference and I am sure our minds will go into overdrive when we are there... can’t wait to go! See you there!