Semantic Web Applications and Tools for Life Sciences – Afternoon Session

Tutorial: The W3C Interest Group on Semantic Web Technologies for Health Care and Life Sciences (M.S. Marshall)

“Scientists should be able to work in terms of commonly used concepts.”

The scientist should be able to work in terms of personal concepts and hypotheses (not forced to map concepts to the terms that have been chosen for him).

Otherwise, a general overview of what the interest group does and how it works… the link to the webpage is here.

To participate email team-hcls-chairs@w3.org

Task Forces:

  • Terminology
  • Linking Open Drug Data
  • Scientific Discourse
  • Clinical Observations Interoperability
  • BioRDF – integrated neuroscience knowledge base
  • Other Projects – clinical decision support, URI workshop

Paper in IEEE Software: Software design for empowering scientists

I stopped blogging after this, mainly because my batteries were dry and there was a scramble for the power sockets in the room which I did not wish to participate in. During the meeting some people said they were blogging this and that there was some discussion on FriendFeed… but I can’t find anything much on either. If anybody has a few links, please give me a shout and I will happily link out.


(More) Triples for the World

I have taken a long hiatus from blogging for a number of reasons and still don’t have time to blog much, but something has just happened that has really excited me.
During this year’s International Semantic Web Conference in Karlsruhe (which I am still angry about not being able to attend due to time constraints), it was announced that Freebase now produces RDF!

Now just in case you are wondering what Freebase is, here’s a description from their website:

Freebase, created by Metaweb Technologies, is an open database of the world’s information. It’s built by the community and for the community – free for anyone to query, contribute to, build applications on top of, or integrate into their websites.

Already, Freebase covers millions of topics in hundreds of categories. Drawing from large open data sets like Wikipedia, MusicBrainz, and the SEC archives, it contains structured information on many popular topics, including movies, music, people and locations – all reconciled and freely available via an open API. This information is supplemented by the efforts of a passionate global community of users who are working together to add structured information on everything from philosophy to European railway stations to the chemical properties of common food ingredients.

By structuring the world’s data in this manner, the Freebase community is creating a global resource that will one day allow people and machines everywhere to access information far more easily and quickly than they can today.

And all of this data, they are making available as RDF triples, which you can get via a simple service:

Welcome to the Freebase RDF service.

This service generates views of Freebase Topics following the principles of Linked Data. You can obtain an RDF representation of a Topic by sending a simple GET request to http://rdf.freebase.com/ns/thetopicid, where “thetopicid” is a Freebase identifier with the slashes replaced by dots. For instance, to see “/en/blade_runner” represented in RDF, request http://rdf.freebase.com/ns/en.blade_runner

The /ns end-point will perform content negotiation, redirecting your client to the HTML view of the Topic if HTML is preferred (as it is in standard browsers) or redirecting you to http://rdf.freebase.com/rdf to obtain an RDF representation in N3, RDF/XML or Turtle, depending on the preferences expressed in your client’s HTTP Accept header.

This service will display content in Firefox if you use the Tabulator extension.

If you have questions or comments about the service, please join the Freebase developer mailing list.
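
In practice, getting hold of the triples really is that simple. Here is a minimal sketch in Python (using the requests library, and assuming the service behaves exactly as described above) that asks for the “/en/blade_runner” Topic in Turtle:

```python
# Minimal sketch: fetch a Freebase Topic as RDF via the /ns end-point,
# assuming the service works exactly as described in the welcome text above.
import requests

topic_id = "/en/blade_runner"                 # Freebase identifier
path = topic_id.strip("/").replace("/", ".")  # slashes become dots -> "en.blade_runner"

response = requests.get(
    "http://rdf.freebase.com/ns/" + path,
    headers={"Accept": "text/turtle"},        # content negotiation: ask for Turtle, not HTML
    allow_redirects=True,                     # follow the redirect to the /rdf representation
)
response.raise_for_status()
print(response.text)                          # the RDF description of the Topic
```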

So now there’s DBpedia and Freebase. More triples for the world, more data, more opportunity to move ahead. In chemistry, it’s sometimes so difficult to convince people of the value of open and linked data. This sort of stuff makes me feel that we are making progress. Slowly, but inexorably. And that is exciting.

Twine as a model for repository functionality?

Now although I have not blogged anything for a long time again, I did not mean to write this blog post, as I have some more pressing academic concerns to deal with at the moment. However, the discussion as to what a repository should be or should do for its users is flaring up again, here: http://blog.openwetware.org/scienceintheopen/2008/06/10/the-trouble-with-institutional-repositories/ and here. In particular, Chris Rusbridge asked for ideas about repository functionality, and so I thought I should chime in.

When reading through all of the posts referenced above, the theme of automated metadata discovery is high up on everybody’s agenda, and for good reason: while I DO use our DSpace implementation here in Cambridge and try to submit posters and manuscripts, I feel considerable frustration every time I do so. Having to enter the metadata first (and huge amounts of it) costs me anything from 5 to 10 minutes a pop. Now (ex-)DSpacers tell me that the interface and functionality that make me do this are a consequence of user interaction studies. If that is true, then the mind boggles… but anyway, back to the point.

I have been wondering for a while now whether the darling of the semantic web community, namely Radar Networks’ Twine, could not be a good model for at least some of the functionality that an institutional repo should/could have. It takes a little effort to explain what Twine is, but if you were to press me for the elevator pitch, then it would probably be fair to say that Twine is to interests and content what Facebook is to social relationships and LinkedIn is to professional relationships.

In short, when logging into Twine, I am provided with a sort of workspace which allows me to reposit all sorts of stuff: text documents, PDF documents, bookmarks, videos etc. The Twine Workspace:

I can furthermore organize this content into collections (“Twines”), which can be either public or private:

Once uploaded, all resources get pushed through a natural language processing workflow, which aims to extract metadata from them and subsequently mark the metadata up in a semantically rich form (RDF) using Twine’s own ontologies. Here, for example, is a bookmark for a book on Amazon’s site:

The extracted metadata is shown in a user-friendly way on the right. And here is the RDF that Twine produces as a consequence of metadata extraction from the Amazon page:

So far, the NLP functionality extracts people, places, organisations, events etc. However, Radar Networks have announced that users will be allowed to use their own ontologies come the end of the year. Now I have no idea how this will work technically, but assuming that they can come up with a reasonable implementation of this, things get exciting, as it is then up to users to “customize” their workspace around their interests and to decide on the information they want to see.
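
Just to give a flavour of the general idea, here is a rough, made-up sketch of what such extracted metadata might look like once expressed as triples. The vocabulary below is entirely my own invention for illustration and is not Twine’s actual ontology:

```python
# Illustration only: a hypothetical "Twine-like" graph of metadata extracted
# from a bookmarked Amazon book page. The namespace and property names are
# invented for this example and do not reflect Twine's real vocabulary.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF

EX = Namespace("http://example.org/twine-like/")

g = Graph()
g.bind("ex", EX)

bookmark = URIRef("http://example.org/bookmarks/amazon-book-1")
g.add((bookmark, RDF.type, EX.Bookmark))
g.add((bookmark, EX.source, URIRef("http://www.amazon.com/")))
g.add((bookmark, EX.mentionsPerson, Literal("Tim Berners-Lee")))
g.add((bookmark, EX.mentionsOrganisation, Literal("W3C")))

print(g.serialize(format="turtle"))
```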

On the basis of the extracted metadata, the system will suggest other documents in my own collection or in other public Twines, which might be of interest to me, and I, for one, have already been alerted to a number of interesting documents this way. Again, if Radar’s plans go well, Twine will offer document similarity analyses on the basis of clustering around autumn time.

It doesn’t end here: there is also a social component to the system. On the basis of the metadata extracted from my documents, other users with a similar metadata profile and therefore presumed similar interests will be recommended to me, and I have the opportunity to link up with them.

As I said above, at the moment, Twine is in private beta and so the stuff is hidden behind a password for now. However, if everything goes to plan, Radar plans to take the passwords off the public Twines so that the stuff will be exposed on the web, indexed by Google etc. And once that happens, of course, there are more triples for the world too… which can only be a good thing.

Personally, I am excited about all of this, simply because the potential is huge. Some of my colleagues are less enthusiastic – for all sorts of reasons. For one, the user interface is far from intuitive at the moment and it actually takes a little while to “get” Twine. But once you do, it is very exciting….and I think that a great deal of this functionality could be/should be implemented by institutional repos as well. Oh and what would it mean for data portability/data integration etc. if institutional repos started to expose RDF to the world….?

By the way, I have quite a few Twine invites left – so should anybody want to have a look and play with the system, leave a comment on the blog and I’ll send you an invite!

Semantic Technologies 2008 – Thursday Sessions

Here are the Seesmic Videoblogs of the Thursday Sessions:
(In some feedreaders (e.g. Google) the Seesmic Videoplayers don’t seem to show up properly – so please click through to the actual blog post)

1. The use of SWRL for ontology translation

2. Modeling Objects in RDF

3. Bringing Semantic Technology back to the Business

Semantic Technologies 2008 – Wednesday Sessions

Here are the Seesmic Videoblogs of the Wednesday Sessions:
(In some feedreaders (e.g. Google) the Seesmic Videoplayers don’t seem to show up properly – so please click through to the actual blog post)

1. Impressions from the Conference Halls

2. The Semantic Web as a Blue Ocean Opportunity

3. Advanced Topics in OWL Ontology Development

4. The Calais Webservice

5. Collaborative Protege

6. The Fellowship of the Semantic Web

7. The Rising Stars of the Semantic Web (Panel)

Semantic Technologies 2008

…I have just gotten an invite for Seesmic and am quite excited to play with video blogging. So here’s the first video…and the questions on there also go to the readers of this blog:

So leave a comment on Seesmic or on the blog here…looking forward to your answers!

What is a document?

The following appeared on Richard Cyganiak’s blog recently and is reproduced here verbatim. I normally hate re-blogging other people’s content – however, this one is so funny, it deserves further attention:

QOTD: timbl is a document

Simon Spero on the SKOS list:

The meaning of “document” in this context is extremely broad; if we follow Otlet’s definition of a document as anything which can convey information to an observer, the term would seem to cover anything which can have a subject.

By this standard, timbl is a document, but only when someone’s looking.

Ah, the Semantic Web community! Please leave your common sense at the door …



And Amen to that!

Yahoo! has! announced! support! for! semantic! web! standards!

Well, this blog has remained dormant for far too long as I got distracted by the “real world” (i.e. papers, presentations and grant proposals – not the real real world) after Christmas.

But I can’t think of a better way to start blogging again than to report that Yahoo! has just announced their support of semantic web technologies. To quote from their search blog:

The Data Web in Action
While there has been remarkable progress made toward understanding the semantics of web content, the benefits of a data web have not reached the mainstream consumer. Without a killer semantic web app for consumers, site owners have been reluctant to support standards like RDF, or even microformats. We believe that app can be web search.

By supporting semantic web standards, Yahoo! Search and site owners can bring a far richer and more useful search experience to consumers. For example, by marking up its profile pages with microformats, LinkedIn can allow Yahoo! Search and others to understand the semantic content and the relationships of the many components of its site. With a richer understanding of LinkedIn’s structured data included in our index, we will be able to present users with more compelling and useful search results for their site. The benefit to LinkedIn is, of course, increased traffic quality and quantity from sites like Yahoo! Search that utilize its structured data.

In the coming weeks, we’ll be releasing more detailed specifications that will describe our support of semantic web standards. Initially, we plan to support a number of microformats, including hCard, hCalendar, hReview, hAtom, and XFN. Yahoo! Search will work with the web community to evolve the vocabulary framework for embedding structured data. For starters, we plan to support vocabulary components from Dublin Core, Creative Commons, FOAF, GeoRSS, MediaRSS, and others based on feedback. And, we will support RDFa and eRDF markup to embed these into existing HTML pages. Finally, we are announcing support for the OpenSearch specification, with extensions for structured queries to deep web data sources.

We believe that our open approach will let each of these formats evolve within their own passionate communities, while providing the necessary incentive to site owners (increased traffic from search) for more widespread adoption. Site owners interested in learning more about the open search platform can sign up here.

I have had many discussions with people over the past year or so concerning the value of the semantic web approach, and some of the people I talked to have been very vocal sceptics. However, the results of some of our work in Cambridge, together with the fact that no matter where I look the semantic web is becoming more prominent, have convinced me that we are doing the right thing. It was pleasing to see that the Semantikers were out in force recently, even at events such as CeBIT, which has just ended. Twine seems to be inching towards a public beta now, and the first reviews of the closed beta are being written, even if they are mixed at the moment. Reuters recently released Calais as a web service, which uses natural language processing to identify people, places, organizations and facts and makes the metadata available as RDF constructs. So despite all the scepticism, semantic web products are starting to be shipped, and even the mainstream media are picking up the idea of the semantic web, with the usual insouciance, but it nevertheless seems to be judged newsworthy. Academic efforts are coming to fruition too.

Maybe I am gushing a little now. But it seems to me that Yahoo! now lending its support could be a significant step forward for all of us. And sometimes it is just nice to know that one is doing the right thing.

Ontology Development Methodologies – Uschold and King

Ontology Development Methodologies take two.

Pictorially, Uschold and King’s methodology can be summarized as follows:

[Figure: Uschold and King’s ontology development methodology]

As can be seen, the process can be broken down into a number of discrete steps: identification of the ontology’s purpose, ontology capture, ontology coding, integration of existing ontologies, ontology evaluation and ontology documentation.

Let’s look at each of these steps in turn:

Identification of the Ontology’s Purpose

All methodologies for the development of ontologies that will be surveyed in this series of blog posts agree that the definition of the purpose is of vital importance, as it essentially helps to define the scope and granularity of the ontology. However, this is where agreement usually ends, and while Grueninger and Fox are relatively prescriptive about how this could be achieved, Uschold and King essentially leave the purpose and scope definition up to the ontological engineer. They do, however, discuss a number of purposes which have been reported in the literature and thus provide some criteria which an ontological engineer could consider when attempting to formulate a “mission statement” for his ontology. These are:

  • definition of a vocabulary?
  • meta-level specification of a logical theory?
  • ontology primarily intended for use by a small group?
  • re-use by a large community?
  • ontology as a means to structure a knowledge base?
  • ontology as part of a knowledge base?

Ontology Capture

Uschold and King define ontology capture as a process which can again be broken down into a number of smaller steps:

  • identification of the key concepts and relationships in the domain of interest
  • production of precise, unambiguous text definitions for such concepts and relationships
  • identification of terms to refer to such concepts and relationships
  • achieving community agreement on all of the above

For the initial stages of the ontology capture process, Uschold and King recommend a brainstorming phase, which should produce all relevant concepts and relationships the ontology should contain. At this stage, concepts are represented by terms (labels) which may hide differences in interpretation and understanding of fundamental concepts. Furthermore, the authors point out that, while, in their experience, brainstorming works well, it may have to be supplemented with other sources of information if domain expertise is required.

In a second step, the identified concepts should be arranged into “work areas corresponding to naturally arising sub-groups”. To decide whether a term should be included in or excluded from a grouping and the ontology in general, reference should be made to the requirements specification of the ontology. The authors thus underline again the vital importance of the availability of such a document. They furthermore recommend that inclusion or exclusion decisions be documented for future reference. Finally, it is recommended that “semantic cross-references” be identified which link concepts in one group to those of another group.

In a third step of the capture process, Uschold and King recommend the identification of meta-ontologies which may be suitable for the particular domain ontology to be constructed, without, at this stage, making a firm ontological commitment. They recommend that a consideration of the concepts in the domain ontology and their interrelationships guide the choice of a meta-ontology.

In a fourth step, precise definitions of all terms and concepts in the ontology should be produced. For this purpose, the authors recommend that definitions for concepts which have a maximum semantic overlap between work areas be produced first, as these are likely to be the most important concepts and it is important to get their definitions right in the first instance. Furthermore, Uschold and King advocate focusing initially on the definition of cognitively basic terms as opposed to more abstract ones, arguing that this should facilitate the process of relating terms in different areas. To develop the definitions, Uschold and King recommend that precise natural language text definitions of all terms be produced, while taking great care to ensure consistency with other terms which are already in use. Furthermore, the introduction of new terms is to be avoided at this stage. The provision of examples is considered to be helpful.

Possible guidelines for dealing with ambiguous or hard terms are also provided. These are:

  • Do not use the term if it is too ambiguous to define and consensus cannot be reached.
  • Before attempting a definition, clarify the underlying idea. This can, for example, be done by consulting dictionaries and trying to avoid technical terms as much as possible.
  • To avoid getting hung up on concept labels, give terms meaningless labels such as x1, x2, x3 and then attempt the definition of the underlying idea.
  • Once a definition/clarification has been achieved, attach a term to the concept which, if possible, avoids the previously ambiguous term.

Once this has been done, an ontological commitment to a meta-ontology should be made, which will support the following coding stage.

Ontology Coding

Ontology coding denotes the representation of the conceptualization which has been developed during the capture phase in a formal language. In essence, this involves three sub-stages: a commitment to a meta-ontology, the choice of a formal ontology language and the coding itself. Furthermore, existing ontologies which are to be re-used should be integrated during the coding stage. In general, the merging of two different ontologies is no trivial task, but it can be facilitated if a meta-ontology is available across all the ontologies used, providing some sort of “schema definition”.
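
To make the coding step a little more concrete, here is a minimal sketch (in Python with rdflib, using an invented example domain of my own) of what writing two captured concepts and a relationship between them down in a formal language might look like:

```python
# Sketch of the coding step: two captured concepts, their text definitions,
# and a semantic cross-reference between them, written down as RDFS/OWL.
# The domain and all names are invented for illustration.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import OWL, RDF, RDFS

EX = Namespace("http://example.org/lab-ontology/")

g = Graph()
g.bind("ex", EX)

g.add((EX.Experiment, RDF.type, OWL.Class))
g.add((EX.Experiment, RDFS.comment,
       Literal("A planned procedure carried out to test a hypothesis.")))

g.add((EX.Sample, RDF.type, OWL.Class))
g.add((EX.Sample, RDFS.comment,
       Literal("A portion of material that is the subject of an experiment.")))

# Relationship identified during capture: an experiment uses a sample.
g.add((EX.uses, RDF.type, OWL.ObjectProperty))
g.add((EX.uses, RDFS.domain, EX.Experiment))
g.add((EX.uses, RDFS.range, EX.Sample))

print(g.serialize(format="turtle"))
```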

Ontology Evaluation

Ontology evaluation is in essence a technical judgement of the performance of an ontology with respect to its requirements specification, its ability to answer questions and its associated software environment.
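
As a small, hedged illustration of the “ability to answer questions” part: one common approach is to pose a competency-style question against the coded ontology as a SPARQL ASK query. The fragment below reuses the invented example from the coding sketch above:

```python
# Illustration: check whether the (invented) ontology fragment can answer a
# simple competency-style question, posed as a SPARQL ASK query.
from rdflib import Graph

TURTLE = """
@prefix ex:   <http://example.org/lab-ontology/> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

ex:Experiment a owl:Class .
ex:Sample     a owl:Class .
ex:uses       a owl:ObjectProperty ;
              rdfs:domain ex:Experiment ;
              rdfs:range  ex:Sample .
"""

g = Graph().parse(data=TURTLE, format="turtle")

# Competency question: does the ontology tell us what an experiment uses?
result = g.query("""
    PREFIX ex:   <http://example.org/lab-ontology/>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    ASK { ex:uses rdfs:domain ex:Experiment ; rdfs:range ex:Sample . }
""")
print("Competency question answerable:", result.askAnswer)
```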

Ontology Documentation

We have already alluded to the fact that decisions concerning the inclusion or exclusion of terms from the ontology should be documented. Furthermore, Uschold and King point out that a large obstacle to efficient ontology sharing is the absence of documentation or inadequate documentation.

The Semantic Web of Data

Peter has already blogged that Paul Miller visited us yesterday and gave an excellent talk on evolving the Web from a Web of Documents to a semantic web of data. He does so without any jargon and using beautiful slides, and his talk is a wonderful explanation of the philosophy we have taken in polymer informatics.

In short, it is stuff that everyone should know about and certainly scientists – it will change the way in which we report, analyze and distribute data and scientific information forever. Paul has kindly allowed us to video his talk and to upload it to Google. Treat yourself and watch Paul’s talk here: