Although I have not blogged anything for a long time (again), I did not mean to write this post just now, as I have more pressing academic concerns to deal with at the moment. However, the discussion as to what a repository should be, or should do for its users, is flaring up again here, http://blog.openwetware.org/scienceintheopen/2008/06/10/the-trouble-with-institutional-repositories/ and here. In particular, Chris Rusbridge asked for ideas about repository functionality, so I thought I should chime in.
When reading through all of the posts referenced above, the theme of automated metadata discovery is high up on everybody’s agenda, and for good reason: while I DO use our DSpace implementation here in Cambridge and try to submit posters and manuscripts, I feel considerable frustration every time I do so. Having to enter the metadata first (and huge amounts of it) costs me anything from 5 to 10 minutes a pop. Now, (ex-)DSpacers tell me that the interface and functionality that make me do this are a consequence of user interaction studies. If that is true, then the mind boggles… but anyway, back to the point.
I have been wondering for a while now whether the darling of the semantic web community, namely Radar Networks’ Twine, could not be a good model for at least some of the functionality that an institutional repo should or could have. It takes a little effort to explain what Twine is, but if you were to press me for the elevator pitch, it would probably be fair to say that Twine is to interests and content what Facebook is to social relationships and LinkedIn is to professional relationships.
In short, when logging into Twine, I am provided with a sort of workspace, which allows me to deposit all sorts of stuff: text documents, PDF documents, bookmarks, videos etc. The Twine workspace:
I can furthermore organize this content into collections (“Twines”), which can be either public or private:
Once uploaded, all resources get pushed through a natural language processing workflow, which aims to extract metadata from them and subsequently marks that metadata up in a semantically rich form (RDF) using Twine’s own ontologies. Here, for example, is a bookmark for a book on Amazon’s site:
The extracted metadata is shown in a user-friendly way on the right. And here is the RDF that Twine produces as a consequence of metadata extraction from the Amazon page:
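Twine’s own ontologies are not publicly documented, so purely as an illustration, here is a minimal Python sketch of what serialising extracted book metadata as RDF (in Turtle syntax) might look like, with Dublin Core terms standing in for Twine’s vocabularies. The subject URL, property choices and values are all invented for this example.

```python
# Hypothetical sketch: turning extracted metadata into Turtle-syntax RDF.
# Dublin Core terms stand in for Twine's own (undocumented) ontologies;
# the subject URI and all values are invented for illustration.

metadata = {
    "dcterms:title": "An Example Book",
    "dcterms:creator": "A. N. Author",
    "dcterms:publisher": "Example Press",
    "dcterms:date": "2008",
}

def to_turtle(subject, props):
    """Serialise a flat dict of predicate -> literal as a Turtle document."""
    lines = ["@prefix dcterms: <http://purl.org/dc/terms/> ."]
    # A predicate-object list: predicates separated by " ;", one per line.
    body = " ;\n    ".join(f'{p} "{v}"' for p, v in props.items())
    lines.append(f"<{subject}> {body} .")
    return "\n".join(lines)

print(to_turtle("http://example.org/book/1", metadata))
```

The point is simply that once metadata lives in a form like this, rather than in free-text submission fields, any RDF-aware tool can consume it.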
So far, the NLP functionality extracts people, places, organisations, events etc. However, Radar Networks have announced that users will be able to use their own ontologies by the end of the year. Now, I have no idea how this will work technically, but assuming that they can come up with a reasonable implementation, things get exciting, as it will then be up to users to “customize” their workspaces around their interests and to decide on the information they want to see.
On the basis of the extracted metadata, the system suggests other documents, in my own collection or in other public Twines, which might be of interest to me, and I, for one, have already been alerted to a number of interesting documents this way. Again, if Radar’s plans go well, Twine will offer clustering-based document similarity analyses around autumn.
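Twine has not said how its suggestions work internally, but one simple way to rank “related documents” from extracted metadata is overlap between entity sets, for instance Jaccard similarity. The sketch below assumes invented documents and entity sets; only the general technique, not Twine’s actual algorithm, is being shown.

```python
# Hypothetical sketch: ranking related documents by overlap of the entities
# (people, places, organisations) that an NLP step extracted from each one.
# Document names and entity sets are invented for illustration.

def jaccard(a, b):
    """Similarity of two entity sets: |intersection| / |union|."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

docs = {
    "my-poster": {"Cambridge", "DSpace", "RDF"},
    "doc-a": {"Cambridge", "RDF", "OWL"},
    "doc-b": {"Paris", "Proteomics"},
}

query = docs["my-poster"]
ranked = sorted(
    ((name, jaccard(query, ents))
     for name, ents in docs.items() if name != "my-poster"),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, score in ranked:
    print(f"{name}: {score:.2f}")
```

The same metadata-profile overlap could just as well be computed between users rather than documents, which is presumably how the social recommendations described below could work.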
It doesn’t end there: there is also a social component to the system. On the basis of the metadata extracted from my documents, other users with a similar metadata profile, and therefore presumed similar interests, are recommended to me, and I have the opportunity to link up with them.
As I said above, Twine is in private beta at the moment, so everything is hidden behind a password for now. However, if all goes to plan, Radar will take the passwords off the public Twines so that their contents are exposed on the web, indexed by Google etc. And once that happens, of course, there are more triples for the world too… which can only be a good thing.
Personally, I am excited about all of this, simply because the potential is huge. Some of my colleagues are less enthusiastic – for all sorts of reasons. For one, the user interface is far from intuitive at the moment and it actually takes a little while to “get” Twine. But once you do, it is very exciting….and I think that a great deal of this functionality could be/should be implemented by institutional repos as well. Oh and what would it mean for data portability/data integration etc. if institutional repos started to expose RDF to the world….?
By the way, I have quite a few Twine invites left – so should anybody want to have a look and play with the system, leave a comment on the blog and I’ll send you an invite!