Polymer Theses, Polymer Data and a Common Language.

I am currently at the European Science Foundation’s first summer school on Nanomedicine in Cardiff, where I was invited to present some of the work in polymer informatics which we are doing in Cambridge. The summer school is a wonderful event with approximately 180 attendees: the majority are PhD students, but there are also a few undergraduates and a significant number of tenured faculty. The attendees come from a number of disciplines, such as chemistry, biology, physics, medicine and ethics. Bringing people together in this way to talk about a field of research which sits entirely at the interface between disciplines is the only sustainable way forward.
An awful lot of people were very impressed by the work we do and by our approach to data and knowledge management, and many of the PhD students I spoke to were enthused by the potential power that informatics can bring to their research. They also appreciated the need for well-curated data that is freely available and not copyrighted by publishers etc. With so many PhD students here talking to each other freely about their research, getting to know each other and appreciating each other’s science, it seemed to me that there is a real chance to build a community that exchanges data and information in order to communally advance a field of research.

While the summer school was very multidisciplinary, there was a predominance of people interested in the use of polymers for all sorts of different applications – not least for applications in drug and gene delivery.
People working in polymer therapeutics are quite often “jacks of all trades”: not only are they chemists who know how to synthesize and purify polymers, but, to a certain extent at least, they also have to be physical chemists, biologists, formulators etc. So the polymer pharmaceuticals community produces very rich and diverse datasets, and the data they create is usually of general importance.
An important property of polymers in medical applications, for example, is solubility. So quite often, people working in polymer pharmaceuticals will determine phase diagrams for polymers. And as there is a lot of interest in stimulus-responsive polymers, these diagrams are not just measured in pure water, but also in the presence of different ions and at different pH values. Researchers might also be interested in the dimensions of the polymer chain under all of those conditions, so light or X-ray scattering studies are carried out. And that is just on the pure polymer! Conjugation of a drug or gene to the pure material changes the game completely, and so all of these measurements potentially get carried out again.

Once we are done with the physicochemical characterisation, we then go on to characterise the biological properties of the polymers we have synthesized: we are interested in their toxicology, their biodistribution, their specificity etc. That, too, generates an awful lot of data which is potentially related to the structure of the polymers we are dealing with.

And as I said before, it is not only other pharmaceutical people who are interested in this sort of data. A lot of polymer chemists in general, as well as companies, should in principle be very interested in this type of data: polymers are present in most modern household and cleaning products (check the labels of your shampoo and washing powder bottles).

Therefore it seems to me that we have a rich source of polymer-related data here that we should attempt to harvest. The initial enthusiasm that I have encountered at the summer school leads me to think that maybe we have an opportunity to work with the polymer pharmaceutics/nanomedicine research community to build up, at least in the long term, a valuable polymer knowledge base. Now, I am aware of the fact that this community in particular is very conscious of patents and intellectual property, and we have mechanisms to ensure that these considerations can be taken into account and accommodated. How could we get hold of this data?
Over on his blog, Peter has pointed out that a viable way would be to capture digital theses in repositories, which would not only allow the thesis to be preserved, but would undoubtedly also help with dissemination and intelligent data mining. Furthermore, it would be a way to prevent publishers from copyrighting scientific data.

All of this said, the potential goes much further than this. I have already mentioned the strongly interdisciplinary nature of the summer school. Now, in our work here in Cambridge, we use semantic web technologies to hold information about polymers: we have developed an XML-based polymer markup language and are working on ontologies which codify polymer knowledge. One of the conclusions of my talk was that biologists and medics use exactly the same technologies to communicate their data and knowledge, and so here, for the first time, we have an opportunity to bring knowledge from disparate disciplines together and map it onto each other. In that way, we should be able to develop a joint language through which we and our information systems can understand each other, and that should allow us to ask new questions. Peter has already demonstrated what is possible when a thesis can be turned into RDF.
And theses originating in a strongly interdisciplinary field of research could be a wonderful starting point.
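
To give a rough flavour of what "asking new questions" could look like once thesis data is held as RDF-style statements, here is a minimal sketch in plain Python. It is not our actual markup language or ontology: every identifier in it (the `ex:` names, the predicates, the two polymers and theses) is made up purely for illustration, and a real system would use a proper RDF store and query language rather than a list of tuples.

```python
# Each fact is a (subject, predicate, object) triple, the basic shape of RDF.
# All identifiers and values here are hypothetical, for illustration only.
TRIPLES = [
    ("ex:polymerA", "rdf:type", "ex:Polymer"),
    ("ex:polymerA", "ex:describedIn", "ex:thesis1"),
    ("ex:polymerA", "ex:solubleIn", "ex:water"),
    ("ex:polymerB", "rdf:type", "ex:Polymer"),
    ("ex:polymerB", "ex:describedIn", "ex:thesis2"),
    ("ex:polymerB", "ex:solubleIn", "ex:water"),
    ("ex:polymerB", "ex:testedIn", "ex:cytotoxicityAssay"),
]


def match(triples, s=None, p=None, o=None):
    """Return all triples fitting the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def subjects(triples, p, o):
    """Return the set of subjects that have predicate p with object o."""
    return {t[0] for t in match(triples, p=p, o=o)}


def water_soluble_and_assayed(triples):
    """Combine a physicochemical fact with a biological one.

    Finds polymers that are water soluble AND have been through a
    cytotoxicity assay, and maps each one to the theses describing it.
    """
    soluble = subjects(triples, "ex:solubleIn", "ex:water")
    assayed = subjects(triples, "ex:testedIn", "ex:cytotoxicityAssay")
    return {
        polymer: sorted(t[2] for t in match(triples, s=polymer, p="ex:describedIn"))
        for polymer in sorted(soluble & assayed)
    }


print(water_soluble_and_assayed(TRIPLES))
# prints {'ex:polymerB': ['ex:thesis2']}
```

The point of the sketch is only that the solubility statement and the assay statement could come from different people, or even different disciplines, and still be combined in one question, provided both sides agree on what the identifiers mean.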

So, dear polymer science/polymer pharmaceuticals community, how about it? If you are interested not only in preserving and disseminating your data (after patenting etc.), but also in being able to ask new questions of it and in bringing multiple disciplines together, then give us your theses and let us work with you to show you how all this can be achieved. Here’s an offer – please take us up on it.

