PDFs enhanced with XMP

02 Aug 2010

By Colin Batchelor, Senior Informatics Analyst.

Our readers still read most of our articles on the web as PDFs rather than HTML, so we thought we’d experiment with making some of our award-winning Prospect markup available through PDFs as well as through HTML.

Our first experiment is with XMP, a format which has hitherto mainly been used for metadata in photographs. We’re including compound data as InChIs, specifically pointers to the RSC InChI resolver, and incorporating other entities of interest with reference to OBO and RSC ontologies.

Examples, and instructions for viewing what we’ve included using an ordinary PDF viewer, are available here: http://www.rsc.org/Publishing/Journals/ProjectProspect/Examples.asp

The embedded XMP metadata isn’t really intended to be read directly by human beings; we anticipate that it will be picked up and indexed by search engines or desktop search, and that people will use Adobe’s SDK to extract the data into a triplestore where it can be reasoned over.
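
For readers who want a feel for what that extraction step involves, here is a minimal sketch in Python. It does not use Adobe’s SDK; instead it assumes the XMP packet is stored uncompressed in the PDF, so that it can be found by scanning the raw bytes for the standard `<?xpacket … ?>` wrapper. The function names are ours, for illustration only, and the flattening into (subject, predicate, object) tuples is deliberately simplistic: it handles only plain attributes, `rdf:resource` pointers and literal text, not the full RDF/XML syntax. A real pipeline would hand the packet to a proper RDF parser before loading a triplestore.

```python
import re
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# An XMP packet is wrapped in <?xpacket begin=...?> ... <?xpacket end=...?>.
# Assumption: the metadata stream is not compressed or encrypted.
XPACKET_RE = re.compile(
    rb"<\?xpacket begin=.*?\?>(.*?)<\?xpacket end=.*?\?>", re.DOTALL
)


def extract_xmp_packets(pdf_bytes):
    """Return the body of every XMP packet found in the raw PDF bytes."""
    return [
        match.group(1).decode("utf-8", errors="replace").strip()
        for match in XPACKET_RE.finditer(pdf_bytes)
    ]


def simple_triples(xmp_xml):
    """Flatten rdf:Description elements into (subject, predicate, object) tuples.

    Only simple properties are handled: non-RDF attributes, child elements
    carrying rdf:resource, and child elements with literal text.
    """
    root = ET.fromstring(xmp_xml)
    rdf_prefix = "{" + RDF_NS + "}"
    triples = []
    for desc in root.iter(rdf_prefix + "Description"):
        subject = desc.get(rdf_prefix + "about", "")
        # Properties written as attributes on rdf:Description.
        for name, value in desc.attrib.items():
            if not name.startswith(rdf_prefix):
                triples.append((subject, name, value))
        # Properties written as child elements.
        for child in desc:
            resource = child.get(rdf_prefix + "resource")
            text = (child.text or "").strip()
            if resource is not None:
                triples.append((subject, child.tag, resource))
            elif text:
                triples.append((subject, child.tag, text))
    return triples
```

To try it on one of the example PDFs, read the file in binary mode, for instance `open("article.pdf", "rb").read()` (the filename is a placeholder), pass the bytes to `extract_xmp_packets`, and feed each packet to `simple_triples`. Predicates come back in ElementTree’s `{namespace}localname` form, which is easy to turn into full URIs.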

We should also acknowledge that Omer Casher and Henry Rzepa at Imperial College London were experimenting with XMP back in 2006, and that NPG’s Tony Hammond has been blogging extensively on this subject on the CrossTech blog.

More experiments soon, but do let us know what you think in the comments below!

RSC Publishing Innovation
