Author Archive

Southampton University internships to transfer thesis data into LabTrove and ChemSpider

Written by Aileen Day.

This summer a number of students from the University of Southampton have been doing internships on joint projects between the university and the Royal Society of Chemistry and ChemSpider. Three of these students have been sifting through theses from past members of Richard Whitby’s research group in order to extract the compound, spectra and reaction data in them (and in the linked lab notebooks and archive spectra files) and share these in LabTrove, ChemSpider, and ChemSpider SyntheticPages (CSSP). The students – Alex Hartke, Yet Wai Lee and Josh Whittam (all 2nd year undergraduates) – are shown below together with the boxes of thesis data, lab notebooks and spectra printouts that they digitised.

Southampton University interns

Between them they digitised 7 theses, by A. Henderson, L. Sayer, D. Owen, D. Macfarlane, F. Giustiniano, G. Saluste and J. Stec, which resulted in 1035 LabTrove pages being published to the Whitby Group’s LabTrove blog.

The theses were a rich source of compound information – including compound structures, names, properties and spectra, all of which were also deposited into ChemSpider resulting in 208 new compound pages, and about 600 spectra.

For this project the students manually deposited the compound information into LabTrove and then deposited the compounds and spectra to ChemSpider. However, we are currently developing a range of ChemSpider jQuery widgets which can be integrated into web-based electronic lab notebooks (ELNs) such as LabTrove. These will make it easier to enter compound information from ChemSpider into experiments, and also to publish compound and reaction data from the ELNs to ChemSpider, CSSP and ChemSpider Reactions. This will follow on from the initial proof of concept to retrieve ChemSpider information and enter it into LabTrove pages.

With this long-term aim in view, the LabTrove pages in which the interns stored the compound and reaction data were structured using LabTrove templates; this structuring will make it easier for publishing widgets to understand the data and process it correctly. In this way, the project was partly a test to ensure that the templates were suitable for storing compound data in LabTrove. As well as the ChemSpider compound and associated data template (with corresponding help page), templates were also written to store reaction data in a formatted way, since the theses were primarily focused on the synthesis of compounds. At its simplest, basic reaction data can be stored in LabTrove using the ChemSpider Reactions template (and corresponding help page), and eventually posts written in this format will be easily publishable to ChemSpider Reactions. More detailed reaction data can be stored using the ChemSpider SyntheticPages style reaction template (and corresponding help page). The initial aim was to deposit all of this reaction data into ChemSpider SyntheticPages, but it became clear that it was difficult for anyone other than the researcher who conducted the reaction, or their supervisor, to supply the level of detail needed for CSSP submissions, and in particular that this level of detail couldn’t easily be reached by retrospectively abstracting theses. As a result, only a handful of reactions were submitted to CSSP, and the majority (over 500) were stored in LabTrove for future submission to ChemSpider Reactions.

If reactions can be published easily from ELNs to ChemSpider Reactions, and that repository can be queried easily by other researchers and their applications when performing new reactions, this will be a major step towards the aims of Dial-a-Molecule (an EPSRC Grand Challenge network). An important part of the reaction data which needs to be captured is the stoichiometry table of substances used and produced in a reaction. However, these stoichiometry tables are too complicated to incorporate into a LabTrove template, so the LabTrove reaction templates will be used in conjunction with a new ChemSpider jQuery widget which will construct them, and which is currently being integrated with LabTrove (more details to follow on this blog shortly!). The widget performs ChemSpider lookups to retrieve compound information and will calculate equivalents, thereby saving the researcher time when working out the amounts of reactants needed or yields of products obtained. An example of a reaction post which was initially created using the ChemSpider Reactions template and then supplemented by adding a stoichiometry table to it using the ChemSpider Edit Stoichiometry Table widget is shown here.
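To give a feel for the arithmetic a stoichiometry table involves, here is a minimal Python sketch. It is not the widget's code: the `Substance` class and both helper functions are made up for illustration, and the masses are invented example amounts rather than data from the theses. The molecular weights are typed in by hand here, whereas the widget is described as looking compound information up in ChemSpider.

```python
from dataclasses import dataclass


@dataclass
class Substance:
    """One row of a hypothetical stoichiometry table."""
    name: str
    molecular_weight: float  # g/mol
    mass_g: float            # amount weighed out or isolated, in grams

    @property
    def moles(self) -> float:
        return self.mass_g / self.molecular_weight


def equivalents(substance: Substance, reference: Substance) -> float:
    """Molar amount of a substance relative to the reference reactant."""
    return substance.moles / reference.moles


def percent_yield(product: Substance, limiting: Substance,
                  product_per_limiting: float = 1.0) -> float:
    """Isolated yield as a percentage of the theoretical amount."""
    theoretical_moles = limiting.moles * product_per_limiting
    return 100.0 * product.moles / theoretical_moles


# Illustrative amounts only (assumed, not taken from any thesis).
phenol = Substance("phenol", 94.11, 0.941)
bromine = Substance("bromine", 159.81, 4.794)
product = Substance("2,4,6-tribromophenol", 330.80, 2.646)

print(f"{bromine.name}: {equivalents(bromine, phenol):.2f} equiv")
print(f"yield: {percent_yield(product, phenol):.0f}%")
```

Keeping the molar amount as a derived property means that changing a mass or a molecular weight updates every equivalents and yield figure that depends on it.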

If you are a LabTrove user and wish to use the ChemSpider templates, their source is available via their links above, and instructions for using templates in LabTrove are documented here.

Recent Improvements to ChemSpider Search (part 3)

In part one of this series we talked about searching by molecular formula ranges, and combining substructure searches with other types of searches. Part two covered how to search by supplementary information like bioactivity, appearance or melting point. This time we will demonstrate how you can use a search combining these new features to help answer a question you might encounter in the lab.

After performing a bromination reaction on phenol you isolate a product with a melting point of 90-93°C. If you start a search with just three pieces of information – your product is a derivative of phenol, it should contain at least one bromine, and your melting point is 90-93°C – you can construct a search on the Advanced Search page to help you get started in identifying your product.

Since you can now combine substructure searches with other searches, you start by looking for a compound containing phenol (Search by SubStructure). To restrict your results to brominated phenols, you add a molecular formula range search for C6H(1-5)O1Br(1-5) (Search by Properties). Lastly, you search for compounds with a melting point of 90-93°C (Search by Supplementary Information).

Your search turns up one result – 2,4,6-Tribromophenol. Although you need more information to conclusively confirm the identification, this gives you a lead in your analysis/elucidation.

Taking a look at the record, you may notice it has an interactive IR spectrum from NIST. If you check the Data Sources section, you will find that there are a lot of data sources for the record.

To make it simpler to identify useful information you can browse the tabs to look for specific types of information: for instance the “Spectral Data” tab provides links to data in the MassBank and NMRShiftDB databases, which will hopefully help you confirm whether the product is 2,4,6-Tribromophenol.

This is just one example of how you can combine different searches on the Advanced Search page. Advanced searches are a great way to narrow down your results to help you find exactly what you are looking for, and there are many options we haven’t covered here, so have a look around and see what combinations might work for you.

Recent Improvements to ChemSpider Search (part 2)

Last time we told you about a number of improvements we have added to ChemSpider in the recent site updates, including combined substructure and properties search and searching by molecular formula ranges. As promised, this time we will cover how to search by properties like melting point or appearance.

Searching by Supplementary Information

Until now, although you could view properties when you were already on a record, there was no way to search by melting point, refractive index, appearance or bioactivity. This update has implemented a new search interface which allows you to search this data. You can now find compounds that are reported as being isolated from yeast, or compounds with a melting point of 32-35 °C.

There are 2 main parts to our Supplementary search interface.

Text Properties Search

Text properties include appearance, chemical class, drug status, or safety data. You can search any of these properties by using key words. When you start typing, a number of suggested search terms will appear, which can help you narrow down what search term to use.

You can also use wild cards by entering *, which can give you a little more flexibility in your search term – so if your unknown is a blue, crystalline material a search for “Blue crystal*” will turn up all records which mention the word “blue”, as well as any word beginning with “crystal” (such as crystals or crystalline).
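As a rough illustration of how such a wildcard query behaves, here is a small Python sketch. It is not ChemSpider's search code: the function name is invented, and it assumes that every term in the query must match at least one word of the property text and that matching ignores case.

```python
import re
from fnmatch import fnmatchcase


def matches_text_query(query: str, text: str) -> bool:
    """Return True if every query term matches some word in the text.

    A trailing * in a term matches any run of characters, so
    "crystal*" matches "crystal", "crystals" and "crystalline".
    """
    words = re.findall(r"[a-z0-9]+", text.lower())
    return all(
        any(fnmatchcase(word, term) for word in words)
        for term in query.lower().split()
    )


for description in ["Blue crystalline solid", "Deep blue crystals",
                    "Blue powder", "White crystals"]:
    print(description, "->", matches_text_query("Blue crystal*", description))
```

Only the first two descriptions match, because the last two are each missing one of the two terms.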

 

Numeric Properties Search

Numeric properties include physical properties like experimental or predicted boiling point, optical rotation, or LogP. Since we draw data from a wide range of data sources, not all of this information is sent to us in the same format or with the units depicted the same way. In order to make it possible for you to search across all the properties in our database no matter how it was supplied to us, we have done a lot of background work on tidying up and standardizing this data.

All numeric properties can be searched using min/max or with a +/- range, and the search term can be entered in a variety of units, e.g. Fahrenheit or Celsius for temperature, or psi or mmHg for pressure. Because the boiling point of a material depends on the pressure at which the measurement is made, and not all boiling points are measured at atmospheric pressure, we have created a feature that attempts to compensate for this. It uses the Clausius-Clapeyron equation to create estimated (standardised) boiling points for searching, so please remember this when looking at your results.
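To show the kind of calculation involved, here is a minimal Python sketch of unit conversion followed by a Clausius-Clapeyron correction to atmospheric pressure. It is not ChemSpider's implementation. In particular it assumes Trouton's rule (an entropy of vaporisation of roughly 88 J/(mol K) at the normal boiling point) to stand in for the enthalpy of vaporisation, which is usually not known; the function names are invented for illustration.

```python
import math

R = 8.314              # gas constant, J/(mol K)
TROUTON = 88.0         # assumed entropy of vaporisation, J/(mol K)
MMHG_PER_PSI = 51.7149
ATMOSPHERE_MMHG = 760.0


def fahrenheit_to_celsius(temp_f: float) -> float:
    return (temp_f - 32.0) * 5.0 / 9.0


def psi_to_mmhg(pressure_psi: float) -> float:
    return pressure_psi * MMHG_PER_PSI


def estimate_normal_boiling_point(bp_celsius: float,
                                  pressure_mmhg: float) -> float:
    """Estimate the boiling point at 760 mmHg from a reduced-pressure value.

    Integrated Clausius-Clapeyron equation:
        ln(P2 / P1) = -(dH / R) * (1 / T2 - 1 / T1)
    With the Trouton assumption dH = TROUTON * T2 (T2 = normal boiling
    point in kelvin) this rearranges to:
        T2 = T1 * (1 + (R / TROUTON) * ln(P2 / P1))
    """
    t_measured = bp_celsius + 273.15
    factor = 1.0 + (R / TROUTON) * math.log(ATMOSPHERE_MMHG / pressure_mmhg)
    return t_measured * factor - 273.15


def within_range(value: float, centre: float, tolerance: float) -> bool:
    """A +/- range test of the kind a numeric property search applies."""
    return centre - tolerance <= value <= centre + tolerance


# A boiling point reported as 176 F at 0.2 psi, standardised for searching.
estimate = estimate_normal_boiling_point(fahrenheit_to_celsius(176.0),
                                         psi_to_mmhg(0.2))
print(f"estimated boiling point at 760 mmHg: {estimate:.0f} C")
```

Because the correction rests on an assumed enthalpy of vaporisation, the result is only an estimate, which is why a search range around the value you expect is safer than an exact figure.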

 

As you can see, you are able to search on a wide variety of experimental properties, including boiling point, LogP, melting point, specific gravity and solubility. Please note that although many of the more common compounds have some properties, these properties are only available on a subset of our records – so if you do not get a result on a property search, it might be that we haven’t added that information yet.

Hopefully this gives you a good idea of the improvements we’ve made to ChemSpider search, and how these new features make it easier than ever to find what you are looking for. See the following post for a case study that showcases several of the new features covered in these posts.

Recent Improvements to ChemSpider Search (part 1)

We recently published an update to the ChemSpider website which, in addition to fixing a number of bugs, has added some useful new features. Three of these features are highlighted in this post – one which you might have noticed already, and two which you may not have discovered yet.

Auto-Complete

We have reinstated the auto-complete feature on the ChemSpider homepage. Now, when you begin typing in the search box, ChemSpider makes suggestions based on what you have typed. This makes it easier than ever to find what you are looking for – even if you aren’t quite sure how to spell it.

Autocomplete on the ChemSpider homepage

 

Combined Structure/Property Searches

People frequently ask if there is a way to search substructure and other properties like molecular weight or molecular formula at the same time. This update now makes it possible to perform this kind of combined search from our improved Advanced Search page.

For example, if you are interested in finding compounds which are structurally similar to Valium, you can enter a benzodiazepinone substructure and restrict it to compounds with a molecular weight of 275-325.


This search then returns Valium along with other similar drugs like clonazepam, nitrazepam and lorazepam.

There are many other search options that can be combined with a substructure/similarity search so look at the Advanced Search page and have a play.

Molecular Formula Range Searching

You can also search a range of molecular formulae at once. To specify the range for a given element, put the range in parentheses after the element. For example, C7H(10-12)O(0-1) would return all compounds containing exactly 7 carbons and between 10 and 12 hydrogens, and which may or may not contain an oxygen. This type of search can be performed from the Simple Search page, as part of an Advanced Search or from the ChemSpider homepage.

Best of all, this can be combined with any of the other search parameters on the Advanced Search page including the substructure search. For example, if you wanted to find polychlorinated biphenyls containing at least three chlorines you could perform a substructure search for a biphenyl with a molecular formula of C12H(0-7)Cl(3-10).
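If you want to check what a formula range will and will not match before running a search, the syntax is easy to mimic. The Python sketch below is only an illustration and not how ChemSpider parses queries: the function names are invented, and it assumes that an element missing from the query must be absent from the compound.

```python
import re
from typing import Dict, Tuple

# An element symbol followed by an optional "(low-high)" range or exact count.
_TOKEN = re.compile(r"([A-Z][a-z]?)(?:\((\d+)-(\d+)\)|(\d+))?")


def parse_formula_range(query: str) -> Dict[str, Tuple[int, int]]:
    """Turn e.g. 'C7H(10-12)O(0-1)' into {'C': (7, 7), 'H': (10, 12), 'O': (0, 1)}."""
    ranges: Dict[str, Tuple[int, int]] = {}
    pos = 0
    while pos < len(query):
        match = _TOKEN.match(query, pos)
        if match is None:
            raise ValueError(f"cannot parse {query!r} at position {pos}")
        element, low, high, exact = match.groups()
        if low is not None:
            ranges[element] = (int(low), int(high))
        elif exact is not None:
            ranges[element] = (int(exact), int(exact))
        else:
            ranges[element] = (1, 1)  # bare symbol means exactly one atom
        pos = match.end()
    return ranges


def formula_matches(formula: str, query: str) -> bool:
    """Check a plain formula such as 'C7H12O' against a range query."""
    counts = {el: low for el, (low, _) in parse_formula_range(formula).items()}
    ranges = parse_formula_range(query)
    if any(element not in ranges for element in counts):
        return False  # assumption: unlisted elements are not allowed
    return all(low <= counts.get(element, 0) <= high
               for element, (low, high) in ranges.items())


print(parse_formula_range("C7H(10-12)O(0-1)"))
print(formula_matches("C12H7Cl3", "C12H(0-7)Cl(3-10)"))
```

The same matcher could be chained with any other filter, which is the idea behind combining a formula range with a substructure search.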


In our next post, we will cover some new ways you can search by properties that are stored in our records such as melting point, density, etc.

Hexagons in the Plane

Written by Colin Batchelor.

I’ll be talking at the 6th Joint Sheffield Conference on Cheminformatics in July on Validation and Standardization of Molecular Structures in General and Sugars in Particular. This is a taster.

Sugars in Particular

One of the big problems with chemical structure algorithms is that they can’t, in general, cope with the ways that chemists are accustomed to drawing sugar molecules. They will lose the stereochemistry around the sugar ring, collapsing D-glucose, say, on to L-glucose, not to mention allose, altrose, gulose and all the others.

(ChemDraw, I should note, can interpret chair stereo properly, but it is very much an exception.)

The first step in determining correct stereochemistry for a chair atom is recognizing a chair hexagon. That is the subject of this post.

Have you ever been in the same car as a satnav (US readers: this is the same as a GPS)? Whereas a human navigator will give general instructions like “go straight over all of the roundabouts till we reach the Red Lion”, a satnav only ever gives single-step, local instructions. “At the roundabout, take the third exit.” “In 100 metres, turn left.” Machine structure perception is rather like this. Instead of apprehending in an instant that the hexagon is a chair or a boat like you or I would, the algorithm needs to step around the structure atom by atom, bond by bond.

The trick to identifying what kind of hexagon we are dealing with is to see whether, at each atom, we turn left or right. If we keep turning in the same direction all the way round, then we have a regularish hexagon. If we turn once in one direction, then twice in the other, then once in the first, then twice in the other, then we have a chair. There are six other sorts of hexagon you can draw, and they’re all depicted below alongside the corresponding sequences of turns.

Some of them are familiar, like the boat, the twist boat, and the envelope. Others, less so.
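For the curious, the turn-by-turn idea can be sketched in a few lines of Python. This is an illustration rather than the code we actually run: the function names and the example coordinates are made up, only the regular and chair sequences described above are recognised, and every other hexagon is simply reported as "other".

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def turn_sequence(ring: Sequence[Point]) -> str:
    """Walk round the ring and record a left (L) or right (R) turn at each atom.

    The sign of the 2D cross product of the incoming and outgoing bond
    vectors gives the direction; L is anticlockwise with the y axis up.
    """
    n = len(ring)
    turns: List[str] = []
    for i in range(n):
        (ax, ay), (bx, by), (cx, cy) = ring[i - 1], ring[i], ring[(i + 1) % n]
        cross = (bx - ax) * (cy - by) - (by - ay) * (cx - bx)
        if abs(cross) < 1e-9:
            raise ValueError(f"bonds at atom {i} are collinear")
        turns.append("L" if cross > 0 else "R")
    return "".join(turns)


def classify_hexagon(ring: Sequence[Point]) -> str:
    if len(ring) != 6:
        raise ValueError("expected six ring atoms")
    turns = turn_sequence(ring)
    minority = min(turns.count("L"), turns.count("R"))
    if minority == 0:
        return "regular"  # same direction all the way round
    # One turn one way, two the other, repeated: opposite atoms turn alike.
    if minority == 2 and all(turns[i] == turns[i + 3] for i in range(3)):
        return "chair"
    return "other"


# Made-up coordinates for illustration.
regular = [(math.cos(math.radians(60 * k)), math.sin(math.radians(60 * k)))
           for k in range(6)]
chair = [(1.0, 0.3), (0.5, -0.1), (-0.5, 0.5),
         (-1.0, -0.3), (-0.5, 0.1), (0.5, -0.5)]

print(turn_sequence(chair), classify_hexagon(chair))
print(turn_sequence(regular), classify_hexagon(regular))
```

Walking the ring in the opposite direction, or flipping the y axis, swaps every L for an R, so the classification only looks at the pattern and never at which letter is which.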

Hexagons
What happens when we’ve identified the atoms in the chair? I’ll come to that in more detail soon, but in the meantime here are the slides from the ACS Spring meeting in New Orleans:

Wedges, hashes and a side order of Grice

Written by Colin Batchelor.

(No, this is not a post about carbohydrates, despite the title!)

Dodgy stereochemistry is a persistent problem.  Even if someone knows all of the stereocentres in a particular molecule, they might not necessarily draw them in a way that a machine, or even a person, can interpret.  There are rules about whether the pointy end or the blunt end of a bond indicates the stereocentre, and it’s surprising how often you see them done wrongly.

Today I’m going to talk about a particular IUPAC recommendation for drawing stereocentres that might at first glance seem surprising, the rule that you may only have one stereobond at a given stereocentre. If you have a wedged bond attached to an atom, you can’t have a hashed bond attached to the same atom. And vice versa.
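A rule like this is easy to check mechanically. Here is a minimal Python sketch, assuming each bond record already says which atom sits at its narrow end and that the narrow end is the one that marks the stereocentre; the `Bond` type and the function name are invented for illustration and do not come from any particular toolkit.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple


class Bond(NamedTuple):
    narrow_end: int   # index of the atom at the pointy end
    wide_end: int     # index of the atom at the blunt end
    style: str        # "plain", "wedge" or "hash"


def atoms_with_multiple_stereobonds(bonds: Iterable[Bond]) -> List[int]:
    """Return atoms that have more than one wedge or hash pointing at them."""
    counts = Counter(bond.narrow_end for bond in bonds
                     if bond.style in ("wedge", "hash"))
    return sorted(atom for atom, n in counts.items() if n > 1)


# Atom 0 carries both a wedge and a hash, which the rule forbids.
drawing = [
    Bond(0, 1, "wedge"),
    Bond(0, 2, "hash"),
    Bond(0, 3, "plain"),
    Bond(3, 4, "wedge"),
]
print(atoms_with_multiple_stereobonds(drawing))
```

Plain bonds are ignored, so only the centres that break the one-stereobond rule are reported.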

Why is this?

You might think that as you’re supplying more information, you’re making the diagram easier to interpret. However, you’re running directly counter to the normal principles of communication.  You’re being more informative than required, and this sets off alarm bells in the reader.  What are you trying to say?  If you ask a passerby the time and they say “Well, it’s half past six Greenwich Mean Time” you’re entitled to wonder why they’re quoting the timezone. Maybe they’re trying to be funny.

Paul Grice thought about this whole problem in the 1970s and came up with a set of four principles, summarized in maxims, that listeners (or readers) assume that speakers are following.  These are they:

  • Be Truthful. Do not say what you believe to be false. Do not say that for which you lack adequate evidence.

Let us hope that this one is implicit in any chemical drawing!

  • Make your contribution as informative as is required.  Do not make your contribution more informative than required.

If you have two methyl groups coming off an atom, do not make one wedgy and one hashy. You are adding no new information!

Do not mark carbons with the letter C unless your target audience is schoolchildren.

  • Be relevant:

On the grand scale: do not illustrate an article with any old molecule—make sure the molecule mentioned is actually relevant.

On the scale of the drawing itself, however: If you have three bonds about an ordinary p-block atom, for example, make sure they’re at 120 degrees to each other.  If they aren’t, for example if two of them are at right angles, the reader will infer that something odd is going on.

  • Be clear:

Make sure all your double bonds actually look like double bonds rather than a single bond parallel to another single bond.  I suspect a lot of the success of ChemDraw is down to the fact that it produces attractive, clear chemical drawings.

Do people ever flout the maxims on purpose?

Oh yes.  People often flout the maxims when trying to be funny, or in a political interview.  Similarly there are all kinds of Gricean violations in the chemical drawings you see in patents: bonds which do not quite extend all the way to atoms, R groups labelled as Y (particularly dangerous as Y is yttrium!) or Q or W (also tungsten) or some other unusual letter and so forth.  Exactly why this happens so much more often in patents than in journal articles is left as an exercise for the reader.

Putting sugar in perspective

Written by Colin Batchelor.

You might not think so, but you’re very good at taking a two-dimensional drawing and converting it into a three-dimensional shape in your head. No, really, you are.

Fig. 1. Galactose in perspective

Take the drawing of galactose in Fig. 1. Even if you’re not a chemist, you can tell which bits of the ring are at the front and at the back, which bonds point up and which bonds point down. If you actually are a chemist, you’ve been trained to apply this geometrical intuition to work out what’s going on at each of the five stereocentres.

However, if you ask the InChI algorithm about the stereochemistry of this molecule, it’ll say that there is no stereochemistry in there and you’re looking at a stereoless description of which atom is attached to which. Since we use the InChI algorithm to say whether two records describe the same molecule, this puts us in a quandary, and there are thousands of entries in ChemSpider that come from just such a drawing and hence lack stereochemistry.
