
Recent Improvements to ChemSpider Search (part 1)

We recently published an update to the ChemSpider website which, in addition to fixing a number of bugs, has added some useful new features. Three of these features are highlighted in this post – one which you might have noticed already, and two which you may not have discovered yet.

Auto-Complete

We have reinstated the auto-complete feature on the ChemSpider homepage. Now, when you begin typing in the search box, ChemSpider makes suggestions based on what you have typed. This makes it easier than ever to find what you are looking for – even if you aren’t quite sure how to spell it.

Autocomplete on the ChemSpider homepage

 

Combined Structure/Property Searches

People frequently ask whether there is a way to search by substructure and by other properties, such as molecular weight or molecular formula, at the same time. This update makes it possible to perform this kind of combined search from our improved Advanced Search page.

For example, if you are interested in finding compounds that are structurally similar to Valium, you can enter a benzodiazepinone substructure and restrict the results to compounds with a molecular weight of 275-325.


This search then returns Valium along with other similar drugs like clonazepam, nitrazepam and lorazepam.

There are many other search options that can be combined with a substructure/similarity search, so take a look at the Advanced Search page and have a play.

Molecular Formula Range Searching

You can also search a range of molecular formulae at once. To specify the range for a given element, put the range in parentheses after the element. For example, C7H(10-12)O(0-1) would return all compounds containing exactly 7 carbons, between 10 and 12 hydrogens, and either no oxygen or one oxygen. This type of search can be performed from the Simple Search page, as part of an Advanced Search or from the ChemSpider homepage.
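To make the range syntax concrete, here is a minimal Python sketch of how a query like this could be read and checked against a compound's element counts. It is an illustration only, not how ChemSpider implements the search: the names parse_formula_range and formula_matches are hypothetical, and the sketch assumes that elements not named in the query must be absent from the compound.

```python
import re

# One token: an element symbol followed by an optional exact count
# or an optional "(lo-hi)" range, e.g. "C7", "H(10-12)", "O(0-1)", "N".
TOKEN = re.compile(r"([A-Z][a-z]?)(?:\((\d+)-(\d+)\)|(\d+))?")


def parse_formula_range(query):
    """Return {element: (lo, hi)} for a query such as 'C7H(10-12)O(0-1)'."""
    ranges = {}
    pos = 0
    while pos < len(query):
        m = TOKEN.match(query, pos)
        if m is None:
            raise ValueError("cannot parse %r at position %d" % (query, pos))
        element, lo, hi, exact = m.groups()
        if lo is not None:
            ranges[element] = (int(lo), int(hi))
        else:
            # A bare symbol with no number means exactly one atom.
            n = int(exact) if exact else 1
            ranges[element] = (n, n)
        pos = m.end()
    return ranges


def formula_matches(counts, ranges):
    """True if every element count falls inside its range.

    Assumption: elements that the query does not mention must not occur.
    """
    if any(element not in ranges for element in counts):
        return False
    return all(lo <= counts.get(element, 0) <= hi
               for element, (lo, hi) in ranges.items())


query = parse_formula_range("C7H(10-12)O(0-1)")
print(query)  # {'C': (7, 7), 'H': (10, 12), 'O': (0, 1)}
print(formula_matches({"C": 7, "H": 12, "O": 1}, query))  # True
print(formula_matches({"C": 7, "H": 8}, query))           # False: too few hydrogens
```

An element with a lower bound of zero, such as O(0-1) here, is simply allowed to be missing from the compound.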

Best of all, this can be combined with any of the other search parameters on the Advanced Search page, including the substructure search. For example, if you wanted to find polychlorinated biphenyls containing at least three chlorine atoms, you could perform a substructure search for a biphenyl with a molecular formula of C12H(0-7)Cl(3-10). (Biphenyl itself is C12H10 and each chlorine replaces one hydrogen, so three to ten chlorines leave between seven and zero hydrogens.)


In our next post, we will cover some new ways you can search by properties that are stored in our records such as melting point, density, etc.

Hexagons in the Plane

Written by Colin Batchelor.

I’ll be talking at the 6th Joint Sheffield Conference on Cheminformatics in July on Validation and Standardization of Molecular Structures in General and Sugars in Particular. This is a taster.

Sugars in Particular

One of the big problems with chemical structure algorithms is that they can’t, in general, cope with the ways that chemists are accustomed to drawing sugar molecules. They will lose the stereochemistry around the sugar ring, collapsing D-glucose, say, on to L-glucose, not to mention allose, altrose, gulose and all the others.

(ChemDraw, I should note, can interpret chair stereo properly, but it is very much an exception.)

The first step in determining correct stereochemistry for a chair atom is recognizing a chair hexagon. That is the subject of this post.

Have you ever been in the same car as a satnav (US readers: this is the same as a GPS)? Whereas a human navigator will give general instructions like “go straight over all of the roundabouts till we reach the Red Lion”, a satnav only ever gives single-step, local instructions. “At the roundabout, take the third exit.” “In 100 metres, turn left.” Machine structure perception is rather like this. Instead of apprehending in an instant that the hexagon is a chair or a boat like you or I would, the algorithm needs to step around the structure atom by atom, bond by bond.

The trick to identifying what kind of hexagon we are dealing with is to see whether, at each atom, we turn left or right. If we keep turning in the same direction all the way round, then we have a regularish hexagon. If we turn once in one direction, then twice in the other, then once in the first, then twice in the other, then we have a chair. There are six other sorts of hexagon you can draw, and they’re all depicted below alongside the corresponding sequences of turns.

Some of them are familiar, like the boat, the twist boat, and the envelope. Others, less so.
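As a rough illustration of the turn-counting idea, here is a short Python sketch. It is not the code we actually run: the function names are hypothetical, it assumes the six ring atoms are supplied in ring order as 2D drawing coordinates, and it only names the two patterns described in words above, reporting everything else as "other". The sign of a 2D cross product tells us whether we turn left or right at each atom.

```python
def turn_sequence(points):
    """Return one 'L' or 'R' per atom, walking round the ring in order.

    points is a list of (x, y) tuples. With the y axis pointing up, a
    positive cross product is a left turn. If y points down (as in many
    drawing programs) the letters swap, which does not affect the
    classification below because it treats L and R symmetrically.
    """
    n = len(points)
    turns = []
    for i in range(n):
        ax, ay = points[i - 1]        # previous atom (wraps round the ring)
        bx, by = points[i]            # atom at which we turn
        cx, cy = points[(i + 1) % n]  # next atom
        cross = (bx - ax) * (cy - by) - (by - ay) * (cx - bx)
        if cross == 0:
            raise ValueError("atoms %d is collinear with its neighbours" % i)
        turns.append("L" if cross > 0 else "R")
    return "".join(turns)


def classify_hexagon(points):
    """Classify a drawn six-membered ring from its sequence of turns."""
    if len(points) != 6:
        raise ValueError("expected exactly six ring atoms")
    turns = turn_sequence(points)
    if turns in ("LLLLLL", "RRRRRR"):
        return "regular-ish hexagon"
    # A chair is one turn one way, two the other, repeated. The walk can
    # start at any atom, so look for the pattern in the doubled string.
    doubled = turns + turns
    if "LRRLRR" in doubled or "RLLRLL" in doubled:
        return "chair"
    return "other"


# A chair-like drawing: left tip down, right tip up (coordinates made up).
chair = [(0, -0.5), (1, 1), (3, 1), (4, 1.5), (3, 0), (1, 0)]
print(turn_sequence(chair))     # RRLRRL
print(classify_hexagon(chair))  # chair
```

The point of working with turns rather than angles is that the sequence is unchanged when the drawing is scaled, rotated or moderately distorted, which is what hand-drawn chairs tend to be.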

Hexagons
What happens when we’ve identified the atoms in the chair? I’ll come to that in more detail soon, but in the meantime here are the slides from the ACS Spring meeting in New Orleans:

Wedges, hashes and a side order of Grice

Written by Colin Batchelor.

(No, this is not a post about carbohydrates, despite the title!)

Dodgy stereochemistry is a persistent problem.  Even if someone knows all of the stereocentres in a particular molecule, they might not necessarily draw them in a way that a machine, or even a person, can interpret.  There are rules about whether the pointy end or the blunt end of a bond indicates the stereocentre, and it’s surprising how often you see them done wrongly.

Today I’m going to talk about a particular IUPAC recommendation for drawing stereocentres that might at first glance seem surprising, the rule that you may only have one stereobond at a given stereocentre. If you have a wedged bond attached to an atom, you can’t have a hashed bond attached to the same atom. And vice versa.
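A machine check for this rule is easy to sketch. The Python below is an illustration under stated assumptions rather than anyone's production validator: the bond representation and the function name are made up for the example, and it assumes each stereobond is stored with the atom at its narrow (pointy) end first, that atom being the one whose stereochemistry the bond describes.

```python
from collections import defaultdict


def atoms_with_multiple_stereobonds(bonds):
    """Return the atoms that have more than one stereobond pointing from them.

    bonds is an iterable of (narrow_end_atom, wide_end_atom, style) tuples,
    where style is 'plain', 'wedge' or 'hash'. This layout is hypothetical.
    """
    counts = defaultdict(int)
    for narrow_end, _wide_end, style in bonds:
        if style in ("wedge", "hash"):
            counts[narrow_end] += 1
    return sorted(atom for atom, n in counts.items() if n > 1)


bonds = [
    (1, 2, "wedge"),  # atom 1 has a wedged bond ...
    (1, 3, "hash"),   # ... and a hashed bond as well: flagged
    (1, 4, "plain"),
    (5, 6, "wedge"),  # atom 5 has a single stereobond: fine
]
print(atoms_with_multiple_stereobonds(bonds))  # [1]
```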

Why is this?

You might think that as you’re supplying more information, you’re making the diagram easier to interpret. However, you’re running directly counter to the normal principles of communication.  You’re being more informative than required, and this sets off alarm bells in the reader.  What are you trying to say?  If you ask a passerby the time and they say “Well, it’s half past six Greenwich Mean Time” you’re entitled to wonder why they’re quoting the timezone. Maybe they’re trying to be funny.

Paul Grice thought about this whole problem in the 1970s and came up with a set of four principles, summarized in maxims, that listeners (or readers) assume that speakers are following.  These are they:

  • Be truthful: Do not say what you believe to be false. Do not say that for which you lack adequate evidence.

Let us hope that this one is implicit in any chemical drawing!

  • Make your contribution as informative as is required.  Do not make your contribution more informative than required.

If you have two methyl groups coming off an atom, do not make one wedgy and one hashy. You are adding no new information!

Do not mark carbons with the letter C unless your target audience is schoolchildren.

  • Be relevant:

On the grand scale: do not illustrate an article with any old molecule—make sure the molecule mentioned is actually relevant.

On the scale of the drawing itself, however: if you have three bonds about an ordinary p-block atom, make sure they’re at 120 degrees to each other.  If they aren’t, for example if two of them are at right angles, the reader will infer that something odd is going on.

  • Be clear:

Make sure all your double bonds actually look like double bonds rather than a single bond parallel to another single bond.  I suspect a lot of the success of ChemDraw is down to the fact that it produces attractive, clear chemical drawings.

Do people ever flout the maxims on purpose?

Oh yes.  People often flout the maxims when trying to be funny, or in a political interview.  Similarly, there are all kinds of Gricean violations in the chemical drawings you see in patents: bonds which do not quite extend all the way to atoms, R groups labelled as Y (particularly dangerous as Y is yttrium!) or Q or W (also an element symbol, for tungsten) or some other unusual letter, and so forth.  Exactly why this happens so much more often in patents than in journal articles is left as an exercise for the reader.

Putting sugar in perspective

Written by Colin Batchelor.

You might not think so, but you’re very good at taking a two-dimensional drawing and converting it into a three-dimensional shape in your head. No, really, you are.


Fig. 1. Galactose in perspective

Take the drawing of galactose in Fig. 1. Even if you’re not a chemist, you can tell which bits of the ring are at the front and which are at the back, which bonds point up and which bonds point down. If you actually are a chemist, you’ve been trained to apply this geometrical intuition to work out what’s going on at each of the five stereocentres.

However, if you ask the InChI algorithm about the stereochemistry of this molecule, it’ll say that there is no stereochemistry in there and you’re looking at a stereoless description of which atom is attached to which. Since we use the InChI algorithm to say whether two records describe the same molecule, this puts us in a quandary, and there are thousands of entries in ChemSpider that come from just such a drawing and hence lack stereochemistry.
