Recent Improvements to ChemSpider Search (part 2)

Last time we told you about a number of improvements we have made to ChemSpider in recent site updates, including combined substructure and property searches and searching by molecular formula ranges. As promised, this time we will cover how to search by properties such as melting point or appearance.

Searching by Supplementary Information

Until now, although you could view properties when you were already on a record, there was no way to search by melting point, refractive index, appearance or bioactivity. This update has implemented a new search interface which allows you to search this data. You can now find compounds that are reported as being isolated from yeast, or compounds with a melting point of 32-35 °C.

There are two main parts to our Supplementary search interface.

Text Properties Search

Text properties include appearance, chemical class, drug status, or safety data. You can search any of these properties by using key words. When you start typing, a number of suggested search terms will appear, which can help you narrow down what search term to use.

You can also use wildcards by entering *, which gives you a little more flexibility in your search term – so if your unknown is a blue, crystalline material, a search for “Blue crystal*” will turn up all records that mention the word “blue” together with any word beginning with “crystal” (such as crystals or crystalline).
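To make the wildcard behaviour concrete, here is a minimal Python sketch of how a query such as “Blue crystal*” could be matched against the free-text properties of a record. It is not ChemSpider’s implementation: the tokenizer, the case-insensitive matching and the rule that every query term must match (AND semantics) are all assumptions, and the function names are hypothetical.

```python
import re


def tokenize(text: str) -> list[str]:
    """Split free text into lowercase words (letters and digits only)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def term_matches(term: str, words: list[str]) -> bool:
    """A term ending in '*' matches any word with that prefix; otherwise the exact word is needed."""
    if term.endswith("*"):
        prefix = term[:-1]
        return any(word.startswith(prefix) for word in words)
    return term in words


def matches_query(record_text: str, query: str) -> bool:
    """Assumed AND semantics: every term in the query must match some word in the record."""
    words = tokenize(record_text)
    terms = [term.lower() for term in query.split()]
    return all(term_matches(term, words) for term in terms)


print(matches_query("Pale blue crystalline powder", "Blue crystal*"))  # True
print(matches_query("Blue liquid", "Blue crystal*"))                   # False
```

Note that “Bluish crystals” would not match here, because “blue” has no trailing * and so must appear as a whole word.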

 

Numeric Properties Search

Numeric properties include physical properties such as experimental or predicted boiling point, optical rotation and LogP. Because we draw data from a wide range of sources, this information does not all arrive in the same format or with units expressed in the same way. To make it possible for you to search across all the properties in our database, however they were supplied to us, we have done a lot of background work to tidy up and standardize this data.
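As an illustration of the kind of tidying-up involved, the Python sketch below normalizes temperature values reported in different units to degrees Celsius. The input format it accepts (a number, an optional degree sign and a unit letter) is an assumption chosen for the example; real supplier data is considerably messier, and this is not ChemSpider’s own code.

```python
import re

# Hypothetical input format: a number, an optional degree sign and a unit letter (C, F or K).
TEMPERATURE_PATTERN = re.compile(r"^\s*(-?\d+(?:\.\d+)?)\s*°?\s*([CFK])\s*$", re.IGNORECASE)


def to_celsius(value: float, unit: str) -> float:
    """Convert a temperature in C, F or K to degrees Celsius."""
    unit = unit.upper()
    if unit == "C":
        return value
    if unit == "F":
        return (value - 32.0) * 5.0 / 9.0
    if unit == "K":
        return value - 273.15
    raise ValueError(f"unknown temperature unit: {unit}")


def normalize_temperature(text: str) -> float:
    """Parse strings such as '95 °F', '35 C' or '308.15 K' and return degrees Celsius."""
    match = TEMPERATURE_PATTERN.match(text)
    if match is None:
        raise ValueError(f"unrecognized temperature: {text!r}")
    value, unit = match.groups()
    return to_celsius(float(value), unit)


for raw in ["95 °F", "35 C", "308.15 K"]:
    print(raw, "->", round(normalize_temperature(raw), 2))  # each line ends in 35.0
```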

All numeric properties can be searched using a min/max or a +/- range, and the search term can be entered in a variety of units, e.g. Fahrenheit or Celsius for temperature, or psi or mmHg for pressure. Because the boiling point of a material depends on the pressure at which it is measured, and not all boiling points are measured at atmospheric pressure, we have created a feature that attempts to compensate for this. It uses the Clausius-Clapeyron equation to calculate estimated (standardised) boiling points for searching, so please bear this in mind when looking at your results.
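The post doesn’t say exactly how the correction is applied, so the Python sketch below is only an illustration of the idea. It converts the measurement pressure to mmHg, then combines the Clausius-Clapeyron equation with Trouton’s rule (an assumption: enthalpy of vaporization ≈ 88 J mol⁻¹ K⁻¹ × normal boiling point) to estimate the boiling point at 760 mmHg. Trouton’s rule is a rough approximation that fails for strongly hydrogen-bonded liquids such as water, which is one more reason to treat these standardised values as estimates.

```python
import math

GAS_CONSTANT = 8.314      # J mol^-1 K^-1
TROUTON_CONSTANT = 88.0   # J mol^-1 K^-1; assumption: dH_vap ≈ 88 * T_b (Trouton's rule)
STANDARD_PRESSURE_MMHG = 760.0

# Conversion factors to mmHg for a few common pressure units.
MMHG_PER_UNIT = {"mmHg": 1.0, "torr": 1.0, "psi": 51.7149, "kPa": 7.50062, "atm": 760.0}


def to_mmhg(pressure: float, unit: str) -> float:
    """Convert a pressure in one of the supported units to mmHg."""
    return pressure * MMHG_PER_UNIT[unit]


def standardized_boiling_point(bp_celsius: float, pressure: float, unit: str = "mmHg") -> float:
    """Estimate the boiling point at 760 mmHg from one measured at another pressure.

    Clausius-Clapeyron: ln(P_obs / P_std) = -(dH_vap / R) * (1/T_obs - 1/T_std).
    Substituting Trouton's rule dH_vap = 88 * T_std and solving for T_std gives
    T_std = T_obs * (1 - (R / 88) * ln(P_obs / P_std)).
    """
    t_obs = bp_celsius + 273.15
    p_obs = to_mmhg(pressure, unit)
    log_ratio = math.log(p_obs / STANDARD_PRESSURE_MMHG)
    t_std = t_obs * (1.0 - (GAS_CONSTANT / TROUTON_CONSTANT) * log_ratio)
    return t_std - 273.15
```

For example, `standardized_boiling_point(100.0, 76.0)` (a liquid boiling at 100 °C under 76 mmHg) gives roughly 181 °C, while a value measured at 760 mmHg, or 1 atm, comes back unchanged.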

 

As you can see, you can search on a wide variety of experimental properties, including boiling point, LogP, melting point, specific gravity and solubility. Please note, however, that although many of the more common compounds have at least some property data, these properties are only available for a subset of our records, so if a property search returns no results, it may simply be that we haven’t added that information yet.
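Once values are standardized, a min/max or ± query reduces to a simple interval test, and records that lack the property never match. The Python sketch below shows this with a handful of made-up records, using the 32-35 °C melting point range mentioned earlier; the record layout and function names are hypothetical, not ChemSpider’s.

```python
from typing import Optional

# Hypothetical records: melting points already standardized to °C; None means no data.
RECORDS = [
    {"name": "compound A", "mp_c": 33.5},
    {"name": "compound B", "mp_c": 80.0},
    {"name": "compound C", "mp_c": None},
    {"name": "compound D", "mp_c": 35.0},
]


def search_min_max(records, prop: str, minimum: float, maximum: float) -> list[str]:
    """Return names of records whose property lies in [minimum, maximum]."""
    hits = []
    for record in records:
        value: Optional[float] = record.get(prop)
        # Records without the property are skipped rather than treated as zero.
        if value is not None and minimum <= value <= maximum:
            hits.append(record["name"])
    return hits


def search_plus_minus(records, prop: str, centre: float, tolerance: float) -> list[str]:
    """A +/- query is the same interval test, centred on the given value."""
    return search_min_max(records, prop, centre - tolerance, centre + tolerance)


print(search_min_max(RECORDS, "mp_c", 32.0, 35.0))    # ['compound A', 'compound D']
print(search_plus_minus(RECORDS, "mp_c", 33.5, 1.5))  # ['compound A', 'compound D']
```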

Hopefully this gives you a good idea of the improvements we’ve made to ChemSpider search, and how these new features make it easier than ever to find what you are looking for. See the following post for a case study that showcases several of the new features covered in these posts.

Recent Improvements to ChemSpider Search (part 1)

We recently published an update to the ChemSpider website which, in addition to fixing a number of bugs, has added some useful new features. Three of these features are highlighted in this post – one which you might have noticed already, and two which you may not have discovered yet.

Auto-Complete

We have reinstated the auto-complete feature on the ChemSpider homepage. Now, when you begin typing in the search box, ChemSpider makes suggestions based on what you have typed. This makes it easier than ever to find what you are looking for – even if you aren’t quite sure how to spell it.

Autocomplete on the ChemSpider homepage
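Prefix suggestions of this kind are often served from a sorted list of names: a binary search finds the first candidate, and the scan stops at the first name that no longer shares the prefix. The Python sketch below shows that approach with a small hypothetical name list. The post doesn’t describe how ChemSpider’s own auto-complete works, and plain prefix matching like this does not by itself cope with misspellings.

```python
from bisect import bisect_left


class PrefixSuggester:
    """Suggest names that start with the typed text, using binary search over a sorted list."""

    def __init__(self, names: list[str]):
        self.names = sorted(name.lower() for name in names)

    def suggest(self, typed: str, limit: int = 10) -> list[str]:
        prefix = typed.lower()
        start = bisect_left(self.names, prefix)  # index of the first name >= prefix
        results = []
        for name in self.names[start:]:
            if not name.startswith(prefix) or len(results) >= limit:
                break
            results.append(name)
        return results


suggester = PrefixSuggester(["Aspirin", "Aspartame", "Asparagine", "Ascorbic acid", "Benzene"])
print(suggester.suggest("asp"))  # ['asparagine', 'aspartame', 'aspirin']
```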

 

Combined Structure/Property Searches

People frequently ask whether there is a way to search by substructure and by other properties, such as molecular weight or molecular formula, at the same time. This update now makes it possible to perform this kind of combined search from our improved Advanced Search page.

For example, if you are interested in finding compounds that are structurally similar to Valium, you can enter a benzodiazepinone substructure and restrict the results to compounds with a molecular weight between 275 and 325.


This search then returns Valium along with other similar drugs like clonazepam, nitrazepam and lorazepam.
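The molecular-weight half of that combined search is easy to illustrate. The Python sketch below computes average molecular weights from molecular formulae using rounded standard atomic weights and applies the 275 to 325 window from the example; the substructure half would need a cheminformatics toolkit and is left out. The formulae are those of the four drugs named above, and the helper names are hypothetical.

```python
import re
from collections import Counter

# Rounded standard atomic weights (g/mol) for the elements used below.
ATOMIC_WEIGHTS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "Cl": 35.45}


def parse_formula(formula: str) -> Counter:
    """Count atoms in a simple formula such as 'C16H13ClN2O' (no brackets or charges)."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def molecular_weight(formula: str) -> float:
    """Sum of atomic weights over all atoms in the formula."""
    return sum(ATOMIC_WEIGHTS[element] * n for element, n in parse_formula(formula).items())


drugs = {
    "diazepam (Valium)": "C16H13ClN2O",
    "clonazepam": "C15H10ClN3O3",
    "nitrazepam": "C15H11N3O3",
    "lorazepam": "C15H10Cl2N2O2",
}
for name, formula in drugs.items():
    mw = molecular_weight(formula)
    print(f"{name}: {mw:.2f} g/mol, in 275-325 window: {275 <= mw <= 325}")
```

All four fall inside the window, which is why the molecular weight restriction keeps them while filtering out much larger or smaller benzodiazepinone-containing structures.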

There are many other search options that can be combined with a substructure/similarity search, so take a look at the Advanced Search page and have a play.

Molecular Formula Range Searching

You can also search a range of molecular formulae at once. To specify the range for a given element, put the range in parentheses after the element. For example, C7H(10-12)O(0-1) would return all compounds containing exactly 7 carbons, between 10 and 12 hydrogens, and either no oxygen or one oxygen. This type of search can be performed from the Simple Search page, as part of an Advanced Search, or from the ChemSpider homepage.

Best of all, this can be combined with any of the other search parameters on the Advanced Search page, including substructure search. For example, if you wanted to find polychlorinated biphenyls containing at least three chlorines, you could perform a substructure search for biphenyl combined with a molecular formula of C12H(0-7)Cl(3-10).
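The formula-range syntax is simple enough to parse by hand. The Python sketch below turns a query such as C7H(10-12)O(0-1) into per-element count ranges and tests candidate formulae against it. It assumes that elements not named in the query must be absent, which the examples suggest but the post doesn’t state outright; the function names are hypothetical.

```python
import re
from collections import Counter

# One element symbol followed by either a (low-high) range or an optional exact count.
QUERY_TOKEN = r"([A-Z][a-z]?)(?:\((\d+)-(\d+)\)|(\d*))"


def parse_range_query(query: str) -> dict[str, tuple[int, int]]:
    """Turn 'C7H(10-12)O(0-1)' into {'C': (7, 7), 'H': (10, 12), 'O': (0, 1)}."""
    if not re.fullmatch(f"(?:{QUERY_TOKEN})+", query):
        raise ValueError(f"not a formula range query: {query!r}")
    ranges = {}
    for element, low, high, exact in re.findall(QUERY_TOKEN, query):
        if low:
            ranges[element] = (int(low), int(high))
        else:
            count = int(exact) if exact else 1
            ranges[element] = (count, count)
    return ranges


def parse_formula(formula: str) -> Counter:
    """Count atoms in a simple formula such as 'C12H7Cl3' (no brackets or charges)."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def formula_in_range(formula: str, query: str) -> bool:
    ranges = parse_range_query(query)
    counts = parse_formula(formula)
    # Assumption: elements that do not appear in the query are not allowed.
    if any(element not in ranges for element in counts):
        return False
    # Counter returns 0 for absent elements, so O(0-1) accepts formulae with no oxygen.
    return all(low <= counts[element] <= high for element, (low, high) in ranges.items())


print(formula_in_range("C7H12O", "C7H(10-12)O(0-1)"))    # True
print(formula_in_range("C12H9Cl", "C12H(0-7)Cl(3-10)"))  # False: only one chlorine
```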


In our next post, we will cover some new ways you can search by properties that are stored in our records such as melting point, density, etc.

Hexagons in the Plane

Written by Colin Batchelor.

I’ll be talking at the 6th Joint Sheffield Conference on Cheminformatics in July on Validation and Standardization of Molecular Structures in General and Sugars in Particular. This is a taster.

Sugars in Particular

One of the big problems with chemical structure algorithms is that they can’t, in general, cope with the ways that chemists are accustomed to drawing sugar molecules. They will lose the stereochemistry around the sugar ring, collapsing D-glucose, say, onto L-glucose, not to mention allose, altrose, gulose and all the others.

(ChemDraw, I should note, can interpret chair stereo properly, but it is very much an exception.)

The first step in determining correct stereochemistry for a chair atom is recognizing a chair hexagon. That is the subject of this post.

Have you ever been in the same car as a satnav (US readers: this is the same as a GPS)? Whereas a human navigator will give general instructions like “go straight over all of the roundabouts till we reach the Red Lion”, a satnav only ever gives single-step, local instructions. “At the roundabout, take the third exit.” “In 100 metres, turn left.” Machine structure perception is rather like this. Instead of apprehending in an instant that the hexagon is a chair or a boat like you or I would, the algorithm needs to step around the structure atom by atom, bond by bond.

The trick to identifying what kind of hexagon we are dealing with is to see whether, at each atom, we turn left or right. If we keep turning in the same direction all the way round, then we have a regularish hexagon. If we turn once in one direction, then twice in the other, then once in the first, then twice in the other, then we have a chair. There are six other sorts of hexagon you can draw, and they’re all depicted below alongside the corresponding sequences of turns.

Some of them are familiar, like the boat, the twist boat, and the envelope. Others, less so.
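The turn-counting test is straightforward to express in code. In the Python sketch below, the sign of the 2D cross product of the incoming and outgoing bonds at each ring atom says whether the walk turns left (positive) or right (negative). It recognizes only the two patterns spelled out above, the same-direction hexagon and the chair, and lumps the other six shapes together as “other”. The coordinates and function names are illustrative assumptions, not the code behind the talk.

```python
Point = tuple[float, float]


def turn_sequence(ring: list[Point]) -> str:
    """Walk round the ring in order and record L (left), R (right) or S (straight) at each atom."""
    turns = []
    n = len(ring)
    for i in range(n):
        (ax, ay), (bx, by), (cx, cy) = ring[i - 1], ring[i], ring[(i + 1) % n]
        # z-component of the cross product of the incoming and outgoing bond vectors
        cross = (bx - ax) * (cy - by) - (by - ay) * (cx - bx)
        turns.append("L" if cross > 0 else "R" if cross < 0 else "S")
    return "".join(turns)


def classify_hexagon(ring: list[Point]) -> str:
    turns = turn_sequence(ring)
    if turns in ("LLLLLL", "RRRRRR"):
        return "regular-ish"  # same direction all the way round
    rotations = {turns[i:] + turns[:i] for i in range(len(turns))}
    # once one way, twice the other, once the first way, twice the other
    if rotations & {"LRRLRR", "RLLRLL"}:
        return "chair"
    return "other"


regular = [(1.0, 0.0), (0.5, 0.87), (-0.5, 0.87), (-1.0, 0.0), (-0.5, -0.87), (0.5, -0.87)]
chair = [(1.0, 0.25), (0.5, 0.0), (-0.5, 0.5), (-1.0, -0.25), (-0.5, 0.0), (0.5, -0.5)]
print(turn_sequence(chair), classify_hexagon(chair))  # LRLLRL chair
print(classify_hexagon(regular))                      # regular-ish
```

Because every rotation of the turn string and both mirror-image patterns are checked, the result doesn’t depend on which atom the walk starts from, which way round it goes, or whether the y-axis points up (as in chemistry drawings) or down (as in screen coordinates).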

Hexagons
What happens once we’ve identified the atoms in the chair? I’ll come to that in more detail soon, but in the meantime here are the slides from the ACS Spring meeting in New Orleans:

Wedges, hashes and a side order of Grice

Written by Colin Batchelor.

(This is not a post about carbohydrates, despite the title!)

Dodgy stereochemistry is a persistent problem.  Even if someone knows all of the stereocentres in a particular molecule, they might not necessarily draw them in a way that a machine, or even a person, can interpret.  There are rules about whether the pointy end or the blunt end of a bond indicates the stereocentre, and it’s surprising how often you see them done wrongly.

Today I’m going to talk about a particular IUPAC recommendation for drawing stereocentres that might at first glance seem surprising, the rule that you may only have one stereobond at a given stereocentre. If you have a wedged bond attached to an atom, you can’t have a hashed bond attached to the same atom. And vice versa.
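A rule like this is easy to check automatically. The Python sketch below flags any atom that sits at the narrow end of more than one wedged or hashed bond. The bond representation (begin atom, end atom, style, with the begin atom taken as the stereocentre) is a hypothetical one chosen for the example, not any particular file format or toolkit API.

```python
from collections import Counter
from typing import NamedTuple


class Bond(NamedTuple):
    begin: int   # atom at the narrow end of a wedge or hash (the stereocentre)
    end: int
    style: str   # "plain", "wedge" or "hash"


def atoms_with_multiple_stereobonds(bonds: list[Bond]) -> list[int]:
    """Return atoms that carry more than one wedged/hashed bond at their narrow end."""
    counts = Counter(b.begin for b in bonds if b.style in ("wedge", "hash"))
    return sorted(atom for atom, n in counts.items() if n > 1)


# Hypothetical drawing: atom 1 has both a wedge and a hash, which the rule forbids.
bonds = [
    Bond(1, 2, "wedge"),
    Bond(1, 3, "hash"),
    Bond(1, 4, "plain"),
    Bond(5, 6, "wedge"),
]
print(atoms_with_multiple_stereobonds(bonds))  # [1]
```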

Why is this?

You might think that as you’re supplying more information, you’re making the diagram easier to interpret. However, you’re running directly counter to the normal principles of communication.  You’re being more informative than required, and this sets off alarm bells in the reader.  What are you trying to say?  If you ask a passerby the time and they say “Well, it’s half past six Greenwich Mean Time” you’re entitled to wonder why they’re quoting the timezone. Maybe they’re trying to be funny.

Paul Grice thought about this whole problem in the 1970s and came up with a set of four principles, summarized in maxims, that listeners (or readers) assume speakers are following. They are:

  • Be Truthful. Do not say what you believe to be false. Do not say that for which you lack adequate evidence.

Let us hope that this one is implicit in any chemical drawing!

  • Make your contribution as informative as is required.  Do not make your contribution more informative than required.

If you have two methyl groups coming off an atom, do not make one wedgy and one hashy. You are adding no new information!

Do not mark carbons with the letter C unless your target audience is schoolchildren.

  • Be relevant:

On the grand scale: do not illustrate an article with any old molecule—make sure the molecule mentioned is actually relevant.

On the scale of the drawing itself, however: If you have three bonds about an ordinary p-block atom, for example, make sure they’re at 120 degrees to each other.  If they aren’t, for example if two of them are at right angles, the reader will infer that something odd is going on.

  • Be clear:

Make sure all your double bonds actually look like double bonds rather than a single bond parallel to another single bond.  I suspect a lot of the success of ChemDraw is down to the fact that it produces attractive, clear chemical drawings.
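The advice about 120-degree bond angles under “Be relevant” can also be checked mechanically. The Python sketch below measures the angles between neighbouring bonds around a three-connected atom and reports whether each is within a tolerance of 120 degrees. The 5-degree tolerance and the function names are arbitrary assumptions for the example.

```python
import math

Point = tuple[float, float]


def angles_between_bonds(centre: Point, neighbours: list[Point]) -> list[float]:
    """Angles (degrees) between consecutive bonds as you sweep round the centre atom."""
    directions = sorted(
        math.degrees(math.atan2(y - centre[1], x - centre[0])) % 360.0 for x, y in neighbours
    )
    n = len(directions)
    return [(directions[(i + 1) % n] - directions[i]) % 360.0 for i in range(n)]


def looks_trigonal(centre: Point, neighbours: list[Point], tolerance: float = 5.0) -> bool:
    """True if an atom with three bonds has them drawn roughly 120 degrees apart."""
    if len(neighbours) != 3:
        return False
    return all(abs(a - 120.0) <= tolerance for a in angles_between_bonds(centre, neighbours))


good = [(1.0, 0.0), (-0.5, math.sqrt(3) / 2), (-0.5, -math.sqrt(3) / 2)]
odd = [(1.0, 0.0), (0.0, 1.0), (-1.0, -1.0)]  # two bonds at right angles
print(looks_trigonal((0.0, 0.0), good), looks_trigonal((0.0, 0.0), odd))  # True False
```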

Do people ever flout the maxims on purpose?

Oh yes.  People often flout the maxims when trying to be funny, or in a political interview.  Similarly there are all kinds of Gricean violations in the chemical drawings you see in patents: bonds which do not quite extend all the way to atoms, R groups labelled as Y (particularly dangerous as Y is yttrium!) or Q or W (also tungsten) or some other unusual letter and so forth.  Exactly why this happens so much more often in patents than in journal articles is left as an exercise for the reader.

Putting sugar in perspective

Written by Colin Batchelor.

You might not think so, but you’re very good at taking a two-dimensional drawing and converting it into a three-dimensional shape in your head. No, really, you are.


Fig. 1. Galactose in perspective

Take the drawing of galactose in Fig. 1. Even if you’re not a chemist, you can tell which bits of the ring are at the front and at the back, which bonds point up and which bonds point down. If you actually are a chemist, you’ve been trained to apply this geometrical intuition to work out what’s going on at each of the five stereocentres.

However, if you ask the InChI algorithm about the stereochemistry of this molecule, it’ll say that there is no stereochemistry in there and you’re looking at a stereoless description of which atom is attached to which. Since we use the InChI algorithm to say whether two records describe the same molecule, this puts us in a quandary, and there are thousands of entries in ChemSpider that come from just such a drawing and hence lack stereochemistry.
