04 September 2017

Efficiently searching for fragments

What do you do when you find a fragment? After checking for artifacts and getting as much structural information as possible, the next step is usually to test analogs for improved potency. But how do you go about that? Richard Hall and his colleagues at Astex provide their approach in a recent paper in J. Med. Chem.

Readily available analogs can come from two sources. Larger organizations generally have massive libraries of compounds, and it’s easy enough to order these for testing. There are also plenty of commercial vendors, enabling SAR by catalog. But how do you sort through the millions of possibilities to find those that are most likely to improve potency?

Substructure searches are generally the first approach: look for fragments containing a central core, perhaps differently decorated. A nice example of this is described here, where a search for related pyrimidines led to an increase in potency by replacing one atom. Sometimes more dramatic changes are necessary, though. Searching for similar molecules that do not share the same core can be successful, as in this case, but often requires multiple searches. Also, particularly for smaller fragments, “similarity” can encompass significant differences.

The Astex researchers have created a computational tool to streamline this search procedure. Called the Fragment Network, it is a “graph database,” a type of database in which information is stored as nodes and edges, like the webpages (nodes) and links (edges) used in Google searches. In the Fragment Network, each fragment is computationally dissected into component parts (such as a phenyl ring or a hydroxyl), with edges representing the connections between the parts (such as carbon-carbon bonds). The database contains about 5 million compounds of up to 24 non-hydrogen atoms, and these are further annotated as to whether they are available in-house or from more or less reliable vendors.
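To make the nodes-and-edges idea concrete, here is a minimal sketch in Python. It is not the Astex implementation: the class and field names are hypothetical, the heavy-atom counts are entered by hand, and it assumes that each node is a whole compound and that an edge links two compounds differing by one component part.

```python
from dataclasses import dataclass, field

MAX_HEAVY_ATOMS = 24  # size limit quoted in the post


@dataclass
class FragmentNode:
    """One compound in the toy graph (hypothetical structure)."""
    name: str
    heavy_atoms: int
    sources: set = field(default_factory=set)  # e.g. {"in-house", "vendor"}


class ToyFragmentGraph:
    """Undirected graph; each edge is labeled with the part that differs."""

    def __init__(self):
        self.nodes = {}  # name -> FragmentNode
        self.edges = {}  # name -> set of (neighbor name, label)

    def add_node(self, name, heavy_atoms, sources=()):
        if heavy_atoms > MAX_HEAVY_ATOMS:
            raise ValueError(f"{name} exceeds {MAX_HEAVY_ATOMS} heavy atoms")
        self.nodes[name] = FragmentNode(name, heavy_atoms, set(sources))
        self.edges.setdefault(name, set())

    def add_edge(self, a, b, label):
        # Store the edge in both directions so it can be walked either way.
        self.edges[a].add((b, label))
        self.edges[b].add((a, label))

    def available_from(self, source):
        """Names of compounds annotated with the given source."""
        return sorted(n for n, node in self.nodes.items() if source in node.sources)


graph = ToyFragmentGraph()
graph.add_node("4-hydroxybiphenyl", 13, {"in-house", "vendor"})
graph.add_node("biphenyl", 12, {"vendor"})
graph.add_node("phenol", 7, {"in-house"})
graph.add_edge("4-hydroxybiphenyl", "biphenyl", "hydroxyl substituent")
graph.add_edge("4-hydroxybiphenyl", "phenol", "phenyl ring")

in_house = graph.available_from("in-house")
```

A production system would presumably use a dedicated graph database and a cheminformatics toolkit to dissect molecules and count atoms. The point here is only the shape of the data: annotated nodes joined by labeled edges.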

A search of the Fragment Network, which takes just a fraction of a second, can be customized depending on the goal. A default search returns compounds that are up to two edges away from the query. This can yield quite a large number of compounds, many of which would not come up in a substructure query, as shown for the simple but useful 4-hydroxybiphenyl.
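A search "up to two edges away" is a depth-limited breadth-first traversal. The sketch below shows the idea on a hand-made toy graph. The neighbor lists are invented for illustration and are not taken from the paper, and the function name is hypothetical.

```python
from collections import deque


def neighbors_within(graph, query, max_edges=2):
    """Return {compound: distance} for everything within max_edges of query."""
    distance = {query: 0}
    queue = deque([query])
    while queue:
        current = queue.popleft()
        if distance[current] == max_edges:
            continue  # do not expand beyond the edge limit
        for neighbor in graph.get(current, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[current] + 1
                queue.append(neighbor)
    del distance[query]  # the query itself is not a hit
    return distance


# Toy adjacency list (illustrative only).
TOY_GRAPH = {
    "4-hydroxybiphenyl": ["biphenyl", "phenol"],
    "biphenyl": ["4-hydroxybiphenyl", "4-aminobiphenyl", "benzene"],
    "phenol": ["4-hydroxybiphenyl", "4-(pyridin-4-yl)phenol", "benzene"],
    "4-aminobiphenyl": ["biphenyl", "aniline"],
    "4-(pyridin-4-yl)phenol": ["phenol"],
    "benzene": ["biphenyl", "phenol", "aniline"],
    "aniline": ["4-aminobiphenyl", "benzene"],
}

hits = neighbors_within(TOY_GRAPH, "4-hydroxybiphenyl")
```

In this toy graph, 4-(pyridin-4-yl)phenol is reached in two steps via phenol even though it does not contain the biphenyl core, so a substructure query on that core would miss it. Aniline sits three edges away and is excluded.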


Plodding through lists of compounds can be tedious, and one nice feature of the Fragment Network is that it groups compounds by type – so for example the ring substitutions are grouped separately from the linker replacements. Compounds are also sorted by commonality of replacement: for example, published data reveals that the most common replacement of a methyl group is a chlorine atom, followed by a methoxy group, with an amine way down the list.
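The grouping and ranking described above could look something like this rough sketch. The replacement counts are made-up placeholders chosen only to reproduce the order mentioned in the post (chlorine, then methoxy, then amine). The field names and compound labels are hypothetical too.

```python
from collections import defaultdict

# Placeholder frequencies for (old group, new group); not real data.
REPLACEMENT_COUNTS = {
    ("methyl", "chloro"): 120,
    ("methyl", "methoxy"): 90,
    ("methyl", "amino"): 10,
}


def group_and_rank(hits, counts):
    """Group hits by change type, then sort each group by replacement frequency."""
    grouped = defaultdict(list)
    for hit in hits:
        grouped[hit["change_type"]].append(hit)
    for group in grouped.values():
        # Most common replacements first; unknown replacements count as zero.
        group.sort(key=lambda h: counts.get((h["old"], h["new"]), 0), reverse=True)
    return dict(grouped)


hits = [
    {"name": "amino analog", "change_type": "ring substitution", "old": "methyl", "new": "amino"},
    {"name": "ether-linked analog", "change_type": "linker replacement", "old": "bond", "new": "ether"},
    {"name": "chloro analog", "change_type": "ring substitution", "old": "methyl", "new": "chloro"},
    {"name": "methoxy analog", "change_type": "ring substitution", "old": "methyl", "new": "methoxy"},
]

ranked = group_and_rank(hits, REPLACEMENT_COUNTS)
ring_order = [h["name"] for h in ranked["ring substitution"]]
```

Keeping the grouping separate from the ranking means the frequency table can be swapped out without touching the rest of the search.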

The researchers applied the Fragment Network retrospectively to two previously disclosed programs, campaigns against protein kinase B and HCV NS3. In both cases the tool identified most of the changes explored by the medicinal chemists on the project, as well as some that were not tested. Of course, the best fragments are often not available and need to be synthesized, and the grouping of results returned by the Fragment Network quickly highlights these regions of less-populated chemical space.

Those of you who have seen Astex researchers present at conferences will be familiar with AstexViewer, a powerful open-source molecular visualization program. Hopefully the code for the Fragment Network will also be publicly released. If not, it might be worth talking to your computationally gifted colleagues to see if they can create something similar. In the meantime, how many of you are already using a comparable tool?
