Short Story ◈ Drug Design

A Single Hit Compound Sunk in the Sea of Similarity Search

The techniques and challenges of using similarity search to find promising candidates in vast compound libraries.

  • #similarity search
  • #chemical space
  • #fingerprints
  • #screening

"How do you find one compound among a billion?"

Sena stared at the screen, overwhelmed.

"Similarity search," Akira answered. "Look for things similar to known active compounds."

"Similar, meaning?"

Eiji joined in. "Structural similarity. Quantified with molecular fingerprints."

Sena was intrigued. "Fingerprints?"

Akira drew a diagram. "Represent a compound's features as a bit string. For example, one bit for whether there's a benzene ring, another for a hydroxyl group."

"And that shows similarity?"

"Compare bit strings. More matching bits means more similar."

Eiji opened a database. "This is PubChem. Over 100 million compounds registered."

"Search all of them?" Sena was surprised.

"Efficient algorithms exist. But computational cost is still high."

Akira showed a concrete example. "This compound is active at nanomolar concentrations. We want to find compounds similar to it."

"Similarity threshold?"

"Tanimoto coefficient above 0.7. That is, over 70 percent feature match."

Eiji executed the search. "Results... 12,000 hits."

"Too many!" Sena said.

"So we narrow down further. Drug-likeness, synthetic accessibility, intellectual property status."

Akira added, "There are criteria like Lipinski's Rule of Five."

"Molecular weight below 500, LogP below 5..." Sena wrote in her notebook.

Eiji applied filters. "Down to 200 now."

"Still many."

"From here, human judgment. Look at structures one by one."

Akira scrolled through the screen. "This compound looks good."

"What's good about it?" Sena asked.

"Core structure is the same, but substituents differ. Has novelty."

Eiji analyzed. "But synthesis looks difficult. Can we control this stereochemistry?"

"I think it's possible. With Suzuki coupling and asymmetric reduction."

Sena asked, "Similarity alone doesn't guarantee activity?"

"Exactly right," Akira admitted. "Similar doesn't always mean active."

"Then why search?"

"A probability problem. Much higher probability of finding active compounds than random."

Eiji showed statistics. "Random screening: 0.01 percent hit rate. Similarity search: 5 percent."

"500 times!"

"That's why it's efficient."

Akira pointed out another challenge. "But similarity search has blind spots."

"Blind spots?"

"Can't do scaffold hopping. Can't find compounds with completely different backbones but same activity."

Eiji showed an example. "These two have same activity but totally different structures."

"Similarity is... below 20 percent."

"This can't be found with similarity search."

Sena thought. "Then what do we do?"

"Pharmacophore search," Akira proposed. "Search by pharmacological features, not structure."

"3D arrangement of hydrogen bond donors, acceptors, hydrophobic regions."

Eiji added, "Also, machine learning. Learn nonlinear relationships between structure and activity."

"With deep learning?"

"Graph Neural Networks are promising. Treat molecules as graphs."

Sena organized her notes. "Similarity search, pharmacophore search, machine learning..."

"Each has strengths and weaknesses," Akira said. "Smart to use them in combination."

Eiji showed results. "These are final candidates. Selected five."

"Few," Sena said.

"Experimental resources are limited. Need to choose carefully."

Akira pointed to one compound. "This is interesting. Moderate similarity but pharmacophore matches perfectly."

"Machine learning prediction score is also high," Eiji confirmed.

"Then we synthesize this with top priority."

Sena was moved. "We found one pearl in a sea of more than a hundred million."

"Don't know if it's a pearl yet," Akira smiled. "Won't know until we experiment."

"But much better than searching randomly."

Eiji said, "That's the value of computational drug design. Efficiently explore chemical space."

Sena stared at the screen. Countless points, each one a compound, spread out across a high-dimensional space.

"How far does this sea extend?"

"Theoretically, over 10 to the 60th power," Akira answered. "Essentially infinite."

"But," Eiji continued, "from that infinity, we carve out meaningful subspaces. That's our job."

Sena felt like an explorer navigating the sea of chemical space with similarity as her compass.