"How do you find one from a million compounds?"
Sena stared at the database, overwhelmed.
"Similarity search," Akira answered.
"Similarity?"
"Find compounds similar to a known active compound. If they're structurally similar, they're likely to have similar activity."
Eiji added. "The similarity principle. It's the foundation of cheminformatics."
"But how do you judge 'similar'?"
Akira began explaining. "Molecular descriptors. Numbers that represent a compound's features."
"What features?"
"Molecular weight, LogP, hydrogen bond donor count, acceptor count..."
"That's all?"
"No, there are more complex descriptors. Like fingerprints."
"Fingerprints?"
"They represent a molecule's substructures as a bit string. For example, a bit is 1 if the molecule contains a benzene ring and 0 if it doesn't."
Sena understood. "Digitizing structure."
"Yes. There are various types, such as ECFP and MACCS keys."
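The bit-string idea can be sketched with a toy substructure-key fingerprint. This is a minimal illustration, not ECFP or MACCS: the key list `SUBSTRUCTURE_KEYS` and the function `keys_fingerprint` are hypothetical, and each compound is assumed to arrive as a set of substructure names that were already detected, whereas real toolkits detect them from the structure itself.

```python
# Hypothetical substructure keys; each position corresponds to one bit.
SUBSTRUCTURE_KEYS = ("benzene_ring", "hydroxyl", "carboxylic_acid", "amine", "halogen")


def keys_fingerprint(features: set[str],
                     keys: tuple[str, ...] = SUBSTRUCTURE_KEYS) -> list[int]:
    """Return one bit per key: 1 if the compound has that substructure, else 0."""
    return [1 if key in features else 0 for key in keys]


# Assumed input: substructures already identified in a phenol-like compound.
phenol_like = {"benzene_ring", "hydroxyl"}
print(keys_fingerprint(phenol_like))  # [1, 1, 0, 0, 0]
```

Real fingerprints use hundreds or thousands of bits, but the principle is the same: structure in, fixed-length row of 0s and 1s out.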
Eiji demonstrated. "Calculate the fingerprint of this active compound."
A row of 0s and 1s appeared on screen.
"Next, calculate Tanimoto similarity with the entire database."
"Tanimoto?"
"Count the bits set in both fingerprints and divide by the number of bits set in either one. The result is between 0 and 1."
"Closer to 1 means more similar?"
"Correct."
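Eiji's definition is the Tanimoto (Jaccard) coefficient on bit vectors, T = c / (a + b − c), where a and b are the bits set in each fingerprint and c the bits set in both. A minimal sketch over equal-length 0/1 lists; returning 0.0 when neither fingerprint has any bit set is a convention chosen here, not a universal rule.

```python
def tanimoto(fp_a: list[int], fp_b: list[int]) -> float:
    """Tanimoto coefficient c / (a + b - c) for two equal-length 0/1 fingerprints."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    both = sum(1 for x, y in zip(fp_a, fp_b) if x and y)    # c: bits set in both
    either = sum(1 for x, y in zip(fp_a, fp_b) if x or y)   # a + b - c: bits set in either
    if either == 0:
        return 0.0  # convention chosen here for two all-zero fingerprints
    return both / either


print(tanimoto([1, 1, 0, 1, 0], [1, 0, 0, 1, 1]))  # 0.5
```

Here two bits are shared and four bits are set in at least one fingerprint, so the score is 2 / 4 = 0.5. Identical fingerprints score 1.0; fingerprints with no shared bits score 0.0.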
The calculation began. One million compounds, finished in minutes.
"Extract the top 100 compounds," Akira said, working the keyboard.
A list appeared. Every Tanimoto coefficient was above 0.7.
"Hit compounds are likely among these."
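The screening step Akira runs (score every database entry against the query, keep the best hits above a cutoff) can be sketched as follows. The toy database, the compound IDs, and the helper `top_k_similar` are illustrative assumptions; a real run would take fingerprints from a cheminformatics toolkit and a far larger collection.

```python
import heapq


def tanimoto(fp_a: list[int], fp_b: list[int]) -> float:
    """Bits set in both / bits set in either; 0.0 if both are empty (convention)."""
    both = sum(1 for x, y in zip(fp_a, fp_b) if x and y)
    either = sum(1 for x, y in zip(fp_a, fp_b) if x or y)
    return both / either if either else 0.0


def top_k_similar(query: list[int], database: dict[str, list[int]],
                  k: int = 100, cutoff: float = 0.7) -> list[tuple[str, float]]:
    """Return up to k (compound_id, score) pairs with score >= cutoff, best first."""
    scored = ((cid, tanimoto(query, fp)) for cid, fp in database.items())
    passing = ((cid, s) for cid, s in scored if s >= cutoff)
    # nlargest keeps only k items in memory, which matters for large databases.
    return heapq.nlargest(k, passing, key=lambda item: item[1])


# Hypothetical query and a tiny stand-in for the million-compound database.
query = [1, 1, 0, 1, 0, 1]
database = {
    "CMPD-001": [1, 1, 0, 1, 0, 1],  # identical to the query
    "CMPD-002": [1, 1, 0, 1, 0, 0],
    "CMPD-003": [0, 0, 1, 0, 1, 0],
    "CMPD-004": [1, 1, 1, 1, 0, 1],
}
print(top_k_similar(query, database, k=2))  # [('CMPD-001', 1.0), ('CMPD-004', 0.8)]
```

CMPD-002 (0.75) also passes the cutoff but falls outside the top two; CMPD-003 shares no bits with the query and is dropped.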
Sena had a question. "But if too similar, isn't it no different from the known compound?"
"Sharp," Eiji said, impressed. "That's why moderate similarity is ideal."
"Moderate?"
"Around 0.5 to 0.7. It keeps the core features while adding novelty."
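Eiji's "moderate similarity" amounts to a window filter rather than a single cutoff. A minimal sketch assuming the Tanimoto scores are already computed; the 0.5 and 0.7 bounds come from the dialogue, while the helper name `in_similarity_window`, the compound IDs, and the scores are made up for illustration.

```python
def in_similarity_window(scores: dict[str, float],
                         low: float = 0.5, high: float = 0.7) -> list[str]:
    """Return IDs whose Tanimoto score lies in [low, high], highest score first."""
    hits = [cid for cid, s in scores.items() if low <= s <= high]
    return sorted(hits, key=lambda cid: scores[cid], reverse=True)


# Hypothetical precomputed Tanimoto scores against the known active compound.
scores = {"CMPD-A": 0.92, "CMPD-B": 0.65, "CMPD-C": 0.41, "CMPD-D": 0.55}
print(in_similarity_window(scores))  # ['CMPD-B', 'CMPD-D']
```

Near-copies above the window (CMPD-A) and weakly related compounds below it (CMPD-C) are both dropped.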
Akira introduced another method. "There's also a technique called scaffold hopping."
"Scaffold?"
"The molecular skeleton. Replace it with a completely different skeleton."
"Not similar, but keeps activity?"
"Yes. If the three-dimensional shape and pharmacophore are similar, a compound can stay active even with a different skeleton."
Sena was confused. "But that's not similarity search, right?"
"Not structural similarity, but shape similarity."
Eiji added. "That's why we use 3D fingerprints. Descriptors that take conformation into account."
"Complex..."
"But powerful. Effective for working around existing patents and securing novelty."
Akira widened the range to 0.5 to 0.7 and enlarged one of the compounds.
"This is our candidate. Tanimoto coefficient 0.65."
"Just the right similarity."
"Yes. But this isn't the end."
"There's more?"
Eiji explained. "Validate with docking. Even high similarity doesn't guarantee binding."
"Why?"
"Because two-dimensional similarity and three-dimensional compatibility are different."
Akira continued. "So similarity search is the first screening stage. Then docking, then experiments."
Sena summarized in her notebook. "Similarity search → docking → experiments. Narrow down stepwise."
"Yes. Candidates decrease at each stage. Start with a million, end with under ten."
"Efficient."
"But," Eiji warned, "not perfect. Some compounds are missed by similarity search."
"Missed?"
"Compounds completely dissimilar to known ones but active. Novel scaffolds."
"Can't find those?"
"Difficult with similarity search. So we also use random screening or phenotypic screening."
Akira concluded. "Similarity search is one tool. Not omnipotent, but useful."
Sena gazed at the screen: a sea of a million compounds, with a single hit sunk somewhere within it. They were following the thread of similarity to pull it up.
"I hope the compound we find really works."
"Experiments will tell us," Eiji smiled.
The journey of similarity search had just begun.