Short Story ◈ Drug Design

The Blueprint for Translation Efficiency Hidden in mRNA Sequences

Exploring strategies to improve translation efficiency through mRNA sequence optimization from a machine learning perspective.

  • #mRNA optimization
  • #codon optimization
  • #translation efficiency
  • #machine learning

"They code for the same protein, but expression levels differ by 10-fold."

Mikhail was comparing two sequences.

"Codon differences?" Eiji guessed.

"Yes. Degeneracy of the genetic code. Same amino acid, multiple codons."

Lina showed interest. "But if the result is the same protein, why do expression levels differ?"

"Because translation efficiency differs. Not all codons are equally efficient."

Mikhail showed a table. "This is codon usage frequency. Preferred codons differ by species."

"E. coli and mammals are completely different," Lina observed.

"Right. Put codons that are common in E. coli into a mammalian cell, and the matching tRNAs are scarce, so translation slows down."

Eiji understood. "So optimize codons to match the expression system."

"Correct. This is codon optimization."

Lina asked, "Simply replace with high-frequency codons?"
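What Lina is asking about can be sketched in a few lines. This is only a minimal illustration, assuming a tiny codon-usage table with made-up frequencies (`CODON_USAGE` and `naive_codon_optimize` are hypothetical names, not anything shown on Mikhail's screen): every codon is swapped for the most frequent synonymous codon, and the protein stays the same.

```python
# Toy codon-usage table for a hypothetical host.
# The frequencies are made up for illustration, not measured values.
CODON_USAGE = {
    "M": {"AUG": 1.00},
    "L": {"CUG": 0.40, "CUC": 0.20, "UUA": 0.07},
    "K": {"AAG": 0.57, "AAA": 0.43},
}

# Reverse lookup: codon -> amino acid.
CODON_TO_AA = {codon: aa for aa, codons in CODON_USAGE.items() for codon in codons}


def translate(cds: str) -> str:
    """Translate a coding sequence using the toy table above."""
    return "".join(CODON_TO_AA[cds[i:i + 3]] for i in range(0, len(cds), 3))


def naive_codon_optimize(cds: str) -> str:
    """Replace every codon with the most frequent synonymous codon."""
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    optimized = []
    for i in range(0, len(cds), 3):
        aa = CODON_TO_AA[cds[i:i + 3]]
        usage = CODON_USAGE[aa]
        optimized.append(max(usage, key=usage.get))
    return "".join(optimized)


original = "AUGUUAAAA"  # Met-Leu-Lys written with rare codons
optimized = naive_codon_optimize(original)
print(original, "->", optimized)
```

The sketch looks only at codon frequency and nothing else.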

"Not that simple," Mikhail smiled. "Many other factors exist."

"For example?"

"mRNA secondary structure. Need to avoid sequences where ribosomes get stuck."

Mikhail showed folding predictions. Complex stem-loop structures.

"This structure hides the translation initiation site. That's why expression is low."

Eiji proposed, "Then change codons to disrupt secondary structure."

"Yes. But we can't change the protein sequence. Must choose among synonymous codons."

Lina thought. "A constrained optimization problem."
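The constraint can be made concrete with a small sketch: enumerate only synonymous variants of a short peptide and keep the one with the least self-complementarity. Everything here is an assumption for illustration. `hairpin_proxy` is a crude stand-in for a real folding prediction (it ignores G-U wobble pairs, loop length, and energies), and the `SYNONYMS` table covers just four amino acids.

```python
from itertools import product

# Synonymous codons for a few amino acids (standard genetic code, RNA alphabet).
SYNONYMS = {
    "M": ["AUG"],
    "G": ["GGU", "GGC", "GGA", "GGG"],
    "P": ["CCU", "CCC", "CCA", "CCG"],
    "K": ["AAA", "AAG"],
}
CODON_TO_AA = {c: aa for aa, codons in SYNONYMS.items() for c in codons}
COMPLEMENT = str.maketrans("AUGC", "UACG")


def translate(rna: str) -> str:
    return "".join(CODON_TO_AA[rna[i:i + 3]] for i in range(0, len(rna), 3))


def hairpin_proxy(rna: str, min_stem: int = 3) -> int:
    """Length of the longest stretch whose reverse complement occurs downstream.

    A rough proxy for stem-forming potential, not a folding prediction.
    """
    best = 0
    n = len(rna)
    for i in range(n):
        for j in range(i + min_stem, n + 1):
            partner = rna[i:j].translate(COMPLEMENT)[::-1]
            if partner in rna[j:]:
                best = max(best, j - i)
    return best


def all_synonymous_variants(protein: str):
    """Yield every mRNA that encodes exactly this protein."""
    for codons in product(*(SYNONYMS[aa] for aa in protein)):
        yield "".join(codons)


def least_structured_variant(protein: str) -> str:
    """Search only inside the synonymous space, so the protein never changes."""
    return min(all_synonymous_variants(protein), key=hairpin_proxy)


best = least_structured_variant("MGPK")
print(best, hairpin_proxy(best), translate(best))
```

Exhaustive enumeration only works for toy peptides, because the number of synonymous variants grows exponentially with protein length.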

"Exactly. And recently, we solve it with machine learning."

Mikhail showed a model diagram. "Input is mRNA sequence, output is predicted translation efficiency."

"What kind of model?"

"Transformer-based deep learning. Can capture sequence context."

Eiji showed interest. "Training data?"

"Large-scale expression experiment data. Tens of thousands of mRNA sequences with their expression levels."

Lina asked, "But don't other factors affect expression levels?"

"They do. mRNA stability, 5'UTR, 3'UTR, poly-A tail length. All influence it."

"Isn't that too complex?"

"It is complex. That's why deep learning is suitable. Can learn nonlinear relationships."

Mikhail explained the optimization process. "We search over sequences, using the model's predicted translation efficiency as the score."

"Genetic algorithm?" Eiji guessed.

"That works too. But recently, people have also been trying gradient-based optimization."

"Gradients with discrete sequences?"

"Relax the one-hot encoding into continuous values and compute gradients through the model. Then pick the synonymous codon that the gradient says will improve the prediction most."

Lina was impressed. "Mathematically beautiful."
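The gradient trick can be sketched without any deep learning library. The toy below swaps the Transformer for a hypothetical differentiable surrogate with a per-codon term and a neighbouring-codon term. All parameters (`U`, `V`) are made up, and `optimize` is an illustrative name rather than the team's actual method. At each step the gradient with respect to the relaxed one-hot input estimates the gain of every synonymous swap, and the best one is applied.

```python
# Hypothetical toy setup: two amino acids and made-up model parameters.
SYNONYMS = {"K": ["AAA", "AAG"], "L": ["CUG", "CUC", "UUA"]}
CODON_TO_AA = {c: aa for aa, codons in SYNONYMS.items() for c in codons}
U = {"AAA": 0.2, "AAG": 0.6, "CUG": 0.9, "CUC": 0.4, "UUA": 0.1}     # per-codon term
V = {"AAA": -0.5, "AAG": 0.3, "CUG": 0.4, "CUC": -0.2, "UUA": -0.6}  # neighbour term


def one_hot(codons):
    """Encode each position as {codon: weight}; weights may be relaxed to any real."""
    return [{c: 1.0} for c in codons]


def _site_values(x):
    return [sum(w * V[c] for c, w in pos.items()) for pos in x]


def score(x):
    """Surrogate 'translation efficiency' of a (possibly relaxed) encoding."""
    s = _site_values(x)
    single = sum(w * U[c] for pos in x for c, w in pos.items())
    pair = sum(s[i] * s[i + 1] for i in range(len(x) - 1))
    return single + pair


def gradient(x):
    """Analytic d(score)/d(x[i][c]) for every position i and codon c."""
    s = _site_values(x)
    grads = []
    for i in range(len(x)):
        left = s[i - 1] if i > 0 else 0.0
        right = s[i + 1] if i + 1 < len(x) else 0.0
        grads.append({c: U[c] + V[c] * (left + right) for c in U})
    return grads


def optimize(codons, max_steps=50):
    """Greedy gradient-guided search restricted to synonymous swaps."""
    codons = list(codons)
    for _ in range(max_steps):
        g = gradient(one_hot(codons))
        best_gain, best_move = 1e-12, None
        for i, current in enumerate(codons):
            for alt in SYNONYMS[CODON_TO_AA[current]]:
                gain = g[i][alt] - g[i][current]  # first-order estimate of the swap
                if gain > best_gain:
                    best_gain, best_move = gain, (i, alt)
        if best_move is None:
            break  # no synonymous swap is predicted to help
        i, alt = best_move
        codons[i] = alt
    return codons


start = ["AAA", "UUA", "AAA", "CUC"]
result = optimize(start)
print(score(one_hot(start)), "->", score(one_hot(result)), result)
```

With a real neural model the gradient would come from automatic differentiation and the first-order estimate would only be approximate, so each proposed swap would need to be re-scored before it is accepted.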

Mikhail opened another screen. "This is the sequence before and after optimization."

"Visually... almost the same?" Eiji said.

"The amino acid sequence is completely identical. But 30 percent of the codons changed."

"Results?"

"Translation efficiency improved 5-fold. Reached pharmaceutical-grade expression levels."

Lina turned to practical use. "Is this technology also used in mRNA vaccines?"

"Yes. COVID-19 vaccine mRNA is also thoroughly optimized."

"What kind of optimization?"

"Codon optimization, secondary structure adjustment, modified nucleotide incorporation."

Eiji added, "Reducing immunogenicity is also important."

"Correct. Making mRNA less recognizable to the immune system."

Mikhail showed another example. "These code for the same spike protein. Wild-type and vaccine-type sequences."

"G and C content is totally different," Lina noticed.

"Increasing GC content stabilizes mRNA. But push it too high and you get more secondary structure."

"Another trade-off," Eiji said.

"Everything is a balance. This is multi-objective optimization."

Lina asked, "The objective function?"

"Maximize translation efficiency, minimize secondary structure, keep GC content within an appropriate range, minimize immunogenicity."

"All simultaneously?"

"Search for Pareto optimal solutions. Can't maximize everything, but find well-balanced solutions."

Eiji was impressed. "mRNA design is this deep."
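The Pareto search Mikhail describes can be sketched as a simple dominance filter. The candidate names and objective values below are invented for illustration, and the 0.4 to 0.6 GC window in `gc_penalty` is an assumption, not a recommended range. Objectives to be minimized are negated so that every entry is "higher is better".

```python
def gc_content(rna: str) -> float:
    """Fraction of G and C nucleotides in the sequence."""
    return (rna.count("G") + rna.count("C")) / len(rna)


def gc_penalty(rna: str, low: float = 0.4, high: float = 0.6) -> float:
    """Distance of GC content from an assumed acceptable window (0 if inside)."""
    gc = gc_content(rna)
    return max(low - gc, 0.0, gc - high)


def dominates(a, b) -> bool:
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates: dict) -> dict:
    """Keep the candidates that no other candidate dominates."""
    return {
        name: objs
        for name, objs in candidates.items()
        if not any(dominates(other, objs)
                   for other_name, other in candidates.items()
                   if other_name != name)
    }


# Hypothetical designs: (predicted efficiency, -structure score, -GC penalty).
designs = {
    "A": (0.9, -0.8, -0.0),
    "B": (0.7, -0.3, -0.0),
    "C": (0.6, -0.5, -0.1),  # worse than B on every objective
    "D": (0.9, -0.9, -0.0),  # same efficiency as A, more structure
}
print(sorted(pareto_front(designs)))
```

The filter does not pick a single winner: A and B both survive because each is better on a different objective, and choosing between them is the balancing act Mikhail describes.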

"Optimizing the language of life," Mikhail said quietly. "A blueprint is hidden in sequence information."

Lina smiled. "Deciphering hidden blueprints. That's our job."

"Yes. And drawing better blueprints."

The three saw infinite possibilities in the four-letter language of mRNA sequences.