This research applies large language models to decode and design proteins by treating amino acid sequences as biological languages. By identifying hidden structural and functional patterns across massive protein datasets, the work enables creation of novel proteins for medicine, cancer therapy, carbon capture, and environmental remediation beyond naturally evolved biological systems.

This research uses natural language processing techniques to uncover evolutionary relationships between ancient proteins. By analyzing contextual patterns among amino acids, the new computational tool can identify connections between proteins that diverged billions of years ago, helping scientists reconstruct the history of early microbial life and Earth’s biological evolution.