This research applies large language models to decode and design proteins by treating amino acid sequences as biological languages. By identifying hidden structural and functional patterns across massive protein datasets, the work enables creation of novel proteins for medicine, cancer therapy, carbon capture, and environmental remediation beyond naturally evolved biological systems.

The talk describes using AI language models to decipher the hidden “languages” within millions of natural protein sequences. By learning protein vocabulary, syntax, and grammar, researchers can design new molecules that fight cancer, degrade plastics, capture carbon, and expand biology beyond nature’s rules—advancing medicine, sustainability, and molecular engineering.

This research designs simplified, custom-built proteins to understand how natural proteins work and to create new biocatalysts. By choosing a desired function and designing the amino-acid sequence and structure from scratch, the project aims to develop clean, efficient protein-based alternatives to environmentally harmful industrial chemistry.