Physics-Guided Protein Engineering
Physics-Informed AI for Intelligent Protein Engineering: The Mutexa Platform
Our research goal is to develop Mutexa [a], a physics-informed artificial intelligence (AI) platform for “intelligent” protein engineering that enables researchers to identify function-enhancing mutants while uncovering the molecular origins of unexpected experimental outcomes. Protein engineering, despite decades of progress, remains reliant on labor- and resource-intensive experimental screening, which delivers only “what you screen for” and offers little insight into the structure–function relationships underlying mutation effects. While AI is widely recognized for its potential to accelerate protein engineering, I doubt that AI alone can yield generalizable models, given the inherently small and sparse datasets in biomolecular research. Furthermore, purely data-driven models contribute little to advancing our knowledge of protein biochemistry and biophysics. I envision that the future of “intelligent protein engineering” lies in combining AI with physics-based modeling, where molecular-level electronic structure, protein dynamics, electrostatics, and other microscopic features illuminate the sequence–function relationships underlying mutation effects, enabling the creation of generalizable, high-fidelity models for computational protein engineering.
To realize this vision, my group built Mutexa in three layers. We established the data backbone with IntEnzyDB [b], a relational sequence–structure–kinetics enzymology database, and with a large-language-model (LLM)–plus–human curation pipeline that extracts enzyme kinetic data from >130,000 literature articles [c]. At the heart of Mutexa is EnzyHTP, a high-throughput molecular modeling engine that automates molecular mechanical and quantum mechanical simulations of enzymes, enabling end-to-end molecular characterization of large mutant libraries [d,e]. Built on this stack are interpretable, predictive scoring functions that turn physical principles and data patterns into quantifiable functional metrics, including a molecular-dynamics–derived descriptor that links substrate-positioning dynamics to catalytic turnover [g] and EnzyKR, a chirality-aware deep-learning model for predicting outcomes of hydrolase-catalyzed kinetic resolution [f]. Collectively, the foundational tools behind Mutexa have been used by more than 60 non-Vanderbilt research groups and accessed over 100,000 times, underscoring their broad impact.
Rather than optimizing a single algorithm’s accuracy versus efficiency (a common focus in computational chemistry), Mutexa is designed to harness molecular modeling and machine learning to decode the molecular complexity of enzyme catalysis and to discover enzymes and mutants that elude state-of-the-art screening alone. Although developed for protein engineering, the underlying infrastructure, such as the kinetic data extractor, the integrated databases, and the EnzyHTP engine for high-throughput enzyme modeling, has far-reaching potential to enable scientific discovery and deepen mechanistic insight across structural biology, biochemistry, and molecular evolution.
a. Yang, Z. J.*; Shao, Q.; Jiang, Y.; Jurich, C.; Ran, X.; Juarez, R.; Yan, B.; Stull, S.; Gollu, A.; Ding, N. “Mutexa: A Computational Ecosystem for Intelligent Protein Engineering” Journal of Chemical Theory and Computation, 2023, 19, 7459–7477.
b. Yan, B.; Ran, X.; Gollu, A.; Cheng, Z.; Zhou, X.; Chen, Y.; Yang, Z. J.* “IntEnzyDB: an Integrated Structure-Kinetics Enzymology Database” Journal of Chemical Information and Modeling, 2022, 62, 5841–5848.
c. Wei, G.; Ran, X.; Al-Abssi, R.; Yang, Z. J.* “Finding the Dark Matter: Large Language Model-based Enzyme Kinetic Data Extractor and Its Validation” Protein Science, 2025, 34,
d. Shao, Q.; Jiang, Y.; Yang, Z. J.* “EnzyHTP Computational Directed Evolution with Adaptive Resource Allocation” Journal of Chemical Information and Modeling, 2023, 63, 5650–5659.
e. Shao, Q.; Jiang, Y.; Yang, Z. J.* “EnzyHTP: A High-Throughput Computational Platform for Enzyme Modeling” Journal of Chemical Information and Modeling, 2022, 62, 647–655.
f. Ran, X.; Jiang, Y.; Shao, Q.; Yang, Z. J.* “EnzyKR: A Chirality-Aware Deep Learning Model for Predicting the Outcomes of the Hydrolase-Catalyzed Kinetic Resolution” Chemical Science, 2023, 14, 12073–12082.
g. Jiang, Y.; Yan, B.; Chen, Y.; Juarez, R. J.; Yang, Z. J.* “Molecular Dynamics-Derived Descriptor Informs the Impact of Mutation on the Catalytic Turnover Number in Lactonase Across Substrates” Journal of Physical Chemistry B, 2022, 126, 2486–2495.
Turning Mechanism into Action: Mutexa-Enabled Understanding, Prediction, and Design
Building on Mutexa’s foundational toolkits, my group leverages physics-based modeling to translate mechanistic insight into deployable tools and validated designs. In a recent perspective, we articulated how the “new era” of computational enzyme engineering will be driven by hybrid, mechanism-aware models that identify beneficial mutants from large libraries while preserving physical interpretability [a]. Anchored in this vision, we developed SubTuner, a physics-based workflow that retunes enzymatic substrate scope toward non-native targets [b]. Despite decades of mechanistic investigation into the molecular origin of enzyme catalysis, it has remained unclear how to leverage molecular mechanics and quantum mechanics to virtually identify beneficial enzyme mutants that accommodate a desired non-native substrate. SubTuner was rigorously tested for accuracy, generalizability, and a priori predictivity on three tasks, each involving hundreds of anion methyltransferase mutants screened for synthesizing S-adenosyl-L-methionine analogs. SubTuner delivered superior accuracy, speed, and generalizability relative to state-of-the-art bioinformatics and machine-learning approaches, identified function-enhancing mutations in a substrate-specific manner, and revealed the mechanisms by which the beneficial mutations improve catalysis. These results demonstrate how physical hypotheses and high-throughput molecular modeling can be fused into a practical design workflow for modifying enzymes toward new-to-nature substrates.
Furthermore, we leveraged Mutexa to elucidate molecular design principles for endowing natural bidomain enzymes with cold adaptation and demonstrated, both computationally and experimentally, its potential for designing cold-adapted enzymes [c]. Cold-adapted bidomain enzymes can help transform modern industries by decreasing energy consumption, reducing greenhouse gas emissions, and fostering sustainability. However, the atomistic principles guiding their acquisition of cold adaptation had been elusive. Comparing the mesophilic Pseudomonas saccharophila amylase (psA) with the naturally psychrophilic Saccharophagus degradans amylase (sdA), we showed that enhanced separation between the catalytic domain and the carbohydrate-binding module, as quantified by a domain separation index (DSI), is the key structural feature that converts a mesophilic scaffold into a cold-adapted one. Guided by DSI, we used Mutexa_EnzyHTP to generate 3,528 psA linker variants in silico, selected 120 for MD modeling, and identified a cold-adapted 15-aa helical-linker variant (psA121) that maintains ~30% relative activity after a 45 °C drop from its optimum, outperforming the mesophilic wild-type psA by threefold.
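To make the idea of ranking linker variants by a separation metric concrete, the following is a minimal sketch only. The text above does not give the DSI formula, so the index used here (centroid distance between the two domains divided by the sum of their radii of gyration, averaged over trajectory frames) is a hypothetical stand-in, and the names `separation_index` and `rank_variants` are illustrative rather than part of Mutexa or EnzyHTP.

```python
import math
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]
# One frame = (catalytic-domain coordinates, binding-module coordinates)
Frame = Tuple[Sequence[Coord], Sequence[Coord]]


def centroid(coords: Sequence[Coord]) -> Coord:
    """Unweighted geometric center of a set of atom coordinates."""
    n = len(coords)
    return (
        sum(c[0] for c in coords) / n,
        sum(c[1] for c in coords) / n,
        sum(c[2] for c in coords) / n,
    )


def radius_of_gyration(coords: Sequence[Coord]) -> float:
    """Root-mean-square distance of the atoms from their centroid."""
    center = centroid(coords)
    return math.sqrt(sum(math.dist(c, center) ** 2 for c in coords) / len(coords))


def separation_index(domain_a: Sequence[Coord], domain_b: Sequence[Coord]) -> float:
    """ASSUMED metric, not the published DSI: centroid distance scaled by domain size.

    Each domain needs at least two distinct atoms so the denominator is nonzero.
    """
    distance = math.dist(centroid(domain_a), centroid(domain_b))
    return distance / (radius_of_gyration(domain_a) + radius_of_gyration(domain_b))


def rank_variants(trajectories: Dict[str, List[Frame]]) -> List[Tuple[str, float]]:
    """Average the index over each variant's frames; most separated variant first."""
    scores = {
        name: sum(separation_index(a, b) for a, b in frames) / len(frames)
        for name, frames in trajectories.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy single-frame "trajectories" with two atoms per domain (made-up numbers).
    catalytic = [(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    compact = [(catalytic, [(3.0, 0.0, 0.0), (5.0, 0.0, 0.0)])]
    extended = [(catalytic, [(9.0, 0.0, 0.0), (11.0, 0.0, 0.0)])]
    print(rank_variants({"compact": compact, "extended": extended}))
```

In a screening setting of the kind described above, a ranking like this would serve only to shortlist variants for more expensive MD modeling; the choice of metric and any cutoff would need to follow the published definition.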
Finally, we demonstrated Mutexa’s transferability beyond enzymes to the high-throughput modeling [d] and structure prediction [e] of ribosomally synthesized and post-translationally modified peptides (RiPPs). Lasso peptides (LaPs), characterized by their entangled slipknot-like structures, are a large class of RiPPs, yet only around 50 distinct LaPs have been structurally characterized in the past 30 years. Existing computational tools, such as AlphaFold2, AlphaFold3, and ESMFold, fail to accurately predict LaP structures because of their irregular scaffolds and the presence of isopeptide bonds. To predict 3D structures of LaPs, we developed LassoPred [e], a specialized tool that combines a machine learning-based predictor, which annotates the ring, loop, and tail of an LaP sequence, with a molecular mechanics-based constructor, which builds the 3D structure. This integrated approach addresses the limitations of existing structure-prediction methods for these slipknot-like peptides. Leveraging LassoPred, we created the largest in silico database of LaP structures, encompassing 4,749 unique sequences previously curated from bioinformatic analyses. LassoPred is publicly accessible to enable studies of LaP structure–function relationships and to facilitate the discovery of functional peptides for antimicrobial and biomedical applications.
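The ring, loop, and tail annotation mentioned above can be pictured as a three-way split of the peptide sequence. The sketch below is illustrative only and is not LassoPred's interface: the function `split_lasso`, the `LassoAnnotation` container, and the idea of passing the segment lengths in by hand (in place of a trained predictor) are all assumptions. The one chemical check it applies is that the ring-closing residue is Asp or Glu, since the ring is sealed by an isopeptide bond between the N-terminal amine and an acidic side chain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LassoAnnotation:
    """Three contiguous segments of a lasso peptide sequence (hypothetical container)."""
    ring: str
    loop: str
    tail: str


def split_lasso(sequence: str, ring_len: int, loop_len: int) -> LassoAnnotation:
    """Split a sequence into ring, loop, and tail from given segment lengths.

    ring_len and loop_len stand in for the output of a predictor; here they
    are supplied by the caller (an assumption made for illustration).
    """
    if ring_len < 1 or loop_len < 0 or ring_len + loop_len > len(sequence):
        raise ValueError("segment lengths do not fit the sequence")
    closing_residue = sequence[ring_len - 1]
    if closing_residue not in "DE":
        # The isopeptide bond needs an Asp/Glu side-chain carboxylate.
        raise ValueError(f"ring-closing residue {closing_residue!r} is not Asp or Glu")
    loop_end = ring_len + loop_len
    return LassoAnnotation(
        ring=sequence[:ring_len],
        loop=sequence[ring_len:loop_end],
        tail=sequence[loop_end:],
    )


if __name__ == "__main__":
    # Illustrative sequence and split positions, chosen only for the demo.
    print(split_lasso("GGAGHVPEYFVGIGTPISFYG", ring_len=8, loop_len=10))
```

Keeping the annotation as an explicit, immutable record separates the sequence-level prediction from the later 3D construction step, so either half could be swapped out independently.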
Together, these advances illustrate a Mutexa-based paradigm for biomolecular study, progressing from understanding to predicting to designing, demonstrating how an integrated physics-AI approach can modify substrate specificity, shift biocatalytic temperature adaptation, and illuminate knotted peptide structures for functional discovery.
a. Jurich, C.; Shao, Q.; Ran, X.; Yang, Z. J.* “Physics-based Modeling in the New Era of Computational Enzyme Engineering” Nature Computational Science, 2025, 5, 279–291.
b. Shao, Q.; Hollenbeak, A. C.; Jiang, Y.; Bachmann, B. O.*; Yang, Z. J.* “SubTuner Leverages Physics-based Modeling to Complement AI in Enzyme Engineering towards Non-Native Substrates” Chem Catalysis, 2025, 5,
c. Ding, N.; Jiang, Y.; Ge, R.; Ran, X.; Shin, W.; Yang, Z. J.* “Enhance Cold Adaptation of Bidomain Amylases via High-throughput Computational Engineering” Angewandte Chemie International Edition, 2025, e
d. Juarez, R. J.; Jiang, Y.; Tremblay, M.; Shao, Q.; Link, A. J.; Yang, Z. J.* “LassoHTP: A High-throughput Computational Tool for Lasso Peptide Structure Construction and Modeling” Journal of Chemical Information and Modeling, 2023, 63, 522–530.
e. Ouyang, ; Ran, X.; Xu, H.; Zhao, Y.-L.*; Link, A. J.*; Yang, Z. J.* “LassoPred: a Tool to Predict the 3D Structure of Lasso Peptides” Nature Communications, 2025, 16, 5497.
