Etinosa Osaro

Chemical and Biomolecular Engineering

Faculty Advisor: Yamil Colón

Development of a Universal Adsorption Model for Efficient Energy Applications: Simultaneous Exploration of Molecule and Material Space using Active Learning

Metal-organic frameworks (MOFs) are a promising class of porous, crystalline materials for numerous energy-based applications. For instance, a MOF with the “right” adsorption properties could enable replacing a given thermal-based chemical separation process with an adsorption-based one, which could in turn bring up to a 10-fold increase in energy efficiency. As chemical separations account for roughly 15% of U.S. energy usage, and about 80% of these separations are done thermally, finding the “right” MOF for each separation could potentially reduce U.S. energy expenditure by around 11%. Given i) the overwhelmingly large MOF “design space” (with trillions of potential designs), and ii) the thousands of chemical separations, each of which could potentially be performed at a variety of operating conditions (OC, e.g., temperature, pressure, relative proportion of components), computation will play a central role in identifying, for each chemical separation, the most promising MOF and its corresponding “optimal” operating condition. The challenge is that classical simulation methods to predict adsorption (e.g., grand canonical Monte Carlo (GCMC)) are just “fast enough” to make thousands to hundreds of thousands of adsorption predictions in a reasonable timeframe. However, finding the optimal MOF-OC combination for each chemical separation of interest would probably entail trillions of adsorption predictions. Faster methods such as machine learning (ML) are therefore better suited to such tasks, but they require very large training datasets, which are themselves computationally expensive to generate. This calls for a better ML approach: one that yields an accurate predictive model trained on a relatively small dataset. We refer to this approach as active learning for adsorption simulations.
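The estimate of roughly 11% can be reproduced from the three numbers quoted above. The arithmetic below is one reading that is consistent with those numbers, not a derivation taken from a cited source; it assumes that every thermal separation is replaced and that the full 10-fold efficiency gain is realized in each case.

```latex
\[
\underbrace{0.15}_{\text{separations' share of U.S. energy}}
\times
\underbrace{0.80}_{\text{fraction done thermally}}
\times
\underbrace{\left(1 - \tfrac{1}{10}\right)}_{\text{saving at 10-fold efficiency}}
= 0.108 \approx 11\%
\]
```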

Active learning (AL) offers a strategy for navigating the "adsorption space" that reduces the cost of data generation while still training highly predictive ML models. In our prior research, we deployed a Gaussian process regression (GPR) framework to model the pure-component adsorption of nitrogen at 77 K from 10⁻⁵ to 1 bar, methane at 298 K from 10⁻⁵ to 100 bar, carbon dioxide at 298 K from 10⁻⁵ to 100 bar, and hydrogen at 77 K from 10⁻⁵ to 100 bar across eleven diverse MOFs. Within the GPR framework, an initial model is trained on a dataset known as the "prior." The model is then retrained each time a new adsorption data point is added to the dataset, and the point to add is chosen according to the uncertainty of the Gaussian process (GP) model evaluated on a new set of candidate points. We showed [1] that, with active learning, full adsorption isotherms (64 data points in this instance) can be predicted using a model trained with only 9 active learning data points, which illustrates the efficiency gained by applying active learning to adsorption simulations. Performing these simulations for four molecules and eleven MOFs yielded a total of 44 models. However, with thousands of molecules and millions of MOFs, the number of molecule-MOF combinations grows combinatorially and is far too large to model one pair at a time. The overarching goal of this research is therefore to build a unified model capable of navigating the entire landscape, covering the millions of synthesized and predicted MOFs along with thousands of molecules. Such a model would streamline MOF design for a multitude of energy applications.
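The prior-then-retrain loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation used in [1]: the GP is a hand-written zero-mean model with a fixed squared-exponential kernel, `toy_simulation` is a hypothetical Langmuir-type stand-in for a GCMC run, and the prior (the two end points of the pressure range) and the stopping rule (nine training points in total) are arbitrary choices made for the example.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between 1-D input arrays a and b."""
    diff = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (diff / length_scale) ** 2)

def gp_predict(x_train, y_train, x_test, length_scale=1.0, variance=1.0, jitter=1e-6):
    """Posterior mean and standard deviation of a zero-mean GP at x_test."""
    k_tt = rbf_kernel(x_train, x_train, length_scale, variance)
    k_tt += jitter * np.eye(len(x_train))          # numerical stabilizer
    k_ts = rbf_kernel(x_train, x_test, length_scale, variance)
    chol = np.linalg.cholesky(k_tt)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_ts.T @ alpha
    v = np.linalg.solve(chol, k_ts)
    var = variance - np.sum(v**2, axis=0)
    return mean, np.sqrt(np.clip(var, 0.0, None))

def toy_simulation(log10_p):
    """Hypothetical stand-in for a GCMC run: Langmuir-type fractional uptake."""
    p = 10.0 ** log10_p
    return p / (0.05 + p)

def active_learning(candidates, simulate, prior_idx, n_total):
    """Grow the training set one point at a time, always adding the candidate
    where the current GP is most uncertain."""
    chosen = list(prior_idx)
    while len(chosen) < n_total:
        x = candidates[chosen]
        # A real workflow would cache simulation results instead of recomputing.
        _, std = gp_predict(x, simulate(x), candidates)
        std[chosen] = -np.inf                      # never pick a point twice
        chosen.append(int(np.argmax(std)))
    return chosen

# 64 candidate pressures from 1e-5 to 1 bar, on a log10 scale
grid = np.linspace(-5.0, 0.0, 64)
chosen = active_learning(grid, toy_simulation, prior_idx=[0, 63], n_total=9)

x_train = grid[chosen]
mean, std = gp_predict(x_train, toy_simulation(x_train), grid)
max_error = float(np.max(np.abs(mean - toy_simulation(grid))))
print(f"trained on {len(chosen)} of {len(grid)} points, max abs error = {max_error:.3f}")
```

Because the kernel hyperparameters are fixed here, the GP uncertainty depends only on where the training points sit, so the selection is effectively space-filling. A production framework would typically refit the hyperparameters after each addition, which lets the observed uptake values influence where the next simulation is placed.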

Research Objectives

The outcomes of this project could mark a significant advance in chemical separations using MOFs. The ability to predict full adsorption isotherms for any MOF and gas, whether as pure components or mixtures, across diverse thermodynamic conditions would be highly valuable for screening MOFs for chemical separation processes and for identifying the optimal MOF candidates for specific separations. Preliminary findings support this vision: applying AL to the 44 adsorbate-adsorbent pairs showed that AL can accurately predict full adsorption isotherms with a small number of simulations. This early success motivates the three objectives outlined below, which together define the scope and goals of the research.

Objective 1: Unified Molecule Navigation: Simultaneous Exploration of All Molecules Across some MOFs. Real molecules (e.g., methane, nitrogen, benzene, water) can be modeled using defined combinations of intermolecular parameters (e.g., the effective well depth and the effective distance at which the intermolecular potential between two particles is zero) and intramolecular parameters (bond lengths and charges). Alchemical molecules, on the other hand, have arbitrary, random combinations of these parameters. A model built around alchemical molecules is therefore valuable for navigating real molecules, since each real molecule corresponds to one defined combination within the same parameter space. A previously developed multi-layer perceptron (MLP) model was trained on approximately 5 million GCMC data points covering pure-component adsorption of 200 alchemical adsorbates in 1800 topologically and chemically diverse ToBaCCo-generated MOFs at varying fugacities, with each MOF requiring a full adsorption isotherm dataset of about 2800 data points. This MLP model has demonstrated accuracy across a diverse set of real molecules [2]. Our objective is to use our established AL framework to make accurate predictions of full isotherms in each MOF with a smaller dataset. Preliminary results show a data savings of 57.5% through AL, indicating that only approximately 2.2 million simulations were necessary to train a new MLP model for adsorption across all 1800 MOFs. With these 2.2 million data points, a new MLP model under development is showing promising results in predicting the isotherms of real molecules within these 1800 MOFs.
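The two intermolecular parameters described above (a well depth and a zero-crossing distance) match the ε and σ of a Lennard-Jones 12-6 potential. The text does not name the functional form, so the expression below is an assumption, given only to make the roles of the two parameters concrete.

```latex
\[
U_{\mathrm{LJ}}(r) = 4\varepsilon\left[\left(\frac{\sigma}{r}\right)^{12} - \left(\frac{\sigma}{r}\right)^{6}\right],
\qquad U_{\mathrm{LJ}}(\sigma) = 0,
\qquad \min_{r} U_{\mathrm{LJ}}(r) = U_{\mathrm{LJ}}\!\left(2^{1/6}\sigma\right) = -\varepsilon
\]
```

Under this reading, an alchemical adsorbate is a point (ε, σ, bond length, charge) drawn freely from the parameter space, while a real molecule is one specific point in that same space.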

Objective 2: Integrated Molecule and MOF Navigation: Simultaneous Exploration of All Molecules and MOFs. Creating the new MLP model from the AL-generated data of Objective 1 entails running AL in each of the 1800 MOFs, that is, establishing 1800 individual per-MOF GP models before training a single MLP model. Although this approach navigates all molecules collectively within each MOF, it still requires one AL scheme per MOF, so studying 1 million MOFs would require 1 million separate AL schemes. This underscores the need to incorporate MOF features into the navigation process, with the aim of constructing a single model that navigates both molecules and MOFs simultaneously. The difficulty is the high dimensionality of MOFs, which are characterized by numerous textural and chemical features. Objective 2 addresses this challenge with a dimensionality reduction technique that captures the variation in MOF features; the reduced-dimension features are then integrated into the model. This enables the simultaneous navigation of all MOFs and molecules, a significant step toward a universal model for the adsorption of all pure-component molecules in all MOFs.
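The idea of compressing MOF features before feeding them to a joint molecule-MOF model can be sketched as follows. The proposal does not name the dimensionality reduction technique, so principal component analysis (PCA) is used here purely as an assumed example. The feature table is synthetic, and the molecule descriptor values, the 95% variance target, and the names `pca_reduce` and `model_input` are all hypothetical.

```python
import numpy as np

def pca_reduce(features, var_target=0.95):
    """Standardize the columns, then keep the fewest principal components
    whose cumulative explained variance reaches var_target."""
    mu = features.mean(axis=0)
    sd = features.std(axis=0)
    sd = np.where(sd == 0.0, 1.0, sd)              # guard constant columns
    z = (features - mu) / sd
    _, s, vt = np.linalg.svd(z, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    k = int(np.searchsorted(np.cumsum(explained), var_target)) + 1
    k = min(k, len(s))
    return z @ vt[:k].T, explained[:k]

# Synthetic stand-in for a MOF feature table: 200 hypothetical MOFs with 12
# correlated descriptors generated from 3 hidden factors plus a little noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 3))
mixing = rng.uniform(0.5, 1.5, size=(3, 12)) * rng.choice([-1.0, 1.0], size=(3, 12))
mof_features = latent @ mixing + 0.05 * rng.normal(size=(200, 12))

mof_reduced, explained = pca_reduce(mof_features)

# Placeholder molecule descriptors [well depth / K, zero-crossing distance / Å,
# bond length / Å, charge / e] and a log10 fugacity for one state point.
molecule = np.array([148.0, 3.73, 0.0, 0.0])
log10_fugacity = np.array([-1.0])

# One input row for a joint model: reduced MOF features + molecule + state point.
model_input = np.concatenate([mof_reduced[0], molecule, log10_fugacity])
print(mof_reduced.shape, explained.round(3), model_input.shape)
```

The point of the sketch is the shape of the input: once each MOF is represented by a handful of reduced coordinates, a single model can take MOF, molecule, and state point together, rather than one AL scheme being run per MOF.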

Objective 3: Expanding the nearly finalized universal model to encompass entire MOF databases, with additional focus on navigating mixtures of molecules. The ultimate objective of this project is to make the single model more robust by incorporating all MOFs synthesized to date. First, using AL, we aim to simultaneously generate full isotherms for all pure alchemical molecules within these MOFs. We will then validate the model by comparing the predicted adsorption of real molecules to ground-truth data. Once validated, we will extend the study from pure-component gases to multi-component gases, building on recent findings demonstrating the capability of AL to predict 3 mixture isotherms in a MOF [3]. This extension seeks to capture adsorption behavior in the presence of multiple interacting gas components, a critical aspect of real-world applications. Once pure-component and multi-component scenarios are both integrated into the model, a key application will be the systematic screening of all existing MOFs to identify those with optimal attributes for energy-relevant separations across the full range of gas components. The goal is to improve energy-intensive separation processes by selecting and applying MOFs on the basis of the insights this model provides.

[1] Osaro, E., Mukherjee, K., & Colón, Y. J. (2023). Active learning for adsorption simulations: evaluation, criteria analysis, and recommendations for metal–organic frameworks. Industrial & Engineering Chemistry Research, 62(33), 13009-13024. https://doi.org/10.1021/acs.iecr.3c01589

[2] Anderson, R., Biong, A., & Gómez-Gualdrón, D. A. (2020). Adsorption isotherm predictions for multiple molecules in MOFs using the same deep learning model. Journal of Chemical Theory and Computation, 16(2), 1271-1283. https://doi.org/10.1021/acs.jctc.9b00940

[3] Mukherjee, K., Osaro, E., & Colón, Y. J. (2023). Active learning for efficient navigation of multi-component gas adsorption landscapes in a MOF. Digital Discovery, 2(5), 1506-1521. https://doi.org/10.1039/d3dd00106g