Author: Mohammadreza Khaniha (University of Iceland)

Reducing CO₂ emissions through electrochemical conversion into value-added chemicals is a promising approach to address climate and energy challenges. However, discovering efficient catalysts remains a bottleneck due to the high cost of density functional theory (DFT) calculations and the vast design space. This project focuses on developing a machine learning framework that combines active learning and causal inference to improve catalyst prediction efficiency and reliability.

Our model learns from a partially labeled dataset of doped transition metal carbides, in which only a subset of structures has DFT labels. Active learning selects the structurally and energetically most informative unlabeled candidates for DFT evaluation, so that the model can reach high accuracy with far fewer calculations than labeling the whole dataset. By skipping candidates that would add little new information, this strategy reduces redundant computation and accelerates learning.
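As an illustration of the selection loop described above, the following is a minimal sketch and not the project's actual implementation. It assumes a single scalar descriptor per candidate, a bootstrap ensemble of straight-line fits as the surrogate model, and a hypothetical `oracle` function standing in for a DFT calculation. All names (`fit_line`, `ensemble_predict`, `active_learning`, `toy_dft`) are invented for this example.

```python
import random
import statistics


def fit_line(points):
    """Least-squares fit y = a*x + b for a list of (x, y) pairs."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx < 1e-12:  # all x identical: fall back to a constant model
        return 0.0, my
    a = sum((x - mx) * (y - my) for x, y in points) / sxx
    return a, my - a * mx


def ensemble_predict(labeled, x, n_models=20, rng=None):
    """Return (mean, std) of predictions at x from a bootstrap ensemble."""
    rng = rng or random.Random(0)
    preds = []
    for _ in range(n_models):
        # Resample the labeled set with replacement and refit.
        sample = [rng.choice(labeled) for _ in labeled]
        a, b = fit_line(sample)
        preds.append(a * x + b)
    return statistics.fmean(preds), statistics.pstdev(preds)


def active_learning(pool, oracle, n_init=3, n_queries=5, seed=0):
    """Label n_init random candidates, then query the most uncertain ones."""
    rng = random.Random(seed)
    unlabeled = list(pool)
    rng.shuffle(unlabeled)
    labeled = [(x, oracle(x)) for x in unlabeled[:n_init]]
    unlabeled = unlabeled[n_init:]
    for _ in range(n_queries):
        if not unlabeled:
            break
        # Uncertainty sampling: pick the candidate on which the
        # ensemble members disagree the most.
        best = max(
            unlabeled,
            key=lambda x: ensemble_predict(labeled, x, rng=rng)[1],
        )
        labeled.append((best, oracle(best)))  # the expensive "DFT" call
        unlabeled.remove(best)
    return labeled, unlabeled


def toy_dft(x):
    """Hypothetical stand-in for a DFT energy; not real physics."""
    return (x - 0.5) ** 2


if __name__ == "__main__":
    pool = [i / 10 for i in range(11)]
    labeled, unlabeled = active_learning(pool, toy_dft)
    print(f"{len(labeled)} labeled, {len(unlabeled)} left unlabeled")
```

In this sketch the only expensive step is the call to `oracle`, which runs once per selected candidate; a real workflow would presumably swap in a richer structural descriptor and surrogate model while keeping the same query-then-retrain structure.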

In parallel, we integrate causal inference techniques, particularly the Probabilistic Variational Causal Effect (PACE) model by Faghihi and Saki (2024). Unlike correlation-based methods, PACE estimates causal effects while accounting for the rarity or frequency of configurations. This helps the model focus on rare but impactful dopant combinations that might drive catalytic performance.
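To make the idea of frequency-aware effect estimation concrete, the sketch below computes a toy quantity that weights each change in outcome between neighboring treatment levels by how probable those levels are. This is a simplified illustration only and should not be read as the published PACE definition, which should be taken from Faghihi and Saki. The function name `weighted_variation`, the product weighting, the exponent `d`, and the example numbers are all assumptions made for this example.

```python
def weighted_variation(levels, outcome, prob, d=1.0):
    """Toy frequency-weighted variation of an outcome across ordered levels.

    levels  : ordered treatment levels (e.g. dopant concentrations)
    outcome : dict mapping level -> expected outcome under that level
    prob    : dict mapping level -> probability of observing that level
    d       : assumed knob; d = 0 ignores frequency, larger d
              down-weights changes that involve rare levels
    """
    total = 0.0
    for prev, cur in zip(levels, levels[1:]):
        change = abs(outcome[cur] - outcome[prev])
        weight = (prob[cur] * prob[prev]) ** d
        total += change * weight
    return total


if __name__ == "__main__":
    # Made-up numbers: level 2 is rare but shifts the outcome strongly.
    levels = [0, 1, 2]
    outcome = {0: 1.0, 1: 1.1, 2: 0.2}
    prob = {0: 0.6, 1: 0.35, 2: 0.05}
    for d in (0.0, 0.5, 1.0):
        print(d, weighted_variation(levels, outcome, prob, d))
```

With `d = 0` the toy quantity counts the large change at the rare level in full, while larger `d` suppresses it. Scanning over `d` is one way to see whether an apparent effect is driven by rare configurations.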

The system uses a modular design with a general model trained on broad catalyst data and a task-specific model adapted for CO₂ reduction. We are currently validating this framework using existing data. The developed model will serve as a foundation for future large-scale catalyst screening efforts.