Let Code Be Code: Generative Graph Diffusion over Maskable Abstract Syntax Trees
ABSTRACT
Current code generation models typically treat source code as flat token sequences, forcing neural networks to implicitly learn strict grammatical rules, which often results in structural hallucinations and compilation failures. To bridge the gap between sequential modeling and the inherent hierarchical structure of code, this study introduces a novel neurosymbolic framework that natively generates programs as Abstract Syntax Trees (ASTs) via discrete diffusion. The proposed approach couples a Maskable AST (MAST) symbolic engine which deterministically enforces context-free grammar rules with a Relational Graph Convolutional Network (R-GCN) that guides probabilistic semantic generation. By offloading structural constraints to the symbolic engine, the framework guarantees 100% syntactic validity by design. Experiments on the Minimp and Imp languages demonstrate that this approach outperforms token-based baselines under strict computational budgets, the latter of which fail to produce valid code. Furthermore, evaluations on the global and local topologies prove that the model captures local programming idioms and global topologies, establishing the AST diffusion as a reliable paradigm for program generation.