Abstract
Autoregressive graph generation is commonly formulated by linearizing a graph into a sequence via a flat walk over its nodes and edges. While effective for small molecular graphs, such representations do not scale gracefully to large, structurally complex biological graphs, where higher-order organization is both dense and functionally important. To address this limitation, we introduce a hierarchical graph abstraction together with a coarse-to-fine autoregressive pipeline that separates coarsening, tokenization, and decoding, and we study sequence constructions that expose hierarchy in the tokenization. Concretely, we compare a flat representation with two hierarchical alternatives: one that explicitly factorizes communities and their interactions, and another that induces hierarchy through a depth-first traversal. Across eight benchmarks, differences are modest on simpler domains but become pronounced in larger, more complex settings such as cyclic peptides, proteins, and hard conditional-generation tasks. In these regimes, flat representations degrade in validity or distributional fidelity, while the traversal-based hierarchical representation is consistently more reliable; the explicitly factorized variant is generally weaker, suggesting that how hierarchy is exposed in the sequence matters. Finally, the same coarse-to-fine representation naturally enables motif-conditioned generation without architectural changes: on such larger graphs, conditioning on higher-level prefixes yields strong motif retention together with high validity, uniqueness, and novelty, whereas flat representations can exhibit severe mode collapse under comparable constraints.
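To make the contrast between representations concrete, the following is a minimal toy sketch, not the paper's actual tokenizer: it linearizes a small graph as a flat edge walk, and alternatively emits explicit `ENTER`/`EXIT` tokens around coarse communities before listing cross-community edges, loosely in the spirit of the community-factorized construction. The graph, community assignment, and token names are all illustrative assumptions.

```python
def flat_tokens(edges):
    """Flat linearization: a walk over the edge list, one token per node."""
    toks = []
    for u, v in edges:
        toks += [f"N{u}", "-", f"N{v}"]
    return toks

def hierarchical_tokens(edges, community):
    """Hierarchical linearization (illustrative): bracket each community's
    internal edges with ENTER/EXIT tokens, then append cross-community edges."""
    by_comm, cross = {}, []
    for u, v in edges:
        cu, cv = community[u], community[v]
        if cu == cv:
            by_comm.setdefault(cu, []).append((u, v))
        else:
            cross.append((u, v))
    toks = []
    for c in sorted(by_comm):
        toks.append(f"ENTER_C{c}")
        for u, v in by_comm[c]:
            toks += [f"N{u}", "-", f"N{v}"]
        toks.append(f"EXIT_C{c}")
    for u, v in cross:  # interactions between communities come last
        toks += [f"N{u}", "-", f"N{v}"]
    return toks

# A 5-node path split into two hypothetical communities {0,1,2} and {3,4}.
edges = [(0, 1), (1, 2), (3, 4), (2, 3)]
community = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1}

flat = flat_tokens(edges)
hier = hierarchical_tokens(edges, community)
```

The flat sequence interleaves edges in walk order, so community membership is only implicit; the hierarchical sequence makes the coarse structure explicit in the token stream, which is what allows a decoder to condition on a higher-level prefix (e.g. everything up to `EXIT_C0`) without architectural changes.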