A Unified Python Toolkit for Graph Foundation Model Research
Installation · Quick Start · Supported Baselines · Documentation
pygfm is a unified Python toolkit for Graph Foundation Model (GFM) research. It integrates 19 state-of-the-art baseline methods under a single, pip-installable package with shared utilities, standardized interfaces, and fully reproducible experiment pipelines.
Developed by Beihang University · School of Computer Science and Engineering · ACT Lab · MAGIC GROUP.
PyGFM is organized into four stacked layers — Graph Data Abstraction → Alignment & Fusion Bridge → Representation Backbones → Task Heads & Orchestration — with a unified CLI, model recipes, and an auto-experiment tracker sitting on top.
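The flow through these four layers can be sketched as a simple composition (a conceptual sketch only; the class and field names below are illustrative, not the actual pygfm API):

```python
# Conceptual sketch of the four-layer stack: each layer is a callable,
# composed in order. Names are illustrative, not pygfm's real API.
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class LayeredPipeline:
    data_abstraction: Callable[[Any], Any]   # Graph Data Abstraction
    alignment_bridge: Callable[[Any], Any]   # Alignment & Fusion Bridge
    backbone: Callable[[Any], Any]           # Representation Backbones
    task_head: Callable[[Any], Any]          # Task Heads & Orchestration

    def run(self, raw_graph: Any) -> Any:
        x = self.data_abstraction(raw_graph)
        x = self.alignment_bridge(x)
        x = self.backbone(x)
        return self.task_head(x)
```

The CLI, model recipes, and experiment tracker sit on top of this stack and drive it via configs rather than hand-written glue code.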
- One package, 19 baselines — prompt-based GFMs, structure-aware models, LLM-integrated approaches, and retrieval-augmented methods, all available via a single `pip install`.
- Reproducible pipelines — every baseline ships with YAML-driven experiment configs, training scripts, and evaluation helpers.
- Shared backbone library — common GNN encoders, loss functions, and data utilities are factored out and reused across all baselines, reducing code duplication.
- CLI-first design — launch pre-training, fine-tuning, and evaluation jobs directly from the command line without writing any boilerplate.
- LLM-ready — first-class support for LLM-integrated GFMs (GraphGPT, GraphText, LLaGA, OneForAll) with HuggingFace-compatible YAML configs.
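For illustration, a hypothetical experiment config in the YAML-driven style described above (the field names here are invented for the example, not pygfm's actual schema):

```yaml
# scripts/<baseline>/configs/example.yaml — hypothetical schema
dataset: cora
backbone: gcn
hidden_dim: 256
lr: 0.001
epochs: 200
```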
Basic installation:

```shell
pip install python-gfm
```

Full installation with GPU support:

```shell
# 1. Install PyTorch with CUDA 12.8 support
pip install torch==2.8.0 --index-url https://download.pytorch.org/whl/cu128

# 2. Install pygfm with the full ML stack (PyG extensions are resolved automatically)
pip install "python-gfm[torch]" -f https://data.pyg.org/whl/torch-2.8.0+cu128.html
```

CPU-only machines: replace the CUDA index URLs with `https://download.pytorch.org/whl/cpu` and `https://data.pyg.org/whl/torch-2.8.0+cpu.html`, respectively.
For development, install from source:

```shell
git clone <repo-url> && cd pygfm
pip install -e ".[torch,dev]"
```

The `dev` extra adds pytest and ruff for testing and linting.
Verify the installation:

```python
import pygfm
print(pygfm.__version__)
```

Run a pre-training job from the CLI:
```shell
# SA2GFM contrastive pre-training
gfm-sa2gfm-pretrain -c scripts/sa2gfm/configs/pretrain.yaml

# SA2GFM downstream fine-tuning
gfm-sa2gfm-downstream -c scripts/sa2gfm/configs/downstream.yaml
```

Repository layout:

```
pygfm/
├── src/pygfm/
│   ├── baseline_models/   # 19 GFM baseline implementations
│   ├── public/            # Shared utilities, losses, and backbone encoders
│   │   ├── backbone_models/
│   │   ├── utils/
│   │   └── cli/
│   ├── private/           # Core encoders and internal data generation
│   └── cli/               # Console entry points
└── scripts/               # Per-baseline experiment scripts and configs
    ├── <baseline>/
    │   ├── README.md
    │   ├── configs/
    │   ├── pretrain.py / downstream.py / ...
    │   └── eval_script/
```
| Category | Methods |
|---|---|
| Prompt-based GFM | MDGPT, SAMGPT, MDGFM, GraphPrompt, HGPrompt, MultiGPrompt, GCoT |
| Structure-aware GFM | SA2GFM, Bridge, GraphKeeper, GraphMore, Graver, BIM-GFM |
| LLM-integrated GFM | GraphGPT, GraphText, LLaGA, OneForAll |
| Retrieval-augmented GFM | RAG-GFM |
| Classic Baseline | Classic GNN |
All scripts live under `scripts/<baseline>/` and should be run from the repository root.
```shell
# Prompt-based: MDGPT pre-training
python scripts/mdgpt/pretrain.py

# Structure-aware: SA2GFM downstream fine-tuning
python scripts/sa2gfm/downstream.py

# Prompt-based: GCoT full pipeline
python scripts/gcot/pretrain.py
python scripts/gcot/finetune.py
python scripts/gcot/finetune_graph.py

# LLM-integrated: GraphGPT (YAML-driven HuggingFace-style training)
python scripts/graphgpt/run_with_config.py -c scripts/graphgpt/configs/train_mem_template.yaml
```

After installation the following CLI entry points are registered:
| Command | Description |
|---|---|
| `pygfm` / `gfm` | Generic YAML-driven runner (`-c <config.yaml>`) |
| `gfm-sa2gfm-pretrain` | SA2GFM contrastive pre-training |
| `gfm-sa2gfm-downstream` | SA2GFM MoE downstream fine-tuning |
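A minimal sketch of how such a YAML-driven entry point can be wired with the standard library (illustrative only; to stay self-contained it parses just a flat `key: value` subset of YAML rather than using a full YAML parser, and the option names simply mirror the `-c` flag):

```python
# Illustrative YAML-driven runner in the style of `gfm -c <config.yaml>`.
# Parses only a flat `key: value` subset of YAML to stay stdlib-only.
import argparse

def load_flat_config(path: str) -> dict:
    """Read `key: value` lines, ignoring blank lines and `#` comments."""
    cfg = {}
    with open(path) as fh:
        for line in fh:
            line = line.split("#", 1)[0].strip()
            if not line:
                continue
            key, _, value = line.partition(":")
            cfg[key.strip()] = value.strip()
    return cfg

def main(argv=None) -> dict:
    parser = argparse.ArgumentParser(prog="gfm", description="YAML-driven runner sketch")
    parser.add_argument("-c", "--config", required=True, help="path to experiment config")
    args = parser.parse_args(argv)
    cfg = load_flat_config(args.config)
    # A real runner would dispatch to pre-training / fine-tuning from here.
    print(f"loaded {len(cfg)} config entries from {args.config}")
    return cfg
```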
All experiment hyperparameters are stored as YAML files under `scripts/<baseline>/configs/`. Pass configs via the `-c` flag:

```shell
python scripts/<baseline>/pretrain.py -c scripts/<baseline>/configs/default.yaml
```

API keys: baselines that call external LLM APIs (e.g., GraphText) read credentials from a local env file. Never commit API keys to the repository. Copy the example template and fill in your keys:
```shell
cp scripts/graphtext/config/user/env.yaml.example scripts/graphtext/config/user/env.yaml
# Then edit env.yaml and add your API key
```

Each baseline ships a dedicated README with setup instructions, data preparation steps, and evaluation notes:
| Baseline | Docs |
|---|---|
| MDGPT | scripts/mdgpt/README.md |
| SA2GFM | scripts/sa2gfm/README.md |
| SAMGPT | scripts/samgpt/README.md |
| MDGFM | scripts/mdgfm/README.md |
| GraphPrompt | scripts/graphprompt/README.md |
| HGPrompt | scripts/hgprompt/README.md |
| MultiGPrompt | scripts/multigprompt/README.md |
| GCoT | scripts/gcot/README.md |
| Graver | scripts/graver/README.md |
| GraphMore | scripts/graphmore/README.md |
| Bridge | scripts/bridge/README.md |
| GraphKeeper | scripts/graphkeeper/README.md |
| GraphGPT | scripts/graphgpt/README.md |
| GraphText | scripts/graphtext/README.md |
| LLaGA | scripts/llaga/README.md |
| OneForAll | scripts/oneforall/README.md |
| RAG-GFM | scripts/rag_gfm/README.md |
| Dependency | Version |
|---|---|
| Python | ≥ 3.12 |
| PyTorch | 2.8.0 (CUDA 12.8 recommended) |
| PyTorch Geometric | ≥ 2.3.0 |
| Transformers | ≥ 4.36.0 |
| Accelerate | ≥ 0.26.0 |
See pyproject.toml for the full dependency specification.
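A quick way to check the installed stack against the table above (a stdlib sketch; it assumes the usual PyPI distribution names `torch`, `torch-geometric`, `transformers`, and `accelerate`):

```python
# Report installed versions of the core dependencies listed above.
import importlib.metadata as md

def installed_version(dist: str):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return md.version(dist)
    except md.PackageNotFoundError:
        return None

for dist in ("torch", "torch-geometric", "transformers", "accelerate"):
    print(f"{dist}: {installed_version(dist) or 'not installed'}")
```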
This project is licensed under the Apache License 2.0.
MAGIC GROUP — Beihang University, School of Computer Science and Engineering, ACT Lab.
