config.json broken for F2LLM-v2-80M and F2LLM-v2-330M — max_window_layers exceeds num_hidden_layers, causes StrictDataclassClassValidationError in transformers 5.x #45
Description
Summary
Both codefuse-ai/F2LLM-v2-80M and codefuse-ai/F2LLM-v2-330M ship with a structurally invalid config.json that prevents the models from loading under transformers >= 5.0.0. The models are currently impossible to containerise or deploy without manual patching.
Root Cause
The config.json for both models declares:
| Model | num_hidden_layers | max_window_layers |
|---|---|---|
| F2LLM-v2-80M | 8 | 28 |
| F2LLM-v2-330M | 16 | 28 |
max_window_layers is a Qwen3 architecture field that defines how many layers use sliding window attention. It must not exceed num_hidden_layers — having max_window_layers: 28 in a model with only 8 or 16 total layers is architecturally undefined.
This appears to be a copy-paste artifact from the larger base model (likely Qwen3-0.6B, which has 28 layers). Transformers 4.x silently ignored the inconsistency; transformers 5.x enforces cross-field validation in Qwen3Config and raises a hard error.
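The invariant that transformers 5.x enforces can be checked locally before attempting to load a checkpoint. A minimal sketch of the check itself (not the transformers implementation; the function name is illustrative), operating on a parsed config dict:

```python
def check_window_layers(cfg: dict) -> bool:
    """Return True if the sliding-window config is internally consistent:
    max_window_layers must not exceed num_hidden_layers."""
    return cfg["max_window_layers"] <= cfg["num_hidden_layers"]

# The values shipped with F2LLM-v2-80M fail the check.
broken = {"num_hidden_layers": 8, "max_window_layers": 28}
fixed = {"num_hidden_layers": 8, "max_window_layers": 8}
print(check_window_layers(broken))  # False
print(check_window_layers(fixed))   # True
```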
Error
```
transformers.utils.dataclasses.StrictDataclassClassValidationError:
Validation failed for Qwen3Config: max_window_layers (28) must be ≤ num_hidden_layers (8)
```
Steps to Reproduce
```python
from transformers import AutoModel

# Fails under transformers >= 5.0.0
model = AutoModel.from_pretrained("codefuse-ai/F2LLM-v2-80M")
```

Environment:
- transformers >= 5.0.0
- huggingface_hub >= 0.27.0
Expected Behaviour
The model loads without error.
Requested Fix
A one-line change to config.json in each model repository:

F2LLM-v2-80M:

```json
"max_window_layers": 8,
"use_sliding_window": false
```

F2LLM-v2-330M:

```json
"max_window_layers": 16,
"use_sliding_window": false
```

Setting `use_sliding_window: false` is also advisable, since models this small derive no practical benefit from the sliding-window mechanism.
Workaround (for users until the fix is published)
```python
import json, os

from huggingface_hub import snapshot_download
from transformers import AutoModel

# Download the checkpoint, then patch config.json in place before loading.
model_path = snapshot_download("codefuse-ai/F2LLM-v2-80M")
config_path = os.path.join(model_path, "config.json")

with open(config_path, "r") as f:
    cfg = json.load(f)

# Clamp the window count to the model's actual depth and disable
# sliding-window attention entirely.
cfg["max_window_layers"] = cfg["num_hidden_layers"]
cfg["use_sliding_window"] = False

with open(config_path, "w") as f:
    json.dump(cfg, f, indent=2)

model = AutoModel.from_pretrained(model_path, trust_remote_code=True)
```
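Since both checkpoints need the identical edit, the patch can be wrapped in a small reusable helper. A sketch assuming only that config.json sits at the root of the downloaded snapshot directory (the helper name is illustrative):

```python
import json
import os

def patch_window_config(model_dir: str) -> dict:
    """Clamp max_window_layers to num_hidden_layers and disable sliding-window
    attention in a local checkpoint's config.json. Returns the patched config."""
    config_path = os.path.join(model_dir, "config.json")
    with open(config_path, "r") as f:
        cfg = json.load(f)
    cfg["max_window_layers"] = cfg["num_hidden_layers"]
    cfg["use_sliding_window"] = False
    with open(config_path, "w") as f:
        json.dump(cfg, f, indent=2)
    return cfg
```

Usage would then be, for either model: `patch_window_config(snapshot_download("codefuse-ai/F2LLM-v2-330M"))` followed by a normal `from_pretrained` call on the local path.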