config.json broken for F2LLM-v2-80M and F2LLM-v2-330M — max_window_layers exceeds num_hidden_layers, causes StrictDataclassClassValidationError in transformers 5.x #45

@o7g8

Description

Summary

Both codefuse-ai/F2LLM-v2-80M and codefuse-ai/F2LLM-v2-330M ship with a structurally invalid config.json that prevents them from loading under transformers >= 5.0.0. As a result, the models cannot be containerised or deployed without manually patching the config.


Root Cause

The config.json for both models declares:

| Model | num_hidden_layers | max_window_layers |
|---|---|---|
| F2LLM-v2-80M | 8 | 28 |
| F2LLM-v2-330M | 16 | 28 |

max_window_layers is a Qwen3 architecture field that defines how many of the model's layers use sliding window attention. It must not exceed num_hidden_layers: a value of 28 in a model with only 8 or 16 total layers is architecturally undefined.

This appears to be a copy-paste artifact from the larger base model (likely Qwen3-0.6B, which has 28 layers). Transformers 4.x silently ignored the inconsistency; transformers 5.x enforces cross-field validation in Qwen3Config and raises a hard error.
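The check that 5.x now enforces can be illustrated with a small stand-in (a hypothetical helper for illustration, not the actual Qwen3Config validation code):

```python
def check_window_layers(config: dict) -> None:
    # Sketch of the cross-field rule described above:
    # max_window_layers may not exceed num_hidden_layers.
    n = config["num_hidden_layers"]
    w = config.get("max_window_layers", n)
    if w > n:
        raise ValueError(
            f"max_window_layers ({w}) must be <= num_hidden_layers ({n})"
        )

# The shipped F2LLM-v2-80M values trip the rule:
try:
    check_window_layers({"num_hidden_layers": 8, "max_window_layers": 28})
except ValueError as e:
    print(e)
```

Under transformers 4.x the equivalent inconsistency was simply never checked, which is why these configs loaded cleanly until now.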


Error

transformers.utils.dataclasses.StrictDataclassClassValidationError:
Validation failed for Qwen3Config: max_window_layers (28) must be ≤ num_hidden_layers (8)

Steps to Reproduce

from transformers import AutoModel

# Fails under transformers >= 5.0.0
model = AutoModel.from_pretrained("codefuse-ai/F2LLM-v2-80M")

Environment:

  • transformers >= 5.0.0
  • huggingface_hub >= 0.27.0

Expected Behaviour

The model loads without error.


Requested Fix

A one-line change to config.json in each model repository:

// F2LLM-v2-80M
"max_window_layers": 8,
"use_sliding_window": false

// F2LLM-v2-330M
"max_window_layers": 16,
"use_sliding_window": false

Setting use_sliding_window: false is also advisable since models this small derive no practical benefit from the sliding window mechanism.


Workaround (for users until the fix is published)

import json, os
from huggingface_hub import snapshot_download
from transformers import AutoModel

# Download the model files, then patch config.json in the local snapshot.
model_path = snapshot_download("codefuse-ai/F2LLM-v2-80M")
config_path = os.path.join(model_path, "config.json")

with open(config_path, "r") as f:
    cfg = json.load(f)

# Clamp max_window_layers to the actual layer count and disable
# sliding window attention, matching the requested fix above.
cfg["max_window_layers"] = cfg["num_hidden_layers"]
cfg["use_sliding_window"] = False

with open(config_path, "w") as f:
    json.dump(cfg, f, indent=2)

# Load from the patched local path instead of the Hub repo id.
model = AutoModel.from_pretrained(model_path, trust_remote_code=True)
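After patching, the invariant can be verified without loading the model at all. A minimal json-only check (hypothetical helper name, shown against a temp file rather than a real snapshot):

```python
import json
import os
import tempfile

def window_config_is_valid(path: str) -> bool:
    """True when max_window_layers does not exceed num_hidden_layers."""
    with open(path) as f:
        cfg = json.load(f)
    n = cfg["num_hidden_layers"]
    return cfg.get("max_window_layers", n) <= n

# Example: a config patched as above passes the check.
patched = {"num_hidden_layers": 8, "max_window_layers": 8, "use_sliding_window": False}
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump(patched, f)
print(window_config_is_valid(f.name))  # True
os.remove(f.name)
```

Running this against the cached config.json before and after the patch confirms the edit took effect without a full model load.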
