config.json broken for F2LLM-v2-80M and F2LLM-v2-330M — max_window_layers exceeds num_hidden_layers, causes StrictDataclassClassValidationError in transformers 5.x #45
Description
Summary
Both codefuse-ai/F2LLM-v2-80M and codefuse-ai/F2LLM-v2-330M ship with a structurally invalid config.json that prevents the models from loading under transformers >= 5.0.0. The models are currently impossible to containerise or deploy without manual patching.
Root Cause
The config.json for both models declares:
| Model | num_hidden_layers | max_window_layers |
|---|---|---|
| F2LLM-v2-80M | 8 | 28 |
| F2LLM-v2-330M | 16 | 28 |
max_window_layers is a Qwen3 architecture field that defines how many layers use sliding window attention. It must not exceed num_hidden_layers — having max_window_layers: 28 in a model with only 8 or 16 total layers is architecturally undefined.
This appears to be a copy-paste artifact from the larger base model (likely Qwen3-0.6B, which has 28 layers). Transformers 4.x silently ignored the inconsistency; transformers 5.x enforces cross-field validation in Qwen3Config and raises a hard error.
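The invariant that transformers 5.x enforces can be checked locally before attempting to load a checkpoint. A minimal sketch of the check itself (not the transformers implementation; the function name is illustrative), operating on a parsed config dict:

```python
def check_window_layers(cfg: dict) -> bool:
    """Return True if the sliding-window config is internally consistent:
    max_window_layers must not exceed num_hidden_layers."""
    return cfg["max_window_layers"] <= cfg["num_hidden_layers"]

# The values shipped with F2LLM-v2-80M fail the check.
broken = {"num_hidden_layers": 8, "max_window_layers": 28}
fixed = {"num_hidden_layers": 8, "max_window_layers": 8}
print(check_window_layers(broken))  # False
print(check_window_layers(fixed))   # True
```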
Error
```
transformers.utils.dataclasses.StrictDataclassClassValidationError:
Validation failed for Qwen3Config: max_window_layers (28) must be ≤ num_hidden_layers (8)
```
Steps to Reproduce
```python
from transformers import AutoModel

# Fails under transformers >= 5.0.0
model = AutoModel.from_pretrained("codefuse-ai/F2LLM-v2-80M")
```

Environment:
- transformers >= 5.0.0
- huggingface_hub >= 0.27.0
Expected Behaviour
The model loads without error.
Requested Fix
A one-line change to config.json in each model repository:

F2LLM-v2-80M:

```json
"max_window_layers": 8,
"use_sliding_window": false
```

F2LLM-v2-330M:

```json
"max_window_layers": 16,
"use_sliding_window": false
```

Setting `use_sliding_window: false` is also advisable, since models this small derive no practical benefit from the sliding-window mechanism.
Workaround (for users until the fix is published)
```python
import json, os

from huggingface_hub import snapshot_download
from transformers import AutoModel

# Download the checkpoint, then patch config.json in place before loading.
model_path = snapshot_download("codefuse-ai/F2LLM-v2-80M")
config_path = os.path.join(model_path, "config.json")

with open(config_path, "r") as f:
    cfg = json.load(f)

# Clamp the window count to the model's actual depth and disable
# sliding-window attention entirely.
cfg["max_window_layers"] = cfg["num_hidden_layers"]
cfg["use_sliding_window"] = False

with open(config_path, "w") as f:
    json.dump(cfg, f, indent=2)

model = AutoModel.from_pretrained(model_path, trust_remote_code=True)
```
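Since both checkpoints need the identical edit, the patch can be wrapped in a small reusable helper. A sketch assuming only that config.json sits at the root of the downloaded snapshot directory (the helper name is illustrative):

```python
import json
import os

def patch_window_config(model_dir: str) -> dict:
    """Clamp max_window_layers to num_hidden_layers and disable sliding-window
    attention in a local checkpoint's config.json. Returns the patched config."""
    config_path = os.path.join(model_dir, "config.json")
    with open(config_path, "r") as f:
        cfg = json.load(f)
    cfg["max_window_layers"] = cfg["num_hidden_layers"]
    cfg["use_sliding_window"] = False
    with open(config_path, "w") as f:
        json.dump(cfg, f, indent=2)
    return cfg
```

Usage would then be, for either model: `patch_window_config(snapshot_download("codefuse-ai/F2LLM-v2-330M"))` followed by a normal `from_pretrained` call on the local path.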