
[Feature]: Compound Context Thresholds and Provider-Level Model Limits #505

@Benedek45

Description


Problem

The current maxContextLimit / minContextLimit fields accept either a token count or a percentage — but not both simultaneously. This creates a real cost problem for large-context models.

For a 1M context model, the two formats diverge dramatically:

  • "80%" → triggers at 800,000 tokens — extremely expensive before compaction kicks in
  • 200000 → triggers at 200,000 tokens (~20%) — sensible for most coding sessions

There's currently no way to express "compact at whichever comes first" in a single rule. Users with mixed model setups (e.g. Claude Sonnet at 200k + Gemini 2.5 Pro at 1M) have to manually calculate and hardcode per-model token limits in modelMaxLimits, which doesn't scale.

A related issue: modelMaxLimits / modelMinLimits require fully-qualified provider/model keys. There's no way to set a limit for an entire provider at once, meaning users with many models from the same provider (e.g. multiple Gemini variants) must duplicate entries.

Proposed Solution

  1. Compound threshold syntax (OR / AND logic)

Allow maxContextLimit and minContextLimit (and their per-model overrides) to accept an object with an explicit combination mode:

"maxContextLimit": {
    "tokens": 200000,
    "percent": "80%",
    "mode": "first"  // trigger whichever threshold is hit first (OR logic)
    // alternatively: "mode": "both" for AND logic (both must be exceeded)
}

"first" (OR) is the useful default for cost control: cap at 200k absolute even if the model has 1M context, but still respect 80% for smaller models where 200k would never be reached.

Backward compatibility: plain number and plain string continue to work exactly as today.
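To make the proposed semantics concrete, here is a minimal resolution sketch. The function name and the exact field handling are illustrative assumptions, not the tool's actual API; it only shows how "first" (OR) and "both" (AND) would reduce to an absolute token threshold:

```python
def resolve_threshold(limit, context_window):
    """Resolve a maxContextLimit-style value to an absolute token count.

    `limit` may be a plain int (tokens), a percent string ("80%"),
    or the proposed compound object. Illustrative sketch only.
    """
    if isinstance(limit, int):
        return limit
    if isinstance(limit, str):  # e.g. "80%"
        return int(context_window * float(limit.rstrip("%")) / 100)
    # Compound object: resolve both parts, then combine.
    tokens = limit["tokens"]
    percent = int(context_window * float(limit["percent"].rstrip("%")) / 100)
    if limit.get("mode", "first") == "first":
        # OR logic: whichever threshold is reached first, i.e. the smaller one
        return min(tokens, percent)
    # "both": AND logic, both must be exceeded, i.e. the larger one
    return max(tokens, percent)
```

With `{"tokens": 200000, "percent": "80%", "mode": "first"}`, a 1M-context model is capped at 200k tokens, while a 200k-context model is capped at 160k (its 80% mark), which is exactly the "whichever comes first" behavior described above.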

  2. Provider-level wildcards in modelMaxLimits / modelMinLimits

Allow provider-level keys as a fallback, matched before the global default but after exact model matches:

"modelMaxLimits": {
    "google/*": 180000,                         // all Google models
    "google/gemini-3.0-pro": 200000,            // overrides the wildcard for this specific model
    "anthropic/claude-opus-4-6": "70%"
}

Resolution order (highest to lowest priority):

  1. Exact provider/model match
  2. Provider wildcard provider/*
  3. Global maxContextLimit
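The lookup itself is straightforward; this sketch (hypothetical function and parameter names, assuming `provider/model` keys) shows the three-step precedence:

```python
def resolve_model_limit(model_limits, global_limit, model_id):
    """Look up a per-model limit with the proposed precedence:
    exact provider/model key, then provider/* wildcard, then the
    global default. Illustrative sketch, not the tool's code.
    """
    # 1. Exact provider/model match
    if model_id in model_limits:
        return model_limits[model_id]
    # 2. Provider wildcard, e.g. "google/*"
    provider = model_id.split("/", 1)[0]
    wildcard = provider + "/*"
    if wildcard in model_limits:
        return model_limits[wildcard]
    # 3. Fall back to the global maxContextLimit / minContextLimit
    return global_limit
```

Using the example config above, `google/gemini-3.0-pro` resolves to 200000 (exact match beats the wildcard), any other `google/...` model resolves to 180000, and models from an unlisted provider fall through to the global limit.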

Alternatives Considered

  • Manually specifying every model in modelMaxLimits — works today but doesn't scale and breaks silently when new model versions are added
  • Setting a very conservative global percentage — penalizes small-context models unnecessarily

Additional Context

As 1M+ context models become common, the gap between percentage-based and absolute thresholds grows large enough to cause significant unexpected cost. A session that runs to 80% of a 1M context window can cost an order of magnitude more than one capped at 200k. The compound threshold would let users set a single sane rule that works correctly across all model sizes without per-model manual tuning.

Provider wildcards are a quality-of-life improvement for the same scenario — users shouldn't need to enumerate every Gemini variant to apply a consistent limit.

Labels: enhancement