Problem
The current maxContextLimit / minContextLimit fields accept either a token count or a percentage — but not both simultaneously. This creates a real cost problem for large-context models.
For a 1M context model, the two formats diverge dramatically:
"80%" → triggers at 800,000 tokens — extremely expensive before compaction kicks in
200000 → triggers at 200,000 tokens (~20%) — sensible for most coding sessions
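For reference, a minimal illustration of today's either/or behavior (the snippets are schematic, not a complete settings file):

```jsonc
// Today the field accepts exactly one of two formats:
{ "maxContextLimit": "80%" }    // percentage of the model's context window
// ...or, alternatively:
{ "maxContextLimit": 200000 }   // absolute token count (~20% of a 1M window)
```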
There's currently no way to express "compact at whichever comes first" in a single rule. Users with mixed model setups (e.g. Claude Sonnet at 200k + Gemini 2.5 Pro at 1M) have to manually calculate and hardcode per-model token limits in modelMaxLimits, which doesn't scale.
A related issue: modelMaxLimits / modelMinLimits require fully-qualified provider/model keys. There's no way to set a limit for an entire provider at once, meaning users with many models from the same provider (e.g. multiple Gemini variants) must duplicate entries.
Proposed Solution
- Compound threshold syntax (OR / AND logic)
Allow maxContextLimit and minContextLimit (and their per-model overrides) to accept an object with an explicit combination mode:
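A minimal sketch of the proposed shape; the `mode` and `limits` field names are illustrative placeholders, not an existing schema:

```jsonc
{
  // Hypothetical compound form: trigger at whichever threshold is crossed first (OR)
  "maxContextLimit": {
    "mode": "first",
    "limits": [200000, "80%"]
  }
  // A corresponding AND mode would trigger only once every listed
  // threshold has been crossed.
}
```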
"first" (OR) is the useful default for cost control: cap at 200k absolute even if the model has 1M context, but still respect 80% for smaller models where 200k would never be reached.
Backward compatibility: plain number and plain string continue to work exactly as today.
- Provider-level wildcards in modelMaxLimits / modelMinLimits
Allow provider-level keys as a fallback, matched after exact model entries but before the global default:
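A sketch of how that could look; `google/*` is the proposed wildcard form, and the specific models and limits are just examples:

```jsonc
{
  "modelMaxLimits": {
    "google/gemini-2.5-pro": 150000,  // exact match: highest priority
    "google/*": 200000                // proposed wildcard: any other Google model
  },
  "maxContextLimit": "80%"            // global default: everything else
}
```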
Resolution order (highest to lowest priority):
- Exact provider/model match
- Provider wildcard provider/*
- Global maxContextLimit
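Applied to the sketch above, gemini-2.5-pro would resolve to 150k, any other google/ model to the 200k wildcard, and models from other providers would fall through to the global 80%.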
Alternatives Considered
- Manually specifying every model in modelMaxLimits — works today but doesn't scale and breaks silently when new model versions are added
- Setting a very conservative global percentage — penalizes small-context models unnecessarily
Additional Context
As 1M+ context models become common, the gap between percentage-based and absolute thresholds grows large enough to cause significant unexpected cost. A session that runs to 80% of a 1M context window can cost an order of magnitude more than one capped at 200k. The compound threshold would let users set a single sane rule that works correctly across all model sizes without per-model manual tuning.
Provider wildcards are a quality-of-life improvement for the same scenario — users shouldn't need to enumerate every Gemini variant to apply a consistent limit.