Add support for the Mamba2 model. #482
Draft
This pull request is high risk. It was implemented by an AI prompted by a user unfamiliar with the codebase. Significant expert review, with a focus on regression testing, is recommended.
This pull request addresses #481.
Anthropic's Claude Code was prompted to port the mamba2 model from the huggingface/transformers repository, using the mamba2 model directory as a starting point. Initial testing was performed with the AntonV/mamba2-130m-hf pretrained weights: the port generated sensible text in response to the prompt from the "Usage" section of the AntonV/mamba2-130m-hf model card. This implementation depends on guillaume-be/rust-tokenizers#105.
Note that the AntonV/mamba2-130m-hf weights are published in SafeTensors format. They had to be converted to the PyTorch .bin format and then to the weight format used by rust-bert. Publishing the final weight files used by rust-bert to the Hugging Face Hub seems like a reasonable next step, but that remains to be completed.
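For reference, a minimal sketch of the two-step conversion described above, assuming the `safetensors` and `torch` Python packages and rust-bert's weight conversion utility; the file names and the script path are illustrative and may differ from what was actually run:

```python
# Step 1 (sketch): SafeTensors -> PyTorch .bin
# Assumes the model.safetensors file from AntonV/mamba2-130m-hf has been
# downloaded locally; file names are illustrative.
import torch
from safetensors.torch import load_file

state_dict = load_file("model.safetensors")
torch.save(state_dict, "pytorch_model.bin")

# Step 2 (sketch): PyTorch .bin -> rust-bert format, using the conversion
# script shipped with the rust-bert repository (see the rust-bert README
# for the exact invocation), e.g.:
#   python utils/convert_model.py pytorch_model.bin
```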
Next Steps