Conversation

donpellegrino commented May 26, 2025

This pull request is high risk: it was implemented by an AI, prompted by a user unfamiliar with the codebase. Significant expert review, with a focus on regression testing, is recommended.

This pull request addresses #481.

Anthropic's Claude Code was prompted to port the mamba2 model from the huggingface/transformers repository, using the mamba2 model directory as a starting point. Initial testing was performed with the AntonV/mamba2-130m-hf pretrained weights. The port successfully generated sensible text in response to the example prompt from the "Usage" section of the model card for AntonV/mamba2-130m-hf. This implementation depends on guillaume-be/rust-tokenizers#105.

Note that the AntonV/mamba2-130m-hf weights are published in SafeTensors format; they had to be converted to .bin format and then to the rust-bert format. Publishing the final rust-bert weight files to HuggingFace Hub seems a reasonable next step, but that remains to be done.
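
For reference, a minimal sketch of checking that converted weights are loadable, assuming the usual rust-bert convention of a `rust_model.ot` file produced by the repository's `utils/convert_model.py` script (the local path below is hypothetical):

```rust
use tch::{nn, Device};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Hypothetical local path to the .ot file produced from the
    // intermediate .bin weights by utils/convert_model.py.
    let weights_path = "path/to/mamba2-130m/rust_model.ot";

    // rust-bert models read their parameters from a tch VarStore;
    // a successful load confirms the converted file is readable.
    let mut vs = nn::VarStore::new(Device::cuda_if_available());
    vs.load(weights_path)?;
    println!("loaded {} named tensors", vs.variables().len());
    Ok(())
}
```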

Next Steps

  • Pass all GitHub tests for "104 GPTNeoX Tokenizer" (guillaume-be/rust-tokenizers#105).
  • Publish weights for the test case to HuggingFace Hub in the rust-bert format.
  • Update the test case to use the published weights directly from HuggingFace Hub (see the sketch after this list).
  • Pass all GitHub tests for this draft pull request.
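
Once the weights are published, the test case could fetch them through rust-bert's resource machinery. A minimal sketch, assuming a hypothetical Hub location for the converted weights (`<org>` and the cache name are placeholders):

```rust
use rust_bert::resources::{RemoteResource, ResourceProvider};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Hypothetical URL; the real value depends on where the
    // rust-bert formatted weights are published on HuggingFace Hub.
    let weights = RemoteResource::new(
        "https://huggingface.co/<org>/mamba2-130m-rust/resolve/main/rust_model.ot",
        "mamba2-130m",
    );

    // Downloads on first use and caches locally thereafter.
    let local_path = weights.get_local_path()?;
    println!("weights cached at {}", local_path.display());
    Ok(())
}
```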

donpellegrino and others added 2 commits May 25, 2025 21:15
- Implement complete Mamba2 architecture with SSM layers
- Add custom JSON parsing to handle Infinity values in config (see the sketch after this list)
- Implement cache-based generation for efficient inference
- Add support for conv1d with proper cache handling
- Include test demonstrating text generation with 25 tokens
- Add minimal example that generates until EOS token or max length
- Integrate with generation pipeline via Cache enum
- Add README with usage instructions
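
On the Infinity point: serde_json rejects the bare Infinity literals that Python's json module writes for float('inf') config values (e.g. the upper bound of time_step_limit). A minimal sketch of one way to pre-process the config text before parsing; the helper below is illustrative, not the code in this commit:

```rust
use serde_json::Value;

/// Replace bare Infinity literals with large finite floats so that
/// serde_json will accept the document. Assumption: "Infinity" never
/// appears inside a string value in the config, so a plain text
/// replacement is safe. Negative values are handled first so the
/// second replacement only sees the positive ones.
fn sanitize_config_json(raw: &str) -> String {
    raw.replace("-Infinity", "-1e38").replace("Infinity", "1e38")
}

fn main() -> serde_json::Result<()> {
    let raw = r#"{"time_step_limit": [0.0, Infinity]}"#;
    let config: Value = serde_json::from_str(&sanitize_config_json(raw))?;
    println!("{config}");
    Ok(())
}
```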

The implementation generates coherent text from prompts, with the first
several tokens matching the Python transformers implementation exactly.
Minor divergence in later tokens is due to numerical precision differences
in the SSM scan implementation.
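
That divergence is plausible given the form of the scan: each step multiplies the hidden state by a decay factor and adds a new input contribution, so rounding errors compound over the sequence. A simplified sketch of the sequential recurrence for a single scalar channel (illustrative only, not the kernel in this pull request):

```rust
/// Sequential scan for one scalar SSM channel:
///   h[t] = exp(dt * a) * h[t-1] + dt * b * x[t]
///   y[t] = c * h[t]
/// The repeated multiply-accumulate is where f32 implementations
/// accumulate rounding differences and drift apart after many tokens.
fn ssm_scan(x: &[f32], a: f32, b: f32, c: f32, dt: f32) -> Vec<f32> {
    let decay = (dt * a).exp();
    let mut h = 0.0f32;
    x.iter()
        .map(|&xt| {
            h = decay * h + dt * b * xt;
            c * h
        })
        .collect()
}
```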

Example usage:
  cargo run --example mamba2

Generated text example:
  Input: Hey how are you doing?
  Output: Hey how are you doing?

I'm in the process of getting my first project up and running.
I'm currently working on a project that will be a real game...

🤖 Generated with Claude Code

Co-Authored-By: Claude <[email protected]>
- Update Cargo.toml to use local rust-tokenizers with new tokenizer.json support
- Replace hard-coded token IDs with dynamic tokenization using GPTNeoX tokenizer
- Load tokenizer directly from AntonV/mamba2-130m-hf tokenizer.json file
- Add temperature-based sampling instead of greedy decoding for more natural output (see the sketch below)
- Implement smart stopping at sentence boundaries to prevent rambling
- Add progress display showing partial generated text every 10 tokens
- Decode and display the actual generated text instead of placeholder

The example now provides a complete end-to-end demonstration of using the
Mamba2 model with proper tokenization and more natural text generation.
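
The sampling step is easy to show in isolation. A minimal sketch of temperature-based sampling over raw logits using the rand crate; the function name and signature are illustrative, not the example's actual code:

```rust
use rand::Rng;

/// Sample a token index from logits after temperature scaling.
/// temperature < 1.0 sharpens the distribution; > 1.0 flattens it.
fn sample_with_temperature(logits: &[f32], temperature: f32, rng: &mut impl Rng) -> usize {
    // Softmax over temperature-scaled logits, subtracting the max
    // for numerical stability.
    let scaled: Vec<f32> = logits.iter().map(|l| l / temperature).collect();
    let max = scaled.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = scaled.iter().map(|l| (l - max).exp()).collect();
    let total: f32 = exps.iter().sum();

    // Inverse-CDF sampling from the resulting categorical distribution.
    let mut threshold = rng.gen::<f32>() * total;
    for (i, e) in exps.iter().enumerate() {
        threshold -= e;
        if threshold <= 0.0 {
            return i;
        }
    }
    exps.len() - 1
}
```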

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>