Conversation

donpellegrino commented May 26, 2025

This pull request is high risk: it was implemented by an AI, prompted by a user unfamiliar with the codebase. Significant expert review, with a focus on regression testing, is recommended.

This pull request addresses #481.

Anthropic's Claude Code was prompted to port the mamba2 model from the huggingface/transformers repository, using the mamba2 model directory as a starting point. Initial testing was performed with the AntonV/mamba2-130m-hf pretrained weights. The port successfully generated sensible text in response to the example prompt from the "Usage" section of the model card for AntonV/mamba2-130m-hf. This implementation depends on guillaume-be/rust-tokenizers#105.

Note that the AntonV/mamba2-130m-hf weights are published in SafeTensors format; they had to be converted to .bin format and then to the rust-bert format. Publishing the final rust-bert weight files to HuggingFace Hub seems a reasonable next step, but that remains to be done.
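
For reference, a minimal sketch of checking that converted weights are loadable, assuming the usual rust-bert convention of a `rust_model.ot` file produced by the repository's `utils/convert_model.py` script (the local path below is hypothetical):

```rust
use tch::{nn, Device};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Hypothetical local path to the .ot file produced from the
    // intermediate .bin weights by utils/convert_model.py.
    let weights_path = "path/to/mamba2-130m/rust_model.ot";

    // rust-bert models read their parameters from a tch VarStore;
    // a successful load confirms the converted file is readable.
    let mut vs = nn::VarStore::new(Device::cuda_if_available());
    vs.load(weights_path)?;
    println!("loaded {} named tensors", vs.variables().len());
    Ok(())
}
```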

Next Steps

  • Pass all GitHub tests for "104 GPTNeoX Tokenizer" (guillaume-be/rust-tokenizers#105).
  • Publish weights for the test case to HuggingFace Hub in the rust-bert format.
  • Update the test case to use the published weights directly from HuggingFace Hub (see the sketch after this list).
  • Pass all GitHub tests for this draft pull request.
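
Once the weights are published, the test case could fetch them through rust-bert's resource machinery. A minimal sketch, assuming a hypothetical Hub location for the converted weights (`<org>` and the cache name are placeholders):

```rust
use rust_bert::resources::{RemoteResource, ResourceProvider};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Hypothetical URL; the real value depends on where the
    // rust-bert formatted weights are published on HuggingFace Hub.
    let weights = RemoteResource::new(
        "https://huggingface.co/<org>/mamba2-130m-rust/resolve/main/rust_model.ot",
        "mamba2-130m",
    );

    // Downloads on first use and caches locally thereafter.
    let local_path = weights.get_local_path()?;
    println!("weights cached at {}", local_path.display());
    Ok(())
}
```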

donpellegrino and others added 2 commits May 25, 2025 21:15
- Implement complete Mamba2 architecture with SSM layers
- Add custom JSON parsing to handle Infinity values in config (see the sketch after this list)
- Implement cache-based generation for efficient inference
- Add support for conv1d with proper cache handling
- Include test demonstrating text generation with 25 tokens
- Add minimal example that generates until EOS token or max length
- Integrate with generation pipeline via Cache enum
- Add README with usage instructions
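
On the Infinity point: serde_json rejects the bare Infinity literals that Python's json module writes for float('inf') config values (e.g. the upper bound of time_step_limit). A minimal sketch of one way to pre-process the config text before parsing; the helper below is illustrative, not the code in this commit:

```rust
use serde_json::Value;

/// Replace bare Infinity literals with large finite floats so that
/// serde_json will accept the document. Assumption: "Infinity" never
/// appears inside a string value in the config, so a plain text
/// replacement is safe. Negative values are handled first so the
/// second replacement only sees the positive ones.
fn sanitize_config_json(raw: &str) -> String {
    raw.replace("-Infinity", "-1e38").replace("Infinity", "1e38")
}

fn main() -> serde_json::Result<()> {
    let raw = r#"{"time_step_limit": [0.0, Infinity]}"#;
    let config: Value = serde_json::from_str(&sanitize_config_json(raw))?;
    println!("{config}");
    Ok(())
}
```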

The implementation generates coherent text from prompts, with the first
several tokens matching the Python transformers implementation exactly.
Minor divergence in later tokens is due to numerical precision differences
in the SSM scan implementation.
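
That divergence is plausible given the form of the scan: each step multiplies the hidden state by a decay factor and adds a new input contribution, so rounding errors compound over the sequence. A simplified sketch of the sequential recurrence for a single scalar channel (illustrative only, not the kernel in this pull request):

```rust
/// Sequential scan for one scalar SSM channel:
///   h[t] = exp(dt * a) * h[t-1] + dt * b * x[t]
///   y[t] = c * h[t]
/// The repeated multiply-accumulate is where f32 implementations
/// accumulate rounding differences and drift apart after many tokens.
fn ssm_scan(x: &[f32], a: f32, b: f32, c: f32, dt: f32) -> Vec<f32> {
    let decay = (dt * a).exp();
    let mut h = 0.0f32;
    x.iter()
        .map(|&xt| {
            h = decay * h + dt * b * xt;
            c * h
        })
        .collect()
}
```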

Example usage:
  cargo run --example mamba2

Generated text example:
  Input: Hey how are you doing?
  Output: Hey how are you doing?

I'm in the process of getting my first project up and running.
I'm currently working on a project that will be a real game...

🤖 Generated with Claude Code

Co-Authored-By: Claude <[email protected]>
- Update Cargo.toml to use local rust-tokenizers with new tokenizer.json support
- Replace hard-coded token IDs with dynamic tokenization using GPTNeoX tokenizer
- Load tokenizer directly from AntonV/mamba2-130m-hf tokenizer.json file
- Add temperature-based sampling instead of greedy decoding for more natural output (see the sketch below)
- Implement smart stopping at sentence boundaries to prevent rambling
- Add progress display showing partial generated text every 10 tokens
- Decode and display the actual generated text instead of placeholder

The example now provides a complete end-to-end demonstration of using the
Mamba2 model with proper tokenization and more natural text generation.
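
The sampling step is easy to show in isolation. A minimal sketch of temperature-based sampling over raw logits using the rand crate; the function name and signature are illustrative, not the example's actual code:

```rust
use rand::Rng;

/// Sample a token index from logits after temperature scaling.
/// temperature < 1.0 sharpens the distribution; > 1.0 flattens it.
fn sample_with_temperature(logits: &[f32], temperature: f32, rng: &mut impl Rng) -> usize {
    // Softmax over temperature-scaled logits, subtracting the max
    // for numerical stability.
    let scaled: Vec<f32> = logits.iter().map(|l| l / temperature).collect();
    let max = scaled.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = scaled.iter().map(|l| (l - max).exp()).collect();
    let total: f32 = exps.iter().sum();

    // Inverse-CDF sampling from the resulting categorical distribution.
    let mut threshold = rng.gen::<f32>() * total;
    for (i, e) in exps.iter().enumerate() {
        threshold -= e;
        if threshold <= 0.0 {
            return i;
        }
    }
    exps.len() - 1
}
```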

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>