Conversation
fast_llm/models/gpt/model.py
Outdated
```python
labels = batch.tokens.crop(labels_begin, labels_end).tokens
# ...
loss_mask = labels >= 0
```
Is this really what we want? We can't train the model to produce these labels, but it might make sense to compute other losses?
Can we skip this when not needed?
Addressed.
AFAIU we want to incorporate all sources of masking into the distillation losses (a rough sketch follows below):
- padding & spans
- image placeholder tokens as well, because we train an image & text => text model, i.e. even in pre-training we do not compute a loss on image tokens
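For concreteness, here is a minimal sketch of the idea, assuming a convention where padding, span and image-placeholder positions are all encoded as negative label ids (the helper names, shapes and the `-100` sentinel are illustrative, not Fast-LLM's actual implementation):

```python
import torch
import torch.nn.functional as F

def build_loss_mask(labels: torch.Tensor) -> torch.Tensor:
    # Assumed convention: padding, loss-masking spans and image placeholder
    # positions are all stored as negative label ids during preprocessing,
    # so a single comparison recovers the combined mask.
    return labels >= 0

def masked_distillation_loss(
    student_logits: torch.Tensor,  # (batch, seq, vocab)
    teacher_logits: torch.Tensor,  # (batch, seq, vocab)
    loss_mask: torch.Tensor,       # (batch, seq), bool
) -> torch.Tensor:
    # Per-token KL between teacher and student, averaged over unmasked positions.
    log_p_student = torch.log_softmax(student_logits, dim=-1)
    p_teacher = torch.softmax(teacher_logits, dim=-1)
    per_token = F.kl_div(log_p_student, p_teacher, reduction="none").sum(dim=-1)
    return (per_token * loss_mask).sum() / loss_mask.sum().clamp(min=1)

# Example: one sequence of 6 tokens; the last 3 positions are masked
# (span / image placeholder / padding), so they do not contribute.
labels = torch.tensor([[12, 7, 31, -100, -100, -100]])
mask = build_loss_mask(labels)
student = torch.randn(1, 6, 50)
teacher = torch.randn(1, 6, 50)
print(masked_distillation_loss(student, teacher, mask))
```

With a convention like this, the single comparison `labels >= 0` already covers every masking source, and the same mask can gate both the language-modeling and the distillation terms.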
fast_llm/models/gpt/model.py
Outdated
```python
if (
    self._config.head.distillation_model is not None
    or self._config.decoder.block.distillation_model is not None
):
    # ...
```
Activation distillation ignores `loss_mask`; it uses `activation_mask` instead.
Does that even make sense? These masks refer to token prediction, which isn't really a thing at the activation stage. I guess we could take the next token, but that raises several concerns (especially with MTP). Actually, I think we shouldn't mask those positions. They may not be used for next-token prediction, but the keys and values resulting from these activations are used further down in the sequence, which means we do train these activations.
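To illustrate the two options being debated, here is a toy sketch (the MSE objective and all names are assumptions, not Fast-LLM's actual activation-distillation code) of an activation loss with and without a per-position mask:

```python
import torch

def activation_distillation_loss(
    student_hidden: torch.Tensor,                  # (batch, seq, hidden)
    teacher_hidden: torch.Tensor,                  # (batch, seq, hidden)
    activation_mask: torch.Tensor | None = None,   # (batch, seq), bool, or None
) -> torch.Tensor:
    # Per-position distance between student and teacher hidden states.
    per_position = (student_hidden - teacher_hidden).pow(2).mean(dim=-1)
    if activation_mask is None:
        # Unmasked variant: every position contributes, on the argument that
        # its keys and values influence later positions through attention.
        return per_position.mean()
    # Masked variant: only positions selected by the mask contribute.
    return (per_position * activation_mask).sum() / activation_mask.sum().clamp(min=1)
```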
✨ Description
Addresses #442
`loss_masks` should include padding and image placeholder tokens.

TODO:
🔍 Type of change
Select all that apply:
📝 Changes
List the key changes introduced in this PR:
✅ Checklist
Make sure the following tasks are completed before submitting the PR:
General
Dependencies and Configuration
Testing
Performance Impact
📊 Performance Impact Details
If there is any impact on performance, describe it and provide benchmark results, if applicable:
🗒️ Additional Notes
Include any additional context, information, or considerations here, such as known issues, follow-up tasks, or backward compatibility concerns.