
How to return_attention from Generator.generate_tokens? #1994

@Gldkslfmsd

Description


Hi,

I would like to access attention from decoder-only LLMs. I need it for simultaneous translation: at each generation step, I want to detect which part of the source (the user message in the prompt) the model attends to most, to see whether it is near the end of the current source. If so, I would stop generation and continue with the next partial source.

Am I right that this feature is not yet implemented for Generator, but is available for Translator? Could it be implemented for Generator? How?
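For context, here is a minimal sketch of the detection step I have in mind, built on the Translator path. Only `translate_batch(..., return_attention=True)` and the `attention` field of the result are taken from the documented CTranslate2 API; `model_dir`, the token list, and the end-of-source condition are placeholder assumptions for illustration:

```python
# Sketch: find the most-attended source position in an attention vector
# (one vector per generated target token, one weight per source token).

def most_attended_source_index(attention_vector):
    """Index of the source token receiving the highest attention weight."""
    return max(range(len(attention_vector)), key=attention_vector.__getitem__)

# Hypothetical usage (requires a converted CTranslate2 model; "model_dir"
# and the tokens are placeholders, not taken from this issue):
#
# import ctranslate2
# translator = ctranslate2.Translator("model_dir")
# source = ["▁Hello", "▁world", "."]
# result = translator.translate_batch([source], return_attention=True)[0]
# for step, vector in enumerate(result.attention[0]):
#     if most_attended_source_index(vector) >= len(source) - 1:
#         pass  # attention is at the end of the current source: stop here
```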

Thanks!
