Hi,
I would like to access the attention weights of decoder-only LLMs. I need this for simultaneous translation: at each generation step, I want to detect which part of the source (the user message in the prompt) the model is attending to most, and check whether it is near the end of the current partial source. If so, I would stop generation and continue with the next partial source.
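Concretely, the stopping check I have in mind could look like the sketch below. It assumes the per-step attention weights over the source tokens are exposed somehow; the function name and threshold are just illustrative:

```python
def attends_near_source_end(attn_over_source, tail_tokens=3):
    """Hypothetical helper for the stopping criterion.

    attn_over_source: one attention weight per source token for the
    current generation step (e.g. averaged over heads/layers).
    Returns True if the attention peak falls within the last
    `tail_tokens` positions of the current partial source.
    """
    peak = max(range(len(attn_over_source)), key=attn_over_source.__getitem__)
    return peak >= len(attn_over_source) - tail_tokens

# Attention peaks on the last source token -> stop and wait for more source.
print(attends_near_source_end([0.05, 0.10, 0.15, 0.70]))  # True
# Attention peaks early in the source -> keep generating.
print(attends_near_source_end([0.70, 0.10, 0.10, 0.10]))  # False
```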
Am I right that this feature is not yet implemented for Generator, but is available for Translator? Could it be implemented for Generator? How?
Thanks!