Tutorial: Training a Transformer Encoder–Decoder on Interacting RNA Pairs

Overview

This tutorial shows how to train a transformer encoder–decoder from scratch on pairs of nucleotide sequences (interacting RNA couples). The vocabulary has 8 tokens: the four nucleotides A, U, C, G and the special tokens PAD, EOS, UNK, BOS. The model’s number of heads, number of layers, and hidden size are configurable. Standard sinusoidal positional encoding is added to the token embeddings, and training and validation run through the Hugging Face Trainer API.
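For readers unfamiliar with sinusoidal positional encoding, the following is a minimal sketch of the standard formulation, written in pure Python so it runs without any dependencies. It is not the code in pos_encoding.py; the function name, its arguments, and the requirement that d_model be even are assumptions made for illustration.

```python
import math
from typing import List


def sinusoidal_positional_encoding(max_len: int, d_model: int) -> List[List[float]]:
    """Return a max_len x d_model table of sinusoidal position encodings.

    Hypothetical helper for illustration only; the repository's
    pos_encoding.py may be organised differently.
    """
    if d_model % 2 != 0:
        # Assumption: an even hidden size, so sine/cosine columns pair up.
        raise ValueError("d_model must be even")

    table = [[0.0] * d_model for _ in range(max_len)]
    for pos in range(max_len):
        for i in range(0, d_model, 2):
            # Each (sin, cos) pair shares one frequency: 1 / 10000^(i / d_model).
            angle = pos / (10000 ** (i / d_model))
            table[pos][i] = math.sin(angle)      # even dimensions use sine
            table[pos][i + 1] = math.cos(angle)  # odd dimensions use cosine
    return table


# Usage: encodings for 4 positions with a hidden size of 8.
pe = sinusoidal_positional_encoding(max_len=4, d_model=8)
```

In a model, row `pos` of this table would typically be added to the embedding of the token at position `pos` before the first attention layer.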

1. Setup

Create conda environment

conda create -n rna_full_transformer python=3.9 -y
conda activate rna_full_transformer

Install dependencies

pip install -r requirements.txt

2. Data Preparation

We assume the training and validation data are each stored in a single TSV file (train.tsv and valid.tsv), where each line contains a source sequence and a target sequence separated by a tab character (\t), for example:

AUGCUA...	UAGCGA...
GCGUA...	CGCAU...

Each sequence is a contiguous string of the characters A, U, C, G; the EOS token is appended implicitly during tokenization.
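To make the file format and the implicit EOS concrete, here is a minimal sketch of parsing one TSV line and turning each sequence into token ids. It is not the Fast tokenizer defined in tokenizer.py. The id assignment (tokens numbered 0 to 7 in the order listed in the overview), the mapping of unexpected characters to UNK, and the names `encode` and `parse_pair` are all assumptions made for illustration.

```python
from typing import List, Tuple

# Assumption: ids follow the order in which the tokens are listed in the overview.
VOCAB = ["A", "U", "C", "G", "PAD", "EOS", "UNK", "BOS"]
TOKEN_TO_ID = {token: idx for idx, token in enumerate(VOCAB)}


def encode(sequence: str) -> List[int]:
    """Map each nucleotide character to an id and append EOS.

    Characters outside A, U, C, G are mapped to UNK (an assumption).
    """
    unk_id = TOKEN_TO_ID["UNK"]
    ids = [TOKEN_TO_ID.get(ch, unk_id) if len(ch) == 1 else unk_id for ch in sequence]
    return ids + [TOKEN_TO_ID["EOS"]]


def parse_pair(line: str) -> Tuple[List[int], List[int]]:
    """Split one TSV line into (source ids, target ids)."""
    source, target = line.rstrip("\n").split("\t")
    return encode(source), encode(target)


# Usage: one line as it would appear in train.tsv.
source_ids, target_ids = parse_pair("AUGC\tGCNA\n")
```

Here the character `N` in the target is not one of the four nucleotides, so under the assumptions above it is encoded as UNK.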

rna_transformer/
├── data.py          # loading & preparing Dataset objects
├── tokenizer.py     # vocab definition & Fast tokenizer
├── pos_encoding.py  # sinusoidal positional encoding
├── model.py         # NucConfig + NucTransformer classes
├── train.py         # main() entrypoint: Trainer setup & run
└── utils.py         # seed setting
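The layout above lists utils.py as the place where seeds are set, but its contents are not shown here. A minimal sketch of what such a helper might look like follows; the function name `set_seed` and the choice of libraries to seed are assumptions, and NumPy and PyTorch are seeded only if they happen to be installed.

```python
import os
import random


def set_seed(seed: int) -> None:
    """Seed the common random number generators for reproducible runs.

    Hypothetical helper; the repository's utils.py may differ.
    """
    random.seed(seed)
    # Affects hash randomisation only in Python processes started afterwards.
    os.environ["PYTHONHASHSEED"] = str(seed)

    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy is optional for this sketch

    try:
        import torch
        torch.manual_seed(seed)
        if torch.cuda.is_available():
            torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass  # PyTorch is optional for this sketch


# Usage: call once at the start of training, before building the model.
set_seed(42)
```

The transformers library also ships its own `set_seed` function, which may be a simpler choice when the Trainer API is already in use.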
