Featured datasets

Open Parallel Corpus

This corpus is a regularly updated, growing collection of multilingual texts aligned with Tibetan (bo) texts at the sentence level. It is intended for training machine translation (MT) models.

Get it on GitHub

Vulgate Kangyur

This Kangyur was created with OpenPecha's Vulgate Generator, which compares instances of a work and compiles a new version using the most common character at each position in the work.

Get it on GitHub
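
The character-level voting described for the Vulgate Kangyur can be illustrated with a minimal sketch. It is not OpenPecha's Vulgate Generator code: the function name `majority_vote` is hypothetical, and the sketch assumes the witnesses have already been aligned so that every string has the same length and position i refers to the same place in each one. A real tool would first need an alignment step to handle insertions and omissions.

```python
from collections import Counter


def majority_vote(witnesses):
    """Build a consensus text from pre-aligned witnesses (hypothetical helper).

    Assumes all witnesses have equal length, so characters at the same
    index correspond to each other. Ties go to the character that appears
    first among the witnesses, which is how Counter.most_common orders
    equal counts.
    """
    if not witnesses:
        return ""
    length = len(witnesses[0])
    if any(len(w) != length for w in witnesses):
        raise ValueError("witnesses must be aligned to the same length")
    # zip(*witnesses) yields one tuple of characters per position.
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*witnesses)
    )


# Three placeholder witnesses; two of them carry a single variant reading.
consensus = majority_vote(["abcde", "abxde", "abcdz"])
print(consensus)  # abcde
```

Voting per position means an isolated scribal variant in one witness is outvoted by the others, while the order of the input list only matters when readings are tied.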