Intro
OpenPecha Data is a growing collection of 14,000 repositories, each containing free, open-source Tibetan text files in the OpenPecha Format (OPF) and, in some cases, aligned translations.
Most repos contain individual texts, and some contain collections. These collections include corpuses, such as those created to train translation models, and sets of texts, such as various editions of the Kangyur and Tengyur.
Developers use OpenPecha Data to make corpuses, train large language models, and create Tibetan AI. Publishers use it to create e-texts. Academics use it for data-driven research.
-
Download a featured dataset
Get the latest OpenPecha datasets to train Tibetan-language AI models.
-
Get to know the OpenPecha Format (OPF)
Learn how the OpenPecha Format is structured and how it works.
-
Understand OpenPecha Data on GitHub
Get up to speed on how OpenPecha Data is organized on GitHub.