Welcome
OpenPecha is a store of etexts and annotations, available on GitHub and through a set of APIs.
The project’s primary aim is to facilitate the collection, proofreading, and enrichment of etexts by leveraging language technology and collaboration.
- Download a featured dataset
  Get the latest Pecha datasets to train Tibetan-language AI models.
- Get Pecha Toolkit
  Install Pecha Toolkit with pip and get up and running in minutes.
- Use Pecha API
  Harness the power of OpenPecha with the Pecha API.
- Get the latest news
  Read our blog to learn the latest from OpenPecha and the Tibetan AI space.
Key features
A dataset of more than 14,000 texts that keeps growing in size and quality through contributions from core members and from apps that use our APIs.
Files are stored in the OpenPecha format (OPF), in which standoff markup in annotation layers is linked to a base text layer.
OPF includes a base layer, a table of contents layer, a footnotes layer, and a hyperlinks layer by default.
A virtually unlimited number of additional layers can be added for witnesses, commentaries, sets of tags of the same type, and more.
OpenPecha's Character Coordinate Translation Vector (CCTV) ties tags in annotation layers to characters in the base layer. Whenever a character in the base layer changes position, annotations that link to it are automatically updated to point to its new coordinates.
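This page does not describe how CCTV is implemented internally. The sketch below is a hypothetical illustration of the general standoff idea, built on simple assumptions: each annotation is stored as a (start, end) character offset pair into the base text, and every change to the base text is a replacement of one character range. The names `Span`, `Layer`, `translate`, and `apply_edit` are invented for this example and are not part of Pecha Toolkit, the Pecha API, or the OPF specification.

```python
from dataclasses import dataclass, field

# Hypothetical illustration only: not OpenPecha's CCTV implementation.


@dataclass
class Span:
    """Base-layer character coordinates: start inclusive, end exclusive."""
    start: int
    end: int


@dataclass
class Layer:
    """Hypothetical annotation layer mapping annotation ids to base-layer spans."""
    name: str
    annotations: dict[str, Span] = field(default_factory=dict)


def translate(pos: int, edit_start: int, edit_end: int, inserted: int,
              *, is_end: bool = False) -> int:
    """Map one coordinate from the old base text onto the edited base text."""
    delta = inserted - (edit_end - edit_start)
    if pos < edit_start or (is_end and pos == edit_start):
        return pos  # before the edit (an end sitting on the edit point stays put)
    if pos >= edit_end:
        return pos + delta  # after the edit: shift by the net change in length
    return edit_start  # inside the replaced range: collapse onto the edit point


def apply_edit(base: str, layers: list[Layer], edit_start: int, edit_end: int,
               new_text: str) -> str:
    """Replace base[edit_start:edit_end] with new_text and re-point every annotation."""
    for layer in layers:
        for span in layer.annotations.values():
            start = translate(span.start, edit_start, edit_end, len(new_text))
            end = translate(span.end, edit_start, edit_end, len(new_text), is_end=True)
            span.start, span.end = start, max(start, end)  # never let a span invert
    return base[:edit_start] + new_text + base[edit_end:]


if __name__ == "__main__":
    base = "Om mani padme hum"
    words = Layer("words", {"padme": Span(8, 13), "hum": Span(14, 17)})
    base = apply_edit(base, [words], 0, 0, "Mantra: ")  # insert a prefix
    for ann_id, span in words.annotations.items():
        print(ann_id, repr(base[span.start:span.end]))
```

Running the script prints `padme 'padme'` and `hum 'hum'`: both annotations moved 8 characters to the right along with the text they mark. How to treat an insertion that lands exactly on a span boundary is a design choice; this sketch keeps such insertions outside the span, and a real system has to settle on its own convention.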