
Welcome

OpenPecha is an etext and annotations store made available on GitHub and through a set of APIs.

The project’s primary aim is to facilitate the collection, proofreading, and enrichment of etexts by leveraging language technology and collaboration.

  • Download a featured dataset


    Get the latest Pecha datasets to train Tibetan-language AI models.

    Featured datasets

  • Get Pecha Toolkit


    Install Pecha Toolkit with pip and get up and running in minutes.

    Pecha Toolkit

  • Use Pecha API


    Harness the power of OpenPecha with Pecha API.

    Pecha API

  • Get the latest news


    Read our blog to learn the latest from OpenPecha and the Tibetan AI space.

    OpenPecha blog

Key features

A dataset of more than 14,000 texts that is continuously increasing in quantity and quality through contributions from core members and apps that use our APIs.

Files are stored in the OpenPecha format (OPF), in which standoff markup in annotation layers is linked to a base text layer.

OPF includes a base layer, a table of contents layer, a footnotes layer, and a hyperlinks layer by default.

Virtually unlimited additional layers can be added for witnesses, commentaries, layers of same-type tags, and more.
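
To make the layered model concrete, here is a minimal sketch of standoff annotation in plain Python. It is not the OPF file layout or the Pecha Toolkit API: the `Span` class, the layer names, and the `resolve` helper are all hypothetical, and the base text is a short transliterated phrase chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """One standoff annotation: a character range in the base text plus a payload."""
    start: int   # inclusive offset into the base text
    end: int     # exclusive offset into the base text
    value: str   # hypothetical payload, e.g. a footnote body or a tag name


# The base layer is plain text with no inline markup.
base = "bkra shis bde legs"

# Each annotation layer lives apart from the base and only points into it.
layers = {
    "footnotes": [Span(0, 9, "first half of the greeting")],
    "tags": [Span(10, 18, "phrase")],
}


def resolve(base_text: str, span: Span) -> str:
    """Return the stretch of base text that an annotation covers."""
    return base_text[span.start:span.end]


# Look up the text behind every annotation in every layer.
covered = {name: [resolve(base, s) for s in spans] for name, spans in layers.items()}
print(covered)
```

Because the layers only hold offsets, a new layer can be added as another dictionary entry without touching the base text or the existing layers.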

OpenPecha's Character Coordinate Translation Vector (CCTV) ties tags in annotation layers to characters in the base layer. Whenever a character in the base layer changes position, annotations that link to it are automatically updated to point to its new coordinates.
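
The idea of translating coordinates after an edit can be sketched as follows. This is not OpenPecha's CCTV implementation: it swaps in Python's standard `difflib.SequenceMatcher` to compare the old and new base text, and the `build_translator` and `shift_span` helpers are hypothetical names. How offsets inside deleted or replaced text should behave is an assumption of this sketch; here they snap to the start of the changed region.

```python
from difflib import SequenceMatcher


def build_translator(old: str, new: str):
    """Return a function mapping a character offset in `old` to one in `new`."""
    opcodes = SequenceMatcher(None, old, new, autojunk=False).get_opcodes()

    def translate(pos: int) -> int:
        for tag, i1, i2, j1, j2 in opcodes:
            if i1 <= pos < i2:
                if tag == "equal":
                    # Unchanged character: keep its relative position in the block.
                    return j1 + (pos - i1)
                # Deleted or replaced character: snap to the start of the change.
                return j1
        # Offsets at or past the end of the old text map to the end of the new text.
        return len(new)

    return translate


def shift_span(span, translate):
    """Translate a (start, end) pair; text inserted exactly at `end` is absorbed."""
    start, end = span
    return translate(start), translate(end)


old_base = "bkra shis bde legs"
new_base = "om bkra shis bde legs"   # three characters inserted at the front

translate = build_translator(old_base, new_base)
old_span = (10, 18)                  # covers "bde legs" in the old base
new_span = shift_span(old_span, translate)

print(old_base[old_span[0]:old_span[1]])   # bde legs
print(new_span)                            # (13, 21)
print(new_base[new_span[0]:new_span[1]])   # bde legs
```

The annotation still covers the same words after the edit, even though every character it points to has moved three positions to the right.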