OpenPecha is an etext and annotations store made available on GitHub and through a set of APIs.

The project’s primary aim is to facilitate the collection, proofreading, and enrichment of etexts by leveraging language technology and collaboration.

New to OpenPecha? Here are a few places to get started using our data and tools:

  • Download a featured dataset

    Get the latest Pecha datasets to train Tibetan-language AI models.

    Featured datasets

  • OCR scanned BDRC books

    Use the OCR Pipeline to run OCR on scans in the BDRC collection.

    OCR books

  • Get Pecha Toolkit

    Install Pecha Toolkit with pip and get up and running in minutes.

    Pecha Toolkit

  • Get the latest news

    Read our blog to learn the latest from OpenPecha and the Tibetan AI space.

    OpenPecha blog