
Faculty of Asian and Middle Eastern Studies


The international CAMBEL (Cambridge-Belgrade Persian Learner Corpus) project aims to provide a collection of written materials produced by learners of Persian around the world.


The CAMBEL Persian Learner Corpus is an error-tagged (coded) learner corpus of written productions by learners of Persian from all over the world, who have diverse linguistic backgrounds and range from CEFR level A1 to C2 (beginner to advanced). CAMBEL has been jointly compiled, recorded, and administered as part of an academic research collaboration between Persian Studies in the Faculty of Asian and Middle Eastern Studies at the University of Cambridge (Dr Mahbod Ghaffari) and the Centre of Persian Studies in the Faculty of Philology at the University of Belgrade (Dr Saeed Safari). Work on CAMBEL is ongoing, and new texts are continuously added and tagged.


Linguistic corpora are reliable empirical sources for analysing linguistic data. They are also widely used in Second/Foreign Language Acquisition and Foreign Language Teaching research, where the most commonly used type is the learner corpus.

A learner corpus consists of authentic materials (written, spoken or mixed) produced by learners in the course of their learning. Systematic collection, processing and analysis of such data are crucial for educators, researchers and material developers when it comes to identifying learners' challenges and difficulties, improving curricula, creating effective learning materials and conducting thorough error analysis. Because such resources are lacking in the field of teaching Persian to non-Persian learners, there is an urgent need for specialised, streamlined corpora tailored to Persian as a second/foreign language. The development of CAMBEL aims to address this need and contribute to the advancement of research in this field.

The Cambridge-Belgrade Persian Learner Corpus (CAMBEL) was formed by merging the SFLC Error Tagged Learner Corpus, developed at the University of Belgrade, with the Persian Learners Written Data (PLWD) at the University of Cambridge. Setting up CAMBEL involved three major stages: constructing the corpus, proposing a system of error annotation, and developing tools and software. The practical phases included the systematic collection of data and metadata, defining the corpus design criteria, creating the error tagsets, and developing the corpus interface, software and specific tools. The CAMBEL software is equipped with four main tools that allow it to function as an error-tagged learner corpus and to provide statistical reports. The data gathered in this corpus are predominantly written works on a variety of topics produced by learners of Persian at different levels (A1-C2) from all over the world, so the corpus presents the natural written production of Persian learners with a range of first languages. The learners are of both genders, various ages and different educational levels, and come from different educational or academic settings.


The CAMBEL Persian Learner Corpus has been jointly developed by teams of linguists, teachers, lecturers, and professors of Persian around the world under the supervision of Dr Mahbod Ghaffari (University of Cambridge, UK) and Dr Saeed Safari (University of Belgrade, Serbia).





CAMBEL Persian Learner Corpus

How to cite CAMBEL

Ghaffari, M., & Safari, S. (2023). CAMBEL: The Cambridge-Belgrade Persian Learner Corpus. (Accessed [insert date accessed]).




For inquiries, please contact Mahbod Ghaffari and Saeed Safari.
