CAMBEL Persian Learner Corpus

The international CAMBEL project (Cambridge-Belgrade Persian Learner Corpus) aims to provide a collection of written materials produced by learners of Persian around the world.

Overview

The CAMBEL Persian Learner Corpus is an error-tagged (coded) learner corpus of written productions by learners of Persian from all over the world, who have diverse linguistic backgrounds and range from CEFR level A1 to C2 (beginner to advanced). CAMBEL has been jointly compiled, recorded, and administered as part of an academic research collaboration between Persian Studies in the Faculty of Asian and Middle Eastern Studies at the University of Cambridge (Dr Mahbod Ghaffari) and the Centre of Persian Studies in the Faculty of Philology at the University of Belgrade (Dr Saeed Safari). Work on CAMBEL is ongoing, and new texts are continuously added and tagged.

Background

Linguistic corpora are reliable empirical sources for analysing linguistic data. They are also widely used in Second/Foreign Language Acquisition and Foreign Language Teaching research, where the most commonly used type is the learner corpus.

A learner corpus consists of authentic materials (written, spoken or mixed) produced by learners in the course of their learning. Systematic collection, processing and analysis of such data are crucial for educators, researchers and material developers when it comes to identifying learners' challenges and difficulties, improving curricula, creating effective learning materials and conducting thorough error analysis. Because such resources are lacking in the field of teaching Persian to non-Persian learners, there is an urgent need for specialised, streamlined corpora tailored to Persian as a second/foreign language. The development of CAMBEL aims to address this need and contribute to the advancement of research in this field.

The Cambridge-Belgrade Persian Learner Corpus (CAMBEL) was formed by merging the SFLC Error Tagged Learner Corpus, developed at the University of Belgrade, with the Persian Learners Written Data (PLWD) from the University of Cambridge. Setting up CAMBEL involved three major stages: constructing the corpus, proposing a system of error annotation, and developing tools and software. In practice, this meant systematically collecting data and metadata, defining the corpus design criteria, creating the error tagsets, and developing the corpus interface, software and specific tools. The CAMBEL software is equipped with four main tools that allow it to function as an error-tagged learner corpus and to produce statistical reports. The data gathered in this corpus are predominantly written works on a variety of topics produced by learners of Persian at different levels (A1-C2) from all over the world, so the corpus presents the natural written production of Persian learners with a range of first languages. The learners are of both genders and of various ages and educational levels, and they study in different educational or academic settings.
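To make the idea of an error-tagged text with learner metadata more concrete, here is a minimal sketch in Python. It does not reflect the actual CAMBEL software, data model or tagset, none of which are described on this page: the class names, field names and error codes (such as "SPELL") are all hypothetical, chosen only to illustrate how tagged spans and metadata might feed a simple statistical report.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ErrorTag:
    """One annotated error span (hypothetical structure, not CAMBEL's)."""
    start: int   # character offset where the tagged span begins
    end: int     # character offset just past the tagged span
    code: str    # hypothetical error code, e.g. "SPELL"


@dataclass
class LearnerText:
    """A learner's written production plus a little metadata."""
    text: str
    cefr_level: str        # "A1" to "C2"
    first_language: str
    errors: List[ErrorTag] = field(default_factory=list)


def errors_per_level(texts: List[LearnerText]) -> Dict[str, Counter]:
    """Count how often each error code occurs at each CEFR level."""
    report: Dict[str, Counter] = {}
    for t in texts:
        # Create the level's counter on first sight, then add this text's codes.
        counter = report.setdefault(t.cefr_level, Counter())
        counter.update(e.code for e in t.errors)
    return report
```

Given a list of such records, errors_per_level returns one counter per CEFR level, which is the kind of per-level tally a statistical report could be built from. A real corpus would need a richer tagset and more metadata (age, gender, educational setting) than this sketch carries.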

Team

The CAMBEL Persian Learner Corpus has been jointly developed by teams of linguists, teachers, lecturers, and professors of Persian around the world under the supervision of Dr Mahbod Ghaffari (University of Cambridge, UK) and Dr Saeed Safari (University of Belgrade, Serbia).

Login to CAMBEL

CAMBEL Persian Learner Corpus

How to cite CAMBEL

Ghaffari, M., & Safari, S. (2023). CAMBEL: The Cambridge-Belgrade Persian Learner Corpus. (xxx.xxxx.xxx) (Accessed [insert date accessed]).

Contact

For inquiries, please contact Mahbod Ghaffari (mg695@cam.ac.uk) and Saeed Safari (saeed.safari@fil.bg.ac.rs).
