Evaluation of Arabic Named Entity Recognition Models on Sahih Al-Bukhari Text
EasyChair Preprint 9573, version 1
8 pages • Date: January 15, 2023

Abstract
In this paper, four Arabic named entity recognition models (CAMeLBERT-CA, Hatmimoha, Marefa-NER, and Stanza) were applied to the Sahih Al-Bukhari dataset. The main aim of this study is to identify which of these tools performs best so that it can be used on other Hadith datasets. Stanza and Marefa-NER performed best, obtaining F1-scores of 0.826191 and 0.807396, respectively. A new test dataset of around 5000 words was then created based on the CANERCorpus annotation. All four models were evaluated on this new test dataset, and all obtained disappointing F1-scores, although Hatmimoha had the best result. This is probably because the dataset is small. However, we observed that models that have many named entity classes and match the CANERCorpus labels, such as Hatmimoha and Marefa-NER, could obtain high performance.

Keyphrases: Arabic NER Models, CANERCorpus Annotation, Models Evaluation, Sahih Al-Bukhari