dc.description.abstract |
Amharic is the working language of the Federal Democratic Republic of Ethiopia (FDRE) and is
used by the majority of Ethiopians, both as a mother tongue and as a second language. Ge’ez, on the
other hand, is the ancient language of Ethiopia. Many cultural, historical, and religious documents
are written in Ge’ez and contain a vast amount of important information. There
is a need to access these documents for educational use, scientific research, and other purposes.
Most Amharic speakers can readily formulate queries in Amharic to access
documents available in Amharic. However, users who can read and understand Ge’ez
have difficulty expressing their information needs as Ge’ez queries.
To give such users access to relevant Ge’ez documents, there should be a way
of translating their Amharic queries into Ge’ez queries. The purpose of this
study is therefore to develop an Amharic-Ge’ez Cross-Language Information Retrieval (CLIR) system using
a Neural Machine Translation (NMT) technique. Neural network-based translation has recently been
applied to a range of translation tasks and has achieved better results than statistical machine translation,
which is why this technique was selected for this study.
To develop the NMT model, parallel sentences were collected from religious books such as the Bible, Wudasie
Maryam, Kidasie, and Mezmure Dawit. The query translation model was developed
using ONMT, an open-source framework, and achieved a BLEU score of 0.477. The developed
CLIR system was tested with 24 queries and 165 randomly selected documents.
In this study, both monolingual and bilingual experiments were conducted. The performance of
the system was measured using Mean Average Precision (MAP) and recall in both experiments.
The monolingual experiment returned a maximum MAP of 0.62 and a recall of 0.74,
and the bilingual experiment returned a maximum MAP of 0.53 and a recall of 0.67.
The monolingual experiment thus achieved better results than the bilingual experiment. The
lower performance of bilingual retrieval is attributed to errors that arise during query translation.
Therefore, the performance of bilingual retrieval can be improved by ad |
en_US |