Abstract:
Text summarization is one of the applications of natural language processing and is becoming
more popular for information condensation. Automatic text summarization has been shown to
be useful for natural language processing tasks such as question answering or text
classification and other related fields of computer science such as information retrieval. A
growing number of Amharic news service providers are publishing their content online. To
mention a few, the Ethiopian Reporter, Addis Admass, and Addis Zena have been updating
their websites regularly with news of all kinds, at an average of 15-20 articles per week.
Some research has been done on Amharic text summarization using different
algorithms such as LSA, graph-based methods, and the Open Text Summarizer, but most of it
addressed the summarization of single Amharic news texts. However, if a user wants comprehensive
information on a certain topic, or on several topics at once, it is quite likely that a single
document would not provide all the required information. In such a case, multiple documents
selected by the user would be given to a multi-document summarization system. In this study, the
capability of the
open-source tool (Open Text Summarizer) for Amharic multi-document summarization has been
investigated so as to address the gaps left by single-text summarization. The study focuses on
Amharic multi-document news text summarization using the Open Text Summarizer. In order to
customize and redesign the Open Text Summarizer, the researcher used the C# programming
language and Visual Studio 2010 to redesign the user interface and recode the system.
In addition, Notepad++ was used to prepare the Amharic lexicons in XML format,
and Notepad was used to prepare the corpus in TXT format. To
select important sentences from a given set of Amharic multi-document news texts, the customized
summarizer considers the term frequency of keywords and sentence position. The data
sources for this study were the Ethiopian Reporter, Addis Admass, Addis Zena, and Walta
Information Centre. In total, 35 single Amharic news texts were collected and
grouped into 10 multi-document news sets, and 30 tests were
conducted using extraction rates of 20%, 30%, and 40%. The thesis introduces sentence-extraction-based
summarization for Amharic multi-document news texts using the Open Text Summarizer. In the
study, we examined the best extraction rate for Amharic multi-document text
summarization and showed how the Open Text Summarizer is customized and used for
summarizing Amharic multi-document news texts. The Amharic language rules used for
text summarization, and their effects on the summarizer, were also investigated. In addition,
the evaluation techniques for Amharic multi-document text summarization were identified
and used. The performance of the customized Open Text Summarizer was evaluated using
intrinsic evaluation techniques, with both subjective and
objective evaluation methods. In the subjective evaluation, the summaries generated at
extraction rates of 20%, 30%, and 40% scored 60.00%, 76.00%, and 82.00% out of 100%,
respectively. In the objective evaluation
technique, we used the F-measure: the summarizer averaged an F-measure of 52.14% at a 20%
extraction rate, scored 56.31% at 30%, and reached 57.58% at 40%.
Overall, the researchers found promising results from the summarizer in both the objective
and the subjective evaluation measures.
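
The sentence selection described in the abstract (term frequency of keywords plus sentence position, applied at a fixed extraction rate) and the F-measure used for objective evaluation can be illustrated with a minimal Python sketch. This is not the thesis's C# implementation or the Open Text Summarizer's actual weighting. The function names, the whitespace tokenization, the `1 / (position + 1)` position bonus, the way the two scores are added, and the computation of the F-measure over sets of selected sentences are all assumptions made for illustration.

```python
from collections import Counter


def score_sentences(documents, stopwords=frozenset(), position_weight=1.0):
    """Score each sentence by average keyword frequency plus a position bonus.

    `documents` is a list of documents, each a list of sentence strings.
    Returns a list of ((doc_index, sentence_index), score) pairs.
    """
    entries = []
    for d, sentences in enumerate(documents):
        for s, sentence in enumerate(sentences):
            # Assumption: tokens are whitespace-separated; no stemming is done.
            tokens = [t for t in sentence.split() if t not in stopwords]
            entries.append((d, s, tokens))

    # Term frequency is counted over the whole document set.
    tf = Counter(t for _, _, tokens in entries for t in tokens)

    scored = []
    for d, s, tokens in entries:
        tf_score = sum(tf[t] for t in tokens) / len(tokens) if tokens else 0.0
        # Assumption: earlier sentences in a news text get a larger bonus.
        position_score = position_weight / (s + 1)
        scored.append(((d, s), tf_score + position_score))
    return scored


def extract(documents, rate, **kwargs):
    """Keep the top `rate` fraction of sentences, returned in document order."""
    scored = score_sentences(documents, **kwargs)
    keep = max(1, round(len(scored) * rate))
    top = sorted(scored, key=lambda item: item[1], reverse=True)[:keep]
    return sorted(key for key, _ in top)


def f_measure(selected, reference):
    """Harmonic mean of precision and recall over two sets of sentence keys."""
    selected, reference = set(selected), set(reference)
    overlap = len(selected & reference)
    if overlap == 0:
        return 0.0
    precision = overlap / len(selected)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)
```

Calling `extract(documents, 0.2)`, `extract(documents, 0.3)`, and `extract(documents, 0.4)` corresponds to the three extraction rates tested in the study, and `f_measure` compares a system extract against a hypothetical reference extract chosen by a human.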