Automatic Amharic Multi document News Text Summarization using Open Text Summarizer

dc.contributor.author Diges, Tsehay
dc.date.accessioned 2020-03-03T11:34:45Z
dc.date.available 2020-03-03T11:34:45Z
dc.date.issued 2018-03-30
dc.identifier.uri http://hdl.handle.net/123456789/2924
dc.description.abstract Text summarization is one of the applications of natural language processing and is becoming increasingly popular for information condensation. Automatic text summarization has been shown to be useful for natural language processing tasks such as question answering and text classification, and for related fields of computer science such as information retrieval. A growing number of Amharic news service providers are publishing their content online. To mention a few, the Ethiopian Reporter, Addis Admass, and Addis Zena update their websites regularly with news of all kinds, averaging 15-20 articles per week. Although some research has been done on Amharic text summarization using algorithms such as LSA, graph-based methods, and the Open Text Summarizer, most of it investigated summarizing a single Amharic news text. However, if a user wants comprehensive information on a certain topic, or on several topics at once, a single document is unlikely to provide all the required information. In such a case, multiple documents selected by the user would be given to a multi-document summarization system. This study investigates the capability of the open source tool Open Text Summarizer for Amharic multi-document summarization, so as to address the gaps in single-text summarization. The study focuses on Amharic multi-document news text summarization using the Open Text Summarizer. To customize and redesign the Open Text Summarizer, the researcher used the C# programming language and Visual Studio 2010 to redesign the user interface and recode the system. In addition, Notepad++ was used to prepare the Amharic lexicons in XML format, and Notepad was used to prepare the corpus in TXT format. To select important sentences from a given set of Amharic news documents, the customized summarizer considers the term frequency of keywords and the sentence position method.
The data sources for this study were the Ethiopian Reporter, Addis Admass, Addis Zena, and Walta Information Centre. In total, 35 single Amharic news texts were collected and grouped into 10 multi-document news sets, and a total of 30 tests were conducted using extraction rates of 20%, 30%, and 40%. The thesis introduces sentence-extraction-based summarization for Amharic multi-document news text using the Open Text Summarizer. The study examines the best extraction rate for Amharic multi-document text summarization and shows how the Open Text Summarizer was customized and used to summarize Amharic multi-document news text. The Amharic language rules used for text summarization, and their effects on the summarizer, were also investigated. In addition, evaluation techniques for Amharic multi-document text summarization were identified and used. The performance of the customized Open Text Summarizer was evaluated using intrinsic evaluation techniques, with both subjective and objective methods. In the subjective evaluation, the summaries generated at the 20%, 30%, and 40% extraction rates scored 60.00%, 76.00%, and 82.00% out of 100%, respectively. In the objective evaluation, F-measure was used: the summarizer achieved an average F-measure of 52.14% at the 20% extraction rate, 56.31% at 30%, and 57.58% at 40%. Overall, the summarizer produced promising results in both the objective and the subjective evaluation. en_US
dc.language.iso en en_US
dc.publisher UOG en_US
dc.subject Amharic Multi document Text Summarization, Open text summarizer, Keyword approach and Sentence Position Methods. en_US
dc.title Automatic Amharic Multi document News Text Summarization using Open Text Summarizer en_US
dc.title.alternative Automatic Amharic Multi document News Text Summarization using Open Text Summarizer en_US
dc.type Thesis en_US
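
The abstract describes selecting sentences by keyword term frequency and sentence position at a fixed extraction rate, and scoring the result with F-measure. The following Python sketch illustrates that general idea only. It is not the thesis's C# implementation and not the Open Text Summarizer's actual algorithm; the function names, the way the two scores are combined, the `position_weight` parameter, and the sentence-level F-measure are all assumptions made for illustration.

```python
import re
from collections import Counter


def split_sentences(text):
    # Amharic uses "።" as the full stop; ".", "?" and "!" are accepted too.
    parts = re.split(r"(?<=[።.?!])\s+", text.strip())
    return [p for p in parts if p]


def summarize(documents, rate=0.3, stopwords=frozenset(), position_weight=1.0):
    """Return roughly `rate` of all sentences, ranked by a hypothetical score."""
    sentences = []  # (position within its document, sentence text)
    for doc in documents:
        for pos, sent in enumerate(split_sentences(doc)):
            sentences.append((pos, sent))
    if not sentences:
        return []

    def tokens(sent):
        return [w for w in re.findall(r"\w+", sent.lower()) if w not in stopwords]

    # Term frequency counted over the whole group of documents.
    tf = Counter(w for _, sent in sentences for w in tokens(sent))

    def score(item):
        pos, sent = item
        words = tokens(sent)
        keyword = sum(tf[w] for w in words) / len(words) if words else 0.0
        position = position_weight / (pos + 1)  # earlier sentences score higher
        return keyword + position  # assumed combination, for illustration

    k = max(1, round(rate * len(sentences)))
    ranked = sorted(range(len(sentences)),
                    key=lambda j: score(sentences[j]), reverse=True)
    chosen = sorted(ranked[:k])  # put selected sentences back in reading order
    return [sentences[j][1] for j in chosen]


def f_measure(system, reference):
    """Harmonic mean of precision and recall over sets of extracted sentences."""
    system, reference = set(system), set(reference)
    overlap = len(system & reference)
    if overlap == 0:
        return 0.0
    precision = overlap / len(system)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)
```

Calling `summarize(docs, rate=0.2)`, `rate=0.3`, and `rate=0.4` on a document group would mirror the three extraction rates tested in the study, and `f_measure` could then compare each output against a human-selected reference extract. A real system would also need an Amharic stopword list and stemming, which this sketch leaves out.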

