CONSTRUCTING SENTIMENT MINING MODEL FOR OPINIONATED AFAN OROMO TEXTS ON ETHIOPIAN POLITICS

ABDUSELAM M, DURSA

URI: http://hdl.handle.net/123456789/956

Date: 2017-06-15

Abstract:

The World Wide Web allows Internet users to collaborate and share information online, and thereby to create large virtual societies. Afaan Oromo users of social network sites (blogs, Twitter, Facebook, wikis, web services, podcasting, multimedia sharing services) generate a large daily volume of Afaan Oromo textual reviews on social, political and scientific subjects. This volume of reviews cannot be analyzed manually. Sentiment analysis has demonstrated that the computational recognition of emotional expression is possible. However, success has been limited to a number of coarse-grained approaches to human emotion that treat the emotional connotations of text in a naive manner: as positive, negative or neutral. To address this problem, the main focus of this research is sentiment classification of opinionated Afaan Oromo political reviews. Data preprocessing experiments were carried out to make the input review documents suitable for processing; NLTK (Natural Language Toolkit) was used for preprocessing. In the preprocessing phase we performed tokenization, normalization, stop word removal and stemming. Two machine learning classification techniques, Decision Tree (DT) and Naïve Bayes (NB), were used to build and evaluate the model. The text corpus used in the experiments was prepared by the researcher from different news websites, such as http://www.voaafaanoromoo.com//comments, http://www.orto.gov.et/comment, http://www.gulelepost.com/comment, https://twitter.com/eprdf/ comment and http://ethiopolitics.com/news. In addition, we collected reviews through a questionnaire that gathers structured reviews of the known political organizations in Ethiopia.
The collected review corpus was preprocessed using tokenization, normalization, stop word removal and stemming in order to identify the terms that express political opinions, which were then classified into positive, negative and neutral polarity. The experimental results show that polarity classification using the Naïve Bayes algorithm achieved 88% accuracy, compared with 83% for the decision tree. The challenging tasks in the study were the absence of a standard corpus, the handling of synonymy and polysemy, the inability of the stemming algorithm to handle all word variants, and the ambiguity of words in the language. These challenges have not yet been solved. The performance of the system could be increased by improving the stemming algorithm, because stemming reduces the morphological variation of the language and so helps to identify the most relevant features for learning: it eliminates irrelevant and redundant features from the data, thus improving the performance of the learning algorithm. We conclude that feature-based sentiment analysis is a good indicator for determining the classification types of political reviews. For further research in this area, one major recommendation is the construction of a standardized text corpus for sentiment analysis.
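
The pipeline summarized in the abstract (tokenization, normalization, stop word removal, stemming, then polarity classification with Naïve Bayes) might be sketched roughly as follows. This is a minimal illustration in plain Python, not the study's implementation, which used NLTK. The stop word list, the suffix list, the training strings and all function and class names are hypothetical placeholders introduced here for illustration; they are not taken from the study's corpus or from a vetted Afaan Oromo resource.

```python
import math
import re
from collections import Counter, defaultdict

# ASSUMPTION: tiny placeholder lists for illustration only, not a vetted
# Afaan Oromo stop word list or stemmer rule set.
STOP_WORDS = {"fi", "kan", "akka"}
SUFFIXES = ("oota", "icha", "tti")


def preprocess(text):
    """Tokenize, lowercase, drop stop words, and strip at most one suffix."""
    tokens = re.findall(r"[a-z']+", text.lower())
    result = []
    for tok in tokens:
        if tok in STOP_WORDS:
            continue
        for suffix in SUFFIXES:
            # Only strip when a stem of at least three characters remains.
            if tok.endswith(suffix) and len(tok) > len(suffix) + 2:
                tok = tok[: -len(suffix)]
                break
        result.append(tok)
    return result


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = preprocess(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, doc):
        tokens = preprocess(doc)
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # log prior + sum of smoothed log likelihoods
            score = math.log(n_docs / total_docs)
            denom = sum(self.word_counts[label].values()) + vocab_size
            for tok in tokens:
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model, docs, labels):
    """Fraction of documents whose predicted polarity matches the label."""
    correct = sum(model.predict(d) == y for d, y in zip(docs, labels))
    return correct / len(labels)


# ASSUMPTION: made-up toy strings, not drawn from the study's corpus.
train_docs = [
    "mootummaan gaarii hojjete",
    "murtiin kun hamaa dha",
    "walgahiin har'a taa'ame",
]
train_labels = ["positive", "negative", "neutral"]
model = NaiveBayes().fit(train_docs, train_labels)
```

Calling `model.predict(...)` on a new review returns one of the three polarity labels, and `accuracy(model, docs, labels)` computes the kind of figure the abstract reports, here on whatever held-out set is supplied. A Decision Tree comparison would need a separate classifier and is not shown.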
