Abstract:
The World Wide Web allows Internet users to collaborate and share information online and thereby form
large virtual communities. Afaan Oromo users of social media (blogs, Twitter, Facebook, wikis, web
services, podcasts and multimedia-sharing services) generate a large volume of Afaan Oromo textual
reviews every day on social, political and scientific subjects. This volume of reviews is too large to be
analyzed manually. Sentiment analysis has demonstrated that emotional expression can be recognized
computationally. However, success has been limited to a number of coarse-grained approaches that treat
the emotional connotations of text naively, as simply positive, negative or neutral. To address this
problem, this research focuses on sentiment classification of opinionated Afaan Oromo political reviews.
Data preprocessing experiments were performed to make the input review documents suitable for NLTK
(Natural Language Toolkit), the tool used for preprocessing. In the preprocessing phase we performed
tokenization, normalization, stop word removal and stemming. Two machine learning classification
techniques, Decision Tree (DT) and Naïve Bayes (NB), were used to build and evaluate the model.
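The four preprocessing steps can be pictured with a small standard-library sketch. It is not the study's NLTK pipeline: the stop-word set and suffix list below are hypothetical samples chosen for illustration, and the longest-match suffix stripper is a toy stand-in for a real Afaan Oromo stemmer. It assumes reviews are written in Qubee (Latin script), where the apostrophe (hudhaa) is part of a word, so the tokenizer keeps word-internal apostrophes.

```python
import re

# HYPOTHETICAL sample stop words; the study's actual list is not given here.
STOP_WORDS = frozenset({"fi", "kan", "akka", "kana"})

# HYPOTHETICAL suffixes for a toy longest-match stripper; not a validated stemmer.
SUFFIXES = ("oota", "wwan", "icha", "oo", "an")

# Runs of lowercase Latin letters and apostrophes (hudhaa is a letter in Qubee).
TOKEN_RE = re.compile(r"[a-z']+")


def normalize(text: str) -> str:
    """Lowercase and map typographic apostrophe variants to a plain apostrophe."""
    return text.lower().replace("\u2019", "'").replace("`", "'")


def tokenize(text: str) -> list[str]:
    """Extract word tokens, dropping punctuation, digits and edge apostrophes."""
    tokens = (t.strip("'") for t in TOKEN_RE.findall(text))
    return [t for t in tokens if t]


def stem(token: str, min_stem: int = 3) -> str:
    """Strip the longest listed suffix if at least min_stem letters remain."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if token.endswith(suffix) and len(token) - len(suffix) >= min_stem:
            return token[: -len(suffix)]
    return token


def preprocess(text: str) -> list[str]:
    """Normalize, tokenize, remove stop words, then stem each remaining token."""
    tokens = tokenize(normalize(text))
    return [stem(t) for t in tokens if t not in STOP_WORDS]


print(preprocess("Paartiin kun fi mootummaan ba\u2019e!"))
# ['paartiin', 'kun', 'mootumma', "ba'e"]
```

Normalizing before tokenizing lets the apostrophe variants be unified first, so that `ba’e` and `ba'e` yield the same token.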
The text corpus used for the experiments was prepared by the researcher from several websites, such as
http://www.voaafaanoromoo.com//comments, http://www.orto.gov.et/comment,
http://www.gulelepost.com/comment, https://twitter.com/eprdf/comment and http://ethiopolitics.com/news.
In addition, reviews were collected with a questionnaire that gathers structured reviews of well-known
political organizations in Ethiopia. The collected review corpus was preprocessed with tokenization,
normalization, stop word removal and stemming in order to identify the opinion-bearing terms used to
classify reviews into positive, negative and neutral polarity.
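Three-way polarity classification with Naïve Bayes can be sketched as a from-scratch multinomial model with add-one (Laplace) smoothing over preprocessed token lists. This is an illustration under stated assumptions, not the study's model: the exact NB variant and NLTK configuration used are not given, and the class name, toy token lists and labels below are hypothetical.

```python
import math
from collections import Counter


class MultinomialNB:
    """Multinomial Naive Bayes with add-one smoothing (illustrative sketch)."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)  # number of documents per class
        self.word_counts = {c: Counter() for c in self.classes}
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for c in self.classes for w in self.word_counts[c]}
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        self.n_docs = len(labels)
        return self

    def log_posterior(self, tokens, c):
        # log P(c) + sum of log P(w | c), with add-one smoothing over the vocabulary
        score = math.log(self.doc_counts[c] / self.n_docs)
        denom = self.totals[c] + len(self.vocab)
        for w in tokens:
            if w in self.vocab:  # words never seen in training are ignored
                score += math.log((self.word_counts[c][w] + 1) / denom)
        return score

    def predict(self, tokens):
        return max(self.classes, key=lambda c: self.log_posterior(tokens, c))


def accuracy(model, docs, labels):
    """Fraction of documents whose predicted label matches the gold label."""
    correct = sum(model.predict(d) == y for d, y in zip(docs, labels))
    return correct / len(labels)


# HYPOTHETICAL toy training data (already preprocessed token lists).
docs = [["gaarii", "gaarii"], ["badaa"], ["gabaasa"],
        ["gaarii"], ["badaa", "badaa"], ["gabaasa", "gabaasa"]]
labels = ["positive", "negative", "neutral", "positive", "negative", "neutral"]

model = MultinomialNB().fit(docs, labels)
print(model.predict(["gaarii"]))  # positive
print(accuracy(model, docs, labels))  # 1.0
```

Working in log space avoids numeric underflow when a review contains many tokens. A real evaluation would, of course, measure accuracy on held-out reviews rather than on the training data.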
Experimental results show that feature-based polarity classification with the Naïve Bayes algorithm
achieves 88% accuracy, compared with 83% for the decision tree. The main challenges in the study are the
absence of a standard corpus, handling synonymy and polysemy, the inability of the stemmer to handle all
word variants, and lexical ambiguity in the language; these challenges remain unsolved. The performance
of the system could be increased by improving the stemming algorithm, because stemming reduces the
morphological variation of the language and thereby helps identify the most relevant features for
learning: by conflating word variants, it reduces redundant features in the data and so improves the
performance of the learning algorithm. We conclude that feature-based sentiment analysis is a good
indicator for determining the classification of political reviews. For further research in this area, the
major recommendation is the construction of a standardized text corpus for sentiment analysis.