Abstract—Sentiment analysis is a type of natural language
processing for tracking the attitude of the public about a
particular product, service, or topic. It is also a highly challenging
natural language processing research topic that covers many
novel sub-problems. Business organizations and academics
are now working to find the best system for
sentiment analysis. The focus of this study was unstructured
Amharic restaurant reviews on the web. The objective of
the paper was to design an Amharic text sentiment analysis model
using supervised machine learning techniques and to evaluate the
performance of the classifiers. This paper explored the supervised
machine learning classification approaches (naïve Bayes, support
vector machine, and k-nearest neighbor) with different feature
selection schemes to obtain a sentence-level sentiment analysis
model for a domain-specific restaurant review dataset. The
proposed model has the following components: data preparation;
preprocessing (tokenization, normalization, and stop-word
filtering); feature extraction and selection to build the feature
vector; and polarity classification. A performance analysis of
the classifiers was carried out using n-gram features. The
experimental results show that all three algorithms are
effective classifiers and are able to achieve good accuracy in
this experiment. Term frequency (TF) and TF-IDF weighting give
maximum accuracies of 80.43% and 79.49%, respectively, for SVM
with bigram features. Term frequency and term occurrence give
maximum accuracies of 78.37% and 78.00%, respectively, for the
naïve Bayes classifier with bigram features. TF-IDF gives a
maximum accuracy of 78.00% for KNN with 4-gram features.
The challenge was that opinion holders sometimes
give objective text to express their opinion, but the classifiers
did not distinguish such factual statements from opinions. These
complexities of natural language make sentiment mining systems
more challenging, and resolving this challenge requires
subjectivity and objectivity classification.
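
The feature extraction step named above (n-grams weighted by term occurrence, TF, or TF-IDF) can be illustrated with a minimal sketch. It is not the paper's implementation: the stop-word list, the whitespace tokenizer, the English toy reviews, and the idf = log(N / df) variant are all assumptions made for illustration, and Amharic-specific normalization is omitted.

```python
import math
from collections import Counter

# Hypothetical stop-word list; the paper's Amharic list is not reproduced here.
STOP_WORDS = {"the", "is", "was"}

def tokenize(sentence):
    """Lowercase, split on whitespace, and drop stop words."""
    return [t for t in sentence.lower().split() if t not in STOP_WORDS]

def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def term_occurrence(grams):
    """Binary weighting: 1 for every n-gram present in the sentence."""
    return {g: 1 for g in set(grams)}

def term_frequency(grams):
    """Relative frequency of each n-gram within one sentence."""
    counts = Counter(grams)
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()} if total else {}

def tf_idf(docs_grams):
    """TF-IDF per sentence, assuming idf = log(N / df)."""
    n_docs = len(docs_grams)
    # Document frequency: number of sentences containing each n-gram.
    df = Counter(g for grams in docs_grams for g in set(grams))
    weighted = []
    for grams in docs_grams:
        tf = term_frequency(grams)
        weighted.append({g: w * math.log(n_docs / df[g]) for g, w in tf.items()})
    return weighted

# Toy English reviews standing in for Amharic sentences.
reviews = ["the food was very good", "the service was very slow"]
bigrams = [ngrams(tokenize(r), 2) for r in reviews]
weights = tf_idf(bigrams)
```

Each dictionary in `weights` is a sparse feature vector for one sentence; a classifier such as SVM, naïve Bayes, or KNN would then be trained on these vectors together with the polarity labels.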