Abstract:
With the advent of the Web, research on spelling correction has received much more attention, particularly the correction of search engine queries. Studies show that misspelled words are very common in web search. When users misspell a query, the results are incorrect or inconclusive. An interactive spelling facility that informs users of possible misspellings and presents appropriate corrections to their queries is therefore required. Google was the first major search engine to offer such a facility, letting users reformulate a query by following a suggested correction presented as a link (i.e., "Did you mean?"). Misspellings reduce recall, influence users' search behavior, and can even cause a search to fail completely. The Google search engine, however, does not offer spelling correction for misspelled Amharic queries. To overcome this problem, we developed a prototype that provides spell checking for Amharic queries. The prototype is based on two spell checking and correction algorithms: a dictionary lookup algorithm for error detection and an n-gram algorithm for error correction. The dictionary lookup technique checks every word of the input text for its presence in the dictionary. For this purpose, we built a list of 300,000 correctly spelled words drawn from an Amharic dictionary, web documents, search logs, and news articles. N-grams are used together with the dictionary to define the distance between words, but words are always checked against the dictionary. The n-gram technique is statistical and language-independent in nature. The values of n used were n = 2 (bigrams), n = 3 (trigrams), n = 4 (tetragrams), and n = 5 (pentagrams). To select candidate words, n-gram similarity is computed between the words of the input query and the document words. The resulting prototype is an interactive, language-independent spell checker based on the n-gram model.
The spell checker suggests corrections by selecting the most promising candidates from a ranked list of correction candidates derived from n-gram statistics. To assess the effectiveness of the approach, we used a test set consisting of 447 short misspelled words. The retrieval effectiveness of spelling correction was computed using conventional recall and precision measures. The results show that the proposed system achieves significantly better accuracy in error detection and satisfactory performance in error correction.
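
The two-stage approach described above (dictionary lookup for detection, n-gram similarity for ranking correction candidates) can be illustrated with a minimal sketch. It is not the prototype's implementation: the abstract does not name the similarity measure, so the Dice coefficient over padded character n-grams is assumed here, and the function names (`char_ngrams`, `dice_similarity`, `check_word`) and the four-word list `WORDS` are hypothetical stand-ins for the 300,000-word dictionary.

```python
from collections import Counter

# Toy word list standing in for the real dictionary (assumption).
WORDS = {"ሰላም", "ሰላምታ", "ሰማይ", "ትምህርት"}


def char_ngrams(word, n):
    """Return the multiset of character n-grams of `word`.

    The word is padded with '#' on both sides so that its first and
    last characters also contribute n-grams (an assumed convention).
    """
    padded = "#" * (n - 1) + word + "#" * (n - 1)
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def dice_similarity(a, b, n=2):
    """Dice coefficient between the n-gram multisets of two words."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    overlap = sum((grams_a & grams_b).values())  # shared n-grams
    total = sum(grams_a.values()) + sum(grams_b.values())
    return 2 * overlap / total if total else 0.0


def check_word(word, dictionary, n=2, top_k=3):
    """Detect an error by dictionary lookup, then rank candidates.

    Returns (is_correct, suggestions). Suggestions are dictionary words
    sorted by decreasing n-gram similarity, ties broken alphabetically.
    """
    if word in dictionary:
        return True, []
    ranked = sorted(dictionary,
                    key=lambda w: (-dice_similarity(word, w, n), w))
    return False, ranked[:top_k]


if __name__ == "__main__":
    # "ሰላመ" is not in the toy list, so candidates are ranked for it.
    print(check_word("ሰላመ", WORDS))
```

Changing `n` from 2 to 5 corresponds to the bigram through pentagram settings mentioned in the abstract; a full scan of the dictionary as done here is only practical for a toy list, and a real system would likely index words by their n-grams.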