OPTICAL CHARACTER RECOGNITION FOR GE’EZ SCRIPTS WRITTEN ON THE VELLUM

DSpace Repository

dc.contributor.author TEGEN, SHIFERAW
dc.date.accessioned 2017-07-03T14:42:26Z
dc.date.available 2017-07-03T14:42:26Z
dc.date.issued 2017-06-03
dc.identifier.uri http://hdl.handle.net/123456789/803
dc.description.abstract Natural Language Processing (NLP) is an area of research and application that explores how computers can be used to manipulate natural language. One NLP research area that has drawn sustained attention is Optical Character Recognition (OCR), a method for converting scanned handwritten or printed documents into editable text. This research attempts the recognition of handwritten characters using a Support Vector Machine (SVM) for the 202 main characters of the Ge’ez language. The training and testing data sets were collected from Ge’ez vellum books. Various techniques were used at each phase, from digitization to recognition, and the MATLAB image processing environment was used for the experiments. Iterative thresholding for binarizing the digitized image, bi-level filtering for noise removal, nearest neighbor interpolation for normalization, morphological analysis for thinning, and the horizontal profile for feature extraction were found to work very well for the problem of interest. A stage-by-stage segmentation algorithm attained segmentation rates of 90.5% and 77.4% for noise-free and noisy document images, respectively. An SVM was used for classification. The SVM classifier was trained with 12 pages (7 noise-free and 5 noisy) taken from real-life Ge’ez documents, and tested with 6 pages (4 noise-free and 2 noisy) that were not included in the training data set. Average recognition rates of 63.4% and 51.7% were recorded for noise-free and noisy document images, respectively. The performance of the system is greatly affected by the similarity in shape among Ge’ez characters and by the effectiveness of the preprocessing techniques.
Shape-invariant feature extraction techniques and more advanced noise detection and removal algorithms should be investigated in the future. en_US
dc.description.sponsorship UOG en_US
dc.language.iso en_US en_US
dc.subject Computer Science en_US
dc.title OPTICAL CHARACTER RECOGNITION FOR GE’EZ SCRIPTS WRITTEN ON THE VELLUM en_US
dc.type Thesis en_US
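
The abstract names iterative thresholding for binarization and the horizontal profile for feature extraction. The following is a minimal Python sketch of those two steps, included only to illustrate the general techniques. It is not the thesis implementation (the thesis reports using MATLAB), and the function names, the tolerance value, the iteration cap, and the assumption that ink is darker than the vellum background are all hypothetical choices made for this example.

```python
from typing import List

Image = List[List[int]]  # grayscale rows, pixel values 0..255 (assumed format)


def iterative_threshold(image: Image, tol: float = 0.5, max_iter: int = 100) -> float:
    """Pick a global threshold by repeatedly averaging the two class means."""
    pixels = [p for row in image for p in row]
    t = sum(pixels) / len(pixels)  # start from the global mean intensity
    for _ in range(max_iter):
        low = [p for p in pixels if p <= t]
        high = [p for p in pixels if p > t]
        if not low or not high:
            break  # image is uniform, nothing left to separate
        new_t = (sum(low) / len(low) + sum(high) / len(high)) / 2
        if abs(new_t - t) < tol:
            return new_t  # threshold has stabilized
        t = new_t
    return t


def binarize(image: Image, t: float) -> Image:
    """Mark dark pixels (assumed ink) as 1 and light pixels as 0."""
    return [[1 if p <= t else 0 for p in row] for row in image]


def horizontal_profile(binary: Image) -> List[int]:
    """Count ink pixels in each row of a binarized image."""
    return [sum(row) for row in binary]
```

For a small synthetic image with a light background and a few dark pixels, `horizontal_profile(binarize(img, iterative_threshold(img)))` gives one ink count per row, which can serve as a simple feature vector or as a cue for locating text lines.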

