Abstract:
The world has experienced phenomenal growth in the volume of multimedia data,
especially document images, driven by the ease of creating such images with
scanners and digital cameras. As a result, huge quantities of document images are
created and stored.
Digitization enables us to better preserve knowledge so that it passes from
generation to generation, which is essential in an era where knowledge is power. In
addition, digitization improves the organization of documents, reducing both the
physical space needed to store them and the time spent finding and using them.
However, retrieving the right document from a large collection of document images
is not an easy task.
Document image retrieval is a method that focuses on retrieving relevant document
images from a document image database. There are two approaches to document image
retrieval: recognition-based retrieval and recognition-free retrieval.
Several works have been done to develop Amharic document image retrieval (DIR)
systems, but they do not investigate historical document images. Moreover, these
works focus only on the level of noise present in the images. This study proposes a
way to improve the DIR system by integrating a noise removal technique for historical
Amharic document images, and it investigates the types of noise that exist in
historical Amharic document images.
Six morphological noise filtering operations, ranging from basic to complex, and two
thresholding techniques, namely Otsu's global thresholding method and Sauvola's local
thresholding method, are investigated. The stepwise combination of open-close
filtering with Sauvola thresholding outperforms every other combination of noise
filtering and thresholding techniques.
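The winning pipeline (open-close filtering followed by Sauvola thresholding, with Otsu as the global alternative) can be illustrated with a small pure-Python sketch. This is not the implementation used in the study: the flat 3x3 structuring element, the Sauvola window radius r=7, the commonly used values k=0.5 and R=128, and all function names (`open_close`, `sauvola_binarize`, `otsu_binarize`, `synthetic_page`) are hypothetical choices made for illustration only.

```python
"""Minimal sketch: open-close filtering, then Sauvola (or Otsu) binarization.

Images are lists of rows of 0..255 gray levels (dark ink on light paper).
All names and parameter values here are illustrative assumptions.
"""
from math import sqrt


def _window(img, y, x, r):
    """Yield the values of the (2r+1)x(2r+1) window at (y, x), clipped at borders."""
    h, w = len(img), len(img[0])
    for yy in range(max(0, y - r), min(h, y + r + 1)):
        for xx in range(max(0, x - r), min(w, x + r + 1)):
            yield img[yy][xx]


def erode(img, r=1):
    """Flat square erosion: each pixel becomes its local minimum (ink grows)."""
    return [[min(_window(img, y, x, r)) for x in range(len(img[0]))]
            for y in range(len(img))]


def dilate(img, r=1):
    """Flat square dilation: each pixel becomes its local maximum (paper grows)."""
    return [[max(_window(img, y, x, r)) for x in range(len(img[0]))]
            for y in range(len(img))]


def open_close(img, r=1):
    """Opening removes bright specks smaller than the element; closing then removes small dark specks."""
    opened = dilate(erode(img, r), r)
    return erode(dilate(opened, r), r)


def sauvola_binarize(img, r=7, k=0.5, R=128.0):
    """Local threshold T = m * (1 + k * (s / R - 1)); returns 1 for ink, 0 for paper."""
    out = []
    for y, row in enumerate(img):
        out_row = []
        for x, v in enumerate(row):
            vals = list(_window(img, y, x, r))
            m = sum(vals) / len(vals)                                   # local mean
            s = sqrt(sum((p - m) ** 2 for p in vals) / len(vals))       # local std dev
            out_row.append(1 if v <= m * (1 + k * (s / R - 1)) else 0)
        out.append(out_row)
    return out


def otsu_binarize(img):
    """Global Otsu threshold: the t that maximizes between-class variance."""
    hist = [0] * 256
    for row in img:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * c for i, c in enumerate(hist))
    best_t, best_var, w0, sum0 = 0, -1.0, 0, 0
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return [[1 if v <= best_t else 0 for v in row] for row in img]


def synthetic_page(h=16, w=16):
    """Hypothetical test page: a 4-pixel-thick stroke plus one dark and one bright speck."""
    img = [[230] * w for _ in range(h)]
    for y in range(9, 13):
        for x in range(3, 13):
            img[y][x] = 20
    img[3][7] = 20    # isolated dark speck on the paper
    img[10][7] = 230  # bright hole inside the stroke
    return img


if __name__ == "__main__":
    ink = sauvola_binarize(open_close(synthetic_page()))
    for row in ink:
        print("".join("#" if v else "." for v in row))
```

On the synthetic page, open-close filtering removes both the isolated dark speck and the bright hole inside the stroke before binarization, whereas Otsu applied to the unfiltered page still marks the speck as ink. Note the trade-off of a 3x3 element: closing also erases dark strokes thinner than 3 pixels, so the element size has to suit the script's stroke width. The brute-force windows cost O(n·w²); practical implementations compute local means and variances with integral images.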
The selected noise removal scheme is finally integrated with the Amharic DIR system
for evaluation. The system achieves an average recall, precision, and F-measure of
91.67%, 68.86%, and 76.89%, respectively. The