dc.description.abstract |
Crime is a behavioral disorder that results from the interplay of social, economic, and environmental factors. Crime data analysis and classification have become an increasingly important problem for the community. The purpose of this study is to classify and analyze crimes using data mining with ensemble methods. To this end, the study applies data mining with ensemble techniques, rather than a single classifier, to improve classification performance. The data were collected from the Kombolcha Regiopolitan City Administration Court Memria for the years 2008 E.C. to 2013 E.C. and consist of around 10,100 records with more than 15 attributes. After preprocessing, which cleaned the data and put it into a format suitable for the data mining tasks, 10,005 records and 14 attributes were used to conduct this research. A hybrid data mining methodology was adopted, and the experiments were conducted using the Waikato Environment for Knowledge Analysis (WEKA) machine learning software, version 3.9.6. The models were trained using 10-fold cross-validation and tested for reliability with an 80 percent split. Model performance was evaluated using the standard metrics of prediction accuracy, error rate, FP rate, TP rate, recall, precision, F-measure, and ROC curve, all calculated from the predictive classification table. The experimentation was divided into two categories: single classification techniques, and data mining with ensemble techniques. The first category comprises the Naïve Bayes and J48 classification models. In these experiments, J48 with the 80% split test option achieved the better accuracy, 91.85%. The second category applies ensemble techniques that combine the base classification models. 
Boosting with the J48 algorithm, using all attributes under 10-fold cross-validation, produced a better result of 93.44%. In this study, data mining with ensemble techniques was consistently more effective than the single classification techniques. The study concludes that crime type, crime occurrence day, crime occurrence season, crime occurrence time, age, sex, occupation, education level, goti, month, and marital status are the determinant factors of crime. In the future, we plan to extend this study to larger datasets and to a comparative analysis of other machine learning algorithms, including deep learning algorithms.
Keywords: Ensemble methods, Data mining, Boosting, Bagging, Crime types |
en_US |