Abstract:
Today, the Internet plays a vital role in everyday life, to the point that it is difficult to manage without it. It is full of semi-structured and unstructured information, which influences both web users and providers directly or indirectly. As an e-business tool, a web service can reach people across the world anywhere and at any time. Studying web users' behavior is therefore fundamental to improving web-based services.
This study followed a hybrid data mining (DM) process model. The researchers collected three months of web log data from the UOG proxy server, covering the period from March to June, using a random sampling technique. Glogg and DataPreparator were used for data preparation, and DataPreparator and WEKA were used for statistical analysis and association rule discovery, respectively. Association rules were discovered with the Apriori and FP-tree algorithms.
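To illustrate the kind of association rule discovery named above, the following is a minimal Python sketch of the Apriori idea: count frequent itemsets level by level, then derive rules that meet a confidence threshold. It is not the WEKA implementation used in the study. The sessions, the site category labels, the function names, and the threshold values are all hypothetical and chosen only for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset (frozenset): support} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item in the itemset.
        return sum(itemset <= t for t in transactions) / n

    # Level 1 candidates: every single item seen in the data.
    level = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while level:
        # Keep only the candidates that meet the support threshold.
        kept = {}
        for candidate in level:
            s = support(candidate)
            if s >= min_support:
                kept[candidate] = s
        frequent.update(kept)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        joined = {a | b for a in kept for b in kept if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        level = {
            c for c in joined
            if all(frozenset(sub) in kept for sub in combinations(c, k - 1))
        }
    return frequent


def association_rules(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence) tuples."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(itemset, r):
                lhs = frozenset(lhs)
                # Subsets of a frequent itemset are frequent, so lhs is present.
                confidence = sup / frequent[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, sup, confidence))
    return rules


# Hypothetical sessions: each set holds the site categories one user visited.
sessions = [
    {"education", "email"},
    {"education", "email", "social_media"},
    {"social_media", "entertainment"},
    {"education", "social_media", "entertainment"},
    {"education", "email"},
]

frequent = apriori(sessions, min_support=0.4)        # assumed threshold
rules = association_rules(frequent, min_confidence=0.7)  # assumed threshold

for lhs, rhs, sup, conf in rules:
    print(sorted(lhs), "->", sorted(rhs), f"support={sup:.2f}", f"confidence={conf:.2f}")
```

An FP-tree based approach would yield the same frequent itemsets for the same thresholds, but it avoids repeated candidate generation by compressing the transactions into a prefix tree first.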
The statistical results show that UOG staff most often access educational sites, followed by social media, entertainment, and email sites, whereas UOG students most often access social media, followed by educational, entertainment, and email sites. In terms of web traffic, traffic is high in some staff VLANs, especially the school and college VLANs, which consist mostly of academic staff, relative to the management VLANs. Similarly, traffic is high in some student VLANs, especially those serving the students' classroom computer laboratories. The association rule experiments (with both algorithms) show that, in the staff dataset, the most frequent co-occurrences are education with email sites, entertainment with social media sites, and education with social media plus entertainment sites. In the student dataset, the most frequent co-occurrences are entertainment with social media sites, education with social media sites, and entertainment with social media plus education sites. In addition, the Apriori results show that college and school VLAN users mostly browse educational sites, whereas management VLAN users focus on social media and entertainment.
The main challenges of this study were handling the large volume of data during preprocessing and, because the existing VLAN structure of the network is complex, identifying which users submitted which requests and characterizing their behavior accordingly.