Abstract:
Nature creates variables from its characteristic components, and variables share
characteristics on scales ranging from very small to relatively large. As a result, variables
range from very different to quite similar in character, which gives rise to relationships
among them. The literature suggests different measures of relationship, depending on the
nature of the variables and the type of relationship that exists. Today, because large and
highly varied data sets are produced frequently, the currently suggested variable filtering
and selection methods fall short of the need. This research aims to fill this gap by
comparing the methods suggested in the literature in order to identify better variable
selection and dimension reduction methods.
The results of a regression analysis using all factors suggested in the literature show that
none of the predictors of the development status of an enterprise is significant, and that only
10 of the 81 factors are significant predictors of the number of employers in an enterprise.
Variable selection and dimension reduction methods are therefore applied to find
the predictors of a response by removing variable redundancy and the complexity of
incorporating a large number of variables. Based on statistical power, the results from the
variable selection methods, especially the association and correlation methods, show that
CANOVA more efficiently detects non-linear or non-monotonic correlation between
continuous–continuous and continuous–categorical variable pairs. Spearman’s correlation
coefficient more efficiently detects monotonic correlation between two continuous
variables and between a continuous and a categorical variable. The Pearson correlation
coefficient more efficiently detects linear correlation between continuous variables.
MIC efficiently detects non-linear or non-monotonic relationships between continuous
variables. The chi-square test of independence efficiently detects relationships between
two continuous variables and between two categorical variables, but it does not detect
non-linear or non-monotonic relationships between a continuous and a categorical variable
well. On the other hand, the results from the lasso and stepwise methods reveal
that relationships between predictor and response that arise from interaction effects, and
that are not detected by the correlation and association methods, are detected by the stepwise
variable selection method, and that multicollinearity is detected and removed by the lasso method.
Regressing the response variable “number of employers in an enterprise” on variables
selected by the lasso and stepwise methods yields a better model fit (based
on adjusted R-squared) than regressing on variables selected by the association and
correlation methods. Similarly, regressing the response variable “development status of an enterprise”
based on variables selected by association and correlation methods does bring