Hybrid Correlation Coefficient of Spearman with MM-Estimator


Creative Commons License

Abu Bakar S. H., Lola M. S., Kamil A. A., Zainuddin N. H., Abdullah M. T.

MATHEMATICS AND STATISTICS (ALHAMBRA), vol.11, no.4, pp.693-702, 2023 (Scopus)

Abstract

Spearman's rho, a nonparametric correlation coefficient, is widely used to measure the strength and direction of association between two variables. However, the Spearman correlation coefficient is sensitive to outliers, which can distort the estimate and lead to inaccurate conclusions. A robust approach is therefore used to construct a model that is highly resistant to data contamination. The robustness of an estimator is measured by its breakdown point, which is the smallest fraction of outliers in a sample that can cause the estimator to break down entirely. The aim of this study is two-fold. First, a robust Spearman correlation coefficient model based on the MM-estimator, called the MM-Spearman correlation coefficient, is proposed. Second, the performance of the proposed model is tested using Monte Carlo simulation and contaminated air pollution data from Kuala Terengganu, Terengganu, Malaysia. The data were contaminated with 10% to 50% outliers. The properties of the MM-Spearman correlation coefficient were evaluated using statistical measures: standard error, mean squared error, root mean squared error, and bias. The MM-Spearman correlation coefficient model outperformed the classical model, producing significantly smaller standard error, mean squared error, and root mean squared error values. The robustness of the model was evaluated using the breakdown point. The hybrid MM-Spearman correlation coefficient model demonstrated high robustness and efficiently handled data contamination of up to 50%. However, the study has a limitation in that the model can only overcome data contamination up to a maximum of 50%.
Despite this limitation, the proposed model provides accurate and efficient results, enabling management authorities to make sound decisions without being affected by contaminated data. The MM-Spearman correlation coefficient model provides a valuable tool for researchers and decision-makers, allowing them to analyze data with a high degree of accuracy and robustness, even in the presence of outliers.
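
The evaluation design described in the abstract (Monte Carlo simulation, contaminated samples, and bias, standard error, MSE and RMSE as measures) can be illustrated with a minimal sketch. It covers only the classical Spearman coefficient, because the abstract does not specify how the MM-Spearman coefficient is constructed. The sample size, number of replications, underlying correlation, and the way outliers are injected are all assumptions chosen for illustration, not the authors' settings.

```python
import math
import random
import statistics


def ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Classical Spearman rho: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def simulate(contamination, n=100, reps=500, rho=0.8, seed=0):
    """Monte Carlo summary of classical Spearman rho under contamination.

    Assumed setup (not from the paper): clean pairs are bivariate normal
    with correlation `rho`; a fraction `contamination` of the pairs is
    replaced by outliers centred at (10, -10).
    """
    rng = random.Random(seed)
    # Population Spearman rho of a bivariate normal with correlation rho.
    target = 6 / math.pi * math.asin(rho / 2)
    estimates = []
    for _ in range(reps):
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [rho * a + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1) for a in x]
        for i in range(int(contamination * n)):
            x[i] = rng.gauss(10, 1)
            y[i] = rng.gauss(-10, 1)
        estimates.append(spearman(x, y))
    bias = statistics.fmean(estimates) - target
    se = statistics.stdev(estimates)
    mse = statistics.fmean([(e - target) ** 2 for e in estimates])
    return {"bias": bias, "se": se, "mse": mse, "rmse": math.sqrt(mse)}


if __name__ == "__main__":
    # Contamination levels mirror the 10% to 50% range named in the abstract.
    for level in (0.0, 0.1, 0.2, 0.3, 0.4, 0.5):
        print(level, simulate(level))
```

Under these assumptions, the bias and RMSE of the classical coefficient grow as the contamination level rises, which is the behaviour a robust variant is meant to counter. A robust estimator could be evaluated with the same harness by passing it in place of `spearman`.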