Machine learning belongs among the advanced data processing techniques. The data set plays a crucial role in machine learning, providing the material from which the model generalizes specific patterns (Deluna, 2021; McAfee and Brynjolfsson, 2012). However, it is essential to distinguish between a model that is “generalizing” and one that is simply “memorizing” (Kotsilieris, Anagnostopoulos, and Livieris, 2022, 1). Consequently, several techniques were developed to adjust the learning process, including regularization (Brand, Koch, and Xu, 2020, 1; Alonso, Belanche, and Avresky, 2011, 163). Regularization is multifaceted: it takes different forms with unique features.
The concise definition of regularization coincides with its primary purpose: simplification. Overfitting means over-optimizing the model’s fit to the provided data; in this context, regularization directs the learning process to optimize not the fit alone but a combination of fit and model simplicity (Provost and Fawcett, 2013, 136; Belkin et al., 2019, 1). The regularization techniques that are of interest to me are L2-norm regularization, dropout, and adversarial regularization.
L2-norm regularization has wide usage in machine learning and statistics. It is commonly used to regularize linear models (Nusrat and Jang, 2018, 8; Zhu et al., 2018, 6-7). In its standard form, it is equivalent to imposing a diagonal Gaussian prior with zero mean on the weights (Chen et al., 2019, 4). The technique was extended by penalizing the L2 distance from the trained model’s weights during fine-tuning (Barone et al., 2017). This technique provokes my interest because of its fine-tuning applications, such as improving machine translation (as in Google Translate). Another reason is that L2 is non-sparse, which makes it more flexible than L1. Lastly, it can be used outside machine learning, making it a valuable tool in data processing.
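To make the mechanism concrete, the following Python sketch (my own illustration using NumPy and invented toy data, not the setup of any cited study) implements closed-form ridge regression; the optional w0 argument gives the fine-tuning variant that penalizes the L2 distance from pre-trained weights instead of the distance from zero:

```python
import numpy as np

def ridge_fit(X, y, lam=1.0, w0=None):
    """Closed-form L2-regularized least squares:
        w = argmin ||X w - y||^2 + lam * ||w - w0||^2
    With w0=None this is standard ridge regression, shrinking the
    weights toward zero without forcing exact sparsity (unlike L1).
    Passing a pre-trained model's weights as w0 penalizes the L2
    distance from those weights, the fine-tuning variant described
    in the text. Setting the gradient to zero gives the normal
    equations (X^T X + lam I) w = X^T y + lam * w0."""
    n_features = X.shape[1]
    if w0 is None:
        w0 = np.zeros(n_features)
    A = X.T @ X + lam * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y + lam * w0)

# Toy usage: 100 samples, 3 features, small observation noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=100)
w_plain = ridge_fit(X, y, lam=0.0)    # ordinary least squares
w_shrunk = ridge_fit(X, y, lam=10.0)  # weights pulled toward zero
```

Larger values of lam trade a slightly worse fit to the data for smaller, more stable weights, which is exactly the simplification that regularization targets.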
In the context of neural machine translation, dropout is also worth attention. The principle of its operation is another reason for curiosity: dropout randomly drops units from the model during training in each iteration (Barone et al., 2017). In addition, I appreciate that dropout can be used while training a model without any counterpart being needed during testing. Dropout is also implemented in common computation libraries, such as the Keras framework for Python.
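The following NumPy sketch of “inverted” dropout (an illustrative implementation of the general technique, not Keras’s actual code) shows why no counterpart is needed at test time: the surviving activations are rescaled during training, so the test-time layer is simply the identity.

```python
import numpy as np

def dropout(activations, rate=0.5, training=True, rng=None):
    """Inverted dropout. During training, each unit is zeroed
    independently with probability `rate`, and the survivors are
    rescaled by 1 / (1 - rate) so the expected activation stays
    the same. At test time the layer does nothing at all."""
    if not training or rate == 0.0:
        return activations
    rng = rng if rng is not None else np.random.default_rng()
    mask = rng.random(activations.shape) >= rate  # True = unit kept
    return activations * mask / (1.0 - rate)

# Toy usage: a batch of 4 examples with 5 hidden units each.
h = np.ones((4, 5))
print(dropout(h, rate=0.5, training=True))   # ~half the units zeroed
print(dropout(h, rate=0.5, training=False))  # unchanged at test time
```

Because a different random mask is drawn in every iteration, the network cannot rely on any single unit, which is the source of the regularizing effect.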
The last regularization technique is adversarial regularization; the reason for attention here is privacy protection. Machine learning models might leak information about their training data through their predictions; adversarial regularization makes membership in the training set difficult to infer from those predictions (Nasr, Shokri, and Houmansadr, 2018, 634). Another reason to be interested is the authors’ ambition to create a truly universal technique. Lastly, I am fascinated by the technique’s versatility itself: it trains the neural network, regularizes it, and ensures privacy protection at the same time.
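As a sketch of the underlying min-max game (written in PyTorch; the layer sizes, learning rates, and the weight lam are my own illustrative assumptions, not the values from Nasr, Shokri, and Houmansadr), the classifier is trained on its task while an inference attacker, trained in alternation, tries to tell training-set members from non-members by looking at the classifier’s predictions:

```python
import torch
import torch.nn as nn

# Classifier for a hypothetical task with 20 features and 2 classes.
classifier = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
# The attacker sees the classifier's prediction vector and guesses
# whether the example was a training-set "member".
attacker = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))

opt_c = torch.optim.Adam(classifier.parameters(), lr=1e-3)
opt_a = torch.optim.Adam(attacker.parameters(), lr=1e-3)
task_loss = nn.CrossEntropyLoss()
bce = nn.BCEWithLogitsLoss()
lam = 1.0  # illustrative weight of the privacy term

def training_step(x_member, y_member, x_nonmember):
    # Step 1: train the attacker to separate member predictions
    # (label 1) from non-member predictions (label 0).
    with torch.no_grad():
        p_mem = torch.softmax(classifier(x_member), dim=1)
        p_non = torch.softmax(classifier(x_nonmember), dim=1)
    logits = attacker(torch.cat([p_mem, p_non]))
    labels = torch.cat([torch.ones(len(p_mem), 1),
                        torch.zeros(len(p_non), 1)])
    opt_a.zero_grad()
    bce(logits, labels).backward()
    opt_a.step()

    # Step 2: train the classifier on its task while pushing its
    # member predictions to look like non-member ones, so the
    # attacker's gain acts as the regularizer.
    c_logits = classifier(x_member)
    p_mem = torch.softmax(c_logits, dim=1)
    privacy_term = bce(attacker(p_mem), torch.zeros(len(p_mem), 1))
    loss = task_loss(c_logits, y_member) + lam * privacy_term
    opt_c.zero_grad()
    loss.backward()
    opt_c.step()

# Toy usage with random member and non-member batches.
training_step(torch.randn(8, 20), torch.randint(0, 2, (8,)),
              torch.randn(8, 20))
```

The same adversarial term that protects privacy also discourages the over-confident, memorized predictions that betray overfitting, which is why the technique can serve as a regularizer in its own right.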
Numerous studies showcase the multifaceted nature of regularization techniques: depending on the needs, different features are required of regularization. For statistical applications such as fine-tuning, L2-norm regularization constrains the model’s weights. When regularization is needed during training without affecting the testing process, dropout is of use. Finally, where data privacy is a substantial concern, adversarial regularization can provide the needed protection.
Reference List
Alonso, J., Belanche, L., and Avresky, D. R. (2011) ‘Predicting software anomalies using machine learning techniques.’ 2011 IEEE 10th International Symposium on Network Computing and Applications. Cambridge, MA, USA. Cambridge: IEEE, pp. 163-170. Web.
Barone, A. M. et al. (2017) ‘Regularization techniques for fine-tuning in neural machine translation.’ EMNLP 2017: Conference on Empirical Methods in Natural Language Processing. Copenhagen, Denmark. Copenhagen: Association for Computational Linguistics, pp. 1489-1494. Web.
Belkin, M. et al. (2019) ‘Reconciling modern machine-learning practice and the classical bias–variance trade-off.’ Proceedings of the National Academy of Sciences, 116(32), pp. 15849-15854. Web.
Brand, J. E., Koch, B., and Xu, J. (2020) ‘Machine learning’, in Atkinson, P. et al. (eds.) SAGE Research Methods Foundations. Web.
Chen, J. et al. (2019) ‘A comparison of linear regression, regularization, and machine learning algorithms to develop Europe-wide spatial models of fine particles and nitrogen dioxide.’ Environment International, 130. Web.
Deluna, J. (2021) ‘Supervised vs. unsupervised learning: What’s the difference?’ IBM. Web.
Kotsilieris, T., Anagnostopoulos, I., and Livieris, I. E. (2022) ‘Regularization techniques for machine learning and their applications.’ Electronics, 11(4), p. 521. Web.
McAfee, A. and Brynjolfsson, E. (2012) ‘Big data: The management revolution.’ Harvard Business Review, October. Web. (Accessed 27 May 2022).
Nasr, M., Shokri, R., and Houmansadr, A. (2018) ‘Machine learning with membership privacy using adversarial regularization.’ Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security. New York: Association for Computing Machinery, pp. 634-646. Web.
Nusrat, I. and Jang, S. B. (2018) ‘A comparison of regularization techniques in deep neural networks.’ Symmetry, 10(11), p. 648. Web.
Provost, F. and Fawcett, T. (2013) Data science for business: What you need to know about data mining and data-analytic thinking. Sebastopol: O’Reilly Media.
Zhu, D. et al. (2018) ‘A machine learning approach for air quality prediction: Model regularization and optimization.’ Big Data and Cognitive Computing, 2(1), p. 5. Web.