Based on the approach used to overcome overfitting, regularization can be divided into two main forms. Both are used either to prevent interpolation or to change the effective capacity of a class of functions (Belkin et al., 2019). The first is L1 regularization, which reduces the weights of uninformative features to zero by subtracting a small amount from each weight at every iteration. Its main feature is therefore that uninformative weights eventually become exactly zero, yielding compact, sparse models (Oymak, 2018). This form of regularization is of interest because it helps when working with big data, effectively enforces sparsity, and admits optima in which many coefficients equal zero (Lin et al., 2018). Of no less interest is that it can underpin models with a reduced generalization error (Zhao et al., 2018).
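For concreteness, and under the additional assumption (not made explicit in the sources above) of a linear model trained with a squared-error loss, the L1-penalized objective can be sketched in LaTeX notation as

\min_{w \in \mathbb{R}^{p}} \; \frac{1}{2n} \sum_{i=1}^{n} \bigl( y_i - x_i^{\top} w \bigr)^{2} + \lambda \lVert w \rVert_{1},
\qquad \lVert w \rVert_{1} = \sum_{j=1}^{p} \lvert w_j \rvert .

Because the subgradient of the penalty contributes a term of magnitude \lambda regardless of how small a weight already is, each update subtracts a fixed small amount from every weight, which is why the weights of uninformative features are pushed all the way to zero rather than merely shrunk.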
In practice, L1 regularization is widely used for machine-generated predictions, many of which rely on finding sparse block solutions (Janati, Cuturi and Gramfort, 2019). For example, when predicting housing prices, L1 regularization will retain important factors such as the floor area, the surrounding infrastructure, and the year of construction, while excluding minor ones, such as the price of the flooring or the presence of built-in gas appliances. Similarly, when predicting the payback of a business product, the system can keep indicators such as the district’s population and the presence of competitors, while ignoring the age or gender of potential buyers. In general, the sparse solutions produced by this form of regularization can be taken as representative (Yang and Liu, 2018), and the method provides robust results when working with big data (Alizadeh et al., 2020).
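A minimal sketch of the housing-price example is given below; it relies on scikit-learn’s Lasso estimator, synthetic data, and hypothetical feature names, none of which come from the cited sources, and is meant only to illustrate the feature-selection behaviour described above.

import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 500
# Hypothetical features: the first three are informative, the last two are noise.
feature_names = ["area_m2", "infrastructure_score", "year_built",
                 "flooring_price", "built_in_gas"]
X = rng.normal(size=(n, len(feature_names)))
# Prices depend only on the first three features, plus observation noise.
y = 3.0 * X[:, 0] + 1.5 * X[:, 1] + 0.8 * X[:, 2] + rng.normal(scale=0.5, size=n)

X_scaled = StandardScaler().fit_transform(X)
lasso = Lasso(alpha=0.1).fit(X_scaled, y)
for name, coef in zip(feature_names, lasso.coef_):
    print(f"{name:>22s}: {coef:+.3f}")  # the two noise features typically land at exactly zero

With a suitably chosen penalty strength, the coefficients of the two uninformative features are driven to zero while the informative ones are retained, mirroring the selection of area, infrastructure, and year of construction described above.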
The other form is L2 regularization, whose main feature is the optimization of the average cost. It deploys the most commonly used penalty, the sum of the squares of the weights (Provost and Fawcett, 2013). It is of interest because of the uniqueness of the final solution, its low computational cost, and the reduced probability of overall error. Even in the presence of noise, the L2 estimation error can still tend to zero at a potentially optimal rate (Hu et al., 2021). The method can also be used to smooth monotonic regression on a single predictor variable, which broadens its appeal in analytical settings (Sysoev and Burdakov, 2019).
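Under the same assumed linear, squared-error setting as the earlier sketch, the L2-penalized (ridge) objective can be written as

\min_{w \in \mathbb{R}^{p}} \; \frac{1}{2n} \sum_{i=1}^{n} \bigl( y_i - x_i^{\top} w \bigr)^{2} + \frac{\lambda}{2} \lVert w \rVert_{2}^{2},
\qquad \lVert w \rVert_{2}^{2} = \sum_{j=1}^{p} w_j^{2} .

The penalty is strictly convex, which makes the overall objective strictly convex and hence gives the unique final solution mentioned above; the corresponding gradient step shrinks every weight multiplicatively (weight decay) instead of setting any weight exactly to zero.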
In practice, L2 regularization is used to assess the significance of predictors. It can also overcome the norm-related convergence problems exhibited by other regularization methods (Zhang, Lu and Shai, 2018). In the price-forecasting example, even the minor factors are retained with small weights, which narrows the gap between the model’s output and the final result. In the business-payback example, by contrast, L2 regularization can complicate the forecast, since weight decay helps less with deeper models trained on more complex datasets (Tanay and Griffin, 2018).
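A minimal sketch contrasting the two penalties on the same synthetic data is given below; it uses scikit-learn’s Ridge and Lasso estimators as illustrative stand-ins rather than the methods of the cited papers.

import numpy as np
from sklearn.linear_model import Lasso, Ridge
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
n, p = 500, 5
X = StandardScaler().fit_transform(rng.normal(size=(n, p)))
# Only the first two predictors carry signal; the remaining three are minor/noisy.
y = 2.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(scale=0.5, size=n)

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)
print("ridge:", np.round(ridge.coef_, 3))  # every predictor keeps a (possibly tiny) nonzero weight
print("lasso:", np.round(lasso.coef_, 3))  # the minor predictors are typically set exactly to zero

Ridge keeps every predictor with a small nonzero weight, whereas Lasso discards the minor ones entirely, which is the qualitative difference between the two examples above.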
Reference List
Alizadeh, M., Behboodi, A., van Baalen, M., Louizos, C., Blankevoort, T. and Welling, M. (2020) ‘Gradient L1 regularization for quantization robustness’, International Conference on Learning Representations (ICLR 2020).
Belkin, M., Hsu, D., Ma, S. and Mandal, S. (2019) ‘Reconciling modern machine-learning practice and the classical bias–variance trade-off’, Proceedings of the National Academy of Sciences, 116(32), pp. 15849-15854.
Hu, T., Wang, W., Lin, C. and Cheng, G. (2021) ‘Regularization matters: A nonparametric perspective on overparametrized neural network’, International Conference on Artificial Intelligence and Statistics, 130, pp. 829-837.
Janati, H., Cuturi, M. and Gramfort, A. (2019) ‘Wasserstein regularization for sparse multi-task regression’, The 22nd International Conference on Artificial Intelligence and Statistics, 89, pp. 1407-1416.
Lin, P., Peng, S., Zhao, J., Cui, X. and Wang, H. (2018) ‘L1-norm regularization and wavelet transform: An improved plane-wave destruction method’, Journal of Applied Geophysics, 148, pp. 16-22.
Oymak, S. (2018) ‘Learning compact neural networks with regularization’, International Conference on Machine Learning, 80, pp. 3966-3975.
Provost, F. and Fawcett, T. (2013) Data Science for Business: What you need to know about data mining and data-analytic thinking. Sebastopol, CA: O’Reilly Media, Inc.
Sysoev, O. and Burdakov, O. (2019) ‘A smoothed monotonic regression via L2 regularization’, Knowledge and Information Systems, 59(1), pp. 197-218.
Tanay, T. and Griffin, L. D. (2018) ‘A new angle on L2 regularization’, arXiv preprint. doi: 10.48550/arXiv.1806.11186
Yang, D. and Liu, Y. (2018) ‘L1/2 regularization learning for smoothing interval neural networks: Algorithms and convergence analysis’, Neurocomputing, 272, pp. 122-129.
Zhang, Y., Lu, J. and Shai, O. (2018) ‘Improve network embeddings with regularization’, Proceedings of the 27th ACM International Conference on Information and Knowledge Management, pp. 1643-1646.
Zhao, Y., Han, J., Chen, Y., Sun, H., Chen, J., Ke, A., Han, Y., Zhang, P., Zhang, Y., Zhou, J. and Wang, C. (2018) ‘Improving generalization based on L1-norm regularization for EEG-based motor imagery classification’, Frontiers in Neuroscience, 12, p. 272.