The bias-variance trade-off results from decomposing the prediction error into several components. The variance is the spread of the algorithm’s responses across different training samples, and the bias is the expected difference between the true answer and the algorithm’s prediction (Papp, 2019). The decomposition also includes the noise inherent in the data itself, which depends on the data rather than on the algorithm model used (Belkin et al., 2019). The variance characterizes how much the fitted algorithm varies due to the randomness of the training sample, including the stochastic nature of the learning procedure (Mohammadpour et al., 2018). The bias characterizes the ability of the model family to adjust to the target dependence (Blair et al., 2020). Because the algorithms’ answers are random variables, they vary from sample to sample and may also be systematically biased relative to the correct answer.
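The decomposition above can be illustrated by simulation: fit the same simple model to many independent training samples and measure the squared bias and the variance of its predictions at a fixed point. A minimal sketch, assuming an invented target function (a sine), noise level, and sample size chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical true target and noisy data generator.
def f(x):
    return np.sin(x)

def sample_training_set(n=30):
    x = rng.uniform(0, 3, n)
    y = f(x) + rng.normal(0, 0.3, n)   # irreducible noise in the data
    return x, y

# Fit a degree-1 polynomial to many independent training samples and
# collect the predictions at one fixed test point x0.
x0 = 1.5
preds = []
for _ in range(500):
    x, y = sample_training_set()
    coefs = np.polyfit(x, y, deg=1)
    preds.append(np.polyval(coefs, x0))
preds = np.array(preds)

bias_sq = (preds.mean() - f(x0)) ** 2   # squared bias of the model family
variance = preds.var()                   # spread due to the random sample
```

Here a straight line cannot represent the sine, so the squared bias dominates, while the variance stays small because the fit changes only slightly from one training sample to the next.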
These components influence the behavior of prediction models in machine learning in three related ways: overfitting, underfitting and model complexity. Overfitting is a phenomenon in which the error on the test sample is noticeably greater than the error on the training sample (Provost and Fawcett, 2013). It is one of the central problems of machine learning (McAfee and Brynjolfsson, 2012). If there were no such effect, the error on the test sample would roughly coincide with the error on the training sample (Voncken et al., 2021), and all of machine learning would reduce to minimizing the error on the training sample – the so-called empirical risk.
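Overfitting can be reproduced with a model that has enough freedom to chase the noise in a small training sample. A minimal sketch, assuming a hypothetical sine target, noise level, and a deliberately oversized polynomial degree:

```python
import numpy as np

rng = np.random.default_rng(1)

def f(x):
    return np.sin(x)

# One small training sample and an independent test sample.
x_train = rng.uniform(0, 3, 15)
y_train = f(x_train) + rng.normal(0, 0.3, 15)
x_test = rng.uniform(0, 3, 200)
y_test = f(x_test) + rng.normal(0, 0.3, 200)

# A degree-12 polynomial on 15 points has enough freedom to fit the noise.
coefs = np.polyfit(x_train, y_train, deg=12)
train_err = np.mean((np.polyval(coefs, x_train) - y_train) ** 2)
test_err = np.mean((np.polyval(coefs, x_test) - y_test) ** 2)
# Overfitting: test_err is noticeably larger than train_err.
```

The training error is near zero because the polynomial almost interpolates the points, while the error on fresh data is much larger – exactly the gap that defines overfitting.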
The second way – underfitting – is a phenomenon in which the error on the training sample itself is large: the model cannot even fit the data it was trained on (Rueda et al., 2020). Underfitting is often observed when algorithms are tuned by iterative methods and training is stopped too early (Blanc and Setzer, 2020). For example, a neural network trained by the backpropagation method underfits when too few iterations are performed, that is, it has not had time to learn (Rocks and Mehta, 2022). More generally, underfitting is a situation in which no function in the chosen parametric family of functions describes the data well.
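The early-stopping form of underfitting can be sketched with plain gradient descent: even when the model family contains the true dependence, too few iterations leave a large training error. A minimal illustration with invented data (a linear target with slope 3 and small noise) and a hand-written gradient step:

```python
import numpy as np

rng = np.random.default_rng(2)

# Linear data; even the correct model underfits if training stops too early.
x = rng.uniform(-1, 1, 100)
y = 3.0 * x + rng.normal(0, 0.1, 100)

def train(n_iters, lr=0.1):
    w = 0.0                                    # start far from the true slope
    for _ in range(n_iters):
        grad = np.mean(2 * (w * x - y) * x)    # d/dw of the mean squared error
        w -= lr * grad
    return np.mean((w * x - y) ** 2)           # training error after n_iters

err_early = train(3)     # too few iterations: large training error
err_late = train(500)    # enough iterations: error drops to the noise level
```

With only three steps the slope is still far from 3 and the training error stays high; with enough steps the error falls to roughly the noise variance, showing that the problem was early stopping, not the model family.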
The most common reason for underfitting is that the complexity of the data-generating process exceeds the complexity of the model the researcher has chosen. The third way is increasing the complexity of the algorithm model, which admits many formalizations (Deluna, 2021). Complexity evaluates how diverse the family of algorithms in the model is in terms of their functional properties, for example, the ability to fit arbitrary samples (Gao, 2021). Increasing complexity, that is, using a more expressive model, mitigates underfitting but in turn can cause overfitting.
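The transition from underfitting to overfitting as complexity grows can be traced by sweeping one complexity parameter and recording both errors. A minimal sketch using polynomial degree as the (hypothetical) complexity measure, with the same invented sine target and noise level as above:

```python
import numpy as np

rng = np.random.default_rng(3)

def f(x):
    return np.sin(x)

x_train = rng.uniform(0, 3, 25)
y_train = f(x_train) + rng.normal(0, 0.3, 25)
x_test = rng.uniform(0, 3, 300)
y_test = f(x_test) + rng.normal(0, 0.3, 300)

# Sweep model complexity (polynomial degree) from 0 to 12.
train_errs, test_errs = [], []
for deg in range(13):
    coefs = np.polyfit(x_train, y_train, deg)
    train_errs.append(np.mean((np.polyval(coefs, x_train) - y_train) ** 2))
    test_errs.append(np.mean((np.polyval(coefs, x_test) - y_test) ** 2))
# Training error falls monotonically with complexity; test error first
# falls (less underfitting), then rises again (overfitting).
```

Low degrees underfit (both errors are high), intermediate degrees fit the sine well, and high degrees drive the training error toward zero while the test error climbs back up – the U-shaped curve behind the trade-off.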
The bias-variance trade-off was considered in a banking organization for its prediction models. For example, it served as a basis for predicting the solvency of customers applying for loans within the framework of questionnaire scoring.
Reference List
Belkin, M. et al. (2019) ‘Reconciling modern learning practices and the bias-variance trade-off’, Proceedings of the National Academy of Sciences of the United States of America, 116(32), pp. 15849–15854.
Blair, G., Coppock, A. and Moor, M. (2020) ‘When to worry about sensitivity bias: A social reference theory and evidence from 30 years of list experiments’, American Political Science Review, 114(4), pp. 1297–1315.
Blanc, S. M. and Setzer, T. (2020) ‘Bias–variance trade-off and shrinkage of weights in forecast combination’, Management Science, 66(12), pp. 29–34.
Deluna, J. (2021) Supervised vs. unsupervised learning: What’s the difference? Web.
Gao, J. (2021) ‘Bias-variance decomposition of absolute errors for diagnosing regression models of continuous data’, Patterns, 2(8), pp. 116–136.
McAfee, A. and Brynjolfsson, E. (2012) Big data: The management revolution. Brighton, MA: Harvard Business Review.
Mohammadpour, R. A., Golalizadeh, M. and Moharrami, L. (2018) ‘A bias-variance trade-off in the prediction error estimation behavior in bootstrap methods for microarray leukemia classification’, Journal of Biostatistics and Epidemiology, 4(3), pp. 64–70.
Papp, G. (2019) ‘Bias-variance trade-off in portfolio optimization under expected shortfall with regularization’, Journal of Statistical Mechanics: Theory and Experiment, 106(13), pp. 1–14.
Provost, F. and Fawcett, T. (2013) Data science for business: What you need to know about data mining and data-analytic thinking. Sebastopol, CA: O’Reilly Media.
Rocks, J. W. and Mehta, P. (2022) ‘Memorizing without overfitting: Bias, variance, and interpolation in overparameterized models’, Physical Review Research, 4(7), pp. 1–19.
Rueda, V. J., Ramírez, N. C. and Montes, E. M. (2020) ‘Data-driven Bayesian network learning: Towards a bi-objective approach to address the bias-variance decomposition’, Research in Computing Science, 149(3), pp. 9–17.
Voncken, L., Albers, C. J. and Timmerman, M. E. (2021) ‘Bias-variance trade-off in continuous test norming’, Assessment, 28(8), pp. 1932–1948.