Bagging (bootstrap aggregating) improves prediction accuracy by combining predictors built on repeated bootstrap samples into a single aggregate predictor. According to Breiman, this aggregate predictor is more accurate than a predictor built from a single learning set (123). To obtain it, replicate data sets {L(B)} are drawn at random, with replacement, from the learning set L, and a single predictor ψ(x, L(B)) is built on each replicate. The aggregate averages these single predictors (or, for class labels, takes a plurality vote). This improves accuracy most for unstable procedures such as neural nets, regression trees, and classification trees. For stable procedures such as the k-nearest neighbor method, however, bagging gives little benefit and can slightly degrade performance.
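A minimal sketch of the bootstrap-and-average step for numerical prediction. It is an illustration under stated assumptions, not Breiman's code: a hypothetical one-split regression tree (`fit_stump`) stands in for an unstable procedure such as a full regression tree, and the names `bagged`, `n_boot`, and `seed` are illustrative. Each replicate L(B) has the same size as L and is drawn from L with replacement.

```python
import random
from statistics import mean
from typing import Callable, List, Tuple

Point = Tuple[float, float]           # one case (x, y) of the learning set L
Predictor = Callable[[float], float]  # a fitted predictor x -> prediction


def fit_stump(learning_set: List[Point]) -> Predictor:
    """Hypothetical unstable base procedure: a regression tree with one split.

    Every split between distinct sorted x values is tried; the split with the
    smallest residual sum of squares is kept, and each side predicts its mean y.
    """
    pts = sorted(learning_set)
    best = None
    for i in range(1, len(pts)):
        if pts[i - 1][0] == pts[i][0]:
            continue  # no threshold can separate identical x values
        left, right = pts[:i], pts[i:]
        left_mean = mean(y for _, y in left)
        right_mean = mean(y for _, y in right)
        rss = sum((y - left_mean) ** 2 for _, y in left) + sum(
            (y - right_mean) ** 2 for _, y in right
        )
        if best is None or rss < best[0]:
            threshold = (pts[i - 1][0] + pts[i][0]) / 2
            best = (rss, threshold, left_mean, right_mean)
    if best is None:  # no usable split: predict the overall mean
        overall = mean(y for _, y in pts)
        return lambda x: overall
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def bagged(learning_set: List[Point], fit: Callable[[List[Point]], Predictor],
           n_boot: int = 50, seed: int = 0) -> Predictor:
    """Fit `fit` on n_boot bootstrap replicates of L and average their predictions."""
    rng = random.Random(seed)
    predictors = [
        fit(rng.choices(learning_set, k=len(learning_set)))  # replicate L(B), with replacement
        for _ in range(n_boot)
    ]
    return lambda x: mean(p(x) for p in predictors)


# Usage: a noisy quadratic learning set.
data_rng = random.Random(1)
L = [(i / 20, (i / 20) ** 2 + data_rng.gauss(0, 0.1)) for i in range(21)]
single = fit_stump(L)
bag = bagged(L, fit_stump)
print(single(0.5), bag(0.5))
```

For class labels, the averaging in `bagged` would be replaced by a plurality vote over the predicted classes.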
Bagging improves accuracy when used with classification trees on moderate-sized data sets such as the heart and breast cancer data. In each run, the data set is randomly divided into a test set T and a learning set L. A single classification tree is grown and pruned using L, and its misclassification rate on T is recorded. Then a bootstrap sample LB is drawn from L, a tree is grown on LB, and the original set L is used to prune it. This step is repeated fifty times to give fifty tree classifiers, and each case in T is assigned the class that receives the plurality of their votes. The misclassification rates of the single and the bagged classifiers are then averaged over repeated random splits into L and T. For larger data sets taken from the Statlog project, a comparison study that ranked many classifiers by their performance, bagging also reduced the misclassification error of classification trees substantially. Bagging can likewise improve the prediction accuracy of regression trees: a similar procedure grows regression trees on bootstrap samples, averages their predictions, and averages the resulting mean squared errors over the repetitions.
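The experimental protocol for classification trees can be sketched as follows, again as an illustration rather than Breiman's code. Assumptions: each case has a single numerical feature, a hypothetical one-split tree (`fit_stump`, with no pruning step) replaces the pruned classification tree, and `test_fraction`, `n_reps`, and the synthetic data are illustrative choices. `compare` returns the average test-set error of one tree and of a plurality vote over `n_boot` bootstrap trees.

```python
import random
from collections import Counter
from typing import Callable, List, Tuple

Case = Tuple[float, int]            # (feature value x, class label j)
Classifier = Callable[[float], int]


def majority(labels: List[int]) -> int:
    """Most frequent label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(cases: List[Case]) -> Classifier:
    """Hypothetical base classifier: a one-split tree on one feature, with no pruning."""
    pts = sorted(cases)
    n = len(pts)
    total = Counter(j for _, j in pts)
    overall = total.most_common(1)[0][0]
    best = (n - total[overall], None, overall, overall)  # (errors, threshold, left, right)
    left_counts: Counter = Counter()
    for i in range(1, n):
        left_counts[pts[i - 1][1]] += 1  # class counts of pts[:i]
        if pts[i - 1][0] == pts[i][0]:
            continue  # identical x values cannot be separated
        right_counts = total - left_counts  # class counts of pts[i:]
        left = left_counts.most_common(1)[0][0]
        right = right_counts.most_common(1)[0][0]
        errors = (i - left_counts[left]) + (n - i - right_counts[right])
        if errors < best[0]:
            best = (errors, (pts[i - 1][0] + pts[i][0]) / 2, left, right)
    _, threshold, left, right = best
    if threshold is None:  # no split beats predicting the majority class
        return lambda x: overall
    return lambda x: left if x <= threshold else right


def plurality_vote(classifiers: List[Classifier], x: float) -> int:
    return majority([c(x) for c in classifiers])


def error_rate(classify: Classifier, test_set: List[Case]) -> float:
    return sum(classify(x) != j for x, j in test_set) / len(test_set)


def compare(data: List[Case], n_boot: int = 50, n_reps: int = 20,
            test_fraction: float = 0.1, seed: int = 0) -> Tuple[float, float]:
    """Average test-set error of one tree vs. a plurality vote of n_boot bootstrap trees."""
    rng = random.Random(seed)
    e_single, e_bagged = [], []
    for _ in range(n_reps):
        shuffled = data[:]
        rng.shuffle(shuffled)  # random division into test set T and learning set L
        n_test = max(1, int(len(shuffled) * test_fraction))
        T, L = shuffled[:n_test], shuffled[n_test:]
        e_single.append(error_rate(fit_stump(L), T))
        trees = [fit_stump(rng.choices(L, k=len(L))) for _ in range(n_boot)]
        e_bagged.append(error_rate(lambda x: plurality_vote(trees, x), T))
    return sum(e_single) / n_reps, sum(e_bagged) / n_reps


# Usage: labels follow a noisy threshold rule on x in [0, 1).
gen = random.Random(42)
xs = [gen.random() for _ in range(200)]
data = [(x, int(x + gen.gauss(0, 0.3) > 0.5)) for x in xs]
print(compare(data))
```

The two averages play the roles of the single-tree and bagged misclassification rates; with real data and full pruned trees the numbers would of course differ.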
Bagging is effective in reducing prediction error when the single predictor ψ(x, L) is highly variable, that is, when it changes substantially from one learning set to another. For numerical prediction, the mean square error of the aggregated predictor ФA(x) is never larger than the mean square error of ψ(x, L) averaged over learning sets L, and the gap grows with the variability of ψ(x, L). Bagging therefore reduces prediction error considerably when the procedure is unstable, but gains little when it is stable. Classification provides a second test of the effectiveness of bagging. A predictor is order-correct at x if the class it predicts most often across learning sets is the class the Bayes predictor would choose. An order-correct predictor can still be far from optimal, but aggregation turns it into a nearly optimal one. A test set can also be used to judge the effectiveness of bagging; it is sampled at random from the same distribution as the original learning set L. In neural nets, such a test set is used to determine the optimal point for early stopping.
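The mean-square-error claim can be made precise with a short derivation. It is a sketch that assumes (X, Y) is a new case drawn independently of L and writes ФA(x) = E_L ψ(x, L) for the aggregated predictor; the symbols e and e_A are introduced here for illustration.

```latex
% \psi(x, L): single predictor; \Phi_A(x) = E_L \psi(x, L): aggregated predictor.
e   = E_L\, E_{X,Y}\bigl(Y - \psi(X, L)\bigr)^2, \qquad
e_A = E_{X,Y}\bigl(Y - \Phi_A(X)\bigr)^2 .

% For fixed (x, y), expand both squared errors:
E_L\bigl(y - \psi(x, L)\bigr)^2 = y^2 - 2y\,\Phi_A(x) + E_L\,\psi(x, L)^2,
\qquad
\bigl(y - \Phi_A(x)\bigr)^2 = y^2 - 2y\,\Phi_A(x) + \Phi_A(x)^2 .

% Subtracting, the y-terms cancel and a variance over learning sets remains:
E_L\bigl(y - \psi(x, L)\bigr)^2 - \bigl(y - \Phi_A(x)\bigr)^2
  = E_L\,\psi(x, L)^2 - \bigl(E_L\,\psi(x, L)\bigr)^2
  = \operatorname{Var}_L \psi(x, L) \;\ge\; 0 .

% Averaging over (X, Y):
e - e_A = E_X \operatorname{Var}_L \psi(X, L) \;\ge\; 0 .
```

The last line shows that aggregation removes exactly the average variance of ψ(x, L) over learning sets: a large gain for a highly variable predictor, almost none for a stable one. In practice the bagged predictor only approximates ФA(x), because its replicates are drawn from L rather than from the underlying distribution.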
Bagging has limitations when the underlying procedure is stable, as shown by linear regression with variable selection. The subset regression predictors are generated by forward entry of variables or by backward variable selection. Here small changes in the data can cause significant changes in the subset of variables that is selected, so subset selection is an unstable procedure. Using simulated data, Breiman compares the prediction errors of the subset predictors and their bagged versions to find the most accurate one. Bagging shows no substantial improvement when the subset predictor is already near optimal. Linear regression using all variables is a fairly stable procedure, and its stability decreases as fewer predictor variables are used. Bagging therefore helps mainly the subset predictors with few variables; for the more stable predictors with many variables, the bagged predictors can have a larger prediction error than the un-bagged ones. This is a clear limitation of bagging: for a stable procedure, it does not improve accuracy the way it does for an unstable one. As the number of variables m grows and the residual sum of squares RSS(m) decreases, stability increases to the point where the un-bagged predictor tends to be more accurate than the bagged predictor.
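The instability of subset selection can be seen directly by rerunning it on bootstrap replicates. This sketch simplifies forward entry to its first step (m = 1: pick the single variable whose least-squares fit has the smallest residual sum of squares); the simulated data, with five variables of equal small coefficients, and the helper names are hypothetical. `chosen` counts how often each variable is selected across replicates, so a spread over several variables means small perturbations of the data change the selected subset.

```python
import random
from collections import Counter
from statistics import mean
from typing import Callable, List, Tuple

Row = Tuple[List[float], float]  # (values of x_1..x_p, response y)


def simple_fit(rows: List[Row], k: int) -> Tuple[float, float, float]:
    """Least-squares fit of y on variable k alone; returns (intercept, slope, RSS)."""
    xs = [x[k] for x, _ in rows]
    ys = [y for _, y in rows]
    mx, my = mean(xs), mean(ys)
    sxx = sum((a - mx) ** 2 for a in xs)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    intercept = my - slope * mx
    rss = sum((b - intercept - slope * a) ** 2 for a, b in zip(xs, ys))
    return intercept, slope, rss


def forward_select_one(rows: List[Row]) -> Tuple[int, float, float]:
    """First forward-entry step (m = 1): the variable whose one-variable fit has the smallest RSS."""
    fits = [simple_fit(rows, k) for k in range(len(rows[0][0]))]
    k = min(range(len(fits)), key=lambda j: fits[j][2])
    return k, fits[k][0], fits[k][1]


def bagged_subset_predictor(rows: List[Row], n_boot: int = 50, seed: int = 0
                            ) -> Tuple[Callable[[List[float]], float], Counter]:
    """Rerun the selection on n_boot bootstrap replicates and average the selected fits."""
    rng = random.Random(seed)
    models = [forward_select_one(rng.choices(rows, k=len(rows))) for _ in range(n_boot)]
    chosen = Counter(k for k, _, _ in models)  # how often each variable was selected

    def predict(x: List[float]) -> float:
        return mean(a + b * x[k] for k, a, b in models)

    return predict, chosen


# Usage: five variables with equal small coefficients, so the choice is a near toss-up.
gen = random.Random(7)
rows = []
for _ in range(40):
    x = [gen.gauss(0, 1) for _ in range(5)]
    rows.append((x, sum(0.3 * v for v in x) + gen.gauss(0, 1)))
predict, chosen = bagged_subset_predictor(rows)
print("selected on L:", forward_select_one(rows)[0], "selected on replicates:", dict(chosen))
```

Extending the sketch to larger m would repeat the entry step with the already selected variables held in the model; that is left out here to keep the example short.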
Works Cited
Breiman, Leo. “Bagging Predictors.” Machine Learning 24 (1996): 123-140.