Introduction
The purpose of this research paper is to review concept drift in the context of machine learning. A concept is defined as a quantity that needs to be predicted; in many real-world settings the concept is unstable and changes over time. Common examples of such concepts are weather patterns, customer preferences, temperature and behavioral changes. Because concepts are unstable, the underlying data distribution that describes them is also subject to change. As a result, models built on old data become inconsistent with the new data, and the model has to be updated. This problem is known as concept drift, and it complicates the task of learning a model that stays valid as the concept changes (Tsymbal 1).
Machine learning under concept drift involves learning a target that shifts over time from data streams that change with time. It is also described as learning in non-stationary environments, where the approaches used to handle drift must track unstable concepts so that the model reflects the current concept. Various learning approaches have been developed and implemented to deal with concept drift, since the problem arises in many real-world domains. Early examples include the AQ algorithm and the STAGGER system (Koronacki 23).
The discussion in this research paper therefore focuses on concept drift and machine learning, examining each in turn.
Machine Learning
Machine learning is a branch of artificial intelligence that draws on cognitive science, probability theory, behavioral science and adaptive control. Its major focus is to identify and learn complex patterns in data so that intelligent, data-driven decisions can be made. Machine learning can also involve modeling human cognitive processes when performing data analysis, as well as collaborative approaches between the machine and the user (Bishop 2).
There are various types of machine learning, distinguished by the desired outcome of the algorithm. These include supervised learning, unsupervised learning, semi-supervised learning, reinforcement learning and transduction. In supervised learning, the algorithm learns a function that maps inputs to desired outputs from labeled examples. In unsupervised learning, the algorithm models a set of unlabeled inputs, for example by clustering them. In semi-supervised learning, labeled and unlabeled examples of the target concept are combined to generate an appropriate function. In reinforcement learning, the algorithm learns how to act from observations of its environment and the feedback its actions receive. In transduction, the learning algorithm tries to predict new outputs based on the training inputs and outputs as well as the testing inputs (Bishop 3).
The main theory used to explain machine learning is computational learning theory, which focuses on probabilistic performance bounds for learning algorithms because training sets are finite and the future is uncertain. This means that computational learning theory does not usually provide absolute guarantees about the performance of learning algorithms. Apart from performance bounds, computational learning theory studies the time complexity and feasibility of learning. A computation is considered feasible under computational learning theory if it can be carried out in polynomial time (Yue et al 257).
There are various types of machine learning algorithms. One of the most common is decision tree learning, in which a decision tree is used as a predictive model that maps observations about an item to conclusions about the item’s target value. Another commonly used approach is association rule learning, which discovers relationships between variables in large databases. An artificial neural network is a computational model inspired by biological neural networks; it processes information using a connectionist approach to computation (Bishop 225).
Genetic programming is another learning algorithm used in machine learning. It is a methodology based on biological evolution for finding computer programs that perform a user-defined task. Genetic programming is a specialization of genetic algorithms in which each individual in the population is a computer program. It is mostly used to optimize a population of programs according to their ability to perform the user-defined task. Bayesian networks are another commonly used technique. They are graphical models that use probabilities to represent random variables and the dependencies between them. A Bayesian network can, for example, represent the relationship between the symptoms of an illness and the illness itself: once the symptoms have been observed, the network can be used to compute the probability that various diseases are present (Bishop 21).
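The symptom-and-illness reasoning described above can be illustrated with the smallest possible Bayesian network, a single illness node pointing to a single symptom node. The sketch below is only an illustration: the function name and all probability values are hypothetical and are not taken from Bishop.

```python
def posterior_disease_given_symptom(prior, p_symptom_given_disease,
                                    p_symptom_given_healthy):
    """Apply Bayes' rule to a two-node network: Disease -> Symptom.

    prior: P(disease) before any symptom is observed.
    p_symptom_given_disease: P(symptom | disease).
    p_symptom_given_healthy: P(symptom | no disease).
    Returns P(disease | symptom observed).
    """
    # Total probability of observing the symptom at all.
    evidence = (p_symptom_given_disease * prior
                + p_symptom_given_healthy * (1.0 - prior))
    if evidence == 0.0:
        raise ValueError("the symptom has zero probability under this model")
    return p_symptom_given_disease * prior / evidence


# Hypothetical numbers: a rare illness and a fairly specific symptom.
posterior = posterior_disease_given_symptom(
    prior=0.01,
    p_symptom_given_disease=0.90,
    p_symptom_given_healthy=0.05,
)
print(f"P(disease | symptom) = {posterior:.3f}")
```

Even with a symptom that is present in 90 percent of patients, the posterior probability stays low in this example because the illness itself is rare, which is the kind of relationship a Bayesian network makes explicit.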
Machine learning has a variety of uses in the modern technological world. The most common applications include natural language processing, the detection of credit card fraud, syntactic pattern recognition and the medical diagnosis of illnesses through the analysis of symptoms. Machine learning is also used in the analysis of the financial market, in the creation of brain-machine interfaces for radiographic equipment, in the classification of DNA sequences, in software engineering and in robot locomotion. Other uses include structural health monitoring, bio-surveillance, and speech and handwriting recognition (Mitchell 2).
Concept Drifts
As described in the introduction, concept drift is the problem that arises when a change in the underlying data distribution of a concept invalidates the model used to describe it. Concept drift is also described as a phenomenon in which examples have legitimate labels at one time and illegitimate labels at another. To illustrate this, Koronacki (26) uses the example of a cloud as a target concept, where concept drift occurs as the cloud changes its position, shape and size in the sky over a period of time. In terms of Bayesian decision theory, these transformations correspond to changes in the form of the prior for the target (Koronacki 26). Concept drift is a common occurrence in the real world, especially when it comes to people’s changing preferences for products and services.
Concepts are subject to change over time, which means that they are unstable in nature. Changes in the underlying data distribution make the task of learning, and machine learning in particular, more complicated. Learning also becomes difficult when changes in the hidden context of the target concept lead to concept drift. A central difficulty in handling concept drift is distinguishing between true concept change and noise. Some machine learning algorithms overreact to noise, misinterpreting it as concept drift, while others are so robust to noise that they adjust to real changes very slowly (Perner 236).
Most of the research on concept drift has been theoretical in nature, with assumptions made about the kinds of drift that occur in order to establish performance bounds. Researchers such as Helmbold and Long (Stanley 2) have established bounds on the extent of drift that can be tolerated, assuming a drift that is permanent but slight. The extent of a drift is defined as the probability that two successive concepts disagree on a randomly drawn example. Other researchers such as Freund and Mansour, Barve, Long and Bartlett established bounds on the rate of concept drift and on the sample complexity of an algorithm that learns the structure of a repeating sequence of concept changes (Stanley 2).
Algorithms for handling concept drift fall into two categories. The first is the single learner based tracker, which selects the data that is relevant to learning the target concept and is otherwise referred to as the data combining approach. The second is the ensemble approach, which forms and restructures a set of base learners. The data combining approach is the conventional way of dealing with concept drift and uses a time window of fixed size over the data stream. The computational predictive model is constructed from the most recent data or batches that fall inside the window. The problem with this approach is that a large time window cannot adapt quickly to a concept drift, while a small time window cannot track a target concept that is stable or recurrent (Yeon et al 3).
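The trade-off between large and small windows can be seen in a minimal sketch of a fixed time window. The example assumes a deliberately trivial learner that predicts the majority label inside the window; the class name, the helper function and the synthetic label stream are all hypothetical and serve only to illustrate the idea.

```python
from collections import Counter, deque


class SlidingWindowMajority:
    """Toy single learner that only remembers the last `window_size` labels."""

    def __init__(self, window_size):
        if window_size <= 0:
            raise ValueError("window_size must be positive")
        # A deque with maxlen drops the oldest label automatically.
        self.window = deque(maxlen=window_size)

    def update(self, label):
        self.window.append(label)

    def predict(self):
        if not self.window:
            return None
        # Most frequent label among the examples still inside the window.
        return Counter(self.window).most_common(1)[0][0]


def steps_to_adapt(window_size, old_label="a", new_label="b", length=20):
    """Count how many new-concept examples are seen before the prediction flips."""
    model = SlidingWindowMajority(window_size)
    for _ in range(length):              # stable period under the old concept
        model.update(old_label)
    for steps in range(1, length + 1):   # sudden drift to the new concept
        model.update(new_label)
        if model.predict() == new_label:
            return steps
    return None


print("small window:", steps_to_adapt(4))
print("large window:", steps_to_adapt(16))
```

The small window switches to the new label after only a few examples, while the large window needs many more. The same small window would, however, be more easily misled by a few noisy labels during a stable period.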
The optimal window size in the data combining approach therefore cannot be set unless the type and degree of the concept drift are known in advance. Widmer and Kubat, in their 1996 study of concept drift, used the Window Adjustment Heuristic (WAH) to determine the size of the time window adaptively. Klinkenberg and Joachims proposed an algorithm in 2000 that tracks concept drift with a support vector machine (SVM) while the target concept changes continuously. Such methods allow the size of the time window to be determined from the data (Dries and Ruckert 235).
While the data combining approach is able to select a subset of past data that is related to the new information, it is unable to define how the remaining data relates to the new information. This method also does not retain all or parts of the previous data sets, which makes it an inefficient approach to managing concept drift, especially in machine learning. The ensemble approach, on the other hand, uses an ensemble strategy for learning in changing environments. Ensemble methods such as boosting, bagging and stacking are known to produce more stable prediction models in static environments than the single models used in the data combining approach (Yeon et al 4).
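The idea of adapting the window size can be sketched with a much simpler rule than the ones cited above. The function below is not the Window Adjustment Heuristic of Widmer and Kubat, nor the SVM-based method of Klinkenberg and Joachims; it is a hypothetical illustration in which the window shrinks when recent accuracy drops (a possible drift) and grows slowly while accuracy stays high (a stable concept). All parameter names and default values are assumptions.

```python
def adjust_window_size(current_size, recent_accuracy,
                       min_size=10, max_size=200,
                       drop_threshold=0.7, shrink_factor=0.5, grow_step=5):
    """Return a new window size based on recent predictive accuracy.

    Hypothetical rule, for illustration only:
    - accuracy below `drop_threshold` suggests drift, so old data is
      discarded by shrinking the window;
    - otherwise the concept looks stable, so the window grows a little
      to use more data.
    The result is always kept within [min_size, max_size].
    """
    if recent_accuracy < drop_threshold:
        new_size = int(current_size * shrink_factor)
    else:
        new_size = current_size + grow_step
    return max(min_size, min(max_size, new_size))


size = 100
for accuracy in [0.92, 0.90, 0.55, 0.60, 0.88]:
    size = adjust_window_size(size, accuracy)
    print(f"accuracy={accuracy:.2f} -> window size {size}")
```

A rule of this kind removes the need to fix the window size in advance, at the cost of introducing thresholds that themselves have to be chosen.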
Ensemble approaches maintain a set of concept descriptions whose predictions are combined through weighted voting, or from which the most relevant description of the new data is selected. One such method is STAGGER, which maintains a set of concept descriptions and selects the best of them according to their relevance to the new data. Another is conceptual clustering, in which stable hidden contexts are identified by clustering instances that are assumed to share the same hidden context. Ensemble techniques have been more effective than data combining approaches in dealing with concept drift, and they are more suitable for data streams and batches because, unlike the data combining methods, they do not need to retain previous data sets (Tsymbal 3).
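Weighted voting itself is straightforward to sketch. The example below is not STAGGER or conceptual clustering; it is a minimal illustration, loosely in the spirit of weighted majority voting, in which each ensemble member has a weight and members that misclassify the newest example are penalized. The function names, the labels and the penalty value are hypothetical.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Return the label with the largest total weight behind it."""
    if not predictions or len(predictions) != len(weights):
        raise ValueError("need one weight per prediction and at least one member")
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    return max(totals, key=totals.get)


def update_weights(weights, predictions, true_label, penalty=0.5):
    """Multiply the weight of every member that was wrong by `penalty`."""
    return [w * penalty if p != true_label else w
            for w, p in zip(weights, predictions)]


# Three hypothetical ensemble members vote on one message.
predictions = ["spam", "ham", "ham"]
weights = [0.9, 0.3, 0.3]
print(weighted_vote(predictions, weights))

# The true label turns out to be "ham", so the first member loses weight.
weights = update_weights(weights, predictions, true_label="ham")
print(weights)
```

Because the weights follow each member's recent performance, members that describe an outdated concept gradually lose influence without any past data having to be stored.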
Types of Concept Drift
The two most common types of concept drift in the real world are sudden and gradual drift. A sudden, or instantaneous, concept drift occurs, for example, when an individual graduates from an institution of higher learning and finds himself or herself in a different environment that is full of monetary concerns and problems. Another example is a change in consumer preferences, when consumers demand products or services that meet their constantly changing needs. A gradual concept drift occurs when something changes slowly over a period of time, such as the wear of car tires or factory equipment, which causes a gradual change in the outputs produced. Both sudden and gradual concept drifts are referred to as real concept drifts (Tsymbal 2).
Another type is the virtual concept drift, which is defined as the need to change the current model because of a change in the data distribution. Hidden changes in context may cause a change in the target concept, and they may also cause a change in the underlying data distribution. Even if the target concept remains the same, the underlying data distribution might change, which can create a need to revise the current model used to explain the concept. This is a virtual concept drift: the model has to change although the concept itself has not (Tsymbal 2).
Virtual concept drift, which is also known as sampling shift, is typical of tasks such as spam categorization, whereas real concept drift involves a change in the target concept itself. In spam categorization, the notion of an unwanted message may remain the same over a long period of time while the data distribution of the messages changes. Handling virtual concept drift ensures that such shifts are properly represented in the current model of the underlying data distribution (Tsymbal 2).
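The difference between sudden and gradual drift can be made concrete with a small sketch. It assumes a simple way of modeling drift that is not taken from the cited sources: each example is drawn from either the old or the new concept, and the probability of drawing from the new concept either jumps from 0 to 1 (sudden) or rises linearly over a transition period (gradual). The function and parameter names are hypothetical.

```python
def new_concept_probability(t, drift_start, drift_width):
    """Probability that the example at time step t comes from the new concept.

    drift_width == 0 models a sudden drift: the probability jumps to 1.0
    at `drift_start`. A positive drift_width models a gradual drift: the
    probability rises linearly from 0.0 to 1.0 over `drift_width` steps.
    """
    if t < drift_start:
        return 0.0
    if drift_width <= 0:
        return 1.0
    return min(1.0, (t - drift_start) / drift_width)


for t in [90, 100, 110, 125, 150, 200]:
    sudden = new_concept_probability(t, drift_start=100, drift_width=0)
    gradual = new_concept_probability(t, drift_start=100, drift_width=50)
    print(f"t={t:3d}  sudden={sudden:.1f}  gradual={gradual:.1f}")
```

A synthetic stream generated this way is a common kind of test bed: a learner that copes well with the sudden case may still react too slowly, or too nervously, in the gradual one.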
Detecting Changes in Concepts
To deal effectively with the problem of concept drift, the changes that take place in concepts have to be detected reliably. A common setting in which concept changes are detected is information filtering, where items in a data stream are classified according to whether they are relevant or irrelevant to the target concept. The main purpose of information filtering is to reduce the user’s information load by presenting only the information that might be of interest to them. Information filters are supposed to remove irrelevant information from the data stream so that only the relevant information is presented to the user. Because concepts are unstable and constantly changing, information filters used in unstable environments have to monitor classification accuracy to ensure that concept changes are noticed (Lanquillon and Renz 538).
Information filtering is an important setting for detecting changes in a data stream because it can be framed as a classification problem that is solved with supervised machine learning techniques. These techniques learn from a given set of examples, and the learned model is then used to determine the category of new items in the data stream. Supervised learning rests on the important assumption that the underlying data distribution does not change, so that the old data is similar to the new data. In a data stream, however, the hidden context changes as time passes and as new data on the concept emerges. The changes therefore have to be detected so that the classifier can be adapted to suit the new data (Lanquillon and Renz 538).
Another method for detecting changes in a concept’s data stream is the Shewhart control chart, which tests whether a single observation of a monitored value deviates significantly from what is expected. This approach assumes that the data stream has been divided into batches that are presented in chronological order. A value is calculated for each batch separately, and this value is used to determine whether any change has taken place in the data stream. The Shewhart control chart thus detects changes by observing deviations in the batch values (Lanquillon and Renz 539).
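A Shewhart-style check over batches can be sketched in a few lines. The example assumes that the value monitored for each batch is the classifier's error rate and that control limits are set at three standard deviations around the mean of a reference period; both choices, together with the function names and the sample numbers, are hypothetical rather than the exact procedure of Lanquillon and Renz.

```python
from statistics import mean, stdev


def shewhart_limits(reference_values, k=3.0):
    """Compute lower and upper control limits from in-control batches."""
    if len(reference_values) < 2:
        raise ValueError("need at least two reference batches")
    center = mean(reference_values)
    spread = stdev(reference_values)  # sample standard deviation
    return center - k * spread, center + k * spread


def detect_changes(batch_values, lower, upper):
    """Return the indices of batches whose value falls outside the limits."""
    return [i for i, value in enumerate(batch_values)
            if value < lower or value > upper]


# Hypothetical error rates per batch while the concept was stable.
reference = [0.10, 0.12, 0.11, 0.09, 0.10, 0.11]
lower, upper = shewhart_limits(reference)

# Hypothetical error rates for newly arriving batches, in time order.
new_batches = [0.10, 0.12, 0.11, 0.25, 0.27]
print(detect_changes(new_batches, lower, upper))
```

Each batch is judged on its own value, so a large and abrupt change is flagged immediately, whereas a slow drift that keeps every single batch within the limits can go unnoticed.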
Conclusion
This research paper has focused on machine learning and on concept drift. Machine learning has been discussed with regard to the various types of learning, the theoretical work that exists on the subject and the applications of machine learning in various processes. Machine learning is commonly used in artificial intelligence as well as in the development of technologies that are used in the real world, such as the diagnosis of diseases. The discussion has also covered concept drift by defining the term and identifying the various methods that can be used in dealing with it.
References
Bishop, Christopher M. Pattern recognition and machine learning. New York: Springer Science, 2006. Print.
Dries, Anton and Ulrich Ruckert. Adaptive concept drift detection. n.d. Web.
Koronacki, Jacek. Advances in machine learning. Berlin, Germany: Springer Verlag, 2010. Print.
Lanquillon, Carsten and Ingrid Renz. Adaptive information filtering: detecting changes in text streams. Kansas, US: ACM Press, 2000. Print.
Mitchell, Tom. The discipline of machine learning. 2006. Web.
Perner, Petra. Machine learning and data mining in pattern recognition. Berlin, Germany: Springer Verlag, 2009. Print.
Stanley, Kenneth. Learning concept drift with a committee of decision trees. Austin, Texas: Department of Computer Sciences, 2010. Print.
Tsymbal, Alexey. The problem of concept drift: definitions and related work. 2004. Web.
Yeon, Kyupil, Moon Sup Song, Yongdai Kim, Hosik Choi and Cheolwoo Park. Model averaging via penalized regression for tracking concept drifts. 2010. Web.
Yue, Sun, Mao Guojun, Liu Xu, and Liu Chunnian. Mining concept drifts from data streams based on multi-classifiers. Advanced Information Networking and Applications, Vol. 2, pp 257-263, 2007.