Data Mining Classifiers: The Advantages and Disadvantages Essay

Decision Trees: C4.5 Classifier

Advantages

Classifiers produced by C4.5 are expressed either as decision trees or as rulesets; our focus in this discussion is on decision trees and their advantages. C4.5 is based on the ID3 algorithm, whose primary aim is to find small decision trees, so the trees produced are compact and simple to understand (Witten, Frank, 2000). Another advantage of this classifier is that its source code is readily available. Thirdly, missing attribute values are accounted for: C4.5 assesses the gain using only the records in which the particular attribute is defined. A fourth advantage is that it handles both continuous and discrete attributes. For an attribute with a continuous range, candidate partitions of the training set are created and the gain is calculated on each; the partition that maximizes the gain is then picked (Quinlan, 1993). Lastly, pruning after tree creation keeps the tree simple by replacing some branches with leaf nodes.
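As a rough illustration of the threshold search just described, the Python sketch below scores candidate split points for a single continuous attribute by information gain. The data and function names are illustrative only, not part of C4.5's actual implementation.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_threshold(values, labels):
    """Try each midpoint between distinct sorted attribute values as a
    split threshold and return the one with the highest information gain."""
    pairs = sorted(zip(values, labels))
    base = entropy(labels)
    best = (None, 0.0)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold between identical values
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [l for v, l in pairs if v <= t]
        right = [l for v, l in pairs if v > t]
        gain = base - (len(left) / len(pairs)) * entropy(left) \
                    - (len(right) / len(pairs)) * entropy(right)
        if gain > best[1]:
            best = (t, gain)
    return best

# Toy data: a continuous attribute (e.g. petal length) and class labels.
print(best_threshold([1.4, 1.3, 4.7, 4.5, 5.1], ["a", "a", "b", "b", "b"]))
```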

Disadvantages

C4.5 classifiers are relatively slow in terms of processing speed; for instance, a task that takes C4.5 15 hours to complete takes C5.0 only 2.5 minutes. Secondly, C4.5 is inefficient in memory usage, meaning that some tasks will not complete on 32-bit systems (Witten, Frank, 2000). In terms of accuracy, the rulesets produced by C4.5 tend to contain many errors. C4.5 supports only a limited range of data types and lacks a facility to label data as not applicable (Quinlan, 1993). The decision trees produced by C4.5 are comparatively large, although this is addressed in the C5.0 version. Another disadvantage is that all classification errors are treated as equally serious, when some are more serious than others. The lack of a provision to weight the importance of cases is a further drawback, because not all cases are equally important. Lastly, attributes need to be winnowed before a classifier is generated, and C4.5 lacks this facility.

KStar Algorithm Classifier

Advantages

Firstly, in terms of accuracy, this algorithm is comparable to C4.5 on a large collection of UCI benchmark datasets, and it performs better and faster than C4.5 on large text-classification tasks. Thirdly, the algorithm has low time complexity, which means it is very fast; its speed is comparable to that of naive Bayes, and it can be sped up further by combining it with other scaling-up methods (Cormen et al., 1990). Another advantage is that it uses entropy as its distance measure, which provides a consistent approach to handling symbolic attributes, unknown values, and continuous-valued attributes; its results compare satisfactorily with those of many machine learning algorithms. KStar is an instance-based classifier: it classifies an instance by comparing it to pre-classified examples stored in a database. A concept-description updater decides which new instances to add to the database and which stored instances to use in classification (Gray, 1990), which reduces memory requirements and improves tolerance to noise in the data.
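The instance-based scheme the paragraph describes can be sketched in a few lines of Python. KStar proper derives its weights from an entropy-based transformation probability; the exponential distance decay below is a simplified stand-in that preserves the "every stored instance votes, closer instances count more" behaviour.

```python
import math
from collections import defaultdict

def kstar_like_classify(examples, x):
    """Score each class by summing exp(-distance) over all stored
    instances and return the highest-scoring class. A simplified
    stand-in for KStar's entropy-based transformation probability."""
    scores = defaultdict(float)
    for ex, label in examples:
        scores[label] += math.exp(-math.dist(x, ex))
    return max(scores, key=scores.get)

# Pre-classified examples stored in the instance database (toy data).
examples = [([0, 0], "neg"), ([0, 1], "neg"),
            ([5, 5], "pos"), ([6, 5], "pos"), ([5, 6], "pos")]
print(kstar_like_classify(examples, [4.5, 5.2]))  # -> "pos"
```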

Disadvantages

One major disadvantage of this algorithm is that it has to generate distance measures for all the recorded attributes (Cormen et al., 1990). Another is that it is not cost-effective: both the building and the learning processes are quite expensive. In terms of memory, although the algorithm uses an instance updater to select the instances to be used, storing the sample instances still demands considerable space (Gray, 1990). Finally, although the algorithm can be sped up by combining it with other algorithms, the combination process itself involves extra cost.

Bayesian Network Classifier

Advantages

The first advantage of this classifier is its computational efficiency: a large and complex computational problem is decomposed into smaller, simpler, self-sufficient models. Secondly, it simplifies the incorporation of domain knowledge into the model design by using the structure of the problem domain. Thirdly, the natural combination of the EM algorithm with the probabilistic representation helps address problems with missing data. The ability of this classifier to indicate all the classes a new sample may belong to, together with their probabilities, is another advantage: in case of a misclassification, it is easy to tell which other class the sample may belong to (Mitchell, 1997). Another advantage is the presence of an updater that enables the classifier to learn from new data samples. Bayesian networks are also able to capture the complexity of decision making. Lastly, the system's classification rules can be semantically determined and justified to both the novice and the expert (Jensen, Graven, 2007).
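The decomposition advantage can be made concrete with a minimal worked example. The plain-Python sketch below uses a hypothetical three-node rain/sprinkler/wet-grass network: the joint distribution is never stored, only the small conditional tables, and a posterior query is answered by enumeration. All the numbers are invented for illustration.

```python
# Hypothetical network: Rain -> Sprinkler, {Rain, Sprinkler} -> WetGrass.
p_rain = {True: 0.2, False: 0.8}
p_sprinkler = {True: {True: 0.01, False: 0.99},   # P(S | R)
               False: {True: 0.40, False: 0.60}}
p_wet = {(True, True): 0.99, (True, False): 0.80,  # P(W=1 | R, S)
         (False, True): 0.90, (False, False): 0.01}

def joint(r, s, w):
    """P(R, S, W) assembled from the three small conditional tables."""
    pw = p_wet[(r, s)]
    return p_rain[r] * p_sprinkler[r][s] * (pw if w else 1 - pw)

# Posterior P(Rain | WetGrass) by enumerating the hidden variable S.
num = sum(joint(True, s, True) for s in (True, False))
den = sum(joint(r, s, True) for r in (True, False) for s in (True, False))
print(f"P(Rain | wet grass) = {num / den:.3f}")  # ~0.354
```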

Disadvantages

The mismatch between the data likelihood and the actual label-prediction accuracy tends to make the learning method suboptimal. Secondly, Bayesian networks require an expert to supply domain information for the creation of the network (Jensen, Graven, 2007). Despite the substantial amount of research carried out, the creation of these networks is still limited to datasets consisting of only a few, highly informative variables. Another disadvantage is that their interpretability and efficiency are quite limited when rulesets are drawn from the network: compared with rules derived from decision trees, whose interpretation is simple and direct, Bayesian networks are more complex (Mitchell, 1997).

Fast Effective Rule Induction (JRip Classifier)

Advantages

The first advantage of this classifier is its use of a propositional rule learner, which keeps errors minimal, and its phase-by-phase implementation, which ensures that the overall results are near perfect. During the rule-growing phase, the condition with the highest information gain is picked, which keeps the rule set accurate (Riel, 1996). Another advantage is that the ruleset is made shorter and simpler by pruning any useless parts of a rule, which makes the classifier easy to understand and interpret. The ease of generating this classifier is a fourth advantage. The classifier is highly expressive and, even in terms of performance, can be ranked at the same level as a decision tree (Bishop, 1997). JRip can classify new instances rapidly, and lastly, it handles missing values and numeric attributes with ease.
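A minimal sketch of the growing phase is given below, assuming the standard FOIL information-gain criterion used by RIPPER-family learners; the dataset, condition names, and helper functions are all illustrative, not JRip's actual code.

```python
import math

def foil_gain(p0, n0, p1, n1):
    """FOIL information gain: how much adding a condition improves the
    positive density of the covered set (p/n = positives/negatives)."""
    if p1 == 0:
        return float("-inf")
    return p1 * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))

def grow_rule(data, conditions):
    """Greedily add the condition with the highest FOIL gain until the
    rule covers no negative examples."""
    rule, covered = [], data
    while any(label == 0 for _, label in covered):
        p0 = sum(1 for _, l in covered if l == 1)
        n0 = len(covered) - p0
        def score(cond):
            sub = [(r, l) for r, l in covered if cond[1](r)]
            p1 = sum(1 for _, l in sub if l == 1)
            return foil_gain(p0, n0, p1, len(sub) - p1)
        best = max(conditions, key=score)
        if score(best) <= 0:
            break
        rule.append(best[0])
        covered = [(r, l) for r, l in covered if best[1](r)]
    return rule

data = [({"age": 35, "income": 80}, 1), ({"age": 42, "income": 75}, 1),
        ({"age": 23, "income": 20}, 0), ({"age": 30, "income": 90}, 1),
        ({"age": 55, "income": 15}, 0)]
conds = [("age>28", lambda r: r["age"] > 28),
         ("income>50", lambda r: r["income"] > 50)]
print(grow_rule(data, conds))  # -> ['income>50']
```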

Disadvantages

The JRip algorithm requires a large investment of time to learn and to test the features that can be customized. Another disadvantage is that the induced rules used by the RIPPER algorithm sometimes have to be replaced by expert-derived rules for particular applications. In addition, the accuracy of JRip's results can vary, because the results produced differ depending on the rule-voting method used (Riel, 1996). Highly accurate results may only be achievable by running the algorithm many times. Finally, the assignment of salience, a prescribed order for firing rules, may render the expert system's inference engine powerless and may also degrade the performance of such a rule-based system (Bishop, 1997).

K-Nearest Neighbour Algorithm

Advantages

This algorithm uses local information, which can produce highly adaptive behaviour. Secondly, it is robust to noisy training data, simple to implement, and well suited to parallel implementations. Another advantage of the k-nearest neighbour algorithm is the ease and simplicity of learning: training is very fast, and the results are nearly optimal in the large-sample limit, meaning the algorithm is more effective when the set of training data is large (Witten, 2005). The ability of this algorithm to approximate complex target concepts locally, and differently for every new instance, is a further advantage. Lastly, the algorithm is intuitive and easy to understand, which makes implementation and modification easy, and it provides favourable generalisation accuracy on many domains (Duda, Hart, 2000).
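For concreteness, here is a minimal run using scikit-learn's KNeighborsClassifier, assuming scikit-learn is installed. Note that "training" amounts to storing the data, which is why the paragraph calls the training speed very high.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)  # "training" simply stores the instances
print(f"accuracy: {knn.score(X_test, y_test):.2f}")
```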

Disadvantages

One disadvantage of this algorithm is its large memory requirement: it needs to store all the training data. Secondly, during classification all training instances have to be visited, which makes the algorithm slow at that stage. A third disadvantage is that it is easily fooled by irrelevant attributes, so accuracy decreases as irrelevant attributes increase; accuracy likewise decreases with increased noise in the training set. Another shortcoming is its computational complexity (Witten, 2005), owing to the intensive computation required at recall time (Duda, Hart, 2000). As a lazy, supervised learner it defers all computation to query time, which makes it run slowly. Finally, the algorithm is highly vulnerable to the curse of dimensionality and is strongly biased by the choice of k.

Naive Bayes Classifier

Advantages

Naive Bayes classifiers rest on strongly simplified assumptions and a naive design, and hence are easy to build: the structure is always the same, which eliminates the structure-learning procedure, model building is highly scalable, and scoring can be parallelized regardless of the algorithm. Secondly, these classifiers have worked favourably in numerous real-world scenarios, outperforming many more complex classifiers on large numbers of datasets despite their simplicity (Box, Tiao, 1992). Thirdly, the classifier requires only a small amount of training data to estimate the parameters needed for classification, namely the per-class means and variances of the variables. Fourth, only the class-conditional variances need be resolved, not the covariance structure of the entire dataset (Witten, Frank, 2003). The algorithm solves both binary and multiclass classification problems; it relies on basic probability rules, which keeps its operation simple, and being probabilistic, it presents results in a form convenient for incident-management policy. Lastly, it allows a broad set of model parameters to be used.
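A short sketch with scikit-learn's GaussianNB makes the parameter claim concrete: the only quantities estimated from the training data are the per-class feature means and variances, exposed as theta_ and var_ (older scikit-learn releases call the latter sigma_). This assumes scikit-learn is installed.

```python
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
nb = GaussianNB().fit(X, y)

print(nb.theta_)                # per-class feature means
print(nb.var_)                  # per-class feature variances
print(nb.predict_proba(X[:1]))  # class probabilities for one sample
```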

Disadvantages

The assumption that every variable is independent of the others is sometimes a problem. The class probability estimates can be poorly calibrated, so decision thresholds must be tuned empirically rather than set analytically. The classifier has been outperformed by newer approaches such as boosted trees, and it lacks the ability to solve more complex classification problems. Naive Bayes is used in Bayesian spam filtering and is vulnerable to Bayesian poisoning (Box, Tiao, 1992); the spam filter can also be defeated by replacing text with pictures. Another disadvantage is that the effectiveness of the classification depends on the quality of the parameter estimates. When a Bayesian filter is used, a user-specific database containing the word probabilities used to detect spam must be consulted for every message (Witten, Frank, 2003). Lastly, initialising a naive-Bayes-based filter is somewhat time-consuming.
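The per-message database lookup can be sketched as follows. The word-probability table and priors below are invented for illustration, and a real filter would smooth unseen words rather than skip them.

```python
import math

# Hypothetical per-user table: word -> (P(word | spam), P(word | ham)).
word_probs = {"offer": (0.30, 0.02), "meeting": (0.01, 0.20),
              "free":  (0.25, 0.03), "report":  (0.02, 0.15)}
p_spam, p_ham = 0.4, 0.6  # invented class priors

def is_spam(message):
    """Naive Bayes decision: compare log-posteriors of spam vs. ham,
    looking each word up in the per-user probability table."""
    log_spam, log_ham = math.log(p_spam), math.log(p_ham)
    for word in message.lower().split():
        if word in word_probs:  # unknown words skipped in this sketch
            ps, ph = word_probs[word]
            log_spam += math.log(ps)
            log_ham += math.log(ph)
    return log_spam > log_ham

print(is_spam("Free offer inside"))          # True  (flagged as spam)
print(is_spam("Meeting report attached"))    # False
```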

Works Cited

Bishop, Christopher. Pattern Recognition and Machine Learning. New York, NY: Springer, 1997.

Box, G., and Tiao, G. Bayesian Inference in Statistical Analysis. New York, NY: John Wiley & Sons, 1992.

Duda, Richard; Hart, Peter; and Stork, David. Pattern Classification. 2nd ed. New York, NY: Wiley, 2000.

Frank, Eibe; Holmes, Geoffrey; and Witten, Ian. "Naive Bayes for Regression." Machine Learning, 2003.

Jensen, Finn. An Introduction to Bayesian Networks. New York, NY: Springer-Verlag, 1996.

Jensen, Finn, and Graven, Thomas. Bayesian Networks and Decision Graphs. New York, NY: Springer, 2007.

Gray, Robert. Entropy and Information Theory. New York, NY: Springer-Verlag, 1990.

Cormen, Thomas; Leiserson, Charles; and Rivest, Ronald. Introduction to Algorithms. Cambridge, MA: MIT Press, 1990.

Mitchell, Tom. Machine Learning. New York, NY: McGraw-Hill, 1997.

Quinlan, J. R. C4.5: Programs for Machine Learning. San Mateo, CA: Morgan Kaufmann, 1993.

Riel, Arthur. Object-Oriented Design Heuristics. Addison-Wesley, 1996.

Witten, Ian. Data Mining: Practical Machine Learning Tools and Techniques. San Francisco, CA: Morgan Kaufmann, 2005.

Witten, Ian, and Frank, Eibe. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. San Francisco, CA: Morgan Kaufmann, 2000.
