## Introduction

Machine learning algorithms are valuable for predicting real-valued attributes. Some of them also handle missing values and symbolic attributes well. One such algorithm is K*, an instance-based learner that uses entropy in its distance measure and compares favorably with other machine learning algorithms. Researchers around the world have worked on classifying objects for many years. The task is demanding because data can be noisy and can contain irrelevant attributes, both of which make learning difficult. Several approaches have been tried, including decision trees, rules and case-based classifiers. Real-valued features have posed a particular challenge for instance-based algorithms, mainly because these algorithms lacked a sound theoretical background. K* addresses this with an entropy-based distance measure, whose performance has been examined on a variety of problems. Data mining methods are either supervised or unsupervised. Supervised methods work with a specified target variable, while unsupervised methods have none and instead look for structure and patterns among the variables. The main unsupervised methods include clustering and association rules. Most data mining methods, however, are supervised; these include decision trees, K-nearest neighbor and neural networks. This paper explores two of these algorithms, K* and K-nearest neighbor (Cleary 2-14).

## K*

K* is an instance-based learner that classifies instances using an entropy-based distance measure; its performance has been examined on a variety of problems. Instance-based learners classify an instance by comparing it to a database of pre-classified examples. The underlying assumption is that similar instances have similar classifications, although this leaves open the question of how to define “similar instance” and “similar classification”. Instance-based learners include K*, K-nearest neighbor and IBL, among others. The entropic distance measure computes the distance between instances using information theory. The intuition is that the distance between two instances is the complexity of transforming one into the other. The calculation has two steps. First, a finite set of transformations that map one instance onto another is defined. A finite sequence of such transformations leading from one instance to another is known as a program, and programs are made prefix-free by appending a termination symbol to each string. Second, the length of the shortest string connecting two instances is taken as the distance between them. Because this distance depends on a single transformation sequence, it is very sensitive to small changes and does not solve the smoothness problem (Cleary 2-14).

K* tries to reduce this sensitivity, and the resulting lack of smoothness, by summing over all the transformations that exist between two instances. It is not immediately clear what should be summed, however, so K* associates a probability with each sequence. If the length of a program in bits is c, its probability is 2^{-c}. For a distance based on Kolmogorov complexity, the sum of these probabilities over all programs satisfies the Kraft inequality. The sum can be interpreted as the probability of generating a program through a random selection of transformations, or equivalently as the probability of arriving at the second instance by a random walk from the first. Taking the logarithm of this probability converts it into units of complexity. This approach of summing over all paths has been found to give a more robust and realistic measure of the relatedness of DNA sequences (Cleary 2-14).
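The contrast between the shortest-program distance and the summed-probability distance can be sketched in a few lines. This is only an illustration: the list of program lengths is hypothetical, and the function names are not taken from the cited paper.

```python
import math

def kolmogorov_distance(program_lengths_bits):
    """Distance based on the single shortest program (in bits)."""
    return min(program_lengths_bits)

def kstar_distance(program_lengths_bits):
    """Distance based on all programs: -log2 of the summed probabilities 2^-c."""
    total_probability = sum(2.0 ** -c for c in program_lengths_bits)
    return -math.log2(total_probability)

# Hypothetical lengths (in bits) of the programs that transform instance a into b.
paths = [3, 4, 4, 5]

# The summed probability stays at or below 1, as the Kraft inequality requires.
print("total probability:", sum(2.0 ** -c for c in paths))
print("shortest-program distance:", kolmogorov_distance(paths))
print("summed-probability distance:", kstar_distance(paths))
```

Because every program contributes to the sum, the second distance is never larger than the first and does not depend on one sequence alone.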

## K* Algorithm

To use the K* distance measure, one needs a way of selecting the parameters x_{0} and s, and a way of using the resulting distances to make predictions. The parameter x_{0} applies to real attributes and s to symbolic attributes. The distance measure changes as these parameters change, with some interesting consequences. As s tends to 1, instances whose symbol differs from the current one have a very low transformation probability, while instances with the same symbol have a high transformation probability. In this regime the distance function shows nearest neighbor behavior. As s tends to 0, the transformation probability instead reflects the probability distribution of the symbols. Varying s between these two extremes changes the behavior smoothly (Cleary 2-14).
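The effect of s can be illustrated with a small sketch. It assumes the transformation probability for a symbolic attribute has the form (1 - s) * p_j when the symbols differ and s + (1 - s) * p_i when they match, where p_j is the overall probability of symbol j. That form, the function name and the symbol frequencies below are assumptions made for illustration.

```python
def symbolic_transform_prob(i, j, s, symbol_probs):
    """Assumed P*(j | i): stay on the same symbol with extra weight s."""
    p_j = symbol_probs[j]
    if i == j:
        return s + (1.0 - s) * p_j
    return (1.0 - s) * p_j

# Hypothetical symbol frequencies in the training set.
symbol_probs = {"red": 0.5, "green": 0.3, "blue": 0.2}

for s in (0.0, 0.5, 0.99):
    row = {j: round(symbolic_transform_prob("red", j, s, symbol_probs), 3)
           for j in symbol_probs}
    print(f"s = {s}: {row}")
```

With s near 1 almost all probability stays on the matching symbol (nearest neighbor behavior), and with s = 0 the row equals the symbol distribution itself.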

The distance measure for real-valued attributes behaves similarly. When x_{0} is small, the transformation probability drops off very quickly with increasing distance, so the measure functions like a nearest neighbor measure. When x_{0} is very large, virtually all instances have the same transformation probability and are weighted equally. In both the symbolic and the real case, the number of instances that effectively contribute varies between an extreme of 1, where the distribution is nearest neighbor, and an extreme of N, where all instances are equally weighted. The effective number of instances for any function P* can be calculated as follows (Cleary 2-14).

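$$
n_{0} \le \frac{\left(\sum_{b} P^{*}(b \mid a)\right)^{2}}{\sum_{b} P^{*}(b \mid a)^{2}} \le N
$$

Where:

- N = total number of training instances
- n_{0} = number of training instances at the smallest distance from the instance a
- P*(b | a) = probability of transforming instance a into training instance b; both sums run over all training instances b
- the middle expression = effective number of training instances
- blending parameter = the setting that determines which value between n_{0} and N is selected (distinct from the summation index b)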

The K* algorithm chooses a value for x_{0} (or s) by selecting a number between n_{0} and N and then inverting the expression above to find the parameter value that produces that effective number of instances (Cleary 2-14).
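A minimal sketch of this selection for one real-valued attribute follows. It rests on several assumptions that do not come from the text above: the transformation probability is taken to fall off as exp(-d / x0) with the distance d, the blend is expressed as a fraction between 0 and 1, and the inversion is done numerically by bisection. All function names are hypothetical.

```python
import math

def transform_probs(distances, x0):
    """Unnormalized P*(b|a), assuming an exponential fall-off in distance.
    Distances are shifted by their minimum for numerical stability; this
    does not change the effective count."""
    d_min = min(distances)
    return [math.exp(-(d - d_min) / x0) for d in distances]

def effective_count(probs):
    """(sum p)^2 / sum p^2, which lies between n0 and N."""
    total = sum(probs)
    return total * total / sum(p * p for p in probs)

def choose_x0(distances, blend, lo=1e-3, hi=1e3, iterations=100):
    """Invert effective_count by bisection so that it equals
    n0 + blend * (N - n0). The search range [lo, hi] is an assumption."""
    d_min = min(distances)
    n0 = sum(1 for d in distances if d == d_min)
    target = n0 + blend * (len(distances) - n0)
    for _ in range(iterations):
        mid = math.sqrt(lo * hi)  # geometric midpoint: x0 spans several scales
        if effective_count(transform_probs(distances, mid)) < target:
            lo = mid  # too few effective instances, so widen the kernel
        else:
            hi = mid
    return math.sqrt(lo * hi)

# Hypothetical distances from a test instance to five training instances.
distances = [0.1, 0.5, 1.0, 2.0, 4.0]
x0 = choose_x0(distances, blend=0.2)
print("x0:", x0)
print("effective count:", effective_count(transform_probs(distances, x0)))
```

Bisection is suitable here because, under the exponential assumption, the effective count grows monotonically with x0, from about n0 for very small values to about N for very large ones.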

## K-Nearest Neighbor Algorithm

The K-nearest neighbor algorithm is most often used for classification, although it can also be used for prediction and estimation. It is a good example of instance-based learning: the training data are stored, and a new, unclassified record is classified by comparing it with the most similar records in the training set. Several issues must be settled when using this classifier. These include how many neighbors to consider, that is, the value of k; how to measure the distance to the neighbors; how to combine the information from all the neighboring observations; and whether all points should be weighted equally (Larose 90-105).
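The basic procedure can be sketched as follows, assuming Euclidean distance and an unweighted majority vote. The training records and the function name are hypothetical.

```python
import math
from collections import Counter

def knn_classify(train, new_record, k):
    """Classify new_record by majority vote among its k nearest
    training records; train is a list of (features, label) pairs."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], new_record))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical training set with two numeric features per record.
train = [
    ((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
    ((3.0, 3.2), "B"), ((3.1, 2.9), "B"),
]
print(knn_classify(train, (1.1, 1.0), k=3))  # prints "A"
```

In practice the features would usually be normalized first so that no single attribute dominates the distance.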

### Weighted Voting

One would expect neighbors that are close to the new record to count for more than those farther away, and therefore to be weighted more heavily. Analysts accordingly tend to apply weighted voting, which also has the advantage of reducing ties. In K-nearest neighbor classification, one looks at the k records most similar to the new record and uses their known classifications to classify it. This can be applied in situations such as deciding which drug to administer to a patient: the classifications of known patients are used to classify, estimate or predict the outcome for a new one (Larose 90-105).
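The difference between simple and weighted voting can be sketched as below. The weighting used here, the inverse of the squared distance, is one common choice and is an assumption in this example, as are the data, the function names and the small constant `eps` that guards against division by zero.

```python
import math
from collections import Counter, defaultdict

def nearest_neighbors(train, new_record, k):
    """Return (distance, label) for the k training records closest to new_record."""
    ranked = sorted(train, key=lambda item: math.dist(item[0], new_record))
    return [(math.dist(f, new_record), label) for f, label in ranked[:k]]

def majority_vote(train, new_record, k):
    """Every neighbor counts equally."""
    votes = Counter(label for _, label in nearest_neighbors(train, new_record, k))
    return votes.most_common(1)[0][0]

def weighted_vote(train, new_record, k, eps=1e-12):
    """Each neighbor votes with weight 1 / distance^2."""
    scores = defaultdict(float)
    for distance, label in nearest_neighbors(train, new_record, k):
        scores[label] += 1.0 / (distance ** 2 + eps)
    return max(scores, key=scores.get)

# Hypothetical records: one very close "A" and two more distant "B" records.
train = [((0.1, 0.0), "A"), ((1.0, 0.0), "B"), ((0.0, 1.1), "B"), ((5.0, 5.0), "A")]
print(majority_vote(train, (0.0, 0.0), k=3))  # prints "B"
print(weighted_vote(train, (0.0, 0.0), k=3))  # prints "A"
```

Because the weights are real numbers rather than counts, exact ties between classes also become much less likely.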

### Use of K-nearest neighbor algorithm for prediction and estimation

The K-nearest neighbor algorithm may also be used for prediction and estimation, including for continuous-valued target variables. One way to achieve this is locally weighted averaging. Just as classification compares a new record with its most similar neighbors, estimation and prediction use the known target values of those neighbors. In a hospital setting, for instance, the values recorded for known patients can be used to estimate the value for a new patient, such as the systolic blood pressure. Locally weighted averaging estimates the blood pressure from the k nearest neighbors, weighting each neighbor by the inverse of its squared distance from the new record (Larose 90-105).

The estimated target value is

$$
\hat{y}_{\text{new}} = \frac{\sum_{i} w_{i} y_{i}}{\sum_{i} w_{i}}, \qquad w_{i} = \frac{1}{d(\text{new}, x_{i})^{2}}
$$

for the records x_{1}, x_{2}, ..., x_{k}, where y_{i} is the known target value of record x_{i}. Evaluating this expression gives the estimated systolic blood pressure (Larose 90-105).
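Locally weighted averaging with inverse squared distance weights can be sketched as follows. The patient records, the feature values and the function name are hypothetical, and the small constant `eps` is an added guard against a zero distance.

```python
import math

def knn_estimate(train, new_record, k, eps=1e-12):
    """Estimate a continuous target as the weighted mean of the k nearest
    records, with weights w_i = 1 / d(new, x_i)^2."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], new_record))[:k]
    weights = [1.0 / (math.dist(f, new_record) ** 2 + eps) for f, _ in nearest]
    weighted_sum = sum(w * y for w, (_, y) in zip(weights, nearest))
    return weighted_sum / sum(weights)

# Hypothetical patients: (normalized features, systolic blood pressure).
patients = [((0.0, 0.0), 120.0), ((1.0, 0.0), 130.0), ((0.0, 2.0), 140.0)]

# The new patient is equally far from the first two records, so with k = 2
# the estimate is their plain average.
print(knn_estimate(patients, (0.5, 0.0), k=2))
```

With k = 3 the third, more distant patient also contributes, but with a much smaller weight than the two close ones.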

### Choosing k

The value of k should be chosen carefully. A small k makes the classifier sensitive to noise, because idiosyncratic records in the training set can dominate the result. A k that is not too small tends to smooth out such idiosyncratic behavior learned from the training set. Taking k too large, however, risks overlooking locally interesting behavior (Larose 90-105).
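One way to let the data guide the choice of k is cross-validation. The sketch below uses leave-one-out cross-validation with an unweighted majority vote; the data set, the candidate values of k and the function names are hypothetical.

```python
import math
from collections import Counter

def knn_classify(train, new_record, k):
    """Majority vote among the k nearest (features, label) records."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], new_record))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def loo_error(data, k):
    """Leave-one-out error rate: classify each record from all the others."""
    errors = 0
    for i, (features, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        if knn_classify(rest, features, k) != label:
            errors += 1
    return errors / len(data)

def choose_k(data, candidates):
    """Return the candidate k with the lowest leave-one-out error
    (the first such candidate if several are tied)."""
    return min(candidates, key=lambda k: loo_error(data, k))

# Hypothetical data: two well-separated groups of four records each.
data = [
    ((1.0, 1.0), "A"), ((1.2, 0.9), "A"), ((0.8, 1.1), "A"), ((1.1, 1.2), "A"),
    ((4.0, 4.0), "B"), ((4.2, 3.9), "B"), ((3.8, 4.1), "B"), ((4.1, 4.2), "B"),
]
for k in (1, 3, 5, 7):
    print(k, loo_error(data, k))
print("chosen k:", choose_k(data, [1, 3, 5, 7]))
```

In this small example k = 7 uses every remaining record, so the held-out record's own class is always outvoted; this is an extreme case of a large k overlooking local structure.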

## Conclusion

Data mining methods are either supervised or unsupervised. Supervised methods work with a target variable, while unsupervised methods do not. This paper explored both the K* and K-nearest neighbor algorithms and their uses. K* was found to work well on real datasets. Its fundamental technique is to sum the probabilities of all possible paths from one instance to another. This helps to solve the smoothness problem and contributes greatly to robust and realistic performance. The method also allows real-valued and symbolic attributes to be integrated and provides a principled way of dealing with missing values. K* can also be used to predict real-valued attributes and may be applied to measuring the similarity of 2-d images. K* does not always perform best, however; in some cases the simple learning algorithm 1R does better. This may be addressed by raising the blend for unimportant attributes and lowering it for important ones (Cleary 2-14).

The paper has also explored the use of the K-nearest neighbor algorithm in classification as well as in prediction and estimation, the latter through methods such as locally weighted averaging. It also discussed how to choose k and how that choice affects the results. A small k makes the results sensitive to noise. A k that is not too small smooths out idiosyncratic behavior learned from the training set, but a k that is too large risks overlooking locally interesting behavior. It is therefore important to consider these implications when choosing k. One way to resolve the trade-off is to let the data decide, using a cross-validation procedure. Both methods are therefore useful in the classification of objects (Cleary 2-14).

## Works Cited

Cleary, John. “K*: An Instance-based Learner Using an Entropic Distance Measure”. *Dept. of Computer Science, University of Waikato*. New Zealand,

Larose, Daniel. “k-nearest neighbor algorithm”. *Discovering Knowledge in Data: An Introduction to Data Mining*. John Wiley & Sons, 2005.