Select two application areas for data mining NOT discussed in the textbook and briefly discuss how data mining is being used to solve a problem (or to explore an opportunity).
Data mining involves analyzing large volumes of data to extract comprehensible information that can be used to solve problems. It has many real-world applications (Han et al. 76), both in solving problems and in exploring opportunities.
Data Mining and the Detection of Disturbances in the Ecosystem
The use of data mining to detect disturbances in the ecosystem can help to avert problems that are destructive to the environment and to society. Such calamities include floods and droughts (Kumar and Bhardwaj 258). Remote sensing and earth science techniques are used to understand the radical changes in the environment. Data is collected and archived. It is later mined and used to detect disturbances.
Data Mining in Sports
Data mining can be used to predict the outcomes of sporting events and to analyze the performance of players. A case in point is the Advanced Scout system developed by IBM (Leung and Kyle 715), which coaches use to improve the performance of players. In most cases, fans predict results simply by watching games. They may also use archived data, which is mined and analyzed statistically to make predictions based on the history of past games.
What is association rule mining? Explain how market basket analysis helps retail businesses to maximize the profit from business transactions.
Association Rule Mining
Association rule mining is the discovery of relationships among the items in a given set of objects. It takes into consideration the ‘togetherness’ of these objects, that is, how often they appear together in the records of a database. It involves the identification of connections and correlations between objects (Ramageri 304).
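The ‘togetherness’ of objects is commonly quantified with two measures, support and confidence. The following is a minimal sketch of both, assuming a toy transaction database; the items, the baskets, and the function names are invented for illustration and do not come from the cited sources.

```python
# Toy transaction database; every basket here is invented for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "eggs"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) for the rule antecedent -> consequent."""
    joint = support(set(antecedent) | set(consequent), transactions)
    base = support(antecedent, transactions)
    return joint / base if base > 0 else 0.0

rule_support = support({"bread", "milk"}, transactions)
rule_confidence = confidence({"bread"}, {"milk"}, transactions)
print(f"support(bread, milk) = {rule_support:.2f}")
print(f"confidence(bread -> milk) = {rule_confidence:.2f}")
```

In this toy data, bread and milk appear together in three of five baskets, and three of the four baskets containing bread also contain milk. A rule is usually reported only when both measures exceed thresholds chosen by the analyst.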
Market Basket Analysis and Retail Business
Market basket analysis, an application of association rule mining, can be used to maximize profits and improve transactions in the retail business. It examines which products customers buy together in order to study their behavior and shopping trends. Marketers use the information to design catalogs and undertake customer behavior analysis (Han et al. 99). Consequently, the information can be used in marketing and advertising to maximize profits and improve business transactions.
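One way a retailer might act on shopping trends is to rank pairs of products by how much more often they are bought together than chance would suggest. The sketch below uses lift, a standard measure that the essay's sources are not quoted as using, on invented baskets; the product names and the `pair_lift` helper are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Invented checkout baskets for illustration only.
baskets = [
    ["chips", "salsa"],
    ["chips", "salsa", "soda"],
    ["chips", "soda"],
    ["soda", "napkins"],
    ["chips", "salsa", "napkins"],
    ["soda"],
]

def pair_lift(baskets):
    """Map each co-purchased pair to its lift: P(a and b) / (P(a) * P(b))."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # deduplicate and fix the order within a pair
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    return {
        (a, b): (count / n) / ((item_counts[a] / n) * (item_counts[b] / n))
        for (a, b), count in pair_counts.items()
    }

lifts = pair_lift(baskets)
ranked = sorted(lifts.items(), key=lambda kv: kv[1], reverse=True)
for pair, lift in ranked:
    print(pair, round(lift, 2))
```

A lift above 1 suggests that the two products are bought together more often than if purchases were independent, which could inform catalog layout or joint promotions. With so few baskets the numbers are only illustrative.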
Discuss the k-Nearest Neighbor (KNN) learning algorithm. What is the significance of the value of k in k-NN?
K-Nearest Neighbor (KNN) Learning Algorithm
The algorithm classifies a new data item by comparing it with stored items that are described by the same set of parameters and whose classifications are already known. It measures how similar the new item is to each stored item and assigns it the class that is most common among its closest matches. The ‘neighbors’ in this case are the stored items whose characteristics are most similar to those of the new item (Bhatia and Vandana 304). For instance, a bank may get a customer who wants a loan, but the entity lacks time to calculate the credit rating of the applicant. The bank can use the previous credit ratings of people with similar characteristics, such as earnings and collateral.
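The loan scenario above can be sketched in a few lines. This is a minimal illustration, assuming Euclidean distance and majority voting; the applicant records, the two features, and the `knn_predict` helper are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical past applicants:
# (annual income in $1000s, collateral in $1000s) -> known credit rating.
history = [
    ((30, 5), "poor"),
    ((35, 10), "poor"),
    ((40, 8), "poor"),
    ((70, 40), "good"),
    ((80, 50), "good"),
    ((90, 45), "good"),
]

def knn_predict(query, labeled_points, k):
    """Return the majority label among the k labeled points closest to query."""
    by_distance = sorted(labeled_points, key=lambda item: math.dist(query, item[0]))
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

new_applicant = (75, 42)
predicted_rating = knn_predict(new_applicant, history, k=3)
print(predicted_rating)
```

In practice the features would usually be rescaled first, since a parameter measured in large units can otherwise dominate the distance.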
The Significance of the Value of k in k-NN
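The k represents the number of nearest neighbors that are consulted when a new data point is classified. Small values of k follow the stored data closely but are sensitive to noise and outliers. Large values smooth out the noise but can blur the boundaries between classes and pull predictions toward the most common class. The choice of k therefore affects the percentage error of the approximation (Bhatia and Vandana 304). As such, k has to be chosen carefully, for example by comparing error rates on data held out for testing, to obtain the most accurate approximation in data classification and regression.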
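The effect of k can be seen on a small invented data set that contains one deliberately mislabeled point. The sketch below assumes Euclidean distance and majority voting; the points and the `knn_predict` helper are hypothetical.

```python
import math
from collections import Counter

def knn_predict(query, labeled_points, k):
    """Return the majority label among the k labeled points closest to query."""
    by_distance = sorted(labeled_points, key=lambda item: math.dist(query, item[0]))
    return Counter(label for _, label in by_distance[:k]).most_common(1)[0][0]

# Invented 2-D points: class "A" clusters near (1, 1), class "B" near (5, 5).
points = [
    ((1.0, 1.0), "A"), ((0.8, 1.3), "A"), ((1.4, 0.9), "A"), ((1.1, 1.5), "A"),
    ((1.2, 1.0), "B"),  # deliberately mislabeled point inside the "A" cluster
    ((5.0, 5.0), "B"), ((5.2, 4.8), "B"), ((4.7, 5.3), "B"),
    ((5.5, 5.1), "B"), ((4.9, 4.6), "B"),
]

query = (1.25, 1.05)  # lies inside the "A" cluster, right next to the noisy point
predictions = {k: knn_predict(query, points, k) for k in (1, 5, len(points))}
for k, label in predictions.items():
    print(f"k={k}: {label}")
```

With k = 1 the query copies the label of the noisy point. With k = 5 the surrounding "A" points outvote it. With k equal to the size of the data set, every query receives the overall majority class regardless of where it lies.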
Discuss the two estimation methods of classification-type data mining models while considering ANN as a classifier.
Supervised learning is one of the estimation methods of classification data mining models in artificial neural networks (ANN). In this case, a set of example pairs, each consisting of an input and its known output, is provided. The objective is to identify or ‘estimate’ a function that lies within the permitted class of functions (Nikam 15) and that reflects the given examples.
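Estimating a function from example pairs can be illustrated with the smallest possible network, a single threshold neuron trained with the classic perceptron rule. This is a sketch under stated assumptions rather than the method of the cited source: the permitted class of functions is taken to be linear threshold functions, the example pairs encode a logical AND, and the function names are hypothetical.

```python
# Example pairs (x, y): two binary inputs and the known output of a logical AND.
examples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

def predict(weights, bias, x):
    """Single threshold neuron: outputs 1 when the weighted sum is positive."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if z > 0 else 0

def train(examples, epochs=20, learning_rate=1.0):
    """Perceptron rule: nudge the weights whenever an example is misclassified."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for x, target in examples:
            error = target - predict(weights, bias, x)  # 0 when already correct
            weights = [w + learning_rate * error * xi for w, xi in zip(weights, x)]
            bias += learning_rate * error
    return weights, bias

weights, bias = train(examples)
predictions = [predict(weights, bias, x) for x, _ in examples]
print(predictions)
```

Because the AND examples are linearly separable, the rule settles on weights that reproduce all four examples. Data that no function in the permitted class can fit would leave some examples misclassified.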
The second estimation method is unsupervised learning. Here, the ANN works with a given set of data, usually denoted as x, together with a cost function to be minimized. The cost function can be any function of x and of the output of the network, which is usually denoted as f. Its form relies on what the network is trying to model (Nikam 16) and on the assumptions made.
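A very small example may make the role of the cost function concrete. The sketch below assumes the simplest possible model, a constant output f(x) = a, and a squared-error cost; both choices, the data values, and the function names are illustrative assumptions and are not taken from the cited source.

```python
# Unlabeled data x; the values are invented for illustration.
data = [2.0, 4.0, 4.0, 6.0, 9.0]

def cost(a, xs):
    """Mean squared difference between the data and the constant output a."""
    return sum((x - a) ** 2 for x in xs) / len(xs)

def fit_constant(xs, learning_rate=0.1, steps=200):
    """Minimize the cost by gradient descent, starting from a = 0."""
    a = 0.0
    for _ in range(steps):
        gradient = sum(-2 * (x - a) for x in xs) / len(xs)
        a -= learning_rate * gradient
    return a

a_hat = fit_constant(data)
print(round(a_hat, 4), round(cost(a_hat, data), 4))
```

Under these assumptions the minimizer is the mean of the data, so the network has summarized the data without being given any target outputs. A different cost function would lead the same procedure to model a different property of x.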
Bhatia, Nitin, and Ashev Vandana. “Survey of Nearest Neighbor Techniques.” International Journal of Computer Science and Information Security, vol. 8, no. 2, 2010, pp. 302-305.
Han, Jiawei, et al. Data Mining: Concepts and Techniques. 3rd ed., Morgan Kaufmann Publishers, 2011.
Kumar, Dharminder, and Deepak Bhardwaj. “Rise of Data Mining: Current and Future Application Areas.” International Journal of Computer Science Issues, vol. 8, no. 5, 2011, pp. 256-260.
Leung, Carson, and Joseph Kyle. “Sports Data Mining: Predicting Results for the College Football Games.” Procedia Computer Science, vol. 35, 2014, pp. 710-719.
Nikam, Sagar. “A Comparative Study of Classification Techniques in Data Mining Algorithms.” Oriental Journal of Computer Science & Technology, vol. 8, no. 1, 2015, pp. 13-19.
Ramageri, Bharati. “Data Mining Techniques and Applications.” Indian Journal of Computer Science and Engineering, vol. 1, no. 4, 2011, pp. 301-305.