Introduction
In recent years, the relatively new discipline of data mining has been widely debated in mainstream forums and academic discourse, not only because it forms a critical constituent of the more general process of Knowledge Discovery in Databases (KDD), but also because of the growing realization that it can be applied in a number of areas to enhance decision-making processes, efficiency, and competitiveness in contemporary organizations (Kusiak, 2006).
The basic concept behind the emergence of data mining, which has contributed immensely to its acceptance as an increasingly used strategy in business establishments as well as in scientific and research undertakings, is that by automatically sifting through large volumes of information that may initially appear irrelevant, interested parties should be able to extract nuggets of useful knowledge that can then be used to drive their agenda forward (Adams, 2010).
Goth (2010) observes that the emergence of data mining has been primarily informed by the rapid growth in data warehouses as well as the recognition that this heap of operational data can be potentially exploited as an extension of both business and scientific intelligence.
The present paper seeks to critically discuss the discipline of data mining with a view to illuminating its origins, concepts, applications, and the legal and ethical issues involved in this particular field.
Definition & History of Data Mining
Although data mining as a concept has been defined differently in diverse media, this report will adopt the simple definition given by Payne & Trumbach (2009), that “…data mining is the set of activities used to find new, hidden or unexpected patterns in data” (pp. 241-242).
The purpose of data mining, as observed by these authors, is to extract information that would not be readily established by searching databases of raw data alone. Through data mining, organizations are now able to combine data from disparate sources, both internal and external, across a multiplicity of platforms to assist in a variety of business applications.
At its most elemental, data mining uses proven procedures, including modeling techniques, statistical investigation, machine learning, and database technology, among others, to seek patterns and subtle relationships in the sifted data, with the main objective of deducing rules and intricate relationships that permit the extrapolation of future outcomes (Payne & Trumbach, 2009; Adams, 2010).
Researchers and practitioners are in agreement that the capability of both generating and collecting data from a wide variety of sources has greatly impacted the growth trajectories of data mining as a discipline.
This capability, according to Adams (2010) and Chen (2006), was precipitated by a number of variables, which can be categorized into the following:
- increased computerization of business, scientific, and government transactions with a view to increasing efficiency and productivity,
- extensive usage of electronic cameras, scanners, publication devices, and internationally recognized bar codes for most business-related products,
- advances in data gathering instruments ranging from scanned documents and image platforms to global positioning and remote sensing systems,
- the development and popularization of the World Wide Web and the internet as widely accepted global information systems.
This explosive growth in stored or ephemeral data brought us to the information age, which was, and continues to be, characterized by an imperative need to develop new techniques, procedures, and automated tools that can assist us in transforming and making sense of the huge quantities of data collected through the channels listed above (Goth, 2010).
To dig a bit deeper into the history of data mining, research has been able to establish that the term ‘data mining’, which was introduced in the 1990s, has its origins in three interrelated family lines. It is important to note that the convergence of these family lines to develop a unique discipline in the context of data mining certainly gives it its scientific foundation (Adams, 2010).
This notwithstanding, extant research (Adams, 2010; Chen, 2006) demonstrates that the longest of these family lines to be credited with the gradual development of data mining as a fully-fledged discipline is classical statistics.
Researchers are in agreement that it would not have been possible to develop the field of data mining in the absence of statistics as the latter provides the foundation of most technologies on which the former is built, such as “regression analysis, standard distribution, standard deviation, standard variance, discriminant analysis, and confidence intervals” (Goth, 2010, p. 14).
All these concepts, according to this author, are used to study data and data relationships – central aspects in any data mining exercise.
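To make two of these statistical building blocks concrete, the following is a minimal sketch in Python and is not drawn from any of the cited sources. It computes a sample standard deviation and fits a simple regression line by ordinary least squares. The function name fit_line and the advertising and sales figures are hypothetical and serve only to illustrate how a relationship in data can be quantified and then extrapolated.

```python
from statistics import mean, stdev


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Co-variation of x and y, and variation of x, around their means.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical data: monthly advertising spend versus units sold.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
units_sold = [12.0, 14.0, 16.0, 18.0, 20.0]

slope, intercept = fit_line(ad_spend, units_sold)
spread = stdev(units_sold)          # sample standard deviation of sales
forecast = slope * 6.0 + intercept  # extrapolate to a spend of 6.0

print(f"slope={slope:.2f}, intercept={intercept:.2f}")
print(f"standard deviation of units sold={spread:.2f}")
print(f"forecast at spend 6.0={forecast:.2f}")
```

On real operational data the relationship would rarely be this clean, which is one reason such fits are usually read alongside confidence intervals rather than taken at face value.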
The second longest family line that has contributed immensely to the emergence of data mining as a fully-fledged field is artificial intelligence, or simply AI. Extant research demonstrates that the AI discipline, which is built upon heuristics as opposed to statistics, endeavors to apply human-thought-like processing to statistical challenges while using computer processing power as the appropriate medium (Talia & Trunfio, 2010).
It is important to mention that since this approach was tied to the availability of computers and supercomputers to undertake the heuristics, it was not practical until the early 1980s, when computers started trickling into the market at reasonable prices (Goth, 2010).
The third family line to have influenced the field of data mining is what is generally known as machine learning or, better still, the amalgamation of statistics and AI (Adams, 2010). Here, it is of importance to note that while AI could not have been viewed as a commercial success during the formative years, its techniques and strategies were largely co-opted by machine learning.
It is also important to note that machine learning, while able to take full advantage of the ever-improving price/performance ratios offered by computers in the 1980s and 1990s, found usage in more applications because its entry price was lower than that of AI, not to mention that it was largely considered an evolved facet of AI, as it was effectively able to blend AI heuristics with complex statistical analysis (Chen, 2006).
Review of how Data Mining is used Today and how it could be used in the Future
Presently, there exists broad consensus that data mining is mostly based on machine learning techniques; that is, it is fundamentally perceived as the adaptation of machine learning techniques and concepts to a wide variety of areas, such as business and scientific applications (Adams, 2010).
Therefore, present-day data mining can best be described as the amalgamation of historical and recent developments, particularly in statistics, artificial intelligence, and machine learning, with a view to developing software that can run on a standard computer to, among other things, make diverse decisions based on the data under study, use statistical concepts and applications to establish various relationships among the data, and use more advanced artificial intelligence heuristics and algorithms to achieve its major goal (Talia & Trunfio, 2010).
Extant research demonstrates that the major objective of current data mining applications is to sift through huge volumes of data to extract nuggets of useful data, which can then be used to establish previously hidden trends or patterns.
Today, more than ever before, data mining is used in the business arena to boost corporate profits by improving customer relations and targeting new customers (Cary et al., 2003).
According to these authors, “…AT&T Wireless was able to increase it’s subscriber base by 20% in less than a year when it contracted with a data-mining company to identify customers that would likely to be interested in AT&T’s new flat-fee wireless service” (p. 158).
The AT&T story demonstrates that visions of achieving good returns continue to drive businesses toward embracing data mining technology.
Data mining is bound to be used along the same lines in the future to enable enterprises to make critical decisions from a knowledge-oriented perspective. Successive studies have demonstrated that most business organizations fail to wade through the harsh economic waters of modern times due to their inability to base their most important decisions on knowledge and evidence (Adams, 2010; Goth, 2010).
However, it is now evident that data mining can be used to move organizations closer to a knowledge-based economy, which basically translates into the use of knowledge to generate economic benefits.
Chen (2006) observes that a knowledge-based economy necessitates data mining processes to become more goal-oriented with the view to generating an enabling environment where more tangible results can be achieved.
Consequently, data mining should be used in the future not only to facilitate the uncovering of concealed knowledge beneath the ocean of data readily found in a multiplicity of mediums and applications, but also to ensure that it makes important contributions to the knowledge-based economy with the express intention of coming up with more tangible business and scientific outcomes (Chen, 2006; Adams, 2010).
Types of Data Mining Applications
There exists a multiplicity of data mining applications which can be used in diverse situations and environments depending on the major objective for usage. Some data mining applications, according to Chen (2006), are simple to use and may be offered for free, while others are complex and require a sizeable investment to operationalize.
This section will discuss some data mining applications based on the sector of practice, and will mainly focus on the banking and finance, retail, and healthcare sectors of the economy.
Data Mining Applications in the Banking & Finance Sector
Most banking institutions have over the years employed a multiplicity of data mining applications to model and predict credit fraud, to assess borrower risk, to undertake trend analysis, and to evaluate profitability, as well as to assist in the initiation and management of direct marketing activities (Seifert, 2004).
In equal measure, most finance and credit companies have over the years employed a variety of neural networks and other data mining applications “…in stock-price forecasting, in option trading, in bond rating, in portfolio management, in commodity price prediction, in mergers and acquisitions, as well as in forecasting financial disasters” (p. 191).
Here, it can be noted that the Neural Applications Corporation has developed an effective application known as NETPROPHET, which is increasingly being used by finance companies to make stock predictions by illustrating the real and predicted stock values depending on the type of data that has been keyed into the system (Groth, 1999; Chen, 2006).
The banking sector is continuously faced with fraud cases, and data mining applications such as HNC Falcon have assisted institutions in monitoring payment-card applications, decreasing fraud cases by almost 75 percent while increasing applications for payment card accounts by as much as 50 percent on a yearly basis (Groth, 1999).
The importance of banks developing data mining applications that can be used in cross-selling and in maintaining customer loyalty has been well documented in the literature.
These applications, according to Groth (1999), mainly assist banking institutions to model the behavior of their customers in such a manner that the resulting relationships could be used to establish the needs and demands of their customers, as well as make objective predictions into the future.
RightPoint software, Security First, and BroadVision are some of the vendors primarily interested in integrating predictive technologies with consumer interaction points to ensure customer needs and demands are efficiently dealt with (Groth, 1999; Chen, 2006), in addition to using predictive technologies to integrate one-to-one marketing strategies into their clients’ banking sites (Adams, 2010).
According to Groth (1999), “…the RightPoint Real-Time Marketing Suite takes data-mining models and leverages them within real-time interactions with customers” (p. 194). This application is unique in that it is designed to develop, manage and deliver one-to-one marketing initiatives for high-end industries that heavily depend on direct customer interaction to undertake business (Goth, 2010).
As a general observation, it is important to note that the majority of the data mining applications used in the banking and finance sector attempt to ensure that each customer interaction seizes the prospect of enhancing customer satisfaction, loyalty, motivation, and profit-generation potential (Talia & Trunfio, 2010; Zhang & Segall, 2008).
Data Mining Applications in Retail
Intense competition and slim profit margins have obliged retailers to embrace data warehousing strategies earlier than other sectors. As observed by Groth (1999), “…retailers have seen improved decision-support processes lead directly to improved efficiency in inventory management and financial forecasting” (p. 198).
It is well known that expansive retail and supermarket chains are in possession of huge quantities of point-of-sale data that are not only information-rich, but could also be exploited using appropriate data mining applications to improve the stated decision-support strategies, improve efficiency in financial predictions and inventory management, and analyze customer shopping patterns (Seifert, 2004).
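As a rough illustration of what analyzing shopping patterns in point-of-sale data can involve, the sketch below counts how often pairs of products appear together in the same basket and reports each pair's support, that is, the fraction of baskets containing both items. This is a simplified stand-in for market-basket analysis and is not the method of any product named in this paper; the function pair_support and the five sample baskets are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_support(baskets):
    """Return the fraction of baskets in which each product pair co-occurs."""
    counts = Counter()
    for basket in baskets:
        # Deduplicate and sort so each pair is counted once per basket
        # and always appears in the same (alphabetical) order.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    total = len(baskets)
    return {pair: count / total for pair, count in counts.items()}


# Hypothetical point-of-sale transactions.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
    ["bread", "milk"],
]

support = pair_support(baskets)
for pair, value in sorted(support.items(), key=lambda item: -item[1]):
    print(pair, round(value, 2))
```

A retailer could use pairs with high support as candidates for joint promotions or shelf placement, although a production system would also weigh how often each item sells on its own.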
The AREAS Property Valuation product from HNC Software, as well as SABRE Decision Technologies, serve as good examples of how data mining applications can be used in the retail sector to perform valuations, projection and forecasting, customer purchasing behavior analysis, and customer retention analysis, with the underlying purpose of increasing profitability, enhancing customer experience, and making better and more informed business decisions (Zhang & Segall, 2008).
To evaluate customer profitability in the retail sector, the software vendor Dovetail Solutions has developed a data mining application known as Value, Activity, and Loyalty™ (VAL™), which utilizes transactional business data from retailers to synthesize information about customer activity and processes, churn rate, and anticipated future purchases (Groth, 1999; Chen, 2006).
Data Mining Applications in the Medical Field
The vast amount of data available within the healthcare industry, including the associated data collected via medical research, biotechs, and the pharmaceutical industry, has provided fertile ground for data mining applications to grow. The knowledge that data mining has been employed expansively in the medical industry is in the public domain.
For example, we are aware that the vendor NeuroMedical Systems ingeniously employed neural networks to create a pap smear diagnostic aid, while both Vysis Rochester Cancer Center and the Oxford Transplant Center continue to employ a data mining application known as KnowledgeSEEKER, which utilizes decision tree technology, to assist in various research undertakings (Groth, 1999; Adams, 2010; Chen, 2006).
It is important to note that these applications are beneficial in the medical sector as they enable health practitioners to arrive at accurate diagnoses even without subjecting patients to physical examination (Koh & Tan, 2008).
Governments and other interested health agencies can utilize data mining applications, such as MapInfo, KnowledgeSEEKER, and LEADERS, among others, to: demonstrate average costs of health services; show efficiency of a particular prescription over time; reveal efficacy rates of diverse pathogens over time; develop superior diagnosis and treatment protocols; show patient location in order to deliver superior health services; or assist healthcare insurers to detect fraud (Koh & Tan, 2008; Wen-Chung et al., 2010).
Legal & Ethical Issues in Data Mining
As is the case in other disciplines, the field of data mining is faced with a complex set of legal and ethical issues that need to be addressed for the applications to succeed.
In the legal arena, it is important to evaluate how organizations should employ data mining applications while remaining focused on protecting the private information of their customers so as to avoid customer dissatisfaction or even legal action by customers who may feel that the organizations intruded into their privacy (Cary et al., 2003; Wen-Chung et al., 2010).
In terms of ethical issues, it is well known that the spread of personal information, as is the case in many data mining applications, can lead to elevated risks of customer identity theft (Cary et al., 2003).
According to Payne & Trumbach (2009), data mining processes bring to the fore a scenario where “…the consumer loses aspects of privacy as all of their basic demographic information, personal interests, correspondence and activities are stored in databases and available to be combined together” (p. 243).
Such a scenario has obvious ethical ramifications since this information can be used to the disadvantage of the customers.
Another ethical concern arises from the fact that consumers lose control over what happens to their personal information held in large databases, implying that such information can be used to the disadvantage of those who provided it if it happens to fall into the wrong hands (Payne & Trumbach, 2009; Cary et al., 2003; McGraw, 2010).
What’s more, customers who provide personal information to organizations face a more ominous challenge that may entail potential discrimination based on the personal information they either provide or refuse to provide to the organizations.
Another important factor to consider when evaluating ethical concerns in data mining is that there is no clear line distinguishing whether it is indeed necessary for an organization to use the private information of its customers to enhance its profitability or whether such information should be used solely to improve customer satisfaction and maintain consumer trust (Payne & Trumbach, 2009).
Lastly, it is well known that a number of data mining processes may yield incorrect conclusions, which may be costly to the organization as well as to the customers (McGraw, 2010).
Conclusion
This discussion has brought to the fore important aspects of data mining, its current and future uses, as well as perceived limitations in terms of legal and ethical constraints.
The general consensus among academics and practitioners is that data mining represents the new frontier of growth, particularly in nurturing mutually fulfilling customer relationships, ensuring that customers’ needs and demands are satisfactorily met, and helping organizations to forecast and predict future growth and decision patterns (Adams, 2010; Kusiak, 2006; Wen-Chung et al., 2010).
The task is therefore for developers to continue investing heavily in effective and efficient data mining applications to ensure that such tools are able to achieve what they were originally intended to achieve. Consequently, research and development into these applications and tools is of primary importance.
Reference List
Adams, N.M. (2010). Perspectives on data mining. International Journal of Market Research, 52(1), 11-19. Retrieved from Business Source Premier Database
Cary, C., Wen, H.J., & Mahatanankoon, P. (2003). Data mining: Consumer privacy, ethical policy, and systems development practices. Human Systems Management, 22(4), 157-168. Retrieved from Business Source Premier Database
Chen, Z. (2006). From data mining to behavior mining. International Journal of Information Technology & Decision Making, 5(4), 703-711. Retrieved from Business Source Premier Database
Goth, G. (2010). Turning data into knowledge. Communications of the ACM, 53(11), 13-15. Retrieved from Business Source Premier Database
Groth, R. (1999). Data mining: Building competitive advantage. Upper Saddle River, NJ: Prentice Hall
Koh, H.C., & Tan, G. (2008). Data mining applications in healthcare. Journal of Healthcare Information Management, 19(2), 64-72
Kusiak, A. (2006). Data mining: Manufacturing and service applications. International Journal of Production Research, 44(18/19), 4175-4191. Retrieved from Business Source Premier Database
McGraw, D. (2010). Data identifiability and privacy. American Journal of Bioethics, 10(9), 30-31. Retrieved from Academic Search Premier Database
Payne, D., & Trumbach, C.C. (2009). Data mining: Proprietary rights, people and proposals. Business Ethics: A European Review, 18(3), 241-252. Retrieved from Business Source Premier Database
Seifert, J.W. (2004). Data mining: An overview. Retrieved from <https://fas.org/irp/crs/RL31798.pdf>
Talia, D., & Trunfio, P. (2010). How distributed data mining tasks can thrive as knowledge services. Communications of the ACM, 53(7), 132-137. Retrieved from Business Source Premier Database
Wen-Chung, S., Chao-Tung, Y., & Shian-Shyong, T. (2010). Performance-based data distribution for data mining applications on grid computing environments. Journal of Supercomputing, 52(2), 171-198. Retrieved from Academic Search Premier Database
Zhang, Q., & Segall, R.S. (2008). Web mining: A survey of current research, techniques, and software. International Journal of Information Technology & Decision Making, 7(4), 683-720. Retrieved from Business Source Premier Database