Business Intelligence Project: Using Predictive Analytics to Improve a Business Term Paper


Table of Contents

Abstract
Introduction
Literature Review
Proposed System
Results
Discussion and Conclusion
Future Work
References

Abstract

Data mining is a useful tool for extracting pertinent information from a large dataset. It has gained considerable attention in the recent past because of the growing size of data. It is widely used in many fields, such as bioinformatics, marketing, and security. Predictive analysis and data mining strategies are commonly used in large-scale companies to improve sales. One such tool is the Apriori algorithm, which was integrated into a program for a coffee shop to increase sales. The algorithm can be used efficiently to train a model on historical data. It can also be employed to predict or suggest items to customers, making it easier for users to choose items from a pool.

Background and Aim

Predictive analytics has become popular in recent times among large companies such as Google and Amazon. In 2020, 35% of Amazon’s sales came from recommended items, amounting to $56 billion of the online store’s $163 billion in revenue. The increasing use of predictive analysis suggests that it is an appropriate technology that can even be utilized in small businesses. In this regard, the study aims to investigate how predictive analytics affects inventory, customer relations, and sales for small businesses. It is expected that the research will raise awareness about predictive analysis technologies to assist small ventures during uncertain times.

The study focuses on a coffee shop, which belongs to the hospitality sector, a sector that has been affected considerably by the ongoing COVID-19 pandemic. About 50% of restaurants in New York are currently closed due to the pandemic because the World Health Organization advocates social distancing and warns against crowding. Overall, the study aims to investigate the impact of predictive analytics by assessing inventory, sales, and customer data of a small business (coffee shop).

Methodology

A small local coffee shop was considered in the analysis, and real inventory, sales, and customer data were obtained from it. After this, a Python program was used to create a model that can be applied to improve sales, customer relations/satisfaction, and inventory tracking. The program and the developed model were then used to determine the performance of the coffee shop.

Results and Conclusion

It was found that the Apriori algorithm can be trained using a Python program to help customers select suitable items from a pool based on their historical data. Customer data from a local coffee shop was utilized to investigate how the Apriori algorithm can be used to improve the sales of small businesses. It was concluded that the developed system recommends items to clients when they use an online platform to order. Therefore, the program enhances customer satisfaction, increasing sales and revenues for the coffee shop. Hence, the proposed system should be implemented by small businesses to improve their sales.

Introduction

Data mining is a technology used to extract pertinent information from a huge volume of data. The technology has become popular in recent years because of the growing amount of data and the need to make smart decisions based on the assessment of real information. It is widely used in many fields, such as medicine, bioinformatics, marketing, and the security sector. Data mining is used to determine frequent patterns that are utilized for clustering, association mining, and correlation evaluation. Frequent patterns include itemsets, sub-sequences, or substructures, and they should satisfy a predetermined minimum count, which can be determined using Equation 1. Some businesses have embraced data mining technologies, whereas others still employ traditional strategies.
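Equation 1 is not reproduced in this copy of the paper. As a point of reference only, the standard textbook definition of the support count of an itemset, together with the frequent-pattern condition, can be written as follows. The symbols \sigma, D, T, and min_sup are notation assumed here for illustration and are not taken from the source.

```latex
\[
\sigma(X) = \bigl|\{\, T \in D : X \subseteq T \,\}\bigr|,
\qquad
X \text{ is frequent} \iff \sigma(X) \ge \mathit{min\_sup}
\]
```

Here D is the set of transactions, T is a single transaction, and X is the itemset whose frequency is being checked.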

Inventory

Some businesses still utilize traditional methods to assess their inventory, customer relations/satisfaction, and sales performance. Small businesses track inventory manually using paper charts or Microsoft Excel, which are limited to small datasets, time-consuming, and prone to errors. Recently, another tool, Square, has been adopted by many businesses to track their inventory, but it provides only a limited assessment of inventory levels. The tool cannot predict changes in inventory based on trends, limiting its accuracy. Overall, the traditional tools for tracking inventory in small businesses are limited and should be replaced with smart technology to enhance performance.

Customer Analysis

Evaluation of customer satisfaction and relations is crucial because it determines the sales of a business. The traditional methods for tracking customer retention are not popular among small businesses, considering their limited resources. Most of these businesses use standard feedback approaches, such as having a feedback/complaint box. This method is inefficient: it does not make it easy to recognize customers’ needs, and it makes it challenging to respond to customers. Therefore, online platforms and social media can be integrated to collect customers’ feedback and complaints and then provide an instant response.

Furthermore, the traditional methods do not provide customer recommendations. Recommender systems are being employed to assess customers’ needs, determine their satisfaction, and assist them in selecting the most appropriate item from a pool of available options. The feedback/complaint box method is not efficient and should be replaced with modern systems that provide instant responses and recommendations based on customers’ needs and satisfaction.

Sales

Increasing sales is the main objective of most businesses, and suitable strategies should be implemented to enhance revenues. Smart decisions based on crunching of real data can help companies improve their sales. Advertisement is one of the crucial tools to promote sales for businesses. However, suitable decisions should be made to determine the most appropriate market, items for sales, and media to use. Therefore, modern technology can be used to assist business management in making pertinent decisions that help improve sales.

Proposed System

The proposed model involves using predictive analytics, which is a new process of transforming huge volumes of data into useful information for decision making. The process involves comparing historical and current data to provide actionable insights. In this regard, the study aims to investigate the impact of integrating a predictive analytics model on the performance of a small coffee shop. A Python program will be utilized to create the model by considering inventory, sales, and customer-related data. It is expected that the algorithm will increase sales by tracking and managing inventory to prevent unnecessary wastage of resources, as well as by providing customer recommendations.

Literature Review

Many studies have been conducted on predictive analytics and its application to existing and emerging businesses. Some challenges of the existing models have been identified, making it imperative to implement appropriate systems. Yang and Wu found that time series and sequential data mining are among the challenges to the implementation and uptake of predictive analytics tools. Another problem is the complexity of data mining processes. Therefore, easier clustering, classification, and trend prediction methods should be developed to increase the uptake of the technology.

Previous studies have focused on mining frequent tree patterns because of their efficiency. Wang et al. developed pattern-growth algorithms for mining frequent trees, which generate and test all possible tree patterns of the database. The authors created two algorithms, Chopper and XSpanner; the latter combines the mining of sequential patterns with the extraction of frequent tree patterns. It was concluded that the XSpanner algorithm is faster than Chopper. Furthermore, the Chopper and XSpanner algorithms perform efficiently compared to M. J. Zaki’s TreeMinerV and can be employed to mine frequent trees in a forest of KDD02. Overall, different algorithms have been developed previously for different applications, as summarized in Table 1.

Table 1: Comparison of various Association Rule Mining Algorithms.

Algorithm	Advantages	Disadvantages	Applications
AIS: the algorithm makes several passes over the entire database and scans all the transactions. Large (frequent) items are determined in the first pass and are used to generate candidate itemsets.	It focuses on improving the quality of the database and processes decision support queries. It investigates the association between different departments and can be utilized to predict customers’ behavior. It is easy to use. It is better than STEM.	Candidate sets are generated on the fly. The size of the candidate set is large. It requires multiple scans of the whole database. It requires more memory.	It is not frequently used; when it is used, it addresses small problems.
STEM: similar to AIS, the algorithm makes multiple passes over a given database. The first scan determines the frequency of individual items. After this, it generates candidate itemsets based on the results of the first pass.	It separates generation from counting. It saves a copy of the candidate itemsets together with the TID of the generating transactions in a sequential format.	It requires prolonged execution time. The size of the candidate set is large and requires more space for storage. The method is not efficient.	It is not frequently used.
Apriori: one of the best-known association algorithms. It generates candidate itemsets by joining the large itemsets found in the previous pass and deleting those whose subsets were found to be small.	It is fast and more efficient than AIS. Fewer candidate sets are generated because only large itemsets are considered. The algorithm is easy to understand. The join and prune steps can be easily implemented on large datasets.	It takes a lot of memory. It works slowly compared to other algorithms.	It is suitable for closed itemsets.
Apriori TID	It does not use the entire database to count candidate sets. It is better than STEM and it is fast.	–	It is employed for smaller problems.
Apriori Hybrid	It is more efficient than both Apriori and Apriori TID.	–	It is utilized for the same applications as Apriori and Apriori TID but has better performance.
FP-Growth: the most widely used data mining technique for finding patterns in a transaction dataset.	It requires only two passes over the dataset, which reduces execution time. It compresses the dataset. No candidate set generation is required.	The tree structure is complex, making it difficult to implement. It is not appropriate for incremental mining problems.	It is used for large problems because it does not require the generation of candidate sets.
Rapid Association Rule Mining	It avoids the candidate generation process. It is faster than the FP-Tree algorithm.	It requires more storage memory.	–

The proposed system uses the Apriori algorithm because it is the best-known and most widely used association rule algorithm. It differs from AIS and STEM in the way candidate itemsets are generated. In each pass, the Apriori algorithm produces candidate itemsets by joining the large itemsets found in the previous pass and deleting candidates that contain small (infrequent) subsets. The method is faster and easier to implement, making it the most suitable algorithm for the current analysis.

Proposed System

The proposed system aims to empower small businesses, such as a coffee shop. The Apriori algorithm was selected as the best strategy, and it can be used to understand customers’ buying patterns, which can be employed to improve sales. Therefore, the study will contribute crucial information that can be used by small businesses and enable them to survive in the rapidly changing marketplace.

Apriori Algorithm

The Apriori algorithm is used to find frequent itemsets in a given dataset for Boolean association rules. It employs prior knowledge of frequent itemset properties, using frequent k-itemsets to identify frequent (k+1)-itemsets iteratively. In the first step, a candidate set is obtained by joining large itemsets. After this, the Apriori property is used to delete all candidates that do not meet the requirements, as illustrated in Table 2. Table 3 illustrates an example of the Apriori algorithm, which shows that it is an iterative process.

*Table 3. An example of the Apriori algorithm.*

Steps of Apriori Algorithm

The steps of the Apriori algorithm are repeated until the required results are obtained. The ensuing discussion illustrates how the Apriori algorithm is implemented and how the iterative process works. To start with, consider a dataset of items bought by a user, as illustrated in Table 4.

Table 4. Example of a dataset that can be analyzed using the Apriori algorithm.

Items bought together
A1, A2, A5
A2, A4
A2, A3
A1, A2, A4
A1, A3
A2, A3
A1, A3
A1, A2, A3, A5
A1, A2, A3

Step-1: The first step builds candidate set C1 by counting the support of each individual item. Entries whose support count is less than the minimum support count (assume a value of 2) are removed. The resultant set is itemset L1. Because every item in this example meets the threshold, L1 contains the same entries as C1, as illustrated in Table 5.

Table 5. The candidate set C1 developed from the data summarized in Table 4.

Itemset	Support Count
A1	6
A2	7
A3	6
A4	2
A5	2
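Step-1 can be sketched in Python as follows. This is a minimal illustration that uses the transactions of Table 4 and the assumed minimum support count of 2; the variable names are chosen here for illustration and are not taken from the program described in the paper.

```python
from collections import Counter

# Transactions copied from Table 4 (one set of items per purchase).
transactions = [
    {"A1", "A2", "A5"},
    {"A2", "A4"},
    {"A2", "A3"},
    {"A1", "A2", "A4"},
    {"A1", "A3"},
    {"A2", "A3"},
    {"A1", "A3"},
    {"A1", "A2", "A3", "A5"},
    {"A1", "A2", "A3"},
]
MIN_SUPPORT = 2  # minimum support count assumed in Step-1

# C1: support count of every individual item.
c1 = Counter(item for basket in transactions for item in basket)

# L1: keep only the items whose count reaches the minimum support.
l1 = {item: count for item, count in c1.items() if count >= MIN_SUPPORT}

for item in sorted(l1):
    print(item, l1[item])
```

The printed counts reproduce the rows of Table 5.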

Step-2: This step generates candidate set C2 from L1. In general, two large (k-1)-itemsets are joined to form a candidate k-itemset only if they have (k-2) elements in common. The candidates and their support counts are recorded in Table 6.

Table 6. Generated candidate set C2 using L1.

Itemset	Support Count
A1, A2	4
A1, A3	4
A1, A4	1
A1, A5	2
A2, A3	4
A2, A4	2
A2, A5	2
A3, A4	0
A3, A5	1
A4, A5	0

After candidate set C2 is generated from L1, the support counts are compared with the minimum support count, and entries with a lower count are removed. The resultant set is referred to as itemset L2.

Table 7. Generated itemset L2 from C2

Itemset	Support Count
A1, A2	4
A1, A3	4
A1, A5	2
A2, A3	4
A2, A4	2
A2, A5	2
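The generation of C2 and its filtering into L2 can be sketched as follows. The helper name `support_count` is an assumption made for this illustration; the data are the transactions of Table 4 and the frequent items of Table 5.

```python
from itertools import combinations

# Transactions copied from Table 4.
transactions = [
    {"A1", "A2", "A5"}, {"A2", "A4"}, {"A2", "A3"},
    {"A1", "A2", "A4"}, {"A1", "A3"}, {"A2", "A3"},
    {"A1", "A3"}, {"A1", "A2", "A3", "A5"}, {"A1", "A2", "A3"},
]
MIN_SUPPORT = 2
l1_items = ["A1", "A2", "A3", "A4", "A5"]  # frequent single items (Table 5)


def support_count(itemset, baskets):
    """Number of baskets that contain every item of `itemset`."""
    return sum(1 for basket in baskets if itemset <= basket)


# C2: every pair of frequent items, with its support count.
c2 = {
    frozenset(pair): support_count(frozenset(pair), transactions)
    for pair in combinations(l1_items, 2)
}

# L2: pairs that reach the minimum support count.
l2 = {pair: n for pair, n in c2.items() if n >= MIN_SUPPORT}

for pair in sorted(l2, key=sorted):
    print(", ".join(sorted(pair)), l2[pair])
```

With these inputs, `c2` holds the ten pairs of Table 6 and `l2` holds the six pairs of Table 7.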

Step-3: This step generates candidate set C3 from L2. The large itemsets are combined, and the candidates that do not meet the requirements are removed. For example, {A1, A2, A4} is discarded because its subset {A1, A4} is not in L2.

Table 8. The generated candidate set C3 using L2.

Itemset	Support Count
A1, A2, A3	2
A1, A2, A5	2
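The join and prune operations of Step-3 can be sketched as follows. This is an illustrative version, not the program used in the study: pairs from L2 are joined when their union has exactly three items (that is, when they share one item), and a candidate is kept only if all of its two-item subsets are in L2.

```python
from itertools import combinations

# Transactions copied from Table 4.
transactions = [
    {"A1", "A2", "A5"}, {"A2", "A4"}, {"A2", "A3"},
    {"A1", "A2", "A4"}, {"A1", "A3"}, {"A2", "A3"},
    {"A1", "A3"}, {"A1", "A2", "A3", "A5"}, {"A1", "A2", "A3"},
]

# L2 copied from Table 7.
l2 = {
    frozenset(pair)
    for pair in [("A1", "A2"), ("A1", "A3"), ("A1", "A5"),
                 ("A2", "A3"), ("A2", "A4"), ("A2", "A5")]
}
k = 3

# Join: unite two frequent pairs that share (k-2) = 1 item.
joined = {a | b for a, b in combinations(l2, 2) if len(a | b) == k}

# Prune: drop candidates that contain an infrequent (k-1)-subset.
c3 = {
    cand for cand in joined
    if all(frozenset(sub) in l2 for sub in combinations(cand, k - 1))
}

# Count the support of the surviving candidates.
c3_counts = {
    cand: sum(1 for basket in transactions if cand <= basket) for cand in c3
}

for cand in sorted(c3_counts, key=sorted):
    print(", ".join(sorted(cand)), c3_counts[cand])
```

Only the two itemsets listed in Table 8 survive the prune step, each with a support count of 2.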

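Step-4: Similarly, generate candidate set C4 from L3, and the corresponding L4 after filtering. Stop when no more frequent itemsets are found. In this example, the only candidate, {A1, A2, A3, A5}, is pruned because its subset {A1, A3, A5} is not frequent, so the process stops after Step-3.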
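Putting Steps 1 to 4 together, the whole iteration can be sketched as a single function. This is a compact illustration under the same assumptions as the worked example (Table 4 data, minimum support count of 2); the function name `apriori` and its signature are chosen here and do not refer to the paper's program or to any library.

```python
from itertools import combinations


def apriori(baskets, min_support):
    """Return {itemset: support count} for every frequent itemset."""
    baskets = [set(b) for b in baskets]

    def count(itemset):
        return sum(1 for basket in baskets if itemset <= basket)

    # Step-1: frequent single items (L1).
    singles = {frozenset([item]) for basket in baskets for item in basket}
    current = {s: count(s) for s in singles}
    current = {s: n for s, n in current.items() if n >= min_support}

    frequent = {}
    k = 2
    while current:  # stop when a level yields no frequent itemsets
        frequent.update(current)
        previous = set(current)
        # Join: unite two (k-1)-itemsets that share (k-2) items.
        joined = {a | b for a, b in combinations(previous, 2) if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must be frequent.
        candidates = [
            c for c in joined
            if all(frozenset(sub) in previous for sub in combinations(c, k - 1))
        ]
        counts = {c: count(c) for c in candidates}
        current = {c: n for c, n in counts.items() if n >= min_support}
        k += 1
    return frequent


transactions = [
    {"A1", "A2", "A5"}, {"A2", "A4"}, {"A2", "A3"},
    {"A1", "A2", "A4"}, {"A1", "A3"}, {"A2", "A3"},
    {"A1", "A3"}, {"A1", "A2", "A3", "A5"}, {"A1", "A2", "A3"},
]
result = apriori(transactions, min_support=2)
print(len(result), "frequent itemsets")
```

The loop ends at the fourth level because no four-item candidate survives pruning.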

Figures 1 and 2 illustrate the process of Apriori algorithm training. The first flowchart is a typical process that can be utilized for different applications. The second flowchart was utilized in the current investigation, which was meant to develop a system that would recommend different types of coffee to customers to improve their satisfaction. It is expected that the coffee shop owner will implement the proposed system to help increase revenues. The findings can also be used to determine whether predictive analytics is suitable for small businesses.

*Figure 1: Flowchart of Proposed System – Training.*

*Figure 2: Flowchart of Proposed System – Execution.*

Results

Two activities were performed in the exercise to develop a system that can be used to recommend items to customers in coffee shops. Execution-1 involved developing a program that uses data from a CSV file, with items such as I1, I2, …, to train the system to select different objects based on a defined category. Execution-2 involved developing and running a program using CSV data from the coffee shop. The program trains itself with menu objects and then recommends items based on customers’ historical data.

Figures 1 and 2 summarize the flowchart of a typical Apriori algorithm training process and the proposed program for the coffee shop, respectively. The second flowchart was utilized in the analysis, and the steps described in Tables 4 to 8 were employed in developing an Apriori algorithm, as illustrated in Figure 3. Figure 3 is a screenshot of the program developed to train the Apriori algorithm, while Figure 4 is the program employed for the proposed coffee shop. The program was implemented and executed in Python, as illustrated in Figures 3 and 4. During the process, a dataset of frequently bought items was considered. The data was contained in a CSV file, which was then analyzed using the Python programming language. The program trains itself using the data provided and then suggests suitable items to customers based on their preferences.

*Figure 3: Execution-1 (Program to train the Apriori algorithm).*

The screenshot illustrates that the process is iterative and starts with entering the customer data obtained in a CSV file. A cart is then created, and the available items are identified (i1, i2, i3, i4, and i5). The second step defines the items that should be added to the cart based on the category requirement. The process then continues until the required categories are developed. After the program was trained, data obtained from the coffee shop was entered and the performance of the program was assessed.
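The first stage described above, reading the CSV file and identifying the available items, might look like the sketch below. The layout of the study's CSV file is not given in the paper, so this assumes one order per row with one item per cell; the function name `load_transactions` and the sample rows are hypothetical and serve only to make the example runnable.

```python
import csv
import io


def load_transactions(csv_file):
    """Read one order per row and return a list of item sets."""
    baskets = []
    for row in csv.reader(csv_file):
        # Normalize each cell and ignore empty ones.
        basket = {cell.strip().lower() for cell in row if cell.strip()}
        if basket:  # skip blank lines
            baskets.append(basket)
    return baskets


# Hypothetical sample standing in for the real file (assumed layout).
sample = io.StringIO("i1,i2,i5\ni2,i4\n\ni1, i2 ,i4\n")
baskets = load_transactions(sample)

# The set of available items is the union of all baskets.
items = sorted(set().union(*baskets))
print(items)
```

For a file on disk, the same function could be called with `open(path, newline="")` in place of the in-memory sample.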

*Figure 4: Execution-2 – Apriori algorithm to recommend items to customers based on the objects added to their cart.*

The items obtained from the CSV data file include americano, black coffee, cappuccino, dark chocolate, donut, espresso, ginger tea, green tea, latte, and mocha. The program is trained such that it recommends the most suitable items to customers based on their preferences and history. Overall, the Apriori algorithm can be utilized for small businesses, such as a coffee shop.

Discussion and Conclusion

The Apriori algorithm was implemented in Python, and the program was designed to accept data in CSV format. It then trains on the dataset by creating candidate sets and the corresponding itemsets based on the required minimum frequency. Once the program has been trained on the data, it asks users to add items to their order list. It then recommends appropriate items to customers based on their preferences. The algorithm calculates the confidence percentage of each suggestion and discards items whose values fall below the threshold. It was concluded that the Apriori algorithm could be used efficiently to train a program based on historical data and that it can suggest items to customers.

Such a system, when incorporated into small, medium, and large businesses, can make it easier for customers to find items in a large pool. The system can also entice users to add more items to their order list, increasing sales. Overall, the proposed system can be employed by small businesses, such as coffee shops, to improve customer experience and increase sales by providing suggestions to customers based on their preferences and historical data.
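The confidence-based filtering described above can be sketched as follows. The confidence of suggesting an item for a given cart is taken here as the standard ratio support(cart plus item) / support(cart). This simplified version computes the ratio directly from the transactions of Table 4 rather than from mined itemsets; the function name `recommend` and the default threshold of 0.5 are assumptions for illustration, not values from the study.

```python
def recommend(cart, baskets, min_confidence=0.5):
    """Suggest items whose confidence given the cart meets the threshold."""
    cart = frozenset(cart)
    cart_count = sum(1 for basket in baskets if cart <= basket)
    if cart_count == 0:
        return []  # the cart never occurred, so nothing can be inferred

    all_items = {item for basket in baskets for item in basket}
    suggestions = []
    for item in all_items - cart:
        together = sum(1 for basket in baskets if cart | {item} <= basket)
        confidence = together / cart_count
        if confidence >= min_confidence:  # discard low-confidence items
            suggestions.append((item, confidence))
    # Highest confidence first; ties broken alphabetically.
    return sorted(suggestions, key=lambda pair: (-pair[1], pair[0]))


# Transactions copied from Table 4.
transactions = [
    {"A1", "A2", "A5"}, {"A2", "A4"}, {"A2", "A3"},
    {"A1", "A2", "A4"}, {"A1", "A3"}, {"A2", "A3"},
    {"A1", "A3"}, {"A1", "A2", "A3", "A5"}, {"A1", "A2", "A3"},
]

for item, confidence in recommend({"A1", "A5"}, transactions):
    print(f"{item}: {confidence:.0%}")
```

For a cart containing A1 and A5, A2 is suggested with 100% confidence and A3 with 50%, while A4 is discarded because it never appears with that cart.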

Future Work

The program can be trained on each execution, such that it executes multiple runs.
The program can be enhanced by enabling complete retraining and self-updating.
The program should accept data formats other than CSV files, making it easier to integrate with other systems. Collecting data manually can be time-consuming and is prone to errors. Therefore, the system should be able to collect and appraise data automatically.

References

K. K. Mümine. “An Overview: The Impact of Data Mining Applications on Various Sectors”, Tehnički glasnik, Vol.11, no.3, pp.128-132, 2017.
J. Fashaya, and T. Ruankaew. “A Study of Inventory Control Systems by Jamaican SMEs in Retail and Manufacturing/Distribution Industries”, International Journal of Business and Management, Vol.12, no.8, pp.1-5, 2017.
S. Nasır. Customer Relationship Management Strategies in the Digital Era. A volume in the Advances in Marketing, Customer Relationship Management, and E-Services (AMCRMES) Book Series, 2016.
H. Karaxha, B. Ramonsaj, and A. Abazi. “The Influence of Advertisements in Increasing the Sales in Kosovo”, ILIRIA International Review, Vol.6, no.2, pp. 75-84, 2016.
Q. Yang and X. Wu, “10 Challenging Problems in Data Mining Research”, Intl J. Information Technology and Decision Making, Vol. 5, no. 4, pp. 597-604, 2006.
C. Wang, M. Hong, J. Pei, H. Zhou, W. Wang, and B. Shi, “Efficient Pattern-Growth Methods for Frequent Tree Pattern Mining”, Proc. Pacific-Asia Conf. Knowledge Discovery and Data Mining (PAKDD 04), pp. 441-451, 2004.
M.J. Zaki, “Efficiently Mining Frequent Trees in a Forest: Algorithms and Applications”, IEEE Trans. on Knowledge and Data Eng., Vol. 17, no. 8, pp. 1021-1035, 2005.
S. Qaz and L.A. Deshpande, “A Survey Paper on Discovering Patterns from Human Interactions”, International Journal of Latest Trends in Engineering and Technology (IJLTET), Vol.5, no.1, pp. 1-13, 2015.
H. Bathla and K. Kathuria. “Association Rule Mining: Algorithms Used”, Int. J. Comput. Sci. Mob. Comput, Vol.4, no.6, pp. 271-277, 2015.
A. Al-Hamodi, S. Lu, and Y. Al-Salhi. “An Enhanced Frequent Pattern Growth Based on Mapreduce for Mining Association Rules.” International Journal of Data Mining & Knowledge Management Process (IJDKP), Vol.6, no.2, pp.19-28, 2016.