A data stack is a collection of technologies that turn data from one or more raw sources into organized data that can be studied and interpreted. Its main layers, referred to here as stacking methods, are wrangling, storage, and analysis. Extract, Load, Transform (ELT) is a distinct process that emerged with the rise of large data storage providers (“Analytics data stack overview”, no date). In an ELT model, raw data is loaded into storage first and kept separate from the processed data that is later created from it for analysis. This paper outlines common data stacking methods and addresses their limitations and real-life applications.
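A minimal sketch of the ELT pattern follows. It uses Python's built-in sqlite3 module as a stand-in for a large storage provider. The order records, the table names raw_orders and sales_by_title, and the cleaning rules are hypothetical. The raw data is loaded unchanged, and the transformation runs afterwards inside the database. This produces a separate table for analysis while the raw table stays intact.

```python
import sqlite3

# Hypothetical raw records as they might arrive from a source system
# (inconsistent spacing and capitalisation, prices stored as text).
RAW_ORDERS = [
    ("2021-03-01", "  Dune ", "19.99"),
    ("2021-03-01", "dune", "19.99"),
    ("2021-03-02", "Emma", "9.50"),
]


def run_elt(conn: sqlite3.Connection) -> None:
    """Extract -> Load the raw data unchanged -> Transform inside the database."""
    cur = conn.cursor()
    # Load: the raw table keeps the data exactly as extracted (all text).
    cur.execute("CREATE TABLE raw_orders (order_date TEXT, title TEXT, price TEXT)")
    cur.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", RAW_ORDERS)
    # Transform: a separate, cleaned table is derived from the raw one for analysis.
    cur.execute(
        """
        CREATE TABLE sales_by_title AS
        SELECT LOWER(TRIM(title)) AS title,
               COUNT(*) AS copies_sold,
               SUM(CAST(price AS REAL)) AS revenue
        FROM raw_orders
        GROUP BY LOWER(TRIM(title))
        """
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
run_elt(conn)
for row in conn.execute("SELECT * FROM sales_by_title ORDER BY title"):
    print(row)
```

Keeping raw_orders untouched means the transformation can be rewritten and rerun later without extracting the data from the source again.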
There are several methods for gathering data, and even more ways for the resulting data to be incomplete, sloppy, or irrelevant (Jebamalar and Kamalakannan, 2021; Provost and Fawcett, 2013). Data wrangling tools make it easier to handle multiple data sources and to clean recorded data into more readable, consistent formats. The cleaned data must then be saved somewhere so that it can be retrieved later for analysis; this is known as data storage (Hammam et al., 2020; Jafferjee, 2020). The main limitation of storage is that big data analytics requires the organization to have servers capable of collecting and processing large volumes of data. Data analysis methods make use of the stored data by transforming it and identifying dependencies where possible, although the conclusions they support are limited by the type of data collected and by how much can validly be inferred from it.
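The wrangling and analysis steps described above can be sketched in Python as follows. This is an illustration under stated assumptions. The customer records are hypothetical, session_id is treated as irrelevant, and rows with a missing or non-numeric age are dropped rather than repaired. The storage step is omitted for brevity.

```python
from statistics import mean
from typing import Optional

# Hypothetical customer records: some incomplete, some sloppy,
# and one field (session_id) assumed to be irrelevant to the analysis.
RAW_RECORDS = [
    {"customer": " Ana ", "age": "34", "genre": "Sci-Fi", "session_id": "x1"},
    {"customer": "ben", "age": "", "genre": "history", "session_id": "x2"},
    {"customer": "Cara", "age": "twenty", "genre": "SCI-FI ", "session_id": "x3"},
    {"customer": "Dev", "age": "51", "genre": "History", "session_id": "x4"},
]


def parse_age(value: str) -> Optional[int]:
    """Return the age as an int, or None if it is missing or not a number."""
    value = value.strip()
    return int(value) if value.isdigit() else None


def wrangle(records):
    """Drop the irrelevant field, normalise text, and skip incomplete rows."""
    cleaned = []
    for rec in records:
        age = parse_age(rec.get("age", ""))
        if age is None:
            continue  # incomplete or sloppy age: exclude rather than guess
        cleaned.append({
            "customer": rec["customer"].strip().title(),
            "age": age,
            "genre": rec["genre"].strip().lower(),
        })
    return cleaned


def mean_age_by_genre(records):
    """A simple analysis step: average customer age per genre."""
    by_genre = {}
    for rec in records:
        by_genre.setdefault(rec["genre"], []).append(rec["age"])
    return {genre: mean(ages) for genre, ages in by_genre.items()}


clean = wrangle(RAW_RECORDS)
print(clean)
print(mean_age_by_genre(clean))
```

Dropping unusable rows keeps the sketch simple, but it also illustrates the limitation noted above: any conclusion drawn afterwards rests only on the records that survived cleaning.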
The stacking method described above can be an effective approach to managing data. McAfee and Brynjolfsson (2012) gave the example of a bookstore that tracks which items were sold and which remained in stock. Data wrangling can be used to refine the types of information this bookstore collects, such as customers' ages, their interests, and the types of books they purchased. This approach is reasonable because collecting all available data can be costly, and wrangling allows the bookstore to discard data that gives management no useful insight. It also helps target advertisements at customers who are likely to be interested in certain books based on what their peers have purchased, which makes wrangling an important element of data stacking.
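A minimal sketch of the bookstore scenario follows. It assumes hypothetical purchase histories that have already been wrangled into a mapping from customer to purchased titles. The rule used here recommends titles bought by customers who share at least one purchase and ranks them by how many such peers bought them. It is an illustrative choice and is not taken from McAfee and Brynjolfsson (2012).

```python
from collections import Counter

# Hypothetical purchase histories after wrangling: customer -> set of titles.
PURCHASES = {
    "ana": {"Dune", "Foundation"},
    "ben": {"Dune", "Hyperion"},
    "cara": {"Dune", "Hyperion", "Neuromancer"},
    "dev": {"Emma"},
}


def recommend(customer: str, purchases: dict, top_n: int = 3) -> list:
    """Suggest titles bought by peers (customers sharing at least one title)."""
    own = purchases[customer]
    counts = Counter()
    for other, titles in purchases.items():
        if other == customer or not (own & titles):
            continue  # skip the customer themselves and anyone with no overlap
        counts.update(titles - own)  # count only titles the customer lacks
    # Rank by number of peers who bought the title, then alphabetically.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [title for title, _ in ranked[:top_n]]


print(recommend("ana", PURCHASES))
```

For the sample data, recommend("ana", PURCHASES) suggests Hyperion ahead of Neuromancer because two of Ana's peers bought Hyperion and only one bought Neuromancer.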
Reference list
Analytics data stack overview (no date) Web.
Hammam, A., Elmousalami, H. and Hassanien, A. (2020) ‘Stacking deep learning for early COVID-19 vision diagnosis’, Studies in Big Data, pp. 297-307.
Jafferjee, A. (2020) ‘Building a modern analytics stack’, Towards Data Science. Web.
Jebamalar, J. A. and Kamalakannan, T. (2021) ‘Enhanced stacking ensemble model in predictive analytics of environmental sensor data’, 2021 International Conference on Artificial Intelligence and Smart Systems (ICAIS), pp. 482-486.
McAfee, A. and Brynjolfsson, E. (2012) ‘Big data: the management revolution’, Harvard Business Review. Web.
Provost, F. and Fawcett, T. (2013) Data science for business: what you need to know about data mining and data-analytic thinking. 1st edn. Sebastopol, CA: O’Reilly Media.