Background
Social networks have gained mass popularity in recent years. While they help people share information and connect with like-minded people, the data that users share can also be misused, or put to unintended use, through data mining. One of the most widely used professional social networks is LinkedIn (linkedin.com). LinkedIn was launched in 2003 and, as of October 2009, has 50 million users worldwide, with 11 million users in Europe alone (Weiner, 2009). India is the fastest-growing country, with 3 million users and rising (Weiner, 2009). As Jeff Weiner (2009) says, LinkedIn’s mission is “to connect the world’s professionals to make them more productive and successful”. How far LinkedIn has succeeded in this mission is an interesting research question in itself, but I find it more interesting to explore whether data mining is helping LinkedIn achieve its goal or creating further roadblocks. Getting a user to join LinkedIn is one thing; retaining that user is another. Have users found LinkedIn as useful as it claims, and have they continued to use it or moved on? Survival data mining techniques can help answer this question. Survival data mining provides “rapid feedback about the customers and their behaviors, while at the same time providing a solid basis for quantifying customer value and measuring customer loyalty” (Linoff, 2004).
Aims
The project aims to understand LinkedIn users and to improve user retention and user lifetime, and thereby profit margins, using survival data mining techniques. Specifically, it aims:
- to understand the value that such data mining can add.
- to understand users, their behaviors, and preferences.
- to understand what strategies work best for LinkedIn.
- to understand how LinkedIn data can be misused, used against the user, or put to unintended use without permission, and how that can undermine LinkedIn’s goals.
- to understand what steps can and should be taken to reduce the chances of data being misused, thereby improving the chances of user retention.
Methodology
One way to achieve these aims is to understand how users view the mining of their data on LinkedIn. This can be done through interviews or surveys of LinkedIn users, recording their expectations, concerns, and experiences regarding the use of their data.
The second method is to harvest data directly from LinkedIn. This will serve as a proof of concept showing how easy or difficult it is to automatically gather large amounts of data from LinkedIn and draw conclusions about usage trends. Since the research addresses a multitude of questions, a variety of techniques will be used, including cluster analysis, classification and prediction, and statistical analysis. The data will also be used to perform survival data mining to understand how users can be retained.
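As an illustration of the survival analysis step, the following is a minimal sketch of a Kaplan-Meier estimate of user retention. It is not the project's actual pipeline: the function name `kaplan_meier`, the tenure values, and the churn flags are all hypothetical, and the sketch assumes each harvested record can be reduced to an observed tenure in months plus a flag saying whether the user left or was still active when observation ended.

```python
from collections import Counter


def kaplan_meier(durations, churned):
    """Return [(t, S(t))] for each time t at which a churn was observed.

    durations: observed tenure of each user, in months (hypothetical data)
    churned:   True if the user left at that time, False if the user was
               still active when observation ended (right-censored)
    """
    events = Counter(t for t, c in zip(durations, churned) if c)
    exits = Counter(durations)  # every user leaves the risk set at their duration
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d:
            # Multiply by the fraction of at-risk users who did not churn at t
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Hypothetical tenures for six users; illustrative values only
durations = [2, 3, 3, 5, 8, 8]
churned = [True, True, False, True, False, False]
curve = kaplan_meier(durations, churned)

for month, s in curve:
    print(f"month {month}: estimated retention {s:.3f}")
```

Users who are still active at the end of the observation window are treated as censored rather than dropped, which is the main reason to prefer a survival estimate over a simple churn percentage.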
Research Questions
The research seeks to find the answers to the following questions:
- How can survival data mining provide insights into users?
- What are the different types of LinkedIn users?
- How can survival data mining be used to improve user retention?
- How can data mining be used on LinkedIn to provide more value-added features to its users?
- How can data mining reveal usage trends useful for conducting market research?
- Can data mining on LinkedIn help identify criminal associations and terrorist activities, as has been suggested as a possibility for other social networks?
- How can data mining lead to misuse of the information provided, or to its unintended use without the user’s permission?
- What steps can LinkedIn users take to protect their privacy? What are the tradeoffs in that case?
Review of the literature
LinkedIn, as a business-oriented social networking site, has caught the attention of many researchers (e.g. O’Murchu et al, 2004; Churchill et al, 2005); however, it is mostly used as an example or for comparison with other social networking sites. There has been little research on its usefulness, privacy, and other data mining aspects, although there are interesting blogs and news articles on LinkedIn. In a related field, similar research has been conducted on social networking sites aimed at friendly rather than professional contacts, including Facebook and MySpace (e.g. Jones et al, 2005). Data mining on social networks in general has also been well researched, with security and privacy receiving the most attention. For example, Clifton et al (1996) discuss the security and privacy implications for social networking and provide several methods that could be used to prevent data mining, such as fuzzing the data, eliminating unnecessary groupings, audits, and augmenting the data. Seifert (2007) and Pant et al (2009) discuss data mining on social networks as a means to detect fraud and terrorist activities.
Survival data mining approaches have been researched in medicine and other areas but have not been applied to social networks like LinkedIn. Linoff (2004) explains how survival data mining can be applied to a subscription-based business model.
Expected Outcomes, Significance, or Rationale
LinkedIn is growing as a professional networking site, and the data stored there has considerable potential for positive research, benefiting both the user and the business. However, understanding how that data can be misused is more important, since misuse could harm the professional image of the site’s users. The research will look at ways in which data mining could benefit both the users of the site and the business itself. It will also look at how data mining could prove detrimental to users and what they can do to prevent such incidents from happening.
Timetable
List of References
Ata, N., Özkök, E. & Karabey, U. (2005) “Survival Data Mining: An Application to Credit Card Holders”. Journal of Engineering and Natural Sciences, 26(1), 33-42.
Bishop, K., Draskovich, J., Hottenroth, A., Lee, B. & Pesavento, S. (2005) Business Uses of Data Mining and Data Warehousing. Web.
Breiger, R. L., Carley, K. M. & Pattison, P. (2003) “Dynamic Social Network Modeling and Analysis: workshop summary and papers”. National Academies Press.
Churchill, E. F. & Halverson, C. A. (2005) “Social Networks and Social Networking”. IEEE Computer Society, 2005, 14-19.
Clifton, C. & Marks, D. (1996) “Security and Privacy Implications of Data Mining”. Proceedings of the 1996 SIGMOD Workshop on Data Mining and Knowledge Discovery.
Collica, R. (2004) “Data Mining Galore: For Business Applications”. Web.
Han, J. & Kamber, M. (2006) “Data mining: concepts and techniques”. Morgan Kaufmann (Elsevier Inc.).
Jensen, D. & Neville, J. (n.d.) Data Mining in Social Networks. Web.
Jones, H. & Soltren, J. H. (2005) Facebook: Threats to Privacy. Web.
Kleinberg, J. M. (2007) “Challenges in mining social network data: processes, privacy, and paradoxes”. Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining.
Kusiak, A., Dixon, B. & Shah, S. (2006) “Predicting survival time for kidney dialysis patients: a data mining approach”. Computers in Biology and Medicine, 35(4), 311-327.
Linoff, G. S. (2004) “Survival Data Mining”. Web.
Linoff, G. S. (2004) Survival Data Mining for Customer Insight. Web.
Mohammed, Z. & Kotze, D. (2005) “Survival data mining in the telecommunications industries: usefulness and complications”. Data Mining XI: Data Mining, Text Mining and Their Business Applications, 505-512.
Olson, D. L. & Delen, D. (2008) “Advanced Data Mining Techniques”. Springer.
O’Murchu, I., Breslin, J. G. & Decker, S. (2004) “Online Social and Business Networking Communities”. Technical Report.
Pant, D. & Sharma, M. K. (2009) “Web Mining and Social Network Analysis in Cyber war, to warn about terrorist attacks”. Web.
Potts, W. (2006) Survival Data Mining. Web.
Seifert, J. W. (2009) Data Mining and Homeland Security: An Overview. Web.
Wang, J. (2009) “Encyclopedia of Data Warehousing and Mining”. Information Science Reference.
Weiner, J. (2009) LinkedIn: 50 million professionals worldwide. Web.