Abstract
The Twitter application programming interface (API) is a publicly available extension of Twitter that allows programmers to incorporate various aspects of the social network into their applications, websites, and software. This literature review examines research about the API and its application to a range of problems.
The highlighted examples include the use of the API to gather data for administrative, medical, and scientific purposes. The review also discusses the limitations that come with relying on the Twitter API, the variations among the different APIs, and the possible motivations for using them.
The most common use is the collection of geolocation data to link content with geographical location in order to illuminate a given issue. The review also describes the architecture of Twitter as a system that improves with additional usage and uses this view as a basis for understanding how seemingly independent uses of the APIs contribute to the overall usefulness of the social network.
Literature review
Twitter is an excellent social media tool for individual and group use, from the perspective of both the information consumer and the creator. However, beyond its basic architecture as a social network, it is also a very credible tool for gathering crowd-sourced information.
Twitter can offer location data, such as Global Positioning System (GPS) coordinates. Such information is useful for studying and monitoring online health information (Burton, Tanner, Giraud-Carrier, West, & Barnes, 2012). The use of this location information, however, is not as widespread as the use of Twitter itself.
Part of the reason for the limited usage is the inadequate documentation and research on the subjects of geolocation using Twitter and the mining of online health information using Twitter (Burton et al., 2012). Twitter works well for information researchers because it is a collection of informational breadcrumbs.
People’s interactions online leave tiny records of their daily experiences (Burton et al., 2012). With traditional behavioral assessments, one would have to rely on self-report or observation as research methods. With the proliferation of mobile communication devices, however, such data can be gathered far more easily with a range of tools.
With social media apps that link to social media networks, it is both easy and possible to collect data in real time and study behaviors and health outcomes (Burton et al., 2012). Research by Burton et al. (2012) used the Twitter Streaming application programming interface (API) to observe tweets for two weeks.
The researchers found that the parsed state data for the United States, collected using the Twitter API, matched GPS-derived state data 87.69% of the time. This shows the potential of the tool as a viable data collection method for seeking behavioral and self-reported data (Burton et al., 2012).
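The state-matching check described by Burton et al. (2012) can be illustrated with a small sketch that parses free-text profile locations into US states and compares them against GPS-derived states. The state table, field names, and parsing rule below are illustrative assumptions, not the authors' actual method:

```python
import re

# Hypothetical mapping; a real study would cover all states and abbreviations.
STATE_ABBREVIATIONS = {"UT": "Utah", "CA": "California", "NY": "New York"}

def parse_state(location_field):
    """Extract a US state from a free-text profile location, if possible."""
    for abbrev, name in STATE_ABBREVIATIONS.items():
        if re.search(r"\b" + abbrev + r"\b", location_field) or name.lower() in location_field.lower():
            return name
    return None

# Sample records pairing a profile location string with a GPS-derived state.
tweets = [
    {"location": "Provo, UT", "gps_state": "Utah"},
    {"location": "Sunny California", "gps_state": "California"},
    {"location": "somewhere on earth", "gps_state": "New York"},
]

matches = sum(1 for t in tweets if parse_state(t["location"]) == t["gps_state"])
agreement = matches / len(tweets)
print(f"Parsed state agrees with GPS state {agreement:.0%} of the time")
```

The 87.69% figure reported by the authors is, in effect, this agreement rate computed over two weeks of real streamed tweets rather than a toy sample.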
According to Fitz-Gerald (2010), creative uses of the platform’s API include science, religion, marketing, social change, money matters, and education, as well as sports and entertainment. With proper adjustment, the API can suit any area of interest in life.
Many uses of the API are indirect, as users interact mostly with already-developed applications that employ the Twitter protocol. Thus, research into the usage of the API can equally focus on the applications built for search, publishing, information streams, and statistics based on the social network.
Familiarity with the elements of web programming is important for anyone looking to explore the Twitter API as a tool; otherwise, one has to rely on applications already developed for end users. Key building blocks include CSS, PHP, and MySQL.
Fitz-Gerald (2010) explains the process of setting up and managing the API and the automation that completes the endeavor. The technical aspects of the tool make it less friendly for ordinary use and explain why it is not very popular despite a large amount of publicity.
Benhardus and Kalita (2013) researched Twitter’s ability to show trends on social matters around the globe. They found it a useful tool for examining various aspects of natural language processing (NLP) and machine learning. On Twitter, a topic is usually persistent, user-initiated collective chatter.
Topics of discussion or conversation are usually responses to events. A sudden burst of high-intensity discussion becomes a spike. Trending topics, on the other hand, are combinations of spikes and chatter, but they are mainly characterized by chatter.
This is where many people are speaking about the same thing, but not necessarily reacting to an event. Some challenges for researchers would be to first define the trending topic and to determine what would constitute a certain success rate for the methodology used. Sometimes, when using Twitter’s data, it is useful to have different methodologies for the sake of cross-referencing of data (Benhardus & Kalita, 2013).
The research by Benhardus and Kalita (2013) examined the terms used on Twitter and their potential to match trending topics. The aim was to find the methodologies most useful for predicting trending topics. The methods tested included normalized term frequency analysis.
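Normalized term frequency analysis of this kind can be sketched as follows: term counts in a time window are divided by the window's total term count, and terms whose normalized frequency jumps relative to a baseline window are flagged as trending. The window data, threshold, and floor value here are illustrative assumptions rather than the exact parameters Benhardus and Kalita used:

```python
from collections import Counter

def normalized_tf(documents):
    """Term counts divided by the total number of terms in the window."""
    counts = Counter(word for doc in documents for word in doc.lower().split())
    total = sum(counts.values()) or 1
    return {term: n / total for term, n in counts.items()}

def trending_terms(current_window, baseline_window, ratio_threshold=3.0, floor=0.05):
    """Flag terms whose normalized frequency jumps relative to a baseline window."""
    current = normalized_tf(current_window)
    baseline = normalized_tf(baseline_window)
    return sorted(
        term for term, freq in current.items()
        if freq / max(baseline.get(term, 0.0), floor) >= ratio_threshold
    )

# Toy tweet windows: a quiet baseline followed by an event-driven spike.
baseline = ["nice weather today", "lunch plans anyone", "weather is mild"]
current = ["earthquake downtown", "earthquake felt here", "big earthquake now"]
print(trending_terms(current, baseline))  # ['earthquake']
```

The `floor` parameter keeps rare terms from producing huge ratios by accident; a real system would also strip stop words and tune the threshold against labeled trending topics.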
In their conclusion, the researchers explained that future research and applications can look into the relationship among the method, the term, and the trending topic on Twitter and in other sources, such as search engine trends. Mills (2011) builds on the theory of the concretization of technological objects and applies the advanced theory to the area of software studies.
In discussing examples of applications, the researcher looks at financial markets and Twitter APIs. According to the original theory examined, joining two technologies to work as a single piece of equipment or system creates a series of problems that need resolving.
Resolving these problems constitutes concretization, the sealing of the visible differences between the two technologies. The theory also insists that concretization takes place independently of economic and social concerns and cannot be reduced to a mere anterior scientific principle.
The best systems are the ones that get better when more people use them, because usage increases the resolution of the existing problems. Networked systems or networked features in a system are examples examined by Mills (2011). Twitter is an example of an open invented technical individual (ITI), where the creation of a sub-system results in a milieu that needs management and maintenance.
Twitter users operate in a global sphere, but they remain hidden in their limited focus on specific topics and may not be aware of what is going on elsewhere in the platform. Their awareness emerges when they take part in regional or global events through Twitter, such as the FIFA World Cup or the Arab Spring. The streams of tweets are individualized and they resemble separate life forms.
Users edit stream membership to modulate the amount and character of the information flowing into their personal Twitter milieu (Mills, 2011). Access to Twitter streams through different software further individualizes user experiences. Here the Twitter API plays an important role in the development of third-party software.
The three components useful for development are the REST API, the Search API, and the Streaming API, which can be used individually or collectively. The REST API handles sending and receiving tweets and following or unfollowing users; software built on it can mimic core Twitter functions.
The Search API allows access to trending data, while the Streaming API allows software to access and publish content from the dynamic milieu of tweets available at the time on the social network (Mills, 2011). Although the applications of Twitter are numerous, the basic construction and identification of tweets make the maintenance and management of the Twitter milieu possible.
In all cases, there will be an id, text, source, screen name, location, and followers count for every tweet or profile (Mills, 2011). Field and O'Brien (2010) explain how search tools incorporating geographical information are useful for location-based software. If the subject examined is the interaction of people around specific topics, then the Twitter Search API comes in handy.
Its incorporation into the ArcGIS flex viewer software, for example, allows users to search for tweets in a limited geographic area. Unfortunately, the geolocation option on Twitter is not mandatory for users. Thus, for researchers, the collection of location data, or the limiting of a data range by location, can be hampered by this optionality (Field & O'Brien, 2010).
The combination of the Search API with cartography and web design allows practitioners to build websites presenting visually rich maps that show common threads, connote activity emanating from a location-based event, and relate to discussions about the event taking place outside its geographic location (Field & O'Brien, 2010).
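Geographically limited filtering of the kind Field and O'Brien describe can be approximated locally with a bounding-box check over geotagged tweets. The box coordinates, field names, and sample records below are illustrative assumptions; they mirror the common fields Mills (2011) lists, with coordinates optional, as they are on Twitter:

```python
# Hypothetical bounding box for a study area (min_lat, min_lon, max_lat, max_lon).
LONDON_BBOX = (51.28, -0.52, 51.70, 0.33)

def in_bbox(coords, bbox):
    """True if a (lat, lon) pair falls inside the bounding box."""
    lat, lon = coords
    min_lat, min_lon, max_lat, max_lon = bbox
    return min_lat <= lat <= max_lat and min_lon <= lon <= max_lon

# Sample tweet records; geotagging is optional, so coordinates may be None.
tweets = [
    {"id": 1, "screen_name": "a", "text": "stuck on the tube", "coordinates": (51.50, -0.12)},
    {"id": 2, "screen_name": "b", "text": "sunny in Leeds", "coordinates": (53.80, -1.55)},
    {"id": 3, "screen_name": "c", "text": "no geotag here", "coordinates": None},
]

local = [t for t in tweets if t["coordinates"] and in_bbox(t["coordinates"], LONDON_BBOX)]
print([t["id"] for t in local])  # [1] — only the geotagged tweet inside the box
```

The `None` coordinates in the third record illustrate the optionality problem: ungeotagged tweets simply drop out of location-bounded collections, biasing the result toward users who opt in.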
In fact, the usage of the Twitter API is prevalent in politics and administration. The social network and the participation of many users make it a good place for authorities to find information that can help in policy formulation, law enforcement, and voting. In the UK, the London riots prompted police officials to use Twitter to get moment-to-moment updates and to reassure the public (van de Velde, Meijer, & Homburg, 2014).
Examining trends or streams on Twitter must be done with caution. Not all tweets appearing in a person’s timeline are actually read by all followers, and there is currently no way to tell what a person reads and does not read.
Thus, much of the collected data rests on assumptions. However, there is a high chance of people reading tweets about a trending topic, because the subject is a matter of discussion for more than one Twitter account on a user’s timeline (van de Velde, Meijer, & Homburg, 2014).
Retweets can show that a user read a message, thereby serving as the most credible form of acknowledging receipt of a tweet. However, it is important to separate retweets by the original messenger from those by readers. The original messenger can retweet the original message when it comes back as a reply, meaning it carries an additional component attached by the replying user.
Thus, it is possible to tell why some tweets are retweeted many times and others are not. Social parameters such as the authority of the posting account and the profiles mentioned in the tweet can play a part. User-specific characteristics such as audience size, time of tweeting, and topics discussed, as well as the elements included and the provisions for interactivity, contribute to the overall discussion level that a tweet is able to garner (van de Velde, Meijer, & Homburg, 2014).
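Separating retweets by the original messenger from those by readers reduces to comparing the retweeting account with the original author. The record layout and sample accounts below are illustrative assumptions, not drawn from the cited study:

```python
def split_retweets(retweets):
    """Separate self-retweets (the original author re-amplifying) from reader retweets."""
    self_rts = [r for r in retweets if r["retweeter"] == r["original_author"]]
    reader_rts = [r for r in retweets if r["retweeter"] != r["original_author"]]
    return self_rts, reader_rts

# Toy retweet records for one police update: one self-retweet, two reader retweets.
retweets = [
    {"retweeter": "met_police", "original_author": "met_police", "text": "RT: road closed"},
    {"retweeter": "resident_1", "original_author": "met_police", "text": "RT: road closed"},
    {"retweeter": "resident_2", "original_author": "met_police", "text": "RT: road closed"},
]

self_rts, reader_rts = split_retweets(retweets)
print(len(self_rts), len(reader_rts))  # 1 2
```

Only the reader retweets serve as evidence of reach, which is why a diffusion study would count the two groups separately.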
Researchers are increasingly turning to Twitter to examine various social aspects and their relationships with other non-social parameters. For example, researchers can study the effect of weather on people’s moods by analyzing the emotions expressed on Twitter. The Streaming API offers free access to roughly 1% of all tweets (Morstatter, Pfeffer, Liu, & Carley, 2013).
Researchers only need to supply the right parameters to get the data required. Unfortunately, when the volume of tweets matching a query exceeds 1% of all tweets, the response returned to the researcher is a sample. There is no way to know the sampling criteria other than to contact Twitter; presently, the sampling methodology remains a secret (Morstatter et al., 2013).
The problem that emerges when relying on the Streaming API for data collection is the bias present in the sampling. Researchers need to undertake additional methodological steps to vet their data sets against the Firehose, the complete stream of public tweets (Morstatter et al., 2013).
Unfortunately, Firehose data is costly. As an alternative, researchers can use the Sample API to cross-check the data sets obtained from the Streaming API, since the Sample API presents a representative picture of activity on Twitter. By comparing matching time periods, it is possible to detect bias in the Streaming API data and eliminate it before processing the data sets for analysis (Morstatter et al., 2013).
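A cross-check of this kind can be sketched by comparing hashtag frequency distributions between a Streaming API collection and a Sample API baseline covering the same time window. The gap measure and toy data below are illustrative assumptions, not the authors' published methodology:

```python
from collections import Counter

def hashtag_distribution(tweets):
    """Relative frequency of each hashtag within a collection."""
    counts = Counter(tag for t in tweets for tag in t["hashtags"])
    total = sum(counts.values()) or 1
    return {tag: n / total for tag, n in counts.items()}

def max_frequency_gap(streamed, sampled):
    """Largest absolute gap in hashtag share between the two collections."""
    p, q = hashtag_distribution(streamed), hashtag_distribution(sampled)
    return max(abs(p.get(tag, 0) - q.get(tag, 0)) for tag in set(p) | set(q))

# Toy collections standing in for a filtered Streaming API pull and a
# Sample API baseline over the same time window.
streamed = [{"hashtags": ["riot"]}, {"hashtags": ["riot"]}, {"hashtags": ["news"]}]
sampled = [{"hashtags": ["riot"]}, {"hashtags": ["news"]},
           {"hashtags": ["news"]}, {"hashtags": ["sport"]}]

print(f"max per-hashtag gap: {max_frequency_gap(streamed, sampled):.2f}")
```

A large gap for a hashtag suggests the streamed collection over- or under-represents it relative to the unbiased sample, flagging the window for correction or exclusion before analysis.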
Twitter is not the first web service that provides an API that allows users to mine data. However, it is among the most popular ones. Others include Google and Facebook APIs. The APIs allow users to integrate various features into their websites (Aboukhalil, 2013).
Now that social networks like Twitter are common as mobile device applications, they have become indispensable to social researchers. They are useful for information gathering, and their exploitation is subject to the capabilities of the software used to mine the data (Oussalah, Bhat, Challis, & Schnier, 2013).
Demand for technology to collect and mine Twitter content is increasing as more users turn to the platform to create and share content. The challenges of designing software that can fulfill the public's growing needs relate mainly to the semantic aspect of information on Twitter.
In addition, any software seeking to build its functionality on top of Twitter is restricted to the available Twitter APIs for generating information and reports. Dependency on the available APIs is thus itself a limitation (Oussalah et al., 2013).
Researchers have suggested several software architectures describing the best way to build on the capabilities of the Twitter API while limiting exposure to its shortcomings. For example, there are suggestions to use a product-class software framework or an infrastructure component to produce software that allows users to search for and link to Twitter friends and followers (Oussalah et al., 2013).
However, building such software requires developers to have the right knowledge for handling the basic components of the Twitter API appropriately. Just as end users are limited in their use of the APIs by a lack of programming knowledge, developers must possess the right techniques and understand design.
Moreover, they need an adequate comprehension of end-user intentions before embarking on the development of a software solution (Oussalah et al., 2013). An example comes from research by Ekins, Clark, and Williams (2012), which reviewed a chemistry mobile app for collaboration created by the Open Drug Discovery Teams (ODDT) project.
In the past, most collaboration tools for medical discovery hinged on the desktop computer, and such computers have mostly been restricted to laboratory facilities and special research units. However, Ekins, Clark, and Williams (2012) contend that mobile apps are emerging in general practice and in drug discovery initiatives.
Cloud computing and its provision of software as a service have enabled researchers and scientists to access software that requires powerful computing resources. Instead of powerful desktop computers, they can use mobile phones that provide adequate access to the cloud interface, along with useful input options for manipulating data and issuing the commands that process it on the remote computer.
With so many online tools allowing scientists to store research data, there are efforts to coordinate the stored information and improve online collaboration. Joint projects that free up stored data and make it available to inform future research and current practice are helping.
This is where social media platforms become relevant as tools for sharing data. Scientists can describe their methods and results in real time to a larger audience and interact with other scientists, just as they would in a laboratory environment (Ekins, Clark, & Williams, 2012). In their research, Ekins, Clark, and Williams (2012) looked at how the ODDT project used Twitter to harvest information.
As a primary source of content, the project relied on Twitter's APIs to poll and assimilate data. The researchers examined ODDT topics and content and then augmented their program to recognize emerging data sources and information streams deemed relevant to the research.
The project is ongoing, and the researchers explain that it can presently recognize molecular structures, reactions, and data sets presented on the social network (Nagar et al., 2014). The research ushers in a new way for scientists to observe and work around the growing menace of drug-sensitive and drug-resistant pathogens.
The problem for the scientific community has been the lack of data sharing publicly on a global scale. However, real international traction is achievable with the promise of the Twitter API and projects like ODDT (Marcus et al., 2011).
Conclusion
Actual usage of the Twitter API varies depending on the desired results. A common feature is that applications will only function within the provisions of the API. Much also depends on the ingenuity of system and application developers in devising useful applications of their software.
Much of the research and scientific community, as well as ordinary users of curated information and reports generated through the Twitter APIs, need additional interpretation aids, which third-party software mostly provides. It is important to note that the use of a single mobile app like ODDT has yet to make an impact on the overall neglect of rare disease research.
Thus, it will take more research and, possibly, more applications that collect publicly shared content to come up with lasting changes that influence the medical research and drug discovery community.
References
Aboukhalil, R. (2013). Using the Twitter API to mine the Twitterverse. XRDS: Crossroads, The ACM Magazine for Students, 19(4), 52-55.
Benhardus, J., & Kalita, J. (2013). Streaming trend detection in Twitter. International Journal of Web Based Communities, 9(1), 122-139.
Burton, S. H., Tanner, K. W., Giraud-Carrier, C. G., West, J. H., & Barnes, M. D. (2012). “Right time, right place” health communication on Twitter: Value and accuracy of location information. Journal of Medical Internet Research, 14(6), e156.
Ekins, S., Clark, A. M., & Williams, A. J. (2012). Open drug discovery teams: a chemistry mobile app for collaboration. Molecular Informatics, 31(8), 585-597.
Field, K., & O'Brien, J. (2010). Cartoblography: Experiments in using and organising the spatial context of micro-blogging. Transactions in GIS, 14(Suppl 1), 5-23.
Fitz-Gerald, S. (2010). Book review of: Twitter API: up and running by Kevin Makice. International Journal of Information Management, 30(3), 283-284.
Marcus, A., Bernstein, M. S., Badar, O., Karger, D. R., Madden, S., & Miller, R. C. (2011). Processing and visualizing data in tweets. SIGMOD Record, 40(4), 21-27.
Mills, S. (2011). FCJ-127 Concrete Software: Simondon’s mechanology and the techno-social. The Fibreculture Journal, 127(18). Web.
Morstatter, F., Pfeffer, J., Liu, H., & Carley, K. M. (2013). Is the sample good enough? Comparing data from Twitter’s streaming API with Twitter’s firehose. Association for the Advancement of Artificial Intelligence. Web.
Nagar, R., Yuan, Q., Freifeld, C. C., Santillana, M., Nojima, A., Chunara, R., & Brownstein, J. S. (2014). A case study of the New York City 2012-2013 influenza season with daily geocoded Twitter data from temporal and spatiotemporal perspectives. Journal of Medical Internet Research, 16(10), e236.
Oussalah, M., Bhat, F., Challis, K., & Schnier, T. (2013). A software architecture for Twitter collection, search and geolocation services. Knowledge-Based Systems, 37, 105-120.
van de Velde, B., Meijer, A., & Homburg, V. (2014). Police message diffusion on Twitter: Analysing the reach of social media communications. Behaviour & Information Technology, 1-13.