Introduction
Despite the immense volume of internet traffic generated every day by people around the world, a very high percentage of web requests are for the same pages. Multiple users requesting identical content create redundant traffic, and as a result the WAN (Wide Area Network) infrastructure repeatedly carries identical content.
Caching in computer networking can do more than temporarily store web pages (Kurose, 2001). This paper therefore examines the advantages of network caching. These include reduced WAN bandwidth use, which saves costs, and improved end-user productivity. Other benefits include secure access control and monitoring, together with logging of cache operations. Existing internet caching solutions are then discussed in order to show clearly the benefits of localizing traffic patterns. Because Cisco is a major provider of networking solutions, its protocols, some of which are vendor-specific, are used as the main point of reference and example in discussing internet caching.
Benefits of Enabling Network Caching
One of the primary benefits of caching, especially of web pages, is, as stated earlier, a reduction in WAN bandwidth use and hence in cost (Ravi, 1994). The Internet Service Provider (ISP) stands to gain the most and is therefore the natural party to implement it. Cache engines can be placed strategically in the ISP's network, lowering the demand for bandwidth, especially on the ISP's backbone. Instead of every web request travelling a long distance to an overloaded web server, cache engines placed strategically on the WAN serve requests locally from disk. In enterprise networks, the dramatic reduction in bandwidth use that web caching brings means that a lower-bandwidth WAN link can serve the same user base. Alternatively, the organization can add more services to use the bandwidth freed on the same WAN link.
The second benefit of caching in networking is increased end-user productivity. Caching can make downloads up to three times faster over the same WAN link. This dramatic improvement in average response time is visible to end users.
Caching also provides secure access control and monitoring, especially through URL (Uniform Resource Locator) filtering, which gives network administrators a simple and secure way to enforce a site access policy. In addition, all operations can easily be logged, so that information such as the number of requests the cache receives per second and the percentage of URLs served from the cache is readily available.
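The two logging figures mentioned above, requests per second and the percentage of requests served from the cache, can be derived from a simple access log. The following is a minimal sketch under stated assumptions: the LogEntry record, its field names, and the summarize function are hypothetical and do not correspond to any particular cache product's log format. The bytes served from the cache are used as a rough estimate of the WAN traffic avoided.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class LogEntry:
    """One cache access-log record (hypothetical format for this sketch)."""
    timestamp: float          # seconds since the start of the log
    url: str
    size_bytes: int
    served_from_cache: bool   # True for a cache hit, False for a miss


def summarize(entries: List[LogEntry]) -> Dict[str, float]:
    """Compute request rate, hit percentage, and bytes kept off the WAN."""
    if not entries:
        return {"requests_per_second": 0.0, "hit_percent": 0.0,
                "wan_bytes_saved": 0.0}
    total = len(entries)
    hits = [e for e in entries if e.served_from_cache]
    span = (max(e.timestamp for e in entries)
            - min(e.timestamp for e in entries))
    # Avoid dividing by zero when all entries share one timestamp.
    rate = total / span if span > 0 else float(total)
    return {
        "requests_per_second": rate,
        "hit_percent": 100.0 * len(hits) / total,
        # Every hit is a response that did not have to cross the WAN.
        "wan_bytes_saved": float(sum(e.size_bytes for e in hits)),
    }
```

A higher hit percentage means that more of the requested bytes never leave the local network, which is where the bandwidth saving discussed in this section comes from.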
Network-Integrated Caches
According to many networking professionals, one fundamental step must be taken before cache engines are run on a network: ensuring that the existing network supports traffic localization. This can be achieved by enabling content-based routing technology at the system level, with specific parameters set to optimize traffic (Rabinovich and Spatschak, 2001). As stated earlier, one example of such content-based routing technology is provided by Cisco. Known as WCCP (Web Cache Communication Protocol), it laid the network foundations needed before network-integrated cache engines could be introduced.
Network-integrated caches, which pair dedicated hardware with software, tend to share several properties. They are managed like networking equipment, which reduces operating costs. They are inserted into the network transparently, which reduces deployment and operating costs and improves content availability. They are also designed like high-density networking hardware, so they integrate better physically as extensions of the network and minimize the cost of leasing rack space.
Existing Caching Solutions
Three caching solutions are common in the market today: standalone caches, browser-based caches, and proxy servers.
Proxy Server
A proxy server is an intermediary for client requests for server resources. Normally, a client connects to the proxy to request a service such as a web page, a file, or simply a connection. The request is subject to the proxy's traffic filtering rules, which may, for example, be based on IP (Internet Protocol) address. If the request passes the applicable filters, the proxy connects to the server hosting the resource the client needs. Of particular relevance to caching, the proxy may also serve a client's request without ever contacting the hosting server, by using a cached response to an earlier identical request from the same or a different client.
Proxy caching also helps keep client machines anonymous, which helps maintain security. Access speed improves because pages from web servers are cached. The proxy also supports several ways of managing network services: sites can be blocked, internet use can be audited, and content can be scanned for malware before it is delivered. Circumventing regional restrictions is another use, among other advantages of a proxy server.
Simply put, a proxy server is application software running on a general-purpose OS (Operating System) and general-purpose hardware. It sits between client applications, such as a web browser, and their respective servers, such as a web server, and receives every request from the clients. If it can fulfil a request itself, it does so. Otherwise, it forwards the request to the server as if the request were its own.
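The request handling described in this section (filter the request, answer from the cache if possible, otherwise fetch from the origin server on the client's behalf) can be sketched as follows. This is an illustrative sketch only: the CachingProxy class, the fetch_origin callable, and the blocked_networks parameter are hypothetical names, and a real proxy would also handle expiry and revalidation of cached pages, which are omitted here.

```python
import ipaddress
from typing import Callable, Dict, List


class CachingProxy:
    """Toy caching proxy: IP-based filter, cache lookup, origin fetch."""

    def __init__(self, fetch_origin: Callable[[str], bytes],
                 blocked_networks: List[str]) -> None:
        # fetch_origin stands in for the real request to the web server.
        self._fetch_origin = fetch_origin
        self._blocked = [ipaddress.ip_network(n) for n in blocked_networks]
        self._cache: Dict[str, bytes] = {}
        self.origin_fetches = 0  # how often the origin server was contacted

    def handle(self, client_ip: str, url: str) -> bytes:
        # Step 1: apply the IP-based filtering rule.
        addr = ipaddress.ip_address(client_ip)
        if any(addr in net for net in self._blocked):
            raise PermissionError(f"client {client_ip} is not allowed")
        # Step 2: serve from the cache if any client asked for this before.
        if url in self._cache:
            return self._cache[url]
        # Step 3: otherwise make the request as if it were the proxy's own.
        body = self._fetch_origin(url)
        self.origin_fetches += 1
        self._cache[url] = body
        return body
```

In this sketch the cache is shared by all clients of the proxy, so a second client requesting the same URL is served without any contact with the origin server.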
However, one thing stands out from the way a proxy server works: it was never really designed to provide caching services. Proxy servers therefore fail to scale properly, especially under heavy network load. In addition, because the proxy sits on the path of traffic from every user in the network, two problems can arise at any moment. First, all requests take longer to complete, since the proxy must inspect every packet from every client. Second, should the proxy fail for any reason, everyone connected through it loses connectivity.
A further complication is that each client's browser must be configured to use the proxy. This does not scale and, more importantly, is costly, particularly for ISPs and enterprises with very large numbers of clients. In addition, proxies arranged hierarchically form overlay networks, which runs counter to the strategy of converging disparate networks into one.
Standalone Cache
Standalone caches were created specifically to mitigate the shortcomings of proxy servers. To this end, they focus on caching: they enhance the caching software and eliminate other proxy server functions that slow operation.
Unfortunately, although they are a step in the right direction, standalone caches are not integrated into the network. They are therefore not well suited to wide-scale deployment, because the cost of ownership rises with every increase in network scale.
Browser-Based Caching
Web browsers maintain an individual cache. Website content such as HTML (Hypertext Markup Language) pages and images is saved locally on the client's hard disk. The user of the machine can also configure how much disk space may be used for caching (Luotonen, 1997).
The working of this cache is simple, and it matters most when a site is accessed at least twice from the same client machine. Normally, the contents of a website are saved automatically the first time the site is viewed. They are saved as files in a subdirectory on the machine's hard drive. When a user of the same computer next tries to access files from that website, the browser retrieves them from the cache instead, so the request is served without any access to the network. The main difference users notice is that the page appears faster, especially when it contains very large graphics. Buttons, images, and icons therefore appear more quickly on the second visit than when the page was first accessed.
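The behaviour just described, together with the user-configured space limit, can be illustrated with a small size-bounded cache. The sketch below is an assumption-laden illustration rather than how any particular browser works: the BrowserCache class and its fetch parameter are hypothetical, content is held in memory rather than in files on disk, and least-recently-used eviction is one plausible policy chosen for the example.

```python
from collections import OrderedDict
from typing import Callable


class BrowserCache:
    """Per-machine cache with a configurable space limit (LRU eviction)."""

    def __init__(self, max_bytes: int, fetch: Callable[[str], bytes]) -> None:
        self.max_bytes = max_bytes   # space the user allows for caching
        self._fetch = fetch          # stands in for a real network download
        self._items: "OrderedDict[str, bytes]" = OrderedDict()
        self._used = 0
        self.network_requests = 0

    def get(self, url: str) -> bytes:
        if url in self._items:
            # Second and later visits: served locally, no network access.
            self._items.move_to_end(url)
            return self._items[url]
        # First visit: download, then save for next time.
        body = self._fetch(url)
        self.network_requests += 1
        self._store(url, body)
        return body

    def _store(self, url: str, body: bytes) -> None:
        if len(body) > self.max_bytes:
            return  # larger than the whole cache, so it is not kept
        # Evict the least recently used items until the new one fits.
        while self._used + len(body) > self.max_bytes:
            _, evicted = self._items.popitem(last=False)
            self._used -= len(evicted)
        self._items[url] = body
        self._used += len(body)
```

Note that each machine would hold its own instance of such a cache, which is why a page cached on one computer does nothing for the others.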
Unfortunately, attractive as it is, browser caching has its own shortcomings, chiefly because the browser cache is local to one machine rather than shared across the network. Its benefits are therefore enjoyed by one machine only. Suppose, for example, that a user on machine A visits the CNN website to check breaking news and closes the browser when finished. If some 50 other client computers on the same LAN (Local Area Network) then access the CNN website, each must still fetch the page across the WAN, even though the user of machine A could revisit the site without touching the network. Browser caching therefore has no effect on download time for other users accessing the same website for the first time.
Localized Caching Through WCCP
Various vendor-specific protocols exist to enhance caching. Besides Cisco's WCCP, other vendors have their own protocols, for example MS-PCHC (Peer Content Caching and Retrieval: Hosted Cache Protocol) from Microsoft, which is used to offer metadata to a hosted cache server (Anon, 2010). Other vendors such as HP, Mercury, D-Link, and many others also have vendor-specific solutions.
Cisco introduced WCCP in 1997. It localizes network traffic and provides intelligence in the network by distributing load across multiple caches, which maximizes download performance and content availability. Cisco's cache engines can be called network-integrated because they provide network management capabilities already found in the company's networking products, for example the Cisco Internetwork Operating System (IOS). They are designed as networking hardware that provides caching for a particular network (Johnson and Yih-Chun, 2000). These high-density cache engines thus integrate physically better than conventional standalone cache servers, because they can be inserted transparently into an existing network as extensions of it and adapt easily to network conditions. This ultimately minimizes operating costs and maximizes content availability.
Network-based caching is thus readily achieved via WCCP. The cache engine was designed as a loosely coupled, multinode network system, with the aim of providing robust shared caching. The Cisco caching solution consists of WCCP and at least one Cisco cache engine that stores data locally on the network. WCCP defines the communication between the router and the cache engine. The protocol enables the router to redirect web requests to cache engines instead of to the servers for which they were originally intended. The router also determines whether each cache engine is available. Should a cache engine become unavailable, the router automatically redirects requests to the other engines installed on the network.
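The router's role described above (spread requests across cache engines, track which engines are available, and redirect around a failed engine) can be sketched in a few lines. This is a simplified illustration and not the WCCP wire protocol: the Router class, the mark and route methods, the bucket count, and the CRC-based hash are all assumptions made for the example.

```python
import zlib
from typing import Dict, List, Optional


class Router:
    """Toy router that spreads web requests over available cache engines."""

    NUM_BUCKETS = 256  # assumed bucket count for this sketch

    def __init__(self, engines: List[str]) -> None:
        # Every engine starts out available.
        self._available: Dict[str, bool] = {name: True for name in engines}

    def mark(self, engine: str, available: bool) -> None:
        """Record whether an engine is currently reachable."""
        self._available[engine] = available

    def route(self, dest_ip: str) -> Optional[str]:
        """Return the engine for this destination, or None for the origin."""
        live = sorted(n for n, up in self._available.items() if up)
        if not live:
            return None  # no cache available: forward to the intended server
        # Hash the destination so the same site maps to the same engine.
        bucket = zlib.crc32(dest_ip.encode("ascii")) % self.NUM_BUCKETS
        return live[bucket % len(live)]
```

Because only the engines currently marked available are considered, a failed engine's share of requests moves to the remaining engines automatically, and traffic flows to the origin server unchanged when no engine is left.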
To achieve its goal, the Cisco cache engine uses algorithms that are highly optimized for this single purpose. Because the cache engine operates transparently, there is less hassle and hence less cost and complication: clients do not need to configure their browsers to point to a proxy server. Large enterprises with many client machines, as well as ISPs, can therefore use it effectively, thanks to the reduced configuration cost and ease of management.
Additional benefits include ease of hierarchical deployment and scalable clustering. The cluster heals automatically by redistributing a failed cache's load to the other engines, which makes it fault tolerant and fail-safe. Multihomed router support is also provided, along with an overload bypass, whereby a cache engine cluster automatically forwards requests it is too loaded to handle to the origin web server. There is also a dynamic client bypass for websites that authenticate clients by IP address: because requests relayed by the cache carry the cache's address rather than the client's, such clients are allowed to bypass the cache and reach the origin server directly.
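The two bypass behaviours described above amount to a per-request decision between the cache and the origin server. The following sketch shows one way such a decision could be expressed; the BypassPolicy class, its fields, and the load threshold are hypothetical and are not taken from Cisco's implementation.

```python
from dataclasses import dataclass, field
from typing import Set, Tuple


@dataclass
class BypassPolicy:
    """Decides whether a request goes to the cache or straight to origin."""
    max_in_flight: int                 # assumed load limit for the cluster
    in_flight: int = 0                 # requests currently being handled
    bypassed_pairs: Set[Tuple[str, str]] = field(default_factory=set)

    def record_auth_failure(self, client_ip: str, server_ip: str) -> None:
        # The site rejected a request relayed by the cache, so this client
        # should contact that server directly from now on.
        self.bypassed_pairs.add((client_ip, server_ip))

    def decide(self, client_ip: str, server_ip: str) -> str:
        if (client_ip, server_ip) in self.bypassed_pairs:
            return "origin"   # dynamic client bypass
        if self.in_flight >= self.max_in_flight:
            return "origin"   # overload bypass
        return "cache"
```

In both bypass cases the request still succeeds; it simply loses the benefit of caching for as long as the condition holds.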
In summary, web traffic is largely redundant: the same content is accessed repeatedly by users, often in the same location. Eliminating these recurring transfers often translates into significant savings, particularly for ISPs and for corporations or enterprises with large numbers of client computers. Keeping frequently accessed information close to users thus lowers costs and improves usability, and ultimately optimizes WAN bandwidth while enabling content monitoring.
Conclusion
Vendor-specific caching appears to be more applicable than the other forms of caching, such as proxy servers, web browsers, or standalone caches, because vendors create caching solutions whose main purpose is to cut costs and reduce the time a client takes to access resources. Further research should therefore be conducted on how to apply one vendor's caching techniques seamlessly on devices made by another vendor. Alternatively, a protocol could be developed whose main task is to ensure cost-effective caching on any platform, whether Cisco-based, Microsoft-based, or built to any other vendor's specifications. Such a protocol would not only make implementation easier for those who adopt it but would also spare organizations the additional cost of purchasing new devices from a specific vendor solely in order to enjoy low-cost, effective caching.
References
Anon. (2010). Peer Content Caching and Retrieval: Hosted Cache Protocol Specification. New York: Microsoft.
Johnson, D. & Yih-Chun, H. (2000). Caching Strategies in On-Demand Routing Protocols for Wireless Ad Hoc Networks. Boston: Mobicom.
Kurose, J. (2001). Computer Networking: A Top-Down Approach Featuring the Internet. India: Pearson Education.
Luotonen, A. (1997). Web Proxy Servers. New York: Prentice Hall.
Rabinovich, M. & Spatschak, O. (2001). Web Caching and Replication. New Jersey: Addison Wesley.
Ravi, J. (1994). A Caching Strategy to Reduce Network Impacts of PCS. IEEE Journal on Selected Areas in Communications, 12(3): 89-95.