Recommender Systems on the Web Research Paper


Introduction

This part describes in detail the techniques and algorithms of recommender systems on the Web. It also includes a diagram for each technique, as well as tables that compare the techniques based on knowledge impression.

Technical aspects of web recommender systems

Recommendation systems are a basic part of the current collaborative Web era and complement search engine algorithms in information discovery. Nowadays, most web-based businesses use search engine algorithms to discover information. In addition, these websites use information filtering methods to propose products and services to a customer, like a “virtual” salesperson (Liu, Dolan, & Pedersen nd, p. 1; Kabore 2012, p. 14).

A recommendation system is considered an information filter that learns a user’s preferences and characteristics and, on that basis, predicts the user’s future behavior. In this section, we describe recommender system techniques and algorithms, as well as a framework of existing research on how to apply these techniques in web businesses (Vafopoulos & Oikonomou 2010, p. 5).

The Classifications of Recommender Systems

Across the research literature, recommendation systems are framed as operating in an (n + m)-dimensional space, in which N Users, each described by n distinct features, may request M Web Goods, each described by m distinct features. In the simplest case, a single feature, a unique identifier, describes each User and each Web Good, resulting in a two-dimensional space. The User-Web Good rating matrix represents the ratings given by the Users to the Web Goods. The table below lists the symbolic representation of the variables in the RS framework (Vafopoulos & Oikonomou 2010, p. 7).

List of symbolic representations of the variables in the RS framework — Source: (Vafopoulos & Oikonomou 2010, p. 7)
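As a hedged illustration of the simplest two-dimensional case, the User-Web Good rating matrix can be sketched as a nested mapping from user identifiers to item identifiers to ratings. The identifiers, the 1-5 rating scale, and the helper name `to_matrix` below are invented for illustration and are not taken from the cited framework.

```python
# Hypothetical ratings: user ID -> {Web Good ID -> rating on a 1-5 scale}.
# A missing entry means the user has not rated that Web Good.
ratings = {
    "user_a": {"good_1": 5, "good_2": 3},
    "user_b": {"good_1": 4, "good_3": 2},
    "user_c": {"good_2": 1, "good_3": 5},
}

def to_matrix(ratings):
    """Return (users, goods, matrix), with None marking unrated cells."""
    users = sorted(ratings)
    goods = sorted({good for row in ratings.values() for good in row})
    matrix = [[ratings[user].get(good) for good in goods] for user in users]
    return users, goods, matrix

users, goods, matrix = to_matrix(ratings)
for user, row in zip(users, matrix):
    print(user, row)
```

Most cells in a real rating matrix are empty, which is why the sketch stores only the observed ratings and fills in `None` on demand.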

Multiple recommender system techniques are explained in depth in the research literature. The logic of a technique depends on the type of recommender system, for example whether the recommendation is designed for a travel system, education, or a wholesale business. The techniques are classified in different ways, and some researchers have characterized them based on decision modeling or utility functions. In this research, we include the basic approach, which brings domain knowledge into the RS process. We also describe the content-based, collaborative-based, and hybrid approaches (Amini, Ibrahim & Othman 2011, p. 3).

Content-based Recommender

The content-based approach builds the user’s profile from the contents of the pages the user has rated or visited. In this approach, items are described using keywords, and a user profile is built to indicate the type of item the user likes. In other words, the approach tries to recommend items that are very similar to the choices the user made in the past (Lu 2004, p. 375). Candidate items are compared with previously rated items, and the best-matching items are recommended.

The content-based approach applies different models to find similarity between objects, both to generate reasonable recommendations for users and to model the relationships between objects within a corpus. These models include vector space models such as Term Frequency-Inverse Document Frequency (TF-IDF) and probabilistic models such as the Naïve Bayes classifier, as well as decision trees and neural networks. Based on Table 1, the content-based approach can be formulated as below.
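To make the vector space model concrete, the following is a minimal sketch (not the formal definition referred to above) of TF-IDF weighting followed by cosine similarity between item descriptions. The item descriptions, the function names, and the particular TF-IDF variant (relative term frequency times the natural log of inverse document frequency) are assumptions chosen for illustration.

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    """Map each tokenized document to a {term: tf * idf} dictionary."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Hypothetical item descriptions, already tokenized.
docs = [
    "italian pizza pasta counter".split(),
    "french bistro wine table".split(),
    "italian pasta wine table".split(),
]
vecs = tf_idf_vectors(docs)
sim_0_1 = cosine(vecs[0], vecs[1])  # no shared terms
sim_0_2 = cosine(vecs[0], vecs[2])  # shares "italian" and "pasta"
print(sim_0_1, sim_0_2)
```

Items that share rare terms end up closer than items that share only common ones, which is the property a content-based recommender relies on when matching new items against previously rated ones.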

The figure below represents the high-level architecture of a content-based recommender system. Recommendation is a three-step process, with each step handled by a distinct component.

Content Analyzer

This step pre-processes unstructured information, such as text, to extract relevant structured information. The content of items (for example news, web pages, documents, and product descriptions) is represented in this component in a structure suitable for the subsequent processing steps. Feature extraction techniques analyze the data items in order to shift the item representation from the original information space to the target one (Lops, de Gemmis & Semeraro 2011, p. 75). This representation is the input to the Profile Learner and the Filtering Component.

Profile Learner

This module collects data representative of the user’s preferences and generalizes this data to construct the user profile (Middleton, De Roure, & Shadbolt nd, p. 2). The generalization strategy is usually realized through machine learning techniques, which infer a model of user interests from items liked or disliked in the past (Lops, de Gemmis & Semeraro 2011, p. 75).

Filtering Component

This module exploits the user profile to suggest relevant items by matching the profile representation against the representations of the items to be recommended. The result is a binary or continuous relevance judgment, computed using similarity metrics. In the continuous case, the output is a ranked list of potentially interesting items (Lops, de Gemmis & Semeraro 2011, p. 76).
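The three components described above can be sketched as three small functions chained together. This is a deliberately simplified illustration under stated assumptions: keywords are extracted by whitespace splitting, the profile is a keyword count rather than a learned model, and all function names and sample texts are hypothetical.

```python
from collections import Counter

def analyze_content(raw_text):
    """Content Analyzer: turn unstructured text into a set of keywords."""
    return set(raw_text.lower().split())

def learn_profile(liked_items):
    """Profile Learner: count how often each keyword occurs in liked items."""
    profile = Counter()
    for keywords in liked_items:
        profile.update(keywords)
    return profile

def filter_items(profile, candidates):
    """Filtering Component: rank candidates by keyword overlap with the profile."""
    scored = [
        (sum(profile[k] for k in keywords), name)
        for name, keywords in candidates.items()
    ]
    # Keep only candidates that match the profile at all, best match first.
    return [name for score, name in sorted(scored, reverse=True) if score > 0]

liked = [
    analyze_content("French table service wine"),
    analyze_content("French bistro high cost"),
]
candidates = {
    "cafe": analyze_content("French cafe table service"),
    "pizzeria": analyze_content("Italian counter service pizza"),
    "diner": analyze_content("American burgers drive through"),
}
ranking = filter_items(learn_profile(liked), candidates)
print(ranking)
```

In a fuller system, the Profile Learner would typically be one of the machine learning methods discussed later in this section, and the Filtering Component would use a proper similarity metric.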

Item Representation in Content-Based Approach

A database table is normally used to store the items that can be recommended to the user. In the example below, the item representation table contains rows with the records of various restaurants and columns named after the properties of restaurants. These properties are also known as “characteristics,” “variables,” “fields,” or “attributes” in various publications. Each row holds the value of each attribute for one record. A unique ID distinguishes items with similar names and also helps in retrieving the attributes of a record (Pazzani & Billsus 2007, p. 326). The table below shows an example of item representation.

ID	NAME	CUISINE	SERVICE	COST
10001	Mike’s Pizza	Italian	Counter	Low
10002	Chris’s Cafe	French	Table	Medium
10003	Jacques Bistro	French	Table	High

Source: (Pazzani & Billsus 2007, p. 326)

User Profile

A number of recommendation systems use a profile of the user’s interests. This profile can contain varied types of information, including a model of the user’s preferences and the history of the user’s interaction with the recommendation system. The item representation example in the table above can be used to create a user profile for a restaurant website. The table depicts structured data in which a few attributes describe each item and each attribute has a known set of values. To learn the user profile in this case, a number of machine learning algorithms may be employed. Alternatively, a menu interface may be provided that lets the user create the profile directly (Pazzani & Billsus 2007, p. 328).

The success of the recommendation process depends heavily on user profiles, because they represent and model the actual needs of a user. User profiles are developed through the user’s direct interaction with the recommender system (Knijnenburg, Willemsem, Gantner, Soncu, & Newell nd, p. 3). However, some research has focused on deriving decision rules automatically, using machine-learning techniques to categorize users into several classes based on their demographic characteristics. This makes it possible to personalize recommendations per class. By contrast, a number of e-commerce applications use manually generated rule-based techniques to recommend items, with the aim of offering products or services that customers like. In this case, website administrators use users’ personal interests to create the decision rules manually (Amini, Ibrahim & Othman 2011, p. 3).

User customization in Amazon.com. — Source: (Pazzani & Billsus 2007, p. 330)

Ski-Europe and Travelocity destination recommendation tools. — Source: (Ricci 2002, p. 55)

The screenshots above show the Amazon.com book recommendation interface and the Ski-Europe and Travelocity destination recommendation tools, an example of collaborative recommendation. The user customization interface in Amazon allows users to select item categories based on their favorites. The user history gives the recommendation system a basis for showing products to the user based on his or her purchase history (Pazzani & Billsus 2007, p. 329). This kind of system is known as a rule-based recommendation system.

Learning a User Model

Decision Trees and Rule Induction

Decision tree learners, for example ID3, build a tree by recursively partitioning the training data. The training data in this case are text documents, which are partitioned into subgroups until each subgroup contains instances of a single class (Pazzani & Billsus 2007, p. 332). In text classification, each partition is formed by a test on some feature. A commonly used criterion is expected information gain, which selects the most informative features for the partition tests.

Various studies have examined the use of decision trees with structured data such as the item representation table shown earlier. Given feedback on restaurants, a decision tree can easily represent and learn a user profile (Pazzani & Billsus 2007, p. 332). Decision trees, however, are most effective when only a few tests are needed. With a small number of structured attributes, decision trees in content-based models offer high performance, understandability, and simplicity. This explains the use of decision trees for personalized advertisements on web pages (Pazzani & Billsus 2007, p. 332).
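The expected information gain criterion mentioned above can be sketched on structured restaurant data. The feedback rows, the `liked` label, and the function names below are hypothetical; the code computes only the gain of a single split and is not a full ID3 implementation.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(rows, attribute, label="liked"):
    """Expected reduction in entropy from partitioning rows on one attribute."""
    base = entropy([row[label] for row in rows])
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row[label] for row in rows if row[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

# Hypothetical feedback from one user on four restaurants.
rows = [
    {"cuisine": "Italian", "service": "Counter", "cost": "Low", "liked": False},
    {"cuisine": "French", "service": "Table", "cost": "Medium", "liked": True},
    {"cuisine": "French", "service": "Table", "cost": "High", "liked": True},
    {"cuisine": "Italian", "service": "Table", "cost": "Medium", "liked": False},
]
gains = {a: information_gain(rows, a) for a in ("cuisine", "service", "cost")}
best = max(gains, key=gains.get)
print(best, gains)
```

A decision tree learner would place the test with the highest gain at the root and then repeat the same computation on each resulting subgroup.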

RIPPER, a rule induction algorithm, is closely related to decision trees because it also partitions data recursively. RIPPER can be more accurate than decision trees because it uses a sophisticated post-pruning algorithm. It also supports multi-valued attributes, so that a text classification task can be represented naturally with a single feature. RIPPER has been used to classify e-mail messages into user-defined categories (Pazzani & Billsus 2007, p. 333).

Nearest Neighbor Methods

The nearest neighbor algorithm simply stores all of its training data in memory, that is, textual descriptions of explicitly or implicitly labeled items. To classify a new, unlabeled item, the algorithm compares it with all stored items using a similarity function and determines the “nearest neighbor” or the k nearest neighbors (Pazzani & Billsus 2007, p. 333). The class label of the new item is then derived from the labels of these neighbors.

The similarity function used by the nearest neighbor algorithm depends on the type of data. The Euclidean distance metric is normally used for structured data, while the cosine similarity measure is often used with the vector space model. With the Euclidean distance function, a feature that has a small value in both examples is treated the same as a feature that has a large value in both examples. By contrast, the cosine similarity function does not yield a large value when corresponding features of two examples both have small values. This makes it suitable for text, where two documents should count as similar only when both are about the same topic, not when both are not about a topic (Pazzani & Billsus 2007, p. 333).
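The difference between the two similarity functions can be seen in a small sketch. The term-count vectors, the three-word vocabulary, and the function names are assumptions made for illustration only.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms if norms else 0.0

def k_nearest(query, labeled, k, similarity):
    """Return the labels of the k stored examples most similar to the query."""
    ranked = sorted(labeled, key=lambda ex: similarity(query, ex[0]), reverse=True)
    return [label for _, label in ranked[:k]]

# Hypothetical term counts over the vocabulary (sports, politics, tech).
training = [
    ([5, 0, 0], "sports"),
    ([4, 1, 0], "sports"),
    ([0, 6, 1], "politics"),
    ([0, 1, 5], "tech"),
]
query = [1, 0, 0]            # a short document that only mentions sports
short_off_topic = [0, 1, 0]  # a short document about politics
long_on_topic = [5, 0, 0]    # a long document about sports

neighbors = k_nearest(query, training, k=2, similarity=cosine)
print(neighbors)
# Euclidean distance places the short off-topic document closer to the query,
# whereas cosine similarity ranks the long on-topic document higher.
print(euclidean(query, short_off_topic), euclidean(query, long_on_topic))
print(cosine(query, short_off_topic), cosine(query, long_on_topic))
```

Because cosine similarity ignores vector length, documents of very different sizes can still match when they concentrate on the same terms.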

Gixo personalized news. — Source: (Pazzani & Billsus 2007, p. 333)

Linear Classifiers

Linear classifiers are algorithms that learn linear decision boundaries, that is, hyperplanes that separate instances in a multidimensional space. The outcome of the learning process is an n-dimensional weight vector w, whose dot product with an n-dimensional instance yields a numeric score prediction. Retaining the numeric prediction leads to a linear regression approach (Pazzani & Billsus 2007, p. 335). How the weight vector w is derived depends on the training algorithm. One example is the Widrow-Hoff rule, also known as the gradient descent rule or delta rule, shown below, which derives w through incremental movements in the direction of the negative gradient of the example’s squared error.

Formula. — Source: (Pazzani & Billsus 2007, p. 335)
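As a hedged sketch of how such a rule operates, the code below applies the commonly stated form of the Widrow-Hoff update, w ← w − 2η(w·x − y)x, one example at a time. The notation may differ from the cited formula; the learning rate `eta`, the epoch count, and the training data are assumptions chosen so that the example converges.

```python
def widrow_hoff(examples, eta=0.05, epochs=200):
    """Learn a weight vector with the Widrow-Hoff (delta) rule."""
    n = len(examples[0][0])
    w = [0.0] * n
    for _ in range(epochs):
        for x, y in examples:
            # Prediction error of the current weights on this example.
            error = sum(wj * xj for wj, xj in zip(w, x)) - y
            # Step against the gradient of the squared error (w.x - y)^2.
            w = [wj - 2 * eta * error * xj for wj, xj in zip(w, x)]
    return w

# Hypothetical training data generated by y = 2*x1 - 1*x2.
examples = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 1.0)]
w = widrow_hoff(examples)
print(w)
```

Applying a threshold to the dot product w·x would turn the numeric prediction into a discrete class label such as "relevant" or "not relevant".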

Probabilistic Methods and Naïve Bayes

Probabilistic methods, and the naïve Bayes classifier in particular, are classification approaches that recent work has shown to perform very well for text classification. The two most frequently used naïve Bayes models are the multinomial model and the Bernoulli model. Both models share the assumption that text documents are generated by an underlying generative model, specifically a parameterized mixture model, as shown in the equation below:

$$P(d_i \mid \theta) = \sum_{j=1}^{|C|} P(c_j \mid \theta)\, P(d_i \mid c_j; \theta)$$

Source: (Pazzani & Billsus 2007, p. 335)

Here each class c_j corresponds to a mixture component that is parameterized by a disjoint subset of θ, and the likelihood of a document d_i is the sum of total probability over all mixture components.
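The mixture likelihood can be illustrated with a small multinomial-style naïve Bayes sketch. This is a simplification under stated assumptions: word probabilities use Laplace smoothing with a hypothetical parameter `alpha`, the document-length term and multinomial coefficient are omitted, words outside the training vocabulary are skipped, and the training documents are invented.

```python
import math
from collections import Counter

def train_naive_bayes(labeled_docs, alpha=1.0):
    """Estimate class priors P(c) and smoothed word probabilities P(w | c)."""
    vocab = {w for doc, _ in labeled_docs for w in doc}
    class_counts = Counter(label for _, label in labeled_docs)
    word_counts = {c: Counter() for c in class_counts}
    for doc, label in labeled_docs:
        word_counts[label].update(doc)
    priors = {c: n / len(labeled_docs) for c, n in class_counts.items()}
    likelihoods = {}
    for c, counts in word_counts.items():
        total = sum(counts.values()) + alpha * len(vocab)
        likelihoods[c] = {w: (counts[w] + alpha) / total for w in vocab}
    return priors, likelihoods

def document_likelihood(doc, priors, likelihoods):
    """Return P(d) = sum_j P(c_j) * P(d | c_j) and the per-class terms."""
    joint = {
        c: priors[c] * math.prod(likelihoods[c][w] for w in doc
                                 if w in likelihoods[c])
        for c in priors
    }
    return sum(joint.values()), joint

training = [
    ("goal match team".split(), "sports"),
    ("team win match".split(), "sports"),
    ("election vote party".split(), "politics"),
]
priors, likelihoods = train_naive_bayes(training)
total, joint = document_likelihood("team match".split(), priors, likelihoods)
predicted = max(joint, key=joint.get)
print(predicted, total)
```

Classification picks the class whose term contributes most to the sum, which is equivalent to choosing the class with the highest posterior probability.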

Collaborative-Based Recommender

The collaborative-based recommender is an alternative approach that seeks to overcome the shortcomings of the content-based approach. It exploits the profiles of users in the same community to recommend new items that the target user has not yet seen or rated. It assumes that similar users in the same community have similar interests (Burke 1999, p. 69). Recommendations are drawn from the list of items the community finds interesting and are determined by user similarity. Community users’ preferences and common characteristics, captured in item ratings and user profiles, guide the recommendation process (Amini, Ibrahim & Othman 2011, p. 4). Recommendations for the target user are determined by comparing his or her profile with the nearest user profiles.

k-Nearest-Neighbor (k-NN) Approach

To fulfill its recommendation tasks, the collaborative approach commonly relies on the k-Nearest-Neighbor (k-NN) approach, a standard memory-based classification method.

The k-Nearest-Neighbor (k-NN) approach compares the active user’s profile with the profiles of other users in the community in order to find the top k users whose preferences are similar to those of the current user. However, this approach scales poorly, which is a disadvantage in collaborative filtering (Amini, Ibrahim & Othman 2011, p. 5). The Web data available to this approach is also sparse. To mitigate these limitations, techniques such as dimensionality reduction and offline categorization are employed. Information from other sources is also integrated to enhance collaborative filtering.
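A minimal sketch of user-based k-NN collaborative filtering follows. It assumes cosine similarity over rating vectors (with unrated items treated as zero) and a similarity-weighted average for prediction; the users, items, ratings, and function names are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two users' rating dictionaries."""
    dot = sum(r * b[item] for item, r in a.items() if item in b)
    norm = (math.sqrt(sum(r * r for r in a.values()))
            * math.sqrt(sum(r * r for r in b.values())))
    return dot / norm if norm else 0.0

def recommend(active, ratings, k=2):
    """Rank unseen items by the similarity-weighted ratings of the k nearest users."""
    seen = ratings[active]
    sims = sorted(
        ((cosine(seen, other), user)
         for user, other in ratings.items() if user != active),
        reverse=True,
    )
    scores, weights = {}, {}
    for sim, user in sims[:k]:
        for item, rating in ratings[user].items():
            if item not in seen and sim > 0:
                scores[item] = scores.get(item, 0.0) + sim * rating
                weights[item] = weights.get(item, 0.0) + sim
    predicted = {item: scores[item] / weights[item] for item in scores}
    return sorted(predicted, key=predicted.get, reverse=True)

ratings = {
    "ann": {"book_a": 5, "book_b": 4},
    "bob": {"book_a": 5, "book_b": 5, "book_c": 4},
    "cat": {"book_a": 1, "book_b": 2, "book_d": 2},
    "dan": {"book_c": 2, "book_d": 1},
}
top = recommend("ann", ratings, k=2)
print(top)
```

The sketch compares the active user with every other user on each request, which illustrates the scalability limitation noted above: the cost grows with the number of users and items.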

Hybrid Approach

The hybrid approach combines two or more approaches to enhance recommendation performance. It tackles limitations associated with the individual approaches, such as data sparsity and the cold-start problem (Amini, Ibrahim & Othman 2011, p. 5). For instance, it may combine collaborative and knowledge-based systems to improve the recommender system.

It is, however, important to note a few limitations of this approach. Contextual information is limited, which makes it difficult to predict user tastes for complex objects such as education. The approach also lacks multi-criteria rating. Other limitations include scalability and the reliance on nearest neighbor-based computing (Amini, Ibrahim & Othman 2011, p. 6).

The hybrid approach most widely combines the collaborative-filtering and content-based approaches. Hybrids are further classified by how the techniques are combined: weighted, switching, mixed, feature combination, cascade, feature augmentation, and meta-level.

Weighted Hybrid Recommender

The score of a recommended item is computed from the outcomes of all recommendation techniques available in the system. For instance, the simplest combined hybrid is a linear combination of recommendation scores. Such a hybrid is employed by the P-Tango system.

This approach has the advantage that all of the system’s capabilities are brought to bear on the recommendation process in a straightforward way. It is also easy to perform post-hoc credit assignment and adjust the hybrid accordingly (Burke nd, p. 6).
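A linear combination of recommendation scores can be sketched in a few lines. The item scores and the weights 0.3 and 0.7 below are invented for illustration and are not taken from P-Tango or any other cited system.

```python
def weighted_hybrid(score_sets, weights):
    """Combine per-technique item scores with a linear weighting."""
    combined = {}
    for scores, weight in zip(score_sets, weights):
        for item, score in scores.items():
            combined[item] = combined.get(item, 0.0) + weight * score
    return sorted(combined, key=combined.get, reverse=True)

# Hypothetical scores in [0, 1] from two underlying recommenders.
content_scores = {"item_a": 0.9, "item_b": 0.2, "item_c": 0.5}
collab_scores = {"item_a": 0.1, "item_b": 0.8, "item_c": 0.6}
ranking = weighted_hybrid([content_scores, collab_scores], weights=[0.3, 0.7])
print(ranking)
```

Adjusting the weights shifts how much each technique influences the final ranking, which is what makes credit assignment straightforward in this kind of hybrid.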

Switching Hybrid

This approach builds item-level sensitivity into the hybridization strategy: the system uses some criterion to switch between recommendation techniques.
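One possible switching criterion, sketched below, is the number of ratings a user has supplied: with too few ratings the system falls back on a content-based recommender. The threshold `min_ratings`, the criterion itself, and the two stand-in recommenders are assumptions for illustration; the source does not prescribe a specific criterion.

```python
def switching_hybrid(user_ratings, collaborative, content_based, min_ratings=5):
    """Use collaborative filtering only when the user has enough ratings."""
    if len(user_ratings) >= min_ratings:
        return "collaborative", collaborative(user_ratings)
    return "content-based", content_based(user_ratings)

# Hypothetical stand-ins for the two underlying recommenders.
def collaborative(user_ratings):
    return ["item_liked_by_similar_users"]

def content_based(user_ratings):
    return ["item_similar_to_past_choices"]

new_user = {"item_1": 4}
experienced_user = {f"item_{i}": 3 for i in range(6)}
print(switching_hybrid(new_user, collaborative, content_based))
print(switching_hybrid(experienced_user, collaborative, content_based))
```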

Mixed

This hybrid method presents recommendations from more than one technique together. The approach is used by the PTV system, which recommends television viewing programs (Burke nd, p. 7).

Feature Combination

This approach treats collaborative information as simply an additional data feature associated with each example and applies content-based techniques over the augmented data set (Burke nd, p. 8). Because it does not rely exclusively on collaborative data, it reduces the system’s sensitivity to the number of users who have rated an item. Conversely, it gives the system information about the inherent similarity of items that would otherwise be opaque to a collaborative system (Burke nd, p. 8).

Cascade

Unlike the other hybridization methods, this approach is a staged process. One recommendation technique is used first to produce a coarse ranking of candidates, and a second technique then refines the recommendations within the candidate set, as in the restaurant recommender EntreeC (Burke nd, p. 8).
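The staged process can be sketched as a two-key sort in which the second technique only breaks ties left by the first. The coarse buckets, the collaborative scores, and the restaurant names below are hypothetical and do not reproduce EntreeC's actual data or implementation.

```python
def cascade(candidates, coarse_score, fine_score):
    """Order by the first technique; use the second only to break its ties."""
    return sorted(
        candidates,
        key=lambda c: (coarse_score(c), fine_score(c)),
        reverse=True,
    )

# Hypothetical output of a first-stage technique: coarse preference buckets.
knowledge_bucket = {"bistro": 2, "cafe": 2, "pizzeria": 1, "diner": 1}
# Hypothetical second-stage scores used to refine within each bucket.
collab_score = {"bistro": 0.4, "cafe": 0.9, "pizzeria": 0.95, "diner": 0.1}

ranking = cascade(knowledge_bucket, knowledge_bucket.get, collab_score.get)
print(ranking)
```

Note that "pizzeria" stays below both bucket-2 restaurants despite its high second-stage score, because the second technique cannot overturn the ordering established by the first.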

Feature Augmentation

In this approach, one technique produces a rating or classification of an item, and that information is then incorporated into the processing of the next recommendation technique. An example is the Libra system, which recommends books based on data found on Amazon.com using a naïve Bayes text classifier (Burke nd, p. 8). The GroupLens research on Usenet news filtering also uses this approach. Its ability to improve the performance of a core system without modifying it makes it attractive.

Meta-level

This approach combines two recommendation techniques by using the model generated by one as the input to the other (Burke nd, p. 9). It is used because the learned model is a compressed representation of the user’s interests, so the collaborative mechanism that follows operates on information-dense data.

Analysis of Recommender systems

This part analyzes the recommender approaches in detail and includes the table “The comparison of recommender approaches based on the knowledge impression,” shown below.

The tables in this part compare and summarize the basic recommendation techniques with regard to knowledge impression. In this context, content-based and collaborative-based approaches explore various methods for integrating knowledge into the user profile in order to improve the recommendation process. Reasoning utilities and ontological knowledge make use of contextual knowledge acquired by modeling the behavior of the user or the domain (Amini, Ibrahim & Othman 2011, p. 6).

Overview of the recommender approach

The content-based approach draws its strength from personalization, and the content-based paradigm has been extended to address the overspecialization problem. One such extension addresses the information overload faced by TV viewers, who are overwhelmed when interacting with digital receivers. A reasoning-based approach filters thousands of TV programs to match viewers’ preferences. This approach preserves the privacy of other users while offering various programs. It resolves the syntactic restrictions that characterize the content-based approach by using techniques derived from Semantic Web technology, namely Semantic Association (SA) and Spreading Activation (Amini, Ibrahim & Othman 2011, p. 8).

Three kinds of augmenting knowledge into recommender systems. — Source: (Amini, Ibrahim & Othman 2011, p. 8)

Semantic Web technology focuses on the link between formalized items and the user’s preferences, thus moving away from conventional syntactic similarity (Bedi, Kaur, & Marwaha 2007, p. 2678). Applying this knowledge enhances the recommendation system, which can then offer more flexible and accurate items that meet users’ interests.

The comparison of recommender approaches based on the knowledge impression. — Source: (Amini, Ibrahim & Othman 2011, p. 10).

The screenshot below shows the usefulness of recommendation techniques, including the content-based technique, as they apply in Technology Enhanced Learning (TEL).

Recommendation techniques and their usefulness for TEL by Drachsler. — Source: (Manouselis, Drachsler, Vuorikari, Hummel, & Koper nd, p. 13)

The table below explores the advantages and disadvantages of various recommendation techniques.

Tradeoffs between recommendation techniques. — Source: (Burke nd, p. 6)

References

Adomavicius G & Tuzhilin A nd, Towards the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions, pp. 1-43.

Amini B, Ibrahim R & Othman M S 2011, “Discovering the Impact of Knowledge in Recommender Systems: A Comparative Study,” International Journal of Computer Science & Engineering Survey (IJCSES) Vol.2, No.3, pp. 1-14

Bedi P, Kaur H, & Marwaha S 2007, Trust based Recommender System for the Semantic Web, Department of Computer Science, University of Delhi, Delhi, pp. 2677-2682

Burke R 1999, “Integrating Knowledge-based and Collaborative-filtering Recommender Systems,” AAAI Technical Report WS-99-01, pp. 69-72

Burke R nd, Hybrid Recommender Systems: Survey and Experiments, California State University, Fullerton, pp. 1-29

Kabore CS 2012, “Master Thesis: Design and Implementation of a recommender System as a Module for Liferay Portal,” Master in Information Technologies, Barcelona School of Computing (FIB), University Polytechnic of Catalunya (UPC), pp. 1-127

Knijnenburg BP, Willemsem MC, Gantner Z, Soncu H, & Newell C nd, “Explaining the User Experience of Recommender Systems,” Journal of User Modeling and User-Adapted Interaction (UMUAI), vol. 22, pp. 1-78

Liu J, Dolan P, & Pedersen ER nd, Personalized News Recommendation Based on Click Behavior, Amphitheatre Parkway, Mountain View, CA, pp. 1-10

Lops P, de Gemmis M & Semeraro G 2011, Chapter 3: Content-based Recommender Systems: State of the Art and Trends, In F. Ricci et al. (eds.), Recommender Systems Handbook, Springer Science Business Media, LLC, pp. 73-105.

Lu J 2004, “A Personalized e-Learning Material Recommender System,” Proceedings of the 2nd International Conference on Information Technology for Application (ICITA), pp. 374-379

Manouselis N, Drachsler H, Vuorikari R, Hummel H, & Koper R nd, Recommender Systems in Technology Enhanced Learning, pp. 1-31

Middleton SE, De Roure D, & Shadbolt nd, Capturing Knowledge of User Preferences: Ontologies in Recommender Systems, Department of Electronics and Computer Science, University of Southampton, Southampton, pp. 1-8

Pazzani MJ & Billsus D 2007, Content-Based Recommendation Systems, In P Brusilovsky, A Kobsa, & W Nejdl (Eds.), The Adaptive Web, LNCS, Springer-Verlag Berlin Heidelberg, vol. 4321, pp. 325-341.

Ricci F 2002, “Travel Recommender Systems,” eCommerce and Tourism Research Laboratory, pp. 55-57

Vafopoulos M & Oikonomou M 2010, Recommendation Systems: A Joint Analysis of Technical Aspects with Marketing Implications, webscience.org, pp. 1-30
