Hotel Recommender System based on Knowledge Graph and Collaborative Approach

With the rapid growth of information technology, online services and social media, recommendation systems have become important for both customers and businesses. The main aim of traditional and online recommendation systems is to recommend the services that users want and need. Traditional recommendation systems often suffer from inefficient data analysis techniques, rate services without regard to users' previous preferences, and do not meet users' personal demands. Therefore, in this paper we use a hybrid approach based on a knowledge graph and a machine learning similarity function as a recommendation system. We used real datasets to conduct the experiment. We built the knowledge graph for the visitors, the hotels and their ranks, and we used the knowledge graph and similarity scores to recommend a hotel or a set of hotels to visitors based on their former preferences and the ratings of other visitors. The results show significant accuracy and good quality of service recommendation, with an F-measure of 93.5%.


I. INTRODUCTION
RECOMMENDER systems [1] are support systems that help users find and select products (items) from given categories (e.g., accommodation, hotels, smartphones, movies, books, songs and insurance services) based on different and heterogeneous information [2]. Schafer et al. [3] define a recommender system as a tool designed to allow users to look through related knowledge that meets their interests and preferences.
Recommender systems are integrated into e-commerce and web applications to help users make their choices. A hotel recommendation system helps users select a hotel to book based on their preferences.
The main benefit of recommender systems is providing better services for customers. Companies use the information in online hotel booking systems, such as scores or rankings, preferences and product evaluations by users, and plan their goals based on an accurate analysis of these data. High-quality results may lead to more efficient planning, more confident and better decision making, and reduced cost and risk.
This paper discusses recommending hotels to customers or visitors. In our approach, we use a database of hotel preferences and ratings based on customer stays. We collect all the ratings that a customer gave to the subset of hotels they visited.
Our recommender system calculates the similarity between two users based on their hotel preferences and their rankings of the hotels, using the cosine similarity function. The system then generates the best hotel recommendations for the user. An efficient and accurate recommendation system is therefore very important: it plays a crucial role in providing relevant information related to customer preferences and in producing recommendations that the user can trust.

A. KNOWLEDGE GRAPH
The Knowledge Graph is a knowledge base model introduced by Google in May 2012 that connects information in a visual way. It enhances the results of Google's organic search engine. Google presents the information gathered from a variety of sources in an info box next to the search results.
One of the most common tools for representing a knowledge graph or graph database is Neo4j. A graph database is used to represent relationships; examples are the hotel graph database and its recommendation relationships. Fig. 1 shows a sample graph database from our hotel system in Neo4j. The circles represent the nodes, Visitors and Hotels. The lines, called edges, indicate relationships; the relation here is RATED. Properties of a node, such as the name of the hotel or the visitor, are shown inside the circles.

B. SIMILARITY MEASUREMENTS
There are different methods for calculating the similarity between users and hotels using similarity measures (neighborhood techniques). Researchers use one of the following measures: Euclidean distance (equation (1)), cosine similarity (equation (2)) and the Pearson correlation coefficient (equation (3)).
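For reference, the standard textbook forms of these three measures can be written as follows. The notation is an assumption made here for illustration: $A = (a_1, \dots, a_n)$ and $B = (b_1, \dots, b_n)$ are two rating vectors of length $n$, and $\bar{a}$, $\bar{b}$ are their means.

```latex
\begin{align}
d(A, B) &= \sqrt{\sum_{i=1}^{n} (a_i - b_i)^2} \tag{1} \\
\cos(A, B) &= \frac{\vec{A} \cdot \vec{B}}{\lVert \vec{A} \rVert \, \lVert \vec{B} \rVert}
  = \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^2} \, \sqrt{\sum_{i=1}^{n} b_i^2}} \tag{2} \\
r(A, B) &= \frac{\sum_{i=1}^{n} (a_i - \bar{a})(b_i - \bar{b})}
  {\sqrt{\sum_{i=1}^{n} (a_i - \bar{a})^2} \, \sqrt{\sum_{i=1}^{n} (b_i - \bar{b})^2}} \tag{3}
\end{align}
```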
where $\vec{A} \cdot \vec{B} = \sum_{i=1}^{n} a_i b_i = a_1 b_1 + a_2 b_2 + \cdots + a_n b_n$ is the dot product of the two vectors.
Cosine similarity measure is a metric used to determine how similar the items or visitors are. In our case, we are looking for hotels.
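As a minimal sketch of the three measures named above, the following Python functions compute them for two equal-length rating vectors. The function names and the sample ratings are hypothetical and serve only as an illustration; they are not taken from the system described in this paper.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length rating vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two rating vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def pearson_correlation(a, b):
    """Pearson correlation coefficient (0.0 if either vector is constant)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


# Hypothetical ratings that two visitors gave to the same three hotels.
visitor_a = [8.0, 6.0, 9.0]
visitor_b = [7.0, 5.0, 10.0]
print(euclidean_distance(visitor_a, visitor_b))
print(cosine_similarity(visitor_a, visitor_b))
print(pearson_correlation(visitor_a, visitor_b))
```

Note that Euclidean distance is a dissimilarity (smaller means more alike), whereas the cosine and Pearson measures grow with similarity.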
The rest of the paper is organized as follows. Section II discusses related work on the different approaches to recommendation systems and hotel recommendation. Section III presents the proposed system based on knowledge graph technology. Section IV discusses the implementation of the proposed system, the experiment and the results. Finally, Section V presents the conclusions.

II. RELATED WORK
There are several approaches to recommendation. Researchers have used collaborative, content-based, hybrid and knowledge-based approaches.
The collaborative approach [4,5] is the most commonly used. It recommends hotels or products by finding users with similar tastes, and recommends hotels to a user based on those users' rankings and opinions [6][7][8].
Ringo is an online system that uses a collaborative filtering (CF) approach and users' ratings of music albums to construct a user profile [9].
The content-based approach [10,11] normally uses a user profile to make recommendations. This approach ignores all related information from other users [9,12]. As an example, Letizia [13] is a recommendation system that uses the user's behavior on the user interface to advise him or her while browsing the Internet. The system tracks the user's browsing behavior to predict web pages that are of interest to the user. Other researchers use neural networks (NN) to predict and rate different Usenet news pages as either hot or cold [14]. Pazzani et al. [10] use intelligent agent technology and a naive Bayesian classifier to recommend web pages that will interest a user.
The hybrid approach [15,16] is widely used in recommendation systems and combines two or more approaches. It was proposed to increase the quality, efficiency, accuracy and performance of recommender systems [17,18]. Cunningham et al. proposed a hybrid recommender system that combines the collaborative and content-based approaches [19]. Konstas et al. used a hybrid approach to design a music recommendation system that combined tagging information, play counts and social relations [20]. Another work proposed a hybrid model that combines content-based with collaborative filtering (CF) for hotel recommendation. This model considers both hotel popularity at the input destination and user preferences, and produces predictions with 53.6% accuracy [21]. Pawar et al. [22] described their system, KASR, and reported results showing that KASR significantly improves the accuracy and scalability of service recommender systems. They used MapReduce as a parallel processing paradigm and the collaborative approach to generate appropriate recommendations. Hassannia et al. [23] proposed a hybrid recommendation system based on agent and web technologies.
Al-Ghossein et al. [24] learned preferences for regions and used these preferences for hotel recommendation by mapping the space of regions to the user's preferences and computing the similarity, an approach called Recommendation System Based on Multi-Source Information [25].
The study in [26] proposed a recommender system that lists the name(s) of the travelers based on their preferences by analyzing other travelers' reviews together with their rating values to improve prediction accuracy.
Finally, the knowledge-based approach is used to enhance the effectiveness and performance of the recommendation system using semantic classifications [27,28]. This approach recommends items based on a predefined set of constraints (rules) and/or similarity metrics [29].

A. BASIC MODELS OF RECOMMENDER SYSTEMS
There are different approaches to recommender systems, and most of them rely on a single numerical value, such as a rating, to create recommendation services. Evaluating a service through a single numerical value is not enough, however. Evaluating a service through multiple criteria and taking user preferences into account can help to make recommendations that are more effective for users.
We present a combined approach, based on user preferences and hotel rating information, to implement a scalable recommender system using knowledge graph and similarity function techniques.
This section shows the steps used to build the graph database, add the customer data, hotel data and hotel ratings to it, and perform personalized recommendations. To evaluate the performance of the recommendation system, we select a list of users who mostly visited the same hotels in the dataset and combine their ratings of these hotels. We then compute the similarity between the user ratings. Based on these similarity measurements, the system suggests hotels for the customer to visit.
In our approach, the system suggests hotels similar to the ones the visitor likes, or hotels liked by other visitors who are similar to the customer. The system considers the ratings that a user has given to all hotels and then looks for hotels that are similar to what the user likes. Fig. 2 shows the general architecture of our system. As seen in Fig. 2, hotel-to-hotel similarity is computed by looking at co-rated hotels only. For example, for hotels H2 and Hj, the similarity Sij is computed from the ratings of the visitors who rated both. Each of these co-rated pairs is obtained from a different user; in Fig. 2, they come from visitors V1 and Vi.
Referring to Fig. 2 as an example, each column can be viewed as a vector. Hotel H1 therefore has a vector of ratings, and hotel Hi has a vector of ratings. The similarity between these two vectors is measured by computing the cosine of the angle between them.
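The column-wise comparison described above can be sketched as follows. This is an illustrative example only: the `ratings` dictionary, the visitor and hotel names, and the function `hotel_similarity` are hypothetical, and the sketch assumes that only visitors who rated both hotels contribute to the two vectors.

```python
import math

# Hypothetical ratings: visitor -> {hotel: rating}.
ratings = {
    "V1": {"H1": 8.0, "H2": 6.0},
    "V2": {"H1": 9.0, "H2": 7.0, "H3": 5.0},
    "V3": {"H2": 4.0, "H3": 10.0},
}


def hotel_similarity(ratings, hotel_i, hotel_j):
    """Cosine of the angle between two hotel columns, using co-raters only."""
    # Keep one (rating_i, rating_j) pair per visitor who rated both hotels.
    pairs = [
        (row[hotel_i], row[hotel_j])
        for row in ratings.values()
        if hotel_i in row and hotel_j in row
    ]
    if not pairs:
        return 0.0
    dot = sum(x * y for x, y in pairs)
    norm_i = math.sqrt(sum(x * x for x, _ in pairs))
    norm_j = math.sqrt(sum(y * y for _, y in pairs))
    if norm_i == 0 or norm_j == 0:
        return 0.0
    return dot / (norm_i * norm_j)


print(hotel_similarity(ratings, "H1", "H2"))  # co-rated by V1 and V2
```

One caveat of this design: when only a single visitor has rated both hotels, the two one-element vectors are always parallel and the cosine is 1.0, so a minimum number of co-raters may be worth requiring in practice.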

B. KNOWLEDGE GRAPH AND ITEM BASED COLLABORATIVE APPROACH
In our approach, we used the item-based collaborative filtering algorithm. The approach computes how similar the other hotels are to the target hotel i and then selects the k most similar items from the set of items that the target visitor has rated. Once the most similar items are found, the prediction is computed as a weighted average of the target visitor's ratings on these similar items.
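The weighted-average prediction step can be illustrated with a short sketch. It assumes that hotel-to-hotel similarities have already been computed and stored in a dictionary keyed by hotel pairs; the names `predict_rating` and `lookup`, the data layout and the sample values are all hypothetical.

```python
def lookup(similarities, hotel_a, hotel_b):
    """Return the stored similarity for a hotel pair, in either key order."""
    if (hotel_a, hotel_b) in similarities:
        return similarities[(hotel_a, hotel_b)]
    return similarities.get((hotel_b, hotel_a), 0.0)


def predict_rating(visitor_ratings, similarities, target_hotel, k=2):
    """Weighted average of the visitor's ratings on the k most similar hotels.

    visitor_ratings: {hotel: rating} given by the target visitor.
    similarities: {(hotel_a, hotel_b): similarity}, assumed precomputed.
    Returns None when no rated hotel has a positive similarity to the target.
    """
    scored = []
    for hotel, rating in visitor_ratings.items():
        if hotel == target_hotel:
            continue
        sim = lookup(similarities, target_hotel, hotel)
        if sim > 0:
            scored.append((sim, rating))
    top_k = sorted(scored, reverse=True)[:k]
    total = sum(sim for sim, _ in top_k)
    if total == 0:
        return None
    return sum(sim * rating for sim, rating in top_k) / total


# Hypothetical data: the visitor rated H1..H3 and we predict a rating for H4.
visitor_ratings = {"H1": 8.0, "H2": 6.0, "H3": 10.0}
similarities = {("H4", "H1"): 0.5, ("H4", "H2"): 0.5, ("H3", "H4"): 0.1}
print(predict_rating(visitor_ratings, similarities, "H4", k=2))
```

With k=2 only H1 and H2 contribute, so the weakly related H3 does not pull the prediction toward its high rating.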
We implement the following strategy, which is based on visitor ratings. To identify subsets of visitors who are similar to one another, we collect all the previous ratings of the visitors:
• Hotels visited: which hotels did each customer visit?
• Hotel ratings: what ratings did they give?
To produce a recommendation for a given visitor Vi, the ratings and recommendations of all similar visitors are studied. To do that, we followed these steps (Fig. 3):
• Computing similarity: find the similarity between two visitors Vi and Vj.
• Visitor selection: return the visitors who are most similar to Vi, the visitor you want to recommend hotels to.
• Ranking hotels: define a way to rank the hotels visited by the similar visitors.
• Hotel recommendation: recommend to Vi the best-ranked hotels that he or she has not already visited.
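The four steps above can be sketched end to end in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this work: the `ratings` data, the visitor and hotel names, the function names, and the choice to rank hotels by the plain average of the neighbours' ratings are all hypothetical.

```python
import math

# Hypothetical ratings: visitor -> {hotel: rating}.
ratings = {
    "Vi": {"H1": 9.0, "H2": 7.0},
    "Va": {"H1": 9.0, "H2": 7.0, "H3": 8.0, "H4": 5.0},
    "Vb": {"H1": 8.0, "H2": 6.0, "H3": 10.0},
    "Vc": {"H1": 4.0, "H2": 10.0, "H4": 9.0},
}


def similarity(ratings, u, v):
    """Step 1: cosine similarity of two visitors over the hotels both rated."""
    shared = set(ratings[u]) & set(ratings[v])
    if not shared:
        return 0.0
    dot = sum(ratings[u][h] * ratings[v][h] for h in shared)
    norm_u = math.sqrt(sum(ratings[u][h] ** 2 for h in shared))
    norm_v = math.sqrt(sum(ratings[v][h] ** 2 for h in shared))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(ratings, target, k):
    """Step 2: the k visitors most similar to the target visitor."""
    others = [v for v in ratings if v != target]
    others.sort(key=lambda v: similarity(ratings, target, v), reverse=True)
    return others[:k]


def rank_hotels(ratings, target, neighbours):
    """Step 3: average neighbour rating of each hotel the target has not visited."""
    collected = {}
    for visitor in neighbours:
        for hotel, rating in ratings[visitor].items():
            if hotel not in ratings[target]:
                collected.setdefault(hotel, []).append(rating)
    ranked = [(hotel, sum(rs) / len(rs)) for hotel, rs in collected.items()]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


def recommend(ratings, target, k=2, n=3):
    """Step 4: the n best-ranked unvisited hotels for the target visitor."""
    neighbours = most_similar(ratings, target, k)
    return rank_hotels(ratings, target, neighbours)[:n]


print(recommend(ratings, "Vi"))
```

In this toy data, Va and Vb are the two nearest neighbours of Vi, so H3 (rated 8 and 10 by them) is ranked ahead of H4.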

C. THE HOTELS DATASET
First, the graph database of hotel and visitor data that describes the dataset has to be built using Neo4j [30,31]. We used a Hotels dataset that we gathered from the Hotel Booking online website. The dataset consists of 125 ratings (ranging from 4 to 10) from 15 users (visitors) for 11 hotels. The dataset has been cleaned so that each user has rated more than once. The data were gathered as a set of Excel files; the following paragraph describes the dataset in Excel and in the knowledge graph. The Hotels dataset can be represented in the form of a very simple graph (Fig. 1).

D. INITIAL DATA MODEL
As mentioned earlier, our dataset consists of two types of nodes and one relationship. To add a new node or relationship to our graph database, we use the CREATE command. The relation between the nodes in our dataset is RATED. For example, to create a relation between persons and hotels in which Person1 rated Hotel1 with a score of 6.7, and to add this relation to the graph database, we used the following command:
CREATE (Person1)-[:RATED {rating:6.7}]->(Hotel1)
We added all the nodes and relations to our database. After creating the graph database, we explored it using the following command:
MATCH (n) RETURN n
Running the previous command shows the graphical representation of our database. As shown in Fig. 4, we have 15 persons, 11 hotels and 125 RATED relations. In Fig. 4, the blue circles represent the persons and the gray circles represent the hotels; the lines show the RATED relation between the persons and the hotels.
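When the ratings are available as rows (for example, exported from a spreadsheet), the Cypher statements can be generated rather than typed by hand. The sketch below is one possible way to do this and is not the procedure used in this work: the function `build_statements`, the `Person` and `Hotel` labels, the `name` property and the sample rows are assumptions. It only builds parameterised statement texts and does not connect to a database.

```python
def build_statements(rows):
    """Turn (person, hotel, rating) rows into parameterised Cypher statements.

    Returns a list of (query_text, parameters) pairs that a Neo4j client
    could execute in order. Node labels and property names are assumed.
    """
    statements = []
    people = sorted({person for person, _, _ in rows})
    hotels = sorted({hotel for _, hotel, _ in rows})
    # MERGE creates each node only if it does not exist yet.
    for name in people:
        statements.append(("MERGE (:Person {name: $name})", {"name": name}))
    for name in hotels:
        statements.append(("MERGE (:Hotel {name: $name})", {"name": name}))
    # One RATED relationship per rating row, carrying the score as a property.
    for person, hotel, rating in rows:
        statements.append((
            "MATCH (p:Person {name: $person}), (h:Hotel {name: $hotel}) "
            "CREATE (p)-[:RATED {rating: $rating}]->(h)",
            {"person": person, "hotel": hotel, "rating": rating},
        ))
    return statements


# Hypothetical rating rows.
rows = [
    ("Person1", "Hotel1", 6.7),
    ("Person1", "Hotel2", 8.0),
    ("Person2", "Hotel1", 9.1),
]
statements = build_statements(rows)
for query, params in statements:
    print(query, params)
```

Passing values as parameters (`$name`, `$rating`) instead of pasting them into the query text avoids quoting problems with hotel names that contain apostrophes.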

IV. RESULTS AND DISCUSSIONS
In our approach, we used Item-Based Collaborative Filtering (IB-CF) to find the similarity. The system therefore recommends hotels to a visitor based on the similarity to what other visitors rated. To compute the similarity, we can use cosine or Pearson correlation similarity. Referring to Fig. 2, IB-CF works on the vertical data (columns) rather than the horizontal data (rows) used in user-based collaborative filtering.
To compute the similarity, we used a [:SIMILARITY] [32] relationship between each pair of persons in the knowledge graph, with the cosine similarity stored as a property of the relationship. We used equation (2) to create the Cypher query that does this. Fig. 5 shows the knowledge graph after adding the similarity property to the relation; the yellow lines represent the similarities. To compute the similarity between two visitors, we need to find the list of all hotels and the ratings for both of them and then compute the similarity. For example, to find the hotels that both 'Maymoona' and 'Minu' rated, we can write a Cypher query against our knowledge graph. The output of a Cypher query on the knowledge graph can be presented in graphical or tabular format. Fig. 5 shows the result in tabular format. The list shows the visitors who rated hotels most similarly to 'Emre'.
To find the top three hotels, rated according to the similarity between visitors, we can create the Cypher query shown in Fig. 8.
Figure 7. The list of the first three visitors who rated hotels most similarly to 'Emre'
Figure 8. Top three hotels rated based on similarity
Another important function of our system is recommendation, that is, recommending to a visitor new hotels that he or she has not booked. To accomplish this, we used a content-based recommendation approach that uses the properties and user ratings of hotels. The underlying assumption is that the hotels recommended to visitor Vi are similar to hotels previously rated highly by other visitors Vj.
The following algorithm summarizes the main steps:
1. Build the hotel profile in vector form, where the vector elements are features that can be binary (rating). In our knowledge graph, we have nodes for visitors and hotels, with the preferences of booked and rated hotels.
2. Get all of the visitors who rated the hotels that visitor V did not rate, together with their ratings for those hotels.
3. Calculate the similarity between visitor V and each of the visitors who rated the hotels that V did not rate, using cosine similarity (equation (2)).
4. Infer the recommendation by adding the vectors of the hotels rated by the visitors and computing the average vector (visitor profile): for each relevant hotel, average the ratings from those of the person's k nearest neighbors who rated that hotel (equation (4)). The system returns the hotel name and the average rating as the recommendation.
In equation (4), the average is taken over the visitors who belong both to the subset of visitors that have similar ratings and to the subset of visitors who have rated hotel h.
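One way to write the averaging step described above is given below. The symbols are assumptions introduced here for illustration: $N_k(v)$ is the set of the $k$ visitors most similar to visitor $v$, $V_h$ is the set of visitors who have rated hotel $h$, $r_{u,h}$ is the rating that visitor $u$ gave to hotel $h$, and $\hat{r}_{v,h}$ is the resulting score of hotel $h$ for visitor $v$.

```latex
\begin{equation}
\hat{r}_{v,h} = \frac{1}{\lvert N_k(v) \cap V_h \rvert}
  \sum_{u \in N_k(v) \cap V_h} r_{u,h}
\end{equation}
```

The score is undefined when none of the $k$ nearest neighbors has rated hotel $h$, in which case the hotel would simply not be ranked.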
The following Cypher code accomplishes the recommendation. Fig. 9 shows the list of recommended hotels with their scores, in descending order of recommendation based on the similarity. The system returns the hotel name and the average rating as the recommendation for the visitor named in the query, 'Maymoona'.
In this paper, we implement a hotel recommender system based on a similarity function. The principle is to use the cosine similarity function to automatically identify other hotels that are similar to the ones the person likes. The hotel recommender system identifies the hotels that the user has and has not rated, and then suggests hotels that closely match his or her preferences.
Figure 9. Recommending new hotels to the visitors that she/he did not book
The performance of our hotel recommender system is evaluated on real cases. Based on our evaluation, our hotel recommendation system, which uses a hybrid knowledge-based approach represented by knowledge graph technology and similarity measurement, has proven to be quite effective for this kind of problem.
We used three scores to evaluate our recommendation system: precision, recall and F-measure. Precision is the fraction of the retrieved (recommended) hotels that are relevant: precision = |relevant hotels ∩ retrieved hotels| / |retrieved hotels|.
Recall is the fraction of the relevant hotels that are retrieved: recall = |relevant hotels ∩ retrieved hotels| / |relevant hotels|. The F-measure combines the two: F-measure = (2 * precision * recall) / (precision + recall).
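The three scores can be computed directly from the set of relevant hotels and the set of retrieved (recommended) hotels. The following sketch is illustrative only; the function name and the sample hotel sets are hypothetical and are not the data used in the evaluation.

```python
def precision_recall_f(relevant, retrieved):
    """Return (precision, recall, F-measure) for two collections of hotels."""
    relevant, retrieved = set(relevant), set(retrieved)
    hits = len(relevant & retrieved)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical example: four relevant hotels, three recommended, two in common.
relevant = {"H1", "H2", "H3", "H4"}
retrieved = {"H1", "H2", "H5"}
print(precision_recall_f(relevant, retrieved))
```

Here two of the three recommended hotels are relevant (precision 2/3) and two of the four relevant hotels were recommended (recall 1/2), which gives an F-measure of 4/7.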
To do that, we removed part of our dataset and asked our system to recommend a set of hotels to the visitor. We selected the top five visitors who had visited similar hotels. We compared the new recommendations with the actual data and then computed the precision, recall and F-measure. Fig. 10 presents the results. The tests showed that good recommendation quality was achieved, with 92.8% precision, 94.5% recall and 93.5% F-measure.
The results of this experiment demonstrate the validity of using knowledge graph and similarity function techniques to recommend relevant and interesting hotels to a visitor. With this approach, we can achieve effectiveness, efficiency, interactivity and high quality.

V. CONCLUSIONS
In this paper we proposed two new approaches, the similarity-based approach and the knowledge graph approach, to incorporate semantics and leverage multi-criteria rating information in recommender systems. Our recommender system will help users choose an appropriate accommodation option.
We tested our approaches with a real dataset collected from online accommodation booking websites. Based on the analysis of our dataset, we evaluated our approach with respect to the degree of similarity between the visitors. The results of our evaluation show that our approach is able to identify recommendations of similar solutions in a highly efficient manner, with an F-measure of 93.5%. Further results and confirmation may be obtained by extending the knowledge graph of the datasets.