
Umut Benzer
December 26, 2011

Finding People Compatibility on Social Networks using a Modified Perceptron

Social networks exist for one simple reason: to connect people. When creating new friendships, it is usually better to recommend people who are compatible with the user instead of showing random people.
This paper studies a new method for measuring people's compatibility on a social network, Facebook.

Keywords: Perceptron, Social Networks, Compatibility Engine, Facebook


Finding People Compatibility on Social Networks using a Modified Perceptron

BENZER Umut, MSc, Computer Engineering Department, Ege University ([email protected])

Abstract

Social networks exist for one simple reason: to connect people. When creating new friendships, it is usually better to recommend people who are compatible with the user instead of showing random people. This paper studies a new method for measuring people's compatibility on a social network, Facebook.

Keywords: Perceptron, Social Networks, Compatibility Engine, Facebook

Introduction

To make new connections, it is important to recommend new people. Different social networking sites do this in different ways; on Facebook (the focus of this paper) it is done through the "People You May Know" section. Today, Facebook is generally used to connect with people you already know, but it is always possible to extend this idea: people should be able to connect with other people with whom they are compatible.

The compatibility of two people describes how well they would get along if they got to know each other. Compatibility can be measured for a specific category (such as love, social life, career, or parenting) or in general. This research studies compatibility in general; it is intended that its findings be supported with a psychological background. Compatibility can be measured in different ways, using different data. In the next section, past research is discussed; in the section following it, a new method is proposed.

Related Work

Some research has been done on compatibility measurement in social networks. One study [1] recommends using the semantics of the words in users' status updates and finding relevance via semantic distance. This is a good approach, since it does not blindly count interactions but analyzes their contents. However, it is not applicable to most of the world's languages, since no semantic relationship/similarity map of words exists for most of them.
There are currently four algorithms [2] that do not involve semantics.

The first is Content Matching: if two people post content on similar topics, they may be interested in getting to know each other. First, each person's textual entries are converted into "word bags", excluding stop words and other common words. Then people's word bags are compared against each other to find the most similar ones. Similar word bags mean that two people write about similar topics, so they are considered compatible. This is impractical for our purposes, since people do not write grammatically on Facebook.

The second algorithm, Content-plus-Link (CplusL), is an extension of the first. People who share interests are recommended to each other only if there is a "valid link" between them. The "valid link" is described in detail in the original article; in brief, two people are recommended to each other if they are connected through another person in between (e.g., if A is connected to B and B commented on C, then A is recommended to C). This algorithm has the same issues as the first one. Moreover, recommending people this way requires all connection data between people, which can only be obtained by seeing the whole social network. On Facebook, custom Facebook applications do not have access to that kind of data [3], which makes this algorithm infeasible for our purposes.
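As a rough illustration of the Content Matching step, a word bag can be built and compared like this. This is a minimal sketch under our own naming; the stop-word list is a made-up stand-in for a real one.

```python
import re

# Hypothetical stop-word list; a real system would use a much fuller one.
STOP_WORDS = {"the", "a", "an", "and", "or", "is", "to", "of", "in", "my", "that"}

def word_bag(text: str) -> set:
    """Convert a user's textual entries into a 'word bag', dropping stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    return {w for w in words if w not in STOP_WORDS}

def bag_similarity(bag_a: set, bag_b: set) -> float:
    """Overlap between two word bags: 0 = disjoint, 1 = identical."""
    union = bag_a | bag_b
    if not union:
        return 0.0
    return len(bag_a & bag_b) / len(union)

a = word_bag("Going to the concert of my favorite band tonight")
b = word_bag("That band played great at the concert")
print(bag_similarity(a, b))  # nonzero: both mention 'concert' and 'band'
```

People with the most similar word bags would then be recommended to each other; as noted above, noisy, ungrammatical Facebook text limits how well this works in practice.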
The third algorithm is the "Friend of a Friend" (FoF) algorithm.

In this algorithm, contents, points of interest, and the like are not relevant; the only thing that matters is friends. If person A is friends with B and B is friends with C, then A and C may know each other. This is the people recommendation technique Facebook currently uses, together with some other parameters [4]. This third method is not valid for measuring people compatibility, since it "just recommends" people the user may have met, and knowing someone does not mean being compatible with them.

The last algorithm is SONAR, developed by IBM. SONAR is a system that aggregates social relationship information from IBM's relevant sources [5]. It calculates a compatibility score in [0, 1] by weighting all inputs equally. This algorithm has a few advantages. First, more than one source is used in the calculation, which is believed to make the results more relevant. Second, it is known why two people were found compatible, because the sources are known; for example, it can easily be said, "You may be compatible because you have 5 common friends, 3 common interests, etc." On the other hand, SONAR has a shortcoming: in real life, not all sources are equal. It is unlikely that loving the same songs and loving the same kind of food are equally important when deciding compatibility. The weights of the input sources cannot be decided simply by picking some numbers; that would be unlikely to predict compatibility correctly. Our method uses a modified perceptron (strengthened to work better in environments with a lot of missing data) to determine each input source's importance to compatibility.

Proposing a method to find the weights

The weights of the input categories can be found in two different ways. A three-layer neural network (a multi-layer perceptron) could be designed and trained with some training data. That may work in theory, but in practice three issues make this method infeasible:

1. It is very hard to collect data usable for training, since there are many inputs.
2. There is a lot of missing data: not all data is available for all test users. For example, a user may share her likes but hide her wall posts. The method used must be able to handle missing input both in production and in training. Combined with issue 1, a small training set with a lot of missing data causes the neural network to be trained insufficiently.
3. A layered neural network gives "just some numbers", not actual weights, so the importance of each input for compatibility cannot be known. For example, no statement like "liking the same band is three times as important as liking the same writer in the compatibility analysis" can be made.

These three issues led us to use a single perceptron, which is better for the following reasons:

1. Each input's actual contribution to compatibility is known. It is useful to know why two people were judged to be compatible.
2. A linear solution is sufficient (and probably better than a nonlinear one). A nonlinear solution, especially with an insufficient training set, may produce an overfitted neural network that can only answer inputs from the training set, giving logically true but practically meaningless results for real-life inputs. If three common photos and one comment mean 70% compatibility, then four common photos and two comments should mean higher compatibility, which linear solutions guarantee.
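The monotonicity argument in point 2 can be checked numerically. The weights below are invented purely for illustration and are not the paper's findings:

```python
# Illustrative only: made-up weights for two hypothetical inputs.
weights = {"common_photos": 0.15, "common_comments": 0.10}

def linear_score(inputs):
    """Weighted sum, as a single perceptron with identity activation would compute."""
    return sum(weights[name] * value for name, value in inputs.items())

low = linear_score({"common_photos": 3, "common_comments": 1})   # 0.55
high = linear_score({"common_photos": 4, "common_comments": 2})  # 0.80
# With nonnegative weights, having more in common can never lower the score.
assert high > low
```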
Dealing with too much missing data in the training set

People in social networks are free to share their information or to restrict applications from accessing it. For this reason, the perceptron must be able to handle missing data and must be trainable with whatever data has been obtained.

In a traditional perceptron, the input count is fixed. Each input value is multiplied by its weight, the bias is added, and the resulting value is fed into an activation function, which generates the perceptron's output.

Figure 1: A perceptron. [6]

For the particular perceptron above, the output is f(∑ xn · wn) for n = 1 to 7; this perceptron has no bias. A single perceptron with a nonlinear activation function seems to fit our compatibility calculation problem, provided the inputs can somehow be normalized into the range [0, 1].

Representing social network data for the perceptron

The social network provides a lot of data about users. (A full list, with the weights we found, is at the end of this paper.) This data needs to be categorized and digitized, then normalized into a [0, 1] input. Two kinds of data are retrieved from the social network:

1. Counts for each of the two people separately, plus the common count (for example, user 1's event count, user 2's event count, and the number of common events).
2. Common data without per-user counts (for example, users 1 and 2 have 49 common links, but user 1's and user 2's individual link counts cannot be known).

These two kinds of data are normalized with different methods. For the first kind, two methods were tried:

1. Jaccard index [7] (also known as the Jaccard similarity coefficient), which measures the similarity of two sets A and B as |A ∩ B| / |A ∪ B|.
2. Simple percentage: A's similarity to B is the common item count divided by |B|.

Our experiments showed that the second method yields more accurate results.
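The two normalization methods above can be sketched as follows. This is a minimal sketch; the function names are ours, not the paper's.

```python
def jaccard_index(a: set, b: set) -> float:
    """|A ∩ B| / |A ∪ B| -- the first normalization method tried."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def simple_percentage(common_count: int, b_count: int) -> float:
    """common items / |B| -- the second method, which the paper found more accurate."""
    return common_count / b_count if b_count else 0.0

# Hypothetical event sets for users A and B.
events_a = {"e1", "e2", "e3", "e4"}
events_b = {"e3", "e4", "e5"}
print(jaccard_index(events_a, events_b))                            # 2/5 = 0.4
print(simple_percentage(len(events_a & events_b), len(events_b)))   # 2/3 ≈ 0.667
```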
Due to this, the first method was abandoned in the early stages of the research.

For the second kind of data, some normalization is needed. Since per-user usage of the social network varies, this kind of input can easily be normalized inaccurately. We used constant lower and upper limits (based on our experience with the social network) and normalized with (value - min) / (max - min). For example, if the common link count for users A and B is 5 and the maximum value for link counts is defined as 10, the normalized A-B link value is (5 - 0) / (10 - 0) = 0.5. In our study, we allowed normalized values above 1.0: for example, if the A-B common link count is 12, the normalized value is (12 - 0) / (10 - 0) = 1.2. This causes compatibility values over 1.0 in the results, but allowing values over 1.0 drastically improved result accuracy.

Using an unmodified single-layer perceptron for compatibility analysis

Our first experiment used a single-layer perceptron. Before using an unmodified perceptron, a question arises: what should be done about missing data? One answer is to give unknown data a predefined value. We used -1 for unknown inputs and actual values in [0, 1]

for other inputs, and 1.0 for the bias. The results showed that there was too much unknown data for the perceptron to handle. The weights of the trained perceptron showed that, in some cases, "not knowing" an input increased the compatibility of two people! The maximum error rate was over 80%.

Using a modified perceptron to handle missing training data

Instead of representing the missing data, we decided not to include it as input at all. In our perceptron, if an input value is missing, it does not count as an input, and its weight is not recalculated on that iteration either. This idea required an activation function that can handle a varying number of inputs: had we used a function that ignores the input count, omitting an input would have produced a meaningless, unequally scaled dot product. We used the average function as the activation function:

Inputs: at least one, at most unlimited (let n be the number of present inputs).
Dot product: ∑ (xi · wi) over the existing inputs.
Activation function: dot product / n.

This perceptron is trained with the steps below. With Δ (the error) = desired output in the training set - actual output, for each present input's weight:

new weight = old weight - Δ × learning factor × average input value

The weights of omitted inputs (those whose data is missing) are not updated. This algorithm yields better results than the unmodified perceptron.

Experiment Details

The test data was collected from 5 volunteers. Each volunteer gave each of his or her friends a compatibility score between 0 and 100, which was normalized to [0, 1] and used as the desired output of the training set. The final training set had 292 friend-compatibility scores, together with comparison data fetched from the social network. Error means |expected - calculated|. The error rate was calculated with two measures:

1. Max error: the maximum error in the results after training has been done.
2. Average error: the mean error over the result set. For example, if #C1 has expected 50, calculated 20, error 30, and #C2 has expected 90, calculated 80, error 10, then the max error is 30 and the average error is 20.

Our modified perceptron achieved a 22% average error rate and a 48% maximum error rate on our particular training set, with learning rate 0.01 and 5000 iterations. If inputs and weights are strictly restricted to the range [0, 1], the maximum error rate with the same parameters rises to 68%. In the end, we did not restrict the maximum values of inputs and weights, but we did restrict the minimum values to zero, since it does not make sense in real life for something in common to make people less compatible. Our actual weight findings are listed in the table attached to this paper.

Conclusion & Future Work

This research can only be a guide to compatibility calculation; it should be supported with psychological profiling to yield better results. For a better user experience, the result shown to the user can use predefined range names (for example, 0-40 incompatible, 40-60 somewhat compatible, 60-80 compatible, 80+ extremely compatible). The training set is vulnerable to personal opinions: for example, volunteers may vote very low if they recently had a fight, or because of daily events

like that. A better way of collecting the training set, and stricter voting rules, should be put in place in further research. A better input normalization method could also improve the results; this may be a different normalization formula, or more accurate maximum values collected through data mining and statistics.

References

[1] A. Kazemi and M. Nematbakhsh (Computer Engineering Department, University of Isfahan, Isfahan, Iran), "Finding Compatible People on Social Networking Sites, a Semantic Technology Approach," Second International Conference on Intelligent Systems, Modelling and Simulation (ISMS), pp. 306-309, 25-27 Jan. 2011.
[2] J. Chen (University of Minnesota), W. Geyer, C. Dugan, M. Muller (IBM T.J. Watson Research), and I. Guy (IBM Haifa Research Lab), "Make new friends, but keep the old: recommending people on social networking sites," CHI '09: Proceedings of the 27th International Conference on Human Factors in Computing Systems, pp. 201-210.
[3] Permissions, Facebook Developers, 10/2011. http://developers.facebook.com/docs/reference/api/permissions/
[4] People You May Know, Facebook Help Center. http://www.facebook.com/help/?page=199421896769556
[5] I. Guy, M. Jacovi, N. Meshulam, I. Ronen, and E. Shahar, "Public vs. Private - Comparing Public Social Network Information with Email," ACM CSCW '08, 2008.
[6] Perceptron, Wikipedia, 12/12/2011. http://en.wikipedia.org/wiki/Perceptron
[7] Jaccard Index, Wikipedia, 2/12/2011. http://en.wikipedia.org/wiki/Jaccard_index
Table of Weights

This table shows our findings about the connection

type-compatibility relationship. Higher weight values mean the input contributes more to compatibility; lower values mean it is less important in the compatibility analysis.

#    Input's System Name                                   Weight Calculated
1    pageLike_common_Tv show                               4.92218
2    pageLike_common_Furniture                             1.78E-08
3    pageLike_common_Computers/technology                  1.444334
4    photo_friendTaggedUserPhoto                           0.666485
5    pageLike_common_News/media                            0.458035
6    pageLike_common_Bank/financial institution            0.300492
7    pageLike_common_Education                             2.501953
8    interaction_user1wall_taggedInPostComments            0.937331
9    pageLike_common_Movie                                 1.945135
10   pageLike_common_Arts/humanities                       0.271252
11   photo_taggedBy3rdPartyPhoto                           1.704069
12   pageLike_common_Radio station                         0.713533
13   pageLike_common_Personal website                      0.779151
14   pageLike_common_Local business                        2.744438
15   event                                                 0.376024
16   link_common                                           0.004156
17   interaction_user2wall_taggedInPostComments            1.220579
18   pageLike_common_Artist                                3.038284
19   pageLike_common_Community                             0.982134
20   pageLike_common_Science                               0.773524
21   hometown                                              0.002773
22   interaction_user1wall_commonPhotoComments             0.002654
23   pageLike_common_Politician                            0.288269
24   pageLike_common_Travel/leisure                        0.744645
25   pageLike_common_Professional sports team              7.19E-05
26   pageLike_common_Internet/software                     0.998086
27   education_commonConcentration                         1.441132
28   pageLike_common_Transportation                        0.449238
29   education_commonSchool                                9.62E-05
30   friends_common                                        0.168995
31   pageLike_common_Cars                                  1.292369
32   pageLike_common_Interest                              0.533597
33   pageLike_common_Application                           1.15E-04
34   otherLikes_user2wall_commonPhotoLikes                 0.005312
35   interaction_user1wall_commonAlbumComments             2.84E-04
36   pageLike_common_Home/garden                           0.094279
37   closeFriends                                          4.159213
38   pageLike_common_Telecommunication                     4.387123
39   otherLikesuser2wall_commonPostLikes                   2.800188
40   interaction_user2wall_commonPostComments              7.06E-04
41   pageLike_common_Non-governmental organization (ngo)   1.347335
42   photouser_TaggedFriendPhoto                           0.001184
43   pageLike_common_Website                               2.5626
44   pageLike_common_Software                              1.099623
45   pageLike_common_Teacher                               1.45E-08
46   pageLike_common_Games/toys                            3.904593
47   pageLike_common_Aerospace/defense                     1.9E-09
48   pageLike_common_Movie general                         0.728632
49   pageLike_common_University                            1.123068
50   otherLikes_user2wall_commonAlbumLikes                 1.324883
51   pageLike_common_Club                                  0.875836
52   pageLike_common_Attractions/things to do              1.160209
53   pageLike_common_Public figure                         1.216314
54   subscribee                                            0.574371
55   pageLike_common_Reference                             1.429452
56   pageLike_common_Actor/director                        0.473849
57   otherLikes_user1wall_commonPostLikes                  0.001399
58   pageLike_common_Coach                                 0.732074
59   pageLike_common_Business person                       0.182278
60   pageLike_common_Health/medical/pharmaceuticals        1.35E-06
61   pageLike_common_Magazine                              2.934436
62   pageLike_common_Society/culture                       0.828716
63   pageLike_common_Computers/internet                    4.351207
64   pageLike_common_App                                   1.141994
65   checkin_commonPlaces                                  0.574912
66   otherLikes_user1wall_commonAlbumLikes                 7.10E-04
67   pageLike_common_Museum/art gallery                    1.522884
68   pageLike_common_Product/service                       3.878547
69   pageLike_common_Entertainment                         0.251157
70   pageLike_common_Community organization                1.03E-05
71   pageLike_common_Event planning/event services         1.332829
72   pageLike_common_Book                                  0.334206
73   checkin_atTheSameCheckin                              0.540088
74   pageLike_common_Engineering/construction              0.934735
75   pageLike_common_Food/beverages                        0.483015
76   pageLike_common_Kitchen/cooking                       0.22661
77   pageLike_common_Non-profit organization               1.233347
78   otherLikes_user1wall_commonPhotoLikes                 0.008273
79   pageLike_common_Personal blog                         3.219735
80   currentLocation                                       0.001825
81   interaction_user2wall_commonAlbumComments             1.577139
82   relationship                                          3.26146
83   pageLike_common_Company                               2.600332
84   interaction_user1wall_commonPostComments              0.001039
85   affiliation_common                                    1.414702
86   pageLike_common_Library                               0.989996
87   pageLike_common_Musician/band                         3.809847
88   isFamily                                              3.798747
89   interaction_user2wall_commonPhotoComments             0.002326
90   pageLike_common_Author                                0.486221
91   pageLike_common_Tv network                            0.854202
92   pageLike_common_Fictional character                   1.545697
93   pageLike_common_Organization                          0.842013
94   group                                                 0.860418
95   subscriber                                            5.33E-04
96   photo_userTaggedFriendPhoto                           0.505485
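The modified perceptron described in this paper (skip missing inputs entirely, average the dot product over the present inputs, update only the present inputs' weights, and clamp weights at zero) can be sketched as follows. Names and toy data are ours. One note: the printed update rule subtracts the correction term, but with Δ defined as desired minus actual, the conventional gradient step adds it, so this sketch uses the plus sign.

```python
# Sketch of the paper's modified perceptron. Missing inputs are
# represented as None and skipped; the activation divides the dot
# product by the number of PRESENT inputs, not the total input count.

def predict(inputs, weights):
    """inputs: list of floats or None (missing). Returns the averaged dot product."""
    present = [(x, w) for x, w in zip(inputs, weights) if x is not None]
    if not present:
        raise ValueError("at least one input must be present")
    return sum(x * w for x, w in present) / len(present)

def train_step(inputs, weights, desired, learning_rate=0.01):
    """One iteration: weights of missing inputs are left untouched;
    weights are clamped at zero, as in the paper."""
    n = sum(1 for x in inputs if x is not None)
    delta = desired - predict(inputs, weights)  # error, desired minus actual
    return [
        w if x is None else max(0.0, w + delta * learning_rate * (x / n))
        for x, w in zip(inputs, weights)
    ]

# Toy usage: two inputs, the second one sometimes missing.
data = [
    ([0.8, 0.2], 0.7),
    ([0.1, None], 0.1),  # second input missing: its weight is not updated here
    ([0.5, 0.9], 0.8),
]
weights = [0.0, 0.0]
for _ in range(5000):
    for inputs, desired in data:
        weights = train_step(inputs, weights, desired)
```

After training, `weights` reflects how much each input channel contributed, which is exactly the interpretability advantage the paper claims for a single perceptron over a multi-layer network.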