the web.
  ◦ e.g. filter bubbles, echo chambers
  ◦ These phenomena have been investigated through questionnaires.
• We would like to clarify these phenomena by analyzing behavior data.
  ◦ In this study, we use user activity logs from a news application.
  ◦ Intended uses: evaluating the diversity of recommender systems, improving long-term user satisfaction, and so on.
differs between user attributes?
  ◦ Ideally, we would like to analyze users based on their interests.
  ◦ Since interests are not directly available, we analyze users based on their attributes instead.
• Our contributions:
  ◦ Clarify how user behavior relates across user attributes.
  ◦ Detect keywords whose behavior is biased by attribute, using regression analysis.
• News articles
  ◦ politics, society
• 2 types of action
  ◦ Click, Like
• Clicked more than 100 times
• User attributes
  ◦ Users register their own attributes in the application.
    ▪ If users do not register, their attributes are predicted by supervised learning.
  ◦ Age
    ▪ -29 (younger), 30-39 (middle), 40- (older)
  ◦ Gender
    ▪ male, female
by attributes were compared using the correlation coefficient.
  ◦ The number of clicks shows a strong positive correlation between attributes.
  ◦ The number of likes shows a weaker correlation than the number of clicks.
• User behavior is strongly correlated between attributes.
  ◦ We can therefore discuss differences between attributes using user behavior data.
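The comparison above can be sketched as follows. This is a minimal illustration, assuming the Pearson correlation coefficient is the one meant and that each attribute group has one action count per article. The function name `pearson` and the sample counts are hypothetical and do not come from the study's data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(vx * vy)


# Hypothetical per-article click counts; both lists use the same article order.
clicks_male = [120, 80, 45, 200, 15]
clicks_female = [100, 90, 30, 180, 20]

r_clicks = pearson(clicks_male, clicks_female)
print(f"click correlation (male vs. female): {r_clicks:.3f}")
```

The same function would be applied to like counts and to each pair of age groups.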
the behavior differ between user attributes on the topic of news articles.
  ◦ There are various definitions of news topics.
  ◦ This study compares articles based on the keywords included in their titles.
• Extract keywords from news articles.
  ◦ Divide the title of each news article into morphemes using MeCab.
    ▪ These morphemes are taken as keyword candidates.
  ◦ Count the number of news articles that include each keyword candidate.
  ◦ We adopt the top 100 words by this count as keywords.
    ▪ Meaningless words are excluded.
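The extraction steps above can be sketched as follows, as an illustration under stated assumptions rather than the pipeline actually used. The function `top_keywords`, the whitespace tokenizer, the stopword list, and the English sample titles are all hypothetical. For Japanese titles the tokenizer would be replaced by a MeCab call, as noted in the comment.

```python
from collections import Counter


def top_keywords(titles, tokenize, stopwords=(), n=100):
    """Return the n keyword candidates that appear in the most titles."""
    stop = set(stopwords)
    doc_freq = Counter()
    for title in titles:
        # set(): each candidate is counted at most once per article.
        doc_freq.update(set(tokenize(title)) - stop)
    return [word for word, _ in doc_freq.most_common(n)]


def whitespace_tokenize(title):
    # Stand-in tokenizer for this sketch. With the mecab-python3 package,
    # one could use MeCab.Tagger("-Owakati").parse(title).split() instead.
    return title.lower().split()


# Hypothetical titles, for illustration only.
titles = [
    "Cabinet approves summer time study",
    "Police rescue missing boy",
    "Cabinet debates summer time",
    "Boy found by volunteer",
]

keywords = top_keywords(titles, whitespace_tokenize, stopwords={"by"}, n=4)
print(sorted(keywords))
```

Counting each candidate once per title gives the number of articles containing it, which is the quantity the slide ranks by.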
compare keywords between user attributes.
  ◦ If the correlation coefficient for a keyword is weak, that keyword is not comparable.
• Keywords with weak correlation coefficients appear in articles with very few actions.
[Figure panels: Click, Like]
adopt regression analysis.
• Regression analysis yields a slope and an intercept.
  ◦ Exclude keywords whose coefficient of determination is 0.5 or less.
    ▪ The coefficient of determination plays a role similar to the correlation coefficient (for a simple linear regression it equals the squared correlation coefficient).
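A minimal sketch of this step, assuming a simple least-squares line is fitted per keyword, with one attribute group's per-article action counts as x and the other group's as y. The slide does not state the exact variables, so this pairing is an assumption. The names `fit_line` and `fit_keywords` and the sample counts are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares fit y = slope * x + intercept.

    Returns (slope, intercept, r2), where r2 is the coefficient of
    determination.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("fit is undefined for a constant sequence")
    slope = sxy / sxx
    intercept = my - slope * mx
    r2 = (sxy * sxy) / (sxx * syy)  # squared correlation for a simple fit
    return slope, intercept, r2


def fit_keywords(counts_by_keyword, min_r2=0.5):
    """Fit every keyword; keep only those with r2 strictly above min_r2."""
    fits = {}
    for keyword, (xs, ys) in counts_by_keyword.items():
        slope, intercept, r2 = fit_line(xs, ys)
        if r2 > min_r2:  # "0.5 or less" is excluded
            fits[keyword] = (slope, intercept, r2)
    return fits


# Hypothetical per-article counts: (group A counts, group B counts).
counts = {
    "cabinet": ([10, 20, 30, 40], [12, 22, 29, 41]),
    "noise": ([10, 20, 30, 40], [5, 40, 3, 20]),
}
fits = fit_keywords(counts)
print(fits)
```

Here `"noise"` is dropped because its points do not lie near a line, which mirrors the exclusion rule on the slide.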
keywords are divided into three categories based on mean ± σ.
  ◦ larger than the upper bound (x > mean + σ)
  ◦ smaller than the lower bound (x < mean - σ)
  ◦ within the section (mean - σ < x < mean + σ)
• These categories are defined under the assumption that these parameters are normally distributed.
  ◦ i.e. whether a value belongs to the central range or not (for a normal distribution, mean ± σ covers about 68% of values; 95% would correspond to about mean ± 2σ).
• If one parameter is within the section and the other is not, the keyword is considered biased.
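The categorization can be sketched as follows. Two points are assumptions of this sketch, not statements from the slides: that the two parameters being compared are the slope and the intercept of each keyword's regression, and that σ is the population standard deviation. Function names and sample values are hypothetical.

```python
from statistics import mean, pstdev


def categorize(values):
    """Label each keyword's value relative to mean ± σ.

    values: dict mapping keyword -> parameter value.
    Returns a dict mapping keyword -> 'upper', 'lower' or 'within'.
    """
    xs = list(values.values())
    m = mean(xs)
    s = pstdev(xs)  # population σ; an assumption of this sketch
    labels = {}
    for keyword, x in values.items():
        if x > m + s:
            labels[keyword] = "upper"
        elif x < m - s:
            labels[keyword] = "lower"
        else:
            # Values exactly on a bound are treated as 'within' here.
            labels[keyword] = "within"
    return labels


def biased_keywords(slopes, intercepts):
    """Keywords for which exactly one of the two parameters is 'within'."""
    slope_cat = categorize(slopes)
    intercept_cat = categorize(intercepts)
    return {
        k: (slope_cat[k], intercept_cat[k])
        for k in slopes
        if (slope_cat[k] == "within") != (intercept_cat[k] == "within")
    }


# Hypothetical regression parameters per keyword.
slopes = {"a": 1.0, "b": 1.0, "c": 1.0, "d": 1.0, "e": 2.0}
intercepts = {"a": 1.0, "b": 2.0, "c": 3.0, "d": 2.0, "e": 2.0}

biased = biased_keywords(slopes, intercepts)
print(biased)
```

Keywords for which both parameters fall outside the section are not flagged by this rule, which follows the wording on the slide literally.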
a Japanese politician who presented papers on LGBT in magazines. The claims in these papers caused controversy.
• There is news about the possible introduction of Summer Time before the 2020 Summer Olympic Games in Tokyo.
• A 2-year-old boy went missing in the forest and was rescued by a volunteer.
[Table columns: politics (click, like), society (click, like)]
Upper (biased to male): House of Representatives, China Police Obscenity
Lower (biased to female): Sugita Mio, Summer Time, Cabinet, Olympics Child, Mother Boy, Crush, Mother, Children
on the user behavior log of a news application and extracted keywords with biased behavior.
• Using regression analysis, we identify biased keywords by how far their slope and intercept depart from the average values.
• Future work
  ◦ Verify whether these results are valid in light of social science knowledge.
  ◦ Discover topics that are strongly biased by users' interests rather than by user attributes.
  ◦ Create a measure that can extract biased keywords more simply.