A search mission is a related set of information needs, resulting in one or more goals.
[Figure: a session containing three missions]
Rosie Jones, Kristina Lisa Klinkner. Beyond the Session Timeout: Automatic Hierarchical Segmentation of Search Topics in Query Logs. CIKM 2008
A search goal is an atomic information need, resulting in one or more queries.
A search mission is a related set of information needs, resulting in one or more goals.
[Figure: diagram labeled with six goals]
Rosie Jones, Kristina Lisa Klinkner. Beyond the Session Timeout: Automatic Hierarchical Segmentation of Search Topics in Query Logs. CIKM 2008
Related Work: Identifying Goals and Missions
Rosie Jones, Kristina Lisa Klinkner. Beyond the Session Timeout: Automatic Hierarchical Segmentation of Search Topics in Query Logs. CIKM 2008
Intent Tagging
current word using CKIP (中文斷詞系統, Chinese Word Segmentation System)
• Yahoo! Life+ Title Match: whether the current N-gram (N = 1 to 7) appears in a Yahoo! Life+ title, using maximum matching
Intent Tagging
• Lexical: current word
• Syntactic: POS tag of the current word using CKIP (中文斷詞系統, Chinese Word Segmentation System)
• Yahoo! Life+ Title Match: whether the current N-gram (N = 1 to 7) appears in a Yahoo! Life+ title, using maximum matching
Example (公館 = Gongguan, a place name; 韓式燒肉 = Korean barbecue):
式 燒 肉 愛 樂 廚 房
• Syntactic: 公館 (N) 韓式燒肉 (N) 愛樂廚房 (N)
• Yahoo! Life+ Title Match: 公館韓式燒肉 (Y) 愛樂廚房 (Y)
Intent Tagging
Current N-gram (N = 1 to 7) in Yahoo! Life+ title using maximum matching
• Query: 公館韓式燒肉 = 公館 + 韓式燒肉 (Location)
• Yahoo Life+ Database: 基隆活海鮮餐廳, 金山鴨肉, 彰化肉圓, 萬巒豬腳大王
• Taiwan Road & Street Database
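The title match above relies on maximum matching. A minimal sketch of greedy forward maximum matching follows; the slides do not state the matching direction or how the databases are stored, so the forward scan, the function name `max_match`, and the small in-memory lexicon are all assumptions for illustration.

```python
def max_match(query, lexicon, max_len=7):
    """Greedy forward maximum matching (assumed variant).

    At each position, take the longest substring of up to max_len
    characters that is in the lexicon; otherwise emit one character.
    Returns a list of (segment, matched) pairs.
    """
    segments = []
    i = 0
    while i < len(query):
        # Try the longest candidate first, then shrink.
        for n in range(min(max_len, len(query) - i), 0, -1):
            piece = query[i:i + n]
            if piece in lexicon or n == 1:
                segments.append((piece, piece in lexicon))
                i += n
                break
    return segments


# Hypothetical lexicon standing in for the title and street databases.
lexicon = {"公館", "韓式燒肉"}
print(max_match("公館韓式燒肉", lexicon))
```

With this toy lexicon the query splits into 公館 and 韓式燒肉, which mirrors the decomposition shown on the slide.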
log 2008
                | Goals  | Missions
Total           | 1275   | 712
Avg queries per | 1.5699 | 2.8101
Max queries per | 7      | 11
Min queries per | 1      | 1
CRF++ 0.58 with 10-fold cross-validation
L = Lexical, Y = Yahoo! Life+ Title Match, S = Syntactic
Features        |                   |                   | + S               | L + Y + S
Class           | P     R     F1    | P     R     F1    | P     R     F1    | P     R     F1
IH All          | .8811 .8656 .8733 | .8780 .8676 .8728 | .8011 .7044 .7497 | .8874 .8688 .8780
IM:Location All | .9456 .9359 .9408 | .9419 .9385 .9402 | .7222 .6699 .6951 | .9530 .9400 .9465
IM:Type All     | .8457 .8660 .8557 | .8537 .8531 .8534 | .6517 .6714 .6614 | .8523 .8660 .8591
Statement Transition
Example (IM:T): q1 = 台北車站 LUCCA PASTA, q2 = 台北車站 義大利麵 (台北車站 = Taipei Main Station, 義大利麵 = pasta), transition = NEW
• INSERT: the target intent appears only in q2
• DELETE: the target intent has been deleted in q2
• NEW: the target intent has changed and the Jaccard distance is >= 0.5
• MODIFY: the target intent has changed and the Jaccard distance is < 0.5
• SAME: the target intent is identical in q1 and q2
• NONE: the target intent is tagged in neither q1 nor q2
Statement Transition
Example: IM:T = SAME, IM:L = SAME, IH = INSERT
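The six transition labels can be sketched as a small decision function. This is an illustration under stated assumptions, not the authors' implementation: the slides do not say which sets the Jaccard distance is computed over, so character sets are assumed here, and the names `jaccard_distance` and `transition` are hypothetical.

```python
def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| over character sets (assumed unit)."""
    sa, sb = set(a), set(b)
    union = sa | sb
    if not union:
        return 0.0
    return 1 - len(sa & sb) / len(union)


def transition(v1, v2, threshold=0.5):
    """Label how one intent slot changes between q1 and q2.

    v1, v2: tagged text of the slot in q1 and q2, or None if untagged.
    """
    if v1 is None and v2 is None:
        return "NONE"
    if v1 is None:
        return "INSERT"   # slot appears only in q2
    if v2 is None:
        return "DELETE"   # slot was present in q1 but not in q2
    if v1 == v2:
        return "SAME"
    # Slot changed: a large change is NEW, a small one is MODIFY.
    return "NEW" if jaccard_distance(v1, v2) >= threshold else "MODIFY"


print(transition("LUCCA PASTA", "義大利麵"))
print(transition(None, "台北車站"))
```

A different tokenization (for example, segmented words instead of characters) would shift which changes fall on either side of the 0.5 threshold.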
time = threshold as a binary feature (5 mins, 30 mins, 60 mins, 120 mins)
• time diff = inter-query time in seconds
• sequential-queries = binary feature that is positive if the queries are sequential in time, with no intervening queries from the same user
Temporal Features
Example queries: 築地鮮魚 台北 築地鮮魚 台北 blog 築地鮮魚 台北 blog 大大茶樓 南京 大大茶樓 南京 築地鮮魚 台北 blog
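A minimal sketch of how the temporal features might be computed for a query pair. The feature names in the returned dictionary and the direction of the threshold test (gap greater than the threshold) are assumptions; the slides only list the thresholds themselves.

```python
from datetime import datetime

# Thresholds named on the slide, in minutes.
THRESHOLDS_MIN = (5, 30, 60, 120)


def temporal_features(t1, t2, sequential):
    """Temporal features for a query pair issued at times t1 and t2.

    sequential: True if no other query from the same user falls
    between the two queries.
    """
    diff = (t2 - t1).total_seconds()
    feats = {
        "time_diff": diff,                      # inter-query time in seconds
        "sequential_queries": int(sequential),  # binary feature
    }
    for minutes in THRESHOLDS_MIN:
        # Assumed direction: 1 if the gap exceeds the threshold.
        feats[f"time_gt_{minutes}min"] = int(diff > minutes * 60)
    return feats


t1 = datetime(2008, 1, 1, 12, 0, 0)
t2 = datetime(2008, 1, 1, 12, 10, 0)
print(temporal_features(t1, t2, sequential=True))
```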
distance
• word_pov = number of characters in common starting from the left
• word_suf = number of characters in common starting from the right
• commonw = number of words in common
• wordr = Jaccard distance between the sets of words
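The four features defined in full above can be sketched as follows. Whitespace tokenization is an assumption (the example queries on the slides are space-separated), and the helper names are hypothetical.

```python
def common_prefix_len(a, b):
    """Number of leading characters that a and b share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def edit_features(q1, q2):
    """Word and character overlap features for a query pair."""
    w1, w2 = set(q1.split()), set(q2.split())
    union = w1 | w2
    shared = w1 & w2
    return {
        "word_pov": common_prefix_len(q1, q2),              # from the left
        "word_suf": common_prefix_len(q1[::-1], q2[::-1]),  # from the right
        "commonw": len(shared),
        "wordr": 1 - len(shared) / len(union) if union else 0.0,
    }


print(edit_features("築地鮮魚 台北", "築地鮮魚 台北 blog"))
```

For this pair the whole first query is a prefix of the second, so the left-anchored count is high while the right-anchored count is zero.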
Web Search Features
first 50 search results for the query terms
Yahoo! BOSS Search API, http://developer.yahoo.com/boss/search/
to predict vertical search query intent and to let the search engine know what users want.
Intent tagging to predict goal/mission boundaries
• It not only improves accuracy by 3% to 5% for both goal and mission boundary detection, but also tags each search query as IH, IM:Type, IM:Location, or Others in the vertical search domain.
of manual tagging is high, and we would like to auto-generate answers to reduce that cost.
B. Intent tagging to predict goal/mission boundaries
• Apply our method to other similar vertical search domains, for example automobiles and movies.
• Rosie Jones, Kristina Lisa Klinkner, Beyond the Session Timeout: Automatic Hierarchical Segmentation of Search Topics in Query Logs, CIKM 2008
• Xiao Li, Understanding the Semantic Structure of Noun Phrase Queries, ACL 2010
• Alexander Kotov, Paul N. Bennett, Ryen W. White, Susan T. Dumais, and Jaime Teevan, Modeling and Analysis of Cross-Session Search Tasks, SIGIR 2011
Backup slides
CRF++ 0.58 with 10-fold cross-validation
L = Lexical, Y = Yahoo! Life+ Title Match, S = Syntactic
L + S     | 0.8972 | 0.8826
Y + S     | 0.7827 | 0.7471
L + Y + S | 0.9038 | 0.8934
Goal boundary feature weight using fselect in libsvm
Feature       | Set                     | Weight
              | Character Edit          | 0.5103
wordr         | Word and Character Edit | 0.4660
word_pov      | Word and Character Edit | 0.4185
commonw       | Word and Character Edit | 0.3287
Prisma        | Web Search              | 0.1980
crf_ih_state  | Intent Tagging          | 0.1593
crf_imt_state | Intent Tagging          | 0.1302
crf_iml_state | Temporal                | 0.0835
crf_iml_state | Intent Tagging          | 0.0511
pq12          | Query Log Sequence      | 0.0279
in libsvm
Feature          | Set                     | Weight
wordr            | Word and Character Edit | 0.8284
commonw          | Word and Character Edit | 0.7014
lev              | Word and Character Edit | 0.6206
inter_query_time | Temporal                | 0.4788
word_pov         | Word and Character Edit | 0.4109
Prisma           | Web Search              | 0.1164
crf_iml_state    | Intent Tagging          | 0.0835
word_suf         | Word and Character Edit | 0.0348
crf_ih_state     | Intent Tagging          | 0.0291
entropy_q1_X     | Query Log Sequence      | 0.0183
boundary, Goal boundary: Predict X
• Although mission boundary detection scores higher than goal boundary detection, the result does not improve.
Problem Definition
A search goal is an atomic information need, resulting in one or more queries.
• A goal can be thought of as a group of related queries to accomplish a single discrete task.
• The queries need not be contiguous, but may be interleaved with queries from other goals.
Rosie Jones, Kristina Lisa Klinkner. Beyond the Session Timeout: Automatic Hierarchical Segmentation of Search Topics in Query Logs. CIKM 2008
Problem Definition: Mission
A search mission is a related set of information needs, resulting in one or more goals.
Rosie Jones, Kristina Lisa Klinkner. Beyond the Session Timeout: Automatic Hierarchical Segmentation of Search Topics in Query Logs. CIKM 2008
Related Work: Intent Tagging
• Semantic: current N-gram occurs in lexicon L (N = 1 to 4)
• Syntactic: POS tag of the current word
Xiao Li. Understanding the Semantic Structure of Noun Phrase Queries. ACL 2010
et al. Do you want to take notes? Identifying research missions in Yahoo! Search Pad. WWW 2010.
• Claudio Lucchese, Salvatore Orlando et al. Identifying Task-based Sessions in Search Engine Query Logs. WSDM 2011.
Intent Tagging Evaluation
Jin-Dong KIM, Tomoko OHTA et al. Introduction to the Bio-Entity Recognition Task at JNLPBA. Proceedings of JNLPBA '04.
Right boundary Right boundary
Jin-Dong KIM, Tomoko OHTA et al. Introduction to the Bio-Entity Recognition Task at JNLPBA. Proceedings of JNLPBA '04.
a class of statistical modeling methods often applied in pattern recognition and machine learning, where they are used for structured prediction.
CRF++, http://crfpp.googlecode.com/svn/trunk/doc/index.html
q1 | q2 | IH Real | IH Predict | IM:Location Real | IM:Location Predict | IM:Type Real | IM:Type Predict
燒鳥 台北 | 燒鳥串燒 台北 | MODIFY | MODIFY | SAME | SAME | NONE | DELETE
燒鳥串燒 台北 | 師大 早餐 | DELETE | DELETE | NEW | NEW | INSERT | INSERT
師大 早餐 | 師大 中式早餐 | NONE | NONE | SAME | SAME | MODIFY | MODIFY
中壢 拉麵 | 中壢 赤坂拉麵 | INSERT | INSERT | SAME | SAME | DELETE | DELETE
中壢 赤坂拉麵 | 中壢 赤坂拉麵 時間 | SAME | SAME | SAME | SAME | DELETE | NEW
中壢 拉麵 推薦 | 中壢 伊太郎 | INSERT | INSERT | SAME | SAME | DELETE | NEW
中壢 伊太郎 | 中壢 伊太郎 推薦 | SAME | SAME | SAME | SAME | NONE | DELETE
中壢 伊太郎 推薦 | 風車 故鄉餐廳 時間 | NEW | NEW | DELETE | NEW | NONE | INSERT
內湖 水鳥22 法式 小館 | 桃園 川菜 | DELETE | DELETE | NEW | NEW | INSERT | NEW
桃園 川菜 | 桃園 福利川菜 | INSERT | NONE | SAME | SAME | DELETE | MODIFY
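Confusion matrix using the L + Y + S feature combination (axes: predicted class, actual class)
              | IH B | IH I | IM:Location B | IM:Location I | IM:Type B | IM:Type I | O
IH B          | 871  | 41   | 56            | 1             | 57        | 8         | 38
IH I          | 40   | 3942 | 52            | 316           | 1         | 101       | 102
IM:Location B | 63   | 33   | 746           | 1             | 9         | 8         | 15
IM:Location I | 1    | 249  | 8             | 1689          | 0         | 14        | 31
IM:Type B     | 55   | 5    | 4             | 0             | 1309      | 16        | 7
IM:Type I     | 0    | 117  | 0             | 6             | 11        | 2332      | 18
O             | 31   | 161  | 31            | 45            | 21        | 41        | 4240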
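A per-tag F1 can be read off this matrix without knowing which axis is the predicted class, because F1 = 2 * diagonal / (row total + column total) is symmetric in precision and recall. The sketch below is an illustration only: it scores individual B/I/O tags, which is not the same measure as the P/R/F1 figures reported for IH, IM:Location, and IM:Type in the results, and the label strings and function name are assumptions.

```python
LABELS = ["IH-B", "IH-I", "IML-B", "IML-I", "IMT-B", "IMT-I", "O"]

# Counts copied from the slide, in the same row and column order.
MATRIX = [
    [871, 41, 56, 1, 57, 8, 38],
    [40, 3942, 52, 316, 1, 101, 102],
    [63, 33, 746, 1, 9, 8, 15],
    [1, 249, 8, 1689, 0, 14, 31],
    [55, 5, 4, 0, 1309, 16, 7],
    [0, 117, 0, 6, 11, 2332, 18],
    [31, 161, 31, 45, 21, 41, 4240],
]


def per_tag_f1(matrix, labels):
    """F1 per tag: 2 * diagonal / (row total + column total)."""
    scores = {}
    for i, label in enumerate(labels):
        row_total = sum(matrix[i])
        col_total = sum(row[i] for row in matrix)
        denom = row_total + col_total
        scores[label] = 2 * matrix[i][i] / denom if denom else 0.0
    return scores


for label, f1 in per_tag_f1(MATRIX, LABELS).items():
    print(f"{label}: {f1:.4f}")
```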
Related Work: Related Research
Same Task
1. Given a user query, identify all related queries from previous sessions that the user has issued
Task Continuation
the user will return to this task in the future
Alexander Kotov, Paul N. Bennett, Ryen W. White, Susan T. Dumais, and Jaime Teevan. Modeling and Analysis of Cross-Session Search Tasks. SIGIR 2011