Citation - Noll, M. G., & Yeung, A. (2009). Telling experts from spammers: Expertise ranking in folksonomies.
Translated title: Distinguishing experts from spammers, using folksonomies to rank expertise
Keyword - folksonomy, social network analysis
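Research subject: how users tag web resources within the del.icio.us online community
(Behavior) Tagging: freely annotating resources with keywords
(Purpose of the behavior): self-organizing resources, sharing, self-promotion, …
(Collaborative tagging platforms): web services and community platforms that let users annotate resources with their own keywords (e.g., delicious.com, flickr.com)
(Resulting phenomenon) Tagging produces a bottom-up "categorization" by end users, also known as a "folksonomy"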
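Current state of the problem: existing rankings rely only on quantity and frequency, so they cannot tell expert tagging apart from high-volume spam tagging.
Research goal: design a new algorithm, SPEAR (SPamming-resistant Expertise Analysis and Ranking), that distinguishes experts from spammers and thereby improves search relevance.
Design principles (research assumptions): a user's level of expertise in a particular topic is mainly determined by two factors. In the authors' words: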
We propose that the level of expertise of a user with respect to a particular topic is mainly determined by two factors: (1) there should be a relationship of mutual reinforcement between the expertise of a user and the quality of a resource; and (2) an expert should be one who tends to identify useful resources before other users discover them.
(1) The more expert the user, the higher the quality of the resources they share, and vice versa.
Mutual reinforcement of user expertise and document quality: Expert users tend to have many high quality documents, and high quality documents are tagged by users of high expertise.
(2) Experts discover useful resources earlier than other users do.
Discoverers vs. followers: Expert users are discoverers – they tend to be the first to bookmark and tag high quality documents, thereby bringing them to the attention of the user community. Think: researchers in academia.
(Research design) Algorithm design: a graph-based algorithm (one built on the network of relationships between users and resources)
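The two design principles can be combined in a small iterative scheme. What follows is only a minimal sketch, not the authors' implementation: these notes do not record the exact update rule, so the HITS-style alternating update, the square-root credit for earlier taggers, the iteration count, and the name spear_sketch are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def _normalize(scores):
    """Scale scores so they sum to 1 (left unchanged if the total is 0)."""
    total = sum(scores.values())
    return {k: v / total for k, v in scores.items()} if total else dict(scores)


def spear_sketch(bookmarks, iterations=50):
    """Hypothetical SPEAR-like ranking for ONE tag/topic.

    bookmarks: iterable of (user, document, timestamp); each (user, document)
    pair is assumed to appear at most once.
    Returns (expertise, quality) dicts, each summing to 1.
    """
    by_doc = defaultdict(list)
    for user, doc, ts in bookmarks:
        by_doc[doc].append((ts, user))

    # Principle 2 (discoverers vs. followers): earlier taggers of a document
    # earn more credit. ASSUMPTION: credit = sqrt(number of later taggers + 1).
    weight = {}
    for doc, entries in by_doc.items():
        entries.sort()  # timestamp ties fall back to user order (arbitrary)
        for pos, (_ts, user) in enumerate(entries):
            weight[(user, doc)] = math.sqrt(len(entries) - pos)

    expertise = {user: 1.0 for user, _doc in weight}
    quality = {doc: 1.0 for doc in by_doc}

    # Principle 1 (mutual reinforcement): expertise is fed by document
    # quality, and quality is fed by the expertise of the taggers.
    for _ in range(iterations):
        new_e = dict.fromkeys(expertise, 0.0)
        for (user, doc), w in weight.items():
            new_e[user] += w * quality[doc]
        expertise = _normalize(new_e)

        new_q = dict.fromkeys(quality, 0.0)
        for (user, doc), w in weight.items():
            new_q[doc] += w * expertise[user]
        quality = _normalize(new_q)
    return expertise, quality


# Toy data: alice tags both documents first, carol arrives last.
bookmarks = [
    ("alice", "d1", 1), ("bob", "d1", 2), ("carol", "d1", 3),
    ("alice", "d2", 1), ("bob", "d2", 2),
]
expertise, quality = spear_sketch(bookmarks)
print(sorted(expertise, key=expertise.get, reverse=True))  # ['alice', 'bob', 'carol']
```

In this toy data the earliest tagger ends up ranked highest even though alice and bob hold the same number of bookmarks, which is the behavior a pure frequency count cannot produce.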
(Evaluation and analysis): We carry out experiments on both simulated and real-world data sets obtained from Delicious, and show that SPEAR is able to detect the difference between different types of experts, and is more resistant to spammers than other methods.
Experimental design
Workaround: simulated users are inserted into real-world data from Delicious.com, and the experiment checks where they end up after ranking
The real-world data cover 50 tags from Delicious.com, comprising 515,000 real users, 71,300 real web pages, and 2,190,000 real bookmarks
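The insert-and-check procedure can be sketched in a few lines. This is an illustrative sketch only: the function names, the (user, document, timestamp) tuple layout, and the toy data are assumptions, and the ranker shown is the plain bookmark-count baseline that the problem statement describes, not SPEAR.

```python
from collections import Counter


def rank_by_frequency(bookmarks):
    """Baseline ranker: users ordered by bookmark count, most active first."""
    counts = Counter(user for user, _doc, _ts in bookmarks)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [user for user, _count in ordered]


def rank_position(real_bookmarks, simulated_bookmarks, simulated_user, ranker):
    """Merge simulated bookmarks into the real data, rank all users,
    and return the simulated user's 1-based position."""
    merged = list(real_bookmarks) + list(simulated_bookmarks)
    return ranker(merged).index(simulated_user) + 1


# Toy stand-in for the real data (hypothetical values).
real = [("u1", "d1", 1), ("u1", "d2", 2), ("u2", "d1", 3)]
flooder = [("flooder", "d1", 10), ("flooder", "d2", 11), ("flooder", "d3", 12)]
print(rank_position(real, flooder, "flooder", rank_by_frequency))  # 1
```

With a count-based ranker the late-arriving, high-volume user lands in first place, which is the weakness the experiment is designed to expose. Any other ranker with the same signature can be passed in to compare positions.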
Simulated-user variables: the simulation is probabilistic, and simulated users are generated with four parameters
P1: Number of the user's bookmarks – active or inactive user?
P2: Newness – fraction of web pages not already in the data set
P3: Time preference (when the user bookmarks a page) – discoverer or follower?
P4: Document preference (page quality) – high quality or low quality?
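One way the four parameters might drive a probabilistic generator is sketched below. The notes do not say how the parameters are combined, so everything here is an assumption: the names SimParams and simulate_user, the reading of P3 and P4 as probabilities, the rule that a discoverer's timestamp falls just before a page's first real bookmark, and the naming of new pages.

```python
import random
from dataclasses import dataclass


@dataclass
class SimParams:
    num_bookmarks: int        # P1: how many bookmarks the user creates
    newness: float            # P2: probability that a page is new to the data set
    discoverer_prob: float    # P3: probability of bookmarking before everyone else
    high_quality_prob: float  # P4: probability of picking a high quality page


def simulate_user(name, params, high_quality, low_quality, first_seen, now, rng):
    """Return hypothetical (user, document, timestamp) bookmarks.

    first_seen maps each existing page to the timestamp of its first real
    bookmark; `now` is the timestamp used for late (follower) bookmarks.
    The same page may be drawn more than once in this simple sketch.
    """
    bookmarks = []
    for i in range(params.num_bookmarks):
        if rng.random() < params.newness:
            # A page nobody else has bookmarked yet.
            bookmarks.append((name, f"{name}/new-page-{i}", now))
            continue
        use_high = rng.random() < params.high_quality_prob
        doc = rng.choice(high_quality if use_high else low_quality)
        early = rng.random() < params.discoverer_prob
        ts = first_seen[doc] - 1 if early else now
        bookmarks.append((name, doc, ts))
    return bookmarks


rng = random.Random(0)  # fixed seed so runs are repeatable
first_seen = {"good-1": 10, "good-2": 20, "bad-1": 30}
discoverer = SimParams(num_bookmarks=5, newness=0.0,
                       discoverer_prob=1.0, high_quality_prob=1.0)
for bookmark in simulate_user("sim", discoverer, ["good-1", "good-2"],
                              ["bad-1"], first_seen, now=100, rng=rng):
    print(bookmark)
```

Setting a probability to 0.0 or 1.0 makes that choice deterministic, which is convenient for checking that a simulated profile behaves as intended before mixing it into real data.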
Six types of users are distinguished:
Geek – lots of high quality documents, discoverer (Distinguished Researcher)
Veteran – high quality documents, discoverer (Professor)
Newcomer – high quality documents, follower (PhD student)
Flooder – lots of random documents, follower (found in Delicious)
Promoter – some documents (most are their own), discoverer (found in Delicious)
Trojan (ordinary netizen) – some documents, follower (next-gen spammer)
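The six profiles can be written down as a small lookup table, which makes it easy to see which traits they share. This is a sketch for illustration: the UserType structure, its field names, and the helper types_matching are hypothetical, and a trait is left as None wherever these notes do not state it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class UserType:
    volume: Optional[str]     # "lots", "some", or None if not stated
    documents: Optional[str]  # kind of pages bookmarked, or None if not stated
    timing: str               # "discoverer" or "follower"


USER_TYPES = {
    "Geek": UserType("lots", "high quality", "discoverer"),
    "Veteran": UserType(None, "high quality", "discoverer"),
    "Newcomer": UserType(None, "high quality", "follower"),
    "Flooder": UserType("lots", "random", "follower"),
    "Promoter": UserType("some", "mostly own", "discoverer"),
    "Trojan": UserType("some", None, "follower"),
}


def types_matching(**traits):
    """Names of the user types whose fields equal every given trait."""
    return [
        name for name, user_type in USER_TYPES.items()
        if all(getattr(user_type, key) == value for key, value in traits.items())
    ]


print(types_matching(timing="discoverer"))  # ['Geek', 'Veteran', 'Promoter']
print(types_matching(volume="lots"))        # ['Geek', 'Flooder']
```

The lookups show why no single trait is enough: the Promoter is a discoverer like the Geek and the Veteran, and the Flooder matches the Geek on volume, so a ranking has to weigh timing, volume, and document quality together.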
Comparison of the effectiveness of three different algorithms