study:noll_m._g._yeung_a._2009_._telling_experts_from_spammers [2018/08/27 02:22]
study:noll_m._g._yeung_a._2009_._telling_experts_from_spammers [2018/08/27 02:22] (current)
Line 1: Line 1:
 +== Telling experts from spammers: Expertise ranking in folksonomies / Noll & Yeung (2009) ==
 +**Citation** - Noll, M. G., & Yeung, A. (2009). Telling experts from spammers: Expertise ranking in folksonomies.
 +**Keyword** - [[:folksonomy]], [[:social network analysis]]
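 +* Object of study: tagging of web resources within the del.icio.us online community
 +** (Behavior) Tagging: freely annotating resources with keywords
 +** (Purpose of the behavior): self-organizing resources, sharing, self-promotion, ...
 +** (Collaborative tagging platforms): web services and community platforms that let users annotate resources with their own keywords (e.g., delicious.com, flickr.com)
 +*** (Uses of the platform): searching for relevant resources, searching for experts in a particular domain
 +** (Resulting phenomenon): bottom-up “categorization” by end users, a.k.a. “folksonomy”
 +* Problem statement: current ranking relies only on counts and frequency, so it cannot distinguish expert tagging from high-volume spam tagging.
 +* Research goal: design a new algorithm, SPEAR (SPamming-resistant Expertise Analysis and Ranking), that distinguishes experts from spammers and thereby improves search relevance.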
 +* Design principles (research assumptions): \\ We propose that the level of expertise of a user with respect to a particular topic is mainly determined by two factors: (1) there should be a relationship of mutual reinforcement between the expertise of a user and the quality of a resource; and (2) an expert should be one who tends to identify useful resources before other users discover them.
 +## (1) The more expert the user, the higher the quality of the resources they share. \\ **Mutual reinforcement of user expertise and document quality**: Expert users tend to have many high quality documents, and high quality documents are tagged by users of high expertise.
 +## (2) Experts find useful resources earlier than other users. \\ **Discoverers vs. followers**: Expert users are discoverers – they tend to be the first to bookmark and tag high quality documents, thereby bringing them to the attention of the user community. Think: researchers in academia.
 +* (Research design) Algorithm design: graph-based algorithm
 +** Builds on IR research that uses expert identification to improve retrieval relevance. The approach resembles citation analysis.
 +* (Evaluation): We carry out experiments on both simulated and real-world data sets obtained from Delicious, and show that SPEAR is able to detect the difference between different types of experts, and is more resistant to spammers than other methods.
 +* SPEAR – SPamming-resistant Expertise Analysis and Ranking
 +** Based on the HITS (Hypertext Induced Topic Search) algorithm
 +*** Hubs: pages that point to many good pages
 +*** Authorities: pages that are pointed to by many good pages
 +** Expertise and quality are analogous to hubs and authorities
 +*** Users (experts) are hubs – we find useful pages through them
 +*** High-quality pages are authorities – they provide relevant information
 +** Difference from HITS: only users (hubs) can point to documents (authorities); the relationship cannot be reversed.
 +* Algorithm
 +* Experimental design
 +** Workaround: insert simulated users into real-world data from Delicious.com and check where they end up after ranking
 +** Data cover 50 tags from Delicious.com, comprising 515,000 real users, 71,300 real pages, and 2,190,000 real bookmarks
 +** Simulated-user variables: probabilistic simulation, with simulated users generated from four parameters
 +*** P1: Number of the user’s bookmarks – active or inactive user?
 +*** P2: Newness – fraction of Web pages not already in the data set
 +*** P3: Time preference – discoverer or follower?
 +*** P4: Document preference – high quality or low quality?
 +** Six user types are distinguished
 +*** Geek – lots of high quality documents, discoverer (Distinguished Researcher)
 +*** Veteran – high quality documents, discoverer (Professor)
 +*** Newcomer – high quality documents, follower (PhD student)
 +*** Flooder – lots of random documents, follower (found in Delicious)
 +*** Promoter – some documents (most are their own), discoverer (found in Delicious)
 +*** Trojan – a few documents, follower (next-gen spammer)
 +** The performance of three algorithms is compared
 +*** SPEAR
 +*** HITS
 +*** FREQ, a frequency-count ranking algorithm
 +* Results
 +** SPEAR distinguishes the three types of spammers more effectively than the other two algorithms.
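The two design principles (mutual reinforcement, and discoverers before followers) can be sketched in a few lines of Python. This is only an illustration for this note, not the authors' implementation: the function names `spear` and `freq`, the toy bookmarks, the square-root damping of the time credit, and the fixed iteration count are all assumptions. Each user's credit for a page grows with the number of users who bookmarked it later, and expertise and quality are then updated HITS-style, with links running only from users to pages.

```python
import math
from collections import Counter, defaultdict


def normalize(scores):
    """Scale scores so they sum to 1 (left unchanged if the sum is 0)."""
    total = sum(scores.values())
    return {k: v / total for k, v in scores.items()} if total else scores


def spear(bookmarks, iterations=50):
    """bookmarks: iterable of (user, document, timestamp), one per user/document pair.

    Returns (expertise, quality) dictionaries. Hypothetical sketch only.
    """
    # Collect who bookmarked each document, and when.
    by_doc = defaultdict(list)
    for user, doc, ts in bookmarks:
        by_doc[doc].append((ts, user))

    # Discoverer credit: 1 + number of later bookmarkers, damped by sqrt
    # (the damping function is an assumption of this sketch).
    weight = {}
    for doc, events in by_doc.items():
        events.sort()
        n = len(events)
        for pos, (_, user) in enumerate(events):
            weight[(user, doc)] = math.sqrt(n - pos)

    users = {u for u, _ in weight}
    expertise = dict.fromkeys(users, 1.0)
    quality = dict.fromkeys(by_doc, 1.0)

    # Mutual reinforcement: expertise from document quality, then
    # quality from user expertise; only users point to documents.
    for _ in range(iterations):
        new_e = dict.fromkeys(users, 0.0)
        for (u, d), w in weight.items():
            new_e[u] += w * quality[d]
        new_q = dict.fromkeys(by_doc, 0.0)
        for (u, d), w in weight.items():
            new_q[d] += w * new_e[u]
        expertise = normalize(new_e)
        quality = normalize(new_q)
    return expertise, quality


def freq(bookmarks):
    """Baseline: rank users purely by how many bookmarks they have."""
    return Counter(user for user, _, _ in bookmarks)


# Toy data: "veteran" finds d1 and d2 first; "flooder" arrives last
# and also bookmarks five pages nobody else cares about.
bookmarks = [
    ("veteran", "d1", 1), ("u1", "d1", 2), ("u2", "d1", 3),
    ("u3", "d1", 4), ("flooder", "d1", 5),
    ("veteran", "d2", 1), ("u1", "d2", 2), ("u2", "d2", 3),
    ("flooder", "d2", 4),
] + [("flooder", f"junk{i}", 6 + i) for i in range(5)]

expertise, quality = spear(bookmarks)
print(sorted(expertise, key=expertise.get, reverse=True))
print(freq(bookmarks).most_common())
```

In this toy data the flooder has the most bookmarks and so tops the frequency count, while the SPEAR-style scores put the early discoverer first.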
 +== Note ==
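 +By definition, this paper conflates folksonomy with collaborative tagging. This may be theoretically contentious, but it is acceptable if folksonomy is defined as a phenomenon. The authors' folksonomy is closer to a collaborative-tagging graph.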