Telling experts from spammers: Expertise ranking in folksonomies / Noll & Yeung (2009)

Citation - Noll, M. G., & Yeung, A. (2009). Telling experts from spammers: Expertise ranking in folksonomies.
(Literal gloss of the title: Distinguishing experts from spammers: expertise rating by means of folksonomies)

Keyword - folksonomy, social network analysis

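  • Object of study: tagging of web resources within the del.icio.us online community
    • (Behavior) tagging: freely annotating resources with keywords
    • (Purpose of the behavior): self-organizing resources, sharing, self-promotion, …
    • (Collaborative tagging platforms): web services and community platforms that let users annotate resources with their own keywords (e.g., delicious.com, flickr.com)
      • (Uses of the platform): searching for relevant resources, searching for experts in a particular domain
    • (Resulting phenomenon): bottom-up “categorization” by end users, aka “folksonomy”
  • Current state of the problem: existing rankings rely only on quantity and frequency, so they cannot tell expert tagging apart from high-volume spam tagging
  • Research goal: design a new algorithm, SPEAR (SPamming-resistant Expertise Analysis and Ranking), that distinguishes experts from spammers and thereby improves search relevance.
  • Design principles (research assumptions): a user's level of expertise on a given topic is mainly determined by: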
    We propose that the level of expertise of a user with respect to a particular topic is mainly determined by two factors: (1) there should be a relationship of mutual reinforcement between the expertise of a user and the quality of a resource; and (2) an expert should be one who tends to identify useful resources before other users discover them.
    1. The more expert a user is, the higher the quality of the resources they share:
      Mutual reinforcement of user expertise and document quality: Expert users tend to have many high quality documents, and high quality documents are tagged by users of high expertise.
    2. Experts discover useful resources earlier than other users:
      Discoverers vs. followers: Expert users are discoverers – they tend to be the first to bookmark and tag high quality documents, thereby bringing them to the attention of the user community. Think: researchers in academia.
  • (Research design) Algorithm design: graph-based algorithm
    • Draws on prior IR research that uses expert identification to improve retrieval relevance; the approach resembles citation analysis.
  • (研究檢驗分析) : We carry out experiments on both simulated and real-world data sets obtained from Delicious, and show that SPEAR is able to detect the difference between different types of experts, and is more resistant to spammers than other methods.
  • SPEAR – SPamming-resistant Expertise Analysis and Ranking
    • Based on the HITS (Hypertext Induced Topic Search) algorithm
      • Hubs: pages that point to many good pages
      • Authorities: pages that are pointed to by many good pages
    • Expertise and quality are analogous to hubs and authorities
      • Users (experts) are hubs – we find useful pages through them
      • High-quality pages are authorities – they provide relevant information
    • Difference from HITS: only users (experts) can point to documents (authorities); the relationship cannot be reversed.
  • Algorithm
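These notes do not record the algorithm itself. As a rough illustration of the two design principles (mutual reinforcement, and discoverers before followers), here is a minimal Python sketch of a HITS-style iteration over a user-to-document graph. The function name `spear_sketch`, the `(user, doc, timestamp)` input format, the square-root credit with its `+1`, and the sum-to-one normalization are all assumptions made for this example, not details taken from the notes.

```python
import math
from collections import defaultdict


def spear_sketch(bookmarks, iterations=50):
    """Rank users and documents for a single tag (illustrative sketch only).

    bookmarks: iterable of (user, doc, timestamp); each user is assumed to
    bookmark a given doc at most once.
    Returns (expertise, quality), two dicts whose values each sum to 1.
    """
    by_doc = defaultdict(list)
    for user, doc, ts in bookmarks:
        by_doc[doc].append((ts, user))

    # Edge weight user -> doc. ASSUMPTION: earlier taggers earn more credit,
    # damped by a square root so that being first is not overwhelming.
    credit = {}
    for doc, entries in by_doc.items():
        for ts, user in entries:
            later = sum(1 for other_ts, _ in entries if other_ts > ts)
            credit[(user, doc)] = math.sqrt(1 + later)

    if not credit:
        return {}, {}

    expertise = {user: 1.0 for user, _ in credit}
    quality = {doc: 1.0 for doc in by_doc}

    for _ in range(iterations):
        # Users (hubs) collect score from the documents they point to.
        new_expertise = dict.fromkeys(expertise, 0.0)
        for (user, doc), c in credit.items():
            new_expertise[user] += c * quality[doc]
        # Documents (authorities) collect score from the users pointing to them.
        new_quality = dict.fromkeys(quality, 0.0)
        for (user, doc), c in credit.items():
            new_quality[doc] += c * new_expertise[user]
        # Normalize so the scores stay bounded across iterations.
        e_total = sum(new_expertise.values())
        q_total = sum(new_quality.values())
        expertise = {u: v / e_total for u, v in new_expertise.items()}
        quality = {d: v / q_total for d, v in new_quality.items()}

    return expertise, quality


if __name__ == "__main__":
    demo = [
        ("alice", "d1", 1), ("bob", "d1", 2), ("carol", "d1", 3),
        ("alice", "d2", 1), ("bob", "d2", 2), ("carol", "d2", 3),
    ]
    expertise, quality = spear_sketch(demo)
    print(sorted(expertise, key=expertise.get, reverse=True))
```

In the demo all three users hold the same two bookmarks, so a pure count cannot separate them; under the assumed credit function the earliest tagger ends up with the highest expertise score. Edges run only from users to documents, matching the one-directional relationship noted above.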
  • Experimental design
    • Workaround: insert simulated users into real-world data from Delicious.com and check where they end up after ranking
    • Compared across 50 tags from Delicious.com, covering 515,000 real users, 71,300 real web pages, and 2,190,000 real bookmarks
    • Probabilistic simulation: simulated users are generated with four parameters
      • P1: Number of user’s bookmarks – active or inactive user?
      • P2: Newness – fraction of Web pages not already in data set
      • P3: Time preference – discoverer or follower?
      • P4: Document preference – high quality or low quality?
    • Six simulated user types
      • Geek – lots of high quality documents, discoverer (Distinguished Researcher)
      • Veteran – high quality documents, discoverer (Professor)
      • Newcomer – high quality documents, follower (PhD student)
      • Flooder – lots of random documents, follower (found in Delicious)
      • Promoter – some documents (most are his own), discoverer (found in Delicious)
      • Trojan (ordinary netizen) – a few documents, follower (next-gen spammer)
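To make the four parameters concrete, the following Python sketch shows one way a simulated user could be generated from them. Everything here is hypothetical: the `SimProfile` fields, the `simulate_user` signature, the way "high quality" pages are supplied as a ready-made pool, and the timestamp handling are illustrative assumptions, and the notes give no actual parameter values for the six user types.

```python
import random
from dataclasses import dataclass


@dataclass
class SimProfile:
    """Hypothetical container for the four simulation parameters."""
    num_bookmarks: int        # P1: how active the user is
    newness: float            # P2: probability a bookmark is a page not in the data set
    discoverer_prob: float    # P3: probability of bookmarking before everyone else
    high_quality_prob: float  # P4: probability of picking from the high-quality pool


def simulate_user(name, profile, high_pool, low_pool, span, rng):
    """Return a list of (user, page, timestamp) bookmarks for one simulated user.

    span maps each existing page to its (first, last) bookmark timestamp.
    A page is kept at most once, so the result may hold fewer than
    profile.num_bookmarks entries.
    """
    chosen = {}
    for i in range(profile.num_bookmarks):
        if rng.random() < profile.newness:
            # A brand-new page; the timestamp 0 is only a placeholder.
            chosen[f"{name}-new-{i}"] = 0
            continue
        use_high = rng.random() < profile.high_quality_prob
        page = rng.choice(high_pool if use_high else low_pool)
        first, last = span[page]
        # Discoverers land just before the first real bookmark,
        # followers just after the last one.
        is_discoverer = rng.random() < profile.discoverer_prob
        chosen.setdefault(page, first - 1 if is_discoverer else last + 1)
    return [(name, page, ts) for page, ts in chosen.items()]


if __name__ == "__main__":
    span = {"h1": (10, 20), "h2": (5, 9), "l1": (1, 2)}
    # Illustrative values only, not taken from the paper.
    geek_like = SimProfile(num_bookmarks=5, newness=0.0,
                           discoverer_prob=1.0, high_quality_prob=1.0)
    print(simulate_user("geek", geek_like, ["h1", "h2"], ["l1"], span,
                        random.Random(0)))
```

Passing an explicit `random.Random` instance keeps runs reproducible, which matters when the same simulated users are to be ranked by several algorithms.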
    • Compares the performance of three algorithms
      • SPEAR
      • HITS
      • FREQ (frequency count ranking algorithm)
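The notes do not spell out how FREQ is computed. Assuming it simply ranks users by how many bookmarks they hold for a tag, a baseline could look like the sketch below; `freq_rank` and the `(user, doc, timestamp)` tuple format are hypothetical names chosen for this example.

```python
from collections import Counter


def freq_rank(bookmarks):
    """Rank users by bookmark count for one tag, most active first.

    bookmarks: iterable of (user, doc, timestamp); the timestamp is ignored,
    which is exactly why this baseline cannot reward early discovery.
    """
    counts = Counter(user for user, _doc, _ts in bookmarks)
    return [user for user, _count in counts.most_common()]


if __name__ == "__main__":
    demo = [
        ("flooder", "x1", 9), ("flooder", "x2", 9), ("flooder", "x3", 9),
        ("expert", "d1", 1),
    ]
    print(freq_rank(demo))
```

Under this assumption a user who bookmarks many pages always outranks a selective one, which is the weakness the notes attribute to ranking by quantity and frequency.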
  • Results
    • SPEAR distinguishes the three types of spammer more effectively than the other two algorithms

Note

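The paper's definitions conflate folksonomy with collaborative tagging. This may be theoretically contentious, but it is acceptable if folksonomy is defined as a phenomenon. The authors' folksonomy is closer to a collaborative-tagging graph.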