The Bag of Communities: Identifying Abusive Behavior Online with Preexisting Internet Data

Title

The Bag of Communities: Identifying Abusive Behavior Online with Preexisting Internet Data
CHI '17

Creator

Eshwar Chandrasekharan
Mattia Samory
Anirudh Srinivasan
Eric Gilbert

Abstract

Since its earliest days, harassment and abuse have plagued the Internet. Recent research has focused on in-domain methods to detect abusive content and faces several challenges, most notably the need to obtain large training corpora. In this paper, we introduce a novel computational approach to address this problem called Bag of Communities (BoC), a technique that leverages large-scale, preexisting data from other Internet communities. We then apply BoC toward identifying abusive behavior within a major Internet community. Specifically, we compute a post's similarity to 9 other communities from 4chan, Reddit, Voat and MetaFilter. We show that a BoC model can be used on communities "off the shelf" with roughly 75% accuracy; no training examples are needed from the target community. A dynamic BoC model achieves 91.18% accuracy after seeing 100,000 human-moderated posts, and uniformly outperforms in-domain methods. Using this conceptual and empirical work, we argue that the BoC approach may allow communities to deal with a range of common problems, like abusive behavior, faster and with fewer engineering resources.

Date

2017

Is Part Of

Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems

Publisher

New York, NY, USA
ACM

pages

3175–3187

Language

EN English

doi

10.1145/3025453.3026018

isbn

978-1-4503-4655-9

short title

The Bag of Communities
