Using R to Visualize Information Flows on Wikipedia Talk Pages

Posted on Mar 7, 2012 in Data Visualization, Information Visualization, R, r-project | 3 comments

Wikipedia talk pages allow editors to discuss the evolving content of the related Wikipedia articles. Sometimes the topic of a page is controversial and the talk page threads become heated, with different posts appealing to a wide range of values in their arguments. For example, in one thread someone might argue that it is morally wrong to expose people to specific content, while others argue in favor of posting the content on the grounds that Wikipedia’s mission is to provide free access to information. For a social scientist interested in visualizing information flows, the question is: how do you visualize the change over time in overall thread volume and posts per thread, while also capturing which threads are rich in value-based appeals?

In this case I used a stacked spline approach, where the overall height gives you an idea of both the total threads and the total posts on any given day. For example, discussion on the talk page was heaviest between the 2nd and the 5th, and then started to diminish. That was also the most active period in terms of the number of threads that people were posting to. To get an idea of the number of posts per thread, look at the vertical distance between the spline curves. Around the 19th, a large red bulge indicates that one thread received nearly all the posts for that day, and probably the most nearly simultaneous posts of any thread we looked at.
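A stacked spline chart along these lines can be sketched in base R. This is only a minimal illustration on randomized data, not the code or data behind the graph described above: the variable names, the Poisson-distributed post counts, the 200-point grid, and the color palette are all assumptions made for the example.

```r
set.seed(42)  # randomized stand-in data, not the original talk-page data

days <- 1:30
n_threads <- 4

# Hypothetical daily post counts: one row per day, one column per thread.
posts <- matrix(rpois(length(days) * n_threads, lambda = 3),
                nrow = length(days), ncol = n_threads)

# Smooth each thread's daily counts with an interpolating spline,
# clamping at zero because a spline can dip below the axis.
grid <- seq(min(days), max(days), length.out = 200)
smooth <- sapply(seq_len(n_threads), function(j) {
  pmax(spline(days, posts[, j], xout = grid)$y, 0)
})

# Stack the smoothed curves: column j of `upper` is the top edge of band j,
# and column j of `lower` is its bottom edge.
upper <- t(apply(smooth, 1, cumsum))
lower <- cbind(0, upper[, -n_threads, drop = FALSE])

# Draw each band as a filled polygon between its lower and upper edge.
plot(range(grid), c(0, max(upper)), type = "n",
     xlab = "Day", ylab = "Posts (stacked by thread)")
cols <- heat.colors(n_threads)  # here the colors only tell threads apart
for (j in seq_len(n_threads)) {
  polygon(c(grid, rev(grid)), c(lower[, j], rev(upper[, j])),
          col = cols[j], border = NA)
}
```

In this sketch the top edge of the highest band is the total smoothed post volume for the day, and the thickness of each band is one thread's share of it. Mapping the fill color to a per-thread score instead of the thread index would be one way to highlight particular threads.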

We capture value richness by taking the number of appeals in a thread and dividing it by the number of posts. This is an imperfect measure, in part because a single post can contain many different types of arguments (appeals). If we had a thread with 7 posts, only one of which contained appeals, and that post contained 7 appeals, we would have a density of 1. We would think we have a “hot topic” when in fact we may just have one person being argumentative. But when we add thread labels, the measure still gives us an idea of which threads might be most interesting to look at.
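The 7-post example works out as follows in R. The vector of per-post appeal counts is made up to match the example, and the second measure (the share of posts containing any appeal) is an added assumption, not part of the measure described above. It is included only to show how the "one argumentative person" case could be spotted.

```r
# Hypothetical appeal counts for one thread with 7 posts:
# six posts contain no appeals and one post contains 7.
appeals_per_post <- c(0, 0, 0, 7, 0, 0, 0)

# Value richness as described above: total appeals / number of posts.
appeal_density <- function(appeals) sum(appeals) / length(appeals)

thread_density <- appeal_density(appeals_per_post)

# Complementary check (an assumption, not the original measure):
# the fraction of posts that contain at least one appeal.
share_with_appeals <- mean(appeals_per_post > 0)

print(thread_density)       # 1
print(share_with_appeals)   # 1/7, about 0.143
```

A density of 1 combined with a low share of posts containing appeals suggests that the appeals are concentrated in a few posts and not spread across the thread.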

This graph was generated using R, the open source analysis program, without any special packages. Let me know if you have questions. I’ll be happy to post example code if there is interest!

 
That’s all for now. You can contact me on Twitter @JeffHemsley. Happy to answer any questions.
 

3 Comments

  1. 1-15-2013

    Could you?? Post sample code? This looks great!!!

    • 1-15-2013

      I am actually working on that right now! The data for it is not something I want to share, so I’m working up an example using randomized data (but that hopefully looks similar).

      • 1-16-2013

        Ok. I have most of the code in order and will write the post in a week (I have another in the pipe before it).
