If you’ve read any of my previous posts, you know that I am constantly experimenting with different ways to represent and explore social network data with R. For example, in previous posts I’ve written about sonification of tweet data, animation of dynamic Twitter networks, and various ways to plot social networks (here and here). In each case the underlying idea is to find different ways to explore data, under the assumption that sometimes just looking at something from a different point of view reveals something novel. In this post I will briefly discuss how to go from data, to a 3D network model, to a 3D object, using R most of the way.
For the first time, Facebook has finally released a report that allegedly discloses government requests for data worldwide. The report details (among other things) the name of each country, the number of requests, and the percentage of requests for which Facebook disclosed data.
It is NICE that Facebook finally publishes a report about government data requests. OK, it is important!
Google, for example, has been doing this regularly since 2009 with the Google Transparency Report, a report that is more detailed and gives users more information (as it should).
“Transparency and trust are core values at Facebook,” the report says. But now, after Snowden-Gate, we know that certain governments (say, the US) have direct access to data from Facebook and other big companies.
Shouldn’t Facebook disclose information about this as well? How much data, and what kind of data, is extracted on a daily basis by governments (say, the US)? Of course, Facebook is not alone here. The same request can be made of Google, Yahoo, Apple, Microsoft, you name it.
Publishing a report and pretending it is transparency is a good way to mask relevant information that should be accessible to users.
What does a tweet sound like? Not the kind that flies around in the air, but the kind that zips to and from our mobile devices. I’m intensely interested in finding ways to make sense of data. Sonification of data – representing data with sound – offers one way to do that. This post steps through R code to take the text of tweets and turn them into short chirping sounds. It also uses different tones for different users so that each user has a “voice”. In other words, this post shows how to use R to make Twitter data sing.
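The post’s actual R code is behind the link; purely as an illustration of the mapping it describes, here is a minimal Python sketch. The function names and the hash-based pitch assignment are my own assumptions, not the post’s implementation: each user gets a stable base tone (a “voice”), and each character of a tweet becomes a short pitch offset above it.

```python
import hashlib

def user_base_freq(screen_name, low=220.0, high=440.0):
    # Derive a stable base pitch ("voice") for a user by hashing the
    # screen name into the range [low, high) Hz.
    h = int(hashlib.md5(screen_name.encode("utf-8")).hexdigest(), 16)
    return low + (h % 1000) / 1000.0 * (high - low)

def tweet_to_freqs(text, base):
    # Map each character to a semitone step above the user's base tone,
    # yielding one short "chirp" frequency per character.
    return [base * 2 ** ((ord(c) % 25) / 12.0) for c in text]

chirps = tweet_to_freqs("hello #ows", user_base_freq("somelab"))
```

Feeding a frequency list like this into any sine-wave synthesizer (the tuneR package on the R side, for instance) produces the chirping playback.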
Robert Mason, Shawn Walker, and Jeff Hemsley participated in the University of Washington Information School’s iAffiliates Day, “an event that fosters new partnerships and showcases the innovative work being done at the iSchool. The event is an unconference format with the theme of discovering information partnerships”. Participants give a two-minute lightning talk intended to “enlighten, inspire, educate, or otherwise engage the audience” about a given topic. Jeff chose to “otherwise engage” the audience with a two-minute rap about data visualization. Read more for the full text.
One difficulty with which social media researchers grapple is the separation of “noise” from “signal.” Noise traditionally comprises those data that are not relevant to a given query – in this case, tweets about Occupy Wall Street, or more specifically, Occupy Oakland. Occupy Oakland took on the hashtag “#oo” fairly early into the occupations, which has created a headache for those of us at the SoMe Lab. Continuing my exploration of topic modeling, and taking inspiration from digital humanists and the practices of ’pataphysics, I decided to explore the noise to see what it contains. After all, one person’s noise is another’s signal! Click through to read about #oo, emotions, tokenization, and linguistic difference, to see our first topic modeling visualization, and to figure out what Ducktales has to do with #ows!
For social media researchers today, much of our research is like the flight data recorder: we collect, store, and report data and analyses, but we follow the dictum stamped on the outside and “do not open” the box.
We’re discovering this is a mistake.
By keeping the black box closed, we can create a misleading impression when we report our research results. We inhibit others from replicating our findings or testing the limits of our results if we do not fully disclose the details of our processes. We may also miss the chance to ask new research questions if we pass up opportunities to explore the data by testing the sensitivity of our findings to changes in our research procedures. There are some things we can do from the outside, using approaches borrowed from systems theory and systems analysis, but all of us will improve our research as we make our methods more visible…as we open up the black box.
Let’s look at some examples. In conducting research with social media data, it’s helpful to think about the sequential ETL steps in data warehousing systems. In following these steps, we: Extract (data from streams or sources), Transform (the data by parsing it and adding metadata that enable us to address our research questions), and Load (the transformed data into an accessible dataset). And these are just the first steps, before we begin our analysis. At each step, small variations in the procedures or rules we use can result in significant shifts in our later findings, in the questions we are capable of answering, and even in the questions we can imagine asking.

For example, suppose we want to analyze Twitter messages. In extracting Twitter data, do we use the Twitter API? If so, do we collect the data in real time (streaming API) or do we employ queries (search API), retrieving some tweets retrospectively? If we opt not to use the API, we could use one of several developer-based or commercial services (e.g., Gnip) to get our data, but can we afford it? Each approach may have advantages, but the samples that result from each may differ. And if the samples differ, can we be confident in our research results in each case?
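As a concrete (and hypothetical) illustration of the Transform step, the sketch below parses one raw tweet payload into a flat record and attaches provenance metadata. The top-level field names (`id_str`, `text`, `created_at`, `user.screen_name`) follow the Twitter REST API of the era; the `_collected_via` key is an assumed marker that our own Extract step would stamp on each record, not a Twitter field.

```python
import json
from datetime import datetime, timezone

def transform(raw_json):
    # Transform step: keep only the fields our research question needs,
    # plus metadata recording how and when the row entered the dataset.
    t = json.loads(raw_json)
    return {
        "id": t["id_str"],
        "user": t["user"]["screen_name"],
        "text": t["text"],
        "created_at": t["created_at"],
        "source_api": t.get("_collected_via", "unknown"),  # streaming vs. search
        "loaded_at": datetime.now(timezone.utc).isoformat(),
    }

raw = json.dumps({
    "id_str": "123", "text": "marching now #oo",
    "created_at": "Mon Sep 24 03:35:21 +0000 2012",
    "user": {"screen_name": "somelab"},
    "_collected_via": "streaming",
})
row = transform(raw)
```

Because the streaming and search APIs can return different samples for the same query, recording the collection route at load time is what later lets us test whether our findings are sensitive to it.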
In The Practice of Everyday Life, Michel de Certeau describes the process of “walking the city,” noting that the ways in which people experience the city are qualitatively different from what urban planners and sociologists are capable of measuring. I argue that this process of “walking a space” can be applied to the spaces of social media as well, particularly in regard to the spaces of discourse created by emergent hashtags. I’m also playing with MALLET, a tool for Latent Dirichlet Allocation (LDA) topic modeling for “big data” texts. I’m just getting started in the process of learning some of the computational tools needed for performing these “distant readings,” but already I’ve discovered ways in which “walking the data” might inform our practice as researchers. Click through to read an explanation of what I mean, an example or two of MALLET topic output, and how my own experience of “walking the data” as a lived event informs the analysis.
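MALLET itself is a Java toolkit and the post does not show its preprocessing; purely to illustrate the kind of tokenization that precedes any “distant reading,” here is a small Python sketch. The stopword list and the regex are my own simplifications, not MALLET’s actual pipeline:

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "rt"}

def tokenize(text):
    # Lowercase, keep words and hashtags, drop common stopwords --
    # roughly what MALLET's import step does before topic modeling.
    tokens = re.findall(r"#?\w+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]

def term_counts(docs):
    # Corpus-wide term frequencies: a first "distant reading" of
    # what a hashtag stream actually contains.
    counts = Counter()
    for doc in docs:
        counts.update(tokenize(doc))
    return counts

counts = term_counts(["RT marching downtown #oo", "The police moved in #oo"])
```

Even this crude frequency table is a form of “walking the data”: before fitting topics, you see which tokens actually dominate a hashtag stream.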