Published: July 14, 2016

When disaster strikes, those affected often turn to social media to request aid, offer assistance, or share other information in real time. In recent years, data scientists have begun analyzing millions of Facebook posts and tweets in order to study the collective response before, during and after a crisis. 

In the face of this mountain of information, however, it can be hard to identify the most relevant posts and trends. But thanks to a close collaboration between social science and software engineering, University of Colorado Boulder researchers Leysia Palen and Kenneth Anderson are developing new ways to find the underlying human behaviors hidden within noisy data.

“The trick is understanding the potential of large-volume social media information along with its limits,” says Palen, chair of the Department of Information Science in the College of Media, Communication and Information at CU Boulder. “Just because we have a lot of data doesn’t mean that we have all the answers.”

Palen’s research centers on “crisis informatics,” a relatively new interdisciplinary field that investigates how people use technology to communicate and coordinate during natural disasters and other upheavals. The work combines intensive qualitative and quantitative analysis to sort through the staggering amounts of data posted to social media accounts in the wake of a crisis.

“In these disaster scenarios, the volume of information is far too large to sift through manually,” says Anderson, a professor of computer science and associate dean for education at CU Boulder’s College of Engineering and Applied Science. “The Hurricane Sandy data set from 2012, for example, consisted of approximately 220 million tweets and we used that core data set to study many aspects of that disaster.”

In an article recently published in the journal Science, Palen and Anderson write that data sets often have to expand (e.g., by incorporating a person’s last hundred tweets rather than only the single tweet that contained a keyword of interest) before they can be sampled and studied. In other words, the haystack has to get bigger in order to truly locate the needle.

The reason for that, says Palen, is to fill in contextual gaps in the data. Researchers can home in on relevant posts by searching for particular terms, but they often miss other important information that way. A person might tweet once about “Hurricane Sandy,” for instance, but then continue their train of thought across several subsequent tweets without mentioning “hurricane” or “sandy” again. A keyword search would overlook that potentially crucial follow-up.
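
To make the contrast concrete, here is a minimal sketch of the idea in Python. It is an illustration only, not the researchers' actual pipeline: the sample tweets, the keyword list, and the function names (`keyword_search`, `expand_with_context`) are all hypothetical, and the per-user limit of 100 simply echoes the "last hundred tweets" example mentioned in the article.

```python
from collections import defaultdict

# Hypothetical sample data: (user, timestamp, text). Not real tweets.
TWEETS = [
    ("ana", 1, "Hurricane Sandy is getting close to us"),
    ("ana", 2, "Power just went out on our block"),
    ("ana", 3, "Heading to my sister's place inland"),
    ("ben", 1, "Great game last night"),
    ("cy", 1, "Stay safe everyone, sandy looks bad"),
]

# Assumed keywords of interest for this illustration.
KEYWORDS = {"hurricane", "sandy"}


def matches_keyword(text, keywords=KEYWORDS):
    """Return True if any whole word in the text is a keyword."""
    words = {w.strip(".,!?\"'").lower() for w in text.split()}
    return bool(words & keywords)


def keyword_search(tweets):
    """Plain keyword filter: keeps only tweets that mention a keyword."""
    return [t for t in tweets if matches_keyword(t[2])]


def expand_with_context(tweets, per_user_limit=100):
    """Grow the data set: for every user with at least one keyword
    match, keep up to their last `per_user_limit` tweets."""
    by_user = defaultdict(list)
    for t in tweets:
        by_user[t[0]].append(t)

    matched_users = {t[0] for t in tweets if matches_keyword(t[2])}

    expanded = []
    for user in sorted(matched_users):
        timeline = sorted(by_user[user], key=lambda t: t[1])
        expanded.extend(timeline[-per_user_limit:])
    return expanded


narrow = keyword_search(TWEETS)
wide = expand_with_context(TWEETS)
```

In this toy example, the keyword filter keeps only the two tweets that name the storm, while the expanded set also picks up the follow-up tweets about the power outage and the move inland, which never mention “hurricane” or “sandy.”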

The researchers note that viewing the data through a social science lens can shine a light on interesting corners of disaster response, such as the people who donated cell phone minutes to first responders on the ground during the 2010 Haiti earthquake, or those who set up a Facebook group to help families reunite with lost pets after Hurricane Sandy. A series of geotagged posts might identify people who have evacuated, even if they never use the word “evacuate” in any of their posts.

“You might start down the road with one question in mind and then realize halfway through that the more interesting behavioral aspect is completely different, so then you recalibrate,” says Anderson.

The researchers have also found that social media analysis without a social science context tends to erroneously smooth over key differences between various types of disasters, such as a hurricane versus a terrorist attack.

“One big mistake is conflating different types of disasters and assuming that the social media response will be the same,” says Palen. “The way that people interact online is very different for different events, which is especially important to know for law enforcement purposes.”

Parsing this amount of data will always be challenging, notes Anderson, for a variety of reasons that include a relative lack of geotagged tweets and posts (due to default platform settings); people using older versions of social media apps (which send information in disparate formats); and identifying a truly representative sample of the population involved in any given disaster.

Going forward, Palen and Anderson plan to continue improving the predictive capabilities of their model, shrinking the time window between an event and the social media analysis so that public safety officials and emergency response personnel can respond faster and more effectively.

Overall, both researchers agree that those affected by disasters demonstrate consistent innovation and resourcefulness.

“I’m constantly surprised by people’s ingenuity in these situations,” says Palen. “You can really see people coordinating volunteer efforts and inventing solutions to problems in real time. That’s how you really know that people are altruistic, smart, and always trying to help themselves and each other in times of need.”