
Data dump: Meta killed CrowdTangle. What does it mean for researchers, reporters?

By Joe Arney

In Brian C. Keegan’s telling, the loss of tools like CrowdTangle and Pushshift—which allow researchers to study user behavior and how information is shared on social media—is like particle physicists one day waking up to find out they can no longer access the Large Hadron Collider.

“I have grad students interested in how online extremism works, the consequences of political polarization, whether content moderation is actually effective at stopping hate speech,” said Keegan, an assistant professor of information science at the College of Media, Communication and Information at the University of Colorado Boulder. “To be able to understand questions like these requires access to data from these platforms—and restricting it imperils our ability to be impactful in our work.”

Earlier this month, Meta announced it was shutting down CrowdTangle, one of the most effective tools for understanding how Facebook and Instagram’s algorithms work and how disinformation is created and spread on the company’s platforms.

That’s a blow to researchers, watchdogs and journalists, who will be less able to track how disinformation, hate speech and other poisons pollute the social media atmosphere. From a business standpoint, however, Meta has strong financial and reputational incentives to obscure its operations. Not only is the company sitting on mountains of data that can be licensed to firms training generative artificial intelligence models, but, Keegan said, “it’s easy to imagine a world where Meta doesn’t want its name attached to a paper about how neo-Nazis are using Facebook groups to organize themselves.”

The economic case for ‘privacy washing’

It’s becoming a more common story, as platforms that once made their data public increasingly erect paywalls, block APIs or cut deals with A.I. companies. Often, those platforms mask their motivations behind what Keegan calls “privacy washing,” citing concerns about safeguarding user data to justify removing features that research labs, newsrooms and the public rely on.

Meta’s decision comes at an inauspicious time, with digital disinformation ratcheting up ahead of Election Day and more Americans than ever getting their news from social media.

“To address the challenges we’re up against, that are happening in real time, that we see journalists trying to grapple with, requires different models of publicly engaged scholarship, beyond just academic papers that take a year or two to publish,” Keegan said. “The loss of these data tools imperils our ability to do that kind of scholarship and is ultimately a detriment to democracy and civic institutions.”

It’s not just the media or the public at large that are affected. Taking these tools offline also hurts the quality of online communities. Keegan has volunteered as a moderator on Reddit, and said Pushshift, to which Reddit restricted access beginning last summer, was vital for building context about a user’s behavior, helping moderators determine whether someone was simply having a bad day or was truly a bad actor.

Classroom impact

That’s a challenge for a moderator, but the loss is having a bigger impact on his professional life, both as a researcher and as a teacher. He can use case studies from the 2016 U.S. presidential election cycle to show how fake news circulated, and the role of actors like Cambridge Analytica, “but that data and those strategies are now eight years old, and those contexts no longer exist; we’re in a different world now,” Keegan said. “Can we prepare our students to be better engineers, managers, artists and citizens with such old case studies?”

Meta purchased CrowdTangle in 2016, and Keegan acknowledged that the company isn’t required to make its data publicly available. “But researchers have built our careers, infrastructure and programs on assumptions that we’d have access to these tools, so to have that rug pulled from under us has been profoundly disruptive to our ability to provide transparency, engage and ask critical questions,” he said.

Keegan hopes to learn more through a grant he’s pursuing from the National Science Foundation. If awarded, the grant would let him study how decisions like Meta’s affect the scientific research community.

“When that data disappears, how does that impact scholarship?” he asked. “Can we measure how research methods are changing, the way we collaborate, the strategies we’ll need to develop to make sure we’re able to ask critical questions?”