By Dr. Vidya Narayanan, et al.*
Postdoctoral Researcher
Computational Propaganda Project
Oxford Internet Institute

Abstract

What kinds of social media users read junk news? We examine the distribution of the most significant sources of junk news in the three months before President Donald Trump’s first State of the Union Address. Drawing on a list of sources that consistently publish political news and information that is extremist, sensationalist, conspiratorial, masked commentary, fake news and other forms of junk news, we find that such content is unevenly spread across the ideological spectrum. We demonstrate that (1) on Twitter, a network of Trump supporters shares the widest range of known junk news sources and circulates more junk news than all the other groups put together; (2) on Facebook, extreme hard right pages, distinct from Republican pages, share the widest range of known junk news sources and circulate more junk news than all the other audiences put together; (3) on average, the audiences for junk news on Twitter share a wider range of known junk news sources than audiences on Facebook’s public pages.

Polarization on Social Media

Social media has become an important source of news and information in the United States. An increasing number of users consider platforms such as Twitter and Facebook a source of news. At important moments of political and military crises, social media users not only share substantial amounts of professional news, but also share extremist, sensationalist, conspiratorial, masked commentary, fake news and other forms of junk news.[1,2]

News on social media also reaches users indirectly, when they browse social media for other purposes. With more than 2 billion monthly active users, Facebook is the most popular social media network. The Reuters Digital News Report 2017 finds that 71% of US respondents are on Facebook, with 48% of US respondents using it for news.[3]

Given the central role that social media play in public life, these platforms have become a target for propaganda campaigns and information operations. In its review of the recent US elections, Twitter found that more than 50,000 automated accounts were linked to Russia.[4] Facebook has revealed that content from the Russian Internet Research Agency reached 126 million US citizens before the 2016 presidential election.[5] Adding to reports about foreign influence campaigns, there is increasing evidence of a rise in polarization in the US news landscape in response to the 2016 election. Trust in news is strikingly divided across ideological lines, and an ecosystem of alternative news is flourishing, fueled by extremist, sensationalist, conspiratorial, masked commentary, fake news and other forms of junk news. At the same time, legacy publishers like the New York Times and the Washington Post have reported an increase in subscriptions.

Social media algorithms can be purposefully used to distribute polarizing political content and misinformation. Pariser claims that filter bubble effects (highly personalized algorithms that select what information to show in news feeds based on user preferences and behavior) have polarized public life.[6] Vicario et al. find that misinformation on social media spreads among homogeneous and polarized groups.[7] In January 2018, Facebook announced changes to its algorithm to prioritize trustworthy news, responding to ongoing public debate as to whether its algorithms promote junk content.[8] Consequently, social polarization is a driver, just as much as it may be a result, of polarized social media news consumption patterns.

In this study, we present a three-month analysis of junk news and political polarization among groups of US Twitter and Facebook users. In particular, we examine the distribution of posts and comments on public pages that contain links to junk news sources, across the political spectrum in the US. We then map the influence of central sources of junk political news and information that regularly publish content on hot button issues in the US. Specifically, we consider patterns of interaction between accounts that have (i) shared junk news and (ii) engaged with users who disseminate large amounts of misinformation about major political issues.

Social Network Mapping

Visualizing social network data is a powerful way of understanding how people share information and associate with one another. By using selected keywords, seed accounts, and known links to particular content, it is possible to construct large network visualizations. The underlying networks of these visualizations can then be examined to find communities of accounts and clusters of association. These clusters of accounts and content can then be coded with political attributes based on knowledge of account history, content type, association metrics and social interaction between accounts.

These social network maps provide insight into both social structure and flow of information. In this study, we use the Graphika visualization suite to map and code accounts that are associated with prominent political accounts, topics, political affiliations, and geographical areas. Social network mapping also allows us to catalogue users and content, and to generate both descriptive statistics and statistical models that explain changes in network structure and, by extension, in information flow over time.

Social network maps comprise nodes representing the individual accounts, which are connected to other nodes in the map via social relationships. A Fruchterman–Reingold visualization algorithm can be used to represent the patterns of connection between these nodes.[9] It arranges the nodes in a visualization through a centrifugal force that pushes nodes to the edge and a cohesive force that pulls strongly connected nodes together. This mapping process produces focused “segments” of users who share very similar and specific kinds of content with each other. Segments that share some content with each other are aggregated into “groups”.

The nodes in a network may all belong to a group with a shared pattern of interests. These groups can be constructed from a number of geographically, culturally, or socially similar segments. For example, segments of House Democrats, Democratic Party, Left-leaning NGOs, Liberal and anti-GOP pages, and Liberal Memes could be collectively labeled as a “Democratic Party Group”. This method of segmenting users, coding groups, and generating broad observations about association is an iterative process drawing on qualitative, quantitative and computational methods. These methods are run repeatedly over time to identify stable and consistent communities in a network of social media users.

To create a map of segments and groups, we use a bipartite graph to provide a structural similarity metric between nodes in the map, which is used in combination with a clustering algorithm to segment the map into distinct communities. For this study, hierarchical agglomerative clustering was used to automatically generate segments and groups from sampled data (see online supplement for details).
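The segmentation step described above can be illustrated with a minimal sketch. It is not the Graphika implementation: the similarity metric (Jaccard overlap of followed accounts), the average-linkage rule, the `min_similarity` threshold, and the toy data are all assumptions chosen for illustration.

```python
from itertools import combinations

# Hypothetical toy data: each account maps to the set of accounts it follows.
# Accounts on one side and followed accounts on the other form a bipartite graph.
follows = {
    "a1": {"dem_party", "nyt", "wapo"},
    "a2": {"dem_party", "nyt"},
    "a3": {"gop", "fox", "breitbart"},
    "a4": {"gop", "fox"},
}

def jaccard(x, y):
    """Structural similarity: shared followed accounts over all followed accounts."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0

def average_linkage(c1, c2, data):
    """Mean pairwise similarity between the members of two clusters."""
    sims = [jaccard(data[u], data[v]) for u in c1 for v in c2]
    return sum(sims) / len(sims)

def agglomerate(data, min_similarity=0.3):
    """Merge the two most similar clusters until no pair reaches the threshold."""
    clusters = [frozenset([node]) for node in sorted(data)]
    while len(clusters) > 1:
        best_pair = max(
            combinations(clusters, 2),
            key=lambda pair: average_linkage(pair[0], pair[1], data),
        )
        if average_linkage(best_pair[0], best_pair[1], data) < min_similarity:
            break
        clusters = [c for c in clusters if c not in best_pair]
        clusters.append(best_pair[0] | best_pair[1])
    return sorted(sorted(c) for c in clusters)

segments = agglomerate(follows)
print(segments)
```

On this toy input the two left-leaning accounts and the two right-leaning accounts end up in separate segments, because accounts within each pair follow mostly the same sources while the pairs share none.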

Different social media platforms have their own unique attributes that are effective in identifying communities that persist over time. For instance, clustering Twitter users by following and follower relationships yields much more stable communities than clustering by mention or retweet relationships. Likewise, clustering Facebook users by the “like” relationship yields similarly stable results. Therefore, for this study, we have used these attributes to generate maps of stable clusters on Twitter and Facebook.

The outputs of this clustering algorithm have been extensively tested by others in studies of social media maps from Iran, Russia and the United States.[2,10,11] After clustering, the map-making process uses supervised machine learning techniques to generate labels for segments and groups from a training set labeled by human experts. After these labels are assigned, they are then manually verified and checked for accuracy and consistency.

Study Sample and Method

For this study, a seed of known propaganda websites across the political spectrum was used, drawing from a sample of 22,117,221 tweets collected during the US election, between November 1-11, 2016. (The full seed list is in the online supplement and available as a standalone spreadsheet.) We identified sources of junk news and information, based on a grounded typology. Sources of junk news deliberately publish misleading, deceptive or incorrect information purporting to be real news about politics, economics or culture. This content includes various forms of extremist, sensationalist, conspiratorial, masked commentary, fake news and other forms of junk news. For a source to be labeled as junk news it must fall in at least three of the following five domains:

Professionalism: These outlets do not employ the standards and best practices of professional journalism. They refrain from providing clear information about real authors, editors, publishers and owners. They lack transparency and accountability, and do not publish corrections on debunked information.

Style: These outlets use emotionally driven language with emotive expressions, hyperbole, ad hominem attacks, misleading headlines, excessive capitalization, unsafe generalizations and fallacies, moving images, graphic pictures and mobilizing memes.

Credibility: These outlets rely on false information and conspiracy theories, which they often employ strategically. They report without consulting multiple sources and do not employ fact-checking methods. Their sources are often untrustworthy and their standards of news production lack credibility.

Bias: Reporting in these outlets is highly biased and ideologically skewed, which is otherwise described as hyper-partisan reporting. These outlets frequently present opinion and commentary essays as news.

Counterfeit: These outlets mimic professional news media. They counterfeit fonts, branding and stylistic content strategies. Commentary and junk content is stylistically disguised as news, with references to news agencies and credible sources, and headlines written in a news tone, with bylines, date, time and location stamps.
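The "at least three of five domains" rule can be written down as a small check. The sketch below is only an illustration of that threshold: the `SourceCoding` class, the function name, and the two example sources are hypothetical, and the judgment of whether a source falls in each domain is assumed to come from human coders.

```python
from dataclasses import dataclass, fields

@dataclass(frozen=True)
class SourceCoding:
    """A coding of one source; True means the source falls in that domain."""
    professionalism: bool
    style: bool
    credibility: bool
    bias: bool
    counterfeit: bool

def is_junk_news(coding, threshold=3):
    """Apply the rule from the text: at least three of the five domains must hold."""
    hits = sum(getattr(coding, f.name) for f in fields(coding))
    return hits >= threshold

# Hypothetical examples, not sources from the study.
hyper_partisan_blog = SourceCoding(True, True, False, True, False)
opinionated_outlet = SourceCoding(False, True, False, True, False)
print(is_junk_news(hyper_partisan_blog), is_junk_news(opinionated_outlet))
```

A source that meets only the Style and Bias criteria, for instance, would not be labeled junk news under this rule.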

Sources of junk news were evaluated and reevaluated in a rigorously iterative coding process. A team of 12 trained coders, familiar with the US political and media landscape, labeled sources of news and information based on a grounded typology. The Krippendorff’s alpha value for inter-coder reliability among three executive coders, who developed the grounded typology, was 0.805. The 91 sources of political news and information, which we identified over the course of several years of research and monitoring, produce content that includes various forms of propaganda and ideologically extreme, hyper-partisan, and conspiratorial political information. We tracked how the URLs to these websites were being shared over Twitter and Facebook (see online supplement for details).

Specifically, we computed the coverage and consistency scores for each group. The coverage of a group refers to the percentage of all propaganda domains identified in our junk news sources list that the group posted links to. The consistency of a group refers to the percentage of the total number of links to all the propaganda domains identified in our junk news sources list that are shared by the group. A high value for coverage shows that the group is sharing a wide range of propaganda, while a high value for consistency shows that the group is playing a key role in the spreading of such propaganda. Coverage and consistency scores were calculated from the number of links shared from the groups to the junk news sources.
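A minimal sketch of the two scores follows, assuming the input is simply a list of linked junk-news domains per group (one entry per shared link). The function name, the data layout, and the example domains are hypothetical and not taken from the study's data.

```python
from collections import Counter

def coverage_and_consistency(links_by_group, junk_domains):
    """Return {group: (coverage %, consistency %)}.

    links_by_group maps a group name to the domains it linked to,
    with one list entry per shared link.
    """
    junk_domains = set(junk_domains)
    counts = {
        group: Counter(d for d in links if d in junk_domains)
        for group, links in links_by_group.items()
    }
    total_links = sum(sum(c.values()) for c in counts.values())
    scores = {}
    for group, c in counts.items():
        # Coverage: share of the watch list that the group linked to at least once.
        coverage = 100 * len(c) / len(junk_domains)
        # Consistency: the group's share of all junk-news links in the sample.
        consistency = 100 * sum(c.values()) / total_links if total_links else 0.0
        scores[group] = (coverage, consistency)
    return scores

# Hypothetical watch list and link counts, not data from the study.
watch_list = ["junk-a.example", "junk-b.example", "junk-c.example", "junk-d.example"]
links = {
    "Group X": ["junk-a.example"] * 5 + ["junk-b.example"] * 2 + ["junk-c.example"],
    "Group Y": ["junk-a.example", "junk-d.example"],
}
scores = coverage_and_consistency(links, watch_list)
print(scores)
```

Here Group X links to three of the four listed domains (coverage 75%) and accounts for eight of the ten links (consistency 80%). Consistency scores sum to 100% across groups, which is why a single group scoring above 50% shares more junk news than all other groups combined.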

Finding: Polarization and Junk News on Twitter

Our Twitter dataset contains 13,477 Twitter users collected during a 90-day period between October 20, 2017 and January 18, 2018. To study the polarization among US audience groups on Twitter, we first identified the accounts of Democratic and Republican party members, at both state and national levels. Further, we identified Twitter accounts of members of Congress from both parties. Next, we included all the followers of these accounts in our dataset. We identified a follower network of 93,711 Twitter accounts. We then reduced this sample of Twitter users to a set of well-connected accounts using a variant of k-core reduction (see online supplement for details).[12] This reduced the dataset to 13,477 Twitter users. Finally, we collected all Twitter users followed by any account in the reduced set of Twitter users, in order to segment this set into communities of interest.
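The study uses a variant of k-core reduction that is described only in the online supplement. As a rough illustration of the underlying idea, the sketch below implements standard k-core pruning on an undirected graph: accounts with fewer than k connections are removed repeatedly until every remaining account has at least k. The graph and the value of k are hypothetical.

```python
def k_core(adjacency, k):
    """Return the nodes that survive repeated removal of nodes with
    fewer than k neighbours (standard k-core pruning, undirected graph)."""
    remaining = {node: set(neigh) for node, neigh in adjacency.items()}
    changed = True
    while changed:
        changed = False
        for node in list(remaining):
            if len(remaining[node]) < k:
                # Drop the node and remove it from its neighbours' lists.
                for neighbour in remaining.pop(node):
                    if neighbour in remaining:
                        remaining[neighbour].discard(node)
                changed = True
    return set(remaining)

# Hypothetical undirected graph: a triangle plus one weakly connected account.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
core = k_core(graph, 2)
print(sorted(core))
```

With k = 2, the account "d" is pruned because it has a single connection, leaving the well-connected triangle.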

We used Twitter’s REST API to collect publicly available data for our analysis. Twitter’s REST API provides data on a) who follows whom on Twitter (100% of all data), and b) recent tweets for each user (up to 3,200 tweets per user in reverse chronological order).

Twitter’s APIs give access only to public data and do not provide any information about suspended accounts or users who set their accounts to private. The latter limitation is not a concern here, given that 100% of Twitter users in this study have public accounts.[13]

We were able to group our sample of 13,477 user accounts into 10 groups of affiliation. The groups emerged through network association, and by interpretation of the kinds of content these users distributed and indicated as a “favorite”. Table 1 identifies the main groupings of US Twitter users sampled, as labeled by our iterative machine-learning process and expert manual review.

From Table 1, we see that the Trump Support Group has a coverage of 96%, indicating that those accounts share the widest range of junk sources on Twitter. This is followed by the Conservative Media Group, with a coverage of 95%. We also see from Table 1 that the Trump Support Group, with a consistency score of 55%, contributes more to the spreading of junk news than all other groups put together.

Next, we calculated a heterophily score for each combination of group pairings. This is a measure of the connections between groups in a network: the ratio of the actual number of ties between two groups to the number of ties expected between them if all the ties in the network were distributed evenly. We calculate ties for groups on Twitter from follower accounts and accounts followed, and Facebook ties from page likes. The natural log of the ratios is then taken along with a zero correction to create a balanced index. A high heterophily score between groups indicates more connections between the two groups. A high heterophily score for a group to itself indicates a high number of within-group connections. It is important to note, however, that these scores indicate only first-order (direct) connections between groups, and not second, third, or higher-order (indirect) connections. These values are shown in Table 2.
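The core of this score is the ratio of observed to expected ties. The sketch below computes only that ratio, under the assumption that "distributed evenly" means ties spread in proportion to group sizes. The log transform and zero correction are not specified in the text, so they are left out. All names and data are hypothetical.

```python
from collections import Counter

def tie_ratios(group_of, ties):
    """Ratio of observed to expected directed ties for each ordered group pair.

    Assumption: expected(i, j) = total_ties * n_i * n_j / N**2, i.e. ties are
    spread in proportion to group sizes. The study's exact definition may differ.
    """
    sizes = Counter(group_of.values())
    n_accounts = len(group_of)
    observed = Counter((group_of[src], group_of[dst]) for src, dst in ties)
    total = len(ties)
    ratios = {}
    for gi, ni in sizes.items():
        for gj, nj in sizes.items():
            expected = total * ni * nj / n_accounts ** 2
            ratios[(gi, gj)] = observed[(gi, gj)] / expected if expected else 0.0
    return ratios

# Hypothetical accounts and follow ties, written as (follower, followed).
group_of = {"u1": "A", "u2": "A", "u3": "B", "u4": "B"}
ties = [("u1", "u2"), ("u2", "u1"), ("u3", "u4"), ("u1", "u3")]
ratios = tie_ratios(group_of, ties)
print(ratios)
```

In this toy network group A has twice as many internal ties as an even spread would predict, while no account in B follows an account in A, so that pair scores zero. A zero ratio is one reason a correction is needed before any logarithm is taken.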

From Table 2, we see that the Democratic Party Group and the Mainstream Media Group have a heterophily index of 1.7, indicating a deep connection between the two groups. A heterophily score of 1.0 would indicate a perfectly neutral level of connection between groups; less than 1.0 would indicate a lack of connection. Similarly, we see that the Republican Party Group shares a heterophily index of 1.6 with the Conservative Media Group, indicating strong interactions between them. The Democratic Party Group also shares a high heterophily index of 1.9 with the Progressive Movement Group, demonstrating significant interaction. The Mainstream Media Group also shares a high heterophily score with both the Progressive Movement (1.5) and the Resistance (1.2) Groups. The Republican Party and Trump Support Groups share a heterophily score of 1.4, also indicating a strong connection between them.

Figure 1 is a basic visualization of the 10 groups on Twitter. The size of each group is determined by the number of Twitter accounts that belong to it (see Table 1). The connections between the groups in the figure are computed using the heterophily scores (see Table 2). The width of the line linking groups in the figure represents the strength of connection between them.

Finding: Polarization and Junk News on Public Facebook Pages

We mapped the public Facebook pages by combining: 1) harvested Facebook public page seeds from political tweets shared during the US election and a snowball sample of the wider Facebook network around these key online interest groups; 2) a snowball sample of all the Facebook pages associated with party Twitter accounts considered for the Twitter study; 3) iteration of clear US Liberal and Conservative clusters from previous US political maps on Facebook.

This resulted in a dataset of 47,719 public Facebook pages. We then reduced this sample to a set of well-connected pages using a variant of k-core reduction (see online supplement for details). From this reduced dataset of 10,691 pages, we collected all posts from the 90 days between October 20, 2017 and January 19, 2018, using the Facebook Graph API. We extracted all URLs from posts, and analyzed the pattern of web citations across the major groupings we identified in the US news ecosystem on Facebook. Additionally, we collected the share counts for all posts containing the identified URLs from our seed list in order to measure the degree to which junk news content from various sources is shared across the Facebook network. This value includes shares that occur on private pages.
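The URL extraction and matching step can be sketched as follows. This is an illustration only: the regular expression, the domain normalization (dropping a leading "www."), and the example posts and watch list are assumptions, and the study's actual matching rules are not described in the text.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Simple pattern for http(s) URLs; stops at whitespace, angle brackets or parentheses.
URL_PATTERN = re.compile(r"https?://[^\s<>()]+")

def domain_of(url):
    """Lower-cased host name without a leading 'www.'."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host

def count_watch_list_links(posts, watch_list):
    """Count links per watch-list domain across a list of post texts."""
    watch = {d.lower() for d in watch_list}
    counts = Counter()
    for text in posts:
        for url in URL_PATTERN.findall(text):
            domain = domain_of(url)
            if domain in watch:
                counts[domain] += 1
    return counts

# Hypothetical posts and watch list, not data from the study.
posts = [
    "Read this: http://www.junk-a.example/story?id=1 and https://news.example/report",
    "Shocking! https://junk-a.example/other (via https://junk-b.example/x)",
]
counts = count_watch_list_links(posts, ["junk-a.example", "junk-b.example"])
print(counts)
```

This simple matcher compares whole host names, so links through other subdomains or URL shorteners would be missed unless they were resolved first.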

Table 3 identifies the main groupings of the US Facebook pages sampled. The Facebook groups were identified by following the same procedure that we used for the Twitter dataset.

From the coverage and consistency scores in Table 3, we see that the Hard Conservatives Group has a coverage score of 91%, followed by the Military and Guns Group at 45% and then the Conspiracy Group and Democrats Group at 40%. The Hard Conservatives Group also has a consistency score of 58%, indicating that this group has a greater share in the distribution of junk news on Facebook than all the other groups put together.

The heterophily scores for each pair of Facebook groups are shown in Table 4. We see that the heterophily score between the Conspiracy Group and almost all other groups is less than 1.0, indicating a low level of social interaction. The two key exceptions are the Libertarians Group at 2.5 and the Occupy Group at 1.0. These scores show that the Conspiracy Group is most connected to the fringes of the US political spectrum. Further, we observe that the Hard Conservatives and the Libertarians Groups also interact closely with each other (heterophily score of 2.2).

Figure 2 is a basic visualization of the 13 groups on Facebook. The size of each group is determined by the number of Facebook pages that belong to it (see Table 3). The connections between the groups in the figure are computed using the heterophily scores (see Table 4). The width of the lines linking groups in the figure represents the strength of connection between them.

Conclusions

On Twitter, the Trump Support Group shares 96% of the junk news sites on the watch list, and accounted for 55% of junk news traffic in the sample. Other kinds of audiences shared content from these junk news sources, but at much lower levels. On Facebook, the Hard Conservatives Group shares 91% of the junk news sites on the watch list, and accounted for 58% of junk news traffic in the sample. The coverage and consistency scores for Facebook and Twitter reveal some important features of these platforms when it comes to junk news circulation. The average coverage score for the major audiences of junk news on Twitter and Facebook is 54% and 33%, respectively. This means that, on average, groups of Twitter users share 54% of the junk news watch list and groups of Facebook users share 33%.

The social networks mapped from public Twitter and Facebook data show that junk political news and information was concentrated among Trump’s supporters. The two main political parties, Democrats and Republicans, prefer different sources of political news, with limited overlap. For instance, the Democratic Party Group shows high levels of engagement with mainstream media sources and the Republican Party Group with the Conservative Media Group. On Twitter in particular, the Democratic Party Group has interacted closely with the Progressive Movement Group, suggesting a broad intersection of interests. On Facebook, most connections between groups conform to the partisan polarization found on Twitter. We also find close interactions between the Occupy Group and the Conspiracy Group.

References

1. Howard, P. N., Kollanyi, B., Bradshaw, S. & Neudert, L. M. Social Media, News and Political Information during the US Election: Was Polarizing Content Concentrated in Swing States? (Oxford Internet Institute, Oxford University, 2017).
2. Barash, V., Howard, P. N., Kelly, J. & Kollanyi, B. Junk News on Military Affairs and National Security: Social Media Disinformation Campaigns Against US Military Personnel and Veterans. (Oxford Internet Institute, Oxford University, 2017).
3. Newman, N., Fletcher, R., Kalogeropoulos, A., Levy, D. & Kleis Nielsen, R. Reuters Institute Digital News Report 2017. (Reuters Institute, Oxford University, 2017).
4. Twitter Policy. Update on Twitter’s Review of the 2016 U.S. Election. (2018). Available at: https://blog.twitter.com/official/en_us/topics/company/2018/2016-election-update.html. (Accessed: 30 January 2018)
5. Isaac, M. & Wakabayashi, D. Russian Influence Reached 126 Million Through Facebook Alone. The New York Times (2017).
6. Hempel, J. Eli Pariser Predicted the Future. Now He Can’t Escape It. WIRED (2017). Available at: https://www.wired.com/2017/05/eli-pariserpredicted-the-future-now-he-cant-escape-it/.
7. Vicario, M. D. et al. The Spreading of Misinformation Online. PNAS 113, 554–559 (2016).
8. Bond, S. Facebook to prioritise news rated trustworthy by users. Financial Times (2018). Available at: https://www.ft.com/content/2d303634-fd65-11e7-9b32-d7d59aace167. (Accessed: 20 January 2018)
9. Fruchterman, T. M. & Reingold, E. M. Graph drawing by force-directed placement. Software: Practice and Experience 21, 1129–1164 (1991).
10. Kelly, J. & Etling, B. Mapping Iran’s Online Public: Politics and Culture in the Persian Blogosphere. 1–36 (Berkman Center for Internet & Society, 2008).
11. Kelly, J., Barash, V., Alexanyan, K., Etling, B., Faris, R., Gasser, U. & Palfrey, J. Mapping Russian Twitter. (Berkman Center for Internet & Society, 2012). SSRN: https://ssrn.com/abstract=2028158
12. Alvarez-Hamelin, J. I., Dall’Asta, L., Barrat, A. & Vespignani, A. k-core decomposition: A tool for the visualization of large scale networks. arXiv preprint cs/0504107 (2005).
13. Udani, G. An Exhaustive Study of Twitter User Across the World. (Beevolve Technologies, 2012).

*: Co-Authors:

Dr. Vladimir Barash: Science Director, Graphika
Dr. John Kelly: Co-Founder and CEO, Graphika
Bence Kollanyi: PhD Candidate in Sociology, Corvinus University
Lisa-Maria Neudert: DPhil Student, Oxford University
Dr. Philip N. Howard: Professor of Internet Studies, Oxford University

Originally published by the Oxford Internet Institute, reprinted with permission for non-commercial, educational purposes.
