Enabling Online Safety
through rigorous academic research

As part of The Alan Turing Institute’s public policy programme, we provide objective, evidence-driven insight into the technical, social, empirical and ethical aspects of online safety, supporting the work of policymakers and regulators, informing civic discourse and extending academic knowledge.
We are working to tackle online hate, harassment, extremism and mis/disinformation.

To achieve our goal of providing evidence-driven insight into all aspects of online safety, we have three core workstreams:

Data-Centric AI

Building cutting-edge tools and critically examining technologies to create a step-change in the use of AI for online safety.

Online Harms Observatory

Mapping the scope, prevalence, impact and motivations behind content and activity that could inflict harm on people online.

Policymaking for Online Safety

Working to understand the challenges of ensuring online safety, and supporting the creation of ethical and innovative solutions.

Where we are and where we are headed

This year we are producing academic research, creating dashboards, curating open source resources, writing policy reports and much more! Curious to find out how we are helping to address online safety, or would you like to get involved? Watch our video introducing our ongoing projects and reach out to us at onlinesafety@turing.ac.uk.

Online Safety Research

Our most recent publications are listed below. Click to read the abstract, or see our full list of publications here.

Kirk, Vidgen and Hale (2022) – Is More Data Better? Re-thinking the Importance of Efficiency in Abusive Language Detection with Transformers-Based Active Learning [COLING]

Annotating abusive language is expensive, logistically complex and creates a risk of psychological harm. However, most machine learning research has prioritized maximizing effectiveness (i.e., F1 or accuracy score) rather than data efficiency (i.e., minimizing the amount of data that is annotated). In this paper, we use simulated experiments over two datasets at varying percentages of abuse to demonstrate that transformers-based active learning is a promising approach to substantially raise efficiency whilst still maintaining high effectiveness, especially when abusive content is a smaller percentage of the dataset. This approach requires a fraction of labeled data to reach performance equivalent to training over the full dataset.
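The core loop described in this abstract, pool-based active learning with uncertainty sampling, can be sketched as follows. This is a minimal illustration of the general technique, not the paper's code: the classifier is replaced by a toy probability function, and all names (`active_learning_round`, `annotate`, etc.) are hypothetical.

```python
import random

def predict_proba(example):
    # Stand-in for a trained transformer classifier's estimated probability
    # that `example` is abusive. Deterministic per example for illustration.
    random.seed(hash(example) % (2**32))
    return random.random()

def uncertainty(p):
    # Distance from the decision boundary (0.5); smaller = less certain.
    return abs(p - 0.5)

def annotate(example):
    # Placeholder for the human annotation step.
    return 0

def active_learning_round(labeled, unlabeled, batch_size):
    # Query the unlabeled examples the model is least certain about,
    # have them annotated, and move them into the labeled pool.
    ranked = sorted(unlabeled, key=lambda x: uncertainty(predict_proba(x)))
    query = ranked[:batch_size]
    for x in query:
        labeled.append((x, annotate(x)))
        unlabeled.remove(x)
    return query

labeled, unlabeled = [], [f"post_{i}" for i in range(10)]
active_learning_round(labeled, unlabeled, batch_size=3)
print(len(labeled), len(unlabeled))  # 3 7
```

Iterating this round: retrain the classifier on the growing labeled pool each time. This is why far fewer annotations are needed: labelling effort is concentrated on the examples the model finds hardest.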

Radical right actors routinely use social media to spread highly divisive, disruptive, and anti-democratic messages. Assessing and countering such content is crucial for ensuring that online spaces can be open, accessible, and constructive. However, previous work has paid little attention to understanding factors associated with radical right content that goes viral. We investigate this issue with a new dataset (the ‘ROT’ dataset) which provides insight into the content, engagement, and followership of a set of 35 radical right actors who are active in the UK. ROT contains over 50,000 original entries and over 40 million retweets, quotes, replies and mentions, as well as detailed information about followership. We use a multilevel model to assess engagement with tweets and show the importance of both actor- and content-level factors, including the number of followers each actor has, the toxicity of their content, the presence of media and explicit requests for retweets. We argue that it is crucial to account for role of actors in radical right viral tweets, and therefore, moderation efforts should be taken not only on a post-to-post level but also on an account level.
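The multilevel structure described above, tweets nested within actors, could be sketched as follows. This is an illustrative specification built only from the factors the abstract names (follower counts, toxicity, media, explicit retweet requests), not the paper's exact model:

```latex
y_{ij} = \beta_0 + u_j
       + \beta_1\,\mathrm{toxicity}_{ij}
       + \beta_2\,\mathrm{media}_{ij}
       + \beta_3\,\mathrm{RTrequest}_{ij}
       + \gamma_1 \log(\mathrm{followers}_j)
       + \varepsilon_{ij}
```

where $y_{ij}$ is engagement with tweet $i$ by actor $j$, $u_j$ is an actor-level random intercept capturing account-level variation, and $\varepsilon_{ij}$ is tweet-level error. The random intercept $u_j$ is what lets the model separate actor-level from content-level effects.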

Textual data can pose a risk of serious harm. These harms can be categorised along three axes: (1) the harm type (e.g. misinformation, hate speech or racial stereotypes) (2) whether it is elicited as a feature of the research design from directly studying harmful content (e.g. training a hate speech classifier or auditing unfiltered large-scale datasets) versus spuriously invoked from working on unrelated problems (e.g. language generation or part of speech tagging) but with datasets that nonetheless contain harmful content, and (3) who it affects, from the humans (mis)represented in the data to those handling or labelling the data to readers and reviewers of publications produced from the data. It is an unsolved problem in NLP as to how textual harms should be handled, presented, and discussed; but, stopping work on content which poses a risk of harm is untenable. Accordingly, we provide practical advice and introduce HARMCHECK, a resource for reflecting on research into textual harms. We hope our work encourages ethical, responsible, and respectful research in the NLP community.

Online Safety Resources

We aim to maximise the accessibility of our work. Below is a collection of tools and resources that can be used to monitor, understand and counter online hate.

Online Hate Research Hub

This ongoing project collates and organises resources for research and policymaking on online hate. These resources aim to cover all aspects of research, policymaking, the law and civil society activism to monitor, understand and counter online hate. Resources are focused on the UK, but include international work as well.

Catalogue of Datasets Annotated for Hate Speech

We have catalogued 50+ datasets annotated for hate speech, online abuse, and offensive language. They may be useful, for example, for training a natural language processing system to detect such language. The catalogue includes datasets in 15 languages, including Arabic, Danish, English, French, Hindi-English, Indonesian and Turkish.

Online Harms Observatory

The Online Harms Observatory is a new platform which will provide real-time insight into the scope, prevalence and dynamics of harmful online content. The observatory will help policymakers, regulators, security services and other stakeholders better understand the landscape of online harms.