Health action with informed and engaged societies
After nearly 28 years, The Communication Initiative (The CI) Global is entering a new chapter. Following a period of transition, the global website has been transferred to the University of the Witwatersrand (Wits) in South Africa, where it will be administered by the Social and Behaviour Change Communication Division. Wits' commitment to social change and justice makes it a trusted steward for The CI's legacy and future.
 
Co-founder Victoria Martin is pleased to see this work continue under Wits' leadership. Victoria knows that co-founder Warren Feek (1953–2024) would have felt deep pride in The CI Global's Africa-led direction.
 
We honour the team and partners who sustained The CI for decades. Meanwhile, La Iniciativa de Comunicación (CILA) continues independently at cila.comminitcila.com and is linked with The CI Global site.
Monitoring Stance towards Vaccination in Twitter Messages

Affiliation

Radboud University (Kunneman, Van den Bosch); Dutch National Institute for Public Health and the Environment (Lambooij, Wong, Mollema); KNAW Meertens Institute (Van den Bosch); Vrije Universiteit Amsterdam (Kunneman)

Summary

"Such a system makes it possible to monitor the ongoing stream of messages on social media, offering actionable insights into public hesitance with respect to vaccination."

In light of increased vaccine hesitancy in various countries, consistent, real-time monitoring of public beliefs and opinions in the form of unsolicited, voluntary user-generated content - that is, social media data about vaccination - may enable timely detection of and response to possible vaccine concerns. These researchers developed a system to automatically classify stance towards vaccination in Twitter messages, with a focus on messages with a negative stance. While systems that make use of automatic coding of tweets are already in place, a system for monitoring vaccination stance on Twitter should ideally be trained on and applied to tweets in the same language and from the same country. In this case, the context was Dutch Twitter messages.

In short, the researchers set out to curate a corpus of tweets annotated for their stance towards vaccination, posted between January 1, 2012 and February 8, 2017, and to use this corpus to train a machine learning (ML) classifier to distinguish tweets with a negative stance towards vaccination from other tweets. They describe this process. After filtering, 27,534 messages remained; this is the data set that was used for experimentation. Of the tweets mentioning a vaccination-related keyword, 8,259 were annotated for their stance and sentiment in relation to vaccination (6,472 were annotated twice). Subsequently, they used these coded data to train and test different machine learning set-ups. To best identify messages with a negative stance towards vaccination, they compared set-ups at an increasing dataset size and decreasing annotation reliability, at an increasing number of categories to distinguish, and with different classification algorithms.
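One of the set-ups the researchers compared uses Multinomial Naive Bayes over tweet text. As a minimal sketch of that idea (not the paper's implementation), the following builds a from-scratch Naive Bayes stance classifier with Laplace smoothing; the toy tweets and labels are invented for illustration.

```python
# Sketch of a Multinomial Naive Bayes stance classifier over bag-of-words
# features. Toy data; "negative" = negative stance towards vaccination.
import math
from collections import Counter

def train_nb(tweets, labels):
    """Return per-class log-priors and Laplace-smoothed word log-likelihoods."""
    classes = set(labels)
    counts = {c: Counter() for c in classes}
    class_n = Counter(labels)
    for text, label in zip(tweets, labels):
        counts[label].update(text.lower().split())
    vocab = {w for c in counts for w in counts[c]}
    model = {}
    for c in classes:
        total = sum(counts[c].values())
        log_prior = math.log(class_n[c] / len(labels))
        log_like = {w: math.log((counts[c][w] + 1) / (total + len(vocab)))
                    for w in vocab}
        unseen = math.log(1 / (total + len(vocab)))  # smoothed unknown word
        model[c] = (log_prior, log_like, unseen)
    return model

def predict_nb(model, tweet):
    """Pick the class with the highest posterior log-probability."""
    scores = {}
    for c, (log_prior, log_like, unseen) in model.items():
        scores[c] = log_prior + sum(log_like.get(w, unseen)
                                    for w in tweet.lower().split())
    return max(scores, key=scores.get)

tweets = ["vaccines cause harm", "never vaccinate my child",
          "got my flu shot today", "vaccination saves lives"]
labels = ["negative", "negative", "other", "other"]
model = train_nb(tweets, labels)
print(predict_nb(model, "vaccines cause harm"))
```

In the study itself, such bag-of-words set-ups were trained on thousands of annotated tweets rather than a handful of toy examples.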

Two different classifiers were compared: Multinomial Naive Bayes and Support Vector Machines (SVM). The researchers found that an SVM "trained on a combination of strictly and laxly labeled data with a more fine-grained labeling yielded the best result, at an F1-score of 0.36 and an Area under the ROC [receiver operating characteristic] curve of 0.66..." Further, the recall of the system "could be optimized to 0.60 at little loss of precision."
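The two reported metrics can be made concrete with a small worked example. The predictions and scores below are invented, not the paper's; the functions just show what F1 (harmonic mean of precision and recall on hard labels) and Area under the ROC curve (ranking quality across all thresholds) measure.

```python
# Worked example of the two evaluation metrics used in the study.

def f1(y_true, y_pred, positive=1):
    """F1 = harmonic mean of precision and recall for the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def roc_auc(y_true, scores, positive=1):
    """Probability that a random positive outranks a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y_true = [1, 1, 1, 0, 0, 0, 0, 0]                     # 1 = negative stance
y_pred = [1, 0, 0, 0, 1, 0, 0, 0]                     # thresholded labels
scores = [0.9, 0.4, 0.35, 0.3, 0.6, 0.2, 0.1, 0.05]   # classifier confidence

print(round(f1(y_true, y_pred), 2))       # 0.4
print(round(roc_auc(y_true, scores), 2))  # 0.87
```

Note that AUC uses the raw confidence scores while F1 uses hard labels, which is why the researchers could trade recall against precision by moving the decision threshold.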

ML and rule-based sentiment analysis are two diverging approaches to detecting stance towards vaccination on Twitter. Pattern is an off-the-shelf, rule-based sentiment analysis system that makes use of a list of adjectives with positive or negative weights, based on human annotations. The researchers compared ML and rule-based sentiment analysis. To make the difference concrete, they present a selection of messages predicted as Negative by either system in Table 9 of the paper. The first three are predicted as Negative only by the best ML system, and not by Pattern, while the fourth through sixth examples are seen as Negative only by Pattern. (The use of sarcasm, where typically positive words are used to convey a negative valence, complicates the task of stance prediction.) On the whole, ML (that is, the SVM) considerably outperformed sentiment analysis, which reached an optimal F1-score of 0.25 and an Area under the ROC curve of 0.57.
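The rule-based idea behind a system like Pattern can be sketched as summing human-assigned adjective weights. The tiny lexicon and weights below are invented for illustration; Pattern's actual lexicon is far larger and handles modifiers and negation.

```python
# Minimal sketch of lexicon-based sentiment scoring (illustrative weights).
ADJECTIVE_WEIGHTS = {
    "dangerous": -0.8,
    "harmful": -0.7,
    "safe": 0.6,
    "effective": 0.7,
    "great": 0.8,
}

def sentiment(message: str) -> float:
    """Sum the weights of known opinion words in a lowercased message."""
    return sum(ADJECTIVE_WEIGHTS.get(t, 0.0) for t in message.lower().split())

print(round(sentiment("vaccines are safe and effective"), 1))  # 1.3, positive
print(round(sentiment("vaccines are dangerous"), 1))           # -0.8, negative
# Sarcasm defeats this approach: a positive word carries a negative message.
print(round(sentiment("great another mandatory vaccine"), 1))  # 0.8, misread
```

The last line illustrates why sarcastic tweets were a weak point for the rule-based system: the surface words are positive even though the stance is negative.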

The outcomes of the study indicate that measuring stance towards vaccination in tweets is too difficult a task to assign only to a machine. Nonetheless, the SVM model showed sufficient recall in identifying negative tweets to reduce the manual effort of reviewing messages. In short, say the researchers, an approach is needed in which a larger training dataset is combined with a human-in-the-loop setting that provides the system with feedback on its predictions. Messages judged as correctly or incorrectly predicted could be added as additional reliable training data to improve the model. The researchers have installed a dashboard tailored to such a procedure, starting with the ML system that yielded the best performance in the current study.
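The human-in-the-loop procedure can be sketched as a simple feedback loop: the system flags tweets, a reviewer confirms or corrects each flag, and the verdicts are folded back into the training data before the model is refit. Everything below is illustrative; the classifier is a deliberately simple word-count model, not the researchers' dashboard.

```python
# Hypothetical human-in-the-loop sketch: reviewer verdicts become new
# training data and the model is refit. Toy classifier and invented data.
from collections import Counter

def fit(texts, labels):
    """Count how often each word appears in negative-stance vs. other tweets."""
    neg, other = Counter(), Counter()
    for text, label in zip(texts, labels):
        (neg if label == "negative" else other).update(text.lower().split())
    return neg, other

def flag_negative(model, text):
    """Flag a tweet for review when its words lean towards the negative class."""
    neg, other = model
    return sum(neg[w] - other[w] for w in text.lower().split()) > 0

texts = ["vaccines cause harm", "flu shot booked", "vaccination saves lives"]
labels = ["negative", "other", "other"]
model = fit(texts, labels)

# A reviewer confirms or corrects each flagged prediction; the verdicts are
# appended as additional reliable training data and the model is refit.
reviewed = [("vaccines are a scam", "negative"),
            ("clinic open for shots", "other")]
for text, verdict in reviewed:
    texts.append(text)
    labels.append(verdict)
model = fit(texts, labels)

print(flag_negative(model, "vaccines are a scam"))  # flagged after feedback
```

Each review pass makes the training set both larger and more reliable, which is exactly the combination the researchers argue is needed.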

Source

BMC Medical Informatics and Decision Making (2020) 20:33. https://doi.org/10.1186/s12911-020-1046-y. Image credit: CNN Philippines