Using social media to understand health behaviour

Guest post by Ava Hodson and Dale Weston from Public Health England

By 2011, Twitter had amassed more than 190 million registered users since its inception in 2006 and was processing around 55 million Tweets every day (Signorini, Segre & Polgreen, 2011). By September 2018, it had 335 million monthly active users (Aslam, 2018). General Twitter chatter and user-to-user interaction create a lot of ‘noise’. Even so, the Twitter feed can provide useful information, including recent news-related content and personal concerns and opinions about relevant issues.

A previous blog post by Simon McLoughlin highlighted some of the complications of using social media data to capture the public’s voice. Nevertheless, a burgeoning area of research has examined whether analysing “What’s happening?” on Twitter could help monitor public health issues and enable fast, cost-effective health communication. Composing and publishing a Tweet of up to 280 characters naturally leads to sporadic expressions of current thoughts, attitudes or actions. Some Tweets may be of limited informational value, but others may provide important and insightful views or experiences (Paul & Dredze, 2011). Given the sheer volume of messages published online, social media sites such as Twitter could support the real-time assessment of illness or health concerns. Examples include outbreak detection, public sentiment, expectations of particular symptoms, and illness prognosis (Paul & Dredze, 2011; Signorini et al., 2011; Salathé, Freifeld, Mekaru, Tomasulo, & Brownstein, 2013). In addition, Tweets do not exist in isolation. Geographic and demographic information can be added, either from Twitter itself or through linkage with additional, heterogeneous datasets. Such integration means that illnesses may be better understood, for example in terms of time-, place-, age-, gender-, or language-related issues.

Influenza is the illness to which social media surveillance has most commonly been applied, and it best illustrates the potential of this approach. Positive relationships have been observed between conventional indicators of influenza incidence (e.g., clinical rates) and the frequency of illness-related Tweets (Aslam et al., 2014; Paul & Dredze, 2014; Signorini et al., 2011). One U.S. study monitored Tweets mentioning “flu” and correlated their frequency with influenza-like illness (ILI) rates from sentinel surveillance and from emergency departments. The Tweet data correlated more strongly with the emergency department ILI data than with the sentinel ILI data (Aslam et al., 2014). Similarly, Hartley and colleagues examined a time series of Tweets referring to “flu”, “coughing”, or “headache”. They found that it was both correlated with, and predictive of, ILI visits to urgent care clinics and emergency departments at a large children’s hospital (Hartley et al., 2017). Social media data could therefore supplement traditional surveillance datasets, helping to build up a picture of disease transmission and healthcare-seeking behaviour.

This is not to say that social media is a problem-free source of significant or informative data. Some of the pertinent challenges of sieving through Twitter for information that is both reliable and correct have already been highlighted above and in Simon’s previous blog post. Nevertheless, social media could still provide a novel means of understanding health behaviour and disease transmission, and the brief review presented above speaks to this possibility. Integrating social media data with additional, heterogeneous datasets seems especially promising for understanding and interpreting health issues and disease transmission. Such linkage relates directly to the ongoing work of the MIDAS platform, which aims to integrate disparate datasets and facilitate evidence-based policy making. However, this field of study is very much in its infancy. More work is required to fully explore the feasibility, utility, and ethical appropriateness of using and integrating social media data to aid our understanding of health-related concerns in public health.


Aslam, S. (2018, September 18). Twitter by the Numbers: Stats, Demographics & Fun Facts. Omnicore. Retrieved from

Aslam, A. A., Tsou, M. H., Spitzberg, B. H., An, L., Gawron, J. M., Gupta, D. K., . . . Lindsay, S. (2014). The reliability of tweets as a supplementary method of seasonal influenza surveillance. Journal of Medical Internet Research, 16(11), e250.

Hartley, D. M., Giannini, C. M., Wilson, S., Frieder, O., Margolis, P. A., Kotagal, U. R., . . . Macaluso, M. (2017). Coughing, sneezing, and aching online: Twitter and the volume of influenza-like illness in a pediatric hospital. PLoS ONE, 12(7), e0182008.

Paul, M. J., & Dredze, M. (2011). You are what you Tweet: Analyzing Twitter for public health. ICWSM, 20, 265-272.

Salathé, M., Freifeld, C. C., Mekaru, S. R., Tomasulo, A. F., & Brownstein, J. S. (2013). Influenza A (H7N9) and the importance of digital epidemiology. The New England Journal of Medicine, 369(5), 401.

Signorini, A., Segre, A. M., & Polgreen, P. M. (2011). The use of Twitter to track levels of disease activity and public concern in the U.S. during the influenza A H1N1 pandemic. PLoS ONE, 6(5), e19467.