Accepted for/Published in: JMIR Formative Research
Date Submitted: Jan 3, 2019
Open Peer Review Period: Jan 7, 2019 - Mar 4, 2019
Date Accepted: Sep 26, 2019
Date Submitted to PubMed: Jun 23, 2020
Using Natural Language Processing to examine the uptake, content, and readability of media related to an observational research study on isotretinoin exposure and pregnancy
ABSTRACT
Background:
Isotretinoin, used to treat cystic acne, increases the risk of miscarriage and fetal abnormalities when taken during pregnancy. The Health Canada-approved product monograph for isotretinoin includes guidance for pregnancy prevention, mandating certain precautionary and monitoring processes by healthcare providers and patients. A recent study by the Canadian Network of Observational Drug Effect Studies (CNODES) on the occurrence of pregnancy and pregnancy outcomes during isotretinoin therapy estimated poor adherence to pregnancy prevention processes. Media uptake of this study was unknown; understanding this uptake could help improve drug safety communication.
Objective:
To understand how the media present pharmacoepidemiological research, using the CNODES isotretinoin study as a case study.
Methods:
Google News was searched (April 25 – May 6, 2016), using a predefined set of terms, for mention of the CNODES study. Twenty-six articles and three CNODES publications (original article, press release, podcast) were identified. The article texts were cleaned (e.g., advertisements and links removed) and the podcast was transcribed. A dictionary of 1295 unique words was created using Natural Language Processing (NLP) techniques (term frequency-inverse document frequency [TF-IDF] weighting, Porter stemming, stop-word filtering) to identify common words and phrases in the articles. Similarity between the articles and reference publications was calculated using Euclidean distance; articles were grouped using hierarchical agglomerative clustering. Nine readability scales were applied, each using a different formula to measure text readability based on the number of words, difficult words, syllables, sentences, and other textual metrics.
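The text-similarity steps described above can be illustrated with a minimal sketch. It is not the authors' code: the stop-word list, the four toy documents, the average-linkage rule, and all function names are assumptions made for illustration, and Porter stemming is omitted to keep the example dependency-free.

```python
import math
import re
from collections import Counter

# Assumed tiny stop-word list for illustration; the study's list is not given.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "for", "on"}

def tokenize(text):
    """Lowercase, keep alphabetic tokens, and drop stop words (no stemming)."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

def tfidf_vectors(docs):
    """Return (vocabulary, vectors) using tf * log(N / df) weighting."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    n_docs = len(docs)
    doc_freq = Counter(w for toks in tokenized for w in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1
        vectors.append(
            [(counts[w] / total) * math.log(n_docs / doc_freq[w]) for w in vocab]
        )
    return vocab, vectors

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def agglomerative(vectors, n_clusters):
    """Merge the closest pair of clusters (average linkage, an assumption)
    until n_clusters remain; returns lists of document indices."""
    clusters = [[i] for i in range(len(vectors))]

    def linkage(c1, c2):
        dists = [euclidean(vectors[i], vectors[j]) for i in c1 for j in c2]
        return sum(dists) / len(dists)

    while len(clusters) > n_clusters:
        _, a, b = min(
            (linkage(clusters[a], clusters[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]

# Toy documents (invented for illustration, not taken from the study corpus).
docs = [
    "Isotretinoin pregnancy prevention study",
    "Study of isotretinoin and pregnancy prevention",
    "Acne drug readability in news media",
    "News media coverage of the acne drug",
]
vocab, vectors = tfidf_vectors(docs)
clusters = agglomerative(vectors, n_clusters=2)
print(clusters)
```

In this toy example the first two documents share the same vocabulary after stop-word removal, so they sit at distance zero and merge first. A real analysis would more likely rely on an established library for stemming, vectorization, and clustering.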
Results:
The top five dictionary words were pregnancy (250 appearances), isotretinoin (220), study (209), drug (201), and women (185). Three distinct clusters were identified: Clusters 2 (five articles) and 3 (four articles) were from health-related websites and media, respectively; Cluster 1 (18 articles) contained largely media sources. Two articles fell outside these clusters. Use of the term isotretinoin vs. Accutane (a brand name of isotretinoin), discussion of pregnancy complications, and assignment of responsibility for guideline adherence varied between clusters. For example, the stemmed term “pregnanc” appeared most often in Clusters 1 (an average of 14.6 times per article) and 2 (11.4) and relatively infrequently in Cluster 3 (1.8). Average reading grade levels across all articles were high: Flesch-Kincaid (13), Gunning Fog (15), SMOG Index (10), Coleman Liau Index (15), Linsear Write Index (13), and Text Standard (13). Readability improved (that is, the grade level fell) from Cluster 2 (Gunning Fog of 16.9) to Cluster 3 (12.2). Reading level varied between clusters (average 13th to 15th grade) but overall exceeded the recommended reading level for health information (7th grade).
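For readers unfamiliar with the grade-level scales reported above, the following sketch shows the standard published formulas for two of them, the Flesch-Kincaid grade level and the Gunning Fog index. The function names and the vowel-group syllable counter are assumptions for illustration; the abstract does not state which software computed the scores, and the input counts in the usage lines are invented.

```python
import re

def flesch_kincaid_grade(words, sentences, syllables):
    """Flesch-Kincaid grade level from raw counts."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

def gunning_fog(words, sentences, complex_words):
    """Gunning Fog index; complex words are those with three or more syllables."""
    return 0.4 * ((words / sentences) + 100 * (complex_words / words))

def count_syllables(word):
    """Crude heuristic (an assumption): one syllable per run of vowels.
    It miscounts many words, so treat the result as approximate."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

# Hypothetical counts for a 100-word passage with 5 sentences.
fk = flesch_kincaid_grade(words=100, sentences=5, syllables=150)
fog = gunning_fog(words=100, sentences=5, complex_words=10)
print(round(fk, 2), round(fog, 2))
```

Both formulas reward shorter sentences and shorter words, which is why a higher score corresponds to text that is harder to read.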
Conclusions:
Media interpretation of the CNODES study varied, with differences in synonym usage and areas of focus, as well as above-average reading levels. Analyzing media using NLP techniques can help determine drug safety communication effectiveness. This project is an important step in understanding how drug safety studies are taken up and re-distributed in the media.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.