Accepted for/Published in: JMIR Research Protocols

Date Submitted: Oct 8, 2019
Date Accepted: Mar 24, 2020
Date Submitted to PubMed: May 22, 2020

The final, peer-reviewed published version of this preprint can be found here:

Using Application Programming Interfaces to Access Google Data for Health Research: Protocol for a Methodological Framework

Zepecki A, Guendelman S, Prata N, DeNero J

JMIR Res Protoc 2020;9(7):e16543

DOI: 10.2196/16543

PMID: 32442159

PMCID: 7381000

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Using Google Data for Health Research: Applications to Reproductive Health in the United States

  • Anne Zepecki
  • Sylvia Guendelman
  • Ndola Prata
  • John DeNero

ABSTRACT

Background:

Individuals are increasingly turning to search engines like Google to obtain health information and access resources. Analysis of Google search queries offers a novel approach to understanding, in real or near-real time, the sexual and reproductive health concerns and needs of populations. While searches have been examined predominantly with the Google Trends tool, newer Application Programming Interfaces (APIs) are now available to academics and can be used to draw a richer, more systematic landscape of searches. These APIs allow users to write code in languages like Python to retrieve sample data directly from Google servers.

Objective:

The purpose of this paper is to describe the protocol for analysis of Google searches obtained from three Google APIs. We empirically tested the protocol and verified its usefulness by comparing search traffic on abortion and birth control in 2017 in the United States (US) and Mississippi (MS).

Methods:

We used the Google Trends API, the Google Health Trends (also referred to as Flu Trends) API, and the Google Custom Search API to obtain search data from Google using Python version 2.7.13. Our simulation protocol consisted of four steps: (i) developing a master list of top search queries for abortion and for birth control using the publicly available Google Trends API; (ii) gathering information on relative search volume using the private Health Trends API; (iii) determining the most popular sites using the publicly available Custom Search API; and (iv) calculating estimated total search volume for abortion and for birth control. Two programmers working independently achieved similar results, with only insignificant variation due to sample variability.
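As an illustration of step (i), the following minimal sketch merges per-seed lists of top queries into a single de-duplicated master list. It assumes the query lists have already been retrieved; no Google API is called. The function name `build_master_list`, the `(query, score)` input shape, and the placeholder queries and scores are hypothetical and are not taken from the protocol. The sketch is written for Python 3, whereas the study used Python 2.7.13.

```python
def build_master_list(top_queries_by_seed, limit=10):
    """Merge per-seed (query, score) lists into one ranked master list.

    Hypothetical helper: the input shape is an assumption for illustration,
    not the format returned by any Google API.
    """
    best = {}
    for rows in top_queries_by_seed.values():
        for query, score in rows:
            # Normalize case and whitespace so near-duplicates collapse.
            key = " ".join(query.lower().split())
            # Keep the highest score seen for each normalized query.
            if score > best.get(key, -1):
                best[key] = score
    # Rank by score (descending), breaking ties alphabetically.
    ranked = sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))
    return [query for query, _ in ranked[:limit]]


# Placeholder inputs, invented purely to show the call.
example = {
    "abortion": [("Abortion Pill", 100), ("abortion clinic", 80)],
    "abortion pill": [("abortion  pill", 90), ("abortion clinic near me", 70)],
}
master_list = build_master_list(example)
```

Normalizing before de-duplication keeps one entry per distinct query, so later steps request relative search volume once per query.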

Results:

The simulation successfully obtained the top search queries, relative search volume, and estimated total search volume for both locations during 2017. We were able to overcome the inherent limitations of the datasets by adding Planned Parenthood Federation of America website data from 2017 as a baseline for the estimated search volume calculations. Nonetheless, we were able to gain access only to the most popular national websites associated with the top queries, and we propose the use of Google Consumer Surveys to supplement API-generated data at the state level.
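The abstract does not give the formula used to turn relative search volume into estimated totals. One plausible reading, sketched below, is simple proportional scaling: if one baseline item has both a relative value and a known absolute count, every other relative value is multiplied by the same ratio. The function name `estimate_total_volume`, its parameters, and the numbers in the usage lines are assumptions for illustration only and do not reproduce the authors' calculation or data.

```python
def estimate_total_volume(relative_volume, baseline_query, baseline_count):
    """Scale relative search volumes to estimated absolute counts.

    Assumes proportional scaling against one baseline whose absolute
    count is known; this is an illustrative assumption, not the
    published method.
    """
    base = relative_volume[baseline_query]
    if base <= 0:
        raise ValueError("baseline relative volume must be positive")
    scale = baseline_count / base
    return {query: value * scale for query, value in relative_volume.items()}


# Placeholder numbers, invented purely to show the call.
relative = {"query a": 2.0, "query b": 4.0, "baseline": 1.0}
estimates = estimate_total_volume(relative, "baseline", 500)
```

Because every estimate shares a single scale factor, ratios between queries are preserved; any error in the baseline count propagates proportionally to all estimates.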

Conclusions:

The methodology proposed in this paper combines data from multiple Google APIs and provides the thorough documentation required to systematically identify top search queries and websites, as well as to estimate the relative and total search volume of queries in real or near-real time in specific locations. This allows other researchers to replicate the methods used and to advance our understanding of population-level reproductive health concerns.



© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.