The promise and peril of using a large language model to obtain clinical information: ChatGPT performs strongly as a fertility counseling tool with limitations

Joseph Chervenak, Harry Lieman, Miranda Blanco-Breindel, Sangita Jindal

Research output: Contribution to journal › Article › peer-review

17 Scopus citations

Abstract

Objective: To compare the responses of the large language model-based "ChatGPT" with reputable sources when given fertility-related clinical prompts.

Design: The "Feb 13" version of ChatGPT by OpenAI was tested against established sources of patient-oriented clinical information: 17 frequently asked questions (FAQs) about infertility on the Centers for Disease Control and Prevention (CDC) website; 2 validated fertility knowledge surveys, the Cardiff Fertility Knowledge Scale and the Fertility and Infertility Treatment Knowledge Score; and the American Society for Reproductive Medicine committee opinion "Optimizing Natural Fertility."

Setting: Academic medical center.

Patient(s): Online AI chatbot.

Intervention(s): Frequently asked questions, survey questions, and rephrased summary statements were entered as prompts into the chatbot over a 1-week period in February 2023.

Main Outcome Measure(s): For the CDC FAQs: words per response; sentiment analysis polarity and subjectivity; total factual statements; and the rates of statements that were incorrect, referenced a source, or noted the value of consulting providers. For the fertility knowledge surveys: percentile according to published population data. For the committee opinion: whether the response to each conclusion rephrased as a question identified the missing facts.

Result(s): When administered the CDC's 17 infertility FAQs, ChatGPT produced responses of similar length (207.8 vs. 181.0 words/response), factual content (8.65 vs. 10.41 factual statements/response), sentiment polarity (mean 0.11 vs. 0.11 on a scale of -1 [negative] to 1 [positive]), and subjectivity (mean 0.42 vs. 0.35 on a scale of 0 [objective] to 1 [subjective]). In total, 9 (6.12%) of 147 ChatGPT factual statements were categorized as incorrect, and only 1 (0.68%) statement cited a reference. ChatGPT would have placed at the 87th percentile of Bunting's 2013 international cohort for the Cardiff Fertility Knowledge Scale and at the 95th percentile of Kudesia's 2017 cohort for the Fertility and Infertility Treatment Knowledge Score. ChatGPT reproduced the missing facts for all 7 summary statements from "Optimizing Natural Fertility."

Conclusion(s): The February 2023 version of ChatGPT demonstrates the ability of generative artificial intelligence to produce relevant, meaningful responses to fertility-related clinical queries, comparable to established sources. Although performance may improve with medical domain-specific training, limitations such as the inability to reliably cite sources and the unpredictable possibility of fabricated information may limit its clinical use.
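The error rates reported in the Result(s) section follow directly from the stated counts (147 total factual statements, 9 incorrect, 1 with a citation). A minimal sketch of that arithmetic, with an illustrative helper function not taken from the paper:

```python
# Hypothetical sketch: reproducing the abstract's error-rate arithmetic.
# The counts come from the Result(s) section; the helper name is illustrative.

def rate_percent(count: int, total: int) -> float:
    """Return count/total expressed as a percentage, rounded to 2 decimals."""
    return round(100 * count / total, 2)

total_statements = 147  # factual statements produced by ChatGPT
incorrect = 9           # statements categorized as incorrect
referenced = 1          # statements that cited a reference

print(rate_percent(incorrect, total_statements))   # 6.12 (% incorrect)
print(rate_percent(referenced, total_statements))  # 0.68 (% with a citation)
```

This confirms the 6.12% and 0.68% figures quoted in the abstract.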

Original language: English (US)
Pages (from-to): 575-583
Number of pages: 9
Journal: Fertility and Sterility
Volume: 120
Issue number: 3
DOIs
State: Published - Sep 2023

Keywords

  • Artificial intelligence
  • counseling
  • fertility knowledge
  • natural language processing
  • online

ASJC Scopus subject areas

  • Reproductive Medicine
  • Obstetrics and Gynecology
