Build Medical Chatbots with AWS HealthScribe and Amazon Bedrock

Generative AI & LLMOps

Learn how AWS HealthScribe and Amazon Bedrock are enabling medical chatbots to handle complex workflows, generate clinical documentation, and deliver personalized, accurate responses—enhancing patient care and operational efficiency.

Artificial Intelligence (AI) helps medical chatbots address user inquiries more accurately and efficiently than traditional rules-based or taxonomy-driven approaches.

AI technologies, like AWS HealthScribe, enable chatbots to identify speaker roles, classify dialogues, and extract medical terms to generate preliminary clinical documentation. This streamlines data collection and documentation, allowing healthcare professionals to focus more on patient care. Given the vast amount of data in the medical field, the diverse personalization requirements, and the complex reasoning needed, AI agents are essential for making medical chatbots truly effective beyond simple Q&A.
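
As a rough illustration of how the speaker-role step might be requested, the sketch below assembles the parameters for a HealthScribe job submitted through the Amazon Transcribe client in boto3. The job name, S3 URI, bucket, and role ARN are placeholders, and the helper names build_scribe_job_request and start_scribe_job are made up for this example. Check the parameter names against the current boto3 documentation before relying on them.

```python
from typing import Any, Dict


def build_scribe_job_request(
    job_name: str,
    media_uri: str,
    output_bucket: str,
    role_arn: str,
    max_speakers: int = 2,
) -> Dict[str, Any]:
    """Assemble keyword arguments for a HealthScribe job (hypothetical helper)."""
    return {
        "MedicalScribeJobName": job_name,
        "Media": {"MediaFileUri": media_uri},
        "OutputBucketName": output_bucket,
        "DataAccessRoleArn": role_arn,
        "Settings": {
            # Ask the service to separate speakers, e.g. clinician and patient.
            "ShowSpeakerLabels": True,
            "MaxSpeakerLabels": max_speakers,
        },
    }


def start_scribe_job(request: Dict[str, Any]) -> Dict[str, Any]:
    """Submit the job. Needs boto3 and AWS credentials, so it is not called here."""
    import boto3  # imported lazily so the builder works without the SDK installed

    client = boto3.client("transcribe")
    return client.start_medical_scribe_job(**request)


# All values below are placeholders, not real resources.
request = build_scribe_job_request(
    job_name="demo-visit",
    media_uri="s3://example-bucket/visit.wav",
    output_bucket="example-output-bucket",
    role_arn="arn:aws:iam::123456789012:role/ExampleRole",
)
```

Keeping the request builder separate from the network call makes it possible to review or unit test the job configuration without touching an AWS account.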

AI agents, leveraging generative AI from Amazon Bedrock, assist healthcare workers with complex workflows, such as completing specific tasks, sorting through multiple data types, and using customized large language models (LLMs). These agents can retrieve information from various multimodal healthcare datasets or tools, as well as external datasets or tools, and provide responses with high accuracy, backed by faithfulness and context relevancy scores. 
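
To make the two scores a little more concrete, here is a deliberately simplified sketch. Production evaluators usually rely on an LLM judge or an evaluation library; the word-overlap heuristic, the stopword list, and the 0.6 threshold below are all assumptions chosen for illustration and are not how any AWS service computes these scores.

```python
import re
from typing import List, Set

# Small illustrative stopword list; a real evaluator would use something richer.
STOPWORDS = {"the", "a", "an", "is", "are", "of", "to", "and", "in", "for", "with", "on"}


def content_words(text: str) -> Set[str]:
    """Lowercase the text and keep alphanumeric tokens that are not stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def split_sentences(text: str) -> List[str]:
    """Naive sentence splitter on ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def faithfulness(answer: str, context: str, threshold: float = 0.6) -> float:
    """Fraction of answer sentences whose content words mostly appear in the context."""
    context_vocab = content_words(context)
    sentences = split_sentences(answer)
    if not sentences:
        return 0.0
    supported = 0
    for sentence in sentences:
        words = content_words(sentence)
        if words and len(words & context_vocab) / len(words) >= threshold:
            supported += 1
    return supported / len(sentences)


def context_relevancy(question: str, context: str) -> float:
    """Fraction of context sentences sharing at least one content word with the question."""
    question_vocab = content_words(question)
    sentences = split_sentences(context)
    if not sentences:
        return 0.0
    relevant = sum(1 for s in sentences if content_words(s) & question_vocab)
    return relevant / len(sentences)


# Toy strings invented for this example.
context = "The patient has end-stage COPD. The patient uses home oxygen."
answer = "The patient has end-stage COPD. The patient should start chemotherapy."
f_score = faithfulness(answer, context)  # second sentence is not backed by the context
r_score = context_relevancy("What oxygen therapy is used?", context)
```

Both scores fall between 0 and 1. A low faithfulness score flags answer sentences the retrieved context does not support, while a low context relevancy score suggests the retrieval step returned material unrelated to the question.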

For instance, a clinician can use an AI agent to determine a course of action for a patient with end-stage COPD. The agent can access the patient's EHR records (from AWS HealthLake), imaging data (from AWS HealthImaging), genetic data (from AWS HealthOmics), and other relevant information to generate a detailed response. The agent can also search for clinical trials, medications, and biomedical literature using an index built on Amazon Kendra to provide the most accurate and relevant information for the clinician to make informed decisions. 
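
The retrieval pattern in this example can be sketched in a few lines. The code below is not the Amazon Bedrock Agents API: the tool registry, the ContextItem type, and the stub fetch functions are hypothetical stand-ins for real clients of services such as AWS HealthLake or AWS HealthImaging. It only shows the general shape of collecting labeled context from several sources before asking a model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ContextItem:
    source: str   # name of the data source the text came from
    content: str  # text returned by that source, or an error marker


# A "tool" here is any callable that takes a patient ID and returns text.
Tool = Callable[[str], str]


def gather_context(patient_id: str, tools: Dict[str, Tool]) -> List[ContextItem]:
    """Call every registered tool and keep going if one of them fails."""
    items: List[ContextItem] = []
    for name, tool in tools.items():
        try:
            content = tool(patient_id)
        except Exception as exc:  # one failed source should not sink the whole answer
            content = f"[unavailable: {exc}]"
        items.append(ContextItem(source=name, content=content))
    return items


def build_prompt(question: str, items: List[ContextItem]) -> str:
    """Label each snippet with its source so the model can refer back to it."""
    sections = [f"[{item.source}]\n{item.content}" for item in items]
    return "\n\n".join(sections + [f"Question: {question}"])


# Stub tools standing in for real service clients (all hypothetical).
def fetch_ehr(patient_id: str) -> str:
    return f"EHR summary placeholder for {patient_id}"


def fetch_imaging(patient_id: str) -> str:
    raise TimeoutError("imaging service did not respond")


tools = {"ehr": fetch_ehr, "imaging": fetch_imaging}
items = gather_context("patient-001", tools)
prompt = build_prompt("What are the treatment options?", items)
```

Recording a failed source explicitly, rather than dropping it silently, lets the clinician see which data the response could not draw on.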

Additionally, multiple purpose-specific agents can work in concert to execute complex workflows, such as creating a detailed patient profile. These agents can autonomously carry out multi-step knowledge generation processes that would otherwise have required human intervention. For more details, see the demo video here.

Assessing the Accuracy of Medical Chatbots

Quantifying accuracy in healthcare applications is challenging, but several benchmarks have been developed to measure it. These benchmarks include:

Medical Entrance Exams

The MedQA dataset includes multiple-choice questions from the US Medical Licensing Examination, assessing general medical knowledge and reasoning skills necessary for obtaining a medical license in the U.S. Similarly, MedMCQA covers a wide range of medical subjects from Indian medical entrance exams. It includes multiple-choice questions, each accompanied by an explanation, to evaluate a user's general medical knowledge and reasoning abilities.

General Clinical Knowledge

PubMedQA focuses on questions that can be answered using specific medical abstracts from the PubMed database. Each question is linked to an abstract and requires a yes, no, or maybe response, testing the ability to comprehend and analyze scientific biomedical literature.

Medicine and Biology Knowledge 

These subsets, part of the MMLU (Massive Multitask Language Understanding) benchmark, cover clinical knowledge, medical genetics, human anatomy, professional medicine, and college-level biology and medicine. Each subset evaluates understanding in a particular medical or biological domain.

The largest state-of-the-art LLMs average 80-85% accuracy across these benchmarks, which is a remarkable achievement. These models continue to improve with advances in AI, machine learning, and human/clinical feedback. However, the benchmarks are not perfect: a model can be trained to score well on a benchmark without generalizing to the full range of questions that benchmark is meant to represent.
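
For readers who want to see how such an accuracy figure is typically computed, the sketch below scores exact-match answers per benchmark and then averages across benchmarks. The answer lists are made up for illustration and are not model output or real benchmark items, and the function names are our own.

```python
from typing import Dict, Sequence


def accuracy(predictions: Sequence[str], answers: Sequence[str]) -> float:
    """Exact-match accuracy, ignoring case and surrounding whitespace."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("cannot score an empty benchmark")
    correct = sum(
        p.strip().lower() == a.strip().lower() for p, a in zip(predictions, answers)
    )
    return correct / len(answers)


def macro_average(scores: Dict[str, float]) -> float:
    """Unweighted mean across benchmarks, so small benchmarks count equally."""
    return sum(scores.values()) / len(scores)


# Made-up answers in the two formats described above:
# lettered multiple choice and yes/no/maybe.
scores = {
    "multiple_choice": accuracy(["A", "c", "B", "D"], ["A", "C", "D", "D"]),
    "yes_no_maybe": accuracy(["yes", "maybe"], ["yes", "no"]),
}
overall = macro_average(scores)
```

A macro average treats each benchmark equally regardless of size, whereas pooling all questions together (a micro average) lets the largest benchmark dominate. Published averages may use either, so it is worth checking which one a reported figure refers to.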

While these tools can provide valuable insights today, they are supplementary and not yet a replacement for professional medical advice.

The Business Impact of Medical Chatbots

Medical chatbots have the potential to revolutionize the business of medicine by enhancing efficiency, reducing costs, and improving patient care. They automate routine tasks such as appointment scheduling and follow-up reminders, providing immediate responses to patient inquiries, even outside regular office hours. Proactive outreach by chatbots enhances patient engagement, monitoring, and preventive care. 

These chatbots offer several advantages over traditional care methods:

  • Healthcare organizations can gain valuable data insights from chatbot interactions.
  • Clinicians can dedicate more time to complex cases as chatbots manage routine matters.

This "consumerization of healthcare" lowers barriers to entry for patients. For example, new classes of medicine such as semaglutide weight-loss medications are so popular and widely used that each clinician increasingly needs to handle care for more patients. New entrants such as Eli Lilly's lillydirect.com exemplify how this trend can reshape who provides care: the pharmaceutical manufacturers with the greatest financial incentive to accelerate care now offer services that were once reserved for local physicians.

AI-powered chatbots democratize healthcare, making medical information more accessible to a broader audience. They can also contribute to personalized medicine by providing insights based on individual patient data, as demonstrated by the AWS HealthAgent system.

At Caylent, we're committed to helping healthcare companies leverage these technologies to drive innovation and growth. Our team of specialists collaborates closely with clients to identify and implement the most valuable AI use cases, exemplified by our successful partnership with RLDatix. If you’re interested in deploying AI within your organization, get in touch to see how our team can help.

Ryan Gross


Ryan Gross leads Cloud Data/AI/ML delivery at Caylent. Through his 15+ years of experience, Ryan has guided over 50 clients in building tech-driven data and AI cultures across various industries. By identifying technology trends and leading the development of asset-backed consulting offerings to realize value, he builds a growth culture within his team. Ryan is also a frequent conference speaker on emerging data and AI trends.
