Study finds AI models like ChatGPT perform poorly in real medical conversations despite scoring well on standardized tests.

Researchers from Harvard Medical School and Stanford University found that while AI models like ChatGPT perform well on standardized medical exams, their effectiveness in real-world medical conversations is limited. The study used a new evaluation framework called CRAFT-MD, which simulates realistic clinical interactions. The AI models struggled to collect patient information and make accurate diagnoses, highlighting the need for more realistic testing methods before these tools are used in clinical settings.
