In healthcare settings where patients use LLMs as medical assistants, LLM performance differs between evaluation and deployment. (a) Bean et al. (2025) find a 61-percentage-point gap between the two. (b) We argue this gap arises not from poorly designed benchmarks, but from implicit assumptions embedded i...
Site title: Machine Learning Blog | ML@CMU | Carnegie Mellon University