An open benchmark for evaluating large language models on real clinician chat tasks across care consult, writing and documentation, and medical research.

