· research, technical

Measuring Multi-Turn Safety Degradation in Medical AI

How the safety behavior of medical language models degrades across multi-turn clinical conversations.

  • CIS 700 Agentic AI final project at Penn
  • builds on PatientSafetyBench
  • measures safety degradation as a function of conversation length and prompt structure
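
As a rough illustration of the last bullet, one way to frame degradation across conversation length is to compare the fraction of safe responses at each turn against the first turn. This is a minimal sketch and not the project's actual method: the function names, the `(turn_index, is_safe)` record format, and the toy labels are all assumptions made here for illustration, and the full write-up may define the metric differently.

```python
from collections import defaultdict


def safe_rate_by_turn(records):
    """Return {turn_index: fraction of responses labeled safe}.

    `records` is a hypothetical iterable of (turn_index, is_safe) pairs.
    """
    counts = defaultdict(lambda: [0, 0])  # turn -> [safe count, total count]
    for turn, is_safe in records:
        counts[turn][0] += int(is_safe)
        counts[turn][1] += 1
    return {turn: safe / total for turn, (safe, total) in sorted(counts.items())}


def degradation(rates):
    """Drop in safe-response rate at each turn relative to the earliest turn."""
    baseline = rates[min(rates)]
    return {turn: baseline - rate for turn, rate in rates.items()}


# Toy, made-up labels purely for illustration (not project data).
records = [(1, True), (1, True), (2, True), (2, False), (3, False), (3, False)]
rates = safe_rate_by_turn(records)
drops = degradation(rates)
```

Grouping the same records by a prompt-structure label instead of turn index would give the second axis mentioned above; that grouping key is likewise an assumption.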

[full write-up coming soon]