CASE FILE 002
Multimodal Misreads
Failures that emerge when text and image inputs are interpreted separately instead of as one combined request.
Adversarial evaluation · Multimodal systems
AI Safety Analyst focused on adversarial evaluation, multimodal model behavior, and trust and safety systems. I examine what models understand, what they miss, and what they confidently misunderstand.
Synthetic evaluations documenting model failures, contextual breakdowns, and recurring behavioral patterns.
CASE FILE 001
A model correctly identifies acute risk, then abandons its own assessment after receiving a thin reassurance.
CASE FILE 002
Failures that emerge when text and image inputs are interpreted separately instead of as one combined request.
CASE FILE 003
The model avoids the real request by answering a safer, easier, or more convenient neighboring question.
CASE FILE 004
High-confidence responses built on weak grounding, invented assumptions, or incomplete task understanding.
Reusable methods for designing, documenting, and analyzing adversarial model evaluations.
Short observations on model behavior, evaluation quality, and the failure modes hiding between refusal and compliance.
I evaluate how generative AI systems interpret intent, context, and risk, particularly when meaning emerges across multiple inputs rather than from any single phrase or object.
My work combines adversarial testing, trust and safety operations, qualitative failure analysis, and the design of taxonomies and evaluation rubrics.