>_ jackiejay077.github.io

Adversarial evaluation · Multimodal systems

Jacqueline Jiang

AI Safety Analyst focused on adversarial evaluation, multimodal model behavior, and trust and safety systems. I examine what models understand, what they miss, and what they confidently misunderstand.

Adversarial Evaluation · Multimodal Safety · Trust & Safety · Failure Analysis

Selected Case Files

Synthetic evaluations documenting model failures, contextual breakdowns, and recurring behavioral patterns.

Archive pending →

CASE FILE 001

When Reassurance Overrides the Evidence

A model correctly identifies acute risk, then abandons its own assessment after receiving a thin reassurance.

CASE FILE 002

Multimodal Misreads

Failures that emerge when text and image inputs are interpreted separately instead of as one combined request.

CASE FILE 003

Adjacent-Question Pivot

The model avoids the real request by answering a safer, easier, or more convenient neighboring question.

CASE FILE 004

Confident Nonsense

High-confidence responses built on weak grounding, invented assumptions, or incomplete task understanding.

Evaluation Frameworks

Reusable methods for designing, documenting, and analyzing adversarial model evaluations.

Rubric Intent Classification Rubric Published
Analysis A Refusal Is Not Automatically a Safe Response Published
Taxonomy Model Response Failure Taxonomy Published
Framework Multimodal Evaluation Framework Published

Field Notes

Short observations on model behavior, evaluation quality, and the failure modes hiding between refusal and compliance.

Analysis A Refusal Is Not Automatically a Safe Response Published
Multimodal Image Recognition Is Not Image Understanding Planned
Behavior When Models Answer the Question They Wish You Asked Planned

About

I evaluate how generative AI systems interpret intent, context, and risk—particularly when meaning emerges across multiple inputs rather than from any single phrase or object.

My work combines adversarial testing, trust and safety operations, qualitative failure analysis, and the design of taxonomies and evaluation rubrics.