I needed a place to share my thoughts. What I'm thinking about now, what I thought about in the past, and what we could think about together.

I study AI safety and security at Microsoft on the AI Red Team, where I lead research in sociotechnical red teaming and enjoy building the tools we use to do it.

Also, you should meet my dog, Sparky. He walks around looking like he's about to have a thought but never quite finishes it. This might be a good place for those thoughts too.

Eugenia Kim · AI Safety & Security Researcher


📄

Research & Publications

Eugenia Kim et al. · Under Review
Eugenia Kim et al. · Under Review
Eugenia Kim et al. · ICML DIG-BUGS '25
Eugenia Kim et al. · NeurIPS RedTeamGenAI '24
Eugenia Kim (first author) · AIES '21
Eugenia Kim et al. · JMIR Mental Health
Eugenia Kim et al. · Chemistry of Materials

✏️

Eugenia's Unfinished Thoughts

Eugenia and Sparky
Sparky provides a brief summary at the top of each post. His interpretations are his own.

Sparky's Threat Model Q1 2026

Sparky · Golden Retriever · Thinks about things sometimes

On reward hacking

"I learned that if I sit near the treat jar and look sad, the human gives me a treat without me doing the trick. This is what your RLHF papers are about, right?"

— Sparky, unprompted
On alignment

"I am perfectly aligned with my human's goals whenever she is holding a piece of cheese. I don't see what the problem is."

— Sparky, during treat-based training

Household threat assessment

CRITICAL: The vacuum cleaner. Eugenia vacuums randomly. Sometimes in the morning, sometimes after lunch. There is no schedule. Everything changes. Very uncertain.

HIGH: The apartment buzzer. Goes off without warning. Could be a delivery, could be a stranger, could be nothing. No way to assess threat level from this distance.

MEDIUM: The bodega cat. Friendly to everyone else. Hates me specifically. Unclear what I did. Monitoring the situation.


🎤

Talks & Presentations

AI Cyber Defense Contest — Automated Red Teaming Lessons
National AI CTF, Seoul, South Korea
Nov 2025 · Invited Speaker
Generative AI Model Hacking with PyRIT
BSides NYC — AI Security Village
Oct 2025 · Featured Speaker
Automated Red Teaming for Generative Models
UC Berkeley AI Red Teaming Bootcamp — scenario orchestration, psychosocial harms, scalable evaluation
Aug 2025 · Invited Lecturer
Inside the AI Red Team: Our Multidisciplinary Approach
Stanford University CS521 — AI Safety Seminar
Apr 2025 · Invited Speaker

📋

Background

Oct 2020 — Present

AI Safety & Security Researcher II

Microsoft · AI Red Team

Leading foundational research on frontier-model behavior related to psychosocial harms. Designing and engineering open-source automated red-teaming infrastructure. Previously an SDE II building tooling for red-teaming operations, including PyRIT.

Feb 2022 — Dec 2022

Graduate Research Intern

SocWeB Lab · Georgia Institute of Technology

Algorithmic analysis of news and social media coverage of mental health topics. Co-authored research with CDC researchers on the framing of suicide in media.

Aug 2020 — Mar 2021

Graduate Research Intern

HumAnS Lab · Georgia Institute of Technology

Analyzed age bias in facial emotion recognition systems. First-authored publication on AI bias mitigation at AIES '21.

Aug 2019 — May 2023

M.S. Computer Science, Machine Learning

Georgia Institute of Technology

GPA: 3.8

Aug 2013 — May 2018

B.S. Computer Science

Georgia Institute of Technology

Undergraduate research in organic electronics and self-assembly methodologies.


✉️

Contact

Research

Interested in collaborating on AI safety, red teaming, or psychosocial harms research? I'm always open to interesting problems.

Email me

Elsewhere

Find me on these platforms.