Engineer by training, researcher by drift.

A short bio on origins, trajectory, and the cultures that shaped the way I read systems.

born: Seoul, KR

countries: 4

fluent: KO · EN

working: HI · AR (some)

stack: Python · Torch

edge: FastAPI · Celery

infra: Redis · Docker

I'm an AI engineer and software engineer currently serving as a Staff Sergeant with the Republic of Korea Air Force, where I build computer-vision systems for runway integrity.

Before the service, I helped found AIxamine at QCRI — a platform that stress-tests language models against safety benchmarks. I'm a Carnegie Mellon CS '22 grad, with a minor in Mathematical Sciences.

I've been moving since I was three: Seoul, then a small town in the US, back to Seoul, India for secondary school, CMU (first in Doha, then Pittsburgh), Qatar for work, and now home again. Cultures stack, like middleware. The interesting work happens in the seams.

What I'm doing this week.

Updated automatically.

Two papers out. Wrapping up service.

Two papers from the ROKAF runway project are now published in the Journal of the Korean Society of Airport (KOSAP, 2025): the crack-detection dataset (first author) and the PCI assessment pipeline. The system is in operational use, and my service wraps up in Q4.

Published: ROKAF Runway Crack Dataset — KOSAP Vol. 1 No. 2 (Dec 2025)
Published: Deep Learning for PCI Assessment — KOSAP Vol. 1 No. 1 (Aug 2025)
Wrapping up Korean military service in Q4; looking for what comes next

Six years, three time zones.

Full record on LinkedIn. Highlights below.

2025 — present

AI Engineer · Staff Sergeant · Squad Leader

Republic of Korea Air Force / AI-Based Technology Team

Led the squad that built and deployed an AI-driven runway pavement evaluation system at an active ROKAF airbase. Constructed a 231,347-image dataset (52,800 real captures plus 178,547 alpha-blended synthetic images across 9 defect classes; SSIM 0.98163, FID 4.2145), published as the ROKAF Runway Crack Dataset in KOSAP (Vol. 1, No. 2, Dec 2025) as first author. Co-designed the PCI scoring pipeline around YOLOv11, achieving 86.8% detection accuracy and a 98.2% reduction in manual assessment time; published in KOSAP (Vol. 1, No. 1, Aug 2025). Owned the full project lifecycle as technical lead.

CV · YOLOv11 · Co-DETR · FastAPI · Celery · Redis · Squad lead

2023 — 2025

Research Engineer

Qatar Computing Research Institute (QCRI)

Co-developed aiXamine, a black-box LLM safety evaluation platform with 40+ benchmarks across 8 security dimensions. Built the modular reporting and visualization architecture; evaluated 50+ models across 2K+ exams, surfacing vulnerabilities in GPT-4o, Grok-3, and Gemini 2.0. Also investigated backdoor (Trojan) attacks on code-focused LLMs through fine-tuning and susceptibility testing.

LLM eval · Safety · Backdoor attacks · Python

2022 — 2023

Software Engineer

KARTY · Spend, Save, and Manage

Built a multi-channel notification system (SMS, email, push) for the consumer fintech app. Migrated payment processing to a compliant platform under regulatory scrutiny. Designed and shipped a Clubhouse-style waiting list and lottery system tied to the FIFA World Cup Qatar 2022.

Fintech · Payments · Notifications

2021 — 2022

Teaching Assistant · 11-785 Deep Learning (PhD-level)

Carnegie Mellon University · Pittsburgh

Planned and delivered lectures, recitations, and assignments for 350+ students in CMU's flagship deep-learning course. Mentored student research projects and guided their exploration of novel research directions. Sample recitation on YouTube →

Teaching · Deep learning

2018 — 2022

B.S. Computer Science · Minor, Mathematical Sciences · University Honors

Carnegie Mellon University

Coursework concentrated in systems, machine learning, and applied math.

CMU · CS · Math · Honors

Things I built that went live.

A non-exhaustive list. The ones I can talk about are below; ask for the rest.

view all projects →

Runway Evaluation System

Live · ROKAF

Tech lead · data + model + backend

Detects cracks and surface defects on airbase runways and computes PCI scores from high-res imagery. In operational use, with 86.8% detection accuracy and two published KOSAP papers.

YOLOv11 · Co-DETR · FastAPI · Celery · Redis · Postgres · Docker

repo: access on request · KOSAP crack dataset → · KOSAP PCI pipeline →

AIxamine

Live · public

Founding member · benchmark harness

A safety-evaluation platform for language models that runs them through bias, robustness, and jailbreak benchmarks to produce an honest scorecard. Co-author on the paper.

Python · LLM · Eval · Benchmarks

aixamine.qcri.org · arXiv 2504.14985

Papers and conference work.

KSMI · Journal of the Korean Society of Airport · arXiv · newest first.

KSMI · 2026 · Extended abstract

Music Tagging Graph Neural Network with Tag Labels

Y. Park · J. Park · E. Jeong

MTGNN is a graph neural network framework for music auto-tagging. It adapts ATGNN's graph-based audio-tagging idea to music by redesigning node generation around semantic and timbre features, then uses a CLAP-initialized Graph Transformer to model dependencies between tag labels.

KSMI 2026 · Music tagging · GNN · MIR · CC BY 4.0

OpenReview → · PDF →

KOSAP · Vol. 1, No. 2 · Dec 2025

ROKAF Runway Crack Dataset: Construction and Application of a Large-Scale AI-Based Runway Defect Detection Dataset

E. Jeong · S. Ji · M. Kim · H. Lee

A 231,347-image runway defect dataset spanning 9 defect classes, built from real airfield captures plus alpha-blended synthetic augmentation for AI-based crack and surface-defect detection.

KOSAP 2025 · First author · Computer vision · Dataset · Runway inspection

DOI 10.23379/jkosap.1.2.114 · read → · English version planned

KOSAP · Vol. 1, No. 1 · Aug 2025

Deep Learning for Pavement Management System: Proposing an Automated Pipeline for Pavement Condition Index (PCI) Assessment

S. Ji · E. Jeong · M. Kim · H. Lee

An automated pavement-management pipeline that combines runway defect detection with PCI scoring to reduce manual inspection work.

KOSAP 2025 · Second author · Deep learning · PCI · Pavement management

DOI 10.23379/jkosap.1.1.52 · read → · English version planned

arXiv · Apr 2025

AIxamine: A Comprehensive Safety Evaluation Platform for Large Language Models

… · E. Jeong · … (see paper for full author list)

A safety-evaluation platform for large language models, covering bias, robustness, jailbreak, and other benchmark-driven risk checks.

arXiv 2025 · LLM safety · Evaluation · Benchmarks · QCRI

arXiv 2504.14985 · read → · site →

Notes & long-form.

Six cities, one stack.

Seoul · USA · India · Doha · Pittsburgh · back to Seoul. The main stops, in order.

1998 · SEOUL · born here
2001 · USA · kindergarten
2004 · SEOUL · elementary
2010 · INDIA · secondary
2018 · DOHA · CMU-Q
2020 · PITTSBURGH · CMU main
2022 · DOHA · QCRI · AIxamine
2025 · SEOUL · service

EUISUH JEONG