publications | Jiahao Yu's Page

2026

arXiv

Building Coding Agents via Entropy-Enhanced Multi-Turn Preference Optimization

1st on SWEBench-Lite (Open-weight)
5th on SWEBench-Verified (Open-weight)

Jiahao Yu*, Zelei Cheng*, Xian Wu, and 1 more author

arXiv preprint arXiv:2509.12434 2026

ICSE

Locus: Agentic Predicate Reasoning for Directed Fuzzing

Jie Zhu, Chihao Shen, Ziyang Li, and 3 more authors

In Proceedings of the International Conference on Software Engineering 2026
ICML

Contrastive Reasoning Alignment: Reinforcement Learning from Hidden Representations

Haozheng Luo, Yimin Wang, Jiahao Yu, and 2 more authors

In Proceedings of the International Conference on Machine Learning 2026
TIFS

PROMPTFUZZ: Harnessing Fuzzing Techniques for Robust Testing of Prompt Injection in LLMs

Jiahao Yu*, Yangguang Shao*, Hanwen Miao, and 1 more author

IEEE Transactions on Information Forensics and Security 2026

2025

NIPS

GPO: Learning from Critical Steps to Improve LLM Reasoning

Featured in MIT Technology Review China

Jiahao Yu*, Zelei Cheng, Xian Wu, and 1 more author

In 2025

NIPS

BlockScan: Detecting Anomalies in Blockchain Transactions

Jiahao Yu*, Xian Wu*, Hao Liu, and 2 more authors

In 2025

USENIX

Mind the Inconspicuous: Revealing the Hidden Weakness in Aligned LLMs’ Ethical Boundaries

Long Talk

Jiahao Yu*, Haozheng Luo*, Jerry Yao-Chieh, and 3 more authors

In Proceedings of the 2025 USENIX Security 2025

USENIX

PATCHAGENT: A Practical Program Repair Agent Mimicking Human Expertise

Long Talk
Patched over 10 real-world bugs
CSAW 2025 Finalist

Zheng Yu, Ziyi Guo, Yuhang Wu, and 5 more authors

In Proceedings of the 2025 USENIX Security 2025
ICML

The Illusion of Role Separation: Hidden Shortcuts in LLM Role Learning (and How to Fix Them)

Zihao Wang, Yibo Jiang, Jiahao Yu, and 1 more author

In Proceedings of the 42nd International Conference on Machine Learning 2025
ACL@LLMSEC

UTF: Undertrained Tokens as Fingerprints A Novel Approach to LLM Identification

Jiacheng Cai*, Jiahao Yu*, Yangguang Shao, and 2 more authors

In 2025

ICML@MemFM

Knowledge-Distilled Memory Editing for Plug-and-Play LLM Alignment

Haozheng Luo, Jiahao Yu, Wenxin Zhang, and 6 more authors

In The Impact of Memorization on Trustworthy Foundation Models: ICML 2025 Workshop 2025
arXiv

PoisonCraft: Practical Poisoning of Retrieval-Augmented Generation for Large Language Models

Yangguang Shao, Xinjie Lin, Haozheng Luo, and 4 more authors

arXiv preprint arXiv:2505.06579 2025
arXiv

A survey on explainable deep reinforcement learning

Zelei Cheng, Jiahao Yu, and Xinyu Xing

arXiv preprint arXiv:2502.06869 2025
arXiv

GenoArmory: A Unified Evaluation Framework for Adversarial Attacks on Genomic Foundation Models

Haozheng Luo, Chenghao Qiu, Yimin Wang, and 5 more authors

arXiv preprint arXiv:2505.10983 2025
arXiv

BandFuzz: An ML-powered Collaborative Fuzzing Framework

Wenxuan Shi, Hongwei Li, Jiahao Yu, and 3 more authors

arXiv preprint arXiv:2507.10845 2025

2024

USENIX

LLM-Fuzzer: Scaling Assessment of Large Language Model Jailbreaks

Jiahao Yu, Xingwei Lin, Zheng Yu, and 1 more author

In Proceedings of the 2024 USENIX Security 2024
NIPS

Soft-Label Integration for Robust Toxicity Classification

Featured in MIT Technology Review China

Zelei Cheng, Xian Wu, Jiahao Yu, and 3 more authors

In Proceedings of the 38th Conference on Neural Information Processing Systems 2024

ICML

RICE: Breaking Through the Training Bottlenecks of Reinforcement Learning with Explanation

Spotlight Top-3.5%

Zelei Cheng, Xian Wu, Jiahao Yu, and 3 more authors

In Proceedings of the 41st International Conference on Machine Learning 2024
ICSE@SBFT

BandFuzz: A Practical Framework for Collaborative Fuzzing with Reinforcement Learning

1st Place in SBFT Challenge

Wenxuan Shi, Hongwei Li, Jiahao Yu, and 2 more authors

In The 17th Intl Workshop on Search-Based and Fuzz Testing 2024

ICLR@SET-LLM

Assessing Prompt Injection Risks in 200+ Custom GPTs

Featured in WIRED

Jiahao Yu, Yuhang Wu, Dong Shu, and 3 more authors

In ICLR 2024 Workshop on Secure and Trustworthy Large Language Models 2024

arXiv

PromptFuzz: Harnessing Fuzzing Techniques for Robust Testing of Prompt Injection in LLMs

Jiahao Yu*, Yangguang Shao*, Hanwen Miao*, and 2 more authors

In 2024

arXiv

Decoupled Alignment for Robust Plug-and-Play Adaptation

Haozheng Luo*, Jiahao Yu*, Wenxin Zhang, and 4 more authors

In 2024


2023

arXiv

GPTFuzzer: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts

Geekcon 2023 Annual Themed Debate Breakthrough Awards

Jiahao Yu, Xingwei Lin, Zheng Yu, and 1 more author

In 2023


NIPS

StateMask: Explaining Deep Reinforcement Learning through State Mask

Zelei Cheng*, Xian Wu*, Jiahao Yu*, and 3 more authors

In Proceedings of the 37th Conference on Neural Information Processing Systems 2023


@inproceedings{statemask,
  title = {StateMask: Explaining Deep Reinforcement Learning through State Mask},
  author = {Cheng*, Zelei and Wu*, Xian and Yu*, Jiahao and Sun, Wenhai and Guo, Wenbo and Xing, Xinyu},
  booktitle = {Proceedings of the 37th Conference on Neural Information Processing Systems},
  year = {2023},
}

2022

USENIX

AIRS: Explanation for Deep Reinforcement Learning based Security Applications

Jiahao Yu, Wenbo Guo, Qi Qin, and 3 more authors

In Proceedings of the 2023 USENIX Security 2022


Recently, we have witnessed the success of deep reinforcement learning (DRL) in many security applications, ranging from malware mutation to selfish blockchain mining. Like all other machine learning methods, the lack of explainability has been limiting its broad adoption as users have difficulty establishing trust in DRL models’ decisions. Over the past years, different methods have been proposed to explain DRL models but unfortunately, they are often not suitable for security applications, in which explanation fidelity, efficiency, and the capability of model debugging are largely lacking. In this work, we propose AIRS, a general framework to explain deep reinforcement learning-based security applications. Unlike previous works that pinpoint important features to the agent’s current action, our explanation is at the step level. It models the relationship between the final reward and the key steps that a DRL agent takes, and thus outputs the steps that are most critical towards the final reward the agent has gathered. Using four representative security-critical applications, we evaluate AIRS from the perspectives of explainability, fidelity, stability, and efficiency. We show that AIRS could outperform alternative explainable DRL methods. We also showcase AIRS’s utility, demonstrating that our explanation could facilitate the DRL model’s failure offset, help users establish trust in a model decision, and even assist the identification of inappropriate reward designs.

2021

TMC
Matrix Gaussian Mechanisms for Differentially-Private Learning

Jungang Yang, Liyao Xiang, Jiahao Yu, and 4 more authors

In IEEE Transactions on Mobile Computing 2021


The wide deployment of machine learning algorithms has become a severe threat to user data privacy. As the learning data is of high dimensionality and high orders, preserving its privacy is intrinsically hard. Conventional differential privacy mechanisms often incur significant utility decline as they are designed for scalar values from the start. We recognize that it is because conventional approaches do not take the data structural information into account, and fail to provide sufficient privacy or utility. As the main novelty of this work, we propose Matrix Gaussian Mechanism (MGM), a new (ε, δ)-differential privacy mechanism for preserving learning data privacy. By imposing the unimodal distributions on the noise, we introduce two mechanisms based on MGM with an improved utility. We further show that with the utility space available, the proposed mechanisms can be instantiated with optimized utility, and have a closed-form solution scalable to large-scale problems. We experimentally show that our mechanisms, applied to privacy-preserving federated learning, are superior to the state-of-the-art differential privacy mechanisms in utility.
@article{yang2021matrix,
  title = {Matrix Gaussian Mechanisms for Differentially-Private Learning},
  author = {Yang, Jungang and Xiang, Liyao and Yu, Jiahao and Wang, Xinbing and Guo, Bin and Li, Zhetao and Li, Baochun},
  journal = {IEEE Transactions on Mobile Computing},
  year = {2021}
}

CIKM

Speedup robust graph structure learning with low-rank information

Hui Xu, Liyao Xiang, Jiahao Yu, and 2 more authors

In Proceedings of the 30th ACM International Conference on Information & Knowledge Management 2021


@inproceedings{xu2021speedup,
  title = {Speedup robust graph structure learning with low-rank information},
  author = {Xu, Hui and Xiang, Liyao and Yu, Jiahao and Cao, Anqi and Wang, Xinbing},
  booktitle = {Proceedings of the 30th ACM International Conference on Information \& Knowledge Management},
  pages = {2241--2250},
  year = {2021}
}

2020

INFOCOM
Voiceprint mimicry attack towards speaker verification system in smart home

Lei Zhang, Yan Meng, Jiahao Yu, and 3 more authors

In Proceedings of IEEE INFOCOM 2020


The advancement of voice controllable systems (VCSes) has dramatically affected our daily lifestyle and catalyzed the smart home’s deployment. Currently, most VCSes exploit automatic speaker verification (ASV) to prevent various voice attacks (e.g., replay attack). In this study, we present VMask, a novel and practical voiceprint mimicry attack that could fool ASV in smart home and inject the malicious voice command disguised as a legitimate user. The key observation behind VMask is that the deep learning models utilized by ASV are vulnerable to the subtle perturbations in the voice input space. To generate these subtle perturbations, VMask leverages the idea of adversarial examples. Then by adding the subtle perturbations to the recordings from an arbitrary speaker, VMask can mislead the ASV into classifying the crafted speech samples, which mirror the former speaker for human, as the targeted victim. Moreover, psychoacoustic masking is employed to manipulate the adversarial perturbations under human perception threshold, thus making victim unaware of ongoing attacks. We validate the effectiveness of VMask by performing comprehensive experiments on both grey box (VGGVox) and black box (Microsoft Azure Speaker Verification) ASVs. Additionally, a real-world case study on Apple HomeKit proves the VMask’s practicability on smart home platforms.
@inproceedings{zhang2020voiceprint,
  title = {Voiceprint mimicry attack towards speaker verification system in smart home},
  author = {Zhang, Lei and Meng, Yan and Yu, Jiahao and Xiang, Chong and Falk, Brandon and Zhu, Haojin},
  booktitle = {Proceedings of IEEE INFOCOM},
  pages = {377--386},
  year = {2020},
  organization = {IEEE}
}
J Phys Conf Ser

Research on Application of Artificial Intelligence Technology in Electrical Automation Control

Chao Jiang, Xiaorui Xiong, Tanqing Zhu, and 2 more authors

In Journal of Physics: Conference Series 2020

2019

arXiv

Invisible backdoor attacks against deep neural networks

Shaofeng Li, Benjamin Zi Hao Zhao, Jiahao Yu, and 3 more authors

In 2019
