Tianhang Zhu @ Fundamental Research Labs

Tianhang Zhu

Hi, I'm Tianhang Zhu. I'm Head of LLM Training at Fundamental Research Labs, where I lead training for Ava and our broader effort to build digital human beings: autonomous, collaborative, and socially intelligent agents. Previously I was Head of Reinforcement Learning at 01.ai, leading online RLHF training, reward modeling, and reasoning research for Yi models. Before that I was a Senior Research Scientist on the Qwen LLM team at Alibaba DAMO, where I helped deploy Qwen-max and co-authored the Qwen, Qwen2, and Qwen3 technical reports. I hold two Master's degrees from the Georgia Institute of Technology and a Bachelor's in Computer Science and Actuarial Science from the University of Waterloo. My research interests include reinforcement learning from human feedback, large language model training, emergent reasoning, and multi-agent simulation.

Menlo Park, CA

tianhang.zhu[at]alibaba-inc.com / bobzhu1991[at]outlook.com

+1 (404) 281 9788

Google Scholar / GitHub / Fundamental Research Labs

News

[2026/06] Joined Fundamental Research Labs as Head of LLM Training, building Ava and digital human agents.
[2026/06] Wrapped up tenure as Head of Reinforcement Learning at 01.ai.
[2024/07] Joined 01.ai as Head of Reinforcement Learning.
[2024/04] Helped deploy Qwen-max, which achieved the highest average subjective and objective scores in China at the time.

Working Experience

Head of LLM Training, Fundamental Research Labs, Menlo Park, CA. June 2026 – Present. Leading LLM training for Ava and the lab's digital human agents program.
Head of Reinforcement Learning, 01.ai, Seattle, USA. July 2024 – June 2026. Online RLHF for Yi models; built RL training infrastructure, reward modeling, and early O1-style reasoning research.
Senior Research Scientist, Qwen LLM Team, Alibaba DAMO, Seattle, USA. Jan 2022 – May 2024. Deployed Qwen-max; scaled RLHF training; core contributor to the ChatLearn RLHF framework.
Machine Learning Intern, Worthix, Georgia, USA. June 2021 – Dec 2022. Developed a novel clustering approach for unsupervised NER and a multi-label classification methodology.
Graduate Teaching Assistant, Georgia Institute of Technology, Atlanta, USA. Sep 2017 – May 2021. TA for CS7642 (Graduate Reinforcement Learning, OMSCS).

Selected Publications

— 2025 —

Qwen3 Technical Report
An Yang, Anfeng Li, Baosong Yang, Binyuan Hui, Bo Zheng, Bowen Yu, Dayiheng Liu, Fei Huang, Huan Lin, Jian Yang, Junyang Lin, Peng Wang, Tianhang Zhu, et al.
arXiv preprint arXiv:2505.09388

— 2024 —

Yi-Lightning Technical Report
Alan Wake, Bei Chen, Chao Li, Chengen Huang, Chujie Zheng, Fan Zhou, Feng Hu, Ge Zhang, Guoyin Wang, Heng Ji, Tianhang Zhu, et al.
arXiv preprint arXiv:2412.01253

Qwen2 Technical Report
An Yang, Baosong Yang, Binyuan Hui, Bo Zheng, Bowen Yu, Dayiheng Liu, Fei Huang, Guanting Dong, Haoran Wei, Huan Lin, Jian Yang, Junyang Lin, Peng Wang, Tianhang Zhu, et al.
arXiv preprint arXiv:2407.10671

— 2023 —

Qwen Technical Report
Jinze Bai, Shuai Bai, Yunfei Chu, Zeyu Cui, Kai Dang, Yang Fan, Fei Huang, Binyuan Hui, Junyang Lin, Runji Lin, Dayiheng Liu, Rui Men, Jianxin Ma, Xingzhang Ren, Peng Wang, Shijie Wang, An Yang, Tianhang Zhu, et al.
arXiv preprint arXiv:2309.16609

Education

Georgia Institute of Technology — Master of Mathematics / CSE, Sep 2019 – May 2021. 17 courses in numerical methods, geometry & topology, optimization, statistics, and differential equations.
Georgia Institute of Technology — Master of Computer Science, Sep 2016 – May 2020. Algorithms, theory, and large-scale scientific computation.
University of Waterloo — Bachelor of Computer Science & Actuarial Science (Double Major), Jan 2011 – Sep 2015.