Experience

Co-Founder & Research Scientist·Nettverk
Sweden / Germany·Jul. 2025 – Present
- Co-founded a voice-first AI platform for operating rooms, enabling passive clinical documentation and real-time decision support.
- Selected for the Johnson & Johnson AI Health QuickFire program; established research collaboration with Karolinska Institutet (KI).
- Designed and led development of speech-driven ML systems for low-latency inference in noisy, multi-speaker surgical environments.
- Built hardware-free, scalable pipelines integrating ASR, speaker understanding, and context-aware information retrieval.
Associate Researcher·KTH Speech, Music and Hearing Lab
Stockholm, Sweden·Mar. 2025 – Present
- Contributed to Språkbanken Tal (CLARIN SPEECH), Sweden's national speech technology infrastructure, building scalable tools and datasets for the European Language Grid.
- Pioneered WaveEGG, the first ML model to predict physiological voice signals (EGG) directly from acoustic input, eliminating contact-based hardware and enabling non-invasive, real-time clinical voice assessment at scale.
- Integrated ML-based TTS evaluation into clinical-grade software, optimizing runtime performance.
Doctoral Researcher·KTH Speech, Music and Hearing Lab
Stockholm, Sweden·Jan. 2021 – Mar. 2025
- Invented and led the development of VoiceMap, a hybrid DSP + ML benchmarking framework for speech synthesis that supports large-scale multi-model comparison and the characterization and classification of voice systems.
- Built automated pipelines for 500+ hours of synthetic and pathological speech data, enabling real-time voice quality mapping and load-aware batch inference.
- Collaborated with clinical and software engineers to co-design signal evaluation tools with low-latency processing and multi-platform deployment.
TTS Engineer·TikTok / ByteDance AI Lab
Beijing, China·Jan. 2020 – Jan. 2021
- Engineered and productionized backend systems for TikTok/Douyin's automated voice-over tool, supporting daily generation of millions of user videos.
- Designed an A/B testing platform for perceptual TTS quality, integrating real-time performance tracking and workload-aware model selection.
- Reduced inference latency by 30% through model quantization and batch-serving optimization across GPU clusters under global user load.
Research Engineer·Peking University — Linguistics Lab
Beijing, China·Sep. 2017 – Jan. 2020
- Architected a web-based speech data infrastructure with structured metadata, enabling query-efficient access to 1,000+ hours of annotated corpus data for national-scale phonetic research.
- Engineered automated signal processing pipelines in Python for large-scale acoustic data collection, annotation, and post-processing.
Instructional Systems Developer·Tsinghua University
Beijing, China·Jun. 2016 – Dec. 2016
- Built backend infrastructure for a MOOC platform serving 50,000+ learners, including automated grading pipelines, video delivery systems, and real-time analytics dashboards.

Service

Peer Reviewer · JASA, Journal of Voice, Laryngoscope, Scientific Reports, IEEE/ACM TASLP (20+ reviews) · (Ongoing)
Organizing Committee Member · World Voice Day (Världsröstdagen), Sweden · (May 2024)
Member · Voice Foundation, Philadelphia, USA · (May 2023)

Grants & Awards

Research Fellow · Karl Engvar Foundation Research Fellowship · (2024)
Grant Recipient · KTH Travel Research Grant · (2022)
International Fellowship Recipient · Doctoral Study Support Program for Overseas Researchers · (2021)
Academic Excellence Award · Peking University · (2019)
National Scholarship · Ministry of Education, China · (2017)
Industrial Scholarship · Yiyang Industry Fund · (2016)
Innovation Award 1st Prize · College Student Innovation and Entrepreneurship Project · (2015)