
I am a third-year PhD candidate in Computer Science at the University of Montreal and Mila. I am fortunate to be co-advised by Professor Aaron Courville and Professor Chengzhi Anna Huang. My doctoral research focuses on developing interactive and creative music generation models. I am also interested in multimodal learning approaches integrating music and audio. I am a percussionist who has performed timpani in orchestral settings. In my spare time, I also enjoy playing the guitar and harmonica.
ReaLchords and GenJam: Real-time Melody-to-chord Accompaniment via RL
CLAP: Large-scale Contrastive Language-audio Model
MusicLDM: Text-to-Music Generation with Mixup
3rd Place at AI Song Contest 2022
Hierarchical Music Generation with Detailed Control
Automatic Audio Captioning with Transformer
Expressive Peking Opera Synthesis
Chinese Guqin Dataset
Mila, University of Montreal
PhD Candidate in Computer Science
Conducting research on real-time, interactive music accompaniment models using reinforcement learning (RL) and multi-agent RL (MARL).
Adobe Research, Co-Creation for Audio, Video, & Animation
Student Researcher
Researched open-vocabulary audio event localization techniques conditioned on text prompts.
Google DeepMind, Magenta Team
Student Researcher
Developed a reinforcement learning-based system, ReaLchords, for real-time melody-to-chord accompaniment, and built GenJam, an interactive framework that enables delay-tolerant inference and anticipatory output visualization.
Mila, University of Montreal
CS Research Master
Worked on hierarchical music generation models with detailed control.
Tencent AI Lab
Research Intern
Developed expressive singing voice synthesis methods and explored dynamic vocal modeling.
Beijing University of Posts and Telecommunications
Research focused on audio captioning, symbolic music datasets, and computational musicology.
Hierarchical Music Generation with Detailed Control
Code | Blog | Colab Notebook | Huggingface Space | Command-line MIDI Synthesis
Automatic Audio Captioning with Transformer
Chinese Guqin Dataset
We collected and constructed a symbolic music dataset of Chinese Guqin music. The dataset also supports a proposed statistical approach to distinguishing between music genres.
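As a rough illustration of what such a statistical approach can look like, here is a minimal sketch that fingerprints symbolic pieces by their melodic-interval histograms and assigns a piece to the nearest genre centroid. The features, classifier, and toy pitch sequences are all illustrative assumptions, not the dataset's actual method:

```python
import numpy as np

def interval_histogram(pitches, n_bins=25):
    """Histogram of melodic intervals (clipped to +/-12 semitones),
    a simple statistical fingerprint of a symbolic piece."""
    intervals = np.clip(np.diff(pitches), -12, 12) + 12
    hist = np.bincount(intervals, minlength=n_bins).astype(float)
    return hist / hist.sum()

def classify(piece, centroids):
    """Assign a piece to the genre whose mean histogram is closest (L1)."""
    h = interval_histogram(piece)
    return min(centroids, key=lambda g: np.abs(h - centroids[g]).sum())

# Illustrative usage with made-up pitch sequences per genre.
guqin_pieces = [[60, 62, 60, 57, 55], [55, 57, 60, 62, 60]]
folk_pieces = [[60, 64, 67, 72, 67], [60, 65, 69, 72, 69]]
centroids = {
    "guqin": np.mean([interval_histogram(p) for p in guqin_pieces], axis=0),
    "folk": np.mean([interval_histogram(p) for p in folk_pieces], axis=0),
}
print(classify([60, 62, 60, 57], centroids))
```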
3rd Place at AI Song Contest 2022
Together with Yuxuan Wu and Yi Deng, I created a song, using music generation models, that tells the story of an AI model's growth from its own perspective. The song ranked 3rd in the contest. To learn more about the song, please visit here.
MusicLDM: Enhancing Novelty in Text-to-Music Generation Using Beat-Synchronous Mixup Strategies
We propose MusicLDM, a text-to-music diffusion model that trains efficiently on limited data while avoiding plagiarism of its training set. Leveraging a beat-tracking model, we propose two mixup strategies for data augmentation: beat-synchronous audio mixup and beat-synchronous latent mixup. These strategies encourage the model to interpolate between musical training samples and generate new music within the convex hull of the training data, making the generated music more diverse while still staying faithful to the corresponding style. To learn more: sample page, code, paper.
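As a rough sketch of the mixup idea (not the actual MusicLDM implementation), the snippet below linearly interpolates two clips that are assumed to be already beat-aligned, e.g. time-stretched to a common tempo so their downbeats coincide; the Beta-distributed mixing weight is a common mixup convention and an assumption here:

```python
import numpy as np

def beat_synchronous_mixup(x1: np.ndarray, x2: np.ndarray, lam: float) -> np.ndarray:
    """Linearly interpolate two beat-aligned audio clips (or latents).

    x1, x2: arrays of equal length whose downbeats already coincide,
            e.g. after time-stretching both clips to a common tempo.
    lam:    mixing coefficient in [0, 1], typically drawn from Beta(a, a).
    """
    assert x1.shape == x2.shape, "clips must be beat-aligned to the same length"
    return lam * x1 + (1.0 - lam) * x2

# Illustrative usage: mix two 10-second mono clips at 16 kHz.
rng = np.random.default_rng(0)
clip_a = rng.standard_normal(16000 * 10).astype(np.float32)  # stand-in for real audio
clip_b = rng.standard_normal(16000 * 10).astype(np.float32)
lam = rng.beta(5.0, 5.0)  # hypothetical Beta prior for the mixing weight
mixed = beat_synchronous_mixup(clip_a, clip_b, lam)
```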
CLAP: Large-scale Contrastive Language-audio Model
Together with Stability.ai, we created CLAP, a large-scale contrastive language-audio representation learning model. CLAP has been used in many projects and has been widely adopted by the community as a standard feature-extraction model for computing the Fréchet Audio Distance. To learn more: model github, data, paper.
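For intuition, here is a minimal sketch of the symmetric contrastive (InfoNCE-style) objective that language-audio models like CLAP are trained with; the random tensors stand in for encoder outputs, and the temperature value is an illustrative assumption rather than CLAP's exact configuration:

```python
import torch
import torch.nn.functional as F

def contrastive_loss(audio_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired audio/text embeddings.

    audio_emb, text_emb: (batch, dim) outputs of the audio and text encoders
    (placeholders here); matched pairs share the same row index.
    """
    a = F.normalize(audio_emb, dim=-1)
    t = F.normalize(text_emb, dim=-1)
    logits = a @ t.T / temperature      # (batch, batch) similarity matrix
    targets = torch.arange(a.size(0))   # diagonal entries are the positives
    # Average the audio->text and text->audio cross-entropy terms.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.T, targets))

# Illustrative usage with random embeddings standing in for encoder outputs.
audio_emb = torch.randn(8, 512)
text_emb = torch.randn(8, 512)
loss = contrastive_loss(audio_emb, text_emb)
```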
ReaLchords and GenJam: Real-time Melody-to-chord Accompaniment via RL
We introduce ReaLchords, an online generative model fine-tuned with reinforcement learning to accompany a live melody with chords in real time, without seeing future notes. We also built GenJam, an interactive framework around ReaLchords that enables delay-tolerant inference and anticipatory output visualization. To learn more: ReaLchords webpage, ReaLchords paper, GenJam webpage.
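To make the online setting concrete, the sketch below shows the commit-as-you-go loop such an accompaniment agent runs: each chord must be emitted before the next melody note arrives, so the policy can only anticipate, never look ahead. The heuristic chord_policy is a hypothetical stub, not the RL-trained ReaLchords model:

```python
from typing import List

CHORDS = ["C", "Dm", "Em", "F", "G", "Am", "Bdim"]  # toy chord vocabulary

def chord_policy(melody_so_far: List[int], chords_so_far: List[str]) -> str:
    """Hypothetical stand-in for the trained policy: conditions only on past
    context (no look-ahead), so it must anticipate rather than react to
    future notes."""
    root = melody_so_far[-1] % 7  # naive heuristic purely for illustration
    return CHORDS[root]

def accompany(melody_stream):
    """Online loop: commit one chord per incoming melody event."""
    melody, chords = [], []
    for note in melody_stream:  # notes arrive one at a time, in real time
        melody.append(note)
        chord = chord_policy(melody, chords)  # decided before the next note
        chords.append(chord)
        yield chord

# Illustrative usage on a short melody (MIDI-like pitch numbers).
print(list(accompany([60, 62, 64, 65, 67])))
```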