Yusong Wu (吴雨松)

Second-year research master's student at Mila, University of Montreal.
Working on music generation.


About Me

I am a first-year PhD student in computer science at the University of Montreal and Mila. I am fortunate to be co-advised by Prof. Aaron Courville and Prof. Chengzhi Anna Huang. My PhD research focuses on interactive music generative models and creative generative models. I am also interested in multimodal learning with music and audio. I am a percussion player, and I used to play timpani in an orchestra. I also play guitar and harmonica in my spare time.

Selected Publications and Manuscripts

  • Yusong Wu*, Ke Chen*, Tianyu Zhang, Yuchen Hui, Taylor Berg-Kirkpatrick, Shlomo Dubnov: Large-scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation. Submitted to ICASSP 2023
  • Yusong Wu, Kyle Kastner, Tim Cooijmans, Cheng-Zhi Anna Huang, Aaron Courville: Datasets That Are Not: Evolving Novelty Through Sparsity and Iterated Learning. Accepted by Workshop on Machine Learning for Creativity and Design at NeurIPS 2022.
  • Yusong Wu, Josh Gardner, Ethan Manilow, Ian Simon, Curtis Hawthorne, Jesse Engel: The Chamber Ensemble Generator: Limitless High-Quality MIR Data via Generative Modeling. https://arxiv.org/abs/2209.14458
  • Yusong Wu, Ethan Manilow, Yi Deng, Rigel Swavely, Kyle Kastner, Tim Cooijmans, Aaron Courville, Cheng-Zhi Anna Huang, Jesse Engel: MIDI-DDSP: Detailed Control of Musical Performance via Hierarchical Modeling. ICLR 2022 oral (5%), outstanding paper award at the CtrlGen Workshop at NeurIPS 2021.
  • Yusong Wu, Kun Chen, Ziyue Wang, Xuan Zhang, Fudong Nian, Xi Shao, Shengchen Li: Audio Captioning Based on Transformer and Pre-Training for 2020 DCASE Audio Captioning Challenge. Technical Report, DCASE2020 Challenge (2nd place in the challenge and Reproducible System Award)
  • Yusong Wu, Shengchen Li, Chengzhu Yu, Heng Lu, Chao Weng, Dong Yu: Peking Opera Synthesis via Duration Informed Attention Network. INTERSPEECH 2020
  • Liqiang Zhang, Chengzhu Yu, Heng Lu, Chao Weng, Yusong Wu, Xiang Xie, Zijin Li, Dong Yu: DurIAN-SC: Duration Informed Attention Network based Singing Voice Conversion System. INTERSPEECH 2020
  • Xinhao Mei, Qiushi Huang, Xubo Liu, Gengyun Chen, Jingqian Wu, Yusong Wu, Jinzheng Zhao, Shengchen Li, Tom Ko, H Lilian Tang, Xi Shao, Mark D Plumbley, Wenwu Wang: An encoder-decoder based audio captioning system with transfer and reinforcement learning for DCASE challenge 2021 task 6. Technical Report, DCASE2021 Challenge (3rd place in the challenge)
  • Yusong Wu, Shengchen Li: Guqin Dataset: A symbolic music dataset of Chinese Guqin collection. Proceedings of China Conference on Sound and Music Technology (CSMT 2019)
  • Yusong Wu, Shengchen Li: Distinguishing Chinese Guqin and Western Baroque pieces based on statistical model analysis of melodies. International Symposium on Computer Music Multidisciplinary Research (CMMR 2019)


3rd Place at AI Song Contest 2022

Hierarchical Music Generation with Detailed Control

Automatic Audio Captioning with Transformer

Expressive Peking Opera Synthesis

Chinese Guqin Dataset


Mila, University of Montreal

CS PhD

Working on interactive music generative models and creative generative models.

Interactive Music Generative Models, Creative Generative Models

Mila, University of Montreal

CS Research Master

Worked on music generation.

  • Proposed MIDI-DDSP, a hierarchical music generation model with explicit, interpretable representations for controlling musical performance and synthesis.
  • MIDI-DDSP can reconstruct high-fidelity audio, accurately predict performance attributes for a note sequence, independently manipulate the attributes of a given performance, and, as a complete system, generate realistic audio from a novel note sequence.
Audio Music Generation, Symbolic Music Generation

Tencent AI Lab

Research Intern

Worked on singing voice synthesis.

  • Expressive Singing Performance: Experimented with synthesizing expressive Peking Opera singing from musical note input, with the dynamics of Peking Opera singing learned from the spectrogram.
  • Learning Singing from Speech: Experimented with generating singing in a voice timbre learned from speech, by jointly training on singing and fine-tuning speech synthesis with fundamental frequency input.
Singing Voice Synthesis

Beijing University of Posts and Telecommunications


Received a Bachelor of Engineering in Automation. Conducted research advised by Prof. Shengchen Li.

  • Worked on the DCASE 2020 Automatic Audio Captioning Challenge. Won 2nd place in the challenge and the Reproducible System Award.
  • Collected and constructed the Guqin Dataset, a symbolic music dataset of Chinese Guqin music.
  • Worked on computational musicology; proposed a statistical approach to distinguishing different music genres.
Music Generation, Automatic Audio Captioning, Computational Musicology