Streaming Generation for Music Accompaniment

Yusong Wu1, Mason Wang2, Heidi Lei2, Stephen Brade2, Lancelot Blanchard2, Shih-Lun Wu2,
Aaron Courville1,3, Anna Huang2
1Mila, Université de Montréal 2MIT 3Canada CIFAR AI Chair

Within each example, every system receives the same input stem used in the listening study. For every system we provide the rendered mix (input plus generated accompaniment) and the standalone generated accompaniment.

Each example compares the following systems: Ground Truth, Offline StemGen, Offline Prefix Decoder, streaming generation with tf = -1 s, tf = 0 s, and tf = 1 s, and Random Pairing.

Example 1 - Bass Input
Example 2 - Piano Input
Example 3 - Piano Input
Example 4 - Guitar Input
Example 5 - Percussive Input

Interactive Model Comparison With More Configurations

Select a sample and configure two different models to compare their outputs side by side.

Configuration A: Mix / Output
Configuration B: Mix / Output