Streaming Generation for Music Accompaniment

Yusong Wu1, Mason Wang2, Heidi Lei2, Stephen Brade2, Lancelot Blanchard2, Shih-Lun Wu2,
Aaron Courville1,3, Anna Huang2
1Mila, Université de Montréal 2MIT 3Canada CIFAR AI Chair

Within each example, every system receives the same input stem used in the listening study. For every system we provide the rendered mix (input plus generated accompaniment) and the standalone generated accompaniment.

Each example compares the following systems: Ground Truth, Offline StemGen, Offline Prefix Decoder, streaming generation with tf = -1 s, tf = 0 s, and tf = 1 s, and Random Pairing.

Example 1 - Bass Input
Example 2 - Piano Input
Example 3 - Piano Input
Example 4 - Guitar Input
Example 5 - Percussive Input

Interactive Model Comparison With More Configurations

Select a sample and configure two different models to compare their outputs side by side.

Configuration A: Mix / Output
Configuration B: Mix / Output