CocoChorales Dataset
GitHub
Contents
Overview
Data Examples
Statistics
Data Format
Overview
We pair a generative model of notes (Coconet) with a structured synthesis model (MIDI-DDSP) to produce CocoChorales, a large-scale open-source dataset of chorale audio with rich annotations, including mixes, stems, MIDI, performance attributes, and fine-grained synthesis parameters.
CocoChorales consists of 240,000 pieces, totaling 1,411 hours of mixture data, which makes it one to two orders of magnitude larger than current MIR datasets. The pieces are split equally among four ensemble types: string, brass, woodwind, and random.
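As a quick sanity check on these figures, the following sketch derives the number of pieces per ensemble and the average mixture length per piece. It uses only the totals stated above; the variable names are illustrative, and the average is an arithmetic consequence of those totals rather than a figure reported by the dataset authors.

```python
# Totals as stated in the overview.
TOTAL_PIECES = 240_000
TOTAL_MIX_HOURS = 1411
ENSEMBLES = ("string", "brass", "woodwind", "random")

# The ensembles contain an equal number of pieces.
pieces_per_ensemble = TOTAL_PIECES // len(ENSEMBLES)

# Average mixture duration per piece, in seconds.
avg_mix_seconds = TOTAL_MIX_HOURS * 3600 / TOTAL_PIECES

print(f"Pieces per ensemble: {pieces_per_ensemble}")
print(f"Average mixture length: {avg_mix_seconds:.1f} s")
```

Under these totals, each ensemble has 60,000 pieces and a mixture lasts roughly 21 seconds on average.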
Data Examples
Here we show one example for each ensemble. The data for each example is available for download, in the same format and with the same track IDs as the full CocoChorales dataset.
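The example IDs on this page (such as `string_track001010`) appear to follow the pattern `<ensemble>_track<six digits>`. The sketch below splits an ID into its ensemble name and numeric part. The pattern is an assumption inferred only from the four IDs shown here, not from the dataset's documentation, and `parse_track_id` and `TrackId` are hypothetical helper names.

```python
import re
from typing import NamedTuple


class TrackId(NamedTuple):
    ensemble: str  # one of: string, brass, woodwind, random
    number: str    # six-digit identifier, kept as a string to preserve leading zeros


# Assumed pattern, inferred from the example IDs on this page.
_TRACK_ID = re.compile(r"^(string|brass|woodwind|random)_track(\d{6})$")


def parse_track_id(track_id: str) -> TrackId:
    """Split an ID such as 'string_track001010' into ensemble and number."""
    match = _TRACK_ID.match(track_id)
    if match is None:
        raise ValueError(f"Unrecognized track ID: {track_id!r}")
    return TrackId(ensemble=match.group(1), number=match.group(2))


example_ids = [
    "string_track001010",
    "brass_track049013",
    "woodwind_track097010",
    "random_track145011",
]
parsed = [parse_track_id(t) for t in example_ids]
for item in parsed:
    print(item.ensemble, item.number)
```

Keeping the numeric part as a string avoids losing leading zeros when the ID is later used to rebuild a file or directory name.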
String (string_track001010)
MIDI
Audio
Mix
Stems: Soprano - Violin, Alto - Violin, Tenor - Cello, Bass - Double Bass
Synthesis Parameters (Soprano)
Note Expressions (Soprano): download CSV
Metadata: download YAML
Download all data of the piece
Brass (brass_track049013)
MIDI
Audio
Mix
Stems: Soprano - Trumpet, Alto - French Horn, Tenor - Trombone, Bass - Tuba
Synthesis Parameters (Soprano)
Note Expressions (Soprano): download CSV
Metadata: download YAML
Download all data of the piece
Woodwind (woodwind_track097010)
MIDI
Audio
Mix
Stems: Soprano - Flute, Alto - Oboe, Tenor - Clarinet, Bass - Bassoon
Synthesis Parameters (Soprano)
Note Expressions (Soprano): download CSV
Metadata: download YAML
Download all data of the piece
Random (random_track145011)
MIDI
Audio
Mix
Stems: Soprano - Clarinet, Alto - Clarinet, Tenor - Saxophone, Bass - Double Bass
Synthesis Parameters (Soprano)
Note Expressions (Soprano): download CSV
Metadata: download YAML
Download all data of the piece
Data Format
Please see this README file for details on the data format and file structure.