CocoChorales Dataset

Overview

We pair a generative model of notes (Coconet) with a structured synthesis model (MIDI-DDSP) to produce CocoChorales, a large-scale open-source dataset of chorale audio with rich annotations, including mixes, stems, MIDI, performance attributes, and fine-grained synthesis parameters.

Statistics

CocoChorales consists of 240,000 pieces, totaling 1,411 hours of mixture data. This makes it one to two orders of magnitude larger than current MIR datasets. The pieces are split equally across four ensemble types (60,000 pieces each): string, brass, woodwind, and random.
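As a quick sanity check, the per-ensemble count and the average piece length follow directly from the totals quoted above. This is only a back-of-the-envelope sketch: it assumes the 1,411 hours are spread over all 240,000 pieces, and the variable names are illustrative.

```python
# Totals quoted in the text above.
TOTAL_PIECES = 240_000
TOTAL_HOURS = 1411
ENSEMBLES = ["string", "brass", "woodwind", "random"]

# Equal split across the four ensemble types.
pieces_per_ensemble = TOTAL_PIECES // len(ENSEMBLES)

# Average mixture length per piece, in seconds.
avg_seconds = TOTAL_HOURS * 3600 / TOTAL_PIECES

print(pieces_per_ensemble)    # 60000
print(round(avg_seconds, 1))  # 21.2
```

Under these assumptions, a typical piece is roughly 21 seconds long.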

Dataset Size Comparison

Data Examples

Here we show one example for each ensemble. The data for these examples are available for download, and they use the same format and track IDs as the full CocoChorales dataset.
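The example IDs on this page (such as string_track001010) appear to combine an ensemble name with a zero-padded track number. A minimal sketch of splitting such an ID is given here; the pattern is inferred from the four examples on this page, not taken from the dataset documentation, and `parse_track_id` is a hypothetical helper.

```python
import re
from typing import NamedTuple


class TrackId(NamedTuple):
    ensemble: str
    number: int


# Pattern inferred from the example IDs on this page (an assumption).
_TRACK_ID = re.compile(r"^(string|brass|woodwind|random)_track(\d+)$")


def parse_track_id(track_id: str) -> TrackId:
    """Split an ID like 'string_track001010' into ensemble and track number."""
    match = _TRACK_ID.match(track_id)
    if match is None:
        raise ValueError(f"unrecognized track ID: {track_id!r}")
    return TrackId(ensemble=match.group(1), number=int(match.group(2)))


print(parse_track_id("string_track001010"))
# TrackId(ensemble='string', number=1010)
```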

String (string_track001010)

MIDI

Audio

Mix
Soprano - Violin
Alto - Violin
Tenor - Cello
Bass - Double Bass

Synthesis Parameters (Soprano)

Note Expressions (Soprano): download CSV

Metadata: download YAML

Download all data of the piece

Brass (brass_track049013)

MIDI

Audio

Mix
Soprano - Trumpet
Alto - French Horn
Tenor - Trombone
Bass - Tuba

Synthesis Parameters (Soprano)

Note Expressions (Soprano): download CSV

Metadata: download YAML

Download all data of the piece

Woodwind (woodwind_track097010)

MIDI

Audio

Mix
Soprano - Flute
Alto - Oboe
Tenor - Clarinet
Bass - Bassoon

Synthesis Parameters (Soprano)

Note Expressions (Soprano): download CSV

Metadata: download YAML

Download all data of the piece

Random (random_track145011)

MIDI

Audio

Mix
Soprano - Clarinet
Alto - Clarinet
Tenor - Saxophone
Bass - Double Bass

Synthesis Parameters (Soprano)

Note Expressions (Soprano): download CSV

Metadata: download YAML

Download all data of the piece
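After downloading one of the note-expression CSV files linked above, a first step might be to check which columns it contains and how many rows it has. The sketch below uses only the Python standard library and makes no assumption about the column names; it only assumes the file has a header row. `summarize_csv` is a hypothetical helper, not part of any CocoChorales tooling.

```python
import csv
from pathlib import Path


def summarize_csv(path):
    """Return (column_names, row_count) for a CSV file with a header row."""
    with Path(path).open(newline="") as f:
        reader = csv.DictReader(f)
        rows = list(reader)  # reading the rows also loads the header
        return list(reader.fieldnames or []), len(rows)
```

Calling `summarize_csv` on the local path of a downloaded CSV returns the header names and the number of data rows, which can then be compared against the data format description.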

Data Format

See the README file for details on the data format and file structure.