v1.2.0 Multi-vocabulary tokenizers for CP Word, Octuple & MuMIDI

@Natooz Natooz released this 29 May 11:47

Changes

  • 7fe9df6 becea47 : CP Word, Octuple and MuMIDI tokenizers now hold several Vocabulary objects in self.vocab, one per token type (Pitch, Duration ...). This makes it easy to create several input / output layers of different sizes, each matching the vocabulary size of its token type. example here
  • 05c1ab9 : The MIDITokenizer base class now has the __call__ (shortcut to midi_to_tokens), __len__ (returns len(self.vocab)) and __getitem__ (returns self.vocab[item], converting a token to an event and vice versa) magic methods.
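
The two changes above can be illustrated with a small standalone sketch. It does not import MidiTok: ToyVocabulary and ToyTokenizer are hypothetical stand-ins written for this example, the event names and ranges are made up, and the toy midi_to_tokens takes a list of event strings rather than a MIDI object. It only mirrors the behaviour described in the notes, so treat it as an approximation rather than the library's implementation.

```python
class ToyVocabulary:
    """Hypothetical two-way map between event strings and integer tokens."""

    def __init__(self, events):
        self._events = list(events)
        self._tokens = {event: token for token, event in enumerate(self._events)}

    def __len__(self):
        return len(self._events)

    def __getitem__(self, item):
        # An int is read as a token and mapped to its event; a str is the reverse.
        return self._events[item] if isinstance(item, int) else self._tokens[item]


class ToyTokenizer:
    """Hypothetical tokenizer exposing the three magic methods described above."""

    def __init__(self, vocab):
        # Either a single ToyVocabulary, or a list holding one per token type.
        self.vocab = vocab

    def midi_to_tokens(self, events):
        # Toy conversion (assumes a single vocabulary and a list of event strings).
        return [self.vocab[event] for event in events]

    def __call__(self, events):
        return self.midi_to_tokens(events)

    def __len__(self):
        return len(self.vocab)

    def __getitem__(self, item):
        return self.vocab[item]


# Single-vocabulary case: call, len and item access all go through self.vocab.
single = ToyTokenizer(ToyVocabulary(["Pitch_60", "Pitch_62", "Duration_1.0"]))
tokens = single(["Pitch_62", "Duration_1.0"])  # same as single.midi_to_tokens(...)

# Multi-vocabulary case: one vocabulary per token type (ranges are made up).
multi = ToyTokenizer([
    ToyVocabulary([f"Pitch_{p}" for p in range(21, 109)]),
    ToyVocabulary([f"Duration_{d}" for d in range(1, 33)]),
])
# One size per token type, e.g. to dimension one embedding / output layer each.
layer_sizes = [len(vocab) for vocab in multi.vocab]
```

In this sketch, single[1] gives back the event for token 1 and single["Pitch_60"] gives the token for that event, while layer_sizes holds one entry per token type that a model could use to size its separate input and output layers.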

Compatibility

  • CP Word, Octuple and MuMIDI tokenizations produced with versions < v1.2.0 are no longer compatible; datasets have to be retokenized.

Thanks

Special thanks to @envilk for his contribution!