v1.4.0 Data augmentation and optimization

@Natooz Natooz released this 13 Jan 16:46

This fairly large update brings data augmentation, some bug fixes and optimizations, making it possible to write more elegant code.

Changes

  • 8f201e0 308fb27 Data augmentation methods! 🙌 They can be applied to both MIDI objects and tokens, augmenting data by shifting the pitch, velocity and duration values.
  • 1d8e903 You can perform data augmentation while tokenizing a dataset (tokenize_midi_dataset method) with the data_augment_offsets argument. This is done at the token level, as it is faster than augmenting MIDI objects.
  • 0634ade BPE is now implemented in the main tokenizer class! This means all tokenizers can benefit from it in a much prettier way!
  • 0634ade bpe method renamed to learn_bpe; it now returns metrics (also shown in the progress bar during learning) on the number of token combinations and the sequence length reduction
  • 7b8c977 Backward compatibility when loading tokenizer config files with BPE from older versions
  • 3cea9aa @nturusin Example notebook of GPT2 Hugging Face music transformer: fixes in training
  • 65afa6b The tokens_to_midi and save_tokens methods can now receive tokens as Tensors and numpy arrays. PyTorch, TensorFlow and Jax (numpy) tensors are supported. The convert_tokens_tensors_to_list decorator converts them to lists; you can also use it on your custom methods.
  • aab64aa The __call__ magic method now automatically routes to midi_to_tokens or tokens_to_midi depending on what you give it. You can now use tokenizers more elegantly, as tokenizer(midi_obj) or tokenizer(generated_tokens).
  • e90b20a Bugfix in Structured, which could enter an infinite while loop on illegal successions of token types
  • 947af8c Big refactor of MuMIDI, which now has a fixed vocab / type idx. It is easier to handle and use. (thanks @gonzaloarca)
  • 947af8c CPWord "Ignore" tokens are all renamed Ignore_None by convention, making operations easier in data augmentation and other methods.
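
To give an idea of what token-level augmentation means, here is a minimal sketch in plain Python. It is not MidiTok's implementation: the function names, the string token format ("Pitch_60") and the pitch range are assumptions chosen for illustration only. It shifts every pitch token by an offset and skips any offset that would push a note out of range.

```python
def shift_pitch_tokens(tokens, offset, pitch_range=(21, 109)):
    """Return a copy of `tokens` with each Pitch token shifted by `offset` semitones.

    Hypothetical helper, not part of MidiTok. Returns None when a shifted
    pitch would fall outside `pitch_range` (low inclusive, high exclusive).
    """
    low, high = pitch_range
    shifted = []
    for tok in tokens:
        # Tokens are assumed to look like "Type_Value", e.g. "Pitch_60"
        kind, _, value = tok.partition("_")
        if kind == "Pitch":
            new_pitch = int(value) + offset
            if not low <= new_pitch < high:
                return None  # this offset is not usable for this sequence
            shifted.append(f"Pitch_{new_pitch}")
        else:
            shifted.append(tok)  # non-pitch tokens are left untouched
    return shifted


def augment_pitch(tokens, offsets):
    """Build one augmented sequence per usable non-zero offset."""
    augmented = {}
    for offset in offsets:
        if offset == 0:
            continue  # offset 0 is the original sequence
        result = shift_pitch_tokens(tokens, offset)
        if result is not None:
            augmented[offset] = result
    return augmented


sequence = ["Bar_None", "Position_0", "Pitch_60", "Velocity_95", "Duration_1.0.8"]
for offset, seq in augment_pitch(sequence, [-12, 0, 12]).items():
    print(offset, seq)
```

Working on tokens avoids rebuilding and re-tokenizing a MIDI object for every offset, which is the speed argument made in the notes above.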

Compatibility

  • Code using BPE will have to be updated: remove bpe(tokenizer) and just declare tokenizers normally, and rename calls to the bpe method to learn_bpe
  • MuMIDI tokens and tokenizers from earlier versions are incompatible with v1.4.0
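
For readers updating BPE code, the sketch below illustrates the kind of result a learn_bpe-style step can report. It is a toy byte pair encoding over integer token sequences, not MidiTok's code: the function names and the metric keys ("nb_merges", "seq_len_reduction") are assumptions, and only one reading of the metrics mentioned in the changes.

```python
from collections import Counter


def merge_pair(seq, pair):
    """Replace each non-overlapping occurrence of `pair` in `seq` with one merged token."""
    merged = []
    i = 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            merged.append(pair)  # the merged token is represented as a tuple
            i += 2
        else:
            merged.append(seq[i])
            i += 1
    return merged


def learn_bpe_sketch(sequences, num_merges):
    """Toy BPE learning (hypothetical helper). Returns (sequences, merges, metrics)."""
    seqs = [list(s) for s in sequences]
    original_len = sum(len(s) for s in seqs)
    merges = []
    for _ in range(num_merges):
        # Count adjacent token pairs over the whole corpus
        pairs = Counter()
        for s in seqs:
            pairs.update(zip(s, s[1:]))
        if not pairs:
            break
        best, count = pairs.most_common(1)[0]
        if count < 2:
            break  # nothing repeats anymore, merging would not help
        merges.append(best)
        seqs = [merge_pair(s, best) for s in seqs]
    new_len = sum(len(s) for s in seqs)
    metrics = {
        "nb_merges": len(merges),
        "seq_len_reduction": 1 - new_len / original_len if original_len else 0.0,
    }
    return seqs, merges, metrics


encoded, merges, metrics = learn_bpe_sketch([[1, 2, 3, 1, 2], [1, 2, 4]], num_merges=5)
print(merges, metrics)
```

Returning the metrics alongside the learned merges makes it easy to check whether a given vocabulary size actually shortens the sequences.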