Releases: Natooz/MidiTok

v1.2.8

13 Sep 14:55

Changes

  • 82b2a1b Fix in MuMIDI token_types_errors()
  • 0869c23 Fix: BPE tokenizers now update the vocabulary's _token_types_indexes attribute after being modified
  • b3642c1 EOS key added to token_types_graph, prevents crash just in case
  • 7d873ca MIDI objects converted from tokens now have max_tick attribute calculated
  • 770d8b8 0869c23 small fixes and typo corrections
  • Fixes in tests and GitHub Action integration

Compatibility

  • All good !

v1.2.7 Small improvements

02 Aug 07:17

Changes

  • 22fee1d TimeSignature parameter automatically set to False for incompatible tokenizers, also fixing a bug when it was not provided by the user
  • 2e958f1 TimeSignature of MIDI set to 4/4 if the original MIDI had none (rare but can happen)
  • a46fd56 unused import removed
  • f416ff5 BPE calculation in the apply_bpe method sped up by precomputing token successions in a class attribute

Compatibility

  • All good !

v1.2.6 Bugfixes

22 Jul 08:17

Changes

  • 168c8c3 Bugfix in Octuple vocabulary creation, now only creates the selected programs
  • bfe987e fix in MuMIDI and Octuple token_types_errors methods, which could crash when analyzing special tokens (Pad, Mask ...)
  • 9567387 bugfix in CPWord decoding (crash with special tokens), and Octuple now saves _sos_eos and _mask attributes in save_params

Compatibility

  • All good !

v1.2.5 TSD tokenizer & small fixes

16 Jul 10:24

Changes

  • 67c2926 Introducing TSD tokenization (Time Shift Duration). It is similar to MIDI-Like but uses Duration tokens instead of Note-Off, and its main difference with REMI is the way it represents time.
  • 8af6a6b _add_pad_type_to_graph method has been renamed _add_special_tokens_to_types_graph, and now also adds SOS, EOS, and MASK tokens to the graph.
  • f755c70 and 4b069a2 add_bpe_to_tokens_type_graph method for byte pair encoding, fixing a bug when loading a tokenizer from config file.
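
To make the TSD idea more concrete, here is a minimal sketch. It is not MidiTok's implementation: the note layout (start, pitch, velocity, duration, all in ticks), the token names and the notes_to_tsd function are assumptions made for illustration. It only shows the principle stated above: time advances through TimeShift tokens, and each note carries an explicit Duration token instead of a later Note-Off.

```python
from typing import List, Tuple

# Assumed note layout for this sketch: (start_tick, pitch, velocity, duration_ticks)
Note = Tuple[int, int, int, int]


def notes_to_tsd(notes: List[Note]) -> List[str]:
    """Toy TSD-style encoding (hypothetical helper, not part of MidiTok)."""
    tokens: List[str] = []
    current_tick = 0
    for start, pitch, velocity, duration in sorted(notes):
        if start > current_tick:
            # Time moves forward with a TimeShift token
            tokens.append(f"TimeShift_{start - current_tick}")
            current_tick = start
        # The note length is stated up front, so no Note-Off token is needed
        tokens += [f"Pitch_{pitch}", f"Velocity_{velocity}", f"Duration_{duration}"]
    return tokens


example = notes_to_tsd([(0, 60, 100, 480), (480, 64, 90, 240)])
```

A real tokenizer would also quantize time shifts and durations to a fixed set of values; the sketch keeps raw tick counts for readability.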

Compatibility

  • _add_pad_type_to_graph is still supported but will be removed in a future update. Replace it with _add_special_tokens_to_types_graph in your code to stay up to date

v1.2.4 Byte Pair Encoding

10 Jul 13:31
93517e2

Changes

  • Byte Pair Encoding is up! It works with any tokenizer (except multi-embedding ones such as CP Word or Octuple) as a wrapper, used as bpe(tokenizer_class, params) (see the example in the README)
  • 72a0f32 Vocabulary class now has an update_token_types_indexes method to create its _token_types_indexes attribute, which can be called after loading a tokenizer with its vocabulary saved (as with BPE)
  • d232f4a Structured now takes additional_tokens as a constructor argument, to align with all other tokenizers
  • 4b0dc9f Bugfix in MIDITokenizer base class for rest and beat range attributes when loading class from params
  • eb3612f save_tokens now saves tokens as a dictionary with tokens and programs keys so that the distinction is clear
  • tqdm is now used (and required) in tokenize_dataset and bpe methods
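
As a reminder of what Byte Pair Encoding does on token sequences, here is a minimal, generic sketch of one merge step. It does not use the bpe wrapper mentioned above and is not MidiTok's code; the function names and the toy token ids are assumptions for illustration.

```python
from collections import Counter
from typing import List, Tuple


def most_frequent_pair(seqs: List[List[int]]) -> Tuple[int, int]:
    """Count every pair of successive tokens and return the most frequent one."""
    counts: Counter = Counter()
    for seq in seqs:
        counts.update(zip(seq, seq[1:]))
    return counts.most_common(1)[0][0]


def merge_pair(seq: List[int], pair: Tuple[int, int], new_id: int) -> List[int]:
    """Replace each non-overlapping occurrence of `pair` with `new_id`."""
    out: List[int] = []
    i = 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


sequences = [[1, 2, 3, 1, 2], [1, 2, 4]]
pair = most_frequent_pair(sequences)  # (1, 2) appears three times
merged = [merge_pair(s, pair, new_id=5) for s in sequences]
```

Repeating this step until a target vocabulary size is reached yields the learned merges; each new id then stands for a succession of base tokens.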

Compatibility

  • Structured now takes additional_tokens as a constructor argument, to align with all other tokenizers
  • As of v1.2.4, tokens saved with the save_tokens method are saved as a dictionary, so that tracks and programs can no longer be confused (as they could before). You can still load tokens saved with < v1.2.4 with load_tokens with no consequences, as you then handle how to index from it.
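
If your code has to read both layouts, a small normalizing helper can hide the difference. The sketch below is hypothetical and not part of MidiTok: it assumes the files are JSON, relies only on the tokens and programs keys named above, and returns legacy content untouched so that you decide how to index it.

```python
import json
import os
import tempfile
from typing import Any, Optional, Tuple


def read_saved_tokens(path: str) -> Tuple[Any, Optional[Any]]:
    """Return (tokens, programs); programs is None for the legacy layout.

    Hypothetical helper: assumes the file is JSON and that the newer layout
    is a dictionary with "tokens" and "programs" keys.
    """
    with open(path) as f:
        data = json.load(f)
    if isinstance(data, dict):
        return data["tokens"], data.get("programs")
    # Legacy layout: hand the raw content back, the caller handles indexing
    return data, None


with tempfile.TemporaryDirectory() as tmp:
    new_path = os.path.join(tmp, "new.json")
    with open(new_path, "w") as f:
        json.dump({"tokens": [[1, 2, 3]], "programs": [[0, False]]}, f)
    tokens, programs = read_saved_tokens(new_path)

    old_path = os.path.join(tmp, "old.json")
    with open(old_path, "w") as f:
        json.dump([[1, 2, 3]], f)
    legacy_tokens, legacy_programs = read_saved_tokens(old_path)
```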

v1.2.3 Bugfix in merge_tracks_per_class

02 Jul 17:02

Changes

  • 87db480 fix in merge_tracks_per_class, some tracks were omitted when filtering pitch / tessitura

v1.2.2 Multitrack Tokenization Program reduced sets

02 Jul 14:43

Changes

  • bd951ec merge_tracks_per_class can now remove the notes whose pitch is outside the recommended range (tessitura) defined by the General MIDI 2 specs. Use the filter_pitches argument.
  • 611754d MuMIDI and Octuple now accept custom sets of programs, reducing their vocabulary size. Use the program argument when constructing the tokenizers.
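
The pitch filtering described above can be pictured with the following sketch. It does not call merge_tracks_per_class: the RECOMMENDED_PITCHES table, its single placeholder entry, the note layout and the function name are all assumptions for illustration, and the real ranges are those of the General MIDI 2 specs.

```python
from typing import Dict, List, Tuple

# Assumed note layout for this sketch: (pitch, start_tick)
Note = Tuple[int, int]

# Placeholder table mapping a program number to its allowed pitches.
# The single entry is illustrative only, not copied from the GM2 specs.
RECOMMENDED_PITCHES: Dict[int, range] = {0: range(21, 109)}


def filter_out_of_range(notes: List[Note], program: int) -> List[Note]:
    """Keep only the notes whose pitch lies in the program's recommended range."""
    allowed = RECOMMENDED_PITCHES.get(program)
    if allowed is None:
        # Unknown program: leave the notes untouched
        return list(notes)
    return [note for note in notes if note[0] in allowed]


kept = filter_out_of_range([(10, 0), (60, 0), (110, 480)], program=0)
```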

v1.2.1 Constants format update & utils module

01 Jul 17:25

Changes from 4141e00

  • get_midi_programs, remove_duplicated_notes, detect_chords, merge_tracks, merge_same_program_tracks and current_bar_pos methods have been moved from miditok/midi_tokenizer_base.py to miditok/utils.py, you can call them with miditok.utils.the_method()
  • New method merge_tracks_per_class, which merges the tracks of a MIDI that belong to the same instrument class
  • MIDI_INSTRUMENTS pitch range value changed from tuple to range
  • INSTRUMENT_CLASSES changed from type Dict[int, Tuple[int, str]] to List[Dict[str, Union[str, range]]] so that it fits the format of the other constants. The index in the list corresponds to the index of each class.
  • INSTRUMENT_CLASSES_RANGES replaced by CLASS_OF_INST, to easily get the class of any instrument / track from its program
  • Minor cleanups in imports
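
The new constant formats listed above can be illustrated as follows. This is a sketch under assumptions: the dictionary keys (name, program_range) and the _EXAMPLE constant names are hypothetical, only the first three General MIDI families are listed, and the real definitions are in the miditok source.

```python
from typing import Dict, List, Union

# Same shape as the new format: a list of dicts, indexed by class.
# The key names are assumptions made for this sketch.
INSTRUMENT_CLASSES_EXAMPLE: List[Dict[str, Union[str, range]]] = [
    {"name": "Piano", "program_range": range(0, 8)},
    {"name": "Chromatic Percussion", "program_range": range(8, 16)},
    {"name": "Organ", "program_range": range(16, 24)},
]

# Lookup table in the spirit of CLASS_OF_INST:
# position = program number, value = index of its instrument class.
CLASS_OF_INST_EXAMPLE: List[int] = [
    class_index
    for class_index, inst_class in enumerate(INSTRUMENT_CLASSES_EXAMPLE)
    for _ in inst_class["program_range"]
]

program = 9
class_index = CLASS_OF_INST_EXAMPLE[program]
class_name = INSTRUMENT_CLASSES_EXAMPLE[class_index]["name"]
```

Storing pitch and program ranges as range objects, rather than tuples, allows direct membership tests such as `program in inst_class["program_range"]`.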

Compatibility

  • See first point above if you used utils functions
  • See above if you used MIDI_INSTRUMENTS, INSTRUMENT_CLASSES and INSTRUMENT_CLASSES_RANGES constants

v1.2.0 Multi-vocabulary tokenizers for CP Word, Octuple & MuMIDI

29 May 11:47

Changes

  • 7fe9df6 becea47: CP Word, Octuple and MuMIDI tokenizers now have several Vocabulary objects within self.vocab, one for each token type (Pitch, Duration ...). This makes it easy to create several input / output layers of different sizes, matching the vocabulary size of each token type. example here
  • 05c1ab9 MIDITokenizer base class now has the __call__ (linked to midi_to_tokens), __len__ (returns len(self.vocab)) and __getitem__ (returns self.vocab[item], converting a token to an event and vice versa) magic methods.
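
The behavior of these three magic methods can be sketched with a toy class. ToyTokenizer, its encode method and the event strings below are hypothetical stand-ins, not the MIDITokenizer implementation; the sketch only mirrors the behavior described above.

```python
from typing import Dict, List, Union


class ToyTokenizer:
    """Hypothetical stand-in used to illustrate the three magic methods."""

    def __init__(self, events: List[str]) -> None:
        self._event_to_token: Dict[str, int] = {e: i for i, e in enumerate(events)}
        self._token_to_event: Dict[int, str] = {i: e for i, e in enumerate(events)}

    def encode(self, events: List[str]) -> List[int]:
        return [self._event_to_token[e] for e in events]

    def __call__(self, events: List[str]) -> List[int]:
        # Calling the object is a shortcut for the main conversion method
        return self.encode(events)

    def __len__(self) -> int:
        # Size of the vocabulary
        return len(self._event_to_token)

    def __getitem__(self, item: Union[int, str]) -> Union[int, str]:
        # Event -> token when given a string, token -> event when given an int
        if isinstance(item, str):
            return self._event_to_token[item]
        return self._token_to_event[item]


tok = ToyTokenizer(["Pitch_60", "Pitch_62", "Duration_1"])
ids = tok(["Pitch_62", "Duration_1"])
```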

Compatibility

  • CP Word, Octuple and MuMIDI tokenizations from < v1.2.0 are no longer compatible; datasets have to be retokenized

Thanks

Special thanks to @envilk for his contribution !

v1.1.11 Octuple bugfix & mask class argument

13 May 19:57

Changes

  • #13 d930de5 Fail check when decoding tokens with Octuple, could lead to errors with wrong TimeSignature tokens
  • a39b390 mask argument is now present for all tokenizer constructors. Masking tokens are then added to vocabularies at initialization.
  • af85740 unused Bar token removed from the vocabulary of Structured

Compatibility

  • Structured: the Bar token (value 1) has been removed, so subsequent token values should be decreased by 1
  • The MASK token is now added to the vocabulary when the tokenizer is initialized, so token indexes may be shifted compared with versions < 1.1.11. If you used masking tokens, you should probably re-tokenize your data and retrain your models with v1.1.11
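
For Structured token sequences saved before v1.1.11, the shift described above can be applied with a small remapping. This is a hedged sketch: the function and constant names are hypothetical, it assumes tokens are plain integer ids, and it only covers the Bar removal, not the MASK shift, for which re-tokenizing remains the safer route.

```python
from typing import List

REMOVED_BAR_TOKEN = 1  # value of the removed Bar token, as stated above


def remap_structured_tokens(tokens: List[int]) -> List[int]:
    """Drop any Bar token and shift every later token id down by one."""
    return [
        token - 1 if token > REMOVED_BAR_TOKEN else token
        for token in tokens
        if token != REMOVED_BAR_TOKEN
    ]


remapped = remap_structured_tokens([0, 2, 5, 1, 3])
```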