docs + readme update
Natooz committed Oct 24, 2023
1 parent 8dddcca commit 87a4988
Showing 2 changed files with 4 additions and 2 deletions.
2 changes: 2 additions & 0 deletions README.md
@@ -58,6 +58,8 @@ tokenizer.learn_bpe(

# Saving our tokenizer, to retrieve it back later with the load_params method
tokenizer.save_params(Path("path", "to", "save", "tokenizer.json"))
+# And pushing it to the Hugging Face hub (you can download it back with .from_pretrained)
+tokenizer.push_to_hub("username/model-name", private=True, token="your_hugging_face_token")

# Applies BPE to the previous tokens
tokenizer.apply_bpe_to_dataset(Path('path', 'to', 'tokens_noBPE'), Path('path', 'to', 'tokens_BPE'))
4 changes: 2 additions & 2 deletions docs/hf_hub.rst
@@ -5,9 +5,9 @@ Hugging Face hub
What is the Hugging Face hub
---------------------------------

-The `Hugging Face Hub <https://huggingface.co>`_ is a model and dataset sharing platform which is widely used in the AI community. It allows to freely upload, share and download models and datasets, directly in your code. Its interactions rely on an open-source Python package named `huggingface_hub <https://github.com/huggingface/huggingface_hub>`_. As it works seamlessly in the Hugging Face ecosystem, especially the `Transformers <https://huggingface.co/docs/transformers/index>`_ or `Diffusers <https://huggingface.co/docs/diffusers/index>`_ libraries, it stood out and became one of the preferred way to openly share and download models.
+The `Hugging Face Hub <https://huggingface.co>`_ is a model and dataset sharing platform which is widely used in the AI community. It allows to freely upload, share and download models and datasets, directly in your code in a very convenient way. Its interactions rely on an open-source Python package named `huggingface_hub <https://github.com/huggingface/huggingface_hub>`_. As it works seamlessly in the Hugging Face ecosystem, especially the `Transformers <https://huggingface.co/docs/transformers/index>`_ or `Diffusers <https://huggingface.co/docs/diffusers/index>`_ libraries, it stood out and became one of the preferred way to openly share and download models.

-Now when downloading a Transformer model, you will need to also download its associated tokenizer to be able to "dialog" with it. MidiTok allows you to push and download tokenizers in similar way to what is done in the Hugging Face Transformers library.
+Now when downloading a Transformer model, you will need to also download its associated tokenizer to be able to "dialog" with it. Likewise, if you want to share one of your models, you will need to share its tokenizer too for people to be able to use it. MidiTok allows you to push and download tokenizers in similar way to what is done in the Hugging Face Transformers library.

How MidiTok interoperates with the hub
------------------------------------------
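For readers who want to try the round trip this commit documents, here is a minimal sketch of the push and download steps. It relies only on the two calls named in the README hunk (`push_to_hub` with `private` and `token` keywords, and `.from_pretrained`); the helper names `share_tokenizer` and `fetch_tokenizer` are hypothetical, and reading the token from an `HF_TOKEN` environment variable is a choice made for this sketch rather than something the commit prescribes.

```python
import os


def share_tokenizer(tokenizer, repo_id, private=True):
    """Push a trained tokenizer to the Hugging Face hub.

    `tokenizer` is assumed to expose the `push_to_hub` method shown in the
    README hunk. The access token is read from the environment (an
    assumption of this sketch) so that it is not hard-coded in a script.
    """
    token = os.environ.get("HF_TOKEN")  # None if the variable is unset
    tokenizer.push_to_hub(repo_id, private=private, token=token)


def fetch_tokenizer(tokenizer_class, repo_id):
    """Download a tokenizer back from the hub.

    `tokenizer_class` is assumed to expose `from_pretrained` as a class
    method, as the README comment suggests.
    """
    return tokenizer_class.from_pretrained(repo_id)
```

With MidiTok installed, this would presumably be called as `share_tokenizer(tokenizer, "username/model-name")` followed by `fetch_tokenizer(type(tokenizer), "username/model-name")`. Both calls need network access, and the push needs a valid token.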