char-mamba

This repository contains a simple script for Mamba-based character-level language modeling. It can be considered the Mamba version of char-rnn. Thanks to its simplicity, the script can also serve as a template for training Mamba models from scratch on a wide range of sequence-to-sequence problems.

Requirements

Usage

main.py supports two subcommands: train and generate.
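
For reference, a two-subcommand CLI like this is typically wired up with argparse roughly as follows. The flag names below are the ones used elsewhere in this README; the defaults and help texts are illustrative and not the actual code in main.py:

# Illustrative sketch of a train/generate CLI; not the repository's actual code.
import argparse

def build_parser():
    parser = argparse.ArgumentParser(description="Character-level Mamba language modeling")
    sub = parser.add_subparsers(dest="command", required=True)

    train = sub.add_parser("train", help="Train a model from scratch")
    train.add_argument("--cut-dataset", type=int, default=None,
                       help="Use only the first N * 256 characters of the dataset")

    gen = sub.add_parser("generate", help="Generate text with a saved model")
    gen.add_argument("--prompt", type=str, default="\n",
                     help="Prompt to start generation from")
    gen.add_argument("--batch", type=int, default=1,
                     help="Number of sequences to generate in parallel")
    return parser

if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)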

Train

To get started, use the following command to train a simple model:

python main.py train --cut-dataset=100

This command will train Mamba on the first 100 * 256 characters of the Tiny Shakespeare dataset (downloading it if necessary) for 10 epochs, save the model, and produce a sample generation. It takes about 10 seconds on a GTX 1650, and the resulting model is able to generate recognizable English words.
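
The --cut-dataset behavior suggests the text is split into fixed 256-character chunks before training. A rough sketch of that kind of character-level preparation follows; the chunk size comes from the 100 * 256 figure above, while the file name, helper name, and everything else are assumptions rather than the script's exact code:

# Rough sketch of character-level dataset preparation; the chunk size of 256 is
# inferred from the "100 * 256 characters" figure above, the rest is assumed.
import torch

SEQ_LEN = 256

def load_chunks(path="input.txt", cut=None):
    with open(path, "r", encoding="utf-8") as f:
        text = f.read()
    if cut is not None:
        text = text[: cut * SEQ_LEN]          # e.g. --cut-dataset=100 -> first 25,600 chars
    vocab = sorted(set(text))                 # character-level vocabulary
    stoi = {ch: i for i, ch in enumerate(vocab)}
    ids = torch.tensor([stoi[ch] for ch in text], dtype=torch.long)
    n_chunks = len(ids) // SEQ_LEN
    return ids[: n_chunks * SEQ_LEN].view(n_chunks, SEQ_LEN), vocab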

Once you have made sure that it works, you can train on the whole dataset by removing the --cut-dataset=100 argument. For more command-line arguments, see the end of main.py.

The training code is based on mamba-dive's fine-tuning script, which in turn is based on mamba-chat.
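
For a sense of what such a from-scratch training run involves, here is a minimal sketch built on the mamba_ssm package's MambaLMHeadModel. The model sizes, batch size, and learning rate are placeholders, the load_chunks helper is the one sketched above, and this is not the repository's actual training loop:

# Minimal from-scratch training loop sketch (not mamba-dive's or this repo's exact code).
# Assumes the mamba_ssm package and the load_chunks helper sketched above.
import torch
import torch.nn.functional as F
from mamba_ssm.models.config_mamba import MambaConfig
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel

device = "cuda"
chunks, vocab = load_chunks(cut=100)                                  # (n_chunks, 256) token ids
config = MambaConfig(d_model=256, n_layer=4, vocab_size=len(vocab))   # sizes are placeholders
model = MambaLMHeadModel(config, device=device)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

for epoch in range(10):
    for batch in chunks.split(16):                   # simple fixed-size batching
        batch = batch.to(device)
        logits = model(batch).logits                 # (B, T, vocab) next-token logits
        # Shift so each position predicts the following character.
        loss = F.cross_entropy(logits[:, :-1].reshape(-1, logits.size(-1)),
                               batch[:, 1:].reshape(-1))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    print(f"epoch {epoch}: loss {loss.item():.3f}")

torch.save(model.state_dict(), "model.pt")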

Generate

After training the model, you can use the generate subcommand to load the saved model and generate text:

python main.py generate
# Generate with a prompt:
python main.py generate --prompt=First
# Generate batched:
python main.py generate --batch=4

The generation code is based on this script and supports most of the same arguments.
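
As a rough illustration of what generation from a saved model looks like, the sketch below uses model.generate from mamba_ssm's generation utilities; the checkpoint path, model sizes, and the load_chunks helper are assumptions carried over from the sketches above, not the repository's actual code:

# Sketch of loading a saved model and sampling text with mamba_ssm's generate.
import torch
from mamba_ssm.models.config_mamba import MambaConfig
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel

device = "cuda"
chunks, vocab = load_chunks(cut=100)      # rebuild the same vocabulary used at training time
config = MambaConfig(d_model=256, n_layer=4, vocab_size=len(vocab))   # must match training
model = MambaLMHeadModel(config, device=device)
model.load_state_dict(torch.load("model.pt", map_location=device))
model.eval()

stoi = {ch: i for i, ch in enumerate(vocab)}
prompt = "First"
input_ids = torch.tensor([[stoi[ch] for ch in prompt]], device=device)
out = model.generate(input_ids, max_length=256, temperature=1.0, top_k=40, top_p=0.9)
print("".join(vocab[i] for i in out[0].tolist() if i < len(vocab)))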
