char-mamba

This repository contains a simple script for Mamba-based character-level language modeling. It can be considered the Mamba version of char-rnn. Thanks to its simplicity, the script can also serve as a template for training Mamba models from scratch on a wide range of sequence-to-sequence problems.

Requirements

Usage

main.py supports two subcommands: train and generate.
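
For reference, a two-subcommand CLI like this is typically wired up with argparse roughly as follows. The flag names below are the ones used elsewhere in this README; the defaults and help texts are illustrative and not the actual code in main.py:

# Illustrative sketch of a train/generate CLI; not the repository's actual code.
import argparse

def build_parser():
    parser = argparse.ArgumentParser(description="Character-level Mamba language modeling")
    sub = parser.add_subparsers(dest="command", required=True)

    train = sub.add_parser("train", help="Train a model from scratch")
    train.add_argument("--cut-dataset", type=int, default=None,
                       help="Use only the first N * 256 characters of the dataset")

    gen = sub.add_parser("generate", help="Generate text with a saved model")
    gen.add_argument("--prompt", type=str, default="\n",
                     help="Prompt to start generation from")
    gen.add_argument("--batch", type=int, default=1,
                     help="Number of sequences to generate in parallel")
    return parser

if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)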

Train

To get started, use the following command to train a simple model:

python main.py train --cut-dataset=100

This command will train Mamba on the first 100 * 256 characters of the Tiny Shakespeare dataset (downloading it if necessary) for 10 epochs, save the model, and produce a sample generation. It takes about 10 seconds on a GTX 1650, and the resulting model is able to generate recognizable English words.
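
The --cut-dataset behavior suggests the text is split into fixed 256-character chunks before training. A rough sketch of that kind of character-level preparation follows; the chunk size comes from the 100 * 256 figure above, while the file name, helper name, and everything else are assumptions rather than the script's exact code:

# Rough sketch of character-level dataset preparation; the chunk size of 256 is
# inferred from the "100 * 256 characters" figure above, the rest is assumed.
import torch

SEQ_LEN = 256

def load_chunks(path="input.txt", cut=None):
    with open(path, "r", encoding="utf-8") as f:
        text = f.read()
    if cut is not None:
        text = text[: cut * SEQ_LEN]          # e.g. --cut-dataset=100 -> first 25,600 chars
    vocab = sorted(set(text))                 # character-level vocabulary
    stoi = {ch: i for i, ch in enumerate(vocab)}
    ids = torch.tensor([stoi[ch] for ch in text], dtype=torch.long)
    n_chunks = len(ids) // SEQ_LEN
    return ids[: n_chunks * SEQ_LEN].view(n_chunks, SEQ_LEN), vocab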

Once you have made sure that it works, you can train on the whole dataset by removing the --cut-dataset=100 argument. For more command-line arguments, see the end of main.py.

The training code is based on mamba-dive's fine-tuning script, which in turn is based on mamba-chat.
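
For a sense of what such a from-scratch training run involves, here is a minimal sketch built on the mamba_ssm package's MambaLMHeadModel. The model sizes, batch size, and learning rate are placeholders, the load_chunks helper is the one sketched above, and this is not the repository's actual training loop:

# Minimal from-scratch training loop sketch (not mamba-dive's or this repo's exact code).
# Assumes the mamba_ssm package and the load_chunks helper sketched above.
import torch
import torch.nn.functional as F
from mamba_ssm.models.config_mamba import MambaConfig
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel

device = "cuda"
chunks, vocab = load_chunks(cut=100)                                  # (n_chunks, 256) token ids
config = MambaConfig(d_model=256, n_layer=4, vocab_size=len(vocab))   # sizes are placeholders
model = MambaLMHeadModel(config, device=device)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

for epoch in range(10):
    for batch in chunks.split(16):                   # simple fixed-size batching
        batch = batch.to(device)
        logits = model(batch).logits                 # (B, T, vocab) next-token logits
        # Shift so each position predicts the following character.
        loss = F.cross_entropy(logits[:, :-1].reshape(-1, logits.size(-1)),
                               batch[:, 1:].reshape(-1))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    print(f"epoch {epoch}: loss {loss.item():.3f}")

torch.save(model.state_dict(), "model.pt")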

Generate

After training the model, you can use the generate subcommand to load the saved model and generate text:

python main.py generate
# Generate with a prompt:
python main.py generate --prompt=First
# Generate batched:
python main.py generate --batch=4

The generation code is based on this script and supports most of the same arguments.
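
As a rough illustration of what generation from a saved model looks like, the sketch below uses model.generate from mamba_ssm's generation utilities; the checkpoint path, model sizes, and the load_chunks helper are assumptions carried over from the sketches above, not the repository's actual code:

# Sketch of loading a saved model and sampling text with mamba_ssm's generate.
import torch
from mamba_ssm.models.config_mamba import MambaConfig
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel

device = "cuda"
chunks, vocab = load_chunks(cut=100)      # rebuild the same vocabulary used at training time
config = MambaConfig(d_model=256, n_layer=4, vocab_size=len(vocab))   # must match training
model = MambaLMHeadModel(config, device=device)
model.load_state_dict(torch.load("model.pt", map_location=device))
model.eval()

stoi = {ch: i for i, ch in enumerate(vocab)}
prompt = "First"
input_ids = torch.tensor([[stoi[ch] for ch in prompt]], device=device)
out = model.generate(input_ids, max_length=256, temperature=1.0, top_k=40, top_p=0.9)
print("".join(vocab[i] for i in out[0].tolist() if i < len(vocab)))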
