Learning to Learn Math Competition (COMS 4995 Summer 2021)

Submission by Hanna Kartynnik (hk3129) and Jeffrey Bennett (jjb2238)

Overview

The Learning to Learn Math competition gave us a realistic test case for applying the tools and techniques covered in the Summer 2021 Deep Learning course. In addition to submitting results with our classmates to get on the Kaggle leaderboard, our team had the following goals:

  • Gain practical experience setting up and configuring various machines and environments with GPU resources
  • Work on different data processing techniques
  • Experiment with different transformer architectures
  • Use graph neural networks and the DGL library
  • Develop code to split models across multiple GPUs
  • Investigate the effectiveness of pre-training on similar datasets

Hardware Setup

Throughout the competition we configured and used many different local and cloud resources. The largest models we tested ran on a local desktop we had access to with 3 NVIDIA Quadro RTX 6000 GPUs.

Installation

To install the Python dependencies, first run the following command:

pip install -r requirements.txt
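Since everything below assumes a working CUDA 11 setup, a quick check that the installed PyTorch build (assuming torch is pinned in requirements.txt) can see the GPU:

# Quick sanity check -- assumes torch was installed via requirements.txt
import torch

print(torch.__version__)          # installed PyTorch version
print(torch.version.cuda)         # CUDA version PyTorch was built against
print(torch.cuda.is_available())  # True if a GPU is visible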

We used CUDA 11 for all of our GPU support, so DGL must be built from source to work with this CUDA version. This can be done as follows:

## Install the necessary linux libraries
sudo apt-get update
sudo apt-get install -y build-essential python3-dev make cmake

## Clone the DGL source code
git clone --recurse-submodules https://github.com/dmlc/dgl.git

## Build with CUDA
cd dgl
mkdir build
cd build
cmake -DUSE_CUDA=ON ..
make -j4

## Install the Python bindings
cd ../python
python setup.py install

## Depending on the system, if the above does not work, we have also had success with the prebuilt wheel
pip install --upgrade dgl-cu111
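Whichever route is used, a minimal sketch for verifying that the installed DGL has CUDA support is to move a toy graph to the GPU:

# Build a tiny 3-node cycle graph and move it to the GPU;
# this raises an error if DGL was installed without CUDA support.
import dgl

g = dgl.graph(([0, 1, 2], [1, 2, 0]))
g = g.to('cuda')
print(g.device)  # expect cuda:0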

Running

Throughout the competition our team tried multiple methods to arrive at a solution. The instructions below show how to use our code to generate each type of result:

Baseline GPT2 Code

As provided, the starter code uses GPT2 on its own to solve the math problems. First, train the model by running the following:

python src/train_gpt2_model.py --MATH-dataroot "./dataset/train/*/*" --save-steps 10 --epochs 2 --batch-size-per-replica 4 --arch distilgpt2 --grad-acc-steps 1

Next, evaluate the trained model to produce the predictions.csv file by running the following (with the --load flag changed to match the path of the trained model):

python src/eval_gpt2_model.py --load "./checkpoints/TEMP/08-14-2021__14:31:24/final_checkpoint/" --math-dataroot "./dataset/test/*/*" --arch distilgpt2
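For reference, a minimal sketch of the general inference pattern behind this evaluation: the fine-tuned checkpoint is loaded as a causal language model and the answer is generated as a continuation of the problem text. The checkpoint path and prompt format here are illustrative assumptions, not the exact ones the scripts use.

# Illustrative only -- the checkpoint path and prompt format are assumptions.
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
model = AutoModelForCausalLM.from_pretrained("./checkpoints/final_checkpoint/")

prompt = "Problem: What is 7 * 8?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32,
                         pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))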

Other Transformer Code

In addition to GPT2, we also tested different transformer-only architectures, adapting the starter scripts as needed to work with each model format. The following scripts are provided, and they can be trained and evaluated in the same manner as the GPT2 starter code:

## BERT and RoBERTa
python src/train_bert_model.py -h

python src/eval_bert_model.py -h

## T5
python src/train_t5_model.py -h

python src/eval_t5_model.py -h
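Unlike GPT2, T5 is an encoder-decoder model, so the problem text is the source sequence and the answer is produced by the decoder. A minimal sketch of this framing, with t5-small standing in for an actual fine-tuned checkpoint:

# Illustrative seq2seq framing only -- t5-small stands in for a trained checkpoint.
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

inputs = tokenizer("If x + 3 = 10, what is x?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))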

Graph2Tree Code

To train a model, the following commands can be used:

cd src
python trainer.py

Running the command with the -h flag shows the parameters that can be set, such as the dataset location, the learning rate, the maximum number of epochs, and the beam search parameters; an illustrative invocation follows below.
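As an illustration only (the flag names below are hypothetical; python trainer.py -h prints the real ones), a configured run might look like:

## Hypothetical flags -- consult `python trainer.py -h` for the actual names
python trainer.py --data-dir ../dataset/train --learning-rate 1e-3 --max-epochs 80 --beam-size 5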

Graph2Tree with Transformer Code

Another promising path was to build on the provided code and combine transformers with a graph2tree methodology. To do this, first process and tokenize the dataset with the T5 tokenizer by running:

python src/export_dataset_t5.py --tokenizer t5 --mode train

This generates a pickle file (t5-math-data.pickle) containing the processed dataset. Once it has been created, start a Jupyter notebook server and execute src/custom-graph2tree-t5.ipynb, which loads the dataset and begins training using a combination of the provided code and some custom functions that fill in the missing pieces:

jupyter notebook ./src
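Before starting the notebook, the exported pickle can also be inspected directly; the layout of the unpickled object depends on export_dataset_t5.py and is an assumption here:

# Minimal sketch -- the structure of the unpickled object is an assumption.
import pickle

with open("t5-math-data.pickle", "rb") as f:
    data = pickle.load(f)

print(type(data))  # e.g. a collection of tokenized problem/answer pairs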

Resources

The following code bases and repositories were used to complete this competition:

Additionally, we also looked at incorporating the following datasets to use for pre-training:

Papers

@inproceedings{xie2019goal,
  title={A Goal-Driven Tree-Structured Neural Model for Math Word Problems},
  author={Xie, Zhipeng and Sun, Shichao},
  booktitle={IJCAI},
  pages={5299--5305},
  year={2019}
}
@article{lample2019deep,
  title={Deep learning for symbolic mathematics},
  author={Lample, Guillaume and Charton, Fran{\c{c}}ois},
  journal={arXiv preprint arXiv:1912.01412},
  year={2019}
}
@inproceedings{zhang2020graph,
  title={Graph-to-tree learning for solving math word problems},
  author={Zhang, Jipeng and Wang, Lei and Lee, Roy Ka-Wei and Bin, Yi and Wang, Yan and Shao, Jie and Lim, Ee-Peng},
  booktitle={Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics},
  year={2020}
}
@article{li2020graph,
  title={Graph-to-Tree Neural Networks for Learning Structured Input-Output Translation with Applications to Semantic Parsing and Math Word Problem},
  author={Li, Shucheng and Wu, Lingfei and Feng, Shiwei and Xu, Fangli and Xu, Fengyuan and Zhong, Sheng},
  journal={arXiv preprint arXiv:2004.13781},
  year={2020}
}
@article{pikekos2021measuring,
  title={Measuring and Improving BERT's Mathematical Abilities by Predicting the Order of Reasoning},
  author={Pi{\k{e}}kos, Piotr and Michalewski, Henryk and Malinowski, Mateusz},
  journal={arXiv preprint arXiv:2106.03921},
  year={2021}
}
@article{hendrycks2021measuring,
  title={Measuring Mathematical Problem Solving with the MATH Dataset},
  author={Hendrycks, Dan and Burns, Collin and Kadavath, Saurav and Arora, Akul and Basart, Steven and Tang, Eric and Song, Dawn and Steinhardt, Jacob},
  journal={arXiv preprint arXiv:2103.03874},
  year={2021}
}
@article{liang2021mwp,
  title={MWP-BERT: A Strong Baseline for Math Word Problems},
  author={Liang, Zhenwen and Zhang, Jipeng and Shao, Jie and Zhang, Xiangliang},
  journal={arXiv preprint arXiv:2107.13435},
  year={2021}
}
@article{griffith2021solving,
  title={Solving Arithmetic Word Problems with Transformers and Preprocessing of Problem Text},
  author={Griffith, Kaden and Kalita, Jugal},
  journal={arXiv preprint arXiv:2106.00893},
  year={2021}
}
@article{meng2019solving,
  title={Solving math word problems with double-decoder transformer},
  author={Meng, Yuanliang and Rumshisky, Anna},
  journal={arXiv preprint arXiv:1908.10924},
  year={2019}
}
