Merge pull request #2 from yashkant/update-py3
Add VisDial Code, Add VisDial-Captioning Worker, Remove Legacy Code
yashkant committed Aug 21, 2019
2 parents ad95136 + e5d26d5 commit 73af9f7
Showing 58 changed files with 3,458 additions and 2,134 deletions.
7 changes: 6 additions & 1 deletion .gitignore
Original file line number Diff line number Diff line change
@@ -1,5 +1,10 @@
data/
# Demo
/data/
media/
viscap/captioning/detectron/
viscap/captioning/model_data/
viscap/checkpoints/
viscap/data/

*.pyc
db.sqlite3
Expand Down
12 changes: 9 additions & 3 deletions .gitmodules
Original file line number Diff line number Diff line change
@@ -1,3 +1,9 @@
[submodule "neuraltalk2"]
path = neuraltalk2
url = https://github.com/karpathy/neuraltalk2.git
[submodule "viscap/captioning/vqa-maskrcnn-benchmark"]
path = viscap/captioning/vqa-maskrcnn-benchmark
url = https://gitlab.com/yashkant/vqa-maskrcnn-benchmark/
[submodule "viscap/captioning/fastText"]
path = viscap/captioning/fastText
url = https://github.com/facebookresearch/fastText
[submodule "viscap/captioning/pythia"]
path = viscap/captioning/pythia
url = https://github.com/facebookresearch/pythia/
198 changes: 115 additions & 83 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,120 +1,133 @@
# Visual Chatbot

## Introduction
Visual Chatbot
============
Demo for the paper (**now upgraded to PyTorch; for the Lua-Torch version, see [tag]()**).

Demo for the paper

**[Visual Dialog][1]**
**[Visual Dialog][1]** (CVPR 2017 [Spotlight][4]) <br>
Abhishek Das, Satwik Kottur, Khushi Gupta, Avi Singh, Deshraj Yadav, José M. F. Moura, Devi Parikh, Dhruv Batra
[arxiv.org/abs/1611.08669][1]
[CVPR 2017][4] (Spotlight)

arXiv link: [arxiv.org/abs/1611.08669][1]
Live demo: http://visualchatbot.cloudcv.org

[![Visual Chatbot](chat/static/images/screenshot.png)](http://www.youtube.com/watch?v=SztC8VOWwRQ&t=13s "Visual Chatbot")

Introduction
---------------
**Visual Dialog** requires an AI agent to hold a meaningful dialog with humans in natural, conversational language about visual content. Given an image, dialog history, and a follow-up question about the image, the AI agent has to answer the question. Putting it all together, we demonstrate the first ‘visual chatbot’!
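
Concretely, one round of the task can be pictured with a toy structure like the one below (a sketch for intuition only; the field names are illustrative, not the exact VisDial v1.0 schema):

```python
# one toy dialog round -- illustrative field names, not the real dataset schema
dialog_round = {
    "image": "demo_beach.jpg",                  # hypothetical uploaded image
    "caption": "a man riding a horse on a beach",
    "history": [                                # earlier question-answer pairs
        {"question": "is the man wearing a hat?", "answer": "yes"},
    ],
    "question": "what color is the horse?",     # follow-up the agent must answer
}
```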

[![Visual Chatbot](chat/static/images/screenshot.png)](http://www.youtube.com/watch?v=SztC8VOWwRQ&t=13s "Visual Chatbot")
What has changed since the last version?
---------------------------------------------------
The model-building code has been completely shifted to PyTorch; we have added a much-improved [Bottom-Up Top-Down][12] captioning model from [Pythia][10] and a Mask-RCNN feature extractor from [maskrcnn-benchmark][13]. The VisDial model is borrowed from the [visdial-challenge-starter][14] code.

## Installation Instructions
Please follow the instructions below to get the demo running on your local machine. For the previous version of this repository, which supports Torch-Lua based models, see [tag]().

### Installing the Essential requirements
Setup and Dependencies
------------------------------
Start by installing the build essentials, [Redis Server][5], and [RabbitMQ Server][6].
```sh
sudo apt-get update

```shell
# download and install build essentials
sudo apt-get install -y git python-pip python-dev
sudo apt-get install -y python-dev
sudo apt-get install -y autoconf automake libtool curl make g++ unzip
sudo apt-get install -y autoconf automake libtool
sudo apt-get install -y libgflags-dev libgoogle-glog-dev liblmdb-dev
sudo apt-get install libprotobuf-dev libleveldb-dev libsnappy-dev libopencv-dev libhdf5-serial-dev protobuf-compiler
```

### Install Torch

```shell
git clone https://github.com/torch/distro.git ~/torch --recursive
cd ~/torch; bash install-deps;
./install.sh
source ~/.bashrc
```

### Install PyTorch (Python Lua Wrapper)

```shell
git clone https://github.com/hughperkins/pytorch.git
cd pytorch
source ~/torch/install/bin/torch-activate
./build.sh
```
sudo apt-get install -y libprotobuf-dev libleveldb-dev libsnappy-dev libopencv-dev libhdf5-serial-dev protobuf-compiler

### Install RabbitMQ and Redis Server

```shell
# download and install redis-server and rabbitmq-server
sudo apt-get install -y redis-server rabbitmq-server
sudo rabbitmq-plugins enable rabbitmq_management
sudo service rabbitmq-server restart
sudo service redis-server restart
```
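
If you want to confirm that both servers came up, a minimal check from Python (a hypothetical helper, not part of this repo; it assumes the `redis` and `pika` packages are installed) looks like:

```python
# hypothetical sanity check -- assumes `pip install redis pika`
import pika
import redis

# ping the local Redis server on its default port
assert redis.Redis(host="localhost", port=6379).ping(), "Redis is not responding"

# open (and immediately close) a connection to RabbitMQ on its default port
pika.BlockingConnection(pika.ConnectionParameters(host="localhost")).close()

print("Redis and RabbitMQ are both up.")
```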

### Lua dependencies

```shell
luarocks install loadcaffe
```

The two dependencies below are only required if you are going to use a GPU.
#### Environment Setup

```shell
luarocks install cudnn
luarocks install cunn
```

### Cuda Installation
You can use Anaconda or Miniconda to set up this codebase. Download and install the Anaconda or Miniconda distribution based on Python 3+ from their [downloads page][17] and proceed below.

Note: CUDA and cuDNN are only required if you are going to use a GPU.

Download and install CUDA and cuDNN from [nvidia website](https://developer.nvidia.com/cuda-downloads)
```sh
# clone and download submodules
git clone --recursive https://www.github.com/yashkant/visual-chatbot.git

### Install dependencies
# create and activate new environment
conda create -n vischat python=3.6.8
conda activate vischat

```shell
git clone https://github.com/Cloud-CV/visual-chatbot.git
cd visual-chatbot
git submodule init && git submodule update
sh models/download_models.sh
# install the requirements of chatbot and visdial-starter code
cd visual-chatbot/
pip install -r requirements.txt
```

If you have not used nltk before, you will need to download a tokenization model.
#### Downloads
Download the BUTD, Mask-RCNN and VisDial model checkpoints and their configuration files.
```sh
sh viscap/download_models.sh
```

```shell
python -m nltk.downloader punkt
#### Install Submodules
Install Pythia to use the BUTD captioning model, and maskrcnn-benchmark for feature extraction.
```sh
# install fastText (dependency of pythia)
cd viscap/captioning/fastText
pip install -e .

# install pythia for using butd model
cd ../pythia/
sed -i '/torch/d' requirements.txt
pip install -e .

# install maskrcnn-benchmark for feature extraction
cd ../vqa-maskrcnn-benchmark/
python setup.py build
python setup.py develop
cd ../../../
```
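
As a quick sanity check that all three submodules installed correctly, the imports below should succeed (module names are assumptions inferred from the repositories above; in particular, fastText releases of this era expose the module as `fastText`):

```python
# quick import check for the submodules installed above;
# module names are assumptions based on each repository's setup script
import fastText            # viscap/captioning/fastText
import pythia              # viscap/captioning/pythia
import maskrcnn_benchmark  # viscap/captioning/vqa-maskrcnn-benchmark

print("captioning dependencies import cleanly")
```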
#### Cuda Installation

Change lines 2-4 of `neuraltalk2/misc/LanguageModel.lua` to the following:
Note: CUDA and cuDNN are only required if you are going to use a GPU. Download and install CUDA and cuDNN from the [NVIDIA website][18].

```shell
local utils = require 'neuraltalk2.misc.utils'
local net_utils = require 'neuraltalk2.misc.net_utils'
local LSTM = require 'neuraltalk2.misc.LSTM'
#### NLTK
We use `PunktSentenceTokenizer` from NLTK; download it if you haven't already.
```sh
python -c "import nltk; nltk.download('punkt')"
```
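
To confirm the model downloaded correctly, you can tokenize a made-up caption (`sent_tokenize` is backed by the punkt model):

```python
from nltk.tokenize import sent_tokenize  # backed by the punkt model

caption = "A man is riding a horse. The horse is brown."
print(sent_tokenize(caption))
# ['A man is riding a horse.', 'The horse is brown.']
```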

### Create the database

```shell
## Let's run this now!
#### Set up the database
```sh
# create the database
python manage.py makemigrations chat
python manage.py migrate
```
#### Run server and worker
Launch two separate terminals and run the worker and the server.
```sh
# run rabbitmq worker on first terminal
# warning: on the first run, a GloVe file (~860 MB) is downloaded; this is a one-time thing
python worker_viscap.py

# run development server on second terminal
python manage.py runserver
```
You are all set now. Visit http://127.0.0.1:8000 and you will have your demo running successfully.

### Running the RabbitMQ workers and Development Server
## Issues
If you run into incompatibility issues, please take a look [here][7] and [here][8].

Open 3 different terminal sessions and run the following commands:
## Model Checkpoint and Features Used
Performance on `v1.0 test-std` (trained on `v1.0` train + val):

```shell
python worker.py
python worker_captioning.py
python manage.py runserver
```
Model | R@1 | R@5 | R@10 | MeanR | MRR | NDCG |
------- | ------ | ------ | ------ | ------ | ------ | ------ |
[lf-gen-mask-rcnn-x101-demo][20] | 0.3930 | 0.5757 | 0.6404 | 18.4950| 0.4863 | 0.5967 |
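
For reference, the ranking metrics above follow the standard definitions over the 1-indexed rank given to the ground-truth answer in each round; a minimal sketch is below (this is not the repo's evaluation code, and NDCG is omitted since it additionally needs per-candidate relevance scores):

```python
import numpy as np

def retrieval_metrics(gt_ranks):
    """Standard ranking metrics from 1-indexed ground-truth answer ranks."""
    ranks = np.asarray(gt_ranks, dtype=np.float64)
    return {
        "R@1":   float(np.mean(ranks <= 1)),
        "R@5":   float(np.mean(ranks <= 5)),
        "R@10":  float(np.mean(ranks <= 10)),
        "MeanR": float(np.mean(ranks)),
        "MRR":   float(np.mean(1.0 / ranks)),
    }

print(retrieval_metrics([1, 3, 20]))  # toy example with three rounds
```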

You are all set now. Visit http://127.0.0.1:8000 and you will have your demo running successfully.
Extracted features from `VisDial v1.0` used to train the above model are here:

- [features_mask_rcnn_x101_train.h5][21]: Mask-RCNN features with 100 proposals per image for the train split.
- [features_mask_rcnn_x101_val.h5][22]: Mask-RCNN features with 100 proposals per image for the val split.
- [features_mask_rcnn_x101_test.h5][23]: Mask-RCNN features with 100 proposals per image for the test split.

*Note*: In the features above, the key `image_id` (from earlier versions) has been renamed to `image_ids`.
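
A minimal way to peek inside one of these files with `h5py` (only the `image_ids` key is confirmed by the note above; list the keys first to discover the rest of the layout):

```python
# sketch: inspect an extracted-features file with h5py
import h5py

with h5py.File("features_mask_rcnn_x101_val.h5", "r") as f:
    print(list(f.keys()))          # discover the actual dataset names
    image_ids = f["image_ids"][:]  # renamed from `image_id` in older versions
    print(image_ids.shape)
```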

## Cite this work

@@ -131,24 +144,43 @@ If you find this code useful, consider citing our work:
```

## Contributors

* [Yash Kant][19]
* [Deshraj Yadav][2]
* [Abhishek Das][3]

## License

BSD

## Helpful Issues
Problems installing uwsgi: https://github.com/unbit/uwsgi/issues/1770

Problems with asgiref: https://stackoverflow.com/questions/41335478/importerror-no-module-named-asgiref-base-layer
## Credits
## Credits and Acknowledgements

- Visual Chatbot Image: "[Robot-clip-art-book-covers-feJCV3-clipart](https://commons.wikimedia.org/wiki/File:Robot-clip-art-book-covers-feJCV3-clipart.png)" by [Wikimedia Commons](https://commons.wikimedia.org) is licensed under [CC BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/deed.en)

- The beam-search implementation was borrowed as-is from [AllenNLP][15].
- The vqa-maskrcnn-benchmark code used here was forked from @meetshah1995's [fork][16] of the original repository.
- The VisDial model is borrowed from the [visdial-challenge-starter][14] code.
- The BUTD captioning model comes from the awesome [Pythia][10] repository.

[1]: https://arxiv.org/abs/1611.08669
[2]: http://deshraj.github.io
[3]: https://abhishekdas.com
[4]: http://cvpr2017.thecvf.com/
[5]: https://redis.io/
[6]: https://www.rabbitmq.com/
[7]: https://github.com/unbit/uwsgi/issues/1770
[8]: https://stackoverflow.com/questions/41335478/importerror-no-module-named-asgiref-base-layer
[9]: https://gitlab.com/yashkant/vqa-maskrcnn-benchmark
[10]: https://github.com/facebookresearch/pythia/
[11]: https://github.com/facebookresearch/fastText/
[12]: https://arxiv.org/abs/1707.07998
[13]: https://github.com/facebookresearch/maskrcnn-benchmark
[14]: https://github.com/batra-mlp-lab/visdial-challenge-starter-pytorch/
[15]: https://www.github.com/allenai/allennlp
[16]: https://gitlab.com/meetshah1995/vqa-maskrcnn-benchmark/
[17]: https://conda.io/docs/user-guide/install/download.html
[18]: https://developer.nvidia.com/cuda-downloads
[19]: https://github.com/yashkant
[20]: https://s3.amazonaws.com/visual-dialog/data/v1.0/2019/lf_gen_mask_rcnn_x101_train_demo.pth
[21]: https://s3.amazonaws.com/visual-dialog/data/v1.0/2019/features_mask_rcnn_x101_train.h5
[22]: https://s3.amazonaws.com/visual-dialog/data/v1.0/2019/features_mask_rcnn_x101_val.h5
[23]: https://s3.amazonaws.com/visual-dialog/data/v1.0/2019/features_mask_rcnn_x101_test.h5

100 changes: 0 additions & 100 deletions captioning.lua

This file was deleted.

