Shanti Stewart1,
Kleanthis Avramidis1 *,
Tiantian Feng1 *,
Shrikanth Narayanan1
1 Signal Analysis and Interpretation Lab, University of Southern California
* Equal contribution
This repository is the official implementation of Emotion-Aligned Contrastive Learning Between Images and Music (accepted at ICASSP 2024).
In this work, we introduce Emo-CLIM, a framework for Emotion-Aligned Contrastive Learning Between Images and Music. Emo-CLIM learns an emotion-aligned joint embedding space between images and music via emotion-supervised contrastive learning, using a cross-modal adaptation of the SupCon loss. Evaluations on downstream cross-modal retrieval and music tagging tasks show that the learned embeddings successfully align the two modalities.
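The cross-modal SupCon idea can be sketched as follows. This is an illustrative simplification, not the repository's actual loss implementation (see climur/losses/ for that); the function name, signature, and temperature value are assumptions.

```python
import torch
import torch.nn.functional as F

def cross_modal_supcon(img_emb, mus_emb, img_labels, mus_labels, tau=0.1):
    """Cross-modal supervised contrastive loss (illustrative sketch).

    For each image anchor, positives are the music clips that share its
    emotion label; all other music clips act as negatives. The loss is
    symmetrized over the image-to-music and music-to-image directions.
    """
    img = F.normalize(img_emb, dim=-1)
    mus = F.normalize(mus_emb, dim=-1)
    logits = img @ mus.t() / tau  # (N_img, N_mus) similarity matrix
    # positive mask: 1 where the image and music emotion labels match
    pos = img_labels[:, None].eq(mus_labels[None, :]).float()

    def one_direction(lg, mask):
        # log-softmax over candidates, then average over each anchor's positives
        log_prob = lg - torch.logsumexp(lg, dim=1, keepdim=True)
        return -(mask * log_prob).sum(1) / mask.sum(1).clamp(min=1)

    loss_i2m = one_direction(logits, pos).mean()
    loss_m2i = one_direction(logits.t(), pos.t()).mean()
    return 0.5 * (loss_i2m + loss_m2i)
```

Unlike the standard (unimodal) SupCon loss, anchors and candidates here come from different modalities, so the positive mask compares image labels against music labels rather than within one batch.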
We provide code for contrastive pre-training and downstream cross-modal retrieval and music tagging evaluation tasks.
We recommend using a conda environment with Python >= 3.10:
conda create -n emo-clim python=3.10
conda activate emo-clim
Clone the repository and install the dependencies:
git clone https://github.com/shantistewart/Emo-CLIM
cd Emo-CLIM && pip install -e .
You will also need to install the CLIP model:
pip install git+https://github.com/openai/CLIP.git
Emo-CLIM/
├── climur/ # core directory for pretraining and downstream evaluation
│ ├── dataloaders/ # PyTorch Dataset classes
│ ├── losses/ # PyTorch loss functions
│ ├── models/ # PyTorch Module classes
│ ├── scripts/ # training and evaluation scripts
│ ├── trainers/ # PyTorch Lightning LightningModule classes
│ └── utils/ # utility functions
├── configs/ # configuration files for training and evaluation
├── data_prep/ # data preparation scripts
├── figures/ # Emo-CLIM figures
├── plots/ # t-SNE plots
├── results_test/ # cross-modal retrieval evaluation results on test set
├── results_val/ # cross-modal retrieval evaluation results on validation set
└── tests/ # test scripts
If this project helps your research, please cite our paper:
@inproceedings{Stewart-2024-EmoCLIM,
title={Emotion-Aligned Contrastive Learning Between Images and Music},
author={Stewart, Shanti and Avramidis*, Kleanthis and Feng*, Tiantian and Narayanan, Shrikanth},
booktitle={IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP},
year={2024}
}
If you have any questions, please get in touch: [email protected]