Shanti Stewart1,
Kleanthis Avramidis1 *,
Tiantian Feng1 *,
Shrikanth Narayanan1
1 Signal Analysis and Interpretation Lab, University of Southern California
* Equal contribution
This repository is the official implementation of Emotion-Aligned Contrastive Learning Between Images and Music (accepted at ICASSP 2024).
In this work, we introduce Emo-CLIM, a framework for Emotion-Aligned Contrastive Learning Between Images and Music. Emo-CLIM learns an emotion-aligned joint embedding space between images and music via emotion-supervised contrastive learning, using a cross-modal adaptation of the SupCon loss. Evaluations on downstream cross-modal retrieval and music tagging tasks show that the learned embeddings successfully align the two modalities.
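The cross-modal SupCon idea can be sketched as follows. This is an illustrative simplification, not the repository's actual loss implementation (see climur/losses/ for that); the function name, signature, and temperature value are assumptions.

```python
import torch
import torch.nn.functional as F

def cross_modal_supcon(img_emb, mus_emb, img_labels, mus_labels, tau=0.1):
    """Cross-modal supervised contrastive loss (illustrative sketch).

    For each image anchor, positives are the music clips that share its
    emotion label; all other music clips act as negatives. The loss is
    symmetrized over the image-to-music and music-to-image directions.
    """
    img = F.normalize(img_emb, dim=-1)
    mus = F.normalize(mus_emb, dim=-1)
    logits = img @ mus.t() / tau  # (N_img, N_mus) similarity matrix
    # positive mask: 1 where the image and music emotion labels match
    pos = img_labels[:, None].eq(mus_labels[None, :]).float()

    def one_direction(lg, mask):
        # log-softmax over candidates, then average over each anchor's positives
        log_prob = lg - torch.logsumexp(lg, dim=1, keepdim=True)
        return -(mask * log_prob).sum(1) / mask.sum(1).clamp(min=1)

    loss_i2m = one_direction(logits, pos).mean()
    loss_m2i = one_direction(logits.t(), pos.t()).mean()
    return 0.5 * (loss_i2m + loss_m2i)
```

Unlike the standard (unimodal) SupCon loss, anchors and candidates here come from different modalities, so the positive mask compares image labels against music labels rather than within one batch.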
We provide code for contrastive pre-training and downstream cross-modal retrieval and music tagging evaluation tasks.
We recommend using a conda environment with Python >= 3.10:
conda create -n emo-clim python=3.10
conda activate emo-clim
Clone the repository and install the dependencies:
git clone https://github.com/shantistewart/Emo-CLIM
cd Emo-CLIM && pip install -e .
You will also need to install the CLIP model:
pip install git+https://github.com/openai/CLIP.git
Emo-CLIM/
├── climur/ # core directory for pretraining and downstream evaluation
│ ├── dataloaders/ # PyTorch Dataset classes
│ ├── losses/ # PyTorch loss functions
│ ├── models/ # PyTorch Module classes
│ ├── scripts/ # training and evaluation scripts
│ ├── trainers/ # PyTorch Lightning LightningModule classes
│ └── utils/ # utility functions
├── configs/ # configuration files for training and evaluation
├── data_prep/ # data preparation scripts
├── figures/ # Emo-CLIM figures
├── plots/ # t-SNE plots
├── results_test/ # cross-modal retrieval evaluation results on test set
├── results_val/ # cross-modal retrieval evaluation results on validation set
└── tests/ # test scripts
If this project helps your research, please cite our paper:
@inproceedings{Stewart-2024-EmoCLIM,
title={Emotion-Aligned Contrastive Learning Between Images and Music},
author={Stewart, Shanti and Avramidis*, Kleanthis and Feng*, Tiantian and Narayanan, Shrikanth},
booktitle={IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP},
year={2024}
}
If you have any questions, please get in touch: [email protected]