
Problem about downstream task data. #2

Open
PosoSAgapo opened this issue Dec 28, 2020 · 1 comment


PosoSAgapo commented Dec 28, 2020

The project works perfectly in the BERT setting, but unfortunately it does not work for many other transformer models.

As far as I can tell, the downstream task data is provided in a preprocessed format tailored to BERT. This limits implementations with other transformer models, whose tokenization methods differ from BERT's.

The pattern_extraction.py code seems to work only for generating the pre-training TacoML data. It cannot process the downstream task data, and the downstream data in the data directory is provided in processed form, which means it can only be used with BERT. For example, the augmented MC-TACO dataset contains tokens such as [unused7] that never appear in pattern_extraction.py, so I suspect the downstream tasks were processed with different extraction code.
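One way to see which BERT placeholder tokens the processed files depend on is to scan them for `[unusedN]` patterns. The sketch below assumes the processed data is plain text with the markers written out verbatim; the function name `find_unused_markers` is hypothetical and not part of the repository.

```python
import re
from collections import Counter

# Matches BERT vocabulary placeholders such as "[unused7]".
# Assumption: the processed files store these markers as literal strings.
UNUSED_RE = re.compile(r"\[unused\d+\]")


def find_unused_markers(lines):
    """Count every [unusedN] marker that appears in an iterable of text lines."""
    counts = Counter()
    for line in lines:
        for match in UNUSED_RE.finditer(line):
            counts[match.group(0)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample lines, not taken from the real dataset.
    sample = ["the meeting lasted [unused7] two hours [unused8]", "[unused7] a week"]
    print(find_unused_markers(sample))
```

Running this over the downstream files would give the list of markers that a non-BERT tokenizer would need to handle.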

This is a dead end for reproducing these results with other transformer models. Is there a way to convert the downstream task data into a format suitable for transformer models other than BERT? Or is there any plan to release the downstream task processing code and the original data? It would be a pity if this code only worked for BERT :)

Slash0BZ (Member) commented

Thanks, we are aware of this and will work on something new. Meanwhile, if you want to work with other transformers, you will need to modify pattern_extraction.py, which contains the entire process of parsing raw textual data into the BERT format.
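Until the raw data is available, a rough workaround might be to undo BERT's WordPiece splitting and re-tokenize the text with another model's tokenizer. The sketch below assumes each example is stored as space-separated WordPiece tokens, with BERT's `##` prefix marking continuation pieces. It merges those pieces back into words, keeps `[unusedN]` markers as standalone tokens, and optionally renames them through a hypothetical `marker_map`. Other BERT specials such as `[CLS]` and `[SEP]` pass through unchanged. `wordpiece_to_words` is an illustrative helper, not code from this repository.

```python
import re

# BERT placeholder markers such as "[unused7]".
UNUSED_RE = re.compile(r"\[unused\d+\]")


def wordpiece_to_words(tokens, marker_map=None):
    """Merge BERT WordPiece tokens ("##" continuations) back into whole words.

    [unusedN] markers stay as standalone tokens and are optionally renamed via
    marker_map (a hypothetical mapping to the target tokenizer's marker names).
    """
    marker_map = marker_map or {}
    words = []
    can_extend = False  # True if the last word may absorb a "##" continuation
    for tok in tokens:
        if UNUSED_RE.fullmatch(tok):
            words.append(marker_map.get(tok, tok))
            can_extend = False
        elif tok.startswith("##") and can_extend:
            words[-1] += tok[2:]  # glue the continuation onto the previous word
        else:
            # Ordinary word, or a stray "##" piece with nothing to attach to.
            words.append(tok)
            can_extend = True
    return words


if __name__ == "__main__":
    tokens = "the meeting last ##ed [unused7] two hours".split()
    print(" ".join(wordpiece_to_words(tokens, {"[unused7]": "<dur>"})))
```

The joined text could then be re-tokenized with the target model. In Hugging Face `transformers`, new marker strings can be registered with `tokenizer.add_special_tokens({"additional_special_tokens": [...]})` followed by `model.resize_token_embeddings(len(tokenizer))`. Note that if the data was produced with an uncased BERT tokenizer, the original casing and accents are lost, so modifying pattern_extraction.py to start from raw text, as suggested above, remains the cleaner route.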
