
BigBird #10183

Merged
merged 88 commits into from Mar 30, 2021

Conversation

@vasudevgupta7 (Contributor) commented Feb 15, 2021

What does this PR do?

This PR adds Google's BigBird ("RoBERTa" variant).

Fixes #6113.

This PR adds three BigBird checkpoints.

Here a notebook showing how well BigBird works on long-document question answering: https://colab.research.google.com/drive/1DVOm1VHjW0eKCayFq1N2GpY6GR9M4tJP?usp=sharing
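BigBird handles long documents by replacing full self-attention with block sparse attention: each block of tokens attends to a few global blocks, a sliding window of neighbouring blocks, and a handful of random blocks. As a rough illustration of the idea (the block counts, window width, and layout below are simplified assumptions, not the exact pattern implemented in this PR), a block-level mask can be sketched like this:

```python
import numpy as np

def bigbird_block_mask(seq_len, block_size=4, num_global=1, num_window=3, num_rand=1, seed=0):
    """Build an illustrative block-level sparse attention mask.

    Each query block attends to global blocks, a sliding window of
    neighbouring blocks, and a few random blocks. This is a toy sketch
    of the BigBird idea, not the model's exact attention layout.
    """
    assert seq_len % block_size == 0
    n = seq_len // block_size
    rng = np.random.default_rng(seed)
    mask = np.zeros((n, n), dtype=bool)
    # global blocks attend everywhere and are attended to by every block
    mask[:num_global, :] = True
    mask[:, :num_global] = True
    half = num_window // 2
    for i in range(n):
        # sliding window around block i
        lo, hi = max(0, i - half), min(n, i + half + 1)
        mask[i, lo:hi] = True
        # a few random blocks per query block
        choices = rng.choice(n, size=min(num_rand, n), replace=False)
        mask[i, choices] = True
    return mask

mask = bigbird_block_mask(seq_len=64, block_size=4)
# far fewer attended block pairs than the full 16x16 grid
print(int(mask.sum()), mask.size)
```

Because the number of attended blocks per row stays roughly constant, the attended fraction of the grid shrinks as the sequence grows, which is what makes 4096-token inputs tractable.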

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed.
@patrickvonplaten

@isollid commented Feb 24, 2021

Will BigBird-Pegasus be added, and then BigBirdForConditionalGeneration so that summarization will be possible?

@vasudevgupta7 (Contributor, Author) commented Feb 24, 2021

> Will BigBird-Pegasus be added, and then BigBirdForConditionalGeneration so that summarization will be possible?

Yes, we will be adding that soon.

@vasudevgupta7 (Contributor, Author) commented Feb 25, 2021

Once the pre-trained checkpoints are uploaded to huggingface_hub, the model & tokenizer can be accessed this way:

from transformers import BigBirdForMaskedLM, BigBirdForPreTraining, BigBirdTokenizer

tokenizer = BigBirdTokenizer.from_pretrained("google/bigbird-roberta-base")

# model with LM head
model_with_lm = BigBirdForMaskedLM.from_pretrained("google/bigbird-roberta-base")

# model with pretraining heads
model_for_pretraining = BigBirdForPreTraining.from_pretrained("google/bigbird-roberta-base")
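As context for why a checkpoint like this can handle long inputs, here is a back-of-the-envelope comparison of how many query/key pairs full attention versus a BigBird-style sparse pattern must score. The per-token counts below are illustrative assumptions, not the model's exact numbers:

```python
def full_attention_pairs(seq_len):
    # every token attends to every token: O(n^2)
    return seq_len * seq_len

def sparse_attention_pairs(seq_len, window=3, num_global=2, num_rand=3):
    # per token: window neighbours + global tokens + random tokens
    # (an upper bound; overlaps are not deduplicated): O(n)
    return seq_len * (window + num_global + num_rand)

for n in (512, 4096):
    print(n, full_attention_pairs(n), sparse_attention_pairs(n))
```

Growing the sequence 8x multiplies the full-attention cost by 64 but the sparse cost by only 8, which is the point of the block sparse design.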
@patrickvonplaten changed the title from "Add BigBird" to "BigBird" on Mar 29, 2021
@sgugger (Member) left a comment

Amazing addition! This is a big model and will make for a nice addition. I have left quite a few comments, mainly on styling.

On top of that, don't forget to add your model to the main README!

@sgugger (Member) left a comment

Made typos in my suggestions, sorry!

@LysandreJik (Member) left a comment

This is great @vasudevgupta7! I've left a few comments, mostly nits.

This made me think we should really push for fast tokenizers in the templates, as they're arguably more important and useful than their python counterparts.

Thanks a lot for working on this @vasudevgupta7, this is a tremendous effort!

@vasudevgupta7 (Contributor, Author) commented Mar 29, 2021

@sgugger, @LysandreJik I updated the code based on your suggestions. Please let me know if I have missed something.

@patrickvonplaten patrickvonplaten merged commit 6dfd027 into huggingface:master Mar 30, 2021
14 checks passed
@LysandreJik (Member) commented Mar 30, 2021

Thank you for taking care of the comments @vasudevgupta7 and for this PR altogether!

@vasudevgupta7 mentioned this pull request on Mar 31, 2021
@sayakmisra commented Apr 7, 2021

@vasudevgupta7 great work! When are you planning to add BigBirdForConditionalGeneration? And are there any plans to add the PubMed pre-trained models?

@vasudevgupta7 (Contributor, Author) commented Apr 7, 2021

@sayakmisra I am currently working on it. You can track PR #10991.

@jigsaw2212 commented Apr 28, 2021

@vasudevgupta7 currently, loading vasudevgupta/bigbird-pegasus-large-bigpatent into BigBirdForConditionalGeneration results in some of the checkpoint's weights not being used to initialize the model. Is there a workaround for this?

Can we have separate pretrained checkpoints for BigBird and Pegasus without the finetuning, so that we can use the Pegasus decoder along with the BigBird encoder in our code?

@patrickvonplaten (Member) commented Apr 29, 2021

Hey @jigsaw2212,

we are still working on integrating BigBirdPegasus; for now, only the google/bigbird-... checkpoints are fully supported. BigBirdPegasus should be merged in 1 to 2 weeks.

Iwontbecreative added a commit to Iwontbecreative/transformers that referenced this pull request Jul 15, 2021
* init bigbird

* model.__init__ working, conversion script ready, config updated

* add conversion script

* BigBirdEmbeddings working :)

* slightly update conversion script

* BigBirdAttention working :) ; some bug in layer.output.dense

* add debugger-notebook

* forward() working for BigBirdModel :) ; replaced gelu with gelu_fast

* tf code adapted to torch till rand_attn in bigbird_block_sparse_attention ; till now everything working :)

* BigBirdModel working in block-sparse attention mode :)

* add BigBirdForPreTraining

* small fix

* add tokenizer for BigBirdModel

* fix config & hence modeling

* fix base prefix

* init testing

* init tokenizer test

* pos_embed must be absolute, attn_type=original_full when add_cross_attn=True , nsp loss is optional in BigBirdForPreTraining, add assert statements

* remove position_embedding_type arg

* complete normal tests

* add comments to block sparse attention

* add attn_probs for sliding & global tokens

* create fn for block sparse attn mask creation

* add special tests

* restore pos embed arg

* minor fix

* attn probs update

* make big bird fully gpu friendly

* fix tests

* remove pruning

* correct tokenizer & minor fixes

* update conversion script , remove norm_type

* tokenizer-inference test add

* remove extra comments

* add docs

* save intermediate

* finish trivia_qa conversion

* small update to forward

* correct qa and layer

* better error message

* BigBird QA ready

* fix rebased

* add trivia-qa debugger notebook

* qa setup

* fixed till embeddings

* some issue in q/k/v_layer

* fix bug in conversion-script

* fixed till self-attn

* qa fixed except layer norm

* add qa end2end test

* fix gradient ckpting ; other qa test

* speed-up big bird a bit

* hub_id=google

* clean up

* make quality

* speed up einsum with bmm

* finish perf improvements for big bird

* remove wav2vec2 tok

* fix tokenizer

* include docs

* correct docs

* add helper to auto pad block size

* make style

* remove fast tokenizer for now

* fix some

* add pad test

* finish

* fix some bugs

* fix another bug

* fix buffer tokens

* fix comment and merge from master

* add comments

* make style

* commit some suggestions

Co-authored-by: Sylvain Gugger <[email protected]>

* Fix typos

* fix some more suggestions

* add another patch

Co-authored-by: Sylvain Gugger <[email protected]>

* fix copies

* another path

Co-authored-by: Lysandre Debut <[email protected]>

* update

* update nit suggestions

* make style

Co-authored-by: Patrick von Platen <[email protected]>
Co-authored-by: Sylvain Gugger <[email protected]>
Co-authored-by: Lysandre Debut <[email protected]>
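One of the commits above mentions a helper to auto-pad inputs to the block size, which block sparse attention requires. A minimal sketch of what such padding involves (this is a hypothetical standalone helper for illustration; the real implementation in transformers operates on tensors and also pads the attention mask):

```python
def pad_to_block_size(input_ids, block_size, pad_token_id):
    """Right-pad a list of token ids so its length is a multiple of block_size.

    Hypothetical helper mirroring the "add helper to auto pad block size"
    commit; shown here on plain lists rather than tensors.
    """
    remainder = len(input_ids) % block_size
    if remainder == 0:
        return list(input_ids)  # already block-aligned, nothing to do
    pad_len = block_size - remainder
    return list(input_ids) + [pad_token_id] * pad_len

# a 5-token input is padded to length 8, the next multiple of 4
print(pad_to_block_size([1, 2, 3, 4, 5], block_size=4, pad_token_id=0))
```

Padding transparently like this lets users pass sequences of arbitrary length without worrying about the block-size constraint of the sparse attention kernel.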