What does this PR do?
This PR adds Google's BigBird ("RoBERTa") model, together with three BigBird checkpoints.
Here is a notebook showing how well BigBird works on long-document question answering: https://colab.research.google.com/drive/1DVOm1VHjW0eKCayFq1N2GpY6GR9M4tJP?usp=sharing
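BigBird can handle such long documents because block-sparse attention scales linearly in sequence length instead of quadratically. As a rough back-of-envelope illustration only (the function name and the default block counts here are assumptions for the sketch, not the library's API):

```python
def attention_scores(seq_len, block_size=64, num_rand_blocks=3,
                     num_global_blocks=2, num_sliding_blocks=3):
    """Compare score-matrix entries for full vs. block-sparse attention.

    In block-sparse mode each query block attends to a fixed number of
    key blocks (global + sliding + random), so the sparse cost grows
    linearly with seq_len rather than quadratically.
    """
    num_blocks = seq_len // block_size
    full = seq_len * seq_len
    keys_per_query = (num_global_blocks + num_sliding_blocks
                      + num_rand_blocks) * block_size
    sparse = num_blocks * block_size * keys_per_query
    return full, sparse

full, sparse = attention_scores(4096)
print(full, sparse)  # doubling seq_len quadruples full, only doubles sparse
```

Doubling the sequence length quadruples the full-attention cost but only doubles the block-sparse cost, which is what makes 4096-token inputs practical.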
Who can review?
Anyone in the community is free to review the PR once the tests have passed.
Once the pre-trained checkpoints are uploaded to the Hub, the model can be used as follows:
```python
from transformers import BigBirdForMaskedLM, BigBirdForPreTraining, BigBirdTokenizer

tokenizer = BigBirdTokenizer.from_pretrained("google/bigbird-roberta-base")

# model with LM head
model_with_lm = BigBirdForMaskedLM.from_pretrained("google/bigbird-roberta-base")

# model with pretraining heads
model_for_pretraining = BigBirdForPreTraining.from_pretrained("google/bigbird-roberta-base")
```
sgugger left a comment
Amazing add! This is a big model and will make for a nice addition. I have left quite a few comments, mainly for styling.
On top of that, don't forget to add your model to the main README!
LysandreJik left a comment
This is great @vasudevgupta7! I've left a few comments, mostly nits.
This made me think we should really push for fast tokenizers in the templates, as they're arguably more important and useful than their python counterparts.
Thanks a lot for working on this @vasudevgupta7, this is a tremendous effort!
@vasudevgupta7 currently loading
Can we have separate pretrained checkpoints for BigBird and Pegasus without the finetuning, so that we can use the Pegasus decoder along with the BigBird encoder in our code?
* init bigbird
* model.__init__ working, conversion script ready, config updated
* add conversion script
* BigBirdEmbeddings working :)
* slightly update conversion script
* BigBirdAttention working :) ; some bug in layer.output.dense
* add debugger-notebook
* forward() working for BigBirdModel :) ; replaced gelu with gelu_fast
* tf code adapted to torch till rand_attn in bigbird_block_sparse_attention ; till now everything working :)
* BigBirdModel working in block-sparse attention mode :)
* add BigBirdForPreTraining
* small fix
* add tokenizer for BigBirdModel
* fix config & hence modeling
* fix base prefix
* init testing
* init tokenizer test
* pos_embed must be absolute, attn_type=original_full when add_cross_attn=True, nsp loss is optional in BigBirdForPreTraining, add assert statements
* remove position_embedding_type arg
* complete normal tests
* add comments to block sparse attention
* add attn_probs for sliding & global tokens
* create fn for block sparse attn mask creation
* add special tests
* restore pos embed arg
* minor fix
* attn probs update
* make big bird fully gpu friendly
* fix tests
* remove pruning
* correct tokenizer & minor fixes
* update conversion script, remove norm_type
* tokenizer-inference test add
* remove extra comments
* add docs
* save intermediate
* finish trivia_qa conversion
* small update to forward
* correct qa and layer
* better error message
* BigBird QA ready
* fix rebased
* add trivia-qa debugger notebook
* qa setup
* fixed till embeddings
* some issue in q/k/v_layer
* fix bug in conversion-script
* fixed till self-attn
* qa fixed except layer norm
* add qa end2end test
* fix gradient ckpting ; other qa test
* speed-up big bird a bit
* hub_id=google
* clean up
* make quality
* speed up einsum with bmm
* finish perf improvements for big bird
* remove wav2vec2 tok
* fix tokenizer
* include docs
* correct docs
* add helper to auto pad block size
* make style
* remove fast tokenizer for now
* fix some
* add pad test
* finish
* fix some bugs
* fix another bug
* fix buffer tokens
* fix comment and merge from master
* add comments
* make style
* commit some suggestions Co-authored-by: Sylvain Gugger <[email protected]>
* Fix typos
* fix some more suggestions
* add another patch Co-authored-by: Sylvain Gugger <[email protected]>
* fix copies
* another path Co-authored-by: Lysandre Debut <[email protected]>
* update
* update nit suggestions
* make style

Co-authored-by: Patrick von Platen <[email protected]>
Co-authored-by: Sylvain Gugger <[email protected]>
Co-authored-by: Lysandre Debut <[email protected]>
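One of the commits above adds a helper to auto-pad inputs to the block size: block-sparse attention requires the sequence length to be a multiple of the block size. A minimal sketch of that idea (the function name here is hypothetical, not the actual helper added in the PR):

```python
def padding_len(seq_len, block_size=64):
    """Number of pad tokens needed so seq_len becomes a multiple of
    block_size; zero when it already is."""
    return (block_size - seq_len % block_size) % block_size

print(padding_len(4096))  # 4096 is already a multiple of 64 -> 0
print(padding_len(4000))  # pads 4000 up to 4032
```

Padded positions would then be masked out in the attention mask so they do not contribute to the output.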