
Add the ImageClassificationPipeline #11598

Merged · 3 commits merged into master on May 7, 2021
Conversation

@LysandreJik (Member) commented on May 5, 2021

This PR adds the ImageClassificationPipeline. It is tested on DeiT and ViT and should enable the inference API for these models.

Since I encountered an issue with the AutoFeatureExtractor, I fixed it as discussed with @sgugger (namely, I switched it from using names such as "deit" as keys to using the configuration class as a key, similarly to what we do in tokenizers and the other auto classes).

Please let me know if you would like me to split this PR into multiple PRs for simpler reviews, happy to do so.

@NielsRogge, happy to have your review.

``./my_model_directory/feature_extraction_config.json``.
``./my_model_directory/preprocessor_config.json``.
Comment on lines -229 to +229 — @LysandreJik (Author, Member), May 5, 2021:
Changed this to the actual name of the configuration file

("deit", DeiTFeatureExtractor),
("s2t", Speech2TextFeatureExtractor),
("vit", ViTFeatureExtractor),
("wav2vec2", Wav2Vec2FeatureExtractor),
(DeiTConfig, DeiTFeatureExtractor),
(Speech2TextConfig, Speech2TextFeatureExtractor),
(ViTConfig, ViTFeatureExtractor),
(Wav2Vec2Config, Wav2Vec2FeatureExtractor),
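The effect of the diff above can be sketched in isolation: keying the mapping by configuration class (rather than by a model-type string such as "deit") lets an auto class resolve the right feature extractor directly from the type of a loaded config object. The classes below are empty stand-ins for illustration only; the real mapping lives in transformers and uses the actual config and extractor classes.

```python
from collections import OrderedDict

# Hypothetical stand-in classes; in transformers these are the real
# configuration and feature-extractor classes.
class DeiTConfig: ...
class ViTConfig: ...
class DeiTFeatureExtractor: ...
class ViTFeatureExtractor: ...

# The mapping is keyed by configuration class, mirroring the new diff lines.
FEATURE_EXTRACTOR_MAPPING = OrderedDict(
    [
        (DeiTConfig, DeiTFeatureExtractor),
        (ViTConfig, ViTFeatureExtractor),
    ]
)

def extractor_class_for(config):
    # type(config) is the configuration class, i.e. the mapping key.
    return FEATURE_EXTRACTOR_MAPPING[type(config)]

print(extractor_class_for(ViTConfig()).__name__)  # ViTFeatureExtractor
```

This avoids maintaining a parallel set of short model-type strings, which is how the tokenizer and model auto classes already work.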
Comment on lines -41 to +34 — @LysandreJik (Author, Member), May 5, 2021:
As discussed with @sgugger.

@@ -359,19 +383,7 @@ def pipeline(
# At that point framework might still be undetermined
model = get_default_model(targeted_task, framework, task_options)

# Try to infer tokenizer from model or config name (if provided as str)

@LysandreJik (Author, Member), May 5, 2021:
Moved the tokenizer initialization logic beneath the model initialization, with the feature processor. This way it benefits from having access to the model configuration.

@@ -522,7 +523,8 @@ class Pipeline(_ScikitCompat):
def __init__(
self,
model: Union["PreTrainedModel", "TFPreTrainedModel"],
tokenizer: PreTrainedTokenizer,
tokenizer: Optional[PreTrainedTokenizer] = None,

@LysandreJik (Author, Member), May 5, 2021:
The tokenizer becomes an optional value, alongside the feature_extractor.

supported_models = [item[1].__name__ for item in supported_models.items()]
supported_models_names = []
for config, model in supported_models.items():
# Mapping can now contain tuples of models for the same configuration.
if isinstance(model, tuple):
supported_models_names.extend([_model.__name__ for _model in model])
else:
supported_models_names.append(model.__name__)
supported_models = supported_models_names
Comment on lines -633 to +649 — @LysandreJik (Author, Member), May 5, 2021:
This was overlooked when #11150 added the possibility of having two different models for a single configuration.
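The fix in the diff above can be demonstrated with plain dictionaries: since a mapping value may now be a tuple of model classes sharing one configuration, collecting the supported model names has to flatten tuples. The class names and keys below are hypothetical placeholders, not the real transformers mappings.

```python
# Hypothetical placeholder classes standing in for model classes.
class BertModel: ...
class BertForMaskedLM: ...
class GPT2Model: ...

# One config maps to a tuple of models, another to a single model.
supported_models = {
    "bert-config": (BertModel, BertForMaskedLM),
    "gpt2-config": GPT2Model,
}

# Same flattening logic as the new diff lines.
supported_models_names = []
for config, model in supported_models.items():
    if isinstance(model, tuple):
        supported_models_names.extend(_model.__name__ for _model in model)
    else:
        supported_models_names.append(model.__name__)

print(supported_models_names)
# ['BertModel', 'BertForMaskedLM', 'GPT2Model']
```

The removed one-liner would instead have produced names like "('BertModel', 'BertForMaskedLM')" for tuple values, which is what the error message check was tripping over.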

@sgugger (Member), May 5, 2021:
Good catch!

from collections import OrderedDict

from transformers import DeiTFeatureExtractor, Speech2TextFeatureExtractor, ViTFeatureExtractor

@LysandreJik (Author, Member), May 5, 2021:
This enables the use of dummy objects here rather than None objects.

@LysandreJik LysandreJik requested review from sgugger, Narsil and patil-suraj May 5, 2021
@sgugger (Member) approved these changes on May 5, 2021:
Nice addition! Thanks for working on this!

src/transformers/pipelines/base.py

@patil-suraj (Member) left a comment:

Nice! Thank you for adding this, LGTM.

@LysandreJik LysandreJik requested a review from patrickvonplaten May 6, 2021
@patrickvonplaten (Member) left a comment:

Nice!

The only thing I feel quite strongly about is wrapping the forward pass in a with torch.no_grad(): block to avoid computing costly activations.

The rest are all nits.
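The no_grad suggestion can be sketched minimally: running the forward pass under torch.no_grad() keeps PyTorch from recording the autograd graph, so intermediate activations are not retained for backpropagation. The tiny linear model below is a stand-in for the pipeline's real model, not the actual implementation.

```python
import torch

# Stand-in model and input for illustration.
model = torch.nn.Linear(4, 2)
inputs = torch.randn(1, 4)

# Inference only: no gradient bookkeeping, lower memory use.
with torch.no_grad():
    logits = model(inputs)

# The output carries no gradient history.
print(logits.requires_grad)  # False
```

Without the context manager, logits would hold a grad_fn and every intermediate activation would stay alive until the graph is freed.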

@NielsRogge (Contributor) commented on May 6, 2021:

LGTM, thank you for adding this.

@Narsil (Contributor) left a comment:

The biggest change is the tokenizer logic modification: we should really respect that if a user sends us something that is not a str (or a tuple, etc.), then we need to assume it is a valid tokenizer and pass it along as-is.

Also, the tests are a bit bloated IMO compared to what they could be; in this instance, because we can't test parity with TF, we should make the tests much simpler.
We also don't test pipeline outputs, which is a big issue for maintaining backward compatibility.

The image loading logic should be fixed too.

The rest are NITs.
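The tokenizer pass-through rule from the review above can be sketched as follows. Only a str (or an (identifier, kwargs) tuple) should trigger loading; any other object is assumed to already be a valid tokenizer and must be passed through untouched. load_tokenizer here is a hypothetical loader for illustration, not the actual transformers API.

```python
def load_tokenizer(identifier, **kwargs):
    # Hypothetical loader standing in for the real from_pretrained logic.
    return f"<tokenizer loaded from {identifier}>"

def resolve_tokenizer(tokenizer):
    if isinstance(tokenizer, str):
        # A string identifier: load it.
        return load_tokenizer(tokenizer)
    if isinstance(tokenizer, tuple):
        # An (identifier, kwargs) pair: load with the extra options.
        identifier, kwargs = tokenizer
        return load_tokenizer(identifier, **kwargs)
    # Anything else: trust the caller's object and pass it through as-is.
    return tokenizer

print(resolve_tokenizer("bert-base-uncased"))
# <tokenizer loaded from bert-base-uncased>
```

The key property is the final branch: a user-supplied tokenizer instance must come back unchanged, never re-loaded or wrapped.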

src/transformers/pipelines/__init__.py
src/transformers/pipelines/base.py
tests/test_pipelines_image_classification.py
src/transformers/pipelines/image_classification.py
Co-authored-by: patrickvonplaten <[email protected]>
@LysandreJik LysandreJik force-pushed the image-classification-pipeline branch from 60b0263 to 25594b5 May 6, 2021
@LysandreJik (Member, Author) commented on May 6, 2021:

I should have addressed all comments.

@Narsil, if you could review once again when you have time, would love your feedback on the updated tests and on the updated __init__ method.

@Narsil (Contributor) approved these changes on May 6, 2021:

LGTM.

Last nit: maybe the load_image function could be defined at the module level rather than inlined.

It should make it easier to test and overload.
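A module-level load_image could look roughly like the sketch below: accept a URL, a local path, or an already-loaded image, and fail loudly otherwise. The two private helpers are simplified hypothetical stand-ins (the real implementation would use an HTTP client and PIL), but the dispatch structure is the point being suggested.

```python
import os

def load_image(image):
    """Accept a URL, a local file path, or an already-loaded image object."""
    if isinstance(image, str):
        if image.startswith("http://") or image.startswith("https://"):
            return _load_from_url(image)   # hypothetical HTTP fetch
        if os.path.isfile(image):
            return _load_from_file(image)  # hypothetical file open
        raise ValueError(f"Cannot load image from {image!r}")
    # Anything else is assumed to be a loaded image and returned as-is.
    return image

def _load_from_url(url):
    return f"<image from {url}>"

def _load_from_file(path):
    return f"<image from {path}>"
```

Being at module level, load_image can be unit-tested directly and monkeypatched or subclassed away without touching the pipeline class, which is exactly the testability and overloading benefit mentioned above.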

@LysandreJik LysandreJik merged commit 39084ca into master May 7, 2021
15 checks passed:

run_tests_templates (GitHub Actions)
torch_hub_integration (GitHub Actions)
ci/circleci: build_doc
ci/circleci: check_code_quality
ci/circleci: check_repository_consistency
ci/circleci: run_examples_torch
ci/circleci: run_tests_custom_tokenizers
ci/circleci: run_tests_flax
ci/circleci: run_tests_hub
ci/circleci: run_tests_pipelines_tf
ci/circleci: run_tests_pipelines_torch
ci/circleci: run_tests_tf
ci/circleci: run_tests_torch
ci/circleci: run_tests_torch_and_flax
ci/circleci: run_tests_torch_and_tf
@LysandreJik LysandreJik deleted the image-classification-pipeline branch May 7, 2021